Markov decision process

From Cornell University Computational Optimization Open Textbook - Optimization Wiki
Revision as of 00:29, 26 November 2020

Author: Eric Berg (eb645)

Requirements:

- An introduction of the topic

- Theory, methodology, and/or algorithmic discussions

- At least one numerical example (step-by-step solution process, like what you did in the HWs)

- A section to discuss and/or illustrate the applications

- A conclusion section

- References

Introduction

A Markov Decision Process (MDP) is a framework for sequential decision making that takes into account information from the environment, the actions performed by the agent, and the rewards received in order to decide the optimal next action. An MDP works in discrete time: the decision process is carried out at each time step. The name refers to the Russian mathematician Andrey Markov, since the MDP is based on the Markov Property. MDPs can be used as control schemes in machine learning applications. Machine learning can be divided into three main categories: unsupervised learning, supervised learning, and reinforcement learning. The MDP is used as a method for decision making in the reinforcement learning category.
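As a rough illustration of how states, actions, and rewards combine to pick the next action, the following is a minimal sketch of a two-state MDP. Everything in it is an assumption made for illustration and does not come from the article: the state names, the action names, the transition probabilities, the rewards, and the discount factor `GAMMA`. Value iteration is used here as one common way to compute the optimal action; the article itself does not prescribe a solution method.

```python
# Hypothetical two-state MDP; all names and numbers are illustrative assumptions.
STATES = ["low", "high"]
ACTIONS = ["wait", "work"]
GAMMA = 0.9  # assumed discount factor for future rewards

# TRANSITIONS[state][action] -> list of (probability, next_state, reward)
TRANSITIONS = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "work": [(0.8, "high", 1.0), (0.2, "low", 0.0)],
    },
    "high": {
        "wait": [(1.0, "high", 2.0)],
        "work": [(0.5, "high", 3.0), (0.5, "low", 0.0)],
    },
}


def q_value(state, action, values):
    """Expected immediate reward plus discounted value of the next state."""
    return sum(
        prob * (reward + GAMMA * values[next_state])
        for prob, next_state, reward in TRANSITIONS[state][action]
    )


def value_iteration(tol=1e-8):
    """Repeat the Bellman optimality update until the values stop changing."""
    values = {s: 0.0 for s in STATES}
    while True:
        new_values = {
            s: max(q_value(s, a, values) for a in ACTIONS) for s in STATES
        }
        if max(abs(new_values[s] - values[s]) for s in STATES) < tol:
            return new_values
        values = new_values


def greedy_policy(values):
    """For each state, pick the action with the highest expected value."""
    return {s: max(ACTIONS, key=lambda a: q_value(s, a, values)) for s in STATES}


values = value_iteration()
policy = greedy_policy(values)
print(policy)
```

With these assumed numbers, the resulting policy is to work in the low state and to wait in the high state; changing the rewards or probabilities may change that choice.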

Theory and Methodology

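In order to understand the Markov Decision Process, the Markov Property must first be defined. The Markov Property states that the future is independent of the past given the present. In other words, only the present is needed to determine the future, since the present contains all the necessary information from the past. Formally, for a state <math display="inline">S_t</math> at time <math display="inline">t</math>: <math display="inline">P[S_{t+1} \mid S_t] = P[S_{t+1} \mid S_1, \ldots, S_t]</math>.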
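The Markov Property can be sketched in code as a process whose next state is sampled using only the current state. The weather states and the probabilities in `P` below are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical transition probabilities: P[current][next] (assumed values).
P = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}


def next_state(current, rng):
    """Sample the next state from the current state alone (Markov Property)."""
    states = list(P[current])
    weights = [P[current][s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


def simulate(start, steps, seed=0):
    """Generate a path; earlier states are stored but never consulted."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(next_state(path[-1], rng))
    return path


path = simulate("sunny", 10)
print(path)
```

Note that `next_state` receives only the current state, not the path so far, which is the sense in which the present contains all the information needed to determine the future.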

Numerical Example



Applications

Markov decision processes have been used widely within reinforcement learning to teach robots or other computer-based systems how to perform tasks they were previously unable to do. For example, MDPs have been used to teach a computer how to play games such as Pong, Pac-Man, and Go (as in AlphaGo). MDPs have also been used to teach a simulated robot how to walk and run.

Conclusion

References