Markov decision process

From Cornell University Computational Optimization Open Textbook - Optimization Wiki
Jump to navigation Jump to search

Author: Eric Berg (eb645)


- An introduction of the topic

- Theory, methodology, and/or algorithmic discussions

- At least one numerical example (step-by-step solution process, like

what you did in the HWs)

- A section to discuss and/or illustrate the applications

- A conclusion section

- References


A Markov Decision Process (MDP) is a decision making method that takes into account information from the environment, actions performed by the agent, and rewards in order to decide the optimal next action. MDP works in discrete time, meaning at each point in time the decision process is carried out. The name Markov refers to the Russian mathematician Andrey Markov, since the Markov Decision Process is based on the Markov Property. MDPs can be used as control schemes in machine learning applications. Machine learning can be divided into three main categories: unsupervised learning, supervised learning, and reinforcement learning. The Markov decision process is used as a method for decision making in the reinforcement learning category.

Theory and Methodology

In order to understand the Markov Decision Process, first the Markov Property must be defined. the Markov Property states that the future is independent of the past given the present. In other words, only the present in needed to determine the future, not the past, since the present contains all necessary information from the past.

Numerical Example

Optimizating of a quadratic function.12


Markov decision Processes have been used widely within reinforcement learning to teach robots or other computer-based systems how to do something they previously were unable to do. For example, Markov decision processes have been used to teach a computer how to play computer games like Pong, Pacman, or Alpha Go. MDPs have been used to teach a simulated robot how to walk and run.