Author: Eric Berg (eb645)


Introduction

A Markov Decision Process (MDP) is a mathematical framework for sequential decision making in which an agent uses information about the state of its environment, the actions it can perform, and the rewards those actions yield to choose the best next action. An MDP operates in discrete time: at each time step the agent observes the current state and makes a decision. The process is named after the Russian mathematician Andrey Markov because it is built on the Markov Property. MDPs can be used as control schemes in machine learning applications. Machine learning can be divided into three main categories: unsupervised learning, supervised learning, and reinforcement learning. MDPs are used to formalize decision making in the reinforcement learning category.
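The interaction described above (observe a state, choose an action, receive a reward, repeat at each time step) can be sketched in a few lines of Python. This is only a minimal illustration: the two-state table `TOY_MDP`, its action names, its reward values, and the helper functions are all hypothetical. Transitions are assumed to be deterministic for brevity, and the agent picks actions at random instead of optimally.

```python
import random

# Hypothetical toy MDP (illustrative values only, not from the article):
# TOY_MDP[state][action] = (next_state, reward). Transitions are assumed
# deterministic here to keep the sketch short.
TOY_MDP = {
    "low": {"wait": ("low", 0.0), "charge": ("high", -1.0)},
    "high": {"wait": ("high", 0.0), "work": ("low", 5.0)},
}


def step(state, action):
    """Environment: return (next_state, reward) for the chosen action."""
    return TOY_MDP[state][action]


def random_policy(state, rng):
    """Agent: pick uniformly among the actions available in this state."""
    return rng.choice(sorted(TOY_MDP[state]))


def run_episode(start, steps, seed=0):
    """Run the decision loop for a fixed number of discrete time steps."""
    rng = random.Random(seed)
    state, total_reward = start, 0.0
    trajectory = []
    for t in range(steps):
        action = random_policy(state, rng)        # decision at time t
        next_state, reward = step(state, action)  # environment responds
        trajectory.append((t, state, action, reward))
        total_reward += reward
        state = next_state
    return trajectory, total_reward


if __name__ == "__main__":
    for record in run_episode("low", 5)[0]:
        print(record)
```

A reinforcement learning method would replace `random_policy` with a rule that is improved over time so that the accumulated reward grows.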

Theory and Methodology

To understand the Markov Decision Process, the Markov Property must first be defined. The Markov Property states that the future is independent of the past given the present. In other words, only the present state is needed to determine the future, since the present state contains all the necessary information from the past.

${\textstyle P[S_{t+1} \mid S_{t}] = P[S_{t+1} \mid S_{1}, S_{2}, S_{3}, \ldots, S_{t}]}$
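The equality above can be illustrated with a small simulated chain in which the next-state distribution is looked up from the current state alone, so two histories that end in the same state give identical predictions. The following is a minimal sketch: the two weather states, their transition probabilities, and the function names are assumptions chosen for illustration.

```python
import random

# Hypothetical two-state chain; the states and probabilities are
# illustrative assumptions. CHAIN[s][s2] plays the role of P[S_{t+1}=s2 | S_t=s].
CHAIN = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}


def next_state_distribution(history):
    """Return P[S_{t+1} | S_1, ..., S_t]; only history[-1] is consulted."""
    return CHAIN[history[-1]]


def sample_next(history, rng):
    """Draw the next state from the distribution given the history."""
    dist = next_state_distribution(history)
    states = list(dist)
    weights = [dist[s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


def simulate(start, steps, seed=0):
    """Generate a state sequence S_1, ..., S_{steps+1} starting from `start`."""
    rng = random.Random(seed)
    history = [start]
    for _ in range(steps):
        history.append(sample_next(history, rng))
    return history


if __name__ == "__main__":
    print(simulate("sunny", 10))
    # Different pasts, same present: the predicted distributions match.
    print(next_state_distribution(["rainy", "rainy", "sunny"]))
    print(next_state_distribution(["sunny"]))
```

Because `next_state_distribution` ignores everything except the last entry of `history`, the sketch satisfies the Markov Property by construction.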

Numerical Example

Optimization of a quadratic function.

Applications

Markov Decision Processes are widely used in reinforcement learning to teach robots and other computer-based systems tasks they were previously unable to perform. For example, MDPs have been used to teach computers to play games such as Pong, Pac-Man, and Go (as in AlphaGo). They have also been used to teach a simulated robot how to walk and run.

Markov decision process
