# Markov decision process

Author: Eric Berg (eb645)


# Introduction

A Markov decision process (MDP) is a framework for sequential decision making. It combines information from the environment, the actions performed by an agent, and the rewards those actions produce to determine the optimal next action. MDPs operate in discrete time: at each time step, the agent observes the current state, chooses an action, and receives a reward. The name refers to the Russian mathematician Andrey Markov, because the MDP is built on the Markov property. MDPs can be used as control schemes in machine learning applications. Machine learning is commonly divided into three main categories: unsupervised learning, supervised learning, and reinforcement learning. Within reinforcement learning, the MDP serves as the model for decision making.
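The discrete-time cycle described above (observe the state, choose an action, receive a reward) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. The two-state MDP (states `low`/`high`, actions `wait`/`work`), its probabilities and rewards, and the names `TRANSITIONS`, `step`, `run_episode`, and `policy` are all hypothetical. The policy is fixed rather than optimized.

```python
import random

# Hypothetical two-state, two-action MDP used only for illustration.
# TRANSITIONS[state][action] is a list of (probability, next_state, reward).
TRANSITIONS = {
    "low": {"wait": [(1.0, "low", 0.0)],
            "work": [(0.7, "high", 1.0), (0.3, "low", 0.0)]},
    "high": {"wait": [(0.8, "high", 2.0), (0.2, "low", 0.0)],
             "work": [(1.0, "high", 1.0)]},
}


def step(state, action, rng):
    """Sample (next_state, reward) from the environment's transition model."""
    r = rng.random()
    cumulative = 0.0
    for prob, next_state, reward in TRANSITIONS[state][action]:
        cumulative += prob
        if r < cumulative:
            return next_state, reward
    # Guard against floating-point round-off: fall back to the last outcome.
    _, next_state, reward = TRANSITIONS[state][action][-1]
    return next_state, reward


def run_episode(policy, start, n_steps, seed=0):
    """Discrete-time loop: at each step observe the state, act, collect the reward."""
    rng = random.Random(seed)
    state, total = start, 0.0
    trajectory = []
    for t in range(n_steps):
        action = policy(state)
        next_state, reward = step(state, action, rng)
        trajectory.append((t, state, action, reward))
        total += reward
        state = next_state
    return total, trajectory


def policy(state):
    """Hypothetical fixed policy: work when low, wait when high."""
    return "work" if state == "low" else "wait"


total, traj = run_episode(policy, "low", 10)
```

Because the next state and reward are sampled from `TRANSITIONS[state][action]` alone, the loop already relies on the Markov property: nothing earlier than the current state is consulted.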

# Theory and Methodology

To understand the Markov decision process, the Markov property must first be defined. The Markov property states that the future is independent of the past given the present. In other words, the probability distribution of the next state depends only on the present state, not on the earlier history, because the present state already contains all the relevant information from the past.

$$P[S_{t+1} \mid S_t] = P[S_{t+1} \mid S_1, S_2, \ldots, S_t]$$
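The Markov property can be checked empirically on a simulated chain. The sketch below assumes a hypothetical three-state transition matrix `P`, which does not come from the source. It estimates the probability of the next state twice: once conditioning only on the present state, and once also conditioning on the previous state. If the property holds, both estimates should approach the same value, here the matrix entry `P[1][1] = 0.6`. The function names `simulate` and `conditional_estimates` are illustrative.

```python
import random
from collections import Counter

# Hypothetical 3-state Markov chain: P[i][j] = Pr(S_{t+1} = j | S_t = i).
P = [[0.5, 0.4, 0.1],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]


def simulate(n_steps, start=0, seed=1):
    """Generate a state sequence S_0, ..., S_n from the chain."""
    rng = random.Random(seed)
    states = [start]
    for _ in range(n_steps):
        # The next state is drawn using only the current state's row of P.
        states.append(rng.choices(range(3), weights=P[states[-1]])[0])
    return states


def conditional_estimates(states, i, j, k):
    """Estimate Pr(S_{t+1}=j | S_t=i) and Pr(S_{t+1}=j | S_{t-1}=k, S_t=i)."""
    pair_counts = Counter(zip(states, states[1:]))
    triple_counts = Counter(zip(states, states[1:], states[2:]))
    given_present = pair_counts[(i, j)] / sum(pair_counts[(i, m)] for m in range(3))
    given_history = triple_counts[(k, i, j)] / sum(
        triple_counts[(k, i, m)] for m in range(3))
    return given_present, given_history


states = simulate(200_000)
p_present, p_history = conditional_estimates(states, i=1, j=1, k=0)
```

With this many samples, both estimates should fall close to 0.6 and differ only by sampling noise. Adding the earlier state to the condition does not change the prediction, which is what the equation above expresses.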