Stochastic dynamic programming

From Cornell University Computational Optimization Open Textbook - Optimization Wiki

Revision as of 00:02, 23 November 2021

Authors: Bo Yuan, Ali Amadeh, Max Greenberg, Raquel Sarabia Soto and Claudia Valero De la Flor (CHEME/SYSEN 6800, Fall 2021)

Theory, methodology and algorithm discussion

Theory

Stochastic dynamic programming combines stochastic programming and dynamic programming. To understand it, it helps to define each of the two first:

  • Stochastic programming. In a deterministic problem, all the parameters are known and the outcome of a decision is determined by the decision alone. In stochastic programming there is uncertainty, so a decision results in a distribution of transformations rather than a single one.
  • Dynamic programming. An optimization method that divides a complex problem into simpler subproblems and solves them recursively; the optimal sub-solutions then lead to the optimum of the original problem.

In any stochastic dynamic programming problem, we must define the following concepts:

  • Policy, which is the set of rules used to make a decision.
  • Initial vector, <math>p</math>, where <math>p \in D</math> and <math>D</math> is a finite closed region.
  • Choice made, <math>q</math>, where <math>q \in S</math> and <math>S</math> is a set of possible choices.
  • Stochastic vector, <math>z</math>.
  • Distribution function <math>dG_q(p,z)</math>, associated with <math>z</math> and dependent on <math>p</math> and <math>q</math>.
  • Return, which is the expected value of the function after the final stage.
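As a rough illustration of how these ingredients fit together, the following Python sketch encodes them for a tiny discrete problem. Everything in it is an assumption made for the example and is not part of the original formulation: the three-element region `D`, the choices `"safe"` and `"risky"`, the probabilities 0.6 and 0.4, the return function `R`, and the helper names `G`, `policy`, `sample_z` and `expected_return`. The dictionary returned by `G` is a discrete stand-in for <math>dG_q(p,z)</math>.

```python
import random

# Hypothetical toy problem: all names and numbers are illustrative assumptions.
D = [0, 1, 2]             # region D of admissible vectors p (scalars here)
S = ["safe", "risky"]     # set S of possible choices q


def G(p, q):
    """Distribution of z given state p and choice q, as {z: probability}."""
    if q == "safe":
        return {p: 1.0}   # the state stays where it is
    dist = {}
    # "risky": move up with probability 0.6, down with 0.4, clipped to D.
    for z, prob in ((min(p + 1, 2), 0.6), (max(p - 1, 0), 0.4)):
        dist[z] = dist.get(z, 0.0) + prob
    return dist


def R(p):
    """Assumed return function of the final state."""
    return float(p)


def policy(p):
    """A policy is a rule mapping the current state to a choice in S."""
    return "risky" if p < 2 else "safe"


def sample_z(p, q, rng):
    """Draw one realization of the stochastic vector z."""
    dist = G(p, q)
    outcomes = list(dist)
    weights = [dist[z] for z in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def expected_return(p, q):
    """Expected value of R(z) if the process stops after one stage."""
    return sum(prob * R(z) for z, prob in G(p, q).items())


rng = random.Random(0)
p = 1
q = policy(p)
z = sample_z(p, q, rng)
print(p, q, z, expected_return(p, q))
```

Storing the distribution as a dictionary keeps the expectation a plain weighted sum; a continuous <math>z</math> would need numerical integration or sampling instead.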

In a stochastic dynamic programming problem, we assume that <math>z</math> is known after the decision of stage <math>n-1</math> has been made and before the decision of stage <math>n</math> has to be made.
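The timing assumption above can be made concrete with a short simulation loop. This is only a sketch under assumed names: `simulate`, `toy_policy` and `toy_sample_z` are hypothetical helpers, and the "stay"/"move" dynamics are invented for illustration. The point is the order of events: the choice for a stage is made first, then <math>z</math> is realized, and that realization is available when the next choice is made.

```python
import random


def simulate(p0, policy, sample_z, n_stages, rng):
    """Run one trajectory and return records (stage, state, choice, z)."""
    p = p0
    history = []
    for n in range(1, n_stages + 1):
        q = policy(n, p)          # stage-n decision uses only what is known
        z = sample_z(p, q, rng)   # z is realized after that decision ...
        history.append((n, p, q, z))
        p = z                     # ... and is known before the next decision
    return history


def toy_sample_z(p, q, rng):
    """Assumed dynamics: 'stay' keeps p, 'move' adds 1 with probability 0.5."""
    if q == "stay":
        return p
    return p + 1 if rng.random() < 0.5 else p


def toy_policy(n, p):
    """Assumed rule: keep moving until the state reaches 2."""
    return "move" if p < 2 else "stay"


history = simulate(0, toy_policy, toy_sample_z, n_stages=4, rng=random.Random(0))
for record in history:
    print(record)
```

In each printed record, the state used at stage <math>n</math> is exactly the <math>z</math> realized at stage <math>n-1</math>.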

Methodology and algorithm

First, we define the <math>N</math>-stage return obtained using the optimal policy and starting with vector <math>p</math>:

<math>f_N\left(p\right)=\max{R\left(p_N\right)}</math>, where <math>R\left(p_N\right)</math> is the function of the final state <math>p_N</math>.
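To make this definition tangible, the sketch below evaluates <math>f_N(p)</math> by brute force on a tiny discrete problem. It reads the maximum as being taken over policies and the return as the expected value of <math>R(p_N)</math>, in line with the definition of the return given earlier; that reading, the toy dynamics in `transition`, the return function `R`, and the restriction to stage-dependent rules that map each state to a choice are all assumptions of this example. Enumerating every policy is only feasible for very small problems.

```python
from itertools import product

# Hypothetical toy problem: all names and numbers are illustrative assumptions.
STATES = [0, 1, 2]
CHOICES = ["safe", "risky"]


def transition(p, q):
    """Discrete stand-in for dG_q(p, z): returns {z: probability}."""
    if q == "safe":
        return {p: 1.0}
    dist = {}
    for z, prob in ((min(p + 1, 2), 0.6), (max(p - 1, 0), 0.4)):
        dist[z] = dist.get(z, 0.0) + prob
    return dist


def R(p):
    """Assumed return function of the final state."""
    return float(p)


def expected_final_return(p0, policy, n_stages):
    """E[R(p_N)] when policy[n][p] is the choice at stage n in state p."""
    belief = {p0: 1.0}                      # distribution over current state
    for n in range(n_stages):
        nxt = {}
        for p, weight in belief.items():
            for z, prob in transition(p, policy[n][p]).items():
                nxt[z] = nxt.get(z, 0.0) + weight * prob
        belief = nxt
    return sum(weight * R(p) for p, weight in belief.items())


def f(n_stages, p0):
    """Brute-force maximum of E[R(p_N)] over all stage-dependent policies."""
    stage_rules = [
        dict(zip(STATES, qs)) for qs in product(CHOICES, repeat=len(STATES))
    ]
    return max(
        expected_final_return(p0, policy, n_stages)
        for policy in product(stage_rules, repeat=n_stages)
    )


for n_stages in range(3):
    print(n_stages, [round(f(n_stages, p), 4) for p in STATES])
```

With zero stages the function reduces to <math>R(p)</math>, because the only policy is the empty one and the final state is the initial state.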

Second, we define the initial transformation as <math>T_q</math> and <math>z</math> as the state resulting from it. The return after <math>N-1</math> stages will be <math>f_{N-1}(z)</math> using the optimal policy. Therefore, we can formulate the expected return due to the initial choice made in <math>T_q</math>: