Adaptive robust optimization: Difference between revisions

From Cornell University Computational Optimization Open Textbook - Optimization Wiki
Jump to navigation Jump to search
(Typed out problem formulation section)
(Typed up Algorithms section)
Line 4: Line 4:


== Introduction ==
== Introduction ==
Adaptive Robust Optimization (ARO), also known as adjustable robust optimization, models situations where decision makers make two types of decisions: here-and-now decisions that must be made immediately, and wait-and-see decisions that can be made at some point in the future. ARO improves on the robust optimization framework by accounting for any information the decision maker does not know now, but may learn before making future decisions. In the real-world, ARO is applicable whenever past decisions and new information together influence future decisions. Common applications include power systems control, inventory management, shift scheduling, and other resource allocation problems.
Adaptive Robust Optimization (ARO), also known as adjustable robust optimization, models situations where decision makers make two types of decisions: here-and-now decisions that must be made immediately, and wait-and-see decisions that can be made at some point in the future[ref Ben-Tal 2004]. ARO improves on the robust optimization framework by accounting for any information the decision maker does not know now, but may learn before making future decisions. In the real-world, ARO is applicable whenever past decisions and new information together influence future decisions. Common applications include power systems control, inventory management, shift scheduling, and other resource allocation problems[ref various papers].


Compared to traditional robust optimization models, ARO gives less conservative and more realistic solutions, however, this improvement comes at the cost of computation time. Indeed, even the general linear ARO with linear uncertainty is proven computationally intractable. However, researchers have developed a wide variety of solution and approximation methods for specific industrial ARO problems over the past 15 years and the field continues to grow rapidly.  
Compared to traditional robust optimization models, ARO gives less conservative and more realistic solutions, however, this improvement comes at the cost of computation time[ref basically any paper]. Indeed, even the general linear ARO with linear uncertainty is proven computationally intractable[ref Ben-Tal 2004]. However, researchers have developed a wide variety of solution and approximation methods for specific industrial ARO problems over the past 15 years and the field continues to grow rapidly [ref various papers].  


== Problem Formulation ==
== Problem Formulation ==
Line 51: Line 51:
Where <math>f(x_1)</math> was redefined to be the part of <math>x_1</math> representing <math>t</math>.
Where <math>f(x_1)</math> was redefined to be the part of <math>x_1</math> representing <math>t</math>.


For many problems of interest, the functions <math>f</math> and <math>g</math> vary linearly with <math>x_1</math> and <math>x_2</math>, that is, they are affine functions of <math>x_1</math> and <math>x_2</math>. In such cases, if <math>x_1</math> and <math>x_2</math> are treated as vectors, then we can write:
For many problems of interest, the functions <math>f</math> and <math>g</math> vary linearly with <math>x_1</math> and <math>x_2</math>, that is, they are affine functions of <math>x_1</math> and <math>x_2</math>[ref various papers]. In such cases, if <math>x_1</math> and <math>x_2</math> are treated as vectors, then we can write:


<math>f(x_1) = c^Tx_1</math>
<math>f(x_1) = c^Tx_1</math>
Line 65: Line 65:


This L2ARO will be the primary focus of the Algorithms section.
This L2ARO will be the primary focus of the Algorithms section.
==Algorithms and Methodology==
General ARO problems are computationally intractable[ref basically any paper]. Taking the L2ARO for example, deriving the optimal function <math>x_2(u)</math> poses a tremendous challenge for many choices of uncertainty set <math>U</math>. If <math>U</math> is large or infinite, or is non convex, deciding what <math>x_2</math> should be for each <math>u</math> in <math>U</math> may take a long time. In real world applications, then, the uncertainty set <math>U</math> must be chosen carefully to include a representative set of possible parameter values for <math>u</math>, but it must not be too large or complex and render the problem intractable.
The L2ARO model has been proven tractable only for simple uncertainty sets <math>U</math> or with restrictions imposed on the function <math>x_2(u)</math>[ref Ben-Tal 2004, other extensions of the work]. Therefore, ARO problems are usually solved on a case by case basis, using methods such as multi-level optimization, branch-and-cut, and decomposition[ref various papers]. This section will first present the L2ARO solution method using the <i>affine decision rule</i> approximation under <i>fixed recourse</i> conditions from Ben-Tal's 2004 paper [ref], then discuss how this method might be extended to other L2ARO problems.
General L2ARO problems were first proven intractable by Guslitser, in his master's thesis[ref Guslitser master's thesis]. Ben-Tal took this result and suggested simplifying the problem by restricting <math>x_2(u)</math> to vary linearly with <math>u</math>, that is,
<math>x_2(u) = w + Wu</math>
Where <math>w</math> is a vector and <math>W</math> is a matrix, both variable. This simplification is known as the <i>affine decision rule</i> (ADR). To further simplify the problem, Ben-Tal proposed that the matrix <math>A_2(u)</math> be fixed to some matrix <math>V</math> (<i>fixed recourse conditions</i>), and make <math>A_1(u)</math> and <math>b(u)</math> affine functions of <math>u</math>:
<math>A_1(u) = m + M(u)</math>
<math>b(u) = b + Bu</math>
Where <math>m</math> and <math>b</math> are fixed vectors and <math>M</math> and <math>B</math> are fixed matrices. Then, the overall problem can be rewritten:
<math>\begin{align}\text{min}(x_1, w, W) \; &c^Tx_1\\
\text{s.t.}\;&(m + Mu)x_1 + V(w + Wu) \leq b + Bu \; \text{ for all }u\text{ in }U\end{align}</math>
Now, both the objective function and constraint function are affine functions of <math>x_1</math>, <math>w</math>, and <math>W</math>, so the problem has been reduced to a simple robust linear program, for which solution methods already exist.
The above solution method, although simple and tractable, suffers from potential sub optimality of ADR. Indeed, Ben-Tal motivates this assumption citing only the tractability of the result. In real world scenarios, this sub optimality can be mitigated by using ADR to make the initial decision, then resolving the problem after <math>u</math> is revealed. That is, if solving the L2ARO gives <math>x_1^*</math> as the optimal <math>x_1</math> and <math>x_2^*(u)</math> as the optimal <math>x_2(u)</math>, decision <math>x_1^*</math> is implemented immediately; when <math>u</math> is revealed (to be, say, <math>u^*</math>), decision <math>x_2</math> is decided not by computing <math>x_2^*(u^*)</math>, but by re-solving the whole problem fixing <math>x_1</math> to <math>x_1^*</math> and fixing <math>u</math> to <math>u^*</math>. This method reflects the wait-and-see nature of the decision <math>x_2</math> - ADR is used to find a pretty-good <math>x_1</math>, then <math>u</math> is revealed, then the information is used to solve for the optimal <math>x_2</math> in that circumstance [ref that paper]. This iterative, stage-by-stage solution performs better than using only ADR, but is feasible only when there is enough time between stages to re-solve the problem. Further, numerical experiments indicate that classical robust optimization models yield equally good, if not better initial decisions than ADR on L2ARO, limiting ADR on L2ARO to situations where the problem cannot be feasibly re-solved, or in the special cases where the ADR approximation actually generates the optimal solution[ref that paper].
This leads to the natural question, under what conditions are ADRs optimal? Bertsimias and Goyal showed in 2012 that if both <math>A(u)</math> matrices are independent of <math>u</math>, <math>x_1</math> and <math>x_2</math> are restricted to vectors with nonnegative entries, and <math>b(u)</math> is restricted to be vectors with nonpositive entries, then ADRs are optimal if <math>b(u)</math> is restricted to a polyhedral set with the same number of vertices as <math>b(u)</math>'s dimension[ref B+G 2012]. In a 2016 paper, Ben-Tal and colleagues noted that if the only restrictions were that the <math>A(u)</math> matrices are independent of <math>u</math>, then <i>piecewise</i> ADRs can be optimal, although there may be a large number of pieces [ref Ben-Tal et al 2016]. ADRs can be optimal in other, more specific cases, but these cases will not be discussed here.
Still, ADRs are usually suboptimal. If the goal is to minimize the objective function, then the true optimal objective cannot exceed the optimal objective generated under ADRs, that is, applying ADRs actually gives an upper bound to the optimal solution. Thus, one approach to characterize the suboptimality of ADRs is to generate a lower bound to the optimal solution, and the difference between the upper and lower bounds can be used to estimate suboptimality. Several approaches exist for doing so. A simple approach is to approximate the uncertainty set using a small number of well-chosen points (“sampling” the uncertainty set), and to solve the model at each of these points. Since the true worst case scenario must be at least as bad as one of the selected points, this sampling approach must generate a solution no worse than the true optimal, or a lower bound to the objective. This method, although simple, suffers from being excessively optimistic unless a large number of points are sampled, but trying to solve the model at a large number of points can take a long time. Thus, authors have investigated methods for choosing fewer points that can better represent the whole uncertainty set to improve both the optimality and computation time of this method. Other methods seek to extend duality theory to the L2ARO problem to find maximal lower bounds, to various degrees of complexity and optimality.
The other important assumption in the given solution methodology is the <i>fixed recourse condition</i>, that <math>A_2(u)</math> is fixed to some matrix <math>V</math>. If this is not true, that <math>A_2(u)</math> is instead some affine function of <math>u</math>, then even under the ADR assumption, the problem is intractable [ref Gusliter]. However, Ben-Tal has proposed a tight approximation method for when the uncertainty set <math>U</math> is the intersection of ellipsoidal sets, an approximation that becomes exact if <math>U</math> is an ellipsoidal set.

Revision as of 09:01, 22 November 2020

Author: Ralph Wang (ChemE 6800 Fall 2020)

Steward: Allen Yang, Fengqi You

Introduction

Adaptive Robust Optimization (ARO), also known as adjustable robust optimization, models situations where decision makers make two types of decisions: here-and-now decisions that must be made immediately, and wait-and-see decisions that can be made at some point in the future[ref Ben-Tal 2004]. ARO improves on the robust optimization framework by accounting for any information the decision maker does not know now, but may learn before making future decisions. In the real-world, ARO is applicable whenever past decisions and new information together influence future decisions. Common applications include power systems control, inventory management, shift scheduling, and other resource allocation problems[ref various papers].

Compared to traditional robust optimization models, ARO gives less conservative and more realistic solutions, however, this improvement comes at the cost of computation time[ref basically any paper]. Indeed, even the general linear ARO with linear uncertainty is proven computationally intractable[ref Ben-Tal 2004]. However, researchers have developed a wide variety of solution and approximation methods for specific industrial ARO problems over the past 15 years and the field continues to grow rapidly [ref various papers].

Problem Formulation

Suppose, for an optimization problem of interest, is the set of allowed decisions and is a decision in . Let bee a vector representing the set of parameters of interest in this problem. If the goal is to minimize some function , and we want to adhere to a set of constraints , then the problem may be formulated as:

Or more simply:

In this formulation, we call the objective function and the constraint function. If is known, then the problem can be solved using methods such as branch and cut or KKT conditions. However, in many real world scenarios, is not known. To address this uncertainty, the robust optimization approach generates the set of possible values of , called the uncertainty set , and solves for the decision such that the constraint is satisfied in all cases and is optimized for the worst case. The problem can be written as:

Adaptive robust optimization expands this robust optimization framework by separating the decision into multiple stages. For simplicity, assume there are two stages of decisions: in the first stage, only a subset of the decisions are made. After these decisions are made, the true values of the parameters are revealed, then the remaining set of decisions need to be made. The model is like a game of poker: the player needs to make initial bets based on incomplete information (the cards in his hand), then makes further bets as more and more cards are dealt. Mathematically, let the set of possible decisions in the first stage be and the set of possible decisions in the second stage be , so that the objective and constraint functions become functions of the parameters , the first stage decision (), and the second stage decision ( in ). Then, we can formulate the problem as:

The reasoning used in this construction can be extended to multi-stage formulations.

In the literature, adaptive robust optimization problems are usually formulated differently but equivalently. Note that because is selected only after the uncertain parameter is revealed, is a function of . Then, the function is independent of , so the function can be decided before is revealed. This allows us to rewrite the problem as:

And if we introduce a variable , then we can rewrite the problem as:

Which allows us to remove from the objective function. Since represents all the variables the decide immediately, can be collapsed into ; similarly, the first constraint can be collapsed into the second. This generates the formulation most commonly seen in the literature (up to a change of variable names and functions and ):

Where was redefined to be the part of representing .

For many problems of interest, the functions and vary linearly with and , that is, they are affine functions of and [ref various papers]. In such cases, if and are treated as vectors, then we can write:

Where is some vector, and

Where the 's are matrices and is a vector, to give the linear, two-stage ARO (L2ARO):

This L2ARO will be the primary focus of the Algorithms section.

Algorithms and Methodology

General ARO problems are computationally intractable[ref basically any paper]. Taking the L2ARO for example, deriving the optimal function poses a tremendous challenge for many choices of uncertainty set . If is large or infinite, or is non convex, deciding what should be for each in may take a long time. In real world applications, then, the uncertainty set must be chosen carefully to include a representative set of possible parameter values for , but it must not be too large or complex and render the problem intractable. The L2ARO model has been proven tractable only for simple uncertainty sets or with restrictions imposed on the function [ref Ben-Tal 2004, other extensions of the work]. Therefore, ARO problems are usually solved on a case by case basis, using methods such as multi-level optimization, branch-and-cut, and decomposition[ref various papers]. This section will first present the L2ARO solution method using the affine decision rule approximation under fixed recourse conditions from Ben-Tal's 2004 paper [ref], then discuss how this method might be extended to other L2ARO problems.

General L2ARO problems were first proven intractable by Guslitser, in his master's thesis[ref Guslitser master's thesis]. Ben-Tal took this result and suggested simplifying the problem by restricting to vary linearly with , that is,

Where is a vector and is a matrix, both variable. This simplification is known as the affine decision rule (ADR). To further simplify the problem, Ben-Tal proposed that the matrix be fixed to some matrix (fixed recourse conditions), and make and affine functions of :

Where and are fixed vectors and and are fixed matrices. Then, the overall problem can be rewritten:

Now, both the objective function and constraint function are affine functions of , , and , so the problem has been reduced to a simple robust linear program, for which solution methods already exist.

The above solution method, although simple and tractable, suffers from potential sub optimality of ADR. Indeed, Ben-Tal motivates this assumption citing only the tractability of the result. In real world scenarios, this sub optimality can be mitigated by using ADR to make the initial decision, then resolving the problem after is revealed. That is, if solving the L2ARO gives as the optimal and as the optimal , decision is implemented immediately; when is revealed (to be, say, ), decision is decided not by computing , but by re-solving the whole problem fixing to and fixing to . This method reflects the wait-and-see nature of the decision - ADR is used to find a pretty-good , then is revealed, then the information is used to solve for the optimal in that circumstance [ref that paper]. This iterative, stage-by-stage solution performs better than using only ADR, but is feasible only when there is enough time between stages to re-solve the problem. Further, numerical experiments indicate that classical robust optimization models yield equally good, if not better initial decisions than ADR on L2ARO, limiting ADR on L2ARO to situations where the problem cannot be feasibly re-solved, or in the special cases where the ADR approximation actually generates the optimal solution[ref that paper].

This leads to the natural question, under what conditions are ADRs optimal? Bertsimias and Goyal showed in 2012 that if both matrices are independent of , and are restricted to vectors with nonnegative entries, and is restricted to be vectors with nonpositive entries, then ADRs are optimal if is restricted to a polyhedral set with the same number of vertices as 's dimension[ref B+G 2012]. In a 2016 paper, Ben-Tal and colleagues noted that if the only restrictions were that the matrices are independent of , then piecewise ADRs can be optimal, although there may be a large number of pieces [ref Ben-Tal et al 2016]. ADRs can be optimal in other, more specific cases, but these cases will not be discussed here.

Still, ADRs are usually suboptimal. If the goal is to minimize the objective function, then the true optimal objective cannot exceed the optimal objective generated under ADRs, that is, applying ADRs actually gives an upper bound to the optimal solution. Thus, one approach to characterize the suboptimality of ADRs is to generate a lower bound to the optimal solution, and the difference between the upper and lower bounds can be used to estimate suboptimality. Several approaches exist for doing so. A simple approach is to approximate the uncertainty set using a small number of well-chosen points (“sampling” the uncertainty set), and to solve the model at each of these points. Since the true worst case scenario must be at least as bad as one of the selected points, this sampling approach must generate a solution no worse than the true optimal, or a lower bound to the objective. This method, although simple, suffers from being excessively optimistic unless a large number of points are sampled, but trying to solve the model at a large number of points can take a long time. Thus, authors have investigated methods for choosing fewer points that can better represent the whole uncertainty set to improve both the optimality and computation time of this method. Other methods seek to extend duality theory to the L2ARO problem to find maximal lower bounds, to various degrees of complexity and optimality.

The other important assumption in the given solution methodology is the fixed recourse condition, that is fixed to some matrix . If this is not true, that is instead some affine function of , then even under the ADR assumption, the problem is intractable [ref Gusliter]. However, Ben-Tal has proposed a tight approximation method for when the uncertainty set is the intersection of ellipsoidal sets, an approximation that becomes exact if is an ellipsoidal set.