Author: Aolei Cao (ac3237), Ziyang Li (zl986), Junjia Liang (jl4439) (ChemE 6800 Fall 2024)
Stewards: Nathan Preuss, Wei-Han Chen, Tianqi Xiao, Guoqing Hu

== Introduction ==

== Problem formulation ==
=== 1. Objective ===
Minimize the loss function <math>f(x)</math>, where <math>x \in \mathbb{R}^n</math> is the weight vector to be optimized (for example, <math>f</math> may be the training loss of a model with parameters <math>x</math>).

=== 2. Parameters ===
* '''Gradient:'''
<math>G_t = \nabla f(x_{t-1})</math>
* '''Second moment estimate:'''
<math>\hat{V}_t = \hat{\beta}_{2t} \hat{V}_{t-1} + (1 - \hat{\beta}_{2t})(G_t^2 + \epsilon_1 1_n)</math>
** Where:
*** <math>\hat{V}_t</math> is the running average of the squared gradient.
*** <math>\hat{\beta}_{2t}</math> is the corrected decay parameter.
*** <math>\epsilon_1</math> is a regularization constant.
* '''Step size:'''
<math>\alpha_t = \max(\epsilon_2, \text{RMS}(x_{t-1})) \rho_t</math>
** Where:
*** <math>\rho_t</math> is the relative step size.
*** <math>\epsilon_2</math> is a regularization constant.
*** <math>\text{RMS}</math> is the root mean square, defined as follows (a short numerical check appears after this list):
<math>u_{xt} = \frac{-g_{xt}}{\sqrt{\hat{v}_{xt}}}</math>
<math>\text{RMS}(U_t) = \text{RMS}_{x \in X}(u_{xt}) = \sqrt{\text{Mean}_{x \in X}\left(\frac{(g_{xt})^2}{\hat{v}_{xt}}\right)}</math>
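
To make the step-size rule concrete, here is a minimal NumPy check (our illustration; the helper name <code>rms</code> is ours, not from the source) that evaluates <math>\text{RMS}(x_{t-1})</math> and the resulting <math>\alpha_t</math> for a small vector, using the relative step size proposed in Section 4.

<syntaxhighlight lang="python">
import numpy as np

def rms(x):
    # Root mean square of the entries of x: sqrt(mean(x**2)).
    return np.sqrt(np.mean(np.square(x)))

eps2 = 1e-3                           # regularization constant epsilon_2
t = 10                                # iteration counter
rho_t = min(1e-2, 1.0 / np.sqrt(t))   # relative step size (see Section 4)
x_prev = np.array([0.5, -1.2, 3.0])   # weight vector x_{t-1}

alpha_t = max(eps2, rms(x_prev)) * rho_t
print(alpha_t)  # ~0.019: the step scales with RMS(x_{t-1}), floored by eps2
</syntaxhighlight>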
|
|
=== 3. Problem Formulation ===
==== Adafactor for Weighted Vectors ====
'''Inputs:'''
* Initial point: <math>X_0 \in \mathbb{R}^n</math>
* Relative step sizes: <math>\rho_t</math> for <math>t = 1</math> to <math>T</math>
* Second moment decay: <math>\hat{\beta}_{2t}</math> for <math>t = 1</math> to <math>T</math>, with <math>\hat{\beta}_{21} = 0</math>
* Regularization constants: <math>\epsilon_1, \epsilon_2</math>
* Clipping threshold: <math>d</math>

'''Algorithm:'''
# For <math>t = 1</math> to <math>T</math>:
## Compute the adaptive step size: <math>\alpha_t = \max(\epsilon_2, \text{RMS}(X_{t-1})) \rho_t</math>
## Compute the gradient: <math>G_t = \nabla f_t(X_{t-1})</math>
## Update the second moment estimate: <math>\hat{V}_t = \hat{\beta}_{2t} \hat{V}_{t-1} + (1 - \hat{\beta}_{2t})(G_t^2 + \epsilon_1 1_n)</math>
## Compute the normalized gradient: <math>U_t = \frac{G_t}{\sqrt{\hat{V}_t}}</math>
## Apply clipping: <math>\hat{U}_t = \frac{U_t}{\max(1, \text{RMS}(U_t) / d)}</math>
## Update the parameter: <math>X_t = X_{t-1} - \alpha_t \hat{U}_t</math>
# End for

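For concreteness, the loop above transcribes almost line by line into NumPy. The sketch below is our illustration, not a reference implementation: the names <code>adafactor_vector</code> and <code>grad_f</code> are our own, and the Section 4 schedules for <math>\rho_t</math> and <math>\hat{\beta}_{2t}</math> are assumed.

<syntaxhighlight lang="python">
import numpy as np

def rms(x):
    return np.sqrt(np.mean(np.square(x)))

def adafactor_vector(x0, grad_f, T, eps1=1e-30, eps2=1e-3, d=1.0):
    # Vector-case Adafactor loop; grad_f(x, t) returns G_t = grad f_t(x).
    x = np.asarray(x0, dtype=float).copy()
    v_hat = np.zeros_like(x)                  # second moment estimate V_hat
    for t in range(1, T + 1):
        rho_t = min(1e-2, 1.0 / np.sqrt(t))   # relative step size
        beta2_t = 1.0 - t ** (-0.8)           # second moment decay
        alpha_t = max(eps2, rms(x)) * rho_t   # adaptive step size
        g = grad_f(x, t)                      # gradient G_t
        v_hat = beta2_t * v_hat + (1.0 - beta2_t) * (g ** 2 + eps1)
        u = g / np.sqrt(v_hat)                # normalized gradient U_t
        u_hat = u / max(1.0, rms(u) / d)      # clipped update U_hat_t
        x = x - alpha_t * u_hat               # parameter update
    return x

# Usage: minimize f(x) = ||x||^2 / 2, whose gradient is simply x.
x_star = adafactor_vector(np.array([1.0, -2.0, 3.0]), lambda x, t: x, T=1000)
</syntaxhighlight>

Note that the schedule gives <math>\hat{\beta}_{21} = 1 - 1^{-0.8} = 0</math>, so the first iteration seeds the second moment entirely from the current squared gradient, matching the input requirement <math>\hat{\beta}_{21} = 0</math>.
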
==== Adafactor for Weighted Matrices ====
'''Inputs:'''
* Initial point: <math>X_0 \in \mathbb{R}^{n \times m}</math>
* Relative step sizes: <math>\rho_t</math> for <math>t = 1</math> to <math>T</math>
* Second moment decay: <math>\hat{\beta}_{2t}</math> for <math>t = 1</math> to <math>T</math>, with <math>\hat{\beta}_{21} = 0</math>
* Regularization constants: <math>\epsilon_1, \epsilon_2</math>
* Clipping threshold: <math>d</math>

'''Algorithm:'''
# For <math>t = 1</math> to <math>T</math>:
## Compute the adaptive step size: <math>\alpha_t = \max(\epsilon_2, \text{RMS}(X_{t-1})) \rho_t</math>
## Compute the gradient: <math>G_t = \nabla f_t(X_{t-1})</math>
## Update the row-wise second moment: <math>R_t = \hat{\beta}_{2t} R_{t-1} + (1 - \hat{\beta}_{2t})(G_t^2 + \epsilon_1 1_n 1_m^T) 1_m</math>
## Update the column-wise second moment: <math>C_t = \hat{\beta}_{2t} C_{t-1} + (1 - \hat{\beta}_{2t}) 1_n^T (G_t^2 + \epsilon_1 1_n 1_m^T)</math>
## Update the overall second moment estimate: <math>\hat{V}_t = \frac{R_t C_t}{1_n^T R_t}</math>
## Compute the normalized gradient: <math>U_t = \frac{G_t}{\sqrt{\hat{V}_t}}</math>
## Apply clipping: <math>\hat{U}_t = \frac{U_t}{\max(1, \text{RMS}(U_t) / d)}</math>
## Update the parameter: <math>X_t = X_{t-1} - \alpha_t \hat{U}_t</math>
# End for

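The matrix case differs only in how the second moment is stored: between iterations it keeps just the length-<math>n</math> row statistic <math>R_t</math> and the length-<math>m</math> column statistic <math>C_t</math>, reconstructing the rank-one estimate <math>\hat{V}_t</math> on the fly, rather than a full <math>n \times m</math> matrix. A minimal NumPy sketch, under the same naming assumptions as the vector sketch:

<syntaxhighlight lang="python">
import numpy as np

def rms(x):
    return np.sqrt(np.mean(np.square(x)))

def adafactor_matrix(x0, grad_f, T, eps1=1e-30, eps2=1e-3, d=1.0):
    # Factored Adafactor for an n x m weight matrix.
    x = np.asarray(x0, dtype=float).copy()
    n, m = x.shape
    r = np.zeros(n)                           # row-wise moment R_t
    c = np.zeros(m)                           # column-wise moment C_t
    for t in range(1, T + 1):
        rho_t = min(1e-2, 1.0 / np.sqrt(t))
        beta2_t = 1.0 - t ** (-0.8)
        alpha_t = max(eps2, rms(x)) * rho_t
        g = grad_f(x, t)
        g2 = g ** 2 + eps1                    # G_t^2 + eps1 * 1_n 1_m^T
        r = beta2_t * r + (1.0 - beta2_t) * g2.sum(axis=1)  # (...) 1_m
        c = beta2_t * c + (1.0 - beta2_t) * g2.sum(axis=0)  # 1_n^T (...)
        v_hat = np.outer(r, c) / r.sum()      # rank-one V_hat = R C / (1^T R)
        u = g / np.sqrt(v_hat)                # normalized gradient U_t
        u_hat = u / max(1.0, rms(u) / d)      # clipped update
        x = x - alpha_t * u_hat
    return x
</syntaxhighlight>

For clarity this sketch materializes <code>v_hat</code> each iteration; only <code>r</code> and <code>c</code> persist across iterations, which is where the memory saving comes from.
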
=== 4. Proposed Hyperparameters for Adafactor ===
* Regularization constant 1: <math>\epsilon_1 = 10^{-30}</math>
* Regularization constant 2: <math>\epsilon_2 = 10^{-3}</math>
* Clipping threshold: <math>d = 1</math>
* Relative step size: <math>\rho_t = \min(10^{-2}, 1/\sqrt{t})</math>
* Second moment decay: <math>\hat{\beta}_{2t} = 1 - t^{-0.8}</math>

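The two schedules are the only time-dependent inputs. Tabulating a few values (our illustration) shows how they behave: <math>\rho_t</math> stays at <math>10^{-2}</math> until <math>t > 10^4</math> and then decays like <math>1/\sqrt{t}</math>, while <math>\hat{\beta}_{2t}</math> rises from 0 toward 1, so the second moment averages over a longer history as training proceeds.

<syntaxhighlight lang="python">
import numpy as np

for t in [1, 10, 100, 10**4, 10**6]:
    rho_t = min(1e-2, 1.0 / np.sqrt(t))   # relative step size
    beta2_t = 1.0 - t ** (-0.8)           # second moment decay
    print(f"t={t:>7}  rho_t={rho_t:.2e}  beta2_t={beta2_t:.5f}")
</syntaxhighlight>
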
== Numerical Examples ==

== Applications ==

== Conclusion ==

== Reference ==