Cornell University Computational Optimization Open Textbook - Optimization Wiki - User contributions [en]

2024 Cornell Optimization Open Textbook Feedback

2024-12-17T22:33:03Z

Wc593: /* Differential evolution */

== [[Computational complexity]] ==
*References were not provided to support applications on computer science and quantum computing.

== [[Heuristic algorithms]] ==
*Adding more details to the Application Section is recommended.
*Please use FigureX as a reference in the text.
*Avoid using contractions (e.g., there's) in scientific writing.

== [[Local branching]] ==
*The title should be “Problem Solution” instead of “Problem Resolution”.
*The added figure is too small for viewing from the main page and also lacks captions and explanations.
*There are still some formatting issues with the subtitles. For example, some are in bold form while others are not in the content.

== [[Trust-region methods]] ==
*Please label the figures with numbers and direct readers to the figures in text.
*References were not well formatted (e.g., Yuan, Y. (2015b)).

== [[Quadratic programming]] ==
*An inappropriate nonconvex example was used in the Wiki Page. The problem is an MIQP problem instead of a single QP problem.
*Some symbols used in the pseudocodes still lack explanation.

== [[Subgradient optimization]] ==
*References were not well formatted.
*Symbols used in equations lack explanations.
*The application section was not well drafted and supported by references.

== [[Dynamic optimization]] ==
*For the algorithm description, it would be better to have a pseudocode or a flow chart to summarize it.
*Please use Latex equation editor for equations.
*Please number and label all figures and tables, use FigureX, TableX as a reference in the text.
*Please place references after the period at the end of each sentence and avoid after the optimization problems. This goes for all the sections in the wiki.

== [[Nondifferentiable Optimization]] ==
*Most contents are from previous year. The Wiki page should be your original content.
*It is also recommended to revise the citation format based on our example files.
*Please add the case of nonconvex functions.
*Please add references to support the content in the Introduction section. In-text citations are required.
*Please use Latex equation editor for typing symbols and equations.
*The Numerical Example section is incomplete.
*Application and Conclusion sections are missing.

== [[Evolutionary multimodal optimization]] ==
*Please include more citations in Algorithm Discussion section to support the contents.
*Please use FigureX as a reference in the text.

== [[Stackelberg leadership model]] ==
*Please use flowchart/pseudocode for representing the steps of algorithm
*Check the consistency of abbreviations (e.g. what is PAWS?)

== [[Quadratic constrained quadratic programming]] ==
*Form of in-text citation is not proper
*Abbreviations should be introduced only once throughout all sections (e.g., QCQP, QP, SDP)
*Avoid using pronouns (e.g., we) in scientific writing.

== [[Derivative free optimization]] ==
*Please try to include a flowchart or pseudocode for illustration of the algorithm.
*If DFO is defined in the previous sections, please use such abbreviations consistently (same for other abbreviations).

== [[Signomial problems]] ==
*There is a lack of numerical examples for illustrating the global optimization method introduced on the Wiki page.
*The clarification of the equations used in the Introduction section needs to be improved.

== [[Adadelta]] ==
*There is an extra line of citation links in the References section.

== [[Adafactor]] ==
*The clarity of the algorithm and numerical example sections needs to be improved. Listing all the equations is convenient, but it is not a good way to present them to others.
*More citations are needed for supporting your statement.

== [[AdamW]] ==
*For the application section, it would be good to emphasize the advantages of AdamW compared to other approaches by citing the quantitative results from previous literature.
*Avoid using pronouns (e.g., we, let's) in scientific writing.

== [[Adamax]] ==
*The pseudocode was not well defined. More explanations and logical flow are needed.
*Citations need to be included in the punctuation like period.
*Since this is a modified version of Adam, a comparison with Adam is needed for the numerical example.
*A machine learning case is needed since this is an algorithm designed for machine learning models.

== [[FTRL algorithm]] ==
*It is very difficult to clarify different levels of subtitles based on the text.
*References were not well formatted. For example, "[1]" is added to many references with no meaning.
*Some symbols used in the equation lack explanations.

== [[LossScaleOptimizer]] ==
*More citations are needed for supporting the statement and applications.

== [[Nadam]] ==
*The result figures of the numerical example have not been placed in the proper place.
*The numerical example and application parts are still not representative of illustrating Nadam's performance for machine learning models.

== [[Bayesian optimization]] ==
*Citation form (in text) should be double checked.

== [[Genetic algorithm]] ==
*Please provide some citations for supporting your statement (e.g. in Introduction)

== [[Simulated annealing]] ==
*Once the abbr. is defined please use it throughout the context (e.g. SA)
*Avoid using pronouns (e.g., we) in scientific writing.

== [[Particle swarm optimization]] ==
*The caption was missed for the iteration results.

== [[Differential evolution]] ==
*Once abbreviations are introduced, please make sure the full term does not appear throughout the context.
*Visualization section is not necessary. Figures should be embedded in the Wiki page.
*Conclusion and References sections should not belong to Application section.
*References are already shown at the bottom of the page. The extra list should be removed.

2024 Cornell Optimization Open Textbook Feedback

2024-12-17T22:29:48Z

Wc593: /* Evolutionary multimodal optimization */

== [[Computational complexity]] ==
*References were not provided to support applications on computer science and quantum computing.

== [[Heuristic algorithms]] ==
*Adding more details to the Application Section is recommended.
*Please use FigureX as a reference in the text.
*Avoid using contractions (e.g., there's) in scientific writing.

== [[Local branching]] ==
*The title should be “Problem Solution” instead of “Problem Resolution”.
*The added figure is too small for viewing from the main page and also lacks captions and explanations.
*There are still some formatting issues with the subtitles. For example, some are in bold form while others are not in the content.

== [[Trust-region methods]] ==
*Please label the figures with numbers and direct readers to the figures in text.
*References were not well formatted (e.g., Yuan, Y. (2015b)).

== [[Quadratic programming]] ==
*An inappropriate nonconvex example was used in the Wiki Page. The problem is an MIQP problem instead of a single QP problem.
*Some symbols used in the pseudocodes still lack explanation.

== [[Subgradient optimization]] ==
*References were not well formatted.
*Symbols used in equations lack explanations.
*The application section was not well drafted and supported by references.

== [[Dynamic optimization]] ==
*For the algorithm description, it would be better to have a pseudocode or a flow chart to summarize it.
*Please use Latex equation editor for equations.
*Please number and label all figures and tables, use FigureX, TableX as a reference in the text.
*Please place references after the period at the end of each sentence and avoid after the optimization problems. This goes for all the sections in the wiki.

== [[Nondifferentiable Optimization]] ==
*Most contents are from previous year. The Wiki page should be your original content.
*It is also recommended to revise the citation format based on our example files.
*Please add the case of nonconvex functions.
*Please add references to support the content in the Introduction section. In-text citations are required.
*Please use Latex equation editor for typing symbols and equations.
*The Numerical Example section is incomplete.
*Application and Conclusion sections are missing.

== [[Evolutionary multimodal optimization]] ==
*Please include more citations in Algorithm Discussion section to support the contents.
*Please use FigureX as a reference in the text.

== [[Stackelberg leadership model]] ==
*Please use flowchart/pseudocode for representing the steps of algorithm
*Check the consistency of abbreviations (e.g. what is PAWS?)

== [[Quadratic constrained quadratic programming]] ==
*Form of in-text citation is not proper
*Abbreviations should be introduced only once throughout all sections (e.g., QCQP, QP, SDP)
*Avoid using pronouns (e.g., we) in scientific writing.

== [[Derivative free optimization]] ==
*Please try to include a flowchart or pseudocode for illustration of the algorithm.
*If DFO is defined in the previous sections, please use such abbreviations consistently (same for other abbreviations).

== [[Signomial problems]] ==
*There is a lack of numerical examples for illustrating the global optimization method introduced on the Wiki page.
*The clarification of the equations used in the Introduction section needs to be improved.

== [[Adadelta]] ==
*There is an extra line of citation links in the References section.

== [[Adafactor]] ==
*The clarity of the algorithm and numerical example sections needs to be improved. Listing all the equations is convenient, but it is not a good way to present them to others.
*More citations are needed for supporting your statement.

== [[AdamW]] ==
*For the application section, it would be good to emphasize the advantages of AdamW compared to other approaches by citing the quantitative results from previous literature.
*Avoid using pronouns (e.g., we, let's) in scientific writing.

== [[Adamax]] ==
*The pseudocode was not well defined. More explanations and logical flow are needed.
*Citations need to be included in the punctuation like period.
*Since this is a modified version of Adam, a comparison with Adam is needed for the numerical example.
*A machine learning case is needed since this is an algorithm designed for machine learning models.

== [[FTRL algorithm]] ==
*It is very difficult to clarify different levels of subtitles based on the text.
*References were not well formatted. For example, "[1]" is added to many references with no meaning.
*Some symbols used in the equation lack explanations.

== [[LossScaleOptimizer]] ==
*More citations are needed for supporting the statement and applications.

== [[Nadam]] ==
*The result figures of the numerical example have not been placed in the proper place.
*The numerical example and application parts are still not representative of illustrating Nadam's performance for machine learning models.

== [[Bayesian optimization]] ==
*Citation form (in text) should be double checked.

== [[Genetic algorithm]] ==
*Please provide some citations for supporting your statement (e.g. in Introduction)

== [[Simulated annealing]] ==
*Once the abbr. is defined please use it throughout the context (e.g. SA)
*Avoid using pronouns (e.g., we) in scientific writing.

== [[Particle swarm optimization]] ==
*The caption was missed for the iteration results.

== [[Differential evolution]] ==
*Make sure abbreviations are consistent throughout the context (e.g. Please define DE at the introduction section not in the later section, please check the similar cases as well).
*Visualization section is not necessary. Figures should be embedded in the Wiki page.
*Conclusion and References sections should not belong to Application section.
*References are already shown at the bottom of the page. The extra list should be removed.

2024 Cornell Optimization Open Textbook Feedback

2024-12-17T20:47:48Z

Wc593: Edited Team 23-26 feedbacks

== [[Computational complexity]] ==
*References were not provided to support applications on computer science and quantum computing.

== [[Heuristic algorithms]] ==
*Adding more details to the Application Section is recommended.
*Please use FigureX as a reference in the text.
*Avoid using contractions (e.g., there's) in scientific writing.

== [[Local branching]] ==
*The title should be “Problem Solution” instead of “Problem Resolution”.
*The added figure is too small for viewing from the main page and also lacks captions and explanations.
*There are still some formatting issues with the subtitles. For example, some are in bold form while others are not in the content.

== [[Trust-region methods]] ==
*Please label the figures with numbers and direct readers to the figures in text.
*References were not well formatted (e.g., Yuan, Y. (2015b)).

== [[Quadratic programming]] ==
*An inappropriate nonconvex example was used in the Wiki Page. The problem is an MIQP problem instead of a single QP problem.
*Some symbols used in the pseudocodes still lack explanation.

== [[Subgradient optimization]] ==
*References were not well formatted.
*Symbols used in equations lack explanations.
*The application section was not well drafted and supported by references.

== [[Dynamic optimization]] ==
*For the algorithm description, it would be better to have a pseudocode or a flow chart to summarize it.
*Please use Latex equation editor for equations.
*Please number and label all figures and tables, use FigureX, TableX as a reference in the text.
*Please place references after the period at the end of each sentence and avoid after the optimization problems. This goes for all the sections in the wiki.

== [[Nondifferentiable Optimization]] ==
*Most contents are from previous year. The Wiki page should be your original content.
*It is also recommended to revise the citation format based on our example files.
*Please add the case of nonconvex functions.
*Please add references to support the content in the Introduction section. In-text citations are required.
*Please use Latex equation editor for typing symbols and equations.
*The Numerical Example section is incomplete.
*Application and Conclusion sections are missing.

== [[Evolutionary multimodal optimization]] ==
*Please include more citations in Algorithm Discussion section to support the contents.
*Adding some figures may help readers understand the content.
*Please use FigureX as a reference in the text.

== [[Stackelberg leadership model]] ==
*Please use flowchart/pseudocode for representing the steps of algorithm
*Check the consistency of abbreviations (e.g. what is PAWS?)

== [[Quadratic constrained quadratic programming]] ==
*Form of in-text citation is not proper
*Abbreviations should be introduced only once throughout all sections (e.g., QCQP, QP, SDP)
*Avoid using pronouns (e.g., we) in scientific writing.

== [[Derivative free optimization]] ==
*Please try to include a flowchart or pseudocode for illustration of the algorithm.
*If DFO is defined in the previous sections, please use such abbreviations consistently (same for other abbreviations).

== [[Signomial problems]] ==
*There is a lack of numerical examples for illustrating the global optimization method introduced on the Wiki page.
*The clarification of the equations used in the Introduction section needs to be improved.

== [[Adadelta]] ==
*There is an extra line of citation links in the References section.

== [[Adafactor]] ==
*The clarity of the algorithm and numerical example sections needs to be improved. Listing all the equations is convenient, but it is not a good way to present them to others.
*More citations are needed for supporting your statement.

== [[AdamW]] ==
*For the application section, it would be good to emphasize the advantages of AdamW compared to other approaches by citing the quantitative results from previous literature.
*Avoid using pronouns (e.g., we, let's) in scientific writing.

== [[Adamax]] ==
*The pseudocode was not well defined. More explanations and logical flow are needed.
*Citations need to be included in the punctuation like period.
*Since this is a modified version of Adam, a comparison with Adam is needed for the numerical example.
*A machine learning case is needed since this is an algorithm designed for machine learning models.

== [[FTRL algorithm]] ==
*It is very difficult to clarify different levels of subtitles based on the text.
*References were not well formatted. For example, "[1]" is added to many references with no meaning.
*Some symbols used in the equation lack explanations.

== [[LossScaleOptimizer]] ==
*More citations are needed for supporting the statement and applications.

== [[Nadam]] ==
*The result figures of the numerical example have not been placed in the proper place.
*The numerical example and application parts are still not representative of illustrating Nadam's performance for machine learning models.

== [[Bayesian optimization]] ==
*Citation form (in text) should be double checked.

== [[Genetic algorithm]] ==
*Please provide some citations for supporting your statement (e.g. in Introduction)

== [[Simulated annealing]] ==
*Once the abbr. is defined please use it throughout the context (e.g. SA)
*Avoid using pronouns (e.g., we) in scientific writing.

== [[Particle swarm optimization]] ==
*The caption was missed for the iteration results.

== [[Differential evolution]] ==
*Make sure abbreviations are consistent throughout the context (e.g. Please define DE at the introduction section not in the later section, please check the similar cases as well).
*Visualization section is not necessary. Figures should be embedded in the Wiki page.
*Conclusion and References sections should not belong to Application section.
*References are already shown at the bottom of the page. The extra list should be removed.

2024 Cornell Optimization Open Textbook Feedback

2024-12-17T20:24:30Z

Wc593: Edited Team19-22 feedback

== [[Computational complexity]] ==
*References were not provided to support applications on computer science and quantum computing.

== [[Heuristic algorithms]] ==
*More details on the Application Section.
*Please come up with one numerical example with hill-climbing.
*Please add more references to support the content in the Introduction section.
*Please add figure captions for the pseudocodes.
*Avoid using contractions (e.g., we're) and pronouns (e.g., we) in scientific writing.
*ResearchGate is not a publisher. Please check the reference again.

== [[Local branching]] ==
*The title should be “Problem Solution” instead of “Problem Resolution”.
*The added figure is too small for viewing from the main page and also lacks captions and explanations.
*There are still some formatting issues with the subtitles. For example, some are in bold form while others are not in the content.

== [[Trust-region methods]] ==
*Please label the figures with numbers and direct readers to the figures in text.
*References were not well formatted (e.g., Yuan, Y. (2015b)).

== [[Quadratic programming]] ==
*An inappropriate nonconvex example was used in the Wiki Page. The problem is an MIQP problem instead of a single QP problem.
*Some symbols used in the pseudocodes still lack explanation.

== [[Subgradient optimization]] ==
*References were not well formatted.
*Symbols used in equations lack explanations.
*The application section was not well drafted and supported by references.

== [[Dynamic optimization]] ==
*For the algorithm description, it would be better to have a pseudocode or a flow chart to summarize it.
*Please use Latex equation editor for equations.
*Please number and label all figures and tables, use FigureX, TableX as a reference in the text.
*Please place references after the period at the end of each sentence and avoid after the optimization problems. This goes for all the sections in the wiki.

== [[Nondifferentiable Optimization]] ==
*Most contents are from previous year. The Wiki page should be your original content.
*It is also recommended to revise the citation format based on our example files.
*Please add the case of nonconvex functions.
*Please add references to support the content in the Introduction section. In-text citations are required.
*Please use Latex equation editor for typing symbols and equations.
*The Numerical Example section is incomplete.
*Application and Conclusion sections are missing.

== [[Evolutionary multimodal optimization]] ==
*It would be better to have subsections of each approach in the Algorithm Discussion section to improve the clarity.
*The logic flow and clarity of both the Algorithm Discussion section and Numerical Example section need to be largely improved.
*Please include citations to support the Wiki contents.
*Please make sure abbreviations are consistent throughout the context.
*In-text citations are required.
*Please add mathematical expressions to Algorithm Discussion section for explicitness.
*Please provide step by step calculation for the numerical example.
*Adding some figures may help readers understand the content.

== [[Stackelberg leadership model]] ==
*Please use flowchart/pseudocode for representing the steps of algorithm
*Check the consistency of abbreviations (e.g. what is PAWS?)

== [[Quadratic constrained quadratic programming]] ==
*Form of in-text citation is not proper
*Abbreviations should be introduced only once throughout all sections (e.g., QCQP, QP, SDP)
*Avoid using pronouns (e.g., we) in scientific writing.

== [[Derivative free optimization]] ==
*Please try to include a flowchart or pseudocode for illustration of the algorithm.
*If DFO is defined in the previous sections, please use such abbreviations consistently (same for other abbreviations).

== [[Signomial problems]] ==
*There is a lack of numerical examples for illustrating the global optimization method introduced on the Wiki page.
*The clarification of the equations used in the Introduction section needs to be improved.

== [[Adadelta]] ==
*There is an extra line of citation links in the References section.

== [[Adafactor]] ==
*The clarity of the algorithm and numerical example sections needs to be improved. Listing all the equations is convenient, but it is not a good way to present them to others.
*More citations are needed for supporting your statement.

== [[AdamW]] ==
*For the application section, it would be good to emphasize the advantages of AdamW compared to other approaches by citing the quantitative results from previous literature.
*Avoid using pronouns (e.g., we, let's) in scientific writing.

== [[Adamax]] ==
*The pseudocode was not well defined. More explanations and logical flow are needed.
*Citations need to be included in the punctuation like period.
*Since this is a modified version of Adam, a comparison with Adam is needed for the numerical example.
*A machine learning case is needed since this is an algorithm designed for machine learning models.

== [[FTRL algorithm]] ==
*It is very difficult to clarify different levels of subtitles based on the text.
*References were not well formatted. For example, "[1]" is added to many references with no meaning.
*Some symbols used in the equation lack explanations.

== [[LossScaleOptimizer]] ==
*More citations are needed for supporting the statement and applications.

== [[Nadam]] ==
*The result figures of the numerical example have not been placed in the proper place.
*The numerical example and application parts are still not representative of illustrating Nadam's performance for machine learning models.

== [[Bayesian optimization]] ==
*Citation form (in text) should be double checked.

== [[Genetic algorithm]] ==
*Please provide some citations for supporting your statement (e.g. in Introduction)

== [[Simulated annealing]] ==
*Once the abbr. is defined please use it throughout the context (e.g. SA)
*Avoid using pronouns (e.g., we) in scientific writing.

== [[Particle swarm optimization]] ==
*The caption was missed for the iteration results.

== [[Differential evolution]] ==
*Provide more references to support the statements made in the Application Section.
*Make sure abbreviations are consistent throughout the context (e.g. Please define DE at the introduction section not in the later section, please check the similar cases as well).
*In-text citations are required.
*The symbols in the pseudocode are not in the proper format. Please use Latex equation editor for this.
*Please number and label all figures, use FigureX as a reference in the text.
*Please add more details to numerical example (e.g., replace x = -7 with -5 because f(x1) > f(u1))

Main Page

2024-12-15T22:35:50Z

Wc593:

{| id="mp-topbanner" style="width:100%; background:#f6f6f6; margin-top:1.2em; border:1px solid #ddd;"
| style="width:61%; color:#000;" |
{| style="width:100%; border:none; background:none;"
| style="text-align:center; white-space:nowrap; color:#000;" |
<div style="font-size:162%; border:none; margin:0; padding:.1em; color:#000;">Welcome to the Cornell University Computational Optimization Open Textbook</div>

This electronic textbook is a student-contributed open-source text covering a variety of topics on process optimization. 
'''If you have any comments or suggestions on this open textbook, please contact [https://www.engineering.cornell.edu/faculty-directory/fengqi-you Professor Fengqi You].'''
|}
|}

{| id="mp-upper" style="width: 100%; margin:6px 0 0 0; background:none; border-spacing: 0px;"
| class="MainPageBG" style="width:50%; border:1px solid #cef2e0; background:#f5fffa; vertical-align:top; color:#000;" |
{| id="mp-left" style="width:100%; vertical-align:top; background:#f5fffa;"
! style="padding:2px;" | <h2 id="mp-tfa-h2" style="margin:3px; background:#cef2e0; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; text-align:left; color:#000; padding:0.2em 0.4em;">Linear Programming (LP)</h2>
|-
| style="color:#000;" | <div id="mp-tfa" style="padding:2px 5px 5px 15px">
<li>[[Duality]]</li>
<li>[[Simplex algorithm]]</li>
<li>[[Computational complexity]]</li>
<li>[[Network flow problem]]</li>
<li>[[Interior-point method for LP]]</li>
<li>[[Optimization with absolute values]]</li>
<li>[[Matrix game (LP for game theory)]]</li>
</div>
|-
! style="padding:2px" | <h2 id="mp-dyk-h2" style="margin:3px; background:#cef2e0; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; text-align:left; color:#000; padding:0.2em 0.4em;">NonLinear Programming (NLP)</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-dyk">
<li>[[Line search methods]]</li>
<li>[[Trust-region methods]]</li>
<li>[[Interior-point method for NLP]]</li>
<li>[[Conjugate gradient methods]]</li>
<li>[[Quasi-Newton methods]]</li>
<li>[[Quadratic programming]]</li>
<li>[[Sequential quadratic programming]]</li>
<li>[[Subgradient optimization]]</li>
<li>[[Mathematical programming with equilibrium constraints]]</li>
<li>[[Dynamic optimization]]</li>
<li>[[Geometric programming]]</li>
<li>[[Nondifferentiable Optimization]]</li>
<li>[[Evolutionary multimodal optimization]]</li>
<li>[[Stackelberg leadership model]]</li>
<li>[[Quadratic constrained quadratic programming]]</li>
<li>[[Derivative free optimization]]</li>
</div>
|-
! style="padding:2px" | <h2 id="mp-dyk-h2" style="margin:3px; background:#cef2e0; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; text-align:left; color:#000; padding:0.2em 0.4em;">Deterministic Global Optimization</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-dyk">
<li>[[Exponential transformation]]</li>
<li>[[Logarithmic transformation]]</li>
<li>[[McCormick envelopes]]</li>
<li>[[Piecewise linear approximation]]</li>
<li>[[Spatial branch and bound method]]</li>
</div>
|-
! style="padding:2px" | <h2 id="mp-dyk-h2" style="margin:3px; background:#cef2e0; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; text-align:left; color:#000; padding:0.2em 0.4em;">Dynamic Programming</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-dyk">
<li>[[Markov decision process]]</li>
<li>[[Bellman equation]]</li>
<li>[[Eight step procedures]]</li>
<li>[[Stochastic dynamic programming]]</li>
</div>
|-
! style="padding:2px" | <h2 id="mp-dyk-h2" style="margin:3px; background:#cef2e0; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; text-align:left; color:#000; padding:0.2em 0.4em;">Traditional Applications</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-dyk">
<li>[[Facility location problem]]</li>
<li>[[Traveling salesman problem]]</li>
<li>[[Set covering problem]]</li>
<li>[[Quadratic assignment problem]]</li>
<li>[[Job shop scheduling]]</li>
<li>[[Newsvendor problem]]</li>
<li>[[Unit commitment problem]]</li>
<li>[[Portfolio optimization]]</li>
<li>[[A-star algorithm]]</li>
</div>

|-
! style="padding:2px" | <h2 id="mp-dyk-h2" style="margin:3px; background:#cef2e0; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; text-align:left; color:#000; padding:0.2em 0.4em;"> Emerging Applications</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-dyk">
<li>[[Wing shape optimization]]</li>
<li>[[Optimization in game theory]]</li>
<li>[[Quantum computing for optimization]]</li>
</div>

|}

| style="border:1px solid transparent;" |
| class="MainPageBG" style="width:50%; border:1px solid #cedff2; background:#f5faff; vertical-align:top;"|
{| id="mp-right" style="width:100%; vertical-align:top; background:#f5faff;"
! style="padding:2px" | <h2 id="mp-otd-h2" style="margin:3px; background:#cedff2; font-size:120%; font-weight:bold; border:1px solid #a3b0bf; text-align:left; color:#000; padding:0.2em 0.4em;">Mixed-Integer Linear Programming (MILP)</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-otd">
<li>[[Mixed-integer cuts]]</li>
<li>[[Disjunctive inequalities]]</li>
<li>[[Lagrangean duality]]</li>
<li>[[Column generation algorithms]]</li>
<li>[[Heuristic algorithms]]</li>
<li>[[Branch and cut]]</li>
<li>[[Local branching]]</li></div>
|-
! style="padding:2px" | <h2 id="mp-otd-h2" style="margin:3px; background:#cedff2; font-size:120%; font-weight:bold; border:1px solid #a3b0bf; text-align:left; color:#000; padding:0.2em 0.4em;">Mixed-Integer NonLinear Programming (MINLP)</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-otd">
<li>[[Signomial problems]]</li>
<li>[[Mixed-integer linear fractional programming (MILFP)]]</li>
<li>[[Convex generalized disjunctive programming (GDP)]]</li>
<li>[[Nonconvex generalized disjunctive programming (GDP)]]</li>
<li>[[Branch and bound (BB) for MINLP]]</li>
<li>[[Branch and cut for MINLP]]</li>
<li>[[Generalized Benders decomposition (GBD)]]</li>
<li>[[Outer-approximation (OA)]]</li>
<li>[[Extended cutting plane (ECP)]]</li>
</div>
|-
! style="padding:2px" | <h2 id="mp-otd-h2" style="margin:3px; background:#cedff2; font-size:120%; font-weight:bold; border:1px solid #a3b0bf; text-align:left; color:#000; padding:0.2em 0.4em;">Optimization under Uncertainty</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-dyk">
<li>[[Stochastic programming]]</li>
<li>[[Chance-constraint method]]</li>
<li>[[Fuzzy programming]]</li>
<li>[[Classical robust optimization]]</li>
<li>[[Adaptive robust optimization]]</li>
<li>[[Data driven robust optimization]]</li>
</div>
|-
! style="padding:2px" | <h2 id="mp-otd-h2" style="margin:3px; background:#cedff2; font-size:120%; font-weight:bold; border:1px solid #a3b0bf; text-align:left; color:#000; padding:0.2em 0.4em;">Optimization for Machine Learning and Data Analytics</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-dyk">
<li>[[Stochastic gradient descent]]</li>
<li>[[Momentum]]</li>
<li>[[AdaGrad]]</li>
<li>[[RMSProp]]</li>
<li>[[Adam]]</li>
<li>[[Frank-Wolfe]]</li>
<li>[[Sparse Reconstruction with Compressed Sensing]]</li>
<li>[[Adadelta]]</li>
<li>[[Adafactor]]</li>
<li>[[AdamW]]</li>
<li>[[Adamax]]</li>
<li>[[FTRL algorithm]]</li>
<li>[[Lion algorithm]]</li>
<li>[[LossScaleOptimizer]]</li>
<li>[[Nadam]]</li>
</div>
|-
! style="padding:2px" | <h2 id="mp-otd-h2" style="margin:3px; background:#cedff2; font-size:120%; font-weight:bold; border:1px solid #a3b0bf; text-align:left; color:#000; padding:0.2em 0.4em;">Black-box Optimization</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-dyk">
<li>[[Bayesian optimization]]</li>
<li>[[Genetic algorithm]]</li>
<li>[[Simulated annealing]]</li>
<li>[[Particle swarm optimization]]</li>
<li>[[Differential evolution]]</li>
</div>
|}
|}

== Sponsor ==
[[File:Peese-logo.jpg|Cornell Prof. Fengqi You Research Group |link=https://www.peese.org]]

</noinclude>__NOTOC____NOEDITSECTION__

Dynamic optimization

2024-12-15T22:30:29Z

Wc593: Undo revision 6844 by SYSEN5800TAs (talk)

This web page is a duplicate of https://optimization.mccormick.northwestern.edu/index.php/Dynamic_optimization

Authors: Hanyu Shi (ChE 345 Spring 2014)

Steward: Dajun Yue, Fengqi You

Date Presented: Apr. 10, 2014


==Introduction==
In this work, we focus on the simultaneous, or direct transcription, approach to the dynamic optimization problem. In particular, we formulate the dynamic optimization model with orthogonal collocation methods. These methods can also be regarded as a special class of implicit Runge–Kutta (IRK) methods. We apply the concepts and properties of IRK methods to the differential equations directly. By locating potential break points appropriately, this approach can model large-scale optimization formulations while maintaining accurate state and control profiles. We mainly follow Biegler's work.

==General Dynamic Optimization Problem==

Differential algebraic equations in process engineering often have the following characteristics: first, the models are large-scale and not easily scaled; second, they are sparse but have no regular structure; third, direct linear solvers are widely used; and last, only a coarse-grained decomposition of the linear algebra is used.

[[File:shy 345 wiki fig 02.png]]

Figure 2. Dynamic optimization approach

Several approaches can be applied to solve dynamic optimization problems; they are shown in Figure 2.

Differential equations are usually used to express conservation laws, such as those for mass, energy, and momentum. Algebraic equations are usually used to express constitutive equations and equilibrium relations, such as physical properties, hydraulics, and rate laws. The algebraic equations usually have a semi-explicit form and are assumed to be index one, i.e., the algebraic variables can be solved uniquely from the algebraic equations.

The dynamic optimization problem has the following general form:

<math>

\begin{array}{l}
\min \;\Phi \left( {z\left( t \right),y\left( t \right),u\left( t \right),p,{t_f}} \right)\\
s.t.\;\;\frac{{dz\left( t \right)}}{{dt}} = f\left( {z\left( t \right),y\left( t \right),u\left( t \right),p} \right)\\
g\left( {z\left( t \right),y\left( t \right),u\left( t \right),p} \right) = 0\\
{z^0} = z\left( 0 \right)\\
{z^l} \le z\left( t \right) \le {z^u}\\
{y^l} \le y\left( t \right) \le {y^u}\\
{u^l} \le u\left( t \right) \le {u^u}\\
{p^l} \le p \le {p^u}
\end{array}

</math>

<math>t</math>, time

<math>z</math>, differential variables

<math>y</math>, algebraic variables

<math>t_f</math>, final time

<math>u</math>, control variables

<math>p</math>, time-independent parameters

(This follows Biegler's slides.)

==Derivation of Collocation Methods==
We first consider the differential algebraic system shown as follows:

<math>

\begin{array}{l}
\frac{{dz}}{{dt}} = f\left( {z\left( t \right),y\left( t \right),u\left( t \right),p} \right),\;z\left( 0 \right) = {z_0}\\
g\left( {z\left( t \right),y\left( t \right),u\left( t \right),p} \right) = 0
\end{array}

</math> (1)

The simultaneous approach requires discretizing the state variables <math> z\left( t \right) </math>, the output variables <math> y\left( t \right) </math>, and the manipulated variables <math> u\left( t \right) </math>. We require the following properties to yield an efficient NLP formulation:

1) An explicit ODE discretization holds little computational advantage, because the nonlinear program requires an iterative solution of the KKT conditions anyway.

2) A single-step approach, which is self-starting and does not rely on smooth profiles that extend over previous time steps, is preferred, because the NLP formulation needs to deal with discontinuities in control profiles.

3) A high-order implicit discretization provides accurate profiles with relatively few finite elements. As a result, the number of finite elements need not be excessively large, particularly for problems with many states and controls.

[[File:shy 345 wiki fig 01.png]]

Figure 1: Polynomial approximation for state profile across a finite element.

==Polynomial Representation for ODE Solutions==

We consider the following ODE:

<math>

\frac{{dz}}{{dt}} = f\left( {z\left( t \right),t} \right),\;z\left( 0 \right) = {z_0}

</math> (2)

To apply the collocation method, we need to solve the differential equation (2) at certain points. For the state variable, we consider a polynomial approximation of order <math>K+1</math> (i.e., degree ≤ <math>K</math>) over a single finite element, as shown in Figure 1. This polynomial, denoted by <math>{z^K}(t)</math>, can be represented in a number of equivalent ways, including the power series representation shown in equation (3), the Newton divided difference approximation, or B-splines.

<math>

{z^K}\left( t \right) = {\alpha _0} + {\alpha _1} \cdot t + {\alpha _2} \cdot {t^2} + \cdots + {\alpha _K} \cdot {t^K}

</math> (3)

We apply representations based on Lagrange interpolation polynomials to generate the NLP formulation, because the polynomial coefficients and the profiles have the same variable bounds. Here we select <math>K+1 </math> interpolation points in element <math>i</math> and represent the state in that element as

<math>

\begin{array}{l}
\left. {\begin{array}{*{20}{c}}
{t = {t_{i - 1}} + {h_i} \cdot \tau ,}\\
{{z^K}\left( t \right) = \sum\limits_{j = 0}^K {{l_j}\left( \tau \right) \cdot {z_{ij}},} }
\end{array}} \right\}t \in \left[ {{t_{i - 1}},{t_i}} \right],\tau \in \left[ {0,1} \right],\\
where\;{l_j}\left( \tau \right) = \prod\limits_{k = 0, \ne j}^K {\frac{{\left( {\tau - {\tau _k}} \right)}}{{\left( {{\tau _j} - {\tau _k}} \right)}}} ,
\end{array}

</math> (4)

<math> {\tau _0} = 0,\;{\tau _j} < {\tau _{j + 1}},\;j = 0,...,K - 1 </math>, and <math>h_i</math> is the length of element <math>i</math>. This polynomial representation has the desirable property that <math>{z^K}({t_{ij}}) = {z_{ij}}</math>, where <math>{t_{ij}} = {t_{i - 1}} + {\tau _j}{h_i}</math>.
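As a minimal sketch of the Lagrange representation in equation (4), the following Python snippet evaluates the basis polynomials and checks the interpolation property at the interpolation points. The function names `lagrange_basis` and `state_polynomial` and the coefficient values in `z_coeffs` are illustrative assumptions; the points in `tau_pts` are the Radau points quoted in the Example section.

```python
def lagrange_basis(tau_pts, j, tau):
    """Evaluate l_j(tau) = prod_{k != j} (tau - tau_k) / (tau_j - tau_k)."""
    value = 1.0
    for k, tau_k in enumerate(tau_pts):
        if k != j:
            value *= (tau - tau_k) / (tau_pts[j] - tau_k)
    return value


def state_polynomial(tau_pts, z_coeffs, tau):
    """Evaluate z^K at normalized time tau from the coefficients z_ij."""
    return sum(lagrange_basis(tau_pts, j, tau) * z_j
               for j, z_j in enumerate(z_coeffs))


tau_pts = [0.0, 0.155051, 0.644949, 1.0]   # K = 3, tau_0 = 0
z_coeffs = [-3.0, -1.5, 0.1, 0.3]          # hypothetical coefficients z_ij

# At each interpolation point the polynomial returns its own coefficient.
for j, tau_j in enumerate(tau_pts):
    print(j, state_polynomial(tau_pts, z_coeffs, tau_j), z_coeffs[j])
```

Because each basis polynomial equals one at its own point and zero at the others, bounds on the state profile at the interpolation points translate directly into bounds on the coefficients.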

We use a Lagrange polynomial with K interpolation points to represent the time derivative of the state. This leads to the Runge–Kutta basis representation for the differential state:

<math>

{z^K}\left( t \right) = {z_{i - 1}} + {h_i} \cdot \sum\limits_{j = 1}^K {{\Omega _j}\left( \tau \right)} \cdot {\dot z_{ij}}

</math> (5)

where <math> {z_{i - 1}} </math> is a coefficient that represents the differential state at the beginning of element <math>i</math>, <math> {\dot z_{ij}} </math> represents the time derivative <math>\frac{{d{z^K}({t_{ij}})}}{{dt}} </math>, and <math> {\Omega _j}(\tau ) </math> is a polynomial of order <math>K</math> satisfying

<math>

{\Omega _j}\left( \tau \right) = \int_0^\tau {{l_j}(\tau ')} d\tau ',t \in \left[ {{t_{i - 1}},{t_i}} \right],\tau \in \left[ {0,1} \right]

</math> (6)
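The integrated basis in equation (6) can be computed numerically. The sketch below is only an illustration: it assumes NumPy's legacy polynomial helpers (`np.poly`, `np.polyint`, `np.polyval`), takes the three Radau points quoted in the Example section as the <math>K</math> interpolation points, and uses the hypothetical helper name `omega_coeffs`.

```python
import numpy as np


def omega_coeffs(tau_pts, j):
    """Coefficients (highest power first) of Omega_j(tau) = int_0^tau l_j(s) ds."""
    others = np.delete(tau_pts, j)
    # l_j as a polynomial: monic polynomial with the other points as roots,
    # scaled so that l_j(tau_j) = 1.
    l_j = np.poly(others) / np.prod(tau_pts[j] - others)
    # polyint uses a zero integration constant, so Omega_j(0) = 0.
    return np.polyint(l_j)


tau_pts = np.array([0.155051, 0.644949, 1.0])   # tau_1, ..., tau_K with K = 3

# Omega_j(1) for each j; these act as quadrature weights over the element.
weights = np.array([np.polyval(omega_coeffs(tau_pts, j), 1.0) for j in range(3)])
print(weights, weights.sum())
```

Because the Lagrange polynomials sum to one, the <math>{\Omega _j}(\tau )</math> sum to <math>\tau</math>, which gives a quick consistency check on the computed coefficients.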

To calculate the polynomial coefficients that approximate the solution of the differential equation, we substitute the polynomial into equation (2) and enforce the resulting algebraic equations at the interpolation points <math>{\tau _k}</math>. This results in the following collocation equations:

<math>

\frac{{d{z^K}}}{{dt}}\left( {{t_{ik}}} \right) = f\left( {{z^K}\left( {{t_{ik}}} \right),{t_{ik}}} \right),\;k = 1,...,K

</math> (7)

with <math> {z^K}({t_{i - 1}}) </math> determined separately. For the polynomial representations (4) and (5), we normalize time over the element, write the state profile as a function of <math>\tau</math>, and apply <math> \frac{{d{z^K}}}{{d\tau }} = {h_i}\frac{{d{z^K}}}{{dt}} </math>. For the Lagrange polynomial (4), the collocation equations become

<math>

\sum\limits_{j = 0}^K {{z_{ij}} \cdot \frac{{d{l_j}\left( {{\tau _k}} \right)}}{{d\tau }}} = {h_i} \cdot f\left( {{z_{ik}},{t_{ik}}} \right),\;k = 1,...,K

</math> (8)

while the collocation equations for the Runge–Kutta basis are given by

<math>

{\dot z_{ik}} = f\left( {{z_{ik}},{t_{ik}}} \right)

</math> (9)

<math>

{z_{ik}} = {z_{i - 1}} + {h_i} \cdot \sum\limits_{j = 1}^K {{\Omega _j}\left( {{\tau _k}} \right)} \cdot {\dot z_{ij}},\;k = 1,...,K

</math> (10)

with <math> {z_{i - 1}} </math> determined from the previous element <math> i-1 </math> or from the initial condition on the ODE.

==Example==

An example is given here to demonstrate the application of the collocation method.

A differential equation is given as follows:

<math>

\frac{{dz}}{{dt}} = {z^2} - 2 \cdot z + 1,\;z\left( 0 \right) = - 3

</math> (11)

With <math>t \in \left[ {0,1} \right]</math>, the analytic solution of this differential equation is <math>z\left( t \right) = \frac{{4 \cdot t - 3}}{{4 \cdot t + 1}}</math>.
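The analytic solution can be checked by separation of variables. A short derivation, using only equation (11) and the identity <math>{z^2} - 2 \cdot z + 1 = {\left( {z - 1} \right)^2}</math>, is:

<math>

\begin{array}{l}
\frac{{dz}}{{{{\left( {z - 1} \right)}^2}}} = dt\;\; \Rightarrow \;\; - \frac{1}{{z - 1}} = t + C\\
z\left( 0 \right) = - 3\;\; \Rightarrow \;\;C = \frac{1}{4}\\
z\left( t \right) = 1 - \frac{4}{{4 \cdot t + 1}} = \frac{{4 \cdot t - 3}}{{4 \cdot t + 1}}
\end{array}

</math>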

Lagrange interpolation and the collocation method are applied to this differential equation. The number of collocation points in each finite element is 3, the number of finite elements is <math>N</math>, and the length of each finite element is <math>h = 1/N</math>. The following equations are then obtained:

<math>

\sum\limits_{j = 0}^3 {{z_{ij}}\frac{{d{l_j}\left( {{\tau _k}} \right)}}{{d\tau }}} = h\left( {z_{ik}^2 - 2 \cdot {z_{ik}} + 1} \right),\;k = 1,...,3,\;i = 1,...,N

</math> (12)

<math>

{z_{i + 1,0}} = \sum\limits_{j = 0}^K {{l_j}\left( 1 \right)} \cdot {z_{ij}},\;i = 1,...,N - 1

</math> (13)

<math>

{z_f} = \sum\limits_{j = 0}^K {{l_j}\left( 1 \right)} \cdot {z_{Nj}},\;{z_{1,0}} = -3

</math> (14)

With the Radau collocation method, <math> {\tau _0} = 0 </math>, <math> {\tau _1} = {\rm{0.155051}}</math>, <math> {\tau _2} = {\rm{0.644949}}</math> and <math> {\tau _3} = 1 </math> are obtained. For a single finite element (<math>N = 1</math>, <math>h = 1</math>), the collocation equations are given as follows:

<math>

\sum\limits_{j = 0}^3 {{z_j}\frac{{d{l_j}\left( {{\tau _k}} \right)}}{{d\tau }}} = \left( {z_k^2 - 2 \cdot {z_k} + 1} \right),\;k = 1,...,3

</math> (15)

which can be formulated as:

<math>

\begin{array}{l}
{z_0} \cdot \left( { - 30 \cdot \tau _k^2 + 36 \cdot {\tau _k} - 9} \right) + {z_1} \cdot \left( {{\rm{46.7423}} \cdot \tau _k^2 - {\rm{51.2592}} \cdot {\tau _k} + {\rm{10.0488}}} \right)\\
+ {z_2} \cdot \left( { - {\rm{26.7423}} \cdot \tau _k^2 + {\rm{20.5925}} \cdot {\tau _k} - {\rm{ 1.38214}}} \right) + {z_3} \cdot \left( {10 \cdot \tau _k^2 - \frac{{16}}{3} \cdot {\tau _k} + \frac{1}{3}} \right)\\
= \left( {z_k^2 - 2 \cdot {z_k} + 1} \right),\;k = 1,...,3
\end{array}

</math> (16)

[[File:shy 345 wiki fig 03.png]]

Figure 3. Comparison of Radau collocation solution with exact solution

Solving the above equations gives the following results:

<math>

\left\{ {\begin{array}{*{20}{c}}
{{z_0} = - 3}\\
{{z_1} = - {\rm{1.65701}}}\\
{{z_2} = {\rm{0.032053}}}\\
{{z_3} = {\rm{0.207272}}}
\end{array}} \right.

</math> (17)

As shown in Figure 3, the error <math>\left\| {z\left( 1 \right) - {z^K}\left( 1 \right)} \right\|</math> is less than <math>10^{-6}</math> for <math>N=5</math> and converges with <math>O(h^5)</math>, which is consistent with the expected order <math>2K-1</math>.

(This example follows the work of Biegler and can be found on p. 293 of "Nonlinear Programming".)
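The single-element Radau system of this example can be reproduced numerically. The sketch below is one possible way to do so and is not taken from the source: it assumes NumPy, builds the derivatives of the Lagrange polynomials with `np.poly` and `np.polyder`, and solves the three nonlinear collocation equations with a plain Newton iteration started from the initial condition. The function names `derivative_matrix` and `solve_element` are hypothetical.

```python
import numpy as np

# Radau collocation points for K = 3 as quoted in the example (tau_0 = 0).
tau = np.array([0.0, 0.155051, 0.644949, 1.0])


def derivative_matrix(tau):
    """D[k, j] = d l_j / d tau evaluated at tau_k for the Lagrange basis on tau."""
    n = len(tau)
    D = np.zeros((n, n))
    for j in range(n):
        others = np.delete(tau, j)
        l_j = np.poly(others) / np.prod(tau[j] - others)   # l_j as a polynomial
        D[:, j] = np.polyval(np.polyder(l_j), tau)
    return D


def f(z):
    """Right-hand side of the example ODE, dz/dt = z^2 - 2 z + 1."""
    return z**2 - 2.0 * z + 1.0


def solve_element(z0, h, D, max_iter=50, tol=1e-12):
    """Solve sum_j z_j * dl_j(tau_k)/dtau = h * f(z_k), k = 1..K, by Newton's method."""
    z = np.full(D.shape[0] - 1, z0)          # initial guess: constant profile
    for _ in range(max_iter):
        residual = D[1:, 0] * z0 + D[1:, 1:] @ z - h * f(z)
        jacobian = D[1:, 1:] - h * np.diag(2.0 * z - 2.0)
        step = np.linalg.solve(jacobian, -residual)
        z = z + step
        if np.max(np.abs(step)) < tol:
            break
    return z


D = derivative_matrix(tau)
z = solve_element(z0=-3.0, h=1.0, D=D)       # one element, N = 1
exact = (4.0 * tau[1:] - 3.0) / (4.0 * tau[1:] + 1.0)
print("collocation:", z)
print("exact:      ", exact)
```

Since <math>{\tau _3} = 1</math>, the last entry of `z` is also the value at the end of the element, so it can be compared directly with the analytic value <math>z(1) = 0.2</math>.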

==Conclusion==

In this work, we mainly discussed the simultaneous collocation approach for dynamic optimization problems, which transforms the differential equations into a set of algebraic equations. These direct transcription formulations depend on fully discretizing the differential algebraic equations (DAEs), which enables us to solve the optimization problem simultaneously without relying on embedded DAE solvers. Because of this simultaneous formulation, exact first- and second-order derivatives are available through the optimization modeling system, and both structure and sparsity can be exploited.

==References==

1. Biegler, Lorenz T. Nonlinear programming: concepts, algorithms, and applications to chemical processes. Vol. 10. SIAM, 2010.

2. Chu, Yunfei, and Fengqi You. "Integration of scheduling and control with online closed-loop implementation: Fast computational strategy and large-scale global optimization algorithm." Computers & Chemical Engineering 47 (2012): 248-268.

3. http://en.wikipedia.org/wiki/Dynamic_programming

4. http://en.wikipedia.org/wiki/Differential_algebraic_equation

5. http://numero.cheme.cmu.edu/uploads/dynopt.pdf

File:Tsp-graph.png

2024-12-11T19:35:50Z

Wc593:

Tsp-graph

Main Page

2024-12-05T23:35:27Z

Wc593: Moved sections to balance column length

{| id="mp-topbanner" style="width:100%; background:#f6f6f6; margin-top:1.2em; border:1px solid #ddd;"
| style="width:61%; color:#000;" |
{| style="width:100%; border:none; background:none;"
| style="text-align:center; white-space:nowrap; color:#000;" |
<div style="font-size:162%; border:none; margin:0; padding:.1em; color:#000;">Welcome to the Cornell University Computational Optimization Open Textbook</div>

This electronic textbook is a student-contributed open-source text covering a variety of topics on process optimization. 
'''If you have any comments or suggestions on this open textbook, please contact [https://www.engineering.cornell.edu/faculty-directory/fengqi-you Professor Fengqi You].'''
|}
|}

{| id="mp-upper" style="width: 100%; margin:6px 0 0 0; background:none; border-spacing: 0px;"
| class="MainPageBG" style="width:50%; border:1px solid #cef2e0; background:#f5fffa; vertical-align:top; color:#000;" |
{| id="mp-left" style="width:100%; vertical-align:top; background:#f5fffa;"
! style="padding:2px;" | <h2 id="mp-tfa-h2" style="margin:3px; background:#cef2e0; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; text-align:left; color:#000; padding:0.2em 0.4em;">Linear Programming (LP)</h2>
|-
| style="color:#000;" | <div id="mp-tfa" style="padding:2px 5px 5px 15px">
<li>[[Duality]]</li>
<li>[[Simplex algorithm]]</li>
<li>[[Computational complexity]]</li>
<li>[[Network flow problem]]</li>
<li>[[Interior-point method for LP]]</li>
<li>[[Optimization with absolute values]]</li>
<li>[[Matrix game (LP for game theory)]]</li>
</div>
|-
! style="padding:2px" | <h2 id="mp-dyk-h2" style="margin:3px; background:#cef2e0; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; text-align:left; color:#000; padding:0.2em 0.4em;">NonLinear Programming (NLP)</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-dyk">
<li>[[Line search methods]]</li>
<li>[[Trust-region methods]]</li>
<li>[[Interior-point method for NLP]]</li>
<li>[[Conjugate gradient methods]]</li>
<li>[[Quasi-Newton methods]]</li>
<li>[[Quadratic programming]]</li>
<li>[[Sequential quadratic programming]]</li>
<li>[[Subgradient optimization]]</li>
<li>[[Mathematical programming with equilibrium constraints]]</li>
<li>[[Dynamic optimization]]</li>
<li>[[Geometric programming]]</li>
<li>[[Nondifferentiable Optimization]]</li>
<li>[[Evolutionary multimodal optimization]]</li>
<li>[[Stackelberg leadership model]]</li>
<li>[[Quadratic constrained quadratic programming]]</li>
<li>[[Derivative free optimization]]</li>
</div>
|-
! style="padding:2px" | <h2 id="mp-dyk-h2" style="margin:3px; background:#cef2e0; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; text-align:left; color:#000; padding:0.2em 0.4em;">Deterministic Global Optimization</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-dyk">
<li>[[Exponential transformation]]</li>
<li>[[Logarithmic transformation]]</li>
<li>[[McCormick envelopes]]</li>
<li>[[Piecewise linear approximation]]</li>
<li>[[Spatial branch and bound method]]</li>
</div>
|-
! style="padding:2px" | <h2 id="mp-dyk-h2" style="margin:3px; background:#cef2e0; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; text-align:left; color:#000; padding:0.2em 0.4em;">Dynamic Programming</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-dyk">
<li>[[Markov decision process]]</li>
<li>[[Bellman equation]]</li>
<li>[[Eight step procedures]]</li>
<li>[[Stochastic dynamic programming]]</li>
</div>
|-
! style="padding:2px" | <h2 id="mp-dyk-h2" style="margin:3px; background:#cef2e0; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; text-align:left; color:#000; padding:0.2em 0.4em;">Traditional Applications</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-dyk">
<li>[[Facility location problem]]</li>
<li>[[Traveling salesman problem]]</li>
<li>[[Set covering problem]]</li>
<li>[[Quadratic assignment problem]]</li>
<li>[[Job shop scheduling]]</li>
<li>[[Newsvendor problem]]</li>
<li>[[Unit commitment problem]]</li>
<li>[[Portfolio optimization]]</li>
</div>

|-
! style="padding:2px" | <h2 id="mp-dyk-h2" style="margin:3px; background:#cef2e0; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; text-align:left; color:#000; padding:0.2em 0.4em;"> Emerging Applications</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-dyk">
<li>[[Wing shape optimization]]</li>
<li>[[Optimization in game theory]]</li>
<li>[[Quantum computing for optimization]]</li>
</div>

|}

| style="border:1px solid transparent;" |
| class="MainPageBG" style="width:50%; border:1px solid #cedff2; background:#f5faff; vertical-align:top;"|
{| id="mp-right" style="width:100%; vertical-align:top; background:#f5faff;"
! style="padding:2px" | <h2 id="mp-otd-h2" style="margin:3px; background:#cedff2; font-size:120%; font-weight:bold; border:1px solid #a3b0bf; text-align:left; color:#000; padding:0.2em 0.4em;">Mixed-Integer Linear Programming (MILP)</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-otd">
<li>[[Mixed-integer cuts]]</li>
<li>[[Disjunctive inequalities]]</li>
<li>[[Lagrangean duality]]</li>
<li>[[Column generation algorithms]]</li>
<li>[[Heuristic algorithms]]</li>
<li>[[Branch and cut]]</li>
<li>[[Local branching]]</li></div>
|-
! style="padding:2px" | <h2 id="mp-otd-h2" style="margin:3px; background:#cedff2; font-size:120%; font-weight:bold; border:1px solid #a3b0bf; text-align:left; color:#000; padding:0.2em 0.4em;">Mixed-Integer NonLinear Programming (MINLP)</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-otd">
<li>[[Signomial problems]]</li>
<li>[[Mixed-integer linear fractional programming (MILFP)]]</li>
<li>[[Convex generalized disjunctive programming (GDP)]]</li>
<li>[[Nonconvex generalized disjunctive programming (GDP)]]</li>
<li>[[Branch and bound (BB) for MINLP]]</li>
<li>[[Branch and cut for MINLP]]</li>
<li>[[Generalized Benders decomposition (GBD)]]</li>
<li>[[Outer-approximation (OA)]]</li>
<li>[[Extended cutting plane (ECP)]]</li>
</div>
|-
! style="padding:2px" | <h2 id="mp-otd-h2" style="margin:3px; background:#cedff2; font-size:120%; font-weight:bold; border:1px solid #a3b0bf; text-align:left; color:#000; padding:0.2em 0.4em;">Optimization under Uncertainty</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-dyk">
<li>[[Stochastic programming]]</li>
<li>[[Chance-constraint method]]</li>
<li>[[Fuzzy programming]]</li>
<li>[[Classical robust optimization]]</li>
<li>[[Adaptive robust optimization]]</li>
<li>[[Data driven robust optimization]]</li>
</div>
|-
! style="padding:2px" | <h2 id="mp-otd-h2" style="margin:3px; background:#cedff2; font-size:120%; font-weight:bold; border:1px solid #a3b0bf; text-align:left; color:#000; padding:0.2em 0.4em;">Optimization for Machine Learning and Data Analytics</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-dyk">
<li>[[Stochastic gradient descent]]</li>
<li>[[Momentum]]</li>
<li>[[AdaGrad]]</li>
<li>[[RMSProp]]</li>
<li>[[Adam]]</li>
<li>[[Frank-Wolfe]]</li>
<li>[[Sparse Reconstruction with Compressed Sensing]]</li>
<li>[[Adadelta]]</li>
<li>[[Adafactor]]</li>
<li>[[AdamW]]</li>
<li>[[Adamax]]</li>
<li>[[FTRL algorithm]]</li>
<li>[[Lion algorithm]]</li>
<li>[[LossScaleOptimizer]]</li>
<li>[[Nadam]]</li>
</div>
|-
! style="padding:2px" | <h2 id="mp-otd-h2" style="margin:3px; background:#cedff2; font-size:120%; font-weight:bold; border:1px solid #a3b0bf; text-align:left; color:#000; padding:0.2em 0.4em;">Black-box Optimization</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-dyk">
<li>[[Bayesian optimization]]</li>
<li>[[Genetic algorithm]]</li>
<li>[[Simulated annealing]]</li>
<li>[[Particle swarm optimization]]</li>
<li>[[Differential evolution]]</li>
</div>
|}
|}

== Sponsor ==
[[File:Peese-logo.jpg|Cornell Prof. Fengqi You Research Group |link=https://www.peese.org]]

</noinclude>__NOTOC____NOEDITSECTION__

Main Page

2024-12-05T20:24:28Z

Wc593: Added Fall 2024 new topics

{| id="mp-topbanner" style="width:100%; background:#f6f6f6; margin-top:1.2em; border:1px solid #ddd;"
| style="width:61%; color:#000;" |
{| style="width:100%; border:none; background:none;"
| style="text-align:center; white-space:nowrap; color:#000;" |
<div style="font-size:162%; border:none; margin:0; padding:.1em; color:#000;">Welcome to the Cornell University Computational Optimization Open Textbook</div>

This electronic textbook is a student-contributed open-source text covering a variety of topics on process optimization. 
'''If you have any comments or suggestions on this open textbook, please contact [https://www.engineering.cornell.edu/faculty-directory/fengqi-you Professor Fengqi You].'''
|}
|}

{| id="mp-upper" style="width: 100%; margin:6px 0 0 0; background:none; border-spacing: 0px;"
| class="MainPageBG" style="width:50%; border:1px solid #cef2e0; background:#f5fffa; vertical-align:top; color:#000;" |
{| id="mp-left" style="width:100%; vertical-align:top; background:#f5fffa;"
! style="padding:2px;" | <h2 id="mp-tfa-h2" style="margin:3px; background:#cef2e0; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; text-align:left; color:#000; padding:0.2em 0.4em;">Linear Programming (LP)</h2>
|-
| style="color:#000;" | <div id="mp-tfa" style="padding:2px 5px 5px 15px">
<li>[[Duality]]</li>
<li>[[Simplex algorithm]]</li>
<li>[[Computational complexity]]</li>
<li>[[Network flow problem]]</li>
<li>[[Interior-point method for LP]]</li>
<li>[[Optimization with absolute values]]</li>
<li>[[Matrix game (LP for game theory)]]</li>
</div>
|-
! style="padding:2px" | <h2 id="mp-dyk-h2" style="margin:3px; background:#cef2e0; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; text-align:left; color:#000; padding:0.2em 0.4em;">NonLinear Programming (NLP)</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-dyk">
<li>[[Line search methods]]</li>
<li>[[Trust-region methods]]</li>
<li>[[Interior-point method for NLP]]</li>
<li>[[Conjugate gradient methods]]</li>
<li>[[Quasi-Newton methods]]</li>
<li>[[Quadratic programming]]</li>
<li>[[Sequential quadratic programming]]</li>
<li>[[Subgradient optimization]]</li>
<li>[[Mathematical programming with equilibrium constraints]]</li>
<li>[[Dynamic optimization]]</li>
<li>[[Geometric programming]]</li>
<li>[[Nondifferentiable Optimization]]</li>
<li>[[Evolutionary multimodal optimization]]</li>
<li>[[Stackelberg leadership model]]</li>
<li>[[Quadratic constrained quadratic programming]]</li>
<li>[[Derivative free optimization]]</li>
</div>
|-
! style="padding:2px" | <h2 id="mp-dyk-h2" style="margin:3px; background:#cef2e0; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; text-align:left; color:#000; padding:0.2em 0.4em;">Deterministic Global Optimization</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-dyk">
<li>[[Exponential transformation]]</li>
<li>[[Logarithmic transformation]]</li>
<li>[[McCormick envelopes]]</li>
<li>[[Piecewise linear approximation]]</li>
<li>[[Spatial branch and bound method]]</li>
</div>
|-
! style="padding:2px" | <h2 id="mp-dyk-h2" style="margin:3px; background:#cef2e0; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; text-align:left; color:#000; padding:0.2em 0.4em;">Dynamic Programming</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-dyk">
<li>[[Markov decision process]]</li>
<li>[[Bellman equation]]</li>
<li>[[Eight step procedures]]</li>
<li>[[Stochastic dynamic programming]]</li>
</div>
|-
! style="padding:2px" | <h2 id="mp-dyk-h2" style="margin:3px; background:#cef2e0; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; text-align:left; color:#000; padding:0.2em 0.4em;">Traditional Applications</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-dyk">
<li>[[Facility location problem]]</li>
<li>[[Traveling salesman problem]]</li>
<li>[[Set covering problem]]</li>
<li>[[Quadratic assignment problem]]</li>
<li>[[Job shop scheduling]]</li>
<li>[[Newsvendor problem]]</li>
<li>[[Unit commitment problem]]</li>
<li>[[Portfolio optimization]]</li>
</div>

|}

| style="border:1px solid transparent;" |
| class="MainPageBG" style="width:50%; border:1px solid #cedff2; background:#f5faff; vertical-align:top;"|
{| id="mp-right" style="width:100%; vertical-align:top; background:#f5faff;"
! style="padding:2px" | <h2 id="mp-otd-h2" style="margin:3px; background:#cedff2; font-size:120%; font-weight:bold; border:1px solid #a3b0bf; text-align:left; color:#000; padding:0.2em 0.4em;">Mixed-Integer Linear Programming (MILP)</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-otd">
<li>[[Mixed-integer cuts]]</li>
<li>[[Disjunctive inequalities]]</li>
<li>[[Lagrangean duality]]</li>
<li>[[Column generation algorithms]]</li>
<li>[[Heuristic algorithms]]</li>
<li>[[Branch and cut]]</li>
<li>[[Local branching]]</li></div>
|-
! style="padding:2px" | <h2 id="mp-otd-h2" style="margin:3px; background:#cedff2; font-size:120%; font-weight:bold; border:1px solid #a3b0bf; text-align:left; color:#000; padding:0.2em 0.4em;">Mixed-Integer NonLinear Programming (MINLP)</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-otd">
<li>[[Signomial problems]]</li>
<li>[[Mixed-integer linear fractional programming (MILFP)]]</li>
<li>[[Convex generalized disjunctive programming (GDP)]]</li>
<li>[[Nonconvex generalized disjunctive programming (GDP)]]</li>
<li>[[Branch and bound (BB) for MINLP]]</li>
<li>[[Branch and cut for MINLP]]</li>
<li>[[Generalized Benders decomposition (GBD)]]</li>
<li>[[Outer-approximation (OA)]]</li>
<li>[[Extended cutting plane (ECP)]]</li>
</div>
|-
! style="padding:2px" | <h2 id="mp-otd-h2" style="margin:3px; background:#cedff2; font-size:120%; font-weight:bold; border:1px solid #a3b0bf; text-align:left; color:#000; padding:0.2em 0.4em;">Optimization under Uncertainty</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-dyk">
<li>[[Stochastic programming]]</li>
<li>[[Chance-constraint method]]</li>
<li>[[Fuzzy programming]]</li>
<li>[[Classical robust optimization]]</li>
<li>[[Adaptive robust optimization]]</li>
<li>[[Data driven robust optimization]]</li>
</div>
|-
! style="padding:2px" | <h2 id="mp-otd-h2" style="margin:3px; background:#cedff2; font-size:120%; font-weight:bold; border:1px solid #a3b0bf; text-align:left; color:#000; padding:0.2em 0.4em;">Optimization for Machine Learning and Data Analytics</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-dyk">
<li>[[Stochastic gradient descent]]</li>
<li>[[Momentum]]</li>
<li>[[AdaGrad]]</li>
<li>[[RMSProp]]</li>
<li>[[Adam]]</li>
<li>[[Frank-Wolfe]]</li>
<li>[[Sparse Reconstruction with Compressed Sensing]]</li>
<li>[[Adadelta]]</li>
<li>[[Adafactor]]</li>
<li>[[AdamW]]</li>
<li>[[Adamax]]</li>
<li>[[FTRL algorithm]]</li>
<li>[[Lion algorithm]]</li>
<li>[[LossScaleOptimizer]]</li>
<li>[[Nadam]]</li>
</div>
|-
! style="padding:2px" | <h2 id="mp-otd-h2" style="margin:3px; background:#cedff2; font-size:120%; font-weight:bold; border:1px solid #a3b0bf; text-align:left; color:#000; padding:0.2em 0.4em;">Black-box Optimization</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-dyk">
<li>[[Bayesian optimization]]</li>
<li>[[Genetic algorithm]]</li>
<li>[[Simulated annealing]]</li>
<li>[[Particle swarm optimization]]</li>
<li>[[Differential evolution]]</li>
</div>
|-
! style="padding:2px" | <h2 id="mp-otd-h2" style="margin:3px; background:#cedff2; font-size:120%; font-weight:bold; border:1px solid #a3b0bf; text-align:left; color:#000; padding:0.2em 0.4em;">Emerging Applications</h2>
|-
| style="color:#000;padding:2px 5px 5px 15px" | <div id="mp-dyk">
<li>[[Wing shape optimization]]</li>
<li>[[Optimization in game theory]]</li>
<li>[[Quantum computing for optimization]]</li>
</div>
|}
|}

== Sponsor ==
[[File:Peese-logo.jpg|Cornell Prof. Fengqi You Research Group |link=https://www.peese.org]]

</noinclude>__NOTOC____NOEDITSECTION__

Adam

2020-12-21T11:43:09Z

Wc593:

Author: Nicholas Kincaid (ChemE 6800 Fall 2020)

== Introduction ==
Adam <ref name="adam"> Kingma, Diederik P., and Jimmy Lei Ba. Adam: A Method for Stochastic Optimization. 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, 2015, pp. 1–15.</ref> is a variant of gradient descent that has become widely popular in the machine learning community. Presented in 2015, the Adam algorithm is often recommended as the default algorithm for training neural networks, as it has shown improved performance over other variants of gradient descent algorithms for a wide range of problems. Adam's name is derived from adaptive moment estimation because it uses estimates of the first and second moments of the gradient to perform updates, which can be seen as combining gradient descent with momentum (the first-order moment) and the [https://optimization.cbe.cornell.edu/index.php?title=RMSProp RMSProp] algorithm<ref>Tieleman, Tijmen, and Hinton, Geoffrey. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude, COURSERA: Neural Networks for Machine Learning, 2012.</ref> (the second-order moment).

== Background ==
=== Batch Gradient Descent ===
In standard batch gradient descent, the parameters, <math>\theta</math>, of the objective function <math>f(\theta)</math>, are updated based on the gradient of <math>f</math> with respect to
<math>\theta</math> for the entire training dataset, as

<math> g_t =\nabla_{\theta_{t-1}} f \big(\theta_{t-1} \big) </math> 
<math> \theta_t = \theta_{t-1} - \alpha g_t , </math> 

where <math>\alpha</math> is the learning rate, a hyper-parameter of the optimization algorithm, and <math>t</math> is the iteration number. Key challenges of the standard gradient descent method are its tendency to get stuck in local minima and/or saddle points of the objective function, and the difficulty of choosing a proper learning rate, <math>\alpha</math>, since a poor choice can lead to poor convergence.<ref>Ruder, Sebastian. An Overview of Gradient Descent Optimization Algorithms, 2016, pp. 1–14, http://arxiv.org/abs/1609.04747.</ref>
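The update rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `gradient_descent`, the quadratic test objective, and the values of `alpha` and `n_iter` are hypothetical choices and do not come from the source.

```python
import numpy as np


def gradient_descent(grad, theta0, alpha=0.1, n_iter=100):
    """Repeat theta_t = theta_{t-1} - alpha * g_t for a fixed number of iterations."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        g = grad(theta)               # gradient at the previous iterate
        theta = theta - alpha * g
    return theta


# Hypothetical objective f(theta) = 0.5 * ||theta - target||^2, whose gradient
# is theta - target and whose minimizer is target.
target = np.array([1.0, -2.0])
theta = gradient_descent(lambda th: th - target, theta0=[0.0, 0.0])
print(theta)
```

For this convex quadratic the iterates approach `target` geometrically; on non-convex objectives the same loop can stall at the local minima or saddle points mentioned above.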

=== Stochastic Gradient Descent ===
Another variant of gradient descent is [https://optimization.cbe.cornell.edu/index.php?title=Stochastic_gradient_descent stochastic gradient descent (SGD)], in which the gradient is computed and the parameters are updated as in the batch gradient descent update above, but for each training sample in the training set.
=== Mini-Batch Gradient Descent ===
In between batch gradient descent and stochastic gradient descent, mini-batch gradient descent computes parameter updates from the gradient evaluated on a subset of the training set, where the size of the subset is often referred to as the batch size.
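The three variants differ only in how many samples enter each gradient evaluation. The following sketch, written for a linear least-squares model, makes this explicit: `batch_size=1` corresponds to stochastic gradient descent and `batch_size=len(y)` to batch gradient descent. The function name, the synthetic noise-free data, and the step size are assumptions made for illustration.

```python
import numpy as np


def minibatch_gradient_descent(X, y, batch_size, alpha=0.5, n_epochs=500, seed=0):
    """Least-squares fit of y ~ X @ theta using mini-batch gradient steps."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        order = rng.permutation(n_samples)          # reshuffle every epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ theta - y[idx]
            g = X[idx].T @ residual / len(idx)      # gradient on this subset only
            theta = theta - alpha * g
    return theta


# Hypothetical noise-free data generated from y = 2 + 3 x.
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x

theta = minibatch_gradient_descent(X, y, batch_size=5)
print(theta)
```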

== Adam Algorithm ==
The Adam algorithm first computes the gradient, <math>g_t</math>, of the objective function with respect to the parameters <math>\theta</math>, and then computes and stores estimates of the first and second moments of the gradient, <math>m_t</math> and <math>v_t</math>,
respectively, as

<math> m_t = \beta_1 \cdot m_{t-1} + (1-\beta_1) \cdot g_t </math> 
<math> v_t = \beta_2 \cdot v_{t-1} + (1-\beta_2) \cdot g_t^2, </math> 

where <math>\beta_1</math> and <math>\beta_2</math> are hyper-parameters in <math>[0,1)</math>. These parameters can be seen as exponential decay rates of the estimated moments, as the previous value is successively multiplied by a value less than 1 in each iteration. The authors of the original paper suggest values <math>\beta_1 = 0.9</math> and <math>\beta_2 = 0.999</math>. In the current notation, the first iteration of the algorithm is at <math>t=1</math>, and both <math>m_0</math> and <math>v_0</math> are initialized to zero. Because both moments are initialized to zero, their estimates are biased towards zero at early time steps. To counter this, the authors proposed a corrected update to <math>m_t</math> and <math>v_t</math> as

<math> \hat{m}_t = m_t / (1-\beta_1 ^t) </math> 
<math> \hat{v}_t = v_t / (1-\beta_2 ^t). </math> 
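
As a short check of why this correction works, consider the simplifying assumption (introduced here for illustration only) that the gradient is a constant <math>g</math> at every step. Unrolling the recursion for <math>m_t</math> from <math>m_0 = 0</math> gives

<math> m_t = (1-\beta_1) \sum_{i=1}^{t} \beta_1^{t-i} g = (1-\beta_1^t) \, g , </math>

so dividing by <math>1-\beta_1^t</math> recovers <math>g</math> exactly. The same argument applies to <math>v_t</math> with <math>\beta_2</math> and <math>g^2</math>.
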
Finally, the parameter update is computed as

<math> \theta_t = \theta_{t-1} - \alpha \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon), </math> 

where <math>\epsilon</math> is a small constant for stability. The authors recommend a value of <math>\epsilon=10^{-8}</math>.
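
The four equations of this section can be collected into a single update function. The following is a minimal sketch in plain Python, not the authors' code: the name <code>adam_step</code>, the list-based parameter representation, and the one-parameter test objective <math>f(\theta)=\theta^2</math> are assumptions made for illustration.

```python
import math


def adam_step(theta, g, m, v, t, alpha, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update at step t (t >= 1); returns the new theta, m, and v."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = [beta1 * mi + (1.0 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1.0 - beta2) * gi * gi for vi, gi in zip(v, g)]
    # Bias-corrected moment estimates.
    m_hat = [mi / (1.0 - beta1 ** t) for mi in m]
    v_hat = [vi / (1.0 - beta2 ** t) for vi in v]
    # Parameter update, applied element by element.
    theta = [th - alpha * mh / (math.sqrt(vh) + eps)
             for th, mh, vh in zip(theta, m_hat, v_hat)]
    return theta, m, v


# Hypothetical objective f(theta) = theta^2 with gradient 2 * theta.
theta, m, v = [1.0], [0.0], [0.0]
theta, m, v = adam_step(theta, [2.0 * theta[0]], m, v, t=1, alpha=0.1)
print(theta)  # about [0.9]: the first step has magnitude close to alpha
```

Because the bias correction makes the first corrected moments equal to the gradient and its square, the first update always has magnitude close to <math>\alpha</math> regardless of the gradient's scale.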

== Numerical Example ==

[[File:Contour.png|thumb|Contour plot of the loss function showing the trajectory of Adam algorithm from the initial point]]

[[File:Model fit .png|thumb|Plot showing original data points and resulting model fit from the Adam algorithm]]

To illustrate how updates occur in the Adam algorithm, consider a linear least-squares regression problem. The table below shows a sample data set of student exam grades and the number of hours spent studying for the exam. The goal of this example is to generate a linear model that predicts exam grades as a function of time spent studying.

{| class="wikitable"
|-
| Hours Studying || 9.0 || 4.9 || 1.6 || 1.9 || 7.9 || 2.0 || 11.5 || 3.9 || 1.1 || 1.6 || 5.1 || 8.2 || 7.3 || 10.4 || 11.2
|-
| Exam Grade || 88.0 || 72.3 || 66.5 || 65.1 || 79.5 || 60.8 || 94.3 || 66.7 || 65.4 || 63.8 || 68.4 || 82.5 || 75.9 || 87.8 || 85.2
|}

The hypothesized model function will be

<math>f_\theta(x) = \theta_0 + \theta_1 x.</math>

The cost function is defined as

<math> J({\theta}) = \frac{1}{2}\sum_{i=1}^n \big(f_\theta(x_i) - y_i \big)^2, </math>

where the <math>1/2</math> coefficient is used only to make the derivatives cleaner. The optimization problem can then be formulated as finding the values of <math>\theta</math> that minimize the sum of squared residuals between <math>f_\theta(x_i)</math> and <math>y_i</math>:

<math> \mathrm{argmin}_{\theta} \quad \frac{1}{2}\sum_{i=1}^n \big(f_\theta(x_i) - y_i \big) ^2 </math>

For simplicity, the parameters are updated after every data point, i.e., with a batch size of 1. For a single data point, the derivatives of the cost function with respect to <math>\theta_0</math> and <math>\theta_1</math> are

<math> \frac{\partial J(\theta)}{\partial \theta_0} = \big(f_\theta(x) - y \big) </math> 
<math> \frac{\partial J(\theta)}{\partial \theta_1} = \big(f_\theta(x) - y \big) x </math>

The initial values of <math>{\theta}</math> are set to [50, 1], the learning rate, <math>\alpha</math>, is set to 0.1, and the suggested values of <math>\beta_1</math>, <math>\beta_2</math>, and <math>\epsilon</math> are used. With the first data sample, <math> (x,y)=(8.98, 88.01)</math>, and <math>x</math> rounded to 9 in the arithmetic, the computed gradients are

<math> \frac{\partial J(\theta)}{\partial \theta_0} = \big(50 + 1\cdot 9 - 88.01 \big) = -29.0 </math> 
<math> \frac{\partial J(\theta)}{\partial \theta_1} = \big(50 + 1\cdot 9 - 88.01 \big)\cdot 9.0 = -261 </math> 

With <math>m_0</math> and <math>v_0</math> being initialized to zero, the calculations of <math>m_1</math> and <math>v_1</math> are

<math> m_1 = 0.9 \cdot 0 + (1-0.9) \cdot \begin{bmatrix} -29\\ -261 \end{bmatrix} = \begin{bmatrix} -2.9\\ -26.1\end{bmatrix} </math> 
<math> v_1 = 0.999\cdot 0 + (1-0.999) \cdot \begin{bmatrix} (-29)^2\\(-261)^2 \end{bmatrix} = \begin{bmatrix} 0.84\\ 68.2\end{bmatrix} . </math> 

The bias-corrected terms, evaluated with the unrounded gradient values, are computed as

<math> \hat{m}_1 = \begin{bmatrix} -2.9\\ -26.1\end{bmatrix} \frac{1}{ (1-0.9^1)} = \begin{bmatrix} -29.0\\-261.1\end{bmatrix}</math> 
<math> \hat{v}_1 = \begin{bmatrix} 0.84\\ 68.2\end{bmatrix} \frac{1} {(1-0.999^1)} = \begin{bmatrix} 841.6\\68168\end{bmatrix}. </math> 

Finally, the parameter update is

<math> \theta_0 = 50 - 0.1 \cdot (-29.0) / (\sqrt{841.6} + 10^{-8}) = 50.1 </math> 
<math> \theta_1 = 1 - 0.1 \cdot (-261.1) / (\sqrt{68168} + 10^{-8}) = 1.1 </math> 

This procedure is repeated until the parameters have converged, giving <math>\theta</math> values of <math>[58.98, 2.72]</math>. The figures to the right show the trajectory of the Adam algorithm over a contour plot of the objective function and the resulting model fit. For comparison, stochastic gradient descent with a learning rate of 0.1 diverges on this problem, and with a learning rate of 0.01 it oscillates around the global minimum because of the large magnitude of the gradient in the <math>\theta_1</math> direction.
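
The first update of this example can be reproduced with a short script. This is a sketch under stated assumptions: it uses <math>x = 9.0</math> and <math>y = 88.01</math>, the values that appear in the gradient arithmetic above, and the variable names are chosen here for illustration.

```python
import math

# First data sample (hours studied, exam grade) as used in the arithmetic above.
x, y = 9.0, 88.01
theta = [50.0, 1.0]                      # initial [theta_0, theta_1]
alpha, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8
m, v, t = [0.0, 0.0], [0.0, 0.0], 1      # moments start at zero; first step is t = 1

residual = theta[0] + theta[1] * x - y   # f_theta(x) - y
g = [residual, residual * x]             # single-sample gradient
m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
v = [beta2 * vi + (1 - beta2) * gi ** 2 for vi, gi in zip(v, g)]
m_hat = [mi / (1 - beta1 ** t) for mi in m]
v_hat = [vi / (1 - beta2 ** t) for vi in v]
theta = [th - alpha * mh / (math.sqrt(vh) + eps)
         for th, mh, vh in zip(theta, m_hat, v_hat)]

print([round(gi, 2) for gi in g])        # [-29.01, -261.09]
print([round(th, 3) for th in theta])    # [50.1, 1.1]
```

Both parameters move by almost exactly <math>\alpha = 0.1</math> even though the two gradient components differ by a factor of nine, which illustrates how Adam normalizes the step size per parameter.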

== Applications ==
[[File:Adam training.png|thumb|Comparison of training a multilayer neural network on MNIST images for different gradient descent algorithms published in the original Adam paper (Kingma, 2015)<ref name="adam" />.]]

The Adam optimization algorithm has been widely used in machine learning applications to train model parameters. When used with backpropagation, the Adam algorithm has been shown to be a very robust and efficient method for training artificial neural networks and is capable of working well with a variety of structures and applications. In their original paper, the authors present three different training examples: logistic regression, multi-layer neural networks for classification of MNIST images, and a convolutional neural network (CNN). The training results from the original Adam paper, showing the objective function cost against the number of iterations over the entire data set for the multi-layer neural network, are shown to the right.

== Variants of Adam ==
=== AdaMax ===
AdaMax<ref name="adam" /> is a variant of the Adam algorithm, proposed in the original Adam paper, that uses an exponentially weighted infinity norm instead of the second-order moment estimate. The exponentially weighted infinity norm, <math>u_t</math>, is computed as

<math> u_t = \max(\beta_2 \cdot u_{t-1}, |g_t|). </math>

The parameter update then becomes

<math> \theta_t = \theta_{t-1} - (\alpha / (1-\beta_1^t)) \cdot m_t / u_t. </math>
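
The two AdaMax equations translate directly into code. The sketch below is illustrative only: the name <code>adamax_step</code> is hypothetical, and the guard that skips a parameter whose <math>u_t</math> is still zero is an assumption added here to avoid division by zero, not something stated above.

```python
def adamax_step(theta, g, m, u, t, alpha, beta1=0.9, beta2=0.999):
    """One AdaMax update at step t (t >= 1); returns the new theta, m, and u."""
    # First-moment estimate, identical to Adam.
    m = [beta1 * mi + (1.0 - beta1) * gi for mi, gi in zip(m, g)]
    # Exponentially weighted infinity norm of the gradient.
    u = [max(beta2 * ui, abs(gi)) for ui, gi in zip(u, g)]
    step = alpha / (1.0 - beta1 ** t)  # learning rate with bias correction for m
    # Assumption: leave a parameter unchanged while its u is exactly zero.
    theta = [th - step * mi / ui if ui > 0.0 else th
             for th, mi, ui in zip(theta, m, u)]
    return theta, m, u


# Hypothetical single-parameter example with gradient 2.0 at theta = 1.0.
theta, m, u = adamax_step([1.0], [2.0], [0.0], [0.0], t=1, alpha=0.1)
print(theta, m, u)  # theta is about [0.9], m about [0.2], u is [2.0]
```

No bias correction is applied to <math>u_t</math> in this sketch, matching the update equation above, which only corrects the first-moment term.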

=== Nadam ===
The Nadam algorithm<ref>Dozat, Timothy. Incorporating Nesterov Momentum into Adam. ICLR Workshop, no. 1, 2016, pp. 2013–16. </ref> was proposed in 2016 and incorporates the Nesterov Accelerated Gradient (NAG)<ref>Nesterov, Yuri. A method of solving a convex programming problem with convergence rate O(1/k^2). In Soviet Mathematics Doklady, 1983, pp. 372-376.</ref>, a popular momentum-like variant of SGD, into the first-order moment term.

== Conclusion ==
Adam is a variant of the gradient descent algorithm that has been widely adopted in the machine learning community. Adam can be seen as the combination of two other variants of gradient descent, SGD with momentum and RMSProp. Adam uses estimates of the first and second-order moments of the gradient to adapt the parameter update. These moment estimates are computed via moving averages, <math>m_t</math> and <math>v_t</math>, of the gradient and the squared gradient, respectively. In a variety of neural network training applications, Adam has shown improved convergence and robustness over other gradient descent algorithms and is often recommended as the default optimizer for training.<ref> "Neural Networks Part 3: Learning and Evaluation," CS231n: Convolutional Neural Networks for Visual Recognition, Stanford University, 2020</ref>

== References ==
<references/>

Adam

2020-12-21T11:42:55Z

Wc593:

Authors: Nicholas Kincaid (ChemE 6800 Fall 2020)


Stochastic gradient descent

2020-12-21T11:41:40Z

Wc593:

Authors: Jonathon Price, Alfred Wong, Tiancheng Yuan, Joshua Mathews, Taiwo Olorunniwo (SysEn 5800 Fall 2020)

== Introduction ==
'''Stochastic gradient descent''' (abbreviated as '''SGD''') is an iterative method often used for [https://en.wikipedia.org/wiki/Machine_learning machine learning] that performs the [https://en.wikipedia.org/wiki/Gradient_descent gradient descent] update at each search step, starting from a randomly picked weight vector. Gradient descent is a strategy that searches through a large or infinite hypothesis space whenever 1) the hypotheses are continuously parameterized and 2) the errors are differentiable with respect to the parameters. The problem with gradient descent is that [https://en.wikipedia.org/wiki/Convergence_(logic) converging] to a [https://en.wikipedia.org/wiki/Maxima_and_minima local minimum] can take extensive time, and finding a global minimum is not guaranteed.<ref name=McGrawHill2003>Mitchell, T. M. (1997). Machine Learning (1st ed.). McGraw-Hill Education. Page 92. ISBN 0070428077.</ref> In SGD, the user initializes the weights and the process updates the weight vector using one data point at a time.<ref name="bishop" /> The weight vector is updated incrementally each time an error calculation is completed, which improves convergence.<ref name="Needell=">Needell, D., Srebro, N., & Ward, R. (2015, January). Stochastic gradient descent weighted sampling, and the randomized Kaczmarz algorithm. https://arxiv.org/pdf/1310.5715.pdf</ref> The method seeks the direction of steepest descent, and it reduces the number of [https://en.wikipedia.org/wiki/Iteration iterations] and the time taken to search large quantities of data points. In recent years, data sizes have increased so much that current processing capabilities are not enough.<ref name=Bottou1991>Bottou, L. (1991) Stochastic gradient learning in neural networks. Proceedings of Neuro-Nîmes, 91. https://leon.bottou.org/publications/pdf/nimes-1991.pdf</ref> Stochastic gradient descent is used in [https://en.wikipedia.org/wiki/Neural_network neural networks], where it decreases machine computation time while allowing greater complexity and performance for large-scale problems.<ref name=bottou2012>Bottou, L. (2012) Stochastic gradient descent tricks. In Neural Networks: Tricks of the Trade, 421– 436. Springer.</ref>

== Theory ==
[[File:Gradient Descent Visualization.png|alt=Visualization of the gradient descent algorithm|thumb|Visualization of the gradient descent algorithm<ref name=":0">Lau, S., Gonzalez, J., Nolan, D. (2020) <nowiki>https://www.textbook.ds100.org/ch/11/gradient_stochastic.html</nowiki></ref>]]
SGD is a variation on gradient descent, which is also called batch gradient descent. As a review, gradient descent seeks to minimize an objective function <math>J(\theta)</math> by iteratively updating each parameter <math>\theta</math> by a small amount in the direction of the negative gradient computed over a given data set. The steps for performing gradient descent are as follows:<blockquote>Step 1: Select a learning rate <math>\alpha</math>

Step 2: Select initial parameter values <math>\theta</math> as the starting point

Step 3: Update all parameters from the gradient of the training data set, i.e. compute <math>\theta_{i+1}=\theta_i-\alpha\times{\nabla_\theta}J(\theta)</math>

Step 4: Repeat Step 3 until a local minimum is reached</blockquote>

Under batch gradient descent, the gradient, <math>{\nabla_\theta}J(\theta)</math>, is calculated at every step against a full [[wikipedia:Data_set|data set]]. When the training data is large, [[wikipedia:Computation|computation]] may be slow or require large amounts of [[wikipedia:Computer_memory|computer memory]].<ref name="bishop">Bishop, C. M. (2006). Pattern Recognition and Machine Learning (Information Science and Statistics). Springer.</ref>
[[File:Visualization of stochastic gradient descent.png|alt=Visualization of the stochastic gradient descent algorithm|thumb|Visualization of the stochastic gradient descent algorithm<ref name=":0" />]]

===== Stochastic Gradient Descent Algorithm =====
SGD modifies the batch gradient descent [https://en.wikipedia.org/wiki/Algorithm algorithm] by calculating the gradient for only one training example at every iteration.<ref name=ruder>Ruder, S. (2020, March 20). An overview of gradient descent optimization algorithms. Sebastian Ruder. https://ruder.io/optimizing-gradient-descent/index.html#batchgradientdescent</ref> The steps for performing SGD are as follows: <blockquote>Step 1: Randomly shuffle the data set of size m

Step 2: Select a learning rate <math>\alpha</math>

Step 3: Select initial parameter values <math>\theta</math> as the starting point

Step 4: Update all parameters from the gradient of a single training example <math>x^j, y^j</math>, i.e. compute <math>\theta_{i+1}=\theta_i-\alpha\times{\nabla_\theta}J(\theta;x^j;y^j)</math>

Step 5: Repeat Step 4 until a local minimum is reached </blockquote>By calculating the gradient for one training example per iteration, SGD takes a less direct route towards the local minimum. However, SGD has the advantage of being able to [https://en.wikipedia.org/wiki/Increment_and_decrement_operators incrementally] update an objective function <math>J(\theta)</math>, at minimal cost, when new training data become available.
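
The five steps above can be sketched as a short Python function. This is an illustration under stated assumptions, not a reference implementation: the name <code>sgd</code>, the fixed number of passes used in place of a convergence test for Step 5, and the one-parameter toy model are all introduced here.

```python
import random


def sgd(grad_single, data, theta0, alpha, epochs, seed=0):
    """Stochastic gradient descent following Steps 1-5 above."""
    rng = random.Random(seed)
    examples = list(data)
    rng.shuffle(examples)                 # Step 1: randomly shuffle the data set
    theta = list(theta0)                  # Step 3: initial parameter values
    for _ in range(epochs):               # Step 5: repeat (fixed budget, an assumption)
        for x, y in examples:             # Step 4: update from a single example
            g = grad_single(theta, x, y)
            theta = [th - alpha * gi for th, gi in zip(theta, g)]
    return theta


# Hypothetical one-parameter model y_hat = theta[0] * x with squared-error loss.
def squared_error_grad(theta, x, y):
    return [2.0 * (theta[0] * x - y) * x]


toy_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2x
theta_fit = sgd(squared_error_grad, toy_data, [0.0], alpha=0.01, epochs=50)
print(theta_fit)  # close to [2.0]
```

The learning rate from Step 2 is passed in as the <code>alpha</code> argument. Because the toy data are noise-free, the estimate settles at the true slope; with noisy data the iterates would keep fluctuating around the minimum.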

===== Learning Rate =====
The [https://en.wikipedia.org/wiki/Learning_rate learning rate] is used to calculate the step size at every iteration. If the learning rate is too large, the steps may overshoot the optimum value. If it is too small, many iterations may be required to reach a [https://en.wikipedia.org/wiki/Maxima_and_minima local minimum]. A good starting point for the learning rate is 0.1, which can then be adjusted as necessary.<ref>Srinivasan, A. (2019, September) Stochastic Gradient Descent — Clearly Explained. https://towardsdatascience.com/stochastic-gradient-descent-clearly-explained-53d239905d31</ref>
===== Mini-Batch Gradient Descent =====
A variation on stochastic gradient descent is mini-batch gradient descent. In SGD, the gradient is computed on only one training example, which may result in a large number of iterations being required to converge on a local minimum. Mini-batch gradient descent offers a compromise between batch gradient descent and SGD by splitting the training data into smaller batches. The steps for performing mini-batch gradient descent are identical to those for SGD with one exception: when updating the parameters, rather than calculating the gradient of a single training example, the gradient is calculated against a batch of <math>n</math> training examples, i.e. compute <math>\theta_{i+1}=\theta_i-\alpha\times{\nabla_\theta}J(\theta;x^{j:j+n};y^{j:j+n})</math>
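
A minimal sketch of the mini-batch update follows. The function names, the toy data, and the choice to average (rather than sum) the gradient over each batch are assumptions made here for illustration; the shuffling step is omitted for brevity.

```python
def minibatch_gd(grad_batch, data, theta0, alpha, batch_size, epochs):
    """Mini-batch gradient descent: one update per batch of n examples."""
    theta = list(theta0)
    for _ in range(epochs):
        for j in range(0, len(data), batch_size):
            batch = data[j:j + batch_size]        # examples j .. j+n-1
            g = grad_batch(theta, batch)
            theta = [th - alpha * gi for th, gi in zip(theta, g)]
    return theta


# Hypothetical model y_hat = theta[0] * x; gradient of the mean squared error
# over the batch (averaging is an assumption of this sketch).
def mean_squared_error_grad(theta, batch):
    total = sum(2.0 * (theta[0] * x - y) * x for x, y in batch)
    return [total / len(batch)]


toy_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # generated by y = 2x
theta_fit = minibatch_gd(mean_squared_error_grad, toy_data, [0.0],
                         alpha=0.01, batch_size=2, epochs=50)
print(theta_fit)  # close to [2.0]
```

Setting <code>batch_size</code> to 1 recovers the SGD update, and setting it to the size of the data set recovers batch gradient descent.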

== Numerical Example ==
===== Data preparation =====
Consider a simple 2-D data set with only 6 data points (each point has two features, <math>x_1</math> and <math>x_2</math>), where each data point has a label value <math>y</math> assigned to it.
===== Model overview =====
For the purpose of demonstrating the computation of the SGD process, simply employ a linear regression model: <math>y = w_1\ x_1 + w_2\ x_2 + b </math>, where <math>w_1</math> and <math>w_2</math> are weights and <math>b</math> is the constant (bias) term. In this case, the goal of this model is to find the best values of <math>w_1, w_2</math> and <math>b</math>, based on the data set.
===== Definition of loss function =====
In this example, the loss function is the squared <math>\ell_2</math> norm of the error, that is, <math>L = (\widehat{y} - y)^2 </math>.
===== Forward =====
<blockquote>'''Initial Weights:'''
The linear regression model starts by [https://en.wikipedia.org/wiki/Initialization_(programming) initializing] the weights <math>w_1, w_2</math> and setting the bias term to 0. In this case, initialize [<math>w_1, w_2</math>] = [-0.044, -0.042].

'''Dataset:'''

For this problem, the batch size is set to 1 and the entire dataset of [ <math>x_1</math>, <math>x_2</math>, <math>y</math>] is given by:
{| class="wikitable"
! <math>x_1</math> !! <math>x_2</math> !! <math>y</math>
|-
| 4 || 1 || 2
|-
| 2 || 8 || -14
|-
| 1 || 0 || 1
|-
| 3 || 2 || -1
|-
| 1 || 4 || -7
|-
| 6 || 7 || -8
|}
</blockquote>

===== Gradient Computation and Parameter Update =====
The purpose of backpropagation (BP) is to obtain the impact of the weights and bias terms on the loss of the entire model. The update of the model depends entirely on the gradient values. To minimize the loss during the process, the model needs to move in the descending direction of the gradient so that it can eventually converge to a global optimal point. The three update equations, each obtained from a partial derivative of the loss via the chain rule, are:

<math>w_1' = w_1 - \eta\ {\partial L\over\partial w_1} = w_1 - \eta\ {\partial L\over\partial \widehat{y}}\cdot {\partial \widehat{y}\over\partial w_1} = w_1 - \eta\ [2(\widehat{y} - y)\cdot x_1] </math>

<math>w_2' = w_2 - \eta\ {\partial L\over\partial w_2} = w_2 - \eta\ {\partial L\over\partial \widehat{y}}\cdot {\partial \widehat{y}\over\partial w_2} = w_2 - \eta\ [2(\widehat{y} - y)\cdot x_2]</math>

<math>b' = b - \eta\ {\partial L\over\partial b} = b - \eta\ {\partial L\over\partial \widehat{y}}\cdot {\partial \widehat{y}\over\partial b} = b - \eta\ [2(\widehat{y} - y)\cdot 1]</math>

where <math>\eta</math> is the learning rate (denoted <math>\alpha</math> in the Theory section), which is set to 0.05 in this model. To update each parameter, simply substitute the resulting value of <math>\widehat{y}</math>.

Use the first data point [<math>x_1, x_2</math>] = [4, 1], with the corresponding <math>y</math> being 2. The model's prediction is <math>\widehat{y} = -0.218</math>, or approximately -0.2. With these values of <math>\widehat{y}</math> and <math>y</math>, the updated parameters are [<math>w'_1, w'_2, b'</math>] = [0.843, 0.180, 0.222]. That marks the end of iteration 1.

Now iteration 2 begins with the next data point, [2, 8], and the label -14. The estimate, <math>\widehat{y}</math>, is now 3.3. With the new <math>\widehat{y}</math> and <math>y</math> values, the parameters are updated once again to [-2.626, -13.697, -1.513]. That marks the end of iteration 2.
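
The first two iterations can be checked with a few lines of Python. This is a sketch that applies the three update equations exactly as written, using the initial weights, the learning rate of 0.05, and the first two rows of the data table; the variable names are chosen here for illustration. Applied this way, the bias after the second iteration comes out negative.

```python
# First two rows of the data table: (x1, x2, y).
data = [(4, 1, 2), (2, 8, -14)]
w1, w2, b = -0.044, -0.042, 0.0   # initial weights and bias
eta = 0.05                        # learning rate

history = []
for x1, x2, y in data:
    y_hat = w1 * x1 + w2 * x2 + b          # forward pass
    common = 2.0 * (y_hat - y)             # dL/dy_hat for L = (y_hat - y)^2
    w1 -= eta * common * x1
    w2 -= eta * common * x2
    b -= eta * common
    history.append((round(y_hat, 3), round(w1, 3), round(w2, 3), round(b, 3)))

print(history[0])  # (-0.218, 0.843, 0.18, 0.222)
print(history[1])  # (3.347, -2.626, -13.697, -1.513)
```

The large jump in the second iteration shows how a single example with a large error and large feature values can dominate an SGD step at this learning rate.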

Continue updating the model through additional iterations to obtain [<math>w_1, w_2, b</math>] = [-19.021, -35.812, -1.232].

This is just a simple demonstration of the SGD process. In practice, more epochs can be used to run through the entire data set enough times to ensure the best learning results on the training data set.<ref name=":1">Lawrence, S., & Giles, C. L. (2000). Overfitting and neural networks: conjugate gradient and backpropagation. Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium, 1, 114–119. https://doi.org/10.1109/ijcnn.2000.857823</ref> However, fitting the training data set too closely can expose the model to the risk of overfitting.<ref name=":1" /> Therefore, tuning such parameters is quite tricky and can take days or even weeks before the best results are found.

==Application==
SGD, often referred to as the cornerstone of deep learning, is an algorithm for training a wide range of models in machine learning. [[wikipedia:Deep_learning|Deep learning]] is a machine learning technique that teaches computers to do what comes naturally to humans. In deep learning, a computer model learns to perform classification tasks directly from images, text, or sound. Models are trained using a large set of labeled data and neural network architectures that contain many layers. Neural networks make up the backbone of deep learning algorithms; a neural network with more than three layers, counting the input and output layers, can be considered a deep learning algorithm. Because of SGD’s efficiency in dealing with large-scale data sets, it is the most common method for training [https://en.wikipedia.org/wiki/Deep_learning#Deep_neural_networks deep neural networks]. Furthermore, SGD has received considerable attention and is applied to text classification and [https://en.wikipedia.org/wiki/Natural_language_processing natural language processing]. It is best suited to unconstrained optimization problems and is the main way to train large linear models on very large data sets. Applications of stochastic gradient descent include [https://en.wikipedia.org/wiki/Tikhonov_regularization ridge regression] and regularized [https://en.wikipedia.org/wiki/Logistic_regression logistic regression]. Other problems, such as Lasso<ref name="Shwartz">Shalev-Shwartz, S. and Tewari, A. (2011) Stochastic methods for ℓ<math>_1</math>-regularized loss minimization. The Journal of Machine Learning Research, 12, 1865–1892. https://www.jmlr.org/papers/volume12/shalev-shwartz11a/shalev-shwartz11a.pdf</ref> and support vector machines<ref name=Menon>Menon, A. (2009, February). Large-Scale Support Vector Machines: Algorithms and Theory. http://cseweb.ucsd.edu/~akmenon/ResearchExamTalk.pdf</ref>, can also be solved by stochastic gradient descent.

===Support Vector Machine===
SGD is a simple yet very efficient approach to fitting linear classifiers and regressors under convex loss functions, such as (linear) [https://en.wikipedia.org/wiki/Support_vector_machine Support Vector Machines] (SVM). A support vector machine is a supervised machine learning model for two-group classification problems. An SVM finds what is known as a separating hyperplane: a hyperplane (a line, in the two-dimensional case) which separates the two classes of points from one another. It is a fast and dependable classification algorithm that performs very well with a limited amount of data. However, because training an SVM is computationally costly, software implementations often cannot meet time requirements for large amounts of data. To improve SVM scalability with respect to the size of the data set, SGD algorithms are used as a simplified procedure for evaluating the gradient of a function.<ref name=lopes>Lopes, F.F.; Ferreira, J.C.; Fernandes, M.A.C. Parallel Implementation on FPGA of Support Vector Machines Using Stochastic Gradient Descent. Electronics 2019, 8, 631.</ref>
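
As a minimal sketch of the idea, the following Python code trains a linear SVM by SGD on the regularized hinge loss. The function names, the toy data, the constant step size <code>eta</code>, and the regularization weight <code>lam</code> are all assumptions chosen for illustration; they are not taken from the cited references.

```python
import random


def train_linear_svm_sgd(points, labels, lam=0.01, eta=0.01, epochs=200, seed=0):
    """Fit (w, b) by SGD on lam/2 * ||w||^2 + max(0, 1 - y * (w.x + b)).

    Labels must be +1 or -1. The constant step size eta is an assumed,
    illustrative choice rather than a tuned schedule.
    """
    rng = random.Random(seed)
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        order = list(range(len(points)))
        rng.shuffle(order)  # visit one randomly ordered sample at a time
        for i in order:
            x, y = points[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # Subgradient step on the regularization term.
            w = [wj - eta * lam * wj for wj in w]
            # Subgradient step on the hinge term, active only if margin < 1.
            if margin < 1:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
                b += eta * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable toy data.
points = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm_sgd(points, labels)
print("w =", w, "b =", b)
print("predictions:", [predict(w, b, x) for x in points])
```

Each update uses a single sample, so the cost per step does not grow with the size of the data set, which is the scalability benefit described above.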

===Logistic regression===
Logistic regression models the [https://en.wikipedia.org/wiki/Probability probabilities] for classification problems with two possible outcomes. It is an extension of the linear regression model to classification problems. It is a statistical technique in which the input variables are continuous and the output variable is binary, so that the independent variables are used to predict the dependent variable. The objective of training a machine learning model is to minimize the loss, or error, between ground truths and predictions by changing the trainable parameters. Logistic regression has two phases: training and testing. The model parameters, specifically the weights <math>w</math> and the bias <math>b</math>, are trained using stochastic gradient descent and the cross-entropy loss.
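
The training phase can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the learning rate <code>lr</code>, the number of epochs, and the one-dimensional toy data are assumptions. The update uses the fact that the gradient of the cross-entropy loss with respect to the linear score <math>w^Tx + b</math> is the predicted probability minus the label.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_sgd(xs, ys, lr=0.1, epochs=500, seed=0):
    """Train weights w and bias b with SGD on the cross-entropy loss."""
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        order = list(range(len(xs)))
        rng.shuffle(order)
        for i in order:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xs[i])) + b)
            err = p - ys[i]  # d(cross-entropy)/d(score) for one sample
            w = [wj - lr * err * xj for wj, xj in zip(w, xs[i])]
            b -= lr * err
    return w, b


def predict_label(w, b, x):
    """Testing phase: threshold the predicted probability at 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical toy data: one continuous input, binary output.
xs = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
ys = [0, 0, 0, 1, 1, 1]

w, b = train_logistic_sgd(xs, ys)
print("w =", w, "b =", b)
print("predicted labels:", [predict_label(w, b, x) for x in xs])
```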

===Full Waveform Inversion (FWI)===
Full Waveform Inversion (FWI) is a [https://en.wikipedia.org/wiki/Geophysical_imaging seismic imaging] process that draws information about physical parameters from samples. Companies use the process to produce high-resolution velocity depictions of the subsurface. The FWI objective has many local minima, and SGD supports the process because it can search for minima, and potentially the global minimum, in less time.<ref name=witte>Witte, P., Louboutin, M., Lensink, K., Lange, M., Kukreja, N., Luporini, F., Gorman, G., Herrmann, F.J.; Full-waveform inversion, Part 3: Optimization. The Leading Edge ; 37 (2): 142–145. doi: https://doi.org/10.1190/tle37020142.1</ref>

==Conclusion==
SGD is an algorithm that takes a step along an estimated steepest-descent direction at each iteration. This greatly reduces the time needed to process large data sets and locate local minima. SGD has many applications in machine learning, geophysics, least mean squares (LMS), and other areas.

==References==

<references />

Adaptive robust optimization

2020-12-21T11:41:15Z

Wc593:

Author: Ralph Wang (ChemE 6800 Fall 2020)

== Introduction ==
Adaptive Robust Optimization (ARO), also known as adjustable robust optimization, models situations where decision makers make two types of decisions: here-and-now decisions that must be made immediately, and wait-and-see decisions that can be made at some point in the future.<ref>Yanıkoğlu, İ., Gorissen, B. L., den Hertog, D. (2019) A Survey of Adjustable Robust Optimization. European Journal of Operational Research, 277(3):799-813.</ref> ARO improves on the robust optimization framework by accounting for any information the decision maker does not know now, but may learn before making future decisions. In the real world, ARO is applicable whenever past decisions and new information together influence future decisions. Common applications include power systems control, inventory management, shift scheduling, and other resource allocation problems.<ref>B. Hu and L. Wu, "Robust SCUC Considering Continuous/Discrete Uncertainties and Quick-Start Units: A Two-Stage Robust Optimization With Mixed-Integer Recourse," in IEEE Transactions on Power Systems, vol. 31, no. 2, pp. 1407-1419, March 2016, doi: 10.1109/TPWRS.2015.2418158.</ref><ref>J. Warrington, C. Hohl, P. J. Goulart and M. Morari, "Rolling Unit Commitment and Dispatch With Multi-Stage Recourse Policies for Heterogeneous Devices," in IEEE Transactions on Power Systems, vol. 31, no. 1, pp. 187-197, Jan. 2016, doi: 10.1109/TPWRS.2015.2391233.</ref><ref>Chuen-Teck See, Melvyn Sim, (2010) Robust Approximation to Multiperiod Inventory Management. Operations Research 58(3):583-594.</ref><ref>Marcus Ang, Yun Fong Lim, Melvyn Sim, (2012) Robust Storage Assignment in Unit-Load Warehouses. Management Science 58(11):2114-2130.</ref><ref>Mattia, S., Rossi, F., Servilio, M., Smriglio, S. (2017). Staffing and Scheduling Flexible Call Centers by Two-Stage Robust Optimization. Omega 72:25-37.</ref><ref>Gong, J. and You, F. (2017), Optimal processing network design under uncertainty for producing fuels and value‐added bioproducts from microalgae: Two‐stage adaptive robust mixed integer fractional programming model and computationally efficient solution algorithm. AIChE J., 63: 582-600.</ref>

Compared to traditional robust optimization models, ARO gives less conservative and more realistic solutions; however, this improvement comes at the cost of computation time. Indeed, even the general linear ARO with linear uncertainty is proven computationally intractable.<ref>Ben-Tal, A., Goryashko, A., Guslitzer, E. et al. Adjustable robust solutions of uncertain linear programs. Math. Program., Ser. A 99, 351–376 (2004).</ref> Nevertheless, researchers have developed a wide variety of solution and approximation methods for specific types of industrial ARO problems over the past 15 years, and the field continues to grow rapidly.<ref>Ben-Tal, A., Goryashko, A., Guslitzer, E. et al. Adjustable robust solutions of uncertain linear programs. Math. Program., Ser. A 99, 351–376 (2004).</ref><ref>Zhao, L., & Zeng, B. (2012). An Exact Algorithm for Two-stage Robust Optimization with Mixed Integer Recourse Problems.</ref><ref>Chen, Bokan, "A new trilevel optimization algorithm for the two-stage robust unit commitment problem" (2013). Graduate Theses and Dissertations. 13065.</ref><ref>Shi, H. and You, F. (2016), A computational framework and solution algorithms for two‐stage adaptive robust scheduling of batch manufacturing processes under uncertainty. AIChE J., 62: 687-703.</ref><ref>Bertsimas, D., Georghiou, A. (2015). Design of Near Optimal Decision Rules in Multistage Adaptive Mixed-Integer Optimization. Operations Research 63(3): 610-627.</ref>

== Problem Formulation ==
Suppose, for an optimization problem of interest, <math>S</math> is the set of allowed decisions and <math>x</math> is a decision in <math>S</math>. Let <math>u</math> be a vector representing the set of parameters of interest in this problem. If the goal is to minimize some function <math>f(u, x)</math>, and we want <math>x</math> to adhere to a set of constraints <math>g(u, x) \leq 0</math>, then the problem may be formulated as:

<math>\begin{align}\text{minimize, choosing x: } f&(u, x)\\

\text{under constraints: } g&(u, x) \leq 0\end{align}</math>

Or more simply:

<math>\begin{align}\text{min}(x) \; &f(u, x)\\
\text{s.t. } \; &g(u, x) \leq 0\end{align}</math>

In this formulation, we call <math>f</math> the objective function and <math>g</math> the constraint function.
If <math>u</math> is known, then the problem can be solved using methods such as branch and cut or Karush-Kuhn-Tucker conditions. However, in many real world scenarios, <math>u</math> is not known. To address this uncertainty, the robust optimization approach generates the set of possible values of <math>u</math>, called the uncertainty set <math>U</math>, and solves for the decision <math>x</math> such that the constraint <math>g</math> is satisfied in all cases and <math>f</math> is optimized for the worst case. The problem can be written as:

<math>\begin{align}\text{min}(x)\text{ max}(u)\;&f(u, x)\\
\text{s.t.}\;&g(u, x) \leq 0 \; \text{for all } u \text{ in } U\end{align}</math>

Adaptive robust optimization expands this robust optimization framework by separating the decision <math>x</math> into multiple stages. For simplicity, assume there are two stages of decisions. In the first stage, only the urgent, here-and-now decisions are made. After these decisions are made, the true values of the parameters <math>u</math> are revealed, then the remaining, wait-and-see decisions are decided. The model is like a game of poker: the player needs to make initial bets based on incomplete information (the cards in his hand), then makes further bets as more and more cards are dealt. Mathematically, let the set of possible decisions in the first stage be <math>S_1</math> and the set of possible decisions in the second stage be <math>S_2</math>, so that the objective and constraint functions become functions of the parameters <math>u</math>, the first stage decision <math>x_1</math> (<math>x_1</math> in <math>S_1</math>), and the second stage decision <math>x_2</math> (<math>x_2</math> in <math>S_2</math>). Then, we can formulate the problem as:

<math>\begin{align} \text{min}(x_1)\text{ max}(u)\text{ min}(x_2)\;&f(u, x_1, x_2)\\
\text{s.t.}\;\;\;&g(u, x_1, x_2) \leq 0 \; \text{for all } u \text{ in } U\end{align}</math>

The reasoning used in this construction can be extended to multi-stage formulations.
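
To make the difference between the static robust problem and the two-stage problem concrete, the following Python sketch solves a small instance of each by brute-force enumeration. The sets <code>U</code>, <code>X1</code>, <code>X2</code> and the functions <code>f</code> and <code>g</code> are hypothetical and were chosen only for illustration; enumeration is practical only because every set here is tiny.

```python
from itertools import product

# Hypothetical toy instance: u is a demand, x1 is ordered before u is known,
# x2 is ordered after u is revealed (at a higher unit price).
U = [1, 3, 5]
X1 = range(6)
X2 = range(6)
INF = float("inf")


def f(u, x1, x2):
    """Objective: purchase costs plus a penalty of 2 per leftover unit."""
    return x1 + 3 * x2 + 2 * (x1 + x2 - u)


def g(u, x1, x2):
    """Constraint g <= 0: total supply must cover the demand u."""
    return u - (x1 + x2)


def static_robust():
    """min over (x1, x2), max over u: both decisions fixed before u is known."""
    best = INF
    for x1, x2 in product(X1, X2):
        if all(g(u, x1, x2) <= 0 for u in U):
            best = min(best, max(f(u, x1, x2) for u in U))
    return best


def adaptive_robust():
    """min over x1, max over u, min over x2: x2 is chosen after u is revealed."""
    best = INF
    for x1 in X1:
        worst = max(
            min((f(u, x1, x2) for x2 in X2 if g(u, x1, x2) <= 0), default=INF)
            for u in U
        )
        best = min(best, worst)
    return best


print("static robust optimum:  ", static_robust())
print("adaptive robust optimum:", adaptive_robust())
```

On this instance the adaptive worst-case cost is lower than the static one, because the second-stage decision can react to the revealed value of <math>u</math>.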

In the literature, adaptive robust optimization problems are usually formulated differently but equivalently. Note that because <math>x_2</math> is selected only after the uncertain parameter <math>u</math> is revealed, <math>x_2</math> is a function of <math>u</math>. Expressing <math>x_2</math> as a function of <math>u</math> allows us to choose the function <math>x_2(u)</math> before learning <math>u</math>, which allows the problem to be rewritten as:

<math>\begin{align}\text{min}(x_1, x_2(u))\text{ max}(u) \; &f(u, x_1, x_2(u))\\
\text{s.t.} \; &g(u, x_1, x_2(u)) \leq 0 \; \text{for all } u \text{ in } U \end{align}</math>

And if we introduce a variable <math>t = \text{max}(u)\;f(u, x_1, x_2(u))</math>, then we can rewrite the problem as:

<math>\begin{align}\text{min}(x_1, x_2(u), t)\;\;&t\\
\text{s.t.} \; &f(u, x_1, x_2(u)) \leq t \text{ for all }u\text{ in }U\\
&g(u, x_1, x_2(u)) \leq 0 \text{ for all }u\text{ in }U\end{align}</math>

This allows us to remove <math>u</math> from the objective function. Since <math>x_1</math> represents all the variables to decide immediately, <math>t</math> can be collapsed into <math>x_1</math>; similarly, the first constraint can be collapsed into the second. This generates the formulation most commonly seen in the literature (up to a change of variable names and functions <math>f</math> and <math>g</math>):

<math>\begin{align}\text{min}(x_1, x_2(u)) \;&f(x_1)\\
\text{s.t.}\; &g(u, x_1, x_2(u)) \leq 0 \text{ for all }u\text{ in }U\end{align}</math>

Where <math>f(x_1)</math> was redefined to be the part of <math>x_1</math> representing <math>t</math>.

For many problems of interest, the functions <math>f</math> and <math>g</math> vary linearly with <math>x_1</math> and <math>x_2</math>, that is, they are affine functions of <math>x_1</math> and <math>x_2</math>.<ref>B. Hu and L. Wu, "Robust SCUC Considering Continuous/Discrete Uncertainties and Quick-Start Units: A Two-Stage Robust Optimization With Mixed-Integer Recourse," in IEEE Transactions on Power Systems, vol. 31, no. 2, pp. 1407-1419, March 2016, doi: 10.1109/TPWRS.2015.2418158.</ref><ref>J. Warrington, C. Hohl, P. J. Goulart and M. Morari, "Rolling Unit Commitment and Dispatch With Multi-Stage Recourse Policies for Heterogeneous Devices," in IEEE Transactions on Power Systems, vol. 31, no. 1, pp. 187-197, Jan. 2016, doi: 10.1109/TPWRS.2015.2391233.</ref><ref>Chuen-Teck See, Melvyn Sim, (2010) Robust Approximation to Multiperiod Inventory Management. Operations Research 58(3):583-594.</ref><ref>Marcus Ang, Yun Fong Lim, Melvyn Sim, (2012) Robust Storage Assignment in Unit-Load Warehouses. Management Science 58(11):2114-2130.</ref><ref>Mattia, S., Rossi, F., Servilio, M., Smriglio, S. (2017). Staffing and Scheduling Flexible Call Centers by Two-Stage Robust Optimization. Omega 72:25-37.</ref><ref>Gong, J. and You, F. (2017), Optimal processing network design under uncertainty for producing fuels and value‐added bioproducts from microalgae: Two‐stage adaptive robust mixed integer fractional programming model and computationally efficient solution algorithm. AIChE J., 63: 582-600.</ref> In such cases, if <math>x_1</math> and <math>x_2</math> are treated as vectors, then we can write:

<math>f(x_1) = c^Tx_1</math>

Where <math>c</math> is some vector, and

<math>g(u, x_1, x_2) = A_1(u)x_1 + A_2(u)x_2(u) - b(u)</math>

Where the <math>A(u)</math>'s are matrices and <math>b(u)</math> is a vector, to give the linear, two-stage ARO (L2ARO):

<math>\begin{align}\text{min}(x_1, x_2(u))\;&c^Tx_1\\
\text{s.t.}\;&A_1(u)x_1 + A_2(u)x_2(u) \leq b(u)\;\text{ for all }u\text{ in }U\end{align}</math>

This L2ARO will be the primary focus of the Algorithms section.

==Algorithms and Methodology==
General ARO problems are computationally intractable.<ref>Guslitser, E. (2002). Uncertainty-Immunized Solutions in Linear Programming (Master’s Thesis, Technion-Israel Institute of Technology).</ref> Taking the L2ARO for example, deriving the optimal function <math>x_2(u)</math> poses a tremendous challenge for many choices of uncertainty set <math>U</math>. If <math>U</math> is large, infinite, or nonconvex, deciding what <math>x_2</math> should be for each <math>u</math> in <math>U</math> may take a long time. In real-world applications, the uncertainty set <math>U</math> must therefore be chosen carefully to include a representative set of possible parameter values for <math>u</math>, without being so large or complex that it renders the problem intractable.
The L2ARO model has been proven tractable only for simple uncertainty sets <math>U</math> or with restrictions imposed on the function <math>x_2(u)</math>.<ref>Yanıkoğlu, İ., Gorissen, B. L., den Hertog, D. (2019) A Survey of Adjustable Robust Optimization. European Journal of Operational Research, 277(3):799-813.</ref><ref>Ben-Tal, A., Goryashko, A., Guslitzer, E. et al. Adjustable robust solutions of uncertain linear programs. Math. Program., Ser. A 99, 351–376 (2004).</ref> Therefore, ARO problems are usually solved on a case-by-case basis, using methods such as multi-level optimization, branch-and-cut, and decomposition.<ref>Ben-Tal, A., Goryashko, A., Guslitzer, E. et al. Adjustable robust solutions of uncertain linear programs. Math. Program., Ser. A 99, 351–376 (2004).</ref><ref>Zhao, L., & Zeng, B. (2012). An Exact Algorithm for Two-stage Robust Optimization with Mixed Integer Recourse Problems.</ref><ref>Chen, Bokan, "A new trilevel optimization algorithm for the two-stage robust unit commitment problem" (2013). Graduate Theses and Dissertations. 13065.</ref><ref>Shi, H. and You, F. (2016), A computational framework and solution algorithms for two‐stage adaptive robust scheduling of batch manufacturing processes under uncertainty. AIChE J., 62: 687-703.</ref><ref>Bertsimas, D., Georghiou, A. (2015). Design of Near Optimal Decision Rules in Multistage Adaptive Mixed-Integer Optimization. Operations Research 63(3): 610-627.</ref> This section will first present the L2ARO solution method using the affine decision rule approximation under fixed recourse conditions from Ben-Tal's 2004 paper<ref>Ben-Tal, A., Goryashko, A., Guslitzer, E. et al. Adjustable robust solutions of uncertain linear programs. Math. Program., Ser. A 99, 351–376 (2004).</ref>, then discuss how this method might be extended to other L2ARO problems.

General L2ARO problems were first proven intractable by Guslitser, in his master's thesis.<ref>Guslitser, E. (2002). Uncertainty-Immunized Solutions in Linear Programming (Master’s Thesis, Technion-Israel Institute of Technology).</ref> Ben-Tal took this result and suggested simplifying the problem by restricting <math>x_2(u)</math> to vary linearly with <math>u</math>, that is,

<math>x_2(u) = w + Wu</math>

Where <math>w</math> is a vector and <math>W</math> is a matrix, both decision variables. This simplification is known as the affine decision rule (ADR). To further simplify the problem, Ben-Tal proposed fixing the matrix <math>A_2(u)</math> to some matrix <math>V</math> (fixed recourse conditions) and making <math>A_1(u)</math> and <math>b(u)</math> affine functions of <math>u</math>:

<math>A_1(u) = m + Mu</math>

<math>b(u) = b + Bu</math>

Where <math>m</math> and <math>b</math> are fixed vectors and <math>M</math> and <math>B</math> are fixed matrices. Then, the overall problem can be rewritten:

<math>\begin{align}\text{min}(x_1, w, W) \; &c^Tx_1\\
\text{s.t.}\;&(m + Mu)x_1 + V(w + Wu) \leq b + Bu \; \text{ for all }u\text{ in }U\end{align}</math>

Now, both the objective function and constraint function are affine functions of <math>x_1</math>, <math>w</math>, and <math>W</math>, so the problem has been reduced to a simple robust linear program, for which solution methods already exist.

The above solution method, although simple and tractable, suffers from the potential suboptimality of ADR. Indeed, Ben-Tal motivates this assumption citing only the tractability of the result. In real-world scenarios, this suboptimality can be mitigated by using ADR to make the initial decision, then re-solving the problem after <math>u</math> is revealed. That is, if solving the L2ARO gives <math>x_1^*</math> as the optimal <math>x_1</math> and <math>x_2^*(u)</math> as the optimal <math>x_2(u)</math>, decision <math>x_1^*</math> is implemented immediately; when <math>u</math> is revealed (to be, say, <math>u^*</math>), decision <math>x_2</math> is decided not by computing <math>x_2^*(u^*)</math>, but by re-solving the whole problem fixing <math>x_1</math> to <math>x_1^*</math> and fixing <math>u</math> to <math>u^*</math>. This method reflects the wait-and-see nature of the decision <math>x_2</math>: ADR is used to find a reasonably good <math>x_1</math>, then <math>u</math> is revealed, and that information is used to solve for the optimal <math>x_2</math> in that circumstance.<ref>Ben-Tal, A., Golany, B., Nemirovski, A., Vial, J. (2005). Retailer-Supplier Flexible Commitments Contracts: A Robust Optimization Approach. Manufacturing and Service Operations Management 7(3):248-271.</ref> This iterative, stage-by-stage solution performs better than using only ADR, but is feasible only when there is enough time between stages to re-solve the problem. Further, numerical experiments indicate that classical robust optimization models yield equally good, if not better, initial decisions than ADR on L2ARO, limiting ADR on L2ARO to situations where the problem cannot be feasibly re-solved, or to the special cases where the ADR approximation actually generates the optimal solution.<ref>Gorissen, B., Yanıkoğlu, İ., den Hertog, D. (2015). A Practical Guide to Robust Optimization. Omega 53:124-137.</ref>

This leads to the natural question: under what conditions are ADRs optimal? Bertsimas and Goyal showed in 2012 that if both <math>A(u)</math> matrices are independent of <math>u</math>, <math>x_1</math> and <math>x_2</math> are restricted to vectors with nonnegative entries, and <math>b(u)</math> is restricted to vectors with nonpositive entries, then ADRs are optimal if <math>b(u)</math> is restricted to a polyhedral set with a number of vertices one more than <math>b(u)</math>'s dimension (that is, a simplex).<ref>Bertsimas, D., Goyal, V. On the power and limitations of affine policies in two-stage adaptive optimization. Math. Program. 134, 491–531 (2012).</ref> In a paper published in 2020, Ben-Tal and colleagues noted that whenever the <math>A(u)</math> matrices are independent of <math>u</math>, a piecewise ADR can be optimal, albeit one with a large number of pieces.<ref>Ben-Tal, A., El Housni, O. & Goyal, V. A tractable approach for designing piecewise affine policies in two-stage adjustable robust optimization. Math. Program. 182, 57–102 (2020).</ref> ADRs can be optimal in other, more specific cases, but these cases will not be discussed here.<ref>Iancu, D.A., Parrilo, P.A.(2010). Optimality of Affine Policies in Multistage Robust Optimization. Mathematics of Operations Research 35(2):363-394</ref><ref>Dan A. Iancu, Mayank Sharma, Maxim Sviridenko (2013) Supermodularity and Affine Policies in Dynamic Robust Optimization. Operations Research 61(4):941-956</ref>

In most cases, however, ADRs are suboptimal, and it becomes useful to characterize their degree of suboptimality. The most common approach is to generate upper and lower bounds on the optimal value of the objective function. If the goal is to minimize the objective function, then any valid solution (via ADRs or some other method) gives an upper bound, so the problem reduces to computing lower bounds. A simple approach to doing so is to approximate the uncertainty set using a small number of well-chosen points (“sampling” the uncertainty set), solve the model at each of these points, and find the worst case among these sampled solutions. Since the true worst-case scenario must be at least as bad as any of the selected points, this sampling approach must generate a value no worse than the true optimum, that is, a lower bound on the objective.<ref>M. J. Hadjiyiannis, P. J. Goulart and D. Kuhn, "A scenario approach for estimating the suboptimality of linear decision rules in two-stage robust optimization," 2011 50th IEEE Conference on Decision and Control and European Control Conference, Orlando, FL, 2011, pp. 7386-7391.</ref> This method, although simple, generates excessively optimistic lower bounds unless a large number of points are sampled, but solving the model at many such points can take a long time. Thus, authors have investigated methods for choosing fewer points that better represent the whole uncertainty set, to improve both the lower bound quality and the computation time of this method.<ref>Ayoub, J., Poss, M. Decomposition for adjustable robust linear optimization subject to uncertainty polytope. Comput Manag Sci 13, 219–239 (2016).</ref> For example, Bertsimas and de Ruiter found that constructing the dual and sampling the dual uncertainty set gives better bounds and faster computation time.<ref>Bertsimas, D., de Ruiter, F. J. C. T. (2016). Duality in Two-Stage Adaptive Linear Optimization: Faster Computation and Stronger Bounds. INFORMS Journal on Computing 28(3):500-511.</ref>

The other important assumption in the given solution methodology is the fixed recourse condition, that <math>A_2(u)</math> is fixed to some matrix <math>V</math>. If <math>A_2(u)</math> is instead some affine function of <math>u</math>, then the problem is intractable even under the ADR assumption.<ref>Guslitser, E. (2002). Uncertainty-Immunized Solutions in Linear Programming (Master’s Thesis, Technion-Israel Institute of Technology).</ref> However, Ben-Tal has proposed a tight approximation method for cases where the uncertainty set <math>U</math> is the intersection of ellipsoidal sets, an approximation that becomes exact if <math>U</math> itself is an ellipsoidal set.<ref>Ben-Tal, A., Goryashko, A., Guslitzer, E. et al. Adjustable robust solutions of uncertain linear programs. Math. Program., Ser. A 99, 351–376 (2004).</ref>

==Numerical Example==
Consider a simple inventory management problem over two business periods involving one product that loses all value at the end of the second period. Let the storage cost of every unused unit of product be <math>$10</math> per business period. Let the unit price of the product be <math>$40</math> in the first business period and <math>$55</math> in the second period. Let the demand in each period be uncertain, but subject to the following information:
#Both demands are between <math>50</math> and <math>100</math> units.
#The total demand over the two periods is <math>150</math> units.
The manager must decide the quantity of product to purchase at the start of each business period so as to minimize storage and purchasing costs. If we denote the demand in the first business period <math>d</math> (so the demand in the second period is <math>150 - d</math>) and the quantity purchased in the <math>i</math>th period <math>n_i</math>, then we can formulate this as an L2ARO as follows:

<math>\begin{align} \text{min} \; &cost\\
\text{s.t.} \; &cost \geq 10(n_1-d) + 10(n_1+n_2-150) + 40n_1 + 55n_2 \;\text{ for all }d\\
&n_1 - d \geq 0 \;\text{ for all }d\\
&n_1 + n_2 \geq 150\\
&n_1 \geq 0\\
&n_2 \geq 0\\
&50 \leq d \leq 100
\end{align}</math>

The uncertain parameter is <math>d</math>, and the uncertainty set is the closed interval from <math>50</math> to <math>100</math>. The first stage decision is for <math>n_1</math> and <math>cost</math>; the second stage decision is <math>n_2</math>. Rewriting <math>n_2</math> as a function of <math>d</math> and rearranging into matrix form:

<math>\text{min}(n_1, cost, n_2(d)) \; cost</math>

<math>\text{s.t.}\;\begin{bmatrix} -60 & 1 \\ 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 0
\end{bmatrix}\begin{bmatrix}n_1 \\ cost\end{bmatrix} +
\begin{bmatrix} -65 \\ 0 \\ 1 \\ 0 \\ 1\end{bmatrix}n_2(d) \geq
\begin{bmatrix} -1500 \\ 0 \\ 150 \\ 0 \\ 0\end{bmatrix} +
\begin{bmatrix} -10 \\ 1 \\ 0 \\ 0 \\ 0\end{bmatrix}d\;\text{ for all }d
</math>

Applying the affine decision rule <math>n_2(d) = w + Wd</math>, noting that <math>w</math> and <math>W</math> are <math>1\times1</math> matrices, gives:

<math>\text{min}(n_1, cost, w, W) \; cost</math>

<math>\text{s.t.}\;\begin{bmatrix} -60 & 1 \\ 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 0
\end{bmatrix}\begin{bmatrix}n_1 \\ cost\end{bmatrix} +
\begin{bmatrix} -65 \\ 0 \\ 1 \\ 0 \\ 1\end{bmatrix}(w + Wd) \geq
\begin{bmatrix} -1500 \\ 0 \\ 150 \\ 0 \\ 0\end{bmatrix} +
\begin{bmatrix} -10 \\ 1 \\ 0 \\ 0 \\ 0\end{bmatrix}d\;\text{ for all }d
</math>

Which rearranges to:

<math>\text{min}(n_1, cost, w, W) \; cost</math>

<math>\text{s.t.}\;
\begin{bmatrix} -60 & 1 & -65 & -65d \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & d \\
1 & 0 & 0 & 0 \\ 0 & 0 & 1 & d\end{bmatrix}
\begin{bmatrix}n_1 \\ cost \\ w \\ W\end{bmatrix} \geq
\begin{bmatrix}-1500 \\ 0 \\ 150 \\ 0 \\ 0\end{bmatrix} +
\begin{bmatrix}-10 \\ 1 \\ 0 \\ 0 \\ 0\end{bmatrix}d\;
\text{ for all }d
</math>

Which is a robust linear program. Since the constraints are linear inequalities in <math>d</math> and <math>d</math> is bounded between <math>50</math> and <math>100</math>, it suffices to check the constraint only for <math>d = 50</math> and <math>d = 100</math>. Writing down the constraints for both values gives a deterministic linear program:

<math>\text{min}(n_1, cost, w, W) \; cost</math>

<math>\text{s.t.}\;
\begin{bmatrix} -60 & 1 & -65 & -3250 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 50 \\
1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 50 \\ -60 & 1 & -65 & -6500 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 100 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 100\end{bmatrix}
\begin{bmatrix}n_1 \\ cost \\ w \\ W\end{bmatrix} \geq
\begin{bmatrix}-2000 \\ 50 \\ 150 \\ 0 \\ 0 \\ -2500 \\ 100 \\ 150 \\ 0 \\ 0\end{bmatrix}
</math>

This linear program can be solved using the simplex algorithm. Its solution is <math>(n_1, cost, w, W) = (150, 7000, 0, 0)</math>, corresponding to a worst-case cost of <math>7000</math>. The solution corresponds to buying all 150 units needed for the two periods at the start of the first period, with the worst case being a demand of only 50 units in the first business period. This solution makes intuitive sense because the purchase price for the second period is <math>$15</math> more than for the first period, but storing any extra units from the first period costs <math>$10</math>, so in any case, the price increase outweighs the storage cost. This is indeed the optimal solution to the problem.
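
The endpoint argument above can be checked numerically. The Python sketch below evaluates the worst-case cost of an affine policy <math>n_2(d) = w + Wd</math> by testing only <math>d = 50</math> and <math>d = 100</math>, and then searches a coarse grid of candidate policies. The grid search is an assumed stand-in for a linear programming solver, and the function name and grid ranges are hypothetical choices made for illustration.

```python
from itertools import product


def worst_case_cost(n1, w, W, endpoints=(50, 100)):
    """Worst-case cost of the policy n2(d) = w + W*d, or None if infeasible.

    Every constraint is linear in d, so checking the two endpoints of the
    interval [50, 100] is sufficient.
    """
    worst = float("-inf")
    for d in endpoints:
        n2 = w + W * d
        if n1 < d or n1 + n2 < 150 or n1 < 0 or n2 < 0:
            return None
        # 10(n1 - d) + 10(n1 + n2 - 150) + 40 n1 + 55 n2, simplified.
        worst = max(worst, 60 * n1 + 65 * n2 - 10 * d - 1500)
    return worst


# Hypothetical coarse grids; an LP solver would search the continuous space.
n1_grid = range(0, 201, 10)
w_grid = range(-100, 201, 10)
W_grid = (-1.0, -0.5, 0.0, 0.5, 1.0)

candidates = []
for n1, w, W in product(n1_grid, w_grid, W_grid):
    cost = worst_case_cost(n1, w, W)
    if cost is not None:
        candidates.append((cost, n1, w, W))

best_cost, best_n1, best_w, best_W = min(candidates)
print("cost:", best_cost, "n1:", best_n1, "w:", best_w, "W:", best_W)
```

On this grid the search returns a worst-case cost of 7000 with <math>n_1 = 150</math> and <math>w = W = 0</math>, in agreement with the solution stated above.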

Note that the ADR approximation found the optimal solution. This is not surprising because the optimal strategy as described above does not depend on the first period demand.
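
The sampling-based lower bound described in the Algorithms and Methodology section can also be illustrated on this example. In the Python sketch below, each sampled demand gets its own best second-stage purchase, so the result is a lower bound on the true adaptive optimum. The function names and the sample sets are hypothetical, and restricting <math>n_1</math> to an integer grid is an assumption that is harmless here only because the minimizer is <math>n_1 = 150</math>.

```python
def second_stage_cost(n1, d):
    """Best total cost once d is revealed, or None if n1 cannot cover d."""
    if n1 < d:
        return None
    # Cost increases with n2, so take the smallest n2 with n1 + n2 >= 150.
    n2 = max(0, 150 - n1)
    return 60 * n1 + 65 * n2 - 10 * d - 1500


def sampled_lower_bound(samples, n1_grid=range(0, 201)):
    """Minimize over n1 the worst cost across the sampled demands only."""
    best = float("inf")
    for n1 in n1_grid:
        costs = [second_stage_cost(n1, d) for d in samples]
        if any(c is None for c in costs):
            continue  # n1 is infeasible for at least one sampled demand
        best = min(best, max(costs))
    return best


print("samples {60, 80}:     ", sampled_lower_bound([60, 80]))
print("samples {50, 75, 100}:", sampled_lower_bound([50, 75, 100]))
```

With the samples 60 and 80 the bound is 6900, which is optimistic; once the endpoints 50 and 100 are included, the bound rises to 7000 and matches the ADR upper bound, confirming that ADR is optimal for this instance.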

==Applications==
Applications of adaptive robust optimization typically involve multi-stage allocation of resources under uncertain supply or demand, including problems in energy systems, inventory management, and shift scheduling.<ref>B. Hu and L. Wu, "Robust SCUC Considering Continuous/Discrete Uncertainties and Quick-Start Units: A Two-Stage Robust Optimization With Mixed-Integer Recourse," in IEEE Transactions on Power Systems, vol. 31, no. 2, pp. 1407-1419, March 2016, doi: 10.1109/TPWRS.2015.2418158.</ref><ref>J. Warrington, C. Hohl, P. J. Goulart and M. Morari, "Rolling Unit Commitment and Dispatch With Multi-Stage Recourse Policies for Heterogeneous Devices," in IEEE Transactions on Power Systems, vol. 31, no. 1, pp. 187-197, Jan. 2016, doi: 10.1109/TPWRS.2015.2391233.</ref><ref>Chuen-Teck See, Melvyn Sim, (2010) Robust Approximation to Multiperiod Inventory Management. Operations Research 58(3):583-594.</ref><ref>Marcus Ang, Yun Fong Lim, Melvyn Sim, (2012) Robust Storage Assignment in Unit-Load Warehouses. Management Science 58(11):2114-2130.</ref><ref>Mattia, S., Rossi, F., Servilio, M., Smriglio, S. (2017). Staffing and Scheduling Flexible Call Centers by Two-Stage Robust Optimization. Omega 72:25-37.</ref>

===Energy Systems===
Energy systems aim to meet energy demand while minimizing costs. An energy system may involve multiple units that can each be turned on or off, with corresponding startup, shutdown, and operation costs. For example, a coal plant may be expensive to start up and shut down but cheap to operate, while a solar farm may be easier to start up and shut down but more difficult to maintain. Let each day be partitioned into <math>n</math> different blocks of time, and suppose the problem is to determine the optimal combination of units to run during each time block on a given day. The unit combination for the first block of time must be decided immediately, but decisions for the subsequent time blocks can wait until after learning the energy demand in preceding time blocks. However, past decisions influence future decisions: starting up the coal plant in the first time block allows it to produce cheap energy all day, potentially reducing reliance on other energy sources. Such a decision structure, where past decisions and new information guide future decisions, lends itself naturally to an ARO model. The decision is the combination of units to run in each time block, the uncertainty set <math>U</math> is the set of possible energy demands for each time block, the constraint is that the power produced meets demand for each time block, and the objective is to minimize the total startup, shutdown, and operation costs for the day. For more detailed treatments of ARO applied to energy systems, we refer the reader to the references.<ref>B. Hu and L. Wu, "Robust SCUC Considering Continuous/Discrete Uncertainties and Quick-Start Units: A Two-Stage Robust Optimization With Mixed-Integer Recourse," in IEEE Transactions on Power Systems, vol. 31, no. 2, pp. 1407-1419, March 2016, doi: 10.1109/TPWRS.2015.2418158.</ref><ref>J. Warrington, C. Hohl, P. J. Goulart and M. Morari, "Rolling Unit Commitment and Dispatch With Multi-Stage Recourse Policies for Heterogeneous Devices," in IEEE Transactions on Power Systems, vol. 31, no. 1, pp. 187-197, Jan. 2016, doi: 10.1109/TPWRS.2015.2391233.</ref>

===Inventory Management===
The inventory management problem seeks to purchase goods at regular intervals and store them such that there is always enough product on hand to satisfy demand while minimizing purchase and storage costs. Purchasing large amounts when prices are low saves on purchasing costs later on, but incurs large storage costs. At the other extreme, keeping too little inventory risks running out of stock or requiring large purchases at inconvenient times. An inventory planner who wants to plan purchases for the next <math>n</math> time blocks (for simplicity, assuming purchases take place only at the start of each time block) must immediately decide how much to purchase for the first time block, then use each past time block’s prices and demands to decide how much to buy in future time blocks. As in the case of energy systems, the inventory management problem has a staggered decision structure where past decisions and new information inform future decisions, and can be translated naturally into an ARO model. The decisions are the quantities of product to purchase at the start of each time block, the uncertainty set is the set of possible prices and demands for each of the <math>n</math> time blocks, the constraint is that the inventory has enough stock to meet demand in every time block, and the objective is to minimize purchasing and storage costs. For more detailed analyses of the inventory management problem, we again refer readers to the references.<ref>Chuen-Teck See, Melvyn Sim, (2010) Robust Approximation to Multiperiod Inventory Management. Operations Research 58(3):583-594.</ref><ref>Marcus Ang, Yun Fong Lim, Melvyn Sim, (2012) Robust Storage Assignment in Unit-Load Warehouses. Management Science 58(11):2114-2130.</ref>

===Shift Scheduling===
The shift scheduling problem involves choosing a shift for each employee such that the operation center has enough staff at all times. For example, a customer service line would ideally predict the frequency and length of calls at every hour of the day so that it employs exactly enough operators to handle them. In practice, call volume is hard to predict, and call centers often end up overstaffed or understaffed, so the company either overspends on wages or delivers unsatisfactory customer service. Mattia and colleagues note that consecutive periods of high call volume are not independent, so past call volumes help predict future call volumes, and they therefore formulated the shift scheduling problem as a two-stage ARO. In the first stage (over the weekend), employees' shifts are laid out for the workweek; in the second stage (during the week), employees are allocated to different jobs in the office. The uncertainty set is the set of possible call volume distributions through the week (represented as the deviations in the number of staff for handling all other jobs), the constraint is that enough staff is present to handle the various office jobs, and the objective is to minimize the total cost of hiring the employees for the hours they work. For more detail on this problem, we refer readers to the paper by Mattia and colleagues.<ref>Mattia, S., Rossi, F., Servilio, M., Smriglio, S. (2017). Staffing and Scheduling Flexible Call Centers by Two-Stage Robust Optimization. Omega 72:25-37.</ref>

==Conclusion==
Adaptive robust optimization models multi-stage decision making in which past decisions affect future decisions and new information is learned between decision stages. It finds less conservative solutions than traditional robust optimization without sacrificing robustness, at the expense of simplicity and computation time. Many ARO problems are computationally intractable in general, but tractable formulations have been developed for many specific applications, and the range of applications is likely to keep growing in the coming decades.

== References ==

<references />

Convex generalized disjunctive programming (GDP)

2020-12-21T11:40:30Z

Wc593:

Authors: Nicholas Schafhauser, Blerand Qeriqi, Ryan Cuppernull (SysEn 5800 Fall 2020)

== Introduction ==
Generalized disjunctive programming (GDP) involves logic propositions (Boolean variables) and sets of constraints that are chained together using the logical OR operator (<math>\lor</math>). GDP is an extension of linear disjunctive programming<ref>Balas, Egon. "Disjunctive Programming." Annals of Discrete Mathematics, 1979.</ref> that can be applied to mixed-integer nonlinear programming (MINLP). GDP<ref>Raman and Grossmann. "Modelling and Computational Techniques for Logic Based Integer Programming." Computers & Chemical Engineering, 1994.</ref> is a generalization of disjunctive convex programming in the sense that it also allows the use of logic propositions that are expressed in terms of Boolean variables. In order to take advantage of current mixed-integer nonlinear programming solvers (e.g., DICOPT<ref name=":3">GAMS. DICOPT, https://www.gams.com/latest/docs/S_DICOPT.html</ref>, SBB<ref name=":4" />, α-ECP<ref name=":5">GAMS. AlphaECP, 1995, https://www.gams.com/latest/docs/S_ALPHAECP.html</ref>, BARON<ref name=":6">BARON, 1996, https://minlp.com/baron</ref>, and Couenne<ref name=":7">Couenne, 2006, https://projects.coin-or.org/Couenne</ref>), GDPs are often reformulated as MINLPs.<ref name=":0">Ruiz, Juan P.; Grossmann, Ignacio E. (2012): A hierarchy of relaxations for nonlinear convex generalized disjunctive programming. Carnegie Mellon University. Journal contribution. <nowiki>https://doi.org/10.1184/R1/6466535.v1</nowiki> </ref>
[[File:GDP Intro.jpg|none|thumb|523x523px|Figure 1: Generalized Disjunctive Programming Methods<ref>Grossmann, Ignacio E.: Overview of Generalized Disjunctive Programming. Carnegie Mellon University. https://www.minlp.org/pdf/GBDEWOGrossmann.pdf</ref>]]

== Theory ==
The general form of an MINLP model is as follows:

<math>\begin{align} \min \quad & z=f(x,y)\\
\text{s.t.} \quad & g(x,y) \leq 0\\
& x \in X\\
& y \in Y
\end{align}</math>

where <math>f(x,y)</math> and <math>g(x,y)</math> are twice-differentiable functions, <math>x</math> are the continuous variables, and <math>y</math> are the discrete variables. Three main types of subproblems arise from the MINLP: the continuous relaxation, the NLP subproblem for a fixed <math>y^p</math>, and the feasibility problem.

==== Continuous Relaxation ====
The continuous relaxation subproblem takes the form

<math>\begin{align} \min \quad & z=f(x,y)\\
\text{s.t.} \quad & g(x,y) \leq 0\\
& x \in X\\
& y \in Y_R
\end{align}</math>

where <math>Y_R</math> is the continuous relaxation of <math>Y</math>. Note that in this subproblem all of the integer variables <math>y</math> are treated as continuous. When the relaxation is feasible, its optimal value is a lower bound on the optimal value of the MINLP.<ref name=":2">Grossmann, Ignacio. Review of Mixed-Integer Nonlinear and Generalized Disjunctive Programming Applications in Process Systems Engineering.</ref>

==== NLP Subproblem for a fixed <math>y^p</math> ====
The subproblem for a fixed <math>y^p</math> takes the form

<math>\begin{align} \min \quad & z=f(x,y^p)\\
\text{s.t.} \quad & g(x,y^p) \leq 0\\
& x \in \Re^n
\end{align}</math>

When this subproblem has a feasible solution, its optimal value is an upper bound on the optimal value of the MINLP. Fixing some of the integer variables and continuously relaxing the others therefore brackets the optimal value between a lower and an upper bound.<ref name=":2" />
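
The bounding roles of the two subproblems can be illustrated with a minimal sketch on a toy convex MINLP. The objective, the bounds on <code>x</code>, and the integer set {0, 1, 2} are hypothetical and chosen so that each subproblem can be solved without an external solver.

```python
def f(x, y):
    """Hypothetical convex objective with continuous x and integer y."""
    return (x - y) ** 2 + (y - 1.4) ** 2


def solve_nlp(y):
    """NLP subproblem for fixed y: minimize f over x in [0, 2] (closed form)."""
    x = min(max(y, 0.0), 2.0)   # the minimizer x = y, clipped to the bounds
    return f(x, y)


def solve_relaxation(lo=0.0, hi=2.0, iters=200):
    """Continuous relaxation: y may take any real value in [lo, hi].

    The reduced objective is convex in y, so ternary search finds its minimum.
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if solve_nlp(m1) < solve_nlp(m2):
            hi = m2
        else:
            lo = m1
    return solve_nlp((lo + hi) / 2)


lower_bound = solve_relaxation()                  # relaxed y, about 0
upper_bound = solve_nlp(1)                        # y fixed at 1, about 0.16
optimum = min(solve_nlp(y) for y in (0, 1, 2))    # enumeration, about 0.16
print(lower_bound, optimum, upper_bound)
```

Here the relaxation picks the fractional value y = 1.4, so its value lies below the true optimum, while any fixed integer y gives a value at or above it.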

==== Feasibility Problem ====

When the NLP subproblem for a fixed <math>y^p</math> is not feasible, the following feasibility problem is considered.

<math>\begin{align} \min \quad & u\\
\text{s.t.} \quad & g_j(x,y^p) \leq u \quad j \in J\\
& x \in \Re^n\\
& u \in \Re
\end{align}</math>

where <math>J</math> is the index set of the inequality constraints and <math>u</math> is a scalar that bounds the constraint violation. The feasibility problem minimizes the infeasibility of the solution by reducing the violation of the most violated constraints.<ref name=":2" />

==== GDP ====
GDP provides a high-level framework for modeling and solving mixed-integer nonlinear programs. By providing a methodology for converting disjunctive problems into MINLPs, it puts the problem in a form that current solvers and algorithms can handle. These methodologies can address both convex and nonconvex problems. A GDP is convex when <math>f(x)</math>, <math>g(x)</math>, and the constraint functions inside the disjunctions are convex. A function is convex if the line segment joining any two points on its graph lies on or above the graph. Convexity allows simple relaxations and approximations, which lead to faster solution methods.<ref>Grossmann, Ignacio. Review of Mixed-Integer Nonlinear and Generalized Disjunctive Programming Applications in Process Systems Engineering.</ref>

== Methodology ==

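Below is a GDP problem that will be used for demonstration purposes in this section.

<math>\begin{align} \min \quad & z=f(x)\\
\text{s.t.} \quad & g(x) \leq 0\\
& \bigvee_{i \in D_k} \begin{bmatrix} Y_{ki} \\
r_{ki}(x) \leq 0
\end{bmatrix} \quad k \in K \\
& \underline{\bigvee}_{i \in D_k} Y_{ki} \quad k \in K\\
& \Omega(Y)=\text{True}\\
& x^{lo} \leq x \leq x^{up}\\
& x \in \Re^n\\
& Y_{ki} \in \{\text{True},\text{False}\} \quad k \in K, i \in D_k \end{align}</math>

Here each disjunction <math>k \in K</math> contains the terms <math>i \in D_k</math>. When the Boolean variable <math>Y_{ki}</math> is True, the constraints <math>r_{ki}(x) \leq 0</math> are enforced. The exclusive OR requires exactly one term of each disjunction to be selected, and <math>\Omega(Y)=\text{True}</math> collects the logic propositions that relate the Boolean variables.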

The two most common ways of reformulating a GDP problem into an MINLP are the Big-M (BM) reformulation and the Hull Reformulation (HR). BM is the simpler of the two, while HR yields a tighter continuous relaxation (a smaller relaxed feasible region), which often leads to faster solution times.<ref>Trespalacios, Francisco; Grossmann, Ignacio E. (2018): Improved Big-M Reformulation for Generalized Disjunctive Programs. Carnegie Mellon University. Journal contribution. <nowiki>https://doi.org/10.1184/R1/6467063.v1</nowiki> </ref>

Below is the GDP problem from above reformulated into an MINLP using the BM method.

<math>\begin{align} \min \quad & z=f(x)\\
\text{s.t.} \quad & g(x) \leq 0\\
& r_{ki}(x) \leq M^{ki}(1-y_{ki})\quad k \in K,i \in D_k\\
& \sum_{i \in D_k} y_{ki} = 1\quad k \in K\\
& Hy \geq h\\
& x^{lo} \leq x \leq x^{up}\\
& x \in \Re^n\\
& y_{ki} \in \{0,1\} \quad k \in K, i \in D_k \end{align}</math>

Notice that the Boolean variables <math>Y_{ki}</math> of the original GDP have been replaced by binary variables <math>y_{ki} \in \{0,1\}</math>. The logic relations <math>\Omega(Y)=\text{True}</math> have also been converted into linear integer constraints <math>Hy \geq h</math>.<ref name=":0" /> When <math>y_{ki}=1</math> the constraint <math>r_{ki}(x) \leq 0</math> is enforced, and when <math>y_{ki}=0</math> a sufficiently large <math>M^{ki}</math> makes it redundant.

This MINLP reformulation can now be passed to standard MINLP solvers to calculate a solution.
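
As a minimal sketch of the BM constraints, consider a hypothetical one-dimensional disjunction that is not part of the source example: either <math>x \leq 3</math> or <math>x \geq 7</math>, with <math>0 \leq x \leq 10</math>. The function name <code>bm_feasible</code> and all numbers are assumptions made for illustration.

```python
X_LO, X_UP = 0.0, 10.0      # hypothetical bounds on x


def r1(x):
    return x - 3.0          # term 1 enforces x <= 3


def r2(x):
    return 7.0 - x          # term 2 enforces x >= 7


# Smallest valid M for each term: the largest value r can take over the bounds.
M1 = r1(X_UP)               # 7.0
M2 = r2(X_LO)               # 7.0


def bm_feasible(x, y1, y2, m1=M1, m2=M2, tol=1e-9):
    """Check the Big-M constraints r_i(x) <= M_i * (1 - y_i) and y1 + y2 = 1."""
    return (X_LO - tol <= x <= X_UP + tol
            and abs(y1 + y2 - 1.0) <= tol
            and r1(x) <= m1 * (1.0 - y1) + tol
            and r2(x) <= m2 * (1.0 - y2) + tol)


print(bm_feasible(2.0, 1, 0))      # True: term 1 is active and x <= 3
print(bm_feasible(5.0, 1, 0))      # False: term 1 is active but x > 3
print(bm_feasible(8.0, 0, 1))      # True: term 2 is active and x >= 7
print(bm_feasible(5.0, 0.5, 0.5))  # True: relaxed y admits a point in neither term
```

The last call shows why the size of <math>M</math> matters: once the binaries are relaxed, points that satisfy neither term can become feasible, and a larger <math>M</math> admits more of them.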

The same GDP will now be reformulated into an MINLP using the HR method.

<math>\begin{align} \min \quad & z=f(x)\\
\text{s.t.} \quad & g(x) \leq 0\\
& x = \sum_{i \in D_k} v^{ki}\quad k \in K\\
& y_{ki}r_{ki}(v^{ki}/y_{ki}) \leq 0\quad k \in K, i \in D_k\\
& \sum_{i \in D_k} y_{ki} = 1\quad k \in K\\
& Hy \geq h\\
& x^{lo}y_{ki} \leq v^{ki} \leq x^{up}y_{ki}\quad k \in K, i \in D_k\\
& x \in \Re^n\\
& y_{ki} \in \{0,1\} \quad k \in K, i \in D_k
\end{align}</math>

Here <math>v^{ki}</math> are disaggregated copies of <math>x</math>, one for each disjunctive term. HR therefore requires significantly more variables and constraints than the BM reformulation of the same problem. The resulting reduction in solution time can nevertheless be worth giving up the simplicity of BM.<ref>Trespalacios, Francisco; Grossmann, Ignacio E. (2015): Algorithmic Approach for Improved Mixed-Integer Reformulations of Convex Generalized Disjunctive Programs. Carnegie Mellon University. Journal contribution. <nowiki>https://doi.org/10.1184/R1/6466700.v1</nowiki> </ref>
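
The HR constraints can be sketched on a hypothetical one-dimensional disjunction (either <math>x \leq 3</math> or <math>x \geq 7</math>, with <math>0 \leq x \leq 10</math>), which is not part of the source example. For these linear terms the perspective constraints <math>y\,r(v/y) \leq 0</math> simplify to <math>v^{1} \leq 3y_1</math> and <math>v^{2} \geq 7y_2</math>, so no division by <math>y</math> is needed. The helper names are assumptions made for illustration.

```python
X_LO, X_UP = 0.0, 10.0      # hypothetical bounds on x


def hr_x_range(y1, y2):
    """Range of x = v1 + v2 allowed by the hull constraints for given y.

    Term 1 (x <= 3):  X_LO*y1 <= v1 <= X_UP*y1  and  v1 <= 3*y1
    Term 2 (x >= 7):  X_LO*y2 <= v2 <= X_UP*y2  and  v2 >= 7*y2
    """
    v1_lo, v1_up = X_LO * y1, min(X_UP * y1, 3.0 * y1)
    v2_lo, v2_up = max(X_LO * y2, 7.0 * y2), X_UP * y2
    return v1_lo + v2_lo, v1_up + v2_up


def hr_feasible(x, y1, y2, tol=1e-9):
    """Check whether (x, y1, y2) satisfies the hull reformulation."""
    if abs(y1 + y2 - 1.0) > tol:
        return False
    lo, up = hr_x_range(y1, y2)
    return lo - tol <= x <= up + tol


print(hr_x_range(1, 0))            # (0.0, 3.0): exactly term 1
print(hr_x_range(0, 1))            # (7.0, 10.0): exactly term 2
print(hr_x_range(0.5, 0.5))        # (3.5, 6.5): relaxed y
print(hr_feasible(9.0, 0.5, 0.5))  # False
```

For comparison, a Big-M constraint with a loose value such as <math>M = 100</math> would accept <math>x = 9</math> at <math>y = (0.5, 0.5)</math>, since <math>9 - 3 \leq 50</math> and <math>7 - 9 \leq 50</math>, whereas the hull constraints reject it. This is the sense in which the HR relaxation is tighter.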

==== Solvers ====

* DICOPT<ref name=":3" />
* SBB<ref name=":4">GAMS. ''SBB'', 2020, www.gams.com/latest/docs/S_SBB.html.</ref>
* BARON<ref name=":6" />
* Couenne<ref name=":7" />

== Numerical Example ==
The following example was taken from the paper titled ''Generalized Disjunctive Programming: A Framework For Formulation and Alternative Algorithms For MINLP Optimization''.<ref name=":1">Ruiz, Juan P.; Grossmann, Ignacio E.: Generalized Disjunctive Programming: A Framework For Formulation And Alternative Algorithms For MINLP Optimization. Carnegie Mellon University. http://egon.cheme.cmu.edu/Papers/IMAGrossmannRuiz.pdf</ref>

[[File:GDP numeric example 3.png|frameless|600x600px]]

[[File:GDP numeric example 4.png|alt=http://egon.cheme.cmu.edu/Papers/IMAGrossmannRuiz.pdf|frameless|661x661px]]

[[File:GDP numeric example 5.png|alt=http://egon.cheme.cmu.edu/Papers/IMAGrossmannRuiz.pdf|frameless|600x600px]]

== Applications ==
GDP formulations are useful for real-world applications where multiple branches are available when making decisions. Solving the GDP in these instances will allow the user to calculate which decisions should be made at each branching point in order to get the optimal solution. This disjunctive formulation is common in complex chemical reactions and production planning.
[[File:Process network example.png|none|thumb|600x600px|Figure 2: Process Network Example. Each decision point represents another disjunctive set. <ref name=":1" />]]
The process network in Figure 2 shows multiple sequences of decisions that all lead to the goal (B) in a chemical reaction. This problem can be formulated as a GDP in order to determine which route maximizes the profit.
[[File:GDP numeric example 1.png|none|thumb|600x600px|Figure 3: A more complex process network.<ref name=":1" />]]
This same idea can be scaled to larger problems with more complex branching. Figure 3 illustrates a larger process network and all of its decision points. This problem can also be formulated as a GDP so that the optimal route through the network can be calculated.
== Conclusion ==
GDP is a modeling framework that applies disjunctive programming to MINLP problems. It facilitates modeling discrete and continuous optimization problems by combining algebraic constraints and logic expressions. A GDP formulation consists of Boolean and continuous variables together with disjunctions and logic propositions. In the case of convex functions, GDPs can be reformulated as MINLPs using the BM and HR methods. Other solution approaches include logic-based methods such as disjunctive branch and bound and decomposition. Once a GDP is reformulated into a standard MINLP, standard MINLP solvers, such as DICOPT<ref name=":3" />, SBB<ref name=":4" />, α-ECP<ref name=":5" /> and BARON<ref name=":6" />, can be used to determine optimal solutions.<ref name=":0" /> The GDP method has important applications that include the optimization of complex chemical reactions and process planning.

== References ==
<references />

Fuzzy programming

2020-12-21T11:40:08Z

Wc593:

Authors: Kyle Clark, Matt Schweider, Tommy Sheehan, Jarred Melancon (SysEn 5800 Fall 2020)

== Introduction ==
Fuzzy Programming is an optimization approach for problems that involve uncertainty. It is used when a system's performance criteria, parameters, and decision variables cannot be specified exactly. Specifically, the truth values associated with the system can be completely false (0), completely true (1), or some value between the two extremes. This aims to capture the concept of partial truth. One approach to account for uncertainty in a system is to model the uncertainty using probability distributions, also known as statistical analysis. However, uncertainty is sometimes described using qualitative adjectives, or 'Fuzzy' statements, such as young or old and hot or cold, because exact boundaries do not necessarily exist [1].

Fuzzy Programming is built on the concept of Fuzzy Logic. The motivation for Fuzzy Logic, or more precisely Fuzzy Set Theory, is to accurately model and represent real world data which is often 'Fuzzy' due to uncertainty. This uncertainty can be introduced into a system by a number of factors such as imprecision in measurement tools or due to the use of vague language [2].

== Fuzzy Logic ==
While Boolean Logic is used to describe situations as completely true or completely false, Fuzzy Logic allows for a mathematical representation of partial truth or partial falsehood. Rather than having strict criteria for defining what is part of the set and what is not (e.g. hot or cold, young or old), we allow data to have a degree of membership (u) to each set. A membership function defines how each input value is mapped to a degree of membership (u) between the two extremes, 0 and 1. Membership functions can be several different types of functions. However, they are often Piece-Wise Linear Functions [3]. Below is an example of an L-Function.

<math>u_A(x) = \begin{cases} 0, & x < a \\ \frac{x-a}{b-a}, & a\leq x \leq b \\ 1, & x>b \end{cases}</math>

For instance, suppose that we have a set of values that describe temperatures over the course of a week. In Boolean logic, we could create two sets, a cold set and a hot set. We could say that temperatures in [0°F, 60°F) belong to the cold set and temperatures in [60°F, 100°F] belong to the hot set. However, it is not very accurate to say that 60°F is cold but 60.1°F is hot. Instead, we could use Fuzzy Logic to describe temperatures from 0°F to 50°F as not hot (u=0). As temperatures increase above 50°F, they are given a higher degree of membership (u > 0) to describe that they are "hotter" or warmer. Lastly, temperatures above 70°F are definitely hot (u=1) [3].
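
The temperature example can be written as a minimal sketch of a piecewise linear membership function. The function names are assumptions made for illustration; the breakpoints a = 50°F and b = 70°F come from the example above.

```python
def ramp_membership(x, a, b):
    """Degree of membership that rises linearly from 0 at a to 1 at b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


def hot(temp_f):
    """Degree to which a temperature in degrees Fahrenheit is 'hot'."""
    return ramp_membership(temp_f, 50.0, 70.0)


print(hot(40.0))   # 0.0: not hot
print(hot(60.0))   # 0.5: partially hot
print(hot(80.0))   # 1.0: definitely hot
```

Unlike the Boolean split at 60°F, the degree of membership changes only slightly between 60°F and 60.1°F.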

== Flexible Mathematical Programming Method ==
An optimization technique used to implement Fuzzy Programming is Flexible Mathematical Programming. This kind of problem takes on the form of

<math>\begin{cases} \widetilde{\min} \ f(x) \\ \text{s.t.} \ g_i(x) \lesssim b_i; \ i=1,...,m \\ x \in X = \{ x \in \reals^2 | x \geq 0 \} \end{cases}</math>

where the "~" conveys the concept that the objective statement and constraints have some freedom in how they are satisfied. This approach is useful when strict satisfaction of the constraints creates an empty feasible set. Relaxing the constraints with the "~" allows for maneuverability within the potential solutions.

An easier way to represent the constraints is through the use of membership functions which are fuzzy sets of <math>\reals</math>.

<math>u_i(x) = 0 \qquad \ \ \ \text{if} \ g_i(x) > b_i + d_i</math>

<math>u_i(x) \in (0,1) \quad \text{if} \ b_i < g_i(x) \leq b_i + d_i</math>

<math>u_i(x) = 1 \qquad \ \ \text{if} \ g_i(x) \leq b_i</math>

where <math>d_i \ (i = 1,...,m)</math> is the tolerance of constraint <math>i</math>, that is, the largest amount by which its right-hand side <math>b_i</math> may be exceeded. The above membership functions measure the degree to which each constraint is satisfied. If <math>u_i(x) = 1</math>, then the constraint is fully satisfied. If <math>u_i(x) = 0</math>, then the constraint is violated by more than its tolerance. The in-between case of <math>u_i(x) \in (0,1)</math> allows for partial violation of a constraint. The values of <math>d_i \ (i = 1,...,m)</math> can be carefully selected to create constraints that allow for the desired amount of flexibility.

The above membership functions can be combined into a single piecewise function like the function shown within the Fuzzy Logic section of this page.

<math>u_i(x) = \begin{cases} 1, & \text{if} \ g_i(x) \leq b_i \\ 1- \frac{g_i(x)-b_i}{d_i}, & \text{if} \ b_i < g_i(x) \leq b_i + d_i \\ 0, & \text{if} \ g_i(x) > b_i + d_i \end{cases}</math>

The optimal solution is then the value of <math>x</math> that gives the highest overall degree of membership, where the overall membership is the smallest of the memberships of the individual fuzzy constraints: <math>\max_{x \in X} \ u_D(x), \quad u_D(x) = \min_i \ u_i(x)</math> [4].
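
A minimal sketch of this max-min rule is given below. The two constraints are hypothetical: <math>x \lesssim 4</math> and <math>-x \lesssim -5</math> (that is, <math>x</math> should be at least about 5), each with tolerance <math>d_i = 2</math>. Taken as crisp constraints they have no common solution, which is the situation flexible programming is meant to handle. A simple grid search stands in for a proper solver.

```python
def constraint_membership(g_value, b, d):
    """Membership of a flexible constraint g(x) <= b with tolerance d."""
    if g_value <= b:
        return 1.0
    if g_value > b + d:
        return 0.0
    return 1.0 - (g_value - b) / d


def decision_membership(x, constraints):
    """Overall membership: the smallest membership over all constraints."""
    return min(constraint_membership(g(x), b, d) for g, b, d in constraints)


# Hypothetical flexible constraints, each given as (g, b, d).
constraints = [
    (lambda x: x, 4.0, 2.0),     # x should be at most about 4
    (lambda x: -x, -5.0, 2.0),   # x should be at least about 5
]

# Grid search over x in [0, 10] with step 0.01.
candidates = [i / 100 for i in range(1001)]
best_x = max(candidates, key=lambda x: decision_membership(x, constraints))
best_u = decision_membership(best_x, constraints)
print(best_x, best_u)   # 4.5 0.75
```

The best compromise violates each constraint by a quarter of its tolerance, so both memberships equal 0.75.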

== Applications ==
Fuzzy Programming can be applied in a number of fields including media selection in advertising, automated braking in cars, water resource management, and control systems in HVAC systems. HVAC (Heating, Ventilation, and Air Conditioning) systems are used to maintain a comfortable environment within a building, such as an office or school. The system works to maintain a certain temperature and humidity according to a set schedule, and it monitors the environment closely to detect when it has reached a set point. Fuzzy programming is applied to the control systems to make them more cost-efficient, as HVAC systems can be expensive to run. A system that aims for one specific temperature may repeatedly turn the heating or air conditioning on and off in order to reach and maintain that exact degree, which can waste a lot of energy. Working with a range of temperatures instead gives the system more flexibility as to when a certain subsystem needs to turn on. Compared to the traditional PID (Proportional, Integral, Derivative) controller, the application of Fuzzy programming has been shown to be a more efficient way to run these systems [4].

== Example ==

An example that showcases fuzzy logic can be described by a simple water allocation problem [1]. Suppose that three firms each wish to receive a certain amount of water from the flow of a river. Each firm derives its own benefit from its water allocation, and the total amount of water allocated to the firms cannot exceed the flow of the river, Q.
[[File:Tps94 water allocation.png|thumb|712x712px|Water Allocation Scenario|alt=|center]]

Our goal in this problem is to maximize the total benefit, <math>TB(X)</math>, that the three firms obtain from water allocated from a single source, in this case a river. Therefore we get this optimization problem:

<math display="inline"> max \ \ TB(X)=(6x_1-x_1^2)+(7x_2-1.5x_2^2)+(8x_3-0.5x_3^2)</math>

<math>s.t. \ x_1+x_2+x_3\leq K</math>

<math display="inline">x_i \geq 0 \ \ i=1,2,3 </math>

As mentioned before, the total allocation of water to the three firms cannot exceed the total amount of water available, <math>Q</math>. Part of that flow, <math>R</math>, has to remain in the river, so the amount of water that can be allocated to the firms is <math>K = Q-R</math>. For our case, we will assume that <math>K = 6</math>. Thus our optimization problem becomes:

<math display="inline"> max \ \ TB(X)=(6x_1-x_1^2)+(7x_2-1.5x_2^2)+(8x_3-0.5x_3^2)</math>

<math display="inline">s.t. \ x_1+x_2+x_3\leq 6</math>

<math display="inline">x_i \geq 0 \ \ i=1,2,3 </math>

With that constraint, the optimal solution will be <math>x_1=1,x_2=1, x_3=4, </math> giving a value of <math>TB(X)=34.5 </math>.
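
One way to check this solution, sketched here under the assumption that the water constraint is binding and all allocations are positive, is to require the marginal benefit of every firm to equal a common multiplier <math>\lambda</math>:

<math>6-2x_1 = 7-3x_2 = 8-x_3 = \lambda, \qquad x_1+x_2+x_3 = 6</math>

<math>\frac{6-\lambda}{2}+\frac{7-\lambda}{3}+(8-\lambda) = 6 \ \Rightarrow \ \lambda = 4, \quad (x_1,x_2,x_3) = (1,1,4)</math>

<math>TB(X) = (6-1)+(7-1.5)+(32-8) = 34.5</math>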

The problem depicted above was an example of a crisp problem where we knew the exact value for the limit of water to allocate. However, in the real world, we don't always have exact values; therefore, we can apply fuzzy logic to make the problem more realistic.

A fuzzy variant of this model expresses the goal of maximizing the firms' total benefit as a membership function. The total benefit is divided by 49.17, the maximum of <math>TB(X)</math> when the water constraint is ignored (attained at <math>x_1=3, x_2=7/3, x_3=8</math>), so that the membership function takes values up to 1:

<math>m(X)= [(6x_1 - x_1^2) +(7x_2- 1.5x_2^2)+ (8x_3- 0.5x_3^2)]/ 49.17</math>

This has the same constraint on the total water as the crisp version, <math display="inline">x_1+x_2+x_3\leq 6</math>. The optimal solution is therefore the same as in the crisp problem, and the degree of satisfaction is <math>m(X)=0.7</math>. However, things begin to change when the total amount of water becomes '''more or less 6 units''' instead of a crisp 6. The phrase "more or less 6" is where fuzzy logic enters, implying that the limit is around 6 rather than exactly 6. We can therefore describe the constraint with a membership function built around the values (5, 6, 7), which takes values between 0 and 1.

Adjusting the membership with these values yields the membership function:

<math>m_C(X) = \begin{cases} 1, & \text{if} \ x_{1}+x_{2}+x_{3}\leq 5\\ \frac{7-(x_{1}+x_{2}+x_{3})}{2}, & \text{if} \ 5 < x_{1}+x_{2}+x_{3}\leq7 \\ 0, & \text{if} \ x_{1}+x_{2}+x_{3} > 7 \end{cases}</math>

Thus the overall optimization problem becomes a max-min problem in which we maximize the smaller of the two membership values <math>m_G(X)</math> and <math>m_C(X)</math>:

<math>m_G(X)=[(6x_1-x_1^2)+(7x_2-1.5x_2^2)+(8x_3-0.5x_3^2)]/49.17
</math>

<math>m_C(X)=[7-(x_1+x_2+x_3)]/2</math>

This results in <math>x_1=0.91, x_2=0.94, x_3=3.81, m(X)=0.67</math>, and the total benefit being <math>TB(X)=33.1</math>.
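
These values can be reproduced with a minimal sketch, given here as one possible way to solve the max-min problem rather than the method used in the source. It assumes that, for any fixed total amount of water, the best split equalizes the firms' marginal benefits. Since <math>m_G</math> rises and <math>m_C</math> falls as the total grows from 5 to 7, the max-min solution is where the two memberships cross, which bisection finds. The helper names are assumptions made for illustration.

```python
TB_MAX = 9.0 + 49.0 / 6.0 + 32.0   # unconstrained maximum of TB, about 49.17


def allocate(total):
    """Benefit-maximizing split of `total` units (equal marginal benefits).

    Marginal benefits 6 - 2*x1, 7 - 3*x2 and 8 - x3 all equal lam.
    Valid while lam stays in [0, 6], which covers totals from 5 to 7.
    """
    lam = (40.0 / 3.0 - total) * 6.0 / 11.0
    return (6.0 - lam) / 2.0, (7.0 - lam) / 3.0, 8.0 - lam


def total_benefit(x1, x2, x3):
    return (6 * x1 - x1 ** 2) + (7 * x2 - 1.5 * x2 ** 2) + (8 * x3 - 0.5 * x3 ** 2)


def m_goal(total):
    """Goal membership m_G for the best split of `total`."""
    return total_benefit(*allocate(total)) / TB_MAX


def m_constraint(total):
    """Constraint membership m_C for 'more or less 6 units'."""
    return min(1.0, max(0.0, (7.0 - total) / 2.0))


# Bisection on the total allocation for the crossing of m_G and m_C.
lo, hi = 5.0, 7.0
for _ in range(60):
    mid = (lo + hi) / 2.0
    if m_goal(mid) < m_constraint(mid):
        lo = mid
    else:
        hi = mid

total = (lo + hi) / 2.0
x1, x2, x3 = allocate(total)
membership = m_constraint(total)
benefit = total_benefit(x1, x2, x3)
print(round(x1, 2), round(x2, 2), round(x3, 2))
print(round(membership, 2), round(benefit, 1))
```

Calling <code>allocate(6.0)</code> recovers the crisp solution <math>x_1=1, x_2=1, x_3=4</math> with a total benefit of 34.5.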

This result accounts for some of the uncertainty in our assumptions. Such uncertainty is common in real-world problems, which is why fuzzy programming is used in many real-world applications and in controllers for various systems.

== Conclusion ==
The optimization technique of Fuzzy Programming is useful when qualitative adjectives are the only available descriptors for a system's performance criteria, parameters, and decision variables. It is a vital tool that can help characterize and solve optimization models in the presence of uncertainty, which is very common in the real world. Fuzzy Programming is built on Fuzzy Logic, which allows for a mathematical representation of partial truth or partial falsehood rather than a strict Boolean Logic system. Incorporating this partial state introduces flexibility, or "fuzziness", into the problem so that it can better handle the imprecision and unknowns we encounter in real-world data. Instead of fixed categories, we define degrees of membership through membership functions over ranges of values within our Fuzzy Set. Therefore, instead of the basic black and white scenario, we also consider the gray area between the two sets. Strict sets and precise measurements are nearly impossible to find in the real world, so Fuzzy Programming is essential for obtaining optimal solutions that accurately relate to real-world situations. The versatility that it provides is why it is widely used in controllers across many industries, from HVAC systems to automated braking systems.

== References ==
[1] Daniel P. Loucks, Eelco van Beek, Jery R. Stedinger, Jozef P.M. Dijkman, Monique T. Villars, [https://ecommons.cornell.edu/bitstream/handle/1813/2804/05_chapter05.pdf?sequence=16&isAllowed=y ''Water Resources Systems Planning and Management: An Introduction to Methods, Models, and Applications''], UNESCO, p.135-142, 2005.

[2] Nitin A. Bansod, Vaishali Kulkerni and S.H. Paul, [https://books.google.com/books?id=IkajJC9iGxMC&pg=PA73#v=onepage&q&f=false ''Soft Computing-A Fuzzy Logic Approach''], Bharati Vidyapeeth College of Engineering, p.73-74, 2005.

[3] MathWorks, (2020). ''Foundations of Fuzzy Logic'', Retrieved November 6th, 2020 from https://www.mathworks.com/help/fuzzy/foundations-of-fuzzy-logic.html

[4] M.K. Luhandjula, [http://www.worldacademicunion.com/journal/jus/jusVol01No2paper03.pdf ''Fuzzy Mathematical Programming: Theory, Applications, and Extension''], University of South Africa Department of Decision Sciences, Journal of Uncertain Systems, Vol.1, No.2, p.124-136, 2007.

Convex generalized disjunctive programming (GDP)

2020-12-21T11:39:42Z

Wc593:

Authors: Nicholas Schafhauser, Blerand Qeriqi, Ryan Cuppernull (SysEn 5800 Fall 2020)

== Introduction ==
Generalized disjunctive programming (GDP) involves logic propositions (Boolean variables) and sets of constraints that are chained together using the logical OR operator (<math>\lor</math>). GDP is an extension of linear disjunctive programming<ref>Balas, Egon. "Disjunctive Programming." Annals of Discrete Mathematics, 1979.</ref> that can be applied to mixed-integer nonlinear programming (MINLP). GDP<ref>Raman and Grossmann. "Modelling and Computational Techniques for Logic Based Integer Programming." Computers & Chemical Engineering, 1994.</ref> is a generalization of disjunctive convex programming in the sense that it also allows the use of logic propositions that are expressed in terms of Boolean variables. In order to take advantage of current mixed-integer nonlinear programming solvers (e.g., DICOPT<ref name=":3">GAMS. DICOPT, https://www.gams.com/latest/docs/S_DICOPT.html</ref>, SBB<ref name=":4" />, α-ECP<ref name=":5">GAMS. AlphaECP, 1995, https://www.gams.com/latest/docs/S_ALPHAECP.html</ref>, BARON<ref name=":6">BARON, 1996, https://minlp.com/baron</ref>, and Couenne<ref name=":7">Couenne, 2006, https://projects.coin-or.org/Couenne</ref>), GDPs are often reformulated as MINLPs.<ref name=":0">Ruiz, Juan P.; Grossmann, Ignacio E. (2012): A hierarchy of relaxations for nonlinear convex generalized disjunctive programming. Carnegie Mellon University. Journal contribution. <nowiki>https://doi.org/10.1184/R1/6466535.v1</nowiki> </ref>
[[File:GDP Intro.jpg|none|thumb|523x523px|Figure 1: Generalized Disjunctive Programming Methods<ref>Grossmann, Ignacio E.: Overview of Generalized Disjunctive Programming. Carnegie Mellon University. https://www.minlp.org/pdf/GBDEWOGrossmann.pdf</ref>]]

== Theory ==
The general form of an MINLP model is as follows:

<math>\begin{align} \min \quad & z=f(x,y)\\
\text{s.t.} \quad & g(x,y) \leq 0\\
& x \in X\\
& y \in Y
\end{align}</math>

where <math>f(x,y)</math> and <math>g(x,y)</math> are twice-differentiable functions, <math>x</math> are the continuous variables, and <math>y</math> are the discrete variables. Three main types of subproblems arise from the MINLP: the continuous relaxation, the NLP subproblem for a fixed <math>y^p</math>, and the feasibility problem.

==== Continuous Relaxation ====
The continuous relaxation subproblem takes the form

<math>\begin{align} \min \quad & z=f(x,y)\\
\text{s.t.} \quad & g(x,y) \leq 0\\
& x \in X\\
& y \in Y_R
\end{align}</math>

where <math>Y_R</math> is the continuous relaxation of <math>Y</math>. Note that in this subproblem all of the integer variables <math>y</math> are treated as continuous. When the relaxation is feasible, its optimal value is a lower bound on the optimal value of the MINLP.<ref name=":2">Grossmann, Ignacio. Review of Mixed-Integer Nonlinear and Generalized Disjunctive Programming Applications in Process Systems Engineering.</ref>

==== NLP Subproblem for a fixed <math>y^p</math> ====
The subproblem for a fixed <math>y^p</math> takes the form

<math>\begin{align} \min \quad & z=f(x,y^p)\\
\text{s.t.} \quad & g(x,y^p) \leq 0\\
& x \in \Re^n
\end{align}</math>

When this subproblem has a feasible solution, its optimal value is an upper bound on the optimal value of the MINLP. Fixing some of the integer variables and continuously relaxing the others therefore brackets the optimal value between a lower and an upper bound.<ref name=":2" />

==== Feasibility Problem ====

When the NLP subproblem for a fixed <math>y^p</math> is not feasible, the following feasibility problem is considered.

<math>\begin{align} \min \quad & u\\
\text{s.t.} \quad & g_j(x,y^p) \leq u \quad j \in J\\
& x \in \Re^n\\
& u \in \Re
\end{align}</math>

where <math>J</math> is the index set of the inequality constraints and <math>u</math> is a scalar that bounds the constraint violation. The feasibility problem minimizes the infeasibility of the solution by reducing the violation of the most violated constraints.<ref name=":2" />

==== GDP ====
GDP provides a high-level framework for modeling and solving mixed-integer nonlinear programs. By providing a methodology for converting disjunctive problems into MINLPs, it puts the problem in a form that current solvers and algorithms can handle. These methodologies can address both convex and nonconvex problems. A GDP is convex when <math>f(x)</math>, <math>g(x)</math>, and the constraint functions inside the disjunctions are convex. A function is convex if the line segment joining any two points on its graph lies on or above the graph. Convexity allows simple relaxations and approximations, which lead to faster solution methods.<ref>Grossmann, Ignacio. Review of Mixed-Integer Nonlinear and Generalized Disjunctive Programming Applications in Process Systems Engineering.</ref>

== Methodology ==

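Below is a GDP problem that will be used for demonstration purposes in this section.

<math>\begin{align} \min \quad & z=f(x)\\
\text{s.t.} \quad & g(x) \leq 0\\
& \bigvee_{i \in D_k} \begin{bmatrix} Y_{ki} \\
r_{ki}(x) \leq 0
\end{bmatrix} \quad k \in K \\
& \underline{\bigvee}_{i \in D_k} Y_{ki} \quad k \in K\\
& \Omega(Y)=\text{True}\\
& x^{lo} \leq x \leq x^{up}\\
& x \in \Re^n\\
& Y_{ki} \in \{\text{True},\text{False}\} \quad k \in K, i \in D_k \end{align}</math>

Here each disjunction <math>k \in K</math> contains the terms <math>i \in D_k</math>. When the Boolean variable <math>Y_{ki}</math> is True, the constraints <math>r_{ki}(x) \leq 0</math> are enforced. The exclusive OR requires exactly one term of each disjunction to be selected, and <math>\Omega(Y)=\text{True}</math> collects the logic propositions that relate the Boolean variables.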

The two most common ways of reformulating a GDP problem into an MINLP are the Big-M (BM) reformulation and the Hull Reformulation (HR). BM is the simpler of the two, while HR yields a tighter continuous relaxation (a smaller relaxed feasible region), which often leads to faster solution times.<ref>Trespalacios, Francisco; Grossmann, Ignacio E. (2018): Improved Big-M Reformulation for Generalized Disjunctive Programs. Carnegie Mellon University. Journal contribution. <nowiki>https://doi.org/10.1184/R1/6467063.v1</nowiki> </ref>

Below is the GDP problem from above reformulated into an MINLP using the BM method.

<math>\begin{align} \min \quad & z=f(x)\\
\text{s.t.} \quad & g(x) \leq 0\\
& r_{ki}(x) \leq M^{ki}(1-y_{ki})\quad k \in K,i \in D_k\\
& \sum_{i \in D_k} y_{ki} = 1\quad k \in K\\
& Hy \geq h\\
& x^{lo} \leq x \leq x^{up}\\
& x \in \Re^n\\
& y_{ki} \in \{0,1\} \quad k \in K, i \in D_k \end{align}</math>

Notice that the Boolean variables <math>Y_{ki}</math> of the original GDP have been replaced by binary variables <math>y_{ki} \in \{0,1\}</math>. The logic relations <math>\Omega(Y)=\text{True}</math> have also been converted into linear integer constraints <math>Hy \geq h</math>.<ref name=":0" /> When <math>y_{ki}=1</math> the constraint <math>r_{ki}(x) \leq 0</math> is enforced, and when <math>y_{ki}=0</math> a sufficiently large <math>M^{ki}</math> makes it redundant.

This MINLP reformulation can now be passed to standard MINLP solvers to calculate a solution.

The same GDP will now be reformulated into an MINLP using the HR method.

<math>\begin{align} \min \quad & z=f(x)\\
\text{s.t.} \quad & g(x) \leq 0\\
& x = \sum_{i \in D_k} v^{ki}\quad k \in K\\
& y_{ki}r_{ki}(v^{ki}/y_{ki}) \leq 0\quad k \in K, i \in D_k\\
& \sum_{i \in D_k} y_{ki} = 1\quad k \in K\\
& Hy \geq h\\
& x^{lo}y_{ki} \leq v^{ki} \leq x^{up}y_{ki}\quad k \in K, i \in D_k\\
& x \in \Re^n\\
& y_{ki} \in \{0,1\} \quad k \in K, i \in D_k
\end{align}</math>

Here <math>v^{ki}</math> are disaggregated copies of <math>x</math>, one for each disjunctive term. HR therefore requires significantly more variables and constraints than the BM reformulation of the same problem. The resulting reduction in solution time can nevertheless be worth giving up the simplicity of BM.<ref>Trespalacios, Francisco; Grossmann, Ignacio E. (2015): Algorithmic Approach for Improved Mixed-Integer Reformulations of Convex Generalized Disjunctive Programs. Carnegie Mellon University. Journal contribution. <nowiki>https://doi.org/10.1184/R1/6466700.v1</nowiki> </ref>

==== Solvers: ====

* DICOPT<ref name=":3" />
* SBB<ref name=":4">GAMS. ''SBB'', 2020, www.gams.com/latest/docs/S_SBB.html.</ref>
* BARON<ref name=":6" />
* Couenne<ref name=":7" />

== Numerical Example ==
The following example is taken from the paper ''Generalized Disjunctive Programming: A Framework For Formulation and Alternative Algorithms For MINLP Optimization''.<ref name=":1">Ruiz, Juan P.; Grossmann, Ignacio E.: Generalized Disjunctive Programming: A Framework For Formulation And Alternative Algorithms For MINLP Optimization. Carnegie Mellon University. http://egon.cheme.cmu.edu/Papers/IMAGrossmannRuiz.pdf</ref>

[[File:GDP numeric example 3.png|frameless|600x600px]]

[[File:GDP numeric example 4.png|alt=http://egon.cheme.cmu.edu/Papers/IMAGrossmannRuiz.pdf|frameless|661x661px]]

[[File:GDP numeric example 5.png|alt=http://egon.cheme.cmu.edu/Papers/IMAGrossmannRuiz.pdf|frameless|600x600px]]

== Applications ==
GDP formulations are useful for real-world applications where multiple branches are available when making decisions. Solving the GDP in these instances will allow the user to calculate which decisions should be made at each branching point in order to get the optimal solution. This disjunctive formulation is common in complex chemical reactions and production planning.
[[File:Process network example.png|none|thumb|600x600px|Figure 2: Process Network Example. Each decision point represents another disjunctive set. <ref name=":1" />]]
The process network in Figure 2 shows multiple sequences of decisions that all lead to the goal (B) in a chemical reaction. This problem can be formulated as a GDP to determine which route maximizes the profit.
[[File:GDP numeric example 1.png|none|thumb|600x600px|Figure 3: A more complex process network.<ref name=":1" />]]
This same idea can be scaled to larger problems with more complex branching. Figure 3 illustrates a larger process network with all of its decision points. This problem can be formulated as a GDP so that the optimal route through the network can be calculated.
== Conclusion ==
GDP is a programming method that applies disjunctive programming to MINLP problems. This method facilitates modeling discrete and continuous optimization problems by combining algebraic constraints and logic expressions. The formulation of a GDP consists of Boolean and continuous variables, disjunctions, and logic propositions. In the case of convex functions, GDPs can be reformulated using the BM and the HR methods. Solution methods also include the logic-based methods of disjunctive branch and bound and decomposition. Once a GDP is reformulated into a standard MINLP, standard MINLP solvers, such as DICOPT<ref name=":3" />, SBB<ref name=":4" />, α-ECP<ref name=":5" /> and BARON<ref name=":6" />, can be used to determine optimal solutions<ref name=":0" />. The GDP method has important applications that include the optimization of complex chemical reactions and process planning.

== References ==
<references />

Mixed-integer linear fractional programming (MILFP)

2020-12-21T11:38:35Z

Wc593:

Author: Xiang Zhao (SysEn 6800 Fall 2020)

==Introduction==
Mixed-integer linear fractional programming (MILFP) is a class of mixed-integer nonlinear programming (MINLP) that is widely applied in chemical engineering,[https://aiche.onlinelibrary.wiley.com/doi/full/10.1002/btpr.2479] environmental engineering,[http://ourspace.uregina.ca/handle/10294/5449] and their hybrid fields, with applications ranging from cyclic scheduling problems to life cycle optimization (LCO).[https://pubs.acs.org/doi/abs/10.1021/acssuschemeng.7b00002?casa_token=hJNBUOc-zyIAAAAA:8gqZM144_Hjovhq_fLHXRQT66FGp0tf6oZ3rWiuRJLD4YKp4f1S44UkUspsNZuCCrcCFIWYME1v0dGPYLA] Specifically, the objective function of an MILFP is a ratio of two linear functions of continuous and discrete variables. However, the pseudo-convexity and the combinatorial nature of the fractional objective function can pose computational challenges for general-purpose global optimizers, such as [[wikipedia:BARON|BARON]], when solving an MILFP problem.[https://www.sciencedirect.com/science/article/pii/S0098135413003396?casa_token=Y6pefF84TQAAAAAA:ALrnGQIOGXr3SA-oqbD3FmlFsMyjp_z4zgmY8LkWscSWtbO8pMjFGix35FsroEVxI9ut0mWjffZc] In this regard, we introduce the basic ideas and solution steps of three algorithms, namely the Parametric Algorithm, the Reformulation-Linearization Method, and the Branch-and-Bound with Charnes-Cooper Transformation Method, which tackle this computational challenge efficiently and effectively.

==Standard Form and Properties==
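Consider the following standard form of the MILFP, where <math>x</math> in <math>T(x,y)</math> denotes the vector of continuous variables <math>m_i</math> and <math>y</math> the vector of binary variables <math>y_j</math>: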

<math>\begin{align} \max T(x,y)={c_0+\sum_{i}c_{1,i}m_i+\sum_{j}c_{2,j}y_j \over d_0+\sum_{i}d_{1,i}m_i+\sum_{j}d_{2,j}y_j}\\
s.t.\quad\ a_{0,k}+\sum_{i}a_{1,i}m_i+\sum_{j}a_{2,j}y_j=0,\quad \forall k \in K\\
m_i\ge0,\quad \forall i \in I\\
y_j\in \{0,1\},\quad \forall j \in J \end{align}</math>

The properties of the objective function <math>T(x,y)</math> are shown as follows:[https://www.sciencedirect.com/science/article/pii/S0098135409001367?casa_token=Sj60B1tEjccAAAAA:kMeO3BLDWNBd7jkBDqcpR5nTrB3yryQ8_CNqyN1mMooiuZxSiLfoVwtkDuU3cTWu4e0FsmeWN_uw]
# <math>T(x,y)</math> is (strictly) pseudoconcave and pseudoconvex over its domain.
# Any local optimum of <math>T(x,y)</math> is also a global optimum.

Notably, several nonlinear solvers that can deal with pseudoconvexity, such as spatial branch-and-bound (SBB),[https://link.springer.com/article/10.1007/BF01106605] are capable of solving the MILFP. However, the memory usage of these solvers is enormous when solving large-scale problems, such as those arising in industrial scheduling or [[wikipedia:supply chain|supply chain]] optimization projects. Hence, we introduce the parametric algorithm and the reformulation-linearization method, which reformulate the MILFP into mixed-integer linear programming (MILP) problems, to reduce the memory usage and enhance solution efficiency.

==Parametric Algorithm==
One way to successively reformulate and solve the MILFP is to apply the parametric algorithm, which finds the global optimum within a finite number of iterations. The linear [[wikipedia:parametric form|parametric form]] of the reformulated objective function has the advantage of directly finding the [[wikipedia:global optimum|global optimum]], while the size of the sub-problem remains the same. The reformulation approach is shown as follows:

The original form of the objective function is:
<math>T(x,y)={c_0+\sum_{i}c_{1,i}m_i+\sum_{j}c_{2,j}y_j \over d_0+\sum_{i}d_{1,i}m_i+\sum_{j}d_{2,j}y_j}</math>

We use a parameter <math>q</math> to reformulate the objective function <math>T(x,y)</math> into <math>M(x,y,q)</math>:

<math>\max T(x,y)={A(x,y) \over B(x,y)}</math>

is reformulated into

<math>\max M(x,y,q)=A(x,y)-q*B(x,y)</math>

<math>A(x,y)={c_0+\sum_{i}c_{1,i}m_i+\sum_{j}c_{2,j}y_j}</math>

<math>B(x,y)={d_0+\sum_{i}d_{1,i}m_i+\sum_{j}d_{2,j}y_j}</math>

Notably, assuming the denominator <math>B(x,y)</math> is positive over the feasible region, the optimal value of the parametric problem, <math>\max M(x,y,q)</math>, viewed as a function of <math>q</math>, has exactly one zero, and this zero equals the optimal value of the original fractional objective. Hence, we find the zero iteratively using the steps below:[https://ieeexplore.ieee.org/abstract/document/6858622?casa_token=jvj28BEMe0cAAAAA:utpZe4zST7nz0SVcdNUoX-CjmqmtU_v3CZnU-oTAxvR8B7ZV2iBjyhqDy3s-228w7Aw4_lcJjFw]

# Initialize the parameter <math>q=0</math> and set the tolerance <math>tol=10^{-6}</math>.
# Solve the sub-problem, whose objective function is <math>M(x,y,q)</math> subject to the original constraints, using an MILP solver such as [[wikipedia:CPLEX|CPLEX]]. Denote the optimal solution by <math>x^{*},y^{*}</math>.
# Calculate the value of the parametric objective function <math>M(x^{*},y^{*},q)=A(x^{*},y^{*})-q*B(x^{*},y^{*})</math>. If its absolute value is within the tolerance <math>tol</math>, stop: <math>(x^{*},y^{*})</math> is the optimal solution and <math>q</math> is the optimal objective value.
# Otherwise, update the parameter <math>q={A(x^{*},y^{*}) \over B(x^{*},y^{*})}</math> and return to step 2.
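The iteration can be sketched in Python on a tiny instance. All coefficients, the bound <code>M_MAX</code>, and the constraint <math>y_1+y_2\leq 1</math> are illustrative assumptions. The sub-problem is solved by enumeration as a stand-in for an MILP solver, which is valid here because, for fixed <math>y</math> and <math>q</math>, the parametric objective is linear in the single continuous variable and attains its maximum at a bound.

```python
from itertools import product

M_MAX = 4.0  # assumed upper bound on the continuous variable m


def A(m, y):
    """Numerator c0 + c1*m + sum_j c2_j*y_j (illustrative coefficients)."""
    return 1.0 + 2.0 * m + 3.0 * y[0] + 1.0 * y[1]


def B(m, y):
    """Denominator d0 + d1*m + sum_j d2_j*y_j (positive for all candidates)."""
    return 2.0 + 1.0 * m + 2.0 * y[0] + 0.5 * y[1]


def solve_subproblem(q):
    """Maximize A - q*B by enumeration (stand-in for an MILP solver)."""
    candidates = [(m, y)
                  for m in (0.0, M_MAX)
                  for y in product((0, 1), repeat=2)
                  if sum(y) <= 1]  # assumed constraint y1 + y2 <= 1
    return max(candidates, key=lambda s: A(*s) - q * B(*s))


def parametric_algorithm(tol=1e-6, max_iter=50):
    q = 0.0  # step 1: initialize
    for _ in range(max_iter):
        m, y = solve_subproblem(q)            # step 2: solve sub-problem
        if abs(A(m, y) - q * B(m, y)) <= tol:  # step 3: zero found
            return q, m, y
        q = A(m, y) / B(m, y)                 # step 4: update q
    raise RuntimeError("no convergence within max_iter")


q_opt, m_opt, y_opt = parametric_algorithm()
print(q_opt, m_opt, y_opt)
```

On this instance the parameter moves from 0 to 1.5 and then to 10/6.5, at which point the parametric objective is zero and the iteration stops.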

==Reformulation-Linearization Method==
The reformulation-linearization method, which incorporates Glover’s linearization into the Charnes-Cooper transformation,[https://onlinelibrary.wiley.com/doi/abs/10.1002/nav.3800200308?casa_token=BsaOkI0dilIAAAAA:wPELPH83o1FuB9xHW8rRDhwInT3xsGjqqqk6LWID7WYpLexkAhgiymU-4-ew7c0nEoC3wM49-oFHa1m5] introduces auxiliary variables to reformulate the MILFP into an equivalent MINLP. The resulting MINLP is subsequently transformed, using Glover’s linearization, into an MILP, which can be efficiently solved by typical MILP solvers like [[wikipedia:CPLEX|CPLEX]]. The reformulation approach is shown as follows:

The original form of the optimization model is:

<math>\begin{align} \max T(x,y)={c_0+\sum_{i}c_{1,i}m_i+\sum_{j}c_{2,j}y_j \over d_0+\sum_{i}d_{1,i}m_i+\sum_{j}d_{2,j}y_j}\\

s.t.\quad\ a_{0,k}+\sum_{i}a_{1,i}m_i+\sum_{j}a_{2,j}y_j=0,\quad \forall k \in K\\

m_i\ge0,\quad \forall i \in I\\

y_j\in {0,1},\quad \forall j \in J \end{align}</math>

First, we introduce the variable <math>u</math> as the reciprocal of the denominator of the fractional objective function, together with the substitution terms <math>g_i</math> and <math>h_j</math>:

<math>u={1\over d_0+\sum_{i}d_{1,i}m_i+\sum_{j}d_{2,j}y_j}</math>

<math>g_i={m_i*u}</math>

<math>h_j={y_j*u}</math>

To obtain the equivalent MILP model, we use Glover's linearization to transform the bilinear constraint (<math>h_j={y_j*u}</math>). With a sufficiently large constant <math>M</math> that is an upper bound on <math>u</math>,

<math>h_j={y_j*u}</math>

is equivalent to

<math>h_j\leq u,\quad \forall j \in J</math>

<math>h_j\leq M*y_j,\quad \forall j \in J</math>

<math>h_j\geq u-M*(1-y_j),\quad \forall j \in J</math>

<math>h_j\geq 0,\quad \forall j \in J</math>

<math>u\geq 0</math>

<math>g_i\geq 0,\quad \forall i \in I</math>

<math>y_j\in \{0,1\},\quad \forall j \in J</math>
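As a quick numerical check, the following Python sketch verifies that the four inequalities <math>h \leq u</math>, <math>h \leq M y</math>, <math>h \geq u - M(1-y)</math>, and <math>h \geq 0</math> leave exactly one admissible value, <math>h = y u</math>, for a binary <math>y</math> and <math>0 \leq u \leq M</math>. The value <math>M = 10</math> and the sample points are illustrative assumptions.

```python
BIG_M = 10.0  # assumed upper bound on u


def glover_interval(u, y):
    """Interval [lo, hi] of h allowed by the four linear inequalities."""
    lo = max(0.0, u - BIG_M * (1 - y))  # h >= 0 and h >= u - M*(1 - y)
    hi = min(u, BIG_M * y)              # h <= u and h <= M*y
    return lo, hi


for y in (0, 1):
    for u in (0.0, 0.5, 2.0, 10.0):
        lo, hi = glover_interval(u, y)
        # The interval collapses to the single point h = y*u.
        assert lo == hi == y * u
        print(y, u, lo, hi)
```

When <math>y=0</math> the pair <math>h \leq M y</math> and <math>h \geq 0</math> pins <math>h</math> to zero, and when <math>y=1</math> the pair <math>h \leq u</math> and <math>h \geq u - M(1-y)</math> pins <math>h</math> to <math>u</math>.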

In this way, we reformulate the original MILFP model into an MILP model, which can be effectively solved by a typical [[wikipedia:branch and cut|branch-and-cut]] solver like [[wikipedia:CPLEX|CPLEX]]. To summarize, the reformulated MILP model is shown below:

<math>\begin{align}\max W(u,g,h)={c_0*u+\sum_{i}c_{1,i}g_i+\sum_{j}c_{2,j}h_j}\\
s.t.\quad\ a_{0,k}*u+\sum_{i}a_{1,i}g_i+\sum_{j}a_{2,j}h_j=0,\quad \forall k \in K\\
d_0*u+\sum_{i}d_{1,i}g_i+\sum_{j}d_{2,j}h_j=1\\
h_j\leq u,\quad \forall j \in J\\
h_j\leq M*y_j,\quad \forall j \in J\\
h_j\geq u-M*(1-y_j),\quad \forall j \in J\\
h_j\geq 0,\quad \forall j \in J\\
u\geq 0\\
g_i\geq 0,\quad \forall i \in I\\
y_j\in \{0,1\},\quad \forall j \in J\end{align}</math>
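The variable substitution behind this model can be checked numerically. The Python sketch below takes one point <math>(m, y)</math> of a hypothetical instance (all coefficients are illustrative assumptions), maps it to <math>(u, g, h)</math> with <math>u</math> equal to the reciprocal of the denominator, <math>g_i = m_i u</math>, and <math>h_j = y_j u</math>, and confirms that the normalization constraint equals one and that the linear objective <math>W</math> reproduces the fractional objective <math>T</math>.

```python
# Illustrative coefficients (assumptions, not taken from the examples on this page).
c0, c1, c2 = 1.0, [2.0], [3.0, 1.0]   # numerator
d0, d1, d2 = 2.0, [1.0], [2.0, 0.5]   # denominator
m, y = [4.0], [0, 1]                  # one point of the original problem


def linear(k0, k1, k2, cont, disc, scale=1.0):
    """Evaluate k0*scale + sum_i k1_i*cont_i + sum_j k2_j*disc_j."""
    return (k0 * scale
            + sum(a * b for a, b in zip(k1, cont))
            + sum(a * b for a, b in zip(k2, disc)))


numerator = linear(c0, c1, c2, m, y)
denominator = linear(d0, d1, d2, m, y)
T = numerator / denominator            # original fractional objective

u = 1.0 / denominator                  # Charnes-Cooper variable
g = [mi * u for mi in m]               # g_i = m_i * u
h = [yj * u for yj in y]               # h_j = y_j * u

W = linear(c0, c1, c2, g, h, scale=u)              # transformed objective
normalization = linear(d0, d1, d2, g, h, scale=u)  # should equal 1

print(T, W, normalization)
```

Because every term of the numerator and the denominator is multiplied by the same positive factor <math>u</math>, the ratio is unchanged while the denominator is fixed to one, which is what makes the transformed objective linear.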

==Branch-and-Bound with Charnes-Cooper Transformation Method==
Integrating the Charnes-Cooper transformation with the [[wikipedia:Branch and Bound|Branch-and-Bound (B&B)]] algorithm reformulates the relaxation of the fractional problem at each node into an LP subproblem, which can be solved to global optimality using solvers like [[wikipedia:CPLEX|CPLEX]]. Since the solution steps are similar to those of B&B and the reformulation step is shown in the Reformulation-Linearization Method section, we refer readers to the B&B algorithm and the paper of Gao et al.[https://aiche.onlinelibrary.wiley.com/doi/full/10.1002/aic.14705] for details.

==Application and Modeling for Numerical Examples==

=== Applications of MILFP ===

Two typical applications are introduced in this section, namely [[wikipedia:cyclic scheduling|cyclic scheduling]] and life-cycle optimization.[https://pubs.acs.org/doi/abs/10.1021/acssuschemeng.7b00631#:~:text=Life%20cycle%20optimization%20(LCO)%20enables,and%20optimization%20of%20process%20alternatives.] One typical cyclic scheduling problem was illustrated in Yue et al.,[https://www.sciencedirect.com/science/article/pii/S0098135413000781] where the fractional objective was optimized to reflect both the absolute profit and the scheduling aspect, which appear in the numerator and denominator, respectively. The combination of the Reformulation-Linearization method and [[wikipedia:CPLEX|CPLEX]] was used as the solution algorithm, and the optimization framework was applied in a case study of a multiproduct batch plant that used 14 processing stages to produce three acrylic fiber formulations with a time horizon of 100 h.

In the life-cycle optimization problem of a processing system, optimizing a linear objective function can lead to an unreasonably large or small treatment amount. Optimizing a fractional objective avoids this, so that a balanced processing amount can be obtained to address the sustainable design and synthesis of the processing system.[https://pubs.acs.org/doi/abs/10.1021/acssuschemeng.7b03198?casa_token=dokyfl7kzigAAAAA:Z6riBbyfYZgakqA0Qw6du37ClfOFBuBKQxcuExnuWUvniwFWEjx17ivfLo4uvTgsl4eMBukRxfLYXW6dsA] Notably, the functional unit appears in the denominator, while the total economic and environmental performances appear as numerators in the fractional objective functions. Specifically, as illustrated in Gong et al.,[https://aiche.onlinelibrary.wiley.com/doi/full/10.1002/aic.15882] the sustainable design and synthesis of a shale gas processing system was obtained by optimizing the unit net present value (NPV), unit global warming potential (GWP), and unit freshwater consumption simultaneously. The optimization framework was applied to a Marcellus Shale gas site.

In the next two numerical examples, we present a “simple form” of MILFP that can be used for selecting optimal processing pathways via maximizing unit NPV or minimizing unit GWP, respectively.

=== Numerical Examples of MILFP ===

==== Introduction of Numerical Examples ====

[[File:Opti wiki.jpg|thumb|right|Figure 1. Superstructure of the chemical processing system]]
Consider a simple chemical plant, whose superstructure is shown on the right side. The superstructure contains all technology options, and only one option in each level can be chosen. To find the optimal processing pathway on the basis of economic and environmental aspects, we consider maximizing the unit [[wikipedia:net present value|net present value (NPV)]] or minimizing unit greenhouse gas (GHG) emissions, respectively. Notably, the unit NPV equals the ratio of the NPV to the total mass flow rate of product I within the project lifespan of ten years. The discount rate is 10%.

==== Input Parameters of Numerical Examples ====

{| class="wikitable"
|+ Conversion Rate of each Chemical
|-
! Processing Level !! Conversion Rate !! Conversion Rate !! Conversion Rate
|-
| Level 1 || D to E: 0.8 || D to F: 0.9
|-
| Level 2 || E to G: 0.7 || E to H: 0.8 || F to H: 0.4
|-
| Level 3 || G to I: 0.5 || H to I: 0.6
|}

{| class="wikitable"
|+ Fixed Capital Cost for each Technology Alternative ($)
|-
! A1 !! A2 !! A3 !! B1 !! B2 !! B3 !! C1 !! C2 !! C3
|-
| 6,000,000 || 7,000,000 || 7,500,000 || 5,000,000 || 6,000,000 || 7,500,000 || 11,000,000 || 10,000,000 || 10,500,000
|}

{| class="wikitable"
|+ Variable Capital Cost for each Technology Alternative ($/(ton/yr))
|-
! A1 !! A2 !! A3 !! B1 !! B2 !! B3 !! C1 !! C2 !! C3
|-
| 50 || 40 || 35 || 60 || 55 || 45 || 30 || 35 || 33
|}

{| class="wikitable"
|+ Operating Cost for each Technology Alternative ($/(ton/yr))
|-
! A1 !! A2 !! A3 !! B1 !! B2 !! B3 !! C1 !! C2 !! C3
|-
| 25 || 30 || 20 || 30 || 28 || 50 || 27 || 25 || 15
|}

{| class="wikitable"
|+ Feedstock Supply and Demand of Product (ton/yr)/Feedstock and Product Price ($/(ton/yr))
|-
! Item !! Supply/Demand !! Feedstock/Product Price
|-
| D || 2,000,000 || 100
|-
| I || 200,000 || 2000
|}

{| class="wikitable"
|+ GHG Emissions from each Technology Option (ton CO2-eq/ton inlet chemicals)
|-
! A1 !! A2 !! A3 !! B1 !! B2 !! B3 !! C1 !! C2 !! C3
|-
| 1.2 || 0.9 || 0.7 || 1.4 || 1.6 || 1.3 || 2.1 || 2.4 || 2.7
|}

==== Nomenclatures for the Mathematical Model of the Numerical Examples ====

{| class="wikitable"
|+ Nomenclature
|-
! Nomenclature !! Meaning
|-
| ''<math>I</math>'' || Set of production stages indexed by <math>i</math>.
|-
| ''<math>J</math>'' || Set of process alternatives <math>j</math>.
|-
| ''<math>D</math>'' || Demand of product I.
|-
| ''<math>CAV_{i,j}</math>'' || Unit variable capital cost in the process alternative <math>j</math> at the production stage <math>i</math>.
|-
| ''<math>CV_{i,j}</math>'' || Conversion rate from input flow to output flow in the process alternative <math>j</math> at the production stage <math>i</math>.
|-
| ''<math>FIXI_{i,j}</math>'' || Fixed capital cost in the process alternative <math>j</math> at the production stage <math>i</math>.
|-
| ''<math>GHG_{i,j}</math>'' || Unit GHG emissions from the process alternative <math>j</math> at the production stage <math>i</math>.
|-
| ''<math>OPERI_{i,j}</math>'' || Unit operating cost in the process alternative <math>j</math> at the production stage <math>i</math>.
|-
| ''<math>PRI</math>'' || Price of product I.
|-
| ''<math>PRID</math>'' || Price of chemical D.
|-
| ''<math>S</math>'' || Supply of chemical D.
|-
| ''<math>y_{i,j}</math>'' || 0-1 variable. Equal to one if the process alternative <math>j</math> at the production stage <math>i</math> is selected.
|-
| ''<math>ca_{i,j}</math>'' || Capacity of process alternative <math>j</math> at the production stage <math>i</math>.
|-
| ''<math>fec</math>'' || Total feedstock cost.
|-
| ''<math>fix</math>'' || Total fixed capital cost.
|-
| ''<math>ghgt</math>'' || Total GHG emissions.
|-
| ''<math>mi_{i,j}</math>'' || Mass flow rate of the feedstock flow to process alternative <math>j</math> at the production stage <math>i</math>.
|-
| ''<math>mo_{i,j}</math>'' || Mass flow rate of the output flow from process alternative <math>j</math> at the production stage <math>i</math>.
|-
| ''<math>oper</math>'' || Total operating cost.
|-
| ''<math>objc</math>'' || Unit GHG emissions (within one operating year).
|-
| ''<math>obje</math>'' || Unit net present value (NPV).
|-
| ''<math>npv</math>'' || Net present value.
|-
| ''<math>sale</math>'' || Total sales.
|-
| ''<math>vai</math>'' || Total variable capital cost.
|}

===== Mass Balance Constraints =====

<math>mi_{i,j} \leq ca_{i,j},\quad \forall i \in I,\forall j \in J</math>

This constraint states that the mass flow rate of the inlet flow cannot exceed the treatment capacity.

<math>ca_{i,j} \leq M*y_{i,j},\quad \forall i \in I,\forall j \in J</math>

This constraint forces the treatment capacity to zero if the corresponding technology option is not selected.

<math>mo_{i,j} = mi_{i,j}*CV_{i,j},\quad \forall i \in I,\forall j \in J</math>

This constraint relates the outlet flow to the inlet flow through the conversion rate.

<math>\sum_{j}mo_{(i-1),j} = \sum_{j}mi_{i,j},\quad \forall i \geq 2</math>

This constraint states that the sum of the mass flow rates of the outlet flows from the previous processing level equals the sum of the inlet flows to the next processing level.

<math>\sum_{j}mo_{3,j} \geq D</math>

This constraint states that the sum of the outlet mass flow rates of the technology options in the third processing level must be at least the demand for product I.

<math>\sum_{j}mi_{1,j} \leq S</math>

This constraint states that the sum of the inlet mass flow rates of the technology options in the first processing level must not exceed the supply of chemical D.
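As a small worked illustration of these balances, the Python sketch below propagates the full supply of chemical D through one pathway, A1, B2, C3, using the conversion rates from the input table. The assignment of conversion rates to technology options (A1: D to E, B2: E to H, C3: H to I) is an assumption inferred from the superstructure configuration constraints, and running the plant at full supply is also an assumption.

```python
S = 2_000_000   # supply of chemical D (ton/yr), from the input table
D = 200_000     # demand of product I (ton/yr), from the input table

# Assumed mapping of technology options to conversion rates.
CV = {"A1": 0.8,   # D to E
      "B2": 0.8,   # E to H
      "C3": 0.6}   # H to I

pathway = ["A1", "B2", "C3"]

flow = float(S)  # inlet flow of the first level (uses the full supply)
for tech in pathway:
    mi = flow            # inlet flow of this level
    mo = mi * CV[tech]   # conversion constraint: mo = mi * CV
    print(tech, mi, mo)
    flow = mo            # level balance: outlet feeds the next level

production = flow
print("production of I (ton/yr):", production)
print("demand satisfied:", production >= D)
```

Under these assumptions the pathway produces about 768,000 ton/yr of product I, which satisfies the demand constraint and agrees with the production amount reported in the solution tables.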

===== Superstructure Configuration Constraints =====

Notably, the superstructure configuration constraints express the logic relationships between the technology options within the superstructure. If the binary variable <math>y_{i,j}</math> equals 1, then the technology option <math>j</math> in the process level <math>i</math> is selected.

<math>y_{1,1}+y_{1,2}+y_{1,3} = 1</math>

<math>y_{1,1}+y_{1,2}=y_{2,1}+y_{2,2}</math>

<math>y_{1,3}=y_{2,3}</math>

<math>y_{2,1}=y_{3,1}</math>

<math>y_{2,2}+y_{2,3}=y_{3,2}+y_{3,3}</math>

===== Economic Evaluation Constraints =====

We consider the fixed capital cost (<math>fix</math>), variable capital cost (<math>vai</math>), operating cost (<math>oper</math>), and feedstock cost (<math>fec</math>) as the expenses for the chemical processing system.

<math>fix=\sum_{i}{\sum_{j}FIXI_{i,j}*y_{i,j}}</math>

<math>vai=\sum_{i}{\sum_{j}CAV_{i,j}*ca_{i,j}}</math>

<math>oper=\sum_{i}{\sum_{j}OPERI_{i,j}*mo_{i,j}}</math>

<math>fec=PRID*\sum_{j}mi_{1,j}</math>

<math>sale=PRI*\sum_{j}mo_{3,j}</math>

The net present value (<math>npv</math>) is calculated in the constraint below, where we account for the total discounted cash flow, <math>DR</math> is the discount rate, and <math>SP</math> is the lifespan of the project.

<math>npv={DR*(1+DR)^{SP} \over (1+DR)^{SP}-1}*(sale-(vai+oper+fec))-fix</math>

===== Environmental Evaluation Constraint =====

The total GHG emissions from the chemical processing system are calculated in the constraint below.

<math>ghgt=\sum_{i}{\sum_{j}GHG_{i,j}*mi_{i,j}}</math>

===== Objective Functions =====

Two numerical examples are presented, each optimizing one of the fractional objective functions shown below. Since all constraints are linear in the continuous and discrete variables and each objective is a ratio of two linear functions, both numerical problems are MILFPs. We consider maximizing unit NPV (<math>obje</math>) or minimizing unit [[wikipedia:global warming potential|global warming potential (GWP)]], i.e., unit GHG emissions (<math>objc</math>), in the two numerical examples, respectively.

<math>obje={npv \over \sum_{j}mo_{3,j}}</math>

<math>objc={ghgt \over \sum_{j}mo_{3,j}}</math>
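The unit GHG objective can be evaluated by hand for a single pathway. The Python sketch below does this for the pathway A2, B2, C2 with the emission factors and conversion rates from the input tables. The mapping of conversion rates to technology options (A2: D to E, B2: E to H, C2: H to I) is an assumption inferred from the superstructure configuration constraints, and the inlet flow is set to the full supply for illustration; the ratio itself does not depend on the flow level because both numerator and denominator scale with it.

```python
S = 2_000_000  # inlet flow of chemical D (ton/yr), assumed equal to the supply

# Assumed (conversion rate, GHG factor per ton inlet) for each option.
OPTIONS = {"A2": (0.8, 0.9),   # D to E
           "B2": (0.8, 1.6),   # E to H
           "C2": (0.6, 2.4)}   # H to I

pathway = ["A2", "B2", "C2"]

ghgt = 0.0        # total GHG emissions, sum of GHG_ij * mi_ij
flow = float(S)
for tech in pathway:
    cv, ghg = OPTIONS[tech]
    ghgt += ghg * flow   # emissions are charged on the inlet flow
    flow *= cv           # outlet flow becomes the next inlet flow

product = flow           # outlet flow of the third level
objc = ghgt / product    # unit GHG emissions
print(round(objc, 2))
```

Under these assumptions the result rounds to 9.68 ton CO2-eq per ton of product, which matches the unit GHG emissions reported for the emission-minimizing solution.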

==Solution for Numerical Examples==

===Maximizing Unit NPV===

We consider the first objective function (<math>obje</math>) and all constraints in the mathematical model, where we can reformulate the objective function into a parametric form (<math>obj_1</math>) using parametric parameter <math>q_1</math>:

<math>\max \quad\ obj_1={npv-q_1*\sum_{j}mo_{3,j}}</math>

<math>s.t.\quad\ Mass \ \ Balance\ \ Constraints, Superstructure\ \ Configuration\ \ Constraints, Economic\ \ Evaluation\ \ Constraints</math>

This reformulated model can be solved iteratively with [[wikipedia:CPLEX|CPLEX]], and the solution is shown as follows:
{| class="wikitable"
|+ Process to be built
|-
! Level 1 !! Level 2 !! Level 3
|-
| A1 || - || -
|-
| - || B2 || -
|-
| - || - || C3
|}

{| class="wikitable"
|+ Performance
|-
! Production Amount (ton/yr) !! Unit NPV ($/ton) !! Unit GHG Emissions (ton CO2-eq/ton products)
|-
| 768,000 || 187.75 || 10.18
|}

===Minimizing Unit GHG Emissions===

We consider the second objective function (<math>objc</math>) and all constraints in the mathematical model, where we can reformulate the objective function into a parametric form (<math>obj_2</math>) using the parameter <math>q_2</math>:

<math>\min \quad\ obj_2={ghgt-q_2*\sum_{j}mo_{3,j}}</math>

<math>s.t.\quad\ Mass\ \ Balance\ \ Constraints, Superstructure\ \ Configuration\ \ Constraints, Environmental\ \ Evaluation\ \ Constraints</math>

This reformulated model can be solved iteratively with [[wikipedia:CPLEX|CPLEX]], and the solution is shown as follows:
{| class="wikitable"
|+ Process to be built
|-
! Level 1 !! Level 2 !! Level 3
|-
| - || - || -
|-
| A2 || B2 || C2
|-
| - || - || -
|}

{| class="wikitable"
|+ Performance
|-
! Production Amount (ton/yr) !! Unit NPV ($/ton) !! Unit GHG Emissions (ton CO2-eq/ton products)
|-
| 768,000 || 186.23 || 9.68
|}

===Computational performance===
The computational performance of the branch-and-refine algorithm and BARON is shown in the table below, where the former is considerably faster than the latter. The optimal solutions from both algorithms are the same, which confirms the global optimality of the solution from the branch-and-refine algorithm.
{| class="wikitable"
|+ Computational Performance for maximizing unit NPV
|-
! CPU time for branch-and-refine (s) !! CPU time for BARON (s)
|-
| 0.125 || 98.2
|}

{| class="wikitable"
|+ Performance for BARON algorithm
|-
! Production Amount (ton/yr) !! Unit NPV ($/ton) !! Unit GHG Emissions (ton CO2-eq/ton products)
|-
| 768,000 || 187.75 || 10.18
|}

==Conclusion==
Mixed-integer linear fractional programming (MILFP) is a class of mixed-integer nonlinear programming (MINLP) that is used to evaluate the average (per-unit) performance of a project. The Parametric Algorithm, the Reformulation-Linearization Method, and the Branch-and-Bound with Charnes-Cooper Transformation Method are three typical algorithms that tackle the computational challenge caused by the fractional objective. The optimization framework can be applied to chemical engineering, environmental engineering, and their combined areas, such as life-cycle optimization.

==References==
# Liu, S., Gerontas, S., Gruber, D., Turner, R., Titchener‐Hooker, N. J., & Papageorgiou, L. G. (2017). Optimization‐based framework for resin selection strategies in biopharmaceutical purification process development. Biotechnology Progress, 33(4), 1116-1126.
# Zhu, H. (2014). Inexact fractional optimization for multicriteria resources and environmental management under uncertainty (Doctoral dissertation, Faculty of Graduate Studies and Research, University of Regina).
# Gao, J., & You, F. (2017). Economic and environmental life cycle optimization of noncooperative supply chains and product systems: modeling framework, mixed-integer bilevel fractional programming algorithm, and shale gas application. ACS Sustainable Chemistry & Engineering, 5(4), 3362-3381.
# Zhong, Z., & You, F. (2014). Globally convergent exact and inexact parametric algorithms for solving large-scale mixed-integer fractional programs and applications in process systems engineering. Computers & Chemical Engineering, 61, 90-101.
# You, F., Castro, P. M., & Grossmann, I. E. (2009). Dinkelbach's algorithm as an efficient method to solve a class of MINLP models for large-scale cyclic scheduling problems. Computers & Chemical Engineering, 33(11), 1879-1889.
# Quesada, I., & Grossmann, I. E. (1995). A global optimization algorithm for linear fractional and bilinear programs. Journal of Global Optimization, 6(1), 39-76.
# Zhong, Z., & You, F. (2014, June). Parametric algorithms for global optimization of mixed-integer fractional programming problems in process engineering. In 2014 American Control Conference (pp. 3609-3614). IEEE.
# Charnes, A., & Cooper, W. W. (1973). An explicit general solution in linear fractional programming. Naval Research Logistics Quarterly, 20(3), 449-467.
# Gao, J., & You, F. (2015). Optimal design and operations of supply chain networks for water management in shale gas production: MILFP model and algorithms for the water‐energy nexus. AIChE Journal, 61(4), 1184-1208.
# Gong, J., & You, F. (2017). Consequential life cycle optimization: general conceptual framework and application to algal renewable diesel production. ACS Sustainable Chemistry & Engineering, 5(7), 5887-5911.
# Yue, D., & You, F. (2013). Sustainable scheduling of batch processes under economic and environmental criteria with MINLP models and algorithms. Computers & Chemical Engineering, 54, 44-59.
# Gao, J., & You, F. (2018). Integrated hybrid life cycle assessment and optimization of shale gas. ACS Sustainable Chemistry & Engineering, 6(2), 1803-1824.
# Gong, J., & You, F. (2018). A new superstructure optimization paradigm for process synthesis with product distribution optimization: Application to an integrated shale gas processing and chemical manufacturing process. AIChE Journal, 64(1), 123-143.

Branch and cut

2020-12-21T11:38:14Z

Wc593:

Author: Lindsay Siegmundt, Peter Haddad, Chris Babbington, Jon Boisvert, Haris Shaikh (SysEn 5800 Fall 2020)

== Introduction ==
The Branch and Cut methodology was developed in the 1990s as a way to solve Mixed-Integer Linear Programs (Karamanov, Miroslav)<ref>Karamanov, Miroslav. “Branch and Cut: An Empirical Study.” ''Carnegie Mellon University'' , Sept. 2006, https://www.cmu.edu/tepper/programs/phd/program/assets/dissertations/2006-operations-research-karamanov-dissertation.pdf.</ref>. It combines two established optimization methodologies: Branch and Bound and Cutting Planes. Branch and Cut relaxes the problem, which simplifies it so that it can be solved more easily, and uses the relaxed solution to bound the objective. For a maximization problem, the relaxation produces an upper bound, which is the highest value the objective can take over the feasible region. The optimal solution is found when the objective of a feasible solution equals the upper bound (Luedtke, Jim)<ref>Luedtke, Jim. “The Branch-and-Cut Algorithm for Solving Mixed-Integer Optimization Problems.” ''Institute for Mathematics and its Applications'', 10 Aug. 2016, https://www.ima.umn.edu/materials/2015-2016/ND8.1-12.16/25397/Luedtke-mip-bnc-forms.pdf.</ref>. This methodology is important to optimization because it combines the strengths of two common tools to find the optimal solution. In the same way, the key components of other methodologies could be combined to find optimal solutions more simply and directly.

== Methodology & Algorithm ==

=== Methodology ===
{| class="wikitable"
|+Abbreviation Details
!Acronym
!Expansion
|-
|LP
|Linear Programming
|-
|B&B
|Branch and Bound
|}

==== Most Infeasible Branching: ====
Most infeasible branching is a very popular method that picks the variable with fractional part closest to <math>0.5</math>, i.e., the variable that maximizes <math>s_i = 0.5-|\bar{x}_i- \lfloor \bar{x}_i \rfloor-0.5|</math>, where <math>\bar{x}_i</math> is the value of variable <math>i</math> in the current LP solution<ref>Achterberg, Tobias; Koch, Thorsten; Martin, Alexander. Branching rules revisited. https://www-m9.ma.tum.de/downloads/felix-klein/20B/AchterbergKochMartin-BranchingRulesRevisited.pdf</ref>. In other words, it picks the variable for which there is the least indication of the side to which it should be rounded. However, the performance of this method is generally no better than selecting a variable at random.
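A minimal Python sketch of this selection rule is given below. The function name <code>most_infeasible</code> and the integrality tolerance are illustrative assumptions; the score is <math>0.5</math> minus the distance of the fractional part from <math>0.5</math>.

```python
import math


def most_infeasible(x, int_tol=1e-9):
    """Return the index of the variable whose fractional part is closest
    to 0.5, or None if every value is integral (within int_tol)."""
    best, best_score = None, 0.0
    for i, xi in enumerate(x):
        frac = xi - math.floor(xi)
        score = 0.5 - abs(frac - 0.5)  # 0 for integral values, 0.5 at most
        if score > int_tol and score > best_score:
            best, best_score = i, score
    return best


# LP solution with fractional parts 0.88 and 0.72: the second is closer to 0.5.
print(most_infeasible([1.88, 1.72]))
# All values integral: there is nothing to branch on.
print(most_infeasible([2.0, 1.0]))
```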

==== Strong Branching: ====
For each fractional variable, strong branching tests the increase in the dual bound by solving the LP relaxations that result from branching on that variable. The variable that leads to the largest increase is selected as the branching variable for the current node. Despite its simplicity, strong branching is so far the most powerful branching technique in terms of the number of nodes in the B&B tree; this effectiveness, however, comes at a high computational cost.<ref>A Branch-and-Cut Algorithm for Mixed Integer Bilevel Linear Optimization Problems and Its Implementation <nowiki/>https://coral.ise.lehigh.edu/~ted/files/papers/MIBLP16.pdf</ref>

==== Pseudo Cost: ====
[[File:Image.png|thumb|Pure pseudo cost branching]]

Another way to approximate the change in the relaxation value is to use a pseudo cost method. The pseudo cost of a variable is an estimate of the per-unit change in the objective function caused by rounding the value of the variable up or down. The variable with the largest estimated LP objective gain is chosen for branching<ref>Advances in Mixed Integer Programming http://scip.zib.de/download/slides/SCIP-branching.ppt</ref>.
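The idea can be sketched in Python as follows. The function name, the pseudo cost values, and in particular the way the downward and upward gain estimates are combined into one score (their product, with a small floor) are assumptions made for illustration; other combinations are possible.

```python
import math


def pseudo_cost_branch(x, psi_down, psi_up, eps=1e-6):
    """Pick a branching variable from estimated objective gains.

    psi_down[i], psi_up[i]: assumed per-unit objective change when
    variable i is rounded down or up (the pseudo costs).
    """
    best, best_score = None, 0.0
    for i, xi in enumerate(x):
        f = xi - math.floor(xi)
        if f < eps or f > 1.0 - eps:
            continue  # variable is already integral
        gain_down = psi_down[i] * f          # estimated gain of x_i <= floor
        gain_up = psi_up[i] * (1.0 - f)      # estimated gain of x_i >= ceil
        # Assumed scoring rule: product of both gains, floored at eps.
        score = max(gain_down, eps) * max(gain_up, eps)
        if score > best_score:
            best, best_score = i, score
    return best


choice = pseudo_cost_branch([1.88, 1.72], psi_down=[2.0, 1.0], psi_up=[3.0, 4.0])
print(choice)
```

Unlike strong branching, no LP is solved here; the estimates rely on pseudo costs that are assumed to be available, for example from the observed objective changes of earlier branchings.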
===Algorithm===
Branch and Cut is a variation of the Branch and Bound algorithm. Branch and Cut incorporates Gomory cuts, which reduce the search space of the given problem. The standard Simplex Algorithm is used to solve each Linear Programming (LP) relaxation.

<math>min: c^tx</math>

<math>s.t. \ Ax \leq b</math>

<math>x \geq 0</math>

<math>x_i = int, i = 1,2,3...,n</math>

Above is a mix-integer linear programming problem. x and c are a part of the n-vector. These variables can be set to 0 or 1 allow binary variables. The above problem can be denoted as <math>LP_n </math>

Below is an algorithm for Branch and Cut with Gomory cuts and partitioning:

'''Step 0:'''
Upper Bound = ∞
Lower Bound = -∞
'''Step 1. Initialize:'''

Set the first node as <math>LP_0</math> while setting the active nodes set as <math>L</math>. The set can be accessed via <math>LP_n </math>

'''Step 2. Terminate:'''

'''Step 3. Iterate through list L:'''

While <math>L</math> is not empty (i is the index of the list of L), then:

'''Step 3.1. Convert to a Relaxation:'''

'''Step 3.2.'''

Solve the relaxed problem.

'''Step 3.3.'''
If Z is infeasible:
Return to step 3.
else:
Continue with solution Z.
'''Step 4. Cutting Planes:'''
If a cutting plane is found:
then add to the Linear Relaxation problem (as a constraint) and return to step 3.2
Else:
Continue.
'''Step 5. Pruning and Fathoming:'''

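(a) If <math>Z^l \geq Z</math>, where <math>Z</math> is the current upper bound, then go to step 3.

(b) If <math>Z^l \leq Z</math> and the solution <math>X^l</math> is integral feasible: set <math>Z = Z^l</math> and remove from <math>L</math> all problems <math>i</math> with <math>Z^i \geq Z</math>.
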
'''Step 6. Partition'''

Let <math>\{D^{lj}\}_{j=1}^{j=k}</math> be a partition of the constraint set <math>D</math> of problem <math>LP_l</math>. Add problems <math>\{LP^{lj}\}_{j=1}^{j=k}</math> to <math>L</math>, where <math>LP^{lj}</math> is <math>LP_l</math> with feasible region restricted to <math>D^{lj}</math> and <math>Z_{lj}</math> for <math>j=1,...,k</math> is set to the value of <math>Z^l</math> for the parent problem <math>l</math>. Go to step 3.<ref name=":0">Benders, J. F. (Sept. 1962), "Partitioning procedures for solving mixed-variables programming problems", Numerische Mathematik 4(3): 238–252.</ref>

==Numerical Example==
First, list out the MILP:

<math>min \ z=-4x_1-7x_2</math>

<math>6x_1 + x_2 \leq13</math>

<math>-x_1+4x_2\leq5</math>

<math>x_1,x_2\geq0</math>

Solution to original LP

<math>z =-19.56, x_1=1.88, x_2=1.72 </math>

Branch on x1 to generate sub-problems

<math>min \ z=-4x_1-7x_2</math>

<math>6x_1 + x_2 \leq13</math>

<math>-x_1+4x_2\leq5</math>

<math>x_1\geq2</math>

<math>x_1,x_2\geq0</math>

Solution to first branch sub-problem

<math>z =-15, x_1=2, x_2=1</math>

<math>min \ z=-4x_1-7x_2</math>

<math>6x_1 + x_2 \leq13</math>

<math>-x_1+4x_2\leq5</math>

<math>x_1\leq1</math>

<math>x_1,x_2\geq0</math>

Solution to second branch sub-problem

<math>z =-14.5, x_1=1, x_2=1.5</math>

Adding a cut

<math>min \ z=-4x_1-7x_2</math>

<math>6x_1 + x_2 \leq13</math>

<math>-x_1+4x_2\leq5</math>

<math>2x_1+x_2\leq 3</math>

<math>x_1\leq1</math>

<math>x_1,x_2\geq0</math>

Solution to cut LP

<math>z=-13.222,x_1=.778,x_2=1.444</math>
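
The four LP solutions above can be checked with a small sketch. It is not how a simplex-based solver works: it simply intersects every pair of constraint boundaries and keeps the best feasible point, which is only reasonable for tiny bounded two-variable problems such as this one. The function name `solve_2d_lp` and the tuple encoding of constraints are assumptions made for illustration.

```python
from itertools import combinations

def solve_2d_lp(cost, constraints, tol=1e-9):
    """Minimize cost[0]*x1 + cost[1]*x2 subject to a*x1 + b*x2 <= r for each (a, b, r).

    Brute-force vertex enumeration; assumes the feasible region is bounded.
    Returns (z, x1, x2) or None if no feasible vertex is found.
    """
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tol:          # parallel boundaries, no unique intersection
            continue
        x1 = (r1 * b2 - r2 * b1) / det      # Cramer's rule
        x2 = (a1 * r2 - a2 * r1) / det
        if all(a * x1 + b * x2 <= r + tol for a, b, r in constraints):
            z = cost[0] * x1 + cost[1] * x2
            if best is None or z < best[0]:
                best = (z, x1, x2)
    return best

cost = (-4, -7)
# The last two rows encode x1 >= 0 and x2 >= 0.
base = [(6, 1, 13), (-1, 4, 5), (-1, 0, 0), (0, -1, 0)]

root = solve_2d_lp(cost, base)
branch_up = solve_2d_lp(cost, base + [(-1, 0, -2)])            # x1 >= 2
branch_down = solve_2d_lp(cost, base + [(1, 0, 1)])            # x1 <= 1
with_cut = solve_2d_lp(cost, base + [(1, 0, 1), (2, 1, 3)])    # x1 <= 1 and 2*x1 + x2 <= 3
```

Each result is a tuple (z, x1, x2) that can be compared with the corresponding solution listed above.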

==Application==
Several applications of Branch and Cut are described below in more detail. They illustrate the kinds of problems that Branch and Cut can be used to solve efficiently.

=== '''Combinatorial Optimization''' ===
Combinatorial optimization is a natural application for Branch and Cut. It is concerned with finding an optimal object within a finite set, using the known structure of that set. The original applications were maximizing flow and problems in the transportation industry (Maltby and Ross). Combinatorial optimization has since spread to new areas and is now an important component in studying artificial intelligence and machine learning algorithms. The finite sets that combinatorial optimization tends to focus on include graphs, partially ordered sets, and structures that define linear independence called matroids.<ref>[https://brilliant.org/wiki/combinatorial-optimization/ Maltby, Henry, and Eli Ross. “Combinatorial Optimization.” ''Brilliant Math & Science Wiki'', https://brilliant.org/wiki/combinatorial-optimization/.]</ref>

=== '''Benders Decomposition''' ===
Benders decomposition is another cutting-plane application that is widely used in stochastic programming. The initial problem is divided into two distinct parts: a first problem over a first set of variables and a subproblem over the remaining variables. Each part is easier to solve than the original instance (Benders). The first problem is solved for the first variable set, and the subproblem is then solved given that first solution. Solving the subproblem reveals whether the first-problem solution is infeasible (Benders). Benders cuts are then added to constrain the first problem until a feasible solution is found.<ref name=":0" />

=== '''Large-Scale Symmetric Traveling Salesman Problem''' ===
The large-scale symmetric traveling salesman problem is a classic problem that seeks the shortest route that visits each city exactly once and returns to the original city at the end. On a larger scale this style of problem must be broken down into subsets or nodes (SIAM). By constraining the problem with the methods of combinatorial optimization, the traveling salesman problem can be viewed as partially ordered sets. Doing this on a large scale with a finite number of cities makes it possible to find the shortest tour while ensuring that each city is visited only once.<ref>Society for Industrial and Applied Mathematics. “SIAM Rev.” ''SIAM Review'', 18 July 2006, https://epubs.siam.org/doi/10.1137/1033004</ref>

=== '''Submodular Function''' ===
Submodular functions are another class of functions used throughout artificial intelligence and machine learning. They exhibit diminishing returns: as more inputs are added to the set, the additional value gained from each new input decreases. This property is useful in the cases stated above because the inputs are continually growing, and it allows machine learning and artificial intelligence methods built on these algorithms to keep scaling (Tschiatschek, Iyer, and Bilmes)<ref>S. Tschiatschek, R. Iyer, H. Wei and J. Bilmes, Learning Mixtures of Submodular Functions for Image Collection Summarization, NIPS-2014.</ref>. As new inputs are supplied, the system continues to learn and to improve the solution it produces.<ref>A. Krause and C. Guestrin, Beyond Convexity: Submodularity in Machine Learning, Tutorial at ICML-2008</ref>

==Conclusion==
Branch and Cut is an optimization algorithm used to solve integer linear programs. It combines two other optimization algorithms, branch and bound and cutting planes, in order to exploit the strengths of each. Three branching rules are used within the method: most infeasible branching, strong branching, and pseudo cost branching. Furthermore, Branch and Cut can be applied in multiple settings, including submodular functions, the large-scale symmetric traveling salesman problem, Benders decomposition, and combinatorial optimization, which increases the impact of the methodology.

==Reference==
<references />

Heuristic algorithms

2020-12-21T11:37:33Z

Wc593:

Author: Anmol Singh (as2753) (ChemE 6800 Fall 2020)

== Introduction ==
In mathematical programming, a heuristic algorithm is a procedure that determines near-optimal solutions to an optimization problem. However, this is achieved by trading optimality, completeness, accuracy, or precision for speed.<ref> Eiselt, Horst A et al. Integer Programming and Network Models. Springer, 2011.</ref> Nevertheless, heuristics are widely used in a variety of situations:

*Problems that do not have an exact solution or for which the formulation is unknown
*Problems whose exact solution is computationally intensive
*Calculation of bounds on the optimal solution in branch and bound solution processes
==Methodology==
Optimization heuristics can be categorized into two broad classes depending on the way the solution domain is organized:

===Construction methods (Greedy algorithms)===
The greedy algorithm works in phases, where the algorithm makes the locally optimal choice at each step as it attempts to find the overall optimal way to solve the entire problem.<ref>''Introduction to Algorithms'' (Cormen, Leiserson, Rivest, and Stein) 2001, Chapter 16 "Greedy Algorithms".</ref> It is a technique used to solve the famous “traveling salesman problem” where the heuristic followed is: "At each step of the journey, visit the nearest unvisited city."

====Example: Scheduling Problem====
You are given the schedules of N lectures for a single day at a university. The schedule for a specific lecture is of the form (s, f), where s is the start time of that lecture and f is its finishing time. Given the list of N lecture schedules, we need to select a maximum set of lectures to be held during the day such that no two lectures overlap, i.e., if lectures L<sub>i</sub> and L<sub>j</sub> are both selected, then the start time of j ≥ the finish time of i, or vice versa. An optimal strategy is to consider the earliest finishing time first: sort the lectures in increasing order of their finishing times, then scan from the beginning and select each lecture that does not overlap with the lectures already selected.
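
The earliest-finishing-time rule can be sketched in a few lines. The function name and the sample timetable are assumptions made for illustration.

```python
def select_lectures(lectures):
    """Greedy earliest-finish-time selection of non-overlapping (start, finish) pairs."""
    chosen = []
    last_finish = float("-inf")
    for start, finish in sorted(lectures, key=lambda lec: lec[1]):
        if start >= last_finish:          # compatible with everything chosen so far
            chosen.append((start, finish))
            last_finish = finish
    return chosen

# Hypothetical timetable (hours of the day), not taken from the text.
example = [(9, 11), (10, 12), (11, 13), (12, 14), (9, 14)]
schedule = select_lectures(example)
```

For this timetable the sketch keeps the lectures (9, 11) and (11, 13); every other lecture overlaps one of them.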

===Local Search methods===
The Local Search method follows an iterative approach where we start with some initial solution, explore the neighborhood of the current solution, and then replace the current solution with a better solution.<ref> Eiselt, Horst A et al. Integer Programming and Network Models. Springer, 2011.</ref> For this method, the “traveling salesman problem” would follow the heuristic in which a solution is a cycle containing all nodes of the graph and the target is to minimize the total length of the cycle.

==== Example Problem ====
Suppose that the problem P is to find an optimal ordering of N jobs in a manufacturing system. A solution to this problem can be described as an N-vector of job numbers, in which the position of each job in the vector defines the order in which the job will be processed. For example, [3, 4, 1, 6, 5, 2] is a possible ordering of 6 jobs, where job 3 is processed first, followed by job 4, then job 1, and so on, finishing with job 2. Define now M as the set of moves that produce new orderings by the swapping of any two jobs. For example, [3, 1, 4, 6, 5, 2] is obtained by swapping the positions of jobs 4 and 1.
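
A minimal sketch of the swap neighborhood and a simple descent built on it is given below. The text does not specify an objective, so the processing times and the total-completion-time cost are hypothetical choices made only so that the example can run.

```python
from itertools import combinations

def swap_neighbors(order):
    """Yield every ordering obtained by swapping two positions of `order`."""
    for i, j in combinations(range(len(order)), 2):
        neighbor = list(order)
        neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
        yield neighbor

def local_search(order, cost):
    """Move to the best swap neighbor until no neighbor improves `cost`."""
    current = list(order)
    while True:
        best = min(swap_neighbors(current), key=cost, default=current)
        if cost(best) >= cost(current):
            return current
        current = best

# Hypothetical processing times for jobs 1..6 (an assumption for this sketch).
processing_time = {1: 4, 2: 2, 3: 7, 4: 1, 5: 3, 6: 5}

def total_completion_time(order):
    """Sum of the completion times of all jobs processed in the given order."""
    elapsed, total = 0, 0
    for job in order:
        elapsed += processing_time[job]
        total += elapsed
    return total

start = [3, 4, 1, 6, 5, 2]
result = local_search(start, total_completion_time)
```

With this particular cost, any ordering that is not sorted by processing time can be improved by a swap, so the descent ends at the shortest-processing-time ordering. For other objectives a local search of this kind may stop at a local optimum.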
==Popular Heuristic Algorithms==

===Genetic Algorithm===
The term Genetic Algorithm was first used by John Holland.<ref>J.H. Holland (1975) ''Adaptation in Natural and Artificial Systems,'' University of Michigan Press, Ann Arbor, Michigan; re-issued by MIT Press (1992).</ref> Genetic algorithms are designed to mimic the Darwinian theory of evolution, which states that populations of species evolve to produce organisms that are more complex and fitter for survival on Earth. Genetic algorithms operate on string structures, like biological structures, which evolve in time according to the rule of survival of the fittest by using a randomized yet structured information exchange. Thus, in every generation, a new set of strings is created, using parts of the fittest members of the old set.<ref>Optimal design of heat exchanger networks, Editor(s): Wilfried Roetzel, Xing Luo, Dezhen Chen, Design and Operation of Heat Exchangers and their Networks, Academic Press, 2020, Pages 231-317, <nowiki>ISBN 9780128178942</nowiki>, https://doi.org/10.1016/B978-0-12-817894-2.00006-6.</ref> The algorithm terminates when a satisfactory fitness level has been reached for the population or the maximum number of generations has been reached. The typical steps are<ref>Wang FS., Chen LH. (2013) Genetic Algorithms. In: Dubitzky W., Wolkenhauer O., Cho KH., Yokota H. (eds) Encyclopedia of Systems Biology. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9863-7_412 </ref>:

1. Choose an initial population of candidate solutions

2. Calculate the fitness (how good the solution is) of each individual

3. Perform crossover on the population: randomly choose pairs of individuals as parents and exchange some parts between the parents to generate new individuals

4. Perform mutation: randomly change some individuals to create other new individuals

5. Evaluate the fitness of the offspring

6. Select the surviving individuals

7. Proceed from 3 if the termination criteria have not been reached
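
The steps above can be sketched as follows. This is a minimal illustration under several assumptions that the text does not prescribe: individuals are bit strings, crossover is one-point, survivors are simply the fittest of parents and offspring, and the only termination criterion is a fixed number of generations. All names and parameter values are hypothetical.

```python
import random

def genetic_algorithm(fitness, n_bits, pop_size=20, generations=50,
                      mutation_rate=0.05, seed=0):
    """Minimal genetic algorithm on bit strings; returns the fittest individual found."""
    rng = random.Random(seed)
    # Step 1: initial population of random bit strings.
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):            # Step 7: repeat until the budget is spent
        offspring = []
        for _ in range(pop_size):
            # Step 3: one-point crossover between two randomly chosen parents.
            mother, father = rng.sample(population, 2)
            cut = rng.randint(1, n_bits - 1)
            child = mother[:cut] + father[cut:]
            # Step 4: flip each bit with a small probability.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            offspring.append(child)
        # Steps 2, 5 and 6: evaluate parents and offspring, keep the fittest.
        population = sorted(population + offspring, key=fitness, reverse=True)[:pop_size]
    return max(population, key=fitness)

# "One-max" toy problem: the fitness of a string is its number of 1 bits.
best = genetic_algorithm(fitness=sum, n_bits=16)
```

Because the survivors are chosen from parents and offspring together, the best fitness in the population never decreases from one generation to the next in this sketch.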

===Tabu Search Algorithm===
Tabu search (TS) is a heuristic algorithm created by Fred Glover<ref>Fred Glover (1986). "Future Paths for Integer Programming and Links to Artificial Intelligence". Computers and Operations Research. '''13''' (5): 533–549,https://doi.org/10.1016/0305-0548(86)90048-1</ref> that augments a descent-type local search with memory to avoid cycling while searching for an optimal solution. It does so by forbidding or penalizing moves that take the solution, in the next iteration, to points in the solution space previously visited. The algorithm spends some memory to keep a tabu list of forbidden moves, which are the moves of the previous iterations or moves that might be considered unwanted. A general algorithm is as follows<ref>Optimization of Preventive Maintenance Program for Imaging Equipment in Hospitals, Editor(s): Zdravko Kravanja, Miloš Bogataj, Computer-Aided Chemical Engineering, Elsevier, Volume 38, 2016, Pages 1833-1838, ISSN 1570-7946, <nowiki>ISBN 9780444634283</nowiki>, https://doi.org/10.1016/B978-0-444-63428-3.50310-6.</ref>:

1. Select an initial solution ''s''<sub>0</sub> ∈ ''S''. Initialize the tabu list ''L''<sub>0</sub> = ∅ and select a tabu list size. Set ''k'' = 0.

2. Determine the feasible neighborhood ''N''(''s<sub>k</sub>''), excluding inferior members of the tabu list ''L<sub>k</sub>''.

3. Select the next solution ''s''<sub>''k''+1</sub> from ''N''(''s<sub>k</sub>''), or from ''L<sub>k</sub>'' if it contains a better solution, and update the tabu list to ''L''<sub>''k''+1</sub>.

4. Stop if a termination condition is reached; otherwise, set ''k'' = ''k'' + 1 and return to 2.
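
A minimal sketch of this loop is shown below for orderings with pairwise swaps as moves. The move type, the rule that a tabu move is allowed only when it beats the best solution found so far, the fixed iteration budget, and the sample cost function are all assumptions made for illustration.

```python
from collections import deque
from itertools import combinations

def tabu_search(start, cost, iterations=100, tabu_size=5):
    """Minimal tabu search over orderings using pairwise swaps as moves."""
    current = list(start)
    best, best_cost = list(current), cost(current)
    tabu = deque(maxlen=tabu_size)           # recently used moves, oldest dropped first
    for _ in range(iterations):
        candidates = []
        for i, j in combinations(range(len(current)), 2):
            neighbor = list(current)
            neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
            neighbor_cost = cost(neighbor)
            # A tabu move is allowed only if it beats the best solution so far.
            if (i, j) not in tabu or neighbor_cost < best_cost:
                candidates.append((neighbor_cost, (i, j), neighbor))
        if not candidates:
            break
        # Take the best admissible neighbor, even if it is worse than `current`.
        neighbor_cost, move, current = min(candidates, key=lambda c: c[0])
        tabu.append(move)
        if neighbor_cost < best_cost:
            best, best_cost = list(current), neighbor_cost
    return best, best_cost

# Hypothetical cost: total completion time for assumed processing times of jobs 1..6.
processing_time = {1: 4, 2: 2, 3: 7, 4: 1, 5: 3, 6: 5}

def total_completion_time(order):
    elapsed, total = 0, 0
    for job in order:
        elapsed += processing_time[job]
        total += elapsed
    return total

best_order, best_value = tabu_search([3, 4, 1, 6, 5, 2], total_completion_time)
```

Unlike a plain descent, the search keeps moving after it reaches a local optimum; the tabu list discourages it from immediately undoing its recent moves, and the best solution seen is stored separately.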

==== Example: The Classical Vehicle Routing Problem ====
''Vehicle Routing Problems'' have very important applications in distribution management and have become some of the most studied problems in the combinatorial optimization literature. The methods proposed for them include several Tabu Search implementations that currently rank among the most effective. The ''Classical Vehicle Routing Problem'' (CVRP) is the basic variant in that class of problems. It can formally be defined as follows. Let ''G'' = (''V, A'') be a graph where ''V'' is the vertex set and ''A'' is the arc set. One of the vertices represents the ''depot'' at which a fleet of identical vehicles of capacity ''Q'' is based, and the other vertices represent customers that need to be serviced. With each customer vertex ''v<sub>i</sub>'' are associated a demand ''q<sub>i</sub>'' and a service time ''t<sub>i</sub>''. With each arc (''v<sub>i</sub>'', ''v<sub>j</sub>'') of ''A'' are associated a cost ''c<sub>ij</sub>'' and a travel time ''t<sub>ij</sub>''.<ref>Glover, Fred, and Gary A Kochenberger. Handbook Of Metaheuristics. Kluwer Academic Publishers, 2003.</ref> The CVRP consists of finding a set of routes such that:

1. Each route begins and ends at the depot

2. Each customer is visited exactly once by exactly one route

3. The total demand of the customers assigned to each route does not exceed ''Q''

4. The total duration of each route (including travel and service times) does not exceed a specified value ''L''

5. The total cost of the routes is minimized

A feasible solution for the problem thus consists of a partition of the customers into m groups, each of total demand no larger than ''Q'', that are sequenced to yield routes (starting and ending at the depot) of duration no larger than ''L''.
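
The feasibility conditions 1 to 4 can be written as a short check. The function name, the dictionary-based data layout and the small instance below are assumptions made for illustration; condition 5 (minimum cost) is an objective rather than a feasibility test and is not covered.

```python
def solution_is_feasible(routes, depot, demand, service_time, travel_time,
                         capacity, max_duration):
    """Check CVRP conditions 1 to 4 for a list of routes (each a list of vertices)."""
    visited = []
    for route in routes:
        if len(route) < 3 or route[0] != depot or route[-1] != depot:
            return False                                  # condition 1
        customers = route[1:-1]
        visited.extend(customers)
        if sum(demand[v] for v in customers) > capacity:
            return False                                  # condition 3
        duration = sum(travel_time[a, b] for a, b in zip(route, route[1:]))
        duration += sum(service_time[v] for v in customers)
        if duration > max_duration:
            return False                                  # condition 4
    # Condition 2: every customer appears exactly once over all routes.
    return sorted(visited) == sorted(demand)

# Hypothetical instance: depot 0 and customers 1, 2, 3 with symmetric travel times.
demand = {1: 4, 2: 3, 3: 5}
service_time = {1: 1, 2: 1, 3: 1}
one_way = {(0, 1): 2, (0, 2): 3, (0, 3): 4, (1, 2): 1, (1, 3): 5, (2, 3): 2}
travel_time = {}
for (a, b), t in one_way.items():
    travel_time[a, b] = travel_time[b, a] = t

two_routes_ok = solution_is_feasible([[0, 1, 2, 0], [0, 3, 0]], 0, demand,
                                     service_time, travel_time,
                                     capacity=8, max_duration=12)
one_route_ok = solution_is_feasible([[0, 1, 2, 3, 0]], 0, demand,
                                    service_time, travel_time,
                                    capacity=8, max_duration=12)
```

In this instance the two-route solution satisfies all four conditions, while the single route fails because its total demand of 12 exceeds the capacity of 8.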

===Simulated Annealing Algorithm===
The Simulated Annealing Algorithm was developed by Kirkpatrick et al. in 1983<ref>Kirkpatrick, S., Gelatt, C., & Vecchi, M. (1983). Optimization by Simulated Annealing. ''Science,'' ''220''(4598), 671-680. Retrieved November 25, 2020, from http://www.jstor.org/stable/1690046</ref> and is based on the analogy of ideal crystals in thermodynamics. The annealing process in metallurgy lets particles arrange themselves in the positions of minimum potential energy as the temperature is slowly decreased. The Simulated Annealing algorithm mimics this mechanism and uses the objective function of an optimization problem instead of the energy of a material to arrive at a solution. A general algorithm is as follows<ref>Brief review of static optimization methods, Editor(s): Stanisław Sieniutycz, Jacek Jeżowski, Energy Optimization in Process Systems and Fuel Cells (Third Edition), Elsevier, 2018, Pages 1-41, <nowiki>ISBN 9780081025574</nowiki>, https://doi.org/10.1016/B978-0-08-102557-4.00001-3.</ref> :

1. Fix the initial temperature (''T''<sub>0</sub>)

2. Generate a starting point '''''X'''''<sub>0</sub> (this is the best point '''''X'''''* at present)

3. Randomly generate a neighboring point '''''X<sub>S</sub>'''''

4. Accept '''''X<sub>S</sub>''''' as '''''X'''''* (currently best solution) if an acceptance criterion is met. This must be such a condition that the probability of accepting a worse point is greater than zero, particularly at higher temperatures

5. If an equilibrium condition is satisfied, go to (6), otherwise jump back to (3).

6. If termination conditions are not met, decrease the temperature according to a certain cooling scheme and jump back to (3). If the termination conditions are satisfied, stop calculations accepting the current best value '''''X'''''* as the final (‘optimal’) solution.
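
A minimal sketch of such a loop for a one-dimensional function is given below. The text leaves the acceptance criterion, the equilibrium condition and the cooling scheme open, so the Metropolis acceptance rule, the fixed number of trials per temperature and the geometric cooling used here are assumptions, as are all names and parameter values. The sketch also tracks the current point and the best point separately.

```python
import math
import random

def simulated_annealing(objective, x0, t0=10.0, cooling=0.9, steps_per_temp=50,
                        t_min=1e-3, step_size=1.0, seed=0):
    """Minimize a one-dimensional `objective` with a basic annealing loop."""
    rng = random.Random(seed)
    temperature = t0                                  # step 1
    current, current_value = x0, objective(x0)        # step 2
    best, best_value = current, current_value
    while temperature > t_min:                        # termination test of step 6
        for _ in range(steps_per_temp):               # fixed-length "equilibrium" (step 5)
            candidate = current + rng.uniform(-step_size, step_size)   # step 3
            candidate_value = objective(candidate)
            delta = candidate_value - current_value
            # Step 4: always accept improvements; accept worse points with
            # probability exp(-delta / T) (Metropolis criterion).
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, current_value = candidate, candidate_value
                if current_value < best_value:
                    best, best_value = current, current_value
        temperature *= cooling                        # geometric cooling (step 6)
    return best, best_value

def shifted_parabola(x):
    """Hypothetical test objective with its minimum at x = 3."""
    return (x - 3.0) ** 2

x_best, f_best = simulated_annealing(shifted_parabola, x0=-5.0)
```

At high temperature almost every candidate is accepted, which lets the search leave poor regions; as the temperature falls, worse points are accepted less and less often.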

== Numerical Example: Knapsack Problem ==
One of the most common applications of heuristic algorithms is the Knapsack Problem, in which items are selected from a given set (each with a mass and a value) so as to maximize the total value while staying under a certain mass limit. A greedy approximation algorithm sorts the items by their value per unit mass and then includes the items with the highest value per unit mass for as long as there is space remaining. The example below is solved exactly with a stage-wise (dynamic programming) recursion.
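
The greedy rule can be sketched as follows for the variant in which each item may be packed any number of times. The function name and the two sample items are assumptions; the items were chosen to show that the greedy answer is not guaranteed to be optimal.

```python
def greedy_knapsack(items, capacity):
    """Greedy heuristic for the integer knapsack with unlimited copies of each item.

    `items` maps an item name to (mass, value). Items are tried in order of
    decreasing value per unit mass, taking as many copies as still fit.
    """
    remaining = capacity
    packed, total_value = {}, 0
    ranked = sorted(items.items(), key=lambda kv: kv[1][1] / kv[1][0], reverse=True)
    for name, (mass, value) in ranked:
        count = remaining // mass
        if count > 0:
            packed[name] = count
            remaining -= count * mass
            total_value += count * value
    return packed, total_value

# Hypothetical items (mass, value), not taken from the example in the text.
items = {"a": (6, 7), "b": (5, 5)}
packed, value = greedy_knapsack(items, capacity=10)
```

Here the greedy rule packs one unit of item "a" for a value of 7, whereas two units of item "b" would fit and be worth 10. This gap is the price paid for the speed of the heuristic.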

'''<big>Example</big>'''

The following table specifies the weights and values per unit of five different products held in storage. The quantity of each product is unlimited. A plane with a weight capacity of 13 is to be used, for one trip only, to transport the products. We would like to know how many units of each product should be loaded onto the plane to maximize the value of goods shipped.
{| class="wikitable"
|+
!Product (i)
!Weight per unit (w<sub>i</sub>)
!Value per unit (v<sub>i</sub>)
|-
|1
|7
|9
|-
|2
|5
|4
|-
|3
|4
|3
|-
|4
|3
|2
|-
|5
|1
|0.5
|}
'''<big>Solution:</big>'''

'''(a) Stages:'''

We view each type of product as a stage, so there are 5 stages. We can also add a sixth stage representing the endpoint after all decisions have been made.

'''(b) States:'''

We can view the remaining capacity as states, so there are 14 states in each stage: 0,1, 2, 3, …13

'''(c) Possible decisions at each stage:'''

Suppose we are in state s in stage n (n < 6), so there are s units of capacity remaining. Then the possible number of items we can pack is:

j = 0, 1, …, ⌊s/w<sub>n</sub>⌋

For each such action j, there is an arc going from state s in stage n to state s – j*w<sub>n</sub> in stage n + 1. For each arc in the graph, there is a corresponding benefit j*v<sub>n</sub>. We are trying to find a maximum benefit path from state 13 in stage 1 to stage 6.

'''(d) Optimization function:'''

Let f<sub>n</sub>(s) be the value of the maximum benefit possible with items of type n or greater using total capacity at most s.

'''(e) Boundary conditions:'''

The sixth stage should have all zeros, that is, f<sub>6</sub>(s) = 0 for each s = 0, 1, …, 13.

'''(f) Recurrence relation:'''

f<sub>n</sub>(s) = max {j*v<sub>n</sub> + f<sub>n+1</sub>(s – j*w<sub>n</sub>)}, j = 0, 1, …, ⌊s/w<sub>n</sub>⌋

'''(g) Compute:'''

The solution does not show all the computation steps. Instead, only a few cases are given below to illustrate the idea.

* For stage 5, f<sub>5</sub>(s) = max<sub>j=0, 1, …, ⌊s/1⌋</sub> {j*0.5 + 0} = 0.5s, because given the all-zero states in stage 6, the maximum possible value is obtained by using up all the remaining capacity s.
* For stage 4, state 7,

f<sub>4</sub>(7) = max<sub>j=0, 1, …, ⌊7/w<sub>4</sub>⌋</sub> {j*v<sub>4</sub> + f<sub>5</sub>(7 – w<sub>4</sub>*j)}

= max {0 + 3.5; 2 + 2; 4 + 0.5}

= 4.5

Using the recurrence relation above, we get the following table:
{| class="wikitable"
|+
!Unused Capacity
s
!f1(s)
!Type 1
opt
!f2(s)
!Type 2
opt
!f3(s)
!Type 3
opt
!f4(s)
!Type 4
opt
!f5(s)
!Type 5
opt
!f6(s)
|-
|13
|13.5
|1
|10
|2
|9.5
|3
|8.5
|4
|6.5
|13
|0
|-
|12
|13
|1
|9
|2
|9
|3
|8
|4
|6
|12
|0
|-
|11
|12
|1
|8.5
|2
|8
|2
|7
|3
|5.5
|11
|0
|-
|10
|11
|1
|8
|2
|7
|2
|6.5
|3
|5
|10
|0
|-
|9
|10
|1
|7
|1
|6.5
|2
|6
|3
|4.5
|9
|0
|-
|8
|9.5
|1
|6
|1
|6
|2
|5
|2
|4
|8
|0
|-
|7
|9
|1
|5
|1
|5
|1
|4.5
|2
|3.5
|7
|0
|-
|6
|4.5
|0
|4.5
|1
|4
|1
|4
|2
|3
|6
|0
|-
|5
|4
|0
|4
|1
|3.5
|1
|3
|1
|2.5
|5
|0
|-
|4
|3
|0
|3
|0
|3
|1
|2.5
|1
|2
|4
|0
|-
|3
|2
|0
|2
|0
|2
|0
|2
|1
|1.5
|3
|0
|-
|2
|1
|0
|1
|0
|1
|0
|1
|0
|1
|2
|0
|-
|1
|0.5
|0
|0.5
|0
|0.5
|0
|0.5
|0
|0.5
|1
|0
|-
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|}
'''Optimal solution:''' The maximum benefit possible is 13.5. Tracing forward to get the optimal solution: the optimal decision corresponding to the entry 13.5 for f<sub>1</sub>(13) is 1, therefore we should pack 1 unit of type 1. After that we have 13 - 7 = 6 capacity remaining, so look at f<sub>2</sub>(6), which is 4.5 and corresponds to the optimal decision of packing 1 unit of type 2. After this, we have 6 - 5 = 1 capacity remaining, and the optimal decisions for f<sub>3</sub>(1) and f<sub>4</sub>(1) are both 0, which means we are not able to pack any type 3 or type 4. Hence we go to stage 5 and find that the optimal decision for f<sub>5</sub>(1) is 1, so we should pack 1 unit of type 5. The total value is 9 + 4 + 0.5 = 13.5. This gives the entire optimal solution, as can be seen in the table below:
{| class="wikitable"
|+
! colspan="2" |Optimal solution
|-
!Product (i)
!Number of units
|-
|1
|1
|-
|2
|1
|-
|5
|1
|}
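
The recursion and the forward trace can be reproduced with a short sketch. The function name and the list-based tables are assumptions made for illustration; stages are numbered from 0 in the code, so stage n in the text corresponds to index n - 1. Where several decisions tie, this sketch keeps the smallest number of units, so individual decision entries may differ from the table above even though the values f<sub>n</sub>(s) agree.

```python
def knapsack_dp(weights, values, capacity):
    """Stage-wise recursion f_n(s) = max_j { j*v_n + f_{n+1}(s - j*w_n) }."""
    n_items = len(weights)
    # f[n][s] for stages n = 0..n_items (the last stage is the all-zero boundary).
    f = [[0] * (capacity + 1) for _ in range(n_items + 1)]
    decision = [[0] * (capacity + 1) for _ in range(n_items)]
    for n in range(n_items - 1, -1, -1):
        for s in range(capacity + 1):
            for j in range(s // weights[n] + 1):
                candidate = j * values[n] + f[n + 1][s - j * weights[n]]
                if candidate > f[n][s]:
                    f[n][s], decision[n][s] = candidate, j
    # Trace forward from full capacity to recover the number of units per product.
    units, s = [], capacity
    for n in range(n_items):
        units.append(decision[n][s])
        s -= decision[n][s] * weights[n]
    return f[0][capacity], units

# Weights and values per unit of the five products, plane capacity 13.
best_value, units = knapsack_dp([7, 5, 4, 3, 1], [9, 4, 3, 2, 0.5], capacity=13)
```

For the data of this example the sketch returns a value of 13.5 with one unit each of products 1, 2 and 5.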

==Applications==
Heuristic algorithms have become an important technique in solving current real-world problems. Their applications range from optimizing the power flow in modern power systems<ref> NIU, M., WAN, C. & Xu, Z. A review on applications of heuristic optimization algorithms for optimal power flow in modern power systems. J. Mod. Power Syst. Clean Energy 2, 289–297 (2014), https://doi.org/10.1007/s40565-014-0089-4</ref> to groundwater pumping simulation models<ref> J. L. Wang, Y. H. Lin and M. D. Lin, "Application of heuristic algorithms on groundwater pumping source identification problems," 2015 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), Singapore, 2015, pp. 858-862, https://doi.org/10.1109/IEEM.2015.7385770.</ref>. Heuristic optimization techniques are also increasingly applied in environmental engineering, for example in the design of a multilayer sorptive barrier system for a landfill liner.<ref>Matott, L. Shawn, et al. “Application of Heuristic Optimization Techniques and Algorithm Tuning to Multilayered Sorptive Barrier Design.” Environmental Science & Technology, vol. 40, no. 20, 2006, pp. 6354–6360., https://doi.org/10.1021/es052560+.</ref> Heuristic algorithms have also been applied in the fields of bioinformatics, computational biology, and systems biology.<ref>Larranaga P, Calvo B, Santana R, Bielza C, Galdiano J, Inza I, Lozano JA, Armananzas R, Santafe G, Perez A, Robles V (2006) Machine learning in bioinformatics. Brief Bioinform 7(1):86–112 </ref>

==Conclusion==
Heuristic algorithms are not a panacea, but they are handy tools when exact methods cannot be used. Heuristics can provide flexible techniques to solve hard problems with the advantage of simple implementation and low computational cost. Over the years, we have seen a progression in heuristics with the development of hybrid systems that combine selected features from various types of heuristic algorithms such as tabu search, simulated annealing, and genetic or evolutionary computing. Future research will continue to expand the capabilities of existing heuristics to solve complex real-world problems.

==References==
<references />

Column generation algorithms

2020-12-21T11:37:13Z

Wc593:

Author: Lorena Garcia Fernandez (lgf572) (SysEn 5800 Fall 2020)

== Introduction ==
Column generation techniques aim to solve large linear optimization problems by generating only the variables that will have an influence on the objective function. This is important for big problems with many variables, where these techniques simplify the problem formulation because not all the possibilities need to be explicitly listed.<ref>Desrosiers, Jacques & Lübbecke, Marco. (2006). A Primer in Column Generation.p7-p14 10.1007/0-387-25486-2_1. </ref>

== Theory, methodology and algorithmic discussions ==
'''''Theory'''''

The method works as follows: first, the original problem that is being solved needs to be split into two problems, the master problem and the sub-problem.

* The master problem is the original column-wise (i.e., one column at a time) formulation of the problem with only a subset of variables being considered. Because it is restricted to this subset of columns, it is also referred to as the RMP or “restricted master problem”.<ref>
AlainChabrier, Column Generation techniques, 2019 URL: https://medium.com/@AlainChabrier/column-generation-techniques-6a414d723a64
</ref>

* The sub-problem is a new problem created to identify a new promising variable. The objective function of the sub-problem is the reduced cost of the new variable with respect to the current dual variables, and the constraints require that the variable obeys the naturally occurring constraints. The sub-problem is also referred to as the pricing problem. From this we can infer that this method will be a good fit for problems whose constraint set admits a natural breakdown (i.e., decomposition) into sub-systems representing a well understood combinatorial structure.<ref>
AlainChabrier, Column Generation techniques, 2019 URL: https://medium.com/@AlainChabrier/column-generation-techniques-6a414d723a64
</ref>

Different techniques exist to decompose the original problem into master and subproblems. The theory behind this method relies on the Dantzig-Wolfe decomposition.<ref>Dantzig-Wolfe decomposition. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Dantzig-Wolfe_decomposition&oldid=50750</ref>

In summary, when the master problem is solved, we are able to obtain dual prices for each of the constraints in the master problem. This information is then utilized in the objective function of the subproblem, and the subproblem is solved. For a minimization problem, if the objective value of the subproblem is negative, a variable with negative reduced cost has been identified. This variable is then added to the master problem, and the master problem is re-solved. Re-solving the master problem generates a new set of dual values, and the process is repeated until no negative reduced cost variables are identified. If the subproblem returns a solution with non-negative reduced cost, we can conclude that the solution to the master problem is optimal.<ref>Wikipedia, the free encyclopedia. Column Generation. URL: https://en.wikipedia.org/wiki/Column_generation</ref>

'''''Methodology'''''<ref>L.A. Wolsey, Integer programming. Wiley,Column Generation Algorithms p185-p189,1998</ref>
[[File:Column Generation.png|thumb|468x468px|Column generation schematics<ref name=":4">GERARD. (2005). Personnel and Vehicle scheduling, Column Generation, slide 12. URL: https://slideplayer.com/slide/6574/</ref>]]
Consider the problem in the form:

(IP)
<math>z=max\left \{\sum_{k=1}^{K}c^{k}x^{k}:\sum_{k=1}^{K}A^{k}x^{k}=b,x^{k}\in X^{k}\; \; \; for\; \; \; k=1,...,K \right \}</math>

Where <math>X^{k}=\left \{x^{k}\in Z_{+}^{n_{k}}: D^{k}x^{k}\leq d^{k} \right \}</math> for <math>k=1,...,K</math>. Assuming that each set <math>X^{k}</math> contains a large but finite set of points <math>\left \{ x^{k,t} \right \}_{t=1}^{T_{k}}</math>, we have that:

<math>X^{k}=\left \{ x^{k}\in R^{n_{k}}:x^{k}=\sum_{t=1}^{T_{k}}\lambda _{k,t}x^{k,t},\sum_{t=1}^{T_{k}}\lambda _{k,t}=1,\lambda _{k,t}\in \left \{ 0,1 \right \}\; \; for \; \; t=1,...,T_{k} \right \}</math>

Note that, on the assumption that each of the sets <math>X^{k}</math> is bounded for <math>k=1,...,K</math>, the approach will involve solving an equivalent problem of the form below:

<math>max\left \{ \sum_{k=1}^{K}\gamma ^{k}\lambda ^{k}: \sum_{k=1}^{K}B^{k}\lambda ^{k}=\beta ,\lambda ^{k}\geq 0\; \; integer\; \; for\; \; k=1,...,K \right \}</math>

where each matrix <math>B^{k}</math> has a very large number of columns, one for each of the feasible points in <math>X^{k}</math>, and each vector <math>\lambda ^{k}</math> contains the corresponding variables.

Now, substituting for <math>x^{k}</math> leads to an equivalent ''IP Master Problem (IPM)'':

(IPM)
<math>\begin{matrix}
z=max\sum_{k=1}^{K}\sum_{t=1}^{T_{k}}\left(c^{k}x^{k,t}\right )\lambda _{k,t} \\ \sum_{k=1}^{K}\sum_{t=1}^{T_{k}}\left ( A^{k}x^{k,t} \right )\lambda _{k,t}=b\\
\sum_{t=1}^{T_{k}}\lambda _{k,t}=1\; \; for\; \; k=1,...,K \\
\lambda _{k,t}\in \left \{ 0,1 \right \}\; \; for\; \; t=1,...,T_{k}\; \; and\; \; k=1,...,K.
\end{matrix}</math>

We use a column generation algorithm to solve the linear programming relaxation of the Integer Programming Master Problem, called the ''Linear Programming Master Problem (LPM)'':

(LPM)
<math>\begin{matrix}
z^{LPM}=max\sum_{k=1}^{K}\sum_{t=1}^{T_{k}}\left ( c^{k}x^{k,t} \right )\lambda _{k,t}\\
\sum_{k=1}^{K}\sum_{t=1}^{T_{k}}\left ( A^{k}x^{k,t} \right )\lambda _{k,t}=b \\
\sum_{t=1}^{T_{k}}\lambda _{k,t}=1\; \;for\; \; k=1,...,K \\
\lambda _{k,t} \geq 0\; \; for\; \; t=1,...,T_{k},\; k=1,...,K
\end{matrix}</math>

Where there is a column <math>\begin{pmatrix}
c^{k}x\\
A^{k}x\\
e_{k}
\end{pmatrix}</math> for each ''<math>x</math>'' ''<math display="inline">\in</math> <math display="inline">X^{k}</math>''. In the next steps of this method, we will use <math>\left \{ \pi _{i} \right \}_{i=1}^{m}</math> as the dual variables associated with the joint constraints, and <math>\left \{ \mu_{k} \right \}_{k=1}^{K}</math> as dual variables for the second set of constraints. The latter are also known as convexity constraints.

The idea is to solve the linear program by the primal simplex algorithm. However, the pricing step of choosing a column to enter the basis must be modified because of the very large number of columns in play. Instead of pricing the columns one at a time, the question of finding a column with the largest reduced price is itself a set of <math>K</math> optimization problems.

''Initialization:'' we suppose that a subset of columns (at least one for each <math>k</math>) is available, providing a feasible ''Restricted Linear Programming Master Problem'':

(RLPM)
<math>\begin{matrix}
\tilde{z}^{LPM}=max\tilde{c}\tilde{\lambda} \\
\tilde{A}\tilde{\lambda }=\tilde{b} \\
\tilde{\lambda }\geq 0
\end{matrix}</math>

where <math>\tilde{b}=\begin{pmatrix}
b\\
1\\
\end{pmatrix}</math>, <math>\tilde{A}</math> is generated by the available set of columns and <math>\tilde{c}</math>, <math>\tilde{\lambda }</math> are the corresponding costs and variables. Solving the RLPM gives an optimal primal solution <math>\tilde{\lambda ^{*}}</math> and an optimal dual solution <math>\left ( \pi ,\mu \right )\in\; R^{m}\times R^{K}</math>.

''Primal feasibility:'' Any feasible solution of ''RLPM'' is feasible for ''LPM''. More precisely, <math>\tilde{\lambda^{*} }</math> is a feasible solution of ''LPM'', and hence <math>\tilde{z}^{LPM}=\tilde{c}\tilde{\lambda ^{*}}=\sum_{i=1}^{m}\pi _{i}b_{i}+\sum_{k=1}^{K}\mu _{k}\leq z^{LPM}</math>

''Optimality check for LPM:'' It is required to check whether <math>\left ( \pi ,\mu \right )</math> is dual feasible for ''LPM''. This means checking, for each column (that is, for each <math>k</math> and for each <math>x\; \in \; X^{k}</math>), whether the reduced price satisfies <math>c^{k}x-\pi A^{k}x-\mu _{k}\leq 0</math>. Rather than examining each point separately, we treat all points in <math>X^{k}</math> implicitly, by solving an optimization subproblem:

<math>\zeta _{k}=max\left \{ \left (c^{k}-\pi A^{k} \right )x-\mu _{k}\; :\; x\; \in \; X^{k}\right \}.</math>
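
The pricing subproblem can be illustrated with a small sketch that evaluates the reduced price over an explicitly listed set of points. This explicit enumeration is an assumption made only to keep the example self-contained; in practice the subproblem is solved as an optimization problem precisely because the set of points is too large to list. The function name and all numbers are hypothetical.

```python
def price_out(c, A, pi, mu_k, points):
    """Return (zeta_k, best_point) for max{(c - pi A) x - mu_k : x in points}.

    `c` is the cost vector c^k, `A` the matrix A^k as a list of rows, `pi` the duals
    of the joint constraints and `mu_k` the dual of the k-th convexity constraint.
    """
    n = len(c)
    # Reduced cost coefficients c_j - sum_i pi_i * A[i][j].
    reduced = [c[j] - sum(pi[i] * A[i][j] for i in range(len(pi))) for j in range(n)]
    best_point = max(points, key=lambda x: sum(reduced[j] * x[j] for j in range(n)))
    zeta = sum(reduced[j] * best_point[j] for j in range(n)) - mu_k
    return zeta, best_point

# Hypothetical data: two variables, one joint constraint, four candidate points.
c = [3, 2]
A = [[1, 1]]
points = [(0, 0), (1, 0), (0, 1), (1, 1)]
zeta, x_new = price_out(c, A, pi=[2.5], mu_k=0.0, points=points)
zeta_stop, _ = price_out(c, A, pi=[3.0], mu_k=0.0, points=points)
```

With the first set of duals the point (1, 0) prices out with a positive reduced price of 0.5, so its column would enter the restricted master problem. With the second set no point has a positive reduced price.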

''Stopping criteria:'' If <math>\zeta _{k}\leq 0</math> for <math>k=1,...,K</math>, the solution <math>\left ( \pi ,\mu \right )</math> is dual feasible for ''LPM'', and hence <math>z^{LPM}\leq \sum_{i=1}^{m}\pi _{i}b_{i}+\sum_{k=1}^{K}\mu _{k}</math>. As the value of the primal feasible solution <math>\tilde{\lambda }</math> equals that of this upper bound, <math>\tilde{\lambda }</math> is optimal for ''LPM''.

''Generating a new column:'' If <math>\zeta _{k}> 0</math> for some <math>k</math>, the column corresponding to the optimal solution <math>\tilde{x}^{k}</math> of the subproblem has a positive reduced price. Introducing the column <math>\begin{pmatrix}
c^{k}\tilde{x}^{k}\\
A^{k}\tilde{x}^{k}\\
e_{k}
\end{pmatrix}</math> then leads to a new Restricted Linear Programming Master Problem that can be easily reoptimized (e.g., by the primal simplex algorithm).

== Numerical example: The Cutting Stock problem<ref>L.A. Wolsey, Integer Programming. Wiley, 1998. Column Generation Algorithms, pp. 185-189.</ref> ==

Suppose we want to solve a numerical example of the cutting stock problem, specifically a one-dimensional cutting stock problem.

''Problem Overview''

A company produces steel bars with diameter <math>45</math> millimeters and length <math>33</math> meters. The company also takes care of cutting the bars for their different customers, who each require different lengths. At the moment, the following demand forecast is expected and must be satisfied:
{| class="wikitable"
!Pieces needed
!Piece length (m)
!Type of item
|-
|144
|6
|1
|-
|105
|13.5
|2
|-
|72
|15
|3
|-
|30
|16.5
|4
|-
|24
|22.5
|5
|}
The objective is to determine the minimum number of steel bars needed to satisfy the total demand.

A possible model for the problem, proposed by Gilmore and Gomory in the 1960s, is the one below:

'''Sets'''

<math>K=\left \{ 1,2,3,4,5 \right \}</math>: set of item types;

<math display="inline">S</math>: set of patterns (i.e., possible ways) that can be adopted to cut a given bar into portions of the needed lengths.

'''Parameters'''

<math display="inline">M</math>: bar length (before the cutting process);

<math display="inline">L_k</math>: length of item <math display="inline">k \in K</math>;

<math display="inline">R_k</math>: number of pieces of type <math display="inline">k \in K</math> required;

<math display="inline">N_{k,s}</math>: number of pieces of type <math display="inline">k \in K</math> in pattern <math display="inline">s \in S</math>.

'''Decision variables'''

<math display="inline">y_s</math>: number of bars that should be portioned using pattern <math display="inline">s \in S</math>.

'''Model'''

<math>\begin{matrix}\min \sum_{s\in S}y_s \\ \text{s.t.} \ \sum_{s\in S}N_{k,s}y_s\geq R_k \quad \forall k\in K \\ y_s\in \mathbb{Z}_+ \quad \forall s\in S \end{matrix}</math>
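To get a feel for the pattern set <math display="inline">S</math> in this instance, the feasible patterns can be listed by brute force. The following is only an illustrative sketch: the function name <code>enumerate_patterns</code> is hypothetical, and lengths are expressed in half-metres (an assumption made here so that all data are integers).

```python
def enumerate_patterns(lengths, capacity):
    """Return every nonzero vector z of nonnegative integers
    with sum(lengths[k] * z[k]) <= capacity."""
    patterns = []

    def extend(k, remaining, partial):
        # All item types have been assigned a count: store the pattern.
        if k == len(lengths):
            if any(partial):
                patterns.append(tuple(partial))
            return
        # Try every count of item k that still fits in the remaining length.
        for count in range(remaining // lengths[k] + 1):
            extend(k + 1, remaining - count * lengths[k], partial + [count])

    extend(0, capacity, [])
    return patterns


# Item lengths 6, 13.5, 15, 16.5, 22.5 m and bar length 33 m, in half-metres.
lengths_half_m = [12, 27, 30, 33, 45]
bar_half_m = 66
patterns = enumerate_patterns(lengths_half_m, bar_half_m)
print(len(patterns), "feasible patterns")
```

Even for five item types the list is much longer than the five single-item patterns, and it grows quickly with the number of item types and the bar length, which is why enumerating it is avoided in practice.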

''Solving the problem''

The model assumes the availability of the set <math display="inline">S</math> and the parameters <math display="inline">N_{k,s}</math>. To generate this data, one would have to list all possible cutting patterns. However, the number of possible cutting patterns can be extremely large, which is why a direct implementation of the model above is not practical for real-world problems. In this situation it makes sense to solve the continuous relaxation of the above model: in practice the demand figures are so high that the number of bars to cut is also large, and therefore a good solution can be determined by rounding up to the next integer each variable <math>y_s</math> found by solving the continuous relaxation. In addition, the solution of the relaxed problem can serve as the starting point for an exact solution method (for instance, branch and bound).<blockquote>''Key take-away: In the next steps of this example we will analyze how to solve the continuous relaxation of the model.''</blockquote>As a starting point, we need any feasible solution. Such a solution can be constructed as follows:

# We consider the single-item cutting patterns, i.e., <math>|K|</math> configurations, each containing <math display="inline">N_{k,k} = \left\lfloor \frac{M}{L_k} \right\rfloor</math> pieces of type <math>k</math>;
# Set <math display="inline">y_{k} = \frac{R_k}{N_{k,k}}</math> for pattern <math>k</math> (where pattern <math>k</math> is the pattern containing only pieces of type <math>k</math>).
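The two steps above can be sketched in a few lines of Python using the data of this example (bar length 33 m and the demand table). The variable names are illustrative only, and exact fractions are used to avoid rounding issues with the half-metre lengths.

```python
from fractions import Fraction
from math import floor

# Data of the example: bar length, item lengths (m) and demanded pieces.
M = Fraction(33)
lengths = [Fraction(6), Fraction("13.5"), Fraction(15), Fraction("16.5"), Fraction("22.5")]
demand = [144, 105, 72, 30, 24]

# Step 1: number of pieces in each single-item pattern, floor(M / L_k).
pieces_per_bar = [floor(M / length) for length in lengths]

# Step 2: bars needed per single-item pattern in the continuous relaxation.
y = [Fraction(required, n) for required, n in zip(demand, pieces_per_bar)]

print(pieces_per_bar)
print([float(value) for value in y], float(sum(y)))
```

Rounding each <math>y_k</math> up to the next integer would give a feasible solution of the integer model as well.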

This solution can also be obtained by applying the simplex method to the model (without integrality constraints), considering only the decision variables that correspond to the above single-item patterns. With <math>M=33</math>, the single-item patterns contain <math>5</math>, <math>2</math>, <math>2</math>, <math>2</math> and <math>1</math> pieces, respectively:

<math>\begin{align}
\text{min} & ~~ y_{1}+y_{2}+y_{3}+y_{4}+y_{5}\\
\text{s.t} & ~~ 5y_{1} \ge 144\\
\ & ~~ 2y_{2} \ge 105\\
\ & ~~ 2y_{3} \ge 72\\
\ & ~~ 2y_{4} \ge 30\\
\ & ~~ y_{5} \ge 24\\
\ & ~~ y_{1},y_{2},y_{3},y_{4},y_{5} \ge 0\\
\end{align}</math>

In fact, if we solve this problem (for example, using the CPLEX solver in GAMS), the solution is as below:
{| class="wikitable"
|<math>y_1</math>
|28.8
|-
|<math>y_2</math>
|52.5
|-
|<math>y_3</math>
|36
|-
|<math>y_4</math>
|15
|-
|<math>y_5</math>
|24
|}
Next, a new possible pattern (number <math>6</math>) is considered. This pattern contains one piece of item type <math>1</math> and one piece of item type <math>5</math>. The question is whether the current solution would remain optimal if this new pattern were allowed. Duality helps answer this question. At every iteration of the simplex method, the outcome is a basic feasible solution (corresponding to some basis <math>B</math>) for the primal problem and a dual solution (the multipliers <math>u^{T}=c_{B}^{T}B^{-1}</math>) that satisfy the complementary slackness conditions. (Note: the dual solution is feasible only when the last iteration is reached.)

The inclusion of new pattern <math>6</math> corresponds to including a new variable in the primal problem, with objective cost <math>1</math> (as each time pattern <math>6</math> is chosen, one bar is cut) and corresponding to the following column in the constraint matrix:

<math>D_6= \begin{bmatrix}
\ 1 \\
\ 0 \\
\ 0 \\
\ 0 \\
\ 1 \\
\end{bmatrix}</math>

This variable creates a new dual constraint. We then have to check whether this new constraint is violated by the current dual solution (or in other words, ''whether the reduced cost of the new variable with respect to basis <math>B</math> is negative'').

The new dual constraint is: <math>1\times u_{1}+0\times u_{2}+0\times u_{3}+0\times u_{4}+1\times u_{5}\leq 1</math>

The solution of the dual problem can be computed with different software packages (for instance, GAMS), or by hand. The table below shows the dual solution for this example.

(Note that the dual solution is given by <math>u^{T}=c_{B}^{T}B^{-1}</math>.)

{| class="wikitable"
!Dual variable
!Variable value
|-
|<math>u_1</math>
|0.2
|-
|<math>u_2</math>
|0.5
|-
|<math>u_3</math>
|0.5
|-
|<math>u_4</math>
|0.5
|-
|<math>u_5</math>
|1
|}
Since <math>u_1+u_5=0.2+1=1.2> 1</math>, the new constraint is violated.

This means that the current primal solution (in which the new variable is <math>y_{6}=0</math>) may not be optimal anymore (although it is still feasible). The fact that the dual constraint is violated means the associated primal variable has negative reduced cost:

<math>\bar{c}_6 = c_6-u^TD_6=1-1.2=-0.2</math>
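The reduced-cost check for a candidate pattern can be sketched as follows. This is a minimal illustration, assuming the restricted problem contains only the five single-item patterns with <math>5, 2, 2, 2, 1</math> pieces per bar (the values implied by <math>M=33</math>), in which case the optimal dual values are <math>u_k = 1/N_{k,k}</math>. The function name <code>reduced_cost</code> is hypothetical.

```python
from fractions import Fraction

# Pieces per bar in the single-item patterns (assumed: M = 33 m).
pieces_per_bar = [5, 2, 2, 2, 1]

# For this diagonal restricted LP the optimal duals are u_k = 1 / N_kk.
duals = [Fraction(1, n) for n in pieces_per_bar]


def reduced_cost(column, duals, cost=1):
    """Reduced cost c - u^T D of a column D with objective coefficient c."""
    return cost - sum(u * a for u, a in zip(duals, column))


D6 = [1, 0, 0, 0, 1]  # one piece of type 1 and one piece of type 5
rc = reduced_cost(D6, duals)
print(float(rc))  # negative: the new dual constraint is violated
```

A negative value signals that adding the column can improve the restricted problem; a nonnegative value means the current solution stays optimal when the column is allowed.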

The next step is to let <math>y_{6}</math> enter the basis. To do so, we modify the problem by inserting the new variable as below:

<math>\begin{align}
\text{min} & ~~ y_{1}+y_{2}+y_{3}+y_{4}+y_{5}+y_{6}\\
\text{s.t} & ~~ 5y_{1} +y_{6}\ge 144\\
\ & ~~ 2y_{2} \ge 105\\
\ & ~~ 2y_{3} \ge 72\\
\ & ~~ 2y_{4} \ge 30\\
\ & ~~ y_{5}+y_{6} \ge 24\\
\ & ~~ y_{1},y_{2},y_{3},y_{4},y_{5},y_{6} \ge 0\\
\end{align}</math>

If this problem is solved with the simplex method, the optimal solution is found, but restricted only to patterns <math>1</math> to <math>6</math>. If a new pattern becomes available, a decision should be made whether this new pattern should be used or not by proceeding as above. However, the problem is how to find a pattern (i.e., a variable, or equivalently a column of the matrix) whose reduced cost is negative (i.e., one that is convenient to include in the formulation). At this point one can notice that the number of possible patterns is exponentially large, and the patterns are not even all known explicitly. The question then is:

''Given a basic optimal solution for the problem in which only some variables are included, how can we find (if any exists) a variable with negative reduced cost (i.e., a constraint violated by the current dual solution)?''

This question can be transformed into an optimization problem: in order to see whether a variable with negative reduced cost exists, we can look for the minimum of the reduced costs of all possible variables and check whether this minimum is negative:

<math>\bar{c}=1-u^Tz</math>

Every column of the constraint matrix corresponds to a cutting pattern, and every entry of the column says how many pieces of a certain type are in that pattern. Hence, in order for <math>z</math> to be a possible column of the constraint matrix, the following conditions must be satisfied:

<math display="inline">\begin{matrix}z_k\in \mathbb{Z}_+ \quad \forall k\in K \\ \ \sum_kL_kz_k \leq M \end{matrix}</math>

This converts the problem of finding a variable with negative reduced cost into the integer linear programming problem below:

<math>\begin{matrix}\min\ \bar{c} = 1 - \sum_{k=1}^K u_k z_k \\ \ \text{s.t.} \ \sum_kL_kz_k \leq M \\ z_k\in \mathbb{Z}_+ \quad \forall k\in K \end{matrix}</math>

which, in turn, would be equivalent to the below formulation (we just write the objective in maximization form and ignore the additive constant <math>1</math>):

<math>\begin{matrix} \max\sum_{k=1}^K u_k z_k \\ \ \text{s.t.} \ \sum_kL_kz_k \leq M \\ z_k\in \mathbb{Z}_+ \quad \forall k\in K \end{matrix}</math>

The coefficients <math>z_k</math> of a column with negative reduced cost can be found by solving the above integer [[wikipedia:Knapsack_problem|"knapsack"]] problem (a classical problem in integer programming).

In our example, if we start from the problem restricted to the five single-item patterns, the above problem reads as:

<math>\begin{align}
\text{max} & ~~ 0.2z_{1}+0.5z_{2}+0.5z_{3}+0.5z_{4}+z_{5}\\
\text{s.t} & ~~ 6z_{1} +13.5z_{2}+15z_{3}+16.5z_{4}+22.5z_{5}\le 33\\
\ & ~~ z_{1},z_{2},z_{3},z_{4},z_{5}\in \mathbb{Z}_+\\
\end{align}</math>

which has the following optimal solution, with objective value <math>1.2</math>: <math>z^T= [1 \quad 0\quad 0\quad 0\quad 1]</math>

This matches the pattern we called <math>D_6</math> earlier on this page.
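The pricing knapsack can be solved with a small dynamic program. The sketch below is illustrative rather than a reference implementation: it assumes dual values <math>u_k = 1/N_{k,k}</math> for the five single-item patterns (with <math>5, 2, 2, 2, 1</math> pieces per bar), expresses lengths in half-metres so that capacities are integers, and uses the hypothetical function name <code>price_pattern</code>. When several patterns attain the same value, it returns one of them.

```python
from fractions import Fraction


def price_pattern(duals, lengths, capacity):
    """Unbounded integer knapsack: maximize sum(u_k z_k) s.t. sum(L_k z_k) <= capacity."""
    empty = (Fraction(0), (0,) * len(lengths))
    best = [empty] * (capacity + 1)  # best[c] = (value, pattern) for capacity c
    for cap in range(1, capacity + 1):
        value, pattern = best[cap - 1]  # leaving one unit unused is always allowed
        for k, (u, length) in enumerate(zip(duals, lengths)):
            if length <= cap:
                prev_value, prev_pattern = best[cap - length]
                if prev_value + u > value:
                    value = prev_value + u
                    pattern = prev_pattern[:k] + (prev_pattern[k] + 1,) + prev_pattern[k + 1:]
        best[cap] = (value, pattern)
    return best[capacity]


duals = [Fraction(1, n) for n in (5, 2, 2, 2, 1)]  # assumed dual values
lengths_half_m = [12, 27, 30, 33, 45]              # 6, 13.5, 15, 16.5, 22.5 m
value, pattern = price_pattern(duals, lengths_half_m, 66)

# The pattern has negative reduced cost exactly when the knapsack value exceeds 1.
print(pattern, float(value), float(1 - value))
```

Working with exact fractions keeps the comparison with <math>1</math> free of floating-point noise.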

''Optimality test''

If <math display="inline">\sum_{k=1}^{K}z_{k}^{*}u_{k}^{*}\leq 1</math>, then <math>y^*</math> is an optimal solution of the full continuous relaxed problem (that is, including all patterns in <math display="inline">S</math>).

If this condition does not hold, we update the master problem by including in the restricted pattern set <math display="inline">S'</math> the new pattern defined by <math>z^*</math> (in practical terms, this means that the column <math>z^*</math> is added to the constraint matrix).

For this example the optimality test is not met, as <math>\sum_{k=1}^{K}z_{k}^{*}u_{k}^{*}=1.2 > 1</math>, so pattern <math>D_6</math> is added and the master problem is solved again. The procedure is repeated until the test is satisfied, at which point an optimal solution of the relaxed continuous problem has been found (as discussed in the methodology section of this page).

'''''Algorithm discussion'''''

The critical part of the method is generating the new columns, i.e., solving the column generation subproblem. It is not reasonable to compute the reduced costs of all variables <math>y_s</math> for <math>s=1,...,|S|</math>; otherwise this procedure would reduce to the simplex method. In fact, <math>|S|</math> can be very large (as in the cutting stock problem) or, for some reason, it might not be possible or convenient to enumerate all decision variables. It is therefore necessary to study a specific column generation algorithm for each problem; ''only if such an algorithm exists (and is practical)'' can the method be fully applied. In the one-dimensional cutting stock problem, we transformed the column generation subproblem into an easily solvable integer linear programming problem. In other cases, the computational effort required to solve the subproblem is so high that applying the full procedure becomes inefficient.

== Applications ==
As previously mentioned, column generation techniques are most relevant when the problem to be solved has a large number of variables relative to the number of constraints. Some common applications are:

* Bandwidth packing
* Bus driver scheduling
* Generally, column generation algorithms are used for large delivery networks, often in combination with other methods, helping to implement real-time solutions for on-demand logistics. We discuss a supply chain scheduling application below.

'''''Bandwidth packing'''''

The objective of this problem is to allocate bandwidth in a telecommunications network to maximize total revenue. The routing of a set of traffic demands between different users is to be decided, taking into account the capacity of the network arcs and the fact that the traffic between each pair of users cannot be split. The problem can be formulated as an integer programming problem, and its linear programming relaxation can be solved using column generation and the simplex algorithm. To solve the integer program, Parker and Ryan<ref name=":3">Parker, Mark & Ryan, Jennifer. (1993). A column generation algorithm for bandwidth packing. Telecommunication Systems. 2. 185-195. 10.1007/BF02109857. </ref> use a branch and bound procedure that branches upon a particular path. The column generation algorithm greatly reduces the complexity of this problem.

'''''Bus driver scheduling'''''

Bus driver scheduling aims to find the minimum number of bus drivers to cover a published timetable of a bus company. When scheduling bus drivers, contractual working rules must be enforced, thus complicating the problem. A column generation algorithm can decompose this complicated problem into a master problem and a series of pricing subproblems. The master problem would select optimal duties from a set of known feasible duties, and the pricing subproblem would augment the feasible duty set to improve the solution obtained in the master problem.<ref name=":2">Dung‐Ying Lin, Ching‐Lan Hsu. Journal of Advanced Transportation. Volume50, Issue8, December 2016, Pages 1598-1615. URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/atr.1417</ref>

'''''Supply Chain scheduling problem'''''

A typical application is the problem of scheduling a set of shipments between different nodes of a supply chain network. Each shipment has a fixed departure time, as well as an origin and a destination node, which, combined, determine the duration of the associated trip. The aim is to schedule as many shipments as possible, while also minimizing the number of vehicles used for this purpose. This problem can be formulated as an integer programming model and solved with an associated branch and price algorithm. The optimal solution to the LP relaxation of the problem can be obtained through column generation, which solves a linear program with a huge number of variables without explicitly considering all of them. In the context of this application, the master problem schedules the maximum possible number of shipments using only a small set of vehicle-routes, and a column generation (colgen) sub-problem generates cost-effective vehicle-routes to be fed into the master problem. After finding the optimal solution to the LP relaxation of the problem, the algorithm branches on the fractional decision variables (vehicle-routes) in order to reach the optimal integer solution.<ref name=":1">Kozanidis, George. (2014). Column generation for scheduling shipments within a supply chain network with the minimum number of vehicles. OPT-i 2014 - 1st International Conference on Engineering and Applied Sciences Optimization, Proceedings. 888-898</ref>

== Conclusions ==
Column generation is a way of starting with a small, manageable part of a problem (specifically, with some of the variables), solving that part, analyzing that interim solution to find the next part of the problem (specifically, one or more variables) to add to the model, and then solving the full or extended model. In the column generation method, the algorithm steps are repeated until an optimal solution to the entire problem is achieved.<ref> ILOG CPLEX 11.0 User's Manual > Discrete Optimization > Using Column Generation: a Cutting Stock Example > What Is Column Generation? 1997-2007. URL:http://www-eio.upc.es/lceio/manuals/cplex-11/html/usrcplex/usingColumnGen2.html#:~:text=In%20formal%20terms%2C%20column%20generation,method%20of%20solving%20the%20problem.&text=By%201960%2C%20Dantzig%20and%20Wolfe,problems%20with%20a%20decomposable%20structure</ref>

This algorithm provides a way of solving a linear programming problem that would otherwise be very tedious to formulate and compute, by adding columns (corresponding to variables) during the pricing phase of the solution process. Generating a column in the primal formulation of a linear programming problem corresponds to adding a constraint in its dual formulation.

== References ==

Newsvendor problem

2020-12-21T11:36:48Z

Wc593:

Authors: Morgan McCormick (mm3237), Brittany Yesner (by286), Daniel Aronson (da523), John Bednarek (jwb389) (SysEn 5800 Fall 2020)

== Introduction ==
The mathematical application for the Newsvendor Problem dates back to 1888, when Francis Ysidro Edgeworth used the central limit theorem to find the optimal cash reserves needed to satisfy various withdrawals from depositors.<ref>F. Y. Edgeworth (1888). "The Mathematical Theory of Banking". Journal of the Royal Statistical Society.</ref> The name of the problem comes from Morse and Kimball's book from 1951, where they used the term “newsboy” to describe this specific problem.<ref>R. R. Chen; T.C.E. Cheng; T.M. Choi; Y. Wang (2016). "Novel Advances in Applications of the Newsvendor Model". Decision Sciences.</ref> Also referred to as the “newsboy problem”, it is named by analogy with the situation faced by a newspaper vendor who must decide how many copies of the day's paper to stock in the face of uncertain demand, knowing that unsold copies will be worthless at the end of the day.

T. M. Whitin in 1955 was the first to consider not only the cost minimization portion of the problem, but also profit maximization.<ref>Whitin, T. M. “Inventory Control and Price Theory.” Management Science, vol. 2, no. 1, 1955, pp. 61–68.</ref> To do so, he formulated a newsvendor model with price effects, where the selling price and stocking quantity are set simultaneously. He then adapted his model to include a probability distribution for demand as a function of the selling price, therefore making the price of the product a decision variable rather than a fixed parameter.

In general, this model can be used in any application with a perishable good and unknown, randomized demand.

== Description ==
The newsvendor model is used to determine optimal inventory levels in operations management and applied economics. The assumptions for this problem usually include fixed prices and uncertain demand for perishable products with limited availability. In this model, any unit of demand, ''R'', over the current inventory level, ''x'', is identified as a lost sale.

== Formulation ==

=== Overview ===
In a standard newsvendor problem, the expected profit is given by <math display="inline">E[\text{profit}] = E[s \cdot \min(x, R)] - wx </math>. In the formulation, ''s'' represents the price a unit is sold for, ''x'' represents the number of units in inventory that the vendor ordered, ''R'' is a random variable representing the demand on a given day, and ''w'' is the wholesale cost per unit paid by the vendor. The goal is to maximize the expected profit. This is achieved by stocking enough inventory to meet demand while minimizing the amount of unsold inventory that is void or considered perished at the end of the day. The salvage value of any unsold inventory at the end of the sales period is represented by ''v.''

The balance between being understocked and losing potential sales, and being overstocked and incurring a loss, is captured by the critical fractile. This is illustrated by the formula <math>x^{*}=F^{-1} \left({s-w \over s}\right)</math>, where <math>x^{*}</math> is the optimal order quantity and <math>F^{-1}</math> is the inverse of the cumulative distribution function of ''R''.<ref name=":0">"Newsvendor Model.” Wikipedia, Wikimedia Foundation, 12 Nov. 2020, en.wikipedia.org/wiki/Newsvendor_model.
</ref>,<ref>Yan Qin, Ruoxuan Wang, Asoo J. Vakharia, Yuwen Chen, Michelle M.H. Seref, “The newsvendor problem: Review and directions for future research.” European Journal of Operational Research. Volume 213, Issue 2. 2011. Pages 361-374, ISSN 0377-2217. <nowiki>https://doi.org/10.1016/j.ejor.2010.11.024</nowiki>.</ref>
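As a brief sketch of where the critical fractile comes from, assume that demand ''R'' is continuous with cumulative distribution function <math>F</math> and that unsold units have no salvage value. Since <math>\frac{d}{dx}E[\min(x,R)] = P(R > x) = 1-F(x)</math>, differentiating the expected profit with respect to the order quantity gives

<math>\frac{d}{dx}\left(s\,E[\min(x,R)] - wx\right) = s\,(1-F(x)) - w.</math>

The expected profit is concave in <math>x</math>, so setting this derivative to zero yields <math>F(x) = \frac{s-w}{s}</math>, and the maximizing order quantity is <math>F^{-1}\left(\frac{s-w}{s}\right)</math>, which is the formula above.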

=== Detailed Solution Steps ===
In the formulation, a newsboy could purchase a given number of newspapers ''x'' one morning for a given wholesale bulk cost, ''b''. The selling price and salvage values are known constants ''s'' and ''v,'' respectively, and the demand is given by ''D''. The overage cost, <math>c^o</math>, is the cost of ordering one unit too many. The underage cost, <math>c^u</math>, is the cost of ordering one unit too few.

The activity variables are <math>D(\omega)</math>, the realization of random demand, which is assumed to be continuous; <math>p(\omega)</math>, the probability of outcome <math>\omega</math>; <math>S^o(\omega)</math>, the overage, which is equal to <math>[x - D(\omega)]^+</math>; and <math>S^u(\omega)</math>, the underage, which is equal to <math>[D(\omega)-x]^+</math>.

To calculate the '''wholesale cost per newspaper''', ''w,'' the formula <math display="inline">w = b/x</math> is used.

The '''marginal profit''', or net profit for the newsvendor per unit, ''m'' is found by the formula <math>m = s - w</math>.

The '''marginal loss''', or loss for each unsold unit, ''l'' is found using the formula <math>l = w - v</math>.

The '''profit''', ''P'', is calculated by <math>P = m * x</math> if every item in inventory was sold.

The '''expected profit''', ''E'', taking into account a given demand probability is calculated by <math>E = x * D * m</math> if every item in inventory is sold.

The objective function can be represented as

<math>F(x,\omega)=c^o S^o (x,\omega) + c^u S^u (x,\omega)
= c^o [x-D(\omega)]^+ + c^u [D(\omega)-x]^+ </math>

<math>F(x) = E[F(x,\omega)]
= \int (c^o [x-D(\omega)]^+ + c^u [D(\omega) - x]^+ ) p(\omega)d\omega</math>

where the goal is to solve for <math>\min_x F(x) = E [F(x,\omega)]</math>.

== Numerical Example ==
A historically relevant example of the newsvendor problem would be the working conditions that led to the newsboy strike of 1899 and subsequent labor movements.

In the late nineteenth century and before the Spanish-American War, newsboys in New York City could purchase 100 newspapers for 50 cents and sell the newspapers for 8 cents each. If a paper did not sell, assume the publisher would buy the newspaper back at 60% of cost.<ref name=":1">“Labor History Lesson: The ‘Newsies’ Strike.” Labor History Lesson: The "Newsies" Strike | AFT Connecticut, 25 May 2016, aftct.org/story/labor-history-lesson-newsies-strike</ref>

Assume the newspaper sales in New York City followed the demand schedule below:
{| class="wikitable"
|+Table 1: Demand in New York City
!Quantity
!Probability of Demand
|-
|700
|0.450
|-
|800
|0.300
|-
|900
|0.220
|-
|1000
|0.015
|-
|1100
|0.010
|}
The '''wholesale cost price''' of the newspapers is $0.50/100 = $0.005 per newspaper.

The '''selling price''' of the newspapers is $0.08 per newspaper.

The '''salvage value''' of the newspapers is $0.003 per newspaper.

The '''marginal profit''' is equal to $0.08 - $0.005 = $0.075 per additional newspaper sold.

The '''marginal loss''' is equal to $0.005 - $0.003 = $0.002 per unsold newspaper.

<math>c^o = \$0.005 - \$0.005(0.6) = \$0.002</math> per unit

<math>c^u = \$0.08</math> per unit

''x'' = purchase quantity, where <math>x \in \{700, 800, 900, 1000, 1100\}</math>

<math>S^o (\omega) = x - \omega, \quad x > \omega</math>

<math>S^u (\omega) = \omega - x, \quad x< \omega</math>

<math>S^o (\omega) = S^u (\omega) = 0, \quad x = \omega</math>

<math>F(x,\omega)</math> = loss function

<math>F(x,\omega) = c^o S^o (x, \omega) + c^u S^u (x, \omega)</math>

<math>F(x,\omega) = c^o [x-\omega]^+ + c^u [\omega -x]^+</math>

<math>F(x,\omega) = (0.002)[x- \omega]^+ +(0.08)[\omega-x]^+</math>

<math>R(x,\omega) = 0.08\omega - F(x,\omega)</math>
{| class="wikitable"
|+Table 2: Tabulated Values
!Purchase Quantity (x)
!Demand (ω)
!Loss (F(x, ω))
!Probability of Demand (p(ω))
!Sales Value of Demand (0.08ω)
!Net Revenue (R(x, ω) = 0.08ω − F(x, ω))
!Probability-Weighted Revenue (p(ω)R(x, ω))
!Expected Revenue for Purchasing x
|-
| rowspan="5" |700
|700
|0
|0.45
|56
|56
|25.2
| rowspan="5" |55.72
|-
|800
|8
|0.3
|64
|56
|16.8
|-
|900
|16
|0.22
|72
|56
|12.32
|-
|1000
|24
|0.015
|80
|56
|0.84
|-
|1100
|32
|0.01
|88
|56
|0.56
|-
| rowspan="5" |800
|700
|0.2
|0.45
|56
|55.8
|25.11
| rowspan="5" |59.99
|-
|800
|0
|0.3
|64
|64
|19.2
|-
|900
|8
|0.22
|72
|64
|14.08
|-
|1000
|16
|0.015
|80
|64
|0.96
|-
|1100
|24
|0.01
|88
|64
|0.64
|-
| rowspan="5" |900
|700
|0.4
|0.45
|56
|55.6
|25.02
| rowspan="5" |61.80
|-
|800
|0.2
|0.3
|64
|63.8
|19.14
|-
|900
|0
|0.22
|72
|72
|15.84
|-
|1000
|8
|0.015
|80
|72
|1.08
|-
|1100
|16
|0.01
|88
|72
|0.72
|-
| rowspan="5" |1000
|700
|0.6
|0.45
|56
|55.4
|24.93
| rowspan="5" |61.806
|-
|800
|0.4
|0.3
|64
|63.6
|19.08
|-
|900
|0.2
|0.22
|72
|71.8
|15.796
|-
|1000
|0
|0.015
|80
|80
|1.2
|-
|1100
|8
|0.01
|88
|80
|0.8
|-
| rowspan="5" |1100
|700
|0.8
|0.45
|56
|55.2
|24.84
| rowspan="5" |61.689
|-
|800
|0.6
|0.3
|64
|63.4
|19.02
|-
|900
|0.4
|0.22
|72
|71.6
|15.752
|-
|1000
|0.2
|0.015
|80
|79.8
|1.197
|-
|1100
|0
|0.01
|88
|88
|0.88
|}

The optimal quantity to purchase is 1000 in order to minimize expected loss and maximize expected revenue.
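The tabulated calculation can be reproduced with a short script. This is a sketch using the values stated in this example (<math>c^o = 0.002</math>, <math>c^u = 0.08</math>, and the demand schedule of Table 1); the function and variable names are illustrative.

```python
# Demand schedule from Table 1 (demand level -> probability).
demand_prob = {700: 0.45, 800: 0.30, 900: 0.22, 1000: 0.015, 1100: 0.01}

C_OVER = 0.002   # c^o: loss per unsold newspaper
C_UNDER = 0.08   # c^u: loss per unit of unmet demand
PRICE = 0.08     # selling price per newspaper


def loss(x, omega):
    """F(x, omega) = c^o [x - omega]^+ + c^u [omega - x]^+."""
    return C_OVER * max(x - omega, 0) + C_UNDER * max(omega - x, 0)


def expected_revenue(x):
    """Sum over demand levels of p(omega) * (0.08 * omega - F(x, omega))."""
    return sum(p * (PRICE * omega - loss(x, omega)) for omega, p in demand_prob.items())


# Evaluate every candidate purchase quantity and keep the best one.
expected = {x: expected_revenue(x) for x in demand_prob}
best_x = max(expected, key=expected.get)

for x, value in expected.items():
    print(x, round(value, 3))
print("best purchase quantity:", best_x)
```

Because the candidate quantities are few, evaluating all of them directly is simpler than applying the critical fractile formula to the discrete distribution.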

== Demand Distributions ==
The newsvendor problem can be solved in a multitude of ways, but one uncertainty always remains: the number of papers needed to maximize profit. Demand can be estimated in a variety of ways, most commonly with a uniform, normal, or lognormal distribution.

The uniform distribution assumes that every demand level within a given range is equally likely. In the newspaper problem, this means that no demand level within the range is considered more probable than another. This method can pose issues, as the demand for papers can differ between days like Monday or Tuesday and days like Sunday, which has historically been recognized as a day on which a paper is always in demand.

Demand can also be estimated using a normal distribution. The mean and standard deviation of the normal distribution shape the demand curve, which can then be used to calculate the different demand levels that a salesman may face. The normal distribution accounts for variation in demand and enables the salesman to take calculated risks based on historical norms. These norms provide the context needed to account for the demand that the seller may see.

While a normal distribution can provide estimates of how many papers may need to be printed for the public, it does not by itself take into account the potential profit or loss that the vendor may incur. The lognormal method is used to show at what point the salesman's peak profit will occur. It determines the printing quantity at which profit peaks, and is therefore meant to identify the optimal solution from a profit standpoint.<ref name=":0" />,<ref name=":1" />
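To illustrate how the choice of demand distribution changes the order quantity, the sketch below evaluates the critical fractile <math>F^{-1}\left(\frac{s-w}{s}\right)</math> for the three distributions. All distribution parameters (range, mean, standard deviation, log-scale parameters) are hypothetical values chosen only for illustration, and the helper names are not from any cited source.

```python
from math import exp
from statistics import NormalDist


def critical_ratio(s, w):
    """Critical fractile (s - w) / s for selling price s and wholesale cost w."""
    return (s - w) / s


def uniform_quantile(q, low, high):
    """Inverse CDF of a uniform distribution on [low, high]."""
    return low + q * (high - low)


def normal_quantile(q, mean, std):
    """Inverse CDF of a normal distribution."""
    return NormalDist(mean, std).inv_cdf(q)


def lognormal_quantile(q, mu_log, sigma_log):
    """Inverse CDF of a lognormal distribution: exp of the normal quantile of log-demand."""
    return exp(NormalDist(mu_log, sigma_log).inv_cdf(q))


q = critical_ratio(0.08, 0.005)                 # prices from the numerical example
x_uniform = uniform_quantile(q, 700, 1100)      # hypothetical demand range
x_normal = normal_quantile(q, 800, 100)         # hypothetical mean and std
x_lognormal = lognormal_quantile(q, 6.7, 0.12)  # hypothetical log-scale parameters

print(q, x_uniform, round(x_normal, 1), round(x_lognormal, 1))
```

Because the critical ratio is close to one in this example, each distribution recommends stocking well above its typical demand level.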

== Applications ==
Beyond the namesake example of the newsvendor problem, the newsvendor problem model can be applied to a variety of other discrete optimization problems.

=== Personal Investments ===
The tradeoff between tying funds up in stocks and holding cash reserves follows the model of the newsvendor problem: putting too much money in stocks could force an investor to sell stocks below their value to free up cash, while holding too much money in cash reserves leaves money underperforming. The newsvendor problem can help investors find an optimal allocation that minimizes risk while leaving enough opportunity for a large gain. With recent trends of market volatility, evaluating cash positions and market exposure has become ever more important.<ref name=":2">Birge, J. and Louveaux, F. Introduction to Stochastic Programming, Springer, 2011.</ref>

=== Emergency Resources ===
The amount of emergency resources to hold on hand follows the model of the newsvendor problem because holding too many emergency resources could mean throwing out expensive inventory if there is no emergency while not having enough emergency resources could be disastrous in times of peril. Emergencies have the same tendencies of an unknown market. The first responders need to have an optimal amount of supplies to maximize their effectiveness. If items that are perishable are sent in mass quantities, it can bog down the supply lines and lead to important resources becoming expired.<ref name=":2" />

=== Manufacturing ===
The number of units of a good to manufacture follows the model of the newsvendor problem because, while overproduction would always meet demand, production costs increase and storage costs are introduced for the excess inventory. Manufacturers and wholesalers often rely on razor-thin margins. By understanding how to limit excess storage and the money spent on materials, a business can find an accurate way of maximizing its cash flow. Inventory is often one of the crippling factors of a business. Businesses can often save money on individual units by producing larger quantities, but this ultimately eats away at the strong cash position needed to address the concerns of a changing market.<ref name=":2" />

=== Real Estate ===
House pricing in the real estate market follows the model of the newsvendor problem because if a house is priced too high it will take too long to sell, and if it is priced too low it will sell quickly but at a lower price. The housing market is another investment that is exposed to a great deal of volatility and market risk. Markets can change rapidly with the economic situation as well as with the crime, schools, and locations around a property. By understanding the market norms, one can find an adequate price for a home using the newsvendor model. Appraisers and realtors must focus on understanding these metrics to ensure their estimates are accurate.<ref name=":2" />

== Conclusion ==
The newsboy formulation is used to optimize the amount of profit while minimizing the excess materials that hold no value after a given period of time. This formulation can be adapted for different probabilities and distributions of expected sales. Additionally, nuances such as accounting for a salvage price for unsold perishable goods can also be added to the problem for added complexity to mimic a given situation. From that, the salesperson can determine how many of a perishable product should be purchased for resale at a given time in order to optimize their profits.

== References ==

Set covering problem

2020-12-21T11:36:18Z

Wc593:

Authors: Sherry Liang, Khalid Alanazi, Kumail Al Hamoud (ChemE 6800 Fall 2020)

== Introduction ==

The set covering problem is a significant NP-hard problem in combinatorial optimization. Given a collection of elements, the set covering problem aims to find the minimum number of sets that incorporate (cover) all of these elements. <ref name="one"> T. Grossman and A. Wool, [https://www.sciencedirect.com/science/article/abs/pii/S0377221796001610 "Computational experience with approximation algorithms for the set covering problem]," ''European Journal of Operational Research'', vol. 101, pp. 81-92, 1997. </ref>

The importance of the set covering problem has two main aspects: one is pedagogical, and the other is practical.

First, because many greedy approximation methods have been proposed for this combinatorial problem, studying it gives insight into the use of approximation algorithms in solving NP-hard problems. Thus, it is a prime example in teaching computational algorithms. We present a preview of these methods in a later section, and we refer the interested reader to these references for a deeper discussion. <ref name="one" /> <ref name="seven"> P. Slavı́k, [https://www.sciencedirect.com/science/article/abs/pii/S0196677497908877 "A Tight Analysis of the Greedy Algorithm for Set Cover]," ''Journal of Algorithms'', vol. 25, pp. 237-245, 1997. </ref> <ref name="nine"> T. Grossman and A. Wool, [https://www.sciencedirect.com/science/article/abs/pii/S0377221796001610 "What Is the Best Greedy-like Heuristic for the Weighted Set Covering Problem?]," ''Operations Research Letters'', vol. 44, pp. 366-369, 2016. </ref>

Second, many problems in different industries can be formulated as set covering problems. For example, scheduling machines to perform certain jobs can be thought of as covering the jobs. Picking the optimal location for a cell tower so that it covers the maximum number of customers is another set covering application. Moreover, this problem has many applications in the airline industry, and it was explored on an industrial scale as early as the 1970s. <ref name="two"> J. Rubin, [https://www.jstor.org/stable/25767684?seq=1 "A Technique for the Solution of Massive Set Covering Problems, with Application to Airline Crew Scheduling]," ''Transportation Science'', vol. 7, pp. 34-48, 1973. </ref>

== Problem formulation ==
In the set covering problem, two sets are given: a set <math> U </math> of elements and a set <math> S </math> of subsets of the set <math> U </math>. Each subset in <math> S </math> is associated with a predetermined cost, and the union of all the subsets covers the set <math> U </math>. This combinatorial problem then concerns finding the optimal number of subsets whose union covers the universal set while minimizing the total cost.<ref name="one"> T. Grossman and A. Wool, [https://www.sciencedirect.com/science/article/abs/pii/S0377221796001610 "Computational experience with approximation algorithms for the set covering problem]," ''European Journal of Operational Research'', vol. 101, pp. 81-92, 1997. </ref> <ref name="twelve"> Williamson, David P., and David B. Shmoys. “The Design of Approximation Algorithms” [https://www.designofapproxalgs.com/book.pdf]. “Cambridge University Press”, 2011. </ref>

The mathematical formulation of the set covering problem is defined as follows. We define <math> U = \{u_1, \ldots, u_m\} </math> as the universe of elements and <math> S = \{s_1, \ldots, s_n\} </math> as a collection of subsets such that <math> s_i \subseteq U </math> and the union of the <math> s_i </math> covers all elements in <math> U </math> (i.e., <math> \bigcup_{i=1}^n s_i = U </math>). Additionally, each set <math> s_i </math> must cover at least one element of <math> U </math> and has an associated cost <math> c_i > 0 </math>. The objective is to find the minimum-cost sub-collection of sets <math> X \subseteq S </math> that covers all the elements in the universe <math> U </math>.

== Integer linear program formulation ==
An integer linear program (ILP) model can be formulated for the minimum set covering problem as follows:<ref name="one"> T. Grossman and A. Wool, [https://www.sciencedirect.com/science/article/abs/pii/S0377221796001610 "Computational experience with approximation algorithms for the set covering problem]," ''European Journal of Operational Research'', vol. 101, pp. 81-92, 1997. </ref>

'''Decision variables'''

<math> y_i = \begin{cases} 1, & \text{if subset }i\text{ is selected} \\ 0, & \text{otherwise } \end{cases}</math>

'''Objective function'''

minimize <math>\sum_{i=1}^n c_i y_i</math>

'''Constraints '''

<math> \sum_{i:\, u_j \in s_i} y_i \geq 1, \quad \forall j = 1, \ldots, m</math>

<math> y_i \in \{0, 1\}, \quad \forall i = 1, \ldots, n</math>

The objective function <math>\sum_{i=1}^n c_i y_i</math> minimizes the total cost of the selected subsets <math> s_i </math>; when all costs are equal, this is the same as minimizing the number of selected subsets. The first constraint states that every element <math> u_j </math> in the universe <math> U </math> must be covered by at least one selected subset that contains it, and the second constraint <math> y_i \in \{0, 1\} </math> indicates that the decision variables are binary, which means that every subset is either in the set cover or not.

The set covering problem is NP-hard, so no polynomial-time algorithm for solving it exactly is known, and the running time of exact methods can grow exponentially with the size of the problem. Large-scale instances are therefore often tackled with approximation algorithms, which run in polynomial time and return solutions whose cost is provably within a known factor of the optimum. In subsequent sections, we cover two of the most widely used approximation methods for the set covering problem: linear program relaxation with rounding and the classical greedy algorithm. <ref name="seven" />

== Approximation via LP relaxation and rounding ==
Set covering is a classical integer programming problem, and solving integer programs in general is NP-hard. Therefore, one approach to achieving an <math>O(\log n)</math> approximation to the set covering problem in polynomial time is to solve its linear programming (LP) relaxation. <ref name="one"> T. Grossman and A. Wool, [https://www.sciencedirect.com/science/article/abs/pii/S0377221796001610 "Computational experience with approximation algorithms for the set covering problem]," ''European Journal of Operational Research'', vol. 101, pp. 81-92, 1997. </ref> <ref name="twelve"> Williamson, David P., and David B. Shmoys. “The Design of Approximation Algorithms” [https://www.designofapproxalgs.com/book.pdf]. “Cambridge University Press”, 2011. </ref> In LP relaxation, we relax the integrality requirement into linear constraints. For instance, if we replace the constraints <math> y_i \in \{0, 1\}</math> with the constraints <math> 0 \leq y_i \leq 1 </math>, we obtain the following LP problem, which can be solved in polynomial time:

minimize <math>\sum_{i=1}^n c_i y_i</math>

subject to <math> \sum_{i:\, u_j \in s_i} y_i \geq 1, \quad \forall j = 1, \ldots, m</math>

<math> 0 \leq y_i\leq 1, \forall i = 1,....,n</math>

The above LP formulation is a relaxation of the original ILP set cover problem. This means that every feasible solution of the integer program is also feasible for the LP. Additionally, any feasible solution of the integer program has the same objective value in the LP, since both programs share the same objective function. The optimal value of the LP is therefore a lower bound on the optimal value of the original integer program, because the LP minimizes over a larger feasible region. Moreover, we use LP rounding algorithms to directly round the fractional LP solution to an integral combinatorial solution as follows:
 

'''Deterministic rounding algorithm'''
 

Suppose we have an optimal solution <math> z^* </math> for the linear programming relaxation of the set cover problem, where <math> z_i^* </math> is the value assigned to subset <math> s_i </math>. We round the fractional solution <math> z^* </math> to an integer solution <math> z </math> using an LP rounding algorithm. In general, there are two approaches to rounding: deterministic and randomized. In this section, we explain the deterministic approach, in which we include subset <math> s_i </math> in our solution if <math> z_i^* \geq 1/d </math>, where <math> d </math> is the maximum number of sets in which any element appears. In practice, we set <math> z </math> as follows:<ref name="twelve"> Williamson, David P., and David B. Shmoys. “The Design of Approximation Algorithms” [https://www.designofapproxalgs.com/book.pdf]. “Cambridge University Press”, 2011. </ref>

<math> z_i = \begin{cases} 1, & \text{if } z_i^*\geq 1/d \\ 0, & \text{otherwise } \end{cases}</math>

The rounding algorithm is an approximation algorithm for the set cover problem. It runs in polynomial time, and <math> z </math> is a feasible solution to the integer program: each element appears in at most <math> d </math> subsets whose LP values sum to at least 1, so at least one of those subsets has <math> z_i^* \geq 1/d </math> and is selected. Since every selected subset satisfies <math> z_i \leq d\, z_i^* </math>, the cost of the rounded solution is at most <math> d </math> times the optimal LP value, and hence at most <math> d </math> times the optimal ILP value.
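The rounding rule can be illustrated with a minimal Python sketch. The three-element instance below is hypothetical and is not taken from the references. Its LP relaxation has optimal value 1.5, attained by assigning 1/2 to every subset, while any integer cover needs two subsets. The fractional solution is supplied by hand here; in practice it would come from an LP solver.

```python
def round_lp_solution(subsets, universe, z_star, tol=1e-9):
    """Deterministic rounding: keep subset i when z_star[i] >= 1/d."""
    # d = largest number of subsets that any single element belongs to
    d = max(sum(1 for s in subsets if u in s) for u in universe)
    chosen = [i for i, z in enumerate(z_star) if z >= 1.0 / d - tol]
    return chosen, d


# Hypothetical instance: three elements, three unit-cost subsets.
universe = {1, 2, 3}
subsets = [{1, 2}, {2, 3}, {1, 3}]
# Optimal LP solution for this instance (value 1.5), supplied by hand.
z_star = [0.5, 0.5, 0.5]

chosen, d = round_lp_solution(subsets, universe, z_star)
covered = set().union(*(subsets[i] for i in chosen))
print(chosen, d, covered == universe)
```

Here <math> d = 2 </math>, so all three subsets are selected. The rounded cost of 3 equals <math> d </math> times the LP value of 1.5, which shows that the bound can be tight even on a small instance.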

== Greedy approximation algorithm ==
Greedy algorithms can be used to find optimal or near-optimal solutions for large-scale set covering instances in polynomial time. <ref name="seven" /> <ref name="nine" /> The greedy heuristic applies an iterative process that, at each stage, selects the subset covering the largest number of uncovered elements and removes those elements from the uncovered set, until all elements are covered. <ref name="ten"> V. Chvatal, [https://pubsonline.informs.org/doi/abs/10.1287/moor.4.3.233 "Greedy Heuristic for the Set-Covering Problem]," ''Mathematics of Operations Research'', vol. 4, pp. 233-235, 1979. </ref> Let <math> T </math> be the set that contains the covered elements, and let <math> U </math> be the set that contains the elements that are still uncovered. At the beginning of the iteration, <math> T </math> is empty and <math> U </math> contains all elements. We iteratively select the subset in <math> S </math> that covers the largest number of elements in <math> U </math>, add its elements to the covered elements in <math> T </math>, and remove them from <math> U </math>. An example of this algorithm is presented below.

'''Greedy algorithm for minimum set cover example: '''

Step 0: <math> \quad </math> <math> T \leftarrow \emptyset </math> <math> \quad \quad \quad \quad \quad </math> { <math> T </math> stores the covered elements }

Step 1: <math> \quad </math> '''While''' <math> U \neq \emptyset </math> '''do:''' <math> \quad </math> { <math> U </math> stores the uncovered elements }

Step 2: <math> \quad \quad \quad </math> select <math> s_i \in S </math> that covers the highest number of elements in <math> U </math>

Step 3: <math> \quad \quad \quad </math> add the elements of <math> s_i </math> to <math> T </math> and record <math> s_i </math> as selected

Step 4: <math> \quad \quad \quad </math> remove the elements of <math> s_i </math> from <math> U </math>

Step 5: <math> \quad </math> '''End while'''

Step 6: <math> \quad </math> '''Return''' the selected subsets
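The steps above can be sketched in Python as follows. This is a minimal illustration for the unweighted case, not a reference implementation. The function name, the tie-breaking rule (lowest index first), and the small instance are assumptions made for the example.

```python
def greedy_set_cover(universe, subsets):
    """Return the indices of the chosen subsets, in the order they were picked."""
    uncovered = set(universe)  # plays the role of U in the pseudocode
    chosen = []
    while uncovered:
        # Pick the subset covering the most uncovered elements.
        # max() returns the first maximiser, so ties go to the lowest index.
        best = max(range(len(subsets)), key=lambda i: len(subsets[i] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


# Hypothetical instance with seven elements and four subsets.
universe = range(1, 8)
subsets = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}, {5, 6, 7}]
print(greedy_set_cover(universe, subsets))  # [0, 3]
```

On this instance the greedy choice happens to be optimal; in general it is only guaranteed to be within a logarithmic factor of the optimum.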

==Numerical Example==
Consider a simple example in which we assign cameras to different locations. Each location covers some areas of a stadium, and our goal is to install the fewest cameras such that all stadium areas are covered. We have stadium areas 1 to 15 and possible camera locations 1 to 8.

We are given that camera location 1 covers stadium areas {1,3,4,6,7}, camera location 2 covers stadium areas {4,7,8,12}, while the remaining camera locations and the stadium areas that the cameras can cover are given in Table 1 below:
{| class="wikitable"
|+Table 1 Camera Location vs Stadium Area
|-
!camera Location
|1
|2
|3
|4
|5
|6
|7
|8
|-
!stadium area
|1,3,4,6,7
|4,7,8,12
|2,5,9,11,13
|1,2,14,15
|3,6,10,12,14
|8,14,15
|1,2,6,11
|1,2,4,6,8,12
|}

We can then represent the above information using binary values. If stadium area <math>i</math> can be covered by camera location <math>j</math>, then we have <math>y_{ij} = 1</math>. If not, <math>y_{ij} = 0</math>. For instance, stadium area 1 is covered by camera location 1, so <math>y_{11} = 1</math>, while stadium area 1 is not covered by camera location 2, so <math>y_{12} = 0</math>. The values of the binary variables <math>y_{ij}</math> are given in Table 2 below, where empty cells denote 0:
{| class="wikitable"
|+Table 2 Binary Table (All Camera Locations and Stadium Areas)
!
!Camera1
!Camera2
!Camera3
!Camera4
!Camera5
!Camera6
!Camera7
!Camera8
|-
!Stadium1
|1
|
|
|1
|
|
|1
|1
|-
!Stadium2
|
|
|1
|1
|
|
|1
|1
|-
!Stadium3
|1
|
|
|
|1
|
|
|
|-
!Stadium4
|1
|1
|
|
|
|
|
|1
|-
!Stadium5
|
|
|1
|
|
|
|
|
|-
!Stadium6
|1
|
|
|
|1
|
|1
|1
|-
!Stadium7
|1
|1
|
|
|
|
|
|
|-
!Stadium8
|
|1
|
|
|
|1
|
|1
|-
!Stadium9
|
|
|1
|
|
|
|
|
|-
!Stadium10
|
|
|
|
|1
|
|
|
|-
!Stadium11
|
|
|1
|
|
|
|1
|
|-
!Stadium12
|
|1
|
|
|1
|
|
|1
|-
!Stadium13
|
|
|1
|
|
|
|
|
|-
!Stadium14
|
|
|
|1
|1
|1
|
|
|-
!Stadium15
|
|
|
|1
|
|1
|
|
|}

We introduce another binary variable <math>z_j</math> to indicate if a camera is installed at location <math>j</math>. <math>z_j = 1</math> if camera is installed at location <math>j</math>, while <math>z_j = 0</math> if not.

Our objective is to minimize <math>\sum_{j=1}^8 z_j</math>. For each stadium area <math>i</math>, there is a constraint that the area has to be covered by at least one camera location. For instance, for stadium area 1, we have <math>z_1 + z_4 + z_7 + z_8 \geq 1</math>, while for stadium area 2, we have <math>z_3 + z_4 + z_7 + z_8 \geq 1</math>. The 15 constraints that correspond to the 15 stadium areas are listed below:

minimize <math>\sum_{j=1}^8 z_j</math>

''s.t. Constraints 1 to 15 are satisfied:''

<math> z_1 + z_4 + z_7 + z_8 \geq 1 \quad (1)</math>

<math> z_3 + z_4 + z_7 + z_8 \geq 1 \quad (2)</math>

<math> z_1 + z_5 \geq 1 \quad (3)</math>

<math> z_1 + z_2 + z_8 \geq 1 \quad (4)</math>

<math> z_3 \geq 1 \quad (5)</math>

<math> z_1 + z_5 + z_7 + z_8 \geq 1 \quad (6)</math>

<math> z_1 + z_2 \geq 1 \quad (7)</math>

<math> z_2 + z_6 + z_8 \geq 1 \quad (8)</math>

<math> z_3 \geq 1 \quad (9)</math>

<math> z_5 \geq 1 \quad (10)</math>

<math> z_3 + z_7 \geq 1 \quad (11)</math>

<math> z_2 + z_5 + z_8 \geq 1 \quad (12)</math>

<math> z_3 \geq 1 \quad (13)</math>

<math> z_4 + z_5 + z_6 \geq 1 \quad (14)</math>

<math> z_4 + z_6 \geq 1 \quad (15)</math>

From constraints 5, 9, and 13, we obtain <math>z_3 = 1</math>. Thus, constraints 2 and 11 are no longer needed, as they are satisfied when <math>z_3 = 1</math>. With <math>z_3 = 1</math> determined, the remaining problem is:

minimize <math>\sum_{j=1}^8 z_j</math>,

s.t.:

<math>z_1 + z_4 + z_7 + z_8 \geq 1 \quad (1)</math>

<math>z_1 + z_5 \geq 1 \quad (3)</math>

<math>z_1 + z_2 + z_8 \geq 1 \quad (4)</math>

<math>z_1 + z_5 + z_7 + z_8 \geq 1 \quad (6)</math>

<math>z_1 + z_2 \geq 1 \quad (7)</math>

<math>z_2 + z_6 + z_8 \geq 1 \quad (8)</math>

<math>z_5 \geq 1 \quad (10)</math>

<math>z_2 + z_5 + z_8 \geq 1 \quad (12)</math>

<math>z_4 + z_5 + z_6 \geq 1 \quad (14)</math>

<math>z_4 + z_6 \geq 1 \quad (15)</math>

Constraint 10, <math>z_5 \geq 1</math>, requires <math>z_5 = 1</math>. With <math>z_5 = 1</math>, constraints 3, 6, 12, and 14 are satisfied regardless of the other <math>z</math> values. Furthermore, constraint 4 is satisfied whenever constraint 7 is satisfied, since the <math>z</math> values are nonnegative, so constraint 4 is no longer needed. The remaining constraints are:

minimize <math>\sum_{j=1}^8 z_j</math>

s.t.:

<math>z_1 + z_4 + z_7 + z_8 \geq 1 \quad (1)</math>

<math>z_1 + z_2 \geq 1 \quad (7)</math>

<math>z_2 + z_6 + z_8 \geq 1 \quad (8)</math>

<math>z_4 + z_6 \geq 1 \quad (15)</math>

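The next step is to focus on constraints 7 and 15. These constraints involve disjoint variables, so at least one of <math>z_1, z_2</math> and at least one of <math>z_4, z_6</math> must equal 1. Setting exactly one variable in each pair to 1 gives four combinations of <math>z_1, z_2, z_4, z_6</math> values: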

<math>A: z_1 = 1, z_2 = 0, z_4 = 1, z_6 = 0</math>

<math>B: z_1 = 1, z_2 = 0, z_4 = 0, z_6 = 1</math>

<math>C: z_1 = 0, z_2 = 1, z_4 = 1, z_6 = 0</math>

<math>D: z_1 = 0, z_2 = 1, z_4 = 0, z_6 = 1</math>

We can then examine each combination and determine the <math>z_7, z_8</math> values needed for constraints 1 and 8 to be satisfied.

Combination <math>A</math>: constraint 1 is already satisfied; we need <math>z_8 = 1</math> to satisfy constraint 8.

Combination <math>B</math>: constraints 1 and 8 are both already satisfied.

Combination <math>C</math>: constraints 1 and 8 are both already satisfied.

Combination <math>D</math>: we need <math>z_7 = 1</math> or <math>z_8 = 1</math> to satisfy constraint 1, while constraint 8 is already satisfied.

Our final step is to compare the four combinations. Since our objective is to minimize <math>\sum_{j=1}^8 z_j</math> and combinations <math>B</math> and <math>C</math> require the fewest <math>z_j</math> to equal 1, they are the optimal solutions.

To conclude, our two solutions are:

Solution 1: <math>z_1 = 1, z_3 = 1, z_5 = 1, z_6 = 1</math>

Solution 2: <math>z_2 = 1, z_3 = 1, z_4 = 1, z_5 = 1</math>

The minimum number of cameras that we need to install is 4.
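The result of the elimination argument can be checked by exhaustive enumeration, which is feasible here because there are only <math>2^8 = 256</math> possible camera selections. The sketch below uses the coverage data from Table 1; the function name is an assumption made for the example, and this brute-force approach does not scale to large instances.

```python
from itertools import combinations

# Coverage data from Table 1: camera location -> stadium areas covered.
cameras = {
    1: {1, 3, 4, 6, 7},
    2: {4, 7, 8, 12},
    3: {2, 5, 9, 11, 13},
    4: {1, 2, 14, 15},
    5: {3, 6, 10, 12, 14},
    6: {8, 14, 15},
    7: {1, 2, 6, 11},
    8: {1, 2, 4, 6, 8, 12},
}
areas = set(range(1, 16))


def minimum_covers(cameras, areas):
    """Try selections of increasing size; return all covers of the smallest size."""
    for size in range(1, len(cameras) + 1):
        covers = [
            combo
            for combo in combinations(sorted(cameras), size)
            if set().union(*(cameras[j] for j in combo)) == areas
        ]
        if covers:
            return covers
    return []


print(minimum_covers(cameras, areas))  # [(1, 3, 5, 6), (2, 3, 4, 5)]
```

The enumeration finds no cover with fewer than four cameras and exactly two covers with four cameras, matching Solution 1 and Solution 2.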

'''We now solve the same problem using the greedy algorithm.'''

We have a set <math>U</math> (stadium areas) that needs to be covered with <math>C</math> (camera locations).

<math>U = \{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15\}</math>

<math>C = \{C_1,C_2,C_3,C_4,C_5,C_6,C_7,C_8\}</math>

<math>C_1 = \{1,3,4,6,7\} </math>

<math>C_2 = \{4,7,8,12\}</math>

<math>C_3 = \{2,5,9,11,13\}</math>

<math>C_4 = \{1,2,14,15\}</math>

<math>C_5 = \{3,6,10,12,14\}</math>

<math>C_6 = \{8,14,15\}</math>

<math>C_7 = \{1,2,6,11\}</math>

<math>C_8 = \{1,2,4,6,8,12\} </math>

The cost of each camera location is the same in this case, and we only want to minimize the total number of cameras used, so we can assume the cost of each <math>C_j</math> to be 1.

Let <math>I</math> represent the set of elements included so far. Initialize <math>I</math> to be empty.

First Iteration:

The per new element cost for <math>C_1 = 1/5</math>, for <math>C_2 = 1/4</math>, for <math>C_3 = 1/5</math>, for <math>C_4 = 1/4</math>, for <math>C_5 = 1/5</math>, for <math>C_6 = 1/3</math>, for <math>C_7 = 1/4</math>, for <math>C_8 = 1/6</math>

Since <math>C_8</math> has minimum value, <math>C_8</math> is added, and <math>I</math> becomes <math>\{1,2,4,6,8,12\}</math>.

Second Iteration:

<math>I</math> = <math>\{1,2,4,6,8,12\}</math>

The per new element cost for <math>C_1 = 1/2</math>, for <math>C_2 = 1/1</math>, for <math>C_3 = 1/4</math>, for <math>C_4 = 1/2</math>, for <math>C_5 = 1/3</math>, for <math>C_6 = 1/2</math>, for <math>C_7 = 1/1</math>

Since <math>C_3</math> has minimum value, <math>C_3</math> is added, and <math>I</math> becomes <math>\{1,2,4,5,6,8,9,11,12,13\}</math>.

Third Iteration:

<math>I</math> = <math>\{1,2,4,5,6,8,9,11,12,13\}</math>

The per new element cost for <math>C_1 = 1/2</math>, for <math>C_2 = 1/1</math>, for <math>C_4 = 1/2</math>, for <math>C_5 = 1/3</math>, for <math>C_6 = 1/2</math>, for <math>C_7 = 1/0</math> (<math>C_7</math> no longer adds any new element)

Since <math>C_5</math> has minimum value, <math>C_5</math> is added, and <math>I</math> becomes <math>\{1,2,3,4,5,6,8,9,10,11,12,13,14\}</math>.

Fourth Iteration:

<math>I</math> = <math>\{1,2,3,4,5,6,8,9,10,11,12,13,14\}</math>

The per new element cost for <math>C_1 = 1/1</math>, for <math>C_2 = 1/1</math>, for <math>C_4 = 1/1</math>, for <math>C_6 = 1/1</math>, for <math>C_7 = 1/0</math>

The remaining uncovered elements are 7 and 15. <math>C_1</math>, <math>C_2</math>, <math>C_4</math>, and <math>C_6</math> all have the same finite value, but no single set covers both elements: <math>C_1</math> or <math>C_2</math> adds <math>7</math> to <math>I</math>, and <math>C_4</math> or <math>C_6</math> adds <math>15</math> to <math>I</math>. The greedy algorithm therefore needs two more selections, one of <math>C_1</math> or <math>C_2</math> and one of <math>C_4</math> or <math>C_6</math>.

<math>I</math> becomes <math>\{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15\}</math>.

The solutions we obtain are:

Option 1: <math>C_8</math> + <math>C_3</math> + <math>C_5</math> + <math>C_6</math> + <math>C_1</math>

Option 2: <math>C_8</math> + <math>C_3</math> + <math>C_5</math> + <math>C_6</math> + <math>C_2</math>

Option 3: <math>C_8</math> + <math>C_3</math> + <math>C_5</math> + <math>C_4</math> + <math>C_1</math>

Option 4: <math>C_8</math> + <math>C_3</math> + <math>C_5</math> + <math>C_4</math> + <math>C_2</math>

The greedy algorithm does not provide the optimal solution in this case.

The elimination procedure shows that the minimum number of cameras that we need to install is 4, whereas the greedy algorithm selects 5 cameras.
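The iterations above can be reproduced with a short Python sketch of the cost-per-new-element rule. The function name and the tie-breaking rule (lowest camera index first) are assumptions made for this illustration; a different tie-breaking rule yields one of the other five-camera selections.

```python
from fractions import Fraction

# Coverage data from Table 1, with unit cost for every camera location.
cameras = {
    1: {1, 3, 4, 6, 7},
    2: {4, 7, 8, 12},
    3: {2, 5, 9, 11, 13},
    4: {1, 2, 14, 15},
    5: {3, 6, 10, 12, 14},
    6: {8, 14, 15},
    7: {1, 2, 6, 11},
    8: {1, 2, 4, 6, 8, 12},
}
costs = {j: 1 for j in cameras}
areas = set(range(1, 16))


def greedy_by_ratio(subsets, costs, universe):
    """Repeatedly pick the subset with the smallest cost per newly covered element."""
    uncovered = set(universe)
    picks = []
    while uncovered:
        # Subsets that add nothing new (the "1/0" entries) are skipped.
        ratios = {
            j: Fraction(costs[j], len(s & uncovered))
            for j, s in subsets.items()
            if s & uncovered
        }
        if not ratios:
            raise ValueError("the subsets do not cover the universe")
        # min() over sorted keys returns the lowest index among tied ratios.
        best = min(sorted(ratios), key=ratios.get)
        picks.append(best)
        uncovered -= subsets[best]
    return picks


print(greedy_by_ratio(cameras, costs, areas))  # [8, 3, 5, 1, 4]
```

The sketch selects five cameras, one more than the optimum of four.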

== Applications==

The set covering problem has a wide range of applications, and its usefulness is especially evident in industrial and governmental planning. Variations of the set covering problem that are of practical significance include the following.
;The optimal location problem

This set covering problem is concerned with maximizing the coverage of some public facilities placed at different locations. <ref name="three"> R. Church and C. ReVelle, [https://link.springer.com/article/10.1007/BF01942293 "The maximal covering location problem]," ''Papers of the Regional Science Association'', vol. 32, pp. 101-118, 1974. </ref> Consider the problem of placing fire stations to serve the towns of some city. <ref name="four"> E. Aktaş, Ö. Özaydın, B. Bozkaya, F. Ülengin, and Ş. Önsel, [https://pubsonline.informs.org/doi/10.1287/inte.1120.0671 "Optimizing Fire Station Locations for the Istanbul Metropolitan Municipality]," ''Interfaces'', vol. 43, pp. 240-255, 2013. </ref> If each fire station can serve its town and all adjacent towns, we can formulate a set covering problem where each subset consists of a town and its adjacent towns. The problem is then solved to minimize the required number of fire stations to serve the whole city.

Let <math> y_i </math> be the decision variable corresponding to choosing to build a fire station at town <math> i </math>. Let <math> S_i </math> be a subset of towns including town <math> i </math> and all its neighbors. The problem is then formulated as follows.

minimize <math>\sum_{i=1}^n y_i</math>

such that <math> \sum_{j\in S_i} y_j \geq 1, \quad \forall i</math>

A real-world case study involving optimizing fire station locations in Istanbul is analyzed in this reference. <ref name="four" /> The Istanbul municipality serves 790 subdistricts, which should all be covered by a fire station. Each subdistrict is considered covered if it has a neighboring district (a district at most 5 minutes away) that has a fire station. For detailed computational analysis, we refer the reader to the mentioned academic paper.
; The optimal route selection problem

Consider the problem of selecting the optimal bus routes on which to place pothole detectors. Due to the scarcity of the physical sensors, the problem does not allow for placing a detector on every road. The task of finding the maximum coverage using a limited number of detectors can be formulated as a set covering problem. <ref name="five"> J. Ali and V. Dyo, [https://www.scitepress.org/Link.aspx?doi=10.5220/0006469800830088 "Coverage and Mobile Sensor Placement for Vehicles on Predetermined Routes: A Greedy Heuristic Approach]," ''Proceedings of the 14th International Joint Conference on E-Business and Telecommunications'', pp. 83-88, 2017. </ref> <ref name="eleven"> P.H. Cruz Caminha , R. De Souza Couto , L.H. Maciel Kosmalski Costa , A. Fladenmuller , and M. Dias de Amorim, [https://www.mdpi.com/1424-8220/18/6/1976 "On the Coverage of Bus-Based Mobile Sensing]," ''Sensors'', 2018. </ref> Specifically, we are given a collection of bus routes '''''R''''', where each route is divided into segments. Route <math> i </math> is denoted by <math> R_i </math>, and segment <math> j </math> is denoted by <math> S_j </math>. The segments of two different routes can overlap, and each segment is associated with a length <math> a_j </math>. The goal is then to select the routes that maximize the total covered distance.

This is quite different from other applications because it results in a maximization formulation, rather than a minimization formulation. Suppose we want to use at most <math> k </math> different routes. We want to find <math> k </math> routes that maximize the total length of the covered segments. Let <math> x_i </math> be the binary decision variable corresponding to selecting route <math> R_i </math>, and let <math> y_j </math> be the binary decision variable associated with covering segment <math> S_j </math>. Let us also denote the set of routes that cover segment <math> j </math> by <math> C_j </math>. The problem is then formulated as follows.

<math>
\begin{align}
\text{max} & ~~ \sum_{j} a_jy_j\\
\text{s.t} & ~~ \sum_{i\in C_j} x_i \geq y_j \quad \forall j \\
& ~~ \sum_{i} x_i = k \\
& ~~ x_i,y_{j} \in \{0,1\} \\
\end{align}
</math>

The work by Ali and Dyo explores a greedy approximation algorithm to solve an optimal selection problem including 713 bus routes in Greater London. <ref name="five" /> Using only 14% of the routes (100 routes), the greedy algorithm returns a solution that covers 25% of the segments in Greater London. For details of the approximation algorithm and the real-world case study, we refer the reader to this reference. <ref name="five" /> For a significantly larger case study involving 5747 buses covering 5060 km, we refer the reader to this academic article. <ref name="eleven" />
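A generic greedy heuristic for the maximum coverage formulation above can be sketched as follows. This is an illustration under stated assumptions and is not necessarily the algorithm used by Ali and Dyo: the function name, the route and segment labels, and the segment lengths are all hypothetical.

```python
def greedy_max_coverage(routes, lengths, k):
    """Pick up to k routes, each time adding the route with the largest
    total length of segments that are not yet covered.

    routes:  dict mapping a route name to the set of segments it contains
    lengths: dict mapping a segment name to its length a_j
    """
    covered = set()
    chosen = []
    for _ in range(min(k, len(routes))):
        def gain(route):
            return sum(lengths[j] for j in routes[route] - covered)

        candidates = [r for r in sorted(routes) if r not in chosen]
        best = max(candidates, key=gain)
        if gain(best) == 0:
            break  # no remaining route adds coverage
        chosen.append(best)
        covered |= routes[best]
    return chosen, sum(lengths[j] for j in covered)


# Hypothetical data: three routes over four segments, lengths in km.
routes = {"R1": {"S1", "S2"}, "R2": {"S2", "S3"}, "R3": {"S4"}}
lengths = {"S1": 2, "S2": 5, "S3": 4, "S4": 3}
print(greedy_max_coverage(routes, lengths, k=2))  # (['R2', 'R3'], 12)
```

With <math> k = 2 </math>, the heuristic first picks R2 (9 km of new coverage) and then R3 (3 km), which is also the best pair for this small instance.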
;The airline crew scheduling problem

An important application of large-scale set covering is the airline crew scheduling problem, which pertains to assigning airline staff to work shifts. <ref name="two" /> <ref name="six"> E. Marchiori and A. Steenbeek, [https://link.springer.com/chapter/10.1007/3-540-45561-2_36 "An Evolutionary Algorithm for Large Scale Set Covering Problems with Application to Airline Crew Scheduling]," ''Real-World Applications of Evolutionary Computing. EvoWorkshops 2000. Lecture Notes in Computer Science'', 2000. </ref> Thinking of the collection of flights as a universal set to be covered, we can formulate a set covering problem to search for the optimal assignment of employees to flights. Due to the complexity of airline schedules, this problem is usually divided into two subproblems: crew pairing and crew assignment. We refer the interested reader to this survey, which contains several problem instances with the number of flights ranging from 1013 to 7765 flights, for a detailed analysis of the formulation and algorithms that pertain to this significant application. <ref name="two" /> <ref name="eight"> A. Kasirzadeh, M. Saddoune, and F. Soumis [https://www.sciencedirect.com/science/article/pii/S2192437620300820?via%3Dihub "Airline crew scheduling: models, algorithms, and data sets]," ''EURO Journal on Transportation and Logistics'', vol. 6, pp. 111-137, 2017. </ref>

==Conclusion ==

The set covering problem, which aims to find the least number of subsets that cover some universal set, is a widely known NP-hard combinatorial problem. Due to its applicability to route planning and airline crew scheduling, several methods have been proposed to solve it. Its straightforward formulation allows for the use of off-the-shelf optimizers to solve it. Moreover, heuristic techniques and greedy algorithms can be used to solve large-scale set covering problems for industrial applications.

== References ==
<references />

Facility location problem

2020-12-21T11:35:38Z

Wc593:

Authors: Liz Cantlebary, Lawrence Li (ChemE 6800 Fall 2020)

== Introduction ==
The Facility Location Problem (FLP) is a classic optimization problem that determines the best location for a factory or warehouse to be placed based on geographical demands, facility costs, and transportation distances. These problems generally aim to maximize the supplier's profit based on the given customer demand and location(1). FLP can be further broken down into capacitated and uncapacitated problems, depending on whether the facilities in question have a maximum capacity or not(2).

== Theory and Formulation ==

=== Weber Problem and Single Facility FLPs ===
The Weber Problem is a simple FLP that consists of locating the geometric median between three points with different weights. The geometric median is a point between three given points in space such that the sum of the distances between the median and the other three points is minimized. It is based on the premise of minimizing transportation costs from one point to various destinations, where each destination has a different associated cost per unit distance.

Given <math>N</math> points <math>(a_1,b_1)...(a_N,b_N)</math> on a plane with associated weights <math>w_1...w_N</math>, the 2-dimensional Weber problem to find the geometric median <math>(x,y)</math> is formulated as(1)

<math>\min_{x,y}\ W(x,y) = \sum_{i=1}^N w_i d_i(x,y,a_i,b_i)</math>

where

<math>d_i(x,y,a_i,b_i)=\sqrt{(x-a_i)^2+(y-b_i)^2}</math>

The above formulation serves as a foundation for many basic single facility FLPs. For example, the minisum problem aims to locate a facility at the point that minimizes the sum of the weighted distances to the given set of existing facilities, while the minimax problem consists of placing the facility at the point that minimizes the maximum weighted distance to the existing facilities(3). Additionally, in contrast to the minimax problem, the maximin facility problem maximizes the minimum weighted distance to the given facilities.
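The Weber objective <math>W(x,y)</math> has no closed-form minimizer in general, but it can be minimized numerically. The sketch below uses Weiszfeld's fixed-point iteration, a classical method that is not described in the text above and is swapped in here purely for illustration. The function names, the iteration count, and the three sample points are assumptions.

```python
import math


def weber_objective(x, y, points, weights):
    """W(x, y): weighted sum of Euclidean distances to the given points."""
    return sum(w * math.hypot(x - a, y - b) for (a, b), w in zip(points, weights))


def weiszfeld(points, weights, iterations=200, eps=1e-12):
    """Approximate the weighted geometric median by Weiszfeld's iteration."""
    # Start from the weighted centroid.
    total = sum(weights)
    x = sum(w * a for (a, _), w in zip(points, weights)) / total
    y = sum(w * b for (_, b), w in zip(points, weights)) / total
    for _ in range(iterations):
        num_x = num_y = denom = 0.0
        for (a, b), w in zip(points, weights):
            dist = math.hypot(x - a, y - b)
            if dist < eps:
                # Simplification: stop if the iterate lands on a data point.
                return a, b
            num_x += w * a / dist
            num_y += w * b / dist
            denom += w / dist
        x, y = num_x / denom, num_y / denom
    return x, y


# Hypothetical instance: three equally weighted points.
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
weights = [1.0, 1.0, 1.0]
x, y = weiszfeld(points, weights)
print(round(x, 3), round(y, 3), round(weber_objective(x, y, points, weights), 3))
```

Each update replaces the current point with a weighted average of the data points, where the weight of point <math>i</math> is <math>w_i</math> divided by its current distance, so nearby points pull harder on the estimate.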

=== Capacitated and Uncapacitated FLPs ===
FLPs can often be formulated as mixed-integer programs (MIPs), with a fixed set of facility and customer locations. Binary variables are used in these problems to represent whether a certain facility is open or closed and whether that facility can supply a certain customer. Capacitated and uncapacitated FLPs can be solved this way by defining them as integer programs.

A capacitated facility problem applies constraints to the production and transportation capacity of each facility. As a result, customers may not be supplied by the most immediate facility, since this facility may not be able to satisfy the given customer demand.

In a problem with <math>N</math> facilities and <math>M</math> customers, the capacitated formulation defines a binary variable <math>x_i</math> and a variable <math>y_{ij}</math> for each facility <math>i</math> and each customer <math>j</math>. If facility <math>i</math> is open, <math>x_i=1</math>; otherwise <math>x_i=0</math>. Open facilities have an associated fixed cost <math>f_i</math> and a maximum capacity <math>k_i</math>. <math>y_{ij}</math> is the fraction of the total demand <math>d_j</math> of customer <math>j</math> that facility <math>i</math> has satisfied and the transportation cost between facility <math>i</math> and customer <math>j</math> is represented as <math>t_{ij}</math>. The capacitated FLP is therefore defined as(2)

<math>\min\ \sum_{i=1}^N\sum_{j=1}^Md_jt_{ij}y_{ij}+\sum_{i=1}^Nf_ix_i</math>

<math>s.t.\ \sum_{i=1}^Ny_{ij}=1\ \ \forall\, j\in\{1,...,M\}</math>

<math>\quad \quad \sum_{j=1}^Md_jy_{ij}\leq k_ix_i\ \ \forall\, i\in\{1,...,N\}</math>

<math>\quad \quad y_{ij}\geq0\ \ \forall\, i\in\{1,...,N\},\ \forall\, j\in\{1,...,M\}</math>

<math>\quad \quad x_i\in\{0,1\}\ \ \forall\, i\in\{1,...,N\}</math>

In an uncapacitated facility problem, the amount of product each facility can produce and transport is assumed to be unlimited, and the optimal solution results in customers being supplied by the lowest-cost, and usually the nearest, facility. Using the above formulation, the unlimited capacity means <math>k_i</math> can be assumed to be a sufficiently large constant, while <math>y_{ij}</math> is now a binary variable, because the demand of each customer can be fully met with the nearest facility(2). If facility <math>i</math> supplies customer <math>j</math>, then <math>y_{ij}=1</math>; otherwise <math>y_{ij}=0</math>.
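For very small instances, the uncapacitated problem can be solved by enumerating every set of open facilities and assigning each customer to its cheapest open facility. The sketch below is a minimal illustration of the formulation, not a practical solver, since the number of candidate sets grows as <math>2^N</math>. The function name and the cost data are hypothetical.

```python
from itertools import combinations


def solve_uflp(fixed_costs, demand, transport):
    """Brute-force uncapacitated facility location.

    fixed_costs[i]  : fixed cost f_i of opening facility i
    demand[j]       : demand d_j of customer j
    transport[i][j] : unit transportation cost t_ij
    Returns (best_cost, tuple_of_open_facilities).
    """
    n, m = len(fixed_costs), len(demand)
    best_cost, best_open = float("inf"), None
    for size in range(1, n + 1):
        for open_set in combinations(range(n), size):
            cost = sum(fixed_costs[i] for i in open_set)
            # Each customer is served entirely by its cheapest open facility.
            cost += sum(
                demand[j] * min(transport[i][j] for i in open_set)
                for j in range(m)
            )
            if cost < best_cost:
                best_cost, best_open = cost, open_set
    return best_cost, best_open


# Hypothetical data: two candidate facilities and three customers.
fixed_costs = [100, 120]
demand = [10, 20, 30]
transport = [[1, 4, 6], [5, 2, 1]]
print(solve_uflp(fixed_costs, demand, transport))  # (240, (1,))
```

In this instance, opening only the second facility costs 240, compared with 370 for the first facility alone and 300 for both, so the savings in transportation from opening both do not justify the extra fixed cost.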

=== Approximate and Exact Algorithms ===
A variety of approximate algorithms can be used to solve facility location problems. These algorithms terminate after a number of steps that is bounded in terms of the size of the problem, yielding a feasible solution with an error that does not exceed a constant approximation ratio(4). For a minimization problem, an approximation ratio of <math>r</math> means that the cost of the approximate solution is at most <math>r</math> times the cost of the exact optimal solution.

While greedy algorithms generally do not perform well on FLPs, the primal-dual greedy algorithm presented by Jain and Vazirani tends to be faster in solving the uncapacitated FLP than LP-rounding algorithms, which solve the LP relaxation of the integer formulation and round the fractional results(4). The Jain-Vazirani algorithm computes the primal and the dual to the LP relaxation simultaneously and guarantees a constant approximation ratio of 1.861(5). This solver has a running time complexity of <math>O(m\log m)</math>, where <math>m</math> corresponds to the number of edges between facilities and cities. Improving upon this primal-dual approach, the modified Jain-Mahdian-Saberi algorithm guarantees a better approximation ratio for the uncapacitated problem(5).

To solve the capacitated FLP, which often contains more complex constraints, many algorithms utilize a Lagrangian decomposition(6), first introduced by Held and Karp in the traveling salesman problem(7). This approach allows constraints to be relaxed by penalizing this relaxation while solving a simplified problem. The capacitated problem has been effectively solved using this Lagrangian relaxation in conjunction with the volume algorithm, which is a variation of subgradient optimization presented by Barahona and Anbil(8).

Exact methods have also been presented for solving FLPs. To solve the <math>p</math>-median capacitated facility location problem, Ceselli introduces a branch-and-bound method that solves a Lagrangian relaxation with subgradient optimization, as well as a separate branch-and-price algorithm that utilizes column generation(9). Ceselli's work indicates that branch-and-bound works well when the ratio of <math>p</math> sites to <math>N</math> customers is low, but the performance and run-time worsen significantly as this ratio increases. In comparison, the branch-and-price method demonstrates much more stable performance across various problem sizes and is generally faster overall.

== Numerical Example ==
Suppose a paper products manufacturer has enough capital to build and manage an additional manufacturing plant in the United States in order to meet increased demand in three cities: New York City, NY, Los Angeles, CA, and Topeka, KS. The company already has distribution facilities in Denver, CO, Seattle, WA, and St. Louis, MO, and due to limited capital, cannot build an additional distribution facility. So, they must choose to build their new plant in one of these three locations. Due to geographic constraints, plants in Denver, Seattle, and St. Louis would have a maximum operating capacity of 400 tons/day, 700 tons/day, and 600 tons/day, respectively. The cost of transporting the products from the plant to the city is directly proportional, and an outline of the supply, demand, and cost of transportation is shown in the figure below. Regardless of where the plant is built, the selling price of the product is $100/ton.
[[File:Example.png|center|780x780px]]
'''Exact Solution'''

To solve this problem, we will assign the following variables:

<math>i</math> is the factory location

<math>j</math> is the city destination

<math>C_{ij}</math> is the cost of transporting one ton of product from the factory to the city

<math>x_{ij}</math> is the amount of product transported from the factory to the city in tons

<math>A_i</math> is the maximum operating capacity at the factory

<math>D_j</math> is the amount of unmet demand in the city

To determine where the company should build the factory, we will carry out the following optimization problem for each location to maximize the profit from each ton sold:

max <math>\sum_{j\in J}x_{ij}(100-C_{ij}) </math>

subject to

<math>\sum_{j\in J}x_{ij} \leq A_i </math> <math>\forall i\in I</math>

<math>\sum_{i\in I}x_{ij} \leq D_j</math> <math>\forall j\in J</math>

<math>x_{ij} \geq 0 </math> <math>\forall i \in I,</math> <math>\forall j \in J</math>

The problem is solved in GAMS (General Algebraic Modeling System).
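For a single candidate plant, the linear program above has one capacity constraint and one upper bound per city, so it can also be solved by filling demand in order of decreasing profit per ton. The following Python sketch illustrates this under stated assumptions: the function name, the demands, and the transportation costs are hypothetical placeholders, not the data from the figure, so the resulting profit does not correspond to the values reported in this example.

```python
def allocate_single_plant(capacity, demand, cost, price=100.0):
    """Greedy solution of the single-plant LP: serve the highest-margin cities first."""
    shipments = {city: 0.0 for city in demand}
    remaining = capacity
    # Sort cities by profit per ton (selling price minus transport cost), best first.
    for city in sorted(demand, key=lambda c: price - cost[c], reverse=True):
        margin = price - cost[city]
        if margin <= 0 or remaining <= 0:
            break  # shipping more would not increase profit
        shipments[city] = min(demand[city], remaining)
        remaining -= shipments[city]
    profit = sum(shipments[c] * (price - cost[c]) for c in demand)
    return shipments, profit

# Hypothetical demands (tons/day) and transport costs ($/ton).
demand = {"Los Angeles": 300, "Topeka": 100, "New York City": 500}
cost = {"Los Angeles": 10, "Topeka": 5, "New York City": 20}
shipments, profit = allocate_single_plant(400, demand, cost)
print(shipments, profit)
```

Running the function once per candidate location, each with its own capacity and cost data, and comparing the profits mirrors the location-by-location comparison carried out in this example.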

If the factory is built in Denver, 300 tons/day of product go to Los Angeles and 100 tons/day go to Topeka, for a total profit of $36,300/day.

If the factory is built in Seattle, 300 tons/day of product go to Los Angeles, 100 tons/day of product go to Topeka, and 300 tons/day go to New York City, for a total profit of $56,500/day.

If the factory is built in St. Louis, 100 tons/day of product go to Topeka and 500 tons/day go to New York City, for a total profit of $55,200/day.

Therefore, to maximize profit, the factory should be built in Seattle.

'''Approximate Solution'''

This example can also be solved approximately through the branch and bound method. The tree diagram showing the optimization is shown below.

[[File:Branch and bound.png|center|frame|Branch and bound approach]]
As shown in the tree diagram, building factories in both Denver and St. Louis would yield the highest profit of $82,200/day. Unfortunately, the company only has enough capital to build one facility. As a result, the only acceptable solutions are those in which one value is "1" and two are "0". Based on this constraint, it is clear that the company should build the factory in Seattle, as shown in the exact solution above. However, the tree also yields valuable information if the company hopes to expand again in the near future, because building factories in St. Louis and Denver is more profitable than building factories in Seattle and Denver or Seattle and St. Louis. Depending on company projections, it may be a better decision to build the first factory in St. Louis and aim to build an additional factory in Denver as soon as possible.

== Applications ==
[[File:BadranElHaggarFacilityLocation.jpg|thumb|321x321px|Map of optimal collection stations in Port Said, Egypt(12).]]
Facility location problems are utilized in many industries to find the optimal placement of various facilities, including warehouses, power plants, public transportation terminals, polling locations, and cell towers, to maximize efficiency, impact, and profit. In less conventional applications, extensive research has been done in applying FLPs to humanitarian efforts, such as identifying disaster management sites to maximize accessibility to healthcare and treatment(10). A case study by researchers in Nigeria explored the application of mixed-integer FLPs in optimizing the locations of waste collection centers to provide sanitation services in crucial communities. More effective waste collection systems could combat unsanitary practices and environmental pollution, which are major concerns in many developing nations(11). For example, Badran and El-Haggar proposed a solid waste management system for Port Said, Egypt, implementing a mixed-integer program to optimally place waste collection stations and minimize cost(12). This program was formulated to select collection stations from a set of locations such that the sum of the fixed costs of opening collection stations, the operating costs of the collection stations, and the transportation costs from the collection stations to the composting plants is minimized.

FLPs have also been used in clustering analysis, which involves partitioning a given set of elements (e.g., data points) into different groups based on the similarity of the elements. The elements can be placed into groups by identifying the locations of center points that effectively partition the set into clusters, based on the distances from the center points to each element(13). For example, the <math>k</math>-median clustering problem can be formulated as an FLP that selects a set of <math>k</math> cluster centers to minimize the total cost between each point and its closest center. The cost in this problem is represented as the Euclidean distance <math>d(i,j)</math> between a point <math>i</math> and a proposed cluster center <math>j</math>. The problem can be formulated as the following integer program, which selects <math>k</math> centers from a set of <math>N</math> points(13).

<math>\min\ \sum_{i=1}^N\sum_{j=1}^N x_{ij}d(i,j)</math>

<math>s.t.\ \sum_{j=1}^Ny_j\leq k</math>

<math>\quad \quad \sum_{j=1}^Nx_{ij}=1\ \ \forall\, i\in\{1,...,N\}</math>

<math>\quad \quad x_{ij}\leq y_j\ \ \forall\, i,j\in\{1,...,N\}</math>

<math>\quad \quad x_{ij}, y_j\in\{0,1\}\ \ \forall\, i,j\in\{1,...,N\}</math>

In this formulation, the binary variables <math>y_j</math> and <math>x_{ij}</math> represent whether <math>j</math> is used as a center point and whether <math>j</math> is the optimal center for <math>i</math>, respectively. The <math>k</math>-median problem is NP-hard and is commonly solved using approximation algorithms. One of the most effective algorithms to date, proposed by Byrka et al., has an approximation factor of 2.611(13).
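To make the roles of <math>y_j</math> and <math>x_{ij}</math> concrete, the following Python sketch solves a tiny <math>k</math>-median instance exactly by trying every choice of <math>k</math> centers. This is an illustration only and is not the approximation algorithm cited above; the function name and the sample points are hypothetical.

```python
from itertools import combinations

def k_median(points, k):
    """Exact k-median on a small instance by enumerating all sets of k centers."""
    def dist(p, q):
        # Euclidean distance d(i, j) between two points
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    best_cost, best_centers = float("inf"), None
    for centers in combinations(range(len(points)), k):  # y_j = 1 for j in centers
        # x_ij = 1 only for the closest open center j of point i,
        # so each point contributes its minimum distance to a center.
        cost = sum(min(dist(p, points[j]) for j in centers) for p in points)
        if cost < best_cost:
            best_cost, best_centers = cost, centers
    return best_cost, best_centers

# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cost, centers = k_median(points, 2)
print(cost, centers)
```

The enumeration visits every subset of size <math>k</math>, so its running time grows combinatorially with <math>N</math>; this is the cost that approximation algorithms are designed to avoid.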

== Conclusion ==
The facility location problem is an important application of computational optimization. Its uses are far-reaching: it can help determine anything from where a family should live, based on the locations of their workplaces and school, to where a Fortune 500 company should put a new manufacturing plant or distribution facility to maximize its return on investment.

== References ==

# Drezner, Z.; Hamacher, H. W. (2004), ''Facility Location Applications and Theory''. New York, NY: Springer.
# Francis, R. L.; Mirchandani, P. B. (1990), ''Discrete Location Theory''. New York, NY: Wiley.
# Hansen, P., et al. (1985), [https://pubsonline.informs.org/doi/abs/10.1287/opre.33.6.1251 The Minisum and Minimax Location Problems Revisited.] ''Operations Research, 33'', 6, 1251-1265.
# Vygen, J. (2005), ''Approximation Algorithms for Facility Location Problems''. Research Institute for Discrete Mathematics, University of Bonn.
# Jain, K., et al. (2003), [https://dl.acm.org/doi/10.1145/950620.950621 A Greedy Facility Location Algorithm Analyzed Using Dual Fitting with Factor-Revealing LP.] ''Journal of the ACM, 50'', 6, 795-824.
# Alenezy, E. J. (2020), [https://www.hindawi.com/journals/aor/2020/5239176/ Solving Capacitated Facility Location Problem Using Lagrangian Decomposition and Volume Algorithm.] ''Advances in Operations Research, 2020'', 5239176.
# Held, M.; Karp, R. M. (1970), [https://pubsonline.informs.org/doi/abs/10.1287/opre.18.6.1138 The Traveling-Salesman Problem and Minimum Spanning Trees.] ''Operations Research, 18,'' 6, 1138-1162.
# Barahona, F.; Anbil, R. (2000), [https://link.springer.com/article/10.1007%2Fs101070050002 The Volume Algorithm: Producing Primal solutions with a Subgradient Method.] ''Mathematical Programming, 87,'' 3, 385–399.
# Ceselli, A. (2003), [https://link.springer.com/article/10.1007/s10288-003-0023-5 Two Exact Algorithms for the Capacitated p-Median Problem.] ''Quarterly Journal of the Belgian, French and Italian Operations Research Societies, 4'', 1, 319-340.
# Daskin, M. S.; Dean, L. K. (2004), [https://link.springer.com/chapter/10.1007/1-4020-8066-2_3 Location of Health Care Facilities.] ''Handbook of OR/MS in Health Care: A Handbook of Methods and Applications'', 43-76.
# Adeleke, O. J.; Olukanni, D. O. (2020), [https://www.mdpi.com/2313-4321/5/2/10 Facility Location Problems: Models, Techniques, and Applications in Waste Management.] ''Recycling, 5'', 10.
# Badran, M.F.; El-Haggar, S.M. (2006), [https://www.sciencedirect.com/science/article/abs/pii/S0956053X05001534 Optimization of Municipal Solid Waste Management in Port Said – Egypt.] ''Waste Management, 26'', 5, 534-545.
# Meira, L. A. A., et al. (2017), [https://www.sciencedirect.com/science/article/abs/pii/S030439751630514X Clustering through Continuous Facility Location Problems.] ''Theoretical Computer Science, 657'', 137-145.
# Balcik, B.; Beamon, B. M. (2008), [https://www.tandfonline.com/doi/full/10.1080/13675560701561789 Facility Location in Humanitarian Relief.] ''International Journal of Logistics Research and Applications, 11'', 101-121.
# Eiselt, H. A.; Marianov, V. (2019), ''Contributions to Location Analysis''. Cham, Switzerland: Springer.

Eight step procedures

2020-12-21T11:35:03Z

Wc593:

Author: Eljona Pushaj, Diana Bogdanowich, Stephanie Keomany (SysEn 5800 Fall 2020)

=Introduction=
The eight-step procedures are a simplified, multi-stage approach for determining optimal solutions in mathematical optimization. Dynamic programming, developed by Richard Bellman in the 1950s<ref>Bellman, Richard. “The Theory of Dynamic Programming.” Bulletin of American Mathematical Society, vol. 60, 1954, pp 503–515, https://www.ams.org/journals/bull/1954-60-06/S0002-9904-1954-09848-8/S0002-9904-1954-09848-8.pdf. 18 Nov 2020.</ref>, is used to solve for the maximization or minimization of the objective function by transforming the problem into smaller steps and enumerating all the different possible solutions and finding the optimal solution.

In the eight-step procedure, a problem is broken down into subproblems. The solutions of the subproblems are then combined recursively to find the best solution to the original problem, which demonstrates the principle of optimality: any optimal policy has the property that, whatever the current state and current decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the current decision.<ref>Bradley, Stephen P. Applied Mathematical Programming. Addison-Wesley. 1 February 1977. 320-342. 18 Nov 2020</ref> Such a standard framework is used so that dynamic programming stores the values of the subproblems to avoid recomputing them, and thus reduces the time needed to solve the problem.<ref>Gavin-Hughes, Sam. “Dynamic Programming for Interviews.” Byte by Byte. https://www.byte-by-byte.com/dpbook/. 18 Nov 2020</ref>

=Theory, Methodology, and/or Algorithmic Discussion=

===Methodology===
To solve a problem using the 8-step procedure, one must use the following steps: 
 

'''Step 1: Specify the stages of the problem''' 
The stages of a dynamic programming problem can be defined as points where decisions are made. Specifying the stages also divides the problem into smaller pieces. 
 

'''Step 2: Specify the states for each stage''' 
The states of a problem are defined as the knowledge necessary to make a decision. There are multiple states for each stage. In general, the states consist of the information that is needed to solve the smaller problem within each stage.<ref>Chinneck. (2015). Chapter 15 Dynamic Programming. Carleton.Ca. https://www.sce.carleton.ca/faculty/chinneck/po/Chapter15.pdf</ref>
 

'''Step 3: Specify the allowable actions for each state in each stage''' 
This defines the set of decisions that can be made at each stage.
 

'''Step 4: Describe the optimization function using an English-language description.''' 
 

'''Step 5: Define the boundary conditions''' 
This can help create a starting point to finding a solution to the problem. 
 

'''Step 6: Define the recurrence relation''' 
This is often denoted with a function, and shows the relationship between the value of a decision at a particular stage and the value of the optimal decisions made at the previous stages.
 

'''Step 7: Compute the optimal value from the bottom-up''' 
This step can be done manually or by using programming. Note that for each state, an optimal decision made at the remaining stages of the problem is independent from the decisions of the previous states. 
 

'''Step 8: Arrive at the optimal solution''' 
This is the final step for solving a problem using the eight step procedure. 

=Numerical Example=
''Suppose we have a knapsack with a weight capacity of C=5 and N=2 types of items. An item of type n weighs w[n], and packing j items of type n into the knapsack generates a benefit of b[n,j]; however, only a[n] units of this item are available.''

To solve a Knapsack problem we use the following steps:

'''Step 1: Specify the stages of the problem'''

The item types are the stages: n=1,2 (N=2)

'''Step 2: Specify the states for each stage'''

The state is the remaining capacity: s=0,1,2,3,4,5 (C=5)

'''Step 3: Specify the allowable actions for each state in each stage'''

<math>
U_{2}(5)\, =\, \left \{ 0,1,...,\min\left \{ a[2], \left \lfloor \frac{5}{w[2]}\right \rfloor \right \} \right \}
</math>= '''{0,1,2}'''

'''Step 4: Describe the optimization function using an English-language description.'''

<math>f^{*}_{n}(s)</math> is the maximum benefit that can be obtained from item types n,...,N when the remaining capacity is s

'''Step 5: Define the boundary conditions'''

Boundary Conditions:

<math>f^{*}_{N+1}(s) = f^{*}_{3}(s) = 0</math>, ''s=0,1,2,3,4,5'' (''C=5'')

'''Step 6: Define the recurrence relation'''

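<math> f^{*}_{n}(s)= \max_{j \in U_{n}(s)}\left \{ b[n,j]+ f^{*}_{n+1}(s-j*w[n]) \right \} </math>, for example <math> f^{*}_{2}(5)= \max_{j \in U_{2}(5)}\left \{ b[2,j]+ f^{*}_{3}(5-j*w[2]) \right \} </math>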

'''Step 7: Compute the optimal value from the bottom-up'''

<math> f^{*}_{2}(5)= \max_{j \in U_{2}(5)}\left \{ b[2,j]+ f^{*}_{3}(5-j*w[2]) \right \} </math>

<math>f^{*}_{N+1}(s) = f^{*}_{3}(s) = 0</math>, ''s=0,1,2,3,4,5'' (''C=5'')
{| class="wikitable"
|+
!Unused Capacity s
!<math>f^{*}_{1}(s)</math>
!Type 1 opt <math>U^{*}_{1}(s)</math>
!<math>f^{*}_{2}(s)</math>
!Type 2 opt <math>U^{*}_{2}(s)</math>
!<math>f^{*}_{3}(s)</math>
|-
|5
|9
|0
|9
|2
|0
|-
|4
|9
|0
|9
|2
|0
|-
|3
|4
|0
|4
|1
|0
|-
|2
|4
|0
|4
|1
|0
|-
|1
|0
|0
|0
|0
|0
|-
|0
|0
|0
|0
|0
|0
|}
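The bottom-up computation can be sketched in Python as follows. The example does not list the weights w[n], the availabilities a[n], or the benefits b[n,j], so the values used here are assumptions chosen only so that the recursion agrees with the table above; the function name is likewise hypothetical.

```python
def knapsack_dp(C, w, a, b):
    """Backward recursion f_n(s) = max_j { b[n][j] + f_{n+1}(s - j*w[n]) }.

    w[n], a[n] : weight and availability of item type n (n = 1..N)
    b[n][j]    : benefit of packing j items of type n
    """
    N = len(w)
    f = {N + 1: {s: 0 for s in range(C + 1)}}    # boundary condition
    best = {}
    for n in range(N, 0, -1):                    # stages, last item type first
        f[n], best[n] = {}, {}
        for s in range(C + 1):                   # states: remaining capacity
            actions = range(min(a[n], s // w[n]) + 1)
            j_opt = max(actions, key=lambda j: b[n][j] + f[n + 1][s - j * w[n]])
            best[n][s] = j_opt
            f[n][s] = b[n][j_opt] + f[n + 1][s - j_opt * w[n]]
    # Forward pass: read off how many items of each type to pack.
    plan, s = {}, C
    for n in range(1, N + 1):
        plan[n] = best[n][s]
        s -= plan[n] * w[n]
    return f, plan

# Assumed data (not given in the example), chosen to agree with the table.
w = {1: 3, 2: 2}
a = {1: 1, 2: 2}
b = {1: [0, 3], 2: [0, 4, 9]}
f, plan = knapsack_dp(5, w, a, b)
print(f[1][5], plan)
```

The dictionary <code>best</code> stores the optimal action for every state, which is what allows the forward pass to recover the packing plan without recomputing any subproblem.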

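'''Step 8: Arrive at the optimal solution'''

From the table, <math>f^{*}_{1}(5)=9</math> with <math>U^{*}_{1}(5)=0</math>, so no items of type 1 are packed and the remaining capacity stays at 5. Then <math>U^{*}_{2}(5)=2</math>, so two items of type 2 are packed, for a maximum benefit of 9.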

=Applications=
The following are some applications where dynamic programming is used. Dynamic programming is suitable for an optimization problem when the objective function involves maximization, minimization, or counting, and when the optimal solution is found by evaluating all the candidate solutions.

'''Shortest/ Longest Path Problem'''

In the shortest path problem, the path with the least cost must be determined in a network with multiple nodes between the beginning node ''s'' and the final node ''e''. Travelling from one node ''p'' to another node ''q'' incurs a cost ''c(p, q)'', and the objective is to reach ''e'' with the smallest total cost possible. The eight-step procedure can be used to enumerate the possible solutions from which the optimal solution is determined.<ref>Neumann K. “Dynamic Programming Basic Concepts and Applications.” Optimization in Planning and Operations of Electric Power Systems. Physica, Heidelberg, 1993, p 31-56.</ref>

Likewise, with a maximization objective, the longest path can be found by determining the solution with the highest total cost to travel from node ''s'' to node ''e''.
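As a minimal sketch of this idea, the Python code below computes the cheapest route from ''s'' to ''e'' on a small acyclic network by reusing the optimal values of the subproblems. The network, its costs, and the function name are hypothetical, and the sketch assumes that every node other than ''e'' has at least one outgoing edge.

```python
from functools import lru_cache

# Hypothetical acyclic network: edges[p] maps each successor q to the cost c(p, q).
edges = {
    "s": {"a": 2, "b": 5},
    "a": {"b": 1, "e": 7},
    "b": {"e": 3},
    "e": {},
}

@lru_cache(maxsize=None)
def shortest(p):
    """Return (cost, path) of the cheapest route from node p to the final node 'e'."""
    if p == "e":                          # boundary condition
        return 0, ("e",)
    options = []
    for q, c in edges[p].items():
        cost_q, path_q = shortest(q)      # optimal value of the subproblem at q
        options.append((c + cost_q, (p,) + path_q))
    return min(options)                   # recurrence: cheapest successor

cost, path = shortest("s")
print(cost, path)
```

The cache stores the result for each node so that it is computed only once; replacing <code>min</code> with <code>max</code> turns the same recursion into a longest path computation on an acyclic network.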

'''Knapsack Problem'''

The knapsack problem is an example of a distribution-of-effort problem, in which limited resources are shared among competing entities and the goal is to maximize the benefit of the distribution. Dynamic programming is used when the increase in benefit is not linearly proportional to the increase in the quantity of resources. The volume may also be considered in addition to the weight of the resources. In that case, a volume constraint is added to the problem, and the state at stage ''n'' is represented by an ordered pair (''s, v'') for the remaining weight and volume. With ''d'' constraints the state space is ''d''-dimensional, and the number of states grows exponentially with ''d'', so the problem can become infeasible to solve even if the value of ''d'' is small. This is referred to as the curse of dimensionality. However, the curse has faded due to advances in computational power.<ref>Taylor, C. Robert. Applications Of Dynamic Programming To Agricultural Decision Problems. United States, CRC Press, 2019.</ref>

'''Inventory Planning Problem'''

In inventory management, dynamic programming is used to determine how to meet anticipated and unexpected demand in order to minimize overall costs. Tracking an inventory system involves establishing a set of policies that monitor and control the levels of inventory, determining when a stock must be replenished, and the quantity of parts to order. For example, a production schedule can be computationally solved by knowing the demand, unit production costs, and inventory supply limits in order to keep the production costs below a certain rate.<ref>Bellman, Richard. “Dynamic Programming Approach to Optimal Inventory Processes with Delay in Delivery.” Quarterly of Applied Mathematics, vol 18, 1961, p. 399-403, https://www.ams.org/journals/qam/1961-18-04/S0033-569X-1961-0118516-2/S0033-569X-1961-0118516-2.pdf. 19 Nov 2020</ref>

'''Needleman-Wunsch Algorithm (Global Sequence Alignment)'''

Developed by Saul B. Needleman and Christian D. Wunsch in 1970, the Needleman-Wunsch algorithm, also known as global sequence alignment, is used to find similarities within protein or nucleotide sequences. This algorithm is an application of dynamic programming: a large problem, such as aligning two long sequences, is divided into smaller subproblems, and the solutions of the subproblems are used to find the optimal alignment with the highest score. A matrix is constructed with one sequence along its rows and the other along its columns. A scoring system is determined for each of the nucleotide pairs (adenine, guanine, cytosine, thymine), where there could exist a match (+1), mismatch (-1), or gap (-1). The sum of the scores determines the score of the entire alignment. The scores are then calculated for the pairs and filled into the matrix. To find the optimal alignment, one performs a "traceback" from the bottom-right cell of the matrix to the top-left cell. The algorithm is limited in that it can align only entire proteins.<ref>Needleman, S. B. and Wunsch, C. D. "A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins." J. Mol. Biol. 48, 1970, p. 443-453.</ref>
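The matrix fill and traceback can be sketched in Python as follows, using the match (+1), mismatch (-1), and gap (-1) scores mentioned above. This is a minimal illustration rather than a reference implementation; the function name and the example sequences are hypothetical, and when several alignments share the best score only one of them is returned.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings x and y; returns (score, aligned_x, aligned_y)."""
    n, m = len(x), len(y)
    # F[i][j] is the best score for aligning x[:i] with y[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,   # align x[i-1] with y[j-1]
                          F[i - 1][j] + gap,     # gap in y
                          F[i][j - 1] + gap)     # gap in x
    # Traceback from the bottom-right cell to the top-left cell.
    ax, ay, i, j = [], [], n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and x[i - 1] == y[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s:
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))

score, aligned_x, aligned_y = needleman_wunsch("GATTACA", "GCATGCU")
print(score)
print(aligned_x)
print(aligned_y)
```

Each cell depends only on its three neighbours above and to the left, so the matrix can be filled row by row in time proportional to the product of the two sequence lengths.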

=Conclusion=
The eight-step procedure is an approach used in dynamic programming to transform a problem into simpler problems to yield an optimal solution. The recursive nature of the procedure allows for the optimization problems to be solved using computational models that reduce time and effort and can be used in many applications across many industries.

=References=
<references />

Markov decision process

2020-12-21T11:34:23Z

Wc593:

Author: Eric Berg (eb645) (SysEn 5800 Fall 2020)

= Introduction =
A Markov Decision Process (MDP) is a stochastic sequential decision making method.<math>^1</math> Sequential decision making is applicable any time there is a dynamic system that is controlled by a decision maker where decisions are made sequentially over time. MDPs can be used to determine what action the decision maker should take given the current state of the system and its environment. This decision making process takes into account information from the environment, actions performed by the agent, and rewards in order to decide the optimal next action. MDPs can be characterized as either finite or infinite and either continuous or discrete, depending on the set of actions and states available and the decision making frequency.<math>^1</math> This article will focus on discrete MDPs with finite states and finite actions for the sake of simplified calculations and numerical examples. The name Markov refers to the Russian mathematician Andrey Markov, since the MDP is based on the Markov Property. In the past, MDPs have been used to solve problems like inventory control, queuing optimization, and routing problems.<math>^2</math> Today, MDPs are often used as a method for decision making in reinforcement learning applications, serving as the framework guiding the machine to make decisions and "learn" how to behave in order to achieve its goal.

= Theory and Methodology =
An MDP makes decisions using information about the system's current state, the actions being performed by the agent, and the rewards earned based on states and actions.

The MDP is made up of multiple fundamental elements: the agent, states, a model, actions, rewards, and a policy.<math>^1</math> The agent is the object or system being controlled that has to make decisions and perform actions. The agent lives in an environment that can be described using states, which contain information about the agent and the environment. The model determines the rules of the world in which the agent lives, in other words, how certain states and actions lead to other states. The agent can perform a fixed set of actions in any given state. The agent receives rewards based on its current state. A policy is a function that determines the agent's next action based on its current state. [[File:Reinforcement Learning.png|thumb|Reinforcement Learning framework used in Markov Decision Processes]]'''MDP Framework:'''

*<math>S</math> : States (<math>s \in S</math>)
*<math>A</math> : Actions (<math>a \in A</math>)
*<math>P(s_{t+1} | s_t, a_t)</math> : Model determining transition probabilities
*<math>R(s)</math> : Reward
In order to understand how the MDP works, first the Markov Property must be defined. The Markov Property states that the future is independent of the past given the present.<math>^4</math> In other words, only the present is needed to determine the future, since the present contains all necessary information from the past. The Markov Property can be described in mathematical terms below:

<math display="inline">P[S_{t+1} | S_t] = P[S_{t+1} | S_1, S_2, \ldots, S_t]</math>

The above notation conveys that the probability of the next state given the current state is equal to the probability of the next state given all previous states. The Markov Property is relevant to the MDP because only the current state is used to determine the next action; the previous states and actions are not needed.

'''The Policy and Value Function'''

The policy, <math>\Pi</math>, is a function that maps states to actions. The optimal policy selects, for the current state, the action that achieves the maximum total reward.

<math>\Pi : S \rightarrow A </math>

Before the best policy can be determined, a goal or return must be defined to quantify rewards at every state. There are various ways to define the return. Each variation of the return function tries to maximize rewards in some way, but differs in which accumulation of rewards should be maximized. The first method is to choose the action that maximizes the expected reward given the current state. This is the myopic method, which considers only the immediate reward and weighs each time-step decision equally.<math>^2</math> Next is the finite-horizon method, which tries to maximize the accumulated reward over a fixed number of time steps.<math>^2</math> But because many applications may have infinite horizons, meaning the agent will always have to make decisions and continuously try to maximize its reward, another method is commonly used, known as the infinite-horizon method. In the infinite-horizon method, the goal is to maximize the expected sum of rewards over all steps in the future.<math>^2</math> When an infinite sum of rewards that are all weighted equally is computed, the result may not converge and the policy algorithm may get stuck in a loop. In order to avoid this, and to be able to prioritize short-term or long-term rewards, a discount factor, <math>\gamma</math>, is added.<math>^3</math> If <math>\gamma</math> is closer to 0, the policy will choose actions that prioritize more immediate rewards; if <math>\gamma</math> is closer to 1, long-term rewards are prioritized.

Return/Goal Variations:

* Myopic: Maximize <math>E[ r_t | \Pi , s_t ]</math>, the expected reward for each state
* Finite-horizon: Maximize <math>E[ \textstyle \sum_{t=0}^k \displaystyle r_t | \Pi , s_t ]</math>, the sum of expected rewards over a finite horizon
* Discounted infinite-horizon: Maximize <math>E[ \textstyle \sum_{t=0}^\infty \displaystyle \gamma^t r_t | \Pi , s_t ]</math> with <math>\gamma \in [0,1]</math>, the sum of discounted expected rewards over an infinite horizon
The value function, <math>V(s)</math>, characterizes the return at a given state. Most commonly, the discounted infinite-horizon return method is used to determine the best policy. Below, the value function is defined as the expected sum of discounted future rewards.

<math>V(s) = E[ \sum_{t=0}^\infty \gamma^t r_t | s_t ]
</math>

The value function can be decomposed into two parts: the immediate reward of the current state and the discounted value of the next state. This decomposition leads to the derivation of the [[Bellman equation|Bellman Equation]], as shown in equation (2). Because the actions and rewards are dependent on the policy, the value function of an MDP is associated with a given policy.

<math>V(s) = E[ r_{t+1} + \gamma V(s_{t+1}) | s_t]</math> , <math>s_{t+1} = s'</math>

<math>V(s) = R(s) + \gamma \sum_{s' \in S}P_{ss'}V(s')</math>

<math>V^{\Pi}(s) = R(s,\Pi(s)) + \gamma \sum_{s' \in S}P(s' | s,\Pi(s))V^{\Pi}(s')</math> (1)

<math>V^{*}(s) = \max_a [R(s, a) + \gamma \sum_{s' \in S}P(s' | s, a)V^*(s')]</math> (2)

The optimal value function can be computed using iterative methods such as dynamic programming, Monte-Carlo evaluations, or temporal-difference learning.<math>^5</math>

The optimal policy is one that chooses the action with the largest optimal value given the current state:

<math>\Pi^*(s) = \arg\max_a [R(s,a) + \gamma \sum_{s' \in S}P(s' | s, a)V^*(s')]</math> (3)

The policy is a function of the current state, meaning that at each time step the action is chosen using only the present information. The optimal policy function can be solved using methods such as value iteration, policy iteration, Q-learning, or linear programming.<math>^{5,6}</math>

'''Algorithms'''

The first method for solving the optimality equation (2) is using value iteration, also known as successive approximation, backwards induction, or dynamic programming. <math>^{1,6}</math>

Value Iteration Algorithm:

# Initialization: Set <math>V^{*}_0(s) = 0</math> for all <math>s \in S</math>, choose <math>\varepsilon >0</math>, and set n=0
# Value Update: For each <math>s \in S</math>, compute: <math>V^{*}_{n+1}(s) = \max_a [R(s, a) + \gamma \sum_{s' \in S}P(s' | s, a)V^*_n(s')]</math>
# If <math>\max_{s \in S} | V^{*}_{n+1}(s) - V^{*}_n(s) | < \varepsilon</math>, the algorithm has converged and the optimal value function, <math>V^*</math>, has been determined; otherwise increment n by 1 and return to step 2.
The value function approximation becomes more accurate at each iteration because more future states are considered. The value iteration algorithm can be slow to converge in certain situations, so an alternative algorithm can be used which converges more quickly.
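The value iteration steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the dictionary layout for <code>P</code> and <code>R</code>, the function name, and the two-state example are hypothetical and are not taken from the references. After convergence, the policy is extracted greedily as in equation (3).

```python
def value_iteration(states, actions, P, R, gamma=0.9, eps=1e-6):
    """P[s][a] is a dict {next_state: probability}; R[s][a] is the reward R(s, a)."""
    def q(s, a, V):
        # Bracketed term of the optimality equation for one state-action pair
        return R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())

    V = {s: 0.0 for s in states}                      # step 1: initialization
    while True:
        V_new = {s: max(q(s, a, V) for a in actions[s]) for s in states}  # step 2
        delta = max(abs(V_new[s] - V[s]) for s in states)
        V = V_new
        if delta < eps:                               # step 3: convergence test
            break
    policy = {s: max(actions[s], key=lambda a: q(s, a, V)) for s in states}
    return V, policy

# Hypothetical two-state MDP with deterministic transitions.
states = ["low", "high"]
actions = {"low": ["wait", "charge"], "high": ["wait", "work"]}
P = {
    "low": {"wait": {"low": 1.0}, "charge": {"high": 1.0}},
    "high": {"wait": {"high": 1.0}, "work": {"low": 1.0}},
}
R = {
    "low": {"wait": 0.0, "charge": -1.0},
    "high": {"wait": 0.0, "work": 5.0},
}
V, policy = value_iteration(states, actions, P, R)
print(V, policy)
```

With a discount factor below 1 the update is a contraction, so the loop terminates for any positive tolerance; with a discount factor of exactly 1 termination is not guaranteed in general.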

Policy Iteration Algorithm:

# Initialization: Set an arbitrary policy <math>\Pi_0(s)</math> and value <math>V_0(s)</math> for all <math>s \in S</math>, choose <math>\varepsilon >0</math>, and set k=0, n=0
# Policy Evaluation: For each <math>s \in S</math>, compute: <math>V^{\Pi_k}_{n+1}(s) = R(s,\Pi_k(s)) + \gamma \sum_{s' \in S}P(s' | s,\Pi_k(s))V^{\Pi_k}_n(s')</math>
# If <math>\max_{s \in S} | V^{\Pi_k}_{n+1}(s) - V^{\Pi_k}_n(s) | < \varepsilon</math>, the value function of the current policy, <math>V^{\Pi_k}</math>, has been determined; continue to the next step. Otherwise increment n by 1 and return to step 2.
# Policy Update: For each <math>s \in S</math>, compute: <math>\Pi_{k+1}(s) = \arg\max_a [R(s,a) + \gamma \sum_{s' \in S}P(s' | s,a)V^{\Pi_k}(s')]</math>
# If <math>\Pi_{k+1} = \Pi_k</math>, the algorithm has converged and the optimal policy, <math>\Pi^*</math>, has been determined; otherwise increment k by 1 and return to step 2.

With each iteration, the policy is improved using the previous policy and its value function until the algorithm converges and the optimal policy is found.
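A minimal Python sketch of policy iteration is given below. The dictionary layout for <code>P</code> and <code>R</code>, the function name, and the two-state example are hypothetical. The sketch alternates iterative policy evaluation with a greedy policy update and assumes a discount factor below 1 so that the evaluation loop converges.

```python
def policy_iteration(states, actions, P, R, gamma=0.9, eps=1e-8):
    """P[s][a] is a dict {next_state: probability}; R[s][a] is the reward R(s, a)."""
    def q(s, a, V):
        return R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())

    policy = {s: actions[s][0] for s in states}       # arbitrary initial policy
    V = {s: 0.0 for s in states}
    while True:
        # Policy evaluation: sweep until the value of the current policy settles.
        while True:
            V_new = {s: q(s, policy[s], V) for s in states}
            delta = max(abs(V_new[s] - V[s]) for s in states)
            V = V_new
            if delta < eps:
                break
        # Policy update: act greedily with respect to the evaluated values.
        new_policy = {s: max(actions[s], key=lambda a: q(s, a, V)) for s in states}
        if new_policy == policy:                       # unchanged policy: stop
            return V, policy
        policy = new_policy

# Hypothetical two-state MDP with deterministic transitions.
states = ["low", "high"]
actions = {"low": ["wait", "charge"], "high": ["wait", "work"]}
P = {
    "low": {"wait": {"low": 1.0}, "charge": {"high": 1.0}},
    "high": {"wait": {"high": 1.0}, "work": {"low": 1.0}},
}
R = {
    "low": {"wait": 0.0, "charge": -1.0},
    "high": {"wait": 0.0, "work": 5.0},
}
V, policy = policy_iteration(states, actions, P, R)
print(V, policy)
```

In this small example the policy changes twice before it stabilizes, and each evaluation starts from the values of the previous policy rather than from zero.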

= Numerical Example =
[[File:Markov Decision Process Example 2.png|alt=|thumb|499x499px|A Markov Decision Process describing a college student's hypothetical situation.]]
As an example, the MDP can be applied to a college student, depicted to the right. In this case, the agent is the student. The states are the circles and squares in the diagram, and the arrows are the actions. For example, the action between Work and School is to leave work and go to school. When the student is in the School state, the allowable actions are to go to the bar, enjoy their hobby, or sleep. The probability assigned to each next state, given the previous state and action, is 1 in this example. The rewards associated with each state are written in red.

Assume <math>P(s'|s) = 1.0</math> and <math>\gamma = 1</math>.

First, the optimal value functions must be calculated for each state.

<math>V^{*}(s) = max_a [R(s, a) + \gamma \sum_{s' \epsilon S}P(s' | s, a)V^*(s')]
</math>

<math>V^{*}(Hobby) = max_a [3 + (1)(1.0*0)] = 3
</math>

<math>V^{*}(Bar) = max_a [2 + 1(1.0*0)] = 2
</math>

<math>V^*(Sleep) = max_a[0 + 1(1.0*0)] = 0
</math>

<math>V^*(School) = max_a[ -2 + 1(1.0*2) , -2 + 1(1.0*0) , -2 + 1(1.0*3)] = 1
</math>

<math>V^*(YouTube) = max_a[-1 + 1(1.0*-1) , -1 +1(1.0*1)]= 0
</math>

<math>V^*(Work) = max_a[1 + 1(1.0*0) , 1 + 1(1.0*1)] = 2
</math>

Then, the optimal policy at each state will choose the action that generates the highest value function.

<math>\Pi^*(s) = argmax_a [R(s,a) + \gamma \sum_{s' \in S}P(s' | s, a)V^*(s')]
</math>

<math>\Pi^*(YouTube) = argmax_a [0,2] \rightarrow a =
</math> Work

<math>\Pi^*(Work) = argmax_a [0,1] \rightarrow a =
</math> School

<math>\Pi^*(School) = argmax_a [0,2,3] \rightarrow a =
</math> Hobby

Therefore, the optimal policy in each state provides a sequence of decisions that generates the optimal path through this decision process. As a result, a student who starts in the Work state should go to school, then enjoy their hobby, and then go to sleep.
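The calculation for the School state can be reproduced with a few lines of Python. This is only a sketch: it uses the reward of -2 and the successor values 2, 0, and 3 from the <math>V^*(School)</math> line above, and it labels each action by its destination state, which is an assumption made for illustration.

```python
GAMMA = 1.0        # discount factor assumed in the example
PROBABILITY = 1.0  # every transition in the example is deterministic

# Optimal values of the states reachable from School, as computed above
successor_value = {"Bar": 2, "Sleep": 0, "Hobby": 3}
REWARD_SCHOOL = -2  # reward used in the V*(School) calculation

# One bracketed term per action: reward + gamma * P * V*(next state)
q_school = {
    nxt: REWARD_SCHOOL + GAMMA * PROBABILITY * v
    for nxt, v in successor_value.items()
}

v_school = max(q_school.values())              # optimal value of School
best_action = max(q_school, key=q_school.get)  # action attaining that value

print(v_school, best_action)  # 1.0 Hobby
```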

= Applications =
[[File:Pong.jpg|thumb|Computer playing Pong arcade game by Atari using reinforcement learning]]
MDPs have been applied in various fields including operations research, electrical engineering, computer science, manufacturing, economics, finance, and telecommunication.<math>^2</math> For example, the sequential decision-making process described by an MDP can be used to solve routing problems such as the [[Traveling salesman problem]]. In this case, the agent is the salesman, the actions are the routes available from the current state, the rewards are the costs of taking each route, and the goal is to determine the optimal policy that minimizes the cost function over the duration of the trip. Another application is maintenance and repair, in which a dynamic system such as a vehicle deteriorates over time due to its actions and the environment, and the available decisions at every time epoch are to do nothing, to repair, or to replace a certain component of the system.<math>^2</math> This problem can be formulated as an MDP to choose the actions that minimize the cost of maintenance over the life of the vehicle. MDPs have also been applied to optimize telecommunication protocols, stock trading, and queue control in manufacturing environments.<math>^2</math>

Given the significant advancements in artificial intelligence and machine learning over the past decade, MDPs are being applied in fields such as robotics, automated systems, autonomous vehicles, and other complex autonomous systems. MDPs have been used widely within reinforcement learning to teach robots or other computer-based systems how to do something they were previously unable to do. For example, MDPs have been used to teach a computer how to play games such as Pong, Pac-Man, and Go.<math>^{7,8}</math> DeepMind Technologies, owned by Google, used the MDP framework in conjunction with neural networks to play several Atari games better than a human expert.<math>^7</math> In this application, only the raw pixels of the game screen were used as input, and a neural network was used to estimate the value function for each state and choose the next action.<math>^7</math> MDPs have been used in more advanced applications to teach a simulated humanoid how to walk and run and a real legged robot how to walk.<math>^9</math>
[[File:Google Deepmind.jpg|thumb|Google's DeepMind uses reinforcement learning to teach AI how to walk]]

= Conclusion =

An MDP is a stochastic, sequential decision-making method based on the Markov property. MDPs can be used to make optimal decisions for a dynamic system given information about its current state and its environment. This process is fundamental in reinforcement learning applications and a core method for developing artificially intelligent systems. MDPs have been applied to a wide variety of industries and fields including robotics, operations research, manufacturing, economics, and finance.

= References =

<references />

# Puterman, M. L. (1990). Chapter 8 Markov decision processes. In ''Handbooks in Operations Research and Management Science'' (Vol. 2, pp. 331–434). Elsevier. <nowiki>https://doi.org/10.1016/S0927-0507(05)80172-0</nowiki>
# Feinberg, E. A., & Shwartz, A. (2012). ''Handbook of Markov Decision Processes: Methods and Applications''. Springer Science & Business Media.
# Howard, R. A. (1960). ''Dynamic programming and Markov processes.'' John Wiley.
# Ashraf, M. (2018, April 11). ''Reinforcement Learning Demystified: Markov Decision Processes (Part 1)''. Medium. <nowiki>https://towardsdatascience.com/reinforcement-learning-demystified-markov-decision-processes-part-1-bf00dda41690</nowiki>
# Bertsekas, D. P. (2011). Dynamic Programming and Optimal Control 3rd Edition, Volume II. ''Massachusetts Institute of Technology'', 233.
# Littman, M. L. (2001). Markov Decision Processes. In N. J. Smelser & P. B. Baltes (Eds.), ''International Encyclopedia of the Social & Behavioral Sciences'' (pp. 9240–9242). Pergamon. <nowiki>https://doi.org/10.1016/B0-08-043076-7/00614-8</nowiki>
# Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning. ''ArXiv:1312.5602 [Cs]''. <nowiki>http://arxiv.org/abs/1312.5602</nowiki>
# Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., & Hassabis, D. (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. ''Science'', ''362''(6419), 1140–1144. <nowiki>https://doi.org/10.1126/science.aar6404</nowiki>
# Ha, S., Xu, P., Tan, Z., Levine, S., & Tan, J. (2020). Learning to Walk in the Real World with Minimal Human Effort. ''ArXiv:2002.08550 [Cs]''. <nowiki>http://arxiv.org/abs/2002.08550</nowiki>
# Bellman, R. (1966). Dynamic Programming. ''Science'', ''153''(3731), 34–37. <nowiki>https://doi.org/10.1126/science.153.3731.34</nowiki>
# Abbeel, P. (2016). ''Markov Decision Processes and Exact Solution Methods:'' 34.
# Silver, D. (2015). Markov Decision Processes. ''Markov Processes'', 57.

Network flow problem

2020-12-21T11:33:32Z

Wc593:

Author: Aaron Wheeler, Chang Wei, Cagla Deniz Bahadir, Ruobing Shui, Ziqiu Zhang (ChemE 6800 Fall 2020)

== Introduction ==
Network flow problems arise in many applications and have become fundamental problems within computer science, operations research, applied mathematics, and engineering. Advances in solving these problems produced algorithms that became the chief instruments for problems related to large-scale systems and industrial logistics. Spurred by early developments in linear programming, the methods for addressing these problems date back several decades and evolved as digital computing became increasingly prevalent in industrial processes. Historically, the first algorithmic development for the network flow problem came in 1956, with the network simplex method formulated by George Dantzig.[1] A variation of the simplex algorithm that revolutionized linear programming, this method leveraged the combinatorial structure inherent to these types of problems and demonstrated high accuracy.[2] This method and its variations went on to shape the algorithms and models for the various network flow problems discussed here.

== Theory, Methodology, and Algorithms ==
The network flow problem can be conceptualized as a directed graph that obeys flow capacity and conservation constraints. The vertices in the graph are classified into origins (source <math>X</math>), destinations (sink <math>O</math>), and intermediate points, and are collectively referred to as nodes (<math>N</math>). These nodes are distinct from one another, such that <math>N_i \neq N_j</math> for <math>i \neq j</math>.[3] The edges in the directed graph are the directional links between nodes and are referred to as arcs (<math>A</math>). Each arc is defined with a specific direction <math>(i, j)</math> that corresponds to the nodes it connects, and it has a specific flow capacity <math>c(i,j)>0</math> that cannot be exceeded. The supply and demand of units <math>u_i</math> at each node <math>i\in N</math> are written with signed flow notation, such that sources take positive values (supply) and sinks take negative values (demand), and they balance over the network: <math>\sum_{i\in N} u_i=0</math>. Intermediate nodes have no net supply or demand. Figure 1 illustrates this general definition of the network.
[[File:Picture1.png|thumb|Figure 1. General Network Flow Problem]]

Additional constraints of the network flow optimization model place limits on the solution and vary significantly based on the specific type of problem being solved. Historically, the classic network flow problems are considered to be the maximum flow problem, the minimum-cost circulation problem, the assignment problem, the bipartite matching problem, the transportation problem, and the transshipment problem.[2] The approach to each problem becomes quite specific to its objective function, but it can be generalized by the following iterative procedure: (1) determine an initial basic feasible solution; (2) check the optimality conditions (i.e., whether the problem is infeasible, is unbounded over the feasible region, or has reached an optimal solution); and (3) construct an improved basic feasible solution if the optimal solution has not been found.[3]
=== General Applications ===

==== The Assignment Problem ====
Various real-life instances of assignment problems exist for optimization, such as assigning a group of people to different tasks, events to halls with different capacities, rewards to a team of contributors, and vacation days to workers. At its core, the assignment problem is a bipartite matching problem. [3] In the classical setting, there are two types of objects in equal numbers, matched one-to-one (a bijection), and this tight constraint ensures a perfect matching. The objective is to minimize the cost or maximize the profit of the matching, since different pairs of items have distinct affinities. [[File:Assignment.png|thumb|Figure 2. Classic model of assignment problem|alt=|267x267px]]A classic example is as follows: suppose there are <math> n </math> people (set <math> P </math>) to be assigned to <math> n </math> tasks (set <math> T </math>). Every task has to be completed, each task has to be handled by exactly one person, and <math> c_{ij} </math>, usually given by a table, measures the benefit gained by assigning person <math> i </math> (in <math> P </math>) to task <math> j </math> (in <math> T </math>). [4] The natural objective here is to maximize the overall benefit by devising the optimal assignment pattern. A graph of the general assignment problem and a table of preferences are shown in Figure 2 and Table 1.
{| class="wikitable sortable"
|+Table 1. Table of preference
!Benefits
!Task 1
! Task 2
!Task 3
!...
!Task n
|-
!Person 1
|0
|3
|5
|...
|2
|-
!Person 2
|2
|1
|3
|...
|6
|-
!Person 3
|1
|4
|0
|...
|3
|-
!...
|...
|...
|...
|...
|...
|-
!Person n
|0
|2
|3
|...
|3
|}
Figure 2 can be viewed as a network. The nodes represent people and tasks, and the edges represent potential assignments between a person and a task. Each task can be completed by any person. However, the person who ends up being assigned to a task is the one individual best suited to complete it. In the end, only the edges with positive flow values appear in the final assignment. [5]

To approach this problem, the binary variable <math> x_{ij} </math> is defined as whether the person <math> i </math> is assigned to the task <math> j </math>. If so, <math> x_{ij} </math> = 1, and <math> x_{ij} </math> = 0 otherwise.

The concise-form formulation of the problem is as follows [3]:

max <math>z=\sum_{i=1}^n\sum_{j=1}^n c_{ij}x_{ij}</math>

Subject to:

<math>\sum_{j=1}^n x_{ij}=1~~\forall i\in [1,n]
</math>

<math>\sum_{i=1}^n x_{ij}=1~~\forall j\in [1,n]
</math>

<math>x_{ij}=0~or~1~~\forall i,j\in [1,n] </math>

The first constraint captures the requirement of assigning each person to a single task. The second constraint indicates that each task must be done by exactly one person. The objective function sums up the overall benefits of all assignments.
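For a very small instance, the formulation above can be checked by brute force. The sketch below is an illustration under assumptions, not a practical solver: it uses only the visible 3-by-3 corner of Table 1 (Persons 1 to 3, Tasks 1 to 3) as a stand-alone instance, and the function name <code>best_assignment</code> is hypothetical. Each permutation assigns exactly one task per person and one person per task, so both constraints hold by construction.

```python
from itertools import permutations

# Benefits c_ij for Persons 1-3 (rows) and Tasks 1-3 (columns),
# taken from the visible corner of Table 1 (illustrative sub-instance)
benefit = [
    [0, 3, 5],
    [2, 1, 3],
    [1, 4, 0],
]


def best_assignment(c):
    """Enumerate all one-to-one assignments and keep the most beneficial."""
    n = len(c)
    best_value, best_perm = None, None
    for perm in permutations(range(n)):
        # perm[i] is the task index given to person i, i.e. x[i][perm[i]] = 1
        value = sum(c[i][perm[i]] for i in range(n))
        if best_value is None or value > best_value:
            best_value, best_perm = value, perm
    return best_value, best_perm


value, perm = best_assignment(benefit)
for person, task in enumerate(perm, start=1):
    print(f"Person {person} -> Task {task + 1}")
print("total benefit:", value)  # total benefit: 11
```

The enumeration visits <math>n!</math> assignments, so it is only workable for a handful of people; larger instances call for the network simplex method or a dedicated assignment algorithm.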

To see the analogy between the assignment problem and the network flow, we can describe each person supplying a flow of 1 unit and each task demanding a flow of 1 unit, with the benefits over all “channels” being maximized. [3]

A potential issue lies in the branching of the network, specifically an instance where a person splits their one unit of flow among multiple tasks while the objective remains maximized. Such a split is allowed by the rules that govern the network flow model but is infeasible in real-life instances. Fortunately, the network simplex method only adds or removes a single edge when changing the basis, which is represented by a spanning tree of the flow graph. As a result, if the supply (the number of people here) and the demand (the number of tasks here) in the constraints are integers, the solved variables will automatically be integers even if this is not explicitly required in the problem. This is called the integrality of the network problem, and it applies to the assignment problem. [6]

==== The Transportation Problem ====
The transportation problem was first formulated for distributing troops during World War II. [7] It has since become a useful model for solving logistics problems, where the objective is usually to minimize the cost of transportation.

Consider the following scenario:

There are 2 chemical plants located in 2 different places: <math> M </math> and <math> N </math>. There are 3 raw material suppliers in 3 other locations: <math> F </math>, <math> G </math>, and <math> H </math>. The material from a supplier can be arbitrarily divided into two parts and shipped to the two plants. Suppliers <math> F </math>, <math> G </math>, and <math> H </math> can provide <math> S_1 </math>, <math> S_2 </math>, and <math> S_3 </math> amounts of material, respectively. The chemical plants located at <math> M </math> and <math> N </math> have material demands of <math> D_1 </math> and <math> D_2 </math>, respectively. Each transportation route, from a supplier to a chemical plant, has a specific cost. This model raises the question: to keep the chemical plants running, what is the best way to ship the material from the suppliers so that the transportation cost is minimized?
[[File:Transportation problem example.png|thumb|Figure 3. Transportation problem example]]
Several quantities should be defined to help formulate the frame of the solution:

<math>S_{i}</math> = the amount of material provided at supplier <math>i</math>

<math>D_{j}</math> = the amount of material consumed at chemical plant <math>j</math>

<math>x_{ij}</math> = the amount of material transferred from supplier <math>i</math> to chemical plant <math>j</math>

<math>C_{ij}</math> = the cost of transferring 1 unit of material from supplier <math>i</math> to chemical plant <math>j</math>

<math>x_{ij}C_{ij}</math> = the cost of the material transportation from <math>i</math> to <math>j</math>

Here, the amount of material being delivered and being consumed is bound to the supply and demand constraints:

(1): The amount of material shipping from supplier <math>i
</math> cannot exceed the amount of material available at supplier <math>i
</math>.

<math>\sum_j^n x_{ij}\ \leq S_{i} \qquad \forall i\in I=[1,m]
</math>

(2): The amount of material arriving at chemical plant <math>j</math> should at least fulfill the demand at chemical plant <math>j</math>.

<math>\sum_i^m x_{ij}\ \geq D_{j} \qquad \forall j\in J=[1,n]
</math>

The objective is to find the minimum cost of transportation, so the cost of each transportation line should be added up, and the total cost should be minimized.

<math>\sum_i^m \sum_j^n x_{ij}\ C_{ij}
</math>

Using the definitions above, the problem can be formulated as such:

min<math> \quad z = \sum_i^m \sum_j^n x_{ij}\ C_{ij}

</math>

<math>s.t. \quad\ \sum_j^n x_{ij}\ \leq S_{i} \qquad \forall i\in I=[1,m]
</math>

<math>\sum_i^m x_{ij}\ \geq D_{j} \qquad \forall j\in J=[1,n]
</math>

However, the problem is not complete at this point because there is no constraint for <math>x_{ij}
</math>, and that means <math>x_{ij}
</math> can be any number, even negative. In order for <math>x_{ij}
</math> to make sense physically, a lower bound of zero is mandatory, which corresponds to the situation where no material was transported from <math>i
</math> to <math>j
</math>. Adding the last constraint will complete this formulation as such:

min<math> \quad z = \sum_i^m \sum_j^n x_{ij}\ C_{ij}

</math>

<math>s.t. \quad\ \sum_j^n x_{ij}\ \leq S_{i} \qquad \forall i\in I=[1,m]
</math>

<math>\sum_i^m x_{ij}\ \geq D_{j} \qquad \forall j\in J=[1,n]
</math>

<math>x_{ij}\ \geq 0
</math>

The problem and the formulation are adapted from Chapter 8 of the book: Applied Mathematical Programming by Bradley, Hax and Magnanti. [3]
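To make the three constraint groups concrete, the sketch below checks a candidate shipping plan against them and evaluates the objective. It does not optimize anything. All numbers (supplies, demands, unit costs, and both plans) and the function name <code>evaluate_plan</code> are hypothetical values chosen for illustration, since the scenario above leaves <math>S_i</math>, <math>D_j</math>, and <math>C_{ij}</math> symbolic.

```python
def evaluate_plan(x, supply, demand, cost):
    """Return the total cost of plan x, or None if any constraint is violated."""
    m, n = len(supply), len(demand)
    # Non-negativity: x_ij >= 0
    if any(x[i][j] < 0 for i in range(m) for j in range(n)):
        return None
    # Constraint (1): shipments out of supplier i cannot exceed S_i
    if any(sum(x[i]) > supply[i] for i in range(m)):
        return None
    # Constraint (2): shipments into plant j must at least cover D_j
    if any(sum(x[i][j] for i in range(m)) < demand[j] for j in range(n)):
        return None
    # Objective: sum of x_ij * C_ij over all routes
    return sum(x[i][j] * cost[i][j] for i in range(m) for j in range(n))


# Hypothetical data: suppliers F, G, H (rows) and plants M, N (columns)
supply = [40, 30, 30]
demand = [50, 40]
cost = [[4, 6], [5, 3], [7, 5]]

plan = [[40, 0], [0, 30], [10, 10]]       # satisfies every constraint
short_plan = [[40, 0], [0, 30], [0, 10]]  # plant M receives only 40 < 50

plan_cost = evaluate_plan(plan, supply, demand, cost)
short_cost = evaluate_plan(short_plan, supply, demand, cost)
print(plan_cost, short_cost)  # 370 None
```

A feasible plan found this way is not necessarily the cheapest one; finding the minimum still requires solving the linear program.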

==== The Shortest-Path Problem ====
The shortest-path problem can be defined as finding the path that yields the shortest total distance between the origin and the destination. Each possible stop is a node, and the paths between these nodes are edges incident to them, where the path distance becomes the weight of the edge. Beyond this most common and straightforward application, the model is also used in various other applications depending on the definition of nodes and edges. [3] For example, when each node represents a different object and the edge specifies the cost of replacement, the equipment replacement problem is derived. Moreover, when each node represents a different project and the edge specifies the relative priority, the model becomes a project scheduling problem.
[[File:Shortest-Path.png|thumb|443x443px|Figure 4. General form of shortest-path problem]]
A graph of the general shortest-path problem is depicted as Figure 4:

In the general form of the shortest-path problem, the variable <math> x_{ij} </math> represents whether the edge <math> (i,j) </math> is active (i.e. with a positive flow), and the parameter <math> c_{ij} </math> (e.g. <math> c_{12} </math> = 6) defines the distance of the edge <math> (i,j) </math>. The general problem is formulated as below:

min <math>z=\sum_{i=1}^n \sum_{j=1}^n c_{ij}x_{ij}</math>

Subject to:

<math>\sum_{j=1}^n x_{ij} - \sum_{k=1}^n x_{ki} = \begin{cases} 1 & \text{if }i=s\text{ (source)} \\ -1 & \text{if }i=t \text{ (sink)} \\ 0 & \text{otherwise} \end{cases}</math>

<math>x_{ij}\geq 0~~\forall (i,j)\in E</math>

The first term of the constraint is the total outflow of the node i, and the second term is the total inflow. So, the formulation above could be seen as one unit of flow being supplied by the origin, one unit of flow being demanded by the destination, and no net inflow or outflow at any intermediate nodes. These constraints mandate a flow of one unit, amounting to the active path, from the origin to the destination. Under this constraint, the objective function minimizes the overall path distance from the origin to the destination.

Similarly, the integrality of the network problem applies here, precluding unreasonable fractional flows. With the supply and demand both being integers (one here), the edges can only carry integer amounts of flow in the solution obtained by the simplex method. [6]
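The same point-to-point problem can also be solved directly on the graph. The sketch below uses Dijkstra's algorithm, which is a different technique from the linear programming formulation above and assumes non-negative distances. The five-node network is hypothetical: only <math>c_{12} = 6</math> is taken from the text, and the other distances are invented for illustration.

```python
import heapq


def shortest_path(edges, source, target):
    """Dijkstra's algorithm; edges maps (i, j) -> non-negative distance c_ij."""
    graph = {}
    for (i, j), c in edges.items():
        graph.setdefault(i, []).append((j, c))
        graph.setdefault(j, [])
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter route to u is already known
        if u == target:
            break
        for v, c in graph.get(u, []):
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, []
    # Walk back from the target; the edges on this path are those with x_ij = 1
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical network; only c_12 = 6 comes from the text
edges = {(1, 2): 6, (1, 3): 2, (3, 2): 3, (2, 4): 1, (3, 4): 7, (4, 5): 2}
length, path = shortest_path(edges, 1, 5)
print(length, path)  # 8 [1, 3, 2, 4, 5]
```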

In addition, the point-to-point model above can be further extended to other problems. A number of real life scenarios require visiting multiple places from a single starting point. This “Tree Problem” can be modeled by making small adjustments to the original model. In this case, the source node should supply more units of flow and there will be multiple sink nodes demanding one unit of flow. Overall, the objective and the constraint formulation are similar. [4]

==== Maximal Flow Problem ====
This problem describes a situation where the material from a source node is sent to a sink node. The source and sink node are connected through multiple intermediate nodes, and the common optimization goal is to maximize the material sent from the source node to the sink node. [3]

Consider the following scenario:
[[File:Picture2.png|thumb|Figure 5. Maximal flow problem example]]
The given structure is a piping system. The water flows into the system from the source node, passes through the intermediate nodes, and flows out from the sink node. There is no limit on the amount of water that can be supplied to the source node, and likewise the sink node can accept an unlimited amount of water. The arrows denote the valid channels that water can flow through, and each channel has a known flow capacity. What is the maximum flow that the system can take?

Several quantities should be defined to help formulate the frame of the solution:
[[File:Picture3.png|thumb|Figure 6. For every intermediate node j, there is a group of node i and a group of node k.]]
Any intermediate node <math>j</math> in the system receives water from adjacent node(s) <math>i</math> and sends water to adjacent node(s) <math>k</math>. The nodes <math>i</math> and <math>k</math> are defined relative to node <math>j</math>.

<math>i
</math> = the node(s) that gives water to node <math display="inline">j
</math>

<math display="inline">j
</math> = the intermediate node(s)

<math display="inline">k

</math> = the node(s) that receives the water coming out of node <math display="inline">j
</math>

<math>x_{ij}
</math> = amount of water leaving node <math>i
</math> and entering node <math display="inline">j
</math> (<math>i
</math> and <math display="inline">j
</math> are adjacent nodes)

<math>x_{jk}</math> = amount of water leaving node <math>j</math> and entering node <math>k</math> (<math>j</math> and <math>k</math> are adjacent nodes)

For the source and sink node, they have net flow that is non-zero:

<math display="inline">m
</math> = source node

<math display="inline">n
</math> = sink node

<math>x_{in}
</math> = amount of water leaving node <math>i
</math> and entering sink node <math display="inline">n
</math> (<math>i
</math> and <math display="inline">n
</math> are adjacent nodes)

<math>x_{mk}
</math> = amount of water leaving source node <math display="inline">m
</math> and entering node <math display="inline">k

</math> (<math display="inline">m
</math> and <math display="inline">k

</math> are adjacent nodes)

Flow capacity definition is applied to all nodes (including intermediate nodes, the sink, and the source):

<math>C_{ab}
</math> = transport capacity between any two nodes <math display="inline">a
</math> and <math display="inline">b
</math> (<math display="inline">a
</math><math> \neq
</math><math display="inline">b
</math>)

The main constraints for this problem are the transport capacity between each node and the material conservation:

(1): The amount of water flowing from any node <math display="inline">a
</math> to node <math display="inline">b
</math> should not exceed the flow capacity between node <math display="inline">a
</math> to node <math display="inline">b
</math> .

<math>0\leq x_{ab} \leq C_{ab}
</math>

(2): The intermediate node <math display="inline">j
</math> does not hold any water, so the amount of water that flows into node <math display="inline">j
</math> has to exit the node with the exact same amount it entered with.

<math>\sum_i^px_{ij}- \sum_k^r x_{jk} =0
\qquad \begin{cases} \forall i\in I=[1,p] \\ \forall j\in J=[1,q]\\ \forall k\in K=[1,r] \end{cases}
</math>

Overall, the net flow out of the source node has to be the same as the net flow into the sink node. This net flow is the amount that should be maximized.

Using the definitions above:
[[File:Picture4.png|thumb|Figure 7. The imaginary flow connects the sink node to the source node, creating a closed loop.]]
max<math> \quad z = \sum_k^r x_{mk}
</math> (or <math>\sum_i^p x_{in}
</math>)

<math>s.t. \quad\ \sum_i^px_{ij}- \sum_k^r x_{jk} =0
\qquad \begin{cases} \forall i\in I=[1,p] \\ \forall j\in J=[1,q]\\ \forall k\in K=[1,r] \end{cases}
</math>

<math>0\leq x_{ab} \leq C_{ab}
</math>

This expression can be further simplified by introducing an imaginary flow from the sink to the source.

By introducing this imaginary flow, the piping system is now closed. The mass conservation constraint now also holds for the source and sink node, so they can be treated as the intermediate nodes. The problem can be rewritten as the following:

max<math> \quad z = x_{nm}
</math>

<math>s.t. \quad\ \sum_i^px_{ij}- \sum_k^r x_{jk} =0
\qquad \begin{cases} \forall i\in I=[1,p] \\ \forall j\in J=[1,q+2]\\ \forall k\in K=[1,r] \end{cases}
</math>

<math>0\leq x_{ab} \leq C_{ab}
</math>

The problem and the formulation are derived from an example in Chapter 8 of the book: Applied Mathematical Programming by Bradley, Hax and Magnanti. [3]
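The two constraint groups of this formulation can be illustrated with a short check of a candidate flow. The sketch below validates the capacity and conservation constraints and returns the net flow leaving the source. It does not search for the maximum. The four-node piping network, its capacities, the candidate flow, and the function name <code>flow_value</code> are hypothetical and are not taken from Figure 5.

```python
def flow_value(capacity, flow, source, sink):
    """Check constraints (1) and (2); return the net flow out of the source."""
    if any(edge not in capacity for edge in flow):
        raise ValueError("flow on a channel that does not exist")
    nodes = {node for edge in capacity for node in edge}
    # (1) Capacity: 0 <= x_ab <= C_ab on every channel
    for edge, c in capacity.items():
        if not 0 <= flow.get(edge, 0) <= c:
            raise ValueError(f"capacity violated on {edge}")
    # (2) Conservation: inflow equals outflow at every intermediate node j
    for j in nodes - {source, sink}:
        inflow = sum(f for (a, b), f in flow.items() if b == j)
        outflow = sum(f for (a, b), f in flow.items() if a == j)
        if inflow != outflow:
            raise ValueError(f"conservation violated at {j}")
    out_of_source = sum(f for (a, b), f in flow.items() if a == source)
    into_source = sum(f for (a, b), f in flow.items() if b == source)
    return out_of_source - into_source


# Hypothetical piping network: channel -> capacity C_ab
capacity = {("s", "A"): 2, ("s", "B"): 1, ("A", "B"): 2, ("A", "t"): 1, ("B", "t"): 2}
# Candidate flow x_ab on each channel
flow = {("s", "A"): 2, ("s", "B"): 1, ("A", "B"): 1, ("A", "t"): 1, ("B", "t"): 2}

value = flow_value(capacity, flow, "s", "t")
print(value)  # 3
```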

=== Algorithms ===

==== Ford–Fulkerson Algorithm ====
A broad range of network flow problems can be reduced to the max-flow problem. The most common way to approach the max-flow problem is the Ford-Fulkerson Algorithm (FFA), which runs in polynomial time when the augmenting paths are chosen by breadth-first search. FFA is essentially a greedy algorithm that iteratively finds an augmenting s-t path to increase the value of the flow. The pathfinding terminates when no s-t path remains. Ultimately, the max-flow pattern in the network graph is returned. [8]

Typically, FFA is applied to flow networks with only one source node <math>s</math> and one sink node <math>t</math>. In addition, the capacity conditions and the conservation conditions, which are the two properties defining a flow, must be satisfied.[9] The capacity conditions require that each edge carry a flow that is no more than its capacity, or <math>0\leq f(e)\leq c_{e},\forall e\in E</math>, where the function <math>f</math> returns the flow on a given edge. The conservation conditions require all nodes except the source and the sink to have a net flow of 0, or <math>\sum_{e~into~v}f(e)= \sum_{e~out~of~v}f(e),\forall v\in V-\{s,t\} </math>.

FFA introduces the concept of a residue graph (more commonly called the residual graph), built from the original graph <math>G</math>, to allow backtracking, i.e., pushing flow backward on edges that are already carrying flow.[9] The residue graph <math>G_{f} </math> is defined as follows:

1. <math>G_{f}</math> has exactly the same node set as <math>G</math>.

2. For each edge <math>e = (u,v)</math> with a nonnegative flow <math> f(e)</math> in <math>G</math>, <math>G_{f}</math> has the edge <math>e</math> with the capacity <math>c_{f}(e) = c_{e} - f(e)</math>, and <math>G_f</math> also has the edge <math>e' = (v,u)</math> with the capacity <math>c_{f}(e') = f(e)</math>.

Note that initially, the <math>G_{f} </math> is identical to <math>G</math> since there is no flow present in <math>G</math>.

The steps of FFA are as follows. [10] Essentially, the method repeatedly finds a path with positive capacity in the residue graph, and updates the flow graph and the residue graph until <math>s</math> and <math>t</math> become disconnected in the residue graph.

1. Set <math>f(e) = 0, \forall e\in E</math> in <math>G</math>, and create a copy as <math>G_{f}</math>.

2. While there is still an <math>s</math>-<math>t</math> path <math>p</math> in <math>G_{f}</math>:

a. Find <math>c_{f}(p) = min(c_{f}(e):e\in p)</math>

b. For each edge <math>e\in p</math>:

bi. <math>f(e) = f(e) + c_{f}(p)</math> if <math>e\in E</math> in <math>G</math>; <math>f(e') = f(e') - c_{f}(p)</math> if instead the reverse edge <math>e'\in E</math> in <math>G</math>

bii. <math>c_{f}(e)= c_{f}(e) - c_{f}(p),c_{f}(e')= c_{f}(e') + c_{f}(p)</math> in <math> G_{f}</math>

[[File:Phase 1.png|thumb|Figure 8: Flow graph and residue graph at the first phase]]
An example of running the FFA is as follows.
The flow graph <math>G</math> and the residue graph <math>G_{f}</math> at the initial phase are depicted in Figure 8, where the number on each edge in the flow graph is the flow on that edge, whereas in the residue graph it is the updated edge capacity.

In the residue graph, an <math>s</math>-<math>t</math> path can be found by tracing the edges <math>s\rightarrow A\rightarrow B\rightarrow t</math>, with a flow of two units. After augmenting along this path on both graphs, the flow graph and the residue graph are as shown in Figure 9.

[[File:Phase 2.png|thumb|Figure 9: Flow graph and residue graph after updating with the first s,t-path]]

At this stage, there is still an <math>s</math>-<math>t</math> path in the residue graph, <math>s\rightarrow B\rightarrow A\rightarrow t</math>, with a flow of one unit. After augmenting along this path on both graphs, the flow graph and the residue graph are as shown in Figure 10.

[[File:Phase 3.png|thumb|Figure 10: Flow graph and residue graph after augmenting with the second s,t-path]]

At this stage, there is no more <math>s,t</math>-path in the residue graph, so FFA terminates and the maximum flow can be read from the flow graph as 3 units.
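The augmenting-path procedure can be sketched in Python as follows. This sketch picks each path by breadth-first search (the Edmonds-Karp variant of FFA), so the order in which paths are found may differ from the walkthrough above. The capacities are hypothetical, chosen only so that the nodes match the example and the maximum flow is also 3 units; they are not read from Figures 8 to 10.

```python
from collections import deque


def max_flow(capacity, s, t):
    """Ford-Fulkerson with breadth-first path search (Edmonds-Karp variant)."""
    # Residual capacities: forward edges start at c_e, backward edges at 0
    residual = {}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + c
        residual[v].setdefault(u, 0)
    total = 0
    while True:
        # Search for an s-t path with positive residual capacity
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total  # s and t are disconnected in the residual graph
        # Recover the path and its bottleneck capacity c_f(p)
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        # Augment: reduce forward residual capacity, increase backward capacity
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        total += bottleneck


# Hypothetical capacities on the nodes s, A, B, t
capacity = {("s", "A"): 2, ("s", "B"): 1, ("A", "B"): 2, ("A", "t"): 1, ("B", "t"): 2}
result = max_flow(capacity, "s", "t")
print(result)  # 3
```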

== Numerical Example and Solution ==

A Food Distributor Company farms and collects vegetables from farmers to later distribute to a grocery store. The distributor has specific agreements with different third-party companies to mediate the delivery to the grocery store. In a particular month, the company has 600 tons of vegetables to deliver to the grocery store. It has agreements with two third-party transport companies, A and B, which have different tariffs for delivering goods between themselves, the distributor, and the grocery store. They also have limits on transport capacity for each path. The paths are numbered as shown in Figure 11, with path 1 being the transport from the Food Distributor Company to Transport Company A. The limits and tariffs for each path can be found in Table 2, and the possible transportation connections between the distributor, the third-party transporters, and the grocery store are shown in Figure 11. The transport companies cannot hold any food, and any incoming food must be delivered onward to an end point. The distributor wants to minimize the overall transport cost of shipping 600 tons of vegetables to the grocery store by choosing the optimal paths provided by the transport companies. How should the distributor map out the paths and the amount of vegetables carried on each path to minimize the overall cost?
[[File:Wiki example.png|thumb|Figure 11. Illustration of the network for the food distribution problem.]]
{| class="wikitable"
|+Table 2. Product Limits and Tariffs for each Path
|
|1
|2
|3
|4
|5
|6
|-
|Product limit (ton)
|250
|450
|350
|200
|300
|500
|-
|Tariff ($/ton)
|10
|12.5
|5
|7.5
|10
|20
|}

This question is adapted from one of the exercise questions in chapter 8 of the book: Applied Mathematical Programming by Bradley, Hax and Magnanti [3].

=== Formulation of the Problem ===
The problem can be formulated as below where variables <math>x_1, x_2, x_3,..., x_6</math> denote the tons of vegetables carried in paths 1 to 6. The objective function stated in the first line is to minimize the cost of the operation, which is the summation of the tons of vegetables carried on each path multiplied by the corresponding tariff: <math>\sum_{i=1}^6 x_i t_i</math>.

<math>\begin{array}{lcl} \min z = 10x_1 + 12.5x_2 + 5x_3 + 7.5x_4 + 10x_5 + 20x_6 \\ s.t. \qquad x_5 = x_1 - x_3 + x_4 \\ \ \ \ \quad \qquad x_6 = x_2 + x_3 - x_4 \\ \ \ \ \quad \qquad x_5 + x_6 = 600 \\ \ \ \ \quad \qquad x_1 + x_2 = 600 \\ \ \ \ \quad \qquad x_1 \leq 250 \\ \ \ \ \quad \qquad x_2 \leq 450 \\ \ \ \ \quad \qquad x_3 \leq 350 \\ \ \ \ \quad \qquad x_4 \leq 200 \\ \ \ \ \quad \qquad x_5 \leq 300 \\ \ \ \ \quad \qquad x_6 \leq 500 \\ \ \ \ \quad \qquad x_1, x_2, x_3, x_4, x_5, x_6 \geq 0\\\end{array}

</math>

The second step is to write down the constraints. The first constraint ensures that the net amount present at Transport Company A, which is the deliveries received via path 1 and path 4 minus the amount sent to Transport Company B via path 3, is delivered to the grocery store via path 5. The second constraint ensures the same for Transport Company B: the deliveries received via path 2 and path 3, minus the amount sent to Transport Company A via path 4, are delivered via path 6. The third and fourth constraints ensure that the total amount of vegetables delivered to the grocery store and the total amount shipped from the Food Distributor Company, respectively, are both 600 tons. Constraints 5 to 10 set the upper limits on the amount of vegetables that can be carried on paths 1 to 6. The final constraint states that all variables are non-negative.
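The constraints can also be read as a checklist that any shipping plan must pass. The sketch below is a minimal illustration in Python, not part of the original GAMS model; the function names and the two sample plans are assumptions chosen for this example.

```python
# Data from Table 2; list index k holds the value for path k + 1.
TARIFF = [10, 12.5, 5, 7.5, 10, 20]      # $/ton
LIMIT = [250, 450, 350, 200, 300, 500]   # tons
TOTAL = 600                              # tons that must reach the grocery store


def is_feasible(x):
    """Return True if the flows x = (x1, ..., x6) satisfy every constraint."""
    x1, x2, x3, x4, x5, x6 = x
    balanced = (
        x5 == x1 - x3 + x4        # Transport Company A keeps nothing
        and x6 == x2 + x3 - x4    # Transport Company B keeps nothing
        and x5 + x6 == TOTAL      # everything reaches the grocery store
        and x1 + x2 == TOTAL      # everything leaves the distributor
    )
    within_limits = all(0 <= flow <= cap for flow, cap in zip(x, LIMIT))
    return balanced and within_limits


def total_cost(x):
    """Return the objective value: tons on each path times its tariff."""
    return sum(t * flow for t, flow in zip(TARIFF, x))


# Hypothetical plans: the first sends nothing between A and B,
# the second pushes 350 tons through path 5 (limit 300).
direct_plan = (250, 350, 0, 0, 250, 350)
overloaded_plan = (250, 350, 0, 100, 350, 250)

print(is_feasible(direct_plan), total_cost(direct_plan))
print(is_feasible(overloaded_plan))
```

The first plan is feasible but not necessarily optimal; the second balances at every node yet violates the capacity of path 5, which is why the capacity constraints are needed in addition to the balance equations.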

=== Solution of the Problem ===
This problem can be solved using the Simplex algorithm[11] or with the CPLEX linear programming solver in the GAMS optimization platform. The steps of the solution using the GAMS platform are as follows:

The first step is to list the variables, which are the tons of vegetables that will be transported on paths 1 to 6. These can be denoted as <math>x_1, x_2, x_3,..., x_6</math>. The objective function is the overall cost, z.

'''variables x1,x2,x3,x4,x5,x6,z;'''

The second step is to list the equations which are the constraints and the objective function. The objective function is a summation of the amount of vegetables carried in path i, multiplied with the tariff of path i for all i: <math>\sum_{i=1}^6 x_i t_i</math>. The GAMS code for the objective function is written below:

'''obj.. z=e= 10*x1+12.5*x2+5*x3+7.5*x4+10*x5+20*x6;'''

Overall, there are 10 constraints in this problem, not counting the non-negativity conditions. Constraints 1 and 2 are the equations for paths 5 and 6. The amount carried on path 5 is found by summing the vegetables arriving at Transport Company A via path 1 and path 4 and subtracting the vegetables leaving Transport Company A via path 3. This follows from the restriction that bars the transport companies from keeping any vegetables and requires them to eventually deliver all the incoming produce. Equation 1 enforces this for path 5, and equation 2 enforces it for path 6. A sample of these constraints is written below for path 5:

'''c1.. x5 =e=x1-x3+x4;'''

Constraint 3 ensures that the amounts transported on paths 5 and 6 add up to the 600 tons of vegetables that have to be delivered to the grocery store. Likewise, constraint 4 ensures that the amounts carried on path 1 and path 2 add up to the 600 tons of vegetables that leave the Food Distributor Company. A sample of these constraints is written below for the total delivery to the grocery store:

'''c3.. x5+x6=e=600;'''

Constraints 5 to 10 ensure that the amount of food transported on each path does not exceed the maximum capacity given in Table 2. A sample of these constraints is written below for the capacity of path 1:

'''c5.. x1=l=250;'''

After listing the variables, objective function and the constraints, the final step is to call the CPLEX solver and set the type of the optimization problem as '''lp''' (linear programming). In this case the problem will be solved with a Linear Programming algorithm to minimize the objective (cost) function.

The GAMS code yields the results below:

'''x1 = 250, x2 = 350, x3 = 0, x4 = 50, x5 = 300, x6 = 300, z =16250.'''
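The same linear program can be cross-checked outside GAMS. The sketch below is an illustrative alternative rather than the authors' model: it assumes SciPy is installed and uses <code>scipy.optimize.linprog</code> with the HiGHS backend instead of CPLEX. The four balance equations become the rows of <code>A_eq</code> and the path limits become variable bounds.

```python
from scipy.optimize import linprog

# Tariffs from Table 2 ($/ton) for paths 1 to 6.
cost = [10, 12.5, 5, 7.5, 10, 20]

# Equality constraints, written as A_eq @ x == b_eq.
A_eq = [
    [1, 0, -1, 1, -1, 0],   # x1 - x3 + x4 - x5 = 0  (Transport Company A)
    [0, 1, 1, -1, 0, -1],   # x2 + x3 - x4 - x6 = 0  (Transport Company B)
    [0, 0, 0, 0, 1, 1],     # x5 + x6 = 600          (grocery store)
    [1, 1, 0, 0, 0, 0],     # x1 + x2 = 600          (distributor)
]
b_eq = [0, 0, 600, 600]

# Capacity limits from Table 2, with non-negativity as the lower bound.
bounds = [(0, 250), (0, 450), (0, 350), (0, 200), (0, 300), (0, 500)]

result = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")

flows = [round(value, 6) for value in result.x]
print("flows:", flows)
print("total cost:", round(result.fun, 6))
```

Under these assumptions the solver should return the same flows and cost as the GAMS run reported above.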

== Real Life Applications ==
Network problems have many applications in all kinds of areas such as transportation, city design, resource management and financial planning.[6]

There are several special cases of network problems, such as the shortest path problem, minimum cost flow problem, assignment problem and transportation problem.[6] Three application cases will be introduced here.

=== The Minimum Cost Flow Problem ===
[[File:Pic8.jpg|thumb|Figure. 12. Illustration of the ship subnetwork.[14]]]
[[File:Pic9.jpg|thumb|Figure. 13. Illustration of cargo subnetwork.[14]]]
Minimum cost flow problems are pervasive in real life, arising for example when deciding how to allocate quay cranes over time in container terminals and how to build optimal train schedules on a shared railroad line.[12]

Dewil et al. use a minimum cost network flow problem (MCNFP) to support traffic enforcement.[13] Police patrol “hot spots”, which are areas on highways where crashes occur frequently. The authors study a method for finding an optimal patrol route through the hot spots. They treat the time it takes to move the detector to a certain position as the cost and the number of patrol cars traveling from one node to the next as the flow, and then minimize the total cost.[13]

=== The Assignment Problem ===
Lin and Chang study an assignment problem in which freight is assigned to ships and transportation paths are arranged along the Northern Sea Route in a manner that yields maximum profit.[14] Within this network, composed of a ship subnetwork and a cargo subnetwork (shown in Figure 12 and Figure 13), each node corresponds to a port at a specific time and each arc represents the movement of a ship or a cargo. Other types of assignment problems include faculty scheduling and freight assignment.

=== The Shortest Path Problem ===
Shortest path problems are also present in many fields, such as transportation, 5G wireless communication, and implantation of the global dynamic routing scheme.[15][16][17]

Tu et al. study the constrained reliable shortest path (CRSP) problem for electric vehicles in the urban transportation network.[15] The objective is the reliable travel time of a path, which combines the planned travel time of the path with a reliability term. The group studies the Chicago sketch network, consisting of 933 nodes and 2950 links, and the Sioux Falls network, consisting of 24 nodes and 76 links. The results show that travelers’ risk attitudes and the properties of electric vehicles in the transportation network can have a great influence on path choice.[15] The study could inform the design of city navigation systems.

== Conclusion ==
Since its inception, the network flow problem has provided a straightforward and scalable approach to many large-scale challenges. The Simplex algorithm and computational optimization platforms have made solving these problems routine and have greatly expedited the work of groups concerned with supply chains and other distribution processes. The formulation has been adapted many times from its original form, but its overall methodology and approach remain prevalent in many of society’s industrial and commercial processes more than half a century later. Classical models such as the assignment, transportation, maximal flow, and shortest path problems have found their way into diverse settings, ranging from streamlining oil distribution networks along the Gulf Coast to arranging optimal scheduling assignments for college students amidst a global pandemic. All in all, the impact of the network flow problem has made it a fundamental tool for any group that deals with combinatorial data sets. With the surge in adoption of data-driven models and applications within virtually all industries, the network flow approach is likely to continue to drive innovation and meet consumer demands for the foreseeable future.

== References ==
1. Karp, R. M. (2008). [https://www.sciencedirect.com/science/article/pii/S1572528607000370/ George Dantzig’s impact on the theory of computation]. Discrete Optimization, 5(2), 174-185.

2. Goldberg, A. V., Tardos, É., & Tarjan, R. E. (1989). [http://www.cs.cornell.edu/~eva/Network.Flow.Algorithms.pdf/ Network Flow Algorithms]. Algorithms and Combinatorics, 9, 101-164.

3. Bradley, S. P., Hax, A. C., & Magnanti, T. L. (1977). Network Models. [http://web.mit.edu/15.053/www/AMP.htm/ Applied mathematical programming] (p. 259). Reading, MA: Addison-Wesley.

4. Chinneck, J. W. (2006). [https://www.optimization101.org/ Practical optimization: a gentle introduction. Systems and Computer Engineering]. Carleton University, Ottawa. 11.

5. Roy, B. V. Mason, K.(2005). [https://web.stanford.edu/~ashishg/msande111/notes/chapter5.pdf/ Formulation and Analysis of Linear Programs, Chapter 5 Network Flows].

6. Vanderbei, R. J. (2020). [https://www.springer.com/gp/book/9781461476306/ Linear programming: foundations and extensions (Vol. 285)]. Springer Nature.

7. Sobel, J. (2014). [https://econweb.ucsd.edu/~jsobel/172aw02/notes8.pdf/ Linear Programming Notes VIII: The Transportation Problem].

8. Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001). "Section 26.2: The Ford–Fulkerson method". Introduction to Algorithms (Second ed.). MIT Press and McGraw–Hill.

9. Jon Kleinberg; Éva Tardos (2006). "Chapter 7: Network Flow". Algorithm Design. Pearson Education.

10. [https://en.wikipedia.org/wiki/Ford%E2%80%93Fulkerson_algorithm/ Ford–Fulkerson algorithm]. Retrieved December 05, 2020.

11. Hu, G. (2020, November 19). [https://optimization.cbe.cornell.edu/index.php?title=Simplex_algorithm#cite_note-11/ Simplex algorithm]. Retrieved November 22, 2020.

12. Altınel, İ. K., Aras, N., Şuvak, Z., & Taşkın, Z. C. (2019). [https://www.sciencedirect.com/science/article/pii/S0166218X18304815/ Minimum cost noncrossing flow problem on layered networks]. Discrete Applied Mathematics, 261, 2-21.

13. Dewil, R., Vansteenwegen, P., Cattrysse, D., & Van Oudheusden, D. (2015). [https://core.ac.uk/download/pdf/34613916.pdf/ A minimum cost network flow model for the maximum covering and patrol routing problem]. European Journal of Operational Research, 247(1), 27-36.

14. Lin, D. Y., & Chang, Y. T. (2018). [https://www.sciencedirect.com/science/article/pii/S1366554517308037/ Ship routing and freight assignment problem for liner shipping: Application to the Northern Sea Route planning problem]. Transportation Research Part E: Logistics and Transportation Review, 110, 47-70.

15. Tu, Q., Cheng, L., Yuan, T., Cheng, Y., & Li, M. (2020). [https://www.sciencedirect.com/science/article/pii/S095965262031177X/ The Constrained Reliable Shortest Path Problem for Electric Vehicles in the Urban Transportation Network]. Journal of Cleaner Production, 121130.

16. Guo, Y., Li, S., Jiang, W., Zhang, B., & Ma, Y. (2017). [https://dl.acm.org/doi/abs/10.1016/j.phycom.2017.06.010/ Learning automata-based algorithms for solving the stochastic shortest path routing problems in 5G wireless communication]. Physical Communication, 25, 376-385.

17. Haddou, N. B., Ez-Zahraouy, H., & Rachadi, A. (2016). [https://www.infona.pl/resource/bwmeta1.element.elsevier-2eaa73bc-4e22-39aa-89b9-71ef2d7e2d63/ Implantation of the global dynamic routing scheme in scale-free networks under the shortest path strategy]. Physics Letters A, 380(33), 2513-2517.

Matrix game (LP for game theory)

2020-12-21T11:32:34Z

Wc593:

Author: David Oswalt (SysEn 5800 Fall 2020)

== Game Theory and Linear Programming ==
[[File:JohnOskar.png|thumb|John von Neumann (1903–1957) and Oskar Morgenstern (1902–1977)]]
Game theory is a formal language for modeling and analyzing the interactive behaviors of intelligent, rational decision-makers (or players). Game theory provides the mathematical methods necessary to analyze the decisions of two or more players based on their preferences to determine a final outcome. The theory was first conceptualized by mathematician Ernst Zermelo in the early 20th century. However, John von Neumann pioneered modern game theory through his book Theory of Games and Economic Behavior, written alongside co-author Oskar Morgenstern. For this reason, John von Neumann is often credited by historians as the Father of Game Theory.[1][2] This theory has provided a framework for approaching complex, high-pressure situations and has a broad spectrum of applications. These applications of game theory have helped shape modern economics and social sciences as we know them today and are discussed in the Applications section below.

Analyzing game theoretic situations is a practical application of linear programming. These situations can get quite complex mathematically, but one of the simplest forms of game is called the Finite Two-Person Zero-Sum Game (or Matrix Game for short). In a Matrix Game, two players are involved in a competitive situation in which one player’s loss is the other’s gain. Some common terms related to the Matrix Game that will be used throughout this chapter have been defined below:

* '''Game''' – Any social situation involving two or more individuals.[2]
* '''Players''' – The individuals involved in a game. In the case of two-person zero-sum games, these players are assumed to be rational and intelligent.[2]
* '''Rationality''' – A decision maker is considered to be rational if he or she makes decisions consistently in pursuit of his or her own objectives. Assuming a player to be rational implies that said player’s objective is to maximize his or her own payoff.[2]
* '''Utility''' – The scale upon which a decision’s payoff is measured.[2]

Analyzing these games relies on John von Neumann’s Minimax Theorem, which was originally derived using the Brouwer Fixed-Point Theorem. However, it was later shown that the Matrix Game can be solved using linear programming along with the Duality Theorem.[3] This solution to the Matrix Game is derived in the Theory and Algorithmic Discussion section below.

== Theory and Algorithmic Discussion ==
Consider a simple two-player zero-sum matrix game called Evens and Odds. In this game, two players each wager $1 before simultaneously showing either one or two fingers. If the sum of the fingers showing is even, player 1 wins the pot for that round ($2). If the sum of the fingers showing is odd, player 2 wins the pot for that round. As with all matrix games, the assumption that both players are rational and intelligent decision makers with the goal of maximizing their own total payoff in each round applies. The expected utility for each player can be defined using a payoff matrix, ''P''. In this payoff matrix, the rows and columns represent the decisions of player 1 and player 2 respectively. The below payoff matrix represents the payoff to player 1 in this matrix game.

<math>P=\begin{bmatrix} 2 & -2 \\ -2 & 2 \end{bmatrix}</math>

The rows of this payoff matrix indicate the decision made by player 1, and the columns indicate the decision made by player 2. If player 1 puts up one finger (first row) and player 2 puts up one finger (first column), then player 1 wins $2. In this example, if each player shows one or two fingers with equal probability ½, the expected payoff is zero, so neither player has a distinct advantage. Consider now a less trivial game in which the payoffs are no longer symmetric, as shown below.

<math>P=\begin{bmatrix} 1 & -2 \\ -3 & 2 \end{bmatrix}</math>

While it may be intuitive that player 2 has the edge in this new game, making this determination is not as clear for much more complicated games. This is where the mathematics behind game theory comes into play. Consider a more general form of a two-person zero-sum game where two players are allowed to pick from a finite set of actions. Let <math>n </math> represent the finite number of actions that player one (or the “row player”) can choose from and <math>i </math> represent the action selected, or <math>i= 1,2,...,n </math>. Likewise, let <math>m </math> represent the finite number of actions that player two (or the “column player”) can choose from and <math>j </math> represent the action selected, or <math>j= 1,2,...,m </math>. The general form of the payoff matrix for a matrix game is now shown below. Note that in the examples above, positive entries are payments to the row player and negative entries are payments to the column player.

<math>P = [p_{ij}]</math>

Next, we assume that each player is making a random selection in accordance with a fixed probability distribution. This probability distribution is defined by what is called the ''stochastic vector,'' <math>y</math>. Each component of the stochastic vector, <math>y_i </math>, denotes the probability that the row player selects action <math>i </math>. This stochastic vector is made up of nonnegative probabilities that sum up to one per the fundamental law of probability:

<math>y \geq 0 \text{ and } e^Ty=1, </math>

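where <math>e</math> is a vector of all ones. Likewise, the stochastic vector for the column player can be defined as <math>x </math>, with the probability that this player selects action <math>j </math> denoted by <math>x_j </math>. For the derivation that follows, we adopt the convention of [3]: the entry <math>p_{ij}</math> is read as the amount the row player pays the column player when outcome <math>(i,j) </math> occurs. A matrix that lists payoffs to the row player, such as the examples above, fits this convention after its entries are negated. To compute the expected payoff to the column player, the payoff from each outcome <math>(i,j) </math>, for all <math>i = 1,2,...,n </math> and <math>j= 1,2,...,m </math>, is multiplied by the probability of that outcome and the products are summed. Thus, the column player’s expected payoff is defined as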

<math>\sum_{i,j}y_ip_{ij}x_j = y^T Px</math>.
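As a small illustration of this double sum, the sketch below evaluates <math>y^T Px</math> in plain Python for the Evens and Odds matrix introduced earlier. The function name is an assumption made for this example, and the matrix is used exactly as written above.

```python
def expected_payoff(y, P, x):
    """Return y^T P x for row mix y, payoff matrix P, and column mix x."""
    return sum(
        y_i * p_ij * x_j
        for y_i, row in zip(y, P)
        for p_ij, x_j in zip(row, x)
    )


evens_and_odds = [[2, -2], [-2, 2]]
uniform = [0.5, 0.5]

# Both players mix evenly: the four outcomes cancel out.
print(expected_payoff(uniform, evens_and_odds, uniform))

# Deterministic choices pick out a single matrix entry (row 1, column 2).
print(expected_payoff([1, 0], evens_and_odds, [0, 1]))
```

A deterministic strategy is simply a stochastic vector with a single entry equal to one, so the same function covers both pure and mixed play.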

Since we have assumed that our column player acts rationally, we can expect them to act in accordance with the stochastic vector x. In other words, the column player has adopted strategy x. The row player’s best option for defending against strategy x is to adopt strategy y*, in which they act to minimize the column player’s payout:

<math>\begin{align}
\text{min} & ~~ y^TPx\\
\text{s.t} & ~~ e^Ty=1 \\
& ~~ y \geq 0 \\
\end{align}</math>

Because the column player is assumed to act intelligently, they are aware of the row player’s strategy to minimize their payoff. Hence, the column player can employ the strategy x* that maximizes their payoff given the row player’s best response y*, which leads to the following max-min problem:

<math>\max_{x} \min_{y} y^T Px</math>

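The above problem can be solved by reformulating it as a linear program. Because the inner minimization is attained at a deterministic strategy, it can be taken over the unit vectors <math>e_i</math>, where <math>e_i</math> has a one in position <math>i</math> and zeros elsewhere, and the problem can be rewritten as: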

<math>\begin{align}
\text{max} & ~~ \text{min}_i e_i ^T Px\\
\text{s.t} & ~~ \sum_{j=1}^m x_{j} = 1\\
& ~~ x_j \geq 0 & ~~ j = 1, 2, ..., m \\
\end{align}</math>
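The step of restricting the inner minimization to deterministic strategies can be checked numerically. The sketch below is an illustration only; the matrix is the second 2×2 example from above, and the column mix and function names are assumptions. It computes each entry <math>e_i^T Px</math> and confirms that no mixed row strategy does better for the row player than the best single row.

```python
def row_values(P, x):
    """Return the vector P x; entry i equals e_i^T P x."""
    return [sum(p_ij * x_j for p_ij, x_j in zip(row, x)) for row in P]


def inner_min(P, x):
    """Smallest payout the row player can force against column mix x."""
    return min(row_values(P, x))


P = [[1, -2], [-3, 2]]
x = [0.25, 0.75]          # hypothetical column mix

print(row_values(P, x))   # payout for each deterministic row choice
print(inner_min(P, x))

# Any mixed y gives a weighted average of the row values,
# so it can never fall below the smallest of them.
for k in range(11):
    y = [k / 10, 1 - k / 10]
    mixed = sum(y_i * v for y_i, v in zip(y, row_values(P, x)))
    assert mixed >= inner_min(P, x) - 1e-12
```

This is why the max-min problem over stochastic vectors reduces to a finite set of linear inequalities, one per row.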

In order to put a lower bound on the minimization term, a new variable ''v'' is introduced. This gives us the following linear program:

<math>\begin{align}
\text{max} & ~~ v\\
\text{s.t} & ~~ v \leq e_i^T Px & ~~ i = 1, 2, ..., n\\
& ~~ \sum_{j=1}^m x_j = 1 \\
& ~~ x_j \geq 0 & ~~ j = 1, 2, ..., m \\
\end{align}</math>

or in vector notation,

<math>\begin{align}
\text{max} & ~~ v\\
\text{s.t} & ~~ ve-Px \leq 0\\
& ~~ e^T x =1\\
& ~~ x \geq 0\\
\end{align}</math>

The above max-min linear program governs the column player’s strategy x*. We can determine the row player’s strategy y* by taking the dual of this linear program, which corresponds to the min-max problem:

<math>\min_{y} \max_{x} y^T Px</math>

Similarly to the max-min linear program used for the column player’s strategy, the above equation can be reformulated into a linear program by taking the inner optimization over the deterministic strategies and introducing a new variable u:

<math>\begin{align}
\text{min} & ~~ u\\
\text{s.t} & ~~ ue-P^Ty \geq 0\\
& ~~ e^T y =1\\
& ~~ y \geq 0\\
\end{align}</math>

These linear programs can be solved to find the optimal strategies <math>x^*</math> and <math>y^*</math>. The Minimax Theorem can now be used to verify that both solutions are consistent with one another. The Minimax Theorem states that there exist stochastic vectors <math>x^*</math> and <math>y^*</math> for which

<math>\max_{x} y^{*T} Px = \min_{y} y^T Px^*</math>

In order to prove the Minimax Theorem, we first consider the fact that

<math>v^* = \min_{i} e_i ^T Px^* = \min_{y} y^T Px^*,</math>

and

<math>u^* = \max_{j} e_j ^T P^T y^* = \max_{x} x^T P^T y^* = \max_{x} y^{*T}Px.</math>

Since the max-min linear program for x* and the min-max linear program for y* are duals of one another, the strong duality theorem guarantees that v* = u*. Therefore,

<math>\max_{x} y^{*T} Px = \min_{y} y^T Px^*</math>

Solving the above linear programs for the optimal values v* = u* yields what is called the value of the game. The value of a game shows how much utility each player can expect to gain or lose on average. In the event that v* = u* = 0, the game is considered to be fair, meaning neither player has a distinct advantage. To illustrate the power of the Minimax Theorem in solving matrix games, a numerical example is provided in the section below.
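For the two 2×2 games introduced at the start of this section, the value can be obtained without a solver. The sketch below uses the standard closed-form solution for 2×2 games that have no saddle point; this shortcut is not part of the derivation above, and it treats the entries as payoffs to the row player, as in the Evens and Odds description. The function name is an assumption.

```python
from fractions import Fraction


def solve_2x2(P):
    """Return (value, probability of row 1) for a 2x2 game without a saddle point.

    Entries of P are payoffs to the row player.
    """
    (a, b), (c, d) = P
    maximin = max(min(a, b), min(c, d))   # best pure guarantee for the row player
    minimax = min(max(a, c), max(b, d))   # best pure cap for the column player
    if maximin == minimax:
        raise ValueError("saddle point: a pure strategy is already optimal")
    denom = a - b - c + d
    p_row_one = Fraction(d - c, denom)    # equalizes the payoff against both columns
    value = Fraction(a * d - b * c, denom)
    return value, p_row_one


evens_and_odds = [[2, -2], [-2, 2]]
second_game = [[1, -2], [-3, 2]]

print(solve_2x2(evens_and_odds))
print(solve_2x2(second_game))
```

A value of zero for Evens and Odds matches the description of a fair game, while a negative value for the second matrix means the row player loses on average, which agrees with the intuition that player 2 has the edge.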

== Numerical Example ==
Many decisions made in sports can be modeled as finite two-person zero-sum games. Take, for example, a common dilemma seen in American football. The offense has driven down the field and is just a few yards short of scoring. The team has four plays, or ''downs'', to score. On the third down, the team is stopped by the defense and is unable to score, leaving only one more play to make it happen. There are two options for scoring. The first is a field goal, in which the team kicks the ball through the uprights for 3 points. The second option is to run a passing or running play for a touchdown, worth 7 points. This is often referred to as a “Fourth and Goal” situation and is a dilemma that play-callers face in most football games. While the option of scoring a touchdown yields a higher payoff, it is a much riskier option, as running and passing plays are easier to defend against than a field goal. For this reason, football coaches often settle for kicking a field goal on 4th down instead of going for the touchdown. This anticlimactic end to a long and exciting drive often leaves fans unsatisfied, knowing that their team was only a few yards from scoring a touchdown. While kicking the field goal nearly guarantees 3 points, is it smarter to employ a more aggressive strategy and go for the touchdown? Game theory can help determine the strategy that will yield the highest number of points on average over time.

There are a few assumptions to be made in order to model this Fourth and Goal dilemma. The first is that both football teams are ideal. This means that if the offense chooses a run play and the defense chooses to defend a run play, then the run will be stopped with zero yards gained. It also means that if the offense chooses a run play and the defense incorrectly chooses to defend a passing play, then the play will succeed and a touchdown will be scored. We also assume that if the offense chooses to kick a field goal, then it is guaranteed to be successful, since field goals from just a few yards out are very rarely missed. The final assumption is that all other factors contributing to play calling are neglected. These could include situations such as the offense being down 2 points with only a few seconds on the clock, when a field goal for 3 points would be the obvious best strategy. With these assumptions in mind, the payoff to the offense can be outlined as follows:
{| class="wikitable"
|+4th and Goal Dilemma Payoff
!
!
! colspan="3" |Defense
|-
!
!
!Run
!Pass
!FG
|-
| rowspan="3" |'''Offense'''
|'''Run'''
| 0
|7
|7
|-
|'''Pass'''
|7
| 0
|7
|-
|'''FG'''
|3
|3
|3
|}
The above payoff table can also be depicted by the following payoff matrix, <math>P</math>, where the columns represent the defensive team's actions and the rows represent the offensive team's actions.

<math>P = \begin{bmatrix} 0 & 7 & 7 \\ 7 & 0 & 7 \\ 3 & 3 & 3 \end{bmatrix}</math>

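In this matrix, the entries are payoffs to the offense, which is the row player. The offense therefore seeks the stochastic vector <math>y</math> over its three plays (run, pass, field goal) that maximizes the expected payoff <math>w</math> it can guarantee against every defensive choice. In order to determine its optimal strategy, the offense must solve the linear program below:

<math>\begin{align}
\text{max} & ~~ w \\
\text{s.t.} & ~~ \begin{bmatrix} 0 & 7 & 3 & -1 \\ 7 & 0 & 3 & -1\\ 7 & 7 & 3 & -1\\ 1 & 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ w \end{bmatrix} \begin{matrix} \geq \\ \geq \\ \geq \\ = \end{matrix} \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \\
& ~~ y_1, y_2, y_3 \geq 0 \\
\end{align}</math>

Each of the first three rows corresponds to one defensive choice (one column of <math>P</math>) and requires the offense's expected payoff against that choice to be at least <math>w</math>. The last row requires the probabilities to sum to one.

The above linear program has been solved using a concrete model with the GLPK solver package in ''Pyomo'', a Python-based computational optimization modeling language. The solution shows that the offense should adopt the following strategy to maximize the number of points scored on average over time:

<math>y^* = \begin{bmatrix} 0.50 \\ 0.50 \\ 0 \end{bmatrix}</math>

Using the stochastic vector <math>y^*</math> defined above, the value of the game can be computed:

<math>w^* = 3.5</math>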

This means that if the offense runs a pass play 50% of the time, runs a running play 50% of the time and never chooses to kick the field goal, they can expect a payout of at least 3.5 points on average over time. This scenario, while vastly oversimplified, demonstrates the power of applying linear programming to determine optimal strategies in finite two-person zero-sum games. It also demonstrates that it pays dividends to make aggressive play-calling decisions in sports such as football.
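The reported strategy can be checked against the Minimax Theorem without rerunning the solver. The sketch below is a verification aid, not the original Pyomo model. It computes the payoff that the offense's 50/50 run and pass mix guarantees against every defensive choice, and the most the defense concedes under a candidate defensive mix. The defensive mix is an assumption introduced here for the check; it is not given in the text.

```python
# Payoff to the offense: rows are offense plays, columns are defense plays.
P = [
    [0, 7, 7],   # offense runs
    [7, 0, 7],   # offense passes
    [3, 3, 3],   # offense kicks a field goal
]

offense_mix = [0.5, 0.5, 0.0]   # strategy reported in the text
defense_mix = [0.5, 0.5, 0.0]   # assumed candidate: defend run or pass evenly

# Expected points for the offense against each pure defensive choice.
against_each_defense = [
    sum(offense_mix[i] * P[i][j] for i in range(3)) for j in range(3)
]
offense_floor = min(against_each_defense)

# Expected points each pure offensive play earns against the defensive mix.
for_each_offense = [
    sum(P[i][j] * defense_mix[j] for j in range(3)) for i in range(3)
]
defense_cap = max(for_each_offense)

print(against_each_defense, offense_floor)
print(for_each_offense, defense_cap)
```

When the floor guaranteed by the offense equals the cap enforced by the defense, neither side can improve by deviating, so both mixes are optimal and the common number is the value of the game.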

== Other Applications of the Matrix Game ==
The rise of game theory spanned the time frame in which both World War I and World War II occurred, so naturally one of the earliest applications was in developing winning military strategies. Game theory was used to make high-pressure decisions on attack and defense strategies that optimized their impact within a set of constraints. The Battle of the Bismarck Sea between Japanese and American forces in 1943 is one of the most historic examples of game theory in this context. In this battle, the US Air Force analyzed an attack situation using a two-person zero-sum game to maximize the amount of time they had to bomb a Japanese naval fleet, given the limited information they had about the convoy’s route. This demonstrates that the word “game” in “game theory” can be misleading. Not all applications of game theory are fun games, and many applications can have serious consequences.

One of the other earlier applications of game theory was in economics. This ended up growing into one of the more significant applications of game theory and has formed modern economics as we know it today. The theory played a major role in the development of many sub-disciplines of economics, such as industrial organization, international trade, labor economics, and macroeconomics.[1] As game theory matured, its applications expanded into various fields of social science, including political science, international relations, philosophy, sociology and anthropology. It is also used in biology and computer science. To this day, economics remains the most prominent application of game theory.

== Conclusion ==
Situations modeled as finite two-person zero-sum games, or ''Matrix Games,'' tend to be oversimplified and not have much practical use. However, solving matrix games using linear programming is merely an introduction into the power of analyzing stochastic decision making using computational optimization methods. Game theory has revolutionized the world's approach to disciplines such as economics, war, intelligence, biology, computer science, political science and many more. The methods used to solve game theoretic models continue to evolve and will subsequently continue to change the way decision makers approach the world around us.

== References ==
[1] Bonanno, Giacomo. ''Game Theory''. 2nd ed., CreateSpace Independent Publishing Platform, 2015.

[2] Myerson, Roger B. ''Game Theory Analysis of Conflict''. Harvard University Press, 2013.

[3] Vanderbei, Robert J. ''Linear Programming: Foundations and Extensions''. 2nd ed., Kluwer, 2004.

[4] “Blog: Five Early AI Geniuses: John Von Neumann.” ''Tim McCloud'', 19 June 2019, timmccloud.net/blog-5%E2%80%8A-%E2%80%8Aearly-ai-geniuses-john-von-neumann-and-chess/.

Optimization with absolute values

2020-12-21T11:31:56Z

Wc593:

Authors: Matthew Chan (mdc297), Yilian Yin (yy896), Brian Amado (ba392), Peter Williams (pmw99), Dewei Xiao (dx58) (SysEn 5800 Fall 2020)

== Introduction ==
Absolute values can make it relatively difficult to determine the optimal solution when they are handled without first converting the problem to standard form. Converting the objective function is therefore a good first step in solving optimization problems with absolute values, after which the problem can be solved using linear programming techniques. Because an absolute value term is not linear, the problem as stated is considered nonlinear. The conversion introduces a new variable (e.g., <math>\textstyle X^a </math>) in the objective function, and additional constraints must be added to find the optimal solution.

== Method ==

=== Defining Absolute Values ===
An absolute value of a real number can be described as its distance away from zero, or the non-negative magnitude of the number. <ref> Mendelson, Elliott, Schaum's Outline of Beginning Calculus, McGraw-Hill Professional, 2008. https://books.google.com/books?id=A8hAm38zsCMC&pg=PA2#v=onepage&q&f=false </ref> Thus,

<math>\displaystyle |x|={\begin{cases}-x,&{\text{if }}x<0\\x,&{\text{if }}x\geq 0\end{cases}}</math>

Absolute values can exist in linear optimization problems in two primary instances: in constraints and in the objective function. <ref> "Absolute Values." ''lp_solve'', http://lpsolve.sourceforge.net/. Accessed 20 Nov. 2020. </ref>

=== Absolute Values in Constraints ===
Within constraints, absolute value relations can be transformed into one of the following forms:

<math> \begin{align}
|X| &= 0 \\
|X| &\le C \\
|X| &\ge C
\end{align} </math>

Where <math>\textstyle X</math> is a linear combination (<math>\textstyle ax_1 + bx_2 + ...</math> where <math>\textstyle a, b</math> are constants) and <math>\textstyle C</math> is a constant <math>\textstyle > 0</math>.

==== Form when <math>\displaystyle |X| = 0</math> ====
In this form, the only possible solution is <math>\displaystyle X = 0</math>, which simplifies the constraint. Note that the same simplification applies if the constraint is in the form <math>\displaystyle |X| \le 0</math>, since the only possible solution is again <math>\textstyle X = 0</math>.

==== Form when <math>\displaystyle |X| \le C</math> ====
The second form a linear constraint can take is <math>\displaystyle |X|\leq C</math>. In this case, an equivalent feasible region can be described by splitting the constraint into two:

<math> \begin{align}
X &\leq C \\
-X &\leq C
\end{align} </math>

The solution can be understood visually since <math>\textstyle X</math> must lie between <math>\textstyle -C</math> and <math>\textstyle C</math>, as shown below:

[[File:Number Line X Less Than C.png|none|thumb]]

==== Form when <math>\displaystyle |X| \ge C</math> ====
Visually, the solution space for the last form is the complement of the second solution above, resulting in the following representation:

[[File:Number Line for X Greater Than C.png|none|thumb]]

In expression form, the solutions can be written as:

<math> \begin{align}
X &\geq C \quad \text{or} \\
-X &\geq C
\end{align} </math>

As seen visually, the feasible region has a gap and is thus non-convex. The two expressions also cannot hold true simultaneously, since <math>\textstyle C > 0</math>. This means that it is not possible to transform constraints in this form into linear constraints. <ref> ''Optimization Methods in Management Science / Operations Research.'' Massachusetts Institute of Technology, Spring 2013, https://ocw.mit.edu/courses/sloan-school-of-management/15-053-optimization-methods-in-management-science-spring-2013/tutorials/MIT15_053S13_tut04.pdf. Accessed 20 Nov. 2020. </ref>

An approach to reach a solution for this particular case exists in the form of Mixed-Integer Linear Programming, where only one of the equations above is “active”.

The inequality can be reformulated into the following:

<math> \begin{align}
&X + N*Y \ge C \\
-&X + N*(1-Y) \ge C \\
&Y = 0, 1
\end{align} </math>

With this new set of constraints, a large constant <math>\textstyle N</math> is introduced, along with a binary variable <math>\textstyle Y</math>. So long as <math>\textstyle N</math> is at least as large as the upper bound of <math>\textstyle |X| + C</math>, the large constant multiplied by the binary variable ensures that one of the two constraints is always satisfied, leaving the other to be enforced. For instance, if <math>\textstyle Y = 0</math>, the new constraints will resolve to:

<math> \begin{align}
&X \ge C \\
-&X + N \ge C
\end{align} </math>

Since <math>\textstyle N</math> is sufficiently large, the latter constraint will always be satisfied, leaving only one relation active: <math>\textstyle X \ge C</math>. Functionally, this allows for the XOR logical operation of <math>\textstyle X \geq C</math> and <math>\textstyle -X \geq C</math>.
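The big-M reformulation above can be sanity-checked by enumerating both values of the binary variable. This is only a sketch: the bound on <math>\textstyle X</math> (<code>x_bound</code>), the value of <code>c</code>, and the helper name <code>big_m_feasible</code> are assumptions made for illustration.

```python
def big_m_feasible(x, c, n):
    """Return True if some binary y satisfies both big-M constraints."""
    return any(
        x + n * y >= c and -x + n * (1 - y) >= c
        for y in (0, 1)
    )


c = 2.0
x_bound = 10.0          # assumed bound: -10 <= x <= 10
n = c + x_bound         # large constant covering |x| + c on that range
samples = [i / 4 for i in range(-40, 41)]
mismatches = [x for x in samples if big_m_feasible(x, c, n) != (abs(x) >= c)]
print(mismatches)  # an empty list means the MILP form matches |x| >= c
```

If <code>n</code> is chosen too small, points with large <math>\textstyle |X|</math> are wrongly excluded, which is why the constant must be tied to a known bound on <math>\textstyle X</math>.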

=== Absolute Values in Objective Functions ===
In objective functions, to leverage transformations of absolute value functions, all constraints must be linear.

Similar to the case of absolute values in constraints, there are different approaches to the reformulation of the objective function, depending on whether the sign constraints are satisfied. The sign constraints are satisfied when the coefficients of the absolute value terms are all:

* Positive for a minimization problem
* Negative for a maximization problem

==== Sign Constraints are Satisfied ====
At a high level, the transformation works similarly to the second case of absolute value in constraints – aiming to bound the solution space for the absolute value term with a new variable, <math>\textstyle Z</math>.

If <math>\textstyle |X|</math> is the absolute value term in our objective function, two additional constraints are added to the linear program:

<math> \begin{align}
&X\leq Z \\
-&X\leq Z
\end{align} </math>

The <math>\textstyle |X|</math> term in the objective function is then replaced by <math>\textstyle Z</math>, relaxing the original function into a collection of linear constraints.

==== Sign Constraints are Not Satisfied ====
In order to transform problems where the coefficient signs of the absolute terms do not fulfill the conditions above, a similar conclusion is reached to that of the last case for absolute values in constraints: integer variables are needed, so the result is a mixed-integer linear program (MILP) rather than a pure LP.

The following constraints need to be added to the problem:

<math> \begin{align}
&X + N*Y \ge Z \\
-&X + N*(1-Y) \ge Z \\
&X \le Z \\
-&X \le Z \\
&Y = 0, 1
\end{align} </math>

Again, <math>\textstyle N</math> is a large constant, <math>\textstyle Z</math> is a replacement variable for <math>\textstyle |X|</math> in the objective function, and <math>\textstyle Y</math> is a binary variable. The first two constraints ensure that one and only one constraint is active while the other will be automatically satisfied, following the same logic as above. The third and fourth constraints ensure that <math>\textstyle Z</math> is at least <math>\textstyle |X|</math>, and therefore non-negative, whether <math>\textstyle X</math> itself is positive or negative. For instance, for the case of <math>\textstyle Y = 0</math>, the new constraints will resolve to:

<math> \begin{align}
&X \ge Z \\
-&X + N \ge Z \\
&X \le Z \\
-&X \le Z
\end{align} </math>

As <math>\textstyle N</math> is sufficiently large (<math>\textstyle N</math> must be at least <math>\textstyle 2|X|</math> for this approach), the second constraint is always satisfied. The remaining pair, <math>\textstyle X \ge Z</math> and <math>\textstyle X \le Z</math>, can only be satisfied when <math>\textstyle Z = X</math>. Since <math>\textstyle Z</math> is non-negative, this requires <math>\textstyle X</math> to be non-negative, and the fourth constraint is then also satisfied. Together, these constraints will allow for the selection of the largest <math>\textstyle |X|</math> for maximization problems (or smallest for minimization problems).
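The effect of the four constraints can be illustrated by listing, for a fixed <math>\textstyle X</math>, every candidate <math>\textstyle Z</math> that is feasible for at least one value of <math>\textstyle Y</math>. The sketch below assumes <math>\textstyle |X| \le 5</math> (so that <math>\textstyle N = 10</math> is large enough) and uses a coarse grid of candidate values; both choices, and the helper names, are illustrative.

```python
def z_is_feasible(x, z, n):
    """True if some binary y satisfies the four constraints for the pair (x, z)."""
    return any(
        x + n * y >= z and -x + n * (1 - y) >= z and x <= z and -x <= z
        for y in (0, 1)
    )


def feasible_z_values(x, n, z_grid):
    """All grid values of z that are feasible for the given x."""
    return [z for z in z_grid if z_is_feasible(x, z, n)]


x_bound = 5.0                              # assumed bound on |x|
n = 2 * x_bound                            # N >= 2|x| as required above
z_grid = [i / 4 for i in range(0, 41)]     # 0.0, 0.25, ..., 10.0
for x in (-5.0, -1.25, 0.0, 2.5, 5.0):
    print(x, feasible_z_values(x, n, z_grid))  # only z = |x| remains feasible
```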

=== Absolute Values in Nonlinear Optimization Problems ===
The addition of a new variable <math> (X_a) </math> to an objective function with absolute value quantities forms a nonlinear optimization problem. The absolute value quantities would require that the problem be reformatted before proceeding. Additional constraints must be added to account for the added variable.

==Numerical Example==
'''Example when All Sign Constraints are Satisfied'''

<math> \begin{align}
\min \quad &{2|x_1| + 3|x_2| + |x_3|} \\
s.t. \quad &x_1 + 2x_2 - 3x_3 \le 8 \\
&2x_1 - x_2 + 4x_3= 14
\end{align}</math>

The absolute value quantities will be replaced with single variables:

<math>|x_1| = U_1 </math>

<math>|x_2| = U_2</math>

<math>|x_3| = U_3</math>

We must introduce additional constraints to ensure we do not lose any information by doing this substitution:

<math> -U_1 \le x_1 \le U_1 </math>

<math> -U_2 \le x_2 \le U_2 </math>

<math> -U_3 \le x_3 \le U_3 </math>

The problem has now been reformulated as a linear programming problem that can be solved normally:

<math> \begin{align}
\min \quad &{ 2U_1 + 3U_2 + U_3} \\
s.t. \quad &x_1 + 2x_2 - 3x_3 \le 8 \\
&2x_1 - x_2 + 4x_3= 14 \\
-&U_1 \le x_1 \le U_1 \\
-&U_2 \le x_2 \le U_2 \\
-&U_3 \le x_3 \le U_3
\end{align}</math>

The optimum value for the objective function is <math>3.5</math>, which occurs when <math>x_1 = 0 </math> and <math>x_2 = 0 </math> and <math>x_3 = 3.5 </math>.
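The reported optimum can be cross-checked without an LP solver. The sketch below eliminates <math>x_3</math> through the equality constraint and scans a coarse grid of <math>x_1, x_2</math> values; the grid range and spacing are arbitrary assumptions, so this is a plausibility check rather than a proof of optimality.

```python
def objective(x1, x2, x3):
    """Original objective with absolute values."""
    return 2 * abs(x1) + 3 * abs(x2) + abs(x3)


grid = [i / 4 for i in range(-40, 41)]       # -10.0, -9.75, ..., 10.0
best = None
for x1 in grid:
    for x2 in grid:
        x3 = (14 - 2 * x1 + x2) / 4          # equality constraint solved for x3
        if x1 + 2 * x2 - 3 * x3 <= 8:        # inequality constraint
            candidate = (objective(x1, x2, x3), x1, x2, x3)
            if best is None or candidate < best:
                best = candidate
print(best)  # (objective value, x1, x2, x3)
```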

'''Example when Sign Constraints are not Satisfied'''

<math> \begin{align}
\min \quad &{2|x_1| + 3|x_2| - |x_3|} \\
s.t. \quad &x_1 + 2x_2 - 3x_3 \le 8 \\
&2x_1 - x_2 + 4x_3= 14
\end{align}</math>

The absolute value quantities will be replaced with single variables:

<math>|x_1| = U_1 </math>

<math>|x_2| = U_2</math>

<math>|x_3| = U_3</math>

We must introduce additional constraints to ensure we do not lose any information by doing this substitution:

<math> \begin{align}
-&U_1 \le x_1 \le U_1 \\
-&U_2 \le x_2 \le U_2 \\
&x_3 + M*Y \ge U_3 \\
-&x_3 + M*(1-Y) \ge U_3 \\
&x_3 \le U_3 \\
-&x_3 \le U_3 \\
&Y = 0,1
\end{align}</math>

Because of the binary variable <math>Y</math>, the problem has now been reformulated as a mixed-integer linear programming problem that can be solved with a standard MILP solver:
<ref> Shanno, David F., and Roman L. Weil. “'Linear' Programming with Absolute-Value Functionals.” Operations Research, vol. 19, no. 1, 1971, pp. 120–124. Accessed 13 Dec. 2020. JSTOR, www.jstor.org/stable/168871. </ref>

<math> \begin{align}
\min \quad &{ 2U_1 + 3U_2 - U_3} \\
s.t. \quad &x_1 + 2x_2 - 3x_3 \le 8 \\
&2x_1 - x_2 + 4x_3= 14 \\
-&U_1 \le x_1 \le U_1 \\
-&U_2 \le x_2 \le U_2 \\
&x_3 + M*Y \ge U_3 \\
-&x_3 + M*(1-Y) \ge U_3 \\
&x_3 \le U_3 \\
-&x_3 \le U_3 \\
&Y = 0,1
\end{align}</math>

The optimum value for the objective function is <math>-3.5</math>, which occurs when <math>x_1 = 0 </math> and <math>x_2 = 0 </math> and <math>x_3 = 3.5 </math>.
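The reported solution can be checked against the reformulated constraints directly. In the sketch below, the value <code>M = 100</code> is an assumption (any constant at least <math>2|x_3|</math> over the region of interest would do), the helper names are illustrative, and the grid search is only a coarse plausibility check.

```python
M = 100.0  # assumed big-M value


def constraints_hold(x, u, y, tol=1e-9):
    """Check every constraint of the reformulated problem for one binary y."""
    x1, x2, x3 = x
    u1, u2, u3 = u
    return (x1 + 2 * x2 - 3 * x3 <= 8 + tol
            and abs(2 * x1 - x2 + 4 * x3 - 14) <= tol
            and -u1 <= x1 <= u1
            and -u2 <= x2 <= u2
            and x3 + M * y >= u3
            and -x3 + M * (1 - y) >= u3
            and -u3 <= x3 <= u3)


def objective(u):
    """Reformulated objective 2*U1 + 3*U2 - U3."""
    return 2 * u[0] + 3 * u[1] - u[2]


x = (0.0, 0.0, 3.5)
u = tuple(abs(v) for v in x)
print(constraints_hold(x, u, y=0), constraints_hold(x, u, y=1), objective(u))

# Coarse grid search on the original objective, with x3 fixed by the equality.
grid = [i / 4 for i in range(-40, 41)]
best = min(
    (2 * abs(x1) + 3 * abs(x2) - abs(x3), x1, x2, x3)
    for x1 in grid for x2 in grid
    for x3 in [(14 - 2 * x1 + x2) / 4]
    if x1 + 2 * x2 - 3 * x3 <= 8
)
print(best)
```

Only <math>Y = 0</math> is feasible at this point, because <math>x_3</math> is positive and the constraint <math>-x_3 \ge U_3</math> would be active for <math>Y = 1</math>.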

== Applications ==

Consider the problem <math>Ax=b; \quad \max \quad z= \sum_j c_j |x_j|</math>. This problem cannot, in general, be solved with the simplex method. The problem has a simplex method solution (with unrestricted basis entry) only if the <math>c_j</math> are nonpositive (non-negative for minimization problems).

The primary application of absolute-value functionals in linear programming has been for absolute-value or <math>L_1</math>-metric regression analysis. Such an application is always a minimization problem with all <math>c_j</math> equal to 1, so that the required conditions for valid use of the simplex method are met.

By reformulating the original problem into a Mixed-Integer Linear Program (MILP), we can utilize known programs to solve for the optimal solution(s).

=== Application in Finance: Portfolio Selection ===
Under this topic, the same technique used in the Numerical Example section, '''Reduction to a Linear Programming Problem''', is applied again to reformulate the problem as a linear program so that it can be solved. An example is given below.

A portfolio is determined by what fraction of one's assets to put into each investment. <ref> Vanderbei R.J. (2008) Financial Applications. In: Linear Programming. International Series in Operations Research & Management Science, vol 114. Springer, Boston, MA. <nowiki>https://doi.org/10.1007/978-0-387-74388-2_13</nowiki> https://link.springer.com/chapter/10.1007/978-0-387-74388-2_13 </ref> It can be denoted as a collection of nonnegative numbers <math>\textstyle x_j</math>, where <math> j = 1, 2,...,n </math>. Because each <math> \textstyle x_j </math> stands for a portion of the assets, the <math> \textstyle x_j </math> sum to one. To obtain the highest reward by finding the right mix of assets, let the positive parameter <math>\mu</math> denote the importance of risk relative to the return, and let <math>\textstyle R_j</math> denote the return in the next time period on investment <math>j, j = 1, 2,..., n</math>. The total return one would obtain from the investment is <math>R = \sum_{j}\!x_j\!R_j </math>. The expected return is <math>\mathbb{E}\!R = \sum_{j}\!x_j\mathbb{E}\!R_j </math>. The Mean Absolute Deviation from the Mean (MAD) is <math>\mathbb{E}\left\vert \!R - \mathbb{E}\!R \right\vert = \mathbb{E}\left\vert \sum_{j}\!x_j\tilde{R}_j \right\vert </math>. The portfolio selection problem is then:

maximize <math display="inline">\mu\sum_j\!x_j\mathbb{E}\!R_j - \mathbb{E}\left\vert \sum_j \!x_j\tilde{R}_j \right\vert </math>

subject to <math>\sum_j\!x_j = 1</math>

<math>x_j \geq 0</math> , <math> j = 1,2,..n.</math>

where <math>\tilde{R}_j = \!R_j - \mathbb{E}\!R_j </math>

This problem is not yet a linear programming problem. Similar to the numerical example shown above, the approach is to replace each absolute value with a new variable and impose inequality constraints to ensure that the new variable equals the appropriate absolute value once an optimal value is obtained. To simplify the program, an average of the historical returns can be taken in order to estimate the expected return: <math>r_j = \mathbb{E}\!R_j = \left ( \frac{1}{T} \right ) \sum_{t=1}^T \!R_j(t)
</math>. Thus the objective function becomes: <math>\mu\sum_{j}\!x_j\!r_j - \left ( \frac{1}{T} \right ) \sum_{t=1}^T\left\vert \sum_{j} \!x_j \bigl(R_j (t) - \!r_j\bigr) \right\vert
</math>

Now, replace <math>\left\vert \sum_{j} \!x_j \bigl(R_j (t) - \!r_j\bigr) \right\vert
</math> with a new variable <math>y_t
</math>, so that the problem can be rewritten as:

<math> \begin{align}
\max \quad & \mu \sum_j x_j r_j - \left ( \frac{1}{T} \right ) \sum_{t=1}^T y_t \\
s.t. \quad & -y_t \leq \sum_{j} x_j \bigl(R_j (t) - r_j\bigr) \leq y_t, \quad t = 1, 2, \ldots, T \\
& \sum_j x_j = 1 \\
& x_j \geq 0, \quad j = 1, 2, \ldots, n \\
& y_t \geq 0, \quad t = 1, 2, \ldots, T
\end{align} </math>

After these substitutions, the original problem is converted into a linear program, which is easier to solve.
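The quantities in the reformulated problem can be computed for a given portfolio as a quick illustration. The sketch below uses hypothetical historical returns for two assets over four periods; the data, the weights, and the function name <code>mad_lp_terms</code> are assumptions. It evaluates the objective for one fixed portfolio and does not optimize over <math>x_j</math>.

```python
def mad_lp_terms(returns, x, mu):
    """Evaluate the reformulated objective for fixed weights x.

    returns[t][j] is the historical return R_j(t); x[j] is the weight of asset j.
    """
    T, n = len(returns), len(x)
    r = [sum(returns[t][j] for t in range(T)) / T for j in range(n)]  # mean returns r_j
    deviations = [sum(x[j] * (returns[t][j] - r[j]) for j in range(n)) for t in range(T)]
    y = [abs(d) for d in deviations]   # tightest y_t with -y_t <= deviation_t <= y_t
    objective = mu * sum(x[j] * r[j] for j in range(n)) - sum(y) / T
    return r, deviations, y, objective


returns = [[0.02, 0.01], [-0.01, 0.03], [0.04, -0.01], [0.03, 0.01]]  # hypothetical data
x = [0.5, 0.5]                                                         # weights sum to one
r, deviations, y, objective = mad_lp_terms(returns, x, mu=1.0)
print(r, y, objective)
```

Setting each <math>y_t</math> to the absolute deviation is what an optimal solution of the linear program does, since the objective rewards making every <math>y_t</math> as small as the constraints allow.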

===Data Transfer Rate===
Another application of optimization with absolute values is data transfer rate. Faster-than-Nyquist signaling (FTNS) is a framework to transmit signals beyond the Nyquist rate. The work cited in this section reports a 24.7% faster symbol rate by utilizing Sum-of-Absolute-Values optimization. <ref>Sasahara, Hampei & Hayashi, Kazunori & Nagahara, Masaaki. (2016). Symbol Detection for Faster-Than-Nyquist Signaling by Sum-of-Absolute-Values Optimization. IEEE Signal Processing Letters. PP. 1-1. 10.1109/LSP.2016.2625839. https://www.researchgate.net/publication/309745511_Symbol_Detection_for_Faster-Than-Nyquist_Signaling_by_Sum-of-Absolute-Values_Optimization </ref>

The initial model is defined as follows:
<math>\displaystyle x_0 (t) = \sum^N_{n=1} x_{n,0} h_n (t), t \in [0,T] </math>

where t ∈ R denotes the continuous time index, N ∈ N is the number of transmitted symbols in each transmission period, T > 0 is the interval of one period, <math>x_{n,0}</math> ∈ {+1, −1} are independent and identically distributed (i.i.d.) binary symbols [i.e., binary phase shift keying (BPSK)], and <math>h_n (t) (n = 1,...,N) </math> are the modulation pulses.

After reformulating symbol detection as a convex optimization problem and repeatedly applying Newton's method with absolute values, an approximate solution can be obtained from:
<math>\displaystyle \min_{z \in R^N} (\lambda \Vert y - Hz \Vert^2_2 + \frac{1}{2} \Vert z - 1_N \Vert_1 + \frac{1}{2} \Vert z + 1_N \Vert_1 ) </math>
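The structure of this objective can be illustrated by evaluating it for two candidate vectors. The symbols <math>y</math>, <math>H</math> and <math>\lambda</math> are not defined in the text above; the sketch below assumes they are an observation vector, a matrix, and a weight, and all numerical values are hypothetical.

```python
def soav_objective(z, y, H, lam):
    """Evaluate lam * ||y - H z||_2^2 + 0.5 * ||z - 1||_1 + 0.5 * ||z + 1||_1."""
    residual = [yi - sum(hij * zj for hij, zj in zip(row, z)) for yi, row in zip(y, H)]
    fit = lam * sum(r * r for r in residual)
    penalty = sum(0.5 * abs(zj - 1) + 0.5 * abs(zj + 1) for zj in z)
    return fit + penalty


H = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical 2x2 matrix
y = [0.9, -1.2]                # hypothetical observations
binary_value = soav_objective([1.0, -1.0], y, H, lam=1.0)
noisy_value = soav_objective([0.9, -1.2], y, H, lam=1.0)
print(binary_value, noisy_value)
```

Each component of the absolute-value penalty equals <math>\max(1, |z_i|)</math>, so it is flat between <math>-1</math> and <math>+1</math> and grows linearly outside that interval.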

== Conclusion ==
The presence of an absolute value within the objective function prevents the use of certain optimization methods. Solving these problems requires that the function be manipulated in order to continue with linear programming techniques like the simplex method. The applications of optimization with absolute values range from the financial sector, where portfolio returns can be improved, to the digital world, where data transfer rates can be improved. The formulation of these problems must take absolute values into account in order to model the problem correctly. The absolute values inherently make these problems non-linear, so determining optimal solutions is only achievable after reformulating them into linear or mixed-integer linear programs.

== References ==
<references />

Interior-point method for LP

2020-12-21T11:31:15Z

Wc593:

Authors: Tomas Lopez Lauterio, Rohit Thakur and Sunil Shenoy (SysEn 5800 Fall 2020) 

== Introduction ==
Linear programming problems seek to optimize linear functions given linear constraints. There are several applications of linear programming, including inventory control, production scheduling, transportation optimization and efficient manufacturing processes. The simplex method has been a very popular method to solve these linear programming problems and has served these industries well for a long time. But over the past 40 years, there have been a significant number of advances in different algorithms that can be used for solving these types of problems in more efficient ways, especially where the problems become very large scale in terms of variables and constraints.<ref> "Practical Optimization - Algorithms and Engineering Applications" by Andreas Antoniou and Wu-Sheng Lu, ISBN-10: 0-387-71106-6 </ref> <ref> "Linear Programming - Foundations and Extensions - 3rd edition" by Robert J Vanderbei, ISBN-13: 978-0-387-74387-5. </ref> In 1984, Karmarkar <ref> N Karmarkar, "A new Polynomial - Time algorithm for linear programming", Combinatorica, Vol. 4, No. 4, 1984, pp. 373-395.</ref> published a paper introducing interior point methods to solve linear programming problems. A simple way to look at the differences between the simplex method and the interior point method is that the simplex method moves along the edges of a polytope towards a vertex having a lower value of the cost function, whereas an interior point method begins its iterations inside the polytope and moves towards the lowest cost vertex without regard for edges. On large problems, this approach can reduce the number of iterations needed to reach that vertex, thereby reducing the computational time needed to solve the problem.

=== Lagrange Function ===
Before describing the interior point method in detail, there are a few concepts that are helpful to understand. The first key concept is the Lagrange function. The Lagrange function incorporates the constraints into a modified objective function in such a way that a constrained minimizer <math> (x^{*}) </math> is connected to an unconstrained minimizer <math> \left \{x^{*},\lambda ^{*} \right \} </math> for the augmented objective function <math> L\left ( x , \lambda \right ) </math>, where the augmentation is achieved with <math> p </math> Lagrange multipliers. <ref> "Computational Experience with Primal-Dual Interior Point Method for Linear Programming" by Irvin Lustig, Roy Marsten, David Shanno </ref><ref> "Practical Optimization - Algorithms and Engineering Applications" by Andreas Antoniou and Wu-Sheng Lu, ISBN-10: 0-387-71106-6 </ref>

To illustrate this point, consider a simple optimization problem:

minimize <math> f\left ( x \right ) </math>

subject to: <math> A \cdot x = b </math>

where <math> A \, \in \, R^{p\, \times \, n} </math> is assumed to have full row rank.

The Lagrange function can be laid out as:

<math>L(x, \lambda ) = f(x) + \sum_{i=1}^{p}\lambda _{i}\cdot a_{i}(x)</math>

where <math> a_{i}(x) = a_{i}^{T}x - b_{i} </math> is the residual of the ''i''-th constraint (<math> a_{i}^{T} </math> being the ''i''-th row of <math> A </math>) and <math> \lambda _{i} </math> is the Lagrange multiplier of that constraint.
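As a small worked illustration, consider minimizing <math> x_{1}^{2} + x_{2}^{2} </math> subject to <math> x_{1} + x_{2} = 1 </math>. The sketch below assumes that <math> a_{i}(x) </math> stands for the constraint residual, so that <math> L(x,\lambda) = x_{1}^{2} + x_{2}^{2} + \lambda (x_{1} + x_{2} - 1) </math>. The problem and the function name are chosen only for illustration.

```python
def lagrangian_gradient(x, lam):
    """Partial derivatives of L(x, lam) = x1^2 + x2^2 + lam * (x1 + x2 - 1)."""
    x1, x2 = x
    return (2 * x1 + lam,        # dL/dx1
            2 * x2 + lam,        # dL/dx2
            x1 + x2 - 1)         # dL/dlam, i.e. the constraint residual


# Setting all three derivatives to zero gives x = (0.5, 0.5) and lam = -1.
stationary = lagrangian_gradient((0.5, 0.5), -1.0)
print(stationary)

# A feasible but non-optimal point leaves a nonzero gradient in x.
print(lagrangian_gradient((1.0, 0.0), -1.0))
```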
=== Newton's Method ===
Another key concept to understand is solving linear and non-linear equations using Newton's method.
Assume an unconstrained minimization problem in the form:

minimize <math> g\left ( x \right ) </math>, where <math> g\left ( x \right ) </math> is a real-valued function of <math> n </math> variables.

A local minimum for this problem will satisfy the following system of equations:

<math>\left [ \frac{\partial g(x)}{\partial x_{1}} \; \ldots \; \frac{\partial g(x)}{\partial x_{n}}\right ]^{T} = \left [ 0 \; \ldots \; 0 \right ]^{T}</math>

The Newton iteration looks like:

<math>x^{k+1} = x^{k} - \left [ \nabla ^{2} g(x^{k}) \right ]^{-1}\cdot \nabla g(x^{k})</math>
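The iteration can be sketched for a function of a single variable, where the gradient and the Hessian reduce to the first and second derivatives. The test function <math> g(x) = x - \ln x </math>, the starting point, and the tolerance below are illustrative assumptions.

```python
def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """One-dimensional Newton iteration x <- x - g'(x) / g''(x)."""
    x = x0
    for iteration in range(1, max_iter + 1):
        step = grad(x) / hess(x)
        x = x - step
        if abs(step) < tol:          # stop once the update becomes negligible
            return x, iteration
    return x, max_iter


# g(x) = x - ln(x) has g'(x) = 1 - 1/x and g''(x) = 1/x^2; its minimizer is x = 1.
x_star, iterations = newton_minimize(lambda x: 1 - 1 / x, lambda x: 1 / x ** 2, x0=0.5)
print(x_star, iterations)
```

For this function the error is squared at every step, which illustrates the fast local convergence that makes Newton's method attractive inside interior point methods.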
 

== Theory and algorithm ==
[[File:Visualization.png|685x685px|Visualization of Central Path method in Interior point|thumb]]

Given a linear programming problem with constraint equations that have inequality terms, the inequality term is typically replaced with an equality term using slack variables. The new reformulation can be discontinuous in nature, and to replace the discontinuous function with a smoother function, a logarithmic form of this reformulation is utilized. This nonlinear objective function is called the "''Logarithmic Barrier Function''".
The process involves starting with the formation of a primal-dual pair of linear programs and then using the "''Lagrangian function''" form on the "''Barrier function''" to convert the constrained problems into unconstrained problems. These unconstrained problems are then solved using Newton's method as shown above.

=== Problem Formulation ===

Consider a combination of primal-dual problem below: 
('''Primal Problem formulation''') 
→ minimize <math> c^{T}x </math> 
Subject to: <math> Ax = b </math> and <math> x \geq 0 </math> 
('''Dual Problem formulation''') 
→ maximize <math> b^{T}y </math> 
Subject to: <math> A^{T}y + \lambda = c </math> and <math> \lambda \geq 0 </math> 
The vector <math> \lambda </math> introduced here represents the slack variables of the dual problem.

The Lagrangian functional form is used to configure two equations using "''Logarithmic Barrier Function''" for both primal and dual forms mentioned above: 
Lagrangian equation for Primal using Logarithm Barrier Function : <math> L_{p}(x,y) = c^{T}\cdot x - \mu \cdot \sum_{j=1}^{n}log(x_{j}) - y^{T}\cdot (Ax - b) </math> 
Lagrangian equation for Dual using Logarithm Barrier Function : <math> L_{d}(x,y,\lambda ) = b^{T}\cdot y + \mu \cdot \sum_{j=1}^{n}log(\lambda _{j}) - x^{T}\cdot (A^{T}y +\lambda - c) </math> 

Taking the partial derivatives of Lp and Ld with respect to variables <math> 'x'\; '\lambda'\; 'y' </math>, and forcing these terms to zero, we get the following equations: 
<math> Ax = b </math> and <math> x \geq 0 </math> 
<math> A^{T}y + \lambda = c </math> and <math> \lambda \geq 0 </math> 
<math> x_{j}\cdot \lambda _{j} = \mu </math> for ''j''= 1,2,.......''n'' 

where <math> \mu </math> is a strictly positive scalar parameter. For each <math> \mu > 0 </math>, the vectors in the set <math> \left \{ x\left ( \mu \right ), y\left ( \mu \right ) , \lambda \left ( \mu \right )\right \} </math> satisfying the above equations can be viewed as sets of points in <math> R^{n} </math>, <math> R^{p} </math>, <math> R^{n} </math> respectively, such that when <math> \mu </math> varies, the corresponding points form a set of trajectories called the ''"Central Path"''. The central path lies in the ''"Interior"'' of the feasible regions. A sample illustration of the ''"Central Path"'' method is shown in the figure to the right. Starting with a positive value of <math> \mu </math>, the optimal point is approached as <math> \mu </math> approaches 0.

Let Diagonal[...] denote a diagonal matrix with the listed elements on its diagonal.
Define the following:

'''X''' = Diagonal [<math> x_{1}^{0}, \ldots, x_{n}^{0} </math>]

<math> \lambda </math> = Diagonal [<math> \lambda _{1}^{0}, \ldots, \lambda _{n}^{0} </math>] (in the matrix equations below, <math> \lambda </math> denotes this diagonal matrix)

<math> e^{T} = (1, \ldots, 1) </math>, a vector of all 1's.

Using these newly defined terms, the equation above can be written as:

<math> X\cdot \lambda \cdot e = \mu \cdot e </math>

=== Iterations using Newton's Method ===
Employing the Newton's iterative method to solve the following equations: 
<math> Ax - b = 0 </math> 
<math> A^{T}y + \lambda = c </math> 
<math> X\cdot \lambda \cdot e - \mu \cdot e = 0</math> 
The starting point <math> \left ( x^{0},y^{0},\lambda ^{0} \right ) </math> is defined to lie within the feasible region, such that <math> x^{0}> 0 </math> and <math> \lambda ^{0}> 0 </math>.
Also defining 2 residual vectors for both the primal and dual equations: 
<math> \delta _{p} = b - A\cdot x^{0} </math> 
<math> \delta _{d} = c - A^{T}\cdot y^{0} - \lambda ^{0} </math> 

Applying Newton's Method to solve above equations: 
<math> \begin{bmatrix}
A & 0 & 0\\
0 & A^{T} & I\\
\lambda & 0 & X
\end{bmatrix} \cdot \begin{bmatrix}
\delta _{x}\\
\delta _{y}\\
\delta _{\lambda }
\end{bmatrix} = \begin{bmatrix}
\delta _{p}\\
\delta _{d}\\
\mu \cdot e - X\cdot \lambda \cdot e
\end{bmatrix}
</math> 
So a single iteration of Newton's method involves the following equations. For each iteration, we solve for the next value of <math> x^{k+1},y^{k+1},\lambda ^{k+1} </math>: 
<math> (A\lambda ^{-1}XA^{T})\delta _{y} = b- \mu A\lambda^{-1}e + A\lambda ^{-1}X\delta _{d} </math>

<math> \delta _{\lambda} = \delta _{d} - A^{T}\delta _{y} </math>

<math> \delta _{x} = \lambda ^{-1}\left [ \mu \cdot e - X\lambda e - X\delta _{\lambda}\right ] </math>

<math> \alpha _{p} = min\left \{ \frac{-x_{j}}{\delta _{x_{j}}} \right \} </math> for <math> \delta _{x_{j}} < 0 </math>

<math> \alpha _{d} = min\left \{ \frac{-\lambda_{j}}{\delta _{\lambda_{j}}} \right \} </math> for <math> \delta _{\lambda_{j}} < 0 </math>

The values of the following variables for the next iteration (''k''+1) are determined by:
<math> x^{k+1} = x^{k} + \alpha _{p}\cdot \delta _{x} </math> 
<math> y^{k+1} = y^{k} + \alpha _{d}\cdot \delta _{y} </math> 
<math> \lambda^{k+1} = \lambda^{k} + \alpha _{d}\cdot \delta _{\lambda} </math> 

The quantities <math> \alpha _{p} </math> and <math> \alpha _{d} </math> are positive with <math> 0\leq \alpha _{p},\alpha _{d}\leq 1 </math>. 
After each iteration of Newton's method, we assess the duality gap that is given by the expression below and compare it against a small value <big>ε</big>:

<math> \frac{c^{T}x^{k}-b^{T}y^{k}}{1+\left | b^{T}y^{k} \right |} \leq \varepsilon </math>

The value of <big>ε</big> can be chosen to be something small, such as <math>10^{-6}</math>, which essentially is the permissible duality gap for the problem.
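The iteration described above can be sketched in plain Python for the special case of a single equality constraint, where <math> A\lambda ^{-1}XA^{T} </math> is a scalar and no matrix factorization is needed. Several details below are assumptions that the text does not specify: <math> \mu </math> is reset at every iteration to <code>sigma</code> times the average of the products <math> x_{j}\lambda _{j} </math>, the step lengths are damped by <code>tau</code> and capped at 1 so that the iterates stay strictly positive, and the residuals are added to the stopping test. The example data are the standard form of a small LP, with one slack variable.

```python
def step_length(values, deltas, tau):
    """Largest damped step in (0, 1] that keeps every value strictly positive."""
    ratios = [-v / d for v, d in zip(values, deltas) if d < 0]
    return min(1.0, tau * min(ratios)) if ratios else 1.0


def primal_dual_single_row(c, a, b, x, y, lam,
                           sigma=0.1, tau=0.9, eps=1e-6, max_iter=100):
    """Primal-dual Newton iterations for: min c^T x  s.t.  a^T x = b, x >= 0."""
    n = len(c)
    for iteration in range(max_iter):
        primal_res = b - sum(ai * xi for ai, xi in zip(a, x))
        delta_d = [ci - ai * y - li for ci, ai, li in zip(c, a, lam)]
        gap = sum(ci * xi for ci, xi in zip(c, x)) - b * y
        if (abs(gap) / (1 + abs(b * y)) <= eps and abs(primal_res) <= eps
                and max(abs(d) for d in delta_d) <= eps):
            break
        mu = sigma * sum(xi * li for xi, li in zip(x, lam)) / n   # assumed update rule
        # With one constraint row, A Lambda^-1 X A^T is the scalar `normal`.
        normal = sum(ai * ai * xi / li for ai, xi, li in zip(a, x, lam))
        rhs = (b - mu * sum(ai / li for ai, li in zip(a, lam))
               + sum(ai * xi * di / li for ai, xi, di, li in zip(a, x, delta_d, lam)))
        dy = rhs / normal
        dlam = [di - ai * dy for di, ai in zip(delta_d, a)]
        dx = [(mu - xi * li - xi * dli) / li for xi, li, dli in zip(x, lam, dlam)]
        alpha_p = step_length(x, dx, tau)
        alpha_d = step_length(lam, dlam, tau)
        x = [xi + alpha_p * dxi for xi, dxi in zip(x, dx)]
        y = y + alpha_d * dy
        lam = [li + alpha_d * dli for li, dli in zip(lam, dlam)]
    return x, y, lam, iteration


# min -3*x1 - 3*x2 + 0*x3  s.t.  x1 + x2 + x3 = 4, x >= 0  (x3 is a slack variable)
x, y, lam, iterations = primal_dual_single_row(
    c=[-3.0, -3.0, 0.0], a=[1.0, 1.0, 1.0], b=4.0,
    x=[1.0, 1.0, 1.0], y=-4.0, lam=[1.0, 1.0, 4.0])
print(3 * x[0] + 3 * x[1], y, iterations)
```

For problems with several constraints, the scalar division by <code>normal</code> would be replaced by the solution of a linear system in <math> \delta _{y} </math>.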

== Numerical Example ==

Maximize 
<math> 3X_{1} + 3X_{2} </math> 

such that 
<math> X_{1} + X_{2} \leq 4, </math> 
<math> X_{1} \geq 0, </math> 
<math> X_{2} \geq 0, </math> 

Barrier form of the above primal problem is as written below:

<math> P(X,\mu) = 3X_{1} + 3X_{2} + \mu \log(4-X_{1} - X_{2}) + \mu \log(X_{1}) + \mu \log(X_{2})</math> 

For <math>\mu > 0</math>, the barrier function is concave, since it is the sum of a linear function and logarithms of linear functions, so this maximization problem has one and only one solution. In order to find the maximum point of the concave function, we take the derivative and set it to zero.

Taking the partial derivatives and setting them to zero, we get the equations below:

<math> \frac{\partial P(X,\mu)}{\partial X_{1}} = 3 - \frac{\mu}{(4-X_{1}-X_{2})} + \frac{\mu}{X_{1}} = 0</math> 

<math> \frac{\partial P(X,\mu)}{\partial X_{2}} = 3 - \frac{\mu}{(4-X_{1}-X_{2})} + \frac{\mu}{X_{2}} = 0</math> 

Using above equations the following can be derived: <math> X_{1} = X_{2}</math> 

Hence the following can be concluded

<math> 3 - \frac{\mu}{(4-2X_{1})} + \frac{\mu}{X_{1}} = 0 </math> 

The above equation can be converted into a quadratic equation as below:

<math> 6X_{1}^2 - 3X_{1}(4-\mu)-4\mu = 0</math> 

The solution to the above quadratic equation can be written as below:

<math> X_{1} = \frac{3(4-\mu)\pm\sqrt{9(4-\mu)^2 + 96\mu} }{12} = X_{2}</math> 

Taking only the positive root of the above equation, since <math> X_{1} \geq 0 </math> and <math> X_{2} \geq 0</math>, we can solve for <math>X_{1}</math> and <math>X_{2}</math> for different values of <math>\mu</math>. The outcome of such iterations is listed in the table below.

{| class="wikitable"
|+ Objective & Barrier Function w.r.t <math>X_{1}</math>, <math>X_{2}</math> and <math>\mu</math>
|-
! <math>\mu</math> !! <math>X_{1}</math> !! <math>X_{2}</math> !! <math>P(X, \mu)</math> !! <math>f(x)</math>
|-
| 0 || 2 || 2 || 12 || 12
|-
| 0.01 || 1.998 || 1.998 || 11.947 || 11.990
|-
| 0.1 || 1.984 || 1.984 || 11.697 || 11.902
|-
| 1 || 1.859 || 1.859 || 11.128 || 11.152
|-
| 10 || 1.486 || 1.486 || 17.114 || 8.916
|-
| 100 || 1.351 || 1.351 || 94.357 || 8.105
|-
| 1000 || 1.335 || 1.335 || 871.052 || 8.011
|}

From the above table it can be seen that:
# as <math>\mu</math> gets close to zero, the Barrier Function becomes tight and close to the original function.
# at <math>\mu=0</math> the optimal solution is achieved.
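The entries of the table can be reproduced from the closed-form root of the quadratic equation. The sketch below assumes natural logarithms in the barrier function, which matches the tabulated values; the function name <code>barrier_point</code> is illustrative.

```python
import math


def barrier_point(mu):
    """Return (X1, P, f) on the central path for a given mu > 0, with X1 = X2."""
    disc = math.sqrt(9 * (4 - mu) ** 2 + 96 * mu)
    x = (3 * (4 - mu) + disc) / 12                 # positive root of the quadratic
    f = 3 * x + 3 * x                              # original objective
    p = f + mu * (math.log(4 - 2 * x) + 2 * math.log(x))   # barrier function
    return x, p, f


for mu in (0.01, 0.1, 1, 10, 100, 1000):
    x, p, f = barrier_point(mu)
    print(f"{mu:>7} {x:.3f} {p:.3f} {f:.3f}")
```

The case <math>\mu = 0</math> is left out because the logarithm of the slack <math>4 - X_{1} - X_{2}</math> is undefined at the optimum, where the slack is zero.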

Summary:
* Maximum value of the objective function <math>=12</math>
* Optimal points <math>X_{1} = 2 </math> and <math>X_{2} = 2</math>

Newton's method can also be applied to solve linear programming problems, as indicated in the "Theory and algorithm" section above. For the problem in this "Numerical Example" section, the resulting solution will be similar to that of the quadratic equation obtained above and will converge in one iteration.

== Applications ==
Primal-dual interior-point (PDIP) methods are commonly used in optimal power flow (OPF), where the goal is to maximize user utility and minimize operational cost while satisfying operational and physical constraints. The solution to the OPF needs to be available to grid operators within a few minutes or seconds due to changes and fluctuations in loads during power generation. Newton-based primal-dual interior point methods can achieve fast convergence in this OPF optimization problem. <ref> A. Minot, Y. M. Lu and N. Li, "A parallel primal-dual interior-point method for DC optimal power flow," 2016 Power Systems Computation Conference (PSCC), Genoa, 2016, pp. 1-7, doi: 10.1109/PSCC.2016.7540826. </ref>

Another application of the PDIP is for the minimization of losses and cost in the generation and transmission in hydroelectric power systems. <ref> L. M. Ramos Carvalho and A. R. Leite Oliveira, "Primal-Dual Interior Point Method Applied to the Short Term Hydroelectric Scheduling Including a Perturbing Parameter," in IEEE Latin America Transactions, vol. 7, no. 5, pp. 533-538, Sept. 2009, doi: 10.1109/TLA.2009.5361190. </ref>

PDIP methods are commonly used in image processing. One of these applications is image deblurring, in which the constrained deblurring problem is formulated as a primal-dual problem. The constrained primal-dual problem is solved using a semi-smooth Newton's method. <ref> D. Krishnan, P. Lin and A. M. Yip, "A Primal-Dual Active-Set Method for Non-Negativity Constrained Total Variation Deblurring Problems," in IEEE Transactions on Image Processing, vol. 16, no. 11, pp. 2766-2777, Nov. 2007, doi: 10.1109/TIP.2007.908079. </ref>

PDIP can be utilized to obtain a general formula for a shape derivative of the potential energy describing the energy release rate for curvilinear cracks. Problems on cracks and their evolution have important application in engineering and mechanical sciences. <ref> V. A. Kovtunenko, Primal–dual methods of shape sensitivity analysis for curvilinear cracks with nonpenetration, IMA Journal of Applied Mathematics, Volume 71, Issue 5, October 2006, Pages 635–657 </ref>

== Conclusion ==

The primal-dual interior point method is a good alternative to the simplex method for solving linear programming problems. The primal-dual method shows superior performance and convergence on many large, complex problems. Simplex codes are faster on small to medium problems, while primal-dual interior point codes are much faster on large problems.

== References ==
<references />

Computational complexity

2020-12-21T11:30:25Z

Wc593:

Authors: Steve Bentioulis, Augie Bravo, Will Murphy (SysEn 6800 Fall 2020)

== Introduction ==
<blockquote>''“The subject of my talk is perhaps most directly indicated by simply asking two questions: first, is it harder to multiply than to add? and second, why?...there is no algorithm for multiplication computationally as simple as that for addition, and this proves something of a stumbling block” - Alan Cobham, 1965'' <ref>[https://www.cs.toronto.edu/~sacook/homepage/cobham_intrinsic.pdf A. Cobham, The intrinsic computational difficulty of functions], in Y. Bar-Hillel, ed., Logic, Methodology and Philosophy of Science: Proceedings of the 1964 International Congress, North-Holland Publishing Company, Amsterdam, 1965, p. 24-30 </ref></blockquote>
Computational complexity refers to the amount of resources needed to solve a problem. Complexity increases as the amount of resources required increases. While this notion may seem straightforward enough, computational complexity has profound impacts.

The quote above from Alan Cobham is some of the earliest thinking on defining computational complexity and set the stage for defining problems based on complexity classes to indicate the feasibility of computational problems.

Additionally, the theory of computational complexity is in its infancy and has only been studied in earnest starting in the 20th century.

== Theory, Methodology ==
The concept of computation has evolved since the advent of the standard universal electronic computer and the associated widespread societal adoption. And while the electronic computer is often synonymous with computation, it is important to remember that computation is a major scientific concept irrespective of whether it is conducted by machine, man, or otherwise.

When studying computation, a key area of interest is understanding what problems are, in fact, computable. Researchers have shown that some tasks are inherently incomputable in that no computer can solve them without going into infinite loops on certain inputs. This phenomenon raises two questions: how can one determine whether a problem can be computed, and, for those problems that can be computed, how can one calculate the complexity associated with computing the answer?

The focus of computational complexity is the measure of computational efficiency quantifying the amount of computational resources required to solve a problem.<ref>Arora, S., & Barak, B. (2009). Computational complexity: a modern approach. Cambridge: Cambridge University Press. Retrieved from https://cornell-library.skillport.com/skillportfe/main.action?path=summary/BOOKS/31235</ref>

Within the study of computational complexity there exists the notion of a complexity class defined as a set of functions that can be computed within given resource bounds.<ref>Du, D., & Ko, K.-I. (2014). Theory of computational complexity. (Second edition.). Hoboken, New Jersey: Wiley. Retrieved from http://onlinelibrary.wiley.com/book/10.1002/9781118595091</ref>

=== Class P ===
Class P computational complexity problems refer to those that can be solved in polynomial running time, where “P” stands for polynomial and running time is a function of the number of bits in the input.<ref>Arora, S., & Barak, B. (2009). Computational complexity: a modern approach. Cambridge: Cambridge University Press. Retrieved from https://cornell-library.skillport.com/skillportfe/main.action?path=summary/BOOKS/31235</ref>

Strictly speaking, a complexity class contains specific decision problems (questions with a yes or no answer) rather than generic types of problems. For example, instead of stating that integer multiplication is in class P, one states a specific decision problem, e.g. deciding whether the product of two given integers equals a third given integer is a class P problem.

Furthermore, the running time is not defined by minutes or nanoseconds, but refers to the number of operations to be performed to resolve or verify an answer to a problem. Running time is a function of the number of bits being input into the decision problem. This allows us to ignore the efficiency of the machine running the computation and judge the complexity of the decision problem solely by the merits of such a problem.
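The idea of counting operations as a function of the number of input bits can be illustrated with the addition and multiplication questions raised in the quote from Cobham. The sketch below counts single-bit steps in the schoolbook algorithms; the choice of algorithms and of what counts as one operation are assumptions, and faster multiplication algorithms exist.

```python
def to_bits(value, width):
    """Least-significant-bit-first list of `width` bits."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def add_bits(a, b):
    """Schoolbook addition of equal-length bit lists; one operation per position."""
    ops, carry, out = 0, 0, []
    for x, y in zip(a, b):
        total = x + y + carry
        out.append(total % 2)
        carry = total // 2
        ops += 1
    out.append(carry)
    return out, ops


def multiply_bits(a, b):
    """Schoolbook multiplication; one operation per pair of input bits."""
    ops = 0
    result = [0] * (len(a) + len(b))
    for i, x in enumerate(a):
        carry = 0
        for j, y in enumerate(b):
            total = result[i + j] + x * y + carry
            result[i + j] = total % 2
            carry = total // 2
            ops += 1
        result[i + len(b)] += carry
    return result, ops


for n in (4, 8, 16):
    a = b = to_bits(2 ** n - 1, n)
    print(n, add_bits(a, b)[1], multiply_bits(a, b)[1])  # n versus n squared
```

Both counts are polynomial in the number of bits, so both tasks are efficiently solvable in the sense used here, even though multiplication needs more steps than addition with these particular algorithms.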

=== Class NP ===
NP stands for nondeterministic polynomial time, originally referring to nondeterministic Turing machines (NDTM) in which the Turing machine has two transition functions and the computer arbitrarily determines which transition function to apply.

Complexity class NP consists of problems whose solutions can be efficiently verified, that is, within a running time upper bounded by a polynomial function. Verifiability is the concept that, given a potential solution to the problem, the solution can be confirmed or rejected.

==== Class NP-hard and NP-complete ====
The NP-hard complexity class consists of those problems that are at least as hard as any problem in NP; an NP-hard problem does not itself have to belong to NP. If P ≠ NP, then NP-hard problems cannot be decided in polynomial time. See P vs NP on this page.

NP-complete refers to those problems within the NP complexity-class that are the hardest problems to solve within the NP class. Examples of NP-complete problems include Independent Set, [https://optimization.cbe.cornell.edu/index.php?title=Traveling_salesman_problem Traveling Salesperson], Subset Sum, and Integer Programming problems. The implication of these problems is that they are not in P unless P = NP.

=== P vs NP ===
The difference between class P and class NP computational complexity is illustrated simply by considering a Sudoku puzzle. Ask yourself, is it easier to solve a Sudoku puzzle or verify whether an answer to a Sudoku puzzle is correct? Class P refers to computational complexity problems that can be efficiently solved. Class NP refers to those problems which have answers that can be efficiently verified. The answer to a Sudoku problem can be efficiently verified and for that reason is considered a class NP complexity problem.
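
The verification side of the Sudoku comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name <code>is_valid_sudoku_solution</code> and the sample grids are hypothetical, the check covers the rules of a completed 9x9 grid (it does not compare against a puzzle's given clues), and it says nothing about how hard it is to find a solution in the first place.

```python
def is_valid_sudoku_solution(grid):
    """Return True if a completed 9x9 grid satisfies all Sudoku rules."""
    digits = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(grid[r][c] for r in range(9)) for c in range(9)]
    boxes = [
        set(grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3))
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    ]
    # Every row, column and 3x3 box must contain each digit 1..9 exactly once.
    return all(group == digits for group in rows + cols + boxes)


# Illustrative data: a valid grid built from a standard shifting pattern.
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]

# Swapping two cells in one row keeps that row valid but breaks two columns.
broken = [row[:] for row in solved]
broken[0][0], broken[0][1] = broken[0][1], broken[0][0]

print(is_valid_sudoku_solution(solved))
print(is_valid_sudoku_solution(broken))
```

The check visits each cell a fixed number of times, so for a grid with N cells the work grows only polynomially in N, which is the sense in which a proposed answer is "efficiently verified".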

This raises the question of whether every class NP problem, i.e. every problem whose answers can be efficiently verified, can also be efficiently solved. If so, then P = NP. However, neither P = NP nor P ≠ NP has been proven, so the implications of both possibilities must be considered.

The importance of understanding P vs NP is the subject of much discussion and has even sparked competition in the scientific community. The problem of P vs NP was selected by the Clay Mathematics Institute (CMI) of Cambridge, Massachusetts as one of the seven most difficult and important problems to be solved at the dawn of the 21st century. A prize of $1 million has been allocated for anyone who can bring forward a solution.<ref>Clay Mathematics Institute, The Millennium Prize Problems. Retrieved from https://www.claymath.org/millennium-problems/millennium-prize-problems</ref>

=== Methodology ===
The methodology for determining computational complexity is built upon the notion of a Turing machine and quantifying the number of computational operations the machine must perform to resolve or verify a problem. A straightforward approach is to quantify the number of operations required considering every possible input to the Turing machine’s algorithm. This approach is referred to as worst-case complexity, as it gives the maximum number of operations that may have to be performed in order to solve the problem.

However, critics of worst-case complexity highlight that in practice the worst-case behaviors of algorithms may never actually be encountered, and the worst-case approach can be unnecessarily cumbersome. As an alternative, average-case analysis seeks to design efficient algorithms that apply to most real-life instances. An important component of average-case analysis is the concept of a probability distribution over the inputs. There are several approaches to choosing this distribution, including randomization. An average-case problem consists of both a decision problem and a distribution over its inputs, implying that the complexity of a decision problem can vary with the inputs.<ref>Arora, S., & Barak, B. (2009). Computational complexity: a modern approach. Cambridge: Cambridge University Press. Retrieved from https://cornell-library.skillport.com/skillportfe/main.action?path=summary/BOOKS/31235</ref>
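
The contrast between the two measures can be made concrete with a small counting experiment. The sketch below is an illustration only: linear search, the sample list, and the uniform distribution over targets are all assumptions chosen for simplicity, not part of the cited methodology.

```python
from fractions import Fraction


def comparisons_to_find(items, target):
    """Count the comparisons a linear search makes before it finds target."""
    count = 0
    for item in items:
        count += 1
        if item == target:
            break
    return count


items = [5, 9, 3, 2, 7, 1, 4]

# Worst case: the largest count over all possible targets in the list.
worst_case = max(comparisons_to_find(items, t) for t in items)

# Average case under an assumed uniform distribution over the targets.
average_case = Fraction(sum(comparisons_to_find(items, t) for t in items), len(items))

print(worst_case)    # 7
print(average_case)  # 4
```

Changing the assumed distribution (for example, making early elements more likely targets) changes the average but not the worst case, which is why an average-case problem has to specify the distribution along with the decision problem.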

== Numerical Example ==
The efficiency of a computation problem is measured by the operations executed to solve, not the seconds (or years) required to solve it. The number of operations executed is a function of input size and arrangement. The big O notation is used to determine an algorithm’s complexity class according to the number of operations it performs as a function of input.<ref>Mohr, A. Quantum Computing in Complexity Theory and Theory of Computation (PDF). Retrieved from http://www.austinmohr.com/Work_files/complexity.pdf</ref>

The notation O(n) is used where ‘O’ refers to the order of a function and ‘n’ represents the size of the input.<ref>A. Mejia, How you can change the world by learning Data Structures and Algorithms. Retrieved from: https://towardsdatascience.com/how-you-can-change-the-world-by-learning-data-structures-and-algorithms-84566c1829e3</ref>

An example of an O(1) problem includes determining whether one number is odd or even. The algorithm reads a bit of input and performs one operation to determine whether or not the number is odd or even. No matter how large or small the quantity of inputs the number of operations holds constant at 1; for that reason this is an O(1) problem.

An example of an O(n) problem is identifying the minimum value within an unsorted array. To solve this problem the computer must read every element of the input to determine whether or not it is less than the smallest element seen so far. For this reason, the number of operations grows linearly with the number of inputs. For example, finding the minimum of {5,9,3,2,7,1,4} requires the computer to check every element in the array. This array has n=7 inputs, so it requires 7 operations to read each element and determine whether it is less than the current minimum. This scales linearly as the size of the input increases.
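
The two examples can be sketched in Python as follows. The function names are hypothetical, and counting one "operation" per element examined is a simplifying assumption made to mirror the text.

```python
def is_odd(number):
    """O(1): inspect only the lowest-order bit, regardless of magnitude."""
    return (number & 1) == 1


def find_minimum(values):
    """O(n): examine every element once and count the elements examined."""
    minimum = values[0]
    examined = 1
    for value in values[1:]:
        examined += 1
        if value < minimum:
            minimum = value
    return minimum, examined


print(is_odd(7))                            # True
print(find_minimum([5, 9, 3, 2, 7, 1, 4]))  # (1, 7)
```

Doubling the length of the array doubles the count returned by <code>find_minimum</code>, while <code>is_odd</code> always performs the same single check.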
== Applications ==
Computational Complexity is influential to numerous scientific fields including [https://optimization.cbe.cornell.edu/index.php?title=Quantum_computing_for_optimization quantum computing], game theory, data mining, and cellular automata.<ref>Robert A. Meyers. (2012). Computational Complexity: Theory, Techniques, and Applications. Springer: Springer. Retrieved from https://search.ebscohost.com/login.aspx?custid=s9001366&groupid=main&profid=pfi&authtype=ip,guest&direct=true&db=edspub&AN=edp1880523&site=eds-live&scope=site</ref> Focusing in on quantum computing, there are important applications to the study of computational complexity as the theory of complexity is largely based upon the Turing machine and the Church-Turing thesis that any physically realizable computation device can be simulated by a Turing machine. If quantum computers are to be physically realizable they could alter our understanding of how complex a decision problem may be by providing enhanced methods in which algorithms may be computed and potentially lowering the number of operations to be performed.<ref> Arora, S., & Barak, B. (2009). Computational complexity: a modern approach. Cambridge: Cambridge University Press. Retrieved from https://cornell-library.skillport.com/skillportfe/main.action?path=summary/BOOKS/31235 </ref>

== Conclusion ==
Computational complexity has important implications in the field of computer science and far reaching applications that span numerous fields and industries. As computable problems become more complex the ability to increase the efficiency in which they are solved becomes more important. Advancements toward solving P vs NP will have far reaching impacts on how we approach the computability of problems and the ability to efficiently allocate resources.

== Sources ==
<references />

Simplex algorithm

2020-12-21T11:28:03Z

Wc593:

Author: Guoqing Hu (SysEn 6800 Fall 2020)

== Introduction ==
The simplex algorithm (or simplex method) is a widely used algorithm for solving Linear Programming (LP) optimization problems. The simplex algorithm can be thought of as one of the elementary steps for solving inequality problems, since many of those are converted to LP and solved via the simplex algorithm.<ref name=":0">[http://www-personal.umich.edu/~murty/books/linear_complementarity_webbook/lcp-complete.pdf Linear complementarity, linear and nonlinear programming Internet Edition].</ref> The simplex algorithm was proposed by [[Wikipedia: George_Dantzig|George Dantzig]], starting from the idea of a step-by-step descent toward one of the vertices of the convex polyhedron.<ref>Dantzig, G. B. (1987, May). [https://apps.dtic.mil/dtic/tr/fulltext/u2/a182708.pdf Origins of the simplex method].</ref> The name "simplex" possibly refers to the top vertex of the simplicial cone, which is the geometric illustration of the constraints within LP problems.<ref>Strang, G. (1987). Karmarkar’s algorithm and its place in applied mathematics. ''The Mathematical Intelligencer,'' ''9''(2), 4-10. doi:10.1007/bf03025891.</ref>

== Algorithmic Discussion ==
There are two theorems in LP:

# The feasible region for an LP problem is a convex set, because it is defined by linear constraints. Therefore, if an LP has an optimal solution, there must be an extreme point of the feasible region that is optimal.
# For an LP optimization problem, every basic feasible solution corresponds to exactly one extreme point of the LP's feasible region, and every extreme point of the feasible region corresponds to at least one basic feasible solution.<ref name=":1">Vanderbei, R. J. (2000). ''Linear programming: Foundations and extensions''. Boston: Kluwer.</ref>
[[File:Geometric Illustration of LP problem.png|thumb|Geometric Illustration of LP problem]]
Based on the two theorems above, the geometric illustration of the LP problem can be depicted. Each edge of this polyhedron is the boundary of one of the LP constraints, and every vertex is an extreme point according to the theorems. The simplex method adjusts the nonbasic variables to travel from vertex to vertex until the optimal solution is found.<ref>Sakarovitch M. (1983) Geometric Interpretation of the Simplex Method. In: Thomas J.B. (eds) Linear Programming. Springer Texts in Electrical Engineering. Springer, New York, NY. <nowiki>https://doi.org/10.1007/978-1-4757-4106-3_8</nowiki></ref>

Consider the following expression as the general linear programming problem standard form:

<math>\max \sum_{i=1}^n c_ix_i</math>

With the following constraints:

<math> \begin{align} s.t. \quad \sum_{j=1}^n a_{ij}x_j &\leq b_i \quad i = 1,2,...,m \\
x_j &\geq 0 \quad j = 1,2,...,n \end{align} </math>

The first step of the simplex method is to add slack variables and symbols which represent the objective functions:

<math> \begin{align} \phi &= \sum_{i=1}^n c_ix_i\\
z_i &= b_i - \sum_{j=1}^n a_{ij}x_j \quad i = 1,2,...,m \end{align} </math>

The new introduced slack variables may be confused with the original values. Therefore, it will be convenient to add those slack variables <math> z_i </math> to the end of the list of ''x''-variables with the following expression:

<math> \begin{align} \phi &= \sum_{i=1}^n c_ix_i\\
x_{n+i} &= b_i - \sum_{j=1}^n a_{ij}x_j \quad i=1,2,...,m \end{align} </math>

As the simplex method progresses, it moves from the starting dictionary (the equations above) through a sequence of dictionaries in search of the optimal value. Every dictionary has ''m'' basic variables, which appear on the left-hand side of the constraint equations, as well as ''n'' non-basic variables, in terms of which the objective function is expressed. In general, a dictionary is written in the form:

<math> \begin{align}
\phi &= \bar{\phi} + \sum_{j=1}^n \bar{c_j}x_j\\
x_{i} &= \bar{b_i} - \sum_{j=1}^n \bar{a_{ij}}x_j \quad i=1,2,...,n+m
\end{align} </math>

The bars indicate that the corresponding values change as the simplex method progresses. In each iteration, exactly one variable goes from non-basic to basic and exactly one goes from basic to non-basic. The variable that becomes basic is referred to as the ''entering variable''. Since the goal is to increase <math>\phi</math>, the entering variable is selected among the non-basic variables whose coefficient <math>\bar{c_j}</math> is positive. When no such variable remains, the optimal value has been found. Because there are usually several candidates (n>1), a rule is needed to decide which one to select first: a common choice is the variable with the largest coefficient <math>\bar{c_j}</math>. In tableau form, where the objective row stores <math>-\bar{c_j}</math>, this corresponds to selecting the entry with the least (most negative) value.

The ''leaving variables'' are defined as those that go from basic to non-basic. They exist to ensure the non-negativity of the basic variables. Once the entering variable <math>x_k</math> is determined, the basic variables change according to the equation below:

<math> x_i = \bar{b_i} - \bar{a_{ik}}x_k \quad i \, \in \, \{ 1,2,...,n+m \}</math>

Since the non-negativity of the basic variables must be preserved, the following inequality can be derived:

<math> \bar{b_i} - \bar{a_{ik}}x_k \geq 0 \quad i\,\in\, \{1,2,...,n+m \}</math>

Here <math>x_k</math> is the entering variable, whose value is increased from zero. Each basic variable <math>x_i</math> with <math>\bar{a_{ik}} > 0</math> decreases as <math>x_k</math> grows and cannot go below zero. Setting <math>x_i = 0</math> gives the largest value of <math>x_k</math> that row ''i'' allows:

<math> x_k = \frac {\bar{b_i}}{\bar{a_{ik}}} </math>

Because all variables must remain nonnegative, <math>x_k</math> can only be raised to the smallest of the values calculated from the above equation. Hence, the following equation can be derived:

<math> x_k = \min_{\bar{a_{ik}}>0} \, \frac{\bar{b_i}}{\bar{a_{ik}}} \quad i=1,2,...,n+m</math>

Once the leaving (basic) and entering (non-basic) variables are chosen, row operations are conducted to switch from the current dictionary to the new dictionary; this step is called a ''pivot.''<ref name=":1" />

In the pivot process, the coefficient of the selected pivot element should become one, so every element in the pivot row is multiplied by the reciprocal of this coefficient. Afterward, suitable multiples of this row are added to the other rows so that all other entries in the pivot element's column become 0.

If any negative entries remain in the objective row after the pivot process, one should continue finding the next pivot element by repeating the process above. Once there are no more negative entries in the objective row, the optimal solution is found.<ref>Evar D. Nering and Albert W. Tucker, 1993, ''Linear Programs and Related Problems'', Academic Press. (elementary)</ref><ref>Robert J. Vanderbei, ''Linear Programming: Foundations and Extensions'', 3rd ed., International Series in Operations Research & Management Science, Vol. 114, Springer Verlag, 2008. <nowiki>ISBN 978-0-387-74387-5</nowiki>.</ref>
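
The entering rule, ratio test, and pivot described above can be sketched as a small tableau routine. This is a minimal illustration rather than a production solver: the function name <code>simplex_max</code> is hypothetical, the code assumes every <math>b_i \geq 0</math> so that the slack variables form a feasible starting basis, it omits the constant ''z'' column, and it has no anti-cycling safeguard. Exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction


def simplex_max(c, A, b):
    """Maximize c.x subject to A x <= b, x >= 0, assuming every b_i >= 0.

    Returns (x, z). Raises ValueError if the problem is unbounded.
    """
    m, n = len(A), len(c)
    # Constraint rows are [A | I | b]; the last row is the objective [-c | 0 | 0].
    rows = [
        [Fraction(v) for v in A[i]]
        + [Fraction(int(i == k)) for k in range(m)]
        + [Fraction(b[i])]
        for i in range(m)
    ]
    rows.append([-Fraction(v) for v in c] + [Fraction(0)] * (m + 1))
    basis = [n + i for i in range(m)]  # the slack variables start as basic

    while True:
        obj = rows[-1]
        # Entering variable: most negative objective-row entry (first on ties).
        col = min(range(n + m), key=lambda j: obj[j])
        if obj[col] >= 0:
            break  # no negative entry left: the current solution is optimal
        # Leaving variable: minimum ratio b_i / a_ik over rows with a_ik > 0.
        candidates = [i for i in range(m) if rows[i][col] > 0]
        if not candidates:
            raise ValueError("problem is unbounded")
        row = min(candidates, key=lambda i: rows[i][-1] / rows[i][col])
        # Pivot: scale the pivot row to 1, then clear the column elsewhere.
        pivot = rows[row][col]
        rows[row] = [v / pivot for v in rows[row]]
        for i in range(m + 1):
            if i != row and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [v - factor * p for v, p in zip(rows[i], rows[row])]
        basis[row] = col

    x = [Fraction(0)] * n
    for i, var in enumerate(basis):
        if var < n:
            x[var] = rows[i][-1]
    return x, rows[-1][-1]


# Hypothetical small problem: max 3x + 2y s.t. x + y <= 4, x + 3y <= 6, x <= 3.
x, z = simplex_max([3, 2], [[1, 1], [1, 3], [1, 0]], [4, 6, 3])
print([str(v) for v in x], z)  # ['3', '1'] 11
```

Choosing the first column on ties and the first row among equal ratios is an arbitrary design choice here; other pivoting rules are possible and may visit different vertices.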

== Numerical Example ==
Consider the following numerical example to gain a better understanding:

<math> \max{4x_1+x_2+4x_3}</math>

with the following constraints:

<math> \begin{align}
2x_1 + x_2 + x_3 &\leq 2 \\
x_1 + 2x_2 +3x_3 &\leq 4\\
2x_1 + 2x_2 + x_3 &\leq 8 \\
x_1,x_2,x_3 &\geq 0
\end{align}</math>

Adding slack variables gives the following equations:

<math> \begin{align}
z - 4x_1 - x_2 -4x_3 &= 0 \\
2x_1 + x_2 + x_3 + s_1 &= 2 \\
x_1 + 2x_2 + 3x_3 + s_2 &= 4\\
2x_1 + 2x_2 + x_3 + s_3 &= 8 \\
x_1,x_2,x_3,s_1,s_2,s_3 &\geq 0 \end{align} </math>

The simplex tableau can be derived as follows:

<math>
\begin{array}{c c c c c c c | r}
x_1 & x_2 & x_3 & s_1 & s_2 & s_3 & z & b \\
\hline
2 & 1 & 1 & 1 & 0 & 0 & 0 & 2 \\
1 & 2 & 3 & 0 & 1 & 0 & 0 & 4 \\
2 & 2 & 1 & 0 & 0 & 1 & 0 & 8 \\
\hline
-4 & -1 & -4 & 0 & 0 & 0 & 1 & 0
\end{array} </math>

In the last row, the column with the smallest (most negative) value should be selected. Although two columns tie for the smallest value (-4), the final result is the same whichever one is selected first. For this solution, the first column is selected. After the entering column is found, the pivot row is determined by computing the ratio <math> \frac{b_i}{a_{i1}} </math> for each row with a positive entry in that column. Since the ratio is 1 for the first row and 4 for the second and third rows, the first row is the pivot row. Dividing the first row by its pivot element 2 gives the following tableau:

<math>
\begin{array}{c c c c c c c | r}
x_1 & x_2 & x_3 & s_1 & s_2 & s_3 & z & b \\
\hline
1 & 0.5 & 0.5 & 0.5 & 0 & 0 & 0 & 1 \\
1 & 2 & 3 & 0 & 1 & 0 & 0 & 4 \\
2 & 2 & 1 & 0 & 0 & 1 & 0 & 8 \\
\hline
-4 & -1 & -4 & 0 & 0 & 0 & 1 & 0
\end{array} </math>

Row operations are then performed until every other entry in column 1 (outside the first row) is zero:

<math>
\begin{array}{c c c c c c c | r}
x_1 & x_2 & x_3 & s_1 & s_2 & s_3 & z & b \\
\hline
1 & 0.5 & 0.5 & 0.5 & 0 & 0 & 0 & 1 \\
0 & 1.5 & 2.5 & -0.5 & 1 & 0 & 0 & 3 \\
0 & 1 & 0 & -1 & 0 & 1 & 0 & 6 \\
\hline
0 & 1 & -2 & 2 & 0 & 0 & 1 & 4
\end{array} </math>

Because there is one negative value in the last row, the same process should be performed again. The smallest value in the last row is in the third column. In the third column, the second row has the smallest ratio <math> \frac{b_i}{a_{i3}}</math>, which is 1.2 (the first row gives 2, and the third row is skipped because its entry in this column is 0). Thus, the second row is selected for pivoting. Dividing it by its pivot element 2.5 gives the following tableau:

<math>
\begin{array}{c c c c c c c | r}
x_1 & x_2 & x_3 & s_1 & s_2 & s_3 & z & b \\
\hline
1 & 0.5 & 0.5 & 0.5 & 0 & 0 & 0 & 1 \\
0 & 0.6 & 1 & -0.2 & 0.4 & 0 & 0 & 1.2 \\
0 & 1 & 0 & -1 & 0 & 1 & 0 & 6 \\
\hline
0 & 1 & -2 & 2 & 0 & 0 & 1 & 4
\end{array} </math>

By performing row operations to make the other entries in the third column 0's, the following tableau is obtained:

<math>
\begin{array}{c c c c c c c | r}
x_1 & x_2 & x_3 & s_1 & s_2 & s_3 & z & b \\
\hline
1 & 0.2 & 0 & 0.6 & -0.2 & 0 & 0 & 0.4 \\
0 & 0.6 & 1 & -0.2 & 0.4 & 0 & 0 & 1.2 \\
0 & 1 & 0 & -1 & 0 & 1 & 0 & 6 \\
\hline
0 & 2.2 & 0 & 1.6 & 0.8 & 0 & 1 & 6.4
\end{array} </math>

There is no need for further calculation since all values in the last row are non-negative. From the tableau above, <math>x_1</math>, <math> x_3</math> and <math>z</math> are basic variables, since each of their columns contains a single 1 with 0's elsewhere. Therefore, the optimal solution is <math>x_1 = 0.4</math>, <math>x_2 = 0</math>, <math>x_3 = 1.2</math>, achieving the maximum value <math>z = 6.4</math>.
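
The reported solution can be checked independently of the tableau arithmetic. The sketch below first confirms that the point satisfies every constraint and reproduces <math>z = 6.4</math>. It then applies an optimality check that goes beyond this page and should be read as an assumption of the sketch: the final objective-row entries under <math>s_1, s_2, s_3</math> (1.6, 0.8, 0) are treated as dual values, and weak duality is used to bound every feasible objective value from above.

```python
from fractions import Fraction as F

c = [4, 1, 4]
A = [[2, 1, 1], [1, 2, 3], [2, 2, 1]]
b = [2, 4, 8]

x = [F(2, 5), F(0), F(6, 5)]  # x1 = 0.4, x2 = 0, x3 = 1.2
y = [F(8, 5), F(4, 5), F(0)]  # objective-row entries under s1, s2, s3


def dot(u, v):
    """Inner product of two equally long sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Primal check: A x <= b and x >= 0.
lhs = [dot(row, x) for row in A]
primal_feasible = all(l <= bi for l, bi in zip(lhs, b)) and all(xj >= 0 for xj in x)
z = dot(c, x)

# Dual check: A^T y >= c and y >= 0 make b.y an upper bound on any feasible z.
columns = list(zip(*A))
dual_feasible = all(dot(col, y) >= cj for col, cj in zip(columns, c)) and all(
    yi >= 0 for yi in y
)
bound = dot(b, y)

print(primal_feasible, dual_feasible)
print(float(z), float(bound))            # 6.4 6.4
print([str(bi - l) for bi, l in zip(b, lhs)])  # slack left in each constraint
```

Because the objective value equals the upper bound, no feasible point can do better than 6.4 under the assumptions above. The slack values printed last show that the first two constraints are tight while the third has 6 units to spare.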

== Application ==
The simplex method can be used in many programming problems, since many of them are converted to LP (Linear Programming) problems and solved by the simplex method. Besides mathematical applications, many industrial planning tasks use this method to maximize profits or minimize the resources needed.

=== Mathematical Problem ===
The simplex method is commonly used in many programming problems. Because of the heavy computational load of non-linear problems, many non-linear programming (NLP) problems cannot be solved effectively. Consequently, many NLP approaches rely on an LP solver, namely the simplex method, to do some of the work in finding the solution (for instance, an upper or lower bound on the optimal value), and in many cases an NLP is wholly linearized to an LP and solved with the simplex method.<ref name=":0" /> Beyond solving problems directly, the simplex method can also be used to support LP results from other theorems, for instance [[wikipedia:Farkas'_lemma|Farkas' theorem]], in which the simplex method proves the suggested feasible solutions.[1] The simplex method has also inspired ways of solving other problems, for instance Quadratic Programming (QP).<ref>Wolfe, P. (1959). The simplex method for quadratic programming. ''Econometrica,'' ''27''(3), 382. doi:10.2307/1909468</ref> Some QP problems have linear constraints on the variables and can be solved by methods analogous to the simplex method.
=== Industrial Application ===
Industries in different fields use the simplex method to plan under constraints. Since the constraints, tradeoffs, and desired outcomes are often linearly related to the controllable variables, many practitioners develop models that are solved as LP problems via the simplex method, for instance in agricultural and economic problems.

Farmers usually need to rationally allocate their existing resources to obtain the maximum profit. The constraints arise from multiple sources, including policy restrictions, budget concerns, and farmland area. Farmers may use a simplex-method-based model to plan better, as those constraints are constant in many scenarios and the profits are usually linearly related to farm production, thereby forming an LP problem. There is an existing farm planning model that accepts inputs such as prices and farm production and returns the plan that maximizes profit for the given information.<ref>Hua, W. (1998). [https://shareok.org/bitstream/handle/11244/12005/Thesis-1998-H8735a.pdf?sequence=1 Application of the revised simplex method to the farm planning model].</ref>

Besides agricultural purposes, the simplex method can also be used by enterprises to increase profits. A rational sales strategy is indispensable to successful marketing. Among the many enterprises worldwide, the marketing strategy of an enamelware enterprise is selected here for illustration. After collecting data on the quality of the various products manufactured, the cost of each, and their popularity among customers, the company needs to determine which products are worth the investment and will continue to make a profit, and which are not. Since the cost and profit factors depend linearly on production, economists suggest an LP model that can be solved via the simplex method.<ref>Nikitenko, A. V. (1996). Economic analysis of the potential use of a simplex method in designing the sales strategy of an enamelware enterprise. ''Glass and Ceramics,'' ''53''(12), 367-369. doi:10.1007/bf01129674.</ref>

The professional fields above are only the tip of the iceberg of simplex method applications. Many other fields use this method, since LP problems are widely used and the simplex method plays a crucial role in solving them.

== Conclusion ==
The influence of the simplex method on programming is widely acknowledged; its inventor, George Dantzig, was awarded the National Medal of Science.<ref>Cottle, R., Johnson, E. and Wets, R. (2007). George B. Dantzig (1914–2005). ''Notices Amer. Math. Soc.'' 54, 344–362.</ref> Beyond its wide usage in mathematical models and industrial manufacturing, the simplex method also provides a new perspective on solving inequality problems. Its contribution to programming has supported advances in technology and the economy by making optimal planning under constraints possible. Nowadays, with the development of technology and economics, the simplex method is in some settings replaced by more advanced solvers that are faster and can handle larger numbers of constraints and variables, but this innovative method marks the creativity of its time and continues to offer inspiration for upcoming challenges.

== References ==
<references />

Duality

2020-12-21T11:27:37Z

Wc593:

Author: Claire Gauthier, Trent Melsheimer, Alexa Piper, Nicholas Chung, Michael Kulbacki (SysEn 6800 Fall 2020)

== Introduction ==
Every optimization problem may be viewed either from the primal or from the dual perspective; this is the principle of '''duality'''. Duality develops the relationships between one optimization problem and another related optimization problem. If the primal optimization problem is a maximization problem, the dual can be used to find upper bounds on its optimal value. If the primal problem is a minimization problem, the dual can be used to find lower bounds.

According to the American mathematical scientist George Dantzig, duality for linear optimization was conjectured by John von Neumann after Dantzig presented a linear optimization problem to him. Von Neumann determined that the two-person zero-sum matrix game (from game theory) was equivalent to linear programming. Proofs of the duality theory were published by the Canadian mathematician Albert W. Tucker and his group in 1948. <ref name=":0"> Duality (Optimization). (2020, July 12). ''In Wikipedia. ''https://en.wikipedia.org/wiki/Duality_(optimization)</ref>

== Theory, methodology, and/or algorithmic discussions ==

=== Definition ===

'''Primal'''<blockquote>Maximize <math>z=\textstyle \sum_{j=1}^n \displaystyle c_j x_j</math> </blockquote><blockquote>subject to:

</blockquote><blockquote><blockquote><math>\textstyle \sum_{j=1}^n \displaystyle a_{i,j} x_j\leq b_i \qquad (i=1, 2, ... ,m) </math></blockquote></blockquote><blockquote><blockquote><math>x_j\geq 0 \qquad (j=1, 2, ... ,n) </math></blockquote></blockquote><blockquote>

</blockquote>'''Dual'''<blockquote>
Minimize <math>v=\textstyle \sum_{i=1}^m \displaystyle b_i y_i</math>

subject to:<blockquote><math>\textstyle \sum_{i=1}^m \displaystyle y_ia_{i,j}\geq c_j \qquad (j=1, 2, ... , n) </math></blockquote><blockquote><math>y_i\geq 0 \qquad (i=1, 2, ... , m)</math></blockquote></blockquote>Between the primal and the dual, the parameters <math>c</math> and <math>b</math> switch places with each other. The objective coefficients (<math>c_j</math>) of the primal become the right-hand side (RHS) of the dual. The RHS of the primal (<math>b_i</math>) becomes the objective coefficients of the dual. The less-than-or-equal-to constraints in the primal become greater-than-or-equal-to constraints in the dual problem. <ref name=":1"> Ferguson, Thomas S. ''A Concise Introduction to Linear Programming.'' University of California Los Angeles. https://www.math.ucla.edu/~tom/LP.pdf </ref>

=== Constructing a Dual ===
<math>\begin{matrix} \max(c^Tx) \\ \ s.t. Ax\leq b \\ x \geq 0 \end{matrix}</math> <math> \quad \longrightarrow \quad</math><math>\begin{matrix} \min(b^Ty) \\ \ s.t. A^Ty\geq c \\ y \geq 0 \end{matrix}</math>

=== Duality Properties ===
The following duality properties hold if the primal problem is a maximization problem as considered above. This especially holds for weak duality.

==== Weak Duality ====

* Let <math>x=[x_1, ... , x_n] </math> be any feasible solution to the primal
* Let <math>y = [y_1, ... , y_m] </math>be any feasible solution to the dual
* <math>\therefore </math>(z value for x) <math>\leq </math>(v value for y)

The weak duality theorem says that the z value for x in the primal is always less than or equal to the v value of y in the dual.

The difference between (v value for y) and (z value for x) is called the optimal ''duality gap'', which is always nonnegative. <ref> Bradley, Hax, and Magnanti. (1977). ''Applied Mathematical Programming.'' Addison-Wesley. http://web.mit.edu/15.053/www/AMP-Chapter-04.pdf </ref>

==== Strong Duality Lemma ====

* Let <math>x=[x_1, ... , x_n] </math> be any feasible solution to the primal
* Let <math>y = [y_1, ... , y_m] </math>be any feasible solution to the dual
* If (z value for x) <math>= </math> (v value for y), then '''x''' is optimal for the primal and '''y''' is optimal for the dual

'''Graphical Explanation'''

Essentially, as you choose values of x or y that come closer to the optimal solution, the value of z for the primal, and v for the dual will converge towards the optimal solution. On a number line, the value of z which is being maximized will approach from the left side of the optimum value while the value of v which is being minimized will approach from the right-hand side.
[[File:Duality numberline .png|thumb| '''Figure 1: Graphical Representation of Duality''']]

* If the primal is unbounded, then the dual is infeasible
* If the dual is unbounded, then the primal is infeasible

==== Strong Duality Theorem ====
If the primal problem has an optimal solution <math>x^*</math>, then the dual problem has an optimal solution <math>y^*</math> such that

<math>\textstyle \sum_{j=1}^n \displaystyle c_j x_j^* = \textstyle \sum_{i=1}^m \displaystyle b_i y_i^*</math>

Dual problems and their solutions are used in connection with the following optimization topics:

'''Karush-Kuhn-Tucker (KKT) Variables'''

* The optimal solution to the dual problem is a vector of the KKT multipliers. Consider we have a convex optimization problem where <math>f(x), g_1(x),...,g_m(x) </math> are convex differentiable functions. Suppose the pair <math>(\bar{x},\bar{u}) </math> is a saddlepoint of the Lagrangian and that <math>\bar{x} </math> together with <math>\bar{u} </math> satisfy the KKT conditions. The optimal solutions of this optimization problem are then <math>\bar{x} </math> and <math>\bar{u} </math> with no duality gap. <ref> ''KKT Conditions and Duality.'' (2018, February 18). Dartmouth College. https://math.dartmouth.edu/~m126w18/pdf/part4.pdf </ref>
* To have strong duality as described above, you must meet the KKT conditions.

'''Dual Simplex Method'''

* Solving a Linear Programming problem by the Simplex Method gives you a solution of its dual as a by-product. This simplex algorithm tries to reduce the infeasibility of the dual problem. The dual simplex method can be thought of as a disguised simplex method working on the dual. The dual simplex method is when we maintain dual feasibility by imposing the condition that the objective function includes every variable with a nonpositive coefficient, and terminating when the primal feasibility conditions are satisfied. <ref name=":3"> Chvatal, Vasek. (1977). ''The Dual Simplex Method.'' W.H. Freeman and Co. http://cgm.cs.mcgill.ca/~avis/courses/567/notes/ch10.pdf </ref>

=== Duality Interpretation ===

* Duality can be leveraged in a multitude of interpretations. The following example includes an economic optimization problem that leverages duality:

'''Economic Interpretation Example'''

* A rancher is preparing for a flea market sale in which he intends to sell three types of clothing that are all comprised of wool from his sheep: peacoats, hats, and scarves. With locals socializing the high quality of his clothing, the rancher plans to sell out of all of his products each time he opens up a store at the flea market. The following shows the rancher's materials, time, and profits received for his peacoats, hats, and scarves, respectively.
{| class="wikitable"
|+
!Clothing
!Wool (ft^2)
!Sewing Material (in)
!Production Time (hrs)
!Profit ($)
|-
|Peacoat
|12
|80
|7
|175
|-
|Hat
|2
|40
|3
|25
|-
|Scarf
|3
|20
|1
|21
|}
* With limited materials and time for an upcoming flea market event in which the rancher will once again sell his products, the rancher needs to determine how he can make the best use of his time and materials to ultimately maximize his profits. The rancher is running lower than usual on wool supply; he only has 50 square feet of wool sheets available for his clothing this week. Furthermore, the rancher only has 460 inches of sewing materials left. Lastly, the rancher has a limited time of 25 hours to produce his clothing line.
* With the above information the rancher creates the following linear program:

'''maximize''' <math>z=175x_1+25x_2+21x_3</math>

'''subject to:'''

<math>12x_1+2x_2+3x_3\leq 50</math>

<math>80x_1+40x_2+20x_3\leq 460</math>

<math>7x_1+3x_2+1x_3\leq 25</math>

<math>x_1,x_2,x_3\geq 0</math>

* Before the rancher finds the optimal number of peacoats, hats, and scarves to produce, a local clothing store owner approaches him and asks if she can purchase his labor and materials for her store. Unsure of what is a fair purchase price to ask for these services, the clothing store owner decides to create a dual of the original primal:

'''minimize''' <math>v=50y_1+460y_2+25y_3</math>

'''subject to:'''

<math>12y_1+80y_2+7y_3\geq 175</math>

<math>2y_1+40y_2+3y_3\geq 25</math>

<math>3y_1+20y_2+1y_3\geq 21</math>

<math>y_1,y_2,y_3\geq 0</math>

* By leveraging the above dual, the clothing store owner is able to determine the asking price for the rancher's materials and labor. In the dual, the clothing store owner's objective is to minimize the total price paid, where <math>y_1</math> represents the price per square foot of wool, <math>y_2</math> represents the price per inch of sewing material, and <math>y_3</math> represents the price per hour of the rancher's labor.
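
The relationship between the two programs can be checked numerically. The sketch below is a minimal illustration of weak duality, using the primal data exactly as written in the linear program above. The trial points <code>x_trial</code> and <code>y_trial</code> are hypothetical, hand-picked feasible points and are not claimed to be optimal; the helper names are assumptions made for this example.

```python
# Data of the rancher's primal LP as written above:
# maximize c.x subject to A x <= b, x >= 0.
A = [[12, 2, 3],
     [80, 40, 20],
     [7, 3, 1]]
b = [50, 460, 25]
c = [175, 25, 21]


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def primal_feasible(x):
    """True if A x <= b and x >= 0."""
    return all(xi >= 0 for xi in x) and all(
        dot(row, x) <= bi for row, bi in zip(A, b))


def dual_feasible(y):
    """True if A^T y >= c and y >= 0."""
    columns = list(zip(*A))  # columns of A are the rows of A^T
    return all(yi >= 0 for yi in y) and all(
        dot(col, y) >= cj for col, cj in zip(columns, c))


# Hypothetical trial points chosen by hand (feasible, not claimed optimal).
x_trial = [3, 0, 4]
y_trial = [0, 0, 25]

profit = dot(c, x_trial)  # rancher's profit at x_trial
price = dot(b, y_trial)   # buyer's total payment at y_trial
```

For any primal-feasible production plan and any dual-feasible set of prices, the profit cannot exceed the price; here <code>profit</code> is 609 and <code>price</code> is 625.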

== Numerical Example ==

=== Construct the Dual for the following maximization problem: ===
'''maximize''' <math>z=6x_1+14x_2+13x_3</math>

'''subject to:'''

<math>\tfrac{1}{2}x_1+2x_2+x_3\leq 24</math>

<math>x_1+2x_2+4x_3\leq 60</math>

<math>3x_1+5x_3\leq 12</math>

<math>x_1,x_2,x_3\geq 0</math>

For the problem above, form the constraint coefficient matrix A. Each of its three rows holds the coefficients of one constraint, in order. The right-hand sides (24, 60, 12) and the objective coefficients (6, 14, 13) are kept separately.

<math>A =\begin{bmatrix} \tfrac{1}{2} & 2 & 1\\ 1 & 2 & 4 \\ 3 & 0 & 5 \end{bmatrix}</math>

Find the transpose of matrix A:

<math>A^T=\begin{bmatrix} \tfrac{1}{2} & 1 & 3 \\ 2 & 2 & 0 \\ 1 & 4 & 5 \end{bmatrix}</math>

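Each row of the transpose of matrix A gives the left-hand side of one dual constraint, and the primal objective coefficients (6, 14, 13) become the right-hand sides of those constraints. The primal right-hand sides (24, 60, 12) become the coefficients of the dual objective function. Note that the original maximization problem has three variables and three constraints, so the dual problem also has three variables and three constraints.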

'''minimize''' <math>v=24y_1+60y_2+12y_3</math>

'''subject to:'''

<math>\tfrac{1}{2}y_1+y_2+3y_3 \geq 6</math>

<math>2y_1+2y_2\geq 14</math>

<math>y_1+4y_2+5y_3\geq 13</math>

<math>y_1,y_2,y_3\geq 0</math>
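
The construction above can be sketched in a few lines of Python. This is a minimal illustration, assuming a primal of the form "maximize c·x subject to A x ≤ b with nonnegative variables"; the function name <code>build_dual</code> is an assumption made for this example.

```python
from fractions import Fraction


def build_dual(A, b, c):
    """Dual of: maximize c.x subject to A x <= b, x >= 0.

    Returns (A_T, dual_objective, dual_rhs) describing:
    minimize dual_objective.y subject to A_T y >= dual_rhs, y >= 0.
    """
    A_T = [list(col) for col in zip(*A)]  # transpose the constraint matrix
    return A_T, list(b), list(c)          # b and c swap roles


# Data of the numerical example above.
A = [[Fraction(1, 2), 2, 1],
     [1, 2, 4],
     [3, 0, 5]]
b = [24, 60, 12]
c = [6, 14, 13]

A_T, dual_obj, dual_rhs = build_dual(A, b, c)
```

The rows of <code>A_T</code> are the left-hand sides of the three dual constraints, <code>dual_obj</code> holds the coefficients of the dual objective, and <code>dual_rhs</code> holds the right-hand sides of the dual constraints.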

== Applications ==
Duality appears in many linear and nonlinear optimization models. In many of these applications, the dual can be solved when solving the primal is more difficult. If, for example, there are many more constraints than variables ''(m >> n)'', it may be easier to solve the dual. A few of these applications are described in more detail below.<ref name=":2"> R.J. Vanderbei. (2008). ''Linear Programming: Foundations and Extensions.'' Springer. http://link.springer.com/book/10.1007/978-0-387-74388-2. </ref>

'''Economics'''

* Duality can be used when determining the product mix that yields the highest profit. For instance, the primal could be to maximize profit, and by taking the dual the problem is reframed as minimizing cost. Reframing the problem in terms of raw material prices makes it possible to determine the price that the owner is willing to accept for the raw material. These dual variables are related to the values of the available resources and are often referred to as resource shadow prices.<ref> Alaouze, C.M. (1996). ''Shadow Prices in Linear Programming Problems.'' New South Wales - School of Economics. https://ideas.repec.org/p/fth/nesowa/96-18.html </ref>

'''Structural Design'''

* In a structural design model, the tensions on the beams are the primal variables, and the displacements of the nodes are the dual variables.<ref> Freund, Robert M. (2004, February 10). ''Applied Lagrange Duality for Constrained Optimization.'' Massachusetts Institute of Technology. https://ocw.mit.edu/courses/sloan-school-of-management/15-094j-systems-optimization-models-and-computation-sma-5223-spring-2004/lecture-notes/duality_article.pdf </ref>

'''Electrical Networks'''

* When modeling electrical networks, the current flows can be modeled as the primal variables, and the voltage differences are the dual variables.<ref> Freund, Robert M. (2004, March). ''Duality Theory of Constrained Optimization.'' Massachusetts Institute of Technology. https://ocw.mit.edu/courses/sloan-school-of-management/15-084j-nonlinear-programming-spring-2004/lecture-notes/lec18_duality_thy.pdf </ref>

'''Game Theory'''

* Duality theory is closely related to game theory. Game theory is an approach used to deal with multi-person decision problems. The game is the decision-making problem, and the players are the decision-makers. Each player chooses a strategy or an action to be taken, and each player receives a payoff once every player has selected a strategy. In a zero-sum game, which Von Neumann conjectured to be equivalent to linear programming, the gain of one player results in the loss of another. This general situation of a zero-sum game has similar characteristics to duality.<ref> Stolee, Derrick. (2013). ''Game Theory and Duality.'' University of Illinois at Urbana-Champaign. https://faculty.math.illinois.edu/~stolee/Teaching/13-482/gametheory.pdf </ref>

'''Support Vector Machines'''

* The Support Vector Machine (SVM) is a popular machine learning algorithm for classification. The concept of SVM can be broken down into three parts, the first two being Linear SVM and the last being Non-Linear SVM. SVM involves many other concepts, including hyperplanes, functional and geometric margins, and quadratic programming.<ref> Jana, Abhisek. (2020, April). ''Support Vector Machines for Beginners - Linear SVM.'' http://www.adeveloperdiary.com/data-science/machine-learning/support-vector-machines-for-beginners-linear-svm/ </ref> In relation to duality, the primal problem is helpful in solving Linear SVM, but it is not useful for solving Non-Linear SVM. This is where duality is needed: the dual problem is used to solve the Non-Linear SVM.<ref> Jana, Abhisek. (2020, April 5). ''Support Vector Machines for Beginners - Duality Problem.'' https://www.adeveloperdiary.com/data-science/machine-learning/support-vector-machines-for-beginners-duality-problem/. </ref>

== Conclusion ==
Since proofs of duality theory were published in 1948,<ref name=":0" /> duality has been an important technique for solving linear and nonlinear optimization problems. The theory establishes that the dual of a standard maximum problem is defined to be the standard minimum problem.<ref name=":1" /> It allows every feasible solution of one problem to give a bound on the optimal objective function value of the other.<ref name=":2" /> The technique can be applied to situations such as solving for economic constraints, resource allocation, game theory, and bounding optimization problems. By developing an understanding of the dual of a linear program, one can gain important insights into many algorithms and optimization problems.

== References ==

2020 Cornell Optimization Open Textbook Feedback

2020-12-21T11:12:01Z

Wc593: /* Network flow problem */

==[[Computational complexity]]==

* Numerical Example
*# Finding subsets of a set is NOT O(2n).
* Application
*# The applications mentioned need to be discussed further.

==[[Network flow problem]]==

* Numerical Example and Solution
*# There is NO need to include code. Simply mention how the problem was coded along with details on the LP solver used.

==[[Interior-point method for LP]]==

* Introduction
*# Please type “minimize” and “subject to” in formal optimization problem form throughout the whole page.
* A section to discuss and/or illustrate the applications
*# Please type optimization problem in the formal form.

==[[Optimization with absolute values]]==

* An introduction of the topic
*# Add a few sentences on how absolute values convert an optimization problem into a nonlinear optimization problem.
* Applications
*# Inline equations at the beginning of this section are not formatted properly. Please fix the notation for expected return throughout the section.

==[[Matrix game (LP for game theory)]]==

* Theory and Algorithmic Discussion
*# aij are not defined in this section.

==[[Quasi-Newton methods]]==

* Theory and Algorithm
*# Please ensure that a few spaces are kept between the equations and equation numbers.

==[[Eight step procedures]]==

* Numerical Example
*# Data for the example Knapsack problem (b,w) are missing.
*# How to arrive at optimal solutions is missing.

==[[Set covering problem]]==

* Numerical Example
*# Please leave some space between equation and equation number.

==[[Quadratic assignment problem]]==

* Theory, methodology, and/or algorithmic discussions
*# Discuss dynamic programming and cutting plane solution techniques briefly.

==[[Newsvendor problem]]==

* Formulation
*# A math programming formulation of the optimization problem with objective function and constraints is expected for the formulation. Please add any variant of the newsvendor problem along with some operational constraints.
*# A mathematical presentation of the solution technique is expected. Please consider any distribution for R and present a solution technique for that specific problem.

==[[Mixed-integer cuts]]==

* Applications
*# MILP and their solution techniques involving cuts are extremely versatile. Yet, only two sentences are added to describe their applications. Please discuss their applications, preferably real-world applications, in brief. Example Wikis provided on the website could be used as a reference to do so.

==[[Heuristic algorithms]]==

* Methodology
*# Greedy method to solve minimum spanning tree seems to be missing.

==[[Branch and cut]]==

* Methodology & Algorithm
*# Equation in most infeasible branching section is not properly formatted.
*# Step 2 appears abruptly in the algorithm and does not explain much. Please add more information regarding the same.
*# Step 5 contains latex code terms that are not properly formatted.

== [[Mixed-integer linear fractional programming (MILFP)]] ==

* Application and Modeling for Numerical Examples
*# Please check the index notation in Mass Balance Constraint

==[[Fuzzy programming]]==

* Applications
*# Applications of fuzzy programming are quite versatile. Please discuss a few of the mentioned applications briefly. The provided example Wikis can be used as a reference to write this section.

== [[Stochastic gradient descent]] ==
* Numerical Example
*# Amount of whitespace can be reduced by changing orientation of example dataset by converting it into a table containing 3 rows and 6 columns.

==[[RMSProp]]==

* Theory and Methodology
*# Please check grammar in this section.
* Applications and Discussion
*# The applications section does not contain any discussion on applications. Please mention a few applications of the widely used RMSprop and discuss them briefly.

* Reference
*# Many references listed here are not used in any of the text in the Wiki. Please link them appropriately.

2020 Cornell Optimization Open Textbook Feedback

2020-12-21T11:08:59Z

Wc593: /* Adam */

==[[Computational complexity]]==

* Numerical Example
*# Finding subsets of a set is NOT O(2n).
* Application
*# The applications mentioned need to be discussed further.

==[[Network flow problem]]==

* Real Life Applications
*# There is NO need to include code. Simply mention how the problem was coded along with details on the LP solver used.

==[[Interior-point method for LP]]==

* Introduction
*# Please type “minimize” and “subject to” in formal optimization problem form throughout the whole page.
* A section to discuss and/or illustrate the applications
*# Please type optimization problem in the formal form.

==[[Optimization with absolute values]]==

* An introduction of the topic
*# Add a few sentences on how absolute values convert an optimization problem into a nonlinear optimization problem.
* Applications
*# Inline equations at the beginning of this section are not formatted properly. Please fix the notation for expected return throughout the section.

==[[Matrix game (LP for game theory)]]==

* Theory and Algorithmic Discussion
*# aij are not defined in this section.

==[[Quasi-Newton methods]]==

* Theory and Algorithm
*# Please ensure that a few spaces are kept between the equations and equation numbers.

==[[Eight step procedures]]==

* Numerical Example
*# Data for the example Knapsack problem (b,w) are missing.
*# How to arrive at optimal solutions is missing.

==[[Set covering problem]]==

* Numerical Example
*# Please leave some space between equation and equation number.

==[[Quadratic assignment problem]]==

* Theory, methodology, and/or algorithmic discussions
*# Discuss dynamic programming and cutting plane solution techniques briefly.

==[[Newsvendor problem]]==

* Formulation
*# A math programming formulation of the optimization problem with objective function and constraints is expected for the formulation. Please add any variant of the newsvendor problem along with some operational constraints.
*# A mathematical presentation of the solution technique is expected. Please consider any distribution for R and present a solution technique for that specific problem.

==[[Mixed-integer cuts]]==

* Applications
*# MILP and their solution techniques involving cuts are extremely versatile. Yet, only two sentences are added to describe their applications. Please discuss their applications, preferably real-world applications, in brief. Example Wikis provided on the website could be used as a reference to do so.

==[[Heuristic algorithms]]==

* Methodology
*# Greedy method to solve minimum spanning tree seems to be missing.

==[[Branch and cut]]==

* Methodology & Algorithm
*# Equation in most infeasible branching section is not properly formatted.
*# Step 2 appears abruptly in the algorithm and does not explain much. Please add more information regarding the same.
*# Step 5 contains latex code terms that are not properly formatted.

== [[Mixed-integer linear fractional programming (MILFP)]] ==

* Application and Modeling for Numerical Examples
*# Please check the index notation in Mass Balance Constraint

==[[Fuzzy programming]]==

* Applications
*# Applications of fuzzy programming are quite versatile. Please discuss a few of the mentioned applications briefly. The provided example Wikis can be used as a reference to write this section.

== [[Stochastic gradient descent]] ==
* Numerical Example
*# Amount of whitespace can be reduced by changing orientation of example dataset by converting it into a table containing 3 rows and 6 columns.

==[[RMSProp]]==

* Theory and Methodology
*# Please check grammar in this section.
* Applications and Discussion
*# The applications section does not contain any discussion on applications. Please mention a few applications of the widely used RMSprop and discuss them briefly.

* Reference
*# Many references listed here are not used in any of the text in the Wiki. Please link them appropriately.

Adam

2020-12-21T11:08:31Z

Wc593:

Authors: Nicholas Kincaid (CHEME 6800 Fall 2020) 
Steward: Fengqi You

== Introduction ==
Adam<ref name="adam"> Kingma, Diederik P., and Jimmy Lei Ba. Adam: A Method for Stochastic Optimization. 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, 2015, pp. 1–15.</ref> is a variant of gradient descent that has become widely popular in the machine learning community. Presented in 2015, the Adam algorithm is often recommended as the default algorithm for training neural networks, as it has shown improved performance over other variants of gradient descent for a wide range of problems. Adam's name is derived from adaptive moment estimation because it uses estimates of the first and second moments of the gradient to perform updates, which can be seen as combining gradient descent with momentum (the first-order moment) and the [https://optimization.cbe.cornell.edu/index.php?title=RMSProp RMSProp] algorithm<ref>Tieleman, Tijmen, and Hinton, Geoffrey. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude, COURSERA: Neural Networks for Machine Learning, 2012.</ref> (the second-order moment).

== Background ==
=== Batch Gradient Descent ===
In standard batch gradient descent, the parameters, <math>\theta</math>, of the objective function <math>f(\theta)</math>, are updated based on the gradient of <math>f</math> with respect to
<math>\theta</math> for the entire training dataset, as

<math> g_t =\nabla_{\theta_{t-1}} f \big(\theta_{t-1} \big) </math> 
<math> \theta_t = \theta_{t-1} - \alpha g_t , </math> 

where <math>\alpha</math> is the learning rate, a hyper-parameter of the optimization algorithm, and <math>t</math> is the iteration number. Key challenges of the standard gradient descent method are its tendency to get stuck in local minima and/or saddle points of the objective function, and the difficulty of choosing a proper learning rate, <math>\alpha</math>: a poor choice can lead to poor convergence.<ref>Ruder, Sebastian. An Overview of Gradient Descent Optimization Algorithms, 2016, pp. 1–14, http://arxiv.org/abs/1609.04747.</ref>

=== Stochastic Gradient Descent ===
Another variant of gradient descent is [https://optimization.cbe.cornell.edu/index.php?title=Stochastic_gradient_descent stochastic gradient descent (SGD)], in which the gradient is computed and the parameters are updated as in the batch update above, but separately for each training sample in the training set.

=== Mini-Batch Gradient Descent ===
In between batch gradient descent and stochastic gradient descent, mini-batch gradient descent computes parameter updates from the gradient of a subset of the training set, where the size of the subset is often referred to as the batch size.
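
The three variants differ only in how many samples contribute to each gradient, which the following minimal sketch makes explicit through a <code>batch_size</code> argument. The function names and the one-parameter objective are hypothetical choices made for this illustration, and the sketch does not shuffle the data between passes, which practical implementations commonly do.

```python
def gradient_descent(grad_fn, theta, data, alpha, batch_size, epochs):
    """Plain gradient descent; batch_size selects the variant.

    batch_size == len(data) -> batch gradient descent
    batch_size == 1         -> stochastic gradient descent
    otherwise               -> mini-batch gradient descent
    """
    theta = list(theta)
    for _ in range(epochs):
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            g = grad_fn(theta, batch)
            # theta_t = theta_{t-1} - alpha * g_t
            theta = [t - alpha * gi for t, gi in zip(theta, g)]
    return theta


def mean_residual_grad(theta, batch):
    """Gradient of 0.5 * mean((theta[0] - y)^2) over the batch (hypothetical objective)."""
    return [sum(theta[0] - y for y in batch) / len(batch)]


data = [1.0, 2.0, 3.0, 4.0]
theta_batch = gradient_descent(mean_residual_grad, [0.0], data,
                               alpha=0.5, batch_size=len(data), epochs=50)
```

With the full data set in every batch, the iterate approaches the sample mean of 2.5; calling the same function with <code>batch_size=1</code> or <code>batch_size=2</code> gives the stochastic and mini-batch variants.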

== Adam Algorithm ==
The Adam algorithm first computes the gradient, <math>g_t</math>, of the objective function with respect to the parameters <math>\theta</math>, and then computes and stores the first- and second-order moments of the gradient, <math>m_t</math> and <math>v_t</math> respectively, as

<math> m_t = \beta_1 \cdot m_{t-1} + (1-\beta_1) \cdot g_t </math> 
<math> v_t = \beta_2 \cdot v_{t-1} + (1-\beta_2) \cdot g_t^2, </math> 

where <math>\beta_1</math> and <math>\beta_2</math> are hyper-parameters in <math>[0,1)</math> and <math>g_t^2</math> denotes the element-wise square of the gradient. These parameters can be seen as exponential decay rates of the estimated moments, as the previous value is successively multiplied by a value less than 1 in each iteration. The authors of the original paper suggest values <math>\beta_1 = 0.9</math> and <math>\beta_2 = 0.999</math>. In the current notation, the first iteration of the algorithm is at <math>t=1</math>, and both <math>m_0</math> and <math>v_0</math> are initialized to zero. Since both moments are initialized to zero, their estimates are biased towards zero at early time steps. To counter this, the authors proposed a corrected update to <math>m_t</math> and <math>v_t</math> as

<math> \hat{m}_t = m_t / (1-\beta_1 ^t) </math> 
<math> \hat{v}_t = v_t / (1-\beta_2 ^t). </math> 

Finally, the parameter update is computed as

<math> \theta_t = \theta_{t-1} - \alpha \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon), </math> 

where <math>\epsilon</math> is a small constant for stability. The authors recommend a value of <math>\epsilon=10^{-8}</math>.
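
The update equations above translate almost line by line into code. The following is a minimal sketch rather than a reference implementation: the function name <code>adam_step</code>, the list-based parameters, and the quadratic test objective are assumptions made for this illustration.

```python
import math


def adam_step(theta, g, m, v, t, alpha, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a list of parameters; t is the 1-based iteration number."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
    # Bias correction for the zero initialization of m and v.
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    theta = [th - alpha * mh / (math.sqrt(vh) + eps)
             for th, mh, vh in zip(theta, m_hat, v_hat)]
    return theta, m, v


# Hypothetical objective f(theta) = theta^2, whose gradient is 2 * theta.
theta, m, v = [1.0], [0.0], [0.0]
for t in range(1, 2001):
    g = [2.0 * theta[0]]
    theta, m, v = adam_step(theta, g, m, v, t, alpha=0.01)
```

The moments <code>m</code> and <code>v</code> must be carried from one call to the next, starting from zero, and <code>t</code> must start at 1 so that the bias-correction denominators are nonzero.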

== Numerical Example ==

[[File:Contour.png|thumb|Contour plot of the loss function showing the trajectory of Adam algorithm from the initial point]]

[[File:Model fit .png|thumb|Plot showing original data points and resulting model fit from the Adam algorithm]]

To illustrate how updates occur in the Adam algorithm, consider a linear, least-squares regression problem formulation. The table below shows a sample data-set of student exam grades and the number of hours spent studying for the exam. The goal of this example will be to generate a linear model to predict exam grades as a function of time spent studying.

{| class="wikitable"
|-
| Hours Studying || 9.0 || 4.9 || 1.6 || 1.9 || 7.9 || 2.0 || 11.5 || 3.9 || 1.1 || 1.6 || 5.1 || 8.2 || 7.3 || 10.4 || 11.2
|-
| Exam Grade || 88.0 || 72.3 || 66.5 || 65.1 || 79.5 || 60.8 || 94.3 || 66.7 || 65.4 || 63.8 || 68.4 || 82.5 || 75.9 || 87.8 || 85.2
|}

The hypothesized model function will be

<math>f_\theta(x) = \theta_0 + \theta_1 x.</math>

The cost function is defined as

<math> J({\theta}) = \frac{1}{2}\sum_{i=1}^n \big(f_\theta(x_i) - y_i \big)^2, </math>

where the <math>1/2</math> coefficient is used only to make the derivatives cleaner. The optimization problem can then be formulated as finding the values of <math>\theta</math> that minimize the squared residuals between <math>f_\theta(x)</math> and <math>y</math>:

<math> \mathrm{argmin}_{\theta} \quad \frac{1}{2}\sum_{i=1}^n \big(f_\theta(x_i) - y_i \big) ^2 </math>

For simplicity, parameters will be updated after every data point i.e. a batch size of 1. For a single data point the derivatives of the cost function with respect to <math>\theta_0</math> and <math>\theta_1</math> are

<math> \frac{\partial J(\theta)}{\partial \theta_0} = \big(f_\theta(x) - y \big) </math> 
<math> \frac{\partial J(\theta)}{\partial \theta_1} = \big(f_\theta(x) - y \big) x </math>

The initial values of <math>{\theta}</math> are set to [50, 1], the learning rate, <math>\alpha</math>, is set to 0.1, and the suggested values for <math>\beta_1</math>, <math>\beta_2</math>, and <math>\epsilon</math> are used. With the first data sample of <math> (x,y)=[8.98, 88.01]</math>, where <math>x</math> is rounded to 9 in the calculation, the computed gradients are

<math> \frac{\partial J(\theta)}{\partial \theta_0} = \big(50 + 1\cdot 9 - 88.01 \big) = -29.0 </math> 
<math> \frac{\partial J(\theta)}{\partial \theta_1} = \big(50 + 1\cdot 9 - 88.01 \big)\cdot 9.0 = -261 </math> 

With <math>m_0</math> and <math>v_0</math> being initialized to zero, the calculations of <math>m_1</math> and <math>v_1</math> are

<math> m_1 = 0.9 \cdot 0 + (1-0.9) \cdot \begin{bmatrix} -29\\ -261 \end{bmatrix} = \begin{bmatrix} -2.9\\ -26.1\end{bmatrix} </math> 
<math> v_1 = 0.999\cdot 0 + (1-0.999) \cdot \begin{bmatrix} (-29)^2\\(-261)^2 \end{bmatrix} = \begin{bmatrix} 0.84\\ 68.2\end{bmatrix}. </math> 

The bias-corrected terms are computed as

<math> \hat{m}_1 = \begin{bmatrix} -2.9\\ -26.1\end{bmatrix} \frac{1}{ (1-0.9^1)} = \begin{bmatrix} -29.0\\-261.1\end{bmatrix}</math> 
<math> \hat{v}_1 = \begin{bmatrix} 0.84\\ 68.2\end{bmatrix} \frac{1} {(1-0.999^1)} = \begin{bmatrix} 841.6\\68168\end{bmatrix}. </math> 

Finally, the parameter update is

<math> \theta_0 = 50 - 0.1 \cdot (-29) / (\sqrt{841.6} + 10^{-8}) = 50.1 </math> 
<math> \theta_1 = 1 - 0.1 \cdot (-261) / (\sqrt{68168} + 10^{-8}) = 1.1 </math> 

This procedure is repeated until the parameters have converged, giving <math>\theta</math> values of <math>[58.98, 2.72]</math>. The figures to the right show the trajectory of the Adam algorithm over a contour plot of the objective function and the resulting model fit. It should be noted that the stochastic gradient descent algorithm with a learning rate of 0.1 diverges and with a rate of 0.01, SGD oscillates around the global minimum due to the large magnitudes of the gradient in the <math>\theta_1</math> direction.
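
The first update of the worked example can be reproduced with a short script. This is a sketch under the assumption that the hand calculation uses <math>x = 9.0</math> and <math>y = 88.01</math> for the first sample; the variable names are chosen for this illustration only.

```python
import math

alpha, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8
theta = [50.0, 1.0]
x, y = 9.0, 88.01  # first sample; x rounded as in the hand calculation (assumption)

# Gradient of the single-sample cost with respect to theta_0 and theta_1.
residual = theta[0] + theta[1] * x - y
g = [residual, residual * x]

# First and second moments, starting from m_0 = v_0 = 0.
m = [(1 - beta1) * gi for gi in g]
v = [(1 - beta2) * gi ** 2 for gi in g]

# Bias correction at t = 1.
m_hat = [mi / (1 - beta1 ** 1) for mi in m]
v_hat = [vi / (1 - beta2 ** 1) for vi in v]

theta_new = [th - alpha * mh / (math.sqrt(vh) + eps)
             for th, mh, vh in zip(theta, m_hat, v_hat)]
```

At the first iteration the bias-corrected ratio <math>\hat{m}_1 / \sqrt{\hat{v}_1}</math> has magnitude close to 1 in every coordinate, so each parameter moves by approximately <math>\alpha = 0.1</math> regardless of the size of its gradient, which is why both parameters change by 0.1 here.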

== Applications ==
[[File:Adam training.png|thumb|Comparison of training a multilayer neural network on MNIST images for different gradient descent algorithms published in the original Adam paper (Kingma, 2015)<ref name="adam" />.]]

The Adam optimization algorithm has been widely used in machine learning applications to train model parameters. When used with backpropagation, the Adam algorithm has been shown to be a very robust and efficient method for training artificial neural networks and is capable of working well with a variety of structures and applications. In their original paper, the authors present three different training examples: logistic regression, multi-layer neural networks for classification of MNIST images, and a convolutional neural network (CNN). The training results from the original Adam paper, showing the objective function cost versus the iteration over the entire data set for the multi-layer neural network, are shown to the right.

== Variants of Adam ==
=== AdaMax ===
AdaMax<ref name="adam" /> is a variant of the Adam algorithm proposed in the original Adam paper that uses an exponentially weighted infinity norm instead of the second-order moment estimate. The weighted infinity norm update, <math>u_t</math>, is computed as

<math> u_t = \max(\beta_2 \cdot u_{t-1}, |g_t|). </math>

The parameter update then becomes

<math> \theta_t = \theta_{t-1} - (\alpha / (1-\beta_1^t)) \cdot m_t / u_t. </math>
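
A minimal sketch of the AdaMax update follows, mirroring the two equations above. The function name <code>adamax_step</code> is an assumption made for this illustration, and the sketch assumes that no gradient entry is zero at the first step, since the update as written has no <math>\epsilon</math> term to guard the division by <math>u_t</math>.

```python
def adamax_step(theta, g, m, u, t, alpha, beta1=0.9, beta2=0.999):
    """One AdaMax update for a list of parameters; t is the 1-based iteration number."""
    # First moment, as in Adam.
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    # Exponentially weighted infinity norm of the gradient.
    u = [max(beta2 * ui, abs(gi)) for ui, gi in zip(u, g)]
    # Only the first moment needs bias correction.
    step = alpha / (1 - beta1 ** t)
    theta = [th - step * mi / ui for th, mi, ui in zip(theta, m, u)]
    return theta, m, u


# Hypothetical single step from theta = 1 with gradient 2.
theta1, m1, u1 = adamax_step([1.0], [2.0], [0.0], [0.0], t=1, alpha=0.01)
```

As with Adam, <code>m</code> and <code>u</code> start at zero and are carried from one call to the next.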

=== Nadam ===
The Nadam algorithm<ref>Dozat, Timothy. Incorporating Nesterov Momentum into Adam. ICLR Workshop, no. 1, 2016, pp. 2013–16. </ref> was proposed in 2016 and incorporates the Nesterov Accelerated Gradient (NAG),<ref>Nesterov, Yuri. A method of solving a convex programming problem with convergence rate O(1/k^2). In Soviet Mathematics Doklady, 1983, pp. 372-376.</ref> a popular momentum-like SGD variation, into the first-order moment term.

== Conclusion ==
Adam is a variant of the gradient descent algorithm that has been widely adopted in the machine learning community. Adam can be seen as the combination of two other variants of gradient descent, SGD with momentum and RMSProp. Adam uses estimates of the first- and second-order moments of the gradient to adapt the parameter update. These moment estimates are computed via moving averages, <math>m_t</math> and <math>v_t</math>, of the gradient and the squared gradient, respectively. In a variety of neural network training applications, Adam has shown improved convergence and robustness over other gradient descent algorithms and is often recommended as the default optimizer for training.<ref> "Neural Networks Part 3: Learning and Evaluation," CS231n: Convolutional Neural Networks for Visual Recognition, Stanford University, 2020</ref>

== References ==
<references/>

2020 Cornell Optimization Open Textbook Feedback

2020-12-21T11:06:21Z

Wc593: /* RMSProp */

==[[Computational complexity]]==

* Numerical Example
*# Finding subsets of a set is NOT O(2n).
* Application
*# The applications mentioned need to be discussed further.

==[[Network flow problem]]==

* Real Life Applications
*# There is NO need to include code. Simply mention how the problem was coded along with details on the LP solver used.

==[[Interior-point method for LP]]==

* Introduction
*# Please type “minimize” and “subject to” in formal optimization problem form throughout the whole page.
* A section to discuss and/or illustrate the applications
*# Please type optimization problem in the formal form.

==[[Optimization with absolute values]]==

* An introduction of the topic
*# Add a few sentences on how absolute values convert an optimization problem into a nonlinear optimization problem.
* Applications
*# Inline equations at the beginning of this section are not formatted properly. Please fix the notation for expected return throughout the section.

==[[Matrix game (LP for game theory)]]==

* Theory and Algorithmic Discussion
*# aij are not defined in this section.

==[[Quasi-Newton methods]]==

* Theory and Algorithm
*# Please ensure that a few spaces are kept between the equations and equation numbers.

==[[Eight step procedures]]==

* Numerical Example
*# Data for the example Knapsack problem (b,w) are missing.
*# How to arrive at optimal solutions is missing.

==[[Set covering problem]]==

* Numerical Example
*# Please leave some space between equation and equation number.

==[[Quadratic assignment problem]]==

* Theory, methodology, and/or algorithmic discussions
*# Discuss dynamic programming and cutting plane solution techniques briefly.

==[[Newsvendor problem]]==

* Formulation
*# A math programming formulation of the optimization problem with objective function and constraints is expected for the formulation. Please add any variant of the newsvendor problem along with some operational constraints.
*# A mathematical presentation of the solution technique is expected. Please consider any distribution for R and present a solution technique for that specific problem.

==[[Mixed-integer cuts]]==

* Applications
*# MILP and their solution techniques involving cuts are extremely versatile. Yet, only two sentences are added to describe their applications. Please discuss their applications, preferably real-world applications, in brief. Example Wikis provided on the website could be used as a reference to do so.

==[[Heuristic algorithms]]==

* Methodology
*# Greedy method to solve minimum spanning tree seems to be missing.

==[[Branch and cut]]==

* Methodology & Algorithm
*# Equation in most infeasible branching section is not properly formatted.
*# Step 2 appears abruptly in the algorithm and does not explain much. Please add more information regarding the same.
*# Step 5 contains latex code terms that are not properly formatted.

== [[Mixed-integer linear fractional programming (MILFP)]] ==

* Application and Modeling for Numerical Examples
*# Please check the index notation in Mass Balance Constraint

==[[Fuzzy programming]]==

* Applications
*# Applications of fuzzy programming are quite versatile. Please discuss a few of the mentioned applications briefly. The provided example Wikis can be used as a reference to write this section.

== [[Stochastic gradient descent]] ==
* Numerical Example
*# Amount of whitespace can be reduced by changing orientation of example dataset by converting it into a table containing 3 rows and 6 columns.

==[[RMSProp]]==

* Theory and Methodology
*# Please check grammar in this section.
* Applications and Discussion
*# The applications section does not contain any discussion on applications. Please mention a few applications of the widely used RMSprop and discuss them briefly.

* Reference
*# Many references listed here are not used in any of the text in the Wiki. Please link them appropriately.

==[[Adam]]==

* Background
*# References at the end of the sentence should be placed after the period.

RMSProp

2020-12-21T11:05:24Z

Wc593:

Author: Jason Huang (SysEn 6800 Fall 2020)

== Introduction ==
RMSProp, root mean square propagation, is an optimization algorithm designed for Artificial Neural Network (ANN) training. It is an unpublished algorithm, first proposed in lecture six of the Coursera course [https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf “Neural Network for Machine Learning”] by Geoff Hinton.[9] RMSProp belongs to the family of adaptive learning rate methods, which have been growing in popularity in recent years. It extends the Stochastic Gradient Descent (SGD) algorithm and the momentum method, and it is the foundation of the Adam algorithm. One application of RMSProp is as a stochastic technique for mini-batch gradient descent.

==Theory and Methodology==

=== Perceptron and Neural Networks ===
The perceptron is an algorithm used for supervised learning of binary classifiers. It can also be regarded as a simplified, single-layer version of the Artificial Neural Network (ANN), which makes it a useful starting point for understanding neural networks. A neural network is intended to imitate the function of the human brain in Artificial Intelligence (AI), and the perceptron represents the behavior of a single small unit of that system. The basic form of the perceptron consists of inputs, weights, a bias, a net sum, and an activation function.
[[File:Screen Shot 2020-12-14 at 01.09.28.png|thumb|Basic form of the perceptron]]

The perceptron starts from the input values <math>x_{1},x_{2} </math> and multiplies them by their weights <math>w_{1}, w_{2} </math>. The products are added together, along with the bias, to form the weighted sum <math> \sum_i w_{i} x_{i} </math>. The weighted sum is then passed to the activation function <math>f </math> to produce the perceptron's output.
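
The forward pass described above can be sketched as follows. The step activation and the specific weights and bias are hypothetical choices made for this illustration; they make the perceptron behave like a logical AND of two binary inputs.

```python
def perceptron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus the bias, passed through the activation function."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(net)


def step(net):
    """Hypothetical activation: 1 if the net sum is non-negative, else 0."""
    return 1 if net >= 0 else 0


# Hypothetical weights and bias; evaluated on all four binary input pairs.
outputs = [perceptron([x1, x2], [1.0, 1.0], -1.5, step)
           for x1 in (0, 1) for x2 in (0, 1)]
```

Only the input pair (1, 1) produces a non-negative net sum, so <code>outputs</code> is <code>[0, 0, 0, 1]</code>.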

A neural network works similarly to the human brain’s neural network. A “neuron” in a neural network is a mathematical function that collects and classifies information according to a specific architecture. A neural network contains layers of interconnected nodes, each of which can be regarded as a perceptron and is similar to a multiple linear regression. The perceptron feeds the signal produced by a multiple linear regression into an activation function, which may be nonlinear.

=== '''RProp''' ===
RProp, or Resilient Backpropagation, is a widely used algorithm for supervised learning with multi-layered feed-forward networks. The basic concept of the backpropagation learning algorithm is the repeated application of the chain rule to compute the influence of each weight in the network with respect to an arbitrary error function. The derivative of the error function can be written as:

<math> \frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial s_{i}} \frac{\partial s_{i}}{\partial net_{i}} \frac{\partial net_{i}}{\partial w_{ij}}</math>

where <math>w_{ij}</math> is the weight from neuron <math>j</math> to neuron <math>i</math>, <math>s_{i}</math> is the output, and <math>net_{i}</math> is the weighted sum of the inputs of neuron <math>i</math>. Once the partial derivative for each weight is known, the error function can be minimized by performing a simple gradient descent:

<math>w_{ij}(t+1) = w_{ij}(t) - \epsilon \frac{\partial E}{\partial w_{ij}}(t)</math>

The choice of the learning rate <math>\epsilon</math>, which scales the derivative, has an important effect on the time needed until convergence is reached. If it is set too small, too many steps are needed to reach an acceptable solution. Conversely, a large learning rate may lead to oscillation, preventing the error from falling below a certain value.[7]
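
The effect of the learning rate can be illustrated with a short Python sketch. The one-dimensional error function <math>E(w) = w^2</math>, the starting weight, and the three learning rates below are assumptions chosen only for illustration.

```python
def gradient_descent(grad, w0, lr, steps):
    """Repeatedly apply w <- w - lr * grad(w) and return the final weight."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w


def grad(w):
    return 2.0 * w  # derivative of the illustrative error E(w) = w^2


slow = gradient_descent(grad, w0=1.0, lr=0.01, steps=20)      # still far from the minimum at 0
good = gradient_descent(grad, w0=1.0, lr=0.4, steps=20)       # very close to 0
unstable = gradient_descent(grad, w0=1.0, lr=1.1, steps=20)   # oscillates with growing amplitude
print(slow, good, unstable)
```

With the smallest rate the weight has barely moved after 20 steps, while the largest rate makes each step overshoot the minimum so that the error grows instead of shrinking.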

To mitigate this problem and to accelerate convergence, the gradient descent update can be combined with the momentum method. Writing the weight update as <math>\Delta w_{ij}(t) = w_{ij}(t+1) - w_{ij}(t)</math>, the update can be rewritten as:

<math> \Delta w_{ij}(t) = -\epsilon \frac{\partial E}{\partial w_{ij}}(t) + \mu \Delta w_{ij}(t-1) </math>

However, it turns out that the optimal value of the momentum parameter <math>\mu</math> in the above equation is as problem dependent as the learning rate <math>\epsilon</math>, so no general improvement can be accomplished. In addition, the RProp algorithm does not work well when the dataset is very large and mini-batch weight updates are needed. RMSProp was therefore proposed as an algorithm that covers more scenarios than RProp.

=== '''RMSProp''' ===
RProp does not work with mini-batches because it violates the central idea behind stochastic gradient descent: with a small enough learning rate, the gradients are averaged over successive mini-batches. For example, consider a weight that receives a gradient of 0.1 on nine mini-batches and a gradient of -0.9 on the tenth mini-batch. We want those gradients to roughly cancel each other out so that the weight stays approximately the same, and RMSProp is designed to behave this way. RProp, which uses only the sign of the gradient, would instead move the weight nine times in one direction and only once, by about the same amount, in the other.
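
The mini-batch example above can be checked numerically. The sketch below is only an illustration: the fixed step size of 0.01 and the sign-only rule are a simplification of RProp (which also adapts its step size), introduced here as assumptions.

```python
grads = [0.1] * 9 + [-0.9]  # gradients seen by one weight on ten mini-batches

# Averaging behaviour of SGD with a small learning rate: the gradients cancel.
net_gradient = sum(grads)


def sign(g):
    return (g > 0) - (g < 0)


# Sign-only updates with a fixed step size (a simplification of RProp):
# nine steps in one direction and a single step back.
step = 0.01  # hypothetical fixed step size
sgd_change = -step * net_gradient
sign_change = -step * sum(sign(g) for g in grads)
print(net_gradient, sgd_change, sign_change)
```

The summed gradient is zero up to floating-point rounding, so the averaged update leaves the weight where it was, whereas the sign-only rule moves it by eight net steps.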

RMSProp combines the idea of using only the sign of the gradient from the RProp algorithm with the efficiency of mini-batches and with averaging over mini-batches, which allows gradients to be combined in the right way. RMSProp keeps a moving average of the squared gradients for each weight, and the gradient is then divided by the square root of that mean square.

The update equations can be written as:

<math>E[g^2](t) = \beta E[g^2](t-1) + (1- \beta) \left(\frac{\partial c}{\partial w_{ij}}\right)^2</math>

<math>w_{ij}(t) = w_{ij}(t-1) - \frac{ \eta }{ \sqrt{E[g^2](t)}} \frac{\partial c}{\partial w_{ij}} </math>

where <math>E[g^2] </math> is the moving average of squared gradients, <math> \partial c / \partial w </math> is the gradient of the cost function with respect to the weight, <math>\eta </math> is the learning rate, and <math>\beta</math> is the moving average parameter. The default value of <math>\beta</math> is 0.9, which is consistent with the example above in which a gradient of 0.1 on nine mini-batches and -0.9 on the tenth should sum to approximately zero, and the default value of <math>\eta </math> is 0.001, chosen from experience.
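
A minimal Python sketch of one RMSProp update for a single weight is given below. The function name <code>rmsprop_step</code>, the illustrative cost <math>c(w) = w^2</math>, and the small constant <code>eps</code> are assumptions: <code>eps</code> does not appear in the equations above and is added here only to avoid division by zero.

```python
import math


def rmsprop_step(w, grad, avg_sq, lr=0.001, beta=0.9, eps=1e-8):
    """One RMSProp update for a single weight.

    avg_sq is the running average E[g^2]; eps is an assumed small constant
    (not in the equations above) that avoids division by zero.
    """
    avg_sq = beta * avg_sq + (1.0 - beta) * grad ** 2
    w = w - lr / (math.sqrt(avg_sq) + eps) * grad
    return w, avg_sq


w, avg_sq = 1.0, 0.0
for _ in range(3):
    grad = 2.0 * w  # gradient of the illustrative cost c(w) = w^2
    w, avg_sq = rmsprop_step(w, grad, avg_sq)
print(w, avg_sq)
```

Because the gradient is divided by the root of its own running mean square, the size of each step depends mainly on the learning rate rather than on the raw magnitude of the gradient.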

==Numerical Example==
Consider the simple unconstrained optimization problem <math>\min f(x) = 0.1x_{1}^2 +2x_{2}^2 </math>.

Set <math>\beta</math> = 0.9 and <math>\eta </math> = 0.4, and write the problem in the standard RMSProp form, where the weights <math>w_{1}, w_{2}</math> correspond to the variables <math>x_{1}, x_{2}</math>. The equations are presented below:
[[File:Trajectory.png|alt=the visualization of the trajectory of RMSProp algorithm|thumb|The visualization of the trajectory of RMSProp algorithm]]
<math>\frac{\partial c_{1}}{\partial w_{1}} = 0.2x_{1}, \quad \frac{\partial c_{2}}{\partial w_{2}} = 4x_{2}</math>

<math>E_{1}(t) = 0.9 E_{1}(t-1) + (1 - 0.9)(\frac{\partial c_{1}}{\partial w_{1}})^2</math>

<math>E_{2}(t) = 0.9 E_{2}(t-1) + (1 - 0.9)(\frac{\partial c_{2}}{\partial w_{2}})^2</math>

<math>w_{1}(t) = w_{1}(t-1) - \frac{0.4}{ \sqrt{E_{1}}} \frac{\partial c_{1}}{\partial w_{1}}</math>

<math>w_{2}(t) = w_{2}(t-1) - \frac{0.4}{ \sqrt{E_{2}}} \frac{\partial c_{2}}{\partial w_{2}}</math>

Using a programming language to solve the optimization problem and visualize the trajectory of the RMSProp algorithm, we can observe that the trajectory converges toward a single point. For this particular problem, the minimum value <math>0 </math> is obtained at <math>(x_{1}, x_{2}) </math> = <math>(0, 0) </math>. [[File:1 - 2dKCQHh - Long Valley.gif|thumb|Visualizing Optimization algorithm comparing convergence with similar algorithm[1]]]
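
The iteration for this example can be reproduced with the short Python sketch below. The starting point <math>(-5, -2)</math>, the number of steps, and the small constant <code>eps</code> inside the square root are assumptions that are not given in the text above; they are chosen only so that the sketch runs.

```python
import math


def f(x1, x2):
    return 0.1 * x1 ** 2 + 2.0 * x2 ** 2


def rmsprop_2d(x1, x2, steps=20, eta=0.4, beta=0.9, eps=1e-6):
    """Run RMSProp on f and return the list of visited points."""
    e1 = e2 = 0.0                      # moving averages E_1 and E_2
    path = [(x1, x2)]
    for _ in range(steps):
        g1, g2 = 0.2 * x1, 4.0 * x2    # partial derivatives of f
        e1 = beta * e1 + (1.0 - beta) * g1 ** 2
        e2 = beta * e2 + (1.0 - beta) * g2 ** 2
        x1 -= eta / math.sqrt(e1 + eps) * g1
        x2 -= eta / math.sqrt(e2 + eps) * g2
        path.append((x1, x2))
    return path


path = rmsprop_2d(-5.0, -2.0)          # assumed starting point
print(path[-1], f(*path[-1]))
```

After 20 steps from this starting point both coordinates are close to zero, in line with the minimizer <math>(0, 0)</math>.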

== Applications and Discussion ==
[[File:2 - pD0hWu5 - Beale's function.gif|thumb|Visualizing Optimization algorithm comparing convergence with similar algorithm[1]]]
RMSProp is mainly applied to the optimization of complex functions such as neural networks and to non-convex optimization problems that benefit from an adaptive learning rate, and it is widely used for stochastic problems. The RMSProp optimizer restricts oscillations in the vertical direction. As a result, the learning rate can be increased and the algorithm can take larger steps in the horizontal direction, converging faster than the similar approach of gradient descent combined with the momentum method.

In the first visualization, the gradient-based optimization algorithms show different convergence rates. As the visualizations show, algorithms that do not scale their steps based on gradient information have difficulty breaking the symmetry and converging rapidly. RMSProp converges faster than SGD, Momentum, and NAG, beginning its descent sooner, but it is slower than AdaGrad and AdaDelta, which are also adaptive learning rate algorithms. In conclusion, for large-scale problems or problems with large gradients, methods that scale the gradients or step sizes, such as AdaDelta, AdaGrad, and RMSProp, perform better and with high stability.

AdaGrad is an adaptive learning rate algorithm that looks a lot like RMSProp. AdaGrad adds element-wise scaling of the gradient based on the historical sum of squares in each dimension. This means that we keep a running sum of squared gradients and then adapt the learning rate by dividing it by the square root of that sum. Because the concepts behind RMSProp are widely used in other machine learning algorithms, it has high potential to be coupled with other methods, such as momentum.
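
The difference between the two accumulators can be sketched in Python as follows. The constant gradient sequence and the function names are assumptions used only to contrast a running sum (AdaGrad) with an exponential moving average (RMSProp).

```python
def adagrad_accumulator(grads):
    """Running sum of squared gradients, as used by AdaGrad."""
    total, history = 0.0, []
    for g in grads:
        total += g ** 2
        history.append(total)
    return history


def rmsprop_accumulator(grads, beta=0.9):
    """Exponential moving average of squared gradients, as used by RMSProp."""
    avg, history = 0.0, []
    for g in grads:
        avg = beta * avg + (1.0 - beta) * g ** 2
        history.append(avg)
    return history


grads = [1.0] * 50                       # hypothetical constant gradient
ada = adagrad_accumulator(grads)
rms = rmsprop_accumulator(grads)
print(ada[-1], rms[-1])
```

The running sum keeps growing, so the effective AdaGrad step keeps shrinking, while the moving average levels off near the squared gradient and the RMSProp step stays roughly constant.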

== Conclusion==
RMSProp (root mean square propagation) is an optimization algorithm for training artificial neural networks (ANNs) with an adaptive learning rate, derived from the concepts of gradient descent and RProp. By combining averaging over mini-batches with the efficiency of mini-batch updates, RMSProp can reach a faster convergence rate than the original optimizers, although a lower one than more advanced optimizers such as Adam. Given the high performance of RMSProp and the possibility of combining it with other algorithms, harder problems could be better described and solved in the future.

==Reference==

1. A. Radford, "[https://imgur.com/a/Hqolp#NKsFHJb Visualizing Optimization Algos (open source)".]

2. R. Yamashita, M Nishio and R KGian, "Convolutional neural networks: an overview and application in radiology", pp. 9:611–629, 2018.
[[File:3 - NKsFHJb - Saddle Point.gif|thumb|Visualizing Optimization algorithm comparing convergence with similar algorithm[1]]]

3. V. Bushave, "Understanding RMSprop — faster neural network learning", 2018.

4. V. Bushave, "How do we ‘train’ neural networks ?", 2017.

5. S. Ruder, "An overview of gradient descent optimization algorithms" ,2016.

6. R. Maksutov, "Deep study of a not very deep neural network. Part 3a: Optimizers overview", 2018.

7. M. Riedmiller, H Braun, "A Direct Adaptive Method for Faster Back-propagation Learning: The RPROP Algorithm", pp.586-591, 1993.

8. D. Garcia-Gasulla, "An Out-of-the-box Full-network Embedding for Convolutional Neural Networks" pp.168-175, 2018.

9. [https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf Geoffrey Hinton, "Coursera Neural Networks for Machine Learning lecture 6", 2018.]

10. [https://www.programcreek.com/python/example/104283/keras.optimizers.RMSprop Python keras.optimizers.RMSprop() Examples.]

11. [https://d2l.ai/chapter_optimization/rmsprop.html RMSProp Algorithm Implementation Example.]

12. S. De, A. Mukherjee, and E. Ullah, "Convergence guarantees for RMSProp and Adam in non-convex optimization and an empirical comparison to Nesterov acceleration", conference paper at ICLR, 2019.

2020 Cornell Optimization Open Textbook Feedback

2020-12-21T10:55:55Z

Wc593: /* Adaptive robust optimization */

==[[Computational complexity]]==

* Numerical Example
*# Finding subsets of a set is NOT O(2n).
* Application
*# The applications mentioned need to be discussed further.

==[[Network flow problem]]==

* Real Life Applications
*# There is NO need to include code. Simply mention how the problem was coded along with details on the LP solver used.

==[[Interior-point method for LP]]==

* Introduction
*# Please type “minimize” and “subject to” in formal optimization problem form throughout the whole page.
* A section to discuss and/or illustrate the applications
*# Please type optimization problem in the formal form.

==[[Optimization with absolute values]]==

* An introduction of the topic
*# Add few sentences on how absolute values convert optimization problem into a nonlinear optimization problem
* Applications
*# Inline equations at the beginning of this section are not formatted properly. Please fix the notation for expected return throughout the section.

==[[Matrix game (LP for game theory)]]==

* Theory and Algorithmic Discussion
*# aij are not defined in this section.

==[[Quasi-Newton methods]]==

* Theory and Algorithm
*# Please ensure that few spaces are kept between the equations and equation numbers.

==[[Eight step procedures]]==

* Numerical Example
*# Data for the example Knapsack problem (b,w) are missing.
*# How to arrive at optimal solutions is missing.

==[[Set covering problem]]==

* Numerical Example
*# Please leave some space between equation and equation number.

==[[Quadratic assignment problem]]==

* Theory, methodology, and/or algorithmic discussions
*# Discuss dynamic programming and cutting plane solution techniques briefly.

==[[Newsvendor problem]]==

* Formulation
*# A math programming formulation of the optimization problem with objective function and constraints is expected for the formulation. Please add any variant of the newsvendor problem along with some operational constraints.
*# A mathematical presentation of the solution technique is expected. Please consider any distribution for R and present a solution technique for that specific problem.

==[[Mixed-integer cuts]]==

* Applications
*# MILP and their solution techniques involving cuts are extremely versatile. Yet, only two sentences are added to describe their applications. Please discuss their applications, preferably real-world applications, in brief. Example Wikis provided on the website could be used as a reference to do so.

==[[Heuristic algorithms]]==

* Methodology
*# Greedy method to solve minimum spanning tree seems to be missing.

==[[Branch and cut]]==

* Methodology & Algorithm
*# Equation in most infeasible branching section is not properly formatted.
*# Step 2 appears abruptly in the algorithm and does not explain much. Please add more information regarding the same.
*# Step 5 contains latex code terms that are not properly formatted.

== [[Mixed-integer linear fractional programming (MILFP)]] ==

* Application and Modeling for Numerical Examples
*# Please check the index notation in Mass Balance Constraint

==[[Fuzzy programming]]==

* Applications
*# Applications of fuzzy programming are quite versatile. Please discuss few of the mentioned applications briefly. The provided example Wikis can be used as a reference to write this section.

== [[Stochastic gradient descent]] ==
* Numerical Example
*# Amount of whitespace can be reduced by changing orientation of example dataset by converting it into a table containing 3 rows and 6 columns.

==[[RMSProp]]==

* Introduction
*# References at the end of the sentence should be placed after the period.
* Theory and Methodology
*# Please check grammar in this section.
* Applications and Discussion
*# The applications section does not contain any discussion on applications. Please mention a few applications of the widely used RMSprop and discuss them briefly.

==[[Adam]]==

* Background
*# References at the end of the sentence should be placed after the period.

2020 Cornell Optimization Open Textbook Feedback

2020-12-21T10:54:23Z

Wc593: /* Branch and cut */

==[[Computational complexity]]==

* Numerical Example
*# Finding subsets of a set is NOT O(2n).
* Application
*# The applications mentioned need to be discussed further.

==[[Network flow problem]]==

* Real Life Applications
*# There is NO need to include code. Simply mention how the problem was coded along with details on the LP solver used.

==[[Interior-point method for LP]]==

* Introduction
*# Please type “minimize” and “subject to” in formal optimization problem form throughout the whole page.
* A section to discuss and/or illustrate the applications
*# Please type optimization problem in the formal form.

==[[Optimization with absolute values]]==

* An introduction of the topic
*# Add few sentences on how absolute values convert optimization problem into a nonlinear optimization problem
* Applications
*# Inline equations at the beginning of this section are not formatted properly. Please fix the notation for expected return throughout the section.

==[[Matrix game (LP for game theory)]]==

* Theory and Algorithmic Discussion
*# aij are not defined in this section.

==[[Quasi-Newton methods]]==

* Theory and Algorithm
*# Please ensure that few spaces are kept between the equations and equation numbers.

==[[Eight step procedures]]==

* Numerical Example
*# Data for the example Knapsack problem (b,w) are missing.
*# How to arrive at optimal solutions is missing.

==[[Set covering problem]]==

* Numerical Example
*# Please leave some space between equation and equation number.

==[[Quadratic assignment problem]]==

* Theory, methodology, and/or algorithmic discussions
*# Discuss dynamic programming and cutting plane solution techniques briefly.

==[[Newsvendor problem]]==

* Formulation
*# A math programming formulation of the optimization problem with objective function and constraints is expected for the formulation. Please add any variant of the newsvendor problem along with some operational constraints.
*# A mathematical presentation of the solution technique is expected. Please consider any distribution for R and present a solution technique for that specific problem.

==[[Mixed-integer cuts]]==

* Applications
*# MILP and their solution techniques involving cuts are extremely versatile. Yet, only two sentences are added to describe their applications. Please discuss their applications, preferably real-world applications, in brief. Example Wikis provided on the website could be used as a reference to do so.

==[[Heuristic algorithms]]==

* Methodology
*# Greedy method to solve minimum spanning tree seems to be missing.

==[[Branch and cut]]==

* Methodology & Algorithm
*# Equation in most infeasible branching section is not properly formatted.
*# Step 2 appears abruptly in the algorithm and does not explain much. Please add more information regarding the same.
*# Step 5 contains latex code terms that are not properly formatted.

== [[Mixed-integer linear fractional programming (MILFP)]] ==

* Application and Modeling for Numerical Examples
*# Please check the index notation in Mass Balance Constraint

==[[Fuzzy programming]]==

* Applications
*# Applications of fuzzy programming are quite versatile. Please discuss few of the mentioned applications briefly. The provided example Wikis can be used as a reference to write this section.

==[[Adaptive robust optimization]]==

* Problem Formulation
*# Please check typos such as "Let ''u'' bee a vector".
*# The abbreviation KKT is not previously defined.

== [[Stochastic gradient descent]] ==
* Numerical Example
*# Amount of whitespace can be reduced by changing orientation of example dataset by converting it into a table containing 3 rows and 6 columns.

==[[RMSProp]]==

* Introduction
*# References at the end of the sentence should be placed after the period.
* Theory and Methodology
*# Please check grammar in this section.
* Applications and Discussion
*# The applications section does not contain any discussion on applications. Please mention a few applications of the widely used RMSprop and discuss them briefly.

==[[Adam]]==

* Background
*# References at the end of the sentence should be placed after the period.

2020 Cornell Optimization Open Textbook Feedback

2020-12-21T10:53:27Z

Wc593: /* Branch and cut */

==[[Computational complexity]]==

* Numerical Example
*# Finding subsets of a set is NOT O(2n).
* Application
*# The applications mentioned need to be discussed further.

==[[Network flow problem]]==

* Real Life Applications
*# There is NO need to include code. Simply mention how the problem was coded along with details on the LP solver used.

==[[Interior-point method for LP]]==

* Introduction
*# Please type “minimize” and “subject to” in formal optimization problem form throughout the whole page.
* A section to discuss and/or illustrate the applications
*# Please type optimization problem in the formal form.

==[[Optimization with absolute values]]==

* An introduction of the topic
*# Add few sentences on how absolute values convert optimization problem into a nonlinear optimization problem
* Applications
*# Inline equations at the beginning of this section are not formatted properly. Please fix the notation for expected return throughout the section.

==[[Matrix game (LP for game theory)]]==

* Theory and Algorithmic Discussion
*# aij are not defined in this section.

==[[Quasi-Newton methods]]==

* Theory and Algorithm
*# Please ensure that few spaces are kept between the equations and equation numbers.

==[[Eight step procedures]]==

* Numerical Example
*# Data for the example Knapsack problem (b,w) are missing.
*# How to arrive at optimal solutions is missing.

==[[Set covering problem]]==

* Numerical Example
*# Please leave some space between equation and equation number.

==[[Quadratic assignment problem]]==

* Theory, methodology, and/or algorithmic discussions
*# Discuss dynamic programming and cutting plane solution techniques briefly.

==[[Newsvendor problem]]==

* Formulation
*# A math programming formulation of the optimization problem with objective function and constraints is expected for the formulation. Please add any variant of the newsvendor problem along with some operational constraints.
*# A mathematical presentation of the solution technique is expected. Please consider any distribution for R and present a solution technique for that specific problem.

==[[Mixed-integer cuts]]==

* Applications
*# MILP and their solution techniques involving cuts are extremely versatile. Yet, only two sentences are added to describe their applications. Please discuss their applications, preferably real-world applications, in brief. Example Wikis provided on the website could be used as a reference to do so.

==[[Heuristic algorithms]]==

* Methodology
*# Greedy method to solve minimum spanning tree seems to be missing.

==[[Branch and cut]]==

* Methodology & Algorithm
*# Equation in most infeasible branching section is not properly formatted.
*# Step 2 appears abruptly in the algorithm and does not explain much. Please add more information regarding the same.
*# Step 5 contains latex code terms that are not properly formatted. Please fix the same.

== [[Mixed-integer linear fractional programming (MILFP)]] ==

* Application and Modeling for Numerical Examples
*# Please check the index notation in Mass Balance Constraint

==[[Fuzzy programming]]==

* Applications
*# Applications of fuzzy programming are quite versatile. Please discuss few of the mentioned applications briefly. The provided example Wikis can be used as a reference to write this section.

==[[Adaptive robust optimization]]==

* Problem Formulation
*# Please check typos such as "Let ''u'' bee a vector".
*# The abbreviation KKT is not previously defined.

== [[Stochastic gradient descent]] ==
* Numerical Example
*# Amount of whitespace can be reduced by changing orientation of example dataset by converting it into a table containing 3 rows and 6 columns.

==[[RMSProp]]==

* Introduction
*# References at the end of the sentence should be placed after the period.
* Theory and Methodology
*# Please check grammar in this section.
* Applications and Discussion
*# The applications section does not contain any discussion on applications. Please mention a few applications of the widely used RMSprop and discuss them briefly.

==[[Adam]]==

* Background
*# References at the end of the sentence should be placed after the period.

Branch and cut

2020-12-21T10:53:15Z

Wc593: /* Strong Branching: */

Author: Lindsay Siegmundt, Peter Haddad, Chris Babbington, Jon Boisvert, Haris Shaikh (SysEn 6800 Fall 2020)

Steward: Wei-Han Chen, Fengqi You

== Introduction ==
The Branch and Cut methodology was developed in the 1990s as a way to solve Mixed-Integer Linear Programs (Karamanov, Miroslav).<ref>Karamanov, Miroslav. “Branch and Cut: An Empirical Study.” ''Carnegie Mellon University'' , Sept. 2006, https://www.cmu.edu/tepper/programs/phd/program/assets/dissertations/2006-operations-research-karamanov-dissertation.pdf.</ref> It combines two known optimization methodologies: Branch and Bound and Cutting Planes. Using these two tools, Branch and Cut finds an optimal solution by relaxing the problem to produce a bound on the objective, which is an upper bound for a maximization problem. Relaxing the problem simplifies the complex problem so that it can be solved more easily. For a maximization problem, the upper bound represents the highest value the objective can take while remaining feasible, and the optimal solution is found when the objective is equal to the upper bound (Luedtke, Jim).<ref>Luedtke, Jim. “The Branch-and-Cut Algorithm for Solving Mixed-Integer Optimization Problems.” ''Institute for Mathematicians and Its Applications'', 10 Aug. 2016, https://www.ima.umn.edu/materials/2015-2016/ND8.1-12.16/25397/Luedtke-mip-bnc-forms.pdf.</ref> This methodology is critical to the future of optimization because it combines two common tools and uses the strengths of each to find the optimal solution. Moving forward, the critical components of different methodologies could be combined to find optimality in a simpler and more direct manner.

== Methodology & Algorithm ==

=== Methodology ===
{| class="wikitable"
|+Abbreviation Details
!Acronym
!Expansion
|-
|LP
|Linear Programming
|-
|B&B
|Branch and Bound
|}

==== Most Infeasible Branching: ====
Most infeasible branching is a very popular method that picks the variable with fractional part closest to <math>0.5</math>, i.e., the variable with the largest score <math> s_i = 0.5-|x_i- \lfloor x_i \rfloor-0.5|</math>, where <math>x_i</math> is the value of variable <math>i</math> in the current LP solution.<ref>Branching rules revisited, Tobias Achterberg, Thorsten Koch, Alexander Martin https://www-m9.ma.tum.de/downloads/felix-klein/20B/AchterbergKochMartin-BranchingRulesRevisited.pdf</ref> Most infeasible branching picks a variable for which there is the least tendency toward either side when deciding how the variable should be rounded. However, the performance of this method is not any better than the rule of selecting a variable randomly.
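
The selection rule can be sketched in Python as follows. The function name, the tolerance <code>tol</code>, and the sample LP solution values are assumptions made for illustration.

```python
import math


def most_infeasible_variable(x, tol=1e-6):
    """Return the index whose fractional part is closest to 0.5, or None if x is integral."""
    best_index, best_score = None, tol
    for i, value in enumerate(x):
        frac = value - math.floor(value)
        score = 0.5 - abs(frac - 0.5)    # score s_i of variable i
        if score > best_score:
            best_index, best_score = i, score
    return best_index


print(most_infeasible_variable([1.88, 1.72]))   # hypothetical fractional LP solution
print(most_infeasible_variable([2.0, 1.0]))     # integral solution, nothing to branch on
```

For the first vector the second variable is chosen, because its fractional part 0.72 is closer to 0.5 than 0.88 is.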

==== '''Strong Branching:''' ====
For each fractional variable, strong branching tests the dual bound increase by computing the LP relaxations that result from branching on that variable. The variable that leads to the largest increase is selected as the branching variable for the current node. Despite its apparent simplicity, strong branching is so far the most powerful branching technique in terms of the number of nodes in the B&B tree. This effectiveness, however, can be achieved only at a high computational cost.<ref>A Branch-and-Cut Algorithm for Mixed Integer Bilevel Linear Optimization Problems and Its Implementation<nowiki/> https://coral.ise.lehigh.edu/~ted/files/papers/MIBLP16.pdf</ref>

==== '''Pseudo Cost:''' ====
[[File:Image.png|thumb|Pure pseudo cost branching]]

Another way to approximate a relaxation value is the pseudo cost method. The pseudo-cost of a variable is an estimate of the per-unit change in the objective function caused by rounding the value of the variable up or down. Among the candidate variables, we choose the one with the largest estimated LP objective gain.<ref>Advances in Mixed Integer Programming http://scip.zib.de/download/slides/SCIP-branching.ppt</ref>
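
A minimal Python sketch of pseudo cost scoring is shown below. The per-unit estimates <code>psi_down</code> and <code>psi_up</code> are hypothetical values, and combining the two estimated gains with a product is one possible scoring rule, taken here as an assumption rather than as the rule used on this page.

```python
import math


def pseudo_cost_scores(x, psi_down, psi_up, eps=1e-6):
    """Score each fractional variable from its pseudo-costs.

    psi_down[i] and psi_up[i] are estimated objective gains per unit of
    rounding variable i down or up; the product rule used to combine the
    two estimates is an assumption of this sketch.
    """
    scores = {}
    for i, value in enumerate(x):
        frac = value - math.floor(value)
        if frac < eps or frac > 1.0 - eps:
            continue                          # already integral
        gain_down = psi_down[i] * frac        # estimated gain if x_i is rounded down
        gain_up = psi_up[i] * (1.0 - frac)    # estimated gain if x_i is rounded up
        scores[i] = max(gain_down, eps) * max(gain_up, eps)
    return scores


scores = pseudo_cost_scores([0.5, 2.25, 3.0], psi_down=[1.0, 4.0, 2.0], psi_up=[1.0, 4.0, 2.0])
branch_on = max(scores, key=scores.get)
print(scores, branch_on)
```

Unlike strong branching, no LP is solved here: the scores come only from estimates gathered on earlier branchings.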
==='''Algorithm'''===
Branch and Cut is a variation of the Branch and Bound algorithm. Branch and Cut incorporates Gomory cuts, which tighten the search space of the given problem. The standard Simplex algorithm is used to solve each linear programming (LP) relaxation.

<math>\min \ c^T x</math>

<math>\text{s.t.} \ Ax \leq b</math>

<math>x \geq 0</math>

<math>x_i \in \mathbb{Z}, \ i = 1,2,3,\ldots,n</math>

Above is a mixed-integer linear programming problem, where <math>x</math> and <math>c</math> are n-vectors. Restricting the variables to 0 or 1 gives binary variables. The above problem can be denoted as <math>LP_n </math>.

Below is an algorithm for Branch and Cut with Gomory cuts and partitioning:

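'''Step 0. Bounds:'''

Set the upper bound <math>Z = \infty</math> and the lower bound to <math>-\infty</math>.

'''Step 1. Initialize:'''

Set the first node as <math>LP_0</math> and place it in the set of active nodes <math>L</math>. The problems in the set are denoted <math>LP_n </math>.

'''Step 2. Terminate:'''

If <math>L</math> is empty, stop: the solution that produced the current upper bound <math>Z</math> is optimal, and if <math>Z</math> is still <math>\infty</math> the problem is infeasible.

'''Step 3. Iterate through list L:'''

While <math>L</math> is not empty, select and remove a problem <math>LP_l</math> from <math>L</math> (<math>l</math> is the index of the problem in the list), then:

'''Step 3.1. Convert to a Relaxation:'''

Drop the integrality constraints of <math>LP_l</math> to obtain its linear relaxation.

'''Step 3.2. Solve:'''

Solve the relaxed problem and let <math>Z^l</math> denote its optimal objective value.

'''Step 3.3. Check feasibility:'''

If the relaxation is infeasible, return to step 3. Otherwise, continue with the solution <math>Z^l</math>.

'''Step 4. Cutting Planes:'''

If a cutting plane violated by the relaxed solution is found, add it to the linear relaxation problem as a constraint and return to step 3.2. Otherwise, continue.

'''Step 5. Pruning and Fathoming:'''

(a) If <math>Z^l \geq Z</math>, go to step 3.

(b) If <math>Z^l < Z</math> and the relaxed solution is integral feasible, set <math>Z = Z^l</math>, remove from <math>L</math> all problems whose bound is no better than <math>Z</math>, and go to step 3.

'''Step 6. Partition:'''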

Let <math>\{D^{lj}\}_{j=1}^{k}</math> be a partition of the constraint set <math>D</math> of problem <math>LP_l</math>. Add problems <math>\{LP^{lj}\}_{j=1}^{k}</math> to <math>L</math>, where <math>LP^{lj}</math> is <math>LP_l</math> with feasible region restricted to <math>D^{lj}</math> and <math>Z^{lj}</math> for <math>j=1,\ldots,k</math> is set to the value of <math>Z^l</math> for the parent problem <math>l</math>. Go to step 3.<ref name=":0">Benders, J. F. (Sept. 1962), "Partitioning procedures for solving mixed-variables programming problems", Numerische Mathematik 4(3): 238–252.</ref>
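
The loop over active nodes described above can be sketched in Python for a small two-variable instance. This is a simplified illustration under stated assumptions, not the implementation behind this page: the LP relaxation is solved by enumerating constraint intersections (valid only for a bounded two-variable region) instead of by the Simplex algorithm, the cutting plane step is left as a comment, and the function names and the sample data are hypothetical.

```python
from itertools import combinations
import math


def solve_lp_2d(c, rows):
    """Minimize c.x over {x : a.x <= b for (a, b) in rows} by vertex enumeration.

    Only valid for a bounded two-variable feasible region (an assumption of
    this sketch). Returns (value, point), or None if the region is empty.
    """
    best = None
    for (a1, b1), (a2, b2) in combinations(rows, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < 1e-12:
            continue  # parallel constraints have no unique intersection
        x = ((b1 * a2[1] - a1[1] * b2) / det, (a1[0] * b2 - b1 * a2[0]) / det)
        if all(a[0] * x[0] + a[1] * x[1] <= b + 1e-9 for a, b in rows):
            value = c[0] * x[0] + c[1] * x[1]
            if best is None or value < best[0]:
                best = (value, x)
    return best


def branch_and_bound(c, rows):
    """Branch-and-bound loop; a cut-generation step could be added where noted."""
    upper, incumbent = math.inf, None          # upper bound Z and its solution
    active = [rows]                            # list L of active nodes
    while active:
        node = active.pop()
        relaxed = solve_lp_2d(c, node)
        if relaxed is None:
            continue                           # infeasible relaxation
        value, x = relaxed
        # Branch and cut would search for violated cutting planes here.
        if value >= upper - 1e-9:
            continue                           # fathomed by bound
        fractional = [i for i in range(2) if abs(x[i] - round(x[i])) > 1e-6]
        if not fractional:
            upper, incumbent = value, tuple(round(v) for v in x)
            continue
        i = fractional[0]
        down, up = [0.0, 0.0], [0.0, 0.0]
        down[i], up[i] = 1.0, -1.0
        active.append(node + [(tuple(down), math.floor(x[i]))])   # x_i <= floor(x_i)
        active.append(node + [(tuple(up), -math.ceil(x[i]))])     # x_i >= ceil(x_i)
    return upper, incumbent


# Hypothetical instance: min -4*x1 - 7*x2 with two constraints and x1, x2 >= 0.
rows = [((6.0, 1.0), 13.0), ((-1.0, 4.0), 5.0), ((-1.0, 0.0), 0.0), ((0.0, -1.0), 0.0)]
best_value, best_point = branch_and_bound((-4.0, -7.0), rows)
print(best_value, best_point)
```

The list <code>active</code> plays the role of <math>L</math>, and <code>upper</code> plays the role of the upper bound <math>Z</math> that is updated whenever an integral solution is found.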

==Numerical Example==
First, list out the MILP:

<math>min \ z=-4x_1-7x_2</math>

<math>6x_1 + x_2 \leq13</math>

<math>-x_1+4x_2\leq5</math>

<math>x_1,x_2\geq0, \ x_1,x_2 \in \mathbb{Z}</math>

Solution to the LP relaxation of the original problem

<math>z =-19.56, x_1=1.88, x_2=1.72 </math>

Branch on <math>x_1</math> to generate sub-problems

<math>min \ z=-4x_1-7x_2</math>

<math>6x_1 + x_2 \leq13</math>

<math>-x_1+4x_2\leq5</math>

<math>x_1\geq2</math>

<math>x_1,x_2\geq0</math>

Solution to first branch sub-problem

<math>z =-15, x_1=2, x_2=1</math>

<math>min \ z=-4x_1-7x_2</math>

<math>6x_1 + x_2 \leq13</math>

<math>-x_1+4x_2\leq5</math>

<math>x_1\leq1</math>

<math>x_1,x_2\geq0</math>

Solution to second branch sub-problem

<math>z =-14.5, x_1=1, x_2=1.5</math>

Adding a cut

<math>min \ z=-4x_1-7x_2</math>

<math>6x_1 + x_2 \leq13</math>

<math>-x_1+4x_2\leq5</math>

<math>2x_1+x_2\leq 3</math>

<math>x_1\leq1</math>

<math>x_1,x_2\geq0</math>

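Solution to cut LP

<math>z=-13.222,x_1=.778,x_2=1.444</math>

Since this bound of <math>-13.222</math> is worse than the integer-feasible value <math>z=-15</math> found in the first branch, the second branch can be pruned, and <math>x_1=2, x_2=1</math> with <math>z=-15</math> is the optimal integer solution.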
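
Because the example is small, the integer optimum can be checked independently by enumerating all integer points that satisfy the two constraints. The Python sketch below does this; the search range of 0 to 3 for each variable is an assumption that is safe here because the constraints force <math>x_1 \leq 2</math> and <math>x_2 \leq 1</math>.

```python
def feasible(x1, x2):
    """Check the two constraints of the example for a nonnegative integer point."""
    return 6 * x1 + x2 <= 13 and -x1 + 4 * x2 <= 5


candidates = [(x1, x2) for x1 in range(0, 4) for x2 in range(0, 4) if feasible(x1, x2)]
best = min(candidates, key=lambda p: -4 * p[0] - 7 * p[1])
best_value = -4 * best[0] - 7 * best[1]
print(best, best_value)
```

The enumeration returns the same point and objective value as the first branch sub-problem.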

==Application==
Several applications of Branch and Cut are described below in more detail, along with how they can be used. These applications show ways in which Branch and Cut can be used to optimize various problems efficiently.

=== '''Combinatorial Optimization''' ===
Combinatorial Optimization is a natural application for Branch and Cut. This style of optimization uses finite, known sets and information about those sets to optimize the solution. It was originally applied to maximizing flow and to problems in the transportation industry (Maltby and Ross). Combinatorial Optimization has since spread to new areas where it is used often, and it is now an important component in studying artificial intelligence and machine learning algorithms. The finite sets that Combinatorial Optimization tends to focus on include graphs, partially ordered sets, and structures that define linear independence, called matroids.<ref>[https://brilliant.org/wiki/combinatorial-optimization/ Maltby, Henry, and Eli Ross. “Combinatorial Optimization.” ''Brilliant Math & Science Wiki'', https://brilliant.org/wiki/combinatorial-optimization/.]</ref>

=== '''Benders Decomposition''' ===
Benders Decomposition is another Branch and Cut application that is widely used in Stochastic Programming. In Benders Decomposition, the initial problem is divided into two distinct subsets of variables. Dividing the problem into two separate problems makes each one easier to solve than the original instance (Benders). The first problem is solved for the first set of variables, and the second sub-problem is then solved given that first solution. Solving the sub-problem determines whether the solution of the first problem is infeasible (Benders). Benders cuts can then be added to constrain the problem until a feasible solution is found.<ref name=":0" />

=== '''Large-Scale Symmetric Traveling Salesman Problem''' ===
The Large-Scale Symmetric Traveling Salesman Problem is a well-known problem of finding the shortest route that visits each city once and returns to the original city at the end. On a larger scale, this type of problem must be broken down into subsets or nodes (SIAM). By constraining the problem with the methods of Combinatorial Optimization, the Traveling Salesman Problem can be viewed in terms of partially ordered sets. Doing this on a large scale with a finite number of cities makes it possible to find the shortest path while ensuring that each city is visited only once.<ref>Society for Industrial and Applied Mathematics. “SIAM Rev.” ''SIAM Review'', 18 July 2006, https://epubs.siam.org/doi/10.1137/1033004</ref>

=== '''Submodular Function''' ===
A submodular function is a type of function that is used throughout artificial intelligence and machine learning. The reason is that such functions have diminishing returns: the incremental value of adding an input decreases as more inputs are added. This is a useful property for optimization in the cases stated above because the inputs are continually growing, and it allows machine learning and artificial intelligence to continue to build on these algorithms (Tschiatschek, Iyer, and Bilmes).<ref>S. Tschiatschek, R. Iyer, H. Wei and J. Bilmes, Learning Mixtures of Submodular Functions for Image Collection Summarization, NIPS-2014.</ref> As new inputs are added to the system, the system learns more and improves the solution it produces.<ref>A. Krause and C. Guestrin, Beyond Convexity: Submodularity in Machine Learning, Tutorial at ICML-2008</ref>

==Conclusion==
Branch and Cut is an optimization algorithm used to solve integer linear programs. It combines two other optimization algorithms, branch and bound and cutting planes, and uses the results from each method to reach an optimal solution. Three different branching strategies are used within the method: most infeasible branching, strong branching, and pseudo-cost branching. Furthermore, Branch and Cut can be applied in multiple scenarios, including submodular functions, the large-scale symmetric traveling salesman problem, Benders decomposition, and combinatorial optimization, which increases the impact of the methodology.

==Reference==
<references />

2020 Cornell Optimization Open Textbook Feedback

2020-12-21T10:51:38Z

Wc593: /* Heuristic algorithms */

==[[Computational complexity]]==

* Numerical Example
*# Finding subsets of a set is NOT O(2n).
* Application
*# The applications mentioned need to be discussed further.

==[[Network flow problem]]==

* Real Life Applications
*# There is NO need to include code. Simply mention how the problem was coded along with details on the LP solver used.

==[[Interior-point method for LP]]==

* Introduction
*# Please type “minimize” and “subject to” in formal optimization problem form throughout the whole page.
* A section to discuss and/or illustrate the applications
*# Please type optimization problem in the formal form.

==[[Optimization with absolute values]]==

* An introduction of the topic
*# Add few sentences on how absolute values convert optimization problem into a nonlinear optimization problem
* Applications
*# Inline equations at the beginning of this section are not formatted properly. Please fix the notation for expected return throughout the section.

==[[Matrix game (LP for game theory)]]==

* Theory and Algorithmic Discussion
*# aij are not defined in this section.

==[[Quasi-Newton methods]]==

* Theory and Algorithm
*# Please ensure that few spaces are kept between the equations and equation numbers.

==[[Eight step procedures]]==

* Numerical Example
*# Data for the example Knapsack problem (b,w) are missing.
*# How to arrive at optimal solutions is missing.

==[[Set covering problem]]==

* Numerical Example
*# Please leave some space between equation and equation number.

==[[Quadratic assignment problem]]==

* Theory, methodology, and/or algorithmic discussions
*# Discuss dynamic programming and cutting plane solution techniques briefly.

==[[Newsvendor problem]]==

* Formulation
*# A math programming formulation of the optimization problem with objective function and constraints is expected for the formulation. Please add any variant of the newsvendor problem along with some operational constraints.
*# A mathematical presentation of the solution technique is expected. Please consider any distribution for R and present a solution technique for that specific problem.

==[[Mixed-integer cuts]]==

* Applications
*# MILP and their solution techniques involving cuts are extremely versatile. Yet, only two sentences are added to describe their applications. Please discuss their applications, preferably real-world applications, in brief. Example Wikis provided on the website could be used as a reference to do so.

==[[Heuristic algorithms]]==

* Methodology
*# Greedy method to solve minimum spanning tree seems to be missing.

==[[Branch and cut]]==

* Methodology & Algorithm
*# Equation in most infeasible branching section is not properly formatted.
*# Step 2 appears abruptly in the algorithm and does not explain much. Please add more information regarding the same.
*# Step 5 contains latex code terms that are not properly formatted. Please fix the same.
*# Fix typos: e.g., repeated “for the current”.

== [[Mixed-integer linear fractional programming (MILFP)]] ==

* Application and Modeling for Numerical Examples
*# Please check the index notation in Mass Balance Constraint

==[[Fuzzy programming]]==

* Applications
*# Applications of fuzzy programming are quite versatile. Please discuss few of the mentioned applications briefly. The provided example Wikis can be used as a reference to write this section.

==[[Adaptive robust optimization]]==

* Problem Formulation
*# Please check typos such as "Let ''u'' bee a vector".
*# The abbreviation KKT is not previously defined.

== [[Stochastic gradient descent]] ==
* Numerical Example
*# Amount of whitespace can be reduced by changing orientation of example dataset by converting it into a table containing 3 rows and 6 columns.

==[[RMSProp]]==

* Introduction
*# References at the end of the sentence should be placed after the period.
* Theory and Methodology
*# Please check grammar in this section.
* Applications and Discussion
*# The applications section does not contain any discussion on applications. Please mention a few applications of the widely used RMSprop and discuss them briefly.

==[[Adam]]==

* Background
*# References at the end of the sentence should be placed after the period.

Heuristic algorithms

2020-12-21T10:51:16Z

Wc593:

Author: Anmol Singh (as2753)

Steward: Fengqi You, Allen Yang

== Introduction ==
In mathematical programming, a heuristic algorithm is a procedure that determines near-optimal solutions to an optimization problem. It does so by trading optimality, completeness, accuracy, or precision for speed.<ref> Eiselt, Horst A et al. Integer Programming and Network Models. Springer, 2011.</ref> Nevertheless, heuristics are widely used for a variety of reasons:

*The problem does not have an exact solution, or its formulation is unknown
*Solving the problem exactly is computationally intensive
*Bounds on the optimal solution are needed in branch and bound solution processes
==Methodology==
Optimization heuristics can be categorized into two broad classes depending on the way the solution domain is organized:

===Construction methods (Greedy algorithms)===
The greedy algorithm works in phases: at each step it makes the locally optimal choice in an attempt to find the overall optimal solution to the entire problem.<ref>''Introduction to Algorithms'' (Cormen, Leiserson, Rivest, and Stein) 2001, Chapter 16 "Greedy Algorithms".</ref> It can be applied to the famous “traveling salesman problem”, where the heuristic followed is: "At each step of the journey, visit the nearest unvisited city."

====Example: Scheduling Problem====
You are given a set of ''N'' lecture schedules for a single day at a university. The schedule for a specific lecture is of the form (s time, f time), where s time is the start time of that lecture and f time is its finishing time. Given the list of ''N'' lecture schedules, we need to select a maximum set of lectures to be held during the day such that no two selected lectures overlap, i.e., if lectures ''L<sub>i</sub>'' and ''L<sub>j</sub>'' are both selected, then the start time of ''j'' ≥ the finish time of ''i'', or vice versa. An optimal solution is obtained by considering the earliest finishing time first: sort the lectures in increasing order of their finishing times, then scan from the beginning and select each lecture whose start time is no earlier than the finishing time of the last selected lecture.
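
The earliest-finishing-time rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the original example: the function name <code>select_lectures</code> and the sample times are assumptions chosen only to show the mechanics.

```python
from typing import List, Tuple

def select_lectures(lectures: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Greedy selection: scan lectures by increasing finish time and keep
    each one that starts no earlier than the last selected lecture ends."""
    selected = []
    last_finish = float("-inf")
    for start, finish in sorted(lectures, key=lambda lec: lec[1]):
        if start >= last_finish:
            selected.append((start, finish))
            last_finish = finish
    return selected

# Hypothetical (s time, f time) pairs, in hours of the day.
lectures = [(9, 11), (10, 12), (11, 13), (12, 14), (8, 15)]
chosen = select_lectures(lectures)
print(chosen)
```

For this sample data the rule keeps the lectures (9, 11) and (11, 13) and discards the three that overlap them.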

===Local Search methods===
The Local Search method follows an iterative approach: we start with some initial solution, explore the neighborhood of the current solution, and then replace the current solution with a better solution found there.<ref> Eiselt, Horst A et al. Integer Programming and Network Models. Springer, 2011.</ref> For the “traveling salesman problem”, a solution is a cycle containing all nodes of the graph, and the target is to minimize the total length of the cycle.

==== Example Problem ====
Suppose that the problem P is to find an optimal ordering of N jobs in a manufacturing system. A solution to this problem can be described as an N-vector of job numbers, in which the position of each job in the vector defines the order in which the job will be processed. For example, [3, 4, 1, 6, 5, 2] is a possible ordering of 6 jobs, where job 3 is processed first, followed by job 4, then job 1, and so on, finishing with job 2. Define now M as the set of moves that produce new orderings by the swapping of any two jobs. For example, [3, 1, 4, 6, 5, 2] is obtained by swapping the positions of jobs 4 and 1.
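
A local search over the swap moves ''M'' might look like the sketch below. The example above does not define an objective, so the code assumes one purely for illustration: each job has a hypothetical processing time, and the cost of an ordering is the sum of the job completion times. The names <code>total_completion_time</code>, <code>swap_local_search</code>, and <code>proc_times</code> are likewise assumptions.

```python
from itertools import combinations
from typing import Dict, List, Tuple

def total_completion_time(order: List[int], proc: Dict[int, float]) -> float:
    """Assumed objective: sum of the completion times of all jobs."""
    elapsed, total = 0.0, 0.0
    for job in order:
        elapsed += proc[job]
        total += elapsed
    return total

def swap_local_search(order: List[int], proc: Dict[int, float]) -> Tuple[List[int], float]:
    """Move to the first improving swap neighbor until none exists."""
    current = list(order)
    best_cost = total_completion_time(current, proc)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(current)), 2):
            neighbor = current[:]
            neighbor[i], neighbor[j] = neighbor[j], neighbor[i]  # one move from M
            cost = total_completion_time(neighbor, proc)
            if cost < best_cost:
                current, best_cost = neighbor, cost
                improved = True
                break  # restart the scan from the new current solution
    return current, best_cost

# Hypothetical processing times for jobs 1..6.
proc_times = {1: 4.0, 2: 2.0, 3: 6.0, 4: 1.0, 5: 3.0, 6: 5.0}
order, cost = swap_local_search([3, 4, 1, 6, 5, 2], proc_times)
print(order, cost)
```

With this particular objective every local optimum of the swap neighborhood is also a global optimum (shortest job first), which is not true of local search in general.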
==Popular Heuristic Algorithms==

===Genetic Algorithm===
The term Genetic Algorithm was first used by John Holland.<ref>J.H. Holland (1975) ''Adaptation in Natural and Artificial Systems,'' University of Michigan Press, Ann Arbor, Michigan; re-issued by MIT Press (1992).</ref> Genetic algorithms are designed to mimic the Darwinian theory of evolution, which states that populations of species evolve to produce organisms that are more complex and fitter for survival on Earth. Genetic algorithms operate on string structures which, like biological structures, evolve in time according to the rule of survival of the fittest by using a randomized yet structured information exchange. Thus, in every generation, a new set of strings is created, using parts of the fittest members of the old set.<ref>Optimal design of heat exchanger networks, Editor(s): Wilfried Roetzel, Xing Luo, Dezhen Chen, Design and Operation of Heat Exchangers and their Networks, Academic Press, 2020, Pages 231-317, <nowiki>ISBN 9780128178942</nowiki>, https://doi.org/10.1016/B978-0-12-817894-2.00006-6.</ref> The algorithm terminates when a satisfactory fitness level has been reached for the population or the maximum number of generations has been reached. The typical steps are<ref>Wang FS., Chen LH. (2013) Genetic Algorithms. In: Dubitzky W., Wolkenhauer O., Cho KH., Yokota H. (eds) Encyclopedia of Systems Biology. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9863-7_412 </ref>:

1. Choose an initial population of candidate solutions

2. Calculate the fitness of each individual, i.e., how good the solution is

3. Perform crossover on the population: randomly choose pairs of individuals as parents and exchange some parts between the parents to generate new individuals

4. Perform mutation: randomly change some individuals to create other new individuals

5. Evaluate the fitness of the offspring

6. Select the surviving individuals

7. Proceed from step 3 if the termination criteria have not been reached
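
The steps above can be sketched as follows. This is only an illustrative sketch under stated assumptions: individuals are bit strings, the fitness is the number of ones (a toy problem often called OneMax), crossover is single-point, and survivors are the fittest individuals among parents and offspring. None of these choices, nor the parameter values, come from the cited references.

```python
import random
from typing import List

def fitness(individual: List[int]) -> int:
    """Assumed toy fitness: number of ones in the bit string."""
    return sum(individual)

def genetic_algorithm(n_bits: int = 20, pop_size: int = 30, generations: int = 60,
                      p_mut: float = 0.05, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    # Step 1: initial population of random bit strings.
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2 and termination check: stop once a perfect string exists.
        if max(fitness(ind) for ind in population) == n_bits:
            break
        offspring = []
        while len(offspring) < pop_size:
            # Step 3: single-point crossover of two randomly chosen parents.
            p1, p2 = rng.sample(population, 2)
            cut = rng.randint(1, n_bits - 1)
            child = p1[:cut] + p2[cut:]
            # Step 4: mutation flips each bit with probability p_mut.
            child = [1 - b if rng.random() < p_mut else b for b in child]
            offspring.append(child)
        # Steps 5 and 6: evaluate offspring and keep the fittest pop_size individuals.
        population = sorted(population + offspring, key=fitness, reverse=True)[:pop_size]
    return max(population, key=fitness)

best = genetic_algorithm()
print(fitness(best), best)
```

Because survivors are drawn from parents and offspring together, the best fitness in the population never decreases from one generation to the next in this sketch.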

===Tabu Search Algorithm===
Tabu search (TS) is a heuristic algorithm created by Fred Glover.<ref>Fred Glover (1986). "Future Paths for Integer Programming and Links to Artificial Intelligence". Computers and Operations Research. '''13''' (5): 533–549,https://doi.org/10.1016/0305-0548(86)90048-1</ref> It uses a gradient-descent-style local search combined with memory techniques that avoid cycling while searching for an optimal solution. It does so by forbidding or penalizing moves that take the solution, in the next iteration, to points in the solution space previously visited. The algorithm uses memory to keep a Tabu list of forbidden moves, which are the moves of the previous iterations or moves that might be considered unwanted. A general algorithm is as follows<ref>Optimization of Preventive Maintenance Program for Imaging Equipment in Hospitals, Editor(s): Zdravko Kravanja, Miloš Bogataj, Computer-Aided Chemical Engineering, Elsevier, Volume 38, 2016, Pages 1833-1838, ISSN 1570-7946, <nowiki>ISBN 9780444634283</nowiki>, https://doi.org/10.1016/B978-0-444-63428-3.50310-6.</ref>:

1. Select an initial solution ''s''<sub>0</sub> ∈ ''S''. Initialize the Tabu list ''L''<sub>0</sub> = ∅ and select a tabu list size. Set ''k'' = 0.

2. Determine the feasible neighborhood ''N''(''s<sub>k</sub>''), excluding inferior members of the tabu list ''L<sub>k</sub>''.

3. Select the next move ''s''<sub>''k''+1</sub> from ''N''(''s<sub>k</sub>''), or from ''L<sub>k</sub>'' if it contains a better solution, and update ''L''<sub>''k''+1</sub>.

4. Stop if a termination condition is reached; otherwise set ''k'' = ''k'' + 1 and return to step 2.
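
A compact sketch of this loop is given below, under several assumptions that are not part of the cited algorithm: solutions are job orderings, a move swaps two jobs, the tabu list stores the most recently swapped job pairs, and a tabu move is still allowed when it improves on the best solution found so far (a common aspiration rule). The cost function and data are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Callable, List, Tuple

def tabu_search(cost: Callable[[List[int]], float], s0: List[int],
                tabu_size: int = 5, max_iter: int = 100) -> Tuple[List[int], float]:
    current = list(s0)
    best, best_cost = current[:], cost(current)
    tabu = deque(maxlen=tabu_size)  # tabu list of recently swapped job pairs
    for _ in range(max_iter):
        candidates = []
        for i, j in combinations(range(len(current)), 2):
            move = frozenset((current[i], current[j]))
            neighbor = current[:]
            neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
            c = cost(neighbor)
            # Keep non-tabu moves, plus tabu moves that beat the best so far.
            if move not in tabu or c < best_cost:
                candidates.append((c, neighbor, move))
        if not candidates:
            break
        # Take the best admissible neighbor even if it is worse than current.
        c, current, move = min(candidates, key=lambda item: item[0])
        tabu.append(move)
        if c < best_cost:
            best, best_cost = current[:], c
    return best, best_cost

# Hypothetical data: processing times; cost is the sum of completion times.
proc = {1: 4, 2: 2, 3: 6, 4: 1, 5: 3, 6: 5}

def completion_cost(order: List[int]) -> int:
    elapsed = total = 0
    for job in order:
        elapsed += proc[job]
        total += elapsed
    return total

best, best_cost = tabu_search(completion_cost, [3, 4, 1, 6, 5, 2])
print(best, best_cost)
```

Unlike plain local search, the loop keeps moving after it reaches a local optimum, and the tabu list prevents it from immediately undoing its most recent swaps.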

==== Example: The Classical Vehicle Routing Problem ====
''Vehicle Routing Problems'' have very important applications in distribution management and have become some of the most studied problems in the combinatorial optimization literature. The solution approaches proposed for them include several Tabu Search implementations that currently rank among the most effective. The ''Classical Vehicle Routing Problem'' (CVRP) is the basic variant in that class of problems. It can formally be defined as follows. Let ''G'' = (''V, A'') be a graph where ''V'' is the vertex set and ''A'' is the arc set. One of the vertices represents the ''depot'' at which a fleet of identical vehicles of capacity ''Q'' is based, and the other vertices represent customers that need to be serviced. With each customer vertex ''v<sub>i</sub>'' are associated a demand ''q<sub>i</sub>'' and a service time ''t<sub>i</sub>''. With each arc (''v<sub>i</sub>'', ''v<sub>j</sub>'') of ''A'' are associated a cost ''c<sub>ij</sub>'' and a travel time ''t<sub>ij</sub>''.<ref>Glover, Fred, and Gary A Kochenberger. Handbook Of Metaheuristics. Kluwer Academic Publishers, 2003.</ref> The CVRP consists of finding a set of routes such that:

1. Each route begins and ends at the depot

2. Each customer is visited exactly once by exactly one route

3. The total demand of the customers assigned to each route does not exceed ''Q''

4. The total duration of each route (including travel and service times) does not exceed a specified value ''L''

5. The total cost of the routes is minimized

A feasible solution for the problem thus consists of a partition of the customers into m groups, each of total demand no larger than ''Q'', that are sequenced to yield routes (starting and ending at the depot) of duration no larger than ''L''.

===Simulated Annealing Algorithm===
The Simulated Annealing Algorithm was developed by Kirkpatrick et al. in 1983<ref>Kirkpatrick, S., Gelatt, C., & Vecchi, M. (1983). Optimization by Simulated Annealing. ''Science,'' ''220''(4598), 671-680. Retrieved November 25, 2020, from http://www.jstor.org/stable/1690046</ref> and is based on the analogy of ideal crystals in thermodynamics. In metallurgy, the annealing process lets particles arrange themselves in the positions of minimum potential energy as the temperature is slowly decreased. The Simulated Annealing algorithm mimics this mechanism and uses the objective function of an optimization problem instead of the energy of a material to arrive at a solution. A general algorithm is as follows<ref>Brief review of static optimization methods, Editor(s): Stanisław Sieniutycz, Jacek Jeżowski, Energy Optimization in Process Systems and Fuel Cells (Third Edition), Elsevier, 2018, Pages 1-41, <nowiki>ISBN 9780081025574</nowiki>, https://doi.org/10.1016/B978-0-08-102557-4.00001-3.</ref>:

1. Fix the initial temperature (''T''<sub>0</sub>)

2. Generate a starting point '''''X'''''<sub>0</sub> (this is the best point '''''X'''''* at present)

3. Randomly generate a neighboring point '''''X'''''<sub>''S''</sub>

4. Accept '''''X'''''<sub>''S''</sub> as '''''X'''''* (the currently best solution) if an acceptance criterion is met. This criterion must be such that the probability of accepting a worse point is greater than zero, particularly at higher temperatures

5. If an equilibrium condition is satisfied, go to (6); otherwise jump back to (3).

6. If the termination conditions are not met, decrease the temperature according to a certain cooling scheme and jump back to (3). If the termination conditions are satisfied, stop the calculations and accept the current best value '''''X'''''* as the final (‘optimal’) solution.
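
The following sketch shows one way these steps could be coded for a one-dimensional problem. Several pieces are assumptions rather than part of the cited algorithm: the Metropolis rule exp(−Δ/''T'') as the acceptance criterion, a fixed number of trials per temperature as the equilibrium condition, geometric cooling, and a minimum temperature as the termination condition. The code also tracks the current point and the best point separately, which the step list does not distinguish.

```python
import math
import random
from typing import Callable, Tuple

def simulated_annealing(f: Callable[[float], float], x0: float, t0: float = 10.0,
                        cooling: float = 0.9, trials_per_temp: int = 50,
                        t_min: float = 1e-3, step: float = 1.0,
                        seed: int = 0) -> Tuple[float, float]:
    rng = random.Random(seed)
    t = t0                              # step 1: initial temperature
    x, fx = x0, f(x0)                   # step 2: starting point
    x_best, f_best = x, fx
    while t > t_min:                    # step 6: termination condition (assumed)
        for _ in range(trials_per_temp):       # step 5: equilibrium (assumed)
            xs = x + rng.uniform(-step, step)  # step 3: random neighbor
            fs = f(xs)
            delta = fs - fx
            # Step 4: always accept improvements; accept worse points with
            # probability exp(-delta / t), which shrinks as t decreases.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                x, fx = xs, fs
                if fx < f_best:
                    x_best, f_best = x, fx
        t *= cooling                    # step 6: geometric cooling (assumed)
    return x_best, f_best

def objective(x: float) -> float:
    """Hypothetical test function with several local minima."""
    return x * x + 10.0 * math.sin(x)

x_best, f_best = simulated_annealing(objective, x0=5.0)
print(x_best, f_best)
```

The parameter values are arbitrary; in practice the cooling schedule and step size usually need tuning for the problem at hand.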

== Numerical Example: Knapsack Problem ==
One of the most common applications of heuristic algorithms is the Knapsack Problem, in which items are selected from a given set (each with a weight and a value) so that the total value is maximized while the total weight stays under a certain limit. The Greedy Approximation Algorithm sorts the items by their value per unit weight and then includes the items with the highest value per unit weight as long as there is still space remaining. The worked example below computes the optimal solution stage by stage using a recurrence relation.

'''<big>Example</big>'''

The following table specifies the weights and values per unit of five different products held in storage. The quantity of each product is unlimited. A plane with a weight capacity of 13 is to be used, for one trip only, to transport the products. We would like to know how many units of each product should be loaded onto the plane to maximize the value of goods shipped.
{| class="wikitable"
|+
!Product (''i'')
!Weight per unit (''w<sub>i</sub>'')
!Value per unit (''v<sub>i</sub>'')
|-
|1
|7
|9
|-
|2
|5
|4
|-
|3
|4
|3
|-
|4
|3
|2
|-
|5
|1
|0.5
|}
'''<big>Solution:</big>'''

'''(a) Stages:'''

We view each type of product as a stage, so there are 5 stages. We can also add a sixth stage representing the end point after all decisions have been made.

'''(b) States:'''

We can view the remaining capacity as states, so there are 14 states in each stage: 0,1, 2, 3, …13

'''(c) Possible decisions at each stage:'''

Suppose we are in state ''s'' in stage ''n'' (''n'' < 6), so that ''s'' units of capacity remain. Then the possible number of items we can pack is:

''j'' = 0, 1, …, ⌊''s''/''w<sub>n</sub>''⌋

For each such action ''j'', there is an arc going from state ''s'' in stage ''n'' to state ''s'' – ''j''*''w<sub>n</sub>'' in stage ''n'' + 1. For each arc in the graph, there is a corresponding benefit ''j''*''v<sub>n</sub>''. We are trying to find a maximum-benefit path from state 13 in stage 1 to stage 6.

'''(d) Optimization function:'''

Let ''f<sub>n</sub>''(''s'') be the value of the maximum benefit possible with items of type ''n'' or greater using total capacity at most ''s''

'''(e) Boundary conditions:'''

The sixth stage should have all zeros, that is, ''f''<sub>6</sub>(''s'') = 0 for each ''s'' = 0, 1, …, 13

'''(f) Recurrence relation:'''

''f<sub>n</sub>''(''s'') = max {''j''*''v<sub>n</sub>'' + ''f''<sub>''n''+1</sub>(''s'' – ''j''*''w<sub>n</sub>'')}, ''j'' = 0, 1, …, ⌊''s''/''w<sub>n</sub>''⌋

'''(g) Compute:'''

The solution will not show all the computations steps. Instead, only a few cases are given below to illustrate the idea.

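* For stage 5, ''f''<sub>5</sub>(''s'') = max<sub>''j''=0, 1, …, ⌊''s''/1⌋</sub> {''j''*0.5 + 0} = 0.5''s'', because given the all-zero states in stage 6, the maximum possible value is obtained by using up all the remaining capacity ''s''.
* For stage 4, state 7,

''f''<sub>4</sub>(7) = max<sub>''j''=0, 1, …, ⌊7/''w''<sub>4</sub>⌋</sub> {''j''*''v''<sub>4</sub> + ''f''<sub>5</sub>(7 – ''w''<sub>4</sub>*''j'')}

= max {0 + 3.5; 2 + 2; 4 + 0.5}

= 4.5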

Using the recurrence relation above, we get the following table:
{| class="wikitable"
|+
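!Unused capacity ''s''
!''f''<sub>1</sub>(''s'')
!Type 1 opt
!''f''<sub>2</sub>(''s'')
!Type 2 opt
!''f''<sub>3</sub>(''s'')
!Type 3 opt
!''f''<sub>4</sub>(''s'')
!Type 4 opt
!''f''<sub>5</sub>(''s'')
!Type 5 opt
!''f''<sub>6</sub>(''s'')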
|-
|13
|13.5
|1
|10
|2
|9.5
|3
|8.5
|4
|6.5
|13
|0
|-
|12
|13
|1
|9
|2
|9
|3
|8
|4
|6
|12
|0
|-
|11
|12
|1
|8.5
|2
|8
|2
|7
|3
|5.5
|11
|0
|-
|10
|11
|1
|8
|2
|7
|2
|6.5
|3
|5
|10
|0
|-
|9
|10
|1
|7
|1
|6.5
|2
|6
|3
|4.5
|9
|0
|-
|8
|9.5
|1
|6
|1
|6
|2
|5
|2
|4
|8
|0
|-
|7
|9
|1
|5
|1
|5
|1
|4.5
|2
|3.5
|7
|0
|-
|6
|4.5
|0
|4.5
|1
|4
|1
|4
|2
|3
|6
|0
|-
|5
|4
|0
|4
|1
|3.5
|1
|3
|1
|2.5
|5
|0
|-
|4
|3
|0
|3
|0
|3
|1
|2.5
|1
|2
|4
|0
|-
|3
|2
|0
|2
|0
|2
|0
|2
|1
|1.5
|3
|0
|-
|2
|1
|0
|1
|0
|1
|0
|1
|0
|1
|2
|0
|-
|1
|0.5
|0
|0.5
|0
|0.5
|0
|0.5
|0
|0.5
|1
|0
|-
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|0
|}
'''Optimal solution:''' The maximum benefit possible is 13.5. Tracing forward to get the optimal solution: the optimal decision corresponding to the entry 13.5 for ''f''<sub>1</sub>(13) is 1, therefore we should pack 1 unit of type 1. After that we have 13 – 7 = 6 capacity remaining, so look at ''f''<sub>2</sub>(6), which is 4.5, corresponding to the optimal decision of packing 1 unit of type 2. After this, we have 6 – 5 = 1 capacity remaining, and the optimal decisions for ''f''<sub>3</sub>(1) and ''f''<sub>4</sub>(1) are both 0, which means we are not able to pack any type 3 or type 4. Hence we go to stage 5 and find that ''f''<sub>5</sub>(1) = 0.5 with optimal decision 1, so we should pack 1 unit of type 5. This gives the entire optimal solution, as can be seen in the table below:
{| class="wikitable"
|+
! colspan="2" |Optimal solution
|-
!Product (i)
!Number of units
|-
|1
|1
|-
|2
|1
|-
|5
|1
|}
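
The recurrence and the forward trace can be checked with a short script. The sketch below uses the data from the product table; the function names <code>f</code> and <code>trace</code> are illustrative choices. Ties between decisions are broken in favor of the smaller number of units, so individual "opt" entries may differ from the table where several decisions give the same value, while the values of ''f<sub>n</sub>''(''s'') do not.

```python
from functools import lru_cache
from typing import Dict, Tuple

# Data from the product table: weight and value per unit of each product.
weights = {1: 7, 2: 5, 3: 4, 4: 3, 5: 1}
values = {1: 9, 2: 4, 3: 3, 4: 2, 5: 0.5}
N = 5          # number of product types (stages)
CAPACITY = 13  # weight capacity of the plane

@lru_cache(maxsize=None)
def f(n: int, s: int) -> Tuple[float, int]:
    """Return (max benefit, units of type n) using types n..N with capacity s."""
    if n > N:  # boundary condition: stage 6 is all zeros
        return 0.0, 0
    best = (-1.0, 0)
    for j in range(s // weights[n] + 1):
        total = j * values[n] + f(n + 1, s - j * weights[n])[0]
        if total > best[0]:
            best = (total, j)
    return best

def trace(capacity: int) -> Dict[int, int]:
    """Trace forward through the stages to recover the units packed."""
    plan, s = {}, capacity
    for n in range(1, N + 1):
        j = f(n, s)[1]
        if j:
            plan[n] = j
        s -= j * weights[n]
    return plan

print(f(1, CAPACITY)[0], trace(CAPACITY))
```

Running it reproduces the maximum benefit of 13.5 and the plan of one unit each of products 1, 2, and 5.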

==Applications==
Heuristic algorithms have become an important technique in solving current real-world problems. Their applications range from optimizing the power flow in modern power systems<ref> NIU, M., WAN, C. & Xu, Z. A review on applications of heuristic optimization algorithms for optimal power flow in modern power systems. J. Mod. Power Syst. Clean Energy 2, 289–297 (2014), https://doi.org/10.1007/s40565-014-0089-4</ref> to groundwater pumping simulation models.<ref> J. L. Wang, Y. H. Lin and M. D. Lin, "Application of heuristic algorithms on groundwater pumping source identification problems," 2015 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), Singapore, 2015, pp. 858-862, https://doi.org/10.1109/IEEM.2015.7385770.</ref> Heuristic optimization techniques are also increasingly applied in environmental engineering, for example in the design of a multilayer sorptive barrier system for a landfill liner.<ref>Matott, L. Shawn, et al. “Application of Heuristic Optimization Techniques and Algorithm Tuning to Multilayered Sorptive Barrier Design.” Environmental Science & Technology, vol. 40, no. 20, 2006, pp. 6354–6360., https://doi.org/10.1021/es052560+.</ref> Heuristic algorithms have also been applied in the fields of bioinformatics, computational biology, and systems biology.<ref>Larranaga P, Calvo B, Santana R, Bielza C, Galdiano J, Inza I, Lozano JA, Armananzas R, Santafe G, Perez A, Robles V (2006) Machine learning in bioinformatics. Brief Bioinform 7(1):86–112 </ref>

==Conclusion==
Heuristic algorithms are not a panacea, but they are handy tools when exact methods cannot be applied. Heuristics provide flexible techniques for solving hard problems, with the advantages of simple implementation and low computational cost. Over the years, heuristics have progressed through the development of hybrid systems that combine selected features from various types of heuristic algorithms such as tabu search, simulated annealing, and genetic or evolutionary computing. Future research will continue to expand the capabilities of existing heuristics to solve complex real-world problems.

==References==
<references />

2020 Cornell Optimization Open Textbook Feedback

2020-12-21T10:48:42Z

Wc593: /* Column generation algorithms */

==[[Computational complexity]]==

* Numerical Example
*# Finding subsets of a set is NOT O(2n).
* Application
*# The applications mentioned need to be discussed further.

==[[Network flow problem]]==

* Real Life Applications
*# There is NO need to include code. Simply mention how the problem was coded along with details on the LP solver used.

==[[Interior-point method for LP]]==

* Introduction
*# Please type “minimize” and “subject to” in formal optimization problem form throughout the whole page.
* A section to discuss and/or illustrate the applications
*# Please type optimization problem in the formal form.

==[[Optimization with absolute values]]==

* An introduction of the topic
*# Add few sentences on how absolute values convert optimization problem into a nonlinear optimization problem
* Applications
*# Inline equations at the beginning of this section are not formatted properly. Please fix the notation for expected return throughout the section.

==[[Matrix game (LP for game theory)]]==

* Theory and Algorithmic Discussion
*# aij are not defined in this section.

==[[Quasi-Newton methods]]==

* Theory and Algorithm
*# Please ensure that few spaces are kept between the equations and equation numbers.

==[[Eight step procedures]]==

* Numerical Example
*# Data for the example Knapsack problem (b,w) are missing.
*# How to arrive at optimal solutions is missing.

==[[Set covering problem]]==

* Numerical Example
*# Please leave some space between equation and equation number.

==[[Quadratic assignment problem]]==

* Theory, methodology, and/or algorithmic discussions
*# Discuss dynamic programming and cutting plane solution techniques briefly.

==[[Newsvendor problem]]==

* Formulation
*# A math programming formulation of the optimization problem with objective function and constraints is expected for the formulation. Please add any variant of the newsvendor problem along with some operational constraints.
*# A mathematical presentation of the solution technique is expected. Please consider any distribution for R and present a solution technique for that specific problem.

==[[Mixed-integer cuts]]==

* Applications
*# MILP and their solution techniques involving cuts are extremely versatile. Yet, only two sentences are added to describe their applications. Please discuss their applications, preferably real-world applications, in brief. Example Wikis provided on the website could be used as a reference to do so.

==[[Heuristic algorithms]]==

* Methodology
*# Please use proper symbol for "greater than or equal to".
*# Greedy method to solve minimum spanning tree seems to be missing.

==[[Branch and cut]]==

* Methodology & Algorithm
*# Equation in most infeasible branching section is not properly formatted.
*# Step 2 appears abruptly in the algorithm and does not explain much. Please add more information regarding the same.
*# Step 5 contains latex code terms that are not properly formatted. Please fix the same.
*# Fix typos: e.g., repeated “for the current”.

== [[Mixed-integer linear fractional programming (MILFP)]] ==

* Application and Modeling for Numerical Examples
*# Please check the index notation in Mass Balance Constraint

==[[Fuzzy programming]]==

* Applications
*# Applications of fuzzy programming are quite versatile. Please discuss few of the mentioned applications briefly. The provided example Wikis can be used as a reference to write this section.

==[[Adaptive robust optimization]]==

* Problem Formulation
*# Please check typos such as "Let ''u'' bee a vector".
*# The abbreviation KKT is not previously defined.

== [[Stochastic gradient descent]] ==
* Numerical Example
*# Amount of whitespace can be reduced by changing orientation of example dataset by converting it into a table containing 3 rows and 6 columns.

==[[RMSProp]]==

* Introduction
*# References at the end of the sentence should be placed after the period.
* Theory and Methodology
*# Please check grammar in this section.
* Applications and Discussion
*# The applications section does not contain any discussion on applications. Please mention a few applications of the widely used RMSprop and discuss them briefly.

==[[Adam]]==

* Background
*# References at the end of the sentence should be placed after the period.

Column generation algorithms

2020-12-21T10:46:30Z

Wc593:

Author: Lorena Garcia Fernandez (lgf572)

== Introduction ==
Column generation techniques aim to solve large linear optimization problems by generating only the variables that will have an influence on the objective function. This is important for large problems with many variables, where these techniques simplify the problem formulation because not all the possibilities need to be explicitly listed.<ref>Desrosiers, Jacques & Lübbecke, Marco. (2006). A Primer in Column Generation.p7-p14 10.1007/0-387-25486-2_1. </ref>

== Theory, methodology and algorithmic discussions ==
'''''Theory'''''

The method works as follows. First, the original problem being solved is split into two problems: the master problem and the sub-problem.

* The master problem is the original column-wise (i.e., one column at a time) formulation of the problem with only a subset of variables being considered. In this restricted form it is often called the restricted master problem (RMP).<ref>
AlainChabrier, Column Generation techniques, 2019 URL: https://medium.com/@AlainChabrier/column-generation-techniques-6a414d723a64
</ref>

* The sub-problem is a new problem created to identify a new promising variable. The objective function of the sub-problem is the reduced cost of the new variable with respect to the current dual variables, and the constraints require that the variable obey the naturally occurring constraints. The sub-problem is also referred to as the pricing problem. From this we can infer that this method is a good fit for problems whose constraint set admits a natural breakdown (i.e., decomposition) into sub-systems representing a well-understood combinatorial structure.<ref>
AlainChabrier, Column Generation techniques, 2019 URL: https://medium.com/@AlainChabrier/column-generation-techniques-6a414d723a64
</ref>

To execute that decomposition from the original problem into Master and subproblems there are different techniques. The theory behind this method relies on the Dantzig-Wolfe decomposition.<ref>Dantzig-Wolfe decomposition. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Dantzig-Wolfe_decomposition&oldid=50750</ref>

In summary, when the master problem is solved, we obtain dual prices for each of the constraints in the master problem. This information is then used in the objective function of the subproblem, and the subproblem is solved. For a minimization problem, if the objective value of the subproblem is negative, a variable with negative reduced cost has been identified. This variable is then added to the master problem, and the master problem is re-solved. Re-solving the master problem generates a new set of dual values, and the process is repeated until no variables with negative reduced cost are identified. If the subproblem returns a solution with non-negative reduced cost, we can conclude that the solution to the master problem is optimal.<ref>Wikipedia, the free encyclopeda. Column Generation. URL: https://en.wikipedia.org/wiki/Column_generation</ref>
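
To make the pricing step concrete, the sketch below solves the subproblem for a cutting-stock setting, which is an assumption introduced here for illustration and is not discussed above. Each column is a cutting pattern with cost 1 (one roll), the duals are the prices of the demand constraints of a hypothetical restricted master problem, and the reduced cost of a pattern ''a'' is 1 − Σ<sub>''i''</sub> π<sub>''i''</sub>''a<sub>i</sub>''. Finding the most negative reduced cost is then an integer knapsack problem, solved here by dynamic programming. The function name <code>price_pattern</code> and all numbers are hypothetical.

```python
from typing import List, Tuple

def price_pattern(widths: List[int], duals: List[float],
                  roll_width: int) -> Tuple[float, List[int]]:
    """Pricing subproblem (assumed cutting-stock form): maximize
    sum(duals[i] * a[i]) subject to sum(widths[i] * a[i]) <= roll_width,
    with a[i] non-negative integers. Returns (reduced cost, pattern)."""
    best = [0.0] * (roll_width + 1)   # best[c]: max dual value with capacity c
    choice = [-1] * (roll_width + 1)  # item added at capacity c, -1 = none
    for cap in range(1, roll_width + 1):
        best[cap] = best[cap - 1]     # option: leave one unit of width unused
        for i, w in enumerate(widths):
            if w <= cap and best[cap - w] + duals[i] > best[cap]:
                best[cap] = best[cap - w] + duals[i]
                choice[cap] = i
    # Walk back through the choices to recover the pattern.
    pattern = [0] * len(widths)
    cap = roll_width
    while cap > 0:
        i = choice[cap]
        if i == -1:
            cap -= 1
        else:
            pattern[i] += 1
            cap -= widths[i]
    return 1.0 - best[roll_width], pattern

# Hypothetical data: piece widths, dual prices from a restricted master
# problem, and the width of one roll.
reduced_cost, pattern = price_pattern([3, 5, 7], [0.2, 1 / 3, 0.5], 16)
print(reduced_cost, pattern)
if reduced_cost < -1e-9:
    print("add this pattern as a new column and re-solve the master problem")
```

With these duals the best pattern has a negative reduced cost, so in a full implementation it would be added to the master problem, the master problem would be re-solved with an LP solver, and the new duals would be priced again.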

'''''Methodology'''''<ref>L.A. Wolsey, Integer programming. Wiley,Column Generation Algorithms p185-p189,1998</ref>
[[File:Column Generation.png|thumb|468x468px|Column generation schematics<ref name=":4">GERARD. (2005). Personnel and Vehicle scheduling, Column Generation, slide 12. URL: https://slideplayer.com/slide/6574/</ref>]]
Consider the problem in the form:

(IP)
<math>z=max\left \{\sum_{k=1}^{K}c^{k}x^{k}:\sum_{k=1}^{K}A^{k}x^{k}=b,x^{k}\in X^{k}\; \; \; for\; \; \; k=1,...,K \right \}</math>

Where <math>X^{k}=\left \{x^{k}\in Z_{+}^{n_{k}}: D^{k}x^{k}\leq d^{k} \right \}</math> for <math>k=1,...,K</math>. Assuming that each set <math>X^{k}</math> contains a large but finite set of points <math>\left \{ x^{k,t} \right \}_{t=1}^{T_{k}}</math>, we have that <math>X^{k}</math> equals:

<math>\left \{ x^{k}\in R^{n_{k}}:x^{k}=\sum_{t=1}^{T_{k}}\lambda _{k,t}x^{k,t},\sum_{t=1}^{T_{k}}\lambda _{k,t}=1,\lambda _{k,t}\in \left \{ 0,1 \right \}for \; \; t=1,...,T_{k} \right \}</math>

Note that, on the assumption that each of the sets <math>X^{k}</math> is bounded for <math>k=1,...,K</math>, the approach involves solving an equivalent problem of the form below:

<math>max\left \{ \sum_{k=1}^{K}\gamma ^{k}\lambda ^{k}: \sum_{k=1}^{K}B^{k}\lambda ^{k}=\beta ,\lambda ^{k}\geq 0\; \; integer\; \; for\; \; k=1,...,K \right \}</math>

where each matrix <math>B^{k}</math> has a very large number of columns, one for each of the feasible points in <math>X^{k}</math>, and each vector <math>\lambda ^{k}</math> contains the corresponding variables.

Now, substituting for <math>x^{k}</math> leads to an equivalent ''IP Master Problem (IPM)'':

(IPM)
<math>\begin{matrix}
z=max\sum_{k=1}^{K}\sum_{t=1}^{T_{k}}\left(c^{k}x^{k,t}\right )\lambda _{k,t} \\
\sum_{k=1}^{K}\sum_{t=1}^{T_{k}}\left ( A^{k}x^{k,t} \right )\lambda _{k,t}=b\\
\sum_{t=1}^{T_{k}}\lambda _{k,t}=1\; \; for\; \; k=1,...,K \\
\lambda _{k,t}\in \left \{ 0,1 \right \}\; \; for\; \; t=1,...,T_{k}\; \; and\; \; k=1,...,K.
\end{matrix}</math>

We use a column generation algorithm to solve the linear programming relaxation of the Integer Programming Master Problem, called the ''Linear Programming Master Problem (LPM)'':

(LPM)
<math>\begin{matrix}
z^{LPM}=max\sum_{k=1}^{K}\sum_{t=1}^{T_{k}}\left ( c^{k}x^{k,t} \right )\lambda _{k,t}\\
\sum_{k=1}^{K}\sum_{t=1}^{T_{k}}\left ( A^{k}x^{k,t} \right )\lambda _{k,t}=b \\
\sum_{t=1}^{T_{k}}\lambda _{k,t}=1\; \;for\; \; k=1,...,K \\
\lambda _{k,t} \geq 0\; \; for\; \; t=1,...,T_{k},\; k=1,...,K
\end{matrix}</math>

Where there is a column <math>\begin{pmatrix}
c^{k}x\\
A^{k}x\\
e_{k}
\end{pmatrix}</math> for each <math>x\in X^{k}</math>. In the next steps of this method, we will use <math>\left \{ \pi _{i} \right \}_{i=1}^{m}</math> as the dual variables associated with the joint constraints, and <math>\left \{ \mu_{k} \right \}_{k=1}^{K}</math> as dual variables for the second set of constraints. The latter are also known as convexity constraints.
The idea is to solve the linear program by the primal simplex algorithm. However, the pricing step of choosing a column to enter the basis must be modified because of the very large number of columns involved. Rather than pricing the columns one at a time, the question of finding a column with the largest reduced price is itself posed as a set of <math>K</math> optimization problems.

''Initialization:'' we suppose that a subset of columns (at least one for each <math>k</math>) is available, providing a feasible ''Restricted Linear Programming Master Problem'':

(RLPM)
<math>\begin{matrix}
\tilde{z}^{LPM}=max\; \tilde{c}\tilde{\lambda} \\
\tilde{A}\tilde{\lambda }=\tilde{b} \\
\tilde{\lambda }\geq 0
\end{matrix}</math>

where <math>\tilde{b}=\begin{pmatrix}
b\\
1\\
\end{pmatrix}</math>, <math>\tilde{A}</math> is generated by the available set of columns, and <math>\tilde{c}</math> and <math>\tilde{\lambda }</math> are the corresponding costs and variables. Solving the RLPM gives an optimal primal solution <math>\tilde{\lambda }^{*}</math> and an optimal dual solution <math>\left ( \pi ,\mu \right )\in R^{m}\times R^{K}</math>.

''Primal feasibility:'' Any feasible solution of ''RLPM'' is feasible for ''LPM''. In particular, <math>\tilde{\lambda }^{*}</math> is a feasible solution of ''LPM'', and hence <math>\tilde{z}^{LPM}=\tilde{c}\tilde{\lambda }^{*}=\sum_{i=1}^{m}\pi _{i}b_{i}+\sum_{k=1}^{K}\mu _{k}\leq z^{LPM}</math>.

''Optimality check for LPM:'' We need to check whether <math>\left ( \pi ,\mu \right )</math> is dual feasible for ''LPM''. This means checking, for each column (that is, for each <math>k</math> and each <math>x \in X^{k}</math>), whether the reduced price satisfies <math>c^{k}x-\pi A^{k}x-\mu _{k}\leq 0</math>. Rather than examining each point separately, we treat all points in <math>X^{k}</math> implicitly, by solving an optimization subproblem:

<math>\zeta _{k}=max\left \{ \left (c^{k}-\pi A^{k} \right )x-\mu _{k}\; :\; x \in X^{k}\right \}.</math>

''Stopping criterion:'' If <math>\zeta _{k}\leq 0</math> for <math>k=1,...,K</math>, the solution <math>\left ( \pi ,\mu \right )</math> is dual feasible for ''LPM'', and hence <math>z^{LPM}\leq \sum_{i=1}^{m}\pi _{i}b_{i}+\sum_{k=1}^{K}\mu _{k}</math>. As the value of the primal feasible solution <math>\tilde{\lambda }^{*}</math> equals this upper bound, <math>\tilde{\lambda }^{*}</math> is optimal for ''LPM''.

''Generating a new column:'' If <math>\zeta _{k}> 0</math> for some <math>k</math>, the column corresponding to the optimal solution <math>\tilde{x}^{k}</math> of the subproblem has a positive reduced price. Introducing the column <math>\begin{pmatrix}
c^{k}\tilde{x}^{k}\\
A^{k}\tilde{x}^{k}\\
e_{k}
\end{pmatrix}</math> then leads to a new Restricted Linear Programming Master Problem that can be easily reoptimized (e.g., by the primal simplex algorithm).
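
The subproblem values also bound the optimal value before the stopping criterion is met. The short derivation below is a sketch that uses only the notation of this section. By the definition of <math>\zeta _{k}</math>, every <math>x \in X^{k}</math> satisfies <math>\left (c^{k}-\pi A^{k} \right )x-\left ( \mu _{k}+\zeta _{k} \right )\leq 0</math>, so replacing each <math>\mu _{k}</math> by <math>\mu _{k}+\zeta _{k}</math> gives a dual feasible solution of ''LPM''. Combined with the primal feasibility of the restricted solution, this gives:

<math>\sum_{i=1}^{m}\pi _{i}b_{i}+\sum_{k=1}^{K}\mu _{k}\leq z^{LPM}\leq \sum_{i=1}^{m}\pi _{i}b_{i}+\sum_{k=1}^{K}\mu _{k}+\sum_{k=1}^{K}\zeta _{k}</math>

The gap between the value of the restricted problem and <math>z^{LPM}</math> is therefore at most <math>\sum_{k=1}^{K}\zeta _{k}</math>, which can serve as a measure of progress while columns are still being generated.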

== Numerical example: The Cutting Stock problem<ref>L.A. Wolsey, Integer programming. Wiley, Column Generation Algorithms, p185-p189, 1998</ref> ==

Suppose we want to solve a numerical example of the cutting stock problem, specifically a one-dimensional cutting stock problem.

''Problem Overview''

A company produces steel bars with diameter <math>45</math> millimeters and length <math>33</math> meters. The company also takes care of cutting the bars for their different customers, who each require different lengths. At the moment, the following demand forecast is expected and must be satisfied:
{| class="wikitable"
! Pieces needed
! Piece length (m)
! Type of item
|-
|144
|6
|1
|-
|105
|13.5
|2
|-
|72
|15
|3
|-
|30
|16.5
|4
|-
|24
|22.5
|5
|}
The objective is to determine the minimum number of steel bars needed to satisfy the total demand.

A possible model for the problem, proposed by Gilmore and Gomory in the 1960s, is the following:

'''Sets'''

<math>K=\left \{ 1,2,3,4,5 \right \}</math>: set of item types;

''<math display="inline">S</math>:'' set of patterns (i.e., possible ways) that can be adopted to cut a given bar into portions of the needed lengths.

'''Parameters'''

<math display="inline">M</math>: bar length (before the cutting process);

<math display="inline">L_k</math>'':'' length of item ''<math display="inline">k</math>'' ''<math display="inline">\in</math> <math display="inline">K</math>'';

<math display="inline">R_k</math> : number of pieces of type ''<math display="inline">k</math>'' ''<math display="inline">\in</math> <math display="inline">K</math>'' required;

<math display="inline">N_{k,s}</math> : number of pieces of type ''<math display="inline">k</math>'' ''<math display="inline">\in</math> <math display="inline">K</math>'' in pattern ''<math display="inline">s</math>'' ''<math display="inline">\in</math> <math display="inline">S</math>''.

'''Decision variables'''

<math display="inline">y_s</math> : number of bars that should be portioned using pattern ''<math display="inline">s</math>'' ''<math display="inline">\in</math> <math display="inline">S</math>''.

'''Model'''

<math>\begin{matrix}\min \sum_{s\in S}y_s \\ \ s.t. \sum_{s\in S}N_{k,s}y_s\geq R_k \quad \forall k\in K \\ y_s\in \mathbb{Z}_+ \quad \forall s\in S \end{matrix}</math>

''Solving the problem''

The model assumes the availability of the set ''<math display="inline">S</math>'' and the parameters <math display="inline">N_{k,s}</math>. To generate this data, one would have to list all possible cutting patterns. However, the number of possible cutting patterns is extremely large, which is why a direct implementation of the model above is not practical in real-world problems. In this situation it makes sense to solve the continuous relaxation of the model instead. In practice, the demand figures are so high that the number of bars to cut is also large, and therefore a good solution can be obtained by rounding up to the next integer each variable <math>y_s</math> found by solving the continuous relaxation. In addition, the solution of the relaxed problem becomes the starting point for an exact solution method (for instance, Branch-and-Bound).<blockquote>''Key take-away: In the next steps of this example we will analyze how to solve the continuous relaxation of the model.''</blockquote>As a starting point, we need any feasible solution. Such a solution can be constructed as follows:

# We consider the single-item cutting patterns, i.e., <math>|K|</math> configurations, each containing <math display="inline">N_{k,k} = \left\lfloor \frac{M}{L_k} \right\rfloor</math> pieces of type <math>k</math>;
# Set <math display="inline">y_{k} = \frac{R_k}{N_{k,k}}</math> for pattern <math>k</math> (where pattern <math>k</math> is the pattern containing only pieces of type <math>k</math>). Rounding up to <math display="inline">\left\lceil \frac{R_k}{N_{k,k}} \right\rceil</math> gives a feasible integer solution.
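
The construction above can be checked with a few lines of Python. This is only a sketch: the variable names are chosen here for illustration, and exact fractions are used so that lengths such as 13.5 m cause no rounding issues.

```python
from fractions import Fraction
from math import floor, ceil

M = Fraction(33)                                                 # bar length (m)
L = [Fraction(s) for s in ("6", "13.5", "15", "16.5", "22.5")]   # piece lengths (m)
R = [144, 105, 72, 30, 24]                                       # pieces required

# Pieces of type k obtained from one bar cut with single-item pattern k.
N = [floor(M / l) for l in L]

# Feasible solution of the continuous relaxation, and its rounded-up version.
y = [Fraction(r, n) for r, n in zip(R, N)]
y_int = [ceil(v) for v in y]

print(N)                       # [5, 2, 2, 2, 1]
print([float(v) for v in y])   # [28.8, 52.5, 36.0, 15.0, 24.0]
print(sum(y_int))              # 157
```

The rounded-up solution uses 157 bars, which gives a first upper bound on the number of bars needed.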

The same solution is obtained by applying the simplex method to the model (without integrality constraints), considering only the decision variables that correspond to the above single-item patterns. With <math>M=33</math>, each of these patterns yields <math>\lfloor M/L_k \rfloor</math> pieces per bar, i.e., 5, 2, 2, 2 and 1 pieces, respectively:

<math>\begin{align}
\text{min} & ~~ y_{1}+y_{2}+y_{3}+y_{4}+y_{5}\\
\text{s.t.} & ~~ 5y_{1} \ge 144\\
\ & ~~ 2y_{2} \ge 105\\
\ & ~~ 2y_{3} \ge 72\\
\ & ~~ 2y_{4} \ge 30\\
\ & ~~ y_{5} \ge 24\\
\ & ~~ y_{1},y_{2},y_{3},y_{4},y_{5} \ge 0\\
\end{align}</math>

Solving this problem (for example, with the CPLEX solver in GAMS) gives the following solution:
{| class="wikitable"
|Y1
|28.8
|-
|Y2
|52.5
|-
|Y3
|36
|-
|Y4
|15
|-
|Y5
|24
|}
Next, a new possible pattern (number <math>6</math>) is considered. This pattern contains one piece of item type <math>1</math> and one piece of item type <math>5</math>. The question is whether the current solution would remain optimal if this new pattern were allowed. Duality helps answer this question. At every iteration of the simplex method, the outcome is a basic feasible solution (corresponding to some basis <math>B</math>) for the primal problem and a dual solution (the multipliers <math>u^{T}=c_{B}^{T}B^{-1}</math>) that satisfy the complementary slackness conditions. (Note: the dual solution is feasible only when the last iteration is reached.)

The inclusion of new pattern <math>6</math> corresponds to including a new variable in the primal problem, with objective cost <math>1</math> (as each time pattern <math>6</math> is chosen, one bar is cut) and corresponding to the following column in the constraint matrix:

<math>D_6= \begin{bmatrix}
\ 1 \\
\ 0 \\
\ 0 \\
\ 0 \\
\ 1 \\
\end{bmatrix}</math>

This new variable creates a new dual constraint. We then have to check whether this constraint is violated by the current dual solution (in other words, ''whether the reduced cost of the new variable with respect to basis <math>B</math> is negative'').

The new dual constraint is: <math>1\times u_{1}+0\times u_{2}+0\times u_{3}+0\times u_{4}+1\times u_{5}\leq 1</math>

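The solution of the dual problem can be computed with different software packages (GAMS, for instance), or by hand. The table below shows the dual solution for this example.

(Note that the solution of the dual problem is <math>u^{T}=c_{B}^{T}B^{-1}</math>. With the five single-item patterns in the basis, <math>B</math> is diagonal with entries <math>\lfloor M/L_k \rfloor = 5, 2, 2, 2, 1</math>, so <math>u_k = 1/\lfloor M/L_k \rfloor</math>.)

{| class="wikitable"
! Dual variable
! Value
|-
|u<sub>1</sub>
|0.2
|-
|u<sub>2</sub>
|0.5
|-
|u<sub>3</sub>
|0.5
|-
|u<sub>4</sub>
|0.5
|-
|u<sub>5</sub>
|1
|}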
Since <math>0.2+1=1.2> 1</math>, the new constraint is violated.

This means that the current primal solution (in which the new variable is <math>y_{6}=0</math>) may no longer be optimal (although it is still feasible). The fact that the dual constraint is violated means that the associated primal variable has negative reduced cost:

<math>\bar{c}_6 = c_6-u^TD_6=1-1.2=-0.2</math>
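
The dual check can be reproduced numerically. The sketch below assumes that the basis consists of the five single-item patterns, each yielding <math>\lfloor M/L_k \rfloor</math> pieces per bar, so that the basis matrix is diagonal; the variable names are illustrative only.

```python
from fractions import Fraction

N = [5, 2, 2, 2, 1]      # pieces per bar in the single-item patterns (assumed basis)
D6 = [1, 0, 0, 0, 1]     # candidate pattern 6: one piece of type 1, one of type 5
c6 = 1                   # each use of a pattern consumes one bar

# With a diagonal basis B = diag(N) and unit costs, u^T = c_B^T B^{-1} gives u_k = 1/N_k.
u = [Fraction(1, n) for n in N]

lhs = sum(uk * dk for uk, dk in zip(u, D6))   # left-hand side of the new dual constraint
reduced_cost = c6 - lhs

print([float(v) for v in u])   # [0.2, 0.5, 0.5, 0.5, 1.0]
print(float(lhs))              # 1.2
print(float(reduced_cost))     # -0.2
```

A left-hand side above 1 and a negative reduced cost are two views of the same fact: pattern 6 can improve the current solution.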

To improve the solution, the next step is to let <math>y_{6}</math> enter the basis. To do so, we modify the problem by inserting the new variable, keeping for each single-item pattern its <math>\lfloor M/L_k \rfloor</math> pieces per bar:

<math>\begin{align}
\text{min} & ~~ y_{1}+y_{2}+y_{3}+y_{4}+y_{5}+y_{6}\\
\text{s.t.} & ~~ 5y_{1} +y_{6}\ge 144\\
\ & ~~ 2y_{2} \ge 105\\
\ & ~~ 2y_{3} \ge 72\\
\ & ~~ 2y_{4} \ge 30\\
\ & ~~ y_{5}+y_{6} \ge 24\\
\ & ~~ y_{1},y_{2},y_{3},y_{4},y_{5},y_{6} \ge 0\\
\end{align}</math>

If this problem is solved with the simplex method, the optimal solution is found, but restricted only to patterns <math>1</math> to <math>6</math>. If a new pattern becomes available, we can decide whether it should be used by proceeding as above. However, the problem is how to find a pattern (i.e., a variable, or equivalently a column of the matrix) whose reduced cost is negative (i.e., one that is worth including in the formulation). Note that the number of possible patterns is exponentially large, and the patterns are not even all known explicitly. The question then is:

''Given a basic optimal solution for the problem in which only some variables are included, how can we find (if any exists) a variable with negative reduced cost (i.e., a constraint violated by the current dual solution)?''

This question can be transformed into an optimization problem: in order to see whether a variable with negative reduced cost exists, we can look for the minimum of the reduced costs of all possible variables and check whether this minimum is negative:

<math>\bar{c}=1-u^Tz</math>

Every column of the constraint matrix corresponds to a cutting pattern, and every entry of the column says how many pieces of a certain type are in that pattern. Hence, for <math>z</math> to be a possible column of the constraint matrix, the following conditions must be satisfied:

<math display="inline">\begin{matrix}z_k\in \mathbb{Z}_+ \quad \forall k\in K \\ \ \sum_{k\in K}L_kz_k \leq M \end{matrix}</math>

This converts the problem of finding a variable with negative reduced cost into the integer linear programming problem below:

<math>\begin{matrix}\min\ \bar{c} = 1 - \sum_{k\in K} u_k z_k \\ \ s.t. \sum_{k\in K}L_kz_k \leq M \\ z_k\in \mathbb{Z}_+ \quad \forall k\in K \end{matrix}</math>

which, in turn, is equivalent to the formulation below (we write the objective in maximization form and drop the additive constant <math>1</math>):

<math>\begin{matrix} \max\sum_{k\in K} u_k z_k \\ \ s.t. \sum_{k\in K}L_kz_k \leq M \\ z_k\in \mathbb{Z}_+ \quad \forall k\in K \end{matrix}</math>

The coefficients <math>z_k</math> of a column with negative reduced cost can be found by solving the above integer [[wikipedia:Knapsack_problem|"knapsack"]] problem (a classical problem in integer programming).

In our example, if we start from the problem restricted to the five single-item patterns, whose dual solution is <math>u_k = 1/\lfloor M/L_k \rfloor</math>, i.e., <math>u=(0.2,\ 0.5,\ 0.5,\ 0.5,\ 1)</math>, the above problem reads as:

<math>\begin{align}
\text{max} & ~~ 0.2z_{1}+0.5z_{2}+0.5z_{3}+0.5z_{4}+z_{5}\\
\text{s.t.} & ~~ 6z_{1} +13.5z_{2}+15z_{3}+16.5z_{4}+22.5z_{5}\le 33\\
\ & ~~ z_{1},z_{2},z_{3},z_{4},z_{5}\in \mathbb{Z}_+\\
\end{align}</math>

An optimal solution, with objective value <math>1.2</math>, is <math>z^T= [1 \quad 0\quad 0\quad 0\quad 1]</math>.

This matches the pattern we called <math>D_6</math> earlier on this page.
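
The pricing knapsack is small enough to be solved by dynamic programming. The sketch below is one possible implementation, not the method used to produce the results on this page. The function name <code>price_pattern</code> is hypothetical, the lengths are expressed in half-metres so that all weights are integers, and the duals are assumed to be <math>u_k = 1/\lfloor M/L_k \rfloor</math>.

```python
from fractions import Fraction

def price_pattern(u, lengths, bar):
    """Unbounded integer knapsack solved by dynamic programming over capacities."""
    best = [Fraction(0)] * (bar + 1)   # best[c]: largest dual value within capacity c
    choice = [None] * (bar + 1)        # item added last at capacity c (None: unit left unused)
    for c in range(1, bar + 1):
        best[c] = best[c - 1]
        for k, w in enumerate(lengths):
            if w <= c and best[c - w] + u[k] > best[c]:
                best[c], choice[c] = best[c - w] + u[k], k
    z, c = [0] * len(lengths), bar
    while c > 0:                       # walk back through the recorded choices
        if choice[c] is None:
            c -= 1
        else:
            z[choice[c]] += 1
            c -= lengths[choice[c]]
    return best[bar], z

lengths = [12, 27, 30, 33, 45]         # 6, 13.5, 15, 16.5, 22.5 m in half-metres
u = [Fraction(1, 5), Fraction(1, 2), Fraction(1, 2), Fraction(1, 2), Fraction(1)]
value, z = price_pattern(u, lengths, 66)
print(value, z, 1 - value)             # knapsack value, pattern, reduced cost
```

The knapsack value is 6/5, so the reduced cost <math>1-1.2=-0.2</math> is negative. Two patterns attain this value, <math>[1 \quad 0\quad 0\quad 0\quad 1]</math> and <math>[1 \quad 2\quad 0\quad 0\quad 0]</math>, and either one is a valid column to add.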

''Optimality test''

If <math display="inline">\sum_{k=1}^{K}z_{k}^{*}u_{k}^{*}\leq 1</math>, then <math>y^*</math> is an optimal solution of the full continuous relaxation (that is, including all patterns in ''<math display="inline">S</math>'').

If this condition does not hold, we update the master problem by including in the restricted pattern set ''<math display="inline">S'</math>'' the pattern <math>\lambda</math> defined by <math>N_{k,\lambda}=z_{k}^{*}</math> (in practical terms, the column <math>z^*</math> is added to the constraint matrix).

For this example, starting from the five single-item patterns, the optimality test is not met, since <math>\sum_{k=1}^{K}z_{k}^{*}u_{k}^{*}=0.2+1=1.2 > 1</math>. The pattern <math>z^*</math> (pattern <math>6</math>) is therefore added, and we go back to solving the master problem and the pricing problem, as discussed in the methodology section of this page, until the test is satisfied.

'''''Algorithm discussion'''''

The critical part of the method is the column generation subproblem, i.e., generating the new columns. It is not reasonable to compute the reduced costs of all variables <math>y_s</math> for <math>s=1,...,|S|</math>, otherwise this procedure would reduce to the simplex method. In fact, <math>|S|</math> can be very large (as in the cutting stock problem) or, for some reason, it might not be possible or convenient to enumerate all decision variables. It is then necessary to devise a specific column generation algorithm for each problem; ''only if such an algorithm exists (and is practical)'' can the method be fully applied. In the one-dimensional cutting stock problem, we transformed the column generation subproblem into an easily solvable integer linear programming problem. In other cases, the computational effort required to solve the subproblem may be so high that applying the full procedure becomes inefficient.
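
To illustrate how the pieces fit together, the sketch below runs the whole loop on the cutting stock data of this page. It is a minimal illustration under stated assumptions, not a production implementation: the restricted master problem is handled by explicit basis changes with exact fractions instead of an LP solver, lengths are in half-metres, and the function names (<code>solve</code>, <code>knapsack</code>, <code>column_generation</code>) are chosen here.

```python
from fractions import Fraction

BAR = 66                          # 33 m in half-metres, so all lengths are integers
LENGTHS = [12, 27, 30, 33, 45]    # 6, 13.5, 15, 16.5, 22.5 m
DEMAND = [144, 105, 72, 30, 24]

def solve(rows, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(rows)
    t = [[Fraction(v) for v in row] + [Fraction(rhs[i])] for i, row in enumerate(rows)]
    for c in range(n):
        p = next(r for r in range(c, n) if t[r][c] != 0)   # pivot row
        t[c], t[p] = t[p], t[c]
        piv = t[c][c]
        t[c] = [v / piv for v in t[c]]
        for r in range(n):
            if r != c and t[r][c] != 0:
                f = t[r][c]
                t[r] = [a - f * b for a, b in zip(t[r], t[c])]
    return [row[n] for row in t]

def knapsack(u):
    """Pricing: maximise sum(u[k] * z[k]) subject to sum(LENGTHS[k] * z[k]) <= BAR."""
    best, choice = [Fraction(0)] * (BAR + 1), [None] * (BAR + 1)
    for c in range(1, BAR + 1):
        best[c] = best[c - 1]
        for k, w in enumerate(LENGTHS):
            if w <= c and best[c - w] + u[k] > best[c]:
                best[c], choice[c] = best[c - w] + u[k], k
    z, c = [0] * len(LENGTHS), BAR
    while c > 0:
        if choice[c] is None:
            c -= 1
        else:
            z[choice[c]] += 1
            c -= LENGTHS[choice[c]]
    return best[BAR], z

def column_generation():
    m = len(LENGTHS)
    # Basis columns: start from the single-item patterns, each with cost 1.
    cols = [[BAR // LENGTHS[k] if i == k else 0 for i in range(m)] for k in range(m)]
    cost = [1] * m
    while True:
        basis = [[cols[j][i] for j in range(m)] for i in range(m)]
        y = solve(basis, DEMAND)              # basic primal solution: B y = R
        u = solve(cols, cost)                 # duals: B^T u = c_B
        negative = [k for k in range(m) if u[k] < 0]
        if negative:                          # surplus column of a ">=" row prices out
            enter, enter_cost = [-1 if i == negative[0] else 0 for i in range(m)], 0
        else:
            value, enter = knapsack(u)
            enter_cost = 1
            if value <= 1:                    # no pattern has negative reduced cost
                return cols, y, sum(c * v for c, v in zip(cost, y))
        d = solve(basis, enter)               # entering column expressed in the basis
        leave = min((j for j in range(m) if d[j] > 0), key=lambda j: y[j] / d[j])
        cols[leave], cost[leave] = enter, enter_cost

patterns, y, objective = column_generation()
print(patterns)
print(float(objective))    # 138.75
```

Starting from the five single-item patterns (objective 156.3), the loop adds three new patterns and stops with a relaxation value of 138.75 bars. Only the patterns proposed by the knapsack subproblem are ever generated; the full pattern set is never enumerated.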

== Applications ==
As previously mentioned, column generation techniques are most relevant when the problem has a very large number of variables relative to the number of constraints. Some common applications are:

* Bandwidth packing
* Bus driver scheduling
* Generally, column generation algorithms are used for large delivery networks, often in combination with other methods, helping to implement real-time solutions for on-demand logistics. We discuss a supply chain scheduling application below.

'''''Bandwidth packing'''''

The objective of this problem is to allocate bandwidth in a telecommunications network to maximize total revenue. The routing of a set of traffic demands between different users is to be decided, taking into account the capacity of the network arcs and the fact that the traffic between each pair of users cannot be split. The problem can be formulated as an integer programming problem, and its linear programming relaxation solved using column generation and the simplex algorithm. To solve the IP, Parker and Ryan use a branch and bound procedure that branches upon a particular path.<ref name=":3">Parker, Mark & Ryan, Jennifer. (1993). A column generation algorithm for bandwidth packing. Telecommunication Systems. 2. 185-195. 10.1007/BF02109857. </ref> The column generation algorithm greatly reduces the complexity of this problem.

'''''Bus driver scheduling'''''

Bus driver scheduling aims to find the minimum number of bus drivers to cover a published timetable of a bus company. When scheduling bus drivers, contractual working rules must be enforced, thus complicating the problem. A column generation algorithm can decompose this complicated problem into a master problem and a series of pricing subproblems. The master problem would select optimal duties from a set of known feasible duties, and the pricing subproblem would augment the feasible duty set to improve the solution obtained in the master problem.<ref name=":2">Dung‐Ying Lin, Ching‐Lan Hsu. Journal of Advanced Transportation. Volume50, Issue8, December 2016, Pages 1598-1615. URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/atr.1417</ref>

'''''Supply Chain scheduling problem'''''

A typical application is the problem of scheduling a set of shipments between different nodes of a supply chain network. Each shipment has a fixed departure time, as well as an origin and a destination node, which, combined, determine the duration of the associated trip. The aim is to schedule as many shipments as possible, while also minimizing the number of vehicles utilized for this purpose. This problem can be formulated as an integer programming model with an associated branch and price solution algorithm. The optimal solution to the LP relaxation of the problem can be obtained through column generation, which solves a linear program with a huge number of variables without explicitly considering all of them. In the context of this application, the master problem schedules the maximum possible number of shipments using only a small set of vehicle-routes, and a column generation (colgen) sub-problem generates cost-effective vehicle-routes to be fed into the master problem. After finding the optimal solution to the LP relaxation of the problem, the algorithm branches on the fractional decision variables (vehicle-routes) in order to reach the optimal integer solution.<ref name=":1">Kozanidis, George. (2014). Column generation for scheduling shipments within a supply chain network with the minimum number of vehicles. OPT-i 2014 - 1st International Conference on Engineering and Applied Sciences Optimization, Proceedings. 888-898</ref>

== Conclusions ==
Column generation is a way of starting with a small, manageable part of a problem (specifically, with some of the variables), solving that part, analyzing that interim solution to find the next part of the problem (specifically, one or more variables) to add to the model, and then solving the full or extended model. In the column generation method, the algorithm steps are repeated until an optimal solution to the entire problem is achieved.<ref> ILOG CPLEX 11.0 User's Manual > Discrete Optimization > Using Column Generation: a Cutting Stock Example > What Is Column Generation? 1997-2007. URL:http://www-eio.upc.es/lceio/manuals/cplex-11/html/usrcplex/usingColumnGen2.html#:~:text=In%20formal%20terms%2C%20column%20generation,method%20of%20solving%20the%20problem.&text=By%201960%2C%20Dantzig%20and%20Wolfe,problems%20with%20a%20decomposable%20structure</ref>

This algorithm provides a way of solving a linear programming problem by adding columns (corresponding to variables) during the pricing phase, for problems that would otherwise be very tedious to formulate and compute. Generating a column in the primal formulation of a linear programming problem corresponds to adding a constraint in its dual formulation.

== References ==

Column generation algorithms

2020-12-21T10:45:10Z

Wc593:

Author: Lorena Garcia Fernandez (lgf572)

== Introduction ==
Column generation techniques aim to solve large linear optimization problems by generating only the variables that will have an influence on the objective function. This is important for large problems with many variables, where these techniques simplify the problem formulation, since not all the possibilities need to be explicitly listed.<ref>Desrosiers, Jacques & Lübbecke, Marco. (2006). A Primer in Column Generation. p7-p14. 10.1007/0-387-25486-2_1. </ref>

== Theory, methodology and algorithmic discussions ==
'''''Theory'''''

The method works as follows: first, the original problem is split into two problems, the master problem and the sub-problem.

* The master problem is the original column-wise (i.e., one column at a time) formulation of the problem with only a subset of variables being considered.<ref>
Alain Chabrier, Column Generation techniques, 2019 URL: https://medium.com/@AlainChabrier/column-generation-techniques-6a414d723a64
</ref>

* The sub-problem is a new problem created to identify a new promising variable. The objective function of the sub-problem is the reduced cost of the new variable with respect to the current dual variables, and the constraints require that the variable obeys the naturally occurring constraints. The sub-problem is also referred to as the pricing problem, while the master problem restricted to the current subset of variables is referred to as the RMP or “restricted master problem”. From this we can infer that this method is a good fit for problems whose constraint set admits a natural breakdown (i.e., decomposition) into sub-systems representing a well understood combinatorial structure.<ref>
Alain Chabrier, Column Generation techniques, 2019 URL: https://medium.com/@AlainChabrier/column-generation-techniques-6a414d723a64
</ref>

There are different techniques for decomposing the original problem into a master problem and subproblems. The theory behind this method relies on the Dantzig-Wolfe decomposition.<ref>Dantzig-Wolfe decomposition. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Dantzig-Wolfe_decomposition&oldid=50750</ref>

In summary, when the master problem is solved, we obtain dual prices for each of its constraints. This information is then used in the objective function of the subproblem, which is solved next. If the objective value of the subproblem is negative, a variable with negative reduced cost has been identified. This variable is then added to the master problem, and the master problem is re-solved. Re-solving the master problem generates a new set of dual values, and the process is repeated until no negative reduced cost variables are identified. When the subproblem returns a solution with non-negative reduced cost, we can conclude that the solution to the master problem is optimal.<ref>Wikipedia, the free encyclopedia. Column Generation. URL: https://en.wikipedia.org/wiki/Column_generation</ref>
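
The loop described in this summary can be sketched in a few lines of Python. The function and parameter names (<code>column_generation</code>, <code>solve_master</code>, <code>solve_pricing</code>) and the data are hypothetical and chosen only for illustration. The toy instance is a linear program with a single constraint, <math>\min \sum_j c_j x_j</math> subject to <math>\sum_j a_j x_j \geq 1,\ x \geq 0</math> with <math>a_j > 0</math>, for which the dual price of the restricted master is simply the smallest ratio <math>c_j/a_j</math> among the available columns.

```python
from fractions import Fraction

def column_generation(columns, solve_master, solve_pricing, max_iter=1000):
    """Re-solve the master until pricing finds no column with negative reduced cost."""
    columns = list(columns)
    for _ in range(max_iter):
        duals = solve_master(columns)
        reduced_cost, new_column = solve_pricing(duals)
        if reduced_cost >= 0:
            return columns, duals
        columns.append(new_column)
    raise RuntimeError("iteration limit reached")

# Toy data (hypothetical): every candidate column as (cost c_j, coefficient a_j).
POOL = [(4, 1), (3, 1), (5, 2), (6, 3)]

def solve_master(columns):
    # Dual price of the single constraint for the restricted master problem.
    return min(Fraction(c, a) for c, a in columns)

def solve_pricing(u):
    # Column with the smallest reduced cost c_j - u * a_j over the whole pool.
    c, a = min(POOL, key=lambda col: col[0] - u * col[1])
    return c - u * a, (c, a)

columns, dual = column_generation([POOL[0]], solve_master, solve_pricing)
print(columns, dual)    # [(4, 1), (6, 3)] 2
```

In this toy case the pricing step scans the pool directly; in real applications it is replaced by an optimization problem, so that the pool never has to be listed.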

'''''Methodology'''''<ref>L.A. Wolsey, Integer programming. Wiley,Column Generation Algorithms p185-p189,1998</ref>
[[File:Column Generation.png|thumb|468x468px|Column generation schematics<ref name=":4">GERARD. (2005). Personnel and Vehicle scheduling, Column Generation, slide 12. URL: https://slideplayer.com/slide/6574/</ref>]]
Consider the problem in the form:

(IP)
<math>z=max\left \{\sum_{k=1}^{K}c^{k}x^{k}:\sum_{k=1}^{K}A^{k}x^{k}=b,x^{k}\in X^{k}\; \; \; for\; \; \; k=1,...,K \right \}</math>

where <math>X^{k}=\left \{x^{k}\in Z_{+}^{n_{k}}: D^{k}x^{k}\leq d^{k} \right \}</math> for <math>k=1,...,K</math>. Assuming that each set <math>X^{k}</math> contains a large but finite set of points <math>\left \{ x^{k,t} \right \}_{t=1}^{T_{k}}</math>, we have that:

<math>X^{k}=\left \{ x^{k}\in R^{n_{k}}:x^{k}=\sum_{t=1}^{T_{k}}\lambda _{k,t}x^{k,t},\sum_{t=1}^{T_{k}}\lambda _{k,t}=1,\lambda _{k,t}\in \left \{ 0,1 \right \}\; \; for \; \; t=1,...,T_{k} \right \}</math>

Note that, assuming that each of the sets <math>X^{k}</math> is bounded for <math>k=1,...,K</math>, the approach involves solving an equivalent problem of the form:

<math>max\left \{ \sum_{k=1}^{K}\gamma ^{k}\lambda ^{k}: \sum_{k=1}^{K}B^{k}\lambda ^{k}=\beta ,\lambda ^{k}\geq 0\; \; integer\; \; for\; \; k=1,...,K \right \}</math>

where each matrix <math>B^{k}</math> has a very large number of columns, one for each of the feasible points in <math>X^{k}</math>, and each vector <math>\lambda ^{k}</math> contains the corresponding variables.

Now, substituting for <math>x^{k}</math> leads to an equivalent ''IP Master Problem (IPM)'':

(IPM)
<math>\begin{matrix}
z=max\sum_{k=1}^{K}\sum_{t=1}^{T_{k}}\left(c^{k}x^{k,t}\right )\lambda _{k,t} \\
\sum_{k=1}^{K}\sum_{t=1}^{T_{k}}\left ( A^{k}x^{k,t} \right )\lambda _{k,t}=b\\
\sum_{t=1}^{T_{k}}\lambda _{k,t}=1\; \; for\; \; k=1,...,K \\
\lambda _{k,t}\in \left \{ 0,1 \right \}\; \; for\; \; t=1,...,T_{k}\; \; and\; \; k=1,...,K.
\end{matrix}</math>

We use a column generation algorithm to solve the linear programming relaxation of the Integer Programming Master Problem, called the ''Linear Programming Master Problem (LPM)'':

(LPM)
<math>\begin{matrix}
z^{LPM}=max\sum_{k=1}^{K}\sum_{t=1}^{T_{k}}\left ( c^{k}x^{k,t} \right )\lambda _{k,t}\\
\sum_{k=1}^{K}\sum_{t=1}^{T_{k}}\left ( A^{k}x^{k,t} \right )\lambda _{k,t}=b \\
\sum_{t=1}^{T_{k}}\lambda _{k,t}=1\; \;for\; \; k=1,...,K \\
\lambda _{k,t} \geq 0\; \; for\; \; t=1,...,T_{k},\; k=1,...,K
\end{matrix}</math>

where there is a column <math>\begin{pmatrix}
c^{k}x\\
A^{k}x\\
e_{k}
\end{pmatrix}</math> for each <math>x \in X^{k}</math>. In the next steps of this method, we use <math>\left \{ \pi _{i} \right \}_{i=1}^{m}</math> as the dual variables associated with the joint constraints, and <math>\left \{ \mu_{k} \right \}_{k=1}^{K}</math> as the dual variables for the second set of constraints. The latter are also known as convexity constraints.
The idea is to solve the linear program by the primal simplex algorithm. However, the pricing step of choosing a column to enter the basis must be modified because of the very large number of columns involved. Instead of pricing the columns one at a time, finding a column with the largest reduced price is itself a set of <math>K</math> optimization problems.

''Initialization:'' we suppose that a subset of columns (at least one for each <math>k</math>) is available, providing a feasible ''Restricted Linear Programming Master Problem'':

(RLPM)
<math>\begin{matrix}
\tilde{z}^{LPM}=max\; \tilde{c}\tilde{\lambda} \\
\tilde{A}\tilde{\lambda }=\tilde{b} \\
\tilde{\lambda }\geq 0
\end{matrix}</math>

where <math>\tilde{b}=\begin{pmatrix}
b\\
1\\
\end{pmatrix}</math>, <math>\tilde{A}</math> is generated by the available set of columns, and <math>\tilde{c}</math> and <math>\tilde{\lambda }</math> are the corresponding costs and variables. Solving the RLPM gives an optimal primal solution <math>\tilde{\lambda }^{*}</math> and an optimal dual solution <math>\left ( \pi ,\mu \right )\in R^{m}\times R^{K}</math>.

''Primal feasibility:'' Any feasible solution of ''RLPM'' is feasible for ''LPM''. In particular, <math>\tilde{\lambda }^{*}</math> is a feasible solution of ''LPM'', and hence <math>\tilde{z}^{LPM}=\tilde{c}\tilde{\lambda }^{*}=\sum_{i=1}^{m}\pi _{i}b_{i}+\sum_{k=1}^{K}\mu _{k}\leq z^{LPM}</math>.

''Optimality check for LPM:'' We need to check whether <math>\left ( \pi ,\mu \right )</math> is dual feasible for ''LPM''. This means checking, for each column (that is, for each <math>k</math> and each <math>x \in X^{k}</math>), whether the reduced price satisfies <math>c^{k}x-\pi A^{k}x-\mu _{k}\leq 0</math>. Rather than examining each point separately, we treat all points in <math>X^{k}</math> implicitly, by solving an optimization subproblem:

<math>\zeta _{k}=max\left \{ \left (c^{k}-\pi A^{k} \right )x-\mu _{k}\; :\; x \in X^{k}\right \}.</math>

''Stopping criterion:'' If <math>\zeta _{k}\leq 0</math> for <math>k=1,...,K</math>, the solution <math>\left ( \pi ,\mu \right )</math> is dual feasible for ''LPM'', and hence <math>z^{LPM}\leq \sum_{i=1}^{m}\pi _{i}b_{i}+\sum_{k=1}^{K}\mu _{k}</math>. As the value of the primal feasible solution <math>\tilde{\lambda }^{*}</math> equals this upper bound, <math>\tilde{\lambda }^{*}</math> is optimal for ''LPM''.

''Generating a new column:'' If <math>\zeta _{k}> 0</math> for some <math>k</math>, the column corresponding to the optimal solution <math>\tilde{x}^{k}</math> of the subproblem has a positive reduced price. Introducing the column <math>\begin{pmatrix}
c^{k}\tilde{x}^{k}\\
A^{k}\tilde{x}^{k}\\
e_{k}
\end{pmatrix}</math> then leads to a new Restricted Linear Programming Master Problem that can be easily reoptimized (e.g., by the primal simplex algorithm).

== Numerical example: The Cutting Stock problem<ref>L.A. Wolsey, Integer programming. Wiley, Column Generation Algorithms, p185-p189, 1998</ref> ==

Suppose we want to solve a numerical example of the cutting stock problem, specifically a one-dimensional cutting stock problem.

''Problem Overview''

A company produces steel bars with diameter <math>45</math> millimeters and length <math>33</math> meters. The company also takes care of cutting the bars for their different customers, who each require different lengths. At the moment, the following demand forecast is expected and must be satisfied:
{| class="wikitable"
! Pieces needed
! Piece length (m)
! Type of item
|-
|144
|6
|1
|-
|105
|13.5
|2
|-
|72
|15
|3
|-
|30
|16.5
|4
|-
|24
|22.5
|5
|}
The objective is to determine the minimum number of steel bars needed to satisfy the total demand.

A possible model for the problem, proposed by Gilmore and Gomory in the 1960s, is the following:

'''Sets'''

<math>K=\left \{ 1,2,3,4,5 \right \}</math>: set of item types;

''<math display="inline">S</math>:'' set of patterns (i.e., possible ways) that can be adopted to cut a given bar into portions of the needed lengths.

'''Parameters'''

<math display="inline">M</math>: bar length (before the cutting process);

<math display="inline">L_k</math>'':'' length of item ''<math display="inline">k</math>'' ''<math display="inline">\in</math> <math display="inline">K</math>'';

<math display="inline">R_k</math> : number of pieces of type ''<math display="inline">k</math>'' ''<math display="inline">\in</math> <math display="inline">K</math>'' required;

<math display="inline">N_{k,s}</math> : number of pieces of type ''<math display="inline">k</math>'' ''<math display="inline">\in</math> <math display="inline">K</math>'' in pattern ''<math display="inline">s</math>'' ''<math display="inline">\in</math> <math display="inline">S</math>''.

'''Decision variables'''

<math display="inline">y_s</math> : number of bars that should be portioned using pattern ''<math display="inline">s</math>'' ''<math display="inline">\in</math> <math display="inline">S</math>''.

'''Model'''

<math>\begin{matrix}\min \sum_{s\in S}y_s \\ \ s.t. \sum_{s\in S}N_{k,s}y_s\geq R_k \quad \forall k\in K \\ y_s\in \mathbb{Z}_+ \quad \forall s\in S \end{matrix}</math>

''Solving the problem''

The model assumes the availability of the set ''<math display="inline">S</math>'' and the parameters <math display="inline">N_{k,s}</math>. To generate this data, one would have to list all possible cutting patterns. However, the number of possible cutting patterns is extremely large, which is why a direct implementation of the model above is not practical in real-world problems. In this situation it makes sense to solve the continuous relaxation of the model instead. In practice, the demand figures are so high that the number of bars to cut is also large, and therefore a good solution can be obtained by rounding up to the next integer each variable <math>y_s</math> found by solving the continuous relaxation. In addition, the solution of the relaxed problem becomes the starting point for an exact solution method (for instance, Branch-and-Bound).<blockquote>''Key take-away: In the next steps of this example we will analyze how to solve the continuous relaxation of the model.''</blockquote>As a starting point, we need any feasible solution. Such a solution can be constructed as follows:

# Consider the single-item cutting patterns, i.e., <math>|K|</math> configurations, where pattern <math>k</math> contains <math display="inline">N_{k,k} = \left\lfloor \frac{M}{L_k} \right\rfloor</math> pieces of type <math>k</math> and no other pieces;
# Set <math display="inline">y_{k} = \left\lceil \frac{R_k}{N_{k,k}} \right\rceil</math> for pattern <math>k</math> (where pattern <math>k</math> is the pattern containing only pieces of type <math>k</math>); rounding up guarantees that the demand <math>R_k</math> is met.

The continuous counterpart of this solution can also be arrived at by applying the simplex method to the model (without integrality constraints), considering only the decision variables that correspond to the above single-item patterns:

<math>\begin{align}
\text{min} & ~~ y_{1}+y_{2}+y_{3}+y_{4}+y_{5}\\
\text{s.t} & ~~ 15y_{1} \ge 144\\
\ & ~~ 6y_{2} \ge 105\\
\ & ~~ 6y_{3} \ge 72\\
\ & ~~ 6y_{4} \ge 30\\
\ & ~~ 3y_{5} \ge 24\\
\ & ~~ y_{1},y_{2},y_{3},y_{4},y_{5} \ge 0\\
\end{align}</math>

In fact, if we solve this problem (for example, using the CPLEX solver in GAMS), the solution is as follows:
{| class="wikitable"
|Y1
|28.8
|-
|Y2
|52.5
|-
|Y3
|24
|-
|Y4
|15
|-
|Y5
|24
|}
Next, a new possible pattern (number <math>6</math>) will be considered. This pattern contains one piece of item type <math>1</math> and one piece of item type <math>5</math>. The question is whether the current solution would remain optimal if this new pattern were allowed. Duality helps answer this question. At every iteration of the simplex method, the outcome is a basic feasible solution (corresponding to some basis <math>B</math>) for the primal problem and a dual solution (the multipliers <math>u^{T}=c_{B}^{T}B^{-1}</math>) that together satisfy the complementary slackness conditions. (Note: the dual solution is feasible only when the last iteration is reached.)

The inclusion of new pattern <math>6</math> corresponds to including a new variable in the primal problem, with objective cost <math>1</math> (as each time pattern <math>6</math> is chosen, one bar is cut) and corresponding to the following column in the constraint matrix:

<math>D_6= \begin{bmatrix}
\ 1 \\
\ 0 \\
\ 0 \\
\ 0 \\
\ 1 \\
\end{bmatrix}</math>

This variable creates a new dual constraint. We then have to check whether this new constraint is violated by the current dual solution (in other words, ''whether the reduced cost of the new variable with respect to basis <math>B</math> is negative'').

The new dual constraint is:<math>1\times u_{1}+0\times u_{2}+0\times u_{3}+0\times u_{4}+1\times u_{5}\leq 1</math>

The solution for the dual problem can be computed in different software packages, or by hand. The example below shows the solution obtained with GAMS for this example:

(Note: the solution of the dual problem is <math>u^{T}=c_{B}^{T}B^{-1}</math>.)

{| class="wikitable"
|Dual variable
|Variable value
|-
|D1
|0.067
|-
|D2
|0.167
|-
|D3
|0.167
|-
|D4
|0.167
|-
|D5
|0.333
|}
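The dual values in the table can be reproduced by hand, because the basis of the restricted problem is diagonal (each single-item pattern appears in exactly one constraint). The following is a minimal Python sketch under that assumption; the names <code>pieces_per_pattern</code> and <code>reduced_cost</code> are illustrative and not part of the GAMS model. It evaluates the left-hand side of the new dual constraint and the reduced cost of pattern <math>6</math> using exact fractions.

```python
from fractions import Fraction

# Diagonal coefficients of the restricted problem (one single-item pattern
# per item type), read off the constraints 15*y1 >= 144, ..., 3*y5 >= 24.
pieces_per_pattern = [15, 6, 6, 6, 3]
cost = [1, 1, 1, 1, 1]  # every pattern consumes exactly one bar

# With a diagonal basis B, u^T = c_B^T B^{-1} reduces to u_k = c_k / B_kk.
u = [Fraction(c, a) for c, a in zip(cost, pieces_per_pattern)]


def reduced_cost(column, duals, c=1):
    """Return c - u^T * column for a candidate pattern column."""
    return c - sum(uk * zk for uk, zk in zip(duals, column))


D6 = [1, 0, 0, 0, 1]  # one piece of type 1 and one piece of type 5
lhs = sum(uk * zk for uk, zk in zip(u, D6))  # left-hand side of the dual constraint
rc6 = reduced_cost(D6, u)

print([round(float(x), 3) for x in u])  # [0.067, 0.167, 0.167, 0.167, 0.333]
print(float(lhs), float(rc6))           # 0.4 0.6
```

Exact fractions are used so that the comparison with the right-hand side <math>1</math> is not affected by the three-decimal rounding shown in the table.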
Substituting the dual values from the table gives <math>u_{1}+u_{5}=0.067+0.333=0.4\leq 1</math>, so the new constraint is satisfied by the current dual solution.

Equivalently, the associated primal variable has a nonnegative reduced cost:

<math>\bar{c}_6 = c_6-u^TD_6=1-0.4=0.6</math>

This means that the current primal solution (in which the new variable is <math>y_{6}=0</math>) remains optimal when pattern <math>6</math> is allowed. Had the dual constraint been violated, the reduced cost would have been negative; the current primal solution would still be feasible but possibly no longer optimal, and the next step would be to let <math>y_{6}</math> enter the basis. In either case, the new variable is inserted into the problem as below:

<math>\begin{align}
\text{min} & ~~ y_{1}+y_{2}+y_{3}+y_{4}+y_{5}+y_{6}\\
\text{s.t} & ~~ 15y_{1} +y_{6}\ge 144\\
\ & ~~ 6y_{2} \ge 105\\
\ & ~~ 6y_{3} \ge 72\\
\ & ~~ 6y_{4} \ge 30\\
\ & ~~ 3y_{5}+y_{6} \ge 24\\
\ & ~~ y_{1},y_{2},y_{3},y_{4},y_{5},y_{6} \ge 0\\
\end{align}</math>

If this problem is solved with the simplex method, the optimal solution is found, but restricted only to patterns <math>1</math> to <math>6</math>. If a new pattern becomes available, we can decide whether it should be used by proceeding as above. The difficulty, however, is how to find a pattern (i.e., a variable, or equivalently a column of the constraint matrix) whose reduced cost is negative (i.e., which is worth including in the formulation). Note that the number of possible patterns is exponentially large, and the patterns are not even all known explicitly. The question then is:

''Given a basic optimal solution for the problem in which only some variables are included, how can we find (if any exists) a variable with negative reduced cost (i.e., a constraint violated by the current dual solution)?''

This question can be transformed into an optimization problem: in order to see whether a variable with negative reduced cost exists, we can look for the minimum of the reduced costs of all possible variables and check whether this minimum is negative:

<math>\bar{c}=1-u^Tz</math>

Every column of the constraint matrix corresponds to a cutting pattern, and every entry of the column says how many pieces of a certain type are in that pattern. Therefore, for <math>z</math> to be a possible column of the constraint matrix, the following conditions must be satisfied:

<math display="inline">\begin{matrix}z_k\in \mathbb{Z}_+ \quad \forall k\in K \\ \sum_{k \in K} L_kz_k \leq M \end{matrix}</math>

This converts the problem of finding a variable with negative reduced cost into the integer linear programming problem below:

<math>\begin{matrix}\min\ \bar{c} = 1 - \sum_{k=1}^K u_k \times z_k \\ \text{s.t.} \ \sum_{k \in K} L_kz_k \leq M \\ z_k\in \mathbb{Z}_+ \quad \forall k\in K \end{matrix}</math>

which, in turn, is equivalent to the formulation below (we simply write the objective in maximization form and drop the additive constant <math>1</math>):

<math>\begin{matrix} \max\sum_{k=1}^K u_k \times z_k \\ \text{s.t.} \ \sum_{k \in K} L_kz_k \leq M \\ z_k\in \mathbb{Z}_+ \quad \forall k\in K \end{matrix}</math>

The coefficients <math>z_k</math> of a column with negative reduced cost can be found by solving the above integer [[wikipedia:Knapsack_problem|"knapsack"]] problem (a classical problem in integer programming).

In our example, if we start from the problem restricted to the five single-item patterns, the above problem reads as:

<math>\begin{align}
\text{max} & ~~ 0.067z_{1}+0.167z_{2}+0.167z_{3}+0.167z_{4}+0.333z_{5}\\
\text{s.t.} & ~~ 6z_{1} +13.5z_{2}+15z_{3}+16.5z_{4}+22.5z_{5}\le 33\\
\ & ~~ z_{1},z_{2},z_{3},z_{4},z_{5}\in \mathbb{Z}_+\\
\end{align}</math>

for which an optimal solution is <math>z^T= [1 \quad 0\quad 0\quad 0\quad 1]</math>, with objective value <math>0.067+0.333=0.4</math>.

This matches the pattern we called <math>D_6</math> earlier on this page.
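The pricing step can be sketched in Python as an unbounded integer knapsack solved by dynamic programming. This is an illustrative sketch rather than the method used to produce the results on this page: the function name <code>price_pattern</code> is hypothetical, the dual values are entered as exact fractions (1/15, 1/6, 1/6, 1/6, 1/3) instead of the rounded table entries, and all lengths are doubled so that the capacities become integers.

```python
from fractions import Fraction


def price_pattern(duals, lengths, capacity):
    """Unbounded integer knapsack solved by dynamic programming.

    Maximizes sum(duals[k] * z[k]) subject to
    sum(lengths[k] * z[k]) <= capacity with z[k] nonnegative integers.
    lengths and capacity must be positive integers.
    Returns (best_value, pattern).
    """
    n = len(duals)
    best = [Fraction(0)] * (capacity + 1)
    choice = [None] * (capacity + 1)  # item added last; None = leave one unit unused
    for cap in range(1, capacity + 1):
        best[cap] = best[cap - 1]
        for k in range(n):
            if lengths[k] <= cap:
                value = best[cap - lengths[k]] + duals[k]
                if value > best[cap]:
                    best[cap] = value
                    choice[cap] = k
    # Recover one optimal pattern by walking back through the recorded choices.
    pattern = [0] * n
    cap = capacity
    while cap > 0:
        k = choice[cap]
        if k is None:
            cap -= 1
        else:
            pattern[k] += 1
            cap -= lengths[k]
    return best[capacity], pattern


duals = [Fraction(1, 15), Fraction(1, 6), Fraction(1, 6), Fraction(1, 6), Fraction(1, 3)]
# Item lengths 6, 13.5, 15, 16.5, 22.5 and bar length 33, all doubled.
lengths = [12, 27, 30, 33, 45]
capacity = 66

best_value, pattern = price_pattern(duals, lengths, capacity)
used = sum(length * z for length, z in zip(lengths, pattern))
print(best_value, pattern, used)
print("optimality test passed:", best_value <= 1)
```

The maximum is attained by more than one pattern, so the pattern returned by the sketch need not coincide with <math>D_6</math>; only the optimal value, which is what the optimality test uses, is unique.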

Optimality test

If : <math display="inline">\sum_{k=1}^{K}z_{k}^{*}u_{k}^{*}\leq 1</math>

then <math>y^*</math> is an optimal solution of the full continuous relaxed problem (that is, including all patterns in ''<math display="inline">S</math>'')

If this condition does not hold, we update the master problem by including in <math display="inline">S'</math> the pattern <math>\lambda</math> defined by <math>N_{k,\lambda} = z_k^*</math> (in practical terms, this means that the column '''<math>z^*</math>''' needs to be included in the constraint matrix).

For this example, the optimality test is met since <math>\sum_{k=1}^{K}z_{k}^{*}u_{k}^{*}=0.4 \leq 1</math>, so we have found an optimal solution of the continuous relaxation (otherwise, we would have had to go back to reformulating and solving the master problem, as discussed in the methodology section of this page).

'''''Algorithm discussion'''''

The critical part of the method is the column generation subproblem, i.e., generating the new columns. It is not reasonable to compute the reduced costs of all variables <math>y_s</math> for <math>s=1,...,S</math>; otherwise this procedure would reduce to the simplex method. In fact, <math>S</math> can be very large (as in the cutting-stock problem) or, for some reason, it might not be possible or convenient to enumerate all decision variables. A specific column generation algorithm therefore has to be devised for each problem; ''only if such an algorithm exists (and is practical)'' can the method be fully applied. In the one-dimensional cutting stock problem, we transformed the column generation subproblem into an easily solvable integer linear programming problem. In other cases, the computational effort required to solve the subproblem is so high that applying the full procedure becomes inefficient.

== Applications ==
As previously mentioned, column generation techniques are most relevant when the problem that we are trying to solve has a high ratio of number of variables with respect to the number of constraints. As such some common applications are:

* Bandwidth packing
* Bus driver scheduling
* Generally, column generation algorithms are used for large delivery networks, often in combination with other methods, helping to implement real-time solutions for on-demand logistics. We discuss a supply chain scheduling application below.

'''''Bandwidth packing'''''

The objective of this problem is to allocate bandwidth in a telecommunications network to maximize total revenue. The routing of a set of traffic demands between different users is to be decided, taking into account the capacity of the network arcs and the fact that the traffic between each pair of users cannot be split. The problem can be formulated as an integer programming problem, and its linear programming relaxation can be solved using column generation and the simplex algorithm. To solve the integer program, a branch and bound procedure that branches upon a particular path is used.<ref name=":3">Parker, Mark & Ryan, Jennifer. (1993). A column generation algorithm for bandwidth packing. Telecommunication Systems. 2. 185-195. 10.1007/BF02109857. </ref> The column generation algorithm greatly reduces the complexity of this problem.

'''''Bus driver scheduling'''''

Bus driver scheduling aims to find the minimum number of bus drivers to cover a published timetable of a bus company. When scheduling bus drivers, contractual working rules must be enforced, thus complicating the problem. A column generation algorithm can decompose this complicated problem into a master problem and a series of pricing subproblems. The master problem would select optimal duties from a set of known feasible duties, and the pricing subproblem would augment the feasible duty set to improve the solution obtained in the master problem.<ref name=":2">Dung‐Ying Lin, Ching‐Lan Hsu. Journal of Advanced Transportation. Volume50, Issue8, December 2016, Pages 1598-1615. URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/atr.1417</ref>

'''''Supply Chain scheduling problem'''''

A typical application is the problem of scheduling a set of shipments between different nodes of a supply chain network. Each shipment has a fixed departure time, as well as an origin and a destination node, which, combined, determine the duration of the associated trip. The aim is to schedule as many shipments as possible, while also minimizing the number of vehicles utilized for this purpose. This problem can be formulated as an integer programming model with an associated branch and price solution algorithm. The optimal solution to the LP relaxation of the problem can be obtained through column generation, which solves a linear program with a huge number of variables without explicitly considering all of them. In the context of this application, the master problem schedules the maximum possible number of shipments using only a small set of vehicle-routes, and a column generation (colgen) sub-problem generates cost-effective vehicle-routes to be fed into the master problem. After finding the optimal solution to the LP relaxation of the problem, the algorithm branches on the fractional decision variables (vehicle-routes) in order to reach the optimal integer solution.<ref name=":1">Kozanidis, George. (2014). Column generation for scheduling shipments within a supply chain network with the minimum number of vehicles. OPT-i 2014 - 1st International Conference on Engineering and Applied Sciences Optimization, Proceedings. 888-898</ref>

== Conclusions ==
Column generation is a way of starting with a small, manageable part of a problem (specifically, with some of the variables), solving that part, analyzing that interim solution to find the next part of the problem (specifically, one or more variables) to add to the model, and then solving the full or extended model. In the column generation method, the algorithm steps are repeated until an optimal solution to the entire problem is achieved.<ref> ILOG CPLEX 11.0 User's Manual > Discrete Optimization > Using Column Generation: a Cutting Stock Example > What Is Column Generation? 1997-2007. URL:http://www-eio.upc.es/lceio/manuals/cplex-11/html/usrcplex/usingColumnGen2.html#:~:text=In%20formal%20terms%2C%20column%20generation,method%20of%20solving%20the%20problem.&text=By%201960%2C%20Dantzig%20and%20Wolfe,problems%20with%20a%20decomposable%20structure</ref>

This algorithm provides a way of solving a linear programming problem whose complete formulation would otherwise be very tedious to write down and solve, by adding columns (corresponding to variables) during the pricing phase of the solution process. Generating a column in the primal formulation of a linear programming problem corresponds to adding a constraint in its dual formulation.

== References ==

2020 Cornell Optimization Open Textbook Feedback

2020-12-21T10:39:35Z

Wc593: /* Set covering problem */

==[[Computational complexity]]==

* Numerical Example
*# Finding subsets of a set is NOT O(2n).
* Application
*# The applications mentioned need to be discussed further.

==[[Network flow problem]]==

* Real Life Applications
*# There is NO need to include code. Simply mention how the problem was coded along with details on the LP solver used.

==[[Interior-point method for LP]]==

* Introduction
*# Please type “minimize” and “subject to” in formal optimization problem form throughout the whole page.
* A section to discuss and/or illustrate the applications
*# Please type optimization problem in the formal form.

==[[Optimization with absolute values]]==

* An introduction of the topic
*# Add few sentences on how absolute values convert optimization problem into a nonlinear optimization problem
* Applications
*# Inline equations at the beginning of this section are not formatted properly. Please fix the notation for expected return throughout the section.

==[[Matrix game (LP for game theory)]]==

* Theory and Algorithmic Discussion
*# aij are not defined in this section.

==[[Quasi-Newton methods]]==

* Theory and Algorithm
*# Please ensure that few spaces are kept between the equations and equation numbers.

==[[Eight step procedures]]==

* Numerical Example
*# Data for the example Knapsack problem (b,w) are missing.
*# How to arrive at optimal solutions is missing.

==[[Set covering problem]]==

* Numerical Example
*# Please leave some space between equation and equation number.

==[[Quadratic assignment problem]]==

* Theory, methodology, and/or algorithmic discussions
*# Discuss dynamic programming and cutting plane solution techniques briefly.

==[[Newsvendor problem]]==

* Formulation
*# A math programming formulation of the optimization problem with objective function and constraints is expected for the formulation. Please add any variant of the newsvendor problem along with some operational constraints.
*# A mathematical presentation of the solution technique is expected. Please consider any distribution for R and present a solution technique for that specific problem.

==[[Mixed-integer cuts]]==

* Applications
*# MILP and their solution techniques involving cuts are extremely versatile. Yet, only two sentences are added to describe their applications. Please discuss their applications, preferably real-world applications, in brief. Example Wikis provided on the website could be used as a reference to do so.

==[[Column generation algorithms]]==

* Introduction
*# References at the end of the sentence should be placed after the period.
* Theory, methodology and algorithmic discussions
*# Some minor typos/article agreement issues exist “is not partical in real-world”.

==[[Heuristic algorithms]]==

* Methodology
*# Please use proper symbol for "greater than or equal to".
*# Greedy method to solve minimum spanning tree seems to be missing.

==[[Branch and cut]]==

* Methodology & Algorithm
*# Equation in most infeasible branching section is not properly formatted.
*# Step 2 appears abruptly in the algorithm and does not explain much. Please add more information regarding the same.
*# Step 5 contains latex code terms that are not properly formatted. Please fix the same.
*# Fix typos: e.g., repeated “for the current”.

== [[Mixed-integer linear fractional programming (MILFP)]] ==

* Application and Modeling for Numerical Examples
*# Please check the index notation in Mass Balance Constraint

==[[Fuzzy programming]]==

* Applications
*# Applications of fuzzy programming are quite versatile. Please discuss few of the mentioned applications briefly. The provided example Wikis can be used as a reference to write this section.

==[[Adaptive robust optimization]]==

* Problem Formulation
*# Please check typos such as "Let ''u'' bee a vector".
*# The abbreviation KKT is not previously defined.

== [[Stochastic gradient descent]] ==
* Numerical Example
*# Amount of whitespace can be reduced by changing orientation of example dataset by converting it into a table containing 3 rows and 6 columns.

==[[RMSProp]]==

* Introduction
*# References at the end of the sentence should be placed after the period.
* Theory and Methodology
*# Please check grammar in this section.
* Applications and Discussion
*# The applications section does not contain any discussion on applications. Please mention a few applications of the widely used RMSprop and discuss them briefly.

==[[Adam]]==

* Background
*# References at the end of the sentence should be placed after the period.

Set covering problem

2020-12-21T10:38:51Z

Wc593: /* Approximation via LP relaxation and rounding */

Authors: Sherry Liang, Khalid Alanazi, Kumail Al Hamoud
 
Steward: Allen Yang, Fengqi You

== Introduction ==

The set covering problem is a significant NP-hard problem in combinatorial optimization. Given a collection of elements, the set covering problem aims to find the minimum number of sets that incorporate (cover) all of these elements. <ref name="one"> T. Grossman and A. Wool, [https://www.sciencedirect.com/science/article/abs/pii/S0377221796001610 "Computational experience with approximation algorithms for the set covering problem]," ''European Journal of Operational Research'', vol. 101, pp. 81-92, 1997. </ref>

The importance of the set covering problem has two main aspects: one is pedagogical, and the other is practical.

First, because many greedy approximation methods have been proposed for this combinatorial problem, studying it gives insight into the use of approximation algorithms in solving NP-hard problems. Thus, it is a prime example in teaching computational algorithms. We present a preview of these methods in a later section, and we refer the interested reader to these references for a deeper discussion. <ref name="one" /> <ref name="seven"> P. Slavı́k, [https://www.sciencedirect.com/science/article/abs/pii/S0196677497908877 "A Tight Analysis of the Greedy Algorithm for Set Cover]," ''Journal of Algorithms'', vol. 25, pp. 237-245, 1997. </ref> <ref name="nine"> T. Grossman and A. Wool, [https://www.sciencedirect.com/science/article/abs/pii/S0377221796001610 "What Is the Best Greedy-like Heuristic for the Weighted Set Covering Problem?]," ''Operations Research Letters'', vol. 44, pp. 366-369, 2016. </ref>

Second, many problems in different industries can be formulated as set covering problems. For example, scheduling machines to perform certain jobs can be thought of as covering the jobs. Picking the optimal location for a cell tower so that it covers the maximum number of customers is another set covering application. Moreover, this problem has many applications in the airline industry, and it was explored on an industrial scale as early as the 1970s. <ref name="two"> J. Rubin, [https://www.jstor.org/stable/25767684?seq=1 "A Technique for the Solution of Massive Set Covering Problems, with Application to Airline Crew Scheduling]," ''Transportation Science'', vol. 7, pp. 34-48, 1973. </ref>

== Problem formulation ==
In the set covering problem, two sets are given: a set <math> U </math> of elements and a set <math> S </math> of subsets of the set <math> U </math>. Each subset in <math> S </math> is associated with a predetermined cost, and the union of all the subsets covers the set <math> U </math>. This combinatorial problem then concerns finding the optimal number of subsets whose union covers the universal set while minimizing the total cost.<ref name="one"> T. Grossman and A. Wool, [https://www.sciencedirect.com/science/article/abs/pii/S0377221796001610 "Computational experience with approximation algorithms for the set covering problem]," ''European Journal of Operational Research'', vol. 101, pp. 81-92, 1997. </ref> <ref name="twelve"> Williamson, David P., and David B. Shmoys. “The Design of Approximation Algorithms” [https://www.designofapproxalgs.com/book.pdf]. “Cambridge University Press”, 2011. </ref>

The mathematical formulation of the set covering problem is defined as follows. We define <math> U </math> = { <math> u_1,..., u_m </math>} as the universe of elements and <math> S </math> = { <math> s_1,..., s_n </math>} as a collection of subsets such that <math> s_i \subseteq U </math> and the union of the <math> s_i</math> covers all elements in <math> U </math> (i.e. <math>\cup</math><math> s_i</math> = <math> U </math> ). Additionally, each set <math> s_i</math> must cover at least one element of <math> U </math> and has an associated cost <math> c_i</math> such that <math> c_i > 0</math>. The objective is to find the minimum-cost sub-collection of sets <math> X </math> <math>\subset</math> <math> S </math> that covers all the elements in the universe <math> U </math>.

== Integer linear program formulation ==
An integer linear program (ILP) model can be formulated for the minimum set covering problem as follows:<ref name="one"> T. Grossman and A. Wool, [https://www.sciencedirect.com/science/article/abs/pii/S0377221796001610 "Computational experience with approximation algorithms for the set covering problem]," ''European Journal of Operational Research'', vol. 101, pp. 81-92, 1997. </ref>

'''Decision variables'''

<math> y_i = \begin{cases} 1, & \text{if subset }i\text{ is selected} \\ 0, & \text{otherwise } \end{cases}</math>

'''Objective function'''

minimize <math>\sum_{i=1}^n c_i y_i</math>

'''Constraints '''

<math> \sum_{i:\, u_j \in s_i} y_i \geq 1, \quad \forall j= 1,....,m</math>

<math> y_i \in \{0, 1\}, \forall i = 1,....,n</math>

The objective function <math>\sum_{i=1}^n c_i y_i</math> minimizes the total cost of the selected subsets <math> s_i</math>; when all costs are equal, this amounts to minimizing the number of subsets used. The first constraint requires every element <math> u_j </math> in the universe <math> U </math> to be covered by at least one selected subset, and the second constraint <math> y_i \in \{0, 1\} </math> indicates that the decision variables are binary, which means that every set is either in the set cover or not.

The set covering problem is NP-hard, so no polynomial-time algorithm is known for solving it exactly, and the computational time of exact methods can grow exponentially with the size of the problem. For this reason, approximation algorithms are used to obtain optimal or near-optimal solutions to large-scale problems in polynomial time. In subsequent sections, we cover two of the most widely used approximation approaches to the set cover problem: linear program relaxation methods and classical greedy algorithms. <ref name="seven" />

== Approximation via LP relaxation and rounding ==
Set covering is a classical integer programming problem, and solving integer programs in general is NP-hard. Therefore, one approach to achieving an <math> O</math>(log<math>n</math>) approximation to the set covering problem in polynomial time is to solve it via linear programming (LP) relaxation algorithms. <ref name="one"> T. Grossman and A. Wool, [https://www.sciencedirect.com/science/article/abs/pii/S0377221796001610 "Computational experience with approximation algorithms for the set covering problem]," ''European Journal of Operational Research'', vol. 101, pp. 81-92, 1997. </ref> <ref name="twelve"> Williamson, David P., and David B. Shmoys. “The Design of Approximation Algorithms” [https://www.designofapproxalgs.com/book.pdf]. “Cambridge University Press”, 2011. </ref> In LP relaxation, we relax the integrality requirement into linear constraints. For instance, if we replace the constraints <math> y_i \in \{0, 1\}</math> with the constraints <math> 0 \leq y_i \leq 1 </math>, we obtain the following LP problem that can be solved in polynomial time:

minimize <math>\sum_{i=1}^n c_i y_i</math>

subject to <math> \sum_{i:\, u_j \in s_i} y_i \geq 1, \quad \forall j= 1,....,m</math>

<math> 0 \leq y_i\leq 1, \forall i = 1,....,n</math>

The above LP formulation is a relaxation of the original ILP set cover problem. This means that every feasible solution of the integer program is also feasible for the LP. Additionally, any feasible solution of the integer program has the same objective value in the LP, since the two problems share the same objective function. The optimal value of the LP is therefore a lower bound on the optimal value of the original integer program. The fractional LP solution can then be rounded directly to an integral solution using LP rounding algorithms, as follows:
 

'''Deterministic rounding algorithm'''
 

Suppose we have an optimal solution <math> z^* </math> for the linear programming relaxation of the set cover problem. We round the fractional solution <math> z^* </math> to an integer solution <math> z </math> using an LP rounding algorithm. In general, there are two approaches to rounding: deterministic and randomized rounding algorithms. In this section, we explain the deterministic algorithm. In this approach, we include subset <math> s_i </math> in our solution if <math> z_i^* \geq 1/d </math>, where <math> d </math> is the maximum number of sets in which any element appears. In practice, we set <math> z </math> to be as follows:<ref name="twelve"> Williamson, David P., and David B. Shmoys. “The Design of Approximation Algorithms” [https://www.designofapproxalgs.com/book.pdf]. “Cambridge University Press”, 2011. </ref>

<math> z_i = \begin{cases} 1, & \text{if } z_i^*\geq 1/d \\ 0, & \text{otherwise } \end{cases}</math>

The rounding algorithm is an approximation algorithm for the set cover problem. It runs in polynomial time, and <math> z </math> is a feasible solution to the integer program: each element appears in at most <math> d </math> sets whose LP values sum to at least 1, so at least one of those sets has <math> z_i^* \geq 1/d </math> and is selected. Since each selected variable is scaled up by a factor of at most <math> d </math>, the cost of <math> z </math> is at most <math> d </math> times the optimal LP value.
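The rounding rule can be illustrated with a short Python sketch. The toy instance below (three elements, three subsets, unit costs) is hypothetical and is not taken from the references; for it, the LP relaxation is minimized by setting every variable to 1/2. The function name <code>deterministic_round</code> is likewise an assumption made for illustration.

```python
from fractions import Fraction


def deterministic_round(subsets, lp_solution):
    """Select every subset whose LP value is at least 1/d.

    d is the largest number of subsets that any single element belongs to.
    Returns the 0/1 selection vector together with d.
    """
    universe = set().union(*subsets)
    d = max(sum(1 for s in subsets if e in s) for e in universe)
    threshold = Fraction(1, d)
    return [1 if value >= threshold else 0 for value in lp_solution], d


# Hypothetical toy instance with unit costs.
subsets = [{1, 2}, {2, 3}, {1, 3}]
# Optimal LP solution of this instance: every variable equals 1/2 (LP value 3/2).
lp_solution = [Fraction(1, 2)] * 3

rounded, d = deterministic_round(subsets, lp_solution)
covered = set().union(*(s for s, z in zip(subsets, rounded) if z))
print(rounded, d)  # [1, 1, 1] 2
print(covered == {1, 2, 3}, sum(rounded) <= d * sum(lp_solution))  # True True
```

In this instance the rounded cover costs 3, exactly <math> d = 2 </math> times the LP value of 3/2, so the bound stated above is attained.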

== Greedy approximation algorithm ==
Greedy algorithms can be used to find optimal or near-optimal solutions to large-scale set covering instances in polynomial time. <ref name="seven" /> <ref name="nine" /> The greedy heuristic applies an iterative process that, at each stage, selects the subset covering the largest number of still-uncovered elements of the universe and removes those elements from the uncovered set, until all elements are covered. <ref name="ten"> V. Chvatal, [https://pubsonline.informs.org/doi/abs/10.1287/moor.4.3.233 "Greedy Heuristic for the Set-Covering Problem]," ''Mathematics of Operations Research'', vol. 4, pp. 233-235, 1979. </ref> Let <math> T </math> be the set that contains the covered elements, and <math> U </math> be the set that contains the elements <math> Y </math> that are still uncovered. At the beginning of the iteration, <math> T </math> is empty and all elements <math> Y \in U </math>. We iteratively select the set in <math> S </math> that covers the largest number of elements in <math> U </math> and add its elements to the covered elements in <math> T </math>. An example of this algorithm is presented below.

'''Greedy algorithm for minimum set cover example: '''

Step 0: <math> \quad </math> <math> T \leftarrow \emptyset </math>, <math> X \leftarrow \emptyset </math> <math> \quad \quad \quad </math> { <math> T </math> stores the covered elements, <math> X </math> stores the selected subsets }

Step 1: <math> \quad </math> '''While''' <math> U \neq \emptyset </math> '''do:''' <math> \quad </math> { <math> U </math> stores the uncovered elements <math> Y </math>}

Step 2: <math> \quad \quad \quad </math> select <math> s_i \in S </math> that covers the highest number of elements in <math> U </math>

Step 3: <math> \quad \quad \quad </math> add <math> s_i </math> to <math> X </math> and its elements to <math> T </math>

Step 4: <math> \quad \quad \quad </math> remove the elements of <math> s_i </math> from <math> U </math>

Step 5: <math> \quad </math> '''End while'''

Step 6: <math> \quad </math> '''Return''' <math> X </math>
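The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name <code>greedy_set_cover</code> is hypothetical, ties are broken in favour of the subset listed first (a choice the pseudocode leaves open), and the data are the camera locations of Table 1 in the numerical example on this page.

```python
def greedy_set_cover(universe, subsets):
    """Return the keys of the chosen subsets, in the order they were picked."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset covering the most uncovered elements
        # (ties go to the subset listed first).
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


# Camera locations and the stadium areas they cover (Table 1).
cameras = {
    1: {1, 3, 4, 6, 7},
    2: {4, 7, 8, 12},
    3: {2, 5, 9, 11, 13},
    4: {1, 2, 14, 15},
    5: {3, 6, 10, 12, 14},
    6: {8, 14, 15},
    7: {1, 2, 6, 11},
    8: {1, 2, 4, 6, 8, 12},
}
areas = set(range(1, 16))

picked = greedy_set_cover(areas, cameras)
print(picked)  # [8, 3, 5, 1, 4]
```

With this tie-breaking rule the heuristic selects five camera locations. Greedy selection is not guaranteed to return a minimum cover, so the result should be read as an upper bound on the optimal number of cameras.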

==Numerical Example==
Consider a simple example in which we assign cameras to different locations. Each location covers some areas of a stadium, and our goal is to install the fewest cameras such that all areas of the stadium are covered. We have stadium areas from 1 to 15, and possible camera locations from 1 to 8.

We are given that camera location 1 covers stadium areas {1,3,4,6,7}, camera location 2 covers stadium areas {4,7,8,12}, while the remaining camera locations and the stadium areas that the cameras can cover are given in Table 1 below:
{| class="wikitable"
|+Table 1 Camera Location vs Stadium Area
|-
!Camera location
|1
|2
|3
|4
|5
|6
|7
|8
|-
!Stadium areas covered
|1,3,4,6,7
|4,7,8,12
|2,5,9,11,13
|1,2,14,15
|3,6,10,12,14
|8,14,15
|1,2,6,11
|1,2,4,6,8,12
|}

We can then represent the above information using binary values. If stadium area <math>i</math> can be covered by camera location <math>j</math>, then we have <math>y_{ij} = 1</math>. If not, <math>y_{ij} = 0</math>. For instance, stadium area 1 is covered by camera location 1, so <math>y_{11} = 1</math>, while stadium area 1 is not covered by camera location 2, so <math>y_{12} = 0</math>. The values of the binary variables <math>y_{ij}</math> are given in Table 2 below (blank cells correspond to 0):
{| class="wikitable"
|+Table 2 Binary Table (All Camera Locations and Stadium Areas)
!
!Camera1
!Camera2
!Camera3
!Camera4
!Camera5
!Camera6
!Camera7
!Camera8
|-
!Stadium1
|1
|
|
|1
|
|
|1
|1
|-
!Stadium2
|
|
|1
|1
|
|
|1
|1
|-
!Stadium3
|1
|
|
|
|1
|
|
|
|-
!Stadium4
|1
|1
|
|
|
|
|
|1
|-
!Stadium5
|
|
|1
|
|
|
|
|
|-
!Stadium6
|1
|
|
|
|1
|
|1
|1
|-
!Stadium7
|1
|1
|
|
|
|
|
|
|-
!Stadium8
|
|1
|
|
|
|1
|
|1
|-
!Stadium9
|
|
|1
|
|
|
|
|
|-
!Stadium10
|
|
|
|
|1
|
|
|
|-
!Stadium11
|
|
|1
|
|
|
|1
|
|-
!Stadium12
|
|1
|
|
|1
|
|
|1
|-
!Stadium13
|
|
|1
|
|
|
|
|
|-
!Stadium14
|
|
|
|1
|1
|1
|
|
|-
!Stadium15
|
|
|
|1
|
|1
|
|
|}

We introduce another binary variable <math>z_j</math> to indicate whether a camera is installed at location <math>j</math>: <math>z_j = 1</math> if a camera is installed at location <math>j</math>, and <math>z_j = 0</math> if not.

Our objective is to minimize <math>\sum_{j=1}^8 z_j</math>. For each stadium area <math>i</math>, there is a constraint requiring that the area be covered by at least one camera location. For instance, for stadium area 1, we have <math>z_1 + z_4 + z_7 + z_8 \geq 1</math>, while for stadium area 2, we have <math>z_3 + z_4 + z_7 + z_8 \geq 1</math>. All 15 constraints, corresponding to the 15 stadium areas, are listed below:

minimize <math>\sum_{j=1}^8 z_j</math>

''s.t. Constraints 1 to 15 are satisfied:''

<math> z_1 + z_4 + z_7 + z_8 \geq 1 \quad (1)</math>

<math> z_3 + z_4 + z_7 + z_8 \geq 1 \quad (2)</math>

<math> z_1 + z_5 \geq 1 \quad (3)</math>

<math> z_1 + z_2 + z_8 \geq 1 \quad (4)</math>

<math> z_3 \geq 1 \quad (5)</math>

<math>z_1 + z_5 + z_7 + z_8 \geq 1 \quad (6)</math>

<math>z_1 + z_2 \geq 1 \quad (7)</math>

<math>z_2 + z_6 + z_8 \geq 1 \quad (8)</math>

<math>z_3 \geq 1 \quad (9)</math>

<math>z_5 \geq 1 \quad (10)</math>

<math>z_3 + z_7 \geq 1 \quad (11)</math>

<math>z_2 + z_5 + z_8 \geq 1 \quad (12)</math>

<math>z_3 \geq 1 \quad (13)</math>

<math>z_4 + z_5 + z_6 \geq 1 \quad (14)</math>

<math>z_4 + z_6 \geq 1 \quad (15)</math>

From constraints 5, 9, and 13, we obtain <math>z_3 = 1</math>. Thus, we no longer need constraints 2 and 11, as they are satisfied when <math>z_3 = 1</math>. With <math>z_3 = 1</math> determined, the remaining problem is:

minimize <math>\sum_{j=1}^8 z_j</math>,

s.t.:

<math>z_1 + z_4 + z_7 + z_8 \geq 1 (1)</math>

<math>z_1 + z_5 \geq 1 (3)</math>

<math>z_1 + z_2 + z_8 \geq 1 (4)</math>

<math>z_1 + z_5 + z_7 + z_8 \geq 1 (6)</math>

<math>z_1 + z_2 \geq 1 (7)</math>

<math>z_2 + z_6 + z_8 \geq 1 (8)</math>

<math>z_5 \geq 1 (10)</math>

<math>z_2 + z_5 + z_8 \geq 1 (12)</math>

<math>z_4 + z_5 + z_6 \geq 1 (14)</math>

<math>z_4 + z_6 \geq 1 (15)</math>

Next, constraint 10 requires <math>z_5 \geq 1</math>, so <math>z_5</math> must equal 1. With <math>z_5 = 1</math>, constraints 3, 6, 12, and 14 are satisfied regardless of the other <math>z</math> values. Comparing constraints 4 and 7, constraint 4 is satisfied whenever constraint 7 is satisfied, since the <math>z</math> values are nonnegative, so constraint 4 is no longer needed. The remaining constraints are:

minimize <math>\sum_{j=1}^8 z_j</math>

s.t.:

<math>z_1 + z_4 + z_7 + z_8 \geq 1 (1)</math>

<math>z_1 + z_2 \geq 1 (7)</math>

<math>z_2 + z_6 + z_8 \geq 1 (8)</math>

<math>z_4 + z_6 \geq 1 (15)</math>

The next step is to focus on constraints 7 and 15. Choosing exactly one variable from each of these two constraints gives 4 combinations of <math>z_1, z_2, z_4, z_6</math> values:

<math>A: z_1 = 1, z_2 = 0, z_4 = 1, z_6 = 0</math>

<math>B: z_1 = 1, z_2 = 0, z_4 = 0, z_6 = 1</math>

<math>C: z_1 = 0, z_2 = 1, z_4 = 1, z_6 = 0</math>

<math>D: z_1 = 0, z_2 = 1, z_4 = 0, z_6 = 1</math>

We can then examine each combination and determine the <math>z_7, z_8</math> values needed for constraints 1 and 8 to be satisfied.

Combination <math>A</math>: constraint 1 already satisfied, we need <math>z_8 = 1</math> to satisfy constraint 8.

Combination <math>B</math>: constraint 1 already satisfied, constraint 8 already satisfied.

Combination <math>C</math>: constraint 1 already satisfied, constraint 8 already satisfied.

Combination <math>D</math>: we need <math>z_7 = 1</math> or <math>z_8 = 1</math> to satisfy constraint 1, while constraint 8 already satisfied.

Our final step is to compare the four combinations. Since our objective is to minimize <math>\sum_{j=1}^8 z_j</math>, and combinations <math>B</math> and <math>C</math> require the fewest <math>z_j</math> to be 1 (combinations <math>A</math> and <math>D</math> each need one additional camera), they are the optimal solutions.

To conclude, our two solutions are:

Solution 1: <math>z_1 = 1, z_3 = 1, z_5 = 1, z_6 = 1</math>

Solution 2: <math>z_2 = 1, z_3 = 1, z_4 = 1, z_5 = 1</math>

The minimum number of cameras that we need to install is 4.
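The elimination steps above can be replayed with a short script. The following is a minimal sketch, not part of the original solution: the variable names (<code>rows</code>, <code>forced</code>, <code>reduced</code>) are chosen here for illustration, and the last step simply enumerates the few remaining camera choices instead of reasoning by cases.

```python
from itertools import combinations

# Camera location -> stadium areas it covers (Table 1).
cameras = {
    1: {1, 3, 4, 6, 7},
    2: {4, 7, 8, 12},
    3: {2, 5, 9, 11, 13},
    4: {1, 2, 14, 15},
    5: {3, 6, 10, 12, 14},
    6: {8, 14, 15},
    7: {1, 2, 6, 11},
    8: {1, 2, 4, 6, 8, 12},
}

# Constraint a: the set of camera locations that can cover stadium area a.
rows = {a: {j for j, areas in cameras.items() if a in areas} for a in range(1, 16)}

# Step 1: a constraint with a single variable forces that camera to be installed.
forced = set().union(*(cams for cams in rows.values() if len(cams) == 1))

# Step 2: drop every constraint already satisfied by a forced camera.
remaining = {a: cams for a, cams in rows.items() if not cams & forced}

# Step 3: drop a constraint when another remaining constraint is a strict
# subset of it (satisfying the smaller one satisfies the larger one).
def is_dominated(a):
    return any(other < remaining[a] for b, other in remaining.items() if b != a)

reduced = {a: cams for a, cams in remaining.items() if not is_dominated(a)}

# Step 4: find the smallest sets of extra cameras that satisfy what is left.
free = sorted(set().union(*reduced.values()))
solutions = []
for k in range(1, len(free) + 1):
    solutions = [
        sorted(forced | set(extra))
        for extra in combinations(free, k)
        if all(cams & set(extra) for cams in reduced.values())
    ]
    if solutions:
        break

print(sorted(forced))   # cameras fixed by single-variable constraints
print(sorted(reduced))  # indices of the constraints that survive the reductions
print(solutions)        # all minimum-size camera selections
```

Under these assumptions the script fixes cameras 3 and 5, keeps constraints 1, 7, 8, and 15, and reports the same two four-camera solutions.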

'''We now consider solving the problem using the greedy algorithm.'''

We have a set <math>U</math> (stadium areas) that needs to be covered with <math>C</math> (camera locations).

<math>U = \{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15\}</math>

<math>C = \{C_1,C_2,C_3,C_4,C_5,C_6,C_7,C_8\}</math>

<math>C_1 = \{1,3,4,6,7\} </math>

<math>C_2 = \{4,7,8,12\}</math>

<math>C_3 = \{2,5,9,11,13\}</math>

<math>C_4 = \{1,2,14,15\}</math>

<math>C_5 = \{3,6,10,12,14\}</math>

<math>C_6 = \{8,14,15\}</math>

<math>C_7 = \{1,2,6,11\}</math>

<math>C_8 = \{1,2,4,6,8,12\} </math>

The cost of each camera location is the same in this case, and we only aim to minimize the total number of cameras used, so we can assume the cost of each <math>C_j</math> to be 1.

Let <math>I</math> represent the set of elements included so far. Initialize <math>I</math> to be empty.

First Iteration:

The per new element cost for <math>C_1 = 1/5</math>, for <math>C_2 = 1/4</math>, for <math>C_3 = 1/5</math>, for <math>C_4 = 1/4</math>, for <math>C_5 = 1/5</math>, for <math>C_6 = 1/3</math>, for <math>C_7 = 1/4</math>, for <math>C_8 = 1/6</math>

Since <math>C_8</math> has minimum value, <math>C_8</math> is added, and <math>I</math> becomes <math>\{1,2,4,6,8,12\}</math>.

Second Iteration:

<math>I</math> = <math>\{1,2,4,6,8,12\}</math>

The per new element cost for <math>C_1 = 1/2</math>, for <math>C_2 = 1/1</math>, for <math>C_3 = 1/4</math>, for <math>C_4 = 1/2</math>, for <math>C_5 = 1/3</math>, for <math>C_6 = 1/2</math>, for <math>C_7 = 1/1</math>

Since <math>C_3</math> has minimum value, <math>C_3</math> is added, and <math>I</math> becomes <math>\{1,2,4,5,6,8,9,11,12,13\}</math>.

Third Iteration:

<math>I</math> = <math>\{1,2,4,5,6,8,9,11,12,13\}</math>

The per new element cost for <math>C_1 = 1/2</math>, for <math>C_2 = 1/1</math>, for <math>C_4 = 1/2</math>, for <math>C_5 = 1/3</math>, for <math>C_6 = 1/2</math>, for <math>C_7 = 1/0</math> (its only remaining element, 11, is now covered by <math>C_3</math>)

Since <math>C_5</math> has minimum value, <math>C_5</math> is added, and <math>I</math> becomes <math>\{1,2,3,4,5,6,8,9,10,11,12,13,14\}</math>.

Fourth Iteration:

<math>I</math> = <math>\{1,2,3,4,5,6,8,9,10,11,12,13,14\}</math>

The per new element cost for <math>C_1 = 1/1</math>, for <math>C_2 = 1/1</math>, for <math>C_4 = 1/1</math>, for <math>C_6 = 1/1</math>, for <math>C_7 = 1/0</math>

Since <math>C_1</math>, <math>C_2</math>, <math>C_4</math>, and <math>C_6</math> all have the same meaningful value, the tie can be broken arbitrarily. Only stadium areas <math>7</math> and <math>15</math> remain uncovered: <math>C_1</math> or <math>C_2</math> adds <math>7</math> to <math>I</math>, and <math>C_4</math> or <math>C_6</math> adds <math>15</math> to <math>I</math>. No single camera location covers both, so the greedy algorithm needs two more selections (a fourth and a fifth iteration), for example <math>C_1</math> and <math>C_6</math>, or <math>C_2</math> and <math>C_6</math>.

<math>I</math> becomes <math>\{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15\}</math>.

Two of the possible greedy solutions are:

Option 1: <math>C_8</math> + <math>C_3</math> + <math>C_5</math> + <math>C_6</math> + <math>C_1</math>

Option 2: <math>C_8</math> + <math>C_3</math> + <math>C_5</math> + <math>C_6</math> + <math>C_2</math>

The greedy algorithm does not provide the optimal solution in this case.

The elimination approach above shows that the minimum number of cameras that we need to install is 4, whereas the greedy algorithm returns a solution with 5 cameras.
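The greedy iterations can be recomputed mechanically. The sketch below is an illustration under stated assumptions: every camera has unit cost, a set that adds no new element is skipped (its ratio would be 1/0), and ties are broken by the smallest camera index, which is a choice made here and not prescribed by the example. The function name <code>greedy_trace</code> is hypothetical.

```python
from fractions import Fraction

# Camera location -> stadium areas it covers (Table 1).
cameras = {
    1: {1, 3, 4, 6, 7},
    2: {4, 7, 8, 12},
    3: {2, 5, 9, 11, 13},
    4: {1, 2, 14, 15},
    5: {3, 6, 10, 12, 14},
    6: {8, 14, 15},
    7: {1, 2, 6, 11},
    8: {1, 2, 4, 6, 8, 12},
}

def greedy_trace(sets, universe, cost=1):
    """Return the greedy picks and, per iteration, the cost per new element."""
    uncovered = set(universe)
    picks, trace = [], []
    while uncovered:
        # Ratio cost / (number of newly covered elements); sets adding nothing are skipped.
        ratios = {
            j: Fraction(cost, len(sets[j] & uncovered))
            for j in sorted(sets)
            if sets[j] & uncovered
        }
        if not ratios:
            raise ValueError("the sets do not cover the universe")
        # min() keeps the first minimum, so ties go to the smallest index.
        best = min(ratios, key=ratios.get)
        picks.append(best)
        trace.append(ratios)
        uncovered -= sets[best]
    return picks, trace

picks, trace = greedy_trace(cameras, range(1, 16))
for step, ratios in enumerate(trace, start=1):
    print(step, {j: str(r) for j, r in ratios.items()}, "-> pick", picks[step - 1])
print(len(picks), "cameras:", picks)
```

With this tie-breaking rule the greedy algorithm takes five iterations, one more camera than the optimal solution found by elimination.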

== Applications==

The set covering problem has a wide range of applications, and its usefulness is especially evident in industrial and governmental planning. Variations of the set covering problem that are of practical significance include the following.
;The optimal location problem

This set covering problem is concerned with maximizing the coverage of public facilities placed at different locations. <ref name="three"> R. Church and C. ReVelle, [https://link.springer.com/article/10.1007/BF01942293 "The maximal covering location problem]," ''Papers of the Regional Science Association'', vol. 32, pp. 101-118, 1974. </ref> Consider the problem of placing fire stations to serve the towns of some city. <ref name="four"> E. Aktaş, Ö. Özaydın, B. Bozkaya, F. Ülengin, and Ş. Önsel, [https://pubsonline.informs.org/doi/10.1287/inte.1120.0671 "Optimizing Fire Station Locations for the Istanbul Metropolitan Municipality]," ''Interfaces'', vol. 43, pp. 240-255, 2013. </ref> If each fire station can serve its town and all adjacent towns, we can formulate a set covering problem where each subset consists of a town and its adjacent towns. The problem is then solved to minimize the required number of fire stations to serve the whole city.

Let <math> y_i </math> be the decision variable corresponding to choosing to build a fire station at town <math> i </math>. Let <math> S_i </math> be a subset of towns including town <math> i </math> and all its neighbors. The problem is then formulated as follows.

minimize <math>\sum_{i=1}^n y_i</math>

such that <math> \sum_{j\in S_i} y_j \geq 1, \forall i</math>

A real-world case study involving optimizing fire station locations in Istanbul is analyzed in this reference. <ref name="four" /> The Istanbul municipality serves 790 subdistricts, which should all be covered by a fire station. Each subdistrict is considered covered if it has a neighboring district (a district at most 5 minutes away) that has a fire station. For detailed computational analysis, we refer the reader to the mentioned academic paper.
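As an illustration of how the subsets <math> S_i </math> could be built from travel-time data, the sketch below uses four hypothetical towns and made-up travel times; only the 5-minute threshold follows the description above. The helper names <code>coverage_sets</code> and <code>uncovered_towns</code> are assumptions of this sketch, not taken from the cited study.

```python
# Hypothetical symmetric travel times in minutes between four towns.
travel_minutes = {
    "A": {"A": 0, "B": 4, "C": 9, "D": 12},
    "B": {"A": 4, "B": 0, "C": 5, "D": 8},
    "C": {"A": 9, "B": 5, "C": 0, "D": 3},
    "D": {"A": 12, "B": 8, "C": 3, "D": 0},
}

def coverage_sets(times, limit=5):
    """S_i: towns (including i itself) within `limit` minutes of town i."""
    towns = sorted(times)
    return {i: {j for j in towns if times[i][j] <= limit} for i in towns}

def uncovered_towns(stations, cover):
    """Towns i whose covering constraint sum_{j in S_i} y_j >= 1 is violated."""
    built = set(stations)
    return [i for i in sorted(cover) if not cover[i] & built]

S = coverage_sets(travel_minutes)
print(S)
print(uncovered_towns(["B"], S))       # a single station at B leaves D uncovered
print(uncovered_towns(["B", "C"], S))  # stations at B and C cover every town
```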
; The optimal route selection problem

Consider the problem of selecting the optimal bus routes to place pothole detectors. Due to the scarcity of the physical sensors, the problem does not allow for placing a detector on every road. The task of finding the maximum coverage using a limited number of detectors could be formulated as a set covering problem. <ref name="five"> J. Ali and V. Dyo, [https://www.scitepress.org/Link.aspx?doi=10.5220/0006469800830088 "Coverage and Mobile Sensor Placement for Vehicles on Predetermined Routes: A Greedy Heuristic Approach]," ''Proceedings of the 14th International Joint Conference on E-Business and Telecommunications'', pp. 83-88, 2017. </ref> <ref name="eleven"> P.H. Cruz Caminha, R. De Souza Couto, L.H. Maciel Kosmalski Costa, A. Fladenmuller, and M. Dias de Amorim, [https://www.mdpi.com/1424-8220/18/6/1976 "On the Coverage of Bus-Based Mobile Sensing]," ''Sensors'', 2018. </ref> Specifically, we are given a collection of bus routes '''''R''''', where each route is divided into segments. Route <math> i </math> is denoted by <math> R_i </math>, and segment <math> j </math> is denoted by <math> S_j </math>. The segments of two different routes can overlap, and each segment is associated with a length <math> a_j </math>. The goal is then to select the routes that maximize the total covered distance.

This is quite different from other applications because it results in a maximization formulation, rather than a minimization formulation. Suppose we want to use at most <math> k </math> different routes. We want to find <math> k </math> routes that maximize the total length of the covered segments. Let <math> x_i </math> be the binary decision variable corresponding to selecting route <math> R_i </math>, and let <math> y_j </math> be the binary decision variable associated with covering segment <math> S_j </math>. Let us also denote the set of routes that cover segment <math> j </math> by <math> C_j </math>. The problem is then formulated as follows.

<math>
\begin{align}
\text{max} & ~~ \sum_{j} a_jy_j\\
\text{s.t.} & ~~ \sum_{i\in C_j} x_i \geq y_j \quad \forall j \\
& ~~ \sum_{i} x_i \leq k \\
& ~~ x_i,y_{j} \in \{0,1\}
\end{align}
</math>

The work by Ali and Dyo explores a greedy approximation algorithm to solve an optimal selection problem including 713 bus routes in Greater London. <ref name="five" /> Using only 14% of the routes (100 routes), the greedy algorithm returns a solution that covers 25% of the segments in Greater London. For details of the approximation algorithm and the real-world case study, we refer the reader to this reference. <ref name="five" /> For a significantly larger case study involving 5747 buses covering 5060 km, we refer the reader to this academic article. <ref name="eleven" />
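To make the maximization formulation concrete, the following is a minimal sketch of a generic greedy heuristic for it: at each step it selects the route that adds the most uncovered length, using at most <math> k </math> routes. It is written for illustration and is not claimed to be the exact algorithm of the cited work; the routes, segment names, and lengths are hypothetical.

```python
def greedy_max_coverage(routes, length, k):
    """Pick up to k routes, each time adding the most uncovered segment length."""
    covered = set()
    chosen = []
    for _ in range(k):
        def gain(r):
            # Length of the segments of route r that are not covered yet.
            return sum(length[s] for s in routes[r] - covered)

        candidates = [r for r in sorted(routes) if r not in chosen]
        if not candidates:
            break
        best = max(candidates, key=gain)
        if gain(best) == 0:
            break  # no remaining route adds any new length
        chosen.append(best)
        covered |= routes[best]
    return chosen, sum(length[s] for s in covered)

# Hypothetical data: route -> segments it traverses, and segment lengths a_j.
routes = {"R1": {"a", "b"}, "R2": {"b", "c"}, "R3": {"d"}}
length = {"a": 2, "b": 3, "c": 4, "d": 1}

chosen, total = greedy_max_coverage(routes, length, k=2)
print(chosen, total)
```

In this small instance the overlap on segment b is counted only once, which is what the constraint linking <math> x_i </math> and <math> y_j </math> enforces in the formulation.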
;The airline crew scheduling problem

An important application of large-scale set covering is the airline crew scheduling problem, which pertains to assigning airline staff to work shifts. <ref name="two" /> <ref name="six"> E. Marchiori and A. Steenbeek, [https://link.springer.com/chapter/10.1007/3-540-45561-2_36 "An Evolutionary Algorithm for Large Scale Set Covering Problems with Application to Airline Crew Scheduling]," ''Real-World Applications of Evolutionary Computing. EvoWorkshops 2000. Lecture Notes in Computer Science'', 2000. </ref> Thinking of the collection of flights as a universal set to be covered, we can formulate a set covering problem to search for the optimal assignment of employees to flights. Due to the complexity of airline schedules, this problem is usually divided into two subproblems: crew pairing and crew assignment. We refer the interested reader to this survey, which contains several problem instances with the number of flights ranging from 1013 to 7765 flights, for a detailed analysis of the formulation and algorithms that pertain to this significant application. <ref name="two" /> <ref name="eight"> A. Kasirzadeh, M. Saddoune, and F. Soumis [https://www.sciencedirect.com/science/article/pii/S2192437620300820?via%3Dihub "Airline crew scheduling: models, algorithms, and data sets]," ''EURO Journal on Transportation and Logistics'', vol. 6, pp. 111-137, 2017. </ref>

==Conclusion ==

The set covering problem, which aims to find the least number of subsets that cover some universal set, is a widely known NP-hard combinatorial problem. Due to its applicability to route planning and airline crew scheduling, several methods have been proposed to solve it. Its straightforward formulation allows for the use of off-the-shelf optimizers to solve it. Moreover, heuristic techniques and greedy algorithms can be used to solve large-scale set covering problems for industrial applications.

== References ==
<references />

Set covering problem

2020-12-21T10:37:02Z

Wc593: /* Integer linear program formulation */

Authors: Sherry Liang, Khalid Alanazi, Kumail Al Hamoud
 
Steward: Allen Yang, Fengqi You

== Introduction ==

The set covering problem is a significant NP-hard problem in combinatorial optimization. Given a collection of elements, the set covering problem aims to find the minimum number of sets that incorporate (cover) all of these elements. <ref name="one"> T. Grossman and A. Wool, [https://www.sciencedirect.com/science/article/abs/pii/S0377221796001610 "Computational experience with approximation algorithms for the set covering problem]," ''European Journal of Operational Research'', vol. 101, pp. 81-92, 1997. </ref>

The importance of the set covering problem has two main aspects: one is pedagogical, and the other is practical.

First, because many greedy approximation methods have been proposed for this combinatorial problem, studying it gives insight into the use of approximation algorithms in solving NP-hard problems. Thus, it is a prime example in teaching computational algorithms. We present a preview of these methods in a later section, and we refer the interested reader to these references for a deeper discussion. <ref name="one" /> <ref name="seven"> P. Slavík, [https://www.sciencedirect.com/science/article/abs/pii/S0196677497908877 "A Tight Analysis of the Greedy Algorithm for Set Cover]," ''Journal of Algorithms'', vol. 25, pp. 237-245, 1997. </ref> <ref name="nine"> T. Grossman and A. Wool, [https://www.sciencedirect.com/science/article/abs/pii/S0377221796001610 "What Is the Best Greedy-like Heuristic for the Weighted Set Covering Problem?]," ''Operations Research Letters'', vol. 44, pp. 366-369, 2016. </ref>

Second, many problems in different industries can be formulated as set covering problems. For example, scheduling machines to perform certain jobs can be thought of as covering the jobs. Picking the optimal location for a cell tower so that it covers the maximum number of customers is another set covering application. Moreover, this problem has many applications in the airline industry, and it was explored on an industrial scale as early as the 1970s. <ref name="two"> J. Rubin, [https://www.jstor.org/stable/25767684?seq=1 "A Technique for the Solution of Massive Set Covering Problems, with Application to Airline Crew Scheduling]," ''Transportation Science'', vol. 7, pp. 34-48, 1973. </ref>

== Problem formulation ==
In the set covering problem, two sets are given: a set <math> U </math> of elements and a set <math> S </math> of subsets of the set <math> U </math>. Each subset in <math> S </math> is associated with a predetermined cost, and the union of all the subsets covers the set <math> U </math>. This combinatorial problem then concerns finding the optimal number of subsets whose union covers the universal set while minimizing the total cost.<ref name="one"> T. Grossman and A. Wool, [https://www.sciencedirect.com/science/article/abs/pii/S0377221796001610 "Computational experience with approximation algorithms for the set covering problem]," ''European Journal of Operational Research'', vol. 101, pp. 81-92, 1997. </ref> <ref name="twelve"> Williamson, David P., and David B. Shmoys. “The Design of Approximation Algorithms” [https://www.designofapproxalgs.com/book.pdf]. “Cambridge University Press”, 2011. </ref>

The mathematical formulation of the set covering problem is defined as follows. We define <math> U </math> = { <math> u_1,..., u_m </math>} as the universe of elements and <math> S </math> = { <math> s_1,..., s_n </math>} as a collection of subsets such that <math> s_i \subseteq U </math> and the union of the <math> s_i</math> covers all elements in <math> U </math> (i.e. <math>\cup</math><math> s_i</math> = <math> U </math> ). Additionally, each set <math> s_i</math> must cover at least one element of <math> U </math> and has an associated cost <math> c_i</math> such that <math> c_i > 0</math>. The objective is to find the minimum-cost sub-collection of sets <math> X </math> <math>\subseteq</math> <math> S </math> that covers all the elements in the universe <math> U </math>.

== Integer linear program formulation ==
An integer linear program (ILP) model can be formulated for the minimum set covering problem as follows:<ref name="one"> T. Grossman and A. Wool, [https://www.sciencedirect.com/science/article/abs/pii/S0377221796001610 "Computational experience with approximation algorithms for the set covering problem]," ''European Journal of Operational Research'', vol. 101, pp. 81-92, 1997. </ref>

'''Decision variables'''

<math> y_i = \begin{cases} 1, & \text{if subset }i\text{ is selected} \\ 0, & \text{otherwise } \end{cases}</math>

'''Objective function'''

minimize <math>\sum_{i=1}^n c_i y_i</math>

'''Constraints '''

<math> \sum_{j:\, u_i \in s_j} y_j \geq 1, \forall i= 1,....,m</math>

<math> y_i \in \{0, 1\}, \forall i = 1,....,n</math>

The objective function <math>\sum_{i=1}^n c_i y_i</math> minimizes the total cost of the selected subsets <math> s_i</math>; when all costs are equal, this is the same as minimizing the number of selected subsets. The first constraint requires that every element <math> i </math> in the universe <math> U </math> be covered by at least one selected subset, and the second constraint <math> y_i \in \{0, 1\} </math> indicates that the decision variables are binary, which means that every set is either in the set cover or not.
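For very small instances, the ILP can be checked by enumerating every binary vector <math> y </math>. The sketch below is only an illustration on a hypothetical instance with four elements and four weighted subsets; the function name <code>solve_set_cover_ilp</code> is an assumption of this sketch, and enumeration is not a practical method for larger problems.

```python
from itertools import product

def solve_set_cover_ilp(subsets, costs, universe):
    """Enumerate all y in {0,1}^n and return the cheapest feasible one."""
    n = len(subsets)
    best_y, best_cost = None, None
    for y in product((0, 1), repeat=n):
        # Covering constraint: each element lies in at least one selected subset.
        feasible = all(
            any(y[i] and u in subsets[i] for i in range(n)) for u in universe
        )
        if not feasible:
            continue
        cost = sum(c * y_i for c, y_i in zip(costs, y))  # objective sum c_i y_i
        if best_cost is None or cost < best_cost:
            best_y, best_cost = y, cost
    return best_y, best_cost

# Hypothetical instance: s_1..s_4 with costs c_1..c_4.
universe = {1, 2, 3, 4}
subsets = [{1, 2}, {3, 4}, {1, 2, 3, 4}, {2, 3}]
costs = [2, 2, 5, 1]

best_y, best_cost = solve_set_cover_ilp(subsets, costs, universe)
print(best_y, best_cost)
```

Here selecting the single subset that covers everything costs 5, while the two cheaper subsets together cost 4, so the minimum-cost cover is not the one with the fewest subsets.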

Set covering problems are significant NP-hard optimization problems, which means that no polynomial-time algorithm for solving them exactly is known, and the running time of known exact methods can grow exponentially with the size of the problem. Therefore, approximation algorithms are used to find near-optimal solutions to large-scale problems in polynomial time. In subsequent sections, we will cover two of the most widely used approximation methods for the set cover problem, which are linear program relaxation methods and classical greedy algorithms. <ref name="seven" />

== Approximation via LP relaxation and rounding ==
Set covering is a classical integer programming problem, and solving integer programs in general is NP-hard. Therefore, one approach to achieve an <math> O</math>(log<math>n</math>) approximation to the set covering problem in polynomial time is solving via linear programming (LP) relaxation algorithms <ref name="one"> T. Grossman and A. Wool, [https://www.sciencedirect.com/science/article/abs/pii/S0377221796001610 "Computational experience with approximation algorithms for the set covering problem]," ''European Journal of Operational Research'', vol. 101, pp. 81-92, 1997. </ref> <ref name="twelve"> Williamson, David P., and David B. Shmoys. “The Design of Approximation Algorithms” [https://www.designofapproxalgs.com/book.pdf]. “Cambridge University Press”, 2011. </ref>. In LP relaxation, we relax the integrality requirement into linear constraints. For instance, if we replace the constraints <math> y_i \in \{0, 1\}</math> with the constraints <math> 0 \leq y_i \leq 1 </math>, we obtain the following LP problem that can be solved in polynomial time:

minimize <math>\sum_{i=1}^n c_i y_i</math>

subject to <math> \sum_{j:\, u_i \in s_j} y_j \geq 1, \forall i= 1,....,m</math>

<math> 0 \leq y_i \leq 1, \forall i = 1,....,n</math>

The above LP formulation is a relaxation of the original ILP set cover problem. This means that every feasible solution of the integer program is also feasible for this LP program. Additionally, the value of any feasible solution for the integer program is the same value in LP since the objective functions of both integer and linear programs are the same. Solving the LP program will result in an optimal solution that is a lower bound for the original integer program since the minimization of LP finds a feasible solution of lowest possible values. Moreover, we use LP rounding algorithms to directly round the fractional LP solution to an integral combinatorial solution as follows:
 

'''Deterministic rounding algorithm'''
 

Suppose we have an optimal solution <math> z^* </math> for the linear programming relaxation of the set cover problem. We round the fractional solution <math> z^* </math> to an integer solution <math> z </math> using an LP rounding algorithm. In general, there are two approaches to rounding: deterministic and randomized rounding algorithms. In this section, we explain the deterministic algorithm. In this approach, we include subset <math> s_i </math> in our solution if <math> z_i^* \geq 1/d </math>, where <math> d </math> is the maximum number of sets in which any element appears. In practice, we set <math> z_i </math> to be as follows:<ref name="twelve"> Williamson, David P., and David B. Shmoys. “The Design of Approximation Algorithms” [https://www.designofapproxalgs.com/book.pdf]. “Cambridge University Press”, 2011. </ref>

<math> z_i = \begin{cases} 1, & \text{if } z_i^* \geq 1/d \\ 0, & \text{otherwise } \end{cases}</math>

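The rounding algorithm is an approximation algorithm for the set cover problem. The algorithm runs in polynomial time, and <math> z </math> is a feasible solution to the integer program: each covering constraint contains at most <math> d </math> variables whose values sum to at least 1, so at least one of them is at least <math> 1/d </math> and is rounded up to 1.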
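The rounding rule can be sketched in a few lines. In the example below no LP solver is called: the fractional solution <code>z_star</code> is supplied by hand for a small hypothetical instance (three elements, three subsets of two elements each), for which the value 1/2 on every subset is an optimal LP solution. The function name <code>round_lp_solution</code> is an assumption of this sketch.

```python
from fractions import Fraction

def round_lp_solution(sets, universe, z_star):
    """Deterministic rounding: keep subset i when z*_i >= 1/d."""
    # d = maximum number of subsets in which any single element appears.
    d = max(sum(1 for s in sets.values() if u in s) for u in universe)
    threshold = Fraction(1, d)
    chosen = [name for name in sorted(sets) if z_star[name] >= threshold]
    return d, chosen

# Hypothetical instance; z_star is an assumed optimal LP solution (value 3/2).
universe = {1, 2, 3}
sets = {"s1": {1, 2}, "s2": {2, 3}, "s3": {1, 3}}
z_star = {"s1": Fraction(1, 2), "s2": Fraction(1, 2), "s3": Fraction(1, 2)}

d, chosen = round_lp_solution(sets, universe, z_star)
covered = set().union(*(sets[name] for name in chosen))
print(d, chosen, covered == universe)
```

In this instance the rounded solution uses all three subsets, while two subsets already suffice, which shows that the rounded cover is feasible but not necessarily optimal.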

== Greedy approximation algorithm ==
Greedy algorithms can be used to find optimal or near-optimal solutions for large-scale set covering instances in polynomial time. <ref name="seven" /> <ref name="nine" /> The greedy heuristic applies an iterative process that, at each stage, selects the subset covering the largest number of uncovered elements in the universe <math> U </math> and removes those elements from the uncovered set, until all elements are covered. <ref name="ten"> V. Chvatal, [https://pubsonline.informs.org/doi/abs/10.1287/moor.4.3.233 "Greedy Heuristic for the Set-Covering Problem]," ''Mathematics of Operations Research'', vol. 4, pp. 233-235, 1979. </ref> Let <math> T </math> be the set that contains the covered elements, and <math> U </math> be the set that contains the elements <math> Y </math> that are still uncovered. At the beginning of the iteration, <math> T </math> is empty and all elements <math> Y \in U </math>. We iteratively select the set in <math> S </math> that covers the largest number of elements in <math> U </math> and add its elements to the covered elements in <math> T </math>. An example of this algorithm is presented below.

'''Greedy algorithm for minimum set cover example: '''

Step 0: <math> \quad </math> <math> T \leftarrow \emptyset, \; X \leftarrow \emptyset </math> <math> \quad \quad \quad </math> { <math> T </math> stores the covered elements, <math> X </math> stores the selected subsets }

Step 1: <math> \quad </math> '''While''' <math> U \neq \emptyset </math> '''do:''' <math> \quad </math> { <math> U </math> stores the uncovered elements <math> Y </math>}

Step 2: <math> \quad \quad \quad </math> select <math> s_i \in S </math> that covers the highest number of elements in <math> U </math>

Step 3: <math> \quad \quad \quad </math> add the elements of <math> s_i </math> to <math> T </math> and add <math> s_i </math> to <math> X </math>

Step 4: <math> \quad \quad \quad </math> remove the elements of <math> s_i </math> from <math> U </math>

Step 5: <math> \quad </math> '''End while'''

Step 6: <math> \quad </math> '''Return''' <math> X </math>
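The steps above can be sketched in Python as follows. This is a minimal illustration for the unit-cost case: <code>U</code> and <code>T</code> follow the pseudocode, the list <code>X</code> of selected subsets is a name chosen for this sketch, ties are broken by the alphabetically first label (an assumption), and the small instance is hypothetical.

```python
def greedy_set_cover(universe, subsets):
    """Unit-cost greedy: repeatedly take the subset covering most uncovered elements."""
    U = set(universe)  # uncovered elements
    T = set()          # covered elements
    X = []             # labels of the selected subsets, in order of selection
    while U:
        # max() keeps the first maximum, so ties go to the first label.
        best = max(sorted(subsets), key=lambda name: len(subsets[name] & U))
        if not subsets[best] & U:
            raise ValueError("the subsets do not cover the universe")
        X.append(best)
        T |= subsets[best]
        U -= subsets[best]
    return X, T

# Hypothetical instance.
universe = {1, 2, 3, 4, 5}
subsets = {"A": {1, 2, 3}, "B": {2, 4}, "C": {3, 4}, "D": {4, 5}}

X, T = greedy_set_cover(universe, subsets)
print(X, T)
```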

==Numerical Example==
Consider a simple example where we assign cameras to different locations. Each location covers some stadium areas, and our goal is to install the fewest cameras such that all stadium areas are covered. We have stadium areas from 1 to 15, and possible camera locations from 1 to 8.

We are given that camera location 1 covers stadium areas {1,3,4,6,7}, camera location 2 covers stadium areas {4,7,8,12}, while the remaining camera locations and the stadium areas that the cameras can cover are given in table 1 below:
{| class="wikitable"
|+Table 1 Camera Location vs Stadium Area
|-
!camera Location
|1
|2
|3
|4
|5
|6
|7
|8
|-
!stadium area
|1,3,4,6,7
|4,7,8,12
|2,5,9,11,13
|1,2,14,15
|3,6,10,12,14
|8,14,15
|1,2,6,11
|1,2,4,6,8,12
|}

We can then represent the above information using binary values. If stadium area <math>i</math> can be covered by camera location <math>j</math>, then <math>y_{ij} = 1</math>; otherwise, <math>y_{ij} = 0</math>. For instance, stadium area 1 is covered by camera location 1, so <math>y_{11} = 1</math>, while stadium area 1 is not covered by camera location 2, so <math>y_{12} = 0</math>. The values of the binary variables <math>y_{ij}</math> are given in Table 2 below (blank cells denote 0):
{| class="wikitable"
|+Table 2 Binary Table (All Camera Locations and Stadium Areas)
!
!Camera1
!Camera2
!Camera3
!Camera4
!Camera5
!Camera6
!Camera7
!Camera8
|-
!Stadium1
|1
|
|
|1
|
|
|1
|1
|-
!Stadium2
|
|
|1
|1
|
|
|1
|1
|-
!Stadium3
|1
|
|
|
|1
|
|
|
|-
!Stadium4
|1
|1
|
|
|
|
|
|1
|-
!Stadium5
|
|
|1
|
|
|
|
|
|-
!Stadium6
|1
|
|
|
|1
|
|1
|1
|-
!Stadium7
|1
|1
|
|
|
|
|
|
|-
!Stadium8
|
|1
|
|
|
|1
|
|1
|-
!Stadium9
|
|
|1
|
|
|
|
|
|-
!Stadium10
|
|
|
|
|1
|
|
|
|-
!Stadium11
|
|
|1
|
|
|
|1
|
|-
!Stadium12
|
|1
|
|
|1
|
|
|1
|-
!Stadium13
|
|
|1
|
|
|
|
|
|-
!Stadium14
|
|
|
|1
|1
|1
|
|
|-
!Stadium15
|
|
|
|1
|
|1
|
|
|}

We introduce another binary variable <math>z_j</math> to indicate whether a camera is installed at location <math>j</math>: <math>z_j = 1</math> if a camera is installed at location <math>j</math>, and <math>z_j = 0</math> otherwise.

Our objective is to minimize <math>\sum_{j=1}^8 z_j</math>. For each stadium area <math>i</math>, there is a constraint that the area has to be covered by at least one installed camera. For instance, for stadium area 1, we have <math>z_1 + z_4 + z_7 + z_8 \geq 1</math>, while for stadium area 2, we have <math>z_3 + z_4 + z_7 + z_8 \geq 1</math>. The 15 constraints that correspond to the 15 stadium areas are listed below:

minimize <math>\sum_{j=1}^8 z_j</math>

''s.t. Constraints 1 to 15 are satisfied:''

<math> z_1 + z_4 + z_7 + z_8 \geq 1 (1)</math>

<math> z_3 + z_4 + z_7 + z_8 \geq 1 (2)</math>

<math> z_1 + z_5 \geq 1 (3)</math>

<math> z_1 + z_2 + z_8 \geq 1 (4)</math>

<math> z_3 \geq 1 (5)</math>

<math>z_1 + z_5 + z_7 + z_8 \geq 1 (6)</math>

<math>z_1 + z_2 \geq 1 (7)</math>

<math>z_2 + z_6 + z_8 \geq 1 (8)</math>

<math>z_3 \geq 1 (9)</math>

<math>z_5 \geq 1 (10)</math>

<math>z_3 + z_7 \geq 1 (11)</math>

<math>z_2 + z_5 + z_8 \geq 1 (12)</math>

<math>z_3 \geq 1 (13)</math>

<math>z_4 + z_5 + z_6 \geq 1 (14)</math>

<math>z_4 + z_6 \geq 1 (15)</math>

From constraints 5, 9, and 13, we obtain <math>z_3 = 1</math>. Thus, we no longer need constraints 2 and 11, as they are satisfied when <math>z_3 = 1</math>. With <math>z_3 = 1</math> determined, the remaining problem is:

minimize <math>\sum_{j=1}^8 z_j</math>,

s.t.:

<math>z_1 + z_4 + z_7 + z_8 \geq 1 (1)</math>

<math>z_1 + z_5 \geq 1 (3)</math>

<math>z_1 + z_2 + z_8 \geq 1 (4)</math>

<math>z_1 + z_5 + z_7 + z_8 \geq 1 (6)</math>

<math>z_1 + z_2 \geq 1 (7)</math>

<math>z_2 + z_6 + z_8 \geq 1 (8)</math>

<math>z_5 \geq 1 (10)</math>

<math>z_2 + z_5 + z_8 \geq 1 (12)</math>

<math>z_4 + z_5 + z_6 \geq 1 (14)</math>

<math>z_4 + z_6 \geq 1 (15)</math>

Next, constraint 10 requires <math>z_5 \geq 1</math>, so <math>z_5</math> must equal 1. With <math>z_5 = 1</math>, constraints 3, 6, 12, and 14 are satisfied regardless of the other <math>z</math> values. Comparing constraints 4 and 7, constraint 4 is satisfied whenever constraint 7 is satisfied, since the <math>z</math> values are nonnegative, so constraint 4 is no longer needed. The remaining constraints are:

minimize <math>\sum_{j=1}^8 z_j</math>

s.t.:

<math>z_1 + z_4 + z_7 + z_8 \geq 1 (1)</math>

<math>z_1 + z_2 \geq 1 (7)</math>

<math>z_2 + z_6 + z_8 \geq 1 (8)</math>

<math>z_4 + z_6 \geq 1 (15)</math>

The next step is to focus on constraints 7 and 15. Choosing exactly one variable from each of these two constraints gives 4 combinations of <math>z_1, z_2, z_4, z_6</math> values:

<math>A: z_1 = 1, z_2 = 0, z_4 = 1, z_6 = 0</math>

<math>B: z_1 = 1, z_2 = 0, z_4 = 0, z_6 = 1</math>

<math>C: z_1 = 0, z_2 = 1, z_4 = 1, z_6 = 0</math>

<math>D: z_1 = 0, z_2 = 1, z_4 = 0, z_6 = 1</math>

We can then examine each combination and determine the <math>z_7, z_8</math> values needed for constraints 1 and 8 to be satisfied.

Combination <math>A</math>: constraint 1 already satisfied, we need <math>z_8 = 1</math> to satisfy constraint 8.

Combination <math>B</math>: constraint 1 already satisfied, constraint 8 already satisfied.

Combination <math>C</math>: constraint 1 already satisfied, constraint 8 already satisfied.

Combination <math>D</math>: we need <math>z_7 = 1</math> or <math>z_8 = 1</math> to satisfy constraint 1, while constraint 8 already satisfied.

Our final step is to compare the four combinations. Since our objective is to minimize <math>\sum_{j=1}^8 z_j</math>, and combinations <math>B</math> and <math>C</math> require the fewest <math>z_j</math> to be 1 (combinations <math>A</math> and <math>D</math> each need one additional camera), they are the optimal solutions.

To conclude, our two solutions are:

Solution 1: <math>z_1 = 1, z_3 = 1, z_5 = 1, z_6 = 1</math>

Solution 2: <math>z_2 = 1, z_3 = 1, z_4 = 1, z_5 = 1</math>

The minimum number of cameras that we need to install is 4.

'''We now consider solving the problem using the greedy algorithm.'''

We have a set <math>U</math> (stadium areas) that needs to be covered with <math>C</math> (camera locations).

<math>U = \{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15\}</math>

<math>C = \{C_1,C_2,C_3,C_4,C_5,C_6,C_7,C_8\}</math>

<math>C_1 = \{1,3,4,6,7\} </math>

<math>C_2 = \{4,7,8,12\}</math>

<math>C_3 = \{2,5,9,11,13\}</math>

<math>C_4 = \{1,2,14,15\}</math>

<math>C_5 = \{3,6,10,12,14\}</math>

<math>C_6 = \{8,14,15\}</math>

<math>C_7 = \{1,2,6,11\}</math>

<math>C_8 = \{1,2,4,6,8,12\} </math>

The cost of each camera location is the same in this case, and we only aim to minimize the total number of cameras used, so we can assume the cost of each <math>C_j</math> to be 1.

Let <math>I</math> represent the set of elements included so far. Initialize <math>I</math> to be empty.

First Iteration:

The per new element cost for <math>C_1 = 1/5</math>, for <math>C_2 = 1/4</math>, for <math>C_3 = 1/5</math>, for <math>C_4 = 1/4</math>, for <math>C_5 = 1/5</math>, for <math>C_6 = 1/3</math>, for <math>C_7 = 1/4</math>, for <math>C_8 = 1/6</math>

Since <math>C_8</math> has minimum value, <math>C_8</math> is added, and <math>I</math> becomes <math>\{1,2,4,6,8,12\}</math>.

Second Iteration:

<math>I</math> = <math>\{1,2,4,6,8,12\}</math>

The per new element cost for <math>C_1 = 1/2</math>, for <math>C_2 = 1/1</math>, for <math>C_3 = 1/4</math>, for <math>C_4 = 1/2</math>, for <math>C_5 = 1/3</math>, for <math>C_6 = 1/2</math>, for <math>C_7 = 1/1</math>

Since <math>C_3</math> has minimum value, <math>C_3</math> is added, and <math>I</math> becomes <math>\{1,2,4,5,6,8,9,11,12,13\}</math>.

Third Iteration:

<math>I</math> = <math>\{1,2,4,5,6,8,9,11,12,13\}</math>

The per new element cost for <math>C_1 = 1/2</math>, for <math>C_2 = 1/1</math>, for <math>C_4 = 1/2</math>, for <math>C_5 = 1/3</math>, for <math>C_6 = 1/2</math>, for <math>C_7 = 1/0</math> (all elements of <math>C_7</math> are already in <math>I</math>, so it adds no new element)

Since <math>C_5</math> has minimum value, <math>C_5</math> is added, and <math>I</math> becomes <math>\{1,2,3,4,5,6,8,9,10,11,12,13,14\}</math>.

Fourth Iteration:

<math>I</math> = <math>\{1,2,3,4,5,6,8,9,10,11,12,13,14\}</math>

The per new element cost for <math>C_1 = 1/1</math>, for <math>C_2 = 1/1</math>, for <math>C_4 = 1/1</math>, for <math>C_6 = 1/1</math>, for <math>C_7 = 1/0</math>

Only elements <math>7</math> and <math>15</math> remain uncovered. Since <math>C_1</math>, <math>C_2</math>, <math>C_4</math>, and <math>C_6</math> all have the same finite cost, the tie can be broken arbitrarily, and two more sets are needed: <math>C_1</math> or <math>C_2 </math> adds <math>7</math> to <math>I</math>, and <math>C_4</math> or <math>C_6</math> adds <math>15</math> to <math>I</math>. For example, we can choose both <math>C_1</math> and <math>C_6</math> or both <math>C_2</math> and <math>C_6</math>.

<math>I</math> becomes <math>\{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15\}</math>.

The solutions we obtained are:

Option 1: <math>C_8</math> + <math>C_3</math> + <math>C_5</math> + <math>C_6</math> + <math>C_1</math>

Option 2: <math>C_8</math> + <math>C_3</math> + <math>C_5</math> + <math>C_6</math> + <math>C_2</math>

Replacing <math>C_6</math> with <math>C_4</math> in either option also gives a cover with five sets.

The greedy algorithm does not provide the optimal solution in this case.

The usual elimination algorithm gives a minimum of 4 cameras, whereas the greedy algorithm returns a solution that uses 5 cameras.
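
The greedy iterations can be reproduced with a short script. This is a minimal sketch of the unit-cost greedy rule (with equal costs, the smallest cost per new element is the same as the largest number of new elements). The tie-breaking rule, which prefers the larger index, is an assumption of this sketch and is chosen only because it reproduces Option 2; the function name <code>greedy_set_cover</code> is illustrative.

```python
# Stadium areas covered by each camera location (data of this example).
COVERAGE = {
    1: {1, 3, 4, 6, 7},
    2: {4, 7, 8, 12},
    3: {2, 5, 9, 11, 13},
    4: {1, 2, 14, 15},
    5: {3, 6, 10, 12, 14},
    6: {8, 14, 15},
    7: {1, 2, 6, 11},
    8: {1, 2, 4, 6, 8, 12},
}
AREAS = set(range(1, 16))


def greedy_set_cover(coverage, universe):
    """Repeatedly add the set that covers the most still-uncovered elements."""
    covered, chosen = set(), []
    while covered != universe:
        # Assumption: ties go to the larger index (max keeps the first maximum).
        order = sorted(coverage, reverse=True)
        best = max(order, key=lambda j: len(coverage[j] - covered))
        if not coverage[best] - covered:
            raise ValueError("the given sets do not cover the universe")
        chosen.append(best)
        covered |= coverage[best]
    return chosen


selection = greedy_set_cover(COVERAGE, AREAS)
print(selection)
```

With this tie-breaking rule the sketch selects <math>C_8</math>, <math>C_3</math>, <math>C_5</math>, <math>C_6</math>, and <math>C_2</math> in that order, that is, five cameras instead of the optimal four. A different tie-breaking rule changes which sets are picked in the last two steps but not their number.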

== Applications==

The set covering problem has a wide range of applications, and its usefulness is particularly evident in industrial and governmental planning. Variations of the set covering problem that are of practical significance include the following.
;The optimal location problem

This set covering problem is concerned with maximizing the coverage of some public facilities placed at different locations. <ref name="three"> R. Church and C. ReVelle, [https://link.springer.com/article/10.1007/BF01942293 "The maximal covering location problem]," ''Papers of the Regional Science Association'', vol. 32, pp. 101-118, 1974. </ref> Consider the problem of placing fire stations to serve the towns of some city. <ref name="four"> E. Aktaş, Ö. Özaydın, B. Bozkaya, F. Ülengin, and Ş. Önsel, [https://pubsonline.informs.org/doi/10.1287/inte.1120.0671 "Optimizing Fire Station Locations for the Istanbul Metropolitan Municipality]," ''Interfaces'', vol. 43, pp. 240-255, 2013. </ref> If each fire station can serve its town and all adjacent towns, we can formulate a set covering problem where each subset consists of a set of adjacent towns. The problem is then solved to minimize the required number of fire stations to serve the whole city.

Let <math> y_i </math> be the decision variable corresponding to choosing to build a fire station at town <math> i </math>. Let <math> S_i </math> be a subset of towns including town <math> i </math> and all its neighbors. The problem is then formulated as follows.

minimize <math>\sum_{i=1}^n y_i</math>

subject to <math> \sum_{j\in S_i} y_j \geq 1, \quad \forall i</math>

<math> y_i \in \{0,1\}, \quad \forall i</math>
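
The covering constraint can be illustrated on a toy instance. The sketch below assumes a hypothetical adjacency map between six towns (the data and the names <code>NEIGHBORS</code>, <code>is_feasible</code>, and <code>fewest_stations</code> are illustrative only) and finds a smallest feasible set of stations by enumeration rather than with an integer programming solver.

```python
from itertools import combinations

# Hypothetical symmetric adjacency between six towns (illustrative data only).
NEIGHBORS = {
    1: {2, 3},
    2: {1, 3, 4},
    3: {1, 2, 5},
    4: {2, 6},
    5: {3, 6},
    6: {4, 5},
}
# S[i]: town i together with all of its neighbors.
S = {i: {i} | adjacent for i, adjacent in NEIGHBORS.items()}


def is_feasible(stations):
    """Check that every town i has at least one station among the towns in S[i]."""
    return all(any(j in stations for j in S[i]) for i in S)


def fewest_stations():
    """Try station sets of increasing size and return the first feasible one."""
    towns = sorted(S)
    for size in range(1, len(towns) + 1):
        for combo in combinations(towns, size):
            if is_feasible(set(combo)):
                return combo
    return None


best = fewest_stations()
print(best)
```

For this toy map no single station serves all six towns, and the first feasible pair found is towns 1 and 6. For instances of realistic size, an integer programming solver would replace the enumeration.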

A real-world case study involving optimizing fire station locations in Istanbul is analyzed in this reference. <ref name="four" /> The Istanbul municipality serves 790 subdistricts, which should all be covered by a fire station. Each subdistrict is considered covered if it has a neighboring district (a district at most 5 minutes away) that has a fire station. For detailed computational analysis, we refer the reader to the mentioned academic paper.
; The optimal route selection problem

Consider the problem of selecting the optimal bus routes to place pothole detectors. Due to the scarcity of the physical sensors, the problem does not allow for placing a detector on every road. The task of finding the maximum coverage using a limited number of detectors could be formulated as a set covering problem. <ref name="five"> J. Ali and V. Dyo, [https://www.scitepress.org/Link.aspx?doi=10.5220/0006469800830088 "Coverage and Mobile Sensor Placement for Vehicles on Predetermined Routes: A Greedy Heuristic Approach]," ''Proceedings of the 14th International Joint Conference on E-Business and Telecommunications'', pp. 83-88, 2017. </ref> <ref name="eleven"> P.H. Cruz Caminha, R. De Souza Couto, L.H. Maciel Kosmalski Costa, A. Fladenmuller, and M. Dias de Amorim, [https://www.mdpi.com/1424-8220/18/6/1976 "On the Coverage of Bus-Based Mobile Sensing]," ''Sensors'', 2018. </ref> Specifically, we are given a collection of bus routes '''''R''''', where each route is divided into segments. Route <math> i </math> is denoted by <math> R_i </math>, and segment <math> j </math> is denoted by <math> S_j </math>. The segments of two different routes can overlap, and each segment is associated with a length <math> a_j </math>. The goal is then to select the routes that maximize the total covered distance.

This is quite different from other applications because it results in a maximization formulation, rather than a minimization formulation. Suppose we want to use at most <math> k </math> different routes. We want to find <math> k </math> routes that maximize the total length of the covered segments. Let <math> x_i </math> be the binary decision variable corresponding to selecting route <math> R_i </math>, and let <math> y_j </math> be the decision variable associated with covering segment <math> S_j </math>. Let us also denote the set of routes that cover segment <math> j </math> by <math> C_j </math>. The problem is then formulated as follows.

<math>
\begin{align}
\text{max} & ~~ \sum_{j} a_jy_j\\
\text{s.t} & ~~ \sum_{i\in C_j} x_i \geq y_j \quad \forall j \\
& ~~ \sum_{i} x_i = k \\
& ~~ x_i,y_{j} \in \{0,1\} \\
\end{align}
</math>

The work by Ali and Dyo explores a greedy approximation algorithm to solve an optimal selection problem including 713 bus routes in Greater London. <ref name="five" /> Using only 14% of the routes (100 routes), the greedy algorithm returns a solution that covers 25% of the segments in Greater London. For details of the approximation algorithm and the real-world case study, we refer the reader to this reference. <ref name="five" /> For a significantly larger case study involving 5747 buses covering 5060 km, we refer the reader to this academic article. <ref name="eleven" />
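
A generic greedy heuristic for the maximization model above can be sketched as follows. This is not necessarily the algorithm of the cited work; the routes, the segment lengths, and the names <code>ROUTES</code>, <code>LENGTH</code>, and <code>greedy_max_coverage</code> are hypothetical and serve only as an illustration.

```python
# Hypothetical routes, each given as the set of segment ids it traverses.
ROUTES = {
    1: {1, 2, 3},
    2: {3, 4},
    3: {4, 5, 6},
    4: {1, 6},
}
# Hypothetical segment lengths a_j.
LENGTH = {1: 2, 2: 1, 3: 3, 4: 2, 5: 2, 6: 1}


def covered_length(segments):
    """Total length of a set of segments."""
    return sum(LENGTH[j] for j in segments)


def greedy_max_coverage(routes, k):
    """Select k routes, each time the one adding the most uncovered length."""
    chosen, covered = [], set()
    for _ in range(min(k, len(routes))):
        candidates = [i for i in sorted(routes) if i not in chosen]
        # Ties go to the smallest route index (max keeps the first maximum).
        best = max(candidates, key=lambda i: covered_length(routes[i] - covered))
        chosen.append(best)
        covered |= routes[best]
    return chosen, covered_length(covered)


picked, total = greedy_max_coverage(ROUTES, k=2)
print(picked, total)
```

On this toy instance with <math> k = 2 </math>, the heuristic picks routes 1 and 3, which together cover all six segments. In general a greedy choice of this kind is only an approximation and need not be optimal.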
;The airline crew scheduling problem

An important application of large-scale set covering is the airline crew scheduling problem, which pertains to assigning airline staff to work shifts. <ref name="two" /> <ref name="six"> E. Marchiori and A. Steenbeek, [https://link.springer.com/chapter/10.1007/3-540-45561-2_36 "An Evolutionary Algorithm for Large Scale Set Covering Problems with Application to Airline Crew Scheduling]," ''Real-World Applications of Evolutionary Computing. EvoWorkshops 2000. Lecture Notes in Computer Science'', 2000. </ref> Thinking of the collection of flights as a universal set to be covered, we can formulate a set covering problem to search for the optimal assignment of employees to flights. Due to the complexity of airline schedules, this problem is usually divided into two subproblems: crew pairing and crew assignment. We refer the interested reader to this survey, which contains several problem instances with between 1013 and 7765 flights, for a detailed analysis of the formulation and algorithms that pertain to this significant application. <ref name="two" /> <ref name="eight"> A. Kasirzadeh, M. Saddoune, and F. Soumis, [https://www.sciencedirect.com/science/article/pii/S2192437620300820?via%3Dihub "Airline crew scheduling: models, algorithms, and data sets]," ''EURO Journal on Transportation and Logistics'', vol. 6, pp. 111-137, 2017. </ref>

==Conclusion ==

The set covering problem, which aims to find the least number of subsets that cover some universal set, is a widely known NP-hard combinatorial problem. Due to its applicability to route planning and airline crew scheduling, several methods have been proposed to solve it. Its straightforward formulation allows for the use of off-the-shelf optimizers to solve it. Moreover, heuristic techniques and greedy algorithms can be used to solve large-scale set covering problems for industrial applications.

== References ==
<references />

2020 Cornell Optimization Open Textbook Feedback

2020-12-21T10:33:48Z

Wc593: /* Facility location problem */

==[[Computational complexity]]==

* Numerical Example
*# Finding subsets of a set is NOT O(2n).
* Application
*# The applications mentioned need to be discussed further.

==[[Network flow problem]]==

* Real Life Applications
*# There is NO need to include code. Simply mention how the problem was coded along with details on the LP solver used.

==[[Interior-point method for LP]]==

* Introduction
*# Please type “minimize” and “subject to” in formal optimization problem form throughout the whole page.
* A section to discuss and/or illustrate the applications
*# Please type optimization problem in the formal form.

==[[Optimization with absolute values]]==

* An introduction of the topic
*# Add a few sentences on how absolute values convert an optimization problem into a nonlinear optimization problem.
* Applications
*# Inline equations at the beginning of this section are not formatted properly. Please fix the notation for expected return throughout the section.

==[[Matrix game (LP for game theory)]]==

* Theory and Algorithmic Discussion
*# <math>a_{ij}</math> are not defined in this section.

==[[Quasi-Newton methods]]==

* Theory and Algorithm
*# Please ensure that a few spaces are kept between the equations and equation numbers.

==[[Eight step procedures]]==

* Numerical Example
*# Data for the example Knapsack problem (b,w) are missing.
*# How to arrive at optimal solutions is missing.

==[[Set covering problem]]==

* Integer linear program formulation & Approximation via LP relaxation and rounding
*# Use proper math notation for “greater than or equal to”.
* Numerical Example
*# Please leave some space between equation and equation number.

==[[Quadratic assignment problem]]==

* Theory, methodology, and/or algorithmic discussions
*# Discuss dynamic programming and cutting plane solution techniques briefly.

==[[Newsvendor problem]]==

* Formulation
*# A math programming formulation of the optimization problem with objective function and constraints is expected for the formulation. Please add any variant of the newsvendor problem along with some operational constraints.
*# A mathematical presentation of the solution technique is expected. Please consider any distribution for R and present a solution technique for that specific problem.

==[[Mixed-integer cuts]]==

* Applications
*# MILP and their solution techniques involving cuts are extremely versatile. Yet, only two sentences are added to describe their applications. Please discuss their applications, preferably real-world applications, in brief. Example Wikis provided on the website could be used as a reference to do so.

==[[Column generation algorithms]]==

* Introduction
*# References at the end of the sentence should be placed after the period.
* Theory, methodology and algorithmic discussions
*# Some minor typos/article agreement issues exist “is not partical in real-world”.

==[[Heuristic algorithms]]==

* Methodology
*# Please use proper symbol for "greater than or equal to".
*# Greedy method to solve minimum spanning tree seems to be missing.

==[[Branch and cut]]==

* Methodology & Algorithm
*# Equation in most infeasible branching section is not properly formatted.
*# Step 2 appears abruptly in the algorithm and does not explain much. Please add more information regarding the same.
*# Step 5 contains latex code terms that are not properly formatted. Please fix the same.
*# Fix typos: e.g., repeated “for the current”.

== [[Mixed-integer linear fractional programming (MILFP)]] ==

* Application and Modeling for Numerical Examples
*# Please check the index notation in the Mass Balance Constraint.

==[[Fuzzy programming]]==

* Applications
*# Applications of fuzzy programming are quite versatile. Please discuss a few of the mentioned applications briefly. The provided example Wikis can be used as a reference to write this section.

==[[Adaptive robust optimization]]==

* Problem Formulation
*# Please check typos such as "Let ''u'' bee a vector".
*# The abbreviation KKT is not previously defined.

== [[Stochastic gradient descent]] ==
* Numerical Example
*# Amount of whitespace can be reduced by changing orientation of example dataset by converting it into a table containing 3 rows and 6 columns.

==[[RMSProp]]==

* Introduction
*# References at the end of the sentence should be placed after the period.
* Theory and Methodology
*# Please check grammar in this section.
* Applications and Discussion
*# The applications section does not contain any discussion on applications. Please mention a few applications of the widely used RMSprop and discuss them briefly.

==[[Adam]]==

* Background
*# References at the end of the sentence should be placed after the period.