# Nondifferentiable optimization

This web page is a duplicate of https://optimization.mccormick.northwestern.edu/index.php/Nondifferentiable_Optimization
Author Name: Nathanael Robinson
Steward: Dajun Yue and Fenqui You

# Background

## Introduction

Non-differentiable optimization is a category of optimization that deals with objective functions that, for a variety of reasons, are non-differentiable at some points. Such functions may be convex or non-convex, and they are generally non-smooth: although often continuous, they contain sharp points or corners at which no unique tangent exists. In practice, non-differentiable optimization encompasses a large variety of problems and no single method fits all of them, although a solution is often reached with the subgradient method. Non-differentiable functions arise frequently in real-world applications, notably in economics, where cost functions often include sharp points. Early work on the optimization of non-differentiable functions was started by the Soviet scientists Dubovitskii and Milyutin in the 1960s and led to continued research by Soviet scientists. The subject has remained an active field of study since, with different theories and methods applied to different cases.

## Cost Functions

In many cases, particularly in economics, the cost function that serves as the objective function of an optimization problem is non-differentiable. These non-smooth cost functions may include discontinuities and discontinuous gradients, and they are often seen in discontinuous physical processes. Finding optimal solutions for these cost functions matters to economists, but numerical methods run into a variety of issues on them, which leads to the need for special solution methods.

Figure: An example of a non-differentiable cost function, such as one that may be seen in economics.

# Solution Methods

Differentiable problems and differentiable cost functions can in general be solved with gradient-based analytical methods, such as the Kuhn-Tucker conditions, and with numerical methods such as steepest descent and conjugate gradient. However, the introduction of non-differentiable points in the function invalidates these methods: the gradient is undefined at a kink, so a steepest descent direction cannot be computed there. A common way to solve a non-differentiable cost function is to transform it into a non-linear programming model in which all of the new functions involved are differentiable, so that a solution can be found by ordinary means.
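As a minimal illustration of why gradient-based iterations struggle at a kink, the sketch below applies fixed-step steepest descent to ${\displaystyle f(x)=|x|}$. The test function, the starting point and the step size are illustrative assumptions, not taken from the methods described here.

```python
def sign(x):
    """Slope of |x| away from the kink; returns 0 at x == 0 by convention."""
    return (x > 0) - (x < 0)


def fixed_step_descent(x0, step, iters):
    """Fixed-step steepest descent on f(x) = |x| (illustrative test function)."""
    x = x0
    trajectory = [x]
    for _ in range(iters):
        # The slope is always +1 or -1, so the update never shrinks near the kink.
        x = x - step * sign(x)
        trajectory.append(x)
    return trajectory


trajectory = fixed_step_descent(x0=0.75, step=0.5, iters=6)
print(trajectory)
```

With these values the iterates jump between 0.25 and -0.25 and never reach the minimizer at 0, because the slope does not vanish as the kink is approached.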

## Simple Kink Case

Figure: An example of a two-parameter kink approximation.

A common case of a non-differentiable function is the simple kink. The problem is of the form:
${\displaystyle Min\quad f(x)}$
${\displaystyle S.t.\quad x\in Q\subset R^{n}}$

The function ${\displaystyle f(x)}$ is non-differentiable because of several simple kinks which can be modeled by:
${\displaystyle \gamma [f_{i}(x)]=max\{0,f_{i}(x)\}\qquad i\in I}$

If these simple kinks were removed, the function would be differentiable across the entire domain. Some other types of non-differentiable objective functions can be modeled as simple kinks, which allows the same type of solution.
The approach to the simple kink case is to approximate each non-differentiable kink with a smooth function, so that the entire problem can be solved by conventional means. This requires that the kinks be the only feature that renders the function non-differentiable. A simple kink ${\displaystyle \gamma [f(x)]}$ can be modeled by a two-parameter approximation ${\displaystyle {\tilde {\gamma }}[f(x),y,c]}$:

${\displaystyle {\tilde {\gamma }}[f(x),y,c]={\begin{cases}f(x)-(1-y)^{2}/2c,&{\text{if }}(1-y)/c\leq f(x),\\yf(x)+{\tfrac {1}{2}}c[f(x)]^{2},&{\text{if }}-y/c\leq f(x)\leq (1-y)/c\\-y^{2}/2c,&{\text{if }}f(x)\leq -y/c\end{cases}}}$
where ${\displaystyle y}$ and ${\displaystyle c}$ are parameters with ${\displaystyle 0\leq y\leq 1,\quad 0<c}$.
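The piecewise definition above can be written out directly. The following is a minimal sketch in which `t` stands for the value ${\displaystyle f(x)}$; the function names `kink` and `kink_approx` are hypothetical and chosen only for illustration.

```python
def kink(t):
    """Exact simple kink, gamma[t] = max{0, t}."""
    return max(0.0, t)


def kink_approx(t, y, c):
    """Two-parameter approximation of max{0, t}, with 0 <= y <= 1 and c > 0."""
    if not (0.0 <= y <= 1.0 and c > 0.0):
        raise ValueError("requires 0 <= y <= 1 and c > 0")
    if t >= (1.0 - y) / c:
        # Linear branch: the kink shifted down by a constant.
        return t - (1.0 - y) ** 2 / (2.0 * c)
    if t <= -y / c:
        # Constant branch.
        return -(y ** 2) / (2.0 * c)
    # Quadratic branch joining the two pieces smoothly.
    return y * t + 0.5 * c * t ** 2


for c in (1.0, 10.0, 100.0):
    print(c, kink_approx(0.0, 0.5, c), kink_approx(2.0, 0.5, c))
```

Working through the three branches shows that the approximation never exceeds ${\displaystyle \max\{0,t\}}$ and lies below it by at most ${\displaystyle \max\{y^{2},(1-y)^{2}\}/2c}$, so a larger ${\displaystyle c}$ gives a tighter fit.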

Each kink ${\displaystyle \gamma _{i}}$ is replaced in the function by its two-parameter approximation, so that the new function ${\displaystyle {\tilde {f}}(x)}$ is differentiable for parameters ${\displaystyle c>0}$ and ${\displaystyle 0\leq y\leq 1}$. The problem can now be solved iteratively by adjusting the parameters ${\displaystyle c}$ and ${\displaystyle y}$ and solving the optimization problem
${\displaystyle Min\quad {\tilde {f}}(x)}$
${\displaystyle s.t.\quad x\in Q\subset R^{n}}$

A solution ${\displaystyle x_{k}}$ to the approximated problem is obtained. The problem is then re-solved with an updated parameter ${\displaystyle c_{k+1}=\beta c_{k}}$, where ${\displaystyle \beta >0}$ (choosing ${\displaystyle \beta >1}$ increases ${\displaystyle c}$ and tightens the approximation); ${\displaystyle y_{k+1}}$ can also be updated if necessary. A new minimization is then carried out for the ${\displaystyle k+1}$ case. The procedure is repeated until a value of ${\displaystyle f(x)}$ that is consistent with the ${\displaystyle c}$ and ${\displaystyle y}$ parameters is reached.
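The iteration can be sketched on a small one-dimensional problem. Everything specific here is an assumption made for illustration: the test objective ${\displaystyle (x-1)^{2}+3\max\{0,x\}}$, whose minimizer sits at the kink ${\displaystyle x=0}$, the fixed value ${\displaystyle y=0.5}$, the factor ${\displaystyle \beta =10}$, and the use of bisection as the inner solver.

```python
def kink_approx_grad(t, y, c):
    """Derivative with respect to t of the two-parameter kink approximation."""
    if t >= (1.0 - y) / c:
        return 1.0
    if t <= -y / c:
        return 0.0
    return y + c * t


def solve_smoothed(y, c, lo=-10.0, hi=10.0, iters=200):
    """Minimize (x - 1)^2 + 3 * approx(x) by bisection on its increasing derivative."""
    def dphi(x):
        return 2.0 * (x - 1.0) + 3.0 * kink_approx_grad(x, y, c)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dphi(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def continuation(c0=1.0, beta=10.0, y=0.5, outer_iters=6):
    """Re-solve the smoothed problem while updating c_{k+1} = beta * c_k."""
    c = c0
    history = []
    for _ in range(outer_iters):
        x_k = solve_smoothed(y, c)
        history.append((c, x_k))
        c *= beta  # y is held fixed in this sketch
    return history


for c, x_k in continuation():
    print(c, x_k)
```

For this test problem the smoothed minimizer works out to ${\displaystyle x_{k}=0.5/(2+3c_{k})}$, so the iterates approach the true minimizer ${\displaystyle x=0}$ as ${\displaystyle c}$ grows.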

## ${\displaystyle \varepsilon }$-Subgradient Method

If the non-differentiable function is convex and subject to convex constraints, then the ${\displaystyle \varepsilon }$-subgradient method can be applied. This method is a descent algorithm for minimization problems, provided that they are convex.
With this method the constraints are not considered explicitly; instead, the objective function is assigned the value ${\displaystyle +\infty }$ outside the feasible set. Minimizing ${\displaystyle g(\cdot )}$ over the set ${\displaystyle X}$ is then equivalent to minimizing the extended real-valued function ${\displaystyle f(x)=g(x)+\delta (x|X)}$, where ${\displaystyle \delta (\cdot |X)}$ is the indicator function of ${\displaystyle X}$ (zero on ${\displaystyle X}$ and ${\displaystyle +\infty }$ elsewhere). The solution is reached through a four-step procedure; these steps rest on a series of propositions that are detailed in [1].
Step 1: Select a vector ${\displaystyle x_{0}}$ such that ${\displaystyle f(x_{0})<\infty }$, a scalar ${\displaystyle \varepsilon _{0}>0}$ and a scalar ${\displaystyle a}$ with ${\displaystyle 0<a<1}$.

Step 2: Given ${\displaystyle x_{n}}$ and ${\displaystyle \varepsilon _{n}>0}$, set ${\displaystyle \varepsilon _{n+1}=a^{k}\varepsilon _{n}}$, where ${\displaystyle k}$ is the smallest non-negative integer such that ${\displaystyle 0\not \in \partial _{\varepsilon _{n+1}}f(x_{n})}$. Here ${\displaystyle \partial _{\varepsilon }f(x)}$ denotes the ${\displaystyle \varepsilon }$-subdifferential of ${\displaystyle f}$ at ${\displaystyle x}$.

Step 3: Find a vector ${\displaystyle y_{n}}$ such that
${\displaystyle \sup _{x^{*}\in \partial _{\varepsilon _{n+1}}f(x_{n})}\langle y_{n},x^{*}\rangle <0}$

Step 4: Set ${\displaystyle x_{n+1}=x_{n}+\lambda _{n}y_{n}}$, where ${\displaystyle \lambda _{n}>0}$ is such that
${\displaystyle f(x_{n})-f(x_{n+1})>\varepsilon _{n+1}}$

Return to Step 2 and iterate until convergence. The method is guaranteed to converge, and every iteration makes progress: by Step 4, the objective decreases by more than ${\displaystyle \varepsilon _{n+1}}$.
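The four steps can be traced on a one-dimensional toy problem. The sketch below is not the general algorithm of [1]; it assumes the unconstrained function ${\displaystyle f(x)=|x|}$, for which the ${\displaystyle \varepsilon }$-subdifferential has the closed form ${\displaystyle \{g\in [-1,1]:gx\geq |x|-\varepsilon \}}$. The step rule ${\displaystyle \lambda _{n}=(\varepsilon _{n+1}+|x_{n}|)/2}$ and the stopping tolerance `tol` are further assumptions added for illustration.

```python
def eps_subdifferential_abs(x, eps):
    """Interval [lo, hi] equal to the eps-subdifferential of f(x) = |x| at x."""
    if x > 0:
        return max(-1.0, 1.0 - eps / x), 1.0
    if x < 0:
        return -1.0, min(1.0, -1.0 + eps / (-x))
    return -1.0, 1.0


def eps_descent_abs(x0, eps0=1.0, a=0.5, tol=1e-8, max_iter=1000):
    """Toy eps-subgradient descent on f(x) = |x| (closed-form 1D sketch)."""
    x, eps = x0, eps0  # Step 1
    for _ in range(max_iter):
        # Step 2: shrink eps until 0 is outside the eps-subdifferential.
        lo, hi = eps_subdifferential_abs(x, eps)
        while lo <= 0.0 <= hi:
            eps *= a
            if eps < tol:
                # 0 stayed inside for a tiny eps: x is nearly optimal.
                return x, eps
            lo, hi = eps_subdifferential_abs(x, eps)
        # Step 3: direction with negative product against every eps-subgradient.
        y = -1.0 if lo > 0.0 else 1.0
        # Step 4: this step decreases |x| by lam, which exceeds eps (assumed rule).
        lam = 0.5 * (eps + abs(x))
        x = x + lam * y
    return x, eps


x_final, eps_final = eps_descent_abs(3.0)
print(x_final, eps_final)
```

The stopping test relies on the fact that ${\displaystyle 0\in \partial _{\varepsilon }f(x)}$ holds exactly when ${\displaystyle x}$ is within ${\displaystyle \varepsilon }$ of the minimum value, so Step 2 failing for a very small ${\displaystyle \varepsilon }$ signals near-optimality.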

## Cutting Plane Methods

Cutting plane methods were first used for the minimization of convex non-differentiable functions. They use the subgradient inequality to approximate the function ${\displaystyle f}$ from below by
${\displaystyle f(x)\cong \max _{i\in I}\{f(x_{i})+\xi _{i}^{T}(x-x_{i})\}}$

where ${\displaystyle \xi _{i},\quad i\in I}$ are subgradients of ${\displaystyle f}$ at ${\displaystyle x_{i},\quad i\in I}$. Thus, the original problem is now formulated as

${\displaystyle \min _{x}\max _{i\in I}\{f(x_{i})+\xi _{i}^{T}(x-x_{i})\}}$
which is equivalent to the new problem

${\displaystyle Min\quad v}$
${\displaystyle s.t.\quad f(x_{i})+\xi _{i}^{T}(x-x_{i})\leq v\quad \forall i\in I}$

This new formulation has a linear objective and linear constraints in ${\displaystyle (x,v)}$, so it is differentiable and easier to deal with. However, it is only an approximation of the original problem, and it becomes a better approximation as more constraints are added to the model.
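A one-dimensional sketch of this idea follows. The test function ${\displaystyle f(x)=|x|+(x-1)^{2}}$, the interval ${\displaystyle [-3,3]}$ and the tolerance are illustrative assumptions. To stay self-contained, the model problem is solved by checking the interval endpoints and the pairwise intersections of the cuts, which is only valid in one dimension.

```python
def f(x):
    """Convex test function with a kink at x = 0 (illustrative assumption)."""
    return abs(x) + (x - 1.0) ** 2


def subgrad(x):
    """One valid subgradient of f at x."""
    s = 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)
    return s + 2.0 * (x - 1.0)


def model_min(cuts, lo, hi):
    """Minimize max_i {f(x_i) + xi_i * (x - x_i)} over [lo, hi] in 1D."""
    def model(x):
        return max(fx + g * (x - xi) for xi, fx, g in cuts)

    # The minimum of a piecewise-linear convex function on an interval lies
    # at an endpoint or at an intersection of two of its linear pieces.
    candidates = [lo, hi]
    for i in range(len(cuts)):
        for j in range(i + 1, len(cuts)):
            xi, fi, gi = cuts[i]
            xj, fj, gj = cuts[j]
            if gi != gj:
                x = ((fj - gj * xj) - (fi - gi * xi)) / (gi - gj)
                if lo <= x <= hi:
                    candidates.append(x)
    x_best = min(candidates, key=model)
    return x_best, model(x_best)


def cutting_plane(x0, lo, hi, tol=1e-6, max_iter=100):
    """Add one cut per iteration until the model value v is within tol of f."""
    cuts = [(x0, f(x0), subgrad(x0))]
    best_x, best_f = x0, f(x0)
    for _ in range(max_iter):
        x, v = model_min(cuts, lo, hi)  # v is a lower bound on the minimum
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
        if best_f - v <= tol:
            break
        cuts.append((x, fx, subgrad(x)))
    return best_x, best_f


x_best, f_best = cutting_plane(x0=-3.0, lo=-3.0, hi=3.0)
print(x_best, f_best)
```

In more than one dimension the model problem would normally be handed to a linear programming solver instead of the intersection search used here. For this test function the minimizer is ${\displaystyle x=0.5}$ with ${\displaystyle f(0.5)=0.75}$.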

A simple example of non-differentiable optimization is the approximation of a kink originating from an absolute value function. The simple function ${\displaystyle f(x)=|x|}$ is continuous over its entire domain but non-differentiable at ${\displaystyle x=0}$ because of a "kink", a point at which no unique tangent exists. Since the non-differentiable point of the function is known, an approximation with parameter ${\displaystyle t>0}$ can be introduced to relax and smooth the function near that point. This approximation can be modeled as
${\displaystyle {\begin{cases}-x&x\leq -t,\\{\tfrac {x^{2}}{2t}}+{\tfrac {t}{2}}&-t\leq x\leq t,\\x&x\geq t\end{cases}}}$
The quadratic piece matches both the value and the slope of ${\displaystyle |x|}$ at ${\displaystyle x=\pm t}$, so the approximation is differentiable everywhere.
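A short sketch of a smoothed absolute value is given below. It assumes the quadratic piece ${\displaystyle x^{2}/2t+t/2}$, chosen so that value and slope agree with ${\displaystyle |x|}$ at ${\displaystyle x=\pm t}$; the function names, starting point and step size are hypothetical.

```python
def smooth_abs(x, t):
    """Smoothed |x|: quadratic on [-t, t], equal to |x| outside (requires t > 0)."""
    if x <= -t:
        return -x
    if x >= t:
        return x
    return x * x / (2.0 * t) + t / 2.0


def smooth_abs_grad(x, t):
    """Derivative of smooth_abs; continuous because the slope is +/-1 at x = +/-t."""
    if x <= -t:
        return -1.0
    if x >= t:
        return 1.0
    return x / t


def gradient_descent(x0, t, step, iters):
    """Plain fixed-step gradient descent on the smoothed function."""
    x = x0
    for _ in range(iters):
        x = x - step * smooth_abs_grad(x, t)
    return x


x_min = gradient_descent(x0=0.75, t=0.1, step=0.05, iters=100)
print(x_min, smooth_abs(x_min, 0.1))
```

Inside ${\displaystyle [-t,t]}$ the gradient shrinks with ${\displaystyle x}$, so a fixed-step iteration settles at the minimizer. The price is an approximation error of at most ${\displaystyle t/2}$, attained at ${\displaystyle x=0}$.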