Frank-Wolfe

From Cornell University Computational Optimization Open Textbook - Optimization Wiki

Author: Adeline Tse, Jeff Cheung, Tucker Barrett (SYSEN 5800 Fall 2021)

Introduction

The Frank-Wolfe algorithm is an iterative first-order optimization algorithm for constrained convex optimization, first proposed by Marguerite Frank and Philip Wolfe of Princeton University in 1956.[1] It is also known as the “gradient and interpolation” algorithm or sometimes the “conditional gradient method,” as it uses a simplex-like step in each iteration: an initial feasible point is compared with a secondary basic feasible point that optimizes the linearized objective over the linear constraints, in order to find or approximate the optimal solution.

Major advantages of the Frank-Wolfe algorithm include that it is simple to implement; it is projection-free (that is, it does not require projections back onto the constraint set to ensure feasibility); and it generates iterates that are sparse with respect to the extreme points of the constraint set. Because each iteration requires only a linear minimization over the constraint set, the method combines simple implementation with low per-iteration complexity. It is utilized in artificial intelligence, machine learning, traffic assignment, signal processing, and many more applications.

Theory & Methodology

The Frank-Wolfe method optimizes by considering the linearization of the objective function f and moving the current iterate x towards the minimizer of this linear function over the feasible set.[2]
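Concretely, one iteration can be written as follows (a standard statement of the method; the step-size symbol <math display="inline">\gamma_k</math> is our notation):

<math display="inline">s_k = \operatorname{argmin}_{s \in D} \; s^T \nabla f(x_k), \qquad x_{k+1} = x_k + \gamma_k (s_k - x_k), \qquad \gamma_k \in [0,1],</math>

where <math display="inline">D</math> is the compact convex feasible set. Because <math display="inline">x_{k+1}</math> is a convex combination of two feasible points, it remains feasible without any projection.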

The Frank-Wolfe algorithm relies on a step size and postulated convexity; for a quadratic objective this means the matrix of the quadratic form is positive semidefinite. Just as a convex function on a convex set attains a global minimum at any local minimum, by the theory of nonlinear programming a concave quadratic function attains a global maximum at any local maximum on a convex constraint set.

The methodology of the Frank-Wolfe algorithm starts with obtaining the generalized Lagrange multipliers for a quadratic problem, PI. The multipliers found are considered solutions to a new quadratic problem, PII. The simplex method is then applied to the new problem, PII, and the gradient and interpolation method is used to make incremental steps towards the solution of the original quadratic problem, PI. This is performed by taking an initial feasible point, obtaining a second basic feasible point along the gradient of the objective function at the initial point, and maximizing the objective function along the segment that joins these two points. The maximum found along the segment is used as the starting point for the next iteration.[1] Under this method, the gap between the objective value and the optimum converges to zero, and in one of the iterations a secondary point yields a solution.

Algorithm & Proof

The Frank-Wolfe algorithm is shown to be feasible via the steps and proof outline below.

Step 1. Define initial solution

If <math display="inline">x_i</math> is an extreme point of the feasible region <math display="inline">S</math>, the initial arbitrary basic feasible solution can be considered as

<math display="inline">x^{(k)} = x_i \in S.</math>

The letter <math display="inline">k</math> denotes the number of iterations, starting from <math display="inline">k = 0</math>.

Step 2. Determine search direction

The search direction, that is, the direction vector, is

<math display="inline">d^{(k)} = \bar{x}^{(k)} - x^{(k)}.</math>

<math display="inline">x^{(k)}</math> and <math display="inline">\bar{x}^{(k)}</math> are feasible extreme points and belong to <math display="inline">S</math>, where <math display="inline">S</math> is convex.

Taking the first-order Taylor expansion of <math display="inline">f</math> around <math display="inline">x^{(k)}</math>, the problem is now reformulated to the LP

<math display="inline">\min_{\bar{x} \in S} \; \nabla f(x^{(k)})^T \bar{x},</math>

whose minimizer <math display="inline">\bar{x}^{(k)}</math> can be taken as an extreme point of <math display="inline">S</math>.
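This loop is short enough to state as code. Below is a minimal Python sketch of the generic iteration (our illustration, not part of the original article); the lmo callback and the 2/(k+2) step-size rule are standard choices assumed here:

```python
import numpy as np

def frank_wolfe(grad, lmo, x0, iters=50):
    """Generic Frank-Wolfe loop.

    grad: callable returning the gradient of the objective at x.
    lmo:  linear minimization oracle; returns an extreme point of the
          feasible set S minimizing the inner product with the gradient.
    x0:   feasible starting point.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(iters):
        g = grad(x)
        s = lmo(g)                  # extreme point minimizing the linearization
        gamma = 2.0 / (k + 2.0)     # standard diminishing step size
        x = x + gamma * (s - x)     # convex combination: stays feasible
    return x
```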

Numerical Example

Consider the problem <math display="inline">\min z = x^4 - 16x + y^2 - 10y,</math> subject to

<math display="inline">x - y \leq 1,</math> <math display="inline">2x + y \leq 5,</math> <math display="inline">x \geq 0,</math> <math display="inline">y \geq 0.</math>

Step 1: Define the initial solution

Begin by choosing the feasible point (0,0). The partial derivatives of the objective are

<math display="inline">\frac{\partial f}{\partial x}(x,y) = 4x^3 - 16,</math> <math display="inline">\frac{\partial f}{\partial y}(x,y) = 2y - 10,</math>

so the gradient at the initial point is

<math display="inline">\nabla f(0,0) = (-16, -10).</math>

Step 2: Determine the search direction

We will need to minimize the linearized objective

<math display="inline">z_0(x,y) = (-16, -10) \cdot (x,y)^T = -16x - 10y.</math>

Utilize the extreme points of the defined region to find the minimum.

Extreme points: <math display="inline">(0,0),</math> <math display="inline">(0,5),</math> <math display="inline">(1,0),</math> <math display="inline">(2,1)</math>

Evaluating <math display="inline">z_0</math> at these points gives <math display="inline">0, -50, -16,</math> and <math display="inline">-42</math> respectively, so the linearized problem is minimized at the extreme point <math display="inline">(0,5)</math>, and the search direction points from <math display="inline">(0,0)</math> towards <math display="inline">(0,5)</math>.
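These numbers can be checked with a few lines of Python (our verification script; the vertex list is the one given above):

```python
import numpy as np

# Gradient of f(x, y) = x^4 - 16x + y^2 - 10y
def grad(p):
    x, y = p
    return np.array([4 * x**3 - 16, 2 * y - 10])

vertices = np.array([[0, 0], [0, 5], [1, 0], [2, 1]])  # extreme points

g = grad(np.array([0.0, 0.0]))        # gradient at the initial point
scores = vertices @ g                 # linearized objective at each vertex
print(scores)                         # [  0. -50. -16. -42.]
print(vertices[np.argmin(scores)])    # [0 5]: the next search target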

Applications

Image and Video Co-localization

Figure: the process of co-localization, which captures common objects by using bounding boxes.[3]

Co-localization is a method in the machine learning field used to find a common object within a set of images or video frames. The process involves first generating multiple candidate bounding boxes for each image/video frame, then identifying the specific box in each image/video frame that jointly contains the common object.[4] This technique is important for analyzing online data: by recognizing the common objects, visual concepts can be tagged with specific labels, yielding useful data for marketing and product design purposes. Such development is especially useful for image and video hosting services such as Flickr and YouTube.

The formulation of co-localization using the Frank-Wolfe algorithm involves minimizing a standard quadratic programming problem and reducing it to a succession of simple linear integer problems. The benefit of the algorithm is that many more images can be analyzed simultaneously, and more bounding boxes are generated in each image/video frame for higher accuracy. By relaxing the discrete non-convex set to its convex hull, the constraints that define the original set can now be separated from one image/video to another. In the case of co-localizing videos, the linear minimization step over the temporal path constraints can be solved with a shortest-path algorithm.[3]
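To make the decoupling concrete, here is a hedged Python sketch (ours, not the authors' implementation) of the linear minimization oracle for the image case, assuming each image's box-selection variables have been relaxed to a probability simplex; minimizing a linear function over a product of simplices just selects the lowest-scoring candidate box in each image:

```python
import numpy as np

def lmo_product_of_simplices(grad_z):
    """argmin of <s, grad_z> over one probability simplex per image.

    grad_z: (n_images, n_boxes) gradient of the relaxed quadratic objective.
    Returns a 0/1 matrix putting all mass on the candidate box with the
    smallest gradient entry in each image -- an extreme point of the hull.
    """
    s = np.zeros_like(grad_z)
    s[np.arange(grad_z.shape[0]), np.argmin(grad_z, axis=1)] = 1.0
    return s
```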

Traffic Assignment Problem

A typical transportation network, also known as a flow network, can be modeled as a convex programming problem with linear constraints. With vertices called nodes and edges called arcs, a directed graph called a network is formed, and flow is monitored via movement from one node, the origin (O), to another node, the destination (D), along the arcs. The objective of this programming problem is to achieve user equilibrium, in which flows are such that no user can lower their travel cost by unilaterally switching routes.

The use of the Frank-Wolfe algorithm in this case means that the traffic assignment problem can be treated as a nonlinear multicommodity network flow problem with flow-dependent costs.[5] The algorithm is also able to process a larger set of constraints with more O-D pairs. Since the Frank-Wolfe algorithm uses a descent method to search in the direction of extreme points, each direction-finding step reduces to a sequence of shortest-route problems. The Frank-Wolfe algorithm is therefore known to solve the traffic assignment problem on networks with several hundred nodes efficiently, in a minimal number of iterations.
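As an illustration of that shortest-route structure, here is a hedged Python sketch of one Frank-Wolfe step for traffic assignment (our toy code, not from [5]; the BPR-style congestion function and its 0.15/4 coefficients are conventional assumptions): the linear minimization oracle is an all-or-nothing assignment of every O-D demand onto current shortest paths.

```python
import networkx as nx

def fw_step(G, demands, flow, k):
    """One Frank-Wolfe step for traffic assignment.

    G: directed graph; each edge carries 'fftt' (free-flow travel time)
       and 'cap' (capacity).
    demands: {(origin, destination): trips}; flow: {edge: current flow}.
    """
    # Update edge travel times with a BPR-style congestion function.
    for u, v, d in G.edges(data=True):
        d["time"] = d["fftt"] * (1 + 0.15 * (flow[(u, v)] / d["cap"]) ** 4)
    # LMO: all-or-nothing assignment onto current shortest paths
    # (an extreme point of the feasible flow polytope).
    target = {e: 0.0 for e in flow}
    for (o, dest), trips in demands.items():
        path = nx.shortest_path(G, o, dest, weight="time")
        for e in zip(path, path[1:]):
            target[e] += trips
    gamma = 2.0 / (k + 2.0)  # diminishing step size
    return {e: f + gamma * (target[e] - f) for e, f in flow.items()}
```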

Due to the algorithm’s effectiveness in optimizing the traffic assignment problem, it has been widely applied in fields beyond estimating the total demand for travel by road between pairs of zones. These applications include data-driven computer networks, fluids in pipes, currents in electrical circuits, and more.

Deep Neural Network Training

Although Stochastic Gradient Descent (SGD) is still the conventional machine learning technique for deep learning, the Frank-Wolfe algorithm has been proven applicable for training neural networks as well. Instead of taking projection steps as in projected SGD, the Conditional Gradient algorithm, also known as the Frank-Wolfe algorithm, adopts a simple projection-free first-order approach, using the linear minimization oracle (LMO) for constrained optimization.[6]

The update takes the form

<math display="inline">\theta_{t+1} = \theta_t + \gamma_t (v_t - \theta_t), \qquad v_t = \operatorname{argmin}_{v \in \mathcal{C}} \langle v, \nabla L(\theta_t) \rangle,</math>

with <math display="inline">\gamma_t \in [0,1]</math> and <math display="inline">\mathcal{C}</math> being the feasible region for the network parameters.

Through the LMO, non-unique solutions to the above update that were out of reach for gradient-descent-based optimization now become attainable.
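As a hedged sketch of the LMO idea (our example; the l1-ball constraint set and least-squares “training” objective are stand-ins, not the specific setup of [6]): over an l1 ball the oracle returns a signed, scaled basis vector, so each step activates at most one new parameter.

```python
import numpy as np

def lmo_l1_ball(g, radius=1.0):
    """argmin of <v, g> over the l1 ball of the given radius:
    a signed, scaled standard basis vector."""
    v = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    v[i] = -radius * np.sign(g[i])
    return v

# Toy "training" objective: least squares with gradient A^T (A w - b).
rng = np.random.default_rng(0)
A, b = rng.normal(size=(20, 5)), rng.normal(size=20)
w = np.zeros(5)
for t in range(100):
    g = A.T @ (A @ w - b)
    v = lmo_l1_ball(g, radius=2.0)
    w += 2.0 / (t + 2.0) * (v - w)   # Frank-Wolfe update, gamma_t = 2/(t+2)
print(w)                             # constrained weights, ||w||_1 <= 2
```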

Conclusion

References


  1. Frank, M. and Wolfe, P. (1956). An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3(1-2):95–110.
  2. Jaggi, M. (2013). Revisiting Frank-Wolfe: Projection-free sparse convex optimization. International Conference on Machine Learning (pp. 427-435). PMLR.
  3. Joulin, A., Tang, K., & Li, F.F. (2014). Efficient image and video co-localization with Frank-Wolfe algorithm. European Conference on Computer Vision (pp. 253-268). Springer, Cham.
  4. Harzallah, H., Jurie, F., & Schmid, C. (2009). Combining efficient object localization and image classification. 2009 IEEE 12th International Conference on Computer Vision (pp. 237-244). IEEE.
  5. LeBlanc, L., Morlok, E.K., & Pierskalla, W.P. (1975). An efficient approach to solving the road network equilibrium traffic assignment problem. Transportation Research, 9, 309-318.
  6. Pokutta, S., Spiegel, C., & Zimmer, M. (2020). Deep neural network training with Frank-Wolfe. arXiv preprint arXiv:2010.07243.