Particle swarm optimization: Difference between revisions

Revision as of 01:29, 12 December 2024

Author: David Schluneker (dms565), Thomas Ploetz (tep52), Michael Sorensen (mds385), Amrith Kumaar (ak836), Andrew Duffy (ajd296) (ChemE 6800 Fall 2024)

Stewards: Nathan Preuss, Wei-Han Chen, Tianqi Xiao, Guoqing Hu

Introduction

Particle Swarm Optimization (PSO) is inspired by nature and groups or swarms of natural creatures. It uses multiple “particles” distributed across a solution space to slowly converge upon a global, optimized solution. This algorithm does not require calculation of the function’s gradient or slope, instead using simple weights and the contribution of the rest of the swarm to provide a “fitness” value [2] to leading to a technique dependent only on simple calculations and a number of starting points. Test^[1]

The PSO algorithm was formulated and initially presented by Kennedy and Eberhart specifically to find solutions to discontinuous, but non-differentiable functions. The concept was presented at the IEEE conference on neural networks in 1995. The pair later published a book Swarm Intelligence in 2001 that further expanded on the concept [14], [23]. Others have summarized the algorithm [1], [7], [15]. This algorithm does not leverage formal methodology or mathematical rigor to identify an ideal point but instead uses a heuristic approach to explore a wide range of starting locations and conditions. These features allow the PSO algorithm to efficiently optimize complex functions.

The biological analogy for a PSO algorithm is a flock of birds, a school of fish, or a colony of ants trying to find a food resource. Each animal, or particle, is an independent entity that is conducting its own search routine for an optimal result. However, the particles will also be influenced and guided by motions of the larger swarm. The results any particle has already seen, as well as the influence of the larger swarm helps all the particles converge to an optimal solution.

In a more technical dissection, a PSO algorithm must be initially set-up with how many particles and iterations the user wants to use. More particles improve the convergence likelihood and help ensure the global optimum can be found due to a more complete search at the cost of computing resources required. The PSO algorithm allows a user to set different “weights”, step sizes, and speeds for the particles to control how influenced each particle is by its past points (cognitive) and by the other nearby points (social) in the swarm and how large of steps will be taken. These weights allow algorithm adjustment to tailor the performance and computational requirements.

The goals of studying this topic are to learn more about an algorithm capable of solving nondifferentiable functions that are relevant to modern problems. Many optimization algorithms use the gradient function liberally, so it is interesting to see an algorithm that does not require gradient computations. Additionally, the algorithm is still in active use in many fields [15], [24]. PSO is regularly used to solve optimization problems in the gas and oil industry, antenna design [25], and electric power distribution [10]. As functions get very complex in the real-world, the PSO algorithm is still capable of performing an effective search. Relevance to modern optimization problems make PSO an interesting research area.

Algorithm Discussion

The structure of the algorithm is simple. The algorithm seeks to find an optimum of an objective function, f(x). A set of particles, X with members x X , forming a swarm, is initialized randomly. The algorithm runs for a number of iterations. At each iteration, for each particle in the swarm, each particle’s position is updated to be x i+1, p = xi, p + vi, p. Subscript i indicates the iteration of the algorithm. Subscript p indicates the index of the particle in the swarm. Both the optimal solution for a particular particle, xbest, p, and a global optimum, xbest, g are tracked. f(xbest, p) is the best solution to the objective function that a particular particle, p, has experienced over all iterations. f(xbest, g ) is the best solution to the objective function for all particles, at all iterations. The algorithm is halted when either a fixed number of iterations is reached or a stopping criteria is reached. A typical stopping criteria is when the change between iterations is below a user-set threshold, demonstrating that the particles are not moving from the identified solution point. The algorithm’s control flow is shown in the following flow chart.

Figure 1: Algorithm Flow Chart A velocity, vi, p is independently computed for each particle xi,p. The velocity is the result of the sum of the preceding velocity, a cognitive component and a social component. The cognitive component is rand() *wc* (xbest, p - xi, d) and pulls the solution towards the best point observed by this particle. The social component is rand() *ws* (xbest, g - xi, d) and pulls the solution towards the best point observed across the entire swarm. Each of the social component and cognitive component are weighed by a factor. The state that must be tracked across loop iterations is; xi,p, xbest, p, vi, p for each particle and xbest, g.

The convergence of the algorithm can be checked a few different ways. The original papers were based on training neural networks, so the optimization was halted when the neural network was sufficiently able to make predictions [18, 19]. This approach is flawed because the designer of the algorithm used a priori knowledge of how good the neural network will perform. More generally, this approach incorporates application specific logic to stop when the optimal solution is good enough. Alternatively, the algorithm can be terminated following a predetermined number of iterations. Another approach is to stop when the solution's improvement at a given iteration stops [20].

The pseudocode for the PSO algorithm attempting to minimize a solution follows. The algorithm can be modified to search for a maximum by changing the initialization of xbest, p and xbest, galong with their updates. SWARM is initialized randomly vparticle initialized to 0 xbest, p initialized xbest, g initialized to For i in MAX_ITERATIONS: For particle in SWARM: cognitive := rand() *wc * (xbest, p- Xid) social := rand() *ws* (xbest, g - Xid) vparticle = vparticle + cognitive + social xi+1 = xi + vi if f(xi+1) < f(xbest, p) xbest, p= xi+1 if f(xi+1) < f(Pig ) xbest, g= xi+1

PRINT “Local optimal at xbest, g”

Figure 2: Algorithm Pseudocode

The algorithm has a few hyper-parameters including; number of particles in the swarm, weights of the cognitive update on the velocity, weights on the social update on the velocity. The Pymoo Python implementation suggests a swarm size of 25, velocity weight of 0.8, a social update weight of 2.0 and a cognitive update of 2.0 [33]. Practitioners will need to adjust parameters and determine how their problem reacts to modifications of the hyper-parameters. Heuristics may be used to adjust the various weights between iterations of the algorithm. Algorithms that adjust the hyper parameters based on heuristics are called Adaptive Particle Swarm Optimization (APSO) [36].

PSO makes very few assumptions. The algorithm can be trivially adopted to work with real, integer or mixed functions. The algorithm does not rely on gradients or hessians, so there are no assumptions or requirements on the function being continuous or differentiable. However, the lack of assumptions comes at a cost, there is no proof that the algorithm will find an optimal solution. One of the original papers by Kennedy in 1997 [19] indicated a tendency for the optimization to get stuck in local minimum. Adjustment of the algorithm’s hyper-parameters is a performance vs robustness trade that may have avoided sticking in a local optimum.

Numerical Example

This section will walk through a numerical example of solving a complex objective function using Particle Swarm Optimization.

Given a non-convex function with multiple local optima:

$minf(x,y)=sin(5x+1)+cos(7y-3)+x^{2}+y^{2}\,$

Step 1: Define the Objective Function

$f(x,y)=sin(5x+1)+cos(7y-3)+x^{2}+y^{2}\,$

Step 2: Define Key Variables

Symbol	Description	Value	Additional Notes
P	Total Number of Particles used to search the function	25	Larger number means greater chance of finding the global optimum, but larger computational load
r_1	random number between 0 and 1	(0,1)	to be randomly generated at each evaluation
r_2	random number between 0 and 1	(0,1)	to be randomly generated at each evaluation
w	Inertia Weight Constant (0,1)	(0,1)
c_1	Cognitive Coefficient (Independence, prioritize the self)	0.1*	recommend c1 c2 & sufficiently low value for slower smoother operation
c_2	Social Coefficient (Codependence, prioritize the group)	0.15*	recommend c1 c2 & sufficiently low value for slower smoother operation

The algorithm is highly sensitive to c1 and c2. Recommend referencing the search space and using 1/30 to 1/60 of the total range. Ex: target search range is -3 to 3 by -3 to 3, use a value 0.1 to 0.2 to prevent “overshooting” and oscillations of the individual particles.

Step 3: Define Stopping Criteria

For this example we are choosing a set number of iterations (t):

$t=50\,$

It is important to note that there are alternative stopping criteria such as checking for convergence of the particles - such as a minimum delta between points or using a Confidence Interval on the X and Y coordinates as your threshold. This could lead to more, or less, iterations depending upon what threshold is used. Care should be taken to ensure the stopping criteria can actually be met. This wiki does not provide an example of this methodology at this point in time.

Step 4: Setup Particle Data

Each particle has three main properties

Position of particle $i\,$ , at iteration $t\,$ , denoted as $X^{i}(t)\,$ where $X^{i}(t)=(x^{i}(t),y^{i}(t))\,$
Velocity Vector denoted as $V^{i}(t)=(v_{x}^{i}(t),v_{y}^{i}(t))\,$
Objective Function Value of $f(X^{i}(t))=f(x^{i}(t),y^{i}(t))\,$

Where $P\,$ equals total number of particles, cumulatively these would be represented as:

$X^{i}(t)=[X^{0}(t),\cdots ,X^{P-1}(t)]\,$

$V^{i}(t)=[V^{0}(t),\cdots ,V^{P-1}(t)]\,$

$f(X^{i}(t))=[f(X^{0}(t)),\cdots ,f(X{P-1}](t))]=[f(x^{0}(t),y^{0}(t)),\cdots ,f(x^{P-1}(t),y^{P-1}(t))\,$

Important Note $X^{i}(t)\,$ only tracks the current iteration of particle $i\,$ , not all values of a particle over $m\,$ iterations:

Step 5: Evaluate at Initialization Points

Fifth

Step 6: Calculate Next Iteration Position, Velocity, Value

Sixth

Step 7: Iteratively Solve

Calculate next optima, position, velocity, and function values for each particle. Repeat until the stopping criteria has been met.

Applications

The strength of PSO is in its simplicity and ability to handle arbitrary functions. The algorithm does not use gradient information, making it useful in contexts where computation of the gradient is expensive or the function is otherwise unreliable.

PSO was originally used to train neural networks (NN). Neural networks frequently have highly non-linear loss functions and it is therefore difficult to find a global optimum. [21]. Several papers have noted an improvement in NN accuracy using PSO optimization when compared to back-propagation due to the ability to avoid local optimum [26]. However, the current state of the current practice for training deep neural networks remains the back-propagation algorithm. Back-propagation is more efficient in the context of neural networks because for each particle at each iteration, there will need to be an error function computation. For example, a swarm size of 25 requires 25 times more error function evaluations [31].

In addition to identifying weights in the Neural Networks (training), PSO has been used in selecting hyper-parameters for neural networks. In these applications PSO is used to maximize the neural network’s accuracy by varying parameters including number of nodes and structure of the networks [28]. PSO fits this application well because this function is nonlinear and the gradient is non-continuous. Hyper parameters for NNs may be real, integer or binary values.

PSO is most often compared to other heuristic based algorithms such as; genetic algorithms differential evolution simulated annealing. One study for a bus scheduling application did find the PSO was better than genetic algorithms with respect to the optimality of solution and run-time [31]. The results in the paper are difficult to generalize, as results will be sensitive to the application as well as the algorithm’s hyper-parameters. The recommended choice of heuristic optimization algorithms remains up to the practitioner’s judgement.

A number of different implementations are available as libraries. Pyswarms is a Python API [32], Pymoo is an alternative Python implementation [33]. Pyswarms is a compact implementation specifically focused on PSO, whereas Pymoo is a more complete library with multiple objective function implementations. Matlab does have PSO implementation as part of the optimization toolbox [34] There is a 3rd party Matlab implementation available for free [35]. However, the algorithm is simple so many prefer to implement the algorithm directly.

Conclusions

The PSO algorithm is a heuristic based optimization algorithm. The heuristics are loosely based on the behavior of a flock of seagulls. The algorithm’s strengths are in its simplicity, lack of reliance on a gradient, and robustness against getting stuck in local optimum. The drawback is that the “swarm” requires a large number of function evaluations, making the algorithm slow. Additionally, the algorithm has no guarantee of convergence. The ability to find an optimum is sensitive to starting conditions and hyper-parameters.

The algorithm is easy to get up and running with several public implementations available in the Python and Matlab programming languages. Alternatively, the algorithm is so simple that some will implement the algorithm themselves.

When using the algorithm, practitioners need to select several weights that control the algorithms. A cognitive weight determines to what extent a particle’s previous optimum affects future searches. A social weight determines to what extent all particle’s global optimum affects future searches. These parameters and the swarm size need to be adjusted carefully to balance robustness against computational efficiency.

The PSO algorithm continues to be developed. There is research into how to adjust the algorithm’s hyper parameters at various iterations of the algorithm. These algorithms are called adaptive particle swarm optimization (APSO) and take some guesswork the algorithm. The algorithm has also been combined with genetic optimization algorithms, though the algorithms remain heuristic based and do not guarantee optimum results. \cite{ElStat2}

PSO continues to be used in applications that are non-linear or are non-differentiable. Recent (2024) studies have used PSO for RF in optimizing the placement of terminals [29] and for predicting landslides [30].

References

↑ K. Murphy (2012). Machine Learning: A Probabilistic Perspective, MIT Press

[1] K. Murphy (2012). Machine Learning: A Probabilistic Perspective, MIT Press

[1]

@@ Line 65: / Line 65: @@
 ===Step 2: Define Key Variables ===
-Second
+{| class="wikitable"
- [INSERT TABLE]
+|-
+! ''Symbol''
+! style="width: 300px;" | Description
+! style="width: 80px;" | Value
+! style="width: 300px;" | Additional Notes
+|-
+! P
+| style="text-align: center;" | Total Number of Particles used to search the function
+| style="text-align: center;" | 25
+| style="text-align: center;" | Larger number means greater chance of finding the global optimum, but larger computational load
+|-
+! r_1
+| style="text-align: center;" | random number between 0 and 1
+| style="text-align: center;" | (0,1)
+| style="text-align: center;" | to be randomly generated at each evaluation
+|-
+! r_2
+| style="text-align: center;" | random number between 0 and 1
+| style="text-align: center;" | (0,1)
+| style="text-align: center;" | to be randomly generated at each evaluation
+|-
+! w
+| style="text-align: center;" | Inertia Weight Constant (0,1)
+| style="text-align: center;" | (0,1)
+| style="text-align: center;" |
+|-
+! c_1
+| style="text-align: center;" | Cognitive Coefficient
+(Independence, prioritize the self)
+| style="text-align: center;" | 0.1*
+| style="text-align: center;" | recommend c1 c2 & sufficiently low value for slower smoother operation
+|-
+! c_2
+| style="text-align: center;" | Social Coefficient
+(Codependence, prioritize the group)
+| style="text-align: center;" | 0.15*
+| style="text-align: center;" | recommend c1 c2 & sufficiently low value for slower smoother operation
+|}
 *The algorithm is highly sensitive to c1 and c2. Recommend referencing the search space and using 1/30 to 1/60 of the total range. Ex: target search range is -3 to 3 by -3 to 3, use a value 0.1 to 0.2 to prevent “overshooting” and oscillations of the individual particles.