Genetic algorithm

Author: Yunchen Huo (yh2244), Ran Yi (ry357), Yanni Xie (yx682), Changlin Huang (ch2269), Jingyao Tong (jt887) (ChemE 6800 Fall 2024)

Stewards: Nathan Preuss, Wei-Han Chen, Tianqi Xiao, Guoqing Hu

Introduction

The Genetic Algorithm (GA) is an optimization technique inspired by Charles Darwin's theory of evolution through natural selection^[1]. First developed by John H. Holland in 1973^[2], GA simulates biological processes such as selection, crossover, and mutation to explore and exploit solution spaces efficiently. Unlike traditional methods, GA does not rely on gradient information, making it particularly effective for solving complex, non-linear, and multi-modal problems.

GA operates on a population of candidate solutions, iteratively evolving toward better solutions by using fitness-based selection. This characteristic makes it suitable for tackling problems in various domains, such as engineering, machine learning, and finance. Its robustness and adaptability have established GA as a key technique in computational optimization and artificial intelligence research, as documented in Computational Optimization and Applications.

Algorithm Discussion

The GA was first introduced by John H. Holland^[2] in 1973. It is an optimization technique based on Charles Darwin’s theory of evolution by natural selection.

Before diving into the algorithm, here are definitions of the basic terminologies.

Gene: The smallest unit that makes up the chromosome (decision variable)
Chromosome: A group of genes, where each chromosome represents a solution (potential solution)
Population: A group of chromosomes (a group of potential solutions)

GA involves the following seven steps:

Initialization
- Randomly generate the initial population for a predetermined population size
Evaluation
- Evaluate the fitness of every chromosome in the population to see how good it is. Higher fitness implies better solution, making the chromosome more likely to be selected as a parent of next generation
Selection
- Natural selection serves as the main inspiration of GA, where chromosomes are randomly selected from the entire population for mating, and chromosomes with higher fitness values are more likely to be selected^[3].
Crossover
- The purpose of crossover is to create superior offspring (better solutions) by combining parts from each selected parent chromosome. There are different types of crossover, such as single-point and double-point crossover^[3]. In single-point crossover, the parent chromosomes are swapped before and after a single point. In double-point crossover, the parent chromosomes are swapped between two points^[3].
Mutation
- A mutation operator is applied to make random changes to the genes of children's chromosomes, maintaining the diversity of the individual chromosomes in the population and enabling GA to find better solutions^[3].
Insertion
- Insert the mutated children chromosomes back into the population
Repeat 2-6 until a stopping criteria is met
- Maximum number of generations reached
- No significant improvement from newer generations
- Expected fitness value is achieved

Genetic Operators:

Steps 3-5 in GA involves techniques of changing the genes of chromosomes to create new generations. The functions that were applied to the populations are called genetic operators. The main types of GA operators include selection operator, crossover operator, and mutation operator. Below are some widely used operators:

Selection Operator
- Some famous selection operators include roulette wheel, rank, tournament, etc. Roulette wheel selection allocates strings onto a wheel based on their fitness value, then randomly selects solutions for the next generation^[4].
Crossover Operator
- Single-point and double-point crossover operators were introduced in the algorithm above, and there is also a k-point crossover that selects multiple points and swaps the segments between those points of the parent chromosomes to generate new offspring^[4]. Some other well-known crossover operators are uniform crossover, partially matched crossover, order crossover, and shuffle crossover^[4].
Mutation Operator
- Mutation operator introduces variability into the population that helps the algorithm to find the global optimum. Displacement mutation operator randomly choose a substring from an individual and insert it into a new position^[4]. Additionally, other prominent mutation operators are simple inversion mutation, scramble mutation, binary bit-flipping mutation, directed mutation, and others^[5].

Numerical Example

1. Simple Example

We aim to maximize $f(x)=x^{2}$ , where $x\in [0,31]$ . Chromosomes are encoded as 5-bit binary strings since the binary format of the maximum value 31 is 11111.

1.1 Initialization (Generation 0)

The initial population is randomly generated:

Chromosome (Binary)	x (Decimal)
10010	18
00111	7
11001	25
01001	9

1.2 Generation 1

1.2.1 Evaluation

Calculate the fitness values:

Chromosome	$x$	$f(x)=x^{2}$
10010	18	324
00111	7	49
11001	25	625
01001	9	81

1.2.2 Selection

Use roulette wheel selection to choose parents for crossover. Selection probabilities are calculated as:

$P({\text{Chromosome}})={\frac {\text{Fitness}}{\text{Total Fitness}}}.$

Thus,

Total Fitness $=324+49+625+81=1079$

Compute the selection probabilities:

Chromosome	$f(x)$	Selection Probability
10010	$324$	${\frac {324}{1,079}}\approx 0.3003$
00111	$49$	${\frac {49}{1,079}}\approx 0.0454$
11001	$625$	${\frac {625}{1,079}}\approx 0.5792$
01001	$81$	${\frac {81}{1,079}}\approx 0.0751$

Cumulative probabilities:

Chromosome	Cumulative Probability
10010	$0.3003$
00111	$0.3003+0.0454=0.3457$
11001	$0.3457+0.5792=0.9249$
01001	$0.9249+0.0751=1.0000$

Random numbers for selection:

$r_{1}=0.32,\quad r_{2}=0.60,\quad r_{3}=0.85,\quad r_{4}=0.10$

Selected parents:

Pair 1: 00111 and 11001
Pair 2: 11001 and 10010

1.2.3 Crossover

Crossover probability: $P_{c}=0.7$

Pair 1: Crossover occurs at position 2.

Parent 1: $00|111$
Parent 2: $11|001$
Children: $00001\ (x=1)$ , $11111\ (x=31)$

Pair 2: No crossover.

Children: Child 3: $11001\ (x=25)$ , Child 4: $10010\ (x=18)$

1.2.4 Mutation

Mutation probability: $P_{m}=0.1$ Child 1: $00001$ (bits: $0\ 0\ 0\ 0\ 1$ ).

Mutations: Bit 1 and Bit 4 flip.
Resulting chromosome: $10011\ (x=19)$ .

Child 2: $11111$ (bits: $1\ 1\ 1\ 1\ 1$ ).

Mutation: Bit 3 flips.
Resulting chromosome: $11011\ (x=27)$ .

Child 3: $11001$ (bits: $1\ 1\ 0\ 0\ 1$ ).

Mutations: Bit 1 and Bit 5 flip.
Resulting chromosome: $01000\ (x=8)$ .

Child 4: $10010$ (bits: $1\ 0\ 0\ 1\ 0$ ).

Mutations: Bit 1 and Bit 3 flip.
Resulting chromosome: $00110\ (x=6)$ .

1.2.5 Insertion

Chromosome	$x$	$f(x)$
10011	$19$	$361$
11011	$27$	$729$
01000	$8$	$64$
00110	$6$	$36$

1.3 Generation 2

1.3.1 Evaluation

Calculate the fitness values:

Chromosome	$x$	$f(x)$
10011	$19$	$361$
11011	$27$	$729$
01000	$8$	$64$
00110	$6$	$36$

1.3.2 Selection

Total fitness: ${\text{Total Fitness}}=361+729+64+36=1,190$

Compute selection probabilities:

Chromosome	$f(x)$	Selection Probability
10011	$361$	${\frac {361}{1,190}}\approx 0.3034$
11011	$729$	${\frac {729}{1,190}}\approx 0.6126$
01000	$64$	${\frac {64}{1,190}}\approx 0.0538$
00110	$36$	${\frac {36}{1,190}}\approx 0.0303$

Cumulative probabilities:

Chromosome	Cumulative Probability
10011	$0.3034$
11011	$0.3034+0.6126=0.9160$
01000	$0.9160+0.0538=0.9698$
00110	$0.9698+0.0303=1.0000$

Random numbers for selection:

$r_{1}=0.20,\quad r_{2}=0.50,\quad r_{3}=0.80,\quad r_{4}=0.95$

Selected parents:

Pair 1: 10011 and 11011
Pair 2: 11011 and 01000

1.3.3 Crossover

Crossover probability: $P_{c}=0.7$

Pair 1: Crossover occurs at position 2.

Parent 1: $10|011$
Parent 2: $11|011$
Children: $10011\ (x=19)$ , $11011\ (x=27)$

Pair 2: Crossover occurs at position 4.

Parent 1: $11|011$
Parent 2: $01|000$
Children: $11000\ (x=24)$ , $01011\ (x=11)$

1.3.4 Mutation

Mutation probability: $P_{m}=0.1$

Child 1: No mutations. ${\text{Resulting Chromosome: }}10011$ .
Child 2: Bit 1 flips. ${\text{Resulting Chromosome: }}01011\ (x=11)$ .
Child 3: Bit 5 flips. ${\text{Resulting Chromosome: }}11010\ (x=26)$ .
Child 4: No mutations. ${\text{Resulting Chromosome: }}11011$ .

1.3.5 Insertion

Chromosome	$x$	$f(x)$
10011	$19$	$361$
01011	$11$	$121$
11010	$26$	$676$
11011	$27$	$729$

1.4 Conclusion

After 2 iterations, the best optimal solution we find is $f=729,\ x=27$ .

Due to the limitation of the page, we will not perform additional loops. In the following examples, we will show how to use code to perform multiple calculations on complex problems and specify stopping conditions.

2. Complex Example

We aim to maximize the function:

$f(x,y)=21.5+x\sin(4\pi x)+y\sin(20\pi y)$

subject to the constraints:

$-3.0\leq x\leq 12.1,\quad 4.1\leq y\leq 5.8.$

2.1 Encoding the Variables

Each chromosome represents a pair of variables $x$ and $y$ . We encode these variables into binary strings:

$x$ : $-3.0\leq x\leq 12.1$ , precision $0.1$ , requiring $8$ bits.
$y$ : $4.1\leq y\leq 5.8$ , precision $0.01$ , requiring $8$ bits.

Each chromosome is a 16-bit binary string, where the first 8 bits represent $x$ and the next 8 bits represent $y$ .

2.2 Initialization

We randomly generate an initial population of 6 chromosomes. Each chromosome is decoded to its respective $x$ and $y$ values using the formula:

${\text{Value}}={\text{Minimum}}+({\frac {\text{Binary Value}}{2^{\text{Bits}}-1}})\times {\text{Range}}.$

Initial Population

Initial Population
Chromosome (Binary)	$x$ Bits	$y$ Bits	$x$ (Decimal)	$y$ (Decimal)
10101100 11010101	10101100	11010101	$7.22$	$5.52$
01110011 00101110	01110011	00101110	$2.34$	$4.49$
11100001 01011100	11100001	01011100	$9.90$	$4.86$
00011010 10110011	00011010	10110011	$-1.88$	$5.30$
11001100 01100110	11001100	01100110	$6.50$	$4.72$
00110111 10001001	00110111	10001001	$-0.88$	$5.05$

2.3 Evaluation

Calculate the fitness $f(x,y)$ for each chromosome using the given function. Below are the computed fitness values:

Chromosome (Binary)	$x$	$y$	$f(x,y)$
10101100 11010101	$7.22$	$5.52$	$19.35$
01110011 00101110	$2.34$	$4.49$	$22.65$
11100001 01011100	$9.90$	$4.86$	$30.12$
00011010 10110011	$-1.88$	$5.30$	$16.40$
11001100 01100110	$6.50$	$4.72$	$28.75$
00110111 10001001	$-0.88$	$5.05$	$21.10$

2.4 Selection

Use roulette wheel selection to choose parents for crossover.

Total fitness:

${\text{Total Fitness}}=19.35+22.65+30.12+16.40+28.75+21.10=138.37$

Selection probabilities are calculated as:

Chromosome	Fitness $f(x,y)$	Selection Probability
10101100 11010101	$19.35$	$0.14$
01110011 00101110	$22.65$	$0.16$
11100001 01011100	$30.12$	$0.22$
00011010 10110011	$16.40$	$0.12$
11001100 01100110	$28.75$	$0.21$
00110111 10001001	$21.10$	$0.15$

Selected pairs for crossover:

Pair 1: Chromosome 3 and Chromosome 5
Pair 2: Chromosome 2 and Chromosome 6
Pair 3: Chromosome 1 and Chromosome 4

2.5 Crossover

Perform single-point crossover at position 8 (between $x$ and $y$ bits).

Pair 1:

Parent 1: $11100001\ 01011100$
Parent 2: $11001100\ 01100110$
Child 1: $11100001\ 01100110$
Child 2: $11001100\ 01011100$

Repeat for other pairs.

2.6 Mutation

Apply mutation with a small probability (e.g., 1% per bit). Suppose a mutation occurs in Child 1 at bit position 5 of the $x$ bits:

${\text{Original }}x:11100001\quad \longrightarrow \quad {\text{Mutated }}x:11101001.$

The new child becomes:

${\text{Child 1: 11101001 01100110.}}$

2.7 Insertion

Evaluate the fitness of offspring and combine with the current population. Select the top 6 chromosomes to form the next generation.

Next Generation
Chromosome (Binary)	$x$	$y$	$f(x,y)$
11101001 01100110	$10.79$	$4.78$	$30.85$
11100001 01011100	$9.90$	$4.86$	$30.12$
11001100 01100110	$6.50$	$4.72$	$28.75$
10101100 11010101	$7.22$	$5.52$	$26.30$
01110011 00101110	$2.34$	$4.49$	$22.65$
00110111 10001001	$-0.88$	$5.05$	$21.10$

2.8 Repeat Steps 2.3-2.7

Using the code (https://github.com/AcidHCL/CornellWiki) to perform the repeating process for 50 more generations we got

Fig.2. 50 iteration results

Based on Fig. 2 we can find the optimal value to be 35.2769.

Application

GA is one of the most important and successful algorithms in optimization, which can be demonstrated by numerous applications. Applications of GA include machine learning, engineering, finance and other domain applications, the following introduces the applications of GA in Unsupervised Regression, Virtual Power Plans and Forecasting Financial Market Indices.

Unsupervised Regression

The Unsupervised regression is a promising dimensionality reduction method^[6]. The concept of the Unsupervised regression is to map from low-dimensional space into the high-dimensional space by using a regression model^[7]. The goal of the Unsupervised Regression is to minimize the data space reconstruction error^[6]. Common optimization methods to achieve this goal are the Nearest neighbor regression and the Kernel regression^[6]. Since the mapping from the low-dimensional space into the high-dimensional space is usually complex, the data space reconstruction error function may become a non-convex multimodal function with multiple local optimal solutions. In this case, using GA can overcome local optima because of the population search, selection, crossover and mutation.

Virtual Power Plants

The renewable energy resources, such as wind power and solar power, have a distinct fluctuating character^[6]. To address the inconvenience of such fluctuations on the power grid, engineers have introduced the concept of virtual power plants, which bundles various different power sources into a single unit that meets specific properties^[6].

The optimization goal is to minimize the absolute value of power in the virtual power plants system with a rule base^[6]. This rule base is also known as the learning classifier system, which allows energy storage devices and standby power plants in the system to respond flexibly in different system states and achieve a balance between power consumption and generation^[6].

Since the rule base has complexity and a high degree of uncertainty, using GA can observably evolve the action part of the rule base. For example, in the complex energy scheduling process, GA can optimize the charge/discharge strategy of the energy storage equipment and the plan of start/stop of standby power plants to ensure the balance of power consumption and generation.

Forecasting Financial Market Indices

A financial market index consists of a weighted average of the prices of individual shares that make up the market^[8]. In financial markets, many traders and analysts believe that stock prices move in trends and repeat price patterns^[8]. Under this premise, using Grammatical Evolution(GE) to forecast the financial market indices and enhance trading decisions is a good choice.

GE is a machine learning method based on the GA^[8]. GE uses a biologically-inspired, genotype-phenotype mapping process, evolving computer program in any language^[8]. Unlike encoding the solution within the genetic material in the GA, the GE includes a many-to-one mapping process, which shows the robustness^[8].

While using GE to forecast financial market indices, people need to import processed historical stock price data. GE will learn price patterns in this data and generate models which can predict future price movements. These models can help traders and analysts identify the trend of the financial market, such as upward and downward trends.

Software tools and platforms that utilize Genetic Algorithms

MATLAB: The Global Optimization Toolbox of MATLAB is widely used for engineering simulations and machine learning.

Python: The DEAP and PyGAD in Python provide an environment for research and AI model optimization.

OpenGA: The OpenGA is a free C++ GA library, which is open-source. The OpenGA provides developers with a flexible way to implement GA for solving a variety of optimization problems^[9].

Conclusion

The GA is a versatile optimization tool inspired by evolutionary principles, excelling in solving complex and non-linear problems across diverse fields. Its applications, ranging from energy management to financial forecasting, highlight its adaptability and effectiveness. As computational capabilities advance, GA is poised to solve increasingly sophisticated challenges, reinforcing its relevance in both research and practical domains.

References

↑ Lambora, A., Gupta, K., & Chopra, K. (2019). Genetic Algorithm - A Literature Review. 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), Faridabad, India, pp. 380–384.
↑ ^{Jump up to: 2.0} ^2.1 Holland, J. H. (1973). Genetic algorithms and the optimal allocation of trials. SIAM Journal on Computing, 2(2), 88–105
↑ ^{Jump up to: 3.0} ^3.1 ^3.2 ^3.3 Mirjalili, S. (2018). Genetic Algorithm. Evolutionary Algorithms and Neural Networks, Springer, pp. 43–56
↑ ^{Jump up to: 4.0} ^4.1 ^4.2 ^4.3 Katoch, S., Chauhan, S.S. & Kumar, V. (2021). A review on genetic algorithm: past, present, and future. Multimed Tools Appl 80, 8091–8126
↑ Lim, S. M., Sultan, A. B. M., Sulaiman, M. N., Mustapha, A., & Leong, K. Y. (2017). Crossover and mutation operators of genetic algorithms. International journal of machine learning and computing, 7(1), 9-12
↑ ^{Jump up to: 6.0} ^6.1 ^6.2 ^6.3 ^6.4 ^6.5 ^6.6 Kramer, O. (2017). Studies in Computational Intelligence 679 Genetic Algorithm Essentials.
↑ Kramer, O. (2016). Dimensionality reduction with unsupervised nearest neighbors.
↑ ^{Jump up to: 8.0} ^8.1 ^8.2 ^8.3 ^8.4 Chen, S.-H. (2012). Genetic Algorithms and Genetic Programming in Computational Finance. Springer Science & Business Media.
↑ Arash-codedev. (2019, September 18). Arash-codedev/openGA. GitHub. https://github.com/Arash-codedev/openGA

[1] Lambora, A., Gupta, K., & Chopra, K. (2019). Genetic Algorithm - A Literature Review. 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), Faridabad, India, pp. 380–384.

[:2-2] {Jump up to: 2.0} ^2.1 Holland, J. H. (1973). Genetic algorithms and the optimal allocation of trials. SIAM Journal on Computing, 2(2), 88–105

[Mirjalili-3] {Jump up to: 3.0} ^3.1 ^3.2 ^3.3 Mirjalili, S. (2018). Genetic Algorithm. Evolutionary Algorithms and Neural Networks, Springer, pp. 43–56

[Katoch-4] {Jump up to: 4.0} ^4.1 ^4.2 ^4.3 Katoch, S., Chauhan, S.S. & Kumar, V. (2021). A review on genetic algorithm: past, present, and future. Multimed Tools Appl 80, 8091–8126

[5] Lim, S. M., Sultan, A. B. M., Sulaiman, M. N., Mustapha, A., & Leong, K. Y. (2017). Crossover and mutation operators of genetic algorithms. International journal of machine learning and computing, 7(1), 9-12

[:0-6] {Jump up to: 6.0} ^6.1 ^6.2 ^6.3 ^6.4 ^6.5 ^6.6 Kramer, O. (2017). Studies in Computational Intelligence 679 Genetic Algorithm Essentials.

[7] Kramer, O. (2016). Dimensionality reduction with unsupervised nearest neighbors.

[:1-8] {Jump up to: 8.0} ^8.1 ^8.2 ^8.3 ^8.4 Chen, S.-H. (2012). Genetic Algorithms and Genetic Programming in Computational Finance. Springer Science & Business Media.

[9] Arash-codedev. (2019, September 18). Arash-codedev/openGA. GitHub. https://github.com/Arash-codedev/openGA

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]