# Difference between revisions of "Set covering problem"

Authors: Sherry Liang, Khalid Alanazi, Kumail Al Hamoud

## Introduction

The set covering problem is a significant NP-hard problem in combinatorial optimization. In the set covering problem, two sets are given: a set U of elements and a set S of subsets of the set U. Each subset in S is associated with a predetermined cost, and the union of all the subsets covers the set U. This combinatorial problem then concerns finding the optimal number of subsets whose union covers the universal set while minimizing the total cost.1 The problem has many applications in the airline industry, and it was explored on an industrial scale as early as the 1970s.2

## Problem formulation

The mathematical formulation of the set covering problem is define as follows. We define U = {${\displaystyle u_{i}}$, ….. ${\displaystyle u_{m}}$} as the universe of elements and S = {${\displaystyle s_{1}}$, ….,${\displaystyle s_{n}}$} as a collection of subsets such that ${\displaystyle s_{i}}$ ${\displaystyle \in }$U and the union of ${\displaystyle s_{i}}$ covers all elements in U (i.e. ${\displaystyle \cup }$${\displaystyle s_{i}}$ = U ). Addionally, each set ${\displaystyle s_{i}}$ must cover at least one element of U and has associated cost ${\displaystyle c}$ that is larger than zero (i.e. ${\displaystyle c_{i}}$ > 0). The objective is to find the minimum cost sub-collection of sets X ${\displaystyle \in }$ S that covers all the elements in the universe U.

## Integer linear program formulation

An integer linear program (ILP) model can be formulated for the minimum set covering problem as follows:

Decision variables

${\displaystyle y_{i}={\begin{cases}1,&{\text{if subset }}i{\text{ is selected}}\\0,&{\text{otherwise }}\end{cases}}}$

Objective function

minimize ${\displaystyle \sum _{i=1}^{n}c_{i}y_{i}}$

Constraints

${\displaystyle \sum _{i=1}^{n}a_{ij}y_{i}>=1,\forall j=1,....,m}$

${\displaystyle y_{i}\in \{0,1\},\forall i=1,....,n}$

The objective function ${\displaystyle \sum _{i=1}^{n}c_{i}y_{i}}$ is defined to minimize the number of subset ${\displaystyle s_{i}}$ that cover all elements in the universe by minimizing their total cost. The first constraint implies that every element j in the universe U must be be covered where ${\displaystyle a_{ij}}$=1 if j ${\displaystyle \in }$ ${\displaystyle s_{i}}$ and 0 otherwise. The second constraint ${\displaystyle y_{i}\in \{0,1\}}$ indicates that the decision variables are binary.

Set covering problems are significant NP-hard optimization problems, which means that for large size instances, integer linear program algorithms can be difficult to solve in polynomial time. Therefore, there exist approximation algorithms that can solve large scale problems in polynomial time with optimal or near-optimal solutions. One of the widely used approximation algorithms for set covering is the classical greedy algorithms.7

## Greedy approximation algorithm

Greedy algorithms can be used to approximate for optimal or near-optimal solutions for large scale set covering instances in polynomial solvable time7,9. The greedy heuristics applies iterative process that, at each stage, select the largest number of uncovered elements in the universe U, and delete the uncovered elements, until all elements are covered. Let T be the set that contain the covered elements, and U be the set that contain the elements of Y that still uncovered. At the beginning of the iteration, T is empty and all elements Y belong to U. We iteratively select the set of S that covers the largest number of elements in U and add it to the covered elements in T. An example of this algorithm is presented below.

Greedy algorithm for minimum set cover example:

Step 0: ${\displaystyle T\in \Phi }$ \hfill { T stores the covered elements }

Step 1: While ${\displaystyle U\neq \Phi }$ do \hfill { U stores the uncovered elements Y}

Step 2: select ${\displaystyle s_{i}\in S}$ that covers the highest number of elements in U

Step 3: add ${\displaystyle s_{i}}$ to ${\displaystyle T}$

Step 4: remove ${\displaystyle s_{i}}$ from ${\displaystyle U}$

Step 5: End while

Step 6: Return ${\displaystyle S}$

## Numerical Example

Let’s consider a simple example where we assign cameras at different locations. Each location covers some areas of stadiums, and our goal is to put the least amount of cameras such that all areas of stadiums are covered. We have stadium areas from 1 to 15, and possible camera locations from 1 to 8.

We are given that camera location 1 covers stadium areas {1,3,4,6,7}, camera location 2 covers stadium areas {4,7,8,12}, while the remaining camera locations and the stadium areas that the cameras can cover are given in table 1 below:

Table 1
1 1,3,4,6,7
2 4,7,8,12
3 2,5,9,11,13
4 1,2,14,15
5 3,6,10,12,14
6 8,14,15
7 1,2,6,11
8 1,2,4,6,8,12

To reformulate the problem, we can list the stadium areas and the camera locations that cover each stadium area. For instance, stadium area 1 can be covered with camera location {1,4,7,8}, stadium area 2 can be covered with camera location {3,4,7,8}, while the rest are given in table 2 below:

Table 2
1 1,4,7,8
2 3,4,7,8
3 1,5
4 1,2,8
5 3
6 1,5,7,8
7 1,2
8 2,6,8
9 3
10 5
11 3,7
12 2,5,8
13 3
14 4,5,6
15 4,6

We can then represent the above information using binary values. If the stadium area ${\displaystyle i}$ can be covered with camera location ${\displaystyle j}$, then we have ${\displaystyle y_{ij}=1}$. If not,${\displaystyle y_{ij}=0}$. For instance, stadium area 1 is covered by camera location 1, so ${\displaystyle y_{11}=1}$, while stadium area 1 is not covered by camera location 2, so ${\displaystyle y_{12}=0}$. The binary variables ${\displaystyle y_{ij}}$ values are given in the table below:

Table 3
1 2 3 4 5 6 7 8
1 1 1 1 1
2 1 1 1 1
3 1 1
4 1 1 1
5 1
6 1 1 1 1
7 1 1
8 1 1 1
9 1
10 1
11 1 1
12 1 1 1
13 1
14 1 1 1
15 1 1

We introduce another binary variable ${\displaystyle z_{j}}$ to indicate if a camera is installed at location ${\displaystyle j}$. ${\displaystyle z_{j}=1}$ if camera is installed at location ${\displaystyle j}$, while ${\displaystyle z_{j}=0}$ if not.

Our objective is to minimize ${\displaystyle \sum _{j=1}^{8}z_{j}}$. For each stadium, there’s a constraint that the stadium area ${\displaystyle i}$ has to be covered by at least one camera location. For instance, for stadium area 1, we have ${\displaystyle z_{1}+z_{4}+z_{7}+z_{8}\geqslant 1}$, while for stadium 2, we have ${\displaystyle z_{3}+z_{4}+z_{7}+z_{8}\geqslant 1}$. All the 15 constraints that corresponds to 15 stadium areas are listed below:

Constraints 1 to 15:

${\displaystyle 1.z_{1}+z_{4}+z_{7}+z_{8}\geqslant 1}$

${\displaystyle 2.z_{3}+z_{4}+z_{7}+z_{8}\geqslant 1}$

${\displaystyle 3.z_{1}+z_{5}\geqslant 1}$

${\displaystyle 4.z_{1}+z_{2}+z_{8}\geqslant 1}$

${\displaystyle 5.z_{3}\geqslant 1}$

${\displaystyle 6.z_{1}+z_{5}+z_{7}+z_{8}\geqslant 1}$

${\displaystyle 7.z_{1}+z_{2}\geqslant 1}$

${\displaystyle 8.z_{2}+z_{6}+z_{8}\geqslant 1}$

${\displaystyle 9.z_{3}\geqslant 1}$

${\displaystyle 10.z_{5}\geqslant 1}$

${\displaystyle 11.z_{3}+z_{7}\geqslant 1}$

${\displaystyle 12.z_{2}+z_{5}+z_{8}\geqslant 1}$

${\displaystyle 13.z_{3}\geqslant 1}$

${\displaystyle 14.z_{4}+z_{5}+z_{6}\geqslant 1}$

${\displaystyle 15.z_{4}+z_{6}\geqslant 1}$

From constraint {5,9,13}, we can obtain ${\displaystyle z_{3}=1}$. Thus we no longer need constraint 2 and 11 as they are satisfied when ${\displaystyle z_{3}=1}$. With ${\displaystyle z_{3}=1}$ determined, the constraints left are:

${\displaystyle 1.z_{1}+z_{4}+z_{7}+z_{8}\geqslant 1}$

${\displaystyle 3.z_{1}+z_{5}\geqslant 1}$

${\displaystyle 4.z_{1}+z_{2}+z_{8}\geqslant 1}$

${\displaystyle 6.z_{1}+z_{5}+z_{7}+z_{8}\geqslant 1}$

${\displaystyle 7.z_{1}+z_{2}\geqslant 1}$

${\displaystyle 8.z_{2}+z_{6}+z_{8}\geqslant 1}$

${\displaystyle 10.z_{5}\geqslant 1}$

${\displaystyle 12.z_{2}+z_{5}+z_{8}\geqslant 1}$

${\displaystyle 14.z_{4}+z_{5}+z_{6}\geqslant 1}$

${\displaystyle 15.z_{4}+z_{6}\geqslant 1}$

Now if we take a look at constraint ${\displaystyle 10.z_{5}\geqslant 1}$ so ${\displaystyle z_{5}}$ shall equal to 1. As ${\displaystyle z_{5}=1}$, constraint {3,6,12,14} are satisfied no matter what other ${\displaystyle z}$ values are taken. If we also take a look at constraint 7 and 4, if constraint 4 will be satisfied as long as constraint 7 is satisfied since ${\displaystyle z}$ values are nonnegative, so constraint 4 is no longer needed. The remaining constraints are:

${\displaystyle 1.z_{1}+z_{4}+z_{7}+z_{8}\geqslant 1}$

${\displaystyle 7.z_{1}+z_{2}\geqslant 1}$

${\displaystyle 8.z_{2}+z_{6}+z_{8}\geqslant 1}$

${\displaystyle 15.z_{4}+z_{6}\geqslant 1}$

The next step is to focus on constraint 7 and 15. We can have at least 4 combinations of ${\displaystyle z_{1},z_{2},z_{4},z_{6}}$values.

${\displaystyle A:z_{1}=1,z_{2}=0,z_{4}=1,z_{6}=0}$

${\displaystyle B:z_{1}=1,z_{2}=0,z_{4}=0,z_{6}=1}$

${\displaystyle C:z_{1}=0,z_{2}=1,z_{4}=1,z_{6}=0}$

${\displaystyle D:z_{1}=0,z_{2}=1,z_{4}=0,z_{6}=1}$

We can then discuss each combination and determine ${\displaystyle z_{7},z_{8}}$values for constraint 1 and 8 to be satisfied.

Combination ${\displaystyle A}$: constraint 1 already satisfied, we need ${\displaystyle z_{8}=1}$ to satisfy constraint 8.

Combination ${\displaystyle B}$: constraint 1 already satisfied, constraint 8 already satisfied.

Combination ${\displaystyle C}$: constraint 1 already satisfied, constraint 8 already satisfied.

Combination ${\displaystyle D}$: we need ${\displaystyle z_{7}=1}$ or ${\displaystyle z_{8}=1}$ to satisfy constraint 1, while constraint 8 already satisfied.

Our final step is to compare the four combinations. Since our objective is to minimize ${\displaystyle \sum _{j=1}^{8}z_{j}}$ and combinations ${\displaystyle B}$ and ${\displaystyle C}$ require the least amount of ${\displaystyle z_{j}}$ to be 1, they are the optimal solutions.

To conclude, our two solutions are:

${\displaystyle Solution1:z_{1}=1,z_{3}=1,z_{5}=1,z_{6}=1}$

${\displaystyle Solution2:z_{2}=1,z_{3}=1,z_{4}=1,z_{5}=1}$

The minimum number of cameras that we need to install is 4.

## Applications

The applications of the set covering problem span a wide range of applications, but its usefulness is evident in industrial and governmental planning. Variations of the set covering problem that are of practical significance include the following.

The optimal location problem

This set covering problems is concerned with maximizing the coverage of some public facilities placed at different locations.3 Consider the problem of placing fire stations to serve the towns of some city.4 If each fire station can serve its town and all adjacent towns, we can formulate a set covering problem where each subset consists of a set of adjacent towns. The problem is then solved to minimize the required number of fire stations to serve the whole city.

Let ${\displaystyle y_{i}}$ be the decision variable corresponding to choosing to build a fire station at town ${\displaystyle i}$. Let ${\displaystyle S_{i}}$ be a subset of towns including town ${\displaystyle i}$ and all its neighbors. The problem is then formulated as follows.

minimize ${\displaystyle \sum _{i=1}^{n}y_{i}}$

such that ${\displaystyle \sum _{i\in S_{i}}y_{i}>=1,\forall i}$

The optimal route selection problem

Consider the problem of selecting the optimal bus routes to place pothole detectors. Due to the scarcity of the physical sensors, the problem does not allow for placing a detector on every road. The task of finding the maximum coverage using a limited number of detectors could be formulated as a set covering problem.5 Specifically, giving a collection of routes R, where each route itself is divided into segments. The segments of two routes can overlap. The goal is then to select the routes that maximize the number of covered segments.

This is quite different from other applications because it results in a maximization formulation, rather than a minimization formulation. Say we want to use at most ${\displaystyle k}$ routes. We want to find ${\displaystyle k}$ routes that maximize the number of covered segments. Let ${\displaystyle y_{i}}$ be the decision variable corresponding to choosing route ${\displaystyle R_{i}}$. The problem is then formulated as follows.

maximize ${\displaystyle \sum _{i=1}^{N}R_{i}}$

such that ${\displaystyle \sum _{i}y_{i}=k}$

For a greedy approximation solution, please check the work by Ali and Dyo.5 .

The airline crew scheduling problem

An important application of large-scale set covering is the airline crew scheduling problem, which pertains to assigning airline staff to work shifts.2,6 Thinking of the collection of flights as a universal set to be covered, we can formulate a set covering problem to search for the optimal assignment of employees to flights. Due to the complexity of airline schedules, this problem is usually divided into two subproblems: crew pairing and crew assignment. We refer the interested reader to this survey for a detailed analysis of the formulation and algorithms that pertain to this significant application.8

## Conclusion

The set covering problem, which aims to find the least number of subsets that cover some universal set, is a widely known NP-hard combinatorial problem. Due to its applicability to route planning and airline crew scheduling, several methods have been proposed to solve it. Its straightforward formulation allows for the use of off-the-shelf optimizers to solve it. Moreover, heuristic techniques and greedy algorithms can be used to solve large-scale set covering problems for industrial applications.

## References

1. Grossman, T., & Wool, A. (1997). Computational experience with approximation algorithms for the set covering problem. European Journal of Operational Research, 101(1), 81-92. doi:10.1016/s0377-2217(96)00161-0
2. RUBIN, J. (1973). A Technique for the Solution of Massive Set Covering Problems, with Application to Airline Crew Scheduling. Transportation Science, 7(1), 34-48. Retrieved November 23, 2020, from http://www.jstor.org/stable/25767684
3. Church, R., ReVelle, C. The maximal covering location problem. Papers of the Regional Science Association 32, 101–118 (1974). https://doi.org/10.1007/BF01942293
4. Aktaş, E., Özaydın, Ö, Bozkaya, B., Ülengin, F., & Önsel, Ş. (2013). Optimizing Fire Station Locations for the Istanbul Metropolitan Municipality. Interfaces, 43(3), 240-255. doi:10.1287/inte.1120.0671
5. Ali, J., & Dyo, V. (2017). Coverage and Mobile Sensor Placement for Vehicles on Predetermined Routes: A Greedy Heuristic Approach. Proceedings of the 14th International Joint Conference on E-Business and Telecommunications. doi:10.5220/0006469800830088
6. Marchiori, E., & Steenbeek, A. (2000). An Evolutionary Algorithm for Large Scale Set Covering Problems with Application to Airline Crew Scheduling. EvoWorkshops.
7. Petr Slavı́k. (1997). A Tight Analysis of the Greedy Algorithm for Set Cover. Journal of Algorithms,, Volume 25, Issue 2, Pages 237-254 https://doi.org/10.1006/jagm.1997.0887.
8. Kasirzadeh, A., Saddoune, M. & Soumis, F. Airline crew scheduling: models, algorithms, and data sets. EURO J Transp Logist 6, 111–137 (2017). https://doi.org/10.1007/s13676-015-0080-x
9. Vasko, Francis J., et al. “What Is the Best Greedy-like Heuristic for the Weighted Set Covering Problem?” Operations Research Letters, North-Holland, 22 Mar. 2016, www.sciencedirect.com/science/article/pii/S016763771600047X.