Difference between revisions of "Matrix game (LP for game theory)"

From Cornell University Computational Optimization Open Textbook - Optimization Wiki
Jump to navigation Jump to search
Line 22: Line 22:
 
Consider a simple two-player zero-sum matrix game called Evens and Odds. In this game, two players each wager $1 before simultaneously showing either one or two fingers. If the sum of the fingers showing is even, player 1 wins the pot for that round ($2). If the sum of the fingers showing is odd, player 2 wins the pot for that round. As with all matrix games, the assumption that both players are rational and intelligent decision makers with the goal of maximizing their own total payoff in each round applies. The expected utility for each player can be defined using a payoff matrix, ''P''. In this payoff matrix, the rows and columns represent the decisions of player 1 and player 2 respectively. The below payoff matrix represents the payoff to player 1 in this matrix game.
 
Consider a simple two-player zero-sum matrix game called Evens and Odds. In this game, two players each wager $1 before simultaneously showing either one or two fingers. If the sum of the fingers showing is even, player 1 wins the pot for that round ($2). If the sum of the fingers showing is odd, player 2 wins the pot for that round. As with all matrix games, the assumption that both players are rational and intelligent decision makers with the goal of maximizing their own total payoff in each round applies. The expected utility for each player can be defined using a payoff matrix, ''P''. In this payoff matrix, the rows and columns represent the decisions of player 1 and player 2 respectively. The below payoff matrix represents the payoff to player 1 in this matrix game.
  
<math display="block">P=\begin{bmatrix} 2 & -2 \\ -2 & 2 \end{bmatrix}</math>
+
<math>P=\begin{bmatrix} 2 & -2 \\ -2 & 2 \end{bmatrix}</math>
  
 
In this example, since each player has an equal ½ probability of throwing one or two fingers, neither player has a distinct advantage. Consider now a less-trivial game where the payoff matrix is no longer evenly distributed, shown below.
 
In this example, since each player has an equal ½ probability of throwing one or two fingers, neither player has a distinct advantage. Consider now a less-trivial game where the payoff matrix is no longer evenly distributed, shown below.
  
<math display="block">P=\begin{bmatrix} 1 & -2 \\ -3 & 2 \end{bmatrix}</math>
+
<math>P=\begin{bmatrix} 1 & -2 \\ -3 & 2 \end{bmatrix}</math>
  
 
While it may be intuitive that player 1 has the edge in this new game, making this determination is not as clear for much more complicated games. This is where the mathematics behind game theory comes into play. Consider a more general form of a two-person zero-sum game where two players are allowed to pick from a finite set of actions. Let <math>i  </math> represent the finite set of actions that player one (or the “row player”) can choose from for all <math>i\in (1,2,...,n) </math>''.'' Likewise, let <math>j </math> represent the finite set of actions that player two (or the “column player”) can choose from for all <math>j\in (1,2,...,m) </math>. The general form of the payoff matrix for a matrix game is now shown below. Note that all positive payments go to the row player and all negative payments go to the column player.
 
While it may be intuitive that player 1 has the edge in this new game, making this determination is not as clear for much more complicated games. This is where the mathematics behind game theory comes into play. Consider a more general form of a two-person zero-sum game where two players are allowed to pick from a finite set of actions. Let <math>i  </math> represent the finite set of actions that player one (or the “row player”) can choose from for all <math>i\in (1,2,...,n) </math>''.'' Likewise, let <math>j </math> represent the finite set of actions that player two (or the “column player”) can choose from for all <math>j\in (1,2,...,m) </math>. The general form of the payoff matrix for a matrix game is now shown below. Note that all positive payments go to the row player and all negative payments go to the column player.
  
<math display="block">P = [p_{ij}]</math>
+
<math>P = [p_{ij}]</math>
  
 
Next, we assume that each player is making a random selection in accordance with a fixed probability distribution. This probability distribution is defined by what is called the ''stochastic vector,'' <math>y</math>. Each component of the stochastic vector, <math>y_i </math>, denotes the probability that the row player selects action <math>i </math>. This stochastic vector is made up of nonnegative probabilities that sum up to one per the fundamental law of probability:
 
Next, we assume that each player is making a random selection in accordance with a fixed probability distribution. This probability distribution is defined by what is called the ''stochastic vector,'' <math>y</math>. Each component of the stochastic vector, <math>y_i </math>, denotes the probability that the row player selects action <math>i </math>. This stochastic vector is made up of nonnegative probabilities that sum up to one per the fundamental law of probability:
  
<math display="block">y \geq 0 \text{    and    } e^Ty=1, </math>
+
<math>y \geq 0 \text{    and    } e^Ty=1, </math>
  
 
where e is a vector of all ones. Likewise, the stochastic vector for the column player can be defined as <math>x </math>, with the probabilities that this player selects action <math>j </math> denoted by<math>x_j </math>. To compute the expected payoff to the column player, the payoff from each outcome <math>(i,j) </math> in the sets <math>i\in (1,2,...,n) </math> and <math>j\in (1,2,...,m) </math> times the probability of that outcome are summed. Thus, the column player’s expected payoff is defined as
 
where e is a vector of all ones. Likewise, the stochastic vector for the column player can be defined as <math>x </math>, with the probabilities that this player selects action <math>j </math> denoted by<math>x_j </math>. To compute the expected payoff to the column player, the payoff from each outcome <math>(i,j) </math> in the sets <math>i\in (1,2,...,n) </math> and <math>j\in (1,2,...,m) </math> times the probability of that outcome are summed. Thus, the column player’s expected payoff is defined as
  
<math display="block">\sum_{i,j}y_ia_{ij}x_j = y^T Px</math>.
+
<math>\sum_{i,j}y_ia_{ij}x_j = y^T Px</math>.
  
 
Since we have assumed that our column player acts rationally, we can expect them to act in accordance with the stochastic vector x. In other words, the column player has adopted strategy x. The row player’s best option for defending against strategy x is to adopt strategy y*, in which they act to minimize the column player’s payout:
 
Since we have assumed that our column player acts rationally, we can expect them to act in accordance with the stochastic vector x. In other words, the column player has adopted strategy x. The row player’s best option for defending against strategy x is to adopt strategy y*, in which they act to minimize the column player’s payout:
  
<math display="block">\begin{align}
+
<math>\begin{align}
 
\text{min} & ~~ y^TPx\\
 
\text{min} & ~~ y^TPx\\
 
\text{s.t} & ~~ e^Ty=1 \\
 
\text{s.t} & ~~ e^Ty=1 \\
Line 50: Line 50:
 
By assuming that our column player acts intelligently, this implies that they are aware of the row player’s strategy to minimize their payoff. Hence, the column player can employ strategy x* that maximizes their payoff given the row player’s strategy y*:
 
By assuming that our column player acts intelligently, this implies that they are aware of the row player’s strategy to minimize their payoff. Hence, the column player can employ strategy x* that maximizes their payoff given the row player’s strategy y*:
  
<math display="block">\max_{x} \min_{y} y^T Px</math>  
+
<math>\max_{x} \min_{y} y^T Px</math>  
  
 
The above equation can be solved by reformulating it as a linear program. By taking the inner optimization over the deterministic strategies, this equation can be re-written as:  
 
The above equation can be solved by reformulating it as a linear program. By taking the inner optimization over the deterministic strategies, this equation can be re-written as:  
  
<math display="block">\begin{align}
+
<math>\begin{align}
 
\text{max} & ~~ \text{min}_i e_i ^T Ax\\
 
\text{max} & ~~ \text{min}_i e_i ^T Ax\\
 
\text{s.t} & ~~ \sum_{j=1}^n x_{j} = 1\\
 
\text{s.t} & ~~ \sum_{j=1}^n x_{j} = 1\\
Line 62: Line 62:
 
In order to put a lower bound on the minimization term, a new variable ''v'' is introduced. This gives us the following linear program:
 
In order to put a lower bound on the minimization term, a new variable ''v'' is introduced. This gives us the following linear program:
  
<math display="block">\begin{align}
+
<math>\begin{align}
 
\text{max} & ~~ v\\
 
\text{max} & ~~ v\\
 
\text{s.t} & ~~ v \leq e_i^T Px & ~~ i = 1, 2, ..., m\\
 
\text{s.t} & ~~ v \leq e_i^T Px & ~~ i = 1, 2, ..., m\\
Line 71: Line 71:
 
or in vector notation,
 
or in vector notation,
  
<math display="block">\begin{align}
+
<math>\begin{align}
 
\text{max} & ~~ v\\
 
\text{max} & ~~ v\\
 
\text{s.t} & ~~ ve-Px \leq 0\\
 
\text{s.t} & ~~ ve-Px \leq 0\\
Line 80: Line 80:
 
The above max-min linear program governs the column player’s strategy x*. We can use this linear program to determine the row player’s strategy y* by taking the duel to yield a min-max linear program:
 
The above max-min linear program governs the column player’s strategy x*. We can use this linear program to determine the row player’s strategy y* by taking the duel to yield a min-max linear program:
  
<math display="block">\min_{x} \max_{y} y^T Px</math>
+
<math>\min_{x} \max_{y} y^T Px</math>
  
 
Similarly to the max-min linear program used for the column player’s strategy, the above equation can be reformulated into a linear program by taking the inner optimization over the deterministic strategies and introducing a new variable u:
 
Similarly to the max-min linear program used for the column player’s strategy, the above equation can be reformulated into a linear program by taking the inner optimization over the deterministic strategies and introducing a new variable u:
  
<math display="block">\begin{align}
+
<math>\begin{align}
 
\text{max} & ~~ u\\
 
\text{max} & ~~ u\\
 
\text{s.t} & ~~ ue-P^Ty \leq 0\\
 
\text{s.t} & ~~ ue-P^Ty \leq 0\\
Line 93: Line 93:
 
These linear programs can be solved to find the optimal strategies <math>x*</math> and <math>y*</math>. The Minimax Theorem can now be used to verify that both solutions are consistent with one another. The Minimax Theorem states that there exist stochastic vectors <math>x*</math>and <math>y*</math>for which
 
These linear programs can be solved to find the optimal strategies <math>x*</math> and <math>y*</math>. The Minimax Theorem can now be used to verify that both solutions are consistent with one another. The Minimax Theorem states that there exist stochastic vectors <math>x*</math>and <math>y*</math>for which
  
<math display="block">\max_{x} y^{*T} Px = \min_{y} y^T Ax^*</math>
+
<math>\max_{x} y^{*T} Px = \min_{y} y^T Ax^*</math>
  
 
In order to prove the Minimax Theorem, we first consider the fact that
 
In order to prove the Minimax Theorem, we first consider the fact that
  
<math display="block">v^* = \min_{i} e_i ^T Px^* = \min_{y} y^T Ax*,</math>
+
<math>v^* = \min_{i} e_i ^T Px^* = \min_{y} y^T Ax*,</math>
  
 
and
 
and
  
<math display="block">u* = \max_{j} e_j ^T P^T y^* = \max_{x} x^T A^T y* = \max_{x} y^{*T}Ax</math>
+
<math>u* = \max_{j} e_j ^T P^T y^* = \max_{x} x^T A^T y* = \max_{x} y^{*T}Ax</math>
  
 
Since the max-min linear program for x* and the min-max linear program for y* are duals of one another, we can assume that v* = u*. Therefore,
 
Since the max-min linear program for x* and the min-max linear program for y* are duals of one another, we can assume that v* = u*. Therefore,
  
<math display="block">\max_{x} y^{*T} Px = \min_{y} y^T Ax^*</math>
+
<math>\max_{x} y^{*T} Px = \min_{y} y^T Ax^*</math>
  
 
By solving the above equation for the optimal values v* = u* yields what is called the value of the game. The value of a game shows how much utility each player can expect to gain or lose on average. In the event that v* = u* = 0, the game is considered to be fair, meaning neither player has a distinct disadvantage. In order to illustrate the power of the minimax theorem in solving matrix games, a numerical example has been provided in the section below.
 
By solving the above equation for the optimal values v* = u* yields what is called the value of the game. The value of a game shows how much utility each player can expect to gain or lose on average. In the event that v* = u* = 0, the game is considered to be fair, meaning neither player has a distinct disadvantage. In order to illustrate the power of the minimax theorem in solving matrix games, a numerical example has been provided in the section below.
Line 143: Line 143:
 
The above payoff table can also be depicted by the following payoff matrix, <math>P</math>, where the columns represent the defensive team's actions and the rows represent the offensive team's actions.
 
The above payoff table can also be depicted by the following payoff matrix, <math>P</math>, where the columns represent the defensive team's actions and the rows represent the offensive team's actions.
  
<math>\begin{bmatrix} -7 & 7 & 3 \\ 7 & -7 & 3 \\ 3 & 3 & 3 \end{bmatrix}</math>
+
<math>P = \begin{bmatrix} -7 & 7 & 3 \\ 7 & -7 & 3 \\ 3 & 3 & 3 \end{bmatrix}</math>
 
 
<math>P = \begin{bmatrix} x & y & z \\ z & v & v \\ x & x & x \end{bmatrix}</math>
 
  
 
== Other Applications of the Matrix Game ==
 
== Other Applications of the Matrix Game ==

Revision as of 21:25, 22 November 2020

Author: David Oswalt (SysEn 6800 Fall 2020)

Steward: Wei-Han Chen, Fengqi You

Game Theory and Linear Programming

John von Neumann (1903–1957) and Oskar Morgenstern (1902–1977)

Game theory can be defined as a formal language for modeling and analyzing the interactive behaviors of intelligent, rational decision-makers (or players). Game theory provides the mathematical methods necessary to analyze the decisions of two or more players based on their preferences to determine a final outcome. The theory was first conceptualized by mathematician Ernst Zermelo in the early 20th century. However, John von Neumann pioneered modern game theory through his book Theory of Games and Economic Behavior, written with co-author Oskar Morgenstern. For this reason, John con Neumann is often credited by historians as the Father of Game Theory[1][2]. This theory has provided a framework for approaching complex, high-pressure situations and has a broad spectrum of applications. These applications of game theory have helped shape modern economics and social sciences as we know them today and are discussed in the Applications section below.

Analyzing game theoretic situations is a practical application of linear programming. These situations can get quite complex mathematically, but one of the simplest forms of game is called the Finite Two-Person Zero-Sum Game (or Matrix Game for short).  In a Matrix Game, two players are involved in a competitive situation in which one player’s loss is the other’s gain. Some common terms related to the Matrix Game that will be used throughout this chapter have been defined below:

Game – Any social situation involving two or more individuals [2].

Players – The individuals involved in a game. In the case of two-person zero-sum games, these players are assumed to be rational and intelligent [2].

Rationality – A decision maker is considered to be rational if he or she makes decisions consistently in pursuit of his or her own objectives. Assuming a player to be rational implies that said player’s objective is to maximize his or her own payoff [2].

Utility – The scale upon which a decision’s payoff is measured [2].

Analyzing these games uses John von Neumann’s Minimax Theorem that was derived using the Brouwer Fixed-Point Theorem. However, over time it was proven that the Matrix Game could be solved using Linear Programming along with the Duality Theorem[3]. This solution to the Matrix game has been proven in the Theory and Algorithmic Discussion section below.

Theory and Algorithmic Discussion

Consider a simple two-player zero-sum matrix game called Evens and Odds. In this game, two players each wager $1 before simultaneously showing either one or two fingers. If the sum of the fingers showing is even, player 1 wins the pot for that round ($2). If the sum of the fingers showing is odd, player 2 wins the pot for that round. As with all matrix games, the assumption that both players are rational and intelligent decision makers with the goal of maximizing their own total payoff in each round applies. The expected utility for each player can be defined using a payoff matrix, P. In this payoff matrix, the rows and columns represent the decisions of player 1 and player 2 respectively. The below payoff matrix represents the payoff to player 1 in this matrix game.

In this example, since each player has an equal ½ probability of throwing one or two fingers, neither player has a distinct advantage. Consider now a less-trivial game where the payoff matrix is no longer evenly distributed, shown below.

While it may be intuitive that player 1 has the edge in this new game, making this determination is not as clear for much more complicated games. This is where the mathematics behind game theory comes into play. Consider a more general form of a two-person zero-sum game where two players are allowed to pick from a finite set of actions. Let represent the finite set of actions that player one (or the “row player”) can choose from for all . Likewise, let represent the finite set of actions that player two (or the “column player”) can choose from for all . The general form of the payoff matrix for a matrix game is now shown below. Note that all positive payments go to the row player and all negative payments go to the column player.

Next, we assume that each player is making a random selection in accordance with a fixed probability distribution. This probability distribution is defined by what is called the stochastic vector, . Each component of the stochastic vector, , denotes the probability that the row player selects action . This stochastic vector is made up of nonnegative probabilities that sum up to one per the fundamental law of probability:

where e is a vector of all ones. Likewise, the stochastic vector for the column player can be defined as , with the probabilities that this player selects action denoted by. To compute the expected payoff to the column player, the payoff from each outcome in the sets and times the probability of that outcome are summed. Thus, the column player’s expected payoff is defined as

.

Since we have assumed that our column player acts rationally, we can expect them to act in accordance with the stochastic vector x. In other words, the column player has adopted strategy x. The row player’s best option for defending against strategy x is to adopt strategy y*, in which they act to minimize the column player’s payout:

By assuming that our column player acts intelligently, this implies that they are aware of the row player’s strategy to minimize their payoff. Hence, the column player can employ strategy x* that maximizes their payoff given the row player’s strategy y*:

The above equation can be solved by reformulating it as a linear program. By taking the inner optimization over the deterministic strategies, this equation can be re-written as:

In order to put a lower bound on the minimization term, a new variable v is introduced. This gives us the following linear program:

or in vector notation,

The above max-min linear program governs the column player’s strategy x*. We can use this linear program to determine the row player’s strategy y* by taking the duel to yield a min-max linear program:

Similarly to the max-min linear program used for the column player’s strategy, the above equation can be reformulated into a linear program by taking the inner optimization over the deterministic strategies and introducing a new variable u:

These linear programs can be solved to find the optimal strategies and . The Minimax Theorem can now be used to verify that both solutions are consistent with one another. The Minimax Theorem states that there exist stochastic vectors and for which

In order to prove the Minimax Theorem, we first consider the fact that

and

Since the max-min linear program for x* and the min-max linear program for y* are duals of one another, we can assume that v* = u*. Therefore,

By solving the above equation for the optimal values v* = u* yields what is called the value of the game. The value of a game shows how much utility each player can expect to gain or lose on average. In the event that v* = u* = 0, the game is considered to be fair, meaning neither player has a distinct disadvantage. In order to illustrate the power of the minimax theorem in solving matrix games, a numerical example has been provided in the section below.

Numerical Example

Many decisions made in sports can be modeled as finite two-person zero-sum games. Take, for example, a common dilemma seen in American football. The offense has driven down the field and is just a few short yards of scoring. The team has four plays, or downs, to score. On the third down, the team gets stopped by the defense and is unable to score, leaving only one more play to make it happen. There are two options for scoring. The first is a field goal, in which the team kicks the ball through the uprights for 3 points. The second option is to run a passing or running play for a touchdown, worth 7 points. This is often referred to as a “Fourth and Goal” situation and is a dilemma that play-callers face in most football games. While the option of scoring a touchdown yields a higher payoff, it is a much risker option as running and passing plays are easier to defend against than a field goal. For this reason, football coaches often settle on kicking a field goal on 4th down instead of going for it. This anticlimactic end to a long and exciting drive often leaves fans with an unsatisfying feeling, knowing that their team was only a few yards from scoring a touchdown. While kicking the field goal nearly guarantees 3 points, is it smarter to employ a more aggressive strategy and go for the touchdown? Game theory can help determine the strategy that will yield the highest amount of points on average over time.

There are a few assumptions to be made in order to model this 4th Down Dilemma. The first is that both football teams are ideal. What this means is that if the offense chooses a run play and the defense chooses to defend a run play, then the run will be stopped with zero yards gained. It also means that if the offense chooses a run play and the defense incorrectly chooses to defend a passing play, then the play will be successful with a touchdown scored. We are also assuming that if the offense chooses to kick a field goal, then it is guaranteed to be successful. This is assumed due to the fact that field goals from just a few yards out are very rarely missed. The final assumption is that all other factors contributing to play calling are neglected. This could include situations such as the offense being down 2 points with only a few seconds on the clock, when a field goal for 3 points would be the obvious best strategy. With this strategy in mind, a the payoff to the offense can be outlined as follows:

Payoff to Offense - 4th Down Dilemma
Defense
Run Pass FG
Offense Run -7 7 3
Pass 7 -7 3
FG 3 3 3

The above payoff table can also be depicted by the following payoff matrix, , where the columns represent the defensive team's actions and the rows represent the offensive team's actions.

Other Applications of the Matrix Game

The rise of game theory spanned the time frame in which both World War I and World War II occurred, so naturally one of the earliest applications was in developing winning military strategies. Game theory was used to make high-pressure decisions on attack and defense strategies that optimized their impact within a set of constraints. The Battle of Bismarck Sea between Japanese and American forces in 1943 is one of the most historic examples of game theory in this context. In this battle, the US Air Force analyzed an attack situation using a two-person zero-sum game to maximize the amount of time they had to bomb a Japanese naval fleet, given the limited information they had about the convoy’s route. This demonstrates the fact that the word “game” in “game theory” can be misleading. Not all applications of game theory are fun games. One of the other earlier applications of game theory was in economics. This ended up growing into one of the more significant applications of game theory and has formed modern economics as we know it today. The theory played a major role in the development of many sub-disciplines of economics, such as industrial organization, international trade, labor economics, and macroeconomics [1]. As game theory matured, its applications expanded into various fields of social science, including political science, international relations, philosophy, sociology and anthropology. It is also used in biology and computer science. To this day, economics remains the most prominent application of game theory.

Conclusion

  • TBD

References

  • Add later