Risk-Averse Quantal Equilibrium (RQE)

rqequilibrium.rqe

Player dataclass

Represents a player in the RQE game.

Attributes:

Name        | Type    | Description
tau         | float   | Risk aversion parameter.
epsilon     | float   | Bounded rationality parameter.
game_matrix | ndarray | The payoff matrix for the player.

Source code in src/rqequilibrium/rqe.py
@dataclass
class Player:
    """
    Concept of a player in the RQE game.

    Attributes:
        tau: Risk aversion parameter.
        epsilon: Bounded Rational parameter.
        game_matrix: The payoff matrix for the player.
    """

    tau: float
    epsilon: float
    game_matrix: np.ndarray
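
A minimal usage sketch, assuming the module is importable as rqequilibrium.rqe (per the source path above); the payoff values below are illustrative, not from the package:

import numpy as np
from rqequilibrium.rqe import Player  # import path assumed from src/rqequilibrium/rqe.py

# Illustrative 2x2 payoff matrices for a zero-sum game
R1 = np.array([[1.0, -1.0], [-1.0, 1.0]])
R2 = -R1

# One Player per agent: tau sets risk aversion, epsilon bounded rationality
p1 = Player(tau=1.0, epsilon=0.5, game_matrix=R1)
p2 = Player(tau=1.0, epsilon=0.5, game_matrix=R2)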

RQE

RQE (Risk-Averse Quantal Response Equilibrium) solver for multi-player games.

This class implements the RQE solution concept, which combines risk aversion and bounded rationality in a multi-player setting. It uses projected gradient descent to optimize the players' policies.

Attributes:

Name             | Type         | Description
players          | list[Player] | List of Player objects representing the players in the game.
lr               | float        | Learning rate for the optimization.
max_iter         | int          | Maximum number of iterations for the optimization.
quantal_function | Callable     | Function to compute the quantal response.
risk_function    | Callable     | Function to compute the risk term.
projection       | Callable     | Function to project policies onto a simplex.

Methods:

Name         | Description
risk_term    | Computes the risk term for a player given the game matrix, policy, and the other player's policy.
quantal_term | Computes the quantal response term for a player given the game matrix, policy, and epsilon parameter.
optimize     | Optimizes the policies for all players using projected gradient descent.
print_game   | Prints the game matrices for two-player games.

Source code in src/rqequilibrium/rqe.py
class RQE:
    """
    RQE (Risk-Averse Quantal Response Equilibrium) solver for multi-player games.

    This class implements the RQE solution concept, which combines risk aversion and bounded rationality
    in a multi-player setting. It uses projected gradient descent to optimize the players' policies.

    Attributes:
        players (list[Player]): List of Player objects representing the players in the game.
        lr (float): Learning rate for the optimization.
        max_iter (int): Maximum number of iterations for the optimization.
        quantal_function: Function to compute the quantal response.
        risk_function: Function to compute the risk term.
        projection: Function to project policies onto a simplex.

    Methods:
        risk_term: Computes the risk term for a player given the game matrix, policy, and other player's policy.
        quantal_term: Computes the quantal response term for a player given the game matrix, policy, and epsilon parameter.
        optimize: Optimizes the policies for all players using projected gradient descent.
        print_game: Prints the game matrices for two-player games.
    """

    quantal_function: Callable
    risk_function: Callable
    players: List[Player]
    projection: Callable
    lr: float
    max_iter: int
    br_iters: int
    EPS: float = 1e-12

    def __init__(
        self,
        players: List[Player],
        lr: float = 0.1,
        max_iter: int = 500,
        br_iters: int = 50,
        quantal_function: Union[Callable, str] = "log_barrier",
        risk_function: Union[Callable, str] = "kl_divergence",
        projection: Callable = project_simplex,
    ):
        self.players = players
        self.lr = lr
        self.max_iter = max_iter
        self.br_iters = br_iters
        self.EPS = 1e-12
        self.projection = projection

        if callable(quantal_function):
            self.quantal_function = quantal_function
        elif quantal_function == "negative_entropy":
            self.quantal_function = negative_entropy
        elif quantal_function == "log_barrier":
            self.quantal_function = log_barrier
        else:
            raise ValueError("Invalid quantal function specified.")

        if callable(risk_function):
            self.risk_function = risk_function
        elif risk_function == "kl_divergence":
            self.risk_function = kl_divergence
        elif risk_function == "kl_reversed":
            self.risk_function = kl_reversed
        else:
            raise ValueError("Invalid risk function specified.")

        self.grad_risk = grad(self.risk_function)
        self.grad_quantal = grad(self.quantal_function)

    def risk_term(
        self, game: np.ndarray, x: np.ndarray, p: np.ndarray, y: np.ndarray, tau: float
    ) -> np.ndarray:
        """
        Compute the risk term for a player given the game matrix, policy, and other player's policy.

        Parameters:
            game: The game matrix for the player.
            x: The current policy of the player.
            p: The last risk aversion term.
            y: The policy of all the other players.
            tau: The risk aversion parameter.
        Returns:
            The gradient of the risk term
        """
        return game.T @ x + (1 / tau) * self.grad_risk(p, y)

    def quantal_term(
        self, game: np.ndarray, x: np.ndarray, p: np.ndarray, epsilon: float
    ) -> np.ndarray:
        """
        Compute the quantal response term for a player given the game matrix, policy, and epsilon parameter.

        Parameters:
            game: The game matrix for the player.
            x: The current policy of the player.
            p: The risk aversion term.
            epsilon: The epsilon parameter for bounded rationality.
        Returns:
            The gradient of the quantal term
        """
        return -game @ p + epsilon * self.grad_quantal(x)

    def optimize(self) -> np.ndarray:
        """
        Optimize the policies for all players using projected gradient descent.

        Returns:
            The optimized policies for all players under RQE.
        """

        num_players = len(self.players)  # Number of players
        max_action_set = max(player.game_matrix.shape[1] for player in self.players)

        # Initialize the Projected Gradient Descent optimizer
        pgd = ProjectedGradientDescent(
            lr=self.lr,
            projection=self.projection,
        )

        # Initialize random policies for all players
        policies = np.random.rand(num_players, max_action_set)
        risk_policies = np.random.rand(num_players, max_action_set)
        policies /= np.sum(policies, axis=1, keepdims=True)
        risk_policies /= np.sum(risk_policies, axis=1, keepdims=True)

        for _ in range(self.max_iter):
            # Compute the quantal and risk terms for each player
            policies_buff = policies.copy()
            risk_buff = risk_policies.copy()
            for i, player in enumerate(self.players):
                game = player.game_matrix

                quantal_grad = self.quantal_term(
                    game, policies_buff[i], risk_buff[i], player.epsilon
                )
                opponent_policies = np.delete(policies_buff, i, axis=0)

                risk_grad = self.risk_term(
                    game,
                    policies_buff[i],
                    risk_buff[i],
                    opponent_policies[0],
                    player.tau,
                )

                policies_buff[i] = pgd.step(policies[i], quantal_grad)
                risk_buff[i] = pgd.step(risk_policies[i], risk_grad)

            risk_policies = risk_buff
            policies = policies_buff

        return policies

    @staticmethod
    def print_game(R1: np.ndarray, R2: np.ndarray):
        """
        Print the game matrices for both players.
        """
        for i in range(R1.shape[0]):
            row = []
            for j in range(R1.shape[1]):
                row.append(f"{int(R1[i, j])}, {int(R2[i, j])}")
            print(" | ".join(row))

optimize

optimize() -> np.ndarray

Optimize the policies for all players using projected gradient descent.

Returns:

Type    | Description
ndarray | The optimized policies for all players under RQE.

Source code in src/rqequilibrium/rqe.py
def optimize(self) -> np.ndarray:
    """
    Optimize the policies for all players using projected gradient descent.

    Returns:
        The optimized policies for all players under RQE.
    """

    num_players = len(self.players)  # Number of players
    max_action_set = max(player.game_matrix.shape[1] for player in self.players)

    # Initialize the Projected Gradient Descent optimizer
    pgd = ProjectedGradientDescent(
        lr=self.lr,
        projection=self.projection,
    )

    # Initialize random policies for all players
    policies = np.random.rand(num_players, max_action_set)
    risk_policies = np.random.rand(num_players, max_action_set)
    policies /= np.sum(policies, axis=1, keepdims=True)
    risk_policies /= np.sum(risk_policies, axis=1, keepdims=True)

    for _ in range(self.max_iter):
        # Compute the quantal and risk terms for each player
        policies_buff = policies.copy()
        risk_buff = risk_policies.copy()
        for i, player in enumerate(self.players):
            game = player.game_matrix

            quantal_grad = self.quantal_term(
                game, policies_buff[i], risk_buff[i], player.epsilon
            )
            opponent_policies = np.delete(policies_buff, i, axis=0)

            risk_grad = self.risk_term(
                game,
                policies_buff[i],
                risk_buff[i],
                opponent_policies[0],
                player.tau,
            )

            policies_buff[i] = pgd.step(policies[i], quantal_grad)
            risk_buff[i] = pgd.step(risk_policies[i], risk_grad)

        risk_policies = risk_buff
        policies = policies_buff

    return policies
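
The policy and risk-policy updates are kept on the probability simplex by the projection passed to ProjectedGradientDescent (project_simplex by default, imported by the module but not shown on this page). A common choice is the Euclidean projection onto the simplex; the sketch below shows that standard construction, assuming project_simplex follows it — the package's actual implementation may differ:

import numpy as np

def euclidean_simplex_projection(v: np.ndarray) -> np.ndarray:
    """Project v onto {x : x >= 0, sum(x) = 1} via the sorting-based method."""
    u = np.sort(v)[::-1]                                        # sort descending
    css = np.cumsum(u)                                          # running sums
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)                      # shift enforcing the constraints
    return np.maximum(v + theta, 0.0)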

print_game staticmethod

print_game(R1: ndarray, R2: ndarray)

Print the game matrices for both players.

Source code in src/rqequilibrium/rqe.py
@staticmethod
def print_game(R1: np.ndarray, R2: np.ndarray):
    """
    Print the game matrices for both players.
    """
    for i in range(R1.shape[0]):
        row = []
        for j in range(R1.shape[1]):
            row.append(f"{int(R1[i, j])}, {int(R2[i, j])}")
        print(" | ".join(row))

quantal_term

quantal_term(game: ndarray, x: ndarray, p: ndarray, epsilon: float) -> np.ndarray

Compute the quantal response term for a player given the game matrix, policy, and epsilon parameter.

Parameters:

Name    | Type    | Description
game    | ndarray | The game matrix for the player.
x       | ndarray | The current policy of the player.
p       | ndarray | The risk aversion term.
epsilon | float   | The epsilon parameter for bounded rationality.

Returns: The gradient of the quantal term.

Source code in src/rqequilibrium/rqe.py
def quantal_term(
    self, game: np.ndarray, x: np.ndarray, p: np.ndarray, epsilon: float
) -> np.ndarray:
    """
    Compute the quantal response term for a player given the game matrix, policy, and epsilon parameter.

    Parameters:
        game: The game matrix for the player.
        x: The current policy of the player.
        p: The risk aversion term.
        epsilon: The epsilon parameter for bounded rationality.
    Returns:
        The gradient of the quantal term
    """
    return -game @ p + epsilon * self.grad_quantal(x)
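
quantal_function is resolved by name in the constructor ("log_barrier" by default, "negative_entropy" as the alternative) and differentiated with grad to produce grad_quantal. The definitions below are a sketch of the standard regularizers those names suggest, assuming an autograd-style backend for grad; the package's own implementations are not shown on this page and may differ:

import autograd.numpy as np  # autodiff backend assumed; the module's actual grad import is not shown here

def negative_entropy(x):
    # Negative Shannon entropy: sum_i x_i * log(x_i)
    return np.sum(x * np.log(x))

def log_barrier(x):
    # Log barrier on the simplex interior: -sum_i log(x_i)
    return -np.sum(np.log(x))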

risk_term

risk_term(game: ndarray, x: ndarray, p: ndarray, y: ndarray, tau: float) -> np.ndarray

Compute the risk term for a player given the game matrix, policy, and other player's policy.

Parameters:

Name | Type    | Description                             | Default
game | ndarray | The game matrix for the player.         | required
x    | ndarray | The current policy of the player.       | required
p    | ndarray | The last risk aversion term.            | required
y    | ndarray | The policy of all the other players.    | required
tau  | float   | The risk aversion parameter.            | required

Returns: The gradient of the risk term.

Source code in src/rqequilibrium/rqe.py
def risk_term(
    self, game: np.ndarray, x: np.ndarray, p: np.ndarray, y: np.ndarray, tau: float
) -> np.ndarray:
    """
    Compute the risk term for a player given the game matrix, policy, and other player's policy.

    Parameters:
        game: The game matrix for the player.
        x: The current policy of the player.
        p: The last risk aversion term.
        y: The policy of all the other players.
        tau: The risk aversion parameter.
    Returns:
        The gradient of the risk term
    """
    return game.T @ x + (1 / tau) * self.grad_risk(p, y)
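
risk_function is resolved the same way ("kl_divergence" by default, "kl_reversed" as the alternative) and differentiated with grad to produce grad_risk; with autograd-style defaults, grad_risk(p, y) is the gradient with respect to the first argument p. A sketch of the standard divergences those names suggest, under the same autodiff assumption as above and not taken from the package itself:

import autograd.numpy as np  # autodiff backend assumed

def kl_divergence(p, y):
    # D_KL(p || y) = sum_i p_i * log(p_i / y_i); differentiated w.r.t. p in risk_term
    return np.sum(p * np.log(p / y))

def kl_reversed(p, y):
    # Reversed direction: D_KL(y || p) = sum_i y_i * log(y_i / p_i)
    return np.sum(y * np.log(y / p))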