Risk-Averse Quantal Equilibrium (RQE)

rqequilibrium.rqe

Player dataclass

Represents a player in the RQE game.

Attributes:

Name        | Type    | Description
tau         | float   | Risk aversion parameter.
epsilon     | float   | Bounded rationality parameter.
game_matrix | ndarray | The payoff matrix for the player.

Source code in src/rqequilibrium/rqe.py
@dataclass
class Player:
    """
    Concept of a player in the RQE game.

    Attributes:
        tau: Risk aversion parameter.
        epsilon: Bounded Rational parameter.
        game_matrix: The payoff matrix for the player.
    """

    tau: float
    epsilon: float
    game_matrix: np.ndarray
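
A minimal usage sketch, assuming the module is importable as rqequilibrium.rqe (per the source path above); the payoff values below are illustrative, not from the package:

import numpy as np
from rqequilibrium.rqe import Player  # import path assumed from src/rqequilibrium/rqe.py

# Illustrative 2x2 payoff matrices for a zero-sum game
R1 = np.array([[1.0, -1.0], [-1.0, 1.0]])
R2 = -R1

# One Player per agent: tau sets risk aversion, epsilon bounded rationality
p1 = Player(tau=1.0, epsilon=0.5, game_matrix=R1)
p2 = Player(tau=1.0, epsilon=0.5, game_matrix=R2)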

RQE

RQE (Risk-Averse Quantal Response Equilibrium) solver for multi-player games.

This class implements the RQE solution concept, which combines risk aversion and bounded rationality in a multi-player setting. It uses projected gradient descent to optimize the players' policies.

Attributes:

Name             | Type         | Description
players          | list[Player] | List of Player objects representing the players in the game.
lr               | float        | Learning rate for the optimization.
max_iter         | int          | Maximum number of iterations for the optimization.
quantal_function | Callable     | Function to compute the quantal response.
risk_function    | Callable     | Function to compute the risk term.
projection       | Callable     | Function to project policies onto a simplex.

Methods:

Name         | Description
risk_term    | Computes the risk term for a player given the game matrix, policy, and the other player's policy.
quantal_term | Computes the quantal response term for a player given the game matrix, policy, and epsilon parameter.
optimize     | Optimizes the policies for all players using projected gradient descent.
print_game   | Prints the game matrices for two-player games.

Source code in src/rqequilibrium/rqe.py
class RQE:
    """
    RQE (Risk-Averse Quantal Response Equilibrium) solver for multi-player games.

    This class implements the RQE solution concept, which combines risk aversion and bounded rationality
    in a multi-player setting. It uses projected gradient descent to optimize the players' policies.

    Attributes:
        players (list[Player]): List of Player objects representing the players in the game.
        lr (float): Learning rate for the optimization.
        max_iter (int): Maximum number of iterations for the optimization.
        quantal_function: Function to compute the quantal response.
        risk_function: Function to compute the risk term.
        projection: Function to project policies onto a simplex.

    Methods:
        risk_term: Computes the risk term for a player given the game matrix, policy, and other player's policy.
        quantal_term: Computes the quantal response term for a player given the game matrix, policy, and epsilon parameter.
        optimize: Optimizes the policies for all players using projected gradient descent.
        print_game: Prints the game matrices for two-player games.
    """

    quantal_function: Callable
    risk_function: Callable
    players: List[Player]
    projection: Callable
    lr: float
    max_iter: int
    br_iters: int
    EPS: float = 1e-12

    def __init__(
        self,
        players: List[Player],
        lr: float = 0.1,
        max_iter: int = 500,
        br_iters: int = 50,
        quantal_function: Union[Callable, str] = "log_barrier",
        risk_function: Union[Callable, str] = "kl_divergence",
        projection: Callable = project_simplex,
    ):
        self.players = players
        self.lr = lr
        self.max_iter = max_iter
        self.br_iters = br_iters
        self.EPS = 1e-12
        self.projection = projection

        if callable(quantal_function):
            self.quantal_function = quantal_function
        elif quantal_function == "negative_entropy":
            self.quantal_function = negative_entropy
        elif quantal_function == "log_barrier":
            self.quantal_function = log_barrier
        else:
            raise ValueError("Invalid quantal function specified.")

        if callable(risk_function):
            self.risk_function = risk_function
        elif risk_function == "kl_divergence":
            self.risk_function = kl_divergence
        elif risk_function == "kl_reversed":
            self.risk_function = kl_reversed
        else:
            raise ValueError("Invalid risk function specified.")

        self.grad_risk = grad(self.risk_function)
        self.grad_quantal = grad(self.quantal_function)

    def risk_term(
        self, game: np.ndarray, x: np.ndarray, p: np.ndarray, y: np.ndarray, tau: float
    ) -> np.ndarray:
        """
        Compute the risk term for a player given the game matrix, policy, and other player's policy.

        Parameters:
            game: The game matrix for the player.
            x: The current policy of the player.
            p: The last risk aversion term.
            y: The policy of all the other players.
            tau: The risk aversion parameter.
        Returns:
            The gradient of the risk term
        """
        return game.T @ x + (1 / tau) * self.grad_risk(p, y)

    def quantal_term(
        self, game: np.ndarray, x: np.ndarray, p: np.ndarray, epsilon: float
    ) -> np.ndarray:
        """
        Compute the quantal response term for a player given the game matrix, policy, and epsilon parameter.

        Parameters:
            game: The game matrix for the player.
            x: The current policy of the player.
            p: The risk aversion term.
            epsilon: The epsilon parameter for bounded rationality.
        Returns:
            The gradient of the quantal term
        """
        return -game @ p + epsilon * self.grad_quantal(x)

    def optimize(self) -> np.ndarray:
        """
        Optimize the policies for all players using projected gradient descent.

        Returns:
            The optimized policies for all players under RQE.
        """

        num_players = len(self.players)  # Number of players
        max_action_set = max(player.game_matrix.shape[1] for player in self.players)

        # Initialize the Projected Gradient Descent optimizer
        pgd = ProjectedGradientDescent(
            lr=self.lr,
            projection=self.projection,
        )

        # Initialize random policies for all players
        policies = np.random.rand(num_players, max_action_set)
        risk_policies = np.random.rand(num_players, max_action_set)
        policies /= np.sum(policies, axis=1, keepdims=True)
        risk_policies /= np.sum(risk_policies, axis=1, keepdims=True)

        for _ in range(self.max_iter):
            # Compute the quantal and risk terms for each player
            policies_buff = policies.copy()
            risk_buff = risk_policies.copy()
            for i, player in enumerate(self.players):
                game = player.game_matrix

                quantal_grad = self.quantal_term(
                    game, policies_buff[i], risk_buff[i], player.epsilon
                )
                opponent_policies = np.delete(policies_buff, i, axis=0)

                risk_grad = self.risk_term(
                    game,
                    policies_buff[i],
                    risk_buff[i],
                    opponent_policies[0],
                    player.tau,
                )

                policies_buff[i] = pgd.step(policies[i], quantal_grad)
                risk_buff[i] = pgd.step(risk_policies[i], risk_grad)

            risk_policies = risk_buff
            policies = policies_buff

        return policies

    @staticmethod
    def print_game(R1: np.ndarray, R2: np.ndarray):
        """
        Print the game matrices for both players.
        """
        for i in range(R1.shape[0]):
            row = []
            for j in range(R1.shape[1]):
                row.append(f"{int(R1[i, j])}, {int(R2[i, j])}")
            print(" | ".join(row))

optimize

optimize() -> np.ndarray

Optimize the policies for all players using projected gradient descent.

Returns:

Type    | Description
ndarray | The optimized policies for all players under RQE.

Source code in src/rqequilibrium/rqe.py
def optimize(self) -> np.ndarray:
    """
    Optimize the policies for all players using projected gradient descent.

    Returns:
        The optimized policies for all players under RQE.
    """

    num_players = len(self.players)  # Number of players
    max_action_set = max(player.game_matrix.shape[1] for player in self.players)

    # Initialize the Projected Gradient Descent optimizer
    pgd = ProjectedGradientDescent(
        lr=self.lr,
        projection=self.projection,
    )

    # Initialize random policies for all players
    policies = np.random.rand(num_players, max_action_set)
    risk_policies = np.random.rand(num_players, max_action_set)
    policies /= np.sum(policies, axis=1, keepdims=True)
    risk_policies /= np.sum(risk_policies, axis=1, keepdims=True)

    for _ in range(self.max_iter):
        # Compute the quantal and risk terms for each player
        policies_buff = policies.copy()
        risk_buff = risk_policies.copy()
        for i, player in enumerate(self.players):
            game = player.game_matrix

            quantal_grad = self.quantal_term(
                game, policies_buff[i], risk_buff[i], player.epsilon
            )
            opponent_policies = np.delete(policies_buff, i, axis=0)

            risk_grad = self.risk_term(
                game,
                policies_buff[i],
                risk_buff[i],
                opponent_policies[0],
                player.tau,
            )

            policies_buff[i] = pgd.step(policies[i], quantal_grad)
            risk_buff[i] = pgd.step(risk_policies[i], risk_grad)

        risk_policies = risk_buff
        policies = policies_buff

    return policies
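
The policy and risk-policy updates are kept on the probability simplex by the projection passed to ProjectedGradientDescent (project_simplex by default, imported by the module but not shown on this page). A common choice is the Euclidean projection onto the simplex; the sketch below shows that standard construction, assuming project_simplex follows it — the package's actual implementation may differ:

import numpy as np

def euclidean_simplex_projection(v: np.ndarray) -> np.ndarray:
    """Project v onto {x : x >= 0, sum(x) = 1} via the sorting-based method."""
    u = np.sort(v)[::-1]                                        # sort descending
    css = np.cumsum(u)                                          # running sums
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)                      # shift enforcing the constraints
    return np.maximum(v + theta, 0.0)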

print_game staticmethod

print_game(R1: ndarray, R2: ndarray)

Print the game matrices for both players.

Source code in src/rqequilibrium/rqe.py
@staticmethod
def print_game(R1: np.ndarray, R2: np.ndarray):
    """
    Print the game matrices for both players.
    """
    for i in range(R1.shape[0]):
        row = []
        for j in range(R1.shape[1]):
            row.append(f"{int(R1[i, j])}, {int(R2[i, j])}")
        print(" | ".join(row))

quantal_term

quantal_term(game: ndarray, x: ndarray, p: ndarray, epsilon: float) -> np.ndarray

Compute the quantal response term for a player given the game matrix, policy, and epsilon parameter.

Parameters:

Name    | Type    | Description
game    | ndarray | The game matrix for the player.
x       | ndarray | The current policy of the player.
p       | ndarray | The risk aversion term.
epsilon | float   | The epsilon parameter for bounded rationality.

Returns: The gradient of the quantal term.

Source code in src/rqequilibrium/rqe.py
def quantal_term(
    self, game: np.ndarray, x: np.ndarray, p: np.ndarray, epsilon: float
) -> np.ndarray:
    """
    Compute the quantal response term for a player given the game matrix, policy, and epsilon parameter.

    Parameters:
        game: The game matrix for the player.
        x: The current policy of the player.
        p: The risk aversion term.
        epsilon: The epsilon parameter for bounded rationality.
    Returns:
        The gradient of the quantal term
    """
    return -game @ p + epsilon * self.grad_quantal(x)
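
quantal_function is resolved by name in the constructor ("log_barrier" by default, "negative_entropy" as the alternative) and differentiated with grad to produce grad_quantal. The definitions below are a sketch of the standard regularizers those names suggest, assuming an autograd-style backend for grad; the package's own implementations are not shown on this page and may differ:

import autograd.numpy as np  # autodiff backend assumed; the module's actual grad import is not shown here

def negative_entropy(x):
    # Negative Shannon entropy: sum_i x_i * log(x_i)
    return np.sum(x * np.log(x))

def log_barrier(x):
    # Log barrier on the simplex interior: -sum_i log(x_i)
    return -np.sum(np.log(x))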

risk_term

risk_term(game: ndarray, x: ndarray, p: ndarray, y: ndarray, tau: float) -> np.ndarray

Compute the risk term for a player given the game matrix, policy, and other player's policy.

Parameters:

Name | Type    | Description                             | Default
game | ndarray | The game matrix for the player.         | required
x    | ndarray | The current policy of the player.       | required
p    | ndarray | The last risk aversion term.            | required
y    | ndarray | The policy of all the other players.    | required
tau  | float   | The risk aversion parameter.            | required

Returns: The gradient of the risk term.

Source code in src/rqequilibrium/rqe.py
def risk_term(
    self, game: np.ndarray, x: np.ndarray, p: np.ndarray, y: np.ndarray, tau: float
) -> np.ndarray:
    """
    Compute the risk term for a player given the game matrix, policy, and other player's policy.

    Parameters:
        game: The game matrix for the player.
        x: The current policy of the player.
        p: The last risk aversion term.
        y: The policy of all the other players.
        tau: The risk aversion parameter.
    Returns:
        The gradient of the risk term
    """
    return game.T @ x + (1 / tau) * self.grad_risk(p, y)
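
risk_function is resolved the same way ("kl_divergence" by default, "kl_reversed" as the alternative) and differentiated with grad to produce grad_risk; with autograd-style defaults, grad_risk(p, y) is the gradient with respect to the first argument p. A sketch of the standard divergences those names suggest, under the same autodiff assumption as above and not taken from the package itself:

import autograd.numpy as np  # autodiff backend assumed

def kl_divergence(p, y):
    # D_KL(p || y) = sum_i p_i * log(p_i / y_i); differentiated w.r.t. p in risk_term
    return np.sum(p * np.log(p / y))

def kl_reversed(p, y):
    # Reversed direction: D_KL(y || p) = sum_i y_i * log(y_i / p_i)
    return np.sum(y * np.log(y / p))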