Env

import mathy_envs.env

MathyEnv

MathyEnv(
    self, 
    rules: Optional[List[mathy_core.rule.BaseRule]] = None, 
    max_moves: int = 20, 
    verbose: bool = False, 
    invalid_action_response: Literal['raise', 'penalize', 'terminal'] = 'raise', 
    reward_discount: float = 0.99, 
    max_seq_len: int = 128, 
    previous_state_penalty: bool = True, 
    preferred_term_commute: bool = False, 
)
Implement a math-solving game where a player wins by executing the right sequence of actions to reduce a math expression to an agreed-upon basic representation in as few moves as possible.
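
For example, a minimal sketch of constructing the environment using a few of the keyword arguments above (the specific values are illustrative, not recommended defaults):

from mathy_envs.env import MathyEnv

# Penalize invalid actions rather than raising, and allow longer episodes.
env = MathyEnv(
    max_moves=50,
    invalid_action_response="penalize",
    previous_state_penalty=True,
)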

action_size

Return the number of available actions

core_rules

MathyEnv.core_rules(
    preferred_term_commute: bool = False, 
) -> List[mathy_core.rule.BaseRule]
Return the mathy core agent actions

finalize_state

MathyEnv.finalize_state(self, state: mathy_envs.state.MathyEnvState) -> None
Perform final checks on a problem state to ensure the episode yielded results that were not corrupted by transformation errors.

get_actions_for_node

MathyEnv.get_actions_for_node(
    self, 
    expression: mathy_core.expressions.MathExpression, 
    rule_list: Optional[List[Type[mathy_core.rule.BaseRule]]] = None, 
) -> List[List[int]]
Return a valid actions mask for the given expression and rule list.

Action masks are 2-D lists of shape (num_rules, max_seq_len), where a 0 indicates the action is not valid in the current state and a 1 indicates that it is a valid action to take.
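
A small sketch of inspecting a mask for a parsed expression (assumes mathy_core's ExpressionParser for building the expression tree):

from mathy_core import ExpressionParser
from mathy_envs.env import MathyEnv

env = MathyEnv()
expression = ExpressionParser().parse("4x + 2x")
mask = env.get_actions_for_node(expression)
# One row per registered rule, one 0/1 entry per node slot up to max_seq_len.
print(len(mask), len(mask[0]))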

get_agent_actions_count

MathyEnv.get_agent_actions_count(
    self, 
    env_state: mathy_envs.state.MathyEnvState, 
) -> int
Return the total number of possible actions

get_env_namespace

MathyEnv.get_env_namespace(self) -> str
Return a unique dot-namespaced string representing the current environment, e.g. mycompany.envs.differentiate

get_initial_state

MathyEnv.get_initial_state(
    self, 
    params: Optional[mathy_envs.types.MathyEnvProblemArgs] = None, 
    print_problem: bool = True, 
) -> Tuple[mathy_envs.state.MathyEnvState, mathy_envs.types.MathyEnvProblem]
Generate an initial MathyEnvState for an episode
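
The base class relies on problem_fn to produce a problem, so this is typically called on a concrete environment. A sketch assuming the PolySimplify environment bundled with mathy_envs (adjust the import to whichever environment you use):

from mathy_envs.envs import PolySimplify  # assumed bundled environment

env = PolySimplify()
state, problem = env.get_initial_state(print_problem=False)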

get_lose_signal

MathyEnv.get_lose_signal(
    self, 
    env_state: mathy_envs.state.MathyEnvState, 
) -> float
Calculate the reward value for failing to complete the episode. This is done so that the reward signal can be problem-type dependent.

get_next_state

MathyEnv.get_next_state(
    self, 
    env_state: mathy_envs.state.MathyEnvState, 
    action: Union[int, numpy.int64, Tuple[int, int]], 
) -> Tuple[mathy_envs.state.MathyEnvState, mathy_envs.time_step.TimeStep, mathy_core.rule.ExpressionChangeRule]

Parameters

  • env_state: current env_state
  • action: either a flat integer index into the action space or a tuple of two integers (rule_index, node_index) identifying the rule and node to act on

Returns

next_state: env_state after applying action

transition: the timestep that represents the state transition

change: the change descriptor describing the change that happened
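
A sketch of a single transition, choosing the first valid (rule, node) pair from the current mask (MathyEnvState is constructed directly from a problem string here, following the library README):

from mathy_envs.env import MathyEnv
from mathy_envs.state import MathyEnvState

env = MathyEnv()
state = MathyEnvState(problem="4x + 2x")  # assumed constructor usage
mask = env.get_valid_moves(state)
# Pick the first valid (rule_index, node_index) pair; a real agent would choose deliberately.
action = next((r, n) for r, row in enumerate(mask) for n, ok in enumerate(row) if ok)
next_state, transition, change = env.get_next_state(state, action)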

get_penalizing_actions

MathyEnv.get_penalizing_actions(
    self, 
    state: mathy_envs.state.MathyEnvState, 
) -> List[Type[mathy_core.rule.BaseRule]]
Get the list of penalizing action types. When these actions are selected, the agent gets a negative reward.

get_rewarding_actions

MathyEnv.get_rewarding_actions(
    self, 
    state: mathy_envs.state.MathyEnvState, 
) -> List[Type[mathy_core.rule.BaseRule]]
Get the list of rewarding action types. When these actions are selected, the agent gets a positive reward.

get_state_transition

MathyEnv.get_state_transition(
    self, 
    env_state: mathy_envs.state.MathyEnvState, 
) -> mathy_envs.time_step.TimeStep
Given an input state, calculate the transition value of the timestep.

Parameters

  • env_state: current env_state

Returns

transition: the current state value transition

get_token_at_index

MathyEnv.get_token_at_index(
    self, 
    expression: mathy_core.expressions.MathExpression, 
    index: int, 
) -> Optional[mathy_core.expressions.MathExpression]
Get the token at the given index, counting from the left of the expression
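
For example, fetching the node at a given left-to-right position (assumes mathy_core's ExpressionParser):

from mathy_core import ExpressionParser
from mathy_envs.env import MathyEnv

env = MathyEnv()
expression = ExpressionParser().parse("4x + 2x")
token = env.get_token_at_index(expression, 0)  # left-most node, or None if out of range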

get_valid_moves

MathyEnv.get_valid_moves(
    self, 
    env_state: mathy_envs.state.MathyEnvState, 
) -> List[List[int]]
Get a 2d list describing the valid moves for the current state.

The first dimension indexes the known rules in the order they were registered, and the second dimension is a list of max_seq_len values, where a 1 indicates the node at that index is valid for the given rule and a 0 indicates it is not.
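
A minimal sketch of printing which nodes each rule can act on (MathyEnvState constructed from a problem string, as in the library README):

from mathy_envs.env import MathyEnv
from mathy_envs.state import MathyEnvState

env = MathyEnv()
state = MathyEnvState(problem="4x + 2x")  # assumed constructor usage
mask = env.get_valid_moves(state)
for rule_index, row in enumerate(mask):
    valid_nodes = [i for i, ok in enumerate(row) if ok]
    print(rule_index, valid_nodes)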

get_valid_rules

MathyEnv.get_valid_rules(
    self, 
    env_state: mathy_envs.state.MathyEnvState, 
) -> List[int]
Get a vector with one 0/1 entry per registered rule, indicating whether the rule has any nodes in the expression that it can be applied to.

Note

If you want to get a list of which nodes each rule can be applied to, prefer to use the get_valid_moves method.
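
For example (same assumed MathyEnvState construction as above):

from mathy_envs.env import MathyEnv
from mathy_envs.state import MathyEnvState

env = MathyEnv()
state = MathyEnvState(problem="4x + 2x")  # assumed constructor usage
rule_mask = env.get_valid_rules(state)  # e.g. [1, 0, 1, ...], one entry per registered rule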

get_win_signal

MathyEnv.get_win_signal(self, env_state: mathy_envs.state.MathyEnvState) -> float
Calculate the reward value for completing the episode. This is done so that the reward signal can be scaled based on the time it took to complete the episode.

is_terminal_state

MathyEnv.is_terminal_state(
    self, 
    env_state: mathy_envs.state.MathyEnvState, 
) -> bool
Determine if a given state is terminal or not.

Arguments

  • env_state (MathyEnvState): The state to inspect

Returns

(bool): A boolean indicating if the state is terminal or not.
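
A sketch of a naive episode loop that steps until the state becomes terminal, either solved or out of moves (MathyEnvState construction follows the library README):

from mathy_envs.env import MathyEnv
from mathy_envs.state import MathyEnvState

env = MathyEnv()
state = MathyEnvState(problem="4x + 2x + 7")  # assumed constructor usage
while not env.is_terminal_state(state):
    mask = env.get_valid_moves(state)
    # Take the first valid (rule, node) pair, or stop if none remain.
    action = next(((r, n) for r, row in enumerate(mask) for n, ok in enumerate(row) if ok), None)
    if action is None:
        break
    state, transition, change = env.get_next_state(state, action)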

max_moves_fn

MathyEnv.max_moves_fn(
    self, 
    problem: mathy_envs.types.MathyEnvProblem, 
    config: mathy_envs.types.MathyEnvProblemArgs, 
) -> int
Return the environment-specific maximum move count for a given problem.

print_history

MathyEnv.print_history(
    self, 
    env_state: mathy_envs.state.MathyEnvState, 
    pretty: bool = True, 
) -> None
Render the history of an episode from a given state.

Arguments

  • env_state (MathyEnvState): The state to render the history of.

print_state

MathyEnv.print_state(
    self, 
    env_state: mathy_envs.state.MathyEnvState, 
    action_name: str, 
    token_index: int = -1, 
    change: Optional[mathy_core.rule.ExpressionChangeRule] = None, 
    change_reward: float = 0.0, 
    pretty: bool = False, 
) -> None
Render the given state to stdout for visualization

problem_fn

MathyEnv.problem_fn(
    self, 
    params: mathy_envs.types.MathyEnvProblemArgs, 
) -> mathy_envs.types.MathyEnvProblem
Return a problem for the environment given a set of parameters to control problem generation.

This is implemented per environment so each environment can generate its own dataset with no required configuration.

random_action

MathyEnv.random_action(
    self, 
    expression: mathy_core.expressions.MathExpression, 
    rule: Optional[Type[mathy_core.rule.BaseRule]] = None, 
) -> Tuple[int, int]
Get a random (rule_index, node_index) action for the given expression, optionally restricted to a particular rule
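
A sketch of sampling an action for a parsed expression (assumes mathy_core's ExpressionParser; pass a rule class to restrict sampling to that rule):

from mathy_core import ExpressionParser
from mathy_envs.env import MathyEnv

env = MathyEnv()
expression = ExpressionParser().parse("4x + 2x")
rule_index, node_index = env.random_action(expression)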

render_state

MathyEnv.render_state(
    self, 
    env_state: mathy_envs.state.MathyEnvState, 
    action_name: str, 
    token_index: int = -1, 
    change: Optional[mathy_core.rule.ExpressionChangeRule] = None, 
    change_reward: float = 0.0, 
    pretty: bool = False, 
) -> str
Render the given state to a string suitable for printing to a log

state_to_observation

MathyEnv.state_to_observation(
    self, 
    state: mathy_envs.state.MathyEnvState, 
    max_seq_len: Optional[int] = None, 
) -> mathy_envs.state.MathyObservation
Convert an environment state into an observation that can be used by a training agent.
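
A minimal sketch of producing an observation for an agent (MathyEnvState constructed from a problem string, as in the library README):

from mathy_envs.env import MathyEnv
from mathy_envs.state import MathyEnvState

env = MathyEnv()
state = MathyEnvState(problem="4x + 2x")  # assumed constructor usage
observation = env.state_to_observation(state, max_seq_len=128)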

to_action

MathyEnv.to_action(
    self, 
    action: Union[int, numpy.int64, Tuple[int, int]], 
) -> Tuple[int, int]
Resolve a given action input to a tuple of (rule_index, node_index).

When given an int, it is treated as an index into the flattened 2-D action space. When given a tuple, it is assumed to already be (rule_index, node_index).
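
For example (the exact flattening convention is an implementation detail; both input forms resolve to a (rule_index, node_index) tuple):

from mathy_envs.env import MathyEnv

env = MathyEnv()
rule_index, node_index = env.to_action((2, 5))  # tuples pass through unchanged
rule_index, node_index = env.to_action(7)       # flat indices are decoded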

to_hash_key

MathyEnv.to_hash_key(self, env_state: mathy_envs.state.MathyEnvState) -> str
Convert env_state to a string key for use in an MCTS cache
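
A sketch of using the hash key to memoize per-state values during a search (the cache itself is user code, not part of MathyEnv):

from mathy_envs.env import MathyEnv
from mathy_envs.state import MathyEnvState

env = MathyEnv()
state = MathyEnvState(problem="4x + 2x")  # assumed constructor usage
value_cache = {}
key = env.to_hash_key(state)
if key not in value_cache:
    value_cache[key] = 0.0  # e.g. a cached MCTS value estimate for this state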

transition_fn

MathyEnv.transition_fn(
    self, 
    env_state: mathy_envs.state.MathyEnvState, 
    expression: mathy_core.expressions.MathExpression, 
    features: mathy_envs.state.MathyObservation, 
) -> Optional[mathy_envs.time_step.TimeStep]
Provide environment-specific transitions per timestep.