Env
import mathy_envs.env
MathyEnv¶
MathyEnv(
self,
rules: Optional[List[mathy_core.rule.BaseRule]] = None,
max_moves: int = 20,
verbose: bool = False,
invalid_action_response: Literal['raise', 'penalize', 'terminal'] = 'raise',
reward_discount: float = 0.99,
max_seq_len: int = 128,
previous_state_penalty: bool = True,
preferred_term_commute: bool = False,
)
action_size¶
Returns the number of available actions.
core_rules¶
MathyEnv.core_rules(
preferred_term_commute: bool = False,
) -> List[mathy_core.rule.BaseRule]
finalize_state¶
MathyEnv.finalize_state(self, state: mathy_envs.state.MathyEnvState) -> None
get_actions_for_node¶
MathyEnv.get_actions_for_node(
self,
expression: mathy_core.expressions.MathExpression,
rule_list: Optional[List[Type[mathy_core.rule.BaseRule]]] = None,
) -> List[List[int]]
Action masks are 2d lists of shape (num_rules, max_seq_len), where a 0 indicates that the action is not valid in the current state and a 1 indicates that it is a valid action to take.
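The mask shape can be illustrated with plain Python lists. This standalone sketch (independent of the library; the rule/node values are made up for illustration) builds a mask for a hypothetical 3-rule environment over a 5-node expression and collects the valid (rule, node) pairs:

```python
# Hypothetical mask: mask[rule][node] == 1 means that rule applies at that node.
num_rules, max_seq_len = 3, 5
mask = [[0] * max_seq_len for _ in range(num_rules)]
mask[0][2] = 1  # rule 0 applies at node 2
mask[2][0] = 1  # rule 2 applies at node 0
mask[2][4] = 1  # rule 2 applies at node 4

# Enumerate every valid (rule, node) action encoded by the mask.
valid_pairs = [
    (rule, node)
    for rule in range(num_rules)
    for node in range(max_seq_len)
    if mask[rule][node] == 1
]
print(valid_pairs)  # [(0, 2), (2, 0), (2, 4)]
```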
get_agent_actions_count¶
MathyEnv.get_agent_actions_count(
self,
env_state: mathy_envs.state.MathyEnvState,
) -> int
get_env_namespace¶
MathyEnv.get_env_namespace(self) -> str
get_initial_state¶
MathyEnv.get_initial_state(
self,
params: Optional[mathy_envs.types.MathyEnvProblemArgs] = None,
print_problem: bool = True,
) -> Tuple[mathy_envs.state.MathyEnvState, mathy_envs.types.MathyEnvProblem]
get_lose_signal¶
MathyEnv.get_lose_signal(
self,
env_state: mathy_envs.state.MathyEnvState,
) -> float
get_next_state¶
MathyEnv.get_next_state(
self,
env_state: mathy_envs.state.MathyEnvState,
action: Union[int, numpy.int64, Tuple[int, int]],
) -> Tuple[mathy_envs.state.MathyEnvState, mathy_envs.time_step.TimeStep, mathy_core.rule.ExpressionChangeRule]
Parameters
- env_state: the current environment state
- action: either a flat index into the action space or a tuple of two integers (rule, node) identifying the rule to apply and the node to apply it to
Returns
next_state
: the env_state after applying the action
transition
: the timestep representing the state transition
change
: a descriptor of the change that was applied
get_penalizing_actions¶
MathyEnv.get_penalizing_actions(
self,
state: mathy_envs.state.MathyEnvState,
) -> List[Type[mathy_core.rule.BaseRule]]
get_rewarding_actions¶
MathyEnv.get_rewarding_actions(
self,
state: mathy_envs.state.MathyEnvState,
) -> List[Type[mathy_core.rule.BaseRule]]
get_state_transition¶
MathyEnv.get_state_transition(
self,
env_state: mathy_envs.state.MathyEnvState,
) -> mathy_envs.time_step.TimeStep
Parameters
- env_state: current env_state
Returns
transition
: the current state value transition
get_token_at_index¶
MathyEnv.get_token_at_index(
self,
expression: mathy_core.expressions.MathExpression,
index: int,
) -> Optional[mathy_core.expressions.MathExpression]
Get the token at the given index, counting from the left of the expression.
get_valid_moves¶
MathyEnv.get_valid_moves(
self,
env_state: mathy_envs.state.MathyEnvState,
) -> List[List[int]]
The first dimension lists the known rules in the order they are registered; the second dimension is a list of max_seq_len 1/0 values indicating whether the node at that index is valid for the given rule.
get_valid_rules¶
MathyEnv.get_valid_rules(
self,
env_state: mathy_envs.state.MathyEnvState,
) -> List[int]
Note
If you want a list of which nodes each rule can be applied to, prefer the get_valid_moves method.
get_win_signal¶
MathyEnv.get_win_signal(self, env_state: mathy_envs.state.MathyEnvState) -> float
is_terminal_state¶
MathyEnv.is_terminal_state(
self,
env_state: mathy_envs.state.MathyEnvState,
) -> bool
Arguments
- env_state (MathyEnvState): The state to inspect
Returns
(bool)
: A boolean indicating whether the state is terminal.
max_moves_fn¶
MathyEnv.max_moves_fn(
self,
problem: mathy_envs.types.MathyEnvProblem,
config: mathy_envs.types.MathyEnvProblemArgs,
) -> int
print_history¶
MathyEnv.print_history(
self,
env_state: mathy_envs.state.MathyEnvState,
pretty: bool = True,
) -> None
Arguments
- env_state (MathyEnvState): The state to render the history of.
print_state¶
MathyEnv.print_state(
self,
env_state: mathy_envs.state.MathyEnvState,
action_name: str,
token_index: int = -1,
change: Optional[mathy_core.rule.ExpressionChangeRule] = None,
change_reward: float = 0.0,
pretty: bool = False,
) -> None
problem_fn¶
MathyEnv.problem_fn(
self,
params: mathy_envs.types.MathyEnvProblemArgs,
) -> mathy_envs.types.MathyEnvProblem
Each environment implements this to generate its own problem dataset, so no additional configuration is required.
random_action¶
MathyEnv.random_action(
self,
expression: mathy_core.expressions.MathExpression,
rule: Optional[Type[mathy_core.rule.BaseRule]] = None,
) -> Tuple[int, int]
render_state¶
MathyEnv.render_state(
self,
env_state: mathy_envs.state.MathyEnvState,
action_name: str,
token_index: int = -1,
change: Optional[mathy_core.rule.ExpressionChangeRule] = None,
change_reward: float = 0.0,
pretty: bool = False,
) -> str
state_to_observation¶
MathyEnv.state_to_observation(
self,
state: mathy_envs.state.MathyEnvState,
max_seq_len: Optional[int] = None,
) -> mathy_envs.state.MathyObservation
to_action¶
MathyEnv.to_action(
self,
action: Union[int, numpy.int64, Tuple[int, int]],
) -> Tuple[int, int]
When given an int, it is treated as an index into the flattened 2d action space. When given a tuple, it is assumed to already be (rule, node).
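The description suggests a row-major flattening over (num_rules, max_seq_len). This standalone sketch shows the conversion both ways; the exact layout (rules as rows, nodes as columns) is an assumption inferred from the description above, not confirmed by this page:

```python
max_seq_len = 128  # matches the constructor default above


def to_action(action, max_seq_len=max_seq_len):
    """Normalize a flat index or a (rule, node) tuple to (rule, node)."""
    if isinstance(action, tuple):
        return action  # already (rule, node); pass through unchanged
    # Row-major layout assumption: row = rule index, column = node index.
    return divmod(action, max_seq_len)


print(to_action(259))      # (2, 3): rule 2, node 3, since 2 * 128 + 3 == 259
print(to_action((5, 7)))   # (5, 7)
```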
to_hash_key¶
MathyEnv.to_hash_key(self, env_state: mathy_envs.state.MathyEnvState) -> str
transition_fn¶
MathyEnv.transition_fn(
self,
env_state: mathy_envs.state.MathyEnvState,
expression: mathy_core.expressions.MathExpression,
features: mathy_envs.state.MathyObservation,
) -> Optional[mathy_envs.time_step.TimeStep]