Mathy Envs

Develop agents that can complete Mathy's challenging algebra environments.

Mathy includes a framework for building reinforcement learning environments that transform math expressions using a set of user-defined actions.

Built-in environments aim to simplify algebra problems and expose generous customization points for user-created ones.

  • Large Action Spaces: Mathy environments have 2D action spaces, where any known rule can be applied to any node in the expression tree. Without masking, this makes Mathy environments very difficult to explore.
  • Masked Action Support: To enable better curriculum learning and toy problem creation, Mathy agents are given a mask of the actions that are valid in the current state. When the mask is used to select actions, Mathy environments become much easier to explore.
  • Rich Reward Signals: The environments are constructed such that agents receive reward feedback at every action. Custom environments can implement their own reward schemes.
  • Curriculum Learning: Mathy envs cover related but distinct math problem types, and scale their complexity based on multiple controllable inputs. They also include built-in easy/normal/hard variants of each environment.
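
To make the masking idea concrete, here is a minimal sketch of choosing a random valid action from a 2D (rule × node) mask. It uses only the standard library and does not call Mathy. The mask layout, the `sample_masked_action` helper, and the example values are assumptions for illustration, not Mathy's actual data structures.

```python
import random
from typing import List, Tuple


def sample_masked_action(
    mask: List[List[int]], rng: random.Random
) -> Tuple[int, int]:
    """Pick a random (rule_index, node_index) pair whose mask entry is 1."""
    valid = [
        (rule, node)
        for rule, row in enumerate(mask)
        for node, allowed in enumerate(row)
        if allowed
    ]
    if not valid:
        raise ValueError("mask contains no valid actions")
    return rng.choice(valid)


# Hypothetical mask: 3 rules x 4 expression-tree nodes, 1 = rule applies at node.
example_mask = [
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
rule, node = sample_masked_action(example_mask, random.Random(0))
print(rule, node)
```

In this example only 3 of the 12 (rule, node) pairs are valid, so an agent that samples without the mask would waste most of its steps on actions that cannot be applied.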


  • Python 3.6+


$ pip install mathy_envs


Mathy agents interact with environments through sequences of interactions called episodes, which follow a standard RL episode lifecycle:

Episode Pseudocode.

  1. set state to an initial state from the environment
  2. while state is not terminal
    • take an action and update state
  3. done
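
The pseudocode above can be sketched as a runnable loop. The `CountdownEnv` class below is a hypothetical toy environment written for this example. It is not part of Mathy, and its `reset`/`step` signatures are assumptions chosen to mirror the usual RL conventions.

```python
from typing import Tuple


class CountdownEnv:
    """Hypothetical toy environment: the state is an integer that must reach 0."""

    def reset(self) -> int:
        self.state = 3
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        # Apply the action, then report (new state, reward, terminal flag).
        self.state = max(0, self.state - action)
        done = self.state == 0
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def run_episode(env: CountdownEnv) -> Tuple[int, float]:
    state = env.reset()  # 1. set state to an initial state
    done = False
    steps, total_reward = 0, 0.0
    while not done:  # 2. loop while the state is not terminal
        action = 1  # a real agent would choose an action based on `state`
        state, reward, done = env.step(action)
        steps += 1
        total_reward += reward
    return steps, total_reward  # 3. done


steps, total_reward = run_episode(CountdownEnv())
print(steps, round(total_reward, 2))
```

The toy environment returns a reward on every step, which loosely mirrors the per-action reward feedback described earlier. The specific reward values here are arbitrary.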

Other Libraries

Mathy supports alternative reinforcement learning libraries.


Mathy supports Gymnasium via a small wrapper.

You can import the mathy_envs.gym module separately to register the environments:

#!pip install gymnasium
import gymnasium as gym
from mathy_envs.gym import MathyGymEnv

all_envs = gym.registry.values()
# Filter to just mathy registered envs
mathy_envs = [e for e in all_envs if e.id.startswith("mathy-")]

assert len(mathy_envs) > 0

# Each env can be created and produce an initial observation without
# special configuration.
for gym_env_spec in mathy_envs:
    wrapper_env: MathyGymEnv = gym.make(gym_env_spec.id)  # type: ignore
    assert wrapper_env is not None
    observation = wrapper_env.reset()
    assert observation is not None


Mathy Envs wouldn't be possible without the contributions of the following people:

This project follows the all-contributors specification. Contributions of any kind are welcome!