Develop agents that can complete Mathy's challenging algebra environments.
Mathy includes a framework for building reinforcement learning environments that transform math expressions using a set of user-defined actions.
Built-in environments aim to simplify algebra problems and expose generous customization points for user-created ones.
- Large Action Spaces: Mathy environments have 2D action spaces, where any known rule can be applied to any node in the expression tree. Without masking, this makes Mathy environments very difficult to explore.
- Masked Action Support: To enable better curriculum learning and toy problem creation, Mathy agents are given a mask of the actions that are valid in the current state. When agents use the mask to select actions, Mathy environments become much easier to explore.
- Rich Reward Signals: The environments are constructed such that agents receive reward feedback at every action. Custom environments can implement their own reward schemes.
- Curriculum Learning: Mathy envs cover related but distinct math problem types, and scale their complexity based on multiple controllable inputs. They also include built-in easy/normal/hard variants of each environment.
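
To make the interaction between the 2D action space and the action mask concrete, here is a minimal sketch of masked action selection. It does not use Mathy's API: the `mask[rule][node]` layout, the `select_masked_action` helper, and the example mask are all hypothetical and exist only for illustration.

```python
import random
from typing import List, Optional, Tuple


def select_masked_action(
    mask: List[List[int]], rng: Optional[random.Random] = None
) -> Tuple[int, int]:
    """Pick a uniformly random valid (rule, node) pair from a 2D mask.

    mask[rule][node] is 1 when the rule can be applied at that node, else 0.
    This layout is an assumption for illustration, not Mathy's actual format.
    """
    rng = rng or random.Random()
    # Collect every (rule, node) pair the mask marks as valid.
    valid = [
        (rule, node)
        for rule, row in enumerate(mask)
        for node, allowed in enumerate(row)
        if allowed
    ]
    if not valid:
        raise ValueError("mask contains no valid actions")
    return rng.choice(valid)


# Hypothetical mask: 3 rules x 4 nodes, where only 2 of the 12 actions are valid.
example_mask = [
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]
rule_index, node_index = select_masked_action(example_mask, random.Random(0))
```

Sampling only from the valid pairs means an untrained agent never spends a step on a rule that cannot apply at the chosen node, which is the exploration benefit described above.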
- Python 3.6+
$ pip install mathy_envs
Mathy agents interact with environments through sequences of interactions called episodes, which follow a standard RL episode lifecycle:
- set state to an initial state from the environment
- while state is not terminal
  - take an action and update state
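
The lifecycle above can be sketched as a short loop. The example below uses a hypothetical toy environment (`CountdownEnv`) instead of a real Mathy environment, so every class and method name in it is an assumption chosen for illustration and not part of the mathy_envs API.

```python
import random
from typing import List, Tuple


class CountdownEnv:
    """Hypothetical toy environment: reduce a counter to exactly zero."""

    def initial_state(self) -> int:
        return 5

    def is_terminal(self, state: int) -> bool:
        return state <= 0

    def valid_actions(self, state: int) -> List[int]:
        # Subtracting 1 is always valid; subtracting 2 only if it cannot overshoot.
        return [1, 2] if state >= 2 else [1]

    def step(self, state: int, action: int) -> Tuple[int, float]:
        next_state = state - action
        # Reward at every action: small penalty per step, bonus on completion.
        reward = 1.0 if next_state == 0 else -0.1
        return next_state, reward


def run_episode(env: CountdownEnv, rng: random.Random) -> Tuple[int, float, int]:
    state = env.initial_state()  # set state to an initial state
    total_reward, steps = 0.0, 0
    while not env.is_terminal(state):  # while state is not terminal
        action = rng.choice(env.valid_actions(state))
        state, reward = env.step(state, action)  # take an action and update state
        total_reward += reward
        steps += 1
    return state, total_reward, steps


final_state, total_reward, steps = run_episode(CountdownEnv(), random.Random(0))
```

A learned policy would replace the random choice inside the loop; the surrounding structure stays the same.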
Mathy supports alternative reinforcement learning libraries.
Mathy has support for Gymnasium via a small wrapper.
You can import the mathy_envs.gym module separately to register the environments:
#!pip install gymnasium
import gymnasium as gym
from mathy_envs.gym import MathyGymEnv
all_envs = gym.registry.values()
# Filter to just mathy registered envs
mathy_envs = [e for e in all_envs if e.id.startswith("mathy-")]
assert len(mathy_envs) > 0
# Each env can be created and produce an initial observation without
# special configuration.
for gym_env_spec in mathy_envs:
    wrapper_env: MathyGymEnv = gym.make(gym_env_spec.id)  # type:ignore
    assert wrapper_env is not None
    # Gymnasium's reset() returns an (observation, info) tuple.
    observation, info = wrapper_env.reset()
    assert observation is not None
Mathy Envs wouldn't be possible without the contributions of the following people:
This project follows the all-contributors specification. Contributions of any kind are welcome!