RobotMem + Gymnasium-Robotics
Persistent goal-conditioned memory for Fetch, Shadow Hand, and MaMuJoCo environments — recall past achieved goals to accelerate learning.
Quick Start
```python
import gymnasium as gym
import gymnasium_robotics  # registers the Fetch/Hand environments
from robotmem import RobotMemory

mem = RobotMemory(db="gym_robotics.db")
env = gym.make("FetchPickAndPlace-v3")
obs, info = env.reset()

for step in range(1000):
    # Recall past trajectories where similar goals were achieved
    prior = mem.recall(obs["desired_goal"], top_k=5)
    action = policy.predict(obs, prior_goals=prior)  # `policy`: your goal-conditioned policy
    obs, reward, terminated, truncated, info = env.step(action)

    # Save every achieved goal with full observation context
    mem.save_perception(
        observation=obs["observation"],
        action=action,
        reward=reward,
        metadata={
            "achieved_goal": obs["achieved_goal"].tolist(),
            "desired_goal": obs["desired_goal"].tolist(),
            "is_success": info["is_success"],
            "env": "FetchPickAndPlace-v3",
        },
    )

    # Begin a new episode once the current one ends
    if terminated or truncated:
        obs, info = env.reset()
```
What This Integration Does
Gymnasium-Robotics, maintained by the Farama Foundation, is the standard suite of robotic environments for reinforcement learning research. It includes the Fetch robotic arm tasks (reach, push, slide, pick-and-place), Shadow Dexterous Hand manipulation tasks, and the multi-agent MaMuJoCo environments. The Fetch and Shadow Hand families use the goal-conditioned GoalEnv interface, where each observation contains three components: the robot's state, the goal it has achieved, and the goal it is asked to achieve.
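Concretely, each observation from one of these goal-conditioned environments unpacks like this:

```python
import gymnasium as gym
import gymnasium_robotics  # registers the Fetch/Hand environments

env = gym.make("FetchReach-v3")
obs, info = env.reset(seed=0)

obs["observation"]    # robot (and object) state: positions, velocities
obs["achieved_goal"]  # the goal the current state already satisfies
obs["desired_goal"]   # the goal the agent is asked to reach
```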
RobotMem plugs into this GoalEnv structure naturally. Every time your agent interacts with the environment, RobotMem can store the observation, the achieved goal, and whether the episode was successful. On subsequent episodes — or even subsequent training runs days later — the agent can query this memory to recall past states where similar goals were successfully achieved. This is a form of persistent hindsight experience replay that extends beyond a single training session.
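As a minimal sketch, a later training run can reopen the same database and query it before taking a single step. Here `env` and `replay_buffer` are assumed to come from your training setup, and the dictionary-style record fields are an assumption about RobotMem's return format, used only for illustration:

```python
from robotmem import RobotMemory

# Reopen the database written by earlier runs; nothing was discarded
mem = RobotMemory(db="gym_robotics.db")
obs, info = env.reset()

# Hypothetical warm start: seed a fresh replay buffer with recalled
# transitions. The record fields below are assumed for this sketch.
for record in mem.recall(obs["desired_goal"], top_k=10):
    replay_buffer.add(record["observation"], record["action"], record["reward"])
```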
The key insight is that goal-conditioned tasks share substantial structure. The motor skills needed to reach a position in FetchReach are building blocks for FetchPickAndPlace. The finger coordination learned in HandManipulateBlock transfers to HandManipulateEgg. By storing achieved goals with their full observation context in RobotMem, your agent builds a growing library of "I have been here before and I know what works" that persists indefinitely.
- Goal-space retrieval — Query the memory using desired goal vectors to find past episodes where similar goals were successfully achieved, providing concrete demonstration data for the current task.
- Persistent hindsight replay — Traditional HER keeps relabeled experiences only for the lifetime of a single run's replay buffer. RobotMem keeps them forever, letting hindsight experience compound across training runs.
- Cross-environment transfer — Memories tagged with environment metadata let you filter and recall experiences from related tasks (e.g., use FetchReach experiences during FetchPush training).
- Success-filtered recall — Filter queries by the `is_success` flag to retrieve only successful trajectories, giving your policy positive demonstrations to learn from (see the sketch after this list).
- Compatible with any algorithm — Works alongside SAC, TD3, PPO, or any other algorithm. RobotMem is an observation-level add-on, not a training-loop replacement.
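Combining the last two ideas, a success- and environment-filtered query might look like the sketch below. The `filters` keyword argument is hypothetical, introduced here only for illustration; check the RobotMem docs for the actual filtering interface.

```python
# Hypothetical sketch: while training FetchPush, recall only successful
# experiences recorded on the related FetchReach task. The `filters`
# argument is assumed for illustration, not confirmed RobotMem API.
prior = mem.recall(
    obs["desired_goal"],
    top_k=5,
    filters={
        "is_success": True,       # positive demonstrations only
        "env": "FetchReach-v3",   # borrow experience from a related task
    },
)
```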
GoalEnv Memory Architecture
Gymnasium-Robotics GoalEnv observations are dictionaries with `observation`, `achieved_goal`, and `desired_goal` keys. RobotMem stores all three alongside the action and reward, indexing on the goal vectors for efficient similarity search. When you call `recall` with a desired goal, RobotMem returns the closest achieved goals from its database — along with the full observation and action context that led to achieving them.
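Conceptually, the goal-space index behaves like nearest-neighbor search over the stored achieved-goal vectors. The toy stand-in below (plain NumPy, not RobotMem internals) shows the retrieval semantics:

```python
import numpy as np

def recall_nearest(desired_goal, achieved_goals, records, top_k=5):
    """Toy stand-in for goal-space retrieval: rank stored records by the
    Euclidean distance between their achieved goal and the query goal."""
    goals = np.asarray(achieved_goals)                       # shape (N, goal_dim)
    dists = np.linalg.norm(goals - np.asarray(desired_goal), axis=1)
    return [records[i] for i in np.argsort(dists)[:top_k]]
```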
This architecture means your agent always has access to a growing set of positive demonstrations, even at the very start of a new training run. Instead of random exploration from scratch, the agent can bias its early actions toward states where past runs found success. For sparse-reward environments like FetchSlide, where random exploration almost never reaches the goal, this prior knowledge can dramatically reduce the number of episodes needed to learn.
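One simple way to exploit that prior is to occasionally replay a recalled action while the policy is still untrained, annealing the replay probability to zero over a warmup period. The schedule below is a sketch, and the `"action"` field on recalled records is an assumption about the stored format:

```python
import numpy as np

def act(obs, policy, mem, step, warmup=5_000):
    """Early in training, sometimes replay an action that previously led to a
    nearby achieved goal; otherwise defer to the learned policy."""
    prior = mem.recall(obs["desired_goal"], top_k=1)
    p_replay = max(0.0, 1.0 - step / warmup)   # 1.0 at step 0, 0.0 after warmup
    if prior and np.random.rand() < p_replay:
        return prior[0]["action"]              # assumed record field
    return policy.predict(obs, prior_goals=prior)
```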