RobotMem + Stable-Baselines3

A BaseCallback that gives your RL agent persistent memory — save perceptions during training and recall prior experience across runs.

pip install robotmem

Quick Start

import gymnasium_robotics  # registers the Fetch environments with Gymnasium

from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import BaseCallback
from robotmem import RobotMemory

class RobotMemCallback(BaseCallback):
    def __init__(self, db="sb3_memory.db"):
        super().__init__()
        self.mem = RobotMemory(db=db)

    def _on_step(self) -> bool:
        # Fetch envs use Dict observations (dicts of batched arrays);
        # index out env 0. Flat spaces are plain arrays: new_obs[0].
        new_obs = self.locals["new_obs"]
        obs = ({k: v[0] for k, v in new_obs.items()}
               if isinstance(new_obs, dict) else new_obs[0])
        action = self.locals["actions"][0]
        reward = self.locals["rewards"][0]
        self.mem.save_perception(
            observation=obs,
            action=action,
            reward=reward,
            metadata={"step": self.num_timesteps}
        )
        return True

# FetchReach-v3 has a Dict observation space, so SAC needs MultiInputPolicy.
model = SAC("MultiInputPolicy", "FetchReach-v3")
model.learn(50_000, callback=RobotMemCallback())

What This Integration Does

Stable-Baselines3 is one of the most widely used reinforcement learning libraries in the Python ecosystem. It provides reliable, well-tested implementations of PPO, SAC, TD3, A2C, DQN, and other standard algorithms, along with a clean callback system that lets you hook into every step of the training loop. RobotMem integrates through this callback system, so it works with every algorithm SB3 supports without modifying a single line of the core training code.

The RobotMemCallback listens to each environment step during training. It captures the observation, the action selected by the policy, and the reward received, then stores them in a local SQLite database via save_perception. This creates a persistent record of everything the agent has experienced — not just in the current training run, but across all runs that share the same database file. When you restart training tomorrow, or switch to a different algorithm, or fine-tune on a new task, the full history of past experience is still there.
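For example, two runs that point at the same database file share one memory, even across algorithms. A minimal sketch, continuing from the Quick Start above (the database filename is arbitrary):

from stable_baselines3 import SAC, TD3

# Run 1: train SAC, logging every transition to the shared database.
SAC("MultiInputPolicy", "FetchReach-v3").learn(
    50_000, callback=RobotMemCallback(db="fetch_memory.db")
)

# Run 2, perhaps days later with a different algorithm: the callback
# appends to the same database, so TD3 inherits SAC's experience.
TD3("MultiInputPolicy", "FetchReach-v3").learn(
    50_000, callback=RobotMemCallback(db="fetch_memory.db")
)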

On the retrieval side, you can use recall at any point to query this memory. A common pattern is to recall similar past observations at the start of each episode to warm-start the agent's value estimates, or to build a demonstration buffer from high-reward experiences for offline RL pre-training. Because RobotMem uses vector similarity search, the recall is fast enough to run inside the training loop without meaningful overhead.
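As a sketch of the second pattern, the snippet below builds a small demonstration buffer from high-reward memories. The recall signature and the record fields (observation, action, reward) are assumptions about the RobotMem API, not taken from this document; adjust them to the real interface:

import numpy as np
from robotmem import RobotMemory

mem = RobotMemory(db="sb3_memory.db")

# Placeholder query; in practice use the agent's current observation.
current_obs = np.zeros(10)

# Assumed signature: recall(query, top_k) -> records ranked by similarity.
matches = mem.recall(current_obs, top_k=32)

# Keep only high-reward matches as demonstrations for offline pre-training.
demo_buffer = [(m.observation, m.action, m.reward)
               for m in matches if m.reward > 0.0]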

Advanced: Recall During Training

The basic callback stores experiences passively. For more sophisticated use, you can extend the callback to actively recall prior experience and inject it into the agent's observation or replay buffer. For example, when the agent encounters a state similar to one where it previously received high reward, you can boost the priority of that transition in the replay buffer or append the recalled demonstration to the training batch.
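Here is a sketch of the replay-buffer variant, extending the Quick Start callback. The recall call and its record fields (including next_observation) are assumed RobotMem API; the insertion uses SB3's real ReplayBuffer.add signature, and for simplicity assumes a single environment with a flat (non-Dict) observation space:

import numpy as np

class RecallCallback(RobotMemCallback):
    def _on_step(self) -> bool:
        super()._on_step()  # store the current transition as before

        # Look up the single most similar past experience (assumed API).
        matches = self.mem.recall(self.locals["new_obs"][0], top_k=1)
        if matches and matches[0].reward > 0.0:
            m = matches[0]
            # Re-insert the recalled high-reward transition so the agent
            # trains on it again. SB3's ReplayBuffer.add expects arrays
            # batched over n_envs plus one info dict per env.
            self.model.replay_buffer.add(
                obs=np.array([m.observation]),
                next_obs=np.array([m.next_observation]),
                action=np.array([m.action]),
                reward=np.array([m.reward]),
                done=np.array([False]),
                infos=[{}],
            )
        return True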

This pattern is especially powerful for sparse-reward environments. In tasks like FetchPickAndPlace or complex manipulation scenarios, the agent may go thousands of episodes without receiving any positive reward. With RobotMem, you can pre-seed the memory with a small number of successful demonstrations (from human teleoperation, scripted policies, or a previous training run), and the callback will recall these demonstrations whenever the agent reaches a similar state. This gives the exploration process a concrete target to aim for, rather than relying on pure random exploration.
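Pre-seeding is just a loop over save_perception before training starts. A sketch, where demos is a placeholder for your own list of (observation, action, reward) tuples:

from robotmem import RobotMemory

mem = RobotMemory(db="fetch_memory.db")

# demos: (observation, action, reward) tuples from teleoperation,
# a scripted policy, or a previous successful run.
for step, (obs, action, reward) in enumerate(demos):
    mem.save_perception(
        observation=obs,
        action=action,
        reward=reward,
        metadata={"source": "demonstration", "step": step},
    )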

SB3's callback architecture makes this integration point natural. The _on_step method has access to the full local variable scope of the training loop, including the replay buffer, the current policy, and the environment state. This lets RobotMem both read from and write to the training process at every step, making it a deeply integrated memory layer for SB3-based training.

Start Building Robots That Remember

pip install robotmem