RobotMem + Stable-Baselines3
A BaseCallback that gives your RL agent persistent memory — save perceptions during training and recall prior experience across runs.
Quick Start
```python
import gymnasium_robotics  # noqa: F401  # importing registers the Fetch envs

from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import BaseCallback

from robotmem import RobotMemory


class RobotMemCallback(BaseCallback):
    def __init__(self, db="sb3_memory.db"):
        super().__init__()
        self.mem = RobotMemory(db=db)

    def _on_step(self) -> bool:
        obs = self.locals["new_obs"]
        # Goal-based envs like FetchReach use dict observation spaces
        if isinstance(obs, dict):
            obs = {k: v[0] for k, v in obs.items()}
        else:
            obs = obs[0]
        self.mem.save_perception(
            observation=obs,
            action=self.locals["actions"][0],
            reward=self.locals["rewards"][0],
            metadata={"step": self.num_timesteps},
        )
        return True


# FetchReach has a dict observation space, so MultiInputPolicy is required
model = SAC("MultiInputPolicy", "FetchReach-v3")
model.learn(50_000, callback=RobotMemCallback())
```
What This Integration Does
Stable-Baselines3 is one of the most widely used reinforcement learning libraries in the Python ecosystem. It provides reliable, well-tested implementations of PPO, SAC, TD3, A2C, DQN, and other standard algorithms, along with a clean callback system that lets you hook into every step of the training loop. RobotMem integrates through this callback system, which means it works with every algorithm SB3 supports without modifying a single line of the core training code.
The RobotMemCallback listens to each environment step during training. It captures the observation, the action selected by the policy, and the reward received, then stores them in a local SQLite database via save_perception. This creates a persistent record of everything the agent has experienced — not just in the current training run, but across all runs that share the same database file. When you restart training tomorrow, or switch to a different algorithm, or fine-tune on a new task, the full history of past experience is still there.
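The cross-run persistence comes entirely from the SQLite backing store. RobotMem's schema is internal to the library, but the property it relies on can be seen with Python's `sqlite3` module directly: rows written through one connection survive process exit and are visible to the next run that opens the same file. The table and column names below are our own, for illustration only.

```python
import os
import sqlite3
import tempfile

# A throwaway file standing in for sb3_memory.db
db_path = os.path.join(tempfile.mkdtemp(), "memory.db")

# "Run 1": write a transition, then close the connection entirely
con = sqlite3.connect(db_path)
con.execute("CREATE TABLE IF NOT EXISTS perceptions (step INTEGER, reward REAL)")
con.execute("INSERT INTO perceptions VALUES (?, ?)", (1, 0.5))
con.commit()
con.close()

# "Run 2": a fresh connection to the same file still sees the earlier data
con = sqlite3.connect(db_path)
rows = con.execute("SELECT step, reward FROM perceptions").fetchall()
con.close()
print(rows)  # → [(1, 0.5)]
```

Because the durability lives in the file rather than the process, switching algorithms or rebooting the machine between runs changes nothing about what the memory contains.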
On the retrieval side, you can use recall at any point to query this memory. A common pattern is to recall similar past observations at the start of each episode to warm-start the agent's value estimates, or to build a demonstration buffer from high-reward experiences for offline RL pre-training. Because RobotMem uses vector similarity search, the recall is fast enough to run inside the training loop without meaningful overhead.
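The demonstration-buffer pattern can be sketched without the real library. The in-memory class below is a stand-in for `RobotMemory` (the real one persists to SQLite, and its `recall` uses vector similarity rather than this reward sort); `top_by_reward` is a hypothetical query we define here to show the mining step.

```python
# Minimal in-memory stand-in for RobotMemory, to sketch mining
# high-reward transitions into a demonstration buffer
class MemoryStandIn:
    def __init__(self):
        self.records = []

    def save_perception(self, observation, action, reward, metadata=None):
        self.records.append({"observation": observation, "action": action,
                             "reward": reward, "metadata": metadata or {}})

    def top_by_reward(self, k):
        # Hypothetical query: the k highest-reward transitions, e.g. to
        # seed an offline-RL or imitation-learning dataset
        return sorted(self.records, key=lambda r: r["reward"], reverse=True)[:k]


mem = MemoryStandIn()
for step, reward in enumerate([0.1, 0.9, 0.3, 0.7]):
    mem.save_perception(observation=[step], action=[0.0],
                        reward=reward, metadata={"step": step})

demos = mem.top_by_reward(k=2)
print([d["reward"] for d in demos])  # → [0.9, 0.7]
```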
- Drop-in BaseCallback — Add persistent memory to any SB3 algorithm (SAC, PPO, TD3, A2C, DQN) with a single callback argument, no training code changes needed.
- Cross-run experience persistence — All observations, actions, and rewards are stored in SQLite and survive across training runs, machine reboots, and algorithm changes.
- High-reward experience mining — Query the memory for top-performing episodes to build demonstration datasets for imitation learning or offline RL pre-training.
- Step-level granularity — Every environment step is recorded with its timestep index, enabling precise analysis of when and where the agent learned specific behaviors.
- Multi-environment support — Tag memories with environment names to maintain separate experience pools, or merge them for cross-task transfer learning.
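The multi-environment bullet falls out of the `metadata` argument already shown in the Quick Start: tag each perception with its environment name, then filter on that tag at recall time, or skip the filter to merge pools for transfer. A sketch of the tagging and filtering logic, with a plain list standing in for the database:

```python
# Tag each saved perception with its environment name, then filter on
# recall to keep experience pools separate (or omit the filter to merge)
records = []

def save(observation, reward, env_name):
    # mirrors mem.save_perception(..., metadata={"env": env_name})
    records.append({"observation": observation, "reward": reward,
                    "metadata": {"env": env_name}})

save([0.0], 1.0, "FetchReach-v3")
save([1.0], 0.2, "FetchPush-v3")
save([2.0], 0.8, "FetchReach-v3")

reach_only = [r for r in records if r["metadata"]["env"] == "FetchReach-v3"]
print(len(reach_only))  # → 2
```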
Advanced: Recall During Training
The basic callback stores experiences passively. For more sophisticated use, you can extend the callback to actively recall prior experience and inject it into the agent's observation or replay buffer. For example, when the agent encounters a state similar to one where it previously received high reward, you can boost the priority of that transition in the replay buffer or append the recalled demonstration to the training batch.
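The recall step described above relies on vector similarity search. A minimal sketch of that lookup over stored observations, using plain-Python cosine similarity as a stand-in for RobotMem's index — the real `recall` API, and any replay-buffer priority hook built on top of it, are assumptions here, not the library's documented interface:

```python
import math

def cosine(a, b):
    # Cosine similarity between two observation vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Stored (observation, reward) pairs, as the callback would have saved them
memory = [
    ([1.0, 0.0], 0.1),
    ([0.0, 1.0], 0.9),   # a high-reward state
    ([0.7, 0.7], 0.4),
]

def recall_most_similar(query):
    # Nearest stored observation by cosine similarity; in the extended
    # callback, a high recalled reward would be the trigger for boosting
    # the matching transition's priority in the replay buffer
    return max(memory, key=lambda rec: cosine(query, rec[0]))

obs, past_reward = recall_most_similar([0.1, 0.9])
print(past_reward)  # → 0.9
```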
This pattern is especially powerful for sparse-reward environments. In tasks like FetchPickAndPlace or complex manipulation scenarios, the agent may go thousands of episodes without receiving any positive reward. With RobotMem, you can pre-seed the memory with a small number of successful demonstrations (from human teleoperation, scripted policies, or a previous training run), and the callback will recall these demonstrations whenever the agent reaches a similar state. This gives the exploration process a concrete target to aim for, rather than relying on pure random exploration.
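Pre-seeding can happen before `model.learn` is ever called, using the same `save_perception` entry point as the callback. The sketch below uses a hypothetical scripted policy and a `"source": "demo"` tag of our own invention, with a plain list standing in for the RobotMem database:

```python
# Pre-seed memory with scripted-policy transitions before training, so
# recall has successful examples to surface in a sparse-reward task
seeded = []

def scripted_policy(observation):
    # Hypothetical hand-written controller: step straight toward the goal
    goal = [1.0, 1.0]
    return [g - o for o, g in zip(observation, goal)]

obs = [0.0, 0.0]
for step in range(3):
    action = scripted_policy(obs)
    obs = [o + 0.5 * a for o, a in zip(obs, action)]
    # mirrors mem.save_perception(..., metadata={"source": "demo"})
    seeded.append({"observation": obs, "action": action,
                   "metadata": {"source": "demo", "step": step}})

print(len(seeded), seeded[0]["metadata"]["source"])  # → 3 demo
```

Tagging demonstrations with a distinct `source` makes it straightforward to weight or filter them separately from the agent's own experience later on.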
SB3's callback architecture makes this integration point natural. The _on_step method has access to the full local variable scope of the training loop, including the replay buffer, the current policy, and the environment state. This means RobotMem can both read from and write to the training process at every step, making it an unusually deeply integrated memory layer for SB3-based training.