5 Senses, One Database

Unifying Multi-Modal Robot Perception in a Single Memory
March 2026 · 4 min read

Visual. Tactile. Auditory. Proprioceptive. Procedural.

Five types of perception. One API. One database.

Most robot systems store different perception types in different places — images in one folder, force readings in a CSV, audio clips in WAV files, joint angles in ROS bags, action sequences in JSON logs. When you need to recall "what happened when I grasped that cup," you're querying five different systems.

We put them all in one table.

The Problem: Fragmented Perception

A typical robot manipulation pipeline generates:

- Visual: camera frames and object detections, dumped to an image folder
- Tactile: force/torque readings, logged to CSV
- Auditory: contact sounds and event clips
- Proprioceptive: joint angles, recorded in ROS bags
- Procedural: action sequences, written to JSON logs

Each modality typically gets its own storage system, its own query interface, and its own retrieval logic. Cross-modal queries — "find the force reading from the same grasp where I saw the red cup" — require manual joins across systems.

This is fragmentation. And it gets worse as you add sensors.

The Solution: One API for All Senses

In robotmem, every perception goes through the same function:

from robotmem import save_perception

# Visual — saw a red cup
save_perception("saw red cup at [120, 200]",
    perception_type="visual",
    data='{"bbox": [120, 200, 50, 50], "confidence": 0.94}')

# Tactile — felt contact force
save_perception("felt 12.5N contact force",
    perception_type="tactile",
    data='{"force_N": 12.5, "contact_area_mm2": 45}')

# Auditory — heard a click
save_perception("heard click during insertion",
    perception_type="auditory",
    data='{"frequency_hz": 440, "duration_ms": 12}')

# Proprioceptive — arm position
save_perception("arm at joint angles [0.1, 0.8, -0.3, 1.2, 0.0, -0.5, 0.2]",
    perception_type="proprioceptive",
    data='{"joint_angles": [0.1, 0.8, -0.3, 1.2, 0.0, -0.5, 0.2]}')

# Procedural — action sequence
save_perception("push then lift: approach → push → lift",
    perception_type="procedural",
    data='{"steps": ["approach", "push", "lift"], "duration_s": 3.2}')

All five go into the same memories table. Same schema. Same search index. Same API.

Why One Table Works

The key design decision: separate the description from the data.

Field           | Purpose                | Searchable
--------------- | ---------------------- | ------------------------------
description     | Human-readable text    | BM25 + Vector
perception_type | Modality label         | Filter
data            | Raw sensor JSON        | JSON path query
context         | Task/spatial metadata  | Context filter + Spatial sort

The description is always text — searchable by keywords and semantic similarity. The data field holds the raw perception in JSON format, typed by perception_type. This means you can search across modalities semantically ("what happened during grasping?") and then access the type-specific data from the results.

No schema changes needed for new sensor types. Add a LiDAR? Just use perception_type="lidar". The database doesn't care — it's a new label, not a new table.
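robotmem's storage engine isn't shown here, but the one-table idea is easy to sketch with SQLite — the table and column names below are illustrative, not robotmem's actual schema. Note that the lidar row needs no DDL change; it's just another label.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE memories (
        id INTEGER PRIMARY KEY,
        description TEXT NOT NULL,       -- human-readable, searchable text
        perception_type TEXT NOT NULL,   -- modality label
        data TEXT NOT NULL               -- raw sensor reading as JSON
    )
""")

rows = [
    ("saw red cup at [120, 200]", "visual", json.dumps({"bbox": [120, 200, 50, 50]})),
    ("felt 12.5N contact force", "tactile", json.dumps({"force_N": 12.5})),
    # A new sensor is a new label, not a new table:
    ("lidar sweep near workbench", "lidar", json.dumps({"points": 4096})),
]
conn.executemany(
    "INSERT INTO memories (description, perception_type, data) VALUES (?, ?, ?)", rows
)

# One query spans every modality; data stays typed via perception_type.
for desc, ptype, data in conn.execute(
    "SELECT description, perception_type, data FROM memories"
):
    print(f"[{ptype}] {desc} -> {json.loads(data)}")
```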

Cross-Modal Recall

Because all perceptions share the same search index, you can query across modalities naturally:

from robotmem import recall

# Find all perceptions related to "grasp" — visual, tactile, procedural
result = recall("grasp red cup")

for m in result["memories"]:
    print(f"[{m['perception_type']}] {m['content']}")

# Output:
# [visual]       saw red cup at [120, 200]
# [tactile]      felt 12.5N contact force
# [procedural]   push then lift: approach → push → lift

One query returns the visual detection, the force reading, and the action sequence — all from the same grasp event. No joins. No cross-system queries.
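Once a single query returns mixed modalities, downstream code can regroup them however it likes. A small sketch, stubbing recall()'s result with the shape shown above (the exact return structure is an assumption here):

```python
from collections import defaultdict

# Stubbed recall() result, shaped like the example output above.
result = {
    "memories": [
        {"perception_type": "visual", "content": "saw red cup at [120, 200]"},
        {"perception_type": "tactile", "content": "felt 12.5N contact force"},
        {"perception_type": "procedural", "content": "push then lift: approach -> push -> lift"},
    ]
}

# Group one cross-modal query's hits by modality for downstream consumers.
by_modality = defaultdict(list)
for m in result["memories"]:
    by_modality[m["perception_type"]].append(m["content"])

for ptype, contents in sorted(by_modality.items()):
    print(ptype, contents)
```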

Try It

pip install robotmem

python -c "
from robotmem import save_perception, recall
save_perception('felt 12.5N force', perception_type='tactile', data='{\"force\": 12.5}')
save_perception('saw red cup', perception_type='visual', data='{\"bbox\": [100,200,50,50]}')
result = recall('grasp')
for m in result['memories']:
    print(f\"[{m['perception_type']}] {m['content']}\")
"

All Senses, One Memory

Visual, tactile, auditory, proprioceptive, procedural — unified.

pip install robotmem