Search Architecture
robotmem's recall tool uses a hybrid search pipeline combining BM25 full-text search with vector similarity search, merged through Reciprocal Rank Fusion (RRF).
Pipeline Overview
recall(query="how to grasp a cup", n=5, min_confidence=0.3)
│
├── L1: Input Validation
│ query non-empty, top_k clamped to [1, 100]
│ fetch_mul = 4× if filters active, else 2×
│
┌────┴────────────────────────────┐
│ │
▼ ▼
BM25 Search Vector Search
(FTS5 engine) (vec0 engine)
│ │
├── tokenize_for_fts5() ├── embedder.embed_one(query)
│ CJK → jieba segmentation │ → float[384] or float[768]
│ Remove FTS5 operators │
│ Filter 1-char non-CJK ├── vec_search_memories()
│ │ WHERE embedding MATCH blob
├── fts_search_memories() │ AND k = fetch_limit
│ JOIN memories ON rowid │ ORDER BY cosine distance
│ WHERE collection + active │
│ ORDER BY bm25() score │
│ │
▼ ▼
ranked list A ranked list B
│ │
└────────────┬───────────────────┘
│
┌──────▼───────┐
│ RRF Merge │
│ k=60 │
└──────┬───────┘
│
┌──────▼───────────────┐
│ Source Weighting │
│ real × 1.5 │
└──────┬───────────────┘
│
┌──────▼───────────────┐
│ Confidence Filter │
│ >= min_confidence │
└──────┬───────────────┘
│
┌──────▼───────────────┐
│ Session Filter │
│ session_id match │
└──────┬───────────────┘
│
┌──────▼───────────────┐
│ Context Extraction │
│ params/spatial/ │
│ robot/task │
└──────┬───────────────┘
│
┌──────▼───────────────┐
│ context_filter │
│ (structured) │
└──────┬───────────────┘
│
┌──────▼───────────────┐
│ spatial_sort │
│ (nearest neighbor) │
└──────┬───────────────┘
│
┌──────▼───────────────┐
│ Top-K Truncation │
└──────┬───────────────┘
│
┌──────▼───────────────┐
│ MaxScore Normalize │
│ best = 1.0 │
└──────┬───────────────┘
│
┌──────▼───────────────┐
│ Touch (L3) │
│ access_count++ │
└──────────────────────┘
BM25 Search (FTS5)
BM25 is a term-frequency ranking algorithm built into SQLite's FTS5 extension. robotmem uses it for keyword-level matching.
Tokenization
Before querying FTS5, the query text goes through tokenize_for_fts5():
- CJK Detection: If the text contains CJK characters (\u4e00-\u9fff), jieba segmentation is applied
- Syntax Cleaning: Remove FTS5 operators (AND, OR, NOT, NEAR) and special characters
- Token Filtering: Single-character non-CJK tokens are removed (too noisy)
- Query Construction: Remaining tokens are joined with OR and quoted: "token1" OR "token2"
# Example: "how to grasp a cup"
# → tokens: ["how", "to", "grasp", "cup"]
# → FTS5: '"how" OR "to" OR "grasp" OR "cup"'
# Example: "如何抓取杯子"
# → jieba: ["如何", "抓取", "杯子"]
# → FTS5: '"如何" OR "抓取" OR "杯子"'
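The cleaning steps above can be sketched for the non-CJK path. This is a simplified illustration, not the actual implementation: jieba segmentation is omitted, and the operator set and word-character regex are assumptions.

```python
import re

# Illustrative subset of FTS5 query operators to strip (not the full list)
FTS5_OPERATORS = {"AND", "OR", "NOT", "NEAR"}

def tokenize_for_fts5(text):
    """Simplified non-CJK path: strip operators, drop 1-char tokens, OR-join."""
    tokens = re.findall(r"\w+", text)  # word characters only; punctuation separates
    kept = [t for t in tokens if t.upper() not in FTS5_OPERATORS and len(t) > 1]
    # Quote each surviving token and join with OR
    return " OR ".join(f'"{t}"' for t in kept)

# "how to grasp a cup" → '"how" OR "to" OR "grasp" OR "cup"' ("a" dropped as 1-char)
```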
FTS5 Query
SELECT m.id, m.content, ..., bm25(memories_fts) as bm25_score
FROM memories_fts
JOIN memories m ON m.id = memories_fts.rowid
WHERE memories_fts MATCH ?
AND m.collection = ?
AND m.status = 'active'
ORDER BY bm25(memories_fts)
LIMIT ?
The bm25() function returns negative scores (lower = more relevant), and results are ordered by this score.
FTS5 Indexed Fields
| Field | Source | Purpose |
|---|---|---|
| content | Memory text (jieba tokenized) | Primary search target |
| human_summary | Short summary (jieba tokenized) | Additional search surface |
| scope_files | File paths JSON | Code file matching |
| scope_entities | Entity names JSON | Symbol matching |
Vector Search (vec0)
Vector search uses sqlite-vec's vec0 virtual table for approximate nearest neighbor (ANN) search based on embedding similarity.
Embedding Flow
query text
│
▼
embedder.embed_one(query)
│
├── ONNX: FastEmbed local inference (~5ms)
│ → float[384] (BAAI/bge-small-en-v1.5)
│
└── Ollama: HTTP API call (~20-50ms)
→ float[768] (nomic-embed-text)
│
▼
floats_to_blob()
│
▼
struct.pack('f' * dim, *embedding)
→ bytes (raw float32 array)
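As a runnable sketch, the packing step mirrors the struct.pack call shown above; the function name floats_to_blob matches the flow diagram, but the exact signature is an assumption:

```python
import struct

def floats_to_blob(embedding):
    """Pack a list of floats into a raw float32 byte array for the vec0 MATCH blob."""
    # Equivalent to struct.pack('f' * dim, *embedding): one 4-byte float32 per value
    return struct.pack(f"{len(embedding)}f", *embedding)

blob = floats_to_blob([0.1, 0.2, 0.3])
# 3 floats × 4 bytes = 12-byte blob
```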
vec0 Query
SELECT m.id, m.content, ..., v.distance
FROM memories_vec v
JOIN memories m ON m.id = v.rowid
WHERE v.embedding MATCH ? -- blob of float32 array
AND m.collection = ?
AND m.status = 'active'
AND k = ? -- KNN parameter
The distance field represents cosine distance (0 = identical, 2 = opposite).
RRF Merge
Reciprocal Rank Fusion combines multiple ranked lists into a single ranking without requiring score normalization.
Formula
score(document) = Σ 1 / (k + rank_i + 1)
Where:
- k = 60 (configurable via rrf_k in config)
- rank_i = 0-based position in each list
- Sum is over all lists that contain the document
Example
| Document | BM25 Rank | Vec Rank | RRF Score |
|---|---|---|---|
| doc_A | 0 | 2 | 1/61 + 1/63 = 0.032 |
| doc_B | 1 | 0 | 1/62 + 1/61 = 0.033 |
| doc_C | 2 | — | 1/63 = 0.016 |
| doc_D | — | 1 | 1/62 = 0.016 |
doc_B ranks highest because it appears in both lists with good positions.
Why RRF?
- No score normalization needed: BM25 scores and cosine distances have different scales
- Rewards agreement: Documents found by both engines rank higher
- Robust: Higher k gives more weight to multi-list presence vs. individual rank
- Simple: No hyperparameters to tune beyond k
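The merge itself fits in a few lines. This self-contained sketch reproduces the hypothetical doc_A…doc_D example from the table above:

```python
def rrf_merge(ranked_lists, k=60):
    """score(doc) = Σ 1 / (k + rank + 1) over every list containing doc."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked):  # rank is 0-based
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return scores

bm25 = ["doc_A", "doc_B", "doc_C"]  # ranked list A
vec = ["doc_B", "doc_D", "doc_A"]   # ranked list B
scores = rrf_merge([bm25, vec])
# doc_B = 1/62 + 1/61 ≈ 0.033 — highest, because it is near the top of both lists
```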
Search Modes
The pipeline automatically selects the best mode based on available results:
| Mode | Condition | Behavior |
|---|---|---|
| hybrid | Both BM25 and Vec return results | RRF merge of both lists |
| bm25_only | Embedder unavailable or Vec returns empty | BM25 results with synthetic RRF scores |
| vec_only | BM25 returns empty (rare) | Vec results with synthetic RRF scores |
When running in degraded mode (bm25_only), synthetic RRF scores are assigned:
score = 1.0 / (60 + rank + 1) # Same formula, single list
Source Weighting
Memories from real-world data receive a 1.5x boost over simulation data:
# Detection: context.env.sim_or_real == "real"
if parsed.get("env", {}).get("sim_or_real") == "real":
m["_rrf_score"] = m["_rrf_score"] * 1.5
After weighting, results are re-sorted by _rrf_score descending.
Post-Merge Filters
Confidence Filter
if m.get("confidence", 0) < min_confidence:
continue # Skip low-confidence memories
Default min_confidence = 0.3. Memories that have decayed below this threshold are filtered out.
Session Filter
if session_id and m.get("session_id") != session_id:
continue # Only return memories from this episode
Used for episode replay: recall(query="*", session_id="abc-123").
Context Filter (Structured)
Dot-path matching on parsed context JSON:
# Equality
{"task.success": True}
# Range operators
{"params.force.value": {"$lt": 15.0}}
{"params.force.value": {"$gte": 10.0, "$lte": 20.0}}
# Combined (AND logic)
{"task.success": True, "robot.type": "UR5e"}
Implementation uses _resolve_dotpath() to traverse nested dicts:
# "task.success" → d["task"]["success"]
def _resolve_dotpath(d, path):
current = d
for key in path.split("."):
if key not in current:
return _MISSING
current = current[key]
return current
Supported operators:
| Operator | Meaning |
|---|---|
| (none) | Exact equality |
| $lt | Less than |
| $lte | Less than or equal |
| $gt | Greater than |
| $gte | Greater than or equal |
| $ne | Not equal |
Type mismatches return False (no crash).
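Putting the dot-path resolver and the operator table together, a matcher consistent with the semantics above could look like this. It is a sketch: matches_context_filter is an illustrative name, and the resolver is repeated here so the snippet is self-contained.

```python
_MISSING = object()  # sentinel for absent dot-paths

_OPS = {
    "$lt":  lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
    "$gt":  lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$ne":  lambda a, b: a != b,
}

def _resolve_dotpath(d, path):
    current = d
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current

def matches_context_filter(context, flt):
    """AND over all dot-path conditions; type mismatches return False, not crash."""
    for path, cond in flt.items():
        value = _resolve_dotpath(context, path)
        if value is _MISSING:
            return False
        if isinstance(cond, dict):  # operator form, e.g. {"$lt": 15.0}
            for op, bound in cond.items():
                try:
                    if op not in _OPS or not _OPS[op](value, bound):
                        return False
                except TypeError:  # e.g. comparing str to float
                    return False
        elif value != cond:  # plain equality
            return False
    return True
```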
Spatial Sort (Nearest Neighbor)
Euclidean distance sorting on coordinate arrays:
spatial_sort = {
"field": "spatial.object_position",
"target": [1.3, 0.7, 0.42],
"max_distance": 0.1 # optional cutoff
}
Distance formula:
distance = sqrt(Σ (actual_i - target_i)²)
- Dimension mismatch → inf distance (filtered out)
- Missing field → inf distance (filtered out)
- max_distance cutoff applied after distance computation
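A sketch of the sort under the rules above; the signature and the assumption that coordinates live under each memory's parsed "context" dict are illustrative:

```python
import math

def spatial_sort(memories, field, target, max_distance=None):
    """Sort memories by Euclidean distance from target; unresolvable entries get inf."""
    def dist(m):
        node = m.get("context", {})
        for key in field.split("."):
            if not isinstance(node, dict) or key not in node:
                return math.inf  # missing field → inf
            node = node[key]
        if not isinstance(node, (list, tuple)) or len(node) != len(target):
            return math.inf  # dimension mismatch → inf
        return math.sqrt(sum((a - t) ** 2 for a, t in zip(node, target)))

    scored = [(dist(m), m) for m in memories]
    scored = [(d, m) for d, m in scored if d != math.inf]  # inf entries filtered out
    if max_distance is not None:
        scored = [(d, m) for d, m in scored if d <= max_distance]  # cutoff after distances
    return [m for _, m in sorted(scored, key=lambda dm: dm[0])]
```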
MaxScore Normalization
After all filters, the top result's _rrf_score is set to 1.0, and all other scores are normalized relative to it:
max_score = filtered[0]["_rrf_score"]
for m in filtered:
m["_rrf_score"] = m["_rrf_score"] / max_score
This ensures the best result always has a score of 1.0, making scores comparable across different queries.
Access Counting (L3)
After returning results, the pipeline updates access statistics for all returned memories:
batch_touch_memories(db.conn, hit_ids)
# Each memory: access_count++, return_count++, last_accessed = now
This serves two purposes:
1. Time decay protection: Frequently recalled memories maintain higher confidence
2. Analytics: Track which memories are most useful
Fetch Multiplier
When structured filters or spatial sorting are active, the pipeline fetches more candidates to compensate for filter losses:
| Scenario | Fetch Limit |
|---|---|
| No filters | top_k × 2 |
| context_filter or spatial_sort active | top_k × 4 |
This ensures enough candidates survive filtering to fill the requested top_k results.
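As a sketch, combining the L1 clamp from the pipeline overview with the multiplier table (fetch_limit is an illustrative name):

```python
def fetch_limit(top_k, has_filters):
    """Over-fetch candidates so post-merge filters can still fill top_k results."""
    top_k = max(1, min(top_k, 100))  # L1 validation: top_k clamped to [1, 100]
    return top_k * (4 if has_filters else 2)
```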
Three-Layer Defense
The search pipeline follows robotmem's consistent defense pattern:
| Layer | Phase | Mechanism |
|---|---|---|
| L1 | Before | Query non-empty check, top_k clamping, FTS5 syntax cleaning |
| L2 | During | try-except around BM25 and Vec searches independently |
| L3 | After | batch_touch access counters, logging, structured response |
Each search engine failing independently doesn't crash the other — the pipeline degrades gracefully.