Status: Implemented (Core), Planned (Graph-Based Hybrid)
Updated: 2026-01-27
Local embeddings + lightweight sidecar (GPT-5.3 Codex Spark) are implemented and running in production. This document describes both the current implementation and the planned graph-based hybrid architecture.
See also: Memory Regression Budget for the current measurable guardrails and review expectations.
A multi-layered memory system for cross-session learning that mimics how human memory works - relevant memories "pop up" when triggered by context rather than requiring explicit recall.
Key Design Decisions:
- Fully async and non-blocking - The main agent never waits for memory; results from turn N are available at turn N+1
- Graph-based organization - Memories form a connected graph with tags, clusters, and semantic links
- Cascade retrieval - Embedding hits trigger BFS traversal to find related memories
- Hybrid grouping - Combines explicit tags, automatic clusters, and semantic links
```mermaid
graph TB
subgraph "Main Agent"
MA[TUI App]
MP[build_memory_prompt]
TP[take_pending_memory]
end
subgraph "Memory Agent"
CH[Context Handler]
EMB[Embedder<br/>all-MiniLM-L6-v2]
SR[Similarity Search]
CR[Cascade Retrieval]
HC[Sidecar<br/>GPT-5.3 Codex Spark]
end
subgraph "Memory Graph"
MG[(petgraph<br/>DiGraph)]
MS[Memory Nodes]
TN[Tag Nodes]
CN[Cluster Nodes]
end
MA -->|mpsc channel| CH
CH --> EMB
EMB --> SR
SR -->|initial hits| CR
CR -->|BFS traversal| MG
MG --> MS
MG --> TN
MG --> CN
CR -->|candidates| HC
HC -->|verified| TP
TP -->|next turn| MA
```
```mermaid
graph LR
subgraph "Node Types"
M((Memory))
T[Tag]
C{Cluster}
end
M -->|HasTag| T
M -->|InCluster| C
M -.->|RelatesTo| M
M ==>|Supersedes| M
M -.->|Contradicts| M
style M fill:#e1f5fe
style T fill:#fff3e0
style C fill:#f3e5f5
```
| Node Type | Description | Storage |
|---|---|---|
| Memory | Core memory entry (fact, preference, procedure) | Content, metadata, embedding |
| Tag | Explicit label (user-defined or inferred) | Name, description, count |
| Cluster | Automatic grouping via embedding similarity | Centroid embedding, member count |
| Edge Type | From → To | Description |
|---|---|---|
| `HasTag` | Memory → Tag | Memory has this explicit tag |
| `InCluster` | Memory → Cluster | Memory belongs to auto-discovered cluster |
| `RelatesTo` | Memory → Memory | Semantic relationship (weighted) |
| `Supersedes` | Memory → Memory | Newer memory replaces older |
| `Contradicts` | Memory → Memory | Conflicting information |
| `DerivedFrom` | Memory → Memory | Procedural knowledge derived from facts |
```rust
use petgraph::graph::{DiGraph, NodeIndex};
use petgraph::visit::EdgeRef; // for edge.target() / edge.weight() in traversal
use std::collections::HashMap;

/// Node in the memory graph
#[derive(Debug, Clone)]
pub enum MemoryNode {
    Memory(MemoryEntry),
    Tag(TagEntry),
    Cluster(ClusterEntry),
}

/// Edge relationships
#[derive(Debug, Clone)]
pub enum EdgeKind {
    HasTag,
    InCluster,
    RelatesTo { weight: f32 },
    Supersedes,
    Contradicts,
    DerivedFrom,
}

/// The memory graph
pub struct MemoryGraph {
    graph: DiGraph<MemoryNode, EdgeKind>,
    // Indexes for fast lookup
    memory_index: HashMap<String, NodeIndex>,
    tag_index: HashMap<String, NodeIndex>,
    cluster_index: HashMap<String, NodeIndex>,
}
```

The memory system uses three complementary organization methods:
```mermaid
graph TB
subgraph "Explicit: Tags"
T1["rust"]
T2["auth-system"]
T3["user-preference"]
end
subgraph "Automatic: Clusters"
C1[("Error Handling<br/>Cluster")]
C2[("API Patterns<br/>Cluster")]
end
subgraph "Semantic: Links"
L1["relates_to"]
L2["supersedes"]
L3["contradicts"]
end
M1((Memory 1)) --> T1
M1 --> C1
M1 -.-> L1
L1 -.-> M2((Memory 2))
M2 --> T1
M2 --> C2
M3((Memory 3)) --> T2
M3 ==> L2
L2 ==> M4((Memory 4))
```
User-defined or automatically inferred labels.
Sources:
- User explicitly tags: `memory { action: "remember", tags: ["rust", "auth"] }`
- Inferred from context (file paths, topics, entities)
- Extracted by sidecar during end-of-session processing
Examples:
- `#project:jcode` - Project-specific
- `#rust`, `#python` - Language-specific
- `#auth`, `#database` - Domain-specific
- `#preference`, `#correction` - Category tags
Automatically discovered groupings based on embedding similarity.
Algorithm:
- Periodically run HDBSCAN on memory embeddings
- Create/update cluster nodes for dense regions
- Assign `InCluster` edges to nearby memories
- Track cluster centroids for fast lookup (the assignment step is sketched below)
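A minimal sketch of that assignment step, assuming a hypothetical `Centroid` record (the clustering pass itself would come from an HDBSCAN implementation):

```rust
/// Hypothetical centroid record; field names are illustrative.
pub struct Centroid {
    pub cluster_id: String,
    pub embedding: Vec<f32>,
}

/// Return the cluster whose centroid is most similar to `embedding`,
/// if any similarity clears `threshold`.
pub fn nearest_cluster<'a>(
    embedding: &[f32],
    centroids: &'a [Centroid],
    threshold: f32,
) -> Option<&'a Centroid> {
    centroids
        .iter()
        .map(|c| (c, cosine_similarity(embedding, &c.embedding)))
        .filter(|(_, sim)| *sim >= threshold)
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(std::cmp::Ordering::Equal))
        .map(|(c, _)| c)
}

fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}
```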
Benefits:
- Discovers hidden patterns user didn't explicitly tag
- Groups related memories even without shared tags
- Enables "find similar" queries
Explicit relationships between memories.
Types:
- RelatesTo: General semantic connection (weighted 0.0-1.0)
- Supersedes: Newer information replaces older
- Contradicts: Conflicting information (both kept, flagged)
- DerivedFrom: Procedural knowledge derived from facts
Discovery:
- Contradiction detection on write (see the sketch after this list)
- Sidecar identifies relationships during verification
- User can explicitly link memories
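A rough sketch of how a write-time verdict could map onto graph edges (the `WriteVerdict` enum and the `reinforce`/`add_memory_edge` helpers are hypothetical, not the shipped API):

```rust
/// Hypothetical verdict produced when a new memory is written.
pub enum WriteVerdict {
    Duplicate { existing_id: String },
    Contradicts { existing_id: String },
    Supersedes { existing_id: String },
    Unrelated,
}

/// Turn a write-time verdict into graph updates (assumed helper methods).
pub fn apply_verdict(graph: &mut MemoryGraph, new_id: &str, verdict: WriteVerdict) {
    match verdict {
        // Reinforce the existing memory instead of inserting a duplicate
        WriteVerdict::Duplicate { existing_id } => {
            graph.reinforce(&existing_id);
        }
        // Keep both memories, but flag the conflict for later resolution
        WriteVerdict::Contradicts { existing_id } => {
            graph.add_memory_edge(new_id, &existing_id, EdgeKind::Contradicts);
        }
        // Newer information replaces older
        WriteVerdict::Supersedes { existing_id } => {
            graph.add_memory_edge(new_id, &existing_id, EdgeKind::Supersedes);
        }
        WriteVerdict::Unrelated => {}
    }
}
```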
When context triggers memory search, cascade retrieval finds related memories through graph traversal.
```mermaid
sequenceDiagram
participant C as Context
participant E as Embedder
participant S as Similarity Search
participant G as Graph BFS
participant H as Sidecar (Codex Spark)
participant R as Results
C->>E: Current context
E->>S: Context embedding
S->>S: Find top-k similar memories
S->>G: Initial hits (seed nodes)
loop BFS Traversal depth 2
G->>G: Follow HasTag edges
G->>G: Follow InCluster edges
G->>G: Follow RelatesTo edges
end
G->>H: Candidate memories
H->>H: Verify relevance to context
H->>R: Filtered, ranked memories
```
```rust
pub fn cascade_retrieve(
    &self,
    context_embedding: &[f32],
    max_depth: usize,
    max_results: usize,
) -> Vec<(MemoryEntry, f32)> {
    use std::collections::{HashSet, VecDeque};

    // Step 1: Embedding similarity search
    let initial_hits = self.similarity_search(context_embedding, 10);

    // Step 2: BFS traversal from hits
    let mut visited: HashSet<NodeIndex> = HashSet::new();
    let mut candidates: Vec<(NodeIndex, f32, usize)> = Vec::new();
    let mut queue: VecDeque<(NodeIndex, usize)> = VecDeque::new();
    for (node, score) in initial_hits {
        queue.push_back((node, 0));
        candidates.push((node, score, 0));
    }
    while let Some((node, depth)) = queue.pop_front() {
        if depth >= max_depth || visited.contains(&node) {
            continue;
        }
        visited.insert(node);
        // Traverse edges
        for edge in self.graph.edges(node) {
            let neighbor = edge.target();
            if visited.contains(&neighbor) {
                continue;
            }
            let edge_weight = match edge.weight() {
                EdgeKind::HasTag => 0.8,     // Strong signal
                EdgeKind::InCluster => 0.6,  // Medium signal
                EdgeKind::RelatesTo { weight } => *weight,
                EdgeKind::Supersedes => 0.9, // Very relevant
                _ => 0.3,
            };
            // Decay score by depth
            let decayed_score = edge_weight * (0.7_f32).powi(depth as i32 + 1);
            if let MemoryNode::Memory(_) = &self.graph[neighbor] {
                candidates.push((neighbor, decayed_score, depth + 1));
            }
            queue.push_back((neighbor, depth + 1));
        }
    }

    // Step 3: Sort by score, dedupe (keep the best score per node), return top results
    candidates.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    let mut seen: HashSet<NodeIndex> = HashSet::new();
    candidates
        .into_iter()
        .filter(|(node, _, _)| seen.insert(*node))
        .filter_map(|(node, score, _)| {
            if let MemoryNode::Memory(entry) = &self.graph[node] {
                Some((entry.clone(), score))
            } else {
                None
            }
        })
        .take(max_results)
        .collect()
}
```

| Parameter | Default | Description |
|---|---|---|
| `similarity_threshold` | 0.4 | Minimum embedding similarity for initial hits |
| `max_initial_hits` | 10 | Number of embedding search results |
| `max_depth` | 2 | BFS traversal depth limit |
| `max_results` | 10 | Final results to return |
| `edge_decay` | 0.7 | Score decay per traversal step |
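These knobs would fit naturally in a config struct; a sketch with the defaults from the table (the struct name is an assumption, not the shipped API):

```rust
/// Retrieval tuning knobs; defaults mirror the table above.
pub struct CascadeConfig {
    pub similarity_threshold: f32,
    pub max_initial_hits: usize,
    pub max_depth: usize,
    pub max_results: usize,
    pub edge_decay: f32,
}

impl Default for CascadeConfig {
    fn default() -> Self {
        Self {
            similarity_threshold: 0.4,
            max_initial_hits: 10,
            max_depth: 2,
            max_results: 10,
            edge_decay: 0.7,
        }
    }
}
```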
```rust
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MemoryEntry {
    // Identity
    pub id: String,
    pub content: String,
    pub category: MemoryCategory,
    // Classification
    pub memory_type: MemoryType, // Fact, Preference, Procedure, Correction
    pub scope: MemoryScope,      // Global, Project, Session
    // Source tracking
    pub session_id: Option<String>,
    pub message_range: Option<(u32, u32)>,
    pub file_paths: Vec<String>,
    pub provenance: Provenance, // UserStated, Observed, Inferred
    // Lifecycle
    pub created_at: DateTime<Utc>,
    pub updated_at: DateTime<Utc>,
    pub last_accessed: DateTime<Utc>,
    pub access_count: u32,
    pub strength: u32, // Consolidation count
    // Trust & status
    pub confidence: f32,  // 0.0-1.0, decays over time
    pub trust_score: f32, // Source-based trust
    pub active: bool,
    pub superseded_by: Option<String>,
    // Embedding
    pub embedding: Option<Vec<f32>>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum MemoryType {
    Fact,       // "This project uses PostgreSQL"
    Preference, // "User prefers 4-space indentation"
    Procedure,  // "To deploy: run make deploy"
    Correction, // "Don't use deprecated API"
    Negative,   // "Never commit .env files"
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum Provenance {
    UserStated,    // User explicitly said it
    UserCorrected, // User corrected agent behavior
    Observed,      // Agent observed from behavior
    Inferred,      // Agent inferred from context
    Extracted,     // Extracted from session summary
}
```

Memories have temporal context:
```rust
pub struct TemporalContext {
    pub session_scope: bool,      // Only relevant in session
    pub recency_weight: f32,      // Recent access boost
    pub seasonal: Option<String>, // "end-of-sprint", "release-week"
}
```

Recency boost formula:
```
boost = 1.0 + (0.5 * e^(-hours_since_access / 24))
```
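As a function, this is a one-liner (a sketch; the function name is illustrative):

```rust
/// Recency boost: 1.5x immediately after access, decaying toward 1.0x
/// with a 24-hour time constant.
fn recency_boost(hours_since_access: f32) -> f32 {
    1.0 + 0.5 * (-hours_since_access / 24.0).exp()
}
```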
Confidence decays over time based on memory type:
| Memory Type | Half-life | Rationale |
|---|---|---|
| Correction | 365 days | User corrections are high value |
| Preference | 90 days | Preferences may evolve |
| Fact | 30 days | Codebase facts can become stale |
| Procedure | 60 days | Procedures change less often |
| Inferred | 7 days | Low-confidence inferences |
Decay formula:
```
confidence = initial_confidence * e^(-age_days / half_life)
           * (1 + 0.1 * log(access_count + 1))
           * trust_weight
```
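A direct translation (a sketch; the log base is assumed natural, and note that as written `half_life` behaves as an e-folding time, dropping confidence to roughly 37% rather than 50% at one half-life):

```rust
/// Decayed confidence per the formula above. A strict half-life would
/// multiply the exponent by ln(2); this follows the formula as written.
fn decayed_confidence(
    initial_confidence: f32,
    age_days: f32,
    half_life_days: f32,
    access_count: u32,
    trust_weight: f32,
) -> f32 {
    initial_confidence
        * (-age_days / half_life_days).exp()
        * (1.0 + 0.1 * ((access_count + 1) as f32).ln())
        * trust_weight
}
```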
Things the agent should avoid doing:
```rust
MemoryEntry {
    content: "Never use println! for logging in production code",
    memory_type: MemoryType::Negative,
    trigger_patterns: vec!["println!", "print!", "dbg!"],
    ...
}
```

Surfacing: Negative memories are surfaced when their trigger patterns match the current context.
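A minimal sketch of that check, assuming a `trigger_patterns: Vec<String>` field as in the example above and simple substring matching (a real implementation might use regexes or token boundaries):

```rust
/// Return negative memories whose trigger patterns occur in the context.
fn triggered_negatives<'a>(
    memories: &'a [MemoryEntry],
    context: &str,
) -> Vec<&'a MemoryEntry> {
    memories
        .iter()
        .filter(|m| matches!(m.memory_type, MemoryType::Negative))
        .filter(|m| m.trigger_patterns.iter().any(|p| context.contains(p.as_str())))
        .collect()
}
```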
How-to knowledge with structured steps:
```rust
pub struct Procedure {
    pub name: String,
    pub trigger: String, // "deploy to production"
    pub steps: Vec<String>,
    pub prerequisites: Vec<String>,
    pub warnings: Vec<String>,
}
```

Every memory tracks its source:
```rust
pub struct ProvenanceChain {
    pub source: Provenance,
    pub session_id: String,
    pub timestamp: DateTime<Utc>,
    pub context_snippet: String,   // What was being discussed
    pub confidence_reason: String, // Why this confidence level
}
```

Memories strengthen or weaken based on use:
```rust
impl MemoryEntry {
    pub fn on_used(&mut self, helpful: bool) {
        self.access_count += 1;
        self.last_accessed = Utc::now();
        if helpful {
            self.strength = self.strength.saturating_add(1);
            self.confidence = (self.confidence + 0.05).min(1.0);
        } else {
            self.confidence = (self.confidence - 0.1).max(0.0);
        }
    }
}
```

After serving memories to the main agent, the memory agent has valuable context it can use for background maintenance. This "opportunistic maintenance" happens asynchronously without blocking.
```mermaid
graph LR
subgraph "Retrieval Phase"
R1[Context Embedding]
R2[Similarity Search]
R3[Cascade BFS]
R4[Sidecar Verify]
R5[Serve to Agent]
end
subgraph "Maintenance Phase (Background)"
M1[Link Discovery]
M2[Cluster Update]
M3[Confidence Boost]
M4[Gap Detection]
end
R5 --> M1
R5 --> M2
R5 --> M3
R5 --> M4
style M1 fill:#1f6feb
style M2 fill:#1f6feb
style M3 fill:#1f6feb
style M4 fill:#1f6feb
```
Available Context:
- Current context embedding
- All memories that were retrieved (initial hits + BFS expansion)
- Which memories passed sidecar verification (actually relevant)
- Which were rejected (retrieved but not relevant)
- Co-occurrence patterns (memories that appear together)
Maintenance Tasks:
| Task | Trigger | Action |
|---|---|---|
| Link Discovery | 2+ memories verified relevant | Create/strengthen RelatesTo edges between co-relevant memories |
| Cluster Refinement | Retrieved memories span clusters | Update cluster centroids, consider merging nearby clusters |
| Confidence Boost | Memory verified relevant | Increment access count, boost confidence |
| Confidence Decay | Memory retrieved but rejected | Slightly decay confidence (may be stale) |
| Gap Detection | Context has no relevant memories | Log potential memory gap for later extraction |
| Tag Inference | Multiple memories share context | Infer common tag from context if none exists |
Implementation:
```rust
use std::sync::{atomic::Ordering, Arc};

impl MemoryAgent {
    /// Called after serving memories; runs maintenance in the background.
    /// Takes `self: Arc<Self>` so the detached task can own the agent handle.
    async fn post_retrieval_maintenance(self: Arc<Self>, ctx: RetrievalContext) {
        // Don't block - spawn maintenance tasks
        tokio::spawn(async move {
            // 1. Strengthen links between co-relevant memories
            if ctx.verified_memories.len() >= 2 {
                self.discover_links(&ctx.verified_memories, &ctx.embedding).await;
            }
            // 2. Boost confidence for verified memories
            for mem_id in &ctx.verified_memories {
                self.boost_confidence(mem_id).await;
            }
            // 3. Decay confidence for rejected memories
            for mem_id in &ctx.rejected_memories {
                self.decay_confidence(mem_id, 0.02).await; // Gentle decay
            }
            // 4. Detect gaps (memories were retrieved, but none verified as relevant)
            if ctx.verified_memories.is_empty() && ctx.initial_hits > 0 {
                self.log_memory_gap(&ctx.embedding, &ctx.context_snippet).await;
            }
            // 5. Periodic cluster update (every N retrievals)
            if self.retrieval_count.fetch_add(1, Ordering::Relaxed) % 50 == 0 {
                self.update_clusters().await;
            }
        });
    }
}
```

Gap Detection for Future Learning:
When retrieval finds no relevant memories but the context seems important, log it:
```rust
struct MemoryGap {
    context_embedding: Vec<f32>,
    context_snippet: String,
    timestamp: DateTime<Utc>,
    session_id: String,
}
```

These gaps can be reviewed during end-of-session extraction to create new memories for topics the system didn't know about.
Memories exist at different scopes:
```mermaid
graph TB
subgraph "Scope Hierarchy"
G[Global<br/>User-wide preferences]
P[Project<br/>Codebase-specific]
S[Session<br/>Current conversation]
end
G --> P
P --> S
style G fill:#e8f5e9
style P fill:#e3f2fd
style S fill:#fff3e0
```
| Scope | Lifetime | Examples |
|---|---|---|
| Global | Permanent | "User prefers vim keybindings" |
| Project | Until deleted | "This project uses async/await" |
| Session | Current session | "Working on auth refactor" |
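One way to read the hierarchy is as a visibility rule: each scope can draw on the scopes above it. A sketch (the precedence is inferred from the diagram, not confirmed API):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MemoryScope {
    Global,
    Project,
    Session,
}

/// Whether a memory at `memory` scope is visible to a query at `query` scope:
/// session queries see everything, project queries see project + global,
/// global queries see only global memories.
pub fn in_scope(memory: MemoryScope, query: MemoryScope) -> bool {
    use MemoryScope::*;
    match query {
        Session => true,
        Project => matches!(memory, Project | Global),
        Global => matches!(memory, Global),
    }
}
```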
```mermaid
sequenceDiagram
participant MA as Main Agent<br/>TUI App
participant CH as mpsc Channel
participant MEM as Memory Agent<br/>Background Task
participant EMB as Embedder
participant GR as Graph Store
participant HC as Sidecar (Codex Spark)
Note over MA,MEM: Turn N
MA->>MA: build_memory_prompt()
MA->>MA: take_pending_memory()
Note right of MA: Returns Turn N-1 results
MA->>CH: try_send(ContextUpdate)
Note right of CH: Non-blocking
MA->>MA: Continue with LLM call
CH->>MEM: update_context_sync()
MEM->>EMB: Embed context
EMB-->>MEM: Context embedding
MEM->>GR: Similarity search
GR-->>MEM: Initial hits
MEM->>GR: BFS traversal
GR-->>MEM: Related memories
MEM->>HC: Verify relevance
HC-->>MEM: Filtered results
MEM->>MEM: Topic change detection
Note right of MEM: Clear surfaced if sim < 0.3
MEM->>MEM: set_pending_memory()
Note right of MEM: Available at Turn N+1
```
Key Points:
- Memory agent is a singleton (OnceCell) - only one instance ever runs
- Communication is non-blocking via `try_send()` on an mpsc channel (sketched below)
- Results arrive one turn behind (processed in background)
- Topic change detection resets surfaced set when conversation shifts
- Cascade retrieval traverses graph for related memories
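The non-blocking handoff from the second bullet looks roughly like this (type and function names are illustrative; the key point is that `try_send` never awaits):

```rust
use tokio::sync::mpsc::{self, error::TrySendError};

/// Illustrative context payload sent to the memory agent each turn.
pub struct ContextUpdate {
    pub text: String,
}

/// Fire-and-forget: if the memory agent's queue is full, drop the update
/// rather than stalling the main agent's turn.
pub fn send_context(tx: &mpsc::Sender<ContextUpdate>, update: ContextUpdate) {
    match tx.try_send(update) {
        Ok(()) => {}
        Err(TrySendError::Full(_)) => { /* memory agent busy; skip this turn */ }
        Err(TrySendError::Closed(_)) => { /* memory agent shut down */ }
    }
}
```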
```
~/.jcode/memory/
├── graph.json                  # Serialized petgraph
├── projects/
│   └── <project_hash>.json     # Per-directory memories
├── global.json                 # User-wide memories
├── embeddings/
│   └── <memory_id>.vec         # Embedding vectors
├── clusters/
│   └── cluster_metadata.json   # Cluster centroids and metadata
└── tags/
    └── tag_index.json          # Tag → memory mappings
```
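Loading is plain serde over these files. A sketch for graph.json, assuming a serde-friendly shape like the HashMap-based structure noted in the implementation status (field names here are assumptions):

```rust
use std::{collections::HashMap, fs, io, path::Path};
use serde::{Deserialize, Serialize};

/// Assumed on-disk shape of graph.json: nodes keyed by id, edges as triples.
#[derive(Serialize, Deserialize)]
pub struct PersistedGraph {
    pub nodes: HashMap<String, serde_json::Value>,
    pub edges: Vec<(String, String, String)>, // (from_id, to_id, edge_kind)
}

pub fn load_graph(memory_dir: &Path) -> io::Result<PersistedGraph> {
    let raw = fs::read_to_string(memory_dir.join("graph.json"))?;
    serde_json::from_str(&raw).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```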
Available to the main agent:
memory { action: "remember", content: "...", category: "fact|preference|correction",
scope: "project|global", tags: ["tag1", "tag2"] }
memory { action: "recall" } # Get relevant memories for context
memory { action: "search", query: "..." } # Semantic search
memory { action: "list", tag: "..." } # List by tag
memory { action: "forget", id: "..." } # Deactivate memory
memory { action: "link", from: "id1", to: "id2", relation: "relates_to" }
memory { action: "tag", id: "...", tags: ["new", "tags"] }
- Memory store with file persistence
- Basic memory tool
- Integration with agent
- Local all-MiniLM-L6-v2 via tract-onnx
- Background embedding process
- Similarity search with cosine distance
- Async channel communication
- Lightweight sidecar for relevance verification (currently GPT-5.3 Codex Spark)
- Topic change detection
- Surfaced memory tracking
- HashMap-based graph structure (simpler than petgraph for JSON serialization)
- Tag nodes and HasTag edges
- Cluster discovery and InCluster edges
- Semantic link edges (RelatesTo)
- Cascade retrieval algorithm with BFS traversal
- Link discovery (co-relevant memories)
- Confidence boost/decay on retrieval
- Gap detection for missing knowledge
- Periodic cluster refinement
- Tag inference from context
- Confidence decay system (time-based with category-specific half-lives)
- Negative memories and trigger patterns
- Procedural memory support
- Provenance tracking
- Feedback loops (boost on use, decay on rejection)
- Temporal awareness
- End-of-session extraction
- Sidecar consolidation on write (see below)
- User control CLI (`jcode memory` commands)
- Memory export/import
Lightweight consolidation that runs in the memory sidecar after returning results to the main agent. Only operates on memories already retrieved — no extra lookups, zero added latency.
`extract_from_context()` now performs inline write-time consolidation:
- Duplicate detection on write — semantically similar memories are reinforced instead of duplicated.
- Contradiction detection on write — contradictory memories are superseded during incremental extraction.
- Reinforcement provenance — `MemoryEntry` tracks `Vec<Reinforcement>` breadcrumbs (`session_id`, `message_index`, `timestamp`).
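A sketch of the breadcrumb record implied by that bullet (exact field types are assumptions; serialization assumes chrono's serde feature):

```rust
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};

/// One reinforcement event: where and when a duplicate write strengthened
/// an existing memory instead of creating a new one.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Reinforcement {
    pub session_id: String,
    pub message_index: u32,
    pub timestamp: DateTime<Utc>,
}
```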
Full graph-wide consolidation that runs during ambient mode background cycles. See AMBIENT_MODE.md for the ambient mode design.
- Graph-wide similarity-based memory merging
- Redundancy detection and deduplication (beyond sidecar's local scope)
- Contradiction resolution (across full graph, not just retrieved set)
- Fact verification against codebase (check if factual memories are still true)
- Retroactive session extraction (crashed/missed sessions)
- Cluster reorganization
- Weak memory pruning (confidence < 0.05 AND strength <= 1)
- Relationship discovery across sessions
- Embedding backfill for memories missing embeddings
- Knowledge graph optimization
- API keys, secrets, credentials
- Passwords or tokens
- Personal identifying information
- File contents marked sensitive
Before storing any memory, scan for:
- Regex patterns for secrets (API keys, passwords; sketched below)
- Files in `.gitignore` or `.secretsignore`
- Content from `.env` files
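A sketch of the regex pass (illustrative patterns only; a production scanner should lean on a vetted rule set):

```rust
use regex::Regex;

/// Conservative check: reject content that looks like it embeds a secret.
pub fn looks_sensitive(content: &str) -> bool {
    let patterns = [
        r"(?i)api[_-]?key\s*[:=]\s*\S+",
        r"(?i)(password|passwd|pwd)\s*[:=]\s*\S+",
        r"AKIA[0-9A-Z]{16}",                   // AWS access key id shape
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----", // PEM private keys
    ];
    patterns
        .iter()
        .any(|p| Regex::new(p).expect("static pattern").is_match(content))
}
```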
- All memories stored in human-readable JSON
- CLI for viewing/editing/deleting
- Option to disable memory entirely
- Export/import for backup
Status: TODO - Design pending
Similar to how humans consolidate memories during sleep, jcode can run background consolidation to optimize the memory graph:
```mermaid
graph LR
subgraph "Active Use"
A[Raw Memories]
B[Redundant Facts]
C[Weak Links]
D[Scattered Tags]
end
subgraph "Consolidation"
E[Merge Similar]
F[Detect Contradictions]
G[Prune Weak]
H[Reorganize Clusters]
end
subgraph "Optimized"
I[Unified Facts]
J[Resolved Conflicts]
K[Strong Connections]
L[Clean Taxonomy]
end
A --> E --> I
B --> E
B --> F --> J
C --> G --> K
D --> H --> L
```
| Feature | Description |
|---|---|
| Similarity Merge | Combine memories with >0.95 embedding similarity |
| Redundancy Detection | Find memories that express the same fact differently |
| Contradiction Resolution | Surface conflicting memories for user decision |
| Weak Pruning | Remove memories with low confidence + low access |
| Cluster Optimization | Re-run clustering, merge small clusters |
| Link Strengthening | Increase weights on frequently co-accessed pairs |
| Tag Cleanup | Merge similar tags, remove orphans |
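For the similarity-merge pass, candidate pairs can be found with a brute-force scan; a sketch assuming unit-normalized embeddings, so cosine similarity reduces to a dot product. This is fine at small scale; a large graph would want an approximate-nearest-neighbor index instead:

```rust
/// All pairs of memory ids whose embeddings exceed the 0.95 merge threshold.
pub fn merge_candidates(memories: &[(String, Vec<f32>)]) -> Vec<(String, String)> {
    let dot = |a: &[f32], b: &[f32]| a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>();
    let mut pairs = Vec::new();
    for i in 0..memories.len() {
        for j in (i + 1)..memories.len() {
            if dot(&memories[i].1, &memories[j].1) > 0.95 {
                pairs.push((memories[i].0.clone(), memories[j].0.clone()));
            }
        }
    }
    pairs
}
```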
- Periodic daemon - Run consolidation every N hours
- On-idle trigger - Run when no active sessions for M minutes
- Capacity-based - Run when memory count exceeds threshold
- Manual command - User-triggered via `/consolidate`
- How to handle user confirmation for destructive merges?
- Should consolidation be reversible?
- What's the right frequency/trigger?
- How to balance between "perfect organization" and "keep everything"?
- Multi-machine sync: Should memories sync across devices via encrypted backup?
- Team sharing: Should some memories be shareable across a team?
- Cluster algorithm: HDBSCAN vs k-means vs hierarchical clustering?
- Graph persistence: JSON serialization vs SQLite for larger graphs?
Last updated: 2026-01-27