Skip to content

Latest commit

 

History

History
875 lines (688 loc) · 25 KB

File metadata and controls

875 lines (688 loc) · 25 KB

Memory Architecture Design

Status: Implemented (Core), Planned (Graph-Based Hybrid) Updated: 2026-01-27

Local embeddings + lightweight sidecar (GPT-5.3 Codex Spark) are implemented and running in production. This document describes both the current implementation and the planned graph-based hybrid architecture.

Overview

See also: Memory Regression Budget for the current measurable guardrails and review expectations.

A multi-layered memory system for cross-session learning that mimics how human memory works - relevant memories "pop up" when triggered by context rather than requiring explicit recall.

Key Design Decisions:

  1. Fully async and non-blocking - The main agent never waits for memory; results from turn N are available at turn N+1
  2. Graph-based organization - Memories form a connected graph with tags, clusters, and semantic links
  3. Cascade retrieval - Embedding hits trigger BFS traversal to find related memories
  4. Hybrid grouping - Combines explicit tags, automatic clusters, and semantic links

Architecture Overview

graph TB
    subgraph "Main Agent"
        MA[TUI App]
        MP[build_memory_prompt]
        TP[take_pending_memory]
    end

    subgraph "Memory Agent"
        CH[Context Handler]
        EMB[Embedder<br/>all-MiniLM-L6-v2]
        SR[Similarity Search]
        CR[Cascade Retrieval]
        HC[Sidecar<br/>GPT-5.3 Codex Spark]
    end

    subgraph "Memory Graph"
        MG[(petgraph<br/>DiGraph)]
        MS[Memory Nodes]
        TN[Tag Nodes]
        CN[Cluster Nodes]
    end

    MA -->|mpsc channel| CH
    CH --> EMB
    EMB --> SR
    SR -->|initial hits| CR
    CR -->|BFS traversal| MG
    MG --> MS
    MG --> TN
    MG --> CN
    CR -->|candidates| HC
    HC -->|verified| TP
    TP -->|next turn| MA
Loading

Graph-Based Data Model

Node Types

graph LR
    subgraph "Node Types"
        M((Memory))
        T[Tag]
        C{Cluster}
    end

    M -->|HasTag| T
    M -->|InCluster| C
    M -.->|RelatesTo| M
    M ==>|Supersedes| M
    M -.->|Contradicts| M

    style M fill:#e1f5fe
    style T fill:#fff3e0
    style C fill:#f3e5f5
Loading
Node Type Description Storage
Memory Core memory entry (fact, preference, procedure) Content, metadata, embedding
Tag Explicit label (user-defined or inferred) Name, description, count
Cluster Automatic grouping via embedding similarity Centroid embedding, member count

Edge Types

Edge Type From → To Description
HasTag Memory → Tag Memory has this explicit tag
InCluster Memory → Cluster Memory belongs to auto-discovered cluster
RelatesTo Memory → Memory Semantic relationship (weighted)
Supersedes Memory → Memory Newer memory replaces older
Contradicts Memory → Memory Conflicting information
DerivedFrom Memory → Memory Procedural knowledge derived from facts

Rust Implementation

use petgraph::graph::DiGraph;

/// Node in the memory graph
#[derive(Debug, Clone)]
pub enum MemoryNode {
    Memory(MemoryEntry),
    Tag(TagEntry),
    Cluster(ClusterEntry),
}

/// Edge relationships
#[derive(Debug, Clone)]
pub enum EdgeKind {
    HasTag,
    InCluster,
    RelatesTo { weight: f32 },
    Supersedes,
    Contradicts,
    DerivedFrom,
}

/// The memory graph
pub struct MemoryGraph {
    graph: DiGraph<MemoryNode, EdgeKind>,
    // Indexes for fast lookup
    memory_index: HashMap<String, NodeIndex>,
    tag_index: HashMap<String, NodeIndex>,
    cluster_index: HashMap<String, NodeIndex>,
}

Hybrid Grouping System

The memory system uses three complementary organization methods:

graph TB
    subgraph "Explicit: Tags"
        T1["rust"]
        T2["auth-system"]
        T3["user-preference"]
    end

    subgraph "Automatic: Clusters"
        C1[("Error Handling<br/>Cluster")]
        C2[("API Patterns<br/>Cluster")]
    end

    subgraph "Semantic: Links"
        L1["relates_to"]
        L2["supersedes"]
        L3["contradicts"]
    end

    M1((Memory 1)) --> T1
    M1 --> C1
    M1 -.-> L1
    L1 -.-> M2((Memory 2))
    M2 --> T1
    M2 --> C2
    M3((Memory 3)) --> T2
    M3 ==> L2
    L2 ==> M4((Memory 4))
Loading

1. Tags (Explicit)

User-defined or automatically inferred labels.

Sources:

  • User explicitly tags: memory { action: "remember", tags: ["rust", "auth"] }
  • Inferred from context (file paths, topics, entities)
  • Extracted by sidecar during end-of-session processing

Examples:

  • #project:jcode - Project-specific
  • #rust, #python - Language-specific
  • #auth, #database - Domain-specific
  • #preference, #correction - Category tags

2. Clusters (Automatic)

Automatically discovered groupings based on embedding similarity.

Algorithm:

  1. Periodically run HDBSCAN on memory embeddings
  2. Create/update cluster nodes for dense regions
  3. Assign InCluster edges to nearby memories
  4. Track cluster centroids for fast lookup

Benefits:

  • Discovers hidden patterns user didn't explicitly tag
  • Groups related memories even without shared tags
  • Enables "find similar" queries

3. Links (Semantic Relationships)

Explicit relationships between memories.

Types:

  • RelatesTo: General semantic connection (weighted 0.0-1.0)
  • Supersedes: Newer information replaces older
  • Contradicts: Conflicting information (both kept, flagged)
  • DerivedFrom: Procedural knowledge derived from facts

Discovery:

  • Contradiction detection on write
  • Sidecar identifies relationships during verification
  • User can explicitly link memories

Cascade Retrieval

When context triggers memory search, cascade retrieval finds related memories through graph traversal.

sequenceDiagram
    participant C as Context
    participant E as Embedder
    participant S as Similarity Search
    participant G as Graph BFS
    participant H as Sidecar (Codex Spark)
    participant R as Results

    C->>E: Current context
    E->>S: Context embedding
    S->>S: Find top-k similar memories
    S->>G: Initial hits (seed nodes)

    loop BFS Traversal depth 2
        G->>G: Follow HasTag edges
        G->>G: Follow InCluster edges
        G->>G: Follow RelatesTo edges
    end

    G->>H: Candidate memories
    H->>H: Verify relevance to context
    H->>R: Filtered, ranked memories
Loading

Algorithm

pub fn cascade_retrieve(
    &self,
    context_embedding: &[f32],
    max_depth: usize,
    max_results: usize,
) -> Vec<(MemoryEntry, f32)> {
    // Step 1: Embedding similarity search
    let initial_hits = self.similarity_search(context_embedding, 10);

    // Step 2: BFS traversal from hits
    let mut visited: HashSet<NodeIndex> = HashSet::new();
    let mut candidates: Vec<(NodeIndex, f32, usize)> = Vec::new();
    let mut queue: VecDeque<(NodeIndex, usize)> = VecDeque::new();

    for (node, score) in initial_hits {
        queue.push_back((node, 0));
        candidates.push((node, score, 0));
    }

    while let Some((node, depth)) = queue.pop_front() {
        if depth >= max_depth || visited.contains(&node) {
            continue;
        }
        visited.insert(node);

        // Traverse edges
        for edge in self.graph.edges(node) {
            let neighbor = edge.target();
            if visited.contains(&neighbor) {
                continue;
            }

            let edge_weight = match edge.weight() {
                EdgeKind::HasTag => 0.8,        // Strong signal
                EdgeKind::InCluster => 0.6,     // Medium signal
                EdgeKind::RelatesTo { weight } => *weight,
                EdgeKind::Supersedes => 0.9,    // Very relevant
                _ => 0.3,
            };

            // Decay score by depth
            let decayed_score = edge_weight * (0.7_f32).powi(depth as i32 + 1);

            if let MemoryNode::Memory(_) = &self.graph[neighbor] {
                candidates.push((neighbor, decayed_score, depth + 1));
            }

            queue.push_back((neighbor, depth + 1));
        }
    }

    // Step 3: Dedupe, sort, and return top results
    candidates.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    candidates.into_iter()
        .filter_map(|(node, score, _)| {
            if let MemoryNode::Memory(entry) = &self.graph[node] {
                Some((entry.clone(), score))
            } else {
                None
            }
        })
        .take(max_results)
        .collect()
}

Retrieval Parameters

Parameter Default Description
similarity_threshold 0.4 Minimum embedding similarity for initial hits
max_initial_hits 10 Number of embedding search results
max_depth 2 BFS traversal depth limit
max_results 10 Final results to return
edge_decay 0.7 Score decay per traversal step

Memory Entry Schema

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MemoryEntry {
    // Identity
    pub id: String,
    pub content: String,
    pub category: MemoryCategory,

    // Classification
    pub memory_type: MemoryType,  // Fact, Preference, Procedure, Correction
    pub scope: MemoryScope,       // Global, Project, Session

    // Source tracking
    pub session_id: Option<String>,
    pub message_range: Option<(u32, u32)>,
    pub file_paths: Vec<String>,
    pub provenance: Provenance,   // UserStated, Observed, Inferred

    // Lifecycle
    pub created_at: DateTime<Utc>,
    pub updated_at: DateTime<Utc>,
    pub last_accessed: DateTime<Utc>,
    pub access_count: u32,
    pub strength: u32,            // Consolidation count

    // Trust & status
    pub confidence: f32,          // 0.0-1.0, decays over time
    pub trust_score: f32,         // Source-based trust
    pub active: bool,
    pub superseded_by: Option<String>,

    // Embedding
    pub embedding: Option<Vec<f32>>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum MemoryType {
    Fact,        // "This project uses PostgreSQL"
    Preference,  // "User prefers 4-space indentation"
    Procedure,   // "To deploy: run make deploy"
    Correction,  // "Don't use deprecated API"
    Negative,    // "Never commit .env files"
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum Provenance {
    UserStated,     // User explicitly said it
    UserCorrected,  // User corrected agent behavior
    Observed,       // Agent observed from behavior
    Inferred,       // Agent inferred from context
    Extracted,      // Extracted from session summary
}

Advanced Features

1. Temporal Awareness

Memories have temporal context:

pub struct TemporalContext {
    pub session_scope: bool,      // Only relevant in session
    pub recency_weight: f32,      // Recent access boost
    pub seasonal: Option<String>, // "end-of-sprint", "release-week"
}

Recency boost formula:

boost = 1.0 + (0.5 * e^(-hours_since_access / 24))

2. Confidence Decay

Confidence decays over time based on memory type:

Memory Type Half-life Rationale
Correction 365 days User corrections are high value
Preference 90 days Preferences may evolve
Fact 30 days Codebase facts can become stale
Procedure 60 days Procedures change less often
Inferred 7 days Low-confidence inferences

Decay formula:

confidence = initial_confidence * e^(-age_days / half_life)
           * (1 + 0.1 * log(access_count + 1))
           * trust_weight

3. Negative Memories

Things the agent should avoid doing:

MemoryEntry {
    content: "Never use println! for logging in production code",
    memory_type: MemoryType::Negative,
    trigger_patterns: vec!["println!", "print!", "dbg!"],
    ...
}

Surfacing: Negative memories are surfaced when trigger patterns match current context.

4. Procedural Memories

How-to knowledge with structured steps:

pub struct Procedure {
    pub name: String,
    pub trigger: String,        // "deploy to production"
    pub steps: Vec<String>,
    pub prerequisites: Vec<String>,
    pub warnings: Vec<String>,
}

5. Provenance Tracking

Every memory tracks its source:

pub struct ProvenanceChain {
    pub source: Provenance,
    pub session_id: String,
    pub timestamp: DateTime<Utc>,
    pub context_snippet: String,  // What was being discussed
    pub confidence_reason: String, // Why this confidence level
}

6. Feedback Loops

Memories strengthen or weaken based on use:

impl MemoryEntry {
    pub fn on_used(&mut self, helpful: bool) {
        self.access_count += 1;
        self.last_accessed = Utc::now();

        if helpful {
            self.strength = self.strength.saturating_add(1);
            self.confidence = (self.confidence + 0.05).min(1.0);
        } else {
            self.confidence = (self.confidence - 0.1).max(0.0);
        }
    }
}

7. Post-Retrieval Maintenance

After serving memories to the main agent, the memory agent has valuable context it can use for background maintenance. This "opportunistic maintenance" happens asynchronously without blocking.

graph LR
    subgraph "Retrieval Phase"
        R1[Context Embedding]
        R2[Similarity Search]
        R3[Cascade BFS]
        R4[Sidecar Verify]
        R5[Serve to Agent]
    end

    subgraph "Maintenance Phase (Background)"
        M1[Link Discovery]
        M2[Cluster Update]
        M3[Confidence Boost]
        M4[Gap Detection]
    end

    R5 --> M1
    R5 --> M2
    R5 --> M3
    R5 --> M4

    style M1 fill:#1f6feb
    style M2 fill:#1f6feb
    style M3 fill:#1f6feb
    style M4 fill:#1f6feb
Loading

Available Context:

  • Current context embedding
  • All memories that were retrieved (initial hits + BFS expansion)
  • Which memories passed sidecar verification (actually relevant)
  • Which were rejected (retrieved but not relevant)
  • Co-occurrence patterns (memories that appear together)

Maintenance Tasks:

Task Trigger Action
Link Discovery 2+ memories verified relevant Create/strengthen RelatesTo edges between co-relevant memories
Cluster Refinement Retrieved memories span clusters Update cluster centroids, consider merging nearby clusters
Confidence Boost Memory verified relevant Increment access count, boost confidence
Confidence Decay Memory retrieved but rejected Slightly decay confidence (may be stale)
Gap Detection Context has no relevant memories Log potential memory gap for later extraction
Tag Inference Multiple memories share context Infer common tag from context if none exists

Implementation:

impl MemoryAgent {
    /// Called after serving memories, runs maintenance in background
    async fn post_retrieval_maintenance(&self, ctx: RetrievalContext) {
        // Don't block - spawn maintenance tasks
        tokio::spawn(async move {
            // 1. Strengthen links between co-relevant memories
            if ctx.verified_memories.len() >= 2 {
                self.discover_links(&ctx.verified_memories, &ctx.embedding).await;
            }

            // 2. Boost confidence for verified memories
            for mem_id in &ctx.verified_memories {
                self.boost_confidence(mem_id).await;
            }

            // 3. Decay confidence for rejected memories
            for mem_id in &ctx.rejected_memories {
                self.decay_confidence(mem_id, 0.02).await;  // Gentle decay
            }

            // 4. Detect gaps (context had no relevant memories)
            if ctx.verified_memories.is_empty() && ctx.initial_hits > 0 {
                self.log_memory_gap(&ctx.embedding, &ctx.context_snippet).await;
            }

            // 5. Periodic cluster update (every N retrievals)
            if self.retrieval_count.fetch_add(1, Ordering::Relaxed) % 50 == 0 {
                self.update_clusters().await;
            }
        });
    }
}

Gap Detection for Future Learning:

When retrieval finds no relevant memories but the context seems important, log it:

struct MemoryGap {
    context_embedding: Vec<f32>,
    context_snippet: String,
    timestamp: DateTime<Utc>,
    session_id: String,
}

These gaps can be reviewed during end-of-session extraction to create new memories for topics the system didn't know about.

8. Scope Levels

Memories exist at different scopes:

graph TB
    subgraph "Scope Hierarchy"
        G[Global<br/>User-wide preferences]
        P[Project<br/>Codebase-specific]
        S[Session<br/>Current conversation]
    end

    G --> P
    P --> S

    style G fill:#e8f5e9
    style P fill:#e3f2fd
    style S fill:#fff3e0
Loading
Scope Lifetime Examples
Global Permanent "User prefers vim keybindings"
Project Until deleted "This project uses async/await"
Session Current session "Working on auth refactor"

Async Processing Pipeline

sequenceDiagram
    participant MA as Main Agent<br/>TUI App
    participant CH as mpsc Channel
    participant MEM as Memory Agent<br/>Background Task
    participant EMB as Embedder
    participant GR as Graph Store
    participant HC as Sidecar (Codex Spark)

    Note over MA,MEM: Turn N

    MA->>MA: build_memory_prompt()
    MA->>MA: take_pending_memory()
    Note right of MA: Returns Turn N-1 results

    MA->>CH: try_send(ContextUpdate)
    Note right of CH: Non-blocking

    MA->>MA: Continue with LLM call

    CH->>MEM: update_context_sync()

    MEM->>EMB: Embed context
    EMB-->>MEM: Context embedding

    MEM->>GR: Similarity search
    GR-->>MEM: Initial hits

    MEM->>GR: BFS traversal
    GR-->>MEM: Related memories

    MEM->>HC: Verify relevance
    HC-->>MEM: Filtered results

    MEM->>MEM: Topic change detection
    Note right of MEM: Clear surfaced if sim < 0.3

    MEM->>MEM: set_pending_memory()
    Note right of MEM: Available at Turn N+1
Loading

Key Points:

  • Memory agent is a singleton (OnceCell) - only one instance ever runs
  • Communication is non-blocking via try_send() on mpsc channel
  • Results arrive one turn behind (processed in background)
  • Topic change detection resets surfaced set when conversation shifts
  • Cascade retrieval traverses graph for related memories

Storage Layout

~/.jcode/memory/
├── graph.json                    # Serialized petgraph
├── projects/
│   └── <project_hash>.json       # Per-directory memories
├── global.json                   # User-wide memories
├── embeddings/
│   └── <memory_id>.vec           # Embedding vectors
├── clusters/
│   └── cluster_metadata.json     # Cluster centroids and metadata
└── tags/
    └── tag_index.json            # Tag → memory mappings

Memory Tools

Available to the main agent:

memory { action: "remember", content: "...", category: "fact|preference|correction",
         scope: "project|global", tags: ["tag1", "tag2"] }
memory { action: "recall" }                    # Get relevant memories for context
memory { action: "search", query: "..." }      # Semantic search
memory { action: "list", tag: "..." }          # List by tag
memory { action: "forget", id: "..." }         # Deactivate memory
memory { action: "link", from: "id1", to: "id2", relation: "relates_to" }
memory { action: "tag", id: "...", tags: ["new", "tags"] }

Implementation Status

Phase 1: Basic Memory Tools ✅

  • Memory store with file persistence
  • Basic memory tool
  • Integration with agent

Phase 2: Embedding Search ✅

  • Local all-MiniLM-L6-v2 via tract-onnx
  • Background embedding process
  • Similarity search with cosine distance

Phase 3: Memory Agent ✅

  • Async channel communication
  • Lightweight sidecar for relevance verification (currently GPT-5.3 Codex Spark)
  • Topic change detection
  • Surfaced memory tracking

Phase 4: Graph-Based Architecture ✅

  • HashMap-based graph structure (simpler than petgraph for JSON serialization)
  • Tag nodes and HasTag edges
  • Cluster discovery and InCluster edges
  • Semantic link edges (RelatesTo)
  • Cascade retrieval algorithm with BFS traversal

Phase 5: Post-Retrieval Maintenance ✅

  • Link discovery (co-relevant memories)
  • Confidence boost/decay on retrieval
  • Gap detection for missing knowledge
  • Periodic cluster refinement
  • Tag inference from context

Phase 6: Advanced Features ✅

  • Confidence decay system (time-based with category-specific half-lives)
  • Negative memories and trigger patterns
  • Procedural memory support
  • Provenance tracking
  • Feedback loops (boost on use, decay on rejection)
  • Temporal awareness

Phase 7: Full Integration ✅

  • End-of-session extraction
  • Sidecar consolidation on write (see below)
  • User control CLI (jcode memory commands)
  • Memory export/import

Phase 7.5: Sidecar Consolidation (Inline, Per-Turn) ✅

Lightweight consolidation that runs in the memory sidecar after returning results to the main agent. Only operates on memories already retrieved — no extra lookups, zero added latency.

extract_from_context() now performs inline write-time consolidation:

  • Duplicate detection on write — semantically similar memories are reinforced instead of duplicated.
  • Contradiction detection on write — contradictory memories are superseded during incremental extraction.
  • Reinforcement provenanceMemoryEntry tracks Vec<Reinforcement> breadcrumbs (session_id, message_index, timestamp).

Phase 8: Deep Memory Consolidation (Ambient Garden) 📋

Full graph-wide consolidation that runs during ambient mode background cycles. See AMBIENT_MODE.md for the ambient mode design.

  • Graph-wide similarity-based memory merging
  • Redundancy detection and deduplication (beyond sidecar's local scope)
  • Contradiction resolution (across full graph, not just retrieved set)
  • Fact verification against codebase (check if factual memories are still true)
  • Retroactive session extraction (crashed/missed sessions)
  • Cluster reorganization
  • Weak memory pruning (confidence < 0.05 AND strength <= 1)
  • Relationship discovery across sessions
  • Embedding backfill for memories missing embeddings
  • Knowledge graph optimization

Privacy & Security

Do Not Remember

  • API keys, secrets, credentials
  • Passwords or tokens
  • Personal identifying information
  • File contents marked sensitive

Filtering

Before storing any memory, scan for:

  • Regex patterns for secrets (API keys, passwords)
  • Files in .gitignore or .secretsignore
  • Content from .env files

User Control

  • All memories stored in human-readable JSON
  • CLI for viewing/editing/deleting
  • Option to disable memory entirely
  • Export/import for backup

Future: Memory Consolidation (Sleep-Like Processing)

Status: TODO - Design pending

Similar to how humans consolidate memories during sleep, jcode can run background consolidation to optimize the memory graph:

Concept

graph LR
    subgraph "Active Use"
        A[Raw Memories]
        B[Redundant Facts]
        C[Weak Links]
        D[Scattered Tags]
    end

    subgraph "Consolidation"
        E[Merge Similar]
        F[Detect Contradictions]
        G[Prune Weak]
        H[Reorganize Clusters]
    end

    subgraph "Optimized"
        I[Unified Facts]
        J[Resolved Conflicts]
        K[Strong Connections]
        L[Clean Taxonomy]
    end

    A --> E --> I
    B --> E
    B --> F --> J
    C --> G --> K
    D --> H --> L
Loading

Potential Features

Feature Description
Similarity Merge Combine memories with >0.95 embedding similarity
Redundancy Detection Find memories that express the same fact differently
Contradiction Resolution Surface conflicting memories for user decision
Weak Pruning Remove memories with low confidence + low access
Cluster Optimization Re-run clustering, merge small clusters
Link Strengthening Increase weights on frequently co-accessed pairs
Tag Cleanup Merge similar tags, remove orphans

Architecture Options (TBD)

  1. Periodic daemon - Run consolidation every N hours
  2. On-idle trigger - Run when no active sessions for M minutes
  3. Capacity-based - Run when memory count exceeds threshold
  4. Manual command - User-triggered via /consolidate

Open Questions for Consolidation

  • How to handle user confirmation for destructive merges?
  • Should consolidation be reversible?
  • What's the right frequency/trigger?
  • How to balance between "perfect organization" and "keep everything"?

Open Questions

  1. Multi-machine sync: Should memories sync across devices via encrypted backup?
  2. Team sharing: Should some memories be shareable across a team?
  3. Cluster algorithm: HDBSCAN vs k-means vs hierarchical clustering?
  4. Graph persistence: JSON serialization vs SQLite for larger graphs?

Last updated: 2026-01-27