Most AI agents have amnesia. They process a request, respond, and forget everything. The next session starts from scratch. This is the "context window problem" that limits what autonomous agents can actually accomplish.
The Problem: Context Windows Are Finite
LLMs operate within fixed context windows (typically 4K-128K tokens). Long-running agents face "context bloat": the window fills with tool outputs, previous reasoning, and conversation history. Eventually you hit the limit and have to drop context.
The typical solution is summarization: compress old context into shorter summaries. But summarization loses detail. The agent forgets the specific error message that caused a failure, the exact configuration that worked, the reasoning behind a decision.
How Biological Memory Actually Works
Human memory doesn't work like a context window. It operates through:
- Hebbian learning: "Neurons that fire together, wire together." Associations strengthen with repeated co-activation.
- Activation decay: Unused memories fade over time following the Ebbinghaus forgetting curve.
- Long-Term Potentiation (LTP): After sufficient repetition, synaptic connections become permanent.
- Consolidation: Episodic memories (events) transform into semantic memories (facts) during sleep.
These mechanisms evolved because they're efficient. You don't need to remember every detail of every day - you need to remember what matters.
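The Ebbinghaus forgetting curve is commonly modeled as exponential decay in the retention literature. A minimal sketch of that model (the 24-hour half-life below is purely illustrative):

```rust
// Ebbinghaus-style forgetting curve: retention halves every `half_life_hours`.
// R(t) = 0.5^(t / h)
fn retention(hours_elapsed: f64, half_life_hours: f64) -> f64 {
    0.5f64.powf(hours_elapsed / half_life_hours)
}
```

With a 24-hour half-life, a memory retains 50% of its strength after one day and under 1% after a week, which is why anything worth keeping needs a mechanism (like LTP) that slows the decay.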
Implementing Hebbian Learning in Code
Here's the core idea: when two memories are retrieved together, strengthen the connection between them. When a memory isn't accessed, let it decay.
/// Hebbian synaptic plasticity: "Neurons that fire together, wire together"
/// - Strength increases with co-activation
/// - Strength decays over time without use
/// - After threshold activations, becomes permanent (LTP)
use chrono::Utc;

const LEARNING_RATE: f32 = 0.1;
const LTP_THRESHOLD: u32 = 5;
const LTP_DECAY_FACTOR: f32 = 0.1; // Potentiated synapses decay 10x slower
const MIN_STRENGTH: f32 = 0.01;    // Prune threshold (illustrative value)
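To see what these constants imply before reading the implementation: each activation adds LEARNING_RATE · (1 − strength), so starting from zero, n activations reach 1 − (1 − LEARNING_RATE)ⁿ. A standalone check (re-declaring the constants so the snippet compiles on its own):

```rust
// Standalone sketch of the Hebbian update rule implied by the constants above.
const LEARNING_RATE: f32 = 0.1;
const LTP_THRESHOLD: u32 = 5;

// Strength after n activations, starting from zero.
fn strength_after(activations: u32) -> f32 {
    let mut strength = 0.0f32;
    for _ in 0..activations {
        // Diminishing returns as strength approaches 1.0
        strength += LEARNING_RATE * (1.0 - strength);
    }
    strength.min(1.0)
}

// Whether the connection has crossed the LTP threshold.
fn potentiated(activations: u32) -> bool {
    activations >= LTP_THRESHOLD
}
```

Five activations, the LTP threshold, land at 1 − 0.9⁵ ≈ 0.41: strong enough to potentiate, still far from the 1.0 asymptote.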
impl Relationship {
    pub fn strengthen(&mut self) {
        self.activation_count += 1;
        // Hebbian strengthening: diminishing returns as strength approaches 1.0
        let delta = LEARNING_RATE * (1.0 - self.strength);
        self.strength = (self.strength + delta).min(1.0);
        // Check for Long-Term Potentiation
        if !self.potentiated && self.activation_count >= LTP_THRESHOLD {
            self.potentiated = true;
        }
    }

    pub fn decay(&mut self) -> bool {
        let hours_elapsed = (Utc::now() - self.last_accessed).num_hours() as f64;
        // Stronger memories get a longer half-life
        let half_life = BASE_HALF_LIFE_HOURS * (1.0 + self.strength) as f64;
        // Potentiated synapses decay much slower
        let effective_half_life = if self.potentiated {
            half_life / LTP_DECAY_FACTOR as f64
        } else {
            half_life
        };
        // Exponential decay: strength halves every effective_half_life hours
        let decay_factor = (-std::f64::consts::LN_2 / effective_half_life * hours_elapsed).exp() as f32;
        self.strength *= decay_factor;
        self.strength < MIN_STRENGTH // Return true if should be pruned
    }
}

Three-Tier Memory Hierarchy
Raw storage isn't enough. You need different tiers for different access patterns:
- Working memory: Fast access, limited capacity (like CPU cache). Current context and active associations.
- Session memory: Medium-term storage for the current task. Survives context window resets.
- Long-term memory: Persistent storage in RocksDB. Compressed, indexed by semantic similarity.
pub struct MemorySystem {
    /// Three-tier memory hierarchy
    working_memory: Arc<RwLock<WorkingMemory>>,
    session_memory: Arc<RwLock<SessionMemory>>,
    long_term_memory: Arc<MemoryStorage>,
    /// Compression pipeline for consolidation
    compressor: CompressionPipeline,
    /// Semantic retrieval with graph associations
    retriever: RetrievalEngine,
}

Semantic Consolidation: Episodic to Semantic
Raw episodic memories ("user clicked button X at 3pm") aren't useful long-term. Semantic consolidation extracts patterns and converts them into facts:
/// Semantic consolidation: episodic → semantic transformation
/// Like what happens during human sleep
pub struct SemanticConsolidator {
    /// Minimum times a pattern must appear to become a fact
    min_support: usize,
    /// Minimum age before consolidation (days)
    min_age_days: u32,
}

pub enum FactType {
    Preference,   // "prefers dark mode"
    Capability,   // "can handle 10k req/sec"
    Relationship, // "auth depends on redis"
    Procedure,    // "to deploy, run make release"
    Definition,   // "JWT is JSON Web Token"
    Pattern,      // "errors spike at 3am"
}

pub struct SemanticFact {
    pub fact: String,
    pub confidence: f32,
    pub support_count: usize, // How many episodic memories support this
    pub source_ids: Vec<String>,
    pub fact_type: FactType,
}

Example: if an agent encounters "API timeout when batch > 100" five times, the consolidator extracts the fact: "Capability: batch size limited to 100" with confidence proportional to the support count.
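One way that extraction step could look as a sketch (not the actual consolidator; the pattern-key grouping and the confidence formula are assumptions):

```rust
use std::collections::HashMap;

// Sketch: group episodic memories by a normalized pattern key and promote any
// pattern with at least `min_support` occurrences into a candidate fact.
// Input: (pattern_key, episode_id) pairs; output: (fact, confidence) pairs.
fn extract_facts(episodes: &[(String, String)], min_support: usize) -> Vec<(String, f32)> {
    let mut support: HashMap<&str, usize> = HashMap::new();
    for (key, _id) in episodes {
        *support.entry(key.as_str()).or_insert(0) += 1;
    }
    support
        .into_iter()
        .filter(|&(_, n)| n >= min_support)
        .map(|(key, n)| {
            // Confidence grows with support but saturates below 1.0
            let confidence = n as f32 / (n as f32 + 1.0);
            (key.to_string(), confidence)
        })
        .collect()
}
```

With min_support = 5, five sightings of the same timeout pattern yield one fact at confidence 5/6 ≈ 0.83, while a one-off observation is ignored.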
Rich Context: Beyond Simple Key-Value
Context isn't a single thing. The system tracks multiple dimensions:
pub struct RichContext {
    pub conversation: ConversationContext,  // Current dialogue state
    pub user: UserContext,                  // User preferences, history
    pub project: ProjectContext,            // Codebase, dependencies
    pub temporal: TemporalContext,          // Time of day, session duration
    pub semantic: SemanticContext,          // Active topics, entities
    pub code: CodeContext,                  // Files, functions in focus
    pub document: DocumentContext,          // Docs being referenced
    pub environment: EnvironmentContext,    // OS, shell, working dir
    pub decay_rate: f32,                    // How fast this context loses relevance
    pub embeddings: Option<Vec<f32>>,       // Semantic vector for similarity
}

This multi-dimensional context allows retrieval to be nuanced. Looking for memories about "authentication"? The system considers not just semantic similarity but also which project you're in, what files you're editing, and what you've been discussing recently.
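A hypothetical shape for such a combined score (the weights and signals here are made up for illustration; the system's actual weighting is not shown in this excerpt):

```rust
// Sketch: blend semantic similarity with contextual signals.
// Weights are illustrative only.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn retrieval_score(
    query_emb: &[f32],
    mem_emb: &[f32],
    same_project: bool,
    topic_overlap: f32, // 0.0..=1.0, share of active topics in common
) -> f32 {
    let semantic = cosine(query_emb, mem_emb);
    let mut score = 0.6 * semantic + 0.25 * topic_overlap;
    if same_project {
        score += 0.15; // small boost for matching project context
    }
    score.min(1.0)
}
```

The point of the structure, not the numbers: a memory from the wrong project with high embedding similarity can lose to a slightly less similar memory from the right one.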
Salience: What Gets Remembered
Not every piece of information deserves the same treatment. Salience scoring determines what gets prioritized during retrieval:
// Salience weights based on cognitive psychology research
pub const SALIENCE_RECENCY_WEIGHT: f32 = 0.3; // Recent = more salient
pub const SALIENCE_FREQUENCY_WEIGHT: f32 = 0.25; // Frequently accessed = important
pub const SALIENCE_IMPORTANCE_WEIGHT: f32 = 0.25; // Explicit importance markers
pub const SALIENCE_SIMILARITY_WEIGHT: f32 = 0.2; // Semantic relevance
// Matches Ebbinghaus forgetting curve decay rates
pub const BASE_HALF_LIFE_HOURS: f64 = 24.0;

Practical Application
The end result: an agent that remembers what worked, forgets what didn't, and builds stronger associations between related concepts over time.
from shodh_memory import MemorySystem
memory = MemorySystem("./agent_memory")
# Record experiences
memory.record("User prefers dark mode", type="Context")
memory.record("API timeout when batch size > 100", type="Error")
memory.record("Switched to streaming for large requests", type="Decision")
# Retrieve relevant memories
results = memory.recall("API performance issues", limit=5)
# Returns: timeout error + streaming decision (high co-activation)

The Numbers
Some benchmarks from production use:
- 50-80ms retrieval latency on 10K+ memories
- ~90% token savings vs. stuffing full history into context
- 6MB binary, runs on Raspberry Pi 4
The architecture isn't novel - it's based on decades of cognitive science research. The contribution is packaging it into a practical system for AI agents.