"Why does my AI keep forgetting what I told it?" This is the #1 frustration with LLM applications. The culprit: context windows. Here's what's actually happening and how to fix it.
The Problem: Context Windows Are Finite
Every LLM has a context window: the maximum amount of text it can "see" at once. GPT-4 Turbo gives you 128K tokens; Claude gives you 200K. Sounds like a lot, right?
It's not. Here's why:
The Math Problem
- System prompt: ~500-2000 tokens
- Tool definitions: ~1000-5000 tokens
- Conversation history: grows with every message
- Retrieved documents (RAG): ~1000-10000 tokens
- Current user message + response: ~500-2000 tokens
After a few exchanges, you're already using 20-50K tokens. After an hour of work? You hit the limit.
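You can watch the budget drain with a quick back-of-the-envelope check. Here's a rough sketch using the tiktoken library; the fixed-overhead figures are illustrative, not measured:

```python
# Back-of-the-envelope token accounting (overhead figures are illustrative).
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")
CONTEXT_WINDOW = 128_000

conversation = [
    {"role": "user", "content": "Refactor the auth module to use JWTs."},
    {"role": "assistant", "content": "Sure, here's a plan: ..."},
    # ...imagine a few hundred more messages after an hour of work
]

fixed_overhead = 1_500 + 3_000 + 8_000  # system prompt + tools + RAG docs
history = sum(len(enc.encode(m["content"])) for m in conversation)

used = fixed_overhead + history
print(f"{used} / {CONTEXT_WINDOW} tokens ({used / CONTEXT_WINDOW:.1%})")
```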
What Happens When You Hit the Limit?
The typical solution is truncation: drop old messages to make room for new ones. The result? Your AI forgets:
- What you discussed an hour ago
- Decisions you made together
- Your preferences and context
- Errors it already helped you fix
Every session starts from scratch. You repeat yourself constantly.
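For reference, this is roughly what that truncation looks like; a minimal sketch, where count_tokens is a stand-in for whatever tokenizer you use:

```python
# Naive sliding-window truncation: drop the oldest messages until we fit.
def truncate(messages: list[dict], max_tokens: int, count_tokens) -> list[dict]:
    kept = list(messages)
    # The oldest messages go first, along with any decisions or fixes they held.
    while kept and sum(count_tokens(m["content"]) for m in kept) > max_tokens:
        kept.pop(0)
    return kept
```

Nothing in the dropped messages survives. The model isn't "forgetting" in any deep sense; the text is simply no longer in the prompt.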
Why RAG Isn't Enough
Retrieval-Augmented Generation (RAG) helps by fetching relevant documents. But it has limits:
RAG is good for:
- ✅ Static documents
- ✅ Knowledge bases
- ✅ FAQ-style retrieval
RAG can't:
- ❌ Remember conversations
- ❌ Learn preferences over time
- ❌ Track decisions and outcomes
- ❌ Build relationships between facts
RAG retrieves documents. Memory retrieves experiences. They solve different problems.
The Solution: Persistent Memory
Persistent memory is a separate system that stores what your AI learns—outside the context window. When needed, relevant memories are retrieved and injected into the prompt.
How It Works
1. Store: Important facts, decisions, and learnings go into memory
2. Index: Memories are embedded as vectors for semantic search
3. Retrieve: When relevant, memories are pulled into the prompt
4. Forget: Old, unused memories decay naturally (like human memory)
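Here's a toy version of those four steps; a minimal sketch using numpy, with a placeholder embed() standing in for a real embedding model:

```python
# Toy persistent-memory store: store -> index -> retrieve -> forget.
import time
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: hash words into a fixed-size vector. A real system
    # would call an embedding model here.
    vec = np.zeros(256)
    for word in text.lower().split():
        vec[hash(word) % 256] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

class MemoryStore:
    def __init__(self, half_life_days: float = 30.0):
        self.items: list[list] = []           # [vector, text, last_used]
        self.half_life = half_life_days * 86_400

    def store(self, text: str) -> None:
        # Store + index: embed once at write time.
        self.items.append([embed(text), text, time.time()])

    def recall(self, query: str, limit: int = 3) -> list[str]:
        # Retrieve: rank by similarity, discounted by a recency decay.
        q, now = embed(query), time.time()
        def score(item):
            vec, _, last_used = item
            recency = 0.5 ** ((now - last_used) / self.half_life)
            return float(vec @ q) * recency
        top = sorted(self.items, key=score, reverse=True)[:limit]
        for item in top:
            item[2] = now                     # recalling a memory refreshes it
        return [text for _, text, _ in top]

    def forget(self, threshold: float = 0.01) -> None:
        # Forget: drop memories whose decay factor has fallen too low.
        now = time.time()
        self.items = [i for i in self.items
                      if 0.5 ** ((now - i[2]) / self.half_life) > threshold]
```

A production store would persist to disk and use an approximate-nearest-neighbor index, but the store/index/retrieve/forget loop has the same shape.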
Memory vs RAG vs Fine-tuning
| Approach | Best For | Limitation |
|---|---|---|
| RAG | Static knowledge bases | Doesn't learn or remember |
| Fine-tuning | Permanent behavior changes | Expensive, slow, can't undo |
| Memory | Dynamic, personal context | Requires retrieval at runtime |
Most applications need all three: RAG for documents, fine-tuning for core behaviors, memory for personalization and learning.
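Composed together, that might look like the following hypothetical prompt assembly, where rag_search and memory are stand-ins for your retrieval layer and memory store:

```python
# Hypothetical prompt assembly: RAG for documents, memory for personal
# context. Core behavior (tone, domain conventions) lives in the
# fine-tuned model itself, not in the prompt.
def build_prompt(user_msg: str, rag_search, memory) -> str:
    docs = rag_search(user_msg, limit=3)         # static knowledge
    memories = memory.recall(user_msg, limit=3)  # dynamic, personal context
    return (
        "Relevant documents:\n" + "\n".join(docs) + "\n\n"
        "Context from memory:\n" + "\n".join(memories) + "\n\n"
        f"User question: {user_msg}"
    )
```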
Adding Memory in Practice
```python
from shodh_memory import Memory

memory = Memory()

# Store what you learn
memory.remember("User prefers TypeScript over JavaScript", memory_type="Decision")
memory.remember("Project deadline is January 15th", memory_type="Context")
memory.remember("Auth bug was caused by expired JWT", memory_type="Error")

# Later, retrieve relevant context
context = memory.recall("What's the deadline?", limit=3)
# Returns: "Project deadline is January 15th"

# Inject into your LLM prompt
prompt = f"""Context from memory:
{context}

User question: When do we need to ship?"""

# Your LLM now has persistent context
```

The Result
With persistent memory, your AI:
- Remembers preferences — No more "I prefer dark mode" every session
- Tracks decisions — "We decided to use PostgreSQL because..."
- Learns from errors — "Last time this failed because..."
- Builds context over time — Gets smarter the more you use it
Get Started
```bash
pip install shodh-memory
```

No cloud accounts. No API keys. Runs entirely on your machine.