
LLMs Are Stateless: The Problem No One Talks About

December 31, 2025 · 8 min read · By Shodh Team · AI Research
Tags: LLM, stateless, memory, context-window, persistent-memory, Claude, GPT

Here's an uncomfortable truth about large language models: they have no memory. Every API call to Claude, GPT-4, or any other LLM starts completely fresh. The model doesn't know what you discussed yesterday. It doesn't remember that you prefer TypeScript over JavaScript. It has no idea that you've explained your project architecture twelve times already.

The Illusion of Memory

When you chat with ChatGPT or Claude, it feels like they remember. You say something, they respond, you continue the conversation. But here's what's actually happening:

How LLM 'Memory' Actually Works
Turn 1: You send "Hello, I'm building a React app"
        → Model receives: "Hello, I'm building a React app"
        → Model responds

Turn 2: You send "What testing library should I use?"
        → Model receives: "Hello, I'm building a React app" +
                          "What testing library should I use?"
        → Model responds

Turn 3: You send "How do I mock API calls?"
        → Model receives: ALL previous messages + new message
        → Model responds

The "memory" is just the chat application re-sending the entire conversation every time. The model itself remembers nothing.

Why This Matters

1. Context Windows Are Finite

Every LLM has a context window—the maximum amount of text it can process at once. Claude's is 200K tokens. GPT-4's varies by version, topping out around 128K. Sounds like a lot, right? It's not.

A medium-sized codebase easily exceeds this. A few hours of conversation fills it. Once you hit the limit, old context gets dropped. The model literally forgets the beginning of your conversation.
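What "dropped" means in practice is simple truncation. Here's a toy illustration, using a crude word count in place of a real tokenizer: messages are kept from newest to oldest until the budget runs out, and everything older is silently discarded.

Dropping Old Context (Python)
MAX_TOKENS = 200_000  # e.g. Claude's context window

def rough_token_count(message: dict) -> int:
    # Crude stand-in for a real tokenizer.
    return len(message["content"].split())

def fit_to_window(history: list[dict]) -> list[dict]:
    kept, total = [], 0
    # Walk backwards from the newest message, dropping the oldest ones
    # once the budget is exceeded -- the model never sees them again.
    for message in reversed(history):
        total += rough_token_count(message)
        if total > MAX_TOKENS:
            break
        kept.append(message)
    return list(reversed(kept))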

2. Sessions End

Close the tab. Start a new chat. Switch to a different project. The model forgets everything.

That decision you made about your database schema? Gone. The bug you spent an hour debugging together? Vanished. The coding style preferences you established? Reset to defaults.

3. No Cross-Session Learning

Humans learn from experience. We remember that a particular approach worked well, or that a certain pattern caused problems.

LLMs can't do this. They don't learn that your team prefers composition over inheritance. They don't remember that the last three times you asked about authentication, you were using JWT. They rediscover the same patterns over and over.

"But What About RAG?"

Retrieval-Augmented Generation (RAG) is often proposed as the solution. Store documents in a vector database, retrieve relevant chunks, inject them into the prompt.

RAG is great for document Q&A. It's not memory.

RAG gives you:

  • Access to static documents
  • Semantic search over stored content
  • The ability to answer questions about your docs

RAG doesn't give you:

  • Memory of past interactions
  • Learned preferences that strengthen over time
  • Associations that form from usage patterns
  • Context that builds across sessions
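To make the contrast concrete, here is a minimal RAG-style flow, with hypothetical embed, search, and call_llm helpers standing in for a real embedding model, vector store, and LLM. Notice what never happens: nothing about the exchange is written back, so the next session starts from the same static documents.

RAG Has No Memory of the Interaction (Python)
def embed(text: str) -> list[float]:
    # Hypothetical embedding call -- a real system would use an embedding model.
    return [float(len(text))]

def search(vector: list[float], top_k: int = 3) -> list[str]:
    # Hypothetical vector-store lookup over a static document collection.
    return ["doc chunk 1", "doc chunk 2", "doc chunk 3"][:top_k]

def call_llm(prompt: str) -> str:
    # Placeholder for the actual model call.
    return f"(answer based on {len(prompt)} characters of prompt)"

def answer(question: str) -> str:
    chunks = search(embed(question))
    prompt = "Context:\n" + "\n".join(chunks) + f"\n\nQuestion: {question}"
    # Nothing about this exchange is stored anywhere -- no preferences,
    # no decisions, no associations.
    return call_llm(prompt)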

The Real Solution: Persistent Memory

What LLMs need is what humans have: a memory system that persists across sessions and learns from usage.

Memories that survive session boundaries

Persistent Memory (Python)
# Session 1
memory.remember("User prefers dark mode in all UIs", memory_type="Decision")

# Session 2 (days later)
results = memory.recall("UI preferences")
# Returns: "User prefers dark mode in all UIs"

Associations that form from co-retrieval

When you retrieve "React" and "TypeScript" together repeatedly, they should become associated. Query one, get the other.
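One way to picture this is the toy sketch below (an illustration, not Shodh Memory's internal implementation): count how often pairs of memories come back in the same recall, and treat pairs that cross a threshold as linked.

Associations from Co-Retrieval (Python)
from collections import defaultdict
from itertools import combinations

association = defaultdict(int)  # (memory_id_a, memory_id_b) -> co-retrieval count

def record_co_retrieval(retrieved_ids: list[str]) -> None:
    # Every time two memories are retrieved together, strengthen their link.
    for a, b in combinations(sorted(retrieved_ids), 2):
        association[(a, b)] += 1

def related(memory_id: str, threshold: int = 5) -> list[str]:
    # Pairs that co-activate often enough become stable links,
    # so querying one memory can surface the other.
    out = []
    for (a, b), strength in association.items():
        if strength >= threshold and memory_id in (a, b):
            out.append(b if a == memory_id else a)
    return out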

Importance that emerges from usage

Memories you access frequently should become more prominent. Memories you never use should fade. This is how biological memory works.
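Again as a rough sketch rather than any actual scoring formula: reinforce a memory each time it is accessed, and decay everything periodically so unused memories fade toward zero.

Usage-Based Importance (Python)
importance = {}  # memory_id -> importance score

def access(memory_id: str, boost: float = 1.0) -> None:
    # Each access makes the memory more prominent.
    importance[memory_id] = importance.get(memory_id, 0.0) + boost

def decay(rate: float = 0.95) -> None:
    # Called periodically: unused memories fade,
    # frequently accessed ones stay prominent.
    for memory_id in importance:
        importance[memory_id] *= rate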

How Shodh Memory Solves This

Shodh Memory provides exactly this: persistent, learning memory for LLMs and AI agents.

Shodh Memory Example (Python)
from shodh_memory import Memory

memory = Memory(storage_path="./project_memory")

# These survive forever
memory.remember("Project uses Next.js 14 with App Router", memory_type="Context")
memory.remember("Team decided on Prisma over Drizzle", memory_type="Decision")
memory.remember("Always use server components by default", memory_type="Learning")

# Semantic search - finds relevant memories
results = memory.recall("what ORM are we using?")
# Returns: "Team decided on Prisma over Drizzle"

# Associations form automatically from co-retrieval
# After 5+ co-activations, connections become permanent (LTP)

The MCP Integration

For Claude Code and Claude Desktop, Shodh Memory works as an MCP server:

MCP Configuration (JSON)
{
  "mcpServers": {
    "shodh-memory": {
      "command": "npx",
      "args": ["-y", "@shodh/memory-mcp"],
      "env": {
        "SHODH_API_KEY": "your-api-key"
      }
    }
  }
}

Now Claude remembers across sessions. No re-explanation. No context lost. Memory that persists.

The Future is Stateful

LLMs are incredibly powerful—but they're crippled by their statelessness. The models are there. The reasoning is there. What's missing is memory.

Shodh Memory provides that missing piece: a cognitive layer that turns stateless LLMs into learning systems.
