How to Build an Agent That Never Forgets

Three months ago, I was rejected from a technical interview because I couldn't build an agent that never forgets. Every approach I knew worked… until it didn't.

I walked into that room confident. I'd built chatbots. I understood embeddings. I knew how to use vector databases. But when the interviewer asked me to design an agent that could remember a user's preferences across weeks, not just within a single conversation, I froze.

My instinct was the standard playbook: store everything in a vector database and retrieve similar conversations when needed.

The questions that killed me were simple: What about scale? After a thousand sessions, how do you handle conflicting data? How do you stop it from faking memories just to fill the gaps?

I had no answer.

That failure forced me to dig into the problem properly, and here is what I found: most tutorials about "agents with memory" are really teaching how to implement RAG for memory. The problem isn't embeddings. It isn't token limits. It isn't even retrieval. The problem is that memory is infrastructure, not a feature.

Here is the entire system I built to solve it and the code I used to do it.

The Real Problem With "Standard" Memory

Here is what I thought memory meant: keeping the conversation history and stuffing it into the context window.

That works for about 10 exchanges. Then the context window fills up, so you truncate old messages. Now your agent forgets the user is vegan and recommends a steakhouse. You realize conversation history isn't memory; it's just a chat log.

"Fine," I thought. "I'll embed every message and retrieve relevant ones using similarity search."

This worked better. For a while. But after two weeks, the vector database had 500 entries. When the user asked, "What did I tell you about my work situation?" the retrieval system returned fragments from 12 different conversations.
The agent saw:

"I love my job" (Week 1)
"I'm thinking about quitting" (Week 2)
"My manager is supportive" (Week 1)
"My manager micromanages everything" (Week 2)

Which one is true? The agent had no idea. It hallucinated a synthesis: "You love your supportive manager but you're thinking about quitting because of micromanagement."

Completely wrong. The user had switched jobs between Week 1 and Week 2.

This is the crucial realization: embeddings measure similarity, not truth. Vector databases have a blind spot: they don't understand time, context, or updates. They just return text that is mathematically close to what you asked for. That isn't remembering; it's guessing.

The fix required a mental shift. Memory isn't a hard drive. It's a process. You can't just store data; you have to give it a lifespan and let it evolve.

Short-Term Memory: The Solved Problem

Before tackling the hard part (long-term memory), we need to handle short-term continuity. Short-term memory is the ability to remember what was said 30 seconds ago. This is actually a solved problem, and the solution is checkpointing.

Every agent operates as a state machine. It receives input, updates internal state, calls tools, generates output, and updates state again. A checkpoint is a snapshot of this entire state at a specific moment. This gives you three capabilities:

Determinism: Replay any conversation.
Recoverability: Resume exactly where you left off if the agent crashes.
Debuggability: Rewind to inspect the agent's "thoughts."

In production, I use Postgres-backed checkpointers.

This handles the "now." But checkpoints are ephemeral. They don't build wisdom. For that, we need long-term architectures.

Long-Term Memory Architectures

After months of failure, I found two architectures that actually work.

Architecture A: File-Based Memory (The Self-Organizing System)

This mimics how humans categorize knowledge. It works best for assistants, therapists, or companions.
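Before going further into Architecture A, here is a rough sketch of the checkpointing idea from the short-term memory section: save a snapshot of the agent's state after every step, then load the latest one to resume or an earlier one to rewind. This is a minimal illustration only, not the production setup described above. It swaps Postgres for Python's built-in sqlite3 so it runs anywhere, and the class name SqliteCheckpointer, the table layout, and the thread_id/step keys are all assumptions made for this example.

```python
import json
import sqlite3
from typing import Any, Optional


class SqliteCheckpointer:
    """Toy checkpointer (hypothetical): one row per (thread_id, step) snapshot."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            " thread_id TEXT NOT NULL,"
            " step INTEGER NOT NULL,"
            " state TEXT NOT NULL,"
            " PRIMARY KEY (thread_id, step))"
        )

    def save(self, thread_id: str, state: dict[str, Any]) -> int:
        """Append a snapshot of the full agent state; return its step number."""
        row = self.conn.execute(
            "SELECT COALESCE(MAX(step), -1) FROM checkpoints WHERE thread_id = ?",
            (thread_id,),
        ).fetchone()
        step = row[0] + 1
        self.conn.execute(
            "INSERT INTO checkpoints VALUES (?, ?, ?)",
            (thread_id, step, json.dumps(state)),
        )
        self.conn.commit()
        return step

    def load(
        self, thread_id: str, step: Optional[int] = None
    ) -> Optional[dict[str, Any]]:
        """Return the latest snapshot, or the one at `step` when rewinding."""
        if step is None:
            row = self.conn.execute(
                "SELECT state FROM checkpoints WHERE thread_id = ?"
                " ORDER BY step DESC LIMIT 1",
                (thread_id,),
            ).fetchone()
        else:
            row = self.conn.execute(
                "SELECT state FROM checkpoints WHERE thread_id = ? AND step = ?",
                (thread_id, step),
            ).fetchone()
        return json.loads(row[0]) if row else None


if __name__ == "__main__":
    cp = SqliteCheckpointer()
    cp.save("user-42", {"messages": ["hi"], "pending_tool": None})
    cp.save("user-42", {"messages": ["hi", "hello!"], "pending_tool": None})
    print(cp.load("user-42"))          # recoverability: resume from the latest state
    print(cp.load("user-42", step=0))  # debuggability: rewind to an earlier state
```

Because every step is kept rather than overwritten, the same table covers replay, crash recovery, and rewinding. A real deployment would also need to handle concurrent writers and state that is not JSON-serializable, which this sketch ignores.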
The Three-Layer Hierarchy:

Layer 1: Resources (Raw Data). The source of truth. Unprocessed logs, uploads, transcripts. Immutable and timestamped.
Layer 2: Items (Atomic Facts). Discrete facts extracted from resources ("User prefers Python," "User is allergic to shellfish").
Layer 3: Categories (Evolving Summaries). The high-level context. Items are grouped into files like work_preferences.md or personal_life.md.

The Write Path (Active Memorization):

When new information arrives, the system doesn't just file it away; it processes it. It pulls up the existing summary for that category and actively weaves the new detail into the narrative. This handles contradictions automatically: if a user mentions they've switched to Rust, the system doesn't just add "Rust" to the list; it rewrites the profile to replace the old preference.

The Read Path (Tiered Retrieval):

To save tokens, you don't pull everything.

Pull category summaries.
Ask the LLM: "Is this enough?"
If yes -> Respond.
If no -> Drill down into specific items.

This works beautifully for narrative coherence. But it struggles with complex relationships. For that, you need graphs.

Architecture B: Context-Graph M
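Going back to Architecture A for a moment: the three layers, the write path, and the tiered read path described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's implementation. The names FileMemory, memorize, and recall are hypothetical, the LLM is any function that takes a prompt and returns text, category "files" are held in a dict instead of on disk, and the caller supplies the extracted fact and its category (a real system would presumably ask the LLM to do that extraction).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

LLM = Callable[[str], str]  # assumption: prompt in, completion out


@dataclass(frozen=True)
class Resource:
    """Layer 1: immutable, timestamped raw data."""
    text: str
    created_at: datetime


@dataclass
class Item:
    """Layer 2: one atomic fact, linked back to the resource it came from."""
    fact: str
    category: str
    source: Resource


@dataclass
class FileMemory:
    llm: LLM
    resources: list[Resource] = field(default_factory=list)
    items: list[Item] = field(default_factory=list)
    # Layer 3: category file name -> evolving summary (kept in memory here).
    categories: dict[str, str] = field(default_factory=dict)

    def memorize(self, text: str, fact: str, category: str) -> None:
        """Write path: keep the raw text, record the fact, rewrite the summary."""
        resource = Resource(text, datetime.now(timezone.utc))
        self.resources.append(resource)
        self.items.append(Item(fact, category, resource))
        current = self.categories.get(category, "")
        prompt = (
            "Rewrite this summary so it reflects the new fact. "
            "If the fact contradicts the summary, the new fact wins.\n"
            f"Summary: {current}\nNew fact: {fact}"
        )
        self.categories[category] = self.llm(prompt)

    def recall(self, question: str) -> str:
        """Read path: try summaries first; add items only if they fall short."""
        summaries = "\n".join(f"{k}: {v}" for k, v in self.categories.items())
        verdict = self.llm(
            f"Question: {question}\nSummaries:\n{summaries}\n"
            "Is this enough to answer? Reply YES or NO."
        )
        if verdict.strip().upper().startswith("YES"):
            return summaries
        facts = "\n".join(f"- {item.fact}" for item in self.items)
        return f"{summaries}\n{facts}"


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs offline."""
    if "Reply YES or NO" in prompt:
        return "NO"  # always force the drill-down branch
    return prompt.rsplit("New fact: ", 1)[1]  # "rewrite" = keep only the new fact


if __name__ == "__main__":
    memory = FileMemory(llm=fake_llm)
    memory.memorize("I mostly write Python.", "User prefers Python", "work_preferences.md")
    memory.memorize("I've switched to Rust.", "User prefers Rust", "work_preferences.md")
    print(memory.categories["work_preferences.md"])
    print(memory.recall("Which language does the user prefer?"))
```

The design point is that the summary is rewritten on every write rather than appended to, so the category file always carries the current belief, while the older fact stays in the items and resources layers as history.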
