Deep-Dive AI Tutorial Series · 2026

Agentic Memory

How AI agents remember, reason over relationships, and retrieve knowledge at scale — from raw text to navigable knowledge graphs.

01 · The Memory Problem

RAG — Two Approaches

A stateless agent forgets everything between conversations. RAG — Retrieval-Augmented Generation — solves this by storing knowledge externally and fetching only what's relevant, when it's needed.

Vector RAG

Stores text as numerical vectors. Retrieves by semantic similarity — "find things with similar meaning." Fast, scalable, and excellent at fuzzy search.

  • Chunks are isolated islands — no structural connections
  • Cannot follow chains of reasoning across facts
  • Ideal for: "find me docs about X"

Graph RAG

Stores entities and relationships as a knowledge graph. Retrieves by traversing connections — "follow the chain from A to B to C."

  • Explicitly models how concepts relate to each other
  • Enables multi-hop reasoning across the knowledge base
  • Ideal for: "how does A connect to B through C?"
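To make the contrast concrete, here is a minimal Python sketch of both retrieval styles. The documents, embeddings, and graph edges are toy values invented for illustration; no particular vector store or graph database is assumed.

import math

docs = {
    "doc_a": ("Bengaluru is a major tech hub.",       [0.9, 0.1, 0.2]),
    "doc_b": ("Acme Corp opened a Bengaluru office.", [0.8, 0.2, 0.1]),
    "doc_c": ("Priya joined Acme Corp in 2024.",      [0.1, 0.9, 0.3]),
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def vector_retrieve(query_vec, k=2):
    # Vector RAG: rank isolated chunks by similarity; no chunk knows the others.
    ranked = sorted(docs.items(), key=lambda kv: cosine(query_vec, kv[1][1]), reverse=True)
    return [text for _, (text, _) in ranked[:k]]

graph = {
    "Priya":     [("WORKS_AT", "Acme Corp")],
    "Acme Corp": [("LOCATED_IN", "Bengaluru")],
    "Bengaluru": [],
}

def multi_hop(start, hops=2):
    # Graph RAG: follow explicit edges hop by hop, from A to B to C.
    chain, node = [start], start
    for _ in range(hops):
        edges = graph.get(node, [])
        if not edges:
            break
        rel, node = edges[0]
        chain.append(f"{rel} -> {node}")
    return chain

print(vector_retrieve([0.85, 0.15, 0.15]))  # "find me docs about X"
print(multi_hop("Priya"))                   # Priya, WORKS_AT -> Acme Corp, LOCATED_IN -> Bengaluru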
The Detective Analogy

Vector RAG is a filing cabinet — search by similarity, retrieve relevant documents, each unaware of the others.

Graph RAG is a murder board — photos and strings connecting people, events, and locations. The connections are the intelligence.

Hybrid is the full detective operation. Cabinet for fast lookup, board for deep reasoning.

02 · The Foundation

Embedding Models

Before anything else works, text must become numbers. An embedding model converts any text into a vector such that similar meanings produce numerically close vectors.

01
Tokenise
"Bengaluru" → ["Ben","##gal","##uru"] → [4521, 892, 301]
02
Token Vectors
Each ID → row in embedding table → 768-dim vector
03
Transformer
Attention layers: every token reads every other token
04
Mean Pooling
Average N token vectors → 1 sentence embedding
05
Contrastive Training
InfoNCE loss on positive/negative text pairs
Cosine Similarity: sim(A, B) = (A · B) / (|A| × |B|)
A · B = dot product (sum of component-wise products)  |  |A| = magnitude = √(Σ aᵢ²)  |  Result ∈ [-1, +1]
InfoNCE Loss: L = −log [ exp(sim(A, B⁺)/τ) / Σⱼ exp(sim(A, Bⱼ)/τ) ]
B⁺ = similar pair  |  Bⱼ = all batch items (positives + negatives)  |  τ = temperature (0.05–0.1)
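A minimal NumPy sketch of steps 04 and 05 together with the two formulas above. The vectors are random stand-ins for real transformer outputs; only the arithmetic comes from the formulas.

import numpy as np

def mean_pool(token_vecs):
    # Step 04: average N token vectors into one sentence embedding.
    return token_vecs.mean(axis=0)

def cosine_sim(a, b):
    # sim(A, B) = (A · B) / (|A| × |B|), result in [-1, +1]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def info_nce(anchor, positive, negatives, tau=0.07):
    # L = -log( exp(sim(A, B+)/τ) / Σⱼ exp(sim(A, Bⱼ)/τ) )
    # Index 0 of sims holds the positive pair; the rest are negatives.
    sims = np.array([cosine_sim(anchor, positive)] +
                    [cosine_sim(anchor, n) for n in negatives]) / tau
    return float(np.logaddexp.reduce(sims) - sims[0])  # stable log-sum-exp

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 768))                  # five 768-dim token vectors
sentence = mean_pool(tokens)                        # one sentence embedding
positive = sentence + 0.01 * rng.normal(size=768)   # near-duplicate: a positive pair
negatives = [rng.normal(size=768) for _ in range(7)]
print(round(info_nce(sentence, positive, negatives), 4))  # small loss: the positive wins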
03 · Learning Mechanism

Backpropagation

The algorithm that asks: which weights caused the error, and by exactly how much should each one change? It computes all N gradients in a single backward pass — the reason training billion-parameter models is feasible.

→ Forward Pass

01 z = w₁x₁ + w₂x₂ + b  (weighted sum)
02 a = σ(z) = 1/(1+e⁻ᶻ)  (sigmoid activation)
03 L = (a − y)²  (MSE loss — how wrong are we?)
04 Save z, a for use in backward pass

← Backward Pass (Chain Rule)

01 ∂L/∂a = 2(a−y)  (how does loss change with output?)
02 ∂a/∂z = a(1−a)  (sigmoid derivative)
03 ∂L/∂z = (∂L/∂a) × (∂a/∂z)  (chain rule)
04 ∂L/∂wᵢ = (∂L/∂z) × xᵢ  (each weight's gradient scales with its input)
05 w ← w − η·(∂L/∂w)  (gradient descent update)
Worked Numerical Example — Single Step (learning rate η = 0.1)

Step       | Operation                                      | Value
Inputs     | x₁=2.0, x₂=3.0, w₁=0.5, w₂=−0.3, b=0.1, y=1.0  |
z          | 0.5(2.0) + (−0.3)(3.0) + 0.1                   | 0.2
a = σ(z)   | 1 / (1 + e⁻⁰·²)                                | 0.550
L = (a−y)² | (0.550 − 1.0)²                                 | 0.2025
∂L/∂a      | 2 × (0.550 − 1.0)                              | −0.900
∂a/∂z      | 0.550 × (1 − 0.550)                            | 0.2475
∂L/∂z      | −0.900 × 0.2475                                | −0.2228
∂L/∂w₁     | −0.2228 × 2.0                                  | −0.4456
∂L/∂w₂     | −0.2228 × 3.0                                  | −0.6683
New w₁     | 0.5 − 0.1 × (−0.4456)                          | 0.5446 ↑
New w₂     | −0.3 − 0.1 × (−0.6683)                         | −0.2332 ↑
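The whole table fits in a few lines of Python. This sketch reproduces the forward pass, backward pass, and update; the printed values match the table up to rounding (the table rounds a to 0.550 before propagating).

import math

x1, x2, w1, w2, b, y, eta = 2.0, 3.0, 0.5, -0.3, 0.1, 1.0, 0.1

# Forward pass
z = w1 * x1 + w2 * x2 + b            # weighted sum: 0.2
a = 1 / (1 + math.exp(-z))           # sigmoid: 0.5498
L = (a - y) ** 2                     # squared-error loss: 0.2026

# Backward pass (chain rule)
dL_da = 2 * (a - y)                  # -0.9003
da_dz = a * (1 - a)                  # 0.2475
dL_dz = dL_da * da_dz                # -0.2228
dL_dw1, dL_dw2 = dL_dz * x1, dL_dz * x2   # -0.4457, -0.6685

# Gradient descent update (η = 0.1)
w1_new = w1 - eta * dL_dw1           # 0.5446: weight moves up, as in the table
w2_new = w2 - eta * dL_dw2           # -0.2331: weight moves up, as in the table
print(f"z={z:.4f} a={a:.4f} L={L:.4f} w1'={w1_new:.4f} w2'={w2_new:.4f}")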
04 · Graph Organisation

Leiden Algorithm

After building the knowledge graph, the Leiden algorithm clusters it into communities — dense groups of related nodes. It maximises modularity Q while guaranteeing every community is internally connected.

Modularity: Q = (1/2m) × Σᵢⱼ [ Aᵢⱼ − (kᵢkⱼ/2m) ] × δ(cᵢ,cⱼ)
Aᵢⱼ = edge exists (1/0)  |  kᵢkⱼ/2m = expected edges (null model)  |  δ(cᵢ,cⱼ) = same community? (1/0)
Q near 1 = excellent community structure  |  Q near 0 = no better than random
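To see what Q measures, here is a small Python sketch that evaluates the formula directly on a toy graph: two triangles joined by a single bridge edge. Note this is the objective Leiden maximises, not the Leiden algorithm itself.

import numpy as np

def modularity(A, communities):
    # Q = (1/2m) Σᵢⱼ [Aᵢⱼ − kᵢkⱼ/2m] δ(cᵢ, cⱼ) for an undirected graph.
    k = A.sum(axis=1)                # node degrees
    two_m = A.sum()                  # 2m: total degree = twice the edge count
    Q = 0.0
    for i in range(len(A)):
        for j in range(len(A)):
            if communities[i] == communities[j]:
                Q += A[i, j] - k[i] * k[j] / two_m
    return Q / two_m

# Two triangles (nodes 0-2 and 3-5) joined by one bridge edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

print(modularity(A, [0, 0, 0, 1, 1, 1]))   # ≈ 0.357: clear community structure
print(modularity(A, [0, 1, 0, 1, 0, 1]))   # ≈ −0.214: worse than random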
05 · Fast Retrieval

HNSW Index

Hierarchical Navigable Small World graphs make nearest-neighbour search on millions of vectors run in milliseconds. The key insight: long-range shortcuts for fast global navigation, dense local edges for precise fine-grained search.

[Diagram: the HNSW layer hierarchy. Layer 2 (long-range links), Layer 1 (medium-range), Layer 0 (all vectors); the query path descends greedily from the sparse top layer to the dense bottom.]
Layer assignment: l_max = ⌊ −ln(uniform(0,1)) × m_L ⌋
Search complexity: O( log(n) × ef × d / log(M) )
m_L = 1/ln(M)  |  M = connections per node  |  ef = candidate list size  |  d = dimensions
Result: 10M vectors, 1536 dims → ~2ms per query vs several seconds brute force (1000x+ speedup)
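The layer-assignment formula is easy to sanity-check in Python. This sketch samples a million layer assignments with M = 16 and shows the exponential decay that gives HNSW its skip-list shape.

import math, random
from collections import Counter

M = 16                          # connections per node
m_L = 1 / math.log(M)           # m_L = 1/ln(M), from the formula above

def assign_layer():
    # l_max = ⌊ −ln(uniform(0,1)) × m_L ⌋; 1 − random() keeps u in (0, 1]
    u = 1.0 - random.random()
    return math.floor(-math.log(u) * m_L)

random.seed(42)
counts = Counter(assign_layer() for _ in range(1_000_000))
for layer in sorted(counts):
    print(f"layer {layer}: {counts[layer]:>7}")
# Each layer holds roughly 1/M of the layer below: ~937k, ~59k, ~3.7k, ...

Because each layer is roughly 1/M the size of the one below, greedy search can take long hops at the sparse top and refine at the dense bottom.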
06 · The Complete System

Grand Unified Architecture

Every algorithm in this guide is an essential gear in the same machine. Here is the complete flow — from new information arriving to an answer being generated.

01
New text arrives (conversation, document)
02
Tokenise + embed into a vector
Embedding Model (Transformer + mean pooling)
03
Embedding model weights were learned by
Backpropagation + InfoNCE Loss
04
Store embedding for fast future retrieval
HNSW Index (O(log n) search)
05
Extract entities and relationships from text
LLM extraction + coreference resolution
06
Write nodes and edges to knowledge graph
Graph Database (Neo4j / Neptune)
07
Cluster graph into topic community hierarchy
Leiden Algorithm (modularity maximisation)
08
Generate natural-language community summaries
LLM Summarisation
09
Query arrives — embed query with same model
Embedding Model
10
Find nearest stored vectors in ~2ms
HNSW Search
11
Traverse graph for entity context (local queries)
Graph Traversal → Local Search
12
Retrieve community summaries (global queries)
GraphRAG Global Search
13
Synthesise all context → answer
LLM Generation
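As a closing sketch, here is a runnable toy skeleton of the whole loop. Every component is a deliberately trivial stand-in (vowel-count "embeddings", hand-supplied triples, a linear scan instead of HNSW, no LLM calls), and community summarisation (steps 07, 08, and 12) is omitted for brevity; the comments map each line to the numbered step it replaces.

from collections import defaultdict

def embed(text):
    # Steps 01-03 stand-in; a real system uses the transformer embedding model.
    return [text.lower().count(c) for c in "aeiou"]

def nearest(index, q_vec, k=2):
    # Step 10 stand-in; a real system queries an HNSW index instead of scanning.
    dist = lambda item: sum((a - b) ** 2 for a, b in zip(item[0], q_vec))
    return [text for _, text in sorted(index, key=dist)[:k]]

vector_index = []             # step 04: (vector, text) pairs
graph = defaultdict(list)     # step 06: adjacency list of (relation, target)

def ingest(text, triples):
    vector_index.append((embed(text), text))   # step 04
    for subj, rel, obj in triples:             # step 05: normally LLM extraction
        graph[subj].append((rel, obj))         # step 06

def answer(query):
    hits = nearest(vector_index, embed(query))                    # steps 09-10
    context = {w: graph[w] for w in query.split() if w in graph}  # step 11
    return hits, context                       # step 13: normally LLM synthesis

ingest("Priya joined Acme Corp in Bengaluru.",
       [("Priya", "WORKS_AT", "Acme Corp"), ("Acme Corp", "LOCATED_IN", "Bengaluru")])
print(answer("where does Priya work"))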
Key Insight

Vectors find the right neighbourhood in the knowledge base. Graphs navigate the streets and connections within that neighbourhood. Leiden organises the map. HNSW makes the search instant. Backpropagation is what taught the model to read the map at all.