Deep-Dive AI Tutorial Series · 2026

Agentic Memory

How AI agents remember, reason over relationships, and retrieve knowledge at scale — from raw text to navigable knowledge graphs.

01 · The Memory Problem

RAG — Two Approaches

A stateless agent forgets everything between conversations. RAG — Retrieval-Augmented Generation — solves this by storing knowledge externally and fetching only what's relevant, when it's needed.

Vector RAG

Stores text as numerical vectors. Retrieves by semantic similarity — "find things with similar meaning." Fast, scalable, and excellent at fuzzy search.

  • Chunks are isolated islands — no structural connections
  • Cannot follow chains of reasoning across facts
  • Ideal for: "find me docs about X"

Graph RAG

Stores entities and relationships as a knowledge graph. Retrieves by traversing connections — "follow the chain from A to B to C."

  • Explicitly models how concepts relate to each other
  • Enables multi-hop reasoning across the knowledge base
  • Ideal for: "how does A connect to B through C?"
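To make the contrast concrete, here is a minimal Python sketch of both retrieval styles. The documents, embeddings, and graph edges are toy values invented for illustration; no particular vector store or graph database is assumed.

import math

docs = {
    "doc_a": ("Bengaluru is a major tech hub.",       [0.9, 0.1, 0.2]),
    "doc_b": ("Acme Corp opened a Bengaluru office.", [0.8, 0.2, 0.1]),
    "doc_c": ("Priya joined Acme Corp in 2024.",      [0.1, 0.9, 0.3]),
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def vector_retrieve(query_vec, k=2):
    # Vector RAG: rank isolated chunks by similarity; no chunk knows the others.
    ranked = sorted(docs.items(), key=lambda kv: cosine(query_vec, kv[1][1]), reverse=True)
    return [text for _, (text, _) in ranked[:k]]

graph = {
    "Priya":     [("WORKS_AT", "Acme Corp")],
    "Acme Corp": [("LOCATED_IN", "Bengaluru")],
    "Bengaluru": [],
}

def multi_hop(start, hops=2):
    # Graph RAG: follow explicit edges hop by hop, from A to B to C.
    chain, node = [start], start
    for _ in range(hops):
        edges = graph.get(node, [])
        if not edges:
            break
        rel, node = edges[0]
        chain.append(f"{rel} -> {node}")
    return chain

print(vector_retrieve([0.85, 0.15, 0.15]))  # "find me docs about X"
print(multi_hop("Priya"))                   # Priya, WORKS_AT -> Acme Corp, LOCATED_IN -> Bengaluru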
The Detective Analogy

Vector RAG is a filing cabinet — search by similarity, retrieve relevant documents, each unaware of the others.

Graph RAG is a murder board — photos and strings connecting people, events, and locations. The connections are the intelligence.

Hybrid is the full detective operation. Cabinet for fast lookup, board for deep reasoning.

02 · The Foundation

Embedding Models

Before anything else works, text must become numbers. An embedding model converts any text into a vector such that similar meanings produce numerically close vectors.

01
Tokenise
"Bengaluru" → ["Ben","##gal","##uru"] → [4521, 892, 301]
02
Token Vectors
Each ID → row in embedding table → 768-dim vector
03
Transformer
Attention layers: every token reads every other token
04
Mean Pooling
Average N token vectors → 1 sentence embedding
05
Contrastive Training
InfoNCE loss on positive/negative text pairs
Cosine Similarity: sim(A, B) = (A · B) / (|A| × |B|)
A · B = dot product (sum of component-wise products)  |  |A| = magnitude = √(Σ aᵢ²)  |  Result ∈ [-1, +1]
InfoNCE Loss: L = −log [ exp(sim(A, B⁺)/τ) / Σⱼ exp(sim(A, Bⱼ)/τ) ]
B⁺ = similar pair  |  Bⱼ = all batch items (positives + negatives)  |  τ = temperature (0.05–0.1)
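A minimal NumPy sketch of steps 04 and 05 together with the two formulas above. The vectors are random stand-ins for real transformer outputs; only the arithmetic comes from the formulas.

import numpy as np

def mean_pool(token_vecs):
    # Step 04: average N token vectors into one sentence embedding.
    return token_vecs.mean(axis=0)

def cosine_sim(a, b):
    # sim(A, B) = (A · B) / (|A| × |B|), result in [-1, +1]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def info_nce(anchor, positive, negatives, tau=0.07):
    # L = -log( exp(sim(A, B+)/τ) / Σⱼ exp(sim(A, Bⱼ)/τ) )
    # Index 0 of sims holds the positive pair; the rest are negatives.
    sims = np.array([cosine_sim(anchor, positive)] +
                    [cosine_sim(anchor, n) for n in negatives]) / tau
    return float(np.logaddexp.reduce(sims) - sims[0])  # stable log-sum-exp

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 768))                  # five 768-dim token vectors
sentence = mean_pool(tokens)                        # one sentence embedding
positive = sentence + 0.01 * rng.normal(size=768)   # near-duplicate: a positive pair
negatives = [rng.normal(size=768) for _ in range(7)]
print(round(info_nce(sentence, positive, negatives), 4))  # small loss: the positive wins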
03 · Learning Mechanism

Backpropagation

The algorithm that asks: which weights caused the error, and by exactly how much should each one change? It computes all N gradients in a single backward pass — the reason training billion-parameter models is feasible.

→ Forward Pass

01 z = w₁x₁ + w₂x₂ + b  (weighted sum)
02 a = σ(z) = 1/(1+e⁻ᶻ)  (sigmoid activation)
03 L = (a − y)²  (MSE loss — how wrong are we?)
04 Save z, a for use in backward pass

← Backward Pass (Chain Rule)

01 ∂L/∂a = 2(a−y)  (how does loss change with output?)
02 ∂a/∂z = a(1−a)  (sigmoid derivative)
03 ∂L/∂z = (∂L/∂a) × (∂a/∂z)  (chain rule)
04 ∂L/∂wᵢ = (∂L/∂z) × xᵢ  (each weight's gradient scales with its input)
05 w ← w − η·(∂L/∂w)  (gradient descent update)
Worked Numerical Example — Single Step (learning rate η = 0.1)

Step       | Operation                                      | Value
Inputs     | x₁=2.0, x₂=3.0, w₁=0.5, w₂=−0.3, b=0.1, y=1.0  |
z          | 0.5(2.0) + (−0.3)(3.0) + 0.1                   | 0.2
a = σ(z)   | 1 / (1 + e⁻⁰·²)                                | 0.550
L = (a−y)² | (0.550 − 1.0)²                                 | 0.2025
∂L/∂a      | 2 × (0.550 − 1.0)                              | −0.900
∂a/∂z      | 0.550 × (1 − 0.550)                            | 0.2475
∂L/∂z      | −0.900 × 0.2475                                | −0.2228
∂L/∂w₁     | −0.2228 × 2.0                                  | −0.4456
∂L/∂w₂     | −0.2228 × 3.0                                  | −0.6683
New w₁     | 0.5 − 0.1 × (−0.4456)                          | 0.5446 ↑
New w₂     | −0.3 − 0.1 × (−0.6683)                         | −0.2332 ↑
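The whole table fits in a few lines of Python. This sketch reproduces the forward pass, backward pass, and update; the printed values match the table up to rounding (the table rounds a to 0.550 before propagating).

import math

x1, x2, w1, w2, b, y, eta = 2.0, 3.0, 0.5, -0.3, 0.1, 1.0, 0.1

# Forward pass
z = w1 * x1 + w2 * x2 + b            # weighted sum: 0.2
a = 1 / (1 + math.exp(-z))           # sigmoid: 0.5498
L = (a - y) ** 2                     # squared-error loss: 0.2026

# Backward pass (chain rule)
dL_da = 2 * (a - y)                  # -0.9003
da_dz = a * (1 - a)                  # 0.2475
dL_dz = dL_da * da_dz                # -0.2228
dL_dw1, dL_dw2 = dL_dz * x1, dL_dz * x2   # -0.4457, -0.6685

# Gradient descent update (η = 0.1)
w1_new = w1 - eta * dL_dw1           # 0.5446: weight moves up, as in the table
w2_new = w2 - eta * dL_dw2           # -0.2331: weight moves up, as in the table
print(f"z={z:.4f} a={a:.4f} L={L:.4f} w1'={w1_new:.4f} w2'={w2_new:.4f}")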
04 · Graph Organisation

Leiden Algorithm

After building the knowledge graph, the Leiden algorithm clusters it into communities — dense groups of related nodes. It maximises modularity Q while guaranteeing every community is internally connected.

Modularity: Q = (1/2m) × Σᵢⱼ [ Aᵢⱼ − (kᵢkⱼ/2m) ] × δ(cᵢ,cⱼ)
Aᵢⱼ = edge exists (1/0)  |  kᵢkⱼ/2m = expected edges (null model)  |  δ(cᵢ,cⱼ) = same community? (1/0)
Q near 1 = excellent community structure  |  Q near 0 = no better than random
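To see what Q measures, here is a small Python sketch that evaluates the formula directly on a toy graph: two triangles joined by a single bridge edge. Note this is the objective Leiden maximises, not the Leiden algorithm itself.

import numpy as np

def modularity(A, communities):
    # Q = (1/2m) Σᵢⱼ [Aᵢⱼ − kᵢkⱼ/2m] δ(cᵢ, cⱼ) for an undirected graph.
    k = A.sum(axis=1)                # node degrees
    two_m = A.sum()                  # 2m: total degree = twice the edge count
    Q = 0.0
    for i in range(len(A)):
        for j in range(len(A)):
            if communities[i] == communities[j]:
                Q += A[i, j] - k[i] * k[j] / two_m
    return Q / two_m

# Two triangles (nodes 0-2 and 3-5) joined by one bridge edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

print(modularity(A, [0, 0, 0, 1, 1, 1]))   # ≈ 0.357: clear community structure
print(modularity(A, [0, 1, 0, 1, 0, 1]))   # ≈ −0.214: worse than random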
05 · Fast Retrieval

HNSW Index

Hierarchical Navigable Small World graphs make nearest-neighbour search on millions of vectors run in milliseconds. The key insight: long-range shortcuts for fast global navigation, dense local edges for precise fine-grained search.

[Diagram: the HNSW layer hierarchy. Layer 2 (long-range links), Layer 1 (medium-range), Layer 0 (all vectors); the query path descends greedily from the sparse top layer to the dense bottom.]
Layer assignment: l_max = ⌊ −ln(uniform(0,1)) × m_L ⌋
Search complexity: O( log(n) × ef × d / log(M) )
m_L = 1/ln(M)  |  M = connections per node  |  ef = candidate list size  |  d = dimensions
Result: 10M vectors, 1536 dims → ~2ms per query vs several seconds brute force (1000x+ speedup)
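The layer-assignment formula is easy to sanity-check in Python. This sketch samples a million layer assignments with M = 16 and shows the exponential decay that gives HNSW its skip-list shape.

import math, random
from collections import Counter

M = 16                          # connections per node
m_L = 1 / math.log(M)           # m_L = 1/ln(M), from the formula above

def assign_layer():
    # l_max = ⌊ −ln(uniform(0,1)) × m_L ⌋; 1 − random() keeps u in (0, 1]
    u = 1.0 - random.random()
    return math.floor(-math.log(u) * m_L)

random.seed(42)
counts = Counter(assign_layer() for _ in range(1_000_000))
for layer in sorted(counts):
    print(f"layer {layer}: {counts[layer]:>7}")
# Each layer holds roughly 1/M of the layer below: ~937k, ~59k, ~3.7k, ...

Because each layer is roughly 1/M the size of the one below, greedy search can take long hops at the sparse top and refine at the dense bottom.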
06 · The Complete System

Grand Unified Architecture

Every algorithm in this guide is an essential gear in the same machine. Here is the complete flow — from new information arriving to an answer being generated.

01
New text arrives (conversation, document)
02
Tokenise + embed into a vector
Embedding Model (Transformer + mean pooling)
03
Embedding model weights were learned by
Backpropagation + InfoNCE Loss
04
Store embedding for fast future retrieval
HNSW Index (O(log n) search)
05
Extract entities and relationships from text
LLM extraction + coreference resolution
06
Write nodes and edges to knowledge graph
Graph Database (Neo4j / Neptune)
07
Cluster graph into topic community hierarchy
Leiden Algorithm (modularity maximisation)
08
Generate natural-language community summaries
LLM Summarisation
09
Query arrives — embed query with same model
Embedding Model
10
Find nearest stored vectors in ~2ms
HNSW Search
11
Traverse graph for entity context (local queries)
Graph Traversal → Local Search
12
Retrieve community summaries (global queries)
GraphRAG Global Search
13
Synthesise all context → answer
LLM Generation
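As a closing sketch, here is a runnable toy skeleton of the whole loop. Every component is a deliberately trivial stand-in (vowel-count "embeddings", hand-supplied triples, a linear scan instead of HNSW, no LLM calls), and community summarisation (steps 07, 08, and 12) is omitted for brevity; the comments map each line to the numbered step it replaces.

from collections import defaultdict

def embed(text):
    # Steps 01-03 stand-in; a real system uses the transformer embedding model.
    return [text.lower().count(c) for c in "aeiou"]

def nearest(index, q_vec, k=2):
    # Step 10 stand-in; a real system queries an HNSW index instead of scanning.
    dist = lambda item: sum((a - b) ** 2 for a, b in zip(item[0], q_vec))
    return [text for _, text in sorted(index, key=dist)[:k]]

vector_index = []             # step 04: (vector, text) pairs
graph = defaultdict(list)     # step 06: adjacency list of (relation, target)

def ingest(text, triples):
    vector_index.append((embed(text), text))   # step 04
    for subj, rel, obj in triples:             # step 05: normally LLM extraction
        graph[subj].append((rel, obj))         # step 06

def answer(query):
    hits = nearest(vector_index, embed(query))                    # steps 09-10
    context = {w: graph[w] for w in query.split() if w in graph}  # step 11
    return hits, context                       # step 13: normally LLM synthesis

ingest("Priya joined Acme Corp in Bengaluru.",
       [("Priya", "WORKS_AT", "Acme Corp"), ("Acme Corp", "LOCATED_IN", "Bengaluru")])
print(answer("where does Priya work"))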
Key Insight

Vectors find the right neighbourhood in the knowledge base. Graphs navigate the streets and connections within that neighbourhood. Leiden organises the map. HNSW makes the search instant. Backpropagation is what taught the model to read the map at all.