🧠 Vector Search (HNSW) — AI / ML Interview Guide

LLM Applications · interactive visualization + interview prep

Open the interactive Vector Search (HNSW) visualization on PrepGrind → Step through a live animation, tune the parameters, and read the full theory, math, reference code, and interview Q&A below — free, in your browser.

What it is

Finding the nearest vector by comparing the query to all N vectors is too slow at millions of items. HNSW builds a layered "small-world" graph: sparse long hops at the top get you near the right region fast, then denser layers refine the search — so you only touch a handful of nodes instead of all of them.

Mental model

Searching a million vectors one-by-one is the "drive every road" problem. HNSW builds a layered map: the TOP layer is a sparse highway network of a few far-apart hubs; each lower layer adds more local streets. A search starts on the highway to jump near the target REGION in a few big hops, then descends layer by layer to local streets to pin the exact nearest neighbor — touching a handful of nodes instead of all of them.

Theory

Exact nearest-neighbor search compares the query to all N vectors — O(N·d) per query, which is hopeless at millions/billions of items and many queries per second. So vector search accepts APPROXIMATE nearest neighbors (ANN): trade a little recall (occasionally missing the true closest) for orders-of-magnitude speed. HNSW is the dominant ANN graph index.

The core engine is greedy search on a "small-world" proximity graph: from a starting node, repeatedly move to whichever neighbor is closest to the query, stopping when no neighbor is closer. On a well-connected graph this reaches a near-neighbor quickly, but on a single flat graph it needs many short steps to cross the space.

The hierarchy fixes that. Each node is assigned a random maximum level from a geometric distribution, so few nodes reach high levels. Upper layers are sparse graphs of only the high-level nodes, giving long-range "highway" hops; lower layers are progressively denser. Search enters at the top, greedily hops to get near the region, then DESCENDS a layer and refines — reaching the neighborhood in roughly O(log N) hops.

Two parameters tune the recall / latency / memory triangle. M is the number of neighbors per node (graph degree): higher M raises recall and memory. efSearch is the size of the candidate list kept during search: higher efSearch raises recall but slows queries (efConstruction does the same at build time). There is no free lunch — you pick the operating point your application needs.

Versus alternatives: IVF (inverted file) clusters vectors and only scans a few cells — lower memory, easy to shard, slightly lower recall — and is often combined with Product Quantization (PQ) to compress vectors. HNSW gives the best recall/latency but is memory-heavy (M links per vector) and costlier to update incrementally. Knowing this trade-off is the typical interview payoff.

Concrete example

A vector DB (Pinecone, pgvector, FAISS-HNSW) storing 10M document embeddings. A RAG query embeds the question and asks for the top-k nearest chunks; HNSW returns them in milliseconds by hopping through the graph instead of scanning all 10M.

Key equations

each node gets a random max level ℓ ~ ⌊−ln(U)·mL⌋ (few nodes reach high levels)
layer ℓ = a proximity graph over nodes with level ≥ ℓ
search: enter at the top layer’s entry point
greedily hop to the neighbor closest to the query; when none is closer, descend
repeat down to layer 0 → approximate nearest neighbor (ANN)

Step by step

Nodes are vectors; bigger/brighter = higher max layer.
Search starts at the entry point on the top layer.
At each layer, greedily move to the neighbor nearest the query (cyan hops).
Descend a layer and repeat — the search zooms in.
Compare distance comparisons made vs brute-force N — that gap is the speedup.

Interview questions & answers

Why is HNSW approximate, not exact?

Greedy graph traversal can settle in a local optimum and miss the true nearest neighbor. You trade a little recall for a huge speedup; parameters (efSearch, M) tune that trade-off.

What do M and efSearch control?

M = neighbors per node (graph degree): higher M ⇒ better recall, more memory. efSearch = size of the candidate list during search: higher ⇒ better recall, slower. They tune the accuracy/latency/memory trade-off.

HNSW vs IVF (inverted file) indexes?

HNSW is a graph index: great recall/latency, higher memory, costly updates. IVF clusters vectors and scans a few cells: lower memory, easy to shard, slightly lower recall. Many systems combine IVF + PQ for compression.

Why the hierarchical layers instead of one flat graph?

The sparse upper layers provide long-range jumps so the search reaches the right neighborhood in O(log N) hops, instead of walking many short edges on a single dense graph.

Common pitfalls

Assuming ANN results are exact — validate recall for your use case.
Ignoring memory: HNSW graphs are large (M links per vector).
Rebuilding from scratch on every update; plan for incremental inserts/deletes.

Where it shows up

Vector DBs: Pinecone, Weaviate, Qdrant, Milvus, pgvector, FAISS (HNSW)
Semantic search / RAG retrieval at scale
Recommendation & dedup over embeddings

More AI / ML interview concepts

PrepGrind runs entirely in your browser, free, no installation required. Loading the interactive playground…