How HNSW Works: A Visual Explanation
If you build vector search systems, HNSW appears everywhere: databases, retrieval APIs, and now browser-native libraries. But many explanations remain abstract. This article gives a practical mental model for developers who want to tune performance rather than memorize theory. We will walk through the layered graph idea, query traversal, insertion strategy, and key memory-speed trade-offs. We also explain why a 54KB WebAssembly implementation can still be useful in frontend products.
npm install altor-vec
Problem setup
Given a query vector, find the nearest vectors in a large set. Exact scan computes distance to every item, which is expensive as corpus size grows. Approximate nearest neighbor (ANN) methods reduce work while preserving high recall. HNSW is a graph-based ANN method with strong empirical performance and good practical tunability.
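A brute-force scan is the baseline HNSW is competing against. A minimal JavaScript sketch makes the cost concrete (function names here are illustrative, not part of any library):

```javascript
// Exact nearest-neighbor baseline: compute the distance to every vector.
// O(N * d) distance computations per query, plus a sort. Fine for small
// corpora, too slow once N reaches the hundreds of thousands.
function euclidean(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

function bruteForceKnn(corpus, query, k) {
  return corpus
    .map((vec, id) => ({ id, dist: euclidean(vec, query) }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, k);
}
```

Because it is exact, this scan also serves as ground truth when measuring the recall of an approximate index.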
Core intuition: layered navigable graph
Imagine a city map. Highways help you traverse long distances quickly; local streets help you find exact addresses. HNSW mirrors that idea: upper layers are sparse “highways,” lower layers are dense “streets.” Search starts at the top for coarse routing and descends to refine neighbors.
Layer 3 (very sparse):  o -------- o -------- o
                          \       /
Layer 2 (sparse):       o -- o -- o -- o -- o
                          \  |  /    \      |
Layer 1 (dense):        o-o-o-o-o-o-o-o-o-o-o-o
                        \|/\|/\|/\|/\|/\|/
Layer 0 (densest): full neighborhood graph for local refinement
Each node appears in layer 0 and possibly in higher layers, chosen probabilistically. Fewer nodes appear in higher layers, producing logarithmic-like navigation behavior in many datasets.
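The probabilistic layer assignment can be sketched in a few lines, using the common normalization mL = 1/ln(M) from the HNSW literature (names are my own):

```javascript
// Sample a node's maximum layer. P(level >= k) decays exponentially
// (here it equals M^-k), so most nodes live only in layer 0 and very
// few reach the upper "highway" layers.
function sampleLevel(M) {
  const mL = 1 / Math.log(M); // normalization constant
  return Math.floor(-Math.log(Math.random()) * mL);
}
```

With M = 16, roughly 15 of every 16 nodes stay at layer 0, which is what keeps the upper layers sparse enough to act as long-range shortcuts.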
Query algorithm step by step
- Pick an entry point at highest layer.
- Greedy walk: at each step, move to the neighbor closest to the query, as long as it improves on the current node.
- When no better neighbor exists at current layer, descend one level.
- At bottom layer, perform candidate expansion using a priority queue (ef_search).
- Return top-k closest candidates found.
function hnswSearch(query, topK, efSearch):
    ep = entryPointAtTopLayer
    for layer from maxLayer downto 1:
        ep = greedyDescent(layer, query, ep)    # coarse routing in sparse layers
    candidates = minPriorityQueue([ep])         # closest-first frontier
    visited = set([ep])
    best = boundedMaxHeap(capacity = efSearch)  # efSearch best results so far
    while candidates not empty:
        cur = candidates.popClosest(query)
        if best is full and dist(cur, query) > best.worstDistance():
            break                               # frontier can no longer improve best
        for n in neighbors(cur, layer = 0):
            if n in visited: continue
            visited.add(n)
            if best not full or dist(n, query) < best.worstDistance():
                candidates.push(n)
                best.push(n)                    # evicts current worst when full
    return best.topK(topK)
The “best frontier” logic is the reason HNSW usually outperforms naive greedy methods. It explores multiple promising candidates rather than committing to one path too early.
Index construction basics
Construction inserts vectors one by one. For each new vector:
- Sample its maximum layer from an exponential distribution.
- Navigate from top entry point to the insertion layer.
- At each layer, select M neighbors using heuristic diversity filtering.
- Add bidirectional edges, potentially pruning overflow links.
This process balances graph sparsity and local connectivity. Better graph quality yields higher recall at the same ef_search.
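The diversity filtering in the neighbor-selection step can be sketched as follows. This follows the heuristic described in the HNSW paper: a candidate is kept only if it is closer to the new node than to every neighbor already selected (the names below are illustrative, not a real engine API):

```javascript
// Heuristic neighbor selection: prefer candidates that cover different
// directions around the new node rather than clustering on one side.
// `dist` is whatever metric the index uses.
function selectNeighborsHeuristic(newVec, candidates, M, dist) {
  const sorted = [...candidates].sort(
    (a, b) => dist(newVec, a) - dist(newVec, b)
  );
  const selected = [];
  for (const cand of sorted) {
    if (selected.length >= M) break;
    const dNew = dist(newVec, cand);
    // "Dominated" candidates sit behind an already-selected neighbor
    // and add little navigability, so they are pruned.
    const dominated = selected.some((s) => dist(cand, s) < dNew);
    if (!dominated) selected.push(cand);
  }
  return selected;
}
```

This is why two nearby candidates rarely both become neighbors: the second one is usually closer to the first candidate than to the new node and gets filtered out.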
Trade-off knobs
M (max connections)
Higher M improves navigability and recall, especially on high-dimensional or strongly clustered data, but increases memory footprint and build time.
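A rough back-of-envelope for how M drives memory, assuming float32 vectors, 4-byte neighbor ids, and the common convention of allowing up to 2*M links at layer 0 (a sketch, not an exact accounting of any particular implementation):

```javascript
// Estimate index memory per the assumptions above. Upper layers add
// little: the expected number of layers above 0 per node is about
// 1/(M-1), each holding up to M links.
function estimateIndexBytes(numVectors, dim, M) {
  const vectorBytes = 4 * dim;            // float32 components
  const layer0LinkBytes = 4 * 2 * M;      // up to 2*M neighbor ids at layer 0
  const upperLinkBytes = (4 * M) / (M - 1); // expected ids above layer 0
  return Math.round(numVectors * (vectorBytes + layer0LinkBytes + upperLinkBytes));
}
```

For 100k vectors at dim 384 with M = 16, this lands around 167 MB, and the vectors themselves dominate; doubling M adds links, not vector storage.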
ef_construction
Higher values improve insertion quality (better edges), again at higher build cost.
ef_search
Higher ef_search increases query candidate breadth, improving recall but increasing query latency. For UI search, tune to keep p95 stable.
// altor-vec build example
const engine = WasmSearchEngine.from_vectors(
  vectorsFlat,
  384,  // dim
  16,   // M
  200,  // ef_construction
  50    // ef_search default
);
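For p95-focused tuning, a small harness that measures latency percentiles over a batch of benchmark queries is useful; `searchFn` below is a stand-in for whatever engine call you use, and the helper names are my own:

```javascript
// Return the p95 of a list of latency samples (milliseconds).
function p95(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  return sorted[Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95))];
}

// Time each query and report the p95 latency. Run this at each
// candidate ef_search value and stop raising ef once recall flattens
// or p95 exceeds your budget.
function benchmark(searchFn, queries) {
  const latencies = queries.map((q) => {
    const t0 = performance.now();
    searchFn(q);
    return performance.now() - t0;
  });
  return { p95Ms: p95(latencies) };
}
```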
Memory vs speed in browser context
Browser execution changes constraints. CPU is usually sufficient for moderate corpora, but memory and download size matter more than they do in server deployments. A small, efficient engine helps, but metadata design and corpus scope still dominate real-world UX. Keep index assets tight, remove unused fields, and consider domain-partitioned indexes when payload grows.
Why can HNSW still be useful in a 54KB WASM package? Because package size and index size are separate concerns. The runtime algorithm code can be small while operating over externally loaded data files. If your index is appropriately scoped, query latency can remain sub-millisecond on modern devices.
Why HNSW works well empirically
Real embedding spaces are not random point clouds. They often have local semantic neighborhoods and navigable structure. HNSW exploits this by maintaining shortcuts and local refinement layers. While worst-case guarantees are complex, practical datasets in search and recommendation tend to be “friendly” enough that HNSW reaches strong recall-speed trade-offs.
Debugging low recall
- Check embedding consistency: same model/version for index and query vectors.
- Increase ef_search before changing M; it is easier and runtime-only.
- Evaluate with benchmark queries, not anecdotal single searches.
- Inspect chunking strategy; poor chunks hurt retrieval more than graph tuning.
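To make the benchmark point concrete, recall@k can be measured against an exact-scan ground truth. Here `annSearch` and `exactSearch` are placeholders for your engine call and a brute-force reference, both returning arrays of ids:

```javascript
// Fraction of exact top-k ids that the approximate search also returns,
// averaged over a benchmark query set.
function recallAtK(annSearch, exactSearch, queries, k) {
  let hits = 0;
  for (const q of queries) {
    const truth = new Set(exactSearch(q, k));
    const found = annSearch(q, k);
    hits += found.filter((id) => truth.has(id)).length;
  }
  return hits / (queries.length * k);
}
```

A few hundred representative queries is usually enough to see whether an ef_search change moved recall, where a single anecdotal search tells you almost nothing.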
Browser implementation pattern
// Assuming altor-vec exposes an init function and WasmSearchEngine
// (adjust the import to the package's actual exports).
import init, { WasmSearchEngine } from 'altor-vec';

await init();
const bytes = new Uint8Array(await (await fetch('/index.bin')).arrayBuffer());
const engine = new WasmSearchEngine(bytes);

export function nearest(queryVector, topK = 6) {
  return JSON.parse(engine.search(new Float32Array(queryVector), topK));
}
Keep this path pure and deterministic. Do model inference and UI work around it, not inside it.
Conclusion
HNSW is best understood as a two-phase navigation strategy: long-range movement in sparse layers and local refinement in dense layers. The algorithm is fast because it avoids exhaustive comparisons while still exploring enough candidate paths to keep recall high. For browser products, this makes HNSW an excellent retrieval primitive when dataset size is moderate and low latency matters. With altor-vec, you can apply these ideas directly in JavaScript applications without maintaining a dedicated vector backend.