How HNSW Works: A Visual Explanation

If you build vector search systems, HNSW appears everywhere: databases, retrieval APIs, and now browser-native libraries. But many explanations remain abstract. This article gives a practical mental model for developers who want to tune performance rather than memorize theory. We will walk through the layered graph idea, query traversal, insertion strategy, and key memory-speed trade-offs. We also explain why a 54KB WebAssembly implementation can still be useful in frontend products.

Install altor-vec: npm install altor-vec

Problem setup

Given a query vector, find the nearest vectors in a large set. Exact scan computes distance to every item, which is expensive as corpus size grows. Approximate nearest neighbor (ANN) methods reduce work while preserving high recall. HNSW is a graph-based ANN method with strong empirical performance and good practical tunability.
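The exact-scan baseline is worth having in code, since every ANN method is judged against it. A minimal sketch in plain JavaScript (squared L2 distance, top-k by full sort; illustrative, not the altor-vec internals):

```javascript
// Exact nearest-neighbor scan: compute distance to every vector, keep top-k.
// O(N * dim) per query — the cost graph-based ANN methods like HNSW avoid.
function l2sq(a, b) {
  let d = 0;
  for (let i = 0; i < a.length; i++) {
    const t = a[i] - b[i];
    d += t * t;
  }
  return d;
}

function exactSearch(vectors, query, topK) {
  return vectors
    .map((v, id) => ({ id, dist: l2sq(v, query) }))
    .sort((x, y) => x.dist - y.dist)
    .slice(0, topK);
}
```

Exact scan is also the tool you later use to measure an ANN index's recall.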

Core intuition: layered navigable graph

Imagine a city map. Highways help you traverse long distances quickly; local streets help you find exact addresses. HNSW mirrors that idea: upper layers are sparse “highways,” lower layers are dense “streets.” Search starts at the top for coarse routing and descends to refine neighbors.

Layer 3 (very sparse):  o ---- o ----- o
                         \    /
Layer 2 (sparse):       o -- o -- o -- o -- o
                         \   |  /   \   |
Layer 1 (dense):        o-o-o-o-o-o-o-o-o-o-o-o
                        \|/ \|/ \|/ \|/ \|/ \|/
Layer 0 (densest):      full neighborhood graph for local refinement

Each node appears in layer 0 and possibly in higher layers, chosen probabilistically. Fewer nodes appear in higher layers, producing logarithmic-like navigation behavior in many datasets.
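The probabilistic layer assignment is typically an exponential decay: a node's top layer is sampled as floor(-ln(U) · mL), and with mL = 1/ln(M) roughly a 1/M fraction of nodes reaches each successive layer. A sketch (the mL formula follows the common convention; treat the constant as a tunable):

```javascript
// Sample the top layer for a new node. With mL = 1 / ln(M), the expected
// fraction of nodes appearing at layer >= l is approximately (1/M)^l.
function sampleLevel(M, rng = Math.random) {
  const mL = 1 / Math.log(M);
  // 1 - rng() lies in (0, 1], so the log argument is never zero.
  return Math.floor(-Math.log(1 - rng()) * mL);
}
```

With M = 16, about 1 node in 16 appears at layer 1 or above, 1 in 256 at layer 2 or above, and so on, which is what produces the sparse "highway" layers.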

Query algorithm step by step

  1. Pick the entry point at the highest layer.
  2. Greedy walk: at each step, move to the neighbor closest to the query if it improves on the current node.
  3. When no neighbor at the current layer improves, descend one level.
  4. At the bottom layer, expand a candidate frontier using a priority queue whose breadth is bounded by ef_search.
  5. Return the top-k closest candidates found.
In pseudocode:

function hnswSearch(query, topK, efSearch):
  ep = entryPointAtTopLayer
  for layer from maxLayer downto 1:
    ep = greedyDescent(layer, query, ep)

  # Layer 0: best-first expansion of a candidate frontier
  candidates = minHeapByDistanceTo(query, [ep])
  visited = set([ep])
  best = boundedMaxHeap(capacity = efSearch, [ep])

  while candidates not empty:
    cur = candidates.popClosest()
    if best.isFull() and dist(cur, query) > best.worstDistance():
      break  # no remaining candidate can improve the result set
    for n in neighbors(cur, layer = 0):
      if n in visited: continue
      visited.add(n)
      if not best.isFull() or dist(n, query) < best.worstDistance():
        candidates.push(n)
        best.push(n)  # evicts the current worst if over capacity

  return best.closest(topK)

The “best frontier” logic is the reason HNSW usually outperforms naive greedy methods. It explores multiple promising candidates rather than committing to one path too early.
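The greedyDescent helper used in the upper layers is plain hill-climbing on one layer's adjacency structure. A minimal sketch over a neighbor function and a distance function (the graph shape here is illustrative, not the altor-vec internals):

```javascript
// Greedy walk on a single layer: repeatedly jump to the neighbor closest
// to the query until no neighbor improves on the current node.
function greedyDescent(neighbors, dist, query, entry) {
  let cur = entry;
  let curDist = dist(cur, query);
  let improved = true;
  while (improved) {
    improved = false;
    for (const n of neighbors(cur)) {
      const d = dist(n, query);
      if (d < curDist) {
        cur = n;
        curDist = d;
        improved = true;
      }
    }
  }
  return cur;
}
```

On its own this can stall in a local minimum, which is exactly why layer 0 switches to the frontier expansion above instead of committing to a single path.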

Index construction basics

Construction inserts vectors one by one. For each new vector:

  1. Sample a top layer for the node (exponentially decaying probability).
  2. Greedily descend from the entry point to one layer above the node's top layer.
  3. At each layer from the node's top layer down to 0, search with breadth ef_construction to collect candidates, connect the node to the best M of them, and add the reverse edges.
  4. Prune any neighbor whose edge list now exceeds its limit.

This process balances graph sparsity and local connectivity. Better graph quality yields higher recall at the same ef_search.
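During insertion, each new node is connected to a subset of the candidates found at its layer. The simplest selection rule is "take the M closest"; a sketch of that variant (production implementations usually apply a diversity heuristic that also prefers candidates closer to the new node than to each other):

```javascript
// Pick up to M connections for a newly inserted node from the
// ef_construction candidates found at this layer. Simple variant:
// keep the M closest by distance to the new node.
function selectNeighbors(candidates, M) {
  return candidates
    .slice() // avoid mutating the caller's array
    .sort((a, b) => a.dist - b.dist)
    .slice(0, M)
    .map(c => c.id);
}
```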

Trade-off knobs

M (max connections)

Higher M improves navigability and recall, especially on complex data manifolds, but increases memory footprint and build time.

ef_construction

Higher values improve insertion quality (better edges), again at higher build cost.

ef_search

Higher ef_search increases query candidate breadth, improving recall but increasing query latency. For UI search, tune to keep p95 stable.
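When tuning ef_search, measure recall directly: run the same queries through an exact scan and compute the overlap with the ANN results. A small recall@k helper, independent of any particular engine:

```javascript
// recall@k: fraction of the exact top-k ids that the ANN result recovered.
function recallAtK(annIds, exactIds, k) {
  const truth = new Set(exactIds.slice(0, k));
  let hits = 0;
  for (const id of annIds.slice(0, k)) {
    if (truth.has(id)) hits++;
  }
  return hits / k;
}
```

Sweep ef_search upward until recall@k plateaus, then back off to the smallest value that still meets your latency budget.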

// altor-vec build example
const engine = WasmSearchEngine.from_vectors(
  vectorsFlat,
  384, // dim
  16,  // M
  200, // ef_construction
  50   // ef_search default
);

Memory vs speed in browser context

Browser execution changes the constraints. CPU is usually sufficient for moderate corpora, but memory and download size matter more than in server deployments. A small, efficient engine helps, but metadata design and corpus scope still dominate real-world UX. Keep index assets tight, remove unused fields, and consider domain-partitioned indexes when the payload grows.

Why can HNSW still be useful in a 54KB WASM package? Because package size and index size are separate concerns. The runtime algorithm code can be small while operating over externally loaded data files. If your index is appropriately scoped, query latency can remain sub-millisecond on modern devices.
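A back-of-envelope estimate makes the code-size vs index-size distinction concrete: HNSW memory is dominated by the raw vectors plus the adjacency lists. A rough estimator, assuming float32 components, 32-bit neighbor ids, 2·M links at layer 0 and M above (constants are common conventions, not altor-vec's exact layout):

```javascript
// Rough HNSW index size in bytes: vectors + neighbor lists.
// Upper layers hold ~1/(M-1) extra edges per node in expectation
// (geometric series over layer occupancy).
function estimateIndexBytes(numVectors, dim, M) {
  const vectorBytes = numVectors * dim * 4;       // float32 components
  const layer0Edges = numVectors * 2 * M;          // dense bottom layer
  const upperEdges = numVectors * M / (M - 1);     // sparse upper layers
  return vectorBytes + (layer0Edges + upperEdges) * 4;
}
```

For 10,000 vectors at dim 384 with M = 16 this lands around 17 MB, almost all of it raw vector data; the algorithm's code can stay tiny regardless.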

Why HNSW works well empirically

Real embedding spaces are not random point clouds. They often have local semantic neighborhoods and navigable structure. HNSW exploits this by maintaining shortcuts and local refinement layers. While worst-case guarantees are complex, practical datasets in search and recommendation tend to be “friendly” enough that HNSW reaches strong recall-speed trade-offs.

Debugging low recall

Common causes, roughly in order of likelihood:

  1. ef_search too low for the corpus: raise it and re-measure recall before touching anything else.
  2. Distance metric mismatch: cosine and dot-product indexes expect normalized vectors.
  3. Build quality: M or ef_construction too small for the data; rebuild with higher values.
  4. Query vectors produced by a different embedding model or version than the indexed vectors.
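One frequent culprit is mixing unnormalized vectors into a cosine or dot-product index. A quick sanity check to run before debugging the graph itself:

```javascript
// Verify every embedding is (approximately) unit-length before indexing.
// Relevant whenever the index assumes cosine / inner-product distance.
function checkUnitNorms(vectors, tol = 1e-3) {
  const bad = [];
  vectors.forEach((v, i) => {
    const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
    if (Math.abs(norm - 1) > tol) bad.push(i);
  });
  return bad; // indices of vectors that are not unit-length
}
```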

Browser implementation pattern

import init, { WasmSearchEngine } from 'altor-vec'; // assumed wasm-bindgen-style exports

await init();
const bytes = new Uint8Array(await (await fetch('/index.bin')).arrayBuffer());
const engine = new WasmSearchEngine(bytes);

export function nearest(queryVector, topK = 6) {
  return JSON.parse(engine.search(new Float32Array(queryVector), topK));
}

Keep this path pure and deterministic. Do model inference and UI work around it, not inside it.

Conclusion

HNSW is best understood as a two-phase navigation strategy: long-range movement in sparse layers and local refinement in dense layers. The algorithm is fast because it avoids exhaustive comparisons while still exploring enough candidate paths to keep recall high. For browser products, this makes HNSW an excellent retrieval primitive when dataset size is moderate and low latency matters. With altor-vec, you can apply these ideas directly in JavaScript applications without maintaining a dedicated vector backend.

CTA: npm install altor-vec · Star on GitHub