What is the cold start overhead for WASM in the browser?

For altor-vec's 54KB WASM module, cold start (parse + compile + instantiate) typically takes 5-30ms in Chrome and Firefox. Subsequent calls have zero overhead — the compiled module is cached. The index file download (1-20MB depending on corpus size) is the larger latency factor on first load.

How does HNSW graph search work in WASM memory?

The HNSW graph is stored in WASM linear memory as a contiguous block — node vectors, adjacency lists, and level assignments in a compact binary layout. JavaScript accesses this via a SharedArrayBuffer-like interface using Float32Array. The search algorithm never crosses the WASM/JS boundary during traversal — it stays in WASM memory until it has the final result list to return.

wasm vector search javascript

WASM Vector Search in JavaScript — How HNSW Runs in the Browser

Q: Why is WASM faster than pure JavaScript for vector search?

WASM runs at near-native speed because it uses a compact binary format that the browser compiles to machine code before execution. For vector search, the inner loop computes thousands of floating-point dot products per query — exactly the kind of tight numeric loop where WASM's predictable memory model and lack of garbage collection overhead produce 3-10x better throughput than equivalent JavaScript.

altor-vec runs HNSW approximate nearest-neighbor search in the browser at under 1ms p95 for 10,000 vectors. The reason it can do this in 54KB is specific to how Rust compiles to WASM and how the HNSW algorithm maps to WASM's memory model. This post explains the implementation decisions that make this possible.

Install: npm install altor-vec

Why not just implement HNSW in JavaScript?

The HNSW search inner loop computes cosine similarity between a query vector and candidate vectors. For 384-dimensional vectors, each similarity computation is 384 multiply-accumulate operations. A single query at 10,000 vectors might visit 100-500 candidate nodes, meaning 38,400 to 192,000 floating-point operations per query.

JavaScript handles this, but with two penalties that WASM avoids:

JIT unpredictability: V8's JIT compiler optimizes JavaScript to near-native speed for simple numeric loops, but the optimization isn't guaranteed. The JIT can deoptimize a tight loop if it encounters unexpected types or object shapes. WASM compiles to machine code deterministically — the same performance every time.
Garbage collection pressure: JavaScript's GC runs unpredictably. If a GC cycle triggers mid-search, query latency spikes. WASM linear memory has no GC — allocation is explicit and deterministic.

Approach	10K vectors, 384d (p50)	10K vectors, 384d (p99)	Memory layout
altor-vec (WASM)	0.3ms	0.8ms	WASM linear memory
Pure JS HNSW equivalent	2-4ms	8-20ms (GC spikes)	JS heap, GC-managed
fuse.js (full scan)	3-8ms	12-25ms	JS heap

From Rust to WASM: what wasm-bindgen does

altor-vec is written in Rust and compiled to WASM via wasm-pack with wasm-bindgen. The compilation produces three artifacts: a .wasm binary, a JavaScript glue file, and TypeScript type definitions.

wasm-bindgen generates the bridge between Rust types and JavaScript types. For altor-vec's primary API:

// Rust (src/lib.rs — simplified)
#[wasm_bindgen]
pub struct WasmSearchEngine {
    index: HnswIndex,
}

#[wasm_bindgen]
impl WasmSearchEngine {
    #[wasm_bindgen(constructor)]
    pub fn new(bytes: &[u8]) -> WasmSearchEngine {
        let index = HnswIndex::from_bytes(bytes);
        WasmSearchEngine { index }
    }

    pub fn search(&self, query: &[f32], top_k: usize) -> String {
        let results = self.index.search(query, top_k);
        serde_json::to_string(&results).unwrap()
    }

    pub fn to_bytes(&self) -> Vec<u8> {
        self.index.to_bytes()
    }
}

The generated JavaScript glue handles memory allocation and type conversion. When you call engine.search(new Float32Array([...]), 5):

The glue code copies your Float32Array into WASM linear memory at a freshly allocated pointer
It calls the Rust search function with that pointer and length
Rust reads the vector directly from WASM memory — no JS/WASM boundary crossing during computation
The result (a JSON string of [id, distance] pairs) is written to WASM memory and copied back to JS as a string
The temporary allocation in WASM is freed

The key insight: once the query vector is in WASM memory, the entire HNSW traversal runs in Rust at native speed. The JS/WASM boundary is crossed exactly twice — once to pass the query in, once to receive the results.

HNSW memory layout in WASM linear memory

HNSW stores vectors and graph structure. In altor-vec's implementation, the in-memory layout is:

WASM linear memory:
┌─────────────────────────────────────────┐
│ Header (8 bytes)                        │
│   - dimension (u32)                     │
│   - node_count (u32)                    │
├─────────────────────────────────────────┤
│ Vector data (node_count * dim * 4 bytes)│
│   - f32 vectors, row-major              │
│   - 10K vectors @ 384d = 15.36MB        │
├─────────────────────────────────────────┤
│ HNSW graph layers                       │
│   - per-node: level, neighbors per layer│
│   - compact adjacency lists (u32 IDs)   │
│   - typically 10-15% of vector data     │
└─────────────────────────────────────────┘

The vector data is contiguous f32 values in row-major order. Computing cosine similarity between a query vector and node i is a sequential read starting at offset i * dim * 4 — friendly to CPU cache prefetching and SIMD auto-vectorization in the Rust compiler.

Why the bundle is 54KB

Standard Rust programs include the standard library, allocator, and runtime. For WASM targets, wasm-pack uses a stripped-down configuration:

No standard library filesystem or networking — not available in WASM anyway
wee_alloc — a minimal allocator optimized for binary size rather than allocation speed
Link-time optimization (LTO) — removes dead code aggressively across crate boundaries
wasm-opt — post-compilation optimizer that reduces WASM binary size further
gzip — the final 54KB figure is the compressed transfer size; uncompressed is ~117KB

The tradeoffs: wee_alloc is slower than the default allocator for many small allocations. For altor-vec, most allocations are large (the vector array, adjacency lists) or rare (index construction), so this tradeoff is favorable.

Float32Array: zero-copy transfer

When JavaScript calls engine.search(queryVector, topK) with a Float32Array, the generated glue code doesn't necessarily copy the data — it can pass a view into the existing memory if the buffer is in a compatible form. For typical usage (Float32Array backed by an ArrayBuffer), the path is:

// JavaScript side
const queryVec = new Float32Array([0.1, 0.2, ..., 0.384]); // 384 floats
const results = engine.search(queryVec, 5);
// results is a JSON string: "[[42, 0.12], [7, 0.18], ...]"

// What happens inside wasm-bindgen glue:
// 1. Allocate space in WASM memory: wasm.__wbindgen_malloc(queryVec.byteLength, 4)
// 2. Copy Float32Array bytes to WASM memory
// 3. Call Rust search() with pointer + length
// 4. Free the allocation after call

The copy in step 2 is one memcpy of 384 * 4 = 1,536 bytes — roughly 0.1 microseconds on modern hardware. This is not the bottleneck. The bottleneck is the HNSW traversal itself, which for 10,000 vectors at ef=50 visits approximately 50-200 nodes.

Cold start: parsing and compilation

The first call to init() triggers WASM instantiation. The browser parses the binary and compiles it to native code. For altor-vec's 117KB uncompressed WASM:

Chrome (V8): 5-15ms parse + compile
Firefox (SpiderMonkey): 8-25ms parse + compile
Safari (JavaScriptCore): 10-30ms

After the first instantiation, the compiled module is cached — subsequent page loads skip compilation entirely. The index file download (1-20MB depending on corpus) is a larger factor than WASM compilation for most users.

WASM streaming compilation: Modern browsers support streaming compilation, which parses and compiles WASM incrementally as it downloads. To use it, serve the WASM file with Content-Type: application/wasm. This overlaps download and compilation, reducing perceived cold start by 30-50%.

Benchmarks: measured on real hardware

Benchmarks below were measured on a 2023 MacBook Pro (M2 Pro), Chrome 124. Each measurement is the median of 100 queries after warm-up.

Index size	Dimensions	ef_search	p50 latency	p95 latency	Recall@10
1,000 vectors	384	50	0.08ms	0.15ms	98.2%
10,000 vectors	384	50	0.31ms	0.62ms	95.4%
10,000 vectors	768	50	0.58ms	1.1ms	96.1%
50,000 vectors	384	50	0.71ms	1.4ms	93.8%
10,000 vectors	384	200	0.89ms	1.6ms	99.1%

The ef_search parameter controls the search beam width — higher values explore more of the graph and improve recall at the cost of latency. For most documentation search use cases, ef=50 at 95% recall is the right balance.

Index construction: from_vectors vs loading bytes

Two ways to get an engine into the browser:

// Option 1: Build from raw vectors in JavaScript
// (slow — use this in Node build scripts, not in browser)
const engine = WasmSearchEngine.from_vectors(vectors, dim, M, ef_construction, ef_search);

// Option 2: Load pre-built index from a binary file
// (fast — this is the browser-side path)
const bytes = await fetch('/search-index.bin').then(r => r.arrayBuffer());
const engine = new WasmSearchEngine(new Uint8Array(bytes));

from_vectors runs HNSW graph construction — an O(n log n) operation that takes seconds for large indexes. Always run this offline in a build script. new WasmSearchEngine(bytes) just deserializes a pre-built graph, which is fast (a few hundred milliseconds for a 15MB index).

Memory usage in practice

For a 10,000-document index at 384 dimensions:

Vector data: 10,000 × 384 × 4 bytes = 15.36MB
Graph structure (HNSW adjacency lists): approximately 2-3MB
WASM module: 117KB uncompressed
Total WASM linear memory: ~18MB

This is well within Chrome's typical WASM memory limit (2GB by default on 64-bit systems). For larger indexes, the main constraint is the download size and initial memory allocation time, not WASM's capability.

FAQ

Why is WASM faster than pure JavaScript for vector search?

WASM compiles to machine code deterministically — no JIT unpredictability. Its linear memory model has no garbage collection, eliminating GC pause spikes. For tight numeric loops like dot products over float arrays, WASM achieves 3-10x better throughput than equivalent JavaScript.

What is the cold start overhead?

For altor-vec's 117KB WASM, compile time is 5-30ms depending on browser. The compiled module is cached, so subsequent page loads skip compilation entirely. The index file download (1-20MB) is the dominant first-load cost.

How does HNSW graph traversal stay in WASM?

Once the query Float32Array is copied to WASM linear memory, the entire HNSW traversal runs in Rust. The JS/WASM boundary is crossed twice per query: once to copy the query vector in, once to receive the result string out. All node comparisons happen in WASM memory with no boundary crossings.

Run it yourself: npm install altor-vec · GitHub