Node.js guide

Semantic Autocomplete in Node.js with altor-vec

Use altor-vec to add semantic autocomplete to a Node.js project: build the index with a Node.js script, then run queries entirely in the browser, with no search server, no API keys, and zero per-query cost. Suggestions appear as the user types and are ranked by semantic similarity rather than prefix matching, so conceptually related completions surface even when the query shares no keywords with the content.

Install: npm install altor-vec @xenova/transformers

Implementation

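Server-side indexing script (Node 18+, ESM). It uses a module-level variable for the engine and writes its output to public/, which must already exist because writeFileSync does not create directories.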

// build-autocomplete-index.mjs — Node.js: build autocomplete index
// Suitable for: site navigation, command palette, tag search, etc.
import { pipeline } from '@xenova/transformers';
import init, { WasmSearchEngine } from 'altor-vec';
import { writeFileSync } from 'fs';

// Your autocomplete items (nav links, commands, tags, etc.)
const items = [
  { id: 0, label: 'Getting started guide', url: '/getting-started' },
  { id: 1, label: 'API reference', url: '/api' },
  { id: 2, label: 'React integration guide', url: '/frameworks/react' },
  { id: 3, label: 'Next.js integration guide', url: '/frameworks/nextjs' },
  { id: 4, label: 'Document search example', url: '/examples/document-search' },
  // ... add all your navigation/command items
];

await init();
const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const DIM = 384;
const vectors = new Float32Array(items.length * DIM);

for (const [i, item] of items.entries()) {
  const out = await embedder(item.label, { pooling: 'mean', normalize: true });
  vectors.set(out.data, i * DIM);
}

const engine = WasmSearchEngine.from_vectors(vectors, DIM, 16, 200, 50);
writeFileSync('public/autocomplete-index.json', engine.to_json());
writeFileSync('public/autocomplete-items.json', JSON.stringify(items));
console.log(`Autocomplete index ready: ${items.length} items`);

// In the browser:
// const engine = WasmSearchEngine.from_json(await fetch('/autocomplete-index.json').then(r => r.text()));
// const items = await fetch('/autocomplete-items.json').then(r => r.json());
// const hits = JSON.parse(engine.search(queryEmbedding, 5)).map(h => items[h.id]);
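
The commented browser steps above can be wrapped in a small helper. This is a minimal sketch that assumes, as the snippet above implies, that engine.search(vector, k) returns a JSON string holding an array of hits, each with an id that doubles as an index into items. The names toSuggestions and stubEngine are hypothetical; in a real page the engine would come from WasmSearchEngine.from_json.

```javascript
// Hypothetical helper: turn raw search output into suggestion items.
// Assumes engine.search(vector, k) returns a JSON string such as '[{"id": 2}, ...]'
// and that each item's id equals its position in the items array.
function toSuggestions(engine, items, queryEmbedding, k = 5) {
  const hits = JSON.parse(engine.search(queryEmbedding, k));
  return hits
    .map((hit) => items[hit.id])
    .filter((item) => item !== undefined); // skip ids that have no matching item
}

// Stub standing in for the real engine so the sketch runs on its own.
const stubEngine = {
  search: (_vector, k) =>
    JSON.stringify([{ id: 2 }, { id: 0 }, { id: 99 }].slice(0, k)),
};

const demoItems = [
  { id: 0, label: 'Getting started guide', url: '/getting-started' },
  { id: 1, label: 'API reference', url: '/api' },
  { id: 2, label: 'React integration guide', url: '/frameworks/react' },
];

const demoSuggestions = toSuggestions(stubEngine, demoItems, new Float32Array(384), 5);
console.log(demoSuggestions.map((s) => s.label));
```

Filtering out unknown ids keeps the dropdown from rendering empty rows if the index file and the items file ever get out of sync.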

Performance

Query latency is about 0.1ms at 1,000 items and about 0.9ms at 50,000 items (see the table below), fast enough for real-time autocomplete. Measured on an M2 MacBook Pro, Chrome 124. Mobile is typically 2–4× slower, so test on target devices before deploying.

Index size       Dimensions   Query p50   Memory
1,000 vectors    384          ~0.1ms      ~2MB
10,000 vectors   384          ~0.4ms      ~17MB
50,000 vectors   384          ~0.9ms      ~85MB
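
As a rough sanity check on the memory figures, the raw vectors alone take items × dimensions × 4 bytes, because each component is a 32-bit float. The sketch below computes that lower bound. The helper name rawVectorMB is hypothetical, and attributing the remaining gap to index overhead is an assumption, not a measured breakdown.

```javascript
// Lower bound on memory: the packed Float32Array of embeddings, nothing else.
function rawVectorMB(count, dim = 384) {
  const bytes = count * dim * Float32Array.BYTES_PER_ELEMENT; // 4 bytes per float
  return bytes / 1e6; // decimal megabytes
}

for (const count of [1000, 10000, 50000]) {
  console.log(`${count} vectors: ${rawVectorMB(count).toFixed(1)} MB of raw floats`);
}
```

This gives roughly 1.5, 15.4, and 76.8 MB, each somewhat below the corresponding memory figure in the table.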

When this approach works best

Limitations

Frequently asked questions

How do I debounce the embedding call so it doesn't fire on every keystroke?

Use a setTimeout/clearTimeout debounce of 200-300ms. Only embed and search when the user pauses typing. This avoids flooding the embedding model with partial queries.
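
A minimal sketch of that debounce follows. The embedAndSearch function is a hypothetical stand-in for the real "embed the query, then search the index" step, and 250ms is just one value inside the suggested range.

```javascript
// Minimal debounce: only the last call within delayMs actually runs.
function debounce(fn, delayMs = 250) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer); // cancel the pending call, if any
    timer = setTimeout(() => fn(...args), delayMs);
  };
}

// Hypothetical stand-in for embedding the query and searching the index.
const searchedQueries = [];
const embedAndSearch = (query) => {
  searchedQueries.push(query);
};

const onInput = debounce(embedAndSearch, 250);

// Simulate three quick keystrokes; only the final query should be searched.
onInput('r');
onInput('re');
onInput('react');
```

In a page this would typically be wired up with something like input.addEventListener('input', (e) => onInput(e.target.value)).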

Can I show autocomplete suggestions before the full embedding model loads?

Yes. Show a lightweight keyword prefix-match fallback while the WASM + embedding model initializes, then switch to semantic results once ready. The transition is usually seamless within 1-2 seconds of page load.
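
One way to structure that fallback is sketched below. The names prefixMatches, createSuggester, and setSemanticSearch are hypothetical, and the semantic function here is a stub. A real one would embed the query and call the engine, and would most likely return a Promise, so callers can simply await the result of suggest in both modes.

```javascript
// Keyword fallback: match when the label, or any word in it, starts with the query.
function prefixMatches(items, query, k = 5) {
  const q = query.trim().toLowerCase();
  if (!q) return [];
  return items
    .filter((item) => {
      const label = item.label.toLowerCase();
      return label.startsWith(q) || label.split(/\s+/).some((word) => word.startsWith(q));
    })
    .slice(0, k);
}

// Uses the fallback until a semantic search function has been registered.
function createSuggester(items) {
  let semanticSearch = null; // set once the WASM engine and the model are ready
  return {
    setSemanticSearch(fn) {
      semanticSearch = fn;
    },
    suggest(query, k = 5) {
      return semanticSearch ? semanticSearch(query, k) : prefixMatches(items, query, k);
    },
  };
}

const fallbackItems = [
  { id: 0, label: 'Getting started guide', url: '/getting-started' },
  { id: 1, label: 'API reference', url: '/api' },
  { id: 2, label: 'React integration guide', url: '/frameworks/react' },
];

const suggester = createSuggester(fallbackItems);
const beforeReady = suggester.suggest('rea'); // served by the prefix fallback

// Stub semantic search, registered once loading has finished.
suggester.setSemanticSearch((_query, k) => [fallbackItems[0]].slice(0, k));
const afterReady = suggester.suggest('rea'); // served by the registered function
```

Keeping both paths behind one suggest call means the UI code does not need to know which mode is active.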

What's the difference between semantic autocomplete and prefix autocomplete?

Prefix autocomplete only matches strings that start with the typed characters. Semantic autocomplete finds items that are conceptually related even if no word matches — e.g., typing 'fast search' might surface a document titled 'HNSW: Sub-millisecond retrieval'.
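
The contrast can be illustrated with a toy example. The three-dimensional vectors below are hand-picked for illustration and do not come from a real embedding model, and the brute-force ranking is a stand-in for the index lookup, not a description of altor-vec's internals. For normalized vectors, the dot product equals cosine similarity.

```javascript
const normalize = (v) => {
  const norm = Math.hypot(...v);
  return v.map((x) => x / norm);
};
const dot = (a, b) => a.reduce((sum, x, i) => sum + x * b[i], 0);

// Toy data: vectors are invented for illustration only.
const docs = [
  { label: 'HNSW: Sub-millisecond retrieval', vector: normalize([0.9, 0.1, 0.0]) },
  { label: 'Billing and invoices', vector: normalize([0.0, 0.2, 0.9]) },
];
const queryText = 'fast search';
const queryVector = normalize([0.8, 0.3, 0.1]); // invented, like the vectors above

// Prefix autocomplete: no label starts with the typed text.
const prefixHits = docs.filter((d) => d.label.toLowerCase().startsWith(queryText));

// Semantic autocomplete: rank every item by similarity to the query vector.
const ranked = docs
  .map((d) => ({ label: d.label, score: dot(d.vector, queryVector) }))
  .sort((a, b) => b.score - a.score);

console.log(prefixHits.length, ranked[0].label);
```

With these toy values the prefix filter returns nothing, while the similarity ranking places the HNSW item first.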
