React guide
Semantic Autocomplete in React with altor-vec
Use altor-vec to add semantic autocomplete to your React app, entirely in the browser, with no server, no API keys, and zero per-query cost. Suggestions appear as the user types and are ranked by semantic similarity rather than prefix matching, so conceptually related completions surface even when the query shares no keywords with the content.
npm install altor-vec @xenova/transformers
Implementation
Works with Vite, CRA, or any React 18+ setup. The search engine and embedder are held in useRef; the query and suggestions are held in useState.
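// SemanticAutocomplete.tsx: debounced React autocomplete
import { useState, useEffect, useRef, useCallback } from 'react';
import type { ChangeEvent } from 'react';
import init, { WasmSearchEngine } from 'altor-vec';
import { pipeline } from '@xenova/transformers';

type Item = { id: number; label: string };

const DIM = 384; // embedding size of all-MiniLM-L6-v2

export function SemanticAutocomplete({ items }: { items: Item[] }) {
  // Refs keep the engine and embedder alive across renders without triggering re-renders.
  const engine = useRef<WasmSearchEngine | null>(null);
  const embedder = useRef<any>(null); // feature-extraction pipeline
  const debounceRef = useRef<ReturnType<typeof setTimeout> | undefined>(undefined);
  const [query, setQuery] = useState('');
  const [suggestions, setSuggestions] = useState<Item[]>([]);
  const [ready, setReady] = useState(false);

  // Runs once on mount; later changes to `items` are not re-indexed.
  useEffect(() => {
    let cancelled = false;
    (async () => {
      await init(); // load the WASM module
      embedder.current = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
      // Embed every label into one flat buffer of items.length × DIM floats.
      const vecs = new Float32Array(items.length * DIM);
      for (const [i, item] of items.entries()) {
        const out = await embedder.current(item.label, { pooling: 'mean', normalize: true });
        vecs.set(out.data, i * DIM);
      }
      if (cancelled) return; // component unmounted while the model was loading
      engine.current = WasmSearchEngine.from_vectors(vecs, DIM, 16, 200, 50);
      setReady(true);
    })();
    return () => {
      cancelled = true;
      clearTimeout(debounceRef.current); // drop any pending search
    };
  }, []);

  const handleChange = useCallback((e: ChangeEvent<HTMLInputElement>) => {
    const q = e.target.value;
    setQuery(q);
    clearTimeout(debounceRef.current); // restart the debounce window
    if (q.length < 2) { setSuggestions([]); return; }
    debounceRef.current = setTimeout(async () => {
      if (!engine.current) return; // index not built yet
      const out = await embedder.current(q, { pooling: 'mean', normalize: true });
      // search() returns a JSON string of hits; each hit's id is the item's index.
      const hits = JSON.parse(engine.current.search(new Float32Array(out.data), 5));
      setSuggestions(hits.map((h: { id: number }) => items[h.id]));
    }, 250);
  }, [items]);

  return (
    <div>
      <input
        value={query}
        onChange={handleChange}
        placeholder={ready ? 'Search' : 'Loading model'}
      />
      {suggestions.length > 0 && (
        <ul>
          {suggestions.map(s => <li key={s.id}>{s.label}</li>)}
        </ul>
      )}
    </div>
  );
}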
Performance
Index search takes <0.1ms per query at 5K items and ~0.9ms at 50K items, fast enough for real-time autocomplete. Measured on an M2 MacBook Pro in Chrome 124. Mobile is typically 2–4× slower, so test on target devices before deploying.
| Index size | Dimensions | Query p50 | Memory |
|---|---|---|---|
| 1,000 vectors | 384 | ~0.1ms | ~2MB |
| 10,000 vectors | 384 | ~0.4ms | ~17MB |
| 50,000 vectors | 384 | ~0.9ms | ~85MB |
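As a rough cross-check of the Memory column, the raw vector payload can be estimated as vectors × dimensions × 4 bytes, since each component is a Float32. The sketch below covers only that payload; treating the remainder of each table figure as index overhead is an assumption, and `rawVectorMB` is a hypothetical helper name.

```typescript
// Hypothetical helper: raw Float32 storage for an index, in decimal megabytes.
// Assumes 4 bytes per component and ignores any index or graph overhead.
function rawVectorMB(vectorCount: number, dim: number): number {
  const BYTES_PER_FLOAT32 = 4;
  return (vectorCount * dim * BYTES_PER_FLOAT32) / 1_000_000;
}

for (const n of [1_000, 10_000, 50_000]) {
  console.log(`${n} vectors x 384 dims: ${rawVectorMB(n, 384).toFixed(1)} MB of raw vectors`);
}
```

This gives about 1.5 MB, 15.4 MB, and 76.8 MB, each somewhat below the corresponding Memory figure in the table.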
When this approach works best
- Search boxes with 500–50K candidate items
- Apps where users often know the concept but not the exact name
- Offline-capable apps where server round-trips are not an option
Limitations
- Very short queries (1–2 characters) embed poorly; consider a minimum query length of 3 characters
- Embedding on every keystroke is slow in the browser; debounce input (200–300ms) or pre-embed common short queries
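One way to act on the last point is to cache embeddings by query text, so that pre-embedded or repeated queries skip the model entirely. This is a minimal sketch, not part of altor-vec or transformers.js: `createCachedEmbedder`, `EmbedFn`, and `warm` are hypothetical names, `embed` stands in for whatever function wraps your embedding pipeline, and the cache key assumes queries can be treated as case-insensitive.

```typescript
// Hypothetical types and names for illustration only.
type EmbedFn = (text: string) => Promise<Float32Array>;

function createCachedEmbedder(embed: EmbedFn, maxEntries = 500) {
  const cache = new Map<string, Float32Array>();

  async function get(text: string): Promise<Float32Array> {
    // Assumption: queries are case-insensitive, so normalize before lookup.
    const key = text.trim().toLowerCase();
    const hit = cache.get(key);
    if (hit) return hit;

    const vec = await embed(key);
    // Evict the oldest entry (Map preserves insertion order) once the cache is full.
    if (cache.size >= maxEntries) {
      const oldest = cache.keys().next().value;
      if (oldest !== undefined) cache.delete(oldest);
    }
    cache.set(key, vec);
    return vec;
  }

  // Pre-embed a list of common queries, for example right after the model loads.
  async function warm(queries: string[]): Promise<void> {
    for (const q of queries) await get(q);
  }

  return { get, warm, size: () => cache.size };
}
```

Calling `warm(['api', 'faq', 'css'])` once after initialization would make those queries resolve from the cache on later keystrokes; the query list here is only an example.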
Frequently asked questions
How do I debounce the embedding call so it doesn't fire on every keystroke?
Use a setTimeout/clearTimeout debounce of 200-300ms. Only embed and search when the user pauses typing. This avoids flooding the embedding model with partial queries.
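A minimal sketch of that debounce as a reusable helper, independent of React. The names `debounce` and `runSearch` are hypothetical; `runSearch` stands in for the embed-then-search step.

```typescript
// Trailing-edge debounce: only the last call within `waitMs` actually runs.
function debounce<Args extends unknown[]>(fn: (...args: Args) => void, waitMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const run = (...args: Args): void => {
    if (timer !== undefined) clearTimeout(timer); // restart the wait window
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };

  // Cancel a pending call, for example when the component unmounts.
  const cancel = (): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = undefined;
  };

  return Object.assign(run, { cancel });
}

// Usage: three rapid keystrokes produce a single search.
const runSearch = (q: string): void => console.log('searching for', q);
const debouncedSearch = debounce(runSearch, 250);
debouncedSearch('fa');
debouncedSearch('fast');
debouncedSearch('fast search'); // only this call runs, about 250ms later
```

Exposing `cancel` lets the caller drop a pending search when the input is cleared or the component goes away.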
Can I show autocomplete suggestions before the full embedding model loads?
Yes. Show a lightweight keyword prefix-match fallback while the WASM module and the embedding model initialize, then switch to semantic results once they are ready. The switch usually happens within 1–2 seconds of page load.
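The fallback can be as small as a case-insensitive prefix filter. The sketch below is one possible shape: `prefixMatch` and `suggest` are hypothetical names, `ready` mirrors the component's ready flag, and `semanticSearch` stands in for the embed-then-search call.

```typescript
type Item = { id: number; label: string };

// Case-insensitive match on the start of the label or of any word in it.
function prefixMatch(items: Item[], query: string, limit = 5): Item[] {
  const q = query.trim().toLowerCase();
  if (q.length === 0) return [];
  const hits: Item[] = [];
  for (const item of items) {
    const label = item.label.toLowerCase();
    const matches = label.startsWith(q) || label.split(/\s+/).some(w => w.startsWith(q));
    if (matches) hits.push(item);
    if (hits.length === limit) break;
  }
  return hits;
}

// Hypothetical wiring: use the keyword fallback until the semantic engine is ready.
async function suggest(
  items: Item[],
  query: string,
  ready: boolean,
  semanticSearch: (q: string) => Promise<Item[]>,
): Promise<Item[]> {
  return ready ? semanticSearch(query) : prefixMatch(items, query);
}
```

Because both paths return the same `Item[]` shape, the rendering code does not need to know which one produced the suggestions.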
What's the difference between semantic autocomplete and prefix autocomplete?
Prefix autocomplete only matches strings that start with the typed characters. Semantic autocomplete finds items that are conceptually related even if no word matches — e.g., typing 'fast search' might surface a document titled 'HNSW: Sub-millisecond retrieval'.
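To make the contrast concrete, the sketch below ranks items by cosine similarity with a brute-force scan and compares the result with a prefix filter. Everything here is illustrative: the 3-dimensional vectors are made up (real embeddings from the model have 384 dimensions), `rankBySimilarity` is a hypothetical helper, and a real index avoids scanning every item.

```typescript
type Doc = { label: string; vec: number[] };

// Scale a vector to unit length so that the dot product equals cosine similarity.
function normalize(v: number[]): number[] {
  const norm = Math.hypot(...v);
  return norm === 0 ? v.slice() : v.map(x => x / norm);
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Brute-force top-k by cosine similarity.
function rankBySimilarity(query: number[], docs: Doc[], k: number): Doc[] {
  const q = normalize(query);
  return docs
    .map(d => ({ doc: d, score: dot(q, normalize(d.vec)) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(r => r.doc);
}

// Made-up vectors standing in for real embeddings.
const docs: Doc[] = [
  { label: 'HNSW: Sub-millisecond retrieval', vec: [0.9, 0.1, 0.0] },
  { label: 'Fasting and nutrition', vec: [0.0, 0.2, 0.9] },
];
const fastSearchQuery = [0.8, 0.3, 0.1]; // stand-in for the embedding of 'fast search'

const semanticTop = rankBySimilarity(fastSearchQuery, docs, 1)[0].label;
const prefixTop = docs
  .filter(d => d.label.toLowerCase().startsWith('fast'))
  .map(d => d.label);
console.log({ semanticTop, prefixTop });
```

With these toy vectors, the similarity ranking puts the HNSW title first, while the prefix filter for 'fast' returns only 'Fasting and nutrition'.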