Embedding-Based Autocomplete in the Browser
Traditional autocomplete is prefix-centric: it finds terms that begin with what the user typed. That works for known commands or exact names, but it breaks when users type intent instead of literal tokens. A user entering reduce payload size may want suggestions like bundle splitting or tree shaking guide, none of which share prefixes. Semantic autocomplete solves this by ranking candidates in embedding space. In this tutorial, we build a browser implementation with altor-vec, discuss latency budgets, and provide a React component you can adapt directly.
```shell
npm install altor-vec
```

Prefix matching vs semantic matching
Prefix matching is deterministic and cheap: a single lookup against a trie or sorted term list. Semantic matching is intent-aware but requires embedding generation plus vector retrieval at query time. In practice, use both:
- Lexical path for exact entities: route names, API symbols, IDs.
- Semantic path for natural language intent and paraphrases.
Hybrid ranking gives you predictable behavior for exact tokens while greatly improving “I don’t know exact wording” scenarios.
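As a minimal sketch, the hybrid step can be expressed as a pure merge function that deduplicates by candidate ID and keeps the higher score. The `mergeHybrid` name, the hit shape, and the flat lexical boost are illustrative assumptions, not part of altor-vec:

```javascript
// Merge semantic and lexical hits into one ranked list.
// Each hit is { id, label, score }; on duplicate ids, the higher score wins.
function mergeHybrid(semanticHits, lexicalHits, limit = 8) {
  const best = new Map();
  for (const hit of [...semanticHits, ...lexicalHits]) {
    if (hit.score > (best.get(hit.id)?.score ?? -Infinity)) best.set(hit.id, hit);
  }
  return [...best.values()].sort((a, b) => b.score - a.score).slice(0, limit);
}

// Example: an exact lexical match outranks a weak semantic hit for the same doc.
const merged = mergeHybrid(
  [{ id: 1, label: 'Tree shaking guide', score: 0.41 }],
  [{ id: 1, label: 'Tree shaking guide', score: 0.9 },
   { id: 2, label: 'Bundle splitting', score: 0.2 }]
);
```

Because the merge is a pure function, it is easy to unit-test independently of the embedding model or the index.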
Data preparation
Autocomplete candidates should be concise and user-facing. Build a candidate list from docs titles, command descriptions, and frequent support answers. Generate embeddings offline and store candidate metadata separately from vectors.
```js
// candidate shape
{
  "id": 102,
  "label": "Optimize bundle size with code splitting",
  "slug": "/docs/performance/code-splitting",
  "kind": "guide"
}
```
For quality, include both title and short description when embedding each candidate. This adds context and reduces near-duplicate collisions.
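One way to sketch that offline step: build the text to embed by concatenating label and description. The `embeddingText` helper and the `description` field are illustrative additions to the candidate shape above:

```javascript
// Build the text that gets embedded for each candidate.
// Including the description adds context and separates near-duplicate labels.
function embeddingText(candidate) {
  const parts = [candidate.label];
  if (candidate.description) parts.push(candidate.description);
  return parts.join('. ');
}

const text = embeddingText({
  id: 102,
  label: 'Optimize bundle size with code splitting',
  description: 'Split routes into lazy-loaded chunks to shrink the initial bundle.'
});
```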
Runtime flow and latency budget
- User types in input.
- Debounced pipeline generates query embedding.
- altor-vec returns top-k nearest candidate IDs.
- UI merges semantic and lexical scores, renders suggestions.
Retrieval itself can be sub-millisecond for many corpora. Embedding time is usually the dominant factor. Therefore, optimize embedding lifecycle first: lazy init model, run in worker, and skip semantic path for very short inputs.
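The "skip semantic path for very short inputs" rule can be a small guard in front of the embedding call. The thresholds here are illustrative defaults, not tuned values:

```javascript
// Decide whether a query is worth embedding: require minimum character
// and token counts before paying the semantic-path cost.
function shouldRunSemantic(query, minChars = 4, minTokens = 2) {
  const trimmed = query.trim();
  const tokens = trimmed.split(/\s+/).filter(Boolean);
  return trimmed.length >= minChars && tokens.length >= minTokens;
}
```

When the guard fails, fall through to the lexical path alone; the user still gets instant prefix suggestions.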
React component example
```jsx
import { useEffect, useRef, useState } from 'react';
import init, { WasmSearchEngine } from 'altor-vec';

export function SemanticAutocomplete({ embed }) {
  const [q, setQ] = useState('');
  const [items, setItems] = useState([]);
  const [engine, setEngine] = useState(null);
  const metaRef = useRef([]);
  const timerRef = useRef(null);

  // Load the WASM module, the prebuilt index, and candidate metadata once.
  useEffect(() => {
    (async () => {
      await init();
      const [iRes, mRes] = await Promise.all([
        fetch('/autocomplete-index.bin'),
        fetch('/autocomplete-meta.json')
      ]);
      const bytes = new Uint8Array(await iRes.arrayBuffer());
      metaRef.current = await mRes.json();
      setEngine(new WasmSearchEngine(bytes));
    })();
  }, []);

  // Debounced hybrid search: semantic top-k merged with lexical substring hits.
  useEffect(() => {
    if (!engine) return;
    if (timerRef.current) clearTimeout(timerRef.current);
    if (!q.trim()) {
      setItems([]);
      return;
    }
    timerRef.current = setTimeout(async () => {
      const vec = await embed(q); // run embed in worker
      const hits = JSON.parse(engine.search(new Float32Array(vec), 8));
      const semantic = hits.map(([id, d]) => ({ ...metaRef.current[id], score: 1 - d }));
      const lexical = metaRef.current
        .filter((x) => x.label.toLowerCase().includes(q.toLowerCase()))
        .slice(0, 5)
        .map((x) => ({ ...x, score: 0.2 }));
      // Deduplicate by id, keeping the higher-scoring entry.
      const best = new Map();
      for (const cur of [...semantic, ...lexical]) {
        if (cur.score > (best.get(cur.id)?.score ?? -1)) best.set(cur.id, cur);
      }
      setItems([...best.values()].sort((a, b) => b.score - a.score).slice(0, 8));
    }, 120);
    return () => clearTimeout(timerRef.current);
  }, [q, engine, embed]);

  return (
    <div>
      <input value={q} onChange={(e) => setQ(e.target.value)} placeholder="Search docs" />
      <ul>
        {items.map((i) => (
          <li key={i.id}>{i.label} ({i.score.toFixed(3)})</li>
        ))}
      </ul>
    </div>
  );
}
```
Why semantic autocomplete feels better
Users often begin with symptom-level language, not canonical labels. Semantic ranking bridges that mismatch. It also reduces “dead-end typing,” where the user sees no suggestions until they accidentally hit exact wording. By surfacing intent-aligned options early, semantic typeahead decreases time-to-first-click and improves confidence in search quality.
For multilingual or terminology-heavy products, embeddings can encode associations that prefix methods cannot represent. For example, a query with “auth token expired” can surface docs titled “401 refresh flow” even without token overlap.
UX improvements that matter
- Minimum token threshold: run semantic path only after 2–3 tokens to avoid noisy vectors.
- Score display: developers often like seeing confidence; expose score in debug mode.
- Keyboard navigation: maintain arrow + enter behavior regardless of ranking backend.
- Suggestion grouping: separate commands, docs, and FAQ items to reduce ambiguity.
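Suggestion grouping can be a simple partition over the `kind` field from the candidate metadata. This is a sketch; the group ordering array is an illustrative assumption:

```javascript
// Group ranked suggestions by kind so commands, docs, and FAQ items render
// in separate sections, preserving rank order within each group.
function groupByKind(items, order = ['command', 'guide', 'faq']) {
  const groups = new Map(order.map((k) => [k, []]));
  for (const item of items) {
    if (!groups.has(item.kind)) groups.set(item.kind, []);
    groups.get(item.kind).push(item);
  }
  // Drop empty groups before rendering.
  return [...groups.entries()].filter(([, members]) => members.length > 0);
}

const grouped = groupByKind([
  { id: 1, kind: 'guide', label: 'Code splitting' },
  { id: 2, kind: 'command', label: 'build --analyze' },
  { id: 3, kind: 'guide', label: 'Tree shaking' }
]);
```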
Operational concerns
Keep index payloads small by curating candidate set and trimming metadata fields. If you need thousands of suggestions, partition by domain and load segment-specific indexes on demand. For model lifecycle, avoid eager loading at page start; initialize after first input focus. This preserves core page performance while keeping autocomplete responsive.
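Partitioning by domain can be as simple as routing the current UI context to a segment-specific index URL before fetching. The URL scheme and segment names below are illustrative, not a convention of altor-vec:

```javascript
// Map a UI context to the index segment it should load, with a small
// default segment as fallback. Segments are fetched lazily, e.g. on first
// input focus rather than at page load.
const SEGMENTS = {
  docs: '/autocomplete-index.docs.bin',
  api: '/autocomplete-index.api.bin',
  support: '/autocomplete-index.support.bin'
};

function indexUrlFor(domain) {
  return SEGMENTS[domain] ?? '/autocomplete-index.core.bin';
}
```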
Failure modes and fallbacks
Embedding model unavailable? Fall back to lexical suggestions and show a subtle degraded-mode indicator. Index missing? Hide semantic score and use static shortcut lists. Avoid throwing hard errors inside input handlers; autocomplete should always remain interactive even if advanced ranking path fails.
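A sketch of that fallback path: wrap the semantic call and degrade to lexical results instead of throwing. The function names here are illustrative:

```javascript
// Never let the semantic path break the input: on any failure, return
// lexical suggestions plus a flag the UI can use for a degraded-mode hint.
async function suggest(query, semanticFn, lexicalFn) {
  try {
    return { items: await semanticFn(query), degraded: false };
  } catch {
    return { items: lexicalFn(query), degraded: true };
  }
}
```

The `degraded` flag lets the UI show its subtle indicator without the ranking code knowing anything about rendering.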
Conclusion
Embedding-based autocomplete gives frontend teams a practical quality upgrade over pure prefix matching, especially for intent-driven queries. With altor-vec, nearest-neighbor retrieval runs directly in the browser and typically contributes minimal latency compared to embedding generation. A hybrid ranker with lexical boosts plus semantic relevance usually delivers the best user experience and keeps behavior predictable for exact terms.