How to Add Semantic Search to Your Website in 5 Minutes

If your site search depends on exact keyword matches, users often fail to find content they would consider relevant. A query like "cheap vector db" might not match an article titled "cost-efficient approximate nearest neighbor retrieval", even though both describe the same intent. Semantic search in JavaScript closes that gap by ranking results by vector similarity rather than raw token overlap. This tutorial shows the shortest production-capable path using altor-vec, a browser-native HNSW engine (54KB, sub-millisecond queries, no backend required for retrieval).

Install altor-vec: npm install altor-vec

Architecture in one minute

A practical semantic retrieval stack has four parts: (1) content chunking, (2) embedding generation, (3) vector index construction, and (4) client-side querying. The first three can run in CI or local tooling, then you deploy two artifacts: index.bin and metadata.json. At runtime the browser loads both, computes a query embedding, and asks the WASM engine for top-k nearest vectors. Because search is local, latency is stable and independent of network roundtrips.

Why this pattern is fast: network latency is usually 30–200ms end-to-end. Local ANN lookup is often below 1ms. If your embedding model is local too, you can keep total interactive latency surprisingly low while preserving user privacy.

Step 1 — Install dependencies

npm install altor-vec @huggingface/transformers

You only need altor-vec for retrieval; @huggingface/transformers is included here for a local embedding path. On many teams, embeddings are generated offline in a Node pipeline so the browser runtime stays lightweight.

Step 2 — Build embeddings and index offline

Below is a Node script that reads your content, generates normalized vectors, and creates an HNSW index with conservative defaults. The script writes files that your frontend can serve from /public.

import fs from 'node:fs/promises';
import init, { WasmSearchEngine } from 'altor-vec';
import { pipeline } from '@huggingface/transformers';

const docs = JSON.parse(await fs.readFile('./content.json', 'utf8'));
const embed = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

const vectors = [];
const metadata = [];

for (let i = 0; i < docs.length; i++) {
  // Embed title + body together so both contribute to the vector.
  const text = docs[i].title + '\n' + docs[i].body;
  // Mean pooling + normalize:true yields a unit-length vector, so
  // cosine similarity reduces to a dot product at query time.
  const out = await embed(text, { pooling: 'mean', normalize: true });
  vectors.push(...Array.from(out.data));
  metadata.push({ id: i, slug: docs[i].slug, title: docs[i].title, excerpt: docs[i].excerpt });
}

await init();
const dim = 384; // all-MiniLM-L6-v2 output dimension
const flat = new Float32Array(vectors);
// Arguments after dim: M = 16, ef_construction = 200, ef_search = 50
const engine = WasmSearchEngine.from_vectors(flat, dim, 16, 200, 50);

await fs.writeFile('./public/index.bin', Buffer.from(engine.to_bytes()));
await fs.writeFile('./public/metadata.json', JSON.stringify(metadata));

The two hyperparameters that matter most initially are M and ef_search. Higher values improve recall but increase memory and query cost. Start with the defaults above, then tune against measured recall@k on your own query set.
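The metric itself is easy to compute once you have a brute-force baseline to compare against. The helper below is a small sketch (the function name `recallAtK` is ours, not part of altor-vec):

```javascript
// recall@k: what fraction of the true top-k ids (from exact brute-force
// search) also appear in the ANN engine's top-k results.
function recallAtK(annIds, exactIds, k) {
  const ann = new Set(annIds.slice(0, k));
  let found = 0;
  for (const id of exactIds.slice(0, k)) {
    if (ann.has(id)) found++;
  }
  return found / k;
}

// Example: ANN returned [3, 1, 7], exact top-3 is [3, 1, 2] → recall@3 = 2/3
```

Average this over your query set; if recall@10 dips below your target, raise ef_search first, since it can be changed without rebuilding the index.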

Step 3 — Query from the browser

At runtime, load the index once and keep the engine in memory. Query vectors can come from a local browser model, from a server embedding API, or from precomputed vectors (for curated suggestions).

import init, { WasmSearchEngine } from 'altor-vec';

await init();
const [indexResp, metaResp] = await Promise.all([
  fetch('/index.bin'),
  fetch('/metadata.json'),
]);

const engine = new WasmSearchEngine(new Uint8Array(await indexResp.arrayBuffer()));
const metadata = await metaResp.json();

export function searchByVector(queryVector, topK = 5) {
  // engine.search returns a JSON string of [id, distance] pairs
  const pairs = JSON.parse(engine.search(new Float32Array(queryVector), topK));
  // Vectors are normalized, so 1 - distance works as a similarity score.
  return pairs.map(([id, distance]) => ({ ...metadata[id], score: 1 - distance }));
}
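To sanity-check the engine's rankings, and to generate ground truth for recall measurements, a brute-force exact scorer over the same flat vector array is handy. This is our own reference helper, not part of altor-vec; it assumes vectors are normalized, so the dot product equals cosine similarity:

```javascript
// Exact nearest neighbors by brute force over a flat Float32Array layout
// (the same layout passed to WasmSearchEngine.from_vectors).
function exactSearch(queryVector, flatVectors, dim, topK = 5) {
  const n = flatVectors.length / dim;
  const scored = [];
  for (let i = 0; i < n; i++) {
    let dot = 0;
    for (let d = 0; d < dim; d++) {
      dot += queryVector[d] * flatVectors[i * dim + d];
    }
    scored.push([i, dot]); // dot product = cosine similarity for unit vectors
  }
  scored.sort((a, b) => b[1] - a[1]); // highest similarity first
  return scored.slice(0, topK);
}
```

It is O(n·dim) per query, so it belongs in tests and tuning scripts, not in the hot path.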

Step 4 — Compare before/after quality

To verify impact, run a small benchmark with 30–50 representative user intents. Compare lexical ranking (BM25 or prefix) against vector retrieval. A common result is that semantic retrieval significantly improves queries with synonyms, abbreviations, and “problem description” phrasing. Example from docs search:

Query: "index vectors in browser"

Keyword baseline top result:
  "Using Web Workers for CPU-heavy tasks"

Semantic top result:
  "Build HNSW index with WasmSearchEngine.from_vectors"

Query: "privacy search no server"

Keyword baseline top result:
  "Pricing FAQ"

Semantic top result:
  "Client-side retrieval architecture and data egress guarantees"

For developers, this is the real value: users type intent, and results still align even when exact wording differs from your docs and UI copy.
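A minimal harness for that comparison could look like this, where `rank` is whichever retrieval function you are evaluating (lexical or semantic) and each test case pairs a query with the slug the user should find. The helper name and shape are ours, not from any library:

```javascript
// hit@k: fraction of queries whose expected document shows up in the top-k.
function hitAtK(cases, rank, k = 5) {
  let hits = 0;
  for (const { query, expectedSlug } of cases) {
    const top = rank(query, k);
    if (top.some((r) => r.slug === expectedSlug)) hits++;
  }
  return cases.length ? hits / cases.length : 0;
}
```

Run it once with your lexical ranker and once with the vector path, then compare the two numbers on the same 30–50 intents.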

Performance notes for production

1) Keep metadata small

Only include fields needed for result rendering. Large metadata payloads dominate download time and can erase perceived speed gains from fast ANN lookup.
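A small build-time pruning step enforces this; the field list below is illustrative — keep whatever your result template actually renders:

```javascript
// Keep only the fields the result list renders; drop bodies and other bulk.
const RENDER_FIELDS = ['slug', 'title', 'excerpt'];

function pruneMetadata(docs) {
  return docs.map((doc, i) => {
    const entry = { id: i };
    for (const field of RENDER_FIELDS) {
      if (field in doc) entry[field] = doc[field];
    }
    return entry;
  });
}
```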

2) Warm up early

Load index.bin after the initial page interaction, or when the search box gains focus, so the first query feels instant.
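One way to sketch that warm-up (the memoization pattern is generic; `loadEngine` stands in for your own index-loading function):

```javascript
// Kick off index loading at most once, no matter how many events fire.
let enginePromise = null;

function warmUp(loadEngine) {
  enginePromise ??= loadEngine(); // later calls reuse the in-flight promise
  return enginePromise;
}

// e.g. searchInput.addEventListener('focus', () => warmUp(loadEngine), { once: true });
```

Caching the promise rather than the result means concurrent triggers (focus plus hover, say) still produce a single network fetch.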

3) Worker isolation

Move embedding generation to a Web Worker so the UI stays responsive while users type. Retrieval itself is already fast; embedding is often the larger latency budget item, depending on the model.

4) Add lexical fallback

For exact identifiers (error codes, class names, route names), combine semantic ranking with a lexical boost. Hybrid ranking avoids “too semantic” misses for strict token lookups.
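A simple blend is a weighted sum of the semantic score and an exact-token overlap ratio. The `alpha` weight and the choice of fields below are illustrative starting points, not tuned values:

```javascript
// Combine the semantic score with a lexical boost for exact token matches.
function hybridScore(result, query, alpha = 0.8) {
  const tokens = query.toLowerCase().split(/\s+/).filter(Boolean);
  const haystack = `${result.title} ${result.excerpt ?? ''}`.toLowerCase();
  const matched = tokens.filter((t) => haystack.includes(t)).length;
  const lexical = tokens.length ? matched / tokens.length : 0;
  return alpha * result.score + (1 - alpha) * lexical;
}
```

Re-sort the top-k candidates by this blended score so exact identifiers like error codes surface even when their vectors are not the closest.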

Minimal React integration example

import { useEffect, useMemo, useState } from 'react';
import { searchByVector } from './search';

export function SearchBox({ embed }) {
  const [q, setQ] = useState('');
  const [results, setResults] = useState([]);

  useEffect(() => {
    let cancelled = false; // guards against out-of-order async results
    const run = async () => {
      if (!q.trim()) return setResults([]);
      const vec = await embed(q); // embed is supplied by the caller
      const next = searchByVector(vec, 6);
      if (!cancelled) setResults(next);
    };
    run(); // consider debouncing q before embedding in production
    return () => { cancelled = true; };
  }, [q, embed]);

  return (
    <div>
      <input value={q} onChange={(e) => setQ(e.target.value)} placeholder="Ask naturally" />
      <ul>{results.map((r) => <li key={r.id}>{r.title} ({r.score.toFixed(3)})</li>)}</ul>
    </div>
  );
}

Wrap-up

You can ship semantic search in JavaScript quickly if you separate offline indexing from online querying. The offline side handles content processing and vector generation; the online side only loads static artifacts and performs ANN lookup locally. That keeps operating cost near zero for retrieval while improving relevance over exact-match keyword search.

As your dataset grows, keep profiling memory, download size, and recall metrics, but the architecture remains the same: static index + browser retrieval + optional local embedding model. For many product docs, knowledge bases, and internal tools, this is the simplest path to “search that understands meaning” without introducing a dedicated vector backend.

CTA: npm install altor-vec · Star on GitHub