How to Build a Search Engine in JavaScript (Without a Backend)
In 2023, Algolia processed 1.7 trillion search queries. Most of those required a server, an index rebuild pipeline, and a credit card. But if your dataset fits in a few megabytes and your users already have a browser, you can build a surprisingly capable search engine that runs entirely client-side—no API keys, no rate limits, no infrastructure.
This isn't about substring matching or regex hacks. We're talking about semantic vector search that understands intent, runs in the browser, and ships as a single npm package.
Why Client-Side Search Isn't Stupid Anymore
Five years ago, this approach would've been laughable. Browsers were slow. JavaScript had no SIMD. Embedding models were gigabytes in size and required GPUs.
Today, WebAssembly gives you near-native speed. ONNX Runtime Web lets you run quantized transformer models in under 5MB. IndexedDB can store millions of vectors without choking. A typical product catalog, documentation site, or local-first app can embed its entire search index in the initial page load.
The result: zero-latency search that works offline, costs nothing to scale, and keeps user data local.
Architecture: Three Pieces
A client-side vector search engine has three parts:
1. An embedding model that converts text into fixed-length vectors. We'll use a quantized version of all-MiniLM-L6-v2, which produces 384-dimensional embeddings and runs at ~40ms per query in-browser.
2. A vector index that stores pre-computed embeddings and finds nearest neighbors fast. HNSW (Hierarchical Navigable Small World) is the standard. It trades a small amount of recall for 10-100x faster queries than brute-force cosine similarity.
3. A query pipeline that embeds the search term, retrieves top-k results, and optionally reranks them with metadata filters or recency boosts.
Step 1: Embed Your Content
Start by generating embeddings for everything you want to search. This happens once, at build time.
import { pipeline } from '@xenova/transformers';

const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

const docs = [
  { id: 1, text: "React hooks let you use state without classes" },
  { id: 2, text: "Vue 3 composition API provides reactive primitives" },
  { id: 3, text: "Svelte compiles components to vanilla JavaScript" }
];

const embeddings = await Promise.all(
  docs.map(async doc => {
    // The pipeline returns a Tensor; .data holds the raw Float32Array.
    const output = await embedder(doc.text, { pooling: 'mean', normalize: true });
    return { id: doc.id, vector: Array.from(output.data) };
  })
);
The output is an array of 384-dimensional float arrays. Serialize it to JSON or MessagePack and ship it with your app bundle. For 10,000 documents, expect ~15MB uncompressed, ~4MB gzipped.
Step 2: Build the Index
You could loop through all embeddings and calculate cosine similarity. That's O(n), which is fine for 1,000 items but unusable at 100,000.
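For reference, the brute-force version is only a few lines of plain JavaScript. Since the embeddings were normalized at build time, the dot product equals cosine similarity:

```javascript
// Brute-force nearest-neighbor search over normalized vectors.
// With unit-length vectors, dot product == cosine similarity.
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function bruteForceSearch(embeddings, queryVector, k) {
  return embeddings
    .map(({ id, vector }) => ({ id, score: dot(vector, queryVector) }))
    .sort((a, b) => b.score - a.score) // highest similarity first
    .slice(0, k);
}
```

This is the O(n) baseline HNSW is competing against: every query touches every vector.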
HNSW builds a multi-layer graph where each node connects to its nearest neighbors. Queries traverse the graph from coarse to fine, visiting only a fraction of the total vectors.
import { HNSWIndex } from 'altor-vec';

const index = new HNSWIndex({ dimensions: 384, maxElements: 10000 });

embeddings.forEach(({ id, vector }) => {
  index.addPoint(vector, id);
});

const serialized = index.serialize();
// Note: localStorage caps out around 5MB and only stores strings.
// For larger or binary indexes, persist to IndexedDB instead.
localStorage.setItem('search-index', serialized);
At query time, deserialize the index and search:
const index = HNSWIndex.deserialize(localStorage.getItem('search-index'));

// Embed the query the same way the documents were embedded.
const output = await embedder("state management in frontend frameworks", { pooling: 'mean', normalize: true });
const queryVector = Array.from(output.data);

const results = index.searchKnn(queryVector, 5);
// results = [{ id: 2, distance: 0.12 }, { id: 1, distance: 0.19 }, ...]
Typical query time: 5-15ms for a 10k vector index on a mid-range laptop.
Step 3: Add Metadata Filtering
Pure vector search ignores structure. If you need to filter by category, date, or custom fields, maintain a parallel metadata store and post-filter:
const metadata = new Map([
  [1, { category: 'react', date: '2023-01-15' }],
  [2, { category: 'vue', date: '2023-03-20' }],
  [3, { category: 'svelte', date: '2023-02-10' }]
]);

const rawResults = index.searchKnn(queryVector, 20);
const filtered = rawResults
  .filter(r => metadata.get(r.id).category === 'react')
  .slice(0, 5);
Retrieve more candidates (k=20) than you need (top 5) to account for filter attrition.
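The same post-filtering pass is also where the recency boost from the query-pipeline overview fits. A minimal sketch (the one-year decay constant and the 80/20 blend are illustrative, not tuned values):

```javascript
// Rerank vector-search hits by blending similarity with a recency boost.
// `results` are { id, distance } pairs; lower distance = better match.
// `metadata` maps id -> { date: 'YYYY-MM-DD', ... }.
function rerankWithRecency(results, metadata, now = Date.now()) {
  const msPerDay = 24 * 60 * 60 * 1000;
  return results
    .map(r => {
      const ageDays = (now - new Date(metadata.get(r.id).date).getTime()) / msPerDay;
      const recencyBoost = Math.exp(-ageDays / 365); // halves roughly every 8 months
      // Convert distance to similarity, then blend 80/20 with recency.
      const score = 0.8 * (1 - r.distance) + 0.2 * recencyBoost;
      return { ...r, score };
    })
    .sort((a, b) => b.score - a.score);
}
```

As with filtering, over-fetch candidates first so the rerank has something to reorder.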
Optimization: Quantize the Vectors
384 floats × 4 bytes = 1,536 bytes per embedding. For 10,000 docs, that's 15MB. You can cut this by 75% with uint8 quantization:
function quantize(vector) {
  const min = Math.min(...vector);
  const max = Math.max(...vector);
  const scale = (max - min) || 1; // guard against constant vectors
  const data = Uint8Array.from(vector, v => Math.round(((v - min) / scale) * 255));
  return { min, max, data }; // keep min/max so the mapping can be inverted
}
You'll lose ~2-3% recall, but query speed improves and bundle size shrinks to ~4MB uncompressed. For most use cases, the tradeoff is worth it.
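To search against quantized vectors, the bytes have to be mapped back to floats at query time. A sketch of the inverse, assuming the quantizer returned the per-vector min and max alongside the bytes (without those two values the original scale can't be recovered):

```javascript
// Inverse of uint8 min-max quantization. Each byte b maps back to
// min + (b / 255) * (max - min), so the worst-case per-component
// error is half a quantization step: (max - min) / 510.
function dequantize({ min, max, data }) {
  const scale = (max - min) || 1; // guard against constant vectors
  return Float32Array.from(data, b => min + (b / 255) * scale);
}
```

You can dequantize lazily during distance computation instead of materializing full float vectors, which keeps memory flat.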
Real-World Example: Documentation Search
Let's say you're indexing 5,000 MDX files from a Next.js docs site. At build time:
// scripts/build-search-index.js
import fs from 'fs';
import { pipeline } from '@xenova/transformers';
import { HNSWIndex } from 'altor-vec';

const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const docs = JSON.parse(fs.readFileSync('docs-manifest.json', 'utf8'));
const index = new HNSWIndex({ dimensions: 384, maxElements: 5000 });

for (const doc of docs) {
  const output = await embedder(doc.content, { pooling: 'mean', normalize: true });
  index.addPoint(Array.from(output.data), doc.id);
}

fs.writeFileSync('public/search-index.bin', index.serialize());
In your app:
const [index, setIndex] = useState(null);

useEffect(() => {
  fetch('/search-index.bin')
    .then(r => r.arrayBuffer())
    .then(buf => setIndex(HNSWIndex.deserialize(buf)));
}, []);

async function search(query) {
  if (!index) return [];
  const output = await embedder(query, { pooling: 'mean', normalize: true });
  const vector = Array.from(output.data);
  return index.searchKnn(vector, 10).map(r => docsById.get(r.id));
}
Total overhead: ~6MB (3MB model + 3MB index). First query after page load: ~200ms. Subsequent queries: <20ms.
When Not to Do This
Client-side vector search breaks down when:
- Your dataset exceeds 50MB compressed. Users won't wait.
- You need real-time updates from multiple users. There's no sync mechanism.
- You're indexing private data per-user. Embedding 100k unique documents per session isn't feasible.
For those cases, use Typesense, Meilisearch, or build a proper backend with Pinecone or Qdrant.
Why This Matters
Every search query you handle client-side is one you don't pay Algolia $1.50/1000 for. Every embedding you compute at build time is one you don't send to OpenAI's API at runtime. Every index you ship in the bundle is one less round-trip to a vector database.
For documentation, product catalogs, internal tools, and local-first apps, client-side search isn't a compromise. It's faster, cheaper, and more private than the server-side default.
Frequently Asked Questions
Can I use this for autocomplete?
Yes, but you'll want to optimize for sub-10ms queries. Use a smaller embedding model (e.g., a 128-dim distilled version), keep your index under 1,000 items, or fall back to prefix matching for the first few characters and vector search only after 3+ characters.
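That routing logic is simple to wire up. A sketch, where `vectorSearch` is a hypothetical async wrapper around the embedder and index from the earlier steps:

```javascript
// Route short queries to cheap prefix matching and longer ones to
// vector search. Prefix matching is fast enough for every keystroke;
// vector search kicks in once there's enough text to carry meaning.
async function autocomplete(query, docs, vectorSearch) {
  const q = query.trim().toLowerCase();
  if (q.length < 3) {
    return docs
      .filter(d => d.text.toLowerCase().startsWith(q))
      .slice(0, 5);
  }
  return vectorSearch(q, 5);
}
```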
How do I handle large documents?
Split them into chunks (e.g., 200-word paragraphs), embed each chunk separately, and store chunk IDs with parent document references. At query time, retrieve top chunks and deduplicate by parent document.
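A minimal sketch of that chunk-and-dedupe flow (the helper names are illustrative, not from any library):

```javascript
// Split a document into fixed-size word chunks, keeping a parent
// reference so hits can be collapsed back to whole documents.
function chunkDocument(doc, wordsPerChunk = 200) {
  const words = doc.text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < words.length; i += wordsPerChunk) {
    chunks.push({
      id: `${doc.id}#${chunks.length}`,
      parentId: doc.id,
      text: words.slice(i, i + wordsPerChunk).join(' '),
    });
  }
  return chunks;
}

// Collapse chunk-level hits to unique parent documents, best hit first.
function dedupeByParent(chunkHits, chunksById) {
  const seen = new Set();
  const docs = [];
  for (const hit of chunkHits) {
    const parentId = chunksById.get(hit.id).parentId;
    if (!seen.has(parentId)) {
      seen.add(parentId);
      docs.push({ parentId, distance: hit.distance });
    }
  }
  return docs;
}
```

Embed each chunk's `text` exactly as in Step 1; only the IDs and the dedupe step change.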
Does this work with languages other than English?
Yes. Multilingual models like paraphrase-multilingual-MiniLM-L12-v2 support 50+ languages. Expect slightly larger model sizes (~8MB) and slower inference (~60ms per query).
What about typos and exact match?
Vector search is typo-tolerant by design—"javsacript" and "javascript" have similar embeddings. For exact match (e.g., error codes, product SKUs), combine vector results with a keyword index using something like FlexSearch.
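One scale-free way to merge a keyword ranking with a vector ranking is reciprocal rank fusion, which only looks at positions, so you never have to normalize two incompatible score ranges. A sketch (k=60 is the conventional default, not a tuned value):

```javascript
// Merge ranked id lists with reciprocal rank fusion (RRF).
// Each list contributes 1 / (k + rank + 1) per id; ids that rank
// well in multiple lists accumulate the highest totals.
function reciprocalRankFusion(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) || 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Feed it the id lists from the vector index and the keyword index, then map the fused ids back to documents.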
Can I update the index without rebuilding everything?
HNSW supports incremental adds, but you'll need to re-serialize and re-download the index. For frequently changing data, consider a hybrid approach: static index for 95% of content, live API for the recent 5%.
Ready to ship search without a backend? Install altor-vec and start embedding: npm install altor-vec — github.com/Altor-lab/altor-vec