How to Build a Search Engine in JavaScript
Most tutorials on building a search engine in JavaScript teach you to filter arrays with .includes() or wire up Elasticsearch. Neither approach actually shows you how search engines work. One is too simple to be useful beyond a hundred records. The other offloads the interesting problem to a black box running in Docker.
If you want to understand semantic search — the kind that returns "react hooks tutorial" when someone types "modern state management guide" — you need to build with vectors. This tutorial walks through building a client-side search engine using embeddings, cosine similarity, and zero external APIs. Everything runs in the browser. You'll generate vectors, index documents, and query semantically in under 200 lines of JavaScript.
Why Vectors Instead of Keywords
Traditional search matches strings. "JavaScript performance" finds documents containing those exact words. It misses "optimizing JS bundle size" even though that's clearly related. Vector search converts text into numerical representations (embeddings) that capture meaning. Documents about the same concept cluster together in high-dimensional space. Your search query becomes a vector, and you rank results by geometric proximity.
The trade-off: vector search requires more upfront computation and roughly 1.5KB of storage per document for a 384-dimension float32 embedding (more when serialized as JSON text). For client-side use cases under 10,000 documents, that's perfectly viable. A blog with 500 posts needs well under 1MB of vector data — less than a single hero image.
Architecture Overview
We're building three pieces:
1. Embedding generator: Convert text to vectors using a small transformer model (all-MiniLM-L6-v2, 23MB) running via ONNX Runtime in the browser.
2. Vector index: Store document vectors with metadata in a flat array. For sub-10K document collections, brute-force cosine similarity is fast enough. No need for HNSW graphs yet.
3. Query engine: Embed the search query, compute similarity scores against all indexed vectors, return top-k results sorted by relevance.
Step 1: Generate Embeddings
First, install Transformers.js, which gives us access to pre-trained models that run entirely in the browser via WebAssembly:
npm install @xenova/transformers
Create an embedder module:
// embedder.js
import { pipeline } from '@xenova/transformers';

let embedder = null;

export async function initEmbedder() {
  if (!embedder) {
    embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
  }
  return embedder;
}

export async function embed(text) {
  const model = await initEmbedder();
  const output = await model(text, { pooling: 'mean', normalize: true });
  return Array.from(output.data);
}
The all-MiniLM-L6-v2 model produces 384-dimensional embeddings. It's small enough to load in a few seconds on decent connections. The normalize: true flag ensures vectors have unit length, which lets us use dot product instead of cosine similarity (they're equivalent for normalized vectors, but dot product is faster).
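You can verify the dot-product shortcut with a toy example. The helpers below (`normalize`, `magnitude`, `cosineSimilarity`) are small illustrative functions, and the 3-dimensional vectors stand in for real 384-dimension embeddings:

```javascript
// For unit-length vectors, cosine similarity reduces to a plain dot product.
function dotProduct(a, b) {
  return a.reduce((sum, val, i) => sum + val * b[i], 0);
}

function magnitude(v) {
  return Math.sqrt(dotProduct(v, v));
}

function normalize(v) {
  const m = magnitude(v);
  return v.map(x => x / m);
}

function cosineSimilarity(a, b) {
  return dotProduct(a, b) / (magnitude(a) * magnitude(b));
}

// Toy 3-dimensional vectors standing in for real embeddings.
const a = normalize([1, 2, 3]);
const b = normalize([2, 1, 0]);

// Both measures agree once the inputs are normalized.
console.log(dotProduct(a, b).toFixed(6) === cosineSimilarity(a, b).toFixed(6)); // true
```

This is why the tutorial's search code can get away with the cheaper dot product: the model already normalized the vectors for us.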
Step 2: Index Your Documents
Assume you have an array of blog posts or documentation pages. Each needs an ID, content, and any metadata you want to filter on:
const documents = [
  { id: 1, title: 'React Hooks Guide', content: 'Learn about useState, useEffect...' },
  { id: 2, title: 'Vue Composition API', content: 'Modern reactive patterns in Vue 3...' },
  // ...
];
async function buildIndex(docs) {
  const index = [];
  for (const doc of docs) {
    const text = `${doc.title} ${doc.content}`.slice(0, 500); // Use first 500 chars
    const vector = await embed(text);
    index.push({
      id: doc.id,
      vector,
      title: doc.title,
      content: doc.content.slice(0, 200) // Store snippet for display
    });
  }
  return index;
}
This takes a few seconds for 100 documents, maybe 30 seconds for 1,000. You'd typically run this at build time and serialize the index to JSON. Load it at runtime instead of recomputing embeddings on every page load:
// At build time (a Node script)
import fs from 'node:fs';

const index = await buildIndex(documents);
fs.writeFileSync('search-index.json', JSON.stringify(index));

// At runtime, in the browser
const index = await fetch('/search-index.json').then(r => r.json());
Step 3: Implement Similarity Search
Dot product between normalized vectors gives us a similarity score from -1 to 1. Higher is more similar:
function dotProduct(a, b) {
  return a.reduce((sum, val, i) => sum + val * b[i], 0);
}

async function search(query, index, topK = 10) {
  const queryVector = await embed(query);
  const results = index.map(doc => ({
    ...doc,
    score: dotProduct(queryVector, doc.vector)
  }));
  results.sort((a, b) => b.score - a.score);
  return results.slice(0, topK);
}
That's it. The brute-force approach does 384 multiplications and additions per document — a few million arithmetic operations for 5,000 documents, which a modern CPU finishes in single-digit milliseconds. In practice, embedding the query is the slow step, not the scoring.
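If profiling ever shows the reduce-based dot product as a bottleneck, an explicit loop over Float32Arrays usually helps. This is an optional micro-optimization sketch, not part of the core tutorial code; `dotProductFast` and `toTyped` are illustrative names:

```javascript
// Same dot product as before, but an explicit loop over typed arrays.
// JS engines optimize this pattern well: no per-element closure calls.
function dotProductFast(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Convert each document's vector once, at index-load time, not per query.
function toTyped(index) {
  return index.map(doc => ({ ...doc, vector: Float32Array.from(doc.vector) }));
}

const typedIndex = toTyped([{ id: 1, vector: [0.6, 0.8] }]);
const score = dotProductFast(Float32Array.from([0.6, 0.8]), typedIndex[0].vector);
console.log(score); // ≈ 1, since both vectors are (nearly) unit length and identical
```

Float32Arrays also halve the in-memory footprint compared to plain number arrays.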
Making It Production-Ready
The code above works but needs a few refinements:
Chunking large documents: If your docs exceed 512 tokens (roughly 400 words), split them into overlapping chunks and index each separately. When a chunk matches, return its parent document.
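As a sketch of the chunking idea, here is a character-based splitter with overlap. Character counts stand in for real token counts, and `chunkSize`, `overlap`, and `chunkDocument` are illustrative choices, not a prescribed API:

```javascript
// Split long text into overlapping chunks so each fits the model's window.
// Character-based for simplicity; a real tokenizer would count tokens.
function chunkText(text, chunkSize = 1600, overlap = 200) {
  const chunks = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
    start += chunkSize - overlap; // step forward, keeping some shared context
  }
  return chunks;
}

// Each chunk is indexed separately but remembers its parent document,
// so a chunk-level match can be reported as a document-level result.
function chunkDocument(doc) {
  return chunkText(doc.content).map((content, i) => ({
    id: `${doc.id}-${i}`,
    parentId: doc.id,
    content,
  }));
}
```

The overlap matters: without it, a sentence straddling a chunk boundary is split in half and neither chunk embeds it well.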
Hybrid search: Combine vector similarity with keyword filters. If someone searches "react hooks 2024," filter by date first, then rank by semantic similarity within that subset.
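A minimal sketch of that filter-then-rank flow, assuming each indexed entry carries a `year` field (an assumption for illustration). The embedding function is injected as `embedFn` so the sketch works with any embedder:

```javascript
// Hybrid search: apply cheap metadata filters first, then rank only the
// survivors by semantic similarity.
function dotProduct(a, b) {
  return a.reduce((sum, val, i) => sum + val * b[i], 0);
}

async function hybridSearch(query, index, { year, topK = 10 }, embedFn) {
  // Filter first: fewer candidates means fewer dot products.
  const candidates = year ? index.filter(doc => doc.year === year) : index;
  const queryVector = await embedFn(query);
  return candidates
    .map(doc => ({ ...doc, score: dotProduct(queryVector, doc.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

Parsing "2024" out of the query string itself is a separate problem; a pragmatic shortcut is exposing the filter as a UI control instead.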
Web Workers: Move embedding and search to a worker thread so they don't block the UI. Transformers.js runs inside a Web Worker without special configuration — post the query string to the worker and post the ranked results back.
Caching embeddings: Store the query vector for common searches in IndexedDB. "getting started" probably gets searched a lot — no need to re-embed it every time.
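A sketch of that caching idea using an in-memory Map (a real implementation might persist to IndexedDB so the cache survives page loads). `createCachedEmbedder` is an illustrative name, and `embedFn` is injected for testability:

```javascript
// Memoize query embeddings: repeated searches skip the model entirely.
function createCachedEmbedder(embedFn) {
  const cache = new Map();
  return async function cachedEmbed(text) {
    // Normalize the key so "Getting Started" and "getting started " hit
    // the same cache entry.
    const key = text.trim().toLowerCase();
    if (!cache.has(key)) {
      cache.set(key, await embedFn(key));
    }
    return cache.get(key);
  };
}
```

Wrap the embedder once at startup — `const embed = createCachedEmbedder(rawEmbed)` — and the rest of the search code stays unchanged.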
When This Approach Breaks Down
Client-side vector search stops making sense somewhere around 50,000 documents. At that scale you're shipping tens of megabytes of vector data to the client, and both download time and search latency start to hurt. You'd want to move to a server-side solution with approximate nearest neighbor indexes (HNSW, IVF) such as Qdrant or Weaviate.
But for documentation sites, personal blogs, product catalogs under 10K items, or any use case where you want instant search without server costs, this architecture works beautifully. GitHub Pages, Netlify, Vercel — all just serving static JSON.
Working Example
Here's a minimal HTML page that ties everything together:
<input type="text" id="query" placeholder="Search..." />
<div id="results"></div>

<script type="module">
  // Assumes the search() and dotProduct() functions from Step 3 are
  // exported from a search.js module (search() calls embed() internally).
  import { search } from './search.js';

  const index = await fetch('/search-index.json').then(r => r.json());

  document.getElementById('query').addEventListener('input', async (e) => {
    const query = e.target.value;
    if (query.length < 3) return;
    const results = await search(query, index, 5);
    document.getElementById('results').innerHTML = results
      .map(r => `<div><strong>${r.title}</strong> (${r.score.toFixed(2)})</div>`)
      .join('');
  });
</script>
Type "state management," and you'll see results ranked by semantic similarity, not keyword overlap. It feels like magic the first time you search for "beginner JS tutorial" and get back "Introduction to JavaScript" even though those phrases share only one word.
FAQ
Can I use OpenAI embeddings instead of running a local model?
Yes, but you'd need to call their API server-side at build time to generate embeddings. The advantage of Transformers.js is everything runs in the browser with no API keys or ongoing costs. For private data or air-gapped environments, local models are the only option.
How accurate is semantic search compared to Elasticsearch?
For pure semantic matching, vector search often outperforms keyword-based BM25 scoring. Elasticsearch 8+ also supports vector search natively through its kNN search API, which lets you combine both approaches. A small local model like MiniLM won't match the nuance of large hosted embedding models, but it's surprisingly good for most domains.
What if I need to update the index frequently?
Regenerate the index and push the new JSON file. With a build step, this happens automatically on content changes. For truly dynamic data (user-generated content changing every second), you'd need a different architecture — probably server-side with incremental index updates.
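For small, mostly static datasets there is a middle ground: re-embed only the document that changed and splice it into the existing index. A sketch, with `upsertDocument` as a hypothetical helper and `embedFn` injected:

```javascript
// Incremental update: embed one new or changed document and upsert it
// into the index array instead of rebuilding everything.
async function upsertDocument(index, doc, embedFn) {
  const vector = await embedFn(`${doc.title} ${doc.content}`.slice(0, 500));
  const entry = {
    id: doc.id,
    vector,
    title: doc.title,
    content: doc.content.slice(0, 200),
  };
  const pos = index.findIndex(d => d.id === doc.id);
  if (pos === -1) index.push(entry);
  else index[pos] = entry; // replace the stale entry in place
  return index;
}
```

You'd still serialize and re-upload the JSON afterwards; the savings are in skipping thousands of unchanged embeddings.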
Does this work on mobile browsers?
Yes. ONNX Runtime targets WebAssembly, which runs on iOS Safari and Android Chrome. The initial model download (23MB) is the main concern on slower connections. Consider lazy-loading the search feature or showing a loading state.
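One way to lazy-load, sketched below: defer the model download until the user first focuses the search box. `lazyOnce` is an illustrative helper; the `./embedder.js` path refers to the module from Step 1:

```javascript
// Generic once-only lazy loader: the expensive loader runs at most once,
// and every caller shares the same promise.
function lazyOnce(loader) {
  let promise = null;
  return () => promise ?? (promise = loader());
}

// Hypothetical wiring: start the 23MB model download on first focus.
const loadSearchEngine = lazyOnce(() =>
  import('./embedder.js').then(m => m.initEmbedder())
);

// Guarded so the helper itself also works outside a browser.
if (typeof document !== 'undefined') {
  document.getElementById('query')
    ?.addEventListener('focus', loadSearchEngine, { once: true });
}
```

Kicking off the download on focus rather than on first keystroke buys a second or two of head start while the user is still typing.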