How to Build a Search Engine in JavaScript (Without a Backend)

Most tutorials on building a search engine in JavaScript start with Express, a database, and some variation of "now just plug in Elasticsearch." That's not a search engine. That's a thin client wrapping someone else's infrastructure.

Here's a different approach: build a legitimate semantic search engine that runs entirely in the browser using vector embeddings, cosine similarity, and client-side indexing. No server. No API calls after the initial page load. Just JavaScript, Web Workers, and about 300 lines of code.

This isn't a toy. GitHub's repository file finder filters paths entirely client-side, and full-text libraries like FlexSearch and Lunr.js run in the browser at real scale. The constraint isn't capability; it's that most developers never learned how vector search actually works under the hood.

Why Client-Side Vector Search Matters

Traditional keyword search matches exact strings. Vector search converts text into high-dimensional numerical representations (embeddings) and finds semantically similar content through distance calculations. When you search "fix broken authentication" and it surfaces a doc titled "Troubleshooting login errors," that's vector search working.
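The distance calculation at the heart of this is cosine similarity. Here is a minimal version in plain JavaScript, with toy 3-dimensional vectors standing in for real embeddings (which have hundreds of dimensions):

```javascript
// Cosine similarity: the dot product of two vectors divided by the
// product of their magnitudes. Ranges from -1 (opposite) to 1 (identical).
function cosineSimilarity(a, b) {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Toy vectors: the first pair points in nearly the same direction
// (semantically similar), the second pair does not.
cosineSimilarity([0.2, 0.9, 0.1], [0.25, 0.85, 0.15]); // close to 1
cosineSimilarity([0.2, 0.9, 0.1], [0.9, 0.1, 0.4]);    // much lower
```

The vector values here are made up for illustration; a real model assigns them so that semantically related texts end up pointing in similar directions.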

Running this in the browser means:

Zero latency after initial load. No network roundtrips.

Privacy by default. User queries never leave their machine.

No backend costs. Host static files on Cloudflare Pages and call it done.

The tradeoff is dataset size. You're limited to what fits in browser memory—call it 50MB of embeddings maximum, which covers roughly 30,000 documents at 384-dimensional float32 vectors (about 1.5KB each). For documentation sites, internal tools, or product catalogs, that's plenty.

The Architecture in Three Pieces

First, generate embeddings offline using a model like all-MiniLM-L6-v2. This produces 384-dimensional vectors that capture semantic meaning. You do this once during build time, not in the browser.

Second, ship those embeddings as a compact binary format or JSON to the client. Use a Web Worker to avoid blocking the main thread during indexing.

Third, when a user searches, convert their query to a vector using the same model (via ONNX Runtime or transformers.js), then calculate cosine similarity against your index. Return the top K results ranked by similarity score.

Let's build it.

Step 1: Generate Embeddings Offline

You need Node.js and a sentence transformer model. Install @xenova/transformers, which runs models via ONNX:

npm install @xenova/transformers

Create a build script that processes your documents:

import { pipeline } from '@xenova/transformers';

const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

const documents = [
  { id: 1, text: "JavaScript closures explained with examples" },
  { id: 2, text: "Understanding async/await in Node.js" },
  { id: 3, text: "How to debug memory leaks in Chrome DevTools" }
];

const embeddings = await Promise.all(
  documents.map(async (doc) => {
    const output = await extractor(doc.text, { pooling: 'mean', normalize: true });
    return {
      id: doc.id,
      vector: Array.from(output.data)
    };
  })
);

console.log(JSON.stringify(embeddings));

This generates normalized 384-dimensional vectors. The normalize: true flag is critical—it lets you use dot product instead of cosine similarity later, which is faster.

Save this output to embeddings.json and ship it with your frontend bundle.

Step 2: Build the Browser Search Index

In your main application, load the embeddings and set up a Web Worker to handle search without freezing the UI:

// search-worker.js
let index = [];

self.addEventListener('message', async (e) => {
  if (e.data.type === 'init') {
    index = e.data.embeddings;
    self.postMessage({ type: 'ready' });
  }
  
  if (e.data.type === 'search') {
    const results = search(e.data.queryVector, 10);
    self.postMessage({ type: 'results', results });
  }
});

function search(queryVector, topK) {
  const scores = index.map(item => ({
    id: item.id,
    score: dotProduct(queryVector, item.vector)
  }));
  
  return scores
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

function dotProduct(a, b) {
  return a.reduce((sum, val, i) => sum + val * b[i], 0);
}

Initialize the worker in your main thread:

const worker = new Worker('search-worker.js');

fetch('/embeddings.json')
  .then(r => r.json())
  .then(embeddings => {
    worker.postMessage({ type: 'init', embeddings });
  });

worker.addEventListener('message', (e) => {
  if (e.data.type === 'results') {
    displayResults(e.data.results);
  }
});

Step 3: Convert Queries to Vectors in Real-Time

When the user types a search query, you need to convert it to a vector using the same model. This is where transformers.js shines—it runs inference directly in the browser via WebAssembly.

import { pipeline } from '@xenova/transformers';

let queryExtractor;

async function initSearch() {
  queryExtractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
}

async function handleSearch(queryText) {
  const output = await queryExtractor(queryText, { pooling: 'mean', normalize: true });
  const queryVector = Array.from(output.data);
  
  worker.postMessage({ type: 'search', queryVector });
}

First load takes 2-3 seconds as the model downloads (about 23MB for all-MiniLM-L6-v2). Cache it in IndexedDB or rely on browser HTTP cache. Subsequent queries return in under 100ms.
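Since each query runs model inference, you don't want handleSearch firing on every keystroke. A plain debounce (no library assumed) limits it to one inference per pause in typing:

```javascript
// Debounce: delay the call until the user stops typing for `delay` ms.
// Each new keystroke cancels the previous pending call.
function debounce(fn, delay) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), delay);
  };
}

// Wire it to the search input; 200ms is a reasonable starting point:
// const debouncedSearch = debounce(handleSearch, 200);
// input.addEventListener('input', e => debouncedSearch(e.target.value));
```

The wiring lines are commented out because they assume the handleSearch function and input element from the surrounding code.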

Optimization: Quantization and Compression

Shipping 10,000 embeddings at 384 dimensions is roughly 15MB as raw float32, and JSON text inflates that several-fold before compression. Two quick wins:

Use Float32Array instead of JSON arrays. Binary formats compress better and parse faster.

Quantize vectors to 8-bit integers. You lose about 2% accuracy but cut size by 75%. Some vector search libraries handle this for you, but you can roll your own with simple min-max scaling.
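The first win can be sketched as follows, assuming every vector shares one dimension (384 here) and document IDs are stored in a separate parallel array: pack all vectors into a single Float32Array at build time, then view slices of the fetched buffer at load time with no per-vector parsing.

```javascript
// Build time: pack all vectors into one contiguous Float32Array.
// Write packed.buffer to a file such as embeddings.bin.
function packVectors(embeddings, dim = 384) {
  const packed = new Float32Array(embeddings.length * dim);
  embeddings.forEach((item, i) => packed.set(item.vector, i * dim));
  return packed;
}

// Load time: each vector becomes a zero-copy view into the buffer.
function unpackVectors(buffer, dim = 384) {
  const all = new Float32Array(buffer);
  const vectors = [];
  for (let i = 0; i < all.length; i += dim) {
    vectors.push(all.subarray(i, i + dim)); // a view, not a copy
  }
  return vectors;
}
```

In the browser, fetch('/embeddings.bin').then(r => r.arrayBuffer()) replaces the JSON parse, and gzip or Brotli on the static host handles the rest of the compression.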

Here's a minimal quantization approach:

function quantize(vector) {
  const min = Math.min(...vector);
  const max = Math.max(...vector);
  const range = max - min || 1; // guard against a constant vector
  const data = vector.map(v =>
    Math.round(((v - min) / range) * 255)
  );
  return { min, max, data };
}

Store min/max values alongside the quantized vector, then dequantize during similarity calculation.
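The reverse step, given the min/max stored with each vector, is one line of scaling per component:

```javascript
// Dequantize: map 8-bit values in [0, 255] back to the original
// [min, max] range of the source vector.
function dequantize(quantized, min, max) {
  const scale = (max - min) / 255;
  return Array.from(quantized, q => q * scale + min);
}
```

Round-tripping loses a little precision (hence the roughly 2% accuracy hit), but similarity rankings are robust to it.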

When to Skip the DIY Route

If you need approximate nearest neighbor search for 100K+ documents, build-it-yourself stops making sense. At that scale, you want HNSW indexes, SIMD optimizations, and memory-mapped file I/O—basically, you want a library.

For client-side TypeScript projects under 10K documents, altor-vec handles the quantization, indexing, and worker management out of the box. It's 12KB gzipped and works with any sentence transformer model:

npm install altor-vec

It gives you the same architecture described here but with better compression and a cleaner API. Check the repo for examples: github.com/Altor-lab/altor-vec.

What You Actually Built

This isn't a search box with autocomplete. It's a semantic search engine that understands intent, runs privately in the user's browser, and returns results in under 100ms after initial load. You converted text to meaning, indexed meaning as vectors, and ranked results by semantic similarity—all without touching a server.

The same principles scale to image search (CLIP embeddings), code search (CodeBERT), or multilingual search (multilingual sentence transformers). The constraint is dataset size, not capability.

Most developers never build this because they assume search requires infrastructure. It doesn't. It requires math, a good model, and 300 lines of JavaScript.

Frequently Asked Questions

Can I use this for a production app with thousands of users?

Yes, if your dataset fits in browser memory (under 50MB of embeddings). GitHub's in-repo file finder takes the same client-side approach, though with keyword matching rather than embeddings. The bottleneck is initial model download time, which you mitigate with aggressive HTTP caching and service workers.

How do I handle typos or exact keyword matches?

Vector search handles typos naturally through semantic similarity. For exact matches, run a parallel BM25 index using something like fuzzysort or flexsearch, then blend results. Hybrid search (vector + keyword) consistently outperforms either alone.
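One common way to blend the two result lists is reciprocal rank fusion (RRF), which combines rankings without needing to normalize the raw scores of either system. A minimal sketch (the constant 60 is the conventional default):

```javascript
// Reciprocal rank fusion: score each document as the sum of
// 1 / (k + rank) across every ranked list it appears in.
// Documents ranked highly in either list float to the top.
function reciprocalRankFusion(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) || 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Each list is an array of document IDs, best match first:
reciprocalRankFusion([
  [3, 1, 7], // vector search results
  [1, 7, 9]  // BM25 keyword results
]); // doc 1 ranks first: it is high in both lists
```

The document IDs here are made up; in practice the two lists come from your vector index and your BM25 index over the same corpus.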

What's the performance difference between dot product and cosine similarity?

If your vectors are normalized (which they should be), dot product and cosine similarity produce identical rankings. Dot product is faster because it skips the magnitude calculation. Always normalize during embedding generation, not at query time.
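If you ever need to normalize manually (say, a pipeline that doesn't expose a normalize option), it's a divide-by-magnitude:

```javascript
// L2-normalize: scale the vector to unit length, so the dot product
// of two normalized vectors equals their cosine similarity.
function l2Normalize(vector) {
  const magnitude = Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0));
  return vector.map(v => v / magnitude);
}
```

After this, dot(a, b) and cosine(a, b) are the same number for any pair of normalized vectors, which is exactly why the worker above can get away with the cheaper dot product.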

Do I need TypeScript for this?

No, but you'll want it. Vector operations involve a lot of array manipulation, and TypeScript catches dimension mismatches at compile time. The altor-vec library is fully typed if you go that route.

Start building: npm install altor-vec — Full examples and API docs at github.com/Altor-lab/altor-vec