What does building a vector search engine in JavaScript (browser AI) involve?

Building a vector search engine in JavaScript with altor-vec means running HNSW-powered vector search entirely in the browser: no server, no API keys, no per-query cost.

Does this require a backend server?

No. altor-vec ships as a 54KB WASM module that runs in the browser, so you can add vector search with zero backend infrastructure.

How do I install altor-vec?

Install via npm: npm install altor-vec. Full API reference and live demos at altorlab.dev.

How to Build a Search Engine in JavaScript (Without a Backend)

Most tutorials on building a search engine in JavaScript hand you Elasticsearch, a Node.js server, and a bill for $200/month in hosting. This one doesn't. You're going to build a semantic search engine that runs entirely in the browser using vector embeddings, stores its index in IndexedDB, and, once the query has been embedded, ranks a 10,000-document corpus in under 50ms.

No API calls. No server. No infrastructure. Just a few dozen kilobytes of WASM and some surprisingly elegant math.

The trick is understanding that modern search isn't about substring matching anymore. It's about meaning. When someone types "async bugs in React," they want results about concurrency issues, race conditions, useEffect cleanup - not just documents containing those exact words. Vector embeddings let you capture that semantic similarity, and browser-based vector search libraries like altor-vec let you do it without shipping queries to a third party.

Why Client-Side Vector Search Works Now

Three things changed in the last 18 months that make this viable:

WebAssembly SIMD operations run fast enough to compute cosine similarity across thousands of vectors without noticeably blocking the main thread. Quantized embeddings (float32 converted to uint8) cut memory requirements by 75%, because each component shrinks from 4 bytes to 1. And the Transformers.js project made it possible to run embedding models like all-MiniLM-L6-v2 directly in the browser at ~200ms per query on midrange hardware.

The result: you can embed a query as a 384-dimensional vector, compare it against 10,000 pre-computed document vectors, and return ranked results. Embedding the query is the slow part. The comparison and ranking can finish faster than a round-trip to a CDN edge node.
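The comparison step can be made concrete with a minimal sketch of an exact linear scan in plain JavaScript. It assumes every vector is already normalized and packed row after row into one flat Float32Array. `topK` is a hypothetical helper for illustration and is not part of altor-vec's API.

```javascript
// Exact (brute-force) k-nearest-neighbor search over normalized embeddings.
// `vectors` is one flat Float32Array holding numDocs * dims values, row after row.
function topK(vectors, dims, query, k) {
  const numDocs = vectors.length / dims;
  const scores = [];
  for (let doc = 0; doc < numDocs; doc++) {
    // For unit-length vectors, the dot product equals cosine similarity.
    let dot = 0;
    const offset = doc * dims;
    for (let i = 0; i < dims; i++) dot += vectors[offset + i] * query[i];
    scores.push({ index: doc, score: dot });
  }
  // Highest similarity first; keep the best k.
  scores.sort((a, b) => b.score - a.score);
  return scores.slice(0, k);
}
```

Sorting every score costs O(n log n), and a bounded heap would avoid that. At 10,000 documents, though, the 3.84 million multiply-adds in the dot products dominate anyway.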

Architecture: How This Actually Works

You need three pieces. An embedding model that converts text into vectors. A vector store that holds your document embeddings and metadata. A search function that embeds the query and finds the nearest neighbors.

Here's the simplest possible implementation:

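// Run as an ES module (top-level await). VectorStore, add() and search() follow
// this article's usage; confirm names and signatures in the altor-vec README.
import { VectorStore } from 'altor-vec';
import { pipeline } from '@xenova/transformers';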

const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const store = new VectorStore({ dimensions: 384 });

// Index documents
const docs = [
  { id: 1, text: 'JavaScript async patterns with promises and async/await' },
  { id: 2, text: 'Debugging race conditions in React useEffect hooks' },
  { id: 3, text: 'Memory leaks from uncleaned event listeners' }
];

for (const doc of docs) {
  const embedding = await embedder(doc.text, { pooling: 'mean', normalize: true });
  store.add(doc.id, embedding.data, doc);
}

// Search
async function search(query) {
  const queryEmbedding = await embedder(query, { pooling: 'mean', normalize: true });
  return store.search(queryEmbedding.data, { limit: 5 });
}

const results = await search('async bugs in React');
console.log(results); // Should rank doc 2 first, even though doc 1 repeats the keyword "async"

This is about twenty lines. It works. And it understands that "async bugs in React" is semantically closer to "race conditions in React useEffect hooks" than to "async/await patterns," even though the latter repeats the query's most distinctive keyword, "async."

The Embedding Model Problem

The all-MiniLM-L6-v2 model is 23MB. Loading it on every page view is unacceptable. The solution is to pre-compute embeddings at build time for your document corpus, ship only the vectors and metadata, and lazy-load the model only when the user actually opens the search interface.
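One way to load the model only when it's first needed is to wrap the loader in a memoized promise. This is a minimal sketch. `lazy`, `getEmbedder` and `searchInput` are hypothetical names. The commented-out loader assumes the Transformers.js `pipeline` function used elsewhere in this article.

```javascript
// Memoize an async loader so the expensive work runs at most once,
// and only when first needed. Concurrent callers share the same promise.
function lazy(loader) {
  let promise = null;
  return function get() {
    if (promise === null) {
      promise = loader().catch((err) => {
        promise = null; // allow a retry if the download failed
        throw err;
      });
    }
    return promise;
  };
}

// In the browser, the loader could look like this (assumption, not run here):
// const getEmbedder = lazy(() =>
//   import('@xenova/transformers').then(({ pipeline }) =>
//     pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2')));
// searchInput.addEventListener('focus', () => getEmbedder(), { once: true });
```

Starting the download on focus instead of on the first keystroke hides some of the load time behind the user's typing.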

Build-time indexing script:

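// build-index.js (run with Node as an ES module, e.g. "type": "module" in package.json)
import { pipeline } from '@xenova/transformers';
import fs from 'fs';

const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const docs = JSON.parse(fs.readFileSync('docs.json', 'utf8'));

// Embed one document at a time to keep memory use predictable;
// Promise.all would start every inference at once.
const index = [];
for (const doc of docs) {
  // Same model, pooling and normalization as the query-time embedder.
  const emb = await embedder(doc.text, { pooling: 'mean', normalize: true });
  index.push({
    id: doc.id,
    // Rounding to 4 decimals makes the JSON several times smaller.
    embedding: Array.from(emb.data, (v) => Number(v.toFixed(4))),
    metadata: { title: doc.title, url: doc.url }
  });
}

fs.writeFileSync('search-index.json', JSON.stringify(index));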

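Now your client-side code loads a JSON file of pre-computed vectors (a few kilobytes per document) instead of a 23MB model. The model only loads when the user types their first query, and then it stays cached. Query embeddings must come from the same model with the same pooling and normalization settings as the index, or the similarity scores won't be meaningful.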
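On the client, the parsed JSON can be packed into a single typed array before searching. This is a minimal sketch that assumes the `{ id, embedding, metadata }` shape produced by the build script. `packIndex` is a hypothetical helper.

```javascript
// Convert parsed search-index.json entries ({ id, embedding, metadata })
// into one contiguous Float32Array plus parallel arrays of ids and metadata.
// A flat typed array avoids one JS array per document and is usually faster to scan.
function packIndex(entries, dims) {
  const vectors = new Float32Array(entries.length * dims);
  const ids = [];
  const metadata = [];
  entries.forEach((entry, row) => {
    if (entry.embedding.length !== dims) {
      throw new Error(`Entry ${entry.id} has ${entry.embedding.length} dims, expected ${dims}`);
    }
    vectors.set(entry.embedding, row * dims); // copy this row into place
    ids.push(entry.id);
    metadata.push(entry.metadata);
  });
  return { vectors, ids, metadata };
}

// In the browser (assumption): const entries = await (await fetch('/search-index.json')).json();
// const { vectors, ids, metadata } = packIndex(entries, 384);
```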

Making It Fast: Quantization and HNSW

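Cosine similarity is the dot product of two vectors divided by the product of their lengths. Because the embeddings above are normalized (normalize: true), it reduces to a plain dot product. For 384-dimensional vectors, that's 384 multiply-adds per document. At 10,000 documents, that's about 3.8 million multiply-adds per query.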
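Written out, with q as the query embedding and d as a document embedding:

```latex
\cos(\mathbf{q}, \mathbf{d})
  = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}
  = \sum_{i=1}^{384} q_i d_i
  \quad \text{when } \lVert \mathbf{q} \rVert = \lVert \mathbf{d} \rVert = 1

\text{cost per query} = 384 \times 10{,}000 = 3{,}840{,}000 \text{ multiply-adds}
```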

Quantization cuts that cost. Convert your float32 embeddings to uint8 (0-255 range), and WebAssembly's 128-bit SIMD registers hold 16 dimensions at once instead of 4. The precision loss rarely changes search rankings much, and you can expect up to roughly a 4x speedup plus 75% memory savings.
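Here is a minimal sketch of the float32-to-uint8 step. It assumes normalized embeddings and a fixed [-1, 1] input range. The function names are hypothetical, and this is not altor-vec's internal implementation.

```javascript
// Scalar quantization of a normalized embedding from float32 to uint8.
// Components of a unit vector lie in [-1, 1], so map that range linearly
// onto 0..255 (a fixed range is an assumption; libraries often use the
// observed min/max or signed int8 instead).
function quantize(vec) {
  const out = new Uint8Array(vec.length);
  for (let i = 0; i < vec.length; i++) {
    const clamped = Math.max(-1, Math.min(1, vec[i]));
    out[i] = Math.round(((clamped + 1) / 2) * 255);
  }
  return out;
}

// Map a uint8 code back to its approximate float value.
function dequantize(code) {
  return (code / 255) * 2 - 1;
}

// Approximate dot product between a float query and a quantized document.
// This dequantizes for readability; a SIMD kernel would work on the integers directly.
function quantizedDot(query, codes) {
  let dot = 0;
  for (let i = 0; i < codes.length; i++) dot += query[i] * dequantize(codes[i]);
  return dot;
}
```

Each code is off by at most half a quantization step (step size 2/255). So every component stays within 1/255 of its original value, which is why rankings usually survive the conversion.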

altor-vec can handle this for you:

const store = new VectorStore({
  dimensions: 384,
  quantize: true  // converts to uint8 internally (option name as used in this article)
});

For corpora above 50,000 documents, you want HNSW (Hierarchical Navigable Small World graphs). HNSW builds a multi-layer proximity graph and cuts search cost from O(n) to roughly O(log n), at the price of returning approximate rather than exact nearest neighbors. But for most documentation sites, blog archives, or product catalogs, a linear scan over quantized vectors is fast enough.
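HNSW itself is too involved for a snippet, but its core move is easy to show: greedily walk a proximity graph toward the query. This is a minimal sketch of that single-layer step, assuming Euclidean distance and a small hand-built graph. The names are hypothetical, and this is not altor-vec's implementation.

```javascript
// Greedy search on a single proximity-graph layer: the step HNSW repeats on
// each layer, from the sparse top layer down to the dense bottom one.
// This sketch uses one hand-built layer and a beam width of 1; real HNSW keeps
// a candidate list (efSearch) and builds its graph incrementally.
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return sum;
}

function greedySearch(points, neighbors, entry, query) {
  let current = entry;
  let best = squaredDistance(points[current], query);
  let improved = true;
  while (improved) {
    improved = false;
    // Move to whichever neighbor of the current node is closest to the query.
    for (const next of neighbors[current]) {
      const d = squaredDistance(points[next], query);
      if (d < best) {
        best = d;
        current = next;
        improved = true;
      }
    }
  }
  return current; // a local minimum: no neighbor is closer to the query
}
```

Each hop only inspects one node's neighbors, so the walk touches a small fraction of the corpus. The catch is that it can stop at a local minimum, which is why HNSW results are approximate.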

Handling Multi-Field Search

The same embedding model works for titles, descriptions, tags, anything textual. You can boost a field by repeating it in the text you embed:

const weightedText = `${doc.title} ${doc.title} ${doc.title} ${doc.content}`;
const embedding = await embedder(weightedText, { pooling: 'mean', normalize: true });

Repeating the title three times gives its tokens roughly three times their normal weight in the mean-pooled embedding. Crude but effective. For more control, embed each field separately and combine the vectors with explicit weights:

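const titleEmb = await embedder(doc.title, { pooling: 'mean', normalize: true });
const contentEmb = await embedder(doc.content, { pooling: 'mean', normalize: true });

const weighted = titleEmb.data.map((v, i) => v * 0.7 + contentEmb.data[i] * 0.3);
// A weighted average of unit vectors is shorter than unit length; re-normalize it.
const norm = Math.hypot(...weighted);
const combined = weighted.map((v) => v / norm);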

Real-World Considerations

Stale indexes are your biggest risk. If you're indexing a CMS or changelog, you need a cache invalidation strategy. The cleanest approach is versioning your index file (search-index-v2.json) and busting it on deploy. Service workers can prefetch the new index in the background.
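A service worker can also delete caches left over from older index versions once a new one is active. This is a minimal sketch of the selection logic. It assumes you store each index version in its own Cache Storage entry. The naming scheme and `staleIndexCaches` are hypothetical.

```javascript
// Decide which Cache Storage entries belong to superseded index versions.
// Assumes a hypothetical naming scheme: one cache per index version,
// named `${prefix}${version}`, e.g. "search-index-v2".
function staleIndexCaches(cacheNames, prefix, currentVersion) {
  const current = `${prefix}${currentVersion}`;
  return cacheNames.filter((name) => name.startsWith(prefix) && name !== current);
}

// In a service worker's activate handler (not run here):
// self.addEventListener('activate', (event) => {
//   event.waitUntil(caches.keys().then((names) =>
//     Promise.all(staleIndexCaches(names, 'search-index-', 'v2').map((n) => caches.delete(n)))));
// });
```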

For very large corpora, consider splitting into shards by category or date range, and only loading relevant shards. A documentation site might have separate indexes for API reference vs. guides vs. changelog.
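Choosing which shards to fetch can be driven by a small manifest. This is a minimal sketch. The manifest format and `shardsToLoad` are hypothetical.

```javascript
// Pick which index shards to download for a search.
// Assumes a hypothetical manifest listing each shard's category and URL, e.g.
// [{ category: 'api', url: '/index/api.json' }, ...].
function shardsToLoad(manifest, selectedCategories) {
  // No category selected: search everything.
  if (selectedCategories.length === 0) return manifest.map((s) => s.url);
  const wanted = new Set(selectedCategories);
  return manifest.filter((s) => wanted.has(s.category)).map((s) => s.url);
}
```

Results from several loaded shards can be merged by score, because similarities produced by the same model and normalization are comparable across shards.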

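Privacy matters here. Because everything runs client-side, queries never leave the browser, and there is no record of what users search for unless you explicitly add telemetry. The main outside request is the one-time model download, which Transformers.js fetches from the Hugging Face Hub by default. Self-host the model files if that matters to you. This is a real advantage in GDPR-sensitive contexts.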

FAQ

Can I use this for full-text search with filters?

Yes, but pair it with a traditional inverted index for exact-match filters (dates, categories, tags). Use vector search for the initial semantic retrieval, then filter the results in JavaScript. Retrieve more candidates than you need, since filtering can discard most of them. Libraries like FlexSearch or Lunr.js pair well for hybrid search.
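The retrieve-then-filter step can be as simple as this minimal sketch. It assumes the semantic search returns a score-sorted array of `{ id, score, metadata }`. `filterResults` and the 4x over-fetch factor are hypothetical choices.

```javascript
// Hybrid pattern: semantic retrieval first, exact-match filtering second.
// `results` is a score-sorted candidate list of { id, score, metadata },
// ideally several times larger than `limit` (e.g. a hypothetical 4x over-fetch).
function filterResults(results, predicate, limit) {
  const kept = [];
  for (const result of results) {
    if (kept.length >= limit) break;
    if (predicate(result.metadata)) kept.push(result);
  }
  return kept; // still in score order, since the input was
}

// Usage (assumption): filterResults(await search(query, limit * 4), (m) => m.category === 'guides', limit)
```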

What about typo tolerance?

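Partly. Subword tokenization means a misspelling like "asyncronous" still shares most of its pieces with "asynchronous," so the two often embed close together. That isn't guaranteed, though, especially for short queries where the misspelled word carries most of the meaning. Test with real misspellings from your users before deciding you can skip Levenshtein distance or fuzzy matching.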
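If typos do slip through in your own tests, a classic edit-distance check makes a cheap fallback. For example, you could map a query term onto the closest word in your corpus vocabulary before embedding it. That correction step is a suggestion, not part of altor-vec. Here is a minimal sketch of the distance itself.

```javascript
// Levenshtein edit distance (insertions, deletions, substitutions), using a
// single rolling row of the dynamic-programming table.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // delete a[i-1]
        curr[j - 1] + 1,    // insert b[j-1]
        prev[j - 1] + cost  // substitute (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}
```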

How do I handle updates to the index without re-embedding everything?

Incremental updates are tricky. The safest pattern is treating your index as immutable and regenerating it on content changes. For high-frequency updates, maintain a small "hot" index in memory and merge it with the main index at query time.
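The query-time merge of a hot index and the main index might look like this minimal sketch. It assumes both searches return `{ id, score }` arrays computed with the same model and metric. `mergeResults` is a hypothetical helper.

```javascript
// Merge top-k results from a small in-memory "hot" index with the main index.
// Scores must come from the same model and metric so they are comparable.
// If a document appears in both, the hot entry wins: it reflects newer content.
function mergeResults(hotResults, mainResults, limit) {
  const byId = new Map();
  for (const r of mainResults) byId.set(r.id, r);
  for (const r of hotResults) byId.set(r.id, r); // overwrite stale main entries
  return [...byId.values()].sort((a, b) => b.score - a.score).slice(0, limit);
}
```

Deleted documents need separate handling, for example a set of removed ids that you filter out of the main index's results before merging.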

Can this scale to 100,000 documents?

With quantization and lazy loading, yes, but expect longer initial load times. At that scale, switch from linear scan to HNSW, and consider building the index ahead of time on the server so the browser only has to download it. The performance curve starts bending around 50k documents on average hardware.
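The raw vector data alone goes a long way toward explaining the longer load times at this scale. Here is quick arithmetic for 100,000 documents at 384 dimensions. `indexSizeBytes` is just an illustrative helper.

```javascript
// Raw size of an embedding index, ignoring ids and metadata.
function indexSizeBytes(numDocs, dims, bytesPerComponent) {
  return numDocs * dims * bytesPerComponent;
}

const float32Bytes = indexSizeBytes(100_000, 384, 4); // 153,600,000 bytes
const uint8Bytes = indexSizeBytes(100_000, 384, 1);   //  38,400,000 bytes
console.log(`${(float32Bytes / 1e6).toFixed(1)} MB as float32`); // "153.6 MB as float32"
console.log(`${(uint8Bytes / 1e6).toFixed(1)} MB as uint8`);     // "38.4 MB as uint8"
```

Even quantized, that's tens of megabytes to fetch before the first search, which is why lazy loading and sharding matter here.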

What's the mobile browser performance like?

The embedding model runs slower on mobile (300-500ms vs. 200ms on desktop), but the vector search itself is fast because it's just arithmetic. Quantized indexes work especially well on memory-constrained devices. Test on a mid-range Android phone, not just your MacBook.
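To check those numbers on your own target devices, time the embedding call directly. This is a minimal sketch. `median` and `medianMs` are hypothetical helpers, and the usage comment assumes the `embedder` pipeline from earlier in this article.

```javascript
// Median of a list of numbers (copy first so the caller's array is untouched).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Time an (optionally async) function several times and return the median in ms.
// The median is less sensitive than the mean to one-off stalls such as GC pauses.
async function medianMs(fn, runs = 10) {
  const times = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await fn();
    times.push(performance.now() - start);
  }
  return median(times);
}

// Usage in the browser (assumes the `embedder` pipeline from earlier):
// const ms = await medianMs(() => embedder('async bugs in React', { pooling: 'mean', normalize: true }));
```

Call the embedder once before timing so the model download and warm-up aren't counted as query latency.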

Start building: npm install altor-vec - Full examples and API docs at github.com/Altor-lab/altor-vec