Document Search in React with altor-vec
Use altor-vec to add document search to your React app — entirely in the browser, with no server, no API keys, and zero per-query cost. Search a collection of documents by semantic meaning — find articles, docs, or notes that are conceptually related to the user's query, not just keyword matches.
```bash
npm install altor-vec @xenova/transformers
```
Implementation
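Works with Vite, CRA, or any React 18+ setup. The hook keeps the search engine and the embedding pipeline in useRef, so they persist across renders without triggering re-renders. It exposes readiness and results through useState.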
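```typescript
// src/hooks/useDocSearch.ts
import { useState, useEffect, useRef, useCallback } from 'react';
import init, { WasmSearchEngine } from 'altor-vec';
import { pipeline } from '@xenova/transformers';

export interface Doc {
  title: string;
  content: string;
}

const DIM = 384; // embedding size of Xenova/all-MiniLM-L6-v2

// Pass a stable (e.g. memoized) docs array: a new array reference rebuilds the index.
export function useDocSearch(docs: Doc[]) {
  const engine = useRef<WasmSearchEngine | null>(null);
  const embedder = useRef<any>(null); // feature-extraction pipeline
  const [ready, setReady] = useState(false);
  const [results, setResults] = useState<Doc[]>([]);

  useEffect(() => {
    let cancelled = false; // set on unmount or when docs change
    async function build() {
      await init(); // load the altor-vec WASM module
      embedder.current = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
      // One flat buffer: document i occupies [i * DIM, (i + 1) * DIM).
      const vectors = new Float32Array(docs.length * DIM);
      for (const [i, doc] of docs.entries()) {
        if (cancelled) return;
        const out = await embedder.current(`${doc.title}. ${doc.content}`,
          { pooling: 'mean', normalize: true });
        vectors.set(out.data, i * DIM);
      }
      if (cancelled) return;
      engine.current = WasmSearchEngine.from_vectors(vectors, DIM, 16, 200, 50);
      setReady(true);
    }
    engine.current = null; // search() is a no-op until the new index is built
    setReady(false);
    build().catch((err) => console.error('useDocSearch: index build failed', err));
    return () => { cancelled = true; };
  }, [docs]);

  const search = useCallback(async (query: string) => {
    if (!engine.current || !embedder.current) return;
    const out = await embedder.current(query, { pooling: 'mean', normalize: true });
    // Each hit's id is the document's index in `docs`.
    const hits: { id: number }[] = JSON.parse(engine.current.search(new Float32Array(out.data), 5));
    setResults(hits.map((h) => docs[h.id]));
  }, [docs]);

  return { search, results, ready };
}
```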
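On small collections, an exact brute-force search gives you a baseline to sanity-check what the index returns. The sketch below assumes the same flat layout the hook builds, where vector i sits at offset i * DIM. Because the hook embeds with `normalize: true`, the dot product equals cosine similarity. `bruteForceTopK` and `recallAtK` are hypothetical helpers, not part of altor-vec.

```typescript
// A search hit: index into the flat buffer plus its similarity score.
interface Hit {
  id: number;
  score: number;
}

// Exact top-k by dot product over a flat Float32Array.
// Vector i occupies vectors[i * dim .. (i + 1) * dim).
// For L2-normalized vectors the dot product is the cosine similarity.
function bruteForceTopK(vectors: Float32Array, dim: number, query: Float32Array, k: number): Hit[] {
  if (query.length !== dim) {
    throw new Error(`query has ${query.length} dims, expected ${dim}`);
  }
  const count = Math.floor(vectors.length / dim);
  const hits: Hit[] = [];
  for (let i = 0; i < count; i++) {
    const offset = i * dim;
    let dot = 0;
    for (let j = 0; j < dim; j++) {
      dot += vectors[offset + j] * query[j];
    }
    hits.push({ id: i, score: dot });
  }
  // Highest similarity first, keep the best k.
  hits.sort((a, b) => b.score - a.score);
  return hits.slice(0, k);
}

// Fraction of the exact top-k ids that also appear in an approximate result.
function recallAtK(exact: Hit[], approxIds: number[]): number {
  if (exact.length === 0) return 1;
  const approx = new Set(approxIds);
  return exact.filter((h) => approx.has(h.id)).length / exact.length;
}
```

To compare, pass the ids from the index, e.g. `JSON.parse(engine.search(q, 5)).map((h) => h.id)`, as `approxIds`. A recall close to 1 means the approximate index agrees with exact search for that query.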
Performance
10,000 documents at 384 dimensions: ~17MB memory, <1ms per query. Measured on an M2 MacBook Pro in Chrome 124. Query times cover the vector search only, not embedding the query text. Mobile is typically 2–4× slower, so test on target devices before deploying.
| Index size | Dimensions | Query p50 | Memory |
|---|---|---|---|
| 1,000 vectors | 384 | ~0.1ms | ~2MB |
| 10,000 vectors | 384 | ~0.4ms | ~17MB |
| 50,000 vectors | 384 | ~0.9ms | ~85MB |
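The sketch below is a minimal timing harness for checking these numbers on a target device. `runQuery` is a placeholder for whatever call you want to time, for example the engine's `search` call from the hook. The harness measures only what `runQuery` does, so query embedding is excluded unless you include it. The iteration counts are arbitrary, and the percentile uses the nearest-rank method.

```typescript
interface LatencyStats {
  p50: number; // median latency in ms
  p95: number; // 95th-percentile latency in ms
}

// Nearest-rank percentile on an ascending-sorted array of samples.
function percentile(sortedMs: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sortedMs.length);
  const index = Math.min(sortedMs.length - 1, Math.max(0, rank - 1));
  return sortedMs[index];
}

// Time runQuery repeatedly; warm-up runs are executed but not recorded.
function benchmark(runQuery: () => unknown, iterations = 200, warmup = 20): LatencyStats {
  for (let i = 0; i < warmup; i++) runQuery();
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    runQuery();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return { p50: percentile(samples, 50), p95: percentile(samples, 95) };
}
```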
When this approach works best
- Documentation sites and knowledge bases with 500–50K pages
- Blog or article archives where keyword search misses conceptual queries
- Offline-first apps that need search to work without a network connection
Limitations
- Index must be rebuilt on every content update (no real-time sync)
- Every document must be embedded before indexing. The hook above does this in the browser on every page load, so for larger collections move the embedding step to build time
Frequently asked questions
How do I update the document index when content changes?
Rebuild the index at deploy time using a Node.js build script. Call WasmSearchEngine.from_vectors() with the updated embeddings and write the result to public/search-index.json. The browser loads the new index on the next page load.
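One way to skip in-browser embedding is to compute embeddings in the build script and ship the raw vectors. The browser then only has to call `WasmSearchEngine.from_vectors()` on them. This is a minimal sketch. The `EmbeddingPayload` shape, the helper names, and the file name in the comment are hypothetical and are not an altor-vec file format.

```typescript
// Hypothetical JSON payload for precomputed embeddings.
interface EmbeddingPayload {
  dim: number;
  count: number;
  vectors: number[]; // flattened: vector i at [i * dim, (i + 1) * dim)
}

// Build side (e.g. a Node.js script): flatten per-document embeddings.
function toPayload(embeddings: Float32Array[], dim: number): EmbeddingPayload {
  const flat = new Float32Array(embeddings.length * dim);
  embeddings.forEach((e, i) => {
    if (e.length !== dim) throw new Error(`embedding ${i} has ${e.length} dims, expected ${dim}`);
    flat.set(e, i * dim);
  });
  return { dim, count: embeddings.length, vectors: Array.from(flat) };
}

// Browser side: restore the flat Float32Array that from_vectors() takes, e.g.
//   const payload = await (await fetch('/search-embeddings.json')).json();
//   WasmSearchEngine.from_vectors(fromPayload(payload), payload.dim, 16, 200, 50);
function fromPayload(payload: EmbeddingPayload): Float32Array {
  if (payload.vectors.length !== payload.count * payload.dim) {
    throw new Error('payload length does not match count * dim');
  }
  return Float32Array.from(payload.vectors);
}
```

JSON is simple but larger than a binary file. Float32 values survive the round trip exactly, because each one is stored as the double it converts to.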
Can I search PDF or Word documents with altor-vec?
Yes, but you need to extract the text first. Use pdf-parse (PDF) or mammoth.js (.docx) to extract plain text. Then split the text into chunks, embed each chunk with your embedding model, and index the chunk embeddings with altor-vec.
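The chunking step can be sketched as overlapping word windows. The sentence-transformers model card for all-MiniLM-L6-v2 notes that input longer than 256 word pieces is truncated by default. A whole document embedded as one string would therefore be represented mostly by its opening text. The window and overlap sizes below are assumptions, not altor-vec requirements, and `chunkText` is a hypothetical helper.

```typescript
// A chunk remembers which source document it came from.
interface Chunk {
  docId: number;
  text: string;
}

// Split text into windows of wordsPerChunk words, each overlapping the
// previous one by `overlap` words so sentences at boundaries are not lost.
function chunkText(docId: number, text: string, wordsPerChunk = 150, overlap = 30): Chunk[] {
  if (overlap >= wordsPerChunk) throw new Error('overlap must be smaller than wordsPerChunk');
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: Chunk[] = [];
  const step = wordsPerChunk - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push({ docId, text: words.slice(start, start + wordsPerChunk).join(' ') });
    if (start + wordsPerChunk >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

After indexing chunks, hit ids refer to positions in the chunk array rather than in `docs`. Map each hit back with `chunks[h.id].docId` instead of `docs[h.id]`.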
How many documents can I search before performance degrades?
altor-vec handles up to ~100K documents comfortably in modern browsers. A 10K-document index at 384 dimensions uses ~17MB RAM and searches in under 1ms. For 100K documents, expect ~170MB and ~1.2ms — test on mobile before deploying.
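A quick way to sanity-check these memory figures is to compute the raw vector storage: count × dimensions × 4 bytes for Float32. This sketch models only that raw data. The gap to the measured numbers (~17MB versus about 15.4MB for 10,000 vectors) is presumably the index's own structures and runtime overhead, which are not modelled here.

```typescript
// Bytes needed for the raw Float32 vectors alone (no index overhead).
function rawVectorBytes(count: number, dim: number): number {
  return count * dim * Float32Array.BYTES_PER_ELEMENT;
}

// Decimal megabytes (1 MB = 1,000,000 bytes).
function toMB(bytes: number): number {
  return bytes / 1_000_000;
}

for (const count of [1000, 10000, 50000, 100000]) {
  // 10,000 x 384 prints "15.4 MB"; 100,000 x 384 prints "153.6 MB".
  console.log(`${count} x 384: ${toMB(rawVectorBytes(count, 384)).toFixed(1)} MB of raw vectors`);
}
```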