
Document Search in React with altor-vec

Use altor-vec to add document search to your React app entirely in the browser: no search server, no API keys, and no per-query cost. Documents are matched by semantic meaning, so a query finds articles, docs, or notes that are conceptually related to it, not just the ones that share its keywords.

Install: npm install altor-vec @xenova/transformers

Implementation

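Works with Vite, CRA, or any React 18+ setup. The hook keeps the search engine and the embedding pipeline in refs (useRef) and the results and ready flag in state (useState). It embeds every document in the browser when it mounts, which suits small collections; for larger ones, compute the embeddings at build time instead.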

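// src/hooks/useDocSearch.ts
import { useState, useEffect, useRef, useCallback } from 'react';
import init, { WasmSearchEngine } from 'altor-vec';
import { pipeline } from '@xenova/transformers';

export type Doc = { title: string; content: string };

// Minimal call signature of the transformers.js feature-extraction pipeline.
type Embedder = (
  text: string,
  opts: { pooling: 'mean'; normalize: boolean }
) => Promise<{ data: Float32Array }>;

const DIM = 384; // output size of all-MiniLM-L6-v2

// Pass a stable docs array (module constant or useMemo): the index is
// rebuilt whenever the array identity changes.
export function useDocSearch(docs: Doc[]) {
  const engine = useRef<WasmSearchEngine | null>(null);
  const embedder = useRef<Embedder | null>(null);
  const [ready, setReady] = useState(false);
  const [results, setResults] = useState<Doc[]>([]);

  useEffect(() => {
    let cancelled = false;
    engine.current = null; // never search a stale index against new docs
    setReady(false);

    async function buildIndex() {
      await init(); // load the altor-vec WASM module
      if (!embedder.current) {
        // Downloads the model on first use; later loads come from the browser cache.
        embedder.current = (await pipeline(
          'feature-extraction',
          'Xenova/all-MiniLM-L6-v2'
        )) as unknown as Embedder;
      }
      // One flat Float32Array: doc i occupies [i * DIM, (i + 1) * DIM).
      const vectors = new Float32Array(docs.length * DIM);
      for (const [i, doc] of docs.entries()) {
        const out = await embedder.current(`${doc.title}. ${doc.content}`,
          { pooling: 'mean', normalize: true });
        vectors.set(out.data, i * DIM);
      }
      if (cancelled) return; // docs changed or the component unmounted
      // 16, 200, 50: index tuning values (assumed to be HNSW M,
      // ef_construction, ef_search; check the altor-vec README).
      engine.current = WasmSearchEngine.from_vectors(vectors, DIM, 16, 200, 50);
      setReady(true);
    }

    buildIndex().catch(console.error);
    return () => { cancelled = true; };
  }, [docs]);

  const search = useCallback(async (query: string) => {
    if (!engine.current || !embedder.current || !query.trim()) {
      setResults([]);
      return;
    }
    const out = await embedder.current(query, { pooling: 'mean', normalize: true });
    // search() returns a JSON string of hits; each hit's id is the doc's index.
    const hits: { id: number }[] = JSON.parse(engine.current.search(new Float32Array(out.data), 5));
    setResults(hits.map((h) => docs[h.id]));
  }, [docs]);

  return { search, results, ready };
}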
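Because the hook embeds with normalize: true, every vector has unit length, so cosine similarity reduces to a plain dot product. That makes an exact, brute-force search over the same flat Float32Array layout (document i at offset i * DIM) easy to write, which is handy for spot-checking altor-vec's hits on a small sample. The helper below is a hypothetical sketch; bruteForceTopK is not part of altor-vec.

```typescript
// Hypothetical helper, not part of altor-vec: exact top-k search over a
// flat Float32Array where vector i occupies [i * dim, (i + 1) * dim).
// Assumes all vectors and the query are L2-normalized, so the dot
// product equals cosine similarity.
function bruteForceTopK(
  vectors: Float32Array,
  dim: number,
  query: Float32Array,
  k: number
): { id: number; score: number }[] {
  if (query.length !== dim) throw new Error(`query must have ${dim} dimensions`);
  const n = Math.floor(vectors.length / dim);
  const scored: { id: number; score: number }[] = [];
  for (let i = 0; i < n; i++) {
    const base = i * dim;
    let dot = 0;
    for (let j = 0; j < dim; j++) dot += vectors[base + j] * query[j];
    scored.push({ id: i, score: dot });
  }
  // Highest similarity first; a full sort is fine for a sanity check.
  scored.sort((a, b) => b.score - a.score);
  return scored.slice(0, k);
}
```

To compare, keep the vectors array in a ref when building the index and check that the ids returned by bruteForceTopK(vectors, DIM, queryVector, 5) largely match the ids parsed from engine.current.search().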

Performance

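10,000 documents at 384 dimensions use about 17MB of memory and answer a query in under 1ms (measured on an M2 MacBook Pro, Chrome 124). These timings cover the index lookup only; embedding the query with the model adds its own latency. Mobile devices are typically 2–4× slower, so test on your target devices before deploying.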

Index size	Dimensions	Query p50	Memory
1,000 vectors	384	~0.1ms	~2MB
10,000 vectors	384	~0.4ms	~17MB
50,000 vectors	384	~0.9ms	~85MB
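Most of the memory column is the raw embeddings: each vector stores one 32-bit float (4 bytes) per dimension. The hypothetical helper below computes that floor. For 10,000 × 384 it gives 15,360,000 bytes, about 15.4MB, so the rest of the ~17MB figure is presumably index structure such as graph links and bookkeeping. Treat the result as a lower bound, not a prediction of altor-vec's exact footprint.

```typescript
// Hypothetical estimation helper: bytes needed for the raw Float32
// embeddings alone (4 bytes per dimension). Index overhead is not included.
function rawVectorBytes(count: number, dim: number): number {
  return count * dim * Float32Array.BYTES_PER_ELEMENT;
}

// Decimal megabytes with one decimal place, e.g. "15.4MB".
function formatMB(bytes: number): string {
  return `${(bytes / 1_000_000).toFixed(1)}MB`;
}

for (const n of [1_000, 10_000, 50_000, 100_000]) {
  console.log(`${n} vectors x 384 dims: ${formatMB(rawVectorBytes(n, 384))} raw`);
}
```

This prints 1.5MB, 15.4MB, 76.8MB, and 153.6MB, which sit below the table's ~2MB, ~17MB, and ~85MB as expected for a floor.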

When this approach works best

Documentation sites and knowledge bases with 500–50K pages
Blog or article archives where keyword search misses conceptual queries
Offline-first apps that need search to work without a network connection

Limitations

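Index must be rebuilt on every content update (no real-time sync)
Every document must be embedded before it can be searched, either at build time or in the browser on page load as the hook above does; the in-browser option slows startup as the collection grows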

Frequently asked questions

How do I update the document index when content changes?

Rebuild the index at deploy time using a Node.js build script. Call WasmSearchEngine.from_vectors() with the updated embeddings and write the result to public/search-index.json. The browser loads the new index on the next page load.
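The exact serialized form of an altor-vec index is not shown here, so the sketch below assumes a simpler, hypothetical layout for public/search-index.json: the document metadata plus the flat embedding array, written by the Node build script after it embeds every document. The browser then fetches the file and passes vectors and dim to WasmSearchEngine.from_vectors(), which skips embedding the documents on page load (queries still need the model). IndexFile, encodeIndexFile, and decodeIndexFile are illustrative names.

```typescript
// Hypothetical layout for public/search-index.json: raw embeddings plus
// document metadata. This is NOT altor-vec's own index format; the
// browser still calls WasmSearchEngine.from_vectors() after loading it.
interface Doc { title: string; content: string }

interface IndexFile {
  dim: number;
  docs: Doc[];
  vectors: number[]; // flat: doc i at [i * dim, (i + 1) * dim)
}

// Build side (Node): call after embedding every doc at deploy time.
function encodeIndexFile(docs: Doc[], vectors: Float32Array, dim: number): string {
  if (vectors.length !== docs.length * dim) {
    throw new Error("vectors.length must equal docs.length * dim");
  }
  const file: IndexFile = { dim, docs, vectors: Array.from(vectors) };
  return JSON.stringify(file);
}

// Browser side: parse the fetched JSON back into a Float32Array.
function decodeIndexFile(json: string): { docs: Doc[]; vectors: Float32Array; dim: number } {
  const file = JSON.parse(json) as IndexFile;
  const vectors = Float32Array.from(file.vectors);
  if (vectors.length !== file.docs.length * file.dim) {
    throw new Error("corrupt index file: vector count does not match docs");
  }
  return { docs: file.docs, vectors, dim: file.dim };
}
```

In the build script you might write the string with fs.writeFileSync('public/search-index.json', encodeIndexFile(docs, vectors, 384)); in the browser, decodeIndexFile(await (await fetch('/search-index.json')).text()). JSON text is considerably larger than the raw 4 bytes per value, so a binary file read with response.arrayBuffer() is a more compact alternative.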

Can I search PDF or Word documents with altor-vec?

Yes, but you need to extract the text first. Use pdf-parse or mammoth.js to extract plain text, then embed the text chunks with your embedding model, and index the embeddings with altor-vec.
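After pdf-parse or mammoth.js has produced plain text, long documents are usually split into overlapping chunks before embedding: a sentence-embedding model such as all-MiniLM-L6-v2 reads only a limited number of tokens per input, and a single vector for a whole PDF blurs its topics together. The sketch below splits on whitespace into fixed-size word windows. The 200-word size and 40-word overlap are arbitrary assumptions to tune, word counts only approximate the model's token limit, and chunkText and Chunk are hypothetical names.

```typescript
// Hypothetical chunk shape: `source` identifies the original file and
// `index` is the chunk's position within it.
interface Chunk { source: string; index: number; text: string }

// Split plain text into windows of `size` words. Consecutive chunks share
// `overlap` words, so text near a boundary appears in both neighbours and
// keeps some surrounding context.
function chunkText(source: string, text: string, size = 200, overlap = 40): Chunk[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: Chunk[] = [];
  const step = size - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push({ source, index: chunks.length, text: words.slice(start, start + size).join(" ") });
    if (start + size >= words.length) break; // this window reached the end
  }
  return chunks;
}
```

Each chunk then gets its own vector, so the id of an altor-vec hit is an index into the chunk array, and the source field maps it back to the original file.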

How many documents can I search before performance degrades?

In desktop browsers, altor-vec stays fast up to roughly 100K documents. A 10K-document index at 384 dimensions uses ~17MB of RAM and answers queries in under 1ms; at 100K documents, expect ~170MB and ~1.2ms per query. Mobile devices have less memory and are typically 2–4× slower, so test there before deploying.
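To check these numbers on your own target devices, time a batch of queries and take the median. The helper below is a hypothetical sketch built on the standard performance.now() timer. In the app you would pass a closure that calls engine.current.search(queryVector, 5) with a query vector embedded once beforehand, so the model's time is excluded and only the index lookup is measured, as in the performance table.

```typescript
// Hypothetical benchmark helper: runs `fn` `warmup` times untimed, then
// `runs` times timed, and returns the median (p50) duration in ms.
// performance.now() is available in browsers and in Node 16+.
function measureP50(fn: () => void, runs = 200, warmup = 20): number {
  if (runs <= 0) throw new Error("runs must be positive");
  for (let i = 0; i < warmup; i++) fn(); // warm up caches and the JIT
  const times: number[] = [];
  for (let i = 0; i < runs; i++) {
    const t0 = performance.now();
    fn();
    times.push(performance.now() - t0);
  }
  times.sort((a, b) => a - b);
  const mid = Math.floor(runs / 2);
  // Odd count: middle sample; even count: mean of the two middle samples.
  return runs % 2 === 1 ? times[mid] : (times[mid - 1] + times[mid]) / 2;
}
```

Run it on the slowest phone you support with an index of your real size; the desktop figures above are only a starting point.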
