Document Search in Next.js with altor-vec

Use altor-vec to add document search to your Next.js app. It runs entirely in the browser, with no server, no API keys, and zero per-query cost. Search is semantic: it finds articles, docs, or notes that are conceptually related to the user's query, not just keyword matches.

Install: npm install altor-vec @xenova/transformers

Implementation

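The example uses the App Router with the 'use client' directive. The search engine and the embedding pipeline are held in useRef so they survive re-renders, and the results are held in useState.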

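// app/search/SearchPage.tsx: client component for document search.
// Render it from app/search/page.tsx, which passes in `docs` (App Router pages do not receive custom props).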
'use client';
import { useState, useEffect, useRef } from 'react';
import init, { WasmSearchEngine } from 'altor-vec';
import { pipeline } from '@xenova/transformers';

type Doc = { id: number; title: string; excerpt: string };

export default function SearchPage({ docs }: { docs: Doc[] }) {
  const engine = useRef<WasmSearchEngine | null>(null);
  const embedder = useRef<any>(null); // feature-extraction pipeline from @xenova/transformers
  const [results, setResults] = useState<Doc[]>([]);
  const [query, setQuery] = useState('');
  const [ready, setReady] = useState(false);

  useEffect(() => {
    (async () => {
      await init();
      embedder.current = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
      // Load pre-built index from public/ (generated at build time)
      const resp = await fetch('/search-index.json');
      engine.current = WasmSearchEngine.from_json(await resp.text());
      setReady(true);
    })();
  }, []);

  async function handleSearch(q: string) {
    setQuery(q);
    if (!engine.current || !embedder.current || q.length < 2) { setResults([]); return; }
    const out = await embedder.current(q, { pooling: 'mean', normalize: true });
    const hits = JSON.parse(engine.current.search(new Float32Array(out.data), 5));
    setResults(hits.map((h: any) => docs[h.id]));
  }

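  return (
    <div>
      <input
        value={query}
        onChange={e => handleSearch(e.target.value)}
        placeholder={ready ? 'Search docs...' : 'Loading search...'} />
      <ul>
        {results.map(doc => (
          <li key={doc.id}>
            <h3>{doc.title}</h3>
            <p>{doc.excerpt}</p>
          </li>
        ))}
      </ul>
    </div>
  );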
}

Performance

10,000 documents at 384 dimensions: ~17MB memory, <1ms per query. Measured on M2 MacBook Pro, Chrome 124. Mobile is typically 2–4× slower — test on target devices before deploying.

Index size	Dimensions	Query p50	Memory
1,000 vectors	384	~0.1ms	~2MB
10,000 vectors	384	~0.4ms	~17MB
50,000 vectors	384	~0.9ms	~85MB
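
To check how these figures hold up on your own target devices, you can time repeated queries and take the median. The following is a minimal sketch, not part of altor-vec: median and measureP50 are hypothetical helper names, and the search callback is assumed to wrap something like engine.current.search(queryVector, 5).

```typescript
// Median (p50) of a list of latency samples in milliseconds.
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Time `runs` calls of a search function and return the median latency.
// `search` is a hypothetical wrapper around the real query call.
function measureP50(search: () => unknown, runs = 200): number {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    search();
    samples.push(performance.now() - start);
  }
  return median(samples);
}
```

Run it with a fixed query vector after the index has loaded, so that model download and WASM initialisation are not counted in the samples.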

When this approach works best

Documentation sites and knowledge bases with 500–50K pages
Blog or article archives where keyword search misses conceptual queries
Offline-first apps that need search to work without a network connection

Limitations

Index must be rebuilt on every content update (no real-time sync)
Requires pre-computed embeddings — you need an embedding step at build time

Frequently asked questions

How do I update the document index when content changes?

Rebuild the index at deploy time using a Node.js build script. Call WasmSearchEngine.from_vectors() with the updated embeddings and write the result to public/search-index.json. The browser loads the new index on the next page load.
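
Before the build script can hand the embeddings to the indexer, they need to be collected in one buffer. This is a minimal sketch of that step, assuming the indexer accepts a single flat Float32Array together with the dimension count; that is an assumption, so check the altor-vec documentation for the exact from_vectors() signature. flattenEmbeddings is a hypothetical helper, not part of the library.

```typescript
// Pack per-document embeddings into one contiguous Float32Array (row-major).
// Assumption: the indexer wants a flat buffer; verify against the library docs.
function flattenEmbeddings(vectors: ArrayLike<number>[], dims: number): Float32Array {
  const flat = new Float32Array(vectors.length * dims);
  vectors.forEach((vec, row) => {
    if (vec.length !== dims) {
      throw new Error(`vector ${row} has ${vec.length} dimensions, expected ${dims}`);
    }
    flat.set(vec, row * dims); // copy row into its slot
  });
  return flat;
}
```

Failing on a dimension mismatch at build time is cheaper than shipping an index whose rows are silently misaligned.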

Can I search PDF or Word documents with altor-vec?

Yes, but you need to extract the text first. Use pdf-parse or mammoth.js to extract plain text, then embed the text chunks with your embedding model, and index the embeddings with altor-vec.
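
The chunking step between extraction and embedding might look like the sketch below. It splits on whitespace into overlapping windows of words. chunkText and its default sizes are illustrative assumptions, not altor-vec settings; tune them to your embedding model's input limit.

```typescript
// Split extracted plain text into overlapping chunks on whitespace boundaries.
// maxWords and overlapWords are illustrative defaults, not library settings.
function chunkText(text: string, maxWords = 200, overlapWords = 40): string[] {
  if (overlapWords >= maxWords) {
    throw new Error('overlapWords must be smaller than maxWords');
  }
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = maxWords - overlapWords;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + maxWords).join(' '));
    if (start + maxWords >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

Each chunk is then embedded separately, so keep a mapping from chunk index back to the source document if you want search hits to link to the original file.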

How many documents can I search before performance degrades?

altor-vec handles up to ~100K documents comfortably in modern browsers. A 10K-document index at 384 dimensions uses ~17MB RAM and searches in under 1ms. For 100K documents, expect ~170MB and ~1.2ms — test on mobile before deploying.
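
These memory figures are consistent with raw float32 storage plus some overhead: 10,000 vectors × 384 dimensions × 4 bytes is 15.36MB, against the quoted ~17MB. The sketch below computes that lower bound for other index sizes. rawVectorMegabytes is a hypothetical helper, and attributing the remainder to index structure is an assumption rather than a documented figure.

```typescript
// Lower bound on index memory: raw float32 storage only (4 bytes per value).
// Index structure and metadata add to this; that overhead is not modelled here.
function rawVectorMegabytes(count: number, dims: number): number {
  const BYTES_PER_FLOAT32 = 4;
  return (count * dims * BYTES_PER_FLOAT32) / 1_000_000;
}
```

For 100,000 vectors at 384 dimensions this gives 153.6MB, in line with the ~170MB quoted for that size.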
