Add Semantic Search to Docusaurus — Replace Algolia DocSearch

Docusaurus has no built-in search. The usual choices are a community local-search plugin (keyword-only, typically built on Lunr) or Algolia DocSearch (free only for publicly accessible open-source projects). If your docs are private, internal, or for a paid product, you're paying Algolia or settling for keyword search. This guide shows how to replace both with client-side semantic search using altor-vec: no account required, no per-query cost, and intent-aware results.

Install: npm install altor-vec @huggingface/transformers tsx glob jsdom

Why replace Algolia DocSearch

Algolia DocSearch is genuinely convenient. You add three lines to docusaurus.config.ts, Algolia crawls your site, and search works. The catch: it's free only for open-source projects with publicly accessible documentation. If your docs are behind a login, on an internal network, or for a commercial product, you pay Algolia's standard rates — starting around $50/month at modest traffic.

Beyond pricing, DocSearch is keyword matching with Algolia's ranking heuristics. A user searching "how to increase throughput" won't find your "performance tuning" page if those words don't overlap. Semantic search closes that gap.
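To make the overlap point concrete, here is a minimal sketch of how embedding-based ranking compares a query with pages. The three-dimensional vectors are made-up toy values, not real model output (the model used later in this guide produces 384 dimensions), and the function names are illustrative only.

```typescript
// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors (hypothetical values for illustration only).
const pages = [
  { title: 'Performance tuning', vec: [0.9, 0.1, 0.2] },
  { title: 'Release notes', vec: [0.1, 0.9, 0.3] },
];
// Stands in for the embedding of "how to increase throughput".
const queryVec = [0.8, 0.2, 0.1];

// Score every page against the query and sort best-first.
function rank(query: number[], docs: { title: string; vec: number[] }[]) {
  return docs
    .map(d => ({ title: d.title, score: cosine(query, d.vec) }))
    .sort((x, y) => y.score - x.score);
}

const ranked = rank(queryVec, pages);
console.log(ranked.map(r => r.title));
```

The query and the "Performance tuning" page share no words, yet their vectors point in nearly the same direction, so that page ranks first. A keyword engine has no equivalent signal to work with.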

How the implementation works

Docusaurus uses a concept called "swizzling" to override theme components. You run docusaurus swizzle to copy a component from the default theme into your src/theme/ directory, then modify it. This is the correct way to customize Docusaurus without forking the theme.

The search implementation has two parts:

Build script — reads your compiled /build output, extracts content from HTML, generates embeddings, writes a binary index to /static
SearchBar override — replaces Docusaurus's SearchBar component with a React component that loads the altor-vec index and queries it in the browser

Step 1: Swizzle the SearchBar component

npx docusaurus swizzle @docusaurus/theme-classic SearchBar --eject --typescript

This creates src/theme/SearchBar/index.tsx. You'll replace its contents with the semantic search implementation below.

Why --eject and not --wrap: Ejecting gives you the full component to replace. Wrapping keeps the original component and adds around it. For replacing search entirely, eject is cleaner. If you want to keep the original search as a fallback, use --wrap instead.

Step 2: Write the index build script

Create scripts/build-search-index.mjs. Run this after docusaurus build — it reads the compiled HTML from /build.

// scripts/build-search-index.mjs
import fs from 'node:fs/promises';
import { glob } from 'glob';
import { JSDOM } from 'jsdom';
import { pipeline } from '@huggingface/transformers';
import init, { WasmSearchEngine } from 'altor-vec';

await init();
const embed = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

const htmlFiles = await glob('build/**/*.html');
const vectors = [];
const metadata = [];

for (let i = 0; i < htmlFiles.length; i++) {
  const file = htmlFiles[i];
  const html = await fs.readFile(file, 'utf8');
  const dom = new JSDOM(html);
  const doc = dom.window.document;

  const title = doc.querySelector('article h1')?.textContent?.trim()
    ?? doc.querySelector('h1')?.textContent?.trim()
    ?? 'Untitled';

  // Docusaurus puts main content in .theme-doc-markdown or article
  const main = doc.querySelector('.theme-doc-markdown')
    ?? doc.querySelector('article')
    ?? doc.querySelector('main');

  if (!main) continue;
  main.querySelectorAll('nav, .pagination-nav, .theme-doc-toc-desktop, script, style').forEach(el => el.remove());
  const text = main.textContent?.replace(/\s+/g, ' ').trim() ?? '';
  if (!text || text.length < 30) continue;

  const out = await embed(`${title}\n${text.slice(0, 900)}`, { pooling: 'mean', normalize: true });
  vectors.push(...Array.from(out.data));

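  // Map build/docs/intro/index.html -> /docs/intro/ and build/docs/faq.html -> /docs/faq.
  // Normalizes Windows separators; assumes the site's baseUrl is '/'.
  const url = '/' + file
    .replace(/\\/g, '/')
    .replace(/^build\//, '')
    .replace(/(^|\/)index\.html$/, '$1')
    .replace(/\.html$/, '');
  metadata.push({ id: metadata.length, title, excerpt: text.slice(0, 200), url });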

  if (i % 10 === 0) process.stdout.write(`\r${i + 1}/${htmlFiles.length}`);
}

const engine = WasmSearchEngine.from_vectors(new Float32Array(vectors), 384, 16, 200, 50);
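// `docusaurus build` has already copied static/ into build/ by the time this
// postbuild script runs, so write to both: build/ for the output being deployed
// now, static/ so dev mode and later builds can serve the files too.
const indexBytes = Buffer.from(engine.to_bytes());
for (const dir of ['./build', './static']) {
  await fs.mkdir(dir, { recursive: true });
  await fs.writeFile(`${dir}/search-index.bin`, indexBytes);
  await fs.writeFile(`${dir}/search-metadata.json`, JSON.stringify(metadata));
}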
console.log(`\nDone. Indexed ${metadata.length} pages.`);
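The script relies on one layout convention: every page's vector is appended to a single flat array, and metadata is pushed in the same order, so position i in the flat buffer corresponds to metadata[i]. The sketch below illustrates that layout and the resulting raw payload size. The helper names (flatten, vectorAt, rawVectorBytes) are hypothetical and not part of altor-vec, and the size figure is a lower bound that ignores whatever graph structure the index adds.

```typescript
const DIMS = 384; // output size of all-MiniLM-L6-v2

// Pack per-page vectors into one contiguous Float32Array (row-major).
function flatten(rows: number[][], dims: number): Float32Array {
  const flat = new Float32Array(rows.length * dims);
  rows.forEach((row, i) => {
    if (row.length !== dims) {
      throw new Error(`row ${i} has ${row.length} dims, expected ${dims}`);
    }
    flat.set(row, i * dims);
  });
  return flat;
}

// View of the i-th vector without copying.
function vectorAt(flat: Float32Array, dims: number, i: number): Float32Array {
  return flat.subarray(i * dims, (i + 1) * dims);
}

// Lower bound on index size: raw float32 payload only, no graph overhead.
function rawVectorBytes(pageCount: number, dims: number = DIMS): number {
  return pageCount * dims * Float32Array.BYTES_PER_ELEMENT;
}

console.log(rawVectorBytes(1000)); // raw payload in bytes for a 1,000-page site
```

Because ids are positional, any page skipped after its vector has been pushed would shift every later id. The script avoids this by running all its `continue` checks before the embedding step.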

Add to package.json:

{
  "scripts": {
    "build": "docusaurus build",
    "postbuild": "node scripts/build-search-index.mjs"
  }
}

Step 3: Replace SearchBar with the semantic search component

Replace the contents of src/theme/SearchBar/index.tsx:

import React, { useCallback, useEffect, useRef, useState } from 'react';
import type { WasmSearchEngine } from 'altor-vec';

interface Result { id: number; title: string; excerpt: string; url: string; score: number; }

let engine: WasmSearchEngine | null = null;
let metadata: Omit<Result, 'score'>[] = [];
let initPromise: Promise<void> | null = null;

async function initSearch() {
  if (engine) return;
  if (initPromise) return initPromise;
  initPromise = (async () => {
    const { default: init, WasmSearchEngine } = await import('altor-vec');
    await init();
    const [buf, meta] = await Promise.all([
      fetch('/search-index.bin').then(r => r.arrayBuffer()),
      fetch('/search-metadata.json').then(r => r.json()),
    ]);
    engine = new WasmSearchEngine(new Uint8Array(buf));
    metadata = meta;
  })();
  return initPromise;
}

export default function SearchBar(): JSX.Element {
  const [open, setOpen] = useState(false);
  const [query, setQuery] = useState('');
  const [results, setResults] = useState<Result[]>([]);
  const [loading, setLoading] = useState(false);
  const inputRef = useRef<HTMLInputElement>(null);
  const timer = useRef<ReturnType<typeof setTimeout>>();

  useEffect(() => { initSearch(); }, []);

  useEffect(() => {
    const handler = (e: KeyboardEvent) => {
      if ((e.metaKey || e.ctrlKey) && e.key === 'k') { e.preventDefault(); setOpen(o => !o); }
      if (e.key === 'Escape') setOpen(false);
    };
    document.addEventListener('keydown', handler);
    return () => document.removeEventListener('keydown', handler);
  }, []);

  useEffect(() => {
    if (open) setTimeout(() => inputRef.current?.focus(), 50);
  }, [open]);

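  // Cache the embedder promise so the model is loaded once, not on every query.
  const embedderRef = useRef<Promise<any> | null>(null);

  const handleInput = useCallback((value: string) => {
    setQuery(value);
    clearTimeout(timer.current);
    // Clearing the input also cancels any pending search and its loading state.
    if (!value.trim()) { setResults([]); setLoading(false); return; }
    setLoading(true);
    timer.current = setTimeout(async () => {
      try {
        await initSearch();
        if (!engine) return;
        if (!embedderRef.current) {
          embedderRef.current = import('@huggingface/transformers')
            .then(({ pipeline }) => pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2'));
        }
        const embedder = await embedderRef.current;
        const out = await embedder(value, { pooling: 'mean', normalize: true });
        const hits = JSON.parse(engine.search(new Float32Array(out.data as Float32Array), 6)) as [number, number][];
        setResults(hits.map(([id, dist]) => ({ ...metadata[id], score: 1 - dist })));
      } catch (err) {
        console.error('Semantic search failed', err);
        setResults([]);
      } finally {
        setLoading(false);
      }
    }, 200);
  }, []);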

  return (
    <>
      <button
        onClick={() => setOpen(true)}
        style={{ background: 'none', border: '1px solid var(--ifm-color-emphasis-300)', borderRadius: 6, padding: '6px 12px', cursor: 'pointer', color: 'var(--ifm-color-content-secondary)', fontSize: 14 }}
        aria-label="Search docs (Cmd+K)"
      >
        Search ⌘K
      </button>

      {open && (
        <div
          onClick={e => e.target === e.currentTarget && setOpen(false)}
          style={{ position: 'fixed', inset: 0, background: 'rgba(0,0,0,.55)', zIndex: 9999, display: 'flex', alignItems: 'flex-start', justifyContent: 'center', paddingTop: 80 }}
        >
          <div style={{ background: 'var(--ifm-background-color)', border: '1px solid var(--ifm-color-emphasis-300)', borderRadius: 12, width: 'min(640px, 92vw)', overflow: 'hidden' }}>
            <input
              ref={inputRef}
              value={query}
              onChange={e => handleInput(e.target.value)}
              placeholder="Search documentation..."
              style={{ width: '100%', padding: '14px 18px', fontSize: 16, border: 'none', outline: 'none', background: 'transparent', color: 'var(--ifm-color-content)', borderBottom: '1px solid var(--ifm-color-emphasis-200)', boxSizing: 'border-box' }}
            />
            {loading && <p style={{ padding: '12px 18px', margin: 0, color: 'var(--ifm-color-content-secondary)', fontSize: 14 }}>Searching…</p>}
            {!loading && query && !results.length && (
              <p style={{ padding: '12px 18px', margin: 0, color: 'var(--ifm-color-content-secondary)', fontSize: 14 }}>No results for "{query}"</p>
            )}
            <ul style={{ listStyle: 'none', margin: 0, padding: '8px', maxHeight: 380, overflowY: 'auto' }}>
              {results.map(r => (
                <li key={r.id}>
                  <a href={r.url} onClick={() => setOpen(false)} style={{ display: 'block', padding: '10px 12px', borderRadius: 8, textDecoration: 'none' }}>
                    <strong style={{ display: 'block', color: 'var(--ifm-color-content)', fontSize: 14 }}>{r.title}</strong>
                    <span style={{ display: 'block', color: 'var(--ifm-color-content-secondary)', fontSize: 13, overflow: 'hidden', textOverflow: 'ellipsis', whiteSpace: 'nowrap' }}>{r.excerpt}</span>
                  </a>
                </li>
              ))}
            </ul>
          </div>
        </div>
      )}
    </>
  );
}
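The component maps raw engine hits straight onto metadata[id]. If the index and the metadata file ever come from different builds, that lookup returns undefined and the list renders blank rows. Here is a minimal sketch of a more defensive mapping step. It assumes, as the component does, that the engine returns a JSON array of [id, distance] pairs; toResults and minScore are hypothetical names introduced for illustration.

```typescript
interface PageMeta { id: number; title: string; excerpt: string; url: string; }
interface SearchResult extends PageMeta { score: number; }

// Convert the engine's JSON payload into display-ready results.
// Assumes the payload is an array of [id, distance] pairs.
function toResults(hitsJson: string, metadata: PageMeta[], minScore = 0): SearchResult[] {
  const hits = JSON.parse(hitsJson) as [number, number][];
  const results: SearchResult[] = [];
  for (const [id, dist] of hits) {
    const meta = metadata[id];
    if (!meta) continue; // index and metadata out of sync: skip rather than crash
    const score = 1 - dist; // same distance-to-score conversion the component uses
    if (score < minScore) continue; // drop weak matches
    results.push({ ...meta, score });
  }
  return results;
}

// Hypothetical sample data for illustration.
const sampleMeta: PageMeta[] = [
  { id: 0, title: 'Intro', excerpt: 'Getting started', url: '/docs/intro/' },
  { id: 1, title: 'Performance tuning', excerpt: 'Raise throughput', url: '/docs/perf/' },
];
const sample = toResults('[[1,0.25],[5,0.1],[0,0.9]]', sampleMeta);
console.log(sample.map(r => r.title));
```

A minScore threshold is a judgment call: a nearest-neighbor search always returns its top results, even for a query that matches nothing well, so a cutoff is one way to let the "No results" state actually appear.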

Step 4: Disable the default search in docusaurus.config.ts

// docusaurus.config.ts
const config: Config = {
  // Remove or comment out the Algolia block if you had one:
  // themeConfig: {
  //   algolia: { ... },       // remove this
  // },

  // If you used a community local-search plugin (for example
  // @easyops-cn/docusaurus-search-local), remove it from plugins or themes too
  plugins: [
    // remove the local-search plugin entry if present
  ],
};

Handling the embedding model size

The Xenova/all-MiniLM-L6-v2 model is 23MB. On first search it downloads and caches in the browser. Subsequent searches in the same session are instant. To make this fast for users:

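Initialize the engine (but not the embedder) when the page loads. This fetches the index in the background; its size grows with page count (384 float32 values, about 1.5 KB, per page before graph overhead), so expect the 15-20MB range only on sites with thousands of pages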
Initialize the embedder only on first query, not on page load
Show a loading state for the first query ("Loading search model…") so users know something is happening
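The first two points amount to a lazy singleton: nothing loads until the first call, and every later call reuses the same in-flight or completed load. A minimal sketch of that pattern follows. lazyOnce and the stand-in loader are hypothetical; in the component, the factory would wrap the dynamic import and the pipeline(...) call.

```typescript
// Memoize an async factory so the expensive work runs at most once.
// A failed attempt clears the cache so a later call can retry.
function lazyOnce<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | null = null;
  return () => {
    if (cached) return cached;
    const pending = factory().catch(err => {
      cached = null; // allow a retry after a failed download
      throw err;
    });
    cached = pending;
    return pending;
  };
}

// Stand-in for the real model loader (placeholder, not a real embedder).
let loads = 0;
const getEmbedder = lazyOnce(async () => {
  loads += 1;
  return (text: string) => text.length;
});

// Two calls share one load.
const first = getEmbedder();
const second = getEmbedder();
console.log(first === second, loads);
```

Caching the promise rather than the resolved value matters here: if a second query arrives while the model is still downloading, it awaits the same download instead of starting another one.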

The document embeddings are already computed at build time and stored in the index, so only the query needs to be embedded in the browser. For production, you can swap the local model for a lightweight query embedding service. This trades model download size for a network call per query, which is worthwhile if your users are on slow connections.
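As a rough sketch of the service option, the client below posts the query text to an endpoint and validates the response before handing it to the engine. Everything about the service is an assumption made for illustration: the endpoint path, the { text } request body, and the { embedding: number[] } response shape are hypothetical. One constraint is not optional: the service must run the same model as the build script, or the query vectors will not be comparable with the indexed ones.

```typescript
// Minimal structural type so the sketch does not depend on DOM typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const EXPECTED_DIMS = 384; // must match the model used at build time

// Validate the (assumed) response shape { embedding: number[] }.
function parseEmbedding(payload: unknown, dims: number = EXPECTED_DIMS): Float32Array {
  const embedding = (payload as { embedding?: unknown } | null)?.embedding;
  if (!Array.isArray(embedding) || embedding.length !== dims) {
    throw new Error(`expected an embedding of length ${dims}`);
  }
  if (!embedding.every(v => typeof v === 'number' && Number.isFinite(v))) {
    throw new Error('embedding contains non-numeric values');
  }
  return Float32Array.from(embedding as number[]);
}

// Build a query embedder backed by a hypothetical HTTP endpoint.
function createRemoteEmbedder(endpoint: string, fetchImpl: FetchLike) {
  return async (text: string): Promise<Float32Array> => {
    const res = await fetchImpl(endpoint, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text }),
    });
    if (!res.ok) throw new Error(`embedding service returned ${res.status}`);
    return parseEmbedding(await res.json());
  };
}
```

In the browser you would pass a thin wrapper such as (url, init) => fetch(url, init) as fetchImpl. Injecting it also makes the client easy to exercise with a fake in tests.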

Testing locally

# Build docs + generate index
npm run build

# Serve the built output
npx serve build

# Navigate to http://localhost:3000 and press Cmd+K to test search

Don't test with docusaurus start (dev mode) — the /build directory doesn't exist in dev mode, so the index build script won't have content to read. Always test search against the production build.
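Before opening the browser, it can help to confirm that the two files the component fetches actually made it into the served directory. The check below is a small sketch; the file names come from the build script in this guide, while missingArtifacts is a hypothetical helper name.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Files the search component fetches at runtime.
const REQUIRED = ['search-index.bin', 'search-metadata.json'];

// Return the names of required files that are missing or empty in outDir.
function missingArtifacts(outDir: string, required: string[] = REQUIRED): string[] {
  return required.filter(name => {
    const file = path.join(outDir, name);
    return !fs.existsSync(file) || fs.statSync(file).size === 0;
  });
}

const missing = missingArtifacts('build');
if (missing.length > 0) {
  console.warn(`Search artifacts missing from build/: ${missing.join(', ')}`);
}
```

An empty result means both files are present and non-empty. It says nothing about whether the index and metadata match each other, so a quick manual query is still worth doing.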

FAQ

Does Algolia DocSearch work with private documentation?

Algolia DocSearch is free only for open-source projects with publicly accessible docs. Private docs — internal tools, commercial products, anything behind a login — require a paid Algolia plan. altor-vec has no such restriction.

What Docusaurus version does this support?

This guide targets Docusaurus v3. The swizzle command and SearchBar component location are the same in v2. The /build output structure hasn't changed significantly between versions, so the index build script should work unchanged.

Can I keep the default search as a fallback?

Yes — use --wrap instead of --eject when swizzling, which lets you render both. But two search interfaces in the navbar is a UX problem. A better approach: replace the SearchBar and add a graceful fallback inside the component if the index fails to load.

Get started: npm install altor-vec · GitHub