Add Semantic Search to Docusaurus — Replace Algolia DocSearch
Docusaurus officially supports Algolia DocSearch (free only for publicly accessible open-source projects), and community plugins add local search that is keyword-only (typically built on Lunr). If your docs are private, internal, or for a paid product, you're paying Algolia or settling for keyword search. This guide shows how to replace both with client-side semantic search using altor-vec: no account required, no per-query cost, and intent-aware results.
npm install altor-vec @huggingface/transformers
npm install --save-dev glob jsdom
Why replace Algolia DocSearch
Algolia DocSearch is genuinely convenient. You add three lines to docusaurus.config.ts, Algolia crawls your site, and search works. The catch: it's free only for open-source projects with publicly accessible documentation. If your docs are behind a login, on an internal network, or for a commercial product, you pay Algolia's standard rates — starting around $50/month at modest traffic.
Beyond pricing, DocSearch is keyword matching with Algolia's ranking heuristics. A user searching "how to increase throughput" won't find your "performance tuning" page if those words don't overlap. Semantic search closes that gap.
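To make "closes that gap" concrete: with normalized embeddings (both the build script and the component below pass normalize: true), relevance is the dot product of two vectors, which equals their cosine similarity. The sketch ranks pages by that score. The 3-dimensional vectors are made up for illustration; real all-MiniLM-L6-v2 vectors have 384 dimensions and come from the model, not from hand-picked numbers.

```typescript
// Cosine similarity for unit-length vectors reduces to a dot product.
function dot(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Scale a vector to unit length so that dot() equals cosine similarity.
function normalize(v: number[]): Float32Array {
  const norm = Math.hypot(...v);
  return Float32Array.from(v, x => x / norm);
}

interface Page { title: string; vector: Float32Array; }

// Rank pages by similarity to the query vector, best match first.
function rank(query: Float32Array, pages: Page[]): { title: string; score: number }[] {
  return pages
    .map(p => ({ title: p.title, score: dot(query, p.vector) }))
    .sort((x, y) => y.score - x.score);
}

// Toy vectors (illustrative only): the query shares no words with
// "Performance tuning", but its vector points in nearly the same direction.
const pages: Page[] = [
  { title: 'Performance tuning', vector: normalize([0.9, 0.1, 0.0]) },
  { title: 'Changelog', vector: normalize([0.0, 0.2, 0.9]) },
];
const query = normalize([0.8, 0.2, 0.1]); // stand-in for the embedding of "how to increase throughput"
console.log(rank(query, pages)[0].title); // → Performance tuning
```

Keyword search scores these two pages identically (zero overlap with the query); the vector score separates them by direction in embedding space.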
How the implementation works
Docusaurus uses a concept called "swizzling" to override theme components. You run docusaurus swizzle to copy a component from the default theme into your src/theme/ directory, then modify it. This is the correct way to customize Docusaurus without forking the theme.
The search implementation has two parts:
- Build script: reads your compiled /build output, extracts content from HTML, generates embeddings, and writes a binary index to /static
- SearchBar override: replaces Docusaurus's SearchBar component with a React component that loads the altor-vec index and queries it in the browser
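The two parts communicate only through two files, so the contract between them is worth stating. The build script pushes one 384-dimensional vector and one metadata entry per page in the same loop iteration; assuming, as the component does, that altor-vec numbers vectors in insertion order, hit id i refers to both the i-th vector and metadata[i]. The sketch encodes that assumption as a type and a sanity check. `SearchDoc`, `checkIndexParts`, and `vectorAt` are hypothetical names, not part of altor-vec.

```typescript
// Shape of each entry in search-metadata.json, mirroring what the build script writes.
interface SearchDoc {
  id: number;      // position of this page's vector in the flat vector array
  title: string;
  excerpt: string;
  url: string;
}

const DIM = 384; // output size of Xenova/all-MiniLM-L6-v2

// Verify that the flat vector buffer and the metadata list line up one-to-one.
function checkIndexParts(vectors: Float32Array, metadata: SearchDoc[], dim = DIM): void {
  if (vectors.length !== metadata.length * dim) {
    throw new Error(`expected ${metadata.length * dim} floats, got ${vectors.length}`);
  }
  metadata.forEach((doc, i) => {
    if (doc.id !== i) throw new Error(`metadata[${i}] has id ${doc.id}`);
  });
}

// View (without copying) the vector that belongs to metadata[i].
function vectorAt(vectors: Float32Array, i: number, dim = DIM): Float32Array {
  return vectors.subarray(i * dim, (i + 1) * dim);
}

// Usage: two fake pages with zero vectors pass the check.
const docs: SearchDoc[] = [
  { id: 0, title: 'Intro', excerpt: 'Getting started', url: '/docs/intro' },
  { id: 1, title: 'Config', excerpt: 'Configuration options', url: '/docs/config' },
];
const flat = new Float32Array(docs.length * DIM);
checkIndexParts(flat, docs);
console.log(vectorAt(flat, 1).length); // → 384
```

Running a check like this before calling from_vectors catches the most likely failure mode: a page that was skipped for one array but not the other.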
Step 1: Swizzle the SearchBar component
npx docusaurus swizzle @docusaurus/theme-classic SearchBar --eject --typescript
This creates src/theme/SearchBar/index.tsx. You'll replace its contents with the semantic search implementation below.
Why --eject and not --wrap: ejecting copies the full component source into your project so you can replace it. Wrapping keeps the original component and lets you render extra markup around it. For replacing search entirely, eject is cleaner. If you want to keep the original search as a fallback, use --wrap instead.
Step 2: Write the index build script
Create scripts/build-search-index.mjs. Run this after docusaurus build — it reads the compiled HTML from /build.
// scripts/build-search-index.mjs
import fs from 'node:fs/promises';
import { glob } from 'glob';
import { JSDOM } from 'jsdom';
import { pipeline } from '@huggingface/transformers';
import init, { WasmSearchEngine } from 'altor-vec';
await init();
const embed = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const htmlFiles = await glob('build/**/*.html', { ignore: ['build/404.html'] }); // skip the "Page Not Found" page
const vectors = [];
const metadata = [];
for (let i = 0; i < htmlFiles.length; i++) {
const file = htmlFiles[i];
const html = await fs.readFile(file, 'utf8');
const dom = new JSDOM(html);
const doc = dom.window.document;
const title = doc.querySelector('article h1')?.textContent?.trim()
?? doc.querySelector('h1')?.textContent?.trim()
?? 'Untitled';
// Docusaurus puts main content in .theme-doc-markdown or article
const main = doc.querySelector('.theme-doc-markdown')
?? doc.querySelector('article')
?? doc.querySelector('main');
if (!main) continue;
main.querySelectorAll('nav, .pagination-nav, .theme-doc-toc-desktop, script, style').forEach(el => el.remove());
const text = main.textContent?.replace(/\s+/g, ' ').trim() ?? '';
if (!text || text.length < 30) continue;
const out = await embed(`${title}\n${text.slice(0, 900)}`, { pooling: 'mean', normalize: true });
vectors.push(...Array.from(out.data));
const url = '/' + file.replace('build/', '').replace('index.html', '').replace('.html', '');
metadata.push({ id: metadata.length, title, excerpt: text.slice(0, 200), url });
if (i % 10 === 0) process.stdout.write(`\r${i + 1}/${htmlFiles.length}`);
}
const engine = WasmSearchEngine.from_vectors(new Float32Array(vectors), 384, 16, 200, 50);
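// postbuild runs after Docusaurus has already copied static/ into build/,
// so write into build/ to ship the index with this build. The static/ copy
// is picked up by the next build.
const indexBytes = Buffer.from(engine.to_bytes());
const metadataJson = JSON.stringify(metadata);
for (const dir of ['./build', './static']) {
await fs.writeFile(`${dir}/search-index.bin`, indexBytes);
await fs.writeFile(`${dir}/search-metadata.json`, metadataJson);
}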
console.log(`\nDone. Indexed ${metadata.length} pages.`);
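The URL line in the script uses plain string replacement, which assumes forward slashes (glob returns backslash paths on Windows) and would also strip "index.html" out of a filename such as myindex.html. A slightly stricter mapping is sketched below. `fileToUrl` is a hypothetical helper, and it still assumes your site's baseUrl is /; drop the type annotations to paste it into the .mjs script.

```typescript
// Hypothetical helper: map a path under build/ to the route Docusaurus serves it at.
// Assumes baseUrl is '/'; prepend your baseUrl otherwise.
function fileToUrl(file: string, buildDir = 'build'): string {
  // Normalize Windows separators so the same logic works on every OS.
  let rel = file.replace(/\\/g, '/');
  if (rel.startsWith(buildDir + '/')) rel = rel.slice(buildDir.length + 1);
  if (rel === 'index.html') return '/';
  // docs/intro/index.html is served at /docs/intro/
  if (rel.endsWith('/index.html')) return '/' + rel.slice(0, -'index.html'.length);
  // docs/intro.html (e.g. with trailingSlash: false) is served at /docs/intro
  return '/' + rel.replace(/\.html$/, '');
}

console.log(fileToUrl('build/docs/intro/index.html')); // → /docs/intro/
console.log(fileToUrl('build\\blog\\myindex.html'));  // → /blog/myindex
```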
Add to package.json:
{
"scripts": {
"build": "docusaurus build",
"postbuild": "node scripts/build-search-index.mjs"
}
}
Step 3: Replace SearchBar with the semantic search component
Replace the contents of src/theme/SearchBar/index.tsx:
import React, { useCallback, useEffect, useRef, useState } from 'react';
import type { WasmSearchEngine } from 'altor-vec';
interface Result { id: number; title: string; excerpt: string; url: string; score: number; }
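let engine: WasmSearchEngine | null = null;
let metadata: Omit<Result, 'score'>[] = [];
let initPromise: Promise<void> | null = null;
// fetch() resolves on HTTP errors, so check the status explicitly.
async function fetchOk(url: string): Promise<Response> {
const res = await fetch(url);
if (!res.ok) throw new Error(`Failed to load ${url}: HTTP ${res.status}`);
return res;
}
// Load the WASM module, the binary index, and the metadata once per page.
async function initSearch() {
if (engine) return;
if (initPromise) return initPromise;
initPromise = (async () => {
const { default: init, WasmSearchEngine } = await import('altor-vec');
await init();
const [buf, meta] = await Promise.all([
fetchOk('/search-index.bin').then(r => r.arrayBuffer()),
fetchOk('/search-metadata.json').then(r => r.json()),
]);
engine = new WasmSearchEngine(new Uint8Array(buf));
metadata = meta;
})().catch(err => {
initPromise = null; // allow a retry on the next query
throw err;
});
return initPromise;
}
// Create the query embedder on first use only, then reuse it for every query.
function loadEmbedder() {
return import('@huggingface/transformers').then(({ pipeline }) =>
pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2'));
}
let embedderPromise: ReturnType<typeof loadEmbedder> | null = null;
function getEmbedder() {
embedderPromise ??= loadEmbedder().catch(err => {
embedderPromise = null; // allow a retry on the next query
throw err;
});
return embedderPromise;
}
export default function SearchBar(): JSX.Element {
const [open, setOpen] = useState(false);
const [query, setQuery] = useState('');
const [results, setResults] = useState<Result[]>([]);
const [loading, setLoading] = useState(false);
const inputRef = useRef<HTMLInputElement>(null);
const timer = useRef<ReturnType<typeof setTimeout>>();
// Incremented on every keystroke so responses to outdated queries are ignored.
const latestRequest = useRef(0);
// Prefetch the index in the background; a failure is retried on the first query.
useEffect(() => { initSearch().catch(() => {}); }, []);
useEffect(() => {
const handler = (e: KeyboardEvent) => {
if ((e.metaKey || e.ctrlKey) && e.key === 'k') { e.preventDefault(); setOpen(o => !o); }
if (e.key === 'Escape') setOpen(false);
};
document.addEventListener('keydown', handler);
return () => document.removeEventListener('keydown', handler);
}, []);
useEffect(() => {
if (open) setTimeout(() => inputRef.current?.focus(), 50);
}, [open]);
const handleInput = useCallback((value: string) => {
setQuery(value);
clearTimeout(timer.current);
const requestId = ++latestRequest.current;
if (!value.trim()) { setResults([]); setLoading(false); return; }
setLoading(true);
timer.current = setTimeout(async () => {
try {
await initSearch();
const embedder = await getEmbedder();
const out = await embedder(value, { pooling: 'mean', normalize: true });
// Drop the response if the engine is missing or a newer query has started.
if (!engine || requestId !== latestRequest.current) return;
const hits = JSON.parse(engine.search(new Float32Array(out.data as Float32Array), 6)) as [number, number][];
setResults(hits.map(([id, dist]) => ({ ...metadata[id], score: 1 - dist })));
} catch (err) {
console.error('Semantic search failed:', err);
if (requestId === latestRequest.current) setResults([]);
} finally {
if (requestId === latestRequest.current) setLoading(false);
}
}, 200);
}, []);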
return (
<>
<button
onClick={() => setOpen(true)}
style={{ background: 'none', border: '1px solid var(--ifm-color-emphasis-300)', borderRadius: 6, padding: '6px 12px', cursor: 'pointer', color: 'var(--ifm-color-content-secondary)', fontSize: 14 }}
aria-label="Search docs (Cmd+K)"
>
Search ⌘K
</button>
{open && (
<div
onClick={e => e.target === e.currentTarget && setOpen(false)}
style={{ position: 'fixed', inset: 0, background: 'rgba(0,0,0,.55)', zIndex: 9999, display: 'flex', alignItems: 'flex-start', justifyContent: 'center', paddingTop: 80 }}
>
<div style={{ background: 'var(--ifm-background-color)', border: '1px solid var(--ifm-color-emphasis-300)', borderRadius: 12, width: 'min(640px, 92vw)', overflow: 'hidden' }}>
<input
ref={inputRef}
value={query}
onChange={e => handleInput(e.target.value)}
placeholder="Search documentation..."
style={{ width: '100%', padding: '14px 18px', fontSize: 16, border: 'none', outline: 'none', background: 'transparent', color: 'var(--ifm-color-content)', borderBottom: '1px solid var(--ifm-color-emphasis-200)', boxSizing: 'border-box' }}
/>
{loading && <p style={{ padding: '12px 18px', margin: 0, color: 'var(--ifm-color-content-secondary)', fontSize: 14 }}>Searching…</p>}
{!loading && query && !results.length && (
<p style={{ padding: '12px 18px', margin: 0, color: 'var(--ifm-color-content-secondary)', fontSize: 14 }}>No results for "{query}"</p>
)}
<ul style={{ listStyle: 'none', margin: 0, padding: '8px', maxHeight: 380, overflowY: 'auto' }}>
{results.map(r => (
<li key={r.id}>
<a href={r.url} onClick={() => setOpen(false)} style={{ display: 'block', padding: '10px 12px', borderRadius: 8, textDecoration: 'none' }}>
<strong style={{ display: 'block', color: 'var(--ifm-color-content)', fontSize: 14 }}>{r.title}</strong>
<span style={{ display: 'block', color: 'var(--ifm-color-content-secondary)', fontSize: 13, overflow: 'hidden', textOverflow: 'ellipsis', whiteSpace: 'nowrap' }}>{r.excerpt}</span>
</a>
</li>
))}
</ul>
</div>
</div>
)}
</>
);
}
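The component assumes engine.search returns a JSON string of [id, distance] pairs and that the distance is cosine distance, so 1 - dist is a similarity. Nearest-neighbour search always returns k pages, even for a query that matches nothing well, so a common refinement is to drop weak matches before rendering. The sketch pulls that step out as a pure function; `toResults` and the 0.3 cutoff are assumptions to tune against your own docs, not values from altor-vec.

```typescript
interface DocMeta { id: number; title: string; excerpt: string; url: string; }
interface Result extends DocMeta { score: number; }

// Convert raw [id, distance] hits into results, dropping weak or unknown matches.
// Assumes distance = 1 - cosine similarity, as the SearchBar component does.
function toResults(raw: string, metadata: DocMeta[], minScore = 0.3): Result[] {
  const hits = JSON.parse(raw) as [number, number][];
  return hits
    .filter(([id]) => metadata[id] !== undefined) // ignore ids missing from metadata
    .map(([id, dist]) => ({ ...metadata[id], score: 1 - dist }))
    .filter(r => r.score >= minScore)
    .sort((a, b) => b.score - a.score);
}

const meta: DocMeta[] = [
  { id: 0, title: 'Performance tuning', excerpt: 'How to increase throughput', url: '/docs/perf' },
  { id: 1, title: 'Changelog', excerpt: 'Release notes', url: '/docs/changelog' },
];
console.log(toResults('[[0, 0.2], [1, 0.9]]', meta).map(r => r.title)); // → [ 'Performance tuning' ]
```

In the component, the setResults(hits.map(...)) line would become setResults(toResults(engine.search(...), metadata)).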
Step 4: Disable the default search in docusaurus.config.ts
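// docusaurus.config.ts
import type { Config } from '@docusaurus/types';
const config: Config = {
// ...title, url, baseUrl, presets, and the rest of your existing config
themeConfig: {
// algolia: { ... }, // remove this; the classic preset only loads Algolia search when this key is set
},
// If you installed a community local search plugin or theme (for example
// @easyops-cn/docusaurus-search-local), remove it from `themes` or `plugins`
// and keep your other entries.
};
export default config;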
Handling the embedding model size
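The Xenova/all-MiniLM-L6-v2 model is about 23MB in the quantized form Transformers.js uses in the browser. On the first search it downloads and is cached by the browser; after that, queries are fast as long as the component reuses one pipeline instance instead of creating a new one per query. To make this fast for users: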
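- Initialize the engine (but not the embedder) when the page loads, which fetches the index in the background. Index size scales with page count: each 384-dimensional float32 vector takes 384 × 4 = 1,536 bytes, so 1,000 pages is roughly 1.5MB of vectors plus the search graph's overhead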
- Initialize the embedder only on first query, not on page load
- Show a loading state for the first query ("Loading search model…") so users know something is happening
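The loading strategy in this list comes down to starting each download at most once and sharing the in-flight promise between callers. A generic helper for that is sketched below; `lazy` is a hypothetical name, and the loaders in the usage comments stand in for whatever your component actually calls.

```typescript
// Wrap an async loader so it runs at most once; concurrent callers share the same promise.
// If loading fails, the cache is cleared so the next call can retry.
function lazy<T>(loader: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (!pending) {
      pending = loader().catch(err => {
        pending = null;
        throw err;
      });
    }
    return pending;
  };
}

// Usage sketch (hypothetical loaders):
// const getIndex = lazy(() => fetch('/search-index.bin').then(r => r.arrayBuffer()));
// const getEmbedder = lazy(async () => {
//   const { pipeline } = await import('@huggingface/transformers');
//   return pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
// });
// getIndex();          // on page load: prefetch the index in the background
// await getEmbedder(); // on the first query only
```

Clearing the cached promise on failure matters here: without it, one flaky network request would leave search broken until the page is reloaded.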
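Document embeddings are already computed at build time, so only the query needs the model in the browser. If the model download is too heavy for your audience, you can replace the in-browser embedder with a lightweight query embedding service that runs the same model on a server and returns the query vector. This trades the model download for a network call per query, which is worthwhile if your users are on slow connections.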
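With a query service, the browser only sends the query text and receives a 384-number vector. That vector must come from the same model the build script used, otherwise query and page vectors are not comparable. A client-side sketch, assuming a hypothetical POST /api/embed endpoint that accepts { "text": string } and returns { "embedding": number[] }; the endpoint, both JSON shapes, and the function names are assumptions, not an existing API.

```typescript
// Validate the (assumed) { embedding: number[] } response and convert it for engine.search().
function parseEmbedding(body: unknown, dim = 384): Float32Array {
  const embedding = (body as { embedding?: unknown })?.embedding;
  if (!Array.isArray(embedding) || embedding.length !== dim ||
      !embedding.every(x => typeof x === 'number')) {
    throw new Error(`expected an embedding of ${dim} numbers`);
  }
  return Float32Array.from(embedding as number[]);
}

// Fetch a query embedding from the hypothetical endpoint instead of running the model locally.
async function embedQueryRemote(query: string, endpoint = '/api/embed'): Promise<Float32Array> {
  const res = await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text: query }),
  });
  if (!res.ok) throw new Error(`embedding request failed: HTTP ${res.status}`);
  return parseEmbedding(await res.json());
}
```

In the component, embedQueryRemote(value) would take the place of the embedder(value, { pooling: 'mean', normalize: true }) call; the server must apply the same mean pooling and normalization.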
Testing locally
# Build docs + generate index
npm run build
# Serve the built output
npx serve build
# Navigate to http://localhost:3000 and press Cmd+K (Ctrl+K on Windows/Linux) to test search
Don't test with docusaurus start (dev mode) — the /build directory doesn't exist in dev mode, so the index build script won't have content to read. Always test search against the production build.
FAQ
Does Algolia DocSearch work with private documentation?
Algolia DocSearch is free only for open-source projects with publicly accessible docs. Private docs — internal tools, commercial products, anything behind a login — require a paid Algolia plan. altor-vec has no such restriction.
What Docusaurus version does this support?
This guide targets Docusaurus v3. The swizzle command and SearchBar component location are the same in v2. The /build output structure is identical between versions, so the index build script works unchanged.
Can I keep the default search as a fallback?
Yes — use --wrap instead of --eject when swizzling, which lets you render both. But two search interfaces in the navbar is a UX problem. A better approach: replace the SearchBar and add a graceful fallback inside the component if the index fails to load.
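A graceful fallback can be as small as a keyword match over search-metadata.json, which covers the case where the WASM module or the binary index fails but the metadata file still loads (if the metadata fails too, show an error message instead). A minimal sketch; `keywordFallback` and its scoring (title matches weighted above excerpt matches) are illustrative choices, not part of altor-vec or Docusaurus.

```typescript
interface DocMeta { id: number; title: string; excerpt: string; url: string; }

// Rank pages by how many query terms appear in the title (weight 2) or excerpt (weight 1).
function keywordFallback(query: string, metadata: DocMeta[], limit = 6): DocMeta[] {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return [];
  return metadata
    .map(doc => {
      const title = doc.title.toLowerCase();
      const excerpt = doc.excerpt.toLowerCase();
      const score = terms.reduce(
        (sum, t) => sum + (title.includes(t) ? 2 : 0) + (excerpt.includes(t) ? 1 : 0), 0);
      return { doc, score };
    })
    .filter(x => x.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map(x => x.doc);
}

const docs: DocMeta[] = [
  { id: 0, title: 'Performance tuning', excerpt: 'Tune caching and concurrency', url: '/docs/perf' },
  { id: 1, title: 'Deployment', excerpt: 'Performance checklist before release', url: '/docs/deploy' },
];
console.log(keywordFallback('performance', docs).map(d => d.url)); // → [ '/docs/perf', '/docs/deploy' ]
```

Because the excerpt is only the first 200 characters of each page, this fallback is much weaker than the semantic index; it keeps search usable rather than matching its quality.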