AI-Powered Search for VitePress — Replace the Default in 10 Minutes

VitePress ships with a local full-text search powered by MiniSearch. It's fast and zero-config, but it matches keywords — not intent. A user searching "how to configure rate limits" won't find your "request throttling" page unless those words overlap. This guide shows how to augment or replace VitePress search with semantic vector search using altor-vec, without ejecting from the default theme.

Install: npm install altor-vec @huggingface/transformers tsx

What VitePress search does and doesn't do

VitePress's built-in search uses MiniSearch under the hood — a solid 22KB inverted-index library. At build time it crawls your .md files and produces a search index. At runtime, queries are scored by BM25-style term frequency against the index.

This works well when users type exact terms from your documentation. It breaks down when user vocabulary doesn't match your content vocabulary — which is common in technical documentation, where users describe problems in their own words while documentation uses precise API terminology.
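
The vocabulary gap can be made concrete with a toy term-overlap check. This is an illustration only: the tokenizer and stop-word list are simplified assumptions, not MiniSearch's actual implementation, and `tokenize` and `sharedTerms` are hypothetical helper names.

```typescript
// Toy illustration only: MiniSearch's real tokenizer and scoring are more
// sophisticated than this (prefix and fuzzy options, field boosts, and so on).
const STOP_WORDS = ['how', 'to', 'a', 'the', 'of', 'for'];

// Lowercase, split on non-alphanumerics, drop empty tokens and stop words.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0 && STOP_WORDS.indexOf(t) === -1);
}

// Terms that appear in both the query and the document.
function sharedTerms(query: string, doc: string): string[] {
  const docTerms = tokenize(doc);
  return tokenize(query).filter((t) => docTerms.indexOf(t) !== -1);
}

const query = 'how to configure rate limits';
console.log(sharedTerms(query, 'Request throttling')); // []
console.log(sharedTerms(query, 'Configure rate limits per API key')); // [ 'configure', 'rate', 'limits' ]
```

With no shared terms, a purely term-based scorer has nothing to rank the "Request throttling" page on, which is the gap that embeddings are meant to close.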

Approach	Handles typos	Understands intent	Bundle size	Setup
VitePress default (MiniSearch)	No	No	22KB	Zero config
altor-vec (HNSW vector)	Yes (via embeddings)	Yes	54KB WASM	Build script + theme extension

Overview: how this works

The implementation has three parts:

Index build script: reads your .vitepress/dist output after vitepress build, extracts text content from the HTML files, generates embeddings, and writes a binary index plus a JSON metadata file into .vitepress/dist/
Theme extension: adds a search component to VitePress's default theme via .vitepress/theme/index.ts without replacing the whole theme
Search component: a Vue component (or vanilla JS) that loads the index, accepts queries, and renders results

Step 1: Build the search index from your compiled docs

Create scripts/build-search.mjs. This runs after vitepress build, reading the compiled HTML output to extract clean text.

// scripts/build-search.mjs
import fs from 'node:fs/promises';
import { glob } from 'glob';
import { JSDOM } from 'jsdom';
import { pipeline } from '@huggingface/transformers';
import init, { WasmSearchEngine } from 'altor-vec';

await init();
const embed = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// Read compiled HTML from the VitePress output directory
const htmlFiles = await glob('.vitepress/dist/**/*.html');
const vectors = [];
const metadata = [];

for (let i = 0; i < htmlFiles.length; i++) {
  const file = htmlFiles[i];
  const html = await fs.readFile(file, 'utf8');
  const dom = new JSDOM(html);
  const doc = dom.window.document;

  // Extract title and main content — skip nav, sidebar, footer
  const title = doc.querySelector('h1')?.textContent?.trim() ?? 'Untitled';
  const mainContent = doc.querySelector('.vp-doc') ?? doc.querySelector('main') ?? doc.body;

  // Remove script and style tags
  mainContent.querySelectorAll('script,style,nav,.aside,.sidebar').forEach(el => el.remove());
  const text = mainContent.textContent?.replace(/\s+/g, ' ').trim() ?? '';

  if (!text || text.length < 50) continue; // skip empty pages

  const textToEmbed = `${title}\n${text.slice(0, 1000)}`;
  const out = await embed(textToEmbed, { pooling: 'mean', normalize: true });
  vectors.push(...Array.from(out.data));

  // Build the URL from file path
  const url = '/' + file
    .replace('.vitepress/dist/', '')
    .replace('index.html', '')
    .replace('.html', '');

  metadata.push({
    id: vectors.length / 384 - 1,
    title,
    excerpt: text.slice(0, 200),
    url,
  });

  if (i % 5 === 0) process.stdout.write(`\rProcessing ${i + 1}/${htmlFiles.length}...`);
}

const dim = 384;
const engine = WasmSearchEngine.from_vectors(new Float32Array(vectors), dim, 16, 200, 50);

await fs.writeFile('.vitepress/dist/search-index.bin', Buffer.from(engine.to_bytes()));
await fs.writeFile('.vitepress/dist/search-metadata.json', JSON.stringify(metadata));
console.log(`\nIndexed ${metadata.length} pages`);
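
The URL derivation in the script uses plain string replacement, which removes the first match rather than a trailing one and assumes forward slashes in paths. A more defensive version might look like the sketch below. `htmlPathToUrl` is a hypothetical helper name, and the default `outDir` assumes the standard .vitepress/dist location.

```typescript
// Hypothetical helper, not part of VitePress or altor-vec.
// Converts a compiled HTML path into the site-relative URL stored in metadata.
function htmlPathToUrl(file: string, outDir: string = '.vitepress/dist'): string {
  // Use forward slashes everywhere and drop a leading "./".
  const normalize = (p: string) => p.replace(/\\/g, '/').replace(/^\.\//, '');
  const root = normalize(outDir).replace(/\/+$/, '') + '/';
  let rel = normalize(file);
  if (rel.indexOf(root) === 0) rel = rel.slice(root.length);
  // Strip only a trailing "index.html" or ".html", never a match mid-path.
  rel = rel.replace(/(^|\/)index\.html$/, '$1').replace(/\.html$/, '');
  return '/' + rel;
}

console.log(htmlPathToUrl('.vitepress/dist/index.html'));                 // /
console.log(htmlPathToUrl('.vitepress/dist/guide/index.html'));           // /guide/
console.log(htmlPathToUrl('.vitepress/dist/guide/getting-started.html')); // /guide/getting-started
console.log(htmlPathToUrl('.vitepress\\dist\\api\\config.html'));         // /api/config
```

Whether extensionless URLs resolve depends on your cleanUrls setting and host, so keep the `.html` suffix if your deployment needs it.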

Install jsdom and glob, which the script imports for HTML parsing and file matching:

npm install -D jsdom @types/jsdom glob

Step 2: Wire the build script into your package.json

// package.json
{
  "scripts": {
    "docs:dev": "vitepress dev",
    "docs:build": "vitepress build && node scripts/build-search.mjs",
    "docs:preview": "vitepress preview"
  }
}

Now every docs:build automatically generates the search index after VitePress finishes compiling.

Step 3: Create the search component

Create .vitepress/theme/SearchModal.vue:

<script setup lang="ts">
import { ref, onMounted, onUnmounted } from 'vue';
import init, { WasmSearchEngine } from 'altor-vec';
import { pipeline } from '@huggingface/transformers';

interface Result {
  id: number;
  title: string;
  excerpt: string;
  url: string;
  score: number;
}

const open = ref(false);
const query = ref('');
const results = ref<Result[]>([]);
const loading = ref(false);

let engine: WasmSearchEngine | null = null;
let metadata: Omit<Result, 'score'>[] = [];
let embedder: Awaited<ReturnType<typeof pipeline>> | null = null;
let debounceTimer: ReturnType<typeof setTimeout>;

async function initEngine() {
  if (engine) return;
  await init();
  const [indexBuf, meta] = await Promise.all([
    fetch('/search-index.bin').then(r => r.arrayBuffer()),
    fetch('/search-metadata.json').then(r => r.json()),
  ]);
  engine = new WasmSearchEngine(new Uint8Array(indexBuf));
  metadata = meta;
  embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
}

async function runSearch(q: string) {
  if (!engine || !embedder || !q.trim()) { results.value = []; return; }
  loading.value = true;
  const out = await embedder(q, { pooling: 'mean', normalize: true });
  const hits = JSON.parse(engine.search(new Float32Array(out.data as Float32Array), 6)) as [number, number][];
  results.value = hits.map(([id, dist]) => ({ ...metadata[id], score: 1 - dist }));
  loading.value = false;
}

function onInput(e: Event) {
  const val = (e.target as HTMLInputElement).value;
  query.value = val;
  clearTimeout(debounceTimer);
  debounceTimer = setTimeout(() => runSearch(val), 220);
}

function openModal() { open.value = true; initEngine(); }
function closeModal() { open.value = false; query.value = ''; results.value = []; }

function onKeydown(e: KeyboardEvent) {
  if ((e.metaKey || e.ctrlKey) && e.key === 'k') { e.preventDefault(); open.value ? closeModal() : openModal(); }
  if (e.key === 'Escape') closeModal();
}

onMounted(() => window.addEventListener('keydown', onKeydown));
onUnmounted(() => window.removeEventListener('keydown', onKeydown));
</script>

<template>
  <button class="search-btn" @click="openModal" aria-label="Search (Cmd+K)">
    <svg width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
      <circle cx="11" cy="11" r="8"/><path d="m21 21-4.35-4.35"/>
    </svg>
    Search <kbd>⌘K</kbd>
  </button>

  <Teleport to="body">
    <div v-if="open" class="search-overlay" @click.self="closeModal">
      <div class="search-modal">
        <input
          autofocus
          type="search"
          placeholder="Search docs..."
          :value="query"
          @input="onInput"
          class="search-input"
        />
        <p v-if="loading" class="search-hint">Searching…</p>
        <p v-else-if="query && !results.length" class="search-hint">No results for "{{ query }}"</p>
        <ul v-else class="search-results">
          <li v-for="r in results" :key="r.id">
            <a :href="r.url" @click="closeModal">
              <strong>{{ r.title }}</strong>
              <span>{{ r.excerpt }}</span>
              <small>{{ (r.score * 100).toFixed(0) }}% match</small>
            </a>
          </li>
        </ul>
      </div>
    </div>
  </Teleport>
</template>

<style scoped>
.search-btn { background: transparent; border: 1px solid var(--vp-c-border); border-radius: 8px; padding: 6px 12px; cursor: pointer; font-size: 14px; color: var(--vp-c-text-2); display: flex; align-items: center; gap: 6px; }
.search-overlay { position: fixed; inset: 0; background: rgba(0,0,0,.6); z-index: 9999; display: flex; align-items: flex-start; justify-content: center; padding-top: 80px; }
.search-modal { background: var(--vp-c-bg); border: 1px solid var(--vp-c-border); border-radius: 12px; width: min(640px, 92vw); overflow: hidden; }
.search-input { width: 100%; padding: 14px 18px; font-size: 16px; border: none; outline: none; background: transparent; color: var(--vp-c-text-1); border-bottom: 1px solid var(--vp-c-border); }
.search-hint { padding: 16px 18px; color: var(--vp-c-text-3); margin: 0; font-size: 14px; }
.search-results { list-style: none; margin: 0; padding: 8px; max-height: 400px; overflow-y: auto; }
.search-results li a { display: block; padding: 10px 12px; border-radius: 8px; text-decoration: none; }
.search-results li a:hover { background: var(--vp-c-bg-soft); }
.search-results li a strong { display: block; color: var(--vp-c-text-1); font-size: 14px; margin-bottom: 2px; }
.search-results li a span { display: block; color: var(--vp-c-text-3); font-size: 13px; white-space: nowrap; overflow: hidden; text-overflow: ellipsis; }
.search-results li a small { display: block; color: var(--vp-c-brand); font-size: 11px; margin-top: 2px; }
</style>
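
The mapping from raw hits to rendered results can be pulled out into a small pure function, which also makes it easy to guard against a metadata file that is out of sync with the index. This is a sketch: `hitsToResults` and `minScore` are hypothetical names, and it assumes, as the component above does, that the engine returns [id, distance] pairs where 1 - distance behaves like a similarity.

```typescript
interface PageMeta { id: number; title: string; excerpt: string; url: string; }
interface DocResult extends PageMeta { score: number; }

// Hypothetical helper: turns [id, distance] pairs into display-ready results.
function hitsToResults(
  hits: [number, number][],
  metadata: PageMeta[],
  minScore: number = 0,
): DocResult[] {
  const results: DocResult[] = [];
  for (const [id, dist] of hits) {
    const meta = metadata[id];
    if (!meta) continue; // index and metadata out of sync, e.g. a stale cache
    const score = Math.min(1, Math.max(0, 1 - dist)); // clamp to [0, 1]
    if (score >= minScore) results.push({ ...meta, score });
  }
  return results;
}

// Usage with stand-in data: id 7 has no metadata entry and is skipped.
const pages: PageMeta[] = [
  { id: 0, title: 'Install', excerpt: 'Getting set up', url: '/install' },
  { id: 1, title: 'Throttling', excerpt: 'Limit request rates', url: '/throttling' },
];
console.log(hitsToResults([[1, 0.25], [7, 0.1]], pages).map((r) => r.url)); // [ '/throttling' ]
```

Clamping keeps the "% match" label in the template between 0 and 100 even if a distance falls outside the expected range.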

Step 4: Register the component via theme extension

Create or update .vitepress/theme/index.ts:

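// .vitepress/theme/index.ts
import { h } from 'vue'; // h() is used in Layout below and must be imported
import DefaultTheme from 'vitepress/theme';
import SearchModal from './SearchModal.vue';
import type { Theme } from 'vitepress';

export default {
  extends: DefaultTheme,
  enhanceApp({ app }) {
    // Optional: also makes <SearchModal /> usable inside Markdown pages
    app.component('SearchModal', SearchModal);
  },
  // Render the default layout and fill its navbar slot with the search button
  Layout() {
    return h(DefaultTheme.Layout, null, {
      'nav-bar-content-before': () => h(SearchModal),
    });
  },
} satisfies Theme;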

The nav-bar-content-before slot injects the search button into VitePress's navbar before the existing content. Other available slots are nav-bar-content-after, sidebar-nav-before, and aside-top. Pick whichever placement fits your design.

Note on disabling built-in search: VitePress only renders its built-in search when themeConfig.search is configured, so to drop it, remove the themeConfig.search entry from your VitePress config. Note that themeConfig: { search: { provider: 'local' } } turns the built-in search on rather than off. The built-in search and the custom component can coexist, but two search buttons in the nav are confusing for users.

Step 5: Configure VitePress to handle WASM

VitePress uses Vite under the hood. WASM imports from altor-vec need a small config addition:

// .vitepress/config.ts
import { defineConfig } from 'vitepress';

export default defineConfig({
  vite: {
    optimizeDeps: {
      exclude: ['altor-vec'],
    },
    assetsInclude: ['**/*.wasm'],
  },
  // ... rest of your config
});

Handling hot reload in dev mode

During vitepress dev, the .vitepress/dist directory doesn't exist yet — the dev server serves content directly from your Markdown files. The search index script reads from dist, so it can only run after a full build.

For development, you have two options:

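Run docs:build once to generate the index, then copy search-index.bin and search-metadata.json into public/ before starting docs:dev. The dev server serves public/ but not .vitepress/dist, so files left only in dist are not reachable in dev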
Guard the search component with a check: if /search-index.bin returns 404, fall back to showing VitePress's default search or a "search coming soon" message

// In SearchModal.vue — graceful fallback
async function initEngine() {
  const probe = await fetch('/search-index.bin', { method: 'HEAD' });
  if (!probe.ok) {
    console.info('Search index not built yet. Run npm run docs:build.');
    return;
  }
  // ... rest of init
}
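
One caveat with a status-only probe: some hosts and dev setups answer unknown paths with an HTML fallback page and a 200 status, in which case `probe.ok` is true even though no index exists. A slightly stricter check might also look at the content type. This is a sketch under that assumption; `isIndexResponse` and `ProbeResponse` are hypothetical names, and whether your server behaves this way is worth verifying.

```typescript
// Minimal structural type so the check can be exercised without a browser.
// A real fetch Response satisfies it.
interface ProbeResponse {
  ok: boolean;
  headers: { get(name: string): string | null };
}

// Hypothetical helper: true only when the probe looks like the real binary.
function isIndexResponse(res: ProbeResponse): boolean {
  if (!res.ok) return false;
  const type = (res.headers.get('content-type') || '').toLowerCase();
  // An HTML body at a .bin URL is almost certainly a fallback page.
  return type.indexOf('text/html') === -1;
}

// Usage with a stand-in response object.
const htmlFallback: ProbeResponse = {
  ok: true,
  headers: { get: () => 'text/html; charset=utf-8' },
};
console.log(isIndexResponse(htmlFallback)); // false
```

In the component, the result of the HEAD request can be passed straight to this function in place of the `probe.ok` check.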

Serving the index from VitePress's public directory

An alternative to writing to .vitepress/dist is writing to public/ in your VitePress root. VitePress serves public/ in dev and copies everything in it to the output directory during build. The index script still reads compiled HTML, so it has to run after a vitepress build; once the files are in public/, the dev server serves them and the next build copies them into the output:

// package.json (alternative approach)
{
  "scripts": {
    "docs:compile": "vitepress build",
    "build:search": "node scripts/build-search-public.mjs",
    "docs:build": "npm run docs:compile && npm run build:search && npm run docs:compile"
  }
}

In build-search-public.mjs, write to ./public/search-index.bin instead of .vitepress/dist/. This makes the file available via /search-index.bin in both dev server and production.

Performance: index size and loading

The Step 1 script stores one 384-dimension float32 vector per page, so the raw vector data is small: 150 pages × 384 × 4 bytes is about 230KB, plus HNSW graph overhead. The embedding model is the larger download. To keep perceived performance high:

Initialize the engine only when the user opens the search modal, not on page load
Show a loading indicator while the index fetches
Cache the binary with a long-lived cache header — add a content hash to the filename if your content changes frequently
Use the Xenova/all-MiniLM-L6-v2 model (23MB) rather than larger models for faster first-query times
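
Lazy initialization is easier to get right if the load is memoized, so that reopening the modal while the first load is still in flight does not start a second one. A minimal sketch, where `lazy` is a hypothetical helper name and the loader is a stand-in for the real fetch-and-init work:

```typescript
// Hypothetical helper: runs `factory` on the first call and returns the same
// value afterwards. With an async factory, every caller shares one promise.
function lazy<T>(factory: () => T): () => T {
  let cached: { value: T } | null = null;
  return () => {
    if (cached === null) cached = { value: factory() };
    return cached.value;
  };
}

// Usage with a stand-in loader; a real one would fetch the index and model.
let loads = 0;
const getEngine = lazy(() => {
  loads += 1;
  return { ready: true };
});
getEngine();
getEngine();
console.log(loads); // 1
```

One design caveat: if an async factory rejects, the rejected promise stays cached, so a production version would likely reset the cache on failure to allow a retry.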

FAQ

Does this replace VitePress's built-in search entirely?

You can replace it or run both. To disable the built-in search, leave themeConfig.search unset in your VitePress config. The custom component handles all searching independently. Running both is possible but adds UI clutter, so most teams pick one.

Will this work with VitePress's default theme?

Yes. Theme extension via .vitepress/theme/index.ts is the standard VitePress pattern. You add files without ejecting from the default theme. All default theme features — sidebar, navigation, dark mode — continue to work.

How large does the index get for a typical docs site?

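Each 384-dimension float32 vector takes 384 × 4 = 1,536 bytes, so with one vector per page the raw data is roughly 150KB per 100 pages and about 750KB for 500 pages, plus HNSW graph overhead that depends on the build parameters. Check the size of the generated search-index.bin for your own site rather than relying on estimates. Cache the binary aggressively, since it only changes when documentation changes, which is typically once per deployment.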
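
As a sanity check on index size, the raw vector payload can be computed directly from the one-vector-per-page layout of the Step 1 script. The file on disk adds HNSW graph overhead on top, so treat the result as a lower bound. `rawVectorBytes` is a hypothetical helper name.

```typescript
// Raw float32 payload: 4 bytes per dimension, one vector per indexed item.
function rawVectorBytes(vectorCount: number, dim: number = 384): number {
  return vectorCount * dim * 4;
}

const pageCounts = [100, 150, 500];
for (const pages of pageCounts) {
  const kib = rawVectorBytes(pages) / 1024;
  console.log(`${pages} pages: ~${kib.toFixed(0)} KiB of raw vectors`);
}
// 100 pages: ~150 KiB of raw vectors
// 150 pages: ~225 KiB of raw vectors
// 500 pages: ~750 KiB of raw vectors
```

If you later split pages into several chunks, pass the chunk count rather than the page count.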

Add to your VitePress site: npm install altor-vec · GitHub