migration guide
Migrate from Pagefind to altor-vec
Upgrade from Pagefind's keyword search to semantic vector search with altor-vec. This guide is for static site owners who want meaning-based search that finds results by concept, not just by matching words.
When migration makes sense
- Users search for concepts (e.g., 'authentication guide') and your keyword search returns no results because the page title says 'Login implementation'
- You want search that understands synonyms, related concepts, and intent
- You want to build a more sophisticated search experience than Pagefind's keyword index supports
What you give up
Migration is not always the right call. altor-vec cannot replace Pagefind for:
- Automatic HTML indexing: Pagefind's CLI scans your HTML build output automatically; altor-vec requires explicit content extraction and embedding
- Large site scale: Pagefind handles 500K+ pages efficiently; altor-vec is best under ~100K documents
- Zero-config setup: Pagefind works out of the box with a single CLI command; altor-vec requires a build script
Step-by-step migration
npm install altor-vec @xenova/transformers gray-matter
// Pagefind indexes HTML automatically; altor-vec needs explicit content extraction.
// build-search.mjs — replace Pagefind CLI with this script in your build
import { pipeline } from '@xenova/transformers';
import init, { WasmSearchEngine } from 'altor-vec';
import { readdirSync, readFileSync, writeFileSync } from 'fs';
import { join } from 'path';
// Extract content from your source (MDX, Markdown, JSON)
// Example: Markdown and MDX files in a content/ directory
import matter from 'gray-matter';
const DOCS_DIR = './content';
// Note: the `recursive` option requires Node.js 18.17+ or 20.1+.
const files = readdirSync(DOCS_DIR, { recursive: true })
  .filter(f => f.endsWith('.mdx') || f.endsWith('.md'));
const docs = files.map((f, id) => {
  const raw = readFileSync(join(DOCS_DIR, f), 'utf8');
  const { data: frontmatter, content } = matter(raw);
  return {
    id,
    title: frontmatter.title ?? f,
    content: content.replace(/[#*`]/g, '').slice(0, 800), // strip basic markdown, keep the first 800 chars
    // Drop only the trailing extension and normalize Windows path separators
    url: '/' + f.replace(/\.mdx?$/, '').replace(/\\/g, '/'),
  };
});
console.log(`Embedding ${docs.length} pages...`);
await init();
const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const DIM = 384;
const vectors = new Float32Array(docs.length * DIM);
for (const [i, doc] of docs.entries()) {
  const out = await embedder(`${doc.title}. ${doc.content}`,
    { pooling: 'mean', normalize: true });
  vectors.set(out.data, i * DIM);
}
const engine = WasmSearchEngine.from_vectors(vectors, DIM, 16, 200, 50);
writeFileSync('public/search-index.json', engine.to_json());
writeFileSync('public/search-docs.json', JSON.stringify(
  docs.map(d => ({ id: d.id, title: d.title, url: d.url }))
));
// Update package.json: "build": "node build-search.mjs && your-site-builder"
console.log('Done. Replace pagefind in your search UI with altor-vec.');
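The vectors buffer in the build script grows linearly with page count, which is one reason the ~100K-document guideline matters. The sketch below estimates only the raw Float32Array size (4 bytes per value, 384 dimensions for all-MiniLM-L6-v2). The helper name rawVectorBytes is hypothetical, and the serialized index will differ in size because it also holds the search structure and is JSON-encoded.

```javascript
// Hypothetical helper: size in bytes of the flat Float32Array built above.
// Each document contributes `dim` float32 values at 4 bytes each.
function rawVectorBytes(docCount, dim = 384) {
  return docCount * dim * Float32Array.BYTES_PER_ELEMENT;
}

const small = rawVectorBytes(1_000);    // 1,536,000 bytes (about 1.5 MB)
const large = rawVectorBytes(100_000);  // 153,600,000 bytes (about 154 MB)
console.log({ small, large });
```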
After migration
Once the build script has written public/search-index.json and the file is deployed with your site, load it in the browser:
import init, { WasmSearchEngine } from 'altor-vec';
import { pipeline } from '@xenova/transformers';
await init();
const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const resp = await fetch('/search-index.json');
const engine = WasmSearchEngine.from_json(await resp.text());
async function search(query, k = 5) {
  const out = await embedder(query, { pooling: 'mean', normalize: true });
  const hits = JSON.parse(engine.search(new Float32Array(out.data), k));
  return hits; // [{id, score}] - map id back to your metadata
}
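Mapping ids back to metadata can be sketched as a simple lookup against the contents of search-docs.json. This assumes hits have the [{id, score}] shape noted in the comment above; attachMetadata is a hypothetical helper, not part of altor-vec, and the sample data is illustrative only.

```javascript
// Hypothetical helper: join raw hits ([{id, score}]) with page metadata.
// Hits whose id has no metadata entry are dropped; hit order is preserved.
function attachMetadata(hits, docs) {
  const byId = new Map(docs.map(d => [d.id, d]));
  return hits
    .filter(h => byId.has(h.id))
    .map(h => ({ ...byId.get(h.id), score: h.score }));
}

// In the browser, `docs` would come from:
//   const docs = await (await fetch('/search-docs.json')).json();
const docs = [
  { id: 0, title: 'Login implementation', url: '/auth/login' },
  { id: 1, title: 'Deploying', url: '/deploy' },
];
const results = attachMetadata(
  [{ id: 1, score: 0.12 }, { id: 0, score: 0.34 }, { id: 7, score: 0.5 }],
  docs
);
console.log(results.map(r => r.url));
```

Building the Map once and reusing it across queries avoids rescanning the metadata array on every search.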
Frequently asked questions
What is the main difference between Pagefind and altor-vec?
Pagefind does keyword search: it finds pages containing the exact words you type. altor-vec does semantic search: it finds pages that are conceptually related to your query, even when no words match. Use Pagefind for exact-term retrieval; use altor-vec when you want meaning-based results.
Can I use both Pagefind and altor-vec on the same site?
Yes. A hybrid approach works well: run both indexes in parallel and merge results. Pagefind handles exact-match queries; altor-vec handles conceptual queries. Display deduplicated results sorted by a combined score.
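One way to compute a combined score is reciprocal rank fusion, which uses only each result's position in its list, so Pagefind and altor-vec scores never need to be put on the same scale. This is a swapped-in technique, not something either library provides. The sketch assumes both result lists have already been reduced to ordered arrays of { url, title } objects; fuseByRank and the sample data are hypothetical.

```javascript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per URL,
// where rank starts at 1. Results are deduplicated by URL.
function fuseByRank(keywordResults, semanticResults, k = 60) {
  const fused = new Map();
  for (const list of [keywordResults, semanticResults]) {
    list.forEach((item, index) => {
      const entry = fused.get(item.url) ?? { url: item.url, title: item.title, score: 0 };
      entry.score += 1 / (k + index + 1);
      fused.set(item.url, entry);
    });
  }
  // Highest combined score first
  return [...fused.values()].sort((a, b) => b.score - a.score);
}

const keyword = [
  { url: '/auth/login', title: 'Login implementation' },
  { url: '/deploy', title: 'Deploying' },
];
const semantic = [
  { url: '/auth/sso', title: 'Single sign-on' },
  { url: '/auth/login', title: 'Login implementation' },
];
const merged = fuseByRank(keyword, semantic);
console.log(merged.map(r => r.url)); // [ '/auth/login', '/auth/sso', '/deploy' ]
```

Pages found by both indexes accumulate two contributions and rise to the top, which matches the intent of rewarding results that are both a keyword and a conceptual match.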
Is the setup more complex than Pagefind?
Yes. Pagefind is zero-config: one CLI command indexes your HTML. altor-vec requires a build script that extracts content, generates embeddings (1-10 minutes for large sites), and writes the index. In exchange, you get search that matches on meaning rather than on shared words.
Will altor-vec find misspelled queries like Pagefind can with fuzzy matching?
Sometimes. Embedding models may map misspellings close to correct words in vector space, but this is inconsistent. altor-vec is not designed for typo correction. If typo tolerance is important, use Pagefind for keyword search alongside altor-vec for semantic search.