
Migrate from Pagefind to altor-vec

Upgrade from Pagefind's keyword search to semantic vector search with altor-vec. This guide is for static site owners who want meaning-based search: results found by concept, not just by matching words.

When migration makes sense

What you give up

Migration is not always the right call. altor-vec cannot replace Pagefind for:

Not sure? See the full altor-vec vs Pagefind comparison — it covers architecture differences and use-case fit in detail.

Step-by-step migration

Install altor-vec and the build-time dependencies used by the script below: npm install altor-vec @xenova/transformers gray-matter
// Pagefind indexes HTML automatically; altor-vec needs explicit content extraction.
// build-search.mjs — replace Pagefind CLI with this script in your build

import { pipeline } from '@xenova/transformers';
import init, { WasmSearchEngine } from 'altor-vec';
import { readdirSync, readFileSync, writeFileSync } from 'fs';
import { join } from 'path';

// Extract content from your source (MDX, Markdown, JSON)
// Example: MDX files in a docs/ directory
import matter from 'gray-matter';

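const DOCS_DIR = './content';
// The `recursive` option needs a recent Node release (18.17+ or 20.1+).
// Entries come back as paths relative to DOCS_DIR; directories are dropped by the extension filter.
const files = readdirSync(DOCS_DIR, { recursive: true })
  .filter(f => f.endsWith('.mdx') || f.endsWith('.md'));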

const docs = files.map((f, id) => {
  const raw = readFileSync(join(DOCS_DIR, f), 'utf8');
  const { data: frontmatter, content } = matter(raw);
  return {
    id,
    title: frontmatter.title ?? f,
    content: content.replace(/[#*`]/g, '').slice(0, 800), // strip markdown
    // Use forward slashes (Windows returns backslashes) and drop only the trailing extension
    url: '/' + f.replace(/\\/g, '/').replace(/\.mdx?$/, ''),
  };
});

console.log(`Embedding ${docs.length} pages...`);
await init();
const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const DIM = 384;
const vectors = new Float32Array(docs.length * DIM);

for (const [i, doc] of docs.entries()) {
  const out = await embedder(`${doc.title}. ${doc.content}`,
    { pooling: 'mean', normalize: true });
  vectors.set(out.data, i * DIM);
}

const engine = WasmSearchEngine.from_vectors(vectors, DIM, 16, 200, 50);
writeFileSync('public/search-index.json', engine.to_json());
writeFileSync('public/search-docs.json', JSON.stringify(
  docs.map(d => ({ id: d.id, title: d.title, url: d.url }))
));

// Update package.json: "build": "node build-search.mjs && your-site-builder"
console.log('Done. Replace pagefind in your search UI with altor-vec.');
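
The content.replace(...) call in the script above removes only heading, emphasis, and backtick characters, so link targets and code samples still end up in the text that gets embedded. If that hurts result quality, a slightly fuller stripper could be swapped in. The sketch below is one possible version, not part of altor-vec: markdownToText is a hypothetical helper name, and dropping fenced code entirely is a judgment call you may want to reverse for API-heavy docs.

```javascript
// Hypothetical helper: reduce Markdown/MDX source to plain text before embedding.
// maxChars mirrors the 800-character cap used in the build script.
function markdownToText(md, maxChars = 800) {
  return md
    .replace(/`{3}[\s\S]*?`{3}/g, ' ')          // drop fenced code blocks
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, '$1')   // images: keep the alt text only
    .replace(/\[([^\]]*)\]\([^)]*\)/g, '$1')    // links: keep the link text only
    .replace(/[#*`]/g, '')                      // leftover heading/emphasis/backtick characters
    .replace(/\s+/g, ' ')                       // collapse runs of whitespace
    .trim()
    .slice(0, maxChars);
}
```

In the build script this would replace the inline expression, for example content: markdownToText(content).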

After migration

Once the index is built and public/search-index.json is deployed with your site (so it is served at /search-index.json), load it in the browser:

import init, { WasmSearchEngine } from 'altor-vec';
import { pipeline } from '@xenova/transformers';

await init();
const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const resp = await fetch('/search-index.json');
const engine = WasmSearchEngine.from_json(await resp.text());

async function search(query, k = 5) {
  const out = await embedder(query, { pooling: 'mean', normalize: true });
  const hits = JSON.parse(engine.search(new Float32Array(out.data), k));
  return hits; // [{id, score}] - map id back to your metadata
}
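
The search function returns only ids and scores; the titles and URLs live in search-docs.json, which the build script writes next to the index. A minimal sketch of joining the two, assuming hits have the { id, score } shape noted in the comment above (attachMetadata is a hypothetical helper name, not an altor-vec API):

```javascript
// Hypothetical helper: join search hits with the page metadata from search-docs.json.
// Assumes each hit looks like { id, score } and each doc like { id, title, url }.
function attachMetadata(hits, docs) {
  const byId = new Map(docs.map(d => [d.id, d]));
  return hits
    .filter(hit => byId.has(hit.id))   // skip ids that are missing from the metadata file
    .map(hit => ({ ...byId.get(hit.id), score: hit.score }));
}

// Usage in the browser, after the index has loaded (not executed here):
// const docs = await (await fetch('/search-docs.json')).json();
// const results = attachMetadata(await search('how do I deploy'), docs);
```

Fetching search-docs.json once at startup and reusing it for every query keeps each search to a single embedding call plus an in-memory lookup.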

Frequently asked questions

What is the main difference between Pagefind and altor-vec?

Pagefind does keyword search: it finds pages containing the exact words you type. altor-vec does semantic search: it finds pages that are conceptually related to your query, even when no words match. Use Pagefind for exact-term retrieval; use altor-vec when you want meaning-based results.

Can I use both Pagefind and altor-vec on the same site?

Yes. A hybrid approach works well: run both indexes in parallel and merge results. Pagefind handles exact-match queries; altor-vec handles conceptual queries. Display deduplicated results sorted by a combined score.
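
One way to compute a combined score is reciprocal rank fusion, which ranks by position in each list rather than by raw score; that sidesteps the problem that keyword relevance scores and vector similarity scores are unlikely to be on a comparable scale. The sketch below assumes both result lists have already been reduced to arrays of { url, title } in best-first order. mergeResults and the constant k = 60 are illustrative choices, not part of Pagefind or altor-vec.

```javascript
// Hypothetical helper: merge two ranked result lists with reciprocal rank fusion (RRF).
// Each input is an array of { url, title } ordered best-first; url is the dedupe key.
function mergeResults(keywordHits, semanticHits, k = 60) {
  const merged = new Map();
  for (const list of [keywordHits, semanticHits]) {
    list.forEach((hit, rank) => {
      const entry = merged.get(hit.url) ?? { ...hit, score: 0 };
      entry.score += 1 / (k + rank + 1);   // rank is 0-based, so add 1
      merged.set(hit.url, entry);
    });
  }
  // Pages found by both engines accumulate two contributions and rise to the top.
  return [...merged.values()].sort((a, b) => b.score - a.score);
}
```

A page that appears in both lists is emitted once, with the sum of its two rank contributions as its score.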

Is the setup more complex than Pagefind?

Yes. Pagefind is zero-config — one CLI command indexes your HTML. altor-vec requires a build script that extracts content, generates embeddings (1-10 minutes for large sites), and writes the index. The tradeoff is better search quality and semantic understanding.

Will altor-vec find misspelled queries like Pagefind can with fuzzy matching?

Sometimes. Embedding models may map misspellings close to correct words in vector space, but this is inconsistent. altor-vec is not designed for typo correction. If typo tolerance is important, use Pagefind for keyword search alongside altor-vec for semantic search.