
Migrate from ChromaDB to altor-vec

Replace your ChromaDB server with browser-side vector search using altor-vec, so retrieval needs no server to host or pay for. This guide is for JavaScript apps moving from Python-first RAG to browser-native retrieval.

When migration makes sense

Your RAG pipeline runs in a browser app and ChromaDB's Python server adds unnecessary infrastructure
You want to eliminate the Python dependency and run retrieval entirely in JavaScript/TypeScript
Your dataset is static or updates infrequently enough that a build-time index rebuild is acceptable

What you give up

Migration is not always the right call. altor-vec cannot replace ChromaDB for:

Python ecosystem integration: ChromaDB works natively with LangChain, LlamaIndex, and other Python RAG frameworks
Metadata where-filtering: ChromaDB filters inside the query with where={'category': 'docs'}; with altor-vec you filter the returned hits yourself after retrieval
Persistent server state: ChromaDB stores and manages the index server-side; with altor-vec you serialize the index explicitly (to_json) and redeploy it whenever the data changes
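If your app relies on ChromaDB where clauses, one way to keep them is to evaluate the same filter objects in JavaScript against each hit's metadata. The sketch below is a hypothetical helper (whereToPredicate is not part of altor-vec); it assumes the common ChromaDB operators ($eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $and, $or) and does not try to reproduce every edge case of ChromaDB's own implementation, such as how missing fields are treated.

```javascript
// Hypothetical helper: turns a ChromaDB-style `where` object into a
// predicate over one chunk's metadata. Covers the common operators only.
const COMPARATORS = {
  $eq: (a, b) => a === b,
  $ne: (a, b) => a !== b,
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
  $in: (a, b) => b.includes(a),
  $nin: (a, b) => !b.includes(a),
};

function whereToPredicate(where) {
  // Exported metadata can be null, so treat it as an empty object
  return (metadata) => matches(metadata ?? {}, where);
}

function matches(metadata, where) {
  // Every top-level key must match (implicit AND, as in ChromaDB)
  return Object.entries(where).every(([key, cond]) => {
    if (key === '$and') return cond.every((sub) => matches(metadata, sub));
    if (key === '$or') return cond.some((sub) => matches(metadata, sub));
    // Shorthand {field: value} means {field: {$eq: value}}
    if (cond === null || typeof cond !== 'object' || Array.isArray(cond)) {
      return metadata[key] === cond;
    }
    // Operator form {field: {$op: operand, ...}}
    return Object.entries(cond).every(([op, operand]) => {
      const compare = COMPARATORS[op];
      if (!compare) throw new Error(`Unsupported operator: ${op}`);
      return compare(metadata[key], operand);
    });
  });
}
```

For example, whereToPredicate({ $and: [{ category: 'docs' }, { year: { $gte: 2023 } }] }) returns a function you can pass straight to hits.filter after looking up each hit's metadata.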

Not sure? See the full altor-vec vs ChromaDB comparison — it covers architecture differences and use-case fit in detail.

Step-by-step migration

Install altor-vec: npm install altor-vec @xenova/transformers

# 1. Export from ChromaDB (Python)
import json

import chromadb
import numpy as np

# Point this at your existing data: PersistentClient for a local on-disk
# database (adjust the path), or chromadb.HttpClient(host="localhost", port=8000) for server mode
client = chromadb.PersistentClient(path="./chroma")
collection = client.get_collection("your-collection")

# Get all documents, embeddings, and metadata (ids are always returned)
result = collection.get(include=["documents", "embeddings", "metadatas"])

# Save ids, documents, and metadata as JSON
with open("chroma-export.json", "w") as f:
    json.dump({
        "ids": result["ids"],
        "documents": result["documents"],
        "metadatas": result["metadatas"],
    }, f)

# Save embeddings as raw little-endian float32, row-major, for the Node.js step
vectors = np.asarray(result["embeddings"], dtype="<f4")
vectors.tofile("vectors.bin")
print(f"Exported {len(result['ids'])} items, dim={vectors.shape[1]}")

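// 2. Build altor-vec index (Node.js build step; run as an ES module, e.g. build-index.mjs, so top-level await works)
import init, { WasmSearchEngine } from 'altor-vec';
import { mkdirSync, readFileSync, writeFileSync } from 'fs';
await init();

const data = JSON.parse(readFileSync('chroma-export.json', 'utf8'));
// Raw float32 vectors written by the Python step (vectors.tofile)
const buffer = readFileSync('vectors.bin');
// Copy into a fresh ArrayBuffer: a Node Buffer can be a view into a larger,
// possibly unaligned pool, so new Float32Array(buffer.buffer) is not safe
const vectors = new Float32Array(
  buffer.buffer.slice(buffer.byteOffset, buffer.byteOffset + buffer.byteLength)
);
const DIM = vectors.length / data.ids.length;
if (!Number.isInteger(DIM)) throw new Error('vectors.bin does not match chroma-export.json');

const engine = WasmSearchEngine.from_vectors(vectors, DIM, 16, 200, 50);
mkdirSync('public', { recursive: true });
writeFileSync('public/rag-index.json', engine.to_json());
// Hit ids are row positions (0..n-1), so chunk i describes vector i;
// ChromaDB's original string id is kept as chromaId
writeFileSync('public/rag-chunks.json', JSON.stringify(
  data.ids.map((chromaId, i) => ({
    id: i, chromaId, text: data.documents[i], metadata: data.metadatas[i]
  }))
));
console.log(`RAG index ready for browser use: ${data.ids.length} vectors, dim=${DIM}`);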
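The browser code below embeds queries with normalize: true, so query vectors have unit length. If your stored ChromaDB embeddings are not unit-length (check a few norms, especially if you used a custom embedding function), normalizing them at build time puts both sides on the same scale. With unit vectors, ranking by cosine similarity, dot product, and Euclidean distance gives the same order, so the results do not depend on which metric altor-vec uses internally. This is a minimal sketch; normalizeRows is a hypothetical helper, not an altor-vec function.

```javascript
// Sketch: L2-normalize each row of a flat, row-major Float32Array in place.
// `dim` is the embedding dimension (DIM in the build script).
function normalizeRows(vectors, dim) {
  if (vectors.length % dim !== 0) {
    throw new Error(`length ${vectors.length} is not a multiple of dim ${dim}`);
  }
  for (let start = 0; start < vectors.length; start += dim) {
    // Euclidean norm of this row
    let sumSq = 0;
    for (let j = start; j < start + dim; j++) sumSq += vectors[j] * vectors[j];
    const norm = Math.sqrt(sumSq);
    if (norm === 0) continue; // leave all-zero rows untouched
    for (let j = start; j < start + dim; j++) vectors[j] /= norm;
  }
  return vectors;
}
```

In the build script, call normalizeRows(vectors, DIM) right before WasmSearchEngine.from_vectors(...).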

After migration

Once your index is built and deployed to public/rag-index.json (with public/rag-chunks.json alongside it), load both in the browser:

import init, { WasmSearchEngine } from 'altor-vec';
import { pipeline } from '@xenova/transformers';

await init();
// Must be the same model that produced the indexed vectors
const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const [indexText, chunks] = await Promise.all([
  fetch('/rag-index.json').then((r) => r.text()),
  fetch('/rag-chunks.json').then((r) => r.json()),
]);
const engine = WasmSearchEngine.from_json(indexText);

async function search(query, k = 5) {
  // Mean-pooled, L2-normalized query embedding
  const out = await embedder(query, { pooling: 'mean', normalize: true });
  const hits = JSON.parse(engine.search(new Float32Array(out.data), k));
  // hits are [{id, score}]; id indexes into rag-chunks.json
  return hits.map((h) => ({ ...chunks[h.id], score: h.score }));
}

Frequently asked questions

Can altor-vec replace ChromaDB for RAG applications?

Yes, for browser-side RAG. If your retrieval step runs in the browser (fetching relevant chunks before passing them to an LLM API), altor-vec can take over ChromaDB's role, within the limits listed under What you give up. If your RAG pipeline is entirely server-side in Python, keep ChromaDB.

Does altor-vec support metadata filtering like ChromaDB's where clause?

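Not natively. engine.search(queryVector, k) takes an embedded query and returns a JSON string of {id, score} hits; parse it and filter in JavaScript against the metadata in rag-chunks.json: JSON.parse(engine.search(queryVector, 50)).filter(h => chunks[h.id].metadata?.category === 'docs').slice(0, 5). Over-fetch (request 50 results, keep the top 5 that pass the filter) because the filter discards hits after retrieval; if too few survive, request more.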
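The over-fetch advice can be wrapped in a helper that widens the search until enough hits pass the filter. The sketch below is hypothetical: filteredSearch, the doubling strategy, and the initialFetch/maxFetch defaults are illustrative choices, not altor-vec features. The search call is passed in as searchFn so the helper does not depend on altor-vec directly; in an app you would pass (n) => JSON.parse(engine.search(queryVector, n)), and chunks is the parsed rag-chunks.json array.

```javascript
// Hypothetical helper: over-fetch, filter by metadata, and double the fetch
// size until `k` hits pass the filter, `maxFetch` is reached, or the index
// runs out of results. `searchFn(n)` must return up to n hits, best first,
// as [{ id, score }], where id indexes into `chunks`.
function filteredSearch(searchFn, chunks, predicate, k = 5,
                        { initialFetch = 50, maxFetch = 800 } = {}) {
  let fetchK = Math.max(initialFetch, k);
  for (;;) {
    const hits = searchFn(fetchK);
    const kept = hits.filter((h) => predicate(chunks[h.id].metadata));
    const exhausted = hits.length < fetchK; // index has no more results
    if (kept.length >= k || fetchK >= maxFetch || exhausted) {
      return kept.slice(0, k);
    }
    fetchK = Math.min(fetchK * 2, maxFetch);
  }
}
```

Because hits keep their best-first order through the filter, the first k survivors are the best k matching results among everything fetched.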

How do I handle RAG pipeline changes when moving from Python+ChromaDB to browser+altor-vec?

The retrieval step moves to the browser. Embedding the query with Transformers.js, searching altor-vec, and building the prompt context all happen client-side. The LLM API call (OpenAI, Anthropic) can stay server-side behind your API routes, or be made directly from the browser, though a direct browser call exposes your API key to users unless it goes through a proxy.
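Building the prompt context client-side usually means joining the retrieved chunk texts under a size budget before sending them to the LLM. This is a minimal sketch under stated assumptions: buildPromptContext is a hypothetical helper, the prompt wording is illustrative, and it budgets in characters as a rough stand-in for tokens.

```javascript
// Sketch: number the retrieved chunk texts and join them into a prompt,
// skipping every chunk from the first one that would exceed `maxChars`.
function buildPromptContext(question, chunkTexts, maxChars = 4000) {
  const parts = [];
  let used = 0;
  for (const [i, text] of chunkTexts.entries()) {
    const entry = `[${i + 1}] ${text.trim()}`;
    if (used + entry.length > maxChars) break;
    parts.push(entry);
    used += entry.length + 2; // count the "\n\n" separator conservatively
  }
  return `Answer using only the context below.\n\nContext:\n${parts.join('\n\n')}\n\nQuestion: ${question}`;
}
```

With the search function from the browser example, you could call buildPromptContext(query, (await search(query)).map((c) => c.text)) and send the result to your LLM endpoint. Numbering the chunks lets the model cite [1], [2], and so on.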

What embedding models can I use as a drop-in for ChromaDB's default embeddings?

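ChromaDB's default embedding function uses all-MiniLM-L6-v2 (384 dimensions), and altor-vec works with the same model via Transformers.js (Xenova/all-MiniLM-L6-v2). If you used OpenAI embeddings in ChromaDB, you can ship the exported vectors unchanged, but every query must then be embedded with the same OpenAI model (for example through a server route that holds your API key), because Transformers.js cannot run OpenAI's models and vectors from different models are not comparable.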
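To check that your query model really matches the one that built the index, you can re-embed a few exported documents and compare each result with its stored vector. Similarities close to 1 suggest the same model and preprocessing; a dimension mismatch or a clearly lower value means queries and index come from different models. The cosineSimilarity function below is a plain sketch, not an altor-vec API.

```javascript
// Sketch: cosine similarity between two equal-length embeddings.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length} (different embedding models?)`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // undefined for zero vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

For example, in the build script you could compare (await embedder(data.documents[0], { pooling: 'mean', normalize: true })).data with vectors.subarray(0, DIM), with embedder created as in the browser example.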