Migrate from ChromaDB to altor-vec
Replace your ChromaDB server with zero-cost browser vector search using altor-vec. This guide is for JavaScript apps moving from Python-first RAG to browser-native retrieval.
When migration makes sense
- Your RAG pipeline runs in a browser app and ChromaDB's Python server adds unnecessary infrastructure
- You want to eliminate the Python dependency and run retrieval entirely in JavaScript/TypeScript
- Your dataset is static or updates infrequently enough that a build-time index rebuild is acceptable
What you give up
Migration is not always the right call. altor-vec cannot replace ChromaDB for:
- Python ecosystem integration: ChromaDB works natively with LangChain, LlamaIndex, and other Python RAG frameworks
- Metadata where-filtering: ChromaDB applies filters such as where={'category': 'docs'} inside the query; with altor-vec you have to filter the results after retrieval
- Persistent server state: ChromaDB stores and manages the index server-side; altor-vec requires explicit serialization
Step-by-step migration
npm install altor-vec @xenova/transformers
# 1. Export from ChromaDB (Python)
import chromadb, json
import numpy as np
client = chromadb.Client()  # or chromadb.HttpClient() for server mode
collection = client.get_collection("your-collection")
# Get all documents and embeddings
result = collection.get(include=["documents", "embeddings", "metadatas"])
# Save ids, documents, and metadata as JSON
with open("chroma-export.json", "w") as f:
    json.dump({
        "ids": result["ids"],
        "documents": result["documents"],
        "metadatas": result["metadatas"],
    }, f)
# Save embeddings as raw float32 bytes; the Node.js build step reads vectors.bin
vectors = np.array(result["embeddings"], dtype=np.float32)
vectors.tofile("vectors.bin")
print(f"Exported {len(result['ids'])} items, dim={vectors.shape[1]}")
// 2. Build altor-vec index (Node.js build step)
import init, { WasmSearchEngine } from 'altor-vec';
import { readFileSync, writeFileSync } from 'fs';
await init();
const data = JSON.parse(readFileSync('chroma-export.json', 'utf8'));
// Load the raw float32 vectors (written in Python with vectors.tofile('vectors.bin'))
const buffer = readFileSync('vectors.bin');
// Pass byteOffset and length: a Node Buffer can be a view into a larger ArrayBuffer
const vectors = new Float32Array(buffer.buffer, buffer.byteOffset, buffer.byteLength / 4);
const DIM = vectors.length / data.ids.length;
const engine = WasmSearchEngine.from_vectors(vectors, DIM, 16, 200, 50);
writeFileSync('public/rag-index.json', engine.to_json());
// Chunk ids are array positions, so search hits can be mapped back by index
writeFileSync('public/rag-chunks.json', JSON.stringify(
  data.ids.map((id, i) => ({
    id: i, text: data.documents[i], metadata: data.metadatas[i]
  }))
));
console.log('RAG index ready for browser use');
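The build step derives the dimension by dividing the vector count into the raw file, so a truncated or mismatched vectors.bin would produce a fractional dimension without any error. A minimal sketch of a guard, assuming the file holds little-endian float32 values; decodeVectors is a hypothetical helper, not part of altor-vec:

```javascript
// Hypothetical helper: decode raw float32 bytes and check they split evenly
// into `count` vectors. Assumes a little-endian host, which matches what
// NumPy's tofile writes on common x86 and ARM machines.
function decodeVectors(bytes, count) {
  if (bytes.byteLength % 4 !== 0) {
    throw new Error(`byte length ${bytes.byteLength} is not a multiple of 4`);
  }
  // Copy into a fresh buffer so the result never aliases a pooled or unaligned view
  const copy = new Uint8Array(bytes.byteLength);
  copy.set(new Uint8Array(bytes.buffer, bytes.byteOffset, bytes.byteLength));
  const vectors = new Float32Array(copy.buffer);
  const dim = vectors.length / count;
  if (!Number.isInteger(dim) || dim === 0) {
    throw new Error(`${vectors.length} floats do not divide into ${count} vectors`);
  }
  return { vectors, dim };
}
```

In the build script this could stand in for the manual Float32Array construction, for example decodeVectors(readFileSync('vectors.bin'), data.ids.length).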
After migration
Once your index is built and deployed as public/rag-index.json, load it in the browser:
import init, { WasmSearchEngine } from 'altor-vec';
import { pipeline } from '@xenova/transformers';
await init();
const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const resp = await fetch('/rag-index.json');
const engine = WasmSearchEngine.from_json(await resp.text());
async function search(query, k = 5) {
  // Mean-pooled, normalized sentence embedding for the query
  const out = await embedder(query, { pooling: 'mean', normalize: true });
  const hits = JSON.parse(engine.search(new Float32Array(out.data), k));
  return hits; // [{id, score}] - map id back to your metadata
}
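Search hits carry only an id and a score, so the text and metadata have to be looked up in the chunks file written by the build step. A small sketch of that lookup, assuming hits have the shape [{id, score}] noted in the comment above and chunks have the {id, text, metadata} shape from rag-chunks.json; attachChunks is a hypothetical helper name:

```javascript
// Hypothetical helper: join search hits with their chunk text and metadata.
// Hits whose id has no matching chunk are dropped rather than returned half-empty.
function attachChunks(hits, chunks) {
  const byId = new Map(chunks.map((chunk) => [chunk.id, chunk]));
  return hits
    .filter((hit) => byId.has(hit.id))
    .map((hit) => {
      const chunk = byId.get(hit.id);
      return { id: hit.id, score: hit.score, text: chunk.text, metadata: chunk.metadata };
    });
}
```

The chunks array would typically be fetched once at startup alongside the index and reused for every query.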
Frequently asked questions
Can altor-vec replace ChromaDB for RAG applications?
Yes, for browser-side RAG. If your retrieval step runs in the browser (fetching relevant chunks before passing to an LLM API), altor-vec is a direct replacement for ChromaDB. If your RAG pipeline is entirely server-side in Python, keep ChromaDB.
Does altor-vec support metadata filtering like ChromaDB's where clause?
Not natively. After calling engine.search(query, k), filter the results array in JavaScript: hits.filter(h => chunks[h.id].metadata.category === 'docs'). Over-fetch (request 50 results, filter to top 5) to compensate for the post-retrieval filter reducing your result count.
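The over-fetch pattern described above can be sketched as follows. This is an illustration under assumptions: searchFn stands in for a call such as (n) => JSON.parse(engine.search(queryVec, n)), hits are assumed to look like {id, score}, and filteredSearch and its overFetch factor are hypothetical names rather than altor-vec features.

```javascript
// Hypothetical helper: request k * overFetch candidates, keep those whose
// metadata passes the predicate, and return at most k of them.
function filteredSearch(searchFn, chunks, predicate, k = 5, overFetch = 10) {
  const candidates = searchFn(k * overFetch);
  return candidates
    .filter((hit) => {
      const chunk = chunks[hit.id];
      // Chunks exported without metadata are treated as having none
      return chunk !== undefined && predicate(chunk.metadata ?? {});
    })
    .slice(0, k);
}
```

With the defaults this requests 50 candidates and returns up to 5. If the filter is very selective, fewer than k results can still come back, so the factor may need tuning per dataset.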
How do I handle RAG pipeline changes when moving from Python+ChromaDB to browser+altor-vec?
The retrieval step moves to the browser. Embedding the query with Transformers.js, searching altor-vec, and building the prompt context all happen client-side. The LLM API call (OpenAI, Anthropic) can still be server-side via your API routes or called directly from the browser.
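As a rough illustration of the client-side prompt-building step, the sketch below packs retrieved chunk text into a context block under a character budget. The buildPrompt name, the instruction wording, and the 4000-character default are assumptions for the example, not something altor-vec or any LLM API prescribes.

```javascript
// Hypothetical helper: assemble a prompt from retrieved chunks, stopping
// before the combined chunk text would exceed maxChars.
function buildPrompt(question, chunks, maxChars = 4000) {
  const parts = [];
  let used = 0;
  for (const chunk of chunks) {
    if (used + chunk.text.length > maxChars) break;
    parts.push(`[${parts.length + 1}] ${chunk.text}`);
    used += chunk.text.length;
  }
  return [
    'Answer using only the context below.',
    '',
    'Context:',
    parts.join('\n'),
    '',
    `Question: ${question}`,
  ].join('\n');
}
```

The resulting string is what would be sent to the LLM, whether through an API route or directly from the browser.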
What embedding models can I use as a drop-in for ChromaDB's default embeddings?
ChromaDB defaults to all-MiniLM-L6-v2 (384 dimensions). altor-vec works with the same model via Transformers.js. If you used OpenAI embeddings in ChromaDB, export them and build the index at build time: same model, same dimensions, same quality. Queries then have to be embedded with that same OpenAI model at search time, because vectors from different models are not comparable.
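Because the query and the index must share a model and a dimension, a cheap guard before each search can surface a mismatch early. A minimal sketch, where checkQueryVector is a hypothetical helper and 384 is the all-MiniLM-L6-v2 dimension mentioned above:

```javascript
// Hypothetical helper: reject query vectors that cannot belong to the index,
// either because the length differs or because a value is NaN or infinite.
function checkQueryVector(queryVec, indexDim) {
  if (queryVec.length !== indexDim) {
    throw new Error(`query has ${queryVec.length} dimensions, index expects ${indexDim}`);
  }
  for (const value of queryVec) {
    if (!Number.isFinite(value)) {
      throw new Error('query vector contains a non-finite value');
    }
  }
  return queryVec;
}
```

A call such as checkQueryVector(new Float32Array(out.data), 384) could sit just before the search call. It catches a wrong dimension but cannot detect a different model that happens to share the same dimension.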