Node.js guide
Product Search in Node.js with altor-vec
Use altor-vec to add semantic product search to a Node.js app. A Node.js build script creates the index, and queries then run entirely in the browser, with no search server, no API keys, and zero per-query cost. Products are matched by concept, synonym, or intent rather than by exact keywords.
npm install altor-vec @xenova/transformers
Implementation
Server-side indexing script (Node 18+, ESM). Uses a module-level variable for the engine.
// build-product-index.mjs — Node.js build script for product catalog
// Run: node build-product-index.mjs
import { pipeline } from '@xenova/transformers';
import init, { WasmSearchEngine } from 'altor-vec';
import { readFileSync, writeFileSync } from 'fs';
// Load products from JSON (Shopify export, Stripe products, etc.)
const products = JSON.parse(readFileSync('data/products.json', 'utf8'));
await init();
const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const DIM = 384;
const vectors = new Float32Array(products.length * DIM);
console.log(`Building product search index for ${products.length} products...`);
for (const [i, p] of products.entries()) {
  // Combine name + description + category for richer embeddings
  const text = `${p.name}. ${p.description}. Category: ${p.category}.`;
  const out = await embedder(text, { pooling: 'mean', normalize: true });
  vectors.set(out.data, i * DIM);
}
const engine = WasmSearchEngine.from_vectors(vectors, DIM, 16, 200, 50);
writeFileSync('public/product-search-index.json', engine.to_json());
writeFileSync('public/products-metadata.json', JSON.stringify(
products.map(p => ({ id: p.id, name: p.name, price: p.price, category: p.category }))
));
console.log('Product search index ready.');
// package.json: add "build:search": "node build-product-index.mjs" to scripts
// Run before build: npm run build:search && npm run build
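The build script stores every embedding in one flat Float32Array, with product i occupying positions i * DIM through (i + 1) * DIM - 1. The sketch below isolates that packing step and adds a length check, so a vector of the wrong dimension fails loudly instead of silently corrupting neighbouring rows. `packVectors` is a hypothetical helper written for illustration, not part of altor-vec, and the toy vectors use 3 dimensions rather than 384.

```javascript
// Pack per-product embeddings into one flat, row-major Float32Array.
// `packVectors` is a hypothetical helper for illustration, not an altor-vec API.
function packVectors(embeddings, dim) {
  const flat = new Float32Array(embeddings.length * dim);
  embeddings.forEach((vec, i) => {
    if (vec.length !== dim) {
      throw new Error(`Embedding ${i} has length ${vec.length}, expected ${dim}`);
    }
    flat.set(vec, i * dim); // row i occupies [i * dim, (i + 1) * dim)
  });
  return flat;
}

// Toy usage with 3-dimensional vectors instead of 384.
const flat = packVectors([[1, 2, 3], [4, 5, 6]], 3);
console.log(Array.from(flat)); // [1, 2, 3, 4, 5, 6]
```

In the build script, the same check could be applied to `out.data` before the `vectors.set` call.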
Performance
50K products at 384 dimensions: ~85MB memory, ~1ms per query. Measured on M2 MacBook Pro, Chrome 124. Mobile is typically 2–4× slower — test on target devices before deploying.
| Index size | Dimensions | Query p50 | Memory |
|---|---|---|---|
| 1,000 vectors | 384 | ~0.1ms | ~2MB |
| 10,000 vectors | 384 | ~0.4ms | ~17MB |
| 50,000 vectors | 384 | ~0.9ms | ~85MB |
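As a rough sanity check on the memory column, the raw float32 data alone takes count × dimensions × 4 bytes. The sketch below computes only that lower bound. The gap between these numbers and the table is presumably index structure and runtime overhead, which this estimate does not model. `rawVectorBytes` is a hypothetical helper name.

```javascript
// Lower-bound estimate: bytes needed just for the float32 vector data.
// Does not include any index or runtime overhead.
function rawVectorBytes(count, dim) {
  return count * dim * Float32Array.BYTES_PER_ELEMENT; // 4 bytes per float32
}

for (const count of [1000, 10000, 50000]) {
  const mb = rawVectorBytes(count, 384) / 1e6;
  console.log(`${count} vectors: ${mb.toFixed(1)} MB of raw float32 data`);
}
// 1000 vectors: 1.5 MB of raw float32 data
// 10000 vectors: 15.4 MB of raw float32 data
// 50000 vectors: 76.8 MB of raw float32 data
```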
When this approach works best
- Static e-commerce catalogs (Shopify/Stripe products exported as JSON)
- Apps where search must work offline or with zero API budget
- Product discovery UIs where semantic matching improves conversion
Limitations
- No built-in faceted filtering (category, price range) — implement with post-filter on results
- Index updates require a rebuild step — not suitable for catalogs that change hourly
Frequently asked questions
How do I filter by category or price after a semantic search?
Run engine.search(queryEmbedding, 50) to get 50 candidates, then filter the results array by category, price range, or in-stock status in JavaScript before showing the top N to the user. This is called post-retrieval filtering.
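A minimal sketch of post-retrieval filtering follows. It assumes the search results have already been parsed into an array of [vectorIndex, distance] pairs ordered nearest first. That shape is an assumption, so check what `engine.search` actually returns in your altor-vec version. It also assumes the metadata array is in the same order as the indexed vectors, which holds for the build script above. `postFilter` and the sample products are hypothetical.

```javascript
// Filter nearest-neighbour candidates by metadata, keeping their ranking.
// ASSUMPTION: `candidates` is an array of [vectorIndex, distance] pairs, nearest first.
// ASSUMPTION: metadata[i] describes the product whose vector was indexed at position i.
function postFilter(candidates, metadata, { category, maxPrice } = {}, topN = 10) {
  const results = [];
  for (const [index, distance] of candidates) {
    const product = metadata[index];
    if (!product) continue; // skip indices with no metadata
    if (category !== undefined && product.category !== category) continue;
    if (maxPrice !== undefined && product.price > maxPrice) continue;
    results.push({ ...product, distance });
    if (results.length === topN) break; // stop once enough results survive
  }
  return results;
}

// Toy data standing in for products-metadata.json and a parsed search result.
const metadata = [
  { id: 'a', name: 'Trail runner', price: 120, category: 'shoes' },
  { id: 'b', name: 'Wool socks', price: 15, category: 'accessories' },
  { id: 'c', name: 'Court sneaker', price: 80, category: 'shoes' },
];
const candidates = [[1, 0.10], [0, 0.20], [2, 0.35]];

const hits = postFilter(candidates, metadata, { category: 'shoes', maxPrice: 100 }, 5);
console.log(hits.map((h) => h.id)); // ['c']
```

Strict filters can leave fewer than N results, so it may be worth requesting more candidates than you plan to display.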
Will semantic search understand synonyms like 'sneakers' vs 'trainers'?
Generally, yes. Embedding models encode semantic meaning, so 'sneakers', 'trainers', 'running shoes', and 'athletic footwear' tend to map to nearby vector positions and return similar results.
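To make "nearby" concrete, closeness between embeddings is commonly measured with cosine similarity. The sketch below uses made-up 3-dimensional vectors purely to show the calculation. They are not real model output, and real all-MiniLM-L6-v2 embeddings have 384 dimensions.

```javascript
// Cosine similarity: 1 means same direction, 0 means unrelated (orthogonal).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Illustrative toy vectors only, NOT real embeddings.
const sneakers = [0.9, 0.1, 0.0];
const trainers = [0.8, 0.2, 0.1];
const blender = [0.0, 0.1, 0.9];

console.log(cosineSimilarity(sneakers, trainers) > cosineSimilarity(sneakers, blender)); // true
```

When vectors are already normalized, as with `normalize: true` in the build script, the dot product alone gives the same ranking.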
How do I generate embeddings for product titles and descriptions?
Concatenate the product name and description: `${product.name}. ${product.description}`. The build script above also appends the category. Embed the combined string with all-MiniLM-L6-v2 via Transformers.js. This gives better results than embedding the title alone.
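One practical wrinkle: a plain template literal turns a missing description into the literal text "undefined", which then gets embedded. A small sketch of a more defensive concatenation is shown below. `productText` is a hypothetical helper, and the field names follow the catalog shape used in the build script.

```javascript
// Build the text to embed, skipping missing or empty fields.
// `productText` is a hypothetical helper for illustration.
function productText(product) {
  const parts = [product.name, product.description]
    .filter((s) => typeof s === 'string' && s.trim() !== '')
    .map((s) => s.trim().replace(/\.+$/, '')); // drop trailing periods to avoid ".."
  return parts.length > 0 ? parts.join('. ') + '.' : '';
}

console.log(productText({ name: 'Trail runner', description: 'Lightweight shoe for rocky terrain.' }));
// Trail runner. Lightweight shoe for rocky terrain.
console.log(productText({ name: 'Wool socks' }));
// Wool socks.
```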