fuse.js Alternative: Semantic Search for What fuse.js Gets Wrong
fuse.js is the most popular client-side search library at 10.5 million weekly npm downloads, and it earns that usage for short string matching. But fuzzy character matching doesn't understand meaning. A user searching "inexpensive vector database" gets zero results when your content says "cost-efficient ANN retrieval" — the character overlap is too low regardless of your threshold setting. This post covers when to replace fuse.js, when to keep it, and how to run both together.
What fuse.js actually does
fuse.js implements a modified Bitap algorithm for approximate string matching. Given a query and a set of strings, it estimates how many character insertions, deletions, and substitutions separate the query from each candidate. By default it also weighs where in the string the match occurs. Each result gets a score from 0 (perfect match) to 1 (complete mismatch), so closer character sequences get lower scores.
This works well for a specific class of problems: you have a list of items with short names or codes, and users might mistype or use partial terms. Autocomplete for city names, searching a product catalog by SKU, filtering a list of users by name — these are fuse.js's domain.
Where it breaks down is natural language. Characters don't encode meaning. "cheap" and "affordable" share almost no characters but mean the same thing. "how to install" and "installation guide" describe identical intent, but fuse.js can only score the partial overlap on "install". No configuration lets it recognize that the two phrases mean the same thing.
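The edit-counting idea behind both the strength and the weakness above can be sketched in plain JavaScript. This is not fuse.js's implementation. fuse.js uses a bit-parallel Bitap search and also weighs match location. This sketch swaps in Sellers' dynamic-programming variant of Levenshtein distance, which finds the fewest edits needed to match the query anywhere inside the candidate. The function names and the errors-divided-by-query-length score are illustrative choices.

```javascript
// Fewest character edits (insert, delete, substitute) needed to match
// `pattern` against any substring of `text` (Sellers' algorithm).
// Illustrative only: fuse.js itself uses a bit-parallel Bitap search.
function approxSubstringDistance(pattern, text) {
  const p = pattern.toLowerCase();
  const t = text.toLowerCase();
  // Row 0 is all zeros: a match may start anywhere in the text for free.
  let prev = new Array(t.length + 1).fill(0);
  for (let i = 1; i <= p.length; i++) {
    // Matching i pattern characters against empty text costs i edits.
    const cur = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = p[i - 1] === t[j - 1] ? 0 : 1;
      cur[j] = Math.min(
        prev[j] + 1,        // skip a pattern character
        cur[j - 1] + 1,     // skip a text character
        prev[j - 1] + cost, // match or substitute
      );
    }
    prev = cur;
  }
  // A match may end anywhere in the text, so take the best column.
  return Math.min(...prev);
}

// Fuse-style score: 0 is a perfect match, 1 is a complete mismatch.
function fuzzyScore(query, candidate) {
  if (query.length === 0) return 0;
  return Math.min(1, approxSubstringDistance(query, candidate) / query.length);
}

console.log(fuzzyScore('Johnsn', 'Johnson'));   // one edit over six characters
console.log(fuzzyScore('cheap', 'affordable')); // four edits over five characters
```

The typo costs a single edit and scores close to 0, which is a good match. The synonym pair needs four edits across five characters and scores 0.8, close to a complete mismatch, even though the words mean the same thing.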
The concrete failure mode
Here are two queries against a documentation site, run through fuse.js and altor-vec:
```javascript
// fuse.js figures are character similarity (higher is better) for comparison;
// Fuse.js itself reports the inverse, where 0 is a perfect match.

// Query: "how do I reduce memory usage"
// Document: "Optimizing heap allocation for large HNSW indexes"
// fuse.js result: similarity ~0.0 (no meaningful character overlap)
// altor-vec result: cosine similarity 0.87 (semantic match)

// Query: "search without internet"
// Document: "Offline-first retrieval with service workers"
// fuse.js result: similarity ~0.1 (only scattered characters overlap)
// altor-vec result: cosine similarity 0.91 (semantic match)
```
The fuse.js threshold you'd need to catch these ("reduce memory" ↔ "optimizing heap allocation") would also produce so many false positives that the results become noise.
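The altor-vec numbers above are cosine similarities between embedding vectors. The sketch below computes that measure. The vectors are made-up 3-dimensional toys, not real model output (all-MiniLM-L6-v2 produces 384 dimensions). The sketch also shows why later examples compute `score: 1 - dist`: assuming the index reports cosine distance, one minus that distance recovers the similarity.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Cosine distance, the quantity a cosine-metric ANN index minimizes.
const cosineDistance = (a, b) => 1 - cosineSimilarity(a, b);

// Hypothetical toy vectors: "cheap" and "affordable" point the same way,
// "installation guide" points somewhere else.
const cheap = [0.9, 0.1, 0.0];
const affordable = [0.8, 0.2, 0.1];
const install = [0.0, 0.1, 0.95];

console.log(cosineSimilarity(cheap, affordable).toFixed(2));
console.log(cosineSimilarity(cheap, install).toFixed(2));
```

Embedding models set up with `normalize: true` return unit-length vectors. For those, the denominator is 1 and cosine similarity reduces to a plain dot product.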
When fuse.js is actually the right choice
fuse.js isn't wrong — it's solving a different problem. Keep it when:
Short string matching where typos matter
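User types "Johnsn" and should match "Johnson." User types "javascrpt" and should match "JavaScript." This is fuse.js's home territory. Vector search helps less here. Names and identifiers carry little meaning for an embedding model to work with, and a misspelling changes how the model tokenizes the word, which can shift the vector unpredictably. In fuse.js the typo shows up directly as character distance, and character distance is exactly what you want to measure.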
Exact identifier search
Product codes (SKU-4821), error codes (E_TIMEOUT), route names (/api/v2/users). Users expect an exact or near-exact match. Semantic search would surface unrelated results that happen to be "conceptually similar" to an error code, which is unhelpful.
Datasets under 500 items
fuse.js is 5KB. altor-vec is 54KB WASM plus your index file. For a dropdown that filters 50 team members, the overhead of vector search isn't justified. The relevance difference is also minimal — at small scale and with short strings, fuse.js works well enough.
Rule of thumb: if your users type exact terms and your items are short strings, use fuse.js. If your users type questions and your items are paragraphs, use vector search.
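The rule of thumb can be turned into a rough query router. The sketch below is a hypothetical heuristic. The function name, the regular expression, and the cutoff (three or more words counts as natural language) are all assumptions, not part of either library. Identifier-shaped or very short queries go to fuse.js, and longer phrases go to vector search.

```javascript
// Hypothetical router: decides which engine should handle a query.
// Identifier-like queries: SKU-4821, E_TIMEOUT, /api/v2/users
const IDENTIFIER = /^(?:[A-Z0-9]+(?:[-_][A-Z0-9]+)+|\/\S*)$/;

function chooseSearchMode(query) {
  const q = query.trim();
  if (IDENTIFIER.test(q)) return 'fuzzy'; // codes and routes: character matching
  const words = q.split(/\s+/).filter(Boolean);
  // One or two words ("Johnsn", "javascrpt") are usually names or keywords.
  if (words.length <= 2) return 'fuzzy';
  return 'semantic'; // questions and phrases: vector search
}

console.log(chooseSearchMode('SKU-4821'));
console.log(chooseSearchMode('how do I reduce memory usage'));
```

A router like this does not have to exclude one engine. It can also decide which result list to show first when both run.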
When to replace fuse.js with altor-vec
The switch is worth making when your search targets paragraph-length content and users phrase queries naturally rather than typing exact keywords.
Documentation search
Users ask "how do I configure rate limiting" and your docs say "request throttling configuration." fuse.js can match at best on the shared "configur" stem, which every configuration page contains. altor-vec returns the correct page because an embedding model places "throttling" and "rate limiting" close together in vector space.
Help center and support content
Support articles use formal language. Users describe their problem in their own words. "My bill went up" should match "unexpected charges" and "pricing change." The vocabulary gap between user language and content language is a core problem. Character matching cannot bridge it; semantic search can.
Blog and article search
Technical articles use precise terminology that users may not know. "fast approximate search" should match "HNSW algorithm" because HNSW is an algorithm for fast approximate nearest-neighbor search. Character matching won't bridge that gap.
Multilingual user bases
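If your users search in Spanish, French, or Hindi and your content is in English, multilingual embedding models (like paraphrase-multilingual-MiniLM-L12-v2) produce vectors in a shared space. A Spanish query can match an English document because the vectors are close regardless of language. Use the same multilingual model both when building the index and when embedding queries. The all-MiniLM-L6-v2 model used in the migration steps below is English-only. fuse.js has no equivalent capability.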
Side-by-side code comparison
```javascript
// fuse.js: fuzzy matching
import Fuse from 'fuse.js';

const docs = [
  { id: 1, title: 'Optimizing heap allocation', body: 'Large HNSW indexes...' },
  { id: 2, title: 'Service worker caching', body: 'Offline-first retrieval...' },
];

const fuse = new Fuse(docs, {
  keys: ['title', 'body'],
  threshold: 0.4,     // 0.0 requires a perfect match; 1.0 matches anything
  includeScore: true, // attach r.score (0 = perfect match) to each result
});

const results = fuse.search('reduce memory usage');
// Returns []: too little character overlap with "heap allocation"
```
```javascript
// altor-vec: vector search (after index built offline)
import init, { WasmSearchEngine } from 'altor-vec';

await init(); // load the WASM module
const indexBuf = await fetch('/search-index.bin').then(r => r.arrayBuffer());
const metadata = await fetch('/search-metadata.json').then(r => r.json());
const engine = new WasmSearchEngine(new Uint8Array(indexBuf));

// queryVec comes from your embedding model (the same one used to build the index)
const hits = JSON.parse(engine.search(new Float32Array(queryVec), 5)); // top 5 as [id, distance] pairs
const results = hits.map(([id, dist]) => ({ ...metadata[id], score: 1 - dist }));
// Returns [{ title: 'Optimizing heap allocation', score: 0.87 }]
```
Migration from fuse.js to altor-vec
Step 1: Install
```bash
npm install altor-vec @huggingface/transformers
```
Step 2: Build the index offline
Create a Node script that reads your content and generates a binary index. Run it as a build step.
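```javascript
// scripts/build-index.mjs
import fs from 'node:fs/promises';
import { pipeline } from '@huggingface/transformers';
import init, { WasmSearchEngine } from 'altor-vec';

await init(); // load the WASM module

// Must be the same model (and pooling/normalization) used at query time
const embed = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// Replace with however you load your content
const docs = JSON.parse(await fs.readFile('./src/data/docs.json', 'utf8'));

// Embed each document and append its vector to one flat array (doc order = index order)
const vectors = [];
for (const doc of docs) {
  const out = await embed(doc.title + '\n' + doc.body, {
    pooling: 'mean', // average token embeddings into one vector per document
    normalize: true, // unit length, so cosine similarity is a dot product
  });
  vectors.push(...Array.from(out.data));
}

const dim = 384; // all-MiniLM-L6-v2 outputs 384-dimensional vectors
// Trailing arguments are HNSW build settings, presumably M = 16,
// ef_construction = 200, ef_search = 50; check the altor-vec docs
const engine = WasmSearchEngine.from_vectors(new Float32Array(vectors), dim, 16, 200, 50);
await fs.writeFile('./public/search-index.bin', Buffer.from(engine.to_bytes()));

// Search hits return positions, so metadata must stay in the same order as docs
await fs.writeFile('./public/search-metadata.json', JSON.stringify(docs.map(d => ({
  id: d.id, title: d.title, excerpt: d.body.slice(0, 200), url: d.url,
}))));
```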
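The loop above grows a plain array with `push(...)` and hard-codes `dim = 384`. A small helper can validate that every embedding has the expected dimension and write straight into a preallocated `Float32Array`. With that check, a model swap that changes the dimension fails loudly at build time instead of producing a corrupt index. `packVectors` is a hypothetical helper, not part of altor-vec. It produces the same row-by-row layout the script passes to `from_vectors`.

```javascript
// Hypothetical helper: flatten per-document embeddings into one
// Float32Array laid out row by row (doc 0, then doc 1, ...).
function packVectors(embeddings, dim) {
  const flat = new Float32Array(embeddings.length * dim);
  embeddings.forEach((vec, row) => {
    if (vec.length !== dim) {
      throw new Error(`embedding ${row} has ${vec.length} dims, expected ${dim}`);
    }
    flat.set(vec, row * dim); // copy into this document's slot
  });
  return flat;
}

const packed = packVectors([[0.1, 0.2], [0.3, 0.4]], 2);
console.log(packed.length);
```

In the build script you would collect each `out.data` into an array and call `packVectors(embeddings, 384)` once at the end. Row i still corresponds to `docs[i]`, which is why the metadata file must be written in the same order.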
Step 3: Replace the search call in your component
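```javascript
// Before: fuse.js
const results = fuse.search(query).map(r => r.item);
```
```javascript
// After: altor-vec
import { pipeline } from '@huggingface/transformers';
import init, { WasmSearchEngine } from 'altor-vec';
// Same model and options as the build script, so queries land in the same vector space
const embed = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
await init();
const indexBuf = await fetch('/search-index.bin').then(r => r.arrayBuffer());
const engine = new WasmSearchEngine(new Uint8Array(indexBuf));
const metadata = await fetch('/search-metadata.json').then(r => r.json());

async function search(query) {
  const embedding = await embed(query, { pooling: 'mean', normalize: true });
  const hits = JSON.parse(engine.search(new Float32Array(embedding.data), 5));
  return hits.map(([id, dist]) => ({ ...metadata[id], score: 1 - dist }));
}
```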
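fuse.js's `threshold` has no direct counterpart in the search function above, which always returns the top 5 hits however weak they are. One option is to filter on the converted score. The sketch below is a hypothetical helper. It assumes, as the code above does, that the parsed result of `engine.search` is a list of `[id, distance]` pairs using cosine distance. The `minScore = 0.5` default is an arbitrary placeholder to tune against your own queries.

```javascript
// Hypothetical post-processing for parsed search hits.
// hits: array of [id, distance] pairs; metadata: array indexed by id.
function toResults(hits, metadata, minScore = 0.5) {
  return hits
    .map(([id, dist]) => ({ ...metadata[id], score: 1 - dist })) // distance to similarity
    .filter(r => r.score >= minScore)                             // drop weak matches
    .sort((a, b) => b.score - a.score);                           // best first
}

const meta = [{ title: 'A' }, { title: 'B' }, { title: 'C' }];
console.log(toResults([[2, 0.1], [0, 0.3], [1, 0.9]], meta).map(r => r.title));
```

An empty list is an honest answer for an off-topic query. That is usually better than five loosely related pages.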
Hybrid approach: run both
The cleanest solution for many apps is keeping fuse.js for short string / identifier matching and adding altor-vec for semantic content search. Run both and show results from each source with a label:
```javascript
async function hybridSearch(query) {
  // fuse.search is synchronous; only the semantic search actually awaits
  const [exactHits, semanticHits] = await Promise.all([
    Promise.resolve(fuse.search(query, { limit: 3 }).map(r => ({
      ...r.item,
      source: 'exact',
    }))),
    search(query).then(hits => hits.map(h => ({ ...h, source: 'semantic' }))),
  ]);

  // Deduplicate by id, preferring exact matches (they come first).
  // Assumes fuse.js items and search metadata share the same id values.
  const seen = new Set();
  const merged = [];
  for (const hit of [...exactHits, ...semanticHits]) {
    if (!seen.has(hit.id)) {
      seen.add(hit.id);
      merged.push(hit);
    }
  }
  return merged.slice(0, 8);
}
```
This gives users exact match behavior (typing "SKU-4821" returns that exact item) alongside semantic behavior (typing "how do I configure billing" returns relevant docs).
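The merge above simply lets exact matches win. When both lists matter, a common alternative is reciprocal rank fusion (RRF). RRF scores each item by summing 1 / (k + rank) over the lists it appears in, so items ranked well by both engines rise to the top. This is a swapped-in technique, not what the code above does. The function name is hypothetical, and `k = 60` is the constant conventionally used in the RRF literature.

```javascript
// Reciprocal rank fusion over several ranked lists of { id, ... } items.
// Each list contributes 1 / (k + rank) per item, with rank starting at 1.
function reciprocalRankFusion(lists, k = 60, limit = 8) {
  const fused = new Map(); // id -> { item, score }; first occurrence keeps its fields
  for (const list of lists) {
    list.forEach((item, index) => {
      const entry = fused.get(item.id) ?? { item, score: 0 };
      entry.score += 1 / (k + index + 1);
      fused.set(item.id, entry);
    });
  }
  return [...fused.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map(({ item, score }) => ({ ...item, rrfScore: score }));
}

const fuzzyList = [{ id: 'a' }, { id: 'b' }];
const semanticList = [{ id: 'c' }, { id: 'b' }];
console.log(reciprocalRankFusion([fuzzyList, semanticList]).map(h => h.id));
```

Inside `hybridSearch`, `reciprocalRankFusion([exactHits, semanticHits])` could replace the dedupe loop. Because the exact list is passed first, its copy of each shared item keeps the `source: 'exact'` label.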
Bundle size and performance reality
| Library | Bundle (gzipped) | Index size (1K docs) | Query latency (1K docs, lookup only) | Semantic search |
|---|---|---|---|---|
| fuse.js | 5KB | None (in-memory) | <1ms | No |
| altor-vec | 54KB WASM | ~1.5MB (384d) | <0.5ms | Yes |
| MiniSearch | 22KB | ~0.5MB | <1ms | No |
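The index file is the real cost of altor-vec, not the WASM. At 1,000 documents with 384-dimension embeddings, expect roughly 1.5MB. Serve it with a long-lived cache header, and put a content hash in the filename so updates still reach users. It then downloads once rather than on every visit. Queries also need an embedding. If you embed in the browser as in the migration example, the model weights are a separate download, and embedding time is not included in the lookup latency above.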
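The ~1.5MB figure follows from simple arithmetic: each vector is `dim` float32 values at 4 bytes apiece. The sketch below adds a rough allowance for the HNSW graph. It assumes uncompressed float32 storage and about 2 × M neighbor ids of 4 bytes each per node on the bottom layer. That is a common HNSW layout but an assumption here, not altor-vec's documented file format.

```javascript
// Rough index size estimate. The vector term assumes uncompressed float32;
// the graph term assumes about 2 * M 4-byte neighbor ids per node.
function estimateIndexBytes(numDocs, dim, m = 16) {
  const vectorBytes = numDocs * dim * 4;  // float32 = 4 bytes
  const graphBytes = numDocs * 2 * m * 4; // bottom-layer neighbor lists
  return { vectorBytes, graphBytes, totalBytes: vectorBytes + graphBytes };
}

const est = estimateIndexBytes(1000, 384);
console.log((est.totalBytes / 1e6).toFixed(2), 'MB');
```

Both terms grow linearly with document count. 10,000 documents at 384 dimensions means roughly ten times the download.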
FAQ
Is altor-vec faster than fuse.js?
For the index lookup on datasets over 1,000 items, yes. HNSW search cost grows roughly logarithmically with index size, so altor-vec lookup time rises only slowly as the index grows. fuse.js scans every item (O(n)) and adds latency as your dataset grows. Embedding the query is a separate cost on top of the lookup. For tiny datasets under 500 items, fuse.js is effectively instant and the WASM initialization overhead tips the balance.
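To see what O(n) means in the vector setting, here is exact top-k search by linear scan, the baseline that HNSW approximates. It is not how altor-vec searches. For the sub-500-document datasets discussed above, a scan like this is likely fast enough on its own. The sketch assumes vectors were normalized at embedding time (as `normalize: true` does in the build script), so the dot product equals cosine similarity.

```javascript
// Exact top-k by linear scan: touches every vector, so cost is O(n * dim).
// flat: Float32Array of n unit-length vectors laid out row by row.
function bruteForceTopK(flat, dim, query, k) {
  if (flat.length % dim !== 0) throw new Error('flat length must be a multiple of dim');
  const n = flat.length / dim;
  const scored = [];
  for (let row = 0; row < n; row++) {
    let dot = 0;
    for (let i = 0; i < dim; i++) dot += flat[row * dim + i] * query[i];
    scored.push([row, 1 - dot]); // [id, cosine distance], same shape as altor-vec hits
  }
  scored.sort((a, b) => a[1] - b[1]); // smallest distance first
  return scored.slice(0, k);
}

const flatVecs = Float32Array.from([1, 0, 0, 1, Math.SQRT1_2, Math.SQRT1_2]);
console.log(bruteForceTopK(flatVecs, 2, [1, 0], 2));
```

For tiny datasets this removes the HNSW build step, though you still need the precomputed embeddings. The trade-off is that every query scans every vector.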
Can I use fuse.js and altor-vec together?
Yes. Run fuse.js for short string / identifier matching and altor-vec for semantic content search. Merge results in your component with deduplication. This is often the right default for product apps that have both catalogs (fuse.js) and documentation (altor-vec).
What is the minimum dataset size where altor-vec makes sense?
Around 200-500 paragraph-length documents. Below that threshold, fuse.js with a generous threshold captures enough relevant results that the added complexity of vector search isn't justified. The clearest signal to switch: your users are typing natural language queries and getting zero results.