fuse.js Alternative: Semantic Search for What fuse.js Gets Wrong

fuse.js is the most popular client-side search library at 10.5 million weekly npm downloads, and it earns that usage for short string matching. But fuzzy character matching doesn't understand meaning. A user searching "inexpensive vector database" gets zero results when your content says "cost-efficient ANN retrieval": the character overlap is too low at any threshold strict enough to keep results useful. This post covers when to replace fuse.js, when to keep it, and how to run both together.

What fuse.js actually does

fuse.js implements the Bitap algorithm for approximate string matching. Given a query and a set of strings, it estimates how many character insertions, deletions, and substitutions separate the query from each candidate. Each result gets a score between 0 and 1, where 0 is a perfect match and 1 is a complete mismatch, so closer character sequences get lower scores.

This works well for a specific class of problems: you have a list of items with short names or codes, and users might mistype or use partial terms. Autocomplete for city names, searching a product catalog by SKU, filtering a list of users by name — these are fuse.js's domain.

Where it breaks down is natural language. Characters don't encode meaning. "cheap" and "affordable" share almost no characters but mean the same thing. "how to install" and "installation guide" describe identical intent but score near zero similarity in fuse.js regardless of configuration.
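
To make the character-level view concrete, here is a minimal sketch of plain Levenshtein edit distance. It is an illustration of the idea only, not fuse.js's actual Bitap implementation, and the function name `editDistance` is made up for this example.

```javascript
// Plain Levenshtein edit distance (illustrative; not fuse.js's Bitap code).
function editDistance(a, b) {
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(editDistance('Johnsn', 'Johnson'));   // 1: one missing letter
console.log(editDistance('cheap', 'affordable')); // at least 5, since the lengths differ by 5
```

A typo costs one or two edits, so it stays close. A synonym is a different string entirely, so a character-based metric treats it as unrelated.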

The concrete failure mode

Here are two queries against a documentation site, each run through fuse.js and altor-vec:

// Query: "how do I reduce memory usage"
// Document: "Optimizing heap allocation for large HNSW indexes"

// fuse.js result: no match (no meaningful character overlap)
// altor-vec result: cosine similarity 0.87 (semantic match)

// Query: "search without internet"
// Document: "Offline-first retrieval with service workers"

// fuse.js result: a weak match at best (only "search" overlaps loosely)
// altor-vec result: cosine similarity 0.91 (semantic match)

The fuse.js threshold you'd need to catch these ("reduce memory" ↔ "optimizing heap allocation") would also produce so many false positives that the results become noise.
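
For reference, the cosine similarity figures quoted above compare embedding vectors rather than characters. A minimal sketch of the measure follows. The three-dimensional vectors are hand-made toy values, not real embeddings, and `cosineSimilarity` is a name chosen for this example.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / Math.sqrt(normA * normB);
}

// Toy vectors (assumed values for illustration only).
const query = [0.9, 0.1, 0.0];
const related = [0.8, 0.2, 0.1];
const unrelated = [0.0, 0.1, 0.9];

console.log(cosineSimilarity(query, related) > cosineSimilarity(query, unrelated)); // true
```

When vectors are normalized to unit length, as in the embedding calls later in this post, the same value reduces to a plain dot product.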

When fuse.js is actually the right choice

fuse.js isn't wrong — it's solving a different problem. Keep it when:

Short string matching where typos matter

User types "Johnsn" and should match "Johnson." User types "javascrpt" and should match "JavaScript." This is fuse.js's home territory. Vector search doesn't help here: the user already has the right word in mind, the only gap is a few mistyped characters, and character distance is exactly what you want to measure.

Exact identifier search

Product codes (SKU-4821), error codes (E_TIMEOUT), route names (/api/v2/users). Users expect an exact or near-exact match. Semantic search would surface unrelated results that happen to be "conceptually similar" to an error code, which is unhelpful.

Datasets under 500 items

fuse.js is 5KB. altor-vec is 54KB WASM plus your index file. For a dropdown that filters 50 team members, the overhead of vector search isn't justified. The relevance difference is also minimal — at small scale and with short strings, fuse.js works well enough.

Rule of thumb: if your users type exact terms and your items are short strings, use fuse.js. If your users type questions and your items are paragraphs, use vector search.
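
One way to turn that rule of thumb into code is a small query router. The sketch below is a rough heuristic: the function name, the two-word cutoff, and the "identifier-like token" check are all assumptions to tune against your own query logs.

```javascript
// Hypothetical router: name, regex, and word-count cutoff are illustrative assumptions.
function chooseEngine(query) {
  const words = query.trim().split(/\s+/).filter(Boolean);
  // A token containing a digit, underscore, or slash looks like a code or route.
  const hasIdentifier = words.some((w) => /[\d_\/]/.test(w));
  if (words.length <= 2) return 'fuzzy';        // names, SKUs, short terms
  return hasIdentifier ? 'both' : 'semantic';   // questions, with or without a code in them
}

console.log(chooseEngine('SKU-4821'));                          // 'fuzzy'
console.log(chooseEngine('how do I configure rate limiting'));  // 'semantic'
console.log(chooseEngine('why does E_TIMEOUT keep happening')); // 'both'
```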

When to replace fuse.js with altor-vec

The switch is worth making when your search targets paragraph-length content and users phrase queries naturally rather than typing exact keywords.

Documentation search

Users ask "how do I configure rate limiting" and your docs say "request throttling configuration." fuse.js returns nothing. altor-vec returns the correct page because an embedding model places "throttling" and "rate limiting" close together in vector space.

Help center and support content

Support articles use formal language. Users describe their problem in their own words. "My bill went up" should match "unexpected charges" and "pricing change." The vocabulary gap between user language and content language is a core problem that only semantic search solves.

Blog and article search

Technical articles use precise terminology. Users may not know the exact term. "fast approximate search" should match "HNSW algorithm" because that's what fast approximate vector search is. Character matching won't bridge that gap.

Multilingual user bases

If your users search in Spanish, French, or Hindi and your content is in English, multilingual embedding models (like paraphrase-multilingual-MiniLM-L12-v2) produce vectors in a shared space. A Spanish query can match an English document because the vectors are close regardless of language. fuse.js has no equivalent capability.

Side-by-side code comparison

// fuse.js — fuzzy matching
import Fuse from 'fuse.js';

const docs = [
  { id: 1, title: 'Optimizing heap allocation', body: 'Large HNSW indexes...' },
  { id: 2, title: 'Service worker caching', body: 'Offline-first retrieval...' },
];

const fuse = new Fuse(docs, {
  keys: ['title', 'body'],
  threshold: 0.4,
  includeScore: true,
});

const results = fuse.search('reduce memory usage');
// Returns []: no character overlap with "heap allocation"

// altor-vec: vector search (after index built offline)
import init, { WasmSearchEngine } from 'altor-vec';

await init();
const indexBuf = await fetch('/search-index.bin').then(r => r.arrayBuffer());
const metadata = await fetch('/search-metadata.json').then(r => r.json());
const engine = new WasmSearchEngine(new Uint8Array(indexBuf));

// queryVec comes from your embedding model
const hits = JSON.parse(engine.search(new Float32Array(queryVec), 5));
const results = hits.map(([id, dist]) => ({ ...metadata[id], score: 1 - dist }));
// Returns [{ title: 'Optimizing heap allocation', score: 0.87 }]
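
To see what a vector index does conceptually, here is an exact brute-force version in plain JavaScript. It is a reference sketch, not altor-vec's code: it assumes unit-length vectors stored back to back in one flat array, and it mirrors the `[id, dist]` pair shape used in the snippet above. An HNSW index approximates this ranking without scanning every vector.

```javascript
// Exact nearest-neighbor search over a flat array of unit-length vectors.
// Returns up to k [id, distance] pairs, closest first.
function bruteForceSearch(flat, dim, queryVec, k) {
  const count = flat.length / dim;
  const scored = [];
  for (let id = 0; id < count; id++) {
    let dot = 0;
    for (let j = 0; j < dim; j++) {
      dot += flat[id * dim + j] * queryVec[j];
    }
    scored.push([id, 1 - dot]); // cosine distance when vectors are normalized
  }
  scored.sort((x, y) => x[1] - y[1]);
  return scored.slice(0, k);
}

// Three toy 2-d unit vectors (assumed values for illustration).
const flat = new Float32Array([1, 0, 0, 1, 0.6, 0.8]);
const top2 = bruteForceSearch(flat, 2, new Float32Array([1, 0]), 2);
console.log(top2.map(([id]) => id)); // [0, 2]
```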

Migration from fuse.js to altor-vec

Step 1: Install

npm install altor-vec @huggingface/transformers

Step 2: Build the index offline

Create a Node script that reads your content and generates a binary index. Run it as a build step.

// scripts/build-index.mjs
import fs from 'node:fs/promises';
import { pipeline } from '@huggingface/transformers';
import init, { WasmSearchEngine } from 'altor-vec';

await init();
const embed = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// Replace with however you load your content
const docs = JSON.parse(await fs.readFile('./src/data/docs.json', 'utf8'));

const vectors = [];
for (const doc of docs) {
  const out = await embed(doc.title + '\n' + doc.body, {
    pooling: 'mean',
    normalize: true,
  });
  vectors.push(...Array.from(out.data));
}

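const dim = 384; // all-MiniLM-L6-v2 produces 384-dimensional embeddings
// The trailing numbers (16, 200, 50) are index-tuning parameters; see the altor-vec docs for their exact meaning.
const engine = WasmSearchEngine.from_vectors(new Float32Array(vectors), dim, 16, 200, 50);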
await fs.writeFile('./public/search-index.bin', Buffer.from(engine.to_bytes()));
await fs.writeFile('./public/search-metadata.json', JSON.stringify(docs.map(d => ({
  id: d.id, title: d.title, excerpt: d.body.slice(0, 200), url: d.url
}))));
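
The build script hardcodes `dim = 384`. If you later swap in an embedding model with a different output size, the flat array would be silently misaligned. A small guard like the one below could catch that at build time. `flattenVectors` is a hypothetical helper, not part of altor-vec.

```javascript
// Hypothetical build-time helper: pack per-document vectors into one Float32Array,
// failing loudly if any vector does not have the expected dimension.
function flattenVectors(vectors, dim) {
  const flat = new Float32Array(vectors.length * dim);
  vectors.forEach((vec, i) => {
    if (vec.length !== dim) {
      throw new Error(`vector ${i} has ${vec.length} dimensions, expected ${dim}`);
    }
    flat.set(vec, i * dim); // document i occupies [i * dim, (i + 1) * dim)
  });
  return flat;
}

const packed = flattenVectors([[1, 2], [3, 4]], 2);
console.log(Array.from(packed)); // [1, 2, 3, 4]
```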

Step 3: Replace the search call in your component

// Before: fuse.js
const results = fuse.search(query).map(r => r.item);

// After: altor-vec (assumes embed, engine, and metadata are already initialized)
async function search(query) {
  const embedding = await embed(query, { pooling: 'mean', normalize: true });
  const hits = JSON.parse(engine.search(new Float32Array(embedding.data), 5));
  return hits.map(([id, dist]) => ({ ...metadata[id], score: 1 - dist }));
}
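
One side effect of this change is that search becomes asynchronous, so results for an older keystroke can arrive after results for a newer one. A minimal sketch of a guard is shown below. `latestOnly` is a hypothetical wrapper name, and returning `null` for superseded calls is just one possible convention.

```javascript
// Hypothetical wrapper: only the most recent call's results are returned;
// earlier calls that finish late resolve to null so the UI can ignore them.
function latestOnly(searchFn) {
  let latestTicket = 0;
  return async function guardedSearch(query) {
    const ticket = ++latestTicket;
    const results = await searchFn(query);
    return ticket === latestTicket ? results : null;
  };
}

// Usage sketch (search is the async function defined in Step 3):
// const guardedSearch = latestOnly(search);
// const results = await guardedSearch(inputValue);
// if (results !== null) render(results);
```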

Hybrid approach: run both

The cleanest solution for many apps is keeping fuse.js for short string / identifier matching and adding altor-vec for semantic content search. Run both and show results from each source with a label:

async function hybridSearch(query) {
  // Run in parallel
  const [exactHits, semanticHits] = await Promise.all([
    Promise.resolve(fuse.search(query, { limit: 3 }).map(r => ({
      ...r.item,
      source: 'exact',
    }))),
    search(query).then(hits => hits.map(h => ({ ...h, source: 'semantic' }))),
  ]);

  // Deduplicate by id, prefer exact matches
  const seen = new Set();
  const merged = [];
  for (const hit of [...exactHits, ...semanticHits]) {
    if (!seen.has(hit.id)) {
      seen.add(hit.id);
      merged.push(hit);
    }
  }
  return merged.slice(0, 8);
}

This gives users exact match behavior (typing "SKU-4821" returns that exact item) alongside semantic behavior (typing "how do I configure billing" returns relevant docs).
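
If you would rather interleave the two lists than always put exact matches first, reciprocal rank fusion is one common alternative. This is a different merge strategy from the deduplication above, sketched here under the assumption that every hit carries an `id`. The constant `k = 60` is a widely used default, not a value tuned for this setup.

```javascript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per item, rank starting at 1.
// Items that rank well in several lists accumulate a higher fused score.
function reciprocalRankFusion(lists, k = 60) {
  const byId = new Map();
  for (const list of lists) {
    list.forEach((hit, index) => {
      const entry = byId.get(hit.id) ?? { hit, score: 0 }; // keep the first-seen copy of a hit
      entry.score += 1 / (k + index + 1);
      byId.set(hit.id, entry);
    });
  }
  return [...byId.values()]
    .sort((a, b) => b.score - a.score)
    .map(({ hit, score }) => ({ ...hit, fusedScore: score }));
}

const exactList = [{ id: 1, source: 'exact' }, { id: 2, source: 'exact' }];
const semanticList = [{ id: 2, source: 'semantic' }, { id: 3, source: 'semantic' }];
const fused = reciprocalRankFusion([exactList, semanticList]);
console.log(fused.map((h) => h.id)); // [2, 1, 3]
```

Item 2 comes first because both sources returned it. Whether that behavior suits your app depends on how much you trust an exact match over a semantic one.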

Bundle size and performance reality

Library     | Bundle (gzipped) | Index size (1K docs) | Query latency (1K docs) | Semantic search
fuse.js     | 5KB              | None (in-memory)     | <1ms                    | No
altor-vec   | 54KB WASM        | ~1.5MB (384d)        | <0.5ms                  | Yes
MiniSearch  | 22KB             | ~0.5MB               | <1ms                    | No

The index file is the real cost of altor-vec, not the WASM. At 1,000 documents with 384-dimension embeddings, expect roughly 1.5MB. Cache it aggressively with a long-lived cache header so returning users don't download it again.
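
The 1.5MB figure lines up with simple arithmetic on the raw vectors, assuming 4-byte float32 values as in the `Float32Array` used by the build script. The helper below is a back-of-the-envelope sketch with a made-up name. It counts vector data only, so any graph structure the index stores would come on top.

```javascript
// Rough size of the raw vector data: documents x dimensions x bytes per value.
// Assumes float32 storage; excludes any index overhead.
function estimateRawVectorBytes(numDocs, dim, bytesPerValue = 4) {
  return numDocs * dim * bytesPerValue;
}

const bytes = estimateRawVectorBytes(1000, 384);
console.log(bytes);                    // 1536000
console.log((bytes / 1e6).toFixed(2)); // "1.54" (MB)
```

The same formula suggests the index grows linearly: 10,000 documents at 384 dimensions would need about ten times as much.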

FAQ

Is altor-vec faster than fuse.js?

For query execution on datasets over 1,000 items, yes. HNSW lookup cost grows roughly logarithmically with index size, so altor-vec stays sub-millisecond at the sizes discussed here, while fuse.js scans every item (O(n)) and adds latency as your dataset grows. For tiny datasets under 500 items, fuse.js is effectively instant and the WASM initialization overhead tips the balance in its favor.

Can I use fuse.js and altor-vec together?

Yes. Run fuse.js for short string / identifier matching and altor-vec for semantic content search. Merge results in your component with deduplication. This is often the right default for product apps that have both catalogs (fuse.js) and documentation (altor-vec).

What is the minimum dataset size where altor-vec makes sense?

Around 200-500 paragraph-length documents. Below that threshold, fuse.js with a generous threshold captures enough relevant results that the added complexity of vector search isn't justified. The clearest signal to switch: your users are typing natural language queries and getting zero results.
