benchmark comparison

altor-vec vs Pinecone

Q: How does browser WASM performance compare to native C++ for vector search?

Browser WASM is typically 3-5x slower than native C++ for HNSW search. However, it eliminates the network round-trip entirely. Even with that overhead included, a 0.4ms WASM query beats 20-150ms cloud API latency by 50-375x for read-only workloads.

Managed vector database versus browser-native retrieval.

This comparison is intentionally not framed as a universal winner. Pinecone is infrastructure. altor-vec is a client-side search primitive. The decision starts with where the corpus should live and who should pay the latency cost.

These numbers are representative, not universal. Bundle size, query latency, and memory usage all vary with vector dimensions, index parameters, browser runtime, hardware, and whether embeddings are generated on device or ahead of time.

Comparison table

Category	altor-vec	Pinecone
Runtime model	Browser WebAssembly HNSW running entirely on the client.	Managed vector database accessed over an API.
Bundle size / delivery	~54KB gzipped library payload plus your vector asset.	No client search bundle, but every query depends on a backend call.
Query latency	~0.4ms p50 local ANN lookup on a 10K / 384d benchmark, excluding embedding generation.	Usually tens of milliseconds plus the network round-trip, which is acceptable for backend retrieval but not as snappy for keystroke UX.
Memory usage	Browser memory scales with the shipped corpus; roughly 17MB for a representative 10K / 384d index.	Server-side memory and storage, with almost nothing held in the browser.
Features	Approximate nearest-neighbor search, serialization, local-first delivery, no hosted ops.	Metadata filtering, namespaces, scaling, backups, observability, and hosted operations.
Dataset sweet spot	Best for moderate corpora that are safe to ship to the user.	Best for large, private, multi-tenant, or frequently updated corpora.
Pricing model	Free. MIT license. No API key. No per-query cost.	Serverless: $0.096 per 1M reads. Pods: from $70/month. Scales with usage.

Where altor-vec wins

Instant local UX without an API dependency.
Zero per-query infrastructure cost after shipping the asset.
Good fit for offline or privacy-sensitive search experiences.
Eliminates the 20-150ms network round-trip on every query.
No API key management, rate limits, or vendor lock-in.

Where Pinecone wins

Large private datasets and frequent writes.
Operational controls, filtering, and hosted scaling.
Clearer fit for multi-user backend retrieval and production RAG infrastructure.
Metadata filtering and namespace isolation for multi-tenant products.
Automatic scaling to billions of vectors without client memory constraints.

Honest decision guide

Choose Pinecone when search is part of your backend platform. Choose altor-vec when search is a frontend capability and the corpus is intentionally shipped to the browser.

The honest pattern across all of these benchmark pages is simple: if the search corpus should stay on the server, choose server-oriented infrastructure. If the search corpus is intentionally shipped with the product and the UX benefit of local retrieval matters more than backend scale, altor-vec is usually the more natural fit.

For many documentation sites, product catalogs, and embedded help centers, the corpus is fewer than 10,000 documents and is safe to ship to the browser. At that scale, altor-vec eliminates an entire category of infrastructure: no Pinecone account, no API key rotation, no egress costs, no cold-start latency, and no single point of failure between your user and their search results.
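
The guide above can be read as a short checklist. The sketch below is only an illustration: the function name, the field names, and the 50,000-vector ceiling (borrowed from the FAQ on this page) are assumptions, not part of altor-vec or Pinecone.

```javascript
// Hypothetical helper that encodes the decision guide as a checklist.
// Field names and the 50,000-vector ceiling are illustrative assumptions.
function suggestSearchLayer({ vectorCount, safeToShip, frequentWrites, needsServerFeatures }) {
  if (!safeToShip) return 'server';          // private or access-controlled corpus
  if (frequentWrites) return 'server';       // continuously updated data
  if (needsServerFeatures) return 'server';  // metadata filtering, namespaces, observability
  if (vectorCount > 50_000) return 'server'; // likely too large to ship comfortably
  return 'browser';
}

// A small public documentation site.
console.log(suggestSearchLayer({
  vectorCount: 8_000,
  safeToShip: true,
  frequentWrites: false,
  needsServerFeatures: false,
})); // browser

// A multi-tenant product with private data.
console.log(suggestSearchLayer({
  vectorCount: 8_000,
  safeToShip: false,
  frequentWrites: true,
  needsServerFeatures: true,
})); // server
```

Treat the output as a starting point for discussion; real projects usually weigh these factors rather than short-circuiting on the first one.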

Benchmark methodology

These benchmarks measure query latency for approximate nearest-neighbor search in a controlled browser environment. All altor-vec measurements run in Chrome 124 on an M2 MacBook Pro, using a pre-built HNSW index loaded from JSON. Embedding generation time is excluded; we measure pure retrieval latency.

Test configuration

Parameter	Value
Index size	10,000 vectors
Vector dimensions	384 (all-MiniLM-L6-v2 output)
HNSW M	16
ef_construction	200
ef_search	50
k (neighbors returned)	5
Browser	Chrome 124, M2 MacBook Pro
Measurement	p50, p95, and p99 of 1,000 consecutive queries

altor-vec latency results

Metric	Result
p50 query latency	0.4ms
p95 query latency	0.8ms
p99 query latency	1.2ms
Index load time (10K vectors)	~35ms (JSON parse + WASM init)
Index memory footprint	~17MB (10K × 384d)
WASM bundle size	54KB gzipped

What these numbers mean for your app

A p50 of 0.4ms and a p95 of 0.8ms mean that for a typical 10,000-document index, search is effectively instant from the user's perspective. Responses that arrive within roughly 100ms are generally perceived as instantaneous. At sub-millisecond latency, the bottleneck is rendering results, not computing them.

For comparison, network-dependent search (any cloud API) adds a baseline of 20–150ms for the round-trip, before the server executes its own query. At 100ms total, a cloud search query takes 125× longer than an altor-vec local query at p95. Whether that matters depends entirely on your product — for autocomplete-as-you-type, the difference is significant; for triggered search (user presses Enter), it is less critical.
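
The ratios quoted on this page are plain division of cloud latency by local latency. A minimal sketch of that arithmetic, using only the figures stated above (the helper name `speedup` is made up for illustration):

```javascript
// Ratio of cloud latency to local query latency, both in milliseconds.
function speedup(cloudMs, localMs) {
  if (localMs <= 0) throw new RangeError('localMs must be positive');
  return cloudMs / localMs;
}

const localP50 = 0.4; // altor-vec p50 from the results table
const localP95 = 0.8; // altor-vec p95 from the results table

// 100ms total cloud latency compared with the local p95.
console.log(speedup(100, localP95).toFixed(0) + 'x');

// 20-150ms network round-trip compared with the local p50.
console.log(speedup(20, localP50).toFixed(0) + 'x to ' + speedup(150, localP50).toFixed(0) + 'x');
```

Substituting your own measured latencies gives a ratio that reflects your users' networks and devices rather than this benchmark machine.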

The 17MB memory footprint for 10K vectors at 384 dimensions fits comfortably in modern browser memory budgets. Most consumer devices have 4-8GB of total RAM, of which a single tab can use only a portion. Most documentation and product catalog use cases can therefore go well beyond 10K vectors before memory becomes the limit. For 100K vectors at 384 dimensions, expect approximately 170MB, which is viable on desktop but worth testing on mobile.
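
A back-of-the-envelope estimate shows where figures like 17MB and 170MB can come from. The sketch below assumes 32-bit float vectors and up to 2 × M four-byte neighbor ids per node on the base layer; that layout is a common HNSW convention and an assumption here, not a documented detail of altor-vec. Upper layers and allocator overhead are ignored.

```javascript
// Rough HNSW memory estimate. Assumptions (not confirmed for altor-vec):
//  - vectors are stored as 32-bit floats (4 bytes per component)
//  - each node keeps up to 2 * M neighbor ids of 4 bytes on the base layer
//  - upper layers and allocator overhead are ignored
function estimateIndexBytes(count, dims, m = 16) {
  const vectorBytes = count * dims * 4;
  const linkBytes = count * 2 * m * 4;
  return vectorBytes + linkBytes;
}

const toMB = (bytes) => bytes / 1e6;

console.log(toMB(estimateIndexBytes(10_000, 384)).toFixed(1) + ' MB');  // 16.6 MB
console.log(toMB(estimateIndexBytes(100_000, 384)).toFixed(1) + ' MB'); // 166.4 MB
```

Raw vector data dominates the total, which is why the footprint grows almost exactly linearly with both vector count and dimensions.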

Running your own benchmark

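import init, { WasmSearchEngine } from 'altor-vec';
await init(); // initialize the WASM module before using the engine

// Index shape: set these to match your data (values mirror the test configuration)
const N = 10_000; // number of vectors
const DIM = 384;  // dimensions per vector

// Build index
const vectors = new Float32Array(N * DIM); // your embeddings, flattened into one array
// 16, 200, 50 mirror M, ef_construction, and ef_search from the test configuration
const engine = WasmSearchEngine.from_vectors(vectors, DIM, 16, 200, 50);

// Benchmark query latency
const query = new Float32Array(DIM); // your query embedding
const iterations = 1000;
const times = [];

for (let i = 0; i < iterations; i++) {
  const start = performance.now();
  engine.search(query, 5); // k = 5 neighbors
  times.push(performance.now() - start);
}

// Sort ascending, then read each percentile by index
times.sort((a, b) => a - b);
console.log('p50:', times[Math.floor(iterations * 0.5)].toFixed(2) + 'ms');
console.log('p95:', times[Math.floor(iterations * 0.95)].toFixed(2) + 'ms');
console.log('p99:', times[Math.floor(iterations * 0.99)].toFixed(2) + 'ms');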

Run this from a module script in your page or app (the static import is not accepted by the DevTools console) against your own index to get accurate numbers for your specific hardware, vector dimensions, and index size.
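
The loop above times every call, including the first ones, which may run before the JavaScript engine has optimized the hot path. One possible variation is a small reusable harness with untimed warm-up calls and a nearest-rank percentile helper. The name `measureLatency` and its defaults are illustrative, and the stand-in workload exists only so the sketch runs without an index.

```javascript
// Time `fn` repeatedly and report latency percentiles in milliseconds.
// `warmup` untimed calls run first so cold caches and JIT compilation
// do not skew the early samples.
function measureLatency(fn, iterations = 1000, warmup = 100) {
  for (let i = 0; i < warmup; i++) fn();

  const times = new Float64Array(iterations);
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    fn();
    times[i] = performance.now() - start;
  }
  times.sort(); // typed arrays sort numerically by default

  // Nearest-rank percentile: the sample at rank ceil(p% of n), clamped to the array.
  const pct = (p) => {
    const rank = Math.ceil((p * iterations) / 100);
    return times[Math.min(iterations - 1, Math.max(0, rank - 1))];
  };
  return { p50: pct(50), p95: pct(95), p99: pct(99) };
}

// Stand-in workload; in a real run use () => engine.search(query, 5).
const stats = measureLatency(() => Math.sqrt(12345.678));
console.log(stats);
```

Browsers may coarsen the resolution of performance.now(), so individual sub-millisecond samples can be quantized. If many samples read as zero, timing a batch of queries per sample and dividing by the batch size is one way to recover a usable average.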

FAQ

Is altor-vec a Pinecone replacement?

Not broadly. They solve different layers of the stack. altor-vec handles local retrieval; Pinecone handles hosted vector infrastructure. Use altor-vec when the corpus belongs in the browser; use Pinecone when the corpus should stay on the server and be centrally managed.

When does Pinecone clearly win?

When the corpus is large, private, access-controlled, or updated continuously. Pinecone also wins when you need metadata filtering, namespace isolation for multiple users, or operational observability across a shared dataset.

When does altor-vec clearly win?

When the search experience itself belongs in the browser and local latency matters more than backend scale. Documentation sites, product catalogs, embedded help centers, and offline-first apps are ideal use cases where the corpus is safe to ship and smaller than 50,000 vectors.

How was this benchmark run?

All altor-vec measurements run in Chrome 124 on an M2 MacBook Pro using a pre-built HNSW index with 10,000 vectors at 384 dimensions. Latency is reported as the p50, p95, and p99 of 1,000 consecutive queries timed with performance.now(). Embedding generation time is excluded; we measure pure retrieval latency from Float32Array query to result array.

Does altor-vec performance degrade with more vectors?

Yes, but gradually. HNSW search cost grows roughly logarithmically with index size, so latency rises far more slowly than the vector count. A 100K-vector index at 384 dimensions delivers approximately 1.2ms p50 versus 0.4ms for 10K vectors: about 3× the latency for 10× the data. The memory footprint scales linearly: expect ~170MB for 100K × 384d.

How does browser WASM performance compare to native C++ for vector search?

Browser WASM is typically 3–5× slower than native C++ for HNSW search because of WebAssembly execution overhead and the lack of SIMD in older builds. However, it eliminates the network round-trip entirely. Even a pessimistic 2ms WASM query, 5× the 0.4ms p50 measured here, beats 20–150ms cloud API latency by 10–75× for read-only workloads on static corpora.

Get started: npm install altor-vec · GitHub