Vector Search Without a Backend — Is It Possible?
Short answer: yes, for a large class of products. Long answer: it depends on dataset size, update frequency, and your control-plane needs. For developer docs, app command palettes, local knowledge bases, onboarding assistants, and many internal tools, browser-native retrieval is not just feasible—it can be the better default. You avoid per-query infrastructure, reduce privacy risk surface, and remove network latency from the critical path. This article explains when serverless vector search works, why HNSW is the right primitive in browser environments, and where the boundaries are.
```sh
npm install altor-vec
```

Why teams want serverless retrieval
Most projects start with “just call a vector API,” and that works early. But over time, developers notice three operational pressures. First is unpredictable cost: as usage scales, per-query billing accumulates even when retrieval logic is simple. Second is privacy/legal complexity: sending every user query to remote services introduces obligations for storage, retention, consent, and security review. Third is latency variance: geographically distributed users can feel inconsistent response times due to network conditions. Running search locally removes these constraints from the hot path.
In practical terms, local retrieval makes performance deterministic because query execution happens close to user input. It also enables offline or poor-network behavior. Instead of waiting for degraded connectivity, the app still returns ranked results from the local index. That directly improves UX in mobile and enterprise environments where network quality is uneven.
How the architecture works
You do not need to generate vectors online. A common pattern is:
- Offline pipeline chunks content and computes embeddings.
- Offline build creates HNSW index and metadata payload.
- Deploy both assets to CDN/static hosting.
- Browser downloads assets once and queries locally.
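The first step of the offline pipeline, chunking, is ordinary batch tooling. As a sketch (the window size, overlap, and function name are illustrative defaults, not part of altor-vec), overlapping fixed-size word windows might look like:

```js
// Split a document into overlapping word-window chunks for embedding.
// chunkSize and overlap are illustrative; tune them for your embedding model.
function chunkDocument(text, chunkSize = 200, overlap = 40) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  const step = chunkSize - overlap; // each window starts `step` words after the last
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both sides; each chunk is then embedded and written into the index with its metadata.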
```js
// Runtime path (browser)
// Assumes altor-vec exports init() and WasmSearchEngine as used below;
// check the package's actual entry points.
import init, { WasmSearchEngine } from 'altor-vec';

await init(); // instantiate the WASM module once per page
const bytes = new Uint8Array(await (await fetch('/index.bin')).arrayBuffer());
const engine = new WasmSearchEngine(bytes);

function retrieve(queryVec, topK = 8) {
  // search() returns a JSON string of [id, distance] pairs
  const hits = JSON.parse(engine.search(new Float32Array(queryVec), topK));
  return hits; // [[id, distance], ...]
}
```
Because the index is immutable between deploys, caching is straightforward. Use content-hashed filenames for long-term cacheability and invalidate only when you publish a new build.
HNSW basics relevant to browser execution
HNSW (Hierarchical Navigable Small World) builds a layered proximity graph. Top layers are sparse, enabling quick long-range navigation; lower layers become denser and refine nearest neighbors. Querying uses a greedy walk plus candidate expansion, which avoids exhaustive scan over every vector. This is exactly why HNSW is attractive for client-side usage: high recall with low query cost on moderate corpora.
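The greedy walk at the heart of that navigation can be sketched on a single layer. This is a toy adjacency-list version to convey the idea, not altor-vec's implementation, which adds the layered descent and candidate-list expansion:

```js
// Greedy walk on one proximity-graph layer: starting from an entry node,
// repeatedly move to the neighbor closest to the query until no neighbor
// improves the distance. Only visited nodes are ever compared, not the
// whole dataset.
function greedyWalk(query, vectors, neighbors, entry, dist) {
  let current = entry;
  let best = dist(query, vectors[current]);
  for (;;) {
    let next = current;
    for (const n of neighbors[current]) {
      const d = dist(query, vectors[n]);
      if (d < best) { best = d; next = n; }
    }
    if (next === current) return current; // local minimum: no neighbor is closer
    current = next;
  }
}

// Euclidean distance for the sketch.
const l2 = (a, b) => Math.hypot(...a.map((x, i) => x - b[i]));
```

Because the walk only touches a path through the graph, query cost grows roughly logarithmically with corpus size instead of linearly, which is what makes the approach viable on a client CPU.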
Two practical knobs matter most for web apps:
- M: graph connectivity per node. Higher M improves recall but increases memory.
- ef_search: candidate list size at query time. Higher ef_search increases recall and latency.
Start with conservative values and tune from measured quality, not assumptions. For docs search, user trust often depends more on top-3 ranking than on top-20 completeness.
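Measuring quality requires an exact baseline to compare against. A brute-force scorer is too slow for production but fine offline, and recall@k of the ANN path can be computed against it. A sketch, assuming normalized embeddings so cosine similarity reduces to a dot product:

```js
// Exact top-k by cosine similarity (dot product on normalized vectors).
// Used only offline, as ground truth to measure ANN recall against.
function exactTopK(query, vectors, k) {
  const dot = (a, b) => a.reduce((s, x, i) => s + x * b[i], 0);
  return vectors
    .map((v, id) => [id, dot(query, v)])
    .sort((a, b) => b[1] - a[1])
    .slice(0, k)
    .map(([id]) => id);
}

// recall@k: fraction of the exact top-k ids that the ANN result recovered.
function recallAtK(annIds, exactIds) {
  const truth = new Set(exactIds);
  return annIds.filter((id) => truth.has(id)).length / exactIds.length;
}
```

Run this over a benchmark query set, then raise ef_search (and, at build time, M) until recall@3 stops improving meaningfully.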
Benchmark framing: browser vs server-side retrieval
The table below is illustrative, not universal. Real numbers depend on hardware, model dimensions, and dataset shape. But the latency decomposition is representative.
| Path | Search Compute | Network | Total p95 |
|---|---|---|---|
| Browser HNSW (altor-vec) | < 1 ms | 0 ms | ~0.5–3 ms* |
| Server ANN API (same region) | 3–15 ms | 25–70 ms | 30–100 ms |
| Server ANN API (cross-region/mobile) | 3–15 ms | 80–250 ms | 90–280 ms |
*Excludes query embedding generation; embedding can be local, remote, or precomputed depending on app behavior.
Use cases where serverless wins
Offline-capable products
PWAs, field apps, and travel scenarios benefit from local retrieval because users still get results in airplane mode or constrained networks.
Privacy-sensitive search
If query text can contain internal project names, personal notes, or regulated terms, keeping retrieval local minimizes data egress risk and governance burden.
Cost-sensitive high-query surfaces
Command palettes, typeahead components, and frequently accessed docs can generate massive query counts. Removing per-query backend calls stabilizes spend.
When serverless is not enough
There are clear limits. If your corpus is extremely large, changes continuously, or requires tenant-level access controls at query time, a backend may still be necessary. The same is true if you need cross-user analytics tied to retrieval events, or strict legal audit trails for query processing. In these cases, a hybrid architecture works well: keep hot, user-local subsets in the browser and fall back to a server for long-tail or global retrieval.
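The hybrid pattern reduces to a small routing policy. In this sketch the threshold, function names, and score semantics are all assumptions; the search functions are injected so the policy stays testable:

```js
// Local-first retrieval with a server fallback. searchLocal and searchRemote
// both resolve to [{ id, score }] lists, higher score = more relevant.
async function hybridSearch(query, { searchLocal, searchRemote, minScore = 0.6, topK = 8 }) {
  const local = await searchLocal(query, topK);
  // Confident local hit: skip the network entirely.
  if (local.length > 0 && local[0].score >= minScore) {
    return { source: 'local', hits: local };
  }
  try {
    return { source: 'remote', hits: await searchRemote(query, topK) };
  } catch {
    // Offline or server error: degraded local results beat no results.
    return { source: 'local', hits: local };
  }
}
```

Note that the fallback also doubles as the offline path: when the remote call fails, the user still sees the best local ranking.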
Implementation checklist
```js
// 1) Build-time pipeline
//    - chunk documents
//    - create embeddings
//    - build HNSW index
//    - publish index.bin + metadata.json

// 2) Runtime pipeline
//    - preload index after user intent signal
//    - compute query embedding (worker)
//    - run engine.search(queryVector, topK)
//    - map IDs to metadata and render

// 3) Reliability
//    - keep lexical fallback for exact tokens
//    - monitor top-k relevance with benchmark set
//    - version index artifacts for cache safety
```
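The lexical fallback from the reliability step can be a simple merge: exact-token hits rank above vector hits, since users expect an exact identifier match at rank 1. The ranking policy here is an assumption, not a fixed rule:

```js
// Merge exact-token matches with vector hits, deduplicating by id.
// Lexical hits come first; vector hits fill the remaining slots.
function mergeHits(lexicalIds, vectorIds, topK = 8) {
  const seen = new Set();
  const out = [];
  for (const id of [...lexicalIds, ...vectorIds]) {
    if (!seen.has(id)) { seen.add(id); out.push(id); }
    if (out.length === topK) break;
  }
  return out;
}
```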
Security and privacy notes
Local retrieval does not automatically make your app private. You still must review analytics events, third-party scripts, and logging behavior that may capture raw queries. Use explicit redaction and disable query logging where possible. Also, if you ship proprietary documents in browser-readable metadata, ensure this aligns with your product’s access model. Browser search is powerful, but “what gets downloaded” remains a security boundary question.
Conclusion
Vector search without a backend is not a gimmick; it is an architecture option with clear performance and cost advantages for specific workloads. The key is scope discipline: define corpus size, update cadence, and privacy constraints up front, then choose local-only or hybrid accordingly. With altor-vec, implementing browser retrieval is straightforward, and HNSW gives you a proven ANN strategy that fits the constraints of frontend applications.