Vector Search Without a Backend — Is It Possible?
Short answer: yes, for a large class of products. Long answer: it depends on dataset size, update frequency, and your control-plane needs. For developer docs, app command palettes, local knowledge bases, onboarding assistants, and many internal tools, browser-native retrieval is not just feasible—it can be the better default. You avoid per-query infrastructure, reduce privacy risk surface, and remove network latency from the critical path. This article explains when serverless vector search works, why HNSW is the right primitive in browser environments, and where the boundaries are.
```sh
npm install altor-vec
```

Why teams want serverless retrieval
Most projects start with “just call a vector API,” and that works early. But over time, developers notice three operational pressures. First is unpredictable cost: as usage scales, per-query billing accumulates even when retrieval logic is simple. Second is privacy/legal complexity: sending every user query to remote services introduces obligations for storage, retention, consent, and security review. Third is latency variance: geographically distributed users can feel inconsistent response times due to network conditions. Running search locally removes these constraints from the hot path.
In practical terms, local retrieval makes performance deterministic because query execution happens close to user input. It also enables offline or poor-network behavior. Instead of waiting for degraded connectivity, the app still returns ranked results from the local index. That directly improves UX in mobile and enterprise environments where network quality is uneven.
How the architecture works
You do not need to generate vectors online. A common pattern is:
- Offline pipeline chunks content and computes embeddings.
- Offline build creates HNSW index and metadata payload.
- Deploy both assets to CDN/static hosting.
- Browser downloads assets once and queries locally.
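The first step of the offline pipeline, chunking, is ordinary batch tooling. As a sketch (the window size, overlap, and function name are illustrative defaults, not part of altor-vec), overlapping fixed-size word windows might look like:

```js
// Split a document into overlapping word-window chunks for embedding.
// chunkSize and overlap are illustrative; tune them for your embedding model.
function chunkDocument(text, chunkSize = 200, overlap = 40) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  const step = chunkSize - overlap; // each window starts `step` words after the last
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both sides; each chunk is then embedded and written into the index with its metadata.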
```js
// Runtime path (browser)
// Assumes altor-vec exports init() and WasmSearchEngine as used below;
// check the package's actual entry points.
import init, { WasmSearchEngine } from 'altor-vec';

await init(); // instantiate the WASM module once per page
const bytes = new Uint8Array(await (await fetch('/index.bin')).arrayBuffer());
const engine = new WasmSearchEngine(bytes);

function retrieve(queryVec, topK = 8) {
  // search() returns a JSON string of [id, distance] pairs
  const hits = JSON.parse(engine.search(new Float32Array(queryVec), topK));
  return hits; // [[id, distance], ...]
}
```
Because the index is immutable between deploys, caching is straightforward. Use content-hashed filenames for long-term cacheability and invalidate only when you publish a new build.
HNSW basics relevant to browser execution
HNSW (Hierarchical Navigable Small World) builds a layered proximity graph. Top layers are sparse, enabling quick long-range navigation; lower layers become denser and refine nearest neighbors. Querying uses a greedy walk plus candidate expansion, which avoids exhaustive scan over every vector. This is exactly why HNSW is attractive for client-side usage: high recall with low query cost on moderate corpora.
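The greedy walk at the heart of that navigation can be sketched on a single layer. This is a toy adjacency-list version to convey the idea, not altor-vec's implementation, which adds the layered descent and candidate-list expansion:

```js
// Greedy walk on one proximity-graph layer: starting from an entry node,
// repeatedly move to the neighbor closest to the query until no neighbor
// improves the distance. Only visited nodes are ever compared, not the
// whole dataset.
function greedyWalk(query, vectors, neighbors, entry, dist) {
  let current = entry;
  let best = dist(query, vectors[current]);
  for (;;) {
    let next = current;
    for (const n of neighbors[current]) {
      const d = dist(query, vectors[n]);
      if (d < best) { best = d; next = n; }
    }
    if (next === current) return current; // local minimum: no neighbor is closer
    current = next;
  }
}

// Euclidean distance for the sketch.
const l2 = (a, b) => Math.hypot(...a.map((x, i) => x - b[i]));
```

Because the walk only touches a path through the graph, query cost grows roughly logarithmically with corpus size instead of linearly, which is what makes the approach viable on a client CPU.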
Two practical knobs matter most for web apps:
- M: graph connectivity per node. Higher M improves recall but increases memory.
- ef_search: candidate list size at query time. Higher ef_search increases recall and latency.
Start with conservative values and tune from measured quality, not assumptions. For docs search, user trust often depends more on top-3 ranking than on top-20 completeness.
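Measuring quality requires an exact baseline to compare against. A brute-force scorer is too slow for production but fine offline, and recall@k of the ANN path can be computed against it. A sketch, assuming normalized embeddings so cosine similarity reduces to a dot product:

```js
// Exact top-k by cosine similarity (dot product on normalized vectors).
// Used only offline, as ground truth to measure ANN recall against.
function exactTopK(query, vectors, k) {
  const dot = (a, b) => a.reduce((s, x, i) => s + x * b[i], 0);
  return vectors
    .map((v, id) => [id, dot(query, v)])
    .sort((a, b) => b[1] - a[1])
    .slice(0, k)
    .map(([id]) => id);
}

// recall@k: fraction of the exact top-k ids that the ANN result recovered.
function recallAtK(annIds, exactIds) {
  const truth = new Set(exactIds);
  return annIds.filter((id) => truth.has(id)).length / exactIds.length;
}
```

Run this over a benchmark query set, then raise ef_search (and, at build time, M) until recall@3 stops improving meaningfully.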
Benchmark framing: browser vs server-side retrieval
The table below is illustrative, not universal. Real numbers depend on hardware, model dimensions, and dataset shape. But the latency decomposition is representative.
| Path | Search Compute | Network | Total p95 |
|---|---|---|---|
| Browser HNSW (altor-vec) | < 1 ms | 0 ms | ~0.5–3 ms* |
| Server ANN API (same region) | 3–15 ms | 25–70 ms | 30–100 ms |
| Server ANN API (cross-region/mobile) | 3–15 ms | 80–250 ms | 90–280 ms |
*Excludes query embedding generation; embedding can be local, remote, or precomputed depending on app behavior.
Use cases where serverless wins
Offline-capable products
PWAs, field apps, and travel scenarios benefit from local retrieval because users still get results in airplane mode or constrained networks.
Privacy-sensitive search
If query text can contain internal project names, personal notes, or regulated terms, keeping retrieval local minimizes data egress risk and governance burden.
Cost-sensitive high-query surfaces
Command palettes, typeahead components, and frequently accessed docs can generate massive query counts. Removing per-query backend calls stabilizes spend.
When serverless is not enough
There are clear limits. If your corpus is extremely large, changes continuously, or requires tenant-level access controls at query time, a backend may still be necessary. The same is true if you need cross-user analytics tied to retrieval events, or strict legal audit trails for query processing. In these cases, a hybrid architecture works well: keep hot, user-local subsets in the browser and fall back to a server for long-tail or global retrieval.
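The hybrid pattern reduces to a small routing policy. In this sketch the threshold, function names, and score semantics are all assumptions; the search functions are injected so the policy stays testable:

```js
// Local-first retrieval with a server fallback. searchLocal and searchRemote
// both resolve to [{ id, score }] lists, higher score = more relevant.
async function hybridSearch(query, { searchLocal, searchRemote, minScore = 0.6, topK = 8 }) {
  const local = await searchLocal(query, topK);
  // Confident local hit: skip the network entirely.
  if (local.length > 0 && local[0].score >= minScore) {
    return { source: 'local', hits: local };
  }
  try {
    return { source: 'remote', hits: await searchRemote(query, topK) };
  } catch {
    // Offline or server error: degraded local results beat no results.
    return { source: 'local', hits: local };
  }
}
```

Note that the fallback also doubles as the offline path: when the remote call fails, the user still sees the best local ranking.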
Implementation checklist
```js
// 1) Build-time pipeline
//    - chunk documents
//    - create embeddings
//    - build HNSW index
//    - publish index.bin + metadata.json

// 2) Runtime pipeline
//    - preload index after user intent signal
//    - compute query embedding (worker)
//    - run engine.search(queryVector, topK)
//    - map IDs to metadata and render

// 3) Reliability
//    - keep lexical fallback for exact tokens
//    - monitor top-k relevance with benchmark set
//    - version index artifacts for cache safety
```
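The lexical fallback from the reliability step can be a simple merge: exact-token hits rank above vector hits, since users expect an exact identifier match at rank 1. The ranking policy here is an assumption, not a fixed rule:

```js
// Merge exact-token matches with vector hits, deduplicating by id.
// Lexical hits come first; vector hits fill the remaining slots.
function mergeHits(lexicalIds, vectorIds, topK = 8) {
  const seen = new Set();
  const out = [];
  for (const id of [...lexicalIds, ...vectorIds]) {
    if (!seen.has(id)) { seen.add(id); out.push(id); }
    if (out.length === topK) break;
  }
  return out;
}
```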
Security and privacy notes
Local retrieval does not automatically make your app private. You still must review analytics events, third-party scripts, and logging behavior that may capture raw queries. Use explicit redaction and disable query logging where possible. Also, if you ship proprietary documents in browser-readable metadata, ensure this aligns with your product’s access model. Browser search is powerful, but “what gets downloaded” remains a security boundary question.
Conclusion
Vector search without a backend is not a gimmick; it is an architecture option with clear performance and cost advantages for specific workloads. The key is scope discipline: define corpus size, update cadence, and privacy constraints up front, then choose local-only or hybrid accordingly. With altor-vec, implementing browser retrieval is straightforward, and HNSW gives you a proven ANN strategy that fits the constraints of frontend applications.