React guide

Chat Memory in React with altor-vec

Use altor-vec to add chat memory to your React app entirely in the browser: no server, no API keys, and zero per-query cost. Store conversation history as vector embeddings and retrieve the most semantically relevant past messages as context for each new turn, giving your chatbot long-term, topic-aware memory.

Install: npm install altor-vec @xenova/transformers

Implementation

Works with Vite, CRA, or any React 18+ setup. The hook keeps the engine, embedder, and message list in useRef and wraps its functions in useCallback.

// useChatMemory.ts — semantic long-term chat memory in React
import { useRef, useCallback } from 'react';
import init, { WasmSearchEngine } from 'altor-vec';
import { pipeline } from '@xenova/transformers';

type Message = { role: 'user' | 'assistant'; content: string; timestamp: number };

export function useChatMemory() {
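  const engine = useRef<WasmSearchEngine | null>(null);
  // The transformers.js pipeline is left loosely typed here
  const embedder = useRef<any>(null);
  const messages = useRef<Message[]>([]);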
  const nextId = useRef(0);

  const init_ = useCallback(async () => {
    await init();
    embedder.current = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
    // Start with empty index — add messages incrementally
    engine.current = WasmSearchEngine.from_vectors(new Float32Array(0), 384, 16, 200, 50);
  }, []);

  const addMessage = useCallback(async (msg: Omit<Message, 'timestamp'>) => {
    if (!engine.current || !embedder.current) return;
    const message = { ...msg, timestamp: Date.now() };
    const out = await embedder.current(msg.content, { pooling: 'mean', normalize: true });
    engine.current.add(nextId.current, new Float32Array(out.data), message);
    messages.current.push(message);
    nextId.current++;
  }, []);

  const getRelevantHistory = useCallback(async (query: string, k = 5): Promise<Message[]> => {
    if (!engine.current || !embedder.current) return [];
    const out = await embedder.current(query, { pooling: 'mean', normalize: true });
    const hits = JSON.parse(engine.current.search(new Float32Array(out.data), k));
    return hits.map((h: any) => messages.current[h.id]);
  }, []);

  const saveToStorage = useCallback(() => {
    if (!engine.current) return;
    localStorage.setItem('chat-memory-index', engine.current.to_json());
    localStorage.setItem('chat-memory-messages', JSON.stringify(messages.current));
  }, []);

  return { init_, addMessage, getRelevantHistory, saveToStorage };
}
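
To make concrete what "semantically relevant" retrieval means, here is a minimal brute-force sketch. It is a stand-in for illustration only, not how altor-vec works internally, and the names `dot`, `topK`, and `Hit` are hypothetical. Because the hook embeds with `normalize: true`, the vectors are unit-length, so a plain dot product equals cosine similarity.

```typescript
// Brute-force nearest-neighbour search over unit-length vectors.
// Illustration only: an index library would avoid scanning every vector.
type Hit = { id: number; score: number };

// Dot product; equals cosine similarity when both inputs are unit-length.
function dot(a: Float32Array, b: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Score every stored vector against the query and keep the k best.
function topK(query: Float32Array, vectors: Float32Array[], k: number): Hit[] {
  return vectors
    .map((vec, id) => ({ id, score: dot(query, vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy 2-D "embeddings": ids 0 and 2 point roughly the same way as the query.
const toyVectors = [
  new Float32Array([1, 0]),
  new Float32Array([0, 1]),
  new Float32Array([0.8, 0.6]),
];
const toyHits = topK(new Float32Array([1, 0]), toyVectors, 2);
console.log(toyHits.map((h) => h.id)); // ids ordered from most to least similar
```

The position of each vector in the array doubles as its id, mirroring how the hook uses a sequential `nextId` to map search hits back to `messages.current`.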

Performance

10K message turns at 384 dimensions: ~17MB, <1ms retrieval. Sufficient for months of conversation history. Measured on M2 MacBook Pro, Chrome 124. Mobile is typically 2–4× slower — test on target devices before deploying.

Index size       Dimensions   Query p50   Memory
1,000 vectors    384          ~0.1ms      ~2MB
10,000 vectors   384          ~0.4ms      ~17MB
50,000 vectors   384          ~0.9ms      ~85MB
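
As a sanity check on the memory column, the raw vector payload can be computed directly: each dimension is one 4-byte float32. The sketch below uses hypothetical helper names (`rawVectorBytes`, `toMB`). The gap between this raw figure and the measured numbers above is presumably index overhead, but that attribution is an assumption, not something stated by the measurements.

```typescript
// Raw storage for `count` float32 vectors of `dims` dimensions, in bytes.
function rawVectorBytes(count: number, dims: number): number {
  const BYTES_PER_FLOAT32 = 4;
  return count * dims * BYTES_PER_FLOAT32;
}

// Decimal megabytes (1 MB = 1,000,000 bytes).
function toMB(bytes: number): number {
  return bytes / 1000000;
}

for (const count of [1000, 10000, 50000]) {
  const mb = toMB(rawVectorBytes(count, 384)).toFixed(1);
  console.log(`${count} vectors x 384 dims: ${mb} MB of raw vector data`);
}
```

For 10,000 vectors this gives about 15.4 MB of raw data, slightly below the ~17MB measured for the whole index.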

When this approach works best

Limitations

Frequently asked questions

How do I persist chat memory across browser sessions?

Call engine.to_json() and store the result in localStorage (small memory) or IndexedDB (large memory). On next session, restore with WasmSearchEngine.from_json(). Also persist your messages array to reconstruct the full conversation.
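
A save and restore round trip might look like the sketch below. It is written against two small hypothetical interfaces (`KeyValueStore`, `SerializableIndex`) rather than altor-vec directly, so the only library behaviour it assumes is the `to_json()` and `from_json()` pair described above. `saveMemory`, `loadMemory`, and `createMemoryStore` are illustrative names. Note that the next message ID has to be restored too; since IDs are assigned sequentially, it equals the number of saved messages.

```typescript
type Message = { role: 'user' | 'assistant'; content: string; timestamp: number };

// Minimal subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Anything that can serialize itself, such as the engine's to_json().
interface SerializableIndex {
  to_json(): string;
}

const INDEX_KEY = 'chat-memory-index';
const MESSAGES_KEY = 'chat-memory-messages';

function saveMemory(store: KeyValueStore, index: SerializableIndex, messages: Message[]): void {
  store.setItem(INDEX_KEY, index.to_json());
  store.setItem(MESSAGES_KEY, JSON.stringify(messages));
}

// Returns null when nothing has been saved yet.
function loadMemory<T>(
  store: KeyValueStore,
  fromJson: (json: string) => T, // e.g. (s) => WasmSearchEngine.from_json(s)
): { index: T; messages: Message[]; nextId: number } | null {
  const indexJson = store.getItem(INDEX_KEY);
  const messagesJson = store.getItem(MESSAGES_KEY);
  if (indexJson === null || messagesJson === null) return null;
  const messages = JSON.parse(messagesJson) as Message[];
  // IDs were assigned sequentially, so the next free ID is the message count.
  return { index: fromJson(indexJson), messages, nextId: messages.length };
}

// In-memory store for trying the round trip outside a browser.
function createMemoryStore(): KeyValueStore {
  const data = new Map<string, string>();
  return {
    getItem: (key) => (data.has(key) ? (data.get(key) as string) : null),
    setItem: (key, value) => { data.set(key, value); },
  };
}

const demoStore = createMemoryStore();
const fakeIndex: SerializableIndex = { to_json: () => '{"vectors":1}' };
saveMemory(demoStore, fakeIndex, [{ role: 'user', content: 'hello', timestamp: 1 }]);
const restored = loadMemory(demoStore, (json) => JSON.parse(json));
console.log(restored ? restored.nextId : 'nothing saved');
```

In the browser you would pass `localStorage` as the store. For larger histories, an IndexedDB wrapper exposing the same two methods would need to be asynchronous, which this synchronous sketch does not cover.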

How many turns of conversation history can I store?

altor-vec handles up to ~100K vectors. For chat memory, each turn is one vector, so you can store roughly 100K message turns before hitting browser memory limits. In practice, 1,000–10,000 turns is sufficient for most applications.

Should I embed each message separately or chunk multiple messages together?

Embed each message turn separately for retrieval. Use a sliding window of recent turns as context for the LLM (last 5-10 turns by recency), plus the top-k semantically similar historical turns retrieved by altor-vec.
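
The combination described above (recent turns plus retrieved turns) can be sketched as a small pure function. `buildContext` and `keyOf` are hypothetical names, and the ordering choice (recalled turns first, then the recent window) is one reasonable option, not something the library prescribes. A retrieved turn that already sits in the recent window is dropped so it is not sent twice.

```typescript
type Message = { role: 'user' | 'assistant'; content: string; timestamp: number };

// Identity key for de-duplication; assumes timestamp + role + content is unique enough.
function keyOf(m: Message): string {
  return `${m.timestamp}|${m.role}|${m.content}`;
}

// Merge the last `windowSize` turns with semantically retrieved turns.
// Output: recalled older turns (oldest first), followed by the recent window.
function buildContext(turns: Message[], retrieved: Message[], windowSize = 5): Message[] {
  // slice(-0) would return the whole array, so guard the zero case.
  const recent = windowSize > 0 ? turns.slice(-windowSize) : [];
  const seen = new Set(recent.map(keyOf));
  const recalled: Message[] = [];
  for (const m of retrieved) {
    const key = keyOf(m);
    if (!seen.has(key)) {
      seen.add(key);
      recalled.push(m);
    }
  }
  recalled.sort((a, b) => a.timestamp - b.timestamp);
  return recalled.concat(recent);
}

const allTurns: Message[] = [
  { role: 'user', content: 'My dog is called Biscuit.', timestamp: 1 },
  { role: 'assistant', content: 'Nice name!', timestamp: 2 },
  { role: 'user', content: 'What is the weather like?', timestamp: 3 },
  { role: 'assistant', content: 'I cannot check the weather.', timestamp: 4 },
];
// Pretend the vector search returned the first and the last turn.
const contextTurns = buildContext(allTurns, [allTurns[3], allTurns[0]], 2);
console.log(contextTurns.map((m) => m.content));
```

In the hook's terms, `retrieved` would be the result of `getRelevantHistory(query, k)` and `turns` the full message list.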
