Node.js guide

Chat Memory in Node.js with altor-vec

Use altor-vec to add chat memory to your Node.js app, running in-process with no API keys and zero per-query cost. Store conversation history as vector embeddings and retrieve the most semantically relevant past messages as context for each new turn, giving your chatbot long-term, topic-aware memory.

Install: npm install altor-vec @xenova/transformers

Implementation

Server-side indexing script (Node 18+, ESM). Engines are kept in a module-level Map, one per user.

// chat-memory-server.mjs — Node.js: server-side chat memory with altor-vec
// Use case: multi-user app where each user has a persistent memory index
import init, { WasmSearchEngine } from 'altor-vec';
import { pipeline } from '@xenova/transformers';
import { readFileSync, writeFileSync, existsSync } from 'fs';

await init();
const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const DIM = 384;

// Per-user memory store (in production: use a database)
const userMemories = new Map(); // userId -> { engine, messages }

async function getOrCreateMemory(userId) {
  if (userMemories.has(userId)) return userMemories.get(userId);
  const indexPath = `./data/memory-${userId}.json`;
  const msgsPath = `./data/messages-${userId}.json`;
  let engine, messages = [], nextId = 0;
  if (existsSync(indexPath)) {
    engine = WasmSearchEngine.from_json(readFileSync(indexPath, 'utf8'));
    messages = JSON.parse(readFileSync(msgsPath, 'utf8'));
    nextId = messages.length;
  } else {
    engine = WasmSearchEngine.from_vectors(new Float32Array(0), DIM, 16, 200, 50);
  }
  const memory = { engine, messages, nextId };
  userMemories.set(userId, memory);
  return memory;
}

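// Embed one message and store it under the next sequential id,
// so each vector id doubles as an index into mem.messages.
async function addMessage(userId, role, content) {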
  const mem = await getOrCreateMemory(userId);
  const out = await embedder(content, { pooling: 'mean', normalize: true });
  mem.engine.add(mem.nextId++, new Float32Array(out.data), { role, content });
  mem.messages.push({ role, content, ts: Date.now() });
}

// Return up to k stored messages that are semantically closest to the query.
async function getRelevantContext(userId, query, k = 5) {
  const mem = await getOrCreateMemory(userId);
  if (mem.messages.length === 0) return [];
  const out = await embedder(query, { pooling: 'mean', normalize: true });
  const hits = JSON.parse(mem.engine.search(new Float32Array(out.data), k));
  return hits.map(h => mem.messages[h.id]);
}

function saveMemory(userId) {
  const mem = userMemories.get(userId);
  if (!mem) return;
  // Assumes the ./data directory already exists.
  writeFileSync(`./data/memory-${userId}.json`, mem.engine.to_json());
  writeFileSync(`./data/messages-${userId}.json`, JSON.stringify(mem.messages));
}

export { addMessage, getRelevantContext, saveMemory };
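
One way to wire these helpers into a chat turn is sketched below. It is only an illustration: handleTurn and generateReply are hypothetical names that do not appear in the script, and memory stands for any object exposing addMessage, getRelevantContext and saveMemory with the signatures shown above.

```javascript
// Sketch of a single chat turn. `memory` is assumed to expose the three
// helpers exported by chat-memory-server.mjs; `generateReply` is a
// hypothetical stand-in for whatever LLM call the app makes.
async function handleTurn(memory, generateReply, userId, userText, k = 5) {
  // Retrieve before storing, so the new message cannot match itself.
  const context = await memory.getRelevantContext(userId, userText, k);
  const reply = await generateReply({ context, userText });

  // Store both sides of the exchange for future retrieval.
  await memory.addMessage(userId, 'user', userText);
  await memory.addMessage(userId, 'assistant', reply);

  // Persist after every turn; a busy app might batch this instead.
  memory.saveMemory(userId);
  return reply;
}
```

Passing the helpers in as an object keeps the turn logic independent of the index, which makes it easy to exercise with stubs.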

Performance

10K message turns at 384 dimensions: ~17MB, <1ms retrieval. Sufficient for months of conversation history. Measured on M2 MacBook Pro, Chrome 124. Mobile is typically 2–4× slower — test on target devices before deploying.

Index size       Dimensions   Query p50   Memory
1,000 vectors    384          ~0.1ms      ~2MB
10,000 vectors   384          ~0.4ms      ~17MB
50,000 vectors   384          ~0.9ms      ~85MB
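
As a rough sanity check on the memory figures, the raw float32 storage can be computed directly. The sketch below assumes 4 bytes per dimension and counts vectors only. Attributing the gap between these numbers and the reported ones to index overhead (graph links, ids, bookkeeping) is an assumption, not something the measurements state.

```javascript
// Raw float32 storage for n vectors of `dim` dimensions (4 bytes per value).
function rawVectorBytes(n, dim) {
  return n * dim * Float32Array.BYTES_PER_ELEMENT;
}

// Decimal megabytes, to match the units used in the table.
const toMB = (bytes) => bytes / 1e6;

for (const n of [1000, 10000, 50000]) {
  console.log(`${n} vectors: ${toMB(rawVectorBytes(n, 384)).toFixed(2)} MB`);
}
// 1000 vectors: 1.54 MB
// 10000 vectors: 15.36 MB
// 50000 vectors: 76.80 MB
```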

When this approach works best

Limitations

Frequently asked questions

How do I persist chat memory across browser sessions?

Call engine.to_json() and store the result in localStorage (small memory) or IndexedDB (large memory). On next session, restore with WasmSearchEngine.from_json(). Also persist your messages array to reconstruct the full conversation.
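
A minimal sketch of that save-and-restore cycle follows. The function names and key layout are hypothetical. storage is assumed to be any object with the Web Storage getItem/setItem interface (for example window.localStorage), and engineFromJson is assumed to wrap the restore call, for example (json) => WasmSearchEngine.from_json(json).

```javascript
// Save the serialized index and the messages array under two keys.
// `storage` is assumed to follow the Web Storage interface.
function saveSession(storage, key, engine, messages) {
  storage.setItem(`${key}:index`, engine.to_json());
  storage.setItem(`${key}:messages`, JSON.stringify(messages));
}

// Restore both pieces, or return null when nothing has been saved yet.
// `engineFromJson` is a caller-supplied function that rebuilds the engine.
function loadSession(storage, key, engineFromJson) {
  const index = storage.getItem(`${key}:index`);
  const msgs = storage.getItem(`${key}:messages`);
  if (index === null || msgs === null) return null;
  return { engine: engineFromJson(index), messages: JSON.parse(msgs) };
}
```

Keeping the index and the messages under the same key prefix makes it harder for the two to drift apart between sessions.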

How many turns of conversation history can I store?

altor-vec handles up to ~100K vectors. For chat memory, each turn is one vector, so you can store roughly 100K message turns before hitting browser memory limits. In practice, 1,000–10,000 turns is sufficient for most applications.

Should I embed each message separately or chunk multiple messages together?

Embed each message turn separately for retrieval. As context for the LLM, combine a sliding window of the most recent turns (the last 5–10) with the top-k semantically similar historical turns retrieved by altor-vec.
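
Combining the two sources might look like the sketch below. selectContext is a hypothetical helper, and it assumes each search hit id is an index into the messages array, as in the script above. Hits that already fall inside the recency window are dropped so no turn appears twice.

```javascript
// messages: full history, oldest first.
// hitIds:   ids returned by semantic search (assumed to index into messages).
// recentN:  how many of the latest turns to always include.
function selectContext(messages, hitIds, recentN = 8) {
  const start = Math.max(0, messages.length - recentN);

  // Keep unique, valid hits that are older than the recency window,
  // in chronological order.
  const older = [...new Set(hitIds)]
    .filter((id) => id >= 0 && id < start)
    .sort((a, b) => a - b);

  return [...older.map((id) => messages[id]), ...messages.slice(start)];
}

const history = ['m0', 'm1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7', 'm8', 'm9'];
console.log(selectContext(history, [7, 2, 9, 2, 0], 3));
// [ 'm0', 'm2', 'm7', 'm8', 'm9' ]
```

Placing the retrieved turns before the recent window keeps the prompt in chronological order, which is one reasonable choice among several.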
