Next.js guide

Chat Memory in Next.js with altor-vec

Use altor-vec to add chat memory to your Next.js app. Embedding, indexing, and retrieval run entirely in the browser, with no vector database, no embedding API keys, and zero per-query cost. Conversation history is stored as vector embeddings, and the most semantically relevant past messages are retrieved as context for each new turn, giving your chatbot long-term, topic-aware memory. The example below still posts to an /api/chat route for the LLM reply itself; only the memory layer is client-side.

Install: npm install altor-vec @xenova/transformers

Implementation

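The example uses the App Router with the 'use client' directive. The engine, the embedder, and the full message list are held in useRef so they survive re-renders; the messages shown on screen are held in useState.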

// Next.js — persistent chat memory with altor-vec (App Router)
// app/chat/page.tsx
'use client';
import { useState, useEffect, useRef } from 'react';
import init, { WasmSearchEngine } from 'altor-vec';
import { pipeline } from '@xenova/transformers';

type Message = { role: 'user'|'assistant'; content: string };

export default function ChatPage() {
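  const engine = useRef<WasmSearchEngine | null>(null);
  const embedder = useRef<any>(null); // feature-extraction pipeline from @xenova/transformers
  const allMessages = useRef<Message[]>([]);
  const [messages, setMessages] = useState<Message[]>([]);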
  const [input, setInput] = useState('');

  useEffect(() => {
    (async () => {
      await init();
      embedder.current = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
      // Restore from localStorage if available
      const saved = localStorage.getItem('chat-index');
      if (saved) {
        engine.current = WasmSearchEngine.from_json(saved);
        allMessages.current = JSON.parse(localStorage.getItem('chat-messages') ?? '[]');
        setMessages(allMessages.current.slice(-10));
      } else {
        engine.current = WasmSearchEngine.from_vectors(new Float32Array(0), 384, 16, 200, 50);
      }
    })();
  }, []);

  async function sendMessage() {
    // Ignore input until both the index and the embedding model have loaded
    if (!engine.current || !embedder.current || !input.trim()) return;
    const userMsg: Message = { role: 'user', content: input };
    setInput('');

    // Embed the user message once; the same vector is used to query and to store
    const qOut = await embedder.current(input, { pooling: 'mean', normalize: true });
    const qVec = new Float32Array(qOut.data);

    // Retrieve relevant past context
    const hits = JSON.parse(engine.current.search(qVec, 3));
    const context = hits.map((h: any) => allMessages.current[h.id]?.content ?? '');

    // Store user message in memory
    engine.current.add(allMessages.current.length, qVec, userMsg);
    allMessages.current.push(userMsg);
    setMessages(prev => [...prev, userMsg]);

    // Call LLM with context
    const resp = await fetch('/api/chat', { method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ message: input, memory: context }) });
    const { reply } = await resp.json();

    const botMsg: Message = { role: 'assistant', content: reply };
    const bOut = await embedder.current(reply, { pooling: 'mean', normalize: true });
    engine.current.add(allMessages.current.length, new Float32Array(bOut.data), botMsg);
    allMessages.current.push(botMsg);
    setMessages(prev => [...prev, botMsg]);

    // Persist
    localStorage.setItem('chat-index', engine.current.to_json());
    localStorage.setItem('chat-messages', JSON.stringify(allMessages.current));
  }

  return (
    <div>
      {messages.map((m, i) => (
        <p key={i}>{m.role}: {m.content}</p>
      ))}
      <input
        value={input}
        onChange={e => setInput(e.target.value)}
        onKeyDown={e => e.key === 'Enter' && sendMessage()}
      />
    </div>
  );
}
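
The component posts { message, memory } to /api/chat but does not show what the server does with the retrieved memory. One possible approach, sketched here under assumptions, is to fold the memory snippets into the prompt text before calling the model. The function name buildPrompt and the prompt wording are hypothetical and are not part of altor-vec or Next.js.

```typescript
// Hypothetical helper for an /api/chat handler: turns retrieved memory into prompt text.
// The name and the prompt layout are illustrative assumptions.
function buildPrompt(message: string, memory: string[]): string {
  // Drop empty strings (a hit whose id had no stored message) and exact duplicates
  const snippets = memory
    .map(s => s.trim())
    .filter(s => s.length > 0)
    .filter((s, pos, arr) => arr.indexOf(s) === pos);

  // With no usable memory, send the message on its own
  if (snippets.length === 0) return `User: ${message}`;

  const recalled = snippets.map((s, i) => `${i + 1}. ${s}`).join('\n');
  return `Relevant earlier messages:\n${recalled}\n\nUser: ${message}`;
}
```

Filtering empty strings matters here because the client maps missing ids to '' before sending the memory array.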

Performance

At 10,000 message turns and 384 dimensions, the index takes about 17 MB and retrieval takes under 1 ms, which is enough for months of conversation history. These figures were measured on an M2 MacBook Pro in Chrome 124. Mobile devices are typically 2–4× slower, so test on target devices before deploying.

Index size       Dimensions   Query p50   Memory
1,000 vectors    384          ~0.1 ms     ~2 MB
10,000 vectors   384          ~0.4 ms     ~17 MB
50,000 vectors   384          ~0.9 ms     ~85 MB
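
As a sanity check on the memory column, the raw vector payload can be estimated as count × dimensions × 4 bytes, since each float32 takes 4 bytes. The sketch below computes only that lower bound. The gap between it and the figures above is presumably index overhead such as graph links, which is an assumption and not something the measurements confirm.

```typescript
// Raw storage for float32 vectors: count × dims × 4 bytes.
const BYTES_PER_FLOAT32 = 4;

function rawVectorMegabytes(count: number, dims: number): number {
  return (count * dims * BYTES_PER_FLOAT32) / 1e6; // decimal megabytes
}

for (const count of [1000, 10000, 50000]) {
  console.log(`${count} vectors: ${rawVectorMegabytes(count, 384).toFixed(1)} MB raw`);
}
// 10,000 vectors × 384 dims × 4 bytes = 15,360,000 bytes, or about 15.4 MB
```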

When this approach works best

Limitations

Frequently asked questions

How do I persist chat memory across browser sessions?

Call engine.to_json() and store the result in localStorage for a small index, or in IndexedDB for a large one (browsers typically cap localStorage at around 5 MB per origin). On the next session, restore the index with WasmSearchEngine.from_json(). Also persist your messages array so the full conversation can be reconstructed.
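
The choice between the two stores can be made at save time from the size of the serialized data. The sketch below is one way to do that, assuming a budget of about 5 MB for localStorage and 2 bytes per character; real quotas and accounting vary by browser. The names pickStorage and ASSUMED_LOCAL_STORAGE_BUDGET_BYTES are hypothetical.

```typescript
type StorageChoice = 'localStorage' | 'indexedDB';

// Assumed budget: many browsers cap localStorage near 5 MB per origin; the exact quota varies.
const ASSUMED_LOCAL_STORAGE_BUDGET_BYTES = 5 * 1024 * 1024;

// JavaScript strings are UTF-16, so estimate 2 bytes per code unit.
function estimateStringBytes(s: string): number {
  return s.length * 2;
}

function pickStorage(
  serializedIndex: string,
  serializedMessages: string,
  budgetBytes: number = ASSUMED_LOCAL_STORAGE_BUDGET_BYTES
): StorageChoice {
  const total = estimateStringBytes(serializedIndex) + estimateStringBytes(serializedMessages);
  // Keep half the budget as headroom so the next few turns do not overflow it
  return total <= budgetBytes / 2 ? 'localStorage' : 'indexedDB';
}
```

In the component, the two arguments would be the output of engine.to_json() and the JSON-stringified messages array.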

How many turns of conversation history can I store?

altor-vec handles up to ~100K vectors. For chat memory, each turn is one vector, so you can store roughly 100K message turns before hitting browser memory limits. In practice, 1,000–10,000 turns is sufficient for most applications.

Should I embed each message separately or chunk multiple messages together?

Embed each message turn separately for retrieval. For the LLM context, combine a sliding window of the most recent turns (the last 5–10) with the top-k semantically similar historical turns retrieved by altor-vec.
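
A minimal sketch of that combination follows, assuming the retrieved hit ids are indices into the messages array, as in the implementation above. The function name selectContext is hypothetical. It merges the recency window with the retrieved hits, removes duplicates, and returns the result in chronological order.

```typescript
type Message = { role: 'user' | 'assistant'; content: string };

// Combine the last `recentCount` messages with retrieved hits.
// `hitIds` are assumed to be indices into `all`; out-of-range ids are ignored.
function selectContext(all: Message[], hitIds: number[], recentCount: number = 5): Message[] {
  const start = Math.max(0, all.length - recentCount);
  const recentIds: number[] = [];
  for (let i = start; i < all.length; i++) recentIds.push(i);

  const validHits = hitIds.filter(id => all[id] !== undefined);

  // Deduplicate (a hit may already be in the recent window) and sort chronologically
  const merged = recentIds
    .concat(validHits)
    .filter((id, pos, arr) => arr.indexOf(id) === pos)
    .sort((a, b) => a - b);

  return merged.map(id => all[id]);
}
```

Sorting by index keeps older recalled turns ahead of the recent window, so the model sees the conversation in the order it happened.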
