Architecture

This page explains the internal architecture of Kiro Memory, covering data flow, the hooks system, search and ranking algorithms, storage design, and security.

System Overview

┌──────────────────────────────────────────────────────────┐
│                     AI Coding Editor                     │
│        (Claude Code / Cursor / Windsurf / Cline)         │
├──────────────────────────────────────────────────────────┤
│                                                          │
│   ┌─────────────┐               ┌──────────────────┐     │
│   │ Hooks /     │               │ MCP Server       │     │
│   │ Rules       │               │ (10 tools)       │     │
│   │             │               │                  │     │
│   │ Capture:    │               │ Search, Store,   │     │
│   │ - Files     │               │ Resume, Report   │     │
│   │ - Commands  │               │                  │     │
│   │ - Decisions │               │                  │     │
│   └──────┬──────┘               └────────┬─────────┘     │
│          │                               │               │
└──────────┼───────────────────────────────┼───────────────┘
           │             HTTP              │
           v                               v
      ┌─────────────────────────────────────────────┐
      │             Kiro Memory Worker              │
      │                 (port 3001)                 │
      │                                             │
      │  ┌───────────┐    ┌────────────────────┐    │
      │  │ REST API  │    │ Web Dashboard      │    │
      │  │ Endpoints │    │ (SSE live feed)    │    │
      │  └─────┬─────┘    └────────────────────┘    │
      │        │                                    │
      │  ┌─────┴──────────────────────────────┐     │
      │  │ Service Layer                      │     │
      │  │                                    │     │
      │  │  ┌──────────┐  ┌───────────────┐   │     │
      │  │  │ SQLite   │  │ Vector Search │   │     │
      │  │  │ + FTS5   │  │ (Embeddings)  │   │     │
      │  │  └──────────┘  └───────────────┘   │     │
      │  └────────────────────────────────────┘     │
      └─────────────────────────────────────────────┘
                             │
                             v
               ~/.kiro-memory/kiro-memory.db

Hooks System

Claude Code Hooks

Claude Code provides lifecycle hooks that Kiro Memory uses to automatically capture and inject context. Four hooks are registered:

Session Start                         Session End
     │                                     │
     v                                     v
┌──────────┐    ┌──────────────┐    ┌──────────────┐
│ PreTool  │    │ PostTool     │    │ Stop         │
│ Use      │    │ Use          │    │              │
│          │    │              │    │ - Summary    │
│ Context  │    │ Capture:     │    │ - Checkpoint │
│ inject   │    │ - Writes     │    │              │
└──────────┘    │ - Commands   │    └──────────────┘
                │ - Research   │
                └──────────────┘

                ┌──────────────┐
                │ Notification │
                │              │
                │ Store user   │
                │ prompts      │
                └──────────────┘

Hook Details

PreToolUse (agentSpawn)

  • Triggers at session start
  • Retrieves smart context from the SDK using 4-signal ranking
  • Injects recent summaries and relevant observations into the session
  • Ensures the background worker is running (spawns it if needed)
  • Waits up to 3 seconds for worker availability
  • Gracefully degrades if the worker is unreachable

PostToolUse (postToolUse)

  • Triggers after every tool execution
  • Normalizes tool events from different editor sources
  • Classifies observations by type:
    • file-write -- File creation or modification
    • command -- Shell command execution
    • research -- Code search, grep, file reads
    • tool-use -- Other tool invocations
    • delegation -- Agent delegation events
  • Filters out uninformative tools (introspect, thinking, todo)
  • Tracks read-only tools (glob, grep, read) with file paths only
  • Caps content at 500 characters per field
  • Sends real-time notification to the dashboard
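
The classification, filtering, and truncation steps listed above can be sketched as follows. This is a minimal illustration, not Kiro Memory's actual code: the function names and the mapping from tool names such as write, edit, bash, and task to observation types are assumptions. Only the five observation types, the ignored and read-only tool names, and the 500-character cap come from the list above.

```typescript
// Hypothetical sketch of PostToolUse normalization; names and mappings are assumptions.
type ObservationType = "file-write" | "command" | "research" | "tool-use" | "delegation";

interface ToolEvent {
  tool: string; // tool name as reported by the editor
  filePath?: string;
  content?: string;
}

interface Observation {
  type: ObservationType;
  title: string;
  content: string;
  files: string[];
}

const MAX_FIELD_CHARS = 500; // cap stated in the docs
const IGNORED_TOOLS = new Set(["introspect", "thinking", "todo"]);
const READ_ONLY_TOOLS = new Set(["glob", "grep", "read"]);

// Assumed mapping from a lower-cased tool name to an observation type.
function classify(tool: string): ObservationType {
  if (tool === "write" || tool === "edit") return "file-write";
  if (tool === "bash") return "command";
  if (READ_ONLY_TOOLS.has(tool)) return "research";
  if (tool === "task") return "delegation";
  return "tool-use";
}

function cap(text: string): string {
  return text.length > MAX_FIELD_CHARS ? text.slice(0, MAX_FIELD_CHARS) : text;
}

// Returns null for tools that are filtered out as uninformative.
function toObservation(event: ToolEvent): Observation | null {
  const tool = event.tool.toLowerCase();
  if (IGNORED_TOOLS.has(tool)) return null;
  const files = event.filePath ? [event.filePath] : [];
  // Read-only tools keep their file paths but not the content they returned.
  const content = READ_ONLY_TOOLS.has(tool) ? "" : cap(event.content ?? "");
  const title = cap(`${tool} ${event.filePath ?? ""}`.trim());
  return { type: classify(tool), title, content, files };
}
```

Lower-casing the tool name before matching is one way to normalize events from editors that report names with different capitalization.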

Notification (userPromptSubmit)

  • Triggers when the user submits a prompt
  • Stores the prompt text with session association
  • Enables prompt history for context reconstruction

Stop

  • Triggers when the agent completes its response
  • Retrieves up to 50 recent observations (by session or 4-hour window)
  • Generates a session summary containing:
    • Completed tasks (up to 10, from observation titles)
    • Modified files (deduplicated)
    • Learned insights (up to 5, from research observations)
  • Creates a structured checkpoint with task, progress, next steps, and relevant files (up to 20)
  • Sends real-time notifications for both summary and checkpoint creation
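
As a rough sketch of the summary step, the limits above (50 observations, 10 tasks, 5 insights, deduplicated files) could be applied like this. The interface and field names are hypothetical, and the sketch assumes that modified files are taken from file-write observations and that observations arrive in the order they should be kept.

```typescript
// Hypothetical sketch of the Stop hook's summary step; field names are assumptions.
interface Obs {
  type: string;
  title: string;
  files: string[];
}

interface SessionSummary {
  completed: string[];
  filesModified: string[];
  learned: string[];
}

function buildSummary(observations: Obs[]): SessionSummary {
  const recent = observations.slice(0, 50); // at most 50 observations

  // Completed tasks: up to 10, taken from observation titles.
  const completed = recent.map((o) => o.title).slice(0, 10);

  // Modified files: deduplicated with a Set (insertion order is preserved).
  const fileSet = new Set<string>();
  for (const o of recent) {
    if (o.type !== "file-write") continue;
    for (const f of o.files) fileSet.add(f);
  }

  // Learned insights: up to 5, taken from research observations.
  const learned = recent
    .filter((o) => o.type === "research")
    .map((o) => o.title)
    .slice(0, 5);

  return { completed, filesModified: Array.from(fileSet), learned };
}
```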

Rules-Based Editors (Cursor, Windsurf, Cline)

Editors without native hook support use rules/instructions files that direct the AI assistant to call MCP tools at appropriate moments:

  • Session start: Call get_context or resume_session
  • During session: Call store_observation and store_knowledge as needed
  • Session end: Call store_summary with learnings and next steps

Vector Search

Kiro Memory provides semantic search using locally generated vector embeddings. No external API keys are required.

Embedding Pipeline

  Observation Text
         │
         v
┌──────────────────┐
│ Embedding Model  │   Local inference
│ (fastembed or    │   No API keys
│  transformers)   │
└────────┬─────────┘
         │
         v
   384-dim vector
         │
         v
┌──────────────────┐
│ SQLite Storage   │   Stored alongside
│ (embeddings col) │   observation data
└──────────────────┘

Embedding Providers

Kiro Memory supports two local embedding providers (listed as optional dependencies):

  1. fastembed -- Fast native embedding generation
  2. @huggingface/transformers -- HuggingFace transformer models

The embedding service is lazily initialized on first use and runs entirely locally.

Search Modes

Mode      Method            How It Works
--------  ----------------  ----------------------------------------------------------
Keyword   search()          SQLite FTS5 full-text search with BM25 ranking
Semantic  semanticSearch()  Cosine similarity between query and observation embeddings
Hybrid    hybridSearch()    Combines FTS5 keyword scores with vector similarity scores

Hybrid search usually returns the most complete results because it catches both exact keyword matches and semantically similar content that shares no keywords with the query.
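
To make the semantic and hybrid modes concrete, here is a minimal sketch of cosine similarity and one possible way to blend it with a keyword score. The equal 0.5 weighting, the max-normalization, and the names hybridRank and Candidate are assumptions for illustration. The sketch also assumes the keyword score has already been converted to a non-negative, higher-is-better value (SQLite's bm25() function returns lower values for better matches).

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical candidate shape: a keyword score plus a stored embedding.
interface Candidate {
  id: number;
  keywordScore: number; // assumed non-negative, higher is better
  vector: number[];
}

// Blend normalized keyword scores with vector similarity (assumed weighting).
function hybridRank(query: number[], candidates: Candidate[], keywordWeight = 0.5) {
  const maxKeyword = Math.max(0, ...candidates.map((c) => c.keywordScore));
  return candidates
    .map((c) => {
      const keyword = maxKeyword > 0 ? c.keywordScore / maxKeyword : 0;
      const semantic = Math.max(0, cosineSimilarity(query, c.vector));
      return { id: c.id, score: keywordWeight * keyword + (1 - keywordWeight) * semantic };
    })
    .sort((x, y) => y.score - x.score);
}
```

An item that matches moderately on both signals can outrank one that matches strongly on only one, which is the behavior the hybrid mode is meant to provide.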

4-Signal Ranking

The smart context system ranks every memory item using four signals to surface the most relevant context:

┌──────────────────────────────────────────────┐
│             Smart Ranking Score              │
│                                              │
│   Score = w1 * Recency                       │
│         + w2 * Frequency                     │
│         + w3 * Semantic Similarity           │
│         + w4 * Decay Penalty                 │
│                                              │
└──────────────────────────────────────────────┘

Signal Definitions

1. Recency

How recently the observation was created. Recent observations receive higher scores, with exponential decay over time.

recency_score = exp(-age_hours / half_life)
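
A short worked example of the formula above, using an assumed value of 24 hours for half_life. One thing worth noting: if the formula is applied exactly as written, the score drops to 1/e (about 0.37) when the age equals half_life, so the parameter behaves as an e-folding time rather than a strict half-life. A strict half-life would use 0.5 raised to age / half_life, shown second for comparison.

```typescript
// The formula as written in the docs: exp(-age / half_life).
function recencyScore(ageHours: number, halfLife: number): number {
  return Math.exp(-ageHours / halfLife);
}

// For comparison only: a strict half-life, where the score halves every halfLife hours.
function strictHalfLifeScore(ageHours: number, halfLife: number): number {
  return Math.pow(0.5, ageHours / halfLife);
}

// With an assumed halfLife of 24 hours:
// recencyScore(0, 24)         is 1
// recencyScore(24, 24)        is about 0.368 (1/e)
// recencyScore(48, 24)        is about 0.135 (1/e^2)
// strictHalfLifeScore(24, 24) is 0.5
```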

2. Frequency

How often the observation's associated files or concepts appear across the memory. Frequently referenced items are more likely to be relevant.

3. Semantic Similarity

When a query is provided (e.g., during getSmartContext with a query parameter), observations are scored by cosine similarity between their embedding vector and the query embedding.

4. Decay Penalty

Observations linked to files that have been modified since capture are penalized. This prevents stale information from surfacing. Observations that have never been accessed also receive a lower score over time.
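
The four signals can be combined as a weighted sum, as in the ranking box above. The documentation does not state the weights or how the penalty's sign is handled, so everything numeric here is an assumption: signals are taken to be normalized to the range 0 to 1, and the decay term is given a negative weight so that a larger penalty lowers the score.

```typescript
// Hypothetical signal and weight shapes; values are illustrative only.
interface Signals {
  recency: number;      // 0..1, higher for newer observations
  frequency: number;    // 0..1, higher for often-referenced files or concepts
  similarity: number;   // 0..1, cosine similarity to the query (0 without a query)
  decayPenalty: number; // 0..1, higher for stale or never-accessed observations
}

interface Weights {
  w1: number;
  w2: number;
  w3: number;
  w4: number;
}

// Assumed weights; w4 is negative so the decay term reduces the score.
const ASSUMED_WEIGHTS: Weights = { w1: 0.4, w2: 0.2, w3: 0.3, w4: -0.1 };

function smartScore(s: Signals, w: Weights = ASSUMED_WEIGHTS): number {
  return w.w1 * s.recency + w.w2 * s.frequency + w.w3 * s.similarity + w.w4 * s.decayPenalty;
}
```

With these assumed weights, two otherwise identical observations differ by 0.1 when one of them is fully stale.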

Token Budget

The getSmartContext() method accepts a tokenBudget parameter (default: 2000 tokens) that limits how much context is injected. Items are ranked by their composite score and included until the budget is exhausted. This prevents context overflow in editors with limited prompt windows.
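
A minimal sketch of budget-limited selection, under two assumptions that the documentation does not confirm: token counts are estimated at roughly four characters per token, and selection stops at the first item that does not fit. The names RankedItem, estimateTokens, and selectWithinBudget are hypothetical.

```typescript
interface RankedItem {
  id: number;
  text: string;
  score: number; // composite ranking score
}

// Rough estimate: about 4 characters per token (an assumption, not a real tokenizer).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Take items in descending score order until the next one would exceed the budget.
function selectWithinBudget(items: RankedItem[], tokenBudget = 2000): RankedItem[] {
  const ranked = [...items].sort((a, b) => b.score - a.score);
  const selected: RankedItem[] = [];
  let used = 0;
  for (const item of ranked) {
    const cost = estimateTokens(item.text);
    if (used + cost > tokenBudget) break;
    selected.push(item);
    used += cost;
  }
  return selected;
}
```

Replacing break with continue would instead skip an oversized item and keep filling the budget with smaller, lower-ranked ones. Which behavior the real implementation uses is not stated.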

Storage Architecture

SQLite with FTS5

All data is stored in a single SQLite database at ~/.kiro-memory/kiro-memory.db. SQLite was chosen for:

  • Zero-configuration setup
  • Single-file portability
  • Excellent read performance for local workloads
  • FTS5 extension for full-text search

Database Schema

The database contains the following core tables:

┌─────────────────┐     ┌─────────────────┐
│ observations    │     │ summaries       │
│                 │     │                 │
│ id (PK)         │     │ id (PK)         │
│ type            │     │ project         │
│ title           │     │ session_id      │
│ content         │     │ request         │
│ project         │     │ learned         │
│ session_id      │     │ completed       │
│ concepts        │     │ next_steps      │
│ files           │     │ created_at      │
│ embedding       │     └─────────────────┘
│ stale           │
│ access_count    │     ┌─────────────────┐
│ last_accessed   │     │ checkpoints     │
│ created_at      │     │                 │
└─────────────────┘     │ id (PK)         │
                        │ session_id      │
┌─────────────────┐     │ project         │
│ sessions        │     │ task            │
│                 │     │ progress        │
│ id (PK)         │     │ next_steps      │
│ content_id      │     │ questions       │
│ project         │     │ files           │
│ status          │     │ created_at      │
│ created_at      │     └─────────────────┘
│ completed_at    │
└─────────────────┘     ┌─────────────────┐
                        │ user_prompts    │
┌─────────────────┐     │                 │
│ observations_fts│     │ id (PK)         │
│ (FTS5 virtual)  │     │ session_id      │
│                 │     │ prompt_number   │
│ title           │     │ text            │
│ content         │     │ created_at      │
│ concepts        │     └─────────────────┘
└─────────────────┘

FTS5 Virtual Tables

Full-text search is powered by SQLite FTS5 virtual tables that index observation titles, content, and concepts. FTS5 provides:

  • BM25 relevance ranking
  • Prefix queries
  • Boolean operators (AND, OR, NOT)
  • Phrase matching
  • Column-specific search
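
As an illustration of the FTS5 features listed above, the sketch below builds a MATCH expression from user-supplied terms. Quoting each term as an FTS5 string (with embedded double quotes doubled) keeps user input from being parsed as operators, and a trailing * marks a prefix query. The helper names and the SQL statement are assumptions about how a caller might query the observations_fts table, not Kiro Memory's actual code.

```typescript
// Quote a term as an FTS5 string literal; embedded double quotes are doubled.
function quoteTerm(term: string): string {
  return `"${term.replace(/"/g, '""')}"`;
}

interface MatchOptions {
  prefix?: boolean;        // append * so each term matches as a prefix
  operator?: "AND" | "OR"; // boolean operator joining the terms
}

// Build an FTS5 MATCH expression such as: "auth"* AND "token"*
function buildMatchQuery(terms: string[], opts: MatchOptions = {}): string {
  const operator = opts.operator ?? "AND";
  return terms
    .map((t) => t.trim())
    .filter((t) => t.length > 0)
    .map((t) => quoteTerm(t) + (opts.prefix ? "*" : ""))
    .join(` ${operator} `);
}

// Assumed query shape: bm25() returns lower values for better matches,
// so ascending order puts the most relevant rows first.
const SEARCH_SQL = `
  SELECT rowid, bm25(observations_fts) AS rank
  FROM observations_fts
  WHERE observations_fts MATCH ?
  ORDER BY rank
  LIMIT 20`;
```

A multi-word term passed as a single string, for example "rate limit", becomes a quoted phrase and is matched as a phrase rather than as two independent words.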

Vector Storage

Embedding vectors are stored directly in the observations table as binary blobs. This avoids the overhead of a separate vector database while still supporting cosine similarity search. The vector index is rebuilt in memory when the embedding service initializes.
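
A minimal sketch of storing a vector as a binary blob and reading it back. It assumes 32-bit floats in the platform's native byte order, which the documentation does not specify. Under that assumption a 384-dimensional embedding occupies 384 × 4 = 1536 bytes per observation.

```typescript
// Encode a vector as raw float32 bytes (assumed storage format).
function encodeEmbedding(vector: number[]): Uint8Array {
  const floats = Float32Array.from(vector);
  return new Uint8Array(floats.buffer);
}

// Decode raw bytes back into a float32 vector.
function decodeEmbedding(blob: Uint8Array): Float32Array {
  if (blob.byteLength % 4 !== 0) {
    throw new Error("blob length is not a multiple of 4 bytes");
  }
  // Copy into a fresh buffer so the float view starts at an aligned offset,
  // even if the blob is a view into a larger pooled buffer.
  const copy = new Uint8Array(blob);
  return new Float32Array(copy.buffer);
}
```

Rebuilding the in-memory index then amounts to decoding every stored blob once when the embedding service starts.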

Worker Architecture

Auto-Start Mechanism

The worker is a standalone Node.js process that runs in the background:

 Editor Session Starts
          │
          v
   Hook: agentSpawn
          │
          v
 Is worker reachable? ──── Yes ───> Inject context
          │
          No
          │
          v
Spawn worker-service.js
 (background process)
          │
          v
    Wait up to 3s
          │
          v
    Worker ready? ──── No ───> Graceful degradation
          │                    (continue without worker)
         Yes
          │
          v
    Inject context
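
The flow above can be sketched as a small async function. The dependency names (isReachable, spawnWorker, sleep) and the 100 ms polling interval are assumptions; only the probe, spawn, 3-second wait, and fallback steps come from the diagram. Passing the dependencies in keeps the sketch independent of how the real hook probes or spawns the worker.

```typescript
type StartupResult = "inject-context" | "degraded";

// Hypothetical dependencies, injected so the flow can be exercised without a real worker.
interface WorkerDeps {
  isReachable: () => Promise<boolean>; // e.g. an HTTP probe against the worker port
  spawnWorker: () => void;             // e.g. start worker-service.js in the background
  sleep: (ms: number) => Promise<void>;
}

async function ensureWorker(
  deps: WorkerDeps,
  timeoutMs = 3000,
  pollMs = 100,
): Promise<StartupResult> {
  // Worker already running: inject context immediately.
  if (await deps.isReachable()) return "inject-context";

  deps.spawnWorker();

  // Poll until the worker answers or the timeout elapses.
  let waited = 0;
  while (waited < timeoutMs) {
    await deps.sleep(pollMs);
    waited += pollMs;
    if (await deps.isReachable()) return "inject-context";
  }

  // Graceful degradation: the session continues without the worker.
  return "degraded";
}
```

In a real hook, sleep could be implemented as `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`.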

Worker Responsibilities

The worker process handles:

  1. HTTP API -- RESTful endpoints for all CRUD operations
  2. Web Dashboard -- Serves the React-based dashboard UI
  3. SSE Streaming -- Real-time event streaming for the dashboard live feed
  4. Embedding Operations -- Lazy initialization and batch processing of vector embeddings
  5. Database Management -- Connection pooling and migration handling

API Endpoints

The worker exposes REST endpoints under http://127.0.0.1:3001/api/:

Endpoint                      Method  Description
----------------------------  ------  ------------------------------
/api/context/:project         GET     Project context
/api/search                   GET     Full-text search
/api/hybrid-search            GET     Hybrid vector + keyword search
/api/timeline                 GET     Chronological timeline
/api/observations/batch       POST    Batch retrieve observations
/api/observations             POST    Store observation
/api/summaries                POST    Store summary
/api/knowledge                POST    Store knowledge
/api/checkpoint               GET     Latest project checkpoint
/api/sessions/:id/checkpoint  GET     Session checkpoint
/api/report                   GET     Activity report
/api/embeddings/stats         GET     Embedding statistics
/api/events                   GET     SSE event stream

Security

Kiro Memory is designed for local development use but includes multiple security layers:

Token Authentication

API requests to the worker require a valid authentication token. The token is generated during installation and stored in the local configuration. Hooks and the MCP server include the token automatically.

Rate Limiting

The Express-based worker uses express-rate-limit to protect against excessive API calls. This prevents runaway loops in hooks from overwhelming the system.
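
To show the idea behind rate limiting, here is a conceptual fixed-window limiter. It is not the express-rate-limit API and not the worker's configuration; the class name, the limit, and the window length are all illustrative assumptions.

```typescript
// Conceptual fixed-window rate limiter (illustration only).
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if a request from `key` is allowed at time `now` (milliseconds).
  allow(key: string, now: number): boolean {
    const current = this.windows.get(key);
    // No window yet, or the previous window has expired: start a new one.
    if (!current || now - current.start >= this.windowMs) {
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count++;
    return true;
  }
}
```

A hook stuck in a loop would exhaust its window quickly and have further calls rejected until the window resets, which is the failure mode the rate limit is there to contain.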

Helmet

The Helmet middleware sets secure HTTP headers:

  • Content-Security-Policy -- Restricts resource loading
  • X-Content-Type-Options: nosniff -- Prevents MIME sniffing
  • X-Frame-Options: DENY -- Prevents clickjacking
  • Strict-Transport-Security -- Enforces HTTPS (when applicable)

CORS

Cross-Origin Resource Sharing is restricted to localhost origins. The web dashboard is served from the same origin as the API, and external requests are rejected.
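
A localhost-only origin check might look like the sketch below. The function name and the exact set of accepted hostnames are assumptions. Parsing the origin with the standard URL class, rather than matching on a string prefix, avoids accepting hosts such as localhost.example.com.

```typescript
// Returns true only for origins whose hostname is a loopback name or address.
function isLocalOrigin(origin: string): boolean {
  try {
    const { hostname } = new URL(origin);
    // IPv6 hostnames are reported with their brackets.
    return hostname === "localhost" || hostname === "127.0.0.1" || hostname === "[::1]";
  } catch {
    // Not a parseable URL: reject.
    return false;
  }
}
```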

Input Validation

All API inputs are validated using Zod schemas. Invalid payloads are rejected before reaching the database layer. Content fields are sanitized with DOMPurify to prevent XSS in the web dashboard.

Local-Only Design

By default, the worker binds to 127.0.0.1 (localhost only). All data remains on your machine. No telemetry, no cloud services, no external API calls. Embedding generation runs locally using bundled models.