Architecture

This page explains the internal architecture of Kiro Memory, covering data flow, the hooks system, search and ranking algorithms, storage design, and security.

System Overview

┌──────────────────────────────────────────────────────────┐
│                    AI Coding Editor                       │
│         (Claude Code / Cursor / Windsurf / Cline)        │
├──────────────────────────────────────────────────────────┤
│                                                          │
│   ┌─────────────┐              ┌──────────────────┐     │
│   │   Hooks /    │              │   MCP Server     │     │
│   │   Rules      │              │   (10 tools)     │     │
│   │              │              │                  │     │
│   │  Capture:    │              │  Search, Store,  │     │
│   │  - Files     │              │  Resume, Report  │     │
│   │  - Commands  │              │                  │     │
│   │  - Decisions │              │                  │     │
│   └──────┬───────┘              └────────┬─────────┘     │
│          │                               │               │
└──────────┼───────────────────────────────┼───────────────┘
           │            HTTP               │
           v                               v
     ┌─────────────────────────────────────────────┐
     │           Kiro Memory Worker                │
     │           (port 3001)                       │
     │                                             │
     │   ┌───────────┐  ┌────────────────────┐    │
     │   │  REST API  │  │  Web Dashboard     │    │
     │   │  Endpoints │  │  (SSE live feed)   │    │
     │   └─────┬──────┘  └────────────────────┘    │
     │         │                                    │
     │   ┌─────┴──────────────────────────────┐    │
     │   │         Service Layer              │    │
     │   │                                    │    │
     │   │  ┌──────────┐  ┌───────────────┐  │    │
     │   │  │  SQLite   │  │ Vector Search │  │    │
     │   │  │  + FTS5   │  │ (Embeddings)  │  │    │
     │   │  └──────────┘  └───────────────┘  │    │
     │   └────────────────────────────────────┘    │
     └─────────────────────────────────────────────┘
                        │
                        v
              ~/.kiro-memory/kiro-memory.db

Hooks System

Claude Code Hooks

Claude Code provides lifecycle hooks that Kiro Memory uses to automatically capture and inject context. Four hooks are registered:

Session Start                              Session End
     │                                          │
     v                                          v
 ┌──────────┐    ┌──────────────┐    ┌──────────────┐
 │ PreTool  │    │  PostTool    │    │    Stop      │
 │ Use      │    │  Use         │    │              │
 │          │    │              │    │  - Summary   │
 │ Context  │    │  Capture:    │    │  - Checkpoint│
 │ inject   │    │  - Writes    │    │              │
 └──────────┘    │  - Commands  │    └──────────────┘
                 │  - Research  │
                 └──────────────┘
                        │
                 ┌──────────────┐
                 │ Notification │
                 │              │
                 │ Store user   │
                 │ prompts      │
                 └──────────────┘

Hook Details

PreToolUse (agentSpawn)

- Triggers at session start
- Retrieves smart context from the SDK using 4-signal ranking
- Injects recent summaries and relevant observations into the session
- Ensures the background worker is running (spawns it if needed)
- Waits up to 3 seconds for worker availability
- Gracefully degrades if the worker is unreachable

PostToolUse (postToolUse)

- Triggers after every tool execution
- Normalizes tool events from different editor sources
- Classifies observations by type:
  - file-write -- File creation or modification
  - command -- Shell command execution
  - research -- Code search, grep, file reads
  - tool-use -- Other tool invocations
  - delegation -- Agent delegation events
- Filters out uninformative tools (introspect, thinking, todo)
- Tracks read-only tools (glob, grep, read) with file paths only
- Caps content at 500 characters per field
- Sends real-time notification to the dashboard
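
The classification and capping steps above can be sketched as follows. This is a minimal illustration, not the hook's actual code: the tool names in WRITE_TOOLS, COMMAND_TOOLS and DELEGATION_TOOLS, and the function names classifyTool and capField, are assumptions made for the example. Only the ignored tools, the read-only tools, the five observation types and the 500-character cap come from the description above.

```typescript
// Observation types named in the hook description.
type ObservationType =
  | "file-write"
  | "command"
  | "research"
  | "tool-use"
  | "delegation";

// Tools the description says are filtered out or tracked as read-only.
const IGNORED_TOOLS = new Set(["introspect", "thinking", "todo"]);
const READ_ONLY_TOOLS = new Set(["glob", "grep", "read"]);

// Hypothetical tool names; the real hook may use different ones.
const WRITE_TOOLS = new Set(["write", "edit"]);
const COMMAND_TOOLS = new Set(["bash", "shell"]);
const DELEGATION_TOOLS = new Set(["task", "delegate"]);

const MAX_FIELD_LENGTH = 500;

/** Returns the observation type, or null when the tool should be skipped. */
function classifyTool(toolName: string): ObservationType | null {
  const name = toolName.toLowerCase();
  if (IGNORED_TOOLS.has(name)) return null;
  if (WRITE_TOOLS.has(name)) return "file-write";
  if (COMMAND_TOOLS.has(name)) return "command";
  if (READ_ONLY_TOOLS.has(name)) return "research";
  if (DELEGATION_TOOLS.has(name)) return "delegation";
  return "tool-use"; // fallback for any other tool invocation
}

/** Truncates a field to the per-field character cap. */
function capField(text: string, max: number = MAX_FIELD_LENGTH): string {
  return text.length > max ? text.slice(0, max) : text;
}
```

Returning null for ignored tools lets the caller drop the event before anything is sent to the worker.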

Notification (userPromptSubmit)

- Triggers when the user submits a prompt
- Stores the prompt text with session association
- Enables prompt history for context reconstruction

Stop

- Triggers when the agent completes its response
- Retrieves up to 50 recent observations (by session or 4-hour window)
- Generates a session summary containing:
  - Completed tasks (up to 10, from observation titles)
  - Modified files (deduplicated)
  - Learned insights (up to 5, from research observations)
- Creates a structured checkpoint with task, progress, next steps, and relevant files (up to 20)
- Sends real-time notifications for both summary and checkpoint creation
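
As a rough sketch of how such a summary could be assembled from the limits listed above: the Observation and SessionSummary shapes and the buildSummary name are hypothetical. Taking modified files only from file-write observations is also an assumption, since the page does not say which observations contribute files.

```typescript
// Hypothetical shapes for the example; the real records have more fields.
interface Observation {
  type: string;
  title: string;
  files: string[];
}

interface SessionSummary {
  completed: string[];
  filesModified: string[];
  learned: string[];
}

function buildSummary(observations: Observation[]): SessionSummary {
  const recent = observations.slice(0, 50); // at most 50 observations are considered
  const completed: string[] = [];
  const learned: string[] = [];
  const files = new Set<string>(); // a Set deduplicates file paths

  for (const obs of recent) {
    // Up to 10 completed tasks, taken from observation titles.
    if (completed.length < 10) completed.push(obs.title);
    // Assumption: only file-write observations count as modifications.
    if (obs.type === "file-write") obs.files.forEach((f) => files.add(f));
    // Up to 5 insights, taken from research observations.
    if (obs.type === "research" && learned.length < 5) learned.push(obs.title);
  }

  return { completed, filesModified: Array.from(files), learned };
}
```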

Rules-Based Editors (Cursor, Windsurf, Cline)

Editors without native hook support use rules/instructions files that direct the AI assistant to call MCP tools at appropriate moments:

Session start: Call get_context or resume_session
During session: Call store_observation and store_knowledge as needed
Session end: Call store_summary with learnings and next steps

Vector Search

Kiro Memory provides semantic search using locally generated vector embeddings. No external API keys are required.

Embedding Pipeline

Observation Text
      │
      v
┌─────────────────┐
│ Embedding Model  │    Local inference
│ (fastembed or    │    No API keys
│  transformers)   │
└────────┬─────────┘
         │
         v
   384-dim vector
         │
         v
┌─────────────────┐
│ SQLite Storage   │    Stored alongside
│ (embeddings col) │    observation data
└─────────────────┘

Embedding Providers

Kiro Memory supports two local embedding providers (listed as optional dependencies):

fastembed -- Fast native embedding generation
@huggingface/transformers -- HuggingFace transformer models

The embedding service is lazily initialized on first use and runs entirely locally.

Search Modes

Mode	Method	How It Works
Keyword	`search()`	SQLite FTS5 full-text search with BM25 ranking
Semantic	`semanticSearch()`	Cosine similarity between query and observation embeddings
Hybrid	`hybridSearch()`	Combines FTS5 keyword scores with vector similarity scores

Hybrid search usually returns the most complete results because it catches both exact keyword matches and semantically similar content.
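
To make the semantic and hybrid modes concrete, here is a small sketch of cosine similarity and one possible way to blend it with a keyword score. The hybridScore function, its alpha parameter, and the assumption that the keyword score has already been normalized to the range 0 to 1 are illustrative only; the page does not describe how hybridSearch() actually combines the two scores.

```typescript
/** Cosine similarity between two vectors of equal length. */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as "no similarity".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/**
 * Hypothetical blend: alpha weights the keyword score, (1 - alpha) the
 * vector score. Both inputs are assumed to be normalized to [0, 1].
 */
function hybridScore(
  keywordScore: number,
  vectorScore: number,
  alpha: number = 0.5,
): number {
  return alpha * keywordScore + (1 - alpha) * vectorScore;
}
```

A linear blend is only one option. Rank-based fusion is a common alternative when the two score scales are hard to compare.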

4-Signal Ranking

The smart context system ranks every memory item using four signals to surface the most relevant context:

┌──────────────────────────────────────────────┐
│              Smart Ranking Score              │
│                                              │
│   Score = w1 * Recency                       │
│         + w2 * Frequency                     │
│         + w3 * Semantic Similarity           │
│         + w4 * Decay Penalty                 │
│                                              │
└──────────────────────────────────────────────┘

Signal Definitions

1. Recency

How recently the observation was created. Recent observations receive higher scores, with exponential decay over time.

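recency_score = exp(-age_hours / half_life)

With this form, half_life acts as the decay time constant: the score falls to 1/e (about 0.37) once age_hours equals half_life, not to 0.5.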

2. Frequency

How often the observation's associated files or concepts appear across the memory. Frequently referenced items are more likely to be relevant.

3. Semantic Similarity

When a query is provided (e.g., during getSmartContext with a query parameter), observations are scored by cosine similarity between their embedding vector and the query embedding.

4. Decay Penalty

Observations linked to files that have been modified since capture are penalized. This prevents stale information from surfacing. Observations that have never been accessed also receive a lower score over time.
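
The four signals can be combined as in the sketch below. The weights, the 72-hour half_life value, and the 0-to-1 scales for frequency and decay penalty are all hypothetical, because the page does not state them. The decay weight is taken as negative here so that the penalty term lowers the score, which is also an assumption about the sign convention.

```typescript
interface Signals {
  ageHours: number;     // time since the observation was created
  frequency: number;    // assumed normalized to [0, 1]
  similarity: number;   // cosine similarity to the query, 0 when no query is given
  decayPenalty: number; // assumed scale: 0 = fresh, 1 = fully stale
}

interface Weights {
  recency: number;
  frequency: number;
  similarity: number;
  decay: number;
}

// Hypothetical values, chosen only for illustration.
const DEFAULT_WEIGHTS: Weights = {
  recency: 0.4,
  frequency: 0.2,
  similarity: 0.4,
  decay: -0.3, // negative so the penalty reduces the score
};
const HALF_LIFE_HOURS = 72;

/** recency_score = exp(-age_hours / half_life), as given above. */
function recencyScore(ageHours: number, halfLife: number = HALF_LIFE_HOURS): number {
  return Math.exp(-ageHours / halfLife);
}

/** Weighted sum of the four signals. */
function smartScore(s: Signals, w: Weights = DEFAULT_WEIGHTS): number {
  return (
    w.recency * recencyScore(s.ageHours) +
    w.frequency * s.frequency +
    w.similarity * s.similarity +
    w.decay * s.decayPenalty
  );
}
```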

Token Budget

The getSmartContext() method accepts a tokenBudget parameter (default: 2000 tokens) that limits how much context is injected. Items are ranked by their composite score and included until the budget is exhausted. This prevents context overflow in editors with limited prompt windows.
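
A minimal sketch of budget-limited selection follows. The RankedItem shape, the selectWithinBudget name, and the four-characters-per-token estimate are assumptions for the example; the page does not say how the project counts tokens or whether it skips an oversized item and keeps going.

```typescript
interface RankedItem {
  text: string;
  score: number;
}

// Rough heuristic (about 4 characters per token), not the project's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

/** Takes items in descending score order until the next one would exceed the budget. */
function selectWithinBudget(
  items: RankedItem[],
  tokenBudget: number = 2000,
): RankedItem[] {
  const sorted = items.slice().sort((a, b) => b.score - a.score);
  const selected: RankedItem[] = [];
  let used = 0;
  for (const item of sorted) {
    const cost = estimateTokens(item.text);
    if (used + cost > tokenBudget) break; // budget exhausted
    selected.push(item);
    used += cost;
  }
  return selected;
}
```

Stopping at the first item that does not fit keeps the output strictly in rank order. Skipping it and trying smaller items would use more of the budget, at the cost of including lower-ranked context.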

Storage Architecture

SQLite with FTS5

All data is stored in a single SQLite database at ~/.kiro-memory/kiro-memory.db. SQLite was chosen for:

Zero-configuration setup
Single-file portability
Excellent read performance for local workloads
FTS5 extension for full-text search

Database Schema

The database contains the following core tables:

┌─────────────────┐     ┌─────────────────┐
│  observations   │     │   summaries     │
│                 │     │                 │
│  id (PK)        │     │  id (PK)        │
│  type           │     │  project        │
│  title          │     │  session_id     │
│  content        │     │  request        │
│  project        │     │  learned        │
│  session_id     │     │  completed      │
│  concepts       │     │  next_steps     │
│  files          │     │  created_at     │
│  embedding      │     └─────────────────┘
│  stale          │
│  access_count   │     ┌─────────────────┐
│  last_accessed  │     │  checkpoints    │
│  created_at     │     │                 │
└─────────────────┘     │  id (PK)        │
                        │  session_id     │
┌─────────────────┐     │  project        │
│  sessions       │     │  task           │
│                 │     │  progress       │
│  id (PK)        │     │  next_steps     │
│  content_id     │     │  questions      │
│  project        │     │  files          │
│  status         │     │  created_at     │
│  created_at     │     └─────────────────┘
│  completed_at   │
└─────────────────┘     ┌─────────────────┐
                        │  user_prompts   │
┌─────────────────┐     │                 │
│ observations_fts│     │  id (PK)        │
│ (FTS5 virtual)  │     │  session_id     │
│                 │     │  prompt_number  │
│  title          │     │  text           │
│  content        │     │  created_at     │
│  concepts       │     └─────────────────┘
└─────────────────┘

FTS5 Virtual Tables

Full-text search is powered by SQLite FTS5 virtual tables that index observation titles, content, and concepts. FTS5 provides:

BM25 relevance ranking
Prefix queries
Boolean operators (AND, OR, NOT)
Phrase matching
Column-specific search
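
The helper below sketches how these FTS5 features map onto a MATCH query string. The buildFtsQuery and quoteFts names and the options object are hypothetical and not part of Kiro Memory. The column names come from the observations_fts table shown in the schema, and the query syntax (quoted strings, a trailing * for prefix matching, AND, and a column filter) is standard FTS5.

```typescript
/** Quotes a term as an FTS5 string; embedded double quotes are doubled. */
function quoteFts(term: string): string {
  return '"' + term.replace(/"/g, '""') + '"';
}

interface FtsQueryOptions {
  column?: "title" | "content" | "concepts"; // restrict matching to one column
  prefix?: boolean;                           // match terms as prefixes
}

/** Builds an FTS5 MATCH expression that requires every term to be present. */
function buildFtsQuery(terms: string[], options: FtsQueryOptions = {}): string {
  const parts = terms.map((t) => quoteFts(t) + (options.prefix ? " *" : ""));
  const expr = parts.join(" AND ");
  return options.column ? options.column + " : (" + expr + ")" : expr;
}
```

Quoting every term means user input containing FTS5 operators or punctuation is matched as text rather than interpreted as query syntax. A multi-word term becomes a phrase match.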

Vector Storage

Embedding vectors are stored directly in the observations table as binary blobs. This avoids the overhead of a separate vector database while still supporting cosine similarity search. The vector index is rebuilt in memory when the embedding service initializes.
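
Storing a vector as a blob can be as simple as writing its 32-bit floats back to back. The sketch below assumes that layout, with 4 bytes per dimension in the platform's native byte order; the page does not specify the actual encoding, and the function names are hypothetical.

```typescript
/** Packs a vector into bytes: 4 bytes per dimension, native byte order. */
function encodeEmbedding(vector: number[]): Uint8Array {
  const floats = Float32Array.from(vector);
  return new Uint8Array(floats.buffer);
}

/** Unpacks bytes produced by encodeEmbedding back into floats. */
function decodeEmbedding(blob: Uint8Array): Float32Array {
  if (blob.byteLength % 4 !== 0) {
    throw new Error("blob length must be a multiple of 4");
  }
  // Copy first so the float view starts at offset 0 of its own buffer,
  // which avoids alignment problems when the input is a view into a pool.
  const copy = new Uint8Array(blob);
  return new Float32Array(copy.buffer);
}
```

Under this layout, a 384-dimension embedding takes 1536 bytes per observation.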

Worker Architecture

Auto-Start Mechanism

The worker is a standalone Node.js process that runs in the background:

Editor Session Starts
        │
        v
  Hook: agentSpawn
        │
        v
  Is worker reachable?  ──── Yes ───> Inject context
        │
        No
        │
        v
  Spawn worker-service.js
  (background process)
        │
        v
  Wait up to 3s
        │
        v
  Worker ready? ──── No ───> Graceful degradation
        │                     (continue without worker)
        Yes
        │
        v
  Inject context
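
The flow above can be sketched as a small polling loop. Everything here is illustrative: the probe, spawnWorker and injectContext callbacks are hypothetical stand-ins for the real health check, process spawn and context injection, and the 100 ms polling interval is an assumption. Only the 3-second wait and the fall-through to graceful degradation come from the diagram.

```typescript
type Probe = () => Promise<boolean>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

/** Treats any probe error as "worker not reachable". */
async function isReachable(probe: Probe): Promise<boolean> {
  try {
    return await probe();
  } catch {
    return false;
  }
}

/** Polls the probe until it succeeds or the timeout elapses. */
async function waitForWorker(
  probe: Probe,
  timeoutMs: number = 3000,
  intervalMs: number = 100,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await isReachable(probe)) return true;
    await sleep(intervalMs);
  }
  return false;
}

/** Mirrors the flowchart: probe, spawn if needed, wait, then inject or degrade. */
async function startSession(
  probe: Probe,
  spawnWorker: () => void,
  injectContext: () => Promise<void>,
  timeoutMs: number = 3000,
  intervalMs: number = 100,
): Promise<"injected" | "degraded"> {
  if (!(await isReachable(probe))) {
    spawnWorker(); // start the background process
    if (!(await waitForWorker(probe, timeoutMs, intervalMs))) {
      return "degraded"; // continue the session without the worker
    }
  }
  await injectContext();
  return "injected";
}
```

Passing the probe and spawn functions in as parameters keeps the control flow testable without starting a real process.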

Worker Responsibilities

The worker process handles:

HTTP API -- RESTful endpoints for all CRUD operations
Web Dashboard -- Serves the React-based dashboard UI
SSE Streaming -- Real-time event streaming for the dashboard live feed
Embedding Operations -- Lazy initialization and batch processing of vector embeddings
Database Management -- Connection pooling and migration handling

API Endpoints

The worker exposes REST endpoints under http://127.0.0.1:3001/api/:

Endpoint	Method	Description
`/api/context/:project`	GET	Project context
`/api/search`	GET	Full-text search
`/api/hybrid-search`	GET	Hybrid vector + keyword search
`/api/timeline`	GET	Chronological timeline
`/api/observations/batch`	POST	Batch retrieve observations
`/api/observations`	POST	Store observation
`/api/summaries`	POST	Store summary
`/api/knowledge`	POST	Store knowledge
`/api/checkpoint`	GET	Latest project checkpoint
`/api/sessions/:id/checkpoint`	GET	Session checkpoint
`/api/report`	GET	Activity report
`/api/embeddings/stats`	GET	Embedding statistics
`/api/events`	GET	SSE event stream

Security

Kiro Memory is designed for local development use but includes multiple security layers:

Token Authentication

API requests to the worker require a valid authentication token. The token is generated during installation and stored in the local configuration. Hooks and the MCP server include the token automatically.

Rate Limiting

The Express-based worker uses express-rate-limit to protect against excessive API calls. This prevents runaway loops in hooks from overwhelming the system.

Helmet

The Helmet middleware sets secure HTTP headers:

Content-Security-Policy -- Restricts resource loading
X-Content-Type-Options: nosniff -- Prevents MIME sniffing
X-Frame-Options: DENY -- Prevents clickjacking
Strict-Transport-Security -- Enforces HTTPS (when applicable)

CORS

Cross-Origin Resource Sharing is restricted to localhost origins. The web dashboard is served from the same origin as the API, and external requests are rejected.
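
One way to express a localhost-only origin check is shown below. It is a sketch under assumptions: the isLocalOrigin name and the exact hostname list are hypothetical, and the page does not say how the worker treats requests that carry no Origin header (such as those from hooks), so the function only judges an origin that is present.

```typescript
// Hostnames treated as local; WHATWG URL keeps the brackets on IPv6 literals.
const LOCAL_HOSTNAMES = new Set(["localhost", "127.0.0.1", "[::1]"]);

/** Returns true only for http(s) origins whose host is the local machine. */
function isLocalOrigin(origin: string | undefined): boolean {
  if (!origin) return false;
  try {
    const url = new URL(origin);
    const isHttp = url.protocol === "http:" || url.protocol === "https:";
    return isHttp && LOCAL_HOSTNAMES.has(url.hostname);
  } catch {
    return false; // not a parseable URL
  }
}
```

Comparing the parsed hostname, rather than checking that the string starts with "http://localhost", rejects lookalike origins such as http://localhost.example.com.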

Input Validation

All API inputs are validated using Zod schemas. Invalid payloads are rejected before reaching the database layer. Content fields are sanitized with DOMPurify to prevent XSS in the web dashboard.

Local-Only Design

By default, the worker binds to 127.0.0.1 (localhost only). All data remains on your machine. No telemetry, no cloud services, no external API calls. Embedding generation runs locally using bundled models.
