Architecture
This page explains the internal architecture of Kiro Memory, covering data flow, the hooks system, search and ranking algorithms, storage design, and security.
System Overview
┌──────────────────────────────────────────────────────────┐
│ AI Coding Editor │
│ (Claude Code / Cursor / Windsurf / Cline) │
├──────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌──────────────────┐ │
│ │ Hooks / │ │ MCP Server │ │
│ │ Rules │ │ (10 tools) │ │
│ │ │ │ │ │
│ │ Capture: │ │ Search, Store, │ │
│ │ - Files │ │ Resume, Report │ │
│ │ - Commands │ │ │ │
│ │ - Decisions │ │ │ │
│ └──────┬───────┘ └────────┬─────────┘ │
│ │ │ │
└──────────┼───────────────────────────────┼───────────────┘
│ HTTP │
v v
┌─────────────────────────────────────────────┐
│ Kiro Memory Worker │
│ (port 3001) │
│ │
│ ┌───────────┐ ┌────────────────────┐ │
│ │ REST API │ │ Web Dashboard │ │
│ │ Endpoints │ │ (SSE live feed) │ │
│ └─────┬──────┘ └────────────────────┘ │
│ │ │
│ ┌─────┴──────────────────────────────┐ │
│ │ Service Layer │ │
│ │ │ │
│ │ ┌──────────┐ ┌───────────────┐ │ │
│ │ │ SQLite │ │ Vector Search │ │ │
│ │ │ + FTS5 │ │ (Embeddings) │ │ │
│ │ └──────────┘ └───────────────┘ │ │
│ └────────────────────────────────────┘ │
└─────────────────────────────────────────────┘
│
v
~/.kiro-memory/kiro-memory.db
Hooks System
Claude Code Hooks
Claude Code provides lifecycle hooks that Kiro Memory uses to automatically capture and inject context. Four hooks are registered:
Session Start Session End
│ │
v v
┌──────────┐ ┌──────────────┐ ┌──────────────┐
│ PreTool │ │ PostTool │ │ Stop │
│ Use │ │ Use │ │ │
│ │ │ │ │ - Summary │
│ Context │ │ Capture: │ │ - Checkpoint│
│ inject │ │ - Writes │ │ │
└──────────┘ │ - Commands │ └──────────────┘
│ - Research │
└──────────────┘
│
┌──────────────┐
│ Notification │
│ │
│ Store user │
│ prompts │
└──────────────┘
Hook Details
PreToolUse (agentSpawn)
- Triggers at session start
- Retrieves smart context from the SDK using 4-signal ranking
- Injects recent summaries and relevant observations into the session
- Ensures the background worker is running (spawns it if needed)
- Waits up to 3 seconds for worker availability
- Gracefully degrades if the worker is unreachable
PostToolUse (postToolUse)
- Triggers after every tool execution
- Normalizes tool events from different editor sources
- Classifies observations by type:
  - file-write -- File creation or modification
  - command -- Shell command execution
  - research -- Code search, grep, file reads
  - tool-use -- Other tool invocations
  - delegation -- Agent delegation events
- Filters out uninformative tools (introspect, thinking, todo)
- Tracks read-only tools (glob, grep, read) with file paths only
- Caps content at 500 characters per field
- Sends real-time notification to the dashboard
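A minimal TypeScript sketch of the PostToolUse classification step follows. The five observation types, the filtered tools, the path-only handling of read-only tools, and the 500-character cap come from the list above. The `ToolEvent` shape, the tool-name groupings, and the `toObservation` function are hypothetical and will not match the real hook's code.

```typescript
// Hypothetical shape of a normalized tool event; real editor payloads differ.
interface ToolEvent {
  toolName: string;
  input: string;
  filePath?: string;
}

type ObservationType = "file-write" | "command" | "research" | "tool-use" | "delegation";

interface Observation {
  type: ObservationType;
  title: string;
  content: string;
}

const MAX_FIELD_CHARS = 500; // per-field cap described above

// Hypothetical tool-name groupings, for illustration only.
const IGNORED = new Set(["introspect", "thinking", "todo"]);
const READ_ONLY = new Set(["glob", "grep", "read"]);
const WRITES = new Set(["write", "edit"]);
const COMMANDS = new Set(["bash", "shell"]);
const DELEGATION = new Set(["task", "delegate"]);

function cap(text: string): string {
  return text.length > MAX_FIELD_CHARS ? text.slice(0, MAX_FIELD_CHARS) : text;
}

// Returns null for uninformative tools, which are not stored at all.
function toObservation(event: ToolEvent): Observation | null {
  const tool = event.toolName.toLowerCase();
  if (IGNORED.has(tool)) return null;

  let type: ObservationType;
  if (WRITES.has(tool)) type = "file-write";
  else if (COMMANDS.has(tool)) type = "command";
  else if (READ_ONLY.has(tool)) type = "research";
  else if (DELEGATION.has(tool)) type = "delegation";
  else type = "tool-use";

  // Read-only tools are tracked by file path only, not by their input or output.
  const content = READ_ONLY.has(tool) ? (event.filePath ?? "") : event.input;
  return {
    type,
    title: cap(`${tool}: ${event.filePath ?? event.input}`),
    content: cap(content),
  };
}
```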
Notification (userPromptSubmit)
- Triggers when the user submits a prompt
- Stores the prompt text with session association
- Enables prompt history for context reconstruction
Stop
- Triggers when the agent completes its response
- Retrieves up to 50 recent observations (by session or 4-hour window)
- Generates a session summary containing:
  - Completed tasks (up to 10, from observation titles)
  - Modified files (deduplicated)
  - Learned insights (up to 5, from research observations)
- Creates a structured checkpoint with task, progress, next steps, and relevant files (up to 20)
- Sends real-time notifications for both summary and checkpoint creation
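The summary step can be sketched as a pure function over recent observations. The limits (50 observations, 10 tasks, 5 insights, 20 checkpoint files) come from the description above; the `StoredObservation` shape, the choice to count only `file-write` observations as modified files, and the function names are assumptions.

```typescript
// Hypothetical subset of an observations row.
interface StoredObservation {
  type: string; // e.g. "file-write", "research"
  title: string;
  files: string[];
  createdAt: number; // epoch milliseconds
}

interface SessionSummary {
  completed: string[];
  filesModified: string[];
  learned: string[];
  relevantFiles: string[]; // used for the checkpoint
}

const MAX_OBSERVATIONS = 50;
const MAX_COMPLETED = 10;
const MAX_LEARNED = 5;
const MAX_CHECKPOINT_FILES = 20;

// Deduplicate while keeping first-seen order (Set preserves insertion order).
function uniq(values: string[]): string[] {
  return Array.from(new Set(values));
}

function summarizeSession(observations: StoredObservation[]): SessionSummary {
  // Newest first, capped at 50 observations.
  const recent = observations
    .slice()
    .sort((a, b) => b.createdAt - a.createdAt)
    .slice(0, MAX_OBSERVATIONS);

  const completed = recent.slice(0, MAX_COMPLETED).map((o) => o.title);
  // Assumption: only file-write observations count as modified files.
  const filesModified = uniq(
    recent.filter((o) => o.type === "file-write").flatMap((o) => o.files),
  );
  const learned = recent
    .filter((o) => o.type === "research")
    .slice(0, MAX_LEARNED)
    .map((o) => o.title);
  const relevantFiles = uniq(recent.flatMap((o) => o.files)).slice(0, MAX_CHECKPOINT_FILES);

  return { completed, filesModified, learned, relevantFiles };
}
```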
Rules-Based Editors (Cursor, Windsurf, Cline)
Editors without native hook support use rules/instructions files that direct the AI assistant to call MCP tools at appropriate moments:
- Session start: call get_context or resume_session
- During the session: call store_observation and store_knowledge as needed
- Session end: call store_summary with learnings and next steps
Vector Search
Kiro Memory provides semantic search using locally-generated vector embeddings. No external API keys are required.
Embedding Pipeline
Observation Text
│
v
┌─────────────────┐
│ Embedding Model │ Local inference
│ (fastembed or │ No API keys
│ transformers) │
└────────┬─────────┘
│
v
384-dim vector
│
v
┌─────────────────┐
│ SQLite Storage │ Stored alongside
│ (embeddings col) │ observation data
└─────────────────┘
Embedding Providers
Kiro Memory supports two local embedding providers (listed as optional dependencies):
- fastembed -- Fast native embedding generation
- @huggingface/transformers -- HuggingFace transformer models
The embedding service is lazily initialized on first use and runs entirely locally.
Search Modes
| Mode | Method | How It Works |
|---|---|---|
| Keyword | search() | SQLite FTS5 full-text search with BM25 ranking |
| Semantic | semanticSearch() | Cosine similarity between query and observation embeddings |
| Hybrid | hybridSearch() | Combines FTS5 keyword scores with vector similarity scores |
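Hybrid search generally gives the best results: keyword matching catches exact terms such as identifiers and error messages, while vector similarity catches semantically related content that uses different wording.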
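The three modes differ mainly in how scores are produced and combined. Below is a sketch of cosine similarity plus one possible hybrid merge. The weighted-sum formula, the `alpha` weight, the normalization, and the `hybridRank` name are hypothetical; the document does not specify how Kiro Memory fuses the two scores. One real detail matters: SQLite FTS5's `bm25()` returns smaller (typically negative) values for better matches, so it must be negated before it can be added to a similarity score.

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Candidate {
  id: number;
  bm25?: number; // FTS5 bm25() value (lower = better); undefined if no keyword hit
  similarity?: number; // cosine similarity to the query; undefined if not embedded
}

// Hypothetical merge: weighted sum of a normalized keyword score and a vector score.
function hybridRank(candidates: Candidate[], alpha = 0.5): { id: number; score: number }[] {
  // Flip bm25 so that higher is better, then scale by the best keyword hit.
  const keywordRaw = candidates.map((c) => (c.bm25 === undefined ? 0 : Math.max(0, -c.bm25)));
  const best = Math.max(0, ...keywordRaw);
  return candidates
    .map((c, i) => {
      const keyword = best > 0 ? keywordRaw[i] / best : 0;
      const vector = Math.max(0, c.similarity ?? 0); // ignore negative similarity
      return { id: c.id, score: alpha * keyword + (1 - alpha) * vector };
    })
    .sort((x, y) => y.score - x.score);
}
```

With `alpha = 1` the merge degenerates to keyword-only ranking and with `alpha = 0` to semantic-only ranking, which makes the weight easy to tune against real queries.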
4-Signal Ranking
The smart context system ranks every memory item using four signals to surface the most relevant context:
┌──────────────────────────────────────────────┐
│ Smart Ranking Score │
│ │
│ Score = w1 * Recency │
│ + w2 * Frequency │
│ + w3 * Semantic Similarity │
│ + w4 * Decay Penalty │
│ │
└──────────────────────────────────────────────┘
Signal Definitions
1. Recency
How recently the observation was created. Recent observations receive higher scores, with exponential decay over time.
recency_score = exp(-age_hours / half_life)
2. Frequency
How often the observation's associated files or concepts appear across the memory. Frequently referenced items are more likely to be relevant.
3. Semantic Similarity
When a query is provided (e.g., during getSmartContext with a query parameter), observations are scored by cosine similarity between their embedding vector and the query embedding.
4. Decay Penalty
Observations linked to files that have been modified since capture are penalized. This prevents stale information from surfacing. Observations that have never been accessed also receive a lower score over time.
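The four signals can be combined as the formula above describes. In this sketch the weights, the one-week `half_life`, the frequency normalization, and the shape of the decay penalty are all hypothetical; only the weighted-sum structure and `exp(-age_hours / half_life)` come from the text. The penalty is modeled as a value of zero or less, so that adding `w4 * Decay Penalty` with a positive weight lowers the score.

```typescript
interface RankInput {
  ageHours: number;
  referenceCount: number; // how often this item's files/concepts appear in memory
  maxReferenceCount: number; // largest referenceCount among the candidates
  similarity?: number; // cosine similarity to the query, when a query is given
  stale: boolean; // a linked file was modified after capture
  accessCount: number;
}

interface Weights {
  recency: number;
  frequency: number;
  similarity: number;
  decay: number;
}

// Hypothetical values, for illustration only.
const DEFAULT_WEIGHTS: Weights = { recency: 0.3, frequency: 0.2, similarity: 0.4, decay: 0.1 };
const HALF_LIFE_HOURS = 24 * 7;

function recencyScore(ageHours: number, halfLife: number = HALF_LIFE_HOURS): number {
  return Math.exp(-ageHours / halfLife);
}

// 0 means no penalty; negative values pull the score down.
function decayPenalty(item: RankInput): number {
  let penalty = 0;
  if (item.stale) penalty -= 1;
  // Hypothetical: never-accessed items are penalized more as they age.
  if (item.accessCount === 0) penalty -= 1 - recencyScore(item.ageHours);
  return penalty;
}

function smartScore(item: RankInput, w: Weights = DEFAULT_WEIGHTS): number {
  const frequency = item.maxReferenceCount > 0 ? item.referenceCount / item.maxReferenceCount : 0;
  return (
    w.recency * recencyScore(item.ageHours) +
    w.frequency * frequency +
    w.similarity * (item.similarity ?? 0) +
    w.decay * decayPenalty(item)
  );
}
```

With `exp(-age_hours / half_life)` the recency score falls to 1/e (about 0.37), not 0.5, when `age_hours` equals `half_life`, so the parameter behaves as a time constant; `Math.pow(0.5, ageHours / halfLife)` would give a true half-life.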
Token Budget
The getSmartContext() method accepts a tokenBudget parameter (default: 2000 tokens) that limits how much context is injected. Items are ranked by their composite score and included until the budget is exhausted. This prevents context overflow in editors with limited prompt windows.
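A greedy loop over the ranked items is enough to honor the budget. The 4-characters-per-token estimate is an assumption (a real implementation might use a tokenizer), as are the `ContextItem` shape and the `packContext` name; the 2000-token default comes from the text.

```typescript
interface ContextItem {
  id: number;
  text: string;
  score: number; // composite ranking score
}

// Hypothetical heuristic: roughly 4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function packContext(items: ContextItem[], tokenBudget = 2000): ContextItem[] {
  const ranked = items.slice().sort((a, b) => b.score - a.score);
  const selected: ContextItem[] = [];
  let used = 0;
  for (const item of ranked) {
    const cost = estimateTokens(item.text);
    // Budget exhausted: stop at the first item that does not fit.
    if (used + cost > tokenBudget) break;
    selected.push(item);
    used += cost;
  }
  return selected;
}
```

Stopping at the first item that does not fit keeps the injected context strictly in score order. A variant that skips oversized items and keeps scanning would use the budget more fully, at the cost of including some lower-ranked items while a higher-ranked one is left out.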
Storage Architecture
SQLite with FTS5
All data is stored in a single SQLite database at ~/.kiro-memory/kiro-memory.db. SQLite was chosen for:
- Zero-configuration setup
- Single-file portability
- Excellent read performance for local workloads
- FTS5 extension for full-text search
Database Schema
The database contains the following core tables:
┌─────────────────┐ ┌─────────────────┐
│ observations │ │ summaries │
│ │ │ │
│ id (PK) │ │ id (PK) │
│ type │ │ project │
│ title │ │ session_id │
│ content │ │ request │
│ project │ │ learned │
│ session_id │ │ completed │
│ concepts │ │ next_steps │
│ files │ │ created_at │
│ embedding │ └─────────────────┘
│ stale │
│ access_count │ ┌─────────────────┐
│ last_accessed │ │ checkpoints │
│ created_at │ │ │
└─────────────────┘ │ id (PK) │
│ session_id │
┌─────────────────┐ │ project │
│ sessions │ │ task │
│ │ │ progress │
│ id (PK) │ │ next_steps │
│ content_id │ │ questions │
│ project │ │ files │
│ status │ │ created_at │
│ created_at │ └─────────────────┘
│ completed_at │
└─────────────────┘ ┌─────────────────┐
│ user_prompts │
┌─────────────────┐ │ │
│ observations_fts│ │ id (PK) │
│ (FTS5 virtual) │ │ session_id │
│ │ │ prompt_number │
│ title │ │ text │
│ content │ │ created_at │
│ concepts │ └─────────────────┘
└─────────────────┘
FTS5 Virtual Tables
Full-text search is powered by SQLite FTS5 virtual tables that index observation titles, content, and concepts. FTS5 provides:
- BM25 relevance ranking
- Prefix queries
- Boolean operators (AND, OR, NOT)
- Phrase matching
- Column-specific search
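Because raw user input can contain FTS5 operators or stray quotes, it is safer to quote each term before passing it to `MATCH`. This sketch follows FTS5's documented query syntax (double-quoted strings with embedded quotes doubled, `*` for prefix queries, `column : phrase` for column filters, uppercase `AND`). The `buildFtsQuery` helper and its options are hypothetical; the column names are taken from the observations_fts table above.

```typescript
// Quote a user-supplied term as an FTS5 string: wrap it in double quotes and
// double any embedded quotes, so operators and punctuation match literally.
function quoteFtsTerm(term: string): string {
  return `"${term.replace(/"/g, '""')}"`;
}

// Hypothetical options for the illustrative query builder.
interface FtsQueryOptions {
  prefix?: boolean; // turn every term into a prefix query
  column?: "title" | "content" | "concepts"; // restrict matching to one column
}

function buildFtsQuery(input: string, opts: FtsQueryOptions = {}): string {
  const terms = input.split(/\s+/).filter((t) => t.length > 0);
  return terms
    .map((t) => {
      let expr = quoteFtsTerm(t);
      if (opts.prefix) expr += " *";
      if (opts.column) expr = `${opts.column} : ${expr}`;
      return expr;
    })
    .join(" AND ");
}
```

The result would be bound as a parameter, for example in `SELECT rowid FROM observations_fts WHERE observations_fts MATCH ? ORDER BY bm25(observations_fts)`, where ascending `bm25()` order puts the best matches first. An empty result means there is nothing to search for, and the query should be skipped.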
Vector Storage
Embedding vectors are stored directly in the observations table as binary blobs. This avoids the overhead of a separate vector database while still supporting cosine similarity search. The vector index is rebuilt in memory when the embedding service initializes.
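Storing a 384-dimensional `Float32Array` as a BLOB takes 384 * 4 = 1,536 bytes per observation. A minimal encode/decode pair might look like this. The function names are hypothetical, and the bytes use the host's native byte order (little-endian on virtually all current hardware), so a database written this way is not guaranteed to be portable to a big-endian machine.

```typescript
const EMBEDDING_DIM = 384; // vector size from the embedding pipeline above

// Encode an embedding as raw bytes for a BLOB column (4 bytes per float).
function embeddingToBlob(vec: Float32Array): Uint8Array {
  if (vec.length !== EMBEDDING_DIM) {
    throw new Error(`expected ${EMBEDDING_DIM} dimensions, got ${vec.length}`);
  }
  // View the float bytes, then copy so the result owns exactly these bytes.
  return new Uint8Array(vec.buffer, vec.byteOffset, vec.byteLength).slice();
}

// Decode a BLOB back into an embedding.
function blobToEmbedding(blob: Uint8Array): Float32Array {
  if (blob.byteLength !== EMBEDDING_DIM * 4) {
    throw new Error(`expected ${EMBEDDING_DIM * 4} bytes, got ${blob.byteLength}`);
  }
  // new Uint8Array(typedArray) copies into a fresh buffer at offset 0,
  // which satisfies Float32Array's 4-byte alignment requirement.
  const copy = new Uint8Array(blob);
  return new Float32Array(copy.buffer, 0, EMBEDDING_DIM);
}
```

Decoding copies first because a BLOB read back through a Node SQLite driver may be a `Buffer` (a `Uint8Array` subclass) sitting at an arbitrary offset inside a shared memory pool, while a `Float32Array` view needs a 4-byte-aligned offset. `Buffer`'s own `slice()` returns a view rather than a copy, which is why the constructor is used instead.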
Worker Architecture
Auto-Start Mechanism
The worker is a standalone Node.js process that runs in the background:
Editor Session Starts
│
v
Hook: agentSpawn
│
v
Is worker reachable? ──── Yes ───> Inject context
│
No
│
v
Spawn worker-service.js
(background process)
│
v
Wait up to 3s
│
v
Worker ready? ──── No ───> Graceful degradation
│ (continue without worker)
Yes
│
v
Inject context
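The auto-start flow above amounts to a probe, a spawn, and a bounded polling loop. In this sketch the health check and the spawn function are injected, so the real worker-service.js launch (for example as a detached child process) and the endpoint used for the reachability check are left out and should be treated as hypothetical; the 3-second default comes from the text.

```typescript
type HealthCheck = () => Promise<boolean>;

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// A failing or throwing health check counts as "not reachable".
async function probe(isReachable: HealthCheck): Promise<boolean> {
  try {
    return await isReachable();
  } catch {
    return false;
  }
}

// Resolves true once the worker answers, or false after timeoutMs.
async function ensureWorker(
  isReachable: HealthCheck, // hypothetical: e.g. an HTTP request to port 3001
  spawnWorker: () => void, // hypothetical: e.g. launch worker-service.js in the background
  timeoutMs = 3000,
  pollMs = 100,
): Promise<boolean> {
  if (await probe(isReachable)) return true; // already running
  spawnWorker();
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    await sleep(pollMs);
    if (await probe(isReachable)) return true;
  }
  return false;
}
```

A `false` result is the graceful-degradation branch: the hook continues without injecting context instead of failing the editor session.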
Worker Responsibilities
The worker process handles:
- HTTP API -- RESTful endpoints for all CRUD operations
- Web Dashboard -- Serves the React-based dashboard UI
- SSE Streaming -- Real-time event streaming for the dashboard live feed
- Embedding Operations -- Lazy initialization and batch processing of vector embeddings
- Database Management -- Connection pooling and migration handling
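The dashboard's live feed uses Server-Sent Events, whose wire format is plain text: optional `id:` and `event:` fields, one `data:` line per line of payload, and a blank line ending each event. A small serializer is sketched below; the function name, and the event name and payload used in the test, are hypothetical.

```typescript
// Serialize one Server-Sent Event in text/event-stream format.
function formatSseEvent(event: string, payload: unknown, id?: number): string {
  const lines: string[] = [];
  if (id !== undefined) lines.push(`id: ${id}`); // lets a reconnecting client send Last-Event-ID
  lines.push(`event: ${event}`);
  // JSON.stringify emits no raw newlines by default, but split anyway so a
  // multi-line payload would still become valid "data:" lines.
  const body = JSON.stringify(payload) ?? "null";
  for (const line of body.split("\n")) lines.push(`data: ${line}`);
  return lines.join("\n") + "\n\n"; // the blank line terminates the event
}
```

The /api/events response would also need the `Content-Type: text/event-stream` header, and a browser client would subscribe with `new EventSource(...)` plus `addEventListener` for each event name.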
API Endpoints
The worker exposes REST endpoints under http://127.0.0.1:3001/api/:
| Endpoint | Method | Description |
|---|---|---|
| /api/context/:project | GET | Project context |
| /api/search | GET | Full-text search |
| /api/hybrid-search | GET | Hybrid vector + keyword search |
| /api/timeline | GET | Chronological timeline |
| /api/observations/batch | POST | Batch retrieve observations |
| /api/observations | POST | Store observation |
| /api/summaries | POST | Store summary |
| /api/knowledge | POST | Store knowledge |
| /api/checkpoint | GET | Latest project checkpoint |
| /api/sessions/:id/checkpoint | GET | Session checkpoint |
| /api/report | GET | Activity report |
| /api/embeddings/stats | GET | Embedding statistics |
| /api/events | GET | SSE event stream |
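Clients address these endpoints with plain URLs. The hypothetical `apiUrl` helper below fills in path parameters such as `:project` and `:id` and appends a query string. The query parameter names used in the test (`q`, `limit`) are hypothetical, since the table does not list them.

```typescript
const BASE_URL = "http://127.0.0.1:3001/api";

// Substitute ":name" path parameters (URI-encoded) and append query parameters.
function apiUrl(
  path: string,
  pathParams: Record<string, string> = {},
  query: Record<string, string | number> = {},
): string {
  const resolved = path.replace(/:([A-Za-z_]+)/g, (_match: string, name: string) => {
    const value = pathParams[name];
    if (value === undefined) throw new Error(`missing path parameter: ${name}`);
    return encodeURIComponent(value);
  });
  const qs = Object.entries(query)
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(String(v))}`)
    .join("&");
  return `${BASE_URL}${resolved}${qs ? `?${qs}` : ""}`;
}
```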
Security
Kiro Memory is designed for local development use but includes multiple security layers:
Token Authentication
API requests to the worker require a valid authentication token. The token is generated during installation and stored in the local configuration. Hooks and the MCP server include the token automatically.
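A token check is typically a small step in front of every route. The sketch assumes the token arrives as `Authorization: Bearer <token>`, which is a hypothetical choice; the document does not say which header Kiro Memory uses. The comparison avoids an early exit so that its timing does not reveal how much of a guessed token was correct.

```typescript
// Compare two strings without exiting early on the first mismatch, so response
// time does not depend on how many leading characters of a guess were right.
function constantTimeEquals(candidate: string, expected: string): boolean {
  let diff = candidate.length ^ expected.length;
  for (let i = 0; i < expected.length; i++) {
    // charCodeAt past the end returns NaN; `|| 0` turns that into 0.
    diff |= (candidate.charCodeAt(i) || 0) ^ expected.charCodeAt(i);
  }
  return diff === 0;
}

// Hypothetical header format: "Authorization: Bearer <token>".
function isAuthorized(authorizationHeader: string | undefined, expectedToken: string): boolean {
  if (!authorizationHeader) return false;
  const match = /^Bearer\s+(\S+)$/.exec(authorizationHeader);
  if (match === null) return false;
  return constantTimeEquals(match[1] ?? "", expectedToken);
}
```

In Node, `crypto.timingSafeEqual` gives the same guarantee for equal-length buffers and is preferable in production; the hand-rolled loop only keeps the sketch dependency-free.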
Rate Limiting
The Express-based worker uses express-rate-limit to protect against excessive API calls. This prevents runaway loops in hooks from overwhelming the system.
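The worker delegates this to express-rate-limit. To show the underlying idea without the library, here is a hand-rolled fixed-window counter. It is an illustration only, not express-rate-limit's API or implementation, and the limit and window values in the test are arbitrary.

```typescript
// Illustrative fixed-window limiter; NOT express-rate-limit's API or internals.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // true = request allowed; false = this key's quota is used up until the window ends.
  allow(key: string, now: number = Date.now()): boolean {
    const w = this.windows.get(key);
    if (w === undefined || now - w.start >= this.windowMs) {
      this.windows.set(key, { start: now, count: 1 }); // start a fresh window
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count += 1;
    return true;
  }
}
```

A hook stuck in a loop exhausts its quota within one window and then receives rejections (HTTP 429 in an Express setup) until the window resets, which caps how fast a runaway loop can write to the database.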
Helmet
The Helmet middleware sets secure HTTP headers:
- Content-Security-Policy -- Restricts resource loading
- X-Content-Type-Options: nosniff -- Prevents MIME sniffing
- X-Frame-Options: DENY -- Prevents clickjacking
- Strict-Transport-Security -- Enforces HTTPS (when applicable)
CORS
Cross-Origin Resource Sharing is restricted to localhost origins. The web dashboard is served from the same origin as the API, so it needs no CORS allowance, and cross-origin requests from non-localhost origins are rejected.
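Restricting CORS to localhost comes down to an origin check. This sketch accepts http or https origins on localhost, 127.0.0.1, or [::1] with any port. Treating requests without an `Origin` header as allowed is an assumption, based on the fact that browsers omit that header on same-origin GET requests and non-browser clients such as hooks and the MCP server usually do not send it.

```typescript
// Loopback hosts only, any port; the trailing anchor rejects look-alike hosts.
const LOCAL_ORIGIN = /^https?:\/\/(localhost|127\.0\.0\.1|\[::1\])(:\d{1,5})?$/;

function isAllowedOrigin(origin: string | undefined): boolean {
  // Hypothetical policy: no Origin header means a same-origin GET or a non-browser client.
  if (origin === undefined) return true;
  return LOCAL_ORIGIN.test(origin);
}
```

The anchored pattern matters: a prefix check such as `origin.startsWith("http://localhost")` would also accept `http://localhost.evil.com`.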
Input Validation
All API inputs are validated using Zod schemas. Invalid payloads are rejected before reaching the database layer. Content fields are sanitized with DOMPurify to prevent XSS in the web dashboard.
Local-Only Design
By default, the worker binds to 127.0.0.1 (localhost only). All data remains on your machine. No telemetry, no cloud services, no external API calls. Embedding generation runs locally using bundled models.