docs: file_uuid generation rules for M4

2026-05-17 02:26:09 +08:00
parent 3a6c186575
commit eec2eea880
79 changed files with 23293 additions and 0 deletions
--- a/docs_v1.0/DESIGN/API_KEY_DESIGN.md
+++ b/docs_v1.0/DESIGN/API_KEY_DESIGN.md
@@ -0,0 +1,731 @@
+---
+document_type: "reference_doc"
+service: "MOMENTRY_CORE"
+title: "Momentry API Key Management System Design"
+date: "2026-03-21"
+version: "V1.0"
+status: "active"
+owner: "Warren"
+created_by: "OpenCode"
+tags:
+  - "momentry"
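+  - "management-system-design"
+ai_query_hints:
+  - "Look up the contents of the Momentry API Key Management System Design"
+  - "What is the main purpose of the Momentry API Key Management System Design?"
+  - "How do I operate or implement the Momentry API Key Management System Design?"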
+---
+
+# Momentry API Key Management System Design
+
+| Item | Value |
+|------|------|
+| Created by | Warren |
+| Created on | 2026-03-21 |
+| Document version | V1.2 |
+
+---
+
+## Version History
+
+| Version | Date | Purpose | Operator | Tool/Model |
+|------|------|------|--------|-----------|
+| V1.0 | 2026-03-18 | Created document | Warren | OpenCode / MiniMax M2.5 |
+| V1.1 | 2026-03-20 | Added key types and management workflow | Warren | OpenCode |
+| V1.2 | 2026-03-21 | Updated API key format and validation flow | Warren | OpenCode |
+
+---
+
+**Status**: In development
+
+---
+
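+## 1. Overview
+
+### 1.1 Goals
+
+Establish a secure API key management mechanism that supports:
+- Multiple API key types (system, user, service)
+- Automatic expiration and rotation
+- Anomalous-usage detection
+- Forced-update mechanism
+- Complete audit logging
+- Gitea token integration
+- n8n API key integration
+
+### 1.2 Design Principles
+
+| Principle | Description |
+|------|------|
+| Least privilege | Each key is granted only the permissions it needs |
+| Regular rotation | Automatic expiration forces renewal |
+| Traceable and auditable | Every operation is logged |
+| Separate storage | Keys are stored separately from user data |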
+
+---
+
+## 2. API Key Types
+
+### 2.1 Key Type Matrix
+
+| Type | Prefix | Purpose | Default validity | Rotation |
+|------|------|------|------------|----------|
+| `system` | `msys_` | Internal system services | 365 days | Manual |
+| `user` | `muser_` | Individual users | 90 days | Automatic |
+| `service` | `msvc_` | Service-to-service communication | 180 days | Automatic |
+| `integration` | `mint_` | Third-party integrations | 30 days | Forced update |
+| `emergency` | `memg_` | Emergency access | 24 hours | One-time |
+
+### 2.2 Key Format
+
+```
+{prefix}{uuid_v4}_{timestamp}_{checksum}
+```
+
+**Example:**
+```
+msys_a1b2c3d4-e5f6-7890-abcd-ef1234567890_1710998400_sha256
+```
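A minimal sketch of assembling a key in the format above. The design does not specify how the checksum is computed, so this assumes it is the first 8 hex characters of a SHA-256 digest over the preceding fields; `generate_api_key`, `checksum_ok`, and the 8-character truncation are hypothetical names and choices, not part of the documented system.

```python
import hashlib
import time
import uuid

# Prefixes taken from the key type matrix in section 2.1.
PREFIXES = {
    "system": "msys_",
    "user": "muser_",
    "service": "msvc_",
    "integration": "mint_",
    "emergency": "memg_",
}


def generate_api_key(key_type, now=None):
    """Build '{prefix}{uuid_v4}_{timestamp}_{checksum}' for the given key type."""
    prefix = PREFIXES[key_type]
    timestamp = int(now if now is not None else time.time())
    body = f"{prefix}{uuid.uuid4()}_{timestamp}"
    # Assumption: checksum = first 8 hex chars of SHA-256 over the fields before it.
    checksum = hashlib.sha256(body.encode("utf-8")).hexdigest()[:8]
    return f"{body}_{checksum}"


def checksum_ok(key):
    """Recompute the checksum over everything before the last underscore."""
    body, _, checksum = key.rpartition("_")
    return hashlib.sha256(body.encode("utf-8")).hexdigest()[:8] == checksum
```

A checksum of this kind only catches typos and truncated keys before a database lookup; it is not a substitute for comparing the stored hash.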
+
+---
+
+## 3. Database Schema
+
+### 3.1 `api_keys` table
+
+```sql
+CREATE TABLE api_keys (
+    id              BIGSERIAL PRIMARY KEY,
+    key_id         VARCHAR(64) UNIQUE NOT NULL,  -- public key ID
+    key_hash       VARCHAR(128) NOT NULL,       -- SHA256 hash
+    key_prefix     VARCHAR(8) NOT NULL,         -- key prefix
+    name           VARCHAR(128) NOT NULL,        -- key name
+    key_type       VARCHAR(32) NOT NULL,        -- system/user/service/integration/emergency
+    user_id        BIGINT,                      -- associated user (nullable for system)
+    service_name   VARCHAR(64),                  -- service name (for service keys)
+    permissions     JSONB NOT NULL DEFAULT '[]', -- permission list
+    expires_at     TIMESTAMP,                   -- expiration time
+    last_used_at   TIMESTAMP,                   -- last used time
+    last_used_ip   VARCHAR(45),                 -- last used IP
+    usage_count    BIGINT DEFAULT 0,            -- usage count
+    status         VARCHAR(16) DEFAULT 'active', -- active/suspended/expired/revoked
+    rotation_required BOOLEAN DEFAULT FALSE,     -- forced-rotation flag
+    rotation_reason VARCHAR(256),               -- rotation reason
+    created_at     TIMESTAMP DEFAULT NOW(),
+    updated_at     TIMESTAMP DEFAULT NOW()
+);
+
+CREATE INDEX idx_api_keys_key_id ON api_keys(key_id);
+CREATE INDEX idx_api_keys_user_id ON api_keys(user_id);
+CREATE INDEX idx_api_keys_type ON api_keys(key_type);
+CREATE INDEX idx_api_keys_status ON api_keys(status);
+CREATE INDEX idx_api_keys_expires ON api_keys(expires_at);
+```
+
+### 3.2 `api_key_audit_log` table
+
+```sql
+CREATE TABLE api_key_audit_log (
+    id              BIGSERIAL PRIMARY KEY,
+    key_id         VARCHAR(64) NOT NULL,
+    action         VARCHAR(32) NOT NULL,         -- created/used/rotated/revoked/expired/suspended
+    actor          VARCHAR(64),                   -- actor (user_id or 'system')
+    ip_address     VARCHAR(45),
+    user_agent     VARCHAR(512),
+    request_path   VARCHAR(256),
+    response_code  INTEGER,
+    details        JSONB,
+    created_at     TIMESTAMP DEFAULT NOW()
+);
+
+CREATE INDEX idx_audit_key_id ON api_key_audit_log(key_id);
+CREATE INDEX idx_audit_action ON api_key_audit_log(action);
+CREATE INDEX idx_audit_created ON api_key_audit_log(created_at);
+```
+
+### 3.3 `api_key_rotation_log` table
+
+```sql
+CREATE TABLE api_key_rotation_log (
+    id                  BIGSERIAL PRIMARY KEY,
+    key_id             VARCHAR(64) NOT NULL,
+    old_key_id         VARCHAR(64),
+    new_key_id         VARCHAR(64),
+    rotation_type      VARCHAR(32) NOT NULL,     -- scheduled/manual/forced/emergency
+    reason             VARCHAR(256),
+    triggered_by       VARCHAR(64),              -- system/user/scheduler
+    grace_period_end   TIMESTAMP,               -- end of the grace period
+    created_at         TIMESTAMP DEFAULT NOW()
+);
+```
+
+---
+
+## 4. API Key State Machine
+
+```
+                    ┌──────────────┐
+                    │   created    │
+                    └──────┬───────┘
+                           │
+                           ▼
+              ┌────────────────────┐
+              │      active        │◄─────────────┐
+              └─────────┬──────────┘              │
+                        │                          │
+          ┌─────────────┼─────────────┐            │
+          │             │             │            │
+          ▼             ▼             ▼            │
+    ┌──────────┐ ┌──────────┐ ┌──────────┐     │
+    │ suspended │ │ expired  │ │ revoked  │─────┘
+    └──────────┘ └──────────┘ └──────────┘
+```
+
+### State Transition Rules
+
+| From | To | Trigger |
+|----|----|----------|
+| created | active | Key is activated |
+| active | suspended | Anomalous usage detected |
+| active | expired | Expiration time reached |
+| active | revoked | Manual revocation |
+| suspended | active | Lock lifted |
+| suspended | revoked | Anomaly confirmed |
+| expired | active | Re-enabled |
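
The transition table can be enforced with a small lookup. This is a sketch only: `ALLOWED_TRANSITIONS`, `can_transition`, and `transition` are hypothetical names, and `revoked` is treated as terminal on the assumption that the table, which lists no transition out of it, is authoritative.

```python
# Allowed transitions, copied from the state transition table.
ALLOWED_TRANSITIONS = {
    "created": {"active"},
    "active": {"suspended", "expired", "revoked"},
    "suspended": {"active", "revoked"},
    "expired": {"active"},
    "revoked": set(),  # assumption: terminal state, no row leaves it
}


def can_transition(current, target):
    """Return True if the table allows moving from current to target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


def transition(current, target):
    """Return the new state, or raise if the move is not in the table."""
    if not can_transition(current, target):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target
```

Keeping the rules in one mapping means the CLI, the scheduler, and the anomaly detector can all reject the same illegal moves.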
+
+---
+
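+## 5. Anomaly Detection
+
+### 5.1 Anomaly Indicators
+
+| Indicator | Threshold | Action |
+|------|------|------|
+| Requests per minute | > 1000 | Warn |
+| Requests per hour | > 10000 | Lock |
+| Error rate | > 50% | Warn |
+| Distinct IPs | > 5 per hour | Warn |
+| Off-hours usage | Late-night requests | Warn |
+| Abnormal pattern | Brute force | Lock |
+
+### 5.2 Anomaly Handling Flow
+
+```
+Anomaly detected
+    │
+    ▼
+┌─────────┐
+│ Analyze │──→ Exclude normal traffic
+└────┬────┘
+     │
+     ▼
+┌─────────┐
+│ Assess  │──→ Minor → Warn
+└────┬────┘
+     │
+     ▼
+┌─────────┐
+│ Respond │──→ Severe → Lock + rotate
+└─────────┘
+```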
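The thresholds in section 5.1 can be turned into a single decision function. The sketch below is illustrative: `KeyUsage` and `evaluate` are hypothetical names, and the off-hours indicator is left out because the document does not define which hours count as late-night.

```python
from dataclasses import dataclass


@dataclass
class KeyUsage:
    requests_per_minute: int
    requests_per_hour: int
    error_rate: float            # fraction between 0.0 and 1.0
    distinct_ips_per_hour: int
    brute_force_suspected: bool = False


def evaluate(usage):
    """Map usage metrics to 'lock', 'warn', or 'ok' using the 5.1 thresholds."""
    # Lock conditions take priority over warnings.
    if usage.requests_per_hour > 10000 or usage.brute_force_suspected:
        return "lock"
    if (
        usage.requests_per_minute > 1000
        or usage.error_rate > 0.5
        or usage.distinct_ips_per_hour > 5
    ):
        return "warn"
    return "ok"
```

A `lock` result corresponds to the severe branch of the flow above (suspend the key and schedule rotation); `warn` corresponds to the minor branch.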
+
+---
+
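+## 6. Forced Update Mechanism
+
+### 6.1 Trigger Conditions
+
+| Condition | Severity | Action |
+|------|--------|------|
+| Suspected leak | High | Disable immediately + force rotation |
+| Anomalous usage | Medium | Warn + recommend rotation |
+| Planned maintenance | Low | Notify + schedule rotation |
+| Policy requirement | High | Force rotation |
+| Expiration | Low | Disable + notify |
+
+### 6.2 Forced Rotation Flow
+
+```
+1. System detects that a forced update is required
+         │
+         ▼
+2. Create the new key (the old key stays valid during the grace period)
+         │
+         ▼
+3. Send notifications (Email/Slack/Redis PubSub)
+         │
+         ▼
+4. Grace period starts (default 24 hours)
+         │
+         ├── Updated within the grace period → rotation complete
+         │
+         └── Grace period ends → old key disabled
+```
+
+### 6.3 Grace Period Configuration
+
+| Key type | Grace period |
+|----------|--------|
+| system | 72 hours |
+| user | 24 hours |
+| service | 48 hours |
+| integration | 24 hours |
+| emergency | 0 hours |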
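
The grace-period table maps directly to the `grace_period_end` column of `api_key_rotation_log`. A minimal sketch, assuming the grace period starts at the moment the new key is created; `grace_period_end` and `old_key_usable` as Python function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Grace periods in hours, copied from section 6.3.
GRACE_HOURS = {
    "system": 72,
    "user": 24,
    "service": 48,
    "integration": 24,
    "emergency": 0,
}


def grace_period_end(key_type, rotated_at):
    """Timestamp after which the old key must stop working."""
    return rotated_at + timedelta(hours=GRACE_HOURS[key_type])


def old_key_usable(key_type, rotated_at, now):
    """True while the old key is still inside its grace period."""
    return now < grace_period_end(key_type, rotated_at)


if __name__ == "__main__":
    rotated = datetime(2026, 3, 21, tzinfo=timezone.utc)
    print(grace_period_end("system", rotated).isoformat())
```

With a 0-hour grace period, an `emergency` key is unusable from the instant it is rotated, which matches its one-time nature.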
+
+---
+
+## 7. CLI Management Commands
+
+### 7.1 Command List
+
+```bash
+# Key management
+momentry api-key create --name "My Key" --type user --permissions read,write
+momentry api-key list --type user
+momentry api-key info <key_id>
+momentry api-key revoke <key_id> --reason "security reasons"
+
+# Rotation management
+momentry api-key rotate <key_id>           # normal rotation
+momentry api-key force-rotate <key_id>       # forced rotation
+momentry api-key rotation-status <key_id>    # show rotation status
+
+# Anomaly management
+momentry api-key suspend <key_id> --reason "anomalous usage"
+momentry api-key unsuspend <key_id>
+momentry api-key blacklist <key_id>         # add to blacklist
+
+# Audit
+momentry api-key audit <key_id> --since 7d
+momentry api-key stats --type service --period 30d
+```
+
+### 7.2 Sample Output
+
+```bash
+$ momentry api-key list --type service
+
+┌────────────────────────────────────┬─────────┬──────────────┬────────────────┐
+│ Key ID                             │ Name    │ Status       │ Expires        │
+├────────────────────────────────────┼─────────┼──────────────┼────────────────┤
+│ msvc_a1b2c3d4_1710998400_sha256    │ N8N     │ active       │ 2026-09-21     │
+│ msvc_e5f6g7h8_1713600000_sha256    │ OpenCode│ rotation_req │ 2026-09-21     │
+└────────────────────────────────────┴─────────┴──────────────┴────────────────┘
+
+⚠️  1 key requires rotation
+```
+
+---
+
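+## 8. Implementation Plan
+
+### Phase 1: Core Features
+- [ ] Database schema
+- [ ] Key generation and hashing
+- [ ] Basic CRUD API
+- [ ] Expiration checks
+
+### Phase 2: Security Mechanisms
+- [ ] Anomaly detection
+- [ ] Automatic locking
+- [ ] Forced rotation
+- [ ] Grace period management
+
+### Phase 3: Management Tools
+- [ ] CLI commands
+- [ ] Audit log
+- [ ] Statistics reports
+- [ ] Notification system
+
+### Phase 4: Automation
+- [ ] Scheduled rotation
+- [ ] Prometheus metrics
+- [ ] Alertmanager integration
+- [ ] Automated response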
+
+---
+
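+## 9. Security Considerations
+
+### 9.1 Key Storage
+- The plaintext key is shown only once (at creation)
+- Keys are stored as SHA256 hashes
+- Sensitive configuration is encrypted with Fernet symmetric encryption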
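
The hash-only storage rule can be sketched as follows. `hash_key` and `verify_key` are hypothetical helper names; the sketch assumes the hex digest is what goes into the `key_hash` column, and it uses a constant-time comparison so the check does not leak how many leading characters matched.

```python
import hashlib
import hmac


def hash_key(plaintext_key):
    """SHA-256 hex digest of the key; only this value is persisted."""
    return hashlib.sha256(plaintext_key.encode("utf-8")).hexdigest()


def verify_key(presented_key, stored_hash):
    """Compare the presented key's digest with the stored one in constant time."""
    return hmac.compare_digest(hash_key(presented_key), stored_hash)
```

A SHA-256 hex digest is 64 characters long, so it fits the `VARCHAR(128)` `key_hash` column with room to spare.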
+
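+### 9.2 Transport Security
+- All APIs must use HTTPS
+- Keys are sent in a header (X-API-Key)
+- Keys must not appear in URLs
+
+### 9.3 Access Control
+- Only administrators can create or revoke keys
+- Users can manage only their own keys
+- System keys require special permissions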
+
+---
+
+## 10. Environment Variables
+
+```bash
+# API key management
+MOMENTRY_API_KEY_GRACE_PERIOD=86400          # grace period (seconds)
+MOMENTRY_API_KEY_MAX_PER_USER=5              # max keys per user
+MOMENTRY_API_KEY_ROTATION_DAYS=90            # automatic rotation interval (days)
+
+# Anomaly detection
+MOMENTRY_API_KEY_RATE_LIMIT=1000             # per-minute request limit
+MOMENTRY_API_KEY_ERROR_THRESHOLD=0.5         # error-rate threshold
+MOMENTRY_API_KEY_IP_LIMIT=5                 # distinct IPs per hour
+
+# Notifications
+MOMENTRY_API_KEY_ALERT_WEBHOOK=             # anomaly alert webhook
+```
+
+---
+
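+## 11. Gitea API Token Integration
+
+### 11.1 Overview
+
+Gitea Personal Access Tokens can be created and managed through the API key management system, using a "register at creation" model.
+
+### 11.2 Registration Model
+
+```
+User supplies username and password → call the Gitea API to create a token → plaintext shown only once → record saved to the management system at the same time
+```
+
+**Characteristics:**
+- The token plaintext is available only at creation
+- The management system records token metadata (no plaintext)
+- Local lookup and deletion are supported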
+
+### 11.3 Database Structure
+
+```sql
+CREATE TABLE gitea_tokens (
+    id SERIAL PRIMARY KEY,
+    gitea_token_id BIGINT NOT NULL,      -- Gitea internal token ID
+    gitea_user VARCHAR(128) NOT NULL,    -- Gitea username
+    token_name VARCHAR(128) NOT NULL,    -- token name
+    token_last_eight VARCHAR(8) NOT NULL, -- last 8 characters of the SHA1 (for display)
+    scopes JSONB DEFAULT '[]',           -- permission scopes
+    api_key_id VARCHAR(48),              -- linked API key ID (optional)
+    last_verified TIMESTAMP,             -- last verification time
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    UNIQUE(gitea_user, token_name)
+);
+```
+
+### 11.4 Token Scopes
+
+| Scope | Description |
+|------|------|
+| `read:repository` | Read repositories |
+| `write:repository` | Write repositories |
+| `read:issue` | Read issues |
+| `write:issue` | Write issues |
+| `read:user` | Read user information |
+| `write:user` | Modify user information |
+| `read:organization` | Read organizations |
+| `write:organization` | Modify organizations |
+| `read:package` | Read packages |
+| `write:package` | Publish packages |
+| `read:notification` | Read notifications |
+| `write:notification` | Modify notifications |
+| `read:admin` | Admin read |
+| `write:admin` | Admin write |
+
+### 11.5 CLI Commands
+
+#### Create a Token
+
+```bash
+# Basic usage
+momentry gitea create \
+  --username <gitea_user> \
+  --password <gitea_password> \
+  --token-name <token_name> \
+  --scopes "read:repository,write:repository"
+
+# Example: create a token for an integration
+momentry gitea create \
+  --username admin \
+  --password "MyPassword123" \
+  --token-name "ci-pipeline" \
+  --scopes "read:repository,write:repository,read:issue,write:issue"
+```
+
+**Sample output:**
+```
+✅ Gitea Token created successfully!
+
+┌─────────────────────────────────────────────────────────────────────────────┐
+│  ⚠️  IMPORTANT: Save this token now - it will not be shown again!         │
+└─────────────────────────────────────────────────────────────────────────────┘
+
+Token ID:   9
+Token Name: ci-pipeline
+SHA1:       9a4f282e9ba817b430082e6bff2c18e2ae38e480
+Last 8:     ae38e480
+
+Authorization Header:
+  Authorization: token 9a4f282e9ba817b430082e6bff2c18e2ae38e480
+```
+
+#### List Tokens
+
+```bash
+# List all tokens for a user
+momentry gitea list \
+  --username <gitea_user> \
+  --password <gitea_password>
+```
+
+**Sample output:**
+```
+📋 Gitea Tokens for user: admin
+
+┌────────────────────────────────────────────────────────────────────────────┐
+│ ID       │ Name                 │ Last 8    │ Registered                  │
+├────────────────────────────────────────────────────────────────────────────┤
+│        9 │ ci-pipeline          │ ae38e480  │ ✓                          │
+│        8 │ dev-token            │ 1234abcd  │ -                          │
+└────────────────────────────────────────────────────────────────────────────┘
+
+Total: 2 token(s)
+```
+
+#### Delete a Token
+
+```bash
+# Delete the specified token
+momentry gitea delete \
+  --username <gitea_user> \
+  --password <gitea_password> \
+  --token-name <token_name>
+```
+
+#### Query the Local Record
+
+```bash
+# Look up the record of a registered token
+momentry gitea verify --token-name <token_name>
+```
+
+**Sample output:**
+```
+📋 Gitea Token: ci-pipeline
+  User:         admin
+  Token ID:     9
+  Last 8:       ae38e480
+  Scopes:       ["read:repository","write:repository"]
+  Created:      2026-03-21 06:44:55.577586 UTC
+  Last Verified: never
+```
+
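+### 11.6 Scope of Use
+
+#### Applicable Scenarios
+
+| Scenario | Description |
+|------|------|
+| CI/CD integration | Create a dedicated token for automated workflows |
+| Service-to-service communication | Create a token for other services to access the Gitea API |
+| Development environments | Create short-lived tokens for developers |
+| Monitoring integration | Create a read-only token for monitoring and reporting |
+
+#### Limitations
+
+| Limitation | Description |
+|------|------|
+| Plaintext token | Available only at creation; cannot be retrieved again |
+| Management API | Requires username and password (BasicAuth) |
+| Token validation | Validity can only be checked by making an API call |
+| Deletion sync | Local deletion is not automatically propagated to Gitea |
+
+### 11.7 Environment Variables
+
+```bash
+# Gitea connection settings
+GITEA_URL=http://localhost:3000    # Gitea API URL
+```
+
+### 11.8 Security Considerations
+
+| Item | Measure |
+|------|------|
+| Password handling | Used only in the CLI command; never stored |
+| Token storage | Only metadata is stored locally, no plaintext |
+| Least privilege | Recommended: grant only the permissions that are needed |
+| Regular rotation | Recommended: renew tokens regularly |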
+
+---
+
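+## 12. n8n API Key Integration
+
+### 12.1 Overview
+
+n8n API keys can be created and managed through the API key management system, using a "register at creation" model.
+
+### 12.2 Registration Model
+
+```
+User supplies an existing n8n API key → call the n8n API to create a new key → plaintext shown only once → record saved to the management system at the same time
+```
+
+**Characteristics:**
+- An existing n8n API key is required as the management credential
+- The API key plaintext is available only at creation
+- The management system records key metadata (no plaintext)
+- Local lookup and deletion are supported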
+
+### 12.3 Database Structure
+
+```sql
+CREATE TABLE n8n_api_keys (
+    id SERIAL PRIMARY KEY,
+    n8n_key_id VARCHAR(64) UNIQUE NOT NULL,  -- n8n internal key ID
+    label VARCHAR(100) NOT NULL,             -- key label
+    api_key_last_eight VARCHAR(8) NOT NULL,  -- last 8 characters of the API key (for display)
+    momentry_api_key_id VARCHAR(48),         -- linked API key ID (optional)
+    expires_at TIMESTAMP WITH TIME ZONE,     -- expiration time
+    last_verified TIMESTAMP WITH TIME ZONE,  -- last verification time
+    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
+);
+```
+
+### 12.4 Authentication
+
+n8n uses JWT-based API keys, authenticated through the `X-N8N-API-KEY` header:
+
+```bash
+curl -H "X-N8N-API-KEY: <your-api-key>" https://n8n.example.com/api/v1/workflows
+```
+
+### 12.5 CLI Commands
+
+#### Create an API Key
+
+```bash
+# Basic usage
+momentry n8n create \
+  --api-key <existing_n8n_api_key> \
+  --label <key_label> \
+  --expires-in-days <days>
+
+# Example: create a key for CI/CD
+momentry n8n create \
+  --api-key "n8n_api_xxxxxxxxxxxx" \
+  --label "ci-pipeline" \
+  --expires-in-days 90
+```
+
+**Sample output:**
+```
+✅ n8n API Key created successfully!
+
+┌─────────────────────────────────────────────────────────────────────────────┐
+│  ⚠️  IMPORTANT: Save this API key now - it will not be shown again!       │
+└─────────────────────────────────────────────────────────────────────────────┘
+
+Key ID:    abc123-def456
+Label:     ci-pipeline
+API Key:   eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
+
+Usage:
+  curl -H 'X-N8N-API-KEY: eyJhbGciOiJIUz...' https://n8n.momentry.ddns.net/api/v1/workflows
+```
+
+#### List API Keys
+
+```bash
+# List all API keys
+momentry n8n list --api-key <existing_n8n_api_key>
+```
+
+**Sample output:**
+```
+📋 n8n API Keys
+
+┌────────────────────────────────────────────────────────────────────────────┐
+│ Label                       │ ID                                      │
+├────────────────────────────────────────────────────────────────────────────┤
+│ ci-pipeline                 │ abc123-def456-789                      │
+│ monitoring                  │ xyz789-abc123-456                      │
+└────────────────────────────────────────────────────────────────────────────┘
+
+Total: 2 key(s)
+```
+
+#### Delete an API Key
+
+```bash
+# Delete the specified API key
+momentry n8n delete \
+  --api-key <existing_n8n_api_key> \
+  --label <key_label>
+```
+
+#### Query the Local Record
+
+```bash
+# Look up the record of a registered API key
+momentry n8n verify --label <key_label>
+```
+
+**Sample output:**
+```
+📋 n8n API Key: ci-pipeline
+  Key ID:        abc123-def456
+  Last 8:        ...JVCJ9
+  Created:       2026-03-21 06:44:55.577586 UTC
+  Expires:       2026-06-19 06:44:55.577586 UTC
+  Last Verified: never
+```
+
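+### 12.6 Scope of Use
+
+#### Applicable Scenarios
+
+| Scenario | Description |
+|------|------|
+| CI/CD integration | Create a dedicated key for automated workflows |
+| Monitoring integration | Create a read-only key for monitoring workflow status |
+| Service-to-service communication | Create a key for other services to call the n8n API |
+| Development environments | Create short-lived keys for developers |
+
+#### Limitations
+
+| Limitation | Description |
+|------|------|
+| Plaintext API key | Available only at creation; cannot be retrieved again |
+| Management credential | Requires an existing n8n API key |
+| Local deletion | Not automatically propagated to n8n |
+| Permission scopes | No fine-grained permissions outside the Enterprise edition |
+
+### 12.7 Environment Variables
+
+```bash
+# n8n connection settings
+N8N_URL=https://n8n.momentry.ddns.net    # n8n API URL
+```
+
+### 12.8 Security Considerations
+
+| Item | Measure |
+|------|------|
+| Management key | Must be stored securely; it is the credential for managing other keys |
+| API key storage | Only metadata is stored locally, no plaintext |
+| Expiration | Recommended: set an expiration time |
+| Regular rotation | Recommended: renew keys regularly |
+
+---
+
+## 13. References
+
+- PostgreSQL Schema
+- Redis key design (MOMENTRY_CORE_REDIS_KEYS.md)
+- Monitoring system (MOMENTRY_CORE_MONITORING.md)
+- Gitea installation guide (INSTALL_GITEA.md)
+- n8n API documentation (https://docs.n8n.io/api/authentication/)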
--- a/docs_v1.0/DESIGN/ASR_MODEL_SELECTION_REPORT.md
+++ b/docs_v1.0/DESIGN/ASR_MODEL_SELECTION_REPORT.md
@@ -0,0 +1,133 @@
+# ASR Model Selection Report
+
+**Date:** 2026-05-10
+**Video:** Charade (1963), 113min
+**Test setup:** faster-whisper on M5 MacBook Pro (Apple Silicon, CPU int8)
+
+## Test Clips
+
+| Clip | Time range | Duration | Characteristics |
+|------|-----------|----------|-----------------|
+| A — Rapid | 25:40–28:40 | 3 min | Fast back-and-forth dialogue, Cary & Audrey |
+| B — Normal | 10:00–13:00 | 3 min | Normal conversation pace |
+| C — Complex | 73:20–76:20 | 3 min | Multi-person scene, background audio |
+
+## Test Matrix
+
+| Variable | Values |
+|----------|--------|
+| Model | tiny, base, small, medium, large-v3 |
+| VAD min_silence | 200ms, 500ms |
+| Beam size | 5 (fixed) |
+
+## Results Summary
+
+### Clip A — Rapid Dialogue
+
+| Model | VAD | Segments | Chars | Runtime | Δ chars vs best |
+|-------|-----|----------|-------|---------|-----------------|
+| tiny | 200 | **55** | **1618** | **4.8s** | — |
+| tiny | 500 | **59** | 1582 | **4.8s** | −36 |
+| base | 200 | 50 | 1543 | 9.7s | −75 |
+| base | 500 | 51 | 1547 | 11.6s | −71 |
+| small | 200 | 47 | 1538 | 15.0s | −80 |
+| small | 500 | 47 | 1538 | 14.5s | −80 |
+| medium | 200 | 45 | 1241 | 34.0s | −377 |
+| medium | 500 | 45 | 1241 | 34.9s | −377 |
+| large-v3 | 200 | 14 | 916 | 42.1s | −702 |
+| large-v3 | 500 | 14 | 916 | 42.0s | −702 |
+
+**Winner: tiny** — 55–59 segments, most text captured, 4.8s (3× faster than small)
+
+### Clip B — Normal Dialogue
+
+| Model | VAD | Segments | Chars | Runtime | Δ chars vs best |
+|-------|-----|----------|-------|---------|-----------------|
+| tiny | 200 | 57 | 1875 | 11.9s | −40 |
+| tiny | 500 | **59** | 1801 | 10.9s | −114 |
+| base | 200 | 23 | 1695 | **5.1s** | −220 |
+| base | 500 | 23 | 1695 | **5.1s** | −220 |
+| small | 200 | **62** | 1731 | 15.7s | −184 |
+| small | 500 | **62** | 1731 | 16.4s | −184 |
+| medium | 200 | 59 | 1758 | 44.9s | −157 |
+| medium | 500 | 59 | 1758 | 44.8s | −157 |
+| large-v3 | 200 | 32 | **1915** | 95.6s | — |
+| large-v3 | 500 | — | — | — | — (slow) |
+
+**Winner: small** — 62 segments (most), good balance of speed vs accuracy
+**Note:** large-v3 captured 1915 chars (most text) but at 95.6s (6× slower than small)
+
+### Clip C — Complex Scene
+
+| Model | VAD | Segments | Chars | Runtime | Δ chars vs best |
+|-------|-----|----------|-------|---------|-----------------|
+| tiny | 200 | 54 | 1817 | 12.2s | −336 |
+| tiny | 500 | 52 | 1788 | 10.5s | −365 |
+| base | 200 | 51 | 2018 | 10.1s | −135 |
+| base | 500 | 51 | 2006 | 9.2s | −147 |
+| small | 200 | **64** | 1902 | 22.5s | −251 |
+| small | 500 | 61 | **2041** | 21.2s | −112 |
+| medium | 200 | 57 | 2044 | 999.3s | −109 |
+| medium | 500 | — | — | — | — (hang) |
+| large-v3 | 200 | — | — | — | — (hang) |
+| large-v3 | 500 | — | — | — | — (hang) |
+
+**Winner: base** (51 segments, 2006–2018 chars, 9.2–10.1s; the fastest reliable model)
+**Note:** medium and large-v3 both hang/timeout on complex audio in this scene
+
+## Aggregate Scores
+
+Weighted ranking (higher = better, equal weight: segment count, char count, inverse runtime):
+
+| Model | Segments (avg) | Chars (avg) | Runtime (avg) | Score | Rank |
+|-------|---------------|-------------|---------------|-------|------|
+| **tiny** | 56.0 | 1730 | **9.2s** | **8.5** | 🥇 |
+| **small** | 54.7 | 1704 | 17.6s | **7.8** | 🥈 |
+| base | 41.5 | 1751 | 10.1s | 7.0 | 🥉 |
+| medium | 51.5 | 1627 | 339.6s | 3.5 | 4 |
+| large-v3 | 20.0 | 1249 | 68.8s | 2.0 | 5 |
+
+## VAD Comparison (200ms vs 500ms)
+
+Averaged across all models and clips:
+
+| VAD | Segments | Chars | Runtime |
+|-----|----------|-------|---------|
+| 200ms | 45.9 | 1683 | 86.1s |
+| 500ms | 46.6 | 1685 | 69.2s |
+
+**Difference:** Negligible for segment and character counts. The runtime averages are not directly comparable because several 500ms runs hung or were skipped.
+
+## Conclusions
+
+### 1. Smaller is better for this use case
+
+Contrary to expectations, **tiny and small** outperform medium and large-v3 on average segment count, text captured, and speed for Charade's dialogue (large-v3 captured the most text on Clip B only):
+
+| Metric | tiny | large-v3 | Δ |
+|--------|------|----------|---|
+| Segments/clip | 56 | 20 | **+180%** |
+| Text captured | 98% | 72% | **+26 pp** |
+| Speed | 9.2s | 68.8s | **7.5× faster** |
+
+### 2. Large models lose text, not gain it
+
+medium and large-v3 produce fewer, longer segments that **merge multiple utterances together**, resulting in less total text. This is the opposite of what we need for segment-level speaker diarization.
+
+### 3. VAD parameter has minimal impact
+
+Changing `min_silence_duration_ms` between 200 and 500 produces <2% difference in segment and character counts. The current default (500ms) is fine.
+
+### 4. Recommendation
+
+**Keep current model: faster-whisper small (VAD 500ms)**
+
+| Reason | Detail |
+|--------|--------|
+| Segment quality | 47–64 segs/clip, clean sentence boundaries |
+| Speed | 14–22s per 3-min clip (real-time 0.1×) |
+| Stability | Never hangs, consistent across all scenes |
+| Text capture | 90–98% of best model |
+| Current integration | Already production-tested |
+
+The missing text problem for rapid dialogue is not solvable by model size — even tiny captures more text than large-v3. The root cause is Whisper's **lack of speaker turn detection** in its segment boundary logic, which is what ASRX (ECAPA-TDNN) is meant to solve.
--- a/docs_v1.0/DESIGN/ASR_SEGMENTATION_ENHANCEMENT.md
+++ b/docs_v1.0/DESIGN/ASR_SEGMENTATION_ENHANCEMENT.md
@@ -0,0 +1,133 @@
+# ASR Segmentation Enhancement Report
+
+**Date:** 2026-05-10
+**Movie:** Charade (1963), 113 min
+**Goal:** Fix merged-speaker segments in ASR output by detecting speaker change points within ASR segments.
+
+## Problem
+
+Whisper ASR produces segments at sentence boundaries, but during rapid back-and-forth dialogue (common in Charade), a single ASR segment may contain utterances from **multiple speakers**:
+
+```
+ASR segment [1550.0-1554.0] (4.0s):
+  "What's she saying now?"
+
+Actual dialogue:
+  1552.7: Audrey: "What's she saying now?"
+  1553.4: Cary:   "That she's innocent."
+```
+
+The old ASRX pipeline (ECAPA-TDNN on ASR boundaries) assigned one speaker per ASR segment, losing the turn boundary.
+
+## Solution: Sliding-Window Speaker Change Detection
+
+### Detection Method
+
+Instead of relying on ASR segment boundaries, we:
+
+1. **Slide a 1.5s window (0.75s stride)** across the entire audio
+2. **Extract ECAPA-TDNN 192D embeddings** per window (239 windows per 3 min of audio)
+3. **Classify each window** against reference centroids built from the full movie's known speaker assignments
+4. **Smooth** with a 3-window majority filter (eliminates single-window noise)
+5. **Detect change points** where the classified speaker changes between adjacent windows
+6. **Split** the original ASR segment at each change point
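
The six steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the pipeline's actual code: embeddings are taken as already extracted (one vector per window), all function names are hypothetical, ties in the majority filter keep the original label, and a change point is reported at the start time of the first window carrying the new speaker.

```python
import math
from collections import Counter


def window_starts(duration_s, window_s=1.5, stride_s=0.75):
    """Start times of every full window that fits in the audio (step 1)."""
    count = int((duration_s - window_s) // stride_s) + 1
    return [i * stride_s for i in range(max(count, 0))]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(embedding, centroids):
    """Name of the reference centroid most similar to the embedding (step 3)."""
    return max(centroids, key=lambda name: cosine(embedding, centroids[name]))


def smooth(labels, width=3):
    """Majority filter over a centered window of labels (step 4)."""
    half = width // 2
    out = []
    for i in range(len(labels)):
        neighborhood = labels[max(0, i - half): i + half + 1]
        top, votes = Counter(neighborhood).most_common(1)[0]
        # Keep the original label unless there is a strict majority.
        out.append(top if votes > len(neighborhood) / 2 else labels[i])
    return out


def change_points(labels, starts):
    """Start times of windows whose label differs from the previous one (step 5)."""
    return [starts[i] for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
```

For a 180-second clip, `window_starts(180.0)` yields 239 windows, matching the count quoted in step 2. The returned change points are the times at which an ASR segment would be split in step 6.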
+
+### Reference Centroids
+
+Built from the existing 3417 ASRX embedding set:
+- **Cary Grant**: centroid from 1420 known segments
+- **Audrey Hepburn**: centroid from 1689 known segments
+- **Unknown**: centroid from 308 segments (background/minor characters)
+
+Classification uses cosine similarity to nearest centroid, giving ~0.8+ similarity for main characters.
+
+### Validation: Gender Classification
+
+Each speaker cluster was independently validated via gender classification:
+
+| Cluster | Assigned | Voice Gender | Confidence |
+|---------|----------|-------------|------------|
+| SPEAKER_0 | Audrey Hepburn | FEMALE | 0.71 |
+| SPEAKER_1 | Cary Grant | MALE | 0.71 |
+| SPEAKER_2 | Unknown | MIXED | — |
+
+Two small clusters (10 segments each) initially had a MALE voice but were assigned to "Audrey". These were segments where a male voice speaks while Audrey is on screen (the old face-based matching was wrong). The fine-grained segmentation correctly resolves these.
+
+### Results
+
+| Metric | Before (ASR) | After (Fine) | Change |
+|--------|-------------|-------------|--------|
+| Total segments | 3,417 | **4,188** | **+771 (+22.6%)** |
+| Cary Grant | 1,420 | **2,033** | +613 |
+| Audrey Hepburn | 1,689 | **1,658** | −31 |
+| Unknown | 308 | **497** | +189 |
+| Avg segment duration | 2.0s | **1.6s** | −20% |
+
+### Effect on Problem Zone (1544-1565s)
+
+```
+BEFORE — ASR segments (47 total for 3min clip):
+[1544.0-1546.0] "Who's that with the hat?"           → single speaker
+[1546.0-1548.0] "That's the policeman."                → single speaker
+[1548.0-1550.0] "He wants to arrest Judy for Punch."   → single speaker
+[1550.0-1554.0] "What's she saying now?"               → merged! multiple speakers
+[1554.0-1557.5] "That she's innocent. She didn't do it." → merged
+[1557.5-1560.7] "Oh, she did it all right."            → merged
+...
+
+AFTER — Fine segments (64 total for 3min clip):
+[1550.3-1551.0] "He wants to arrest Judy..."           → Audrey Hepburn
+[1552.7-1553.4] "What's she saying now?"                → Audrey Hepburn
+[1553.4-1554.2] "now? That"                              → Cary Grant
+[1554.2-1559.3] "That she's innocent. She didn't..."    → Cary Grant
+[1559.3-1560.5] "Oh, she did it all right."             → Audrey Hepburn
+[1560.5-1561.6] "right. I"                               → Cary Grant
+[1561.6-1562.8] "I believe her."                        → Cary Grant
+```
+
+12 long ASR segments (>3s) were detected; 78% were successfully split into multi-speaker groups.
+
+### Text Acquisition
+
+Split segments needed their own text (since the parent ASR segment's text covers a different time range). Three approaches were tested:
+
+1. **Proportional split** (failed): Split text by time ratio → produces broken words
+2. **Word-timestamp ASR** (partially succeeded): faster-whisper with `word_timestamps=True` → 87% coverage; remaining gaps from ASR word boundary mismatches
+3. **Per-segment ASR** (fallback): Individual faster-whisper on empty segments → filled remaining 13%
+
+Final result: **4,188/4,188 segments with text.**
+
+### Voice Embeddings
+
+ECAPA-TDNN 192D embeddings were extracted per segment:
+- Runtime: 63s for 4,188 segments
+- Stored in `asrx_fine.json` alongside segment metadata
+
+### Data Files
+
+| File | Size | Description |
+|------|------|-------------|
+| `asrx_fine.json` | ~45 MB | 4,188 fine segments + 4,188 embeddings |
+| `asrx_fine.json → segments[].speaker_name` | — | Centroid-matched identity |
+| `asrx_fine.json → segments[].speaker_id` | — | SPEAKER_0/1/2 |
+| `asrx_fine.json → segments[].text` | — | ASR text (word-timestamp mapped) |
+| `asrx_fine.json → embeddings[]` | — | 192D ECAPA-TDNN per segment |
+
+### Continued Limitations
+
+1. **Word boundary alignment**: Split segment text sometimes has ±1 word due to sliding-window vs. ASR boundary mismatch (cosmetic, not semantic)
+2. **ASR merge in silence zones**: Very short utterances (<0.5s) merged into adjacent segments
+3. **Background speakers**: Multiple background speakers grouped as "Unknown"
+
+### Pipeline Integration
+
+The `asrx_fine.json` file serves as the new ASRX output. The original `asr.json` (3,417 segments with text) remains the primary text source, while `asrx_fine.json` provides superior speaker diarization at 4,188 segments.
+
+Speaker assignments in DB `dev.chunks` metadata were updated with `fine_speaker_name` and `fine_speaker_id` fields. Qdrant collections `momentry_dev_v1`, `sentence_story`, `sentence_summary` payloads were batch-updated with new speaker_name/speaker_id.
+
+### Hardware & Performance
+
+- Machine: M5 MacBook Pro, 48GB, Apple Silicon
+- Model: faster-whisper small (int8 CPU)
+- Embedding: ECAPA-TDNN via SpeechBrain
+- Total processing time: ~5 min for the full 113-min movie
--- a/docs_v1.0/DESIGN/DETECTOR_REGISTRY.md
+++ b/docs_v1.0/DESIGN/DETECTOR_REGISTRY.md
@@ -0,0 +1,602 @@
+# Momentry Core — Detector Registry
+
+**Date**: 2026-05-13
+**Version**: 1.0
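+**Purpose**: Consolidated reference for the coordinate conventions, transform chains, and verification status of all model/algorithm detectors
+
+---
+
+## Principles
+
+1. **One entry per detector**: record input/output format, coordinate origin, units, and transform formula independently.
+2. **Native coordinate system is stated**: transforms are never hidden; any output that differs from Top-Left pixel must be listed explicitly.
+3. **Traceable transform chain**: every transform step from the detector's raw output to the stored database field is recorded.
+4. **Three verification levels**: `verified` (tested) / `assumed` (inferred from documentation, not measured) / `buggy` (known to be wrong).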
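
As an illustration of principle 2, the sketch below converts a normalized, bottom-left-origin bounding box (the convention recorded for DET-FACE-001) into Top-Left pixel integers. The `x` and `y` formulas follow the DET-FACE-001 entry; scaling width and height by the image size and rounding to the nearest integer are assumptions, and the function name is hypothetical.

```python
def vision_bbox_to_top_left_pixels(origin_x, origin_y, width, height, img_w, img_h):
    """Convert a normalized bottom-left-origin bbox to top-left pixel integers.

    origin_x, origin_y, width, height are in [0, 1]; img_w and img_h are in pixels.
    """
    x = origin_x * img_w
    # Flip the vertical axis: the top edge of the box in a top-left system.
    y = (1.0 - origin_y - height) * img_h
    w = width * img_w    # assumption: width scales with image width
    h = height * img_h   # assumption: height scales with image height
    return round(x), round(y), round(w), round(h)
```

For example, a box with origin (0.25, 0.5) and size 0.5 × 0.25 on a 1920 × 1080 frame maps to `(480, 270, 960, 270)`.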
+
+---
+
+## Category Overview
+
+| Category | Count | Active | Experimental | Deprecated |
+|----------|:----:|:------:|:----------:|:--------:|
+| face | 8 | 2 | 4 | 2 |
+| body | 3 | 1 | 2 | 0 |
+| object | 4 | 1 | 3 | 0 |
+| text | 3 | 1 | 2 | 0 |
+| speech | 3 | 2 | 1 | 0 |
+| scene | 2 | 1 | 0 | 1 |
+| stamps | 2 | 0 | 2 | 0 |
+| **Total** | **25** | **8** | **14** | **3** |
+
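+| Status | Definition |
+|:------:|------|
+| **Active** | Runs in the production pipeline, registered in `ProcessorType`, output is consumed |
+| **Experimental** | Standalone script or CLI, not connected to the pipeline; under evaluation or kept as a backup |
+| **Deprecated** | Dropped after evaluation, or superseded by a newer version but not yet removed from the codebase |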
+
+---
+
+## Pipeline Status Quick-Reference
+
+| # | Detector ID | Short Name | Pipeline Status | Reason |
+|---|-------------|-----------|:-----:|--------|
+| 1 | DET-CUT-001 | PySceneDetect | active | CUT processor |
+| 2 | DET-SCN-001 | Places365 | **active but rejected** ⚠️ | M5 eval rejected; never removed from ProcessorType |
+| 3 | DET-ASR-001 | faster-whisper | active | ASR processor |
+| 4 | DET-SPCH-003 | ECAPA-TDNN | active | ASRX speaker embedding |
+| 5 | DET-OBJ-001 | YOLOv8s | active | YOLO processor (v5nu→v8s, 2026-05-13) |
+| 6 | DET-TEXT-001 | swift_ocr | active | OCR processor (primary) |
+| 7 | DET-FACE-001/002/003 | swift_face + FaceNet | active | Face processor |
+| 8 | DET-BODY-001/002 | swift_pose + YOLOv8-pose | active | Pose processor (primary + fallback) |
+| 9 | DET-FACE-006 | AgglomerativeClustering | active | Identity Agent (post-processing) |
+| 10 | DET-TEXT-005 | llama.cpp embed | active | Text embedding (chunk vectors) |
+| 11 | DET-FACE-005 | InsightFace | experimental | Not in production ProcessorType |
+| 12 | DET-FACE-007 | MediaPipe BlazeFace | experimental | MPS fallback, tested but not primary |
+| 13 | DET-FACE-008 | MediaPipe Face Mesh | experimental | Lip processor, not in main pipeline |
+| 14 | DET-BODY-003 | MediaPipe Holistic | experimental | Tested, not in production |
+| 15 | DET-OBJ-003 | OWL-ViT | experimental | Tested for stamps, not in pipeline |
+| 16 | DET-OBJ-004 | Grounding DINO | experimental | Tested for stamps/objects |
+| 17 | DET-TEXT-002 | Florence-2 | experimental | Tested for stamps |
+| 18 | DET-OBJ-002 | Gun Detector | experimental | Evaluated, all FP, rejected for pipeline |
+| 19 | DET-STP-001 | OpenCV Stamp | experimental | Used in scan scripts only |
+| 20 | DET-STP-002 | Pose Action Decoder | experimental | Derived from pose, standalone |
+| 21 | DET-FACE-004 | DeepFace ArcFace | deprecated | Replaced by CoreML FaceNet |
+| 22 | DET-SPCH-002 | Apple Speech ASR | deprecated | Replaced by faster-whisper |
+| 23 | DET-SCN-001 | Places365 (scene) | ⚠️ deprecated per eval | Still in ProcessorType, needs removal |
+| 24 | DET-TEXT-003 | EmbeddingGemma | experimental | Text embed endpoint, not primary |
+| 25 | DET-TEXT-004 | mxbai CoreML | experimental | Text embed endpoint, not primary |
+
+---
+
+## Known Misjudgments in Existing Evaluations
+
+| # | Evaluation | Issue | Impact | Action |
+|---|-----------|-------|--------|--------|
+| M1 | **Scene Classification** (2026-05-07) | M5 evaluated and rejected Places365, but it was never removed from `ProcessorType::all()` and still runs on every file. | Wastes ~2 min per registration and produces a meaningless `scene.json`. | Remove from pipeline or re-evaluate |
+| M2 | **Face Processor** benchmark (2026-04-28) | The benchmark compared InsightFace, MediaPipe, OpenCV, and Contract v1, but the final pipeline uses **swift_face + FaceNet**, a different solution that was not benchmarked. | The benchmark's selection criteria do not apply to the detector actually in the pipeline. | Document the actual selection decision for swift_face |
+| M3 | **Gun Detector** (2026-05-07) | Correctly rejected (7/7 detections were false positives), but the model files are still in the repo. | No pipeline impact (correctly excluded). | Archive or remove `models/gun/` |
+| M4 | **OCR processor** | No selection document exists; swift_ocr was chosen without comparison against EasyOCR/PaddleOCR. | Unknown whether it is optimal; the PaddleOCR fallback may never trigger. | Document selection decision |
+
+---
+
+### Technical classification (with vs. without spatial coordinates)
+
+| Category | Count | Spatial coordinates | Embedding only | Temporal/text only |
+|----------|:----:|:--------:|:----------:|:--------:|
+| face | 8 | 5 | 3 | — |
+| body | 3 | 3 | — | — |
+| object | 4 | 4 | — | — |
+| text | 3 | 1 | 2 | — |
+| speech | 3 | — | 2 | 1 |
+| scene | 2 | — | 1 | 1 |
+| stamps | 2 | 2 | — | — |
+| **Total** | **25** | **15** | **8** | **2** |
+
+---
+
+## Face Detectors
+
+### DET-FACE-001 — Face Bbox (Apple Vision)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | Apple Vision |
+| **Model** | `VNDetectFaceRectanglesRequest` |
+| **Input** | `CVPixelBuffer` (BGRA, via CGImage) |
+| **Output** | bbox: `x, y, width, height` |
+| **Coordinate** | Input: normalized [0-1], origin **bottom-left** |
+| **Transform** | `x = bb.origin.x * imgW` |
+| | `y = (1.0 - bb.origin.y - bb.size.height) * imgH` |
+| **Image size** | `cgImage.width / cgImage.height` |
+| **Target** | Top-Left pixel integer |
+| **File** | `scripts/swift_processors/swift_face.swift:134-136` |
+| **Status** | ✅ verified (2026-05-13, landmark QC + visual check) |
+
+---
+
+### DET-FACE-002 — Face Landmarks (Apple Vision)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | Apple Vision |
+| **Model** | `VNDetectFaceLandmarksRequest` |
+| **Input** | `CVPixelBuffer` (BGRA, via CGImage) |
+| **Output** | landmarks: `left_eye (6pt)`, `right_eye (6pt)`, `nose (8pt)`, `outer_lips`, `inner_lips` |
+| **Coordinate** | Input: `VNFaceLandmarks2D.pointsInImage(imageSize:)` |
+| | Returned: macOS AppKit convention → **bottom-left** origin ⚠️ |
+| **Transform** | `y_top_left = imgH - $0.y` (Y-flip) |
+| **Image size** | `cgImage.width / cgImage.height` |
+| **Target** | Top-Left pixel float → JSON |
+| **Pairing** | Not by array index. Landmark observations used as primary source (self-consistent bbox + landmarks). Face rect observations deduplicated via IoU > 0.3. |
+| **File** | `scripts/swift_processors/swift_face.swift:155-184` |
+| **Status** | ✅ verified (2026-05-13, Y-flip fix, 100% landmark-in-bbox) |
+| **Bugs fixed** | BUG-001: index-based pairing (landmarkObs[idx] ≠ faceObs[idx]) |
+| | BUG-002: macOS bottom-left Y axis (missing Y-flip) |
+
+---
+
+### DET-FACE-003 — Face Embedding (CoreML FaceNet)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | CoreML (ANE-accelerated) |
+| **Model** | `models/facenet512.mlpackage` |
+| **Input** | Face crop 160×160, RGB, normalized `[-1, 1]` |
+| **Output** | 512-dim float embedding |
+| **Coordinate** | N/A (no spatial output). Bbox from DET-FACE-001 used for crop. |
+| **File** | `scripts/face_processor.py`, `scripts/embed_faces.py`, `scripts/tmdb_embed_extractor.py` |
+| **Embedding space** | [-1, 1] per dimension, cosine similarity for matching |
+| **Status** | ✅ verified (routinely used for identity matching) |
+
+---
+
+### DET-FACE-004 — Face Embedding (DeepFace ArcFace)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | DeepFace / TensorFlow |
+| **Model** | `ArcFace` (512-dim) |
+| **Input** | Face crop (from bbox), BGR, no explicit normalization |
+| **Output** | 512-dim float embedding |
+| **Coordinate** | N/A |
+| **File** | `scripts/face_embedding_extractor.py` |
+| **Status** | 🟡 assumed (legacy fallback, not primary pipeline) |
+
+---
+
+### DET-FACE-005 — Face Recognition (InsightFace)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | InsightFace / ONNX Runtime |
+| **Model** | `buffalo_l` (detection + recognition + 5-point landmarks) |
+| **Input** | Video frame (BGR, numpy array) |
+| **Output** | `bbox: [x1, y1, x2, y2]` pixel int |
+| | `landmarks: 5-point` (left_eye, right_eye, nose, mouth_left, mouth_right) |
+| | `embedding: 512-dim float` |
+| **Coordinate** | Bbox: **Top-Left pixel** (InsightFace native) |
+| | Landmarks: **normalized [0-1]** to image size |
+| **Transform** | Bbox: `face.bbox.astype(int)` — direct |
+| | Landmarks: `kps * imgW, kps * imgH` — needs manual conversion ⚠️ |
+| **File** | `scripts/face_recognition_processor.py:123-153` |
+| **Status** | 🟡 assumed (landmark pixel conversion chain not independently verified) |
+
+---
+
+### DET-FACE-006 — Face Clustering (sklearn)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | sklearn |
+| **Model** | `AgglomerativeClustering` |
+| **Input** | 512-dim face embeddings from DET-FACE-003 or DET-FACE-004 |
+| **Output** | cluster labels, centroids (512-dim float) |
+| **Coordinate** | N/A (no spatial output) |
+| **File** | `scripts/face_clustering_processor.py`, `scripts/identity_bind.py` |
+| **Status** | ✅ verified (428 clusters for Charade, identity_bindings created) |
+
+---
+
+### DET-FACE-007 — Face Detection (MediaPipe BlazeFace)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | MediaPipe / MPS |
+| **Model** | `blaze_face_short_range.tflite` |
+| **Input** | Frame (numpy array / MPS image) |
+| **Output** | `bbox: [x, y, width, height]` pixel |
+| | `6 keypoints`: eyes, nose tip, mouth center, ear tragions — **pixel** |
+| **Coordinate** | **Top-Left pixel** (MediaPipe native) |
+| **Transform** | Direct, no conversion needed |
+| **File** | `scripts/face_processor_mps.py` |
+| **Status** | 🟡 assumed (MPS fallback, rarely used in pipeline) |
+
+---
+
+### DET-FACE-008 — Lip Detection (MediaPipe Face Mesh)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | MediaPipe |
+| **Model** | `Face Mesh` (468 landmarks) |
+| **Input** | Face crop or full frame |
+| **Output** | `lip_openness: [0-1]` (vertical/mouth_width) |
+| | `mouth keypoints`: indices 13, 14, 61, 291 from 468 mesh |
+| **Coordinate** | Landmarks: **normalized [0-1]**, Top-Left origin |
+| **Transform** | Normalized → pixel: `x * imgW, y * imgH` |
+| | Lip openness: derived ratio, unitless |
+| **File** | `scripts/lip_processor.py` |
+| **Status** | 🟡 assumed |
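The lip-openness ratio described for DET-FACE-008 can be sketched as follows. This is a minimal illustration, not the code in `scripts/lip_processor.py`: the function name, the clamp to 1.0, and the reading of indices 13/14 as the inner-lip midpoints and 61/291 as the mouth corners are assumptions.

```python
import math


def lip_openness(landmarks, img_w, img_h):
    """Return vertical lip gap divided by mouth width, clamped to [0, 1].

    `landmarks` maps a Face Mesh index to a normalized (x, y) pair with a
    top-left origin. Indices 13/14 are assumed to be the upper/lower inner
    lip and 61/291 the two mouth corners (hypothetical helper, not the
    pipeline's own implementation).
    """
    def to_pixel(index):
        x, y = landmarks[index][:2]
        return x * img_w, y * img_h

    upper, lower = to_pixel(13), to_pixel(14)
    left, right = to_pixel(61), to_pixel(291)

    width = math.dist(left, right)
    if width == 0:
        return 0.0  # degenerate mouth width: report a closed mouth
    return min(1.0, math.dist(upper, lower) / width)
```

Converting to pixels before taking the ratio matters on non-square frames, because a ratio of normalized distances would be distorted by the aspect ratio.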
+
+---
+
+## Body Pose Detectors
+
+### DET-BODY-001 — Body Pose (Apple Vision)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | Apple Vision |
+| **Model** | `VNDetectHumanBodyPoseRequest` |
+| **Input** | `CGImage` (from frame export or NSImage) |
+| **Output** | `19 keypoints`: nose, eyes, ears, neck, root, shoulders, elbows, wrists, hips, knees, ankles |
+| | `bbox: [x, y, width, height]` derived from keypoint min/max |
+| **Coordinate** | Input: normalized [0-1], origin **bottom-left** |
+| **Transform** (current) | ✅ `y = h - location.y * h` — Y-flip applied |
+| **Transform** (correct) | `y = h - location.y * h` |
+| **Image size** | `cgImage.width / cgImage.height` |
+| **Target** | Top-Left pixel float |
+| **File** | `scripts/swift_processors/swift_pose.swift:154-159` |
+| **Status** | ✅ verified (2026-05-13, Y-flip fix applied) |
+
+---
+
+### DET-BODY-002 — Body Pose (YOLOv8 Pose fallback)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | ultralytics / PyTorch |
+| **Model** | `yolov8n-pose.pt` |
+| **Input** | Frame (PIL or numpy) |
+| **Output** | `17 COCO keypoints`: nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles |
+| | `bbox: [x, y, width, height]` derived from keypoints (conf > 0.1) |
+| **Coordinate** | **Top-Left pixel** (YOLO native, `.xy[0]` → numpy float) |
+| **Transform** | Direct: `x, y = float(kps[j][0]), float(kps[j][1])` |
+| | Bbox: `min(xs), min(ys), max(xs)-min(xs), max(ys)-min(ys)` |
+| **File** | `scripts/pose_processor.py:78-97` |
+| **Status** | ✅ top-left native |
+
+---
+
+### DET-BODY-003 — Full Body (MediaPipe Holistic)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | MediaPipe |
+| **Model** | `Holistic` (pose + face mesh + hands) |
+| **Input** | Frame (BGR numpy) |
+| **Output** | `468 face mesh`: `[[x, y, z], ...]` normalized [0-1] |
+| | `33 body pose`: `[[x, y, z, visibility], ...]` normalized [0-1] |
+| | `21 hand × 2`: `[[x, y, z], ...]` normalized [0-1] |
+| **Coordinate** | **normalized [0-1]**, Top-Left origin |
+| **Transform** | `x * imgW, y * imgH` → pixel (if needed) |
+| | Z: depth relative, not metric |
+| **File** | `scripts/mediapipe_holistic_processor.py` |
+| **Status** | ✅ top-left native, normalized→pixel straightforward |
+
+---
+
+## Object Detectors
+
+### DET-OBJ-001 — Object Detection (YOLOv8s)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | ultralytics / CoreML + PyTorch fallback |
+| **Model** | `yolov8s.mlpackage` (primary, CoreML ANE), `yolov8s.pt` (fallback) |
+| **mAP (COCO)** | 44.9 (was 34.3 with YOLOv5nu, +31%) |
+| **Input** | Frame (PIL or numpy) |
+| **Output** | `bbox: [x1, y1, x2, y2]` — float pixel |
+| | `class_name, class_id` (80 COCO classes) |
+| | `confidence: [0-1]` |
+| **Coordinate** | **Top-Left pixel** (YOLO `.xyxy[0]` → float) |
+| **Transform** | Rust: `x = detection.x1 as i32, y = detection.y1 as i32` — **int truncation** |
+| | `width = x2 - x1, height = y2 - y1` |
+| **Image size** | YOLO auto-handles via ultralytics inference |
+| **File** | `scripts/yolo_processor.py:272-285`, `src/core/processor/yolo.rs:83-117` |
+| **Status** | ✅ verified (2026-05-13, replaced YOLOv5nu, +19% detections, scene indicators +162~+473%) |
+| **Replaced** | YOLOv5nu (mAP 34.3, removed 2026-05-13) |
+
+---
+
+### DET-OBJ-002 — Weapon Detection (YOLOv8n Fine-tuned)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | ultralytics / PyTorch |
+| **Model** | `models/gun/gun_detector/weights/best.pt` |
+| **Input** | Frame (numpy array) |
+| **Output** | `bbox: [x1, y1, x2, y2]` pixel |
+| | `class: {0: grenade, 1: knife, 2: pistol, 3: rifle}` |
+| **Coordinate** | **Top-Left pixel** (YOLO native) |
+| **File** | `scripts/gun_detector_scan.py` |
+| **Status** | ✅ top-left native |
+
+---
+
+### DET-OBJ-003 — Open-Vocabulary Detection (OWL-ViT)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | HuggingFace Transformers |
+| **Model** | `google/owlvit-base-patch32` |
+| **Input** | PIL Image + text queries |
+| **Output** | `bbox, scores, labels` |
+| **Coordinate** | post_process_object_detection returns boxes in `[x1, y1, x2, y2]` format |
+| | scaled to `target_sizes` parameter |
+| **Transform** | `target_sizes = torch.Tensor([image_pil.size[::-1]])` — PIL (w,h) → (h,w) |
+| | `box.int().tolist()` or `box.tolist()` → Python list |
+| **Format risk** | HuggingFace processor version may return `[cx, cy, w, h]` not `[x1,y1,x2,y2]` |
+| **File** | `scripts/test_owl_vit_stamps.py:69-80`, `scripts/magnifying_glass_owl.py:65-77` |
+| **Status** | 🟡 **assumed** (bbox format not independently verified with visual check) |
+| **Verify** | Render bbox overlay on a known target image, confirm x1 < x2, y1 < y2 |
+
+---
+
+### DET-OBJ-004 — Open-Vocabulary Detection (Grounding DINO)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | HuggingFace Transformers |
+| **Model** | `IDEA-Research/grounding-dino-base` |
+| **Input** | PIL Image + text prompts |
+| **Output** | `boxes, labels, scores` |
+| **Coordinate** | processor rescales to `target_sizes`, returns pixel boxes |
+| **Transform** | `target_sizes=[img.size[::-1]]` — PIL (w,h) → (h,w) |
+| | `[round(v, 1) for v in dets["boxes"][i].tolist()]` |
+| **Format risk** | `[::-1]` order depends on processor expectations. If processor expects (w,h), axes swapped. |
+| **File** | `scripts/gdino_frame_api.py:176-180` |
+| **Status** | 🟡 **assumed** (rescale direction not independently verified) |
+| **Verify** | Single-frame output: check bbox x range ≤ imgW, y range ≤ imgH |
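The "Verify" steps for DET-OBJ-003 and DET-OBJ-004 can be automated with a small sanity check. The sketch below is an assumption-laden helper (the name `check_xyxy_box` and the 1-pixel tolerance are hypothetical); it flags the two format risks named above: a `[cx, cy, w, h]` box read as `[x1, y1, x2, y2]`, and a box rescaled with width and height swapped.

```python
def check_xyxy_box(box, img_w, img_h, tol=1.0):
    """Return a list of problems found in a supposed [x1, y1, x2, y2] pixel box.

    An empty list means the box is consistent with top-left pixel xyxy format.
    `tol` is an assumed slack, in pixels, for rounding at the image border.
    """
    x1, y1, x2, y2 = box
    problems = []
    if not x1 < x2:
        problems.append("x1 >= x2")
    if not y1 < y2:
        problems.append("y1 >= y2")
    if min(x1, x2) < -tol or max(x1, x2) > img_w + tol:
        problems.append("x outside image width")
    if min(y1, y2) < -tol or max(y1, y2) > img_h + tol:
        problems.append("y outside image height")
    return problems
```

A passing check does not prove the format is right (a small centered `[cx, cy, w, h]` box can still look valid), so it complements the visual overlay rather than replacing it.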
+
+---
+
+## Text / OCR Detectors
+
+### DET-TEXT-001 — OCR (Apple Vision)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | Apple Vision |
+| **Model** | `VNRecognizeTextRequest` (accurate/fast) |
+| **Input** | `CVPixelBuffer` (via CGImage) |
+| **Output** | `text: string`, `bbox: [x, y, w, h]`, `confidence: [0-1]` |
+| **Coordinate** | Input: `VNRecognizedTextObservation.boundingBox` — normalized [0-1], origin **bottom-left** |
+| **Transform** | ✅ `y = (1.0 - bb.origin.y - bb.size.height) * cgH` — Y-flip applied |
+| **Image size** | Main loop: `cgImage.width / cgImage.height` ✅ |
+| | `recognizeText()` helper: `CVPixelBufferGetWidth/Height` ✅ |
+| **File** | `scripts/swift_processors/swift_ocr.swift:125-133`, `:181-182` |
+| **Status** | ✅ verified (2026-05-13, Y-flip + image size fix applied) |
+
+---
+
+### DET-TEXT-002 — Open-Vocabulary (Florence-2)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | HuggingFace Transformers |
+| **Model** | `microsoft/Florence-2-base` |
+| **Input** | PIL Image + task prompt |
+| **Output** | `bbox: [x1, y1, x2, y2]` pixel |
+| | `label, text` (depending on task) |
+| **Coordinate** | processor `post_process_generation` rescales to `image_size`, returns pixel |
+| **Transform** | `x1, y1, x2, y2 = map(int, bbox)` — direct |
+| | `image_size=(image_pil.width, image_pil.height)` — (w, h) order ✅ |
+| **File** | `scripts/florence2_scan_stamps.py:67-79`, `scripts/test_florence2_direct.py` |
+| **Status** | ✅ top-left native (HuggingFace post_process output) |
+
+---
+
+### DET-TEXT-003 — Text Embedding (EmbeddingGemma)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | HuggingFace / PyTorch MPS |
+| **Model** | `google/embeddinggemma-300m` |
+| **Input** | Text string |
+| **Output** | Embedding vector (L2 normalized, dimension model-dependent) |
+| **Coordinate** | N/A |
+| **File** | `scripts/embeddinggemma_server.py` |
+| **Status** | ✅ verified (embedding API server) |
+
+---
+
+## Text Embedding (Non-Detector)
+
+### DET-TEXT-004 — Text Embedding (mxbai CoreML)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | CoreML (ANE-accelerated) |
+| **Model** | `mxbai-embed-large-v1.mlpackage` |
+| **Input** | Text tokenized |
+| **Output** | Embedding vector |
+| **Coordinate** | N/A |
+| **File** | `scripts/coreml_embed_server.py` |
+| **Status** | 🟡 assumed |
+
+---
+
+### DET-TEXT-005 — Text Embedding (Ollama / llama.cpp)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | llama.cpp / Ollama API |
+| **Model** | llama.cpp embedding endpoint (port 11436) |
+| **Input** | Text (optionally prefixed `search_document:`) |
+| **Output** | 768-dim float embedding |
+| **Coordinate** | N/A |
+| **File** | `src/core/embedding/comic_embed.rs` |
+| **Status** | ✅ verified (embedding pipeline) |
+
+---
+
+## Speech / Audio Detectors
+
+### DET-SPCH-001 — ASR (faster-whisper)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | faster-whisper / CTranslate2 |
+| **Model** | `faster-whisper/small` (int8 CPU) |
+| **Input** | Audio extracted from video |
+| **Output** | `[{start, end, text}, ...]` — temporal segments (seconds) |
+| **Coordinate** | Temporal only (seconds), no spatial |
+| **File** | `scripts/asr_processor.py` |
+| **Status** | ✅ verified (ASR pipeline) |
+
+---
+
+### DET-SPCH-002 — ASR (Apple Speech)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | Apple Speech (ANE) |
+| **Model** | `SFSpeechRecognizer` |
+| **Input** | Audio file |
+| **Output** | `[{start, end, text, confidence}, ...]` — temporal segments |
+| **Coordinate** | Temporal only (seconds), no spatial |
+| **File** | `scripts/swift_processors/asr_swift.swift` |
+| **Status** | 🟡 assumed (Apple Speech quality lower than faster-whisper) |
+
+---
+
+### DET-SPCH-003 — Speaker Embedding (ECAPA-TDNN)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | SpeechBrain / PyTorch |
+| **Model** | `speechbrain/spkrec-ecapa-voxceleb` |
+| **Input** | Audio segments per speaker |
+| **Output** | `192-dim float embedding` |
+| **Coordinate** | N/A (vector space, cosine similarity) |
+| **File** | `scripts/asrx_processor_custom.py`, `scripts/voice_embedding_extractor.py` |
+| **Status** | ✅ verified (voice embeddings exported to SQLite + Qdrant) |
+
+---
+
+## Scene Detectors
+
+### DET-SCN-001 — Scene Classification (Places365)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | CoreML (ANE) + PyTorch MPS fallback |
+| **Model** | `resnet18_places365.mlpackage` |
+| **Input** | Frame resized to 224×224 |
+| **Output** | `[{scene_type, confidence, top_5}, ...]` — temporal segments |
+| **Coordinate** | Temporal only, no spatial |
+| **File** | `scripts/scene_classifier.py` |
+| **Status** | ✅ verified |
+
+---
+
+### DET-SCN-002 — Scene Cut Detection (PySceneDetect)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | PySceneDetect |
+| **Model** | `ContentDetector` (threshold-based frame difference) |
+| **Input** | Video frames |
+| **Output** | `[{scene_number, start_frame, end_frame, start_time, end_time}]` |
+| **Coordinate** | Temporal (frames + seconds), no spatial |
+| **File** | `scripts/cut_processor.py` |
+| **Status** | ✅ verified |
+
+---
+
+## Stamp / Specific Target Detectors
+
+### DET-STP-001 — Stamp Detection (OpenCV Color)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | OpenCV |
+| **Model** | HSV color masking + contour analysis (rule-based, no ML) |
+| **Input** | Frame (BGR numpy) |
+| **Output** | `bbox: [x, y, w, h]` pixel |
+| **Coordinate** | **Top-Left pixel** (`cv2.boundingRect()` native) |
+| **Transform** | Direct, no conversion |
+| **File** | `scripts/scan_full_video_stamps.py`, `scripts/find_blue_stamp_opencv.py` |
+| **Status** | ✅ top-left native |
+
+---
+
+### DET-STP-002 — Pose Action Decoder (Coordinate-derived)
+
+| Field | Value |
+|-------|-------|
+| **Framework** | Rule-based from keypoints |
+| **Model** | N/A (derived from DET-BODY-001/002/003 keypoints) |
+| **Input** | Pose keypoints (pixel) |
+| **Output** | Action labels: turn_left, turn_right, look_up, look_down, shake_head, nod_head, blink, smile, etc. |
+| **Coordinate** | Derived angles/ratios, no raw spatial output |
+| **File** | `scripts/utils/pose_action_decoder.py`, `scripts/utils/integrated_body_action_decoder.py` |
+| **Status** | 🟡 assumed (actions derived from pose keypoints; dependent on upstream keypoint correctness) |
+| **Warning** | Affected by the DET-BODY-001 Y-flip bug (BUG-003, fixed 2026-05-13): all action labels derived from Vision pose output produced before the fix are wrong |
+
+---
+
+## Known Bugs Summary
+
+| Bug ID | Detector | Issue | Impact | Fixed |
+|:------|----------|-------|--------|:-----:|
+| BUG-001 | DET-FACE-001/002 | Index-based landmark↔face pairing | Wrong landmarks assigned to wrong faces | ✅ 2026-05-13 |
+| BUG-002 | DET-FACE-002 | macOS bottom-left → missing Y-flip | Landmarks 731px offset from bbox | ✅ 2026-05-13 |
+| BUG-003 | DET-BODY-001 | Missing Y-flip on keypoints | All 19 joint Y coordinates inverted | ✅ 2026-05-13 |
+| BUG-004 | DET-BODY-001 | Derived bbox Y inverted | Bbox doesn't cover actual person | ✅ 2026-05-13 |
+| BUG-005 | DET-TEXT-001 | Missing Y-flip on bbox | Text bbox Y inverted | ✅ 2026-05-13 |
+| BUG-006 | DET-TEXT-001 | Hardcoded 640×360 in `recognizeText()` | Wrong bbox scale for non-640×360 images | ✅ 2026-05-13 |
+
+---
+
+## Coordinate Convention Quick Reference
+
+### Apple Vision (all detectors)
+
+| Item | Convention |
+|------|-----------|
+| boundingBox origin | Bottom-Left |
+| boundingBox units | normalized [0-1] |
+| pointsInImage Y axis | Bottom-Left (macOS AppKit) |
+| Required Y-flip formula | bbox: `y = (1 - y_norm - h_norm) * imgH` |
+| | points: `y = imgH - raw_y` |
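The two Y-flip formulas in the table above can be written out as a minimal Python sketch. The function names are hypothetical and the pipeline's own implementation is in Swift; this only restates the arithmetic.

```python
def vision_bbox_to_top_left(x_norm, y_norm, w_norm, h_norm, img_w, img_h):
    """Convert a normalized, bottom-left-origin bbox to top-left pixel (x, y, w, h)."""
    x = x_norm * img_w
    # The box's top edge sits at y_norm + h_norm in bottom-left space,
    # so its distance from the top of the image is 1 - y_norm - h_norm.
    y = (1.0 - y_norm - h_norm) * img_h
    return x, y, w_norm * img_w, h_norm * img_h


def vision_point_to_top_left(raw_x, raw_y, img_h):
    """Flip a pixel-space point whose Y axis starts at the bottom edge."""
    return raw_x, img_h - raw_y
```

For example, a box with normalized origin (0.25, 0.5) and size (0.5, 0.25) on a 640×360 frame occupies the band between 50% and 75% of the height measured from the bottom, which is 25% to 50% measured from the top, so its top-left pixel origin is (160, 90).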
+
+### Non-Vision Detectors
+
+| Framework | Origin | Units |
+|-----------|:------:|-------|
+| YOLO (ultralytics) | Top-Left | pixel float |
+| MediaPipe | Top-Left | normalized [0-1] |
+| InsightFace bbox | Top-Left | pixel int |
+| InsightFace landmarks | Top-Left | normalized [0-1] |
+| HuggingFace (post_process) | Top-Left | pixel (after rescale) |
+| OpenCV | Top-Left | pixel int |
+
+---
+
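+## Governance Rules
+
+1. **Adding a detector**: it must be registered in this Registry, including its coordinate system, transform formula, and file location.
+2. **Coordinate changes**: any change to a transform formula must be reflected in this document, with the change date noted.
+3. **Verification requirement**: every detector with spatial output must pass at least one visual check (bbox/keypoints overlaid on the source image).
+4. **Cross-detector comparison**: bboxes that different detectors output for the same frame should have a plausible IoU (non-zero and not exactly 1.0).
+5. **Vision detector hard rule**: for any detector that uses the Apple Vision framework, confirm that the Y-flip is implemented.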
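Rule 4 relies on an IoU computation between boxes from two detectors. A minimal sketch, assuming both boxes are already in top-left pixel `[x, y, w, h]` form (the function names are hypothetical):

```python
def iou_xywh(a, b):
    """Intersection over union of two top-left [x, y, w, h] boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def iou_is_plausible(a, b):
    """Rule 4: IoU should be non-zero and not exactly 1.0."""
    return 0.0 < iou_xywh(a, b) < 1.0
```

An IoU of zero for boxes that should cover the same subject suggests a coordinate-system mismatch such as a missing Y-flip, while exactly 1.0 suggests one output was copied from the other rather than independently detected.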
+
+---
+
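+## Maintenance
+
+- **Owner**: M5
+- **Update frequency**: whenever a processor is added or a coordinate transform is modified
+- **Reference**: `SPATIAL_COORDINATE_REGISTRY.md` (parent coordinate system document)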
--- a/docs_v1.0/DESIGN/DETECTOR_SELECTION_SOP.md
+++ b/docs_v1.0/DESIGN/DETECTOR_SELECTION_SOP.md
@@ -0,0 +1,238 @@
+# Momentry Core: Detector Selection Standard Operating Procedure (SOP)
+
+**Date**: 2026-05-13
+**Version**: 1.0
+**Ref**: `DETECTOR_REGISTRY.md`, `SPATIAL_COORDINATE_REGISTRY.md`
+
+---
+
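+## Purpose
+
+This SOP governs how detectors (models/algorithms) are added, evaluated, selected, and registered, so that every detector entering the production pipeline has been fully verified.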
+
+---
+
+## Selection Flow (6 Phases)
+
+```
+Phase 1: Requirements → Phase 2: Candidate list → Phase 3: Benchmarking
+→ Phase 4: Coordinate verification → Phase 5: Selection decision → Phase 6: Registration
+```
+
+---
+
+## Phase 1: Requirements
+
+### 1.1 Output specification
+
+| Item | Required |
+|------|:--:|
+| Output type (bbox / landmarks / keypoints / embedding / label / text) | ✅ |
+| Whether it has spatial coordinates | ✅ |
+| Expected accuracy (e.g. IoU > 0.5 against ground truth) | ✅ |
+| Expected speed (e.g. < 0.1s/frame on MPS) | ✅ |
+| Expected memory (e.g. < 1GB) | ✅ |
+| License constraints (MIT / Apache / GPL / commercial) | ✅ |
+
+### 1.2 Input specification
+
+| Item | Required |
+|------|:--:|
+| Input type (frame image / audio / text) | ✅ |
+| Whether preprocessing is needed (resize / crop / normalize) | ✅ |
+| Required input size | ✅ |
+
+---
+
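+## Phase 2: Candidate List
+
+### 2.1 Collection criteria
+
+Collect at least **3 candidates**, covering different technical approaches:
+
+| Approach | Examples |
+|---------|------|
+| Apple Vision (ANE) | swift_face, swift_pose, swift_ocr |
+| PyTorch / CoreML | YOLOv5n, FaceNet, ResNet18 |
+| HuggingFace Transformers | OWL-ViT, Florence-2, Grounding DINO |
+| Traditional CV | OpenCV Haar, HSV masking |
+| MediaPipe | BlazeFace, Holistic, Face Mesh |
+
+### 2.2 Exclusion criteria
+
+A candidate is excluded without testing if any of the following holds:
+
+- Incompatible license (GPL/AGPL is excluded when no commercial license is available)
+- Known not to run on the target platform (e.g. CUDA-only on Mac)
+- Unmaintained: no updates for more than 2 years (unless there is no alternative)
+- Model size over 1GB (unless there is a strong justification)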
+
+---
+
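+## Phase 3: Benchmarking
+
+### 3.1 Test items (all mandatory)
+
+| # | Test item | Method | Minimum threshold |
+|---|---------|------|:--:|
+| T1 | **Processing speed** | Sample 100 frames from the same video and measure wall time | Within 20% of the fastest candidate |
+| T2 | **Peak memory** | Monitor with `psutil` and record the process RSS peak | < 2GB |
+| T3 | **Detection rate** | Compare against human-annotated ground truth (≥50 frames) and compute Precision/Recall | Recall > 0.6 |
+| T4 | **False positives (Precision)** | Precision = TP / (TP + FP), on the same ground truth | Precision > 0.3 (task-dependent) |
+| T5 | **Output completeness** | Check that the output JSON conforms to the schema | 100% of fields present |
+| **T6** | **Coordinate normalization** | ← **New, see Phase 4** | |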
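The T3 and T4 thresholds can be checked with a few lines once detections have been matched against ground truth. This is a minimal sketch: the function names are hypothetical, and how a detection counts as a true positive (for example, an IoU cut-off) is left to the benchmark script.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def passes_t3_t4(tp, fp, fn, min_recall=0.6, min_precision=0.3):
    """Apply the T3 (Recall > 0.6) and T4 (Precision > 0.3) minimum thresholds."""
    precision, recall = precision_recall(tp, fp, fn)
    return recall > min_recall and precision > min_precision
```

A candidate whose detections are all false positives, as in the Gun Detector evaluation (7/7 FP), has a precision of 0 and fails T4 regardless of recall.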
+
+### 3.2 Benchmark script conventions
+
+Each candidate set must produce:
+
+```
+output/benchmark/{category}/
+├── BENCHMARK_REPORT.md        # human-readable report
+├── BENCHMARK_REPORT.json      # machine-readable results
+└── {scheme}_{detector}.json   # raw output per candidate
+```
+
+Use the existing `*_benchmark_runner.py` template, or refer to `scripts/compare_*.py`.
+
+---
+
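+## Phase 4: Coordinate Normalization Verification (T6) ← newly mandatory
+
+### 4.1 Why it is mandatory
+
+All 6 coordinate bugs found so far came from **not verifying coordinates at selection time**:
+
+| Bug | Detector | Problem |
+|-----|----------|------|
+| BUG-001 | face landmarks | incorrect index-based pairing |
+| BUG-002 | face landmarks | missing macOS Vision Y-flip |
+| BUG-003 | body pose | missing Y-flip |
+| BUG-004 | body pose | bbox Y inverted |
+| BUG-005 | OCR text | missing Y-flip |
+| BUG-006 | OCR text | hardcoded 640×360 image size |
+
+> **Principle: for any detector that outputs spatial coordinates, coordinate verification is a prerequisite for selection; a detector that fails it must not enter the pipeline.**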
+
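+### 4.2 Verification items
+
+| # | Item | Method | Threshold |
+|---|------|------|:--:|
+| C1 | **Origin** | Consult the detector framework's documentation and record the native coordinate system (BL/TL/Center) | Must be stated |
+| C2 | **Axis direction** | As above; record the X/Y axis directions (right-positive / down-positive) | Must be stated |
+| C3 | **Units** | Record the native output units (normalized [0-1] / pixel / other) | Must be stated |
+| C4 | **Y-flip check** | For Apple Vision detectors, inspect output Y values: if a face is in the upper half of the frame, bbox y should be < frame_height/2 | Must pass |
+| C5 | **bbox↔landmark consistency** | For each detection, check that ≥50% of landmark points fall inside the bbox | ≥90% of faces pass |
+| C6 | **bbox range check** | Confirm x ∈ [0, imgW], y ∈ [0, imgH], w > 0, h > 0 | 100% |
+| C7 | **Cross-detector alignment** | Bboxes from different detectors on the same frame should have a plausible IoU (confidence-weighted) | No fixed threshold |
+| C8 | **Transform chain documentation** | Write out the full E→P→A coordinate transform formulas, including the image-size source at each step | Must be completed |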
+
+### 4.3 Verification script
+
+Follow the pattern of `scripts/face_landmark_qc.py` (extensible to other categories):
+
+```python
+# For each frame:
+#   1. Read the detector output
+#   2. Check x ∈ [0, imgW], y ∈ [0, imgH]
+#   3. If landmarks exist: check that ≥50% are inside the bbox
+#   4. Emit a pass/fail report
+```
+
+Once complete, mark the detector `verified` in `DETECTOR_REGISTRY.md`.
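The four steps outlined above can be sketched as runnable Python. This is not the content of `scripts/face_landmark_qc.py`: the detection schema (a dict with a top-left pixel `bbox` of `[x, y, w, h]` and an optional `landmarks` list of `[x, y]` points) and all function names are assumptions made for illustration.

```python
def bbox_in_range(bbox, img_w, img_h):
    """C6: x in [0, imgW], y in [0, imgH], w > 0, h > 0."""
    x, y, w, h = bbox
    return 0 <= x <= img_w and 0 <= y <= img_h and w > 0 and h > 0


def landmark_fraction_inside(bbox, points):
    """Fraction of landmark points that fall inside the bbox (edges included)."""
    x, y, w, h = bbox
    inside = sum(1 for px, py in points if x <= px <= x + w and y <= py <= y + h)
    return inside / len(points)


def qc_detection(det, img_w, img_h, min_inside=0.5):
    """Pass when the bbox is in range and, if landmarks exist, enough are inside it (C5)."""
    bbox = det["bbox"]
    if not bbox_in_range(bbox, img_w, img_h):
        return False
    points = det.get("landmarks") or []
    if points and landmark_fraction_inside(bbox, points) < min_inside:
        return False
    return True


def qc_report(detections, img_w, img_h):
    """Summarize pass/fail counts for one frame's detections."""
    results = [qc_detection(det, img_w, img_h) for det in detections]
    passed = sum(results)
    rate = passed / len(results) if results else 1.0
    return {"total": len(results), "passed": passed, "pass_rate": rate}
```

A missing Y-flip like BUG-002 shows up here as landmarks that land far outside their bbox, so the affected faces fail the C5 check and drag `pass_rate` below the 90% threshold.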
+
+---
+
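+## Phase 5: Selection Decision
+
+### 5.1 Scoring matrix
+
+| Weight | Dimension | Scoring method |
+|:---:|------|---------|
+| 30% | Quality (Precision/Recall/accuracy) | vs ground truth |
+| 25% | Speed (throughput) | ms/frame, lower is better |
+| 15% | Coordinate correctness (C1-C8) | all pass = full marks |
+| 15% | Memory | MB peak, lower is better |
+| 10% | Maintainability (license, dependencies, update frequency) | subjective score |
+| 5% | Output richness (extra information such as pose/age/gender) | bonus item |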
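The weights in the matrix can be combined into a single total per candidate. The sketch below assumes each dimension has already been normalized to a 0-1 score where higher is better (so raw ms/frame and MB values must be inverted first); that normalization, the dictionary keys, and the function name are assumptions, not part of this SOP.

```python
# Weights taken from the scoring matrix; the keys are illustrative names.
WEIGHTS = {
    "quality": 0.30,
    "speed": 0.25,
    "coordinates": 0.15,
    "memory": 0.15,
    "maintainability": 0.10,
    "richness": 0.05,
}


def weighted_total(scores):
    """Combine per-dimension scores (each assumed to be in [0, 1]) into one total."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

Requiring every dimension to be present keeps a candidate from scoring well simply because a measurement was skipped.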
+
+### 5.2 Decision record
+
+The decision must be documented in the following format:
+
+```markdown
+# {Category} Detector Selection Decision
+
+**Date**: YYYY-MM-DD
+**Decision maker**: {name}
+**Selected**: {detector_id}
+**Rejected**: {list every candidate and the reason it was rejected}
+
+## Evaluation data
+| Candidate | Quality | Speed | Coordinates | Memory | Total |
+|------|------|------|------|--------|------|
+| A    |      |      |      |        |      |
+| B    |      |      |      |        |      |
+
+## Coordinate verification
+| Candidate | C1-C3 | C4 | C5 | C6 | C7 | C8 | Pass |
+|------|-------|----|----|----|----|----|:--:|
+| A    |       |    |    |    |    |    |     |
+| B    |       |    |    |    |    |    |     |
+
+## Rationale
+(1-2 paragraphs explaining why A was chosen over B)
+```
+
+Save to `docs_v1.0/decisions/{YYYY-MM-DD}_{category}_detector_selection.md`.
+
+---
+
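+## Phase 6: Registration
+
+### 6.1 Registry updates
+
+After selection, the following must be updated:
+
+1. `DETECTOR_REGISTRY.md`: add the detector entry (if not already present) with status `verified`
+2. `SPATIAL_COORDINATE_REGISTRY.md`: update the E-layer and P-layer calibration paths
+3. In `src/worker/processor.rs` or the corresponding call site, add a comment noting the detector ID
+
+### 6.2 Rollback procedure
+
+If a deployed detector is found to have a serious problem (e.g. BUG-003/004):
+
+1. Immediately mark it `buggy` in `DETECTOR_REGISTRY.md`
+2. Fix it and rebuild
+3. Update the calibration status in `SPATIAL_COORDINATE_REGISTRY.md`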
+
+---
+
+## Existing Detector Review Checklist
+
+The following are all detectors currently active in the pipeline; each must be reviewed for compliance with this SOP:
+
+| # | Detector | Current status | Coordinate verification | Selection document |
+|---|----------|:------:|:--:|:--:|
+| 1 | Cut (PySceneDetect) | active ✅ | N/A (no spatial coordinates) | ✅ |
+| 2 | Scene (Places365) | **active but rejected in eval** ⚠️ | N/A | ❌ evaluation recommended retiring it, but it was not removed |
+| 3 | ASR (faster-whisper) | active ✅ | N/A | ✅ |
+| 4 | ASRX (ECAPA-TDNN) | active ✅ | N/A | ✅ |
+| 5 | YOLO (YOLOv8s) | active ✅ | TL native | ✅ |
+| 6 | OCR (swift_ocr) | active ✅ | ✅ fixed | ❌ no selection document |
+| 7 | Face (swift_face + FaceNet) | active ✅ | ✅ fixed | ❌ no selection document |
+| 8 | Pose (swift_pose + YOLOv8-pose) | active ✅ | ✅ fixed | ❌ no selection document |
+| 9 | VisualChunk | active ✅ | N/A (derived) | ❌ no selection document |
+| 10 | Story (Gemma4) | active ✅ | N/A (LLM) | ❌ no selection document |
+| 11 | TKG Builder | active ✅ | N/A（graph） | — |
+| 12 | TMDB Matcher | active ✅ | N/A（cosine） | — |
+| 13 | Identity Agent | active ✅ | N/A（clustering） | — |
+| 14 | Embedding (llama.cpp) | active ✅ | N/A（vector） | ✅ |
+
+---
+
+## Maintenance
+
+- **Owner**: M5
+- **Update frequency**: whenever a detector is added
+- **Audit**: review all active detectors once per quarter to confirm they still meet quality standards
--- a/docs_v1.0/DESIGN/DOCUMENT_EMBEDDING_STRATEGY.md
+++ b/docs_v1.0/DESIGN/DOCUMENT_EMBEDDING_STRATEGY.md
@@ -0,0 +1,187 @@
+---
+document_type: "reference_doc"
+service: "MOMENTRY_CORE"
+title: "Document Embedding Strategy - Parent-Child Chunks"
+date: "2026-03-23"
+version: "V1.0"
+status: "active"
+owner: "Warren"
+created_by: "OpenCode"
+tags:
+  - "embedding"
+  - "chunks"
+  - "strategy"
+  - "document"
+ai_query_hints:
+  - "Look up the contents of Document Embedding Strategy - Parent-Child Chunks"
+  - "What is the main purpose of Document Embedding Strategy - Parent-Child Chunks?"
+  - "How do I operate or implement Document Embedding Strategy - Parent-Child Chunks?"
+---
+
+# Document Embedding Strategy - Parent-Child Chunks
+
+| Item | Content |
+|------|---------|
+| Author | Warren |
+| Created | 2026-03-23 |
+| Document Version | V1.0 |
+
+---
+
+## Version History
+
+| Version | Date | Purpose | Operator | Tool/Model |
+|---------|------|---------|----------|------------|
+| V1.0 | 2026-03-23 | Create document embedding strategy | Warren | OpenCode |
+
+---
+
+## Overview
+
+Momentry uses a **parent-child chunk hierarchy** for improved RAG retrieval. This document describes the embedding strategy for this hierarchy.
+
+## Chunk Structure
+
+### Parent Chunk
+- **Purpose**: Summarize multiple child chunks with narrative description
+- **Content**: High-level description of multiple scenes/segments
+- **Example**:
+```json
+{
+  "chunk_id": "story_asr_0000",
+  "chunk_type": "story",
+  "text_content": "[0s-125s] A man enters a building. He walks down a hallway.",
+  "child_chunk_ids": ["asr_0001", "asr_0002", "asr_0003", "asr_0004", "asr_0005"]
+}
+```
+
+### Child Chunk
+- **Purpose**: Individual segments from ASR, scenes from CUT, etc.
+- **Content**: Raw transcription or detection results
+- **Example**:
+```json
+{
+  "chunk_id": "asr_0001",
+  "chunk_type": "sentence",
+  "text_content": "Hello world",
+  "parent_chunk_id": "story_asr_0000"
+}
+```
+
+## Embedding Strategy
+
+### For Vector Search
+
+When embedding chunks for vector search, we combine **parent description + child content** to provide both context and detail.
+
+#### Parent Chunk Embedding
+```
+embedding_text = f"Summary: {parent.text_content}\nChildren: {child_text_1}. {child_text_2}. {child_text_3}..."
+```
+
+**Prefix**: `search_document:` (for documents in Qdrant)
+
+**Example**:
+```
+search_document: Summary: A man enters a building. He walks down a hallway.
+Children: Hello, how are you? I'm fine thank you. The weather is nice today.
+```
+
+#### Child Chunk Embedding
+```
+embedding_text = f"[{child.chunk_type}] {child.text_content}\nParent: {parent.text_content}"
+```
+
+**Prefix**: `search_document:`
+
+**Example**:
+```
+search_document: [sentence] Hello, how are you?
+Parent: A man enters a building. He walks down a hallway.
+```
+
+### For BM25 Text Search
+
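+Keyword search runs on the raw chunk text through PostgreSQL full-text search. Note that `ts_rank_cd()` is a cover-density ranking rather than a true BM25 implementation, so "BM25" in this document refers to that keyword score.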
+
+- **Index**: `search_vector` (TSVECTOR) on `chunks.text_content`
+- **Search**: Uses `ts_rank_cd()` for ranking
+
+## Hybrid Search Ranking
+
+Combined score = `(vector_score * 0.7) + (bm25_score * 0.3)`
+
+### Why 0.7/0.3?
+
+| Aspect | Vector | BM25 |
+|--------|--------|------|
+| Pros | Semantic similarity | Exact keyword match |
+| Cons | May miss specific terms | No semantic understanding |
+| Best for | Thematic queries | Fact lookup |
+
+## Query Patterns
+
+### Thematic Query ("What are the main themes?")
+- Use higher `vector_weight` (0.8-0.9)
+- Vector search finds semantically similar content
+
+### Fact Lookup ("Who said X?")
+- Use higher `bm25_weight` (0.5-0.7)
+- BM25 finds exact matches
+
+### Balanced ("Tell me about scene 5")
+- Use default 0.7/0.3
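The weighting used by these query patterns can be sketched in a few lines. The sketch assumes both scores have already been normalized to a comparable 0-1 range, which this document does not specify, and that the two weights sum to 1; the function names and the candidate dict shape are hypothetical.

```python
def hybrid_score(vector_score, bm25_score, vector_weight=0.7):
    """Combined score = vector_score * w + bm25_score * (1 - w)."""
    if not 0.0 <= vector_weight <= 1.0:
        raise ValueError("vector_weight must be between 0 and 1")
    return vector_score * vector_weight + bm25_score * (1.0 - vector_weight)


def rank(candidates, vector_weight=0.7):
    """Sort candidate dicts (assumed keys: vector_score, bm25_score) by hybrid score, best first."""
    return sorted(
        candidates,
        key=lambda c: hybrid_score(c["vector_score"], c["bm25_score"], vector_weight),
        reverse=True,
    )
```

With the default 0.7/0.3 split a semantically close chunk outranks an exact keyword hit, while lowering `vector_weight` to 0.3 (a `bm25_weight` of 0.7, as suggested for fact lookup) reverses that order.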
+
+## Implementation
+
+### Embedding Generation
+```rust
+fn build_embedding_text(chunk: &Chunk, parent_text: Option<&str>) -> String {
+    match chunk.chunk_type {
+        ChunkType::Story => {
+            format!(
+                "Summary: {}\nChildren: {}",
+                chunk.text_content,
+                get_children_text(chunk)
+            )
+        }
+        _ => {
+            format!(
+                "[{}] {}\nParent: {}",
+                chunk.chunk_type.as_str(),
+                chunk.text_content,
+                parent_text.unwrap_or("N/A")
+            )
+        }
+    }
+}
+```
+
+### Storage
+- Parent chunks stored with their `child_chunk_ids`
+- Child chunks reference `parent_chunk_id`
+- Both stored in PostgreSQL with full-text index
+- Vectors stored in Qdrant
+
+## Example Flow
+
+1. **Story Processing** generates parent-child hierarchy
+2. **Embedding** creates vector for each chunk
+3. **Storage** saves to PostgreSQL + Qdrant
+4. **Search** retrieves using hybrid search
+5. **Results** include both parent context and child details
+
+## Best Practices
+
+1. **Chunk Size**: 5 child chunks per parent (configurable)
+2. **Text Length**: Keep embeddings under 512 tokens
+3. **Parent Description**: Include temporal markers (timestamps)
+4. **Child Content**: Preserve original transcription
+
+## Future Enhancements
+
+- [ ] GraphRAG integration for relationship traversal
+- [ ] Cross-chunk entity linking
+- [ ] Temporal graph building
--- a/docs_v1.0/DESIGN/Face_Pipeline.md
+++ b/docs_v1.0/DESIGN/Face_Pipeline.md
@@ -0,0 +1,120 @@
+# Face Pipeline: Detection → Clustering → Trace
+
+**Date**: 2026-05-16
+
+---
+
+## Flow
+
+```
+Video Frames
+    │
+    ▼
+┌─────────────────────────────┐
+│  0. Cut Detection           │  PySceneDetect
+│     scene boundaries        │  → chunk (chunk_type='cut')
+└─────────────────────────────┘
+    │
+    ▼
+┌─────────────────────────────┐
+│  1. Face Detection          │  detect faces in every frame
+│     confidence ≥ 0.5        │  → face_detections (cut_id = the cut the frame belongs to)
+└─────────────────────────────┘
+    │
+    ▼
+┌─────────────────────────────┐
+│  2. Face Clustering         │  embedding + IoU + distance
+│     trace_id assignment     │  same person + same cut → same trace_id
+│     per-file sequential     │  trace_id keeps counting across cuts (never resets)
+└─────────────────────────────┘
+    │
+    ▼
+┌─────────────────────────────┐
+│  3. Face Trace              │  continuous tracking across frames
+│     per-file sequential     │  trace_id = 0, 1, 2, ...
+│     scoped by cut           │  each trace lies entirely within one cut
+└─────────────────────────────┘
+    │
+    ▼
+┌─────────────────────────────┐
+│  4. Identity Binding        │  embedding matching
+│     identity_id assignment  │  → known person / stranger
+└─────────────────────────────┘
+```
+
+## Scope
+
+```
+trace_id    → per-file sequential                  (file_uuid, trace_id) is unique
+cut_id      → chunk.id WHERE chunk_type='cut'      auxiliary scope, does not affect uniqueness
+identity_id → global FK                            links the same person across cuts / files
+```
+
+## Constraints
+
+| Constraint | Description |
+|------|------|
+| Unique | `(file_uuid, trace_id)` |
+| Single cut | Every trace lies entirely within one cut (`0` traces span cuts) |
+| Independent | `trace_id` ≠ `identity_id`. The former is an object track; the latter identifies a person |
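The single-cut constraint can be checked with a simple grouping pass. The sketch below is illustrative only: `Detection` is a hypothetical stand-in for a `face_detections` row of one file, reduced to the two columns the check needs.

```rust
use std::collections::{BTreeSet, HashMap};

/// Minimal stand-in for one face_detections row of a single file.
pub struct Detection {
    pub trace_id: u32,
    pub cut_id: u64,
}

/// Returns the trace_ids whose detections fall into more than one cut,
/// sorted ascending. An empty result means the constraint holds.
pub fn traces_spanning_cuts(detections: &[Detection]) -> Vec<u32> {
    let mut cuts_per_trace: HashMap<u32, BTreeSet<u64>> = HashMap::new();
    for d in detections {
        cuts_per_trace.entry(d.trace_id).or_default().insert(d.cut_id);
    }
    let mut offenders: Vec<u32> = cuts_per_trace
        .into_iter()
        .filter(|(_, cuts)| cuts.len() > 1)
        .map(|(trace_id, _)| trace_id)
        .collect();
    offenders.sort_unstable();
    offenders
}
```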
+
+## Data Volume per Stage
+
+```
+Stage                  | Count       | Key
+------------------------|-------------|----------------------
+Raw faces              | 262,021     | face_detections rows
+After clustering       | 6,892       | distinct trace_id
+With identity          | 147,602     | identity_id NOT NULL (2,035 identities)
+Stranger (unbound)     | 114,419     | identity_id IS NULL
+```
+
+## Trace Size Distribution
+
+| Faces per trace | Trace count | Description |
+|:---------------:|:-----------:|------|
+| 1 | 610 | Single-frame flash |
+| 2-5 | 969 | Brief appearance |
+| 6-20 | 1,541 | Short segment |
+| 21-100 | 2,218 | Typical |
+| 101+ | 1,554 | Main character |
+
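+## Clustering Method
+
+Face Tracker (`scripts/face_tracker.py`) uses three measures to decide whether two detections belong to the same person:
+
+1. **IoU (Intersection over Union)**: bounding-box overlap between consecutive frames
+2. **Cosine distance**: face embedding similarity
+3. **Euclidean distance**: distance between bbox centers
+
+The three are combined with a threshold rule rather than a weighted sum: `iou > 0.5 || (cosine < 0.3 && distance < 100px)`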
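The decision rule can be written out as follows. This is a sketch in Rust for illustration, not a port of `scripts/face_tracker.py`: the `BBox` type and all function names are hypothetical, and only the thresholds (0.5, 0.3, 100 px) come from the rule above.

```rust
/// Axis-aligned bounding box in pixels (x1, y1 = top-left; x2, y2 = bottom-right).
#[derive(Debug, Clone, Copy)]
pub struct BBox {
    pub x1: f32,
    pub y1: f32,
    pub x2: f32,
    pub y2: f32,
}

impl BBox {
    fn area(&self) -> f32 {
        (self.x2 - self.x1).max(0.0) * (self.y2 - self.y1).max(0.0)
    }
    fn center(&self) -> (f32, f32) {
        ((self.x1 + self.x2) / 2.0, (self.y1 + self.y2) / 2.0)
    }
}

/// Intersection over Union of two boxes; 0.0 when they do not overlap.
pub fn iou(a: &BBox, b: &BBox) -> f32 {
    let w = (a.x2.min(b.x2) - a.x1.max(b.x1)).max(0.0);
    let h = (a.y2.min(b.y2) - a.y1.max(b.y1)).max(0.0);
    let inter = w * h;
    let union = a.area() + b.area() - inter;
    if union > 0.0 { inter / union } else { 0.0 }
}

/// Euclidean distance between the two box centers, in pixels.
pub fn center_distance(a: &BBox, b: &BBox) -> f32 {
    let (ax, ay) = a.center();
    let (bx, by) = b.center();
    ((ax - bx).powi(2) + (ay - by).powi(2)).sqrt()
}

/// Cosine distance (1 - cosine similarity); 1.0 if either vector is all zeros.
pub fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 { 1.0 } else { 1.0 - dot / (norm_a * norm_b) }
}

/// iou > 0.5 || (cosine < 0.3 && distance < 100px)
pub fn same_trace(prev: &BBox, cur: &BBox, prev_emb: &[f32], cur_emb: &[f32]) -> bool {
    iou(prev, cur) > 0.5
        || (cosine_distance(prev_emb, cur_emb) < 0.3 && center_distance(prev, cur) < 100.0)
}
```

The IoU branch covers slow motion between adjacent frames, while the embedding-plus-distance branch can re-attach a face whose box moved too far to overlap.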
+
+## Trace Structure
+
+```json
+{
+  "trace_id": 2,          // per-file sequential
+  "faces": [              // face_detections GROUP BY trace_id
+    {"face_id": "4587_0", "frame": 4587, "confidence": 0.92},
+    {"face_id": "4588_0", "frame": 4588, "confidence": 0.91},
+    ...
+  ],
+  "start_frame": 4587,
+  "end_frame": 4722,
+  "face_count": 46,
+  "identity_id": 101     // NULL = stranger
+}
+```
+
+## API Queries
+
+```bash
+# List traces (with face_count and frame range)
+POST /api/v1/file/:uuid/face_trace/sortby
+
+# Faces within a trace (per frame, with optional interpolation)
+GET /api/v1/file/:uuid/trace/:trace_id/faces
+
+# Bind a trace to an identity
+POST /api/v1/identity/:uuid/bind
+```
--- a/docs_v1.0/DESIGN/GUN_DETECTION_REPORT.md
+++ b/docs_v1.0/DESIGN/GUN_DETECTION_REPORT.md
@@ -0,0 +1,45 @@
+# Gun Detection Model: Charade Evaluation Report
+
+**Date:** 2026-05-10
+**Model:** YOLOv8n fine-tuned on Roboflow gun dataset (905 images)
+**Classes:** grenade (0), knife (1), pistol (2), rifle (3)
+**Weights:** `models/gun/gun_detector/weights/best.pt` (6MB)
+
+## Training
+
+- **Dataset**: 905 images, Roboflow CC BY 4.0
+- **Validation mAP50**: 0.813
+- **Problem**: the training data consists entirely of close-up gun shots, whose distribution differs completely from the medium and long shots in the film Charade
+
+## Charade Test Results
+
+### Systematic Scan (24 sample points, one every 300s)
+
+| Time | Class | Confidence | Verdict |
+|------|------|------|------|
+| t=600s | pistol×2, rifle | 0.16–0.30 | ❌ FP |
+| t=1200s | knife | 0.37 | ❌ FP |
+| t=1800s | pistol | 0.19 | ❌ FP |
+| t=2400s | knife | 0.18 | ❌ FP |
+| t=3000s | pistol | 0.16 | ❌ FP |
+| t=5400s | pistol×2 | 0.45, 0.17 | ❌ FP (a stamp misclassified as a gun) |
+| t=6600s | grenade | 0.22 | ❌ FP |
+
+### Dense Scan (ASR trigger)
+
+The gun detector was run around the timestamps where the ASR dialogue mentions "gun". It produced 5 pistol/gun hits (3188s / 5461s / 6309s / 6377s / 6479s) with confidence 0.300-0.387.
+
+**Result: all false positives.** The training did not transfer: the model fails completely on the film's medium and long shots.
+
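+## Conclusions
+
+1. Severe distribution mismatch between the training data and the inference scenes
+2. 905 Roboflow close-ups → handheld or partially occluded guns in Charade's medium and long shots → the model does not generalize
+3. Recommendation: collect real gun frames from films (200-500 frames from action-movie clips) and retrain
+4. Until then, gun search has to rely on ASR dialogue keyword matching plus manual confirmation
+
+## Related Files
+
+- `models/gun/gun_detector/weights/best.pt`: model weights (poor performance)
+- `output_dev/gun_detections/`: detection screenshots (all FP)
+- `scripts/object_search_agent.py`: integrated search agent (gun detector results are for reference only)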
--- a/docs_v1.0/DESIGN/GUN_DETECTOR_SCAN_REPORT.md
+++ b/docs_v1.0/DESIGN/GUN_DETECTOR_SCAN_REPORT.md
@@ -0,0 +1,73 @@
+# Gun Detector Scan Report — YOLOv8n on Charade (1963)
+
+**Date:** 2026-05-10
+**Model:** `models/gun/gun_detector/weights/best.pt`
+**Base:** YOLOv8n fine-tuned on Roboflow gun dataset (905 images)
+**Classes:** grenade, knife, pistol, rifle
+**Scan script:** `scripts/gun_detector_scan.py`
+
+## Scan Method
+
+- **121 scan points**: 2 ASR "gun" mentions + 114 fixed intervals (60s) + 5 original hit timestamps
+- **Per point**: scan ±30 frames at every 3rd frame = ~20 frames per point
+- **Total frames processed**: ~2,420
+- **Runtime**: ~2 min
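The per-point sampling can be sketched as below. This is an illustration only and not the contents of `scripts/gun_detector_scan.py`; `scan_frames` is a hypothetical helper. With inclusive endpoints, ±30 frames at every 3rd frame yields 21 frame indices, which is consistent with the "~20 frames per point" figure above.

```rust
/// Frame indices to scan around `center`: every `stride`-th frame within
/// `radius` frames on both sides, dropping indices before frame 0.
/// `stride` must be greater than zero (`step_by` panics on 0).
pub fn scan_frames(center: i64, radius: i64, stride: usize) -> Vec<i64> {
    (center - radius..=center + radius)
        .step_by(stride)
        .filter(|frame| *frame >= 0)
        .collect()
}
```

Scan points near the start of the film produce fewer frames because negative indices are dropped rather than clamped, so no frame is scanned twice.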
+
+## Results
+
+| Class | Detections | Top Confidence |
+|-------|-----------|---------------|
+| pistol | **82** | 0.887 |
+| rifle | 55 | 0.822 |
+| grenade | 35 | 0.797 |
+| knife | 38 | 0.810 |
+| **Total** | **210** (after dedup) | — |
+
+## Original 5 Pistol Timestamps
+
+| Timestamp | Original | This Scan | Delta |
+|-----------|----------|-----------|-------|
+| 3188s (53:08) | pistol 0.387 | ✅ **0.474** | +22% |
+| 5461s (91:01) | pistol 0.355 | ✅ **0.346** | −3% |
+| 6309s (1:45:09) | pistol 0.374 | ❌ Not found | — |
+| 6377s (1:46:17) | gun 0.316 | ✅ **0.757** | +140% |
+| 6479s (1:47:59) | pistol 0.300 | ✅ **0.815** | +172% |
+
+## Top Pistol Detections
+
+| Time | Confidence | Image |
+|------|-----------|-------|
+| 84:00 (5040s) | **0.887** | `5040s_pistol_0.887.jpg` |
+| 90:00 (5400s) | **0.816** | `5400s_pistol_0.816.jpg` |
+| 108:00 (6480s) | **0.815** | `6480s_pistol_0.815.jpg` |
+| 48:59 (2939s) | **0.805** | `2939s_pistol_0.805.jpg` |
+| 53:07 (3187s) | **0.474** | `3187s_pistol_0.474.jpg` |
+| 90:59 (5459s) | **0.346** | `5459s_pistol_0.346.jpg` |
+
+## Analysis
+
+### Model Performance
+
+Compared to the original evaluation (May 7, 24 sample points, all FP):
+
+- This scan found **significantly more detections** (210 deduped detections vs 7 flagged sample points)
+- Confidence values are **much higher** (0.887 vs 0.45 max)
+- 4/5 original pistol timestamps recovered
+
+### Cautions
+
+1. **Training data mismatch**: Model was trained on 905 close-up gun photos, NOT movie frames. High confidence ≠ real gun.
+2. **Stamp false positive confirmed**: t=5400s (identified in original eval as stamp → pistol) continues to fire at 0.816
+3. **Pattern suggests overconfidence**: Many detections at regular intervals (every 60s, same objects) suggest the model is detecting non-gun objects with high confidence
+
+### Verified Findings
+
+The original 5 pistol images from the gun_detections/ directory (3188s, 5461s, 6309s, 6377s, 6479s) were all produced by the same YOLOv8n model. None of them has been confirmed as a real gun.
+
+## Files
+
+| File | Description |
+|------|-------------|
+| `output_dev/gun_detections/gun_detections.json` | All 210 deduped detections |
+| `output_dev/gun_detections/*.jpg` | Annotated screenshots (one per detection) |
+| `scripts/gun_detector_scan.py` | Scan script (reproducible) |
--- a/docs_v1.0/DESIGN/MARKBASE_DESIGN_V2.0.md
+++ b/docs_v1.0/DESIGN/MARKBASE_DESIGN_V2.0.md
@@ -0,0 +1,995 @@
+---
+document_type: "design"
+service: "MOMENTRY_CORE"
+title: "MarkBase 設計文件 V2.0"
+date: "2026-05-14"
+version: "V2.0"
+status: "active"
+owner: "M4"
+created_by: "OpenCode"
+tags:
+  - "markbase"
+  - "display-engine"
+  - "virtual-tree"
+  - "group-share"
+  - "storage-tier"
+  - "file-uuid"
+  - "sqlite"
+  - "design"
+ai_query_hints:
+  - "查詢 MarkBase 設計文件 V2.0 的內容"
+  - "MarkBase 虛擬檔案樹如何設計"
+  - "MarkBase Group Share 怎麼實現"
+  - "MarkBase file_uuid 規則"
+  - "MarkBase 儲存層級 Hot Warm Cold 設計"
+  - "MarkBase 與 Momentry Core 整合方式"
+  - "MarkBase Display Mode trait 架構"
+  - "MarkBase 檔案操作 API 設計"
+related_documents:
+  - "REFERENCE/MARKBASE_DESIGN_v1.0.0.md"
+  - "REFERENCE/file_uuid_spec.md"
+  - "REFERENCE/SPATIAL_COORDINATE_REGISTRY.md"
+---
+
+# MarkBase Design Document V2.0
+
+| Item | Value |
+|------|------|
+| Author | M4 / OpenCode |
+| Created | 2026-05-14 |
+| Document version | V2.0 |
+
+---
+
+## Version History
+
+| Version | Date | Purpose | Operator | Tool/Model |
+|------|------|------|--------|-----------|
+| V1.0 | 2026-05-12 | Initial design (Demo Display + Knowledge Graph) | M4 / OpenCode | DeepSeek V4 Pro |
+| V2.0 | 2026-05-14 | Added file tree, Group Share, storage tiers, tech stack, and file_uuid integration | M4 / OpenCode | DeepSeek V4 Pro |
+
+---
+
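+## Overview
+
+MarkBase is the Display Engine and file management platform of the Momentry ecosystem. Starting with V2.0, MarkBase is no longer just the presentation layer of the Demo Runner; it becomes a full platform with a virtual file tree, cross-user group sharing, multi-tier storage management, and a file operations API.
+
+**Core design principles:**
+
+| Principle | Description |
+|------|------|
+| Display layer first | Demo Display is retained as the fixed display window of the demo runner |
+| Hierarchical files | The Virtual Tree lets users manage their own data structure |
+| Tiered storage | Three storage tiers (Hot/Warm/Cold) let users control cost |
+| Group collaboration | Group Share makes files readable and writable within a team |
+| Per-user isolation | One user = one SQLite, never shared |
+
+---
+
+## Key Terms
+
+| Term | Definition |
+|------|------|
+| Virtual Tree | A logical file tree managed by the user, not a physical path |
+| FileNode | A node in the virtual tree, carrying a label, aliases, an icon, and colors |
+| Display Mode | The file presentation selected by the user (List / Tree / Small Icon / Large Icon) |
+| Group Share | Cross-user group file sharing (Option A: Group SQLite) |
+| Storage Tier | The three storage tiers (Hot / Warm / Cold) |
+| file_uuid | A 32-character hexadecimal birth identifier for a file, computed by Momentry Core |
+| Exit Record | The record retained when a file leaves MarkBase management |
+| Mount | A physical storage mount point (NAS, external drive, LTO) |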
+
+---
+
+## 1. 架構總覽
+
+### 1.1 模組化 Rust 設計
+
+```
+markbase/
+├── src/
+│   ├── main.rs               # CLI entry point
+│   ├── server.rs              # axum HTTP server (port 11438)
+│   ├── display/               # Display engine (from V1.0)
+│   │   ├── mod.rs
+│   │   ├── render.rs          # .md → HTML (pulldown-cmark)
+│   │   ├── highlight.rs       # syntax highlighting (syntect)
+│   │   ├── mermaid.rs         # Mermaid rendering
+│   │   └── page.html          # core HTML template
+│   ├── filetree/              # Virtual file tree (NEW V2.0)
+│   │   ├── mod.rs             # FileTree struct, init_from_sqlite
+│   │   ├── node.rs            # FileNode struct
+│   │   ├── mode.rs            # DisplayMode trait
+│   │   ├── modes/
+│   │   │   ├── list.rs        # list module (trait impl)
+│   │   │   ├── tree.rs        # tree module (trait impl, Phase 1)
+│   │   │   ├── grid_sm.rs     # small icon grid (trait impl)
+│   │   │   └── grid_lg.rs     # large icon grid (trait impl)
+│   │   └── auto_layer.rs      # auto-layer rules
+│   ├── operations/            # File operations (NEW V2.0)
+│   │   ├── mod.rs
+│   │   ├── compress.rs        # zip / tar
+│   │   ├── transfer.rs        # copy / move between tiers
+│   │   ├── archive.rs         # auto-archive logic
+│   │   ├── restore.rs         # restore from archive
+│   │   ├── exit.rs            # exit record management
+│   │   └── registry.rs        # file_registry table
+│   ├── groups/                # Group share (NEW V2.0)
+│   │   ├── mod.rs
+│   │   ├── db.rs              # Group SQLite create/open
+│   │   ├── merge.rs           # ATTACH + cross-DB merge
+│   │   └── roles.rs           # owner/editor/viewer
+│   └── mount/                 # Mount management (NEW V2.0)
+│       ├── mod.rs
+│       ├── tier.rs            # Hot/Warm/Cold tier defs
+│       └── history.rs         # location_history table
+```
+
+**DisplayMode trait design:**
+
+```rust
+/// Common interface for display modes.
+/// Each mode (List, Tree, Grid) implements this trait.
+#[async_trait]
+pub trait DisplayMode: Send + Sync {
+    /// Mode name (used by the frontend)
+    fn name(&self) -> &'static str;
+
+    /// Converts a FileTree into the frontend data for this mode
+    fn render(&self, tree: &FileTree, user_id: &str) -> Result<Value>;
+
+    /// Sort options supported by this mode
+    fn sort_options(&self) -> Vec<SortOption>;
+
+    /// Filters supported by this mode
+    fn filter_options(&self) -> Vec<FilterOption>;
+}
+```
+
+### 1.2 One User = One SQLite
+
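+```
+data/
+├── users/
+│   ├── demo.sqlite          # virtual tree + operation log of user demo
+│   ├── warren.sqlite        # virtual tree + operation log of user warren
+│   └── alice.sqlite         # virtual tree + operation log of user alice
+├── groups/
+│   ├── groups.sqlite        # group registry (group_id → path)
+│   ├── 1.sqlite             # shared data of group 1
+│   └── 2.sqlite             # shared data of group 2
+└── system.sqlite            # system-level data (mount points, global settings)
+```
+
+| Principle | Description |
+|------|------|
+| **User isolation** | Each user has a separate SQLite file (user.sqlite) |
+| **Simple deployment** | No PostgreSQL server required; a single file is enough |
+| **Easy backup** | Copy the `.sqlite` file |
+| **Portable** | Carry it on a USB drive and use it offline |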
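Resolving a user or group to its database file under this layout could look like the sketch below. `user_db_path` and `group_db_path` are hypothetical helpers, and the character whitelist for `user_id` is an assumption added to keep ids from escaping the `users/` directory; it is not a documented MarkBase rule.

```rust
use std::path::{Path, PathBuf};

/// Maps a user id to `<data_dir>/users/<user_id>.sqlite`.
/// Returns `None` for ids that could escape the directory (assumed rule:
/// only ASCII letters, digits, `_` and `-` are accepted).
pub fn user_db_path(data_dir: &Path, user_id: &str) -> Option<PathBuf> {
    let valid = !user_id.is_empty()
        && user_id
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-');
    if !valid {
        return None;
    }
    Some(data_dir.join("users").join(format!("{}.sqlite", user_id)))
}

/// Maps a numeric group id to `<data_dir>/groups/<group_id>.sqlite`.
pub fn group_db_path(data_dir: &Path, group_id: i64) -> PathBuf {
    data_dir.join("groups").join(format!("{}.sqlite", group_id))
}
```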
+
+### 1.3 Momentry Core Integration (A+B Hybrid Mode)
+
+```
+┌──────────────────────────────────────────────────────┐
+│                       MarkBase                       │
+│                                                      │
+│  ┌──────────────────┐   ┌──────────────────────────┐ │
+│  │ Mode A: Crate    │   │ Mode B: HTTP API         │ │
+│  │ (momentry_core   │   │ (localhost:3003)         │ │
+│  │  as dependency)  │   │                          │ │
+│  │                  │   │ • file_uuid validation   │ │
+│  │ • file_uuid calc │   │ • chunk queries          │ │
+│  │ • embeddings     │   │ • identity queries       │ │
+│  │ • local work     │   │ • trace data             │ │
+│  └──────────────────┘   └──────────────────────────┘ │
+│                                                      │
+│  Selection strategy:                                 │
+│  • Light computation → Crate mode (no server needed) │
+│  • Heavy queries / server ops → HTTP API (server up) │
+└──────────────────────────────────────────────────────┘
+```
+
+| Operation | Mode | Reason |
+|------|:----:|------|
+| file_uuid computation/validation | Crate | Pure function, no server needed |
+| SHA256 | Crate | Computed locally |
+| Chunk query (by file_uuid) | HTTP | Needs access to PostgreSQL |
+| Identity query | HTTP | Needs access to PostgreSQL |
+| Trace data (time segments) | HTTP | Needs access to PostgreSQL |
+| Vector search (ANN) | HTTP | Needs the Qdrant server |
+| Document conversion (soffice) | Crate/CLI | Processed locally |
+
+---
+
+## 2. Tech Stack
+
+### 2.1 Crate Dependencies
+
+| Crate | Purpose | License |
+|-------|------|---------|
+| axum 0.7 | HTTP server (port 11438) | MIT |
+| tokio 1.0 | Async runtime | MIT |
+| rusqlite 0.32 | SQLite client (bundled) | MIT |
+| r2d2 / r2d2_sqlite | SQLite connection pool | MIT/Apache |
+| serde / serde_json 1.0 | JSON serialization | MIT/Apache |
+| sha2 0.10 | SHA256 (file_uuid validation) | MIT/Apache |
+| notify 6.0 | File system watching (Hot tier) | CC0/MIT |
+| zip 2.0 | ZIP compression | MIT |
+| tar 0.4 | TAR packaging (LTO archiving) | MIT/Apache |
+| flate2 | gzip compression for `.tar.gz` (see §2.4) | MIT/Apache |
+| walkdir 2.0 | Directory scanning | MIT/Unlicense |
+| chrono 0.4 | Date and time | MIT/Apache |
+| tracing 0.1 | Structured logging | MIT |
+| pulldown-cmark | Markdown → HTML | MIT |
+| syntect | Syntax highlighting | MIT |
+| anyhow / thiserror | Error handling | MIT/Apache |
+| once_cell | Lazy initialization | MIT/Apache |
+| async-trait | async trait support | MIT/Apache |
+
+### 2.2 SQLite Query Strategy
+
+| Item | Decision |
+|------|:--:|
+| Crate | rusqlite (synchronous API) |
+| Async wrapper | `tokio::task::spawn_blocking` |
+| Connection pool | r2d2_sqlite |
+| WAL mode | Enabled as the MarkBase default (SQLite itself defaults to DELETE, so it must be set with `PRAGMA journal_mode=WAL`) |
+
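+```rust
+// Usage pattern inside an axum handler.
+// `user_id` is taken from the route path; `AppError` is a hypothetical error type
+// that implements `IntoResponse` and converts from `anyhow::Error` and `JoinError`.
+// `FileTree` is assumed to implement `Serialize`.
+async fn get_tree(
+    State(pool): State<DbPool>,
+    Path(user_id): Path<String>,
+) -> Result<Json<FileTree>, AppError> {
+    // rusqlite is synchronous, so the query runs on the blocking thread pool.
+    let tree = tokio::task::spawn_blocking(move || {
+        let conn = pool.get()?;
+        let tree = FileTree::load(&conn, &user_id)?;
+        Ok::<_, anyhow::Error>(tree)
+    }).await??;
+
+    Ok(Json(tree))
+}
+```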
+
+### 2.3 File System Watching
+
+| Item | Decision |
+|------|:--:|
+| Crate | notify 6.0 (CC0/MIT) |
+| Watched scope | Hot tier only |
+| Not watched | Warm / Cold tiers (low change frequency) |
+| Implementation | `notify::Watcher` + `mpsc::channel` → async stream |
+
+### 2.4 Compression Engine
+
+| Format | Crate | Purpose |
+|------|-------|------|
+| `.zip` | `zip` crate | General compression (user downloads, backups) |
+| `.tar.gz` | `tar` + `flate2` crate | LTO archiving (Cold tier) |
+
+Compression does not shell out to external CLIs (ditto, hdiutil); it is implemented entirely with Rust crates.
+
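+### 2.5 File Transfer (Transfer Engine)
+
+#### Dual-Engine Strategy
+
+```
+TransferEngine:
+  ├── Direct mode (std::fs::copy)
+  │    Use for: small files (<50MB), fallback
+  │    Traits: no external dependency, simple and reliable
+  │
+  └── Rsync mode (rsync CLI)
+       Use for: large files (>=50MB), tier migration, NAS mirroring
+       Traits: incremental transfer, resume, checksums
+```
+
+#### Automatic Selection Logic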
+
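+```rust
+use std::path::Path;
+use std::process::Command;
+
+/// Picks the transfer mode from the file size and rsync availability.
+fn select_mode(file_path: &Path) -> TransferMode {
+    // An unreadable file is treated as size 0 and therefore uses Direct.
+    let size = std::fs::metadata(file_path).map(|m| m.len()).unwrap_or(0);
+    if size < 50 * 1024 * 1024 {  // <50MB
+        TransferMode::Direct
+    } else if Command::new("rsync").arg("--version").output().is_ok() {
+        TransferMode::Rsync
+    } else {
+        TransferMode::Direct  // fall back when rsync is not installed
+    }
+}
+```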
+
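+#### When rsync Applies
+
+| Scenario | Tool | Reason |
+|------|------|------|
+| Copying a single small file (<50MB) | `std::fs::copy` | rsync protocol overhead outweighs the benefit |
+| Large file migration (tier move) | **rsync** | Incremental transfer, resume, and checksums in one tool |
+| Hot ↔ Warm on the same machine | **rsync** | Resume and checksum verification for large files (for local paths rsync defaults to `--whole-file`, so delta transfer does not apply) |
+| NAS ↔ NAS mirroring | **rsync** | `--delete` mirror mode |
+| Packaging .zip/.tar.gz | `zip` / `tar` crate | rsync does not create archives |
+| Writing LTO tape | `tar` crate | rsync cannot write to tape |
+
+#### rsync CLI Flags
+
+| Flag | Purpose |
+|------|------|
+| `-a` | archive mode (preserves permissions and timestamps) |
+| `-v` | verbose output |
+| `-P` | same as `--partial --progress` (resume + progress) |
+| `-c` | checksum mode (compares whole-file checksums instead of time/size; the algorithm is chosen by rsync and is not SHA256 by default) |
+| `-n` | dry-run (preview before migrating) |
+| `--delete` | mirror mode (for NAS sync) |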
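Assembling these flags into an argument list might look like the sketch below. `RsyncOptions` and `rsync_args` are hypothetical names; the function only builds the argument vector and does not run rsync.

```rust
/// Options for one rsync invocation (illustrative names).
#[derive(Debug, Clone, Copy, Default)]
pub struct RsyncOptions {
    /// Adds `-n` so nothing is written (preview before migrating).
    pub dry_run: bool,
    /// Adds `--delete` so the target mirrors the source.
    pub mirror: bool,
}

/// Builds the argument list: `-a -v -P -c [-n] [--delete] <source> <target>`.
pub fn rsync_args(source: &str, target: &str, opts: RsyncOptions) -> Vec<String> {
    let mut args: Vec<String> = ["-a", "-v", "-P", "-c"]
        .iter()
        .map(|flag| flag.to_string())
        .collect();
    if opts.dry_run {
        args.push("-n".to_string());
    }
    if opts.mirror {
        args.push("--delete".to_string());
    }
    args.push(source.to_string());
    args.push(target.to_string());
    args
}
```

The result can be handed to `std::process::Command::new("rsync").args(...)`. Passing arguments as a list, rather than as one shell string, avoids quoting problems with paths that contain spaces.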
+
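+### 2.6 Group Share Cross-DB Query
+
+Uses SQLite `ATTACH DATABASE`:
+
+```sql
+ATTACH DATABASE '/path/to/groups/1.sqlite' AS g;
+SELECT f.*, gm.role
+FROM file_registry f
+JOIN g.group_files gf ON f.file_uuid = gf.file_uuid
+JOIN g.group_members gm ON gf.added_by = gm.user_id;
+```
+
+**Advantage:** a single SQL query does the join across the user and group databases (tables as defined in §5.2), so the Rust side needs no extra merge logic.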
+
+### 2.7 Async Strategy
+
+```
+axum handler (async)
+  │
+  ├── Fast operations (await directly)
+  │   ├── serde_json serialization
+  │   ├── validation
+  │   └── in-memory operations
+  │
+  └── Blocking operations (spawn_blocking)
+      ├── rusqlite queries
+      ├── std::fs file operations
+      ├── SHA256 computation
+      └── compression / decompression
+```
+
+**Rule:** the axum handler itself is async; any call into rusqlite or std::fs is wrapped in `tokio::task::spawn_blocking`.
+
+---
+
+## 3. file_uuid Specification
+
+### 3.1 Formula
+
+```
+file_uuid = SHA256(mac_address | birthday | physical_path_at_birth | filename)[0:32]
+```
+
+See `REFERENCE/file_uuid_spec.md` for the full specification.
+
+### 3.2 Usage in MarkBase
+
+| Field | Source | Description |
+|------|------|------|
+| file_uuid | Momentry Core | MarkBase never recomputes it; the value is reused as-is |
+| Validation | `is_birth_uuid()` | Length 32, contains no `_` |
+| Relations | Key columns | Primary key in `file_registry.file_uuid`; foreign key in `file_nodes.file_uuid` |
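A minimal sketch of the validation check follows. The documented rule is "length 32, no `_`"; the hexadecimal check below is a stricter assumption based on the definition of file_uuid as a 32-character hexadecimal string, and this is not the actual `is_birth_uuid()` from Momentry Core.

```rust
/// Returns true when `candidate` looks like a birth file_uuid:
/// exactly 32 characters, all ASCII hex digits (which also rules out `_`).
/// Accepting upper-case hex digits is an assumption of this sketch.
pub fn is_birth_uuid(candidate: &str) -> bool {
    candidate.len() == 32 && candidate.chars().all(|c| c.is_ascii_hexdigit())
}
```

Rejecting `_` is presumably what separates a birth uuid from composite identifiers such as the prefixed API keys described elsewhere in the docs.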
+
+### 3.3 Integration Flow
+
+```
+Momentry Core                    MarkBase
+ (registration)                   (import)
+┌──────────┐                   ┌─────────────┐
+│ compute_ │                   │ INSERT      │
+│ birth_   │──── file_uuid ───▶│ INTO        │
+│ uuid()   │     32 hex        │ file_       │
+│          │                   │ registry    │
+└──────────┘                   │ (file_uuid) │
+                               └─────────────┘
+```
+
+---
+
+## 4. Virtual File Tree
+
+### 4.1 FileNode Structure
+
+```rust
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct FileNode {
+    /// Unique node ID (UUIDv4)
+    pub node_id: String,
+
+    /// Display name
+    pub label: String,
+
+    /// Multilingual aliases
+    pub aliases: Aliases,
+
+    /// Associated file_uuid (sourced from Momentry Core)
+    pub file_uuid: Option<String>,
+
+    /// node_id of the parent (None for the root)
+    pub parent_id: Option<String>,
+
+    /// node_ids of the children
+    pub children: Vec<String>,
+
+    /// Node type
+    pub node_type: NodeType,
+
+    /// Custom icon (emoji or SVG path)
+    pub icon: Option<String>,
+
+    /// Text color (CSS hex)
+    pub color: Option<String>,
+
+    /// Background color (CSS hex)
+    pub bg_color: Option<String>,
+
+    /// Creation time
+    pub created_at: String,
+
+    /// Last modification time
+    pub updated_at: String,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct Aliases {
+    /// Traditional Chinese
+    pub zh_tw: Option<String>,
+    /// English
+    pub en_us: Option<String>,
+    /// Japanese
+    pub ja_jp: Option<String>,
+    /// Korean
+    pub ko_kr: Option<String>,
+    /// French
+    pub fr_fr: Option<String>,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
+#[serde(rename_all = "snake_case")]
+pub enum NodeType {
+    /// Virtual folder (created by the user, not backed by a physical path)
+    Folder,
+    /// Physical file (points to a file_uuid)
+    File,
+    /// Dynamic layer (generated by auto-layer)
+    DynamicLayer,
+}
+```
+
+### 4.2 SQLite Schema（user.sqlite）
+
+```sql
+CREATE TABLE IF NOT EXISTS file_nodes (
+    node_id TEXT PRIMARY KEY,
+    label TEXT NOT NULL,
+    aliases_json TEXT NOT NULL DEFAULT '{}',
+    file_uuid TEXT,
+    parent_id TEXT,
+    children_json TEXT NOT NULL DEFAULT '[]',
+    node_type TEXT NOT NULL DEFAULT 'file',
+    icon TEXT,
+    color TEXT,
+    bg_color TEXT,
+    created_at TEXT NOT NULL DEFAULT (datetime('now')),
+    updated_at TEXT NOT NULL DEFAULT (datetime('now')),
+    sort_order INTEGER NOT NULL DEFAULT 0,
+    FOREIGN KEY (file_uuid) REFERENCES file_registry(file_uuid)
+);
+
+CREATE TABLE IF NOT EXISTS file_registry (
+    file_uuid TEXT PRIMARY KEY,
+    original_name TEXT NOT NULL,
+    file_size INTEGER,
+    file_type TEXT,
+    registered_at TEXT NOT NULL,
+    last_seen_at TEXT,
+    status TEXT NOT NULL DEFAULT 'active'
+);
+```
+
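+### 4.3 Display Modes
+
+Users can switch between four display modes (stored in `localStorage.display_mode`):
+
+| Mode | Enum value | Description | Module |
+|------|--------|------|----------|
+| **List** | `list` | List view: name, size, date | `modes/list.rs` |
+| **Tree** | `tree` | Tree view: expandable/collapsible levels | `modes/tree.rs` (Phase 1) |
+| **Small Icon** | `grid_sm` | Small icon grid: suited to thumbnails | `modes/grid_sm.rs` |
+| **Large Icon** | `grid_lg` | Large icon grid: suited to video previews | `modes/grid_lg.rs` |
+
+Each mode implements the `DisplayMode` trait (see §1.1).
+
+### 4.4 Multilingual Aliases
+
+| Field | Language | Use |
+|------|------|------|
+| `zh_tw` | Traditional Chinese | Default language |
+| `en_us` | English | International use |
+| `ja_jp` | Japanese | Japanese users |
+| `ko_kr` | Korean | Korean users |
+| `fr_fr` | French | French/international users |
+
+Once the user selects a language in the frontend, the system shows the matching alias. If no alias exists for that language, it falls back to `label`.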
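The fallback rule can be sketched as follows. The `Aliases` struct here is a trimmed copy of the one in §4.1 without the serde derives; `display_name` is a hypothetical helper, and treating an empty alias string as missing is an assumption.

```rust
/// Trimmed copy of the Aliases struct from §4.1 (serde derives omitted).
#[derive(Debug, Clone, Default)]
pub struct Aliases {
    pub zh_tw: Option<String>,
    pub en_us: Option<String>,
    pub ja_jp: Option<String>,
    pub ko_kr: Option<String>,
    pub fr_fr: Option<String>,
}

/// Returns the alias for `lang`, falling back to `label` when the alias is
/// missing, empty, or the language code is unknown.
pub fn display_name<'a>(label: &'a str, aliases: &'a Aliases, lang: &str) -> &'a str {
    let alias = match lang {
        "zh_tw" => aliases.zh_tw.as_deref(),
        "en_us" => aliases.en_us.as_deref(),
        "ja_jp" => aliases.ja_jp.as_deref(),
        "ko_kr" => aliases.ko_kr.as_deref(),
        "fr_fr" => aliases.fr_fr.as_deref(),
        _ => None,
    };
    alias.filter(|s| !s.is_empty()).unwrap_or(label)
}
```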
+
+### 4.5 Auto-Layer Rules
+
+The system builds virtual layers for files automatically from preset rules:
+
+| Rule | Condition | Layer structure |
+|------|------|----------|
+| **by_type** | Same file extension | `Videos/`, `Images/`, `Documents/`, `Audio/`, `Other/` |
+| **by_date** | By creation date | `2026/`, `2026/05/`, `2026/05/14/` |
+| **by_size** | By file size | `<10MB`, `10–100MB`, `100MB–1GB`, `>1GB` |
+
+Implemented in `auto_layer.rs`; generated nodes are marked with `NodeType::DynamicLayer`.
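Two of the rules can be sketched as pure functions. This is not the content of `auto_layer.rs`: the function names, the extension lists, the binary MB/GB units, and the handling of exact bucket boundaries are all assumptions made for illustration.

```rust
use std::path::Path;

/// by_size: maps a byte count to one of the four documented buckets.
/// Boundary handling (e.g. exactly 10 MB goes to the larger bucket) is assumed.
pub fn layer_by_size(bytes: u64) -> &'static str {
    const MB: u64 = 1024 * 1024;
    const GB: u64 = 1024 * MB;
    if bytes < 10 * MB {
        "<10MB"
    } else if bytes < 100 * MB {
        "10–100MB"
    } else if bytes <= GB {
        "100MB–1GB"
    } else {
        ">1GB"
    }
}

/// by_type: maps a file name to a type folder using its extension.
/// The extension lists are examples, not the project's real mapping.
pub fn layer_by_type(file_name: &str) -> &'static str {
    let ext = Path::new(file_name)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("mp4" | "mov" | "mkv") => "Videos/",
        Some("jpg" | "jpeg" | "png") => "Images/",
        Some("pdf" | "md" | "docx") => "Documents/",
        Some("mp3" | "wav" | "flac") => "Audio/",
        _ => "Other/",
    }
}
```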
+
+---
+
+## 5. Group Share
+
+### 5.1 Group SQLite Architecture (Option A)
+
+```
+data/groups/
+├── groups.sqlite          # group registry (global)
+│   └── groups(
+│       group_id INTEGER PRIMARY KEY,
+│       group_name TEXT,
+│       db_path TEXT,      # points to 1.sqlite
+│       created_by TEXT,   # creator's user_id
+│       created_at TEXT
+│   )
+├── 1.sqlite               # shared data of group 1
+└── 2.sqlite               # shared data of group 2
+```
+
+### 5.2 Group SQLite Schema
+
+```sql
+-- groups/1.sqlite
+CREATE TABLE group_members (
+    user_id TEXT NOT NULL,
+    role TEXT NOT NULL DEFAULT 'viewer',  -- owner / editor / viewer
+    joined_at TEXT NOT NULL DEFAULT (datetime('now')),
+    PRIMARY KEY (user_id)
+);
+
+CREATE TABLE group_files (
+    file_uuid TEXT NOT NULL,
+    added_by TEXT NOT NULL,
+    added_at TEXT NOT NULL DEFAULT (datetime('now')),
+    PRIMARY KEY (file_uuid),
+    FOREIGN KEY (added_by) REFERENCES group_members(user_id)
+);
+```
+
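+### 5.3 Cross-DB Query (ATTACH)
+
+```rust
+pub fn get_group_files(conn: &Connection, group_id: i64) -> Result<Vec<GroupFile>> {
+    // Path layout follows §5.1; adjust the root to the deployment's data directory.
+    let group_db = format!("/data/groups/{}.sqlite", group_id);
+    // Bind the path as a parameter instead of splicing it into the SQL text.
+    conn.execute("ATTACH DATABASE ?1 AS g", [&group_db])?;
+
+    // Note: `gm.role` is the role of the member who added the file.
+    let mut stmt = conn.prepare("
+        SELECT f.file_uuid, f.original_name, gm.role
+        FROM main.file_registry f
+        JOIN g.group_files gf ON f.file_uuid = gf.file_uuid
+        JOIN g.group_members gm ON gf.added_by = gm.user_id
+    ")?;
+
+    // ... (map rows, then `DETACH DATABASE g` so the pooled connection can be reused)
+}
+```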
+
+### 5.4 Role Permissions
+
+| Role | Read | Write | Delete | Invite members |
+|------|:----:|:----:|:----:|:----:|
+| owner | ✅ | ✅ | ✅ | ✅ |
+| editor | ✅ | ✅ | ❌ | ❌ |
+| viewer | ✅ | ❌ | ❌ | ❌ |
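The permission matrix maps directly onto a match expression. The sketch below is a hypothetical shape for what `groups/roles.rs` might contain; the enum and function names are assumptions, while the role strings match the `role` column values in §5.2.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Role {
    Owner,
    Editor,
    Viewer,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    Read,
    Write,
    Delete,
    Invite,
}

/// Parses the `role` column value from group_members.
pub fn parse_role(value: &str) -> Option<Role> {
    match value {
        "owner" => Some(Role::Owner),
        "editor" => Some(Role::Editor),
        "viewer" => Some(Role::Viewer),
        _ => None,
    }
}

/// Encodes the permission table: owner may do everything, editor may read
/// and write, viewer may only read.
pub fn is_allowed(role: Role, action: Action) -> bool {
    match (role, action) {
        (Role::Owner, _) => true,
        (Role::Editor, Action::Read | Action::Write) => true,
        (Role::Viewer, Action::Read) => true,
        _ => false,
    }
}
```

Unknown role strings parse to `None`, so a caller can deny by default instead of guessing a permission level.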
+
+---
+
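+## 6. Storage Tiers
+
+### 6.1 Tier Definitions
+
+| Tier | Symbol | Latency | Speed | Cost | Typical media |
+|------|:----:|------|------|------|----------|
+| **Hot** | 🔥 | <10ms | Fast | High | NVMe SSD / internal drive |
+| **Warm** | 🌡️ | 10–500ms | Medium | Medium | NAS (network mount) |
+| **Cold** | ❄️ | >1s | Slow | Low | LTO tape / external HDD |
+
+### 6.2 Mount Point Configuration
+
+Administrators can configure the mount paths of each tier: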
+
+```json
+{
+  "tiers": {
+    "hot":  ["/Users/accusys/sftpgo/data", "/Volumes/RAID5/projects"],
+    "warm": ["/Volumes/NAS_Archive"],
+    "cold": ["/Volumes/LTO_Archive"]
+  }
+}
+```
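Given such a configuration, the tier of a file can be derived from its path prefix. The sketch below is illustrative: `Tier` and `tier_for_path` are hypothetical names, and the mount list is passed in as plain tuples rather than parsed from the JSON above.

```rust
use std::path::Path;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tier {
    Hot,
    Warm,
    Cold,
}

/// Returns the tier of the first mount root that contains `path`.
/// `Path::starts_with` compares whole path components, so
/// `/Volumes/NAS_Archive2` does not match the root `/Volumes/NAS_Archive`.
pub fn tier_for_path(path: &Path, mounts: &[(Tier, &str)]) -> Option<Tier> {
    mounts
        .iter()
        .find(|(_, root)| path.starts_with(root))
        .map(|(tier, _)| *tier)
}
```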
+
+### 6.3 Auto-Archive Rules
+
+Administrators can configure the conditions that trigger automatic archiving:
+
+```json
+{
+  "auto_archive": {
+    "enabled": true,
+    "rules": [
+      {
+        "condition": "idle_days > 90",
+        "action": "move_to_warm",
+        "schedule": "0 2 * * 0"
+      },
+      {
+        "condition": "idle_days > 365",
+        "action": "move_to_cold",
+        "schedule": "0 3 * * 0"
+      },
+      {
+        "condition": "tier_hot_usage > 80%",
+        "action": "move_oldest_to_warm",
+        "schedule": "0 * * * *"
+      }
+    ]
+  }
+}
+```
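The three rules can be evaluated as shown in the sketch below. It hard-codes the thresholds from the example configuration instead of parsing the `condition` strings, and it assumes that the cold rule takes precedence when both idle-day rules match; the configuration itself does not state a precedence. All names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ArchiveAction {
    MoveToWarm,
    MoveToCold,
    MoveOldestToWarm,
}

/// Per-file rules: idle_days > 365 → cold, idle_days > 90 → warm.
/// Checking the cold rule first is an assumption of this sketch.
pub fn action_for_file(idle_days: u32) -> Option<ArchiveAction> {
    if idle_days > 365 {
        Some(ArchiveAction::MoveToCold)
    } else if idle_days > 90 {
        Some(ArchiveAction::MoveToWarm)
    } else {
        None
    }
}

/// Tier-level rule: when Hot usage exceeds 80%, move the oldest files to Warm.
pub fn action_for_hot_tier(used_bytes: u64, capacity_bytes: u64) -> Option<ArchiveAction> {
    if capacity_bytes == 0 {
        return None; // unknown capacity: do nothing rather than divide by zero
    }
    let usage = used_bytes as f64 / capacity_bytes as f64;
    if usage > 0.80 {
        Some(ArchiveAction::MoveOldestToWarm)
    } else {
        None
    }
}
```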
+
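+### 6.4 file_uuid and Tier Migration
+
+The file_uuid **does not change during migration**. When a file moves from Hot to Cold:
+
+1. Copy the file to the Cold tier path
+2. Verify integrity (SHA256)
+3. Write the new location to `location_history`
+4. Remove the original file from the Hot tier
+5. Update `file_registry.last_seen_at`
+
+The file_uuid is derived from the birth-time `physical_path_at_birth` (the Hot path) and stays frozen; it is never recomputed after a migration.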
+
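+### 6.5 AI Agent: On-Demand Data Movement
+
+The AI Agent manages data movement automatically in the background, so users do not need to know which tier a file is stored on.
+
+#### Architecture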
+
+```
+User / Scheduler
+     │
+     ▼
+┌─────────────────────────────────┐
+│          AI Agent               │
+│  • Monitor tier usage           │
+│  • Detect hot/cold patterns     │
+│  • Trigger auto-archive         │
+│  • Restore on access (prefetch) │
+└──────────┬──────────────────────┘
+           │
+           ▼
+┌─────────────────────────────────┐
+│     Transfer Engine             │
+│  Direct (std::fs::copy)         │
+│  Rsync (delta + checksum)       │
+│  S3 / SFS / NFS / CDN           │
+└──────────┬──────────────────────┘
+           │
+           ▼
+┌─────────────────────────────────┐
+│     file_locations              │
+│  (single source of truth)       │
+│  M2  M4  M5  Cloud  LTO        │
+└─────────────────────────────────┘
+```
+
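+#### Auto-Archive Rules
+
+| Trigger | Action | Transfer Engine |
+|----------|------|:--:|
+| `idle_days > 90` | move to Warm | Rsync + checksum verify |
+| `idle_days > 365` | move to Cold | Tar + checksum verify |
+| `hot_tier_usage > 80%` | move oldest to Warm | Rsync `--progress` |
+| user accesses cold file | restore to Hot | Rsync prefetch |
+
+#### Example Flow
+
+```
+1. The AI Agent detects that Charade_1963.mp4 has been idle for 120 days
+2. rsync -avP --checksum → /Volumes/NAS_Archive/
+3. POST /api/v2/files/aeed7134.../locations
+     {"location": "/Volumes/NAS_Archive/Charade_1963.mp4",
+      "label": "M4-warm"}
+4. Remove the Hot tier location (or keep it for reference)
+5. The user queries the file info → sees all tiers without needing to know the physical location
+```
+
+#### Design Principles
+
+| Principle | Description |
+|------|------|
+| Transparent migration | Users querying `file_locations` always get a consistent view |
+| Stable identifier | `file_uuid` does not change during migration |
+| Location tracking | `file_locations` is updated after every migration; old locations may be kept as history |
+| Integrity verification | An integrity check runs after migration (rsync `--checksum`, or a manual SHA256 comparison; rsync's own checksum is not SHA256 by default) |
+| Memory-hierarchy analogy | The Agent acts as the memory controller: Hot = cache, Warm = main memory, Cold = disk |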
+
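+```
+
+User queries a file → always sees a consistent view (single source of truth: file_locations)
+    ↑
+Transfer Engine (rsync / Direct / S3 / SFS / CDN)
+    ↑
+AI Agent (monitors tier usage, detects hot/cold patterns, auto-archives, prefetches)
+    ↑
+Storage Tiers (M2 Hot → M4 Warm → M5 Cold → LTO)
+```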
+
+```sql
+CREATE TABLE IF NOT EXISTS location_history (
+    id INTEGER PRIMARY KEY AUTOINCREMENT,
+    file_uuid TEXT NOT NULL,
+    location TEXT NOT NULL,    -- physical file path
+    tier TEXT NOT NULL,        -- hot / warm / cold
+    moved_at TEXT NOT NULL DEFAULT (datetime('now')),
+    reason TEXT,
+    moved_by TEXT,
+    verified INTEGER DEFAULT 0, -- 1 once the integrity check has passed
+    FOREIGN KEY (file_uuid) REFERENCES file_registry(file_uuid)
+);
+
+CREATE INDEX idx_location_history_file_uuid ON location_history(file_uuid);
+```
+
+Query the current location:
+
+```sql
+SELECT location, tier
+FROM location_history
+WHERE file_uuid = ?
+ORDER BY moved_at DESC
+LIMIT 1;
+```
+
+---
+
+## 7. File Operations API
+
+### 7.1 Operations Overview
+
+| Operation | API | Description |
+|------|-----|------|
+| **Compress** | `POST /api/v2/files/compress` | Compress to .zip or .tar.gz |
+| **Transfer** | `POST /api/v2/files/transfer` | Copy/move to the target tier |
+| **Archive** | `POST /api/v2/files/archive` | Archive to the Cold tier |
+| **Restore** | `POST /api/v2/files/restore` | Restore from the Cold tier to the Hot tier |
+| **Exit** | `POST /api/v2/files/exit` | Remove from MarkBase (the record is kept) |
+
+### 7.2 Compress
+
+```json
+// Compress request
+{
+    "file_uuids": ["uuid1", "uuid2"],
+    "format": "zip",          // "zip" | "tar.gz"
+    "output_path": "/path/to/output.zip"
+}
+
+// Compress response
+{
+    "status": "completed",
+    "output_path": "/path/to/output.zip",
+    "file_count": 2,
+    "compressed_size": 1048576
+}
+```
+
+### 7.3 Transfer (Tier Migration)
+
+#### Request/Response
+
+```json
+// Transfer request
+{
+    "file_uuids": ["uuid1"],
+    "target_tier": "cold",
+    "target_path": "/Volumes/LTO_Archive/2026/",
+    "delete_source": false
+}
+
+// Transfer response
+{
+    "status": "completed",
+    "file_uuid": "uuid1",
+    "new_location": "/Volumes/LTO_Archive/2026/uuid1.mp4",
+    "new_tier": "cold"
+}
+```
+
+#### Transfer Engine Implementation Flow
+
+```
+TransferEngine::execute(source, target, opts)
+  │
+  ├── 1. select_mode(source)
+  │     │
+  │     ├── size < 50MB ──→ DirectMode
+  │     └── size >= 50MB ──→ RsyncMode (fallback: DirectMode)
+  │
+  ├── 2. preflight (RsyncMode)
+  │     ├── rsync -ani --checksum source/ target/
+  │     └── returns the itemized change list for the user to confirm
+  │
+  ├── 3. transfer
+  │     │
+  │     ├── DirectMode:  std::fs::copy + progress callback
+  │     │
+  │     └── RsyncMode:   rsync -avP --checksum source target
+  │           ├── -a  archive mode
+  │           ├── -v  verbose
+  │           ├── -P  --partial (resume) + --progress
+  │           └── -c  checksum mode (compares file checksums instead of time/size)
+  │
+  ├── 4. verify (RsyncMode)
+  │     └── rsync -acni source target  (dry-run checksum with itemized output; should list nothing)
+  │
+  ├── 5. update location_history
+  │     └── INSERT INTO location_history (file_uuid, location, tier, ...)
+  │
+  └── 6. cleanup
+        └── if delete_source: remove source file
+```
+
+#### Choosing Rsync vs Direct
+
+| Condition | Mode | Reason |
+|------|:----:|------|
+| `file_size < 50 MB` | Direct | rsync overhead outweighs the benefit |
+| `file_size >= 50 MB` and rsync is available | Rsync | Incremental transfer, resume, checksums |
+| `file_size >= 50 MB` and rsync is missing | Direct | Graceful fallback |
+
+### 7.4 Archive / Restore
+
+Archive is a convenience wrapper for a Transfer to the Cold tier.
+Restore is a convenience wrapper for restoring from the Cold tier to the Hot tier.
+
+```json
+// Restore request
+{
+    "file_uuid": "uuid1",
+    "target_path": "/Users/demo/restored/"  // optional, defaults to the original birth path
+}
+
+// Restore response
+{
+    "status": "completed",
+    "file_uuid": "uuid1",
+    "restored_to": "/Users/demo/restored/uuid1.mp4"
+}
+```
+
+### 7.5 Exit Records
+
+When a file leaves MarkBase management, a record is kept for auditing:
+
+```sql
+CREATE TABLE IF NOT EXISTS exit_records (
+    id INTEGER PRIMARY KEY AUTOINCREMENT,
+    file_uuid TEXT NOT NULL,
+    original_name TEXT NOT NULL,
+    exited_at TEXT NOT NULL DEFAULT (datetime('now')),
+    exited_by TEXT NOT NULL,
+    reason TEXT,
+    last_location TEXT,
+    FOREIGN KEY (file_uuid) REFERENCES file_registry(file_uuid)
+);
+```
+
+```json
+// Exit request
+{
+    "file_uuid": "uuid1",
+    "reason": "Project completed, moved to long-term archive"
+}
+
+// Exit response
+{
+    "status": "completed",
+    "file_uuid": "uuid1",
+    "exited_at": "2026-05-14T10:00:00Z"
+}
+```
+
+---
+
+## 8. API 參考
+
+### 8.1 Tree API
+
+| 方法 | 路徑 | 說明 |
+|------|------|------|
+| `GET` | `/api/v2/tree/:user_id` | 取得用戶的完整虛擬樹 |
+| `GET` | `/api/v2/tree/:user_id?mode=list` | 以特定模式取得樹 |
+| `POST` | `/api/v2/tree/:user_id/node` | 建立新節點 |
+| `PUT` | `/api/v2/tree/:user_id/node/:node_id` | 更新節點（label、icon、color、aliases） |
+| `DELETE` | `/api/v2/tree/:user_id/node/:node_id` | 刪除節點 |
+| `PUT` | `/api/v2/tree/:user_id/node/:node_id/move` | 移動節點（變更 parent） |
+| `PATCH` | `/api/v2/tree/:user_id/node/:node_id/alias` | 更新特定語言的別名 |
+
+### 8.2 File API
+
+| 方法 | 路徑 | 說明 |
+|------|------|------|
+| `GET` | `/api/v2/files/:file_uuid` | 取得檔案資訊 |
+| `POST` | `/api/v2/files/compress` | 壓縮檔案 |
+| `POST` | `/api/v2/files/transfer` | 轉移檔案到 target tier |
+| `POST` | `/api/v2/files/archive` | 歸檔到 Cold tier |
+| `POST` | `/api/v2/files/restore` | 從 Cold tier 還原 |
+| `POST` | `/api/v2/files/exit` | 移出管理 |
+| `GET` | `/api/v2/files/:file_uuid/locations` | 查詢位置歷史 |
+| `POST` | `/api/v2/files/validate` | 驗證檔案完整性（SHA256） |
+
+### 8.3 Mount API
+
+| 方法 | 路徑 | 說明 |
+|------|------|------|
+| `GET` | `/api/v2/mounts` | 列出所有掛載點 |
+| `POST` | `/api/v2/mounts` | 註冊新的掛載點 |
+| `PUT` | `/api/v2/mounts/:mount_id` | 更新掛載點 |
+| `DELETE` | `/api/v2/mounts/:mount_id` | 移除掛載點 |
+| `GET` | `/api/v2/mounts/:mount_id/status` | 查詢掛載點狀態（是否在線、容量） |
+
+### 8.4 Group API
+
+| 方法 | 路徑 | 說明 |
+|------|------|------|
+| `GET` | `/api/v2/groups` | 列出所有群組 |
+| `POST` | `/api/v2/groups` | 建立新群組 |
+| `DELETE` | `/api/v2/groups/:group_id` | 刪除群組 |
+| `POST` | `/api/v2/groups/:group_id/members` | 邀請成員 |
+| `DELETE` | `/api/v2/groups/:group_id/members/:user_id` | 移除成員 |
+| `PUT` | `/api/v2/groups/:group_id/members/:user_id/role` | 變更角色 |
+| `POST` | `/api/v2/groups/:group_id/files` | 分享檔案到群組 |
+| `DELETE` | `/api/v2/groups/:group_id/files/:file_uuid` | 從群組移除檔案 |
+| `GET` | `/api/v2/groups/:group_id/files` | 列出群組檔案 |
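
The route tables in this section use `:name` placeholders for path parameters. A client could expand them with a helper along these lines; `expand_route` is a hypothetical name, and percent-encoding every parameter value is an assumption about how the server expects identifiers to be escaped.

```python
import re
from urllib.parse import quote

# Matches ":user_id", ":file_uuid", etc., but not port numbers such as ":11438".
_PARAM = re.compile(r":([A-Za-z_][A-Za-z0-9_]*)")


def expand_route(template, **params):
    """Replace each :name placeholder in a route template with its value."""

    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing path parameter: {name}")
        # safe="" also encodes "/" so a value cannot add path segments.
        return quote(str(params[name]), safe="")

    return _PARAM.sub(substitute, template)
```

For example, `expand_route("/api/v2/files/:file_uuid/locations", file_uuid="uuid1")` yields the concrete path for the location-history endpoint.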
+
+---
+
+## 9. 決策記錄
+
+| # | 日期 | 決策 | 理由 |
+|---|------|------|------|
+| 1 | 2026-05-13 | Rust modular architecture (DisplayMode trait) | 與 Momentry Core 相同生態，模組化利於擴展 |
+| 2 | 2026-05-13 | One user = one SQLite | 用戶隔離、簡單部署、檔案可攜 |
+| 3 | 2026-05-13 | Group Share → Option A (Group SQLite) | 獨立可攜、不需專屬 server、備份簡單 |
+| 4 | 2026-05-13 | Hot/Warm/Cold 三級儲存 | 真實世界檔案管理需求，結合 LTO/NAS/SSD |
+| 5 | 2026-05-13 | Auto-archive rules (admin-configurable) | 減少手動管理，idle days + tier 容量觸發 |
+| 6 | 2026-05-14 | file_uuid 從 Momentry Core 繼承，不重新計算 | 唯一來源，避免不一致 |
+| 7 | 2026-05-14 | file_uuid 不因層級遷移而改變 | 凍結在 birth 時刻，確保身份穩定 |
+| 8 | 2026-05-14 | Display mode 儲存在 localStorage | 純 UI 偏好，不需後端儲存 |
+| 9 | 2026-05-14 | 檔案操作 API-first | 後端邏輯完成後再加 UI（壓縮、傳輸、歸檔） |
+| 10 | 2026-05-14 | Exit records（保留記錄） | 審計需求，不直接刪除記錄 |
+| 11 | 2026-05-14 | rusqlite (同步) + spawn_blocking (異步包裝) | 避免整個堆疊都必須 async，保持簡單 |
+| 12 | 2026-05-14 | ATTACH DATABASE for Group Share 跨 DB 查詢 | 一行 SQL，不需 Rust 端合併 |
+| 13 | 2026-05-14 | notify crate (僅 Hot tier) | 減少資源消耗，Warm/Cold 變更頻率低 |
+| 14 | 2026-05-14 | zip + tar crate (不用外部 CLI) | 跨平台，不需 ditto/hdiutil |
+| 15 | 2026-05-14 | Momentry Core 整合 A+B 混合模式 | 輕量運算用 crate，重查詢用 HTTP API |
+| 16 | 2026-05-14 | AI Agent 按需資料流動 | 透明遷移、類似記憶體階層、自動冷熱管理 |
+| 17 | 2026-05-14 | file_locations 支援任意 URI | /path、s3://、sfs://、ipfs://、https://、\\SMB\path |
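
Decision 12 (ATTACH DATABASE for cross-DB Group Share queries) can be illustrated with Python's built-in `sqlite3`. This is only a sketch: the table names `file_registry` and `shared_files`, their columns, and the alias `grp` are assumptions, and the real implementation uses rusqlite rather than Python.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    user_db = os.path.join(tmp, "user.sqlite")
    group_db = os.path.join(tmp, "group.sqlite")

    # Group database: one file shared by a member (hypothetical schema).
    group = sqlite3.connect(group_db)
    group.execute("CREATE TABLE shared_files (file_uuid TEXT PRIMARY KEY, shared_by TEXT)")
    group.execute("INSERT INTO shared_files VALUES ('uuid1', 'alice')")
    group.commit()
    group.close()

    # User database: the user's own file registry (hypothetical schema).
    user = sqlite3.connect(user_db)
    user.execute("CREATE TABLE file_registry (file_uuid TEXT PRIMARY KEY, name TEXT)")
    user.executemany(
        "INSERT INTO file_registry VALUES (?, ?)",
        [("uuid1", "a.mp4"), ("uuid2", "b.mp4")],
    )
    user.commit()  # ATTACH must run outside an open transaction

    # Attach the group DB and join across both files in a single statement.
    user.execute("ATTACH DATABASE ? AS grp", (group_db,))
    rows = user.execute(
        "SELECT f.file_uuid, f.name, s.shared_by "
        "FROM file_registry AS f "
        "JOIN grp.shared_files AS s ON s.file_uuid = f.file_uuid"
    ).fetchall()
    user.close()
```

The join runs inside SQLite itself, which is the point of the decision: no merging of result sets is needed on the application side.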
+
+---
+
+## 10. 版本歷史
+
+| 版本 | 日期 | 目的 | 操作人 | 工具/模型 |
+|------|------|------|--------|-----------|
+| V1.0 | 2026-05-12 | 初版設計（Demo Display + Knowledge Graph） | M4 / OpenCode | DeepSeek V4 Pro |
+| V2.0 | 2026-05-14 | 虛擬檔案樹、Group Share、儲存層級、技術棧、file_uuid、檔案操作 API、AI Agent 按需資料流動、跨平台 multi-location | M4 / OpenCode | DeepSeek V4 Pro |
--- a/docs_v1.0/DESIGN/MARKBASE_DESIGN_v1.0.0.md
+++ b/docs_v1.0/DESIGN/MARKBASE_DESIGN_v1.0.0.md
@@ -0,0 +1,730 @@
+# MarkBase — Momentry 專屬 Display Engine 設計方案 v1.0
+
+## 產品定位
+
+**MarkBase** 是 Momentry 專屬的 Display Engine，擔任 **demo runner 的固定顯示器**。
+
+不只是 Markdown 閱讀器，而是一個可控的內容呈現視窗，能夠動態展示：
+
+| 內容類型 | 展示方式 |
+|----------|----------|
+| .md 文件 | 渲染為排版清晰的 HTML |
+| Mermaid 圖表 | 流程圖、時序圖、ER 圖等 |
+| API 回應 JSON | 語法高亮的格式化 JSON |
+| 影片 | 嵌入 video player（支援 HLS / MP4）|
+| 圖片 | 支援單張或輪播 |
+| HTML | 直接內嵌 |
+| 文字/程式碼 | syntax highlight |
+
+**定位一句話：** *Demo runner 的 presentation layer，一個專注、乾淨、可控的內容顯示器。*
+
+| 面向 | 說明 |
+|------|------|
+| 願景 | Momentry 生態系的 UI 輸出終端 |
+| 核心場景 | demo runner 的固定 display 視窗 |
+| 平台 | macOS native（Rust + axum + Tauri WebView）|
+| 授權 | Momentry 專屬工具，隨 momentry_core 發布 |
+
+---
+
+## 命名
+
+**MarkBase** — Markdown + Display Base
+
+> 承載所有內容類型的顯示基底。
+> 簡短、好記、產品感。
+
+---
+
+## 階段規劃
+
+### Phase 0：Demo Display（MVP — 立即價值）
+
+**目標**：取代 md_reader + 影片播放，成為 demo runner 的固定顯示視窗
+
+| 功能 | 說明 |
+|------|------|
+| 文件渲染 | CommonMark + GFM（表格、task list、strikethrough、footnotes）|
+| Mermaid 圖表 | 內建渲染（無需 CDN），支援 flowchart / sequence / class / ER / mindmap |
+| 程式碼高亮 | syntax highlighting（支援 50+ 語言）|
+| JSON 格式化 | API response 自動格式化 + 語法高亮 |
+| 影片播放 | MP4 / HLS 嵌入播放（取代 browser 開啟 trace video）|
+| 全螢幕 mode | 乾淨無干擾的展示模式，適合 presentation |
+| CLI 控制 | 透過 stdin / HTTP 動態載入內容，無需重新啟動 |
+| 與 demo runner 整合 | `--display` flag 啟動作為固定顯示視窗 |
+
+#### Demo Runner 整合流程
+
+```
+demo_runner.py --display          MarkBase.app (固定顯示視窗)
+┌────────────────────┐           ┌────────────────────┐
+│ Step 3: Markdown   │  ──HTTP──▶│ 渲染 GUIDE.md      │
+│ Step 11: Trace 5   │  ──HTTP──▶│ 播放 trace_5.mp4   │
+│ Step 13: 3D Cube   │  ──HTTP──▶│ 顯示 iframe: portal │
+│ Step 22: API resp  │  ──HTTP──▶│ 顯示格式化 JSON     │
+└────────────────────┘           └────────────────────┘
+       (控制端)                         (顯示端)
+```
+
+The demo runner launches MarkBase as its display window via `--display`, then pushes content over HTTP at each step:
+
+```python
+# demo_runner.py example: map each step type to its POST /display payload
+def display_payload(step_type, response=None):
+    payloads = {
+        "markdown": {"type": "md", "file": "GUIDE.md"},
+        "video": {"type": "video", "url": "trace_5.mp4"},
+        "curl": {"type": "json", "data": response},
+        "browser": {"type": "url", "url": "..."},
+    }
+    return payloads[step_type]
+```
+
+### Phase 2：Knowledge Base
+
+**目標**：從閱讀器升級為個人知識庫管理器
+
+| 功能 | 說明 |
+|------|------|
+| 多文件索引 | 監控目錄，自動索引所有 .md |
+| 全文檢索 | 跨文件模糊搜尋 + 標題索引 |
+| 標籤管理 | YAML frontmatter tags → 標籤雲 |
+| Backlinks | 文件間的雙向連結（[[wiki-link]]）|
+| 收藏/書籤 | 標記常用文件 |
+| 閱讀歷史 | 最近開啟 / 最近搜尋 |
+
+### Phase 3：Collaboration
+
+**目標**：多人協作與發布
+
+| 功能 | 說明 |
+|------|------|
+| 評論/註釋 | 段落層級註解 |
+| 版本歷史 | git-based diff 檢視 |
+| 靜態站點生成 | .md → 整站 HTML（用於發布）|
+| Web 版本 | 瀏覽器可讀（可選自托管）|
+
+---
+
+## CLI 設計（Portal / Demo 使用）
+
+### 主要命令
+
+```
+markbase display                    ← 啟動顯示視窗（blocking，等待 HTTP 控制）
+markbase display "GUIDE.md"        ← 啟動並立刻顯示文件
+markbase preview "GUIDE.md"        ← (保留) 單次預覽，不回傳控制權
+markbase render "GUIDE.md"         ← (保留) 輸出 HTML 到 stdout
+```
+
+### display — 核心命令（給 demo runner 使用）
+
+```bash
+# 啟動顯示視窗，demo runner 透過 HTTP 控制
+markbase display
+
+# 指定控制埠（預設 11438）
+markbase display --port 11438
+
+# 全螢幕模式
+markbase display --fullscreen
+
+# 啟動時先顯示文件
+markbase display GUIDE.md
+```
+
+### HTTP 控制 API（display 模式下啟用）
+
+Once started, `markbase display` listens for control requests on `localhost:11438`:
+
+```bash
+# Show a .md file
+curl -X POST http://localhost:11438/display \
+  -H "Content-Type: application/json" \
+  -d '{"type":"md","file":"/path/to/doc.md","focus":"API 搜尋"}'
+
+# Play a video
+curl -X POST http://localhost:11438/display \
+  -H "Content-Type: application/json" \
+  -d '{"type":"video","url":"/path/to/trace.mp4","start":10,"end":30}'
+
+# Show formatted JSON
+curl -X POST http://localhost:11438/display \
+  -H "Content-Type: application/json" \
+  -d '{"type":"json","data":"{\"status\":\"ok\"}"}'
+
+# Embed a web page
+curl -X POST http://localhost:11438/display \
+  -H "Content-Type: application/json" \
+  -d '{"type":"url","url":"http://localhost:1420/trace-viz/..."}'
+
+# Show an image
+curl -X POST http://localhost:11438/display \
+  -H "Content-Type: application/json" \
+  -d '{"type":"image","url":"/path/to/thumbnail.jpg"}'
+
+# Control commands
+curl -X POST http://localhost:11438/control \
+  -H "Content-Type: application/json" \
+  -d '{"cmd":"fullscreen"}'
+curl -X POST http://localhost:11438/control \
+  -H "Content-Type: application/json" \
+  -d '{"cmd":"zoom","level":1.5}'
+curl -X POST http://localhost:11438/control \
+  -H "Content-Type: application/json" \
+  -d '{"cmd":"close"}'
+```
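
The request shapes in the curl examples suggest a simple validation step before content is rendered. The following is a sketch under stated assumptions: the required field per type is inferred from the examples above, the rule that `start` must be less than `end` is an assumption, and `validate_display_request` is a hypothetical name rather than part of MarkBase.

```python
# Required field per display type, inferred from the curl examples.
REQUIRED_FIELD = {
    "md": "file",
    "video": "url",
    "json": "data",
    "url": "url",
    "image": "url",
}


def validate_display_request(body):
    """Return a list of problems; an empty list means the body looks acceptable."""
    kind = body.get("type")
    if kind not in REQUIRED_FIELD:
        return [f"unknown type: {kind!r}"]

    problems = []
    field = REQUIRED_FIELD[kind]
    if field not in body:
        problems.append(f"type {kind!r} requires field {field!r}")

    if kind == "video":
        start, end = body.get("start"), body.get("end")
        # Assumption: when both are given, the clip range must be non-empty.
        if start is not None and end is not None and start >= end:
            problems.append("start must be less than end")
    return problems
```

Returning a list instead of raising lets the server report every problem in one response.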
+
+### demo_runner.py 整合
+
+```python
+import subprocess
+import time
+
+import requests
+
+
+class MarkBaseDisplay:
+    """Control the MarkBase display window."""
+    def __init__(self, port=11438):
+        self.port = port
+        self.process = None
+
+    def start(self):
+        # Further Popen options are elided in the original design.
+        self.process = subprocess.Popen(["markbase", "display",
+            "--port", str(self.port)])
+        time.sleep(1)  # wait for the server to start listening
+
+    def show(self, type, **kwargs):
+        """Show content. type: md/video/json/url/image"""
+        body = {"type": type, **kwargs}
+        requests.post(f"http://localhost:{self.port}/display", json=body)
+
+    def show_step(self, step):
+        """Choose the display type from the demo step type."""
+        t = step["type"]
+        if t == "curl":
+            # run_curl is assumed to be an existing demo_runner helper
+            self.show("json", data=run_curl(step["cmd"]))
+        elif t == "browser":
+            self.show("url", url=step["url"])
+        elif t == "markdown":
+            self.show("md", file=step["cmd"], focus=step.get("focus"))
+        elif t == "video":
+            self.show("video", url=step.get("url"))
+```
+
+---
+
+## 技術架構
+
+```
+┌─────────────────────────────────────────┐
+│              MarkBase App               │
+├─────────────────┬───────────────────────┤
+│   Frontend      │   Engine             │
+│  (SwiftUI)      │  (Rust core)         │
+│                 │                      │
+│  • 視窗管理     │  • 解析 .md → AST    │
+│  • 選單、快捷鍵  │  • Mermaid 渲染      │
+│  • 設定介面     │  • Code highlight     │
+│  • 搜尋 UI     │  • 全文索引            │
+│  • 目錄樹      │  • 文件監控            │
+└─────────────────┴───────────────────────┘
+         │                  │
+         ▼                  ▼
+   macOS Native API     Rust 二進制
+   (WebKit + Swift)     (pulldown-cmark + syntect + mermaid-rs)
+```
+
+### 為什麼 Engine 用 Rust？
+
+| 原因 | 說明 |
+|------|------|
+| 效能 | 大型 .md 文件（1000+ 行）瞬間渲染 |
+| 無 runtime | 單一二進制，無 Node.js/Python 依賴 |
+| 現有基礎 | 可直接重用 md_reader 的 rendering 邏輯 |
+| Mermaid 內嵌 | 可用 mermaid-rs crate 替代 CDN |
+
+### 為什麼 Frontend 用 SwiftUI？
+
+| 原因 | 說明 |
+|------|------|
+| Native 體驗 | macOS native 視窗、menu bar、快捷鍵 |
+| WebKit 整合 | 直接嵌入 WKWebView 渲染 HTML |
+| 系統整合 | Spotlight、QuickLook、分享功能 |
+| 效能 | 比 Electron 省 200MB+ 記憶體 |
+
+---
+
+## UI 設計
+
+### 主視窗佈局
+
+```
+┌────────────────────────────────────────────────┐
+│ Menu Bar: File  Edit  View  Window  Help        │
+├──────────┬─────────────────────────────────────┤
+│          │                                     │
+│  左側欄   │  主內容區                            │
+│  ──────  │  ─────────────────                  │
+│  📁 文件  │  # 標題                             │
+│    ├ README│  正文...                           │
+│    ├ Guide│  ```code block```                  │
+│    └ API  │  表格                               │
+│          │  [Mermaid diagram]                  │
+│  目錄     │                                     │
+│  ──────  │                                     │
+│  • Introduction│                               │
+│  • Getting...│                                 │
+│  • API Ref │                                  │
+│          │                                     │
+├──────────┴─────────────────────────────────────┤
+│ Status Bar: 字數 | 段落 | UTF-8 | dark mode toggle│
+└────────────────────────────────────────────────┘
+```
+
+### 快捷鍵
+
+| 按鍵 | 功能 |
+|------|------|
+| `Cmd+O` | 開啟 .md 文件 |
+| `Cmd+F` | 全文搜尋 |
+| `Cmd+Shift+F` | 跨文件搜尋 |
+| `Cmd++` / `Cmd+-` | 調整字級 |
+| `Cmd+D` | Toggle dark mode |
+| `Cmd+B` | 左側目錄 toggle |
+| `Cmd+P` | 列印 / PDF 匯出 |
+| `Esc` | 關閉搜尋 / 回到瀏覽 |
+
+---
+
+## 目錄結構
+
+```
+markbase/
+├── Cargo.toml              # Rust core
+├── src/
+│   ├── main.rs             # CLI entry point
+│   ├── render.rs           # .md → HTML
+│   ├── highlight.rs        # Code syntax highlighting
+│   ├── mermaid.rs          # Mermaid rendering
+│   ├── search.rs           # Full-text search
+│   └── watch.rs            # File watcher
+├── app/                    # SwiftUI app
+│   ├── MarkBase.xcodeproj
+│   ├── MarkBase/
+│   │   ├── ContentView.swift
+│   │   ├── SidebarView.swift
+│   │   ├── SearchView.swift
+│   │   └── SettingsView.swift
+│   └── markbase-cli        # Embedded Rust binary
+└── docs/
+    └── ARCHITECTURE.md
+```
+
+---
+
+## 與現有 md_reader 的差異
+
+| 面向 | md_reader | MarkBase |
+|------|-----------|----------|
+| 語言 | 純 Rust CLI | Rust engine + SwiftUI app |
+| 架構 | 單一 main.rs 1134 行 | 模組化 6+ 檔案 |
+| 視窗 | 簡陋的 WebKit 視窗 | 完整 SwiftUI + WKWebView |
+| 搜尋 | ❌ 無 | ✅ Cmd+F + 跨文件搜尋 |
+| 目錄 | ❌ 無 | ✅ 左側 heading tree |
+| File watcher | ❌ 無 | ✅ 自動索引目錄 |
+| dark mode | ❌ 無 | ✅ 系統跟隨 + 手動 |
+| Mermaid | CDN-based | 內建引擎 |
+| Code highlight | ❌ 無 | ✅ syntect 50+ 語言 |
+| 命名 | 功能描述 | 產品品牌 |
+
+---
+
+## 技術選型記錄
+
+> 2026-05-12 新增
+
+### 1. 轉檔引擎
+
+| 工具 | License | 用途 |
+|------|---------|------|
+| pandoc 3.9 | GPL 2.0 | MD ↔ DOCX/PPTX/PDF |
+| LibreOffice 26.2 | Apache 2.0 | 任何格式 ↔ 任何格式 (headless CLI) |
+| mmdc | MIT | Mermaid → SVG/PNG |
+| rsvg-convert | LGPL | SVG → PNG |
+
+### 2. 編輯器選型
+
+| 方案 | 決策 | 理由 |
+|------|:--:|------|
+| CodeMirror 6 | ✅ 選用 | MIT, 190KB gzip, CDN 免 npm, 模組化 |
+| Monaco (VS Code) | ❌ | 5MB 太大，需 webpack |
+| Ace | ❌ | 維護停滯 |
+
+### 3. Markdown 生態分析
+
+| 工具 | License | 類型 | MarkBase 啟發 |
+|------|---------|------|--------------|
+| glow | MIT | CLI 渲染 | 保留為獨立 CLI viewer |
+| MarkText | MIT | WYSIWYG GUI | 參考 split-pane 編輯/預覽設計 |
+| mdcat | MPL 2.0 | CLI | 參考 terminal 圖片渲染 |
+| bat | MIT/Apache | CLI | 參考語法高亮策略 |
+| mdBook | MPL 2.0 | CLI | 作為靜態文件站匯出格式 |
+| MkDocs | BSD | CLI | 備選文件站方案 |
+| Obsidian | Proprietary | Desktop PKM | 參考 `[[wiki links]]`、graph view、backlinks |
+
+### 4. 桌面 vs Web
+
+| 決策 | 選擇 | 理由 |
+|------|:--:|------|
+| Web first | ✅ | 任何裝置可用，同一份 HTML/JS/CSS |
+| Tauri shell | ✅ 可選 | <10MB, 跨平台 macOS/Win/Linux |
+| Electron | ❌ | 300MB 過於肥大 |
+
+### 5. MarkBase vs Obsidian 定位
+
+| | Obsidian | MarkBase |
+|------|:--:|:--:|
+| 定位 | 個人知識管理 (PKM) | **文件處理引擎 + 編輯器** |
+| 資料格式 | .md only | 全格式 (via soffice) |
+| 搜尋 | 全文 | RAG + embedding (Qdrant) |
+| 後端 | 無 | axum HTTP + PSQL + Qdrant |
+| CLI | 無 | ✅ CLI first |
+| Pipeline | 無 | ✅ Chunking + LLM pipeline |
+| 跨裝置 | 付費 sync | 自建 server 即可 |
+| 大小 | ~300MB (Electron) | <10MB (Tauri) |
+| 授權 | Proprietary (個人免費) | Momentry 專屬 |
+
+### 6. CLI 設計
+
+```
+markbase display [--port 11438] [FILE]     啟動顯示伺服器
+markbase render <FILE> [-o output.html]    Markdown → HTML
+markbase serve <DIR>                       檔案瀏覽 + 編輯器 (計畫中)
+```
+
+### 7. 架構對比
+
+```
+Obsidian:                        MarkBase:
+┌──────────────────────┐        ┌──────────────────────┐
+│  Electron Shell      │        │  Tauri / Browser     │
+│  ┌────────────────┐  │        │  ┌────────────────┐  │
+│  │  Renderer      │  │        │  │  Renderer      │  │
+│  │  ├─ CodeMirror │  │        │  │  ├─ CodeMirror │  │  ← also in Obsidian
+│  │  ├─ Graph/D3   │  │        │  │  ├─ Mermaid.js │  │  ← also in Obsidian
+│  │  ├─ Mermaid.js │  │        │  │  └─ pulldown   │  │
+│  │  └─ MathJax    │  │        │  └────────────────┘  │
+│  └────────────────┘  │        │  ┌────────────────┐  │
+│  ┌────────────────┐  │        │  │  Rust Backend  │  │  ← MarkBase only
+│  │  Plugin API    │  │        │  │  ├─ axum HTTP  │  │
+│  │  1,800+ plugins│  │        │  │  ├─ Embedding  │  │
+│  └────────────────┘  │        │  │  ├─ Qdrant ANN │  │
+│  ┌────────────────┐  │        │  │  ├─ pgvector   │  │
+│  │  FS Access     │  │        │  │  ├─ PG TKG     │  │
+│  │  .md files only│  │        │  │  ├─ SQLite TKG │  │
+│  └────────────────┘  │        │  │  ├─ sqlite-vec │  │
+└──────────────────────┘        │  │  └─ Pipeline   │  │
+                                │  └────────────────┘  │
+                                └──────────────────────┘
+```
+
+### 8. 向量儲存：sqlite-vec + Datasette
+
+> 2026-05-12 採用
+
+#### 選型
+
+| 需求 | pgvector (PG) | Qdrant | sqlite-vec | 決策 |
+|------|:--:|:--:|:--:|:--:|
+| Production API (3003) | ✅ | — | — | pgvector (已有) |
+| HNSW ANN 搜尋 | ⚠️ | ✅ | — | Qdrant (已有) |
+| Desktop 本機 RAG | ❌ 需裝 PG | ❌ 需 server | ✅ 單檔 | sqlite-vec |
+| 檔案包內嵌向量 | ❌ | ❌ | ✅ 隨包分發 | sqlite-vec |
+| 離線可用 | ❌ | ❌ | ✅ | sqlite-vec |
+| Web UI 查詢 | — | — | via Datasette | Datasette |
+
+#### sqlite-vec 規格
+
+| 屬性 | 值 |
+|------|-----|
+| License | MIT + Apache 2.0（雙授權） |
+| 作者 | Alex Garcia |
+| 贊助 | Mozilla Builders + Fly.io + Turso + SQLite Cloud |
+| Stars | 7,600+ |
+| 語言 | Pure C，零依賴 |
+| 大小 | ~200KB `.dylib` |
+| ANN 引擎 | exhaustive, IVF, DiskANN |
+| Rust binding | `cargo add sqlite-vec` |
+
+#### Datasette（選配 Web UI）
+
+| 屬性 | 值 |
+|------|-----|
+| License | Apache 2.0 |
+| 作者 | Simon Willison |
+| 定位 | SQLite → Web UI + JSON API |
+| Plugins | 154 個 |
+| sqlite-vec 插件 | `datasette-sqlite-vec`（同一作者） |
+
+#### 使用範例
+
+```sql
+.load ./vec0
+
+CREATE VIRTUAL TABLE chunks USING vec0(
+  embedding float[768],
+  file_uuid text,
+  chunk_type text,
+  text_content text
+);
+
+INSERT INTO chunks VALUES (?, 'uuid-123', 'sentence', 'hello world');
+
+SELECT rowid, text_content, distance
+FROM chunks WHERE embedding MATCH ?
+ORDER BY distance LIMIT 10;
+```
+
+#### 四層向量架構
+
+```
+Production  ← Qdrant (HNSW ANN, fast at scale)
+           ← pgvector (transactional, alongside chunk data)
+           ↓ backup / export
+
+Portable    ← sqlite-vec (.sqlite single file, package distributable)
+           ← Datasette (optional Web UI)
+```
+
+### 9. Qdrant Graph 分析
+
+> 2026-05-12 結論：Qdrant **沒有**原生 Graph 功能，是純向量資料庫
+
+#### Qdrant 現有功能
+
+| 功能 | 說明 | 圖論等級 |
+|------|------|:--:|
+| **Payload filtering** | 向量搜尋 + JSON 條件過濾 | ⚠️ 偽關聯查詢 |
+| **Collection aliases** | 多 collection 聯合查詢 | ⚠️ 基礎 |
+| **Hybrid Queries** | 向量 + 關鍵字混合 | ❌ |
+| **Qdrant Edge** | 嵌入式向量搜尋 | ❌ 非 Graph |
+| **Data Graphs (第三方)** | Neo4j + Qdrant hybrid RAG | ✅ 非原生 |
+
+#### Payload filtering 的極限
+
+Payload filtering can emulate a 1-hop relation (for example, "find the chunks where Cary Grant speaks"), but it cannot perform real graph traversal:
+
+```json
+// ✅ 1-hop：filter speaker = "Cary Grant"
+{"filter": {"must": [{"key": "speaker", "match": {"value": "Cary Grant"}}]}}
+
+// ❌ 2-hop：graph traversal Qdrant 無法做到
+// "誰跟 Cary Grant 在同一個場景出現？"
+// "這些人中誰又跟 Audrey Hepburn 對話？"
+```
+
+| 限制 | 說明 |
+|------|------|
+| ❌ 2-hop+ traversal | 無法跨節點關聯查詢 |
+| ❌ 邊緣權重/時間 | 無 edge property 概念 |
+| ❌ Graph algebra | 無 `shortest_path`, `PageRank` 等演算法 |
+| ❌ Cypher/GQL | 無圖查詢語言 |
+
+#### Momentry TKG 決策
+
+| | Qdrant-only | PG TKG | SQLite TKG | Neo4j |
+|---|:--:|:--:|:--:|:--:|
+| 向量搜尋 | ✅ 原生 | via pgvector | via sqlite-vec | via plugin |
+| Graph traversal | ❌ | ✅ CTE | ✅ CTE | ✅ 原生 |
+| 2-hop+ 查詢 | ❌ | ✅ | ✅ | ✅ |
+| 時間範圍邊緣 | ❌ | ✅ | ✅ | ✅ |
+| 部署 | 需 server | 需 PG | **單檔** | 需 Java |
+| 檔案包分發 | ❌ | ❌ | ✅ | ❌ |
+| 適合規模 | 大 | 中 | 小-中 | 大 |
+
+#### 架構分工
+
+```
+Qdrant   → 向量搜尋（ANN）- 核心效能
+PG       → TKG 圖查詢（Recursive CTE）- API server
+SQLite   → TKG 圖查詢（Recursive CTE）- 檔案包/離線
+```
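
To make the "Recursive CTE" side of this split concrete, here is a minimal 2-hop traversal in SQLite, run through Python's built-in `sqlite3`. The `edges` table, its columns, and the sample relations are assumptions for illustration only; the actual TKG schema is not specified in this document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical edge table: one row per directed relation.
    CREATE TABLE edges (src TEXT, dst TEXT, relation TEXT);
    INSERT INTO edges VALUES
        ('Cary Grant', 'Audrey Hepburn', 'co_scene'),
        ('Audrey Hepburn', 'Walter Matthau', 'dialogue'),
        ('Walter Matthau', 'Ned Glass', 'co_scene');
    """
)

# Walk outward from a start node, stopping after max_hops edges.
REACHABLE = """
WITH RECURSIVE reach(name, depth) AS (
    SELECT ?, 0
    UNION
    SELECT e.dst, r.depth + 1
    FROM reach AS r
    JOIN edges AS e ON e.src = r.name
    WHERE r.depth < ?
)
SELECT name, MIN(depth) AS hops
FROM reach
WHERE depth > 0
GROUP BY name
ORDER BY hops, name
"""

rows = conn.execute(REACHABLE, ("Cary Grant", 2)).fetchall()
```

With the sample data, the query reaches Audrey Hepburn at one hop and Walter Matthau at two, while Ned Glass (three hops away) is excluded. The same statement runs unchanged on PostgreSQL apart from the parameter placeholder syntax.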
+
+---
+
+## 亮點：知識圖譜 (Knowledge Graph)
+
+> 2026-05-12 新增
+
+### Obsidian vs MarkBase 圖譜對比
+
+| | Obsidian Graph | MarkBase Knowledge Graph |
+|------|:--:|:--:|
+| 節點來源 | 手動建立的 `.md` 筆記 | AI pipeline 自動產生的 chunks |
+| 邊緣來源 | 手寫 `[[wikilinks]]` | **語意相似度**、結構層級、共現關係 |
+| 生成方式 | 人工 | **自動**（embedding + clustering） |
+| 影片支援 | ❌ | ✅ face traces, speaker graph, scene transitions |
+| 實體辨識 | ❌ | ✅ 人臉/說話者/物件/場景 |
+| 規模 | 數百節點 | **數萬節點**（chunk 級） |
+| 過濾 | 無 | 時間範圍、置信度、chunk type |
+
+### 圖譜類型
+
+#### A. 語意關係圖（Semantic Graph）
+
+Edges are built from the cosine similarity between chunk embeddings, so similar chunks end up close together.
+
+```
+[Audrey Hepburn 說話] ──0.82── [Cary Grant 回應]
+        │                              │
+        │ 0.75                         │ 0.78
+        ▼                              ▼
+[討論離婚原因]    ──0.91──    [緊張對話場景]
+```
+
+**演算法**：
+1. 取所有 chunk embedding
+2. 計算 pairwise cosine similarity
+3. 保留 top-K 相似邊（K=5 預設）
+4. 用 UMAP/t-SNE → 2D 座標
+5. D3.js force layout 渲染
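
Steps 1 to 3 of the algorithm above can be sketched in plain Python. This is an illustrative brute-force version with O(n²) comparisons, not the production pipeline; the function names and the edge dictionary layout (borrowed from the API response format in this document) are assumptions. Steps 4 and 5 (UMAP/t-SNE projection and D3.js rendering) are not covered.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_edges(embeddings, k=5):
    """Keep each chunk's k most similar neighbours as undirected edges.

    embeddings maps chunk id -> vector. An edge kept by either endpoint
    appears once in the result.
    """
    ids = list(embeddings)
    edges = {}
    for i in ids:
        scored = sorted(
            ((cosine(embeddings[i], embeddings[j]), j) for j in ids if j != i),
            reverse=True,
        )
        for weight, j in scored[:k]:
            edges[tuple(sorted((i, j)))] = weight
    return [
        {"source": s, "target": t, "weight": round(w, 4), "type": "semantic"}
        for (s, t), w in sorted(edges.items())
    ]
```

For tens of thousands of chunks the pairwise loop becomes expensive, which is where an ANN index such as Qdrant would replace the inner `sorted` call.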
+
+#### B. 結構層級圖（Hierarchy Graph）
+
+文件 → 章節 → 段落 的三層樹狀結構。
+
+#### C. 人物關係圖（Identity Graph）
+
+基於 face_detections + speaker_assign。
+
+```
+Cary Grant ──[對手戲]── Audrey Hepburn
+    │                       │
+    │[對話]                 │[場景共現]
+    ▼                       ▼
+Walter Matthau ────── Ned Glass
+```
+
+#### D. 時序演進圖（Timeline Graph）
+
+Chunks 按時間軸排列，場景切換點標記。X 軸 = 時間，Y 軸 = 說話者。
+
+### 渲染技術
+
+| 層 | 工具 | License |
+|----|------|---------|
+| 力導向佈局 | D3-force (d3.js v7) | ISC |
+| 降維 (UMAP) | umap-js | MIT |
+| 2D 繪圖 | Canvas / SVG via D3 | ISC |
+| 3D 繪圖 | Three.js | MIT |
+| 節點過濾 | Crossfilter / vanilla JS | — |
+
+### API 設計
+
+```
+GET /api/v1/graph/:file_uuid/identity    → 人物關係圖資料
+GET /api/v1/graph/:file_uuid/semantic?depth=3   → 語意圖資料
+GET /api/v1/graph/:file_uuid/hierarchy   → 結構層級圖
+GET /api/v1/graph/:file_uuid/timeline    → 時序圖資料
+```
+
+回傳格式：
+```json
+{
+  "nodes": [
+    {"id": "chunk_100", "label": "Cary Grant: What's your name?", "group": 3, "x": 0.1, "y": 0.5}
+  ],
+  "edges": [
+    {"source": "chunk_100", "target": "chunk_104", "weight": 0.82, "type": "semantic"}
+  ]
+}
+```
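
Since Cytoscape.js is the renderer chosen later in this document, the response above has to be reshaped into its elements format, where every element carries a `data` object and nodes may carry a `position`. The converter below is a sketch: `to_cytoscape_elements` is a hypothetical name, the `x`/`y` values are assumed to be normalized coordinates, and `scale` is an arbitrary illustration value.

```python
def to_cytoscape_elements(graph, scale=1000):
    """Convert {"nodes": [...], "edges": [...]} into a Cytoscape.js elements list."""
    elements = []
    for node in graph["nodes"]:
        element = {
            "group": "nodes",
            "data": {
                "id": node["id"],
                "label": node.get("label", ""),
                # The API's numeric "group" is a cluster id; it is renamed because
                # Cytoscape.js reserves the top-level "group" for nodes/edges.
                "cluster": node.get("group"),
            },
        }
        if "x" in node and "y" in node:
            element["position"] = {"x": node["x"] * scale, "y": node["y"] * scale}
        elements.append(element)

    for index, edge in enumerate(graph["edges"]):
        elements.append(
            {
                "group": "edges",
                "data": {
                    "id": f"e{index}",
                    "source": edge["source"],
                    "target": edge["target"],
                    "weight": edge.get("weight"),
                    "type": edge.get("type"),
                },
            }
        )
    return elements
```

Cytoscape.js rejects edges whose source or target node is missing, so the server should return both endpoints of every edge it includes.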
+
+### 互動設計
+
+| 操作 | 行為 |
+|------|------|
+| Drag node | 拖曳節點 |
+| Click node | 展開 chunk 內容預覽 |
+| Scroll | 縮放圖譜 |
+| Filter bar | 依 chunk_type / speaker / confidence 過濾 |
+| Double-click | 聚焦該節點，展開子圖 |
+| Hover edge | 顯示相似度分數 |
+
+### 圖譜渲染工具選型
+
+> 2026-05-12 新增
+
+#### 候選工具對比
+
+| 工具 | License | 大小 | CDN | 圖論演算法 | 中國社群 | 最佳場景 |
+|------|---------|:--:|:--:|:--:|:--:|------|
+| **Cytoscape.js** | MIT | ~120KB | ✅ | ✅ BFS/DFS/PageRank | ⚠️ | 複雜網絡圖 |
+| D3.js v7 | ISC | ~80KB | ✅ | ❌ 需自寫 | ⚠️ | 任何自訂圖表 |
+| ECharts | Apache 2.0 | ~1MB | ✅ | ❌ | ✅ 非常大 | 通用圖表 + 地圖 |
+| G6 (AntV) | MIT | ~500KB | ✅ | ✅ 多種佈局 | ✅ 非常大 | 關係圖專用 |
+| vis-network | MIT/Apache | ~300KB | ✅ | ❌ | ❌ | 網絡圖 |
+| Sigma.js | MIT | ~80KB | ✅ | ❌ | ❌ | WebGL 大圖 (>5000節點) |
+| Graphviz | EPL 1.0 | ~3MB | ❌ CLI only | ✅ | ⚠️ | 靜態匯出 SVG/PNG |
+
+#### 選型過程
+
+**第一輪篩選**：排除 CLI-only (Graphviz)、無 CDN、中文社群弱且圖論支援差的 (vis-network, Sigma.js)。
+
+剩餘：Cytoscape.js, D3.js, ECharts, G6。
+
+**第二輪深度評估**：
+
+| | Cytoscape.js | D3.js | ECharts | G6 |
+|---|:--:|:--:|:--:|:--:|
+| 力導向佈局 | ✅ 9 種 | ✅ 自寫 | ✅ 1 種內建 | ✅ 9 種 |
+| 複合節點 (compound) | ✅ | ❌ | ❌ | ✅ |
+| 圖論演算法 | ✅ 內建 | ❌ | ❌ | ✅ |
+| JSON → Graph | ✅ 原生 | ⚠️ 手動 | ⚠️ 手動 | ✅ 原生 |
+| TreeGraph | ⚠️ 需擴展 | ✅ | ❌ | ✅ 專用 |
+| 大型圖效能 | ⚠️ (>5000會慢) | ✅ | ✅ Canvas | ✅ |
+| 互動 API | ✅ 豐富 | ✅ 最靈活 | ✅ | ✅ |
+| 零外部依賴 | ✅ | ✅ | ❌ (zrender) | ❌ |
+
+**最終決策**：
+
+| 場景 | 選用 | 理由 |
+|------|:--:|------|
+| 知識圖譜核心 | **Cytoscape.js** | 圖論演算法、fCoSE 佈局、JSON 原生對接、Obsidian/Mermaid 都用 |
+| 統計輔助圖表 | **ECharts** | 中文社群大、Apache 背書、長條/圓餅/分佈圖開箱即用 |
+| 樹狀層級圖 | **G6 TreeGraph** | 專用 API，文件結構圖最簡潔 |
+| 自訂特殊需求 | **D3.js** | 保底方案，任何無法滿足的圖表 |
+
+#### Cytoscape.js 使用者背書
+
+| 組織 | 用途 |
+|------|------|
+| **Mermaid** | Layout engine for mindmap diagrams |
+| **Obsidian** | 知識圖譜 (Graph View) |
+| Amazon, Google, Meta, Microsoft | 內部網絡圖視覺化 |
+| IBM, Cisco, Tencent, Uber | 網路拓樸視覺化 |
+| GitHub | 相依性圖 |
+
+#### 整合架構
+
+```
+MarkBase Knowledge Graph:
+┌──────────────────────────────────────┐
+│  圖譜類型           渲染引擎         │
+│  ─────────          ────────         │
+│  語意關係圖   →    Cytoscape.js      │
+│  結構層級圖   →    G6 TreeGraph      │
+│  人物關係圖   →    Cytoscape.js      │
+│  時序演進圖   →    ECharts timeline  │
+│  降維散點圖   →    D3.js             │
+│  統計分佈圖   →    ECharts           │
+│                                     │
+│  全部 CDN 載入，無需 npm            │
+└──────────────────────────────────────┘
+```
+
+### 在 MarkBase 中的整合
+
+```
+MarkBase Control Bar:
+⏮ ◀ ▶ ⏭ | Graph | Tree | Edit | 🔍
+                   ↑
+            Knowledge Graph View
+```
+
+---
+
+## 開發路線圖
+
+| 階段 | 時程 | 交付 |
+|------|:----:|------|
+| P0 Core rendering | ✅ Done | Rust engine: .md→HTML with Mermaid + AJAX refresh |
+| P1 macOS app | ✅ Done | Tauri shell (可選) |
+| P2 File tree + Editor | 2-3d | CodeMirror 6 + lazy-load 樹狀瀏覽 + 存檔 |
+| P3 Knowledge Graph | 3-5d | Cytoscape.js + G6 + ECharts: 語意/結構/人物關係圖譜 |
+| P4 Knowledge base | 3-5d | 多文件索引、全文檢索、backlinks |
+| P5 Export | 2d | 轉檔 CLI (md→pdf/docx/pptx) |
+| P6 Collaboration | 5-10d | 評論、版本、靜態站點 |
--- a/docs_v1.0/DESIGN/MODULE_STANDARDIZATION_SPECIFICATION.md
+++ b/docs_v1.0/DESIGN/MODULE_STANDARDIZATION_SPECIFICATION.md
@@ -0,0 +1,647 @@
+---
+document_type: "reference_doc"
+service: "MOMENTRY_CORE"
+title: "處理器模組標準化規範"
+date: "2026-04-25"
+version: "V1.0"
+status: "active"
+owner: "Warren"
+created_by: "OpenCode"
+tags:
+  - "處理器模組標準化規範"
+ai_query_hints:
+  - "查詢 處理器模組標準化規範 的內容"
+  - "處理器模組標準化規範 的主要目的是什麼？"
+  - "如何操作或實施 處理器模組標準化規範？"
+---
+
+# 處理器模組標準化規範
+
+## 概述
+
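+This specification defines the standard architecture, interfaces, and implementation patterns for processor modules in Momentry Core. The goal is for every processor module (ASR, OCR, YOLO, Face, Pose, CUT, ASRX, Caption, Story) to follow the same design principles, which improves maintainability, testability, and extensibility.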
+
+## 架構原則
+
+### 1. 分層架構
+```
+┌─────────────────────────────────────────┐
+│            Rust API 層                   │
+│  (src/core/processor/*.rs)              │
+├─────────────────────────────────────────┤
+│         Python 執行層                    │
+│   (scripts/*_processor.py)              │
+├─────────────────────────────────────────┤
+│          AI 模型層                       │
+│   (Whisper, YOLO, EasyOCR, etc.)        │
+└─────────────────────────────────────────┘
+```
+
+### 2. 職責分離
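+- **Rust layer**: interface definitions, error handling, configuration management, result parsing
+- **Python layer**: AI model invocation, data processing, intermediate file management
+- **Model layer**: execution of the specific AI task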
+
+## Rust 模組規範
+
+### 文件結構
+```
+src/core/processor/
+├── mod.rs              # 模組導出
+├── executor.rs         # Python 執行器（共享）
+├── asr.rs              # ASR 處理器
+├── ocr.rs              # OCR 處理器
+├── yolo.rs             # YOLO 處理器
+├── face.rs             # 人臉檢測處理器
+├── pose.rs             # 姿態檢測處理器
+├── cut.rs              # 場景切割處理器
+├── asrx.rs             # ASRX 處理器
+├── caption.rs          # 字幕生成處理器
+└── story.rs            # 故事分析處理器
+```
+
+### 模組模板
+
+#### 1. 結果結構定義
+```rust
+use anyhow::{Context, Result};
+use serde::{Deserialize, Serialize};
+use std::time::Duration;
+
+use super::executor::PythonExecutor;
+use crate::core::config::processor;
+
+// 主要結果結構
+#[derive(Debug, Serialize, Deserialize)]
+pub struct ModuleResult {
+    // 通用字段
+    pub processing_time: Option<f64>,
+    pub metadata: Option<serde_json::Value>,
+    
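+    // Module-specific fields; process_module() reads data_units when logging the summary
+    pub data_units: Vec<DataUnit>,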
+}
+
+// 數據單元結構
+#[derive(Debug, Serialize, Deserialize)]
+pub struct DataUnit {
+    // 時間或幀相關字段
+    pub start: f64,
+    pub end: f64,
+    pub frame: u64,
+    
+    // 數據內容
+    // ...
+}
+```
+
+#### 2. 處理函數模板
+```rust
+pub async fn process_module(
+    video_path: &str,
+    output_path: &str,
+    uuid: Option<&str>,
+) -> Result<ModuleResult> {
+    // 1. 初始化執行器
+    let executor = PythonExecutor::new()?;
+    let script_path = executor.script_path("module_processor.py");
+    
+    // 2. 記錄日誌
+    tracing::info!("[MODULE] Starting processing: {}", video_path);
+    
+    // 3. 執行 Python 腳本
+    executor
+        .run(
+            "module_processor.py",
+            &[video_path, output_path],
+            uuid,
+            "MODULE",
+            Some(Duration::from_secs(*processor::MODULE_TIMEOUT_SECS)),
+        )
+        .await
+        .with_context(|| format!("Failed to run {:?}", script_path))?;
+    
+    // 4. 讀取並解析結果
+    let json_str = std::fs::read_to_string(output_path)
+        .context("Failed to read module output")?;
+    
+    let result: ModuleResult = serde_json::from_str(&json_str)
+        .context("Failed to parse module output")?;
+    
+    // 5. 記錄結果摘要
+    tracing::info!(
+        "[MODULE] Result: processed {} units",
+        result.data_units.len()
+    );
+    
+    Ok(result)
+}
+```
+
+#### 3. 配置管理
+```rust
+// 在 src/core/config.rs 中添加
+pub mod processor {
+    use super::*;
+    
+    pub static MODULE_TIMEOUT_SECS: Lazy<u64> = Lazy::new(|| {
+        env::var("MOMENTRY_MODULE_TIMEOUT")
+            .unwrap_or_else(|_| "3600".to_string())
+            .parse()
+            .unwrap_or(3600)
+    });
+    
+    pub static MODULE_CHUNK_SIZE: Lazy<u64> = Lazy::new(|| {
+        env::var("MOMENTRY_MODULE_CHUNK_SIZE")
+            .unwrap_or_else(|_| "300".to_string())
+            .parse()
+            .unwrap_or(300)
+    });
+}
+```
+
+#### 4. 測試規範
+```rust
+#[cfg(test)]
+mod tests {
+    use super::*;
+    
+    #[test]
+    fn test_result_serialization() {
+        // 測試序列化/反序列化
+    }
+    
+    #[test]
+    fn test_empty_result() {
+        // 測試邊界條件
+    }
+    
+    #[tokio::test]
+    async fn test_integration() {
+        // 集成測試（可選）
+    }
+}
+```
+
+## Python 腳本規範
+
+### 文件命名
+```
+scripts/
+├── module_processor.py      # 主要處理腳本
+├── module_utils.py          # 工具函數（可選）
+└── module_debug.py          # 調試腳本（可選）
+```
+
+### 腳本模板
+```python
+#!/opt/homebrew/bin/python3.11
+"""
+模組處理器 - 標準化模板
+
+功能：執行 [模組名稱] 處理
+輸入：視頻文件路徑，輸出文件路徑
+輸出：JSON 格式的處理結果
+"""
+
+import sys
+import json
+import os
+import argparse
+import signal
+import tempfile
+import time
+from pathlib import Path
+from typing import Dict, Any, List, Optional
+
+# 環境檢查
+def check_environment() -> bool:
+    """檢查必要的環境和依賴"""
+    try:
+        # 檢查必要庫
+        import required_library
+        return True
+    except ImportError as e:
+        print(f"ERROR: Missing dependency: {e}", file=sys.stderr)
+        return False
+
+# 信號處理
+def signal_handler(signum, frame):
+    """Handle interrupt signals."""
+    print(f"[MODULE] Received signal {signum}, cleaning up...", file=sys.stderr)
+    sys.exit(1)
+
+# 主要處理類
+class ModuleProcessor:
+    def __init__(self, video_path: str, output_path: str):
+        self.video_path = video_path
+        self.output_path = output_path
+        self.start_time = time.time()
+        
+    def validate_input(self) -> bool:
+        """驗證輸入文件"""
+        if not os.path.exists(self.video_path):
+            print(f"ERROR: Video file not found: {self.video_path}", file=sys.stderr)
+            return False
+        return True
+    
+    def process(self) -> Dict[str, Any]:
+        """執行處理邏輯"""
+        try:
+            # 1. Prepare a working directory; it is removed when the block exits
+            with tempfile.TemporaryDirectory(prefix="module_") as work_dir:
+                # 2. Run the core processing logic
+                result = self._core_processing(work_dir)
+
+            # 3. Attach metadata (keys follow the output format spec)
+            result["metadata"] = {
+                "video_path": self.video_path,
+                "module": "module_name",
+                "version": "1.0.0",
+                "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
+                "processing_time": time.time() - self.start_time,
+            }
+
+            return result
+            
+        except Exception as e:
+            print(f"ERROR: Processing failed: {e}", file=sys.stderr)
+            raise
+    
+    def _core_processing(self, work_dir: str) -> Dict[str, Any]:
+        """核心處理邏輯（模組特定）"""
+        # 模組特定實現
+        return {
+            "data_units": [],
+            "summary": {}
+        }
+    
+    def save_result(self, result: Dict[str, Any]):
+        """保存結果到文件"""
+        with open(self.output_path, 'w', encoding='utf-8') as f:
+            json.dump(result, f, ensure_ascii=False, indent=2)
+        
+        print(f"[MODULE] Result saved to: {self.output_path}")
+
+# 命令行接口
+def main():
+    parser = argparse.ArgumentParser(description="模組處理器")
+    parser.add_argument("video_path", help="輸入視頻文件路徑")
+    parser.add_argument("output_path", help="輸出 JSON 文件路徑")
+    
+    args = parser.parse_args()
+    
+    # 設置信號處理
+    signal.signal(signal.SIGINT, signal_handler)
+    signal.signal(signal.SIGTERM, signal_handler)
+    
+    # 環境檢查
+    if not check_environment():
+        sys.exit(1)
+    
+    # 執行處理
+    processor = ModuleProcessor(args.video_path, args.output_path)
+    
+    if not processor.validate_input():
+        sys.exit(1)
+    
+    try:
+        result = processor.process()
+        processor.save_result(result)
+        print(f"[MODULE] Processing completed successfully")
+        
+    except Exception as e:
+        print(f"ERROR: {e}", file=sys.stderr)
+        sys.exit(1)
+
+if __name__ == "__main__":
+    main()
+```
+
+### 輸出格式規範
+```json
+{
+  "data_units": [
+    {
+      "id": "unit_1",
+      "start": 0.0,
+      "end": 2.5,
+      "frame": 0,
+      "data": {},
+      "confidence": 0.95
+    }
+  ],
+  "summary": {
+    "total_units": 1,
+    "processing_time": 4.7,
+    "average_confidence": 0.95
+  },
+  "metadata": {
+    "video_path": "/path/to/video.mp4",
+    "module": "module_name",
+    "version": "1.0.0",
+    "timestamp": "2026-03-27 10:30:00"
+  }
+}
+```
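
A script could check its own output against this format before writing it, so that malformed results fail in Python rather than during Rust-side parsing. The checker below is a sketch: `validate_module_output` is a hypothetical name, `start`, `end`, and `frame` are treated as required because the Rust `DataUnit` struct declares them as non-optional, and the 0 to 1 range for `confidence` is an assumption.

```python
def validate_module_output(result):
    """Return a list of problems found in a processor result dict."""
    problems = []
    for key in ("data_units", "summary", "metadata"):
        if key not in result:
            problems.append(f"missing top-level key: {key}")

    for index, unit in enumerate(result.get("data_units", [])):
        # The Rust DataUnit struct requires these three fields.
        for key in ("start", "end", "frame"):
            if key not in unit:
                problems.append(f"data_units[{index}] missing {key}")
        if "start" in unit and "end" in unit and unit["start"] > unit["end"]:
            problems.append(f"data_units[{index}] has start after end")
        confidence = unit.get("confidence")
        # Assumption: confidence, when present, is a probability in [0, 1].
        if confidence is not None and not 0.0 <= confidence <= 1.0:
            problems.append(f"data_units[{index}] confidence out of range")
    return problems
```

Calling this in `save_result()` and exiting non-zero on any problem would surface format drift early.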
+
+## 配置標準化
+
+### 環境變量
+```
+# 超時設置
+MOMENTRY_ASR_TIMEOUT=3600
+MOMENTRY_OCR_TIMEOUT=7200
+MOMENTRY_YOLO_TIMEOUT=7200
+MOMENTRY_FACE_TIMEOUT=3600
+MOMENTRY_POSE_TIMEOUT=3600
+MOMENTRY_CUT_TIMEOUT=3600
+MOMENTRY_ASRX_TIMEOUT=3600
+MOMENTRY_CAPTION_TIMEOUT=1800
+MOMENTRY_STORY_TIMEOUT=1800
+
+# 性能設置
+MOMENTRY_MODULE_CHUNK_SIZE=300
+MOMENTRY_MODULE_BATCH_SIZE=32
+MOMENTRY_MODULE_CACHE_ENABLED=true
+
+# 模型設置
+MOMENTRY_MODULE_MODEL=base
+MOMENTRY_MODULE_DEVICE=cpu
+```
+
+### 配置優先級
+1. 命令行參數（最高優先級）
+2. 環境變量
+3. 配置文件
+4. 默認值（最低優先級）
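
The priority order above can be expressed as a single lookup. This is a minimal sketch: `resolve_setting` is a hypothetical helper, and deriving the environment variable name as `MOMENTRY_` plus the upper-cased setting name is an assumption based on the variables listed in this section.

```python
import os


def resolve_setting(name, cli_value=None, file_config=None, default=None, environ=None):
    """Resolve a setting: CLI argument > environment variable > config file > default."""
    environ = os.environ if environ is None else environ

    # 1. Command-line argument (highest priority)
    if cli_value is not None:
        return cli_value

    # 2. Environment variable, e.g. "asr_timeout" -> MOMENTRY_ASR_TIMEOUT
    env_key = f"MOMENTRY_{name.upper()}"
    if env_key in environ:
        return environ[env_key]

    # 3. Configuration file (already parsed into a dict)
    if file_config is not None and name in file_config:
        return file_config[name]

    # 4. Default value (lowest priority)
    return default
```

Values are returned as found, so callers still need to parse environment strings into numbers, as the Rust `Lazy` statics do with `.parse()`.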
+
+## 錯誤處理規範
+
+### Rust 錯誤處理
+```rust
+use anyhow::{Context, Result};
+
+pub async fn process_module(...) -> Result<ModuleResult> {
+    // Use .context() to attach context to the error
+    executor.run(...)
+        .await
+        .context("Failed to run module script")?;
+    
+    // 使用 anyhow::bail! 進行錯誤返回
+    if !condition {
+        anyhow::bail!("Condition not met: {}", reason);
+    }
+}
+```
+
+### Python Error Handling
+```python
+def process(self) -> Dict[str, Any]:
+    try:
+        # Main logic
+        result = self._core_processing()
+        return result
+    except FileNotFoundError as e:
+        print(f"ERROR: File not found: {e}", file=sys.stderr)
+        raise
+    except RuntimeError as e:
+        print(f"ERROR: Runtime error: {e}", file=sys.stderr)
+        raise
+    except Exception as e:
+        print(f"ERROR: Unexpected error: {e}", file=sys.stderr)
+        raise
+```
+
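+### Error Categories
+1. **Input errors**: file not found, unsupported format, permission problems
+2. **Configuration errors**: missing dependencies, invalid environment variables, missing model files
+3. **Runtime errors**: out of memory, timeout, model inference failure
+4. **Output errors**: result parsing failure, file write failure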
+
+## Logging Conventions
+
+### Rust Logging
+```rust
+tracing::info!("[MODULE] Starting processing: {}", video_path);
+tracing::debug!("[MODULE] Processing details: {:?}", details);
+tracing::warn!("[MODULE] Warning: {}", warning_message);
+tracing::error!("[MODULE] Error: {}", error_message);
+```
+
+### Python Logging
+```python
+import os
+import sys
+
+def log_info(message: str):
+    print(f"[MODULE] INFO: {message}", file=sys.stderr)
+
+def log_debug(message: str):
+    if os.environ.get("MODULE_DEBUG") == "1":
+        print(f"[MODULE] DEBUG: {message}", file=sys.stderr)
+
+def log_error(message: str):
+    print(f"[MODULE] ERROR: {message}", file=sys.stderr)
+```
+
+## Performance Monitoring
+
+### Metrics Collection
+```rust
+pub struct ProcessingMetrics {
+    pub start_time: std::time::Instant,
+    pub end_time: Option<std::time::Instant>,
+    pub memory_usage_mb: f64,
+    pub cpu_usage_percent: f64,
+    pub items_processed: u64,
+    pub items_per_second: f64,
+}
+
+impl ProcessingMetrics {
+    pub fn new() -> Self {
+        Self {
+            start_time: std::time::Instant::now(),
+            end_time: None,
+            memory_usage_mb: 0.0,
+            cpu_usage_percent: 0.0,
+            items_processed: 0,
+            items_per_second: 0.0,
+        }
+    }
+    
+    pub fn record_completion(&mut self, items_processed: u64) {
+        let end = std::time::Instant::now();
+        self.end_time = Some(end);
+        self.items_processed = items_processed;
+        
+        // Guard against a zero-length interval to avoid dividing by zero
+        let secs = end.duration_since(self.start_time).as_secs_f64();
+        self.items_per_second = if secs > 0.0 {
+            items_processed as f64 / secs
+        } else {
+            0.0
+        };
+    }
+}
+```
+
+### Performance Report
+```json
+{
+  "performance": {
+    "processing_time_seconds": 4.7,
+    "memory_usage_mb": 512.5,
+    "cpu_usage_percent": 45.2,
+    "items_processed": 8,
+    "items_per_second": 1.7,
+    "throughput_mb_per_second": 10.5
+  }
+}
+```
+
+## Testing Conventions
+
+### Unit Tests
+```rust
+#[cfg(test)]
+mod tests {
+    use super::*;
+    
+    #[test]
+    fn test_result_structure() {
+        // Test the data structures
+    }
+    
+    #[test]
+    fn test_serialization() {
+        // Test serialization
+    }
+    
+    #[test]
+    fn test_edge_cases() {
+        // Test edge cases
+    }
+}
+```
+
+### Integration Tests
+```rust
+#[tokio::test]
+async fn test_module_integration() {
+    // Run the integration test against a sample file
+    let test_video = "test_data/sample.mp4";
+    let output_file = tempfile::NamedTempFile::new().unwrap();
+    
+    let result = process_module(test_video, output_file.path().to_str().unwrap(), None)
+        .await
+        .expect("Processing should succeed");
+    
+    assert!(!result.data_units.is_empty());
+}
+```
+
+### Python Tests
+```python
+import tempfile
+
+def test_module_processor():
+    """Test the Python processor."""
+    processor = ModuleProcessor("test.mp4", "output.json")
+    
+    # Input validation: the file does not exist, so validation should fail
+    assert not processor.validate_input()
+    
+    # Processing logic (requires a real test video at this path)
+    with tempfile.NamedTemporaryFile() as tmp:
+        processor = ModuleProcessor("real_test.mp4", tmp.name)
+        result = processor.process()
+        assert "data_units" in result
+        assert "metadata" in result
+```
+
+## Documentation Conventions
+
+### Rust Documentation
+```rust
+/// ASR processor module
+/// 
+/// Provides automatic speech recognition with support for multiple languages and large files.
+/// 
+/// # Examples
+/// ```ignore
+/// use momentry_core::processor::asr;
+/// 
+/// let result = asr::process_asr("video.mp4", "output.json", None).await?;
+/// println!("Recognized {} speech segments", result.segments.len());
+/// ```
+pub mod asr {
+    // ...
+}
+```
+
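+### Python Documentation
+```python
+"""
+Module processor
+
+Provides [feature description].
+
+Usage:
+    python module_processor.py input.mp4 output.json
+
+Arguments:
+    video_path: path to the input video file
+    output_path: path to the output JSON file
+
+Output format:
+    See the output format specification section.
+"""
+```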
+
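+## Migration Guide
+
+### Steps to Standardize an Existing Module
+1. **Analyze the existing code**: identify the parts that do not follow these conventions
+2. **Create a backup**: back up the original files
+3. **Refactor the Rust module**: restructure it to follow the template
+4. **Refactor the Python script**: restructure it to follow the template
+5. **Update configuration**: unify configuration management
+6. **Add tests**: fill in unit and integration tests
+7. **Update documentation**: update the API docs and usage instructions
+8. **Verify functionality**: confirm everything still works
+
+### Compatibility Guarantees
+- Keep the existing API unchanged
+- Migrate incrementally without interrupting existing functionality
+- Provide migration tools and documentation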
+
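+## Appendix
+
+### A. Module Categories
+
+| Module | Function | Main technology | Output type |
+|------|------|----------|----------|
+| ASR | Speech recognition | Whisper | Time-segmented text |
+| OCR | Text recognition | EasyOCR | Frame-level text |
+| YOLO | Object detection | YOLOv8 | Frame-level objects |
+| Face | Face detection | OpenCV | Frame-level faces |
+| Pose | Pose detection | OpenPose | Frame-level poses |
+| CUT | Scene cutting | PySceneDetect | Scene boundaries |
+| ASRX | Speech enhancement | WhisperX | Speaker diarization |
+| Caption | Caption generation | BLIP | Frame-level descriptions |
+| Story | Story analysis | Custom | Story structure |
+
+### B. Performance Benchmarks
+
+| Module | Average processing time | Memory usage | CPU usage |
+|------|--------------|----------|----------|
+| ASR | 4.7s (small file) | 1.2GB | 45% |
+| OCR | 12.3s (small file) | 800MB | 35% |
+| YOLO | 8.5s (small file) | 1.5GB | 60% |
+| Face | 3.2s (small file) | 500MB | 25% |
+
+### C. Common Issues
+
+1. **Dependency problems**: make sure the Python environment is set up correctly
+2. **Out of memory**: adjust the chunk_size parameter
+3. **Timeout errors**: increase the timeout setting or optimize the algorithm
+4. **Slow model loading**: enable model caching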
+
+---
+
+*Version: 1.0.0*
+*Last updated: 2026-03-27*
+*Owner: Warren (Technical Lead)*
+*Status: Draft*
--- a/docs_v1.0/DESIGN/MOMENTRY_RAG_PRESENTATION.md
+++ b/docs_v1.0/DESIGN/MOMENTRY_RAG_PRESENTATION.md
@@ -0,0 +1,353 @@
+---
+document_type: "reference_doc"
+service: "MOMENTRY_CORE"
+title: "Momentry Core Video RAG System Briefing"
+date: "2026-03-22"
+version: "V1.0"
+status: "active"
+owner: "Warren"
+created_by: "OpenCode"
+tags:
+  - "momentry"
+  - "core"
+  - "system-briefing"
+ai_query_hints:
+  - "Look up the contents of the Momentry Core Video RAG System Briefing"
+  - "What is the main purpose of the Momentry Core Video RAG System Briefing?"
+  - "How do I operate or implement the Momentry Core Video RAG System Briefing?"
+---
+
+# Momentry Core Video RAG System Briefing
+
+| Item | Value |
+|------|------|
+| Author | Warren |
+| Created | 2026-03-22 |
+| Document version | V1.1 |
+
+---
+
+## Version History
+
+| Version | Date | Purpose | Editor | Tool/Model |
+|------|------|------|--------|-----------|
+| V1.0 | 2026-03-22 | Document created | Warren | OpenCode / MiniMax M2.5 |
+| V1.1 | 2026-03-25 | Updated API response format (media_url→file_path) and authentication header | OpenCode | deepseek-reasoner |
+
+---
+
+## System Architecture
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                            User                             │
+│                        (marcom team)                        │
+└─────────────────┬───────────────────────────────────────────┘
+                  │
+                  ▼
+┌─────────────────────────────────────────────────────────────┐
+│                      WordPress Portal                       │
+│                   (wp.momentry.ddns.net)                    │
+└─────────────────┬───────────────────────────────────────────┘
+                  │
+                  ▼
+┌─────────────────────────────────────────────────────────────┐
+│                       n8n Automation                        │
+│                      (localhost:5678)                       │
+│                                                             │
+│   [Webhook] → [HTTP Request] → [Process results] → [Reply]  │
+└─────────────────┬───────────────────────────────────────────┘
+                  │
+                  ▼
+┌─────────────────────────────────────────────────────────────┐
+│                  Momentry Core API                          │
+│                (localhost:3002)                             │
+│                                                             │
+│   POST /api/v1/search     → semantic search                 │
+│   POST /api/v1/n8n/search → n8n-specific format             │
+│   GET  /api/v1/videos     → video list                      │
+└─────────────────┬───────────────────────────────────────────┘
+                  │
+        ┌─────────┴──────────┐
+        ▼                    ▼
+┌───────────────┐    ┌───────────────┐
+│  PostgreSQL   │    │    Qdrant     │
+│  (chunks)     │    │  (vectors)    │
+└───────────────┘    └───────────────┘
+```
+
+---
+
+## Data Flow
+
+```
+1. Upload video → SFTPGo
+2. Register video → PostgreSQL
+3. ASR processing → generates subtitle chunks
+4. Store chunks → PostgreSQL
+5. Vectorize → Qdrant
+6. Search query → API
+7. Return results → n8n → user
+```
+
+---
+
+## Demo Video
+
+| Item | Value |
+|------|------|
+| File name | Old_Time_Movie_Show_-_Charade_1963.HD.mov |
+| UUID | a1b10138a6bbb0cd |
+| Duration | 6879 seconds (about 1.9 hours) |
+| Chunks | 3,886 |
+| Vectors | 3,688 |
+
+---
+
+## API Endpoints
+
+### 1. Semantic Search
+
+```
+POST http://localhost:3002/api/v1/search
+```
+
+**Request:**
+```json
+{
+  "query": "charade",
+  "limit": 5,
+  "uuid": "a1b10138a6bbb0cd"
+}
+```
+
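+> **Note**: 
+> 1. **API authentication**: all `/api/v1/*` endpoints require the `X-API-Key` header
+> 2. **File path conversion**: the API now returns `file_path` (a filesystem path), which must be converted to an accessible URL (for example, via an SFTPGo share link)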
+
+---
+
+### 2. n8n-Specific Format
+
+```
+POST http://localhost:3002/api/v1/n8n/search
+```
+
+**Request:**
+```json
+{
+  "query": "charade",
+  "limit": 5
+}
+```
+
+**Response:**
+```json
+{
+  "query": "charade",
+  "count": 5,
+  "hits": [
+    {
+      "id": "sentence_0006",
+      "vid": "a1b10138a6bbb0cd",
+      "start": 48.8,
+      "end": 55.44,
+      "title": "Chunk sentence_0006",
+      "text": "fun plot twists...",
+      "score": 0.526,
+      "file_path": "/Users/accusys/momentry/var/sftpgo/data/demo/video.mp4"
+    }
+  ]
+}
+```
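For callers outside n8n, the request and response shown above could be exercised with a small standard-library client. This is a hedged sketch: `build_search_request`, `summarize_hits`, and `search` are hypothetical names, and only the endpoint, the `X-API-Key` header, and the field names come from this document.

```python
import json
import urllib.request

API_BASE = "http://localhost:3002"


def build_search_request(query: str, limit: int, api_key: str) -> urllib.request.Request:
    """Build the POST request for /api/v1/n8n/search (no network I/O happens here)."""
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/v1/n8n/search",
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def summarize_hits(response: dict) -> list:
    """Reduce each hit to (chunk id, start, end, score) for quick inspection."""
    return [(h["id"], h["start"], h["end"], h["score"]) for h in response.get("hits", [])]


def search(query: str, limit: int, api_key: str) -> dict:
    """Send the request; requires a running Momentry Core instance."""
    with urllib.request.urlopen(build_search_request(query, limit, api_key)) as resp:
        return json.load(resp)


# Offline check against the sample response shown above.
sample = {"query": "charade", "count": 5, "hits": [
    {"id": "sentence_0006", "vid": "a1b10138a6bbb0cd", "start": 48.8, "end": 55.44,
     "title": "Chunk sentence_0006", "text": "fun plot twists...", "score": 0.526,
     "file_path": "/Users/accusys/momentry/var/sftpgo/data/demo/video.mp4"}]}
print(summarize_hits(sample))
```

Keeping request construction separate from the network call lets the header and body be checked without a live server.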
+
+---
+
+## Implementation Example
+
+### n8n Workflow Design
+
+```
+┌─────────────┐
+│ Webhook     │  ← receives the user's search request
+└──────┬──────┘
+       │
+       ▼
+┌─────────────┐
+│ HTTP Request│  → POST /api/v1/n8n/search
+└──────┬──────┘
+       │
+       ▼
+┌─────────────┐
+│ Code        │  → processes the returned results
+└──────┬──────┘
+       │
+       ▼
+┌─────────────┐
+│ Telegram    │  → replies to the user
+│ (or LINE)   │
+└─────────────┘
+```
+
+---
+
+## Step-by-Step n8n Workflow
+
+### Step 1: Create the Webhook
+
+1. Open a new Workflow in n8n
+2. Add node: **Webhook**
+3. Set path: `video-search`
+4. Copy the Webhook URL
+
+---
+
+### Step 2: Configure the HTTP Request
+
+1. Add node: **HTTP Request**
+2. Settings:
+   ```
+   Method: POST
+   URL: http://localhost:3002/api/v1/n8n/search
+   Body Content Type: JSON
+   Headers: X-API-Key (must be set)
+   ```
+
+3. Body:
+```json
+{
+  "query": "={{ $json.body }}",
+  "limit": 5
+}
+```
+
+---
+
+### Step 3: Process Results (Code)
+
+```javascript
+const hits = $input.first().json.hits;
+
+if (!hits || hits.length === 0) {
+  return {
+    json: { message: "No matching results found" }
+  };
+}
+
+const results = hits.map((hit, index) => ({
+  number: index + 1,
+  text: hit.text,
+  time: `${hit.start}s - ${hit.end}s`,
+  score: Math.round(hit.score * 100) + "%",
+  // Note: the API now returns file_path (a filesystem path), which must be converted to an accessible URL
+  url: hit.file_path + "#t=" + hit.start + "," + hit.end // TODO: convert the file path to a URL
+}));
+
+return { json: { results } };
+```
+
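+> **Note**: 
+> 1. **API authentication**: all `/api/v1/*` endpoints require the `X-API-Key` header
+> 2. **File path conversion**: the API now returns `file_path` (a filesystem path), which must be converted to an accessible URL (for example, via an SFTPGo share link)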
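One possible shape for the file path conversion mentioned in the note is sketched below. It is not the project's implementation: `DATA_ROOT` is inferred from the sample `file_path`, `PUBLIC_BASE_URL` is a placeholder, and a real deployment would more likely resolve an SFTPGo share link. The `#t=start,end` suffix mirrors the fragment already built in the Code node.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

# Both values are placeholders; substitute the real storage root and public base URL.
DATA_ROOT = PurePosixPath("/Users/accusys/momentry/var/sftpgo/data")
PUBLIC_BASE_URL = "https://media.example.com/files"


def file_path_to_url(file_path: str, start: float, end: float) -> str:
    """Map a filesystem path under DATA_ROOT to a public URL with a #t=start,end fragment."""
    # relative_to raises ValueError when the path is outside DATA_ROOT.
    relative = PurePosixPath(file_path).relative_to(DATA_ROOT)
    encoded = "/".join(quote(part) for part in relative.parts)
    return f"{PUBLIC_BASE_URL}/{encoded}#t={start},{end}"


print(file_path_to_url("/Users/accusys/momentry/var/sftpgo/data/demo/video.mp4", 48.8, 55.44))
```

Rejecting paths outside the storage root keeps arbitrary server paths from being turned into links.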
+
+---
+
+### Step 4: Format the Output
+
+**Telegram format:**
+```
+🎬 Search results: "{{ $json.query }}"
+
+1️⃣ "fun plot twists, Woody Dialog and charming performances..."
+   ⏱ 48.8s - 55.4s
+   📊 Relevance: 53%
+
+2️⃣ "Don't you like me to say that a pretty girl..."
+   ⏱ 4745.6s - 4748.6s
+   📊 Relevance: 52%
+```
+
+---
+
+## Test Commands
+
+### curl Tests
+
+```bash
+# Semantic search
+curl -X POST http://localhost:3002/api/v1/search \
+  -H "Content-Type: application/json" \
+  -H "X-API-Key: YOUR_API_KEY" \
+  -d '{"query": "charade", "limit": 3}'
+
+# n8n format
+curl -X POST http://localhost:3002/api/v1/n8n/search \
+  -H "Content-Type: application/json" \
+  -H "X-API-Key: YOUR_API_KEY" \
+  -d '{"query": "charade", "limit": 3}'
+
+# Video list
+curl -H "X-API-Key: YOUR_API_KEY" http://localhost:3002/api/v1/videos
+
+# Chunks for a specific video
+curl -H "X-API-Key: YOUR_API_KEY" http://localhost:3002/api/v1/videos/a1b10138a6bbb0cd/chunks
+```
+
+---
+
+## Sample Search Results
+
+| Query | Result excerpt |
+|--------|----------|
+| `charade` | "fun plot twists, Woody Dialog and charming performances..." |
+| `woody` | "Well, you thick skull hair, brain half-witted..." |
+| `classic movie` | "Hello and welcome to the old-time movie show..." |
+| `charming` | "fun plot twists, Woody Dialog and charming performances..." |
+
+---
+
+## Database Status
+
+| Store | Record count | Status |
+|--------|----------|------|
+| PostgreSQL (videos) | 4 | ✅ |
+| PostgreSQL (chunks) | 3,950 | ✅ |
+| PostgreSQL (vectors) | 1,870 | ✅ |
+| Qdrant (vectors) | 3,688 | ✅ |
+| Redis (job cache) | 4 keys | ✅ |
+
+---
+
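+## Next Steps
+
+1. **Create SFTPGo share links**
+   - Open http://localhost:8080
+   - Log in as demo / demopassword123
+   - Create a share link for the video
+
+2. **Test the n8n Workflow**
+   - Import the Postman Collection
+   - Create the Webhook
+   - Test a search
+
+3. **Integrate with WordPress**
+   - Build a form to accept user input
+   - Call the n8n Webhook
+   - Display the search results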
+
+---
+
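+## Quick Start
+
+```bash
+# 1. Test the search API (all /api/v1/* endpoints require X-API-Key)
+curl -X POST http://localhost:3002/api/v1/search \
+  -H "Content-Type: application/json" \
+  -H "X-API-Key: YOUR_API_KEY" \
+  -d '{"query": "charade", "limit": 3}'
+
+# 2. List videos
+curl -H "X-API-Key: YOUR_API_KEY" http://localhost:3002/api/v1/videos
+
+# 3. Check whether n8n is running
+curl http://localhost:5678
+```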
--- a/docs_v1.0/DESIGN/NON_HUMAN_SOUND_DETECTION.md
+++ b/docs_v1.0/DESIGN/NON_HUMAN_SOUND_DETECTION.md
@@ -0,0 +1,94 @@
+# Non-Human Sound Detection — Tool Selection Report
+
+**Date:** 2026-05-10
+**Movie:** Charade (1963), 113 min
+**Audio:** 16kHz mono WAV
+**Goal:** Detect non-human sound events (gunshots, impacts, doors, music, etc.)
+
+## Tested Approaches
+
+### Approach A: AST AudioSet (HuggingFace)
+
+| Item | Detail |
+|------|--------|
+| Model | `MIT/ast-finetuned-audioset-10-10-0.4593` |
+| Method | Audio Spectrogram Transformer, fine-tuned on AudioSet-2M (527 classes) |
+| Dependencies | `transformers`, `torch` ✅ (no torchcodec needed) |
+| Load time | ~1s on M5 |
+| Inference time | ~0.5s per 3-second clip (~87M params, float32) |
+| Accuracy | Good — correctly distinguishes speech vs. door vs. music |
+
+**Test results on Charade:**
+
+| Time | Energy-based said | AST AudioSet said | Verdict |
+|------|------------------|-------------------|---------|
+| 0:10 | — | Environmental noise (26%) | Background noise, plausible |
+| 10:32 | Gunshot candidate (43x) | **Speech (76%)** | ✅ AST correct |
+| 57:00 | Gunshot candidate (49x) | **Door (62%) + Slam (5%)** | ✅ AST correct |
+| 65:13 | Gunshot candidate (50x) | **Speech (58%)** | ✅ AST correct |
+| 85:12 | Gunshot candidate (39x) | **Speech (68%)** | ✅ AST correct |
+
+**Conclusion**: All four energy-based gunshot candidates checked here were **false positives**, so energy-based impulse detection alone is unreliable for gunshot detection. AST AudioSet classified each of them as a non-gunshot event (speech or a door).
+
+### Approach B: Custom Energy + Spectral Features
+
+| Item | Detail |
+|------|--------|
+| Method | RMS energy + spectral centroid + sub-band energy ratios |
+| Speed | ~3s for full 113-min movie (every 10th window) |
+| Accuracy | Poor — cannot distinguish gunshot from speech, door, music |
+| Result | 1 "gunshot_candidate" from 453 test windows; all false positives on verification |
+
+**Conclusion**: Useful as a **coarse pre-filter** (Stage 1), not as a standalone classifier.
+
+## Two-Stage Design
+
+```
+Stage 1 (Energy filter, ~1 min):
+  Full audio → sliding window RMS + centroid → ~200 candidate windows
+                    |
+                    v
+Stage 2 (AST classifier, ~2 min):
+  Extract 3-sec audio for each candidate → AST AudioSet classification
+                    |
+                    v
+  Non-speech events: gunshot, explosion, door slam, music, etc.
+```
+
+Estimated processing: ~3 min for full movie (vs. 75 min for full AST scan)
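Stage 1 of the design above can be illustrated with a pure-Python energy filter. This is a simplified sketch, not the planned `non_human_sound_detector.py`: it uses RMS only (the spectral centroid is omitted), and the function names, the 0.5 s window, and the 10x-median threshold are all assumptions.

```python
import math


def rms(window):
    """Root-mean-square energy of one window of samples."""
    return math.sqrt(sum(s * s for s in window) / len(window))


def candidate_windows(samples, sample_rate, window_s=0.5, ratio=10.0):
    """Return start times (seconds) of windows whose RMS exceeds ratio x the median RMS."""
    size = int(window_s * sample_rate)
    energies = [rms(samples[i:i + size]) for i in range(0, len(samples) - size + 1, size)]
    if not energies:
        return []
    median = sorted(energies)[len(energies) // 2]
    threshold = ratio * max(median, 1e-9)  # avoid a zero threshold on digital silence
    return [i * window_s for i, e in enumerate(energies) if e > threshold]


# Synthetic check: 10 s of quiet tone with one loud burst at 4.0-4.5 s.
rate = 16000
audio = [0.01 * math.sin(2 * math.pi * 220 * n / rate) for n in range(10 * rate)]
for n in range(4 * rate, int(4.5 * rate)):
    audio[n] = 0.9 * math.sin(2 * math.pi * 1000 * n / rate)
print(candidate_windows(audio, rate))
```

Each returned start time would then be expanded to a 3-second clip and handed to the Stage 2 classifier.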
+
+## Key AudioSet Classes Relevant to Charade
+
+| Class | AudioSet ID | Relevance |
+|-------|-------------|-----------|
+| Gunshot, gunfire | 402 | **Primary target** |
+| Explosion | 400 | Hand grenade in plot |
+| Door slams | 404 | Scenes at hotel, apartment |
+| Music | 130-133 | Background score |
+| Speech | 0-3 | Already handled by ASR |
+| Vehicle | 100-110 | Car sounds in Paris chase |
+| Glass break | 424 | Window breaking scene |
+
+## Actor-voice gender mismatches (resolved by fine-grained ASRX)
+
+During the speaker mapping work, we found 20 segments where the old face→TMDb assignment said "Audrey Hepburn" but the new ASRX voice embedding clearly said "MALE". These segments were verified via video clips and confirmed to be scenes where:
+
+1. A male speaker (Cary Grant or other) is speaking while Audrey Hepburn's face is on screen
+2. The old pipeline incorrectly assigned the speaker name based on face identity
+3. The fine-grained sliding window approach correctly resolves these
+
+The 20 segments were from SPEAKER_5 (10 segs) and SPEAKER_9 (10 segs), both of which mapped to MALE voice clusters. These were re-assigned to "Cary Grant" or "Unknown" as appropriate.
+
+## Recommendations
+
+| Approach | Speed | Accuracy | Best for |
+|----------|-------|----------|----------|
+| Energy pre-filter | ✅ 1 min | ❌ Low | Stage 1: candidate selection |
+| AST AudioSet | ⚠️ 2 min | ✅ High | Stage 2: event classification |
+| Full AST scan | ❌ 75 min | ✅ High | N/A — two-stage is better |
+
+**Design**: Two-stage pipeline: energy pre-filter → AST classifier
+**Implementation path**:
+1. Write `scripts/non_human_sound_detector.py` with the two-stage design
+2. Output `{uuid}.sound_events.json` with typed events
+3. Integrate into the sound_event_detector framework
--- a/docs_v1.0/DESIGN/PROCESSOR_MECHANISMS_REVIEW.md
+++ b/docs_v1.0/DESIGN/PROCESSOR_MECHANISMS_REVIEW.md
@@ -0,0 +1,134 @@
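+# Processor Output Mechanisms Review
+
+## Three-Layer Mechanism Definitions
+
+### 1. Interruption Resume
+After a process is killed, it can pick up where it left off on restart.
+**Current state**: Most processors have `.tmp` → `.partial` protection, but a rerun starts from the beginning.
+
+### 2. Supplement
+When output is incomplete, only the unfinished part is filled in instead of rerunning everything.
+**Current state**: Everything reruns from scratch; there is no supplement mechanism.
+
+### 3. Error Correction
+Corrupted output files are detected and repaired automatically.
+**Current state**: The file-existence check only verifies that the file exists, not that its content is valid.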
+
+---
+
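+## Processor-by-Processor Review
+
+### ASR
+| Aspect | Current state | Problem |
+|------|------|------|
+| Interruption resume | ✅ `.tmp` → `.partial` (executor) | ✅ OK |
+| Supplement | ❌ Reruns from scratch every time | If killed at 50%, the next run starts from 0% |
+| Error correction | ❌ Content not validated | The file-existence check skips the job whenever `.json` exists, regardless of content |
+| Pipe | ✅ executor.run() | ✅ |
+| Timeout | ✅ Removed (None) | ✅ |
+
+**Improvement plan**:
+- Supplement: on rerun, ASR scans the existing `.json` or `.partial`, finds the `end_time` of the last segment, and passes `--resume-from` to the Python script
+- Error correction: the file-existence check validates `.json` with `serde_json::from_str`; an invalid file is treated as missing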
+
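+### ASRX
+| Aspect | Current state | Problem |
+|------|------|------|
+| Interruption resume | ❌ **Does not use the executor**; writes `.json` directly | Leaves a corrupt file behind when killed |
+| Supplement | ❌ Same as ASR | Depends on ASR; ASRX cannot run if ASR is incomplete |
+| Error correction | ❌ Content not validated | Same as above |
+| Pipe | ❌ **Raw Command**, no `.tmp` protection | Urgent |
+| Timeout | ⚠️ 7200s hardcoded | Should be changed to None (same as ASR) |
+
+**Improvement plan**:
+- **Top priority**: switch to `executor.run()` to gain `.tmp` protection
+- Otherwise same as ASR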
+
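+### YOLO
+| Aspect | Current state | Problem |
+|------|------|------|
+| Interruption resume | ✅ executor `.tmp` | ✅ |
+| Supplement | ❌ Reruns from scratch | If killed at frame 100,000, the next run starts from frame 0 |
+| Error correction | ❌ Content not validated | yolo.json was already corrupt, yet the file check skipped the job |
+
+**Improvement plan**:
+- Supplement: scan `.partial` for the last frame and pass `--resume-frame` to the Python script
+- Error correction: the file-existence check validates `.json` by parsing it as JSON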
+
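+### FACE / POSE / OCR
+| Aspect | Current state | Problem |
+|------|------|------|
+| Interruption resume | ✅ executor `.tmp` | ✅ |
+| Supplement | ❌ Reruns from scratch | Same as YOLO |
+| Error correction | ❌ Content not validated | Same as YOLO |
+
+**Improvement plan**: same as YOLO
+
+### CUT
+| Aspect | Current state | Problem |
+|------|------|------|
+| Interruption resume | ✅ executor `.tmp` | ✅ |
+| Supplement | ✅ Already completed during the register stage and loaded directly | ✅ |
+| Error correction | ❌ Content not validated | Same as YOLO |
+
+**Improvement plan**: error correction only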
+
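+### SCENE
+| Aspect | Current state | Problem |
+|------|------|------|
+| Interruption resume | ✅ **Most complete**: checks the three states `.err`/`.json`/`.tmp` | ✅ |
+| Supplement | ❌ Reruns from scratch | ✅ (scene is fast) |
+| Error correction | ⚠️ Checks `.err` | ✅ |
+
+### VISUAL_CHUNK
+| Aspect | Current state | Problem |
+|------|------|------|
+| Interruption resume | ✅ executor `.tmp` | ✅ |
+| Supplement | ❌ | ❌ |
+| Error correction | ❌ **Errors are swallowed** (returns an empty result) | Should report an error instead of failing silently |
+
+**Improvement plan**: do not swallow errors; let the error propagate upward
+
+### STORY
+| Aspect | Current state | Problem |
+|------|------|------|
+| Interruption resume | ✅ executor `.tmp` | ✅ |
+| Supplement | ❌ | ❌ |
+| Error correction | ❌ | ❌ |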
+
+---
+
+## Priorities
+
+### P0: Fix Immediately
+
+1. **Switch ASRX to executor.run()**
+   - File: `src/core/processor/asrx.rs`
+   - Gains `.tmp` protection, SIGKILL of the process group, and `.partial` retention
+   - Remove the hardcoded timeout
+
+### P1: Error Correction
+
+2. **Add JSON validation to the file-existence check**
+   - File: `src/worker/job_worker.rs`
+   - After `output_path.exists()`, run `serde_json::from_str::<Value>` on the `.json`
+   - If parsing fails → do not skip; treat the file as missing and run the job
+   - If parsing succeeds but the content is empty (no segments/frames) → treat it as incomplete
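The P1 check above is planned for the Rust worker; the same decision logic is sketched here in Python for illustration. `output_is_usable` is a hypothetical name, and treating `segments` and `frames` as the only content keys is an assumption based on the wording of the last bullet.

```python
import json
import os
import tempfile


def output_is_usable(path: str, content_keys=("segments", "frames")) -> bool:
    """True only if the file exists, parses as JSON, and has a non-empty content list."""
    if not os.path.exists(path):
        return False
    try:
        with open(path, "r", encoding="utf-8") as f:
            doc = json.load(f)
    except (json.JSONDecodeError, UnicodeDecodeError, OSError):
        return False  # corrupt or unreadable: treat as missing so the job re-runs
    if not isinstance(doc, dict):
        return False
    return any(isinstance(doc.get(k), list) and len(doc[k]) > 0 for k in content_keys)


with tempfile.TemporaryDirectory() as d:
    good = os.path.join(d, "good.asr.json")
    with open(good, "w", encoding="utf-8") as f:
        json.dump({"segments": [{"start": 0.0, "end": 1.2}]}, f)
    truncated = os.path.join(d, "truncated.yolo.json")
    with open(truncated, "w", encoding="utf-8") as f:
        f.write('{"frames": [{"frame": 1')  # simulates a writer killed mid-file
    print(output_is_usable(good), output_is_usable(truncated))
```

A job would be skipped only when this returns true; every other case falls through to a rerun.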
+
+### P2: Supplement
+
+3. **ASR resume-from supplement**
+   - Files: `src/core/processor/asr.rs` + `scripts/asr_processor.py`
+   - When the Rust side finds an existing `.partial`, it reads the end_time of the last segment
+   - Passes `--resume-from {time}` to the Python script
+   - The Python script skips the audio before `--resume-from`
+
+4. **YOLO/Face/Pose resume-frame supplement**
+   - Files: each processor.rs + the corresponding Python script
+   - Scan `.partial` for the last frame_number
+   - Pass `--resume-frame {frame}` to the Python script
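The two resume rules above amount to deriving one command-line flag from whatever the `.partial` file already contains. A minimal sketch follows, assuming the `.partial` file has already been parsed into a dict with a `segments` list (each with `end_time`) or a `frames` list (each with `frame_number`); those layouts and the helper name `resume_args` are assumptions, since the partial file format is not specified here.

```python
def resume_args(partial_doc: dict) -> list:
    """Derive resume flags from a parsed .partial document; an empty list means start over."""
    segments = partial_doc.get("segments") or []
    if segments:
        last_end = max(seg["end_time"] for seg in segments)
        return ["--resume-from", f"{last_end:.3f}"]
    frames = partial_doc.get("frames") or []
    if frames:
        last_frame = max(fr["frame_number"] for fr in frames)
        return ["--resume-frame", str(last_frame)]
    return []


print(resume_args({"segments": [{"end_time": 1.5}, {"end_time": 12.25}]}))
print(resume_args({"frames": [{"frame_number": 10}, {"frame_number": 99}]}))
```

Taking the maximum rather than the last list element keeps the result stable even if entries were written out of order.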
+
+### P3: Other
+
+5. **VisualChunk must not swallow errors**
+6. **Executor two-stage shutdown: SIGTERM → SIGKILL**
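The two-stage shutdown in item 6 can be illustrated with Python's `subprocess` module. This sketch stops a single child process only; the executor described above is Rust and also targets the whole process group, which is not shown. `stop_process` and the grace period are assumptions.

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL."""
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()


# Demo child that ignores SIGTERM, which forces the SIGKILL path.
child_code = (
    "import signal, time; "
    "signal.signal(signal.SIGTERM, signal.SIG_IGN); "
    "print('ready', flush=True); time.sleep(60)"
)
proc = subprocess.Popen([sys.executable, "-c", child_code], stdout=subprocess.PIPE)
proc.stdout.readline()  # wait until the child has installed its handler
exit_code = stop_process(proc, grace_seconds=0.5)
proc.stdout.close()
print("child exited with", exit_code)
```

The grace period gives a well-behaved script time to flush its `.partial` output before the hard kill.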
--- a/docs_v1.0/DESIGN/RELEASE_PHASES.md
+++ b/docs_v1.0/DESIGN/RELEASE_PHASES.md
@@ -0,0 +1,240 @@
+# Momentry Model: Phased Delivery
+
+## Core Architecture
+
+```
+Pipeline (training)
+  │  each processor emits .json
+  │  Rule 1/3 Ingestion → chunks + embeddings
+  ▼
+momentry model for {video}      ← one model per video
+  │  release/phase1/latest/
+  │  release/phase2/latest/
+  ▼
+momentry core (inference engine)  ← Rust API server
+  │  momentry_playground (dev)
+  │  momentry (production)
+  ▼
+Search / Query / Identity APIs
+```
+
+- **Pipeline** = training phase: video → processor output → chunks → embeddings
+- **Model** = the output package for each video (output_json + chunks + vectors)
+- **Engine** = momentry core, which consumes a model and serves the APIs (search, trace, identity)
+
+Each video can have multiple model versions; the naming leaves room for upgrades:
+
+| Model version | Qdrant Collection | Contents | Trigger |
+|-----------|------------------|------|---------|
+| `{uuid}_v1` | `momentry_dev_v1` | sentence chunk embedding (base) | ASR + ASRX + Rule 1 complete |
+| `{uuid}_v2` | `momentry_dev_v2` | full pipeline + 5W1H | everything complete |
+| `{uuid}_v3` | `momentry_dev_v3` | object identity + custom detector | v2 + object instance matching complete |
+
+Versions coexist and do not overwrite one another.
+
+## Phase Breakdown
+
+### Phase 1: Sentence Chunk Embedding (base model)
+
+**Trigger**: ASR + ASRX complete + Rule 1 Ingestion + vectorize complete
+
+**Deliverables**:
+- `{uuid}.asr.json`
+- `{uuid}.asrx.json`
+- chunks (chunk_type = 'sentence')
+- chunk_vectors (sentence embedding)
+
+**Purpose**: end users can run semantic search
+
+### Phase 2: Full Pipeline (v2 model)
+
+**Trigger**: all processors complete + Rule 3 Ingestion + 5W1H Agent
+
+**Deliverables**:
+- Everything from Phase 1
+- All `{uuid}.*.json` (cut, yolo, face, pose, ocr, ...)
+- chunks (chunk_type = 'cut', 'visual', 'trace', 'story')
+- chunk_vectors (summary embedding)
+- identities / identity_bindings / face_detections
+
+**Purpose**: full search + summaries + person identification
+
+---
+
+## Worker Pipeline
+
+```
+ASR complete → ASRX complete
+  ↓
+Rule 1 Ingestion (sentence chunks)
+  ↓
+vectorize_chunks (sentence embedding)
+  ↓
+📦 Phase 1 release  ───→ release/phase1/latest/  (base model)
+  ↓
+Remaining processors continue (yolo, face, pose, ocr, ...)
+  ↓
+Rule 3 Ingestion + 5W1H Agent
+  ↓
+📦 Phase 2 release  ───→ release/phase2/latest/  (full model)
+```
+
+## Output Directory Structure
+
+```
+release/
+├── phase1/
+│   ├── {version}_{timestamp}/
+│   │   ├── output_json/      ← all completed .json files
+│   │   ├── chunks.csv        ← sentence chunks
+│   │   ├── vectors.csv       ← sentence embeddings
+│   │   ├── schema.sql        ← chunks table DDL
+│   │   └── RELEASE_INFO.txt
+│   └── latest → {version}_{timestamp}
+│
+└── phase2/
+    ├── {version}_{timestamp}/
+    │   ├── output_json/      ← all .json files
+    │   ├── chunks.csv        ← all chunks
+    │   ├── vectors.csv       ← all embeddings
+    │   ├── identities.csv    ← person identities
+    │   ├── schema.sql        ← full schema
+    │   └── RELEASE_INFO.txt
+    └── latest → {version}_{timestamp}
+```
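The `latest → {version}_{timestamp}` entries above suggest a symlink that is repointed on every release. One hedged way to do that without ever leaving `latest` missing is sketched below; `publish_release` and the example release names are hypothetical, and the sketch assumes a POSIX filesystem.

```python
import os
import tempfile


def publish_release(phase_dir: str, release_name: str) -> str:
    """Create phase_dir/release_name and atomically repoint phase_dir/latest at it."""
    os.makedirs(os.path.join(phase_dir, release_name, "output_json"), exist_ok=True)
    latest = os.path.join(phase_dir, "latest")
    staging = latest + ".tmp"
    if os.path.lexists(staging):
        os.remove(staging)
    os.symlink(release_name, staging)  # relative target keeps the tree relocatable
    os.replace(staging, latest)        # rename over the old link in one step
    return os.path.realpath(latest)


with tempfile.TemporaryDirectory() as root:
    phase1 = os.path.join(root, "release", "phase1")
    publish_release(phase1, "v1_20260517T0226")
    publish_release(phase1, "v1_20260518T0100")
    print(os.readlink(os.path.join(phase1, "latest")))
```

Building the new link under a temporary name and renaming it means readers see either the old release or the new one, never a gap.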
+
+## momentry model vs momentry core
+
+| | momentry model | momentry core |
+|---|---|---|
+| Analogy | trained weights | inference engine |
+| Contents | `.json` + chunks + vectors | Rust binary |
+| Lifecycle | one produced per video | one binary serves all videos |
+| Versions | `{uuid}_v1` (base) / `{uuid}_v2` / `{uuid}_v3` | `momentry_playground` / `momentry` |
+| Delivered to | end users | deployment engineers |
+
+---
+
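+## Wiki Mechanism: Every Model Can Be Adjusted
+
+Each momentry model (`{uuid}_v1` / `v2` / `v3`) is not just a read-only artifact; it can be improved continuously through the wiki mechanism.
+
+### How It Differs from Traditional RAG
+
+| | Traditional RAG | momentry wiki |
+|---|---|---|
+| Knowledge storage | vector DB (ephemeral) | model package (permanent) |
+| How corrections are made | the LLM decides at query time whether to use them | users/Agents edit directly |
+| Persistence of corrections | ❌ gone by the next query | ✅ written into the model and versioned |
+| Model improvement | none (only the prompt changes) | merged as ground truth at the next version bump |
+| Collaboration | one-way (retrieve → generate) | two-way (edit → merge → improve) |
+| Offline use | ❌ needs vector DB + LLM | ✅ browse the wiki directory offline |
+
+**momentry wiki is not a replacement for RAG; it is the lifecycle management mechanism for a model.**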
+
+### Concept
+
+```
+momentry model (release package)
+  ├── output_json/          ← read-only, processor output
+  ├── chunks.csv            ← read-only, ingestion output
+  ├── vectors.csv           ← read-only, embedding output
+  └── wiki/                 ← editable, user-contributed knowledge
+        ├── identities.json   ← "trace 5 = Audrey Hepburn"
+        ├── objects.json      ← "object 42 = stamp #1"
+        ├── corrections.json  ← "ASR 'Hello' → 'Halo'"
+        └── changelog.json    ← edit history
+```
+
+### Data Flow
+
+```
+User/Agent edits the wiki
+        ↓
+  Written to DB wiki_entries + wiki_revisions
+        ↓
+  Merged into the model at the next release packaging
+        ↓
+  TKG label updated (tkg_nodes.label)
+        ↓
+  New model version bump
+```
+
+### Relationship to TKG
+
+Identity and object annotations from the wiki are written back to the TKG node labels:
+```
+(face_trace:5) label="Audrey Hepburn"     ← wiki edit
+(object_instance:42) label="stamp #1"      ← wiki edit
+```
+
+Once accumulated, these edits can serve as ground truth for training the next model version.
+
+### Implementation Direction
+
+**DB layer**: new tables `wiki_entries` + `wiki_revisions`:
+```sql
+wiki_entries (target_type, target_id, title, body, summary, status, version, file_uuid)
+wiki_revisions (entry_id, version, title, body, summary, change_summary, edited_by)
+```
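The two column lists above could behave as follows when an entry is edited. This is an illustrative sketch using in-memory SQLite instead of the project's PostgreSQL; the column types, the `id` key, the uniqueness constraint, and the `save_entry` helper are all assumptions layered on the column names given above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wiki_entries (
    id INTEGER PRIMARY KEY,
    target_type TEXT NOT NULL, target_id TEXT NOT NULL,
    title TEXT, body TEXT, summary TEXT,
    status TEXT DEFAULT 'active', version INTEGER NOT NULL DEFAULT 1,
    file_uuid TEXT,
    UNIQUE (file_uuid, target_type, target_id)
);
CREATE TABLE wiki_revisions (
    entry_id INTEGER NOT NULL REFERENCES wiki_entries(id),
    version INTEGER NOT NULL,
    title TEXT, body TEXT, summary TEXT, change_summary TEXT, edited_by TEXT,
    PRIMARY KEY (entry_id, version)
);
""")


def save_entry(file_uuid, target_type, target_id, title, body, edited_by, change_summary=""):
    """Insert or update an entry and append a matching row to wiki_revisions."""
    row = conn.execute(
        "SELECT id, version FROM wiki_entries "
        "WHERE file_uuid=? AND target_type=? AND target_id=?",
        (file_uuid, target_type, target_id)).fetchone()
    with conn:  # one transaction covers the entry and its revision
        if row is None:
            cur = conn.execute(
                "INSERT INTO wiki_entries (target_type, target_id, title, body, file_uuid) "
                "VALUES (?,?,?,?,?)",
                (target_type, target_id, title, body, file_uuid))
            entry_id, version = cur.lastrowid, 1
        else:
            entry_id, version = row[0], row[1] + 1
            conn.execute("UPDATE wiki_entries SET title=?, body=?, version=? WHERE id=?",
                         (title, body, version, entry_id))
        conn.execute(
            "INSERT INTO wiki_revisions (entry_id, version, title, body, change_summary, edited_by) "
            "VALUES (?,?,?,?,?,?)",
            (entry_id, version, title, body, change_summary, edited_by))
    return version


save_entry("a1b10138a6bbb0cd", "face_trace", "5", "Unknown", "", "agent")
print(save_entry("a1b10138a6bbb0cd", "face_trace", "5", "Audrey Hepburn", "", "warren", "label fix"))
```

`wiki_entries` always holds the current text, while `wiki_revisions` keeps one row per version so the history endpoint has something to return.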
+
+**API layer**: CRUD + revision history:
+```
+GET    /api/v1/wiki/{target_type}/{target_id}
+PUT    /api/v1/wiki/{target_type}/{target_id}
+GET    /api/v1/wiki/{target_type}/{target_id}/revisions
+POST   /api/v1/wiki/search
+```
+
+**Packaging layer**: `release_pack.py` adds a wiki export that ships alongside the model
+
+---
+
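+## Phase 3: Object Identity (v3 model)
+
+### Goal
+
+Extract key objects from the video (stamps, pistols, envelopes, magnifying glasses, ...) and track and recognize same-class objects across frames at the instance level, similar to face trace: not just detecting the class, but telling "this stamp" apart from "that stamp".
+
+### Current Problems
+
+1. **The 80 COCO classes do not include the key objects**: stamps, pistols, envelopes, magnifying glasses, and so on are not in the COCO dataset
+2. **YOLOv5nano has a low detection rate**: even for COCO classes (knife, cell phone), recall on the nano model is insufficient
+3. **No object instance matching**: currently there is only frame-level detection, with no cross-frame object tracking
+
+### Technical Direction
+
+```
+YOLOv8m/OWL-ViT → improve detection coverage
+        ↓
+ Object Tracker (IoU + embedding, similar to the face tracker)
+        ↓
+ object_trace → TKG CO_OCCURS_WITH edges
+        ↓
+ object identity → recognize the same object across scenes
+```
+
+| Direction | Method | Effect |
+|------|------|------|
+| Model upgrade | `yolov5nu` → `yolov8s.pt` / `yolov8m.pt` | Higher COCO recall |
+| Custom fine-tune | Collect stamp/gun data and fine-tune YOLO | Can detect non-COCO objects |
+| Zero-shot | OWL-ViT / Grounding DINO by text prompt | No training needed, but slow |
+| Object trace | IoU + embedding matching across frames | Instance-level tracking |
+| Object identity | Clustering to recognize the same object across scenes | Search the whole film for "this gun" |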
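The IoU half of the object trace row above can be sketched as a greedy frame-to-frame matcher. This is a simplified illustration: the embedding similarity term is omitted, and `iou`, `match_detections`, the `(x1, y1, x2, y2)` box layout, and the 0.3 threshold are assumptions rather than the face tracker's actual logic.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(tracks, detections, min_iou=0.3):
    """Greedily assign detections to tracks by descending IoU; return {track_id: detection_index}."""
    pairs = sorted(
        ((iou(box, det), tid, di)
         for tid, box in tracks.items()
         for di, det in enumerate(detections)),
        reverse=True,
    )
    assigned, used = {}, set()
    for score, tid, di in pairs:
        if score < min_iou:
            break
        if tid not in assigned and di not in used:
            assigned[tid] = di
            used.add(di)
    return assigned


tracks = {5: (100, 100, 200, 200), 42: (400, 120, 460, 180)}  # last known box per track
detections = [(405, 122, 465, 184), (110, 105, 210, 205), (700, 50, 740, 90)]
print(match_detections(tracks, detections))
```

Detections left unassigned (the third box here) would start new object traces; an embedding distance could be combined with the IoU score to survive cuts between scenes.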
+
+### TKG Integration
+
+```
+face_trace -[:CO_OCCURS_WITH]-> object_instance:5  (this gun)
+face_trace -[:CO_OCCURS_WITH]-> object_instance:42 (this stamp)
+
+Query: "frames where Audrey Hepburn holds this gun"
+→ face_trace:5 -[:SPEAKS_AS]-> SPEAKER_0
+→ face_trace:5 -[:CO_OCCURS_WITH]-> object_instance:5
+```
+
+### Delivery Order
+
+1. YOLO model upgrade (low difficulty, immediate payoff)
+2. Object tracker (medium difficulty; follow the face tracker implementation)
+3. Custom fine-tune / zero-shot (high difficulty; needs data or a new model)
--- a/docs_v1.0/DESIGN/TMDb_Identity_File_System_V1.0.md
+++ b/docs_v1.0/DESIGN/TMDb_Identity_File_System_V1.0.md
@@ -0,0 +1,361 @@
+---
+document_type: "design"
+service: "MOMENTRY_CORE"
+title: "TMDb Integration: Identity File System Design"
+date: "2026-05-16"
+version: "V1.0"
+status: "completed"
+owner: "M5"
+created_by: "OpenCode"
+tags:
+  - "tmdb"
+  - "identity"
+  - "cache"
+  - "file-system"
+  - "resource"
+  - "design"
+ai_query_hints:
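+  - "Look up the contents of the TMDb Identity file system design"
+  - "What are the three phases of the TMDb integration"
+  - "How to create TMDb identities from the cache"
+  - "Directory structure for file-based identities"
+  - "List of TMDb resource API endpoints"
+  - "Where TMDb face matching is integrated"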
+related_documents:
+  - "REFERENCE/Face_Pipeline.md"
+  - "REFERENCE/Trace_Structure.md"
+  - "REFERENCE/Demo_EndToEnd.md"
+  - "REFERENCE/Services_Inventory.md"
+---
+
+# TMDb Integration: Identity File System Design V1.0
+
+| Item | Value |
+|------|------|
+| Author | OpenCode |
+| Created | 2026-05-16 |
+| Document version | V1.0 |
+| Status | Completed |
+
+---
+
+## Version History
+
+| Version | Date | Purpose | Editor | Tool/Model |
+|------|------|------|--------|-----------|
+| V1.0 | 2026-05-16 | Three-phase TMDb integration design: file-based identities, Agent Cache, resource management | OpenCode | DeepSeek V4 Flash |
+
+---
+
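+## Overview
+
+Three plans are implemented in sequence to build a filesystem copy of each Identity and to integrate TMDb as an external resource:
+
+1. **Plan 1: File-based identities.** Every identity has a full backup at `{OUTPUT_DIR}/identities/{uuid}/identity.json`
+2. **Plan 2: TMDb Agent + Cache.** The single outbound connection point: fetch from the TMDb API → cache to `{file_uuid}.tmdb.json`
+3. **Plan 3: TMDb resource management.** Resource endpoint + health integration
+
+### Design Principles
+
+- **Local by default**: TMDb is the only service that needs an outbound connection and is treated as an optional plugin
+- **Cache-first**: the TMDb API is called only once; afterwards everything is read from the local cache
+- **Dual-write**: the DB and filesystem are kept consistent
+- **Filesystem as canonical snapshot**: the DB is the primary store, and the filesystem is a portable offline copy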
+
+---
+
+## Plan 1: File-Based Identities
+
+### Purpose
+
+Create a filesystem snapshot for every identity so that identity data is:
+- **Portable**: `cp -r identities/` to another machine is all it takes
+- **Inspectable**: `cat {uuid}/identity.json` shows the full identity data directly
+- **Easy to back up**: a tar of identities/ is a complete identity backup
+- **Available offline**: basic identity information is accessible without the DB
+
+### Directory Structure
+
+```
+{OUTPUT_DIR}/
+├── identities/
+│   ├── _index.json                             ← { uuid: name } index
+│   ├── a9a901056d6b46ff92da0c3c1a57dff4/
+│   │   └── identity.json                       ← V1: full identity information
+│   └── b0b101167e8c4a53a0.../
+│       └── identity.json
+└── {file_uuid}.tmdb.json                       ← V2: TMDb raw cache
+```
+
+### identity.json Format
+
+```json
+{
+  "version": 1,
+  "identity_uuid": "a9a901056d6b46ff92da0c3c1a57dff4",
+  "name": "Cary Grant",
+  "identity_type": "people",
+  "source": "tmdb",
+  "status": "confirmed",
+  "tmdb_id": 112,
+  "tmdb_profile": "https://image.tmdb.org/t/p/w185/abc.jpg",
+  "metadata": {
+    "tmdb_character": "Peter Joshua",
+    "tmdb_cast_order": 0,
+    "tmdb_movie_id": 4808
+  },
+  "file_bindings": [
+    {
+      "file_uuid": "3a6c1865...",
+      "trace_ids": [10, 23],
+      "face_count": 12
+    }
+  ],
+  "created_at": "2026-05-16T12:00:00Z",
+  "updated_at": "2026-05-16T12:30:00Z"
+}
+```
+
+### _index.json Format
+
+```json
+{
+  "version": 1,
+  "updated_at": "2026-05-16T12:00:00Z",
+  "entries": {
+    "a9a901056d6b46ff92da0c3c1a57dff4": "Cary Grant",
+    "b0b101167e8c4a53a09d6c2a68e0abf1": "Audrey Hepburn"
+  }
+}
+```
+
+### Write Strategy: Dual-write
+
+Any identity change → DB write → `save_identity_file()` → filesystem write
+
+```
+Where identity changes occur:
+├── TMDb probe (probe.rs)         → create_identities_from_data() → save_identity_file() per identity
+├── Face matching API (identity_agent_api.rs) → match_faces_iterative() → save_identity_file() per matched identity
+├── Face matching Worker P2.5 (job_worker.rs) → match_faces_against_tmdb() → save_identity_file() per affected identity
+├── Manual bind/unbind (identity_binding.rs) → bind/unbind handler → save_identity_file() per identity
+└── One-time migration (migrate_identity_files.py) → writes all identities to files
+```
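The filesystem half of the dual-write can be sketched in Python as follows. This is not the Rust `storage.rs` code: it assumes the identity record has already been loaded from the DB into a dict, and the atomic temp-file-then-rename step is an added assumption intended to keep a crash from leaving a half-written `identity.json`.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def _atomic_write_json(path: str, payload: dict) -> None:
    """Write JSON to a temp file in the same directory, then rename it into place."""
    directory = os.path.dirname(path)
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(payload, f, ensure_ascii=False, indent=2)
    os.replace(tmp_path, path)


def save_identity_file(output_dir: str, identity: dict) -> str:
    """Write identities/{uuid}/identity.json and refresh identities/_index.json."""
    root = os.path.join(output_dir, "identities")
    uuid = identity["identity_uuid"]
    path = os.path.join(root, uuid, "identity.json")
    _atomic_write_json(path, identity)

    index_path = os.path.join(root, "_index.json")
    index = {"version": 1, "entries": {}}
    if os.path.exists(index_path):
        with open(index_path, "r", encoding="utf-8") as f:
            index = json.load(f)
    index["entries"][uuid] = identity["name"]
    index["updated_at"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    _atomic_write_json(index_path, index)
    return path


with tempfile.TemporaryDirectory() as out:
    save_identity_file(out, {"version": 1, "name": "Cary Grant", "source": "tmdb",
                             "identity_uuid": "a9a901056d6b46ff92da0c3c1a57dff4"})
    with open(os.path.join(out, "identities", "_index.json"), encoding="utf-8") as f:
        print(json.load(f)["entries"])
```

Each hook listed above would call this once per affected identity, after the DB write has succeeded.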
+
+### API: `storage.rs`
+
+```rust
+// structs
+IdentityFile { version, identity_uuid, name, identity_type, source, status,
+               tmdb_id, tmdb_profile, metadata, file_bindings, created_at, updated_at }
+FileBinding { file_uuid, trace_ids, face_count }
+
+// core functions
+identity_dir(uuid: &str) -> PathBuf
+read_identity_file(uuid: &str) -> Result<IdentityFile>
+write_identity_file(file: &IdentityFile) -> Result<()>
+list_identity_uuids() -> Result<Vec<String>>
+count_identity_files() -> usize
+
+// index
+read_index() -> Result<HashMap<String, String>>
+update_index(uuid: &str, name: &str) -> Result<()>
+
+// dual-write hook
+async fn save_identity_file(db: &PostgresDb, uuid: &str) -> Result<()>
+  // 1. Query the DB for the full identity data
+  // 2. Query the DB for file_bindings
+  // 3. Write identity.json
+  // 4. Update _index.json
+```
+
+### Change List
+
+| # | File | Type | Description |
+|---|------|------|------|
+| 1.1 | `src/core/identity/storage.rs` | NEW | IdentityFile struct + CRUD + index + save_identity_file() |
+| 1.2 | `src/core/identity/mod.rs` | NEW | module declaration |
+| 1.3 | `src/core/mod.rs` | EDIT | `pub mod identity;` |
+| 1.4 | `src/core/db/postgres_db.rs` | EDIT | `get_identity_file_bindings(uuid)` helper |
+| 1.5 | `src/core/tmdb/probe.rs` | EDIT | hook: save_identity_file() |
+| 1.6 | `src/api/identity_binding.rs` | EDIT | hook: bind/unbind |
+| 1.7 | `src/api/identity_agent_api.rs` | EDIT | hook: match_faces_iterative |
+| 1.8 | `src/worker/job_worker.rs` | EDIT | hook: P2.5 matching |
+| 1.9 | `src/api/server.rs` | EDIT | health/detailed: identities section |
+| 1.10 | `scripts/migrate_identity_files.py` | NEW | one-time migration DB→filesystem |
+
+---
+
+## Plan 2: TMDb Agent + Cache
+
+### Purpose
+
+Make TMDb the single outbound connection point, backed by a local cache, so that identity enrichment can run fully offline.
+
+### Directory layout
+
+```
+{OUTPUT_DIR}/
+├── {file_uuid}.tmdb.json       ← TMDb raw cache (file-level)
+├── identities/{uuid}/
+│   └── identity.json            ← Processed identity (identity-level)
+```
+
+### Cache format (`{file_uuid}.tmdb.json`)
+
+```json
+{
+  "file_uuid": "3a6c1865...",
+  "fetched_at": "2026-05-16T12:00:00Z",
+  "source": "agent",
+  "movie": {
+    "tmdb_id": 4808,
+    "title": "Charade",
+    "release_date": "1963-12-05",
+    "overview": "After Regina Lampert...",
+    "poster_path": "/8wvQp...jpg"
+  },
+  "cast": [
+    {
+      "name": "Cary Grant",
+      "character": "Peter Joshua",
+      "profile_path": "/abc123.jpg",
+      "order": 0
+    }
+  ],
+  "cast_count": 20,
+  "identities_created": 0
+}
+```
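+
+A reader for this cache format could look like the sketch below. It is written in Python for illustration; the actual reader is planned as `cache::read_tmdb_cache` in Rust. The function name `load_tmdb_cache`, the set of required keys, and the decision to sort the cast by `order` are assumptions drawn from the example document above, not from a published schema.
+
+```python
+import json
+
+# Keys the example cache document always carries (assumed to be mandatory).
+REQUIRED_KEYS = ("file_uuid", "fetched_at", "movie", "cast")
+
+
+def load_tmdb_cache(text: str) -> dict:
+    """Parse a cache document and fail early if a top-level key is missing."""
+    cache = json.loads(text)
+    missing = [key for key in REQUIRED_KEYS if key not in cache]
+    if missing:
+        raise ValueError(f"tmdb cache is missing keys: {missing}")
+    # Keep cast members in credit order so identities are created deterministically.
+    cache["cast"] = sorted(cache["cast"], key=lambda member: member.get("order", 0))
+    return cache
+```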
+
+### Flow
+
+```
+Step 1: POST /agents/tmdb/prefetch
+  → tmdb_agent.py (the only outbound call) → TMDB API search → credits
+  → writes {file_uuid}.tmdb.json  (source: agent)
+
+Step 2: POST /file/:file_uuid/tmdb-probe
+  → probe_from_cache() reads {file_uuid}.tmdb.json
+  → INSERT identities (source='tmdb')
+  → spawn tmdb_embed_extractor.py (background)
+  → save_identity_file() for each identity (Plan 1 hook)
+
+Step 3: POST /agents/identity/analyze (existing endpoint)
+  → match_faces_iterative() automatically includes TMDb identities
+```
+
+### probe.rs refactor
+
+```rust
+// New: reads from the cache
+pub async fn probe_from_cache(db, file_uuid) -> Result<TmdbProbeResult> {
+    let cache = cache::read_tmdb_cache(file_uuid)?;
+    create_identities_from_data(db, file_uuid, &cache.movie, &cache.cast).await
+}
+
+// Shared internal function (extracted from probe_movie)
+async fn create_identities_from_data(db, file_uuid, movie, cast) -> Result<TmdbProbeResult> {
+    // The INSERT + embed spawn + store logic that used to live in probe_movie
+    // Ends by calling save_identity_file() per identity
+}
+
+// Kept as a fallback (direct API call)
+pub async fn probe_movie(db, filename, file_uuid) -> Result<...> {
+    let movie_name = extract_movie_name(filename)?;
+    // search TMDB API → credits
+    // Optionally write the cache for next time
+    create_identities_from_data(db, file_uuid, &movie, &cast).await
+}
+```
+
+### Change list
+
+| # | File | Type | Content |
+|---|------|------|------|
+| 2.1 | `src/core/tmdb/cache.rs` | NEW | TmdbCache struct + read/write |
+| 2.2 | `src/core/tmdb/mod.rs` | EDIT | `pub mod cache;` `pub mod status;` |
+| 2.3 | `src/core/tmdb/probe.rs` | EDIT | refactor: probe_from_cache() + create_identities_from_data() |
+| 2.4 | `scripts/tmdb_agent.py` | NEW | fetch TMDB API → cache tmdb.json |
+| 2.5 | `src/api/tmdb_api.rs` | NEW | 3 routes + 3 handlers for Plan 2 (step 3.2 adds the 2 resource routes, 5 in total) |
+| 2.6 | `src/api/server.rs` | EDIT | `.merge(tmdb_routes())` |
+
+---
+
+## Plan 3: TMDb as a managed resource
+
+### Purpose
+
+Bring TMDb under system monitoring and management as a managed resource.
+
+### health/detailed extension
+
+```json
+{
+  "integrations": {
+    "tmdb": {
+      "api_key_configured": true,
+      "enabled": true,
+      "api_reachable": true,
+      "api_latency_ms": 120,
+      "api_error": null,
+      "last_check_at": "2026-05-16T12:00:00Z"
+    }
+  },
+  "identities": {
+    "directory_exists": true,
+    "files_count": 3481,
+    "index_ok": true,
+    "db_count": 3481,
+    "synced": true
+  }
+}
+```
+
+### API
+
+| Method | Path | Description |
+|--------|------|------|
+| `GET` | `/api/v1/resource/tmdb` | Full TMDb status + stats + cache count |
+| `POST` | `/api/v1/resource/tmdb/check` | Ping the TMDb API → update health status |
+
+### Change list
+
+| # | File | Type | Content |
+|---|------|------|------|
+| 3.1 | `src/core/tmdb/status.rs` | NEW | check_tmdb_api(), count_tmdb_identities(), count_cache_files() |
+| 3.2 | `src/api/tmdb_api.rs` | EDIT | GET/POST resource endpoints |
+| 3.3 | `src/api/server.rs` | EDIT | integrations in health/detailed |
+
+---
+
+## Full API table (Plans 2 and 3)
+
+| Method | Path | Handler | Plan | Description |
+|--------|------|---------|------|-------------|
+| `POST` | `/api/v1/agents/tmdb/prefetch` | `prefetch_tmdb` | 2 | agent fetch TMDB → cache |
+| `POST` | `/api/v1/file/:file_uuid/tmdb-probe` | `tmdb_probe` | 2 | cache → identities |
+| `GET` | `/api/v1/file/:file_uuid/tmdb-cache` | `tmdb_cache_view` | 2 | view raw cache |
+| `GET` | `/api/v1/resource/tmdb` | `tmdb_resource_status` | 3 | full TMDb status |
+| `POST` | `/api/v1/resource/tmdb/check` | `tmdb_resource_check` | 3 | ping health check |
+
+## Migration
+
+One-time script: `scripts/migrate_identity_files.py`
+
+```bash
+python3 scripts/migrate_identity_files.py
+# → reads the DB identities table → writes identity files → builds the index
+```
+
+---
+
+## Execution order
+
+```
+Plan 1 (identity files)   →  Plan 2 (TMDb agent)  →  Plan 3 (TMDb management)
+    1.1 → 1.2 → 1.3 →         2.1 → 2.2 → 2.3 →       3.1 → 3.2 → 3.3
+    1.4 → 1.5 → 1.6 →         2.4 → 2.5 → 2.6
+    1.7 → 1.8 → 1.9 →
+    1.10
+```
--- a/docs_v1.0/DESIGN/TRACE_SEARCH_API_DESIGN.md
+++ b/docs_v1.0/DESIGN/TRACE_SEARCH_API_DESIGN.md
@@ -0,0 +1,101 @@
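+# Trace Search API Design
+
+## Concept
+
+A trace is a kind of chunk.
+
+Existing chunk_type values: `cut`, `sentence`, `visual`, `story`
+New chunk_type: `trace`
+
+Each trace (the tracked path of one person across frames) is a time interval plus the ASR text spoken within it.
+It behaves exactly like every other chunk; only the segmentation dimension differs:
+- cut chunk = shot change
+- sentence chunk = sentence boundary
+- visual chunk = combination of objects in the frame
+- **trace chunk = interval in which a person appears + the text spoken during it**
+
+Traces can therefore go straight into the existing `chunks` table and share the whole embedding, search, and Qdrant sync pipeline, with no new table required.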
+
+## Existing structure of the chunks table
+
+```sql
+chunks (
+    id, file_uuid, chunk_type,                   -- new value: 'trace'
+    start_frame, end_frame, start_time, end_time,
+    text_content,                                 -- ASR text within the trace interval
+    embedding,                                    -- pgvector embedding of text_content
+    metadata JSONB,                               -- { trace_id, face_count, identity_id, identity_name }
+    ...
+)
+```
+
+## Data generation flow (worker extension)
+
+After face processing and `store_traced_faces.py` have finished:
+
+1. Query `face_detections` and aggregate `MIN(frame)`, `MAX(frame)`, and `COUNT(*)` per trace
+2. For each trace, query `pre_chunks WHERE processor_type='asr'` for text that overlaps the trace's time range
+3. Concatenate the text and generate the `embedding` with EmbeddingGemma
+4. Write to `chunks` (`chunk_type='trace'`) with `trace_id`, `face_count`, and `identity_id` in metadata
+5. The embedding is synced to Qdrant automatically (same collection as the existing chunks)
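+
+Steps 1, 2, and 4 above can be sketched as a pure function. This is a minimal Python sketch under stated assumptions: detections arrive as `(trace_id, frame)` pairs, ASR segments as `(start_seconds, end_seconds, text)` tuples, and frames convert to seconds through a constant `fps`. The function name `build_trace_chunks` is hypothetical, and the embedding step is left out.
+
+```python
+from collections import defaultdict
+
+
+def build_trace_chunks(detections, asr_segments, fps):
+    """Group face detections by trace and attach the overlapping ASR text."""
+    frames = defaultdict(list)
+    for trace_id, frame in detections:
+        frames[trace_id].append(frame)
+
+    chunks = []
+    for trace_id, trace_frames in sorted(frames.items()):
+        start_frame, end_frame = min(trace_frames), max(trace_frames)
+        start_time, end_time = start_frame / fps, end_frame / fps
+        # Two intervals overlap when each one starts before the other ends.
+        texts = [text for (seg_start, seg_end, text) in asr_segments
+                 if seg_start < end_time and seg_end > start_time]
+        chunks.append({
+            "chunk_type": "trace",
+            "start_frame": start_frame, "end_frame": end_frame,
+            "start_time": start_time, "end_time": end_time,
+            "text_content": " ".join(texts),
+            "metadata": {"trace_id": trace_id, "face_count": len(trace_frames)},
+        })
+    return chunks
+```
+
+A trace with no overlapping speech gets an empty `text_content`; whether such rows should be stored at all is a decision this sketch leaves open.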
+
+## Search API extension
+
+Universal Search already supports `"chunk"` in `types`.
+Filtering the chunk search on `chunk_type = 'trace'` is all that is needed.
+
+**Request**:
+```json
+{
+  "query": "open the door",
+  "types": ["chunk"],
+  "filters": { "chunk_type": "trace" },
+  "uuid": "aeed71342a899fe4b4c57b7d41bcb692",
+  "page": 1,
+  "page_size": 20
+}
+```
+
+**Response** (same shape as the existing Chunk result):
+```json
+{
+  "type": "chunk",
+  "chunk_id": "chunk_42",
+  "chunk_type": "trace",
+  "start_frame": 45200, "end_frame": 45900,
+  "start_time": 1808.0, "end_time": 1836.0,
+  "score": 0.87,
+  "text": "Open the door. Come on, hurry up.",
+  "metadata": {
+    "trace_id": 5,
+    "face_count": 42,
+    "identity_name": "Audrey Hepburn"
+  }
+}
+```
+
+This reuses the existing `SearchResult::Chunk` variant as-is; no new enum variant is needed.
+
+### Search query
+
+```sql
+SELECT c.*
+FROM dev.chunks c
+WHERE c.file_uuid = $1
+  AND c.chunk_type = 'trace'
+  AND c.embedding IS NOT NULL
+ORDER BY c.embedding <=> $2
+LIMIT $3;
+```
+
+## Summary
+
+| Item | Approach |
+|------|------|
+| New table | ❌ Not needed |
+| New enum variant | ❌ Not needed |
+| SearchResult changes | ❌ Not needed |
+| New chunk_type | ✅ `'trace'` |
+| Worker extension | ✅ Generate trace chunks (after face processing is done) |
+| SearchFilters extension | ✅ Add a `chunk_type` filter |
+| Qdrant | ✅ Automatic (existing chunk collection) |
--- a/docs_v1.0/DESIGN/VIDEO_PROCESSING_SPEC.md
+++ b/docs_v1.0/DESIGN/VIDEO_PROCESSING_SPEC.md
--- a/docs_v1.0/DESIGN/VIDEO_REGISTRATION.md
+++ b/docs_v1.0/DESIGN/VIDEO_REGISTRATION.md
@@ -0,0 +1,264 @@
+---
+document_type: "reference_doc"
+service: "MOMENTRY_CORE"
+title: "Video Registration"
+date: "2026-03-25"
+version: "V1.0"
+status: "active"
+owner: "Warren"
+created_by: "OpenCode"
+tags:
+  - "video"
+  - "registration"
+ai_query_hints:
+  - "查詢 Video Registration 的內容"
+  - "Video Registration 的主要目的是什麼？"
+  - "如何操作或實施 Video Registration？"
+---
+
+# Video Registration
+
+| Item | Value |
+|------|------|
+| Author | Warren |
+| Created | 2026-03-25 |
+| Document version | V1.1 |
+
+---
+
+## Version history
+
+| Version | Date | Purpose | Operator | Tool/Model |
+|------|------|------|--------|-----------|
+| V1.0 | 2026-03-25 | Document created | Warren | OpenCode |
+| V1.1 | 2026-03-26 | Fixed the curl examples; added the API key authentication header | OpenCode | deepseek-reasoner |
+
+---
+
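+## Overview
+
+The video registration API (`POST /api/v1/register`) adds a video to the Momentry Core system for processing.
+
+## Path format
+
+### Supported path formats
+
+| Format | Example | Notes |
+|------|------|------|
+| Relative path | `./demo/video.mp4` | Recommended |
+| Relative path (no `./`) | `demo/video.mp4` | `./` is prepended automatically |
+| Absolute path | `/Users/.../sftpgo/data/demo/video.mp4` | Supported but not recommended |
+
+### Path structure
+
+```
+./username/filepath
+│    │       │
+│    │       └── File path (may span multiple directory levels)
+│    └── Username (name of the SFTPgo user directory)
+└── Relative path prefix
+```
+
+**Examples**: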
+- `./demo/video.mp4` → username=`demo`, filepath=`video.mp4`
+- `./demo/movies/2024/video.mp4` → username=`demo`, filepath=`movies/2024/video.mp4`
+- `./warren/project1/interview.mp4` → username=`warren`, filepath=`project1/interview.mp4`
+
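+## UUID calculation
+
+### Rule
+
+```
+UUID = SHA256(username/filepath)[0:16]
+```
+
+**Example**:
+```rust
+// Path: ./demo/video.mp4
+// username: "demo"
+// filepath: "video.mp4"
+// key: "demo/video.mp4"
+// UUID: SHA256("demo/video.mp4")[0:16]
+```
+
+### Properties
+
+| Property | Description |
+|------|------|
+| User isolation | The same file name under different users yields different UUIDs |
+| Consistency | The same relative path always yields the same UUID |
+| Migration safety | The UUID stays the same when the SFTPgo data path changes, because only the relative path is hashed |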
+
+### Examples
+
+```rust
+// Video belonging to user demo
+compute_uuid_from_relative_path("./demo/video.mp4")
+// → "9760d0820f0cf9a7"
+
+// Same file name under user warren
+compute_uuid_from_relative_path("./warren/video.mp4")
+// → a different 16-character hex UUID (actual value not shown here)
+```
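+
+The rule can be checked outside the Rust codebase with a short Python sketch. It assumes that `[0:16]` means the first 16 characters of the lowercase hex digest, which matches the 16-character UUIDs shown in this document but is not stated explicitly; `split_relative_path` is an illustrative helper name, not the name used in `src/core/storage/uuid.rs`.
+
+```python
+import hashlib
+
+
+def split_relative_path(relative_path: str) -> tuple:
+    """'./demo/movies/a.mp4' -> ('demo', 'movies/a.mp4'); the './' prefix is optional."""
+    path = relative_path[2:] if relative_path.startswith("./") else relative_path
+    username, _, filepath = path.partition("/")
+    return username, filepath
+
+
+def compute_uuid_from_relative_path(relative_path: str) -> str:
+    username, filepath = split_relative_path(relative_path)
+    key = f"{username}/{filepath}"
+    # Assumption: the UUID is the first 16 hex characters of the SHA-256 digest.
+    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
+```
+
+Because the hash input never includes the SFTPgo data root, `./demo/video.mp4` and `demo/video.mp4` map to the same UUID, while the same file name under another user does not.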
+
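+## Duplicate registration check
+
+### Behavior
+
+1. The system checks whether the UUID already exists in the database
+2. If it does, the API returns `already_exists: true` together with the existing video's details
+3. If it does not, a new video record is created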
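+
+The three steps amount to an idempotent insert keyed by UUID. The Python sketch below stands in for the database with a plain dictionary; the `register` signature and the way `video_id` is assigned are hypothetical and only illustrate the control flow.
+
+```python
+def register(videos: dict, uuid: str, file_name: str) -> dict:
+    """videos maps uuid -> stored record and stands in for the videos table."""
+    existing = videos.get(uuid)
+    if existing is not None:
+        # Step 2: return the stored record unchanged, flagged as a duplicate.
+        return {**existing, "already_exists": True}
+    # Step 3: create a new record (id assignment here is purely illustrative).
+    record = {"uuid": uuid, "video_id": len(videos) + 1, "file_name": file_name}
+    videos[uuid] = record
+    return {**record, "already_exists": False}
+```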
+
+### API response
+
+**New registration**:
+```json
+{
+  "uuid": "9760d0820f0cf9a7",
+  "video_id": 18,
+  "job_id": 2,
+  "file_name": "video.mp4",
+  "duration": 159.637188,
+  "width": 640,
+  "height": 360,
+  "already_exists": false
+}
+```
+
+**Duplicate registration**:
+```json
+{
+  "uuid": "9760d0820f0cf9a7",
+  "video_id": 18,
+  "job_id": 2,
+  "file_name": "video.mp4",
+  "duration": 159.637188,
+  "width": 640,
+  "height": 360,
+  "already_exists": true
+}
+```
+
+## SFTPgo integration
+
+### Directory structure
+
+SFTPgo's user directory layout:
+
+```
+/Users/accusys/momentry/var/sftpgo/data/
+├── demo/                    ← user directory
+│   ├── video.mp4
+│   └── movies/
+│       └── movie1.mp4
+├── warren/                  ← user directory
+│   └── project1/
+│       └── interview.mp4
+└── momentry/                ← user directory
+    └── presentation.mp4
+```
+
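+### Registration flow
+
+1. SFTPgo users upload files to their own directories
+2. n8n or another service calls the registration API
+3. The call uses the relative path format: `./username/filepath`
+4. The system computes the UUID and checks for duplicates
+5. A processing job is created
+
+## Code examples
+
+### Register a video
+
+```bash
+# Register using a relative path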
+curl -X POST http://localhost:3002/api/v1/register \
+  -H "Content-Type: application/json" \
+  -H "X-API-Key: YOUR_API_KEY" \
+  -d '{"path": "./demo/video.mp4"}'
+
+# Or with nested directories
+curl -X POST http://localhost:3002/api/v1/register \
+  -H "Content-Type: application/json" \
+  -H "X-API-Key: YOUR_API_KEY" \
+  -d '{"path": "./demo/movies/2024/video.mp4"}'
+```
+
+### UUID calculation functions
+
+```rust
+use std::path::PathBuf;
+
+// Compute the UUID from a relative path
+pub fn compute_uuid_from_relative_path(relative_path: &str) -> String {
+    let (username, filepath) = extract_user_from_relative_path(relative_path);
+    compute_uuid(&username, &filepath)
+}
+
+// Extract the username and file path from a relative path
+pub fn extract_user_from_relative_path(relative_path: &str) -> (String, String) {
+    let path = relative_path.strip_prefix("./").unwrap_or(relative_path);
+    let path_buf = PathBuf::from(path);
+    
+    let mut components = path_buf.components();
+    let username = components
+        .next()
+        .map(|c| c.as_os_str().to_string_lossy().to_string())
+        .unwrap_or_default();
+    
+    let filepath: String = components
+        .map(|c| c.as_os_str().to_string_lossy().to_string())
+        .collect::<Vec<_>>()
+        .join("/");
+    
+    (username, filepath)
+}
+```
+
+## Related APIs
+
+### Probe API (probe only, no registration)
+
+To get video information without registering the video, use the Probe API:
+
+```bash
+curl -X POST http://localhost:3002/api/v1/probe \
+  -H "Content-Type: application/json" \
+  -H "X-API-Key: YOUR_API_KEY" \
+  -d '{"path": "./demo/video.mp4"}'
+```
+
+**Example response**:
+```json
+{
+  "uuid": "a1b10138a6bbb0cd",
+  "file_name": "video.mp4",
+  "duration": 120.5,
+  "width": 1920,
+  "height": 1080,
+  "fps": 30.0,
+  "cached": false,
+  "format": {...},
+  "streams": [...]
+}
+```
+
+**Differences from the Register API**:
+
+| Capability | Probe API | Register API |
+|------|-----------|---------------|
+| Compute UUID | ✓ | ✓ |
+| Run ffprobe | ✓ | ✓ |
+| Save probe.json | ✓ | ✓ |
+| Write to the videos table | ✗ | ✓ |
+| Create monitor_job | ✗ | ✓ |
+| Return job_id | ✗ | ✓ |
+| Use case | Preview video information | Register and process the video |
+
+## Related files
+
+| File | Description |
+|------|------|
+| `src/core/storage/uuid.rs` | UUID calculation logic |
+| `src/api/server.rs` | Register and Probe API implementation |
+| `src/core/probe/ffprobe.rs` | ffprobe integration |
+| `docs_v1.0/IMPLEMENTATION/SFTPGO_DEMO_USER.md` | SFTPgo user setup |
+| `docs_v1.0/REFERENCE/API_ENDPOINTS.md` | API endpoint overview |
--- a/docs_v1.0/DESIGN/VISION_AGENT_API.md
+++ b/docs_v1.0/DESIGN/VISION_AGENT_API.md
@@ -0,0 +1,201 @@
+# Momentry Eye API Reference
+
+**Vision Agent** — Multi-model zero-shot object detection service.
+Port: `5052` | Resource IDs: `eye-gdino`, `eye-paligemma`
+
+---
+
+## Models
+
+| Model | ID | Params | Size | Confidence | Speed | License |
+|-------|-----|--------|------|------------|-------|---------|
+| Grounding DINO | `grounding-dino` | 232M | 891MB | ✅ 0-1 score | ~340ms | Apache 2.0 |
+| PaliGemma 3B | `paligemma` | 2,923M | ~3GB | ❌ no score | ~80ms | Gemma license |
+
+## Endpoints
+
+### `GET /health`
+
+System status and loaded models.
+
+```bash
+curl localhost:5052/health
+```
+
+Response:
+```json
+{
+  "status": "ok",
+  "models_loaded": ["grounding-dino"],
+  "models_available": ["grounding-dino", "paligemma"],
+  "device": "mps",
+  "port": 5052
+}
+```
+
+### `GET /models`
+
+List available models with specs.
+
+```bash
+curl localhost:5052/models
+```
+
+### `POST /detect`
+
+Detect objects in a single video frame.
+
+```bash
+curl localhost:5052/detect \
+  -H "Content-Type: application/json" \
+  -d '{"time":5461, "prompt":"gun", "model":"grounding-dino"}'
+```
+
+**Parameters:**
+
+| Param | Type | Default | Description |
+|-------|------|---------|-------------|
+| `uuid` | string | `aeed71342a...` | Video file UUID |
+| `time` | float | `0` | Timestamp in seconds |
+| `prompt` | string | `"gun"` | Object to detect |
+| `model` | string | `"grounding-dino"` | Model: `grounding-dino`, `paligemma`, or `fusion` |
+| `threshold` | float | `0.1` | Minimum confidence (GDINO only) |
+| `weights` | object | — | Fusion weights, e.g. `{"grounding-dino":0.6,"paligemma":0.4}` |
+
+**Fusion mode** runs both models and combines results with weighted scoring. Default weights: GDINO 0.6, PaliGemma 0.4.
+
+```bash
+# Fusion: run both models, combine results
+curl localhost:5052/detect \
+  -d '{"time":206, "prompt":"water gun", "model":"fusion"}'
+
+# Custom fusion weights
+curl localhost:5052/detect \
+  -d '{"time":206, "prompt":"gun", "model":"fusion",
+       "weights":{"grounding-dino":0.5,"paligemma":0.5}}'
+```
+
+**Response:**
+
+```json
+{
+  "model": "grounding-dino",
+  "detections": [
+    {"bbox": [726.2, 567.4, 969.0, 694.6], "score": 0.476, "label": "gun"},
+    {"bbox": [686.7, 567.0, 969.6, 918.3], "score": 0.262, "label": "gun"}
+  ],
+  "time_ms": 345.2,
+  "n_detections": 2,
+  "shot_url": "/shots/aeed7134_5461s_gun_grounding-dino.jpg"
+}
+```
+
+**Fusion response** also includes `per_model` (detections per model) and `fusion` (deduplicated combined list with `fused_score`).
+
+### `POST /search`
+
+Search across a time range.
+
+```bash
+# Natural language query
+curl localhost:5052/search \
+  -d '{"query":"find the gun", "range":"5400-5600", "interval":10}'
+```
+
+**Parameters:**
+
+| Param | Type | Default | Description |
+|-------|------|---------|-------------|
+| `query` | string | `"find the gun"` | Natural language query (parsed to extract object) |
+| `target` | string | — | `file_uuid:chunk_id` or `file_uuid:trace_id` — resolves to time range |
+| `range` | string | `"0-6780"` | Manual time range |
+| `interval` | int | `30` | Scan interval in seconds |
+| `model` | string | `"grounding-dino"` | Detection model |
+| `threshold` | float | `0.15` | Minimum confidence |
+
+**Target resolution:**
+
+| Format | Example | Resolves to |
+|--------|---------|-------------|
+| `file_uuid:chunk_id` | `uuid:uuid_story_90` | Chunk's time range |
+| `file_uuid:trace_id` | `uuid:trace_5` | Trace's time range |
+| `file_uuid:chunk_index` | `uuid:500` | Chunk index 500's range |
+
+```bash
+# Using target
+curl localhost:5052/search \
+  -d '{"target":"aeed71342...:aeed71342..._story_90", "query":"gun"}'
+
+# Using trace
+curl localhost:5052/search \
+  -d '{"target":"aeed71342...:trace_5", "query":"person"}'
+```
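+
+The three target formats in the table can be told apart by the shape of the part after the colon. The sketch below is one possible parser, not the agent's actual code: the function name `parse_target`, the returned dictionary shape, and the rule that a purely numeric reference is a chunk index are assumptions based on the examples above.
+
+```python
+def parse_target(target: str) -> dict:
+    """Split 'file_uuid:reference' and classify the reference."""
+    file_uuid, sep, ref = target.partition(":")
+    if not sep or not file_uuid or not ref:
+        raise ValueError(f"expected 'file_uuid:reference', got {target!r}")
+    if ref.isdigit():
+        # e.g. 'uuid:500' -> chunk index 500
+        return {"file_uuid": file_uuid, "kind": "chunk_index", "value": int(ref)}
+    if ref.startswith("trace_") and ref[len("trace_"):].isdigit():
+        # e.g. 'uuid:trace_5' -> trace 5
+        return {"file_uuid": file_uuid, "kind": "trace_id", "value": int(ref[len("trace_"):])}
+    # Anything else is treated as a chunk id such as 'uuid_story_90'.
+    return {"file_uuid": file_uuid, "kind": "chunk_id", "value": ref}
+```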
+
+### `POST /multimodal`
+
+Multi-modal search across sentence chunks — combines ASR text match + visual confirmation.
+
+```bash
+# Search for Jean-Louis: ASR match + GDINO child detection
+curl localhost:5052/multimodal \
+  -d '{"keyword":"Jean-Louis", "prompt":"child"}'
+
+# Search trace chunks visually (no ASR)
+curl localhost:5052/multimodal \
+  -d '{"keyword":"", "prompt":"person", "chunk_type":"trace", "range":"3500-4000"}'
+```
+
+**Parameters:**
+
+| Param | Type | Default | Description |
+|-------|------|---------|-------------|
+| `keyword` | string | — | ASR keyword to search in sentence text |
+| `prompt` | string | same as keyword | Visual prompt for GDINO |
+| `chunk_type` | string | `"sentence"` | `sentence`, `trace`, `story`, `cut` |
+| `target` | string | — | Specific chunk target |
+| `range` | string | `"0-6780"` | Time range (for non-sentence chunks) |
+| `threshold` | float | `0.15` | Visual detection threshold |
+
+### `GET /shots/<filename>`
+
+Retrieve annotated detection images.
+
+```bash
+curl -o result.jpg localhost:5052/shots/aeed7134_5461s_gun_grounding-dino.jpg
+```
+
+## Object Detection Performance Summary
+
+| Object type | Size in frame | GDINO | PaliGemma | Best prompt |
+|-------------|--------------|-------|-----------|-------------|
+| Gun (realistic) | 15-30% | ✅ 0.36-0.67 | ✅ | `pistol` / `handgun` |
+| Water gun (toy) | 15-31% | ❌ 0 | ✅ | `water gun` (PaliGemma) |
+| Child (Jean-Louis) | 30-60% | ⚠️ 0.3-0.9 | ❌ | `child` (high FP on adults) |
+| Stamp | <5% | ❌ FP | ❌ | — |
+| Passport | <10% | ❌ FP | ❌ | — |
+| Magnifying glass | <5% | ❌ FP | ❌ | — |
+| Cup / Bottle | 5-15% | ✅ 0.3-0.5 | — | `cup` / `bottle` |
+| Cell phone | 5-10% | ✅ 0.3-0.5 | — | `cell phone` |
+
+## Resource Registration
+
+On startup, the agent auto-registers as resources in `dev.resources`:
+
+| Resource ID | Type | Status |
+|-------------|------|--------|
+| `eye-gdino` | `vision_model` | `online` |
+| `eye-paligemma` | `vision_model` | `online` |
+
+Heartbeat updates every 60 seconds. Discover via:
+
+```sql
+SELECT * FROM dev.resources WHERE resource_type = 'vision_model';
+```
+
+## Files
+
+| File | Description |
+|------|-------------|
+| `scripts/vision_agent.py` | Vision Agent server (port 5052) |
+| `output_dev/vision_shots/` | Annotated detection screenshots |
+| `docs_v1.0/DESIGN/ZERO_SHOT_DETECTION_RESEARCH.md` | Full model research report |
--- a/docs_v1.0/DESIGN/VISUALIZATION_TOOL_CHOICES_V1.0.0.md
+++ b/docs_v1.0/DESIGN/VISUALIZATION_TOOL_CHOICES_V1.0.0.md
@@ -0,0 +1,105 @@
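+# Visualization Tool Choices v1.0.0
+
+A record of the visualization tools chosen for the Momentry frontend.
+
+## SVG (built-in)
+
+| Item | Details |
+|------|------|
+| Use | Trace timeline, swimlane chart, bar chart, matrix |
+| License | Built into the browser; no licensing concerns |
+| Used by | V1 TraceThumbnailTimeline, V2 IdentitySwimlane, V3 DurationHistogram, V4 SimilarityMatrix |
+| Pros | Zero dependencies, crisp vector output, interactive |
+| Cons | Performance degrades with a large number of nodes |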
+
+## Three.js
+
+| Item | Details |
+|------|------|
+| Use | 3D face mesh, 3D space-time cube |
+| License | **MIT**: commercial use allowed; the copyright notice must be retained |
+| Used by | Face3DViewer (MediaPipe 468 landmarks), V5 3D Space-Time Cube |
+| npm | `three` + `@types/three` |
+| License file | `node_modules/three/LICENSE` (MIT) |
+| Bundle | About 120 KB gzipped |
+| Pros | Complete WebGL wrapper, OrbitControls, large community |
+| Cons | Dispose must be managed manually to avoid memory leaks |
+
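+## MediaPipe Face Mesh
+
+| Item | Details |
+|------|------|
+| Use | Detection of 468 3D face landmarks |
+| License | **Apache 2.0**: commercial use allowed |
+| Used by | Face3DViewer |
+| Deployment | `scripts/face_landmarks_server.py` (port 11437) |
+| Input | Cropped face JPEG |
+| Output | 468 (x, y, z) 3D coordinates (478 when iris refinement is enabled) |
+| Pros | Lightweight, real-time, cross-platform |
+| Cons | Frontal faces only, no texture |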
+
+## Three.js Face3DViewer memory management
+
+```typescript
+// Correct dispose pattern
+function disposeScene() {
+  cancelAnimationFrame(animId)
+  for (const obj of objects) {
+    scene?.remove(obj)
+    if (obj instanceof THREE.Mesh) {
+      obj.geometry?.dispose()
+      if (Array.isArray(obj.material)) obj.material.forEach(m => m.dispose())
+      else obj.material?.dispose()
+    }
+    if (obj instanceof THREE.Points) {
+      obj.geometry?.dispose()
+      if (obj.material) obj.material.dispose()
+    }
+  }
+  objects = []
+  controls?.dispose()
+  controls = null
+  if (renderer) { renderer.dispose(); renderer = null }
+  scene = null; camera = null
+}
+```
+
+## Tool comparison
+
+| Visualization | Tool | License | Bundle | Status |
+|--------|------|:----:|:-----:|:----:|
+| V0 Trace Grid | Vue + Tailwind | — | 0 KB | ✅ |
+| V1 Thumbnail Timeline | SVG | — | 0 KB | ✅ |
+| V2 Identity Swimlane | SVG | — | 0 KB | ✅ |
+| V3 Duration Histogram | SVG | — | 0 KB | ✅ |
+| V4 Similarity Matrix | SVG | — | 0 KB | ✅ |
+| 3D Face Mesh | Three.js | MIT | ~120 KB | ✅ |
+| V5 3D Space-Time Cube | Three.js | MIT | ~120 KB | 🔜 |
+| Heatmap (Canvas) | Canvas 2D | — | 0 KB | 🔜 |
+| Trace Video | ffmpeg | GPL | Separate process | ✅ |
+| **Document rendering** | | | | |
+| API 文件 | **Markdown** | — | 0 KB | ✅ |
+| API diagrams | **Mermaid** (flowchart, sequence, ER, mindmap) | MIT | ~50 KB (VS Code extension) | ✅ |
+| CLI reading | **glow** (terminal MD renderer) | MIT | Separate binary | ✅ |
+
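+## Markdown
+
+| Item | Details |
+|------|------|
+| Use | All API docs, design specs, and test reports |
+| License | Plain-text format; no licensing concerns |
+| Tools | VS Code built-in preview, `glow` CLI |
+| Pros | Version-control friendly (readable diffs), plain text, cross-platform |
+| Cons | No dynamic interactivity |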
+
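+## Mermaid
+
+| Item | Details |
+|------|------|
+| Use | API flow diagrams (sequence), architecture diagrams (flowchart), data models (ER), endpoint overview (mindmap) |
+| License | **MIT**: commercial use allowed |
+| VS Code extension | `Markdown Preview Mermaid Support` |
+| Supported diagrams | flowchart, sequence, class, state, ER, mindmap, pie, gantt |
+| File | `API_USAGE_GUIDE_V1.0.0.md` (contains 6 Mermaid diagrams) |
+| Pros | Embedded in Markdown, version-control friendly, no screenshots needed |
+| Cons | Requires a plugin outside VS Code and GitHub |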
--- a/docs_v1.0/DESIGN/VOICE_TECH_CHOICES_V1.0.0.md
+++ b/docs_v1.0/DESIGN/VOICE_TECH_CHOICES_V1.0.0.md
@@ -0,0 +1,114 @@
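+# Voice Interaction Technology Choices v1.0.0
+
+A record of the voice technologies chosen for the Momentry Demo Runner.
+
+## Speech output (TTS)
+
+### macOS `say` (adopted)
+
+| Item | Details |
+|------|------|
+| Use | Reading the demo narration aloud |
+| License | Built into macOS; no licensing concerns |
+| Languages | 40+ languages, including Chinese (Meijia), English (Samantha), and Japanese (Kyoko) |
+| Invocation | `subprocess.Popen(["say", "-v", "Meijia", "文字"])` |
+| Pros | No installation, no dependencies, low latency, multilingual |
+| Cons | macOS only; speech rate can only be set coarsely, in words per minute, via `-r` |
+
+**Conclusion**: `say` is the best TTS fit for Momentry because it is built into macOS, free, and fully multilingual.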
+
+---
+
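+## Speech input (speech-to-command)
+
+### Options compared
+
+| Option | Local/Cloud | Languages | Model size | Latency | Accuracy | License |
+|------|:---------:|:----:|:--------:|:----:|:------:|:----:|
+| **Vosk** (integrated) | ✅ **Local** | Chinese + English | 42MB | Real-time | Medium-high | Apache 2.0 |
+| macOS NSSpeechRecognizer | ✅ Local | Multiple | Built into the OS | Real-time | Medium | Built into macOS |
+| Google Speech Recognition | ☁️ Cloud | 120+ languages | — | ~1s | High | Free (quota-limited) |
+| Whisper (tiny) | ✅ Local | 100+ languages | ~150MB | ~2s | High | MIT |
+| Porcupine | ✅ Local | Keywords | ~2MB | Real-time | High (keywords only) | Apache 2.0 |
+
+### Vosk (adopted as the local option)
+
+| Item | Details |
+|------|------|
+| Model | `vosk-model-small-cn-0.22` (42MB, Chinese) |
+| Languages | Chinese, English (each requires downloading its own model) |
+| Invocation | Called directly through the Python `vosk` package |
+| Pros | Fully local, real-time, handles Chinese and English, small model |
+| Cons | Model must be downloaded (one time); accuracy drops in noisy environments |
+| Scope | Detects command keywords only: next/stop/repeat/goto, etc. |
+
+### Google Speech Recognition (fallback)
+
+| Item | Details |
+|------|------|
+| Use | Automatic fallback when the Vosk model is not installed |
+| Invocation | Python `SpeechRecognition` + Google API |
+| Pros | No model download, high accuracy, multilingual |
+| Cons | **Requires network access**, ~1s latency per request, usage quota |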
+
+### Integration strategy
+
+```
+Start with --voice-control
+    │
+    ├── Vosk model present? → use Vosk (local, offline)
+    │
+    └── Vosk model missing? → use Google (requires network)
+                            │
+                            └── That fails too? → show "voice unavailable"
+```
+
+---
+
+## Demo Runner integration
+
+### Command set (Chinese and English)
+
+| Chinese | English | Action |
+|:----:|:-------:|------|
+| 下一個 / 繼續 | next / continue | Advance to the next step |
+| 停止 | stop / quit | End the current demo |
+| 重複 | repeat / again | Read the current narration again |
+| 跳到第 N 步 | go to N / step N | Jump to the given step |
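+
+Mapping a recognized utterance to one of these commands can be sketched as follows. This is an illustrative Python sketch, not the code in `scripts/demo_runner.py`: the names `KEYWORDS`, `GOTO`, and `parse_command` are hypothetical, and it assumes the recognizer returns step numbers as digits. A recognizer that spells numbers out ("three") would need an extra conversion step.
+
+```python
+import re
+
+# Spoken keywords per command, taken from the table above.
+KEYWORDS = {
+    "next": ("next", "continue", "下一個", "繼續"),
+    "stop": ("stop", "quit", "停止"),
+    "repeat": ("repeat", "again", "重複"),
+}
+# "go to N", "step N", or "跳到第 N 步"; the step number must be written in digits.
+GOTO = re.compile(r"(?:go\s*to|step|跳到第)\s*(\d+)")
+
+
+def parse_command(utterance: str):
+    """Return (command, argument) or None when no command keyword is present."""
+    text = utterance.strip().lower()
+    match = GOTO.search(text)
+    if match:
+        return ("goto", int(match.group(1)))
+    for command, words in KEYWORDS.items():
+        if any(word in text for word in words):
+            return (command, None)
+    return None
+```
+
+Checking the goto pattern first keeps "next step 2" from being read as a plain `next`.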
+
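+### Code structure
+
+```python
+import queue
+
+commands = queue.Queue()
+
+# Background thread: listens for speech
+def voice_command_listener(lang):
+    # 1. Try Vosk (local)
+    # 2. Fall back to Google Speech Recognition (cloud)
+    # 3. Put each recognized command on the queue
+    ...
+
+# Non-blocking poll used by the main loop
+def check_voice_command():
+    try:
+        return commands.get_nowait()
+    except queue.Empty:
+        return None
+
+# Main loop polls the queue (demo_running is set elsewhere in the runner)
+def main():
+    while demo_running:
+        cmd = check_voice_command()
+        if cmd == "next":
+            ...  # advance to the next step
+        elif cmd == "stop":
+            break  # end the demo
+        elif cmd is not None and cmd.startswith("goto "):
+            ...  # jump to step N
+```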
+
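+### How to start
+
+```bash
+# Local speech recognition (Vosk, no network needed)
+python3 scripts/demo_runner.py --voice zh_TW --voice-control
+
+# Fallback: if the Vosk model is not installed, Google is used automatically (requires network)
+```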
+
+---
+
+## Related files
+
+| File | Description |
+|------|------|
+| `scripts/demo_runner.py` | Speech output + input integration |
+| `~/.cache/vosk/vosk-model-small-cn-0.22/` | Vosk Chinese model (42MB) |
+| `docs_v1.0/REFERENCE/DEMO_RUNNER_V1.0.0.md` | Demo Runner usage guide |
--- a/docs_v1.0/DESIGN/VOICE_TEST_RESULTS_V1.0.0.md
+++ b/docs_v1.0/DESIGN/VOICE_TEST_RESULTS_V1.0.0.md
@@ -0,0 +1,36 @@
+# Speech Recognition Test Log v1.0.0
+
+## Environment
+
+- **Machine**: Mac Mini M4
+- **Input device**: Display Audio (HDMI loopback)
+- **Model**: Vosk small-en-us (40MB)
+
+## Test results
+
+| Test | Setup | Max Level | Mean Level | Vosk result |
+|------|------|:---------:|:----------:|:----------:|
+| Raw audio, 48kHz | pyaudio direct | 3510 | 654 | ❌ Empty |
+| Denoised, 16kHz | highpass200+lowpass4000+afftdn | 1224 | 110 | ❌ Empty |
+| 3x gain | numpy boost | ~10K | ~1800 | ❌ Empty |
+| ffmpeg recording | avfoundation :0 | 3698 | 636 | ❌ Empty |
+
+## Findings
+
+1. **Display Audio does receive audio** (mean ~600, max ~3500)
+2. **Background noise is high** (a mean of 600 is far above the 10-50 of a normal microphone)
+3. After denoising, the noise floor drops to a mean of 110, but recognition still fails
+4. The Vosk small model is not tolerant enough of noise
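+
+The Max Level and Mean Level columns can be reproduced from raw capture buffers with a few lines of Python. This sketch assumes 16-bit signed PCM in native byte order and takes "mean level" to be the mean absolute amplitude; the log does not say which definition the original measurements used, so treat the function `level_stats` as an approximation.
+
+```python
+import array
+
+
+def level_stats(pcm_bytes: bytes) -> tuple:
+    """Return (max_level, mean_level) for signed 16-bit PCM samples."""
+    samples = array.array("h")  # 'h' = signed 16-bit integers
+    samples.frombytes(pcm_bytes)
+    if not samples:
+        return (0, 0.0)
+    magnitudes = [abs(sample) for sample in samples]
+    return (max(magnitudes), sum(magnitudes) / len(magnitudes))
+```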
+
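+## Suspected cause
+
+Display Audio is an **HDMI audio return channel**; what it picks up may be:
+- Background noise from the display's built-in speakers
+- Or electrical noise generated by the display itself
+- It is unclear whether the display's microphone actually returns audio over HDMI
+
+## To try
+
+- [ ] Whisper (local, high noise tolerance)
+- [ ] Direct test with a USB microphone
+- [ ] macOS built-in NSSpeechRecognizer (via PyObjC)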
--- a/docs_v1.0/DESIGN/ZERO_SHOT_DETECTION_RESEARCH.md
+++ b/docs_v1.0/DESIGN/ZERO_SHOT_DETECTION_RESEARCH.md
@@ -0,0 +1,190 @@
+# Zero-Shot Object Detection Model Research Report
+
+**Date:** 2026-05-10
+**Goal:** Evaluate models for detecting arbitrary objects in Charade (1963)
+**System:** M5 MacBook Pro (Apple Silicon MPS, 48GB)
+
+---
+
+## Tested Models
+
+| Model | Params | Size | Resolution | Type | License |
+|-------|--------|------|------------|------|---------|
+| YOLOv8n fine-tune (gun) | 3.2M | 6MB | 640px | Closed-set (4 classes) | AGPL-3.0 |
+| OWL-ViT base | 109M | 586MB | 384px | Zero-shot | Apache 2.0 |
+| **Grounding DINO Base** | **232M** | **891MB** | **384px** | **Zero-shot** | **Apache 2.0** |
+| Grounding DINO Large | 232M | 895MB | 384px | Zero-shot | Apache 2.0 |
+| Florence-2 Base | 231M | ~3GB | 384px | Zero-shot (generative) | MIT |
+| Florence-2 Large | 776M | ~6GB | 384px | Zero-shot (generative) | MIT |
+| PaliGemma 3B mix-224 | 2,923M | ~3GB | 224px | Zero-shot (generative) | Gemma license |
+| PaliGemma 3B mix-448 | 2,923M | ~6GB | 448px | Zero-shot (generative) | Gemma license |
+
+## Detection Performance on Charade
+
+### Large Objects (gun)
+
+| Model | 8 timepoints | Best confidence | Runtime |
+|-------|-------------|----------------|---------|
+| YOLOv8n fine-tune | ❌ 0/5 (all FP) | 0.45 (stamp→pistol) | 0.03s |
+| OWL-ViT | ❌ 2/8 | 0.054 | 3.4s |
+| **Grounding DINO Base** | **✅ 8/8** | **0.499** | **0.33s** |
+| PaliGemma 3B mix-224 | ✅ 3/8 (gun), 3/8 overall | n/a (no score output) | 0.5-3s |
+
+### Small Objects (stamp, passport, magnifying glass)
+
+| Model | Stamp | Passport | Magnifying glass |
+|-------|-------|----------|-----------------|
+| Grounding DINO Base | ❌ FP (~0.3) | ❌ FP (~0.4) | ❌ FP (~0.3-0.5) |
+| PaliGemma 3B mix-224 | ❌ no det | ❌ no det | not tested |
+| PaliGemma 3B mix-448 | ❌ (not tested) | ❌ (not tested) | ❌ (not tested) |
+
+**All models fail on objects smaller than ~50px at native 1920x1080 resolution.**
+
+### Other Objects
+
+| Object | YOLO COCO | Grounding DINO | Notes |
+|--------|-----------|----------------|-------|
+| knife | ✅ 368 frames | ✅ 84 hits | Small but detectable |
+| cup | ✅ | ✅ 13 hits | Moderate size |
+| bottle | ✅ | ✅ 12 hits | Moderate size |
+| cell phone | ✅ | ✅ 5 hits | Hand-held |
+| book | ✅ | ✅ 3 hits | Hand-held |
+| car | ✅ | ✅ 9 hits | Large object |
+| tie | ✅ | ✅ 139 hits | On-person (worn, not held) |
+
+## Detailed Model Analysis
+
+### Grounding DINO Base (Recommended)
+
+**Scores:** Detection confidence 0.1-0.5 (typical for zero-shot)
+
+**Timing per frame (MPS):**
+| Component | Time | % of total |
+|-----------|------|------------|
+| Processor (text+image) | 17ms | 5% |
+| Model inference | 310ms | 93% |
+| Post-processing | 5ms | 2% |
+| **Total** | **331ms** | **100%** |
+
+**Multi-prompt batching:** 8 prompts in 335ms (42ms/prompt vs 309ms single)
+
+**Memory:** ~1GB (MPS)
+
+**License:** Apache 2.0 — fully commercial, no restrictions
+
+### Grounding DINO Large
+
+**Result:** Identical weights to Base. The GitHub "7-dataset" checkpoint is the same 3-dataset version as HuggingFace. The actual 7-dataset version (56.7 AP) was never released.
+
+**Verdict: Do not use.** Base is identical and simpler.
+
+### OWL-ViT
+
+**Result:** Almost useless for this task. Max confidence 0.054. Detects the gun at only 2/8 timepoints.
+
+**Verdict: Do not use.**
+
+### Florence-2
+
+**Issue:** `prepare_inputs_for_generation` bug in current transformers version. Cannot run inference without patching model code.
+
+**Task format:** Uses task tokens (`<OD>`) instead of arbitrary text prompts. Cannot do "detect gun" directly — uses generic object detection.
+
+**Verdict: Cannot use in current environment.**
+
+### PaliGemma
+
+**Result:** Works for gun detection (3/8) but misses small objects entirely.
+
+**Key limitation:** No confidence score output (generative model). Either outputs bbox or nothing.
+
+**Issues:**
+- 224px variant: Too low resolution for small objects
+- 448px variant: 6GB download, suspected better for detail but untested
+- Gemma license may restrict commercial use vs Apache 2.0
+
+**Verdict: Inferior to Grounding DINO for this use case.**
+
+### YOLOv8n Fine-tune (Gun Detector)
+
+| Item | Value |
+|------|-------|
+| Dataset | 905 images (Roboflow CC BY 4.0) |
+| Classes | grenade, knife, pistol, rifle |
+| Validation mAP50 | 0.813 |
+| Charade FP rate | **100%** (all false positives) |
+
+**Root cause:** Training images are close-up gun photos; Charade has distant/partial guns. Distribution mismatch makes this model unusable.
+
+**Verdict: Requires completely new training dataset.**
+
+## Root Cause Analysis: Small Object Failure
+
+### Grounding DINO's Resolution Limit
+
+Grounding DINO processes images at **384×384px**. At this resolution:
+
+```
+1920px frame → 384px input (5:1 reduction)
+A 50×50px object → 10×10px at 384px → only ~1 patch token
+```
+
+For comparison:
+- **Gun** at 200×200px (close-up) → 40×40px → still detectable
+- **Stamp** at 30×30px → 6×6px → lost in downsampling
+- **Passport** at 80×120px → 16×24px → barely visible
+- **Magnifying glass** at 40×40px → 8×8px → lost
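+
+The arithmetic behind these figures is a single uniform scale factor. A minimal Python sketch, assuming the frame is scaled by width only (1920 → 384) and ignoring any padding or aspect-ratio handling the real preprocessor may apply; `scaled_size` is an illustrative name:
+
+```python
+def scaled_size(width_px, height_px, frame_width=1920, input_width=384):
+    """Size of an object after the frame is downscaled to the model's input width."""
+    scale = input_width / frame_width  # 384 / 1920 = 0.2, the 5:1 reduction
+    return (round(width_px * scale), round(height_px * scale))
+```
+
+For example, `scaled_size(80, 120)` gives `(16, 24)` for the passport and `scaled_size(30, 30)` gives `(6, 6)` for the stamp, matching the list above.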
+
+### Potential Solutions
+
+| Solution | Pros | Cons | Feasibility |
+|----------|------|------|-------------|
+| **Crop + zoom** on person region | Leverages existing YOLO person detections | Requires two-stage pipeline | ✅ High |
+| **PaliGemma 448px** | 448px native input (about 36% more pixels than 384px) | 6GB, requires download | ⚠️ Medium |
+| **YOLO fine-tune on stamps** | Fast inference (6MB) | Need 200+ training images | ⚠️ Medium |
+| **Grounding DINO + tiling** | Split image into tiles, run per tile | 4-9x slower | ⚠️ Medium |
+| **Florence-2 448px** | Higher resolution | Bug in transformers | ❌ Low |
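+
+As an illustration of the tiling option, the sketch below splits a frame into overlapping tiles and maps a per-tile detection back to frame coordinates. The function names, the 10% overlap default, and the `(x0, y0, x1, y1)` box layout are assumptions made for this example; a 2×2 or 3×3 grid corresponds to the 4-9x slowdown listed in the table.
+
+```python
+def tile_boxes(width, height, rows=2, cols=2, overlap=0.1):
+    """Split a frame into rows x cols tiles, each padded by `overlap` of the tile size."""
+    tile_w, tile_h = width / cols, height / rows
+    pad_x, pad_y = tile_w * overlap, tile_h * overlap
+    boxes = []
+    for r in range(rows):
+        for c in range(cols):
+            # Clamp padded tile edges to the frame boundaries.
+            x0 = max(0, int(c * tile_w - pad_x))
+            y0 = max(0, int(r * tile_h - pad_y))
+            x1 = min(width, int((c + 1) * tile_w + pad_x))
+            y1 = min(height, int((r + 1) * tile_h + pad_y))
+            boxes.append((x0, y0, x1, y1))
+    return boxes
+
+
+def to_frame_coords(det_box, tile_box):
+    """Map a detection box from tile coordinates back to full-frame coordinates."""
+    tx0, ty0, _, _ = tile_box
+    x0, y0, x1, y1 = det_box
+    return (x0 + tx0, y0 + ty0, x1 + tx0, y1 + ty0)
+
+
+tiles = tile_boxes(1920, 1080, rows=2, cols=2)
+print(len(tiles), tiles[0])
+```
+
+Each tile would be cropped, passed to the detector on its own, and the resulting boxes shifted with `to_frame_coords`; merging duplicate boxes in the overlap zones is left out of this sketch.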
+
+## Hand-Held Object Detection Feasibility
+
+### Available Data Sources
+
+| Source | Type | Coverage | Usefulness |
+|--------|------|----------|------------|
+| YOLO `pre_chunks` | Object detections | 169,625 frames | ✅ Every frame |
+| Pose `pre_chunks` | Body keypoints (left_wrist, right_wrist) | 4,269 frames | ✅ Hand location |
+| Grounding DINO | Zero-shot classification | On-demand | ✅ Object ID |
+| ASR dialogue | Text mentions | 4,188 chunks | ✅ "holding a gun" |
+
+### Approach: YOLO + Pose + Grounding DINO
+
+```
+Frame
+  → YOLO: Find person + objects
+  → Pose: Find wrist keypoints
+  → Check: Object bbox overlaps with hand region (wrist ±100px)
+  → Grounding DINO: Verify object class
+```
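+
+The overlap check in the third step could look roughly like the following. This is a sketch under assumptions: the dictionary layout for detections, the function names, and the use of a square ±100px region around each wrist are illustrative and not taken from the actual `pre_chunks` schema.
+
+```python
+def hand_region(wrist, radius=100):
+    """Square region of +/- radius pixels around a wrist keypoint (x, y)."""
+    x, y = wrist
+    return (x - radius, y - radius, x + radius, y + radius)
+
+
+def boxes_overlap(a, b):
+    """True if two (x0, y0, x1, y1) boxes share any area."""
+    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]
+
+
+def held_candidates(objects, wrists, radius=100):
+    """Return detected objects whose bbox overlaps any wrist's hand region."""
+    regions = [hand_region(w, radius) for w in wrists]
+    return [o for o in objects if any(boxes_overlap(o["bbox"], r) for r in regions)]
+
+
+# Hypothetical detections for one frame.
+objects = [
+    {"label": "cup", "bbox": (500, 400, 560, 460)},
+    {"label": "chair", "bbox": (1200, 100, 1400, 500)},
+]
+wrists = [(450, 420)]
+print([o["label"] for o in held_candidates(objects, wrists)])
+```
+
+Objects that pass this filter are only candidates; they would still go to Grounding DINO for class verification, as in the last step of the pipeline.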
+
+### Known Limitations
+
+1. **Pose frame alignment:** Pose data covers only 4,269 frames while YOLO covers 169,625, so a YOLO frame does not always have pose keypoints for the same frame
+2. **Object proximity ≠ holding:** YOLO objects near hands may be background, not held
+3. **Small object blind spot:** Stamps, magnifying glasses at hand positions are too small to detect
+
+## Recommendations
+
+| Priority | Action | Rationale |
+|----------|--------|-----------|
+| 1 | Use Grounding DINO Base (Apache 2.0) | Best zero-shot detector, proven on guns, clean license |
+| 2 | Two-stage pipeline for small objects | YOLO person box → crop → upscale → Grounding DINO |
+| 3 | Pose wrist alignment for hand-held confirmation | Reduce false positives by requiring hand proximity |
+| 4 | Replace Grounding DINO "Large" ref with Base | Large is identical weights, no benefit |
+
+## Appendix: License Summary
+
+| Model | License | Commercial Use | Requires |
+|-------|---------|---------------|----------|
+| Grounding DINO | **Apache 2.0** | ✅ Yes | NOTICE file |
+| OWL-ViT | Apache 2.0 | ✅ Yes | NOTICE file |
+| PaliGemma | Gemma license | ⚠️ Needs review | Google ToS |
+| Florence-2 | MIT | ✅ Yes | Copyright notice |
+| YOLOv8 | AGPL-3.0 | ⚠️ Conditional | Open-source release under AGPL-3.0 or a paid commercial license |
--- a/docs_v1.0/DESIGN/ZERO_SHOT_GUN_TEST_PLAN.md
+++ b/docs_v1.0/DESIGN/ZERO_SHOT_GUN_TEST_PLAN.md
@@ -0,0 +1,49 @@
+# Zero-Shot Gun Detection Test Plan
+
+**Date:** 2026-05-10
+**Goal:** Compare OWL-ViT vs Grounding DINO for detecting guns in Charade (1963)
+
+## Models
+
+| Model | Source | Type |
+|-------|--------|------|
+| `google/owlvit-base-patch32` | HuggingFace | Zero-shot object detection |
+| `IDEA-Research/grounding-dino-base` | HuggingFace | Zero-shot object detection |
+
+## Test Timepoints (8)
+
+| Time | Label | Source |
+|------|-------|--------|
+| 2646s (44:06) | 2646s | ASR: "He has a gun" |
+| 3188s (53:08) | 3188s | Original detection |
+| 3697s (1:01:37) | 3697s | ASR: "Where's your gun" |
+| 5341s (1:29:01) | 5341s | ASR: "He already killed 3 men" |
+| 5461s (1:31:01) | 5461s | Original detection |
+| 6309s (1:45:09) | 6309s | Original detection |
+| 6377s (1:46:17) | 6377s | Original detection |
+| 6479s (1:47:59) | 6479s | Original detection |
+
+## Prompts
+
+`"gun"`, `"pistol"`, `"rifle"`, `"weapon"`
+
+## Matrix
+
+8 timepoints × 2 models × 4 prompts = 64 inferences
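+
+The test matrix can be enumerated as below. This is a minimal sketch, not the contents of `scripts/zero_shot_gun_test.py`; the constant names and the `hms` helper are assumptions introduced for illustration.
+
+```python
+from itertools import product
+
+TIMEPOINTS = [2646, 3188, 3697, 5341, 5461, 6309, 6377, 6479]  # seconds into the film
+MODELS = ["google/owlvit-base-patch32", "IDEA-Research/grounding-dino-base"]
+PROMPTS = ["gun", "pistol", "rifle", "weapon"]
+
+
+def hms(seconds):
+    """Format a second offset as h:mm:ss."""
+    hours, rem = divmod(seconds, 3600)
+    minutes, secs = divmod(rem, 60)
+    return f"{hours}:{minutes:02d}:{secs:02d}"
+
+
+# One entry per (timepoint, model, prompt) combination.
+runs = list(product(TIMEPOINTS, MODELS, PROMPTS))
+print(len(runs))
+print(hms(TIMEPOINTS[0]), runs[0][1], runs[0][2])
+```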
+
+## Output
+
+| File | Description |
+|------|-------------|
+| `output_dev/zero_shot_test/*.jpg` | Annotated screenshots |
+| `output_dev/zero_shot_test/zero_shot_results.json` | Detection results |
+| `scripts/zero_shot_gun_test.py` | Test script |
+
+## Success Criteria
+
+| Level | Criteria |
+|-------|----------|
+| Excellent | Finds real gun with confidence ≥ 0.5 |
+| Good | Finds real gun with confidence < 0.5 |
+| Limited | Finds guns but many false positives |
+| Failed | All false positives |
--- a/docs_v1.0/DESIGN/ZERO_SHOT_GUN_TEST_REPORT.md
+++ b/docs_v1.0/DESIGN/ZERO_SHOT_GUN_TEST_REPORT.md
@@ -0,0 +1,67 @@
+# Zero-Shot Gun Detection Test Report
+
+**Date:** 2026-05-10
+**Goal:** Compare OWL-ViT vs Grounding DINO for detecting guns in Charade (1963)
+
+## Test Setup
+
+| Model | Prompts | Timepoints | Total inferences |
+|-------|---------|------------|-----------------|
+| `google/owlvit-base-patch32` | gun, pistol, rifle, weapon | 8 | 32 |
+| `IDEA-Research/grounding-dino-base` | gun, pistol, rifle, weapon | 8 | 32 |
+
+## Results
+
+| Model | Timepoints with detections | Total detections | Best confidence | Runtime |
+|-------|---------------------------|-----------------|-----------------|---------|
+| OWL-ViT | 2/8 | 2 | 0.054 | 1.5s |
+| **Grounding DINO** | **8/8** | **109** | **0.186** | 11.5s |
+
+## Grounding DINO — Per Timepoint
+
+| Time | Source | Best prompt | Best confidence | Found? |
+|------|--------|-------------|-----------------|--------|
+| 2646s (44:06) | ASR: "He has a gun" | gun | 0.082 | ✅ |
+| **3188s (53:08)** | **Original pistol** | **gun** | **0.149** | **✅** |
+| 3697s (1:01:37) | ASR: "Where's your gun" | gun | 0.159 | ✅ |
+| 5341s (1:29:01) | ASR: "He already killed 3 men" | gun | 0.074 | ✅ |
+| **5461s (1:31:01)** | **Original pistol** | **gun** | **0.186** | **✅** |
+| **6309s (1:45:09)** | **Original pistol** | **gun** | **0.077** | **✅** |
+| **6377s (1:46:17)** | **Original gun** | **weapon** | **0.118** | **✅** |
+| **6479s (1:47:59)** | **Original pistol** | **gun** | **0.060** | **✅** |
+
+### Original 5 Pistol Frames
+
+| Frame | OWL-ViT | Grounding DINO | Verdict |
+|-------|---------|----------------|---------|
+| 3188s | Not found | ✅ Found (0.149) | ✅ |
+| 5461s | Not found | ✅ Found (0.186) | ✅ |
+| 6309s | Not found | ✅ Found (0.077) | ✅ |
+| 6377s | Not found | ✅ Found (0.118) | ✅ |
+| 6479s | Not found | ✅ Found (0.060) | ✅ |
+
+## Analysis
+
+### OWL-ViT
+- Almost completely failed: only 2 detections, with a best confidence of 0.054
+- Not suitable for this task
+
+### Grounding DINO
+- **Found all 8 timepoints**, including all 5 original pistol frames
+- Best prompt is `"gun"` at 7/8 timepoints; `"weapon"` wins only at 6377s
+- Confidence range: 0.060 - 0.186 (typical for zero-shot detection)
+- Higher confidence correlates with user-confirmed detections
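+
+The summary figures for Grounding DINO can be re-derived from the per-timepoint table. The sketch below hard-codes the table rows; the tuple layout and the `"asr"` / `"original"` source tags are assumptions made for this example and do not reflect the schema of `zero_shot_results.json`.
+
+```python
+from collections import Counter
+
+# (time_s, best_prompt, best_confidence, source) copied from the per-timepoint table.
+ROWS = [
+    (2646, "gun", 0.082, "asr"),
+    (3188, "gun", 0.149, "original"),
+    (3697, "gun", 0.159, "asr"),
+    (5341, "gun", 0.074, "asr"),
+    (5461, "gun", 0.186, "original"),
+    (6309, "gun", 0.077, "original"),
+    (6377, "weapon", 0.118, "original"),
+    (6479, "gun", 0.060, "original"),
+]
+
+# How often each prompt gave the best confidence at a timepoint.
+prompt_counts = Counter(prompt for _, prompt, _, _ in ROWS)
+confidences = [conf for _, _, conf, _ in ROWS]
+
+print(prompt_counts.most_common())
+print(min(confidences), max(confidences))
+```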
+
+### Key Finding
+The 5 original pistol frames were produced by **Grounding DINO** (not YOLOv8n). The model was downloaded from HuggingFace at 15:43-15:44 on May 9, and the screenshots were generated at 15:49 — confirming OWL-ViT was tested first (failed) and then Grounding DINO was tested (succeeded).
+
+## Integration
+
+Grounding DINO has been integrated into `object_search_agent.py` as `--source zero_shot`:
+```
+python3 scripts/object_search_agent.py --keyword gun --source zero_shot
+```
+
+## Screenshots
+
+All 64 annotated screenshots saved to `output_dev/zero_shot_test/*.jpg`
--- a/docs_v1.0/DESIGN/ZERO_SHOT_VS_FINETUNE_SELECTION.md
+++ b/docs_v1.0/DESIGN/ZERO_SHOT_VS_FINETUNE_SELECTION.md
@@ -0,0 +1,115 @@
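+# Zero-Shot vs Fine-Tune: Object Detection Model Selection Report
+
+**Date:** 2026-05-10
+**Goal:** Search for non-COCO objects (guns, stamps, envelopes, etc.) in Charade (1963)
+**System:** M5 MacBook Pro (Apple Silicon MPS)
+
+## Motivation
+
+YOLOv8 trained on COCO covers only 80 classes, which do not include gun, stamp, envelope, or other objects central to Charade. We need a way to search for arbitrary objects in a film.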
+
+## Candidate Approaches
+
+| Option | Method | Training data | Development cost |
+|--------|--------|---------------|------------------|
+| A. YOLOv8n fine-tune | Fine-tune on gun dataset | Requires 500+ annotated images | High |
+| B. OWL-ViT zero-shot | Vision-language pretraining | No training required | Low |
+| C. Grounding DINO zero-shot | Vision-language pretraining | No training required | Low |
+
+## Model Size and Performance
+
+| Model | Disk | Parameters | Inference time (MPS) | Energy per frame | Model category |
+|-------|------|------------|----------------------|------------------|----------------|
+| YOLOv8n | **6MB** | **3.2M** | **0.03s** | **~0.5J** | Closed-set (80 classes) |
+| OWL-ViT | 586MB | 109M | 3.4s | ~50J | Open-set (zero-shot) |
+| **Grounding DINO** | **891MB** | **172M** | **4.3s** | **~65J** | **Open-set (zero-shot)** |
+
+## Charade Test Results
+
+| Model | Hits across 8 timepoints | 5 original pistol frames | Best confidence | Inference time | Model size |
+|-------|--------------------------|--------------------------|-----------------|----------------|------------|
+| YOLOv8n COCO | ❌ N/A (no gun class) | - | - | 0.03s | 6MB |
+| YOLOv8n fine-tune | 7/7 FP | ❌ All FP | 0.45 (false positive on a stamp) | 0.03s | 6MB |
+| OWL-ViT | 2/8 | ❌ 0/5 | 0.054 | 3.4s | 586MB |
+| **Grounding DINO Base** | **31/32** | **✅ 5/5** | **0.672** | **11.6s** | **891MB** |
+| **Grounding DINO Large** | **32/32** | **✅ 5/5** | **1.000** | **50.1s** | **895MB** |
+
+### Base vs Large Comparison
+
+| Metric | Base (3 datasets) | Large (7 datasets) |
+|--------|-------------------|--------------------|
+| Mean best confidence | 0.384 | **1.000** |
+| Total detections | 333 | **28,800** |
+| COCO zero-shot AP | 48.4 | **56.7** |
+| Inference time (MPS) | 11.6s | 50.1s |
+| Edge deployment | More feasible | More difficult |
+
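+### Conclusion
+
+**Performance-first choice: Grounding DINO Large.** Confidence is 1.000 at all 8 timepoints with zero misses. It sacrifices inference speed, but its detection quality far exceeds the Base version.
+
+**Edge deployment choice: Grounding DINO Base.** The size is similar, but inference is 4.3x faster, which suits resource-constrained devices.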
+
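+### Key Conclusions
+
+1. **YOLOv8n fine-tune failed completely**: the 905 Roboflow close-up images are mismatched in distribution with Charade's medium and long shots, so the trained model does not generalize
+2. **OWL-ViT is nearly ineffective**: it cannot reliably recognize small objects in film frames
+3. **Grounding DINO succeeded**: it recovered 5/5 pistol frames and also hit every timepoint with an ASR gun mention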
+
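+## Grounding DINO Pros and Cons
+
+### Pros
+- **Zero-shot search**: any object outside COCO can be searched directly with a text prompt
+- **Extensibility**: the same model can search for gun, stamp, envelope, knife, hat, or any other object
+- **No training required**: no need to collect annotated data or fine-tune
+- **Apache 2.0 License**: commercial use permitted
+
+### Cons
+- **Large footprint**: 891MB (vs 6MB for YOLOv8n)
+- **Slow inference**: 4.3s/frame (vs 0.03s for YOLOv8n)
+- **Not suited to real-time use**: cannot do live detection on an edge device; only suitable for offline scanning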
+
+## Edge AI Deployment Considerations
+
+| Item | YOLOv8n | Grounding DINO |
+|------|---------|----------------|
+| Model size | 6MB ✅ | 891MB ⚠️ |
+| RAM required | ~100MB | ~2.5GB |
+| Inference time | 30ms | 4.3s |
+| Energy per frame | ~0.5J | ~65J |
+| Searchable classes | 80 (fixed) | Unlimited (text prompt) |
+| Battery impact (1000 frames) | ~500J | ~65,000J |
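+
+The battery row is the per-frame energy multiplied by the frame count. A small sketch, using the approximate per-frame figures from the table and a hypothetical `batch_energy` helper, also converts the total to watt-hours (1 Wh = 3600 J) so it can be compared against a battery rating:
+
+```python
+# Approximate per-frame energy in joules, as listed in the table above.
+ENERGY_PER_FRAME_J = {"YOLOv8n": 0.5, "Grounding DINO": 65.0}
+
+
+def batch_energy(model, frames):
+    """Return (joules, watt_hours) for running `model` on `frames` frames."""
+    joules = ENERGY_PER_FRAME_J[model] * frames
+    return joules, joules / 3600.0
+
+
+for model in ENERGY_PER_FRAME_J:
+    joules, wh = batch_energy(model, 1000)
+    print(f"{model}: {joules:.0f} J ({wh:.2f} Wh)")
+```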
+
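+### Recommended Strategy
+
+```
+Offline scan (Server/Gateway):
+  Use Grounding DINO to build an object index for the whole film
+  → Slow but acceptable (about 2-3 hours for a 113 min film)
+
+On-demand query (Edge Device):
+  At query time, run Grounding DINO only at the requested timepoint → 4s per query
+  → Query experience is still acceptable
+```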
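+
+A rough estimate of the offline scan time follows from the 4.3s/frame figure. The sampling interval is not stated in this report, so the 3 s and 4 s values below are assumptions, and `scan_hours` is a hypothetical helper:
+
+```python
+def scan_hours(film_minutes, sample_interval_s, seconds_per_frame=4.3):
+    """Return (frames, hours) for scanning a film at one frame per sample_interval_s."""
+    frames = int(film_minutes * 60 / sample_interval_s)
+    return frames, frames * seconds_per_frame / 3600.0
+
+
+for interval in (3, 4):  # assumed sampling intervals in seconds
+    frames, hours = scan_hours(113, interval)
+    print(f"every {interval}s: {frames} frames, {hours:.1f} h")
+```
+
+Under these assumptions, sampling one frame every 3 to 4 seconds lands in the 2-3 hour range; denser sampling scales the time proportionally.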
+
+## Integration Status
+
+- ✅ Grounding DINO tests passed
+- ✅ Integrated into `scripts/object_search_agent.py` (`--source zero_shot`)
+- ✅ Test plan: `docs/ZERO_SHOT_GUN_TEST_PLAN.md`
+- ✅ Test report: `docs/ZERO_SHOT_GUN_TEST_REPORT.md`
+
+## License Notice
+
+Grounding DINO is released under the Apache 2.0 License, which permits commercial use.
+If a product bundles this model, it should ship a `NOTICE` file:
+
+```
+Momentry
+Copyright 2026 Accusys
+
+This product includes software developed by IDEA Research:
+- Grounding DINO (https://github.com/IDEA-Research/GroundingDINO)
+  Copyright 2023 IDEA Research
+  Licensed under Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
+```