docs: file_uuid generation rules for M4
731
docs_v1.0/DESIGN/API_KEY_DESIGN.md
Normal file
@@ -0,0 +1,731 @@
---
document_type: "reference_doc"
service: "MOMENTRY_CORE"
title: "Momentry API Key Management System Design"
date: "2026-03-21"
version: "V1.0"
status: "active"
owner: "Warren"
created_by: "OpenCode"
tags:
  - "momentry"
  - "management-system-design"
ai_query_hints:
  - "Look up the contents of the Momentry API Key Management System Design"
  - "What is the main purpose of the Momentry API Key Management System Design?"
  - "How do I operate or implement the Momentry API Key Management System Design?"
---

# Momentry API Key Management System Design

| Item | Value |
|------|-------|
| Author | Warren |
| Created | 2026-03-21 |
| Document version | V1.2 |

---

## Version History

| Version | Date | Purpose | Editor | Tool/Model |
|---------|------|---------|--------|------------|
| V1.0 | 2026-03-18 | Document created | Warren | OpenCode / MiniMax M2.5 |
| V1.1 | 2026-03-20 | Added key types and management workflow | Warren | OpenCode |
| V1.2 | 2026-03-21 | Updated API key format and validation flow | Warren | OpenCode |

---

**Status**: In development

---

## 1. Overview

### 1.1 Goals

Establish a secure API key management mechanism that supports:
- Multiple API key types (system, user, service)
- Automatic expiry and rotation
- Anomalous-usage detection
- A forced-update mechanism
- Complete audit logging
- Gitea token integration
- n8n API key integration

### 1.2 Design Principles

| Principle | Description |
|-----------|-------------|
| Least privilege | Each key is granted only the permissions it needs |
| Regular rotation | Automatic expiry forces renewal |
| Traceable and auditable | Every operation is logged |
| Separate storage | Keys are stored separately from user data |

---

## 2. API Key Types

### 2.1 Key Type Matrix

| Type | Prefix | Purpose | Default lifetime | Rotation |
|------|--------|---------|------------------|----------|
| `system` | `msys_` | Internal system services | 365 days | Manual |
| `user` | `muser_` | Individual users | 90 days | Automatic |
| `service` | `msvc_` | Service-to-service communication | 180 days | Automatic |
| `integration` | `mint_` | Third-party integrations | 30 days | Forced update |
| `emergency` | `memg_` | Emergency access | 24 hours | One-time |

### 2.2 Key Format

```
{prefix}{uuid_v4}_{timestamp}_{checksum}
```

**Example:**
```
msys_a1b2c3d4-e5f6-7890-abcd-ef1234567890_1710998400_sha256
```
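
The format above can be sketched in a few lines of Python. This is only an illustration: the document does not say how `{checksum}` is computed, so the helper `_checksum` (first 8 hex digits of SHA-256 over the preceding fields) and the function names `generate_key` and `checksum_ok` are assumptions, not the actual Momentry implementation.

```python
import hashlib
import time
import uuid
from typing import Optional

# Prefixes copied from the key type matrix in section 2.1.
PREFIXES = {
    "system": "msys_",
    "user": "muser_",
    "service": "msvc_",
    "integration": "mint_",
    "emergency": "memg_",
}


def _checksum(body: str) -> str:
    """Hypothetical checksum: first 8 hex digits of SHA-256 over the preceding fields."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()[:8]


def generate_key(key_type: str, now: Optional[int] = None) -> str:
    """Build {prefix}{uuid_v4}_{timestamp}_{checksum} for the given key type."""
    prefix = PREFIXES[key_type]  # raises KeyError for unknown key types
    timestamp = int(time.time()) if now is None else now
    body = f"{prefix}{uuid.uuid4()}_{timestamp}"
    return f"{body}_{_checksum(body)}"


def checksum_ok(key: str) -> bool:
    """Cheap format check that can run before any database lookup."""
    body, sep, checksum = key.rpartition("_")
    return bool(sep) and _checksum(body) == checksum
```

A checksum of this kind only catches typos and truncated keys; it is not a secret, so authentication would still rely on the stored hash described in section 9.1.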

---

## 3. Database Schema

### 3.1 api_keys Table

```sql
CREATE TABLE api_keys (
    id BIGSERIAL PRIMARY KEY,
    key_id VARCHAR(64) UNIQUE NOT NULL,        -- public key ID
    key_hash VARCHAR(128) NOT NULL,            -- SHA-256 hash
    key_prefix VARCHAR(8) NOT NULL,            -- key prefix
    name VARCHAR(128) NOT NULL,                -- key name
    key_type VARCHAR(32) NOT NULL,             -- system/user/service/integration/emergency
    user_id BIGINT,                            -- associated user (nullable for system)
    service_name VARCHAR(64),                  -- service name (for service keys)
    permissions JSONB NOT NULL DEFAULT '[]',   -- permission list
    expires_at TIMESTAMP,                      -- expiry time
    last_used_at TIMESTAMP,                    -- last used time
    last_used_ip VARCHAR(45),                  -- last used IP
    usage_count BIGINT DEFAULT 0,              -- usage count
    status VARCHAR(16) DEFAULT 'active',       -- active/suspended/expired/revoked
    rotation_required BOOLEAN DEFAULT FALSE,   -- forced-rotation flag
    rotation_reason VARCHAR(256),              -- rotation reason
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_api_keys_key_id ON api_keys(key_id);
CREATE INDEX idx_api_keys_user_id ON api_keys(user_id);
CREATE INDEX idx_api_keys_type ON api_keys(key_type);
CREATE INDEX idx_api_keys_status ON api_keys(status);
CREATE INDEX idx_api_keys_expires ON api_keys(expires_at);
```

### 3.2 api_key_audit_log Table

```sql
CREATE TABLE api_key_audit_log (
    id BIGSERIAL PRIMARY KEY,
    key_id VARCHAR(64) NOT NULL,
    action VARCHAR(32) NOT NULL,     -- created/used/rotated/revoked/expired/suspended
    actor VARCHAR(64),               -- who acted (user_id or 'system')
    ip_address VARCHAR(45),
    user_agent VARCHAR(512),
    request_path VARCHAR(256),
    response_code INTEGER,
    details JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_audit_key_id ON api_key_audit_log(key_id);
CREATE INDEX idx_audit_action ON api_key_audit_log(action);
CREATE INDEX idx_audit_created ON api_key_audit_log(created_at);
```

### 3.3 api_key_rotation_log Table

```sql
CREATE TABLE api_key_rotation_log (
    id BIGSERIAL PRIMARY KEY,
    key_id VARCHAR(64) NOT NULL,
    old_key_id VARCHAR(64),
    new_key_id VARCHAR(64),
    rotation_type VARCHAR(32) NOT NULL,   -- scheduled/manual/forced/emergency
    reason VARCHAR(256),
    triggered_by VARCHAR(64),             -- system/user/scheduler
    grace_period_end TIMESTAMP,           -- end of the grace period
    created_at TIMESTAMP DEFAULT NOW()
);
```

---

## 4. API Key State Machine

```
                 ┌──────────────┐
                 │   created    │
                 └──────┬───────┘
                        │
                        ▼
              ┌────────────────────┐
              │       active       │◄─────────────┐
              └─────────┬──────────┘              │
                        │                         │
          ┌─────────────┼─────────────┐           │
          │             │             │           │
          ▼             ▼             ▼           │
    ┌───────────┐  ┌──────────┐  ┌──────────┐     │
    │ suspended │  │ expired  │  │ revoked  │─────┘
    └───────────┘  └──────────┘  └──────────┘
```

### State Transition Rules

| From | To | Trigger |
|------|----|---------|
| created | active | Key enabled |
| active | suspended | Anomalous usage detected |
| active | expired | Expiry time reached |
| active | revoked | Manual revocation |
| suspended | active | Lock lifted |
| suspended | revoked | Anomaly confirmed |
| expired | active | Re-enabled |
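
The transition table can be enforced with a small lookup. The sketch below is a minimal illustration, not the project's code: the names `ALLOWED_TRANSITIONS`, `InvalidTransition`, and `transition` are hypothetical. Because the table lists no transition out of `revoked`, the sketch treats that state as terminal.

```python
# Allowed transitions, copied row by row from the state transition table.
ALLOWED_TRANSITIONS = {
    "created": {"active"},
    "active": {"suspended", "expired", "revoked"},
    "suspended": {"active", "revoked"},
    "expired": {"active"},
    "revoked": set(),  # assumption: terminal, since the table lists no exit
}


class InvalidTransition(ValueError):
    """Raised when a status change is not listed in the transition table."""


def transition(current: str, target: str) -> str:
    """Return the new status, or raise if the change is not allowed."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current} -> {target} is not allowed")
    return target
```

Keeping the rules in one table-shaped structure makes it easy to compare the code against this section when either changes.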

---

## 5. Anomaly Detection

### 5.1 Anomaly Indicators

| Metric | Threshold | Action |
|--------|-----------|--------|
| Requests per minute | > 1000 | Warn |
| Requests per hour | > 10000 | Lock |
| Error rate | > 50% | Warn |
| Distinct IPs | > 5/hour | Warn |
| Off-hours usage | Late-night requests | Warn |
| Anomalous pattern | Brute-force attempts | Lock |
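
The numeric rows of the table translate directly into threshold checks. The following is a rough sketch under stated assumptions: the `UsageWindow` type and the `evaluate` and `disposition` functions are hypothetical, and the two non-numeric rows (off-hours usage, brute-force patterns) are left out because the document does not define how they are measured.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UsageWindow:
    """Usage counters for one key (hypothetical shape)."""
    requests_per_minute: int
    requests_per_hour: int
    error_rate: float            # fraction between 0.0 and 1.0
    distinct_ips_per_hour: int


def evaluate(w: UsageWindow) -> List[Tuple[str, str]]:
    """Return (metric, action) pairs for every threshold that is exceeded."""
    findings = []
    if w.requests_per_minute > 1000:
        findings.append(("requests_per_minute", "warn"))
    if w.requests_per_hour > 10000:
        findings.append(("requests_per_hour", "lock"))
    if w.error_rate > 0.5:
        findings.append(("error_rate", "warn"))
    if w.distinct_ips_per_hour > 5:
        findings.append(("distinct_ips_per_hour", "warn"))
    return findings


def disposition(findings: List[Tuple[str, str]]) -> str:
    """Collapse findings into the strongest action: lock > warn > ok."""
    actions = {action for _, action in findings}
    if "lock" in actions:
        return "lock"
    return "warn" if "warn" in actions else "ok"
```

In practice the thresholds would come from the environment variables in section 10 rather than being hard-coded.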

### 5.2 Anomaly Handling Flow

```
Anomaly detected
     │
     ▼
┌──────────┐
│ Analyze  │──→ rule out normal traffic
└────┬─────┘
     │
     ▼
┌──────────┐
│ Assess   │──→ minor → warn
└────┬─────┘
     │
     ▼
┌──────────┐
│ Act      │──→ severe → lock + rotate
└──────────┘
```

---

## 6. Forced-Update Mechanism

### 6.1 Trigger Conditions

| Condition | Severity | Action |
|-----------|----------|--------|
| Suspected leak | High | Disable immediately + force rotation |
| Anomalous usage | Medium | Warn + recommend rotation |
| Planned maintenance | Low | Notify + schedule rotation |
| Policy requirement | High | Force rotation |
| Expiry | Low | Disable + notify |

### 6.2 Forced-Rotation Flow

```
1. The system detects that a forced update is required
     │
     ▼
2. Create a new key (the old key is kept during the grace period)
     │
     ▼
3. Send notifications (Email/Slack/Redis PubSub)
     │
     ▼
4. Grace period starts (default 24 hours)
     │
     ├── Updated within the grace period → rotation complete
     │
     └── Grace period ends → old key disabled
```

### 6.3 Grace Period Configuration

| Key type | Grace period |
|----------|--------------|
| system | 72 hours |
| user | 24 hours |
| service | 48 hours |
| integration | 24 hours |
| emergency | 0 hours |
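
The grace-period table maps naturally onto the `grace_period_end` column of `api_key_rotation_log`. A minimal sketch follows, assuming the per-type values in the table take precedence over the single 24-hour default; the names `GRACE_HOURS`, `grace_period_end`, and `old_key_usable` are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Grace periods in hours, copied from the table in section 6.3.
GRACE_HOURS = {
    "system": 72,
    "user": 24,
    "service": 48,
    "integration": 24,
    "emergency": 0,
}


def grace_period_end(key_type: str, rotated_at: datetime) -> datetime:
    """Time at which the old key stops being accepted."""
    return rotated_at + timedelta(hours=GRACE_HOURS[key_type])


def old_key_usable(key_type: str, rotated_at: datetime, now: datetime) -> bool:
    """True while the old key is still inside its grace period."""
    return now < grace_period_end(key_type, rotated_at)
```

With a 0-hour grace period, an `emergency` key is rejected from the moment it is rotated.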

---

## 7. CLI Management Commands

### 7.1 Command List

```bash
# Key management
momentry api-key create --name "My Key" --type user --permissions read,write
momentry api-key list --type user
momentry api-key info <key_id>
momentry api-key revoke <key_id> --reason "security reasons"

# Rotation management
momentry api-key rotate <key_id>              # normal rotation
momentry api-key force-rotate <key_id>        # forced rotation
momentry api-key rotation-status <key_id>     # show rotation status

# Anomaly management
momentry api-key suspend <key_id> --reason "anomalous usage"
momentry api-key unsuspend <key_id>
momentry api-key blacklist <key_id>           # add to blacklist

# Audit
momentry api-key audit <key_id> --since 7d
momentry api-key stats --type service --period 30d
```

### 7.2 Sample Output

```bash
$ momentry api-key list --type service

┌─────────────────────────────────┬──────────┬──────────────┬────────────┐
│ Key ID                          │ Name     │ Status       │ Expires    │
├─────────────────────────────────┼──────────┼──────────────┼────────────┤
│ msvc_a1b2c3d4_1710998400_sha256 │ N8N      │ active       │ 2026-09-21 │
│ msvc_e5f6g7h8_1713600000_sha256 │ OpenCode │ rotation_req │ 2026-09-21 │
└─────────────────────────────────┴──────────┴──────────────┴────────────┘

⚠️ 1 key requires rotation
```

---

## 8. Implementation Plan

### Phase 1: Core Features
- [ ] Database schema
- [ ] Key generation and hashing
- [ ] Basic CRUD API
- [ ] Expiry checks

### Phase 2: Security Mechanisms
- [ ] Anomaly detection
- [ ] Automatic locking
- [ ] Forced rotation
- [ ] Grace period management

### Phase 3: Management Tools
- [ ] CLI commands
- [ ] Audit log
- [ ] Statistics reports
- [ ] Notification system

### Phase 4: Automation
- [ ] Scheduled rotation
- [ ] Prometheus metrics
- [ ] Alertmanager integration
- [ ] Automated response

---

## 9. Security Considerations

### 9.1 Key Storage
- The plaintext key is shown only once (at creation)
- Keys are stored as SHA-256 hashes
- Sensitive configuration is encrypted with Fernet symmetric encryption
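
The first two storage rules can be illustrated with the Python standard library. This is a sketch under the assumption that `key_hash` holds the hex-encoded SHA-256 digest of the full plaintext key; the function names `hash_key` and `verify_key` are hypothetical.

```python
import hashlib
import hmac


def hash_key(plaintext_key: str) -> str:
    """Hex SHA-256 digest to store in api_keys.key_hash (64 characters)."""
    return hashlib.sha256(plaintext_key.encode("utf-8")).hexdigest()


def verify_key(presented_key: str, stored_hash: str) -> bool:
    """Compare in constant time so timing does not leak hash prefixes."""
    return hmac.compare_digest(hash_key(presented_key), stored_hash)
```

Only `hash_key(...)` output is persisted; the plaintext is returned to the caller once and then discarded.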

### 9.2 Transport Security
- All APIs must use HTTPS
- Keys are sent in a header (X-API-Key)
- Keys must not appear in URLs

### 9.3 Access Control
- Only administrators can create or revoke keys
- Users can manage only their own keys
- System keys require special permissions

---

## 10. Environment Variables

```bash
# API key management
MOMENTRY_API_KEY_GRACE_PERIOD=86400      # grace period (seconds)
MOMENTRY_API_KEY_MAX_PER_USER=5          # maximum keys per user
MOMENTRY_API_KEY_ROTATION_DAYS=90        # automatic rotation interval (days)

# Anomaly detection
MOMENTRY_API_KEY_RATE_LIMIT=1000         # requests-per-minute limit
MOMENTRY_API_KEY_ERROR_THRESHOLD=0.5     # error-rate threshold
MOMENTRY_API_KEY_IP_LIMIT=5              # distinct IPs per hour

# Notifications
MOMENTRY_API_KEY_ALERT_WEBHOOK=          # anomaly alert webhook
```

---

## 11. Gitea API Token Integration

### 11.1 Overview

Gitea Personal Access Tokens can be created and managed through the API key management system, using a "register at creation" model.

### 11.2 Registration Model

```
User supplies username and password → call the Gitea API to create a token → plaintext shown only once → metadata saved to the management system
```

**Characteristics:**
- The token plaintext is available only at creation
- The management system records token metadata (no plaintext)
- Local lookup and deletion are supported

### 11.3 Database Structure

```sql
CREATE TABLE gitea_tokens (
    id SERIAL PRIMARY KEY,
    gitea_token_id BIGINT NOT NULL,           -- Gitea internal token ID
    gitea_user VARCHAR(128) NOT NULL,         -- Gitea username
    token_name VARCHAR(128) NOT NULL,         -- token name
    token_last_eight VARCHAR(8) NOT NULL,     -- last 8 characters of the SHA1 (for display)
    scopes JSONB DEFAULT '[]',                -- permission scopes
    api_key_id VARCHAR(48),                   -- associated API key ID (optional)
    last_verified TIMESTAMP,                  -- last verification time
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(gitea_user, token_name)
);
```

### 11.4 Token Scopes

| Scope | Description |
|-------|-------------|
| `read:repository` | Read repositories |
| `write:repository` | Write to repositories |
| `read:issue` | Read issues |
| `write:issue` | Write issues |
| `read:user` | Read user information |
| `write:user` | Modify user information |
| `read:organization` | Read organizations |
| `write:organization` | Modify organizations |
| `read:package` | Read packages |
| `write:package` | Publish packages |
| `read:notification` | Read notifications |
| `write:notification` | Modify notifications |
| `read:admin` | Administrator read |
| `write:admin` | Administrator write |

### 11.5 CLI Commands

#### Create a Token

```bash
# Basic usage
momentry gitea create \
  --username <gitea_user> \
  --password <gitea_password> \
  --token-name <token_name> \
  --scopes "read:repository,write:repository"

# Example: create a token for an integration
momentry gitea create \
  --username admin \
  --password "MyPassword123" \
  --token-name "ci-pipeline" \
  --scopes "read:repository,write:repository,read:issue,write:issue"
```

**Sample output:**
```
✅ Gitea Token created successfully!

┌─────────────────────────────────────────────────────────────────────────────┐
│ ⚠️ IMPORTANT: Save this token now - it will not be shown again! │
└─────────────────────────────────────────────────────────────────────────────┘

Token ID:   9
Token Name: ci-pipeline
SHA1:       9a4f282e9ba817b430082e6bff2c18e2ae38e480
Last 8:     ae38e480

Authorization Header:
Authorization: token 9a4f282e9ba817b430082e6bff2c18e2ae38e480
```

#### List Tokens

```bash
# List all tokens for a user
momentry gitea list \
  --username <gitea_user> \
  --password <gitea_password>
```

**Sample output:**
```
📋 Gitea Tokens for user: admin

┌──────────────────────────────────────────┐
│ ID │ Name        │ Last 8   │ Registered │
├──────────────────────────────────────────┤
│ 9  │ ci-pipeline │ ae38e480 │ ✓          │
│ 8  │ dev-token   │ 1234abcd │ -          │
└──────────────────────────────────────────┘

Total: 2 token(s)
```

#### Delete a Token

```bash
# Delete the specified token
momentry gitea delete \
  --username <gitea_user> \
  --password <gitea_password> \
  --token-name <token_name>
```

#### Query the Local Record

```bash
# Look up the record of a registered token
momentry gitea verify --token-name <token_name>
```

**Sample output:**
```
📋 Gitea Token: ci-pipeline
User: admin
Token ID: 9
Last 8: ae38e480
Scopes: ["read:repository","write:repository"]
Created: 2026-03-21 06:44:55.577586 UTC
Last Verified: never
```

### 11.6 Scope of Use

#### Suitable Scenarios

| Scenario | Description |
|----------|-------------|
| CI/CD integration | Create dedicated tokens for automated workflows |
| Service-to-service communication | Create tokens that let other services access the Gitea API |
| Development environments | Create short-lived tokens for developers |
| Monitoring integration | Create read-only tokens for monitoring and reporting |

#### Limitations

| Limitation | Description |
|------------|-------------|
| Plaintext token | Available only at creation; cannot be retrieved again |
| Management API | Requires username and password (BasicAuth) |
| Token verification | Validity can be checked only by making an API call |
| Deletion sync | Deleting the local record does not delete the token in Gitea |

### 11.7 Environment Variables

```bash
# Gitea connection settings
GITEA_URL=http://localhost:3000    # Gitea API URL
```

### 11.8 Security Considerations

| Item | Measure |
|------|---------|
| Password handling | Used only in the CLI command; never stored |
| Token storage | Only metadata is stored locally, never the plaintext |
| Least privilege | Grant only the scopes that are needed |
| Regular rotation | Renew tokens periodically |

---

## 12. n8n API Key Integration

### 12.1 Overview

n8n API keys can be created and managed through the API key management system, using the same "register at creation" model.

### 12.2 Registration Model

```
User supplies an existing n8n API key → call the n8n API to create a new key → plaintext shown only once → metadata saved to the management system
```

**Characteristics:**
- An existing n8n API key is required as the management credential
- The API key plaintext is available only at creation
- The management system records key metadata (no plaintext)
- Local lookup and deletion are supported

### 12.3 Database Structure

```sql
CREATE TABLE n8n_api_keys (
    id SERIAL PRIMARY KEY,
    n8n_key_id VARCHAR(64) UNIQUE NOT NULL,      -- n8n internal key ID
    label VARCHAR(100) NOT NULL,                 -- key label
    api_key_last_eight VARCHAR(8) NOT NULL,      -- last 8 characters of the API key (for display)
    momentry_api_key_id VARCHAR(48),             -- associated API key ID (optional)
    expires_at TIMESTAMP WITH TIME ZONE,         -- expiry time
    last_verified TIMESTAMP WITH TIME ZONE,      -- last verification time
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
```

### 12.4 Authentication

n8n uses JWT-based API keys, authenticated through the `X-N8N-API-KEY` header:

```bash
curl -H "X-N8N-API-KEY: <your-api-key>" https://n8n.example.com/api/v1/workflows
```

### 12.5 CLI Commands

#### Create an API Key

```bash
# Basic usage
momentry n8n create \
  --api-key <existing_n8n_api_key> \
  --label <key_label> \
  --expires-in-days <days>

# Example: create a key for CI/CD
momentry n8n create \
  --api-key "n8n_api_xxxxxxxxxxxx" \
  --label "ci-pipeline" \
  --expires-in-days 90
```

**Sample output:**
```
✅ n8n API Key created successfully!

┌─────────────────────────────────────────────────────────────────────────────┐
│ ⚠️ IMPORTANT: Save this API key now - it will not be shown again! │
└─────────────────────────────────────────────────────────────────────────────┘

Key ID: abc123-def456
Label: ci-pipeline
API Key: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...

Usage:
curl -H 'X-N8N-API-KEY: eyJhbGciOiJIUz...' https://n8n.momentry.ddns.net/api/v1/workflows
```

#### List API Keys

```bash
# List all API keys
momentry n8n list --api-key <existing_n8n_api_key>
```

**Sample output:**
```
📋 n8n API Keys

┌─────────────────────────────────┐
│ Label       │ ID                │
├─────────────────────────────────┤
│ ci-pipeline │ abc123-def456-789 │
│ monitoring  │ xyz789-abc123-456 │
└─────────────────────────────────┘

Total: 2 key(s)
```

#### Delete an API Key

```bash
# Delete the specified API key
momentry n8n delete \
  --api-key <existing_n8n_api_key> \
  --label <key_label>
```

#### Query the Local Record

```bash
# Look up the record of a registered API key
momentry n8n verify --label <key_label>
```

**Sample output:**
```
📋 n8n API Key: ci-pipeline
Key ID: abc123-def456
Last 8: ...JVCJ9
Created: 2026-03-21 06:44:55.577586 UTC
Expires: 2026-06-19 06:44:55.577586 UTC
Last Verified: never
```

### 12.6 Scope of Use

#### Suitable Scenarios

| Scenario | Description |
|----------|-------------|
| CI/CD integration | Create dedicated keys for automated workflows |
| Monitoring integration | Create read-only keys for monitoring workflow status |
| Service-to-service communication | Create keys that let other services call the n8n API |
| Development environments | Create short-lived keys for developers |

#### Limitations

| Limitation | Description |
|------------|-------------|
| Plaintext API key | Available only at creation; cannot be retrieved again |
| Management credential | Requires an existing n8n API key |
| Local deletion | Is not synchronized to n8n automatically |
| Permission scopes | No fine-grained permissions on non-Enterprise editions |

### 12.7 Environment Variables

```bash
# n8n connection settings
N8N_URL=https://n8n.momentry.ddns.net    # n8n API URL
```

### 12.8 Security Considerations

| Item | Measure |
|------|---------|
| Management key | Must be kept safe; it is the credential for managing other keys |
| API key storage | Only metadata is stored locally, never the plaintext |
| Expiry | Set an expiry time |
| Regular rotation | Renew keys periodically |

---

## 13. References

- PostgreSQL Schema
- Redis key design (MOMENTRY_CORE_REDIS_KEYS.md)
- Monitoring system (MOMENTRY_CORE_MONITORING.md)
- Gitea installation guide (INSTALL_GITEA.md)
- n8n API documentation (https://docs.n8n.io/api/authentication/)
133
docs_v1.0/DESIGN/ASR_MODEL_SELECTION_REPORT.md
Normal file
@@ -0,0 +1,133 @@
# ASR Model Selection Report

**Date:** 2026-05-10
**Video:** Charade (1963), 113min
**Test setup:** faster-whisper on M5 MacBook Pro (Apple Silicon, CPU int8)

## Test Clips

| Clip | Time range | Duration | Characteristics |
|------|-----------|----------|-----------------|
| A — Rapid | 25:40–28:40 | 3 min | Fast back-and-forth dialogue, Cary & Audrey |
| B — Normal | 10:00–13:00 | 3 min | Normal conversation pace |
| C — Complex | 73:20–76:20 | 3 min | Multi-person scene, background audio |

## Test Matrix

| Variable | Values |
|----------|--------|
| Model | tiny, base, small, medium, large-v3 |
| VAD min_silence | 200ms, 500ms |
| Beam size | 5 (fixed) |
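
One way such a matrix could be driven with faster-whisper is sketched below. The helper names `build_test_matrix` and `transcribe_clip` and the clip path are hypothetical, and the report does not say exactly how segments and characters were counted, so the counting here (segments returned, stripped text length) is an assumption.

```python
from itertools import product
from typing import Dict, List, Tuple

MODELS = ["tiny", "base", "small", "medium", "large-v3"]
VAD_MIN_SILENCE_MS = [200, 500]
BEAM_SIZE = 5  # fixed, as in the test matrix


def build_test_matrix() -> List[Tuple[str, int]]:
    """Every (model, VAD min_silence) combination: 5 x 2 = 10 runs per clip."""
    return list(product(MODELS, VAD_MIN_SILENCE_MS))


def transcribe_clip(path: str, model_name: str, min_silence_ms: int) -> Dict[str, int]:
    """Run one configuration and return segment and character counts."""
    # Imported lazily so the matrix helper works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_name, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(
        path,
        beam_size=BEAM_SIZE,
        vad_filter=True,
        vad_parameters={"min_silence_duration_ms": min_silence_ms},
    )
    segments = list(segments)  # transcription runs as the generator is consumed
    return {
        "segments": len(segments),
        "chars": sum(len(s.text.strip()) for s in segments),
    }
```

A driver would loop over `build_test_matrix()` for each clip, time each `transcribe_clip` call, and apply a timeout so that hung runs are recorded rather than blocking the batch.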

## Results Summary

### Clip A — Rapid Dialogue

| Model | VAD | Segments | Chars | Runtime | Δ chars vs best |
|-------|-----|----------|-------|---------|-----------------|
| tiny | 200 | **55** | **1618** | **4.8s** | — |
| tiny | 500 | **59** | 1582 | **4.8s** | −36 |
| base | 200 | 50 | 1543 | 9.7s | −75 |
| base | 500 | 51 | 1547 | 11.6s | −71 |
| small | 200 | 47 | 1538 | 15.0s | −80 |
| small | 500 | 47 | 1538 | 14.5s | −80 |
| medium | 200 | 45 | 1241 | 34.0s | −377 |
| medium | 500 | 45 | 1241 | 34.9s | −377 |
| large-v3 | 200 | 14 | 916 | 42.1s | −702 |
| large-v3 | 500 | 14 | 916 | 42.0s | −702 |

**Winner: tiny** (55–59 segments, most text captured, 4.8s, about 3× faster than small)

### Clip B — Normal Dialogue

| Model | VAD | Segments | Chars | Runtime | Δ chars vs best |
|-------|-----|----------|-------|---------|-----------------|
| tiny | 200 | 57 | 1875 | 11.9s | −40 |
| tiny | 500 | **59** | 1801 | 10.9s | −114 |
| base | 200 | 23 | 1695 | **5.1s** | −220 |
| base | 500 | 23 | 1695 | **5.1s** | −220 |
| small | 200 | **62** | 1731 | 15.7s | −184 |
| small | 500 | **62** | 1731 | 16.4s | −184 |
| medium | 200 | 59 | 1758 | 44.9s | −157 |
| medium | 500 | 59 | 1758 | 44.8s | −157 |
| large-v3 | 200 | 32 | **1915** | 95.6s | — |
| large-v3 | 500 | — | — | — | — (slow) |

**Winner: small** (62 segments, the most; good balance of speed and accuracy)
**Note:** large-v3 captured 1915 chars (most text) but took 95.6s (6× slower than small)

### Clip C — Complex Scene

| Model | VAD | Segments | Chars | Runtime | Δ chars vs best |
|-------|-----|----------|-------|---------|-----------------|
| tiny | 200 | 54 | 1817 | 12.2s | −336 |
| tiny | 500 | 52 | 1788 | 10.5s | −365 |
| base | 200 | 51 | 2018 | 10.1s | −135 |
| base | 500 | 51 | 2006 | 9.2s | −147 |
| small | 200 | **64** | 1902 | 22.5s | −251 |
| small | 500 | 61 | **2041** | 21.2s | −112 |
| medium | 200 | 57 | 2044 | 999.3s | −109 |
| medium | 500 | — | — | — | — (hang) |
| large-v3 | 200 | — | — | — | — (hang) |
| large-v3 | 500 | — | — | — | — (hang) |

**Winner: base** (51 segments, 2006–2018 chars, 9.2–10.1s; fastest reliable model)
**Note:** medium (VAD 500) and large-v3 hang or time out on the complex audio in this scene; medium (VAD 200) finished but took 999.3s

## Aggregate Scores

Weighted ranking (higher = better, equal weight: segment count, char count, inverse runtime):

| Model | Segments (avg) | Chars (avg) | Runtime (avg) | Score | Rank |
|-------|---------------|-------------|---------------|-------|------|
| **tiny** | 56.0 | 1730 | **9.2s** | **8.5** | 🥇 |
| **small** | 54.7 | 1704 | 17.6s | **7.8** | 🥈 |
| base | 41.5 | 1751 | 10.1s | 7.0 | 🥉 |
| medium | 51.5 | 1627 | 339.6s | 3.5 | 4 |
| large-v3 | 20.0 | 1249 | 68.8s | 2.0 | 5 |

## VAD Comparison (200ms vs 500ms)

Averaged across all models and clips:

| VAD | Segments | Chars | Runtime |
|-----|----------|-------|---------|
| 200ms | 45.9 | 1683 | 86.1s |
| 500ms | 46.6 | 1685 | 69.2s |

**Difference:** Negligible for segment and character counts (<2%). The runtime averages are not directly comparable, because some runs completed only at 200ms (large-v3 on Clip B, medium on Clip C).

## Conclusions

### 1. Smaller is better for this use case

Contrary to expectations, **tiny and small** outperform medium and large-v3 on the aggregate metrics for Charade's dialogue (large-v3 captured the most text only on Clip B):

| Metric | tiny | large-v3 | Δ |
|--------|------|----------|---|
| Segments/clip | 56 | 20 | **+180%** |
| Text captured | 98% | 72% | **+26 pts** |
| Speed | 9.2s | 68.8s | **7.5× faster** |

### 2. Large models lose text, not gain it

medium and large-v3 produce fewer, longer segments that **merge multiple utterances together**, resulting in less total text. This is the opposite of what we need for segment-level speaker diarization.

### 3. VAD parameter has minimal impact

Changing `min_silence_duration_ms` between 200 and 500 produces <2% difference in segment and character counts. The current default (500ms) is fine.

### 4. Recommendation

**Keep current model: faster-whisper small (VAD 500ms)**

| Reason | Detail |
|--------|--------|
| Segment quality | 47–64 segs/clip, clean sentence boundaries |
| Speed | 14–22s per 3-min clip (about 0.1× real time) |
| Stability | Never hangs, consistent across all scenes |
| Text capture | 90–98% of best model |
| Current integration | Already production-tested |

The missing-text problem for rapid dialogue cannot be solved by changing model size: even tiny captures more text than large-v3. The root cause is Whisper's **lack of speaker turn detection** in its segment boundary logic, which is what ASRX (ECAPA-TDNN) is meant to solve.
133
docs_v1.0/DESIGN/ASR_SEGMENTATION_ENHANCEMENT.md
Normal file
@@ -0,0 +1,133 @@
# ASR Segmentation Enhancement Report

**Date:** 2026-05-10
**Movie:** Charade (1963), 113 min
**Goal:** Fix merged-speaker segments in ASR output by detecting speaker change points within ASR segments.

## Problem

Whisper ASR produces segments at sentence boundaries, but during rapid back-and-forth dialogue (common in Charade), a single ASR segment may contain utterances from **multiple speakers**:

```
ASR segment [1550.0-1554.0] (4.0s):
  "What's she saying now?"

Actual dialogue:
  1552.7: Audrey: "What's she saying now?"
  1553.4: Cary: "That she's innocent."
```

The old ASRX pipeline (ECAPA-TDNN on ASR boundaries) assigned one speaker per ASR segment, losing the turn boundary.

## Solution: Sliding-Window Speaker Change Detection

### Detection Method

Instead of relying on ASR segment boundaries, we:

1. **Slide a 1.5s window (0.75s stride)** across the entire audio
2. **Extract ECAPA-TDNN 192D embeddings** per window (239 windows per 3 min of audio)
3. **Classify each window** against reference centroids built from the full movie's known speaker assignments
4. **Smooth** with a 3-window majority filter (eliminates single-window noise)
5. **Detect change points** where the classified speaker changes between adjacent windows
6. **Split** the original ASR segment at each change point
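
Steps 1, 4, and 5 can be sketched in plain Python once each window has a speaker label. The function names below are illustrative, and two details are assumptions not stated in the report: ties in the majority filter keep the original label, and a change point is placed at the start time of the first window with the new speaker.

```python
from collections import Counter
from typing import List


def window_starts(duration_s: float, win_s: float = 1.5, stride_s: float = 0.75) -> List[float]:
    """Start times of every full window that fits in the audio (step 1)."""
    count = int((duration_s - win_s) / stride_s) + 1
    return [i * stride_s for i in range(count)]


def majority_smooth(labels: List[str], k: int = 3) -> List[str]:
    """k-window majority filter over the unsmoothed labels (step 4)."""
    half = k // 2
    smoothed = []
    for i, label in enumerate(labels):
        neighbours = labels[max(0, i - half): i + half + 1]
        top, votes = Counter(neighbours).most_common(1)[0]
        # Assumption: without a strict majority, keep the original label.
        smoothed.append(top if votes > len(neighbours) / 2 else label)
    return smoothed


def change_points(labels: List[str], starts: List[float]) -> List[float]:
    """Times at which the speaker label differs from the previous window (step 5)."""
    return [starts[i] for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
```

For a 180 s clip, `window_starts(180.0)` yields 239 windows, matching the count quoted in step 2. The returned change points are the times at which an overlapping ASR segment would be split in step 6.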

### Reference Centroids

Built from the existing set of 3,417 ASRX embeddings:
- **Cary Grant**: centroid from 1420 known segments
- **Audrey Hepburn**: centroid from 1689 known segments
- **Unknown**: centroid from 308 segments (background/minor characters)

Classification uses cosine similarity to nearest centroid, giving ~0.8+ similarity for main characters.
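
Nearest-centroid classification by cosine similarity can be illustrated without any audio libraries. The sketch below uses toy 2-D vectors in place of the 192-D ECAPA-TDNN embeddings, and the function names `cosine`, `centroid`, and `classify` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0 or norm_b == 0 else dot / (norm_a * norm_b)


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of a speaker's known embeddings."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def classify(embedding: Sequence[float],
             centroids: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return the most similar speaker and the similarity score."""
    name = max(centroids, key=lambda key: cosine(embedding, centroids[key]))
    return name, cosine(embedding, centroids[name])
```

A real pipeline would likely also apply a minimum-similarity threshold before trusting a label; the report does not state one, so none is included here.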

### Validation: Gender Classification

Each speaker cluster was independently validated via gender classification:

| Cluster | Assigned | Voice Gender | Confidence |
|---------|----------|-------------|------------|
| SPEAKER_0 | Audrey Hepburn | FEMALE | 0.71 |
| SPEAKER_1 | Cary Grant | MALE | 0.71 |
| SPEAKER_2 | Unknown | MIXED | — |

Two small clusters (10 segments each) initially paired a MALE voice with an "Audrey" assignment. These were segments where a male voice speaks while Audrey is on screen (the old face-based matching was wrong). The fine-grained segmentation correctly resolves these.

### Results

| Metric | Before (ASR) | After (Fine) | Change |
|--------|-------------|-------------|--------|
| Total segments | 3,417 | **4,188** | **+771 (+22.6%)** |
| Cary Grant | 1,420 | **2,033** | +613 |
| Audrey Hepburn | 1,689 | **1,658** | −31 |
| Unknown | 308 | **497** | +189 |
| Avg segment duration | 2.0s | **1.6s** | −20% |

### Effect on Problem Zone (1544-1565s)

```
BEFORE — ASR segments (47 total for 3min clip):
  [1544.0-1546.0] "Who's that with the hat?"                → single speaker
  [1546.0-1548.0] "That's the policeman."                   → single speaker
  [1548.0-1550.0] "He wants to arrest Judy for Punch."      → single speaker
  [1550.0-1554.0] "What's she saying now?"                  → merged! multiple speakers
  [1554.0-1557.5] "That she's innocent. She didn't do it."  → merged
  [1557.5-1560.7] "Oh, she did it all right."               → merged
  ...

AFTER — Fine segments (64 total for 3min clip):
  [1550.3-1551.0] "He wants to arrest Judy..."              → Audrey Hepburn
  [1552.7-1553.4] "What's she saying now?"                  → Audrey Hepburn
  [1553.4-1554.2] "now? That"                               → Cary Grant
  [1554.2-1559.3] "That she's innocent. She didn't..."      → Cary Grant
  [1559.3-1560.5] "Oh, she did it all right."               → Audrey Hepburn
  [1560.5-1561.6] "right. I"                                → Cary Grant
  [1561.6-1562.8] "I believe her."                          → Cary Grant
```

12 long ASR segments (>3s) were detected; 78% were successfully split into multi-speaker groups.

### Text Acquisition

Split segments needed their own text (since the parent ASR segment's text covers a different time range). Three approaches were tested:

1. **Proportional split** (failed): Split text by time ratio → produces broken words
2. **Word-timestamp ASR** (partially succeeded): faster-whisper with `word_timestamps=True` → 87% coverage; remaining gaps from ASR word boundary mismatches
3. **Per-segment ASR** (fallback): Individual faster-whisper on empty segments → filled remaining 13%

Final result: **4,188/4,188 segments with text.**
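
The word-timestamp approach (2) amounts to mapping timed words onto the split segments. A minimal sketch follows; it assumes each word is assigned to the segment that contains its midpoint, which is one plausible rule rather than the one the pipeline necessarily uses, and the name `assign_words` is hypothetical.

```python
from typing import List, Tuple

Word = Tuple[float, float, str]   # (start, end, text)
Span = Tuple[float, float]        # (start, end)


def assign_words(words: List[Word], segments: List[Span]) -> List[str]:
    """Give each word to the segment containing its midpoint; join per segment."""
    texts: List[List[str]] = [[] for _ in segments]
    for w_start, w_end, w_text in words:
        midpoint = (w_start + w_end) / 2
        for index, (s_start, s_end) in enumerate(segments):
            if s_start <= midpoint < s_end:
                texts[index].append(w_text.strip())
                break
    return [" ".join(parts) for parts in texts]
```

Segments that come back as empty strings are the candidates for the per-segment ASR fallback in approach 3.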

### Voice Embeddings

ECAPA-TDNN 192D embeddings were extracted per segment:
- Runtime: 63s for 4,188 segments
- Stored in `asrx_fine.json` alongside segment metadata

### Data Files

| File | Size | Description |
|------|------|-------------|
| `asrx_fine.json` | ~45 MB | 4,188 fine segments + 4,188 embeddings |
| `asrx_fine.json → segments[].speaker_name` | — | Centroid-matched identity |
| `asrx_fine.json → segments[].speaker_id` | — | SPEAKER_0/1/2 |
| `asrx_fine.json → segments[].text` | — | ASR text (word-timestamp mapped) |
| `asrx_fine.json → embeddings[]` | — | 192D ECAPA-TDNN per segment |

### Continued Limitations

1. **Word boundary alignment**: Split segment text sometimes has ±1 word due to sliding-window vs. ASR boundary mismatch (cosmetic, not semantic)
2. **ASR merge in silence zones**: Very short utterances (<0.5s) merged into adjacent segments
3. **Background speakers**: Multiple background speakers grouped as "Unknown"

### Pipeline Integration

The `asrx_fine.json` file serves as the new ASRX output. The original `asr.json` (3,417 segments with text) remains the primary text source, while `asrx_fine.json` provides superior speaker diarization at 4,188 segments.

Speaker assignments in DB `dev.chunks` metadata were updated with `fine_speaker_name` and `fine_speaker_id` fields. Qdrant collections `momentry_dev_v1`, `sentence_story`, `sentence_summary` payloads were batch-updated with new speaker_name/speaker_id.

### Hardware & Performance

- Machine: M5 MacBook Pro, 48GB, Apple Silicon
- Model: faster-whisper small (int8 CPU)
- Embedding: ECAPA-TDNN via SpeechBrain
- Total processing time: ~5 min for the full 113-min movie
602
docs_v1.0/DESIGN/DETECTOR_REGISTRY.md
Normal file
@@ -0,0 +1,602 @@
|
||||
# Momentry Core — Detector Registry
|
||||
|
||||
**Date**: 2026-05-13
|
||||
**Version**: 1.0
|
||||
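**Purpose**: Consolidated reference for the coordinate conventions, transform chains, and verification status of every model/algorithm detector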
|
||||
|
||||
---
|
||||
|
||||
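## Principles

1. **One entry per detector**: Each entry independently records the input/output format, coordinate origin, units, and transform formula.
2. **Native coordinate system is stated**: Transforms are never hidden; any output that differs from Top-Left pixel must be listed explicitly.
3. **Transform chain is traceable**: Every step from the detector's raw output to the stored field is recorded.
4. **Three verification levels**: `verified` (tested) / `assumed` (inferred from documentation, not tested) / `buggy` (known to be wrong).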
|
||||
|
||||
---
|
||||
|
||||
## Category Overview

| Category | Count | Active | Experimental | Deprecated |
|----------|:----:|:------:|:----------:|:--------:|
| face | 8 | 2 | 4 | 2 |
| body | 3 | 1 | 2 | 0 |
| object | 4 | 1 | 3 | 0 |
| text | 3 | 1 | 2 | 0 |
| speech | 3 | 2 | 1 | 0 |
| scene | 2 | 1 | 0 | 1 |
| stamps | 2 | 0 | 2 | 0 |
| **Total** | **25** | **8** | **14** | **3** |

| Status | Definition |
|:------:|------|
| **Active** | Runs in the production pipeline, is registered in `ProcessorType`, and its output is consumed |
| **Experimental** | Standalone script or CLI, not wired into the pipeline; under evaluation or kept as a backup |
| **Deprecated** | Dropped after evaluation, or superseded by a newer version but not yet removed from the codebase |
|
||||
|
||||
---
|
||||
|
||||
## Pipeline Status Quick-Reference
|
||||
|
||||
| # | Detector ID | Short Name | Pipeline Status | Reason |
|---|-------------|-----------|:-----:|--------|
| 1 | DET-SCN-002 | PySceneDetect | active | CUT processor |
| 2 | DET-SCN-001 | Places365 | **active but rejected** ⚠️ | M5 eval rejected; never removed from ProcessorType |
| 3 | DET-SPCH-001 | faster-whisper | active | ASR processor |
| 4 | DET-SPCH-003 | ECAPA-TDNN | active | ASRX speaker embedding |
| 5 | DET-OBJ-001 | YOLOv8s | active | YOLO processor (v5nu→v8s, 2026-05-13) |
| 6 | DET-TEXT-001 | swift_ocr | active | OCR processor (primary) |
| 7 | DET-FACE-001/002/003 | swift_face + FaceNet | active | Face processor |
| 8 | DET-BODY-001/002 | swift_pose + YOLOv8-pose | active | Pose processor (primary + fallback) |
| 9 | DET-FACE-006 | AgglomerativeClustering | active | Identity Agent (post-processing) |
| 10 | DET-TEXT-005 | llama.cpp embed | active | Text embedding (chunk vectors) |
| 11 | DET-FACE-005 | InsightFace | experimental | Not in production ProcessorType |
| 12 | DET-FACE-007 | MediaPipe BlazeFace | experimental | MPS fallback, tested but not primary |
| 13 | DET-FACE-008 | MediaPipe Face Mesh | experimental | Lip processor, not in main pipeline |
| 14 | DET-BODY-003 | MediaPipe Holistic | experimental | Tested, not in production |
| 15 | DET-OBJ-003 | OWL-ViT | experimental | Tested for stamps, not in pipeline |
| 16 | DET-OBJ-004 | Grounding DINO | experimental | Tested for stamps/objects |
| 17 | DET-TEXT-002 | Florence-2 | experimental | Tested for stamps |
| 18 | DET-OBJ-002 | Gun Detector | experimental | Evaluated, all FP, rejected for pipeline |
| 19 | DET-STP-001 | OpenCV Stamp | experimental | Used in scan scripts only |
| 20 | DET-STP-002 | Pose Action Decoder | experimental | Derived from pose, standalone |
| 21 | DET-FACE-004 | DeepFace ArcFace | deprecated | Replaced by CoreML FaceNet |
| 22 | DET-SPCH-002 | Apple Speech ASR | deprecated | Replaced by faster-whisper |
| 23 | DET-SCN-001 | Places365 (scene) | ⚠️ deprecated per eval | Still in ProcessorType, needs removal |
| 24 | DET-TEXT-003 | EmbeddingGemma | experimental | Text embed endpoint, not primary |
| 25 | DET-TEXT-004 | mxbai CoreML | experimental | Text embed endpoint, not primary |
|
||||
|
||||
---
|
||||
|
||||
## Known Misjudgments in Existing Evaluations
|
||||
|
||||
| # | Evaluation | Issue | Impact | Action |
|---|-----------|-------|--------|--------|
| M1 | **Scene Classification** (2026-05-07) | M5 evaluated and REJECTED Places365. But it was never removed from `ProcessorType::all()`. Still runs on every file. | Wastes ~2min per registration. Produces meaningless scene.json. | Remove from pipeline or re-evaluate |
| M2 | **Face Processor** benchmark (2026-04-28) | Compared InsightFace vs MediaPipe vs OpenCV vs Contract v1. But the final pipeline uses **swift_face + FaceNet**, a completely different solution not in the benchmark. | Selection criteria from benchmark don't apply to actual pipeline detector. | Document the actual selection decision for swift_face |
| M3 | **Gun Detector** (2026-05-07) | Properly rejected: 7/7 FP. Correct decision. Model files still in repo. | No impact (correctly excluded). Clean up model files. | Archive or remove `models/gun/` |
| M4 | **OCR processor** | No selection document exists. swift_ocr chosen without comparison against EasyOCR/PaddleOCR. | Unknown if optimal. PaddleOCR fallback may never trigger. | Document selection decision |
|
||||
|
||||
---
|
||||
|
||||
### Technical Classification (Spatial Coordinates vs. None)

| Category | Count | Spatial coordinates | Embedding only | Temporal/text only |
|----------|:----:|:--------:|:----------:|:--------:|
| face | 8 | 5 | 3 | — |
| body | 3 | 3 | — | — |
| object | 4 | 4 | — | — |
| text | 3 | 1 | 2 | — |
| speech | 3 | — | 2 | 1 |
| scene | 2 | — | 1 | 1 |
| stamps | 2 | 2 | — | — |
| **Total** | **25** | **15** | **8** | **2** |
|
||||
|
||||
---
|
||||
|
||||
## Face Detectors

### DET-FACE-001 — Face Bbox (Apple Vision)

| Field | Value |
|-------|-------|
| **Framework** | Apple Vision |
| **Model** | `VNDetectFaceRectanglesRequest` |
| **Input** | `CVPixelBuffer` (BGRA, via CGImage) |
| **Output** | bbox: `x, y, width, height` |
| **Coordinate** | Input: normalized [0-1], origin **bottom-left** |
| **Transform** | `x = bb.origin.x * imgW` |
| | `y = (1.0 - bb.origin.y - bb.size.height) * imgH` |
| **Image size** | `cgImage.width / cgImage.height` |
| **Target** | Top-Left pixel integer |
| **File** | `scripts/swift_processors/swift_face.swift:134-136` |
| **Status** | ✅ verified (2026-05-13, landmark QC + visual check) |
|
||||
|
||||
---
|
||||
|
||||
### DET-FACE-002 — Face Landmarks (Apple Vision)

| Field | Value |
|-------|-------|
| **Framework** | Apple Vision |
| **Model** | `VNDetectFaceLandmarksRequest` |
| **Input** | `CVPixelBuffer` (BGRA, via CGImage) |
| **Output** | landmarks: `left_eye (6pt)`, `right_eye (6pt)`, `nose (8pt)`, `outer_lips`, `inner_lips` |
| **Coordinate** | Input: `VNFaceLandmarks2D.pointsInImage(imageSize:)` |
| | Returned: macOS AppKit convention → **bottom-left** origin ⚠️ |
| **Transform** | `y_top_left = imgH - $0.y` (Y-flip) |
| **Image size** | `cgImage.width / cgImage.height` |
| **Target** | Top-Left pixel float → JSON |
| **Pairing** | Not by array index. Landmark observations used as primary source (self-consistent bbox + landmarks). Face rect observations deduplicated via IoU > 0.3. |
| **File** | `scripts/swift_processors/swift_face.swift:155-184` |
| **Status** | ✅ verified (2026-05-13, Y-flip fix, 100% landmark-in-bbox) |
| **Bugs fixed** | BUG-001: index-based pairing (landmarkObs[idx] ≠ faceObs[idx]) |
| | BUG-002: macOS bottom-left Y axis (missing Y-flip) |
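
The IoU-based pairing described in the **Pairing** row can be sketched as follows. This is one possible reading, written in Python rather than the Swift of the actual processor: landmark observations are kept as-is, and a face-rect observation is added only when it does not overlap any landmark bbox by more than the threshold. `iou` and `merge_faces` are hypothetical helper names.

```python
def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes (top-left origin)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def merge_faces(landmark_boxes, rect_boxes, iou_threshold=0.3):
    """Keep every landmark bbox; append rect bboxes that duplicate none of them."""
    merged = list(landmark_boxes)
    for rect in rect_boxes:
        if all(iou(rect, lm) <= iou_threshold for lm in landmark_boxes):
            merged.append(rect)
    return merged
```

Because pairing is driven by overlap rather than array position, the two observation lists may arrive in any order.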
|
||||
|
||||
---
|
||||
|
||||
### DET-FACE-003 — Face Embedding (CoreML FaceNet)

| Field | Value |
|-------|-------|
| **Framework** | CoreML (ANE-accelerated) |
| **Model** | `models/facenet512.mlpackage` |
| **Input** | Face crop 160×160, RGB, normalized `[-1, 1]` |
| **Output** | 512-dim float embedding |
| **Coordinate** | N/A (no spatial output). Bbox from DET-FACE-001 used for crop. |
| **File** | `scripts/face_processor.py`, `scripts/embed_faces.py`, `scripts/tmdb_embed_extractor.py` |
| **Embedding space** | [-1, 1] per dimension, cosine similarity for matching |
| **Status** | ✅ verified (routinely used for identity matching) |
|
||||
|
||||
---
|
||||
|
||||
### DET-FACE-004 — Face Embedding (DeepFace ArcFace)

| Field | Value |
|-------|-------|
| **Framework** | DeepFace / TensorFlow |
| **Model** | `ArcFace` (512-dim) |
| **Input** | Face crop (from bbox), BGR, no explicit normalization |
| **Output** | 512-dim float embedding |
| **Coordinate** | N/A |
| **File** | `scripts/face_embedding_extractor.py` |
| **Status** | 🟡 assumed (legacy fallback, not primary pipeline) |
|
||||
|
||||
---
|
||||
|
||||
### DET-FACE-005 — Face Recognition (InsightFace)

| Field | Value |
|-------|-------|
| **Framework** | InsightFace / ONNX Runtime |
| **Model** | `buffalo_l` (detection + recognition + 5-point landmarks) |
| **Input** | Video frame (BGR, numpy array) |
| **Output** | `bbox: [x1, y1, x2, y2]` pixel int |
| | `landmarks: 5-point` (left_eye, right_eye, nose, mouth_left, mouth_right) |
| | `embedding: 512-dim float` |
| **Coordinate** | Bbox: **Top-Left pixel** (InsightFace native) |
| | Landmarks: **normalized [0-1]** to image size |
| **Transform** | Bbox: `face.bbox.astype(int)` — direct |
| | Landmarks: `kps * imgW, kps * imgH` — needs manual conversion ⚠️ |
| **File** | `scripts/face_recognition_processor.py:123-153` |
| **Status** | 🟡 assumed (landmark pixel conversion chain not independently verified) |
|
||||
|
||||
---
|
||||
|
||||
### DET-FACE-006 — Face Clustering (sklearn)

| Field | Value |
|-------|-------|
| **Framework** | sklearn |
| **Model** | `AgglomerativeClustering` |
| **Input** | 512-dim face embeddings from DET-FACE-003 or DET-FACE-004 |
| **Output** | cluster labels, centroids (512-dim float) |
| **Coordinate** | N/A (no spatial output) |
| **File** | `scripts/face_clustering_processor.py`, `scripts/identity_bind.py` |
| **Status** | ✅ verified (428 clusters for Charade, identity_bindings created) |
|
||||
|
||||
---
|
||||
|
||||
### DET-FACE-007 — Face Detection (MediaPipe BlazeFace)

| Field | Value |
|-------|-------|
| **Framework** | MediaPipe / MPS |
| **Model** | `blaze_face_short_range.tflite` |
| **Input** | Frame (numpy array / MPS image) |
| **Output** | `bbox: [x, y, width, height]` pixel |
| | `6 keypoints`: eyes, nose tip, mouth center, ear tragions — **pixel** |
| **Coordinate** | **Top-Left pixel** (MediaPipe native) |
| **Transform** | Direct, no conversion needed |
| **File** | `scripts/face_processor_mps.py` |
| **Status** | 🟡 assumed (MPS fallback, rarely used in pipeline) |
|
||||
|
||||
---
|
||||
|
||||
### DET-FACE-008 — Lip Detection (MediaPipe Face Mesh)

| Field | Value |
|-------|-------|
| **Framework** | MediaPipe |
| **Model** | `Face Mesh` (468 landmarks) |
| **Input** | Face crop or full frame |
| **Output** | `lip_openness: [0-1]` (vertical/mouth_width) |
| | `mouth keypoints`: indices 13, 14, 61, 291 from 468 mesh |
| **Coordinate** | Landmarks: **normalized [0-1]**, Top-Left origin |
| **Transform** | Normalized → pixel: `x * imgW, y * imgH` |
| | Lip openness: derived ratio, unitless |
| **File** | `scripts/lip_processor.py` |
| **Status** | 🟡 assumed |
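
A minimal sketch of the `vertical/mouth_width` ratio follows. It assumes indices 13/14 are the upper/lower inner-lip points and 61/291 the mouth corners, and that the result is clamped to `[0, 1]`; both are assumptions for illustration, and `lip_openness` is a hypothetical function, not the code in `scripts/lip_processor.py`.

```python
import math


def lip_openness(landmarks, img_w, img_h):
    """Ratio of vertical lip gap to mouth width, from normalized mesh landmarks.

    landmarks: mapping (or list) from mesh index to (x, y[, z]) in [0, 1].
    Assumed roles: 13 = upper lip, 14 = lower lip, 61 / 291 = mouth corners.
    """
    def to_pixel(index):
        x, y = landmarks[index][0], landmarks[index][1]
        return x * img_w, y * img_h  # normalized -> pixel, top-left origin

    vertical = math.dist(to_pixel(13), to_pixel(14))
    width = math.dist(to_pixel(61), to_pixel(291))
    if width == 0:
        return 0.0
    return min(1.0, vertical / width)  # clamp is an assumption for the [0-1] range
```

Converting to pixels before taking the ratio keeps the result independent of the frame's aspect ratio.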
|
||||
|
||||
---
|
||||
|
||||
## Body Pose Detectors

### DET-BODY-001 — Body Pose (Apple Vision)

| Field | Value |
|-------|-------|
| **Framework** | Apple Vision |
| **Model** | `VNDetectHumanBodyPoseRequest` |
| **Input** | `CGImage` (from frame export or NSImage) |
| **Output** | `19 keypoints`: nose, eyes, ears, neck, root, shoulders, elbows, wrists, hips, knees, ankles |
| | `bbox: [x, y, width, height]` derived from keypoint min/max |
| **Coordinate** | Input: normalized [0-1], origin **bottom-left** |
| **Transform** | ✅ `y = h - location.y * h` (Y-flip applied; current code matches the correct formula) |
| **Image size** | `cgImage.width / cgImage.height` |
| **Target** | Top-Left pixel float |
| **File** | `scripts/swift_processors/swift_pose.swift:154-159` |
| **Status** | ✅ verified (2026-05-13, Y-flip fix applied) |
|
||||
|
||||
---
|
||||
|
||||
### DET-BODY-002 — Body Pose (YOLOv8 Pose fallback)

| Field | Value |
|-------|-------|
| **Framework** | ultralytics / PyTorch |
| **Model** | `yolov8n-pose.pt` |
| **Input** | Frame (PIL or numpy) |
| **Output** | `17 COCO keypoints`: nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles |
| | `bbox: [x, y, width, height]` derived from keypoints (conf > 0.1) |
| **Coordinate** | **Top-Left pixel** (YOLO native, `.xy[0]` → numpy float) |
| **Transform** | Direct: `x, y = float(kps[j][0]), float(kps[j][1])` |
| | Bbox: `min(xs), min(ys), max(xs)-min(xs), max(ys)-min(ys)` |
| **File** | `scripts/pose_processor.py:78-97` |
| **Status** | ✅ top-left native |
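
The keypoint-derived bbox from the **Transform** row can be written out as a small sketch. It assumes keypoints are available as `(x, y, conf)` tuples in pixel units; `bbox_from_keypoints` is a hypothetical name and not necessarily how `scripts/pose_processor.py` structures it.

```python
def bbox_from_keypoints(keypoints, min_conf=0.1):
    """Derive [x, y, width, height] from keypoints whose confidence exceeds min_conf.

    keypoints: iterable of (x, y, conf) in top-left pixel coordinates.
    Returns None when no keypoint passes the confidence filter.
    """
    kept = [(x, y) for x, y, conf in keypoints if conf > min_conf]
    if not kept:
        return None
    xs = [x for x, _ in kept]
    ys = [y for _, y in kept]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]
```

Filtering by confidence first matters because undetected joints commonly come back near the origin and would otherwise stretch the box toward the top-left corner.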
|
||||
|
||||
---
|
||||
|
||||
### DET-BODY-003 — Full Body (MediaPipe Holistic)

| Field | Value |
|-------|-------|
| **Framework** | MediaPipe |
| **Model** | `Holistic` (pose + face mesh + hands) |
| **Input** | Frame (BGR numpy) |
| **Output** | `468 face mesh`: `[[x, y, z], ...]` normalized [0-1] |
| | `33 body pose`: `[[x, y, z, visibility], ...]` normalized [0-1] |
| | `21 hand × 2`: `[[x, y, z], ...]` normalized [0-1] |
| **Coordinate** | **normalized [0-1]**, Top-Left origin |
| **Transform** | `x * imgW, y * imgH` → pixel (if needed) |
| | Z: depth relative, not metric |
| **File** | `scripts/mediapipe_holistic_processor.py` |
| **Status** | ✅ top-left native, normalized→pixel straightforward |
|
||||
|
||||
---
|
||||
|
||||
## Object Detectors

### DET-OBJ-001 — Object Detection (YOLOv8s)

| Field | Value |
|-------|-------|
| **Framework** | ultralytics / CoreML + PyTorch fallback |
| **Model** | `yolov8s.mlpackage` (primary, CoreML ANE), `yolov8s.pt` (fallback) |
| **mAP (COCO)** | 44.9 (was 34.3 with YOLOv5nu, +31%) |
| **Input** | Frame (PIL or numpy) |
| **Output** | `bbox: [x1, y1, x2, y2]` — float pixel |
| | `class_name, class_id` (80 COCO classes) |
| | `confidence: [0-1]` |
| **Coordinate** | **Top-Left pixel** (YOLO `.xyxy[0]` → float) |
| **Transform** | Rust: `x = detection.x1 as i32, y = detection.y1 as i32` — **int truncation** |
| | `width = x2 - x1, height = y2 - y1` |
| **Image size** | YOLO auto-handles via ultralytics inference |
| **File** | `scripts/yolo_processor.py:272-285`, `src/core/processor/yolo.rs:83-117` |
| **Status** | ✅ verified (2026-05-13, replaced YOLOv5nu, +19% detections, scene indicators +162~+473%) |
| **Replaced** | YOLOv5nu (mAP 34.3, removed 2026-05-13) |
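
To make the effect of the integer truncation concrete, here is a Python approximation of the `xyxy` to `x, y, width, height` step. It assumes the width and height are computed from the float corners and then truncated, which the registry does not state explicitly; Python's `int()` truncates toward zero, like Rust's `as i32` for in-range values.

```python
def xyxy_to_xywh_int(x1, y1, x2, y2):
    """Convert float [x1, y1, x2, y2] to integer (x, y, width, height) by truncation.

    Assumption: width/height are derived from the float corners before truncating.
    """
    return int(x1), int(y1), int(x2 - x1), int(y2 - y1)
```

For example, `xyxy_to_xywh_int(10.9, 20.2, 110.4, 220.7)` drops the fractional parts, so the stored box can sit up to one pixel away from the float box on each edge.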
|
||||
|
||||
---
|
||||
|
||||
### DET-OBJ-002 — Weapon Detection (YOLOv8n Fine-tuned)

| Field | Value |
|-------|-------|
| **Framework** | ultralytics / PyTorch |
| **Model** | `models/gun/gun_detector/weights/best.pt` |
| **Input** | Frame (numpy array) |
| **Output** | `bbox: [x1, y1, x2, y2]` pixel |
| | `class: {0: grenade, 1: knife, 2: pistol, 3: rifle}` |
| **Coordinate** | **Top-Left pixel** (YOLO native) |
| **File** | `scripts/gun_detector_scan.py` |
| **Status** | ✅ top-left native |
|
||||
|
||||
---
|
||||
|
||||
### DET-OBJ-003 — Open-Vocabulary Detection (OWL-ViT)

| Field | Value |
|-------|-------|
| **Framework** | HuggingFace Transformers |
| **Model** | `google/owlvit-base-patch32` |
| **Input** | PIL Image + text queries |
| **Output** | `bbox, scores, labels` |
| **Coordinate** | post_process_object_detection returns boxes in `[x1, y1, x2, y2]` format |
| | scaled to `target_sizes` parameter |
| **Transform** | `target_sizes = torch.Tensor([image_pil.size[::-1]])` — PIL (w,h) → (h,w) |
| | `box.int().tolist()` or `box.tolist()` → Python list |
| **Format risk** | HuggingFace processor version may return `[cx, cy, w, h]` not `[x1,y1,x2,y2]` |
| **File** | `scripts/test_owl_vit_stamps.py:69-80`, `scripts/magnifying_glass_owl.py:65-77` |
| **Status** | 🟡 **assumed** (bbox format not independently verified with visual check) |
| **Verify** | Render bbox overlay on a known target image, confirm x1 < x2, y1 < y2 |
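
The numeric half of the **Verify** step could look like the sketch below. It only checks corner ordering and image bounds, so it is a necessary condition and not a proof: a `[cx, cy, w, h]` box can pass by coincidence, which is why the visual overlay is still needed. Both helpers are hypothetical and operate on plain Python lists, not on the HuggingFace output objects.

```python
def looks_like_xyxy(box, img_w, img_h):
    """True when box is a plausible [x1, y1, x2, y2] inside the image."""
    x1, y1, x2, y2 = box
    ordered = x1 < x2 and y1 < y2
    in_bounds = 0 <= x1 and x2 <= img_w and 0 <= y1 and y2 <= img_h
    return ordered and in_bounds


def cxcywh_to_xyxy(box):
    """Convert [cx, cy, w, h] (center format) to [x1, y1, x2, y2]."""
    cx, cy, w, h = box
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]
```

A box such as `[300, 200, 50, 40]` fails `looks_like_xyxy` and, if it really is center format, becomes `[275.0, 180.0, 325.0, 220.0]` after conversion.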
|
||||
|
||||
---
|
||||
|
||||
### DET-OBJ-004 — Open-Vocabulary Detection (Grounding DINO)

| Field | Value |
|-------|-------|
| **Framework** | HuggingFace Transformers |
| **Model** | `IDEA-Research/grounding-dino-base` |
| **Input** | PIL Image + text prompts |
| **Output** | `boxes, labels, scores` |
| **Coordinate** | processor rescales to `target_sizes`, returns pixel boxes |
| **Transform** | `target_sizes=[img.size[::-1]]` — PIL (w,h) → (h,w) |
| | `[round(v, 1) for v in dets["boxes"][i].tolist()]` |
| **Format risk** | `[::-1]` order depends on processor expectations. If processor expects (w,h), axes swapped. |
| **File** | `scripts/gdino_frame_api.py:176-180` |
| **Status** | 🟡 **assumed** (rescale direction not independently verified) |
| **Verify** | Single-frame output: check bbox x range ≤ imgW, y range ≤ imgH |
|
||||
|
||||
---
|
||||
|
||||
## Text / OCR Detectors

### DET-TEXT-001 — OCR (Apple Vision)

| Field | Value |
|-------|-------|
| **Framework** | Apple Vision |
| **Model** | `VNRecognizeTextRequest` (accurate/fast) |
| **Input** | `CVPixelBuffer` (via CGImage) |
| **Output** | `text: string`, `bbox: [x, y, w, h]`, `confidence: [0-1]` |
| **Coordinate** | Input: `VNRecognizedTextObservation.boundingBox` — normalized [0-1], origin **bottom-left** |
| **Transform** | ✅ `y = (1.0 - bb.origin.y - bb.size.height) * cgH` — Y-flip applied |
| **Image size** | Main loop: `cgImage.width / cgImage.height` ✅ |
| | `recognizeText()` helper: `CVPixelBufferGetWidth/Height` ✅ |
| **File** | `scripts/swift_processors/swift_ocr.swift:125-133`, `:181-182` |
| **Status** | ✅ verified (2026-05-13, Y-flip + image size fix applied) |
|
||||
|
||||
---
|
||||
|
||||
### DET-TEXT-002 — Open-Vocabulary (Florence-2)

| Field | Value |
|-------|-------|
| **Framework** | HuggingFace Transformers |
| **Model** | `microsoft/Florence-2-base` |
| **Input** | PIL Image + task prompt |
| **Output** | `bbox: [x1, y1, x2, y2]` pixel |
| | `label, text` (depending on task) |
| **Coordinate** | processor `post_process_generation` rescales to `image_size`, returns pixel |
| **Transform** | `x1, y1, x2, y2 = map(int, bbox)` — direct |
| | `image_size=(image_pil.width, image_pil.height)` — (w, h) order ✅ |
| **File** | `scripts/florence2_scan_stamps.py:67-79`, `scripts/test_florence2_direct.py` |
| **Status** | ✅ top-left native (HuggingFace post_process output) |
|
||||
|
||||
---
|
||||
|
||||
### DET-TEXT-003 — Text Embedding (EmbeddingGemma)

| Field | Value |
|-------|-------|
| **Framework** | HuggingFace / PyTorch MPS |
| **Model** | `google/embeddinggemma-300m` |
| **Input** | Text string |
| **Output** | Embedding vector (L2 normalized, dimension model-dependent) |
| **Coordinate** | N/A |
| **File** | `scripts/embeddinggemma_server.py` |
| **Status** | ✅ verified (embedding API server) |
|
||||
|
||||
---
|
||||
|
||||
## Text Embedding (Non-Detector)

### DET-TEXT-004 — Text Embedding (mxbai CoreML)

| Field | Value |
|-------|-------|
| **Framework** | CoreML (ANE-accelerated) |
| **Model** | `mxbai-embed-large-v1.mlpackage` |
| **Input** | Text tokenized |
| **Output** | Embedding vector |
| **Coordinate** | N/A |
| **File** | `scripts/coreml_embed_server.py` |
| **Status** | 🟡 assumed |
|
||||
|
||||
---
|
||||
|
||||
### DET-TEXT-005 — Text Embedding (Ollama / llama.cpp)

| Field | Value |
|-------|-------|
| **Framework** | llama.cpp / Ollama API |
| **Model** | llama.cpp embedding endpoint (port 11436) |
| **Input** | Text (optionally prefixed `search_document:`) |
| **Output** | 768-dim float embedding |
| **Coordinate** | N/A |
| **File** | `src/core/embedding/comic_embed.rs` |
| **Status** | ✅ verified (embedding pipeline) |
|
||||
|
||||
---
|
||||
|
||||
## Speech / Audio Detectors

### DET-SPCH-001 — ASR (faster-whisper)

| Field | Value |
|-------|-------|
| **Framework** | faster-whisper / CTranslate2 |
| **Model** | `faster-whisper/small` (int8 CPU) |
| **Input** | Audio extracted from video |
| **Output** | `[{start, end, text}, ...]` — temporal segments (seconds) |
| **Coordinate** | Temporal only (seconds), no spatial |
| **File** | `scripts/asr_processor.py` |
| **Status** | ✅ verified (ASR pipeline) |
|
||||
|
||||
---
|
||||
|
||||
### DET-SPCH-002 — ASR (Apple Speech)

| Field | Value |
|-------|-------|
| **Framework** | Apple Speech (ANE) |
| **Model** | `SFSpeechRecognizer` |
| **Input** | Audio file |
| **Output** | `[{start, end, text, confidence}, ...]` — temporal segments |
| **Coordinate** | Temporal only (seconds), no spatial |
| **File** | `scripts/swift_processors/asr_swift.swift` |
| **Status** | 🟡 assumed (Apple Speech quality lower than faster-whisper) |
|
||||
|
||||
---
|
||||
|
||||
### DET-SPCH-003 — Speaker Embedding (ECAPA-TDNN)

| Field | Value |
|-------|-------|
| **Framework** | SpeechBrain / PyTorch |
| **Model** | `speechbrain/spkrec-ecapa-voxceleb` |
| **Input** | Audio segments per speaker |
| **Output** | `192-dim float embedding` |
| **Coordinate** | N/A (vector space, cosine similarity) |
| **File** | `scripts/asrx_processor_custom.py`, `scripts/voice_embedding_extractor.py` |
| **Status** | ✅ verified (voice embeddings exported to SQLite + Qdrant) |
|
||||
|
||||
---
|
||||
|
||||
## Scene Detectors

### DET-SCN-001 — Scene Classification (Places365)

| Field | Value |
|-------|-------|
| **Framework** | CoreML (ANE) + PyTorch MPS fallback |
| **Model** | `resnet18_places365.mlpackage` |
| **Input** | Frame resized to 224×224 |
| **Output** | `[{scene_type, confidence, top_5}, ...]` — temporal segments |
| **Coordinate** | Temporal only, no spatial |
| **File** | `scripts/scene_classifier.py` |
| **Status** | ✅ verified |
|
||||
|
||||
---
|
||||
|
||||
### DET-SCN-002 — Scene Cut Detection (PySceneDetect)

| Field | Value |
|-------|-------|
| **Framework** | PySceneDetect |
| **Model** | `ContentDetector` (threshold-based frame difference) |
| **Input** | Video frames |
| **Output** | `[{scene_number, start_frame, end_frame, start_time, end_time}]` |
| **Coordinate** | Temporal (frames + seconds), no spatial |
| **File** | `scripts/cut_processor.py` |
| **Status** | ✅ verified |
|
||||
|
||||
---
|
||||
|
||||
## Stamp / Specific Target Detectors

### DET-STP-001 — Stamp Detection (OpenCV Color)

| Field | Value |
|-------|-------|
| **Framework** | OpenCV |
| **Model** | HSV color masking + contour analysis (rule-based, no ML) |
| **Input** | Frame (BGR numpy) |
| **Output** | `bbox: [x, y, w, h]` pixel |
| **Coordinate** | **Top-Left pixel** (`cv2.boundingRect()` native) |
| **Transform** | Direct, no conversion |
| **File** | `scripts/scan_full_video_stamps.py`, `scripts/find_blue_stamp_opencv.py` |
| **Status** | ✅ top-left native |
|
||||
|
||||
---
|
||||
|
||||
### DET-STP-002 — Pose Action Decoder (Coordinate-derived)

| Field | Value |
|-------|-------|
| **Framework** | Rule-based from keypoints |
| **Model** | N/A (derived from DET-BODY-001/002/003 keypoints) |
| **Input** | Pose keypoints (pixel) |
| **Output** | Action labels: turn_left, turn_right, look_up, look_down, shake_head, nod_head, blink, smile, etc. |
| **Coordinate** | Derived angles/ratios, no raw spatial output |
| **File** | `scripts/utils/pose_action_decoder.py`, `scripts/utils/integrated_body_action_decoder.py` |
| **Status** | 🟡 assumed (actions derived from pose keypoints; dependent on upstream keypoint correctness) |
| **Warning** | Affected by the DET-BODY-001 Y-flip bug (BUG-003, marked fixed 2026-05-13): all action labels are wrong when computed from unflipped Vision pose keypoints |
|
||||
|
||||
---
|
||||
|
||||
## Known Bugs Summary

| Bug ID | Detector | Issue | Impact | Fixed |
|:------|----------|-------|--------|:-----:|
| BUG-001 | DET-FACE-001/002 | Index-based landmark↔face pairing | Wrong landmarks assigned to wrong faces | ✅ 2026-05-13 |
| BUG-002 | DET-FACE-002 | macOS bottom-left → missing Y-flip | Landmarks 731px offset from bbox | ✅ 2026-05-13 |
| BUG-003 | DET-BODY-001 | Missing Y-flip on keypoints | All 19 joint Y coordinates inverted | ✅ 2026-05-13 |
| BUG-004 | DET-BODY-001 | Derived bbox Y inverted | Bbox doesn't cover actual person | ✅ 2026-05-13 |
| BUG-005 | DET-TEXT-001 | Missing Y-flip on bbox | Text bbox Y inverted | ✅ 2026-05-13 |
| BUG-006 | DET-TEXT-001 | Hardcoded 640×360 in `recognizeText()` | Wrong bbox scale for non-640×360 images | ✅ 2026-05-13 |
|
||||
|
||||
---
|
||||
|
||||
## Coordinate Convention Quick Reference

### Apple Vision (all detectors)

| Item | Convention |
|------|-----------|
| boundingBox origin | Bottom-Left |
| boundingBox units | normalized [0-1] |
| pointsInImage Y axis | Bottom-Left (macOS AppKit) |
| Required Y-flip formula | bbox: `y = (1 - y_norm - h_norm) * imgH` |
| | points: `y = imgH - raw_y` |
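
The two Y-flip formulas above translate directly into code. The sketch below is a Python rendering for illustration (the production processors are Swift); the function names are hypothetical, while the arithmetic is exactly the table's formulas.

```python
def vision_bbox_to_top_left(x_norm, y_norm, w_norm, h_norm, img_w, img_h):
    """Normalized bottom-left-origin bbox -> top-left-origin pixel (x, y, w, h)."""
    x = x_norm * img_w
    y = (1.0 - y_norm - h_norm) * img_h  # flip Y and move to the box's top edge
    return x, y, w_norm * img_w, h_norm * img_h


def vision_point_to_top_left(raw_x, raw_y, img_h):
    """Pixel point with bottom-left origin -> pixel point with top-left origin."""
    return raw_x, img_h - raw_y
```

Note that the bbox flip subtracts the box height as well: in a bottom-left system `y_norm` marks the box's bottom edge, while the target convention stores its top edge.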
|
||||
|
||||
### Non-Vision Detectors

| Framework | Origin | Units |
|-----------|:------:|-------|
| YOLO (ultralytics) | Top-Left | pixel float |
| MediaPipe | Top-Left | normalized [0-1] |
| InsightFace bbox | Top-Left | pixel int |
| InsightFace landmarks | Top-Left | normalized [0-1] |
| HuggingFace (post_process) | Top-Left | pixel (after rescale) |
| OpenCV | Top-Left | pixel int |
|
||||
|
||||
---
|
||||
|
||||
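## Governance Rules

1. **Adding a detector**: It must be registered in this Registry, including its coordinate system, transform formula, and file location.
2. **Coordinate changes**: Any change to a transform formula must be reflected in this document, with the change date noted.
3. **Verification requirement**: Every detector with spatial coordinates must pass at least one visual check (bbox/keypoints overlaid on the original image).
4. **Cross-detector comparison**: For bboxes that different detectors output on the same frame, IoU should be plausible (neither zero nor 1.0).
5. **Hard rule for Vision detectors**: Any detector that uses the Apple Vision framework must have its Y-flip confirmed as implemented.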
|
||||
|
||||
---
|
||||
|
||||
## Maintenance

- **Owner**: M5
- **Update frequency**: Whenever a processor is added or a coordinate transform is modified
- **Reference**: `SPATIAL_COORDINATE_REGISTRY.md` (parent coordinate system)
|
||||
238
docs_v1.0/DESIGN/DETECTOR_SELECTION_SOP.md
Normal file
@@ -0,0 +1,238 @@
|
||||
# Momentry Core — Detector Selection Standard Operating Procedure (SOP)
|
||||
|
||||
**Date**: 2026-05-13
|
||||
**Version**: 1.0
|
||||
**Ref**: `DETECTOR_REGISTRY.md`, `SPATIAL_COORDINATE_REGISTRY.md`
|
||||
|
||||
---
|
||||
|
||||
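## Purpose

Defines the process for adding, evaluating, selecting, and registering detectors (models/algorithms), so that every detector entering the production pipeline has been fully validated.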
|
||||
|
||||
---
|
||||
|
||||
## Selection Process (6 Phases)

```
Phase 1: Requirements definition → Phase 2: Candidate list → Phase 3: Benchmarking
  → Phase 4: Coordinate validation → Phase 5: Selection decision → Phase 6: Registration and governance
```
|
||||
|
||||
---
|
||||
|
||||
## Phase 1 — Requirements Definition

### 1.1 Output Specification

| Item | Required |
|------|:--:|
| Output type (bbox / landmarks / keypoints / embedding / label / text) | ✅ |
| Whether it has spatial coordinates | ✅ |
| Expected accuracy (e.g., IoU > 0.5 with ground truth) | ✅ |
| Expected speed (e.g., < 0.1s/frame on MPS) | ✅ |
| Expected memory (e.g., < 1GB) | ✅ |
| License constraints (MIT / Apache / GPL / commercial) | ✅ |

### 1.2 Input Specification

| Item | Required |
|------|:--:|
| Input type (frame image / audio / text) | ✅ |
| Whether preprocessing is needed (resize / crop / normalize) | ✅ |
| Required input size | ✅ |
|
||||
|
||||
---
|
||||
|
||||
## Phase 2 — Candidate List

### 2.1 Collection Criteria

Collect at least **3 candidates**, covering different technical approaches:

| Technical approach | Examples |
|---------|------|
| Apple Vision (ANE) | swift_face, swift_pose, swift_ocr |
| PyTorch / CoreML | YOLOv5n, FaceNet, ResNet18 |
| HuggingFace Transformers | OWL-ViT, Florence-2, Grounding DINO |
| Classical CV | OpenCV Haar, HSV masking |
| MediaPipe | BlazeFace, Holistic, Face Mesh |

### 2.2 Exclusion Criteria

A candidate is excluded and does not enter testing if any of the following holds:

- Incompatible license (GPL/AGPL is excluded when no commercial license is available)
- Known not to run on the target platform (e.g., CUDA-only on Mac)
- Not updated for more than 2 years (unless there is no alternative)
- Model size over 1GB (unless there is a strong justification)
|
||||
|
||||
---
|
||||
|
||||
## Phase 3 — Benchmarking

### 3.1 Test Items (all mandatory)

| # | Test item | Method | Minimum threshold |
|---|---------|------|:--:|
| T1 | **Processing speed** | Sample 100 frames from the same video and measure wall time | Within ±20% of the fastest candidate |
| T2 | **Peak memory** | Monitor with `psutil` and record the process RSS peak | < 2GB |
| T3 | **Detection rate** | Compute Precision/Recall against manually labeled ground truth (≥50 frames) | Recall > 0.6 |
| T4 | **False-alarm control (Precision)** | Precision = TP / (TP + FP), on the same ground truth as T3 | Precision > 0.3 (task-dependent) |
| T5 | **Output completeness** | Check that the output JSON conforms to the schema | 100% of fields present |
| **T6** | **Coordinate normalization** | ← **newly added, see Phase 4** | |
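
The T3 and T4 thresholds can be checked with a few lines of Python. This is a sketch under the assumption that true positives, false positives, and false negatives have already been counted against the ground truth; the function names are hypothetical, and the default thresholds are the ones listed in the table.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall); each is 0.0 when its denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def passes_t3_t4(tp, fp, fn, min_recall=0.6, min_precision=0.3):
    """Apply the T3 (Recall > 0.6) and T4 (Precision > 0.3) gates."""
    precision, recall = precision_recall(tp, fp, fn)
    return recall > min_recall and precision > min_precision
```

A detector whose detections are all false positives, such as the 7/7 FP gun detector result recorded in the registry, yields a precision of 0.0 and fails the gate.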
|
||||
|
||||
### 3.2 Benchmark Script Conventions

Each candidate group must produce:

```
output/benchmark/{category}/
├── BENCHMARK_REPORT.md          # human-readable report
├── BENCHMARK_REPORT.json        # machine-readable results
└── {scheme}_{detector}.json     # raw output of each candidate
```

Use the existing `*_benchmark_runner.py` template, or refer to `scripts/compare_*.py`.
|
||||
|
||||
---
|
||||
|
||||
## Phase 4 — 座標正規化校驗(T6)← 強制新增
|
||||
|
||||
### 4.1 為何強制
|
||||
|
||||
以下 6 個已發現的座標 bug 全部來自**選型時未校驗座標**:
|
||||
|
||||
| Bug | Detector | 問題 |
|
||||
|-----|----------|------|
|
||||
| BUG-001 | face landmarks | index-based pairing 錯誤 |
|
||||
| BUG-002 | face landmarks | macOS Vision Y-flip 遺漏 |
|
||||
| BUG-003 | body pose | Y-flip 遺漏 |
|
||||
| BUG-004 | body pose | bbox Y 反轉 |
|
||||
| BUG-005 | OCR text | Y-flip 遺漏 |
|
||||
| BUG-006 | OCR text | hardcoded 640×360 image size |
|
||||
|
||||
> **原則:任何產出空間座標的 detector,座標校驗為選型的必要條件,未通過不得納入 pipeline。**
|
||||
|
||||
### 4.2 校驗項目
|
||||
|
||||
| # | 項目 | 方法 | 門檻 |
|
||||
|---|------|------|:--:|
|
||||
| C1 | **原點確認** | 查閱 detector framework 文檔,記錄原始座標系(BL/TL/Center) | 必須明列 |
|
||||
| C2 | **軸向確認** | 同上,記錄 X/Y 軸方向(right-positive / down-positive) | 必須明列 |
|
||||
| C3 | **單位確認** | 記錄原始輸出單位(normalized [0-1] / pixel / 其他) | 必須明列 |
|
||||
| C4 | **Y-flip 驗證** | 對 Apple Vision detector 輸出 Y 值:若 face 在 frame 上半部,bbox y 應 < frame_height/2 | 必須 pass |
|
||||
| C5 | **bbox↔landmark 一致性** | 對同一 detection,檢查 ≥50% landmark 點在 bbox 內 | ≥90% faces pass |
|
||||
| C6 | **bbox 範圍檢查** | 確認 x ∈ [0, imgW], y ∈ [0, imgH], w > 0, h > 0 | 100% |
|
||||
| C7 | **跨 detector 對齊** | 同一 frame 的不同 detector bbox,IoU 應合理(置信度加權) | — |
|
||||
| C8 | **轉換鏈文件化** | 寫出完整的 E→P→A 座標轉換公式,含每一步的 image size 來源 | 必須完成 |
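
Checks C1 to C4 can be illustrated with a small conversion. The sketch below assumes a detector that reports a normalized `[0, 1]` bbox with a bottom-left origin (the convention Apple Vision uses) and converts it to top-left pixel coordinates; the function names are hypothetical.

```python
def vision_bbox_to_top_left_pixels(x, y, w, h, img_w, img_h):
    """Convert a normalized bottom-left-origin bbox to top-left-origin pixels.

    The Y-flip is the (1 - y - h) term: the top edge of the box in a
    bottom-left system is y + h, measured upward from the bottom.
    """
    return (x * img_w, (1.0 - y - h) * img_h, w * img_w, h * img_h)


def c4_upper_half_check(bbox_px, img_h):
    """C4: for a face in the upper half of the frame, bbox y must be < img_h / 2."""
    _, top, _, _ = bbox_px
    return top < img_h / 2
```

For a face occupying the upper half of a 640×360 frame (normalized `y = 0.5`, `h = 0.25`), the converted top edge is 90 px and C4 passes; skipping the flip would give 180 px and fail the same check, which is the symptom behind BUG-002, BUG-003 and BUG-005.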
|
||||
|
||||
### 4.3 校驗腳本
|
||||
|
||||
使用 `scripts/face_landmark_qc.py` 模式(可擴展到其他類別):
|
||||
|
||||
```python
|
||||
# 對每個 frame:
|
||||
# 1. 讀取 detector 輸出
|
||||
# 2. 檢查 x ∈ [0, imgW], y ∈ [0, imgH]
|
||||
# 3. 若有 landmarks: 檢查 ≥50% inside bbox
|
||||
# 4. 輸出 pass/fail report
|
||||
```
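
The four steps above can be sketched as follows. This is an illustrative outline, not the contents of `scripts/face_landmark_qc.py`: it assumes each detection is a dict with a pixel `bbox` of `(x, y, w, h)` in top-left coordinates and an optional `landmarks` list of `(x, y)` points, and all function names are hypothetical.

```python
def bbox_in_range(bbox, img_w, img_h):
    """C6: x in [0, img_w], y in [0, img_h], w > 0, h > 0."""
    x, y, w, h = bbox
    return 0 <= x <= img_w and 0 <= y <= img_h and w > 0 and h > 0


def landmarks_inside_ratio(bbox, landmarks):
    """Fraction of landmark points that fall inside the bbox."""
    x, y, w, h = bbox
    inside = sum(1 for lx, ly in landmarks
                 if x <= lx <= x + w and y <= ly <= y + h)
    return inside / len(landmarks)


def qc_frame(detections, img_w, img_h, min_inside=0.5):
    """Return one pass/fail record per detection in a frame."""
    report = []
    for det in detections:
        range_ok = bbox_in_range(det["bbox"], img_w, img_h)
        landmarks = det.get("landmarks") or []
        # Detections without landmarks only need to pass the range check.
        landmarks_ok = (not landmarks or
                        landmarks_inside_ratio(det["bbox"], landmarks) >= min_inside)
        report.append({"range_ok": range_ok,
                       "landmarks_ok": landmarks_ok,
                       "pass": range_ok and landmarks_ok})
    return report
```

The C5 threshold (≥90% of faces passing) would then be computed over the `pass` flags of all face detections in the sample.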
|
||||
|
||||
完成後在 `DETECTOR_REGISTRY.md` 中標記 `verified`。
|
||||
|
||||
---
|
||||
|
||||
## Phase 5 — 選型決策
|
||||
|
||||
### 5.1 評分矩陣
|
||||
|
||||
| 權重 | 維度 | 評分方式 |
|
||||
|:---:|------|---------|
|
||||
| 30% | 品質(Precision/Recall/準確度) | vs ground truth |
|
||||
| 25% | 速度(throughput) | ms/frame,越低越好 |
|
||||
| 15% | 座標正確性(C1-C8) | 全 pass = 滿分 |
|
||||
| 15% | Memory | MB peak,越低越好 |
|
||||
| 10% | 維護性(license, dep, 更新頻率) | 主觀評分 |
|
||||
| 5% | 輸出豐富度(額外資訊如 pose/age/gender) | 加分項 |
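
The matrix reduces to a weighted sum. The sketch below assumes every dimension has already been normalized to a 0 to 1 score where higher is better (so speed and memory must be inverted beforehand); the key names are hypothetical labels for the six rows.

```python
# Weights taken from the scoring matrix above; they sum to 1.0.
WEIGHTS = {
    "quality": 0.30,
    "speed": 0.25,
    "coordinates": 0.15,
    "memory": 0.15,
    "maintainability": 0.10,
    "richness": 0.05,
}


def total_score(scores: dict) -> float:
    """Weighted total for one candidate; every dimension must be present."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

Requiring all six dimensions keeps a candidate from gaining an advantage simply because one measurement was never taken.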
|
||||
|
||||
### 5.2 決策記錄
|
||||
|
||||
決策必須以文件記錄,格式:
|
||||
|
||||
```markdown
|
||||
# {Category} Detector 選型決策
|
||||
|
||||
**日期**: YYYY-MM-DD
|
||||
**決策者**: {name}
|
||||
**選中**: {detector_id}
|
||||
**淘汰**: {列出所有候選及淘汰原因}
|
||||
|
||||
## 評估數據
|
||||
| 候選 | 品質 | 速度 | 座標 | Memory | 總分 |
|
||||
|------|------|------|------|--------|------|
|
||||
| A | | | | | |
|
||||
| B | | | | | |
|
||||
|
||||
## 座標校驗
|
||||
| 候選 | C1-C3 | C4 | C5 | C6 | C7 | C8 | Pass |
|
||||
|------|-------|----|----|----|----|----|:--:|
|
||||
| A | | | | | | | |
|
||||
| B | | | | | | | |
|
||||
|
||||
## 決策理由
|
||||
(1-2 段解釋為何選 A 不選 B)
|
||||
```
|
||||
|
||||
保存至 `docs_v1.0/decisions/{YYYY-MM-DD}_{category}_detector_selection.md`。
|
||||
|
||||
---
|
||||
|
||||
## Phase 6 — 入庫納管
|
||||
|
||||
### 6.1 Registry 更新
|
||||
|
||||
選定後必須更新:
|
||||
|
||||
1. `DETECTOR_REGISTRY.md` — 新增 detector 條目(若未存在),狀態標 `verified`
|
||||
2. `SPATIAL_COORDINATE_REGISTRY.md` — 更新 E 層 + P 層校準路徑
|
||||
3. 在 `src/worker/processor.rs` 或對應呼叫處,新增註解標註 detector ID
|
||||
|
||||
### 6.2 Rollback 機制
|
||||
|
||||
若偵測到已部署 detector 有嚴重問題(如 BUG-003/004),執行:
|
||||
|
||||
1. 立即標記 `buggy` 在 `DETECTOR_REGISTRY.md`
|
||||
2. 修復後重新 build
|
||||
3. 更新 `SPATIAL_COORDINATE_REGISTRY.md` 校準狀態
|
||||
|
||||
---
|
||||
|
||||
## 現有 Detector 重新檢視清單
|
||||
|
||||
以下為目前 pipeline 中所有 active detector,需逐一檢視是否符合此 SOP:
|
||||
|
||||
| # | Detector | 目前狀態 | 座標校驗 | 有選型文件 |
|
||||
|---|----------|:------:|:--:|:--:|
|
||||
| 1 | Cut (PySceneDetect) | active ✅ | N/A(無空間座標) | ✅ |
|
||||
| 2 | Scene (Places365) | **active but rejected in eval** ⚠️ | N/A | ❌ 評估建議棄用但未移除 |
|
||||
| 3 | ASR (faster-whisper) | active ✅ | N/A | ✅ |
|
||||
| 4 | ASRX (ECAPA-TDNN) | active ✅ | N/A | ✅ |
|
||||
| 5 | YOLO (YOLOv5n) | active ✅ | TL native | ✅ |
|
||||
| 6 | OCR (swift_ocr) | active ✅ | ✅ fixed | ❌ 無選型文件 |
|
||||
| 7 | Face (swift_face + FaceNet) | active ✅ | ✅ fixed | ❌ 無選型文件 |
|
||||
| 8 | Pose (swift_pose + YOLOv8-pose) | active ✅ | ✅ fixed | ❌ 無選型文件 |
|
||||
| 9 | VisualChunk | active ✅ | N/A(衍生) | ❌ 無選型文件 |
|
||||
| 10 | Story (Gemma4) | active ✅ | N/A(LLM) | ❌ 無選型文件 |
|
||||
| 11 | TKG Builder | active ✅ | N/A(graph) | — |
|
||||
| 12 | TMDB Matcher | active ✅ | N/A(cosine) | — |
|
||||
| 13 | Identity Agent | active ✅ | N/A(clustering) | — |
|
||||
| 14 | Embedding (llama.cpp) | active ✅ | N/A(vector) | ✅ |
|
||||
|
||||
---
|
||||
|
||||
## 維護
|
||||
|
||||
- **Owner**: M5
|
||||
- **更新頻率**: 每次新增 detector 時
|
||||
- **稽核**: 每季度檢視一次所有 active detector 是否仍符合品質標準
|
||||
187
docs_v1.0/DESIGN/DOCUMENT_EMBEDDING_STRATEGY.md
Normal file
@@ -0,0 +1,187 @@
|
||||
---
|
||||
document_type: "reference_doc"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "Document Embedding Strategy - Parent-Child Chunks"
|
||||
date: "2026-03-23"
|
||||
version: "V1.0"
|
||||
status: "active"
|
||||
owner: "Warren"
|
||||
created_by: "OpenCode"
|
||||
tags:
|
||||
- "embedding"
|
||||
- "chunks"
|
||||
- "strategy"
|
||||
- "document"
|
||||
ai_query_hints:
|
||||
- "查詢 Document Embedding Strategy - Parent-Child Chunks 的內容"
|
||||
- "Document Embedding Strategy - Parent-Child Chunks 的主要目的是什麼?"
|
||||
- "如何操作或實施 Document Embedding Strategy - Parent-Child Chunks?"
|
||||
---
|
||||
|
||||
# Document Embedding Strategy - Parent-Child Chunks
|
||||
|
||||
| Item | Content |
|
||||
|------|---------|
|
||||
| Author | Warren |
|
||||
| Created | 2026-03-23 |
|
||||
| Document Version | V1.0 |
|
||||
|
||||
---
|
||||
|
||||
## Version History
|
||||
|
||||
| Version | Date | Purpose | Operator | Tool/Model |
|
||||
|---------|------|---------|----------|------------|
|
||||
| V1.0 | 2026-03-23 | Create document embedding strategy | Warren | OpenCode |
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Momentry uses a **parent-child chunk hierarchy** for improved RAG retrieval. This document describes the embedding strategy for this hierarchy.
|
||||
|
||||
## Chunk Structure
|
||||
|
||||
### Parent Chunk
|
||||
- **Purpose**: Summarize multiple child chunks with narrative description
|
||||
- **Content**: High-level description of multiple scenes/segments
|
||||
- **Example**:
|
||||
```json
|
||||
{
|
||||
"chunk_id": "story_asr_0000",
|
||||
"chunk_type": "story",
|
||||
"text_content": "[0s-125s] A man enters a building. He walks down a hallway.",
|
||||
"child_chunk_ids": ["asr_0001", "asr_0002", "asr_0003", "asr_0004", "asr_0005"]
|
||||
}
|
||||
```
|
||||
|
||||
### Child Chunk
|
||||
- **Purpose**: Individual segments from ASR, scenes from CUT, etc.
|
||||
- **Content**: Raw transcription or detection results
|
||||
- **Example**:
|
||||
```json
|
||||
{
|
||||
"chunk_id": "asr_0001",
|
||||
"chunk_type": "sentence",
|
||||
"text_content": "Hello world",
|
||||
"parent_chunk_id": "story_asr_0000"
|
||||
}
|
||||
```
|
||||
|
||||
## Embedding Strategy
|
||||
|
||||
### For Vector Search
|
||||
|
||||
When embedding chunks for vector search, we combine **parent description + child content** to provide both context and detail.
|
||||
|
||||
#### Parent Chunk Embedding
|
||||
```
|
||||
embedding_text = f"Summary: {parent.text_content}\nChildren: {child_text_1}. {child_text_2}. {child_text_3}..."
|
||||
```
|
||||
|
||||
**Prefix**: `search_document:` (for documents in Qdrant)
|
||||
|
||||
**Example**:
|
||||
```
|
||||
search_document: Summary: A man enters a building. He walks down a hallway.
|
||||
Children: Hello, how are you? I'm fine thank you. The weather is nice today.
|
||||
```
|
||||
|
||||
#### Child Chunk Embedding
|
||||
```
|
||||
embedding_text = f"[{child.chunk_type}] {child.text_content}\nParent: {parent.text_content}"
|
||||
```
|
||||
|
||||
**Prefix**: `search_document:`
|
||||
|
||||
**Example**:
|
||||
```
|
||||
search_document: [sentence] Hello, how are you?
|
||||
Parent: A man enters a building. He walks down a hallway.
|
||||
```
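
The two templates can be reproduced in Python for quick experiments. This is a sketch under stated assumptions: chunks are plain dicts with the `chunk_type` and `text_content` fields shown earlier, child texts are joined with a single space as in the parent example, and the `search_document: ` prefix is prepended to the text before embedding.

```python
def build_embedding_text(chunk, parent_text=None, children_text=None):
    """Build the text sent to the embedding model for one chunk."""
    if chunk["chunk_type"] == "story":
        # Parent chunk: summary first, then the concatenated child texts.
        children = " ".join(children_text or [])
        body = f"Summary: {chunk['text_content']}\nChildren: {children}"
    else:
        # Child chunk: own content first, then the parent summary as context.
        body = (f"[{chunk['chunk_type']}] {chunk['text_content']}\n"
                f"Parent: {parent_text or 'N/A'}")
    return f"search_document: {body}"
```

Putting the chunk's own content before the context keeps the most specific text at the start, which matters if the model truncates long inputs.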
|
||||
|
||||
### For BM25 Text Search
|
||||
|
||||
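Keyword ("BM25") search runs on raw text through PostgreSQL full-text search. Note that `ts_rank_cd()` is a cover-density ranking rather than a true BM25 implementation, so the BM25 score in this document means the PostgreSQL full-text rank.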
|
||||
|
||||
- **Index**: `search_vector` (TSVECTOR) on `chunks.text_content`
|
||||
- **Search**: Uses `ts_rank_cd()` for ranking
|
||||
|
||||
## Hybrid Search Ranking
|
||||
|
||||
Combined score = `(vector_score * 0.7) + (bm25_score * 0.3)`
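
A minimal sketch of the combination, assuming both scores have already been normalized to the same 0 to 1 range (raw `ts_rank_cd()` values and cosine similarities are not directly comparable). The function and field names are hypothetical.

```python
def hybrid_score(vector_score, bm25_score, vector_weight=0.7):
    """Weighted sum of the two scores; the keyword weight is 1 - vector_weight."""
    if not 0.0 <= vector_weight <= 1.0:
        raise ValueError("vector_weight must be within [0, 1]")
    return vector_score * vector_weight + bm25_score * (1.0 - vector_weight)


def rank(candidates, vector_weight=0.7):
    """Sort candidate dicts by hybrid score, best first."""
    return sorted(
        candidates,
        key=lambda c: hybrid_score(c["vector_score"], c["bm25_score"], vector_weight),
        reverse=True,
    )
```

Exposing `vector_weight` as a parameter lets the caller shift the balance per query without touching the ranking code.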
|
||||
|
||||
### Why 0.7/0.3?
|
||||
|
||||
| Aspect | Vector | BM25 |
|
||||
|--------|--------|------|
|
||||
| Pros | Semantic similarity | Exact keyword match |
|
||||
| Cons | May miss specific terms | No semantic understanding |
|
||||
| Best for | Thematic queries | Fact lookup |
|
||||
|
||||
## Query Patterns
|
||||
|
||||
### Thematic Query ("What are the main themes?")
|
||||
- Use higher `vector_weight` (0.8-0.9)
|
||||
- Vector search finds semantically similar content
|
||||
|
||||
### Fact Lookup ("Who said X?")
|
||||
- Use higher `bm25_weight` (0.5-0.7)
|
||||
- BM25 finds exact matches
|
||||
|
||||
### Balanced ("Tell me about scene 5")
|
||||
- Use default 0.7/0.3
|
||||
|
||||
## Implementation
|
||||
|
||||
### Embedding Generation
|
||||
```rust
|
||||
fn build_embedding_text(chunk: &Chunk, parent_text: Option<&str>) -> String {
|
||||
match chunk.chunk_type {
|
||||
ChunkType::Story => {
|
||||
format!(
|
||||
"Summary: {}\nChildren: {}",
|
||||
chunk.text_content,
|
||||
get_children_text(chunk)
|
||||
)
|
||||
}
|
||||
_ => {
|
||||
format!(
|
||||
"[{}] {}\nParent: {}",
|
||||
chunk.chunk_type.as_str(),
|
||||
chunk.text_content,
|
||||
parent_text.unwrap_or("N/A")
|
||||
)
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Storage
|
||||
- Parent chunks stored with their `child_chunk_ids`
|
||||
- Child chunks reference `parent_chunk_id`
|
||||
- Both stored in PostgreSQL with full-text index
|
||||
- Vectors stored in Qdrant
|
||||
|
||||
## Example Flow
|
||||
|
||||
1. **Story Processing** generates parent-child hierarchy
|
||||
2. **Embedding** creates vector for each chunk
|
||||
3. **Storage** saves to PostgreSQL + Qdrant
|
||||
4. **Search** retrieves using hybrid search
|
||||
5. **Results** include both parent context and child details
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Chunk Size**: 5 child chunks per parent (configurable)
|
||||
2. **Text Length**: Keep embeddings under 512 tokens
|
||||
3. **Parent Description**: Include temporal markers (timestamps)
|
||||
4. **Child Content**: Preserve original transcription
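
The first practice (five children per parent) can be sketched as a simple grouping step. The ID format follows the `story_asr_0000` example earlier in the document; the function name and the `prefix` parameter are hypothetical, and the parent summary text would be filled in later by story processing.

```python
def group_children(child_ids, group_size=5, prefix="story_asr_"):
    """Group consecutive child chunk IDs into parent chunk skeletons."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    parents = []
    for start in range(0, len(child_ids), group_size):
        parents.append({
            "chunk_id": f"{prefix}{start // group_size:04d}",
            "chunk_type": "story",
            "child_chunk_ids": child_ids[start:start + group_size],
        })
    return parents
```

The last parent may hold fewer than `group_size` children when the total is not a multiple of it.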
|
||||
|
||||
## Future Enhancements
|
||||
|
||||
- [ ] GraphRAG integration for relationship traversal
|
||||
- [ ] Cross-chunk entity linking
|
||||
- [ ] Temporal graph building
|
||||
120
docs_v1.0/DESIGN/Face_Pipeline.md
Normal file
@@ -0,0 +1,120 @@
|
||||
# Face Pipeline: Detection → Clustering → Trace
|
||||
|
||||
**Date**: 2026-05-16
|
||||
|
||||
---
|
||||
|
||||
## 流程
|
||||
|
||||
```
|
||||
Video Frames
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────┐
|
||||
│ 0. Cut Detection │ PySceneDetect
|
||||
│ scene boundaries │ → chunk (chunk_type='cut')
|
||||
└─────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────┐
|
||||
│ 1. Face Detection │ 每幀偵測人臉
|
||||
│ confidence ≥ 0.5 │ → face_detections (cut_id 對應所屬 cut)
|
||||
└─────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────┐
|
||||
│ 2. Face Clustering │ embedding + IoU + distance
|
||||
│ trace_id assignment │ 同一人 + 同 cut → 同一 trace_id
|
||||
│ per-file sequential │ trace_id 跨 cut 持續給號(不歸零)
|
||||
└─────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────┐
|
||||
│ 3. Face Trace │ 跨影格連續追蹤
|
||||
│ per-file sequential │ trace_id = 0, 1, 2, ...
|
||||
│ scoped by cut │ 每個 trace 完全落在一個 cut 內
|
||||
└─────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────┐
|
||||
│ 4. Identity Binding │ embedding 比對
|
||||
│ identity_id assignment │ → known person / stranger
|
||||
└─────────────────────────────┘
|
||||
```
|
||||
|
||||
## scope
|
||||
|
||||
```
|
||||
trace_id → per-file sequential (file_uuid, trace_id) 唯一
|
||||
cut_id → chunk.id WHERE chunk_type='cut' 輔助 scope,不影響唯一性
|
||||
identity_id → global FK 跨 cut / file 關聯同一人
|
||||
```
|
||||
|
||||
## 約束
|
||||
|
||||
| 約束 | 說明 |
|
||||
|------|------|
|
||||
| 唯一 | `(file_uuid, trace_id)` |
|
||||
| 單一 cut | 每個 trace 完全落在一個 cut 內(`0` 個跨 cut trace) |
|
||||
| 獨立 | `trace_id` ≠ `identity_id`。前者是物體軌跡,後者是身份分別 |
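
The single-cut constraint is easy to audit after clustering. The sketch below assumes the `face_detections` rows are available as `(file_uuid, trace_id, cut_id)` tuples; the function name is hypothetical.

```python
from collections import defaultdict


def cross_cut_traces(rows):
    """Return the (file_uuid, trace_id) keys whose faces span more than one cut."""
    cuts_per_trace = defaultdict(set)
    for file_uuid, trace_id, cut_id in rows:
        cuts_per_trace[(file_uuid, trace_id)].add(cut_id)
    return sorted(key for key, cuts in cuts_per_trace.items() if len(cuts) > 1)
```

An empty result corresponds to the "0 cross-cut traces" condition in the table.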
|
||||
|
||||
## 各階段資料量
|
||||
|
||||
```
|
||||
Stage | 量 | Key
|
||||
------------------------|-------------|----------------------
|
||||
Raw faces | 262,021 | face_detections rows
|
||||
After clustering | 6,892 | distinct trace_id
|
||||
With identity | 147,602 | identity_id NOT NULL (2,035 identities)
|
||||
Stranger (unbound) | 114,419 | identity_id IS NULL
|
||||
```
|
||||
|
||||
## Trace 大小分布
|
||||
|
||||
| Faces per trace | Trace count | 說明 |
|
||||
|:---------------:|:-----------:|------|
|
||||
| 1 | 610 | 一閃而過 |
|
||||
| 2-5 | 969 | 短暫出現 |
|
||||
| 6-20 | 1,541 | 片段 |
|
||||
| 21-100 | 2,218 | 一般 |
|
||||
| 101+ | 1,554 | 主要角色 |
|
||||
|
||||
## Clustering 方式
|
||||
|
||||
Face Tracker (`scripts/face_tracker.py`) 使用三種方法決定同一人:
|
||||
|
||||
1. **IoU (Intersection over Union)** — 前後影格框重疊率
|
||||
2. **Cosine distance** — face embedding 相似度
|
||||
3. **Euclidean distance** — bbox 中心距離
|
||||
|
||||
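The three signals are combined with a boolean rule rather than a weighted sum: `iou > 0.5 || (cosine < 0.3 && distance < 100px)`, where `cosine` is the cosine distance between face embeddings and `distance` is the bbox-center distance.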
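
The decision rule can be sketched in Python as below. This is not the code of `scripts/face_tracker.py`; it assumes bboxes are `(x, y, w, h)` in pixels, embeddings are plain lists of floats, and the function names are hypothetical.

```python
import math


def iou(a, b):
    """Intersection over Union of two (x, y, w, h) boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def cosine_distance(u, v):
    """1 - cosine similarity; zero-length vectors count as maximally distant."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def center_distance(a, b):
    """Euclidean distance between the centers of two boxes, in pixels."""
    return math.hypot((a[0] + a[2] / 2) - (b[0] + b[2] / 2),
                      (a[1] + a[3] / 2) - (b[1] + b[3] / 2))


def same_trace(prev, cur):
    """iou > 0.5 || (cosine < 0.3 && distance < 100px)"""
    if iou(prev["bbox"], cur["bbox"]) > 0.5:
        return True
    return (cosine_distance(prev["embedding"], cur["embedding"]) < 0.3
            and center_distance(prev["bbox"], cur["bbox"]) < 100)
```

The IoU branch handles slow motion between adjacent frames, while the embedding-plus-distance branch recovers a face that moved too far for the boxes to overlap.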
|
||||
|
||||
## Trace 結構
|
||||
|
||||
```json
|
||||
{
|
||||
"trace_id": 2, // per-file sequential
|
||||
"faces": [ // face_detections GROUP BY trace_id
|
||||
{"face_id": "4587_0", "frame": 4587, "confidence": 0.92},
|
||||
{"face_id": "4588_0", "frame": 4588, "confidence": 0.91},
|
||||
...
|
||||
],
|
||||
"start_frame": 4587,
|
||||
"end_frame": 4722,
|
||||
"face_count": 46,
|
||||
"identity_id": 101 // NULL = stranger
|
||||
}
|
||||
```
|
||||
|
||||
## API 查詢
|
||||
|
||||
```bash
|
||||
# Trace 列表(含 face_count、區間)
|
||||
POST /api/v1/file/:uuid/face_trace/sortby
|
||||
|
||||
# Trace 內 faces(逐幀 + 可選 interpolation)
|
||||
GET /api/v1/file/:uuid/trace/:trace_id/faces
|
||||
|
||||
# Trace 綁定身份
|
||||
POST /api/v1/identity/:uuid/bind
|
||||
```
|
||||
45
docs_v1.0/DESIGN/GUN_DETECTION_REPORT.md
Normal file
@@ -0,0 +1,45 @@
|
||||
# 槍枝檢測模型 Charade 評估報告
|
||||
|
||||
**Date:** 2026-05-10
|
||||
**模型:** YOLOv8n fine-tuned on Roboflow gun dataset (905 images)
|
||||
**Classes:** grenade (0), knife (1), pistol (2), rifle (3)
|
||||
**Weights:** `models/gun/gun_detector/weights/best.pt` (6MB)
|
||||
|
||||
## 訓練
|
||||
|
||||
- **Dataset**: 905 images, Roboflow CC BY 4.0
|
||||
- **Validation mAP50**: 0.813
|
||||
- **問題**: 訓練資料全為近距離槍枝特寫,與 Charade 電影中的中遠景畫面分布完全不同
|
||||
|
||||
## Charade 測試結果
|
||||
|
||||
### 系統掃描(24 取樣點 @ 每 300s)
|
||||
|
||||
| 時間 | 類別 | 信心 | 判定 |
|
||||
|------|------|------|------|
|
||||
| t=600s | pistol×2, rifle | 0.16–0.30 | ❌ FP |
|
||||
| t=1200s | knife | 0.37 | ❌ FP |
|
||||
| t=1800s | pistol | 0.19 | ❌ FP |
|
||||
| t=2400s | knife | 0.18 | ❌ FP |
|
||||
| t=3000s | pistol | 0.16 | ❌ FP |
|
||||
| t=5400s | pistol×2 | 0.45, 0.17 | ❌ FP(郵票被誤判為槍) |
|
||||
| t=6600s | grenade | 0.22 | ❌ FP |
|
||||
|
||||
### 密集掃描(ASR trigger)
|
||||
|
||||
在 ASR dialogue 提到 "gun" 的時間點附近跑 gun detector,找到 5 個 pistol/gun 觸發(3188s / 5461s / 6309s / 6377s / 6479s),confidence 0.300-0.387。
|
||||
|
||||
**結果:全部為 false positive。** 訓練效果非常不好 — 模型在電影中遠景畫面完全失效。
|
||||
|
||||
## 結論
|
||||
|
||||
1. 訓練資料與推論場景 distribution mismatch 嚴重
|
||||
2. 905 張 Roboflow 近距離特寫 → Charade 的中遠景手持/部分遮蔽槍枝 → 模型無法泛化
|
||||
3. 建議:收集電影真實槍枝畫面(200-500 張動作片片段)重新訓練
|
||||
4. 在此之前,槍枝搜尋只能靠 ASR dialogue keyword matching + 人工確認
|
||||
|
||||
## 相關檔案
|
||||
|
||||
- `models/gun/gun_detector/weights/best.pt` — 模型權重(效果不佳)
|
||||
- `output_dev/gun_detections/` — 偵測截圖(全部 FP)
|
||||
- `scripts/object_search_agent.py` — 整合搜尋 agent(gun detector 偵測結果僅供參考)
|
||||
73
docs_v1.0/DESIGN/GUN_DETECTOR_SCAN_REPORT.md
Normal file
@@ -0,0 +1,73 @@
|
||||
# Gun Detector Scan Report — YOLOv8n on Charade (1963)
|
||||
|
||||
**Date:** 2026-05-10
|
||||
**Model:** `models/gun/gun_detector/weights/best.pt`
|
||||
**Base:** YOLOv8n fine-tuned on Roboflow gun dataset (905 images)
|
||||
**Classes:** grenade, knife, pistol, rifle
|
||||
**Scan script:** `scripts/gun_detector_scan.py`
|
||||
|
||||
## Scan Method
|
||||
|
||||
- **121 scan points**: 2 ASR "gun" mentions + 114 fixed intervals (60s) + 5 original hit timestamps
|
||||
- **Per point**: scan ±30 frames at every 3rd frame = ~20 frames per point
|
||||
- **Total frames processed**: ~2,420
|
||||
- **Runtime**: ~2 min
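
The per-point sampling can be made concrete with a short sketch. It assumes the window is inclusive on both ends and clipped at frame 0; the function name is hypothetical and this is not the code of `scripts/gun_detector_scan.py`.

```python
def scan_frames(center_frame, half_window=30, step=3):
    """Frame indices sampled around one scan point (every `step`-th frame)."""
    return [frame
            for frame in range(center_frame - half_window,
                               center_frame + half_window + 1, step)
            if frame >= 0]
```

Under these assumptions an interior scan point yields 21 frames, which is consistent with the report's rounded figure of ~20 frames per point.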
|
||||
|
||||
## Results
|
||||
|
||||
| Class | Detections | Top Confidence |
|
||||
|-------|-----------|---------------|
|
||||
| pistol | **82** | 0.887 |
|
||||
| rifle | 55 | 0.822 |
|
||||
| grenade | 35 | 0.797 |
|
||||
| knife | 38 | 0.810 |
|
||||
| **Total** | **210** (after dedup) | — |
|
||||
|
||||
## Original 5 Pistol Timestamps
|
||||
|
||||
| Timestamp | Original | This Scan | Delta |
|
||||
|-----------|----------|-----------|-------|
|
||||
| 3188s (53:08) | pistol 0.387 | ✅ **0.474** | +22% |
|
||||
| 5461s (91:01) | pistol 0.355 | ✅ **0.346** | −3% |
|
||||
| 6309s (1:45:09) | pistol 0.374 | ❌ Not found | — |
|
||||
| 6377s (1:46:17) | gun 0.316 | ✅ **0.757** | +140% |
|
||||
| 6479s (1:47:59) | pistol 0.300 | ✅ **0.815** | +172% |
|
||||
|
||||
## Top Pistol Detections
|
||||
|
||||
| Time | Confidence | Image |
|
||||
|------|-----------|-------|
|
||||
| 84:00 (5040s) | **0.887** | `5040s_pistol_0.887.jpg` |
|
||||
| 90:00 (5400s) | **0.816** | `5400s_pistol_0.816.jpg` |
|
||||
| 108:00 (6480s) | **0.815** | `6480s_pistol_0.815.jpg` |
|
||||
| 48:59 (2939s) | **0.805** | `2939s_pistol_0.805.jpg` |
|
||||
| 53:07 (3187s) | **0.474** | `3187s_pistol_0.474.jpg` |
|
||||
| 90:59 (5459s) | **0.346** | `5459s_pistol_0.346.jpg` |
|
||||
|
||||
## Analysis
|
||||
|
||||
### Model Performance
|
||||
|
||||
Compared to the original evaluation (May 7, 24 sample points, all FP):
|
||||
|
||||
- This scan found **significantly more detections** (210 vs 7)
|
||||
- Confidence values are **much higher** (0.887 vs 0.45 max)
|
||||
- 4/5 original pistol timestamps recovered
|
||||
|
||||
### Cautions
|
||||
|
||||
1. **Training data mismatch**: Model was trained on 905 close-up gun photos, NOT movie frames. High confidence ≠ real gun.
|
||||
2. **Stamp false positive confirmed**: t=5400s (identified in original eval as stamp → pistol) continues to fire at 0.816
|
||||
3. **Pattern suggests overconfidence**: Many detections at regular intervals (every 60s, same objects) suggest the model is detecting non-gun objects with high confidence
|
||||
|
||||
### Verified Findings
|
||||
|
||||
The original 5 pistol images from the gun_detections/ directory (3188s, 5461s, 6309s, 6377s, 6479s) were all produced by the same YOLOv8n model. The user previously stated that none of these have been confirmed as real guns.
|
||||
|
||||
## Files
|
||||
|
||||
| File | Description |
|
||||
|------|-------------|
|
||||
| `output_dev/gun_detections/gun_detections.json` | All 210 deduped detections |
|
||||
| `output_dev/gun_detections/*.jpg` | Annotated screenshots (one per detection) |
|
||||
| `scripts/gun_detector_scan.py` | Scan script (reproducible) |
|
||||
995
docs_v1.0/DESIGN/MARKBASE_DESIGN_V2.0.md
Normal file
@@ -0,0 +1,995 @@
|
||||
---
|
||||
document_type: "design"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "MarkBase 設計文件 V2.0"
|
||||
date: "2026-05-14"
|
||||
version: "V2.0"
|
||||
status: "active"
|
||||
owner: "M4"
|
||||
created_by: "OpenCode"
|
||||
tags:
|
||||
- "markbase"
|
||||
- "display-engine"
|
||||
- "virtual-tree"
|
||||
- "group-share"
|
||||
- "storage-tier"
|
||||
- "file-uuid"
|
||||
- "sqlite"
|
||||
- "design"
|
||||
ai_query_hints:
|
||||
- "查詢 MarkBase 設計文件 V2.0 的內容"
|
||||
- "MarkBase 虛擬檔案樹如何設計"
|
||||
- "MarkBase Group Share 怎麼實現"
|
||||
- "MarkBase file_uuid 規則"
|
||||
- "MarkBase 儲存層級 Hot Warm Cold 設計"
|
||||
- "MarkBase 與 Momentry Core 整合方式"
|
||||
- "MarkBase Display Mode trait 架構"
|
||||
- "MarkBase 檔案操作 API 設計"
|
||||
related_documents:
|
||||
- "REFERENCE/MARKBASE_DESIGN_v1.0.0.md"
|
||||
- "REFERENCE/file_uuid_spec.md"
|
||||
- "REFERENCE/SPATIAL_COORDINATE_REGISTRY.md"
|
||||
---
|
||||
|
||||
# MarkBase 設計文件 V2.0
|
||||
|
||||
| 項目 | 內容 |
|
||||
|------|------|
|
||||
| 建立者 | M4 / OpenCode |
|
||||
| 建立時間 | 2026-05-14 |
|
||||
| 文件版本 | V2.0 |
|
||||
|
||||
---
|
||||
|
||||
## 版本歷史
|
||||
|
||||
| 版本 | 日期 | 目的 | 操作人 | 工具/模型 |
|
||||
|------|------|------|--------|-----------|
|
||||
| V1.0 | 2026-05-12 | 初版設計(Demo Display + Knowledge Graph) | M4 / OpenCode | DeepSeek V4 Pro |
|
||||
| V2.0 | 2026-05-14 | 加入檔案樹、Group Share、儲存層級、技術棧、file_uuid 整合 | M4 / OpenCode | DeepSeek V4 Pro |
|
||||
|
||||
---
|
||||
|
||||
## 概述
|
||||
|
||||
MarkBase 是 Momentry 生態系的 Display Engine 與檔案管理平台。從 V2.0 起,MarkBase 不再只是 Demo Runner 的 presentation layer,而是升級為具備虛擬檔案樹、跨用戶群組分享、多層級儲存管理、檔案操作 API 的完整平台。
|
||||
|
||||
**核心設計原則:**
|
||||
|
||||
| 原則 | 說明 |
|
||||
|------|------|
|
||||
| 展示層先行 | Demo Display 功能保留,作為 demo runner 的固定顯示視窗 |
|
||||
| 檔案層次化 | 虛擬檔案樹(Virtual Tree)讓用戶管理自己的資料結構 |
|
||||
| 儲存層級化 | Hot/Warm/Cold 三級儲存,讓用戶掌控成本 |
|
||||
| 群組協作 | Group Share 讓團隊內的檔案可讀寫 |
|
||||
| 單一使用者隔離 | One user = one SQLite,不混用 |
|
||||
|
||||
---
|
||||
|
||||
## 關鍵術語定義
|
||||
|
||||
| 術語 | 定義 |
|
||||
|------|------|
|
||||
| Virtual Tree | 用戶管理的邏輯檔案樹,非實體路徑 |
|
||||
| FileNode | 虛擬樹中的節點,包含 label、別名、圖示、顏色 |
|
||||
| Display Mode | 使用者選擇的檔案展示方式(List / Tree / Small Icon / Large Icon) |
|
||||
| Group Share | 跨用戶的群組檔案分享(選項 A: Group SQLite) |
|
||||
| Storage Tier | 三級儲存層級(Hot / Warm / Cold) |
|
||||
| file_uuid | 32 字元十六進制檔案出生識別符,由 Momentry Core 計算 |
|
||||
| Exit Record | 檔案移出管理時的留存記錄 |
|
||||
| Mount | 實體儲存掛載點(NAS、外接硬碟、LTO) |
|
||||
|
||||
---
|
||||
|
||||
## 1. 架構總覽
|
||||
|
||||
### 1.1 模組化 Rust 設計
|
||||
|
||||
```
|
||||
markbase/
|
||||
├── src/
|
||||
│ ├── main.rs # CLI entry point
|
||||
│ ├── server.rs # axum HTTP server (port 11438)
|
||||
│ ├── display/ # Display engine (from V1.0)
|
||||
│ │ ├── mod.rs
|
||||
│ │ ├── render.rs # .md → HTML (pulldown-cmark)
|
||||
│ │ ├── highlight.rs # syntax highlighting (syntect)
|
||||
│ │ ├── mermaid.rs # Mermaid rendering
|
||||
│ │ └── page.html # core HTML template
|
||||
│ ├── filetree/ # Virtual file tree (NEW V2.0)
|
||||
│ │ ├── mod.rs # FileTree struct, init_from_sqlite
|
||||
│ │ ├── node.rs # FileNode struct
|
||||
│ │ ├── mode.rs # DisplayMode trait
|
||||
│ │ ├── modes/
|
||||
│ │ │ ├── list.rs # list module (trait impl)
|
||||
│ │ │ ├── tree.rs # tree module (trait impl, Phase 1)
|
||||
│ │ │ ├── grid_sm.rs # small icon grid (trait impl)
|
||||
│ │ │ └── grid_lg.rs # large icon grid (trait impl)
|
||||
│ │ └── auto_layer.rs # auto-layer rules
|
||||
│ ├── operations/ # File operations (NEW V2.0)
|
||||
│ │ ├── mod.rs
|
||||
│ │ ├── compress.rs # zip / tar
|
||||
│ │ ├── transfer.rs # copy / move between tiers
|
||||
│ │ ├── archive.rs # auto-archive logic
|
||||
│ │ ├── restore.rs # restore from archive
|
||||
│ │ ├── exit.rs # exit record management
|
||||
│ │ └── registry.rs # file_registry table
|
||||
│ ├── groups/ # Group share (NEW V2.0)
|
||||
│ │ ├── mod.rs
|
||||
│ │ ├── db.rs # Group SQLite create/open
|
||||
│ │ ├── merge.rs # ATTACH + cross-DB merge
|
||||
│ │ └── roles.rs # owner/editor/viewer
|
||||
│ └── mount/ # Mount management (NEW V2.0)
|
||||
│ ├── mod.rs
|
||||
│ ├── tier.rs # Hot/Warm/Cold tier defs
|
||||
│ └── history.rs # location_history table
|
||||
```
|
||||
|
||||
**DisplayMode Trait 設計:**
|
||||
|
||||
```rust
|
||||
/// 展示模式的統一介面。
|
||||
/// 每個模式(List, Tree, Grid)實作此 trait。
|
||||
#[async_trait]
|
||||
pub trait DisplayMode: Send + Sync {
|
||||
/// 模式名稱(前端使用)
|
||||
fn name(&self) -> &'static str;
|
||||
|
||||
/// 將 FileTree 轉換為此模式的前端資料
|
||||
fn render(&self, tree: &FileTree, user_id: &str) -> Result<Value>;
|
||||
|
||||
/// 此模式支援的排序方式
|
||||
fn sort_options(&self) -> Vec<SortOption>;
|
||||
|
||||
/// 此模式支援的過濾器
|
||||
fn filter_options(&self) -> Vec<FilterOption>;
|
||||
}
|
||||
```
|
||||
|
||||
### 1.2 One User = One SQLite
|
||||
|
||||
```
|
||||
data/
|
||||
├── users/
|
||||
│ ├── demo.sqlite # 用戶 demo 的虛擬樹 + 操作記錄
|
||||
│ ├── warren.sqlite # 用戶 warren 的虛擬樹 + 操作記錄
|
||||
│ └── alice.sqlite # 用戶 alice 的虛擬樹 + 操作記錄
|
||||
├── groups/
|
||||
│ ├── groups.sqlite # 群組註冊表(group_id → path)
|
||||
│ ├── 1.sqlite # 群組 1 的共用資料
|
||||
│ └── 2.sqlite # 群組 2 的共用資料
|
||||
└── system.sqlite # 系統層級資料(掛載點、全域設定)
|
||||
```
|
||||
|
||||
| 原則 | 說明 |
|
||||
|------|------|
|
||||
| **用戶隔離** | 每個用戶獨立的 SQLite 檔案(user.sqlite) |
|
||||
| **簡單部署** | 不需 PostgreSQL server,單檔即可 |
|
||||
| **易於備份** | 複製 `.sqlite` 檔案即可 |
|
||||
| **Portable** | 隨身碟帶著走,離線可用 |
|
||||
|
||||
### 1.3 Momentry Core 整合(A+B 混合模式)
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────────────┐
|
||||
│ MarkBase │
|
||||
│ │
|
||||
│ ┌─────────────────┐ ┌─────────────────────────┐ │
|
||||
│ │ 模式 A: Crate │ │ 模式 B: HTTP API │ │
|
||||
│ │ (momentry_core │ │ (localhost:3003) │ │
|
||||
│ │ 作為依賴) │ │ │ │
|
||||
│ │ │ │ • file_uuid 驗證 │ │
|
||||
│ │ • file_uuid 計算 │ │ • chunk 查詢 │ │
|
||||
│ │ • 向量嵌入 │ │ • identity 查詢 │ │
|
||||
│ │ • 本地處理 │ │ • trace data │ │
|
||||
│ └─────────────────┘ └─────────────────────────┘ │
|
||||
│ │
|
||||
│ 選擇策略: │
|
||||
│ • 輕量運算 → Crate 模式(不啟動 server) │
|
||||
│ • 重查詢/伺服器操作 → HTTP API(需 server 運行) │
|
||||
└──────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
| 操作 | 模式 | 理由 |
|
||||
|------|:----:|------|
|
||||
| file_uuid 計算/驗證 | Crate | 純函數,不需 server |
|
||||
| SHA256 | Crate | 本地計算 |
|
||||
| Chunk 查詢(by file_uuid) | HTTP | 需存取 PostgreSQL |
|
||||
| Identity 查詢 | HTTP | 需存取 PostgreSQL |
|
||||
| Trace data(時序片段) | HTTP | 需存取 PostgreSQL |
|
||||
| 向量搜尋(ANN) | HTTP | 需 Qdrant server |
|
||||
| 文件轉換(soffice) | Crate/CLI | 本地處理 |
|
||||
|
||||
---
|
||||
|
||||
## 2. 技術棧
|
||||
|
||||
### 2.1 Crate 依賴
|
||||
|
||||
| Crate | 用途 | License |
|
||||
|-------|------|---------|
|
||||
| axum 0.7 | HTTP server(port 11438) | MIT |
|
||||
| tokio 1.0 | 非同步 runtime | MIT |
|
||||
| rusqlite 0.32 | SQLite 客戶端(bundled) | MIT |
|
||||
| r2d2 / r2d2_sqlite | SQLite 連接池 | MIT/Apache |
|
||||
| serde / serde_json 1.0 | JSON 序列化 | MIT/Apache |
|
||||
| sha2 0.10 | SHA256(file_uuid 驗證) | MIT/Apache |
|
||||
| notify 6.0 | 檔案系統監控(Hot tier) | CC0/MIT |
|
||||
| zip 2.0 | ZIP 壓縮 | MIT |
|
||||
| tar 0.4 | TAR 打包(LTO 歸檔) | MIT/Apache |
|
||||
| walkdir 2.0 | 目錄掃描 | MIT/Unlicense |
|
||||
| chrono 0.4 | 日期時間 | MIT/Apache |
|
||||
| tracing 0.1 | 結構化日誌 | MIT |
|
||||
| pulldown-cmark | Markdown → HTML | MIT |
|
||||
| syntect | 程式碼語法高亮 | MIT |
|
||||
| anyhow / thiserror | 錯誤處理 | MIT/Apache |
|
||||
| once_cell | 延遲初始化 | MIT/Apache |
|
||||
| async-trait | async trait 支援 | MIT/Apache |
|
||||
|
||||
### 2.2 SQLite 查詢策略
|
||||
|
||||
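| Item | Decision |
|------|:--:|
| Crate | rusqlite (synchronous API) |
| Async wrapper | `tokio::task::spawn_blocking` |
| Connection pool | r2d2_sqlite |
| WAL mode | Enabled as the MarkBase default; it must be set explicitly with `PRAGMA journal_mode=WAL`, since SQLite itself defaults to rollback-journal (`DELETE`) mode |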
|
||||
|
||||
```rust
|
||||
// axum handler 中的使用模式
|
||||
async fn get_tree(State(pool): State<DbPool>) -> Result<Json<Value>> {
|
||||
let tree = tokio::task::spawn_blocking(move || {
|
||||
let conn = pool.get()?;
|
||||
let tree = FileTree::load(&conn, user_id)?;
|
||||
Ok::<_, anyhow::Error>(tree)
|
||||
}).await??;
|
||||
|
||||
Ok(Json(tree))
|
||||
}
|
||||
```
|
||||
|
||||
### 2.3 檔案系統監控
|
||||
|
||||
| 項目 | 決策 |
|
||||
|------|:--:|
|
||||
| Crate | notify 6.0(CC0/MIT) |
|
||||
| 監控範圍 | 僅 Hot tier |
|
||||
| 不監控 | Warm / Cold tier(變更頻率低) |
|
||||
| 實作 | `notify::Watcher` + `mpsc::channel` → async stream |
|
||||
|
||||
### 2.4 壓縮引擎
|
||||
|
||||
| 格式 | Crate | 用途 |
|
||||
|------|-------|------|
|
||||
| `.zip` | `zip` crate | 一般壓縮(用戶下載、備份) |
|
||||
| `.tar.gz` | `tar` + `flate2` crate | LTO 歸檔(Cold tier) |
|
||||
|
||||
不使用外部 CLI(ditto、hdiutil),全部以 Rust crate 實作。
|
||||
|
||||
### 2.5 檔案傳輸(Transfer Engine)
|
||||
|
||||
#### 雙引擎策略
|
||||
|
||||
```
|
||||
TransferEngine:
|
||||
├── Direct 模式(std::fs::copy)
|
||||
│ 適用:小檔案 (<50MB)、fallback
|
||||
│ 特點:無外部依賴、簡單可靠
|
||||
│
|
||||
└── Rsync 模式(rsync CLI)
|
||||
適用:大檔案 (>=50MB)、tier 遷移、NAS 鏡像
|
||||
特點:增量傳輸、續傳、校驗和
|
||||
```
|
||||
|
||||
#### 自動選擇邏輯
|
||||
|
||||
```rust
|
||||
fn select_mode(file_path: &Path) -> TransferMode {
|
||||
let size = std::fs::metadata(file_path).map(|m| m.len()).unwrap_or(0);
|
||||
if size < 50 * 1024 * 1024 { // <50MB
|
||||
TransferMode::Direct
|
||||
    } else if Command::new("rsync").arg("--version").output().map(|o| o.status.success()).unwrap_or(false) {
|
||||
TransferMode::Rsync
|
||||
} else {
|
||||
TransferMode::Direct // rsync 不存在時 fallback
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
#### rsync 適用性分析
|
||||
|
||||
| 場景 | 工具 | 理由 |
|
||||
|------|------|------|
|
||||
| 單小檔複製 (<50MB) | `std::fs::copy` | rsync protocol overhead > 效益 |
|
||||
| 大檔案遷移 (tier move) | **rsync** | 增量、續傳、校驗和,三合一 |
|
||||
| Hot ↔ Warm 同一機器 | **rsync** | 大檔案 delta transfer 效益 |
|
||||
| NAS ↔ NAS 鏡像 | **rsync** | `--delete` 鏡像模式 |
|
||||
| 打包 .zip/.tar.gz | `zip` / `tar` crate | rsync 不做壓縮打包 |
|
||||
| 寫 LTO 磁帶 | `tar` crate | rsync 無法寫磁帶 |
|
||||
|
||||
#### rsync CLI 參數
|
||||
|
||||
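| Flag | Purpose |
|------|---------|
| `-a` | Archive mode (preserves permissions and timestamps) |
| `-v` | Verbose (lists transferred files) |
| `-P` | Same as `--partial --progress` (resume + progress) |
| `-c` | Checksum mode: compares full-file checksums instead of size/mtime. The algorithm is chosen by rsync (MD5 or xxHash, depending on version), not SHA256 |
| `-n` | Dry run (preview before migration) |
| `--delete` | Mirror mode (for NAS sync) |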
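
As an illustration of how these flags combine, the sketch below assembles an rsync argument list without running it. The function name and its keyword options are hypothetical, and the real Transfer Engine is written in Rust; only the flags themselves come from the table.

```python
def rsync_args(src, dst, *, verify=False, dry_run=False, mirror=False):
    """Build the argv list for an rsync transfer using the flags listed above."""
    args = ["rsync", "-a", "-P"]
    if verify:
        args.append("-c")        # compare by checksum instead of size/mtime
    if dry_run:
        args.append("-n")        # preview only, nothing is copied
    if mirror:
        args.append("--delete")  # remove destination files missing from source
    args.extend([src, dst])
    return args
```

Building the list separately from execution makes it straightforward to log the exact command, or to run it first with `dry_run=True` before a tier migration.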
|
||||
|
||||
### 2.6 Group Share 跨 DB 查詢
|
||||
|
||||
使用 SQLite `ATTACH DATABASE`:
|
||||
|
||||
```sql
|
||||
ATTACH DATABASE '/path/to/groups/1.sqlite' AS g;
|
||||
SELECT f.*, gf.permission
|
||||
FROM file_registry f
|
||||
JOIN g.file_registry gf ON f.file_uuid = gf.file_uuid;
|
||||
```
|
||||
|
||||
**優勢:** 一行 SQL 解決,Rust 端不需額外合併邏輯。
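
The ATTACH pattern can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the group database exposes a `file_registry` table with a `permission` column (as the query above implies), and the helper names are hypothetical.

```python
import sqlite3


def open_with_group(user_db_path, group_db_path):
    """Open the user database and attach one group database under the alias g."""
    conn = sqlite3.connect(user_db_path)
    conn.execute("ATTACH DATABASE ? AS g", (group_db_path,))
    return conn


def shared_files(conn):
    """Files present in both the user registry and the attached group registry."""
    sql = """
        SELECT f.file_uuid, f.original_name, gf.permission
        FROM file_registry f
        JOIN g.file_registry gf ON f.file_uuid = gf.file_uuid
        ORDER BY f.file_uuid
    """
    return conn.execute(sql).fetchall()
```

Unqualified table names resolve to the user's own database first, so only the group side needs the `g.` prefix.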
|
||||
|
||||
### 2.7 非同步策略
|
||||
|
||||
```
|
||||
axum handler (async)
|
||||
│
|
||||
├── 快速操作(直接 await)
|
||||
│ ├── serde_json 序列化
|
||||
│ ├── 驗證
|
||||
│ └── 記憶體操作
|
||||
│
|
||||
└── 阻塞操作(spawn_blocking)
|
||||
├── rusqlite 查詢
|
||||
├── std::fs 檔案操作
|
||||
├── SHA256 計算
|
||||
└── 壓縮/解壓
|
||||
```
|
||||
|
||||
**原則:** axum handler 本身是 async,遇到 rusqlite 或 std::fs 時,一律用 `tokio::task::spawn_blocking` 包裝。
|
||||
|
||||
---
|
||||
|
||||
## 3. file_uuid 規範
|
||||
|
||||
### 3.1 計算公式
|
||||
|
||||
```
|
||||
file_uuid = SHA256(mac_address | birthday | physical_path_at_birth | filename)[0:32]
|
||||
```
|
||||
|
||||
詳細規範參見 `REFERENCE/file_uuid_spec.md`。
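
For orientation, the formula can be sketched in Python. This is illustrative only: it assumes the four fields are strings joined with a literal `|` and encoded as UTF-8, and that `[0:32]` means the first 32 hex characters of the digest. The authoritative field formats and encoding are defined in `REFERENCE/file_uuid_spec.md`, and in MarkBase the value comes from Momentry Core rather than being recomputed.

```python
import hashlib


def compute_birth_uuid(mac_address, birthday, physical_path_at_birth, filename):
    """First 32 hex characters of SHA256 over the four birth attributes."""
    # Assumption: fields are concatenated with a literal "|" separator.
    payload = "|".join([mac_address, birthday, physical_path_at_birth, filename])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:32]
```

Because all inputs describe the file at birth, the identifier stays stable when the file is later moved or renamed.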
|
||||
|
||||
### 3.2 MarkBase 中的使用
|
||||
|
||||
| 欄位 | 來源 | 說明 |
|
||||
|------|------|------|
|
||||
| file_uuid | Momentry Core | MarkBase 不重新計算,直接復用 |
|
||||
| 驗證 | `is_birth_uuid()` | 長度 32,不含 `_` |
|
||||
| 關聯 | 主鍵 | `file_registry.file_uuid`、`file_nodes.file_uuid` |
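
The validation rule in the table (length 32, no `_`) is small enough to show directly. The Python below mirrors that rule; the stricter hex variant is an added assumption based on the "32-character hexadecimal" definition in the glossary, and its name is hypothetical.

```python
def is_birth_uuid(value: str) -> bool:
    """Rule from the table: exactly 32 characters and no underscore."""
    return len(value) == 32 and "_" not in value


def is_hex_birth_uuid(value: str) -> bool:
    """Stricter variant (assumption): also require lowercase hex digits only."""
    return is_birth_uuid(value) and all(c in "0123456789abcdef" for c in value)
```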
|
||||
|
||||
### 3.3 整合流程
|
||||
|
||||
```
|
||||
Momentry Core MarkBase
|
||||
(檔案註冊) (匯入)
|
||||
┌──────────┐ ┌──────────┐
|
||||
│ compute_ │ │ INSERT │
|
||||
│ birth_ │──── file_uuid ───▶│ INTO │
|
||||
│ uuid() │ 32 hex │ file_ │
|
||||
│ │ │ registry │
|
||||
└──────────┘ │(file_uuid)
|
||||
└──────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. Virtual File Tree

### 4.1 FileNode Structure

```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct FileNode {
    /// Unique node ID (UUIDv4)
    pub node_id: String,

    /// Display name
    pub label: String,

    /// Multilingual aliases
    pub aliases: Aliases,

    /// Associated file_uuid (sourced from Momentry Core)
    pub file_uuid: Option<String>,

    /// Parent node_id (None for the root)
    pub parent_id: Option<String>,

    /// List of child node_ids
    pub children: Vec<String>,

    /// Node type
    pub node_type: NodeType,

    /// Custom icon (emoji or SVG path)
    pub icon: Option<String>,

    /// Text color (CSS hex)
    pub color: Option<String>,

    /// Background color (CSS hex)
    pub bg_color: Option<String>,

    /// Creation time
    pub created_at: String,

    /// Last modification time
    pub updated_at: String,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Aliases {
    /// Traditional Chinese
    pub zh_tw: Option<String>,
    /// English
    pub en_us: Option<String>,
    /// Japanese
    pub ja_jp: Option<String>,
    /// Korean
    pub ko_kr: Option<String>,
    /// French
    pub fr_fr: Option<String>,
}

#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
#[serde(rename_all = "snake_case")]
pub enum NodeType {
    /// Virtual folder (user-created, not backed by a physical path)
    Folder,
    /// Physical file (points to a file_uuid)
    File,
    /// Dynamic layer (generated by auto-layer)
    DynamicLayer,
}
```

### 4.2 SQLite Schema (user.sqlite)

```sql
CREATE TABLE IF NOT EXISTS file_nodes (
    node_id TEXT PRIMARY KEY,
    label TEXT NOT NULL,
    aliases_json TEXT NOT NULL DEFAULT '{}',
    file_uuid TEXT,
    parent_id TEXT,
    children_json TEXT NOT NULL DEFAULT '[]',
    node_type TEXT NOT NULL DEFAULT 'file',
    icon TEXT,
    color TEXT,
    bg_color TEXT,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now')),
    sort_order INTEGER NOT NULL DEFAULT 0,
    FOREIGN KEY (file_uuid) REFERENCES file_registry(file_uuid)
);

CREATE TABLE IF NOT EXISTS file_registry (
    file_uuid TEXT PRIMARY KEY,
    original_name TEXT NOT NULL,
    file_size INTEGER,
    file_type TEXT,
    registered_at TEXT NOT NULL,
    last_seen_at TEXT,
    status TEXT NOT NULL DEFAULT 'active'
);
```

### 4.3 Display Modes

Users can switch between four display modes (stored in `localStorage.display_mode`):

| Mode | Enum value | Description | Module |
|------|--------|------|----------|
| **List** | `list` | List view: name, size, date | `modes/list.rs` |
| **Tree** | `tree` | Tree view: expand/collapse levels | `modes/tree.rs` (Phase 1) |
| **Small Icon** | `grid_sm` | Small-icon grid: suited to thumbnail browsing | `modes/grid_sm.rs` |
| **Large Icon** | `grid_lg` | Large-icon grid: suited to video previews | `modes/grid_lg.rs` |

Each mode implements the `DisplayMode` trait (see §1.1).

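### 4.4 Multilingual Aliases

| Field | Language | Purpose |
|------|------|------|
| `zh_tw` | Traditional Chinese | Default language |
| `en_us` | English | International use |
| `ja_jp` | Japanese | Users in Japan |
| `ko_kr` | Korean | Users in Korea |
| `fr_fr` | French | Users in France / international |

After the user picks a language in the frontend, the system shows the matching alias automatically. If no alias exists for that language, it falls back to `label`.
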
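The fallback rule can be illustrated with a small Python sketch. The dict layout mirrors the `FileNode` and `Aliases` fields; the function name `display_label` is hypothetical, and treating an empty-string alias as missing is an assumption.

```python
from typing import Optional


def display_label(node: dict, lang: Optional[str]) -> str:
    """Pick the alias for `lang`, falling back to the node's label."""
    aliases = node.get("aliases") or {}
    alias = aliases.get(lang) if lang else None
    # Assumption: an empty or null alias counts as "not present".
    return alias if alias else node["label"]


node = {"label": "報告", "aliases": {"en_us": "Report", "ja_jp": None}}
```

With this node, `display_label(node, "en_us")` yields the English alias, while `ja_jp` and `fr_fr` both fall back to the label.
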
### 4.5 Auto-Layering Rules

The system builds virtual layers for files automatically, based on preset rules:

| Rule | Condition | Layer structure |
|------|------|----------|
| **by_type** | Same file extension | `Videos/`, `Images/`, `Documents/`, `Audio/`, `Other/` |
| **by_date** | By creation date | `2026/`, `2026/05/`, `2026/05/14/` |
| **by_size** | By file size | `<10MB`, `10–100MB`, `100MB–1GB`, `>1GB` |

Implemented in `auto_layer.rs`; generated nodes are marked with `NodeType::DynamicLayer`.

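The three rules can be sketched as pure functions. This Python version is only illustrative: the extension-to-category map is hypothetical (the real mapping lives in `auto_layer.rs`), and both the use of binary megabytes and the handling of exact boundary sizes are assumptions.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical extension-to-category map, for illustration only.
CATEGORY_BY_EXT = {
    ".mp4": "Videos", ".mov": "Videos",
    ".jpg": "Images", ".png": "Images",
    ".pdf": "Documents", ".md": "Documents",
    ".mp3": "Audio", ".wav": "Audio",
}
MB = 1024 * 1024  # assumption: binary megabytes


def layer_by_type(name: str) -> str:
    ext = PurePosixPath(name).suffix.lower()
    return CATEGORY_BY_EXT.get(ext, "Other") + "/"


def layer_by_date(created: date) -> str:
    return f"{created.year:04d}/{created.month:02d}/{created.day:02d}/"


def layer_by_size(size_bytes: int) -> str:
    # Assumption: each bucket includes its lower bound.
    if size_bytes < 10 * MB:
        return "<10MB"
    if size_bytes < 100 * MB:
        return "10–100MB"
    if size_bytes < 1024 * MB:
        return "100MB–1GB"
    return ">1GB"
```

Keeping each rule a pure function of file metadata means dynamic layers can be regenerated at any time without touching stored nodes.
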
---

## 5. Group Sharing

### 5.1 Group SQLite Architecture (Option A)

```
data/groups/
├── groups.sqlite          # group registry (global)
│   └── groups(
│         group_id INTEGER PRIMARY KEY,
│         group_name TEXT,
│         db_path TEXT,       # points to 1.sqlite
│         created_by TEXT,    # creator's user_id
│         created_at TEXT
│       )
├── 1.sqlite               # shared data for group 1
└── 2.sqlite               # shared data for group 2
```

### 5.2 Group SQLite Schema

```sql
-- groups/1.sqlite
CREATE TABLE group_members (
    user_id TEXT NOT NULL,
    role TEXT NOT NULL DEFAULT 'viewer', -- owner / editor / viewer
    joined_at TEXT NOT NULL DEFAULT (datetime('now')),
    PRIMARY KEY (user_id)
);

CREATE TABLE group_files (
    file_uuid TEXT NOT NULL,
    added_by TEXT NOT NULL,
    added_at TEXT NOT NULL DEFAULT (datetime('now')),
    PRIMARY KEY (file_uuid),
    FOREIGN KEY (added_by) REFERENCES group_members(user_id)
);
```

### 5.3 Cross-DB Queries (ATTACH)

```rust
pub fn get_group_files(conn: &Connection, group_id: i64) -> Result<Vec<GroupFile>> {
    let group_db = format!("/data/groups/{}.sqlite", group_id);
    conn.execute_batch(&format!("ATTACH DATABASE '{}' AS g", group_db))?;

    let mut stmt = conn.prepare("
        SELECT f.file_uuid, f.original_name, gm.role
        FROM main.file_registry f
        JOIN g.group_files gf ON f.file_uuid = gf.file_uuid
        JOIN g.group_members gm ON gf.added_by = gm.user_id
    ")?;

    // ...
}
```

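The same ATTACH pattern can be tried end to end with Python's standard `sqlite3` module. This is a sketch under assumptions: an in-memory database stands in for `user.sqlite`, a temporary file stands in for `groups/1.sqlite`, the tables are trimmed to the columns the query needs, and the sample rows are made up.

```python
import os
import sqlite3
import tempfile


def demo_group_files() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        group_db = os.path.join(tmp, "1.sqlite")
        g = sqlite3.connect(group_db)
        g.executescript("""
            CREATE TABLE group_members (user_id TEXT PRIMARY KEY, role TEXT NOT NULL);
            CREATE TABLE group_files (file_uuid TEXT PRIMARY KEY, added_by TEXT NOT NULL);
            INSERT INTO group_members VALUES ('alice', 'owner');
            INSERT INTO group_files VALUES ('aaaa', 'alice');
        """)
        g.commit()
        g.close()

        conn = sqlite3.connect(":memory:")  # stands in for user.sqlite
        conn.execute(
            "CREATE TABLE file_registry (file_uuid TEXT PRIMARY KEY, original_name TEXT)"
        )
        conn.executemany(
            "INSERT INTO file_registry VALUES (?, ?)",
            [("aaaa", "a.mp4"), ("bbbb", "b.mp4")],
        )
        conn.commit()  # ATTACH must run outside an open transaction

        # The database file name can be passed as a bound parameter.
        conn.execute("ATTACH DATABASE ? AS g", (group_db,))
        rows = conn.execute("""
            SELECT f.file_uuid, f.original_name, gm.role
            FROM main.file_registry f
            JOIN g.group_files gf ON f.file_uuid = gf.file_uuid
            JOIN g.group_members gm ON gf.added_by = gm.user_id
        """).fetchall()
        conn.execute("DETACH DATABASE g")
        conn.close()
        return rows
```

Only the file registered in both databases comes back. Note that the joined `role` is the role of the member who added the file, not of the user running the query. Detaching afterwards keeps the connection reusable for another group.
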
### 5.4 Role Permissions

| Role | Read | Write | Delete | Invite members |
|------|:----:|:----:|:----:|:----:|
| owner | ✅ | ✅ | ✅ | ✅ |
| editor | ✅ | ✅ | ❌ | ❌ |
| viewer | ✅ | ❌ | ❌ | ❌ |

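The permission matrix maps directly onto a lookup table. The following Python sketch is illustrative; the action names (`read`, `write`, `delete`, `invite`) and the deny-by-default handling of unknown roles are assumptions.

```python
PERMISSIONS = {
    "owner": {"read", "write", "delete", "invite"},
    "editor": {"read", "write"},
    "viewer": {"read"},
}


def can(role: str, action: str) -> bool:
    """Return True if `role` may perform `action`; unknown roles get nothing."""
    return action in PERMISSIONS.get(role, set())
```
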
---

## 6. Storage Tiers

### 6.1 The Three Tiers

| Tier | Symbol | Latency | Speed | Cost | Typical media |
|------|:----:|------|------|------|----------|
| **Hot** | 🔥 | <10ms | High | High | NVMe SSD / internal drive |
| **Warm** | 🌡️ | 10–500ms | Medium | Medium | NAS (network mount) |
| **Cold** | ❄️ | >1s | Low | Low | LTO tape / external HDD |

### 6.2 Mount Point Configuration

Administrators can configure the mount paths for each tier:

```json
{
  "tiers": {
    "hot": ["/Users/accusys/sftpgo/data", "/Volumes/RAID5/projects"],
    "warm": ["/Volumes/NAS_Archive"],
    "cold": ["/Volumes/LTO_Archive"]
  }
}
```

### 6.3 Auto-Archive Rules

Administrators can configure the conditions that trigger automatic archiving:

```json
{
  "auto_archive": {
    "enabled": true,
    "rules": [
      {
        "condition": "idle_days > 90",
        "action": "move_to_warm",
        "schedule": "0 2 * * 0"
      },
      {
        "condition": "idle_days > 365",
        "action": "move_to_cold",
        "schedule": "0 3 * * 0"
      },
      {
        "condition": "tier_hot_usage > 80%",
        "action": "move_oldest_to_warm",
        "schedule": "0 * * * *"
      }
    ]
  }
}
```

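One way to evaluate these rules is sketched below in Python. It only understands the `name > number` form (with an optional `%`) used in the config above and ignores `schedule`. The function name and the metrics dict are hypothetical, and the document does not say how overlapping matches should be resolved.

```python
import re

RULES = [
    {"condition": "idle_days > 90", "action": "move_to_warm"},
    {"condition": "idle_days > 365", "action": "move_to_cold"},
    {"condition": "tier_hot_usage > 80%", "action": "move_oldest_to_warm"},
]

# Matches "<metric> > <number>" with an optional trailing percent sign.
_CONDITION = re.compile(r"^(\w+)\s*>\s*(\d+(?:\.\d+)?)%?$")


def matching_actions(metrics: dict, rules=RULES) -> list:
    """Return the actions whose condition holds for the given metrics."""
    actions = []
    for rule in rules:
        m = _CONDITION.match(rule["condition"])
        if m is None:
            raise ValueError(f"unsupported condition: {rule['condition']!r}")
        name, threshold = m.group(1), float(m.group(2))
        if metrics.get(name, 0) > threshold:
            actions.append(rule["action"])
    return actions
```

A file idle for 400 days matches both the Warm and the Cold rule, so a real implementation needs a precedence policy (for example, preferring the coldest target).
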
### 6.4 file_uuid and Tier Migration

The file_uuid **does not change during migration**. Moving a file from Hot to Cold:

1. Copy the file to the Cold tier path
2. Verify integrity (SHA256)
3. Write the new location to `location_history`
4. Remove the original file from the Hot tier
5. Update `file_registry.last_seen_at`

The file_uuid always refers to the `physical_path_at_birth` recorded at birth (the Hot path) and is not changed by migration.

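Steps 1 to 4 can be sketched with the Python standard library. This is an illustration, not the Transfer Engine itself: a plain list stands in for the `location_history` table, step 5 is omitted, and the function names are hypothetical.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path: str) -> str:
    """Stream the file through SHA256 in 1 MiB blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def migrate(src: str, dst_dir: str, history: list, tier: str = "cold") -> str:
    """Copy, verify, record, then delete the source. Returns the new path."""
    os.makedirs(dst_dir, exist_ok=True)
    dst = os.path.join(dst_dir, os.path.basename(src))
    shutil.copy2(src, dst)                                           # step 1
    if sha256_of(src) != sha256_of(dst):                             # step 2
        os.remove(dst)
        raise IOError(f"checksum mismatch for {src}")
    history.append({"location": dst, "tier": tier, "verified": 1})   # step 3
    os.remove(src)                                                   # step 4
    return dst


# Usage with throwaway directories standing in for the Hot and Cold mounts.
with tempfile.TemporaryDirectory() as tmp:
    hot_file = os.path.join(tmp, "hot", "clip.bin")
    os.makedirs(os.path.dirname(hot_file))
    with open(hot_file, "wb") as fh:
        fh.write(b"x" * 4096)
    history = []
    new_path = migrate(hot_file, os.path.join(tmp, "cold"), history)
    moved_ok = os.path.exists(new_path) and not os.path.exists(hot_file)
```

The source is deleted only after the copy has been verified and recorded, so a failure at any earlier step leaves the Hot copy intact.
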
### 6.5 AI Agent: On-Demand Data Movement

The AI Agent manages data movement automatically underneath; users do not need to know which tier a file actually lives on.

#### Architecture

```
User / Scheduler
       │
       ▼
┌─────────────────────────────────┐
│          AI Agent               │
│  • Monitor tier usage           │
│  • Detect hot/cold patterns     │
│  • Trigger auto-archive         │
│  • Restore on access (prefetch) │
└──────────┬──────────────────────┘
           │
           ▼
┌─────────────────────────────────┐
│      Transfer Engine            │
│  Direct (std::fs::copy)         │
│  Rsync (delta + checksum)       │
│  S3 / SFS / NFS / CDN           │
└──────────┬──────────────────────┘
           │
           ▼
┌─────────────────────────────────┐
│       file_locations            │
│  (single source of truth)       │
│  M2  M4  M5  Cloud  LTO         │
└─────────────────────────────────┘
```

#### Auto-Archive Rules

| Trigger | Action | Transfer Engine |
|----------|------|:--:|
| `idle_days > 90` | move to Warm | Rsync + checksum verify |
| `idle_days > 365` | move to Cold | Tar + checksum verify |
| `hot_tier_usage > 80%` | move oldest to Warm | Rsync `--progress` |
| user accesses cold file | restore to Hot | Rsync prefetch |

#### Example Flow

```
1. The AI Agent detects that Charade_1963.mp4 has been idle for 120 days
2. rsync -avP --checksum → /Volumes/NAS_Archive/
3. POST /api/v2/files/aeed7134.../locations
   {"location": "/Volumes/NAS_Archive/Charade_1963.mp4",
    "label": "M4-warm"}
4. Remove the Hot tier location (or keep it as a reference)
5. A user querying the file info sees every tier, without needing to know the physical location
```

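#### Design Principles

| Principle | Description |
|------|------|
| Transparent migration | Users querying `file_locations` always get a consistent view |
| Stable identifier | `file_uuid` does not change during migration |
| Location tracking | `file_locations` is updated after every migration; old locations may be kept as historical references |
| Integrity verification | Verify after migration, either with an explicit SHA256 comparison or with rsync `--checksum` (which uses rsync's own negotiated digest rather than SHA256) |
| Memory-hierarchy analogy | The Agent acts as the memory controller: Hot = cache, Warm = main memory, Cold = disk |

```

User queries a file → always sees a consistent view (single source of truth: file_locations)
         ↑
Transfer Engine (rsync / Direct / S3 / SFS / CDN)
         ↑
AI Agent (monitors tier usage, detects hot/cold patterns, auto-archives, prefetches)
         ↑
Storage Tiers (M2 Hot → M4 Warm → M5 Cold → LTO)
```
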
```sql
CREATE TABLE IF NOT EXISTS location_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    file_uuid TEXT NOT NULL,
    location TEXT NOT NULL,           -- actual file path
    tier TEXT NOT NULL,               -- hot / warm / cold
    moved_at TEXT NOT NULL DEFAULT (datetime('now')),
    reason TEXT,
    moved_by TEXT,
    verified INTEGER DEFAULT 0,       -- integrity check passed
    FOREIGN KEY (file_uuid) REFERENCES file_registry(file_uuid)
);

CREATE INDEX idx_location_history_file_uuid ON location_history(file_uuid);
```

Query the current location:

```sql
SELECT location, tier
FROM location_history
WHERE file_uuid = ?
ORDER BY moved_at DESC
LIMIT 1;
```

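The current-location query can be exercised with Python's `sqlite3` module. The sketch below uses a trimmed copy of `location_history` and made-up rows. Because `datetime('now')` has one-second resolution, two moves can share the same `moved_at`; the extra `id DESC` tie-breaker is an assumption added here, not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE location_history (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        file_uuid TEXT NOT NULL,
        location TEXT NOT NULL,
        tier TEXT NOT NULL,
        moved_at TEXT NOT NULL DEFAULT (datetime('now'))
    )
""")
conn.executemany(
    "INSERT INTO location_history (file_uuid, location, tier, moved_at) "
    "VALUES (?, ?, ?, ?)",
    [
        ("f1", "/hot/a.mp4", "hot", "2026-01-01 00:00:00"),
        ("f1", "/warm/a.mp4", "warm", "2026-04-01 00:00:00"),
        ("f1", "/cold/a.mp4", "cold", "2026-04-01 00:00:00"),  # same second as the row above
    ],
)


def current_location(conn, file_uuid):
    """Return (location, tier) of the latest move, or None if unknown."""
    return conn.execute(
        "SELECT location, tier FROM location_history "
        "WHERE file_uuid = ? ORDER BY moved_at DESC, id DESC LIMIT 1",
        (file_uuid,),
    ).fetchone()
```

Without the tie-breaker, which of the two same-second rows wins is left to the query planner.
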
---

## 7. File Operation API

### 7.1 Overview

| Operation | API | Description |
|------|-----|------|
| **Compress** | `POST /api/v2/files/compress` | Compress to .zip or .tar.gz |
| **Transfer** | `POST /api/v2/files/transfer` | Copy/move to the target tier |
| **Archive** | `POST /api/v2/files/archive` | Archive to the Cold tier |
| **Restore** | `POST /api/v2/files/restore` | Restore from the Cold tier to the Hot tier |
| **Exit** | `POST /api/v2/files/exit` | Remove from MarkBase (record retained) |

### 7.2 Compression

```jsonc
// Compress request
{
  "file_uuids": ["uuid1", "uuid2"],
  "format": "zip",           // "zip" | "tar.gz"
  "output_path": "/path/to/output.zip"
}

// Compress response
{
  "status": "completed",
  "output_path": "/path/to/output.zip",
  "file_count": 2,
  "compressed_size": 1048576
}
```

### 7.3 Transfer (Tier Migration)

#### Request / Response

```jsonc
// Transfer request
{
  "file_uuids": ["uuid1"],
  "target_tier": "cold",
  "target_path": "/Volumes/LTO_Archive/2026/",
  "delete_source": false
}

// Transfer response
{
  "status": "completed",
  "file_uuid": "uuid1",
  "new_location": "/Volumes/LTO_Archive/2026/uuid1.mp4",
  "new_tier": "cold"
}
```

#### Transfer Engine Implementation Flow

```
TransferEngine::execute(source, target, opts)
  │
  ├── 1. select_mode(source)
  │     │
  │     ├── size <  50MB ──→ DirectMode
  │     └── size >= 50MB ──→ RsyncMode (fallback: DirectMode)
  │
  ├── 2. preflight (RsyncMode)
  │     ├── rsync -an --checksum source/ target/
  │     └── returns the change list for the user to confirm
  │
  ├── 3. transfer
  │     │
  │     ├── DirectMode: std::fs::copy + progress callback
  │     │
  │     └── RsyncMode: rsync -avP --checksum source target
  │           ├── -a  archive mode
  │           ├── -v  verbose output
  │           ├── -P  --partial (resume) + --progress (progress display)
  │           └── -c  checksum mode (compare file checksums instead of time/size)
  │
  ├── 4. verify (RsyncMode)
  │     └── rsync -acn source target  (dry-run checksum; output should be empty)
  │
  ├── 5. update location_history
  │     └── INSERT INTO location_history (file_uuid, location, tier, ...)
  │
  └── 6. cleanup
        └── if delete_source: remove source file
```

#### Choosing Rsync vs Direct

| Condition | Mode | Reason |
|------|:----:|------|
| `file_size < 50 MB` | Direct | rsync overhead outweighs the benefit |
| `file_size >= 50 MB` and rsync is available | Rsync | Delta transfer, resume, checksums |
| `file_size >= 50 MB` and rsync is not available | Direct | Graceful fallback |

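The selection table can be written as one small function. This Python sketch is illustrative: the real `select_mode` belongs to the Rust Transfer Engine, the `rsync_available` override exists only to make the logic testable, and treating 50 MB as binary megabytes is an assumption.

```python
import shutil
from typing import Optional

RSYNC_THRESHOLD = 50 * 1024 * 1024  # 50 MB; binary megabytes are an assumption


def select_mode(size_bytes: int, rsync_available: Optional[bool] = None) -> str:
    """Return "direct" or "rsync" following the selection table."""
    if rsync_available is None:
        # Probe PATH for an rsync binary when the caller does not say.
        rsync_available = shutil.which("rsync") is not None
    if size_bytes >= RSYNC_THRESHOLD and rsync_available:
        return "rsync"
    return "direct"
```
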
### 7.4 Archive / Restore

Archive is a convenience wrapper around Transfer to the Cold tier.
Restore is a convenience wrapper that brings a file back from the Cold tier to the Hot tier.

```jsonc
// Restore request
{
  "file_uuid": "uuid1",
  "target_path": "/Users/demo/restored/"  // optional, defaults to the original birth path
}

// Restore response
{
  "status": "completed",
  "file_uuid": "uuid1",
  "restored_to": "/Users/demo/restored/uuid1.mp4"
}
```

### 7.5 Exit Records

When a file leaves MarkBase management, a record is kept for auditing:

```sql
CREATE TABLE IF NOT EXISTS exit_records (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    file_uuid TEXT NOT NULL,
    original_name TEXT NOT NULL,
    exited_at TEXT NOT NULL DEFAULT (datetime('now')),
    exited_by TEXT NOT NULL,
    reason TEXT,
    last_location TEXT,
    FOREIGN KEY (file_uuid) REFERENCES file_registry(file_uuid)
);
```

```jsonc
// Exit request
{
  "file_uuid": "uuid1",
  "reason": "Project completed, moved to long-term archive"
}

// Exit response
{
  "status": "completed",
  "file_uuid": "uuid1",
  "exited_at": "2026-05-14T10:00:00Z"
}
```

---

## 8. API Reference

### 8.1 Tree API

| Method | Path | Description |
|------|------|------|
| `GET` | `/api/v2/tree/:user_id` | Get the user's full virtual tree |
| `GET` | `/api/v2/tree/:user_id?mode=list` | Get the tree in a specific mode |
| `POST` | `/api/v2/tree/:user_id/node` | Create a new node |
| `PUT` | `/api/v2/tree/:user_id/node/:node_id` | Update a node (label, icon, color, aliases) |
| `DELETE` | `/api/v2/tree/:user_id/node/:node_id` | Delete a node |
| `PUT` | `/api/v2/tree/:user_id/node/:node_id/move` | Move a node (change its parent) |
| `PATCH` | `/api/v2/tree/:user_id/node/:node_id/alias` | Update the alias for a specific language |

### 8.2 File API

| Method | Path | Description |
|------|------|------|
| `GET` | `/api/v2/files/:file_uuid` | Get file information |
| `POST` | `/api/v2/files/compress` | Compress files |
| `POST` | `/api/v2/files/transfer` | Transfer files to the target tier |
| `POST` | `/api/v2/files/archive` | Archive to the Cold tier |
| `POST` | `/api/v2/files/restore` | Restore from the Cold tier |
| `POST` | `/api/v2/files/exit` | Remove from management |
| `GET` | `/api/v2/files/:file_uuid/locations` | Query location history |
| `POST` | `/api/v2/files/validate` | Verify file integrity (SHA256) |

### 8.3 Mount API

| Method | Path | Description |
|------|------|------|
| `GET` | `/api/v2/mounts` | List all mount points |
| `POST` | `/api/v2/mounts` | Register a new mount point |
| `PUT` | `/api/v2/mounts/:mount_id` | Update a mount point |
| `DELETE` | `/api/v2/mounts/:mount_id` | Remove a mount point |
| `GET` | `/api/v2/mounts/:mount_id/status` | Query mount point status (online or not, capacity) |

### 8.4 Group API

| Method | Path | Description |
|------|------|------|
| `GET` | `/api/v2/groups` | List all groups |
| `POST` | `/api/v2/groups` | Create a new group |
| `DELETE` | `/api/v2/groups/:group_id` | Delete a group |
| `POST` | `/api/v2/groups/:group_id/members` | Invite a member |
| `DELETE` | `/api/v2/groups/:group_id/members/:user_id` | Remove a member |
| `PUT` | `/api/v2/groups/:group_id/members/:user_id/role` | Change a member's role |
| `POST` | `/api/v2/groups/:group_id/files` | Share a file with the group |
| `DELETE` | `/api/v2/groups/:group_id/files/:file_uuid` | Remove a file from the group |
| `GET` | `/api/v2/groups/:group_id/files` | List the group's files |

---

## 9. Decision Log

| # | Date | Decision | Rationale |
|---|------|------|------|
| 1 | 2026-05-13 | Rust modular architecture (DisplayMode trait) | Same ecosystem as Momentry Core; modularity eases extension |
| 2 | 2026-05-13 | One user = one SQLite | User isolation, simple deployment, portable files |
| 3 | 2026-05-13 | Group Share → Option A (Group SQLite) | Self-contained and portable, no dedicated server, simple backups |
| 4 | 2026-05-13 | Hot/Warm/Cold three-tier storage | Real-world file-management needs, combining LTO/NAS/SSD |
| 5 | 2026-05-13 | Auto-archive rules (admin-configurable) | Less manual management; triggered by idle days and tier capacity |
| 6 | 2026-05-14 | file_uuid is inherited from Momentry Core, never recomputed | Single source, avoids inconsistency |
| 7 | 2026-05-14 | file_uuid does not change on tier migration | Frozen at birth, keeps identity stable |
| 8 | 2026-05-14 | Display mode stored in localStorage | Pure UI preference, no backend storage needed |
| 9 | 2026-05-14 | File operations are API-first | UI is added once backend logic is done (compress, transfer, archive) |
| 10 | 2026-05-14 | Exit records (records retained) | Audit requirement; records are not deleted outright |
| 11 | 2026-05-14 | rusqlite (sync) + spawn_blocking (async wrapper) | Avoids forcing the whole stack to be async; keeps things simple |
| 12 | 2026-05-14 | ATTACH DATABASE for Group Share cross-DB queries | One line of SQL, no merging on the Rust side |
| 13 | 2026-05-14 | notify crate (Hot tier only) | Lower resource use; Warm/Cold change infrequently |
| 14 | 2026-05-14 | zip + tar crates (no external CLI) | Cross-platform, no need for ditto/hdiutil |
| 15 | 2026-05-14 | Momentry Core integration in hybrid A+B mode | Crate for lightweight computation, HTTP API for heavy queries |
| 16 | 2026-05-14 | AI Agent on-demand data movement | Transparent migration, memory-hierarchy analogy, automatic hot/cold management |
| 17 | 2026-05-14 | file_locations supports arbitrary URIs | /path, s3://, sfs://, ipfs://, https://, \\SMB\path |

---

## 10. Version History

| Version | Date | Purpose | Operator | Tool / Model |
|------|------|------|--------|-----------|
| V1.0 | 2026-05-12 | Initial design (Demo Display + Knowledge Graph) | M4 / OpenCode | DeepSeek V4 Pro |
| V2.0 | 2026-05-14 | Virtual file tree, Group Share, storage tiers, tech stack, file_uuid, file operation API, AI Agent on-demand data movement, cross-platform multi-location | M4 / OpenCode | DeepSeek V4 Pro |

730
docs_v1.0/DESIGN/MARKBASE_DESIGN_v1.0.0.md
Normal file
@@ -0,0 +1,730 @@
# MarkBase: Momentry's Dedicated Display Engine, Design Proposal v1.0

## Product Positioning

**MarkBase** is Momentry's dedicated Display Engine, serving as the **fixed display for the demo runner**.

It is more than a Markdown reader: it is a controllable content window that can dynamically present:

| Content type | Presentation |
|----------|----------|
| .md documents | Rendered as cleanly typeset HTML |
| Mermaid diagrams | Flowcharts, sequence diagrams, ER diagrams, etc. |
| API response JSON | Pretty-printed JSON with syntax highlighting |
| Video | Embedded video player (HLS / MP4) |
| Images | Single image or carousel |
| HTML | Embedded directly |
| Text / code | Syntax highlighting |

**Positioning in one sentence:** *The demo runner's presentation layer: a focused, clean, controllable content display.*

| Aspect | Description |
|------|------|
| Vision | The UI output terminal of the Momentry ecosystem |
| Core scenario | Fixed display window for the demo runner |
| Platform | macOS native (Rust + axum + Tauri WebView) |
| License | Momentry-internal tool, released with momentry_core |

---

## Naming

**MarkBase**: Markdown + Display Base

> The display base that carries every content type.
> Short, memorable, product-like.

---

## Phase Plan

### Phase 0: Demo Display (MVP, immediate value)

**Goal**: Replace md_reader plus video playback and become the demo runner's fixed display window

| Feature | Description |
|------|------|
| Document rendering | CommonMark + GFM (tables, task lists, strikethrough, footnotes) |
| Mermaid diagrams | Built-in rendering (no CDN); supports flowchart / sequence / class / ER / mindmap |
| Code highlighting | Syntax highlighting (50+ languages) |
| JSON formatting | API responses are formatted and highlighted automatically |
| Video playback | Embedded MP4 / HLS playback (replaces opening trace videos in a browser) |
| Fullscreen mode | Clean, distraction-free mode suited to presentations |
| CLI control | Load content dynamically via stdin / HTTP without restarting |
| Demo runner integration | Started with the `--display` flag as the fixed display window |

#### Demo Runner Integration Flow

```
demo_runner.py --display          MarkBase.app (fixed display window)
┌────────────────────┐          ┌──────────────────────┐
│ Step 3:  Markdown  │ ──HTTP──▶│ render GUIDE.md      │
│ Step 11: Trace 5   │ ──HTTP──▶│ play trace_5.mp4     │
│ Step 13: 3D Cube   │ ──HTTP──▶│ show iframe: portal  │
│ Step 22: API resp  │ ──HTTP──▶│ show formatted JSON  │
└────────────────────┘          └──────────────────────┘
     (controller)                      (display)
```

The demo runner launches MarkBase as its display window via `--display`, then pushes content over HTTP at each step:

```python
# demo_runner.py example: each step type maps to one POST /display payload
def display_payload(step_type, response=None):
    return {
        "markdown": {"type": "md", "file": "GUIDE.md"},
        "video": {"type": "video", "url": "trace_5.mp4"},
        "curl": {"type": "json", "data": response},
        "browser": {"type": "url", "url": "..."},
    }[step_type]
```

### Phase 2: Knowledge Base

**Goal**: Grow from a reader into a personal knowledge-base manager

| Feature | Description |
|------|------|
| Multi-document index | Watch a directory and index every .md automatically |
| Full-text search | Fuzzy search across documents + heading index |
| Tag management | YAML frontmatter tags → tag cloud |
| Backlinks | Bidirectional links between documents ([[wiki-link]]) |
| Favorites / bookmarks | Mark frequently used documents |
| Reading history | Recently opened / recently searched |

### Phase 3: Collaboration

**Goal**: Multi-user collaboration and publishing

| Feature | Description |
|------|------|
| Comments / annotations | Paragraph-level annotations |
| Version history | git-based diff view |
| Static site generation | .md → full HTML site (for publishing) |
| Web version | Readable in a browser (optionally self-hosted) |

---

## CLI Design (for Portal / Demo Use)

### Main Commands

```
markbase display              ← start the display window (blocking, waits for HTTP control)
markbase display "GUIDE.md"   ← start and immediately show a document
markbase preview "GUIDE.md"   ← (retained) one-off preview, does not return control
markbase render "GUIDE.md"    ← (retained) write HTML to stdout
```

### display: the Core Command (Used by the Demo Runner)

```bash
# Start the display window; the demo runner controls it over HTTP
markbase display

# Specify the control port (default 11438)
markbase display --port 11438

# Fullscreen mode
markbase display --fullscreen

# Show a document on startup
markbase display GUIDE.md
```

### HTTP Control API (Enabled in display Mode)

Once started, `markbase display` listens for control requests on `localhost:11438`:

```bash
# Show a .md document
curl -X POST http://localhost:11438/display \
  -H "Content-Type: application/json" \
  -d '{"type":"md","file":"/path/to/doc.md","focus":"API 搜尋"}'

# Play a video
curl -X POST http://localhost:11438/display \
  -H "Content-Type: application/json" \
  -d '{"type":"video","url":"/path/to/trace.mp4","start":10,"end":30}'

# Show formatted JSON
curl -X POST http://localhost:11438/display \
  -H "Content-Type: application/json" \
  -d '{"type":"json","data":"{\"status\":\"ok\"}"}'

# Embed a web page
curl -X POST http://localhost:11438/display \
  -H "Content-Type: application/json" \
  -d '{"type":"url","url":"http://localhost:1420/trace-viz/..."}'

# Show an image
curl -X POST http://localhost:11438/display \
  -H "Content-Type: application/json" \
  -d '{"type":"image","url":"/path/to/thumbnail.jpg"}'

# Control commands
curl -X POST http://localhost:11438/control \
  -H "Content-Type: application/json" \
  -d '{"cmd":"fullscreen"}'
curl -X POST http://localhost:11438/control \
  -H "Content-Type: application/json" \
  -d '{"cmd":"zoom","level":1.5}'
curl -X POST http://localhost:11438/control \
  -H "Content-Type: application/json" \
  -d '{"cmd":"close"}'
```

### demo_runner.py Integration

```python
import subprocess
import time

import requests


class MarkBaseDisplay:
    """Controls the MarkBase display window."""

    def __init__(self, port=11438):
        self.port = port
        self.process = None

    def start(self):
        self.process = subprocess.Popen(
            ["markbase", "display", "--port", str(self.port)]
        )
        time.sleep(1)  # crude wait for the HTTP server to come up

    def show(self, type, **kwargs):
        """Show content. type: md/video/json/url/image"""
        body = {"type": type, **kwargs}
        requests.post(f"http://localhost:{self.port}/display", json=body)

    def show_step(self, step):
        """Pick the display type automatically from the demo step type."""
        t = step["type"]
        if t == "curl":
            # run_curl is assumed to be defined elsewhere in demo_runner.py
            self.show("json", data=run_curl(step["cmd"]))
        elif t == "browser":
            self.show("url", url=step["url"])
        elif t == "markdown":
            self.show("md", file=step["cmd"], focus=step.get("focus"))
        elif t == "video":
            self.show("video", url=step.get("url"))
```

---

## Technical Architecture

```
┌─────────────────────────────────────────────┐
│                MarkBase App                 │
├──────────────────────┬──────────────────────┤
│  Frontend            │  Engine              │
│  (SwiftUI)           │  (Rust core)         │
│                      │                      │
│  • Window management │  • Parse .md → AST   │
│  • Menus, shortcuts  │  • Mermaid rendering │
│  • Settings UI       │  • Code highlighting │
│  • Search UI         │  • Full-text index   │
│  • Outline tree      │  • File watching     │
└──────────────────────┴──────────────────────┘
           │                      │
           ▼                      ▼
   macOS Native API         Rust binary
   (WebKit + Swift)         (pulldown-cmark + syntect + mermaid-rs)
```

### Why Is the Engine Written in Rust?

| Reason | Description |
|------|------|
| Performance | Large .md documents (1000+ lines) render near-instantly |
| No runtime | Single binary, no Node.js/Python dependency |
| Existing foundation | The rendering logic of md_reader can be reused directly |
| Embedded Mermaid | The mermaid-rs crate can replace the CDN |

### Why Is the Frontend Written in SwiftUI?

| Reason | Description |
|------|------|
| Native experience | macOS-native windows, menu bar, shortcuts |
| WebKit integration | Embeds WKWebView directly to render HTML |
| System integration | Spotlight, QuickLook, sharing |
| Performance | Uses 200MB+ less memory than Electron |

---

## UI Design

### Main Window Layout

```
┌────────────────────────────────────────────────────────────┐
│ Menu Bar: File  Edit  View  Window  Help                   │
├────────────────┬───────────────────────────────────────────┤
│                │                                           │
│ Sidebar        │ Main content area                         │
│ ──────         │ ─────────────────                         │
│ 📁 Files       │ # Heading                                 │
│  ├ README      │ Body text...                              │
│  ├ Guide       │ ```code block```                          │
│  └ API         │ Table                                     │
│                │ [Mermaid diagram]                         │
│ Contents       │                                           │
│ ──────         │                                           │
│ • Introduction │                                           │
│ • Getting...   │                                           │
│ • API Ref      │                                           │
│                │                                           │
├────────────────┴───────────────────────────────────────────┤
│ Status Bar: words | paragraphs | UTF-8 | dark mode toggle  │
└────────────────────────────────────────────────────────────┘
```

### Keyboard Shortcuts

| Key | Function |
|------|------|
| `Cmd+O` | Open a .md document |
| `Cmd+F` | Full-text search |
| `Cmd+Shift+F` | Cross-document search |
| `Cmd++` / `Cmd+-` | Adjust font size |
| `Cmd+D` | Toggle dark mode |
| `Cmd+B` | Toggle the left sidebar outline |
| `Cmd+P` | Print / export to PDF |
| `Esc` | Close search / return to browsing |

---

## Directory Structure

```
markbase/
├── Cargo.toml              # Rust core
├── src/
│   ├── main.rs             # CLI entry point
│   ├── render.rs           # .md → HTML
│   ├── highlight.rs        # Code syntax highlighting
│   ├── mermaid.rs          # Mermaid rendering
│   ├── search.rs           # Full-text search
│   └── watch.rs            # File watcher
├── app/                    # SwiftUI app
│   ├── MarkBase.xcodeproj
│   ├── MarkBase/
│   │   ├── ContentView.swift
│   │   ├── SidebarView.swift
│   │   ├── SearchView.swift
│   │   └── SettingsView.swift
│   └── markbase-cli        # Embedded Rust binary
└── docs/
    └── ARCHITECTURE.md
```

---

## Differences from the Existing md_reader

| Aspect | md_reader | MarkBase |
|------|-----------|----------|
| Language | Pure Rust CLI | Rust engine + SwiftUI app |
| Architecture | Single main.rs, 1134 lines | Modular, 6+ files |
| Window | Bare-bones WebKit window | Full SwiftUI + WKWebView |
| Search | ❌ None | ✅ Cmd+F + cross-document search |
| Outline | ❌ None | ✅ Heading tree in the left sidebar |
| File watcher | ❌ None | ✅ Automatic directory indexing |
| Dark mode | ❌ None | ✅ Follows system + manual |
| Mermaid | CDN-based | Built-in engine |
| Code highlight | ❌ None | ✅ syntect, 50+ languages |
| Naming | Functional description | Product brand |

---

## Technology Selection Log

> Added 2026-05-12

### 1. Conversion Engines

| Tool | License | Purpose |
|------|---------|------|
| pandoc 3.9 | GPL 2.0 | MD ↔ DOCX/PPTX/PDF |
| LibreOffice 26.2 | MPL 2.0 | Any format ↔ any format (headless CLI) |
| mmdc | MIT | Mermaid → SVG/PNG |
| rsvg-convert | LGPL | SVG → PNG |

### 2. Editor Selection

| Option | Decision | Rationale |
|------|:--:|------|
| CodeMirror 6 | ✅ Selected | MIT, 190KB gzip, loadable from a CDN without npm, modular |
| Monaco (VS Code) | ❌ | 5MB is too large, needs webpack |
| Ace | ❌ | Maintenance has stalled |

### 3. Markdown Ecosystem Analysis

| Tool | License | Type | Takeaway for MarkBase |
|------|---------|------|--------------|
| glow | MIT | CLI renderer | Kept as a standalone CLI viewer |
| MarkText | MIT | WYSIWYG GUI | Reference for the split-pane edit/preview design |
| mdcat | MPL 2.0 | CLI | Reference for terminal image rendering |
| bat | MIT/Apache | CLI | Reference for the syntax-highlighting strategy |
| mdBook | MPL 2.0 | CLI | Export format for static documentation sites |
| MkDocs | BSD | CLI | Alternative documentation-site option |
| Obsidian | Proprietary | Desktop PKM | Reference for `[[wiki links]]`, graph view, backlinks |

### 4. Desktop vs Web

| Decision | Choice | Rationale |
|------|:--:|------|
| Web first | ✅ | Works on any device with one set of HTML/JS/CSS |
| Tauri shell | ✅ Optional | <10MB, cross-platform macOS/Win/Linux |
| Electron | ❌ | 300MB is too bloated |

### 5. MarkBase vs Obsidian Positioning

| | Obsidian | MarkBase |
|------|:--:|:--:|
| Positioning | Personal knowledge management (PKM) | **Document processing engine + editor** |
| Data formats | .md only | All formats (via soffice) |
| Search | Full text | RAG + embedding (Qdrant) |
| Backend | None | axum HTTP + PSQL + Qdrant |
| CLI | None | ✅ CLI first |
| Pipeline | None | ✅ Chunking + LLM pipeline |
| Cross-device | Paid sync | A self-hosted server is enough |
| Size | ~300MB (Electron) | <10MB (Tauri) |
| License | Proprietary (free for personal use) | Momentry-internal |

### 6. CLI Design

```
markbase display [--port 11438] [FILE]    Start the display server
markbase render <FILE> [-o output.html]   Markdown → HTML
markbase serve <DIR>                      File browser + editor (planned)
```

### 7. Architecture Comparison

```
Obsidian:                      MarkBase:
┌──────────────────────┐       ┌──────────────────────┐
│ Electron Shell       │       │ Tauri / Browser      │
│ ┌──────────────────┐ │       │ ┌──────────────────┐ │
│ │ Renderer         │ │       │ │ Renderer         │ │
│ │ ├─ CodeMirror    │ │       │ │ ├─ CodeMirror    │ │  ← same
│ │ ├─ Graph/D3      │ │       │ │ ├─ Mermaid.js    │ │  ← same
│ │ ├─ Mermaid.js    │ │       │ │ └─ pulldown      │ │
│ │ └─ MathJax       │ │       │ └──────────────────┘ │
│ └──────────────────┘ │       │ ┌──────────────────┐ │
│ ┌──────────────────┐ │       │ │ Rust Backend     │ │  ← MarkBase only
│ │ Plugin API       │ │       │ │ ├─ axum HTTP     │ │
│ │ 1,800+ plugins   │ │       │ │ ├─ Embedding     │ │
│ └──────────────────┘ │       │ │ ├─ Qdrant ANN    │ │
│ ┌──────────────────┐ │       │ │ ├─ pgvector      │ │
│ │ FS Access        │ │       │ │ ├─ PG TKG        │ │
│ │ .md files only   │ │       │ │ ├─ SQLite TKG    │ │
│ └──────────────────┘ │       │ │ ├─ sqlite-vec    │ │
└──────────────────────┘       │ │ └─ Pipeline      │ │
                               │ └──────────────────┘ │
                               └──────────────────────┘
```

### 8. Vector Storage: sqlite-vec + Datasette

> Adopted 2026-05-12

#### Selection

| Requirement | pgvector (PG) | Qdrant | sqlite-vec | Decision |
|------|:--:|:--:|:--:|:--:|
| Production API (3003) | ✅ | — | — | pgvector (already in place) |
| HNSW ANN search | ⚠️ | ✅ | — | Qdrant (already in place) |
| Desktop local RAG | ❌ needs PG installed | ❌ needs a server | ✅ single file | sqlite-vec |
| Vectors embedded in the file package | ❌ | ❌ | ✅ ships with the package | sqlite-vec |
| Works offline | ❌ | ❌ | ✅ | sqlite-vec |
| Web UI queries | — | — | via Datasette | Datasette |

#### sqlite-vec Specifications

| Property | Value |
|------|-----|
| License | MIT + Apache 2.0 (dual-licensed) |
| Author | Alex Garcia |
| Sponsors | Mozilla Builders + Fly.io + Turso + SQLite Cloud |
| Stars | 7,600+ |
| Language | Pure C, zero dependencies |
| Size | ~200KB `.dylib` |
| ANN engines | exhaustive, IVF, DiskANN |
| Rust binding | `cargo add sqlite-vec` |

#### Datasette (Optional Web UI)

| Property | Value |
|------|-----|
| License | Apache 2.0 |
| Author | Simon Willison |
| Positioning | SQLite → Web UI + JSON API |
| Plugins | 154 |
| sqlite-vec plugin | `datasette-sqlite-vec` (by the sqlite-vec author) |

#### Usage Example

```sql
.load ./vec0

CREATE VIRTUAL TABLE chunks USING vec0(
  embedding float[768],
  file_uuid text,
  chunk_type text,
  text_content text
);

-- Name the columns explicitly so the embedding parameter binds unambiguously
INSERT INTO chunks (embedding, file_uuid, chunk_type, text_content)
VALUES (?, 'uuid-123', 'sentence', 'hello world');

SELECT rowid, text_content, distance
FROM chunks WHERE embedding MATCH ?
ORDER BY distance LIMIT 10;
```

#### Four-Layer Vector Architecture

```
Production  ← Qdrant    (HNSW ANN, fast at scale)
            ← pgvector  (transactional, alongside chunk data)
                ↓ backup / export

Portable    ← sqlite-vec (.sqlite single file, package distributable)
            ← Datasette  (optional Web UI)
```

### 9. Qdrant Graph Analysis

> 2026-05-12 conclusion: Qdrant has **no** native graph features; it is a pure vector database

#### Existing Qdrant Features

| Feature | Description | Graph capability |
|------|------|:--:|
| **Payload filtering** | Vector search + JSON condition filters | ⚠️ Pseudo-relational queries |
| **Collection aliases** | Joint queries across multiple collections | ⚠️ Basic |
| **Hybrid Queries** | Vector + keyword hybrid | ❌ |
| **Qdrant Edge** | Embedded vector search | ❌ Not a graph |
| **Data Graphs (third party)** | Neo4j + Qdrant hybrid RAG | ✅ Not native |

#### Limits of Payload Filtering

Payload filtering can simulate 1-hop relations (for example, "find the chunks where Cary Grant speaks"), but it cannot perform real graph traversal:

```json
// ✅ 1-hop: filter speaker = "Cary Grant"
{"filter": {"must": [{"key": "speaker", "match": {"value": "Cary Grant"}}]}}

// ❌ 2-hop: graph traversal that Qdrant cannot do
// "Who appears in the same scene as Cary Grant?"
// "Which of those people also talk to Audrey Hepburn?"
```

| Limitation | Description |
|------|------|
| ❌ 2-hop+ traversal | No relational queries across nodes |
| ❌ Edge weights / time | No concept of edge properties |
| ❌ Graph algebra | No algorithms such as `shortest_path` or `PageRank` |
| ❌ Cypher/GQL | No graph query language |

#### Momentry TKG decision

| | Qdrant-only | PG TKG | SQLite TKG | Neo4j |
|---|:--:|:--:|:--:|:--:|
| Vector search | ✅ native | via pgvector | via sqlite-vec | via plugin |
| Graph traversal | ❌ | ✅ CTE | ✅ CTE | ✅ native |
| 2-hop+ queries | ❌ | ✅ | ✅ | ✅ |
| Time-ranged edges | ❌ | ✅ | ✅ | ✅ |
| Deployment | needs a server | needs PG | **single file** | needs Java |
| Distributable as a file package | ❌ | ❌ | ✅ | ❌ |
| Suitable scale | large | medium | small to medium | large |
|
||||
|
||||
#### 架構分工
|
||||
|
||||
```
|
||||
Qdrant → 向量搜尋(ANN)- 核心效能
|
||||
PG → TKG 圖查詢(Recursive CTE)- API server
|
||||
SQLite → TKG 圖查詢(Recursive CTE)- 檔案包/離線
|
||||
```
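The 2-hop questions that payload filtering cannot answer map naturally onto a recursive CTE, which is what the PG and SQLite rows of the division of labour rely on. The following is a minimal sketch using Python's built-in `sqlite3`; the `edges` table, its columns, the function name `reachable_within`, and the sample rows are all hypothetical and only illustrate the traversal, not Momentry's actual TKG schema.

```python
import sqlite3

def reachable_within(conn, start, max_hops):
    """Return {person: minimum hop count} for everyone reachable from `start`."""
    rows = conn.execute(
        """
        WITH RECURSIVE walk(person, hops) AS (
            SELECT ?, 0
            UNION
            SELECT e.dst, w.hops + 1
            FROM walk AS w
            JOIN edges AS e ON e.src = w.person
            WHERE w.hops < ?
        )
        SELECT person, MIN(hops) FROM walk
        WHERE person <> ?
        GROUP BY person
        """,
        (start, max_hops, start),
    ).fetchall()
    return dict(rows)

# Hypothetical relationship data, stored in both directions (undirected graph).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT, relation TEXT)")
pairs = [
    ("Cary Grant", "Audrey Hepburn", "dialogue"),
    ("Audrey Hepburn", "Walter Matthau", "dialogue"),
    ("Walter Matthau", "Ned Glass", "co_scene"),
]
for src, dst, rel in pairs:
    conn.execute("INSERT INTO edges VALUES (?, ?, ?)", (src, dst, rel))
    conn.execute("INSERT INTO edges VALUES (?, ?, ?)", (dst, src, rel))

hops = reachable_within(conn, "Cary Grant", 2)
```

The `hops < ?` bound keeps the recursion finite even though the graph contains cycles. The same `WITH RECURSIVE` form should carry over to PostgreSQL with only placeholder-syntax changes.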
|
||||
|
||||
---
|
||||
|
||||
## 亮點:知識圖譜 (Knowledge Graph)
|
||||
|
||||
> 2026-05-12 新增
|
||||
|
||||
### Obsidian vs MarkBase 圖譜對比
|
||||
|
||||
| | Obsidian Graph | MarkBase Knowledge Graph |
|
||||
|------|:--:|:--:|
|
||||
| 節點來源 | 手動建立的 `.md` 筆記 | AI pipeline 自動產生的 chunks |
|
||||
| 邊緣來源 | 手寫 `[[wikilinks]]` | **語意相似度**、結構層級、共現關係 |
|
||||
| 生成方式 | 人工 | **自動**(embedding + clustering) |
|
||||
| 影片支援 | ❌ | ✅ face traces, speaker graph, scene transitions |
|
||||
| 實體辨識 | ❌ | ✅ 人臉/說話者/物件/場景 |
|
||||
| 規模 | 數百節點 | **數萬節點**(chunk 級) |
|
||||
| 過濾 | 無 | 時間範圍、置信度、chunk type |
|
||||
|
||||
### 圖譜類型
|
||||
|
||||
#### A. 語意關係圖(Semantic Graph)
|
||||
|
||||
以 embedding 餘弦相似度建立邊緣,相近 chunk 靠近。
|
||||
|
||||
```
|
||||
[Audrey Hepburn 說話] ──0.82── [Cary Grant 回應]
|
||||
│ │
|
||||
│ 0.75 │ 0.78
|
||||
▼ ▼
|
||||
[討論離婚原因] ──0.91── [緊張對話場景]
|
||||
```
|
||||
|
||||
**演算法**:
|
||||
1. 取所有 chunk embedding
|
||||
2. 計算 pairwise cosine similarity
|
||||
3. 保留 top-K 相似邊(K=5 預設)
|
||||
4. 用 UMAP/t-SNE → 2D 座標
|
||||
5. D3.js force layout 渲染
|
||||
|
||||
#### B. 結構層級圖(Hierarchy Graph)
|
||||
|
||||
文件 → 章節 → 段落 的三層樹狀結構。
|
||||
|
||||
#### C. 人物關係圖(Identity Graph)
|
||||
|
||||
基於 face_detections + speaker_assign。
|
||||
|
||||
```
|
||||
Cary Grant ──[對手戲]── Audrey Hepburn
|
||||
│ │
|
||||
│[對話] │[場景共現]
|
||||
▼ ▼
|
||||
Walter Matthau ────── Ned Glass
|
||||
```
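One way to derive the scene co-occurrence edges of such an identity graph is to count how often two people appear in the same scene. The sketch below assumes a hypothetical input shape (a mapping from scene id to the people detected in it); how `face_detections` and `speaker_assign` are actually joined into that shape is not specified here.

```python
from collections import Counter
from itertools import combinations

def co_occurrence_edges(scene_people, min_count=1):
    """Build undirected co-occurrence edges from {scene_id: [person, ...]}."""
    counts = Counter()
    for people in scene_people.values():
        # Sort and de-duplicate so each pair is counted once per scene.
        for a, b in combinations(sorted(set(people)), 2):
            counts[(a, b)] += 1
    return [
        {"source": a, "target": b, "weight": n, "type": "co_scene"}
        for (a, b), n in sorted(counts.items())
        if n >= min_count
    ]

# Hypothetical sample data.
scenes = {
    "scene_1": ["Cary Grant", "Audrey Hepburn"],
    "scene_2": ["Audrey Hepburn", "Cary Grant", "Walter Matthau"],
}
edges = co_occurrence_edges(scenes, min_count=2)
```

Raising `min_count` is a simple way to drop incidental one-off appearances before rendering.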
|
||||
|
||||
#### D. 時序演進圖(Timeline Graph)
|
||||
|
||||
Chunks 按時間軸排列,場景切換點標記。X 軸 = 時間,Y 軸 = 說話者。
|
||||
|
||||
### 渲染技術
|
||||
|
||||
| 層 | 工具 | License |
|
||||
|----|------|---------|
|
||||
| 力導向佈局 | D3-force (d3.js v7) | ISC |
|
||||
| 降維 (UMAP) | umap-js | MIT |
|
||||
| 2D 繪圖 | Canvas / SVG via D3 | ISC |
|
||||
| 3D 繪圖 | Three.js | MIT |
|
||||
| 節點過濾 | Crossfilter / vanilla JS | — |
|
||||
|
||||
### API 設計
|
||||
|
||||
```
|
||||
GET /api/v1/graph/:file_uuid/identity → 人物關係圖資料
|
||||
GET /api/v1/graph/:file_uuid/semantic?depth=3 → 語意圖資料
|
||||
GET /api/v1/graph/:file_uuid/hierarchy → 結構層級圖
|
||||
GET /api/v1/graph/:file_uuid/timeline → 時序圖資料
|
||||
```
|
||||
|
||||
回傳格式:
|
||||
```json
|
||||
{
|
||||
"nodes": [
|
||||
{"id": "chunk_100", "label": "Cary Grant: What's your name?", "group": 3, "x": 0.1, "y": 0.5}
|
||||
],
|
||||
"edges": [
|
||||
{"source": "chunk_100", "target": "chunk_104", "weight": 0.82, "type": "semantic"}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### 互動設計
|
||||
|
||||
| 操作 | 行為 |
|
||||
|------|------|
|
||||
| Drag node | 拖曳節點 |
|
||||
| Click node | 展開 chunk 內容預覽 |
|
||||
| Scroll | 縮放圖譜 |
|
||||
| Filter bar | 依 chunk_type / speaker / confidence 過濾 |
|
||||
| Double-click | 聚焦該節點,展開子圖 |
|
||||
| Hover edge | 顯示相似度分數 |
|
||||
|
||||
### 圖譜渲染工具選型
|
||||
|
||||
> 2026-05-12 新增
|
||||
|
||||
#### 候選工具對比
|
||||
|
||||
| 工具 | License | 大小 | CDN | 圖論演算法 | 中國社群 | 最佳場景 |
|
||||
|------|---------|:--:|:--:|:--:|:--:|------|
|
||||
| **Cytoscape.js** | MIT | ~120KB | ✅ | ✅ BFS/DFS/PageRank | ⚠️ | 複雜網絡圖 |
|
||||
| D3.js v7 | ISC | ~80KB | ✅ | ❌ 需自寫 | ⚠️ | 任何自訂圖表 |
|
||||
| ECharts | Apache 2.0 | ~1MB | ✅ | ❌ | ✅ 非常大 | 通用圖表 + 地圖 |
|
||||
| G6 (AntV) | MIT | ~500KB | ✅ | ✅ 多種佈局 | ✅ 非常大 | 關係圖專用 |
|
||||
| vis-network | MIT/Apache | ~300KB | ✅ | ❌ | ❌ | 網絡圖 |
|
||||
| Sigma.js | MIT | ~80KB | ✅ | ❌ | ❌ | WebGL 大圖 (>5000節點) |
|
||||
| Graphviz | EPL 1.0 | ~3MB | ❌ CLI only | ✅ | ⚠️ | 靜態匯出 SVG/PNG |
|
||||
|
||||
#### 選型過程
|
||||
|
||||
**第一輪篩選**:排除 CLI-only (Graphviz)、無 CDN、中文社群弱且圖論支援差的 (vis-network, Sigma.js)。
|
||||
|
||||
剩餘:Cytoscape.js, D3.js, ECharts, G6。
|
||||
|
||||
**第二輪深度評估**:
|
||||
|
||||
| | Cytoscape.js | D3.js | ECharts | G6 |
|
||||
|---|:--:|:--:|:--:|:--:|
|
||||
| 力導向佈局 | ✅ 9 種 | ✅ 自寫 | ✅ 1 種內建 | ✅ 9 種 |
|
||||
| 複合節點 (compound) | ✅ | ❌ | ❌ | ✅ |
|
||||
| 圖論演算法 | ✅ 內建 | ❌ | ❌ | ✅ |
|
||||
| JSON → Graph | ✅ 原生 | ⚠️ 手動 | ⚠️ 手動 | ✅ 原生 |
|
||||
| TreeGraph | ⚠️ 需擴展 | ✅ | ❌ | ✅ 專用 |
|
||||
| 大型圖效能 | ⚠️ (>5000會慢) | ✅ | ✅ Canvas | ✅ |
|
||||
| 互動 API | ✅ 豐富 | ✅ 最靈活 | ✅ | ✅ |
|
||||
| 零外部依賴 | ✅ | ✅ | ❌ (zrender) | ❌ |
|
||||
|
||||
**最終決策**:
|
||||
|
||||
| 場景 | 選用 | 理由 |
|
||||
|------|:--:|------|
|
||||
| 知識圖譜核心 | **Cytoscape.js** | 圖論演算法、fCoSE 佈局、JSON 原生對接、Obsidian/Mermaid 都用 |
|
||||
| 統計輔助圖表 | **ECharts** | 中文社群大、Apache 背書、長條/圓餅/分佈圖開箱即用 |
|
||||
| 樹狀層級圖 | **G6 TreeGraph** | 專用 API,文件結構圖最簡潔 |
|
||||
| 自訂特殊需求 | **D3.js** | 保底方案,任何無法滿足的圖表 |
|
||||
|
||||
#### Cytoscape.js 使用者背書
|
||||
|
||||
| 組織 | 用途 |
|
||||
|------|------|
|
||||
| **Mermaid** | 流程圖/時序圖渲染引擎 |
|
||||
| **Obsidian** | 知識圖譜 (Graph View) |
|
||||
| Amazon, Google, Meta, Microsoft | 內部網絡圖視覺化 |
|
||||
| IBM, Cisco, Tencent, Uber | 網路拓樸視覺化 |
|
||||
| GitHub | 相依性圖 |
|
||||
|
||||
#### 整合架構
|
||||
|
||||
```
|
||||
MarkBase Knowledge Graph:
|
||||
┌──────────────────────────────────────┐
|
||||
│ 圖譜類型 渲染引擎 │
|
||||
│ ───────── ──────── │
|
||||
│ 語意關係圖 → Cytoscape.js │
|
||||
│ 結構層級圖 → G6 TreeGraph │
|
||||
│ 人物關係圖 → Cytoscape.js │
|
||||
│ 時序演進圖 → ECharts timeline │
|
||||
│ 降維散點圖 → D3.js │
|
||||
│ 統計分佈圖 → ECharts │
|
||||
│ │
|
||||
│ 全部 CDN 載入,無需 npm │
|
||||
└──────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### 在 MarkBase 中的整合
|
||||
|
||||
```
|
||||
MarkBase Control Bar:
|
||||
⏮ ◀ ▶ ⏭ | Graph | Tree | Edit | 🔍
|
||||
↑
|
||||
Knowledge Graph View
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 開發路線圖
|
||||
|
||||
| 階段 | 時程 | 交付 |
|
||||
|------|:----:|------|
|
||||
| P0 Core rendering | ✅ Done | Rust engine: .md→HTML with Mermaid + AJAX refresh |
|
||||
| P1 macOS app | ✅ Done | Tauri shell (可選) |
|
||||
| P2 File tree + Editor | 2-3d | CodeMirror 6 + lazy-load 樹狀瀏覽 + 存檔 |
|
||||
| P3 Knowledge Graph | 3-5d | Cytoscape.js + G6 + ECharts: 語意/結構/人物關係圖譜 |
|
||||
| P4 Knowledge base | 3-5d | 多文件索引、全文檢索、backlinks |
|
||||
| P5 Export | 2d | 轉檔 CLI (md→pdf/docx/pptx) |
|
||||
| P6 Collaboration | 5-10d | 評論、版本、靜態站點 |
|
||||
647
docs_v1.0/DESIGN/MODULE_STANDARDIZATION_SPECIFICATION.md
Normal file
@@ -0,0 +1,647 @@
|
||||
---
|
||||
document_type: "reference_doc"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "處理器模組標準化規範"
|
||||
date: "2026-04-25"
|
||||
version: "V1.0"
|
||||
status: "active"
|
||||
owner: "Warren"
|
||||
created_by: "OpenCode"
|
||||
tags:
|
||||
- "處理器模組標準化規範"
|
||||
ai_query_hints:
|
||||
- "查詢 處理器模組標準化規範 的內容"
|
||||
- "處理器模組標準化規範 的主要目的是什麼?"
|
||||
- "如何操作或實施 處理器模組標準化規範?"
|
||||
---
|
||||
|
||||
# 處理器模組標準化規範
|
||||
|
||||
## 概述
|
||||
|
||||
本規範定義 Momentry Core 中處理器模組的標準化架構、接口和實現模式。目標是確保所有處理器模組(ASR、OCR、YOLO、Face、Pose、CUT、ASRX、Caption、Story)遵循一致的設計原則,提高代碼可維護性、可測試性和可擴展性。
|
||||
|
||||
## 架構原則
|
||||
|
||||
### 1. 分層架構
|
||||
```
|
||||
┌─────────────────────────────────────────┐
|
||||
│ Rust API 層 │
|
||||
│ (src/core/processor/*.rs) │
|
||||
├─────────────────────────────────────────┤
|
||||
│ Python 執行層 │
|
||||
│ (scripts/*_processor.py) │
|
||||
├─────────────────────────────────────────┤
|
||||
│ AI 模型層 │
|
||||
│ (Whisper, YOLO, EasyOCR, etc.) │
|
||||
└─────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### 2. 職責分離
|
||||
- **Rust 層**: 接口定義、錯誤處理、配置管理、結果解析
|
||||
- **Python 層**: AI 模型調用、數據處理、中間文件管理
|
||||
- **模型層**: 特定 AI 任務執行
|
||||
|
||||
## Rust 模組規範
|
||||
|
||||
### 文件結構
|
||||
```
|
||||
src/core/processor/
|
||||
├── mod.rs # 模組導出
|
||||
├── executor.rs # Python 執行器(共享)
|
||||
├── asr.rs # ASR 處理器
|
||||
├── ocr.rs # OCR 處理器
|
||||
├── yolo.rs # YOLO 處理器
|
||||
├── face.rs # 人臉檢測處理器
|
||||
├── pose.rs # 姿態檢測處理器
|
||||
├── cut.rs # 場景切割處理器
|
||||
├── asrx.rs # ASRX 處理器
|
||||
├── caption.rs # 字幕生成處理器
|
||||
└── story.rs # 故事分析處理器
|
||||
```
|
||||
|
||||
### 模組模板
|
||||
|
||||
#### 1. 結果結構定義
|
||||
```rust
use anyhow::{Context, Result};
use serde::{Deserialize, Serialize};
use std::time::Duration;

use super::executor::PythonExecutor;
use crate::core::config::processor;

// Top-level result structure
#[derive(Debug, Serialize, Deserialize)]
pub struct ModuleResult {
    // Common fields
    pub processing_time: Option<f64>,
    pub metadata: Option<serde_json::Value>,

    // Units produced by the module. The processing function template
    // reads `result.data_units`, so the field must exist here.
    pub data_units: Vec<DataUnit>,

    // Module-specific fields
    // ...
}

// Data unit structure
#[derive(Debug, Serialize, Deserialize)]
pub struct DataUnit {
    // Time- or frame-related fields
    pub start: f64,
    pub end: f64,
    pub frame: u64,

    // Payload
    // ...
}
```
|
||||
|
||||
#### 2. 處理函數模板
|
||||
```rust
pub async fn process_module(
    video_path: &str,
    output_path: &str,
    uuid: Option<&str>,
) -> Result<ModuleResult> {
    // 1. Initialize the executor
    let executor = PythonExecutor::new()?;
    let script_path = executor.script_path("module_processor.py");

    // 2. Log the start of processing
    tracing::info!("[MODULE] Starting processing: {}", video_path);

    // 3. Run the Python script
    executor
        .run(
            "module_processor.py",
            &[video_path, output_path],
            uuid,
            "MODULE",
            Some(Duration::from_secs(*processor::MODULE_TIMEOUT_SECS)),
        )
        .await
        .with_context(|| format!("Failed to run {:?}", script_path))?;

    // 4. Read and parse the result
    let json_str = std::fs::read_to_string(output_path)
        .context("Failed to read module output")?;

    let result: ModuleResult = serde_json::from_str(&json_str)
        .context("Failed to parse module output")?;

    // 5. Log a summary of the result
    tracing::info!(
        "[MODULE] Result: processed {} units",
        result.data_units.len()
    );

    Ok(result)
}
```
|
||||
|
||||
#### 3. 配置管理
|
||||
```rust
|
||||
// 在 src/core/config.rs 中添加
|
||||
pub mod processor {
|
||||
use super::*;
|
||||
|
||||
pub static MODULE_TIMEOUT_SECS: Lazy<u64> = Lazy::new(|| {
|
||||
env::var("MOMENTRY_MODULE_TIMEOUT")
|
||||
.unwrap_or_else(|_| "3600".to_string())
|
||||
.parse()
|
||||
.unwrap_or(3600)
|
||||
});
|
||||
|
||||
pub static MODULE_CHUNK_SIZE: Lazy<u64> = Lazy::new(|| {
|
||||
env::var("MOMENTRY_MODULE_CHUNK_SIZE")
|
||||
.unwrap_or_else(|_| "300".to_string())
|
||||
.parse()
|
||||
.unwrap_or(300)
|
||||
});
|
||||
}
|
||||
```
|
||||
|
||||
#### 4. 測試規範
|
||||
```rust
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn test_result_serialization() {
|
||||
// 測試序列化/反序列化
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_empty_result() {
|
||||
// 測試邊界條件
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_integration() {
|
||||
// 集成測試(可選)
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Python 腳本規範
|
||||
|
||||
### 文件命名
|
||||
```
|
||||
scripts/
|
||||
├── module_processor.py # 主要處理腳本
|
||||
├── module_utils.py # 工具函數(可選)
|
||||
└── module_debug.py # 調試腳本(可選)
|
||||
```
|
||||
|
||||
### 腳本模板
|
||||
```python
|
||||
#!/opt/homebrew/bin/python3.11
|
||||
"""
|
||||
模組處理器 - 標準化模板
|
||||
|
||||
功能:執行 [模組名稱] 處理
|
||||
輸入:視頻文件路徑,輸出文件路徑
|
||||
輸出:JSON 格式的處理結果
|
||||
"""
|
||||
|
||||
import sys
|
||||
import json
|
||||
import os
|
||||
import argparse
|
||||
import signal
|
||||
import tempfile
|
||||
import time
|
||||
from pathlib import Path
|
||||
from typing import Dict, Any, List, Optional
|
||||
|
||||
# 環境檢查
|
||||
def check_environment() -> bool:
|
||||
"""檢查必要的環境和依賴"""
|
||||
try:
|
||||
# 檢查必要庫
|
||||
import required_library
|
||||
return True
|
||||
except ImportError as e:
|
||||
print(f"ERROR: Missing dependency: {e}", file=sys.stderr)
|
||||
return False
|
||||
|
||||
# 信號處理
|
||||
def signal_handler(signum, frame):
|
||||
"""處理中斷信號"""
|
||||
print(f"[MODULE] Received signal {signum}, cleaning up...")
|
||||
sys.exit(1)
|
||||
|
||||
# 主要處理類
|
||||
class ModuleProcessor:
|
||||
def __init__(self, video_path: str, output_path: str):
|
||||
self.video_path = video_path
|
||||
self.output_path = output_path
|
||||
self.start_time = time.time()
|
||||
|
||||
def validate_input(self) -> bool:
|
||||
"""驗證輸入文件"""
|
||||
if not os.path.exists(self.video_path):
|
||||
print(f"ERROR: Video file not found: {self.video_path}", file=sys.stderr)
|
||||
return False
|
||||
return True
|
||||
|
||||
def process(self) -> Dict[str, Any]:
|
||||
"""執行處理邏輯"""
|
||||
try:
|
||||
# 1. 準備工作目錄
|
||||
work_dir = tempfile.mkdtemp(prefix="module_")
|
||||
|
||||
# 2. 執行核心處理邏輯
|
||||
result = self._core_processing(work_dir)
|
||||
|
||||
# 3. 添加元數據
|
||||
result["metadata"] = {
|
||||
"processing_time": time.time() - self.start_time,
|
||||
"video_path": self.video_path,
|
||||
"timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
|
||||
"module_version": "1.0.0"
|
||||
}
|
||||
|
||||
return result
|
||||
|
||||
except Exception as e:
|
||||
print(f"ERROR: Processing failed: {e}", file=sys.stderr)
|
||||
raise
|
||||
|
||||
def _core_processing(self, work_dir: str) -> Dict[str, Any]:
|
||||
"""核心處理邏輯(模組特定)"""
|
||||
# 模組特定實現
|
||||
return {
|
||||
"data_units": [],
|
||||
"summary": {}
|
||||
}
|
||||
|
||||
def save_result(self, result: Dict[str, Any]):
|
||||
"""保存結果到文件"""
|
||||
with open(self.output_path, 'w', encoding='utf-8') as f:
|
||||
json.dump(result, f, ensure_ascii=False, indent=2)
|
||||
|
||||
print(f"[MODULE] Result saved to: {self.output_path}")
|
||||
|
||||
# 命令行接口
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(description="模組處理器")
|
||||
parser.add_argument("video_path", help="輸入視頻文件路徑")
|
||||
parser.add_argument("output_path", help="輸出 JSON 文件路徑")
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
# 設置信號處理
|
||||
signal.signal(signal.SIGINT, signal_handler)
|
||||
signal.signal(signal.SIGTERM, signal_handler)
|
||||
|
||||
# 環境檢查
|
||||
if not check_environment():
|
||||
sys.exit(1)
|
||||
|
||||
# 執行處理
|
||||
processor = ModuleProcessor(args.video_path, args.output_path)
|
||||
|
||||
if not processor.validate_input():
|
||||
sys.exit(1)
|
||||
|
||||
try:
|
||||
result = processor.process()
|
||||
processor.save_result(result)
|
||||
print(f"[MODULE] Processing completed successfully")
|
||||
|
||||
except Exception as e:
|
||||
print(f"ERROR: {e}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
```
|
||||
|
||||
### 輸出格式規範
|
||||
```json
|
||||
{
|
||||
"data_units": [
|
||||
{
|
||||
"id": "unit_1",
|
||||
"start": 0.0,
|
||||
"end": 2.5,
|
||||
"frame": 0,
|
||||
"data": {},
|
||||
"confidence": 0.95
|
||||
}
|
||||
],
|
||||
"summary": {
|
||||
"total_units": 1,
|
||||
"processing_time": 4.7,
|
||||
"average_confidence": 0.95
|
||||
},
|
||||
"metadata": {
|
||||
"video_path": "/path/to/video.mp4",
|
||||
"module": "module_name",
|
||||
"version": "1.0.0",
|
||||
"timestamp": "2026-03-27 10:30:00"
|
||||
}
|
||||
}
|
||||
```
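The `summary` object can be derived from `data_units` rather than filled in by hand, which keeps the two from drifting apart. A minimal sketch, assuming a hypothetical helper named `build_summary` and the field names shown in the format above:

```python
def build_summary(data_units, processing_time):
    """Derive the `summary` object from the list of data units."""
    confidences = [u["confidence"] for u in data_units if "confidence" in u]
    # Leave the average unset when no unit reports a confidence.
    average = sum(confidences) / len(confidences) if confidences else None
    return {
        "total_units": len(data_units),
        "processing_time": processing_time,
        "average_confidence": average,
    }

units = [
    {"id": "unit_1", "start": 0.0, "end": 2.5, "frame": 0, "data": {}, "confidence": 0.95}
]
summary = build_summary(units, 4.7)
```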
|
||||
|
||||
## 配置標準化
|
||||
|
||||
### 環境變量
|
||||
```
|
||||
# 超時設置
|
||||
MOMENTRY_ASR_TIMEOUT=3600
|
||||
MOMENTRY_OCR_TIMEOUT=7200
|
||||
MOMENTRY_YOLO_TIMEOUT=7200
|
||||
MOMENTRY_FACE_TIMEOUT=3600
|
||||
MOMENTRY_POSE_TIMEOUT=3600
|
||||
MOMENTRY_CUT_TIMEOUT=3600
|
||||
MOMENTRY_ASRX_TIMEOUT=3600
|
||||
MOMENTRY_CAPTION_TIMEOUT=1800
|
||||
MOMENTRY_STORY_TIMEOUT=1800
|
||||
|
||||
# 性能設置
|
||||
MOMENTRY_MODULE_CHUNK_SIZE=300
|
||||
MOMENTRY_MODULE_BATCH_SIZE=32
|
||||
MOMENTRY_MODULE_CACHE_ENABLED=true
|
||||
|
||||
# 模型設置
|
||||
MOMENTRY_MODULE_MODEL=base
|
||||
MOMENTRY_MODULE_DEVICE=cpu
|
||||
```
|
||||
|
||||
### 配置優先級
|
||||
1. 命令行參數(最高優先級)
|
||||
2. 環境變量
|
||||
3. 配置文件
|
||||
4. 默認值(最低優先級)
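The four-level precedence above can be sketched as a single lookup function. The function name, the `MOMENTRY_` prefixing rule applied to arbitrary setting names, and the shape of the config-file dictionary are assumptions made for illustration; values read from the environment stay strings and would still need parsing.

```python
import os

def resolve_setting(name, cli_value=None, env=None, file_config=None, default=None):
    """Resolve a setting: CLI argument > environment variable > config file > default."""
    env = os.environ if env is None else env
    file_config = file_config or {}
    if cli_value is not None:
        return cli_value
    env_key = f"MOMENTRY_{name.upper()}"
    if env_key in env:
        return env[env_key]  # Note: environment values are strings.
    if name in file_config:
        return file_config[name]
    return default

timeout = resolve_setting(
    "asr_timeout",
    env={"MOMENTRY_ASR_TIMEOUT": "20"},
    file_config={"asr_timeout": 30},
    default=3600,
)
```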
|
||||
|
||||
## 錯誤處理規範
|
||||
|
||||
### Rust 錯誤處理
|
||||
```rust
use anyhow::{Context, Result};

pub async fn process_module(...) -> Result<ModuleResult> {
    // Use .context() for a static message;
    // use .with_context() when the message has to be formatted lazily
    executor.run(...)
        .await
        .context("Failed to run module script")?;

    // Use anyhow::bail! to return early with an error
    if !condition {
        anyhow::bail!("Condition not met: {}", reason);
    }
}
```
|
||||
|
||||
### Python 錯誤處理
|
||||
```python
def process(self) -> Dict[str, Any]:
    try:
        # Main logic; _core_processing() requires a working directory
        work_dir = tempfile.mkdtemp(prefix="module_")
        result = self._core_processing(work_dir)
        return result
    except FileNotFoundError as e:
        print(f"ERROR: File not found: {e}", file=sys.stderr)
        raise
    except RuntimeError as e:
        print(f"ERROR: Runtime error: {e}", file=sys.stderr)
        raise
    except Exception as e:
        print(f"ERROR: Unexpected error: {e}", file=sys.stderr)
        raise
```
|
||||
|
||||
### 錯誤分類
|
||||
1. **輸入錯誤**: 文件不存在、格式不支持、權限問題
|
||||
2. **配置錯誤**: 缺少依賴、環境變量錯誤、模型文件缺失
|
||||
3. **運行時錯誤**: 內存不足、超時、模型推理錯誤
|
||||
4. **輸出錯誤**: 結果解析失敗、文件寫入失敗
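A processor script could map exceptions onto these four categories so that logs carry a consistent label. The mapping below is only one possible assignment and is an assumption, not part of the specification; in particular, treating `KeyError` as a configuration error and any remaining `OSError` as an output error are illustrative choices.

```python
import json

def classify_error(exc):
    """Map an exception to one of the four error categories (illustrative mapping)."""
    # Order matters: FileNotFoundError, PermissionError and TimeoutError
    # are all subclasses of OSError, which is checked last.
    if isinstance(exc, (FileNotFoundError, PermissionError)):
        return "input"
    if isinstance(exc, (ImportError, KeyError)):
        return "config"
    if isinstance(exc, (MemoryError, TimeoutError, RuntimeError)):
        return "runtime"
    if isinstance(exc, (json.JSONDecodeError, OSError)):
        return "output"
    return "unknown"

category = classify_error(FileNotFoundError("video.mp4"))
```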
|
||||
|
||||
## 日誌規範
|
||||
|
||||
### Rust 日誌
|
||||
```rust
|
||||
tracing::info!("[MODULE] Starting processing: {}", video_path);
|
||||
tracing::debug!("[MODULE] Processing details: {:?}", details);
|
||||
tracing::warn!("[MODULE] Warning: {}", warning_message);
|
||||
tracing::error!("[MODULE] Error: {}", error_message);
|
||||
```
|
||||
|
||||
### Python 日誌
|
||||
```python
import os
import sys

def log_info(message: str):
    print(f"[MODULE] INFO: {message}", file=sys.stderr)

def log_debug(message: str):
    # Debug output is enabled only when MODULE_DEBUG=1
    if os.environ.get("MODULE_DEBUG") == "1":
        print(f"[MODULE] DEBUG: {message}", file=sys.stderr)

def log_error(message: str):
    print(f"[MODULE] ERROR: {message}", file=sys.stderr)
```
|
||||
|
||||
## 性能監控
|
||||
|
||||
### 指標收集
|
||||
```rust
|
||||
pub struct ProcessingMetrics {
|
||||
pub start_time: std::time::Instant,
|
||||
pub end_time: Option<std::time::Instant>,
|
||||
pub memory_usage_mb: f64,
|
||||
pub cpu_usage_percent: f64,
|
||||
pub items_processed: u64,
|
||||
pub items_per_second: f64,
|
||||
}
|
||||
|
||||
impl ProcessingMetrics {
|
||||
pub fn new() -> Self {
|
||||
Self {
|
||||
start_time: std::time::Instant::now(),
|
||||
end_time: None,
|
||||
memory_usage_mb: 0.0,
|
||||
cpu_usage_percent: 0.0,
|
||||
items_processed: 0,
|
||||
items_per_second: 0.0,
|
||||
}
|
||||
}
|
||||
|
||||
pub fn record_completion(&mut self, items_processed: u64) {
|
||||
self.end_time = Some(std::time::Instant::now());
|
||||
self.items_processed = items_processed;
|
||||
|
||||
let duration = self.end_time.unwrap().duration_since(self.start_time);
|
||||
self.items_per_second = items_processed as f64 / duration.as_secs_f64();
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 性能報告
|
||||
```json
|
||||
{
|
||||
"performance": {
|
||||
"processing_time_seconds": 4.7,
|
||||
"memory_usage_mb": 512.5,
|
||||
"cpu_usage_percent": 45.2,
|
||||
"items_processed": 8,
|
||||
"items_per_second": 1.7,
|
||||
"throughput_mb_per_second": 10.5
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## 測試規範
|
||||
|
||||
### 單元測試
|
||||
```rust
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn test_result_structure() {
|
||||
// 測試數據結構
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_serialization() {
|
||||
// 測試序列化
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_edge_cases() {
|
||||
// 測試邊界條件
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 集成測試
|
||||
```rust
|
||||
#[tokio::test]
|
||||
async fn test_module_integration() {
|
||||
// 使用測試文件進行集成測試
|
||||
let test_video = "test_data/sample.mp4";
|
||||
let output_file = tempfile::NamedTempFile::new().unwrap();
|
||||
|
||||
let result = process_module(test_video, output_file.path().to_str().unwrap(), None)
|
||||
.await
|
||||
.expect("Processing should succeed");
|
||||
|
||||
assert!(!result.data_units.is_empty());
|
||||
}
|
||||
```
|
||||
|
||||
### Python 測試
|
||||
```python
|
||||
def test_module_processor():
|
||||
"""測試 Python 處理器"""
|
||||
processor = ModuleProcessor("test.mp4", "output.json")
|
||||
|
||||
# 測試輸入驗證
|
||||
assert not processor.validate_input() # 文件不存在
|
||||
|
||||
# 測試處理邏輯
|
||||
with tempfile.NamedTemporaryFile() as tmp:
|
||||
processor = ModuleProcessor("real_test.mp4", tmp.name)
|
||||
result = processor.process()
|
||||
assert "data_units" in result
|
||||
assert "metadata" in result
|
||||
```
|
||||
|
||||
## 文檔規範
|
||||
|
||||
### Rust 文檔
|
||||
```rust
|
||||
/// ASR 處理器模組
|
||||
///
|
||||
/// 提供自動語音識別功能,支持多種語言和大文件處理。
|
||||
///
|
||||
/// # 示例
|
||||
/// ```
|
||||
/// use momentry_core::processor::asr;
|
||||
///
|
||||
/// let result = asr::process_asr("video.mp4", "output.json", None).await?;
|
||||
/// println!("識別到 {} 個語音片段", result.segments.len());
|
||||
/// ```
|
||||
pub mod asr {
|
||||
// ...
|
||||
}
|
||||
```
|
||||
|
||||
### Python 文檔
|
||||
```python
|
||||
"""
|
||||
模組處理器
|
||||
|
||||
提供 [功能描述] 功能。
|
||||
|
||||
使用示例:
|
||||
python module_processor.py input.mp4 output.json
|
||||
|
||||
參數:
|
||||
video_path: 輸入視頻文件路徑
|
||||
output_path: 輸出 JSON 文件路徑
|
||||
|
||||
輸出格式:
|
||||
詳見輸出格式規範部分。
|
||||
"""
|
||||
```
|
||||
|
||||
## 遷移指南
|
||||
|
||||
### 現有模組標準化步驟
|
||||
1. **分析現有代碼**: 識別不符合規範的部分
|
||||
2. **創建備份**: 備份原始文件
|
||||
3. **重構 Rust 模組**: 按照模板重構
|
||||
4. **重構 Python 腳本**: 按照模板重構
|
||||
5. **更新配置**: 統一配置管理
|
||||
6. **添加測試**: 補充單元和集成測試
|
||||
7. **更新文檔**: 更新 API 文檔和使用說明
|
||||
8. **驗證功能**: 確保功能正常
|
||||
|
||||
### 兼容性保證
|
||||
- 保持現有 API 不變
|
||||
- 逐步遷移,不中斷現有功能
|
||||
- 提供遷移工具和文檔
|
||||
|
||||
## 附錄
|
||||
|
||||
### A. 模組分類
|
||||
|
||||
| 模組 | 功能 | 主要技術 | 輸出類型 |
|
||||
|------|------|----------|----------|
|
||||
| ASR | 語音識別 | Whisper | 時間段文本 |
|
||||
| OCR | 文字識別 | EasyOCR | 幀級文字 |
|
||||
| YOLO | 物體檢測 | YOLOv8 | 幀級物體 |
|
||||
| Face | 人臉檢測 | OpenCV | 幀級人臉 |
|
||||
| Pose | 姿態檢測 | OpenPose | 幀級姿態 |
|
||||
| CUT | 場景切割 | PySceneDetect | 場景邊界 |
|
||||
| ASRX | 語音增強 | WhisperX | 說話人分離 |
|
||||
| Caption | 字幕生成 | BLIP | 幀級描述 |
|
||||
| Story | 故事分析 | 自定義 | 故事結構 |
|
||||
|
||||
### B. 性能基準
|
||||
|
||||
| 模組 | 平均處理時間 | 內存使用 | CPU 使用 |
|
||||
|------|--------------|----------|----------|
|
||||
| ASR | 4.7s (小文件) | 1.2GB | 45% |
|
||||
| OCR | 12.3s (小文件) | 800MB | 35% |
|
||||
| YOLO | 8.5s (小文件) | 1.5GB | 60% |
|
||||
| Face | 3.2s (小文件) | 500MB | 25% |
|
||||
|
||||
### C. 常見問題
|
||||
|
||||
1. **依賴問題**: 確保 Python 環境正確設置
|
||||
2. **內存不足**: 調整 chunk_size 參數
|
||||
3. **超時錯誤**: 增加 timeout 設置或優化算法
|
||||
4. **模型加載慢**: 啟用模型緩存
|
||||
|
||||
---
|
||||
|
||||
*版本: 1.0.0*
|
||||
*更新日期: 2026-03-27*
|
||||
*負責人: Warren (Technical Lead)*
|
||||
*狀態: 草案*
|
||||
353
docs_v1.0/DESIGN/MOMENTRY_RAG_PRESENTATION.md
Normal file
@@ -0,0 +1,353 @@
|
||||
---
|
||||
document_type: "reference_doc"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "Momentry Core 影片 RAG 系統說明稿"
|
||||
date: "2026-03-22"
|
||||
version: "V1.0"
|
||||
status: "active"
|
||||
owner: "Warren"
|
||||
created_by: "OpenCode"
|
||||
tags:
|
||||
- "momentry"
|
||||
- "core"
|
||||
- "系統說明稿"
|
||||
ai_query_hints:
|
||||
- "查詢 Momentry Core 影片 RAG 系統說明稿 的內容"
|
||||
- "Momentry Core 影片 RAG 系統說明稿 的主要目的是什麼?"
|
||||
- "如何操作或實施 Momentry Core 影片 RAG 系統說明稿?"
|
||||
---
|
||||
|
||||
# Momentry Core 影片 RAG 系統說明稿
|
||||
|
||||
| 項目 | 內容 |
|
||||
|------|------|
|
||||
| 建立者 | Warren |
|
||||
| 建立時間 | 2026-03-22 |
|
||||
| 文件版本 | V1.1 |
|
||||
|
||||
---
|
||||
|
||||
## 版本歷史
|
||||
|
||||
| 版本 | 日期 | 目的 | 操作人 | 工具/模型 |
|
||||
|------|------|------|--------|-----------|
|
||||
| V1.0 | 2026-03-22 | 創建文件 | Warren | OpenCode / MiniMax M2.5 |
|
||||
| V1.1 | 2026-03-25 | 更新API回應格式 (media_url→file_path) 與認證標頭 | OpenCode | deepseek-reasoner |
|
||||
|
||||
---
|
||||
|
||||
## 系統架構
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ 使用者 │
|
||||
│ (marcom 團隊) │
|
||||
└─────────────────┬───────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ WordPress 入口 │
|
||||
│ (wp.momentry.ddns.net) │
|
||||
└─────────────────┬───────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ n8n 自動化 │
|
||||
│ (localhost:5678) │
|
||||
│ │
|
||||
│ [Webhook] → [HTTP Request] → [處理結果] → [回覆用戶] │
|
||||
└─────────────────┬───────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Momentry Core API │
|
||||
│ (localhost:3002) │
|
||||
│ │
|
||||
│ POST /api/v1/search → 語意搜尋 │
|
||||
│ POST /api/v1/n8n/search → n8n 專用格式 │
|
||||
│ GET /api/v1/videos → 影片列表 │
|
||||
└─────────────────┬───────────────────────────────────────────┘
|
||||
│
|
||||
┌─────────┴──────────┐
|
||||
▼ ▼
|
||||
┌───────────────┐ ┌───────────────┐
|
||||
│ PostgreSQL │ │ Qdrant │
|
||||
│ (chunks) │ │ (vectors) │
|
||||
└───────────────┘ └───────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 資料流程
|
||||
|
||||
```
|
||||
1. 上傳影片 → SFTPGo
|
||||
2. 影片註冊 → PostgreSQL
|
||||
3. ASR 處理 → 產生字幕區塊
|
||||
4. 儲存 chunks → PostgreSQL
|
||||
5. 向量化 → Qdrant
|
||||
6. 搜尋查詢 → API
|
||||
7. 回傳結果 → n8n → 用戶
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 示範影片
|
||||
|
||||
| 項目 | 內容 |
|
||||
|------|------|
|
||||
| 檔案名稱 | Old_Time_Movie_Show_-_Charade_1963.HD.mov |
|
||||
| UUID | a1b10138a6bbb0cd |
|
||||
| 時長 | 6879 秒(約 1.9 小時) |
|
||||
| 區塊數 | 3,886 個 |
|
||||
| 向量數 | 3,688 個 |
|
||||
|
||||
---
|
||||
|
||||
## API 端點
|
||||
|
||||
### 1. 語意搜尋
|
||||
|
||||
```
|
||||
POST http://localhost:3002/api/v1/search
|
||||
```
|
||||
|
||||
**請求:**
|
||||
```json
|
||||
{
|
||||
"query": "charade",
|
||||
"limit": 5,
|
||||
"uuid": "a1b10138a6bbb0cd"
|
||||
}
|
||||
```
|
||||
|
||||
> **注意**:
|
||||
> 1. **API 認證**: 所有 `/api/v1/*` 端點需要 `X-API-Key` 標頭
|
||||
> 2. **檔案路徑轉換**: API 現在返回 `file_path`(檔案系統路徑),需要轉換為可訪問的 URL(例如透過 SFTPGo 分享連結)
|
||||
|
||||
---
|
||||
|
||||
### 2. n8n 專用格式
|
||||
|
||||
```
|
||||
POST http://localhost:3002/api/v1/n8n/search
|
||||
```
|
||||
|
||||
**請求:**
|
||||
```json
|
||||
{
|
||||
"query": "charade",
|
||||
"limit": 5
|
||||
}
|
||||
```
|
||||
|
||||
**回應:**
|
||||
```json
|
||||
{
|
||||
"query": "charade",
|
||||
"count": 5,
|
||||
"hits": [
|
||||
{
|
||||
"id": "sentence_0006",
|
||||
"vid": "a1b10138a6bbb0cd",
|
||||
"start": 48.8,
|
||||
"end": 55.44,
|
||||
"title": "Chunk sentence_0006",
|
||||
"text": "fun plot twists...",
|
||||
"score": 0.526,
|
||||
"file_path": "/Users/accusys/momentry/var/sftpgo/data/demo/video.mp4"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 實作範例
|
||||
|
||||
### n8n Workflow 設計
|
||||
|
||||
```
|
||||
┌─────────────┐
|
||||
│ Webhook │ ← 接收用戶搜尋請求
|
||||
└──────┬──────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────┐
|
||||
│ HTTP Request│ → POST /api/v1/n8n/search
|
||||
└──────┬──────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────┐
|
||||
│ Code │ → 處理回傳結果
|
||||
└──────┬──────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────┐
|
||||
│ Telegram │ → 回覆給用戶
|
||||
│ (或 LINE) │
|
||||
└─────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step-by-Step n8n Workflow
|
||||
|
||||
### Step 1: 建立 Webhook
|
||||
|
||||
1. n8n 開新 Workflow
|
||||
2. 新增 node: **Webhook**
|
||||
3. 設定 path: `video-search`
|
||||
4. 複製 Webhook URL
|
||||
|
||||
---
|
||||
|
||||
### Step 2: 設定 HTTP Request
|
||||
|
||||
1. 新增 node: **HTTP Request**
|
||||
2. 設定:
|
||||
```
|
||||
Method: POST
|
||||
URL: http://localhost:3002/api/v1/n8n/search
|
||||
Body Content Type: JSON
|
||||
Headers: X-API-Key (需設定)
|
||||
```
|
||||
|
||||
3. Body:
|
||||
```json
|
||||
{
|
||||
"query": "={{ $json.body }}",
|
||||
"limit": 5
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Step 3: 處理結果 (Code)
|
||||
|
||||
```javascript
|
||||
const hits = $input.first().json.hits;
|
||||
|
||||
if (!hits || hits.length === 0) {
|
||||
return {
|
||||
json: { message: "找不到相關結果" }
|
||||
};
|
||||
}
|
||||
|
||||
const results = hits.map((hit, index) => ({
|
||||
number: index + 1,
|
||||
text: hit.text,
|
||||
time: `${hit.start}s - ${hit.end}s`,
|
||||
score: Math.round(hit.score * 100) + "%",
|
||||
// 注意: API 現在返回 file_path(檔案系統路徑),需要轉換為可訪問的 URL
|
||||
url: hit.file_path + "#t=" + hit.start + "," + hit.end // 需實作檔案路徑轉換為 URL
|
||||
}));
|
||||
|
||||
return { json: { results } };
|
||||
```
|
||||
|
||||
> **注意**:
|
||||
> 1. **API 認證**: 所有 `/api/v1/*` 端點需要 `X-API-Key` 標頭
|
||||
> 2. **檔案路徑轉換**: API 現在返回 `file_path`(檔案系統路徑),需要轉換為可訪問的 URL(例如透過 SFTPGo 分享連結)
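The path-to-URL conversion mentioned in the note could look roughly like the sketch below. The base URL is a placeholder, the function name is hypothetical, and this is not how SFTPGo share links are generated; it only shows stripping a known data root, URL-encoding the remainder, and appending a temporal media fragment (`#t=start,end`).

```python
from pathlib import PurePosixPath
from urllib.parse import quote

def file_path_to_url(file_path, data_root, base_url, start=None, end=None):
    """Turn a filesystem path under data_root into a URL under base_url."""
    # relative_to() raises ValueError if file_path is outside data_root,
    # which prevents exposing arbitrary filesystem paths.
    relative = PurePosixPath(file_path).relative_to(data_root)
    url = base_url.rstrip("/") + "/" + quote(relative.as_posix())
    if start is not None and end is not None:
        url += f"#t={start},{end}"
    return url

url = file_path_to_url(
    "/Users/accusys/momentry/var/sftpgo/data/demo/video.mp4",
    data_root="/Users/accusys/momentry/var/sftpgo/data",
    base_url="https://media.example.invalid/files",  # placeholder host
    start=48.8,
    end=55.44,
)
```

Doing this conversion in one place, whether in the API or in the n8n Code node, avoids leaking absolute server paths to end users.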
|
||||
|
||||
---
|
||||
|
||||
### Step 4: 格式化輸出
|
||||
|
||||
**Telegram 格式:**
|
||||
```
|
||||
🎬 搜尋結果: "{{ $json.query }}"
|
||||
|
||||
1️⃣ "fun plot twists, Woody Dialog and charming performances..."
|
||||
⏱ 48.8s - 55.4s
|
||||
📊 相關度: 53%
|
||||
|
||||
2️⃣ "Don't you like me to say that a pretty girl..."
|
||||
⏱ 4745.6s - 4748.6s
|
||||
📊 相關度: 52%
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 測試指令
|
||||
|
||||
### curl 測試
|
||||
|
||||
```bash
|
||||
# 語意搜尋
|
||||
curl -X POST http://localhost:3002/api/v1/search \
|
||||
-H "Content-Type: application/json" \
|
||||
-H "X-API-Key: YOUR_API_KEY" \
|
||||
-d '{"query": "charade", "limit": 3}'
|
||||
|
||||
# n8n 格式
|
||||
curl -X POST http://localhost:3002/api/v1/n8n/search \
|
||||
-H "Content-Type: application/json" \
|
||||
-H "X-API-Key: YOUR_API_KEY" \
|
||||
-d '{"query": "charade", "limit": 3}'
|
||||
|
||||
# 影片列表
|
||||
curl -H "X-API-Key: YOUR_API_KEY" http://localhost:3002/api/v1/videos
|
||||
|
||||
# 特定影片區塊
|
||||
curl -H "X-API-Key: YOUR_API_KEY" http://localhost:3002/api/v1/videos/a1b10138a6bbb0cd/chunks
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 實際搜尋範例
|
||||
|
||||
| 搜尋詞 | 結果摘要 |
|
||||
|--------|----------|
|
||||
| `charade` | "fun plot twists, Woody Dialog and charming performances..." |
|
||||
| `woody` | "Well, you thick skull hair, brain half-witted..." |
|
||||
| `classic movie` | "Hello and welcome to the old-time movie show..." |
|
||||
| `charming` | "fun plot twists, Woody Dialog and charming performances..." |
|
||||
|
||||
---
|
||||
|
||||
## 資料庫狀態
|
||||
|
||||
| 資料庫 | 資料筆數 | 狀態 |
|
||||
|--------|----------|------|
|
||||
| PostgreSQL (videos) | 4 | ✅ |
|
||||
| PostgreSQL (chunks) | 3,950 | ✅ |
|
||||
| PostgreSQL (vectors) | 1,870 | ✅ |
|
||||
| Qdrant (vectors) | 3,688 | ✅ |
|
||||
| Redis (job cache) | 4 keys | ✅ |
|
||||
|
||||
---
|
||||
|
||||
## 下一步
|
||||
|
||||
1. **建立 SFTPGo 分享連結**
|
||||
- 開啟 http://localhost:8080
|
||||
- 登入 demo / demopassword123
|
||||
- 建立影片分享連結
|
||||
|
||||
2. **測試 n8n Workflow**
|
||||
- 匯入 Postman Collection
|
||||
- 建立 Webhook
|
||||
- 測試搜尋
|
||||
|
||||
3. **整合到 WordPress**
|
||||
- 建立表單接收用戶輸入
|
||||
- 呼叫 n8n Webhook
|
||||
- 顯示搜尋結果
|
||||
|
||||
---
|
||||
|
||||
## 快速開始
|
||||
|
||||
```bash
# 1. Test the search API (all /api/v1/* endpoints require the X-API-Key header)
curl -X POST http://localhost:3002/api/v1/search \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{"query": "charade", "limit": 3}'

# 2. List videos
curl -H "X-API-Key: YOUR_API_KEY" http://localhost:3002/api/v1/videos

# 3. Check that n8n is running
curl http://localhost:5678
```
|
||||
94
docs_v1.0/DESIGN/NON_HUMAN_SOUND_DETECTION.md
Normal file
@@ -0,0 +1,94 @@
|
||||
# Non-Human Sound Detection — Tool Selection Report
|
||||
|
||||
**Date:** 2026-05-10
|
||||
**Movie:** Charade (1963), 113 min
|
||||
**Audio:** 16kHz mono WAV
|
||||
**Goal:** Detect non-human sound events (gunshots, impacts, doors, music, etc.)
|
||||
|
||||
## Tested Approaches
|
||||
|
||||
### Approach A: AST AudioSet (HuggingFace)
|
||||
|
||||
| Item | Detail |
|
||||
|------|--------|
|
||||
| Model | `MIT/ast-finetuned-audioset-10-10-0.4593` |
|
||||
| Method | Audio Spectrogram Transformer, fine-tuned on AudioSet-2M (527 classes) |
|
||||
| Dependencies | `transformers`, `torch` ✅ (no torchcodec needed) |
|
||||
| Load time | ~1s on M5 |
|
||||
| Inference time | ~0.5s per 3-second clip (805k params, float32) |
|
||||
| Accuracy | Good — correctly distinguishes speech vs. door vs. music |
|
||||
|
||||
**Test results on Charade:**
|
||||
|
||||
| Time | Energy-based said | AST AudioSet said | Verdict |
|
||||
|------|------------------|-------------------|---------|
|
||||
| 0:10 | — | Environmental noise (26%) | Background noise, plausible |
|
||||
| 10:32 | Gunshot candidate (43x) | **Speech (76%)** | ✅ AST correct |
|
||||
| 57:00 | Gunshot candidate (49x) | **Door (62%) + Slam (5%)** | ✅ AST correct |
|
||||
| 65:13 | Gunshot candidate (50x) | **Speech (58%)** | ✅ AST correct |
|
||||
| 85:12 | Gunshot candidate (39x) | **Speech (68%)** | ✅ AST correct |
|
||||
|
||||
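**Conclusion**: All four energy-based gunshot candidates in the table above were false positives (three were speech, one was a door slam), so energy-based impulse detection alone is not usable for gunshot detection. AST AudioSet classified every candidate as a non-gunshot class.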
|
||||
|
||||
### Approach B: Custom Energy + Spectral Features
|
||||
|
||||
| Item | Detail |
|
||||
|------|--------|
|
||||
| Method | RMS energy + spectral centroid + sub-band energy ratios |
|
||||
| Speed | ~3s for full 113-min movie (every 10th window) |
|
||||
| Accuracy | Poor — cannot distinguish gunshot from speech, door, music |
|
||||
| Result | 1 "gunshot_candidate" from 453 test windows; all false positives on verification |
|
||||
|
||||
**Conclusion**: Useful as a **coarse pre-filter** (Stage 1), not as a standalone classifier.
|
||||
|
||||
## Two-Stage Design
|
||||
|
||||
```
Stage 1 (Energy filter, ~1 min):
  Full audio → sliding window RMS + centroid → ~200 candidate windows
         |
         v
Stage 2 (AST classifier, ~2 min):
  Extract 3-sec audio for each candidate → AST AudioSet classification
         |
         v
Non-speech events: gunshot, explosion, door slam, music, etc.
```
|
||||
|
||||
Estimated processing: ~3 min for full movie (vs. 75 min for full AST scan)
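
The Stage 1 energy filter can be sketched as below. This is a minimal illustration, assuming non-overlapping windows and a loudness threshold relative to the median window energy; the function names, `window_sec`, and `ratio` are hypothetical and are not taken from the report. The spectral centroid feature is left out.

```python
import math


def rms(samples):
    """Root-mean-square energy of one window of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def candidate_windows(samples, sample_rate=16000, window_sec=0.5, ratio=10.0):
    """Return (start_sec, energy_ratio) for windows much louder than the median.

    window_sec and ratio are illustrative defaults, not values from the report.
    """
    size = int(sample_rate * window_sec)
    energies = [
        rms(samples[i:i + size])
        for i in range(0, len(samples) - size + 1, size)
    ]
    if not energies:
        return []
    ordered = sorted(energies)
    median = ordered[len(ordered) // 2] or 1e-9  # avoid dividing by zero on silence
    return [
        (index * window_sec, energy / median)
        for index, energy in enumerate(energies)
        if energy / median >= ratio
    ]
```

Only the windows returned here would be passed on to the Stage 2 classifier, which is where the time saving over a full AST scan comes from.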
|
||||
|
||||
## Key AudioSet Classes Relevant to Charade
|
||||
|
||||
| Class | AudioSet ID | Relevance |
|
||||
|-------|-------------|-----------|
|
||||
| Gunshot, gunfire | 402 | **Primary target** |
|
||||
| Explosion | 400 | Hand grenade in plot |
|
||||
| Door slams | 404 | Scenes at hotel, apartment |
|
||||
| Music | 130-133 | Background score |
|
||||
| Speech | 0-3 | Already handled by ASR |
|
||||
| Vehicle | 100-110 | Car sounds in Paris chase |
|
||||
| Glass break | 424 | Window breaking scene |
|
||||
|
||||
## Actor-voice gender mismatches (resolved by fine-grained ASRX)
|
||||
|
||||
During the speaker mapping work, 20 segments were found where the old face→TMDb assignment said "Audrey Hepburn" but the new ASRX voice embedding clearly said "MALE". These segments were verified via video clips and confirmed to be scenes where:
|
||||
|
||||
1. A male speaker (Cary Grant or other) is speaking while Audrey Hepburn's face is on screen
|
||||
2. The old pipeline incorrectly assigned the speaker name based on face identity
|
||||
3. The fine-grained sliding window approach correctly resolves these
|
||||
|
||||
The 20 segments were from SPEAKER_5 (10 segs) and SPEAKER_9 (10 segs), both of which mapped to MALE voice clusters. These were re-assigned to "Cary Grant" or "Unknown" as appropriate.
|
||||
|
||||
## Recommendations
|
||||
|
||||
| Approach | Speed | Accuracy | Best for |
|
||||
|----------|-------|----------|----------|
|
||||
| Energy pre-filter | ✅ 1 min | ❌ Low | Stage 1: candidate selection |
|
||||
| AST AudioSet | ⚠️ 2 min | ✅ High | Stage 2: event classification |
|
||||
| Full AST scan | ❌ 75 min | ✅ High | N/A — two-stage is better |
|
||||
|
||||
**Design**: Two-stage pipeline: energy pre-filter → AST classifier
|
||||
**Implementation path**:
|
||||
1. Write `scripts/non_human_sound_detector.py` with the two-stage design
|
||||
2. Output `{uuid}.sound_events.json` with typed events
|
||||
3. Integrate into the sound_event_detector framework
|
||||
134
docs_v1.0/DESIGN/PROCESSOR_MECHANISMS_REVIEW.md
Normal file
@@ -0,0 +1,134 @@
|
||||
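# Processor Output Mechanisms Review

## Three Mechanisms

### 1. Interruption Resume
When a process is killed, it can pick up where it left off after a restart.
**Current state**: Most processors have `.tmp` → `.partial` protection, but a rerun still starts from the beginning.

### 2. Supplement
When output is incomplete, process only the missing part instead of rerunning everything.
**Current state**: Every processor reruns from scratch; there is no supplement mechanism.

### 3. Error Correction
Detect and repair corrupted output files automatically.
**Current state**: The file-existence check only verifies that the file exists, not that its content is valid.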
|
||||
|
||||
---
|
||||
|
||||
## Per-Processor Review

### ASR
| Aspect | Current state | Issue |
|------|------|------|
| Interruption resume | ✅ `.tmp` → `.partial` (executor) | ✅ OK |
| Supplement | ❌ Reruns from scratch every time | If killed at 50%, the next run starts from 0% |
| Error correction | ❌ Content not validated | The file-existence check skips as soon as `.json` exists, whatever its content |
| Pipe | ✅ executor.run() | ✅ |
| Timeout | ✅ Removed (None) | ✅ |

**Improvement plan**:
- Supplement: on rerun, ASR scans the existing `.json` or `.partial`, finds the `end_time` of the last segment, and passes it to the Python script as `--resume-from`
- Error correction: the file-existence check validates `.json` with `serde_json::from_str`; an invalid file is treated as missing
|
||||
|
||||
### ASRX
| Aspect | Current state | Issue |
|------|------|------|
| Interruption resume | ❌ **Does not use the executor**; writes `.json` directly | Leaves a corrupted file behind when killed |
| Supplement | ❌ Same as ASR | Depends on ASR; if ASR is incomplete, ASRX cannot run |
| Error correction | ❌ Content not validated | Same as ASR |
| Pipe | ❌ **raw Command**, no `.tmp` protection | Urgent |
| Timeout | ⚠️ 7200s hardcoded | Should be None (same as ASR) |

**Improvement plan**:
- **Top priority**: switch to `executor.run()` to gain `.tmp` protection
- Otherwise the same as ASR
|
||||
|
||||
### YOLO
| Aspect | Current state | Issue |
|------|------|------|
| Interruption resume | ✅ executor `.tmp` | ✅ |
| Supplement | ❌ Reruns from scratch | If killed at frame 100,000, the next run starts from frame 0 |
| Error correction | ❌ Content not validated | yolo.json was already corrupted earlier, yet the file check skipped it |

**Improvement plan**:
- Supplement: scan the `.partial` for the last frame and pass it to the Python script as `--resume-frame`
- Error correction: the file-existence check validates `.json` by parsing it as JSON

### FACE / POSE / OCR
| Aspect | Current state | Issue |
|------|------|------|
| Interruption resume | ✅ executor `.tmp` | ✅ |
| Supplement | ❌ Reruns from scratch | Same as YOLO |
| Error correction | ❌ Content not validated | Same as YOLO |

**Improvement plan**: Same as YOLO

### CUT
| Aspect | Current state | Issue |
|------|------|------|
| Interruption resume | ✅ executor `.tmp` | ✅ |
| Supplement | ✅ Already completed at the register stage; loaded directly | ✅ |
| Error correction | ❌ Content not validated | Same as YOLO |

**Improvement plan**: Error correction only

### SCENE
| Aspect | Current state | Issue |
|------|------|------|
| Interruption resume | ✅ **Most complete**: checks three states, `.err`/`.json`/`.tmp` | ✅ |
| Supplement | ❌ Reruns from scratch | ✅ (scene is fast) |
| Error correction | ⚠️ Checks `.err` | ✅ |

### VISUAL_CHUNK
| Aspect | Current state | Issue |
|------|------|------|
| Interruption resume | ✅ executor `.tmp` | ✅ |
| Supplement | ❌ | ❌ |
| Error correction | ❌ **Errors are swallowed** (returns an empty result) | Should report an error instead of failing silently |

**Improvement plan**: Do not swallow errors; let the error propagate upward

### STORY
| Aspect | Current state | Issue |
|------|------|------|
| Interruption resume | ✅ executor `.tmp` | ✅ |
| Supplement | ❌ | ❌ |
| Error correction | ❌ | ❌ |
|
||||
|
||||
---
|
||||
|
||||
## Priorities

### P0: Fix immediately

1. **Switch ASRX to executor.run()**
   - File: `src/core/processor/asrx.rs`
   - Gains `.tmp` protection, SIGKILL of the process group, and `.partial` retention
   - Remove the hardcoded timeout
|
||||
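
The `.tmp` protection mentioned above follows the common write-then-rename pattern. The sketch below only illustrates that idea in Python; the executor itself is Rust code that this document does not show, and `write_output_atomically` is a hypothetical name.

```python
import json
import os


def write_output_atomically(output_path, payload):
    """Write JSON to `<output_path>.tmp`, then rename it into place.

    A reader never sees a half-written `output_path`: a crash before the
    rename leaves only the `.tmp` file behind.
    """
    tmp_path = output_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
        handle.flush()
        os.fsync(handle.fileno())  # make sure the bytes are on disk before renaming
    os.replace(tmp_path, output_path)  # atomic when both paths share a filesystem
    return output_path
```

A processor that writes its `.json` directly, as ASRX currently does, has no such guarantee, which is why a kill can leave a corrupted file.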
|
||||
### P1: Error correction

2. **Add JSON validation to the file-existence check**
   - File: `src/worker/job_worker.rs`
   - After `output_path.exists()`, parse the `.json` with `serde_json::from_str::<Value>`
   - If parsing fails → do not skip; treat the file as missing and run the processor
   - If parsing succeeds but the content is empty (no segments/frames) → treat it as incomplete
|
||||
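
The validation rule in P1 can be sketched as follows. This is a Python analogue for illustration, not the planned Rust change in `job_worker.rs`; `output_is_usable` and its `required_keys` parameter are assumptions based on the "no segments/frames" wording above.

```python
import json
import os


def output_is_usable(output_path, required_keys=("segments", "frames")):
    """Return True only if the file exists, parses as JSON, and has content.

    required_keys is an assumption: the plan only says that an output with
    no segments/frames counts as incomplete.
    """
    if not os.path.exists(output_path):
        return False
    try:
        with open(output_path, "r", encoding="utf-8") as handle:
            data = json.load(handle)
    except (OSError, ValueError):  # ValueError covers JSON and decoding errors
        return False
    if not isinstance(data, dict):
        return False
    # A present but empty list is treated the same as a missing one.
    return any(data.get(key) for key in required_keys)
```

The worker would skip a processor only when this returns `True`; every other case falls through to a normal run.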
|
||||
### P2: Supplement

3. **ASR resume-from**
   - Files: `src/core/processor/asr.rs` + `scripts/asr_processor.py`
   - When the Rust side finds a `.partial`, it reads the `end_time` of the last segment
   - It passes `--resume-from {time}` to the Python script
   - The Python script skips the audio before `--resume-from`

4. **YOLO/Face/Pose resume-frame**
   - Files: each processor's `.rs` + the matching Python script
   - Scan the `.partial` for the last `frame_number`
   - Pass `--resume-frame {frame}` to the Python script
|
||||
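
A rough sketch of the resume lookup for ASR is shown below. It assumes the `.partial` file is a complete JSON object with a `segments` list whose items carry `end_time`; the document does not specify the `.partial` layout, so that structure and the name `resume_args` are assumptions. The same shape would apply to `--resume-frame` with `frame_number`.

```python
import json


def resume_args(partial_path):
    """Build the extra CLI arguments for resuming ASR from a `.partial` file.

    Returns [] when nothing usable is found, which means "run from the start".
    """
    try:
        with open(partial_path, "r", encoding="utf-8") as handle:
            data = json.load(handle)
    except (OSError, ValueError):
        return []
    segments = data.get("segments", []) if isinstance(data, dict) else []
    end_times = [
        seg["end_time"]
        for seg in segments
        if isinstance(seg, dict) and "end_time" in seg
    ]
    if not end_times:
        return []
    return ["--resume-from", str(max(end_times))]
```

Falling back to an empty argument list keeps the current behaviour (a full rerun) whenever the partial output cannot be trusted.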
|
||||
### P3: Other

5. **VisualChunk must not swallow errors**
6. **Executor two-stage shutdown: SIGTERM → SIGKILL**
|
||||
240
docs_v1.0/DESIGN/RELEASE_PHASES.md
Normal file
@@ -0,0 +1,240 @@
|
||||
# Momentry Model — 分階段交付
|
||||
|
||||
## 核心架構
|
||||
|
||||
```
|
||||
Pipeline (training)
|
||||
│ 每個 processor 產出 .json
|
||||
│ Rule 1/3 Ingestion → chunks + embeddings
|
||||
▼
|
||||
momentry model for {video} ← 每部影片 = 一個 model
|
||||
│ release/phase1/latest/
|
||||
│ release/phase2/latest/
|
||||
▼
|
||||
momentry core (inference engine) ← Rust API server
|
||||
│ momentry_playground (dev)
|
||||
│ momentry (production)
|
||||
▼
|
||||
Search / Query / Identity APIs
|
||||
```
|
||||
|
||||
- **Pipeline** = training phase: video → processor output → chunks → embeddings
- **Model** = the output package for each video (output_json + chunks + vectors)
- **Engine** = momentry core, which consumes a model and serves the APIs (search, trace, identity)

Each video can have several model versions; the naming leaves room for upgrades:

| Model version | Qdrant Collection | Content | Trigger |
|-----------|------------------|------|---------|
| `{uuid}_v1` | `momentry_dev_v1` | sentence chunk embedding (base) | ASR + ASRX + Rule 1 complete |
| `{uuid}_v2` | `momentry_dev_v2` | full pipeline + 5W1H | everything complete |
| `{uuid}_v3` | `momentry_dev_v3` | object identity + custom detector | v2 + object instance matching complete |

Versions coexist; a newer version never overwrites an older one.
|
||||
|
||||
## Phases

### Phase 1: Sentence Chunk Embedding (base model)

**Trigger**: ASR + ASRX complete + Rule 1 Ingestion + vectorize complete

**Deliverables**:
- `{uuid}.asr.json`
- `{uuid}.asrx.json`
- chunks (chunk_type = 'sentence')
- chunk_vectors (sentence embedding)

**Use**: End users can run semantic search

### Phase 2: Full Pipeline (v2 model)

**Trigger**: All processors complete + Rule 3 Ingestion + 5W1H Agent

**Deliverables**:
- Everything in Phase 1
- All `{uuid}.*.json` (cut, yolo, face, pose, ocr, ...)
- chunks (chunk_type = 'cut', 'visual', 'trace', 'story')
- chunk_vectors (summary embedding)
- identities / identity_bindings / face_detections

**Use**: Full search + summaries + person identification
|
||||
|
||||
---
|
||||
|
||||
## Worker Pipeline
|
||||
|
||||
```
|
||||
ASR 完成 → ASRX 完成
|
||||
↓
|
||||
Rule 1 Ingestion (sentence chunks)
|
||||
↓
|
||||
vectorize_chunks (sentence embedding)
|
||||
↓
|
||||
📦 Phase 1 release ───→ release/phase1/latest/ (base model)
|
||||
↓
|
||||
其他 processors 繼續 (yolo, face, pose, ocr, ...)
|
||||
↓
|
||||
Rule 3 Ingestion + 5W1H Agent
|
||||
↓
|
||||
📦 Phase 2 release ───→ release/phase2/latest/ (full model)
|
||||
```
|
||||
|
||||
## 產出目錄結構
|
||||
|
||||
```
|
||||
release/
|
||||
├── phase1/
|
||||
│ ├── {version}_{timestamp}/
|
||||
│ │ ├── output_json/ ← 所有已完成的 .json
|
||||
│ │ ├── chunks.csv ← sentence chunks
|
||||
│ │ ├── vectors.csv ← sentence embeddings
|
||||
│ │ ├── schema.sql ← chunks table DDL
|
||||
│ │ └── RELEASE_INFO.txt
|
||||
│ └── latest → {version}_{timestamp}
|
||||
│
|
||||
└── phase2/
|
||||
├── {version}_{timestamp}/
|
||||
│ ├── output_json/ ← 所有 .json
|
||||
│ ├── chunks.csv ← 所有 chunks
|
||||
│ ├── vectors.csv ← 所有 embeddings
|
||||
│ ├── identities.csv ← 人物身分
|
||||
│ ├── schema.sql ← 完整 schema
|
||||
│ └── RELEASE_INFO.txt
|
||||
└── latest → {version}_{timestamp}
|
||||
```
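
Publishing a release into this layout and repointing `latest` can be sketched as below. This is an illustration under assumptions: `publish_release` is a hypothetical helper (the document only names `release_pack.py`, not its internals), `latest` is taken to be a relative symlink, and a POSIX filesystem is assumed.

```python
import os


def publish_release(phase_dir, release_name):
    """Create `phase_dir/release_name/` and repoint `phase_dir/latest` at it.

    release_name stands for the `{version}_{timestamp}` directory name.
    """
    release_dir = os.path.join(phase_dir, release_name)
    os.makedirs(os.path.join(release_dir, "output_json"), exist_ok=True)

    tmp_link = os.path.join(phase_dir, ".latest.tmp")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(release_name, tmp_link)  # relative target keeps the tree relocatable
    os.replace(tmp_link, os.path.join(phase_dir, "latest"))  # swap the link in one step
    return release_dir
```

Creating the link under a temporary name and renaming it means a reader of `latest` always resolves to either the old release or the new one, never to a missing path.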
|
||||
|
||||
## momentry model vs momentry core

| | momentry model | momentry core |
|---|---|---|
| Analogy | trained weights | inference engine |
| Content | `.json` + chunks + vectors | Rust binary |
| Lifecycle | One produced per video | One binary serves all videos |
| Version | `{uuid}_v1` (base) / `{uuid}_v2` / `{uuid}_v3` | `momentry_playground` / `momentry` |
| Delivered to | End users | Deployment engineers |
|
||||
|
||||
---
|
||||
|
||||
## Wiki mechanism: every model can be adjusted

Each momentry model (`{uuid}_v1` / `v2` / `v3`) is not just read-only output; it can be improved continuously through the wiki mechanism.

### How it differs from traditional RAG

| | Traditional RAG | momentry wiki |
|---|---|---|
| Knowledge storage | vector DB (ephemeral) | model package (permanent) |
| How corrections are made | The LLM decides at query time whether to use them | Users/Agents edit directly |
| Persistence of corrections | ❌ Gone at the next query | ✅ Written into the model and versioned |
| Model improvement | None (only the prompt changes) | Merged as ground truth at the next version bump |
| Collaboration | One-way (retrieve → generate) | Two-way (edit → merge → improve) |
| Offline use | ❌ Needs vector DB + LLM | ✅ The wiki directory can be read offline |

**momentry wiki is not a replacement for RAG; it is the lifecycle management mechanism for a model.**
|
||||
|
||||
### 概念
|
||||
|
||||
```
|
||||
momentry model (release package)
|
||||
├── output_json/ ← 唯讀,processor 產出
|
||||
├── chunks.csv ← 唯讀,ingestion 產出
|
||||
├── vectors.csv ← 唯讀,embedding 產出
|
||||
└── wiki/ ← 可編輯,使用者貢獻知識
|
||||
├── identities.json ← "trace 5 = Audrey Hepburn"
|
||||
├── objects.json ← "object 42 = 郵票 #1"
|
||||
├── corrections.json ← "ASR 'Hello' → 'Halo'"
|
||||
└── changelog.json ← 編輯歷史
|
||||
```
|
||||
|
||||
### 資料流向
|
||||
|
||||
```
|
||||
使用者/Agent 編輯 wiki
|
||||
↓
|
||||
DB wiki_entries + wiki_revisions 寫入
|
||||
↓
|
||||
下次 release 打包時 merge 進 model
|
||||
↓
|
||||
TKG label 更新 (tkg_nodes.label)
|
||||
↓
|
||||
新版 model version bump
|
||||
```
|
||||
|
||||
### 與 TKG 的關係
|
||||
|
||||
wiki 的 identity 和 object 標註會回寫到 TKG node label:
|
||||
```
|
||||
(face_trace:5) label="Audrey Hepburn" ← wiki 編輯
|
||||
(object_instance:42) label="郵票 #1" ← wiki 編輯
|
||||
```
|
||||
|
||||
這些編輯累積後,可做為下一版 model training 的 ground truth。
|
||||
|
||||
### 實作方向
|
||||
|
||||
**DB 層** — 新 table `wiki_entries` + `wiki_revisions`:
|
||||
```sql
|
||||
wiki_entries (target_type, target_id, title, body, summary, status, version, file_uuid)
|
||||
wiki_revisions (entry_id, version, title, body, summary, change_summary, edited_by)
|
||||
```
|
||||
|
||||
**API 層** — CRUD + 版本歷史:
|
||||
```
|
||||
GET /api/v1/wiki/{target_type}/{target_id}
|
||||
PUT /api/v1/wiki/{target_type}/{target_id}
|
||||
GET /api/v1/wiki/{target_type}/{target_id}/revisions
|
||||
POST /api/v1/wiki/search
|
||||
```
|
||||
|
||||
**打包層** — `release_pack.py` 加入 wiki 匯出,與 model 共存
|
||||
|
||||
---
|
||||
|
||||
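## Phase 3: Object Identity (v3 model)

### Goal

Extract key objects from the video (stamps, pistols, envelopes, magnifying glasses...) and track and recognize objects of the same class at instance level across frames, similar to face trace: not only detecting the class, but also telling "this stamp" from "that stamp".

### Current problems

1. **The 80 COCO classes do not cover the key objects**: stamps, pistols, envelopes, magnifying glasses and the like are not in the COCO dataset
2. **YOLOv5nano has a low detection rate**: even for COCO classes (knife, cell phone), recall on the nano model is insufficient
3. **No object instance matching**: only frame-level detection exists today, with no cross-frame object tracking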
|
||||
|
||||
### 技術方向
|
||||
|
||||
```
|
||||
YOLOv8m/OWL-ViT → 改善 detection coverage
|
||||
↓
|
||||
Object Tracker (IoU + embedding,類似 face tracker)
|
||||
↓
|
||||
object_trace → TKG CO_OCCURS_WITH edges
|
||||
↓
|
||||
object identity → 同物體跨場景辨識
|
||||
```
|
||||
|
||||
| 方向 | 方法 | 效果 |
|
||||
|------|------|------|
|
||||
| Model upgrade | `yolov5nu` → `yolov8s.pt` / `yolov8m.pt` | COCO recall 提升 |
|
||||
| Custom fine-tune | 收集 stamps/guns 資料 fine-tune YOLO | 可偵測非 COCO 物件 |
|
||||
| Zero-shot | OWL-ViT / Grounding DINO by text prompt | 不用 training,但速度慢 |
|
||||
| Object trace | IoU + embedding 跨 frame 匹配 | instance-level 追蹤 |
|
||||
| Object identity | clustering 跨場景辨識同一物體 | 可在全片搜尋「這把槍」 |
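
The IoU half of the object trace step can be sketched as below. It is a minimal greedy matcher for illustration only: the embedding similarity that the design also calls for is omitted, and `match_detections`, the box format `(x1, y1, x2, y2)`, and the `threshold` value are assumptions rather than details of the existing face tracker.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def match_detections(tracks, detections, threshold=0.3):
    """Greedily assign each detection to the best-overlapping open track.

    tracks: {track_id: last_box}; detections: list of boxes.
    Returns {detection_index: track_id}; unmatched detections are omitted
    and would start new tracks.
    """
    pairs = sorted(
        (
            (iou(box, det), track_id, index)
            for track_id, box in tracks.items()
            for index, det in enumerate(detections)
        ),
        reverse=True,
    )
    assigned, used_tracks = {}, set()
    for score, track_id, index in pairs:
        if score < threshold:
            break  # pairs are sorted, so nothing later can pass the threshold
        if index in assigned or track_id in used_tracks:
            continue
        assigned[index] = track_id
        used_tracks.add(track_id)
    return assigned
```

Greedy matching is the simplest choice; an optimal assignment (for example the Hungarian algorithm) would matter more in crowded frames.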
|
||||
|
||||
### 與 TKG 整合
|
||||
|
||||
```
|
||||
face_trace -[:CO_OCCURS_WITH]-> object_instance:5 (這把槍)
|
||||
face_trace -[:CO_OCCURS_WITH]-> object_instance:42 (這張郵票)
|
||||
|
||||
查詢: "Audrey Hepburn 拿這把槍的畫面"
|
||||
→ face_trace:5 -[:SPEAKS_AS]-> SPEAKER_0
|
||||
→ face_trace:5 -[:CO_OCCURS_WITH]-> object_instance:5
|
||||
```
|
||||
|
||||
### Delivery order

1. YOLO model upgrade (low difficulty, immediate effect)
2. Object tracker (medium difficulty, modeled on the face tracker implementation)
3. Custom fine-tune / zero-shot (high difficulty, needs data or a new model)
|
||||
361
docs_v1.0/DESIGN/TMDb_Identity_File_System_V1.0.md
Normal file
@@ -0,0 +1,361 @@
|
||||
---
|
||||
document_type: "design"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "TMDb 整合 — Identity 檔案系統設計"
|
||||
date: "2026-05-16"
|
||||
version: "V1.0"
|
||||
status: "completed"
|
||||
owner: "M5"
|
||||
created_by: "OpenCode"
|
||||
tags:
|
||||
- "tmdb"
|
||||
- "identity"
|
||||
- "cache"
|
||||
- "file-system"
|
||||
- "resource"
|
||||
- "design"
|
||||
ai_query_hints:
|
||||
- "查詢 TMDb Identity 檔案系統設計的內容"
|
||||
- "TMDb 整合的三個階段是什麼"
|
||||
- "如何從 cache 建立 TMDb identities"
|
||||
- "identity 檔案化目錄結構"
|
||||
- "TMDb resource API endpoint 列表"
|
||||
- "TMDb face matching 整合位置"
|
||||
related_documents:
|
||||
- "REFERENCE/Face_Pipeline.md"
|
||||
- "REFERENCE/Trace_Structure.md"
|
||||
- "REFERENCE/Demo_EndToEnd.md"
|
||||
- "REFERENCE/Services_Inventory.md"
|
||||
---
|
||||
|
||||
# TMDb 整合 — Identity 檔案系統設計 V1.0
|
||||
|
||||
| 項目 | 內容 |
|
||||
|------|------|
|
||||
| 建立者 | OpenCode |
|
||||
| 建立時間 | 2026-05-16 |
|
||||
| 文件版本 | V1.0 |
|
||||
| 狀態 | Completed |
|
||||
|
||||
---
|
||||
|
||||
## 版本歷史
|
||||
|
||||
| 版本 | 日期 | 目的 | 操作人 | 工具/模型 |
|
||||
|------|------|------|--------|-----------|
|
||||
| V1.0 | 2026-05-16 | 三階段 TMDb 整合設計:Identity 檔案化、Agent Cache、Resource 納管 | OpenCode | DeepSeek V4 Flash |
|
||||
|
||||
---
|
||||
|
||||
## Overview

Three plans, implemented in sequence, build a filesystem copy of each Identity and integrate TMDb as an external resource:

1. **Plan 1: Identity files**: every identity has a full backup at `{OUTPUT}/identities/{uuid}/identity.json`
2. **Plan 2: TMDb Agent + Cache**: the only outbound connection; fetch the TMDb API → cache to `{uuid}.tmdb.json`
3. **Plan 3: TMDb as a managed resource**: resource endpoint + health integration

### Design principles

- **Fully local by default**: TMDb is the only service that needs an outbound connection and is treated as an optional plugin
- **Cache-first**: the TMDb API is called only once; afterwards everything is read from the local cache
- **Dual-write**: DB and filesystem are kept consistent
- **Filesystem as canonical snapshot**: the DB is the primary store; the filesystem is a portable offline copy
|
||||
|
||||
---
|
||||
|
||||
## Plan 1: Identity files

### Purpose

Create a filesystem snapshot for every identity so that identity data is:
- **Portable**: `cp -r identities/` to another machine is enough
- **Inspectable**: `cat {uuid}/identity.json` shows the full identity data
- **Easy to back up**: a tar of `identities/` is a complete identity backup
- **Available offline**: basic identity information can be read without the DB
|
||||
|
||||
### 目錄結構
|
||||
|
||||
```
|
||||
{OUTPUT_DIR}/
|
||||
├── identities/
|
||||
│ ├── _index.json ← { uuid: name } 索引
|
||||
│ ├── a9a901056d6b46ff92da0c3c1a57dff4/
|
||||
│ │ └── identity.json ← V1: 完整 identity 資訊
|
||||
│ └── b0b101167e8c4a53a0.../
|
||||
│ └── identity.json
|
||||
└── {file_uuid}.tmdb.json ← V2: TMDb raw cache
|
||||
```
|
||||
|
||||
### identity.json 格式
|
||||
|
||||
```json
|
||||
{
|
||||
"version": 1,
|
||||
"identity_uuid": "a9a901056d6b46ff92da0c3c1a57dff4",
|
||||
"name": "Cary Grant",
|
||||
"identity_type": "people",
|
||||
"source": "tmdb",
|
||||
"status": "confirmed",
|
||||
"tmdb_id": 112,
|
||||
"tmdb_profile": "https://image.tmdb.org/t/p/w185/abc.jpg",
|
||||
"metadata": {
|
||||
"tmdb_character": "Peter Joshua",
|
||||
"tmdb_cast_order": 0,
|
||||
"tmdb_movie_id": 4808
|
||||
},
|
||||
"file_bindings": [
|
||||
{
|
||||
"file_uuid": "3a6c1865...",
|
||||
"trace_ids": [10, 23],
|
||||
"face_count": 12
|
||||
}
|
||||
],
|
||||
"created_at": "2026-05-16T12:00:00Z",
|
||||
"updated_at": "2026-05-16T12:30:00Z"
|
||||
}
|
||||
```
|
||||
|
||||
### _index.json 格式
|
||||
|
||||
```json
|
||||
{
|
||||
"version": 1,
|
||||
"updated_at": "2026-05-16T12:00:00Z",
|
||||
"entries": {
|
||||
"a9a901056d6b46ff92da0c3c1a57dff4": "Cary Grant",
|
||||
"b0b101167e8c4a53a09d6c2a68e0abf1": "Audrey Hepburn"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Write strategy: Dual-write

Any identity change → DB write → `save_identity_file()` → filesystem write
|
||||
|
||||
```
|
||||
identity 變更發生處:
|
||||
├── TMDb probe (probe.rs) → create_identities_from_data() → save_identity_file() per identity
|
||||
├── Face matching API (identity_agent_api.rs) → match_faces_iterative() → save_identity_file() per matched identity
|
||||
├── Face matching Worker P2.5 (job_worker.rs) → match_faces_against_tmdb() → save_identity_file() per affected identity
|
||||
├── Manual bind/unbind (identity_binding.rs) → bind/unbind handler → save_identity_file() per identity
|
||||
└── One-time migration (migrate_identity_files.py) → 全部 identities 檔案化
|
||||
```
|
||||
|
||||
### API: `storage.rs`
|
||||
|
||||
```rust
|
||||
// structs
|
||||
IdentityFile { version, identity_uuid, name, identity_type, source, status,
|
||||
tmdb_id, tmdb_profile, metadata, file_bindings, created_at, updated_at }
|
||||
FileBinding { file_uuid, trace_ids, face_count }
|
||||
|
||||
// core functions
|
||||
identity_dir(uuid: &str) -> PathBuf
|
||||
read_identity_file(uuid: &str) -> Result<IdentityFile>
|
||||
write_identity_file(file: &IdentityFile) -> Result<()>
|
||||
list_identity_uuids() -> Result<Vec<String>>
|
||||
count_identity_files() -> usize
|
||||
|
||||
// index
|
||||
read_index() -> Result<HashMap<String, String>>
|
||||
update_index(uuid: &str, name: &str) -> Result<()>
|
||||
|
||||
// dual-write hook
|
||||
async fn save_identity_file(db: &PostgresDb, uuid: &str) -> Result<()>
|
||||
// 1. 查 DB 取得 identity full data
|
||||
// 2. 查 DB 取得 file_bindings
|
||||
// 3. 寫 identity.json
|
||||
// 4. 更新 _index.json
|
||||
```
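
Steps 3 and 4 of `save_identity_file()` (write `identity.json`, then update `_index.json`) can be sketched as below. This is a Python illustration under assumptions, not the Rust code in `storage.rs`: the DB lookups are left out, the caller passes an already assembled identity dict, and `write_identity_file` here is a hypothetical Python function that only shares its name with the Rust one.

```python
import json
import os


def write_identity_file(output_dir, identity):
    """Write identities/{uuid}/identity.json and refresh identities/_index.json."""
    uuid = identity["identity_uuid"]
    identity_dir = os.path.join(output_dir, "identities", uuid)
    os.makedirs(identity_dir, exist_ok=True)
    with open(os.path.join(identity_dir, "identity.json"), "w", encoding="utf-8") as handle:
        json.dump(identity, handle, ensure_ascii=False, indent=2)

    index_path = os.path.join(output_dir, "identities", "_index.json")
    try:
        with open(index_path, "r", encoding="utf-8") as handle:
            index = json.load(handle)
    except (OSError, ValueError):
        # Missing or unreadable index: start a fresh one.
        index = {"version": 1, "entries": {}}
    index.setdefault("entries", {})[uuid] = identity["name"]
    index["updated_at"] = identity.get("updated_at")
    with open(index_path, "w", encoding="utf-8") as handle:
        json.dump(index, handle, ensure_ascii=False, indent=2)
    return index_path
```

Because the index is rebuilt from whatever is on disk, a lost `_index.json` is recreated on the next write, although entries for identities not written since would still need the migration script.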
|
||||
|
||||
### 改動清單
|
||||
|
||||
| # | 檔案 | 屬性 | 內容 |
|
||||
|---|------|------|------|
|
||||
| 1.1 | `src/core/identity/storage.rs` | NEW | IdentityFile struct + CRUD + index + save_identity_file() |
|
||||
| 1.2 | `src/core/identity/mod.rs` | NEW | module declaration |
|
||||
| 1.3 | `src/core/mod.rs` | EDIT | `pub mod identity;` |
|
||||
| 1.4 | `src/core/db/postgres_db.rs` | EDIT | `get_identity_file_bindings(uuid)` helper |
|
||||
| 1.5 | `src/core/tmdb/probe.rs` | EDIT | hook: save_identity_file() |
|
||||
| 1.6 | `src/api/identity_binding.rs` | EDIT | hook: bind/unbind |
|
||||
| 1.7 | `src/api/identity_agent_api.rs` | EDIT | hook: match_faces_iterative |
|
||||
| 1.8 | `src/worker/job_worker.rs` | EDIT | hook: P2.5 matching |
|
||||
| 1.9 | `src/api/server.rs` | EDIT | health/detailed: identities section |
|
||||
| 1.10 | `scripts/migrate_identity_files.py` | NEW | one-time migration DB→filesystem |
|
||||
|
||||
---
|
||||
|
||||
## Plan 2: TMDb Agent + Cache
|
||||
|
||||
### Purpose

Make TMDb the single outbound connection backed by a local cache, so that identity enrichment works fully offline.
|
||||
|
||||
### 目錄結構
|
||||
|
||||
```
|
||||
{OUTPUT_DIR}/
|
||||
├── {file_uuid}.tmdb.json ← TMDb raw cache (file-level)
|
||||
├── identities/{uuid}/
|
||||
│ └── identity.json ← Processed identity (identity-level)
|
||||
```
|
||||
|
||||
### Cache 格式 (`{uuid}.tmdb.json`)
|
||||
|
||||
```json
|
||||
{
|
||||
"file_uuid": "3a6c1865...",
|
||||
"fetched_at": "2026-05-16T12:00:00Z",
|
||||
"source": "agent",
|
||||
"movie": {
|
||||
"tmdb_id": 4808,
|
||||
"title": "Charade",
|
||||
"release_date": "1963-12-05",
|
||||
"overview": "After Regina Lampert...",
|
||||
"poster_path": "/8wvQp...jpg"
|
||||
},
|
||||
"cast": [
|
||||
{
|
||||
"name": "Cary Grant",
|
||||
"character": "Peter Joshua",
|
||||
"profile_path": "/abc123.jpg",
|
||||
"order": 0
|
||||
}
|
||||
],
|
||||
"cast_count": 20,
|
||||
"identities_created": 0
|
||||
}
|
||||
```
|
||||
|
||||
### 流程
|
||||
|
||||
```
|
||||
Step 1: POST /agents/tmdb/prefetch
|
||||
→ tmdb_agent.py (唯一外連) → TMDB API search → credits
|
||||
→ 寫入 {uuid}.tmdb.json (source: agent)
|
||||
|
||||
Step 2: POST /file/:uuid/tmdb-probe
|
||||
→ probe_from_cache() 讀 {uuid}.tmdb.json
|
||||
→ INSERT identities (source='tmdb')
|
||||
→ spawn tmdb_embed_extractor.py (背景)
|
||||
→ save_identity_file() for each identity (Plan 1 hook)
|
||||
|
||||
Step 3: POST /agents/identity/analyze (既存 endpoint)
|
||||
→ match_faces_iterative() 自動包含 TMDb identities
|
||||
```
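
The cache-first behaviour behind Steps 1 and 2 can be sketched as below. This is a simplified illustration: `load_tmdb_data` and the `fetch` callable are hypothetical stand-ins for the agent and `probe_from_cache()`, and the real flow splits fetching and probing across two endpoints.

```python
import json
import os


def load_tmdb_data(output_dir, file_uuid, fetch):
    """Return cached TMDb data, calling `fetch` only when no cache file exists.

    `fetch` is a hypothetical callable standing in for the TMDb agent; it takes
    the file UUID and returns a dict with `movie` and `cast` keys.
    """
    cache_path = os.path.join(output_dir, f"{file_uuid}.tmdb.json")
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as handle:
            return json.load(handle)  # cache hit: no outbound call
    data = dict(fetch(file_uuid), file_uuid=file_uuid, source="agent")
    with open(cache_path, "w", encoding="utf-8") as handle:
        json.dump(data, handle, ensure_ascii=False)
    return data
```

Once the cache file exists, every later call is served locally, which is what lets the rest of the pipeline run without network access.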
|
||||
|
||||
### probe.rs 重構
|
||||
|
||||
```rust
|
||||
// 新增 (讀 cache)
|
||||
pub async fn probe_from_cache(db, file_uuid) -> Result<TmdbProbeResult> {
|
||||
let cache = cache::read_tmdb_cache(file_uuid)?;
|
||||
create_identities_from_data(db, file_uuid, &cache.movie, &cache.cast).await
|
||||
}
|
||||
|
||||
// 共用內部函數 (從 probe_movie 抽離)
|
||||
async fn create_identities_from_data(db, file_uuid, movie, cast) -> Result<TmdbProbeResult> {
|
||||
// 原本 probe_movie 的 INSERT + embed spawn + store logic
|
||||
// 尾端呼叫 save_identity_file() per identity
|
||||
}
|
||||
|
||||
// 保留 (direct API call, 後備)
|
||||
pub async fn probe_movie(db, filename, file_uuid) -> Result<...> {
|
||||
let movie_name = extract_movie_name(filename)?;
|
||||
// search TMDB API → credits
|
||||
// 可選擇性寫入 cache 供下次使用
|
||||
create_identities_from_data(db, file_uuid, &movie, &cast).await
|
||||
}
|
||||
```
|
||||
|
||||
### 改動清單
|
||||
|
||||
| # | 檔案 | 屬性 | 內容 |
|
||||
|---|------|------|------|
|
||||
| 2.1 | `src/core/tmdb/cache.rs` | NEW | TmdbCache struct + read/write |
|
||||
| 2.2 | `src/core/tmdb/mod.rs` | EDIT | `pub mod cache;` `pub mod status;` |
|
||||
| 2.3 | `src/core/tmdb/probe.rs` | EDIT | refactor: probe_from_cache() + create_identities_from_data() |
|
||||
| 2.4 | `scripts/tmdb_agent.py` | NEW | fetch TMDB API → cache tmdb.json |
|
||||
| 2.5 | `src/api/tmdb_api.rs` | NEW | 5 routes + 5 handlers |
|
||||
| 2.6 | `src/api/server.rs` | EDIT | `.merge(tmdb_routes())` |
|
||||
|
||||
---
|
||||
|
||||
## Plan 3: TMDb 納管
|
||||
|
||||
### Purpose

Bring TMDb under system monitoring and management as a managed resource.
|
||||
|
||||
### health/detailed 擴充
|
||||
|
||||
```json
|
||||
{
|
||||
"integrations": {
|
||||
"tmdb": {
|
||||
"api_key_configured": true,
|
||||
"enabled": true,
|
||||
"api_reachable": true,
|
||||
"api_latency_ms": 120,
|
||||
"api_error": null,
|
||||
"last_check_at": "2026-05-16T12:00:00Z"
|
||||
}
|
||||
},
|
||||
"identities": {
|
||||
"directory_exists": true,
|
||||
"files_count": 3481,
|
||||
"index_ok": true,
|
||||
"db_count": 3481,
|
||||
"synced": true
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### API
|
||||
|
||||
| Method | Path | 說明 |
|
||||
|--------|------|------|
|
||||
| `GET` | `/api/v1/resource/tmdb` | TMDb 完整狀態 + stats + cache count |
|
||||
| `POST` | `/api/v1/resource/tmdb/check` | ping TMDb API → 更新健康狀態 |
|
||||
|
||||
### 改動清單
|
||||
|
||||
| # | 檔案 | 屬性 | 內容 |
|
||||
|---|------|------|------|
|
||||
| 3.1 | `src/core/tmdb/status.rs` | NEW | check_tmdb_api(), count_tmdb_identities(), count_cache_files() |
|
||||
| 3.2 | `src/api/tmdb_api.rs` | EDIT | GET/POST resource endpoints |
|
||||
| 3.3 | `src/api/server.rs` | EDIT | integrations in health/detailed |
|
||||
|
||||
---
|
||||
|
||||
## 完整 API 表 (Plan 2 + 3)
|
||||
|
||||
| Method | Path | Handler | Plan | Description |
|
||||
|--------|------|---------|------|-------------|
|
||||
| `POST` | `/api/v1/agents/tmdb/prefetch` | `prefetch_tmdb` | 2 | agent fetch TMDB → cache |
|
||||
| `POST` | `/api/v1/file/:file_uuid/tmdb-probe` | `tmdb_probe` | 2 | cache → identities |
|
||||
| `GET` | `/api/v1/file/:file_uuid/tmdb-cache` | `tmdb_cache_view` | 2 | view raw cache |
|
||||
| `GET` | `/api/v1/resource/tmdb` | `tmdb_resource_status` | 3 | full TMDb status |
|
||||
| `POST` | `/api/v1/resource/tmdb/check` | `tmdb_resource_check` | 3 | ping health check |
|
||||
|
||||
## Migration
|
||||
|
||||
一次性腳本:`scripts/migrate_identity_files.py`
|
||||
|
||||
```bash
|
||||
python3 scripts/migrate_identity_files.py
|
||||
# → 讀 DB identities table → 寫 identity files → 建 index
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 執行順序
|
||||
|
||||
```
|
||||
Plan 1 (identity 檔案化) → Plan 2 (TMDb agent) → Plan 3 (TMDb 納管)
|
||||
1.1 → 1.2 → 1.3 → 2.1 → 2.2 → 2.3 → 3.1 → 3.2 → 3.3
|
||||
1.4 → 1.5 → 1.6 → 2.4 → 2.5 → 2.6
|
||||
1.7 → 1.8 → 1.9 →
|
||||
1.10
|
||||
```
|
||||
101
docs_v1.0/DESIGN/TRACE_SEARCH_API_DESIGN.md
Normal file
@@ -0,0 +1,101 @@
|
||||
# Trace Search API Design

## Concept

A trace is a kind of chunk.

Existing chunk_type values: `cut`, `sentence`, `visual`, `story`
New chunk_type: `trace`

Each trace (a person's tracking trajectory across frames) is a time interval plus the ASR text within that interval.
It is exactly like any other chunk; only the segmentation dimension differs:
- cut chunk = shot change
- sentence chunk = sentence boundary
- visual chunk = combination of objects on screen
- **trace chunk = interval in which a person appears + the text spoken at that time**

Traces can therefore go straight into the existing `chunks` table and share the whole embedding, search, and Qdrant sync machinery, with no new table required.
|
||||
|
||||
## chunks 表現有結構
|
||||
|
||||
```sql
|
||||
chunks (
|
||||
id, file_uuid, chunk_type, -- 'trace' 新增
|
||||
start_frame, end_frame, start_time, end_time,
|
||||
text_content, -- trace 區間的 ASR text
|
||||
embedding, -- text_content 的 pgvector
|
||||
metadata JSONB, -- { trace_id, face_count, identity_id, identity_name }
|
||||
...
|
||||
)
|
||||
```
|
||||
|
||||
## Data generation flow (worker extension)

After face processing and `store_traced_faces.py` complete:

1. Query `face_detections` and aggregate `MIN(frame)`, `MAX(frame)`, `COUNT(*)` for each trace
2. For each trace, query `pre_chunks WHERE processor_type='asr'` for the text that overlaps the trace time range
3. Concatenate the text → generate the `embedding` with EmbeddingGemma
4. Write to `chunks` (`chunk_type='trace'`), with `trace_id`, `face_count`, `identity_id` in metadata
5. The embedding goes into Qdrant automatically (same collection as existing chunks)
|
||||
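
Steps 1, 2, and 4 of this flow can be sketched in memory as below. This is an illustration under assumptions: it works on plain tuples instead of the `face_detections` and `pre_chunks` tables, `build_trace_chunks` is a hypothetical name, `fps` is supplied by the caller, and the embedding step is left out.

```python
def build_trace_chunks(face_detections, asr_segments, fps):
    """Group detections by trace and attach the ASR text overlapping each trace.

    face_detections: list of (trace_id, frame_number)
    asr_segments: list of (start_time, end_time, text), times in seconds
    """
    spans = {}
    for trace_id, frame in face_detections:
        first, last, count = spans.get(trace_id, (frame, frame, 0))
        spans[trace_id] = (min(first, frame), max(last, frame), count + 1)

    chunks = []
    for trace_id, (first, last, count) in sorted(spans.items()):
        start_time, end_time = first / fps, last / fps
        # Two intervals overlap when each one starts before the other ends.
        texts = [
            text
            for seg_start, seg_end, text in asr_segments
            if seg_start < end_time and seg_end > start_time
        ]
        chunks.append({
            "chunk_type": "trace",
            "start_frame": first,
            "end_frame": last,
            "start_time": start_time,
            "end_time": end_time,
            "text_content": " ".join(texts),
            "metadata": {"trace_id": trace_id, "face_count": count},
        })
    return chunks
```

A trace seen in a single frame has zero duration and therefore picks up no text; whether such traces should be stored at all is a design decision this sketch does not make.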
|
||||
## Search API extension

The `types` field of Universal Search already supports `"chunk"`.
Filtering chunk search on `chunk_type = 'trace'` is all that is needed.
|
||||
|
||||
**Request**:
|
||||
```json
|
||||
{
|
||||
"query": "open the door",
|
||||
"types": ["chunk"],
|
||||
"filters": { "chunk_type": "trace" },
|
||||
"uuid": "aeed71342a899fe4b4c57b7d41bcb692",
|
||||
"page": 1,
|
||||
"page_size": 20
|
||||
}
|
||||
```
|
||||
|
||||
**Response** (same as the existing Chunk result):
|
||||
```json
|
||||
{
|
||||
"type": "chunk",
|
||||
"chunk_id": "chunk_42",
|
||||
"chunk_type": "trace",
|
||||
"start_frame": 45200, "end_frame": 45900,
|
||||
"start_time": 1808.0, "end_time": 1836.0,
|
||||
"score": 0.87,
|
||||
"text": "Open the door. Come on, hurry up.",
|
||||
"metadata": {
|
||||
"trace_id": 5,
|
||||
"face_count": 42,
|
||||
"identity_name": "Audrey Hepburn"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
This reuses the existing `SearchResult::Chunk` variant as is; no new enum variant is needed.

### Search query
|
||||
|
||||
```sql
|
||||
SELECT c.*
|
||||
FROM dev.chunks c
|
||||
WHERE c.file_uuid = $1
|
||||
AND c.chunk_type = 'trace'
|
||||
AND c.embedding IS NOT NULL
|
||||
ORDER BY c.embedding <=> $2
|
||||
LIMIT $3;
|
||||
```
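
In pgvector, `<=>` is the cosine distance operator, so the query above orders trace chunks from most to least similar to the query vector. The ordering can be mimicked in plain Python as below; `rank_chunks` is a hypothetical helper for illustration and plays no part in the actual search path.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def rank_chunks(query_vector, chunks, limit):
    """Order chunk ids by cosine distance to the query, skipping missing embeddings.

    chunks: list of (chunk_id, embedding_or_None)
    """
    scored = [
        (cosine_distance(query_vector, embedding), chunk_id)
        for chunk_id, embedding in chunks
        if embedding is not None  # mirrors `embedding IS NOT NULL`
    ]
    return [chunk_id for _, chunk_id in sorted(scored)[:limit]]
```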
|
||||
|
||||
## Summary

| Item | Approach |
|------|------|
| New table | ❌ Not needed |
| New enum variant | ❌ Not needed |
| SearchResult changes | ❌ Not needed |
| New chunk_type | ✅ `'trace'` |
| Worker extension | ✅ Generate trace chunks (after face is done) |
| SearchFilters extension | ✅ Add a `chunk_type` filter |
| Qdrant | ✅ Automatic (existing chunk collection) |
|
||||
1453
docs_v1.0/DESIGN/VIDEO_PROCESSING_SPEC.md
Normal file
File diff suppressed because it is too large
264
docs_v1.0/DESIGN/VIDEO_REGISTRATION.md
Normal file
@@ -0,0 +1,264 @@
|
||||
---
|
||||
document_type: "reference_doc"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "Video Registration"
|
||||
date: "2026-03-25"
|
||||
version: "V1.0"
|
||||
status: "active"
|
||||
owner: "Warren"
|
||||
created_by: "OpenCode"
|
||||
tags:
|
||||
- "video"
|
||||
- "registration"
|
||||
ai_query_hints:
|
||||
- "查詢 Video Registration 的內容"
|
||||
- "Video Registration 的主要目的是什麼?"
|
||||
- "如何操作或實施 Video Registration?"
|
||||
---
|
||||
|
||||
# Video Registration
|
||||
|
||||
| Item | Value |
|------|------|
| Author | Warren |
| Created | 2026-03-25 |
| Document version | V1.1 |
|
||||
|
||||
---
|
||||
|
||||
## Version History

| Version | Date | Purpose | Operator | Tool/Model |
|------|------|------|--------|-----------|
| V1.0 | 2026-03-25 | Document created | Warren | OpenCode |
| V1.1 | 2026-03-26 | Fixed curl examples; added the API key authentication header | OpenCode | deepseek-reasoner |
|
||||
|
||||
---
|
||||
|
||||
## Overview

The video registration API (`POST /api/v1/register`) adds a video to the Momentry Core system for processing.

## Path Format

### Supported path formats

| Format | Example | Notes |
|------|------|------|
| Relative path | `./demo/video.mp4` | Recommended |
| Relative path (no `./`) | `demo/video.mp4` | `./` is prepended automatically |
| Absolute path | `/Users/.../sftpgo/data/demo/video.mp4` | Supported but not recommended |
|
||||
|
||||
### Path structure

```
./username/filepath
│ │        │
│ │        └── file path (may contain nested directories)
│ └── username (the SFTPgo user directory name)
└── relative-path prefix
```

**Examples**:
- `./demo/video.mp4` → username=`demo`, filepath=`video.mp4`
- `./demo/movies/2024/video.mp4` → username=`demo`, filepath=`movies/2024/video.mp4`
- `./warren/project1/interview.mp4` → username=`warren`, filepath=`project1/interview.mp4`
|
||||
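The three accepted formats all reduce to one `(username, filepath)` pair. The following Python sketch shows one way to normalize them; it is not the project's implementation. The `DATA_ROOT` value and the `split_registered_path` name are assumptions made for illustration, since this document elides the real absolute prefix.

```python
from pathlib import PurePosixPath

# Assumption: hypothetical SFTPgo data root used to recognise absolute paths.
DATA_ROOT = PurePosixPath("/srv/sftpgo/data")


def split_registered_path(path: str) -> tuple[str, str]:
    """Return (username, filepath) for a relative or absolute registration path."""
    p = PurePosixPath(path)
    if p.is_absolute():
        # Absolute paths are first made relative to the data root;
        # relative_to raises ValueError if the path lies outside it.
        p = p.relative_to(DATA_ROOT)
    parts = [part for part in p.parts if part != "."]
    if len(parts) < 2:
        raise ValueError(f"expected username/filepath, got {path!r}")
    # First component is the user directory, the rest is the file path.
    return parts[0], "/".join(parts[1:])
```

Resolving absolute paths against a single configured root is what keeps the result independent of where the SFTPgo data directory lives.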
|
||||
## UUID Calculation

### Rule

```
UUID = hex(SHA256("username/filepath"))[0:16]
```

The UUID is the first 16 hexadecimal characters of the SHA-256 digest of the `username/filepath` key.

**Example**:
```rust
// path:     ./demo/video.mp4
// username: "demo"
// filepath: "video.mp4"
// key:      "demo/video.mp4"
// UUID:     SHA256("demo/video.mp4")[0:16]
```
|
||||
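As a cross-check of the rule, here is a minimal Python sketch of the same calculation. It assumes the key is UTF-8 encoded and that `[0:16]` slices the hexadecimal digest; neither detail is confirmed by this document, so the output is not guaranteed to match the Rust implementation.

```python
import hashlib


def compute_uuid(username: str, filepath: str) -> str:
    """First 16 hex characters of SHA-256 over "username/filepath"."""
    key = f"{username}/{filepath}"
    # Assumption: the key is hashed as UTF-8 bytes.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


demo_uuid = compute_uuid("demo", "video.mp4")
warren_uuid = compute_uuid("warren", "video.mp4")
print(demo_uuid, warren_uuid)  # two different 16-character hex strings
```

Because the username is part of the hashed key, the same file name under two users yields two different identifiers.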
|
||||
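### Properties

| Property | Description |
|------|------|
| User isolation | The same file name under different users yields different UUIDs |
| Consistency | The same relative path always yields the same UUID |
| Migration safety | The UUID stays the same after the SFTPgo data path changes |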
|
||||
|
||||
### Examples

```rust
// Video owned by user demo
compute_uuid_from_relative_path("./demo/video.mp4")
// → "9760d0820f0cf9a7"

// Same file name owned by user warren
compute_uuid_from_relative_path("./warren/video.mp4")
// → a different 16-character hex UUID (value not shown here)
```
|
||||
|
||||
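## Duplicate Registration Check

### Behavior

1. The system checks whether the UUID already exists in the database.
2. If it exists, the API returns `already_exists: true` along with the existing video's information.
3. If it does not exist, a new video record is created.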
|
||||
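The three steps above describe an idempotent registration. The sketch below models that behavior with an in-memory dictionary standing in for the database; `VideoStore` and its counter-based `video_id` are hypothetical and only illustrate the control flow.

```python
from dataclasses import dataclass, field


@dataclass
class VideoStore:
    """In-memory stand-in for the videos table, keyed by UUID (illustrative only)."""
    rows: dict = field(default_factory=dict)

    def register(self, uuid: str, file_name: str) -> dict:
        existing = self.rows.get(uuid)
        if existing is not None:
            # UUID already known: return the stored record unchanged.
            return {**existing, "already_exists": True}
        # UUID not seen before: create a new record (id is a simple counter here).
        record = {"uuid": uuid, "video_id": len(self.rows) + 1, "file_name": file_name}
        self.rows[uuid] = record
        return {**record, "already_exists": False}


store = VideoStore()
first = store.register("9760d0820f0cf9a7", "video.mp4")
second = store.register("9760d0820f0cf9a7", "video.mp4")
```

Calling `register` twice with the same UUID returns the same `video_id` both times; only the `already_exists` flag differs.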
|
||||
### API response

**New registration**:
|
||||
```json
|
||||
{
|
||||
"uuid": "9760d0820f0cf9a7",
|
||||
"video_id": 18,
|
||||
"job_id": 2,
|
||||
"file_name": "video.mp4",
|
||||
"duration": 159.637188,
|
||||
"width": 640,
|
||||
"height": 360,
|
||||
"already_exists": false
|
||||
}
|
||||
```
|
||||
|
||||
**Duplicate registration**:
|
||||
```json
|
||||
{
|
||||
"uuid": "9760d0820f0cf9a7",
|
||||
"video_id": 18,
|
||||
"job_id": 2,
|
||||
"file_name": "video.mp4",
|
||||
"duration": 159.637188,
|
||||
"width": 640,
|
||||
"height": 360,
|
||||
"already_exists": true
|
||||
}
|
||||
```
|
||||
|
||||
## SFTPgo Integration

### Directory layout

SFTPgo's per-user directory layout:

```
/Users/accusys/momentry/var/sftpgo/data/
├── demo/                 ← user directory
│   ├── video.mp4
│   └── movies/
│       └── movie1.mp4
├── warren/               ← user directory
│   └── project1/
│       └── interview.mp4
└── momentry/             ← user directory
    └── presentation.mp4
```
|
||||
|
||||
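### Registration flow

1. SFTPgo users upload files to their own directories.
2. n8n or another service calls the registration API.
3. The call uses the relative path format: `./username/filepath`.
4. The system computes the UUID and checks for duplicates.
5. A processing job is created.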
|
||||
|
||||
## Code Examples

### Register a video

```bash
# Register using a relative path
curl -X POST http://localhost:3002/api/v1/register \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{"path": "./demo/video.mp4"}'

# Or with nested directories
curl -X POST http://localhost:3002/api/v1/register \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{"path": "./demo/movies/2024/video.mp4"}'
```
|
||||
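For callers that prefer Python over curl, the same request can be assembled with the standard library. This is a sketch only: `build_register_request` is a hypothetical helper, and it builds the request without sending it, so nothing here depends on a running server.

```python
import json
import urllib.request


def build_register_request(path: str, api_key: str,
                           base_url: str = "http://localhost:3002") -> urllib.request.Request:
    """Build (but do not send) the POST /api/v1/register request."""
    body = json.dumps({"path": path}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/register",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )


req = build_register_request("./demo/video.mp4", "YOUR_API_KEY")
# Sending is left to the caller, e.g. urllib.request.urlopen(req).
```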
|
||||
### UUID calculation functions

```rust
use std::path::PathBuf;

// Compute the UUID from a relative path.
// `compute_uuid` is not shown in this document.
pub fn compute_uuid_from_relative_path(relative_path: &str) -> String {
    let (username, filepath) = extract_user_from_relative_path(relative_path);
    compute_uuid(&username, &filepath)
}

// Extract the username and file path from a relative path.
pub fn extract_user_from_relative_path(relative_path: &str) -> (String, String) {
    let path = relative_path.strip_prefix("./").unwrap_or(relative_path);
    let path_buf = PathBuf::from(path);

    // The first component is the SFTPgo user directory.
    let mut components = path_buf.components();
    let username = components
        .next()
        .map(|c| c.as_os_str().to_string_lossy().to_string())
        .unwrap_or_default();

    // The remaining components form the file path, joined with "/".
    let filepath: String = components
        .map(|c| c.as_os_str().to_string_lossy().to_string())
        .collect::<Vec<_>>()
        .join("/");

    (username, filepath)
}
```
|
||||
|
||||
## Related APIs

### Probe API (probe only, no registration)

To retrieve video information without registering the video, use the Probe API:
|
||||
|
||||
```bash
|
||||
curl -X POST http://localhost:3002/api/v1/probe \
|
||||
-H "Content-Type: application/json" \
|
||||
-H "X-API-Key: YOUR_API_KEY" \
|
||||
-d '{"path": "./demo/video.mp4"}'
|
||||
```
|
||||
|
||||
**Example response**:
|
||||
```json
|
||||
{
|
||||
"uuid": "a1b10138a6bbb0cd",
|
||||
"file_name": "video.mp4",
|
||||
"duration": 120.5,
|
||||
"width": 1920,
|
||||
"height": 1080,
|
||||
"fps": 30.0,
|
||||
"cached": false,
|
||||
"format": {...},
|
||||
"streams": [...]
|
||||
}
|
||||
```
|
||||
|
||||
**Differences from the Register API**:

| Capability | Probe API | Register API |
|------|-----------|---------------|
| Computes UUID | ✓ | ✓ |
| Runs ffprobe | ✓ | ✓ |
| Saves probe.json | ✓ | ✓ |
| Writes to the `videos` table | ✗ | ✓ |
| Creates a monitor_job | ✗ | ✓ |
| Returns job_id | ✗ | ✓ |
| Use case | Previewing video information | Registering and processing a video |
|
||||
|
||||
## Related Files

| File | Description |
|------|------|
| `src/core/storage/uuid.rs` | UUID calculation logic |
| `src/api/server.rs` | Register and Probe API implementation |
| `src/core/probe/ffprobe.rs` | ffprobe integration |
| `docs_v1.0/IMPLEMENTATION/SFTPGO_DEMO_USER.md` | SFTPgo user setup |
| `docs_v1.0/REFERENCE/API_ENDPOINTS.md` | API endpoint overview |
|
||||
201
docs_v1.0/DESIGN/VISION_AGENT_API.md
Normal file
@@ -0,0 +1,201 @@
|
||||
# Momentry Eye API Reference
|
||||
|
||||
**Vision Agent** — Multi-model zero-shot object detection service.
|
||||
Port: `5052` | Resource IDs: `eye-gdino`, `eye-paligemma`
|
||||
|
||||
---
|
||||
|
||||
## Models
|
||||
|
||||
| Model | ID | Params | Size | Confidence | Speed | License |
|
||||
|-------|-----|--------|------|------------|-------|---------|
|
||||
| Grounding DINO | `grounding-dino` | 232M | 891MB | ✅ 0-1 score | ~340ms | Apache 2.0 |
|
||||
| PaliGemma 3B | `paligemma` | 2,923M | ~3GB | ❌ no score | ~80ms | Gemma license |
|
||||
|
||||
## Endpoints
|
||||
|
||||
### `GET /health`
|
||||
|
||||
System status and loaded models.
|
||||
|
||||
```bash
|
||||
curl localhost:5052/health
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"status": "ok",
|
||||
"models_loaded": ["grounding-dino"],
|
||||
"models_available": ["grounding-dino", "paligemma"],
|
||||
"device": "mps",
|
||||
"port": 5052
|
||||
}
|
||||
```
|
||||
|
||||
### `GET /models`
|
||||
|
||||
List available models with specs.
|
||||
|
||||
```bash
|
||||
curl localhost:5052/models
|
||||
```
|
||||
|
||||
### `POST /detect`
|
||||
|
||||
Detect objects in a single video frame.
|
||||
|
||||
```bash
|
||||
curl localhost:5052/detect \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"time":5461, "prompt":"gun", "model":"grounding-dino"}'
|
||||
```
|
||||
|
||||
**Parameters:**
|
||||
|
||||
| Param | Type | Default | Description |
|
||||
|-------|------|---------|-------------|
|
||||
| `uuid` | string | `aeed71342a...` | Video file UUID |
|
||||
| `time` | float | `0` | Timestamp in seconds |
|
||||
| `prompt` | string | `"gun"` | Object to detect |
|
||||
| `model` | string | `"grounding-dino"` | Model: `grounding-dino`, `paligemma`, or `fusion` |
|
||||
| `threshold` | float | `0.1` | Minimum confidence (GDINO only) |
|
||||
| `weights` | object | — | Fusion weights, e.g. `{"grounding-dino":0.6,"paligemma":0.4}` |
|
||||
|
||||
**Fusion mode** runs both models and combines results with weighted scoring. Default weights: GDINO 0.6, PaliGemma 0.4.
|
||||
|
||||
```bash
|
||||
# Fusion: run both models, combine results
|
||||
curl localhost:5052/detect \
|
||||
-d '{"time":206, "prompt":"water gun", "model":"fusion"}'
|
||||
|
||||
# Custom fusion weights
|
||||
curl localhost:5052/detect \
|
||||
-d '{"time":206, "prompt":"gun", "model":"fusion",
|
||||
"weights":{"grounding-dino":0.5,"paligemma":0.5}}'
|
||||
```
|
||||
|
||||
**Response:**
|
||||
|
||||
```json
|
||||
{
|
||||
"model": "grounding-dino",
|
||||
"detections": [
|
||||
{"bbox": [726.2, 567.4, 969.0, 694.6], "score": 0.476, "label": "gun"},
|
||||
{"bbox": [686.7, 567.0, 969.6, 918.3], "score": 0.262, "label": "gun"}
|
||||
],
|
||||
"time_ms": 345.2,
|
||||
"n_detections": 2,
|
||||
"shot_url": "/shots/aeed7134_5461s_gun_grounding-dino.jpg"
|
||||
}
|
||||
```
|
||||
|
||||
**Fusion response** also includes `per_model` (detections per model) and `fusion` (deduplicated combined list with `fused_score`).
|
||||
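The weighted combination can be pictured with the following Python sketch. It is not the agent's actual algorithm: the IoU-based deduplication, the 0.5 IoU threshold, and treating a score-less PaliGemma detection as score 1.0 are all assumptions made for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse(per_model, weights, iou_threshold=0.5):
    """Merge per-model detections into one list carrying a fused_score."""
    fused = []
    for model, detections in per_model.items():
        weight = weights.get(model, 0.0)
        for det in detections:
            # Assumption: a detection without a score counts as 1.0.
            contribution = weight * det.get("score", 1.0)
            # Only merge with a box reported by a *different* model.
            match = next((f for f in fused
                          if model not in f["models"]
                          and iou(f["bbox"], det["bbox"]) >= iou_threshold), None)
            if match is not None:
                match["fused_score"] += contribution
                match["models"].add(model)
            else:
                fused.append({"bbox": det["bbox"], "label": det["label"],
                              "fused_score": contribution, "models": {model}})
    return sorted(fused, key=lambda f: f["fused_score"], reverse=True)
```

Under these assumptions, a box confirmed by both models outranks a box seen by only one, which matches the intent of a deduplicated combined list.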
|
||||
### `POST /search`
|
||||
|
||||
Search across a time range.
|
||||
|
||||
```bash
|
||||
# Natural language query
|
||||
curl localhost:5052/search \
|
||||
-d '{"query":"find the gun", "range":"5400-5600", "interval":10}'
|
||||
```
|
||||
|
||||
**Parameters:**
|
||||
|
||||
| Param | Type | Default | Description |
|
||||
|-------|------|---------|-------------|
|
||||
| `query` | string | `"find the gun"` | Natural language query (parsed to extract object) |
|
||||
| `target` | string | — | `file_uuid:chunk_id` or `file_uuid:trace_id` — resolves to time range |
|
||||
| `range` | string | `"0-6780"` | Manual time range |
|
||||
| `interval` | int | `30` | Scan interval in seconds |
|
||||
| `model` | string | `"grounding-dino"` | Detection model |
|
||||
| `threshold` | float | `0.15` | Minimum confidence |
|
||||
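To make the interaction between `range` and `interval` concrete, the sketch below expands a range string into the timestamps a scan would visit. The `scan_times` helper is hypothetical, and treating the end of the range as inclusive is an assumption.

```python
def scan_times(range_spec: str, interval: int) -> list[int]:
    """Expand "start-end" into the timestamps visited at a fixed interval."""
    start_s, end_s = range_spec.split("-", 1)
    start, end = int(start_s), int(end_s)
    if interval <= 0 or end < start:
        raise ValueError("interval must be positive and end >= start")
    # Assumption: the end of the range is inclusive.
    return list(range(start, end + 1, interval))


times = scan_times("5400-5600", 10)
print(len(times), times[0], times[-1])
```

With the defaults (`"0-6780"`, interval 30) the same function yields 227 timestamps, which gives a rough idea of how many detection calls a full scan implies.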
|
||||
**Target resolution:**
|
||||
|
||||
| Format | Example | Resolves to |
|
||||
|--------|---------|-------------|
|
||||
| `file_uuid:chunk_id` | `uuid:uuid_story_90` | Chunk's time range |
|
||||
| `file_uuid:trace_id` | `uuid:trace_5` | Trace's time range |
|
||||
| `file_uuid:chunk_index` | `uuid:500` | Chunk index 500's range |
|
||||
|
||||
```bash
|
||||
# Using target
|
||||
curl localhost:5052/search \
|
||||
-d '{"target":"aeed71342...:aeed71342..._story_90", "query":"gun"}'
|
||||
|
||||
# Using trace
|
||||
curl localhost:5052/search \
|
||||
-d '{"target":"aeed71342...:trace_5", "query":"person"}'
|
||||
```
|
||||
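The target formats can be told apart by the part after the colon. The following sketch shows one possible classification; the rules (digits only means a chunk index, `trace_N` means a trace, anything else is a chunk ID) are inferred from the table above and are not confirmed server behavior.

```python
import re


def parse_target(target: str) -> dict:
    """Split "file_uuid:suffix" and classify the suffix (rules are assumptions)."""
    file_uuid, sep, suffix = target.partition(":")
    if not sep or not file_uuid or not suffix:
        raise ValueError(f"expected file_uuid:suffix, got {target!r}")
    if suffix.isdigit():
        return {"file_uuid": file_uuid, "kind": "chunk_index", "value": int(suffix)}
    match = re.fullmatch(r"trace_(\d+)", suffix)
    if match:
        return {"file_uuid": file_uuid, "kind": "trace_id", "value": int(match.group(1))}
    # Anything else is treated as a full chunk ID.
    return {"file_uuid": file_uuid, "kind": "chunk_id", "value": suffix}
```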
|
||||
### `POST /multimodal`
|
||||
|
||||
Multi-modal search across sentence chunks — combines ASR text match + visual confirmation.
|
||||
|
||||
```bash
|
||||
# Search for Jean-Louis: ASR match + GDINO child detection
|
||||
curl localhost:5052/multimodal \
|
||||
-d '{"keyword":"Jean-Louis", "prompt":"child"}'
|
||||
|
||||
# Search trace chunks visually (no ASR)
|
||||
curl localhost:5052/multimodal \
|
||||
-d '{"keyword":"", "prompt":"person", "chunk_type":"trace", "range":"3500-4000"}'
|
||||
```
|
||||
|
||||
**Parameters:**
|
||||
|
||||
| Param | Type | Default | Description |
|
||||
|-------|------|---------|-------------|
|
||||
| `keyword` | string | — | ASR keyword to search in sentence text |
|
||||
| `prompt` | string | same as keyword | Visual prompt for GDINO |
|
||||
| `chunk_type` | string | `"sentence"` | `sentence`, `trace`, `story`, `cut` |
|
||||
| `target` | string | — | Specific chunk target |
|
||||
| `range` | string | `"0-6780"` | Time range (for non-sentence chunks) |
|
||||
| `threshold` | float | `0.15` | Visual detection threshold |
|
||||
|
||||
### `GET /shots/<filename>`
|
||||
|
||||
Retrieve annotated detection images.
|
||||
|
||||
```bash
|
||||
curl -o result.jpg localhost:5052/shots/aeed7134_5461s_gun_grounding-dino.jpg
|
||||
```
|
||||
|
||||
## Object Detection Performance Summary
|
||||
|
||||
| Object type | Size in frame | GDINO | PaliGemma | Best prompt |
|
||||
|-------------|--------------|-------|-----------|-------------|
|
||||
| Gun (realistic) | 15-30% | ✅ 0.36-0.67 | ✅ | `pistol` / `handgun` |
|
||||
| Water gun (toy) | 15-31% | ❌ 0 | ✅ | `water gun` (PaliGemma) |
|
||||
| Child (Jean-Louis) | 30-60% | ⚠️ 0.3-0.9 | ❌ | `child` (high FP on adults) |
|
||||
| Stamp | <5% | ❌ FP | ❌ | — |
|
||||
| Passport | <10% | ❌ FP | ❌ | — |
|
||||
| Magnifying glass | <5% | ❌ FP | ❌ | — |
|
||||
| Cup / Bottle | 5-15% | ✅ 0.3-0.5 | — | `cup` / `bottle` |
|
||||
| Cell phone | 5-10% | ✅ 0.3-0.5 | — | `cell phone` |
|
||||
|
||||
## Resource Registration
|
||||
|
||||
On startup, the agent auto-registers as resources in `dev.resources`:
|
||||
|
||||
| Resource ID | Type | Status |
|
||||
|-------------|------|--------|
|
||||
| `eye-gdino` | `vision_model` | `online` |
|
||||
| `eye-paligemma` | `vision_model` | `online` |
|
||||
|
||||
Heartbeat updates every 60 seconds. Discover via:
|
||||
|
||||
```sql
|
||||
SELECT * FROM dev.resources WHERE resource_type = 'vision_model';
|
||||
```
|
||||
|
||||
## Files
|
||||
|
||||
| File | Description |
|
||||
|------|-------------|
|
||||
| `scripts/vision_agent.py` | Vision Agent server (port 5052) |
|
||||
| `output_dev/vision_shots/` | Annotated detection screenshots |
|
||||
| `docs/ZERO_SHOT_DETECTION_RESEARCH.md` | Full model research report |
|
||||
105
docs_v1.0/DESIGN/VISUALIZATION_TOOL_CHOICES_V1.0.0.md
Normal file
@@ -0,0 +1,105 @@
|
||||
# Visualization Tool Choices v1.0.0

A record of the visualization tools chosen for the Momentry front end.

## SVG (built-in)

| Item | Details |
|------|------|
| Purpose | Trace timelines, swimlane charts, bar charts, matrices |
| License | Built into the browser; no licensing concerns |
| Used by | V1 TraceThumbnailTimeline, V2 IdentitySwimlane, V3 DurationHistogram, V4 SimilarityMatrix |
| Pros | Zero dependencies, crisp vector output, interactive |
| Cons | Performance degrades with large numbers of nodes |
|
||||
|
||||
## Three.js

| Item | Details |
|------|------|
| Purpose | 3D face mesh, 3D space-time cube |
| License | **MIT**: commercial use allowed; the copyright notice must be retained |
| Used by | Face3DViewer (MediaPipe 468 landmarks), V5 3D Space-Time Cube |
| npm | `three` + `@types/three` |
| License file | `node_modules/three/LICENSE` (MIT) |
| Bundle | ~120 KB gzipped |
| Pros | Complete WebGL wrapper, OrbitControls, large community |
| Cons | Resources must be disposed manually to avoid memory leaks |
|
||||
|
||||
## MediaPipe Face Mesh

| Item | Details |
|------|------|
| Purpose | Detects 468 3D face landmarks |
| License | **Apache 2.0**: commercial use allowed |
| Used by | Face3DViewer |
| Deployment | `scripts/face_landmarks_server.py` (port 11437) |
| Input | Cropped face JPEG |
| Output | 478 (x, y, z) 3D coordinates (the 468 face-mesh landmarks plus 10 iris landmarks) |
| Pros | Lightweight, real-time, cross-platform |
| Cons | Frontal faces only; no texture |
|
||||
|
||||
## Three.js Face3DViewer Memory Management
|
||||
|
||||
```typescript
|
||||
// Correct dispose pattern
|
||||
function disposeScene() {
|
||||
cancelAnimationFrame(animId)
|
||||
for (const obj of objects) {
|
||||
scene?.remove(obj)
|
||||
if (obj instanceof THREE.Mesh) {
|
||||
obj.geometry?.dispose()
|
||||
if (Array.isArray(obj.material)) obj.material.forEach(m => m.dispose())
|
||||
else obj.material?.dispose()
|
||||
}
|
||||
if (obj instanceof THREE.Points) {
|
||||
obj.geometry?.dispose()
|
||||
if (obj.material) obj.material.dispose()
|
||||
}
|
||||
}
|
||||
objects = []
|
||||
controls?.dispose()
|
||||
controls = null
|
||||
if (renderer) { renderer.dispose(); renderer = null }
|
||||
scene = null; camera = null
|
||||
}
|
||||
```
|
||||
|
||||
## Tool Comparison

| Visualization | Tool | License | Bundle | Status |
|--------|------|:----:|:-----:|:----:|
| V0 Trace Grid | Vue + Tailwind | - | 0 KB | ✅ |
| V1 Thumbnail Timeline | SVG | - | 0 KB | ✅ |
| V2 Identity Swimlane | SVG | - | 0 KB | ✅ |
| V3 Duration Histogram | SVG | - | 0 KB | ✅ |
| V4 Similarity Matrix | SVG | - | 0 KB | ✅ |
| 3D Face Mesh | Three.js | MIT | ~120 KB | ✅ |
| V5 3D Space-Time Cube | Three.js | MIT | ~120 KB | 🔜 |
| Heatmap (Canvas) | Canvas 2D | - | 0 KB | 🔜 |
| Trace Video | ffmpeg | LGPL/GPL (build-dependent) | Separate process | ✅ |
| **Document rendering** | | | | |
| API docs | **Markdown** | - | 0 KB | ✅ |
| API diagrams | **Mermaid** (flowchart, sequence, ER, mindmap) | MIT | ~50 KB (VS Code extension) | ✅ |
| CLI reading | **glow** (terminal MD renderer) | MIT | Standalone binary | ✅ |
|
||||
|
||||
## Markdown

| Item | Details |
|------|------|
| Purpose | All API docs, design specs, and test reports |
| License | Plain-text format; no licensing concerns |
| Tools | VS Code built-in preview, `glow` CLI |
| Pros | Version-control friendly (readable diffs), plain text, cross-platform |
| Cons | No dynamic interactivity |
|
||||
|
||||
## Mermaid

| Item | Details |
|------|------|
| Purpose | API flow diagrams (sequence), architecture diagrams (flowchart), data models (ER), endpoint overview (mindmap) |
| License | **MIT**: commercial use allowed |
| VS Code extension | `Markdown Preview Mermaid Support` |
| Supported diagrams | flowchart, sequence, class, state, ER, mindmap, pie, gantt |
| File | `API_USAGE_GUIDE_V1.0.0.md` (contains 6 Mermaid diagrams) |
| Pros | Embedded in Markdown, version-control friendly, no screenshots needed |
| Cons | Requires extension support outside VS Code/GitHub |
|
||||
114
docs_v1.0/DESIGN/VOICE_TECH_CHOICES_V1.0.0.md
Normal file
@@ -0,0 +1,114 @@
|
||||
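# Voice Interaction Technology Choices v1.0.0

A record of the voice technologies chosen for the Momentry Demo Runner.

## Speech Output (TTS)

### macOS `say` (adopted)

| Item | Details |
|------|------|
| Purpose | Reads the demo narration aloud |
| License | Built into macOS; no licensing concerns |
| Languages | 40+ languages, including Chinese (Meijia), English (Samantha), and Japanese (Kyoko) |
| Invocation | `subprocess.Popen(["say", "-v", "Meijia", "文字"])` |
| Pros | No installation, no dependencies, low latency, multilingual |
| Cons | macOS only; speech rate is adjustable only coarsely, via the `-r` words-per-minute option |

**Conclusion**: `say` is the best-fit TTS option for Momentry: built into macOS, free, and with full multilingual support.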
|
||||
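A small wrapper keeps the `say` invocation in one place. The sketch below is illustrative: `build_say_command` and `speak` are hypothetical names, and the platform check is an added safeguard rather than something this document prescribes.

```python
import subprocess
import sys


def build_say_command(text: str, voice: str = "Meijia") -> list[str]:
    """Argument list for the macOS `say` command."""
    return ["say", "-v", voice, text]


def speak(text: str, voice: str = "Meijia"):
    """Start speaking in the background; returns None where `say` is unavailable."""
    if sys.platform != "darwin":
        return None  # `say` exists only on macOS
    return subprocess.Popen(build_say_command(text, voice))
```

Using `Popen` rather than a blocking call lets the demo continue while the narration plays.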
|
||||
---
|
||||
|
||||
## Speech Input (Speech-to-Command)

### Option comparison

| Option | Local/Cloud | Languages | Model size | Latency | Accuracy | License |
|------|:---------:|:----:|:--------:|:----:|:------:|:----:|
| **Vosk** (integrated) | ✅ **Local** | Chinese + English | 42MB | Real-time | Medium-high | Apache 2.0 |
| macOS NSSpeechRecognizer | ✅ Local | Multilingual | Built into the OS | Real-time | Medium | Built into macOS |
| Google Speech Recognition | ☁️ Cloud | 120+ languages | - | ~1s | High | Free (quota-limited) |
| Whisper (tiny) | ✅ Local | 100+ languages | ~150MB | ~2s | High | MIT |
| Porcupine | ✅ Local | Keywords | ~2MB | Real-time | High (keywords only) | Apache 2.0 |
|
||||
|
||||
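### Vosk (adopted as the local option)

| Item | Details |
|------|------|
| Model | `vosk-model-small-cn-0.22` (42MB, Chinese) |
| Languages | Chinese and English (each requires downloading its own model) |
| Invocation | Called directly through the Python `vosk` package |
| Pros | Fully local, real-time, handles both Chinese and English, small model |
| Cons | Requires a one-time model download; accuracy drops in noisy environments |
| Scope | Detects command keywords only: next/stop/repeat/goto, etc. |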
|
||||
|
||||
### Google Speech Recognition (fallback)

| Item | Details |
|------|------|
| Purpose | Automatic fallback when the Vosk model is not installed |
| Invocation | Python `SpeechRecognition` + Google API |
| Pros | No model download, high accuracy, multilingual |
| Cons | **Requires network access**, ~1s latency per request, usage quota limits |
|
||||
|
||||
### Integration strategy

```
Start with --voice-control
    │
    ├── Vosk model present? → use Vosk (local, offline)
    │
    └── Vosk model missing? → use Google (requires network)
            │
            └── That fails too? → show "voice unavailable"
```
|
||||
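The fallback order can be expressed as a small selection function. This is a sketch, not the Demo Runner's code: `choose_backend` and the `network_ok` flag are hypothetical, and the default model directory is the path this document lists for the Vosk Chinese model.

```python
from pathlib import Path

# Model location as listed in this document for the Vosk Chinese model.
VOSK_MODEL_DIR = Path.home() / ".cache" / "vosk" / "vosk-model-small-cn-0.22"


def choose_backend(model_dir: Path = VOSK_MODEL_DIR, network_ok: bool = True) -> str:
    """Return "vosk", "google", or "unavailable", following the fallback order."""
    if model_dir.is_dir():
        return "vosk"         # local and offline
    if network_ok:
        return "google"       # cloud fallback, needs network
    return "unavailable"      # caller shows the "voice unavailable" message
```

Keeping the decision separate from the recognizers makes the fallback easy to test without a microphone or a model download.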
|
||||
---
|
||||
|
||||
## Demo Runner Integration

### Command set (Chinese and English)

| Chinese command | English | Action |
|:----:|:-------:|------|
| 下一個 / 繼續 | next / continue | Advance to the next step |
| 停止 | stop / quit | End the current demo |
| 重複 | repeat / again | Repeat the current narration |
| 跳到第 N 步 | go to N / step N | Jump to the specified step |
|
||||
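One way to map recognized text onto this command set is a phrase table plus a pattern for the step number. The sketch below is illustrative only: the canonical command names and `parse_command` are assumptions, and it handles digits but not spelled-out or Chinese numerals.

```python
import re

# Phrases taken from the command table; the canonical names are assumptions.
_PHRASES = {
    "next": ("下一個", "繼續", "next", "continue"),
    "stop": ("停止", "stop", "quit"),
    "repeat": ("重複", "repeat", "again"),
}
_GOTO = re.compile(r"(?:跳到第\s*(\d+)\s*步|go to\s+(\d+)|step\s+(\d+))")


def parse_command(utterance: str):
    """Map recognized text to (command, step); command is None if nothing matches."""
    text = utterance.strip().lower()
    match = _GOTO.search(text)
    if match:
        # Exactly one of the three groups captured the step number.
        step = next(g for g in match.groups() if g)
        return "goto", int(step)
    for name, phrases in _PHRASES.items():
        if any(phrase in text for phrase in phrases):
            return name, None
    return None, None
```

The goto pattern is checked first so that an utterance such as "step 3" is not mistaken for a plain keyword.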
|
||||
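### Code structure

```python
import queue

command_queue = queue.Queue()

# Background thread: listens for speech and enqueues recognized commands
def voice_command_listener(lang):
    # 1. Try Vosk (local)
    # 2. Fall back to Google Speech Recognition (cloud)
    # 3. Put the recognized command on command_queue
    ...

def check_voice_command():
    """Return the next queued command, or None if the queue is empty."""
    try:
        return command_queue.get_nowait()
    except queue.Empty:
        return None

# Main loop polls the queue
def main():
    demo_running = True
    while demo_running:
        cmd = check_voice_command()
        if cmd == "next":
            pass  # advance to the next step
        elif cmd == "stop":
            demo_running = False  # end the demo
        elif cmd and cmd.startswith("goto "):
            step = int(cmd.split()[1])  # jump to step N
```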
|
||||
|
||||
### How to start

```bash
# Local speech recognition (Vosk, no network needed)
python3 scripts/demo_runner.py --voice zh_TW --voice-control

# Fallback: if the Vosk model is not installed, Google is used automatically (requires network)
```
|
||||
|
||||
---
|
||||
|
||||
## Related Files

| File | Description |
|------|------|
| `scripts/demo_runner.py` | Speech output + input integration |
| `~/.cache/vosk/vosk-model-small-cn-0.22/` | Vosk Chinese model (42MB) |
| `docs_v1.0/REFERENCE/DEMO_RUNNER_V1.0.0.md` | Demo Runner usage guide |
|
||||
36
docs_v1.0/DESIGN/VOICE_TEST_RESULTS_V1.0.0.md
Normal file
@@ -0,0 +1,36 @@
|
||||
# Speech Recognition Test Log v1.0.0

## Environment

- **Machine**: Mac Mini M4
- **Input device**: Display Audio (HDMI loopback)
- **Model**: Vosk small-en-us (40MB)

## Test Results

| Test | Setup | Max Level | Mean Level | Vosk result |
|------|------|:---------:|:----------:|:----------:|
| Raw audio, 48kHz | pyaudio direct | 3510 | 654 | ❌ Empty |
| Denoised, 16kHz | highpass200+lowpass4000+afftdn | 1224 | 110 | ❌ Empty |
| 3x gain | numpy boost | ~10K | ~1800 | ❌ Empty |
| ffmpeg recording | avfoundation :0 | 3698 | 636 | ❌ Empty |
|
||||
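The level columns can be reproduced with a few lines of standard-library Python. This sketch assumes that "Max Level" and "Mean Level" are the maximum and mean absolute amplitude of 16-bit PCM samples; the test log does not state how they were computed, and the original measurements used pyaudio and numpy rather than this code.

```python
import array


def levels(pcm: bytes) -> tuple[int, float]:
    """Return (max, mean) absolute amplitude of 16-bit mono PCM in native byte order."""
    samples = array.array("h")
    samples.frombytes(pcm)
    magnitudes = [abs(s) for s in samples]
    return max(magnitudes), sum(magnitudes) / len(magnitudes)


def apply_gain(pcm: bytes, gain: float) -> bytes:
    """Scale samples, clipping to the int16 range so loud peaks do not wrap around."""
    samples = array.array("h")
    samples.frombytes(pcm)
    boosted = array.array(
        "h", (max(-32768, min(32767, int(s * gain))) for s in samples)
    )
    return boosted.tobytes()
```

Gain scales signal and noise alike, so the signal-to-noise ratio does not improve; that is consistent with the 3x gain run still producing an empty result.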
|
||||
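## Findings

1. **Display Audio does receive an audio signal** (mean ~600, max ~3500).
2. **Background noise is high** (a mean of 600 is far above the 10-50 typical of a normal microphone).
3. After denoising, the noise floor drops to a mean of 110, but recognition still fails.
4. The Vosk small model does not tolerate this level of noise.

## Suspected Cause

Display Audio is an **HDMI audio return channel**, so what it receives may be:
- Background noise from the display's built-in speakers
- Or electrical noise generated by the display itself
- It is unclear whether the display's microphone is actually routed back over HDMI

## To Try

- [ ] Whisper (local, high noise tolerance)
- [ ] Direct test with a USB microphone
- [ ] macOS built-in NSSpeechRecognizer (via PyObjC)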
|
||||
190
docs_v1.0/DESIGN/ZERO_SHOT_DETECTION_RESEARCH.md
Normal file
@@ -0,0 +1,190 @@
|
||||
# Zero-Shot Object Detection Model Research Report
|
||||
|
||||
**Date:** 2026-05-10
|
||||
**Goal:** Evaluate models for detecting arbitrary objects in Charade (1963)
|
||||
**System:** M5 MacBook Pro (Apple Silicon MPS, 48GB)
|
||||
|
||||
---
|
||||
|
||||
## Tested Models
|
||||
|
||||
| Model | Params | Size | Resolution | Type | License |
|
||||
|-------|--------|------|------------|------|---------|
|
||||
| YOLOv8n fine-tune (gun) | 3.2M | 6MB | 640px | Closed-set (4 classes) | AGPL-3.0 |
|
||||
| OWL-ViT base | 109M | 586MB | 384px | Zero-shot | Apache 2.0 |
|
||||
| **Grounding DINO Base** | **232M** | **891MB** | **384px** | **Zero-shot** | **Apache 2.0** |
|
||||
| Grounding DINO Large | 232M | 895MB | 384px | Zero-shot | Apache 2.0 |
|
||||
| Florence-2 Base | 231M | ~3GB | 384px | Zero-shot (generative) | MIT |
|
||||
| Florence-2 Large | 776M | ~6GB | 384px | Zero-shot (generative) | MIT |
|
||||
| PaliGemma 3B mix-224 | 2,923M | ~3GB | 224px | Zero-shot (generative) | Gemma license |
|
||||
| PaliGemma 3B mix-448 | 2,923M | ~6GB | 448px | Zero-shot (generative) | Gemma license |
|
||||
|
||||
## Detection Performance on Charade
|
||||
|
||||
### Large Objects (gun)
|
||||
|
||||
| Model | 8 timepoints | Best confidence | Runtime |
|
||||
|-------|-------------|----------------|---------|
|
||||
| YOLOv8n fine-tune | ❌ 0/5 (all FP) | 0.45 (stamp→pistol) | 0.03s |
|
||||
| OWL-ViT | ❌ 2/8 | 0.054 | 3.4s |
|
||||
| **Grounding DINO Base** | **✅ 8/8** | **0.499** | **0.33s** |
|
||||
| PaliGemma 3B mix-224 | ✅ 3/8 (gun), 3/8 overall | n/a (no score output) | 0.5-3s |
|
||||
|
||||
### Small Objects (stamp, passport, magnifying glass)
|
||||
|
||||
| Model | Stamp | Passport | Magnifying glass |
|
||||
|-------|-------|----------|-----------------|
|
||||
| Grounding DINO Base | ❌ FP (~0.3) | ❌ FP (~0.4) | ❌ FP (~0.3-0.5) |
|
||||
| PaliGemma 3B mix-224 | ❌ no det | ❌ no det | not tested |
|
||||
| PaliGemma 3B mix-448 | ❌ (not tested) | ❌ (not tested) | ❌ (not tested) |
|
||||
|
||||
**All models fail on objects smaller than ~50px at native 1920x1080 resolution.**
|
||||
|
||||
### Other Objects
|
||||
|
||||
| Object | YOLO COCO | Grounding DINO | Notes |
|
||||
|--------|-----------|----------------|-------|
|
||||
| knife | ✅ 368 frames | ✅ 84 hits | Small but detectable |
|
||||
| cup | ✅ | ✅ 13 hits | Moderate size |
|
||||
| bottle | ✅ | ✅ 12 hits | Moderate size |
|
||||
| cell phone | ✅ | ✅ 5 hits | Hand-held |
|
||||
| book | ✅ | ✅ 3 hits | Hand-held |
|
||||
| car | ✅ | ✅ 9 hits | Large object |
|
||||
| tie | ✅ | ✅ 139 hits | On-person (worn, not held) |
|
||||
|
||||
## Detailed Model Analysis
|
||||
|
||||
### Grounding DINO Base (Recommended)
|
||||
|
||||
**Scores:** Detection confidence 0.1-0.5 (typical for zero-shot)
|
||||
|
||||
**Timing per frame (MPS):**
|
||||
| Component | Time | % of total |
|
||||
|-----------|------|------------|
|
||||
| Processor (text+image) | 17ms | 5% |
|
||||
| Model inference | 310ms | 93% |
|
||||
| Post-processing | 5ms | 2% |
|
||||
| **Total** | **331ms** | **100%** |
|
||||
|
||||
**Multi-prompt batching:** 8 prompts in 335ms (42ms/prompt vs 309ms single)
|
||||
|
||||
**Memory:** ~1GB (MPS)
|
||||
|
||||
**License:** Apache 2.0 — fully commercial, no restrictions
|
||||
|
||||
### Grounding DINO Large
|
||||
|
||||
**Result:** Identical weights to Base. The GitHub "7-dataset" checkpoint is the same 3-dataset version as HuggingFace. The actual 7-dataset version (56.7 AP) was never released.
|
||||
|
||||
**Verdict: Do not use.** Base is identical and simpler.
|
||||
|
||||
### OWL-ViT
|
||||
|
||||
**Result:** Almost useless for this task. Max confidence is 0.054, and it detects guns at only 2 of 8 timepoints.
|
||||
|
||||
**Verdict: Do not use.**
|
||||
|
||||
### Florence-2
|
||||
|
||||
**Issue:** `prepare_inputs_for_generation` bug in current transformers version. Cannot run inference without patching model code.
|
||||
|
||||
**Task format:** Uses task tokens (`<OD>`) instead of arbitrary text prompts. Cannot do "detect gun" directly — uses generic object detection.
|
||||
|
||||
**Verdict: Cannot use in current environment.**
|
||||
|
||||
### PaliGemma
|
||||
|
||||
**Result:** Works for gun detection (3/8) but misses small objects entirely.
|
||||
|
||||
**Key limitation:** No confidence score output (generative model). Either outputs bbox or nothing.
|
||||
|
||||
**Issues:**
|
||||
- 224px variant: Too low resolution for small objects
|
||||
- 448px variant: 6GB download, suspected better for detail but untested
|
||||
- Gemma license may restrict commercial use vs Apache 2.0
|
||||
|
||||
**Verdict: Inferior to Grounding DINO for this use case.**
|
||||
|
||||
### YOLOv8n Fine-tune (Gun Detector)
|
||||
|
||||
| Item | Value |
|------|-------|
| Dataset | 905 images (Roboflow CC BY 4.0) |
| Classes | grenade, knife, pistol, rifle |
| Validation mAP50 | 0.813 |
| Charade FP rate | **100%** (all false positives) |
|
||||
|
||||
**Root cause:** Training images are close-up gun photos; Charade has distant/partial guns. Distribution mismatch makes this model unusable.
|
||||
|
||||
**Verdict: Requires completely new training dataset.**
|
||||
|
||||
## Root Cause Analysis: Small Object Failure
|
||||
|
||||
### Grounding DINO's Resolution Limit
|
||||
|
||||
Grounding DINO processes images at **384×384px**. At this resolution:
|
||||
|
||||
```
|
||||
1920px frame → 384px input (5:1 reduction)
|
||||
A 50×50px object → 10×10px at 384px → only ~1 patch token
|
||||
```
|
||||
|
||||
For comparison:
|
||||
- **Gun** at 200×200px (close-up) → 40×40px → still detectable
|
||||
- **Stamp** at 30×30px → 6×6px → lost in downsampling
|
||||
- **Passport** at 80×120px → 16×24px → barely visible
|
||||
- **Magnifying glass** at 40×40px → 8×8px → lost
|
||||
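The figures above follow from a single scale factor. The sketch below reproduces them, assuming (as the text does) that object size shrinks by the same 5:1 ratio as the 1920px frame width; `downsampled_size` is a hypothetical helper.

```python
def downsampled_size(width_px: int, height_px: int,
                     frame_width: int = 1920, model_input: int = 384) -> tuple[int, int]:
    """Scale an object's size by the ratio applied to the frame width."""
    scale = model_input / frame_width  # 384 / 1920 = 0.2
    return round(width_px * scale), round(height_px * scale)


objects = {"gun": (200, 200), "stamp": (30, 30),
           "passport": (80, 120), "magnifying glass": (40, 40)}
for name, (w, h) in objects.items():
    print(name, downsampled_size(w, h))
```

Changing `model_input` to 448 shows how much each object would gain from a higher-resolution model before any cropping is applied.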
|
||||
### Potential Solutions
|
||||
|
||||
| Solution | Pros | Cons | Feasibility |
|
||||
|----------|------|------|-------------|
|
||||
| **Crop + zoom** on person region | Leverages existing YOLO person detections | Requires two-stage pipeline | ✅ High |
|
||||
| **PaliGemma 448px** | 448px native input (about 36% more pixels than a 384px input) | 6GB, requires download | ⚠️ Medium |
|
||||
| **YOLO fine-tune on stamps** | Fast inference (6MB) | Need 200+ training images | ⚠️ Medium |
|
||||
| **Grounding DINO + tiling** | Split image into tiles, run per tile | 4-9x slower | ⚠️ Medium |
|
||||
| **Florence-2 448px** | Higher resolution | Bug in transformers | ❌ Low |
|
||||
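The tiling option can be sketched as a function that splits the frame into a grid of boxes, each of which would then be passed to the detector separately. `tile_boxes` and its `overlap` parameter are illustrative assumptions; no tiling code exists in the tested setup.

```python
def tile_boxes(width: int, height: int, cols: int, rows: int, overlap: int = 0):
    """Return (x1, y1, x2, y2) tiles covering the frame, optionally overlapping."""
    tile_w, tile_h = width // cols, height // rows
    boxes = []
    for r in range(rows):
        for c in range(cols):
            x1 = max(0, c * tile_w - overlap)
            y1 = max(0, r * tile_h - overlap)
            # The last column/row extends to the frame edge to absorb rounding.
            x2 = width if c == cols - 1 else min(width, (c + 1) * tile_w + overlap)
            y2 = height if r == rows - 1 else min(height, (r + 1) * tile_h + overlap)
            boxes.append((x1, y1, x2, y2))
    return boxes
```

A 2×2 grid yields 4 tiles and a 3×3 grid yields 9, which is where the 4-9x slowdown in the table comes from. A non-zero overlap helps avoid cutting an object in half at a tile boundary.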
|
||||
## Hand-Held Object Detection Feasibility
|
||||
|
||||
### Available Data Sources
|
||||
|
||||
| Source | Type | Coverage | Usefulness |
|
||||
|--------|------|----------|------------|
|
||||
| YOLO `pre_chunks` | Object detections | 169,625 frames | ✅ Every frame |
|
||||
| Pose `pre_chunks` | Body keypoints (left_wrist, right_wrist) | 4,269 frames | ✅ Hand location |
|
||||
| Grounding DINO | Zero-shot classification | On-demand | ✅ Object ID |
|
||||
| ASR dialogue | Text mentions | 4,188 chunks | ✅ "holding a gun" |
|
||||
|
||||
### Approach: YOLO + Pose + Grounding DINO
|
||||
|
||||
```
|
||||
Frame
|
||||
→ YOLO: Find person + objects
|
||||
→ Pose: Find wrist keypoints
|
||||
→ Check: Object bbox overlaps with hand region (wrist ±100px)
|
||||
→ Grounding DINO: Verify object class
|
||||
```
|
||||
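The overlap check in the pipeline above can be sketched as follows. The helper names are hypothetical, and the hand region is taken to be a square of ±100px around each wrist keypoint, as stated in the diagram.

```python
def hand_region(wrist_xy, margin=100):
    """Square region around a wrist keypoint (wrist ± margin pixels)."""
    x, y = wrist_xy
    return (x - margin, y - margin, x + margin, y + margin)


def boxes_overlap(a, b):
    """True if two (x1, y1, x2, y2) boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def held_candidates(objects, wrists, margin=100):
    """Objects whose bbox overlaps any hand region.

    These are only candidates: the pipeline still verifies them with Grounding DINO.
    """
    regions = [hand_region(w, margin) for w in wrists]
    return [o for o in objects if any(boxes_overlap(o["bbox"], r) for r in regions)]
```

As the limitations below note, proximity alone does not prove an object is held, which is why this step only narrows the candidates.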
|
||||
### Known Limitations
|
||||
|
||||
1. **Pose frame alignment:** Pose data (4,269 frames) doesn't always overlap with YOLO data at the same frame
|
||||
2. **Object proximity ≠ holding:** YOLO objects near hands may be background, not held
|
||||
3. **Small object blind spot:** Stamps, magnifying glasses at hand positions are too small to detect
|
||||
|
||||
## Recommendations
|
||||
|
||||
| Priority | Action | Rationale |
|
||||
|----------|--------|-----------|
|
||||
| 1 | Use Grounding DINO Base (Apache 2.0) | Best zero-shot detector, proven on guns, clean license |
|
||||
| 2 | Two-stage pipeline for small objects | YOLO person box → crop → upscale → Grounding DINO |
|
||||
| 3 | Pose wrist alignment for hand-held confirmation | Reduce false positives by requiring hand proximity |
|
||||
| 4 | Replace Grounding DINO "Large" ref with Base | Large is identical weights, no benefit |
|
||||
|
||||
## Appendix: License Summary
|
||||
|
||||
| Model | License | Commercial Use | Requires |
|
||||
|-------|---------|---------------|----------|
|
||||
| Grounding DINO | **Apache 2.0** | ✅ Yes | NOTICE file |
|
||||
| OWL-ViT | Apache 2.0 | ✅ Yes | NOTICE file |
|
||||
| PaliGemma | Gemma license | ⚠️ Needs review | Google ToS |
|
||||
| Florence-2 | MIT | ✅ Yes | Copyright notice |
|
||||
| YOLOv8 | AGPL-3.0 | ⚠️ Needs license | Open source or paid |
|
||||
49
docs_v1.0/DESIGN/ZERO_SHOT_GUN_TEST_PLAN.md
Normal file
@@ -0,0 +1,49 @@
|
||||
# Zero-Shot Gun Detection Test Plan
|
||||
|
||||
**Date:** 2026-05-10
|
||||
**Goal:** Compare OWL-ViT vs Grounding DINO for detecting guns in Charade (1963)
|
||||
|
||||
## Models
|
||||
|
||||
| Model | Source | Type |
|
||||
|-------|--------|------|
|
||||
| `google/owlvit-base-patch32` | HuggingFace | Zero-shot object detection |
|
||||
| `IDEA-Research/grounding-dino-base` | HuggingFace | Zero-shot object detection |
|
||||
|
||||
## Test Timepoints (8)
|
||||
|
||||
| Time | Label | Source |
|
||||
|------|-------|--------|
|
||||
| 2646s (44:06) | 2646s | ASR: "He has a gun" |
|
||||
| 3188s (53:08) | 3188s | Original detection |
|
||||
| 3697s (61:37) | 3697s | ASR: "Where's your gun" |
|
||||
| 5341s (89:01) | 5341s | ASR: "He already killed 3 men" |
|
||||
| 5461s (91:01) | 5461s | Original detection |
|
||||
| 6309s (1:45:09) | 6309s | Original detection |
|
||||
| 6377s (1:46:17) | 6377s | Original detection |
|
||||
| 6479s (1:47:59) | 6479s | Original detection |
|
||||
|
||||
## Prompts
|
||||
|
||||
`"gun"`, `"pistol"`, `"rifle"`, `"weapon"`
|
||||
|
||||
## Matrix
|
||||
|
||||
8 timepoints × 2 models × 4 prompts = 64 inferences
|
||||
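The matrix can be enumerated directly, which is also a convenient way to drive the test loop. The sketch uses the timepoints, model IDs, and prompts listed in this plan; the variable names are illustrative and do not come from `scripts/zero_shot_gun_test.py`.

```python
from itertools import product

TIMEPOINTS = [2646, 3188, 3697, 5341, 5461, 6309, 6377, 6479]
MODELS = ["google/owlvit-base-patch32", "IDEA-Research/grounding-dino-base"]
PROMPTS = ["gun", "pistol", "rifle", "weapon"]

# One entry per (timepoint, model, prompt) inference.
runs = list(product(TIMEPOINTS, MODELS, PROMPTS))
print(len(runs))  # 64
```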

## Output

| File | Description |
|------|-------------|
| `output_dev/zero_shot_test/*.jpg` | Annotated screenshots |
| `output_dev/zero_shot_test/zero_shot_results.json` | Detection results |
| `scripts/zero_shot_gun_test.py` | Test script |

## Success Criteria

| Level | Criteria |
|-------|----------|
| Excellent | Finds real gun with confidence > 0.5 |
| Good | Finds real gun with confidence ≤ 0.5 |
| Limited | Finds guns but many false positives |
| Failed | All false positives |
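
The success levels can be expressed as a small rating function. This is a sketch only: the plan does not define "many false positives", so the `max_fp_ratio` cutoff of 0.5 and the precedence of "Limited" over "Excellent" are assumptions, and `rate_run` is a hypothetical name.

```python
def rate_run(true_positives: int, false_positives: int,
             best_true_confidence: float, max_fp_ratio: float = 0.5) -> str:
    """Map a model's results onto the plan's four success levels.

    max_fp_ratio is an assumed cutoff for "many false positives".
    """
    if true_positives == 0:
        # No real gun found: everything reported (if anything) is a false positive.
        return "Failed"
    fp_ratio = false_positives / (true_positives + false_positives)
    if fp_ratio > max_fp_ratio:
        return "Limited"
    # A confidence of exactly 0.5 falls into "Good" here.
    return "Excellent" if best_true_confidence > 0.5 else "Good"


if __name__ == "__main__":
    print(rate_run(true_positives=8, false_positives=0, best_true_confidence=0.186))
```

Counting true and false positives requires a human check of each annotated screenshot, since the test has no ground-truth boxes.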
67
docs_v1.0/DESIGN/ZERO_SHOT_GUN_TEST_REPORT.md
Normal file
@@ -0,0 +1,67 @@
# Zero-Shot Gun Detection Test Report

**Date:** 2026-05-10
**Goal:** Compare OWL-ViT vs Grounding DINO for detecting guns in Charade (1963)

## Test Setup

| Model | Prompts | Timepoints | Total inferences |
|-------|---------|------------|-----------------|
| `google/owlvit-base-patch32` | gun, pistol, rifle, weapon | 8 | 32 |
| `IDEA-Research/grounding-dino-base` | gun, pistol, rifle, weapon | 8 | 32 |

## Results

| Model | Timepoints with detections | Total detections | Best confidence | Runtime |
|-------|---------------------------|-----------------|-----------------|---------|
| OWL-ViT | 2/8 | 2 | 0.054 | 1.5s |
| **Grounding DINO** | **8/8** | **109** | **0.186** | 11.5s |

## Grounding DINO: Per Timepoint

| Time | Source | Best prompt | Best confidence | Found? |
|------|--------|-------------|-----------------|--------|
| 2646s (0:44:06) | ASR: "He has a gun" | gun | 0.082 | ✅ |
| **3188s (0:53:08)** | **Original pistol** | **gun** | **0.149** | **✅** |
| 3697s (1:01:37) | ASR: "Where's your gun" | gun | 0.159 | ✅ |
| 5341s (1:29:01) | ASR: "He already killed 3 men" | gun | 0.074 | ✅ |
| **5461s (1:31:01)** | **Original pistol** | **gun** | **0.186** | **✅** |
| **6309s (1:45:09)** | **Original pistol** | **gun** | **0.077** | **✅** |
| **6377s (1:46:17)** | **Original gun** | **weapon** | **0.118** | **✅** |
| **6479s (1:47:59)** | **Original pistol** | **gun** | **0.060** | **✅** |

### Original 5 Pistol Frames

| Frame | OWL-ViT | Grounding DINO | Verdict |
|-------|---------|----------------|---------|
| 3188s | Not found | ✅ Found (0.149) | ✅ |
| 5461s | Not found | ✅ Found (0.186) | ✅ |
| 6309s | Not found | ✅ Found (0.077) | ✅ |
| 6377s | Not found | ✅ Found (0.118) | ✅ |
| 6479s | Not found | ✅ Found (0.060) | ✅ |

## Analysis

### OWL-ViT
- Almost completely failed: only 2 detections, with a best confidence of 0.054
- Not suitable for this task

### Grounding DINO
- **Found all 8 timepoints**, including all 5 original pistol frames
- Best prompt is `"gun"` at 7/8 timepoints; `"weapon"` scored highest only at 6377s
- Confidence range: 0.060 - 0.186 (typical for zero-shot detection)
- Higher confidence loosely tracks the user-confirmed (original) detections: their mean best confidence is 0.118, versus 0.105 for the 3 ASR-derived timepoints, though the two ranges overlap
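
The summary figures for Grounding DINO can be recomputed from the per-timepoint table. The sketch below copies the table rows by hand; the `"asr"` / `"original"` tags and the `summarize` helper are illustrative names, not fields of `zero_shot_results.json`.

```python
from collections import Counter
from statistics import mean

# (seconds, source kind, best prompt, best confidence) from the per-timepoint table.
ROWS = [
    (2646, "asr", "gun", 0.082),
    (3188, "original", "gun", 0.149),
    (3697, "asr", "gun", 0.159),
    (5341, "asr", "gun", 0.074),
    (5461, "original", "gun", 0.186),
    (6309, "original", "gun", 0.077),
    (6377, "original", "weapon", 0.118),
    (6479, "original", "gun", 0.060),
]


def summarize(rows):
    """Return prompt win counts, mean confidence per source kind, and (min, max)."""
    prompt_counts = Counter(prompt for _, _, prompt, _ in rows)
    kinds = sorted({kind for _, kind, _, _ in rows})
    mean_by_kind = {
        kind: round(mean(conf for _, k, _, conf in rows if k == kind), 3)
        for kind in kinds
    }
    confidences = [conf for _, _, _, conf in rows]
    return prompt_counts, mean_by_kind, (min(confidences), max(confidences))


PROMPT_COUNTS, MEAN_BY_KIND, CONF_RANGE = summarize(ROWS)

if __name__ == "__main__":
    print(PROMPT_COUNTS, MEAN_BY_KIND, CONF_RANGE)
```

With only eight timepoints, the difference between the two group means is small and should be read as indicative rather than statistically meaningful.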

### Key Finding
The 5 original pistol frames were produced by **Grounding DINO** (not YOLOv8n). The model was downloaded from HuggingFace at 15:43-15:44 on May 9, and the screenshots were generated at 15:49. This timeline indicates that OWL-ViT was tested first (and failed) and that Grounding DINO was tested afterwards (and succeeded).

## Integration

Grounding DINO has been integrated into `object_search_agent.py` as `--source zero_shot`:
```
python3 scripts/object_search_agent.py --keyword gun --source zero_shot
```

## Screenshots

All 64 annotated screenshots saved to `output_dev/zero_shot_test/*.jpg`
115
docs_v1.0/DESIGN/ZERO_SHOT_VS_FINETUNE_SELECTION.md
Normal file
@@ -0,0 +1,115 @@
# Zero-Shot vs Fine-Tune: Object Detection Model Selection Report

**Date:** 2026-05-10
**Goal:** Search Charade (1963) for non-COCO objects (guns, stamps, envelopes, etc.)
**System:** M5 MacBook Pro (Apple Silicon MPS)

## Motivation

The COCO-trained YOLOv8 covers only 80 classes, and none of them are the objects central to Charade, such as gun, stamp, or envelope. A method is needed that can search the film for arbitrary objects.

## Candidate Approaches

| Option | Method | Training data | Development cost |
|------|------|---------|---------|
| A. YOLOv8n fine-tune | Fine-tune on gun dataset | Requires collecting 500+ annotated images | High |
| B. OWL-ViT zero-shot | Vision-language pretraining | No training required | Low |
| C. Grounding DINO zero-shot | Vision-language pretraining | No training required | Low |

## Model Size and Performance

| Model | Disk | Parameters | Inference time (MPS) | Energy per frame | Model category |
|-------|------|------|---------------|---------|---------|
| YOLOv8n | **6MB** | **3.2M** | **0.03s** | **~0.5J** | Closed-set (80 classes) |
| OWL-ViT | 586MB | 109M | 3.4s | ~50J | Open-set (zero-shot) |
| **Grounding DINO** | **891MB** | **172M** | **4.3s** | **~65J** | **Open-set (zero-shot)** |

## Charade Test Results

| Model | Hits (of 8 timepoints, or of 32 timepoint × prompt runs) | 5 original pistol frames | Best confidence | Inference time | Model size |
|-------|-------------|-----------------|----------------|---------|---------|
| YOLOv8n COCO | ❌ N/A (no gun class) | N/A | N/A | 0.03s | 6MB |
| YOLOv8n fine-tune | 7/7 FP | ❌ All FP | 0.45 (stamp misdetected) | 0.03s | 6MB |
| OWL-ViT | 2/8 | ❌ 0/5 | 0.054 | 3.4s | 586MB |
| **Grounding DINO Base** | **31/32** | **✅ 5/5** | **0.672** | **11.6s** | **891MB** |
| **Grounding DINO Large** | **32/32** | **✅ 5/5** | **1.000** | **50.1s** | **895MB** |

### Base vs Large Comparison

| Metric | Base (3 datasets) | Large (7 datasets) |
|------|------------------|-------------------|
| Mean best confidence | 0.384 | **1.000** |
| Total detections | 333 | **28,800** |
| COCO zero-shot AP | 48.4 | **56.7** |
| Inference time (MPS) | 11.6s | 50.1s |
| Edge deployment | More feasible | More difficult |
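
Dividing the totals by the 32 Grounding DINO runs (8 timepoints × 4 prompts) gives a quick sanity check on the comparison. This is a sketch under stated assumptions: the value 900 for the model's default number of object queries per image comes from the published Grounding DINO configuration, not from this report, and the variable names are illustrative.

```python
INFERENCES_PER_MODEL = 8 * 4      # timepoints × prompts
ASSUMED_QUERIES_PER_IMAGE = 900   # assumption: default query count in Grounding DINO

TOTAL_DETECTIONS = {"base": 333, "large": 28_800}
RUNTIME_S = {"base": 11.6, "large": 50.1}


def detections_per_inference(total: int, inferences: int = INFERENCES_PER_MODEL) -> float:
    """Average number of detections returned per inference."""
    return total / inferences


PER_INFERENCE = {name: detections_per_inference(n) for name, n in TOTAL_DETECTIONS.items()}
SPEED_RATIO = RUNTIME_S["large"] / RUNTIME_S["base"]

# True when every assumed query slot was reported as a detection.
LARGE_SATURATED = PER_INFERENCE["large"] == ASSUMED_QUERIES_PER_IMAGE

if __name__ == "__main__":
    print(PER_INFERENCE, round(SPEED_RATIO, 1), LARGE_SATURATED)
```

Base averages about 10 detections per run, while Large averages exactly 900. If the 900-query assumption holds, the Large figures would be consistent with every query slot being returned at confidence 1.000, so they may be worth re-verifying before they are used to rank the two models.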
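
### Conclusion

**Performance-first choice: Grounding DINO Large.** All 8 timepoints reach confidence 1.000 with no missed detections. It gives up inference speed in exchange for detection quality well beyond the Base version.

**Edge deployment choice: Grounding DINO Base.** It is similar in size but about 4.3x faster at inference, which suits resource-constrained devices.

### Key Conclusions

1. **YOLOv8n fine-tune failed completely.** The 905 Roboflow close-up images do not match the distribution of Charade's medium and long shots, so the trained model did not generalize.
2. **OWL-ViT was almost ineffective.** It could not reliably recognize small objects in the film.
3. **Grounding DINO succeeded.** It recovered 5/5 pistol frames and also hit every timepoint where the ASR transcript mentions a gun.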
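
## Grounding DINO Pros and Cons

### Pros
- **Zero-shot search**: any object outside COCO can be searched directly with a text prompt
- **Extensibility**: the same model can search for gun, stamp, envelope, knife, hat, and other arbitrary objects
- **No training required**: no need to collect annotated data or fine-tune
- **Apache 2.0 License**: commercial use is permitted

### Cons
- **Large footprint**: 891MB (vs 6MB for YOLOv8n)
- **Slow inference**: 4.3s/frame (vs 0.03s for YOLOv8n)
- **Not suited to real-time use**: live detection on an edge device is not feasible; the model is only suitable for offline scanning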

## Edge AI Deployment Considerations

| Item | YOLOv8n | Grounding DINO |
|---------|---------|---------------|
| Model size | 6MB ✅ | 891MB ⚠️ |
| RAM requirement | ~100MB | ~2.5GB |
| Inference time | 30ms | 4.3s |
| Energy per frame | ~0.5J | ~65J |
| Searchable classes | 80 (fixed) | Unlimited (text prompt) |
| Battery impact (1000 frames) | ~500J | ~65,000J |
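
The battery-impact row follows from the per-frame energy, and converting to watt-hours makes it easier to compare against a battery rating. A minimal sketch using the approximate per-frame figures from the table; `batch_energy` is a hypothetical helper.

```python
JOULES_PER_WATT_HOUR = 3600.0

# Approximate per-frame energy taken from the table (joules).
ENERGY_PER_FRAME_J = {"YOLOv8n": 0.5, "Grounding DINO": 65.0}


def batch_energy(frames: int, joules_per_frame: float) -> tuple[float, float]:
    """Return (total joules, total watt-hours) for processing `frames` frames."""
    total_j = frames * joules_per_frame
    return total_j, total_j / JOULES_PER_WATT_HOUR


BATCH_1000 = {name: batch_energy(1000, j) for name, j in ENERGY_PER_FRAME_J.items()}

if __name__ == "__main__":
    for name, (joules, watt_hours) in BATCH_1000.items():
        print(f"{name}: {joules:.0f} J = {watt_hours:.2f} Wh")
```

For 1000 frames this gives roughly 0.14 Wh for YOLOv8n and about 18 Wh for Grounding DINO, a factor of 130.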

### Recommended Strategy

```
Offline scan (Server/Gateway):
  Use Grounding DINO to build an object index for the whole film
  → Slow but acceptable (about 2-3 hours for a 113 min film)

Real-time query (Edge Device):
  At query time, run Grounding DINO only at the requested timepoint → 4s per query
  → Query experience is still acceptable
```
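
The 2-3 hour estimate depends on how densely the film is sampled, which the strategy does not state. The sketch below assumes the 4.3 s/frame figure quoted earlier and treats the sampling interval as a free parameter; both function names are hypothetical.

```python
import math


def scan_hours(film_minutes: float, sample_interval_s: float,
               seconds_per_frame: float = 4.3) -> float:
    """Hours needed to scan a film when one frame is sampled every sample_interval_s."""
    frames = math.ceil(film_minutes * 60 / sample_interval_s)
    return frames * seconds_per_frame / 3600


def implied_interval_s(film_minutes: float, target_hours: float,
                       seconds_per_frame: float = 4.3) -> float:
    """Sampling interval (seconds of film per frame) that fits a scan-time budget."""
    frames = target_hours * 3600 / seconds_per_frame
    return film_minutes * 60 / frames


if __name__ == "__main__":
    print(round(scan_hours(113, 3.0), 2))          # one frame every 3 s of film
    print(round(implied_interval_s(113, 2.0), 2))  # interval for a 2 h budget
    print(round(implied_interval_s(113, 3.0), 2))  # interval for a 3 h budget
```

Under these assumptions, a 2-3 hour scan of a 113 min film corresponds to sampling roughly one frame every 2.7 to 4 seconds; sampling every frame would take far longer.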

## Integration Status

- ✅ Grounding DINO tests passed
- ✅ Integrated into `scripts/object_search_agent.py` (`--source zero_shot`)
- ✅ Test plan: `docs/ZERO_SHOT_GUN_TEST_PLAN.md`
- ✅ Test report: `docs/ZERO_SHOT_GUN_TEST_REPORT.md`

## License Notice

Grounding DINO is released under the Apache 2.0 License, which permits commercial use.
If a product bundles this model, it must ship with a `NOTICE` file:

```
Momentry
Copyright 2026 Accusys

This product includes software developed by IDEA Research:
  - Grounding DINO (https://github.com/IDEA-Research/GroundingDINO)
    Copyright 2023 IDEA Research
    Licensed under Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
```