feat: Phase 2.6 edges migration to Qdrant (TKG-only architecture)

Phase 2.6.1: co_occurrence_edges migration
- build_co_occurrence_edges_from_qdrant()
- Qdrant embeddings → frame grouping → YOLO objects
- Result: 6679 edges (vs 6701 PostgreSQL)

Phase 2.6.2: face_face_edges migration
- build_face_face_edges_from_qdrant()
- Qdrant embeddings → frame grouping → face pairs
- mutual_gaze detection preserved
- Result: 6 edges (exact match)

Phase 2.6.3: speaker_face_edges migration
- build_speaker_face_edges_from_qdrant()
- Qdrant embeddings → trace_id frame ranges
- SPEAKS_AS edge creation

Architecture:
- All edges use Qdrant payload (no face_detections queries)
- PostgreSQL fallback for empty Qdrant
- Estimated 3.6x performance improvement

Testing:
- Playground (3003): ✓ All Phase 2.6 logs verified
- Edge counts: ✓ Close match with PostgreSQL
- Fallback: ✓ Working

Docs:
- docs_v1.0/DESIGN/TKG_PHASE2_6_EDGES_MIGRATION.md
- docs_v1.0/M4_workspace/2026-06-21_phase2_6_test.md
2026-06-21 04:47:49 +08:00
parent 0afc70fc5b
commit 2cfcfdd1af
2926 changed files with 8311058 additions and 1394 deletions
--- a/docs_v1.0/DESIGN/PER_FILE_VOICE_COLLECTION_V1.0.md
+++ b/docs_v1.0/DESIGN/PER_FILE_VOICE_COLLECTION_V1.0.md
@@ -0,0 +1,143 @@
+---
+title: Per-File Voice Collection V1.0
+version: 1.0
+date: 2026-06-20
+author: OpenCode
+status: approved
+---
+
+# Per-File Voice Collection V1.0
+
+| Scope | Status | Applicable to | Binary |
+|-------|--------|---------------|--------|
+| Qdrant voice collection naming, storage, lifecycle | Approved | `momentry_playground`, `momentry` | Both |
+
+## Problem Statement
+
+The ASRX processor stores speaker voice embeddings (192-dim ECAPA-TDNN) in Qdrant for speaker diarization and future identity matching. The current design uses a single global collection `{prefix}_voice` for all files, which creates several issues:
+
+1. **No isolation**: All files' voice embeddings share one collection, making per-file cleanup error-prone
+2. **Unnecessary migration**: Workspace `_workspace_voice` → production `_voice` migration during checkin adds complexity with no benefit for per-file processing artifacts
+3. **No event type distinction**: No payload field to distinguish speaker embeddings from future audio event types (gunshots, screams, music, etc.)
+4. **Cross-file matching is impractical**: Current point ID includes file_uuid, but querying across files requires filtering rather than direct collection access
+
+## Design
+
+### Collection Naming: Per-File
+
+```
+{file_uuid}_voice
+```
+
+Examples:
+- `d3f9ae8e471a1fc4d47022c66091b920_voice`
+- `92ed12dbb7fbea5e6ddfe668e1f31444_voice`
+
+### Collection Schema
+
+| Property | Value |
+|----------|-------|
+| Name | `{file_uuid}_voice` |
+| Vector dimension | 192 |
+| Distance metric | Cosine |
+| On-disk | false (default, in-memory for fast search during processing) |
+
+### Point Schema
+
+**Point ID**: `SHA256(speaker_id + "_" + segment_index)` → first 8 bytes as u64
+- No file_uuid in hash (redundant, collection is per-file)
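The naming and point ID rules above can be sketched in Python. This is a minimal illustration, not the project's implementation: the helper names `voice_collection_name` and `voice_point_id` are hypothetical, and the UTF-8 encoding and big-endian byte order are assumptions, since the design only says "first 8 bytes as u64".

```python
import hashlib


def voice_collection_name(file_uuid: str) -> str:
    """Return the per-file collection name, `{file_uuid}_voice`."""
    return f"{file_uuid}_voice"


def voice_point_id(speaker_id: str, segment_index: int) -> int:
    """Hash `speaker_id + "_" + segment_index` and keep the first 8 bytes as a u64.

    Assumption: UTF-8 input and big-endian byte order; the design states neither.
    """
    digest = hashlib.sha256(f"{speaker_id}_{segment_index}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big")


if __name__ == "__main__":
    print(voice_collection_name("d3f9ae8e471a1fc4d47022c66091b920"))
    print(voice_point_id("SPEAKER_00", 5))
```

Because the hash input no longer contains `file_uuid`, the same `(speaker_id, segment_index)` pair yields the same point ID in every file's collection, which is harmless only because each file has its own collection.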
+
+**Payload**:
+
+| Field | Type | Description | Example |
+|-------|------|-------------|---------|
+| `speaker_id` | String | Speaker label from ASRX | `"SPEAKER_00"` |
+| `segment_index` | Integer | Segment index within ASRX result | `5` |
+| `start_frame` | Integer | Start frame number | `120` |
+| `end_frame` | Integer | End frame number | `240` |
+| `start_time` | Float | Start time in seconds | `4.0` |
+| `end_time` | Float | End time in seconds | `8.0` |
+| `event_type` | String | Type of audio event | `"speaker"` |
+
+### Event Type Extensibility
+
+The `event_type` field reserves space for future audio recognition:
+
+| event_type | Description | Future Model | Dim |
+|------------|-------------|--------------|-----|
+| `"speaker"` | Speaker voice embedding (current) | ECAPA-TDNN | 192 |
+| `"gunshot"` | Gunshot detection embedding | YAMNet / custom | TBD |
+| `"scream"` | Scream/shout detection | YAMNet / custom | TBD |
+| `"music"` | Music segment embedding | CLMR / custom | TBD |
+
+Each event type with a different dimension would use a separate per-file collection (`{file_uuid}_gunshot`, etc.).
+
+### Lifecycle
+
+```
+Processing:
+  ASRX completes → store_voice_embeddings_to_qdrant()
+                    → ensure_collection("{file_uuid}_voice", 192)
+                    → upsert_vector per segment
+
+Checkin:
+  No voice migration needed (data already in per-file collection)
+
+Checkout / File Deletion:
+  Delete collection "{file_uuid}_voice" (or delete by filter)
+
+Cross-File Matching (future):
+  Job scans all "*_voice" collections, or maintains {prefix}_speaker_profiles index
+```
+
+### Changes from Current Design
+
+| Aspect | Current | New |
+|--------|---------|-----|
+| Collection name | `{prefix}_voice` | `{file_uuid}_voice` |
+| Point ID hash input | `file_uuid + speaker_id + index` | `speaker_id + index` |
+| Workspace dual-write | `_workspace_voice` → `_voice` migration | Removed (no migration needed) |
+| Payload event_type | Not present | `"speaker"` |
+| Checkin voice migration | Scroll + upsert | Nothing (data already isolated) |
+| Checkout voice deletion | Filter by file_uuid from `{prefix}_voice` | Delete collection or filter |
+| QdrantWorkspace voice methods | `voice_collection()`, `upsert_voice_embedding()` | Removed |
+
+### Files Affected
+
+| File | Change |
+|------|--------|
+| `src/worker/processor.rs:1291-1360` | `store_voice_embeddings_to_qdrant()` — per-file collection, event_type payload |
+| `src/worker/processor.rs:919-942` | Remove workspace voice dual-write |
+| `src/core/checkin.rs:208-242` | Remove voice migration block |
+| `src/core/checkin.rs:358-379` | Update checkout voice deletion to target `{file_uuid}_voice` |
+| `src/core/db/qdrant_workspace.rs` | Remove `voice_collection()`, `upsert_voice_embedding()`, voice from `ensure_all()`, `scroll_by_file_uuid()`, `WorkspaceScrollResult`, `delete_by_file_uuid()` |
+
+### Cross-File Matching (Future Design)
+
+For future multi-file speaker matching, a separate index collection can be maintained:
+
+```
+{prefix}_speaker_profiles (192-dim Cosine)
+  - payload: speaker_id (global), source_file_uuids[], reference_count, centroid_embedding
+```
+
+This index would be updated:
+1. During a periodic batch job that scans all `*_voice` collections
+2. Or incrementally when new voice data is added
+
+The per-file collection design makes this cleaner because:
+- Source data is cleanly partitioned
+- The index is explicitly a derived/cached structure
+- Index rebuild means rescanning `*_voice` collections, not untangling a global collection
+
+## Migration
+
+Existing voice data in `{prefix}_voice` and `{prefix}_workspace_voice` can be left as-is for backward compatibility. New processing will write to `{file_uuid}_voice`. Old data in `{prefix}_voice` will remain queryable if needed.
+
+No data migration script is required — old data is read-only legacy.
+
+## Version History
+
+| Version | Date | Author | Change |
+|---------|------|--------|--------|
+| 1.0 | 2026-06-20 | OpenCode | Initial design |
--- a/docs_v1.0/DESIGN/Processor_Module_V1.0.md
+++ b/docs_v1.0/DESIGN/Processor_Module_V1.0.md
@@ -0,0 +1,758 @@
+# Processor Module V1.0
+
+**Date**: 2026-06-19
+**Version**: 1.0.0
+**Status**: Draft
+
+---
+
+## 1. Architecture Overview
+
+### 1.1 PythonExecutor: Unified Execution Framework
+
+All processors run their Python scripts through `PythonExecutor`, which provides:
+- SHA256 checksum verification (read from `checksums.sha256`)
+- Retry mechanism (exponential backoff: 1s → 2s → 4s → ...)
+- Timeout management (configured independently per processor)
+- Real-time stdout/stderr handling (tracing::info/warn/error)
+
+### 1.2 Dual-Track Design
+
+| Type | Characteristics | Processor |
+|------|------|-----------|
+| **Frame-based** | Processes frame by frame and outputs per-frame data | yolo, ocr, face, pose, mediapipe, appearance |
+| **Time-based** | Analyzes global/time-series data and outputs an event list | cut, asrx, scene, story, 5w1h |
+
+### 1.3 Unified 8Hz Sampling (New)
+
+All frame-based processors share the same 8Hz frame list:
+
+```
+Video FPS: ~30
+Sample Interval: round(fps / 8) = 4
+Sample Frames: 0, 4, 8, 12, 16, ...
+```
+
+---
+
+## 2. Processor Specification Summary
+
+| # | Name | Type | Python script | Output file | Dependencies | GPU | Model | CPU | Memory | Timeout |
+|---|------|------|-------------|----------|------|-----|------|-----|--------|---------|
+| 1 | cut | Time | `cut_processor.py` | `.cut.json` | — | ❌ | PySceneDetect | 0.5 | 512MB | 3600s |
+| 2 | asrx | Time | `asrx_processor.py` | `.asrx.json` | cut | ❌ | speechbrain | 0.8 | 2048MB | 7200s |
+| 3 | yolo | Frame | `yolo_processor.py` | `.yolo.json` | — | ✅ | yolov8n | 0.3 | 1024MB | 7200s |
+| 4 | ocr | Frame | `ocr_processor.py` | `.ocr.json` | — | ❌ | paddleocr | 0.8 | 1024MB | 7200s |
+| 5 | face | Frame | `face_processor.py` | `.face.json` | — | ✅ | insightface/buffalo_l | 0.6 | 1536MB | 7200s |
+| 6 | pose | Frame | `pose_processor.py` | `.pose.json` | — | ✅ | mediapipe/pose | 0.4 | 1024MB | 7200s |
+| 7 | mediapipe | Frame | `mediapipe_holistic_processor.py` | `.mediapipe.json` | — | ❌ | mediapipe/holistic | 0.3 | 1024MB | 7200s |
+| 8 | appearance | Frame | `appearance_processor.py` | `.appearance.json` | pose | ❌ | HSV | 0.3 | 512MB | 7200s |
+| 9 | scene | Time | `scene_classifier.py` | `.scene.json` | cut | ❌ | places365 | 0.3 | 512MB | 7200s |
+| 10 | story | Time | `story_processor.py` | `.story.json` | asrx+cut+yolo+face | ❌ | gemma4 | 0.1 | 256MB | 7200s |
+| 11 | 5w1h | Time | `parent_chunk_5w1h.py` | — | story | ❌ | gemma4 | 0.1 | 256MB | 7200s |
+
+---
+
+## 3. Detailed Processor Specifications
+
+### 3.1 Cut: Scene Cut Detection
+
+**Type**: Time-based
+**Script**: `cut_processor.py`
+**Model**: PySceneDetect
+
+```rust
+pub struct CutResult {
+    pub frame_count: u64,
+    pub fps: f64,
+    pub scenes: Vec<CutScene>,
+}
+
+pub struct CutScene {
+    pub scene_number: u32,
+    pub start_frame: u64,
+    pub end_frame: u64,
+    pub start_time: f64,
+    pub end_time: f64,
+}
+```
+
+**Output JSON**:
+```json
+{
+  "frame_count": 8951,
+  "fps": 29.97,
+  "scenes": [
+    {"scene_number": 1, "start_frame": 0, "end_frame": 150, "start_time": 0.0, "end_time": 5.0},
+    ...
+  ]
+}
+```
+
+---
+
+### 3.2 ASRX: Speech Recognition + Speaker Diarization
+
+**Type**: Time-based
+**Script**: `asrx_processor.py`
+**Model**: speechbrain/ecapa-tdnn
+**Dependencies**: cut (requires scene boundaries)
+
+```rust
+pub struct AsrxResult {
+    pub language: Option<String>,
+    pub segments: Vec<AsrxSegment>,
+    pub embeddings: Option<Vec<Vec<f32>>>,
+}
+
+pub struct AsrxSegment {
+    pub start_time: f64,
+    pub end_time: f64,
+    pub start_frame: u64,
+    pub end_frame: u64,
+    pub text: String,
+    pub speaker_id: Option<String>,
+}
+```
+
+**Output JSON**:
+```json
+{
+  "language": "zh",
+  "segments": [
+    {
+      "start_time": 0.1,
+      "end_time": 2.0,
+      "start_frame": 3,
+      "end_frame": 60,
+      "text": "大家好",
+      "speaker_id": "SPEAKER_0"
+    },
+    ...
+  ]
+}
+```
+
+---
+
+### 3.3 YOLO: Object Detection
+
+**Type**: Frame-based
+**Script**: `yolo_processor.py`
+**Model**: yolov8n
+**GPU**: ✅
+**Sampling**: 8Hz
+
+```rust
+pub struct YoloResult {
+    pub frame_count: u64,
+    pub fps: f64,
+    pub frames: Vec<YoloFrame>,
+}
+
+pub struct YoloFrame {
+    pub frame: u64,
+    pub timestamp: f64,
+    pub objects: Vec<YoloObject>,
+}
+
+pub struct YoloObject {
+    pub class_name: String,
+    pub class_id: u32,
+    pub x: i32,
+    pub y: i32,
+    pub width: i32,
+    pub height: i32,
+    pub confidence: f32,
+}
+```
+
+**Output JSON**:
+```json
+{
+  "frame_count": 2238,
+  "fps": 29.97,
+  "frames": {
+    "0": {"detections": [{"class_name": "person", "class_id": 0, "x": 100, "y": 50, "width": 200, "height": 400, "confidence": 0.95}]},
+    "4": {"detections": [...]},
+    ...
+  }
+}
+```
+
+**Available classes** (43 COCO classes): person, bicycle, car, motorbike, chair, cup, cell phone, laptop, book, remote, tie, umbrella, baseball bat, ...
+
+---
+
+### 3.4 OCR: Text Recognition
+
+**Type**: Frame-based
+**Script**: `ocr_processor.py`
+**Model**: paddleocr
+**Sampling**: 8Hz
+
+```rust
+pub struct OcrResult {
+    pub frame_count: u64,
+    pub fps: f64,
+    pub frames: Vec<OcrFrame>,
+}
+
+pub struct OcrFrame {
+    pub frame: u64,
+    pub timestamp: f64,
+    pub texts: Vec<OcrText>,
+}
+
+pub struct OcrText {
+    pub text: String,
+    pub x: i32,
+    pub y: i32,
+    pub width: i32,
+    pub height: i32,
+    pub confidence: f32,
+}
+```
+
+---
+
+### 3.5 Face: Face Detection + Embedding
+
+**Type**: Frame-based
+**Script**: `face_processor.py`
+**Model**: insightface/buffalo_l
+**GPU**: ✅
+**Sampling**: 8Hz
+
+```rust
+pub struct FaceResult {
+    pub frame_count: u64,
+    pub fps: f64,
+    pub frames: Vec<FaceFrame>,
+}
+
+pub struct FaceFrame {
+    pub frame: u64,
+    pub timestamp: f64,
+    pub faces: Vec<Face>,
+}
+
+pub struct Face {
+    pub face_id: Option<String>,
+    pub x: i32,
+    pub y: i32,
+    pub width: i32,
+    pub height: i32,
+    pub confidence: f32,
+    pub embedding: Option<Vec<f32>>,
+    pub landmarks: Option<serde_json::Value>,
+    pub attributes: Option<FaceAttributes>,
+}
+
+pub struct FaceAttributes {
+    pub age: Option<i32>,
+    pub gender: Option<String>,
+}
+```
+
+**Output JSON**:
+```json
+{
+  "frame_count": 2238,
+  "fps": 29.97,
+  "frames": [
+    {
+      "frame": 0,
+      "timestamp": 0.0,
+      "faces": [{
+        "face_id": "face_0",
+        "x": 500, "y": 300, "width": 200, "height": 250,
+        "confidence": 0.98,
+        "embedding": [0.12, -0.34, ...],
+        "landmarks": {
+          "nose": [[x,y], ...],
+          "left_eye": [[x,y], ...],
+          "right_eye": [[x,y], ...]
+        },
+        "attributes": {"age": 35, "gender": "male"}
+      }]
+    }
+  ]
+}
+```
+
+**Landmarks**: nose (8pts) + left_eye (6pts) + right_eye (6pts) = 20 pts
+
+---
+
+### 3.6 Pose: Body Pose
+
+**Type**: Frame-based
+**Script**: `pose_processor.py`
+**Model**: mediapipe/pose
+**GPU**: ✅
+**Sampling**: 8Hz
+
+```rust
+pub struct PoseResult {
+    pub frame_count: u64,
+    pub fps: f64,
+    pub frames: Vec<PoseFrame>,
+}
+
+pub struct PoseFrame {
+    pub frame: u64,
+    pub timestamp: f64,
+    pub persons: Vec<PersonPose>,
+}
+
+pub struct PersonPose {
+    pub keypoints: Vec<Keypoint>,
+    pub bbox: Bbox,
+}
+
+pub struct Keypoint {
+    pub x: f64,
+    pub y: f64,
+    pub z: f64,
+    pub visibility: f64,
+}
+
+pub struct Bbox {
+    pub x: i32,
+    pub y: i32,
+    pub width: i32,
+    pub height: i32,
+}
+```
+
+**Output JSON**:
+```json
+{
+  "frame_count": 2238,
+  "fps": 29.97,
+  "frames": [
+    {
+      "frame": 0,
+      "timestamp": 0.0,
+      "persons": [{
+        "keypoints": [
+          {"x": 0.5, "y": 0.3, "z": 0.1, "visibility": 0.95},
+          ...
+        ],
+        "bbox": {"x": 400, "y": 100, "width": 300, "height": 600}
+      }]
+    }
+  ]
+}
+```
+
+**Keypoints**: 33 body joints (nose, shoulders, elbows, wrists, hips, knees, ankles, ...)
+
+**Purpose**: Provides the bbox source for appearance_processor, which uses it to compute the upper-body and lower-body color ROIs
+
+---
+
+### 3.7 MediaPipe Holistic: Full Keypoints
+
+**Type**: Frame-based
+**Script**: `mediapipe_holistic_processor.py`
+**Model**: mediapipe/holistic
+**GPU**: ❌
+**Sampling**: 8Hz
+
+```rust
+pub struct MediaPipeResult {
+    pub metadata: MediaPipeMetadata,
+    pub frames: HashMap<String, MediaPipeDictEntry>,
+}
+
+pub struct MediaPipeMetadata {
+    pub fps: f64,
+    pub total_frames: i64,
+    pub processed_frames: i64,
+    pub sample_interval: i64,
+    pub width: i64,
+    pub height: i64,
+    pub processor: String,
+}
+
+pub struct MediaPipeDictEntry {
+    pub frame: String,
+    pub timestamp: f64,
+    pub persons: Vec<MediaPipePerson>,
+}
+
+pub struct MediaPipePerson {
+    pub person_id: u64,
+    pub bbox: Option<MediaPipeBBox>,
+    pub face_mesh: Option<MediaPipeFaceMesh>,
+    pub pose: Option<MediaPipePose>,
+    pub hands: MediaPipeHands,
+}
+
+pub struct MediaPipeHands {
+    pub left: Option<MediaPipeHand>,
+    pub right: Option<MediaPipeHand>,
+}
+```
+
+**Output JSON**:
+```json
+{
+  "metadata": {
+    "fps": 29.97,
+    "total_frames": 8951,
+    "processed_frames": 2238,
+    "sample_interval": 4,
+    "width": 1920,
+    "height": 1080,
+    "processor": "mediapipe_holistic"
+  },
+  "frames": {
+    "0": {
+      "frame": "0",
+      "timestamp": 0.0,
+      "persons": [{
+        "person_id": 0,
+        "bbox": {"x": 400, "y": 100, "width": 300, "height": 600},
+        "face_mesh": {
+          "landmarks": [[x,y,z], ...],
+          "eye_features": {"left_openness": 0.85, "right_openness": 0.82},
+          "mouth_features": {"openness": 0.3, "width": 45}
+        },
+        "pose": {
+          "landmarks": [[x,y,z,visibility], ...],
+          "arm_features": {"left_angle": 45, "right_angle": 30},
+          "leg_features": {"left_angle": 180, "right_angle": 175}
+        },
+        "hands": {
+          "left": {"landmarks": [[x,y,z], ...], "gesture": "point"},
+          "right": {"landmarks": [[x,y,z], ...], "gesture": "fist"}
+        }
+      }]
+    }
+  }
+}
+```
+
+**Keypoint totals**:
+| Part | Count | Description |
+|------|------|------|
+| Face Mesh | 468 | Full face mesh |
+| Pose | 33 | Body joints |
+| Left Hand | 21 | Left-hand keypoints |
+| Right Hand | 21 | Right-hand keypoints |
+| **Total** | **543** | |
+
+### Pose vs MediaPipe Comparison
+
+| | Pose Processor | MediaPipe Holistic |
+|--|----------------|--------------------|
+| **Landmarks** | 33 pts (pose only) | 543 pts (face + pose + hands) |
+| **Speed** | Fast (GPU-accelerated) | Slower (CPU) |
+| **GPU** | ✅ | ❌ |
+| **Output file** | `.pose.json` | `.mediapipe.json` |
+| **Shared with Appearance** | Body ROIs (neck, foot) | Face ROIs (hat, glasses), hand ROIs (watch, phone) |
+| **Purpose** | Body pose, bbox source | Full keypoints, gesture recognition, lip-shape analysis |
+
+---
+
+### 3.8 Appearance: Color Features + Accessory Detection
+
+**Type**: Frame-based
+**Script**: `appearance_processor.py`
+**Dependencies**: pose (bbox source)
+**Sampling**: 8Hz
+**Shared ROIs**: Fitted tightly to the face/pose/mediapipe landmarks
+
+```rust
+pub struct AppearanceResult {
+    pub frame_count: u64,
+    pub fps: f64,
+    pub frames: Vec<AppearanceFrame>,
+}
+
+pub struct AppearanceFrame {
+    pub frame: u64,
+    pub timestamp: f64,
+    pub persons: Vec<AppearancePerson>,
+}
+
+pub struct AppearancePerson {
+    pub person_id: u64,
+    pub bbox: BBox,
+    pub hsv_histogram: Vec<Vec<f64>>,
+    pub dominant_colors: Vec<Vec<f64>>,
+    pub upper_body: Option<Vec<Vec<f64>>>,
+    pub lower_body: Option<Vec<Vec<f64>>>,
+}
+```
+
+**Output JSON**:
+```json
+{
+  "frame_count": 2238,
+  "fps": 29.97,
+  "frames": [
+    {
+      "frame": 0,
+      "timestamp": 0.0,
+      "persons": [{
+        "person_id": 0,
+        "bbox": {"x": 400, "y": 100, "width": 300, "height": 600},
+        "hsv_histogram": [
+          [H0, H1, ...H29],
+          [S0, S1, ...S31],
+          [V0, V1, ...V31]
+        ],
+        "dominant_colors": [[H,S,V], ...],
+        "upper_body": [[H...], [S...], [V...]],
+        "lower_body": [[H...], [S...], [V...]]
+      }]
+    }
+  ]
+}
+```
+
+#### ROI Localization
+
+```python
+def get_accessory_rois(frame, face_data, pose_data, hand_data):
+    # Helpers such as expand_region() and bbox_around_points() are assumed to be defined elsewhere.
+    rois = {}
+    
+    # Face region: uses the face bbox + landmarks
+    face_bbox = face_data['bbox']
+    landmarks = face_data['landmarks']  # nose, left_eye, right_eye
+    
+    # Hat ROI: extends upward from the face bbox
+    rois['hat'] = expand_region(face_bbox, direction='up', factor=0.5)
+    
+    # Glasses ROI: horizontal band around the eye landmarks
+    rois['glasses'] = bbox_around_points(landmarks['left_eye'], landmarks['right_eye'], padding=10)
+    
+    # Mask ROI: from below the nose down to the jaw (bottom edge of the face bbox)
+    rois['mask'] = region_below_point(landmarks['nose'], face_bbox['y'] + face_bbox['height'])
+    
+    # Neck ROI: uses the pose neck keypoints
+    rois['neck'] = region_between(pose_data['keypoints']['nose'], pose_data['keypoints']['neck'], width=80)
+    
+    # Wrist ROI: uses the MediaPipe hand landmarks
+    rois['left_wrist'] = circle_around(hand_data['left']['wrist'], radius=30)
+    
+    # Foot ROI: uses the pose ankle/toe keypoints
+    rois['left_foot'] = bbox_around_points(pose_data['keypoints']['left_ankle'], pose_data['keypoints']['left_toe'], padding=20)
+    
+    return rois
+```
+
+#### Accessory Detection Methods
+
+| Method | Applicable accessories | Description |
+|------|----------|------|
+| **HSV color blocks** | tie, phone, watch, ring, bracelet, glasses, mask, hat, shoes, backpack, handbag | Primary method: analysis of contrasting color regions |
+| **CLIP** | hairstyle, beard, face_tattoo, earrings, nose_ring, necklace, gloves | Auxiliary: used when color blocks are hard to tell apart |
+| **MediaPipe** | gesture, arm_pose | 21 hand pts + 33 pose pts |
+| **HSV** | upper_body_color, lower_body_color, skin_tone | Color feature extraction |
+
+#### Complete Accessory List (49 types)
+
+| Body part | Accessories | Detection |
+|------|------|------|
+| Head (12) | hat, hairstyle, hair_accessory, earrings, nose_ring, lip_ring, face_tattoo, eyebrow_tattoo, glasses, mask, beard, headscarf | HSV color blocks + CLIP |
+| Neck (5) | tie, scarf, shawl, necklace, neck_tattoo | HSV color blocks + CLIP |
+| Hands/arms (16) | ring, bracelet, watch, gloves, phone, pen, laptop, book, cup, remote, tool, knife, gun, baseball_bat, gesture, arm_pose | HSV color blocks + CLIP + MP |
+| Feet/vehicles (8) | shoes, socks, barefoot, skateboard, scooter, bicycle, motorbike, roller_skates | HSV color blocks + CLIP |
+| Carried items/environment (5) | backpack, handbag, luggage, chair, diningtable | HSV color blocks + CLIP |
+| Color (3) | upper_body_hsv, lower_body_hsv, skin_tone | HSV |
+
+---
+
+### 3.9 Scene: Scene Classification
+
+**Type**: Time-based
+**Script**: `scene_classifier.py`
+**Model**: places365
+**Dependencies**: cut
+
+---
+
+### 3.10 Story: Story Generation
+
+**Type**: Time-based
+**Script**: `story_processor.py`
+**Model**: gemma4
+**Dependencies**: asrx + cut + yolo + face
+
+---
+
+### 3.11 5W1H: Story Summary
+
+**Type**: Time-based
+**Script**: `parent_chunk_5w1h.py`
+**Model**: gemma4
+**Dependencies**: story
+
+---
+
+## 4. PythonExecutor Unified Framework
+
+### 4.1 RetryConfig
+
+```rust
+pub struct RetryConfig {
+    pub max_attempts: u32,         // default: 3
+    pub initial_delay_ms: u64,     // default: 1000 (1s)
+    pub max_delay_ms: u64,         // default: 30000 (30s)
+    pub backoff_multiplier: f64,   // default: 2.0
+}
+```
+
+**Backoff strategy**: 1s → 2s → 4s → 8s → ... → max 30s
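The backoff sequence can be reproduced from the four `RetryConfig` fields. The following Python sketch mirrors the Rust struct and its documented defaults; the function name `backoff_delays_ms` is hypothetical, and it assumes no delay is applied before the first attempt.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RetryConfig:
    """Python mirror of the Rust RetryConfig, using the documented defaults."""
    max_attempts: int = 3
    initial_delay_ms: int = 1000
    max_delay_ms: int = 30000
    backoff_multiplier: float = 2.0


def backoff_delays_ms(config: RetryConfig, retries: int) -> List[int]:
    """Delay before each of the first `retries` retries, capped at max_delay_ms."""
    delays = []
    delay = float(config.initial_delay_ms)
    for _ in range(retries):
        delays.append(int(min(delay, config.max_delay_ms)))
        delay *= config.backoff_multiplier
    return delays


if __name__ == "__main__":
    print(backoff_delays_ms(RetryConfig(), 6))
```

Under that assumption, the default `max_attempts` of 3 would use only the first two delays (1s and 2s); the 30s cap only matters when `max_attempts` is raised.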
+
+### 4.2 SHA256 Checksum Verification
+
+```
+scripts/
+├── checksums.sha256          # SHA256 manifest
+├── face_processor.py
+├── yolo_processor.py
+└── ...
+```
+
+Contents of `checksums.sha256`:
+```
+a1b2c3d4...  face_processor.py
+e5f6g7h8...  yolo_processor.py
+...
+```
+
+Before launching a script, the executor verifies its integrity to guard against tampering.
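A minimal Python sketch of such a check, assuming the manifest uses the common `<hex digest>  <filename>` layout shown above. The function names `load_manifest` and `verify_script` are hypothetical, and treating an unlisted script as a failure is an assumption rather than documented behavior.

```python
import hashlib
from pathlib import Path
from typing import Dict


def load_manifest(manifest_path: Path) -> Dict[str, str]:
    """Parse `<hex digest>  <filename>` lines into {filename: digest}."""
    entries = {}
    for line in manifest_path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        # A leading "*" marks binary mode in sha256sum-style manifests.
        entries[name.strip().lstrip("*")] = digest.lower()
    return entries


def verify_script(script_path: Path, manifest: Dict[str, str]) -> bool:
    """Return True only if the script is listed and its SHA256 matches."""
    expected = manifest.get(script_path.name)
    if expected is None:
        return False  # assumption: unlisted scripts are rejected
    actual = hashlib.sha256(script_path.read_bytes()).hexdigest()
    return actual == expected
```

Hashing the file bytes rather than decoded text keeps the digest independent of encoding and line-ending handling.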
+
+### 4.3 Timeout Management
+
+| Processor | Timeout |
+|-----------|---------|
+| cut | 3600s (1h) |
+| asrx, yolo, ocr, face, pose, mediapipe, appearance, scene, story, 5w1h | 7200s (2h) |
+
+---
+
+## 5. 8Hz Sampling Framework
+
+### 5.1 Basic Principle
+
+```
+Video FPS: ~30
+Sample Interval: round(fps / 8) = 4
+Sample Frames: 0, 4, 8, 12, 16, ...
+```
+
+| Video length | Total frames | 8Hz samples |
+|----------|--------|------------|
+| 5 min | 9,000 | ~2,250 |
+| 10 min | 18,000 | ~4,500 |
+| 30 min | 54,000 | ~13,500 |
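The sample counts in the table follow from the interval arithmetic. A minimal Python sketch, assuming rounding half away from zero as Rust's `f64::round` does (Python's built-in `round` rounds halves to even, so it is avoided here):

```python
import math
from typing import List


def compute_sample_frames(total_frames: int, fps: float) -> List[int]:
    """Frame indices on the shared sampling grid: 0, interval, 2 * interval, ..."""
    # floor(x + 0.5) rounds halves upward for positive x, matching f64::round.
    interval = max(math.floor(fps / 8.0 + 0.5), 1)
    return list(range(0, total_frames, interval))


if __name__ == "__main__":
    for total in (9000, 18000, 54000):
        print(total, len(compute_sample_frames(total, 30.0)))
```

At 30 fps the interval of 4 frames gives an effective rate of 7.5 samples per second, which is why the table counts are marked as approximate.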
+
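+### 5.2 On-Demand Refinement
+
+```
+Layer 1: 8Hz baseline (all processors)
+    ↓
+Layer 2: Refinement (triggered by specific features)
+
+Refinement scenarios:
+  - Blink confirmation: 8Hz detects a sudden drop in eye openness → go back and grab the surrounding ±4 frames (30Hz)
+  - Lip-sync: time ranges covered by sentence chunks → 16Hz
+  - Mutual Gaze: two people's gaze directions converge → confirm with the surrounding ±2 frames (30Hz)
+```
+
+### 5.3 Sample Frame Computation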
+
+```rust
+fn compute_sample_frames(total_frames: i64, fps: f64) -> Vec<i64> {
+    let interval = (fps / 8.0).round() as i64;
+    (0..total_frames).step_by(interval.max(1) as usize).collect()
+}
+```
+
+---
+
+## 6. DAG Dependency Graph
+
+```
+┌─────┐    ┌─────┐    ┌─────┐    ┌─────┐
+│ cut │───►│asrx │───►│story│───►│5w1h │
+└──┬──┘    └──┬──┘    └──┬──┘    └─────┘
+   │          │          │
+   │    ┌─────┘          │
+   ▼    ▼                │
+┌─────┐ ┌─────┐ ┌─────┐  │
+│yolo │ │face │ │pose │  │
+└──┬──┘ └──┬──┘ └──┬──┘  │
+   │       │       │     │
+   │       │       ▼     │
+   │       │  ┌────────┐ │
+   │       └─►│appear  │ │
+   │          └────────┘ │
+   ▼          ▼          ▼
+┌─────────────────────────┐
+│   TKG (build_tkg)       │
+└─────────────────────────┘
+
+Independent processors (no dependencies):
+  ┌─────┐  ┌─────┐  ┌───────────┐
+  │ ocr │  │mediap│  │  scene    │
+  └─────┘  └─────┘  └─────┬─────┘
+                           │ (depends on cut)
+```
+
+---
+
+## 7. Worker Integration
+
+### 7.1 JobWorker Scheduling
+
+```
+Video Registration
+    │
+    ▼
+Create Job (processor_list: [cut, asrx, yolo, ocr, face, pose, mediapipe, appearance, scene, story])
+    │
+    ▼
+Poll Available Processors (dependency check + concurrency limit)
+    │
+    ▼
+Execute Processor → Store JSON → Update Progress
+    │
+    ▼
+All Processors Done → Rule 1 (chunk) → Vectorize → Complete
+```
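The "dependency check + concurrency limit" polling step can be sketched in Python. The dependency map is copied from the specification table in Section 2; the function name `ready_processors` and the first-come ordering are assumptions for illustration, not the actual `JobWorker` logic.

```python
from typing import Dict, List, Set

# Dependencies copied from the specification table in Section 2.
DEPENDENCIES: Dict[str, Set[str]] = {
    "cut": set(),
    "asrx": {"cut"},
    "yolo": set(),
    "ocr": set(),
    "face": set(),
    "pose": set(),
    "mediapipe": set(),
    "appearance": {"pose"},
    "scene": {"cut"},
    "story": {"asrx", "cut", "yolo", "face"},
    "5w1h": {"story"},
}


def ready_processors(completed: Set[str], running: Set[str], limit: int) -> List[str]:
    """Processors whose dependencies are all completed, trimmed to the free slots."""
    free_slots = max(limit - len(running), 0)
    candidates = [
        name
        for name, deps in DEPENDENCIES.items()
        if name not in completed and name not in running and deps <= completed
    ]
    return candidates[:free_slots]


if __name__ == "__main__":
    print(ready_processors(set(), set(), 2))
    print(ready_processors({"cut"}, {"yolo"}, 2))
```

A real scheduler would likely also weigh GPU availability and the per-processor CPU and memory figures from the table; this sketch only covers the dependency and slot-count checks.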
+
+### 7.2 Concurrency Control
+
+- **Dynamic concurrency**: Adjusted dynamically based on CPU/memory/GPU load (default 2)
+- **Processor pool**: Runs at most N processors concurrently
+
+### 7.3 Progress Reporting (Redis)
+
+```
+Redis Key: momentry_dev:progress:{file_uuid}
+Value: {
+  "phase": "PROCESSING",
+  "progress": {
+    "FACE": {"current": 150, "total": 2238, "status": "running"},
+    "YOLO": {"current": 2238, "total": 2238, "status": "completed"},
+    ...
+  },
+  "active_processors": ["FACE", "POSE"]
+}
+```
+
+---
+
+## Version History
+
+| Version | Date | Author | Description |
+|---------|------|--------|-------------|
+| 1.0.0 | 2026-06-19 | OpenCode | Initial design document |
--- a/docs_v1.0/DESIGN/RULE1_CHUNK_V1.0.md
+++ b/docs_v1.0/DESIGN/RULE1_CHUNK_V1.0.md
@@ -0,0 +1,187 @@
+---
+title: Rule 1 Chunk Ingestion V1.0
+version: 1.0
+date: 2026-06-20
+author: OpenCode
+status: approved
+---
+
+# Rule 1 Chunk Ingestion V1.0
+
+| Scope | Status | Applicable to | Binary |
+|-------|--------|---------------|--------|
+| Sentence chunk creation from ASR + OCR | Approved | `momentry_playground`, `momentry` | Both |
+
+## Overview
+
+Rule 1 is the first chunking rule in Momentry's pipeline. It creates **sentence-level chunks** (`ChunkType::Sentence`, `ChunkRule::Rule1`) by taking ASR transcription segments and enriching them with OCR on-screen text from the same time range. Each chunk represents a spoken segment annotated with the visible text in the video frames.
+
+These chunks are vectorized by the downstream `vectorize_chunks` step and become searchable through semantic search (Qdrant), keyword search (BM25 ILIKE), and identity-based search.
+
+## Data Flow
+
+```
+┌─────────────────────────────────────────────────────────┐
+│ UPSTREAM: pre_chunks table                               │
+│                                                         │
+│ Processor outputs stored by store_raw_pre_chunks_batch:  │
+│   processor_type='asr' → ASR segments (text, timestamps) │
+│   processor_type='ocr' → OCR texts per frame             │
+└─────────────────────────────────────────────────────────┘
+                          │
+                          ▼ wait for ASRX completion
+                          │
+┌─────────────────────────────────────────────────────────┐
+│ RULE 1 PROCESSING                                        │
+│                                                         │
+│ Triggered by:                                            │
+│   1. Worker auto: job_worker.rs after ASRX completes     │
+│   2. HTTP API: POST /api/v1/file/:file_uuid/rule1        │
+│   3. Pipeline: pipeline_core::execute_rule1              │
+│                                                         │
+│   execute_rule1(file_uuid, fps):                         │
+│     ├─ fetch_asr_segments()  → Vec<AsrSegment>           │
+│     ├─ fetch_ocr_texts()     → BTreeMap<frame, [texts]>  │
+│     │                                                    │
+│     └─ for each ASR segment:                             │
+│          ├─ collect_ocr_text(frame_range, ocr_map)       │
+│          │   → deduplicated OCR texts within range        │
+│          ├─ build combined_text = "<ASR> <OCR>"           │
+│          ├─ build content = {text, ocr_text}              │
+│          ├─ build metadata = {language}                   │
+│          └─ store_chunk_in_tx() → chunk table            │
+│                                                         │
+└─────────────────────────────────────────────────────────┘
+                          │
+                          ▼
+┌─────────────────────────────────────────────────────────┐
+│ DOWNSTREAM: vectorize_chunks()                            │
+│                                                         │
+│   SELECT ... WHERE chunk_type='sentence' AND embedding   │
+│     IS NULL                                              │
+│                                                         │
+│   1. embedder.embed_document(combined_text) → vector     │
+│   2. db.store_vector() → PG chunk.embedding              │
+│   3. qdrant.upsert_vector() → momentry_rule1 collection  │
+│                                                         │
+└─────────────────────────────────────────────────────────┘
+```
+
+## Chunk Data Structure
+
+### Content JSON (`content` column)
+
+```json
+{
+    "text": "今天的會議我們要討論 ...",
+    "ocr_text": "Q3 Revenue Slides Agenda"
+}
+```
+
+| Field | Source | Purpose |
+|-------|--------|---------|
+| `text` | ASR transcription | Original spoken text, used by UI/reference |
+| `ocr_text` | OCR detections in frame range | On-screen text (titles, labels, signs) |
+
+### Text Content (`text_content` column)
+
+```
+"今天的會議我們要討論 Q3 Revenue Slides Agenda"
+```
+
+Combined ASR + OCR text used for:
+- **Embedding generation**: The combined text is embedded to Qdrant, enabling semantic search to find segments based on both spoken and on-screen content
+- **Keyword search (BM25 ILIKE)**: Queries match against this field, so searching for "Q3 Revenue" finds the segment even if not spoken aloud
+
+### Metadata JSON (`metadata` column)
+
+```json
+{
+    "language": "zh"
+}
+```
+
+Only the ASR-detected language is stored. See Design Decisions below.
+
+## Search Contribution Analysis
+
+| Search Path | Mechanism | Rule 1 Contribution |
+|-------------|-----------|-------------------|
+| **Semantic search** (Qdrant) | `chunk_type='sentence'` → embedding query | ASR + OCR text in embedding captures both spoken and visual content |
+| **Keyword search** (BM25 ILIKE) | `text_content ILIKE '%query%'` | Both ASR and OCR text are searchable |
+| **Title match** (smart_search) | `chunk_type='sentence' AND embedding IS NOT NULL` | Rule 1 chunks are the primary sentence chunks |
+| **Identity search** | `face_detections` time overlap join | Rule 1 chunks match via frame ranges |
+
+### What Was Excluded and Why
+
+| Data Source | Considered For | Decision | Reason |
+|-------------|---------------|----------|--------|
+| **YOLO detections** | Adding class names to text_content | ❌ **Excluded** | 80 COCO classes are too generic ("person", "chair" appear in almost every segment). High error rate adds noise, dilutes embedding semantic density. Cross-segment distinctiveness is near zero. |
+| **ASRX speaker** | Adding speaker_id to metadata | ❌ **Excluded** | At Rule 1 time, identity has not been paired yet. Speaker IDs are temporary labels without identity binding, providing no search value. |
+| **Face detections** | Adding face_ids to metadata | ❌ **Excluded** | Same as speaker — identity not yet available. Face detection IDs alone have no search meaning. |
+| **OCR text** | Adding to text_content + embedding | ✅ **Included** | OCR provides specific on-screen text (titles, labels, signs) that directly matches user search queries. Highly complementary to ASR. |
+
+## Implementation Details
+
+### `fetch_ocr_texts()`
+
+Reads OCR per-frame data from `pre_chunks`:
+
+```sql
+SELECT coordinate_index as frame, data
+FROM pre_chunks
+WHERE file_uuid = $1 AND processor_type = 'ocr'
+ORDER BY coordinate_index
+```
+
+Parses the `data.texts` JSON array, extracting `text` fields where `confidence > 0.5`. Returns `BTreeMap<i64, Vec<String>>` mapping frame number to list of recognized text strings.
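The parsing and filtering step can be sketched in Python. This assumes each row's `data` column arrives as a JSON string with a `texts` array of `{text, confidence}` objects; the function name `build_ocr_map` is hypothetical, and dropping empty strings follows the Edge Cases table rather than this paragraph.

```python
import json
from typing import Dict, Iterable, List, Tuple


def build_ocr_map(rows: Iterable[Tuple[int, str]], min_confidence: float = 0.5) -> Dict[int, List[str]]:
    """Map frame number -> recognized texts with confidence > min_confidence."""
    ocr_map: Dict[int, List[str]] = {}
    for frame, data_json in rows:
        data = json.loads(data_json)
        texts = [
            item["text"]
            for item in data.get("texts", [])
            if item.get("confidence", 0.0) > min_confidence and item.get("text")
        ]
        if texts:  # frames with no valid text are skipped entirely
            ocr_map[frame] = texts
    return ocr_map
```

The Rust implementation returns a `BTreeMap`, which keeps frames sorted; a plain `dict` here preserves insertion order, which matches as long as the rows arrive `ORDER BY coordinate_index`.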
+
+### `collect_ocr_text()`
+
+For a given frame range `[start_frame, end_frame]`:
+1. Iterates frames using `BTreeMap::range(start_frame..=end_frame)`
+2. Collects all OCR texts from those frames
+3. Deduplicates using a `HashSet` (case-sensitive)
+4. Joins with spaces: `"text1 text2 text3"`
+
+Returns empty string if no OCR data exists in the range.
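The four steps above can be sketched in Python as follows. Keeping the first-seen order of texts is an assumption: the design only says a `HashSet` is used for deduplication and does not specify the output order.

```python
from typing import Dict, List


def collect_ocr_text(start_frame: int, end_frame: int, ocr_map: Dict[int, List[str]]) -> str:
    """Join the deduplicated OCR texts found in [start_frame, end_frame]."""
    seen = set()
    ordered = []
    for frame in sorted(ocr_map):
        if start_frame <= frame <= end_frame:
            for text in ocr_map[frame]:
                if text not in seen:  # case-sensitive comparison
                    seen.add(text)
                    ordered.append(text)
    return " ".join(ordered)


if __name__ == "__main__":
    sample = {0: ["Agenda"], 4: ["Agenda", "Q3 Revenue"], 100: ["Later"]}
    print(collect_ocr_text(0, 8, sample))
```

Because the comparison is case-sensitive, `"Agenda"` and `"agenda"` are kept as two separate entries.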
+
+### `text_content` Composition Rules
+
+```
+if OCR text exists:
+    combined = "{asr_text} {ocr_text}"
+else:
+    combined = "{asr_text}"
+```
+
+The combined string is used for both embedding and keyword search. The original ASR text is preserved separately in `content.text`.
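As a small illustration of the composition rule and the `content` JSON described earlier, here is a Python sketch; the function names `compose_text_content` and `build_content` are hypothetical.

```python
from typing import Dict


def compose_text_content(asr_text: str, ocr_text: str) -> str:
    """Append OCR text to the ASR text only when OCR text exists."""
    return f"{asr_text} {ocr_text}" if ocr_text else asr_text


def build_content(asr_text: str, ocr_text: str) -> Dict[str, str]:
    """Content JSON: the original ASR text and the OCR text, kept separate."""
    return {"text": asr_text, "ocr_text": ocr_text}
```

Keeping `content.text` separate means the UI can show the spoken sentence without the appended on-screen text.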
+
+## Trigger Points
+
+| Trigger | Location | Condition |
+|---------|----------|-----------|
+| Worker auto | `job_worker.rs:1135` | After ASRX processor completes and no sentence chunks exist yet |
+| HTTP API | `POST /api/v1/file/:file_uuid/rule1` | Manual trigger via `pipeline_core::execute_rule1` |
+| Programmatic | `pipeline_core::execute_rule1` | Called by other modules needing sentence chunks |
+
+The worker guard checks idempotency:
+```sql
+SELECT 1 FROM chunk WHERE file_uuid = $1 AND chunk_type = 'sentence' LIMIT 1
+```
+
+## Edge Cases
+
+| Scenario | Behavior |
+|----------|----------|
+| No ASR segments | Returns 0 immediately with info log |
+| No OCR data in pre_chunks | `ocr_text` is empty string; `text_content` = ASR only |
+| OCR frame with no valid text | Skipped (confidence < 0.5 or empty string) |
+| ASR segment end_time = 0.0 | Logs warning; overlap-based matching degrades gracefully |
+| Large number of segments | Batches in single transaction; progress logged every 100 segments |
+
+## Version History
+
+| Version | Date | Author | Change |
+|---------|------|--------|--------|
+| 1.0 | 2026-06-20 | OpenCode | Initial design: ASR + OCR → sentence chunks |
--- a/docs_v1.0/DESIGN/TKG_MultiTrace_V1.0.md
+++ b/docs_v1.0/DESIGN/TKG_MultiTrace_V1.0.md
@@ -0,0 +1,816 @@
+# TKG Multi-Trace Design V1.0
+
+**Date**: 2026-06-19
+**Version**: 1.0.0
+**Status**: Draft
+
+---
+
+## Overview
+
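+A unified 8 Hz sampling framework that aligns four traces (face, appearance, gaze, lip) and connects them to sentence, speaker, and accessory nodes to build a complete Temporal Knowledge Graph (TKG).
+
+### Design Goals
+
+1. **Time alignment**: all traces sit on the same 8 Hz grid, so edge computation needs no interpolation
+2. **On-demand refinement**: specific features (blink, lip-sync, mutual gaze) can locally raise the sampling rate
+3. **Accessory detection**: 49 accessory classes (head 12 + neck 5 + hand 16 + foot 8 + carried 5 + color 3)
+4. **Skin tone + lighting**: Fitzpatrick classification plus lighting parameters, to support confidence assessment
+5. **Social interaction**: mutual gaze, lip-sync, and speaker-face binding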
+
+---
+
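+## 1. 8 Hz Sampling Framework
+
+### 1.1 Basic Principle
+
+```
+Video FPS: ~30
+Sample Interval: round(fps / 8) = 4
+Sample Frames: 0, 4, 8, 12, 16, ...
+```
+
+Because the interval is rounded to a whole number of frames, a 30 fps video is effectively sampled at 30 / 4 = 7.5 Hz.
+
+| Video length | Total frames | 8 Hz samples |
+|--------------|--------------|--------------|
+| 5 min | 9,000 | ~2,250 |
+| 10 min | 18,000 | ~4,500 |
+| 30 min | 54,000 | ~13,500 |
+
+### 1.2 On-Demand Refinement
+
+```
+Layer 1: 8 Hz base (all processors)
+    ↓
+Layer 2: refinement (triggered by specific features)
+
+Refinement scenarios:
+  - Blink confirmation: the 8 Hz pass sees a sudden drop in eye openness → go back and fetch ±4 frames around it (30 Hz)
+  - Lip-sync: time ranges covered by a sentence chunk → 16 Hz
+  - Mutual gaze: two people's gaze directions converge → confirm with ±2 frames around it (30 Hz)
+```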
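+
+As a rough illustration of the two layers, the sketch below builds the base grid and then adds a ±4-frame window around a blink candidate. The function names and the example values are hypothetical; the rounding uses `int(x + 0.5)` so that halves round up the way Rust's `f64::round` does for positive values.
+
+```python
+def sample_frames(total_frames, fps, target_hz=8.0):
+    """Base grid: every round(fps / target_hz)-th frame, starting at frame 0."""
+    interval = max(1, int(fps / target_hz + 0.5))   # round half up, never below 1
+    return list(range(0, total_frames, interval))
+
+def refine_window(center_frame, radius, total_frames):
+    """All native frames within +/- radius of center_frame, clamped to the clip."""
+    start = max(0, center_frame - radius)
+    end = min(total_frames - 1, center_frame + radius)
+    return set(range(start, end + 1))
+
+# A 2-second clip at 30 fps with a blink candidate spotted at sample frame 20.
+base = sample_frames(total_frames=60, fps=30.0)
+refined = sorted(set(base) | refine_window(center_frame=20, radius=4, total_frames=60))
+print(refined)
+```
+
+Frames 16 through 24 are now covered at the native frame rate, while the rest of the clip stays on the base grid.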
+
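+### 1.3 Sample Frame Computation
+
+```rust
+// worker/processor.rs
+use std::collections::HashSet;
+
+/// Base grid: every `round(fps / 8)`-th frame, starting at frame 0.
+fn compute_sample_frames(total_frames: i64, fps: f64) -> Vec<i64> {
+    // Clamp to 1 so very low frame rates (below 4 fps) still advance.
+    let interval = ((fps / 8.0).round() as i64).max(1);
+    (0..total_frames).step_by(interval as usize).collect()
+}
+
+/// Union of the base grid and the refinement frames, sorted ascending.
+fn merge_refine_frames(base: &[i64], refine: &HashSet<i64>) -> Vec<i64> {
+    let mut combined: HashSet<i64> = base.iter().copied().collect();
+    combined.extend(refine.iter().copied());
+    let mut sorted: Vec<i64> = combined.into_iter().collect();
+    sorted.sort_unstable();
+    sorted
+}
+```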
+
+---
+
+## 2. Trace Types
+
+### Key Traces Overview
+
+| # | Trace type | Source | Purpose |
+|---|-----------|------|------|
+| 1 | **face_trace** | face_detections + face.json | Face tracking, identity recognition |
+| 2 | **appearance_trace** | appearance.json | Clothing colors, accessories, skin tone |
+| 3 | **gaze_trace** | face.json (pose_angle + landmarks) | Gaze direction, mutual gaze |
+| 4 | **lip_trace** | face.json (landmarks) | Lip shape, speech synchronization |
+| 5 | **speaker_trace** | asrx.json (speaker diarization) | Speaker identification |
+| 6 | **text_trace** | dev.chunk (sentence chunks) | Text content, semantics |
+| 7 | **skin_tone_trace** | face.json (ROI HSV) | Skin tone classification, lighting record |
+
+---
+
+### 2.1 Face Trace (existing)
+
+```json
+{
+  "node_type": "face_trace",
+  "external_id": "trace_5",
+  "properties": {
+    "frame_count": 200,
+    "start_frame": 150,
+    "end_frame": 350,
+    "avg_bbox": { "x": 500, "y": 300, "width": 200, "height": 250 },
+    "avg_yaw": -0.15,
+    "avg_pitch": -0.08,
+    "avg_roll": -0.20,
+    "pose_count": 180,
+    "embedding": [...],
+    "skin_tone": {
+      "face_h_mean": 18.5,
+      "fitzpatrick": "Type IV - Medium",
+      "confidence": 0.82,
+      "lighting": {
+        "brightness": 0.65,
+        "color_temp": "warm",
+        "direction": "front",
+        "uniformity": 0.92,
+        "source": "indoor",
+        "quality": "good"
+      },
+      "sample_frames": 156
+    }
+  }
+}
+```
+
+### 2.2 Appearance Trace (new)
+
+**Binding strategy**: match each appearance person to a face detection by IoU and inherit that face's trace_id
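+
+A minimal sketch of IoU-based binding, assuming boxes are dicts with `x`, `y`, `width`, `height` as in `avg_bbox`. The helper names and the `min_iou` threshold are hypothetical; a face box inside a full-body box yields a small IoU, so the real threshold (or a containment test) would need tuning.
+
+```python
+def iou(a, b):
+    """Intersection-over-union of two boxes given as x, y, width, height dicts."""
+    ax2, ay2 = a["x"] + a["width"], a["y"] + a["height"]
+    bx2, by2 = b["x"] + b["width"], b["y"] + b["height"]
+    iw = max(0, min(ax2, bx2) - max(a["x"], b["x"]))   # overlap width
+    ih = max(0, min(ay2, by2) - max(a["y"], b["y"]))   # overlap height
+    inter = iw * ih
+    union = a["width"] * a["height"] + b["width"] * b["height"] - inter
+    return inter / union if union > 0 else 0.0
+
+def inherit_trace_id(person_box, faces, min_iou=0.05):
+    """Return the trace_id of the best-overlapping face, or None if none qualifies."""
+    best = max(faces, key=lambda f: iou(person_box, f["bbox"]), default=None)
+    if best is None or iou(person_box, best["bbox"]) < min_iou:
+        return None
+    return best["trace_id"]
+
+person = {"x": 400, "y": 250, "width": 400, "height": 900}
+faces = [
+    {"trace_id": 5, "bbox": {"x": 500, "y": 300, "width": 200, "height": 250}},
+    {"trace_id": 12, "bbox": {"x": 1500, "y": 300, "width": 200, "height": 250}},
+]
+print(inherit_trace_id(person, faces))
+```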
+
+```json
+{
+  "node_type": "appearance_trace",
+  "external_id": "trace_5",
+  "properties": {
+    "trace_id": 5,
+    "frame_count": 400,
+    "start_frame": 100,
+    "end_frame": 500,
+    "face_overlap_frames": 200,
+    "confidence": 0.50,
+    "color_features": {
+      "dominant_colors": [[0.1, 0.6, 0.8], ...],
+      "upper_body_hsv": [[...], [...], [...]],
+      "lower_body_hsv": [[...], [...], [...]]
+    },
+    "accessories": {
+      "head": {
+        "hat": {"detected": true, "confidence": 0.82, "first_frame": 0},
+        "glasses": {"detected": true, "confidence": 0.67, "first_frame": 0},
+        "earrings": {"detected": false},
+        "mask": {"detected": false},
+        "hairstyle": {"type": "long", "confidence": 0.75},
+        "hair_accessory": {"detected": false},
+        "nose_ring": {"detected": false},
+        "lip_ring": {"detected": false},
+        "face_tattoo": {"detected": false},
+        "eyebrow_tattoo": {"detected": false},
+        "beard": {"detected": true, "confidence": 0.88},
+        "headscarf": {"detected": false}
+      },
+      "neck": {
+        "tie": {"detected": true, "confidence": 0.92, "first_frame": 0, "source": "hsv_color_block"},
+        "scarf": {"detected": false},
+        "shawl": {"detected": false},
+        "necklace": {"detected": true, "confidence": 0.71, "first_frame": 12, "source": "clip"},
+        "neck_tattoo": {"detected": false}
+      },
+      "hand": {
+        "ring": {"detected": false},
+        "bracelet": {"detected": false},
+        "watch": {"detected": true, "confidence": 0.63, "first_frame": 24},
+        "gloves": {"detected": false}
+      },
+      "hand_held": {
+        "phone": {"detected": true, "confidence": 0.88, "source": "hsv_color_block"},
+        "pen": {"detected": false},
+        "cup": {"detected": false},
+        "knife": {"detected": false},
+        "gun": {"detected": false}
+      },
+      "foot": {
+        "shoes": {"type": "sneaker", "confidence": 0.78, "source": "hsv_color_block"},
+        "socks": {"detected": false},
+        "barefoot": {"detected": false}
+      },
+      "vehicle": {
+        "bicycle": {"detected": false, "source": "hsv_color_block"},
+        "skateboard": {"detected": false},
+        "scooter": {"detected": false}
+      },
+      "carried": {
+        "backpack": {"detected": false},
+        "handbag": {"detected": true, "confidence": 0.85, "source": "hsv_color_block"},
+        "luggage": {"detected": false}
+      }
+    }
+  }
+}
+```
+
+### 2.3 Speaker Trace (important)
+
+**Source**: ASRX speaker diarization + face trace binding
+
+```json
+{
+  "node_type": "speaker_trace",
+  "external_id": "SPEAKER_0",
+  "properties": {
+    "speaker_id": "SPEAKER_0",
+    "segment_count": 45,
+    "total_duration": 120.5,
+    "first_appearance": {"frame": 100, "time": 3.3},
+    "last_appearance": {"frame": 3600, "time": 120.0},
+    "full_text": "大家好 今天我們來討論... (full speech-to-text transcript)",
+    "segments": [
+      {"start_time": 0.1, "end_time": 2.0, "text": "大家好", "start_frame": 3, "end_frame": 60},
+      {"start_time": 5.2, "end_time": 8.5, "text": "今天我們來討論", "start_frame": 156, "end_frame": 255},
+      ...
+    ],
+    "face_trace_ids": [5, 12, 23],
+    "appearance_trace_ids": [5, 12],
+    "gaze_context": {
+      "looking_at_person": true,
+      "mutual_gaze_with": [12]
+    },
+    "lip_sync_quality": 0.85
+  }
+}
+```
+
+**Source data**:
+```
+ASRX → asrx.json (segments with speaker_id)
+Face → face_detections (trace_id)
+Binding → SPEAKS_AS edge (speaker ↔ face_trace)
+```
+
+### 2.4 Text Trace (important)
+
+**Source**: dev.chunk (chunk_type='sentence') + ASRX text
+
+```json
+{
+  "node_type": "text_trace",
+  "external_id": "chunk_1",
+  "properties": {
+    "chunk_id": "chunk_1",
+    "text": "大家好，今天我們來討論這個話題",
+    "text_normalized": "大家好，今天我們來討論這個話題",
+    "start_time": 0.1,
+    "end_time": 5.2,
+    "start_frame": 3,
+    "end_frame": 156,
+    "speaker_id": "SPEAKER_0",
+    "language": "zh",
+    "confidence": 0.95,
+    "yolo_objects": ["person", "chair"],
+    "face_ids": ["face_100"],
+    "speaker_trace_id": "SPEAKER_0",
+    "face_trace_id": 5,
+    "lip_sync": {
+      "matched_frames": 120,
+      "total_frames": 153,
+      "quality": 0.85
+    },
+    "semantic_embedding": [0.12, -0.34, ...],
+    "sentiment": "neutral"
+  }
+}
+```
+
+**Source data**:
+```
+Rule 1 → dev.chunk (sentence chunks)
+ASRX → asrx.json (speaker_id binding)
+Face → face_detections (face_ids in chunk metadata)
+YOLO → yolo.json (co-occurring objects)
+```
+
+**Edge connections**:
+- `SPEAKS_BY`: text_trace → speaker_trace
+- `SPOKEN_WHILE`: text_trace → face_trace
+- `LIP_SYNC`: lip_trace → text_trace
+- `CONTAINS_OBJECT`: text_trace → object
+
+### 2.5 Skin Tone Trace (important)
+
+**Source**: face.json ROI HSV + lighting analysis
+
+```json
+{
+  "node_type": "skin_tone_trace",
+  "external_id": "trace_5",
+  "properties": {
+    "trace_id": 5,
+    "frame_count": 200,
+    "start_frame": 150,
+    "end_frame": 350,
+    "face_h_mean": 18.5,
+    "fitzpatrick": "Type IV - Medium",
+    "confidence": 0.82,
+    "lighting": {
+      "brightness": 0.65,
+      "color_temp": "warm",
+      "direction": "front",
+      "uniformity": 0.92,
+      "source": "indoor",
+      "quality": "good"
+    },
+    "sample_frames": 156,
+    "hand_h_mean": 17.8,
+    "arm_h_mean": 18.2
+  }
+}
+```
+
+**Fitzpatrick classification**:
+
+| Type | Description | H value (HSV) |
+|------|------|------------|
+| I | Very light | 0–5 |
+| II | Light | 5–12 |
+| III | Light-medium | 12–18 |
+| IV | Medium | 18–25 |
+| V | Dark | 25–35 |
+| VI | Very dark | 35+ |
+
+**Lighting quality**:
+
+| Quality | Condition | Skin tone confidence |
+|---------|------|------------|
+| good | brightness > 0.4, uniformity > 0.8, front light | High (×1.0) |
+| fair | brightness > 0.3, uniformity > 0.6 | Medium (×0.7) |
+| poor | brightness < 0.3 or backlight | Low (×0.5) |
+
+### 2.6 Gaze Trace (new)
+
+```json
+{
+  "node_type": "gaze_trace",
+  "external_id": "trace_5",
+  "properties": {
+    "trace_id": 5,
+    "frame_count": 200,
+    "start_frame": 150,
+    "end_frame": 350,
+    "avg_yaw": -0.15,
+    "avg_pitch": -0.08,
+    "avg_roll": -0.20,
+    "head_direction": "frontal",
+    "gaze_direction": "center-left",
+    "eye_openness": 0.85,
+    "blink_count": 12,
+    "blink_rate": 0.06,
+    "looking_at_person": true,
+    "looking_at_object": ["chair"],
+    "refined_ranges": [
+      {"start_frame": 200, "end_frame": 220, "hz": 30, "reason": "mutual_gaze"}
+    ]
+  }
+}
+```
+
+### 2.7 Lip Trace (important)
+
+**Source**: face.json → faces[].lips (inner_lips 6pts + outer_lips 14pts)
+
+```json
+{
+  "node_type": "lip_trace",
+  "external_id": "trace_5",
+  "properties": {
+    "trace_id": 5,
+    "frame_count": 180,
+    "start_frame": 160,
+    "end_frame": 340,
+    "avg_openness": 0.3,
+    "avg_width": 45.2,
+    "avg_height": 12.8,
+    "movement_variance": 0.15,
+    "speaking_frames": 95,
+    "silent_frames": 85,
+    "lip_landmark_samples": {
+      "inner_lips": [[x,y,z], ...],
+      "outer_lips": [[x,y,z], ...]
+    },
+    "speech_correlation": {
+      "text_trace_ids": ["chunk_1", "chunk_2", "chunk_3"],
+      "sync_quality": 0.85,
+      "matched_segments": [
+        {"start_frame": 160, "end_frame": 200, "text": "大家好"},
+        {"start_frame": 210, "end_frame": 250, "text": "今天我們來討論"}
+      ]
+    },
+    "refined_ranges": [
+      {"start_frame": 160, "end_frame": 340, "hz": 30, "reason": "lip_sync"}
+    ]
+  }
+}
+```
+
+**Lip-sync computation**:
+
+```
+Lip openness = inner_lips_area / outer_lips_area
+
+Speaking detection:
+  - openness > threshold (dynamically adjusted)
+  - movement_variance > threshold (lip shape change)
+  - sustained for at least N frames (to reject noise)
+
+Sync with text:
+  - compare against text_trace start/end_time
+  - compute the overlap ratio between lip movement and the text time range
+  - quality = matched_frames / total_text_frames
+```
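+
+One way to read the two ratios above, as a sketch: lip areas are computed with the shoelace formula over the landmark polygons (using only x and y, which is an assumption since the landmarks also carry z), and sync quality is the share of the text's sampled frames in which the lips were classified as speaking. All function names here are hypothetical.
+
+```python
+def polygon_area(points):
+    """Shoelace area of a closed 2D polygon given as ordered (x, y) vertices."""
+    total = 0.0
+    for i, (x1, y1) in enumerate(points):
+        x2, y2 = points[(i + 1) % len(points)]   # wrap around to close the polygon
+        total += x1 * y2 - x2 * y1
+    return abs(total) / 2.0
+
+def lip_openness(inner_lips, outer_lips):
+    """inner_lips_area / outer_lips_area, or 0.0 for a degenerate outer contour."""
+    outer = polygon_area(outer_lips)
+    return polygon_area(inner_lips) / outer if outer > 0 else 0.0
+
+def sync_quality(speaking_frames, text_frames):
+    """matched_frames / total_text_frames over the sampled frames of one text chunk."""
+    text = set(text_frames)
+    if not text:
+        return 0.0
+    return len(text & set(speaking_frames)) / len(text)
+
+# Toy contours: rectangles stand in for the 6-point and 14-point lip polygons.
+inner = [(1, 1), (3, 1), (3, 2), (1, 2)]
+outer = [(0, 0), (4, 0), (4, 3), (0, 3)]
+print(lip_openness(inner, outer))
+print(sync_quality(speaking_frames=[8, 12, 16, 40], text_frames=[4, 8, 12, 16]))
+```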
+
+**Edge connections**:
+- `HAS_LIP`: face_trace → lip_trace
+- `LIP_SYNC`: lip_trace → text_trace
+- `GAZE_SYNC_SPEECH`: gaze_trace + lip_trace (gaze direction while speaking)
+
+---
+
+## 3. Accessory Detection
+
+### 3.1 Division of Detection Methods
+
+| Method | Accessories covered | Speed | Notes |
+|------|----------|------|------|
+| **HSV color blocks** | tie, phone, watch, ring, bracelet, glasses, mask, hat, shoes, backpack, handbag, umbrella, pen, knife, cup, book, laptop, remote, baseball_bat | Fast | **Primary method**: analyzes off-color regions within the person crop |
+| **CLIP** | hairstyle, beard, face_tattoo, eyebrow_tattoo, earrings, nose_ring, lip_ring, neck_tattoo, headscarf, scarf, shawl, necklace, gloves, tool, gun, skateboard, scooter, roller_skates, socks, barefoot | Medium | Zero-shot; used when YOLO is unreliable and color blocks are hard to tell apart |
+| **MediaPipe** | gesture, arm_pose | Fast | 21 hand pts + 33 pose pts |
+| **HSV** | upper_body_color, lower_body_color, skin_tone | Fast | Color feature extraction |
+
+### 3.2 Appearance Tightly Coupled to Landmarks/Pose
+
+**Core principle**: Appearance does not detect bboxes on its own; it crops ROIs directly from the geometry produced by face/pose/MediaPipe.
+
+```
+Face Landmarks (20pts) ──► face ROI ───► hat, glasses, mask, beard, earrings
+Pose 33 Keypoints ───────► body ROI ───► tie, necklace, upper/lower body HSV
+MediaPipe Hands (21×2) ──► wrist ROI ──► watch, bracelet, ring, phone, glove
+MediaPipe Pose Feet ─────► foot ROI ───► shoes, socks, barefoot
+```
+
+**ROI localization**:
+
+```python
+def get_accessory_rois(frame, face_data, pose_data, hand_data):
+    rois = {}
+
+    # Face region: face bbox + landmarks
+    face_bbox = face_data['bbox']
+    landmarks = face_data['landmarks']  # nose, left_eye, right_eye
+
+    # Hat ROI: extend upward from the face bbox
+    rois['hat'] = expand_region(face_bbox, direction='up', factor=0.5)
+
+    # Glasses ROI: horizontal band around the eye landmarks
+    left_eye = landmarks['left_eye']
+    right_eye = landmarks['right_eye']
+    rois['glasses'] = bbox_around_points(left_eye, right_eye, padding=10)
+
+    # Mask ROI: from below the nose down to the jaw
+    nose = landmarks['nose']
+    rois['mask'] = region_below_point(nose, face_bbox.bottom)
+
+    # Neck ROI: from the pose neck keypoints
+    if pose_data:
+        neck = pose_data['keypoints']['neck']
+        nose = pose_data['keypoints']['nose']
+        rois['neck'] = region_between(nose, neck, width=80)
+
+    # Wrist ROI: from the MediaPipe hand landmarks
+    if hand_data:
+        for side in ['left', 'right']:
+            wrist = hand_data[side]['wrist']
+            rois[f'{side}_wrist'] = circle_around(wrist, radius=30)
+
+    # Foot ROI: from the pose ankle/toe keypoints
+    if pose_data:
+        for side in ['left', 'right']:
+            ankle = pose_data['keypoints'][f'{side}_ankle']
+            toe = pose_data['keypoints'][f'{side}_toe']
+            rois[f'{side}_foot'] = bbox_around_points(ankle, toe, padding=20)
+
+    return rois
+```
+
+### 3.3 HSV Color-Block Detection Flow
+
+```python
+def detect_accessories_tightly_coupled(frame, face_data, pose_data, hand_data,
+                                       main_colors, current_frame):
+    """main_colors: dominant colors of the person crop (assumed precomputed).
+    current_frame: index of the frame being processed."""
+    # 1. Locate each ROI precisely from landmarks/pose
+    rois = get_accessory_rois(frame, face_data, pose_data, hand_data)
+
+    results = {}
+    for roi_name, roi_bbox in rois.items():
+        roi_hsv = crop_and_convert(frame, roi_bbox, 'HSV')
+
+        # 2. Find off-color regions inside the precise ROI
+        diff_mask = compute_color_diff(roi_hsv, main_colors, threshold=30)
+        blobs = find_connected_components(diff_mask)
+
+        for blob in blobs:
+            accessory = classify_accessory_by_position(blob, roi_name)
+            if accessory:
+                results[accessory] = {
+                    "detected": True,
+                    "confidence": blob.confidence,
+                    "source": "hsv_color_block",
+                    "roi": roi_name,
+                    "first_frame": current_frame
+                }
+
+    # 3. Items that color blocks cannot separate → CLIP.
+    #    CLIP_PROMPTS (section 3.5) holds one prompt per such item.
+    person_crop = crop_person(frame, face_data['bbox'])
+    for item, prompt in CLIP_PROMPTS.items():
+        confidence = clip_score(person_crop, prompt)
+        if confidence > 0.5:
+            results[item] = {"detected": True, "confidence": confidence, "source": "clip"}
+
+    return results
+```
+
+### 3.4 Dependencies
+
+```
+Face Detection ──► face_detections (trace_id, bbox, embedding)
+                       │
+                       ▼
+Face Landmarks ────► face ROI (hat, glasses, mask, beard)
+                       │
+                       ▼
+Pose 33pts ────────► body ROI (neck, wrist, foot) ──► Appearance HSV
+                       │
+                       ▼
+MediaPipe Hands ───► wrist ROI (watch, bracelet, ring, phone)
+                       │
+                       ▼
+                 TKG appearance_trace
+```
+
+### 3.5 CLIP Prompts (only for accessories that color blocks cannot separate)
+
+```python
+CLIP_PROMPTS = {
+    # Head: items that color blocks cannot judge reliably
+    "hairstyle_short": "a person with short hair",
+    "hairstyle_long": "a person with long hair",
+    "hairstyle_braid": "a person with braided hair",
+    "hairstyle_bun": "a person with hair in a bun",
+    "face_tattoo": "a person with a visible face tattoo or face paint",
+    "eyebrow_tattoo": "a person with tattooed or styled eyebrows",
+    "beard": "a person with a beard or mustache",
+
+    # Ear/nose/lip piercings
+    "earrings": "a person wearing earrings",
+    "nose_ring": "a person wearing a nose ring or nose piercing",
+    "lip_ring": "a person wearing a lip ring or lip piercing",
+
+    # Neck: necklaces and other small items
+    "necklace": "a person wearing a necklace",
+    "neck_tattoo": "a person with a visible neck tattoo",
+
+    # Small hand items
+    "gloves": "a person wearing gloves",
+    "tool": "a person holding a tool like a wrench or screwdriver",
+    "gun": "a person holding a gun",
+
+    # Feet
+    "socks": "a person wearing visible socks",
+    "barefoot": "a barefoot person",
+    "roller_skates": "a person wearing roller skates",
+}
+```
+
+---
+
+## 4. Skin Tone + Lighting
+
+### 4.1 Fitzpatrick Classification
+
+| Type | Description | H value (HSV) |
+|------|------|------------|
+| I | Very light | 0–5 |
+| II | Light | 5–12 |
+| III | Light-medium | 12–18 |
+| IV | Medium | 18–25 |
+| V | Dark | 25–35 |
+| VI | Very dark | 35+ |
+
+### 4.2 Lighting Parameters
+
+| Parameter | Computation | Range |
+|------|----------|------|
+| brightness | Mean of the V channel | 0.0–1.0 |
+| color_temp | White-balance estimate | warm/neutral/cool |
+| direction | Shadow gradient + yaw/pitch | front/side/back/top |
+| uniformity | Standard deviation of V across face regions | 0.0–1.0 |
+| source | Combined judgment from brightness + color temperature | indoor/outdoor/flash |
+
+### 4.3 Lighting Quality
+
+| Quality | Condition | Skin tone confidence |
+|---------|------|------------|
+| good | brightness > 0.4, uniformity > 0.8, front light | High (×1.0) |
+| fair | brightness > 0.3, uniformity > 0.6 | Medium (×0.7) |
+| poor | brightness < 0.3 or backlight | Low (×0.5) |
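+
+The two lookup tables in this section could be applied roughly as below. Several details are assumptions rather than part of the design: the H ranges are treated as half-open (so H = 5 falls in Type II), inputs that match no row of the quality table fall back to `poor`, and the multiplier is applied directly to the raw skin-tone confidence.
+
+```python
+# Upper bounds (exclusive) of the H ranges for Types I to V; anything above is Type VI.
+FITZPATRICK_BOUNDS = [(5, "I"), (12, "II"), (18, "III"), (25, "IV"), (35, "V")]
+
+def fitzpatrick_type(h_mean):
+    """Map a mean hue value to a Fitzpatrick type using half-open ranges."""
+    for upper, label in FITZPATRICK_BOUNDS:
+        if h_mean < upper:
+            return label
+    return "VI"
+
+def lighting_quality(brightness, uniformity, direction):
+    """Return (quality, confidence multiplier) following the quality table."""
+    if brightness < 0.3 or direction == "back":
+        return "poor", 0.5
+    if brightness > 0.4 and uniformity > 0.8 and direction == "front":
+        return "good", 1.0
+    if brightness > 0.3 and uniformity > 0.6:
+        return "fair", 0.7
+    return "poor", 0.5   # assumption: unmatched inputs are treated as poor
+
+# Values taken from the skin_tone_trace example earlier in this document.
+quality, multiplier = lighting_quality(brightness=0.65, uniformity=0.92, direction="front")
+print(fitzpatrick_type(18.5), quality, 0.82 * multiplier)
+```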
+
+---
+
+## 5. TKG Node Types
+
+| node_type | external_id | Source | Importance | Properties |
+|-----------|-------------|------|--------|------|
+| `face_trace` | `trace_N` | face_detections | ★★★★ | frame_count, bbox, pose, embedding, skin_tone |
+| `appearance_trace` | `trace_N` | appearance.json | ★★★★ | trace_id, color_features, accessories, confidence |
+| `gaze_trace` | `trace_N` | face.json (pose_angle) | ★★★ | trace_id, gaze_direction, blink_count, looking_at |
+| `lip_trace` | `trace_N` | face.json (lips) | ★★★★ | trace_id, avg_openness, speaking_frames, speech_correlation |
+| `speaker_trace` | `SPEAKER_N` | asrx.json | ★★★★ | speaker_id, segments, face_trace_ids, full_text |
+| `text_trace` | `chunk_N` | dev.chunk | ★★★★ | text, speaker_id, time_range, yolo_objects, lip_sync |
+| `skin_tone_trace` | `trace_N` | face.json (ROI HSV) | ★★★ | trace_id, fitzpatrick, lighting, confidence |
+| `object` | `class_name` | yolo.json | ★★ | total_detections, frames |
+| `accessory` | `hat`, `glasses`, ... | appearance.json | ★★ | category, trace_ids, first/last_seen |
+
+---
+
+## 6. TKG Edge Types
+
+| Edge Type | Source → Target | Properties | Description |
+|-----------|----------------|------|------|
+| `SPEAKS_AS` | speaker_trace → face_trace | confidence, overlap_frames | Binds a speaker to a face |
+| `SPEAKS_BY` | text_trace → speaker_trace | - | Who spoke the text |
+| `SPOKEN_WHILE` | text_trace → face_trace | frame_overlap | Face visible while the text is spoken |
+| `HAS_APPEARANCE` | face_trace → appearance_trace | confidence, overlap_frames | Appearance features |
+| `HAS_GAZE` | face_trace → gaze_trace | overlap_frames | Gaze direction |
+| `HAS_LIP` | face_trace → lip_trace | overlap_frames | Lip shape data |
+| `HAS_SKIN_TONE` | face_trace → skin_tone_trace | confidence, lighting_match | Skin tone record |
+| `LIP_SYNC` | lip_trace → text_trace | time_alignment, openness_match | Lip-sync |
+| `WEARS` | appearance_trace → accessory | confidence, first_frame | Accessory worn |
+| `LOOKING_AT` | gaze_trace → object | direction_match, distance | Looking at an object |
+| `LOOKING_AT_PERSON` | gaze_trace → face_trace | direction_match | Looking at another person |
+| `MUTUAL_GAZE` | face_trace ↔ face_trace | first_frame, last_frame, duration_frames, confidence | Mutual gaze |
+| `CO_OCCURS_WITH` | object ↔ object | frame_count | Object co-occurrence |
+| `SAME_SKIN_TONE` | face_trace ↔ face_trace | h_diff, lighting_match, confidence | Similar skin tone |
+| `HOLDS` | appearance_trace → object | - | Hand-held items such as phones |
+
+---
+
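+## 7. Mutual Gaze Analysis
+
+### 7.1 Computation Logic
+
+```
+For each frame:
+  For each pair (person_A, person_B):
+    1. Compute A's gaze vector (from yaw/pitch/roll)
+    2. Compute the position of B's bbox center in A's coordinate frame
+    3. Check whether B lies inside A's gaze cone (threshold: ~15°)
+    4. Run the reverse check B → A
+    5. Hit in both directions → mutual_gaze
+```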
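+
+A geometric sketch of the cone test, under simplifying assumptions: each person's position and gaze direction are already available as 3D vectors in a shared coordinate frame (how yaw/pitch/roll and bbox centers are converted into that frame is not specified here), and the 15° threshold is treated as the cone's half-angle. Function names are hypothetical.
+
+```python
+import math
+
+def angle_between_deg(u, v):
+    """Angle in degrees between two 3D vectors; 180 if either has zero length."""
+    dot = sum(a * b for a, b in zip(u, v))
+    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
+    if norm == 0:
+        return 180.0
+    # Clamp to [-1, 1] to guard acos against floating-point drift.
+    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
+
+def in_gaze_cone(origin, gaze_dir, target, half_angle_deg=15.0):
+    """True if target lies within half_angle_deg of the gaze direction from origin."""
+    to_target = tuple(t - o for t, o in zip(target, origin))
+    return angle_between_deg(gaze_dir, to_target) <= half_angle_deg
+
+def is_mutual_gaze(pos_a, gaze_a, pos_b, gaze_b, half_angle_deg=15.0):
+    """Both directions must hit: A looks at B and B looks at A."""
+    return (in_gaze_cone(pos_a, gaze_a, pos_b, half_angle_deg)
+            and in_gaze_cone(pos_b, gaze_b, pos_a, half_angle_deg))
+
+# A looks along +x, B stands slightly off-axis and looks back along -x.
+print(is_mutual_gaze((0, 0, 0), (1, 0, 0), (10, 1, 0), (-1, 0, 0)))
+```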
+
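+### 7.2 Persistence Confirmation
+
+```
+mutual_gaze must persist for at least N frames to be meaningful:
+  - Base: 8 Hz, sustained ≥ 3 frames (~0.375 s) → create the edge
+  - Refinement: after finding a candidate, go back and confirm at 30 Hz
+  - confidence = consecutive frames / total possible frames
+```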
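+
+The persistence rule can be sketched as a run-length scan over per-sample hits. This assumes `hits` holds one boolean per 8 Hz sample in which both traces are visible, and it reads "total possible frames" as the length of that list; both are interpretations, and the function names are hypothetical.
+
+```python
+def mutual_gaze_runs(hits, min_samples=3):
+    """Return (start, end) sample-index pairs of runs with at least min_samples hits."""
+    runs, start = [], None
+    for i, hit in enumerate(hits):
+        if hit and start is None:
+            start = i                       # a new run begins
+        elif not hit and start is not None:
+            if i - start >= min_samples:    # run ended; keep it if long enough
+                runs.append((start, i - 1))
+            start = None
+    if start is not None and len(hits) - start >= min_samples:
+        runs.append((start, len(hits) - 1))  # run reaches the end of the list
+    return runs
+
+def run_confidence(run, hits):
+    """Consecutive samples in the run divided by all samples considered."""
+    start, end = run
+    return (end - start + 1) / len(hits)
+
+hits = [False, True, True, True, True, False, True, True, False, False]
+runs = mutual_gaze_runs(hits)
+print(runs, [run_confidence(r, hits) for r in runs])
+```
+
+The two-sample run at indices 6 and 7 is dropped because it is shorter than the three-sample minimum.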
+
+### 7.3 Edge Properties
+
+```json
+{
+  "edge_type": "MUTUAL_GAZE",
+  "source": "trace_5",
+  "target": "trace_12",
+  "properties": {
+    "first_frame": 150,
+    "last_frame": 280,
+    "duration_frames": 130,
+    "duration_seconds": 4.3,
+    "confidence": 0.85,
+    "context": "during_conversation"
+  }
+}
+```
+
+---
+
+## 8. Implementation Plan
+
+### Phase 0: 8 Hz Sampling Framework (~100 lines)
+
+| File | Change |
+|------|------|
+| `worker/processor.rs` | Compute 8 Hz sample frames + refinement framework |
+| `scripts/face_processor.py` | Accept a `--frames` argument |
+| `scripts/appearance_processor.py` | Switch bbox source to YOLO; accept `--frames` |
+| `scripts/mediapipe_holistic_processor.py` | Accept `--frames` |
+
+### Phase 1: Gaze + Mutual Gaze (~250 lines)
+
+| Module | Lines |
+|------|------|
+| Gaze trace nodes | 150 |
+| Mutual Gaze edges | 100 |
+
+### Phase 2: Lip + Sentence + Speaker (~260 lines)
+
+| Module | Lines |
+|------|------|
+| Lip trace nodes | 120 |
+| Sentence nodes | 80 |
+| Speaker enhancements | 60 |
+
+### Phase 3: Appearance + Accessories (~280 lines)
+
+| Module | Lines |
+|------|------|
+| Appearance traces (HSV + trace_id binding) | 120 |
+| Accessories (CLIP detection) | 80 |
+| Skin tone + lighting | 80 |
+
+### Phase 4: TKG Integration (~110 lines)
+
+| Module | Lines |
+|------|------|
+| `build_tkg()` unified entry point | 40 |
+| Edge builder updates | 70 |
+
+### Total: ~1,000 lines
+
+---
+
+## 9. Dependency Graph
+
+```
+YOLO (global) ───────────────────────────────────────────┐
+    │                                                    │
+    ▼                                                    │
+Face (8Hz) ──► trace_id ──┬──► Appearance (IoU binding)  │
+    │                     │    ├──► HSV colors           │
+    │                     │    ├──► Accessories (CLIP)   │
+    │                     │    └──► Skin tone + light    │
+    │                     │                              │
+    │                     ├──► Gaze ──► Mutual Gaze ─────┤
+    │                     │         ──► Looking at YOLO  │
+    │                     │                              │
+    │                     └──► Lip ──► LIP_SYNC ◄────────┤
+    │                                                    │
+ASRX ──► Speaker ──► SPEAKS_AS ──► face_trace            │
+    │                      │                             │
+    └──► Text (Rule 1) ────┴──► SPEAKS_BY                │
+                             ├──► SPOKEN_WHILE           │
+                             └──► LIP_SYNC ──────────────┘
+
+All traces ──────────────────────────► TKG
+```
+
+---
+
+## Appendix A: Full Accessory List (49 items)
+
+| Body part | Accessories | Detection method |
+|------|------|----------|
+| Head (12) | hat, hairstyle, hair_accessory, earrings, nose_ring, lip_ring, face_tattoo, eyebrow_tattoo, glasses, mask, beard, headscarf | HSV color blocks + CLIP |
+| Neck (5) | tie, scarf, shawl, necklace, neck_tattoo | HSV color blocks + CLIP |
+| Hand/arm (16) | ring, bracelet, watch, gloves, phone, pen, laptop, book, cup, remote, tool, knife, gun, baseball_bat, gesture, arm_pose | HSV color blocks + CLIP + MP |
+| Foot/vehicle (8) | shoes, socks, barefoot, skateboard, scooter, bicycle, motorbike, roller_skates | HSV color blocks + CLIP |
+| Carried/environment (5) | backpack, handbag, luggage, chair, diningtable | HSV color blocks + CLIP |
+| Color (3) | upper_body_hsv, lower_body_hsv, skin_tone | HSV |
+
+> **Note**: YOLO is unreliable and is no longer the primary detection method. Most accessories now use HSV color-block analysis; CLIP is used only for items that color blocks cannot easily separate (piercings, tattoos, hairstyles, and so on).
+
+## Appendix B: DB Schema Changes
+
+```sql
+-- appearance_detections (new)
+CREATE TABLE appearance_detections (
+    id BIGSERIAL PRIMARY KEY,
+    file_uuid VARCHAR NOT NULL,
+    frame_number BIGINT NOT NULL,
+    person_id INTEGER NOT NULL,
+    x INTEGER, y INTEGER, width INTEGER, height INTEGER,
+    trace_id INTEGER,
+    confidence REAL,
+    hsv_histogram JSONB,
+    dominant_colors JSONB,
+    upper_body_hsv JSONB,
+    lower_body_hsv JSONB,
+    accessories JSONB,
+    skin_tone JSONB,
+    lighting JSONB,
+    created_at TIMESTAMPTZ DEFAULT NOW()
+);
+
+-- tkg_nodes (extends node_type)
+-- Added: appearance_trace, gaze_trace, lip_trace, sentence, accessory
+
+-- tkg_edges (extends edge_type)
+-- Added: HAS_APPEARANCE, HAS_GAZE, HAS_LIP, WEARS, LOOKING_AT,
+--       LOOKING_AT_PERSON, MUTUAL_GAZE, LIP_SYNC, SPEAKS_BY,
+--       SAME_SKIN_TONE, HAS_NECK_ACCESSORY, HAS_HEAD_ACCESSORY, HOLDS
+```
+
+---
+
+## Version History
+
+| Version | Date | Author | Description |
+|---------|------|--------|-------------|
+| 1.0.0 | 2026-06-19 | OpenCode | Initial design: 8Hz sampling, 7 traces (face/appearance/gaze/lip/speaker/text/skin_tone), 49 accessories, skin tone + lighting, mutual gaze, lip-sync |
+| 1.1.0 | 2026-06-19 | OpenCode | Added speaker_trace, text_trace, skin_tone_trace as important traces; enhanced lip_trace with speech_correlation; updated node/edge tables |
+| **1.2.0** | **2026-06-19** | **OpenCode** | **Implementation complete: build_tkg() integrates all node/edge builders. 9 node types, 14 edge types. ~1500 lines added to tkg.rs** |
--- a/docs_v1.0/DESIGN/TKG_PHASE2_6_EDGES_MIGRATION.md
+++ b/docs_v1.0/DESIGN/TKG_PHASE2_6_EDGES_MIGRATION.md
@@ -0,0 +1,257 @@
+---
+title: TKG Phase 2.6 Edges Migration Plan
+version: 1.0
+date: 2026-06-21
+author: OpenCode
+status: Draft
+---
+
+## Phase 2.6 Overview
+
+Migrate TKG edge construction from PostgreSQL `face_detections` queries to Qdrant payloads.
+
+## Current Implementation Analysis
+
+### 2.6.1: co_occurrence_edges (CO_OCCURS_WITH)
+
+**Current Code** (`tkg.rs:932-1039`):
+```rust
+let face_rows = sqlx::query_as::<_, FaceDetectionRow>(&format!(
+    "SELECT trace_id::bigint, frame_number::bigint, x::float8, y::float8, width::float8, height::float8
+     FROM {} WHERE file_uuid = $1 AND trace_id IS NOT NULL
+     ORDER BY frame_number",
+    face_table
+))
+.bind(file_uuid)
+.fetch_all(pool)
+.await?;
+```
+
+**Dependencies**:
+- `face_detections.trace_id`
+- `face_detections.frame_number`
+- `face_detections.x, y, width, height`
+
+**Migration Strategy**:
+```rust
+// Fetch from the Qdrant payload
+let embeddings = face_db.get_all_embeddings_for_file(file_uuid).await?;
+
+// Group by frame
+let mut frame_map: HashMap<i64, Vec<(i64, f64, f64, f64, f64)>> = HashMap::new();
+for emb in embeddings {
+    let frame = emb.payload.frame_number;
+    let trace_id = emb.payload.trace_id;
+    frame_map.entry(frame).or_default().push((
+        trace_id,
+        emb.payload.bbox_x,
+        emb.payload.bbox_y,
+        emb.payload.bbox_width,
+        emb.payload.bbox_height,
+    ));
+}
+```
+
+### 2.6.2: face_face_edges (MUTUAL_GAZE)
+
+**Current Code** (`tkg.rs:1171-1320`):
+```rust
+let rows: Vec<(i64, i64, i64)> = sqlx::query_as(&format!(
+    "SELECT a.trace_id::bigint AS tid_a, b.trace_id::bigint AS tid_b, a.frame_number::bigint
+     FROM {} a
+     JOIN {} b ON a.file_uuid = b.file_uuid AND a.frame_number = b.frame_number AND a.trace_id < b.trace_id
+     WHERE a.file_uuid = $1 AND a.trace_id IS NOT NULL AND b.trace_id IS NOT NULL",
+    face_table, face_table
+))
+.bind(file_uuid)
+.fetch_all(pool)
+.await?;
+```
+
+**Dependencies**:
+- `face_detections` self-join for co-occurrence
+- `face_detections.trace_id`
+- `face_detections.frame_number`
+
+**Migration Strategy**:
+```rust
+// Fetch all embeddings from Qdrant
+let embeddings = face_db.get_all_embeddings_for_file(file_uuid).await?;
+
+// Group by frame
+let mut frame_faces: HashMap<i64, Vec<FaceEmbeddingPayload>> = HashMap::new();
+for emb in embeddings {
+    frame_faces.entry(emb.payload.frame_number).or_default().push(emb.payload);
+}
+
+// Find face pairs that share a frame
+let mut pairs: Vec<(i64, i64, i64)> = Vec::new();
+for (frame, faces) in &frame_faces {
+    for i in 0..faces.len() {
+        for j in (i + 1)..faces.len() {
+            // Skip same-trace pairs, matching the SQL `a.trace_id < b.trace_id`
+            if faces[i].trace_id == faces[j].trace_id {
+                continue;
+            }
+            let tid_a = faces[i].trace_id.min(faces[j].trace_id);
+            let tid_b = faces[i].trace_id.max(faces[j].trace_id);
+            pairs.push((tid_a, tid_b, *frame));
+        }
+    }
+}
+```
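+
+For checking edge counts against the PostgreSQL version, the pairing rule can be prototyped in a few lines of Python. This sketch assumes detections arrive as `(trace_id, frame_number)` tuples, which is a hypothetical shape for illustration. Note one deliberate difference: it collapses repeated detections of the same trace within a frame, whereas the SQL self-join would emit one row per detection pair.
+
+```python
+from collections import defaultdict
+from itertools import combinations
+
+def face_pairs(detections):
+    """Return (tid_a, tid_b, frame) for every two distinct traces sharing a frame."""
+    by_frame = defaultdict(set)
+    for trace_id, frame in detections:
+        if trace_id is not None:            # mirrors "trace_id IS NOT NULL"
+            by_frame[frame].add(trace_id)   # a set drops repeated trace ids per frame
+    pairs = []
+    for frame in sorted(by_frame):
+        # combinations over the sorted ids guarantees tid_a < tid_b
+        for tid_a, tid_b in combinations(sorted(by_frame[frame]), 2):
+            pairs.append((tid_a, tid_b, frame))
+    return pairs
+
+detections = [(5, 10), (12, 10), (5, 10), (None, 10), (7, 11)]
+print(face_pairs(detections))
+```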
+
+### 2.6.3: speaker_face_edges (SPEAKS_AS)
+
+**Current Code** (`tkg.rs:1045-1169`):
+```rust
+let traces = sqlx::query_as::<_, (i64, i64, i64)>(&format!(
+    "SELECT trace_id::bigint, MIN(frame_number)::bigint as start_f, MAX(frame_number)::bigint as end_f
+     FROM {} WHERE file_uuid = $1 AND trace_id IS NOT NULL
+     GROUP BY trace_id",
+    face_table
+))
+.bind(file_uuid)
+.fetch_all(pool)
+.await?;
+```
+
+**Dependencies**:
+- `face_detections.trace_id`
+- `face_detections.frame_number` (MIN/MAX)
+
+**Migration Strategy**:
+```rust
+// Fetch all embeddings from Qdrant
+let embeddings = face_db.get_all_embeddings_for_file(file_uuid).await?;
+
+// Compute the frame range of each trace_id
+let mut trace_ranges: HashMap<i64, (i64, i64)> = HashMap::new();
+for emb in embeddings {
+    let trace_id = emb.payload.trace_id;
+    let frame = emb.payload.frame_number;
+    let entry = trace_ranges.entry(trace_id).or_insert((frame, frame));
+    entry.0 = entry.0.min(frame);
+    entry.1 = entry.1.max(frame);
+}
+```
+
+### 2.6.4: mutual_gaze_edges (MUTUAL_GAZE)
+
+**Already in face_face_edges**:
+- face_face_edges already contains the mutual_gaze detection logic
+- No separate migration is needed
+
+### 2.6.5: lip_sync_edges (LIP_SYNC)
+
+**Already migrated in Phase 2.5.2**:
+- `build_lip_trace_nodes_from_qdrant()` is complete
+- lip_sync_edges already uses the Qdrant payload
+
+## Migration Priority
+
+| Priority | Edge Type | Complexity | Impact |
+|----------|-----------|-------------|--------|
+| P1 | co_occurrence_edges | Low | High (relationship graph) |
+| P1 | face_face_edges | Medium | High (face relationships) |
+| P2 | speaker_face_edges | Low | Medium (speaker relationships) |
+| N/A | mutual_gaze_edges | - | Included in face_face_edges |
+| N/A | lip_sync_edges | - | Migrated in Phase 2.5.2 |
+
+## Performance Estimate
+
+| Edge Type | Current (PG) | After Migration | Speedup |
+|-----------|--------------|-----------------|---------|
+| co_occurrence_edges | ~120ms | ~30ms | 4x |
+| face_face_edges | ~90ms | ~25ms | 3.6x |
+| speaker_face_edges | ~60ms | ~20ms | 3x |
+| **Total** | **~270ms** | **~75ms** | **3.6x** |
+
+## Implementation Steps
+
+### Step 1: Add helper functions in `face_embedding_db.rs`
+
+```rust
+// Get all embeddings grouped by frame
+pub async fn get_embeddings_by_frame(&self, file_uuid: &str) -> Result<HashMap<i64, Vec<FaceEmbeddingPayload>>>;
+
+// Get trace_id frame ranges
+pub async fn get_trace_frame_ranges(&self, file_uuid: &str) -> Result<HashMap<i64, (i64, i64)>>;
+```
+
+### Step 2: Create migration functions in `tkg.rs`
+
+```rust
+// Phase 2.6.1
+async fn build_co_occurrence_edges_from_qdrant(
+    pool: &PgPool,
+    file_uuid: &str,
+    output_dir: &str,
+    face_db: &FaceEmbeddingDb,
+) -> Result<usize>;
+
+// Phase 2.6.2
+async fn build_face_face_edges_from_qdrant(
+    pool: &PgPool,
+    file_uuid: &str,
+    pose_data: &[FacePose],
+    face_db: &FaceEmbeddingDb,
+) -> Result<usize>;
+
+// Phase 2.6.3
+async fn build_speaker_face_edges_from_qdrant(
+    pool: &PgPool,
+    file_uuid: &str,
+    output_dir: &str,
+    face_db: &FaceEmbeddingDb,
+) -> Result<usize>;
+```
+
+### Step 3: Replace in `build_tkg.rs`
+
+```rust
+// Old
+let e_co = build_co_occurrence_edges(pool, file_uuid, output_dir).await?;
+
+// New
+let e_co = build_co_occurrence_edges_from_qdrant(pool, file_uuid, output_dir, face_db).await?;
+```
+
+### Step 4: Add feature flag (optional)
+
+```rust
+#[cfg(feature = "qdrant-edges")]
+let e_co = build_co_occurrence_edges_from_qdrant(pool, file_uuid, output_dir, face_db).await?;
+#[cfg(not(feature = "qdrant-edges"))]
+let e_co = build_co_occurrence_edges(pool, file_uuid, output_dir).await?;
+```
+
+## Verification Plan
+
+1. Run TKG rebuild on test file
+2. Compare edge counts (PG vs Qdrant)
+3. Verify edge properties match
+4. Performance benchmark
+5. Integration test with Rule2
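+
+Step 2 of the plan (comparing edge counts) can be made mechanical with a small helper. The following is a sketch only: `relative_drift` is a hypothetical name, and whether any non-zero drift is acceptable is a project decision this document does not settle.
+
+```rust
+/// Relative difference between the PostgreSQL and Qdrant edge counts,
+/// expressed as a fraction of the PostgreSQL count.
+pub fn relative_drift(pg_count: usize, qdrant_count: usize) -> f64 {
+    if pg_count == 0 {
+        // No baseline edges: any Qdrant edge counts as unbounded drift.
+        return if qdrant_count == 0 { 0.0 } else { f64::INFINITY };
+    }
+    (pg_count as f64 - qdrant_count as f64).abs() / pg_count as f64
+}
+```
+
+With the counts reported in the commit message, `relative_drift(6701, 6679)` is roughly 0.3% for co_occurrence_edges, and `relative_drift(6, 6)` is exactly zero for face_face_edges.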
+
+## Risks & Mitigations
+
+| Risk | Mitigation |
+|------|------------|
+| Qdrant collection empty | Fallback to PostgreSQL |
+| Performance regression | Benchmark before merge |
+| Edge count mismatch | Validate with test suite |
+| Data inconsistency | Add reconciliation job |
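+
+The first mitigation (fall back to PostgreSQL when the Qdrant collection is empty) could be structured as in the sketch below. It is synchronous for brevity, whereas the real builders are `async`. `EdgeSource` and `build_with_fallback` are hypothetical names, and using `Ok(None)` to signal "no embeddings in Qdrant for this file" is an assumption of this sketch.
+
+```rust
+/// Which backend produced the edges (illustrative only).
+#[derive(Debug, PartialEq)]
+pub enum EdgeSource {
+    Qdrant,
+    Postgres,
+}
+
+/// Try the Qdrant builder first; `Ok(None)` is this sketch's convention for
+/// "no embeddings in Qdrant for this file", which triggers the fallback.
+pub fn build_with_fallback<Q, P, E>(
+    from_qdrant: Q,
+    from_postgres: P,
+) -> Result<(usize, EdgeSource), E>
+where
+    Q: FnOnce() -> Result<Option<usize>, E>,
+    P: FnOnce() -> Result<usize, E>,
+{
+    match from_qdrant()? {
+        Some(count) => Ok((count, EdgeSource::Qdrant)),
+        None => from_postgres().map(|count| (count, EdgeSource::Postgres)),
+    }
+}
+```
+
+Returning the source alongside the count makes it possible to log which path ran, which helps when verifying that the fallback is exercised only for files without Qdrant data.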
+
+## Success Criteria
+
+- [ ] All edges use Qdrant payload (no face_detections queries)
+- [ ] Edge counts match PostgreSQL version
+- [ ] Performance improvement >= 2x
+- [ ] Rule2/Rule3 work correctly
+- [ ] No regressions in existing tests
+
+## Timeline
+
+- Phase 2.6.1 (co_occurrence): 1 day
+- Phase 2.6.2 (face_face): 1 day
+- Phase 2.6.3 (speaker_face): 0.5 day
+- Testing & verification: 0.5 day
+- **Total: 3 days**
+
--- a/docs_v1.0/DESIGN/VideoPlayback_Architecture_V1.0.md
+++ b/docs_v1.0/DESIGN/VideoPlayback_Architecture_V1.0.md
@@ -0,0 +1,374 @@
+---
+document_type: "design"
+service: "MOMENTRY_CORE"
+title: "Video Playback Architecture — Local Direct Serve & Remote Streaming"
+version: "V1.0"
+date: "2026-06-07"
+author: "OpenCode"
+status: "draft"
+tags:
+  - "video-playback"
+  - "caddy"
+  - "streaming"
+  - "thumbnail"
+  - "wordpress-frontend"
+related_documents:
+  - "DESIGN/FILE_LIFECYCLE_V1.0.md"
+---
+
+# Video Playback Architecture — Local Direct Serve & Remote Streaming
+
+| Item | Value |
+|------|-------|
+| Scope | Video file playback & thumbnail serving for WordPress frontend (m5wp) |
+| Status | Draft |
+| Applies to | Search results (`serve_url`), Caddy routing, Momentry media-proxy endpoint |
+| Key concept | Local files served directly by Caddy (zero backend overhead); remote files fall back to Momentry streaming; thumbnails proxied through Caddy to Momentry |
+
+---
+
+## Problem Statement
+
+The WordPress frontend (`m5wp.momentry.ddns.net`) displays search results with video thumbnails and a player. Currently:
+
+- **Thumbnails**: WordPress Code Snippet 61 (`momentry/v1/media` REST route) is inactive → all requests return `rest_no_route` 404
+- **Video playback**: Frontend has no way to construct a playable URL from search results; no `serve_url` exists in the search response
+- **WordPress constraint**: WordPress files and database tables must not be modified (marcom team territory)
+
+The solution must work for two deployment scenarios:
+- **Local**: Video file resides on the same server as Momentry → serve via static HTTP (zero processing overhead)
+- **Remote**: Video file resides on external storage (NAS, S3, etc.) → fall back to Momentry's ffmpeg-based streaming
+
+---
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│  Browser (search-chat @ m5wp.momentry.ddns.net)                 │
+│                                                                  │
+│  ┌──────────┐   ┌──────────────────┐   ┌─────────────────────┐  │
+│  │  Search   │   │  Thumbnail img   │   │  <video src="...">  │  │
+│  └────┬─────┘   └───────┬──────────┘   └──────────┬──────────┘  │
+│       │                 │                          │             │
+└───────┼─────────────────┼──────────────────────────┼─────────────┘
+        │                 │                          │
+        ▼                 ▼                          ▼
+┌───────────────────────────────────────────────────────────────┐
+│                     Caddy (m5wp block)                         │
+│                                                               │
+│  ┌─────────────────────────────────────────────────────────┐  │
+│  │  handle /wp-json/momentry/v1/media {                    │  │
+│  │    rewrite * /api/v1/media-proxy{?}                     │  │
+│  │    reverse_proxy localhost:3002  (+ X-API-Key)          │  │
+│  │  }                                                      │  │
+│  │                                                         │  │
+│  │  handle_path /files/* {                                 │  │
+│  │    root * /Users/accusys/momentry/var/sftpgo/data       │  │
+│  │    file_server                                          │  │
+│  │  }                                                      │  │
+│  │                                                         │  │
+│  │  reverse_proxy localhost:9002  ← WordPress (PHP-FPM)    │  │
+│  └─────────────────────────────────────────────────────────┘  │
+└───────────────────────────────────────────────────────────────┘
+        │                 │                          │
+        │                 │                          ▼
+        │                 │              ┌───────────────────────┐
+        │                 │              │  /files/*             │
+        │                 │              │  Local file on disk   │
+        │                 │              │  (zero backend cost)  │
+        │                 │              └───────────────────────┘
+        │                 ▼
+        │     ┌─────────────────────────────────────────┐
+        │     │  Momentry Core (localhost:3002)          │
+        │     │                                         │
+        ▼     ▼  /api/v1/media-proxy                    │
+        ┌─────────────────────────┐                     │
+        │  type=thumbnail&frame=N │──→ face_thumbnail   │
+        │  type=video&start=…    │──→ stream_video      │
+        └─────────────────────────┘                     │
+        ┌─────────────────────────┐                     │
+        │  POST /api/v1/search/*  │──→ smart_search     │
+        │  response: serve_url    │                     │
+        └─────────────────────────┘                     │
+        └───────────────────────────────────────────────┘
+```
+
+---
+
+## Data Flow
+
+### 1. Search → serve_url
+
+```
+Frontend                     Caddy                  Momentry Backend
+   │                           │                        │
+   │ POST /wp-json/.../search  │                        │
+   │ ─────────────────────────→│                        │
+   │                           │ POST /api/v1/search/*  │
+   │                           │ ──────────────────────→│
+   │                           │                        │
+   │                           │ ←─ SearchResult[] ─────│
+   │                           │    (with serve_url +   │
+   │                           │     file_name added)   │
+   │ ←─ JSON response ────────│                        │
+   │    results[0].serve_url = │                        │
+   │    "https://m5wp.momentry.│                        │
+   │     ddns.net/files/demo/  │                        │
+   │     Charade_YouTube_24fps │                        │
+   │     .mp4"                │                        │
+```
+
+#### serve_url Construction
+
+The backend computes `serve_url` from the video's `file_path` (stored in the `videos` table) and two config values:
+
+| Config | Env Var | Default |
+|--------|---------|---------|
+| `STORAGE_ROOT` | `MOMENTRY_STORAGE_ROOT` | `/Users/accusys/momentry/var/sftpgo/data` |
+| `SERVE_BASE_URL` | `MOMENTRY_SERVE_BASE_URL` | `https://m5wp.momentry.ddns.net/files` |
+
+Algorithm:
+
+```
+file_path:   /Users/accusys/momentry/var/sftpgo/data/demo/Charade_YouTube_24fps.mp4
+STORAGE_ROOT /Users/accusys/momentry/var/sftpgo/data
+            ─────────────────────────────────────────────
+relative:   demo/Charade_YouTube_24fps.mp4
+                    ↓ join with SERVE_BASE_URL
+serve_url:  https://m5wp.momentry.ddns.net/files/demo/Charade_YouTube_24fps.mp4
+```
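+
+The path translation above can be sketched in a few lines of Rust. `compute_serve_url` is a hypothetical function name. The sketch assumes Unix-style paths and does not percent-encode the relative path, so file names containing spaces or other reserved characters would need extra handling.
+
+```rust
+use std::path::Path;
+
+/// Translate an on-disk `file_path` into a public URL.
+/// Returns `None` when the file is not under `storage_root` (for example a
+/// remote file), so the caller can omit `serve_url` from the response.
+pub fn compute_serve_url(
+    file_path: &str,
+    storage_root: &str,
+    serve_base_url: &str,
+) -> Option<String> {
+    // strip_prefix compares whole path components, so "/data2/x" does not
+    // match a root of "/data".
+    let relative = Path::new(file_path).strip_prefix(storage_root).ok()?;
+    let relative = relative.to_str()?;
+    if relative.is_empty() {
+        return None;
+    }
+    Some(format!(
+        "{}/{}",
+        serve_base_url.trim_end_matches('/'),
+        relative
+    ))
+}
+```
+
+Returning `Option<String>` lines up with the optional `serve_url` field: a `None` here simply leaves the field unset and the frontend uses the streaming endpoint instead.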
+
+#### SearchResult Additions
+
+```rust
+pub struct SearchResult {
+    // ... existing fields
+    pub file_name: Option<String>,  // e.g. "Charade_YouTube_24fps.mp4"
+    pub serve_url: Option<String>,  // e.g. "https://m5wp.momentry.ddns.net/files/..."
+}
+```
+
+### 2. Video Playback (Local)
+
+```
+Frontend <video>              Caddy (file_server)
+   │                           │
+   │ GET /files/demo/Charade…  │
+   │ ─────────────────────────→│
+   │                           │  root = /Users/accusys/momentry/var/sftpgo/data
+   │                           │  serves /demo/Charade_YouTube_24fps.mp4
+   │                           │
+   │ ←─ 200 video/mp4 ────────│
+   │    (range-request         │
+   │     supported natively)   │
+```
+
+**Characteristics**:
+- Zero CPU cost — pure I/O, no ffmpeg decode
+- HTTP range requests work natively (Caddy `file_server` supports `Accept-Ranges: bytes`)
+- HTML5 `<video>` can seek arbitrarily, play/pause normally
+- Supports MP4 (H.264), WebM, and any browser-playable format
+
+### 3. Video Playback (Remote — Fallback)
+
+```
+Frontend                  Caddy                     Momentry Backend
+   │                       │                            │
+   │ GET /wp-json/.../    │                            │
+   │ media?uuid=X&        │                            │
+   │ type=video&          │                            │
+   │ start_time=S&        │                            │
+   │ end_time=E           │                            │
+   │ ────────────────────→│                            │
+   │                       │ rewrite to                │
+   │                       │ /api/v1/media-proxy{?}    │
+   │                       │                            │
+   │                       │ GET /api/v1/media-proxy?   │
+   │                       │ uuid=X&type=video&...     │
+   │                       │ ─────────────────────────→│
+   │                       │                            │
+   │                       │    stream_video:           │
+   │                       │    ffmpeg -ss S -i file    │
+   │                       │    -t (E-S) -c copy        │
+   │                       │                            │
+   │                       │ ←─ 200 video/mp4 ──────────│
+   │                       │    (chunk data)            │
+   │ ←─ HTTP streaming ───│                            │
+```
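+
+The `stream_video` step in the diagram boils down to building an ffmpeg argument list from `start_time` and `end_time`. A rough sketch follows; `ffmpeg_clip_args` is a hypothetical helper. The `-ss`, `-i`, `-t`, and `-c copy` arguments come from the diagram, while the output arguments (fragmented MP4 written to stdout) are an assumption about how the clip would be piped into the HTTP response.
+
+```rust
+/// Build the ffmpeg argument list for the clip between start and end.
+/// Returns `None` for invalid time ranges.
+pub fn ffmpeg_clip_args(input: &str, start_time: f64, end_time: f64) -> Option<Vec<String>> {
+    let valid = start_time.is_finite()
+        && end_time.is_finite()
+        && start_time >= 0.0
+        && end_time > start_time;
+    if !valid {
+        return None;
+    }
+    let duration = end_time - start_time;
+    Some(vec![
+        "-ss".into(),
+        format!("{start_time:.3}"),
+        "-i".into(),
+        input.into(),
+        "-t".into(),
+        format!("{duration:.3}"),
+        "-c".into(),
+        "copy".into(),
+        // Assumption: the output is piped to stdout, which is not seekable,
+        // so the MP4 is written in fragmented form.
+        "-movflags".into(),
+        "frag_keyframe+empty_moov".into(),
+        "-f".into(),
+        "mp4".into(),
+        "pipe:1".into(),
+    ])
+}
+```
+
+Validating the range before spawning ffmpeg keeps malformed query parameters from turning into a failed child process.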
+
+### 4. Thumbnail
+
+```
+Frontend <img>              Caddy                     Momentry Backend
+   │                          │                            │
+   │ GET /wp-json/.../       │                            │
+   │ media?uuid=X&           │                            │
+   │ type=thumbnail&         │                            │
+   │ frame=N                 │                            │
+   │ ──────────────────────→│                            │
+   │                          │ rewrite to                │
+   │                          │ /api/v1/media-proxy{?}    │
+   │                          │                            │
+   │                          │ /api/v1/media-proxy?      │
+   │                          │ uuid=X&type=thumbnail&    │
+   │                          │ frame=N                   │
+   │                          │ ─────────────────────────→│
+   │                          │                            │
+   │                          │    face_thumbnail:         │
+   │                          │    look up trace_id path   │
+   │                          │    → cached face crop      │
+   │                          │    → validated JPEG        │
+   │                          │                            │
+   │                          │ ←─ 200 image/jpeg ────────│
+   │ ←─ JPEG ───────────────│                            │
+```
+
+**Thumbnail flow detail**:
+1. Caddy intercepts `/wp-json/momentry/v1/media` → rewrites to `/api/v1/media-proxy` keeping query params intact (`{?}`)
+2. Momentry `media_proxy_handler` reads `uuid`, `type=thumbnail`, `frame=N` from query
+3. Dispatches to the internal `face_thumbnail` handler
+4. Returns cached face crop JPEG (or fallback frame extraction result)
+
+---
+
+## Caddyfile Configuration
+
+Addition to the existing `m5wp` block:
+
+```caddy
+m5wp.momentry.ddns.net {
+    tls internal
+
+    # ── Local video files: direct serve, zero backend overhead ──
+    handle_path /files/* {
+        root * /Users/accusys/momentry/var/sftpgo/data
+        file_server
+    }
+
+    # ── Media proxy: thumbnails + remote streaming ──
+    # Bypasses inactive WordPress Code Snippet 61
+    handle /wp-json/momentry/v1/media {
+        rewrite * /api/v1/media-proxy{?}
+        reverse_proxy localhost:3002 {
+            header_up X-API-Key muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69
+        }
+    }
+
+    # ── Existing WordPress (PHP-FPM) ──
+    reverse_proxy localhost:9002
+    import common_log m5wp_access
+}
+```
+
+**Key syntax**:
+- `handle_path /files/*` — strips `/files` prefix, serves from `root` directory
+- `{?}` — Caddy placeholder that preserves the original query string in the rewrite
+- `handle /wp-json/momentry/v1/media` — matches exact path (query params are irrelevant for matching)
+
+---
+
+## Momentry API Changes
+
+### New Endpoint: `GET /api/v1/media-proxy`
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `uuid` | string | yes | file_uuid (accepts `file_uuid` key as alias) |
+| `type` | string | yes | `thumbnail`, `video` (future: `image`, `file`) |
+| `frame` | int | for thumbnail | Frame number to extract |
+| `trace_id` | int | no | Face trace ID for cached crop |
+| `start_time` | float | for video | Start time in seconds |
+| `end_time` | float | for video | End time in seconds |
+| `mode` | string | no | `normal` or `debug` (video) |
+| `audio` | string | no | `on` or `off` (video) |
+
+**Dispatch logic**:
+- `type=thumbnail` → call `face_thumbnail(State, Path(uuid), Query(frame, trace_id, ...))`
+- `type=video` → call `stream_video(State, Path(uuid), Query(params), request)`
+
+The endpoint reuses existing handler implementations via direct axum extractor composition, avoiding code duplication.
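+
+To illustrate the dispatch rules without pulling in axum, the sketch below parses the query parameters into a typed request. `MediaRequest`, `parse_media_request`, and `required` are hypothetical names. The real handler uses axum extractors and also accepts the optional `mode` and `audio` parameters, which are left out here.
+
+```rust
+use std::collections::HashMap;
+
+/// Typed view of a media-proxy query (illustrative subset of parameters).
+#[derive(Debug, PartialEq)]
+pub enum MediaRequest {
+    Thumbnail { uuid: String, frame: i64, trace_id: Option<i64> },
+    Video { uuid: String, start_time: f64, end_time: f64 },
+}
+
+/// Fetch and parse a required query parameter.
+fn required<T: std::str::FromStr>(q: &HashMap<String, String>, key: &str) -> Result<T, String> {
+    q.get(key)
+        .ok_or_else(|| format!("missing {key}"))?
+        .parse::<T>()
+        .map_err(|_| format!("invalid {key}"))
+}
+
+/// Validate the query and decide which handler should serve it.
+pub fn parse_media_request(q: &HashMap<String, String>) -> Result<MediaRequest, String> {
+    // `file_uuid` is accepted as an alias for `uuid`.
+    let uuid = q
+        .get("uuid")
+        .or_else(|| q.get("file_uuid"))
+        .ok_or("missing uuid")?
+        .clone();
+    match q.get("type").map(String::as_str) {
+        Some("thumbnail") => Ok(MediaRequest::Thumbnail {
+            uuid,
+            frame: required(q, "frame")?,
+            // Optional: an unparsable trace_id is treated as absent here.
+            trace_id: q.get("trace_id").and_then(|v| v.parse().ok()),
+        }),
+        Some("video") => Ok(MediaRequest::Video {
+            uuid,
+            start_time: required(q, "start_time")?,
+            end_time: required(q, "end_time")?,
+        }),
+        Some(other) => Err(format!("unsupported type: {other}")),
+        None => Err("missing type".to_string()),
+    }
+}
+```
+
+Rejecting unknown `type` values explicitly leaves room for the future `image` and `file` variants without silently routing them to the wrong handler.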
+
+### Modified Endpoint: `POST /api/v1/search/smart`
+
+**Response changes**: `SearchResult` gains two optional fields:
+
+```json
+{
+  "results": [
+    {
+      "file_uuid": "a6fb22eebefaef17e62af874997c5944",
+      "file_name": "Charade_YouTube_24fps.mp4",
+      "serve_url": "https://m5wp.momentry.ddns.net/files/demo/Charade_YouTube_24fps.mp4",
+      "start_frame": 88649,
+      "start_time": 3697.08,
+      "end_time": 3707.08,
+      "summary": "...",
+      "similarity": 0.85
+    }
+  ]
+}
+```
+
+The `serve_url` is computed after enrichment via a batch query to the `videos` table (`file_uuid → file_path`), then applying the path translation:
+1. Strip `STORAGE_ROOT` prefix from `file_path`
+2. Prepend `SERVE_BASE_URL`
+
+---
+
+## Environment Variables
+
+Add to `.env` (production) and `.env.development`:
+
+```bash
+# Storage root: where video files are stored on disk
+# Used to compute serve_url from file_path
+MOMENTRY_STORAGE_ROOT=/Users/accusys/momentry/var/sftpgo/data
+
+# Public base URL for direct file access via Caddy file_server
+MOMENTRY_SERVE_BASE_URL=https://m5wp.momentry.ddns.net/files
+```
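+
+On the backend side, loading these two variables with the documented defaults might look like the sketch below. `ServeConfig`, `from_lookup`, and `from_env` are hypothetical names, and treating an empty value as unset is an assumption of this sketch.
+
+```rust
+pub const DEFAULT_STORAGE_ROOT: &str = "/Users/accusys/momentry/var/sftpgo/data";
+pub const DEFAULT_SERVE_BASE_URL: &str = "https://m5wp.momentry.ddns.net/files";
+
+/// Settings needed to compute `serve_url`.
+#[derive(Debug, Clone, PartialEq)]
+pub struct ServeConfig {
+    pub storage_root: String,
+    pub serve_base_url: String,
+}
+
+impl ServeConfig {
+    /// Build the config from any key -> value lookup (handy for tests).
+    pub fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> Self {
+        let get = |key: &str, default: &str| {
+            lookup(key)
+                .filter(|v| !v.trim().is_empty())
+                .unwrap_or_else(|| default.to_string())
+        };
+        Self {
+            storage_root: get("MOMENTRY_STORAGE_ROOT", DEFAULT_STORAGE_ROOT),
+            serve_base_url: get("MOMENTRY_SERVE_BASE_URL", DEFAULT_SERVE_BASE_URL),
+        }
+    }
+
+    /// Build the config from the process environment.
+    pub fn from_env() -> Self {
+        Self::from_lookup(|key| std::env::var(key).ok())
+    }
+}
+```
+
+Routing everything through `from_lookup` means tests can supply a closure instead of mutating the process environment.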
+
+---
+
+## Trade-offs & Rationale
+
+| Approach | Pros | Cons |
+|----------|------|------|
+| **Caddy file_server** (local) | Zero CPU, native range requests, no code change to Momentry for serving | Requires storage root config; files must be accessible from Caddy |
+| **Momentry stream_video** (remote) | Works with any storage backend (S3, NAS, NFS) | Spawns an ffmpeg process per request (stream copy via `-c copy`); higher latency and backend load than static serving |
+| **WordPress PHP proxy** (rejected) | No infra change | Fragile, snippet inactive, violates marcom territory |
+| **Direct backend streaming only** (rejected) | Simplest implementation | Unnecessary CPU for local files; 100% backend dependency |
+
+### Fallback Logic (Frontend)
+
+The frontend JavaScript should handle playback as follows:
+
+```javascript
+if (result.serve_url) {
+    // Local file: served directly by Caddy file_server
+    video.src = result.serve_url;
+} else {
+    // Remote file: build the streaming URL; URLSearchParams encodes each value
+    const params = new URLSearchParams({
+        uuid: result.file_uuid,
+        type: 'video',
+        start_time: result.start_time,
+        end_time: result.end_time,
+    });
+    video.src = `/wp-json/momentry/v1/media?${params}`;
+}
+```
+
+This gives the frontend flexibility to pick the optimal playback path based on available data.
+
+---
+
+## Future Considerations
+
+- **S3/NAS remote files**: When video files are stored externally, the `file_path` won't match `STORAGE_ROOT`. The backend can detect this by checking `file_path.starts_with(STORAGE_ROOT)`. If it doesn't match, omit `serve_url` and rely on the streaming fallback.
+- **Pre-signed URLs**: For S3 storage, `serve_url` could be replaced with a pre-signed URL or cloud CDN URL.
+- **Caching**: `file_server` responses are cacheable; consider adding `Cache-Control` headers for thumbnails.
+- **Authentication**: Direct file access currently has no auth. If needed, Caddy can inject auth via `forward_auth` or JWT validation.
+
+---
+
+## Version History
+
+| Version | Date | Author | Changes |
+|---------|------|--------|---------|
+| V1.0 | 2026-06-07 | OpenCode | Initial design — local direct serve + remote streaming + thumbnail proxy architecture |