feat: Rule2 TKG relationship chunks + Phase0-1 Qdrant integration

Phase 0: TKG builder populate face_detections from face.json
- Fix face.json parser for pose_angle format
- Call store_traced_faces.py to set trace_id
- Skip if trace_id already populated

Phase 1: Qdrant face embeddings integration
- Add FaceEmbeddingDb module (src/core/db/face_embedding_db.rs)
- Create dev_face_embeddings collection (dim=512)
- Store 1122 face embeddings with pose metadata
- API: init_collection, batch_upsert, search_similar

Rule2: TKG edges → relationship chunks
- Design: RULE2_TKG_RELATIONSHIP_V1.0.md
- Implementation: rule2_ingest.rs
- ChunkType::Relationship added
- Edge types: SPEAKS_AS, MUTUAL_GAZE, CO_OCCURS_WITH, HAS_APPEARANCE, WEARS
- Auto-trigger on TKG rebuild

API:
- POST /api/v1/file/:file_uuid/rule2 (vectorization)
- POST /api/v1/file/:file_uuid/tkg/rebuild (auto Rule2)

Test: 75 relationship chunks created + vectorized
2026-06-21 00:22:41 +08:00
parent 17e4e15860
commit 3ad6f8740a
10 changed files with 3811 additions and 30 deletions
--- a/docs_v1.0/DESIGN/RULE2_TKG_RELATIONSHIP_V1.0.md
+++ b/docs_v1.0/DESIGN/RULE2_TKG_RELATIONSHIP_V1.0.md
@@ -0,0 +1,235 @@
+---
+title: Rule 2 TKG Relationship Chunks V1.0
+version: 1.0
+date: 2026-06-20
+author: OpenCode
+status: approved
+---
+
+# Rule 2 TKG Relationship Chunks V1.0
+
+| Scope | Status | Applicable to | Binary |
+|-------|--------|---------------|--------|
+| TKG relationship vectorization | Approved | `momentry_playground`, `momentry` | Both |
+
+## Overview
+
+Rule 2 creates **relationship chunks** by converting TKG edges into searchable, vectorized units. Each TKG edge becomes a chunk with an LLM-generated natural-language description, which enables semantic search over relationship queries.
+
+**Key Change:** The original Rule 2 (YOLO frame objects) is deprecated because COCO classes are too generic. The new Rule 2 focuses on TKG relationships.
+
+## Data Flow
+
+```
+┌─────────────────────────────────────────────────────────┐
+│ UPSTREAM: TKG Builder                                     │
+│                                                         │
+│   tkg_nodes: face_trace, speaker, object, etc.          │
+│   tkg_edges: speaker_face, mutual_gaze, co_occurs, etc. │
+│                                                         │
+└─────────────────────────────────────────────────────────┘
+                          │
+                          ▼ after TKG complete
+                          │
+┌─────────────────────────────────────────────────────────┐
+│ RULE 2 PROCESSING                                        │
+│                                                         │
+│ Triggered by:                                            │
+│   1. Worker auto: job_worker.rs after TKG completes      │
+│   2. HTTP API: POST /api/v1/file/:file_uuid/rule2        │
+│                                                         │
+│   ingest_rule2(file_uuid):                               │
+│     ├─ Query tkg_edges by type (priority order)          │
+│     ├─ For each edge:                                    │
+│     │   ├─ Resolve source_node / target_node             │
+│     │   ├─ Resolve identity names (if face_trace)        │
+│     │   ├─ Build context JSON                            │
+│     │   ├─ call_llm(context) → text_content              │
+│     │   └─ INSERT INTO chunk (chunk_type='relationship') │
+│     │                                                    │
+│                                                         │
+└─────────────────────────────────────────────────────────┘
+                          │
+                          ▼
+┌─────────────────────────────────────────────────────────┐
+│ DOWNSTREAM: vectorize_chunks()                            │
+│                                                         │
+│   SELECT ... WHERE chunk_type='relationship'              │
+│     AND embedding IS NULL                                │
+│                                                         │
+│   1. embedder.embed_document(text_content) → vector      │
+│   2. db.store_vector() → PG chunk.embedding              │
+│   3. qdrant.upsert_vector() → momentry_rule2 collection  │
+│                                                         │
+└─────────────────────────────────────────────────────────┘
+```
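
The downstream step in the diagram can be sketched as follows. This is a minimal in-memory illustration, not the actual `vectorize_chunks()` code: the `Chunk` struct, the `Embedder` trait, and `ToyEmbedder` are hypothetical stand-ins, and the PostgreSQL and Qdrant writes are reduced to setting a field.

```rust
/// Hypothetical in-memory stand-in for a row of the `chunk` table.
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub id: i64,
    pub chunk_type: String,
    pub text_content: String,
    pub embedding: Option<Vec<f32>>,
}

/// Hypothetical embedder interface; the real one wraps the embedding model.
pub trait Embedder {
    fn embed_document(&self, text: &str) -> Vec<f32>;
}

/// Toy embedder for illustration only: returns (char count, space count).
pub struct ToyEmbedder;

impl Embedder for ToyEmbedder {
    fn embed_document(&self, text: &str) -> Vec<f32> {
        let chars = text.chars().count() as f32;
        let spaces = text.chars().filter(|c| *c == ' ').count() as f32;
        vec![chars, spaces]
    }
}

/// Embeds every relationship chunk that has no embedding yet and returns the
/// ids that were vectorized. The filter mirrors
/// `WHERE chunk_type='relationship' AND embedding IS NULL`.
pub fn vectorize_pending(chunks: &mut [Chunk], embedder: &dyn Embedder) -> Vec<i64> {
    let mut done = Vec::new();
    for chunk in chunks.iter_mut() {
        if chunk.chunk_type == "relationship" && chunk.embedding.is_none() {
            chunk.embedding = Some(embedder.embed_document(&chunk.text_content));
            done.push(chunk.id);
        }
    }
    done
}
```

Because the filter skips chunks that already carry an embedding, a second call over the same data does nothing, which is what makes re-running the step after a partial failure safe in this sketch.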
+
+## Edge Type Priority
+
+| Priority | Edge Type | Description | Example Output |
+|----------|-----------|-------------|----------------|
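+| P0 | `speaker_face` | Speaker ↔ Face trace | "SPEAKER_01 以 Cary Grant 的身份說話，從 frame 100 到 350" (SPEAKER_01 speaks as Cary Grant, from frame 100 to 350) |
+| P0 | `mutual_gaze` | Two face traces looking at each other | "Cary Grant 和 Grace Kelly 互相看對方 24 幀，起始於 frame 450" (Cary Grant and Grace Kelly look at each other for 24 frames, starting at frame 450) |
+| P1 | `face_face` | Two face traces co-occurring | "Cary Grant 和 Grace Kelly 同框 180 幀" (Cary Grant and Grace Kelly share the frame for 180 frames) |
+| P1 | `co_occurs` | Object ↔ Object co-occurrence | "物件 'car' 和 'person' 在同一畫面出現 60 幀" (Objects 'car' and 'person' appear in the same shot for 60 frames) |
+| P2 | `has_appearance` | Face trace ↔ Appearance trace | "Cary Grant 穿著藍色上衣，戴眼鏡" (Cary Grant wears a blue top and glasses) |
+| P2 | `wears` | Face trace ↔ Accessory | "Cary Grant 戴帽子，信心值 0.82" (Cary Grant wears a hat, confidence 0.82) |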
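
One way to encode the priority table is an enum with a `priority()` method. The sketch below is illustrative only: the `EdgeType` enum and `sort_by_priority` are assumed names, not taken from `rule2_ingest.rs`.

```rust
/// Edge types from the priority table (hypothetical enum, names assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EdgeType {
    SpeakerFace,
    MutualGaze,
    FaceFace,
    CoOccurs,
    HasAppearance,
    Wears,
}

impl EdgeType {
    /// Parses the `edge_type` string stored on a TKG edge.
    pub fn parse(s: &str) -> Option<EdgeType> {
        match s {
            "speaker_face" => Some(EdgeType::SpeakerFace),
            "mutual_gaze" => Some(EdgeType::MutualGaze),
            "face_face" => Some(EdgeType::FaceFace),
            "co_occurs" => Some(EdgeType::CoOccurs),
            "has_appearance" => Some(EdgeType::HasAppearance),
            "wears" => Some(EdgeType::Wears),
            _ => None,
        }
    }

    /// Returns 0 for P0, 1 for P1, 2 for P2.
    pub fn priority(self) -> u8 {
        match self {
            EdgeType::SpeakerFace | EdgeType::MutualGaze => 0,
            EdgeType::FaceFace | EdgeType::CoOccurs => 1,
            EdgeType::HasAppearance | EdgeType::Wears => 2,
        }
    }
}

/// Orders `(edge_id, edge_type)` pairs so that P0 edges are processed first.
/// The sort is stable, so edges of equal priority keep their input order.
pub fn sort_by_priority(edges: &mut [(i64, EdgeType)]) {
    edges.sort_by_key(|(_, edge_type)| edge_type.priority());
}
```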
+
+## Chunk Data Structure
+
+### Content JSON (`content` column)
+
+```json
+{
+    "edge_type": "speaker_face",
+    "edge_id": 123,
+    "source_node": {
+        "id": 45,
+        "node_type": "speaker",
+        "external_id": "SPEAKER_01",
+        "label": "SPEAKER_01"
+    },
+    "target_node": {
+        "id": 67,
+        "node_type": "face_trace",
+        "external_id": "trace_5",
+        "label": "Face Trace 5",
+        "identity_name": "Cary Grant"
+    },
+    "properties": {
+        "first_frame": 100,
+        "last_frame": 350,
+        "frame_count": 250,
+        "lip_sync_confidence": 0.85
+    }
+}
+```
+
+### Text Content (`text_content` column)
+
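+LLM-generated natural-language description in Traditional Chinese. The example below reads "SPEAKER_01 speaks as Cary Grant, from frame 100 to frame 350, lip-sync confidence 0.85":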
+
+```
+"SPEAKER_01 以 Cary Grant 的身份說話，從 frame 100 到 frame 350，唇語同步信心值 0.85"
+```
+
+### Metadata JSON (`metadata` column)
+
+```json
+{
+    "source_type": "speaker",
+    "target_type": "face_trace",
+    "has_identity": true,
+    "identity_source": "tmdb"
+}
+```
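
The metadata fields can be derived from the two resolved nodes. The sketch below is an assumption about how that derivation might look: `NodeRef`, `ChunkMetadata`, and `build_metadata` are hypothetical names, `has_identity` is assumed to be true when either side has a resolved identity, and `identity_source` is left out because this document does not say how it is determined.

```rust
/// Hypothetical, trimmed-down view of a TKG node as it appears in `content`.
#[derive(Debug, Clone)]
pub struct NodeRef {
    pub node_type: String,
    pub external_id: String,
    pub identity_name: Option<String>,
}

/// Subset of the `metadata` column covered by this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct ChunkMetadata {
    pub source_type: String,
    pub target_type: String,
    pub has_identity: bool,
}

/// Builds chunk metadata from the source and target nodes of an edge.
pub fn build_metadata(source: &NodeRef, target: &NodeRef) -> ChunkMetadata {
    ChunkMetadata {
        source_type: source.node_type.clone(),
        target_type: target.node_type.clone(),
        // Assumption: one resolved identity on either side is enough.
        has_identity: source.identity_name.is_some() || target.identity_name.is_some(),
    }
}
```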
+
+## LLM Prompt Template
+
+```text
+你是影片關係描述專家。請用繁體中文描述以下人物/物件關係：
+
+關係類型: {edge_type}
+來源節點: {source_node.node_type} - {source_node.external_id}
+  身份名稱: {source_node.identity_name} (如果有)
+目標節點: {target_node.node_type} - {target_node.external_id}
+  身份名稱: {target_node.identity_name} (如果有)
+關係屬性:
+  - 起始幀: {first_frame}
+  - 結束幀: {last_frame}
+  - 幀數: {frame_count}
+  - 信心值: {confidence}
+
+要求：
+1. 使用自然語言，不要輸出 JSON
+2. 包含時間範圍（幀號）
+3. 包含人物名字（如有 identity）
+4. 簡潔，20-50 字
+5. 用繁體中文
+
+範例輸出：
+"SPEAKER_01 以 Cary Grant 的身份說話，從 frame 100 到 frame 350"
+"Cary Grant 和 Grace Kelly 互相看對方 24 幀，起始於 frame 450"
+```
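
Filling the template can be done with plain placeholder substitution. The helper below is a minimal sketch, assuming `{name}` placeholders as in the template above. `render_prompt` is a hypothetical name, and the real implementation may build the prompt differently.

```rust
/// Replaces every `{key}` placeholder in `template` with its value.
/// Placeholders without a matching key are left untouched.
/// Note: values are substituted in order, so a value that itself contains
/// a later `{key}` would be expanded too; callers should avoid that.
pub fn render_prompt(template: &str, values: &[(&str, &str)]) -> String {
    let mut out = template.to_string();
    for (key, value) in values {
        // `{{{}}}` renders as `{` + key + `}`.
        let placeholder = format!("{{{}}}", key);
        out = out.replace(&placeholder, value);
    }
    out
}
```

Leaving unknown placeholders in place, rather than dropping them, makes a missing property visible in the rendered prompt instead of silently producing a shorter one.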
+
+## Edge → Chunk Conversion Rules
+
+### speaker_face Edge
+
+```rust
+// Source: speaker node
+// Target: face_trace node
+// Properties: first_frame, last_frame, lip_sync_confidence
+
+let text_content = call_llm(format!(
+    "SPEAKER {} 對應 face trace {}，身份 {}，frame {}-{}",
+    speaker_id, trace_id, identity_name, first_frame, last_frame
+));
+```
+
+### mutual_gaze Edge
+
+```rust
+// Source: face_trace node A
+// Target: face_trace node B
+// Properties: first_frame, gaze_frame_count, yaw_a_avg, yaw_b_avg
+
+let text_content = call_llm(format!(
+    "人物 {} 和 {} 互相看對方 {} 幀，起始於 frame {}",
+    identity_a, identity_b, gaze_frame_count, first_frame
+));
+```
+
+### has_appearance Edge
+
+```rust
+// Source: face_trace node
+// Target: appearance_trace node
+// Properties: clothing colors, accessories
+
+let text_content = call_llm(format!(
+    "人物 {} 穿著 {} 上衣，{} 下衣",
+    identity_name, upper_color, lower_color
+));
+```
+
+## Search Contribution
+
+| Search Path | Mechanism | Rule 2 Contribution |
+|-------------|-----------|-------------------|
+| **Semantic search** (Qdrant) | `chunk_type='relationship'` → embedding query | LLM descriptions enable natural language queries |
+| **Keyword search** (BM25 ILIKE) | `text_content ILIKE '%互相看%'` (matches "look at each other") | Relationship keywords searchable |
+| **Agent tkg_query** | Direct edge queries | Rule 2 complements with vectorized search |
+| **identity_text** | Reverse lookup | "誰戴眼鏡" ("who wears glasses") → has_appearance chunks |
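
For the keyword path, a user-supplied term has to be turned into an `ILIKE` pattern. The sketch below assumes PostgreSQL's default escape character (backslash) and that the resulting pattern is passed as a bound query parameter. `ilike_contains_pattern` is a hypothetical helper, not part of the codebase described here.

```rust
/// Builds a "contains" pattern for `text_content ILIKE $1`.
/// Escapes `\`, `%` and `_` so they match literally instead of acting as
/// wildcards (backslash is PostgreSQL's default LIKE escape character).
pub fn ilike_contains_pattern(term: &str) -> String {
    let mut pattern = String::with_capacity(term.len() + 2);
    pattern.push('%');
    for ch in term.chars() {
        if ch == '\\' || ch == '%' || ch == '_' {
            pattern.push('\\');
        }
        pattern.push(ch);
    }
    pattern.push('%');
    pattern
}
```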
+
+## Trigger Points
+
+| Trigger | Location | Condition |
+|---------|----------|-----------|
+| Worker auto | `job_worker.rs` | After TKG builder completes |
+| HTTP API | `POST /api/v1/file/:file_uuid/rule2` | Manual trigger |
+| Pipeline | `pipeline_core::execute_rule2` | Called by other modules |
+
+## Edge Cases
+
+| Scenario | Behavior |
+|----------|----------|
+| No `tkg_edges` | Returns 0 immediately and logs at info level |
+| Edge without identity | Uses the node's `external_id` (e.g., "trace_5") in the description |
+| LLM call fails | Falls back to a template-based description |
+| Multiple edges of the same type | Each edge becomes a separate chunk |
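
The identity and LLM-failure rows can be sketched together. Everything here is illustrative: the `Edge` struct, the closure standing in for `call_llm`, and the wording of the fallback template are assumptions, and treating an empty LLM reply as a failure is a design choice of this sketch rather than documented behavior.

```rust
/// Hypothetical flattened view of a TKG edge with resolved identities.
#[derive(Debug, Clone)]
pub struct Edge {
    pub edge_type: String,
    pub source_external_id: String,
    pub source_identity: Option<String>,
    pub target_external_id: String,
    pub target_identity: Option<String>,
    pub first_frame: i64,
    pub last_frame: i64,
}

/// Uses the identity name when known, otherwise the node's external_id.
pub fn display_name<'a>(identity: &'a Option<String>, external_id: &'a str) -> &'a str {
    identity.as_deref().unwrap_or(external_id)
}

/// Template-based description used when the LLM is unavailable.
/// The exact wording is an assumption for illustration.
pub fn template_description(edge: &Edge) -> String {
    format!(
        "{} {} {}, frame {}-{}",
        display_name(&edge.source_identity, &edge.source_external_id),
        edge.edge_type,
        display_name(&edge.target_identity, &edge.target_external_id),
        edge.first_frame,
        edge.last_frame
    )
}

/// Tries the LLM first; falls back to the template on error or empty output.
pub fn describe(edge: &Edge, llm: impl Fn(&Edge) -> Result<String, String>) -> String {
    match llm(edge) {
        Ok(text) if !text.trim().is_empty() => text,
        _ => template_description(edge),
    }
}
```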
+
+## Qdrant Collection
+
+| Property | Value |
+|----------|-------|
+| Collection name | `momentry_rule2` |
+| Vector size | 768 (nomic-embed-text-v2-moe) |
+| Distance | Cosine |
+| Payload | `{chunk_id, file_uuid, edge_type, source_type, target_type}` |
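
The collection uses cosine distance over 768-dimensional vectors. As a reference for what that metric computes, here is a minimal standalone sketch. Qdrant performs this internally, so `cosine_similarity` and `has_expected_dim` are illustrative helpers, not calls into the Qdrant client.

```rust
/// Vector size of the `momentry_rule2` collection.
pub const VECTOR_SIZE: usize = 768;

/// Returns true when a vector has the dimension the collection expects.
pub fn has_expected_dim(vector: &[f32]) -> bool {
    vector.len() == VECTOR_SIZE
}

/// Cosine similarity: dot(a, b) / (|a| * |b|).
/// Returns None for empty, mismatched, or zero-length (all-zero) vectors,
/// where the value is undefined.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```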
+
+## Version History
+
+| Version | Date | Author | Change |
+|---------|------|--------|--------|
+| 1.0 | 2026-06-20 | OpenCode | Initial design: TKG edges → relationship chunks |