feat: trace chunks with co-appearance relationships

- New trace_ingest module: creates chunks for each face trace (time + bbox + ASR text)
- Computes pairwise time overlaps between traces -> co_appearances in metadata
- Worker auto-triggers after face trace store + Qdrant sync
- SearchFilters: chunk_type filter (sentence/cut/trace/visual)
- SearchFilters: co_appears_with_trace_id filter
2026-05-09 06:18:32 +08:00
parent 9f5afd1b86
commit b902763d45
5 changed files with 373 additions and 6 deletions
--- a/docs/TRACE_SEARCH_API_DESIGN.md
+++ b/docs/TRACE_SEARCH_API_DESIGN.md
@@ -0,0 +1,101 @@
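+# Trace Search API Design
+
+## Concept
+
+A trace is a kind of chunk.
+
+Existing chunk_type values: `cut`, `sentence`, `visual`, `story`
+New chunk_type value: `trace`
+
+Each trace (the tracked path of one person across frames) is a time interval plus the ASR text spoken within that interval.
+It behaves exactly like any other chunk; only the segmentation dimension differs:
+- cut chunk = shot change
+- sentence chunk = sentence boundary
+- visual chunk = combination of on-screen objects
+- **trace chunk = interval in which a person appears + the text spoken during it**
+
+This lets trace chunks go straight into the existing `chunks` table and share the whole embedding, search, and Qdrant sync pipeline, with no new table required.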
+
+## Existing structure of the chunks table
+
+```sql
+chunks (
+    id, file_uuid, chunk_type,                   -- 'trace' is new
+    start_frame, end_frame, start_time, end_time,
+    text_content,                                 -- ASR text within the trace interval
+    embedding,                                    -- pgvector embedding of text_content
+    metadata JSONB,                               -- { trace_id, face_count, identity_id, identity_name }
+    ...
+)
+```
+
+## Data generation flow (worker extension)
+
+After face processing and `store_traced_faces.py` complete:
+
+1. Query `face_detections`, aggregating `MIN(frame)`, `MAX(frame)`, and `COUNT(*)` for each trace
+2. For each trace, query `pre_chunks WHERE processor_type='asr'` for text whose time range overlaps the trace's time range
+3. Concatenate the text → generate the `embedding` with EmbeddingGemma
+4. Write to `chunks` (`chunk_type='trace'`), with metadata containing `trace_id`, `face_count`, `identity_id`
+5. The embedding flows into Qdrant automatically (same collection as existing chunks)
+
+## Search API extension
+
+The `types` field of Universal Search already supports `"chunk"`.
+Filtering the chunk search on `chunk_type = 'trace'` is all that is needed.
+
+**Request**:
+```json
+{
+  "query": "open the door",
+  "types": ["chunk"],
+  "filters": { "chunk_type": "trace" },
+  "uuid": "aeed71342a899fe4b4c57b7d41bcb692",
+  "page": 1,
+  "page_size": 20
+}
+```
+
+**Response** (identical to the existing Chunk result):
+```json
+{
+  "type": "chunk",
+  "chunk_id": "chunk_42",
+  "chunk_type": "trace",
+  "start_frame": 45200, "end_frame": 45900,
+  "start_time": 1808.0, "end_time": 1836.0,
+  "score": 0.87,
+  "text": "Open the door. Come on, hurry up.",
+  "metadata": {
+    "trace_id": 5,
+    "face_count": 42,
+    "identity_name": "Audrey Hepburn"
+  }
+}
+```
+
+This reuses the existing `SearchResult::Chunk` variant as-is; no new enum variant is needed.
+
+### Search query
+
+```sql
+SELECT c.*
+FROM dev.chunks c
+WHERE c.file_uuid = $1
+  AND c.chunk_type = 'trace'
+  AND c.embedding IS NOT NULL
+ORDER BY c.embedding <=> $2
+LIMIT $3;
+```
+
+## Summary
+
+| Item | Approach |
+|------|------|
+| New table | ❌ Not needed |
+| New enum variant | ❌ Not needed |
+| SearchResult changes | ❌ Not needed |
+| New chunk_type | ✅ `'trace'` |
+| Worker extension | ✅ Generate trace chunks (after face processing completes) |
+| SearchFilters extension | ✅ Add `chunk_type` filter |
+| Qdrant | ✅ Automatic (existing chunk collection) |
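
The trace chunk construction in the design doc and the pairwise time overlap mentioned in the commit message can be illustrated with a minimal Python sketch. It is not the actual `trace_ingest` code: the `Trace` and `AsrSegment` classes, the function names, and the shape of each `co_appearances` entry (`trace_id` plus `overlap_seconds`) are assumptions made for illustration, and the embedding and database write steps are left out.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Trace:  # hypothetical: one aggregated row per trace from face_detections
    trace_id: int
    start_time: float  # seconds
    end_time: float    # seconds
    face_count: int


@dataclass
class AsrSegment:  # hypothetical: one ASR row from pre_chunks
    start_time: float
    end_time: float
    text: str


def overlap_seconds(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals; 0.0 if they are disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def build_trace_chunk(trace, asr_segments):
    """Collect the ASR text that overlaps the trace interval into one chunk dict."""
    ordered = sorted(asr_segments, key=lambda s: s.start_time)
    texts = [
        s.text
        for s in ordered
        if overlap_seconds(trace.start_time, trace.end_time,
                           s.start_time, s.end_time) > 0
    ]
    return {
        "chunk_type": "trace",
        "start_time": trace.start_time,
        "end_time": trace.end_time,
        "text_content": " ".join(texts),
        "metadata": {
            "trace_id": trace.trace_id,
            "face_count": trace.face_count,
            "co_appearances": [],  # assumed shape, filled in below
        },
    }


def add_co_appearances(chunks):
    """Record, for every pair of trace chunks, how long they share the screen."""
    for a, b in combinations(chunks, 2):
        shared = overlap_seconds(a["start_time"], a["end_time"],
                                 b["start_time"], b["end_time"])
        if shared > 0:
            a["metadata"]["co_appearances"].append(
                {"trace_id": b["metadata"]["trace_id"], "overlap_seconds": shared})
            b["metadata"]["co_appearances"].append(
                {"trace_id": a["metadata"]["trace_id"], "overlap_seconds": shared})
    return chunks
```

Comparing every pair is quadratic in the number of traces per file, which is likely acceptable for a single video; sorting traces by start time and sweeping would reduce the work if trace counts grow large.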