feat: ASRX hybrid pipeline, identity history, worker fixes, checkpoint system

2026-06-02 07:13:23 +08:00
parent e3066c3f49
commit e1572907ae
198 changed files with 43705 additions and 8910 deletions
--- a/docs_v1.0/DESIGN/ASRX_HYBRID_PIPELINE_V1.0.md
+++ b/docs_v1.0/DESIGN/ASRX_HYBRID_PIPELINE_V1.0.md
@@ -0,0 +1,588 @@
+# ASRX Hybrid Pipeline v1.0: Hybrid Speaker Diarization Architecture
+
+| Item | Details |
+|------|------|
+| **Scope** | ASRX processor refactor: whisperx → VAD-first hybrid pipeline |
+| **Status** | Draft |
+| **Applies to** | Momentry Core V4.0+ |
+| **Author** | OpenCode / Warren |
+| **Created** | 2026-06-01 |
+
+---
+
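+## 1. Problem
+
+### 1.1 Current Issues
+
+| Issue | Description | Impact |
+|------|------|------|
+| **Whisper merges short utterances** | `whisper small` mistakes a two-person exchange for one continuous segment (A+B → one sentence) | An ASR segment mixes two speakers' speech, so the speakers cannot be separated |
+| **ASRX v2 speaker_id = null** | `asrx_processor_v2.py` calls `whisperx.DiarizationPipeline()`, but that API is not exposed in whisperx's `__init__.py` | Every segment's speaker is null |
+| **Text lost** | `SelfASRXFixed.process_with_segments()` in `asrx_processor_custom.py` only outputs `text: ""` | No text is available for the Rule 1 merge |
+| **Wrong diarization backend** | `asrx_processor_v2.py` relies on whisperx's built-in diarization, which is unstable | Accuracy ~85%; requires an HF token |
+| **Version sprawl** | 7 root-level variants and 14 asrx_self files; production runs the wrong version | Hard to maintain; unclear which version is correct |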
+
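+### 1.2 Pain-Point Scenario
+
+**Two speakers trading short utterances** (interviews, conversations):
+
+```
+Audio: A(2s) → B(1.5s) → A(3s)
+Whisper: ───────[0-7s, "A+B+A all merged together"]───────
+```
+
+Whisper does not split at inter-sentence pauses, so the ASR time boundaries cannot reflect speaker changes.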
+
+---
+
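+## 2. Architecture
+
+### 2.1 Core Principles
+
+1. **VAD sets the boundaries first**: VAD cuts at inter-sentence pauses, replacing whisper's boundaries
+2. **ASR runs afterwards**: each segment is transcribed on its own and keeps its own text
+3. **Voiceprint clustering assigns speakers**: ECAPA-TDNN + AgglomerativeClustering
+
+### 2.2 Core 5-Step Pipeline
+
+These five steps produce the diarized transcript. Steps 6 and 7 (Qdrant storage and high-quality classification) extend them; see sections 4 and 5.6.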
+
+```
+Audio
+  │
+  ① whisper (once, coarse localization)
+  │   find speech regions + preliminary text + language
+  │   [0-7s, "今天天氣很好我覺得也不錯對啊", zh]
+  │
+  ② VAD scan (fine split inside each region)
+  │   cut at inter-sentence pauses
+  │   seg1 [0-2s]    seg2 [2-3.5s]    seg3 [3.5-7s]
+  │
+  ③ whisper per refined segment (transcribe each one)
+  │   seg1 → "今天天氣很好"     (zh, 0.98)
+  │   seg2 → "我覺得也不錯"     (zh, 0.97)
+  │   seg3 → "對啊"             (zh, 0.96)
+  │
+  ④ ECAPA-TDNN per refined segment (voiceprint extraction)
+  │   seg1 → emb[0] (192-dim)
+  │   seg2 → emb[1] (192-dim)
+  │   seg3 → emb[2] (192-dim)
+  │
+  ⑤ AgglomerativeClustering (cluster to assign speakers)
+  │   emb[0]=SPEAKER_0, emb[1]=SPEAKER_1, emb[2]=SPEAKER_0
+  │
+  Output:
+    start  end    text         language  speaker_id
+    0.0    2.0    今天天氣很好    zh        SPEAKER_0
+    2.0    3.5    我覺得也不錯    zh        SPEAKER_1
+    3.5    7.0    對啊            zh        SPEAKER_0
+```
+
+### 2.3 Flow Diagram
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│                    asrx_processor.py                                │
+│                      (wrapper)                                     │
+│                                                                    │
+│  ① ffprobe → select best track → ffmpeg → 16kHz WAV               │
+│                                                                    │
+│  ② SelfASRXFixed.process(audio_wav, file_uuid)                     │
+│     │                                                              │
+│     ├─ Step 1: whisper.transcribe() → rough segments               │
+│     ├─ Step 2: VAD scan each rough segment                         │
+│     ├─ Step 3: whisper per refined segment → text+language         │
+│     ├─ Step 4: ECAPA-TDNN per segment → 192-dim embedding         │
+│     ├─ Step 5: AgglomerativeClustering → speaker_labels            │
+│     │                                                              │
+│     ├─ Step 6: Store embeddings in Qdrant                          │
+│     │  └─ {file_uuid, speaker_id, text, language, start, end}      │
+│     │                                                              │
+│     └─ Step 7: Classify high-quality embeddings                    │
+│        ├─ quality > threshold → reference profile                  │
+│        ├─ run voice classifier → gender/attributes                 │
+│        └─ write to Qdrant (type: speaker_reference)                │
+│                                                                    │
+│  ③ output JSON (embedding excluded)                                │
+│                                                                    │
+│  Rust: rule1_ingest.rs                                            │
+│     └─ pre_chunks(processor_type='asrx') → chunks                  │
+└─────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
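+## 3. File Organization
+
+### 3.1 Final File Layout
+
+```
+scripts/
+├── asrx_processor.py            ← production (cleaned custom.py)
+│
+└── asrx_self/                   ← core library
+    ├── __init__.py              ← package marker
+    ├── vad.py                   ← Silero VAD (adds scan_within_segment)
+    ├── whisper_local.py         ← 🆕 wraps whisper loading + transcription
+    ├── speaker_encoder.py       ← ECAPA-TDNN 192-dim
+    ├── speaker_cluster_fixed.py ← AgglomerativeClustering
+    ├── speaker_classifier.py    ← 🆕 high-quality voiceprint grading + classification (section 5.6)
+    └── main_fixed.py            ← 🔧 rewritten as the 7-step pipeline (5 core steps + Qdrant storage + classification)
+```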
+
+### 3.2 Deletion List
+
+**Root-level variants** (all to be deleted):
+
+| File | Reason |
+|------|------|
+| `asrx_processor.py` | Original whisperx version with broken diarization (the name is reused by the cleaned `asrx_processor_custom.py`, see S5) |
+| `asrx_processor_v2.py` | Same as above; Rust currently calls this file by mistake |
+| `asrx_processor_v2_noalign.py` | Skips alignment, but diarization is still broken |
+| `asrx_processor_v2_transcribe.py` | Transcription only, no speaker assignment |
+| `asrx_processor_simplified.py` | Variant |
+| `asrx_processor_contract_v1.py` | 18KB, pyannote, requires an HF token |
+
+**Superseded legacy files inside asrx_self**:
+
+| File | Reason | Replaced by |
+|------|------|--------|
+| `main.py` | Uses SpectralClustering; has NaN issues | `main_fixed.py` |
+| `speaker_cluster.py` | Uses SpectralClustering; unstable | `speaker_cluster_fixed.py` |
+
+### 3.3 Relocation List
+
+Non-production tools move to `tools/asrx/`:
+
+```
+tools/asrx/
+├── integrate_face_asrx_speaker.py
+├── speaker_player_gui.py
+├── speaker_player_gui_face.py
+├── speaker_player_interactive.py
+├── speaker_audio_player.py
+├── test_long_movie.py
+├── test_gui_face_player.py
+└── docs/
+    ├── FINAL_TEST_REPORT.md
+    ├── GUI_FACE_PLAYER_USAGE.md
+    ├── LONG_MOVIE_TEST_SUMMARY.md
+    └── SPEAKER_PLAYER_GUIDE.md
+```
+
+---
+
+## 4. Qdrant Voiceprint Vector Storage
+
+### 4.1 Storage Flow
+
+```
+Step 4 output: each refined segment has {embedding: [192-dim], text, language, start, end}
+Step 5 output: each segment is labeled with a speaker_id {SPEAKER_0, SPEAKER_1, ...}
+
+Step 6: Qdrant storage
+  ┌─ each segment → Qdrant point
+  │   point_id = hash(file_uuid + "_" + segment_index)  ← repeatable, so the point can be looked up again
+  │   vector   = embedding (192-dim)
+  │   payload  = {
+  │     "file_uuid":   str,     ← source file identifier
+  │     "speaker_id":  str,     ← filled in after clustering
+  │     "text":        str,     ← ASR transcription
+  │     "language":    str,     ← language (zh/en/...)
+  │     "start_time":  f64,     ← seconds
+  │     "end_time":    f64,     ← seconds
+  │     "type":        "speaker_embedding"  ← distinguishes it from other point types
+  │   }
+  └─
+```
+
+### 4.2 Qdrant Collection
+
+| Item | Details |
+|------|------|
+| Collection Name | `momentry_speaker` (or share an existing collection) |
+| Vector Dimension | 192 (ECAPA-TDNN output) |
+| Distance Metric | Cosine |
+| Point ID | `hash(file_uuid + "_" + segment_index)` |
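
A point ID of the form `hash(file_uuid + "_" + segment_index)` needs a hash that is stable across processes. Python's built-in `hash()` is salted per interpreter run for strings, so it would not give repeatable IDs. The following is a minimal sketch, assuming SHA-256 truncated to an unsigned 64-bit integer; the helper names `speaker_point_id` and `speaker_payload` and the truncation choice are illustrative assumptions, not taken from the codebase.

```python
import hashlib


def speaker_point_id(file_uuid: str, segment_index: int) -> int:
    """Repeatable unsigned 64-bit point ID (hypothetical helper)."""
    key = f"{file_uuid}_{segment_index}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Keep the first 8 bytes so the value fits a u64 point_id.
    return int.from_bytes(digest[:8], "big")


def speaker_payload(file_uuid, speaker_id, text, language, start_time, end_time):
    """Payload fields as listed for the speaker_embedding point type."""
    return {
        "file_uuid": file_uuid,
        "speaker_id": speaker_id,
        "text": text,
        "language": language,
        "start_time": float(start_time),
        "end_time": float(end_time),
        "type": "speaker_embedding",
    }
```

Because the ID depends only on `file_uuid` and the segment index, re-running the pipeline on the same file would overwrite the same points on upsert instead of duplicating them.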
+
+### 4.3 Rust `upsert_speaker_embedding`
+
+```rust
+impl QdrantDb {
+    pub async fn upsert_speaker_embedding(
+        &self,
+        point_id: u64,
+        vector: &[f32],
+        file_uuid: &str,
+        speaker_id: &str,
+        text: &str,
+        language: &str,
+        start_time: f64,
+        end_time: f64,
+    ) -> Result<()> {
+        // Qdrant PUT /collections/{collection}/points?wait=true
+        // payload: {file_uuid, speaker_id, text, language, start_time, end_time, type: "speaker_embedding"}
+    }
+}
+```
+
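+### 4.4 Relationship to Existing Face Embeddings
+
+| Kind | Qdrant Collection | Dim | Payload |
+|------|-------------------|-----|---------|
+| Face | `momentry` (self.collection_name) | 512 (FaceNet) | `file_uuid, trace_id, frame_number` |
+| **Speaker** | `momentry` or a dedicated collection | **192** (ECAPA-TDNN) | `file_uuid, speaker_id, text, language, start, end` |
+
+> **Note**: a Qdrant collection fixes the vector size per vector name, so sharing `momentry` with the 512-dim face vectors would require named vectors; otherwise a dedicated collection is needed.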
+
+---
+
+## 5. Detailed Module Design
+
+### 5.1 `vad.py`: Voice Activity Detection
+
+| Item | Details |
+|------|------|
+| Model | Silero VAD (torch.hub, snakers4/silero-vad) |
+| Existing functions | `load_vad_model()`, `extract_speech_segments()` |
+| **New function** | **`scan_within_segment(wav, start_sec, end_sec, model, utils, min_speech_duration_ms=500)`** |
+
+What `scan_within_segment` does:
+- Runs a VAD scan inside one time range `[start_sec, end_sec]`
+- Returns only the speech sub-segments within that range: `[(s1, e1), (s2, e2), ...]`
+- Splits at inter-sentence pauses, which resolves the whisper merging problem
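
One way `scan_within_segment` could post-process VAD output is sketched below. It assumes the VAD has already produced absolute speech intervals in seconds; the helper name `clip_speech_to_window` is hypothetical, and the actual Silero VAD call is deliberately left out.

```python
def clip_speech_to_window(speech, start_sec, end_sec, min_speech_duration_ms=500):
    """Keep the parts of VAD speech intervals that fall inside [start_sec, end_sec].

    speech: list of (start, end) tuples in seconds, as produced by some VAD.
    Returns sub-segments that are at least min_speech_duration_ms long.
    """
    min_len = min_speech_duration_ms / 1000.0
    refined = []
    for s, e in sorted(speech):
        # Clip each interval to the rough whisper segment.
        s, e = max(s, start_sec), min(e, end_sec)
        # Intervals outside the window end up with e < s and are dropped here too.
        if e - s >= min_len:
            refined.append((s, e))
    return refined
```

With the example from section 1.2, speech intervals around 0-2 s, 2-3.5 s and 3.5-7 s inside a rough `[0, 7]` segment would come back as three separate sub-segments, one per utterance.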
+
+### 5.2 `whisper_local.py` 🆕: Whisper Wrapper
+
+| Item | Details |
+|------|------|
+| Model | `whisper.load_model("base")` (configurable) |
+| Functions | `load_model()`, `transcribe_segment(wav, sample_rate, start_sec, end_sec, model)` |
+
+```python
+def transcribe_segment(wav, sample_rate, start_sec, end_sec, model) -> dict:
+    """Transcribe a single segment; returns {text, language, lang_prob, segments}."""
+```
+
+Each segment is transcribed independently, keeping its language and confidence.
+
+### 5.3 `speaker_encoder.py`: Speaker Encoder
+
+| Item | Details |
+|------|------|
+| Model | SpeechBrain ECAPA-TDNN (`spkrec-ecapa-voxceleb`) |
+| Output dimension | 192-dim |
+| EER | 0.80% (VoxCeleb1) |
+| License | Apache 2.0 (no HuggingFace token required) |
+| Functions | `load_speaker_encoder()`, `extract_speaker_embedding()`, `extract_speaker_embeddings_batch()` |
+
+### 5.4 `speaker_cluster_fixed.py`: Speaker Clustering
+
+| Item | Details |
+|------|------|
+| Algorithm | AgglomerativeClustering (cosine + average linkage) |
+| Replaces | `speaker_cluster.py` (SpectralClustering, NaN issues) |
+| Function | `robust_speaker_clustering(embeddings, n_speakers=None, max_speakers=10)` |
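
To make "cosine + average linkage" concrete, here is a dependency-free sketch of average-linkage agglomerative clustering. It is not the `robust_speaker_clustering` implementation and does not call scikit-learn; the function names and the `distance_threshold=0.5` stopping rule are assumptions chosen for illustration.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def average_linkage_clusters(embeddings, distance_threshold=0.5):
    """Greedy agglomerative clustering; returns one SPEAKER_n label per embedding."""
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1, c2):
        # Average pairwise cosine distance between the members of two clusters.
        total = sum(cosine_distance(embeddings[i], embeddings[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        dist, a, b = min(
            ((linkage(clusters[a], clusters[b]), a, b)
             for a in range(len(clusters))
             for b in range(a + 1, len(clusters))),
            key=lambda t: t[0],
        )
        if dist >= distance_threshold:
            break  # the closest pair is still too far apart to be one speaker
        clusters[a].extend(clusters[b])
        del clusters[b]

    # Number speakers by the order in which they first appear.
    labels = [""] * len(embeddings)
    for n, cluster in enumerate(sorted(clusters, key=min)):
        for i in cluster:
            labels[i] = f"SPEAKER_{n}"
    return labels
```

This brute-force version recomputes every linkage on each merge, so it only suits a handful of segments; it is meant to show the merge rule, not to replace a library implementation.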
+
+### 5.5 `main_fixed.py` 🔧: Core Orchestrator (7-Step Pipeline)
+
+```python
+class SelfASRXFixed:
+    def process(self, audio_path, output_path=None, file_uuid=None):
+        """
+        7-step speaker diarization pipeline
+        
+        Steps:
+          1. whisper.transcribe(audio) → rough segments + text + language
+          2. VAD scan each rough segment → refined segments
+          3. whisper per refined segment → {text, language, lang_prob}
+          4. ECAPA-TDNN per refined segment → 192-dim embeddings
+          5. AgglomerativeClustering → speaker_labels
+          6. Store all embeddings in Qdrant (if file_uuid provided)
+             payload: {file_uuid, speaker_id, text, language, start_time, end_time, type: "speaker_embedding"}
+          7. High-quality embeddings (quality > threshold) → classify + store reference
+             payload: {type: "speaker_reference", file_uuid, speaker_id, n_segments, avg_quality, ...}
+        
+        Returns:
+            {
+                "segments": [
+                    {
+                        "start": float, "end": float,
+                        "text": str, "language": str,
+                        "lang_prob": float, "speaker": str,
+                        "speaker_id": str, "quality": float
+                    },
+                    ...
+                ],
+                "speaker_stats": {...},
+                "n_speakers": int,
+                "total_duration": float,
+                "references": [
+                    {
+                        "speaker_id": str,
+                        "n_segments": int,
+                        "avg_quality": float,
+                        "gender": str
+                    }
+                ]
+            }
+        """
+    
+    def _store_speaker_embeddings(self, segments, file_uuid):
+        """Step 6: store each segment's 192-dim embedding in Qdrant."""
+
+    def _classify_high_quality_speakers(self, segments, embeddings, labels, file_uuid):
+        """Step 7: grade and classify high-quality voiceprints → Qdrant reference profile."""
+```
+
+**Removed**:
+
+| Old method | Reason |
+|--------|------|
+| `process_with_segments(audio, asr_segments)` | External ASR boundaries are unreliable; replaced by VAD |
+| `process()` VAD-only fallback | Produces no text; replaced by the full pipeline |
+
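+### 5.6 `speaker_classifier.py` 🆕: High-Quality Voiceprint Grading and Classification
+
+#### Purpose
+
+After clustering, each cluster's embeddings are scored for quality. Those above the threshold get their own reference profile and are classified automatically by an external model.
+
+#### Flow
+
+```
+After step ⑤ (clustering), each segment has {embedding, speaker_id}
+  │
+  └─ Compute quality score per embedding
+      │
+      ├─ below threshold → write to Qdrant (regular speaker_embedding)
+      │
+      └─ above threshold (quality > 0.85)
+          ├─ build a dedicated reference profile
+          └─ send to an audio-capable model for classification
+              ├─ speaker gender (male/female)
+              ├─ language accent (zh-CN / zh-TW / en-US)
+              └─ or use for cross-video speaker matching
+```
+
+#### Quality Score Calculation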
+
+```python
+import numpy as np
+
+
+def compute_embedding_quality(embeddings, labels, threshold=0.85):
+    """
+    Cosine similarity between each embedding and its cluster centroid.
+
+    Args:
+        embeddings: [n_segments, 192]
+        labels: [n_segments] cluster labels
+        threshold: high-quality cutoff
+
+    Returns:
+        qualities: [n_segments] quality score per embedding
+        high_quality_mask: [n_segments] bool array
+    """
+    embeddings = np.asarray(embeddings, dtype=np.float64)
+    labels = np.asarray(labels)  # the boolean masks below need an ndarray, not a list
+
+    # Unit-length copies, so a dot product with a unit centroid is the cosine similarity
+    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
+
+    qualities = np.zeros(len(labels))
+    for label in np.unique(labels):
+        mask = labels == label
+        centroid = embeddings[mask].mean(axis=0)
+        centroid = centroid / np.linalg.norm(centroid)
+        qualities[mask] = unit[mask] @ centroid
+
+    return qualities, qualities >= threshold
+```
+
+#### Reference Profile Format
+
+```json
+{
+    "point_id": "hash('speaker_reference_' + file_uuid + '_' + speaker_id + '_' + cluster_index)",
+    "vector": "[192-dim centroid embedding]",
+    "payload": {
+        "type": "speaker_reference",
+        "file_uuid": "source video",
+        "speaker_id": "SPEAKER_0",
+        "n_segments": 25,
+        "avg_quality": 0.92,
+        "total_duration": 45.3,
+        "language": "zh",
+        "gender": "male",
+        "text_samples": ["今天天氣很好", "我覺得也不錯", "..."]
+    }
+}
+```
+
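+#### Candidate Voice Classification Models
+
+| Model | Use | Pros | Cons |
+|------|------|------|------|
+| **SpeechBrain gender classifier** | Gender classification | Already integrated with ECAPA-TDNN | Only male/female |
+| **CLAP** (LAION) | Zero-shot audio classification | Label text can be customized | Requires extra installation |
+| **YAMNet** | Sound event classification | From Google, 521 classes | Weak at speaker attributes |
+| **Wav2Vec2-BERT** (speechbrain) | Emotion/attributes | Multi-dimensional classification | Larger model |
+| **Custom identity classifier** | Cross-video speaker matching | Plugs into the existing identity system | Needs accumulated reference data |
+
+> **To be decided**: which classification model to use will be settled by a follow-up POC.
+
+#### New Method in `main_fixed.py`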
+
+```python
+class SelfASRXFixed:
+    # ... the existing 6 steps ...
+
+    def _classify_high_quality_speakers(self, segments, embeddings, labels, file_uuid):
+        """
+        Step 7: grade and classify high-quality voiceprints
+
+        1. Compute the quality score
+        2. Build a reference profile for those above the threshold
+        3. Infer gender/attributes with a classification model
+        4. Write to Qdrant (type: speaker_reference)
+        """
+        qualities, mask = compute_embedding_quality(embeddings, labels)
+
+        for i, (seg, emb, label, quality, is_high) in enumerate(
+            zip(segments, embeddings, labels, qualities, mask)
+        ):
+            seg["quality"] = float(quality)
+            if is_high:
+                profile = self._build_reference_profile(
+                    emb, seg, file_uuid
+                )
+                # Classification (placeholder)
+                # gender = classify_gender(emb)
+                self._store_speaker_reference(profile)
+```
+
+### 5.7 `asrx_processor.py`: Cleaned-Up Wrapper
+
+Cleanup items:
+
+| Issue | Location | Fix |
+|------|------|------|
+| Hard-coded UUID `dd61fda8...` | line 155 | Remove that fallback path |
+| `os.chdir(script_dir)` | line 112 | Use local `Path` operations instead |
+| ASR text discarded | line 258 | `text` now comes from the new pipeline |
+| `_debug` dict | line 222 | Remove |
+| `max_speakers=10` hard-coded | line 201 | Make it a CLI argument `--max-speakers` |
+| Loading external ASR segments | line 148-174 | Remove (no longer needed) |
+
+---
+
+## 6. Output Format
+
+### 6.1 ASRX JSON Output (written by `asrx_processor.py`)
+
+> **Note**: the 192-dim embedding is not part of this JSON. Embeddings go straight to Qdrant from the Python side; the JSON keeps metadata only.
+
+```json
+{
+    "language": "zh",
+    "segments": [
+        {
+            "start_time": 0.0,
+            "end_time": 2.0,
+            "start_frame": 0,
+            "end_frame": 60,
+            "text": "今天天氣很好",
+            "speaker_id": "SPEAKER_0",
+            "language": "zh",
+            "lang_prob": 0.98
+        },
+        {
+            "start_time": 2.0,
+            "end_time": 3.5,
+            "start_frame": 60,
+            "end_frame": 105,
+            "text": "我覺得也不錯",
+            "speaker_id": "SPEAKER_1",
+            "language": "zh",
+            "lang_prob": 0.97
+        }
+    ],
+    "n_speakers": 2,
+    "speaker_stats": {
+        "SPEAKER_0": {"count": 1, "duration": 2.0},
+        "SPEAKER_1": {"count": 1, "duration": 1.5}
+    }
+}
+```
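
The `speaker_stats` block and the frame fields in the JSON above can be derived from the segment list. The sketch below shows one way to do that; the helper names are hypothetical, and `fps=30` is an assumption inferred from the sample (2.0 s maps to frame 60), not a documented constant.

```python
from collections import defaultdict


def build_speaker_stats(segments):
    """Aggregate segment count and total duration per speaker_id."""
    stats = defaultdict(lambda: {"count": 0, "duration": 0.0})
    for seg in segments:
        entry = stats[seg["speaker_id"]]
        entry["count"] += 1
        entry["duration"] += seg["end_time"] - seg["start_time"]
    # Round to milliseconds to keep floating-point noise out of the JSON.
    return {
        spk: {"count": v["count"], "duration": round(v["duration"], 3)}
        for spk, v in stats.items()
    }


def to_frame(time_sec, fps=30.0):
    """Convert seconds to a frame index (fps is an assumed default)."""
    return int(round(time_sec * fps))
```

Running `build_speaker_stats` over the two sample segments yields the same `speaker_stats` object as the example output.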
+
+### 6.2 Qdrant Point Format (written by Python `_store_speaker_embeddings`)
+
+> Embeddings do not pass through Rust; the Qdrant HTTP PUT is done entirely on the Python side.
+
+| Qdrant field | Value | Description |
+|-------------|-----|------|
+| `id` | `hash(file_uuid + "_" + segment_index)` | Repeatable point ID for lookups |
+| `vector` | `[f32; 192]` | ECAPA-TDNN voiceprint vector |
+| `payload.file_uuid` | `str` | Video identifier |
+| `payload.speaker_id` | `str` | Speaker label after clustering |
+| `payload.text` | `str` | Transcribed text of the segment |
+| `payload.language` | `str` | Language (`zh`/`en`) |
+| `payload.start_time` | `f64` | Start time (seconds) |
+| `payload.end_time` | `f64` | End time (seconds) |
+| `payload.type` | `"speaker_embedding"` | Distinguishes it from face_embedding |
+
+### 6.3 Matching Rust `AsrxResult`
+
+```rust
+use serde::Deserialize;
+
+#[derive(Debug, Deserialize)]
+pub struct AsrxSegment {
+    #[serde(alias = "start")]
+    pub start_time: f64,
+    #[serde(alias = "end")]
+    pub end_time: f64,
+    #[serde(default)] // defaults to 0
+    pub start_frame: u64,
+    #[serde(default)] // defaults to 0
+    pub end_frame: u64,
+    pub text: String,
+    pub speaker_id: Option<String>,
+    pub language: Option<String>, // 🆕 new
+    pub lang_prob: Option<f64>,   // 🆕 new
+}
+```
+
+---
+
+## 7. Rust-Side Changes
+
+| File | Change |
+|------|------|
+| `src/core/processor/asrx.rs` | `asrx_processor_v2.py` → `asrx_processor.py` |
+| `src/core/processor/asrx.rs` | Add `language`, `lang_prob` fields to `AsrxSegment` |
+| `src/core/processor/asrx.rs` | Pass `--file-uuid` to the Python script so the Python side can write to Qdrant directly |
+| `src/core/chunk/rule1_ingest.rs` | If the `pre_chunks` data contains `language`, carry it into the chunk metadata |
+| `src/core/db/qdrant_db.rs` | 🆕 Add `upsert_speaker_embedding()` (optional; not needed if the Python side writes to Qdrant directly) |
+
+---
+
+## 8. Migration Plan
+
+### Implementation Order (sorted by dependency)
+
+| Step | Work | File | Risk |
+|------|------|------|------|
+| **S1** | `vad.py`: add `scan_within_segment()` | `asrx_self/vad.py` | Low |
+| **S2** | 🆕 `whisper_local.py`: wrap whisper loading + transcription | `asrx_self/whisper_local.py` | Low |
+| **S3** | 🔧 `main_fixed.py`: rewrite as the 7-step pipeline | `asrx_self/main_fixed.py` | Medium |
+| **S4** | 🆕 `speaker_classifier.py`: gender classifier | `asrx_self/speaker_classifier.py` | Low |
+| **S5** | 🔧 `custom.py` cleanup + rename → `asrx_processor.py` | `asrx_processor_custom.py` | Low |
+| **S6** | 🔧 Rust `asrx.rs`: repoint + pass `--file-uuid` | `src/core/processor/asrx.rs` | Low |
+| **S7** | ✅ Verification: build + playground test | - | Medium |
+| **S8** | 🧹 Delete variants + relocate tools | - | Low |
+
+### Acceptance Criteria
+
+1. `cargo build` passes
+2. Playground 3003: register a video → ASRX processor completes
+3. `speaker_id` is not `null` in the output JSON
+4. The Qdrant collection contains `speaker_embedding` points
+5. Gender is labeled correctly (male/female)
+
+---
+
+## 9. Version History
+
+| Version | Date | Editor | Notes |
+|------|------|--------|------|
+| V1.0 | 2026-06-01 | OpenCode | Initial version: 7-step hybrid pipeline + Qdrant voiceprint storage + high-quality classification |
--- a/docs_v1.0/DESIGN/Modular_Doc_System_V1.0.md
+++ b/docs_v1.0/DESIGN/Modular_Doc_System_V1.0.md
@@ -0,0 +1,385 @@
+---
+document_type: "design"
+service: "MOMENTRY_CORE"
+title: "Modular Generated Documentation System"
+date: "2026-05-17"
+version: "V1.0"
+status: "active"
+owner: "M5"
+created_by: "OpenCode"
+tags:
+  - "documentation"
+  - "modular"
+  - "generated-docs"
+  - "workspace"
+ai_query_hints:
+  - "Design rationale of the modular generated documentation system"
+  - "How to use API_WORKSPACE"
+  - "How to add API endpoint documentation"
+  - "make deploy workflow"
+  - "Custom deliverable documents"
+related_documents:
+  - "STANDARDS/USER_DOCS_STANDARD.md"
+  - "STANDARDS/DOCS_STANDARD.md"
+  - "API_WORKSPACE/README.md"
+  - "API_WORKSPACE/modules/_template.md"
+---
+
+# Modular Generated Documentation System
+
+| Item | Details |
+|------|------|
+| Created by | OpenCode |
+| Created on | 2026-05-17 |
+| Document version | V1.0 |
+| Audience | developer, documentation maintainer |
+
+---
+
+## Version History
+
+| Version | Date | Purpose | Editor |
+|------|------|------|--------|
+| V1.0 | 2026-05-17 | Create the design document | OpenCode |
+
+---
+
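+## 1. Design Rationale
+
+### 1.1 Pain Points
+
+Traditional API documentation maintenance suffers from recurring problems:
+
+| Problem | Symptom |
+|------|----------|
+| **Duplicated content** | The same endpoint is written three times: quick reference, full manual, and training material |
+| **Missed updates** | A curl example is changed in one document but never synced to the others |
+| **Rigid delivery** | No way to produce different versions of the API docs for different audiences |
+| **Version drift** | The YAML frontmatter version number falls out of step with the actual content |
+
+### 1.2 Core Principles
+
+```
+Single source of truth (modules/) → assembly engine (assemble_docs.sh) → multiple deliverables (GUIDES/)
+
+        Edit       ──→      Generate   ──→      Deploy
+    edit modules in 1 place      make all      make deploy
+```
+
+| Principle | Description |
+|------|------|
+| **Single source of truth** | Each endpoint is defined exactly once, in `modules/` |
+| **Assemble, don't write** | Deliverables are combinations of modules, not hand-written text |
+| **Development and delivery are separate** | Develop in `API_WORKSPACE/`, deliver from `GUIDES/` |
+| **A module is the smallest testable unit** | Each module can be verified for correctness on its own |
+| **Configuration-driven** | `.toml` recipes define which modules are assembled, in which mode, into which output |
+
+### 1.3 File Types
+
+| Type | Role | Editable | Location |
+|------|------|--------|------|
+| Module | Smallest, indivisible unit of content | ✅ Yes | `API_WORKSPACE/modules/` |
+| Config (recipe) | Defines the assembly rules | ✅ Yes | `API_WORKSPACE/configs/` |
+| Narrative | Unstructured preface/background | ✅ Yes | `API_WORKSPACE/narratives/` |
+| Assembled (output) | Deliverable assembled from modules | ❌ No (generated) | `API_WORKSPACE/_build/` → `GUIDES/` |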
+
+---
+
+## 2. 目錄結構
+
+```
+docs_v1.0/
+├── API_WORKSPACE/                    ← development area
+│   ├── modules/                      ← endpoint modules (single source of truth)
+│   │   ├── _template.md              ← module authoring guidelines
+│   │   ├── 01_auth.md                ← authentication, Base URL
+│   │   ├── 02_health.md              ← health check
+│   │   ├── 03_register.md            ← register, scan
+│   │   ├── 04_lookup.md              ← lookup, delete
+│   │   ├── 05_process.md             ← processing, progress, jobs
+│   │   ├── 06_search.md              ← search (vector, n8n, visual)
+│   │   ├── 07_identity.md            ← identity CRUD, bind/unbind
+│   │   ├── 08_identity_agent.md      ← Identity Agent
+│   │   ├── 09_tmdb.md                ← TMDb Enrichment
+│   │   ├── 10_pipeline.md            ← stats, configuration, unmounted endpoints
+│   │   └── 11_error_codes.md         ← error code reference
+│   │
+│   ├── configs/                      ← assembly recipes (one per output)
+│   │   ├── reference.toml            → API_REFERENCE.md
+│   │   ├── endpoints.toml            → API_ENDPOINTS.md
+│   │   ├── quickref.toml             → API_QUICK_REFERENCE.md
+│   │   ├── errors.toml               → API_ERROR_CODES.md
+│   │   ├── index.toml                → API_INDEX.md
+│   │   ├── marcom.toml               → API_TRAINING_MARCOM.md
+│   │   └── tmdb.toml                 → TMDb_User_Guide.md
+│   │
+│   ├── narratives/                   ← non-endpoint narrative intros
+│   │   └── marcom_intro.md
+│   │
+│   ├── _build/                       ← generated staging area (gitignored)
+│   ├── Makefile                      ← assembly automation entry point
+│   ├── assemble_docs.sh              ← assembly engine
+│   └── README.md                     ← developer cheat sheet
+│
+├── GUIDES/                           ← delivery area
+│   ├── API_REFERENCE.md              (generated)
+│   ├── API_ENDPOINTS.md              (generated)
+│   ├── API_QUICK_REFERENCE.md        (generated)
+│   ├── API_ERROR_CODES.md            (generated)
+│   ├── API_INDEX.md                  (generated)
+│   ├── API_TRAINING_MARCOM.md        (generated)
+│   ├── TMDb_User_Guide.md            (generated)
+│   ├── Demo_EndToEnd.md              (hand-written, kept)
+│   ├── Pipeline_API_Demo.md          (hand-written, kept)
+│   └── ...                           (other hand-written documents)
+│
+├── DESIGN/
+├── REFERENCE/
+├── OPERATIONS/
+├── INTEGRATIONS/
+└── STANDARDS/
+```
+
+---
+
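+## 3. Module Specification
+
+### 3.1 File Naming
+
+- Format: `NN_<name>.md` (NN = two-digit sort key, 01-99)
+- Examples: `03_register.md`, `09_tmdb.md`
+- The sequence number determines the endpoint order at assembly time
+
+### 3.2 Module Metadata Comments
+
+Every module must begin with metadata comments: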
+
+```markdown
+<!-- module: auth -->
+<!-- description: Authentication, API Key, Base URL configuration -->
+<!-- depends: -->
+```
+
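+| Field | Required | Description |
+|------|------|------|
+| `module` | Yes | Unique name; no spaces, must not start with a digit |
+| `description` | Yes | One-sentence description |
+| `depends` | No | Names of other modules it depends on (comma-separated) |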
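
The metadata rules above are simple enough to check mechanically. The following sketch parses the three comment fields and enforces the required ones; the function name `parse_module_metadata` and the exact regular expressions are assumptions for illustration, since the document does not show how `assemble_docs.sh` reads them.

```python
import re

# Matches lines such as: <!-- module: auth -->
META_RE = re.compile(r"<!--\s*(module|description|depends):\s*(.*?)\s*-->")


def parse_module_metadata(text):
    """Return {'module', 'description', 'depends'} from a module's header comments."""
    meta = {key: value for key, value in META_RE.findall(text)}
    for required in ("module", "description"):
        if not meta.get(required):
            raise ValueError(f"missing required metadata field: {required}")
    # Module names: no whitespace and no leading digit.
    if not re.fullmatch(r"[^\s\d]\S*", meta["module"]):
        raise ValueError(f"invalid module name: {meta['module']!r}")
    # depends is optional and comma-separated; an empty value means no dependencies.
    meta["depends"] = [d.strip() for d in meta.get("depends", "").split(",") if d.strip()]
    return meta
```

A check like this could run before assembly so that a malformed header fails fast instead of producing a silently incomplete document.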
+
+### 3.3 Endpoint Structure
+
+Every endpoint must follow the same structure:
+
+````markdown
+### `METHOD /path/to/endpoint`
+
+**Auth**: Required / Optional / Public
+**Scope**: file-level / identity-level / system-level
+
+#### Request Parameters
+
+| Field | Type | Required | Default | Description |
+|-------|------|----------|---------|-------------|
+
+#### Example
+
+```bash
+curl -s -X METHOD "$API/path" \
+  -H "X-API-Key: $KEY" \
+  -d '{"field": "value"}'
+```
+
+#### Response (200)
+
+```json
+{ ... }
+```
+
+#### Error Codes
+
+| Code | HTTP | When |
+|------|------|------|
+````
+
+### 3.4 Variable Conventions
+
+| Variable | Purpose | Example value |
+|------|------|--------|
+| `$API` | Base URL | `http://localhost:3003` |
+| `$KEY` | API Key | `your-api-key-here` |
+| `$FILE_UUID` | File UUID | `3a6c1865...` |
+| `$IDENTITY_UUID` | Identity UUID | `a9a90105...` |
+
+---
+
+## 4. Assembly Engine
+
+### 4.1 `assemble_docs.sh`
+
+A shell script that takes three arguments:
+
+| Argument | Description | Example |
+|------|------|------|
+| `--config` | Path to the TOML recipe | `configs/reference.toml` |
+| `--modules` | Module directory | `modules/` |
+| `--build` | Output directory | `_build/` |
+
+### 4.2 Three Assembly Modes
+
+| mode | Behavior | Used for |
+|------|------|------|
+| `full` | Includes the full module content (minus metadata) | API_REFERENCE, API_ENDPOINTS |
+| `summary` | Extracts only endpoint tables + curl examples | API_QUICK_REFERENCE |
+| `index` | Generates a document overview (scans the modules directory to build the index automatically) | API_INDEX |
+
+### 4.3 Assembly Flow
+
+```
+1. Read config.toml → parse title, modules, mode, narrative
+2. Generate YAML frontmatter (document_type, date, version)
+3. Generate the title heading + info block
+4. (Optional) Generate a TOC from the modules' ## headings
+5. (Optional) Insert the narrative intro
+6. Iterate over modules:
+   - full mode: copy the whole content (skipping <!-- --> comments)
+   - summary mode: extract only | table | rows + ```bash code blocks
+   - index mode: scan the modules directory to build the list
+7. Write the output file to _build/
+```
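
The summary-mode rule in step 6 (keep only table rows and bash code blocks) can be sketched as a small line filter. This is an illustration in Python, not the shell logic of `assemble_docs.sh`; the function name `extract_summary` is hypothetical, and the fence marker is built with string multiplication only to avoid a literal fence inside this example.

```python
FENCE = "`" * 3  # the Markdown code-fence marker


def extract_summary(markdown_text):
    """Keep table rows and bash fenced blocks; drop everything else."""
    kept = []
    in_bash = False   # inside a bash block that we keep
    in_other = False  # inside some other fenced block that we skip
    for line in markdown_text.splitlines():
        stripped = line.strip()
        if in_bash:
            kept.append(line)
            if stripped == FENCE:
                in_bash = False
        elif in_other:
            if stripped == FENCE:
                in_other = False
        elif stripped.startswith(FENCE + "bash"):
            in_bash = True
            kept.append(line)
        elif stripped.startswith(FENCE):
            in_other = True
        elif stripped.startswith("|"):
            kept.append(line)
    return "\n".join(kept)
```

Tracking the non-bash fences separately matters: without it, a `|` character inside a JSON response example would be mistaken for a table row.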
+
+---
+
+## 5. Recipe Format (config.toml)
+
+```toml
+title = "Output document title"
+output = "_build/FILENAME.md"     # output path (relative to API_WORKSPACE)
+mode = "full"                      # full | summary | index
+modules = ["01_auth", "03_register"]  # module names to include
+narrative = "narratives/xxx.md"   # (optional) narrative intro to include
+toc = true                         # (optional) whether to generate a TOC
+
+[frontmatter]
+document_type = "api_reference"    # used in the YAML frontmatter
+service = "MOMENTRY_CORE"
+version = "V1.0"
+owner = "M5"
+created_by = "OpenCode"
+```
+
+### Built-in Recipes
+
+| File | Output | Modules | Mode |
+|------|------|---------|------|
+| `reference.toml` | API_REFERENCE.md | 01-11 | full |
+| `endpoints.toml` | API_ENDPOINTS.md | 01-10 | full |
+| `quickref.toml` | API_QUICK_REFERENCE.md | 01-06,09 | summary |
+| `errors.toml` | API_ERROR_CODES.md | 11 | full |
+| `index.toml` | API_INDEX.md | (auto) | index |
+| `marcom.toml` | API_TRAINING_MARCOM.md | 01,03,06 + narrative | full |
+| `tmdb.toml` | TMDb_User_Guide.md | 01,03,09 | full |
+
+---
+
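+## 6. Workflow
+
+### 6.1 Day-to-Day Edits
+
+```bash
+# 1. Edit the module
+cd API_WORKSPACE
+vim modules/09_tmdb.md
+
+# 2. Regenerate a single document
+make tmdb
+
+# 3. Preview the result
+less _build/TMDb_User_Guide.md
+
+# 4. Deploy
+make deploy
+```
+
+### 6.2 Adding an Endpoint
+
+```bash
+# 1. Find the owning module
+ls modules/
+# Decide which module the endpoint belongs to (e.g. tmdb, identity, search)
+
+# 2. Add the endpoint documentation to that module
+vim modules/09_tmdb.md
+
+# 3. Regenerate all documents
+make all
+
+# 4. Confirm every document that references this endpoint was updated correctly
+make check
+
+# 5. Deploy
+make deploy
+```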
+
+### 6.3 Custom Deliverables
+
+```bash
+# Add a custom recipe
+cat > configs/integration_partner.toml << 'TOML'
+title = "Integration Partner API Guide"
+output = "_build/PARTNER_GUIDE.md"
+mode = "full"
+modules = ["01_auth", "06_search", "09_tmdb", "11_error_codes"]
+toc = true
+[frontmatter]
+document_type = "user_manual"
+service = "MOMENTRY_CORE"
+version = "V1.0"
+owner = "M5"
+created_by = "OpenCode"
+TOML
+
+# Add the matching target to the Makefile.
+# Single quotes keep $(SCRIPT), $(MODULES) and $(BUILD) literal so that make,
+# not the shell, expands them; \t writes the tab that recipe lines require.
+printf 'partner:\n\t@$(SCRIPT) --config configs/integration_partner.toml --modules $(MODULES) --build $(BUILD)\n' >> Makefile
+
+# Generate
+make partner
+
+# Deploy
+make deploy
+```
+
+---
+
+## 7. Delivery Customization Matrix
+
+| Audience | Modules needed | make target | Output |
+|------|-------------|-------------|------|
+| API Developer | 01-11 (all) | `make reference` | API_REFERENCE.md |
+| Quick Start User | 01-06,09 | `make quickref` | API_QUICK_REFERENCE.md |
+| Marcom Team | 01,03,06 + narrative | `make marcom` | API_TRAINING_MARCOM.md |
+| TMDb User | 01,03,09 | `make tmdb` | TMDb_User_Guide.md |
+| Integration Partner | 01,06,09,11 | Custom config | PARTNER_GUIDE.md |
+
+---
+
+## 8. GUIDES/ File Types
+
+| Type | Source | Description |
+|------|------|------|
+| `API_*.md` (6 files), `TMDb_User_Guide.md` | Generated from API_WORKSPACE | API feature docs: endpoint lists + curl examples |
+| `Demo_*.md`, `M5API_*.md` | Hand-written | Narrative guides with complete step-by-step flows |
+| `PORTAL_*.md` | Hand-written | Portal development plan and demo guide |
+| `USER_MANUAL.md` | Hand-written | System operation manual |
+
+> **Reminder**: do not edit the generated files in GUIDES/ directly. Make changes in API_WORKSPACE/modules/, then run `make deploy`.
+
+---
+
+## Related Documents
+
+- `API_WORKSPACE/README.md`: developer quick-start guide
+- `API_WORKSPACE/modules/_template.md`: module authoring template
+- `STANDARDS/DOCS_STANDARD.md`: document creation standard
+- `STANDARDS/USER_DOCS_STANDARD.md`: user documentation standard
--- a/docs_v1.0/DESIGN/REPRESENTATIVE_FRAME_API_V1.md
+++ b/docs_v1.0/DESIGN/REPRESENTATIVE_FRAME_API_V1.md
@@ -0,0 +1,128 @@
+# Representative Frame API V1.0
+
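+Representative frame API for Portal videos: when no frame_number is given, it auto-detects the male and female leads and picks their best interaction frame.
+
+---
+
+## 1. Overview
+
+### Purpose
+
+Portal needs one representative frame (thumbnail) per video. It should show the video's most representative scene, usually a moment where the male and female leads share the frame and look at each other.
+
+### Principle
+
+**No frame_number given → auto-detect representative frame**
+
+The existing endpoint path stays the same; auto-detection only kicks in when the `frame` parameter is empty.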
+
+---
+
+## 2. Endpoint
+
+### `GET /api/v1/file/:file_uuid/thumbnail`
+
+**Query Parameters**:
+
+| Param | Type | Required | Description |
+|-------|------|----------|-------------|
+| `frame` | i64 | ❌ | Specific frame; auto-detect when omitted |
+| `x` | i32 | ❌ | bbox crop x |
+| `y` | i32 | ❌ | bbox crop y |
+| `w` | i32 | ❌ | bbox crop width |
+| `h` | i32 | ❌ | bbox crop height |
+
+**Response**: Pure JPEG bytes (Content-Type: image/jpeg)
+
+**Examples**:
+```
+GET /api/v1/file/:uuid/thumbnail                     → auto-detect
+GET /api/v1/file/:uuid/thumbnail?frame=38165         → specific frame
+GET /api/v1/file/:uuid/thumbnail?frame=38165&x=723&y=205&w=221&h=221  → specific crop
+```
+
+---
+
+## 3. Internal Algorithm
+
+### Auto-detect Fallback Chain
+
+```
+Step 1: Auto-detect the leads (top 2 by face count)
+  └─ face_detections JOIN identities
+
+Step 2: TKG Bridge: mutual_gaze?
+  ├── mutual_gaze edge exists → first_frame ✅
+  └── none → first frame in face_detections where both appear ✅
+
+Step 3: Only one lead?
+  └─ that lead's frame with the highest face_quality (w×h×confidence)
+
+Step 4: No identity at all?
+  └─ the frame with the highest face_quality among all detected faces
+
+Step 5: No faces at all?
+  └─ 404 "No faces in this file"
+```
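
The fallback chain above reduces to a short decision function once the database lookups have been made. The sketch below takes the query results as plain arguments; every name in it is hypothetical, and it assumes that two leads who never share a frame fall through to the single-lead rule, which the chain does not spell out.

```python
from typing import Optional, Sequence


def face_quality(w: float, h: float, confidence: float) -> float:
    """face_quality as defined in Step 3: w x h x confidence."""
    return w * h * confidence


def pick_representative_frame(
    leads: Sequence[str],
    mutual_gaze_frame: Optional[int],
    first_shared_frame: Optional[int],
    best_lead_frame: Optional[int],
    best_any_face_frame: Optional[int],
) -> Optional[int]:
    """Return the chosen frame number, or None when the file has no faces (404)."""
    if len(leads) >= 2:
        if mutual_gaze_frame is not None:
            return mutual_gaze_frame      # Step 2: TKG mutual_gaze edge
        if first_shared_frame is not None:
            return first_shared_frame     # Step 2 fallback: first co-appearance
    if len(leads) >= 1 and best_lead_frame is not None:
        return best_lead_frame            # Step 3: best frame of a single lead
    return best_any_face_frame            # Step 4, or None for Step 5
```

Keeping the decision separate from the SQL makes each branch of the chain easy to unit-test without a database.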
+
+### TKG Bridge Query
+
+```sql
+-- Find each lead's main trace ($2 = that lead's identity_id)
+SELECT trace_id FROM face_detections
+WHERE file_uuid = $1 AND identity_id = $2 AND trace_id IS NOT NULL
+GROUP BY trace_id ORDER BY COUNT(*) DESC LIMIT 1;
+
+-- TKG mutual_gaze query ($4, $5 = the two leads' main trace_id values from the query above)
+SELECT (e.properties->>'first_frame')::bigint
+FROM tkg_edges e
+JOIN tkg_nodes a ON a.id = e.source_node_id
+JOIN tkg_nodes b ON b.id = e.target_node_id
+WHERE e.file_uuid = $1
+  AND a.external_id = concat('trace_', $4)
+  AND b.external_id = concat('trace_', $5)
+  AND e.properties->>'mutual_gaze' = 'true'
+LIMIT 1;
+
+-- Fallback: first frame where both leads appear
+-- (the join also matches file_uuid, so frames from other files cannot pair up)
+SELECT MIN(fd_a.frame_number)::bigint
+FROM face_detections fd_a
+JOIN face_detections fd_b
+  ON fd_a.frame_number = fd_b.frame_number
+ AND fd_a.file_uuid = fd_b.file_uuid
+WHERE fd_a.file_uuid = $1 AND fd_a.identity_id = $2 AND fd_b.identity_id = $3;
+```
+
+---
+
+## 4. Implementation
+
+### Files Changed
+
+| File | Change |
+|------|--------|
+| `src/api/media_api.rs` | `ThumbQuery.frame` → `Option<i64>`; add auto-detect fallback |
+| `src/core/processor/tkg.rs` | Add `query_auto_representative_frame()` + structs (already implemented) |
+| `src/core/processor/mod.rs` | Export new function + structs (already implemented) |
+
+### Existing Trace-level Endpoints (unchanged)
+
+```
+GET /api/v1/file/:uuid/trace/:tid/representative-face  → JSON (legacy)
+GET /api/v1/file/:uuid/trace/:tid/thumbnail             → JPEG (auto via select_rep_face)
+```
+
+### No Changes
+
+- ❌ No new DB tables / migrations
+- ❌ No changes to `select_rep_face` / blurdetect
+- ❌ No chunk / cut / pre_chunks dependency
+
+---
+
+## 5. Version History
+
+| Date | Version | Author | Change |
+|------|---------|--------|--------|
+| 2026-05-22 | 1.0 | OpenCode | Initial design |
+| 2026-05-22 | 1.1 | OpenCode | Simplified to a single endpoint: auto-detect when frame is None |
+
+*Updated: 2026-05-22*
--- a/docs_v1.0/DESIGN/Redis_Progress_Reporting_V1.0.md
+++ b/docs_v1.0/DESIGN/Redis_Progress_Reporting_V1.0.md
@@ -0,0 +1,270 @@
+---
+document_type: "design_doc"
+service: "MOMENTRY_CORE"
+title: "Redis Progress Reporting V1.0"
+version: "V1.0"
+date: "2026-05-17"
+author: "M5"
+status: "draft"
+---
+
+# Redis Progress Reporting V1.0
+
+| Item | Value |
+|------|------|
+| Service | `MOMENTRY_CORE` |
+| Version | V1.0 |
+| Date | 2026-05-17 |
+| Author | M5 (OpenCode) |
+| Status | Draft |
+
+## 1. Overview
+
+This document defines the standardized progress reporting architecture for Momentry Core processors. It replaces the inconsistent ad-hoc progress patterns found across `scripts/`, `src/worker/`, and `src/api/`.
+
+### 1.1 Problems Addressed
+
+| # | Problem | Detail |
+|---|---------|--------|
+| 1 | Worker Redis key does not match the `OPERATIONS/MOMENTRY_CORE_REDIS_KEYS.md` V1.0 spec | Worker writes `worker:job:{uuid}:processor:{name}` instead of the spec's `job:{uuid}:processor:{name}` |
+| 2 | Progress API reads the wrong key | `get_progress()` reads `worker:job:{uuid}:processor:{name}`, which does not match the Playground subscriber's writes to `job:{uuid}:processor:{name}` |
+| 3 | Swift processors (Face/OCR/Pose) lack a RedisPublisher | Progress is lost; only stdout text is emitted |
+| 4 | ASRX/Story/Visual chunk have no incremental progress | Only start and complete events, no `current/total` updates |
+| 5 | `frames_processed` / `chunks_produced` are never updated in real time | Worker writes the processor hash only at start and exit |
+| 6 | No `output_count` / `output_type` fields | Impossible to know how many faces/objects/segments were produced |
+
+### 1.2 Key Design Decisions
+
+| Decision | Rationale |
+|----------|-----------|
+| Progress unit = frames for video processors | All media-level processors work frame by frame |
+| Output count separate from progress | Processors may produce N outputs per frame (multiple faces, objects) |
+| Pub/sub for real-time, Hash for final state | Pub/sub is transient; Hash persists for API queries |
+
+---
+
+## 2. Redis Key Architecture
+
+### 2.1 Key Patterns
+
+All keys use the configured `REDIS_KEY_PREFIX` (default: `momentry:` for production, `momentry_dev:` for playground).
+
+| Pattern | Type | TTL | Purpose | Owner |
+|---------|------|-----|---------|-------|
+| `{prefix}progress:{uuid}` | Pub/Sub | — | Real-time progress messages | Python scripts |
+| `{prefix}job:{uuid}` | Hash | 24h | Per-video job state | Worker |
+| `{prefix}job:{uuid}:processor:{name}` | Hash | 24h | Per-processor final state | Worker |
+| `{prefix}job:{uuid}:processor:{name}:output_count` | String | 24h | Output count by type | Worker |
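The key patterns above can be centralized in a few small builders so that publishers and readers cannot drift apart. This is a minimal sketch: the function names and the `JOB_TTL_SECONDS` constant are hypothetical, and the default prefix is taken from the production default stated in this section.

```python
DEFAULT_PREFIX = "momentry:"      # production default; playground uses "momentry_dev:"
JOB_TTL_SECONDS = 24 * 60 * 60    # the 24h TTL from the key table, in seconds


def progress_channel(uuid, prefix=DEFAULT_PREFIX):
    """Pub/Sub channel carrying real-time progress messages."""
    return f"{prefix}progress:{uuid}"


def job_key(uuid, prefix=DEFAULT_PREFIX):
    """Hash holding per-video job state."""
    return f"{prefix}job:{uuid}"


def processor_key(uuid, name, prefix=DEFAULT_PREFIX):
    """Hash holding the per-processor state."""
    return f"{job_key(uuid, prefix)}:processor:{name}"


def output_count_key(uuid, name, prefix=DEFAULT_PREFIX):
    """String key holding the output count for one processor."""
    return f"{processor_key(uuid, name, prefix)}:output_count"
```

For example, `processor_key("abc", "face")` yields `momentry:job:abc:processor:face`, with no `worker:` segment.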
+
+### 2.2 Processor Hash Fields
+
+```
+{prefix}job:{uuid}:processor:{name}
+├── status          String   running / completed / failed / pending
+├── current         u32      Units processed (frames for video processors)
+├── total           u32      Total units
+├── output_count    u32      Output items produced (faces, objects, segments)
+├── output_type     String   Type name of output: faces / objects / segments / cuts / etc.
+├── pid             i32      OS process ID (0 if not running)
+├── error           String   Error message if failed
+└── updated_at      String   ISO 8601 timestamp
+```
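As an illustration of the field layout, the sketch below turns one decoded progress message into the flat string mapping that a Redis Hash stores. The function name `to_hash_fields` and its defaults are assumptions; the actual worker is written in Rust and may populate the fields differently.

```python
from datetime import datetime, timezone


def to_hash_fields(message, status="running", pid=0, error="", now=None):
    """Map a decoded progress message onto the processor hash fields.

    Redis hash values are strings, so numeric fields are stringified here.
    """
    now = now or datetime.now(timezone.utc)
    return {
        "status": status,
        "current": str(message["current"]),
        "total": str(message["total"]),
        # Optional in the message; default to "nothing produced yet".
        "output_count": str(message.get("output_count", 0)),
        "output_type": message.get("output_type", ""),
        "pid": str(pid),
        "error": error,
        "updated_at": now.isoformat(),  # ISO 8601
    }
```

The returned dictionary has exactly the eight fields listed above, so it can be handed to any Redis client's hash-set call as a single mapping.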
+
+### 2.3 Migrated Keys
+
+The following key patterns from the original implementation are REMOVED:
+
+| Old Key | Reason |
+|---------|--------|
+| `{prefix}worker:job:{uuid}:processor:{name}` | Non-standard prefix — not in `MOMENTRY_CORE_REDIS_KEYS.md` spec |
+| `{prefix}job:{uuid}:processor:{name}:status` (flat) | Redundant — status stored in Hash |
+| `{prefix}job:{uuid}:processor:{name}:progress` (flat) | Replaced by `current` + `total` for percent calculation |
+| `{prefix}job:{uuid}:processor:{name}:current` (flat) | Replaced by Hash fields |
+| `{prefix}job:{uuid}:processor:{name}:total` (flat) | Replaced by Hash fields |
+| `{prefix}job:{uuid}:processor:{name}:started_at` (flat) | Replaced by Hash `updated_at` |
+
+---
+
+## 3. Pub/Sub Message Format
+
+### 3.1 Channel
+
+```
+{prefix}progress:{uuid}
+```
+
+### 3.2 Message JSON
+
+```json
+{
+  "processor": "face",
+  "current": 150,
+  "total": 162696,
+  "output_count": 423,
+  "output_type": "faces",
+  "message": "Processing frame 150",
+  "timestamp": 1700000000
+}
+```
+
+### 3.3 Field Definitions
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `processor` | String | ✅ | Processor name: asr / asrx / yolo / ocr / face / pose / cut / story / visual_chunk |
+| `current` | u32 | ✅ | Units processed (frames for video processors) |
+| `total` | u32 | ✅ | Total units |
+| `output_count` | u32 | ❌ | Output items produced so far |
+| `output_type` | String | ❌ | Type name: faces / objects / segments / cuts / text_regions / persons / speakers / stories / visual_chunks |
+| `message` | String | ❌ | Human-readable progress description |
+| `timestamp` | u64 | ✅ | Unix timestamp in seconds |
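A publisher-side helper could enforce the required fields and drop unset optional ones before serializing. The sketch below uses only the standard library; `build_progress_message` is a hypothetical name and is not the API of `redis_publisher.py`.

```python
import json
import time

# Processor names listed in the field definitions.
PROCESSORS = {
    "asr", "asrx", "yolo", "ocr", "face", "pose", "cut", "story", "visual_chunk",
}


def build_progress_message(processor, current, total, output_count=None,
                           output_type=None, message=None, timestamp=None):
    """Serialize one progress message; optional fields are omitted when unset."""
    if processor not in PROCESSORS:
        raise ValueError(f"unknown processor: {processor}")
    if current < 0 or total < 0:
        raise ValueError("current and total are unsigned")
    payload = {
        "processor": processor,
        "current": current,
        "total": total,
        "timestamp": int(time.time()) if timestamp is None else timestamp,
    }
    optional = {
        "output_count": output_count,
        "output_type": output_type,
        "message": message,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return json.dumps(payload)
```

Rejecting unknown processor names at the publisher keeps malformed messages off the channel, so the subscriber does not need to guess which hash to update.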
+
+---
+
+## 4. Per-Processor Metrics
+
+| Processor | current/total Unit | output_type | When to Publish |
+|-----------|-------------------|-------------|-----------------|
+| ASR | frames | `segments` | Every 100 segments processed |
+| ASRX | frames | `speakers` | Every processing stage |
+| YOLO | frames | `objects` | Every 500 frames |
+| OCR | frames | `text_regions` | Every 5% |
+| Face | frames | `faces` | Every batch (5% of frames) |
+| Pose | frames | `persons` | Every 10% |
+| CUT | frames | `cuts` | Every scene detected |
+| Story | chunks | `stories` | Every chunk processed |
+| Visual chunk | frames | `visual_chunks` | Every chunk processed |
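The "When to Publish" column mixes two throttling styles: fixed unit counts (for example every 500 frames) and percentage steps (for example every 5%). A minimal sketch of both checks follows; the helper names are hypothetical and the integer bucketing is one possible interpretation of "every N%".

```python
def every_n(current, n):
    """True on every n-th processed unit (current is a 1-based count)."""
    return current > 0 and current % n == 0


def crossed_percent_step(previous, current, total, step_percent):
    """True when progress moved from one step_percent bucket into the next."""
    if total <= 0:
        return False

    def bucket(units):
        # Integer percent first, then the index of the step it falls in.
        return (units * 100 // total) // step_percent

    return bucket(current) > bucket(previous)
```

With `total=1000` and `step_percent=5`, moving from 49 to 50 processed units crosses into the 5% bucket and triggers a publish, while moving from 50 to 51 does not.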
+
+### 4.1 Output Type Enum
+
+```rust
+pub enum OutputType {
+    Segments,       // ASR
+    Speakers,       // ASRX
+    Objects,        // YOLO
+    TextRegions,    // OCR
+    Faces,          // Face
+    Persons,        // Pose
+    Cuts,           // CUT
+    Stories,        // Story
+    VisualChunks,   // Visual chunk
+}
+```
+
+---
+
+## 5. Data Flow
+
+```
+┌──────────────────┐     Pub/Sub                          ┌──────────────────────┐
+│  Python Processor │ ───────── progress:{uuid} ──────────→│  Worker (subscriber) │
+│  (ASR/YOLO/Face)  │     {current, total,                 │                      │
+│                   │      output_count, output_type}       │  ──→ HSET            │
+└──────────────────┘                                       │  job:{uuid}:         │
+                                                           │  processor:{name}    │
+┌──────────────────┐                                       │                      │
+│  Swift Processor  │ ──→ Python wrapper ──→ pub/sub        │  (status, current,   │
+│  (Face/OCR/Pose)  │     (add RedisPublisher)             │   total, output_count,│
+└──────────────────┘                                       │   output_type)       │
+                                                           └──────────┬───────────┘
+                                                                      │ HGETALL
+                                                           ┌──────────▼───────────┐
+                                                           │  Progress API        │
+                                                           │  GET /progress/:uuid │
+                                                           │                     │
+                                                           │  ─→ compute %       │
+                                                           │  ─→ return JSON     │
+                                                           └─────────────────────┘
+```
+
+---
+
+## 6. Implementation Plan
+
+### Phase 1: Python Processor RedisPublisher
+
+| Task | Files | Effort |
+|------|-------|--------|
+| Add `RedisPublisher` to `face_processor.py` | `scripts/face_processor.py` | Medium |
+| Add `RedisPublisher` to `ocr_processor.py` | `scripts/ocr_processor.py` | Medium |
+| Add `RedisPublisher` to `pose_processor.py` | `scripts/pose_processor.py` | Medium |
+| Add incremental `.progress()` to `asrx_processor_custom.py` | `scripts/asrx_processor_custom.py` | Low |
+| Standardize pub/sub message to include `output_count`, `output_type` | All processor scripts | Low |
+
+### Phase 2: Worker
+
+| Task | Files | Effort |
+|------|-------|--------|
+| Fix Redis key from `worker:job:` to `job:` | `src/worker/processor.rs`, `src/core/db/redis_client.rs` | Low |
+| Subscribe to `progress:{uuid}` channel in `run_processor()` | `src/worker/processor.rs` | Medium |
+| HSET Processor Hash on each progress message | `src/worker/processor.rs` | Medium |
+| Set `output_count` and `output_type` from pub/sub message | `src/worker/processor.rs` | Low |
+
+### Phase 3: Progress API
+
+| Task | Files | Effort |
+|------|-------|--------|
+| Read `output_count`, `output_type` from Redis Hash | `src/api/server.rs` | Low |
+| Compute percentage from `current` / `total` | `src/api/server.rs` | Low |
+| Return `output_count`, `output_type` in response JSON | `src/api/server.rs` | Low |
+| Remove `worker:` fallback path | `src/api/server.rs` | Low |
+
+### Phase 4: Cleanup
+
+| Task | Files | Effort |
+|------|-------|--------|
+| Remove old `worker:job:` keys from Redis | Deployment script | Low |
+| Remove `update_processor_progress()` DB path (stale `processing_status` JSONB) | `src/core/db/postgres_db.rs` | Medium |
+
+---
+
+## 7. API Response Changes
+
+### ProgressResponse (new fields)
+
+```json
+{
+  "processors": [
+    {
+      "name": "face",
+      "status": "running",
+      "current": 150,
+      "total": 162696,
+      "progress": 0,
+      "frames_processed": 150,
+      "output_count": 423,
+      "output_type": "faces"
+    }
+  ]
+}
+```
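The `progress` value of 0 in the example is consistent with an integer percentage computed from `current` and `total`, since 150 of 162696 frames is under 1%. The sketch below assumes floor rounding and a clamp to 100; neither is stated in this document, and `progress_percent` is a hypothetical name.

```python
def progress_percent(current, total):
    """Integer percent complete, floored; 0 when total is unknown or zero."""
    if total <= 0:
        return 0
    return min(100, current * 100 // total)
```

Guarding `total <= 0` avoids a division by zero for processors that have not yet reported a total.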
+
+---
+
+## 8. Dependencies
+
+| Component | Version | Role |
+|-----------|---------|------|
+| Redis | ≥ 6.0 | Pub/Sub + Hash storage |
+| `redis_publisher.py` | Existing | Python → Redis pub/sub client |
+| `redis_client.rs` | Existing | Rust Redis client for worker + API |
+
+---
+
+## 9. References
+
+| Doc | Relation |
+|-----|----------|
+| `OPERATIONS/MOMENTRY_CORE_REDIS_KEYS.md` | Parent spec — this doc supersedes sections 4, 7, 8 |
+| `DESIGN/VIDEO_PROCESSING_SPEC.md` §2.3 | Original progress design (ProcessProgress struct) |
+| `src/worker/processor.rs` | Worker progress write implementation |
+| `scripts/redis_publisher.py` | Python pub/sub client |
+| `src/api/server.rs` (get_progress) | Progress API handler |
+
+---
+
+## Version History
+
+| Version | Date | Author | Change |
+|---------|------|--------|--------|
+| V1.0 | 2026-05-17 | M5 (OpenCode) | Initial draft — replaces ad-hoc progress patterns |