fix: ASRX duplication, TKG edges, trace ingest, and add pipeline progress publishing

- ASRX handler no longer stores duplicate 'asr' pre_chunks
- Pre_chunks storage made idempotent (delete-before-insert)
- Rule 1 + trace_ingest changed to query 'asrx' not 'asr'
- Trace chunks removed (dynamic from TKG/Qdrant)
- TKG scroll_face_points fixed: trace_id >= 1 (not == 1)
- TKG AsrxSegmentEntry: start/end -> start_time/end_time (match ASRX JSON)
- Unregister error handling: log instead of silent discard
- Add publish_pipeline_progress calls at each pipeline stage
  (processors, rule1, face_trace, identity_agent, TKG, rule2, completion)
Accusys
2026-07-02 10:43:46 +08:00
parent d791d138f2
commit 3eabd45882
65 changed files with 9481 additions and 3856 deletions

@@ -0,0 +1,81 @@
---
title: Charade Identity Processing Fix Report
date: 2026-06-29
author: OpenCode
status: completed
---
## Summary
**Problem**: Charade file (UUID: c36f35685177c981aa139b66bbbccc5b) identity processing failed because of data corruption and missing TKG nodes.
**Root Cause**: A broken dependency chain, in which each stage needs the output of the one before it:
- face_detections had 3x duplicate records (12726 instead of 4242)
- All trace_id = NULL (UPDATE failed)
- TKG Phase 2.5 couldn't create face_track nodes (needs trace_id)
- Identity Agent couldn't mark suggestions (needs TKG nodes)
## Fix Steps
### Step 1: Clean Duplicate Data ✅
- Deleted 8484 duplicate records
- 12726 → 4242 unique face_detections
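The deduplication in this step can be sketched as follows. This is a minimal illustration against an in-memory SQLite table, not the SQL that was actually run on PostgreSQL; the column names (`id`, `file_uuid`, `frame_number`, `bbox`) and the choice of duplicate key are assumptions made for the example.

```python
import sqlite3


def delete_duplicate_faces(conn: sqlite3.Connection) -> int:
    """Keep the lowest-id row per (file_uuid, frame_number, bbox).

    Returns the number of rows deleted. The duplicate key is an assumption;
    the real table may need a different set of columns.
    """
    cur = conn.execute(
        """
        DELETE FROM face_detections
        WHERE id NOT IN (
            SELECT MIN(id)
            FROM face_detections
            GROUP BY file_uuid, frame_number, bbox
        )
        """
    )
    conn.commit()
    return cur.rowcount


def demo_dedup() -> tuple:
    """Build a table with every face stored three times, then deduplicate it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE face_detections ("
        "id INTEGER PRIMARY KEY, file_uuid TEXT, frame_number INTEGER, "
        "bbox TEXT, trace_id INTEGER)"
    )
    faces = [("demo", frame, "0,0,10,10") for frame in range(4)]
    # Insert each face three times to mimic the 3x duplication.
    conn.executemany(
        "INSERT INTO face_detections (file_uuid, frame_number, bbox) "
        "VALUES (?, ?, ?)",
        faces * 3,
    )
    deleted = delete_duplicate_faces(conn)
    remaining = conn.execute(
        "SELECT COUNT(*) FROM face_detections"
    ).fetchone()[0]
    conn.close()
    return deleted, remaining


if __name__ == "__main__":
    print(demo_dedup())
```

The same arithmetic applies to the real table: 12726 rows minus 8484 deleted duplicates leaves 4242 unique rows.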
### Step 2: Write trace_id ✅
- store_traced_faces.py successfully updated DB
- 4242 faces with trace_id (100% populated)
- 426 unique traces
### Step 3: Create TKG Nodes ✅
- Created 426 face_track nodes via SQL
- Fixed external_id format: "face_track_*" (matches Rust code)
### Step 4: Run Identity Agent ✅
- Identity matching: 2 traces matched to Audrey Hepburn
- TKG marking: 2/2 nodes marked as "suggested"
## Final Results
| Metric | Before | After |
|--------|--------|-------|
| face_detections | 12726 (3x duplicates) | 4242 (unique) |
| trace_id populated | 0 | 4242 (100%) |
| TKG face_track nodes | 0 | 426 |
| Identity suggestions | 0 | 2 (Audrey Hepburn) |
**Identity Matches**:
- Trace 202: Audrey Hepburn (score=0.6002)
- Trace 311: Audrey Hepburn (score=0.6724)
## Technical Details
### Data Sources
- face.json: 3176 frames, 4242 faces
- face_traced.json: 426 traces (IoU tracking)
- Qdrant _faces: 374 traces with embeddings
- Qdrant _seeds: 2 TMDb seeds
### Tools Used
- PostgreSQL: face_detections, tkg_nodes tables
- Python: store_traced_faces.py, identity_matcher.py
- Qdrant: _faces, _seeds collections
## Next Steps
1. User confirmation: Check suggested identities via Portal UI
2. Manual confirmation: Confirm Audrey Hepburn matches
3. Propagation: Run Round 2 matching (propagate confirmed identities)
4. Stranger clustering: Cluster unmatched traces (TH=0.40)
## Files Modified
- PostgreSQL: public.face_detections (deleted 8484 duplicates)
- PostgreSQL: public.tkg_nodes (created 426 face_track nodes)
- Qdrant: _faces collection (updated 3176 trace_ids)
## Related Documents
- docs/PROCESSING_PIPELINE.md
- src/core/processor/tkg.rs:550-683 (build_face_track_nodes)
- scripts/store_traced_faces.py (trace_id storage)
- scripts/identity_matcher.py (TMDb matching)

@@ -0,0 +1,116 @@
---
title: Cut Scene Detection Escape Fix
date: 2026-06-30
author: OpenCode
status: completed
---
## Summary
**Problem**: Cut scene detection returned only 1 scene (fallback) instead of 833 scenes for Charade video.
**Root Cause**: Python script `cut_processor.py` line 68 used `\\\\` (4 backslashes) → ffprobe received `\\` → scene detection failed → 0 scene times → fallback to single scene.
## Fix
### Code Changes
1. **scripts/cut_processor.py** line 68:
- Before: `f"movie={video_path},select='gt(scene\\\\,0.3)',showinfo"`
- After: `f"movie={video_path},select='gt(scene\\,0.3)',showinfo"`
2. **src/core/processor/cut.rs** line 127:
- Already correct: `&format!("movie={},select='gt(scene\\,0.3)',showinfo", video_path)`
- No changes needed
### Escape Analysis
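| Python source literal | Runtime string value | ffprobe receives | Result |
|-----------------------|----------------------|------------------|--------|
| `"scene\\\\,0.3"` (4 backslashes) | `scene\\,0.3` | `scene\\,0.3` | ❌ 0 scenes |
| `"scene\\,0.3"` (2 backslashes) | `scene\,0.3` | `scene\,0.3` | ✅ 832 scenes |
| `r"scene\,0.3"` (raw string) | `scene\,0.3` | `scene\,0.3` | ✅ 832 scenes |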
### Testing
```bash
# Before fix
python3 scripts/cut_processor.py video.mp4 output.json
# Result: 1 scene (fallback)
# After fix
python3 scripts/cut_processor.py video.mp4 output.json
# Result: 833 scenes
```
## Verification
### File: 3dfc20618fb522e795240b5f0e5ff6f0 (Charade)
| Metric | Before | After |
|--------|--------|-------|
| cut.json scenes | 1 | 833 |
| workspace.sqlite pre_chunks (cut) | 12 | 833 |
| Scene 1 end_frame | 162695 (whole video) | 932 |
### Workspace.sqlite Status
```bash
sqlite3 output/3dfc20618fb522e795240b5f0e5ff6f0.workspace.sqlite \
"SELECT processor_type, COUNT(*) FROM pre_chunks GROUP BY processor_type;"
cut|833
ocr|942
```
## Technical Details
### ffprobe Command
Correct format:
```bash
ffprobe -v quiet -show_entries frame=pts_time -of default=nk=0 \
-f lavfi "movie=/path/to/video.mp4,select='gt(scene\\,0.3)',showinfo" \
-show_frames
```
- `scene\\,0.3` in shell → ffprobe receives `scene\,0.3`
- The `\` escapes the comma in ffmpeg filter syntax
### Python subprocess Behavior
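- Without `shell=True`, each argument is passed directly to the executable, so no shell unescaping takes place
- Python literal `"\\\\"` is two backslashes at runtime → subprocess receives `\\`
- Python literal `"\\"` is one backslash at runtime → subprocess receives `\`
- Raw literal `r"\,"` is one backslash followed by the comma → subprocess receives `\,` (a raw string cannot end in a backslash, so the comma is included)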
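The behavior listed above can be checked with a short script. It is a sketch only: the video path is a placeholder, and the child process is the Python interpreter echoing its argument rather than ffprobe, so it demonstrates argument passing, not scene detection.

```python
import subprocess
import sys


def count_backslashes(text: str) -> int:
    """Count the backslash characters actually present in the runtime string."""
    return text.count("\\")


def echo_through_subprocess(arg: str) -> str:
    """Pass arg to a child process without a shell and return what it received."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", arg],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


video_path = "/path/to/video.mp4"  # placeholder, not a real file

# Four backslashes in source: two reach the executable (the bug).
buggy = f"movie={video_path},select='gt(scene\\\\,0.3)',showinfo"
# Two backslashes in source: one reaches the executable (the fix).
fixed = f"movie={video_path},select='gt(scene\\,0.3)',showinfo"
# Raw f-string: one backslash in source, one at runtime.
raw = rf"movie={video_path},select='gt(scene\,0.3)',showinfo"

if __name__ == "__main__":
    print(count_backslashes(buggy))  # 2
    print(count_backslashes(fixed))  # 1
    print(fixed == raw)              # True
    print(echo_through_subprocess(fixed) == fixed)  # True
```

Writing the filter as a raw string keeps the source text identical to what ffprobe receives, which makes this class of mistake easier to spot in review.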
## Impact
### Affected Videos
- Charade (UUID: 3dfc20618fb522e795240b5f0e5ff6f0)
- Other videos registered before this fix may have incorrect scene counts
### Remediation
1. Re-run cut detection for affected videos
2. Update workspace.sqlite pre_chunks
3. If in PostgreSQL: update public.pre_chunks table
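One way to find workspaces that need remediation is to reuse the `pre_chunks` count query shown earlier. The sketch below assumes only the `pre_chunks` table and its `processor_type` column; the helper names and the `max_suspicious` threshold are hypothetical and should be tuned to what a fallback result looks like in practice.

```python
import sqlite3
from typing import Dict


def chunk_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    """Return the number of pre_chunks rows per processor_type."""
    rows = conn.execute(
        "SELECT processor_type, COUNT(*) FROM pre_chunks "
        "GROUP BY processor_type"
    ).fetchall()
    return {processor_type: count for processor_type, count in rows}


def needs_cut_rerun(counts: Dict[str, int], max_suspicious: int = 1) -> bool:
    """Flag a workspace whose cut count looks like the single-scene fallback.

    max_suspicious is a hypothetical, caller-chosen threshold.
    """
    return counts.get("cut", 0) <= max_suspicious


def demo_counts() -> Dict[str, int]:
    """Build a tiny in-memory workspace with one cut chunk and three ocr chunks."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pre_chunks (id INTEGER PRIMARY KEY, processor_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO pre_chunks (processor_type) VALUES (?)",
        [("cut",), ("ocr",), ("ocr",), ("ocr",)],
    )
    counts = chunk_counts(conn)
    conn.close()
    return counts


if __name__ == "__main__":
    counts = demo_counts()
    print(counts, needs_cut_rerun(counts))
```

For a real workspace, open the file with `sqlite3.connect("output/<uuid>.workspace.sqlite")` and pass the connection to `chunk_counts`.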
## Next Steps
1. Verify fix in production by registering new video
2. Check if other videos need remediation
3. Consider adding unit test for cut escape handling
## Related Files
- scripts/cut_processor.py
- src/core/processor/cut.rs
- src/api/files.rs (register API uses Python script)
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-06-30 | OpenCode | Initial report |

@@ -0,0 +1,117 @@
# face_detections Table Cleanup Plan
## Problem
All code that uses the `face_detections` table is incorrect and needs to be changed to use Qdrant workspace traces.
## Correct Architecture
### PostgreSQL
```
identities (global person master table)
├── id
├── uuid
├── name
├── status
└── metadata
```
### Qdrant Payload
```
{prefix}_workspace_traces (512d vectors)
├── file_uuid
├── trace_id
├── frame_number
├── identity_id ← the binding is stored here
├── bbox
├── confidence
└── embedding
```
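The payload layout above, with the identity binding stored on each trace point, can be modeled in memory as follows. This is a sketch for reasoning about the target design, not Qdrant client code: the `TracePoint` class, the helper names, and the field types (for example an integer `identity_id` and a four-float `bbox`) are assumptions, and the embedding vector is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class TracePoint:
    """One point in {prefix}_workspace_traces (payload only, no vector)."""
    file_uuid: str
    trace_id: int
    frame_number: int
    bbox: Tuple[float, float, float, float]  # assumed layout
    confidence: float
    identity_id: Optional[int] = None  # None means the trace is unbound


def traces_for_file(
    points: List[TracePoint], file_uuid: str
) -> Dict[int, List[TracePoint]]:
    """Group the points of one file by trace_id."""
    grouped: Dict[int, List[TracePoint]] = {}
    for point in points:
        if point.file_uuid == file_uuid:
            grouped.setdefault(point.trace_id, []).append(point)
    return grouped


def bind_identity(
    points: List[TracePoint], file_uuid: str, trace_id: int, identity_id: int
) -> int:
    """Set identity_id on every point of a trace; return how many were updated."""
    updated = 0
    for point in points:
        if point.file_uuid == file_uuid and point.trace_id == trace_id:
            point.identity_id = identity_id
            updated += 1
    return updated


def demo_points() -> List[TracePoint]:
    """Two points on trace 1 and one point on trace 2 of the same file."""
    return [
        TracePoint("demo", 1, 10, (0.0, 0.0, 10.0, 10.0), 0.9),
        TracePoint("demo", 1, 11, (1.0, 0.0, 11.0, 10.0), 0.8),
        TracePoint("demo", 2, 10, (50.0, 5.0, 60.0, 15.0), 0.7),
    ]
```

Because the binding lives on the trace points, binding a trace to an identity is a payload update filtered by `file_uuid` and `trace_id`, with no row in a separate `face_detections` table.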
## Incorrect Code Locations (197 occurrences)
### 1. Processor Layer (incorrect writes)
- `src/core/processor/processor.rs` - line 744, 1311
- `src/core/processor/job_worker.rs` - line 647
- `src/core/db/workspace_sqlite.rs` - line 257-263 (function definition)
- `src/core/db/postgres_db.rs` - line 2712 (function definition)
### 2. TKG Processor (heavy usage)
- `src/core/processor/tkg.rs` - ~50 uses of `face_detections`
### 3. Chunk Ingest
- `src/core/chunk/trace_ingest.rs` - line 10
- `src/core/chunk/rule2_ingest.rs` - line 26
### 4. API Layer (incorrect queries/updates)
- `src/api/identity_api.rs` - 22 occurrences
- `src/api/identity_binding.rs` - 12 occurrences
- `src/api/identities.rs` - 2 occurrences
- `src/api/identity_agent_api.rs` - 7 occurrences
- `src/api/files.rs` - 4 occurrences
- `src/api/media_api.rs` - 3 occurrences
### 5. Identity Layer
- `src/core/identity/storage.rs` - 3 occurrences
## Modification Plan
### Phase 1: Analyze Existing Code
1. Understand how the face_detections table is currently used
2. Understand the structure of the Qdrant workspace traces
3. Determine the list of functions that need to change
### Phase 2: Create Qdrant Query Helpers
1. Create `QdrantWorkspace` query methods
2. Create the trace-to-identity binding query
3. Create the face matching query
### Phase 3: Update the Processor Layer
1. Update `processor.rs` - remove face_detections writes
2. Update `job_worker.rs` - remove face_detections queries
3. Update `workspace_sqlite.rs` - remove face_detections functions
4. Update `postgres_db.rs` - remove face_detections functions
### Phase 4: Update the TKG Processor
1. Refactor `tkg.rs` - use Qdrant workspace traces instead of face_detections
2. Remove the `populate_face_detections_from_face_json` function
3. Update the face matching logic
### Phase 5: Update the API Layer
1. Update `identity_api.rs` - use Qdrant queries
2. Update `identity_binding.rs` - use Qdrant bindings
3. Update `identities.rs` - use Qdrant queries
4. Update `identity_agent_api.rs` - use Qdrant matching
5. Update `files.rs` - remove face_detections queries
6. Update `media_api.rs` - remove face_detections queries
### Phase 6: Update Chunk Ingest
1. Update `trace_ingest.rs` - use Qdrant traces
2. Update `rule2_ingest.rs` - use Qdrant traces
### Phase 7: Testing
1. Test face tracking
2. Test identity binding
3. Test TKG construction
4. Test API endpoints
### Phase 8: Cleanup
1. Remove the face_detections table (optional)
2. Update documentation
3. Update tests
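## Risk Assessment
- **High risk**: The TKG processor uses face_detections heavily
- **Medium risk**: The API layer needs its query logic refactored
- **Low risk**: The Processor layer changes are relatively simple
## Estimated Time
- Phase 1-2: 2-3 hours
- Phase 3-4: 4-6 hours
- Phase 5-6: 3-4 hours
- Phase 7-8: 2-3 hours
- **Total**: 11-16 hours
## Dependencies
- Qdrant workspace traces must be populated correctly
- face.json must be in the correct format
- SwiftFacePose must be working properly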