fix: ASRX duplication, TKG edges, trace ingest, and add pipeline progress publishing
- ASRX handler no longer stores duplicate 'asr' pre_chunks
- Pre_chunks storage made idempotent (delete-before-insert)
- Rule 1 + trace_ingest changed to query 'asrx' not 'asr'
- Trace chunks removed (dynamic from TKG/Qdrant)
- TKG scroll_face_points fixed: trace_id >= 1 (not == 1)
- TKG AsrxSegmentEntry: start/end -> start_time/end_time (match ASRX JSON)
- Unregister error handling: log instead of silent discard
- Add publish_pipeline_progress calls at each pipeline stage (processors, rule1, face_trace, identity_agent, TKG, rule2, completion)
@@ -0,0 +1,81 @@
---
title: Charade Identity Processing Fix Report
date: 2026-06-29
author: OpenCode
status: completed
---

## Summary

**Problem**: Identity processing for the Charade file (UUID: c36f35685177c981aa139b66bbbccc5b) failed because of corrupted data and missing TKG nodes.

**Root Cause**: A broken dependency chain, in which each failure blocked the next stage:
- face_detections had 3x duplicate records (12726 instead of 4242)
- All trace_id = NULL (UPDATE failed)
- TKG Phase 2.5 couldn't create face_track nodes (needs trace_id)
- Identity Agent couldn't mark suggestions (needs TKG nodes)

## Fix Steps

### Step 1: Clean Duplicate Data ✅
- Deleted 8484 duplicate records
- 12726 → 4242 unique face_detections
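
The cleanup in Step 1 amounts to keeping one row per logical detection and deleting the rest. Below is a minimal sketch of that pattern, using an in-memory SQLite table as a stand-in for PostgreSQL. The column names (`file_uuid`, `frame_number`, `bbox`) and the choice of duplicate key are assumptions for illustration, not the actual schema or the statement that was run.

```python
import sqlite3


def delete_duplicate_detections(conn: sqlite3.Connection) -> int:
    """Keep the lowest-id row per (file_uuid, frame_number, bbox); delete the rest.

    Returns the number of rows deleted.
    """
    cur = conn.execute(
        """
        DELETE FROM face_detections
        WHERE id NOT IN (
            SELECT MIN(id)
            FROM face_detections
            GROUP BY file_uuid, frame_number, bbox
        )
        """
    )
    conn.commit()
    return cur.rowcount


# Demo data: 4 distinct detections, each stored 3 times (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE face_detections ("
    "id INTEGER PRIMARY KEY, file_uuid TEXT, frame_number INTEGER, "
    "bbox TEXT, trace_id INTEGER)"
)
rows = [("demo", frame, "0,0,10,10") for frame in range(4)] * 3
conn.executemany(
    "INSERT INTO face_detections (file_uuid, frame_number, bbox) VALUES (?, ?, ?)",
    rows,
)

deleted = delete_duplicate_detections(conn)
remaining = conn.execute("SELECT COUNT(*) FROM face_detections").fetchone()[0]
print(deleted, remaining)
```

In the demo, 12 stored rows shrink to 4, the same 3x ratio as the 12726 → 4242 cleanup reported above.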

### Step 2: Write trace_id ✅
- store_traced_faces.py successfully updated DB
- 4242 faces with trace_id (100% populated)
- 426 unique traces

### Step 3: Create TKG Nodes ✅
- Created 426 face_track nodes via SQL
- Fixed external_id format: "face_track_*" (matches Rust code)

### Step 4: Run Identity Agent ✅
- Identity matching: 2 traces matched to Audrey Hepburn
- TKG marking: 2/2 nodes marked as "suggested"

## Final Results

| Metric | Before | After |
|--------|--------|-------|
| face_detections | 12726 (3x duplicates) | 4242 (unique) |
| trace_id populated | 0 | 4242 (100%) |
| TKG face_track nodes | 0 | 426 |
| Identity suggestions | 0 | 2 (Audrey Hepburn) |

**Identity Matches**:
- Trace 202: Audrey Hepburn (score=0.6002)
- Trace 311: Audrey Hepburn (score=0.6724)

## Technical Details

### Data Sources
- face.json: 3176 frames, 4242 faces
- face_traced.json: 426 traces (IoU tracking)
- Qdrant _faces: 374 traces with embeddings
- Qdrant _seeds: 2 TMDb seeds

### Tools Used
- PostgreSQL: face_detections, tkg_nodes tables
- Python: store_traced_faces.py, identity_matcher.py
- Qdrant: _faces, _seeds collections

## Next Steps

1. User confirmation: Check suggested identities via Portal UI
2. Manual confirmation: Confirm Audrey Hepburn matches
3. Propagation: Run Round 2 matching (propagate confirmed identities)
4. Stranger clustering: Cluster unmatched traces (TH=0.40)

## Files Modified

- PostgreSQL: public.face_detections (deleted 8484 duplicates)
- PostgreSQL: public.tkg_nodes (created 426 face_track nodes)
- Qdrant: _faces collection (updated 3176 trace_ids)

## Related Documents

- docs/PROCESSING_PIPELINE.md
- src/core/processor/tkg.rs:550-683 (build_face_track_nodes)
- scripts/store_traced_faces.py (trace_id storage)
- scripts/identity_matcher.py (TMDb matching)
docs_v1.0/M4_workspace/2026-06-30_cut_escape_fix.md
@@ -0,0 +1,116 @@
---
title: Cut Scene Detection Escape Fix
date: 2026-06-30
author: OpenCode
status: completed
---

## Summary

**Problem**: Cut scene detection returned only 1 scene (the fallback) instead of 833 scenes for the Charade video.

**Root Cause**: Line 68 of the Python script `cut_processor.py` wrote the filter with `\\\\` (four backslashes in the source literal), so ffprobe received `\\` instead of `\`. Scene detection then failed and returned 0 scene times, and the processor fell back to a single scene.
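
The fallback behaviour described above can be sketched as follows. This is an illustrative reconstruction, not the code in `cut_processor.py`: the function name `scenes_from_cut_times` and the `(start, end)` tuple layout are assumptions. It shows why an empty list of scene times collapses into one scene spanning the whole video.

```python
from typing import List, Tuple


def scenes_from_cut_times(
    cut_times: List[float], duration: float
) -> List[Tuple[float, float]]:
    """Turn detected cut timestamps into (start, end) scene intervals.

    With no cut times, the result is a single scene covering the whole video.
    """
    # Keep only cuts strictly inside the video, sorted and de-duplicated.
    cuts = sorted({t for t in cut_times if 0.0 < t < duration})
    bounds = [0.0] + cuts + [duration]
    # Pair each boundary with the next one to form consecutive intervals.
    return list(zip(bounds[:-1], bounds[1:]))


no_cuts = scenes_from_cut_times([], 100.0)
three_cuts = scenes_from_cut_times([10.0, 42.5, 80.0], 100.0)
print(no_cuts)
print(len(three_cuts))
```

In this sketch, n detected cuts always yield n + 1 scenes, so zero cuts yield exactly one.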

## Fix

### Code Changes

1. **scripts/cut_processor.py** line 68:
   - Before: `f"movie={video_path},select='gt(scene\\\\,0.3)',showinfo"`
   - After: `f"movie={video_path},select='gt(scene\\,0.3)',showinfo"`

2. **src/core/processor/cut.rs** line 127:
   - Already correct: `&format!("movie={},select='gt(scene\\,0.3)',showinfo", video_path)`
   - No changes needed

### Escape Analysis

| Source literal | Python string value | ffprobe receives | Result |
|----------------|---------------------|------------------|--------|
| `"\\\\"` | `\\` | `\\` | ❌ 0 scenes |
| `"\\"` | `\` | `\` | ✅ 833 scenes |
| `r"\,"` (raw) | `\,` | `\,` | ✅ 833 scenes |

### Testing

```bash
# Before fix
python3 scripts/cut_processor.py video.mp4 output.json
# Result: 1 scene (fallback)

# After fix
python3 scripts/cut_processor.py video.mp4 output.json
# Result: 833 scenes
```

## Verification

### File: 3dfc20618fb522e795240b5f0e5ff6f0 (Charade)

| Metric | Before | After |
|--------|--------|-------|
| cut.json scenes | 1 | 833 |
| workspace.sqlite pre_chunks (cut) | 12 | 833 |
| Scene 1 end_frame | 162695 (whole video) | 932 |

### Workspace.sqlite Status

```bash
sqlite3 output/3dfc20618fb522e795240b5f0e5ff6f0.workspace.sqlite \
  "SELECT processor_type, COUNT(*) FROM pre_chunks GROUP BY processor_type;"

cut|833
ocr|942
```

## Technical Details

### ffprobe Command

Correct format:
```bash
ffprobe -v quiet -show_entries frame=pts_time -of default=nk=0 \
  -f lavfi "movie=/path/to/video.mp4,select='gt(scene\\,0.3)',showinfo" \
  -show_frames
```

- `scene\\,0.3` inside double quotes in the shell → ffprobe receives `scene\,0.3`
- The `\` escapes the comma in ffmpeg filter syntax

### Python subprocess Behavior

- Without `shell=True`, arguments are passed directly to the executable, with no shell unescaping in between
- Python literal `"\\\\"` → subprocess receives `\\`
- Python literal `"\\"` → subprocess receives `\`
- Raw string `r"\,"` → subprocess receives `\,`
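
One way to check the escape level without running ffprobe is to build the filter string and count the backslashes that would actually reach the process. The helper name `build_scene_filter` is hypothetical; the filter text mirrors the "After" line quoted under Code Changes. A check like this could also be a starting point for the unit test suggested under Next Steps.

```python
def build_scene_filter(video_path: str, threshold: float = 0.3) -> str:
    """Build the lavfi filter string; exactly one backslash precedes the comma."""
    return f"movie={video_path},select='gt(scene\\,{threshold})',showinfo"


# Four backslashes in the source literal leave two in the string value.
buggy = "select='gt(scene\\\\,0.3)'"
# Two backslashes in the source literal leave one in the string value.
fixed = "select='gt(scene\\,0.3)'"
# A raw string keeps the single backslash exactly as typed.
raw = r"select='gt(scene\,0.3)'"

print(buggy.count("\\"), fixed.count("\\"), raw.count("\\"))

filter_arg = build_scene_filter("/path/to/video.mp4")
print(filter_arg)
```

Passing `filter_arg` as a single element of the argument list (no `shell=True`) means the string reaches ffprobe unchanged, so the count seen here is the count ffprobe sees.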

## Impact

### Affected Videos

- Charade (UUID: 3dfc20618fb522e795240b5f0e5ff6f0)
- Other videos registered before this fix may have incorrect scene counts

### Remediation

1. Re-run cut detection for affected videos
2. Update workspace.sqlite pre_chunks
3. If the data is also in PostgreSQL: update the public.pre_chunks table

## Next Steps

1. Verify the fix in production by registering a new video
2. Check whether other videos need remediation
3. Consider adding a unit test for cut escape handling

## Related Files

- scripts/cut_processor.py
- src/core/processor/cut.rs
- src/api/files.rs (register API uses Python script)

## Version History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-06-30 | OpenCode | Initial report |
@@ -0,0 +1,117 @@
# Face Detections Table Cleanup Plan

## Problem
All code that uses the `face_detections` table is incorrect and needs to be changed to use Qdrant workspace traces.

## Correct Architecture

### PostgreSQL
```
identities (global person master table)
├── id
├── uuid
├── name
├── status
└── metadata
```

### Qdrant Payload
```
{prefix}_workspace_traces (512d vectors)
├── file_uuid
├── trace_id
├── frame_number
├── identity_id  ← the binding is stored here
├── bbox
├── confidence
└── embedding
```

## Incorrect Code Locations (197 occurrences)

### 1. Processor Layer (incorrect writes)
- `src/core/processor/processor.rs` - line 744, 1311
- `src/core/processor/job_worker.rs` - line 647
- `src/core/db/workspace_sqlite.rs` - line 257-263 (function definition)
- `src/core/db/postgres_db.rs` - line 2712 (function definition)

### 2. TKG Processor (heavy usage)
- `src/core/processor/tkg.rs` - ~50 uses of the `face_detections` table

### 3. Chunk Ingest
- `src/core/chunk/trace_ingest.rs` - line 10
- `src/core/chunk/rule2_ingest.rs` - line 26

### 4. API Layer (incorrect queries/updates)
- `src/api/identity_api.rs` - 22 occurrences
- `src/api/identity_binding.rs` - 12 occurrences
- `src/api/identities.rs` - 2 occurrences
- `src/api/identity_agent_api.rs` - 7 occurrences
- `src/api/files.rs` - 4 occurrences
- `src/api/media_api.rs` - 3 occurrences

### 5. Identity Layer
- `src/core/identity/storage.rs` - 3 occurrences

## Modification Plan

### Phase 1: Analyze Existing Code
1. Understand how the face_detections table is currently used
2. Understand the structure of Qdrant workspace traces
3. Determine the list of functions that need to change

### Phase 2: Create Qdrant Query Helper Functions
1. Create `QdrantWorkspace` query methods
2. Create the trace-to-identity binding query
3. Create the face matching query
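
As a rough illustration of the trace-to-identity binding from Phase 2, the sketch below builds Qdrant REST request bodies as plain dictionaries: a filter that selects every point of one trace, and a set-payload body that writes `identity_id` onto those points. The helper names are hypothetical, the payload keys follow the layout listed under Qdrant Payload, and the body shape is assumed to match Qdrant's set-payload endpoint (`POST /collections/{collection_name}/points/payload`). Sending the request is left out.

```python
import json
from typing import Any, Dict


def trace_filter(file_uuid: str, trace_id: int) -> Dict[str, Any]:
    """Filter matching all points of one trace in one file (hypothetical helper)."""
    return {
        "must": [
            {"key": "file_uuid", "match": {"value": file_uuid}},
            {"key": "trace_id", "match": {"value": trace_id}},
        ]
    }


def bind_identity_body(file_uuid: str, trace_id: int, identity_id: int) -> Dict[str, Any]:
    """Request body that sets identity_id on every point selected by the filter."""
    return {
        "payload": {"identity_id": identity_id},
        "filter": trace_filter(file_uuid, trace_id),
    }


# Example values are placeholders, not real identifiers from the database.
body = bind_identity_body("demo-file-uuid", 202, 7)
print(json.dumps(body, indent=2))
```

Keeping the binding in the point payload means one filtered update covers every frame of a trace, with no per-row UPDATE against a relational table.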

### Phase 3: Modify the Processor Layer
1. Modify `processor.rs` - remove face_detections writes
2. Modify `job_worker.rs` - remove face_detections queries
3. Modify `workspace_sqlite.rs` - remove face_detections-related functions
4. Modify `postgres_db.rs` - remove face_detections-related functions

### Phase 4: Modify the TKG Processor
1. Refactor `tkg.rs` - use Qdrant workspace traces instead of face_detections
2. Remove the `populate_face_detections_from_face_json` function
3. Modify the face matching logic

### Phase 5: Modify the API Layer
1. Modify `identity_api.rs` - use Qdrant queries
2. Modify `identity_binding.rs` - use Qdrant bindings
3. Modify `identities.rs` - use Qdrant queries
4. Modify `identity_agent_api.rs` - use Qdrant matching
5. Modify `files.rs` - remove face_detections queries
6. Modify `media_api.rs` - remove face_detections queries

### Phase 6: Modify Chunk Ingest
1. Modify `trace_ingest.rs` - use Qdrant traces
2. Modify `rule2_ingest.rs` - use Qdrant traces

### Phase 7: Testing
1. Test face tracking
2. Test identity binding
3. Test TKG construction
4. Test API endpoints

### Phase 8: Cleanup
1. Remove the face_detections table (optional)
2. Update documentation
3. Update tests

## Risk Assessment
- **High risk**: The TKG processor uses face_detections heavily
- **Medium risk**: The API layer needs its query logic refactored
- **Low risk**: Processor-layer changes are relatively simple

## Estimated Time
- Phase 1-2: 2-3 hours
- Phase 3-4: 4-6 hours
- Phase 5-6: 3-4 hours
- Phase 7-8: 2-3 hours
- **Total**: 11-16 hours

## Dependencies
- Qdrant workspace traces must be populated correctly
- face.json must be correctly formatted
- SwiftFacePose must be working properly