fix: ASRX duplication, TKG edges, trace ingest, and add pipeline progress publishing

- ASRX handler no longer stores duplicate 'asr' pre_chunks
- Pre_chunks storage made idempotent (delete-before-insert)
- Rule 1 + trace_ingest changed to query 'asrx' not 'asr'
- Trace chunks removed (dynamic from TKG/Qdrant)
- TKG scroll_face_points fixed: trace_id >= 1 (not == 1)
- TKG AsrxSegmentEntry: start/end -> start_time/end_time (match ASRX JSON)
- Unregister error handling: log instead of silent discard
- Add publish_pipeline_progress calls at each pipeline stage
  (processors, rule1, face_trace, identity_agent, TKG, rule2, completion)
Accusys
2026-07-02 10:43:46 +08:00
parent d791d138f2
commit 3eabd45882
65 changed files with 9481 additions and 3856 deletions


@@ -0,0 +1,545 @@
<!-- module: progress -->
<!-- description: Real-time progress tracking for processing pipeline, TKG build, and identity agent -->
<!-- depends: 01_auth, 03_register, 05_process -->
# Progress Tracking — API Workspace Module
## Overview
The progress tracking system provides real-time visibility into all processing stages:
| System | Redis Key | Coverage |
|--------|-----------|----------|
| **Processor Progress** | `{prefix}progress:{file_uuid}` | 7 main processors (cut, asr, asrx, ocr, face, pose, appearance) |
| **TKG Progress** | `{prefix}progress:{file_uuid}:tkg` | 18 TKG build phases (9 node types + 8 edge types + face_tracing) |
| **Agent Progress** | `{prefix}progress:{file_uuid}:agent` | 5 Identity Agent phases |
---
## `POST /api/v1/progress/:file_uuid`
**Auth**: Required
**Scope**: file-level
Get real-time processing progress including processor status, TKG build phases, and identity agent phases.
### Example
```bash
curl -s -X POST "$API/api/v1/progress/$FILE_UUID" \
  -H "X-API-Key: $KEY" | jq '.'
```
### Response (200)
```json
{
  "file_uuid": "3a6c1865...",
  "overall_progress": 71,
  "cpu_percent": 45.2,
  "gpu_percent": 30.1,
  "memory_percent": 62.4,
  "processors": [
    {"name": "asr", "status": "complete", "progress": 100, "current": 0, "total": 0, "message": "done"},
    {"name": "face", "status": "complete", "progress": 100, "current": 0, "total": 0, "message": "done"},
    {"name": "pose", "status": "complete", "progress": 100, "current": 0, "total": 0, "message": "done"}
  ],
  "tkg_progress": {
    "file_uuid": "3a6c1865...",
    "phase": "mutual_gaze_edges",
    "phase_index": 13,
    "total_phases": 18,
    "phase_progress": 0.8,
    "overall_progress": 0.72,
    "stats": {
      "total_faces": 1250,
      "traced_faces": 1250,
      "total_traces": 45,
      "face_track_nodes": 45,
      "gaze_track_nodes": 45,
      "lip_track_nodes": 12,
      "text_region_nodes": 8,
      "appearance_nodes": 38,
      "accessory_nodes": 5,
      "object_nodes": 156,
      "hand_nodes": 22,
      "speaker_nodes": 14,
      "co_occurrence_edges": 890,
      "speaker_face_edges": 120,
      "face_face_edges": 234,
      "mutual_gaze_edges": 67,
      "total_nodes": 345,
      "total_edges": 1311
    },
    "message": "67 mutual gaze edges",
    "updated_at": "2026-07-02T10:30:00Z"
  },
  "agent_progress": {
    "file_uuid": "3a6c1865...",
    "phase": "completed",
    "phase_index": 5,
    "total_phases": 5,
    "phase_progress": 1.0,
    "overall_progress": 1.0,
    "stats": {
      "total_faces": 1250,
      "total_traces": 45,
      "clusters": 18,
      "identities_created": 18,
      "tmdb_matches": 5,
      "speaker_bindings": 12,
      "confirmations": 18
    },
    "message": "Identity Agent processing completed",
    "updated_at": "2026-07-02T10:28:00Z"
  }
}
```
### Field Descriptions
#### Top Level
| Field | Type | Description |
|-------|------|-------------|
| `file_uuid` | string | 32-char hex UUID |
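| `overall_progress` | integer | Overall processor progress (0-100) |
| `cpu_percent` | float | CPU utilization, in percent |
| `gpu_percent` | float | GPU utilization, in percent |
| `memory_percent` | float | Memory utilization, in percent |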
| `processors` | array | Per-processor status |
| `tkg_progress` | object | TKG build progress (null if not started) |
| `agent_progress` | object | Identity Agent progress (null if not started) |
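Here is a minimal polling sketch for this endpoint. The loop is written as a shell function so the fetch step can be swapped out. It makes three assumptions:

- `$API`, `$KEY` and `$FILE_UUID` are set as in the example above.
- `curl` and `jq` are installed.
- The top-level `overall_progress` reaching 100 means all processors have finished.

`poll_until_complete`, `fetch_processor_progress`, `POLL_INTERVAL` and `POLL_TIMEOUT` are hypothetical client-side names, not part of the API.

```bash
#!/usr/bin/env bash
# poll_until_complete FETCH_CMD
# Runs FETCH_CMD repeatedly. FETCH_CMD must print the top-level
# overall_progress value (integer 0-100).
# Return codes:
#   0  the value reached 100
#   1  the fetch failed, the value was not an integer,
#      or roughly POLL_TIMEOUT seconds of sleeping have passed
# POLL_INTERVAL / POLL_TIMEOUT are hypothetical client-side knobs.
poll_until_complete() {
  local fetch_cmd=$1
  local interval=${POLL_INTERVAL:-5}
  local timeout=${POLL_TIMEOUT:-3600}
  local elapsed=0 pct
  while :; do
    pct=$("$fetch_cmd") || return 1
    # jq prints "null" or nothing when the field is missing; treat that as an error.
    case $pct in
      ''|*[!0-9]*) echo "unexpected progress value: '$pct'" >&2; return 1 ;;
    esac
    echo "overall_progress: ${pct}%" >&2
    if [ "$pct" -ge 100 ]; then return 0; fi
    if [ "$elapsed" -ge "$timeout" ]; then return 1; fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# Fetcher for the endpoint above (assumes $API, $KEY, $FILE_UUID, curl and jq).
fetch_processor_progress() {
  curl -sf -X POST "$API/api/v1/progress/$FILE_UUID" \
    -H "X-API-Key: $KEY" | jq -r '.overall_progress'
}
```

Usage: `poll_until_complete fetch_processor_progress && echo "processors done"`. Passing the fetcher by name keeps the loop testable with a stub in place of the real request.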
#### TKG Progress Fields
| Field | Type | Description |
|-------|------|-------------|
| `phase` | string | Current phase name (see TKG Phases below) |
| `phase_index` | integer | Current phase index (0-17) |
| `total_phases` | integer | Total phases: 18 |
| `phase_progress` | float | Progress within current phase (0.0-1.0) |
| `overall_progress` | float | Overall TKG progress (0.0-1.0) |
| `stats` | object | Counts for all node and edge types |
| `message` | string | Human-readable status message |
#### TKG Phases (18 total)
| Index | Phase | Description |
|-------|-------|-------------|
| 0 | `face_tracing` | Populate trace_id from face.json |
| 1 | `face_track_nodes` | Build face_track nodes |
| 2 | `gaze_track_nodes` | Build gaze_track nodes |
| 3 | `lip_track_nodes` | Build lip_track nodes |
| 4 | `text_region_nodes` | Build text_region nodes |
| 5 | `appearance_nodes` | Build appearance_trace nodes |
| 6 | `accessory_nodes` | Build accessory nodes |
| 7 | `object_nodes` | Build yolo_object nodes |
| 8 | `hand_nodes` | Build hand nodes |
| 9 | `speaker_nodes` | Build speaker nodes |
| 10 | `co_occurrence_edges` | Build co_occurrence edges |
| 11 | `speaker_face_edges` | Build speaker_face edges |
| 12 | `face_face_edges` | Build face_face edges |
| 13 | `mutual_gaze_edges` | Build mutual_gaze edges |
| 14 | `lip_sync_edges` | Build lip_sync edges |
| 15 | `has_appearance_edges` | Build has_appearance edges |
| 16 | `wears_edges` | Build wears edges |
| 17 | `hand_object_edges` | Build hand_object edges |
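The `phase` field already carries the phase name. A client that only has `phase_index`, or that wants to tell node phases from edge phases, could use a lookup like the sketch below. The list is copied from the table above. `TKG_PHASES`, `tkg_phase_name` and `tkg_phase_kind` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# TKG build phases in index order (0-17), copied from the table above.
TKG_PHASES="face_tracing
face_track_nodes gaze_track_nodes lip_track_nodes text_region_nodes
appearance_nodes accessory_nodes object_nodes hand_nodes speaker_nodes
co_occurrence_edges speaker_face_edges face_face_edges mutual_gaze_edges
lip_sync_edges has_appearance_edges wears_edges hand_object_edges"

# tkg_phase_name INDEX: prints the phase name.
# Fails for non-numeric or out-of-range input.
tkg_phase_name() {
  local name
  case $1 in ''|*[!0-9]*) return 1 ;; esac
  # Word-split the list onto one name per line, then pick line INDEX+1.
  name=$(printf '%s\n' $TKG_PHASES | sed -n "$(($1 + 1))p")
  [ -n "$name" ] || return 1
  printf '%s\n' "$name"
}

# tkg_phase_kind INDEX: prints "tracing" (index 0), "node" (1-9) or "edge" (10-17).
tkg_phase_kind() {
  local name
  name=$(tkg_phase_name "$1") || return 1
  case $name in
    *_nodes) echo node ;;
    *_edges) echo edge ;;
    *) echo tracing ;;
  esac
}
```

Usage, with `jq` extracting the index: `tkg_phase_kind "$(curl -s -X POST "$API/api/v1/progress/$FILE_UUID" -H "X-API-Key: $KEY" | jq -r '.tkg_progress.phase_index')"`.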
#### TKG Stats Fields
| Field | Type | Description |
|-------|------|-------------|
| `total_faces` | integer | Total face detections |
| `traced_faces` | integer | Faces with trace_id assigned |
| `total_traces` | integer | Unique trace count |
| `face_track_nodes` | integer | Face track nodes created |
| `gaze_track_nodes` | integer | Gaze track nodes created |
| `lip_track_nodes` | integer | Lip track nodes created |
| `text_region_nodes` | integer | Text region nodes created |
| `appearance_nodes` | integer | Appearance trace nodes created |
| `accessory_nodes` | integer | Accessory nodes created |
| `object_nodes` | integer | YOLO object nodes created |
| `hand_nodes` | integer | Hand nodes created |
| `speaker_nodes` | integer | Speaker nodes created |
| `co_occurrence_edges` | integer | Co-occurrence edges created |
| `speaker_face_edges` | integer | Speaker-face edges created |
| `face_face_edges` | integer | Face-face edges created |
| `mutual_gaze_edges` | integer | Mutual gaze edges created |
| `lip_sync_edges` | integer | Lip sync edges created |
| `has_appearance_edges` | integer | Has-appearance edges created |
| `wears_edges` | integer | Wears edges created |
| `hand_object_edges` | integer | Hand-object edges created |
| `total_nodes` | integer | Total nodes (sum of all node types) |
| `total_edges` | integer | Total edges (sum of all edge types) |
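Because the table defines `total_nodes` and `total_edges` as sums, a client can sanity-check a stats object. Here is a sketch, assuming the stats are first flattened to `field value` lines with `jq`. `check_tkg_totals` is a hypothetical name.

```bash
#!/usr/bin/env bash
# check_tkg_totals: reads "field value" lines on stdin and checks two sums:
#   total_nodes == sum of all *_nodes fields
#   total_edges == sum of all *_edges fields
# Fields such as total_faces or total_traces match neither suffix and are ignored.
# Prints "ok" (exit 0) or a mismatch message (exit 1).
check_tkg_totals() {
  awk '
    $1 == "total_nodes" { tn = $2; next }
    $1 == "total_edges" { te = $2; next }
    $1 ~ /_nodes$/      { sn += $2 }
    $1 ~ /_edges$/      { se += $2 }
    END {
      if (tn + 0 == sn && te + 0 == se) { print "ok"; exit 0 }
      printf "mismatch: total_nodes=%d (sum %d), total_edges=%d (sum %d)\n", tn, sn, te, se
      exit 1
    }'
}
```

Usage:

```bash
curl -s -X POST "$API/api/v1/progress/$FILE_UUID" -H "X-API-Key: $KEY" \
  | jq -r '.tkg_progress.stats | to_entries[] | "\(.key) \(.value)"' \
  | check_tkg_totals
```

For the example response above, the node counts sum to 345. The four edge types built so far sum to 1311.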
---
## `GET /api/v1/stats/ingestion-status/:file_uuid`
**Auth**: Required
**Scope**: file-level
Get detailed ingestion status showing completion of all 24 processing steps.
### Example
```bash
curl -s "$API/api/v1/stats/ingestion-status/$FILE_UUID" \
  -H "X-API-Key: $KEY" | jq '.steps[] | {name, status, detail}'
```
### Response (200)
```json
{
  "file_uuid": "3a6c1865...",
  "steps": [
    {"name": "rule1_sentence", "status": "done", "detail": "156 sentence chunks"},
    {"name": "auto_vectorize", "status": "done", "detail": "156 embedded"},
    {"name": "face_track", "status": "done", "detail": "45 traces / 1250 detections"},
    {"name": "trace_chunks", "status": "done", "detail": "45 trace chunks"},
    {"name": "tkg_face_track", "status": "done", "detail": "45 nodes"},
    {"name": "tkg_gaze_track", "status": "done", "detail": "45 nodes"},
    {"name": "tkg_lip_track", "status": "done", "detail": "12 nodes"},
    {"name": "tkg_text_region", "status": "done", "detail": "8 nodes"},
    {"name": "tkg_appearance", "status": "done", "detail": "38 nodes"},
    {"name": "tkg_accessory", "status": "done", "detail": "5 nodes"},
    {"name": "tkg_object", "status": "done", "detail": "156 nodes"},
    {"name": "tkg_hand", "status": "done", "detail": "22 nodes"},
    {"name": "tkg_speaker", "status": "done", "detail": "14 nodes"},
    {"name": "tkg_co_occurrence", "status": "done", "detail": "890 edges"},
    {"name": "tkg_speaker_face", "status": "done", "detail": "120 edges"},
    {"name": "tkg_face_face", "status": "done", "detail": "234 edges"},
    {"name": "tkg_mutual_gaze", "status": "done", "detail": "67 edges"},
    {"name": "tkg_lip_sync", "status": "done", "detail": "12 edges"},
    {"name": "tkg_has_appearance", "status": "done", "detail": "38 edges"},
    {"name": "tkg_wears", "status": "done", "detail": "22 edges"},
    {"name": "tkg_hand_object", "status": "done", "detail": "18 edges"},
    {"name": "rule2_relationship", "status": "done", "detail": "1331 relationship chunks"},
    {"name": "identity_match", "status": "done", "detail": "18 identities matched"},
    {"name": "scene_metadata", "status": "done", "detail": null}
  ],
  "related_identities": [
    {"uuid": "a9a901056d6b46ff92da0c3c1a57dff4", "name": "John Smith"}
  ],
  "strangers": 3
}
```
### Step Descriptions
| Step | Marked `done` when |
|------|-----------------|
| `rule1_sentence` | sentence_count > 0 |
| `auto_vectorize` | sentence_embedded > 0 |
| `face_track` | trace_count > 0 |
| `trace_chunks` | trace_chunks > 0 |
| `tkg_face_track` to `tkg_speaker` | Node count > 0 (9 steps) |
| `tkg_co_occurrence` to `tkg_hand_object` | Edge count > 0 (8 steps) |
| `rule2_relationship` | relationship_chunks > 0 |
| `identity_match` | identity_count > 0 |
| `scene_metadata` | scene_meta.json exists |
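To see which of the 24 steps are still outstanding, the response can be flattened to `name status` lines and filtered. This sketch treats any status other than `done` as outstanding, since the other status values are not listed here. `pending_steps` is a hypothetical helper, and `jq` is assumed for the flattening.

```bash
#!/usr/bin/env bash
# pending_steps: reads "name status" lines on stdin (one ingestion step per line).
# Prints the names of steps whose status is not "done".
# Exit status is 0 when every step is done, 1 otherwise.
pending_steps() {
  awk '
    $2 != "done" { print $1; pending++ }
    END { exit (pending > 0) }'
}
```

Usage:

```bash
curl -s "$API/api/v1/stats/ingestion-status/$FILE_UUID" -H "X-API-Key: $KEY" \
  | jq -r '.steps[] | "\(.name) \(.status)"' \
  | pending_steps
```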
---
## `POST /api/v1/file/:file_uuid/tkg/rebuild`
**Auth**: Required
**Scope**: file-level
Manually trigger a TKG rebuild. Rule 2 ingestion is triggered automatically once the TKG build completes.
### Example
```bash
curl -s -X POST "$API/api/v1/file/$FILE_UUID/tkg/rebuild" \
  -H "X-API-Key: $KEY" \
  -H "Content-Type: application/json" -d '{}'
```
### Response (200)
```json
{
  "success": true,
  "message": "TKG rebuild started",
  "nodes": 345,
  "edges": 1311
}
```
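Here is a sketch for scripting a rebuild and detecting when the TKG is finished. It rests on assumptions this page does not confirm:

- A non-2xx status, or a body without `"success": true`, means the rebuild did not start.
- `tkg_progress.overall_progress` on the progress endpoint reaches 1.0 when the build completes.

`trigger_tkg_rebuild` and `tkg_complete` are hypothetical helpers. `curl` and `jq` are assumed to be installed.

```bash
#!/usr/bin/env bash
# trigger_tkg_rebuild: POSTs to the rebuild endpoint. Fails on an HTTP error
# (curl -f) or when the body does not contain "success": true
# (jq -e sets a non-zero exit status for false, null or no output).
trigger_tkg_rebuild() {
  curl -sf -X POST "$API/api/v1/file/$FILE_UUID/tkg/rebuild" \
    -H "X-API-Key: $KEY" \
    -H "Content-Type: application/json" -d '{}' |
    jq -e '.success == true' >/dev/null
}

# tkg_complete VALUE: succeeds when VALUE (tkg_progress.overall_progress,
# a float 0.0-1.0) has reached 1.0. "null" (TKG not started) or an empty
# value counts as incomplete.
tkg_complete() {
  awk -v v="$1" 'BEGIN { exit !(v != "null" && v + 0 >= 1.0) }'
}

# Usage sketch:
# trigger_tkg_rebuild || echo "rebuild not started" >&2
# v=$(curl -s -X POST "$API/api/v1/progress/$FILE_UUID" -H "X-API-Key: $KEY" \
#       | jq -r '.tkg_progress.overall_progress')
# tkg_complete "$v" && echo "TKG built; Rule 2 ingestion follows automatically"
```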
---
## `POST /api/v1/file/:file_uuid/rule2`
**Auth**: Required
**Scope**: file-level
Manually trigger Rule 2 ingestion (TKG edges → relationship chunks).
### Example
```bash
curl -s -X POST "$API/api/v1/file/$FILE_UUID/rule2" \
  -H "X-API-Key: $KEY" \
  -H "Content-Type: application/json" -d '{}'
```
### Response (200)
```json
{
  "success": true,
  "message": "Rule 2 ingestion: 1331 relationship chunks created",
  "rule2_count": 1331
}
```
---
## Processing Pipeline Flow
```
1. Processors (concurrent)
   ├── cut, asr, ocr, face, pose, appearance → complete
   └── asrx → after cut+asr
2. Post-Processor Triggers (automatic)
   ├── Rule 1 Ingestion (ASR+OCR → sentence chunks)
   ├── Face Trace + DB Store (face_traced.json → Qdrant trace_id)
   ├── TMDb Face Matching (if enabled)
   ├── Heuristic Scene Metadata
   ├── Identity Agent (face + ASRX)
   └── TKG Build (automatic after processors complete)
       └── Rule 2 Ingestion (automatic after TKG)
           └── Relationship chunks vectorized
3. Completion
   └── Job marked completed when all ingestion steps done
```
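The ordering constraints in this flow form a small dependency graph, so one valid run order can be derived with the POSIX `tsort` utility. The pairs below encode only what the flow states:

- asrx runs after cut and asr.
- The post-processor triggers and the TKG build run after all processors.
- Rule 2 runs after TKG, and vectorization after Rule 2.
- Completion comes after everything else.

`processors_done` and `completed` are hypothetical join nodes. The sketch says nothing about how the server actually schedules work.

```bash
#!/usr/bin/env bash
# pipeline_order: prints one valid execution order, one stage per line.
# Each input pair "A B" means "A must finish before B starts".
pipeline_order() {
  tsort <<'EOF'
cut asrx
asr asrx
cut processors_done
asr processors_done
asrx processors_done
ocr processors_done
face processors_done
pose processors_done
appearance processors_done
processors_done rule1_ingestion
processors_done face_trace
processors_done tmdb_face_matching
processors_done scene_metadata
processors_done identity_agent
processors_done tkg_build
tkg_build rule2_ingestion
rule2_ingestion vectorize_relationships
rule1_ingestion completed
face_trace completed
tmdb_face_matching completed
scene_metadata completed
identity_agent completed
vectorize_relationships completed
EOF
}
```

Any order `tsort` prints respects every pair. For example, `asrx` always follows `cut` and `asr`, and `completed` is always last because every stage leads to it.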
## Error Codes
| Code | HTTP | When |
|------|------|------|
| E001 | 400 | Invalid file_uuid format |
| E002 | 404 | File not found |
| E003 | 404 | No TKG data available |
| E010 | 500 | Qdrant connection failed |
| E011 | 500 | Database connection failed |
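Here is a sketch of two client-side guards suggested by this table:

- `valid_file_uuid` rejects values that would trigger E001. It assumes the lowercase 32-hex-character form shown in this document's examples.
- `should_retry` treats HTTP 500 (E010, E011: backend connection failures) as possibly transient and everything else as not worth retrying.

Both names, and the retry policy itself, are assumptions rather than documented server behavior.

```bash
#!/usr/bin/env bash
# valid_file_uuid UUID: succeeds for exactly 32 lowercase hex characters.
# An explicit character list avoids locale-dependent ranges like [a-f].
valid_file_uuid() {
  case $1 in
    ''|*[!0123456789abcdef]*) return 1 ;;
  esac
  [ "${#1}" -eq 32 ]
}

# should_retry HTTP_STATUS: succeeds for 5xx responses (E010 Qdrant and
# E011 database connection failures), which may clear on a later attempt.
# 400/404 (E001-E003) are not retried: they need a corrected request
# or data that does not exist yet.
should_retry() {
  case $1 in
    5[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage sketch: capture the HTTP status separately from the body.
# status=$(curl -s -o resp.json -w '%{http_code}' \
#   "$API/api/v1/stats/file/$FILE_UUID" -H "X-API-Key: $KEY")
# should_retry "$status" && echo "transient failure, retry later"
```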
---
## `GET /api/v1/stats/pipeline/:file_uuid`
**Auth**: Required
**Scope**: file-level
Get per-stage pipeline progress with a weighted breakdown. `overall_progress` is the weighted sum of the progress of all pipeline stages.
### Pipeline Stages and Weights
| Stage | Weight | Description |
|-------|--------|-------------|
| `processors` | 30% | 7 processors (cut, asr, asrx, ocr, face, pose, appearance); all run concurrently except asrx, which starts after cut and asr |
| `rule1_ingestion` | 5% | ASR+OCR → sentence chunks |
| `face_tracing` | 5% | Face trace_id assignment |
| `identity_agent` | 10% | Identity creation, TMDb matching, speaker binding |
| `tkg_nodes` | 20% | TKG node building (9 node types) |
| `tkg_edges` | 15% | TKG edge building (8 edge types) |
| `rule2_ingestion` | 15% | TKG edges → relationship chunks |
### Example
```bash
curl -s "$API/api/v1/stats/pipeline/$FILE_UUID" \
  -H "X-API-Key: $KEY" | jq '.'
```
### Response (200)
```json
{
  "file_uuid": "3a6c1865...",
  "overall_progress": 0.775,
  "stages": [
    {"name": "processors", "weight": 0.30, "progress": 1.0, "status": "completed", "detail": "7/7 complete"},
    {"name": "rule1_ingestion", "weight": 0.05, "progress": 1.0, "status": "completed", "detail": "156 chunks"},
    {"name": "face_tracing", "weight": 0.05, "progress": 1.0, "status": "completed", "detail": "45 traces"},
    {"name": "identity_agent", "weight": 0.10, "progress": 1.0, "status": "completed", "detail": "18 identities"},
    {"name": "tkg_nodes", "weight": 0.20, "progress": 1.0, "status": "completed", "detail": "345 nodes"},
    {"name": "tkg_edges", "weight": 0.15, "progress": 0.5, "status": "running", "detail": "mutual_gaze_edges: 4/8 edge types"},
    {"name": "rule2_ingestion", "weight": 0.15, "progress": 0.0, "status": "pending", "detail": null}
  ],
  "updated_at": "2026-07-02T10:30:00Z"
}
```
### Field Descriptions
| Field | Type | Description |
|-------|------|-------------|
| `file_uuid` | string | 32-char hex UUID |
| `overall_progress` | float | Weighted sum of all stage progress (0.0-1.0) |
| `stages` | array | Per-stage progress breakdown |
| `stages[].name` | string | Stage name |
| `stages[].weight` | float | Stage weight in overall progress |
| `stages[].progress` | float | Stage completion (0.0-1.0) |
| `stages[].status` | string | `"pending"`, `"running"`, `"completed"`, `"failed"` |
| `stages[].detail` | string | Human-readable detail (optional) |
| `updated_at` | string | ISO 8601 timestamp |
### Overall Progress Calculation
```
overall_progress = Σ(stage.weight × stage.progress) for all stages
```
Example calculation, using the stage values from the example response above:
- processors: 0.30 × 1.0 = 0.30
- rule1_ingestion: 0.05 × 1.0 = 0.05
- face_tracing: 0.05 × 1.0 = 0.05
- identity_agent: 0.10 × 1.0 = 0.10
- tkg_nodes: 0.20 × 1.0 = 0.20
- tkg_edges: 0.15 × 0.5 = 0.075
- rule2_ingestion: 0.15 × 0.0 = 0.0
- **Total: 0.775 (77.5%)**
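The same calculation can be sketched as an awk filter that reads one `name weight progress` line per stage. It also warns if the weights do not add up to 1.0. `weighted_progress` is a hypothetical helper, and `jq` is assumed for flattening the response.

```bash
#!/usr/bin/env bash
# weighted_progress: reads "name weight progress" lines on stdin.
# Prints sum(weight * progress) with three decimals.
# Warns on stderr when the weights do not sum to 1.0 (within 0.001).
weighted_progress() {
  awk '
    { total += $2 * $3; wsum += $2 }
    END {
      if (wsum < 0.999 || wsum > 1.001)
        printf "warning: weights sum to %.3f\n", wsum > "/dev/stderr"
      printf "%.3f\n", total
    }'
}
```

Usage:

```bash
curl -s "$API/api/v1/stats/pipeline/$FILE_UUID" -H "X-API-Key: $KEY" \
  | jq -r '.stages[] | "\(.name) \(.weight) \(.progress)"' \
  | weighted_progress
```

For the stage values listed above this prints `0.775`.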
---
## `GET /api/v1/stats/file/:file_uuid`
**Auth**: Required
**Scope**: file-level
Get comprehensive file statistics from all data sources: JSON processing status, PostgreSQL counts, Qdrant collections, TKG nodes/edges, and Identity Agent stats.
### Example
```bash
curl -s "$API/api/v1/stats/file/$FILE_UUID" \
  -H "X-API-Key: $KEY" | jq '.'
```
### Response (200)
```json
{
  "file_uuid": "3a6c1865...",
  "file_name": "video.mp4",
  "status": "processing",
  "processors": [
    {"name": "asr", "status": "complete", "progress": 100, "message": "done"},
    {"name": "face", "status": "complete", "progress": 100, "message": "done"}
  ],
  "postgres": {
    "sentence_chunks": 156,
    "trace_chunks": 45,
    "relationship_chunks": 1331,
    "identities": 18,
    "file_identities": 18
  },
  "qdrant": {
    "faces": 1250,
    "face_traces": 45,
    "face_identities": 18,
    "text_chunks": 4562,
    "speakers": 434
  },
  "tkg": {
    "total_nodes": 345,
    "total_edges": 1401,
    "face_track_nodes": 45,
    "gaze_track_nodes": 45,
    "lip_track_nodes": 12,
    "text_region_nodes": 8,
    "appearance_nodes": 38,
    "accessory_nodes": 5,
    "object_nodes": 156,
    "hand_nodes": 22,
    "speaker_nodes": 14,
    "co_occurrence_edges": 890,
    "speaker_face_edges": 120,
    "face_face_edges": 234,
    "mutual_gaze_edges": 67,
    "lip_sync_edges": 12,
    "has_appearance_edges": 38,
    "wears_edges": 22,
    "hand_object_edges": 18
  },
  "identity_agent": {
    "clusters": 18,
    "identities_created": 18,
    "tmdb_matches": 5,
    "speaker_bindings": 12,
    "confirmations": 18
  }
}
```
### Field Descriptions
#### Top Level
| Field | Type | Description |
|-------|------|-------------|
| `file_uuid` | string | 32-char hex UUID |
| `file_name` | string | Original filename |
| `status` | string | File status: `registered`, `processing`, `completed`, `failed` |
| `processors` | array | Per-processor status from processing_status JSONB |
| `postgres` | object | PostgreSQL table counts |
| `qdrant` | object | Qdrant collection point counts |
| `tkg` | object | TKG node and edge counts by type |
| `identity_agent` | object | Identity Agent statistics |
#### PostgreSQL Stats
| Field | Type | Description |
|-------|------|-------------|
| `sentence_chunks` | integer | Rule 1 sentence chunks count |
| `trace_chunks` | integer | Face trace chunks count |
| `relationship_chunks` | integer | Rule 2 relationship chunks count |
| `identities` | integer | Unique identities bound to this file |
| `file_identities` | integer | File-identity mapping records |
#### Qdrant Stats
| Field | Type | Description |
|-------|------|-------------|
| `faces` | integer | Total face points in `_faces` collection |
| `face_traces` | integer | Unique trace IDs in `_faces` |
| `face_identities` | integer | Unique identity IDs bound in `_faces` |
| `text_chunks` | integer | Text chunk vectors in `momentry_*_rule1_v2` |
| `speakers` | integer | Speaker segments in `momentry_*_speaker` |
#### TKG Stats
| Field | Type | Description |
|-------|------|-------------|
| `total_nodes` | integer | Sum of all node types |
| `total_edges` | integer | Sum of all edge types |
| `face_track_nodes` | integer | Face track nodes |
| `gaze_track_nodes` | integer | Gaze track nodes |
| `lip_track_nodes` | integer | Lip track nodes |
| `text_region_nodes` | integer | Text region nodes |
| `appearance_nodes` | integer | Appearance trace nodes |
| `accessory_nodes` | integer | Accessory nodes |
| `object_nodes` | integer | YOLO object nodes |
| `hand_nodes` | integer | Hand nodes |
| `speaker_nodes` | integer | Speaker nodes |
| `co_occurrence_edges` | integer | Co-occurrence edges |
| `speaker_face_edges` | integer | Speaker-face edges |
| `face_face_edges` | integer | Face-face edges |
| `mutual_gaze_edges` | integer | Mutual gaze edges |
| `lip_sync_edges` | integer | Lip sync edges |
| `has_appearance_edges` | integer | Has-appearance edges |
| `wears_edges` | integer | Wears edges |
| `hand_object_edges` | integer | Hand-object edges |
#### Identity Agent Stats
| Field | Type | Description |
|-------|------|-------------|
| `clusters` | integer | Face clusters from face_clustered.json |
| `identities_created` | integer | Identities created from clusters |
| `tmdb_matches` | integer | TMDb identity matches |
| `speaker_bindings` | integer | Speaker-to-identity bindings |
| `confirmations` | integer | Confirmed identity bindings |