## Pipeline

### Dependency Graph

```mermaid
flowchart TB
    subgraph Processors["10 Processors"]
        Cut[Cut] --> ASR[ASR]
        ASR --> ASRX[ASRX]
        ASRX --> Story[Story]
        Cut --> Story
        YOLO[YOLO] --> VisualChunk[VisualChunk]
        VisualChunk --> Story
        Face[Face] --> Story
        Story --> FiveW1H[5W1H]
        OCR[OCR]
        Pose[Pose]
    end

    subgraph Ingestion["Ingestion (Post-Processing)"]
        ASR --> Rule1[Rule 1 Sentence]
        ASRX --> Rule1
        Rule1 --> Vectorize[Auto-Vectorize]
        Rule1 --> Phase1[Phase 1 Pack]
        Cut --> Rule3[Rule 3 Scene]
        ASR --> Rule3
        Face --> Trace[Face Trace]
        Trace --> Qdrant[Qdrant Sync]
        Trace --> TraceChunks[Trace Chunks]
        Trace --> TKG[TKG Builder]
        Face --> TMDbMatch[TMDb Match]
        Face --> SceneMeta[Scene Metadata]
        YOLO --> SceneMeta
        Face --> IdentityAgent[Identity Agent]
        ASRX --> IdentityAgent
        Cut --> Agent5W1H[5W1H Agent]
        ASR --> Agent5W1H
        Agent5W1H --> Phase2[Phase 2 Pack]
    end

    style Processors fill:#1a1a2e,stroke:#e94560
    style Ingestion fill:#16213e,stroke:#0f3460
```

### Pipeline Completion Flow

The pipeline is **not complete** until both the 10 processors AND the ingestion (post-processing) steps have finished. The worker polls every 3 seconds and marks the job as `completed` only after every ingestion step passes verification.
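The completion rule can be sketched as a small polling loop. This is an illustrative sketch only, not the worker's actual code: `job_status`, `wait_for_completion`, `processors_done`, `ingestion_checks`, and `max_polls` are hypothetical names. Only the 3-second interval and the `running` to `completed` transition come from the description above.

```python
import time
from typing import Callable, Mapping

POLL_INTERVAL_S = 3.0  # the worker polls every 3 seconds


def job_status(processors_done: bool, ingestion_ok: Mapping[str, bool]) -> str:
    """Return "completed" only when the processors and every ingestion step are done."""
    if processors_done and ingestion_ok and all(ingestion_ok.values()):
        return "completed"
    return "running"


def wait_for_completion(
    processors_done: Callable[[], bool],             # hypothetical probe
    ingestion_checks: Callable[[], Mapping[str, bool]],  # hypothetical probe
    interval_s: float = POLL_INTERVAL_S,
    max_polls: int = 1200,                           # assumed safety cap
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job is complete or max_polls is exhausted."""
    for _ in range(max_polls):
        status = job_status(processors_done(), ingestion_checks())
        if status == "completed":
            return status
        sleep(interval_s)
    return "running"
```

Passing `sleep` as a parameter keeps the loop testable without real delays. An empty check mapping is treated as "not verified", so the job stays `running`.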
```
10 processors done
    ↓ (job status stays "running")
Algorithm 1 Trigger: Rule 1 + Vectorize + Phase 1 Pack
    ↓ (job runs in parallel)
Algorithm 2 Trigger: Face Trace → TKG, Scene Metadata, Identity Agent, 5W1H Agent
    ↓ (poll checks every 3s)
Ingestion verification:
    rule1 ✓  vectorize ✓  rule3 ✓  face_trace ✓
    tkg ✓  scene_meta ✓  5w1h ✓
    ↓
job status = "completed"
```

### 10 Processor Stages

| # | Processor | Depends On | Description |
|---|-----------|------------|-------------|
| 1 | `Cut` | — | Scene boundary detection (PySceneDetect) |
| 2 | `ASR` | Cut | Automatic speech recognition (faster-whisper) |
| 3 | `ASRX` | ASR | Speaker diarization + ASR refinement |
| 4 | `YOLO` | — | Object detection (YOLOv8) |
| 5 | `OCR` | — | Optical character recognition |
| 6 | `Face` | — | Face detection + recognition (InsightFace + CoreML) |
| 7 | `Pose` | — | Pose estimation |
| 8 | `VisualChunk` | YOLO | Visual object chunking |
| 9 | `Story` | ASRX + Cut + YOLO + Face | Narrative scene summarization (LLM, with embedding) |
| 10 | `5W1H` | Story | Who/What/When/Where/Why/How extraction (LLM, with embedding) |

### Ingestion (Post-Processing)

These steps run after the 10 processors and are **required for pipeline completion**. The worker checks all of them before marking the job as done.

| # | Step | Triggers When | Verification |
|---|------|---------------|--------------|
| 1 | **Rule 1 Sentence Chunking** | ASR + ASRX done | `chunk` table has rows with `chunk_type = 'sentence'` |
| 2 | **Auto-Vectorize** | Rule 1 done | `chunk.embedding` IS NOT NULL for sentence chunks |
| 3 | **Phase 1 Pack** | Rule 1 done | `release_pack.py --phase 1` executed |
| 4 | **Rule 3 Scene Chunking** | All 10 processors done + Cut + ASR | `chunk` table has rows with `chunk_type = 'cut'` |
| 5 | **Face Trace** | All 10 processors done + Face | `face_detections.trace_id` IS NOT NULL |
| 6 | **Qdrant Face Sync** | Face Trace done | Qdrant face_embedding collection populated |
| 7 | **Trace Chunks** | Face Trace done | `chunk` table has rows with `chunk_type = 'trace'` |
| 8 | **TKG Builder** | Face Trace done | `tkg_nodes` + `tkg_edges` tables have rows |
| 9 | **TMDb Face Matching** | TMDb enabled + Face done | `face_detections.identity_id` IS NOT NULL |
| 10 | **Heuristic Scene Metadata** | Face + YOLO done | `{file_uuid}.scene_meta.json` exists on disk |
| 11 | **Identity Agent** | Face + ASRX done | `identities` with `source = 'identity_agent'` |
| 12 | **5W1H Agent** | Cut + ASR done | `chunk.summary_text` IS NOT NULL for cut chunks |
| 13 | **Release Pack** | 5W1H Agent done | `release_pack.py --phase 2` executed |

### Ingestion Status

Check real-time ingestion status for a file:

```bash
curl "$API/api/v1/stats/ingestion-status/{file_uuid}"
```

Returns per-step `done` / `pending` status with detail counts.

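A client can reduce this response to the list of steps that are still pending. The following is a minimal sketch, assuming the response shape shown in the example below (a `steps` array of objects with `name`, `status`, and `detail`). The helper names `fetch_ingestion_status` and `pending_steps` are hypothetical, and the base URL is whatever `$API` points to in your deployment.

```python
import json
import urllib.request
from typing import Any


def fetch_ingestion_status(api_base: str, file_uuid: str) -> dict[str, Any]:
    """GET /api/v1/stats/ingestion-status/{file_uuid} and decode the JSON body."""
    url = f"{api_base}/api/v1/stats/ingestion-status/{file_uuid}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def pending_steps(status: dict[str, Any]) -> list[str]:
    """Return the names of all steps whose status is not "done"."""
    return [
        step["name"]
        for step in status.get("steps", [])
        if step.get("status") != "done"
    ]
```

For example, `pending_steps(fetch_ingestion_status("http://localhost:3003", file_uuid))` returns an empty list once every reported step is `done`.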
#### Example

```bash
curl "http://localhost:3003/api/v1/stats/ingestion-status/bd80fec9c42afb0307eb28f22c64c76a" \
  | jq '.steps[] | {name, status, detail}'
```

#### Response

```json
{
  "file_uuid": "bd80fec9c42afb0307eb28f22c64c76a",
  "steps": [
    { "name": "rule1_sentence", "status": "pending", "detail": "0 sentence chunks" },
    { "name": "auto_vectorize", "status": "pending", "detail": "0 embedded" },
    { "name": "rule3_scene", "status": "pending", "detail": "0 scene chunks" },
    { "name": "face_trace", "status": "pending", "detail": "0 traces" },
    { "name": "trace_chunks", "status": "pending", "detail": "0 trace chunks" },
    { "name": "tkg", "status": "pending", "detail": "0 nodes, 0 edges" },
    { "name": "identity_match", "status": "pending", "detail": "0 identities" },
    { "name": "scene_metadata", "status": "pending", "detail": null },
    { "name": "5w1h", "status": "pending", "detail": "0 scenes with 5W1H" }
  ]
}
```

### Stats Endpoints

| Method | Endpoint | Auth | Description |
|--------|----------|------|-------------|
| GET | `/api/v1/stats/sftpgo` | No | SFTPGo service status |
| GET | `/api/v1/stats/ingestion-status/:file_uuid` | No | Per-file ingestion checklist |

### Configuration

### `POST /api/v1/config/cache`

**Auth**: Required
**Scope**: system-level

Toggle the Redis cache on or off.

#### Request Parameters

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `enabled` | boolean | Yes | `true` to enable, `false` to disable |

#### Example

```bash
curl -s -X POST "$API/api/v1/config/cache" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $KEY" \
  -d '{"enabled": false}'
```

### Unmounted Routes

The following routes are defined in source code but are **NOT** currently mounted in the router:

| Endpoint | Source file |
|----------|-------------|
| `/api/v1/search/persons` | `universal_search.rs` (not mounted) |
| `/api/v1/who` | `who.rs` |
| `/api/v1/who/candidates` | `who.rs` |