
Pipeline

Pipeline Completion Flow

The pipeline is not complete until both the 10 processors and the ingestion (入庫) steps have finished. The worker polls every 3 seconds and marks the job as completed only when every ingestion step verifies OK.

10 processors done
  ↓ (job status stays "running")
Algorithm 1 Trigger: Rule 1 + Vectorize + Phase 1 Pack
  ↓ (job runs in parallel)
Algorithm 2 Trigger: Face Trace → TKG, Scene Metadata, Identity Agent, 5W1H Agent
  ↓ (poll checks every 3s)
Ingestion verification: rule1, vectorize, rule3, face_trace, tkg, scene_meta, 5w1h
  ↓
job status = "completed"
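
The completion rule above can be sketched as a small polling loop. This is a minimal illustration, not the worker's actual code: `fetch_steps` is a hypothetical callable that returns the current step list, and `max_polls` is an invented safety limit. Only the 3-second interval and the done / pending statuses come from this page.

```python
import time
from typing import Callable, Dict, List

POLL_INTERVAL_SECONDS = 3  # poll interval stated in this document


def all_steps_done(steps: List[Dict]) -> bool:
    """True only when there is at least one step and every step reports 'done'."""
    return bool(steps) and all(step.get("status") == "done" for step in steps)


def wait_for_completion(
    fetch_steps: Callable[[], List[Dict]],
    max_polls: int = 100,  # hypothetical safety limit, not from the source
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until every ingestion step verifies, then return 'completed'.

    The job is reported as 'running' for as long as any step is pending,
    including when the poll budget runs out.
    """
    for _ in range(max_polls):
        if all_steps_done(fetch_steps()):
            return "completed"
        sleep(POLL_INTERVAL_SECONDS)
    return "running"
```

Passing `sleep` as a parameter keeps the loop testable without real delays; a caller would normally leave it at the default.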

10 Processor Stages

| # | Processor | Depends On | Description |
|---|-----------|------------|-------------|
| 1 | Cut | - | Scene boundary detection (PySceneDetect) |
| 2 | ASR | Cut | Automatic speech recognition (faster-whisper) |
| 3 | ASRX | ASR | Speaker diarization + ASR refinement |
| 4 | YOLO | - | Object detection (YOLOv8) |
| 5 | OCR | - | Optical character recognition |
| 6 | Face | - | Face detection + recognition (InsightFace + CoreML) |
| 7 | Pose | - | Pose estimation |
| 8 | VisualChunk | YOLO | Visual object chunking |
| 9 | Story | ASRX + Cut + YOLO + Face | Narrative scene summarization (LLM, with embedding) |
| 10 | 5W1H | Story | Who/What/When/Where/Why/How extraction (LLM, with embedding) |
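
The "Depends On" column defines a dependency graph, so a valid run order can be derived with a topological sort. The sketch below uses Python's standard `graphlib` module to group processors into waves that could run concurrently. It illustrates the dependency structure only; the real scheduler may order or parallelize the stages differently, and `execution_waves` is a name invented for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Processor -> processors it depends on, copied from the table above.
DEPENDS_ON = {
    "Cut": set(),
    "ASR": {"Cut"},
    "ASRX": {"ASR"},
    "YOLO": set(),
    "OCR": set(),
    "Face": set(),
    "Pose": set(),
    "VisualChunk": {"YOLO"},
    "Story": {"ASRX", "Cut", "YOLO", "Face"},
    "5W1H": {"Story"},
}


def execution_waves(graph):
    """Group processors into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable order
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

With the table's dependencies this yields five waves: Cut, Face, OCR, Pose and YOLO first; then ASR and VisualChunk; then ASRX; then Story; and finally 5W1H.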

Ingestion (入庫, Post-Processing)

These steps run after the 10 processors and are required for pipeline completion. The worker checks all of them before marking the job as done.

| # | Step | Triggers When | Verification |
|---|------|---------------|--------------|
| 1 | Rule 1 Sentence Chunking | ASR + ASRX done | chunk table has rows with chunk_type = 'sentence' |
| 2 | Auto-Vectorize | Rule 1 done | chunk.embedding IS NOT NULL for sentence chunks |
| 3 | Phase 1 Pack | Rule 1 done | release_pack.py --phase 1 executed |
| 4 | Rule 3 Scene Chunking | All 10 processors done + Cut + ASR | chunk table has rows with chunk_type = 'cut' |
| 5 | Face Trace | All 10 processors done + Face | face_detections.trace_id IS NOT NULL |
| 6 | Qdrant Face Sync | Face Trace done | Qdrant face_embedding collection populated |
| 7 | Trace Chunks | Face Trace done | chunk table has rows with chunk_type = 'trace' |
| 8 | TKG Builder | Face Trace done | tkg_nodes + tkg_edges tables have rows |
| 9 | TMDb Face Matching | TMDb enabled + Face done | face_detections.identity_id IS NOT NULL |
| 10 | Heuristic Scene Metadata | Face + YOLO done | {file_uuid}.scene_meta.json exists on disk |
| 11 | Template 5W1H Story Summary (PG) | Story done | chunk.embedding IS NOT NULL for story chunks |
| 12 | LLM 5W1H Summary (PG) | 5W1H done | chunk.embedding IS NOT NULL for llm chunks |
| 13 | Voice Embedding (Qdrant) | ASRX done | Qdrant voice collection populated |
| 14 | Face Embedding (Qdrant) | Face done | Qdrant face collection populated |
| 15 | Identity Agent | Face + ASRX done | identities with source = 'identity_agent' |
| 16 | 5W1H Agent | Cut + ASR done | chunk.summary_text IS NOT NULL for cut chunks |
| 17 | Release Pack | 5W1H Agent done | release_pack.py --phase 2 executed |
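
Several of the verification rules above are simple row-existence checks, which can be sketched as SQL count queries. The example below is an illustration under stated assumptions, not the worker's implementation. It uses an in-memory SQLite database as a stand-in for the real store, with a hypothetical minimal schema. It treats "count > 0" as "done" and omits the per-file filter, because the column that links rows to a `file_uuid` is not documented here. Only the table names, column names, and `chunk_type` values come from the table.

```python
import sqlite3

# Count queries paraphrased from the Verification column above.
# Treating "count > 0" as "done" is an assumption of this sketch.
CHECKS = {
    "rule1_sentence": "SELECT COUNT(*) FROM chunk WHERE chunk_type = 'sentence'",
    "auto_vectorize": (
        "SELECT COUNT(*) FROM chunk "
        "WHERE chunk_type = 'sentence' AND embedding IS NOT NULL"
    ),
    "rule3_scene": "SELECT COUNT(*) FROM chunk WHERE chunk_type = 'cut'",
    "face_trace": "SELECT COUNT(*) FROM face_detections WHERE trace_id IS NOT NULL",
}


def verify(conn):
    """Run every check and return {step_name: 'done' or 'pending'}."""
    result = {}
    for step, sql in CHECKS.items():
        (count,) = conn.execute(sql).fetchone()
        result[step] = "done" if count > 0 else "pending"
    return result


def demo_connection():
    """In-memory stand-in database with a hypothetical minimal schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE chunk (chunk_type TEXT, embedding BLOB);
        CREATE TABLE face_detections (trace_id TEXT);
        INSERT INTO chunk VALUES ('sentence', NULL);  -- chunked, not embedded
        INSERT INTO chunk VALUES ('cut', NULL);
        INSERT INTO face_detections VALUES (NULL);    -- detected, not traced
        """
    )
    return conn
```

Calling `verify(demo_connection())` reports `rule1_sentence` and `rule3_scene` as done, and `auto_vectorize` and `face_trace` as pending. This mirrors how a file can sit between chunking and vectorization.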

Ingestion Status

Check real-time ingestion status for a file:

curl "$API/api/v1/stats/ingestion-status/{file_uuid}"

Returns per-step done / pending status with detail counts.

Example

curl "http://localhost:3003/api/v1/stats/ingestion-status/bd80fec9c42afb0307eb28f22c64c76a" | jq '.steps[] | {name, status, detail}'

Response

{
  "file_uuid": "bd80fec9c42afb0307eb28f22c64c76a",
  "steps": [
    { "name": "rule1_sentence", "status": "pending", "detail": "0 sentence chunks" },
    { "name": "auto_vectorize",  "status": "pending", "detail": "0 embedded" },
    { "name": "rule3_scene",     "status": "pending", "detail": "0 scene chunks" },
    { "name": "face_trace",      "status": "pending", "detail": "0 traces" },
    { "name": "trace_chunks",    "status": "pending", "detail": "0 trace chunks" },
    { "name": "tkg",             "status": "pending", "detail": "0 nodes, 0 edges" },
    { "name": "identity_match",  "status": "pending", "detail": "0 identities" },
    { "name": "scene_metadata",  "status": "pending", "detail": null },
    { "name": "5w1h",            "status": "pending", "detail": "0 scenes with 5W1H" }
  ]
}
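
A client can reduce this response to the list of steps that are still pending. The sketch below assumes only the response shape shown above (`file_uuid`, plus `steps` entries with `name`, `status`, and `detail`). The helper names `fetch_status`, `pending_steps`, and `summary_line` are invented for this example, and the 10-second timeout is an arbitrary choice.

```python
import json
from urllib.request import urlopen


def fetch_status(api_base: str, file_uuid: str) -> dict:
    """GET the ingestion checklist for one file and decode the JSON body."""
    url = f"{api_base}/api/v1/stats/ingestion-status/{file_uuid}"
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def pending_steps(status: dict) -> list:
    """Names of steps whose status is anything other than 'done'."""
    return [s["name"] for s in status.get("steps", []) if s.get("status") != "done"]


def summary_line(status: dict) -> str:
    """One-line progress summary, for example for a log message."""
    total = len(status.get("steps", []))
    pending = pending_steps(status)
    return f"{status.get('file_uuid')}: {total - len(pending)}/{total} steps done"
```

For the response shown above, `pending_steps` would return all nine step names, since every step is still pending.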

Stats Endpoints

| Method | Endpoint | Auth | Description |
|--------|----------|------|-------------|
| GET | /api/v1/stats/sftpgo | No | SFTPGo service status |
| GET | /api/v1/stats/ingestion-status/:file_uuid | No | Per-file ingestion checklist |

Configuration

See 13_config.md for runtime configuration endpoints:

| Endpoint | Description |
|----------|-------------|
| POST /api/v1/config/cache | Toggle Redis cache |
| POST /api/v1/config/auto-pipeline | Toggle auto-pipeline on register |
| POST /api/v1/config/watcher-auto-register | Toggle watcher auto-register |

Unmounted Routes

The following routes are defined in the source code but are not currently mounted in the router:

| Endpoint | Source file |
|----------|-------------|
| /api/v1/search/persons | universal_search.rs (not mounted) |
| /api/v1/who | who.rs |
| /api/v1/who/candidates | who.rs |