Accusys e1572907ae feat: ASRX hybrid pipeline, identity history, worker fixes, checkpoint system

2026-06-02 07:13:23 +08:00


Pipeline

Dependency Graph

```mermaid
flowchart TB
    subgraph Processors["10 Processors"]
        Cut[Cut] --> ASR[ASR]
        ASR --> ASRX[ASRX]
        ASRX --> Story[Story]
        Cut --> Story
        YOLO[YOLO] --> VisualChunk[VisualChunk]
        VisualChunk --> Story
        Face[Face] --> Story
        Story --> FiveW1H[5W1H]
        OCR[OCR]
        Pose[Pose]
    end

    subgraph Ingestion["Ingestion (Post-Processing)"]
        ASR --> Rule1[Rule 1 Sentence]
        ASRX --> Rule1
        Rule1 --> Vectorize[Auto-Vectorize]
        Rule1 --> Phase1[Phase 1 Pack]

        Cut --> Rule3[Rule 3 Scene]
        ASR --> Rule3

        Face --> Trace[Face Trace]
        Trace --> Qdrant[Qdrant Sync]
        Trace --> TraceChunks[Trace Chunks]
        Trace --> TKG[TKG Builder]

        Face --> TMDbMatch[TMDb Match]
        Face --> SceneMeta[Scene Metadata]
        YOLO --> SceneMeta
        Face --> IdentityAgent[Identity Agent]
        ASRX --> IdentityAgent

        Cut --> Agent5W1H[5W1H Agent]
        ASR --> Agent5W1H
        Agent5W1H --> Phase2[Phase 2 Pack]
    end

    style Processors fill:#1a1a2e,stroke:#e94560
    style Ingestion fill:#16213e,stroke:#0f3460
```

Pipeline Completion Flow

A job is not complete until both the 10 processors and the ingestion (入庫) steps have finished. The worker polls every 3 seconds and marks the job as completed only after every ingestion step passes verification.

```text
10 processors done
     ↓  (job status stays "running")
Algorithm 1 Trigger: Rule 1 + Vectorize + Phase 1 Pack
     ↓  (job runs in parallel)
Algorithm 2 Trigger: Face Trace → TKG, Scene Metadata, Identity Agent, 5W1H Agent
     ↓  (poll checks every 3s)
Ingestion verification: rule1 ✓ vectorize ✓ rule3 ✓ face_trace ✓ tkg ✓ scene_meta ✓ 5w1h ✓
     ↓
job status = "completed"
```
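A minimal Python sketch of the completion gate described above, assuming the worker behaves exactly as documented (poll every 3 s, mark the job completed only when every check passes). The step names come from the verification line; `verify_step`, `set_job_status`, and `max_polls` are hypothetical and are passed in as parameters so the loop can be exercised without a database. This is an illustration, not the worker's actual implementation.

```python
import time
from typing import Callable, Iterable, Optional

# Check names taken from the "Ingestion verification" line above.
INGESTION_STEPS = ("rule1", "vectorize", "rule3", "face_trace", "tkg", "scene_meta", "5w1h")
POLL_INTERVAL_S = 3.0  # documented poll period


def wait_for_ingestion(
    job_id: str,
    verify_step: Callable[[str, str], bool],     # hypothetical: (job_id, step) -> verified?
    set_job_status: Callable[[str, str], None],  # hypothetical: persists the job status
    steps: Iterable[str] = INGESTION_STEPS,
    interval_s: float = POLL_INTERVAL_S,
    max_polls: Optional[int] = None,             # hypothetical cap; None polls indefinitely
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Leave the job "running" until every step verifies, then mark it "completed"."""
    steps = tuple(steps)
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        pending = [s for s in steps if not verify_step(job_id, s)]
        if not pending:
            set_job_status(job_id, "completed")
            return True
        sleep(interval_s)  # status is left untouched here, so it stays "running"
    return False
```

Injecting `sleep` keeps the loop testable; with the default `time.sleep` it polls on the 3-second cadence.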

10 Processor Stages

| # | Processor | Depends On | Description |
|---|---|---|---|
| 1 | `Cut` | None | Scene boundary detection (PySceneDetect) |
| 2 | `ASR` | Cut | Automatic speech recognition (faster-whisper) |
| 3 | `ASRX` | ASR | Speaker diarization + ASR refinement |
| 4 | `YOLO` | None | Object detection (YOLOv8) |
| 5 | `OCR` | None | Optical character recognition |
| 6 | `Face` | None | Face detection + recognition (InsightFace + CoreML) |
| 7 | `Pose` | None | Pose estimation |
| 8 | `VisualChunk` | YOLO | Visual object chunking |
| 9 | `Story` | ASRX + Cut + YOLO + Face | Narrative scene summarization (LLM, with embedding) |
| 10 | `5W1H` | Story | Who/What/When/Where/Why/How extraction (LLM, with embedding) |
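The "Depends On" column defines a DAG, so a valid run order can be derived with the standard library's `graphlib` (Python 3.9+). This sketch follows the table only; the graph above also draws `VisualChunk → Story`, which would have to be added to the `Story` set if it is a real dependency. Grouping into "waves" illustrates what could run concurrently; it is not a claim about how the worker actually schedules processors.

```python
from graphlib import TopologicalSorter

# "Depends On" column of the processor table above (node -> predecessors).
PROCESSOR_DEPS = {
    "Cut": set(),
    "ASR": {"Cut"},
    "ASRX": {"ASR"},
    "YOLO": set(),
    "OCR": set(),
    "Face": set(),
    "Pose": set(),
    "VisualChunk": {"YOLO"},
    "Story": {"ASRX", "Cut", "YOLO", "Face"},
    "5W1H": {"Story"},
}


def execution_waves(deps):
    """Group processors into waves whose dependencies are all satisfied by earlier waves."""
    ts = TopologicalSorter(deps)
    ts.prepare()  # raises graphlib.CycleError if the table ever contains a cycle
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        waves.append(ready)
        ts.done(*ready)
    return waves
```

With the table as written, the waves are `Cut, Face, OCR, Pose, YOLO`, then `ASR, VisualChunk`, then `ASRX`, `Story`, and finally `5W1H`.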

Ingestion (入庫 / Post-Processing)

These steps run after the 10 processors and are required for pipeline completion. The worker checks all of them before marking the job as done.

| # | Step | Trigger Condition | Verification |
|---|---|---|---|
| 1 | Rule 1 Sentence Chunking | ASR + ASRX done | `chunk` table has rows with `chunk_type = 'sentence'` |
| 2 | Auto-Vectorize | Rule 1 done | `chunk.embedding` IS NOT NULL for sentence chunks |
| 3 | Phase 1 Pack | Rule 1 done | `release_pack.py --phase 1` executed |
| 4 | Rule 3 Scene Chunking | All 10 processors done + Cut + ASR | `chunk` table has rows with `chunk_type = 'cut'` |
| 5 | Face Trace | All 10 processors done + Face | `face_detections.trace_id` IS NOT NULL |
| 6 | Qdrant Face Sync | Face Trace done | Qdrant `face_embedding` collection populated |
| 7 | Trace Chunks | Face Trace done | `chunk` table has rows with `chunk_type = 'trace'` |
| 8 | TKG Builder | Face Trace done | `tkg_nodes` + `tkg_edges` tables have rows |
| 9 | TMDb Face Matching | TMDb enabled + Face done | `face_detections.identity_id` IS NOT NULL |
| 10 | Heuristic Scene Metadata | Face + YOLO done | `{file_uuid}.scene_meta.json` exists on disk |
| 11 | Identity Agent | Face + ASRX done | `identities` has rows with `source = 'identity_agent'` |
| 12 | 5W1H Agent | Cut + ASR done | `chunk.summary_text` IS NOT NULL for cut chunks |
| 13 | Release Pack (Phase 2) | 5W1H Agent done | `release_pack.py --phase 2` executed |
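The "Trigger Condition" column can be read as a set of prerequisites per step. The sketch below assumes that reading; the snake_case step identifiers, the `all_processors` pseudo-step (standing for "All 10 processors done"), and modelling the TMDb setting as a `tmdb_enabled` pseudo-step are hypothetical choices, not names from the codebase.

```python
# Hypothetical step ids mapped to the prerequisites listed in the ingestion table.
INGESTION_TRIGGERS = {
    "rule1_sentence": {"ASR", "ASRX"},
    "auto_vectorize": {"rule1_sentence"},
    "phase1_pack": {"rule1_sentence"},
    "rule3_scene": {"all_processors", "Cut", "ASR"},
    "face_trace": {"all_processors", "Face"},
    "qdrant_face_sync": {"face_trace"},
    "trace_chunks": {"face_trace"},
    "tkg": {"face_trace"},
    "tmdb_match": {"tmdb_enabled", "Face"},
    "scene_metadata": {"Face", "YOLO"},
    "identity_agent": {"Face", "ASRX"},
    "5w1h_agent": {"Cut", "ASR"},
    "phase2_pack": {"5w1h_agent"},
}

PROCESSORS = {"Cut", "ASR", "ASRX", "YOLO", "OCR", "Face", "Pose", "VisualChunk", "Story", "5W1H"}


def runnable_steps(done):
    """Return ingestion steps whose prerequisites are met and that are not done yet."""
    done = set(done)
    if PROCESSORS <= done:
        done.add("all_processors")  # pseudo-step for "All 10 processors done"
    return sorted(
        step
        for step, needs in INGESTION_TRIGGERS.items()
        if step not in done and needs <= done
    )
```

For example, once only `Cut`, `ASR`, and `ASRX` have finished, this returns `5w1h_agent` and `rule1_sentence`.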

Ingestion Status

Check real-time ingestion status for a file:

```bash
curl "$API/api/v1/stats/ingestion-status/{file_uuid}"
```

Returns each step's status (`done` or `pending`) together with a `detail` field that usually holds counts.
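One way the per-step entries could be derived from the verification conditions in the ingestion table, sketched for three of the steps. Assumptions: a `chunk` table with `file_uuid`, `chunk_type`, and `embedding` columns; an in-memory SQLite database standing in for the real store; and a hypothetical rule that `auto_vectorize` counts as done only when every sentence chunk has an embedding. The `detail` strings follow the format of the example response.

```python
import sqlite3


def ingestion_status(conn, file_uuid):
    """Build {name, status, detail} entries for three steps from a `chunk` table."""

    def count(sql):
        # Every query below takes the file UUID as its only parameter.
        return conn.execute(sql, (file_uuid,)).fetchone()[0]

    sentences = count(
        "SELECT COUNT(*) FROM chunk WHERE file_uuid = ? AND chunk_type = 'sentence'"
    )
    embedded = count(
        "SELECT COUNT(*) FROM chunk WHERE file_uuid = ? "
        "AND chunk_type = 'sentence' AND embedding IS NOT NULL"
    )
    scenes = count(
        "SELECT COUNT(*) FROM chunk WHERE file_uuid = ? AND chunk_type = 'cut'"
    )

    def step(name, ok, detail):
        return {"name": name, "status": "done" if ok else "pending", "detail": detail}

    return {
        "file_uuid": file_uuid,
        "steps": [
            step("rule1_sentence", sentences > 0, f"{sentences} sentence chunks"),
            # Hypothetical rule: done only when every sentence chunk is embedded.
            step("auto_vectorize", sentences > 0 and embedded == sentences,
                 f"{embedded} embedded"),
            step("rule3_scene", scenes > 0, f"{scenes} scene chunks"),
        ],
    }


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE chunk (file_uuid TEXT, chunk_type TEXT, embedding BLOB)")
    demo.executemany(
        "INSERT INTO chunk VALUES (?, ?, ?)",
        [("f1", "sentence", b"\x00"), ("f1", "sentence", None)],
    )
    print(ingestion_status(demo, "f1"))
```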

Example

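```bash
# The jq filter prints one {name, status, detail} object per step;
# drop the `| jq ...` part to get the full response body.
curl -s "http://localhost:3003/api/v1/stats/ingestion-status/bd80fec9c42afb0307eb28f22c64c76a" | jq '.steps[] | {name, status, detail}'
```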

Response

```json
{
  "file_uuid": "bd80fec9c42afb0307eb28f22c64c76a",
  "steps": [
    { "name": "rule1_sentence", "status": "pending", "detail": "0 sentence chunks" },
    { "name": "auto_vectorize",  "status": "pending", "detail": "0 embedded" },
    { "name": "rule3_scene",     "status": "pending", "detail": "0 scene chunks" },
    { "name": "face_trace",      "status": "pending", "detail": "0 traces" },
    { "name": "trace_chunks",    "status": "pending", "detail": "0 trace chunks" },
    { "name": "tkg",             "status": "pending", "detail": "0 nodes, 0 edges" },
    { "name": "identity_match",  "status": "pending", "detail": "0 identities" },
    { "name": "scene_metadata",  "status": "pending", "detail": null },
    { "name": "5w1h",            "status": "pending", "detail": "0 scenes with 5W1H" }
  ]
}
```
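A minimal Python client for this endpoint, using only the standard library. The host and port come from the example above, no auth header is sent because the stats endpoints are listed as not requiring auth, and the function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3003"  # host/port from the curl example; adjust per deployment


def fetch_ingestion_status(file_uuid, base_url=BASE_URL, timeout=10.0):
    """GET /api/v1/stats/ingestion-status/{file_uuid} and return the decoded JSON body."""
    url = f"{base_url}/api/v1/stats/ingestion-status/{file_uuid}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def pending_steps(status):
    """Names of steps whose status is anything other than "done"."""
    return [s["name"] for s in status.get("steps", []) if s.get("status") != "done"]
```

Usage: `pending_steps(fetch_ingestion_status(file_uuid))`. For the sample response, every one of the nine steps is still pending.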

Stats Endpoints

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | `/api/v1/stats/sftpgo` | No | SFTPGo service status |
| GET | `/api/v1/stats/ingestion-status/:file_uuid` | No | Per-file ingestion checklist |

Configuration

`POST /api/v1/config/cache`

Auth: Required. Scope: system-level.

Toggle the Redis cache on or off.

Request Parameters

| Field | Type | Required | Description |
|---|---|---|---|
| `enabled` | boolean | Yes | `true` to enable, `false` to disable |

Example

```bash
curl -s -X POST "$API/api/v1/config/cache" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $KEY" \
  -d '{"enabled": false}'
```
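The same call from Python, sketched with the standard library's `urllib.request`. `base_url` plays the role of `$API` and `api_key` of `$KEY`; the helper names are hypothetical. Building the request separately from sending it makes the method, headers, and body easy to inspect.

```python
import json
import urllib.request


def build_cache_toggle_request(base_url, api_key, enabled):
    """Build the POST /api/v1/config/cache request equivalent to the curl example."""
    body = json.dumps({"enabled": bool(enabled)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/config/cache",
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def set_cache_enabled(base_url, api_key, enabled, timeout=10.0):
    """Send the request and return the HTTP status code (non-2xx raises HTTPError)."""
    req = build_cache_toggle_request(base_url, api_key, enabled)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```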

Unmounted Routes

The following routes are defined in source code but are NOT currently mounted in the router:

| Endpoint | Source file |
|---|---|
| `/api/v1/search/persons` | `universal_search.rs` |
| `/api/v1/who` | `who.rs` |
| `/api/v1/who/candidates` | `who.rs` |
