<!-- module: pipeline -->
<!-- description: Pipeline processors, ingestion status, stats endpoints -->
<!-- depends: 01_auth -->

## Pipeline

### Dependency Graph

```mermaid
flowchart TB
    subgraph Processors["10 Processors"]
        Cut[Cut] --> ASR[ASR]
        ASR --> ASRX[ASRX]
        ASRX --> Story[Story]
        Cut --> Story
        YOLO[YOLO] --> VisualChunk[VisualChunk]
        VisualChunk --> Story
        Face[Face] --> Story
        Story --> FiveW1H[5W1H]
        OCR[OCR]
        Pose[Pose]
    end

    subgraph Ingestion["Ingestion (Post-Processing)"]
        ASR --> Rule1[Rule 1 Sentence]
        ASRX --> Rule1
        Rule1 --> Vectorize[Auto-Vectorize]
        Rule1 --> Phase1[Phase 1 Pack]

        Cut --> Rule3[Rule 3 Scene]
        ASR --> Rule3

        Face --> Trace[Face Trace]
        Trace --> Qdrant[Qdrant Sync]
        Trace --> TraceChunks[Trace Chunks]
        Trace --> TKG[TKG Builder]

        Face --> TMDbMatch[TMDb Match]
        Face --> SceneMeta[Scene Metadata]
        YOLO --> SceneMeta
        Face --> IdentityAgent[Identity Agent]
        ASRX --> IdentityAgent

        Cut --> Agent5W1H[5W1H Agent]
        ASR --> Agent5W1H
        Agent5W1H --> Phase2[Phase 2 Pack]
    end

    style Processors fill:#1a1a2e,stroke:#e94560
    style Ingestion fill:#16213e,stroke:#0f3460
```

### Pipeline Completion Flow

The pipeline is **not complete** until both the 10 processors and the ingestion (post-processing) steps have finished. The worker polls every 3 seconds and marks the job as `completed` only when every ingestion step passes verification.

```
10 processors done
     ↓  (job status stays "running")
Algorithm 1 Trigger: Rule 1 + Vectorize + Phase 1 Pack
     ↓  (both triggers run in parallel)
Algorithm 2 Trigger: Face Trace → TKG, Scene Metadata, Identity Agent, 5W1H Agent
     ↓  (poll checks every 3s)
Ingestion verification: rule1 ✓ vectorize ✓ rule3 ✓ face_trace ✓ tkg ✓ scene_meta ✓ 5w1h ✓
     ↓
job status = "completed"
```
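
The completion rule above can be sketched as a small polling loop. This is a minimal illustration, not the worker's actual code: `job_status`, `wait_until_complete`, `check_processors`, and `check_ingestion` are hypothetical names, and the checks are injected as callables so the loop can be exercised without a database.

```python
import time
from typing import Callable, Dict

POLL_INTERVAL_S = 3.0  # the worker polls every 3 seconds (from the text above)


def job_status(processors_done: bool, ingestion: Dict[str, bool]) -> str:
    """Return "completed" only when processors AND every ingestion step pass."""
    if processors_done and ingestion and all(ingestion.values()):
        return "completed"
    return "running"


def wait_until_complete(
    check_processors: Callable[[], bool],
    check_ingestion: Callable[[], Dict[str, bool]],
    interval: float = POLL_INTERVAL_S,
    max_polls: int = 1000,  # hypothetical safety cap, not from the source
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job is complete or max_polls is exhausted."""
    for _ in range(max_polls):
        status = job_status(check_processors(), check_ingestion())
        if status == "completed":
            return status
        sleep(interval)
    return "running"
```

An empty ingestion result is treated as "not yet verified", so a job with no recorded checks stays `running` rather than being marked `completed` by accident.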

### 10 Processor Stages

| # | Processor | Depends On | Description |
|---|-----------|------------|-------------|
| 1 | `Cut` | — | Scene boundary detection (PySceneDetect) |
| 2 | `ASR` | Cut | Automatic speech recognition (faster-whisper) |
| 3 | `ASRX` | ASR | Speaker diarization + ASR refinement |
| 4 | `YOLO` | — | Object detection (YOLOv8) |
| 5 | `OCR` | — | Optical character recognition |
| 6 | `Face` | — | Face detection + recognition (InsightFace + CoreML) |
| 7 | `Pose` | — | Pose estimation |
| 8 | `VisualChunk` | YOLO | Visual object chunking |
| 9 | `Story` | ASRX + Cut + YOLO + Face | Narrative scene summarization (LLM, with embedding) |
| 10 | `5W1H` | Story | Who/What/When/Where/Why extraction (LLM, with embedding) |
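
One way to read the table is as a scheduling constraint: a processor may start once everything in its "Depends On" column has finished. The sketch below groups the processors into waves with the standard-library `graphlib` module (Python 3.9+). The mapping is copied from the "Depends On" column; `DEPENDS_ON` and `execution_waves` are illustrative names, and the real scheduler may order work differently.

```python
from graphlib import TopologicalSorter

# Processor -> set of processors it depends on, copied from the table above.
DEPENDS_ON = {
    "Cut": set(),
    "ASR": {"Cut"},
    "ASRX": {"ASR"},
    "YOLO": set(),
    "OCR": set(),
    "Face": set(),
    "Pose": set(),
    "VisualChunk": {"YOLO"},
    "Story": {"ASRX", "Cut", "YOLO", "Face"},
    "5W1H": {"Story"},
}


def execution_waves(deps):
    """Group processors into waves; each wave only needs earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

With the dependencies listed in the table, `execution_waves(DEPENDS_ON)` yields five waves: `Cut`, `Face`, `OCR`, `Pose`, `YOLO` first, then `ASR` and `VisualChunk`, then `ASRX`, then `Story`, and finally `5W1H`.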

### Ingestion (Post-Processing)

These steps run after the 10 processors and are **required for pipeline completion**. The worker checks all of them before marking the job as done.

| # | Step | Triggers When | Verification |
|---|------|--------------|-------------|
| 1 | **Rule 1 Sentence Chunking** | ASR + ASRX done | `chunk` table has rows with `chunk_type = 'sentence'` |
| 2 | **Auto-Vectorize** | Rule 1 done | `chunk.embedding` IS NOT NULL for sentence chunks |
| 3 | **Phase 1 Pack** | Rule 1 done | `release_pack.py --phase 1` executed |
| 4 | **Rule 3 Scene Chunking** | All 10 processors done (uses Cut + ASR output) | `chunk` table has rows with `chunk_type = 'cut'` |
| 5 | **Face Trace** | All 10 processors done (uses Face output) | `face_detections.trace_id` IS NOT NULL |
| 6 | **Qdrant Face Sync** | Face Trace done | Qdrant face_embedding collection populated |
| 7 | **Trace Chunks** | Face Trace done | `chunk` table has rows with `chunk_type = 'trace'` |
| 8 | **TKG Builder** | Face Trace done | `tkg_nodes` + `tkg_edges` tables have rows |
| 9 | **TMDb Face Matching** | TMDb enabled + Face done | `face_detections.identity_id` IS NOT NULL |
| 10 | **Heuristic Scene Metadata** | Face + YOLO done | `{file_uuid}.scene_meta.json` exists on disk |
| 11 | **Identity Agent** | Face + ASRX done | `identities` with `source = 'identity_agent'` |
| 12 | **5W1H Agent** | Cut + ASR done | `chunk.summary_text` IS NOT NULL for cut chunks |
| 13 | **Phase 2 Pack (Release Pack)** | 5W1H Agent done | `release_pack.py --phase 2` executed |
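
Most of the verification column amounts to existence queries. The sketch below illustrates steps 1 and 2 against an in-memory SQLite database. Only the `chunk` table and its `chunk_type` and `embedding` columns come from this document; the database engine, the `file_uuid` column on `chunk`, and the function names are assumptions made for the example. It also assumes that step 2 requires every sentence chunk to be embedded, which the table does not state explicitly.

```python
import sqlite3


def rule1_done(conn: sqlite3.Connection, file_uuid: str) -> bool:
    """Step 1: at least one sentence chunk exists for the file."""
    row = conn.execute(
        "SELECT COUNT(*) FROM chunk "
        "WHERE file_uuid = ? AND chunk_type = 'sentence'",
        (file_uuid,),
    ).fetchone()
    return row[0] > 0


def vectorize_done(conn: sqlite3.Connection, file_uuid: str) -> bool:
    """Step 2 (assumed reading): sentence chunks exist and none lacks an embedding."""
    total, missing = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(embedding IS NULL), 0) FROM chunk "
        "WHERE file_uuid = ? AND chunk_type = 'sentence'",
        (file_uuid,),
    ).fetchone()
    return total > 0 and missing == 0


def demo_connection() -> sqlite3.Connection:
    """Build a throwaway schema containing only the columns used here."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE chunk (file_uuid TEXT, chunk_type TEXT, embedding BLOB)"
    )
    return conn
```

The same pattern would extend to the other row-based checks (`chunk_type = 'cut'`, `chunk_type = 'trace'`, `summary_text IS NOT NULL`), while the file-based and Qdrant checks need a different kind of probe.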

### Ingestion Status

Check real-time ingestion status for a file:

```bash
curl "$API/api/v1/stats/ingestion-status/{file_uuid}"
```

Returns one entry per ingestion step, each with a `status` of `done` or `pending` and a `detail` string (row counts where available, otherwise `null`).

#### Example

```bash
curl "http://localhost:3003/api/v1/stats/ingestion-status/bd80fec9c42afb0307eb28f22c64c76a" | jq '.steps[] | {name, status, detail}'
```

#### Response

```json
{
  "file_uuid": "bd80fec9c42afb0307eb28f22c64c76a",
  "steps": [
    { "name": "rule1_sentence", "status": "pending", "detail": "0 sentence chunks" },
    { "name": "auto_vectorize",  "status": "pending", "detail": "0 embedded" },
    { "name": "rule3_scene",     "status": "pending", "detail": "0 scene chunks" },
    { "name": "face_trace",      "status": "pending", "detail": "0 traces" },
    { "name": "trace_chunks",    "status": "pending", "detail": "0 trace chunks" },
    { "name": "tkg",             "status": "pending", "detail": "0 nodes, 0 edges" },
    { "name": "identity_match",  "status": "pending", "detail": "0 identities" },
    { "name": "scene_metadata",  "status": "pending", "detail": null },
    { "name": "5w1h",            "status": "pending", "detail": "0 scenes with 5W1H" }
  ]
}
```
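
A client can reduce this response to a simple "what is still pending" list. The following is a minimal sketch using only the standard library; the function names are illustrative, and the only fields it relies on are `steps`, `name`, and `status` as shown in the response above.

```python
import json
import urllib.request
from typing import Any, Dict, List


def fetch_ingestion_status(api_base: str, file_uuid: str) -> Dict[str, Any]:
    """GET the per-file checklist and decode the JSON body."""
    url = f"{api_base}/api/v1/stats/ingestion-status/{file_uuid}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def pending_steps(payload: Dict[str, Any]) -> List[str]:
    """Names of steps whose status is anything other than "done"."""
    return [
        step["name"]
        for step in payload.get("steps", [])
        if step.get("status") != "done"
    ]


def is_ingested(payload: Dict[str, Any]) -> bool:
    """True only if there is at least one step and none is pending."""
    steps = payload.get("steps", [])
    return bool(steps) and not pending_steps(payload)
```

For the example response above, `pending_steps` would return all nine step names and `is_ingested` would return `False`.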

### Stats Endpoints

| Method | Endpoint | Auth | Description |
|--------|----------|------|-------------|
| GET | `/api/v1/stats/sftpgo` | No | SFTPGo service status |
| GET | `/api/v1/stats/ingestion-status/:file_uuid` | No | Per-file ingestion checklist |

### Configuration

### `POST /api/v1/config/cache`

**Auth**: Required

**Scope**: system-level

Toggle the Redis cache on or off.

#### Request Parameters

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `enabled` | boolean | Yes | `true` to enable, `false` to disable |

#### Example

```bash
curl -s -X POST "$API/api/v1/config/cache" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $KEY" \
  -d '{"enabled": false}'
```
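
The same call can be made from Python with the standard library. This sketch mirrors the `curl` example above (same path, headers, and body); the function names are illustrative, and because the response body is not documented here, only the HTTP status code is returned.

```python
import json
import urllib.request


def build_cache_toggle_request(
    api_base: str, api_key: str, enabled: bool
) -> urllib.request.Request:
    """Build (but do not send) the POST that toggles the Redis cache."""
    body = json.dumps({"enabled": enabled}).encode("utf-8")
    return urllib.request.Request(
        url=f"{api_base}/api/v1/config/cache",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )


def set_cache_enabled(api_base: str, api_key: str, enabled: bool) -> int:
    """Send the request and return the HTTP status code."""
    req = build_cache_toggle_request(api_base, api_key, enabled)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Splitting request construction from sending keeps the payload easy to inspect or test without touching a live server.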

### Unmounted Routes

The following routes are defined in source code but are **NOT** currently mounted in the router:

| Endpoint | Source file |
|----------|-------------|
| `/api/v1/search/persons` | `universal_search.rs` |
| `/api/v1/who` | `who.rs` |
| `/api/v1/who/candidates` | `who.rs` |