docs: pipeline completion flow requires 入库 (ingestion)
@@ -50,6 +50,22 @@ flowchart TB
style Ingestion fill:#16213e,stroke:#0f3460
```

### Pipeline Completion Flow

The pipeline is **not complete** until both the 10 processors and the 入庫 (ingestion) steps have finished. The worker polls every 3 seconds and marks the job as `completed` only once every ingestion step passes verification.

```
10 processors done
↓ (job status stays "running")
Algorithm 1 Trigger: Rule 1 + Vectorize + Phase 1 Pack
↓ (job runs in parallel)
Algorithm 2 Trigger: Face Trace → TKG, Scene Metadata, Identity Agent, 5W1H Agent
↓ (poll checks every 3s)
Ingestion verification: rule1 ✓ vectorize ✓ rule3 ✓ face_trace ✓ tkg ✓ scene_meta ✓ 5w1h ✓
↓
job status = "completed"
```
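
The completion rule in the flow above can be sketched as a small polling loop. This is a minimal illustration, not the worker's actual code: the names `REQUIRED_STEPS`, `all_verified`, and `wait_for_ingestion`, the `max_polls` limit, and the behaviour on timeout are all assumptions. Only the step names and the 3-second interval come from the text.

```python
import time
from typing import Callable, Mapping

# Step names copied from the "Ingestion verification" line of the flow.
REQUIRED_STEPS = ("rule1", "vectorize", "rule3", "face_trace", "tkg", "scene_meta", "5w1h")
POLL_INTERVAL_S = 3.0  # the worker polls every 3 seconds


def all_verified(results: Mapping[str, bool]) -> bool:
    """Return True only when every required ingestion step reports OK."""
    return all(results.get(step, False) for step in REQUIRED_STEPS)


def wait_for_ingestion(
    check: Callable[[], Mapping[str, bool]],
    interval: float = POLL_INTERVAL_S,
    max_polls: int = 1200,  # hypothetical safety limit, not from the source
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `check` until all steps verify, then return the job status."""
    for _ in range(max_polls):
        if all_verified(check()):
            return "completed"
        sleep(interval)
    # Assumption: the source does not say what happens on timeout,
    # so this sketch simply leaves the job in "running".
    return "running"
```

Passing `sleep` as a parameter keeps the loop testable without real delays. A missing step counts as unverified, so a partial result never completes the job.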

### 10 Processor Stages

| # | Processor | Depends On | Description |
@@ -65,25 +81,25 @@ flowchart TB
| 9 | `Story` | ASRX + Cut + YOLO + Face | Narrative scene summarization (LLM, with embedding) |
| 10 | `5W1H` | Story | Who/What/When/Where/Why extraction (LLM, with embedding) |

### Post-Processing (入庫)
### 入庫 (Post-Processing / Ingestion)

After all 10 processors complete, the pipeline runs the following storage & enrichment steps:
These steps run after the 10 processors and are **required for pipeline completion**. The worker checks all of them before marking the job as done.

| # | Step | Requires | Evidence |
|---|------|----------|----------|
| 1 | **Rule 1 Sentence Chunking** | ASR + ASRX | `chunk` table, `chunk_type = 'sentence'` |
| 2 | **Auto-Vectorize** | Rule 1 | `chunk.embedding` IS NOT NULL (pgvector) |
| 3 | **Phase 1 Pack** | Rule 1 | `release_pack.py --phase 1` |
| 4 | **Rule 3 Scene Chunking** | Cut + ASR | `chunk` table, `chunk_type = 'cut'` |
| 5 | **Face Trace** | Face | `face_detections.trace_id` IS NOT NULL |
| 6 | **Qdrant Face Sync** | Face Trace | Qdrant face_embedding collection |
| 7 | **Trace Chunks** | Face Trace | `chunk` table, `chunk_type = 'trace'` |
| 8 | **TKG Builder** | Face Trace | `tkg_nodes` + `tkg_edges` tables |
| 9 | **TMDb Face Matching** | Face + TMDb enabled | `face_detections.identity_id` IS NOT NULL |
| 10 | **Heuristic Scene Metadata** | Face + YOLO | `{file_uuid}.scene_meta.json` on disk |
| 11 | **Identity Agent** | Face + ASRX | `identities` with `source = 'identity_agent'` |
| 12 | **5W1H Agent** | Cut + ASR | `chunk.summary_text` IS NOT NULL (chunk_type = 'cut') |
| 13 | **Release Pack** | 5W1H Agent | `release_pack.py --phase 2` output |

| # | Step | Triggers When | Verification |
|---|------|--------------|-------------|
| 1 | **Rule 1 Sentence Chunking** | ASR + ASRX done | `chunk` table has rows with `chunk_type = 'sentence'` |
| 2 | **Auto-Vectorize** | Rule 1 done | `chunk.embedding` IS NOT NULL for sentence chunks |
| 3 | **Phase 1 Pack** | Rule 1 done | `release_pack.py --phase 1` executed |
| 4 | **Rule 3 Scene Chunking** | All 10 processors done + Cut + ASR | `chunk` table has rows with `chunk_type = 'cut'` |
| 5 | **Face Trace** | All 10 processors done + Face | `face_detections.trace_id` IS NOT NULL |
| 6 | **Qdrant Face Sync** | Face Trace done | Qdrant face_embedding collection populated |
| 7 | **Trace Chunks** | Face Trace done | `chunk` table has rows with `chunk_type = 'trace'` |
| 8 | **TKG Builder** | Face Trace done | `tkg_nodes` + `tkg_edges` tables have rows |
| 9 | **TMDb Face Matching** | TMDb enabled + Face done | `face_detections.identity_id` IS NOT NULL |
| 10 | **Heuristic Scene Metadata** | Face + YOLO done | `{file_uuid}.scene_meta.json` exists on disk |
| 11 | **Identity Agent** | Face + ASRX done | `identities` with `source = 'identity_agent'` |
| 12 | **5W1H Agent** | Cut + ASR done | `chunk.summary_text` IS NOT NULL for cut chunks |
| 13 | **Release Pack** | 5W1H Agent done | `release_pack.py --phase 2` executed |
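
Several of the table's verification conditions translate directly into SQL existence checks. The sketch below shows five of them, under loud assumptions: it uses an in-memory SQLite database as a stand-in for the real store (the table mentions pgvector, so the real database is presumably PostgreSQL), it scopes every query by a hypothetical `file_uuid` column, and it reads "IS NOT NULL for sentence chunks" as "every sentence chunk has a value". The names `CHECKS`, `verify`, and `demo_db` are illustrative only.

```python
import sqlite3

# Evidence queries paraphrased from the "Verification" column.
# COUNT(col) counts only non-NULL values, so COUNT(*) = COUNT(col)
# holds exactly when no matching row has a NULL in that column.
CHECKS = {
    "rule1": "SELECT COUNT(*) > 0 FROM chunk "
             "WHERE file_uuid = :f AND chunk_type = 'sentence'",
    "vectorize": "SELECT COUNT(*) > 0 AND COUNT(*) = COUNT(embedding) FROM chunk "
                 "WHERE file_uuid = :f AND chunk_type = 'sentence'",
    "rule3": "SELECT COUNT(*) > 0 FROM chunk "
             "WHERE file_uuid = :f AND chunk_type = 'cut'",
    "face_trace": "SELECT COUNT(*) > 0 FROM face_detections "
                  "WHERE file_uuid = :f AND trace_id IS NOT NULL",
    "5w1h": "SELECT COUNT(*) > 0 AND COUNT(*) = COUNT(summary_text) FROM chunk "
            "WHERE file_uuid = :f AND chunk_type = 'cut'",
}


def verify(conn: sqlite3.Connection, file_uuid: str) -> dict[str, bool]:
    """Run each evidence query and report pass/fail per step."""
    return {
        step: bool(conn.execute(sql, {"f": file_uuid}).fetchone()[0])
        for step, sql in CHECKS.items()
    }


def demo_db() -> sqlite3.Connection:
    """Create a toy schema; the column layout is assumed, not from the source."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE chunk (
            file_uuid TEXT, chunk_type TEXT, embedding BLOB, summary_text TEXT
        );
        CREATE TABLE face_detections (
            file_uuid TEXT, trace_id INTEGER, identity_id INTEGER
        );
        """
    )
    return conn
```

The steps verified by files on disk or by script execution (`scene_meta.json`, `release_pack.py`) and by Qdrant would need separate, non-SQL checks.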

### Ingestion Status