feat: ingestion status endpoint + pipeline doc with ingestion steps
117
docs_v1.0/API_WORKSPACE/modules/10_pipeline.md
Normal file
@@ -0,0 +1,117 @@
<!-- module: pipeline -->
<!-- description: Pipeline processors, ingestion status, stats endpoints -->
<!-- depends: 01_auth -->

## Pipeline

### 10 Processor Stages

| # | Processor | Depends On | Description |
|---|-----------|------------|-------------|
| 1 | `Cut` | None | Scene boundary detection (PySceneDetect) |
| 2 | `ASR` | Cut | Automatic speech recognition (faster-whisper) |
| 3 | `ASRX` | ASR | Speaker diarization + ASR refinement |
| 4 | `YOLO` | None | Object detection (YOLOv8) |
| 5 | `OCR` | None | Optical character recognition |
| 6 | `Face` | None | Face detection + recognition (InsightFace + CoreML) |
| 7 | `Pose` | None | Pose estimation |
| 8 | `VisualChunk` | YOLO | Visual object chunking |
| 9 | `Story` | ASRX + Cut | Narrative scene summarization (LLM, with embedding) |
| 10 | `5W1H` | Story | Who/What/When/Where/Why/How extraction (LLM, with embedding) |
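
The "Depends On" column defines a small dependency graph. As a minimal sketch (not the pipeline's actual scheduler, which this document does not describe), the processors can be grouped into waves in which every processor's dependencies belong to an earlier wave. The names `PROCESSOR_DEPS` and `execution_waves` are hypothetical and exist only for this illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Dependencies transcribed from the "Depends On" column of the table.
# Each key maps to the set of processors that must finish first.
PROCESSOR_DEPS = {
    "Cut": set(),
    "ASR": {"Cut"},
    "ASRX": {"ASR"},
    "YOLO": set(),
    "OCR": set(),
    "Face": set(),
    "Pose": set(),
    "VisualChunk": {"YOLO"},
    "Story": {"ASRX", "Cut"},
    "5W1H": {"Story"},
}


def execution_waves(deps):
    """Group processors into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(PROCESSOR_DEPS), start=1):
        print(number, wave)
```

Under these dependencies, `Cut`, `Face`, `OCR`, `Pose`, and `YOLO` have no prerequisites, while `5W1H` sits at the end of the longest chain (`Cut`, `ASR`, `ASRX`, `Story`, `5W1H`).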

### Post-Processing (Ingestion)

After all 10 processors complete, the pipeline runs the following storage and enrichment steps. The Evidence column shows where each step's output can be verified:

| # | Step | Requires | Evidence |
|---|------|----------|----------|
| 1 | **Rule 1 Sentence Chunking** | ASR + ASRX | `chunk` table, `chunk_type = 'sentence'` |
| 2 | **Auto-Vectorize** | Rule 1 | `chunk.embedding` IS NOT NULL (pgvector) |
| 3 | **Rule 3 Scene Chunking** | Cut + ASR | `chunk` table, `chunk_type = 'cut'` |
| 4 | **Face Trace + DB Store** | Face | `face_detections.trace_id` IS NOT NULL |
| 5 | **Qdrant Face Sync** | Face Trace | Qdrant collection (face embeddings) |
| 6 | **Trace Chunks** | Face Trace | `chunk` table, `chunk_type = 'trace'` |
| 7 | **TKG Builder** | Face Trace | `tkg_nodes` + `tkg_edges` tables |
| 8 | **TMDb Face Matching** | Face + TMDb enabled | `face_detections.identity_id` IS NOT NULL |
| 9 | **Heuristic Scene Metadata** | Face + YOLO | `{file_uuid}.scene_meta.json` on disk |
| 10 | **Identity Agent** | Face + ASRX | `identities` with `source = 'identity_agent'` |
| 11 | **5W1H Agent** | Cut + ASR | `chunk.summary_text` IS NOT NULL (chunk_type = 'cut') |
| 12 | **Release Pack** | 5W1H Agent | `release_pack.py --phase 2` output |
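
The "Requires" column mixes processor outputs (such as `ASR`), earlier steps (such as Face Trace), and one feature flag (TMDb enabled). The sketch below treats all of them as plain labels, which is an assumption of this example, and works out which steps could run for a given set of available inputs. `STEP_REQUIRES` and `runnable_steps` are hypothetical names, not part of the pipeline code.

```python
# Requirements transcribed from the "Requires" column of the table.
# "Rule 1" and "Face Trace" are expanded to the full step names they refer to.
STEP_REQUIRES = {
    "Rule 1 Sentence Chunking": {"ASR", "ASRX"},
    "Auto-Vectorize": {"Rule 1 Sentence Chunking"},
    "Rule 3 Scene Chunking": {"Cut", "ASR"},
    "Face Trace + DB Store": {"Face"},
    "Qdrant Face Sync": {"Face Trace + DB Store"},
    "Trace Chunks": {"Face Trace + DB Store"},
    "TKG Builder": {"Face Trace + DB Store"},
    "TMDb Face Matching": {"Face", "TMDb enabled"},
    "Heuristic Scene Metadata": {"Face", "YOLO"},
    "Identity Agent": {"Face", "ASRX"},
    "5W1H Agent": {"Cut", "ASR"},
    "Release Pack": {"5W1H Agent"},
}


def runnable_steps(available):
    """Return the steps whose requirements can be met, in table order.

    `available` holds processor names and flags (e.g. "TMDb enabled").
    A step that becomes runnable also satisfies later steps that require it.
    """
    satisfied = set(available)
    runnable = []
    changed = True
    while changed:  # repeat until no further step can be unlocked
        changed = False
        for step, needs in STEP_REQUIRES.items():
            if step not in satisfied and needs <= satisfied:
                satisfied.add(step)
                runnable.append(step)
                changed = True
    return runnable


if __name__ == "__main__":
    # Example: speech-side processors finished, no Face or YOLO output.
    print(runnable_steps({"Cut", "ASR", "ASRX"}))
```

With only `Cut`, `ASR`, and `ASRX` available, the face-related steps (4 to 10) stay blocked while the chunking, vectorization, 5W1H Agent, and Release Pack steps remain reachable.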

### Ingestion Status

Check real-time ingestion status for a file:

```bash
curl "$API/api/v1/stats/ingestion-status/{file_uuid}"
```

The response lists each tracked step with a `done` or `pending` status and a `detail` string (usually a count; it can be `null`).

#### Example

```bash
curl -s "http://localhost:3003/api/v1/stats/ingestion-status/bd80fec9c42afb0307eb28f22c64c76a" | jq '.steps[] | {name, status, detail}'
```

#### Response

```json
{
  "file_uuid": "bd80fec9c42afb0307eb28f22c64c76a",
  "steps": [
    { "name": "rule1_sentence", "status": "pending", "detail": "0 sentence chunks" },
    { "name": "auto_vectorize", "status": "pending", "detail": "0 embedded" },
    { "name": "rule3_scene", "status": "pending", "detail": "0 scene chunks" },
    { "name": "face_trace", "status": "pending", "detail": "0 traces" },
    { "name": "trace_chunks", "status": "pending", "detail": "0 trace chunks" },
    { "name": "tkg", "status": "pending", "detail": "0 nodes, 0 edges" },
    { "name": "identity_match", "status": "pending", "detail": "0 identities" },
    { "name": "scene_metadata", "status": "pending", "detail": null },
    { "name": "5w1h", "status": "pending", "detail": "0 scenes with 5W1H" }
  ]
}
```
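
A client will often want to reduce this checklist to "what is still pending?". The following sketch assumes only the response shape shown above (`steps[].name`, `steps[].status`); the helper names `fetch_status`, `pending_steps`, and `is_complete` are hypothetical, and the base URL in the usage example is the same local address used in the curl example.

```python
import json
from urllib.request import urlopen


def fetch_status(base_url, file_uuid):
    """GET the ingestion checklist for one file and decode the JSON body."""
    url = f"{base_url}/api/v1/stats/ingestion-status/{file_uuid}"
    with urlopen(url) as resp:
        return json.load(resp)


def pending_steps(status):
    """Names of all steps whose status is anything other than "done"."""
    return [step["name"] for step in status["steps"] if step["status"] != "done"]


def is_complete(status):
    """True when no step is left pending."""
    return not pending_steps(status)


if __name__ == "__main__":
    status = fetch_status("http://localhost:3003", "bd80fec9c42afb0307eb28f22c64c76a")
    print("pending:", pending_steps(status))
```

Keeping the parsing helpers separate from the network call makes them easy to exercise against a saved response.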

### Stats Endpoints

| Method | Endpoint | Auth | Description |
|--------|----------|------|-------------|
| GET | `/api/v1/stats/sftpgo` | No | SFTPGo service status |
| GET | `/api/v1/stats/ingestion-status/:file_uuid` | No | Per-file ingestion checklist |

### Configuration

### `POST /api/v1/config/cache`

**Auth**: Required
**Scope**: system-level

Toggle the Redis cache on or off.

#### Request Parameters

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `enabled` | boolean | Yes | `true` to enable, `false` to disable |

#### Example

```bash
curl -s -X POST "$API/api/v1/config/cache" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $KEY" \
  -d '{"enabled": false}'
```
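
The same request can be built from Python's standard library. This is a sketch that mirrors the curl call above (same path, headers, and body); the function name `build_cache_toggle_request` is hypothetical, and the response format is not documented here, so the usage example only prints the HTTP status code.

```python
import json
from urllib.request import Request, urlopen


def build_cache_toggle_request(base_url, api_key, enabled):
    """Build (but do not send) the POST request that toggles the Redis cache."""
    body = json.dumps({"enabled": bool(enabled)}).encode("utf-8")
    return Request(
        f"{base_url}/api/v1/config/cache",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )


if __name__ == "__main__":
    import os

    # API and KEY are the same environment variables used by the curl example.
    request = build_cache_toggle_request(os.environ["API"], os.environ["KEY"], False)
    with urlopen(request) as resp:
        print(resp.status)
```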

### Unmounted Routes

The following routes are defined in source code but are **NOT** currently mounted in the router:

| Endpoint | Source file |
|----------|-------------|
| `/api/v1/search/universal` | `universal_search.rs` |
| `/api/v1/search/frames` | `universal_search.rs` |
| `/api/v1/search/persons` | `universal_search.rs` |
| `/api/v1/who` | `who.rs` |
| `/api/v1/who/candidates` | `who.rs` |