feat: ASRX hybrid pipeline, identity history, worker fixes, checkpoint system

This commit is contained in:
Accusys
2026-06-02 07:13:23 +08:00
parent e3066c3f49
commit e1572907ae
198 changed files with 43705 additions and 8910 deletions

View File

@@ -0,0 +1,184 @@
<!-- module: register -->
<!-- description: File registration — register, scan -->
<!-- depends: 01_auth -->
## File Registration
### `POST /api/v1/files/register`
**Auth**: Required
**Scope**: file-level
Register a video file for processing. Returns the file's metadata and UUID.
**New in v0.1.2**: Registration now **automatically triggers the processing pipeline** — no need to call `POST /api/v1/file/:file_uuid/process` separately. The system will:
1. Register the file and run ffprobe
2. Auto-run offline TMDb probe (reads local identity files, no API calls)
3. Create a monitor job for the worker
4. Worker starts all 10 processors (Cut → ASR → ASRX → YOLO → OCR → Face → Pose → VisualChunk → Story → 5W1H)
If the file already exists (same content hash), returns the existing record with `already_exists: true`.
#### Request Parameters
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `file_path` | string | Yes | — | Path to video file on disk |
| `pattern` | string | No | — | Regex pattern for batch register (requires `file_path` to be a directory) |
| `user_id` | integer | No | — | User ID to associate with registration |
| `content_hash` | string | No | — | Pre-computed SHA-256 hash (skips computation) |
#### Example
```bash
# Register a single file
curl -s -X POST "$API/api/v1/files/register" \
-H "Content-Type: application/json" \
-H "X-API-Key: $KEY" \
-d '{"file_path": "/path/to/video.mp4"}'
# Batch register files matching a pattern in a directory
curl -s -X POST "$API/api/v1/files/register" \
-H "Content-Type: application/json" \
-H "X-API-Key: $KEY" \
-d '{"file_path": "/path/to/dir", "pattern": ".*\\.mp4$"}'
```
#### Response (200)
```json
{
"success": true,
"file_uuid": "3a6c1865...",
"file_name": "video.mp4",
"file_path": "/path/to/video.mp4",
"file_type": "video",
"duration": 120.5,
"width": 1920,
"height": 1080,
"fps": 24.0,
"total_frames": 2892,
"already_exists": false,
"message": "File registered successfully"
}
```
| Field | Type | Description |
|-------|------|-------------|
| `success` | boolean | Always true on 200 |
| `file_uuid` | string | 32-char hex UUID of the registered file |
| `file_name` | string | File name (auto-renamed if name conflict) |
| `file_path` | string | Canonical path on disk |
| `file_type` | string | `"video"`, `"audio"`, or `"unknown"` |
| `duration` | float | Duration in seconds |
| `width` | integer | Video width in pixels |
| `height` | integer | Video height in pixels |
| `fps` | float | Frames per second |
| `total_frames` | integer | Total frame count |
| `already_exists` | boolean | True if same content was already registered |
| `message` | string | Human-readable status |
#### Error Responses
| HTTP | When |
|------|------|
| `401` | Missing or invalid API key |
| `400` | Invalid request body |
| `404` | File path does not exist |
---
### `GET /api/v1/files/scan`
**Auth**: Required
**Scope**: file-level
Scan the filesystem directory and list all media files, showing which are registered, processing, or unregistered.
#### Query Parameters
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `page` | integer | No | 1 | Page number (1-based) |
| `page_size` | integer | No | all | Items per page (alias: `limit`) |
| `limit` | integer | No | all | Max items (alias for `page_size`) |
| `pattern` | string | No | — | Regex filter on file name (e.g., `.*\\.mp4$`) |
| `sort_by` | string | No | `name` | Sort field: `name`, `size`, `modified`, `status` |
| `sort_order` | string | No | `asc` | Sort direction: `asc` or `desc` |
#### Example
```bash
# Full scan
curl -s "$API/api/v1/files/scan" -H "X-API-Key: $KEY" | jq '{total, registered_count, unregistered_count}'
# Paginated (page 1, 5 per page)
curl -s "$API/api/v1/files/scan?page=1&page_size=5" -H "X-API-Key: $KEY" | jq '{page, total_pages, files: [.files[].file_name]}'
# Regex filter: only mp4 files
curl -s "$API/api/v1/files/scan?pattern=.*\\.mp4$" -H "X-API-Key: $KEY" | jq '{filtered_total, files: [.files[].file_name]}'
# Sort by file size (largest first)
curl -s "$API/api/v1/files/scan?sort_by=size&sort_order=desc&page_size=5" -H "X-API-Key: $KEY" | jq '[.files[] | {file_name, file_size}]'
# Sort by modified time (most recent first)
curl -s "$API/api/v1/files/scan?sort_by=modified&sort_order=desc&page_size=5" -H "X-API-Key: $KEY" | jq '[.files[] | {file_name, modified_time}]'
# Sort by status
curl -s "$API/api/v1/files/scan?sort_by=status&page_size=5" -H "X-API-Key: $KEY" | jq '[.files[] | {file_name, status}]'
```
#### Response (200)
```json
{
"files": [
{
"file_name": "video.mp4",
"file_size": 12345678,
"is_registered": true,
"file_uuid": "3a6c1865...",
"status": "completed",
"registration_time": "2026-05-16T12:00:00Z",
"job_id": 42
}
],
"total": 107,
"filtered_total": 80,
"page": 1,
"page_size": 20,
"total_pages": 4,
"registered_count": 26,
"unregistered_count": 81
}
```
| Field | Type | Description |
|-------|------|-------------|
| `files` | array | Array of file info objects (paginated) |
| `files[].file_name` | string | File name |
| `files[].relative_path` | string | Path relative to scan root |
| `files[].file_path` | string | Absolute path on disk |
| `files[].file_size` | integer | File size in bytes |
| `files[].modified_time` | string | Last modified timestamp (ISO8601) |
| `files[].is_registered` | boolean | Whether file is registered in DB |
| `files[].file_uuid` | string | 32-char hex UUID (only if registered) |
| `files[].status` | string | `"completed"`, `"processing"`, `"registered"`, `"unregistered"`, or `null` |
| `files[].registration_time` | string | DB registration timestamp (only if registered) |
| `files[].job_id` | integer | Processing job ID (only if a job exists) |
| `total` | integer | Total files found on disk (unfiltered) |
| `filtered_total` | integer | Files matching regex filter |
| `page` | integer | Current page number |
| `page_size` | integer | Items per page |
| `total_pages` | integer | Total pages |
| `registered_count` | integer | Files registered in DB |
| `unregistered_count` | integer | Files not yet registered |
#### Notes
| Feature | Behavior |
|---------|----------|
| **Regex** | Case-insensitive (`(?i)` prefix auto-applied). Applied to `file_name`. |
| **Sort order** | Default (`sort_by=name`): registered files first, then alphabetically. `sort_by=status`: alphabetical by status string. |
| **Pagination** | `page_size` and `limit` are aliases. Default: show all results. |
| **Processing order** | `pattern` regex filter → `sort_by`/`sort_order``page`/`page_size` slice. |