Merge branch 'main' of http://192.168.110.200:3000/admin/momentry_core

2026-05-22 10:08:11 +08:00
parent 3e81f7c16b fc338a4b59
commit e158176fbe
8 changed files with 518 additions and 15 deletions
--- a/docs_v1.0/API_WORKSPACE/modules/08_media.md
+++ b/docs_v1.0/API_WORKSPACE/modules/08_media.md
@@ -194,6 +194,8 @@ Uses a built-in 5×7 bitmap font renderer to draw labels directly on video frame

 Extract a single frame from a video as JPEG image. Uses FFmpeg `select` filter.

+When `frame` is omitted, the system automatically selects the best representative frame using the TKG bridge (see algorithm below).
+
 **Auth**: Required
 **Scope**: file-level

@@ -201,7 +203,7 @@ Extract a single frame from a video as JPEG image. Uses FFmpeg `select` filter.

 | Field | Type | Required | Default | Description |
 |-------|------|----------|---------|-------------|
-| `frame` | integer | Yes | — | Zero-based frame number to extract |
+| `frame` | integer | No | auto-detect | Zero-based frame number to extract. Omit for auto-detect. |
 | `x` | integer | No | — | Crop start X (left edge). Requires `y`, `w`, `h`. |
 | `y` | integer | No | — | Crop start Y (top edge). Requires `x`, `w`, `h`. |
 | `w` | integer | No | — | Crop width in pixels. Requires `x`, `y`, `h`. |
@@ -209,9 +211,26 @@ Extract a single frame from a video as JPEG image. Uses FFmpeg `select` filter.

 All four crop params (`x`, `y`, `w`, `h`) must be provided together or omitted.

-#### Example
+#### Auto-detect Algorithm
+
+When `frame` is not provided, the endpoint finds the best frame using this fallback chain:
+
+1. **Main characters**: find the two identities with the most face detections (TMDb source)
+2. **Mutual gaze**: if their face traces have a TKG `CO_OCCURS_WITH` edge with `mutual_gaze=true`, take `first_frame`
+3. **Co-occurrence**: fallback to the first frame where both identities appear together
+4. **Single identity**: if only one main identity exists, take its highest-quality face frame
+5. **Any identity**: fallback to the best-quality face frame across all identities
+6. **Error**: if no face exists, returns `404`
+
+The selected frame is constrained to the **first half of the video** (`total_frames / 2`).
+
+#### Examples

 ```bash
+# Auto-detect best representative frame
+curl -s "$API/api/v1/file/$FILE_UUID/thumbnail" \
+  -H "X-API-Key: $KEY" -o representative.jpg
+
 # Extract frame 1000 (full frame)
 curl -s "$API/api/v1/file/bd80fec92b0b6963d177a2c55bf713e2/thumbnail?frame=1000" \
  -H "Authorization: Bearer $JWT" -o frame_1000.jpg
@@ -224,10 +243,104 @@ curl -s "$API/api/v1/file/bd80fec92b0b6963d177a2c55bf713e2/thumbnail?frame=1000&
 #### Response

 - **200**: `image/jpeg` binary data
- **404**: File not found
+- **404**: File not found / No faces in file (auto-detect)
 - **500**: FFmpeg error (e.g., frame number exceeds video duration)

-### `GET /api/v1/file/:file_uuid/clip`
+#### Technical Details
+
+| Detail | Value |
+|--------|-------|
+| **Backend** | FFmpeg (`ffmpeg-full`) |
+| **Filter** | `select=eq(n\,FRAME)` to select frame, optional `crop=W:H:X:Y` |
+| **Output** | Single JPEG via pipe (`image2pipe`, `mjpeg` codec) |
+| **Cache** | `Cache-Control: public, max-age=86400` (24h) |
+| **Frame number** | Zero-based (`frame=0` = first frame of video) |
+
+---
+
+### `GET /api/v1/file/:file_uuid/representative-frame`
+
+Return JSON metadata about the best representative frame for the video. Uses the same auto-detect algorithm as `GET /thumbnail` (without crop support).
+
+**Auth**: Required
+**Scope**: file-level
+
+#### Example
+
+```bash
+curl -s "$API/api/v1/file/$FILE_UUID/representative-frame" \
+  -H "X-API-Key: $KEY" | jq '.'
+```
+
+#### Response (200)
+
+```json
+{
+  "success": true,
+  "file_uuid": "aeed71342a899fe4b4c57b7d41bcb692",
+  "frame_number": 38165,
+  "timestamp_secs": 1526.6,
+  "face_quality": 37292.97,
+  "main_identities": [
+    {
+      "identity_uuid": "c3545906-c82d-4b66-aa1d-150bc02decce",
+      "name": "Audrey Hepburn",
+      "face_count": 16456
+    },
+    {
+      "identity_uuid": "2b0ddefe-e2a9-4533-9308-b375594604d5",
+      "name": "Cary Grant",
+      "face_count": 10643
+    }
+  ],
+  "traces": [
+    {
+      "trace_id": 919,
+      "identity_uuid": "2b0ddefe-e2a9-4533-9308-b375594604d5",
+      "name": "Cary Grant",
+      "x": 764,
+      "y": 237,
+      "width": 199,
+      "height": 199,
+      "confidence": 0.8426
+    },
+    {
+      "trace_id": 920,
+      "identity_uuid": "c3545906-c82d-4b66-aa1d-150bc02decce",
+      "name": "Audrey Hepburn",
+      "x": 1143,
+      "y": 312,
+      "width": 215,
+      "height": 215,
+      "confidence": 0.8068
+    }
+  ]
+}
+```
+
+#### Response Fields
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `frame_number` | integer | Selected representative frame number (primary coordinate) |
+| `timestamp_secs` | float | Time in seconds (derived from `frame_number / fps`) |
+| `face_quality` | float | Quality score `area × confidence` of the best face at this frame |
+| `main_identities` | array | Top 2 most frequent TMDb identities in the file |
+| `main_identities[].name` | string | Identity display name |
+| `main_identities[].face_count` | integer | Total face detections count |
+| `traces` | array | All face traces present at the selected frame |
+| `traces[].trace_id` | integer | Face trace ID |
+| `traces[].identity_uuid` | string or null | Matched identity UUID |
+| `traces[].name` | string or null | Identity name |
+| `traces[].x, y, width, height` | integer | Bounding box coordinates |
+| `traces[].confidence` | float | Detection confidence (0.0–1.0) |
+
+#### Error Responses
+
+| HTTP | When |
+|------|------|
+| `404` | File not found / No faces in file |
+| `500` | Database error |

 Extract a video clip (time range) as MPEG-TS stream. Uses FFmpeg `-ss` fast seek.

--- a/docs_v1.0/doc_wasm/index.html
+++ b/docs_v1.0/doc_wasm/index.html
@@ -11,7 +11,7 @@
 body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif; background: #f5f5f5; color: #333; }
 #app { display: flex; min-height: 100vh; }
 html, body { height: 100%; }
-.sidebar { width: 260px; min-height: 100vh; background: #fff; border-right: 1px solid #ddd; padding: 20px; display: flex; flex-direction: column; }
+.sidebar { width: 260px; height: 100vh; position: sticky; top: 0; overflow-y: auto; background: #fff; border-right: 1px solid #ddd; padding: 20px; display: flex; flex-direction: column; }
 .sidebar h1 { font-size: 18px; margin-bottom: 16px; }
 .sidebar a { display: block; padding: 6px 0; color: #0066cc; text-decoration: none; font-size: 14px; cursor: pointer; }
 .sidebar a:hover { color: #003d80; }
--- a/docs_v1.0/doc_wasm/modules/08_media.md
+++ b/docs_v1.0/doc_wasm/modules/08_media.md
@@ -194,6 +194,8 @@ Uses a built-in 5×7 bitmap font renderer to draw labels directly on video frame

 Extract a single frame from a video as JPEG image. Uses FFmpeg `select` filter.

+When `frame` is omitted, the system automatically selects the best representative frame using the TKG bridge (see algorithm below).
+
 **Auth**: Required
 **Scope**: file-level

@@ -201,7 +203,7 @@ Extract a single frame from a video as JPEG image. Uses FFmpeg `select` filter.

 | Field | Type | Required | Default | Description |
 |-------|------|----------|---------|-------------|
-| `frame` | integer | Yes | — | Zero-based frame number to extract |
+| `frame` | integer | No | auto-detect | Zero-based frame number to extract. Omit for auto-detect. |
 | `x` | integer | No | — | Crop start X (left edge). Requires `y`, `w`, `h`. |
 | `y` | integer | No | — | Crop start Y (top edge). Requires `x`, `w`, `h`. |
 | `w` | integer | No | — | Crop width in pixels. Requires `x`, `y`, `h`. |
@@ -209,9 +211,26 @@ Extract a single frame from a video as JPEG image. Uses FFmpeg `select` filter.

 All four crop params (`x`, `y`, `w`, `h`) must be provided together or omitted.

-#### Example
+#### Auto-detect Algorithm
+
+When `frame` is not provided, the endpoint finds the best frame using this fallback chain:
+
+1. **Main characters**: find the two identities with the most face detections (TMDb source)
+2. **Mutual gaze**: if their face traces have a TKG `CO_OCCURS_WITH` edge with `mutual_gaze=true`, take `first_frame`
+3. **Co-occurrence**: fallback to the first frame where both identities appear together
+4. **Single identity**: if only one main identity exists, take its highest-quality face frame
+5. **Any identity**: fallback to the best-quality face frame across all identities
+6. **Error**: if no face exists, returns `404`
+
+The selected frame is constrained to the **first half of the video** (`total_frames / 2`).
+
+#### Examples

 ```bash
+# Auto-detect best representative frame
+curl -s "$API/api/v1/file/$FILE_UUID/thumbnail" \
+  -H "X-API-Key: $KEY" -o representative.jpg
+
 # Extract frame 1000 (full frame)
 curl -s "$API/api/v1/file/bd80fec92b0b6963d177a2c55bf713e2/thumbnail?frame=1000" \
  -H "Authorization: Bearer $JWT" -o frame_1000.jpg
@@ -224,10 +243,104 @@ curl -s "$API/api/v1/file/bd80fec92b0b6963d177a2c55bf713e2/thumbnail?frame=1000&
 #### Response

 - **200**: `image/jpeg` binary data
- **404**: File not found
+- **404**: File not found / No faces in file (auto-detect)
 - **500**: FFmpeg error (e.g., frame number exceeds video duration)

-### `GET /api/v1/file/:file_uuid/clip`
+#### Technical Details
+
+| Detail | Value |
+|--------|-------|
+| **Backend** | FFmpeg (`ffmpeg-full`) |
+| **Filter** | `select=eq(n\,FRAME)` to select frame, optional `crop=W:H:X:Y` |
+| **Output** | Single JPEG via pipe (`image2pipe`, `mjpeg` codec) |
+| **Cache** | `Cache-Control: public, max-age=86400` (24h) |
+| **Frame number** | Zero-based (`frame=0` = first frame of video) |
+
+---
+
+### `GET /api/v1/file/:file_uuid/representative-frame`
+
+Return JSON metadata about the best representative frame for the video. Uses the same auto-detect algorithm as `GET /thumbnail` (without crop support).
+
+**Auth**: Required
+**Scope**: file-level
+
+#### Example
+
+```bash
+curl -s "$API/api/v1/file/$FILE_UUID/representative-frame" \
+  -H "X-API-Key: $KEY" | jq '.'
+```
+
+#### Response (200)
+
+```json
+{
+  "success": true,
+  "file_uuid": "aeed71342a899fe4b4c57b7d41bcb692",
+  "frame_number": 38165,
+  "timestamp_secs": 1526.6,
+  "face_quality": 37292.97,
+  "main_identities": [
+    {
+      "identity_uuid": "c3545906-c82d-4b66-aa1d-150bc02decce",
+      "name": "Audrey Hepburn",
+      "face_count": 16456
+    },
+    {
+      "identity_uuid": "2b0ddefe-e2a9-4533-9308-b375594604d5",
+      "name": "Cary Grant",
+      "face_count": 10643
+    }
+  ],
+  "traces": [
+    {
+      "trace_id": 919,
+      "identity_uuid": "2b0ddefe-e2a9-4533-9308-b375594604d5",
+      "name": "Cary Grant",
+      "x": 764,
+      "y": 237,
+      "width": 199,
+      "height": 199,
+      "confidence": 0.8426
+    },
+    {
+      "trace_id": 920,
+      "identity_uuid": "c3545906-c82d-4b66-aa1d-150bc02decce",
+      "name": "Audrey Hepburn",
+      "x": 1143,
+      "y": 312,
+      "width": 215,
+      "height": 215,
+      "confidence": 0.8068
+    }
+  ]
+}
+```
+
+#### Response Fields
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `frame_number` | integer | Selected representative frame number (primary coordinate) |
+| `timestamp_secs` | float | Time in seconds (derived from `frame_number / fps`) |
+| `face_quality` | float | Quality score `area × confidence` of the best face at this frame |
+| `main_identities` | array | Top 2 most frequent TMDb identities in the file |
+| `main_identities[].name` | string | Identity display name |
+| `main_identities[].face_count` | integer | Total face detections count |
+| `traces` | array | All face traces present at the selected frame |
+| `traces[].trace_id` | integer | Face trace ID |
+| `traces[].identity_uuid` | string or null | Matched identity UUID |
+| `traces[].name` | string or null | Identity name |
+| `traces[].x, y, width, height` | integer | Bounding box coordinates |
+| `traces[].confidence` | float | Detection confidence (0.0–1.0) |
+
+#### Error Responses
+
+| HTTP | When |
+|------|------|
+| `404` | File not found / No faces in file |
+| `500` | Database error |

 Extract a video clip (time range) as MPEG-TS stream. Uses FFmpeg `-ss` fast seek.