Video Streaming & Frame Extraction

All video streaming endpoints support the following common query parameters:

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| mode | string | No | normal | normal or debug (draws detection overlays) |
| audio | string | No | on | on or off |
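
As a rough client-side illustration, the two common parameters could be validated and appended to a streaming path as follows. The helper name `stream_url` and the base URL are hypothetical; only the `mode` and `audio` parameters and their allowed values come from the table above.

```python
from urllib.parse import urlencode

# Allowed values, taken from the common-parameter table.
ALLOWED = {"mode": {"normal", "debug"}, "audio": {"on", "off"}}

def stream_url(base, path, mode="normal", audio="on"):
    """Build a streaming URL. Hypothetical helper, not part of the API."""
    params = {"mode": mode, "audio": audio}
    for key, value in params.items():
        if value not in ALLOWED[key]:
            raise ValueError(f"{key} must be one of {sorted(ALLOWED[key])}")
    return f"{base.rstrip('/')}{path}?{urlencode(params)}"

print(stream_url("https://api.example.com", "/api/v1/file/abc/video",
                 mode="debug", audio="off"))
```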

GET /api/v1/file/:file_uuid/video

Stream the full video file with range support for seeking.

Auth: Required Scope: file-level
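
Because this endpoint honours HTTP Range requests, a player can seek by asking for a byte range. The sketch below only prepares such a request without sending it; the helper name `range_request` and the example host are assumptions, while the `X-API-Key` header follows the curl examples elsewhere on this page.

```python
from urllib.request import Request

def range_request(url, api_key, start, end=None):
    """Prepare (but do not send) a GET request for a byte range."""
    if start < 0 or (end is not None and end < start):
        raise ValueError("invalid byte range")
    # An open-ended range ("bytes=N-") asks for everything from offset N.
    byte_range = f"bytes={start}-{'' if end is None else end}"
    return Request(url, headers={"X-API-Key": api_key, "Range": byte_range})

req = range_request("https://api.example.com/api/v1/file/abc/video", "KEY", 1_000_000)
print(req.get_header("Range"))
```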

Response


GET /api/v1/file/:file_uuid/trace/:trace_id/video

Stream video with highlights for a specific face trace (follows a single person across frames with bounding box overlay).

Auth: Required Scope: file-level


GET /api/v1/file/:file_uuid/trace/:trace_id/representative-face

Find the best single face to represent this trace. Uses a two-stage selection: SQL (area × confidence → top 10) then FFmpeg blurdetect (sharpness → pick the least blurry).

Auth: Required Scope: file-level
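
The two-stage selection can be pictured with the minimal sketch below. It is not the server's implementation: the `Face` type and the `blur_of` callback (standing in for a per-face FFmpeg blurdetect measurement) are hypothetical, and the first stage runs in Python here rather than in SQL.

```python
from dataclasses import dataclass

@dataclass
class Face:
    frame_number: int
    width: int
    height: int
    confidence: float

def pick_representative(faces, blur_of, top_n=10):
    """Stage 1: keep the top_n faces by area x confidence.
    Stage 2: among those, return the face with the lowest blur score."""
    if not faces:
        return None
    ranked = sorted(faces, key=lambda f: f.width * f.height * f.confidence,
                    reverse=True)
    return min(ranked[:top_n], key=blur_of)

faces = [Face(10, 100, 100, 0.9), Face(20, 120, 120, 0.8), Face(30, 40, 40, 0.99)]
blur = {10: 4.0, 20: 9.5, 30: 1.0}  # made-up blur scores, lower = sharper
best = pick_representative(faces, lambda f: blur[f.frame_number], top_n=2)
print(best.frame_number)  # 10
```

In this toy data the sharpest face (frame 30) is never considered because it falls outside the top two by area × confidence, which is the trade-off a pre-selection stage makes.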

Example

curl -s "$API/api/v1/file/$FILE_UUID/trace/1939/representative-face" \
  -H "X-API-Key: $KEY"

Response (200)

{
  "success": true,
  "file_uuid": "aeed71342a899fe4b4c57b7d41bcb692",
  "trace_id": 1939,
  "face_count": 538,
  "representative": {
    "frame_number": 68193,
    "timestamp_secs": 2727.72,
    "bbox": { "x": 347, "y": 378, "width": 427, "height": 427 },
    "confidence": 0.760,
    "quality_score": 138516,
    "blur_score": 9.46
  }
}

Response Fields

| Field | Type | Description |
|-------|------|-------------|
| trace_id | integer | Face trace ID |
| face_count | integer | Total face detections in this trace |
| representative.frame_number | integer | Frame number of the selected face (primary coordinate) |
| representative.timestamp_secs | float | Time in seconds (derived from frame_number / fps) |
| representative.bbox | object | Bounding box {x, y, width, height} |
| representative.confidence | float | Detection confidence (0.0–1.0) |
| representative.quality_score | float | Pre-selection score (area × confidence) |
| representative.blur_score | float | FFmpeg blurdetect result (lower = sharper) |

Error Responses


GET /api/v1/file/:file_uuid/trace/:trace_id/thumbnail

Extract the best face image for a trace as JPEG (320×320). Internally selects the face using the same two-stage algorithm as representative-face, then crops via FFmpeg. The result is cacheable for 24 hours.

Auth: Required Scope: file-level

Example

curl -s "$API/api/v1/file/$FILE_UUID/trace/1939/thumbnail" \
  -H "X-API-Key: $KEY" -o trace_1939_face.jpg

Response


GET /api/v1/file/:file_uuid/identities/:identity_uuid_a/co-occur-with/:identity_uuid_b

Find the first frame where two identities appear together, with representative face thumbnails for both.

Auth: Required Scope: file-level

Example

# First frame where Audrey Hepburn and Cary Grant appear together
curl -s "$API/api/v1/file/$FILE_UUID/identities/$AUDREY_UUID/co-occur-with/$CARY_UUID" \
  -H "X-API-Key: $KEY" | jq '{identity_a: .identity_a.name, identity_b: .identity_b.name, first_frame: .first_cooccurrence.frame_number}'

Response (200)

{
  "success": true,
  "file_uuid": "aeed71342a899fe4b4c57b7d41bcb692",
  "identity_a": {
    "identity_uuid": "c3545906-c82d-4b66-aa1d-150bc02decce",
    "name": "Audrey Hepburn",
    "trace_id": 920
  },
  "identity_b": {
    "identity_uuid": "2b0ddefe-e2a9-4533-9308-b375594604d5",
    "name": "Cary Grant",
    "trace_id": 919
  },
  "first_cooccurrence": {
    "frame_number": 38165,
    "timestamp_secs": 1526.60,
    "total_cooccurrence_frames": 3136,
    "representative_face_a": {
      "frame_number": 38199,
      "bbox": { "x": 122, "y": 339, "width": 176, "height": 176 },
      "confidence": 0.832,
      "thumbnail_url": "/api/v1/file/aeed71342.../trace/920/thumbnail"
    },
    "representative_face_b": {
      "frame_number": 38291,
      "bbox": { "x": 511, "y": 315, "width": 192, "height": 192 },
      "confidence": 0.791,
      "thumbnail_url": "/api/v1/file/aeed71342.../trace/919/thumbnail"
    }
  }
}

Response Fields

| Field | Type | Description |
|-------|------|-------------|
| identity_a.name | string | First identity name |
| identity_b.name | string | Second identity name |
| first_cooccurrence.frame_number | integer | First frame where both appear |
| first_cooccurrence.timestamp_secs | float | Time in seconds |
| first_cooccurrence.total_cooccurrence_frames | integer | Total frames with both present |
| first_cooccurrence.representative_face_a/b | object | Best face thumbnail data for each identity |

Error Responses

| HTTP | When |
|------|------|
| 404 | File or identity not found |
| 404 | The two identities never co-occur in this file |
| 500 | Database or FFmpeg error |
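
Conceptually, the lookup amounts to intersecting the frame numbers on which each identity was detected. The sketch below assumes that "co-occur" means both identities have a detection on the same frame number; the function name and input shape are illustrative only.

```python
def first_cooccurrence(frames_a, frames_b):
    """Return (first shared frame, number of shared frames), or None."""
    shared = set(frames_a) & set(frames_b)
    if not shared:
        return None  # would correspond to the 404 "never co-occur" case
    return min(shared), len(shared)

print(first_cooccurrence([5, 6, 7, 9], [1, 6, 7, 8, 9]))  # (6, 3)
```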

GET /api/v1/file/:file_uuid/video/bbox

Stream video with bounding box overlay for all detected objects/faces.

Auth: Required Scope: file-level

Uses a built-in 5×7 bitmap font renderer to draw labels directly on video frames via FFmpeg drawtext filter.


GET /api/v1/file/:file_uuid/thumbnail

Extract a single frame from a video as a JPEG image. Uses the FFmpeg select filter.

When frame is omitted, the system automatically selects the best representative frame using the TKG bridge (see algorithm below).

Auth: Required Scope: file-level

Query Parameters

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| frame | integer | No | auto-detect | Zero-based frame number to extract. Omit for auto-detect. |
| x | integer | No | | Crop start X (left edge). Requires y, w, h. |
| y | integer | No | | Crop start Y (top edge). Requires x, w, h. |
| w | integer | No | | Crop width in pixels. Requires x, y, h. |
| h | integer | No | | Crop height in pixels. Requires x, y, w. |

All four crop params (x, y, w, h) must be provided together or omitted.
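
A client can enforce the all-or-nothing crop rule before sending the request. The following is a minimal sketch; `thumbnail_query` is a hypothetical helper, and only the parameter names come from the table above.

```python
from urllib.parse import urlencode

def thumbnail_query(frame=None, crop=None):
    """Build the query string. crop is an (x, y, w, h) tuple or None."""
    params = {}
    if frame is not None:
        if frame < 0:
            raise ValueError("frame is zero-based and cannot be negative")
        params["frame"] = frame
    if crop is not None:
        # Reject partial crops: all four values must be present.
        if len(crop) != 4 or any(v is None for v in crop):
            raise ValueError("x, y, w, h must be provided together")
        params.update(zip(("x", "y", "w", "h"), crop))
    return urlencode(params)

print(thumbnail_query(1000, (320, 240, 160, 160)))
# frame=1000&x=320&y=240&w=160&h=160
```

An empty result (no `frame`, no crop) corresponds to the auto-detect request.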

Auto-detect Algorithm

When frame is not provided, the endpoint finds the best frame using this fallback chain:

  1. Main characters: find the two identities with the most face detections (TMDb source)
  2. Mutual gaze: if their face traces have a TKG CO_OCCURS_WITH edge with mutual_gaze=true, take first_frame
  3. Co-occurrence: fallback to the first frame where both identities appear together
  4. Single identity: if only one main identity exists, take its highest-quality face frame
  5. Any identity: fallback to the best-quality face frame across all identities
  6. Error: if no face exists, returns 404

The selected frame is constrained to the first half of the video (total_frames / 2).
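
The fallback chain above can be sketched as an ordered list of finder functions, each returning a frame number or `None`. This is an illustration under assumptions, not the server's code: the strategy names are invented, each finder is a stand-in for a database or TKG query, and the first-half constraint is modelled as a strict `frame < total_frames / 2` check applied to every step.

```python
def pick_frame(strategies, total_frames):
    """Try each (name, finder) in order and return the first usable frame."""
    limit = total_frames / 2  # assumed reading of the first-half constraint
    for name, finder in strategies:
        frame = finder()
        if frame is not None and frame < limit:
            return name, frame
    return None  # the endpoint would answer 404 in this case

strategies = [
    ("mutual_gaze", lambda: None),     # no edge with mutual_gaze=true
    ("co_occurrence", lambda: 38165),  # first frame with both main identities
    ("any_identity", lambda: 120),     # best face across all identities
]
print(pick_frame(strategies, total_frames=170000))  # ('co_occurrence', 38165)
```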

Examples

# Auto-detect best representative frame
curl -s "$API/api/v1/file/$FILE_UUID/thumbnail" \
  -H "X-API-Key: $KEY" -o representative.jpg

# Extract frame 1000 (full frame)
curl -s "$API/api/v1/file/bd80fec92b0b6963d177a2c55bf713e2/thumbnail?frame=1000" \
  -H "Authorization: Bearer $JWT" -o frame_1000.jpg

# Extract and crop face region (x=320, y=240, w=160, h=160)
curl -s "$API/api/v1/file/bd80fec92b0b6963d177a2c55bf713e2/thumbnail?frame=1000&x=320&y=240&w=160&h=160" \
  -H "Authorization: Bearer $JWT" -o face_crop.jpg

Response

Technical Details

| Detail | Value |
|--------|-------|
| Backend | FFmpeg (ffmpeg-full) |
| Filter | select=eq(n\,FRAME) to select frame, optional crop=W:H:X:Y |
| Output | Single JPEG via pipe (image2pipe, mjpeg codec) |
| Cache | Cache-Control: public, max-age=86400 (24h) |
| Frame number | Zero-based (frame=0 = first frame of video) |
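
To make the filter row concrete, here is one way an equivalent FFmpeg argument list could be assembled. The exact command the server runs is not documented here, so treat this as a sketch: `thumbnail_args` and the input path are hypothetical, while the flags themselves are standard FFmpeg options.

```python
def thumbnail_args(src, frame, crop=None):
    """Assemble FFmpeg arguments that write one JPEG frame to stdout."""
    # The comma inside eq() is escaped so the filtergraph parser does not
    # treat it as a separator between filters.
    filters = [f"select=eq(n\\,{frame})"]
    if crop is not None:
        x, y, w, h = crop
        filters.append(f"crop={w}:{h}:{x}:{y}")  # FFmpeg order is W:H:X:Y
    return ["ffmpeg", "-i", src, "-vf", ",".join(filters),
            "-frames:v", "1", "-f", "image2pipe", "-c:v", "mjpeg", "pipe:1"]

print(thumbnail_args("in.mp4", 1000, (320, 240, 160, 160))[4])
# select=eq(n\,1000),crop=160:160:320:240
```

Note that the API takes crop values as x, y, w, h, whereas the FFmpeg crop filter expects width and height first.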

GET /api/v1/file/:file_uuid/representative-frame

Return JSON metadata about the best representative frame for the video. Uses the same auto-detect algorithm as GET /thumbnail (without crop support).

Auth: Required Scope: file-level

Example

curl -s "$API/api/v1/file/$FILE_UUID/representative-frame" \
  -H "X-API-Key: $KEY" | jq '.'

Response (200)

{
  "success": true,
  "file_uuid": "aeed71342a899fe4b4c57b7d41bcb692",
  "frame_number": 38165,
  "timestamp_secs": 1526.6,
  "face_quality": 37292.97,
  "main_identities": [
    {
      "identity_uuid": "c3545906-c82d-4b66-aa1d-150bc02decce",
      "name": "Audrey Hepburn",
      "face_count": 16456
    },
    {
      "identity_uuid": "2b0ddefe-e2a9-4533-9308-b375594604d5",
      "name": "Cary Grant",
      "face_count": 10643
    }
  ],
  "traces": [
    {
      "trace_id": 919,
      "identity_uuid": "2b0ddefe-e2a9-4533-9308-b375594604d5",
      "name": "Cary Grant",
      "x": 764,
      "y": 237,
      "width": 199,
      "height": 199,
      "confidence": 0.8426
    },
    {
      "trace_id": 920,
      "identity_uuid": "c3545906-c82d-4b66-aa1d-150bc02decce",
      "name": "Audrey Hepburn",
      "x": 1143,
      "y": 312,
      "width": 215,
      "height": 215,
      "confidence": 0.8068
    }
  ]
}

Response Fields

| Field | Type | Description |
|-------|------|-------------|
| frame_number | integer | Selected representative frame number (primary coordinate) |
| timestamp_secs | float | Time in seconds (derived from frame_number / fps) |
| face_quality | float | Quality score (area × confidence) of the best face at this frame |
| main_identities | array | Top 2 most frequent TMDb identities in the file |
| main_identities[].name | string | Identity display name |
| main_identities[].face_count | integer | Total face detection count |
| traces | array | All face traces present at the selected frame |
| traces[].trace_id | integer | Face trace ID |
| traces[].identity_uuid | string or null | Matched identity UUID |
| traces[].name | string or null | Identity name |
| traces[].x, y, width, height | integer | Bounding box coordinates |
| traces[].confidence | float | Detection confidence (0.0–1.0) |
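
A client that wants to know which trace drives `face_quality` can recompute area × confidence from the `traces` array. The sketch below uses the two traces from the sample response; `best_trace` is a hypothetical helper. The recomputed score for trace 920 (about 37294) is close to, but not identical with, the sample `face_quality`, presumably because the response rounds `confidence`.

```python
def best_trace(traces):
    """Return the trace with the largest area x confidence score."""
    return max(traces, key=lambda t: t["width"] * t["height"] * t["confidence"])

traces = [
    {"trace_id": 919, "width": 199, "height": 199, "confidence": 0.8426},
    {"trace_id": 920, "width": 215, "height": 215, "confidence": 0.8068},
]
print(best_trace(traces)["trace_id"])  # 920
```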

Error Responses

| HTTP | When |
|------|------|
| 404 | File not found / No faces in file |
| 500 | Database error |

GET /api/v1/file/:file_uuid/clip

Extract a video clip (frame or time range) as an MPEG-TS stream. Uses FFmpeg -ss fast seek.

Auth: Required Scope: file-level

Query Parameters

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| start_frame | integer | No* | | Start frame (zero-based). Frame-accurate; use this for precision. |
| end_frame | integer | No* | | End frame (zero-based, inclusive). Requires start_frame. |
| start_time | float | No* | | Start time in seconds. Approximate (FPS-dependent). Fallback if frames not given. |
| end_time | float | No* | | End time in seconds. Approximate (FPS-dependent). Fallback if frames not given. |
| fps | float | No | video FPS | Override frames-per-second for frame↔time calculation. Defaults to video's detected FPS. |
| mode | string | No | normal | normal or debug (draws "CLIP" overlay) |
| audio | string | No | on | on or off |

Either (start_frame+end_frame) OR (start_time+end_time) must be provided.
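
The either/or rule can be checked on the client before the request is sent. In this sketch the helper name `clip_query` and the ordering checks (`start <= end`) are assumptions; the parameter names and the preference for the frame pair over the time pair follow the table above.

```python
from urllib.parse import urlencode

def clip_query(start_frame=None, end_frame=None, start_time=None, end_time=None):
    """Prefer the frame pair, fall back to the time pair, else reject."""
    if start_frame is not None and end_frame is not None:
        if not 0 <= start_frame <= end_frame:
            raise ValueError("need 0 <= start_frame <= end_frame")
        return urlencode({"start_frame": start_frame, "end_frame": end_frame})
    if start_time is not None and end_time is not None:
        if not 0 <= start_time < end_time:
            raise ValueError("need 0 <= start_time < end_time")
        return urlencode({"start_time": start_time, "end_time": end_time})
    raise ValueError("provide start_frame+end_frame or start_time+end_time")

print(clip_query(start_frame=0, end_frame=47))  # start_frame=0&end_frame=47
print(clip_query(start_time=30, end_time=45))   # start_time=30&end_time=45
```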

Example

# Clip by frame range (primary)
curl -s "$API/api/v1/file/bd80fec92b0b6963d177a2c55bf713e2/clip?start_frame=0&end_frame=47" \
  -H "Authorization: Bearer $JWT" -o clip.ts

# Clip by time range (fallback)
curl -s "$API/api/v1/file/bd80fec92b0b6963d177a2c55bf713e2/clip?start_time=30&end_time=45" \
  -H "Authorization: Bearer $JWT" -o clip.ts

Response

Technical Notes

| Detail | Value |
|--------|-------|
| Backend | FFmpeg (ffmpeg-full) |
| Seek | -ss before -i (fast keyframe seek) |
| Format | MPEG-TS (mpegts muxer, pipe-safe) |
| Codec | H.264 + AAC |
| Cache | Cache-Control: public, max-age=86400 (24h) |
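
Putting these notes together, a frame-range request could translate into an FFmpeg invocation along the following lines. This is a sketch under assumptions rather than the server's actual command: `clip_args` is hypothetical, and `-frames:v` is used as the current spelling of the older `-vframes` alias.

```python
def clip_args(src, start_frame, end_frame, fps):
    """Assemble FFmpeg arguments for an MPEG-TS clip written to stdout."""
    start_secs = start_frame / fps
    frame_count = end_frame - start_frame + 1  # end_frame is inclusive
    return ["ffmpeg",
            "-ss", f"{start_secs:.3f}",  # placed before -i for a fast seek
            "-i", src,
            "-frames:v", str(frame_count),
            "-c:v", "libx264", "-c:a", "aac",
            "-f", "mpegts", "pipe:1"]

args = clip_args("in.mp4", 720, 1079, 24.0)
print(args[1:3], args[5:7])  # ['-ss', '30.000'] ['-frames:v', '360']
```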

Video vs Clip: Quality & Format Comparison

Both endpoints support time range extraction, but serve different use cases:

| Feature | /video | /clip |
|---------|--------|-------|
| No params | Streams full file (Range seek) | Returns 400 (params required) |
| HTTP Range | ✅ Supported | ❌ Not supported |
| Encoding | -c copy (zero encoding) | -c:v libx264 -c:a aac (re-encode) |
| Quality | Original (bit-exact, zero loss) | Compressed (default CRF ≈ 23) |
| Format | video/mp4 | video/mp2t (MPEG-TS) |
| Speed | Fast (no computation) | Slower (encoding required) |
| Frame control | Time-based (dur = (ef-sf)/fps) | Precise (-vframes) |
| Debug mode | | mode=debug overlay |
| Cache | | max-age=86400 |
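
The difference in frame control is easiest to see with numbers. The sketch below applies the duration formula quoted for /video and compares it with an inclusive frame count as used by /clip; both function names are illustrative, and it assumes the same start and end frames are passed to each endpoint.

```python
def video_duration(start_frame, end_frame, fps):
    """Duration formula quoted for /video: (ef - sf) / fps."""
    return (end_frame - start_frame) / fps

def clip_frame_count(start_frame, end_frame):
    """Frame count for /clip, with end_frame treated as inclusive."""
    return end_frame - start_frame + 1

print(round(video_duration(0, 47, 24.0), 4))  # 1.9583
print(clip_frame_count(0, 47) / 24.0)         # 2.0
```

Under these assumptions the time-based duration is one frame interval shorter than the span covered by the inclusive frame count.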

Usage Recommendation

| Scenario | Use |
|----------|-----|
| Full video streaming / player seek | /video |
| Quick preview clip (zero quality loss) | /video?start_frame=...&end_frame=... |
| Debug frame verification / text overlay | /clip?mode=debug |
| Precise frame count control | /clip |
| CDN cacheable clip | /clip |

GET /api/v1/file/:file_uuid/stranger/:stranger_id/representative-face

Auth: Required Scope: file-level

Get the representative face for a stranger (unidentified face trace).

Example

curl -s "$API/api/v1/file/$FILE_UUID/stranger/1/representative-face" \
  -H "X-API-Key: $KEY"

Response (200)

{
  "success": true,
  "file_uuid": "d3f9ae8e471a1fc4d47022c66091b920",
  "stranger_id": 1,
  "face_count": 85,
  "representative": {
    "frame_number": 5000,
    "timestamp_secs": 208.33,
    "bbox": {"x": 200, "y": 100, "width": 150, "height": 150},
    "confidence": 0.92,
    "quality_score": 20700,
    "blur_score": 8.5
  }
}

GET /api/v1/file/:file_uuid/stranger/:stranger_id/thumbnail

Auth: Required Scope: file-level

Extract the best face image for a stranger as JPEG (320×320).

Example

curl -s "$API/api/v1/file/$FILE_UUID/stranger/1/thumbnail" \
  -H "X-API-Key: $KEY" -o stranger_1_face.jpg

Response


GET /api/v1/file/:file_uuid/chunk/:chunk_id/thumbnail

Auth: Required Scope: file-level

Get the thumbnail for a specific chunk. Extracts the representative frame for the chunk's time range.

Example

curl -s "$API/api/v1/file/$FILE_UUID/chunk/chunk_1/thumbnail" \
  -H "X-API-Key: $KEY" -o chunk_1.jpg

Response


GET /api/v1/media-proxy

Auth: Required Scope: system-level

Proxies a request to fetch media from an external URL. Useful for loading profile images or thumbnails from external services (TMDb, etc.) without exposing the external URL to the client.

Query Parameters

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| url | string | Yes | External URL to proxy |
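
Since the external URL travels inside a query string, percent-encoding it avoids ambiguity when it carries its own query parameters. The sketch below is a client-side illustration only: `media_proxy_url` and the API host are hypothetical, and whether the server requires encoding is an assumption.

```python
from urllib.parse import urlencode, urlsplit

def media_proxy_url(base, external_url):
    """Build a media-proxy URL with the external URL percent-encoded."""
    parts = urlsplit(external_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("url must be an absolute http(s) URL")
    query = urlencode({"url": external_url})
    return f"{base.rstrip('/')}/api/v1/media-proxy?{query}"

print(media_proxy_url("https://api.example.com",
                      "https://image.tmdb.org/t/p/w500/abc123.jpg"))
```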

Example

curl -s "$API/api/v1/media-proxy?url=https://image.tmdb.org/t/p/w500/abc123.jpg" \
  -H "X-API-Key: $KEY" -o tmdb_profile.jpg

Response



Updated: 2026-06-20 — Added stranger endpoints, chunk thumbnail, and media proxy