Video Streaming & Frame Extraction

All video streaming endpoints support the following common query parameters:

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| mode | string | No | normal | normal or debug (draws detection overlays) |
| audio | string | No | on | on or off |

GET /api/v1/file/:file_uuid/video

Stream the full video file with range support for seeking.

Auth: Required Scope: file-level

Response

  • 200: Video stream (Content-Type based on file extension)
  • 206: Partial content (range request)
  • Supports Range header for seeking
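As a minimal sketch of the seeking behaviour listed above, a client can ask for part of the file with a standard HTTP Range header and should expect a 206 response. The helper name `range_header` is hypothetical and only illustrates the header format; it is not part of this API.

```python
from typing import Dict, Optional


def range_header(start: int, end: Optional[int] = None) -> Dict[str, str]:
    """Build a standard HTTP Range header for bytes start..end (inclusive).

    Leaving `end` unset asks for everything from `start` to the end of the file.
    """
    if start < 0 or (end is not None and end < start):
        raise ValueError("invalid byte range")
    last = "" if end is None else str(end)
    return {"Range": f"bytes={start}-{last}"}


# Headers for the first KiB of the file, and for "from byte 1024 onward".
first_kib = range_header(0, 1023)
rest = range_header(1024)
```

The resulting dictionary can be merged with the authentication header of whichever HTTP client is in use.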

GET /api/v1/file/:file_uuid/trace/:trace_id/video

Stream video with highlights for a specific face trace (follows a single person across frames with bounding box overlay).

Auth: Required Scope: file-level


GET /api/v1/file/:file_uuid/trace/:trace_id/representative-face

Find the single best face to represent this trace. Selection runs in two stages: a SQL query ranks detections by area × confidence and keeps the top 10, then FFmpeg blurdetect scores those candidates for sharpness and the least blurry one is returned.

Auth: Required Scope: file-level
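The two-stage selection described above can be sketched as follows. This is an illustration of the idea, not the service's actual implementation: the `Face` type, `pick_representative`, and the `blur_of` callable (standing in for an FFmpeg blurdetect measurement) are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Face:
    frame_number: int
    width: int
    height: int
    confidence: float


def quality_score(face: Face) -> float:
    """Stage 1 score: bounding-box area multiplied by detection confidence."""
    return face.width * face.height * face.confidence


def pick_representative(
    faces: Sequence[Face],
    blur_of: Callable[[Face], float],
    top_n: int = 10,
) -> Face:
    """Shortlist the top_n faces by quality, then return the least blurry one."""
    if not faces:
        raise ValueError("trace has no face detections")
    shortlist = sorted(faces, key=quality_score, reverse=True)[:top_n]
    # Lower blur score means sharper, so take the minimum.
    return min(shortlist, key=blur_of)
```

Keeping the expensive sharpness measurement to a small shortlist is the point of the design: the cheap score prunes hundreds of detections before any frame has to be decoded.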

Example

curl -s "$API/api/v1/file/$FILE_UUID/trace/1939/representative-face" \
  -H "X-API-Key: $KEY"

Response (200)

{
  "success": true,
  "file_uuid": "aeed71342a899fe4b4c57b7d41bcb692",
  "trace_id": 1939,
  "face_count": 538,
  "representative": {
    "frame_number": 68193,
    "timestamp_secs": 2727.72,
    "bbox": { "x": 347, "y": 378, "width": 427, "height": 427 },
    "confidence": 0.760,
    "quality_score": 138516,
    "blur_score": 9.46
  }
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| trace_id | integer | Face trace ID |
| face_count | integer | Total face detections in this trace |
| representative.frame_number | integer | Frame number of the selected face (primary coordinate) |
| representative.timestamp_secs | float | Time in seconds (derived from frame_number / fps) |
| representative.bbox | object | Bounding box {x, y, width, height} |
| representative.confidence | float | Detection confidence (0.0 to 1.0) |
| representative.quality_score | float | Pre-selection score (area × confidence) |
| representative.blur_score | float | FFmpeg blurdetect result (lower = sharper) |
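Because frame_number is the primary coordinate and timestamp_secs is derived from it, a client can convert between the two itself. The sketch below assumes a constant frame rate; the sample response is consistent with 25 fps (68193 / 25 = 2727.72), but that value is inferred here and is not stated by the API. Both function names are hypothetical.

```python
def frame_to_seconds(frame_number: int, fps: float) -> float:
    """Derive a timestamp from a zero-based frame number."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_number / fps


def seconds_to_frame(timestamp_secs: float, fps: float) -> int:
    """Map a timestamp back to the nearest frame number."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    # round() rather than int() avoids landing one frame early on float error.
    return round(timestamp_secs * fps)


# Values from the sample response, with fps=25.0 as an assumption.
secs = frame_to_seconds(68193, 25.0)
frame = seconds_to_frame(2727.72, 25.0)
```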

Error Responses


GET /api/v1/file/:file_uuid/trace/:trace_id/thumbnail

Extract the best face image for a trace as JPEG (320×320). Internally selects the face using the same two-stage algorithm as representative-face, then crops via FFmpeg. The result is cacheable for 24 hours.

Auth: Required Scope: file-level

Example

curl -s "$API/api/v1/file/$FILE_UUID/trace/1939/thumbnail" \
  -H "X-API-Key: $KEY" -o trace_1939_face.jpg

Response

  • 200: image/jpeg binary data (320×320 cropped face)
  • 404: File or trace not found, or no suitable face
  • 500: FFmpeg or database error

GET /api/v1/file/:file_uuid/identities/:identity_uuid_a/co-occur-with/:identity_uuid_b

Find the first frame where two identities appear together, with representative face thumbnails for both.

Auth: Required Scope: file-level

Example

# Audrey Hepburn & Cary Grant: first time on screen together
curl -s "$API/api/v1/file/$FILE_UUID/identities/$AUDREY_UUID/co-occur-with/$CARY_UUID" \
  -H "X-API-Key: $KEY" | jq '{identity_a: .identity_a.name, identity_b: .identity_b.name, first_frame: .first_cooccurrence.frame_number}'

Response (200)

{
  "success": true,
  "file_uuid": "aeed71342a899fe4b4c57b7d41bcb692",
  "identity_a": {
    "identity_uuid": "c3545906-c82d-4b66-aa1d-150bc02decce",
    "name": "Audrey Hepburn",
    "trace_id": 920
  },
  "identity_b": {
    "identity_uuid": "2b0ddefe-e2a9-4533-9308-b375594604d5",
    "name": "Cary Grant",
    "trace_id": 919
  },
  "first_cooccurrence": {
    "frame_number": 38165,
    "timestamp_secs": 1526.60,
    "total_cooccurrence_frames": 3136,
    "representative_face_a": {
      "frame_number": 38199,
      "bbox": { "x": 122, "y": 339, "width": 176, "height": 176 },
      "confidence": 0.832,
      "thumbnail_url": "/api/v1/file/aeed71342.../trace/920/thumbnail"
    },
    "representative_face_b": {
      "frame_number": 38291,
      "bbox": { "x": 511, "y": 315, "width": 192, "height": 192 },
      "confidence": 0.791,
      "thumbnail_url": "/api/v1/file/aeed71342.../trace/919/thumbnail"
    }
  }
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| identity_a.name | string | First identity name |
| identity_b.name | string | Second identity name |
| first_cooccurrence.frame_number | integer | First frame where both appear |
| first_cooccurrence.timestamp_secs | float | Time in seconds |
| first_cooccurrence.total_cooccurrence_frames | integer | Total frames with both present |
| first_cooccurrence.representative_face_a/b | object | Best face thumbnail data for each identity |

Error Responses

| HTTP | When |
| --- | --- |
| 404 | File or identity not found |
| 404 | The two identities never co-occur in this file |
| 500 | Database or FFmpeg error |
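Conceptually, the first co-occurrence is the smallest frame number present in both identities' detections, and total_cooccurrence_frames is the size of that overlap. The sketch below shows this with plain sets; the function name and the idea of passing frame lists directly are assumptions for illustration, since the service computes this in its database.

```python
from typing import Iterable, Optional, Tuple


def first_cooccurrence(
    frames_a: Iterable[int], frames_b: Iterable[int]
) -> Optional[Tuple[int, int]]:
    """Return (first shared frame, number of shared frames), or None.

    None corresponds to the 404 case where the identities never co-occur.
    """
    shared = set(frames_a) & set(frames_b)
    if not shared:
        return None
    return min(shared), len(shared)
```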

GET /api/v1/file/:file_uuid/video/bbox

Stream video with bounding box overlay for all detected objects/faces.

Auth: Required Scope: file-level

Uses a built-in 5×7 bitmap font renderer to draw labels directly on video frames via FFmpeg drawtext filter.


GET /api/v1/file/:file_uuid/thumbnail

Extract a single frame from a video as a JPEG image, using the FFmpeg select filter.

Auth: Required Scope: file-level

Query Parameters

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| frame | integer | Yes | | Zero-based frame number to extract |
| x | integer | No | | Crop start X (left edge). Requires y, w, h. |
| y | integer | No | | Crop start Y (top edge). Requires x, w, h. |
| w | integer | No | | Crop width in pixels. Requires x, y, h. |
| h | integer | No | | Crop height in pixels. Requires x, y, w. |

All four crop params (x, y, w, h) must be provided together or omitted.
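One way to respect the all-or-nothing crop rule on the client side is to pass the crop as a single tuple, so a partial crop cannot be expressed at all. A minimal sketch, where `thumbnail_query` is a hypothetical helper and not part of the API:

```python
from typing import Optional, Tuple
from urllib.parse import urlencode


def thumbnail_query(
    frame: int, crop: Optional[Tuple[int, int, int, int]] = None
) -> str:
    """Build the query string for the thumbnail endpoint.

    `crop` is (x, y, w, h); omit it entirely for the full frame.
    """
    if frame < 0:
        raise ValueError("frame is zero-based and cannot be negative")
    params = {"frame": frame}
    if crop is not None:
        x, y, w, h = crop
        if w <= 0 or h <= 0:
            raise ValueError("crop width and height must be positive")
        params.update({"x": x, "y": y, "w": w, "h": h})
    return urlencode(params)


# Query strings matching the two curl examples in this section.
full_frame = thumbnail_query(1000)
face_crop = thumbnail_query(1000, (320, 240, 160, 160))
```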

Example

# Extract frame 1000 (full frame)
curl -s "$API/api/v1/file/bd80fec92b0b6963d177a2c55bf713e2/thumbnail?frame=1000" \
  -H "Authorization: Bearer $JWT" -o frame_1000.jpg

# Extract and crop face region (x=320, y=240, w=160, h=160)
curl -s "$API/api/v1/file/bd80fec92b0b6963d177a2c55bf713e2/thumbnail?frame=1000&x=320&y=240&w=160&h=160" \
  -H "Authorization: Bearer $JWT" -o face_crop.jpg

Response

  • 200: image/jpeg binary data
  • 404: File not found
  • 500: FFmpeg error (e.g., frame number exceeds video duration)

GET /api/v1/file/:file_uuid/clip

Extract a video clip (time range) as an MPEG-TS stream. Uses FFmpeg -ss fast seek.

Auth: Required Scope: file-level

Query Parameters

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| start_frame | integer | No* | | Start frame (zero-based). Frame-accurate; use this for precision. |
| end_frame | integer | No* | | End frame (zero-based, inclusive). Requires start_frame. |
| start_time | float | No* | | Start time in seconds. Approximate (FPS-dependent). Fallback if frames not given. |
| end_time | float | No* | | End time in seconds. Approximate (FPS-dependent). Fallback if frames not given. |
| fps | float | No | video FPS | Override frames-per-second for frame↔time calculation. Defaults to video's detected FPS. |
| mode | string | No | normal | normal or debug (draws "CLIP" overlay) |
| audio | string | No | on | on or off |

No*: either (start_frame+end_frame) OR (start_time+end_time) must be provided.
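A client can check the range rule before sending the request and so avoid a 400 response. The sketch below prefers the frame pair when both pairs are given, mirroring the "primary" and "fallback" wording above; `clip_query` is a hypothetical helper, and the exact server-side validation may differ.

```python
from typing import Optional
from urllib.parse import urlencode


def clip_query(
    start_frame: Optional[int] = None,
    end_frame: Optional[int] = None,
    start_time: Optional[float] = None,
    end_time: Optional[float] = None,
) -> str:
    """Build the clip query string from a frame pair or a time pair."""
    if start_frame is not None and end_frame is not None:
        # end_frame is inclusive, so a single-frame clip is allowed.
        if not 0 <= start_frame <= end_frame:
            raise ValueError("need 0 <= start_frame <= end_frame")
        return urlencode({"start_frame": start_frame, "end_frame": end_frame})
    if start_time is not None and end_time is not None:
        if not 0 <= start_time < end_time:
            raise ValueError("need 0 <= start_time < end_time")
        return urlencode({"start_time": start_time, "end_time": end_time})
    raise ValueError("provide start_frame+end_frame or start_time+end_time")
```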

Example

# Clip by frame range (primary)
curl -s "$API/api/v1/file/bd80fec92b0b6963d177a2c55bf713e2/clip?start_frame=0&end_frame=47" \
  -H "Authorization: Bearer $JWT" -o clip.ts

# Clip by time range (fallback)
curl -s "$API/api/v1/file/bd80fec92b0b6963d177a2c55bf713e2/clip?start_time=30&end_time=45" \
  -H "Authorization: Bearer $JWT" -o clip.ts

Response

  • 200: video/mp2t MPEG-TS stream
  • 400: Missing/invalid range parameters
  • 404: File not found
  • 500: FFmpeg error

Technical Notes

| Detail | Value |
| --- | --- |
| Backend | FFmpeg (ffmpeg-full) |
| Seek | -ss before -i (fast keyframe seek) |
| Format | MPEG-TS (mpegts muxer, pipe-safe) |
| Codec | H.264 + AAC |
| Cache | Cache-Control: public, max-age=86400 (24h) |
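To make the notes above concrete, here is a rough sketch of an FFmpeg argument list with the same shape: -ss placed before -i, H.264 + AAC encoding, and MPEG-TS written to a pipe. It is an assumption about how such a command could look, not the service's actual invocation. The function name is hypothetical, and the frame count treats end_frame as inclusive; -t is added here so the audio stream ends with the video.

```python
from typing import List


def clip_command(src: str, start_frame: int, end_frame: int, fps: float) -> List[str]:
    """Assemble an ffmpeg argv that cuts frames start_frame..end_frame."""
    if fps <= 0 or not 0 <= start_frame <= end_frame:
        raise ValueError("invalid frame range or fps")
    frame_count = end_frame - start_frame + 1  # end_frame is inclusive
    start_secs = start_frame / fps
    duration = frame_count / fps
    return [
        "ffmpeg",
        "-ss", f"{start_secs:.3f}",   # input option: seek before decoding
        "-i", src,
        "-t", f"{duration:.3f}",      # bound every output stream, audio included
        "-vframes", str(frame_count),  # exact number of video frames
        "-c:v", "libx264",
        "-c:a", "aac",
        "-f", "mpegts",
        "pipe:1",                      # write the stream to stdout
    ]
```

Passing the list to `subprocess.run` without a shell avoids any quoting issues with the source path.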

Video vs Clip: Quality & Format Comparison

Both endpoints support time range extraction, but serve different use cases:

| Feature | /video | /clip |
| --- | --- | --- |
| No params | Streams full file (Range seek) | Returns 400 (params required) |
| HTTP Range | Supported | Not supported |
| Encoding | -c copy (zero encoding) | -c:v libx264 -c:a aac (re-encode) |
| Quality | Original (bit-exact, zero loss) | Compressed (default CRF ≈ 23) |
| Format | video/mp4 | video/mp2t (MPEG-TS) |
| Speed | Fast (no computation) | Slower (encoding required) |
| Frame control | Time-based (dur = (ef-sf)/fps) | Precise (-vframes) |
| Debug mode | | mode=debug overlay |
| Cache | | max-age=86400 |

Usage Recommendation

| Scenario | Use |
| --- | --- |
| Full video streaming / player seek | /video |
| Quick preview clip (zero quality loss) | /video?start_frame=...&end_frame=... |
| Debug frame verification / text overlay | /clip?mode=debug |
| Precise frame count control | /clip |
| CDN cacheable clip | /clip |

Technical Notes (/thumbnail frame extraction)

| Detail | Value |
| --- | --- |
| Backend | FFmpeg (ffmpeg-full) |
| Filter | select=eq(n\,FRAME) to select frame, optional crop=W:H:X:Y |
| Output | Single JPEG via pipe (image2pipe, mjpeg codec) |
| Cache | Cache-Control: public, max-age=86400 (24h) |
| Frame number | Zero-based (frame=0 = first frame of video) |
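The filter described in the table can be assembled as a string like this. Note the argument order: the endpoint takes x, y, w, h, while FFmpeg's crop filter expects W:H:X:Y. The helper name is hypothetical, and the sketch only builds the filtergraph; it does not claim to match the server's exact command.

```python
from typing import Optional, Tuple


def frame_filter(frame: int, crop: Optional[Tuple[int, int, int, int]] = None) -> str:
    """Build an FFmpeg -vf value that selects one frame and optionally crops it.

    `crop` is (x, y, w, h), in the same order as the query parameters.
    """
    if frame < 0:
        raise ValueError("frame is zero-based and cannot be negative")
    # The comma inside eq() is escaped so FFmpeg does not read it
    # as a separator between filters.
    vf = f"select=eq(n\\,{frame})"
    if crop is not None:
        x, y, w, h = crop
        vf += f",crop={w}:{h}:{x}:{y}"
    return vf


# Filter for the cropped example: frame 1000, x=320, y=240, w=160, h=160.
vf = frame_filter(1000, (320, 240, 160, 160))
```

A full command would pass this value after `-vf` and request a single output frame, for example with `-vframes 1 -f image2pipe -c:v mjpeg pipe:1`.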

Updated: 2026-05-19 12:49:24