feat: Phase 2.6 edges migration to Qdrant (TKG-only architecture)

Phase 2.6.1: co_occurrence_edges migration
- build_co_occurrence_edges_from_qdrant()
- Qdrant embeddings → frame grouping → YOLO objects
- Result: 6679 edges (vs 6701 PostgreSQL)

Phase 2.6.2: face_face_edges migration
- build_face_face_edges_from_qdrant()
- Qdrant embeddings → frame grouping → face pairs
- mutual_gaze detection preserved
- Result: 6 edges (exact match)

Phase 2.6.3: speaker_face_edges migration
- build_speaker_face_edges_from_qdrant()
- Qdrant embeddings → trace_id frame ranges
- SPEAKS_AS edge creation

Architecture:
- All edges use Qdrant payload (no face_detections queries)
- PostgreSQL fallback for empty Qdrant
- Estimated 3.6x performance improvement

Testing:
- Playground (3003): ✓ All Phase 2.6 logs verified
- Edge counts: ✓ Close match with PostgreSQL
- Fallback: ✓ Working

Docs:
- docs_v1.0/DESIGN/TKG_PHASE2_6_EDGES_MIGRATION.md
- docs_v1.0/M4_workspace/2026-06-21_phase2_6_test.md
Accusys
2026-06-21 04:47:49 +08:00
parent 0afc70fc5b
commit 2cfcfdd1af
2926 changed files with 8311058 additions and 1394 deletions


@@ -1,5 +1,5 @@
<!-- module: lookup -->
<!-- description: File lookup by name and unregistration -->
<!-- description: File listing, lookup by name, file detail, faces, identities, JSON download, unregistration -->
<!-- depends: 01_auth, 03_register -->
## File Lookup
@@ -60,6 +60,285 @@ curl -s "$API/api/v1/files/lookup?file_name=charade" \
---
---
## File Listing
### `GET /api/v1/files`
**Auth**: Required
**Scope**: system-level
List all registered files with pagination. Optionally filter by status or fetch a specific file by UUID.
#### Query Parameters
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `page` | integer | No | 1 | Page number |
| `page_size` | integer | No | 20 | Items per page |
| `status` | string | No | — | Filter by status: `registered`, `processing`, `completed`, `failed`, `indexed`, `checked_out` |
| `file_uuid` | string | No | — | Fetch a specific file (returns as single-item list) |
#### Example
```bash
# List all files (paginated)
curl -s "$API/api/v1/files?page=1&page_size=10" \
-H "X-API-Key: $KEY"
# Filter by status
curl -s "$API/api/v1/files?status=completed" \
-H "X-API-Key: $KEY"
# Fetch specific file
curl -s "$API/api/v1/files?file_uuid=$FILE_UUID" \
-H "X-API-Key: $KEY"
```
#### Response (200)
```json
{
"success": true,
"total": 42,
"page": 1,
"page_size": 10,
"data": [
{
"file_uuid": "d3f9ae8e471a1fc4d47022c66091b920",
"file_name": "video.mp4",
"file_path": "/path/to/video.mp4",
"status": "completed"
}
]
}
```
| Field | Type | Description |
|-------|------|-------------|
| `success` | boolean | Always true on 200 |
| `total` | integer | Total file count |
| `page` | integer | Current page |
| `page_size` | integer | Items per page |
| `data` | array | Array of file items |
| `data[].file_uuid` | string | 32-char hex UUID |
| `data[].file_name` | string | Registered file name |
| `data[].file_path` | string | Full filesystem path |
| `data[].status` | string | Processing status |
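
Walking every page of the listing from a client can be sketched as below. This is a minimal sketch, assuming `total` counts all matching files across pages, as the field table suggests. `iter_files`, `fetch_page`, and `fake_fetch` are hypothetical names used for illustration and are not part of the API.

```python
from typing import Callable, Dict, Iterator


def iter_files(fetch_page: Callable[[int, int], Dict], page_size: int = 20) -> Iterator[Dict]:
    """Yield every file item by walking pages until `total` items were seen.

    `fetch_page(page, page_size)` is a hypothetical callable that returns the
    parsed JSON body of GET /api/v1/files for the given page.
    """
    page, seen = 1, 0
    while True:
        body = fetch_page(page, page_size)
        items = body.get("data", [])
        yield from items
        seen += len(items)
        # Stop on an empty page (defensive) or once all items were returned.
        if not items or seen >= body.get("total", 0):
            break
        page += 1


# Offline stand-in for the HTTP call, for illustration only.
_FAKE = [{"file_uuid": f"{i:032x}", "status": "completed"} for i in range(5)]


def fake_fetch(page: int, page_size: int) -> Dict:
    start = (page - 1) * page_size
    return {"success": True, "total": len(_FAKE), "page": page,
            "page_size": page_size, "data": _FAKE[start:start + page_size]}
```

Injecting the fetch function keeps the loop independent of any particular HTTP library; in real use it would wrap the `curl` calls shown in the example above.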
---
### `GET /api/v1/file/:file_uuid`
**Auth**: Required
**Scope**: file-level
Get detailed info for a specific registered file including metadata, duration, FPS, and probe data.
#### Example
```bash
curl -s "$API/api/v1/file/$FILE_UUID" \
-H "X-API-Key: $KEY"
```
#### Response (200)
```json
{
"success": true,
"file_uuid": "d3f9ae8e471a1fc4d47022c66091b920",
"file_name": "video.mp4",
"file_path": "/path/to/video.mp4",
"status": "completed",
"duration": 120.5,
"fps": 24.0,
"metadata": {
"format": {"duration": "120.5", "size": "794863677"},
"streams": [{"codec_name": "h264", "width": 1920, "height": 1080}]
},
"created_at": "2026-05-16T12:00:00Z"
}
```
| Field | Type | Description |
|-------|------|-------------|
| `success` | boolean | Always true on 200 |
| `file_uuid` | string | 32-char hex UUID |
| `file_name` | string | Registered file name |
| `file_path` | string | Full filesystem path |
| `status` | string | Processing status |
| `duration` | float | Duration in seconds |
| `fps` | float | Frames per second |
| `metadata` | object | Full ffprobe metadata (probe.json) |
| `created_at` | string | Registration timestamp (ISO 8601) |
#### Error Codes
| HTTP | When |
|------|------|
| `404` | File UUID not found |
---
### `GET /api/v1/file/:file_uuid/identities`
**Auth**: Required
**Scope**: file-level
Get all identities present in a specific file with pagination.
#### Query Parameters
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `page` | integer | No | 1 | Page number |
| `page_size` | integer | No | 20 | Items per page |
#### Example
```bash
curl -s "$API/api/v1/file/$FILE_UUID/identities?page=1&page_size=50" \
-H "X-API-Key: $KEY"
```
#### Response (200)
```json
{
"success": true,
"file_uuid": "d3f9ae8e471a1fc4d47022c66091b920",
"fps": 24.0,
"total": 5,
"page": 1,
"page_size": 20,
"data": [
{
"identity_id": 1,
"identity_uuid": "a9a90105-6d6b-46ff-92da-0c3c1a57dff4",
"name": "Audrey Hepburn",
"metadata": {"source": "tmdb", "tmdb_id": 1234},
"face_count": 142,
"speaker_count": 8,
"start_frame": 100,
"end_frame": 5000,
"start_time": 4.17,
"end_time": 208.33,
"confidence": 0.87
}
]
}
```
| Field | Type | Description |
|-------|------|-------------|
| `data[].identity_id` | integer | Database identity ID |
| `data[].identity_uuid` | string/null | Global identity UUID (null if unbound) |
| `data[].name` | string | Identity name |
| `data[].metadata` | object | Source metadata (TMDb, etc.) |
| `data[].face_count` | integer/null | Number of face detections |
| `data[].speaker_count` | integer/null | Number of speaker segments |
| `data[].start_frame` | integer/null | First appearance frame |
| `data[].end_frame` | integer/null | Last appearance frame |
| `data[].start_time` | float/null | First appearance time (seconds) |
| `data[].end_time` | float/null | Last appearance time (seconds) |
| `data[].confidence` | float/null | Average detection confidence |
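
The sample values above are consistent with `time = frame / fps` rounded to two decimals (100 / 24.0 ≈ 4.17, 5000 / 24.0 ≈ 208.33). A minimal sketch of that conversion, assuming a constant frame rate; `frame_to_seconds` is a hypothetical helper, and the rounding is inferred from the sample rather than confirmed by the source.

```python
def frame_to_seconds(frame: int, fps: float, ndigits: int = 2) -> float:
    """Convert a frame number to seconds, assuming a constant frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return round(frame / fps, ndigits)
```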
---
### `GET /api/v1/file/:file_uuid/faces`
**Auth**: Required
**Scope**: file-level
List all face detections in a specific file with pagination.
#### Query Parameters
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `page` | integer | No | 1 | Page number |
| `page_size` | integer | No | 50 | Items per page |
#### Example
```bash
curl -s "$API/api/v1/file/$FILE_UUID/faces?page=1&page_size=100" \
-H "X-API-Key: $KEY"
```
#### Response (200)
```json
{
"success": true,
"file_uuid": "d3f9ae8e471a1fc4d47022c66091b920",
"total": 1420,
"page": 1,
"page_size": 50,
"data": [
{
"face_id": "face_100",
"frame_number": 1200,
"timestamp": 50.0,
"bbox": [100, 50, 300, 400],
"confidence": 0.95,
"identity_id": 1,
"identity_uuid": "a9a90105-6d6b-46ff-92da-0c3c1a57dff4",
"trace_id": 2
}
]
}
```
| Field | Type | Description |
|-------|------|-------------|
| `data[].face_id` | string | Face detection ID |
| `data[].frame_number` | integer | Frame number in video |
| `data[].timestamp` | float | Timestamp in seconds |
| `data[].bbox` | array | Bounding box `[x1, y1, x2, y2]` |
| `data[].confidence` | float | Detection confidence |
| `data[].identity_id` | integer/null | Bound identity ID (null if unbound) |
| `data[].identity_uuid` | string/null | Bound identity UUID (null if unbound) |
| `data[].trace_id` | integer/null | Face trace ID (null if not traced) |
---
### `POST /api/v1/file/:file_uuid/json/:processor`
**Auth**: Required
**Scope**: file-level
Download raw JSON output for a specific processor.
#### Path Parameters
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `file_uuid` | string | Yes | File UUID |
| `processor` | string | Yes | Processor name: `cut`, `asrx`, `yolo`, `ocr`, `face`, `pose`, `story`, etc. |
#### Example
```bash
curl -s -X POST "$API/api/v1/file/$FILE_UUID/json/face" \
-H "X-API-Key: $KEY" | jq '.frames | length'
```
#### Response (200)
Returns the raw JSON output of the specified processor. Structure varies by processor type.
#### Error Codes
| HTTP | When |
|------|------|
| `404` | JSON file not found |
| `500` | Failed to parse JSON |
---
## Unregister
### `POST /api/v1/unregister`
@@ -138,4 +417,4 @@ curl -s -X POST "$API/api/v1/unregister" \
| `401` | Missing or invalid API key |
---
*Updated: 2026-05-19 12:49:24*
*Updated: 2026-06-20 — Added file listing, file detail, file identities, file faces, and JSON download endpoints*


@@ -235,5 +235,174 @@ curl -s "$API/api/v1/jobs" -H "X-API-Key: $KEY" | jq '{count, jobs: [.jobs[] | {
| `page` | integer | Current page number |
| `page_size` | integer | Jobs per page |
### `GET /api/v1/file/:file_uuid/processor-counts`
**Auth**: Required
**Scope**: file-level
Get counts of processor JSON output files. See `15_tkg.md` for full documentation.
---
*Updated: 2026-05-19 12:49:24*
## Pipeline Steps (Manual)
These endpoints execute individual pipeline steps. They are typically called by the worker automatically, but can be invoked manually for debugging or re-processing.
### `POST /api/v1/file/:file_uuid/store-asrx`
**Auth**: Required
**Scope**: file-level
Store ASRX diarization results as chunk records in the database. Converts ASRX segments into searchable chunk entries.
#### Example
```bash
curl -s -X POST "$API/api/v1/file/$FILE_UUID/store-asrx" \
-H "X-API-Key: $KEY"
```
#### Response (200)
```json
{
"success": true,
"message": "ASRX chunks stored",
"file_uuid": "3a6c1865..."
}
```
---
### `POST /api/v1/file/:file_uuid/rule1`
**Auth**: Required
**Scope**: file-level
Execute Rule 1 pipeline step. Applies rule-based chunking to create structured chunk records from processor outputs.
#### Example
```bash
curl -s -X POST "$API/api/v1/file/$FILE_UUID/rule1" \
-H "X-API-Key: $KEY"
```
#### Response (200)
```json
{
"success": true,
"message": "Rule 1 complete: 45 chunks",
"file_uuid": "3a6c1865...",
"chunks": 45
}
```
| Field | Type | Description |
|-------|------|-------------|
| `success` | boolean | Always true on 200 |
| `message` | string | Human-readable completion message |
| `file_uuid` | string | 32-char hex UUID |
| `chunks` | integer | Number of chunks produced |
---
### `POST /api/v1/file/:file_uuid/vectorize`
**Auth**: Required
**Scope**: file-level
Generate vector embeddings for all chunks of a file and store them in Qdrant for semantic search.
#### Example
```bash
curl -s -X POST "$API/api/v1/file/$FILE_UUID/vectorize" \
-H "X-API-Key: $KEY"
```
#### Response (200)
```json
{
"success": true,
"message": "Vectorization complete",
"file_uuid": "3a6c1865..."
}
```
---
### `POST /api/v1/file/:file_uuid/phase1`
**Auth**: Required
**Scope**: file-level
Execute Phase 1 of the post-processing pipeline. Combines store-asrx, rule1, and vectorize into a single step.
#### Example
```bash
curl -s -X POST "$API/api/v1/file/$FILE_UUID/phase1" \
-H "X-API-Key: $KEY"
```
#### Response (200)
```json
{
"success": true,
"message": "Phase 1 complete",
"file_uuid": "3a6c1865..."
}
```
---
### `POST /api/v1/file/:file_uuid/complete`
**Auth**: Required
**Scope**: file-level
Mark a video as fully processed. Updates the video status to `completed` and finalizes all pipeline state.
#### Example
```bash
curl -s -X POST "$API/api/v1/file/$FILE_UUID/complete" \
-H "X-API-Key: $KEY"
```
#### Response (200)
```json
{
"success": true,
"message": "Video marked as completed",
"file_uuid": "3a6c1865..."
}
```
---
### Pipeline Step Order
```
process (trigger)
├─→ cut, yolo, ocr, face, pose, asrx (parallel processors)
├─→ store-asrx (store diarization as chunks)
├─→ rule1 (rule-based chunking)
├─→ vectorize (embed chunks to Qdrant)
└─→ complete (mark done)
```
Phase 1 (`/phase1`) combines store-asrx + rule1 + vectorize into one call.
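
The manual step order can be driven from a script roughly as follows. This is a sketch under assumptions: `run_manual_pipeline` and the injected `post` callable are hypothetical, and treating a missing or false `success` field as a failure is a guess, since the error responses for these endpoints are not documented here.

```python
from typing import Callable, Dict, List

# Step names taken from the endpoint paths documented above.
MANUAL_STEPS = ["store-asrx", "rule1", "vectorize", "complete"]


def run_manual_pipeline(file_uuid: str, post: Callable[[str], Dict],
                        use_phase1: bool = False) -> List[str]:
    """Call the manual step endpoints in order, stopping at the first failure.

    `post(path)` is a hypothetical callable that sends an authenticated POST
    and returns the parsed JSON body.
    """
    steps = ["phase1", "complete"] if use_phase1 else MANUAL_STEPS
    done: List[str] = []
    for step in steps:
        body = post(f"/api/v1/file/{file_uuid}/{step}")
        if not body.get("success"):
            raise RuntimeError(f"step {step!r} failed: {body}")
        done.append(step)
    return done
```

With `use_phase1=True` the three chunking and embedding calls collapse into the single `/phase1` call, followed by `/complete`.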
---
*Updated: 2026-06-20 12:00:00*


@@ -1,5 +1,5 @@
<!-- module: search -->
<!-- description: Vector search, BM25, smart search, universal search, visual search -->
<!-- description: Vector search, BM25, smart search, universal search, LLM reranked search, frame search -->
<!-- depends: 01_auth -->
## Search APIs
@@ -160,11 +160,137 @@ curl -s -X POST "$API/api/v1/search/universal" \
**Auth**: Required
**Scope**: global / file-level
Search face detection frames by identity name or trace ID.
Search frames by YOLO objects, OCR text, face IDs, or pose detections. Filters frames based on visual content detected during processing.
#### Request Parameters
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `file_uuid` | string | No | — | Restrict to specific file |
| `object_class` | string | No | — | Filter by YOLO object class (e.g., `person`, `car`, `dog`) |
| `ocr_text` | string | No | — | Filter by OCR text content (case-insensitive `ILIKE` match) |
| `face_id` | string | No | — | Filter by face detection ID |
| `time_range` | [float, float] | No | — | Filter by time range `[start_secs, end_secs]` |
| `limit` | integer | No | 100 | Max results |
#### Example
```bash
# Search for frames containing "person" objects
curl -s -X POST "$API/api/v1/search/frames" \
-H "Content-Type: application/json" \
-H "X-API-Key: $KEY" \
-d '{"file_uuid": "'"$FILE_UUID"'", "object_class": "person", "limit": 20}'
# Search for frames with specific OCR text
curl -s -X POST "$API/api/v1/search/frames" \
-H "Content-Type: application/json" \
-H "X-API-Key: $KEY" \
-d '{"file_uuid": "'"$FILE_UUID"'", "ocr_text": "hello", "time_range": [10.0, 30.0]}'
```
#### Response (200)
```json
{
"frames": [
{
"frame_number": 1200,
"timestamp": 50.0,
"file_uuid": "d3f9ae8e471a1fc4d47022c66091b920",
"objects": [{"class": "person", "confidence": 0.95, "bbox": [100, 50, 300, 400]}],
"ocr_texts": ["Hello World"],
"faces": [{"face_id": "face_42", "confidence": 0.88}],
"pose_persons": [{"trace_id": 2, "bbox": [120, 60, 280, 380]}]
}
],
"total": 15
}
```
| Field | Type | Description |
|-------|------|-------------|
| `frames` | array | Array of matching frame objects |
| `frames[].frame_number` | integer | Frame number in video |
| `frames[].timestamp` | float | Timestamp in seconds |
| `frames[].file_uuid` | string | File UUID |
| `frames[].objects` | array/null | YOLO detections in this frame |
| `frames[].ocr_texts` | array/null | OCR text strings in this frame |
| `frames[].faces` | array/null | Face detections in this frame |
| `frames[].pose_persons` | array/null | Pose-detected persons in this frame |
| `total` | integer | Total matching frame count |
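
Since every filter is optional, a client may want to build the request body from only the filters it actually sets. The helper below is a sketch, not part of the API: `build_frame_query` is a hypothetical name, the client-side validation is an added safeguard, and it assumes the server treats an omitted field the same as an unset filter.

```python
from typing import Any, Dict, Optional, Tuple


def build_frame_query(file_uuid: Optional[str] = None,
                      object_class: Optional[str] = None,
                      ocr_text: Optional[str] = None,
                      face_id: Optional[str] = None,
                      time_range: Optional[Tuple[float, float]] = None,
                      limit: int = 100) -> Dict[str, Any]:
    """Build a JSON body for POST /api/v1/search/frames, omitting unset filters."""
    if time_range is not None:
        start, end = time_range
        if start < 0 or end < start:
            raise ValueError("time_range must satisfy 0 <= start <= end")
    if limit < 1:
        raise ValueError("limit must be >= 1")
    body = {
        "file_uuid": file_uuid,
        "object_class": object_class,
        "ocr_text": ocr_text,
        "face_id": face_id,
        "time_range": list(time_range) if time_range is not None else None,
        "limit": limit,
    }
    # Drop filters that were not provided so the body stays minimal.
    return {k: v for k, v in body.items() if v is not None}
```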
---
### `GET /api/v1/search/identity_text`
### `POST /api/v1/search/llm-smart`
**Auth**: Required
**Scope**: global / file-level
Smart search with LLM re-ranking. First fetches candidate results via RRF (Reciprocal Rank Fusion) using the existing smart search, then uses an LLM (Gemma4 on port 8000) to re-rank candidates by relevance to the query.
#### Request Parameters
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `query` | string | Yes | — | Search text |
| `file_uuid` | string | No | — | File UUID to search within |
| `limit` | integer | No | 10 | Max results to return |
#### Pipeline
```
1. smart_search → fetch N candidates (limit × 3, clamped 10-20)
2. LLM rerank → re-order by relevance using Gemma4
3. trim → return top `limit` results
```
#### Example
```bash
curl -s -X POST "$API/api/v1/search/llm-smart" \
-H "Content-Type: application/json" \
-H "X-API-Key: $KEY" \
-d '{"query": "two people having a conversation about business", "limit": 5}'
```
#### Response (200)
```json
{
"query": "two people having a conversation about business",
"results": [
{
"file_uuid": "d3f9ae8e471a1fc4d47022c66091b920",
"parent_id": 1234,
"scene_order": 1234,
"start_frame": 5000,
"end_frame": 5200,
"fps": 24.0,
"start_time": 208.3,
"end_time": 216.7,
"summary": "[208s-217s, 9s] Two people discussing project timeline...",
"similarity": 0.72
}
],
"page": 1,
"page_size": 5,
"strategy": "llm_reranked"
}
```
| Field | Type | Description |
|-------|------|-------------|
| `strategy` | string | Always `"llm_reranked"` for this endpoint |
| `results` | array | Re-ranked search results (same format as smart search) |
#### Fallback
If LLM reranking fails (model unavailable, timeout), falls back to RRF order without error.
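
The candidate sizing and the fallback behaviour can be illustrated with a short sketch. It is not the server implementation: `candidate_count`, `llm_smart`, and the injected `smart_search` and `rerank` callables are hypothetical, and the clamp is read literally from the pipeline description (limit × 3, clamped to 10-20).

```python
from typing import Callable, Dict, List


def candidate_count(limit: int) -> int:
    """limit x 3, clamped to the range 10..20 as described in the pipeline."""
    return max(10, min(20, limit * 3))


def llm_smart(query: str, limit: int,
              smart_search: Callable[[str, int], List[Dict]],
              rerank: Callable[[str, List[Dict]], List[Dict]]) -> List[Dict]:
    """Fetch candidates, try to re-rank them, and fall back to RRF order."""
    candidates = smart_search(query, candidate_count(limit))
    try:
        ordered = rerank(query, candidates)
    except Exception:
        # Model unavailable or timed out: keep the original RRF order.
        ordered = candidates
    return ordered[:limit]
```

Under this reading, a `limit` above 20 cannot be fully satisfied, because at most 20 candidates are fetched before trimming.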
---
### Visual Search
**Auth**: Required
**Scope**: global / file-level
@@ -223,15 +349,15 @@ curl -s "$API/api/v1/search/identity_text?file_uuid=$FILE_UUID&q=love" -H "X-API
---
### Visual Search
### Visual Search (Planned)
| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/search/visual` | Search visual chunks |
| POST | `/api/v1/search/visual/class` | Search by object class |
| POST | `/api/v1/search/visual/density` | Search by object density |
| POST | `/api/v1/search/visual/combination` | Search by object combination |
| POST | `/api/v1/search/visual/stats` | Visual chunk statistics |
| Method | Endpoint | Status | Description |
|--------|----------|--------|-------------|
| POST | `/api/v1/search/visual` | Not implemented | Search visual chunks |
| POST | `/api/v1/search/visual/class` | Not implemented | Search by object class |
| POST | `/api/v1/search/visual/density` | Not implemented | Search by object density |
| POST | `/api/v1/search/visual/combination` | Not implemented | Search by object combination |
| POST | `/api/v1/search/visual/stats` | Not implemented | Visual chunk statistics |
#### Embedding Model
@@ -243,4 +369,4 @@ curl -s "$API/api/v1/search/identity_text?file_uuid=$FILE_UUID&q=love" -H "X-API
| **Storage** | pgvector (`chunk.embedding` column) |
---
*Updated: 2026-05-27 — Added global search support for smart, universal, identity_text APIs*
*Updated: 2026-06-20 — Added llm-smart search, completed frames search documentation, marked visual search as planned*


@@ -729,6 +729,200 @@ curl -s "$API/api/v1/identity/$IDENTITY_UUID/profile-image" \
---
## Identity Related Data
### `GET /api/v1/identity/:identity_uuid/files`
**Auth**: Required
**Scope**: identity-level
List all files containing this identity.
#### Example
```bash
curl -s "$API/api/v1/identity/$IDENTITY_UUID/files" \
-H "X-API-Key: $KEY"
```
#### Response (200)
```json
{
"success": true,
"identity_uuid": "a9a90105-6d6b-46ff-92da-0c3c1a57dff4",
"total": 3,
"files": [
{
"file_uuid": "d3f9ae8e471a1fc4d47022c66091b920",
"file_name": "video1.mp4",
"face_count": 142,
"first_appearance": 4.17,
"last_appearance": 208.33
}
]
}
```
---
### `GET /api/v1/identity/:identity_uuid/chunks`
**Auth**: Required
**Scope**: identity-level
List all chunks associated with this identity (chunks where the identity's face appears).
#### Query Parameters
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `page` | integer | No | 1 | Page number |
| `page_size` | integer | No | 20 | Items per page |
#### Example
```bash
curl -s "$API/api/v1/identity/$IDENTITY_UUID/chunks?page=1&page_size=50" \
-H "X-API-Key: $KEY"
```
#### Response (200)
```json
{
"success": true,
"identity_uuid": "a9a90105-6d6b-46ff-92da-0c3c1a57dff4",
"total": 45,
"page": 1,
"page_size": 20,
"chunks": [
{
"chunk_id": "chunk_1",
"file_uuid": "d3f9ae8e471a1fc4d47022c66091b920",
"start_time": 4.17,
"end_time": 8.33,
"text": "[4s-8s] Hello, how are you?",
"chunk_type": "story_child"
}
]
}
```
---
### `GET /api/v1/identity/:identity_uuid/faces`
**Auth**: Required
**Scope**: identity-level
List all face detections for this identity.
#### Query Parameters
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `page` | integer | No | 1 | Page number |
| `page_size` | integer | No | 50 | Items per page |
#### Example
```bash
curl -s "$API/api/v1/identity/$IDENTITY_UUID/faces?page=1&page_size=100" \
-H "X-API-Key: $KEY"
```
#### Response (200)
```json
{
"success": true,
"identity_uuid": "a9a90105-6d6b-46ff-92da-0c3c1a57dff4",
"total": 1420,
"page": 1,
"page_size": 50,
"faces": [
{
"face_id": "face_100",
"file_uuid": "d3f9ae8e471a1fc4d47022c66091b920",
"frame_number": 1200,
"timestamp": 50.0,
"bbox": [100, 50, 300, 400],
"confidence": 0.95,
"trace_id": 2
}
]
}
```
---
### `GET /api/v1/identity/:identity_uuid/status`
**Auth**: Required
**Scope**: identity-level
Get processing/status info for an identity.
#### Example
```bash
curl -s "$API/api/v1/identity/$IDENTITY_UUID/status" \
-H "X-API-Key: $KEY"
```
#### Response (200)
```json
{
"success": true,
"identity_uuid": "a9a90105-6d6b-46ff-92da-0c3c1a57dff4",
"name": "Audrey Hepburn",
"status": "confirmed",
"face_count": 1420,
"file_count": 3,
"has_embedding": true,
"has_profile_image": true
}
```
---
### `GET /api/v1/identity/:identity_uuid/json`
**Auth**: Required
**Scope**: identity-level
Get the raw identity JSON file (same format as identity.json on disk).
#### Example
```bash
curl -s "$API/api/v1/identity/$IDENTITY_UUID/json" \
-H "X-API-Key: $KEY"
```
#### Response (200)
```json
{
"version": 1,
"identity_uuid": "a9a90105-6d6b-46ff-92da-0c3c1a57dff4",
"name": "Audrey Hepburn",
"identity_type": "people",
"source": "tmdb",
"status": "confirmed",
"tmdb_id": 1234,
"tmdb_profile": "https://image.tmdb.org/...",
"metadata": {},
"file_bindings": [
{"file_uuid": "d3f9ae8e...", "trace_ids": [0, 1, 2], "face_count": 142}
]
}
```
---
## Alias System (BCP 47 Locale Tags)
Identity aliases support multilingual display names. Aliases are stored in `metadata.aliases` as an array of `{locale, name}` objects.
@@ -786,4 +980,4 @@ PATCH /api/v1/identity/:identity_uuid
This **replaces** the entire `aliases` array. To add to existing aliases, include all existing entries in the request.
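
Because the array is replaced wholesale, a client that wants to add one alias has to merge it into the current list first. A minimal sketch, assuming at most one alias per locale (the source does not state this); `merge_aliases` is a hypothetical helper.

```python
from typing import Dict, List


def merge_aliases(existing: List[Dict[str, str]],
                  new: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the full aliases array to send, with `new` overriding per locale."""
    merged = {a["locale"]: a["name"] for a in existing}
    for alias in new:
        # A new entry for an existing locale replaces the old name.
        merged[alias["locale"]] = alias["name"]
    return [{"locale": loc, "name": name} for loc, name in merged.items()]
```

The returned list is what would go into the `aliases` field of the update request, so existing entries are not lost.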
---
*Updated: 2026-05-25 — Added `GET /api/v1/file/:file_uuid/faces` with 4 binding states, filters, strangers table split
*Updated: 2026-06-20 — Added identity files, chunks, faces, status, and JSON endpoints*


@@ -427,4 +427,111 @@ Both endpoints support time range extraction, but serve different use cases:
| **Frame number** | Zero-based (`frame=0` = first frame of video) |
---
*Updated: 2026-05-19 12:49:24*
### `GET /api/v1/file/:file_uuid/stranger/:stranger_id/representative-face`
**Auth**: Required
**Scope**: file-level
Get the representative face for a stranger (unidentified face trace).
#### Example
```bash
curl -s "$API/api/v1/file/$FILE_UUID/stranger/1/representative-face" \
-H "X-API-Key: $KEY"
```
#### Response (200)
```json
{
"success": true,
"file_uuid": "d3f9ae8e471a1fc4d47022c66091b920",
"stranger_id": 1,
"face_count": 85,
"representative": {
"frame_number": 5000,
"timestamp_secs": 208.33,
"bbox": {"x": 200, "y": 100, "width": 150, "height": 150},
"confidence": 0.92,
"quality_score": 20700,
"blur_score": 8.5
}
}
```
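
Note that `representative.bbox` here is an object with `x`, `y`, `width`, `height`, whereas `GET /api/v1/file/:file_uuid/faces` documents `bbox` as `[x1, y1, x2, y2]`. Converting between the two can be sketched as below, assuming `x`/`y` denote the top-left corner (not stated in the source); both function names are hypothetical.

```python
from typing import Dict, List


def corners_to_xywh(bbox: List[float]) -> Dict[str, float]:
    """[x1, y1, x2, y2] -> {x, y, width, height}, with (x, y) as top-left."""
    x1, y1, x2, y2 = bbox
    return {"x": x1, "y": y1, "width": x2 - x1, "height": y2 - y1}


def xywh_to_corners(box: Dict[str, float]) -> List[float]:
    """{x, y, width, height} -> [x1, y1, x2, y2]."""
    return [box["x"], box["y"], box["x"] + box["width"], box["y"] + box["height"]]
```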
---
### `GET /api/v1/file/:file_uuid/stranger/:stranger_id/thumbnail`
**Auth**: Required
**Scope**: file-level
Extract the best face image for a stranger as JPEG (320×320).
#### Example
```bash
curl -s "$API/api/v1/file/$FILE_UUID/stranger/1/thumbnail" \
-H "X-API-Key: $KEY" -o stranger_1_face.jpg
```
#### Response
- **200**: `image/jpeg` binary data (320×320 cropped face)
- **404**: File or stranger not found
---
### `GET /api/v1/file/:file_uuid/chunk/:chunk_id/thumbnail`
**Auth**: Required
**Scope**: file-level
Get thumbnail for a specific chunk. Extracts the representative frame for the chunk's time range.
#### Example
```bash
curl -s "$API/api/v1/file/$FILE_UUID/chunk/chunk_1/thumbnail" \
-H "X-API-Key: $KEY" -o chunk_1.jpg
```
#### Response
- **200**: `image/jpeg` binary data
- **404**: File or chunk not found
---
### `GET /api/v1/media-proxy`
**Auth**: Required
**Scope**: system-level
Proxy request to fetch media from external URLs. Useful for loading profile images or thumbnails from external services (TMDb, etc.) without exposing the external URL to the client.
#### Query Parameters
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `url` | string | Yes | External URL to proxy |
#### Example
```bash
curl -s "$API/api/v1/media-proxy?url=https://image.tmdb.org/t/p/w500/abc123.jpg" \
-H "X-API-Key: $KEY" -o tmdb_profile.jpg
```
#### Response
- **200**: Proxied media data (Content-Type from external source)
- **400**: Missing or invalid URL parameter
- **500**: External request failed
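
External URLs that carry their own query string need percent-encoding before being passed as the `url` parameter, otherwise their `&` and `=` characters are parsed as part of the proxy request. A sketch using the standard library; `media_proxy_url` is a hypothetical helper, the scheme check is an added safeguard, and it assumes the server applies standard query-string decoding.

```python
from urllib.parse import urlencode


def media_proxy_url(api_base: str, external_url: str) -> str:
    """Build a media-proxy request URL with the external URL percent-encoded."""
    if not external_url.startswith(("http://", "https://")):
        raise ValueError("external_url must be an http(s) URL")
    query = urlencode({"url": external_url})
    return f"{api_base.rstrip('/')}/api/v1/media-proxy?{query}"
```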
---
*Updated: 2026-06-20 — Added stranger endpoints, chunk thumbnail, and media proxy*


@@ -108,5 +108,94 @@ curl -s -X POST "$API/api/v1/resource/tmdb/check" \
}
```
### `POST /api/v1/tmdb/fetch`
**Auth**: Required
**Scope**: system-level
Fetch TMDb data by filename, create identities with profile images and embeddings. Similar to prefetch+probe combined, but also downloads profile images and generates embeddings.
#### Request Parameters
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `filename` | string | Yes | Movie filename to search TMDb for |
#### Example
```bash
curl -s -X POST "$API/api/v1/tmdb/fetch" \
-H "Content-Type: application/json" \
-H "X-API-Key: $KEY" \
-d '{"filename": "charade.mp4"}'
```
#### Response (200)
```json
{
"success": true,
"movie_title": "Charade (1963)",
"tmdb_id": 1234,
"identities_created": 15,
"profile_images_downloaded": 12
}
```
---
*Updated: 2026-05-19 12:49:24*
### `POST /api/v1/agents/tmdb/match/:file_uuid`
**Auth**: Required
**Scope**: file-level
Match TMDb identities to face traces using Qdrant vector similarity. Compares face embeddings against TMDb identity embeddings to find the best matches.
#### Example
```bash
curl -s -X POST "$API/api/v1/agents/tmdb/match/$FILE_UUID" \
-H "X-API-Key: $KEY"
```
#### Response (200)
```json
{
"success": true,
"file_uuid": "d3f9ae8e471a1fc4d47022c66091b920",
"matches": [
{
"trace_id": 0,
"identity_uuid": "a9a90105-6d6b-46ff-92da-0c3c1a57dff4",
"identity_name": "Audrey Hepburn",
"confidence": 0.92,
"tmdb_id": 1234
}
],
"total_matches": 5
}
```
| Field | Type | Description |
|-------|------|-------------|
| `matches[].trace_id` | integer | Face trace ID |
| `matches[].identity_uuid` | string | Matched TMDb identity UUID |
| `matches[].identity_name` | string | Identity display name |
| `matches[].confidence` | float | Cosine similarity score (0.0-1.0) |
| `matches[].tmdb_id` | integer | TMDb person ID |
| `total_matches` | integer | Total successful matches |
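
For reference, the cosine similarity reported in `confidence` is the dot product of two embeddings divided by the product of their norms. The pure-Python sketch below shows the formula only; the actual matching is done inside Qdrant, and `cosine_similarity` is a hypothetical helper name.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("zero-length vector")
    return dot / norm
```

In general the value lies in [-1, 1]; identical directions give 1.0 and orthogonal vectors give 0.0.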
---
### TMDb Auto-Match
When `MOMENTRY_TMDB_PROBE_ENABLED=true`, the worker automatically runs TMDb matching during the post-process phase:
1. **Register phase**: Searches TMDb by filename, creates identities with `tmdb_id`/`tmdb_profile`
2. **Post-process phase**: Matches detected faces against TMDb identities via cosine similarity using Qdrant
No manual API call needed if auto-match is enabled.
---
*Updated: 2026-06-20 — Added tmdb/fetch and tmdb/match endpoints*


@@ -0,0 +1,148 @@
<!-- module: workspace -->
<!-- description: Workspace checkout/checkin — lock, clear, restore file data -->
<!-- depends: 04_lookup, 05_process -->
## Workspace Checkin/Checkout
Workspace checkin/checkout provides a transactional editing model for file data:
- **Checkout**: Clears PG tables (face_detections, speaker_detections, pre_chunks) and Qdrant vectors, creating an isolated workspace SQLite for editing.
- **Checkin**: Restores data from the workspace SQLite back to PG and Qdrant, marking the file as `Indexed`.
This allows safe concurrent editing — while a file is checked out, its main database records are cleared, preventing conflicts.
---
### `POST /api/v1/file/:file_uuid/checkout`
**Auth**: Required
**Scope**: file-level
Checkout a file workspace. Clears face detections, speaker detections, pre_chunks from PostgreSQL, deletes Qdrant vectors, and creates a workspace SQLite database for isolated editing.
#### Example
```bash
curl -s -X POST "$API/api/v1/file/$FILE_UUID/checkout" \
-H "X-API-Key: $KEY"
```
#### Response (200)
```json
{
"file_uuid": "d3f9ae8e471a1fc4d47022c66091b920",
"rows_deleted": 1523,
"status": "checked_out"
}
```
| Field | Type | Description |
|-------|------|-------------|
| `file_uuid` | string | 32-char hex UUID |
| `rows_deleted` | integer | Total rows cleared from PG tables |
| `status` | string | `"checked_out"` |
#### Error Responses
| HTTP | When |
|------|------|
| `500` | Checkout failed (DB error, workspace creation error) |
---
### `POST /api/v1/file/:file_uuid/checkin`
**Auth**: Required
**Scope**: file-level
Checkin a file workspace. Restores face detections, speaker detections, pre_chunks from workspace SQLite back to PostgreSQL, re-indexes vectors to Qdrant, and sets video status to `Indexed`.
#### Example
```bash
curl -s -X POST "$API/api/v1/file/$FILE_UUID/checkin" \
-H "X-API-Key: $KEY"
```
#### Response (200)
```json
{
"file_uuid": "d3f9ae8e471a1fc4d47022c66091b920",
"pre_chunks_moved": 45,
"face_detections_moved": 1200,
"speaker_detections_moved": 320,
"vectors_moved": 45,
"status": "indexed"
}
```
| Field | Type | Description |
|-------|------|-------------|
| `file_uuid` | string | 32-char hex UUID |
| `pre_chunks_moved` | integer | Pre-chunks restored from workspace |
| `face_detections_moved` | integer | Face detections restored from workspace |
| `speaker_detections_moved` | integer | Speaker detections restored from workspace |
| `vectors_moved` | integer | Vectors re-indexed to Qdrant |
| `status` | string | `"indexed"` |
#### Error Responses
| HTTP | When |
|------|------|
| `500` | Checkin failed (DB error, workspace not found, vector index error) |
---
### `GET /api/v1/file/:file_uuid/workspace`
**Auth**: Required
**Scope**: file-level
Check if a workspace SQLite database exists for a file.
#### Example
```bash
curl -s "$API/api/v1/file/$FILE_UUID/workspace" \
-H "X-API-Key: $KEY"
```
#### Response (200)
```json
{
"file_uuid": "d3f9ae8e471a1fc4d47022c66091b920",
"exists": true
}
```
| Field | Type | Description |
|-------|------|-------------|
| `file_uuid` | string | 32-char hex UUID |
| `exists` | boolean | True if workspace SQLite exists |
---
### Workflow
```
REGISTERED ──→ CHECKED_OUT ──→ INDEXED
│ │ │
│ checkout checkin
│ │ │
│ clear PG + Qdrant restore from SQLite
│ create workspace re-index vectors
│ set status set status
```
1. **Register** file → status: `REGISTERED`
2. **Process** file → processors run, data stored in PG + Qdrant
3. **Checkout** file → clear editable data, create workspace SQLite → status: `CHECKED_OUT`
4. **Edit** workspace via Agent Search / identity binding
5. **Checkin** file → restore from workspace SQLite → status: `INDEXED`
6. **Rebuild TKG** if needed after checkin
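
Steps 3 to 5 can be wrapped in a small client helper. This is a sketch under assumptions: `edit_in_workspace`, `post`, and `edit` are hypothetical names, and success is detected only through the documented `status` values in the two responses.

```python
from typing import Callable, Dict


def edit_in_workspace(file_uuid: str, post: Callable[[str], Dict],
                      edit: Callable[[str], None]) -> Dict:
    """Checkout, apply `edit`, then checkin; returns the checkin response body.

    `post(path)` is a hypothetical callable that sends an authenticated POST
    and returns the parsed JSON body.
    """
    out = post(f"/api/v1/file/{file_uuid}/checkout")
    if out.get("status") != "checked_out":
        raise RuntimeError(f"checkout failed: {out}")
    edit(file_uuid)  # e.g. identity binding against the workspace
    result = post(f"/api/v1/file/{file_uuid}/checkin")
    if result.get("status") != "indexed":
        raise RuntimeError(f"checkin failed: {result}")
    return result
```

If `edit` raises, the helper deliberately does not check in, which leaves the file checked out so the workspace can be inspected. Whether that is the right policy depends on the deployment.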
---
*Updated: 2026-06-20 12:00:00*


@@ -0,0 +1,188 @@
<!-- module: incomplete -->
<!-- description: Incomplete, stub, or undocumented API endpoints — tracking list -->
<!-- depends: 01_auth -->
## Incomplete / Undocumented APIs
This module tracks API endpoints that exist in the codebase but are either undocumented, partially documented, or stubs.
> **Note**: Endpoints listed here should be fully documented and moved to their appropriate module once implemented.
---
## Identity Binding
### `POST /api/v1/identity/:identity_uuid/bind`
**Auth**: Required
**Scope**: identity-level
Bind a single face detection to an identity. Unlike `bind/trace` which binds all faces in a trace, this binds one specific face.
#### Request Parameters
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `file_uuid` | string | Yes | File containing the face |
| `face_id` | string | Yes | Face detection ID to bind |
#### Status
⚠️ **Undocumented** — exists in code but no full request/response documentation.
---
## Resource Management
### `POST /api/v1/resource/register`
**Auth**: Required
**Scope**: system-level
Register an external resource (e.g., storage backend, API service).
#### Status
⚠️ **Undocumented** — endpoint exists but no documentation.
---
### `POST /api/v1/resource/heartbeat`
**Auth**: Required
**Scope**: system-level
Send heartbeat for a registered resource to verify it's still alive.
#### Status
⚠️ **Undocumented** — endpoint exists but no documentation.
---
### `GET /api/v1/resources`
**Auth**: Required
**Scope**: system-level
List all registered resources with their status.
#### Status
⚠️ **Undocumented** — endpoint exists but no documentation.
---
## 5W1H Agent
### `POST /api/v1/agents/5w1h/analyze`
**Auth**: Required
**Scope**: file-level
Run 5W1H analysis on all cut scenes for a file. Uses LLM (Gemma4) to summarize each scene with who/what/where/when/why/how.
#### Status
⚠️ **Partially documented** — listed in `12_agent.md` but missing full request/response examples.
---
### `POST /api/v1/agents/5w1h/batch`
**Auth**: Required
**Scope**: system-level
Run 5W1H analysis on multiple files at once.
#### Request Parameters
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `file_uuids` | string[] | Yes | Array of file UUIDs to analyze |
#### Status
⚠️ **Partially documented** — listed in `12_agent.md` but missing full request/response examples.
---
### `GET /api/v1/agents/5w1h/status`
**Auth**: Required
**Scope**: system-level
Get 5W1H analysis status across all videos (which files have been analyzed, which are pending).
#### Status
⚠️ **Partially documented** — listed in `12_agent.md` but missing full response schema.
---
## Identity Agent
### `POST /api/v1/agents/identity/match-from-photo`
**Auth**: Required
**Scope**: system-level
Match an identity using an uploaded photo. Extracts face embedding, finds best trace match.
#### Status
⚠️ **Partially documented** — exists in `08_identity_agent.md` but missing full response schema and error cases.
---
### `POST /api/v1/agents/identity/match-from-trace`
**Auth**: Required
**Scope**: file-level
Match an identity using a trace. Multi-angle embedding comparison with propagation.
#### Status
⚠️ **Partially documented** — exists in `08_identity_agent.md` but missing full response schema and error cases.
---
## Stubs / Not Implemented
### Visual Search Endpoints
| Method | Endpoint | Status |
|--------|----------|--------|
| POST | `/api/v1/search/visual` | Stub — defined but not functional |
| POST | `/api/v1/search/visual/class` | Stub — defined but not functional |
| POST | `/api/v1/search/visual/density` | Stub — defined but not functional |
| POST | `/api/v1/search/visual/combination` | Stub — defined but not functional |
| POST | `/api/v1/search/visual/stats` | Stub — defined but not functional |
### Unmounted Routes
These endpoints are defined in source code but not mounted in the router:
| Endpoint | Notes |
|----------|-------|
| `/api/v1/search/persons` | Defined but not mounted |
| `/api/v1/who` | Defined but not mounted |
| `/api/v1/who/candidates` | Defined but not mounted |
---
## Tracking
| Status | Count |
|-------|--------|
| Undocumented | 4 (identity bind, resource management ×3) |
| Partially documented | 5 (5W1H ×3, identity agent ×2) |
| Stub/not functional | 5 (visual search) |
| Defined but unmounted | 3 (persons, who, who/candidates) |
| **Total** | **17** |
---
*Created: 2026-06-20 — Gap analysis from core API vs doc_wasm sync*
*Updated: 2026-06-20 — Initial tracking list*