feat: ASRX hybrid pipeline, identity history, worker fixes, checkpoint system

This commit is contained in:
Accusys
2026-06-02 07:13:23 +08:00
parent e3066c3f49
commit e1572907ae
198 changed files with 43705 additions and 8910 deletions

# Face Binding States — Data Model Reference
**Version**: 1.0.0

**Date**: 2026-05-25

**Related**: `GET /api/v1/file/:file_uuid/faces`; tables `identities`, `strangers`, `face_detections`

---
## Glossary
| Term | Definition |
|------|------------|
| **face detection** | A single face bounding box detected in one video frame. Stored in `face_detections` table. |
| **trace** | A sequence of face detections belonging to the same person across consecutive frames. Assigned by the face tracker. `trace_id` groups multiple face detections. |
| **identity** | A known person with a name. Sources: TMDb (movie stars), user-defined (manual entry). Stored in `identities` table with `source='tmdb'` or `source='user_defined'`. |
| **stranger** | An unknown person detected but not matched to any known identity. Created automatically for unmatched traces. Stored in `strangers` table. |
| **binding** | The association between a face detection and either an identity or a stranger. Represented by `identity_id` or `stranger_id` FK in `face_detections`. |
| **TMDb** | The Movie Database. Source of celebrity identity seeds with `face_embedding` for matching. |
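| **auto identity** | Legacy term for identities with `source='auto'`, created from `face_clustered.json` analysis. The identity rows were deleted during migration; their metadata was kept in the `strangers` table as reference records (with NULL `file_uuid`/`trace_id`). |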
| **dangling** | A face detection whose `identity_id` points to a deleted identity (e.g., auto identities removed during migration). |
| **unbound** | A face detection with no binding at all — `identity_id IS NULL AND stranger_id IS NULL`. |
| **PK** | Primary Key. A unique identifier for each row in a table. Example: `identities.id`, `strangers.id`, `face_detections.id`. |
| **FK** | Foreign Key. A column that references the PK of another table, creating a relationship. Example: `face_detections.identity_id` → `identities.id`, `face_detections.stranger_id` → `strangers.id`. An enforced FK guarantees referential integrity: a face cannot point to a non-existent row. (Only the `stranger_id` FK is currently enforced; see Current Status.) |
---
## Three Core Tables
### ER Diagram
```
┌─────────────────────┐       ┌─────────────────────┐
│ identities          │       │ strangers           │
│─────────────────────│       │─────────────────────│
│ id (PK)             │       │ id (PK)             │
│ uuid                │       │ file_uuid           │
│ name                │       │ trace_id            │
│ source              │       │ metadata            │
│ tmdb_id             │       │ created_at          │
│ face_embedding      │       │                     │
│ metadata            │       │ UNIQUE(file_uuid,   │
│ status              │       │        trace_id)    │
│ ...                 │       │                     │
└──────────┬──────────┘       └──────────┬──────────┘
           │                             │
           │ FK (not yet created)        │ FK
           │ (ON DELETE SET NULL)        │ (ON DELETE SET NULL)
           │                             │
           ▼                             ▼
┌─────────────────────────────────────────────────────┐
│                   face_detections                   │
│─────────────────────────────────────────────────────│
│ id (PK)                                             │
│ file_uuid            Video file identifier          │
│ frame_number         Frame where face was detected  │
│ timestamp_secs       frame_number / fps             │
│ trace_id             Face tracking ID               │
│ face_id              Format: {frame}_{idx}          │
│ identity_id (FK)     → identities.id                │
│ stranger_id (FK)     → strangers.id                 │
│ x, y, width, height  Bounding box                   │
│ confidence           Detection confidence (0–1)     │
│ embedding            Face embedding vector          │
│ metadata             JSON metadata                  │
└─────────────────────────────────────────────────────┘
```
### Table Summary
| Table | Role | Record Count (`public` schema) | Keys |
|-------|------|--------------------------------|------|
| `identities` | Known persons (TMDb, user-defined) | 70 | `id` (PK), `uuid` |
| `strangers` | Unknown persons (unmatched traces) | 0–N per file | `id` (PK), `UNIQUE(file_uuid, trace_id)` |
| `face_detections` | Individual face detections | 70691 per file | `id` (PK) |
### Key Columns in `face_detections`
| Column | Type | Purpose |
|--------|------|---------|
| `identity_id` | INTEGER FK | Points to `identities.id` if matched to known person |
| `stranger_id` | INTEGER FK | Points to `strangers.id` if unmatched trace |
| `trace_id` | INTEGER | Groups faces belonging to same person across frames |

**Design Rule**: In the current implementation, `identity_id` and `stranger_id` are mutually exclusive: a face has at most one binding.

---
## Four Binding States
### State Definitions
| # | State | `binding` JSON | SQL Condition | Meaning |
|---|-------|----------------|---------------|---------|
| 1 | **identity** | `{"identity_id": 9, "identity_uuid": "...", "identity_name": "Audrey Hepburn"}` | `identity_id IN (SELECT id FROM identities)` | Face matched to a known TMDb or user-defined identity |
| 2 | **stranger** | `{"stranger_id": 845, "metadata": {}}` | `stranger_id IS NOT NULL` | Face belongs to an unmatched trace (unknown person) |
| 3 | **dangling** | `{"old_identity_id": 18052}` | `identity_id IS NOT NULL AND NOT EXISTS (SELECT 1 FROM identities WHERE id = face_detections.identity_id)` | Face was bound to an identity that has been deleted (orphaned reference) |
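| 4 | **unbound** | `null` | `identity_id IS NULL AND stranger_id IS NULL` | Face has no binding at all |

The SQL conditions above assume the design rule holds. If a face has both columns set, conditions 1–3 can overlap; the Rust logic below resolves such rows by precedence (identity, then stranger, then dangling, then unbound).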
### State Detection Logic (Rust)
```rust
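// Precedence: live identity, then stranger, then dangling, then unbound.
// identity_uuid / identity_name are expected to be None when the identities
// row is missing (e.g. the query LEFT JOINs identities on identity_id).
let binding = if let (Some(iid), Some(iuuid), Some(iname)) =
    (identity_id, identity_uuid, identity_name)
{
    FaceBinding::Identity { identity_id: iid, identity_uuid: iuuid, identity_name: iname }
} else if let Some(sid) = stranger_id {
    // No live identity: a stranger binding wins over a dangling identity_id
    FaceBinding::Stranger { stranger_id: sid, metadata: stranger_metadata }
} else if let Some(iid) = identity_id {
    // identity_id is set but points to a deleted identity
    FaceBinding::Dangling { old_identity_id: iid }
} else {
    FaceBinding::Unbound
};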
```
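
To check the precedence in isolation, here is a self-contained sketch of the same classification. The `IdentityRow` struct, the `classify` function, and the `i64`/`String` field types are assumptions made for illustration, and the stranger `metadata` payload is omitted; the production `FaceBinding` may be shaped differently. Note that a face with both a `stranger_id` and a dangling `identity_id` is classified as a stranger, not as dangling.

```rust
/// Hypothetical stand-in for `FaceBinding`; field types are assumptions and
/// the stranger `metadata` payload is omitted.
#[derive(Debug, Clone, PartialEq)]
pub enum FaceBinding {
    Identity { identity_id: i64, identity_uuid: String, identity_name: String },
    Stranger { stranger_id: i64 },
    Dangling { old_identity_id: i64 },
    Unbound,
}

/// The `identities` columns a query would join in for a face (hypothetical).
pub struct IdentityRow {
    pub uuid: String,
    pub name: String,
}

/// Precedence: live identity > stranger > dangling identity_id > unbound.
/// `joined` is `None` when no `identities` row matches `identity_id`.
pub fn classify(
    identity_id: Option<i64>,
    joined: Option<&IdentityRow>,
    stranger_id: Option<i64>,
) -> FaceBinding {
    match (identity_id, joined, stranger_id) {
        // identity_id resolves to an existing identity
        (Some(iid), Some(row), _) => FaceBinding::Identity {
            identity_id: iid,
            identity_uuid: row.uuid.clone(),
            identity_name: row.name.clone(),
        },
        // no live identity, but the face belongs to a stranger trace
        (_, _, Some(sid)) => FaceBinding::Stranger { stranger_id: sid },
        // identity_id set, identity row gone, no stranger fallback
        (Some(iid), None, None) => FaceBinding::Dangling { old_identity_id: iid },
        // neither column set
        (None, _, None) => FaceBinding::Unbound,
    }
}
```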
---
## Lifecycle Flow
### Processing Pipeline
```
     Video Registration
              │
              ▼
       Face Detection
  (face_detections created)
              │
              ▼
        Face Tracking
     (trace_id assigned)
              │
              ▼
      ┌────────────────┐
      │ Identity Agent │
      │ Face Matching  │
      └───────┬────────┘
              │
     ┌────────┴─────────┐
     │                  │
     ▼                  ▼
┌──────────┐       ┌──────────┐
│ MATCHED  │       │ UNMATCHED│
│ to TMDb  │       │  trace   │
└────┬─────┘       └────┬─────┘
     │                  │
     ▼                  ▼
identity_id=X     stranger_id=S
     │                  │
     ▼                  ▼
┌──────────┐       ┌──────────┐
│ IDENTITY │       │ STRANGER │
│  state   │       │  state   │
└──────────┘       └──────────┘
```
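
The branch at the bottom of the pipeline can be sketched as follows. This is a hypothetical in-memory model, not the Identity Agent's code: `StrangerRegistry`, `assign_trace`, and sequential stranger ids are assumptions, and the `HashMap` key stands in for the `UNIQUE(file_uuid, trace_id)` constraint on `strangers`. Because the key is the trace rather than the face, every detection in an unmatched trace receives the same `stranger_id`, and repeating the assignment does not create duplicate strangers.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the `strangers` table. The map key
/// plays the role of the UNIQUE(file_uuid, trace_id) constraint.
#[derive(Default)]
pub struct StrangerRegistry {
    last_id: i64,
    by_trace: HashMap<(String, i64), i64>,
}

impl StrangerRegistry {
    /// Returns the stranger id for a trace, creating it on first sight.
    pub fn get_or_create(&mut self, file_uuid: &str, trace_id: i64) -> i64 {
        let key = (file_uuid.to_string(), trace_id);
        if let Some(&id) = self.by_trace.get(&key) {
            return id; // trace already has a stranger: reuse it
        }
        self.last_id += 1; // assumption: ids are handed out sequentially
        self.by_trace.insert(key, self.last_id);
        self.last_id
    }
}

/// Outcome of the Identity Agent branch for one trace.
#[derive(Debug, PartialEq)]
pub enum Assignment {
    Identity(i64), // identity_id = X
    Stranger(i64), // stranger_id = S
}

/// `matched` is the identity found by face matching, or `None` if unmatched.
pub fn assign_trace(
    registry: &mut StrangerRegistry,
    file_uuid: &str,
    trace_id: i64,
    matched: Option<i64>,
) -> Assignment {
    match matched {
        Some(iid) => Assignment::Identity(iid),
        None => Assignment::Stranger(registry.get_or_create(file_uuid, trace_id)),
    }
}
```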
### User Operations
```
┌──────────┐      bind      ┌──────────┐
│ STRANGER │ ─────────────▶ │ IDENTITY │
│          │ ◀───────────── │          │
└──────────┘     unbind     └────┬─────┘
             (if stranger_id     │ unbind
              preserved)         │ (no stranger_id)
                                 ▼
                            ┌──────────┐
                            │ UNBOUND  │
                            └──────────┘
```
### Migration Effect
```
┌─────────────────────┐
│ auto identities     │
│ (source='auto')     │
│ 943 records         │
└──────────┬──────────┘
           │ DELETE
           ▼
┌─────────────────────┐
│ face_detections     │
│ identity_id=18052   │
│ (points to deleted) │
└──────────┬──────────┘
           │ Cleanup SQL
           │ SET identity_id=NULL
           ▼
┌─────────────────────┐
│ DANGLING → UNBOUND  │
│ 18641 faces cleaned │
└─────────────────────┘
```
---
## SQL Query Examples
### Count by State
```sql
-- Each face is counted exactly once, using the same precedence as the Rust
-- state detection: identity > stranger > dangling > unbound.
SELECT
    COUNT(*) FILTER (WHERE i.id IS NOT NULL) AS identity,
    COUNT(*) FILTER (WHERE i.id IS NULL AND fd.stranger_id IS NOT NULL) AS stranger,
    COUNT(*) FILTER (WHERE i.id IS NULL AND fd.stranger_id IS NULL
                       AND fd.identity_id IS NOT NULL) AS dangling,
    COUNT(*) FILTER (WHERE fd.identity_id IS NULL AND fd.stranger_id IS NULL) AS unbound
FROM face_detections fd
LEFT JOIN identities i ON i.id = fd.identity_id
WHERE fd.file_uuid = 'aeed71342a899fe4b4c57b7d41bcb692';
```
### Filter by State
```sql
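-- Identity: identity_id resolves to an existing identity
SELECT * FROM face_detections fd
WHERE fd.identity_id IN (SELECT id FROM identities);
-- Stranger: stranger_id set and no live identity (identity takes precedence)
SELECT * FROM face_detections fd
WHERE fd.stranger_id IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM identities i WHERE i.id = fd.identity_id);
-- Dangling: identity_id points to a deleted identity, no stranger fallback
SELECT * FROM face_detections fd
WHERE fd.identity_id IS NOT NULL
  AND fd.stranger_id IS NULL
  AND NOT EXISTS (SELECT 1 FROM identities i WHERE i.id = fd.identity_id);
-- Unbound
SELECT * FROM face_detections
WHERE identity_id IS NULL AND stranger_id IS NULL;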
```
---
## bind/unbind Behavior
### Current Implementation (stranger_id cleared on bind)
| Operation | SQL Effect | Result |
|-----------|------------|--------|
| `bind_face_to_identity` | `SET identity_id=X, stranger_id=NULL` | Stranger info lost |
| `bind_trace_to_identity` | `SET identity_id=X, stranger_id=NULL` | Stranger info lost |
| `merge_identity` | `SET identity_id=X, stranger_id=NULL` | Stranger info lost |
| `unbind_face` | `SET identity_id=NULL` | Becomes unbound (cannot roll back to stranger) |

**Problem**: After bind → unbind, the face becomes unbound instead of returning to the stranger state.
### Proposed Fix (preserve stranger_id on bind)
| Operation | SQL Effect | Result |
|-----------|------------|--------|
| `bind_face_to_identity` | `SET identity_id=X` (keep stranger_id) | Stranger info preserved |
| `bind_trace_to_identity` | `SET identity_id=X` (keep stranger_id) | Stranger info preserved |
| `merge_identity` | `SET identity_id=X` (keep stranger_id) | Stranger info preserved |
| `unbind_face` | `SET identity_id=NULL` | Returns to stranger (if stranger_id exists) |
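
**Change Required**: Remove `, stranger_id = NULL` from the three UPDATE queries in `identity_binding.rs`. After this change a bound face keeps both columns set, so the mutual-exclusion design rule no longer holds and `identity_id` must take precedence when the state is read.
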
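
The two behaviours only diverge after an unbind. The sketch below is a hypothetical model of the two binding columns on one row (`FaceRow`, `bind_current`, `bind_proposed`, `unbind`, and `state` are illustrative names, not the functions in `identity_binding.rs`), and `state` assumes any non-NULL `identity_id` refers to a live identity, so the dangling case does not arise. Starting from a stranger, bind then unbind ends in `unbound` under the current behaviour and back in `stranger` under the proposed one.

```rust
/// Hypothetical in-memory view of the two binding columns on one face row.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub struct FaceRow {
    pub identity_id: Option<i64>,
    pub stranger_id: Option<i64>,
}

/// Current behaviour: SET identity_id = X, stranger_id = NULL
pub fn bind_current(face: &mut FaceRow, identity_id: i64) {
    face.identity_id = Some(identity_id);
    face.stranger_id = None; // stranger info is lost here
}

/// Proposed behaviour: SET identity_id = X (stranger_id kept)
pub fn bind_proposed(face: &mut FaceRow, identity_id: i64) {
    face.identity_id = Some(identity_id);
}

/// unbind_face is identical in both versions: SET identity_id = NULL
pub fn unbind(face: &mut FaceRow) {
    face.identity_id = None;
}

/// State name with identity taking precedence over stranger.
pub fn state(face: &FaceRow) -> &'static str {
    match (face.identity_id, face.stranger_id) {
        (Some(_), _) => "identity",
        (None, Some(_)) => "stranger",
        (None, None) => "unbound",
    }
}
```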
---
## Why Dangling Happens
Dangling occurs when `face_detections.identity_id` points to a row that no longer exists in the `identities` table.
### Root Cause
At the time of migration, `face_detections.identity_id` **had no FK constraint** to `identities.id`. This allowed:
1. `DELETE FROM identities WHERE source='auto'` succeeded without error
2. `face_detections.identity_id` values remained unchanged (pointing to deleted IDs)
3. No `ON DELETE SET NULL` action fired, because no FK existed
### Prevention
Adding the missing FK constraint prevents this:
```sql
ALTER TABLE face_detections
ADD CONSTRAINT fk_face_detections_identity
FOREIGN KEY (identity_id) REFERENCES identities(id) ON DELETE SET NULL;
```
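With this constraint, deleting an identity automatically sets `face_detections.identity_id = NULL`, so no dangling rows can appear. PostgreSQL checks existing rows when the constraint is added, so the dangling cleanup (see Migration Reference) must run first; otherwise the `ALTER TABLE` fails.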
### Current Status
After migration cleanup:
- Public schema: FK `fk_face_detections_stranger` exists (on `stranger_id`)
- Public schema: FK `fk_face_detections_identity` **does not exist** (not yet added; see Root Cause)
- Dev schema: Same state as public
---
## API Endpoint
### `GET /api/v1/file/:file_uuid/faces`
**Purpose**: Lists the face detections in a file, each with its binding state (paginated).

**Query Parameters**:

| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `page` | int | 1 | Page number |
| `page_size` | int | 50 | Items per page |
| `binding` | string | — | Filter: `identity`, `stranger`, `dangling`, `unbound` |
| `trace_id` | int | — | Filter by trace ID |
| `min_confidence` | float | — | Minimum confidence (0.0–1.0) |
| `start_frame` | int | — | Start frame (inclusive) |
| `end_frame` | int | — | End frame (inclusive) |
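
A client can assemble and sanity-check these parameters before calling the endpoint. The sketch below is hypothetical: `FacesQuery` and `faces_url` are not part of the API, the checks (allowed `binding` values, `min_confidence` within 0.0–1.0, `page` ≥ 1, `start_frame` ≤ `end_frame`) are inferred from the table above rather than taken from the server's validation, and omitted parameters are left for the server defaults to fill in.

```rust
/// Hypothetical client-side query for `GET /api/v1/file/:file_uuid/faces`.
/// `None` means "omit the parameter" and let the server default apply.
#[derive(Debug, Default)]
pub struct FacesQuery<'a> {
    pub page: Option<u32>,
    pub page_size: Option<u32>,
    pub binding: Option<&'a str>,
    pub trace_id: Option<i64>,
    pub min_confidence: Option<f64>,
    pub start_frame: Option<u64>,
    pub end_frame: Option<u64>,
}

const BINDING_FILTERS: [&str; 4] = ["identity", "stranger", "dangling", "unbound"];

/// Builds the request path, rejecting values the table marks as out of range.
/// Assumes file_uuid is a hex string like the examples; all parameter values
/// are numbers or fixed words, so nothing is percent-encoded.
pub fn faces_url(file_uuid: &str, q: &FacesQuery<'_>) -> Result<String, String> {
    if q.page == Some(0) {
        return Err("page is 1-based".to_string());
    }
    if let Some(b) = q.binding {
        if !BINDING_FILTERS.iter().any(|&f| f == b) {
            return Err(format!("unknown binding filter: {b}"));
        }
    }
    if let Some(c) = q.min_confidence {
        // also rejects NaN, which no range contains
        if !(0.0..=1.0).contains(&c) {
            return Err(format!("min_confidence out of range: {c}"));
        }
    }
    if let (Some(s), Some(e)) = (q.start_frame, q.end_frame) {
        if s > e {
            return Err(format!("start_frame {s} is after end_frame {e}"));
        }
    }

    let mut params = Vec::new();
    if let Some(v) = q.page { params.push(format!("page={v}")); }
    if let Some(v) = q.page_size { params.push(format!("page_size={v}")); }
    if let Some(v) = q.binding { params.push(format!("binding={v}")); }
    if let Some(v) = q.trace_id { params.push(format!("trace_id={v}")); }
    if let Some(v) = q.min_confidence { params.push(format!("min_confidence={v}")); }
    if let Some(v) = q.start_frame { params.push(format!("start_frame={v}")); }
    if let Some(v) = q.end_frame { params.push(format!("end_frame={v}")); }

    let mut url = format!("/api/v1/file/{file_uuid}/faces");
    if !params.is_empty() {
        url.push('?');
        url.push_str(&params.join("&"));
    }
    Ok(url)
}
```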

**Response Example** (`page_size=2`; `data` shortened to one entry):
```json
{
"success": true,
"file_uuid": "aeed71342a899fe4b4c57b7d41bcb692",
"total": 52244,
"page": 1,
"page_size": 2,
"data": [
{
"id": 661508,
"file_uuid": "aeed71342a899fe4b4c57b7d41bcb692",
"frame_number": 21297,
"timestamp_secs": 851.88,
"face_id": "21297_0",
"trace_id": 485,
"bbox": { "x": 1072, "y": 390, "width": 56, "height": 56 },
"confidence": 0.6114,
"binding": {
"identity_id": 9,
"identity_uuid": "c3545906-c82d-4b66-aa1d-150bc02decce",
"identity_name": "Audrey Hepburn"
}
}
]
}
```
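
The sample row makes the derived fields easy to check: `face_id` `"21297_0"` is frame 21297, face index 0, and 851.88 s × 25 = 21297, so this file appears to have been processed at 25 fps (an inference from this one sample, not a documented value). The helpers below are hypothetical and only restate those relationships, plus the page count implied by `total` and `page_size`, assuming ordinary ceiling-division pagination (52244 faces at `page_size=2` gives 26122 pages).

```rust
/// `face_id` in the documented `{frame}_{idx}` format; `idx` is assumed to be
/// the face's index within that frame.
pub fn face_id(frame_number: u64, idx: u32) -> String {
    format!("{frame_number}_{idx}")
}

/// timestamp_secs = frame_number / fps, as listed for `face_detections`.
pub fn timestamp_secs(frame_number: u64, fps: f64) -> f64 {
    frame_number as f64 / fps
}

/// Inverse of the above: the frame rate implied by one detection.
pub fn implied_fps(frame_number: u64, timestamp_secs: f64) -> f64 {
    frame_number as f64 / timestamp_secs
}

/// Number of pages for `total` items (ceiling division); `page_size` must be > 0.
pub fn page_count(total: u64, page_size: u64) -> u64 {
    (total + page_size - 1) / page_size
}
```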
---
## Migration Reference
### `migrate_strangers_table.sql` (Summary)
1. `CREATE TABLE strangers`
2. Insert unmatched traces → strangers
3. Preserve auto identity metadata → strangers (NULL file_uuid/trace_id)
4. Populate `face_detections.stranger_id` with the new `strangers.id` of each face's trace
5. Add FK constraint `fk_face_detections_stranger` on `stranger_id`
6. Delete legacy `identity_bindings` for auto identities
7. Delete `identities` where `source='auto'`
8. Cleanup dangling `identity_id` (set to NULL)
### Cleanup SQL (Dangling)
```sql
UPDATE face_detections fd
SET identity_id = NULL
WHERE NOT EXISTS (SELECT 1 FROM identities i WHERE i.id = fd.identity_id)
AND fd.identity_id IS NOT NULL;
```
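
The cleanup UPDATE only touches rows whose `identity_id` is non-NULL and unresolvable, so it is safe to re-run: a second pass changes nothing. A hypothetical in-memory equivalent over the `identity_id` column makes that explicit; `cleanup_dangling` and its slice-of-`Option` input are assumptions for illustration, and the returned count corresponds to the UPDATE's affected-row count (the kind of figure reported as "18641 faces cleaned" in the Migration Effect diagram).

```rust
use std::collections::HashSet;

/// Hypothetical in-memory equivalent of the cleanup UPDATE: every non-NULL
/// identity_id that is not in `live_identity_ids` is set to None.
/// Returns the number of rows changed (the UPDATE's affected-row count).
pub fn cleanup_dangling(
    identity_ids: &mut [Option<i64>],
    live_identity_ids: &HashSet<i64>,
) -> usize {
    let mut cleaned = 0;
    for slot in identity_ids.iter_mut() {
        // Mirrors: WHERE fd.identity_id IS NOT NULL AND NOT EXISTS (...)
        if let Some(id) = *slot {
            if !live_identity_ids.contains(&id) {
                *slot = None; // SET identity_id = NULL
                cleaned += 1;
            }
        }
    }
    cleaned
}
```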
---
*Updated: 2026-05-25*