docs: update TKG documentation for Identity Agent V4.0

- Add new file: 2026-06-25_identity_agent_v4.0.md (M4 workspace)
  - Complete architecture overview
  - All phases completed
  - Thresholds, components, test results

- Update: API_WORKSPACE/modules/15_tkg.md
  - Correct node type: face_trace → face_track
  - Add text_region (replaces text_trace)
  - Add Identity Agent integration section
  - Document face_track status values (pending/suggested/confirmed/stranger)
  - Add example face_track node with identity properties
Accusys
2026-06-25 02:27:34 +08:00
parent 4b4d37b332
commit 406b2d5524
2 changed files with 217 additions and 4 deletions

@@ -10,16 +10,63 @@ TKG is a time-aligned knowledge graph built from multi-processor outputs (face,
| Node Type | Description | Key Properties |
|-----------|-------------|----------------|
| `face_trace` | A tracked face identity over time | `trace_id`, `face_count`, `avg_confidence` |
| `gaze_trace` | Gaze direction over time | `direction` (frontal/left/right/up/down + diagonals) |
| `lip_trace` | Lip movement synced with speech | `speaker_id`, `lip_area_range` |
| `text_trace` | Spoken text aligned to time | `speaker_id`, `text`, `start_time`, `end_time` |
| `face_track` | A tracked face identity over time | `trace_id`, `frame_count`, `status`, `pending_identity_name`, `confidence`, `identity_uuid` (see Identity Agent section below) |
| `gaze_track` | Gaze direction over time | `direction` (frontal/left/right/up/down + diagonals) |
| `lip_track` | Lip movement synced with speech | `speaker_id`, `lip_area_range` |
| `text_region` | Spoken text aligned to time | `speaker_id`, `text`, `start_time`, `end_time` |
| `appearance_trace` | Human appearance (clothing) over time | `clothing_color`, `upper_cloth`, `lower_cloth` |
| `skin_tone_trace` | Fitzpatrick skin tone classification | `fitzpatrick_type` (I-VI) |
| `accessory` | Detected accessories | `type` (glasses/hat/etc.), `confidence` |
| `object` | YOLO-detected object | `class`, `confidence`, `frame_count` |
| `speaker` | ASRX speaker segment | `speaker_id`, `segment_count`, `total_duration` |
---
### Identity Agent Integration (face_track nodes)
The Identity Agent marks each `face_track` node with its identity binding status.
#### face_track Status Values
| Status | Description | Properties |
|--------|-------------|------------|
| `pending` | No identity suggestion | Default state |
| `suggested` | Identity Agent suggested | `pending_identity_name`, `pending_identity_uuid`, `suggested_by`, `confidence` |
| `confirmed` | User confirmed binding | `identity_uuid`, `identity_id`, `identity_ref`, `identity_name` |
| `stranger` | Stranger cluster member | `stranger_id`, `stranger_ref` |
#### Suggested By Values
| Value | Description |
|-------|-------------|
| `tmdb` | TMDb seed matched |
| `propagation` | Confirmed trace propagation |
| `manual` | User manual selection |
#### Example face_track Node
```json
{
"node_type": "face_track",
"external_id": "face_track_1",
"label": "Face Track 1",
"properties": {
"trace_id": 1,
"frame_count": 45,
"start_frame": 100,
"end_frame": 300,
"avg_bbox": {"x": 100, "y": 200, "width": 80, "height": 100},
"status": "suggested",
"pending_identity_name": "Tom Hanks",
"pending_identity_uuid": "xxx-xxx",
"suggested_by": "tmdb",
"confidence": 0.91
}
}
```
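The status table above implies which properties a `face_track` node should carry in each state. A minimal sketch of that check follows. The helper name `missing_identity_properties` is hypothetical, and treating a missing `status` as `pending` is an assumption based on the "Default state" note, not confirmed behavior.

```python
# Properties expected for each status, copied from the status table above.
REQUIRED_BY_STATUS = {
    "pending": set(),
    "suggested": {
        "pending_identity_name",
        "pending_identity_uuid",
        "suggested_by",
        "confidence",
    },
    "confirmed": {"identity_uuid", "identity_id", "identity_ref", "identity_name"},
    "stranger": {"stranger_id", "stranger_ref"},
}
SUGGESTED_BY_VALUES = {"tmdb", "propagation", "manual"}


def missing_identity_properties(node: dict) -> list:
    """Hypothetical helper: list problems with a node, empty if it looks consistent."""
    props = node.get("properties", {})
    # Assumption: a node without an explicit status is treated as pending.
    status = props.get("status", "pending")
    if status not in REQUIRED_BY_STATUS:
        return [f"unknown status: {status}"]
    problems = sorted(REQUIRED_BY_STATUS[status] - set(props))
    if "suggested_by" in props and props["suggested_by"] not in SUGGESTED_BY_VALUES:
        problems.append("suggested_by must be one of tmdb/propagation/manual")
    return problems
```

Run against the example node above, this sketch reports no problems. A `confirmed` node that only sets `identity_uuid` would be reported as missing `identity_id`, `identity_name`, and `identity_ref`.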
---
### Edge Types
| Edge Type | Source → Target | Description |

@@ -0,0 +1,166 @@
---
title: Identity Agent V4.0 Architecture
version: 1.0
date: 2026-06-25
author: OpenCode
status: Completed
---
## Goal
- Implement the Identity Agent on a single Qdrant `_faces` collection architecture
- Support multi-angle matching with propagation
- Mark TKG nodes to track identity binding status
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────┐
│ Identity Agent V4.0 │
├─────────────────────────────────────────────────────────────┤
│ │
│ Data Stores: │
│ ├─ Qdrant _faces (512D embeddings, trace-level) │
│ ├─ Qdrant _seeds (512D embeddings, identity-level) │
│ ├─ PG identities (metadata: name, tmdb_id, source) │
│ ├─ PG face_detections (trace_id, identity_id) │
│ └─ PG tkg_nodes (face_track nodes, status tracking) │
│ │
│ Flow: │
│ 1. TMDb Query → Seeds │
│ 2. Identity Agent Round 1 → suggested faces │
│ 3. User Confirm → confirmed faces + automatic Round 2 │
│ 4. Propagation → more suggestions │
│ 5. Stranger Clustering → stranger_ref │
│ │
└─────────────────────────────────────────────────────────────┘
```
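The flow in the diagram can be read as a small state machine over the `face_track` status field. The sketch below is one possible reading, not the shipped logic. The event names are hypothetical labels for the flow steps, and transitions the document does not describe (for example, rejecting a suggestion) are deliberately left out.

```python
# Allowed status changes, keyed by (current status, event).
# Event names are hypothetical; they label steps 2-5 of the flow diagram.
TRANSITIONS = {
    # Steps 2 and 4: a Round 1 or propagation match produces a suggestion.
    ("pending", "match_found"): "suggested",
    # Step 3: the user confirms the suggested identity.
    ("suggested", "user_confirmed"): "confirmed",
    # Step 5: an unmatched trace is assigned to a stranger cluster.
    ("pending", "clustered_as_stranger"): "stranger",
}


def next_status(current: str, event: str) -> str:
    """Return the status that follows `current` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(f"no transition from {current!r} on {event!r}") from None
```

Keeping the transitions in one table makes it easy to add cases later, such as a manual selection path, once their intended behavior is settled.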
## Components
### 1. Python CLI Scripts
| Script | Purpose |
|--------|---------|
| `identity_matcher.py` | Multi-angle face matching (Round 1-3) |
| `confirm_identity.py` | Confirm identity binding + auto propagation |
| `generate_seed_embeddings.py` | TMDb profiles → _seeds collection |
| `manual_seed.py` | User-selected trace → manual seed |
### 2. Qdrant Collections
| Collection | Vectors | Payload |
|------------|---------|---------|
| `_faces` | 512D, Cosine | `{file_uuid, frame, trace_id, bbox, confidence, identity_id, identity_uuid, stranger_id}` |
| `_seeds` | 512D, Cosine | `{identity_id, identity_uuid, name, source, file_uuid, trace_id, tmdb_id}` |
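As an illustration of the table above, both collections could be created with the same vector settings through Qdrant's REST endpoint `PUT /collections/{name}`. This is a sketch only: the base URL is an assumed local default, the function name is hypothetical, and the project's own `qdrant_faces.py` helpers may do this differently. The request is built but not sent.

```python
import json
import urllib.request

VECTOR_SIZE = 512  # 512D embeddings, per the table above
COLLECTIONS = ("_faces", "_seeds")


def create_collection_request(name, base_url="http://localhost:6333"):
    """Build (but do not send) a Qdrant create-collection request.

    The base URL is an assumption; adjust it to the actual deployment.
    """
    body = {"vectors": {"size": VECTOR_SIZE, "distance": "Cosine"}}
    return urllib.request.Request(
        url=f"{base_url}/collections/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    for name in COLLECTIONS:
        req = create_collection_request(name)
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it. Payload fields such as `trace_id` or `identity_uuid` need no schema declaration at creation time, since Qdrant payloads are schemaless JSON.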
### 3. TKG face_track Node Properties
```json
{
"trace_id": 2,
"frame_count": 45,
"start_frame": 100,
"end_frame": 300,
"avg_bbox": {...},
// Identity binding states
"status": "pending | suggested | confirmed | stranger",
"pending_identity_name": "Tom Hanks",
"pending_identity_uuid": "xxx-xxx",
"suggested_by": "tmdb | propagation | manual",
"confidence": 0.91,
// Confirmed fields
"identity_uuid": "xxx-xxx",
"identity_id": 1,
"identity_ref": "file_uuid:identity_1",
// Stranger fields
"stranger_id": 1,
"stranger_ref": "stranger_1"
}
```
## Matching Thresholds
| Round | Threshold | Seed Source |
|-------|-----------|-------------|
| Round 1 | 0.55 | TMDb seeds |
| Round 2 | 0.55 | Confirmed traces (propagation seeds) |
| Round 3+ | 0.50 | More confirmed traces |
| Stranger clustering | 0.40 | Unmatched traces (greedy merge) |
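To make the thresholds concrete, the sketch below applies them to cosine similarity scores. It is a simplified illustration under stated assumptions, not the code in `identity_matcher.py`: all function names are hypothetical, a score equal to the threshold is assumed to count as a match, and the greedy merge shown here (compare each trace to the first member of each existing cluster) is only one plausible reading of "greedy merge".

```python
import math

ROUND_THRESHOLDS = {1: 0.55, 2: 0.55}  # Round 1 and Round 2
LATER_ROUND_THRESHOLD = 0.50           # Round 3+
STRANGER_THRESHOLD = 0.40              # stranger clustering


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def threshold_for_round(round_no):
    return ROUND_THRESHOLDS.get(round_no, LATER_ROUND_THRESHOLD)


def best_match(trace_vec, seeds, round_no):
    """seeds maps a seed name to its vector. Return (name, score) or None."""
    scored = ((name, cosine(trace_vec, vec)) for name, vec in seeds.items())
    best = max(scored, key=lambda pair: pair[1], default=None)
    # Assumption: a score equal to the threshold counts as a match.
    if best is None or best[1] < threshold_for_round(round_no):
        return None
    return best


def greedy_stranger_clusters(traces, threshold=STRANGER_THRESHOLD):
    """traces maps trace_id to vector. Each trace joins the first cluster whose
    first member is similar enough; otherwise it starts a new cluster."""
    clusters = []
    for trace_id, vec in traces.items():
        for cluster in clusters:
            if cosine(vec, traces[cluster[0]]) >= threshold:
                cluster.append(trace_id)
                break
        else:
            clusters.append([trace_id])
    return clusters
```

With these definitions, a trace scoring about 0.53 against its best seed is rejected in Rounds 1 and 2 but accepted in Round 3+, which is the effect of lowering the threshold for later rounds.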
## Progress
### Done
- **Phase 1**: `_seeds` collection + helper functions (`qdrant_faces.py`)
- **Phase 2**: Multi-angle matching (`identity_matcher.py`)
- **Phase 3**: TKG node marking (`tkg_helper.py`)
- **Phase 4**: Confirm API + auto propagation (`confirm_identity.py`)
- **Phase 5**: Propagation Round 2-3 logic
- **Phase 6**: Stranger clustering (greedy merge)
- **Phase 7**: TMDb seed generation (`generate_seed_embeddings.py`)
- **Phase 8**: Manual seed creation (`manual_seed.py`)
- **Phase 9**: CoreML embedding extraction
- **Phase 10**: End-to-end testing
### Deprecated (Removed in V4.0)
- `FaceEmbeddingDb` module
- `person_id` concept
- `person_identities` table
- `sync_trace_embeddings` function
- PG `face_detections.embedding` column
- Qdrant `{schema}_face_embeddings` collection
## Key Decisions
- **Single `_faces` collection**: No schema prefix, fixed name for dev/prod
- **Trace-level binding**: All identity operations are trace operations
- **Multi-angle matching**: 3 representatives (start, middle, end) per trace
- **Propagation seeds**: Confirmed traces become seeds (source='propagation')
- **TKG status tracking**: `pending → suggested → confirmed`, with `stranger` for unmatched traces assigned to a stranger cluster
- **Auto propagation**: Round 2 triggers automatically after confirmation
- **CoreML FaceNet**: 512D embeddings extracted during face processing
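The multi-angle decision above (three representatives per trace: start, middle, end) can be sketched as follows. The function names are hypothetical. Taking the maximum of the per-representative scores is an assumption made for illustration; the document does not state how the three scores are combined.

```python
def representative_indices(frame_count: int) -> list:
    """Indices of the start, middle, and end detections of a trace.

    Short traces yield fewer than three indices because duplicates are dropped.
    """
    if frame_count <= 0:
        return []
    picks = {0, frame_count // 2, frame_count - 1}
    return sorted(picks)


def trace_score(rep_scores) -> float:
    """Assumption: a trace scores as well as its best representative."""
    return max(rep_scores, default=0.0)
```

For a 45-frame trace this selects detections 0, 22, and 44. Sampling across the trace gives the matcher a chance to see the face at different angles rather than relying on a single frame.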
## Test Results
| Test | Result |
|------|--------|
| Round 1 matching | ✅ 3/4 traces matched (score ~0.91) |
| TKG marking | ✅ status='suggested', confidence recorded |
| Confirm flow | ✅ TKG + Qdrant + PG + propagation seed |
| Stranger clustering | ✅ Greedy merge (TH=0.40) |
| TMDb seed generation | ✅ 3 seeds (Cary Grant, Audrey Hepburn, Walter Matthau) |
| End-to-end test | ✅ All phases passed |
## Commits
| Commit | Description |
|--------|-------------|
| `074cdcdb` | Remove face embedding architecture |
| `9fbb4f9b` | Add Qdrant `_faces` embedding push |
| `580c4b40` | Add `_seeds` collection helper |
| `6851cb47` | Add identity_matcher.py |
| `21b9f500` | Add TKG node marking |
| `4198a740` | Add confirm_identity.py |
| `b5e3adf5` | Add generate_seed_embeddings.py |
| `d20819b0` | Add manual_seed.py |
| `b19b1a8c` | Fix count_seeds empty body |
| `4b4d37b3` | Fix qdrant_request empty body |
## Relevant Files
| Category | Files |
|----------|-------|
| **Python scripts** | `scripts/identity_matcher.py`, `scripts/confirm_identity.py`, `scripts/generate_seed_embeddings.py`, `scripts/manual_seed.py` |
| **Python utils** | `scripts/utils/qdrant_faces.py`, `scripts/utils/tkg_helper.py` |
| **Processor scripts** | `scripts/face_processor.py` (Qdrant push), `scripts/store_traced_faces.py` (trace_id update) |
| **Rust stubs** | `src/api/identity_agent_api.rs`, `src/api/tmdb_api.rs` (awaiting Rust integration) |
| **TKG builder** | `src/core/processor/tkg.rs` |
## Next Steps
- **Rust API integration**: Call Python scripts from stubbed Rust functions
- **Production deployment**: Generate TMDb seeds for all identities
- **Workflow documentation**: User guide for Identity Agent CLI usage