feat: implement skin_tone_trace node builder and standardize TKG node naming

- Add build_skin_tone_trace_nodes() to tkg.rs (Fitzpatrick I-VI classification)
- Add skin_tone_trace_nodes field to TkgResult
- Standardize node naming: _trace -> _track (text uses _region)
- Add external_id format column to Node Types table
- Add storage names to Edge Types table
- Create TKG_FORMATION_V1.0.md with Phase 0-4 definition, flow diagram, queries
- Add cross-reference from identity_agent_v4.0.md to TKG Formation
- Update Python scripts to executable mode
Accusys
2026-06-25 03:09:16 +08:00
parent 406b2d5524
commit 4273576612
8 changed files with 680 additions and 72 deletions


@@ -0,0 +1,499 @@
---
title: TKG Formation V1.0
version: 1.0
date: 2026-06-25
author: OpenCode
status: draft
---
## Overview
The Temporal Knowledge Graph (TKG) is a time-aligned knowledge graph built from the outputs of multiple processors (`face.json`, `pose.json`, `asrx.json`, `yolo.json`). This document defines the formation phases, node and edge types, data flow, and integration with the Identity Agent.
---
## Phase Definition
| Phase | Name | Trigger | Input | Output | Code Location |
|-------|------|---------|-------|--------|---------------|
| **Phase 0** | Populate | TKG rebuild | `face.json` | `PG face_detections.trace_id` | `tkg.rs:20-100` |
| **Phase 1** | Extract | Video register | Video frames | `Qdrant _faces` (512D embeddings) | `face_processor.py` |
| **Phase 2** | Build Nodes | TKG rebuild | `face.json`, PG tables | `tkg_nodes` (9 types) | `tkg.rs:506-515` |
| **Phase 3** | Build Edges | TKG rebuild | `tkg_nodes`, pose data | `tkg_edges` (8 types) | `tkg.rs:517-524` |
| **Phase 4** | Identity | Manual/API call | `Qdrant _faces`, `_seeds` | `tkg_nodes.status` updated | `identity_matcher.py` |
### Phase Details
#### Phase 0: Populate
**Purpose:** Assign `trace_id` to face detections.
**Flow:**
1. Check whether `face_detections` already contains rows with `trace_id IS NOT NULL`
2. If not, call `store_traced_faces.py`
3. `store_traced_faces.py` runs `face_tracker.py` (IoU-only)
4. Update `face_detections.trace_id`
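**Dependency:** `face.json` must already exist. It is written by Phase 1, which runs at video registration and therefore precedes Phase 0 despite the numbering.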
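
The IoU-only tracking step can be illustrated with a minimal sketch. This is not the code in `face_tracker.py`; the `BBox` type, the greedy assignment strategy, and the `min_iou` parameter are assumptions made for illustration, and traces never expire in this simplified version.

```rust
/// Axis-aligned face box in pixels; (x, y) is the top-left corner.
/// Hypothetical type, not taken from the project.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct BBox {
    pub x: f64,
    pub y: f64,
    pub w: f64,
    pub h: f64,
}

/// Intersection-over-union of two boxes; 0.0 when they do not overlap.
pub fn iou(a: BBox, b: BBox) -> f64 {
    let x1 = a.x.max(b.x);
    let y1 = a.y.max(b.y);
    let x2 = (a.x + a.w).min(b.x + b.w);
    let y2 = (a.y + a.h).min(b.y + b.h);
    let inter = (x2 - x1).max(0.0) * (y2 - y1).max(0.0);
    let union = a.w * a.h + b.w * b.h - inter;
    if union > 0.0 { inter / union } else { 0.0 }
}

/// Greedy IoU-only assignment: each detection joins the existing trace whose
/// last box overlaps it most (if IoU >= min_iou); otherwise it opens a new trace.
/// `frames` holds detections per consecutive frame; the result has the same
/// shape and contains one trace_id per detection.
pub fn assign_trace_ids(frames: &[Vec<BBox>], min_iou: f64) -> Vec<Vec<i64>> {
    let mut last_box: Vec<BBox> = Vec::new(); // index = trace_id
    let mut out = Vec::with_capacity(frames.len());
    for dets in frames {
        let known = last_box.len(); // traces that existed before this frame
        let mut taken = vec![false; known];
        let mut ids = Vec::with_capacity(dets.len());
        for &det in dets {
            let mut best: Option<(usize, f64)> = None;
            for tid in 0..known {
                if taken[tid] {
                    continue; // one detection per trace per frame
                }
                let score = iou(last_box[tid], det);
                if score >= min_iou && best.map_or(true, |(_, s)| score > s) {
                    best = Some((tid, score));
                }
            }
            let tid = match best {
                Some((tid, _)) => {
                    taken[tid] = true;
                    last_box[tid] = det;
                    tid
                }
                None => {
                    last_box.push(det);
                    last_box.len() - 1
                }
            };
            ids.push(tid as i64);
        }
        out.push(ids);
    }
    out
}
```

A production tracker would also need to retire stale traces and handle missed detections; the sketch only shows how overlap alone can carry a `trace_id` from frame to frame.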
---
#### Phase 1: Extract
**Purpose:** Generate face embeddings and push to Qdrant.
**Flow:**
1. `face_processor.py` runs Vision detection (ANE)
2. Crop faces from video frames
3. CoreML FaceNet → 512D embedding
4. Push to Qdrant `_faces` collection
5. Write `face.json` (metadata only, no embedding)
**Output:**
- `face.json` in output directory
- Qdrant `_faces` collection with embeddings
---
#### Phase 2: Build Nodes
**Purpose:** Create TKG nodes from processor outputs.
**Node Builders:**
| Builder | Node Type | Data Source |
|---------|-----------|-------------|
| `build_face_track_nodes` | `face_track` | `face.json`, `face_detections` |
| `build_gaze_track_nodes` | `gaze_track` | `face.json` (pose data) |
| `build_lip_track_nodes` | `lip_track` | `face.json` (lips), `asrx.json` |
| `build_text_region_nodes` | `text_region` | `asrx.json` |
| `build_appearance_trace_nodes` | `appearance_trace` | `yolo.json` (person) |
| `build_accessory_nodes` | `accessory` | `yolo.json` |
| `build_yolo_object_nodes` | `object` | `yolo.json` |
| `build_hand_nodes` | `hand` | `pose.json` |
| `build_speaker_nodes` | `speaker` | `asrx.json` |
| `build_skin_tone_trace_nodes` | `skin_tone_trace` | `face_detections` (`trace_id`), `face.json` (placeholder `skin_h`) |
---
#### Phase 3: Build Edges
**Purpose:** Create TKG edges from node relationships.
**Edge Builders:**
| Builder | Edge Type | Source → Target |
|---------|-----------|-----------------|
| `build_co_occurrence_edges` | `co_occurs` | `object ↔ object` |
| `build_speaker_face_edges` | `speaker_face` | `speaker → face_track` |
| `build_face_face_edges` | `face_face` | `face_track ↔ face_track` |
| `build_mutual_gaze_edges` | `mutual_gaze` | `gaze_track ↔ gaze_track` |
| `build_lip_sync_edges` | `lip_sync` | `lip_track → text_region` |
| `build_has_appearance_edges` | `has_appearance` | `face_track → appearance_trace` |
| `build_wears_edges` | `wears` | `face_track → accessory` |
| `build_hand_object_edges` | `hand_object` | `hand → object` |
---
#### Phase 4: Identity
**Purpose:** Mark face_track nodes with identity binding status.
**Flow:**
1. Identity Agent queries `_seeds` collection (TMDb/manual/propagation)
2. Queries `_faces` collection for trace representatives
3. Multi-angle matching (3 reps per trace)
4. Mark TKG nodes: `status='suggested'`, `confidence`, `pending_identity_name`
5. User confirms → update TKG, Qdrant, PG
6. Confirmed trace becomes propagation seed in `_seeds`
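
Steps 1 to 4 can be sketched as a best-score search over representatives and seeds. This is a hedged illustration, not the logic in `identity_matcher.py`: the `Seed` and `Suggestion` types, the use of cosine similarity, and taking the maximum over all representative/seed pairs are assumptions.

```rust
/// Cosine similarity; returns 0.0 for mismatched lengths or zero vectors.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Hypothetical seed record (one embedding from the `_seeds` collection).
pub struct Seed {
    pub identity_name: String,
    pub embedding: Vec<f32>,
}

/// Hypothetical result mapped onto `pending_identity_name` and `confidence`.
#[derive(Debug, Clone, PartialEq)]
pub struct Suggestion {
    pub identity_name: String,
    pub confidence: f32,
}

/// Compare every trace representative with every seed, keep the best pair,
/// and return a suggestion only if its score reaches `threshold`.
pub fn suggest_identity(
    reps: &[Vec<f32>],
    seeds: &[Seed],
    threshold: f32,
) -> Option<Suggestion> {
    let mut best: Option<(&Seed, f32)> = None;
    for seed in seeds {
        for rep in reps {
            let score = cosine(rep, &seed.embedding);
            if best.map_or(true, |(_, s)| score > s) {
                best = Some((seed, score));
            }
        }
    }
    best.filter(|&(_, s)| s >= threshold).map(|(seed, s)| Suggestion {
        identity_name: seed.identity_name.clone(),
        confidence: s,
    })
}
```

Taking the maximum lets a single well-aligned representative carry the match; averaging over the three representatives would be a stricter alternative.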
---
## Node Types (Naming Standardized)
**Naming Rule:**
- Face, gaze, and lip traces use the `_track` suffix
- Text uses `_region` (non-temporal)
- `appearance_trace` and `skin_tone_trace` keep the `_trace` suffix
| Node Type | External ID Format | Key Properties |
|-----------|---------------------|----------------|
| `face_track` | `face_track_{trace_id}` | `trace_id`, `frame_count`, `start_frame`, `end_frame`, `avg_bbox`, `status`, `confidence`, `identity_uuid` |
| `gaze_track` | `gaze_track_{id}` | `direction` (frontal/left/right/up/down + diagonals) |
| `lip_track` | `lip_track_{id}` | `speaker_id`, `lip_area_range` |
| `text_region` | `text_region_{id}` | `speaker_id`, `text`, `start_time`, `end_time` |
| `appearance_trace` | `appearance_{trace_id}` | `clothing_color`, `upper_cloth`, `lower_cloth` |
| `skin_tone_trace` | `skin_tone_{trace_id}` | `trace_id`, `avg_skin_h`, `fitzpatrick_type` (I-VI) |
| `accessory` | `accessory_{id}` | `type` (glasses/hat/etc.), `confidence` |
| `object` | `object_{class}_{id}` | `class`, `confidence`, `frame_count` |
| `speaker` | `speaker_{speaker_id}` | `speaker_id`, `segment_count`, `total_duration` |
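
The external ID formats in the table can be captured in one place. The sketch below is illustrative only: the `NodeRef` enum and its field types (`i64` IDs, string speaker IDs) are assumptions, not types from `tkg.rs`.

```rust
/// Hypothetical reference to a node, carrying just what its external_id needs.
pub enum NodeRef<'a> {
    FaceTrack { trace_id: i64 },
    GazeTrack { id: i64 },
    LipTrack { id: i64 },
    TextRegion { id: i64 },
    AppearanceTrace { trace_id: i64 },
    SkinToneTrace { trace_id: i64 },
    Accessory { id: i64 },
    Object { class: &'a str, id: i64 },
    Speaker { speaker_id: &'a str },
}

impl NodeRef<'_> {
    /// Value stored in the `node_type` column.
    pub fn node_type(&self) -> &'static str {
        match self {
            NodeRef::FaceTrack { .. } => "face_track",
            NodeRef::GazeTrack { .. } => "gaze_track",
            NodeRef::LipTrack { .. } => "lip_track",
            NodeRef::TextRegion { .. } => "text_region",
            NodeRef::AppearanceTrace { .. } => "appearance_trace",
            NodeRef::SkinToneTrace { .. } => "skin_tone_trace",
            NodeRef::Accessory { .. } => "accessory",
            NodeRef::Object { .. } => "object",
            NodeRef::Speaker { .. } => "speaker",
        }
    }

    /// External ID following the "External ID Format" column above.
    /// Note that the two `_trace` types drop the suffix in their IDs.
    pub fn external_id(&self) -> String {
        match self {
            NodeRef::FaceTrack { trace_id } => format!("face_track_{}", trace_id),
            NodeRef::GazeTrack { id } => format!("gaze_track_{}", id),
            NodeRef::LipTrack { id } => format!("lip_track_{}", id),
            NodeRef::TextRegion { id } => format!("text_region_{}", id),
            NodeRef::AppearanceTrace { trace_id } => format!("appearance_{}", trace_id),
            NodeRef::SkinToneTrace { trace_id } => format!("skin_tone_{}", trace_id),
            NodeRef::Accessory { id } => format!("accessory_{}", id),
            NodeRef::Object { class, id } => format!("object_{}_{}", class, id),
            NodeRef::Speaker { speaker_id } => format!("speaker_{}", speaker_id),
        }
    }
}
```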
### face_track Identity Properties
| Property | Type | Values |
|----------|------|--------|
| `status` | string | `pending` | `suggested` | `confirmed` | `stranger` |
| `pending_identity_name` | string/null | Suggested identity name |
| `pending_identity_uuid` | string/null | Suggested identity UUID |
| `suggested_by` | string/null | `tmdb` | `propagation` | `manual` |
| `confidence` | float/null | Matching score (0.0-1.0) |
| `identity_uuid` | string/null | Confirmed identity UUID |
| `identity_id` | integer/null | Confirmed PG identity.id |
| `identity_ref` | string/null | Reference string (e.g., `file_uuid:identity_1`) |
| `stranger_id` | integer/null | Stranger cluster ID |
| `stranger_ref` | string/null | Reference string (e.g., `stranger_1`) |
---
## Edge Types
| Edge Type | Storage Name | Source → Target | Properties |
|-----------|--------------|-----------------|------------|
| `co_occurs` | `CO_OCCURS_WITH` | `object ↔ object` | `frame_count`, `confidence` |
| `speaker_face` | `SPEAKS_AS` | `speaker → face_track` | `overlap_frames`, `confidence` |
| `face_face` | `INTERACTS_WITH` | `face_track ↔ face_track` | `co_occurrence_frames` |
| `mutual_gaze` | `MUTUAL_GAZE` | `gaze_track ↔ gaze_track` | `frame_count` |
| `lip_sync` | `LIP_SYNC` | `lip_track → text_region` | `speaker_id` |
| `has_appearance` | `HAS_APPEARANCE` | `face_track → appearance_trace` | `frame_count` |
| `wears` | `WEARS` | `face_track → accessory` | `confidence` |
| `hand_object` | `HOLDS` | `hand → object` | `frame_count`, `confidence` |
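
Because queries filter on the storage name rather than the logical edge type, a lookup helper keeps the two in sync. The functions below are a hypothetical sketch that mirrors the table; they are not part of `tkg.rs`.

```rust
/// Map a logical edge type to the name stored in `tkg_edges.edge_type`.
/// Returns None for names not listed in the Edge Types table.
pub fn storage_name(edge_type: &str) -> Option<&'static str> {
    match edge_type {
        "co_occurs" => Some("CO_OCCURS_WITH"),
        "speaker_face" => Some("SPEAKS_AS"),
        "face_face" => Some("INTERACTS_WITH"),
        "mutual_gaze" => Some("MUTUAL_GAZE"),
        "lip_sync" => Some("LIP_SYNC"),
        "has_appearance" => Some("HAS_APPEARANCE"),
        "wears" => Some("WEARS"),
        "hand_object" => Some("HOLDS"),
        _ => None,
    }
}

/// True for the edges the table marks as bidirectional (↔).
pub fn is_symmetric(edge_type: &str) -> bool {
    matches!(edge_type, "co_occurs" | "face_face" | "mutual_gaze")
}
```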
---
## Data Flow Diagram
```mermaid
graph TB
subgraph Phase0[Phase 0: Populate]
A[face.json] --> B[store_traced_faces.py]
B --> C[PG face_detections.trace_id]
end
subgraph Phase1[Phase 1: Extract]
D[Video Frames] --> E[face_processor.py]
E --> F[face.json metadata]
E --> G[Qdrant _faces 512D]
end
subgraph Phase2[Phase 2: Build Nodes]
C --> H[TKG Builder]
F --> H
I[pose.json] --> H
J[asrx.json] --> H
K[yolo.json] --> H
H --> L[tkg_nodes<br/>9 node types]
end
subgraph Phase3[Phase 3: Build Edges]
L --> M[TKG Builder]
M --> N[tkg_edges<br/>8 edge types]
end
subgraph Phase4[Phase 4: Identity]
G --> O[identity_matcher.py]
L --> O
P[Qdrant _seeds] --> O
O --> Q[mark_tkg_suggested]
Q --> L
O --> R[confirm_identity.py]
R --> L
R --> G
R --> P
end
style Phase0 fill:#e1f5fe
style Phase1 fill:#fff9c4
style Phase2 fill:#e8f5e9
style Phase3 fill:#f3e5f5
style Phase4 fill:#fce4ec
```
---
## skin_tone_trace Implementation
### Status: ✅ Implemented (2026-06-25)
See `src/core/processor/tkg.rs` lines 2579-2627 for implementation.
### Overview
Skin tone classification using Fitzpatrick scale from face skin color analysis.
### Fitzpatrick Classification
| Type | Skin H Range (HSV) | Description | Example |
|------|-------------------|-------------|---------|
| I | H < 10° | Very fair/pale | Northern European |
| II | 10° ≤ H < 20° | Fair | European |
| III | 20° ≤ H < 30° | Medium | Mediterranean |
| IV | 30° ≤ H < 40° | Olive | Asian, Hispanic |
| V | 40° ≤ H < 50° | Brown | Indian, African |
| VI | H ≥ 50° | Dark brown | African |
### Implementation
**Rust Code Location:** `src/core/processor/tkg.rs`
```rust
// Add to build_tkg()
let n_skin = build_skin_tone_trace_nodes(pool, file_uuid).await?;

// Add to TkgResult
pub skin_tone_trace_nodes: usize,

// New builder function
async fn build_skin_tone_trace_nodes(
    pool: &PgPool,
    file_uuid: &str,
) -> Result<usize> {
    let fd_table = t("face_detections");

    // Step 1: Get avg skin H per trace
    let rows: Vec<(i64, f64)> = sqlx::query_as(&format!(
        "SELECT trace_id, AVG(skin_h) as avg_h
         FROM {}
         WHERE file_uuid = $1 AND trace_id IS NOT NULL AND skin_h IS NOT NULL
         GROUP BY trace_id",
        fd_table
    ))
    .bind(file_uuid)
    .fetch_all(pool)
    .await?;

    // Step 2: Classify Fitzpatrick and upsert one node per trace
    let nodes_table = t("tkg_nodes");
    let mut count = 0;
    for (trace_id, avg_h) in &rows {
        let fitz_type = classify_fitzpatrick(*avg_h);
        let external_id = format!("skin_tone_{}", trace_id);
        let label = format!("Skin Tone Trace {}", trace_id);
        sqlx::query(&format!(
            "INSERT INTO {} (node_type, external_id, file_uuid, label, properties)
             VALUES ('skin_tone_trace', $1, $2, $3, $4::jsonb)
             ON CONFLICT (file_uuid, node_type, external_id) DO UPDATE SET properties = EXCLUDED.properties",
            nodes_table
        ))
        .bind(&external_id)
        .bind(file_uuid)
        .bind(&label)
        .bind(serde_json::json!({
            "trace_id": trace_id,
            "avg_skin_h": avg_h,
            "fitzpatrick_type": fitz_type,
        }))
        .execute(pool)
        .await?;
        count += 1;
    }
    Ok(count)
}

// Map an average hue (degrees) to a Fitzpatrick type using the table above
fn classify_fitzpatrick(h: f64) -> &'static str {
    if h < 10.0 { "I" }
    else if h < 20.0 { "II" }
    else if h < 30.0 { "III" }
    else if h < 40.0 { "IV" }
    else if h < 50.0 { "V" }
    else { "VI" }
}
```
### Dependencies
- `face_detections.trace_id` must be populated (from Phase 0)
- `face.json` is used for `skin_h` estimation (placeholder based on face attributes)
- The `output_dir` path must be passed to the builder
### Limitations
- The current `skin_h` estimate is a placeholder derived from face attributes
- Accurate Fitzpatrick classification requires color extraction from the face ROI
- `face.json` does not store `skin_h` directly, so video frame analysis would be needed
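
One possible direction for the missing ROI color extraction is to convert face-region pixels to HSV hue and average them. The sketch below is an assumption-laden illustration, not planned code: the function names are hypothetical, pixels are assumed to be 8-bit RGB triples already cropped to the face ROI, and a circular mean is used so hues near 0° and 360° do not cancel out.

```rust
/// HSV hue in degrees for an 8-bit RGB pixel; None for gray pixels,
/// where hue is undefined.
pub fn rgb_to_hue(r: u8, g: u8, b: u8) -> Option<f64> {
    let (r, g, b) = (r as f64 / 255.0, g as f64 / 255.0, b as f64 / 255.0);
    let max = r.max(g).max(b);
    let min = r.min(g).min(b);
    let delta = max - min;
    if delta == 0.0 {
        return None;
    }
    let sextant = if max == r {
        ((g - b) / delta).rem_euclid(6.0)
    } else if max == g {
        (b - r) / delta + 2.0
    } else {
        (r - g) / delta + 4.0
    };
    Some(60.0 * sextant)
}

/// Circular mean of angles in degrees; None for an empty slice.
pub fn circular_mean_deg(hues: &[f64]) -> Option<f64> {
    if hues.is_empty() {
        return None;
    }
    let (sin_sum, cos_sum) = hues.iter().fold((0.0_f64, 0.0_f64), |(s, c), h| {
        let rad = h.to_radians();
        (s + rad.sin(), c + rad.cos())
    });
    Some(sin_sum.atan2(cos_sum).to_degrees().rem_euclid(360.0))
}

/// Mean hue over the non-gray pixels of a face ROI.
pub fn mean_skin_hue(pixels: &[(u8, u8, u8)]) -> Option<f64> {
    let hues: Vec<f64> = pixels
        .iter()
        .filter_map(|&(r, g, b)| rgb_to_hue(r, g, b))
        .collect();
    circular_mean_deg(&hues)
}
```

The resulting value could be stored per detection and then averaged per trace before classification. Hue alone ignores lighting and white balance, so any thresholds would need validation on real footage.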
---
## SQL Query Examples
### Status Queries
```sql
-- Get all face_track nodes for a file
SELECT id, external_id, label, properties
FROM dev.tkg_nodes
WHERE node_type = 'face_track' AND file_uuid = 'xxx'
ORDER BY external_id;
-- Get pending faces (no identity suggestion)
SELECT id, external_id, properties->>'trace_id' as trace_id
FROM dev.tkg_nodes
WHERE node_type = 'face_track'
AND file_uuid = 'xxx'
AND (properties->>'status' IS NULL OR properties->>'status' = 'pending');
-- Get suggested faces (Identity Agent suggested)
SELECT id, external_id,
properties->>'pending_identity_name' as name,
properties->>'confidence' as confidence,
properties->>'suggested_by' as source
FROM dev.tkg_nodes
WHERE node_type = 'face_track'
AND file_uuid = 'xxx'
AND properties->>'status' = 'suggested'
ORDER BY (properties->>'confidence')::float DESC;
-- Get confirmed faces
SELECT id, external_id,
properties->>'identity_uuid' as identity_uuid,
properties->>'identity_name' as name
FROM dev.tkg_nodes
WHERE node_type = 'face_track'
AND file_uuid = 'xxx'
AND properties->>'status' = 'confirmed';
-- Get stranger cluster members
SELECT id, external_id, properties->>'stranger_id' as cluster
FROM dev.tkg_nodes
WHERE node_type = 'face_track'
AND file_uuid = 'xxx'
AND properties->>'status' = 'stranger'
ORDER BY (properties->>'stranger_id')::int;
```
### Identity Queries
```sql
-- Find all traces bound to an identity
SELECT id, external_id, properties->>'trace_id' as trace_id
FROM dev.tkg_nodes
WHERE node_type = 'face_track'
AND properties->>'identity_uuid' = 'xxx-xxx';
-- Count identities per file
SELECT properties->>'identity_uuid' as identity_uuid,
COUNT(*) as trace_count
FROM dev.tkg_nodes
WHERE node_type = 'face_track'
AND file_uuid = 'xxx'
AND properties->>'status' = 'confirmed'
GROUP BY properties->>'identity_uuid';
```
### Statistics Queries
```sql
-- Status distribution for a file
SELECT properties->>'status' as status, COUNT(*) as count
FROM dev.tkg_nodes
WHERE node_type = 'face_track' AND file_uuid = 'xxx'
GROUP BY properties->>'status';
-- Confidence distribution
SELECT
CASE
WHEN (properties->>'confidence')::float >= 0.9 THEN 'high'
WHEN (properties->>'confidence')::float >= 0.7 THEN 'medium'
ELSE 'low'
END as confidence_level,
COUNT(*) as count
FROM dev.tkg_nodes
WHERE node_type = 'face_track'
AND file_uuid = 'xxx'
AND properties->>'status' = 'suggested'
GROUP BY confidence_level;
-- Top suggested identities
SELECT properties->>'pending_identity_name' as name,
COUNT(*) as trace_count,
AVG((properties->>'confidence')::float) as avg_confidence
FROM dev.tkg_nodes
WHERE node_type = 'face_track'
AND properties->>'status' = 'suggested'
GROUP BY properties->>'pending_identity_name'
ORDER BY trace_count DESC
LIMIT 10;
```
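
When the same bucketing is needed on the application side, the `CASE` expression in the confidence distribution query can be mirrored in Rust. The function names below are hypothetical; the 0.9 and 0.7 cut-offs come from the query above.

```rust
/// Bucket a matching score the same way as the SQL CASE expression:
/// >= 0.9 is "high", >= 0.7 is "medium", everything else is "low".
pub fn confidence_level(confidence: f64) -> &'static str {
    if confidence >= 0.9 {
        "high"
    } else if confidence >= 0.7 {
        "medium"
    } else {
        "low"
    }
}

/// Count scores per bucket, returned as [high, medium, low].
pub fn confidence_histogram(scores: &[f64]) -> [usize; 3] {
    let mut counts = [0usize; 3];
    for &score in scores {
        let idx = match confidence_level(score) {
            "high" => 0,
            "medium" => 1,
            _ => 2,
        };
        counts[idx] += 1;
    }
    counts
}
```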
### Cross-node Queries
```sql
-- Speaker ↔ face_track edges
SELECT
s.external_id as speaker,
f.external_id as face_track,
e.properties->>'overlap_frames' as overlap
FROM dev.tkg_edges e
JOIN dev.tkg_nodes s ON e.source_node_id = s.id
JOIN dev.tkg_nodes f ON e.target_node_id = f.id
WHERE e.file_uuid = 'xxx'
AND e.edge_type = 'SPEAKS_AS';
-- Objects co-occurrence
SELECT
o1.external_id as obj1,
o2.external_id as obj2,
e.properties->>'frame_count' as co_frames
FROM dev.tkg_edges e
JOIN dev.tkg_nodes o1 ON e.source_node_id = o1.id
JOIN dev.tkg_nodes o2 ON e.target_node_id = o2.id
WHERE e.file_uuid = 'xxx'
AND e.edge_type = 'CO_OCCURS_WITH'
ORDER BY (e.properties->>'frame_count')::int DESC;
```
---
## Integration with Identity Agent
### Identity Agent Flow
```
Identity Agent Pipeline:
├─ Round 1 (TH=0.55):
│    Query _seeds (source='tmdb')
│    Query _faces (file_uuid) → get trace representatives
│    Multi-angle match → suggestions
│    Mark TKG: status='suggested', confidence
├─ User Confirmation:
│    Update TKG: status='confirmed'
│    Update _faces: identity_uuid for all points
│    Update PG face_detections: identity_id
│    Add _seeds: source='propagation'
├─ Round 2 (TH=0.55):
│    Use confirmed traces as seeds
│    Match remaining pending traces
└─ Round 3+ (TH=0.50):
     Continue propagation
     Stranger clustering (TH=0.40)
```
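
The per-round thresholds in the pipeline above can be kept in a single helper so matching code does not hard-code them. This is a sketch with hypothetical names; only the numeric values (0.55, 0.50, 0.40) come from the pipeline description.

```rust
/// Similarity threshold used when clustering unmatched faces into strangers.
pub const STRANGER_CLUSTER_THRESHOLD: f32 = 0.40;

/// Matching threshold for a 1-based propagation round:
/// rounds 1 and 2 use 0.55, round 3 and later use 0.50.
/// Round 0 is not defined and yields None.
pub fn round_threshold(round: u32) -> Option<f32> {
    match round {
        0 => None,
        1 | 2 => Some(0.55),
        _ => Some(0.50),
    }
}

/// True when `score` is high enough to produce a suggestion in `round`.
pub fn passes_round(score: f32, round: u32) -> bool {
    round_threshold(round).map_or(false, |th| score >= th)
}
```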
### TKG Node Status Transitions
```
pending → suggested → confirmed → (final)
   ↘ stranger → (final)
```
### Status Transition Triggers
| Transition | Trigger | Action |
|------------|---------|--------|
| `pending → suggested` | Identity Agent Round 1-3 | `mark_face_track_suggested()` |
| `suggested → confirmed` | User confirmation API | `mark_face_track_confirmed()` |
| `pending → stranger` | Stranger clustering | `mark_face_track_stranger()` |
| `confirmed → pending` | Undo binding | `clear_face_track_status()` |
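
The transition table can be expressed as a small state machine that rejects any move not listed. This is a minimal sketch: the `Status` and `Event` enums and the `transition` function are hypothetical and stand in for the `mark_face_track_*` and `clear_face_track_status()` calls.

```rust
/// Values of the `status` property on face_track nodes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Status {
    Pending,
    Suggested,
    Confirmed,
    Stranger,
}

/// Hypothetical events, one per row of the trigger table.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Event {
    Suggest,      // Identity Agent round produced a match
    Confirm,      // user confirmation API
    MarkStranger, // stranger clustering
    Undo,         // undo binding
}

impl Status {
    /// String stored in `properties->>'status'`.
    pub fn as_str(self) -> &'static str {
        match self {
            Status::Pending => "pending",
            Status::Suggested => "suggested",
            Status::Confirmed => "confirmed",
            Status::Stranger => "stranger",
        }
    }
}

/// Next status for a listed transition; None if the table does not allow it.
pub fn transition(from: Status, event: Event) -> Option<Status> {
    use Event::*;
    use Status::*;
    match (from, event) {
        (Pending, Suggest) => Some(Suggested),
        (Suggested, Confirm) => Some(Confirmed),
        (Pending, MarkStranger) => Some(Stranger),
        (Confirmed, Undo) => Some(Pending),
        _ => None,
    }
}
```

Returning `None` for unlisted pairs makes gaps in the table explicit; for example, a `suggested` node cannot be marked as a stranger without first being reset.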
---
## Version History
| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2026-06-25 | Initial version with Phase 0-4 definition, node naming, flow diagram, query examples |