feat: Phase 2.5 gaze_trace and lip_trace Qdrant migration + Charade Q&A test

Phase 2.5.1: gaze_trace_nodes from Qdrant
- build_gaze_trace_nodes_from_qdrant()
- Read trace_id, frame, bbox from Qdrant payload
- Compute gaze stats (yaw, pitch, roll, gaze direction, blink)
- No PostgreSQL face_detections dependency

Phase 2.5.2: lip_trace_nodes from Qdrant + face.json
- build_lip_trace_nodes_from_qdrant()
- Match trace_id using Qdrant embeddings + face.json bbox
- Compute lip stats (openness, variance, speaking frames)
- Fixed face.json bbox structure (x,y,width,height not bbox object)

Test results:
- 23 gaze_trace nodes from Qdrant
- 23 lip_trace nodes from Qdrant + face.json
- 51 lip_sync edges created
- Charade Q&A: 20 identities, 75 relationship chunks

Docs:
- TKG_PHASE2_NONFACE_MIGRATION_V1.0.md (migration plan)
- 2026-06-21_charade_qa_test.md (Q&A test report)
Accusys
2026-06-21 02:17:08 +08:00
parent 23c440104b
commit c39805bb8e
5 changed files with 802 additions and 16 deletions


@@ -0,0 +1,186 @@
---
title: TKG Phase 2-4 Migration Plan (Non-Face Nodes)
version: 1.0
date: 2026-06-21
author: OpenCode
status: Draft
---
## Overview
Phase 2-3 completed the Qdrant migration of face_trace_nodes. The other node types need a similar migration.
## Current Status
| Node Type | Data source | PostgreSQL dependency | Migration status |
|-----------|-------------|-----------------------|------------------|
| **face_trace_nodes** | Qdrant embeddings | ❌ None | ✅ Done in Phase 2.1 |
| **gaze_trace_nodes** | face.json | ✅ face_detections.trace_id | 🔄 Pending |
| **lip_trace_nodes** | face.json + lip.json | ✅ face_detections.trace_id | 🔄 Pending |
| **text_trace_nodes** | chunk table | ✅ chunk.sentence | ⏸️ Unchanged |
| **yolo_object_nodes** | .yolo.json | ❌ None | ✅ No migration needed |
| **speaker_nodes** | .asrx.json | ❌ None | ✅ No migration needed |
| **appearance_trace_nodes** | .appearance.json | ❌ None | ✅ No migration needed |
| **skin_tone_trace_nodes** | .skin.json | ❌ None | ✅ No migration needed |
| **accessory_nodes** | .accessory.json | ❌ None | ✅ No migration needed |
## Edge Types Migration Status
| Edge Type | Data source | PostgreSQL dependency | Migration status |
|-----------|-------------|-----------------------|------------------|
| **co_occurrence_edges** | face_detections | ✅ face_detections.trace_id | 🔄 Pending |
| **face_face_edges** | face_detections | ✅ face_detections.trace_id | 🔄 Pending |
| **speaker_face_edges** | face_detections + speaker | ✅ face_detections.trace_id | 🔄 Pending |
| **mutual_gaze_edges** | gaze.json | ✅ face_detections.trace_id | 🔄 Pending |
| **lip_sync_edges** | lip.json | ✅ face_detections.trace_id | 🔄 Pending |
## Migration Plan
### Phase 2.5: Gaze & Lip Nodes
**Goal**: Replace face_detections queries with the Qdrant payload
#### 2.5.1: gaze_trace_nodes
**Current code** (`src/core/processor/tkg.rs`):
```rust
let frame_rows: Vec<(i64, i64, f64, f64, f64, f64)> = sqlx::query_as(
"SELECT trace_id, frame_number, x, y, width, height
FROM face_detections WHERE file_uuid = $1"
)
```
**Migration approach**:
```rust
// Use the Qdrant payload (trace_id, frame, bbox_x/y/w/h)
let qdrant_embeddings = face_db.get_all_embeddings_for_file(file_uuid).await?;
// Group by trace_id → compute gaze
```
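The "group by trace_id" step can be sketched with the standard library alone. This is a minimal illustration, not the code in `tkg.rs`: `FacePayload` and `group_by_trace` are hypothetical names, and the struct assumes the payload fields listed in the comment above (trace_id, frame, bbox) have already been extracted from the Qdrant points.

```rust
use std::collections::BTreeMap;

/// Hypothetical flattened view of one Qdrant point payload (assumed shape).
#[derive(Debug, Clone, PartialEq)]
pub struct FacePayload {
    pub trace_id: i64,
    pub frame: i64,
    /// (x, y, width, height)
    pub bbox: (f64, f64, f64, f64),
}

/// Groups payloads by trace_id and sorts each group by frame number,
/// so per-trace gaze statistics can be computed in frame order.
pub fn group_by_trace(payloads: &[FacePayload]) -> BTreeMap<i64, Vec<FacePayload>> {
    let mut groups: BTreeMap<i64, Vec<FacePayload>> = BTreeMap::new();
    for p in payloads {
        groups.entry(p.trace_id).or_default().push(p.clone());
    }
    for frames in groups.values_mut() {
        frames.sort_by_key(|p| p.frame);
    }
    groups
}
```

A `BTreeMap` keeps the traces in a deterministic order, which makes node output reproducible between runs; a `HashMap` would work equally well if ordering does not matter.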
#### 2.5.2: lip_trace_nodes
**Current code**:
```rust
// Read lip.json, query face_detections for trace_id
let trace_id = sqlx::query_scalar(
"SELECT trace_id FROM face_detections
WHERE file_uuid = $1 AND frame_number = $2 AND x = $3 ..."
)
```
**Migration approach**:
```rust
// Use the Qdrant payload to associate trace_id directly
// face.json already contains trace_id (Python store_traced_faces.py)
```
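Where a face.json bbox has to be matched against Qdrant payload bboxes on the same frame, one possible approach is intersection-over-union (IoU) matching. The sketch below is an assumption, not the method used in `tkg.rs` (the current SQL matches on exact coordinates); `BBox`, `iou`, `match_trace_id`, and the `min_iou` threshold are all hypothetical.

```rust
/// Axis-aligned box using the flat x, y, width, height layout of face.json.
#[derive(Debug, Clone, Copy)]
pub struct BBox {
    pub x: f64,
    pub y: f64,
    pub width: f64,
    pub height: f64,
}

/// Intersection-over-union of two boxes; 0.0 when they do not overlap.
pub fn iou(a: &BBox, b: &BBox) -> f64 {
    let ix = (a.x + a.width).min(b.x + b.width) - a.x.max(b.x);
    let iy = (a.y + a.height).min(b.y + b.height) - a.y.max(b.y);
    if ix <= 0.0 || iy <= 0.0 {
        return 0.0;
    }
    let inter = ix * iy;
    let union = a.width * a.height + b.width * b.height - inter;
    if union <= 0.0 { 0.0 } else { inter / union }
}

/// Returns the trace_id of the candidate with the highest IoU,
/// provided that IoU reaches `min_iou`. `candidates` holds the
/// (trace_id, bbox) pairs taken from Qdrant payloads on one frame.
pub fn match_trace_id(face: &BBox, candidates: &[(i64, BBox)], min_iou: f64) -> Option<i64> {
    candidates
        .iter()
        .map(|(id, b)| (*id, iou(face, b)))
        .filter(|(_, score)| *score >= min_iou)
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(id, _)| id)
}
```

IoU matching tolerates small coordinate differences (for example, float rounding between the Python and Rust sides) that an exact equality test would reject.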
### Phase 2.6: Edge Types
#### 2.6.1: co_occurrence_edges
**Current code**:
```rust
"SELECT trace_id FROM face_detections
WHERE file_uuid = $1 AND frame_number BETWEEN $2 AND $3"
```
**Migration approach**:
```rust
// Group the Qdrant payload by trace_id
// Precompute frame ranges
```
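The precomputation of frame ranges can be sketched as follows. This is an illustrative outline under assumptions: the input is a list of `(trace_id, frame)` pairs already read from the Qdrant payload, and `frame_ranges` and `co_occurring_pairs` are hypothetical helper names. It treats two traces as co-occurring when their first-to-last frame ranges overlap, which is coarser than checking every frame.

```rust
use std::collections::BTreeMap;

/// Inclusive (first_frame, last_frame) range per trace_id.
pub fn frame_ranges(observations: &[(i64, i64)]) -> BTreeMap<i64, (i64, i64)> {
    let mut ranges = BTreeMap::new();
    for &(trace_id, frame) in observations {
        ranges
            .entry(trace_id)
            .and_modify(|r: &mut (i64, i64)| {
                r.0 = r.0.min(frame);
                r.1 = r.1.max(frame);
            })
            .or_insert((frame, frame));
    }
    ranges
}

/// Pairs of traces whose ranges overlap, as (trace_a, trace_b, overlap_frames).
pub fn co_occurring_pairs(ranges: &BTreeMap<i64, (i64, i64)>) -> Vec<(i64, i64, i64)> {
    let items: Vec<(i64, (i64, i64))> = ranges.iter().map(|(k, v)| (*k, *v)).collect();
    let mut pairs = Vec::new();
    for i in 0..items.len() {
        for j in (i + 1)..items.len() {
            let (a, (a_start, a_end)) = items[i];
            let (b, (b_start, b_end)) = items[j];
            let start = a_start.max(b_start);
            let end = a_end.min(b_end);
            if start <= end {
                pairs.push((a, b, end - start + 1));
            }
        }
    }
    pairs
}
```

The pairwise loop is quadratic in the number of traces, which is acceptable for a few dozen traces per file; an interval sweep would be the usual replacement if trace counts grow.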
#### 2.6.2: face_face_edges
**Current code**:
```rust
"SELECT trace_id, frame_number FROM face_detections
WHERE file_uuid = $1 AND trace_id IS NOT NULL"
```
**Migration approach**:
```rust
// Use the spatial proximity of Qdrant embeddings
// No PostgreSQL required
```
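The plan does not define "spatial proximity"; one plausible reading is the distance between bbox centres of two faces in the same frame. The sketch below follows that reading as an assumption. `Detection`, `proximity_counts`, and the `max_dist` threshold (in the same units as the bbox coordinates) are hypothetical.

```rust
use std::collections::BTreeMap;

/// One face detection taken from a Qdrant payload (assumed shape).
#[derive(Debug, Clone, Copy)]
pub struct Detection {
    pub trace_id: i64,
    pub frame: i64,
    pub x: f64,
    pub y: f64,
    pub width: f64,
    pub height: f64,
}

fn center(d: &Detection) -> (f64, f64) {
    (d.x + d.width / 2.0, d.y + d.height / 2.0)
}

/// For each unordered pair of traces, counts the frames in which their
/// bbox centres lie within `max_dist` of each other.
pub fn proximity_counts(dets: &[Detection], max_dist: f64) -> BTreeMap<(i64, i64), usize> {
    // Bucket detections by frame so only same-frame faces are compared.
    let mut by_frame: BTreeMap<i64, Vec<&Detection>> = BTreeMap::new();
    for d in dets {
        by_frame.entry(d.frame).or_default().push(d);
    }
    let mut counts = BTreeMap::new();
    for faces in by_frame.values() {
        for (i, a) in faces.iter().enumerate() {
            for b in &faces[i + 1..] {
                if a.trace_id == b.trace_id {
                    continue;
                }
                let (ax, ay) = center(a);
                let (bx, by) = center(b);
                if (ax - bx).hypot(ay - by) <= max_dist {
                    // Normalise the key so (1, 2) and (2, 1) are the same pair.
                    let key = (a.trace_id.min(b.trace_id), a.trace_id.max(b.trace_id));
                    *counts.entry(key).or_insert(0) += 1;
                }
            }
        }
    }
    counts
}
```

The per-pair counts could then serve as edge weights for face_face_edges, with a minimum count used to filter out incidental proximity.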
#### 2.6.3: speaker_face_edges
**Current code**:
```rust
// JOIN face_detections.trace_id + speaker_nodes
```
**Migration approach**:
```rust
// Qdrant trace_id + speaker_nodes (already from .asrx.json)
```
### Phase 2.7: Identity Resolution for Edges
**Current code** (Rule2):
```rust
// Done in Phase 2.3: query tkg_nodes.properties.identity_id
```
**Extension**:
- gaze/lip edges also need identity resolution
- Use `tkg_nodes.properties.identity_id` consistently
## Node Types Not Migrated
### text_trace_nodes
**Reason**:
- The chunk table is required persistence (sentence chunks)
- It does not depend on face_detections
- Keep as is; no migration needed
### JSON-based Nodes
**Already free of PostgreSQL dependencies**:
- yolo_object_nodes: `.yolo.json`
- speaker_nodes: `.asrx.json`
- appearance_trace_nodes: `.appearance.json`
- skin_tone_trace_nodes: `.skin.json`
- accessory_nodes: `.accessory.json`
## Estimated Performance Impact
| Migration item | Current time | Estimated after migration | Speedup |
|----------------|--------------|---------------------------|---------|
| gaze_trace_nodes | ~50ms (PG query) | ~15ms (Qdrant) | **~3.3x** |
| lip_trace_nodes | ~80ms (PG + lip.json) | ~20ms (Qdrant + lip.json) | **4x** |
| co_occurrence_edges | ~120ms (PG) | ~30ms (Qdrant) | **4x** |
| face_face_edges | ~90ms (PG) | ~25ms (Qdrant) | **3.6x** |
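## Implementation Priority
| Priority | Task | Impact | Complexity |
|----------|------|--------|------------|
| P1 | gaze_trace_nodes | High (gaze analysis) | Low |
| P1 | co_occurrence_edges | High (relationship graph) | Medium |
| P2 | lip_trace_nodes | Medium (lip analysis) | Medium |
| P2 | face_face_edges | Medium (face relationships) | Medium |
| P3 | speaker_face_edges | Low (speaker relationships) | Medium |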
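## Key Decisions
1. **text_trace_nodes**: keep querying the chunk table (required persistence)
2. **JSON nodes**: no migration needed (already free of PG dependencies)
3. **Qdrant as the only face data source**: trace_id, frame, and bbox all come from the payload
4. **Incremental migration**: split by priority into Phase 2.5, 2.6, 2.7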
## Acceptance Criteria
- ✅ gaze_trace_nodes: no face_detections queries
- ✅ lip_trace_nodes: uses the Qdrant trace_id
- ✅ All edges: use the Qdrant payload
- ✅ Performance test: at least 2x faster than the original architecture
- ✅ Rule2/Rule3: work correctly (identity resolution)
## References
- `docs_v1.0/M4_workspace/2026-06-21_tkg_phase2_progress.md` (Phase 2-3)
- `src/core/processor/tkg.rs` (current implementation)
- `src/core/db/face_embedding_db.rs` (Qdrant API)