feat: Phase 2.6 edges migration to Qdrant (TKG-only architecture)

Phase 2.6.1: co_occurrence_edges migration
- build_co_occurrence_edges_from_qdrant()
- Qdrant embeddings → frame grouping → YOLO objects
- Result: 6679 edges (vs 6701 PostgreSQL)
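
The frame-grouping step above can be sketched as follows. This is a minimal illustration, not the code of build_co_occurrence_edges_from_qdrant(): the payload fields `frame` and `label` and the helper name `co_occurrence_edges` are assumptions, and payloads are passed in as plain dicts instead of being fetched from Qdrant.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_occurrence_edges(payloads):
    """Count how often two object labels appear in the same frame.

    `payloads` is a list of dicts with hypothetical keys "frame" and "label".
    """
    # Group the distinct labels seen in each frame.
    labels_by_frame = defaultdict(set)
    for p in payloads:
        labels_by_frame[p["frame"]].add(p["label"])

    # Every unordered pair of labels in a frame contributes one co-occurrence.
    edges = Counter()
    for labels in labels_by_frame.values():
        for a, b in combinations(sorted(labels), 2):
            edges[(a, b)] += 1
    return edges


payloads = [
    {"frame": 1, "label": "person"},
    {"frame": 1, "label": "cup"},
    {"frame": 2, "label": "person"},
    {"frame": 2, "label": "cup"},
    {"frame": 2, "label": "laptop"},
]
edges = co_occurrence_edges(payloads)
```

Sorting the labels before pairing keeps each edge key in one canonical order, so ("cup", "person") and ("person", "cup") are never counted separately.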

Phase 2.6.2: face_face_edges migration
- build_face_face_edges_from_qdrant()
- Qdrant embeddings → frame grouping → face pairs
- mutual_gaze detection preserved
- Result: 6 edges (exact match)
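
A rough sketch of pairing faces per frame while keeping a mutual-gaze flag. How the real code detects mutual gaze is not shown in this commit; the fields `frame`, `face_id` and `gaze_target`, and the rule "each face's gaze target is the other face", are assumptions made only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def face_face_edges(payloads):
    """Build one edge per pair of faces that share at least one frame.

    Hypothetical payload keys: "frame", "face_id", optional "gaze_target".
    """
    # frame -> {face_id: gaze_target or None}
    faces_by_frame = defaultdict(dict)
    for p in payloads:
        faces_by_frame[p["frame"]][p["face_id"]] = p.get("gaze_target")

    edges = {}
    for faces in faces_by_frame.values():
        for a, b in combinations(sorted(faces), 2):
            # Assumed rule: gaze is mutual when each face looks at the other.
            mutual = faces[a] == b and faces[b] == a
            edge = edges.setdefault((a, b), {"frames": 0, "mutual_gaze": False})
            edge["frames"] += 1
            edge["mutual_gaze"] = edge["mutual_gaze"] or mutual
    return edges


face_payloads = [
    {"frame": 1, "face_id": "A", "gaze_target": "B"},
    {"frame": 1, "face_id": "B", "gaze_target": "A"},
    {"frame": 2, "face_id": "A", "gaze_target": None},
    {"frame": 2, "face_id": "B", "gaze_target": "A"},
    {"frame": 2, "face_id": "C"},
]
face_edges = face_face_edges(face_payloads)
```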

Phase 2.6.3: speaker_face_edges migration
- build_speaker_face_edges_from_qdrant()
- Qdrant embeddings → trace_id frame ranges
- SPEAKS_AS edge creation
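
One way to read "trace_id frame ranges" is sketched below: collapse each face trace to a (first_frame, last_frame) range, then link each speaker segment to the trace it overlaps most. The matching rule, the field names (`trace_id`, `frame`, `speaker`, `start_frame`, `end_frame`) and both helper names are assumptions, not the logic of build_speaker_face_edges_from_qdrant().

```python
def trace_frame_ranges(payloads):
    """Return {trace_id: (first_frame, last_frame)} from face payloads."""
    ranges = {}
    for p in payloads:
        lo, hi = ranges.get(p["trace_id"], (p["frame"], p["frame"]))
        ranges[p["trace_id"]] = (min(lo, p["frame"]), max(hi, p["frame"]))
    return ranges


def speaks_as_edges(speaker_segments, ranges):
    """Link each speaker segment to the trace with the largest frame overlap."""
    edges = []
    for seg in speaker_segments:
        best, best_overlap = None, 0
        for trace_id, (lo, hi) in ranges.items():
            # Number of frames shared by the two inclusive ranges.
            overlap = min(hi, seg["end_frame"]) - max(lo, seg["start_frame"]) + 1
            if overlap > best_overlap:
                best, best_overlap = trace_id, overlap
        if best is not None:
            edges.append((seg["speaker"], "SPEAKS_AS", best))
    return edges


trace_payloads = [
    {"trace_id": "t1", "frame": 0},
    {"trace_id": "t1", "frame": 10},
    {"trace_id": "t1", "frame": 50},
    {"trace_id": "t2", "frame": 60},
    {"trace_id": "t2", "frame": 120},
]
segments = [
    {"speaker": "S0", "start_frame": 5, "end_frame": 40},
    {"speaker": "S1", "start_frame": 45, "end_frame": 100},
    {"speaker": "S2", "start_frame": 200, "end_frame": 210},
]
ranges = trace_frame_ranges(trace_payloads)
speaker_edges = speaks_as_edges(segments, ranges)
```

A segment that overlaps no trace (S2 here) produces no edge.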

Architecture:
- All edges use Qdrant payload (no face_detections queries)
- PostgreSQL fallback when Qdrant returns no data
- Estimated 3.6x performance improvement
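
The fallback rule can be pictured as a small wrapper. This is only a sketch: `build_edges_with_fallback` and the two callables are hypothetical names, and the real builders may signal "empty" differently.

```python
def build_edges_with_fallback(build_from_qdrant, build_from_postgres):
    """Prefer the Qdrant-based builder; use PostgreSQL only if it yields nothing.

    Both arguments are zero-argument callables returning a list of edges.
    Returns (edges, source) so callers can log which path was taken.
    """
    edges = build_from_qdrant()
    if edges:
        return edges, "qdrant"
    return build_from_postgres(), "postgresql"


# Qdrant has data: the PostgreSQL builder is never called.
primary = build_edges_with_fallback(lambda: [("a", "b")], lambda: [("x", "y")])

# Qdrant is empty: fall back to PostgreSQL.
fallback = build_edges_with_fallback(lambda: [], lambda: [("x", "y")])
```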

Testing:
- Playground (3003): ✓ All Phase 2.6 logs verified
- Edge counts: ✓ Close match with PostgreSQL (co_occurrence 6679 vs 6701; face_face 6 vs 6)
- Fallback: ✓ Working

Docs:
- docs_v1.0/DESIGN/TKG_PHASE2_6_EDGES_MIGRATION.md
- docs_v1.0/M4_workspace/2026-06-21_phase2_6_test.md
Accusys
2026-06-21 04:47:49 +08:00
parent 0afc70fc5b
commit 2cfcfdd1af
2926 changed files with 8311058 additions and 1394 deletions

@@ -228,7 +228,21 @@ def process_asrx(video_path: str, output_path: str, uuid: str = "",
     # Stage 1: Audio Track Preprocessing
     tmp_dir, audio_input = _shared_audio_setup(video_path)
-    # Stage 2: SelfASRXFixed 7-step pipeline
+    # Stage 2: Load ASR segments for time alignment (if available)
+    asr_segments = None
+    asr_path = (output_path.replace(".asrx.json", ".asr.json")
+                if output_path else "")
+    if asr_path and os.path.exists(asr_path):
+        try:
+            with open(asr_path) as f:
+                asr_data = json.load(f)
+            asr_segments = asr_data.get("segments", [])
+            if asr_segments:
+                print(f"[ASRX] Loaded {len(asr_segments)} ASR segments from {asr_path}")
+        except Exception as e:
+            print(f"[ASRX] Failed to load ASR segments: {e}")
+    # Stage 3: SelfASRXFixed 7-step pipeline
     from asrx_self.main_fixed import SelfASRXFixed
     if publisher:
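
The loading step in the hunk above could also be factored into a helper. The sketch below is an assumption-laden variant, not part of this commit: `load_asr_segments` is a hypothetical name, it skips loading when the output path does not end in `.asrx.json` (so the output file is never read as ASR input), and it returns None instead of an empty list when no segments are found.

```python
import json
import os
import sys
import tempfile


def load_asr_segments(output_path):
    """Return ASR segments from the sibling .asr.json file, or None."""
    suffix = ".asrx.json"
    if not output_path or not output_path.endswith(suffix):
        return None
    asr_path = output_path[: -len(suffix)] + ".asr.json"
    if not os.path.exists(asr_path):
        return None
    try:
        with open(asr_path, encoding="utf-8") as f:
            segments = json.load(f).get("segments", [])
    except (OSError, ValueError, AttributeError) as e:
        # ValueError covers malformed JSON; AttributeError a non-dict top level.
        print(f"[ASRX] Failed to load ASR segments: {e}", file=sys.stderr)
        return None
    return segments or None


# Usage: write a tiny .asr.json next to a would-be .asrx.json output.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "clip.asr.json"), "w", encoding="utf-8") as f:
        json.dump({"segments": [{"start": 0.0, "end": 1.5, "text": "hi"}]}, f)
    loaded = load_asr_segments(os.path.join(tmp, "clip.asrx.json"))
    missing = load_asr_segments(os.path.join(tmp, "other.asrx.json"))
    wrong_suffix = load_asr_segments(os.path.join(tmp, "clip.json"))
```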
@@ -239,6 +253,9 @@ def process_asrx(video_path: str, output_path: str, uuid: str = "",
     if publisher:
         publisher.info("asrx", "ASRX_TRANSCRIBING")
+    if asr_segments:
+        print(f"[ASRX] Using {len(asr_segments)} ASR segments for diarization", file=sys.stderr)
     result = asrx.process(
         audio_input,
         output_path=None,
@@ -246,6 +263,7 @@ def process_asrx(video_path: str, output_path: str, uuid: str = "",
         max_speakers=10,
         quality_threshold=0.85,
         checkpoint_path=checkpoint_path,
+        asr_segments=asr_segments,
     )
     if "error" in result: