feat: Phase 2.6 edges migration to Qdrant (TKG-only architecture)

Phase 2.6.1: co_occurrence_edges migration
- build_co_occurrence_edges_from_qdrant()
- Qdrant embeddings → frame grouping → YOLO objects
- Result: 6679 edges (vs 6701 PostgreSQL)

Phase 2.6.2: face_face_edges migration
- build_face_face_edges_from_qdrant()
- Qdrant embeddings → frame grouping → face pairs
- mutual_gaze detection preserved
- Result: 6 edges (exact match)

Phase 2.6.3: speaker_face_edges migration
- build_speaker_face_edges_from_qdrant()
- Qdrant embeddings → trace_id frame ranges
- SPEAKS_AS edge creation

Architecture:
- All edges use Qdrant payload (no face_detections queries)
- PostgreSQL fallback for empty Qdrant
- Estimated 3.6x performance improvement

Testing:
- Playground (3003): ✓ All Phase 2.6 logs verified
- Edge counts: ✓ Close match with PostgreSQL
- Fallback: ✓ Working

Docs:
- docs_v1.0/DESIGN/TKG_PHASE2_6_EDGES_MIGRATION.md
- docs_v1.0/M4_workspace/2026-06-21_phase2_6_test.md
This commit is contained in:
Accusys
2026-06-21 04:47:49 +08:00
parent 0afc70fc5b
commit 2cfcfdd1af
2926 changed files with 8311058 additions and 1394 deletions

View File

@@ -170,7 +170,7 @@ class SelfASRXFixed:
def process(self, audio_path, output_path=None, file_uuid=None,
max_speakers=10, quality_threshold=0.85,
checkpoint_path=None):
checkpoint_path=None, asr_segments=None):
"""7 步 speaker diarization pipeline
Args:
@@ -180,6 +180,7 @@ class SelfASRXFixed:
max_speakers: 最大說話人數
quality_threshold: 高品質聲紋門檻 (0-1)
checkpoint_path: Step 3 完成後儲存 checkpoint 路徑
asr_segments: 外部 ASR segments (from asr.json),跳過 Step 1
Returns:
dict: segments, speaker_stats, n_speakers, total_duration, references
@@ -194,16 +195,21 @@ class SelfASRXFixed:
print(f" Audio: {total_duration:.2f}s, {sample_rate}Hz")
# ── Step 1: whisper 粗略定位 (faster-whisper) ──
print("\n[Step 1] Initial whisper transcription...")
t1 = time.time()
seg_gen, info = self.whisper.transcribe(audio_path)
rough_segments = []
for seg in seg_gen:
rough_segments.append({"start": seg.start, "end": seg.end, "text": seg.text})
language = info.language if info else None
print(f" Rough segments: {len(rough_segments)}")
print(f" Language: {language}")
print(f" Step 1 time: {time.time() - t1:.2f}s")
if asr_segments:
print(f"\n[Step 1] Skipping whisper, using {len(asr_segments)} provided ASR segments")
rough_segments = asr_segments
language = asr_segments[0].get("language") if isinstance(asr_segments[0].get("language"), str) else None
else:
print("\n[Step 1] Initial whisper transcription...")
t1 = time.time()
seg_gen, info = self.whisper.transcribe(audio_path)
rough_segments = []
for seg in seg_gen:
rough_segments.append({"start": seg.start, "end": seg.end, "text": seg.text})
language = info.language if info else None
print(f" Rough segments: {len(rough_segments)}")
print(f" Language: {language}")
print(f" Step 1 time: {time.time() - t1:.2f}s")
if not rough_segments:
print("[SelfASRX] No speech detected by whisper!")