Accusys ef894a44ad docs: update Phase 1 report with all Qdrant collections + voice embeddings fix

- Fixed asrx_processor_custom.py: embeddings now passed to asrx.json
- Voice embeddings (192D ECAPA-TDNN) extracted for all 1815 ASRX segments
- momentry_dev_voice Qdrant collection created (1815 vectors)
- Updated Phase 1 report with 6 collections, key decisions

2026-05-10 01:11:42 +08:00


Phase 1 Completion Report — v1 (base model)

File: Charade (1963) Cary Grant & Audrey Hepburn
UUID: aeed71342a899fe4b4c57b7d41bcb692
Date: 2026-05-09
System: M5 (MacBook Pro, 48GB, Apple Silicon)

1. Processor Outputs

File	Size	Description
`asr.json`	413KB	3,417 segments, full movie coverage
`asrx.json`	307KB	1,815 segments, 10 speakers
`cut.json`	329KB	2,260 scenes
`yolo.json`	181MB	169,625 frames with object detections
`face.json`	106MB	4,550 frames, 5,910 faces @ 8Hz (CoreML 512D)
`face_traced.json`	110MB	Traced faces with identity
`lip.json`	492KB	Lip openness analysis
`ocr.json`	277KB	606 OCR frames
`pose.json`	26MB	4,211 pose frames
`scene.json`	403B	Scene classification

2. Pipeline 8-Stage Checklist

Stage	Status	Detail
ASR	✅	3,417 segments, last end 6,773s (100%)
ASRX	✅	1,815 segments, 10 speakers
Sentence Chunks	✅	3,417 sentence chunks with text
Vectorization	✅	3,417 PG + Qdrant (768D)
Face Trace	✅	423 traces, 11,820 detections @ 8Hz
TKG Graph	✅	498 nodes, 1,617 edges
Trace Chunks	✅	423 trace chunks with ASR text
Phase 1 Release	✅	483MB package

3. Identity & Knowledge Graph

TMDb Character Matching (9 characters)

Actor	Traces	Character
Audrey Hepburn	843	Regina Lampert
Cary Grant	482	Peter Joshua
Jacques Marin	348	Inspector Grandpierre
James Coburn	188	Tex Panthollow
Ned Glass	176	Leopold W. Gideon
George Kennedy	104	Herman Scobie
Walter Matthau	104	Hamilton Bartholomew
Dominique Minot	45	Sylvie Gaudel
Raoul Delfosse	32	—

Speaker Bindings (via Lip Verification)

Speaker	Identity	Confidence
SPEAKER_2	Audrey Hepburn	61%
SPEAKER_4	Cary Grant	56%
SPEAKER_5	Audrey Hepburn	100%
SPEAKER_6	Audrey Hepburn	43%
SPEAKER_7	Cary Grant	100%
SPEAKER_8	Audrey Hepburn	54%

TKG Graph

Node Type	Count
Face traces	423
Objects	75
Total nodes	498
Total edges	1,617

Qdrant Vector Collections

Collection	Dims	Points	Content	Status
`momentry_dev_v1`	768	3,417	Sentence chunk embeddings (pending re-embedding with speaker labels)	⏳
`momentry_dev_stories`	768	456	Story dialogue + LLM summary	✅
`momentry_dev_faces`	512	5,910	Face embeddings (8Hz CoreML)	✅
`momentry_dev_voice`	192	1,815	Voice embeddings (ECAPA-TDNN)	✅
`story_sentence`	768	0	Story processor template (to be created)	⏳
`sentence_summary`	768	0	LLM 50-character summary (to be created)	⏳
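
Because each collection holds vectors of a single size, a write-side dimension check can catch a vector routed to the wrong collection. The sketch below is illustrative only: the dimensions are copied from the table above, while `COLLECTION_DIMS` and `check_vector` are hypothetical names, not part of the pipeline or of the Qdrant client.

```python
from typing import Sequence

# Dimensions copied from the collections table above.
# The mapping and the helper below are hypothetical, for illustration only.
COLLECTION_DIMS = {
    "momentry_dev_v1": 768,
    "momentry_dev_stories": 768,
    "momentry_dev_faces": 512,
    "momentry_dev_voice": 192,
    "story_sentence": 768,
    "sentence_summary": 768,
}


def check_vector(collection: str, vector: Sequence[float]) -> None:
    """Raise ValueError if the vector length does not match the collection."""
    expected = COLLECTION_DIMS[collection]
    if len(vector) != expected:
        raise ValueError(
            f"{collection} expects {expected}D vectors, got {len(vector)}D"
        )


# A 192D voice embedding is accepted by the voice collection.
check_vector("momentry_dev_voice", [0.0] * 192)

# A 512D face embedding is rejected by the 768D sentence collection.
try:
    check_vector("momentry_dev_v1", [0.0] * 512)
    rejected = False
except ValueError:
    rejected = True
```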

4. Release Package

Component	Size
`output_json/`	11 processor files
`chunks.csv`	2.2MB
`vectors.csv`	56MB
`identities.csv`	973KB
`schema.sql`	29KB
`RELEASE_INFO.txt`	Metadata
Total	483MB

Location: release/phase1/v1.0.0_20260509_101337/

5. Key Technical Decisions

Decision	Rationale
Face 8Hz (interval=3)	5-15Hz human lip motion needs ≥8Hz sampling
Two-stage face processor	Apple Vision ANE (fast) + CoreML FaceNet (512D)
VNFaceprint not used	KVC returns nil in video pipeline
Face Qdrant separate collection	Face 512D vs chunk 768D — different dimensions
LLM reasoning off	`--reasoning off` is required; without it the LLM returns empty content
Voice embedding (ECAPA-TDNN)	SFSpeechAnalyzer does not expose speaker embeddings (Apple has not opened this API)
ASRX embeddings bug	`asrx_processor_custom.py` omitted passing embeddings to the output; fixed
Speaker matching method	ASR × ASRX time overlap (any overlap), 99% match rate
Story chunk grouping	Fixed groups of 15 ASR segments, 228 parent chunks
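
The speaker-matching and story-chunk rows above can be sketched as follows. This is a minimal illustration, not the pipeline's code: the `Segment` type and both function names are hypothetical, and choosing the ASRX segment with the longest overlap when several overlap is an assumption (the report only says "any overlap").

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    start: float                   # seconds
    end: float                     # seconds
    speaker: Optional[str] = None  # set on ASRX segments only


def overlap(a: Segment, b: Segment) -> float:
    """Length in seconds of the time range shared by a and b (0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def assign_speakers(asr: List[Segment], asrx: List[Segment]) -> List[Optional[str]]:
    """Label each ASR segment with the speaker of an overlapping ASRX segment.

    Any positive overlap counts as a match. Ties between several overlapping
    ASRX segments are broken by longest overlap (an assumption).
    """
    labels: List[Optional[str]] = []
    for seg in asr:
        best = max(asrx, key=lambda x: overlap(seg, x), default=None)
        matched = best is not None and overlap(seg, best) > 0
        labels.append(best.speaker if matched else None)
    return labels


def group_fixed(items: list, size: int = 15) -> List[list]:
    """Split items into consecutive groups of `size`; the last may be shorter."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Toy data: the third ASR segment overlaps no ASRX segment and stays unlabeled.
asr = [Segment(0.0, 2.0), Segment(2.0, 5.0), Segment(9.0, 10.0)]
asrx = [Segment(0.0, 2.5, "SPEAKER_2"), Segment(2.5, 6.0, "SPEAKER_4")]
labels = assign_speakers(asr, asrx)
match_rate = sum(label is not None for label in labels) / len(labels)

# 3,417 ASR segments in fixed groups of 15 yield 228 parent chunks.
groups = group_fixed(list(range(3417)), 15)
```

With groups of 15, the 3,417 ASR segments fill 227 complete groups and leave a final group of 12, which is consistent with the 228 parent chunks reported.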

6. Phase 2 Preparation

Pending for Phase 2:

- Rule 3 scene chunking (cut-based parent chunks)
- 5W1H Agent (LLM-generated scene summaries)
- Full pipeline + 5W1H release packaging
- Lip analysis extended to full movie speaker binding
