- Fixed asrx_processor_custom.py: embeddings now passed to asrx.json
- Voice embeddings (192D ECAPA-TDNN) extracted for all 1815 ASRX segments
- momentry_dev_voice Qdrant collection created (1815 vectors)
- Updated Phase 1 report with 6 collections, key decisions
Phase 1 Completion Report — v1 (base model)
File: Charade (1963) Cary Grant & Audrey Hepburn
UUID: aeed71342a899fe4b4c57b7d41bcb692
Date: 2026-05-09
System: M5 (MacBook Pro, 48GB, Apple Silicon)
1. Processor Outputs
| File | Size | Description |
|---|---|---|
| asr.json | 413KB | 3,417 segments, full movie coverage |
| asrx.json | 307KB | 1,815 segments, 10 speakers |
| cut.json | 329KB | 2,260 scenes |
| yolo.json | 181MB | 169,625 frames with object detections |
| face.json | 106MB | 4,550 frames, 5,910 faces @ 8Hz (CoreML 512D) |
| face_traced.json | 110MB | Traced faces with identity |
| lip.json | 492KB | Lip openness analysis |
| ocr.json | 277KB | 606 OCR frames |
| pose.json | 26MB | 4,211 pose frames |
| scene.json | 403B | Scene classification |
2. Pipeline 8-Stage Checklist
| Stage | Status | Detail |
|---|---|---|
| ASR | ✅ | 3,417 segments, last end 6,773s (100%) |
| ASRX | ✅ | 1,815 segments, 10 speakers |
| Sentence Chunks | ✅ | 3,417 sentence chunks with text |
| Vectorization | ✅ | 3,417 PG + Qdrant (768D) |
| Face Trace | ✅ | 423 traces, 11,820 detections @ 8Hz |
| TKG Graph | ✅ | 498 nodes, 1,617 edges |
| Trace Chunks | ✅ | 423 trace chunks with ASR text |
| Phase 1 Release | ✅ | 483MB package |
3. Identity & Knowledge Graph
TMDb Character Matching (9 characters)
| Actor | Traces | Character |
|---|---|---|
| Audrey Hepburn | 843 | Regina Lampert |
| Cary Grant | 482 | Peter Joshua |
| Jacques Marin | 348 | Inspector Grandpierre |
| James Coburn | 188 | Tex Panthollow |
| Ned Glass | 176 | Leopold W. Gideon |
| George Kennedy | 104 | Herman Scobie |
| Walter Matthau | 104 | Hamilton Bartholomew |
| Dominique Minot | 45 | Sylvie Gaudel |
| Raoul Delfosse | 32 | — |
Speaker Bindings (via Lip Verification)
| Speaker | Identity | Confidence |
|---|---|---|
| SPEAKER_2 | Audrey Hepburn | 61% |
| SPEAKER_4 | Cary Grant | 56% |
| SPEAKER_5 | Audrey Hepburn | 100% |
| SPEAKER_6 | Audrey Hepburn | 43% |
| SPEAKER_7 | Cary Grant | 100% |
| SPEAKER_8 | Audrey Hepburn | 54% |
TKG Graph
| Node Type | Count |
|---|---|
| Face traces | 423 |
| Objects | 75 |
| Total nodes | 498 |
| Total edges | 1,617 |
Qdrant Vector Collections
| Collection | Dims | Points | Content | Status |
|---|---|---|---|---|
| momentry_dev_v1 | 768 | 3,417 | Sentence chunk embeddings (pending re-embed with speaker labels) | ⏳ |
| momentry_dev_stories | 768 | 456 | Story dialogue + LLM summary | ✅ |
| momentry_dev_faces | 512 | 5,910 | Face embeddings (8Hz CoreML) | ✅ |
| momentry_dev_voice | 192 | 1,815 | Voice embeddings (ECAPA-TDNN) | ✅ |
| story_sentence | 768 | 0 | Story processor template (not yet created) | ⏳ |
| sentence_summary | 768 | 0 | LLM 50-character summary (not yet created) | ⏳ |
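Because the collections use three different vector sizes (768, 512, 192), a write to the wrong collection is an easy mistake to make. The following is a minimal sketch of a guard that checks a vector's length before it is upserted. The dimensions come from the table above; the `check_vector` helper and its error messages are hypothetical and are not part of the pipeline or of the Qdrant client.

```python
# Dimensions copied from the collection table; the helper itself is illustrative.
COLLECTION_DIMS = {
    "momentry_dev_v1": 768,
    "momentry_dev_stories": 768,
    "momentry_dev_faces": 512,
    "momentry_dev_voice": 192,
    "story_sentence": 768,
    "sentence_summary": 768,
}


def check_vector(collection: str, vector: list[float]) -> None:
    """Raise ValueError if the vector does not fit the named collection."""
    if collection not in COLLECTION_DIMS:
        raise ValueError(f"unknown collection: {collection}")
    expected = COLLECTION_DIMS[collection]
    if len(vector) != expected:
        raise ValueError(
            f"{collection} expects {expected}D vectors, got {len(vector)}D"
        )


# A 192D ECAPA-TDNN voice embedding passes; a 768D chunk embedding
# sent to the face collection would raise ValueError.
check_vector("momentry_dev_voice", [0.0] * 192)
```

Calling such a check just before each upsert would surface a dimension mismatch as a local error instead of a failed or rejected write.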
4. Release Package
| Component | Size |
|---|---|
| output_json/ | 11 processor files |
| chunks.csv | 2.2MB |
| vectors.csv | 56MB |
| identities.csv | 973KB |
| schema.sql | 29KB |
| RELEASE_INFO.txt | Metadata |
| Total | 483MB |
Location: release/phase1/v1.0.0_20260509_101337/
5. Key Technical Decisions
| Decision | Rationale |
|---|---|
| Face 8Hz (interval=3) | 5-15Hz human lip motion needs ≥8Hz sampling |
| Two-stage face processor | Apple Vision ANE (fast) + CoreML FaceNet (512D) |
| VNFaceprint not used | KVC returns nil in video pipeline |
| Face Qdrant separate collection | Face 512D vs chunk 768D — different dimensions |
| LLM reasoning off | --reasoning off needed for non-empty content |
| Voice embedding (ECAPA-TDNN) | SFSpeechAnalyzer does not expose speaker embeddings (Apple has not opened an API for this) |
| ASRX embeddings bug | asrx_processor_custom.py did not pass embeddings through to asrx.json; now fixed |
| Speaker matching method | ASR × ASRX time overlap (any overlap), 99% match rate |
| Story chunk grouping | Fixed 15 ASR segments per chunk, 228 parent chunks |
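Two of the decisions above, any-overlap speaker matching and fixed 15-segment story grouping, can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's actual code: the field names `start`, `end`, and `speaker` are hypothetical, and choosing the ASRX segment with the largest overlap when several overlap is an assumed tie-break that the report does not specify.

```python
from typing import Optional


def assign_speaker(asr_seg: dict, asrx_segs: list[dict]) -> Optional[str]:
    """Return the speaker of the ASRX segment overlapping asr_seg the most.

    Any positive time overlap counts as a match; None means nothing overlaps.
    """
    best_speaker, best_overlap = None, 0.0
    for seg in asrx_segs:
        # Overlap length of two intervals; zero or negative means disjoint.
        overlap = min(asr_seg["end"], seg["end"]) - max(asr_seg["start"], seg["start"])
        if overlap > best_overlap:
            best_speaker, best_overlap = seg["speaker"], overlap
    return best_speaker


def group_fixed(segments: list, size: int = 15) -> list[list]:
    """Split segments into consecutive groups of `size`; the last may be shorter."""
    return [segments[i:i + size] for i in range(0, len(segments), size)]


# 3,417 ASR segments in groups of 15 give 227 full groups plus one of 12.
parent_chunks = group_fixed(list(range(3417)))
print(len(parent_chunks))  # 228
```

The chunk count matches the 228 parent chunks reported in the table, since 3,417 / 15 rounds up to 228.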
6. Phase 2 Preparation
Pending for Phase 2:
- Rule 3 scene chunking (cut-based parent chunks)
- 5W1H Agent (LLM-generated scene summaries)
- Full pipeline + 5W1H release packaging
- Lip analysis extended to full movie speaker binding