Phase 1 Completion Report — v1 (base model)
File: Charade (1963) Cary Grant & Audrey Hepburn
UUID: aeed71342a899fe4b4c57b7d41bcb692
Date: 2026-05-09
System: M5 (MacBook Pro, 48GB, Apple Silicon)
1. Processor Outputs
| File | Size | Description |
|---|---|---|
| asr.json | 413KB | 3,417 segments, full movie coverage |
| asrx.json | 307KB | 1,815 segments, 10 speakers |
| cut.json | 329KB | 2,260 scenes |
| yolo.json | 181MB | 169,625 frames with object detections |
| face.json | 106MB | 4,550 frames, 5,910 faces @ 8Hz (CoreML 512D) |
| face_traced.json | 110MB | Traced faces with identity |
| lip.json | 492KB | Lip openness analysis |
| ocr.json | 277KB | 606 OCR frames |
| pose.json | 26MB | 4,211 pose frames |
| scene.json | 403B | Scene classification |

2. Pipeline 8-Stage Checklist
| Stage | Status | Detail |
|---|---|---|
| ASR | ✅ | 3,417 segments, last end 6,773s (100%) |
| ASRX | ✅ | 1,815 segments, 10 speakers |
| Sentence Chunks | ✅ | 3,417 sentence chunks with text |
| Vectorization | ✅ | 3,417 PG + Qdrant (768D) |
| Face Trace | ✅ | 423 traces, 11,820 detections @ 8Hz |
| TKG Graph | ✅ | 498 nodes, 1,617 edges |
| Trace Chunks | ✅ | 423 trace chunks with ASR text |
| Phase 1 Release | ✅ | 483MB package |

3. Identity & Knowledge Graph
TMDb Character Matching (9 characters)
| Actor | Traces | Character |
|---|---|---|
| Audrey Hepburn | 843 | Regina Lampert |
| Cary Grant | 482 | Peter Joshua |
| Jacques Marin | 348 | Inspector Grandpierre |
| James Coburn | 188 | Tex Panthollow |
| Ned Glass | 176 | Leopold W. Gideon |
| George Kennedy | 104 | Herman Scobie |
| Walter Matthau | 104 | Hamilton Bartholomew |
| Dominique Minot | 45 | Sylvie Gaudel |
| Raoul Delfosse | 32 | — |

Speaker Bindings (via Lip Verification)
| Speaker | Identity | Confidence |
|---|---|---|
| SPEAKER_2 | Audrey Hepburn | 61% |
| SPEAKER_4 | Cary Grant | 56% |
| SPEAKER_5 | Audrey Hepburn | 100% |
| SPEAKER_6 | Audrey Hepburn | 43% |
| SPEAKER_7 | Cary Grant | 100% |
| SPEAKER_8 | Audrey Hepburn | 54% |

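
Several diarization labels in the bindings above resolve to the same person, so downstream code will likely want them merged per identity. The following is a minimal sketch of that grouping, not the pipeline's actual implementation. The function name `group_by_identity` and the `min_confidence` cut-off are hypothetical and do not appear in the report.

```python
from collections import defaultdict

# Bindings copied from the report: (speaker label, identity, confidence in percent).
BINDINGS = [
    ("SPEAKER_2", "Audrey Hepburn", 61),
    ("SPEAKER_4", "Cary Grant", 56),
    ("SPEAKER_5", "Audrey Hepburn", 100),
    ("SPEAKER_6", "Audrey Hepburn", 43),
    ("SPEAKER_7", "Cary Grant", 100),
    ("SPEAKER_8", "Audrey Hepburn", 54),
]


def group_by_identity(bindings, min_confidence=0):
    """Map each identity to its speaker labels.

    Bindings below min_confidence (a hypothetical threshold, in percent)
    are dropped before grouping.
    """
    grouped = defaultdict(list)
    for speaker, identity, confidence in bindings:
        if confidence >= min_confidence:
            grouped[identity].append(speaker)
    return dict(grouped)


if __name__ == "__main__":
    # All bindings, then only those at or above an illustrative 55% cut-off.
    print(group_by_identity(BINDINGS))
    print(group_by_identity(BINDINGS, min_confidence=55))
```

With an illustrative cut-off of 55%, SPEAKER_6 and SPEAKER_8 drop out and each actor keeps two speaker labels. Whether any threshold is appropriate depends on how the lip verification confidence is calibrated, which the report does not state.
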
TKG Graph
| Node Type | Count |
|---|---|
| Face traces | 423 |
| Objects | 75 |
| Total nodes | 498 |
| Total edges | 1,617 |

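
The graph counts can be sanity-checked with simple arithmetic: the per-type node counts should add up to the reported total. The sketch below does that and also derives a mean degree. The mean degree assumes every edge is undirected and connects two nodes, which the report does not confirm; the function names are illustrative only.

```python
# Counts copied from the report.
NODE_COUNTS = {"face_trace": 423, "object": 75}
TOTAL_NODES = 498
TOTAL_EDGES = 1617


def totals_match(node_counts, total_nodes):
    """Return True when the per-type node counts sum to the reported total."""
    return sum(node_counts.values()) == total_nodes


def average_degree(total_nodes, total_edges):
    """Mean edges per node, ASSUMING undirected edges with two endpoints each."""
    return 2 * total_edges / total_nodes


if __name__ == "__main__":
    print(totals_match(NODE_COUNTS, TOTAL_NODES))
    print(round(average_degree(TOTAL_NODES, TOTAL_EDGES), 2))
```
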
Qdrant Vector Collections
| Collection | Dims | Points | Content |
|---|---|---|---|
| momentry_dev_v1 | 768 | 3,417 | Sentence chunk embeddings |
| momentry_dev_faces | 512 | 5,910 | Face embeddings (8Hz CoreML) |

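
Because the two collections store vectors of different sizes, a guard that checks dimensionality before an upsert can catch a vector routed to the wrong collection. The sketch below is a plain-Python illustration of such a check and does not call the Qdrant client. `validate_vector` is a hypothetical helper; only the collection names and dimensions come from the report.

```python
# Collection names and vector sizes as listed in the report.
COLLECTION_DIMS = {
    "momentry_dev_v1": 768,     # sentence chunk embeddings
    "momentry_dev_faces": 512,  # face embeddings
}


def validate_vector(collection, vector):
    """Raise if the vector length does not match the collection's dimension."""
    if collection not in COLLECTION_DIMS:
        raise KeyError(f"unknown collection: {collection}")
    expected = COLLECTION_DIMS[collection]
    if len(vector) != expected:
        raise ValueError(
            f"{collection} expects {expected} dims, got {len(vector)}"
        )
    return True


if __name__ == "__main__":
    face_vector = [0.0] * 512
    print(validate_vector("momentry_dev_faces", face_vector))
    try:
        # A face vector sent to the sentence collection is rejected.
        validate_vector("momentry_dev_v1", face_vector)
    except ValueError as err:
        print(err)
```
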
4. Release Package
| Component | Size |
|---|---|
| output_json/ | 11 processor files |
| chunks.csv | 2.2MB |
| vectors.csv | 56MB |
| identities.csv | 973KB |
| schema.sql | 29KB |
| RELEASE_INFO.txt | Metadata |
| Total | 483MB |

Location: release/phase1/v1.0.0_20260509_101337/
5. Key Technical Decisions
| Decision | Rationale |
|---|---|
| Face 8Hz (interval=3) | 5-15Hz human lip motion needs ≥8Hz sampling |
| Two-stage face processor | Apple Vision ANE (fast) + CoreML FaceNet (512D) |
| VNFaceprint not used | KVC returns nil in video pipeline |
| Face Qdrant separate collection | Face 512D vs chunk 768D (different dimensions) |
| LLM reasoning off | `--reasoning off` needed for non-empty content |

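
The relationship between the frame interval and the face sampling rate is simple division: sampling every Nth frame gives the source frame rate divided by N. The sketch below illustrates this. The report does not state the source frame rate, so the 24 fps used here is an assumption chosen because it makes interval=3 yield exactly 8Hz; both function names are hypothetical.

```python
def effective_rate_hz(source_fps, interval):
    """Sampling rate obtained by keeping every `interval`-th frame."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    return source_fps / interval


def sampled_frame_indices(total_frames, interval):
    """Indices of the frames that would be processed at the given interval."""
    return range(0, total_frames, interval)


if __name__ == "__main__":
    # ASSUMPTION: a 24 fps source; the report does not give the frame rate.
    print(effective_rate_hz(24, 3))
    print(list(sampled_frame_indices(10, 3)))
```

If the source were 25 fps instead, the same interval would give roughly 8.3Hz, which still meets the stated ≥8Hz target.
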
6. Phase 2 Preparation
Pending for Phase 2:
- Rule 3 scene chunking (cut-based parent chunks)
- 5W1H Agent (LLM-generated scene summaries)
- Full pipeline + 5W1H release packaging
- Lip analysis extended to full movie speaker binding