momentry_core/docs/PHASE1_COMPLETION_REPORT.md

Phase 1 Completion Report — v1 (base model)

File: Charade (1963) Cary Grant & Audrey Hepburn
UUID: aeed71342a899fe4b4c57b7d41bcb692
Date: 2026-05-09
System: M5 (MacBook Pro, 48GB, Apple Silicon)


1. Processor Outputs

| File | Size | Description |
|---|---|---|
| asr.json | 413KB | 3,417 segments, full movie coverage |
| asrx.json | 307KB | 1,815 segments, 10 speakers |
| cut.json | 329KB | 2,260 scenes |
| yolo.json | 181MB | 169,625 frames with object detections |
| face.json | 106MB | 4,550 frames, 5,910 faces @ 8Hz (CoreML 512D) |
| face_traced.json | 110MB | Traced faces with identity |
| lip.json | 492KB | Lip openness analysis |
| ocr.json | 277KB | 606 OCR frames |
| pose.json | 26MB | 4,211 pose frames |
| scene.json | 403B | Scene classification |

2. Pipeline 8-Stage Checklist

| Stage | Status | Detail |
|---|---|---|
| ASR | | 3,417 segments, last end 6,773s (100%) |
| ASRX | | 1,815 segments, 10 speakers |
| Sentence Chunks | | 3,417 sentence chunks with text |
| Vectorization | | 3,417 PG + Qdrant (768D) |
| Face Trace | | 423 traces, 11,820 detections @ 8Hz |
| TKG Graph | | 498 nodes, 1,617 edges |
| Trace Chunks | | 423 trace chunks with ASR text |
| Phase 1 Release | | 483MB package |
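
One way to read the "100%" on the ASR row is as the end time of the last segment divided by the movie runtime. The sketch below is only an illustration under that assumption: the `end` field name and the `duration_s` argument are hypothetical, since the report states neither the asr.json schema nor the exact runtime.

```python
def asr_coverage(segments, duration_s):
    """Fraction of the runtime reached by the last ASR segment (capped at 1.0).

    `segments` is assumed to be a list of dicts with an "end" time in seconds;
    that layout is a guess, not the documented asr.json schema.
    """
    if not segments or duration_s <= 0:
        return 0.0
    last_end = max(seg["end"] for seg in segments)
    return min(last_end / duration_s, 1.0)


def format_hms(seconds):
    """Render a duration in seconds as H:MM:SS."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# The last ASR segment ends at 6,773 s according to the checklist.
print(format_hms(6773))
```

With the figure from the checklist, the last segment ends a little under two hours into the file.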

3. Identity & Knowledge Graph

TMDb Character Matching (9 characters)

| Actor | Traces | Character |
|---|---|---|
| Audrey Hepburn | 843 | Regina Lampert |
| Cary Grant | 482 | Peter Joshua |
| Jacques Marin | 348 | Inspector Grandpierre |
| James Coburn | 188 | Tex Panthollow |
| Ned Glass | 176 | Leopold W. Gideon |
| George Kennedy | 104 | Herman Scobie |
| Walter Matthau | 104 | Hamilton Bartholomew |
| Dominique Minot | 45 | Sylvie Gaudel |
| Raoul Delfosse | 32 | |

Speaker Bindings (via Lip Verification)

| Speaker | Identity | Confidence |
|---|---|---|
| SPEAKER_2 | Audrey Hepburn | 61% |
| SPEAKER_4 | Cary Grant | 56% |
| SPEAKER_5 | Audrey Hepburn | 100% |
| SPEAKER_6 | Audrey Hepburn | 43% |
| SPEAKER_7 | Cary Grant | 100% |
| SPEAKER_8 | Audrey Hepburn | 54% |
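
Several diarization labels resolve to the same identity, and 6 of the 10 ASRX speakers are bound at all. A minimal sketch of how these bindings could be merged per identity is shown below; the values are copied from the table, while the `min_confidence` cutoff of 0.5 is a hypothetical choice and not a threshold stated in this report.

```python
from collections import defaultdict

# Bindings copied from the table above: (speaker label, identity, confidence).
BINDINGS = [
    ("SPEAKER_2", "Audrey Hepburn", 0.61),
    ("SPEAKER_4", "Cary Grant", 0.56),
    ("SPEAKER_5", "Audrey Hepburn", 1.00),
    ("SPEAKER_6", "Audrey Hepburn", 0.43),
    ("SPEAKER_7", "Cary Grant", 1.00),
    ("SPEAKER_8", "Audrey Hepburn", 0.54),
]


def group_by_identity(bindings, min_confidence=0.5):
    """Collect speaker labels per identity, skipping low-confidence bindings.

    The 0.5 default is an illustrative assumption, not a pipeline setting.
    """
    grouped = defaultdict(list)
    for speaker, identity, confidence in bindings:
        if confidence >= min_confidence:
            grouped[identity].append(speaker)
    return dict(grouped)


merged = group_by_identity(BINDINGS)
for identity, speakers in merged.items():
    print(identity, speakers)
```

Under that cutoff SPEAKER_6 (43%) is left unbound, while the remaining five labels collapse onto two identities.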

TKG Graph

| Node Type | Count |
|---|---|
| Face traces | 423 |
| Objects | 75 |
| Total nodes | 498 |
| Total edges | 1,617 |
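
The graph totals can be sanity-checked with a few lines of arithmetic. The sketch below confirms that the two node types add up to the reported total and derives a mean degree, assuming each edge is counted once and touches two nodes; the report does not say whether edges are directed.

```python
# Counts copied from the table above.
FACE_TRACE_NODES = 423
OBJECT_NODES = 75
TOTAL_NODES = 498
TOTAL_EDGES = 1617


def mean_degree(node_count, edge_count):
    """Average number of edge endpoints per node (each edge has two endpoints)."""
    return 2 * edge_count / node_count


# The per-type node counts should sum to the reported total.
nodes_consistent = FACE_TRACE_NODES + OBJECT_NODES == TOTAL_NODES
print(nodes_consistent, round(mean_degree(TOTAL_NODES, TOTAL_EDGES), 2))
```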

Qdrant Vector Collections

| Collection | Dims | Points | Content |
|---|---|---|---|
| momentry_dev_v1 | 768 | 3,417 | Sentence chunk embeddings |
| momentry_dev_faces | 512 | 5,910 | Face embeddings (8Hz CoreML) |
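
Because the two collections hold vectors of different sizes, a vector written to the wrong collection would not fit. The following is a standalone sketch of a dimension guard that could run before an upsert; it does not call the Qdrant client, and the `validate_vector` helper is hypothetical rather than part of the pipeline.

```python
# Collection names and dimensions copied from the table above.
COLLECTION_DIMS = {
    "momentry_dev_v1": 768,      # sentence chunk embeddings
    "momentry_dev_faces": 512,   # face embeddings
}


def validate_vector(collection, vector):
    """Raise ValueError if the vector length does not match the collection."""
    expected = COLLECTION_DIMS[collection]
    if len(vector) != expected:
        raise ValueError(
            f"{collection} expects {expected} dims, got {len(vector)}"
        )
    return True


# A 512D face embedding fits the faces collection but not the chunk collection.
face_vector = [0.0] * 512
print(validate_vector("momentry_dev_faces", face_vector))
```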

4. Release Package

| Component | Size / Contents |
|---|---|
| output_json/ | 11 processor files |
| chunks.csv | 2.2MB |
| vectors.csv | 56MB |
| identities.csv | 973KB |
| schema.sql | 29KB |
| RELEASE_INFO.txt | Metadata |
| Total | 483MB |

Location: release/phase1/v1.0.0_20260509_101337/

5. Key Technical Decisions

| Decision | Rationale |
|---|---|
| Face 8Hz (interval=3) | 5-15Hz human lip motion needs ≥8Hz sampling |
| Two-stage face processor | Apple Vision ANE (fast) + CoreML FaceNet (512D) |
| VNFaceprint not used | KVC returns nil in video pipeline |
| Face Qdrant separate collection | Face 512D vs chunk 768D: different dimensions |
| LLM reasoning off | `--reasoning off` needed for non-empty content |
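
The "Face 8Hz (interval=3)" setting relates a frame stride to an effective sampling rate. The sketch below shows that arithmetic, assuming a 24 fps source, which is what 8 Hz at interval 3 would imply; the report does not state the source frame rate, and both function names are hypothetical.

```python
def effective_rate_hz(source_fps, interval):
    """Sampling rate obtained by keeping every `interval`-th frame."""
    return source_fps / interval


def sampled_frame_indices(total_frames, interval):
    """Indices of the frames that would be passed to the face processor."""
    return list(range(0, total_frames, interval))


# Assumed 24 fps source: keeping every 3rd frame yields 8 samples per second.
print(effective_rate_hz(24, 3))
print(sampled_frame_indices(10, 3))
```

If the source were 25 fps instead, the same interval would give roughly 8.33 Hz, so the quoted 8 Hz should be read as tied to the actual frame rate of the file.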

6. Phase 2 Preparation

Pending for Phase 2:

  • Rule 3 scene chunking (cut-based parent chunks)
  • 5W1H Agent (LLM-generated scene summaries)
  • Full pipeline + 5W1H release packaging
  • Lip analysis extended to full movie speaker binding
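
As an illustration of what cut-based parent chunks might look like, the sketch below assigns each sentence chunk to the scene whose start time most recently precedes it. "Rule 3" is not defined in this report, so this is an assumption about its general shape; the `assign_to_scenes` helper and the `id`/`start` fields are hypothetical.

```python
from bisect import bisect_right


def assign_to_scenes(scene_starts, chunks):
    """Group sentence chunk ids under the scene that contains their start time.

    scene_starts: sorted scene start times in seconds (e.g. taken from cuts).
    chunks: dicts with hypothetical "id" and "start" fields.
    """
    parents = {index: [] for index in range(len(scene_starts))}
    for chunk in chunks:
        # bisect_right finds the first scene starting after the chunk;
        # the scene before it is the one the chunk falls into.
        index = bisect_right(scene_starts, chunk["start"]) - 1
        parents[max(index, 0)].append(chunk["id"])
    return parents


scene_starts = [0.0, 10.0, 25.0]
chunks = [
    {"id": "c1", "start": 1.0},
    {"id": "c2", "start": 10.0},
    {"id": "c3", "start": 30.0},
]
parents = assign_to_scenes(scene_starts, chunks)
print(parents)
```

A chunk that starts exactly on a cut is placed in the scene that begins at that cut.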