Accusys d043b6adae docs: Phase 1 completion report + LLM reasoning off fix

2026-05-09 22:03:34 +08:00

Phase 1 Completion Report — v1 (base model)

File: Charade (1963) Cary Grant & Audrey Hepburn
UUID: aeed71342a899fe4b4c57b7d41bcb692
Date: 2026-05-09
System: M5 (MacBook Pro, 48GB, Apple Silicon)

1. Processor Outputs

File	Size	Description
`asr.json`	413KB	3,417 segments, full movie coverage
`asrx.json`	307KB	1,815 segments, 10 speakers
`cut.json`	329KB	2,260 scenes
`yolo.json`	181MB	169,625 frames with object detections
`face.json`	106MB	4,550 frames, 5,910 faces @ 8Hz (CoreML 512D)
`face_traced.json`	110MB	Traced faces with identity
`lip.json`	492KB	Lip openness analysis
`ocr.json`	277KB	606 OCR frames
`pose.json`	26MB	4,211 pose frames
`scene.json`	403B	Scene classification
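
A quick completeness check over these outputs could look like the sketch below. The file names come from the table above; the function name `missing_outputs` and the idea of treating zero-byte files as missing are illustrative assumptions, not part of the pipeline.

```python
from pathlib import Path

# File names as listed in the Processor Outputs table.
EXPECTED_OUTPUTS = [
    "asr.json", "asrx.json", "cut.json", "yolo.json", "face.json",
    "face_traced.json", "lip.json", "ocr.json", "pose.json", "scene.json",
]


def missing_outputs(output_dir):
    """Return the expected processor files that are absent or empty.

    Hypothetical helper: the real pipeline may validate outputs differently.
    """
    root = Path(output_dir)
    missing = []
    for name in EXPECTED_OUTPUTS:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

An empty return value means every listed processor wrote a non-empty file.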

2. Pipeline 8-Stage Checklist

Stage	Status	Detail
ASR	✅	3,417 segments; last segment ends at 6,773s (100% coverage)
ASRX	✅	1,815 segments, 10 speakers
Sentence Chunks	✅	3,417 sentence chunks with text
Vectorization	✅	3,417 vectors in PG + Qdrant (768D)
Face Trace	✅	423 traces, 11,820 detections @ 8Hz
TKG Graph	✅	498 nodes, 1,617 edges
Trace Chunks	✅	423 trace chunks with ASR text
Phase 1 Release	✅	483MB package
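
The ASR coverage figure in the checklist can be reproduced with a sketch like the one below. It assumes each segment is a dict with an `end` time in seconds; that schema and the name `asr_coverage` are assumptions, since this report does not show the layout of `asr.json`.

```python
def asr_coverage(segments, duration_s):
    """Fraction of the runtime reached by the last ASR segment.

    Assumes each segment is a dict with an "end" key in seconds
    (hypothetical schema, not confirmed by the report).
    """
    if not segments or duration_s <= 0:
        return 0.0
    last_end = max(seg["end"] for seg in segments)
    # Clamp to 1.0 in case the last segment slightly overruns the runtime.
    return min(last_end / duration_s, 1.0)
```

With a last segment ending at 6,773s and a runtime of 6,773s, the function returns 1.0, matching the 100% reported above.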

3. Identity & Knowledge Graph

TMDb Character Matching (9 characters)

Actor	Traces	Character
Audrey Hepburn	843	Regina Lampert
Cary Grant	482	Peter Joshua
Jacques Marin	348	Inspector Grandpierre
James Coburn	188	Tex Panthollow
Ned Glass	176	Leopold W. Gideon
George Kennedy	104	Herman Scobie
Walter Matthau	104	Hamilton Bartholomew
Dominique Minot	45	Sylvie Gaudel
Raoul Delfosse	32	—

Speaker Bindings (via Lip Verification)

Speaker	Identity	Confidence
SPEAKER_2	Audrey Hepburn	61%
SPEAKER_4	Cary Grant	56%
SPEAKER_5	Audrey Hepburn	100%
SPEAKER_6	Audrey Hepburn	43%
SPEAKER_7	Cary Grant	100%
SPEAKER_8	Audrey Hepburn	54%
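
Several diarization labels resolve to the same person, so downstream code will likely want them grouped by identity. The sketch below uses the rows from the table above; the function name and the 0.5 confidence cut-off in the usage note are illustrative assumptions, not values used by the pipeline.

```python
from collections import defaultdict

# (speaker label, identity, confidence) rows copied from the table above.
BINDINGS = [
    ("SPEAKER_2", "Audrey Hepburn", 0.61),
    ("SPEAKER_4", "Cary Grant", 0.56),
    ("SPEAKER_5", "Audrey Hepburn", 1.00),
    ("SPEAKER_6", "Audrey Hepburn", 0.43),
    ("SPEAKER_7", "Cary Grant", 1.00),
    ("SPEAKER_8", "Audrey Hepburn", 0.54),
]


def speakers_by_identity(bindings, min_confidence=0.0):
    """Group speaker labels by identity, dropping low-confidence bindings."""
    grouped = defaultdict(list)
    for speaker, identity, confidence in bindings:
        if confidence >= min_confidence:
            grouped[identity].append(speaker)
    return dict(grouped)
```

Calling `speakers_by_identity(BINDINGS, min_confidence=0.5)` keeps SPEAKER_2, SPEAKER_5 and SPEAKER_8 for Audrey Hepburn and SPEAKER_4 and SPEAKER_7 for Cary Grant; SPEAKER_6 (43%) falls below the hypothetical threshold.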

TKG Graph

Node Type	Count
Face traces	423
Objects	75
Total nodes	498
Total edges	1,617
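
As a sanity check on these counts, the node types should sum to the total, and the edge count implies an average degree. The sketch below treats edges as undirected for the degree calculation, which is an assumption; the report does not say whether TKG edges are directed.

```python
def average_degree(nodes, edges):
    """Mean number of edge endpoints per node (each edge counted twice)."""
    return 2 * edges / nodes


# Counts copied from the TKG Graph table.
FACE_TRACES, OBJECTS = 423, 75
TOTAL_NODES, TOTAL_EDGES = 498, 1617

assert FACE_TRACES + OBJECTS == TOTAL_NODES
print(f"{average_degree(TOTAL_NODES, TOTAL_EDGES):.2f}")  # prints 6.49
```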

Qdrant Vector Collections

Collection	Dims	Points	Content
`momentry_dev_v1`	768	3,417	Sentence chunk embeddings
`momentry_dev_faces`	512	5,910	Face embeddings (8Hz CoreML)
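
Because the two collections hold vectors of different sizes, a guard before each upsert can catch a vector sent to the wrong collection. This is a minimal sketch in plain Python; `check_vector` is a hypothetical helper and does not call the Qdrant client.

```python
# Collection names and dimensions copied from the table above.
COLLECTION_DIMS = {
    "momentry_dev_v1": 768,      # sentence chunk embeddings
    "momentry_dev_faces": 512,   # face embeddings
}


def check_vector(collection, vector):
    """Raise ValueError if the vector length does not match the collection."""
    expected = COLLECTION_DIMS[collection]
    if len(vector) != expected:
        raise ValueError(
            f"{collection} expects {expected} dims, got {len(vector)}"
        )
    return True
```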

4. Release Package

Component	Size / Contents
`output_json/`	11 processor files
`chunks.csv`	2.2MB
`vectors.csv`	56MB
`identities.csv`	973KB
`schema.sql`	29KB
`RELEASE_INFO.txt`	Metadata
Total	483MB

Location: release/phase1/v1.0.0_20260509_101337/
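
The release directory name appears to encode a semantic version followed by a build date and time. Assuming that `v<major>.<minor>.<patch>_<YYYYMMDD>_<HHMMSS>` layout holds (it is inferred from this one path, not documented here), it can be parsed as sketched below; `parse_release_dir` is a hypothetical helper.

```python
import re
from datetime import datetime

# Assumed layout: v<major>.<minor>.<patch>_<YYYYMMDD>_<HHMMSS>
RELEASE_DIR = re.compile(r"^v(\d+)\.(\d+)\.(\d+)_(\d{8})_(\d{6})$")


def parse_release_dir(name):
    """Split a release directory name into a version tuple and a datetime."""
    match = RELEASE_DIR.match(name)
    if match is None:
        raise ValueError(f"unexpected release directory name: {name!r}")
    version = tuple(int(match.group(i)) for i in (1, 2, 3))
    built_at = datetime.strptime(
        match.group(4) + match.group(5), "%Y%m%d%H%M%S"
    )
    return version, built_at
```

For `v1.0.0_20260509_101337` this yields version `(1, 0, 0)` and a timestamp of 2026-05-09 10:13:37.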

5. Key Technical Decisions

Decision	Rationale
Face sampling at 8Hz (interval=3)	Human lip motion (5-15Hz) needs ≥8Hz sampling
Two-stage face processor	Apple Vision ANE (fast) + CoreML FaceNet (512D)
VNFaceprint not used	KVC returns nil in video pipeline
Face Qdrant separate collection	Face 512D vs chunk 768D — different dimensions
LLM reasoning off	`--reasoning off` is required; otherwise the LLM returns empty content
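
The relation between the frame interval and the face sampling rate is simple division. The sketch below assumes `interval` means "process every Nth frame"; the report does not state the source frame rate, so the 24 fps used in the example is an assumption that happens to give exactly 8Hz.

```python
def sampling_rate_hz(source_fps, interval):
    """Effective sampling rate when every `interval`-th frame is processed."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    return source_fps / interval


# Assumed 24 fps source: every 3rd frame gives 8.0 Hz.
print(sampling_rate_hz(24, 3))
```

At 25 fps the same interval would give roughly 8.33Hz, which still satisfies the ≥8Hz requirement.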

6. Phase 2 Preparation

Pending for Phase 2:

- Rule 3 scene chunking (cut-based parent chunks)
- 5W1H Agent (LLM-generated scene summaries)
- Full pipeline + 5W1H release packaging
- Lip analysis extended to full movie speaker binding