From d043b6adae8014e32ab03faf7269ff5e1e2ca03a Mon Sep 17 00:00:00 2001
From: Accusys
Date: Sat, 9 May 2026 22:03:34 +0800
Subject: [PATCH] docs: Phase 1 completion report + LLM reasoning off fix

---
 docs/PHASE1_COMPLETION_REPORT.md | 111 +++++++++++++++++++++++++++++++
 scripts/start_momentry.sh        |   2 +-
 2 files changed, 112 insertions(+), 1 deletion(-)
 create mode 100644 docs/PHASE1_COMPLETION_REPORT.md

diff --git a/docs/PHASE1_COMPLETION_REPORT.md b/docs/PHASE1_COMPLETION_REPORT.md
new file mode 100644
index 0000000..35c2455
--- /dev/null
+++ b/docs/PHASE1_COMPLETION_REPORT.md
@@ -0,0 +1,111 @@
+# Phase 1 Completion Report — v1 (base model)
+
+**File**: Charade (1963) Cary Grant & Audrey Hepburn
+**UUID**: `aeed71342a899fe4b4c57b7d41bcb692`
+**Date**: 2026-05-09
+**System**: M5 (MacBook Pro, 48GB, Apple Silicon)
+
+---
+
+## 1. Processor Outputs
+
+| File | Size | Description |
+|------|------|-------------|
+| `asr.json` | 413KB | 3,417 segments, full-movie coverage |
+| `asrx.json` | 307KB | 1,815 segments, 10 speakers |
+| `cut.json` | 329KB | 2,260 scenes |
+| `yolo.json` | 181MB | 169,625 frames with object detections |
+| `face.json` | **106MB** | 4,550 frames, 5,910 faces @ 8Hz (CoreML 512D) |
+| `face_traced.json` | 110MB | Traced faces with identity |
+| `lip.json` | 492KB | Lip openness analysis |
+| `ocr.json` | 277KB | 606 OCR frames |
+| `pose.json` | 26MB | 4,211 pose frames |
+| `scene.json` | 403B | Scene classification |
+
+## 2. Pipeline 8-Stage Checklist
+
+| Stage | Status | Detail |
+|-------|--------|--------|
+| ASR | ✅ | 3,417 segments, last segment ends at 6,773s (100%) |
+| ASRX | ✅ | 1,815 segments, 10 speakers |
+| Sentence Chunks | ✅ | 3,417 sentence chunks with text |
+| Vectorization | ✅ | 3,417 embeddings in PG + Qdrant (768D) |
+| Face Trace | ✅ | 423 traces, 11,820 detections @ 8Hz |
+| TKG Graph | ✅ | 498 nodes, 1,617 edges |
+| Trace Chunks | ✅ | 423 trace chunks with ASR text |
+| Phase 1 Release | ✅ | 483MB package |
+
+## 3. Identity & Knowledge Graph
+
+### TMDb Character Matching (9 characters)
+
+| Actor | Traces | Character |
+|-------|--------|-----------|
+| Audrey Hepburn | 843 | Regina Lampert |
+| Cary Grant | 482 | Peter Joshua |
+| Jacques Marin | 348 | Inspector Grandpierre |
+| James Coburn | 188 | Tex Panthollow |
+| Ned Glass | 176 | Leopold W. Gideon |
+| George Kennedy | 104 | Herman Scobie |
+| Walter Matthau | 104 | Hamilton Bartholomew |
+| Dominique Minot | 45 | Sylvie Gaudel |
+| Raoul Delfosse | 32 | — |
+
+### Speaker Bindings (via Lip Verification)
+
+| Speaker | Identity | Confidence |
+|---------|----------|------------|
+| SPEAKER_2 | Audrey Hepburn | 61% |
+| SPEAKER_4 | Cary Grant | 56% |
+| SPEAKER_5 | Audrey Hepburn | 100% |
+| SPEAKER_6 | Audrey Hepburn | 43% |
+| SPEAKER_7 | Cary Grant | 100% |
+| SPEAKER_8 | Audrey Hepburn | 54% |
+
+### TKG Graph
+
+| Node Type | Count |
+|-----------|-------|
+| Face traces | 423 |
+| Objects | 75 |
+| Total nodes | 498 |
+| Total edges | 1,617 |
+
+### Qdrant Vector Collections
+
+| Collection | Dims | Points | Content |
+|-----------|------|--------|---------|
+| `momentry_dev_v1` | 768 | 3,417 | Sentence chunk embeddings |
+| `momentry_dev_faces` | 512 | **5,910** | Face embeddings (8Hz CoreML) |
+
+## 4. Release Package
+
+| Component | Size / Contents |
+|-----------|-----------------|
+| `output_json/` | 11 processor files |
+| `chunks.csv` | 2.2MB |
+| `vectors.csv` | 56MB |
+| `identities.csv` | 973KB |
+| `schema.sql` | 29KB |
+| `RELEASE_INFO.txt` | Metadata |
+| **Total** | **483MB** |
+
+Location: `release/phase1/v1.0.0_20260509_101337/`
+
+## 5. Key Technical Decisions
+
+| Decision | Rationale |
+|----------|-----------|
+| Face 8Hz (interval=3) | 5-15Hz human lip motion needs ≥8Hz sampling |
+| Two-stage face processor | Apple Vision ANE (fast) + CoreML FaceNet (512D) |
+| VNFaceprint not used | KVC access returns nil in the video pipeline |
+| Separate Qdrant collection for faces | Face (512D) and chunk (768D) embeddings differ in dimension |
+| LLM reasoning off | llama-server needs `--reasoning off` to return non-empty content |
+
+## 6. Phase 2 Preparation
+
+Pending for Phase 2:
+- Rule 3 scene chunking (cut-based parent chunks)
+- 5W1H Agent (LLM-generated scene summaries)
+- Full pipeline + 5W1H release packaging
+- Full-movie lip analysis for speaker binding
diff --git a/scripts/start_momentry.sh b/scripts/start_momentry.sh
index 7d4587f..d20f85d 100755
--- a/scripts/start_momentry.sh
+++ b/scripts/start_momentry.sh
@@ -85,7 +85,7 @@ if curl -s -o /dev/null -w "%{http_code}" --connect-timeout 5 http://localhost:8
 else
     LLM_BIN="/Users/accusys/llama/bin/llama-server"
     LLM_MODEL="/Users/accusys/models/google_gemma-4-26B-A4B-it-Q5_K_M.gguf"
-    nohup "$LLM_BIN" -m "$LLM_MODEL" --host 0.0.0.0 --port 8082 -ngl 99 -c 16384 --temp 0.1 --mlock > "$LOG_DIR/llama_server.log" 2>&1 &
+    nohup "$LLM_BIN" -m "$LLM_MODEL" --host 0.0.0.0 --port 8082 -ngl 99 -c 16384 --temp 0.1 --mlock --reasoning off > "$LOG_DIR/llama_server.log" 2>&1 &
     echo -e " ${YELLOW}⏳ loading model (~30s)...${NC}"
     for i in $(seq 1 30); do
         sleep 2
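Note on the "Face 8Hz (interval=3)" decision in the report: the effective face-sampling rate depends on the source frame rate, which the report does not state. The sketch below assumes interval=3 means "run the face processor on every third decoded frame" and computes fps / interval with awk. `face_rate` and `meets_min_rate` are hypothetical helpers, not part of the Momentry scripts.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Momentry. Assumes the face processor's
# interval=N means "run on every Nth decoded frame".

# face_rate FPS INTERVAL: print the effective sampling rate in Hz.
face_rate() {
  awk -v fps="$1" -v n="$2" 'BEGIN { printf "%.2f\n", fps / n }'
}

# meets_min_rate FPS INTERVAL MIN_HZ: exit 0 if FPS / INTERVAL >= MIN_HZ.
meets_min_rate() {
  awk -v fps="$1" -v n="$2" -v min="$3" 'BEGIN { exit !(fps / n >= min) }'
}
```

With an assumed 25 fps source, `face_rate 25 3` prints 8.33. At 23.976 fps the same interval gives about 7.99 Hz and `meets_min_rate 23.976 3 8` fails, so interval=3 clears the 8Hz floor only if the file runs at 24 fps or faster.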
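Note on the "Speaker Bindings (via Lip Verification)" table: the report gives a confidence per speaker but does not say how it is computed. One plausible reading, assumed here, is a majority vote. Each lip-verified ASRX segment votes for the face identity whose lips were moving, and confidence is the winning identity's share of that speaker's votes. `bind_speakers` and its tab-separated input format are hypothetical; this is a sketch, not the pipeline's binding logic.

```shell
#!/bin/sh
# Hypothetical helper, not part of Momentry: bind each ASRX speaker to the
# face identity most often seen speaking in lip-verified segments.
# Input (stdin): one "SPEAKER<TAB>IDENTITY" line per verified segment.
# Output: "SPEAKER<TAB>IDENTITY<TAB>NN%" per speaker, sorted by speaker.
# Ties between identities are broken arbitrarily.
bind_speakers() {
  awk '
    BEGIN { FS = "\t" }
    NF == 2 {
      votes[$1 SUBSEP $2]++   # votes for this (speaker, identity) pair
      total[$1]++             # all verified segments for this speaker
    }
    END {
      for (key in votes) {
        split(key, part, SUBSEP)
        spk = part[1]
        if (votes[key] > best[spk]) { best[spk] = votes[key]; who[spk] = part[2] }
      }
      for (spk in who) {
        # winning share, rounded to the nearest whole percent
        printf "%s\t%s\t%d%%\n", spk, who[spk], int(100 * best[spk] / total[spk] + 0.5)
      }
    }
  ' | sort
}
```

For example, feeding two "SPEAKER_2 / Audrey Hepburn" segments and one "SPEAKER_2 / Cary Grant" segment prints SPEAKER_2, Audrey Hepburn and 67% as tab-separated fields. A real binder would likely also weight votes by lip-openness strength or segment duration; the plain count keeps the idea visible.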