docs: Phase 1 completion report + LLM reasoning off fix
docs/PHASE1_COMPLETION_REPORT.md (new file, 111 lines)
@@ -0,0 +1,111 @@
# Phase 1 Completion Report: v1 (base model)

**File**: Charade (1963) Cary Grant & Audrey Hepburn
**UUID**: `aeed71342a899fe4b4c57b7d41bcb692`
**Date**: 2026-05-09
**System**: M5 (MacBook Pro, 48GB, Apple Silicon)

---

## 1. Processor Outputs

| File | Size | Description |
|------|------|-------------|
| `asr.json` | 413KB | 3,417 segments, full movie coverage |
| `asrx.json` | 307KB | 1,815 segments, 10 speakers |
| `cut.json` | 329KB | 2,260 scenes |
| `yolo.json` | 181MB | 169,625 frames with object detections |
| `face.json` | **106MB** | 4,550 frames, 5,910 faces @ 8Hz (CoreML 512D) |
| `face_traced.json` | 110MB | Traced faces with identity |
| `lip.json` | 492KB | Lip openness analysis |
| `ocr.json` | 277KB | 606 OCR frames |
| `pose.json` | 26MB | 4,211 pose frames |
| `scene.json` | 403B | Scene classification |

## 2. Pipeline 8-Stage Checklist

| Stage | Status | Detail |
|-------|--------|--------|
| ASR | ✅ | 3,417 segments, last end 6,773s (100%) |
| ASRX | ✅ | 1,815 segments, 10 speakers |
| Sentence Chunks | ✅ | 3,417 sentence chunks with text |
| Vectorization | ✅ | 3,417 PG + Qdrant (768D) |
| Face Trace | ✅ | 423 traces, 11,820 detections @ 8Hz |
| TKG Graph | ✅ | 498 nodes, 1,617 edges |
| Trace Chunks | ✅ | 423 trace chunks with ASR text |
| Phase 1 Release | ✅ | 483MB package |

## 3. Identity & Knowledge Graph

### TMDb Character Matching (9 characters)

| Actor | Traces | Character |
|-------|--------|-----------|
| Audrey Hepburn | 843 | Regina Lampert |
| Cary Grant | 482 | Peter Joshua |
| Jacques Marin | 348 | Inspector Grandpierre |
| James Coburn | 188 | Tex Panthollow |
| Ned Glass | 176 | Leopold W. Gideon |
| George Kennedy | 104 | Herman Scobie |
| Walter Matthau | 104 | Hamilton Bartholomew |
| Dominique Minot | 45 | Sylvie Gaudel |
| Raoul Delfosse | 32 | - |

### Speaker Bindings (via Lip Verification)

| Speaker | Identity | Confidence |
|---------|----------|------------|
| SPEAKER_2 | Audrey Hepburn | 61% |
| SPEAKER_4 | Cary Grant | 56% |
| SPEAKER_5 | Audrey Hepburn | 100% |
| SPEAKER_6 | Audrey Hepburn | 43% |
| SPEAKER_7 | Cary Grant | 100% |
| SPEAKER_8 | Audrey Hepburn | 54% |

### TKG Graph

| Node Type | Count |
|-----------|-------|
| Face traces | 423 |
| Objects | 75 |
| Total nodes | 498 |
| Total edges | 1,617 |

### Qdrant Vector Collections

| Collection | Dims | Points | Content |
|-----------|------|--------|---------|
| `momentry_dev_v1` | 768 | 3,417 | Sentence chunk embeddings |
| `momentry_dev_faces` | 512 | **5,910** | Face embeddings (8Hz CoreML) |

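The two collections above differ only in vector size. As a rough sketch of how such collections could be created through Qdrant's REST API, the helper below builds the request body. The `collection_payload` name, the Cosine distance metric, and the `localhost:6333` address are assumptions for illustration; the report does not state them.

```shell
# Build the JSON body for Qdrant's "create collection" call.
# Assumption: Cosine distance; the report does not name the metric.
collection_payload() {
    local size="$1" distance="${2:-Cosine}"
    printf '{"vectors":{"size":%d,"distance":"%s"}}' "$size" "$distance"
}

# Hypothetical usage against a local Qdrant on its default port 6333:
#   curl -X PUT "http://localhost:6333/collections/momentry_dev_faces" \
#        -H "Content-Type: application/json" \
#        -d "$(collection_payload 512)"
#   curl -X PUT "http://localhost:6333/collections/momentry_dev_v1" \
#        -H "Content-Type: application/json" \
#        -d "$(collection_payload 768)"

# Print both payloads for inspection.
collection_payload 512; echo
collection_payload 768; echo
```

Because a Qdrant collection with a single unnamed vector has one fixed size, 512D face embeddings and 768D sentence embeddings need separate collections, which matches the layout in the table.
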
## 4. Release Package

| Component | Size or contents |
|-----------|------------------|
| `output_json/` | 11 processor files |
| `chunks.csv` | 2.2MB |
| `vectors.csv` | 56MB |
| `identities.csv` | 973KB |
| `schema.sql` | 29KB |
| `RELEASE_INFO.txt` | Metadata |
| **Total** | **483MB** |

Location: `release/phase1/v1.0.0_20260509_101337/`

## 5. Key Technical Decisions

| Decision | Rationale |
|----------|-----------|
| Face sampling at 8Hz (interval=3) | Human lip motion at 5-15Hz needs ≥8Hz sampling |
| Two-stage face processor | Apple Vision on the ANE (fast detection) + CoreML FaceNet (512D embeddings) |
| VNFaceprint not used | KVC access returns nil in the video pipeline |
| Separate Qdrant collection for faces | Face embeddings are 512D and chunk embeddings are 768D, so the dimensions do not match |
| LLM reasoning off | The server must be started with `--reasoning off` to return non-empty content |

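The first decision ties the sampling rate to the frame interval: analysing every Nth frame gives an effective rate of fps / N. A minimal sketch of that arithmetic follows. The report does not state the source frame rate, so the 24 and 25 fps inputs are assumptions, and `effective_hz` is a hypothetical helper name.

```shell
# Effective sampling rate when only every Nth frame is analysed.
effective_hz() {
    local fps="$1" interval="$2"
    awk -v fps="$fps" -v n="$interval" 'BEGIN { printf "%.2f\n", fps / n }'
}

effective_hz 24 3   # assumed 24 fps source: prints 8.00
effective_hz 25 3   # assumed 25 fps source: prints 8.33
```

Either assumed frame rate lands at or just above the 8Hz target with `interval=3`.
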
## 6. Phase 2 Preparation

Pending for Phase 2:
- Rule 3 scene chunking (cut-based parent chunks)
- 5W1H Agent (LLM-generated scene summaries)
- Full pipeline + 5W1H release packaging
- Lip analysis extended to full movie speaker binding

@@ -85,7 +85,7 @@ if curl -s -o /dev/null -w "%{http_code}" --connect-timeout 5 http://localhost:8
 else
 LLM_BIN="/Users/accusys/llama/bin/llama-server"
 LLM_MODEL="/Users/accusys/models/google_gemma-4-26B-A4B-it-Q5_K_M.gguf"
-nohup "$LLM_BIN" -m "$LLM_MODEL" --host 0.0.0.0 --port 8082 -ngl 99 -c 16384 --temp 0.1 --mlock > "$LOG_DIR/llama_server.log" 2>&1 &
+nohup "$LLM_BIN" -m "$LLM_MODEL" --host 0.0.0.0 --port 8082 -ngl 99 -c 16384 --temp 0.1 --mlock --reasoning off > "$LOG_DIR/llama_server.log" 2>&1 &
 echo -e " ${YELLOW}⏳ loading model (~30s)...${NC}"
 for i in $(seq 1 30); do
 sleep 2
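The hunk stops inside the loop that waits for the model to load, so the rest of that loop is not visible here. As a rough sketch of the general pattern (not the script's actual code), a retry helper can poll a probe command until it succeeds. The `wait_until` name and the use of llama-server's `/health` endpoint on port 8082 are assumptions for illustration.

```shell
# Retry a probe command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
    local attempts="$1" delay="$2"
    shift 2
    local i
    for i in $(seq 1 "$attempts"); do
        # Probe output is discarded; only the exit status matters.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
    done
    return 1
}

# Hypothetical usage, mirroring the 30 x 2s budget in the hunk:
#   if wait_until 30 2 curl -sf --connect-timeout 5 http://localhost:8082/health; then
#       echo "llama-server ready"
#   else
#       echo "llama-server did not come up in time" >&2
#   fi
```

Passing the probe as arguments keeps the helper independent of any particular endpoint, so the same function can wait on other services the pipeline depends on.
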