feat: ASRX hybrid pipeline, identity history, worker fixes, checkpoint system
242
docs_v1.0/M4_workspace/2026-05-27_charade_pipeline_checklist.md
Normal file
@@ -0,0 +1,242 @@
---
title: Charade Full Movie Pipeline Checklist
version: 1.0
date: 2026-05-27
author: M5Max48
status: in_progress
---

# Charade Full Movie Pipeline Checklist

**File UUID**: `c3c635e3641da80dde10cc555ffcdda5`
**File Name**: Charade (1963) Cary Grant & Audrey Hepburn | Comedy Mystery Romance Thriller | Full Movie.mp4
**Duration**: 6785 seconds (113 minutes)
**Total Frames**: 169,625

---

## P0: Processor Outputs

### Purpose
Raw processor output files, stored in `/Users/accusys/momentry/output_dev/`. These are the data source for the later ingestion steps.

### Processor Details

| Processor | Expected Output | Size Estimate | Purpose | Status |
|-----------|-----------------|---------------|---------|--------|
| CUT | `c3c635e3641da80dde10cc555ffcdda5.cut.json` | ~170KB | Scene boundary detection; cut points feed Rule 3 chunking | ✅ Done |
| YOLO | `c3c635e3641da80dde10cc555ffcdda5.yolo.json` | ~50-80MB | Object detection: object classes and positions per frame | 🔄 Running |
| Face | `c3c635e3641da80dde10cc555ffcdda5.face.json` | ~1.5GB | Face detection + 512-dim embedding (FaceNet CoreML) | 🔄 44% |
| Face Traced | `c3c635e3641da80dde10cc555ffcdda5.face_traced.json` | ~1.2GB | Face tracking: consecutive appearances of the same person → trace_id | ⏳ Pending (after Face) |
| OCR | `c3c635e3641da80dde10cc555ffcdda5.ocr.json` | ~50KB | Text recognition from frames | ❌ Skipped |
| Pose | `c3c635e3641da80dde10cc555ffcdda5.pose.json` | ~20MB | Body pose estimation | 🔄 Running |
| ASRX | `c3c635e3641da80dde10cc555ffcdda5.asrx.json` | ~8MB | Speaker diarization (speaker segmentation) | ✅ Done (reuse from public) |
| Visual Chunk | `c3c635e3641da80dde10cc555ffcdda5.visual_chunk.json` | ~60KB | Visual scene chunk metadata | ✅ Done |
| Scene | `c3c635e3641da80dde10cc555ffcdda5.scene.json` | ~300B | Scene list from CUT | ✅ Done |
| Scene Meta | `c3c635e3641da80dde10cc555ffcdda5.scene_meta.json` | ~50KB | Heuristic scene metadata (people + object statistics) | ⏳ Pending |
| Story LLM | `c3c635e3641da80dde10cc555ffcdda5.story_llm.json` | ~800KB | LLM-generated story summaries per chunk | ✅ Done |
| Story Story | `c3c635e3641da80dde10cc555ffcdda5.story_story.json` | ~800KB | Story parent-child relationships | ✅ Done |
| TMDb | `c3c635e3641da80dde10cc555ffcdda5.tmdb.json` | ~5KB | TMDb cast list with face embeddings | ⏳ Pending |
| 5W1H | `c3c635e3641da80dde10cc555ffcdda5.5w1h.json` | ~500KB | 5W1H agent output (who/when/where/what/why/how) | ✅ Done |

### Key Dependencies
- Face Traced can run only after Face completes (face_traced.json = face.json + tracking)
- Scene Meta requires both Face and YOLO to complete
- TMDb matching runs after Face Traced completes

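The dependencies listed above can be turned into a run order mechanically. The following is only a sketch: the lowercase processor keys and the `run_order` helper are illustrative names, not taken from the worker code, and the real scheduler may order jobs differently.

```python
from graphlib import TopologicalSorter

# Processor -> processors that must finish first (from the Key Dependencies list).
# The lowercase keys are illustrative; the worker may use different identifiers.
DEPENDENCIES = {
    "face_traced": {"face"},
    "scene_meta": {"face", "yolo"},
    "tmdb": {"face_traced"},
}


def run_order(dependencies):
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(DEPENDENCIES)
print(order)  # several orders are valid; "face" always precedes "face_traced"
```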
---

## P1: Database Records

### Purpose
Store processor outputs in PostgreSQL so the API can query them.

### Table Details

| Table | Expected Records | Purpose | Verification Query | Status |
|-------|------------------|---------|-------------------|--------|
| `dev.videos` | 1 row | Video metadata (duration, fps, status) | `SELECT file_uuid, status FROM dev.videos WHERE file_uuid = 'c3c635e3641da80dde10cc555ffcdda5'` | ✅ Registered |
| `dev.monitor_jobs` | 1 row | Processing job state machine | `SELECT uuid, status, completed_processors FROM dev.monitor_jobs WHERE uuid = 'c3c635e3641da80dde10cc555ffcdda5'` | 🔄 Running |
| `dev.pre_chunks` | ~7,000 rows | Raw processor outputs (ASR sentences, YOLO objects, etc.) | `SELECT COUNT(*) FROM dev.pre_chunks WHERE file_uuid = 'c3c635e3641da80dde10cc555ffcdda5'` | ⏳ Pending |
| `dev.face_detections` | ~70,000 rows | Face detection records (one per face per frame) | `SELECT COUNT(*) FROM dev.face_detections WHERE file_uuid = 'c3c635e3641da80dde10cc555ffcdda5'` | ⏳ Pending |
| `dev.face_detections.embedding` | ~70,000 non-NULL | 512-dim FaceNet embedding (used for identity matching) | `SELECT COUNT(embedding) FROM dev.face_detections WHERE file_uuid = 'c3c635e3641da80dde10cc555ffcdda5'` | ⏳ Pending |
| `dev.face_detections.trace_id` | ~70,000 non-NULL | Face tracking ID (same person appearing across consecutive frames) | `SELECT COUNT(trace_id) FROM dev.face_detections WHERE file_uuid = 'c3c635e3641da80dde10cc555ffcdda5'` | ⏳ Pending |
| `dev.face_detections.identity_id` | ~50,000 non-NULL | TMDb identity binding (Audrey, Cary, etc.) | `SELECT COUNT(identity_id) FROM dev.face_detections WHERE file_uuid = 'c3c635e3641da80dde10cc555ffcdda5'` | ⏳ Pending |

### Key Points
- `embedding` must be non-NULL before TMDb matching can run (see the earlier `store_traced_faces.py` bug fix)
- `trace_id` is computed by `store_traced_faces.py` from face_traced.json
- `identity_id` is computed by `match_faces_to_tmdb.py` (cosine similarity > 0.5)

---

## P2: Chunk Ingestion

### Purpose
Convert raw processor outputs into searchable chunks for RAG queries.

### Chunk Types

| Chunk Type | Expected Count | Purpose | Source | Verification Query | Status |
|------------|----------------|---------|--------|-------------------|--------|
| sentence (Rule 1) | ~1,700 | Sentence-level chunks for text search | ASR output → sentence split | `SELECT COUNT(*) FROM dev.chunk WHERE file_uuid = 'c3c635e3641da80dde10cc555ffcdda5' AND chunk_type = 'sentence'` | ⏳ Pending |
| llm_parent | ~800 | LLM-generated summary parent chunks | Story LLM output | `SELECT COUNT(*) FROM dev.chunk WHERE file_uuid = 'c3c635e3641da80dde10cc555ffcdda5' AND chunk_type = 'llm_parent'` | ⏳ Pending |
| story_parent | ~800 | Story parent chunks (narrative segments) | Story processor | `SELECT COUNT(*) FROM dev.chunk WHERE file_uuid = 'c3c635e3641da80dde10cc555ffcdda5' AND chunk_type = 'story_parent'` | ⏳ Pending |
| story_child | ~1,700 | Story child chunks (linked to sentence) | Story processor | `SELECT COUNT(*) FROM dev.chunk WHERE file_uuid = 'c3c635e3641da80dde10cc555ffcdda5' AND chunk_type = 'story_child'` | ⏳ Pending |
| cut (Rule 3) | ~500 | Scene-level chunks for scene search | CUT output → scene boundaries | `SELECT COUNT(*) FROM dev.chunk WHERE file_uuid = 'c3c635e3641da80dde10cc555ffcdda5' AND chunk_type = 'cut'` | ⏳ Pending |
| trace | ~3,600 | Face trace chunks (identity-centric) | Face Traced output | `SELECT COUNT(*) FROM dev.chunk WHERE file_uuid = 'c3c635e3641da80dde10cc555ffcdda5' AND chunk_type = 'trace'` | ⏳ Pending |

### Ingestion Pipeline
1. **Rule 1**: ASR → sentence split → chunk + embedding → Qdrant
2. **Rule 3**: CUT + ASR → scene chunks → chunk + embedding → Qdrant
3. **Trace**: Face Traced → trace chunks → TKG nodes → Qdrant

### Key Points
- `start_frame` / `end_frame` must be computed correctly (earlier bug: frame=0)
- Chunks must have an `embedding` to be searchable

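Since the `frame=0` bug is called out above, here is a minimal sketch of deriving frame numbers from chunk timestamps. It assumes a constant frame rate, which the file header implies (169,625 frames / 6785 s = 25 fps). The helper names and the floor rounding are assumptions, not the ingestion code itself, and the 358.0 s end time is a made-up value.

```python
import math

# Frame rate implied by the file header: 169,625 frames over 6785 seconds.
FPS = 169_625 / 6785  # 25.0


def time_to_frame(seconds: float, fps: float) -> int:
    """Map a timestamp to the index of the frame on screen at that moment."""
    return math.floor(seconds * fps)


def chunk_frames(start_time: float, end_time: float, fps: float = FPS):
    """Return (start_frame, end_frame) for a chunk, rejecting invalid input."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    if end_time < start_time:
        raise ValueError("end_time precedes start_time")
    return time_to_frame(start_time, fps), time_to_frame(end_time, fps)


# A chunk starting at 355.0 s (hypothetical end at 358.0 s):
print(chunk_frames(355.0, 358.0))  # → (8875, 8950)
```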
---

## P3: Vector Embeddings

### Purpose
Convert chunk text into 768-dim embeddings, stored in PostgreSQL + Qdrant, for semantic search.

### Embedding Targets

| Target | Expected Count | Model | Purpose | Verification | Status |
|--------|----------------|-------|---------|--------------|--------|
| PostgreSQL `dev.chunk.embedding` | ~5,000 | Gemma-2-9B (768-dim) | Text semantic search | `SELECT COUNT(embedding) FROM dev.chunk WHERE file_uuid = 'c3c635e3641da80dde10cc555ffcdda5'` | ⏳ Pending |
| Qdrant `momentry_dev_rule1_v2` | ~5,000 points | Gemma-2-9B | Fast vector similarity search | `curl -H "api-key: Test3200Test3200Test3200" "http://localhost:6333/collections/momentry_dev_rule1_v2"` | ⏳ Pending |
| Qdrant `_face` collection | ~70,000 points | FaceNet-512 (512-dim) | Face identity search | Face embeddings sync via `sync_face_embeddings()` | ⏳ Pending |

### Embedding Pipeline
1. **Text chunks**: `embeddinggemma_server.py` (port 11436) → 768-dim embedding
2. **Face embeddings**: FaceNet CoreML (from face.json) → 512-dim embedding (already produced in P0)
3. **Sync to Qdrant**: `sync_face_embeddings()` function in Rust

### Key Points
- Text embeddings use Gemma-2-9B (local LLM server)
- Face embeddings use FaceNet-512 (CoreML, ANE accelerated)
- Qdrant provides fast similarity search (cosine similarity)

---

## P4: Identity Binding

### Purpose
Bind detected faces to TMDb identities (Audrey Hepburn, Cary Grant, etc.) for identity_text search.

### Identity Matching Pipeline

| Step | Expected Result | Method | Verification | Status |
|------|-----------------|--------|--------------|--------|
| TMDb seeds loaded | 23 identities | `tmdb_embed_extractor.py` → TMDb profile face embeddings | `SELECT COUNT(*) FROM dev.identities WHERE source = 'tmdb' AND face_embedding IS NOT NULL` | ✅ Done |
| Face matching | ~50,000 bindings | `match_faces_to_tmdb.py` → cosine similarity > 0.5 | `SELECT COUNT(identity_id) FROM dev.face_detections WHERE file_uuid = 'c3c635e3641da80dde10cc555ffcdda5' AND identity_id IS NOT NULL` | ⏳ Pending |
| Audrey Hepburn faces | ~16,000 | Highest similarity match | `SELECT COUNT(*) FROM dev.face_detections fd JOIN dev.identities i ON fd.identity_id = i.id WHERE fd.file_uuid = 'c3c635e3641da80dde10cc555ffcdda5' AND i.name = 'Audrey Hepburn'` | ⏳ Pending |
| Cary Grant faces | ~5,000 | Second highest match | Same query for Cary Grant | ⏳ Pending |

### Matching Algorithm
```python
# match_faces_to_tmdb.py (simplified outline of the matching loop)
# Assumes `traces` maps trace_id -> list of face embeddings.
matches = {}
for trace_id, trace_faces in traces.items():
    for face_embedding in trace_faces:
        for tmdb_identity in tmdb_identities:
            similarity = cosine_similarity(face_embedding, tmdb_identity.face_embedding)
            if similarity >= 0.5:
                matches[trace_id] = tmdb_identity  # match trace_id → tmdb_identity
```

### Key Points
- TMDb seeds need a `face_embedding` (verified earlier: 23 identities with embeddings)
- Face `embedding` must be non-NULL (see the earlier `store_traced_faces.py` bug fix)
- Threshold: 0.5 (adjustable)

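For reference, a self-contained sketch of threshold-based cosine matching is shown here. It is not the contents of `match_faces_to_tmdb.py`: the `best_identity` helper, the strict `>` comparison, and the choice to keep the single highest-scoring identity are assumptions, and the toy 3-dim vectors stand in for the 512-dim FaceNet embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_identity(face_embedding, identities, threshold=0.5):
    """Return (name, similarity) for the best match above threshold, else None."""
    best = None
    for name, reference in identities.items():
        similarity = cosine_similarity(face_embedding, reference)
        if similarity > threshold and (best is None or similarity > best[1]):
            best = (name, similarity)
    return best


# Toy reference embeddings (hypothetical values, 3-dim instead of 512-dim).
identities = {
    "Audrey Hepburn": [1.0, 0.0, 0.0],
    "Cary Grant": [0.0, 1.0, 0.0],
}
match = best_identity([0.9, 0.1, 0.0], identities)    # close to the first seed
unknown = best_identity([0.0, 0.0, 1.0], identities)  # orthogonal to both seeds
print(match[0], unknown)  # → Audrey Hepburn None
```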
---

## P5: API Endpoints

### Purpose
Verify that the API endpoints return correct identity_text search results.

### API Tests

| Endpoint | Purpose | Expected Response | Test Command | Status |
|----------|---------|-------------------|--------------|--------|
| `/api/v1/search/identity_text` | Search chunk text → identities | Results with `identity_name`, `trace_id`, `identity_source` | `curl "http://localhost:3003/api/v1/search/identity_text?file_uuid=c3c635e3641da80dde10cc555ffcdda5&q=Regina&limit=5"` | ⏳ Pending |
| `/api/v1/identities` | List identities with TMDb | Identity list with `tmdb_id`, `face_embedding` | `curl "http://localhost:3003/api/v1/identities?name=Audrey"` | ⏳ Pending |
| `/api/v1/progress/:file_uuid` | Check processing progress | JSON with `status`, `completed_processors` | `curl "http://localhost:3003/api/v1/progress/c3c635e3641da80dde10cc555ffcdda5"` | ⏳ Pending |

### Expected API Response Example
```json
{
  "success": true,
  "total": 5,
  "results": [
    {
      "chunk_id": "sentence_123",
      "start_time": 355.0,
      "text_content": "Oh, mine's Regina Lampert.",
      "identity_id": 9,
      "identity_name": "Audrey Hepburn",
      "identity_source": "tmdb",
      "trace_id": 169
    }
  ]
}
```

### Key Points
- The `identity_text` API requires correct `chunk.start_frame` / `chunk.end_frame` values (earlier bug: frame=0)
- `identity_id` must be non-NULL for `identity_name` to be returned

---

## P6: Completion Criteria

### Purpose
Verify that the pipeline has fully completed and every ingestion step succeeded.

### Final Verification Checklist

| Criteria | Purpose | Check Command | Expected Result | Status |
|----------|---------|---------------|-----------------|--------|
| All processor outputs exist | Confirm all processor JSON files were generated | `ls -la output_dev/c3c635e3641da80dde10cc555ffcdda5.*` | 14+ files with size > 0 | ⏳ Pending |
| Job status = completed | Confirm the worker finished the job | `SELECT status FROM dev.monitor_jobs WHERE uuid = 'c3c635e3641da80dde10cc555ffcdda5'` | `completed` | ⏳ Pending |
| Video status = completed | Confirm the video state was updated | `SELECT status FROM dev.videos WHERE file_uuid = 'c3c635e3641da80dde10cc555ffcdda5'` | `completed` | ⏳ Pending |
| All chunks have embeddings | Confirm text embeddings are complete | `SELECT COUNT(*) = COUNT(embedding) FROM dev.chunk WHERE file_uuid = 'c3c635e3641da80dde10cc555ffcdda5'` | `true` (all chunks have embedding) | ⏳ Pending |
| Face traces assigned | Confirm face tracking is complete | `SELECT COUNT(*) = COUNT(trace_id) FROM dev.face_detections WHERE file_uuid = 'c3c635e3641da80dde10cc555ffcdda5'` | `true` (all faces have trace_id) | ⏳ Pending |
| TMDb matching done | Confirm identity binding is complete | `SELECT COUNT(identity_id) > 40000 FROM dev.face_detections WHERE file_uuid = 'c3c635e3641da80dde10cc555ffcdda5'` | `true` (> 40K identity bindings) | ⏳ Pending |
| Qdrant synced | Confirm vector search is ready | Check Qdrant points count | Points increased by ~5,000 | ⏳ Pending |

### Success Thresholds
- **Face detections**: ~70,000 (169K frames / 3 sample interval)
- **Identity bindings**: > 40,000 (60% match rate)
- **Chunks with embeddings**: > 4,000 (all chunk types)
- **Qdrant points**: > 90,000 (current) → > 95,000 (after Charade)

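The thresholds above rest on some quick arithmetic, worked through here. All inputs come from this checklist. Reading "3 sample interval" as "every third frame is sampled" is an assumption, and under that reading ~70,000 detections corresponds to about 1.24 faces per sampled frame, while 40,000 bindings is about 57% of 70,000 (stated above as roughly 60%).

```python
TOTAL_FRAMES = 169_625        # from the file header
DURATION_S = 6785             # from the file header
SAMPLE_INTERVAL = 3           # assumed meaning: every third frame is sampled
EXPECTED_DETECTIONS = 70_000  # P1 / P6 estimate
MIN_BINDINGS = 40_000         # P6 threshold

fps = TOTAL_FRAMES / DURATION_S
sampled_frames = TOTAL_FRAMES // SAMPLE_INTERVAL
faces_per_sampled_frame = EXPECTED_DETECTIONS / sampled_frames
min_match_rate = MIN_BINDINGS / EXPECTED_DETECTIONS

print(f"fps={fps}")                                                 # fps=25.0
print(f"sampled_frames={sampled_frames}")                           # sampled_frames=56541
print(f"faces_per_sampled_frame={faces_per_sampled_frame:.2f}")     # 1.24
print(f"min_match_rate={min_match_rate:.1%}")                       # 57.1%
```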
---

## Verification Script

```bash
# Run after completion
./scripts/verify_charade_pipeline.sh c3c635e3641da80dde10cc555ffcdda5
```

---

## Notes

- OCR processor failed, skipped
- Face detection using SwiftFace (ANE accelerated)
- TMDb matching using `scripts/match_faces_to_tmdb.py`
- Expected total processing time: ~2-3 hours

---

## Version History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-05-27 | M5Max48 | Initial checklist |
@@ -0,0 +1,49 @@
# Session Summary: Identity Fixes + WP Proxy Fixes + Data Sync

**Date**: 2026-05-29
**Author**: OpenCode
**Status**: Completed (marcom team testing)

## What Was Done (Chronological)

### 1. Production Identity Fixes (3002)
- **James Coburn restored** (id=18738, confirmed)
- **Chantal Goya restored** (id=18737, confirmed)
- **Louis Viret name/status fixed**
- **Sequences fixed**: `identities_id_seq` (48→18734), `face_detections_id_seq` (141383→932413), `identity_history_id_seq`, `identity_bindings_id_seq`, `pre_chunks_id_seq`, `file_identities_id_seq`
- **COALESCE fix** for `reference_data` NULL crash (`postgres_db.rs:3198`, `storage.rs:196`)

### 2. Bug Fixes
- **DELETE identity**: Fixed binding order bug + removed `identity_confidence` column reference
- **PATCH identity**: `jsonb_deep_merge` for nested JSON metadata
- **mergeinto UNDO/REDO**: MongoDB deserialization fix (`Collection<Document>`)

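To illustrate what a deep merge of nested metadata means for the PATCH fix above, here is a small Python sketch. It is not the `jsonb_deep_merge` SQL function, whose definition is not shown in this commit view. The `deep_merge` helper and the sample metadata keys are hypothetical.

```python
def deep_merge(base: dict, patch: dict) -> dict:
    """Merge patch into base recursively without mutating either input.

    Nested dicts are merged key by key; any other value in patch
    replaces the value in base.
    """
    merged = dict(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical identity metadata: only the nested "name" field is patched.
stored = {"tmdb": {"id": 1, "name": "old"}, "source": "tmdb"}
patched = deep_merge(stored, {"tmdb": {"name": "new"}})
print(patched)  # → {'tmdb': {'id': 1, 'name': 'new'}, 'source': 'tmdb'}
```

A shallow merge would have replaced the whole `tmdb` object and dropped `id`, which is the kind of data loss a deep merge avoids.
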
### 3. Library Page Infinite Load Fix
- **Root cause**: WP scan proxy (snippet 48) didn't forward query params → infinite pagination loop
- **Fix**: Added `$request->get_query_params()` forwarding in scan proxy
- **Safety**: Added `maxPages = 10` limit in JS pagination

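The failure mode and the `maxPages` guard can be reproduced in a few lines. This is a Python analogy of the JS pagination loop, not snippet 55 itself: `fetch_all`, `broken_proxy`, and the `(items, has_more)` return shape are illustrative assumptions.

```python
MAX_PAGES = 10  # mirrors the maxPages = 10 safety limit


def fetch_all(fetch_page, max_pages=MAX_PAGES):
    """Collect items page by page, stopping at max_pages even if the
    backend keeps reporting that more pages exist."""
    items = []
    page = 1
    while page <= max_pages:
        batch, has_more = fetch_page(page)
        items.extend(batch)
        if not has_more:
            break
        page += 1
    return items


def broken_proxy(page):
    """Simulates the bug: the page parameter is dropped, so every request
    returns the first page and claims there is more to load."""
    return ["file-a"], True


result = fetch_all(broken_proxy)
print(len(result))  # → 10 (bounded by the guard instead of looping forever)
```

The guard only bounds the damage; forwarding the query parameters is what restores correct pagination.
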
### 4. Identity Data Sync (Dev → Production)
- **Full replacement** of `public.identities`, `public.identity_bindings`, `public.identity_history` with dev data
- James Coburn id: 18738 → 11
- Bindings: 11,892 → 12,834 (+942)
- **Verification**: 0 differences between schemas

### 5. Snippet 55 Filter
- Added `.filter(f => f.is_registered)` to show only registered files on library page
- Changed `status:'unregistered'` → `status: f.status || 'unregistered'`

## Key Decisions
- Library page filter: show registered files only by default
- Identity sync: full DELETE + INSERT (not UPDATE) to ensure consistency
- No user-defined metadata fields (starred/notes/role) preserved; production matches dev exactly

## Handoff to Marcom
- `/people/` page should show correct identity state
- `/library/` page should show only registered files (4 currently)
- Login required for `/library/`; unauthenticated users are redirected to `/login/`

## Files Modified
- `snippet 48` (/scan WP proxy: query param forwarding)
- `snippet 55` (library page JS: registered-only filter, maxPages safety)
- `docs_v1.0/M4_workspace/2026-05-29_identity_sync_prod.md` (sync record)
45
docs_v1.0/M4_workspace/2026-05-29_identity_sync_prod.md
Normal file
@@ -0,0 +1,45 @@
# Identity Data Sync: Dev (3003) → Production (3002)

**Date**: 2026-05-29
**Author**: OpenCode
**Status**: Completed

## Summary

Fully synced all identity-related tables from dev schema to public schema on PostgreSQL `momentry` database.

## What Was Done

1. **Identities table** (`public.identities`): Replaced with `dev.identities` (69 records, original ids preserved)
2. **Identity_bindings** (`public.identity_bindings`): Replaced with `dev.identity_bindings` (12,834 records)
3. **Identity_history** (`public.identity_history`): Replaced with `dev.identity_history` (10 records)
4. **Sequences**: Updated `identities_id_seq`, `identity_bindings_id_seq`, `identity_history_id_seq` to match

### Key Changes
- **James Coburn**: Changed from id=18738 → id=11 (dev's original id)
- **Chantal Goya**: Changed from id=18737 → id=18736 (dev's id)
- **Metadata**: Now matches dev schema: TMDB fields only, no user-defined fields (starred, notes, role, aliases, user_confirmed are removed as expected)
- **Bindings**: Increased from 11,892 → 12,834 (+942 bindings)

### Not Changed
- `face_detections`: identical in both schemas (135,521 records)
- `pre_chunks`: large difference (public: 1.3M vs dev: 3.3M), but not related to identity
- All other non-identity tables unchanged

## Verification

```text
-- Counts match
identities: 69 = 69 ✅
identity_bindings: 12,834 = 12,834 ✅
identity_history: 10 = 10 ✅

-- No differences
id/uuid mismatch: 0
metadata/status/name diffs: 0
```

## Files Referenced

- `AGENTS.md`: development isolation rules
- `/Users/accusys/momentry_core/docs_v1.0/M4_workspace/2026-05-29_wp_api_url_update.md`: previous session handoff
@@ -0,0 +1,27 @@
# 2026-05-29: Mergeinto NULL face_id Fix

## Problem
Production server (3002) returned `"error":"error occurred while decoding column 0: unexpected null; try decoding as an 'Option'"` when using mergeinto after clicking undo on a merge.

## Root Cause
`src/api/identity_binding.rs:428` decodes `face_id` from `face_detections` as `String` (non-Option), but **135,521 records** in the production `face_detections` table have NULL `face_id`. When merging an identity whose face_detections include NULL face_ids, the SQLx decode fails with the error above.

## Fix
- Changed `(String, Option<i32>)` → `(Option<String>, Option<i32>)` at line 428
- Changed `face_id_list` to use `filter_map` instead of `map` to skip NULL face_ids
- Changed `faces_count` to use `face_id_list.len()` instead of `face_ids.len()` (matching the actual transferred count)

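The effect of switching from `map` to `filter_map` can be shown with a Python analogy. This is not the Rust code in `identity_binding.rs`. The sample rows are made up, `None` stands in for SQL NULL, and the meaning of the second tuple field is not stated in the source.

```python
# Rows as (face_id, other_column); None plays the role of SQL NULL.
rows = [("face_1", 7), (None, 7), ("face_2", None)]

# Equivalent of filter_map: keep only rows whose face_id is present.
face_id_list = [face_id for face_id, _ in rows if face_id is not None]

# Count what was actually kept, not the number of fetched rows.
faces_count = len(face_id_list)

print(face_id_list, faces_count)  # → ['face_1', 'face_2'] 2
```

Counting `face_id_list` rather than the raw rows keeps the reported total in line with the face_ids that were actually collected.
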
## Files Changed
- `momentry_core/src/api/identity_binding.rs`: 3 lines changed

## Verification
- 234 library tests pass
- `cargo fmt` passes
- Production binary rebuilt (`target/release/momentry`)
- Production server restarted on port 3002 (PID 92043)

## Identities with NULL face_id (20 identities, ~135k records)
Audrey Hepburn (36k), Cary Grant (15k), Bernard Musson, Walter Matthau, Jacques Marin, George Kennedy, Michel Thomass, Antonio Passalia, etc. All are `type=people, status=confirmed`. These identities were likely imported from bulk face detection data without face_id generation.

## Data Note
The NULL face_ids are a pre-existing data quality issue. The fix prevents crashes but doesn't clean up the NULL data. Faces with NULL face_id won't be tracked in undo history (they stay with the target after undo), but the bulk transfer (`WHERE identity_id = $1`) still works correctly.