release: v1.3.0 - TKG node type renaming

Changes:
- Rust: face_trace → face_track (45 occurrences in 8 files)
- Rust: gaze_trace → gaze_track, lip_trace → lip_track
- Python: tkg_builder.py unified + pipeline_checklist.py fixed
- Swift: swift_hand.swift hand state detection (empty vs holding)

Node type changes:
  face_trace    → face_track
  person_trace  → body_track
  gaze_trace    → gaze_track
  lip_trace     → lip_track
  hand_trace    → hand_track
  speaker       → speaker_segment
  object        → detected_object
  text_trace    → text_region

Migration:
  PUBLIC schema: 12970 + 892 + 305 rows updated
Accusys
2026-06-22 07:18:21 +08:00
parent bce9435823
commit 7e548f8b08
35 changed files with 2789 additions and 481 deletions


@@ -127,13 +127,15 @@ curl -s "$API/api/v1/file/$FILE_UUID/probe" -H "X-API-Key: $KEY"
---
### `GET /api/v1/progress/:file_uuid`
### `POST /api/v1/progress/:file_uuid`
**Auth**: Required
**Scope**: file-level
Get real-time processing progress for a file via Redis pub/sub. Includes per-processor status, current/total frames, ETA, and system resource stats.
**Note**: This endpoint uses **POST** method, not GET. The progress data is stored in Redis as a hash, and POST is used to retrieve the latest state.
#### Pipeline Order
| Order | Processor | Dependencies | Description |
@@ -154,7 +156,7 @@ All processors except `story` and `5w1h` run concurrently when their dependencie
#### Example
```bash
curl -s "$API/api/v1/progress/$FILE_UUID" -H "X-API-Key: $KEY" | jq '{overall_progress, processors: [.processors[] | {processor_type, status}]}'
curl -s -X POST "$API/api/v1/progress/$FILE_UUID" -H "X-API-Key: $KEY" | jq '{overall_progress, processors: [.processors[] | {name, status}]}'
```
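For callers not using curl, the same request can be sketched in Python with only the standard library. `build_progress_request` is a hypothetical helper name, and the base URL and key below are placeholders; the request is built but not sent.

```python
import urllib.request


def build_progress_request(api_base: str, file_uuid: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a file's progress."""
    url = f"{api_base.rstrip('/')}/api/v1/progress/{file_uuid}"
    # The endpoint expects POST, with the API key in the X-API-Key header.
    return urllib.request.Request(url, method="POST", headers={"X-API-Key": api_key})


req = build_progress_request(
    "http://localhost:3002", "d8acb03870f0cc9b14e01f14a7bf24d6", "my-key"
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would perform the call; the response body can then be decoded with `json.loads`.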
#### Response (200)


@@ -1,7 +1,7 @@
---
title: Rule 2 TKG Relationship Chunks V1.0
version: 1.0
date: 2026-06-20
version: 1.1
date: 2026-06-22
author: OpenCode
status: approved
---
@@ -18,13 +18,26 @@ Rule 2 creates **relationship chunks** by converting TKG edges into searchable,
**Key Change:** Original Rule 2 (YOLO frame objects) is deprecated due to COCO classes being too generic. New Rule 2 focuses on TKG relationships.
## Node Types (V2.0 - Intuitive Naming)
| Old Name | New Name | Description | external_id Format |
|----------|----------|-------------|-------------------|
| `face_trace` | `face_track` | Face tracking across frames | `face_track_1` |
| `person_trace` | `body_track` | Body appearance tracking | `body_track_0` |
| `gaze_trace` | `gaze_track` | Gaze direction sequence | `gaze_track_1` |
| `lip_trace` | `lip_track` | Lip sync sequence | `lip_track_1` |
| `hand_trace` | `hand_track` | Hand state sequence | `hand_track_0` |
| `speaker` | `speaker_segment` | Speaker segment | `speaker_01` |
| `object` | `detected_object` | YOLO detected object | `car`, `phone` |
| `text_trace` | `text_region` | OCR text region | `text_1` |
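The rename table can be expressed as a lookup for code that still receives old names. This is a minimal sketch, not the actual `tkg_builder.py` logic; `NODE_TYPE_RENAMES` and `rename_node_type` are hypothetical names.

```python
# Old -> new node type names, copied from the table above.
NODE_TYPE_RENAMES = {
    "face_trace": "face_track",
    "person_trace": "body_track",
    "gaze_trace": "gaze_track",
    "lip_trace": "lip_track",
    "hand_trace": "hand_track",
    "speaker": "speaker_segment",
    "object": "detected_object",
    "text_trace": "text_region",
}


def rename_node_type(node_type: str) -> str:
    """Return the new name for a node type; new or unknown names pass through unchanged."""
    return NODE_TYPE_RENAMES.get(node_type, node_type)


print(rename_node_type("person_trace"))
```

Passing unknown names through unchanged keeps the function idempotent, so it is safe to apply to rows that were already migrated.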
## Data Flow
```
┌─────────────────────────────────────────────────────────┐
│ UPSTREAM: TKG Builder │
│ │
│ tkg_nodes: face_trace, speaker, object, etc.
│ tkg_nodes: face_track, speaker_segment, detected_object
│ tkg_edges: speaker_face, mutual_gaze, co_occurs, etc. │
│ │
└─────────────────────────────────────────────────────────┘
@@ -42,7 +55,7 @@ Rule 2 creates **relationship chunks** by converting TKG edges into searchable,
│ ├─ Query tkg_edges by type (priority order) │
│ ├─ For each edge: │
│ │ ├─ Resolve source_node / target_node │
│ │ ├─ Resolve identity names (if face_trace) │
│ │ ├─ Resolve identity names (if face_track) │
│ │ ├─ Build context JSON │
│ │ ├─ call_llm(context) → text_content │
│ │ └─ INSERT INTO chunk (chunk_type='relationship') │
@@ -68,12 +81,12 @@ Rule 2 creates **relationship chunks** by converting TKG edges into searchable,
| Priority | Edge Type | Description | Example Output |
|----------|-----------|-------------|----------------|
| P0 | `speaker_face` | Speaker ↔ Face trace | "SPEAKER_01 以 Cary Grant 的身份說話,從 frame 100 到 350" |
| P0 | `mutual_gaze` | Two face traces looking at each other | "Cary Grant 和 Grace Kelly 互相看對方 24 幀,起始於 frame 450" |
| P1 | `face_face` | Two face traces co-occurring | "Cary Grant 和 Grace Kelly 同框 180 幀" |
| P1 | `co_occurs` | Object ↔ Object co-occurrence | "物件 'car' 和 'person' 在同一畫面出現 60 幀" |
| P2 | `has_appearance` | Face trace ↔ Appearance trace | "Cary Grant 穿著藍色上衣,戴眼鏡" |
| P2 | `wears` | Face trace ↔ Accessory | "Cary Grant 戴帽子,信心值 0.82" |
| P0 | `speaker_face` | Speaker ↔ Face track | "SPEAKER_01 以 Cary Grant 的身份說話,從 frame 100 到 350" |
| P0 | `mutual_gaze` | Two face tracks looking at each other | "Cary Grant 和 Grace Kelly 互相看對方 24 幀,起始於 frame 450" |
| P1 | `face_face` | Two face tracks co-occurring | "Cary Grant 和 Grace Kelly 同框 180 幀" |
| P1 | `co_occurs` | Detected object ↔ Detected object co-occurrence | "物件 'car' 和 'person' 在同一畫面出現 60 幀" |
| P2 | `has_appearance` | Face track ↔ Body track | "Cary Grant 穿著藍色上衣,戴眼鏡" |
| P2 | `wears` | Face track ↔ Accessory | "Cary Grant 戴帽子,信心值 0.82" |
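The priority column implies an ordering when edges are queried. A minimal sketch of that ordering, assuming each edge is a dict with an `edge_type` field (the field name and `order_edges` are assumptions, not taken from the implementation):

```python
# Priority per edge type, from the table above (0 = P0, processed first).
EDGE_PRIORITY = {
    "speaker_face": 0,
    "mutual_gaze": 0,
    "face_face": 1,
    "co_occurs": 1,
    "has_appearance": 2,
    "wears": 2,
}


def order_edges(edges: list[dict]) -> list[dict]:
    """Sort edges by priority; edge types not in the table go last."""
    return sorted(edges, key=lambda e: EDGE_PRIORITY.get(e["edge_type"], 99))


edges = [
    {"edge_type": "wears"},
    {"edge_type": "face_face"},
    {"edge_type": "mutual_gaze"},
    {"edge_type": "speaker_face"},
]
print([e["edge_type"] for e in order_edges(edges)])
```

`sorted` is stable, so edges with the same priority keep their original relative order.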
## Chunk Data Structure
@@ -85,15 +98,15 @@ Rule 2 creates **relationship chunks** by converting TKG edges into searchable,
"edge_id": 123,
"source_node": {
"id": 45,
"node_type": "speaker",
"external_id": "SPEAKER_01",
"node_type": "speaker_segment",
"external_id": "speaker_01",
"label": "SPEAKER_01"
},
"target_node": {
"id": 67,
"node_type": "face_trace",
"external_id": "trace_5",
"label": "Face Trace 5",
"node_type": "face_track",
"external_id": "face_track_5",
"label": "Face Track 5",
"identity_name": "Cary Grant"
},
"properties": {
@@ -157,21 +170,21 @@ LLM-generated natural language description in Traditional Chinese:
### speaker_face Edge
```rust
// Source: speaker node
// Target: face_trace node
// Source: speaker_segment node
// Target: face_track node
// Properties: first_frame, last_frame, lip_sync_confidence
let text_content = call_llm(format!(
"SPEAKER {} 對應 face trace {},身份 {}frame {}-{}",
speaker_id, trace_id, identity_name, first_frame, last_frame
"SPEAKER {} 對應 face track {},身份 {}frame {}-{}",
speaker_id, track_id, identity_name, first_frame, last_frame
));
```
### mutual_gaze Edge
```rust
// Source: face_trace node A
// Target: face_trace node B
// Source: face_track node A
// Target: face_track node B
// Properties: first_frame, gaze_frame_count, yaw_a_avg, yaw_b_avg
let text_content = call_llm(format!(
@@ -183,8 +196,8 @@ let text_content = call_llm(format!(
### has_appearance Edge
```rust
// Source: face_trace node
// Target: appearance_trace node
// Source: face_track node
// Target: body_track node
// Properties: clothing colors, accessories
let text_content = call_llm(format!(
@@ -232,4 +245,5 @@ let text_content = call_llm(format!(
| Version | Date | Author | Change |
|---------|------|--------|--------|
| 1.1 | 2026-06-22 | OpenCode | Node type renaming: face_trace→face_track, person_trace→body_track, etc. |
| 1.0 | 2026-06-20 | OpenCode | Initial design: TKG edges → relationship chunks |


@@ -0,0 +1,179 @@
---
title: Redis Prefix Configuration
version: 1.0
date: 2026-06-21
author: momentry_core development
status: active
---
## Overview
Momentry Core uses Redis key prefixes to isolate namespaces between Production and Playground environments. This prevents cross-contamination of job queues, progress data, and cache entries.
## Environment Configuration
| Environment | Port | Redis Prefix | Config File |
|-------------|------|--------------|-------------|
| **Production** | 3002 | `momentry:` | `.env` (default) |
| **Playground** | 3003 | `momentry_dev:` | `.env.development` |
### Configuration
```bash
# Production (.env)
MOMENTRY_REDIS_PREFIX=momentry: # Default if not set
# Playground (.env.development)
MOMENTRY_REDIS_PREFIX=momentry_dev:
```
## Redis Key Structure
All Redis keys follow this pattern:
```
{prefix}{key_type}:{identifier}
```
### Key Types
| Key Type | Pattern | Example |
|----------|---------|---------|
| Job | `{prefix}job:{file_uuid}` | `momentry:job:abc123...` |
| Progress | `{prefix}progress:{file_uuid}` | `momentry:progress:abc123...` |
| Processor | `{prefix}job:{file_uuid}:processor:{type}` | `momentry:job:abc123:processor:face` |
| Health | `{prefix}health` | `momentry:health` |
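The patterns in the table can be sketched as small key builders. The function names are hypothetical; only the key layout comes from the table.

```python
def job_key(prefix: str, file_uuid: str) -> str:
    return f"{prefix}job:{file_uuid}"


def progress_key(prefix: str, file_uuid: str) -> str:
    return f"{prefix}progress:{file_uuid}"


def processor_key(prefix: str, file_uuid: str, processor_type: str) -> str:
    return f"{prefix}job:{file_uuid}:processor:{processor_type}"


def health_key(prefix: str) -> str:
    return f"{prefix}health"


# The prefix already ends with ":", so the builders add no extra separator.
print(processor_key("momentry:", "abc123", "face"))
print(progress_key("momentry_dev:", "abc123"))
```

Building every key through one set of helpers keeps the server and worker from drifting apart on key layout.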
## Namespace Isolation
### Production vs Playground
**Production (3002)**:
- Jobs created by production API → `momentry:job:*`
- Worker must run with production prefix
- Production worker sees only production jobs
**Playground (3003)**:
- Jobs created by playground API → `momentry_dev:job:*`
- Worker must run with playground prefix
- Playground worker sees only playground jobs
### Cross-Namespace Access
**Cannot access**:
- Production API cannot see playground jobs
- Playground API cannot see production jobs
- Worker with wrong prefix will not process jobs
**Design intent**:
- Complete isolation between environments
- No accidental cross-contamination
- Safe testing in playground without affecting production
## Worker Configuration
Workers must match the Redis prefix of the server that creates jobs:
```bash
# Production worker
./target/release/momentry worker
# Uses: momentry: prefix (default)
# Playground worker
./target/debug/momentry_playground worker
# Uses: momentry_dev: prefix (from .env.development)
```
### Worker Redis Connection
Workers read Redis prefix from environment:
1. Check `MOMENTRY_REDIS_PREFIX` environment variable
2. If not set, use default prefix:
- `momentry` binary → `momentry:`
- `momentry_playground` binary → `momentry_dev:`
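The lookup order above can be sketched as follows. This illustrates the rule, not the worker's actual code; `resolve_redis_prefix` is a hypothetical name, and treating an empty variable as "not set" is an assumption.

```python
# Default prefix per binary, as listed above.
BINARY_DEFAULT_PREFIX = {
    "momentry": "momentry:",
    "momentry_playground": "momentry_dev:",
}


def resolve_redis_prefix(env: dict, binary_name: str) -> str:
    """Return the Redis prefix: the environment variable wins, else the binary's default."""
    prefix = env.get("MOMENTRY_REDIS_PREFIX")
    if prefix:  # assumption: an empty value counts as "not set"
        return prefix
    return BINARY_DEFAULT_PREFIX[binary_name]


print(resolve_redis_prefix({}, "momentry_playground"))
print(resolve_redis_prefix({"MOMENTRY_REDIS_PREFIX": "momentry:"}, "momentry_playground"))
```

The second call shows how an explicit variable overrides the binary default, which is one way a worker can end up in the wrong namespace.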
## Common Issues
### Issue: Jobs Not Being Processed
**Symptoms**:
- API returns "Processing triggered"
- Worker shows no activity
- Redis job key created but not consumed
**Cause**: Worker running with wrong Redis prefix
**Solution**:
```bash
# Check which namespace the job keys are in
redis-cli -a accusys keys "momentry*"
# If jobs in momentry: namespace
# Production worker needed
./target/release/momentry worker
# If jobs in momentry_dev: namespace
# Playground worker needed
./target/debug/momentry_playground worker
```
### Issue: Progress API Returns Empty
**Symptoms**:
- Progress API returns empty response
- Job exists but progress not visible
**Cause**: Progress key in different namespace
**Solution**:
- Ensure worker prefix matches server prefix
- Check Redis keys: `redis-cli keys "{prefix}progress:*"`
## Redis CLI Examples
```bash
# List all production jobs
redis-cli -a accusys keys "momentry:job:*"
# List all playground jobs
redis-cli -a accusys keys "momentry_dev:job:*"
# Check progress for specific file (production)
redis-cli -a accusys HGETALL "momentry:progress:{file_uuid}"
# Check progress for specific file (playground)
redis-cli -a accusys HGETALL "momentry_dev:progress:{file_uuid}"
# Delete all production jobs (⚠️ destructive)
redis-cli -a accusys keys "momentry:job:*" | xargs redis-cli -a accusys del
# Delete all playground jobs (⚠️ destructive)
redis-cli -a accusys keys "momentry_dev:job:*" | xargs redis-cli -a accusys del
```
## Best Practices
1. **Always match worker to server**: Production worker for production server, playground worker for playground server
2. **Check Redis keys**: Before debugging worker issues, verify namespace alignment
3. **Document in AGENTS.md**: Update Redis prefix documentation when configuration changes
4. **Never mix namespaces**: Keep production and playground completely isolated
5. **Use environment variables**: Configure prefix via `.env` files, not hardcoded values
## Related Documentation
- `docs_v1.0/DESIGN/Redis_Progress_Reporting_V1.0.md` - Progress reporting design
- `docs_v1.0/M4_workspace/2026-06-21_issue_report.md` - Issue report with Redis prefix problem
- `AGENTS.md` - Environment configuration reference
---
## Version History
| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2026-06-21 | Initial documentation for Redis prefix configuration |


@@ -0,0 +1,328 @@
---
title: Worker Health Check Mechanism
version: 1.0
date: 2026-06-21
author: momentry_core development
status: active
---
## Overview
Momentry Core worker processes can become stuck due to:
- Redis connection timeouts
- Job queue corruption
- Long-running processor hangs
- Resource exhaustion
This document describes health check mechanisms and recommended solutions.
## Current Architecture
### Worker Process
```
momentry worker
├─→ Redis connection pool
│ └─→ Poll job queue ({prefix}job:*)
├─→ Processor executor
│ ├─→ Python scripts (timeout: configurable)
│ └─→ Resource monitoring (CPU, memory, GPU)
└─→ Dynamic concurrency
└─→ Adjust based on system resources
```
### Worker Logs
Worker logs are stored in:
- `logs/nohup_worker*.log` - Historical worker logs
- `logs/momentry_3002.log` - Production server logs
- `logs/momentry_3003.log` - Playground server logs
## Known Issues
### Issue: Worker Stuck (2026-06-21)
**Symptoms**:
- Worker process running but no activity
- Last log timestamp outdated (>17 hours old)
- Jobs triggered but never processed
- Redis keys created but not consumed
**Cause**: Worker process running for extended period without proper cleanup
**Resolution**:
```bash
# 1. Check worker status
ps aux | grep "momentry.*worker"
# 2. Check last activity
tail -20 logs/nohup_worker*.log
# 3. Kill stuck worker
kill <PID>
# 4. Restart worker
./target/release/momentry worker
```
## Recommended Health Check Mechanisms
### 1. Worker Heartbeat
**Implementation**:
- Worker writes heartbeat to Redis every 30 seconds
- Heartbeat key: `{prefix}health`
- Heartbeat value: `{timestamp, worker_pid, status}`
**Check**:
```bash
# Check worker heartbeat
redis-cli -a accusys HGETALL "momentry:health"
```
**Expected fields** (shown as JSON for readability; `redis-cli HGETALL` prints fields and values on alternating lines):
```json
{
"timestamp": "1782015243",
"worker_pid": "52908",
"status": "active",
"last_job": "abc123..."
}
```
### 2. Automatic Restart
**Recommendation**: Implement automatic restart on inactivity timeout
```bash
# Example: Restart worker if no heartbeat for 60 seconds
# (To be implemented in worker code)
while true; do
# Check heartbeat
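LAST_HEARTBEAT=$(redis-cli -a accusys HGET momentry:health timestamp)
CURRENT_TIME=$(date +%s)
# -gt compares numbers; ">" inside [ ] would be a file redirection. A missing heartbeat counts as 0.
if [ $((CURRENT_TIME - ${LAST_HEARTBEAT:-0})) -gt 60 ]; then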
echo "Worker stuck, restarting..."
pkill -f "momentry worker"
./target/release/momentry worker &
fi
sleep 30
done
```
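The staleness rule used by the restart loop can be isolated into a testable function. A minimal sketch, assuming the heartbeat `timestamp` field holds Unix seconds as a string (as in the expected output above); `heartbeat_is_stale` is a hypothetical name.

```python
import time


def heartbeat_is_stale(timestamp, now=None, max_age_seconds=60):
    """Return True when the heartbeat is missing or older than max_age_seconds.

    `timestamp` is the raw value of the hash field (a string of Unix seconds),
    or None/"" when the key or field does not exist.
    """
    if not timestamp:
        return True
    if now is None:
        now = time.time()
    return now - int(timestamp) > max_age_seconds


print(heartbeat_is_stale("1782015243", now=1782015243 + 61))
print(heartbeat_is_stale("1782015243", now=1782015243 + 30))
```

Treating a missing heartbeat as stale matches the conservative choice: a worker that never wrote a heartbeat is handled like one that stopped writing.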
### 3. Worker Status API
**Recommendation**: Add `/api/v1/worker/status` endpoint
**Response**:
```json
{
"worker_pid": 52908,
"status": "active",
"last_heartbeat": "2026-06-21T12:15:00Z",
"jobs_processed": 42,
"current_job": "abc123...",
"uptime_seconds": 3600
}
```
### 4. Job Queue Monitoring
**Check for stuck jobs**:
```bash
# List all pending jobs
redis-cli -a accusys keys "momentry:job:*"
# Check job timestamp
redis-cli -a accusys HGET "momentry:job:{file_uuid}" created_at
# If job > 1 hour old without progress → stuck job
```
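The stuck-job rule in the last comment can be written as a predicate. This sketch assumes `created_at` is stored as Unix seconds, which the source does not specify; `is_stuck_job` and its parameters are hypothetical.

```python
def is_stuck_job(created_at: float, now: float, has_progress: bool,
                 max_age_seconds: int = 3600) -> bool:
    """A job is considered stuck when it is older than one hour and shows no progress."""
    return (now - created_at) > max_age_seconds and not has_progress


created = 1782015243.0
print(is_stuck_job(created, now=created + 7200, has_progress=False))
print(is_stuck_job(created, now=created + 7200, has_progress=True))
```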
### 5. Resource Monitoring
**Worker logs include system stats**:
```
System: CPU idle=50.0%, Memory=31948MB/49152MB (35.0%), No GPU
Dynamic concurrency: 2 (config: 2)
```
**Monitor**:
- CPU idle > 90% for extended period → worker not processing
- Memory > 90% → resource exhaustion risk
- GPU not available → GPU-dependent processors will fail
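These thresholds can be checked against a single `System:` log line. The sketch below parses the format shown above. It assumes the percentage in parentheses is the figure to compare against the 90% threshold, which the log line does not confirm, and it checks one sample only, not an extended period. All names are hypothetical.

```python
import re

SYSTEM_LINE = re.compile(
    r"System: CPU idle=(?P<cpu_idle>[\d.]+)%, "
    r"Memory=\d+MB/\d+MB \((?P<mem_pct>[\d.]+)%\), "
    r"(?P<gpu>.+)"
)


def parse_system_line(line: str):
    """Extract CPU idle %, the reported memory %, and the GPU text; None if no match."""
    m = SYSTEM_LINE.search(line)
    if m is None:
        return None
    return {
        "cpu_idle": float(m["cpu_idle"]),
        "mem_pct": float(m["mem_pct"]),
        "gpu": m["gpu"].strip(),
    }


def warnings_for(stats: dict) -> list[str]:
    """Apply the thresholds from the list above to one parsed sample."""
    found = []
    if stats["cpu_idle"] > 90:
        found.append("worker may not be processing")
    if stats["mem_pct"] > 90:
        found.append("resource exhaustion risk")
    if stats["gpu"] == "No GPU":
        found.append("GPU not available")
    return found


line = "System: CPU idle=50.0%, Memory=31948MB/49152MB (35.0%), No GPU"
stats = parse_system_line(line)
print(stats, warnings_for(stats))
```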
## Monitoring Script
```bash
#!/bin/bash
# worker_health_monitor.sh
PREFIX="momentry:"
REDIS_URL="redis://:accusys@localhost:6379"
while true; do
echo "=== Worker Health Check ==="
# Check worker process
WORKER_PID=$(pgrep -f "momentry worker")
if [ -z "$WORKER_PID" ]; then
echo "❌ No worker process running"
echo "Starting worker..."
./target/release/momentry worker &
# Give the new worker time to start before re-checking
sleep 30
continue
fi
echo "✅ Worker running (PID: $WORKER_PID)"
# Check Redis heartbeat
HEARTBEAT=$(redis-cli -a accusys HGET "${PREFIX}health" timestamp)
if [ -n "$HEARTBEAT" ]; then
AGE=$(( $(date +%s) - $HEARTBEAT ))
if [ "$AGE" -gt 60 ]; then
echo "⚠️ Worker heartbeat stale ($AGE seconds old)"
echo "Restarting worker..."
kill $WORKER_PID
./target/release/momentry worker &
else
echo "✅ Heartbeat recent ($AGE seconds old)"
fi
else
echo "⚠️ No heartbeat found"
fi
# Check pending jobs
JOBS=$(redis-cli -a accusys keys "${PREFIX}job:*" | wc -l)
echo "Pending jobs: $JOBS"
sleep 30
done
```
## Preventive Measures
### 1. Regular Worker Restart
**Recommendation**: Restart worker daily to prevent accumulation
```bash
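# Daily restart at 3 AM
# Add to crontab (cron starts in $HOME, so cd to the project first;
# ";" after pkill lets the restart run even if no worker was running):
0 3 * * * cd /Users/accusys/momentry_core && { pkill -f "momentry worker"; sleep 5; ./target/release/momentry worker & }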
# Or use systemd/launchd for automatic restart
```
### 2. Timeout Configuration
**Set reasonable timeouts**:
```bash
# Environment variables
MOMENTRY_ASR_TIMEOUT=3600 # 1 hour for ASR
MOMENTRY_CUT_TIMEOUT=3600 # 1 hour for CUT
MOMENTRY_DEFAULT_TIMEOUT=7200 # 2 hours default
```
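A lookup over these variables might resolve a per-processor timeout as sketched below. Only `MOMENTRY_ASR_TIMEOUT`, `MOMENTRY_CUT_TIMEOUT` and `MOMENTRY_DEFAULT_TIMEOUT` appear above; extending the `MOMENTRY_<NAME>_TIMEOUT` pattern to other processors is an assumption, as is the function name.

```python
def processor_timeout(processor: str, env: dict, fallback: int = 7200) -> int:
    """Return the timeout in seconds: per-processor variable, then the default variable, then fallback."""
    specific = env.get(f"MOMENTRY_{processor.upper()}_TIMEOUT")
    default = env.get("MOMENTRY_DEFAULT_TIMEOUT")
    return int(specific or default or fallback)


env = {"MOMENTRY_ASR_TIMEOUT": "3600", "MOMENTRY_DEFAULT_TIMEOUT": "7200"}
print(processor_timeout("asr", env))
print(processor_timeout("face", env))
```

In a real process, `os.environ` would be passed as `env`; taking a plain dict keeps the function easy to test.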
### 3. Resource Limits
**Limit worker concurrency**:
```bash
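# Worker flags:
#   --max-concurrent  max parallel processors
#   --poll-interval   poll every 10 seconds
#   --batch-size      process 5 jobs per batch
# (comments cannot follow a trailing backslash, so they are listed here)
./target/release/momentry worker \
--max-concurrent 6 \
--poll-interval 10 \
--batch-size 5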
```
### 4. Logging Enhancement
**Recommendation**: Add structured logging for job lifecycle
```rust
// In job_worker.rs
tracing::info!(
job_id = %job.id,
file_uuid = %file_uuid,
status = "started",
"Worker started job"
);
tracing::info!(
job_id = %job.id,
duration_ms = elapsed,
status = "completed",
"Worker completed job"
);
```
## Troubleshooting Guide
### Step 1: Check Process
```bash
ps aux | grep "momentry.*worker"
```
Expected: One worker process per environment (production + playground)
### Step 2: Check Logs
```bash
tail -50 logs/nohup_worker*.log
```
Look for:
- Last log timestamp
- Error messages
- Processor failures
### Step 3: Check Redis
```bash
redis-cli -a accusys keys "momentry:job:*"
redis-cli -a accusys HGETALL "momentry:health"
```
Look for:
- Pending jobs count
- Heartbeat timestamp
- Job creation timestamps
### Step 4: Check Resources
```bash
top -pid <worker_pid>
```
Look for:
- CPU usage (should be active if processing)
- Memory usage (should not exceed 80%)
- Process state (should be running, not sleeping)
### Step 5: Restart Worker
```bash
kill <worker_pid>
./target/release/momentry worker
```
## Related Documentation
- `docs_v1.0/DESIGN/Redis_Prefix_Configuration.md` - Redis namespace configuration
- `docs_v1.0/M4_workspace/2026-06-21_issue_report.md` - Worker stuck issue report
- `AGENTS.md` - Worker configuration reference
- `src/worker/job_worker.rs` - Worker implementation
---
## Version History
| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2026-06-21 | Initial documentation for worker health check mechanisms |


@@ -0,0 +1,97 @@
---
title: Job Status Sync Fix - Historical Processor Results Issue
version: 1.0
date: 2026-06-21
author: OpenCode
status: resolved
---
# Job Status Sync Fix - Historical Processor Results Issue
## Problem Summary
Production Worker marked jobs as 'failed' even when current processors completed successfully.
## Root Cause
### Location: `src/worker/job_worker.rs:1070`
```rust
let any_failed = results
.iter()
.any(|r| matches!(r.status, ProcessorJobStatus::Failed));
```
### Logic Defect
- Checked **all historical processor_results** (results=8)
- If **any historical processor failed** → job marked as failed
- **Ignored job_processors** (current request processors)
### Example Case
Job ID 63:
- Historical: asr, yolo, face, ocr, pose, mediapipe, appearance (all failed)
- Current: cut (completed)
- Result: `any_failed=true` → job status='failed' ❌
## Fix Implementation
### Modified Code (line 1070-1110)
```rust
// Before
let any_failed = results
.iter()
.any(|r| matches!(r.status, ProcessorJobStatus::Failed));
// After
let any_failed = results
.iter()
.filter(|r| job_processors.contains(&r.processor_type.as_str().to_string()))
.any(|r| matches!(r.status, ProcessorJobStatus::Failed));
```
### Key Changes
1. Added filter for `job_processors` parameter
2. Only checks processors in current request
3. Ignores historical failed processors
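The effect of the filter can be reproduced with the Job 63 data in a short Python sketch. It mirrors the Rust logic but is not the worker code; the dict shape and function names are illustrative.

```python
def any_failed_unfiltered(results: list[dict]) -> bool:
    """Old behaviour: any failed result, historical or current, fails the job."""
    return any(r["status"] == "failed" for r in results)


def any_failed_current(results: list[dict], job_processors: list[str]) -> bool:
    """Fixed behaviour: only results for processors in the current request count."""
    current = set(job_processors)
    return any(
        r["status"] == "failed"
        for r in results
        if r["processor_type"] in current
    )


# Job 63: seven historical failures plus one completed `cut` run.
historical = ["asr", "yolo", "face", "ocr", "pose", "mediapipe", "appearance"]
results = [{"processor_type": p, "status": "failed"} for p in historical]
results.append({"processor_type": "cut", "status": "completed"})

print(any_failed_unfiltered(results))
print(any_failed_current(results, ["cut"]))
```

With the filter in place, the historical failures no longer mark a `cut`-only request as failed.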
## Verification Results
### Production (3002) After Fix
```
Found 1 pending jobs ✅
Processing job: 53090f160138fd4a01d62edf8395c6a0 (63) ✅
Processor cut output file exists, marking completed ✅
Job status: running ✅ (not failed)
```
### Playground (3003) Comparison
- Playground had fewer historical results
- Jobs processed successfully before fix
- Dev schema works normally
## Deployment
### Binary
- Compiled: Jun 21 14:35
- Worker restart: PID 28623
- Logs: `logs/worker_3002_fixed.log`
### Test Command
```bash
curl -X POST "http://localhost:3002/api/v1/file/53090f160138fd4a01d62edf8395c6a0/process" \
-H "Content-Type: application/json" \
-d '{"processors": ["cut"]}'
```
## Lessons Learned
1. **Job lifecycle should be scoped to request**: Only check processors in current request
2. **Historical data pollution**: Failed attempts can pollute job status logic
3. **Filter early**: Apply filters before checking status to avoid false positives
## Related Files
- `src/worker/job_worker.rs:1070-1110` (fixed)
- `src/worker/job_worker.rs:1407` (any_failed handling)
- `logs/worker_3002_fixed.log` (verification)


@@ -0,0 +1,84 @@
---
title: PostgreSQL Job Status Sync Issue
version: 1.0
date: 2026-06-21
author: OpenCode
status: identified
---
# PostgreSQL Job Status Sync Issue
## Problem Description
Production Worker (3002) cannot find pending jobs despite successful UPDATE operations.
## Evidence
### Server Logs
```
UPDATE monitor_jobs SET processors = ..., status = 'pending' WHERE uuid = '...'
rows_affected=1 ✅
elapsed=565.917µs
```
### PostgreSQL Query Timeline
1. **Trigger at 06:04:39**: UPDATE executed (rows_affected=1)
2. **Query at 06:04:41** (Python): status='pending' ✅
3. **Query at 06:06**: status='failed' ❌ (reverted)
4. **Worker SELECT at 06:04-06:07**: rows_returned=0 ❌
### Key Findings
- Server UPDATE succeeds (rows_affected=1)
- PostgreSQL briefly shows 'pending' (confirmed 2 seconds later)
- Status immediately reverts to 'failed'
- Worker SELECT never finds pending jobs
## Hypotheses
1. **Another process resets status**: Unknown mechanism changing status back to 'failed'
2. **Job lifecycle logic**: Job processing framework has logic that marks failed jobs back as failed
3. **Connection pool transaction issue**: UPDATE happens in one transaction, reverted in another
4. **Worker health check**: Only affects WHERE status='running', not pending jobs
## Configuration Verified
- Server schema: `public`
- Worker schema: `public`
- monitor_jobs.uuid: VARCHAR(32) ✅
- All uuids: 32 characters ✅
- Worker binary: Jun 21 13:20 (latest) ✅
- Server binary: Jun 21 13:20 (latest) ✅
## Testing Done
1. Restarted Server (3002, PID 65718)
2. Restarted Worker (PID 88674)
3. Triggered processing for multiple files
4. Direct PostgreSQL queries via Python
5. API verification: /api/v1/files, /health, /api/v1/jobs
## Current Status
**Production (3002)**:
- Server: Running ✅
- Worker: Running ✅
- Jobs: 8 total (6 failed, 1 completed)
- Processing: Blocked ❌
**Playground (3003)**:
- Server: Running ✅
- Worker: Running ✅
- Not tested yet
## Next Steps
1. **Test in Playground**: Compare job lifecycle in dev schema
2. **Find reset mechanism**: Search for code that resets job status to 'failed'
3. **Check job lifecycle**: Review job_worker.rs for failed job handling logic
4. **Test new job registration**: Register fresh video and trigger processing
## Related Files
- `src/api/processing.rs`: trigger_processing UPDATE (line 271)
- `src/worker/job_worker.rs`: Worker polling and health check (line 95-115)
- `src/core/db/postgres_db.rs`: list_monitor_jobs_by_status (line 1720)
- `logs/momentry_3002.log`: Server UPDATE logs
- `logs/worker_3002_new.log`: Worker SELECT logs


@@ -0,0 +1,206 @@
# Issue Report: 2026-06-21
## Issue 1: Worker Process Stuck
### Description
Worker process (PID 58279), started Friday at 10 PM (`ps` start time `Fri10PM`), was stuck and not processing new jobs. The last log entry was dated 2026-06-20 06:52.
### Symptoms
- Jobs triggered via API returned "Processing triggered" but never executed
- Redis keys for new jobs were not created
- Progress API returned empty response
- Worker logs showed old timestamps
### Resolution
- Killed stuck worker: `kill 58279`
- Restarted worker: `cd /Users/accusys/momentry_core && ./target/release/momentry worker`
- New worker PID: 52908
### Root Cause (Suspected)
- Worker process running for extended period without proper cleanup
- Possible Redis connection timeout or job queue corruption
### Recommendation
- Add worker health check mechanism
- Implement automatic worker restart on inactivity timeout
- Add logging for job queue polling status
---
## Issue 2: Face/YOLO Processor Failure - Missing OpenCV
### Description
Face and YOLO processors failed with `ModuleNotFoundError: No module named 'cv2'`
### Error Log
```
[ERROR] Processor face failed for job d8acb03870f0cc9b14e01f14a7bf24d6: Failed to run "/Users/accusys/momentry_core/scripts/face_processor.py"
[ERROR] Processor yolo failed for job d8acb03870f0cc9b14e01f14a7bf24d6: Failed to run "/Users/accusys/momentry_core/scripts/yolo_processor.py"
```
### Python Test Result
```
python3 /Users/accusys/momentry_core/scripts/face_processor.py --help
Traceback (most recent call last):
File ".../face_processor.py", line 25, in <module>
import cv2
ModuleNotFoundError: No module named 'cv2'
```
### Resolution
```bash
pip3 install opencv-python
```
### Recommendation
- Add Python dependency check in worker startup
- Document required Python packages in README
- Add `requirements.txt` with all processor dependencies
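A dependency check of the kind recommended above could look like the sketch below. It is a hypothetical preflight helper, not existing project code, and the module list a real check would use depends on each processor script.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# `cv2` is the module that was missing in this incident; `json` is always present.
missing = missing_modules(["json", "cv2"])
if missing:
    print("Missing Python modules:", ", ".join(missing))
else:
    print("All required modules found")
```

`find_spec` locates a module without importing it, so the check stays fast even for heavy packages.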
---
## Issue 3: Redis Prefix Configuration Confusion
### Description
Two different Redis namespaces exist:
- `momentry:` - Production server (port 3002)
- `momentry_dev:` - Playground server (port 3003)
### Impact
- Jobs triggered on production server not visible to playground worker
- Progress data stored in different namespaces
- API proxy needs to match correct prefix
### Current Setup
```
Production Server (port 3002): Redis prefix "momentry:"
Playground Server (port 3003): Redis prefix "momentry_dev:"
```
### Recommendation
- Document Redis prefix configuration clearly
- Add environment variable for Redis prefix selection
- Consider using same prefix for development simplicity
---
## Issue 4: Progress API Behavior
### Description
`GET /api/v1/progress/:file_uuid` returns empty response when:
1. No job exists for the file
2. Job is complete (all processors finished)
3. Worker is stuck/not processing
### Expected Behavior (from docs)
```json
{
"file_uuid": "...",
"overall_progress": 71,
"processors": [
{"processor_type": "asr", "status": "complete", "progress": 100},
{"processor_type": "yolo", "status": "running", "progress": 65}
]
}
```
### Actual Behavior
- Returns empty response (no output) when job complete or missing
- Frontend cannot distinguish between "not started" vs "completed"
### Recommendation
- Return explicit status for completed jobs (e.g., `{"overall_progress": 100, "status": "completed"}`)
- Return 404 when job not found (file never processed)
- Add `status` field to response: `pending`, `running`, `completed`, `failed`
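The recommended behaviour can be sketched as a function that maps a job record to an HTTP status and body. The processor statuses `complete` and `running` come from the documented response; `failed` and `pending` at processor level, the error body, and the function name are assumptions.

```python
def progress_response(job):
    """Return (http_status, body) for a job record, or a 404 when no job exists."""
    if job is None:
        return 404, {"error": "no job found for this file"}

    statuses = {p["status"] for p in job["processors"]}
    if "failed" in statuses:
        status = "failed"
    elif statuses == {"complete"}:
        status = "completed"
    elif "running" in statuses or "complete" in statuses:
        status = "running"
    else:
        status = "pending"

    body = {
        "file_uuid": job["file_uuid"],
        "overall_progress": job["overall_progress"],
        "status": status,
        "processors": job["processors"],
    }
    return 200, body


job = {
    "file_uuid": "d8acb03870f0cc9b14e01f14a7bf24d6",
    "overall_progress": 71,
    "processors": [
        {"processor_type": "asr", "status": "complete", "progress": 100},
        {"processor_type": "yolo", "status": "running", "progress": 65},
    ],
}
print(progress_response(job)[1]["status"])
print(progress_response(None)[0])
```

With an explicit `status` and a 404 for unknown files, the frontend can tell "not started" from "completed" without relying on an empty body.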
---
## Issue 5: Frontend Status Display Bug
### Description
Frontend showed "處理中" (processing) status for Gamma Carry file but:
- Database status: `registered` (not processed)
- No job in Redis
- No progress data
### Cause
Frontend code sets `f.status = 'processing'` immediately after process trigger, without verifying job creation:
```typescript
// LibraryView.vue line 463
if (result.success) {
f.status = 'processing' // Sets status prematurely
pollProgress(f.file_uuid)
}
```
### Impact
- User sees "processing" status but actual processing never started
- Misleading UI feedback
### Recommendation
- Verify job creation before setting status
- Check Redis job key existence
- Poll progress API and set status based on actual response
- Handle case when progress API returns empty (job not created)
---
## Test Results Summary
### File: Gamma Carry Saves the World..mp4
- UUID: `d8acb03870f0cc9b14e01f14a7bf24d6`
- Processing triggered: 2026-06-21 12:13
### Processor Results
| Processor | Status | Output |
|-----------|--------|--------|
| cut | ✓ Complete | 4825 frames |
| asr | ✓ Complete | 0 segments |
| face | ✗ Failed | Missing cv2 |
| yolo | ✗ Failed | Missing cv2 |
| ocr | - Not run | Dependency failed |
| pose | - Not run | Dependency failed |
### Redis Keys Created
```
momentry:job:d8acb03870f0cc9b14e01f14a7bf24d6
momentry:progress:d8acb03870f0cc9b14e01f14a7bf24d6
momentry:job:d8acb03870f0cc9b14e01f14a7bf24d6:processor:cut
momentry:job:d8acb03870f0cc9b14e01f14a7bf24d6:processor:asr
momentry:job:d8acb03870f0cc9b14e01f14a7bf24d6:processor:face
momentry:job:d8acb03870f0cc9b14e01f14a7bf24d6:processor:yolo
```
### API Test Results
| API | Status | Note |
|-----|--------|------|
| `POST /api/v1/file/:uuid/process` | ✓ Works | Job created |
| `GET /api/v1/file/:uuid/processor-counts` | ✓ Works | Returns correct counts |
| `GET /api/v1/progress/:uuid` | Partial | Empty when complete/missing |
| `GET /api/v1/jobs` | - Not tested | No response via proxy |
---
## Recommended Actions
### Immediate
1. Install OpenCV: `pip3 install opencv-python`
2. Add worker health monitoring
3. Fix progress API to return status for completed jobs
### Short-term
1. Add Python dependency validation in worker
2. Document Redis prefix configuration
3. Improve frontend status verification
### Long-term
1. Add `requirements.txt` for processor scripts
2. Implement worker auto-restart mechanism
3. Add comprehensive logging for job lifecycle
4. Create integration tests for processing pipeline
---
*Report generated: 2026-06-21 12:15*
*Reporter: momentry_studio development session*