release: v1.3.0 - TKG node type renaming
Changes:
- Rust: face_trace → face_track (45 occurrences in 8 files)
- Rust: gaze_trace → gaze_track, lip_trace → lip_track
- Python: tkg_builder.py unified + pipeline_checklist.py fixed
- Swift: swift_hand.swift hand state detection (empty vs holding)

Node type changes:
  face_trace → face_track
  person_trace → body_track
  gaze_trace → gaze_track
  lip_trace → lip_track
  hand_trace → hand_track
  speaker → speaker_segment
  object → detected_object
  text_trace → text_region

Migration:
  PUBLIC schema: 12970 + 892 + 305 rows updated
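
The node type changes listed in this commit message amount to a fixed old-to-new lookup. As a rough illustration only, the mapping could be written in Rust as below. The function name `rename_node_type` is hypothetical and is not taken from the commit; only the name pairs come from the message.

```rust
/// Maps a pre-v1.3.0 TKG node type to its renamed form.
/// Hypothetical helper for illustration; not part of the commit.
fn rename_node_type(old: &str) -> Option<&'static str> {
    match old {
        "face_trace" => Some("face_track"),
        "person_trace" => Some("body_track"),
        "gaze_trace" => Some("gaze_track"),
        "lip_trace" => Some("lip_track"),
        "hand_trace" => Some("hand_track"),
        "speaker" => Some("speaker_segment"),
        "object" => Some("detected_object"),
        "text_trace" => Some("text_region"),
        // Already-renamed or unknown types are left for the caller to handle.
        _ => None,
    }
}
```

Returning `None` for names that are not in the list keeps a migration idempotent in this sketch: rows that already carry a new name are simply skipped.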
@@ -1,7 +1,7 @@
 ---
 title: Rule 2 TKG Relationship Chunks V1.0
-version: 1.0
-date: 2026-06-20
+version: 1.1
+date: 2026-06-22
 author: OpenCode
 status: approved
 ---
@@ -18,13 +18,26 @@ Rule 2 creates **relationship chunks** by converting TKG edges into searchable,

 **Key Change:** Original Rule 2 (YOLO frame objects) is deprecated due to COCO classes being too generic. New Rule 2 focuses on TKG relationships.

+## Node Types (V2.0 - Intuitive Naming)
+
+| Old Name | New Name | Description | external_id Format |
+|----------|----------|-------------|-------------------|
+| `face_trace` | `face_track` | Face tracking across frames | `face_track_1` |
+| `person_trace` | `body_track` | Body appearance tracking | `body_track_0` |
+| `gaze_trace` | `gaze_track` | Gaze direction sequence | `gaze_track_1` |
+| `lip_trace` | `lip_track` | Lip sync sequence | `lip_track_1` |
+| `hand_trace` | `hand_track` | Hand state sequence | `hand_track_0` |
+| `speaker` | `speaker_segment` | Speaker segment | `speaker_01` |
+| `object` | `detected_object` | YOLO detected object | `car`, `phone` |
+| `text_trace` | `text_region` | OCR text region | `text_1` |
+
 ## Data Flow

 ```
 ┌─────────────────────────────────────────────────────────┐
 │  UPSTREAM: TKG Builder                                  │
 │                                                         │
-│  tkg_nodes: face_trace, speaker, object, etc.           │
+│ tkg_nodes: face_track, speaker_segment, detected_object │
 │  tkg_edges: speaker_face, mutual_gaze, co_occurs, etc.  │
 │                                                         │
 └─────────────────────────────────────────────────────────┘
@@ -42,7 +55,7 @@ Rule 2 creates **relationship chunks** by converting TKG edges into searchable,
 │  ├─ Query tkg_edges by type (priority order)            │
 │  ├─ For each edge:                                      │
 │  │  ├─ Resolve source_node / target_node                │
-│  │  ├─ Resolve identity names (if face_trace)           │
+│  │  ├─ Resolve identity names (if face_track)           │
 │  │  ├─ Build context JSON                               │
 │  │  ├─ call_llm(context) → text_content                 │
 │  │  └─ INSERT INTO chunk (chunk_type='relationship')    │
@@ -68,12 +81,12 @@ Rule 2 creates **relationship chunks** by converting TKG edges into searchable,

 | Priority | Edge Type | Description | Example Output |
 |----------|-----------|-------------|----------------|
-| P0 | `speaker_face` | Speaker ↔ Face trace | "SPEAKER_01 以 Cary Grant 的身份說話,從 frame 100 到 350" |
-| P0 | `mutual_gaze` | Two face traces looking at each other | "Cary Grant 和 Grace Kelly 互相看對方 24 幀,起始於 frame 450" |
-| P1 | `face_face` | Two face traces co-occurring | "Cary Grant 和 Grace Kelly 同框 180 幀" |
-| P1 | `co_occurs` | Object ↔ Object co-occurrence | "物件 'car' 和 'person' 在同一畫面出現 60 幀" |
-| P2 | `has_appearance` | Face trace ↔ Appearance trace | "Cary Grant 穿著藍色上衣,戴眼鏡" |
-| P2 | `wears` | Face trace ↔ Accessory | "Cary Grant 戴帽子,信心值 0.82" |
+| P0 | `speaker_face` | Speaker ↔ Face track | "SPEAKER_01 以 Cary Grant 的身份說話,從 frame 100 到 350" |
+| P0 | `mutual_gaze` | Two face tracks looking at each other | "Cary Grant 和 Grace Kelly 互相看對方 24 幀,起始於 frame 450" |
+| P1 | `face_face` | Two face tracks co-occurring | "Cary Grant 和 Grace Kelly 同框 180 幀" |
+| P1 | `co_occurs` | Detected object ↔ Detected object co-occurrence | "物件 'car' 和 'person' 在同一畫面出現 60 幀" |
+| P2 | `has_appearance` | Face track ↔ Body track | "Cary Grant 穿著藍色上衣,戴眼鏡" |
+| P2 | `wears` | Face track ↔ Accessory | "Cary Grant 戴帽子,信心值 0.82" |

 ## Chunk Data Structure

@@ -85,15 +98,15 @@ Rule 2 creates **relationship chunks** by converting TKG edges into searchable,
   "edge_id": 123,
   "source_node": {
     "id": 45,
-    "node_type": "speaker",
-    "external_id": "SPEAKER_01",
+    "node_type": "speaker_segment",
+    "external_id": "speaker_01",
     "label": "SPEAKER_01"
   },
   "target_node": {
     "id": 67,
-    "node_type": "face_trace",
-    "external_id": "trace_5",
-    "label": "Face Trace 5",
+    "node_type": "face_track",
+    "external_id": "face_track_5",
+    "label": "Face Track 5",
     "identity_name": "Cary Grant"
   },
   "properties": {
@@ -157,21 +170,21 @@ LLM-generated natural language description in Traditional Chinese:
 ### speaker_face Edge

 ```rust
-// Source: speaker node
-// Target: face_trace node
+// Source: speaker_segment node
+// Target: face_track node
 // Properties: first_frame, last_frame, lip_sync_confidence

 let text_content = call_llm(format!(
-    "SPEAKER {} 對應 face trace {},身份 {},frame {}-{}",
-    speaker_id, trace_id, identity_name, first_frame, last_frame
+    "SPEAKER {} 對應 face track {},身份 {},frame {}-{}",
+    speaker_id, track_id, identity_name, first_frame, last_frame
 ));
 ```

 ### mutual_gaze Edge

 ```rust
-// Source: face_trace node A
-// Target: face_trace node B
+// Source: face_track node A
+// Target: face_track node B
 // Properties: first_frame, gaze_frame_count, yaw_a_avg, yaw_b_avg

 let text_content = call_llm(format!(
@@ -183,8 +196,8 @@ let text_content = call_llm(format!(
 ### has_appearance Edge

 ```rust
-// Source: face_trace node
-// Target: appearance_trace node
+// Source: face_track node
+// Target: body_track node
 // Properties: clothing colors, accessories

 let text_content = call_llm(format!(
@@ -232,4 +245,5 @@ let text_content = call_llm(format!(

 | Version | Date | Author | Change |
 |---------|------|--------|--------|
+| 1.1 | 2026-06-22 | OpenCode | Node type renaming: face_trace→face_track, person_trace→body_track, etc. |
 | 1.0 | 2026-06-20 | OpenCode | Initial design: TKG edges → relationship chunks |
179
docs_v1.0/DESIGN/Redis_Prefix_Configuration.md
Normal file
@@ -0,0 +1,179 @@
---
title: Redis Prefix Configuration
version: 1.0
date: 2026-06-21
author: momentry_core development
status: active
---

## Overview

Momentry Core uses Redis key prefixes to isolate namespaces between Production and Playground environments. This prevents cross-contamination of job queues, progress data, and cache entries.

## Environment Configuration

| Environment | Port | Redis Prefix | Config File |
|-------------|------|--------------|-------------|
| **Production** | 3002 | `momentry:` | `.env` (default) |
| **Playground** | 3003 | `momentry_dev:` | `.env.development` |

### Configuration

```bash
# Production (.env); "momentry:" is also the default if the variable is not set
MOMENTRY_REDIS_PREFIX=momentry:

# Playground (.env.development)
MOMENTRY_REDIS_PREFIX=momentry_dev:
```

## Redis Key Structure

Redis keys follow this general pattern (the health key has no identifier, and processor keys append a `:processor:{type}` suffix to the job key):

```
{prefix}{key_type}:{identifier}
```

### Key Types

| Key Type | Pattern | Example |
|----------|---------|---------|
| Job | `{prefix}job:{file_uuid}` | `momentry:job:abc123...` |
| Progress | `{prefix}progress:{file_uuid}` | `momentry:progress:abc123...` |
| Processor | `{prefix}job:{file_uuid}:processor:{type}` | `momentry:job:abc123:processor:face` |
| Health | `{prefix}health` | `momentry:health` |
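
The key patterns in the table can be sketched as small string builders. This is a minimal illustration, assuming the prefix already includes its trailing colon as in the configuration examples. The function names are hypothetical and are not taken from the Momentry Core source.

```rust
/// Builds `{prefix}job:{file_uuid}`.
fn job_key(prefix: &str, file_uuid: &str) -> String {
    format!("{}job:{}", prefix, file_uuid)
}

/// Builds `{prefix}progress:{file_uuid}`.
fn progress_key(prefix: &str, file_uuid: &str) -> String {
    format!("{}progress:{}", prefix, file_uuid)
}

/// Builds `{prefix}job:{file_uuid}:processor:{type}` on top of the job key.
fn processor_key(prefix: &str, file_uuid: &str, processor_type: &str) -> String {
    format!("{}:processor:{}", job_key(prefix, file_uuid), processor_type)
}

/// Builds `{prefix}health`, the only key type without an identifier.
fn health_key(prefix: &str) -> String {
    format!("{}health", prefix)
}
```

Keeping the prefix as an explicit argument means the same builders serve both environments, which is the point of the namespace scheme.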

## Namespace Isolation

### Production vs Playground

**Production (3002)**:
- Jobs created by production API → `momentry:job:*`
- Worker must run with production prefix
- Production worker sees only production jobs

**Playground (3003)**:
- Jobs created by playground API → `momentry_dev:job:*`
- Worker must run with playground prefix
- Playground worker sees only playground jobs

### Cross-Namespace Access

❌ **Cannot access**:
- Production API cannot see playground jobs
- Playground API cannot see production jobs
- Worker with wrong prefix will not process jobs

✅ **Design intent**:
- Complete isolation between environments
- No accidental cross-contamination
- Safe testing in playground without affecting production

## Worker Configuration

Workers must match the Redis prefix of the server that creates jobs:

```bash
# Production worker
./target/release/momentry worker
# Uses: momentry: prefix (default)

# Playground worker
./target/debug/momentry_playground worker
# Uses: momentry_dev: prefix (from .env.development)
```

### Worker Redis Connection

Workers read Redis prefix from environment:

1. Check `MOMENTRY_REDIS_PREFIX` environment variable
2. If not set, use default prefix:
   - `momentry` binary → `momentry:`
   - `momentry_playground` binary → `momentry_dev:`
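
The two-step lookup above can be sketched as a pure function. This is an illustrative sketch only: the name `resolve_prefix` is hypothetical, and two behaviours are assumptions not stated in this document, namely that an empty variable counts as unset and that an unrecognised binary name falls back to the production prefix.

```rust
/// Resolves the Redis prefix: an explicit environment value wins,
/// otherwise a per-binary default is used.
/// Assumption: an empty value is treated the same as an unset variable.
fn resolve_prefix(env_value: Option<&str>, binary_name: &str) -> String {
    match env_value {
        Some(value) if !value.is_empty() => value.to_string(),
        _ => match binary_name {
            "momentry_playground" => "momentry_dev:".to_string(),
            // Assumption: any other binary name uses the production default.
            _ => "momentry:".to_string(),
        },
    }
}
```

A caller could pass the real environment with `resolve_prefix(std::env::var("MOMENTRY_REDIS_PREFIX").ok().as_deref(), "momentry")`. Taking the value as an argument keeps the function testable without touching process state.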

## Common Issues

### Issue: Jobs Not Being Processed

**Symptoms**:
- API returns "Processing triggered"
- Worker shows no activity
- Redis job key created but not consumed

**Cause**: Worker running with wrong Redis prefix

**Solution**:
```bash
# List keys in both namespaces (the pattern "momentry*" matches both
# "momentry:" and "momentry_dev:") to see which prefix the jobs use
redis-cli -a accusys keys "momentry*"

# If jobs in momentry: namespace
# Production worker needed
./target/release/momentry worker

# If jobs in momentry_dev: namespace
# Playground worker needed
./target/debug/momentry_playground worker
```
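
Because the pattern `momentry*` matches keys from both environments, the listed keys still have to be sorted by exact prefix before deciding which worker is needed. A minimal sketch of that classification follows. The `Namespace` enum and `namespace_of` function are hypothetical names used only for illustration.

```rust
/// Which environment a Redis key belongs to, judged by its prefix.
#[derive(Debug, PartialEq)]
enum Namespace {
    Production,
    Playground,
    Unknown,
}

/// Classifies a key by the prefixes documented for each environment.
fn namespace_of(key: &str) -> Namespace {
    if key.starts_with("momentry_dev:") {
        Namespace::Playground
    } else if key.starts_with("momentry:") {
        // "momentry_dev:..." cannot reach this branch, because the
        // character after "momentry" is '_' rather than ':'.
        Namespace::Production
    } else {
        Namespace::Unknown
    }
}
```

Matching on the full prefix, including the colon, is what keeps the two namespaces from being confused even though one name is a leading substring of the other.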

### Issue: Progress API Returns Empty

**Symptoms**:
- Progress API returns empty response
- Job exists but progress not visible

**Cause**: Progress key in different namespace

**Solution**:
- Ensure worker prefix matches server prefix
- Check Redis keys: `redis-cli keys "{prefix}progress:*"`

## Redis CLI Examples

```bash
# List all production jobs
redis-cli -a accusys keys "momentry:job:*"

# List all playground jobs
redis-cli -a accusys keys "momentry_dev:job:*"

# Check progress for specific file (production)
redis-cli -a accusys HGETALL "momentry:progress:{file_uuid}"

# Check progress for specific file (playground)
redis-cli -a accusys HGETALL "momentry_dev:progress:{file_uuid}"

# Delete all production jobs (⚠️ destructive)
redis-cli -a accusys keys "momentry:job:*" | xargs redis-cli -a accusys del

# Delete all playground jobs (⚠️ destructive)
redis-cli -a accusys keys "momentry_dev:job:*" | xargs redis-cli -a accusys del
```

## Best Practices

1. **Always match worker to server**: Production worker for production server, playground worker for playground server

2. **Check Redis keys**: Before debugging worker issues, verify namespace alignment

3. **Document in AGENTS.md**: Update Redis prefix documentation when configuration changes

4. **Never mix namespaces**: Keep production and playground completely isolated

5. **Use environment variables**: Configure prefix via `.env` files, not hardcoded values

## Related Documentation

- `docs_v1.0/DESIGN/Redis_Progress_Reporting_V1.0.md` - Progress reporting design
- `docs_v1.0/M4_workspace/2026-06-21_issue_report.md` - Issue report with Redis prefix problem
- `AGENTS.md` - Environment configuration reference

---

## Version History

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2026-06-21 | Initial documentation for Redis prefix configuration |
328
docs_v1.0/DESIGN/Worker_Health_Check_Mechanism.md
Normal file
@@ -0,0 +1,328 @@
---
title: Worker Health Check Mechanism
version: 1.0
date: 2026-06-21
author: momentry_core development
status: active
---

## Overview

Momentry Core worker processes can become stuck due to:
- Redis connection timeouts
- Job queue corruption
- Long-running processor hangs
- Resource exhaustion

This document describes health check mechanisms and recommended solutions.

## Current Architecture

### Worker Process

```
momentry worker
    │
    ├─→ Redis connection pool
    │     └─→ Poll job queue ({prefix}job:*)
    │
    ├─→ Processor executor
    │     ├─→ Python scripts (timeout: configurable)
    │     └─→ Resource monitoring (CPU, memory, GPU)
    │
    └─→ Dynamic concurrency
          └─→ Adjust based on system resources
```

### Worker Logs

Worker logs are stored in:
- `logs/nohup_worker*.log` - Historical worker logs
- `logs/momentry_3002.log` - Production server logs
- `logs/momentry_3003.log` - Playground server logs

## Known Issues

### Issue: Worker Stuck (2026-06-21)

**Symptoms**:
- Worker process running but no activity
- Last log timestamp outdated (>17 hours old)
- Jobs triggered but never processed
- Redis keys created but not consumed

**Cause**: Worker process running for extended period without proper cleanup

**Resolution**:
```bash
# 1. Check worker status (quoted so the shell does not expand the pattern;
#    the [m] keeps grep from matching its own process)
ps aux | grep "[m]omentry.*worker"

# 2. Check last activity
tail -20 logs/nohup_worker*.log

# 3. Kill stuck worker
kill <PID>

# 4. Restart worker
./target/release/momentry worker
```

## Recommended Health Check Mechanisms

### 1. Worker Heartbeat

**Implementation**:
- Worker writes heartbeat to Redis every 30 seconds
- Heartbeat key: `{prefix}health`
- Heartbeat value: a Redis hash with fields `timestamp`, `worker_pid`, `status`, and `last_job`

**Check**:
```bash
# Check worker heartbeat
redis-cli -a accusys HGETALL "momentry:health"
```

**Expected output** (hash fields and values, shown as JSON for readability; `redis-cli` prints them as alternating field/value lines):
```json
{
  "timestamp": "1782015243",
  "worker_pid": "52908",
  "status": "active",
  "last_job": "abc123..."
}
```

### 2. Automatic Restart

**Recommendation**: Implement automatic restart on inactivity timeout

```bash
# Example: Restart worker if no heartbeat for 60 seconds
# (To be implemented in worker code)

while true; do
    # Check heartbeat (a missing heartbeat is treated as timestamp 0, i.e. stale)
    LAST_HEARTBEAT=$(redis-cli -a accusys HGET momentry:health timestamp)
    CURRENT_TIME=$(date +%s)

    # -gt compares numerically; ">" inside [ ] would be an output redirection
    if [ $((CURRENT_TIME - ${LAST_HEARTBEAT:-0})) -gt 60 ]; then
        echo "Worker stuck, restarting..."
        pkill -f "momentry worker"
        ./target/release/momentry worker &
    fi

    sleep 30
done
```
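
Since the script above notes that this check is meant to move into worker code, the staleness decision itself can be sketched in Rust. This is an illustration under assumptions: the function name `heartbeat_is_stale` is hypothetical, the timestamp is taken to be Unix seconds stored as a string (as in the heartbeat example), and a missing or unparseable value is treated as stale.

```rust
/// Returns true when the heartbeat is missing, unparseable, or older
/// than `max_age_secs`. All times are Unix seconds.
fn heartbeat_is_stale(raw_timestamp: Option<&str>, now_secs: u64, max_age_secs: u64) -> bool {
    match raw_timestamp.and_then(|s| s.trim().parse::<u64>().ok()) {
        // saturating_sub avoids underflow if the heartbeat is slightly
        // ahead of the local clock; such a heartbeat counts as fresh.
        Some(ts) => now_secs.saturating_sub(ts) > max_age_secs,
        None => true,
    }
}
```

Treating "no heartbeat" as stale is a design choice in this sketch. It matches the restart loop above, which defaults a missing timestamp to zero.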

### 3. Worker Status API

**Recommendation**: Add `/api/v1/worker/status` endpoint

**Response**:
```json
{
  "worker_pid": 52908,
  "status": "active",
  "last_heartbeat": "2026-06-21T12:15:00Z",
  "jobs_processed": 42,
  "current_job": "abc123...",
  "uptime_seconds": 3600
}
```

### 4. Job Queue Monitoring

**Check for stuck jobs**:
```bash
# List all pending jobs
redis-cli -a accusys keys "momentry:job:*"

# Check job timestamp
redis-cli -a accusys HGET "momentry:job:{file_uuid}" created_at

# If job > 1 hour old without progress → stuck job
```

### 5. Resource Monitoring

**Worker logs include system stats**:
```
System: CPU idle=50.0%, Memory=31948MB/49152MB (35.0%), No GPU
Dynamic concurrency: 2 (config: 2)
```

**Monitor**:
- CPU idle > 90% for extended period → worker not processing
- Memory > 90% → resource exhaustion risk
- GPU not available → GPU-dependent processors will fail
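
The three monitoring rules above can be sketched as a single check over already-parsed values. This is a rough illustration only: `resource_warnings` is a hypothetical name, extracting the numbers from the log line is left out, and the sketch evaluates one sample, so the "for extended period" condition on CPU idle would need additional state.

```rust
/// Applies the monitoring thresholds to one sample of system stats and
/// returns a warning for each rule that fires.
fn resource_warnings(
    cpu_idle_pct: f64,
    memory_used_pct: f64,
    gpu_available: bool,
) -> Vec<&'static str> {
    let mut warnings = Vec::new();
    if cpu_idle_pct > 90.0 {
        // Single-sample check; duration tracking is not modelled here.
        warnings.push("worker may not be processing");
    }
    if memory_used_pct > 90.0 {
        warnings.push("resource exhaustion risk");
    }
    if !gpu_available {
        warnings.push("GPU-dependent processors will fail");
    }
    warnings
}
```

Returning a list rather than a single status lets a caller log every condition that applies at once.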

## Monitoring Script

```bash
#!/bin/bash
# worker_health_monitor.sh

PREFIX="momentry:"
REDIS_URL="redis://:accusys@localhost:6379"

while true; do
    echo "=== Worker Health Check ==="

    # Check worker process
    WORKER_PID=$(pgrep -f "momentry worker")
    if [ -z "$WORKER_PID" ]; then
        echo "❌ No worker process running"
        echo "Starting worker..."
        ./target/release/momentry worker &
        # Wait before re-checking so a failed start does not cause a busy loop
        sleep 30
        continue
    fi

    echo "✅ Worker running (PID: $WORKER_PID)"

    # Check Redis heartbeat
    HEARTBEAT=$(redis-cli -a accusys HGET "${PREFIX}health" timestamp)
    if [ -n "$HEARTBEAT" ]; then
        AGE=$(( $(date +%s) - $HEARTBEAT ))
        # -gt compares numerically; ">" inside [ ] would be an output redirection
        if [ "$AGE" -gt 60 ]; then
            echo "⚠️ Worker heartbeat stale ($AGE seconds old)"
            echo "Restarting worker..."
            kill $WORKER_PID
            ./target/release/momentry worker &
        else
            echo "✅ Heartbeat recent ($AGE seconds old)"
        fi
    else
        echo "⚠️ No heartbeat found"
    fi

    # Check pending jobs
    JOBS=$(redis-cli -a accusys keys "${PREFIX}job:*" | wc -l)
    echo "Pending jobs: $JOBS"

    sleep 30
done
```

## Preventive Measures

### 1. Regular Worker Restart

**Recommendation**: Restart the worker daily so that it never runs for an extended period without cleanup (the cause of the stuck-worker issue)

```bash
# Daily restart at 3 AM
# Add to crontab. Caveats: cron runs from $HOME, so use absolute paths, and
# "pkill -f" also matches the cron shell executing this line, so in practice
# put these commands in a script or use systemd/launchd instead.
# ";" after pkill lets the restart proceed even when no worker was running.
0 3 * * * pkill -f "momentry worker"; sleep 5; ./target/release/momentry worker &

# Or use systemd/launchd for automatic restart
```

### 2. Timeout Configuration

**Set reasonable timeouts**:
```bash
# Environment variables
MOMENTRY_ASR_TIMEOUT=3600      # 1 hour for ASR
MOMENTRY_CUT_TIMEOUT=3600      # 1 hour for CUT
MOMENTRY_DEFAULT_TIMEOUT=7200  # 2 hours default
```
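
How a worker might pick a timeout from these variables can be sketched as follows. Everything beyond the three variable names is an assumption made for illustration: the function name `processor_timeout`, the generalised `MOMENTRY_{PROCESSOR}_TIMEOUT` naming, the lookup order (processor-specific, then default), and the 7200-second fallback when nothing is set.

```rust
use std::time::Duration;

/// Picks a timeout for a processor: the processor-specific variable first,
/// then MOMENTRY_DEFAULT_TIMEOUT, then a built-in fallback.
/// `lookup` stands in for an environment read so the logic stays testable.
fn processor_timeout<F>(processor: &str, lookup: F) -> Duration
where
    F: Fn(&str) -> Option<String>,
{
    // Assumption: mirrors the documented 2-hour default.
    const FALLBACK_SECS: u64 = 7200;

    // Unset or non-numeric values are ignored rather than treated as errors.
    let parse = |name: &str| lookup(name).and_then(|v| v.trim().parse::<u64>().ok());

    let specific = format!("MOMENTRY_{}_TIMEOUT", processor.to_uppercase());
    let secs = parse(&specific)
        .or_else(|| parse("MOMENTRY_DEFAULT_TIMEOUT"))
        .unwrap_or(FALLBACK_SECS);

    Duration::from_secs(secs)
}
```

With the real environment, the call would look like `processor_timeout("asr", |name| std::env::var(name).ok())`.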

### 3. Resource Limits

**Limit worker concurrency**:
```bash
# Worker flags (comments cannot follow a line-continuation backslash):
#   --max-concurrent  Max parallel processors
#   --poll-interval   Poll every 10 seconds
#   --batch-size      Process 5 jobs per batch
./target/release/momentry worker \
    --max-concurrent 6 \
    --poll-interval 10 \
    --batch-size 5
```

### 4. Logging Enhancement

**Recommendation**: Add structured logging for job lifecycle

```rust
// In job_worker.rs
tracing::info!(
    job_id = %job.id,
    file_uuid = %file_uuid,
    status = "started",
    "Worker started job"
);

tracing::info!(
    job_id = %job.id,
    duration_ms = elapsed,
    status = "completed",
    "Worker completed job"
);
```

## Troubleshooting Guide

### Step 1: Check Process

```bash
ps aux | grep "[m]omentry.*worker"
```

Expected: One worker process per environment (production + playground)

### Step 2: Check Logs

```bash
tail -50 logs/nohup_worker*.log
```

Look for:
- Last log timestamp
- Error messages
- Processor failures

### Step 3: Check Redis

```bash
redis-cli -a accusys keys "momentry:job:*"
redis-cli -a accusys HGETALL "momentry:health"
```

Look for:
- Pending jobs count
- Heartbeat timestamp
- Job creation timestamps

### Step 4: Check Resources

```bash
top -pid <worker_pid>   # macOS syntax; on Linux use: top -p <worker_pid>
```

Look for:
- CPU usage (should be active if processing)
- Memory usage (should not exceed 80%)
- Process state (should be running, not sleeping)

### Step 5: Restart Worker

```bash
kill <worker_pid>
./target/release/momentry worker
```

## Related Documentation

- `docs_v1.0/DESIGN/Redis_Prefix_Configuration.md` - Redis namespace configuration
- `docs_v1.0/M4_workspace/2026-06-21_issue_report.md` - Worker stuck issue report
- `AGENTS.md` - Worker configuration reference
- `src/worker/job_worker.rs` - Worker implementation

---

## Version History

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2026-06-21 | Initial documentation for worker health check mechanisms |