release: v1.3.0 - TKG node type renaming

Changes:
- Rust: face_trace → face_track (45 occurrences in 8 files)
- Rust: gaze_trace → gaze_track, lip_trace → lip_track
- Python: tkg_builder.py unified + pipeline_checklist.py fixed
- Swift: swift_hand.swift hand state detection (empty vs holding)

Node type changes:
  face_trace    → face_track
  person_trace  → body_track
  gaze_trace    → gaze_track
  lip_trace     → lip_track
  hand_trace    → hand_track
  speaker       → speaker_segment
  object        → detected_object
  text_trace    → text_region

Migration:
  PUBLIC schema: 12970 + 892 + 305 rows updated
Accusys
2026-06-22 07:18:21 +08:00
parent bce9435823
commit 7e548f8b08
35 changed files with 2789 additions and 481 deletions


@@ -0,0 +1,97 @@
---
title: Job Status Sync Fix - Historical Processor Results Issue
version: 1.0
date: 2026-06-21
author: OpenCode
status: resolved
---
# Job Status Sync Fix - Historical Processor Results Issue
## Problem Summary
Production Worker marked jobs as 'failed' even when current processors completed successfully.
## Root Cause
### Location: `src/worker/job_worker.rs:1070`
```rust
let any_failed = results
    .iter()
    .any(|r| matches!(r.status, ProcessorJobStatus::Failed));
```
### Logic Defect
- Checked **all historical `processor_results`** rows for the job (`results=8` in the example below)
- If **any historical processor had failed**, the job was marked as failed
- **Ignored `job_processors`** (the processors in the current request)
### Example Case
Job ID 63:
- Historical: asr, yolo, face, ocr, pose, mediapipe, appearance (all failed)
- Current: cut (completed)
- Result: `any_failed=true` → job status='failed' ❌
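
The Job 63 case can be reproduced in isolation. The sketch below is a minimal model, not the project's code: `ProcessorResult`, its fields, and the helper functions are stand-ins assumed for illustration, and only `ProcessorJobStatus::Failed` and the processor names come from this report.

```rust
// Stand-in for the project's status enum (assumed shape, reduced to two variants).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ProcessorJobStatus {
    Completed,
    Failed,
}

// Hypothetical row type; the real struct in the project likely has more fields.
struct ProcessorResult {
    processor_type: &'static str,
    status: ProcessorJobStatus,
}

/// Rebuilds the rows described for Job 63: seven historical failures plus the current `cut` run.
fn job_63_results() -> Vec<ProcessorResult> {
    let historical = ["asr", "yolo", "face", "ocr", "pose", "mediapipe", "appearance"];
    let mut results: Vec<ProcessorResult> = historical
        .iter()
        .map(|&name| ProcessorResult {
            processor_type: name,
            status: ProcessorJobStatus::Failed,
        })
        .collect();
    results.push(ProcessorResult {
        processor_type: "cut",
        status: ProcessorJobStatus::Completed,
    });
    results
}

/// The original, unscoped check: any failed row fails the whole job.
fn any_failed_unscoped(results: &[ProcessorResult]) -> bool {
    results
        .iter()
        .any(|r| matches!(r.status, ProcessorJobStatus::Failed))
}

/// Lists which processors trip the unscoped check.
fn failed_processors(results: &[ProcessorResult]) -> Vec<&'static str> {
    results
        .iter()
        .filter(|r| r.status == ProcessorJobStatus::Failed)
        .map(|r| r.processor_type)
        .collect()
}
```

Calling `any_failed_unscoped(&job_63_results())` returns `true` even though `cut` is absent from `failed_processors`, which is the false failure described above.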
## Fix Implementation
### Modified Code (lines 1070-1110)
```rust
// Before: every stored result for the job is considered
let any_failed = results
    .iter()
    .any(|r| matches!(r.status, ProcessorJobStatus::Failed));

// After: only results for processors in the current request are considered
let any_failed = results
    .iter()
    .filter(|r| job_processors.contains(&r.processor_type.as_str().to_string()))
    .any(|r| matches!(r.status, ProcessorJobStatus::Failed));
```
### Key Changes
1. Added a filter on the `job_processors` parameter
2. Only checks processors in the current request
3. Ignores failures from historical processors
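
As a self-contained illustration of the scoped check, the sketch below assumes `job_processors` is a slice of `String` and models a result row with a hypothetical `ProcessorResult` struct; neither shape is confirmed by this report. It compares names by reference instead of building a new `String` per row, which is one possible variation and not necessarily what the worker does.

```rust
// Stand-in for the project's status enum (assumed shape, reduced to two variants).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ProcessorJobStatus {
    Completed,
    Failed,
}

// Hypothetical row type used only for this sketch.
struct ProcessorResult {
    processor_type: String,
    status: ProcessorJobStatus,
}

/// Returns true only if a processor named in the current request has a failed result.
fn any_requested_failed(results: &[ProcessorResult], job_processors: &[String]) -> bool {
    results
        .iter()
        // Keep only rows whose processor belongs to the current request.
        .filter(|r| job_processors.iter().any(|p| p == &r.processor_type))
        .any(|r| matches!(r.status, ProcessorJobStatus::Failed))
}

/// Small constructor to keep the examples short (hypothetical helper).
fn result(name: &str, status: ProcessorJobStatus) -> ProcessorResult {
    ProcessorResult {
        processor_type: name.to_string(),
        status,
    }
}
```

With results for a failed `asr` and a completed `cut`, a request for `["cut"]` yields `false`, while a request for `["asr"]` yields `true`. Note that when no stored result matches the request, the check also yields `false`, so a caller that needs to detect missing results has to handle that case separately.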
## Verification Results
### Production (3002) After Fix
```
Found 1 pending jobs ✅
Processing job: 53090f160138fd4a01d62edf8395c6a0 (63) ✅
Processor cut output file exists, marking completed ✅
Job status: running ✅ (not failed)
```
### Playground (3003) Comparison
- Playground had fewer historical results
- Jobs processed successfully before fix
- Dev schema works normally
## Deployment
### Binary
- Compiled: Jun 21 14:35
- Worker restart: PID 28623
- Logs: `logs/worker_3002_fixed.log`
### Test Command
```bash
curl -X POST "http://localhost:3002/api/v1/file/53090f160138fd4a01d62edf8395c6a0/process" \
  -H "Content-Type: application/json" \
  -d '{"processors": ["cut"]}'
```
## Lessons Learned
1. **Job lifecycle should be scoped to request**: Only check processors in current request
2. **Historical data pollution**: Failed attempts can pollute job status logic
3. **Filter early**: Apply filters before checking status to avoid false positives
## Related Files
- `src/worker/job_worker.rs:1070-1110` (fixed)
- `src/worker/job_worker.rs:1407` (any_failed handling)
- `logs/worker_3002_fixed.log` (verification)


@@ -0,0 +1,84 @@
---
title: PostgreSQL Job Status Sync Issue
version: 1.0
date: 2026-06-21
author: OpenCode
status: identified
---
# PostgreSQL Job Status Sync Issue
## Problem Description
Production Worker (3002) cannot find pending jobs despite successful UPDATE operations.
## Evidence
### Server Logs
```
UPDATE monitor_jobs SET processors = ..., status = 'pending' WHERE uuid = '...'
rows_affected=1 ✅
elapsed=565.917µs
```
### PostgreSQL Query Timeline
1. **Trigger at 06:04:39**: UPDATE executed (rows_affected=1)
2. **Query at 06:04:41** (Python): status='pending' ✅
3. **Query at 06:06**: status='failed' ❌ (reverted)
4. **Worker SELECT at 06:04-06:07**: rows_returned=0 ❌
### Key Findings
- Server UPDATE succeeds (rows_affected=1)
- PostgreSQL briefly shows 'pending' (confirmed 2 seconds later)
- Status reverts to 'failed' shortly afterwards (observed by 06:06)
- Worker SELECT never finds pending jobs
## Hypotheses
1. **Another process resets status**: An unidentified mechanism changes the status back to 'failed'
2. **Job lifecycle logic**: The job processing code may contain logic that re-marks a previously failed job as 'failed'
3. **Connection pool transaction issue**: The UPDATE happens in one transaction and is reverted in another
4. **Worker health check**: Only touches rows `WHERE status='running'`, so it should not affect pending jobs
## Configuration Verified
- Server schema: `public`
- Worker schema: `public`
- monitor_jobs.uuid: VARCHAR(32) ✅
- All uuids: 32 characters ✅
- Worker binary: Jun 21 13:20 (latest) ✅
- Server binary: Jun 21 13:20 (latest) ✅
## Testing Done
1. Restarted Server (3002, PID 65718)
2. Restarted Worker (PID 88674)
3. Triggered processing for multiple files
4. Direct PostgreSQL queries via Python
5. API verification: /api/v1/files, /health, /api/v1/jobs
## Current Status
**Production (3002)**:
- Server: Running ✅
- Worker: Running ✅
- Jobs: 8 total (6 failed, 1 completed)
- Processing: Blocked ❌
**Playground (3003)**:
- Server: Running ✅
- Worker: Running ✅
- Not tested yet
## Next Steps
1. **Test in Playground**: Compare job lifecycle in dev schema
2. **Find reset mechanism**: Search for code that resets job status to 'failed'
3. **Check job lifecycle**: Review job_worker.rs for failed job handling logic
4. **Test new job registration**: Register fresh video and trigger processing
## Related Files
- `src/api/processing.rs`: trigger_processing UPDATE (line 271)
- `src/worker/job_worker.rs`: Worker polling and health check (lines 95-115)
- `src/core/db/postgres_db.rs`: list_monitor_jobs_by_status (line 1720)
- `logs/momentry_3002.log`: Server UPDATE logs
- `logs/worker_3002_new.log`: Worker SELECT logs