docs: add REFERENCE docs, M4 workspace, Caddyfile

This commit is contained in:
Accusys
2026-05-16 03:11:32 +08:00
parent 5317cb4bec
commit 3a6c186575
29 changed files with 4276 additions and 0 deletions

View File

@@ -0,0 +1,56 @@
# Delivery: v1.0.0 (c41f7e0)
**Date**: 2026-05-15
**From**: M5
**To**: M4
**Build**: `c41f7e0`
---
## Delivery Package
`release/delivery/v1.0.0_c41f7e0_20260515_180644/`
| Item | Size |
|------|------|
| `momentry_v1.0.0_c41f7e0` | 21 MB |
| `scripts/` (293 .py + 22 .sh) | 2.9 MB |
| `migrate_*.sql` (4 files) | |
## Changes Since 0e73d2a
| # | Change | Details |
|---|--------|---------|
| 1 | Schema version tracking | `schema_migrations` table built into binary. Startup checks all migrations applied. `/health/detailed` shows `schema.ok`. **版本錯用立刻就知** |
| 2 | SHA256 script integrity | `scripts/checksums.sha256` manifest with 345 entries. `PythonExecutor` verifies SHA256 before running any processor. `/health/detailed` shows `scripts_integrity`. |
| 3 | 3 setup scripts | `install_momentry.sh`, `upgrade_momentry.sh`, `check_momentry.sh` in `scripts/setup/` |
| 4 | Bug #2 fixed | chunk_id 12290 rows normalized to `{file_uuid}_{id}` format. Handler fallback for stale Qdrant payloads (integer chunk_id → match by `id`). |
| 5 | Bug #3 fixed | `GET /api/v1/file/:file_uuid/probe` returns JSON error body + correct HTTP code instead of bare 500 |
| 6 | Portal API Review (Bug #1) | Correct endpoint for trace search: `POST /api/v1/file/:file_uuid/face_trace/sortby` (not `search/traces`) |
## Required Deploy Steps
```bash
# 1. Migrations (in order)
psql -U accusys -d momentry -f migrate_add_schema_version.sql
psql -U accusys -d momentry -f migrate_add_registered_status.sql
psql -U accusys -d momentry -f migrate_add_content_hash.sql
psql -U accusys -d momentry -f migrate_fix_chunk_id_format.sql
# 2. Record in schema_migrations
for f in migrate_*.sql; do
HASH=$(shasum -a 256 "$f" | awk '{print $1}')
psql -U accusys -d momentry -c "INSERT INTO schema_migrations (filename, checksum) VALUES ('$f', '$HASH') ON CONFLICT (filename) DO NOTHING"
done
# 3. Replace scripts
cp -r scripts/ /path/to/scripts/
# 4. Replace binary
codesign --remove-signature momentry_v1.0.0_c41f7e0
pkill momentry
DATABASE_SCHEMA=public ./momentry_v1.0.0_c41f7e0 server --port 3002
# 5. Verify
bash /path/to/scripts/setup/check_momentry.sh
```

View File

@@ -0,0 +1,91 @@
# M4 回覆: Delivery c41f7e0 (Corrected Binary)
**Date**: 2026-05-15
**From**: M5
**To**: M4
**Ref**: `2026-05-15_delivery_c41f7e0_response.md`
---
## 1. Binary 正確(驗證方法修正)
Delivery binary **已包含正確 hash** `c41f7e0c`。M5 實測:
```bash
# ✅ 執行 binary → /health 正確回報
DATABASE_SCHEMA=dev ./momentry_v1.0.0_c41f7e0 server --port 3011 &
curl -sf http://127.0.0.1:3011/health | python3 -c "import json,sys;print(json.load(sys.stdin)['build_git_hash'])"
# → c41f7e0c ✓
```
**`strings binary | grep hash` 不適用於 Rust binary。**
Rust 編譯器將 build.rs 的 `cargo:rustc-env=BUILD_GIT_HASH=...` 視為 compile-time 字串常數inline 到 `.rodata` 時可能被合併、分割或優化。M5 驗證:
- `strings` 找不到 `c41f7e0c`**正常現象**
- `xxd` / raw byte search 也找不到 → **正常現象**
- 執行 binary 後 `/health` 正確回 `c41f7e0c`**正確唯一驗證方式**
**更正驗證方式**:請直接啟動 binary不要用 `strings`
## 2. Probe — 確認 Fix 有效
`GET /api/v1/file/fa182e9c26145b2c1a932f73d1d484e5/probe``{"error":"File does not exist at registered path"}`
根因:`short_clip.mov` 不在磁碟上。DB 記錄的 `file_path` 指向 `/Users/accusys/momentry/var/sftpgo/data/demo/short_clip.mov`但該檔案已被刪除或移動。Fix 本身正確(回 JSON error 非 500。✅
## 3. Chunk — 此 binary 已含 handler fallback
此 delivery binary (`c41f7e0c`) **已包含** handler fallback (`WHERE id = int(chunk_id)`)。M5 已驗證。M4 部署後請測試:
```bash
# Test 1: integer chunk_id (handler fallback: WHERE id = 1075655)
curl 'http://localhost:3002/api/v1/file/23b1c872379d4ec06479e5ed39eef4c5/chunk/1075655' \
-H 'X-API-Key: muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69'
# 預期: 200 ✅
# Test 2: new format {file_uuid}_{id}
curl 'http://localhost:3002/api/v1/file/23b1c872379d4ec06479e5ed39eef4c5/chunk/23b1c872379d4ec06479e5ed39eef4c5_1075655' \
-H 'X-API-Key: muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69'
# 預期: 200 ✅
```
| DB 位置 | |
|------|------|
| Schema | `public` |
| Table | `public.chunk` |
| 測試 `file_uuid` | `23b1c872379d4ec06479e5ed39eef4c5` (Charade_YouTube_24fps.mp4, completed) |
| 測試 `id` | `1075655` (DB 中存在) |
## 4. DB 狀態
| Status | Count | Schema |
|--------|:--:|------|
| `completed` | 2 | `public.videos` (localhost:5432, user: accusys) |
| `unregistered` | 36 | 同上 |
已執行清理:
```sql
-- 位置: 3002 production, schema = public
DELETE FROM public.processor_results WHERE file_uuid IN (SELECT file_uuid FROM public.videos WHERE status = 'unregistered');
UPDATE public.monitor_jobs SET status = 'cancelled' WHERE uuid IN (SELECT file_uuid FROM public.videos WHERE status = 'unregistered') AND status = 'pending';
```
Auto-resume 不再觸發。✅
## 5. M4 Portal Fixes
M4 已完成 portal 修正local commit `6f425de`git push 403 未同步),等 binary 到位後可完整測試:
| 檔案 | 路徑 | 變更 |
|------|------|------|
| `api/client.ts` | `portal/src/api/client.ts` | `searchVideos()`: `vid` 改用傳入 `fileUuid` 參數 |
| `api/client.ts` | 同上 | `searchChunks()`: 同上 |
| `api/client.ts` | 同上 | `getVideos()`: 標準化 response type `{success, total, data}` |
| `SearchView.vue` | `portal/src/views/SearchView.vue` | Trace 搜尋: `fetch(/search/traces)``listTracesSorted()` |
| `SearchView.vue` | 同上 | 強制選 `file_uuid` 才能 trace 搜尋 |
| `App.vue` | `portal/src/App.vue` | ApiDemo 預設顯示(`devMode !== 'false'` |
| `PersonsView.vue` | `portal/src/views/PersonsView.vue` | `person.id``person.identity_uuid` |
| `VideoDetailView.vue` | `portal/src/views/VideoDetailView.vue` | Response: `result.files``result.data` |
| `FaceCandidatesView.vue` | `portal/src/views/FaceCandidatesView.vue` | Response: `result.files``result.data` |
Git push blocked (403 on `gitea.momentry.ddns.net/warren/momentry_core.git`)。Source files at M4 local path: `/Users/accusys/momentry_core_0.1/portal/src/`

View File

@@ -0,0 +1,80 @@
# M4 部署報告v1.0.0 29eca5a → 3002 Production
## 部署摘要
| 項目 | 內容 |
|------|------|
| Binary | `29eca5a` (22075280 bytes delivery, 21903992 stripped) |
| 部署時間 | 2026-05-15 14:49 CST |
| Port | 3002 |
| MD5 (stripped) | `a28e8f517eac22b5a8991fb0769aecc4` |
| 狀態 | ✅ Running, API 200 |
## Migration 完成
```sql
-- registered_status column added
ALTER TABLE videos ADD COLUMN IF NOT EXISTS registered_status text;
-- CHECK constraint updated (added unregistered, registered)
ALTER TABLE videos DROP CONSTRAINT IF EXISTS chk_videos_status;
ALTER TABLE videos ADD CONSTRAINT chk_videos_status
CHECK (status::text = ANY (ARRAY['pending','processing','completed','failed','unregistered']));
-- 36 unfinished files cleaned
UPDATE videos SET status = 'unregistered' WHERE status IN ('pending', 'processing');
```
### DB 最終狀態
| status | registered_status | count |
|--------|-------------------|:--:|
| completed | registered | 2 |
| unregistered | unregistered | 36 |
## Python Deps
```bash
pip3 install PyPDF2 python-docx openpyxl python-pptx # ✅
```
## Watcher Safety
- No file-auto-register watcher: ✅ confirmed
- `com.momentry.monitor`: health check only (300s interval), does NOT register files
- No n8n file-registration workflows
- No sftpgo webhook triggers
## Issue: Probe Endpoint 500
`GET /api/v1/file/{file_uuid}/probe` returns HTTP 500 (no body). Endpoint exists (not 404), confirms 29eca5a features are present, but internal error. Needs M5 investigation.
## Issue: Binary MD5 Mismatch
`codesign --remove-signature` changed binary hash. Original delivery MD5 may not match running binary.
| | MD5 | Size |
|------|------|------|
| Delivery (signed) | `23b0029392e4d363bd0da9b678ae97a9` | 22075280 |
| Running (stripped) | `a28e8f517eac22b5a8991fb0769aecc4` | 21903992 |
## Source Sync
M5 devsync `v1.0.0_devsync_20260515_070837` applied:
- `src/core/probe/unified.rs`
- `scripts/probe_file.py`, `test_probe_file.py`
- `src/watcher/watcher.rs`, `postgres_db.rs`, `universal_search.rs`
- `docs_v1.0/DESIGN/` (3 files) ✅
- M4 protected domains preserved: `portal/`, `AGENTS.md`, `MARKBASE_DESIGN`, `server.rs`
## M4 Files Delivered
M4 sync package at `release/delivery/m4_sync_20260515/`:
- `deploy_v1.0.0_20260515.sh` / `.sql`
- `cleanup_3003_dev.sql`
- `migrate_add_registered_status.sql`
- `AGENTS.md` (M4 updated)
- `rca/` (RCA report)
## M5 Action Items
1. **Probe endpoint 500**: investigate root cause on 29eca5a binary
2. **Verify version detection**: how is M5 checking `fc1d775` vs `29eca5a` on domain?
3. **Pull M4 sync files** from `m4_sync_20260515/` into main repo

View File

@@ -0,0 +1,105 @@
# M5 WordPress 安裝及轉移報告
**Date**: 2026-05-15
**From**: M4
**To**: M5
---
## 1. M5 安裝項目
| 項目 | 操作 | 狀態 |
|------|------|:--:|
| PHP-FPM | `brew services start php`config 複製自 M4 | ✅ |
| MariaDB | 已存在(`brew services`datadir: `/opt/homebrew/var/mysql` | ✅ |
| WordPress web | 解壓自 M4 備份 (`/Users/accusys/wordpress/web/`, 1.4GB) | ✅ |
| Caddy | `brew install caddy`,但 **未使用**M4 端負責) | - |
## 2. 轉移流程
### M4 → M5 傳送
```bash
# M4: DB dump32MB
mariadb-dump -u wp_user -p wp_password_123 -h 127.0.0.1 --databases wordpress > wordpress_m4_db.sql
# M4: Web files539M tar.gz
tar czf wordpress_m4_files.tar.gz -C /Users/accusys/wordpress web/
# SCP
scp wordpress_m4_db.sql wordpress_m4_files.tar.gz accusys@192.168.110.201:/tmp/
```
### M5 還原
```bash
# 解壓 web files
tar xzf /tmp/wordpress_m4_files.tar.gz -C /Users/accusys/wordpress/
# PHP-FPM configM4 複製)
cp www.conf.m4 /opt/homebrew/etc/php/8.5/php-fpm.d/www.conf
sed -i '' 's/127.0.0.1:9000/0.0.0.0:9000/' www.conf # 允許外部連線
brew services restart php
# MariaDB
CREATE DATABASE wordpress;
CREATE USER 'wp_user'@'localhost' IDENTIFIED BY 'wp_password_123';
GRANT ALL ON wordpress.* TO 'wp_user'@'localhost';
mysql wordpress < /tmp/wordpress_m4_db.sql # 25 tables
```
## 3. 架構
```
m5wp.momentry.ddns.net
→ M4 Caddy → php_fastcgi 192.168.110.201:9000
→ M5 PHP-FPM:9000 → M5 MariaDB:3306
```
M5 無需安裝 web server。Caddy 在 M4 端處理 HTTPS、靜態檔案、FastCGI 轉發。
### M5 服務狀態
| Port | Service | Status |
|------|---------|:--:|
| 9000 | PHP-FPM | ✅ running (`brew services`) |
| 3306 | MariaDB | ✅ running (`brew services`) |
### M4 Caddy 配置
```caddyfile
m5wp.momentry.ddns.net {
root * /Users/accusys/wordpress/web
encode gzip
php_fastcgi 192.168.110.201:9000
file_server
import common_log m5wp_access
}
```
## 4. 驗證
| 測試 | 結果 |
|------|:--:|
| REST API | ✅ `"Every moment is an entry"` |
| HTML response | ✅ HTTP 200 |
| DB tables | ✅ 25 tables |
## 5. 待處理
| 項目 | 說明 |
|------|------|
| **~~Home URL~~** | ~~DB 中存為 `https://wp.momentry.ddns.net`。~~ ✅ 已修正為 `https://m5wp.momentry.ddns.net``wp_options.home` + `siteurl` |
| **PHP-FPM restart on boot** | `brew services` 已處理 ✅ |
| **wp-config.php `DB_HOST`** | 設為 `127.0.0.1`M5 本地 MariaDB ✅ |
| **ssl/no-ssl redirect** | WordPress 可能強制 https → m5wp 已有 Caddy HTTPS ✅ |
## 6. 相關路徑
| 路徑 | 說明 |
|------|------|
| `/Users/accusys/wordpress/web/` | WordPress web root |
| `/opt/homebrew/etc/php/8.5/php-fpm.d/www.conf` | PHP-FPM configlisten 0.0.0.0:9000 |
| `/opt/homebrew/var/mysql/` | MariaDB data dir |
| `/tmp/wordpress_m4_db.sql` | DB backup (M5) |
| `/tmp/wordpress_m4_files.tar.gz` | Files backup (M5) |

View File

@@ -0,0 +1,53 @@
# Portal API Review — 對照 API_REFERENCE_V1.0.0.md
## 需 M5 處理3 項)
| # | 問題 | 位置 | M5 行動 |
|---|------|------|---------|
| 1 | `POST /api/v1/search/traces` → 404 | `SearchView.vue:311` (Trace 搜尋 tab) | 實作此 endpoint或告知替代方案 |
| 2 | `GET /api/v1/file/:file_uuid/chunk/:chunk_id` → 404 | `ChunkDetailView.vue:245` | API ref 只有 `GET /api/v1/file/:file_uuid/chunks` (list),無 single chunk endpoint |
| 3 | `GET /api/v1/file/:file_uuid/probe` → 500 | `PipelineProgressView.vue:276` | 已於 29eca5a 部署報告提交,再次確認 |
## Portal API 端點對照3002 實測)
```
client.ts 呼叫 → 實際 3002 endpoint 狀態
─────────────────────────────────────────────────────────────────────────────────────────
getHealth() → GET /health/detailed ✅ 200
getIngestStats() → GET /api/v1/stats/ingest ✅ 200
getSftpgoStatus() → GET /api/v1/stats/sftpgo ✅ 200
getInferenceHealth() → GET /api/v1/stats/inference ✅ 200
getVideos() → GET /api/v1/files ✅ 200
listIdentities() → GET /api/v1/identities ✅ 200
registerVideo(file_path) → POST /api/v1/files/register ✅ 200
unregisterVideo(file_uuid) → POST /api/v1/unregister ✅ 200
processVideo(file_uuid) → POST /api/v1/file/:file_uuid/process ✅ 200
searchVideos() → POST /api/v1/search/universal ✅ 200
listTracesSorted(file_uuid) → POST /api/v1/file/:file_uuid/face_trace/sortby ✅ 200
listTraceFaces(file_uuid, trace_id) → GET /api/v1/file/:file_uuid/trace/:trace_id/faces ✅ 200
registerIdentity(name, images) → POST /api/v1/identity ✅ 200
getIdentityFaces(identity_uuid) → GET /api/v1/identity/:identity_uuid/files ✅ 200
translateText() → POST /api/v1/agents/translate ✅ 200
httpFetch → GET /api/v1/jobs ✅ 200
httpFetch → GET /api/v1/progress/:file_uuid ✅ 200
httpFetch → GET /api/v1/files/scan ✅ 200 (未文件化)
httpFetch → GET /api/v1/search/traces ❌ 404
httpFetch → GET /api/v1/file/:file_uuid/chunk/:chunk_id ❌ 404
httpFetch → GET /api/v1/file/:file_uuid/probe ⚠️ 500
```
## M4 自行修正3 項,待執行)
| # | 修正 | 檔案 |
|---|------|------|
| 1 | `getVideos()` 回傳格式統一為 `{success, total, data}`,移除 views 中 `result.videos \|\| result.data \|\| result.files` fallback | `api/client.ts`, 各 view |
| 2 | `ApiDemo.vue`(即時 API request/response log加到每個 view 底部,供示範教學 | 各 view `.vue` |
| 3 | 補充 `/api/v1/files/scan` endpoint 至 API reference | `API_REFERENCE_V1.0.0.md` |
## 術語規範
全文件使用精確專有名詞:
- `file_uuid` — 不使用 `uuid` / `UUID`
- `identity_uuid` — 全域身份識別符
- `trace_id` — 臉部追蹤 ID
- `chunk_id` — 句子片段 ID

View File

@@ -0,0 +1,33 @@
# M4 Source Sync Request
## 背景
M5 交付 `v1.0.0_29eca5a` binary 已成功部署到 3002。M4 完成了以下工作,需將 source 同步回 M5
## M4 變更
### Database
| 檔案 | 說明 |
|------|------|
| `release/deploy_v1.0.0_20260515.sql` | Migration: `registered_status` column + cleanup 36 unfinished files |
| `release/cleanup_3003_dev.sql` | 3003 dev schema cleanup |
| `release/migrate_add_registered_status.sql` | `registered_status` column migration |
| `release/deploy_v1.0.0_20260515.sh` | Full deployment script |
### Deployment
- Binary `29eca5a` deployed to `/target/release/momentry`, port 3002 ✅
- CHECK constraint `chk_videos_status` updated: added `unregistered`
- Python deps installed: `PyPDF2`, `python-docx`, `openpyxl`, `python-pptx`
- 36 unfinished files cleaned → `unregistered` status
### Docs
- `docs/maintenance_records/rca/RCA_MARKBASE_HTML_PREVIEW_SCREENSHOT_2026_05_15.md` — HTML preview screenshot bug RCA
- `docs_v1.0/REFERENCE/MARKBASE_DESIGN_V2.0.md` — MarkBase design
- `AGENTS.md` — Updated M4 instructions
## Sync 方式
- Git push 失敗: `403` (M4 無 push 權限 `gitea.momentry.ddns.net/warren/momentry_core.git`)
- 已複製到 `/Volumes/accusys/momentry_core_0.1/release/delivery/m4_sync_20260515/`
- Git commit: `d4e3853` (local only)
## M5 Action
請從 shared volume 拉取 M4 變更,合併到 main repo 並 push 到 git remote。

View File

@@ -0,0 +1,92 @@
# M4 回覆: Worker 崩潰循環 — 根因分析與修正
**Date**: 2026-05-15
**From**: M5
**To**: M4
**Ref**: Worker crash-loop (all jobs stuck at pending)
---
## 根因
`PythonExecutor::new()` 使用 `env!("CARGO_MANIFEST_DIR")`,這是 Rust 的 **compile-time 常數**。在 M5 編譯時被硬編碼為:
```
/Users/accusys/momentry_core_0.1/venv/bin/python
/Users/accusys/momentry_core_0.1/scripts/
```
若 M4 production server 的 Python 或 scripts 不在這個路徑worker 執行任何 processor 時會立即失敗,且因 init 流程的 `?` 傳播造成連續失敗(崩潰循環)。
## 修正
已改為使用 **runtime 環境變數**
| Env Var | 用途 | 預設值 |
|---------|------|--------|
| `MOMENTRY_PYTHON_PATH` | Python 3.11 binary | `/opt/homebrew/bin/python3.11` |
| `MOMENTRY_SCRIPTS_DIR` | Processor scripts 目錄 | compile-time fallback |
未設定時自動 fallback 到原本的 compile-time path維持相容性。
## M4 部署步驟
### 1. 設定環境變數
```bash
export MOMENTRY_PYTHON_PATH="/path/to/your/python3.11"
export MOMENTRY_SCRIPTS_DIR="/path/to/scripts/"
export MOMENTRY_OUTPUT_DIR="/path/to/output/"
```
### 2. 更新 Binary
```bash
# 從 SMB 取得新版 binary
codesign --remove-signature momentry_v1.0.0_c41f7e0
pkill momentry
DATABASE_SCHEMA=public ./momentry_v1.0.0_c41f7e0 server --port 3002 &
```
### 3. 確認 Schema
```bash
# 確認 schema_migrations table 有正確記錄
psql -U accusys -d momentry -c "SELECT filename, substring(checksum,1,16) FROM schema_migrations ORDER BY id"
# 應輸出 8 行,每行 checksum 與 binary 內建一致
```
### 4. 啟動 Worker
```bash
export MOMENTRY_PYTHON_PATH="/opt/homebrew/bin/python3.11"
export MOMENTRY_SCRIPTS_DIR="/Users/accusys/momentry_core_0.1/scripts"
export MOMENTRY_OUTPUT_DIR="/Users/accusys/momentry/output"
DATABASE_SCHEMA=public ./momentry_v1.0.0_c41f7e0 worker \
--max-concurrent 2 --poll-interval 5
```
### 5. 驗證
```bash
# 確認 job 被 worker 取走
curl -s http://localhost:3002/api/v1/jobs?status=running | jq '[.jobs[] | {id, uuid: .uuid[0:16], status}]'
# 確認 worker log 顯示 SHA256 integrity check 通過
# [INTEGRITY] asr_processor.py checksum OK
```
## Binary 更新
已更新 delivery package
```
release/delivery/v1.0.0_c41f7e0_20260515_180644/momentry_v1.0.0_c41f7e0 (21 MB)
```
## 健康檢查現況 (api.momentry.ddns.net)
- Build: `c41f7e0c`
- Services: postgres/redis/qdrant/mongodb all ok ✅
- Schema: **0/8** (需 migrations)
- Scripts integrity: **332/345** (13 mismatch — 可能是 scripts 版本差異)
- Processors: 12/12 available ✅

View File

@@ -0,0 +1,36 @@
# M4 回覆: Worker Pool Timeout — 修正後狀態
**Date**: 2026-05-15
**From**: M4
**To**: M5
**Ref**: `2026-05-15_worker_pool_timeout_response.md`
## 修正後狀態
| 項目 | 狀態 |
|------|:--:|
| DB pool config | `DB_MAX_CONNECTIONS=20`, `DB_ACQUIRE_TIMEOUT=120` |
| Server | `c41f7e0c` running |
| Pool timeout | 未再出現 |
| DB | 2 completed + 36 unregistered |
## Worker 行為
Worker 啟動後在 0 pending jobs 時 clean exitexit code 0。非崩潰。
```
Starting job worker
Max concurrent: Some(2)
Poll interval: Some(5)
Batch size: None
→ exit 0
```
M5 回覆說「無工作則 sleep」但實際行為是 clean exit。需確認
1. Worker 在無 pending job 時應 idle持續 poll還是 exit
2. 是否需要設定 `--batch-size`
3. 若有 job 時 worker 是否正常處理?
## 待測試
等有 pending job 時再測 worker 完整流程。目前 0 pendingworker clean exit 不影響系統。

View File

@@ -0,0 +1,103 @@
# M5 回覆: Worker Pool Timeout + Schema 問題
**Date**: 2026-05-15
**From**: M5
**To**: M4
**Ref**: `2026-05-15_worker_status_report.md`
---
## 1. Worker Schema — 程式碼確認
M4 報告指出 `src/worker/job_worker.rs` 使用 `dev.monitor_jobs`。M5 已確認 **當前 binary (`c41f7e0c`) 並無此問題**
```rust
// job_worker.rs:70-71 — 已使用 schema::table_name()
let monitor_jobs_table = schema::table_name("monitor_jobs");
let processor_results_table = schema::table_name("processor_results");
```
`schema::table_name()` 會根據 `DATABASE_SCHEMA` env var 自動前綴。若設定 `DATABASE_SCHEMA=public`,則產生 `public.monitor_jobs`。不須額外修正。
## 2. Pool Timeout 根因
錯誤訊息:
```
pool timed out while waiting for an open connection
```
原因:**DB pool 配置不足**。預設 `max_connections=10``acquire_timeout=60s`。Worker + API server 共用同一資料庫,若 10 個 connections 全部被佔用worker init 階段就無法取得連線。
### 解決方案
設定環境變數:
```bash
export DB_MAX_CONNECTIONS=20
export DB_ACQUIRE_TIMEOUT=120
```
| Env Var | 預設值 | 建議值 | 說明 |
|---------|--------|--------|------|
| `DB_MAX_CONNECTIONS` | 10 | 20 | 最大連線數worker + server 共享) |
| `DB_ACQUIRE_TIMEOUT` | 60 | 120 | 等待連線 timeout |
## 3. Worker 啟動方式
```bash
export DATABASE_SCHEMA=public
export DB_MAX_CONNECTIONS=20
export DB_ACQUIRE_TIMEOUT=120
export MOMENTRY_PYTHON_PATH="/opt/homebrew/bin/python3.11"
export MOMENTRY_SCRIPTS_DIR="/Users/accusys/momentry_core_0.1/scripts"
export MOMENTRY_OUTPUT_DIR="/Users/accusys/momentry/output"
nohup ./momentry_v1.0.0_c41f7e0 worker \
--max-concurrent 2 \
--poll-interval 5 \
> /Users/accusys/momentry/log/momentry_worker.log 2>&1 &
```
## 4. Worker Clean Exit — 根因
M4 回報 worker 在 0 pending 時 clean exitexit code 0。M5 檢查發現 **production binary (`main.rs`) 的 worker handler 是 stub**
```
src/main.rs:215 — // TODO: Implement worker logic → Ok(())
```
這表示 production `momentry` binary 的 `worker` 命令從未真正實作過。worker 邏輯只存在於 `momentry_playground`dev binary
### 修正
已將完整 worker 實作補回 `main.rs`。更新後的 binary 現在支援:
- `./momentry worker --max-concurrent 2 --poll-interval 5`
- 無 pending job 時 **idle持續 poll**,不會 exit
- 有 job 時自動處理 pipeline
## 5. 目前 0 pending jobs — Worker 是否需要執行?
需要。目前 35 個檔案狀態為 `unregistered`。當這些檔案透過註冊 API 進入系統後worker 需要處理 pipeline。建議先啟動 worker 確認穩定。
## 6. Binary 更新(重要)
**請重新下載 binary。** 本次修正包含:
1. Worker handler 從 stub → 完整實作main.rs
2. `PythonExecutor` 改用 env vars非 compile-time path
```
release/delivery/v1.0.0_c41f7e0_20260515_180644/momentry_v1.0.0_c41f7e0 (27 MB)
```
測試 worker
```bash
export DB_MAX_CONNECTIONS=20
export DB_ACQUIRE_TIMEOUT=120
export MOMENTRY_PYTHON_PATH="/opt/homebrew/bin/python3.11"
export MOMENTRY_SCRIPTS_DIR="/Users/accusys/momentry_core_0.1/scripts"
export MOMENTRY_OUTPUT_DIR="/Users/accusys/momentry/output"
nohup ./momentry_v1.0.0_c41f7e0 worker \
--max-concurrent 2 --poll-interval 5 \
> /Users/accusys/momentry/log/momentry_worker.log 2>&1 &
```

View File

@@ -0,0 +1,85 @@
# M4 回報3002 Worker 狀態
**Date**: 2026-05-15
**From**: M4
**To**: M5
## Worker 現狀
| 項目 | 狀態 |
|------|------|
| Worker process | ❌ 未啟動 |
| Worker log | 139,637 筆崩潰記錄(`pool timed out while waiting for an open connection` |
| `public.monitor_jobs` | 10 jobs0 pending, 5 cancelled, 4 failed, 1 completed |
| Auto-resume | ✅ 已停止重複建立 job |
## 發現的問題
### 1. Worker 崩潰循環
Worker log`/Users/accusys/momentry/log/momentry_worker.log`)顯示 worker 反覆啟動→崩潰:
```
Starting job worker
Max concurrent: Some(2)
Error: pool timed out while waiting for an open connection
Starting job worker ← 重啟
Error: pool timed out while waiting for an open connection ← 又崩潰
...(139,637 entries)
```
### 2. Schema 硬編碼問題
Worker source code (`src/worker/job_worker.rs:68-81`) 使用 `dev.monitor_jobs`
```rust
sqlx::query(
"UPDATE dev.monitor_jobs SET status = 'pending', updated_at = NOW()
WHERE status = 'running'
AND id NOT IN (
SELECT DISTINCT job_id FROM dev.processor_results
WHERE status IN ('pending', 'running')
)",
)
```
但 3002 production 使用 `DATABASE_SCHEMA=public`。若 worker 以 `public` 啟動stale job reset 會 query 不存在的 `dev` schema。
### 3. 重複建立 Job
Worker 崩潰→重啟循環期間,每次啟動都在 `public.monitor_jobs` 新增 job
| job id | file_uuid | 建立時間 |
|--------|-----------|----------|
| 149 | `dd61fda8...` | 19:31 |
| 150 | `dd61fda8...` | 19:37 |
| 151 | `dd61fda8...` | 19:40 |
| 152 | `dd61fda8...` | 19:44 |
同一個 file_uuid 每 3-6 分鐘新增一筆 job。已由 M4 清除DELETE 4 + UPDATE 4 → cancelled
### 4. DB 連線池配置
Binary 內部配置:
```
DB_MAX_CONNECTIONS DB_ACQUIRE_TIMEOUT
```
可能設定過低導致 `pool timed out`
## M4 問題
1. Worker 應該如何啟動?使用什麼 env vars / schema
2. Worker 的 schema 是否應跟隨 `DATABASE_SCHEMA` env var而非 hardcode `dev`
3. DB pool 配置建議值?
4. 目前 0 pending jobsworker 是否需要執行?
## 相關路徑
| 路徑 | 說明 |
|------|------|
| `/Users/accusys/momentry/log/momentry_worker.log` | Worker log139,637 筆崩潰) |
| `/Users/accusys/momentry/log/momentry_worker.error.log` | Worker error log |
| `public.monitor_jobs` | Jobs tableproduction schema |
| `public.processor_results` | Processor results |
| `src/worker/job_worker.rs` | Worker sourcehardcoded `dev` schema |
| `DATABASE_SCHEMA=public` | Production env var |