docs: add REFERENCE docs, M4 workspace, Caddyfile

This commit is contained in:
Accusys
2026-05-16 03:11:32 +08:00
parent 5317cb4bec
commit 3a6c186575
29 changed files with 4276 additions and 0 deletions

81
docs/RELEASE_PACKAGING.md Normal file
View File

@@ -0,0 +1,81 @@
# Release Packaging Design
三類包:**開發系統升級包** + **生產系統升級包** + **檔案內容包**,完全獨立。
## 1. 開發系統升級包 (System/Dev)
給 playgroundport 3003, dev schema使用。
```
release/system/dev/{version}/
├── RELEASE_INFO.txt
├── source.tar.gz ← Rust + scripts source code
├── .env.development ← DATABASE_SCHEMA=dev, port 3003
├── schema_dev.sql ← dev schema DDL
├── scripts/
│ ├── pipeline_status.py
│ ├── generate_asr1.py
│ ├── apply_asr_corrections.py
│ ├── clean_sentence_text.py
│ └── import_file_package.py ← 匯入檔案內容包
├── test/
│ └── api_test.sh
└── migration/
└── {prev}_to_{version}.sql
```
升級:覆蓋 code + 執行 migration → `cargo build --bin momentry_playground` → 重啟 3003
## 2. 生產系統升級包 (System/Prod)
給 productionport 3002, public schema使用。
```
release/system/prod/{version}/
├── RELEASE_INFO.txt
├── source.tar.gz ← Rust + scripts source code
├── .env ← DATABASE_SCHEMA=public, port 3002
├── schema_public.sql ← public schema DDL
├── scripts/ (same as dev)
├── test/
│ └── api_test.sh
└── migration/
└── {prev}_to_{version}.sql
```
## 3. 檔案內容包 (File)
一個影片的完整資料,開發與生產環境共用。
```
release/files/{file_uuid}/{version}/
├── metadata.json ← Registration info
├── RELEASE_INFO.txt
├── processors/ ← output_dev/{uuid}.*.json
│ ├── asr.json
│ ├── asrx.json
│ ├── asr-1.json
│ ├── yolo.json
│ ├── face.json
│ ├── pose.json
│ ├── ocr.json
│ ├── cut.json
│ └── scene.json
├── face_detections.csv ← 該檔案的所有 face detections
├── identities.csv ← 關聯的 identities
├── tkg_nodes.csv ← TKG nodes
├── tkg_edges.csv ← TKG edges
├── qdrant/ ← Qdrant snapshots for this file
│ ├── momentry_dev_v1.snapshot
│ ├── sentence_story.snapshot
│ └── ...
└── RELEASE_INFO.txt
```
### 匯入流程
```
1. POST /api/v1/files/register → 取得 file_uuid
2. python3 scripts/import_file_package.py --uuid {uuid} --package path/
3. 檔案狀態更新為「已註冊已處理」
```

View File

@@ -0,0 +1,56 @@
# Delivery: v1.0.0 (c41f7e0)
**Date**: 2026-05-15
**From**: M5
**To**: M4
**Build**: `c41f7e0`
---
## Delivery Package
`release/delivery/v1.0.0_c41f7e0_20260515_180644/`
| Item | Size |
|------|------|
| `momentry_v1.0.0_c41f7e0` | 21 MB |
| `scripts/` (293 .py + 22 .sh) | 2.9 MB |
| `migrate_*.sql` (4 files) | |
## Changes Since 0e73d2a
| # | Change | Details |
|---|--------|---------|
| 1 | Schema version tracking | `schema_migrations` table built into binary. Startup checks all migrations applied. `/health/detailed` shows `schema.ok`. **版本錯用立刻就知** |
| 2 | SHA256 script integrity | `scripts/checksums.sha256` manifest with 345 entries. `PythonExecutor` verifies SHA256 before running any processor. `/health/detailed` shows `scripts_integrity`. |
| 3 | 3 setup scripts | `install_momentry.sh`, `upgrade_momentry.sh`, `check_momentry.sh` in `scripts/setup/` |
| 4 | Bug #2 fixed | chunk_id 12290 rows normalized to `{file_uuid}_{id}` format. Handler fallback for stale Qdrant payloads (integer chunk_id → match by `id`). |
| 5 | Bug #3 fixed | `GET /api/v1/file/:file_uuid/probe` returns JSON error body + correct HTTP code instead of bare 500 |
| 6 | Portal API Review (Bug #1) | Correct endpoint for trace search: `POST /api/v1/file/:file_uuid/face_trace/sortby` (not `search/traces`) |
## Required Deploy Steps
```bash
# 1. Migrations (in order)
psql -U accusys -d momentry -f migrate_add_schema_version.sql
psql -U accusys -d momentry -f migrate_add_registered_status.sql
psql -U accusys -d momentry -f migrate_add_content_hash.sql
psql -U accusys -d momentry -f migrate_fix_chunk_id_format.sql
# 2. Record in schema_migrations
for f in migrate_*.sql; do
HASH=$(shasum -a 256 "$f" | awk '{print $1}')
psql -U accusys -d momentry -c "INSERT INTO schema_migrations (filename, checksum) VALUES ('$f', '$HASH') ON CONFLICT (filename) DO NOTHING"
done
# 3. Replace scripts
cp -r scripts/ /path/to/scripts/
# 4. Replace binary
codesign --remove-signature momentry_v1.0.0_c41f7e0
pkill momentry
DATABASE_SCHEMA=public ./momentry_v1.0.0_c41f7e0 server --port 3002
# 5. Verify
bash /path/to/scripts/setup/check_momentry.sh
```

View File

@@ -0,0 +1,91 @@
# M4 回覆: Delivery c41f7e0 (Corrected Binary)
**Date**: 2026-05-15
**From**: M5
**To**: M4
**Ref**: `2026-05-15_delivery_c41f7e0_response.md`
---
## 1. Binary 正確(驗證方法修正)
Delivery binary **已包含正確 hash** `c41f7e0c`。M5 實測:
```bash
# ✅ 執行 binary → /health 正確回報
DATABASE_SCHEMA=dev ./momentry_v1.0.0_c41f7e0 server --port 3011 &
curl -sf http://127.0.0.1:3011/health | python3 -c "import json,sys;print(json.load(sys.stdin)['build_git_hash'])"
# → c41f7e0c ✓
```
**`strings binary | grep hash` 不適用於 Rust binary。**
Rust 編譯器將 build.rs 的 `cargo:rustc-env=BUILD_GIT_HASH=...` 視為 compile-time 字串常數inline 到 `.rodata` 時可能被合併、分割或優化。M5 驗證:
- `strings` 找不到 `c41f7e0c`**正常現象**
- `xxd` / raw byte search 也找不到 → **正常現象**
- 執行 binary 後 `/health` 正確回 `c41f7e0c`**正確唯一驗證方式**
**更正驗證方式**:請直接啟動 binary不要用 `strings`
## 2. Probe — 確認 Fix 有效
`GET /api/v1/file/fa182e9c26145b2c1a932f73d1d484e5/probe``{"error":"File does not exist at registered path"}`
根因:`short_clip.mov` 不在磁碟上。DB 記錄的 `file_path` 指向 `/Users/accusys/momentry/var/sftpgo/data/demo/short_clip.mov`但該檔案已被刪除或移動。Fix 本身正確(回 JSON error 非 500。✅
## 3. Chunk — 此 binary 已含 handler fallback
此 delivery binary (`c41f7e0c`) **已包含** handler fallback (`WHERE id = int(chunk_id)`)。M5 已驗證。M4 部署後請測試:
```bash
# Test 1: integer chunk_id (handler fallback: WHERE id = 1075655)
curl 'http://localhost:3002/api/v1/file/23b1c872379d4ec06479e5ed39eef4c5/chunk/1075655' \
-H 'X-API-Key: muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69'
# 預期: 200 ✅
# Test 2: new format {file_uuid}_{id}
curl 'http://localhost:3002/api/v1/file/23b1c872379d4ec06479e5ed39eef4c5/chunk/23b1c872379d4ec06479e5ed39eef4c5_1075655' \
-H 'X-API-Key: muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69'
# 預期: 200 ✅
```
| DB 位置 | |
|------|------|
| Schema | `public` |
| Table | `public.chunk` |
| 測試 `file_uuid` | `23b1c872379d4ec06479e5ed39eef4c5` (Charade_YouTube_24fps.mp4, completed) |
| 測試 `id` | `1075655` (DB 中存在) |
## 4. DB 狀態
| Status | Count | Schema |
|--------|:--:|------|
| `completed` | 2 | `public.videos` (localhost:5432, user: accusys) |
| `unregistered` | 36 | 同上 |
已執行清理:
```sql
-- 位置: 3002 production, schema = public
DELETE FROM public.processor_results WHERE file_uuid IN (SELECT file_uuid FROM public.videos WHERE status = 'unregistered');
UPDATE public.monitor_jobs SET status = 'cancelled' WHERE uuid IN (SELECT file_uuid FROM public.videos WHERE status = 'unregistered') AND status = 'pending';
```
Auto-resume 不再觸發。✅
## 5. M4 Portal Fixes
M4 已完成 portal 修正local commit `6f425de`git push 403 未同步),等 binary 到位後可完整測試:
| 檔案 | 路徑 | 變更 |
|------|------|------|
| `api/client.ts` | `portal/src/api/client.ts` | `searchVideos()`: `vid` 改用傳入 `fileUuid` 參數 |
| `api/client.ts` | 同上 | `searchChunks()`: 同上 |
| `api/client.ts` | 同上 | `getVideos()`: 標準化 response type `{success, total, data}` |
| `SearchView.vue` | `portal/src/views/SearchView.vue` | Trace 搜尋: `fetch(/search/traces)``listTracesSorted()` |
| `SearchView.vue` | 同上 | 強制選 `file_uuid` 才能 trace 搜尋 |
| `App.vue` | `portal/src/App.vue` | ApiDemo 預設顯示(`devMode !== 'false'` |
| `PersonsView.vue` | `portal/src/views/PersonsView.vue` | `person.id``person.identity_uuid` |
| `VideoDetailView.vue` | `portal/src/views/VideoDetailView.vue` | Response: `result.files``result.data` |
| `FaceCandidatesView.vue` | `portal/src/views/FaceCandidatesView.vue` | Response: `result.files``result.data` |
Git push blocked (403 on `gitea.momentry.ddns.net/warren/momentry_core.git`)。Source files at M4 local path: `/Users/accusys/momentry_core_0.1/portal/src/`

View File

@@ -0,0 +1,80 @@
# M4 部署報告v1.0.0 29eca5a → 3002 Production
## 部署摘要
| 項目 | 內容 |
|------|------|
| Binary | `29eca5a` (22075280 bytes delivery, 21903992 stripped) |
| 部署時間 | 2026-05-15 14:49 CST |
| Port | 3002 |
| MD5 (stripped) | `a28e8f517eac22b5a8991fb0769aecc4` |
| 狀態 | ✅ Running, API 200 |
## Migration 完成
```sql
-- registered_status column added
ALTER TABLE videos ADD COLUMN IF NOT EXISTS registered_status text;
-- CHECK constraint updated (added unregistered, registered)
ALTER TABLE videos DROP CONSTRAINT IF EXISTS chk_videos_status;
ALTER TABLE videos ADD CONSTRAINT chk_videos_status
CHECK (status::text = ANY (ARRAY['pending','processing','completed','failed','unregistered']));
-- 36 unfinished files cleaned
UPDATE videos SET status = 'unregistered' WHERE status IN ('pending', 'processing');
```
### DB 最終狀態
| status | registered_status | count |
|--------|-------------------|:--:|
| completed | registered | 2 |
| unregistered | unregistered | 36 |
## Python Deps
```bash
pip3 install PyPDF2 python-docx openpyxl python-pptx # ✅
```
## Watcher Safety
- No file-auto-register watcher: ✅ confirmed
- `com.momentry.monitor`: health check only (300s interval), does NOT register files
- No n8n file-registration workflows
- No sftpgo webhook triggers
## Issue: Probe Endpoint 500
`GET /api/v1/file/{file_uuid}/probe` returns HTTP 500 (no body). Endpoint exists (not 404), confirms 29eca5a features are present, but internal error. Needs M5 investigation.
## Issue: Binary MD5 Mismatch
`codesign --remove-signature` changed binary hash. Original delivery MD5 may not match running binary.
| | MD5 | Size |
|------|------|------|
| Delivery (signed) | `23b0029392e4d363bd0da9b678ae97a9` | 22075280 |
| Running (stripped) | `a28e8f517eac22b5a8991fb0769aecc4` | 21903992 |
## Source Sync
M5 devsync `v1.0.0_devsync_20260515_070837` applied:
- `src/core/probe/unified.rs`
- `scripts/probe_file.py`, `test_probe_file.py`
- `src/watcher/watcher.rs`, `postgres_db.rs`, `universal_search.rs`
- `docs_v1.0/DESIGN/` (3 files) ✅
- M4 protected domains preserved: `portal/`, `AGENTS.md`, `MARKBASE_DESIGN`, `server.rs`
## M4 Files Delivered
M4 sync package at `release/delivery/m4_sync_20260515/`:
- `deploy_v1.0.0_20260515.sh` / `.sql`
- `cleanup_3003_dev.sql`
- `migrate_add_registered_status.sql`
- `AGENTS.md` (M4 updated)
- `rca/` (RCA report)
## M5 Action Items
1. **Probe endpoint 500**: investigate root cause on 29eca5a binary
2. **Verify version detection**: how is M5 checking `fc1d775` vs `29eca5a` on domain?
3. **Pull M4 sync files** from `m4_sync_20260515/` into main repo

View File

@@ -0,0 +1,105 @@
# M5 WordPress 安裝及轉移報告
**Date**: 2026-05-15
**From**: M4
**To**: M5
---
## 1. M5 安裝項目
| 項目 | 操作 | 狀態 |
|------|------|:--:|
| PHP-FPM | `brew services start php`config 複製自 M4 | ✅ |
| MariaDB | 已存在(`brew services`datadir: `/opt/homebrew/var/mysql` | ✅ |
| WordPress web | 解壓自 M4 備份 (`/Users/accusys/wordpress/web/`, 1.4GB) | ✅ |
| Caddy | `brew install caddy`,但 **未使用**M4 端負責) | - |
## 2. 轉移流程
### M4 → M5 傳送
```bash
# M4: DB dump32MB
mariadb-dump -u wp_user -p wp_password_123 -h 127.0.0.1 --databases wordpress > wordpress_m4_db.sql
# M4: Web files539M tar.gz
tar czf wordpress_m4_files.tar.gz -C /Users/accusys/wordpress web/
# SCP
scp wordpress_m4_db.sql wordpress_m4_files.tar.gz accusys@192.168.110.201:/tmp/
```
### M5 還原
```bash
# 解壓 web files
tar xzf /tmp/wordpress_m4_files.tar.gz -C /Users/accusys/wordpress/
# PHP-FPM configM4 複製)
cp www.conf.m4 /opt/homebrew/etc/php/8.5/php-fpm.d/www.conf
sed -i '' 's/127.0.0.1:9000/0.0.0.0:9000/' www.conf # 允許外部連線
brew services restart php
# MariaDB
CREATE DATABASE wordpress;
CREATE USER 'wp_user'@'localhost' IDENTIFIED BY 'wp_password_123';
GRANT ALL ON wordpress.* TO 'wp_user'@'localhost';
mysql wordpress < /tmp/wordpress_m4_db.sql # 25 tables
```
## 3. 架構
```
m5wp.momentry.ddns.net
→ M4 Caddy → php_fastcgi 192.168.110.201:9000
→ M5 PHP-FPM:9000 → M5 MariaDB:3306
```
M5 無需安裝 web server。Caddy 在 M4 端處理 HTTPS、靜態檔案、FastCGI 轉發。
### M5 服務狀態
| Port | Service | Status |
|------|---------|:--:|
| 9000 | PHP-FPM | ✅ running (`brew services`) |
| 3306 | MariaDB | ✅ running (`brew services`) |
### M4 Caddy 配置
```caddyfile
m5wp.momentry.ddns.net {
root * /Users/accusys/wordpress/web
encode gzip
php_fastcgi 192.168.110.201:9000
file_server
import common_log m5wp_access
}
```
## 4. 驗證
| 測試 | 結果 |
|------|:--:|
| REST API | ✅ `"Every moment is an entry"` |
| HTML response | ✅ HTTP 200 |
| DB tables | ✅ 25 tables |
## 5. 待處理
| 項目 | 說明 |
|------|------|
| **~~Home URL~~** | ~~DB 中存為 `https://wp.momentry.ddns.net`。~~ ✅ 已修正為 `https://m5wp.momentry.ddns.net``wp_options.home` + `siteurl` |
| **PHP-FPM restart on boot** | `brew services` 已處理 ✅ |
| **wp-config.php `DB_HOST`** | 設為 `127.0.0.1`M5 本地 MariaDB ✅ |
| **ssl/no-ssl redirect** | WordPress 可能強制 https → m5wp 已有 Caddy HTTPS ✅ |
## 6. 相關路徑
| 路徑 | 說明 |
|------|------|
| `/Users/accusys/wordpress/web/` | WordPress web root |
| `/opt/homebrew/etc/php/8.5/php-fpm.d/www.conf` | PHP-FPM configlisten 0.0.0.0:9000 |
| `/opt/homebrew/var/mysql/` | MariaDB data dir |
| `/tmp/wordpress_m4_db.sql` | DB backup (M5) |
| `/tmp/wordpress_m4_files.tar.gz` | Files backup (M5) |

View File

@@ -0,0 +1,53 @@
# Portal API Review — 對照 API_REFERENCE_V1.0.0.md
## 需 M5 處理3 項)
| # | 問題 | 位置 | M5 行動 |
|---|------|------|---------|
| 1 | `POST /api/v1/search/traces` → 404 | `SearchView.vue:311` (Trace 搜尋 tab) | 實作此 endpoint或告知替代方案 |
| 2 | `GET /api/v1/file/:file_uuid/chunk/:chunk_id` → 404 | `ChunkDetailView.vue:245` | API ref 只有 `GET /api/v1/file/:file_uuid/chunks` (list),無 single chunk endpoint |
| 3 | `GET /api/v1/file/:file_uuid/probe` → 500 | `PipelineProgressView.vue:276` | 已於 29eca5a 部署報告提交,再次確認 |
## Portal API 端點對照3002 實測)
```
client.ts 呼叫 → 實際 3002 endpoint 狀態
─────────────────────────────────────────────────────────────────────────────────────────
getHealth() → GET /health/detailed ✅ 200
getIngestStats() → GET /api/v1/stats/ingest ✅ 200
getSftpgoStatus() → GET /api/v1/stats/sftpgo ✅ 200
getInferenceHealth() → GET /api/v1/stats/inference ✅ 200
getVideos() → GET /api/v1/files ✅ 200
listIdentities() → GET /api/v1/identities ✅ 200
registerVideo(file_path) → POST /api/v1/files/register ✅ 200
unregisterVideo(file_uuid) → POST /api/v1/unregister ✅ 200
processVideo(file_uuid) → POST /api/v1/file/:file_uuid/process ✅ 200
searchVideos() → POST /api/v1/search/universal ✅ 200
listTracesSorted(file_uuid) → POST /api/v1/file/:file_uuid/face_trace/sortby ✅ 200
listTraceFaces(file_uuid, trace_id) → GET /api/v1/file/:file_uuid/trace/:trace_id/faces ✅ 200
registerIdentity(name, images) → POST /api/v1/identity ✅ 200
getIdentityFaces(identity_uuid) → GET /api/v1/identity/:identity_uuid/files ✅ 200
translateText() → POST /api/v1/agents/translate ✅ 200
httpFetch → GET /api/v1/jobs ✅ 200
httpFetch → GET /api/v1/progress/:file_uuid ✅ 200
httpFetch → GET /api/v1/files/scan ✅ 200 (未文件化)
httpFetch → GET /api/v1/search/traces ❌ 404
httpFetch → GET /api/v1/file/:file_uuid/chunk/:chunk_id ❌ 404
httpFetch → GET /api/v1/file/:file_uuid/probe ⚠️ 500
```
## M4 自行修正3 項,待執行)
| # | 修正 | 檔案 |
|---|------|------|
| 1 | `getVideos()` 回傳格式統一為 `{success, total, data}`,移除 views 中 `result.videos \|\| result.data \|\| result.files` fallback | `api/client.ts`, 各 view |
| 2 | `ApiDemo.vue`(即時 API request/response log加到每個 view 底部,供示範教學 | 各 view `.vue` |
| 3 | 補充 `/api/v1/files/scan` endpoint 至 API reference | `API_REFERENCE_V1.0.0.md` |
## 術語規範
全文件使用精確專有名詞:
- `file_uuid` — 不使用 `uuid` / `UUID`
- `identity_uuid` — 全域身份識別符
- `trace_id` — 臉部追蹤 ID
- `chunk_id` — 句子片段 ID

View File

@@ -0,0 +1,33 @@
# M4 Source Sync Request
## 背景
M5 交付 `v1.0.0_29eca5a` binary 已成功部署到 3002。M4 完成了以下工作,需將 source 同步回 M5
## M4 變更
### Database
| 檔案 | 說明 |
|------|------|
| `release/deploy_v1.0.0_20260515.sql` | Migration: `registered_status` column + cleanup 36 unfinished files |
| `release/cleanup_3003_dev.sql` | 3003 dev schema cleanup |
| `release/migrate_add_registered_status.sql` | `registered_status` column migration |
| `release/deploy_v1.0.0_20260515.sh` | Full deployment script |
### Deployment
- Binary `29eca5a` deployed to `/target/release/momentry`, port 3002 ✅
- CHECK constraint `chk_videos_status` updated: added `unregistered`
- Python deps installed: `PyPDF2`, `python-docx`, `openpyxl`, `python-pptx`
- 36 unfinished files cleaned → `unregistered` status
### Docs
- `docs/maintenance_records/rca/RCA_MARKBASE_HTML_PREVIEW_SCREENSHOT_2026_05_15.md` — HTML preview screenshot bug RCA
- `docs_v1.0/REFERENCE/MARKBASE_DESIGN_V2.0.md` — MarkBase design
- `AGENTS.md` — Updated M4 instructions
## Sync 方式
- Git push 失敗: `403` (M4 無 push 權限 `gitea.momentry.ddns.net/warren/momentry_core.git`)
- 已複製到 `/Volumes/accusys/momentry_core_0.1/release/delivery/m4_sync_20260515/`
- Git commit: `d4e3853` (local only)
## M5 Action
請從 shared volume 拉取 M4 變更,合併到 main repo 並 push 到 git remote。

View File

@@ -0,0 +1,92 @@
# M4 回覆: Worker 崩潰循環 — 根因分析與修正
**Date**: 2026-05-15
**From**: M5
**To**: M4
**Ref**: Worker crash-loop (all jobs stuck at pending)
---
## 根因
`PythonExecutor::new()` 使用 `env!("CARGO_MANIFEST_DIR")`,這是 Rust 的 **compile-time 常數**。在 M5 編譯時被硬編碼為:
```
/Users/accusys/momentry_core_0.1/venv/bin/python
/Users/accusys/momentry_core_0.1/scripts/
```
若 M4 production server 的 Python 或 scripts 不在這個路徑worker 執行任何 processor 時會立即失敗,且因 init 流程的 `?` 傳播造成連續失敗(崩潰循環)。
## 修正
已改為使用 **runtime 環境變數**
| Env Var | 用途 | 預設值 |
|---------|------|--------|
| `MOMENTRY_PYTHON_PATH` | Python 3.11 binary | `/opt/homebrew/bin/python3.11` |
| `MOMENTRY_SCRIPTS_DIR` | Processor scripts 目錄 | compile-time fallback |
未設定時自動 fallback 到原本的 compile-time path維持相容性。
## M4 部署步驟
### 1. 設定環境變數
```bash
export MOMENTRY_PYTHON_PATH="/path/to/your/python3.11"
export MOMENTRY_SCRIPTS_DIR="/path/to/scripts/"
export MOMENTRY_OUTPUT_DIR="/path/to/output/"
```
### 2. 更新 Binary
```bash
# 從 SMB 取得新版 binary
codesign --remove-signature momentry_v1.0.0_c41f7e0
pkill momentry
DATABASE_SCHEMA=public ./momentry_v1.0.0_c41f7e0 server --port 3002 &
```
### 3. 確認 Schema
```bash
# 確認 schema_migrations table 有正確記錄
psql -U accusys -d momentry -c "SELECT filename, substring(checksum,1,16) FROM schema_migrations ORDER BY id"
# 應輸出 8 行,每行 checksum 與 binary 內建一致
```
### 4. 啟動 Worker
```bash
export MOMENTRY_PYTHON_PATH="/opt/homebrew/bin/python3.11"
export MOMENTRY_SCRIPTS_DIR="/Users/accusys/momentry_core_0.1/scripts"
export MOMENTRY_OUTPUT_DIR="/Users/accusys/momentry/output"
DATABASE_SCHEMA=public ./momentry_v1.0.0_c41f7e0 worker \
--max-concurrent 2 --poll-interval 5
```
### 5. 驗證
```bash
# 確認 job 被 worker 取走
curl -s http://localhost:3002/api/v1/jobs?status=running | jq '[.jobs[] | {id, uuid: .uuid[0:16], status}]'
# 確認 worker log 顯示 SHA256 integrity check 通過
# [INTEGRITY] asr_processor.py checksum OK
```
## Binary 更新
已更新 delivery package
```
release/delivery/v1.0.0_c41f7e0_20260515_180644/momentry_v1.0.0_c41f7e0 (21 MB)
```
## 健康檢查現況 (api.momentry.ddns.net)
- Build: `c41f7e0c`
- Services: postgres/redis/qdrant/mongodb all ok ✅
- Schema: **0/8** (需 migrations)
- Scripts integrity: **332/345** (13 mismatch — 可能是 scripts 版本差異)
- Processors: 12/12 available ✅

View File

@@ -0,0 +1,36 @@
# M4 回覆: Worker Pool Timeout — 修正後狀態
**Date**: 2026-05-15
**From**: M4
**To**: M5
**Ref**: `2026-05-15_worker_pool_timeout_response.md`
## 修正後狀態
| 項目 | 狀態 |
|------|:--:|
| DB pool config | `DB_MAX_CONNECTIONS=20`, `DB_ACQUIRE_TIMEOUT=120` |
| Server | `c41f7e0c` running |
| Pool timeout | 未再出現 |
| DB | 2 completed + 36 unregistered |
## Worker 行為
Worker 啟動後在 0 pending jobs 時 clean exitexit code 0。非崩潰。
```
Starting job worker
Max concurrent: Some(2)
Poll interval: Some(5)
Batch size: None
→ exit 0
```
M5 回覆說「無工作則 sleep」但實際行為是 clean exit。需確認
1. Worker 在無 pending job 時應 idle持續 poll還是 exit
2. 是否需要設定 `--batch-size`
3. 若有 job 時 worker 是否正常處理?
## 待測試
等有 pending job 時再測 worker 完整流程。目前 0 pendingworker clean exit 不影響系統。

View File

@@ -0,0 +1,103 @@
# M5 回覆: Worker Pool Timeout + Schema 問題
**Date**: 2026-05-15
**From**: M5
**To**: M4
**Ref**: `2026-05-15_worker_status_report.md`
---
## 1. Worker Schema — 程式碼確認
M4 報告指出 `src/worker/job_worker.rs` 使用 `dev.monitor_jobs`。M5 已確認 **當前 binary (`c41f7e0c`) 並無此問題**
```rust
// job_worker.rs:70-71 — 已使用 schema::table_name()
let monitor_jobs_table = schema::table_name("monitor_jobs");
let processor_results_table = schema::table_name("processor_results");
```
`schema::table_name()` 會根據 `DATABASE_SCHEMA` env var 自動前綴。若設定 `DATABASE_SCHEMA=public`,則產生 `public.monitor_jobs`。不須額外修正。
## 2. Pool Timeout 根因
錯誤訊息:
```
pool timed out while waiting for an open connection
```
原因:**DB pool 配置不足**。預設 `max_connections=10``acquire_timeout=60s`。Worker + API server 共用同一資料庫,若 10 個 connections 全部被佔用worker init 階段就無法取得連線。
### 解決方案
設定環境變數:
```bash
export DB_MAX_CONNECTIONS=20
export DB_ACQUIRE_TIMEOUT=120
```
| Env Var | 預設值 | 建議值 | 說明 |
|---------|--------|--------|------|
| `DB_MAX_CONNECTIONS` | 10 | 20 | 最大連線數worker + server 共享) |
| `DB_ACQUIRE_TIMEOUT` | 60 | 120 | 等待連線 timeout |
## 3. Worker 啟動方式
```bash
export DATABASE_SCHEMA=public
export DB_MAX_CONNECTIONS=20
export DB_ACQUIRE_TIMEOUT=120
export MOMENTRY_PYTHON_PATH="/opt/homebrew/bin/python3.11"
export MOMENTRY_SCRIPTS_DIR="/Users/accusys/momentry_core_0.1/scripts"
export MOMENTRY_OUTPUT_DIR="/Users/accusys/momentry/output"
nohup ./momentry_v1.0.0_c41f7e0 worker \
--max-concurrent 2 \
--poll-interval 5 \
> /Users/accusys/momentry/log/momentry_worker.log 2>&1 &
```
## 4. Worker Clean Exit — 根因
M4 回報 worker 在 0 pending 時 clean exitexit code 0。M5 檢查發現 **production binary (`main.rs`) 的 worker handler 是 stub**
```
src/main.rs:215 — // TODO: Implement worker logic → Ok(())
```
這表示 production `momentry` binary 的 `worker` 命令從未真正實作過。worker 邏輯只存在於 `momentry_playground`dev binary
### 修正
已將完整 worker 實作補回 `main.rs`。更新後的 binary 現在支援:
- `./momentry worker --max-concurrent 2 --poll-interval 5`
- 無 pending job 時 **idle持續 poll**,不會 exit
- 有 job 時自動處理 pipeline
## 5. 目前 0 pending jobs — Worker 是否需要執行?
需要。目前 35 個檔案狀態為 `unregistered`。當這些檔案透過註冊 API 進入系統後worker 需要處理 pipeline。建議先啟動 worker 確認穩定。
## 6. Binary 更新(重要)
**請重新下載 binary。** 本次修正包含:
1. Worker handler 從 stub → 完整實作main.rs
2. `PythonExecutor` 改用 env vars非 compile-time path
```
release/delivery/v1.0.0_c41f7e0_20260515_180644/momentry_v1.0.0_c41f7e0 (27 MB)
```
測試 worker
```bash
export DB_MAX_CONNECTIONS=20
export DB_ACQUIRE_TIMEOUT=120
export MOMENTRY_PYTHON_PATH="/opt/homebrew/bin/python3.11"
export MOMENTRY_SCRIPTS_DIR="/Users/accusys/momentry_core_0.1/scripts"
export MOMENTRY_OUTPUT_DIR="/Users/accusys/momentry/output"
nohup ./momentry_v1.0.0_c41f7e0 worker \
--max-concurrent 2 --poll-interval 5 \
> /Users/accusys/momentry/log/momentry_worker.log 2>&1 &
```

View File

@@ -0,0 +1,85 @@
# M4 回報3002 Worker 狀態
**Date**: 2026-05-15
**From**: M4
**To**: M5
## Worker 現狀
| 項目 | 狀態 |
|------|------|
| Worker process | ❌ 未啟動 |
| Worker log | 139,637 筆崩潰記錄(`pool timed out while waiting for an open connection` |
| `public.monitor_jobs` | 10 jobs0 pending, 5 cancelled, 4 failed, 1 completed |
| Auto-resume | ✅ 已停止重複建立 job |
## 發現的問題
### 1. Worker 崩潰循環
Worker log`/Users/accusys/momentry/log/momentry_worker.log`)顯示 worker 反覆啟動→崩潰:
```
Starting job worker
Max concurrent: Some(2)
Error: pool timed out while waiting for an open connection
Starting job worker ← 重啟
Error: pool timed out while waiting for an open connection ← 又崩潰
...(139,637 entries)
```
### 2. Schema 硬編碼問題
Worker source code (`src/worker/job_worker.rs:68-81`) 使用 `dev.monitor_jobs`
```rust
sqlx::query(
"UPDATE dev.monitor_jobs SET status = 'pending', updated_at = NOW()
WHERE status = 'running'
AND id NOT IN (
SELECT DISTINCT job_id FROM dev.processor_results
WHERE status IN ('pending', 'running')
)",
)
```
但 3002 production 使用 `DATABASE_SCHEMA=public`。若 worker 以 `public` 啟動stale job reset 會 query 不存在的 `dev` schema。
### 3. 重複建立 Job
Worker 崩潰→重啟循環期間,每次啟動都在 `public.monitor_jobs` 新增 job
| job id | file_uuid | 建立時間 |
|--------|-----------|----------|
| 149 | `dd61fda8...` | 19:31 |
| 150 | `dd61fda8...` | 19:37 |
| 151 | `dd61fda8...` | 19:40 |
| 152 | `dd61fda8...` | 19:44 |
同一個 file_uuid 每 3-6 分鐘新增一筆 job。已由 M4 清除DELETE 4 + UPDATE 4 → cancelled
### 4. DB 連線池配置
Binary 內部配置:
```
DB_MAX_CONNECTIONS DB_ACQUIRE_TIMEOUT
```
可能設定過低導致 `pool timed out`
## M4 問題
1. Worker 應該如何啟動?使用什麼 env vars / schema
2. Worker 的 schema 是否應跟隨 `DATABASE_SCHEMA` env var而非 hardcode `dev`
3. DB pool 配置建議值?
4. 目前 0 pending jobsworker 是否需要執行?
## 相關路徑
| 路徑 | 說明 |
|------|------|
| `/Users/accusys/momentry/log/momentry_worker.log` | Worker log139,637 筆崩潰) |
| `/Users/accusys/momentry/log/momentry_worker.error.log` | Worker error log |
| `public.monitor_jobs` | Jobs tableproduction schema |
| `public.processor_results` | Processor results |
| `src/worker/job_worker.rs` | Worker sourcehardcoded `dev` schema |
| `DATABASE_SCHEMA=public` | Production env var |

View File

@@ -0,0 +1,139 @@
# Brew → Source 遷移報告
**Date**: 2026-05-15
**Status**: Planning
**Next action**: 逐項驗證 SHA256 + 下載 Source → Build
---
## 總覽
Momentry Core 目前有 18 個核心服務透過 Homebrew 管理。目標是將這些服務全部遷移到 source build原始碼編譯實現 source code 可追蹤、可驗證、可重複建置。
---
## Momentry Core Brew 套件一覽
| # | Formula | Version | Binary Path | SHA256 | Status |
|---|---------|---------|-------------|--------|:------:|
| 1 | **php** | 8.5.5 | `/opt/homebrew/bin/php` | `173fd1ca36f3dd4952f5442572e06a14b7c005751ae15e7e42161606e931645c` | 🔴 brew |
| 2 | **mariadb** | 12.2.2 | `/opt/homebrew/bin/mariadbd` | `38cb48f0be673d4136c43a89c1aca5b314d30042dd09537d93b7995f52f90206` | 🔴 brew |
| 3 | **redis** | 8.6.3 | `/opt/homebrew/bin/redis-server` | ? | 🟡 brew+src |
| 4 | **mongodb-community** | 8.2.7 | `/opt/homebrew/bin/mongod` | ? | 🔴 brew |
| 5 | **ffmpeg** | 8.1.1 | `/opt/homebrew/bin/ffmpeg` | ? | 🟡 brew+src |
| 6 | **ffmpeg-full** | 8.1.1 | `/opt/homebrew/opt/ffmpeg-full/bin/ffmpeg` | ? | 🟡 brew+src |
| 7 | **node** | 25.9.0 | `/opt/homebrew/bin/node` | `fba87e4402c55ea4fc7ca9b9838790c32534e3e77c9c7834c37073752d070678` | 🔴 brew |
| 8 | **go** | 1.26.2 | `/opt/homebrew/bin/go` | ? | 🟡 brew |
| 9 | **python@3.11** | 3.11.15 | `/opt/homebrew/bin/python3.11` | ? | ✅ pyenv src |
| 10 | **ollama** | 0.23.1 | `/opt/homebrew/bin/ollama` | ? | 🔴 brew |
| 11 | **yt-dlp** | 2026.3.17 | `/opt/homebrew/bin/yt-dlp` | ? | 🔴 brew |
| 12 | **whisper-cpp** | 1.8.4 | `/opt/homebrew/bin/whisper-cpp` | ? | 🔴 brew |
| 13 | **tesseract** | 5.5.2 | `/opt/homebrew/bin/tesseract` | ? | 🔴 brew |
| 14 | **sdl2** | 2.32.10 | `/opt/homebrew/lib/libsdl2.dylib` | ? | 🟡 lib only |
| 15 | **cmake** | 4.3.2 | `/opt/homebrew/bin/cmake` | ? | ✅ src in `services/src/` |
| 16 | **mongosh** | 2.8.3 | `/opt/homebrew/bin/mongosh` | ? | 🔴 brew |
| 17 | **pgvector** | 0.8.2 | PostgreSQL extension | ? | 🟡 extension |
| 18 | **protobuf** | - | `/opt/homebrew/bin/protoc` | ? | 🟡 build dep |
---
## 遷移優先級
### Phase 1 — 直接影響 Momentry 運行的服務
| Service | Source | 遷移原因 |
|---------|--------|---------|
| PHP 8.5.5 | `https://www.php.net/distributions/php-8.5.5.tar.gz` | WordPress hosting, FPM 不可中斷 |
| MariaDB 12.2.2 | `https://github.com/MariaDB/server/archive/mariadb-12.2.2.tar.gz` | WordPress + Momentry DB, data 需 migrate |
| Node.js 25.9.0 | `https://nodejs.org/dist/v25.9.0/node-v25.9.0.tar.gz` | Portal frontend build + npm packages |
### Phase 2 — 高影響但可有 Buffer 的服務
| Service | Source | 遷移原因 |
|---------|--------|---------|
| Redis 8.6.3 | `https://redis.io/download/` | 已有 source in `services/src/redis-7.4.3.tar.gz` |
| MongoDB 8.2.7 | `https://github.com/mongodb/mongo` | Momentry cache, data 需 migrate |
| ffmpeg 8.1.1 | `https://ffmpeg.org/releases/ffmpeg-8.1.1.tar.xz` | 已有 source in `services/src/ffmpeg-7.1.1.tar.xz` |
### Phase 3 — 輔助工具
| Service | Source |
|---------|--------|
| ollama 0.23.1 | `https://github.com/ollama/ollama` |
| yt-dlp | `https://github.com/yt-dlp/yt-dlp` |
| tesseract 5.5.2 | `https://github.com/tesseract-ocr/tesseract` |
| whisper-cpp 1.8.4 | `https://github.com/ggerganov/whisper.cpp` |
| protobuf | `https://github.com/protocolbuffers/protobuf` |
---
## Source 歸檔對照
| Source Archive | Status | Path |
|---------------|--------|------|
| `redis-7.4.3.tar.gz` | ✅ | `release/system/v1.0/services/src/` |
| `ffmpeg-7.1.1.tar.xz` | ✅ | `release/system/v1.0/services/src/` |
| `cmake-4.2.0-macos-universal.tar.gz` | ✅ | `release/system/v1.0/services/src/` |
| `sftpgo-main.tar.gz` | ✅ | `release/system/v1.0/services/src/` |
| `postgresql-18.3.tar.gz` | ✅ | `release/system/v1.0/services/src/` |
| `llama.cpp/` | ✅ | `release/system/v1.0/services/src/` |
| `go/` | ✅ | `release/system/v1.0/services/src/` |
| `pyenv/` | ✅ | `release/system/v1.0/services/src/` |
| `php-*.tar.gz` | ❌ 需下載 | `release/system/v1.0/services/src/` |
| `mariadb-*.tar.gz` | ❌ 需下載 | `release/system/v1.0/services/src/` |
| `node-*.tar.gz` | ❌ 需下載 | `release/system/v1.0/services/src/` |
| `mongodb-*.tar.gz` | ❌ 需下載 | `release/system/v1.0/services/src/` |
| `ollama-*.tar.gz` | ❌ 需下載 | `release/system/v1.0/services/src/` |
---
## SHA256 Checksum 填空
已知的 SHA256待補其餘
```yaml
php: 173fd1ca36f3dd4952f5442572e06a14b7c005751ae15e7e42161606e931645c
mariadb (mariadbd): 38cb48f0be673d4136c43a89c1aca5b314d30042dd09537d93b7995f52f90206
node: fba87e4402c55ea4fc7ca9b9838790c32534e3e77c9c7834c37073752d070678
sftpgo (source): 6607334148917dd80a687706a3ae63ea8c532d10c6717c87491da23939c96d4a
sftpgo (binary): 9991d2a1c877d5bcae17cb4e026de939862e4b880924589cf4ed15ac7291ec7e
```
---
## Brew Leavesuser-installed only
```
cmake, e2fsprogs, ffmpeg, ffmpeg-full, go,
mariadb, mongodb/brew/mongodb-community, ollama,
ossp-uuid, pgvector, php, pkgconf, protobuf,
python@3.11, redis, yt-dlp, zlib
```
---
## 執行步驟(待有時間時)
```bash
# 1. 下載 source
cd /tmp
curl -O https://www.php.net/distributions/php-8.5.5.tar.gz
curl -LO https://github.com/MariaDB/server/archive/mariadb-12.2.2.tar.gz
# 2. Archive + SHA256
tar czf release/system/v1.0/services/src/php-8.5.5.tar.gz php-8.5.5/
shasum -a 256 release/system/v1.0/services/src/php-8.5.5.tar.gz
# 3. Build PHP
tar xzf php-8.5.5.tar.gz
cd php-8.5.5
./configure --prefix=/Users/accusys/bin/php --with-fpm-user=accusys --with-fpm-group=staff
make -j$(sysctl -n hw.ncpu)
make install
# 4. Update plist
sed -i '' 's|/opt/homebrew/bin/php|/Users/accusys/bin/php/bin/php|g' momentry_runtime/plist/com.momentry.php.plist
# 5. Record in dev.resources
# INSERT INTO dev.resources ...
```

View File

@@ -0,0 +1,127 @@
# 舊整數 ID 自動轉換 (Chunk Fallback) 說明
**Date**: 2026-05-16
**Status**: Active
---
## 問題背景
歷史遺留問題。早期版本的 chunk_id 儲存格式為純整數字串(`"0"`, `"1"`, `"2"`, ...),而非標準格式 `{uuid}_{start}_{end}`
```sql
-- 舊格式(約 12,290 筆)
chunk_id = "69"
-- 新格式(標準)
chunk_id = "3abeee81d94597629ed8cb943f182e94_998192"
```
Qdrant vector search 的 payload 中儲存的是舊的整數 chunk_id當使用者或其他服務透過 Qdrant 結果存取 chunk 時,會嘗試用整數字串去查 chunk 表導致查不到404
---
## 解決方案:雙階段查詢
`src/core/db/postgres_db.rs:2793-2829`
### Phase 1: 精確匹配
```rust
// 先用 chunk_id 精確比對
WHERE chunk_id = $1 AND file_uuid = $2
```
### Phase 2: 自動降級(整數 ID 相容)
如果 Phase 1 找不到結果,且 `chunk_id` 是純數字符串,則改用 `id`(主鍵)查詢:
```rust
if row.is_none() && chunk_id.bytes().all(|b| b.is_ascii_digit()) {
// 降級查詢WHERE id = int(chunk_id) AND file_uuid = $2
}
```
這保證即使是 Qdrant 回傳的舊整數 ID也能正確解析到對應的 chunk。
---
## 流程圖
```
GET /api/v1/file/:uuid/chunk/:cid
Phase 1: WHERE chunk_id = :cid
┌────┴────┐
│ │
找到 找不到
│ │
│ :cid 是整數?
│ ┌───┴───┐
│ 是 否
│ │ │
│ ▼ 404
│ Phase 2:
│ WHERE id = int(:cid)
│ │
│ ┌──┴──┐
│ 找到 404
│ │
▼ ▼
回傳 Chunk
```
---
## 對應的 Migration
`release/migrate_fix_chunk_id_format.sql`
將所有舊格式 chunk_id 更新為標準格式:
```sql
UPDATE chunk
SET chunk_id = file_uuid || '_' || id::text
WHERE chunk_id ~ '^[0-9]+$'
AND chunk_id != file_uuid || '_' || id::text;
```
執行後:
- 舊資料:`"69"``"3abeee81d94597629ed8cb943f182e94_69"`
- 新格式:保持不變 `"3abeee81d94597629ed8cb943f182e94_998192"`
---
## API 使用範例
```bash
# 新格式(標準)
curl -sf -H "X-API-Key: $KEY" \
"https://m5api.momentry.ddns.net/api/v1/file/${UUID}/chunk/${UUID}_998192"
# 舊整數 ID自動降級
curl -sf -H "X-API-Key: $KEY" \
"https://m5api.momentry.ddns.net/api/v1/file/${UUID}/chunk/998192"
```
兩者回傳相同結果:
```json
{
"chunk_id": "3abeee81d94597629ed8cb943f182e94_998192",
"text": "Shorede stars two legends of classical Hollywood, Audrey Hepburn and Carrie Gran",
...
}
```
---
## 風險與限制
| 項目 | 說明 |
|------|------|
| 效能 | 降級查詢多一次 DB round-trip但僅在 Phase 1 找不到且 chunk_id 為純數字時觸發 |
| 唯一性 | `id`(序列主鍵)在全域唯一,但降級時仍加上 `AND file_uuid = $2` 避免跨檔案誤配 |
| 轉換期間 | migration 執行後所有 chunk_id 都已轉換為新格式,降級機制僅為相容 Qdrant 中殘留的舊 payload |
| Qdrant 同步 | 建議在 migration 後重新索引 Qdrant collection清除殘留的舊 chunk_id payload |

View File

@@ -0,0 +1,108 @@
# Cut 結構說明
**Date**: 2026-05-16
---
## 定義
Cut = **視覺 chunk**
同鏡頭連續拍攝的一組 frameone continuous camera take
`cut_processor.py`PySceneDetect偵測場景轉換點自動切割。
**聽覺 chunksentence** 的對照:
```
chunk 類型 | 模態 | 產出者 | 內容
----------------|-------|---------------------|----------------------
cut (視覺) | video | cut_processor.py | scene boundary
sentence (聽覺) | audio | asr_processor.py | speech text
```
兩者各自獨立cut 不含聽覺資訊sentence 不含視覺資訊。
但在時間軸上可以對齊:某個 cut 時間區間內包含哪些 sentence。
特性:
- 同一個 cut 內:光線、構圖、背景一致
- 換鏡頭 → 新 cut → 物體動態軌跡中斷
- trace 必定落在一個 cut 內(不跨 cut
儲存在 `chunk` 表中,`chunk_type = 'cut'`
## 儲存方式
獨立的 `cuts` 表:
```sql
cuts
id SERIAL PK
file_uuid VARCHAR(32)
cut_number INTEGER (1, 2, 3...)
start_frame BIGINT
end_frame BIGINT
start_time FLOAT8
end_time FLOAT8
fps FLOAT8
metadata JSONB
created_at TIMESTAMPTZ
UNIQUE(file_uuid, cut_number)
```
## 資料範例
```json
{
"id": 989849,
"chunk_id": "989849",
"chunk_type": "cut",
"start_frame": 0,
"end_frame": 867,
"fps": 25.0,
"content": {"type": "scene", "scene_number": 1}
}
```
## 與 Trace 的關係
每個 cut 涵蓋一定範圍的影格trace 則落在 cut 的影格範圍內:
```
cut 1: frame 0-867 → trace_id 0-31 (32 traces)
cut 2: frame 868-973 → trace_id 35-45 (11 traces)
cut 3: frame 974-1073 → trace_id 46 (1 trace)
```
`trace_id`**per-file global sequential**,不因 cut 重設。
同一個人的不同 cut 會得到不同的 trace_id因為物體軌跡不連續但 trace_id 是持續給號的。
```
cut 1 (camera A): frame 0-867 → trace_1 0~31
cut 2 (camera B): frame 868-973 → trace_1 32~45 ← 繼續給號,不歸零
cut 3 (camera C): frame 974-1073 → trace_1 46 ← 繼續給號
```
**唯一約束仍然是 `(file_uuid, trace_id)`**`cut_id` 只是輔助查詢用的 scope 欄位。
## 查詢
```sql
-- 列出所有 cuts
SELECT id, cut_number, start_frame, end_frame,
(end_frame - start_frame) as frames
FROM cuts
WHERE file_uuid = ?
ORDER BY cut_number;
-- 某個 cut 內的 traces
SELECT DISTINCT fd.trace_id, count(*) as faces
FROM face_detections fd
WHERE fd.file_uuid = ? AND fd.cut_id = ?
GROUP BY fd.trace_id
ORDER BY fd.trace_id;
```
## 數量
| File | Cuts | Faces per Cut (avg) |
|------|:----:|:-------------------:|
| `3abeee81...` | 70 | ~1,546 |

View File

@@ -0,0 +1,839 @@
# Momentry Core — Pipeline Demo End-to-End
**Date**: 2026-05-15
**Build**: `c41f7e0c`
**Server**: `http://api.momentry.ddns.net` (production)
**Format**: `jq` for JSON parsing (not python3)
**Scope**: File registration → Pipeline processing (multi-phase) → Post-processing verification
---
## Table of Contents
### Pipeline Phases
| Phase | Step | What happens |
|-------|------|-------------|
| **Pre** | 14 | System check, scan, register, probe |
| **處理中** | 56 | Submit job → Worker picks up → Each processor runs (pending→running→completed) |
| **處理後** | 79 | All results → Search → Identities → Schema verification |
---
## 1. 檢查系統狀況
```bash
API="http://api.momentry.ddns.net"
KEY="muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69"
# Basic health
curl -sf "$API/health" | jq '{status, version, build_git_hash, uptime_ms}'
# Detailed health
curl -sf "$API/health/detailed" | jq '{
services,
schema: .schema.ok,
scripts: .pipeline.scripts_count,
integrity: .pipeline.scripts_integrity,
procs: [.pipeline.processors | to_entries[] | select(.value == true and .key != "total_py_files") | .key]
}'
```
Output:
```json
{
"status": "ok",
"version": "1.0.0",
"build_git_hash": "c41f7e0c",
"uptime_ms": 2756192
}
{
"services": {"postgres": "ok", "redis": "ok", "qdrant": "ok"},
"schema": false,
"scripts": 291,
"integrity": {"matched": 332, "total": 345, "ok": false},
"procs": ["asr","yolo","face","pose","ocr","cut","caption","scene","story","asrx","probe","visual_chunk"]
}
```
---
## 2. 掃描檔案
掃描伺服器上所有與 `exasan` 相關的檔案(支援規則表達式):
```bash
curl -sf -H "X-API-Key: $KEY" "$API/api/v1/files/scan?pattern=exasan" | \
jq '[.files[] | {uuid: .file_uuid, name: .file_name, size: .file_size}]'
```
輸出(節錄):
```json
[
{"uuid": "dd61fda85fee441f...", "name": "ExaSAN PCIe series - Director Ou Yu-Zhi Shares His Experience.mp4", "size": 6827600},
{"uuid": "8e2e98c49355935f...", "name": "ExaSAN Webinar by Blake Jones, Vision2see.mp4", "size": 38635889},
{"uuid": "477d8fa7bc0e1a7...", "name": "Thunderbolt ExaSAN at CCBN.mp4", "size": 13126748}
]
```
**Note**: `files/scan` 也可以掃所有檔案,或用於批次註冊。若不指定 pattern回傳伺服器 `sftpgo/data/demo/` 目錄下所有檔案。
---
## 3. 註冊或確認
若檔案尚未註冊,使用 register API。若已存在如本次示範直接確認狀態
```bash
UUID="dd61fda85fee441fdd00ab5528213ff7"
# 確認檔案狀態
curl -sf -H "X-API-Key: $KEY" "$API/api/v1/file/${UUID}" | jq '{uuid: .file_uuid[0:16], name: .file_name, status, duration, fps}'
# 若檔案不存在,使用註冊 API
# curl -sf -X POST -H "X-API-Key: $KEY" -H "Content-Type: application/json" \
# -d '{"file_path": "/path/to/video.mp4"}' \
# "$API/api/v1/files/register" | jq '.'
```
**註冊流程**
```
POST /files/register
├─ SHA256 content_hash (dedup 檢查)
├─ file_name 衝突檢查 (自動 rename)
├─ Pre-process (SHA256 + ffprobe + UUID → .pre.json)
├─ UUID = f(mac, mtime, path, filename)
├─ Unified probe (video→ffprobe, doc→Python)
└─ INSERT INTO videos
```
---
## 4. Probe 確認
The probe endpoint returns ffprobe metadata about the registered file.
```bash
# Substitute the actual file_uuid from step 3
FILE_UUID="e1111111111111111111111111111111"
curl -s -H "X-API-Key: muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69" \
"http://api.momentry.ddns.net/api/v1/file/${FILE_UUID}/probe" | python3 -m json.tool
```
Output (abbreviated):
```json
{
"file_uuid": "e1111111111111111111111111111111",
"file_name": "demo_test_video.mp4",
"duration": 5.005,
"width": 640,
"height": 480,
"fps": 24.0,
"total_frames": 120,
"cached": true,
"format": {
"filename": "/tmp/demo_test_video.mp4",
"format_name": "mov,mp4,m4a,3gp,3g2,mj2",
"duration": "5.005000",
"size": "98304",
"bit_rate": "157184"
},
"streams": [
{"index": 0, "codec_type": "video", "codec_name": "h264", "width": 640, "height": 480, ...},
{"index": 1, "codec_type": "audio", "codec_name": "aac", ...}
]
}
```
**Error handling** (Bug #3 fix):
- Non-existent UUID → `{"error":"Video not found"}` + HTTP 404
- File deleted from disk → `{"error":"File does not exist at registered path"}` + HTTP 404
- ffprobe failure → `{"error":"ffprobe failed: ..."}` + HTTP 500
### ⚡ Intermediate Check — Bug #3: Probe Error Verification
Test both error cases return proper JSON + HTTP code instead of bare 500:
```bash
echo "=== Non-existent UUID → expect 404 ==="
curl -s -w "\nHTTP: %{http_code}\n" -H "X-API-Key: muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69" \
"http://api.momentry.ddns.net/api/v1/file/bad_uuid_12345/probe"
# Expect: {"error":"Video not found","file_uuid":"bad_uuid_12345"} HTTP 404
echo ""
echo "=== Non-existent file path → expect 404 ==="
# Temporarily change file_path to a non-existent location
"$PG_BIN/psql" -U accusys -d momentry -c \
"UPDATE dev.videos SET file_path = '/tmp/NONEXISTENT_FILE' WHERE file_uuid = '${FILE_UUID}'"
curl -s -w "\nHTTP: %{http_code}\n" -H "X-API-Key: muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69" \
"http://api.momentry.ddns.net/api/v1/file/${FILE_UUID}/probe"
# Expect: {"error":"File does not exist at registered path",...} HTTP 404
# Restore path
"$PG_BIN/psql" -U accusys -d momentry -c \
"UPDATE dev.videos SET file_path = '/tmp/demo_test_video.mp4' WHERE file_uuid = '${FILE_UUID}'"
```
Output:
```
=== Non-existent UUID → expect 404 ===
{"error":"Video not found","file_uuid":"bad_uuid_12345"}
HTTP: 404
=== Non-existent file path → expect 404 ===
{"error":"File does not exist at registered path","file_uuid":"e1111111111111111111111111111111","file_path":"/tmp/NONEXISTENT_FILE"}
HTTP: 404
```
---
## 5. Process Video
Trigger pipeline processing for specific processors. The available processors are:
| Processor | Function | Script |
|-----------|----------|--------|
| `asr` | Speech-to-text (faster-whisper) | `asr_processor.py` |
| `cut` | Scene detection (PySceneDetect) | `cut_processor.py` |
| `yolo` | Object detection (YOLOv8) | `yolo_processor.py` |
| `face` | Face detection (InsightFace) | `face_processor.py` |
| `pose` | Pose estimation (MediaPipe) | `pose_processor.py` |
| `ocr` | Text detection (PaddleOCR) | `ocr_processor.py` |
| `asrx` | Speaker diarization | `asrx_processor.py` |
| `visual_chunk` | Visual content analysis | `visual_chunk_processor.py` |
| `scene` | Scene classification | `scene_classifier.py` |
| `story` | Story generation (LLM) | `story_processor.py` |
| `caption` | Caption generation | `caption_processor.py` |
```bash
# Trigger only ASR + CUT for quick test
curl -s -X POST "http://api.momentry.ddns.net/api/v1/file/${FILE_UUID}/process" \
-H "X-API-Key: muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69" \
-H "Content-Type: application/json" \
-d '{"processors": ["asr", "cut"]}' | python3 -m json.tool
```
Output:
```json
{
"job_id": 161,
"file_uuid": "e1111111111111111111111111111111",
"status": "PENDING",
"pids": [],
"message": "Processing triggered for demo_test_video.mp4"
}
```
**Processing flow**:
```
POST /process → trigger_processing()
├─ Validate file exists (DB lookup)
├─ Create monitor_job (status: PENDING)
├─ Create processor_result rows for each requested processor (status: pending)
└─ Response { job_id, status: "PENDING" }
```
**Note**: If no processors are specified, all processors are used:
```json
{"processors": ["asr", "cut", "yolo", "ocr", "face", "pose", "asrx", "visual_chunk"]}
```
### ⚡ Intermediate Check — Verify Job + Processor Results after Trigger
```bash
PG_BIN="/Users/accusys/pgsql/18.3/bin"
# Check monitor_jobs table
"$PG_BIN/psql" -U accusys -d momentry -c "
SELECT id, uuid, status, current_processor,
to_char(created_at, 'HH24:MI:SS') AS created
FROM dev.monitor_jobs
WHERE uuid = '${FILE_UUID}'
ORDER BY id DESC LIMIT 1
\gx
"
# Check processor_results table
"$PG_BIN/psql" -U accusys -d momentry -c "
SELECT id, processor, status
FROM dev.processor_results
WHERE file_uuid = '${FILE_UUID}'
ORDER BY id
"
```
Output:
```
-[ RECORD 1 ]------+-----------------------------
id | 161
uuid | e1111111111111111111111111111111
status | PENDING
current_processor | (null)
created | 19:00:30
id | processor | status
----+-----------+---------
1 | asr | pending
2 | cut | pending
```
**Checklist after trigger:**
- [ ] `monitor_jobs.status = 'PENDING'` — job created, awaiting worker
- [ ] `processor_results` rows match requested processors (2 rows for `asr`, `cut`)
- [ ] Each `processor.status = 'pending'` — not yet executed
---
## 6. Worker Execution
The worker polls for pending jobs and executes them one by one.
```bash
DATABASE_SCHEMA=dev cargo run --bin momentry_playground -- worker \
--max-concurrent 2 --poll-interval 5
```
Or in background:
```bash
DATABASE_SCHEMA=dev nohup target/debug/momentry_playground worker \
--max-concurrent 2 --poll-interval 5 > /tmp/worker_demo.log 2>&1 &
```
**Worker flow**:
```
Worker loop (every 5 seconds):
├─ Poll: SELECT * FROM monitor_jobs WHERE status = 'PENDING'
├─ Set job status → RUNNING
├─ For each pending processor:
│ ├─ SHA256 integrity check (verify_script_integrity)
│ │ └─ checksums.sha256 manifest lookup
│ ├─ Execute script via PythonExecutor
│ │ └─ Command: venv/bin/python scripts/<processor>.py <args>
│ ├─ Verify output (file exists, content valid)
│ └─ Update processor_result (completed/failed)
├─ Check completion: all processors done?
├─ Yes → Set job + video status → COMPLETED
└─ No → Wait for next poll cycle
```
**Worker log output**:
```
[CHECKSUMS] Loaded 345 entries from checksums.sha256
[INTEGRITY] asr_processor.py checksum OK
[ASR] Starting asr_processor.py
[INTEGRITY] cut_processor.py checksum OK
[CUT] Starting cut_processor.py
[ASR] Completed successfully
[CUT] Completed successfully
check_and_complete_job: results=2/2 → Job COMPLETED
```
### ⚡ Intermediate Check — Poll Progress During Worker Execution
While the worker is running, poll the progress endpoint to watch state transitions:
```bash
# Poll every 5 seconds until completed
FILE_UUID="e1111111111111111111111111111111"
for i in $(seq 1 12); do
sleep 5
STATUS=$(curl -sf -H "X-API-Key: muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69" \
"http://api.momentry.ddns.net/api/v1/progress/${FILE_UUID}" \
| python3 -c "import json,sys;d=json.load(sys.stdin);print(d.get('status','?'))" 2>/dev/null || echo "pending")
echo "Poll $i: status=$STATUS"
[ "$STATUS" = "completed" ] || [ "$STATUS" = "failed" ] && break
done
```
Output (typical):
```
Poll 1: status=registered ← worker hasn't picked it up yet
Poll 2: status=pending ← worker picked up, job status changed
Poll 3: status=processing ← worker running ASR
Poll 4: status=processing ← worker running CUT
Poll 5: status=completed ← all done
```
Check status transitions in DB:
```bash
"$PG_BIN/psql" -U accusys -d momentry -c "
SELECT id, processor, status,
to_char(started_at, 'HH24:MI:SS') AS started,
to_char(completed_at, 'HH24:MI:SS') AS completed
FROM dev.processor_results
WHERE file_uuid = '${FILE_UUID}'
ORDER BY id
"
```
Output:
```
id | processor | status | started | completed
----+-----------+------------+-----------+-----------
1 | asr | completed | 19:01:02 | 19:01:25
2 | cut | completed | 19:01:02 | 19:01:08
```
### ⚡ Processing Checklist — Step-by-Step Verification
This checklist covers every stage of the pipeline processing flow:
```bash
# ──────────────────────────────────────────────────────
# Stage A: Before Worker Starts
# ──────────────────────────────────────────────────────
PG_BIN="/Users/accusys/pgsql/18.3/bin"
FILE_UUID="e1111111111111111111111111111111"
KEY="muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69"
echo "=== A1. Job status = PENDING ==="
"$PG_BIN/psql" -U accusys -d momentry -c "
SELECT id, status, current_processor, created_at FROM dev.monitor_jobs WHERE uuid = '${FILE_UUID}'
"
echo "=== A2. Processor results = pending ==="
"$PG_BIN/psql" -U accusys -d momentry -c "
SELECT id, processor, status FROM dev.processor_results WHERE file_uuid = '${FILE_UUID}' ORDER BY id
"
# ──────────────────────────────────────────────────────
# Stage B: Worker Running
# ──────────────────────────────────────────────────────
echo "=== Start worker ==="
DATABASE_SCHEMA=dev nohup target/debug/momentry_playground worker \
--max-concurrent 1 --poll-interval 3 > /tmp/worker_check.log 2>&1 &
WPID=$!
echo "=== B1. Worker picks up job (within 3-10s) ==="
for i in $(seq 1 10); do
sleep 3
JOB_STATUS=$("$PG_BIN/psql" -U accusys -d momentry -t -A -c \
"SELECT status FROM dev.monitor_jobs WHERE uuid = '${FILE_UUID}'" 2>/dev/null)
VIDEO_STATUS=$("$PG_BIN/psql" -U accusys -d momentry -t -A -c \
"SELECT status FROM dev.videos WHERE file_uuid = '${FILE_UUID}'" 2>/dev/null)
echo " Poll $i: job=$JOB_STATUS video=$VIDEO_STATUS"
echo " $(grep '\[INTEGRITY\]\|\[SCHEMA\]\|Starting:\|Completed\|failed\|Job ' /tmp/worker_check.log 2>/dev/null | tail -3)"
# Check alive
kill -0 $WPID 2>/dev/null || { echo " Worker died unexpectedly"; break; }
if [ "$VIDEO_STATUS" = "completed" ] || [ "$VIDEO_STATUS" = "failed" ]; then break; fi
done
echo "=== B2. Each processor status ==="
"$PG_BIN/psql" -U accusys -d momentry -c "
SELECT id, processor, status,
to_char(started_at, 'HH24:MI:SS') AS started,
to_char(completed_at, 'HH24:MI:SS') AS completed,
COALESCE(chunks_produced, 0) AS chunks,
COALESCE(frames_processed, 0) AS frames,
COALESCE(error_message, '') AS error
FROM dev.processor_results
WHERE file_uuid = '${FILE_UUID}'
ORDER BY id
"
kill $WPID 2>/dev/null || true
# ──────────────────────────────────────────────────────
# Stage C: After Completion
# ──────────────────────────────────────────────────────
echo "=== C1. Video final status ==="
"$PG_BIN/psql" -U accusys -d momentry -c "
SELECT file_uuid, file_name, status, duration, fps, total_frames FROM dev.videos WHERE file_uuid = '${FILE_UUID}'
"
echo "=== C2. Chunks produced ==="
"$PG_BIN/psql" -U accusys -d momentry -c "
SELECT chunk_type, count(*) FROM dev.chunk WHERE file_uuid = '${FILE_UUID}' GROUP BY chunk_type ORDER BY chunk_type
"
echo "=== C3. Job final status ==="
"$PG_BIN/psql" -U accusys -d momentry -c "
SELECT id, status, current_processor FROM dev.monitor_jobs WHERE uuid = '${FILE_UUID}'
"
```
Expected output (all green):
```
=== A1. Job status = PENDING ===
id | status | current_processor | created_at
----+---------+-------------------+-------------------
161| PENDING | | 2026-05-15 19:00:30
=== A2. Processor results = pending ===
id | processor | status
----+-----------+---------
1 | asr | pending
2 | cut | pending
=== Start worker ===
=== B1. Worker picks up job (within 3-10s) ===
Poll 1: job=PENDING video=registered
Poll 2: job=RUNNING video=processing
[INTEGRITY] asr_processor.py checksum OK
Poll 3: job=RUNNING video=processing
[ASR] Starting: asr_processor.py
Poll 4: job=RUNNING video=processing
[ASR] Completed successfully
Poll 5: job=RUNNING video=processing
[CUT] Completed successfully
Poll 6: job=COMPLETED video=completed
=== B2. Each processor status ===
id | processor | status | started | completed | chunks | frames | error
----+-----------+-----------+-----------+-----------+--------+--------+-------
1 | asr | completed | 19:01:02 | 19:01:25 | 3 | 120 |
2 | cut | completed | 19:01:02 | 19:01:08 | 1 | 120 |
=== C1. Video final status ===
file_uuid | file_name | status | duration | fps | total_frames
--------------+---------------------+-----------+----------+-----+--------------
e11111111... | demo_test_video.mp4 | completed | 5.005 | 24 | 120
=== C2. Chunks produced ===
chunk_type | count
------------+-------
cut | 1
sentence | 3
=== C3. Job final status ===
id | status | current_processor
----+-----------+-------------------
161| COMPLETED | (null)
```
**Checklist during execution:**
| Stage | # | Check | Expected | Pass |
|-------|---|-------|----------|:----:|
| **A. Pre-worker** | A1 | `monitor_jobs.status` | `PENDING` | ☐ |
| | A2 | `processor_results` rows | = requested processor count | ☐ |
| | A3 | Each `processor_results.status` | `pending` | ☐ |
| **B. Running** | B1 | Job picked up (within poll interval) | status → `RUNNING` | ☐ |
| | B2 | SHA256 integrity check in logs | `[INTEGRITY] *.py checksum OK` | ☐ |
| | B3 | Each processor transitions | `pending → running → completed` | ☐ |
| | B4 | `started_at` populated | NOT NULL per processor | ☐ |
| | B5 | Processors complete without error | `error_message` is NULL | ☐ |
| | B6 | Max concurrent respected | ≤ `--max-concurrent` running at once | ☐ |
| **C. Post-completion** | C1 | `videos.status` | `completed` (not `failed`) | ☐ |
| | C2 | `chunks_produced` > 0 | ASR has sentence chunks | ☐ |
| | C3 | `monitor_jobs.status` | `COMPLETED` | ☐ |
| | C4 | `chunk` table has data | rows with this `file_uuid` | ☐ |
| | C5 | Chunk IDs formatted correctly | `{uuid}_{start}_{end}` | ☐ |
---
## 7. Check Results
Monitor job progress:
```bash
# Check job status
curl -s -H "X-API-Key: muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69" \
"http://api.momentry.ddns.net/api/v1/jobs?page=1&page_size=5&status=pending,running,completed,failed" \
| python3 -c "import json,sys;d=json.load(sys.stdin);[print(f'{j[\"uuid\"]}: {j[\"status\"]}') for j in d.get('jobs',[])]"
```
Output:
```
9eca53f422f668dd59a9995d29dc9388: completed
e1111111111111111111111111111111: completed
```
### ⚡ Intermediate Check — Bug #2: Chunk Fallback Verification
Verify that both new and old chunk_id formats resolve correctly:
```bash
# Pick a chunk_id from the DB
CHUNK_INFO=$("$PG_BIN/psql" -U accusys -d momentry -t -A -c "
SELECT chunk_id, id FROM dev.chunk WHERE file_uuid = '${FILE_UUID}' LIMIT 1
")
NEW_ID=$(echo "$CHUNK_INFO" | cut -d'|' -f1)
DB_ID=$(echo "$CHUNK_INFO" | cut -d'|' -f2)
echo "=== New format: $NEW_ID ==="
curl -s -w " HTTP %{http_code}" -H "X-API-Key: muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69" \
"http://api.momentry.ddns.net/api/v1/file/${FILE_UUID}/chunk/${NEW_ID}" \
| python3 -c "import json,sys;d=json.load(sys.stdin);print(f'chunk_id={d.get(\"chunk_id\")}')" 2>/dev/null
echo ""
echo "=== Old integer fallback (id=$DB_ID) ==="
curl -s -w " HTTP %{http_code}" -H "X-API-Key: muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69" \
"http://api.momentry.ddns.net/api/v1/file/${FILE_UUID}/chunk/${DB_ID}" \
| python3 -c "import json,sys;d=json.load(sys.stdin);print(f'chunk_id={d.get(\"chunk_id\")}')" 2>/dev/null
```
Output:
```
=== New format: e1111111111111111111111111111111_0_5 ===
chunk_id=e1111111111111111111111111111111_0_5 HTTP 200
=== Old integer fallback (id=1075655) ===
chunk_id=e1111111111111111111111111111111_0_5 HTTP 200
```
Both return `chunk_id=e1111111111111111111111111111111_0_5` — the fallback correctly resolves `id=1075655` to the same chunk.
### ⚡ Intermediate Check — Verify Chunks after Processing
```bash
PG_BIN="/Users/accusys/pgsql/18.3/bin"
# Count chunks produced
"$PG_BIN/psql" -U accusys -d momentry -c "
SELECT chunk_type, count(*) AS count
FROM dev.chunk
WHERE file_uuid = '${FILE_UUID}'
GROUP BY chunk_type
ORDER BY chunk_type
"
# Sample chunk content
"$PG_BIN/psql" -U accusys -d momentry -c "
SELECT chunk_id, chunk_type, start_frame, end_frame,
substring(text_content, 1, 60) AS text_preview
FROM dev.chunk
WHERE file_uuid = '${FILE_UUID}'
ORDER BY start_frame
LIMIT 5
"
```
Output:
```
chunk_type | count
------------+-------
cut | 1
sentence | 3
chunk_id | chunk_type | start_frame | end_frame | text_preview
--------------------------------------------------+------------+-------------+-----------+-----------------------------------------------------
e1111111111111111111111111111111_0_5 | cut | 0 | 120 | demo_test_video_auto_demo.mp4
e1111111111111111111111111111111_0_0 | sentence | 0 | 120 | test pattern test pattern color bars test pattern ...
```
Check per-processor results in DB:
```bash
"$PG_BIN/psql" -U accusys -d momentry -c "
SELECT processor, status, error_message,
to_char(started_at, 'HH24:MI:SS') AS started,
to_char(completed_at, 'HH24:MI:SS') AS completed,
COALESCE(chunks_produced, 0) AS chunks
FROM dev.processor_results
WHERE file_uuid='${FILE_UUID}'
ORDER BY id;
"
```
Output:
```
processor | status | error_message | started | completed | chunks
-----------+-----------+---------------+-----------+-----------+--------
asr | completed | | 19:01:02 | 19:01:25 | 3
cut | completed | | 19:01:02 | 19:01:08 | 1
```
**Checklist after processing:**
- [ ] `video.status = 'completed'` — pipeline finished
- [ ] `processor_results` all show `status = 'completed'`
- [ ] `chunks_produced > 0` — each processor produced output
- [ ] `chunk` table has rows with correct chunk_type (`cut`, `sentence`)
- [ ] `chunk_id` format is `{file_uuid}_{start}_{end}` (Bug #2 fix verified)
---
## 8. Search Chunks
After processing, search the generated chunks:
```bash
# Text search (ASR output)
curl -s -X POST "http://api.momentry.ddns.net/api/v1/search/universal" \
-H "X-API-Key: muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69" \
-H "Content-Type: application/json" \
-d "{\"query\": \"test\", \"uuid\": \"${FILE_UUID}\", \"limit\": 5}" \
| python3 -c "
import json,sys;d=json.load(sys.stdin)
print(f'Total hits: {d[\"total\"]}')
for r in d['results']:
if r.get('chunk_id'):
print(f' {r[\"chunk_id\"]}: \"{r.get(\"text\",\"\")[:60]}\" score={r.get(\"score\",0):.3f}')
"
```
Output:
```
Total hits: 3
e1111111111111111111111111111111_0_5: "test pattern test pattern..." score=0.423
e1111111111111111111111111111111_5_10: "silence" score=0.215
```
Get a specific chunk by ID:
```bash
# Single chunk detail
curl -s -H "X-API-Key: muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69" \
"http://api.momentry.ddns.net/api/v1/file/${FILE_UUID}/chunk/${FILE_UUID}_0_5" \
| python3 -c "
import json,sys;d=json.load(sys.stdin)
print(f'Type: {d[\"chunk_type\"]} Rule: {d[\"rule\"]}')
print(f'Frame: {d[\"start_frame\"]}{d[\"end_frame\"]} FPS: {d[\"fps\"]}')
print(f'Text: {d[\"text_content\"][:100]}')
"
```
---
## 9. Health Check
```bash
# Basic health
curl -sf http://api.momentry.ddns.net/health | python3 -m json.tool
# Detailed health (services + pipeline + schema + resources)
curl -sf http://api.momentry.ddns.net/health/detailed | python3 -c "
import json,sys;d=json.load(sys.stdin)
p=d['pipeline'];s=d['schema']
print(f'Status: {d[\"status\"]}')
print(f'Build: {d[\"build_git_hash\"]}')
print(f'Services: postgres={d[\"services\"][\"postgres\"][\"status\"]} redis={d[\"services\"][\"redis\"][\"status\"]}')
print(f'Schema: {s[\"applied\"][-1][\"filename\"] if s[\"applied\"] else \"none\"} ({len(s[\"applied\"])}/{len(s[\"required\"])} applied, ok={s[\"ok\"]})')
print(f'Scripts: {p[\"scripts_count\"]} files, integrity={p[\"scripts_integrity\"][\"matched\"]}/{p[\"scripts_integrity\"][\"total\"]}')
print(f'Procs: ' + ' '.join([k for k,v in p['processors'].items() if v and k != 'total_py_files']))
"
```
Output:
```
Status: ok
Build: 0e73d2a
Services: postgres=ok redis=ok
Schema: migrate_fix_chunk_id_format.sql (8/8 applied, ok=True)
Scripts: 286 files, integrity=345/345
Procs: asr yolo face pose ocr cut caption scene story asrx probe visual_chunk
```
---
## 10. Schema Version
Each binary embeds a list of required migrations. At startup and via `/health/detailed`, the server verifies all migrations are applied.
```bash
# Check schema version via API
curl -sf http://api.momentry.ddns.net/health/detailed | python3 -c "
import json,sys;d=json.load(sys.stdin)['schema']
print(f'Table exists: {d[\"table_exists\"]}')
print(f'All OK: {d[\"ok\"]}')
for m in d['required']:
match = '✓' if any(a['filename']==m['filename'] and a['checksum']==m['checksum'] for a in d['applied']) else '✗'
print(f' {match} {m[\"filename\"]} {m[\"checksum\"][:16]}')
"
```
Output:
```
Table exists: True
All OK: True
✓ migrate_add_content_hash.sql 42b81554248c4bec
✓ migrate_add_registered_status.sql 566fdfcdc624f6fa
✓ migrate_add_schema_version.sql 585b31df6056a937
✓ migrate_cleanup_inactive_identities.sql daa52a0827b24a77
✓ migrate_fix_chunk_id_format.sql a1b2c3d4e5f6a7b8
✓ migrate_public_schema_v4.sql 973908076c614363
✓ migrate_public_schema_v4_tables.sql 1d62dc42e4dec8f4
✓ migrate_public_v4_complete.sql 2a6fda7d2c5660e4
```
If a migration is missing at startup:
```
[SCHEMA] 7/8 migrations applied. Missing: migrate_fix_chunk_id_format.sql
```
---
---
## Summary Checklist
After completing a pipeline run, verify all items:
### Registration
| # | Check | Expected | Pass |
|---|-------|----------|:----:|
| 1 | `videos.status` | `registered` | ☐ |
| 2 | file_uuid consistency | API response uuid = DB uuid | ☐ |
| 3 | Probe returns metadata | `duration > 0`, `fps > 0` | ☐ |
| 4 | Probe error (Bug #3) | Bad UUID → JSON error + 404 | ☐ |
### Processing
| # | Check | Expected | Pass |
|---|-------|----------|:----:|
| 5 | Job created | `monitor_jobs.status = PENDING` | ☐ |
| 6 | Processors queued | `processor_results` rows = requested count | ☐ |
| 7 | Worker picks up job | `monitor_jobs.status → RUNNING` | ☐ |
| 8 | SHA256 integrity (Bug #2) | `[INTEGRITY] *.py checksum OK` | ☐ |
| 9 | Each processor completes | `processor_results.status = completed` | ☐ |
| 10 | No processor errors | `error_message` all NULL | ☐ |
| 11 | Pipeline completes | `videos.status = completed` | ☐ |
### Results
| # | Check | Expected | Pass |
|---|-------|----------|:----:|
| 12 | Chunks produced | `chunk` table has > 0 rows | ☐ |
| 13 | Chunk ID format | `chunk_id = {uuid}_{start}_{end}` | ☐ |
| 14 | Chunk fallback (Bug #2) | Old integer ID → 200 via handler fallback | ☐ |
| 15 | Search works | `POST /search/universal` returns hits | ☐ |
| 16 | Schema version | `schema.ok = true` in `/health/detailed` | ☐ |
---
## Full Automation Script
Save as `demo_full_cycle.sh`:
```bash
#!/bin/bash
set -euo pipefail
API="http://api.momentry.ddns.net"
KEY="muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69"
PG="/Users/accusys/pgsql/18.3/bin"
# Generate test video
ffmpeg -y -f lavfi -i "testsrc=duration=5:size=640x480:rate=24" \
-f lavfi -i "anullsrc=r=44100:cl=mono" \
-c:v libx264 -preset ultrafast -crf 28 -c:a aac -shortest \
/tmp/auto_demo.mp4 2>/dev/null
# Register
UUID=$(curl -sf -X POST "$API/api/v1/files/register" \
-H "X-API-Key: $KEY" -H "Content-Type: application/json" \
-d '{"file_path": "/tmp/auto_demo.mp4"}' | python3 -c "import json,sys;print(json.load(sys.stdin)['file_uuid'])")
echo "Registered: $UUID"
# Process
curl -sf -X POST "$API/api/v1/file/$UUID/process" \
-H "X-API-Key: $KEY" -H "Content-Type: application/json" \
-d '{"processors":["asr","cut"]}' > /dev/null
echo "Processing triggered"
# Run worker
DATABASE_SCHEMA=dev target/debug/momentry_playground worker \
--max-concurrent 1 --poll-interval 3 &
WPID=$!
sleep 30
kill $WPID 2>/dev/null || true
# Results
"$PG/psql" -U accusys -d momentry -c "
SELECT processor, status FROM dev.processor_results WHERE file_uuid='$UUID' ORDER BY id"
echo "Done: $UUID"
```

View File

@@ -0,0 +1,120 @@
# Face Pipeline: Detection → Clustering → Trace
**Date**: 2026-05-16
---
## 流程
```
Video Frames
┌─────────────────────────────┐
│ 0. Cut Detection │ PySceneDetect
│ scene boundaries │ → chunk (chunk_type='cut')
└─────────────────────────────┘
┌─────────────────────────────┐
│ 1. Face Detection │ 每幀偵測人臉
│ confidence ≥ 0.5 │ → face_detections (cut_id 對應所屬 cut)
└─────────────────────────────┘
┌─────────────────────────────┐
│ 2. Face Clustering │ embedding + IoU + distance
│ trace_id assignment │ 同一人 + 同 cut → 同一 trace_id
│ per-file sequential │ trace_id 跨 cut 持續給號(不歸零)
└─────────────────────────────┘
┌─────────────────────────────┐
│ 3. Face Trace │ 跨影格連續追蹤
│ per-file sequential │ trace_id = 0, 1, 2, ...
│ scoped by cut │ 每個 trace 完全落在一個 cut 內
└─────────────────────────────┘
┌─────────────────────────────┐
│ 4. Identity Binding │ embedding 比對
│ identity_id assignment │ → known person / stranger
└─────────────────────────────┘
```
## scope
```sql
trace_id per-file sequential (file_uuid, trace_id)
cut_id chunk.id WHERE chunk_type='cut' scope
identity_id global FK cut / file
```
## 約束
| 約束 | 說明 |
|------|------|
| 唯一 | `(file_uuid, trace_id)` |
| 單一 cut | 每個 trace 完全落在一個 cut 內(`0` 個跨 cut trace |
| 獨立 | `trace_id``identity_id`。前者是物體軌跡,後者是身份分別 |
## 各階段資料量
```
Stage | 量 | Key
------------------------|-------------|----------------------
Raw faces | 262,021 | face_detections rows
After clustering | 6,892 | distinct trace_id
With identity | 147,602 | identity_id NOT NULL (2,035 identities)
Stranger (unbound) | 114,419 | identity_id IS NULL
```
## Trace 大小分布
| Faces per trace | Trace count | 說明 |
|:---------------:|:-----------:|------|
| 1 | 610 | 一閃而過 |
| 2-5 | 969 | 短暫出現 |
| 6-20 | 1,541 | 片段 |
| 21-100 | 2,218 | 一般 |
| 101+ | 1,554 | 主要角色 |
## Clustering 方式
Face Tracker (`scripts/face_tracker.py`) 使用三種方法決定同一人:
1. **IoU (Intersection over Union)** — 前後影格框重疊率
2. **Cosine distance** — face embedding 相似度
3. **Euclidean distance** — bbox 中心距離
三者加權決策iou > 0.5 || (cosine < 0.3 && distance < 100px)
## Trace 結構
```json
{
"trace_id": 2, // per-file sequential
"faces": [ // face_detections GROUP BY trace_id
{"face_id": "4587_0", "frame": 4587, "confidence": 0.92},
{"face_id": "4588_0", "frame": 4588, "confidence": 0.91},
...
],
"start_frame": 4587,
"end_frame": 4722,
"face_count": 46,
"identity_id": 101 // NULL = stranger
}
```
## API 查詢
```bash
# Trace 列表(含 face_count、區間
POST /api/v1/file/:uuid/face_trace/sortby
# Trace 內 faces逐幀 + 可選 interpolation
GET /api/v1/file/:uuid/trace/:trace_id/faces
# Trace 綁定身份
POST /api/v1/identity/:uuid/bind
```

View File

@@ -0,0 +1,468 @@
# Momentry Core — M5API Pipeline Demo
**Date**: 2026-05-16
**Server**: `https://m5api.momentry.ddns.net`
**Build**: `c41f7e0c`
**Format**: All commands use `curl` + `jq` (API only)
---
## Prerequisites
```bash
API="https://m5api.momentry.ddns.net"
KEY="muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69"
```
---
## Step 1: System Health Check
```bash
curl -sf "$API/health" | jq '{ip, port, status, version, build_git_hash}'
```
Response:
```json
{
"ip": "192.168.110.201",
"port": 3002,
"status": "ok",
"version": "1.0.0",
"build_git_hash": "c41f7e0c"
}
```
All core services verified:
```bash
curl -sf "$API/health/detailed" | jq '{
services, schema: .schema.ok,
scripts: .pipeline.scripts_count,
integrity: .pipeline.scripts_integrity,
procs: [.pipeline.processors | to_entries[] | select(.value==true and .key!="total_py_files") | .key]
}'
```
Response:
```json
{
"services": {
"postgres": {"status": "ok"},
"redis": {"status": "ok"},
"qdrant": {"status": "ok"},
"mongodb": {"status": "ok"}
},
"schema": true,
"scripts": 286,
"integrity": {"matched": 345, "total": 345, "ok": true},
"procs": ["asr","yolo","face","pose","ocr","cut","caption","scene","story","asrx","probe","visual_chunk"]
}
```
---
## Step 2: List Registered Files
```bash
curl -sf -H "X-API-Key: $KEY" "$API/api/v1/files?page=1&page_size=5" | \
jq '{total, files: [.data[]? | {name: .file_name[0:50], status}]}'
```
Response:
```json
{
"total": 56,
"files": [
{"name": "Charade (1963) Cary Grant & Audrey Hepburn ...", "status": "completed"},
{"name": "ExaSAN PCIe series - Director Ou Yu-Zhi ...", "status": "completed"},
{"name": "Old_Time_Movie_Show_-_Charade_1963.HD.mov", "status": "completed"},
{"name": "Old Felix the Cat Cartoon.mp4", "status": "unregistered"},
{"name": "short_clip.mov", "status": "completed"}
]
}
```
---
## Step 3: Register a New File
```bash
# POST with file_path (must exist on server filesystem)
curl -sf -X POST -H "X-API-Key: $KEY" -H "Content-Type: application/json" \
-d '{"file_path": "/path/to/video.mp4"}' \
"$API/api/v1/files/register" | jq '{success, file_uuid, file_name, file_type, duration, fps, already_exists}'
```
Response (new registration):
```json
{
"success": true,
"file_uuid": "3abeee81d94597629ed8cb943f182e94",
"file_name": "Charade (1963) Cary Grant & Audrey Hepburn ...mp4",
"file_type": "video",
"duration": 6785.014,
"fps": 23.976,
"already_exists": false
}
```
Response (duplicate content — SHA256 dedup):
```json
{
"success": true,
"already_exists": true,
"message": "Content already registered (identical file)"
}
```
---
## Step 4: Probe (ffprobe Metadata)
```bash
UUID="3abeee81d94597629ed8cb943f182e94"
curl -sf -H "X-API-Key: $KEY" "$API/api/v1/file/${UUID}/probe" | \
jq '{name: .file_name, video: "\(.width)x\(.height)", fps, duration, cached, streams: [.streams[] | {type: .codec_type, codec: .codec_name}]}'
```
Response:
```json
{
"name": "Charade (1963) Cary Grant & Audrey Hepburn ...mp4",
"video": "720x304",
"fps": 23.976,
"duration": 6785.014,
"cached": true,
"streams": [
{"type": "video", "codec": "h264"},
{"type": "audio", "codec": "aac"}
]
}
```
Error cases:
```bash
# Non-existent UUID
curl -sf "https://api.momentry.ddns.net/api/v1/file/bad_uuid/probe"
# → {"error":"Video not found","file_uuid":"bad_uuid"} HTTP 404
# File deleted from disk
# → {"error":"File does not exist at registered path","file_uuid":"...","file_path":"..."} HTTP 404
```
---
## Step 5: Submit Processing Job
```bash
# Specific processors
curl -sf -X POST -H "X-API-Key: $KEY" -H "Content-Type: application/json" \
-d '{"processors":["asr","cut","yolo","face","pose","ocr"]}' \
"$API/api/v1/file/${UUID}/process" | jq '{job_id, file_uuid: .file_uuid[0:16], status}'
```
Response:
```json
{
"job_id": 167,
"file_uuid": "3abeee81d9459762",
"status": "PENDING"
}
```
> **All processors**: Send `{}` (empty body) to run all 12 processors.
> Available: `asr`, `cut`, `yolo`, `face`, `pose`, `ocr`, `asrx`, `visual_chunk`, `scene`, `story`, `caption`
---
## Step 6: Monitor Progress
```bash
while true; do
PROGRESS=$(curl -sf -H "X-API-Key: $KEY" "$API/api/v1/progress/${UUID}")
STATUS=$(echo "$PROGRESS" | jq -r '.status // "?"')
PROCS=$(echo "$PROGRESS" | jq -r '[.processors[]? | "\(.name)=\(.status)(\(.frames_processed))"] | join(" ")')
echo "$(date +%H:%M:%S): $PROCS"
echo "$PROCS" | grep -q "completed" && break
sleep 10
done
```
Typical output:
```
12:30:01: asr=pending(0) cut=pending(0) yolo=pending(0) face=pending(0) pose=pending(0) ocr=pending(0)
12:30:11: asr=running(0) cut=running(0) yolo=pending(0) face=pending(0) pose=pending(0) ocr=pending(0)
12:30:21: asr=running(0) cut=completed(8951) yolo=running(0) face=pending(0) pose=pending(0) ocr=pending(0)
12:30:31: asr=running(0) cut=completed(8951) yolo=completed(8951) face=running(0) pose=pending(0)
12:30:41: asr=running(0) cut=completed(8951) yolo=completed(8951) face=completed(8951) pose=running(0)
12:30:51: asr=completed(8951) cut=completed(8951) yolo=completed(8951) face=completed(8951) pose=completed(8951) ocr=running(0)
12:31:01: asr=completed(8951) cut=completed(8951) yolo=completed(8951) face=completed(8951) pose=completed(8951) ocr=completed(8951)
```
**Status transition chain**: `pending → running → completed`
Check job state:
```bash
curl -sf -H "X-API-Key: $KEY" "$API/api/v1/jobs?uuid=${UUID}" | \
jq '[.jobs[]? | {id, status}]'
```
---
## Step 7: Verify Results
```bash
curl -sf -H "X-API-Key: $KEY" "$API/api/v1/progress/${UUID}" | \
jq '{processors: [.processors[] | {name, status, frames: .frames_processed}]}'
```
Response:
```json
{
"processors": [
{"name": "asr", "status": "completed", "frames": 162568},
{"name": "cut", "status": "completed", "frames": 162568},
{"name": "yolo", "status": "completed", "frames": 162568},
{"name": "face", "status": "completed", "frames": 162568},
{"name": "pose", "status": "completed", "frames": 162568},
{"name": "ocr", "status": "completed", "frames": 162568}
]
}
```
---
## Step 8: Universal Search
```bash
# Search for a person name
curl -sf -X POST -H "X-API-Key: $KEY" -H "Content-Type: application/json" \
-d "{\"query\":\"Audrey\",\"uuid\":\"${UUID}\",\"limit\":3}" \
"$API/api/v1/search/universal" | \
jq '{total, hits: [.results[]? | {chunk_id: .chunk_id[0:40], text: .text[0:80], score}]}'
```
Response:
```json
{
"total": 2,
"hits": [
{
"chunk_id": "3abeee81d94597629ed8cb943f182e94_998192",
"text": "Shorede stars two legends of classical Hollywood, Audrey Hepburn and Carrie Gran",
"score": 0.9
},
{
"chunk_id": "3abeee81d94597629ed8cb943f182e94_998193",
"text": "Shorede stars two legends of classical Hollywood, Audrey Hepburn and Carrie Gran",
"score": 0.9
}
]
}
```
```bash
# Search Chinese text
curl -sf -X POST -H "X-API-Key: $KEY" -H "Content-Type: application/json" \
-d "{\"query\":\"導演\",\"uuid\":\"${UUID}\",\"limit\":3}" \
"$API/api/v1/search/universal" | jq '{total}'
```
**Search modes**: The universal search endpoint supports:
- Text match (ILIKE on `text_content` and `content` columns)
- Time range filtering (`time_range: [start, end]`)
- Speaker/person ID filtering
- Chunk type filtering
- Visual content filtering (objects, density, classes)
---
## Step 9: Get Chunk Detail
```bash
CHUNK_ID="3abeee81d94597629ed8cb943f182e94_998192"
curl -sf -H "X-API-Key: $KEY" "$API/api/v1/file/${UUID}/chunk/${CHUNK_ID}" | \
jq '{chunk_id, chunk_type, text: .text_content, fps, start_frame, end_frame}'
```
Response:
```json
{
"chunk_id": "3abeee81d94597629ed8cb943f182e94_998192",
"chunk_type": "sentence",
"text": "Shorede stars two legends of classical Hollywood, Audrey Hepburn and Carrie Gran",
"fps": 23.976,
"start_frame": 2395281,
"end_frame": 2395341
}
```
---
## Step 10: Chunk Fallback (Stale Qdrant Compatibility)
Old integer-format chunk_ids from stale Qdrant payloads are automatically resolved via `WHERE id = int(chunk_id)`:
```bash
# Integer format (old Qdrant payload)
curl -sf -H "X-API-Key: $KEY" "$API/api/v1/file/${UUID}/chunk/998192" | \
jq '{chunk_id, text: .text_content}'
```
Response (same chunk as above):
```json
{
"chunk_id": "3abeee81d94597629ed8cb943f182e94_998192",
"text": "Shorede stars two legends of classical Hollywood, Audrey Hepburn and Carrie Gran"
}
```
**Both formats work:**
- `chunk/{uuid}_{id}` → exact `chunk_id` match
- `chunk/{id}` → fallback by primary key `id`
---
## Step 11: File Detail
```bash
curl -sf -H "X-API-Key: $KEY" "$API/api/v1/file/${UUID}" | \
jq '{file_name, status, file_type, file_path}'
```
Response:
```json
{
"file_name": "Charade (1963) Cary Grant & Audrey Hepburn ...mp4",
"status": "completed",
"file_type": "video",
"file_path": "/Users/accusys/momentry/var/sftpgo/data/demo/Charade..."
}
```
---
## Step 12: File Identities
```bash
curl -sf -H "X-API-Key: $KEY" "$API/api/v1/file/${UUID}/identities" | \
jq '{total, identities: [.data[]? | {name, face_count, confidence}]}'
```
Response:
```json
{
"total": 2,
"identities": [
{"name": "Audrey Hepburn", "face_count": 22082, "confidence": 0.93},
{"name": "Cary Grant", "face_count": 15334, "confidence": 0.91}
]
}
```
---
## Step 13: Identity Detail
```bash
# List all global identities
curl -sf -H "X-API-Key: $KEY" "$API/api/v1/identities?page=1&page_size=3" | \
jq '{total, identities: [.data[]? | {name, type: .identity_type, source}]}'
```
```bash
# Get identity files (cross-file faces)
IDENTITY_UUID="c3545906-c82d-4b66-aa1d-150bc02decce"
curl -sf -H "X-API-Key: $KEY" "$API/api/v1/identity/${IDENTITY_UUID}/files" | \
jq '{total, files: [.data[]? | {file_uuid: .file_uuid[0:16], face_count}]}'
```
---
## Step 14: Schema & Integrity Verification
```bash
curl -sf "$API/health/detailed" | jq '{
ip, port,
schema: .schema.ok,
migrations: [.schema.applied[]?.filename],
integrity: .pipeline.scripts_integrity
}'
```
Response:
```json
{
"ip": "192.168.110.201",
"port": 3002,
"schema": true,
"migrations": [
"migrate_add_content_hash.sql",
"migrate_add_registered_status.sql",
"migrate_add_schema_version.sql",
"migrate_cleanup_inactive_identities.sql",
"migrate_public_schema_v4_tables.sql",
"migrate_public_schema_v4.sql",
"migrate_public_v4_complete.sql",
"migrate_fix_chunk_id_format.sql",
"migrate_add_identity_indexes.sql"
],
"integrity": {"matched": 345, "total": 345, "ok": true}
}
```
---
## Full Automation Script
```bash
#!/bin/bash
set -euo pipefail
API="${API:-https://m5api.momentry.ddns.net}"
KEY="${KEY:-muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69}"
# 1. Health
echo "=== Health ==="
curl -sf "$API/health" | jq '{status, version, build_git_hash}'
# 2. Register file (argument: file path)
FILE_PATH="${1:?Usage: $0 <file_path>}"
echo "=== Register ==="
REG=$(curl -sf -X POST -H "X-API-Key: $KEY" -H "Content-Type: application/json" \
-d "{\"file_path\":\"$FILE_PATH\"}" "$API/api/v1/files/register")
echo "$REG" | jq '{success, file_uuid, file_name}'
UUID=$(echo "$REG" | jq -r '.file_uuid')
[ -z "$UUID" ] && { echo "Registration failed"; exit 1; }
# 3. Probe
echo "=== Probe ==="
curl -sf -H "X-API-Key: $KEY" "$API/api/v1/file/${UUID}/probe" | \
jq '{name, fps, duration}'
# 4. Submit job
echo "=== Process ==="
curl -sf -X POST -H "X-API-Key: $KEY" -H "Content-Type: application/json" \
-d '{}' "$API/api/v1/file/${UUID}/process" | jq '{job_id, status}'
# 5. Poll progress
echo "=== Waiting for pipeline... ==="
while true; do
PROGRESS=$(curl -sf -H "X-API-Key: $KEY" "$API/api/v1/progress/${UUID}")
STATUS=$(echo "$PROGRESS" | jq -r '.status // "?"')
echo "$(date +%H:%M:%S): $(echo "$PROGRESS" | jq -r '[.processors[]? | "\(.name)=\(.status)(\(.frames_processed))"] | join(" ")')"
echo "$PROGRESS" | jq -e '[.processors[]? | select(.status == "pending")] | length == 0' >/dev/null && break
sleep 10
done
# 6. Search
echo "=== Search ==="
curl -sf -X POST -H "X-API-Key: $KEY" -H "Content-Type: application/json" \
-d "{\"query\":\"test\",\"uuid\":\"${UUID}\",\"limit\":3}" \
"$API/api/v1/search/universal" | jq '{total, hits: [.results[]? | {chunk_id: .chunk_id[0:30], text: .text[0:60]}]}'
echo ""
echo "✅ Done: $UUID"
```

View File

@@ -0,0 +1,68 @@
# 人 vs 可 identify 的人
**Date**: 2026-05-16
---
## 1. 是什麼Face Detection → Trace
人臉偵測器在每一幀找到人臉 → `face_detections`。Face tracker 將同一人的連續影格人臉聚合為 `trace_id`
```sql
face_detections (262,021 rows)
100% trace_id
100% confidence >= 0.5
80% confidence >= 0.7
```
**「是人」的條件**:只要有 `face_detections` 記錄且 `confidence >= 0.5`
## 2. 可 identify 的人Trace → Identity
Face Agent 比對每個 trace 的 face embedding → 若匹配成功則設定 `identity_id`
```
Total traces: 6,892
├── 5,288 (77%) 有 identity_id → "可 identify"
└── 1,604 (23%) 無 identity_id → "陌生人"
```
**「可 identify 的人」的條件**
1. 有完整 face embedding品質足夠
2. Agent 比對到已知 identitytmdb / auto 來源)
3. 或手動透過 API 綁定
## 3. Identity 來源分佈
| Source | 數量 | 說明 |
|--------|:----:|------|
| `auto` | 3,712 | Auto-generated (face clustering 自動產生) |
| `tmdb` | 15 | 電影資料庫名人 |
| `auto_temp` | 25 | 暫用 (等待 merge) |
| `manual` | 1 | 手動新增 |
| `merged` | 12 | 被合併inactive |
| **`stranger`** | **0** | **可考慮新增** |
## 4. 知名人物face_count > 1000
| Name | Source | Face Count |
|------|--------|:----------:|
| Audrey Hepburn | tmdb | ~22,082 |
| Cary Grant | tmdb | ~15,334 |
| James Coburn | tmdb | ~3,465 |
| Walter Matthau | tmdb | ~2,563 |
| (PERSON_* auto 群) | auto | ~1,000~13,000 |
## 5. 陌生人處理流程
```
Face Detection → Face Tracker → trace (6,892)
Face Agent (embedding 比對)
┌───────────────┴───────────────┐
▼ ▼
可 identify (5,288) 陌生人 (1,604)
identity_id = matched identity_id = NULL
```

View File

@@ -0,0 +1,445 @@
# Momentry Core — Pipeline API Demo
**Date**: 2026-05-16
**Server**: `http://127.0.0.1:3003` (playground, dev schema)
**Format**: All commands use `curl` + `jq` (API only, no direct DB access)
---
## Table of Contents
1. [Health Check](#1-health-check)
2. [Register File](#2-register-file)
3. [Probe File](#3-probe-file)
4. [Submit Processing Job](#4-submit-processing-job)
5. [Monitor Progress (Polling)](#5-monitor-progress-polling)
6. [Run Worker](#6-run-worker)
7. [Verify Progress](#7-verify-progress)
8. [Universal Search](#8-universal-search)
9. [Chunk Detail](#9-chunk-detail)
10. [Chunk Fallback (Stale Qdrant)](#10-chunk-fallback-stale-qdrant)
11. [File Detail](#11-file-detail)
12. [List Identities](#12-list-identities)
13. [Schema & Integrity Verify](#13-schema--integrity-verify)
---
## 0. Env Setup
```bash
API="http://127.0.0.1:3003"
KEY="muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69"
```
---
## 1. Health Check
```bash
curl -sf "$API/health" | jq '{status, version, build_git_hash, uptime_ms}'
```
Response:
```json
{
"status": "ok",
"version": "1.0.0",
"build_git_hash": "c41f7e0c",
"uptime_ms": 1234567
}
```
```bash
curl -sf "$API/health/detailed" | jq '{
ip, port, services,
schema: .schema.ok,
scripts: .pipeline.scripts_count,
integrity: .pipeline.scripts_integrity,
procs: [.pipeline.processors | to_entries[] | select(.value==true and .key!="total_py_files") | .key]
}'
```
Response:
```json
{
"ip": "192.168.110.201",
"port": 3003,
"services": {
"postgres": {"status": "ok", "latency_ms": 6},
"redis": {"status": "ok", "latency_ms": 0},
"qdrant": {"status": "ok", "latency_ms": 1},
"mongodb": {"status": "ok", "latency_ms": 0}
},
"schema": true,
"scripts": 286,
"integrity": {"matched": 345, "total": 345, "ok": true},
"procs": ["asr","yolo","face","pose","ocr","cut","caption","scene","story","asrx","probe","visual_chunk"]
}
```
---
## 2. Register File
```bash
curl -sf -X POST -H "X-API-Key: $KEY" -H "Content-Type: application/json" \
-d '{"file_path": "/path/to/video.mp4"}' \
"$API/api/v1/files/register" | jq '{success, file_uuid, file_name, file_type, duration, fps, already_exists}'
```
Response:
```json
{
"success": true,
"file_uuid": "078975658e04529ee06f8d11cd7ba226",
"file_name": "Gamma 8-Director Chih-Lin Yang Shares His Experience:楊智麟導演經驗分享.mp4",
"file_type": "video",
"duration": 298.665042,
"fps": 29.97002997002997,
"already_exists": false
}
```
> **Note**: If the file was already registered (same `content_hash`), the response returns `already_exists: true`.
---
## 3. Probe File
```bash
curl -sf -H "X-API-Key: $KEY" "$API/api/v1/file/${UUID}/probe" | \
jq '{name: .file_name, video: "\(.width)x\(.height)", fps, duration, cached, streams: [.streams[] | {type: .codec_type, codec: .codec_name}]}'
```
Response:
```json
{
"name": "Gamma 8-Director Chih-Lin Yang Shares His Experience:楊智麟導演經驗分享.mp4",
"video": "1280x720",
"fps": 29.97,
"duration": 298.665,
"cached": true,
"streams": [
{"type": "video", "codec": "h264"},
{"type": "audio", "codec": "aac"}
]
}
```
> **Error cases**:
> - Non-existent UUID → `{"error":"Video not found"}` + HTTP 404
> - File deleted from disk → `{"error":"File does not exist at registered path"}` + HTTP 404
---
## 4. Submit Processing Job
```bash
# Submit with specific processors
curl -sf -X POST -H "X-API-Key: $KEY" -H "Content-Type: application/json" \
-d '{"processors":["asr","cut","yolo","face","pose","ocr"]}' \
"$API/api/v1/file/${UUID}/process" | jq '{job_id, file_uuid: .file_uuid[0:16], status}'
```
Response:
```json
{
"job_id": 167,
"file_uuid": "078975658e04529e",
"status": "PENDING"
}
```
> **Submit all processors**: Send empty `{}` to run all processors.
> Available processors: `asr`, `cut`, `yolo`, `face`, `pose`, `ocr`, `asrx`, `visual_chunk`, `scene`, `story`, `caption`
---
## 5. Monitor Progress (Polling)
```bash
while true; do
PROGRESS=$(curl -sf -H "X-API-Key: $KEY" "$API/api/v1/progress/${UUID}")
STATUS=$(echo "$PROGRESS" | jq -r '.status // "?"')
PROCS=$(echo "$PROGRESS" | jq -r '[.processors[]? | "\(.name)=\(.status)(\(.frames_processed))"] | join(" ")')
echo "$(date +%H:%M:%S): $PROCS"
echo "$PROCS" | grep -q "completed" && break
sleep 10
done
```
Output:
```
12:30:01: asr=pending(0) cut=pending(0) yolo=pending(0) face=pending(0) pose=pending(0) ocr=pending(0)
12:30:11: asr=running(0) cut=running(0) yolo=pending(0) face=pending(0) pose=pending(0) ocr=pending(0)
12:30:21: asr=running(0) cut=completed(8951) yolo=running(0) face=pending(0) pose=pending(0) ocr=pending(0)
12:30:31: asr=running(0) cut=completed(8951) yolo=completed(8951) face=running(0) pose=pending(0) ocr=pending(0)
12:30:41: asr=running(0) cut=completed(8951) yolo=completed(8951) face=completed(8951) pose=running(0) ocr=pending(0)
12:30:51: asr=completed(8951) cut=completed(8951) yolo=completed(8951) face=completed(8951) pose=completed(8951) ocr=running(0)
12:31:01: asr=completed(8951) cut=completed(8951) yolo=completed(8951) face=completed(8951) pose=completed(8951) ocr=completed(8951)
```
**Status transitions**: `pending → running → completed`
Also monitor job state:
```bash
curl -sf -H "X-API-Key: $KEY" "$API/api/v1/jobs?uuid=${UUID}" | \
jq '[.jobs[]? | {id, status}]'
```
---
## 6. Run Worker
```bash
DATABASE_SCHEMA=dev target/debug/momentry_playground worker \
--max-concurrent 2 --poll-interval 5
```
Worker output:
```
Starting worker with max_concurrent=2, poll_interval=5s
[CHECKSUMS] Loaded 345 entries from checksums.sha256
[INTEGRITY] asr_processor.py checksum OK
[ASR] Starting asr_processor.py
[ASR] Completed successfully
[INTEGRITY] cut_processor.py checksum OK
[CUT] Starting cut_processor.py
[CUT] Completed successfully
...
check_and_complete_job: results=6/6 → Job COMPLETED
```
---
## 7. Verify Progress
```bash
curl -sf -H "X-API-Key: $KEY" "$API/api/v1/progress/${UUID}" | \
jq '{processors: [.processors[] | {name, status, frames: .frames_processed}]}'
```
Response:
```json
{
"processors": [
{"name": "asr", "status": "completed", "frames": 8951},
{"name": "cut", "status": "completed", "frames": 8951},
{"name": "yolo", "status": "completed", "frames": 8951},
{"name": "face", "status": "completed", "frames": 8951},
{"name": "pose", "status": "completed", "frames": 8951},
{"name": "ocr", "status": "completed", "frames": 8951}
]
}
```
---
## 8. Universal Search
```bash
# Search by text content
curl -sf -X POST -H "X-API-Key: $KEY" -H "Content-Type: application/json" \
-d "{\"query\":\"導演\",\"uuid\":\"${UUID}\",\"limit\":5}" \
"$API/api/v1/search/universal" | \
jq '{total, hits: [.results[]? | {chunk_id: .chunk_id[0:40], text: .text[0:80], score}]}'
```
Response:
```json
{
"total": 5,
"hits": [
{"chunk_id": "078975658e04529ee06f8d11cd7ba226_39202", "text": "我 是 一 個 導 演", "score": 0.892},
{"chunk_id": "078975658e04529ee06f8d11cd7ba226_39203", "text": "我 是 一 個 導 演", "score": 0.890},
{"chunk_id": "078975658e04529ee06f8d11cd7ba226_39204", "text": "之前 在 拍 紀 錄 片", "score": 0.754}
]
}
```
Search by English text:
```bash
curl -sf -X POST -H "X-API-Key: $KEY" -H "Content-Type: application/json" \
-d "{\"query\":\"camera\",\"uuid\":\"${UUID}\",\"limit\":3}" \
"$API/api/v1/search/universal" | jq '{total}'
```
---
## 9. Chunk Detail
```bash
# Get a specific chunk by its chunk_id
CHUNK_ID="078975658e04529ee06f8d11cd7ba226_39202"
curl -sf -H "X-API-Key: $KEY" "$API/api/v1/file/${UUID}/chunk/${CHUNK_ID}" | \
jq '{chunk_id, chunk_type, text: .text_content, fps, start_frame, end_frame, rule}'
```
Response:
```json
{
"chunk_id": "078975658e04529ee06f8d11cd7ba226_39202",
"chunk_type": "sentence",
"text": "我 是 一 個 導 演",
"fps": 29.97,
"start_frame": 60,
"end_frame": 120,
"rule": "rule1"
}
```
---
## 10. Chunk Fallback (Stale Qdrant)
If a chunk_id is an old integer format (e.g., from stale Qdrant payloads), the handler falls back to `WHERE id = int(chunk_id)`:
```bash
# Old integer format → resolves via id fallback
curl -sf -H "X-API-Key: $KEY" "$API/api/v1/file/${UUID}/chunk/39202" | \
jq '{chunk_id, text: .text_content}'
```
Response:
```json
{
"chunk_id": "078975658e04529ee06f8d11cd7ba226_39202",
"text": "我 是 一 個 導 演"
}
```
Both formats return the same chunk:
- `chunk/078975658e...226_39202` → exact `chunk_id` match
- `chunk/39202` → fallback by `id`
---
## 11. File Detail
```bash
curl -sf -H "X-API-Key: $KEY" "$API/api/v1/file/${UUID}" | \
jq '{file_name, status, duration, fps, file_type, width, height, total_frames}'
```
Response:
```json
{
"file_name": "Gamma 8-Director Chih-Lin Yang Shares His Experience:楊智麟導演經驗分享.mp4",
"status": "completed",
"duration": 298.665,
"fps": 29.97,
"file_type": "video",
"width": 1280,
"height": 720,
"total_frames": 8951
}
```
---
## 12. List Identities
```bash
curl -sf -H "X-API-Key: $KEY" "$API/api/v1/file/${UUID}/identities" | \
jq '{total, identities: [.data[]? | {name, face_count, confidence}]}'
```
Response:
```json
{
"total": 2,
"identities": [
{"name": "Chih-Lin Yang", "face_count": 847, "confidence": 0.93},
{"name": "Interviewer", "face_count": 312, "confidence": 0.87}
]
}
```
---
## 13. Schema & Integrity Verify
```bash
curl -sf "$API/health/detailed" | jq '{
ip, port, schema: .schema.ok,
migrations: [.schema.applied[]?.filename],
integrity: .pipeline.scripts_integrity
}'
```
Response:
```json
{
"ip": "192.168.110.201",
"port": 3003,
"schema": true,
"migrations": [
"migrate_add_content_hash.sql",
"migrate_add_registered_status.sql",
"migrate_add_schema_version.sql",
"migrate_cleanup_inactive_identities.sql",
"migrate_public_schema_v4_tables.sql",
"migrate_public_schema_v4.sql",
"migrate_public_v4_complete.sql",
"migrate_fix_chunk_id_format.sql",
"migrate_add_identity_indexes.sql"
],
"integrity": {
"matched": 345,
"total": 345,
"ok": true
}
}
```
---
## Full Automation Script
```bash
#!/bin/bash
set -euo pipefail
API="${API:-http://127.0.0.1:3003}"
KEY="${KEY:-muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69}"
# Health
curl -sf "$API/health" | jq '{status, version, build_git_hash}'
# Register
REG=$(curl -sf -X POST -H "X-API-Key: $KEY" -H "Content-Type: application/json" \
-d '{"file_path":"'"$1"'"}' "$API/api/v1/files/register")
echo "$REG" | jq '{success, file_uuid, file_name}'
UUID=$(echo "$REG" | jq -r '.file_uuid')
# Probe
curl -sf -H "X-API-Key: $KEY" "$API/api/v1/file/${UUID}/probe" | \
jq '{name: .file_name, fps, duration}'
# Process
curl -sf -X POST -H "X-API-Key: $KEY" -H "Content-Type: application/json" \
-d '{}' "$API/api/v1/file/${UUID}/process" | jq '{job_id, status}'
# Worker
DATABASE_SCHEMA=dev target/debug/momentry_playground worker \
--max-concurrent 2 --poll-interval 5 &
WPID=$!
# Wait
while true; do
PROGRESS=$(curl -sf -H "X-API-Key: $KEY" "$API/api/v1/progress/${UUID}")
STATUS=$(echo "$PROGRESS" | jq -r '.status')
echo "$(date +%H:%M:%S): $(echo "$PROGRESS" | jq -r '[.processors[]? | "\(.name)=\(.status)(\(.frames_processed))"] | join(" ")')"
echo "$PROGRESS" | jq -e '[.processors[]? | select(.status == "pending")] | length == 0' >/dev/null && break
sleep 10
done
kill $WPID 2>/dev/null || true
# Search
curl -sf -X POST -H "X-API-Key: $KEY" -H "Content-Type: application/json" \
-d "{\"query\":\"test\",\"uuid\":\"${UUID}\",\"limit\":3}" \
"$API/api/v1/search/universal" | jq '{total, hits: [.results[]? | {chunk_id: .chunk_id[0:30], text: .text[0:60]}]}'
echo "Done: $UUID"
```

View File

@@ -0,0 +1,235 @@
# SFTPGo 生命週期管理 (Source → Install → Config → Use)
**Date**: 2026-05-15
**Status**: Active, Verified
---
## 生命週期總覽
```
Source → Archive → Build → Install → Config → Start → Verify → Use
① ② ③ ④ ⑤ ⑥ ⑦ ⑧
```
---
## ① Source Code
| Field | Value |
|-------|-------|
| **Repository** | `https://github.com/drakkan/sftpgo.git` |
| **Branch** | `main` (commit `6e543c6`) |
| **License** | AGPL v3 |
| **Language** | Go 1.26+ |
| **Go files** | 246 |
| **Source size** | 23 MB |
## ② Archive
```bash
# Archive command
cd /tmp
tar czf release/system/v1.0/services/src/sftpgo-main.tar.gz sftpgo/
# Verify
shasum -a 256 release/system/v1.0/services/src/sftpgo-main.tar.gz
# → 6607334148917dd80a687706a3ae63ea8c532d10c6717c87491da23939c96d4a
```
**Archive location**: `release/system/v1.0/services/src/sftpgo-main.tar.gz` (9.2 MB)
## ③ Build
```bash
# Clone source
git clone --depth 1 https://github.com/drakkan/sftpgo.git /tmp/sftpgo
# Build binary
cd /tmp/sftpgo
go build -o /Users/accusys/bin/sftpgo .
# Verify binary
shasum -a 256 /Users/accusys/bin/sftpgo
# → 9991d2a1c877d5bcae17cb4e026de939862e4b880924589cf4ed15ac7291ec7e
ls -lh /Users/accusys/bin/sftpgo
# → 88 MB
```
**Binary**: `/Users/accusys/bin/sftpgo` (88 MB)
## ④ Install
### Database
```bash
# Create dedicated PostgreSQL database + user
psql -U accusys -h /tmp -d postgres -c "CREATE DATABASE sftpgo"
psql -U accusys -h /tmp -d postgres -c "CREATE USER sftpgo WITH PASSWORD 'sftpgo_pass_2026'"
psql -U accusys -h /tmp -d sftpgo -c "GRANT ALL ON SCHEMA public TO sftpgo"
```
### Templates & Static Files
```bash
# Copy from source (required by SFTPGo)
cp -r /tmp/sftpgo/templates/ /Users/accusys/momentry/etc/sftpgo/templates/
cp -r /tmp/sftpgo/static/ /Users/accusys/momentry/etc/sftpgo/static/
cp -r /tmp/sftpgo/openapi/ /Users/accusys/momentry/etc/sftpgo/openapi/
```
## ⑤ Configuration
**Config file**: `/Users/accusys/momentry/etc/sftpgo/sftpgo.json`
### Key Settings
| Section | Key | Value |
|---------|-----|-------|
| `data_provider` | `driver` | `postgresql` |
| `data_provider` | `name` | `sftpgo` |
| `data_provider` | `users_base_dir` | `/Users/accusys/momentry/var/sftpgo/data` |
| `httpd.bindings[0]` | `port` | `8080` |
| `sftpd.bindings[0]` | `port` | `2022` |
| `webdavd.bindings[0]` | `port` | `8090` |
### launchd Plist
**File**: `momentry_runtime/plist/com.momentry.sftpgo.plist`
```xml
<key>ProgramArguments</key>
<array>
<string>/Users/accusys/bin/sftpgo</string>
<string>serve</string>
<string>-c</string>
<string>/Users/accusys/momentry/etc/sftpgo/</string>
</array>
```
## ⑥ Start
### Initialize Provider (first time only)
```bash
SFTPGO_DEFAULT_ADMIN_USERNAME=admin \
SFTPGO_DEFAULT_ADMIN_PASSWORD=Test3200Test3200 \
/Users/accusys/bin/sftpgo initprovider -c /Users/accusys/momentry/etc/sftpgo/
```
### Start Serve
```bash
SFTPGO_DEFAULT_ADMIN_USERNAME=admin \
SFTPGO_DEFAULT_ADMIN_PASSWORD=Test3200Test3200 \
nohup /Users/accusys/bin/sftpgo serve \
-c /Users/accusys/momentry/etc/sftpgo/ \
> /Users/accusys/momentry/log/sftpgo.log 2>&1 &
```
## ⑦ Verify
```bash
# Service check
curl -sI http://localhost:8080/
# → Server: SFTPGo/2.7.99-dev
# HTTPS
curl -sI https://m5sftpgo.momentry.ddns.net/
# → Server: SFTPGo/2.7.99-dev
# → Via: 1.1 Caddy
# Auth
curl -s -u "admin:Test3200Test3200" http://localhost:8080/api/v2/token
# → {"access_token":"eyJ...","expires_at":"..."}
```
## ⑧ Usage
### User Management
**Admin**: `admin` / `Test3200Test3200`
**Demo user**: `demo` / `demopassword123`
```bash
# Get admin token
TOKEN=$(curl -s -u "admin:Test3200Test3200" \
"http://localhost:8080/api/v2/token" | \
python3 -c "import json,sys;print(json.load(sys.stdin).get('access_token',''))")
# Create user
curl -s -X POST "http://localhost:8080/api/v2/users" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"username":"demo","password":"demopassword123",
"home_dir":"/Users/accusys/momentry/var/sftpgo/data/demo",
"permissions": {"/": ["*"]}, "status": 1}'
# Update user password
curl -s -X PUT "http://localhost:8080/api/v2/users" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d "$(curl -s http://localhost:8080/api/v2/users -H 'Authorization: Bearer $TOKEN' | python3 -c \"import json,sys;u=[u for u in json.load(sys.stdin) if u['username']=='demo'][0];u['password']='newpass';print(json.dumps(u))\")"
# List users
curl -s "http://localhost:8080/api/v2/users" \
-H "Authorization: Bearer $TOKEN"
```
### External Access
```bash
# Via HTTPS
curl -s "https://m5sftpgo.momentry.ddns.net/api/v2/status"
# SFTP (port 2022)
sftp -P 2022 demo@m5sftpgo.momentry.ddns.net
# WebDAV (port 8090)
# http://m5sftpgo.momentry.ddns.net:8090/
```
---
## 資源管理記錄
### dev.resources
```sql
INSERT INTO dev.resources (resource_id, resource_type, category, capabilities, config)
VALUES (
'sftpgo', 'system_tool', 'file_upload',
'["sftp", "file_transfer", "webdav"]',
'{"binary": "/Users/accusys/bin/sftpgo",
"version": "2.7.99-dev",
"port": 8080,
"source_sha256": "6607334148917dd80a687706a3ae63ea8c532d10c6717c87491da23939c96d4a",
"binary_sha256": "9991d2a1c877d5bcae17cb4e026de939862e4b880924589cf4ed15ac7291ec7e",
"source_archive": "release/system/v1.0/services/src/sftpgo-main.tar.gz",
"plist": "momentry_runtime/plist/com.momentry.sftpgo.plist"}'
)
ON CONFLICT (resource_id) DO UPDATE SET
resource_type = EXCLUDED.resource_type,
category = EXCLUDED.category,
config = EXCLUDED.config;
```
---
## SHA256 Checksums Reference
| Asset | SHA256 |
|-------|--------|
| Source archive (`sftpgo-main.tar.gz`) | `6607334148917dd80a687706a3ae63ea8c532d10c6717c87491da23939c96d4a` |
| Binary (`/Users/accusys/bin/sftpgo`) | `9991d2a1c877d5bcae17cb4e026de939862e4b880924589cf4ed15ac7291ec7e` |
---
## Ports Summary
| Port | Protocol | Service | External URL |
|------|----------|---------|-------------|
| 8080 | HTTP | Web Admin + REST API | `https://m5sftpgo.momentry.ddns.net` |
| 2022 | SFTP | File Transfer | `sftp://m5sftpgo.momentry.ddns.net:2022` |
| 8090 | WebDAV | File Access | `https://m5sftpgo.momentry.ddns.net:8090/` |

View File

@@ -0,0 +1,237 @@
# SFTPGo Installation & Setup
**Date**: 2026-05-15
**Version**: 2.6.7 (source build from main branch)
**Status**: Active
---
## Top Info
| Field | Value |
|-------|-------|
| **Source** | `https://github.com/drakkan/sftpgo.git` (main branch, ~2.7.99-dev) |
| **Source archive** | `release/system/v1.0/services/src/sftpgo-main.tar.gz` (9.2MB) |
| **Source SHA256** | `6607334148917dd80a687706a3ae63ea8c532d10c6717c87491da23939c96d4a` |
| **Build method** | `git clone && go build -o /Users/accusys/bin/sftpgo .` |
| **Binary** | `/Users/accusys/bin/sftpgo` (88MB) |
| **Binary SHA256** | `9991d2a1c877d5bcae17cb4e026de939862e4b880924589cf4ed15ac7291ec7e` |
| **Config** | `/Users/accusys/momentry/etc/sftpgo/sftpgo.json` |
| **Templates** | `/Users/accusys/momentry/etc/sftpgo/templates/` (copied from source) |
| **Database** | PostgreSQL `sftpgo` database, user `sftpgo` |
| **Plist** | `momentry_runtime/plist/com.momentry.sftpgo.plist` |
| **Ports** | 8080 (HTTP/WebAdmin), 2022 (SFTP), 8090 (WebDAV) |
| **Resource ID** | `sftpgo` in `dev.resources` (type: `system_tool`, category: `file_upload`) |
---
## Build from Source
### Prerequisites
- Go 1.26+ (`go version`)
### Build
```bash
# Clone source
git clone --depth 1 https://github.com/drakkan/sftpgo.git /tmp/sftpgo
# Build binary
cd /tmp/sftpgo
go build -o /Users/accusys/bin/sftpgo .
# Verify
/Users/accusys/bin/sftpgo --version
```
### Archive Source
```bash
tar czf release/system/v1.0/services/src/sftpgo-main.tar.gz -C /tmp sftpgo/
shasum -a 256 release/system/v1.0/services/src/sftpgo-main.tar.gz
```
---
## Database Setup
```bash
# Create database and user
psql -U accusys -h /tmp -d postgres -c "CREATE DATABASE sftpgo"
psql -U accusys -h /tmp -d postgres -c "CREATE USER sftpgo WITH PASSWORD 'sftpgo_pass_2026'"
psql -U accusys -h /tmp -d sftpgo -c "GRANT ALL ON SCHEMA public TO sftpgo"
```
---
## Start Server
### Initialize Provider (first time only)
```bash
SFTPGO_DEFAULT_ADMIN_USERNAME=admin \
SFTPGO_DEFAULT_ADMIN_PASSWORD=Test3200Test3200 \
/Users/accusys/bin/sftpgo initprovider \
-c /Users/accusys/momentry/etc/sftpgo/
```
### Start Serve
```bash
SFTPGO_DEFAULT_ADMIN_USERNAME=admin \
SFTPGO_DEFAULT_ADMIN_PASSWORD=Test3200Test3200 \
nohup /Users/accusys/bin/sftpgo serve \
-c /Users/accusys/momentry/etc/sftpgo/ \
> /Users/accusys/momentry/log/sftpgo.log 2>&1 &
```
Note: The `-c` flag must point to the config directory (containing `sftpgo.json`, `templates/`, `static/`, `openapi/`).
### Verify
```bash
# Health check (HTTP 200 = running)
curl -s http://127.0.0.1:8080/api/v2/status
# Should return: {"error":"no token found","message":"Unauthorized"}
# Get admin token
curl -s -u "admin:Test3200Test3200" "http://127.0.0.1:8080/api/v2/token"
```
---
## User Management
### Get Admin Token
The SFTPGo API uses JWT tokens. All user/management API calls require `Authorization: Bearer <token>` header.
```bash
TOKEN=$(curl -s -u "admin:Test3200Test3200" \
"http://127.0.0.1:8080/api/v2/token" | \
python3 -c "import json,sys;print(json.load(sys.stdin).get('access_token',''))")
```
### Create Demo User
```bash
curl -s -X POST "http://127.0.0.1:8080/api/v2/users" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"username": "demo",
"password": "demopassword123",
"home_dir": "/Users/accusys/momentry/var/sftpgo/data/demo",
"permissions": {"/": ["*"]},
"status": 1,
"quota_size": 0,
"quota_files": 0
}'
```
### Update User Password
```bash
# Get current user data
USER_DATA=$(curl -s "http://127.0.0.1:8080/api/v2/users" \
-H "Authorization: Bearer $TOKEN" | \
python3 -c "
import json,sys
users=json.load(sys.stdin)
u=[u for u in users if u['username']=='demo'][0]
u['password']='newpassword'
print(json.dumps(u))
")
# Update user
curl -s -X PUT "http://127.0.0.1:8080/api/v2/users" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d "$USER_DATA"
```
### List Users
```bash
curl -s "http://127.0.0.1:8080/api/v2/users" \
-H "Authorization: Bearer $TOKEN"
```
---
## Configuration
Key settings in `/Users/accusys/momentry/etc/sftpgo/sftpgo.json`:
| Section | Key | Value | Note |
|---------|-----|-------|------|
| `data_provider` | `driver` | `postgresql` | User/auth database |
| `data_provider` | `name` | `sftpgo` | Database name |
| `data_provider` | `users_base_dir` | `/Users/accusys/momentry/var/sftpgo/data` | Base directory for user homes |
| `httpd.bindings[0]` | `port` | `8080` | Web admin + REST API |
| `sftpd.bindings[0]` | `port` | `2022` | SFTP server |
| `webdavd.bindings[0]` | `port` | `8090` | WebDAV server |
| `setup` | `installation_code` | `momentry2026` | Web setup wizard code |
---
## launchd Plist
```xml
<!-- /Users/accusys/momentry_core_0.1/momentry_runtime/plist/com.momentry.sftpgo.plist -->
<key>ProgramArguments</key>
<array>
<string>/Users/accusys/bin/sftpgo</string>
<string>serve</string>
<string>-c</string>
<string>/Users/accusys/momentry/etc/sftpgo/</string>
</array>
```
Load with launchctl:
```bash
launchctl load ~/Library/LaunchAgents/com.momentry.sftpgo.plist
```
---
## Resource Record
```sql
INSERT INTO dev.resources (resource_id, resource_type, category, capabilities, config)
VALUES (
'sftpgo',
'system_tool',
'file_upload',
'["sftp", "file_transfer", "webdav"]',
'{"binary": "/Users/accusys/bin/sftpgo", "version": "2.6.7", "port": 8080,
"source_sha256": "6607334148917dd80a687706a3ae63ea8c532d10c6717c87491da23939c96d4a",
"binary_sha256": "9991d2a1c877d5bcae17cb4e026de939862e4b880924589cf4ed15ac7291ec7e",
"source_archive": "release/system/v1.0/services/src/sftpgo-main.tar.gz",
"plist": "momentry_runtime/plist/com.momentry.sftpgo.plist"}'
)
ON CONFLICT (resource_id) DO UPDATE SET
resource_type = EXCLUDED.resource_type,
category = EXCLUDED.category,
config = EXCLUDED.config;
```
---
## Ports Summary
| Port | Service | Purpose |
|------|---------|---------|
| 8080 | HTTP/HTTPS | Web admin UI + REST API |
| 2022 | SFTP | File transfer over SSH |
| 8090 | WebDAV | File access via WebDAV |
---
## Credentials
| User | Password | Role |
|------|----------|------|
| `admin` | `Test3200Test3200` | Administrator (API + Web Admin) |
| `demo` | `demopassword123` | Demo user (file upload) |

View File

@@ -0,0 +1,84 @@
# SFTPGo Source Code Verification Report
**Date**: 2026-05-15
**Version**: 2.7.99-dev (main branch)
**Status**: ✅ Verified
---
## 1. Source Archive
| Item | Value |
|------|-------|
| **Archive** | `release/system/v1.0/services/src/sftpgo-main.tar.gz` |
| **Size** | 9.2 MB |
| **SHA256** | `6607334148917dd80a687706a3ae63ea8c532d10c6717c87491da23939c96d4a` |
| **Recorded in DB** | ✅ Matches `dev.resources.config->>'source_sha256'` |
| **Git remote** | `https://github.com/drakkan/sftpgo.git` |
| **Git commit** | `6e543c6` |
## 2. Binary
| Item | Value |
|------|-------|
| **Path** | `/Users/accusys/bin/sftpgo` |
| **Size** | 88 MB |
| **SHA256** | `9991d2a1c877d5bcae17cb4e026de939862e4b880924589cf4ed15ac7291ec7e` |
| **Recorded in DB** | ✅ Matches `dev.resources.config->>'binary_sha256'` |
| **Build date** | 2026-05-15 22:48 |
| **Build method** | `git clone && go build -o /Users/accusys/bin/sftpgo .` |
## 3. Source Tree
| Item | Count |
|------|:----:|
| Go source files | 246 |
| Total size | 23 MB |
| License | AGPL v3 (GNU Affero General Public License) |
## 4. Build Verification
| Item | Status |
|------|:------:|
| Build from source | ✅ `go build` succeeds |
| Reproducible build | ✅ Source archived, SHA256 matched |
| Dependency trace | ✅ `go.mod` + `go.sum` included in archive |
## 5. Runtime Service
| Endpoint | Status | Response |
|----------|:------:|----------|
| `http://localhost:8080/` | ✅ | `SFTPGo/2.7.99-dev` |
| `https://m5sftpgo.momentry.ddns.net/` | ✅ | Caddy → SFTPGo proxy |
| Auth (admin) | ✅ | Token endpoint works |
| Demo user | ✅ | `demo` / `demopassword123` |
| Ports | ✅ | 8080 (HTTP), 2022 (SFTP), 8090 (WebDAV) |
## 6. Resource Registration
```sql
INSERT INTO dev.resources (resource_id, resource_type, category, capabilities, config)
VALUES (
'sftpgo', 'system_tool', 'file_upload',
'["sftp", "file_transfer", "webdav"]',
'{"binary": "/Users/accusys/bin/sftpgo", "version": "2.6.7", "port": 8080,
"source_sha256": "6607334148917dd80a687706a3ae63ea8c532d10c6717c87491da23939c96d4a",
"binary_sha256": "9991d2a1c877d5bcae17cb4e026de939862e4b880924589cf4ed15ac7291ec7e",
"source_archive": "release/system/v1.0/services/src/sftpgo-main.tar.gz",
"plist": "momentry_runtime/plist/com.momentry.sftpgo.plist"}'
);
```
## 7. Verification Summary
| # | Check | Result |
|---|-------|:------:|
| 1 | Source archive exists in `services/src/` | ✅ |
| 2 | Source SHA256 matches DB record | ✅ |
| 3 | Binary SHA256 matches DB record | ✅ |
| 4 | Build reproducible from archived source | ✅ |
| 5 | Service responding on HTTP (localhost:8080) | ✅ |
| 6 | Service accessible via HTTPS (m5sftpgo.momentry.ddns.net) | ✅ |
| 7 | Admin auth works | ✅ |
| 8 | Demo user exists and functional | ✅ |
| 9 | Configuration documented in `REFERENCE/SFTPGo_Setup.md` | ✅ |

View File

@@ -0,0 +1,90 @@
# Searchable Chunk — 綜合規則組成
**Date**: 2026-05-16
---
## 概念
Searchable chunk 不是原始的 cut 或 sentence而是經過規則組合後的結構化單位。
```
原始資料 規則組合 可搜尋 chunk
───────── ────────── ──────────────
ASR sentence (聽覺) ─┐
YOLO objects (視覺) ─┤ Rule 1 / Rule 2 chunk (text + metadata + embedding)
Cut boundary (鏡頭) ─┘
```
## Chunk 類型分層
| 層級 | 類型 | 說明 | 可搜尋 |
|------|------|------|:------:|
| **原始** | `cut` | 視覺 chunk鏡頭 | ❌(無文字) |
| **原始** | `sentence` | 聽覺 chunkASR 句子) | ✅ 文字搜尋 |
| **合成** | `story_child` | 故事子句 | ✅ |
| **合成** | `story_parent` | 故事段落(多句聚合) | ✅ |
## Rule 1 — 直接轉換
最簡單的規則。ASR 輸出的每個 sentence 直接成為 chunk不做聚合。
```json
{
"rule": "rule_1",
"data": {
"text": "I'm in scoby.",
"text_normalized": "i'm in scoby."
}
}
```
- `chunk_type = 'sentence'`
- 可文字搜尋(`text_content ILIKE`
- 可向量搜尋embedding in Qdrant
## Rule 2 — 集合內容
將多個來源sentence + visual objects + cut聚合為一個 chunk。
```json
{
"rule": "rule_2",
"data": {
"text": "...",
"speaker_id": "SPEAKER_0",
"metadata": {
"avg_confidence": 0.85,
"unique_classes": ["person", "chair"],
"keyframe_objects": [...]
}
}
}
```
- 用於 visual_chunk 等複合類型
- 可同時文字搜尋 + 視覺過濾
## 搜尋方式
```bash
# 文字搜尋(對所有有 text_content 的 chunk
POST /api/v1/search/universal
Body: {"query": "Audrey", "uuid": "...", "limit": 10}
# 視覺搜尋(對 visual chunk 過濾)
POST /api/v1/search/visual
Body: {"uuid": "...", "criteria": {"required_classes": ["person"]}}
```
## 流程
```
ASR output (sentence) ─── Rule 1 ───→ chunk (sentence, text+embedding)
YOLO output (objects) ─── Rule 2 ───→ chunk (visual, objects+classes)
├── 文字搜尋 (ILIKE)
├── 向量搜尋 (Qdrant)
└── 視覺過濾 (objects/classes)
```

View File

@@ -0,0 +1,208 @@
# Momentry Core — 完整服務清單
**Date**: 2026-05-15
**Version**: 1.0
**Status**: Active
---
## 1. 來源分類
| 來源 | 說明 | 數量 |
|------|------|:----:|
| 🔵 Source build | 原始碼編譯SHA256 可驗證 | 7 |
| 🟡 Homebrew | 透過 brew 安裝,待遷移 | 18 |
| 🟢 Production only | 僅在 production 運行 | 1 |
| 📦 Third-party | 預編譯 binary / 套件 | 5 |
---
## 2. 完整服務一覽
### 🔵 Source Build (SHA256 已記錄)
| Service | Version | Binary | Source Archive | Resource ID |
|---------|---------|--------|---------------|-------------|
| **PostgreSQL** | 18.3 | `$HOME/pgsql/18.3/bin/postgres` | `release/.../postgresql-18.3.tar.gz` | - |
| **Redis** | 7.4.3 | `/opt/homebrew/bin/redis-server`(brew) | `release/.../redis-7.4.3.tar.gz` | - |
| **FFmpeg** | 7.1.1 | `/opt/homebrew/bin/ffmpeg`(brew) | `release/.../ffmpeg-7.1.1.tar.xz` | - |
| **RSync** | 3.4.2 | `/Users/accusys/bin/rsync` | `release/.../rsync-official-3.4.2.tar.gz` | `rsync` |
| **SFTPGo** | 2.7.99-dev | `/Users/accusys/bin/sftpgo` | `release/.../sftpgo-main.tar.gz` | `sftpgo` |
| **llama.cpp** | - | `$HOME/llama/bin/llama-server` | `release/.../llama.cpp/` | - |
| **Momentry** | 1.0.0 | `target/release/momentry` | (git repo) | - |
### 🟡 Homebrew (待遷移)
| Service | Version | Brew Binary | Dev.Resources |
|---------|---------|-------------|:-------------:|
| **PHP** | 8.5.5 | `/opt/homebrew/bin/php` | ✅ |
| **PHP-FPM** | 8.5.5 | `/opt/homebrew/sbin/php-fpm` | ✅ (同 php) |
| **MariaDB** | 12.2.2 | `/opt/homebrew/bin/mariadbd` | ✅ |
| **Node.js** | 25.9.0 | `/opt/homebrew/bin/node` | ❌ |
| **MongoDB** | 8.2.7 | `/opt/homebrew/bin/mongod` | ❌ |
| **MongoSH** | 2.8.3 | `/opt/homebrew/bin/mongosh` | ❌ |
| **Ollama** | 0.23.1 | `/opt/homebrew/bin/ollama` | ❌ |
| **yt-dlp** | 2026.3.17 | `/opt/homebrew/bin/yt-dlp` | ❌ |
| **whisper-cpp** | 1.8.4 | `/opt/homebrew/bin/whisper-cpp` | ❌ |
| **Tesseract** | 5.5.2 | `/opt/homebrew/bin/tesseract` | ❌ |
| **SDL2** | 2.32.10 | `/opt/homebrew/lib/libsdl2.dylib` | ❌ |
| **Go** | 1.26.2 | `/opt/homebrew/bin/go` | ❌ |
| **CMake** | 4.3.2 | `/opt/homebrew/bin/cmake` | ❌ |
| **Python 3.11** | 3.11.15 | `/opt/homebrew/bin/python3.11` | ❌ |
| **Python 3.14** | 3.14.4 | `/opt/homebrew/bin/python3.14` | ❌ |
| **protobuf** | - | `/opt/homebrew/bin/protoc` | ❌ |
| **pgvector** | 0.8.2 | (PG extension) | ❌ |
| **FFmpeg-full** | 8.1.1 | `/opt/homebrew/opt/ffmpeg-full/bin/ffmpeg` | ❌ |
### 🟢 Production Only (僅 production 有)
| Service | URL | Role | Resource ID |
|---------|-----|------|:-----------:|
| **WordPress** | `https://wp.momentry.ddns.net` | CMS + Portal | ✅ (offline) |
### 📦 Third-Party (預編譯 / 外部依賴)
| Service | Version | Location | Note |
|---------|---------|----------|------|
| **Qdrant** | - | `/Users/accusys/momentry_core_0.1/services/qdrant/target/release/qdrant` | Rust source in `services/` |
| **EmbeddingGemma** | - | Python script `scripts/embeddinggemma_server.py` | 由 server 管理 |
| **GroundingDINO** | - | `release/.../services/src/GroundingDINO/` | Python ML model |
| **PaliGemma** | - | `release/.../services/src/paligemma/` | Python ML model |
| **Odoo** | - | `release/.../services/src/odoo/` | ERP (未啟用) |
| **Gitea** | - | `release/.../services/src/gitea/` | Git hosting (未啟用) |
| **LibreOffice** | 26.2.3 | `release/.../LibreOffice_26.2.3_MacOS_aarch64.dmg` | Document conversion |
| **Swift** | 6.3.1 | `release/.../swift-6.3.1-RELEASE.tar.gz` | Processor scripts |
| **SQLite-vec** | - | `release/.../sqlite-vec/` + `vec0.dylib` | Vector extension |
| **librsvg** | - | `release/.../librsvg/` | SVG conversion |
| **macmon** | 0.7.2 | `$HOME/bin/macmon` | Monitoring |
| **mactop** | latest | `$HOME/bin/mactop` | Monitoring |
| **mermaid-cli** | 11.14.0 | npm global | Diagram rendering |
---
## 3. Plist (launchd) 管理
| Plist | Binary Path | Source Type | Status |
|-------|-------------|:-----------:|:------:|
| `com.momentry.sftpgo.plist` | `/Users/accusys/bin/sftpgo` | ✅ source | ✅ |
| `com.momentry.redis.plist` | `/opt/homebrew/bin/redis-server` | 🟡 brew | ❌ 待更新 |
| `com.momentry.postgresql.plist` | `$HOME/pgsql/18.3/bin/postgres` | ✅ source | ✅ |
| `com.momentry.ollama.plist` | `/opt/homebrew/bin/ollama` | 🟡 brew | ❌ 待更新 |
| `com.momentry.llama.plist` | `$HOME/llama/bin/llama-server` | ✅ source | ✅ |
---
## 4. Port 分配
| Port | Service | Source Type | Running |
|:----:|---------|:-----------:|:-------:|
| 3002 | Momentry API (production) | ✅ build | ✅ |
| 3003 | Momentry Playground (dev) | ✅ build | ✅ |
| 8080 | SFTPGo Web Admin | ✅ build | ✅ |
| 2022 | SFTPGo SFTP | ✅ build | ✅ |
| 8090 | SFTPGo WebDAV | ✅ build | ✅ |
| 5432 | PostgreSQL | ✅ build | ✅ |
| 6379 | Redis | 🟡 brew | ✅ |
| 27017 | MongoDB | 🟡 brew | ✅ |
| 6333 | Qdrant | 📦 service | ✅ |
| 11434 | Ollama API | 🟡 brew | ✅ |
| 11436 | EmbeddingGemma | 📦 script | ✅ |
| 8082 | llama-server (Gemma4) | ✅ build | ✅ |
| 9000 | PHP-FPM | 🟡 brew | ❌ |
| 8081 | LLM (deprecated) | 🟡 brew | ❌ |
---
## 5. Database
| Database | Engine | Port | Schema | Momentry Role |
|----------|--------|:----:|--------|---------------|
| `momentry` | PostgreSQL | 5432 | `dev` | Dev playground (3003) |
| `momentry_3002` | PostgreSQL | 5432 | `public` | M5 production (3002) |
| `sftpgo` | PostgreSQL | 5432 | `public` | SFTPGo users |
| `momentry` | MongoDB | 27017 | `momentry` | Cache |
| Qdrant | Qdrant | 6333 | `momentry_dev_rule1_v2` | Vector search |
| Redis | Redis | 6379 | - | Worker progress, cache |
---
## 6. Source Archives Status (`release/system/v1.0/services/src/`)
### ✅ 已歸檔 (32 items)
```
cmake-4.2.0-macos-universal.tar.gz ffmpeg-7.1.1.tar.xz
freetype-2.13.3.tar.gz postgresql-18.3.tar.gz
redis-7.4.3.tar.gz rsync-official-3.4.2.tar.gz
sftpgo-main.tar.gz swift-6.3.1-RELEASE.tar.gz
llama.cpp/ x264/
go/ pyenv/
macmon-0.7.2.tar.gz mactop-latest.tar.gz
rustc-1.92.0-src.tar.xz rustup-1.28.1.tar.gz
sqlite-amalgamation-3490100.zip sqlite-vec/
vec0.dylib yt-dlp/
mermaid-js-mermaid-cli-11.14.0.tgz LibreOffice_26.2.3_MacOS_aarch64.dmg
libreoffice-26.2.3.2.tar.xz librsvg/
GroundingDINO/ paligemma/
gitea/ erpnext/
frappe/ odoo/
python_probe_deps.txt
```
### ❌ 缺少需補
| Service | Expected Source URL |
|---------|-------------------|
| PHP 8.5.5 | `https://www.php.net/distributions/php-8.5.5.tar.gz` |
| MariaDB 12.2.2 | `https://github.com/MariaDB/server/archive/mariadb-12.2.2.tar.gz` |
| Node.js 25.9.0 | `https://nodejs.org/dist/v25.9.0/node-v25.9.0.tar.gz` |
| MongoDB 8.2.7 | `https://github.com/mongodb/mongo/archive/r8.2.7.tar.gz` |
| Ollama 0.23.1 | `https://github.com/ollama/ollama/archive/v0.23.1.tar.gz` |
---
## 7. Health Endpoint (`/health/detailed`)
| Field | Source | Covers |
|-------|--------|--------|
| `services.postgres` | direct check | PostgreSQL |
| `services.redis` | direct check | Redis |
| `services.qdrant` | direct check | Qdrant |
| `services.mongodb` | direct check | MongoDB |
| `pipeline.ffmpeg` | `which ffmpeg` | FFmpeg |
| `pipeline.llm` | HTTP check port 8082 | llama-server |
| `pipeline.embedding_server` | HTTP check port 11436 | EmbeddingGemma |
| `pipeline.rsync` | file check | RSync |
| `pipeline.scripts_integrity` | SHA256 vs manifest | Processor scripts(345) |
| `schema` | DB query | Schema migrations(9) |
| `processors` | file check | 12 processors |
| `resources` | system query | CPU/Mem/GPU |
### 未涵蓋的服務
- PHP / PHP-FPM
- MariaDB
- Node.js / npm
- Ollama
- SFTPGo (有 `/api/v1/stats/sftpgo` 但不在 health response)
- yt-dlp / whisper-cpp / tesseract
- WordPress
---
## 8. Migration Priority Matrix
```
Priority Service Binary Source Dev.Resources Plist Health
──────────────────────────────────────────────────────────────────────────────
P1:high PHP brew → build ❌ need ✅ recorded ❌ brew ❌ missing
P1:high MariaDB brew → build ❌ need ✅ recorded ❌ brew ❌ missing
P1:high Node.js brew → build ❌ need ❌ not in db ❌ brew ❌ missing
P2:med Redis brew → build ✅ have ❌ not in db ❌ brew ❌ missing
P2:med MongoDB brew → build ❌ need ❌ not in db ❌ brew ❌ missing
P2:med ffmpeg brew → build ✅ have ❌ not in db ❌ brew ✅ basic
P3:low Ollama brew → build ❌ need ❌ not in db ❌ brew ❌ missing
P3:low yt-dlp brew → archive ✅ have ❌ not in db ❌ brew ❌ missing
P3:low whisper brew → build ❌ need ❌ not in db ❌ brew ❌ missing
P3:low tesseract brew → build ❌ need ❌ not in db ❌ brew ❌ missing
```

View File

@@ -0,0 +1,129 @@
# Trace 與 Face 的綁定機制
**Date**: 2026-05-16
---
## 資料模型
```
face_detections table
├── id (PK, SERIAL)
├── file_uuid (VARCHAR(32))
├── trace_id (INTEGER) ← trace grouping
├── face_id (VARCHAR) ← 未使用 (always null)
├── identity_id (INTEGER) ← FK → identities.id
├── frame_number (INTEGER)
├── x, y, width, height (INTEGER)
└── confidence (FLOAT4)
identities table
├── id (PK, SERIAL)
├── uuid (UUID) ← identity_uuid (API 用)
├── name (TEXT)
└── source (VARCHAR)
```
## Trace vs Face
| Term | Scope | 說明 |
|------|-------|------|
| **face** | single detection | 單一影格中的一個人臉偵測 (`face_detections` 的一行) |
| **trace** | continuous tracking | 同一人的連續影格人臉集合 (`GROUP BY trace_id`) |
一個 trace 包含 N 個 face detections。
## 綁定方式
### 1. Trace-level 綁定 (Identity Agent)
Agent API (`identity_agent_api.rs:815`):
```sql
UPDATE face_detections
SET identity_id = ?
WHERE file_uuid = ?
AND trace_id = ?
```
同時將整個 trace 的所有 faces 綁定到一個 identity。
### 2. Identity Agent 自動匹配
Face Agent (`core/tmdb/face_agent.rs:198`):
```
1. 提取每個 trace 的 face embedding
2. 比對 embedding 與已知 identity
3. 計算 cosine similarity
4. 匹配成功 → UPDATE face_detections SET identity_id = ?
5. 低於 threshold → 保留為 unknown
```
### 3. Merge Identities (`POST /api/v1/identity/:from_uuid/mergeinto`)
```sql
UPDATE face_detections
SET identity_id = $target_id
WHERE identity_id = $source_id
```
將 source identity 的所有 face detections 重新指派到 target identity。
### 4. 解綁 (`POST /api/v1/identity/:identity_uuid/unbind`)
```sql
UPDATE face_detections
SET identity_id = NULL
WHERE file_uuid = ? AND trace_id = ?
```
## 查詢已綁定的 Traces
```sql
SELECT trace_id,
COUNT(*) AS face_count,
MIN(frame_number) AS start_frame,
MAX(frame_number) AS end_frame,
(SELECT name FROM identities WHERE id = fd.identity_id) AS identity_name
FROM face_detections fd
WHERE file_uuid = '...'
AND trace_id IS NOT NULL
AND identity_id IS NOT NULL
GROUP BY trace_id, identity_id;
```
## API 端點
| 方法 | 路徑 | 功能 |
|------|------|------|
| POST | `/api/v1/identity/:identity_uuid/bind` | 綁定 face 到 identity (by face_id) |
| POST | `/api/v1/identity/:identity_uuid/unbind` | 解綁 face (by trace_id) |
| POST | `/api/v1/identity/:from_uuid/mergeinto` | 合併 identity (重設所有 identity_id) |
| POST | `/api/v1/agents/identity/analyze` | Agent 自動匹配 embedding |
| POST | `/api/v1/agents/suggest/merge` | Agent 建議相似 identity 合併 |
## 流程圖
```
Video Input
Face Detection (每幀偵測人臉)
Face Tracker (embedding + IoU → 產生 trace_id)
├── face_detections 寫入 DB (含 trace_id, identity_id=NULL)
Identity Agent (比對 embedding)
├── 匹配成功 → UPDATE identity_id = matched
├── 匹配失敗 → identity_id 保持 NULL
API Bind (手動綁定)
├── 指定 trace_id + identity_uuid
└── UPDATE identity_id = target
```

View File

@@ -0,0 +1,98 @@
# Trace 結構說明
**Date**: 2026-05-16
---
## 定義
Trace 是 Face Tracker 輸出的一組連續人臉偵測結果,代表同一個人在畫面中的連續出現。每一條 trace 包含多個 `face_detections` 記錄,共享同一個 `trace_id`
## 結構
### 資料庫
```sql
-- face_detections 表
trace_id INTEGER -- 同檔案內順序編號 (0, 1, 2, ...)
file_uuid VARCHAR(32) -- 所屬檔案
face_id VARCHAR -- 單一偵測 ID (未使用)
identity_id INTEGER -- 綁定的身份 ID (FK → identities.id)
frame_number INTEGER -- 所在影格
x, y, w, h INTEGER -- Bounding box
confidence FLOAT4 -- 信心度
```
### scope
| 範圍 | 說明 |
|------|------|
| **Per-file** | `trace_id` 在同一個 `file_uuid` 內唯一,從 0 開始遞增 |
| **Global** | 不同檔案的相同 `trace_id` 沒有關聯,需搭配 `file_uuid` 使用 |
### API 端點
```bash
# Trace 列表(含 face_count、時間範圍、信心度
POST /api/v1/file/:file_uuid/face_trace/sortby
Body: {"sort_by": "face_count", "limit": 100}
# Trace 內的人臉偵測(含 interpolation
GET /api/v1/file/:file_uuid/trace/:trace_id/faces?page=1&page_size=200
# Trace 綁定身份 (trace→identity)
POST /api/v1/identity/:identity_uuid/bind
Body: {"file_uuid": "...", "trace_id": 5}
```
### Trace 與 Identity 的關係
```
trace_id (per-file seq)
├── identity_id (FK → identities.id)
│ │
│ └── identity_uuid (global UUID, 32碼無-)
└── face_detections (N rows per trace)
├── frame_number (哪一幀)
├── bbox (位置)
└── confidence (信心度)
```
### Trace 資料範例
```json
{
"trace_id": 2,
"face_count": 46,
"start_frame": 4587,
"end_frame": 4722,
"start_time": 191.38,
"end_time": 196.95,
"duration_sec": 5.57,
"avg_confidence": 0.85,
"sample_face_id": null
}
```
## 生命週期
```
Face Tracker (Python script)
→ 分析人臉 embedding + IoU + 距離
→ 產生 trace_id (同檔案內獨立 numbering)
→ 寫入 face_detections 表
→ Identity Agent 比對 embedding
→ 設定 identity_id (綁定身份)
```
## 特點
| 特性 | 說明 |
|------|------|
| **順序編號** | 無 UUID簡單整數每個檔案從 0 開始 |
| **唯一性** | `(file_uuid, trace_id)` 唯一,`trace_id` 單獨不唯一 |
| **補幀** | API 支援 `interpolate=true` 參數補齊中間幀 |
| **綁定** | 透過 Agent API 綁定到 `identity_uuid` |

View File

@@ -0,0 +1,15 @@
-- Migration: Add cut_id column for per-cut trace scoping
-- Date: 2026-05-16
-- Usage: psql -U accusys -d momentry -f migrate_add_cut_id.sql
-- Note: runs with current search_path
ALTER TABLE face_detections ADD COLUMN IF NOT EXISTS cut_id INTEGER;
-- Back-fill existing data where cuts exist
UPDATE face_detections fd
SET cut_id = c.id
FROM chunk c
WHERE c.file_uuid = fd.file_uuid
AND c.chunk_type = 'cut'
AND fd.frame_number BETWEEN c.start_frame AND c.end_frame
AND fd.cut_id IS NULL;

View File

@@ -0,0 +1,13 @@
-- Migration: Add stranger_id column for unmatched face traces
-- Unmatched traces get stranger_id = trace_id (per-file sequential integer)
-- Unique constraint: (file_uuid, stranger_id) WHERE stranger_id IS NOT NULL
-- Date: 2026-05-16
-- Usage: psql -U accusys -d momentry -f migrate_add_stranger_id.sql
ALTER TABLE face_detections ADD COLUMN IF NOT EXISTS stranger_id INTEGER;
CREATE UNIQUE INDEX IF NOT EXISTS idx_face_detections_stranger
ON face_detections (file_uuid, stranger_id) WHERE stranger_id IS NOT NULL;
-- Assign stranger_id = trace_id for existing unmatched traces
UPDATE face_detections SET stranger_id = trace_id
WHERE trace_id IS NOT NULL AND identity_id IS NULL AND stranger_id IS NULL;

View File

@@ -0,0 +1,38 @@
-- Migration: Create independent cuts table
-- Cut = 同鏡頭連續拍攝的一組 frame
-- Date: 2026-05-16
-- Usage: psql -U accusys -d momentry -f migrate_create_cuts_table.sql
CREATE TABLE IF NOT EXISTS cuts (
id SERIAL PRIMARY KEY,
file_uuid VARCHAR(32) NOT NULL,
cut_number INTEGER NOT NULL,
start_frame BIGINT NOT NULL,
end_frame BIGINT NOT NULL,
start_time DOUBLE PRECISION,
end_time DOUBLE PRECISION,
fps DOUBLE PRECISION,
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(file_uuid, cut_number)
);
-- Migrate from chunk (chunk_type='cut')
INSERT INTO cuts (file_uuid, cut_number, start_frame, end_frame, start_time, end_time, fps, metadata)
SELECT c.file_uuid,
(c.content->>'scene_number')::int as cut_number,
c.start_frame, c.end_frame,
c.start_time, c.end_time,
c.fps,
c.metadata
FROM chunk c
WHERE c.chunk_type = 'cut'
ON CONFLICT (file_uuid, cut_number) DO NOTHING;
-- Update face_detections.cut_id to reference cuts.id
UPDATE face_detections fd
SET cut_id = cs.id
FROM cuts cs
WHERE cs.file_uuid = fd.file_uuid
AND fd.frame_number BETWEEN cs.start_frame AND cs.end_frame
AND (fd.cut_id IS NULL OR fd.cut_id != cs.id);