feat: Phase 1 handover - schema migration, correction mechanism, API fixes

Schema changes: dev.chunks->dev.chunk, remove old_chunk_id/chunk_index
Correction: asr-1.json format, generate/apply scripts
API: 37/37 endpoints fixed and tested
Docs: HANDOVER_V2.0.md for M4
Accusys
2026-05-11 07:03:22 +08:00
parent ef894a44ad
commit 39ba5ddf76
147 changed files with 19843 additions and 3053 deletions


@@ -1,6 +1,6 @@
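# Visual Speaker Diarization: Technology Selection Evaluation Report
**Date**: 2026-05-07
**Date**: 2026-05-07 (initial version), 2026-05-09 (8Hz measurement)
**Author**: M5
**Purpose**: Evaluate technical approaches for identifying who is speaking from visual cues (lip shape)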
@@ -319,3 +319,87 @@ else:
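| MediaPipe 478-point 3D landmarks | More precise lip shape + head orientation | Install MediaPipe (~30 min) |
| Per-trace lip motion history | Track lip changes across the whole utterance, not only at ASR onset | Already feasible |
| Full VSP-LLM deployment | Who + what is said | Requires LLaMA2 license + AV-HuBERT |
---
## 6. 8Hz Measurement (2026-05-09)
### 6.1 Test Objective
Verify whether Apple Vision (ANE) with `sample_interval=3` (8Hz) is feasible for lip motion analysis.
### 6.2 Test Parameters
| Item | Value |
|------|------|
| Video | Charade (1963), first 10 minutes |
| Resolution | 1920×1080 |
| FPS | 25 |
| Test duration | 600s (0~600s) |
| Total frames | 15,000 |
| sample_interval | 3 (8Hz ≈ one sampled frame every ~0.12s) |
| Frames processed | ~5,000 |
| Face analysis | Apple Vision (ANE) + CoreML FaceNet |
### 6.3 Test Procedure
```
1. Run face_processor.py with interval=3 on the first 10 minutes
   → outputs {uuid}.face_test.json
2. Extract outer_lips from face_test.json → compute lip_openness
   lip_openness = max(outer_lips.y) - min(outer_lips.y)
3. Read asrx.json speaker segments → match by time overlap
4. For each ASR segment, compute the ratio of speaking frames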
```
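Steps 2 to 4 can be sketched roughly as follows. This is a minimal illustration, not the actual `lip_analyzer.py`: the frame layout (`t` in seconds, `outer_lips` as a list of `(x, y)` points) and the `threshold` value are assumptions made for the example.

```python
def lip_openness(outer_lips):
    """Vertical extent of the outer-lip contour: max(y) - min(y)."""
    ys = [y for _, y in outer_lips]
    return max(ys) - min(ys) if ys else 0.0


def speaking_ratio(frames, seg_start, seg_end, threshold):
    """Fraction of sampled frames inside [seg_start, seg_end] whose
    lip openness exceeds `threshold` (threshold is a hypothetical value)."""
    inside = [f for f in frames if seg_start <= f["t"] <= seg_end]
    if not inside:
        return 0.0
    speaking = sum(1 for f in inside if lip_openness(f["outer_lips"]) > threshold)
    return speaking / len(inside)


# Four frames sampled at ~8Hz (assumed layout, normalized coordinates).
frames = [
    {"t": 0.00, "outer_lips": [(0.40, 0.60), (0.50, 0.62), (0.60, 0.60)]},
    {"t": 0.12, "outer_lips": [(0.40, 0.58), (0.50, 0.66), (0.60, 0.58)]},
    {"t": 0.24, "outer_lips": [(0.40, 0.57), (0.50, 0.67), (0.60, 0.57)]},
    {"t": 0.36, "outer_lips": [(0.40, 0.60), (0.50, 0.61), (0.60, 0.60)]},
]
ratio = speaking_ratio(frames, 0.0, 0.36, threshold=0.05)
print(ratio)
```

Only the two middle frames open wider than the assumed threshold, so half of the frames in this segment count as speaking frames.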
### 6.4 Execution
```bash
# Create a separate test directory
mkdir -p output_dev/lip_test
# Run face detection @ 8Hz (first 600s only)
python3 scripts/face_processor.py \
  "var/sftpgo/data/demo/Charade (1963).mp4" \
  output_dev/lip_test/aeed71342a899fe4b4c57b7d41bcb692.face_test.json \
  --uuid aeed71342a899fe4b4c57b7d41bcb692 \
  --sample-interval 3 \
  --max-frames 15000
# Compute lip openness and compare against ASRX
python3 scripts/lip_analyzer.py \
  --face output_dev/lip_test/aeed71342a899fe4b4c57b7d41bcb692.face_test.json \
  --asrx output_dev/aeed71342a899fe4b4c57b7d41bcb692.asrx.json \
  --output output_dev/lip_test/aeed71342a899fe4b4c57b7d41bcb692.lip_test.json
```
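### 6.5 Results
> Test run at 2026-05-09 19:14.
| Item | Result |
|------|------|
| Processing time (Vision ANE) | **37 s** |
| Processing time (CoreML ANE) | **356 s** (~6 min) |
| Frames processed | 2,734 (sample_interval=3, ~8Hz) |
| Frames with a detected face | 2,734 (100%) |
| Frames with valid outer_lips | 2,734 (**100%**) |
| ASRX segments (0-600s) | 114 |
| Segments with face data | 112 (**98%**) |
| Segments with determinable lip motion | 55 (**49%** of face-present) |
**Key findings:**
- Apple Vision ANE is very fast at interval=3 (37 s for 10 minutes of video), but the CoreML embedding is the bottleneck (356 s) because FaceNet runs once for every face
- outer_lips coverage is 100%: whenever a face is present, lips data is available
- 98% of ASR segments have corresponding face data (only 2% are off-screen voice)
- 49% of segments show clear lip motion (>5% threshold), a large improvement over the previous 26%
- 8Hz continuous sampling makes the baseline/during comparison feasible; at the earlier sample_interval=30 it could not be computed reliably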
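One possible reading of the baseline/during comparison is sketched below. The report does not define how the 5% threshold is applied, so treating it as a relative increase of mean lip openness over a pre-speech baseline is an assumption, as are the function and parameter names.

```python
from statistics import mean


def has_lip_motion(baseline, during, rel_threshold=0.05):
    """Return True when the mean lip openness during an ASR segment exceeds
    the mean openness of the baseline window by more than `rel_threshold`
    (interpreted here as a relative increase; this is an assumption)."""
    if not baseline or not during:
        return False  # not enough samples to compare
    base = mean(baseline)
    if base <= 0:
        return mean(during) > 0
    return (mean(during) - base) / base > rel_threshold


# Openness samples taken just before the segment vs. inside it (made-up values).
talking = has_lip_motion([0.020, 0.022, 0.021], [0.030, 0.034, 0.026])
silent = has_lip_motion([0.020, 0.022, 0.021], [0.021, 0.0212, 0.0208])
print(talking, silent)
```

With sparse sampling, either window can easily contain zero or one sample, which is consistent with the report's observation that the comparison only becomes reliable at 8Hz.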
**Improvement over the original test (sample_interval=30):**
| Metric | interval=30 | interval=3 (8Hz) |
|------|-------------|-------------------|
| Samples per second | ~0.8 | **~8** |
| Lip-analyzable frames | Sparse, no continuity | **Continuous, baseline computable** |
| Speaker determinable | ~26% | **~49%** |


@@ -0,0 +1,87 @@
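# Scene Classification Gap Analysis
## Current State
Places365 (ResNet18, CoreML ANE) has been deprecated: on Charade it detected only one scene class ("door"), which has no practical value.
## Gap
The CUT processor produces 1130 scene boundaries, but no metadata describes the nature of each scene:
- Indoor/outdoor?
- Day/night?
- Static dialogue/action scene?
- Close-up/long shot?
- Mood (tense/relaxed)?
## Comparison of Options
### A. 5W1H+ prompt extension (fastest)
Add scene classification to the current 5W1H+ prompt so the LLM outputs it directly.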
```json
{
"scene_summary": "...",
"scene_type": "dialogue_interior",
"setting": "restaurant",
"lighting": "low_key",
"mood": "tense",
"shot_scale": "medium",
...
}
```
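| Aspect | Assessment |
|------|------|
| Development effort | 🟢 Only a prompt change |
| Accuracy | ⚠️ Depends on the LLM's understanding of the scene |
| Cost | 🟢 No additional LLM call (already part of 5W1H+) |
| Extensibility | ✅ Classification dimensions can be added freely |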
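Because accuracy depends on the LLM, the new fields would likely need validation before being stored. The sketch below shows one way to do that; the required-field list follows the JSON example above, while the allowed vocabularies and the `"unknown"` fallback are hypothetical and not part of the original design.

```python
import json

# Fields taken from the JSON example above.
REQUIRED = ("scene_summary", "scene_type", "setting", "lighting", "mood", "shot_scale")

# Hypothetical closed vocabularies; the real value sets are not specified.
ALLOWED = {
    "lighting": {"low_key", "high_key", "natural"},
    "shot_scale": {"close", "medium", "wide"},
}


def parse_scene_fields(raw):
    """Parse the LLM's JSON answer, reject incomplete answers, and replace
    out-of-vocabulary values with "unknown" instead of storing free text."""
    data = json.loads(raw)
    missing = [key for key in REQUIRED if key not in data]
    if missing:
        raise ValueError(f"missing scene fields: {missing}")
    for key, allowed in ALLOWED.items():
        if data[key] not in allowed:
            data[key] = "unknown"
    return data


raw = json.dumps({
    "scene_summary": "Two people talk over dinner.",
    "scene_type": "dialogue_interior",
    "setting": "restaurant",
    "lighting": "moody",       # not in the assumed vocabulary
    "mood": "tense",
    "shot_scale": "medium",
})
scene = parse_scene_fields(raw)
print(scene["lighting"], scene["shot_scale"])
```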
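### B. ffmpeg physical features (M4 experimental direction)
Use ffmpeg's built-in filters to extract signals for each scene:
| Feature | ffmpeg filter | Can infer |
|------|-------------|--------|
| Mean Y luma | signalstats | Day/night/indoor |
| Motion | flow/mestimate | Action/static |
| Volume | volumedetect | Quiet/loud |
| Dialogue/silence | silencedetect | Dialogue/transition |
| Color | signalstats U/V | Color tone |
| Aspect | Assessment |
|------|------|
| Development effort | 🟡 Requires implementing scene-level batch analysis |
| Accuracy | ✅ Objective data |
| Cost | 🟢 Built into ffmpeg |
| Limitation | ❌ Cannot distinguish scene types (restaurant/office/street) |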
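As a rough sketch of the luma row in the table, the helper below builds an ffmpeg command that prints `signalstats` metadata and parses the `YAVG` values from its output. The command is only constructed, not executed, and should be checked against the installed ffmpeg version; the brightness cut-offs are hypothetical and would need tuning on real footage.

```python
import re
from statistics import mean

# signalstats exposes the per-frame mean luma as lavfi.signalstats.YAVG.
YAVG_RE = re.compile(r"lavfi\.signalstats\.YAVG=([0-9]+(?:\.[0-9]+)?)")


def signalstats_cmd(video_path, start_s, duration_s):
    """Build an ffmpeg command that writes per-frame metadata to stdout
    for one scene (assumed invocation; verify before relying on it)."""
    return [
        "ffmpeg", "-hide_banner", "-nostats",
        "-ss", str(start_s), "-t", str(duration_s),
        "-i", video_path,
        "-vf", "signalstats,metadata=mode=print:file=-",
        "-an", "-f", "null", "-",
    ]


def mean_luma(metadata_text):
    """Average YAVG over all frames found in the printed metadata."""
    values = [float(v) for v in YAVG_RE.findall(metadata_text)]
    return mean(values) if values else None


def brightness_label(yavg, dark_below=60.0, bright_above=130.0):
    """Map mean luma to a coarse label (thresholds are hypothetical)."""
    if yavg < dark_below:
        return "dark"
    if yavg > bright_above:
        return "bright"
    return "mid"


sample = (
    "frame:0 pts:0 pts_time:0\n"
    "lavfi.signalstats.YAVG=40.5\n"
    "frame:1 pts:1 pts_time:0.04\n"
    "lavfi.signalstats.YAVG=59.5\n"
)
luma = mean_luma(sample)
print(luma, brightness_label(luma))
```

In practice the command's stdout would be captured with `subprocess.run(..., capture_output=True, text=True)` and passed to `mean_luma`, once per scene boundary from the CUT processor.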
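### C. YOLO object statistics
Analyze the object distribution of each scene from the existing YOLO pre_chunks:
| Objects | Inferred scene |
|------|---------|
| car, truck, traffic light | Street/outdoor |
| bed, sofa, TV | Indoor/home |
| dining table, bottle, wine glass | Restaurant/bar |
| person × 1 | Monologue/close-up |
| person × 3+ | Ensemble scene |
| Aspect | Assessment |
|------|------|
| Development effort | 🟢 Only a pre_chunks query |
| Accuracy | ⚠️ Object level only |
| Cost | 🟢 Already available |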
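The mapping table can be turned into a small rule lookup, sketched below. The input shape (one list of class labels per sampled frame) is an assumption about how pre_chunks would be read, and the label spellings follow the table rather than any specific YOLO label file.

```python
from collections import Counter

# Rules copied from the table above; label spellings are assumptions.
SCENE_RULES = [
    ("street/outdoor", {"car", "truck", "traffic light"}),
    ("indoor/home", {"bed", "sofa", "tv"}),
    ("restaurant/bar", {"dining table", "bottle", "wine glass"}),
]


def scene_hints(frames):
    """Derive coarse scene hints from per-frame object labels.

    `frames` is a list of label lists, one per sampled frame in the scene.
    """
    counts = Counter(label for frame in frames for label in frame)
    hints = [name for name, labels in SCENE_RULES
             if any(counts[label] > 0 for label in labels)]
    # Use the largest number of people visible in any single frame.
    max_people = max((frame.count("person") for frame in frames), default=0)
    if max_people == 1:
        hints.append("monologue/close-up")
    elif max_people >= 3:
        hints.append("ensemble")
    return hints


frames = [
    ["person", "dining table", "wine glass"],
    ["person", "person", "person", "bottle"],
]
hints = scene_hints(frames)
print(hints)
```

A scene can match several rules at once, so the result is a list of hints rather than a single class.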
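## Recommendation: three layers, A + B + C
| Layer | Method | Output | Priority |
|------|------|------|--------|
| 1 | 5W1H+ prompt extension (A) | Scene type, setting, mood | 🥇 Immediate |
| 2 | YOLO object statistics (C) | Object distribution, people count | 🥈 Short term |
| 3 | ffmpeg physical features (B) | Brightness, motion, volume curves | 🥉 Mid term |
Layer 1 is the simplest: 5W1H+ already calls the LLM once per scene, so adding a few more JSON fields costs nothing extra.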


@@ -0,0 +1,240 @@
# Momentry Model: Phased Delivery
## Core Architecture
```
Pipeline (training)
│ Each processor produces a .json
│ Rule 1/3 Ingestion → chunks + embeddings
momentry model for {video} ← one model per video
│ release/phase1/latest/
│ release/phase2/latest/
momentry core (inference engine) ← Rust API server
│ momentry_playground (dev)
│ momentry (production)
Search / Query / Identity APIs
```
- **Pipeline** = training phase (video → processor output → chunks → embeddings)
- **Model** = the output package for each video (output_json + chunks + vectors)
- **Engine** = momentry core, which consumes a model and serves the APIs (search, trace, identity)
Each video can have multiple model versions; the naming leaves room for upgrades:
| Model version | Qdrant Collection | Contents | Trigger |
|-----------|------------------|------|---------|
| `{uuid}_v1` | `momentry_dev_v1` | sentence chunk embedding (base) | ASR + ASRX + Rule 1 complete |
| `{uuid}_v2` | `momentry_dev_v2` | Full pipeline + 5W1H | Everything complete |
| `{uuid}_v3` | `momentry_dev_v3` | object identity + custom detector | v2 + object instance matching complete |
Versions coexist and never overwrite one another.
## Phases
### Phase 1: Sentence Chunk Embedding (base model)
**Trigger**: ASR + ASRX complete + Rule 1 Ingestion + vectorize complete
**Deliverables**:
- `{uuid}.asr.json`
- `{uuid}.asrx.json`
- chunks (chunk_type = 'sentence')
- chunk_vectors (sentence embedding)
**Purpose**: End users can run semantic search
### Phase 2: Full Pipeline (v2 model)
**Trigger**: All processors complete + Rule 3 Ingestion + 5W1H Agent
**Deliverables**:
- Everything from Phase 1
- All `{uuid}.*.json` (cut, yolo, face, pose, ocr, ...)
- chunks (chunk_type = 'cut', 'visual', 'trace', 'story')
- chunk_vectors (summary embedding)
- identities / identity_bindings / face_detections
**Purpose**: Full search + summaries + person identification
---
## Worker Pipeline
```
ASR complete → ASRX complete
Rule 1 Ingestion (sentence chunks)
vectorize_chunks (sentence embedding)
📦 Phase 1 release ───→ release/phase1/latest/ (base model)
Other processors continue (yolo, face, pose, ocr, ...)
Rule 3 Ingestion + 5W1H Agent
📦 Phase 2 release ───→ release/phase2/latest/ (full model)
```
## Output Directory Structure
```
release/
├── phase1/
│   ├── {version}_{timestamp}/
│   │   ├── output_json/       ← all completed .json files
│   │   ├── chunks.csv         ← sentence chunks
│   │   ├── vectors.csv        ← sentence embeddings
│   │   ├── schema.sql         ← chunks table DDL
│   │   └── RELEASE_INFO.txt
│   └── latest → {version}_{timestamp}
└── phase2/
    ├── {version}_{timestamp}/
    │   ├── output_json/       ← all .json files
    │   ├── chunks.csv         ← all chunks
    │   ├── vectors.csv        ← all embeddings
    │   ├── identities.csv     ← person identities
    │   ├── schema.sql         ← full schema
    │   └── RELEASE_INFO.txt
    └── latest → {version}_{timestamp}
```
## momentry model vs momentry core
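| | momentry model | momentry core |
|---|---|---|
| Analogy | Trained weights | inference engine |
| Contents | `.json` + chunks + vectors | Rust binary |
| Lifecycle | One produced per video | One binary serves all videos |
| Version | `{uuid}_v1` (base) / `{uuid}_v2` / `{uuid}_v3` | `momentry_playground` / `momentry` |
| Delivered to | End users | Deployment engineers |
---
## Wiki Mechanism: Every Model Can Be Adjusted
Each momentry model (`{uuid}_v1` / `v2` / `v3`) is not just a read-only artifact; it can be improved continuously through the wiki mechanism.
### Differences from Traditional RAG
| | Traditional RAG | momentry wiki |
|---|---|---|
| Knowledge storage | vector DB (ephemeral) | model package (permanent) |
| How corrections are made | The LLM decides at query time whether to use them | Users/Agents edit directly |
| Persistence of corrections | ❌ Gone at the next query | ✅ Written into the model and kept under version control |
| Model improvement | None (only the prompt changes) | Merged as ground truth at the next version bump |
| Collaboration | One-way (retrieve → generate) | Two-way (edit → merge → improve) |
| Offline use | ❌ Requires vector DB + LLM | ✅ The wiki directory can be read offline |
**momentry wiki is not a replacement for RAG; it is the lifecycle management mechanism for a model.**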
### Concept
```
momentry model (release package)
├── output_json/          ← read-only, processor output
├── chunks.csv            ← read-only, ingestion output
├── vectors.csv           ← read-only, embedding output
└── wiki/                 ← editable, knowledge contributed by users
    ├── identities.json   ← "trace 5 = Audrey Hepburn"
    ├── objects.json      ← "object 42 = stamp #1"
    ├── corrections.json  ← "ASR 'Hello' → 'Halo'"
    └── changelog.json    ← edit history
```
### Data Flow
```
User/Agent edits the wiki
Write to DB wiki_entries + wiki_revisions
Merged into the model at the next release packaging
TKG label updated (tkg_nodes.label)
New model version bump
```
### Relationship to the TKG
Identity and object annotations from the wiki are written back to the TKG node label:
```
(face_trace:5) label="Audrey Hepburn" ← wiki edit
(object_instance:42) label="stamp #1" ← wiki edit
```
Once accumulated, these edits can serve as ground truth for training the next model version.
### Implementation Direction
**DB layer**: new tables `wiki_entries` + `wiki_revisions`:
```sql
wiki_entries (target_type, target_id, title, body, summary, status, version, file_uuid)
wiki_revisions (entry_id, version, title, body, summary, change_summary, edited_by)
```
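The entry/revision split above can be illustrated with a small in-memory sketch. The column names come from the two table outlines; everything else is an assumption, in particular the choice to snapshot the previous state into a revision before each edit and then bump `version`.

```python
from dataclasses import dataclass, field


@dataclass
class WikiEntry:
    # Subset of the wiki_entries columns listed above.
    target_type: str
    target_id: str
    title: str
    body: str
    version: int = 1
    revisions: list = field(default_factory=list)  # stands in for wiki_revisions


def edit_entry(entry, new_body, edited_by, change_summary):
    """Snapshot the current state as a revision, then apply the edit.

    Storing the pre-edit state is an assumed policy; the source does not say
    whether a revision holds the old or the new content.
    """
    entry.revisions.append({
        "version": entry.version,
        "title": entry.title,
        "body": entry.body,
        "change_summary": change_summary,
        "edited_by": edited_by,
    })
    entry.body = new_body
    entry.version += 1
    return entry


entry = WikiEntry("face_trace", "5", "trace 5", "unknown person")
edit_entry(entry, "Audrey Hepburn", edited_by="user_1",
           change_summary="label identity")
print(entry.version, entry.body, entry.revisions[0]["body"])
```

Keeping every earlier state makes it possible to export the full edit history into `changelog.json` when a release is packaged.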
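**API layer**: CRUD + revision history:
```
GET /api/v1/wiki/{target_type}/{target_id}
PUT /api/v1/wiki/{target_type}/{target_id}
GET /api/v1/wiki/{target_type}/{target_id}/revisions
POST /api/v1/wiki/search
```
**Packaging layer**: add wiki export to `release_pack.py` so the wiki ships alongside the model
---
## Phase 3: Object Identity (v3 model)
### Goal
Extract key objects from the video (stamps, pistols, envelopes, magnifying glasses, ...) and perform instance-level cross-frame tracking and recognition for objects of the same class, achieving an effect similar to face trace: not just detecting the class, but distinguishing "this stamp" from "that stamp".
### Current Problems
1. **COCO's 80 classes do not include the key objects**: stamps, pistols, envelopes, magnifying glasses, and the like are not in the COCO dataset
2. **YOLOv5 nano has a low detection rate**: even for COCO classes (knife, cell phone), the nano model's recall is insufficient
3. **No object instance matching**: there is currently only frame-level detection, with no cross-frame object tracking
### Technical Direction
```
YOLOv8m/OWL-ViT → improve detection coverage
Object Tracker (IoU + embedding, similar to the face tracker)
object_trace → TKG CO_OCCURS_WITH edges
object identity → recognize the same object across scenes
```
| Direction | Method | Effect |
|------|------|------|
| Model upgrade | `yolov5nu` → `yolov8s.pt` / `yolov8m.pt` | Higher COCO recall |
| Custom fine-tune | Collect stamps/guns data to fine-tune YOLO | Can detect non-COCO objects |
| Zero-shot | OWL-ViT / Grounding DINO by text prompt | No training needed, but slow |
| Object trace | IoU + embedding matching across frames | Instance-level tracking |
| Object identity | Clustering to recognize the same object across scenes | Can search the whole film for "this gun" |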
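The "IoU + embedding" matching in the object trace row could look roughly like the sketch below. It is not the project's face tracker: the equal weighting of IoU and cosine similarity, the greedy assignment, and the `min_score` cut-off are all assumptions chosen for illustration.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity of two embedding vectors."""
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / norm if norm > 0 else 0.0


def match_score(box_a, emb_a, box_b, emb_b, w_iou=0.5):
    """Blend spatial overlap and appearance similarity (weights assumed)."""
    return w_iou * iou(box_a, box_b) + (1.0 - w_iou) * cosine(emb_a, emb_b)


def assign(tracks, detections, min_score=0.5):
    """Greedy matching. `tracks` maps track_id -> (box, embedding);
    `detections` is a list of (box, embedding).
    Returns {detection_index: track_id}; unmatched detections are omitted."""
    pairs = []
    for tid, (tbox, temb) in tracks.items():
        for di, (dbox, demb) in enumerate(detections):
            pairs.append((match_score(tbox, temb, dbox, demb), tid, di))
    pairs.sort(key=lambda p: p[0], reverse=True)
    used_tracks, result = set(), {}
    for score, tid, di in pairs:
        if score < min_score:
            break
        if tid in used_tracks or di in result:
            continue
        result[di] = tid
        used_tracks.add(tid)
    return result


tracks = {1: ((0, 0, 10, 10), (1.0, 0.0)), 2: ((50, 50, 10, 10), (0.0, 1.0))}
detections = [((51, 50, 10, 10), (0.0, 1.0)), ((1, 0, 10, 10), (1.0, 0.0))]
matches = assign(tracks, detections)
print(matches)
```

Detections left unmatched would start new object traces; an optimal assignment (for example the Hungarian algorithm) could replace the greedy loop if crowded frames cause identity swaps.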
### TKG Integration
```
face_trace -[:CO_OCCURS_WITH]-> object_instance:5 (this gun)
face_trace -[:CO_OCCURS_WITH]-> object_instance:42 (this stamp)
Query: "frames where Audrey Hepburn holds this gun"
→ face_trace:5 -[:SPEAKS_AS]-> SPEAKER_0
→ face_trace:5 -[:CO_OCCURS_WITH]-> object_instance:5
```
### Delivery Order
1. YOLO model upgrade (low difficulty, immediate payoff)
2. Object tracker (medium difficulty; reference the face tracker implementation)
3. Custom fine-tune / zero-shot (high difficulty; requires data or a new model)


@@ -0,0 +1,244 @@
diff --git a/src/core/chunk/mod.rs b/src/core/chunk/mod.rs
index 14226fd..75e4d80 100644
--- a/src/core/chunk/mod.rs
+++ b/src/core/chunk/mod.rs
@@ -1,9 +1,11 @@
pub mod rule1_ingest;
pub mod rule3_ingest;
pub mod splitter;
+pub mod trace_ingest;
pub mod types;
pub use rule1_ingest::execute_rule1;
pub use rule3_ingest::ingest_rule3;
+pub use trace_ingest::ingest_traces;
pub use splitter::{AsrSegment, ChunkSplitter};
pub use types::{Chunk, ChunkType};
diff --git a/src/core/chunk/trace_ingest.rs b/src/core/chunk/trace_ingest.rs
new file mode 100644
index 0000000..3821cc7
--- /dev/null
+++ b/src/core/chunk/trace_ingest.rs
@@ -0,0 +1,222 @@
+use crate::core::chunk::types::{Chunk, ChunkRule, ChunkType};
+use crate::core::db::schema;
+use crate::core::db::PostgresDb;
+use anyhow::{Context, Result};
+use sqlx::Row;
+use tracing::{error, info};
+
+pub async fn ingest_traces(db: &PostgresDb, file_uuid: &str) -> Result<usize> {
+ let pool = db.pool();
+ let face_table = schema::table_name("face_detections");
+ let pre_table = schema::table_name("pre_chunks");
+
+ let video = db
+ .get_video_by_uuid(file_uuid)
+ .await?
+ .context("Video not found")?;
+ let file_id = video.id as i32;
+ let fps = video.fps;
+
+ let traces = sqlx::query_as::<_, TraceAgg>(&format!(
+ r#"
+ SELECT trace_id,
+ MIN(frame_number) AS first_frame,
+ MAX(frame_number) AS last_frame,
+ MIN(timestamp_secs) AS first_time,
+ MAX(timestamp_secs) AS last_time,
+ COUNT(*) AS face_count,
+ AVG(x)::float8 AS avg_x,
+ AVG(y)::float8 AS avg_y,
+ AVG(width)::float8 AS avg_w,
+ AVG(height)::float8 AS avg_h
+ FROM {}
+ WHERE file_uuid = $1 AND trace_id IS NOT NULL
+ GROUP BY trace_id
+ ORDER BY trace_id
+ "#,
+ face_table
+ ))
+ .bind(file_uuid)
+ .fetch_all(pool)
+ .await?;
+
+ if traces.is_empty() {
+ info!("No traces found for {}", file_uuid);
+ return Ok(0);
+ }
+
+ let asr_segments = sqlx::query_as::<_, AsrSegment>(&format!(
+ r#"
+ SELECT start_frame, end_frame, start_time, end_time, data
+ FROM {}
+ WHERE file_uuid = $1 AND processor_type = 'asr'
+ ORDER BY start_frame
+ "#,
+ pre_table
+ ))
+ .bind(file_uuid)
+ .fetch_all(pool)
+ .await?;
+
+ // Compute pairwise overlaps between traces
+ let overlaps = compute_overlaps(&traces);
+
+ let mut count = 0;
+ for trace in &traces {
+ let text = collect_overlapping_text(&asr_segments, trace.first_time, trace.last_time);
+
+ let bbox = serde_json::json!({
+ "x": trace.avg_x,
+ "y": trace.avg_y,
+ "width": trace.avg_w,
+ "height": trace.avg_h,
+ });
+
+ // Other traces that share frames with this trace
+ let co_appearances: Vec<serde_json::Value> = overlaps
+ .iter()
+ .filter(|o| o.trace_id == trace.trace_id)
+ .map(|o| {
+ serde_json::json!({
+ "trace_id": o.other_trace_id,
+ "overlap_frames": o.overlap_frames,
+ "overlap_secs": (o.overlap_frames as f64 / fps * 100.0).round() / 100.0,
+ })
+ })
+ .collect();
+
+ let metadata = serde_json::json!({
+ "trace_id": trace.trace_id,
+ "face_count": trace.face_count,
+ "bbox": bbox,
+ "co_appearances": co_appearances,
+ });
+
+ let chunk = Chunk::new(
+ file_id,
+ file_uuid.to_string(),
+ (count + 1) as u32,
+ ChunkType::Trace,
+ ChunkRule::Rule1,
+ trace.first_frame as i64,
+ trace.last_frame as i64,
+ fps,
+ metadata.clone(),
+ )
+ .with_text_content(text)
+ .with_metadata(metadata)
+ .with_frame_count(trace.face_count as i32);
+
+ if let Err(e) = db.store_chunk(&chunk).await {
+ error!("Failed to store trace chunk {}: {}", trace.trace_id, e);
+ } else {
+ let preview = chunk.text_content.as_deref().unwrap_or("").chars().take(60).collect::<String>();
+ let co = chunk.metadata.as_ref()
+ .and_then(|m| m.get("co_appearances"))
+ .and_then(|c| c.as_array())
+ .map(|a| a.len())
+ .unwrap_or(0);
+ info!(
+ "Trace chunk {}: trace_id={} frames={}-{} faces={} co_appear={} text={}",
+ chunk.chunk_id, trace.trace_id,
+ trace.first_frame, trace.last_frame,
+ trace.face_count, co, preview,
+ );
+ count += 1;
+ }
+ }
+
+ info!("Ingested {} trace chunks for {}", count, file_uuid);
+ Ok(count)
+}
+
+/// Number of temporally overlapping frames between one pair of traces
+struct TraceOverlap {
+ trace_id: i32,
+ other_trace_id: i32,
+ overlap_frames: i64,
+}
+
+fn compute_overlaps(traces: &[TraceAgg]) -> Vec<TraceOverlap> {
+ let mut result = Vec::new();
+ for (i, a) in traces.iter().enumerate() {
+ for b in traces.iter().skip(i + 1) {
+ let overlap_start = a.first_frame.max(b.first_frame);
+ let overlap_end = a.last_frame.min(b.last_frame);
+ let frames = overlap_end - overlap_start;
+ if frames > 0 {
+ result.push(TraceOverlap {
+ trace_id: a.trace_id,
+ other_trace_id: b.trace_id,
+ overlap_frames: frames,
+ });
+ result.push(TraceOverlap {
+ trace_id: b.trace_id,
+ other_trace_id: a.trace_id,
+ overlap_frames: frames,
+ });
+ }
+ }
+ }
+ result
+}
+
+fn collect_overlapping_text(segments: &[AsrSegment], start_time: f64, end_time: f64) -> String {
+ let mut texts: Vec<&str> = Vec::new();
+ for seg in segments {
+ if seg.end_time >= start_time && seg.start_time <= end_time {
+ if let Some(t) = seg.text() {
+ texts.push(t);
+ }
+ }
+ }
+ texts.join(" ")
+}
+
+#[derive(Debug, sqlx::FromRow)]
+struct TraceAgg {
+ trace_id: i32,
+ first_frame: i64,
+ last_frame: i64,
+ first_time: f64,
+ last_time: f64,
+ face_count: i64,
+ avg_x: f64,
+ avg_y: f64,
+ avg_w: f64,
+ avg_h: f64,
+}
+
+struct AsrSegment {
+ start_frame: i64,
+ end_frame: i64,
+ start_time: f64,
+ end_time: f64,
+ data: serde_json::Value,
+}
+
+impl<'r> sqlx::FromRow<'r, sqlx::postgres::PgRow> for AsrSegment {
+ fn from_row(row: &'r sqlx::postgres::PgRow) -> Result<Self, sqlx::Error> {
+ Ok(Self {
+ start_frame: row.try_get("start_frame")?,
+ end_frame: row.try_get("end_frame")?,
+ start_time: row.try_get("start_time")?,
+ end_time: row.try_get("end_time")?,
+ data: row.try_get("data")?,
+ })
+ }
+}
+
+impl AsrSegment {
+ fn text(&self) -> Option<&str> {
+ self.data
+ .get("text")
+ .and_then(|v| v.as_str())
+ .or_else(|| {
+ self.data
+ .get("data")
+ .and_then(|d| d.get("text"))
+ .and_then(|v| v.as_str())
+ })
+ }
+}
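
The pairwise overlap logic in `compute_overlaps` above can be restated in a few lines of Python for quick experimentation. This is only a sketch that mirrors the Rust arithmetic; the tuple layout `(trace_id, first_frame, last_frame)` is an assumption. Note that, as in the Rust code, two traces that share exactly one frame produce an overlap of 0 and are skipped.

```python
def compute_overlaps(traces):
    """traces: list of (trace_id, first_frame, last_frame).
    Returns (trace_id, other_trace_id, overlap_frames) in both directions."""
    result = []
    for i, (tid_a, first_a, last_a) in enumerate(traces):
        for tid_b, first_b, last_b in traces[i + 1:]:
            frames = min(last_a, last_b) - max(first_a, first_b)
            if frames > 0:
                result.append((tid_a, tid_b, frames))
                result.append((tid_b, tid_a, frames))
    return result


def overlap_secs(frames, fps):
    """Same rounding as the Rust code: seconds with two decimals."""
    return round(frames / fps * 100.0) / 100.0


traces = [(1, 0, 100), (2, 50, 150), (3, 200, 300)]
overlaps = compute_overlaps(traces)
print(overlaps, overlap_secs(50, 25.0))
```

Because the spans are built from the first and last frame of each trace, a trace with gaps in the middle can still report co-appearance with another trace that only appears during the gap.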


@@ -0,0 +1,17 @@
diff --git a/src/core/processor/executor.rs b/src/core/processor/executor.rs
index 494ee2b..fc604bc 100644
--- a/src/core/processor/executor.rs
+++ b/src/core/processor/executor.rs
@@ -244,8 +244,10 @@ impl PythonExecutor {
.and_then(|c| serde_json::from_str::<serde_json::Value>(&c).ok())
.is_some();
if is_valid {
- let _ = std::fs::rename(tmp, out);
- tracing::warn!("[Executor] Partial output preserved: {:?}", out);
+ let mut partial_path = out.to_path_buf();
+ partial_path.set_extension("json.partial");
+ let _ = std::fs::rename(tmp, &partial_path);
+ tracing::warn!("[Executor] Partial output preserved: {:?}", partial_path);
} else {
let mut err_path = out.to_path_buf();
err_path.set_extension("json.err");


@@ -0,0 +1,52 @@
diff --git a/src/worker/job_worker.rs b/src/worker/job_worker.rs
index dceb674..4accd3e 100644
--- a/src/worker/job_worker.rs
+++ b/src/worker/job_worker.rs
@@ -681,6 +681,21 @@ impl JobWorker {
error!("❌ Auto-vectorize failed for {}: {}", uuid_clone, e);
}
}
+ // Phase 1 release: deliver sentence chunk embeddings
+ info!("📦 Phase 1 release packaging...");
+ let executor = match crate::core::processor::PythonExecutor::new() {
+ Ok(ex) => ex,
+ Err(e) => { error!("Failed PythonExecutor for release pack: {}", e); return; }
+ };
+ match executor.run(
+ "release_pack.py",
+ &["--phase", "1", "--file-uuid", &uuid_clone],
+ None, "RELEASE_P1",
+ Some(std::time::Duration::from_secs(120)),
+ ).await {
+ Ok(()) => info!("✅ Phase 1 release packaged for {}", uuid_clone),
+ Err(e) => error!("❌ Phase 1 release pack failed: {}", e),
+ }
}
Err(e) => error!("❌ Rule 1 Ingestion failed: {}", e),
}
@@ -830,7 +845,24 @@ impl JobWorker {
tokio::spawn(async move {
tokio::time::sleep(tokio::time::Duration::from_secs(30)).await;
match run_5w1h_agent(&db_clone, &uuid_clone).await {
- Ok(()) => info!("✅ 5W1H Agent completed for {}", uuid_clone),
+ Ok(()) => {
+ info!("✅ 5W1H Agent completed for {}", uuid_clone);
+ // Phase 2 release: deliver the full pipeline
+ info!("📦 Phase 2 release packaging...");
+ let executor = match crate::core::processor::PythonExecutor::new() {
+ Ok(ex) => ex,
+ Err(e) => { error!("Failed PythonExecutor for release pack: {}", e); return; }
+ };
+ match executor.run(
+ "release_pack.py",
+ &["--phase", "2", "--file-uuid", &uuid_clone],
+ None, "RELEASE_P2",
+ Some(std::time::Duration::from_secs(120)),
+ ).await {
+ Ok(()) => info!("✅ Phase 2 release packaged for {}", uuid_clone),
+ Err(e) => error!("❌ Phase 2 release pack failed: {}", e),
+ }
+ }
Err(e) => error!("❌ 5W1H Agent failed for {}: {}", uuid_clone, e),
}
});


@@ -0,0 +1,111 @@
diff --git a/src/api/universal_search.rs b/src/api/universal_search.rs
index 054a1f4..2fc9520 100644
--- a/src/api/universal_search.rs
+++ b/src/api/universal_search.rs
@@ -20,6 +20,8 @@ pub struct UniversalSearchRequest {
pub types: Vec<String>, // chunk, frame, person
pub time_range: Option<[f64; 2]>,
pub filters: Option<SearchFilters>,
+ pub page: Option<usize>,
+ pub page_size: Option<usize>,
pub limit: Option<usize>,
pub offset: Option<usize>,
}
@@ -31,6 +33,10 @@ pub struct SearchFilters {
pub ocr_text: Option<String>,
pub has_face: Option<bool>,
pub speaker_id: Option<String>,
+ /// Restrict results to one chunk_type (e.g. "sentence", "cut", "trace", "visual")
+ pub chunk_type: Option<String>,
+ /// Find trace chunks that overlap in time with the given trace_id
+ pub co_appears_with_trace_id: Option<i32>,
// Visual chunk filters
pub min_confidence: Option<f32>,
pub min_unique_classes: Option<u32>,
@@ -44,6 +50,8 @@ pub struct UniversalSearchResponse {
pub query: String,
pub results: Vec<SearchResult>,
pub total: usize,
+ pub page: usize,
+ pub page_size: usize,
pub took_ms: u64,
}
@@ -108,8 +116,14 @@ pub async fn universal_search(
)
})?;
- let limit = req.limit.unwrap_or(20);
- let offset = req.offset.unwrap_or(0);
+ let page = req.page.unwrap_or(1).max(1);
+ let page_size = req.page_size.unwrap_or(20).max(1).min(200);
+ // Backward compat: if old `offset` is used without `page`, derive from offset
+ let offset = if req.page.is_none() && req.offset.is_some() {
+ req.offset.unwrap()
+ } else {
+ (page - 1) * page_size
+ };
let types = if req.types.is_empty() {
vec![
"chunk".to_string(),
@@ -163,7 +177,8 @@ pub async fn universal_search(
});
let total = results.len();
- let end = std::cmp::min(offset + limit, results.len());
+ let effective_limit = req.limit.unwrap_or(usize::MAX);
+ let end = offset.saturating_add(page_size.min(effective_limit)).min(results.len()); // `limit` caps the row count, so `end` cannot fall below `offset` and make the slice panic
let paginated = if offset < results.len() {
results[offset..end].to_vec()
} else {
@@ -176,6 +191,8 @@ pub async fn universal_search(
query: req.query,
results: paginated,
total,
+ page,
+ page_size,
took_ms: took,
}))
}
@@ -378,10 +395,22 @@ async fn search_chunks(
sql.push_str(&format!(" AND ({})", class_conditions.join(" OR ")));
}
}
+ if let Some(ref chunk_type) = filters.chunk_type {
+ sql.push_str(&format!(
+ " AND chunk_type = '{}'",
+ chunk_type.replace('\'', "''")
+ ));
+ }
+ if let Some(trace_id) = filters.co_appears_with_trace_id {
+ sql.push_str(&format!(
+ " AND metadata->'co_appearances' @> '[{{ \"trace_id\": {} }}]'",
+ trace_id
+ ));
+ }
}
sql.push_str(" ORDER BY start_time ASC");
- sql.push_str(&format!(" LIMIT {}", req.limit.unwrap_or(20)));
+ sql.push_str(&format!(" LIMIT {}", req.page_size.unwrap_or(20)));
let rows: Vec<(
String,
@@ -495,7 +524,7 @@ async fn search_frames_internal(
}
sql.push_str(" ORDER BY f.timestamp ASC");
- sql.push_str(&format!(" LIMIT {}", req.limit.unwrap_or(20)));
+ sql.push_str(&format!(" LIMIT {}", req.page_size.unwrap_or(20)));
let rows: Vec<(
i64,
@@ -575,7 +604,7 @@ async fn search_persons_internal(
}
sql.push_str(" ORDER BY appearance_count DESC");
- sql.push_str(&format!(" LIMIT {}", req.limit.unwrap_or(20)));
+ sql.push_str(&format!(" LIMIT {}", req.page_size.unwrap_or(20)));
let rows: Vec<(
String,
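
The page/offset resolution in this patch can be modeled in Python to check edge cases. The sketch below is an interpretation, not the handler itself: it assumes `limit` is meant to cap the number of returned rows, and the function name is hypothetical. The defaults (page 1, page size 20, maximum 200) come from the Rust code above.

```python
def resolve_window(total, page=None, page_size=None, offset=None, limit=None):
    """Return (start, end) slice bounds into a result list of length `total`."""
    p = max(page if page is not None else 1, 1)
    size = min(max(page_size if page_size is not None else 20, 1), 200)
    # Backward compatibility: an explicit offset wins only when page is absent.
    if page is None and offset is not None:
        start = offset
    else:
        start = (p - 1) * size
    count = size if limit is None else min(size, limit)
    start = min(start, total)          # clamp so the slice is never inverted
    end = min(start + count, total)
    return start, end


print(resolve_window(100))                       # first page with defaults
print(resolve_window(100, page=3, page_size=10))
print(resolve_window(100, offset=40, limit=5))
print(resolve_window(100, page=50))              # past the end: empty window
```

One thing this model makes visible: a page beyond the first can only be served if the per-type SQL queries return more than `page_size` rows, since the slice is taken from the merged in-memory result list.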


@@ -0,0 +1,153 @@
diff --git a/scripts/tkg_builder.py b/scripts/tkg_builder.py
index 31ccf8a..8941d7f 100644
--- a/scripts/tkg_builder.py
+++ b/scripts/tkg_builder.py
@@ -365,6 +365,73 @@ def build_speaker_face_edges(cur, schema, file_uuid):
return edge_count
+def build_face_face_edges(cur, schema, file_uuid):
+    """Build CO_OCCURS_WITH edges: face_trace ↔ face_trace in same frame"""
+    print("[TKG] Building face-face co-occurrence edges...")
+
+    cur.execute(
+        f"""
+        SELECT a.trace_id AS tid_a, b.trace_id AS tid_b,
+               a.frame_number, a.timestamp_secs,
+               a.x AS ax, a.y AS ay, a.width AS aw, a.height AS ah,
+               b.x AS bx, b.y AS by, b.width AS bw, b.height AS bh
+        FROM {schema}.face_detections a
+        JOIN {schema}.face_detections b
+          ON a.file_uuid = b.file_uuid
+         AND a.frame_number = b.frame_number
+         AND a.trace_id < b.trace_id
+        WHERE a.file_uuid = %s
+          AND a.trace_id IS NOT NULL
+          AND b.trace_id IS NOT NULL
+        ORDER BY a.frame_number
+        """,
+        (file_uuid,),
+    )
+    rows = cur.fetchall()
+    if not rows:
+        print("[TKG] No face-face co-occurrences found")
+        return 0
+
+    # Deduplicate by pair (group all frames where same two traces co-occur)
+    pair_first = {}
+    pair_frames = {}
+    for tid_a, tid_b, frame, ts, ax, ay, aw, ah, bx, by, bw, bh in rows:
+        key = (min(tid_a, tid_b), max(tid_a, tid_b))
+        if key not in pair_first:
+            pair_first[key] = frame
+        pair_frames.setdefault(key, []).append(frame)
+
+    edge_count = 0
+    for (tid_a, tid_b), frames in pair_frames.items():
+        cur.execute(
+            f"SELECT id FROM {schema}.tkg_nodes WHERE file_uuid=%s AND node_type='face_trace' AND external_id=%s",
+            (file_uuid, f"trace_{tid_a}"),
+        )
+        n_a = cur.fetchone()
+        cur.execute(
+            f"SELECT id FROM {schema}.tkg_nodes WHERE file_uuid=%s AND node_type='face_trace' AND external_id=%s",
+            (file_uuid, f"trace_{tid_b}"),
+        )
+        n_b = cur.fetchone()
+        if not n_a or not n_b:
+            continue
+
+        distance_px = ((frames[0] - frames[0]) ** 2) ** 0.5  # placeholder: always 0.0 and unused; bbox distance is not computed yet
+        ensure_edge(
+            cur, schema, file_uuid,
+            "CO_OCCURS_WITH",
+            n_a[0], n_b[0],
+            {
+                "first_frame": int(frames[0]),
+                "frame_count": len(frames),
+            },
+        )
+        edge_count += 1
+
+    print(f"[TKG] {edge_count} face-face co-occurrence edges created")
+    return edge_count
+
+
def main():
parser = argparse.ArgumentParser(description="Build Temporal Knowledge Graph")
parser.add_argument("--file-uuid", required=True)
@@ -382,17 +449,19 @@ def main():
e1 = build_co_occurrence_edges(cur, args.schema, args.file_uuid)
e2 = build_speaker_face_edges(cur, args.schema, args.file_uuid)
+ e3 = build_face_face_edges(cur, args.schema, args.file_uuid)
conn.commit()
cur.close()
conn.close()
- print(f"\n[TKG] Complete: {n1+n2+n3} nodes, {e1+e2} edges")
+ print(f"\n[TKG] Complete: {n1+n2+n3} nodes, {e1+e2+e3} edges")
print(f" Face traces: {n1}")
print(f" Objects: {n2}")
print(f" Speakers: {n3}")
print(f" Co-occur: {e1}")
print(f" Speaker-face:{e2}")
+ print(f" Face-face: {e3}")
if __name__ == "__main__":
diff --git a/src/worker/job_worker.rs b/src/worker/job_worker.rs
index 0f0ea1e..dceb674 100644
--- a/src/worker/job_worker.rs
+++ b/src/worker/job_worker.rs
@@ -713,6 +713,7 @@ impl JobWorker {
// Runs face_tracker.py (IoU+embedding tracking), stores trace_id + position in DB
if has_face {
info!("📝 Face completed, triggering face trace + DB store...");
+ let db_clone = self.db.clone();
let uuid_clone = uuid.to_string();
tokio::spawn(async move {
let executor = match crate::core::processor::PythonExecutor::new() {
@@ -744,6 +745,41 @@ impl JobWorker {
} else {
info!("✅ Qdrant face sync completed for {}", uuid_clone);
}
+
+ // Generate trace chunks from face_detections + ASR text
+ info!("📝 Generating trace chunks...");
+ match crate::core::chunk::trace_ingest::ingest_traces(
+ &db_clone,
+ &uuid_clone,
+ )
+ .await
+ {
+ Ok(n) => info!("✅ {} trace chunks created for {}", n, uuid_clone),
+ Err(e) => error!("❌ Trace chunk ingestion failed: {}", e),
+ }
+
+ // Build Temporal Knowledge Graph (TKG)
+ info!("📝 Building TKG graph...");
+ let executor = match crate::core::processor::PythonExecutor::new() {
+ Ok(ex) => ex,
+ Err(e) => {
+ error!("Failed to create PythonExecutor for TKG: {}", e);
+ return;
+ }
+ };
+ match executor
+ .run(
+ "tkg_builder.py",
+ &["--file-uuid", &uuid_clone],
+ Some(&uuid_clone),
+ "TKG_BUILDER",
+ Some(std::time::Duration::from_secs(300)),
+ )
+ .await
+ {
+ Ok(()) => info!("✅ TKG built for {}", uuid_clone),
+ Err(e) => error!("❌ TKG build failed for {}: {}", uuid_clone, e),
+ }
}
Err(e) => {
error!("❌ Face trace + DB store failed for {}: {}", uuid_clone, e)


@@ -0,0 +1,150 @@
#!/usr/bin/env python3
"""
Release packaging: two non-overlapping phases.
Phase 1: ASR + ASRX + Rule 1 sentence chunks complete
Phase 2: Full pipeline + Rule 3 + 5W1H complete
Output: release/phase{N}/{VERSION}_{TIMESTAMP}/
"""
import json
import os
import shutil
import subprocess
import sys
import time
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

PROJECT = Path(__file__).resolve().parent.parent
OUTPUT_DIR = Path(os.environ.get("MOMENTRY_OUTPUT_DIR", PROJECT / "output_dev"))
RELEASE_DIR = PROJECT / "release"
VERSION = "v1.0.0"
DB_USER = os.environ.get("USER", "accusys")
DB_NAME = "momentry"
QDRANT_URL = os.environ.get("QDRANT_URL", "http://localhost:6333")
QDRANT_COLLECTION = os.environ.get("QDRANT_COLLECTION", "momentry_dev_rule1_v2")


def ts():
    """UTC timestamp used in the package directory name."""
    return datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")


def run_sql(sql: str) -> str:
    """Run one command through psql and return its trimmed stdout."""
    r = subprocess.run(
        ["psql", "-U", DB_USER, "-d", DB_NAME, "-t", "-A", "-c", sql],
        capture_output=True, text=True, timeout=30,
    )
    return r.stdout.strip()


def pack_phase(file_uuid: str, phase: int) -> Path:
    """Package deliverables for phase 1 or 2."""
    phase_dir = RELEASE_DIR / f"phase{phase}"
    stamp = ts()
    pkg_dir = phase_dir / f"{VERSION}_{stamp}"
    out_dir = pkg_dir / "output_json"
    out_dir.mkdir(parents=True, exist_ok=True)

    # Collect the processor output .json files
    for f in OUTPUT_DIR.glob(f"{file_uuid}.*.json"):
        if f.is_file():
            shutil.copy2(f, out_dir / f.name)

    # Collect the schema
    schema_path = pkg_dir / "schema.sql"
    with open(schema_path, "w") as fh:
        subprocess.run(
            ["pg_dump", "-U", DB_USER, "-d", DB_NAME, "--schema=dev", "--schema-only",
             "-T", "dev.monitor_jobs", "-T", "dev.processor_results"],
            stdout=fh, text=True, timeout=60,
        )

    # Collect chunks
    chunks_csv = pkg_dir / "chunks.csv"
    run_sql(f"\\COPY (SELECT * FROM dev.chunks WHERE file_uuid='{file_uuid}') TO '{chunks_csv}' CSV HEADER")

    # Collect vectors
    vecs_csv = pkg_dir / "vectors.csv"
    run_sql(f"\\COPY (SELECT * FROM dev.chunk_vectors WHERE uuid='{file_uuid}') TO '{vecs_csv}' CSV HEADER")

    if phase >= 2:
        faces_csv = pkg_dir / "face_detections.csv"
        run_sql(f"\\COPY (SELECT * FROM dev.face_detections WHERE file_uuid='{file_uuid}') TO '{faces_csv}' CSV HEADER")
        idents_csv = pkg_dir / "identities.csv"
        run_sql(f"\\COPY (SELECT * FROM dev.identities) TO '{idents_csv}' CSV HEADER")

    # Export a snapshot of the Qdrant collection's points.
    # Qdrant's scroll endpoint expects a POST with a JSON body, so the paging
    # parameters are sent in the body rather than as a GET query string.
    qdrant_path = pkg_dir / "qdrant_points.jsonl"
    try:
        offset = None
        url = f"{QDRANT_URL}/collections/{QDRANT_COLLECTION}/points/scroll"
        with open(qdrant_path, "w") as qf:
            while True:
                body = {"limit": 1000, "with_payload": True, "with_vector": True}
                if offset is not None:
                    body["offset"] = offset
                req = urllib.request.Request(
                    url,
                    data=json.dumps(body).encode("utf-8"),
                    headers={"Content-Type": "application/json"},
                    method="POST",
                )
                with urllib.request.urlopen(req, timeout=30) as resp:
                    data = json.loads(resp.read())
                pts = data.get("result", {}).get("points", [])
                if not pts:
                    break
                for p in pts:
                    qf.write(json.dumps(p, ensure_ascii=False) + "\n")
                # Take the next page offset from the returned next_page_offset
                offset = data.get("result", {}).get("next_page_offset")
                if offset is None:
                    break
        with open(qdrant_path) as qf:
            n_points = sum(1 for line in qf if line.strip())
        print(f"[RELEASE] Qdrant: {n_points} points exported from '{QDRANT_COLLECTION}'")
    except Exception as e:
        print(f"[RELEASE] Qdrant export skipped: {e}")
        if qdrant_path.exists():
            qdrant_path.unlink()

    # RELEASE_INFO
    git_commit = subprocess.run(
        ["git", "-C", str(PROJECT), "rev-parse", "HEAD"],
        capture_output=True, text=True, timeout=10,
    ).stdout.strip()
    model_name = f"{file_uuid}_v1" if phase == 1 else f"{file_uuid}_v2"
    info = pkg_dir / "RELEASE_INFO.txt"
    with open(info, "w") as fh:
        fh.write(f"Model: {model_name}\n")
        fh.write(f"Phase: {phase}\n")
        fh.write(f"Version: {VERSION}\n")
        fh.write(f"Timestamp: {stamp}\n")
        fh.write(f"File UUID: {file_uuid}\n")
        fh.write(f"Qdrant Collection: {QDRANT_COLLECTION}\n")
        fh.write(f"Git Commit: {git_commit}\n")
        fh.write(f"Packaged at: {datetime.now(timezone.utc).isoformat()}\n")

    # latest symlink: replace an existing link, never a real file or directory
    latest = phase_dir / "latest"
    if latest.is_symlink():
        latest.unlink()
    if not latest.exists():
        latest.symlink_to(pkg_dir.name, target_is_directory=True)

    size = sum(f.stat().st_size for f in pkg_dir.rglob("*") if f.is_file())
    print(f"[RELEASE] Phase {phase} packaged: {pkg_dir} ({size / 1024:.0f} KB)")
    return pkg_dir


def main():
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("--phase", type=int, required=True, choices=[1, 2])
    parser.add_argument("--file-uuid", required=True)
    args = parser.parse_args()
    pack_phase(args.file_uuid, args.phase)


if __name__ == "__main__":
    main()