Files
momentry_core/docs/RELEASE_PHASES.md
Accusys 076af4cba1 docs: Phase 3 object identity design (v3 model)
- Object instance tracking (similar to face trace)
- Custom detector for stamps, guns, etc.
- TKG integration for object-face-speaker graph
- Upgrade path: yolov5nu → yolov8m, fine-tune, zero-shot
2026-05-09 14:22:37 +08:00

166 lines
5.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Momentry Model — 分階段交付
## 核心架構
```
Pipeline (training)
│ 每個 processor 產出 .json
│ Rule 1/3 Ingestion → chunks + embeddings
momentry model for {video} ← 每部影片 = 一個 model
│ release/phase1/latest/
│ release/phase2/latest/
momentry core (inference engine) ← Rust API server
│ momentry_playground (dev)
│ momentry (production)
Search / Query / Identity APIs
```
- **Pipeline** = training phase影片 → processor output → chunks → embeddings
- **Model** = 每部影片的產出 packageoutput_json + chunks + vectors
- **Engine** = momentry core吃 model 提供 APIsearch, trace, identity
每個影片可有多個 model 版本,命名保留升級空間:
| Model 版本 | Qdrant Collection | 內容 | 觸發時機 |
|-----------|------------------|------|---------|
| `{uuid}_v1` | `momentry_dev_v1` | sentence chunk embeddingbase | ASR + ASRX + Rule 1 完成 |
| `{uuid}_v2` | `momentry_dev_v2` | 完整 pipeline + 5W1H | 全部完成 |
| `{uuid}_v3` | `momentry_dev_v3` | object identity + custom detector | v2 + object instance matching 完成 |
各版本共存不覆蓋。
## 階段劃分
### Phase 1Sentence Chunk Embeddingbase model
**觸發時機**: ASR + ASRX 完成 + Rule 1 Ingestion + vectorize 完成
**交付內容**:
- `{uuid}.asr.json`
- `{uuid}.asrx.json`
- chunkschunk_type = 'sentence'
- chunk_vectorssentence embedding
**用途**: 終端使用者可進行語意搜尋
### Phase 2完整 Pipelinev2 model
**觸發時機**: 全部 processor 完成 + Rule 3 Ingestion + 5W1H Agent
**交付內容**:
- Phase 1 全部內容
- 所有 `{uuid}.*.json`cut, yolo, face, pose, ocr, ...
- chunkschunk_type = 'cut', 'visual', 'trace', 'story'
- chunk_vectorssummary embedding
- identities / identity_bindings / face_detections
**用途**: 完整搜尋 + 摘要 + 人物識別
---
## Worker Pipeline
```
ASR 完成 → ASRX 完成
Rule 1 Ingestion (sentence chunks)
vectorize_chunks (sentence embedding)
📦 Phase 1 release ───→ release/phase1/latest/ (base model)
其他 processors 繼續 (yolo, face, pose, ocr, ...)
Rule 3 Ingestion + 5W1H Agent
📦 Phase 2 release ───→ release/phase2/latest/ (full model)
```
## 產出目錄結構
```
release/
├── phase1/
│ ├── {version}_{timestamp}/
│ │ ├── output_json/ ← 所有已完成的 .json
│ │ ├── chunks.csv ← sentence chunks
│ │ ├── vectors.csv ← sentence embeddings
│ │ ├── schema.sql ← chunks table DDL
│ │ └── RELEASE_INFO.txt
│ └── latest → {version}_{timestamp}
└── phase2/
├── {version}_{timestamp}/
│ ├── output_json/ ← 所有 .json
│ ├── chunks.csv ← 所有 chunks
│ ├── vectors.csv ← 所有 embeddings
│ ├── identities.csv ← 人物身分
│ ├── schema.sql ← 完整 schema
│ └── RELEASE_INFO.txt
└── latest → {version}_{timestamp}
```
## momentry model vs momentry core
| | momentry model | momentry core |
|---|---|---|
| 類比 | 訓練好的 weights | inference engine |
| 內容 | `.json` + chunks + vectors | Rust binary |
| 生命週期 | 每部影片產出一個 | 一個 binary 服務所有影片 |
| 版本 | `{uuid}_v1`base / `{uuid}_v2` / `{uuid}_v3` | `momentry_playground` / `momentry` |
| 交付對象 | 終端使用者 | 部署工程師 |
---
## Phase 3Object Identityv3 model
### 目標
從影片中提取關鍵物體(郵票、手槍、信封、放大鏡...),對同類物體做 instance-level 的跨畫面追蹤與辨識,達到類似 face trace 的效果 — 不只是 detect class還能區分「這一張郵票」vs「那一張郵票」。
### 現狀問題
1. **COCO 80 類不包含關鍵物體** — 郵票、手槍、信封、放大鏡等不在 COCO 資料集中
2. **YOLOv5nano 偵測率低** — 即使是 COCO 類別knife, cell phone在 nano 模型上 recall 不足
3. **無 object instance matching** — 目前只有 frame-level detection沒有跨 frame 的物體追蹤
### 技術方向
```
YOLOv8m/OWL-ViT → 改善 detection coverage
Object Tracker (IoU + embedding類似 face tracker)
object_trace → TKG CO_OCCURS_WITH edges
object identity → 同物體跨場景辨識
```
| 方向 | 方法 | 效果 |
|------|------|------|
| Model upgrade | `yolov5nu``yolov8s.pt` / `yolov8m.pt` | COCO recall 提升 |
| Custom fine-tune | 收集 stamps/guns 資料 fine-tune YOLO | 可偵測非 COCO 物件 |
| Zero-shot | OWL-ViT / Grounding DINO by text prompt | 不用 training但速度慢 |
| Object trace | IoU + embedding 跨 frame 匹配 | instance-level 追蹤 |
| Object identity | clustering 跨場景辨識同一物體 | 可在全片搜尋「這把槍」 |
### 與 TKG 整合
```
face_trace -[:CO_OCCURS_WITH]-> object_instance:5 (這把槍)
face_trace -[:CO_OCCURS_WITH]-> object_instance:42 (這張郵票)
查詢: "Audrey Hepburn 拿這把槍的畫面"
→ face_trace:5 -[:SPEAKS_AS]-> SPEAKER_0
→ face_trace:5 -[:CO_OCCURS_WITH]-> object_instance:5
```
### 交付順序
1. YOLO model upgrade低難度立即見效
2. Object tracker中難度參考 face tracker 實作)
3. Custom fine-tune / zero-shot高難度需資料或新模型