docs: momentry model vs core architecture

Pipeline = training → produces momentry model per video
Core = inference engine → serves APIs from model
Phase 1 = tiny model (sentence chunks)
Phase 2 = full model (complete + 5W1H)
Accusys
2026-05-09 14:03:00 +08:00
parent 28652f5b76
commit 227c647a43


@@ -1,31 +1,57 @@
# Release: Phased Delivery
# Momentry Model: Phased Delivery
## Core Architecture
```
Pipeline (training)
│ each processor emits a .json file
│ Rule 1/3 Ingestion → chunks + embeddings
momentry model for {video} ← one model per video
│ release/phase1/latest/
│ release/phase2/latest/
momentry core (inference engine) ← Rust API server
│ momentry_playground (dev)
│ momentry (production)
Search / Query / Identity APIs
```
- **Pipeline** = training phase: video → processor output → chunks → embeddings
- **Model** = the output package for each video (output_json + chunks + vectors)
- **Engine** = momentry core: consumes the model and serves APIs (search, trace, identity)
As with tiny / small / medium / large model sizes, each video can have multiple model versions:
| Model version | Contents | Trigger |
|-----------|------|---------|
| `{uuid}_v1_tiny` | sentence chunk embedding | ASR + ASRX + Rule 1 complete |
| `{uuid}_v2_full` | full pipeline + 5W1H | everything complete |
Versions coexist; a newer version does not overwrite an older one.
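The naming scheme in the table above can be sketched as follows. This is only an illustration: `TIERS`, `model_id`, and `all_model_ids` are hypothetical names, and the source specifies nothing beyond the `{uuid}_v1_tiny` and `{uuid}_v2_full` patterns.

```python
# Hypothetical helper: only the `{uuid}_v1_tiny` / `{uuid}_v2_full`
# naming pattern comes from the document; everything else is assumed.
TIERS = {1: "tiny", 2: "full"}


def model_id(video_uuid: str, phase: int) -> str:
    """Build the model version ID for one video and one phase."""
    if phase not in TIERS:
        raise ValueError(f"unknown phase: {phase}")
    return f"{video_uuid}_v{phase}_{TIERS[phase]}"


def all_model_ids(video_uuid: str) -> list:
    """List every version ID a video can have; versions coexist side by side."""
    return [model_id(video_uuid, phase) for phase in sorted(TIERS)]
```

Because each phase maps to a distinct ID, producing the full model never needs to touch the tiny one.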
## Phases
### Phase 1: Sentence Chunk Embedding Delivery
### Phase 1: Sentence Chunk Embedding (tiny model)
**Trigger**: ASR + ASRX complete + Rule 1 Ingestion complete
**Trigger**: ASR + ASRX complete + Rule 1 Ingestion + vectorize complete
**Deliverables**:
- `{uuid}.asr.json`
- `{uuid}.asrx.json`
- chunks (chunk_type = 'sentence')
- chunk_vectors (sentence embedding)
- DB schema + chunks table data
**Purpose**: end users can run semantic search
### Phase 2: 5W1H Summary Chunk Embedding Delivery
### Phase 2: Full Pipeline (full model)
**Trigger**: all processors complete + Rule 3 Ingestion + 5W1H Agent
**Deliverables**:
- Everything in Phase 1
- `{uuid}.cut.json`
- `{uuid}.yolo.json`
- `{uuid}.face.json`
- `{uuid}.pose.json`
- `{uuid}.ocr.json`
- all `{uuid}.*.json` (cut, yolo, face, pose, ocr, ...)
- chunks (chunk_type = 'cut', 'visual', 'trace', 'story')
- chunk_vectors (summary embedding)
- identities / identity_bindings / face_detections
@@ -34,20 +60,22 @@
---
## Worker Pipeline Integration
## Worker Pipeline
```
ASR complete → ASRX complete
Rule 1 Ingestion (sentence chunks)
Phase 1 Release Packaging ← automatic
Remaining processors continue
Rule 3 Ingestion (cut chunks + 5W1H summary)
Phase 2 Release Packaging ← automatic
ASR complete → ASRX complete
Rule 1 Ingestion (sentence chunks)
vectorize_chunks (sentence embedding)
📦 Phase 1 release ───→ release/phase1/latest/ (tiny model)
Other processors continue (yolo, face, pose, ocr, ...)
Rule 3 Ingestion + 5W1H Agent
📦 Phase 2 release ───→ release/phase2/latest/ (full model)
```
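The trigger conditions for the two releases can be sketched as a set check over completed steps. This is a minimal sketch under assumptions: the step names (`rule1_ingestion`, `agent_5w1h`, and so on) and the `ready_phases` function are hypothetical, and `ASSUMED_PROCESSORS` lists only the processors the document names explicitly, while the real list is longer.

```python
from typing import Iterable, List

# Assumed step names; the document names the stages but not their identifiers.
PHASE1_REQUIRED = frozenset({"asr", "asrx", "rule1_ingestion", "vectorize_chunks"})

# Only the processors the document mentions; its list ends with "...",
# so this set is knowingly incomplete.
ASSUMED_PROCESSORS = frozenset({"asr", "asrx", "cut", "yolo", "face", "pose", "ocr"})

PHASE2_REQUIRED = (
    PHASE1_REQUIRED | ASSUMED_PROCESSORS | {"rule3_ingestion", "agent_5w1h"}
)


def ready_phases(completed: Iterable[str]) -> List[str]:
    """Return the release phases whose prerequisites are all complete."""
    done = set(completed)
    phases = []
    if PHASE1_REQUIRED <= done:
        phases.append("phase1")  # tiny model can be packaged
    if PHASE2_REQUIRED <= done:
        phases.append("phase2")  # full model can be packaged
    return phases
```

Phase 2 requires a superset of the Phase 1 steps, so the tiny model always becomes available first while the remaining processors are still running.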
## Output Directory Structure
@@ -55,21 +83,31 @@ ASR complete → ASRX complete
```
release/
├── phase1/
│ ├── v1.0.0_20260509_120000/
│ │ ├── output_json/ ← asr.json, asrx.json
│ │ ├── schema.sql ← chunks table DDL
│ │ ├── chunks.csv ← sentence chunks data
│ ├── {version}_{timestamp}/
│ │ ├── output_json/ ← all completed .json files
│ │ ├── chunks.csv ← sentence chunks
│ │ ├── vectors.csv ← sentence embeddings
│ │ ├── schema.sql ← chunks table DDL
│ │ └── RELEASE_INFO.txt
│ └── latest → v1.0.0_20260509_120000
│ └── latest → {version}_{timestamp}
└── phase2/
├── v1.0.0_20260509_140000/
│ ├── output_json/ ← all processor outputs
│ ├── schema.sql ← full schema
│ ├── chunks.csv ← all chunks
│ ├── vectors.csv ← all embeddings
│ ├── identities.csv ← person identities
├── {version}_{timestamp}/
│ ├── output_json/ ← all .json files
│ ├── chunks.csv ← all chunks
│ ├── vectors.csv ← all embeddings
│ ├── identities.csv ← person identities
│ ├── schema.sql ← full schema
│ └── RELEASE_INFO.txt
└── latest → v1.0.0_20260509_140000
└── latest → {version}_{timestamp}
```
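One way to produce the `{version}_{timestamp}/` layout and keep `latest` pointing at the newest release is sketched below. The function name `publish_release` and the temporary link name `latest.tmp` are hypothetical, and the symlink swap assumes a POSIX filesystem; the source does not say how the packaging step is implemented.

```python
import os
from pathlib import Path


def publish_release(root: Path, phase: str, version: str, timestamp: str) -> Path:
    """Create {root}/{phase}/{version}_{timestamp}/ and repoint `latest` at it."""
    phase_dir = root / phase
    release_dir = phase_dir / f"{version}_{timestamp}"
    (release_dir / "output_json").mkdir(parents=True, exist_ok=True)

    # Build the new link under a temporary name, then rename it over
    # `latest`, so readers never observe a missing link (POSIX rename).
    tmp_link = phase_dir / "latest.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    tmp_link.symlink_to(release_dir.name, target_is_directory=True)
    os.replace(tmp_link, phase_dir / "latest")
    return release_dir
```

For example, `publish_release(Path("release"), "phase1", "v1.0.0", "20260509_120000")` would create the release directory and leave `release/phase1/latest` as a relative symlink to it. Older release directories are left in place.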
## momentry model vs momentry core
| | momentry model | momentry core |
|---|---|---|
| Analogy | trained weights | inference engine |
| Contents | `.json` + chunks + vectors | Rust binary |
| Lifecycle | one produced per video | one binary serves all videos |
| Versions | `{uuid}_v1_tiny` / `{uuid}_v2_full` | `momentry_playground` / `momentry` |
| Audience | end users | deployment engineers |