docs: momentry model vs core architecture

Pipeline = training → produces momentry model per video
Core = inference engine → serves APIs from model
Phase 1 = tiny model (sentence chunks)
Phase 2 = full model (complete + 5W1H)
@@ -1,31 +1,57 @@
-# Release: Phased Delivery
+# Momentry Model: Phased Delivery

+## Core Architecture
+
+```
+Pipeline (training)
+│ each processor emits a .json file
+│ Rule 1/3 Ingestion → chunks + embeddings
+▼
+momentry model for {video} ← one model per video
+│ release/phase1/latest/
+│ release/phase2/latest/
+▼
+momentry core (inference engine) ← Rust API server
+│ momentry_playground (dev)
+│ momentry (production)
+▼
+Search / Query / Identity APIs
+```
+
+- **Pipeline** = training phase: video → processor output → chunks → embeddings
+- **Model** = the package produced for each video (output_json + chunks + vectors)
+- **Engine** = momentry core, which loads a model and serves APIs (search, trace, identity)
+
+Like the tiny / small / medium / large tiers of an ML model, each video can have multiple model versions:
+
+| Model version | Contents | Trigger |
+|---------------|----------|---------|
+| `{uuid}_v1_tiny` | sentence chunk embedding | ASR + ASRX + Rule 1 complete |
+| `{uuid}_v2_full` | full pipeline + 5W1H | everything complete |
+
+Versions coexist; a new version never overwrites an older one.
+
 ## Phase Breakdown

-### Phase 1: Sentence Chunk Embedding Delivery
+### Phase 1: Sentence Chunk Embedding (tiny model)

-**Trigger**: ASR + ASRX complete + Rule 1 Ingestion complete
+**Trigger**: ASR + ASRX complete + Rule 1 Ingestion + vectorize complete

 **Deliverables**:
 - `{uuid}.asr.json`
 - `{uuid}.asrx.json`
 - chunks (chunk_type = 'sentence')
 - chunk_vectors (sentence embedding)
 - DB schema + chunks table data

 **Purpose**: end users can run semantic search

-### Phase 2: 5W1H Summary Chunk Embedding Delivery
+### Phase 2: Full Pipeline (full model)

 **Trigger**: all processors complete + Rule 3 Ingestion + 5W1H Agent

 **Deliverables**:
 - Everything in Phase 1
-- `{uuid}.cut.json`
-- `{uuid}.yolo.json`
-- `{uuid}.face.json`
-- `{uuid}.pose.json`
-- `{uuid}.ocr.json`
+- All `{uuid}.*.json` files (cut, yolo, face, pose, ocr, ...)
 - chunks (chunk_type = 'cut', 'visual', 'trace', 'story')
 - chunk_vectors (summary embedding)
 - identities / identity_bindings / face_detections
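
Note on this hunk: the two-tier scheme can be illustrated with a minimal sketch that maps completed stages to the model versions that are ready to ship. The stage identifiers (`asr`, `rule1_ingestion`, `5w1h_agent`, and so on), the `ModelTier` class, and `ready_tiers` are hypothetical names chosen for illustration; the document states the triggers only in prose, and "all processors" is narrowed here to the ones it lists explicitly.

```python
from dataclasses import dataclass

# Stage names are assumptions; the document names the stages only in prose.
TINY_STAGES = frozenset({"asr", "asrx", "rule1_ingestion", "vectorize"})
# "All processors" is reduced to the processors the document lists by name.
FULL_STAGES = TINY_STAGES | {
    "cut", "yolo", "face", "pose", "ocr", "rule3_ingestion", "5w1h_agent",
}


@dataclass(frozen=True)
class ModelTier:
    suffix: str          # e.g. "v1_tiny"
    required: frozenset  # stages that must be complete before release

    def model_name(self, uuid: str) -> str:
        return f"{uuid}_{self.suffix}"


TIERS = (ModelTier("v1_tiny", TINY_STAGES), ModelTier("v2_full", FULL_STAGES))


def ready_tiers(uuid: str, completed: set) -> list:
    """Return the name of every model version whose trigger is satisfied."""
    return [t.model_name(uuid) for t in TIERS if t.required <= completed]
```

Because each tier only checks that its own required stages are a subset of the completed ones, the tiny model stays listed after the full model becomes ready, which matches the "versions coexist" rule.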
@@ -34,20 +60,22 @@

 ---

-## Worker Pipeline Integration
+## Worker Pipeline

 ```
-ASR complete → ASRX complete
-↓
-Rule 1 Ingestion (sentence chunks)
-↓
-Phase 1 Release Packaging ← automatic
-↓
-Remaining processors continue
-↓
-Rule 3 Ingestion (cut chunks + 5W1H summary)
-↓
-Phase 2 Release Packaging ← automatic
+ASR complete → ASRX complete
+↓
+Rule 1 Ingestion (sentence chunks)
+↓
+vectorize_chunks (sentence embedding)
+↓
+📦 Phase 1 release ───→ release/phase1/latest/ (tiny model)
+↓
+Other processors continue (yolo, face, pose, ocr, ...)
+↓
+Rule 3 Ingestion + 5W1H Agent
+↓
+📦 Phase 2 release ───→ release/phase2/latest/ (full model)
 ```

 ## Output Directory Structure
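
Note on the worker pipeline hunk above: the new ordering can be sketched as a flat list of steps in which the two release points are ordinary entries that invoke a packager. This is a sketch under assumptions, not the worker's actual code. The step identifiers, `run_pipeline`, and the `run_step` / `package` callbacks are all hypothetical, and real processors may run concurrently rather than strictly in sequence.

```python
from typing import Callable, List

# Ordered steps taken from the new diagram; identifiers are illustrative only.
PIPELINE = [
    "asr",
    "asrx",
    "rule1_ingestion",
    "vectorize_chunks",
    "release:phase1",
    "other_processors",
    "rule3_ingestion_and_5w1h",
    "release:phase2",
]


def run_pipeline(
    run_step: Callable[[str], None],
    package: Callable[[str], None],
) -> List[str]:
    """Run steps in order; 'release:<phase>' entries call the packager."""
    log = []
    for step in PIPELINE:
        if step.startswith("release:"):
            phase = step.split(":", 1)[1]
            package(phase)
            log.append(f"packaged {phase}")
        else:
            run_step(step)
            log.append(f"ran {step}")
    return log
```

Placing `release:phase1` directly after `vectorize_chunks` reflects the change in this commit: the tiny model is packaged only once sentence embeddings exist, and before the slower visual processors start.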
@@ -55,21 +83,31 @@ ASR complete → ASRX complete
 ```
 release/
 ├── phase1/
-│   ├── v1.0.0_20260509_120000/
-│   │   ├── output_json/      ← asr.json, asrx.json
-│   │   ├── schema.sql        ← chunks table DDL
-│   │   ├── chunks.csv        ← sentence chunks data
+│   ├── {version}_{timestamp}/
+│   │   ├── output_json/      ← all completed .json files
+│   │   ├── chunks.csv        ← sentence chunks
+│   │   ├── vectors.csv       ← sentence embeddings
+│   │   ├── schema.sql        ← chunks table DDL
 │   │   └── RELEASE_INFO.txt
-│   └── latest → v1.0.0_20260509_120000
+│   └── latest → {version}_{timestamp}
 │
 └── phase2/
-    ├── v1.0.0_20260509_140000/
-    │   ├── output_json/      ← all processor outputs
-    │   ├── schema.sql        ← full schema
-    │   ├── chunks.csv        ← all chunks
-    │   ├── vectors.csv       ← all embeddings
-    │   ├── identities.csv    ← person identities
+    ├── {version}_{timestamp}/
+    │   ├── output_json/      ← all .json files
+    │   ├── chunks.csv        ← all chunks
+    │   ├── vectors.csv       ← all embeddings
+    │   ├── identities.csv    ← person identities
+    │   ├── schema.sql        ← full schema
     │   └── RELEASE_INFO.txt
-    └── latest → v1.0.0_20260509_140000
+    └── latest → {version}_{timestamp}
 ```

+## momentry model vs momentry core
+
+| | momentry model | momentry core |
+|---|---|---|
+| Analogy | trained weights | inference engine |
+| Contents | `.json` + chunks + vectors | Rust binary |
+| Lifecycle | one produced per video | one binary serves all videos |
+| Versions | `{uuid}_v1_tiny` / `{uuid}_v2_full` | `momentry_playground` / `momentry` |
+| Delivered to | end users | deployment engineers |
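
Note on the directory layout in this hunk: the `{version}_{timestamp}` directories plus a `latest` link suggest a publish step along the following lines. This is a minimal sketch under assumptions. `publish_release` and `latest_release` are hypothetical helpers, and the document does not say how `latest` is updated; the rename-over-symlink approach is one common way to keep the link valid at all times, and it relies on POSIX symlink semantics.

```python
import os
from pathlib import Path


def publish_release(root: Path, phase: str, version: str, timestamp: str) -> Path:
    """Create <root>/<phase>/<version>_<timestamp>/ and repoint 'latest' at it."""
    phase_dir = root / phase
    release_dir = phase_dir / f"{version}_{timestamp}"
    (release_dir / "output_json").mkdir(parents=True, exist_ok=True)

    # Build the new link under a temporary name, then rename it over 'latest'
    # so readers never observe a missing link (the rename is atomic on POSIX).
    tmp_link = phase_dir / "latest.tmp"
    if tmp_link.is_symlink():
        tmp_link.unlink()
    os.symlink(release_dir.name, tmp_link)  # relative target, as in the tree
    os.replace(tmp_link, phase_dir / "latest")
    return release_dir


def latest_release(root: Path, phase: str) -> Path:
    """Resolve <root>/<phase>/latest to the directory it points at."""
    return (root / phase / "latest").resolve()
```

Older release directories are left in place, so an engine that already loaded an earlier model is unaffected; only `latest` moves.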