diff --git a/docs/RELEASE_PHASES.md b/docs/RELEASE_PHASES.md
index f8e2cfb..6040df9 100644
--- a/docs/RELEASE_PHASES.md
+++ b/docs/RELEASE_PHASES.md
@@ -1,31 +1,57 @@
-# Release Phased Delivery
+# Momentry Model: Phased Delivery
+
+## Core Architecture
+
+```
+Pipeline (training)
+    │  each processor emits a .json
+    │  Rule 1/3 Ingestion → chunks + embeddings
+    ▼
+momentry model for {video}        ← one video = one model
+    │  release/phase1/latest/
+    │  release/phase2/latest/
+    ▼
+momentry core (inference engine)  ← Rust API server
+    │  momentry_playground (dev)
+    │  momentry (production)
+    ▼
+Search / Query / Identity APIs
+```
+
+- **Pipeline** = training phase: video → processor output → chunks → embeddings
+- **Model** = the output package for one video (output_json + chunks + vectors)
+- **Engine** = momentry core, which loads a model and serves the APIs (search, trace, identity)
+
+Like the tiny / small / medium / large variants of a model, each video can have multiple model versions:
+
+| Model version | Contents | Trigger |
+|---------------|----------|---------|
+| `{uuid}_v1_tiny` | sentence chunk embedding | ASR + ASRX + Rule 1 complete |
+| `{uuid}_v2_full` | full pipeline + 5W1H | everything complete |
+
+Versions coexist; none overwrites another.
 
 ## Phase Breakdown
 
-### Phase 1: Sentence Chunk Embedding Delivery
+### Phase 1: Sentence Chunk Embedding (tiny model)
 
-**Trigger**: ASR + ASRX complete + Rule 1 Ingestion complete
+**Trigger**: ASR + ASRX complete + Rule 1 Ingestion + vectorize complete
 
 **Deliverables**:
 - `{uuid}.asr.json`
 - `{uuid}.asrx.json`
 - chunks (chunk_type = 'sentence')
 - chunk_vectors (sentence embedding)
-- DB schema + chunks table data
 
 **Purpose**: end users can run semantic search
 
-### Phase 2: 5W1H Summary Chunk Embedding Delivery
+### Phase 2: Full Pipeline (full model)
 
 **Trigger**: all processors complete + Rule 3 Ingestion + 5W1H Agent
 
 **Deliverables**:
 - Everything in Phase 1
-- `{uuid}.cut.json`
-- `{uuid}.yolo.json`
-- `{uuid}.face.json`
-- `{uuid}.pose.json`
-- `{uuid}.ocr.json`
+- All `{uuid}.*.json` files (cut, yolo, face, pose, ocr, ...)
 - chunks (chunk_type = 'cut', 'visual', 'trace', 'story')
 - chunk_vectors (summary embedding)
 - identities / identity_bindings / face_detections
@@ -34,20 +60,22 @@
 
 ---
 
-## Worker Pipeline Integration
+## Worker Pipeline
 
 ```
-ASR complete → ASRX complete
-        ↓
-    Rule 1 Ingestion (sentence chunks)
-        ↓
-    Phase 1 Release Packaging ← automatic
-        ↓
-    Remaining processors continue
-        ↓
-    Rule 3 Ingestion (cut chunks + 5W1H summary)
-        ↓
-    Phase 2 Release Packaging ← automatic
+ASR complete → ASRX complete
+    ↓
+Rule 1 Ingestion (sentence chunks)
+    ↓
+vectorize_chunks (sentence embedding)
+    ↓
+📦 Phase 1 release ───→ release/phase1/latest/ (tiny model)
+    ↓
+Other processors continue (yolo, face, pose, ocr, ...)
+    ↓
+Rule 3 Ingestion + 5W1H Agent
+    ↓
+📦 Phase 2 release ───→ release/phase2/latest/ (full model)
 ```
 
 ## Output Directory Structure
@@ -55,21 +83,31 @@ ASR complete → ASRX complete
 ```
 release/
 ├── phase1/
-│   ├── v1.0.0_20260509_120000/
-│   │   ├── output_json/        ← asr.json, asrx.json
-│   │   ├── schema.sql          ← chunks table DDL
-│   │   ├── chunks.csv          ← sentence chunks data
+│   ├── {version}_{timestamp}/
+│   │   ├── output_json/        ← all completed .json files
+│   │   ├── chunks.csv          ← sentence chunks
 │   │   ├── vectors.csv         ← sentence embeddings
+│   │   ├── schema.sql          ← chunks table DDL
 │   │   └── RELEASE_INFO.txt
-│   └── latest → v1.0.0_20260509_120000
+│   └── latest → {version}_{timestamp}
 │
 └── phase2/
-    ├── v1.0.0_20260509_140000/
-    │   ├── output_json/        ← all processor outputs
-    │   ├── schema.sql          ← full schema
-    │   ├── chunks.csv          ← all chunks
-    │   ├── vectors.csv         ← all embeddings
-    │   ├── identities.csv      ← person identities
+    ├── {version}_{timestamp}/
+    │   ├── output_json/        ← all .json files
+    │   ├── chunks.csv          ← all chunks
+    │   ├── vectors.csv         ← all embeddings
+    │   ├── identities.csv      ← person identities
+    │   ├── schema.sql          ← full schema
     │   └── RELEASE_INFO.txt
-    └── latest → v1.0.0_20260509_140000
+    └── latest → {version}_{timestamp}
 ```
+
+## momentry model vs momentry core
+
+| | momentry model | momentry core |
+|---|---|---|
+| Analogy | trained weights | inference engine |
+| Contents | `.json` + chunks + vectors | Rust binary |
+| Lifecycle | one produced per video | one binary serves all videos |
+| Versions | `{uuid}_v1_tiny` / `{uuid}_v2_full` | `momentry_playground` / `momentry` |
+| Delivered to | end users | deployment engineers |
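
Reviewer note: the Phase 1 and Phase 2 trigger conditions in the patched document amount to a gating rule over completed pipeline stages. The Python sketch below is one possible reading, not the worker's actual code. The stage labels (`rule1_ingestion`, `agent_5w1h`, and so on) and the functions `ready_phase` and `model_name` are hypothetical names introduced for illustration. The processor list in the patch ends in `...`, so the set here covers only the processors it names.

```python
from typing import Optional, Set

# Hypothetical stage labels for the steps named in the patch.
PHASE1_STAGES = frozenset({"asr", "asrx", "rule1_ingestion", "vectorize_chunks"})
# Only the processors the patch names explicitly; the real list may be longer.
PHASE2_EXTRA = frozenset(
    {"cut", "yolo", "face", "pose", "ocr", "rule3_ingestion", "agent_5w1h"}
)


def ready_phase(completed: Set[str]) -> Optional[str]:
    """Return the highest phase whose prerequisites are all complete, if any."""
    if not PHASE1_STAGES <= completed:
        return None
    if PHASE2_EXTRA <= completed:
        return "phase2"
    return "phase1"


def model_name(uuid: str, phase: str) -> str:
    """Map a phase to the model version name used in the patch's table."""
    suffix = {"phase1": "v1_tiny", "phase2": "v2_full"}[phase]
    return f"{uuid}_{suffix}"
```

Because Phase 2 includes everything in Phase 1, the check treats Phase 1 completion as a precondition for Phase 2 rather than as an independent path.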
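
Reviewer note: the directory layout in the last hunk implies that each release lands in its own `{version}_{timestamp}` directory and that `latest` is repointed afterwards, so that older releases coexist. A minimal Python sketch of that packaging step follows, assuming a POSIX filesystem. The function `package_release`, the temporary link name `.latest.tmp`, and the `RELEASE_INFO.txt` contents are hypothetical. The timestamp format is inferred from the `v1.0.0_20260509_120000` example in the removed lines.

```python
import os
from datetime import datetime
from pathlib import Path


def package_release(root: Path, phase: str, version: str, now: datetime) -> Path:
    """Create release/<phase>/<version>_<timestamp>/ and repoint `latest` at it."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    release_dir = root / phase / f"{version}_{stamp}"
    # exist_ok=False: never overwrite an existing release directory.
    (release_dir / "output_json").mkdir(parents=True, exist_ok=False)
    # Hypothetical metadata; the patch does not specify this file's format.
    (release_dir / "RELEASE_INFO.txt").write_text(
        f"phase={phase}\nversion={version}\nbuilt={stamp}\n"
    )

    # Build the new symlink under a temporary name, then rename it over
    # `latest`, so readers never observe a missing link.
    latest = root / phase / "latest"
    tmp = root / phase / ".latest.tmp"
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    os.symlink(release_dir.name, tmp)  # relative target, as in the tree diagram
    os.replace(tmp, latest)
    return release_dir
```

The sketch only creates the directory skeleton. Writing `chunks.csv`, `vectors.csv`, and `schema.sql` would depend on the database export path, which the patch does not describe.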