- Phase 1 = v1 (base model, sentence chunk embedding)
- Phase 2 = v2 (full pipeline + 5W1H)
- Naming leaves room for v3, v4, etc.
- Qdrant collection: momentry_dev_v1 (active model under dev)
- Release packaging exports Qdrant points snapshot
Momentry Model: Phased Delivery
Core Architecture
Pipeline (training)
│ each processor emits a .json file
│ Rule 1/3 Ingestion → chunks + embeddings
▼
momentry model for {video} ← one model per video
│ release/phase1/latest/
│ release/phase2/latest/
▼
momentry core (inference engine) ← Rust API server
│ momentry_playground (dev)
│ momentry (production)
▼
Search / Query / Identity APIs
- Pipeline = training phase: video → processor output → chunks → embeddings
- Model = the package produced for each video (output_json + chunks + vectors)
- Engine = momentry core, which loads a model and serves the APIs (search, trace, identity)
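Each video can have multiple model versions; the naming scheme leaves room for future upgrades:
| Model version | Qdrant collection | Contents | Trigger |
|---|---|---|---|
| `{uuid}_v1` | `momentry_dev_v1` | sentence chunk embedding (base) | ASR + ASRX + Rule 1 complete |
| `{uuid}_v2` | `momentry_dev_v2` | full pipeline + 5W1H | everything complete |
| `{uuid}_v3` | - | reserved for future upgrades | - |
Versions coexist; a newer version does not overwrite an older one.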
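The naming scheme can be illustrated with a small helper. This is a minimal sketch, not the project's actual code: `ModelVersion`, `parse_model_id`, and the `COLLECTIONS` mapping are hypothetical names, and the mapping only covers the two collections the table lists (v3 has none assigned yet).

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: only the collections named in the table exist so far.
COLLECTIONS = {1: "momentry_dev_v1", 2: "momentry_dev_v2"}

# A model id is "<video uuid>_v<version number>".
_MODEL_ID = re.compile(r"^(?P<uuid>.+)_v(?P<version>\d+)$")


@dataclass(frozen=True)
class ModelVersion:
    video_uuid: str
    version: int

    @property
    def model_id(self) -> str:
        # e.g. "{uuid}_v1" for the base model
        return f"{self.video_uuid}_v{self.version}"

    @property
    def collection(self) -> Optional[str]:
        # None for versions that are reserved but not yet assigned a collection
        return COLLECTIONS.get(self.version)


def parse_model_id(model_id: str) -> ModelVersion:
    """Split a model id back into its video uuid and version number."""
    match = _MODEL_ID.match(model_id)
    if match is None:
        raise ValueError(f"not a model id: {model_id!r}")
    return ModelVersion(match["uuid"], int(match["version"]))
```

Because the version is part of the identifier, a v2 model gets a new id and a new collection instead of replacing the v1 entry, which is what lets versions coexist.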
Phases
Phase 1: Sentence Chunk Embedding (base model)
Trigger: ASR + ASRX complete, then Rule 1 Ingestion and vectorize complete
Deliverables:
- `{uuid}.asr.json`
- `{uuid}.asrx.json`
- chunks (chunk_type = 'sentence')
- chunk_vectors (sentence embedding)
Purpose: end users can run semantic search
Phase 2: Full Pipeline (v2 model)
Trigger: all processors complete + Rule 3 Ingestion + 5W1H Agent
Deliverables:
- Everything in Phase 1
- All `{uuid}.*.json` files (cut, yolo, face, pose, ocr, ...)
- chunks (chunk_type = 'cut', 'visual', 'trace', 'story')
- chunk_vectors (summary embedding)
- identities / identity_bindings / face_detections
Purpose: full search + summarization + person identification
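One way to sanity-check a release against the deliverables listed for each phase is to compare the chunk types it contains with the types that phase is expected to ship. The sketch below assumes Phase 2 carries the Phase 1 sentence chunks plus the four additional types; `PHASE_CHUNK_TYPES` and `unexpected_chunk_types` are hypothetical names.

```python
from typing import Iterable, List

# chunk_type values per phase, taken from the deliverables lists.
# Phase 2 includes everything in Phase 1, so "sentence" appears in both.
PHASE_CHUNK_TYPES = {
    1: frozenset({"sentence"}),
    2: frozenset({"sentence", "cut", "visual", "trace", "story"}),
}


def unexpected_chunk_types(phase: int, chunk_types: Iterable[str]) -> List[str]:
    """Return the chunk types that do not belong in a release of this phase."""
    allowed = PHASE_CHUNK_TYPES[phase]
    return sorted(set(chunk_types) - allowed)
```

An empty result means the release contains only chunk types its phase is supposed to deliver; a Phase 1 release that already holds 'cut' chunks, for example, would be flagged.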
Worker Pipeline
ASR complete → ASRX complete
↓
Rule 1 Ingestion (sentence chunks)
↓
vectorize_chunks (sentence embedding)
↓
📦 Phase 1 release ───→ release/phase1/latest/ (base model)
↓
Remaining processors continue (yolo, face, pose, ocr, ...)
↓
Rule 3 Ingestion + 5W1H Agent
↓
📦 Phase 2 release ───→ release/phase2/latest/ (full model)
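The two release checkpoints in this flow can be expressed as a readiness check over the set of completed steps. This is a sketch under assumptions: the step identifiers below are hypothetical labels for the steps in the diagram, and `all_processors` stands in for "every remaining processor has finished".

```python
from typing import Iterable, List

# Hypothetical step identifiers; the document names the steps but not their keys.
PHASE1_STEPS = frozenset({"asr", "asrx", "rule1_ingestion", "vectorize_chunks"})
PHASE2_EXTRA_STEPS = frozenset({"all_processors", "rule3_ingestion", "5w1h_agent"})


def ready_phases(completed: Iterable[str]) -> List[int]:
    """Return the phases whose release can be packaged, in order."""
    done = set(completed)
    phases: List[int] = []
    if PHASE1_STEPS <= done:
        phases.append(1)
        # Phase 2 ships everything in Phase 1, so it also requires Phase 1.
        if PHASE2_EXTRA_STEPS <= done:
            phases.append(2)
    return phases
```

A worker could call `ready_phases` after each step finishes and package the Phase 1 release as soon as `1` appears, without waiting for the slower visual processors.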
Output Directory Structure
release/
├── phase1/
│   ├── {version}_{timestamp}/
│   │   ├── output_json/ ← all .json files completed so far
│   │   ├── chunks.csv ← sentence chunks
│   │   ├── vectors.csv ← sentence embeddings
│   │   ├── schema.sql ← chunks table DDL
│   │   └── RELEASE_INFO.txt
│   └── latest → {version}_{timestamp}
│
└── phase2/
    ├── {version}_{timestamp}/
    │   ├── output_json/ ← all .json files
    │   ├── chunks.csv ← all chunks
    │   ├── vectors.csv ← all embeddings
    │   ├── identities.csv ← person identities
    │   ├── schema.sql ← full schema
    │   └── RELEASE_INFO.txt
    └── latest → {version}_{timestamp}
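The `latest` entry in this layout is a symlink to the newest release directory. A minimal sketch of creating a release directory and repointing `latest` is shown below; `publish_release` is a hypothetical helper, the timestamp format is an assumption, and the symlink swap assumes a POSIX filesystem.

```python
import os
from pathlib import Path


def publish_release(root: Path, phase: str, version: str, timestamp: str) -> Path:
    """Create release/<phase>/<version>_<timestamp>/ and point `latest` at it."""
    phase_dir = root / phase
    release_dir = phase_dir / f"{version}_{timestamp}"
    (release_dir / "output_json").mkdir(parents=True, exist_ok=True)

    # Create the new link under a temporary name, then rename it over `latest`.
    # On POSIX the rename replaces the old link in one step, so readers never
    # observe a missing `latest`.
    tmp_link = phase_dir / ".latest.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    # Relative target, so the release tree can be moved as a whole.
    os.symlink(release_dir.name, tmp_link, target_is_directory=True)
    os.replace(tmp_link, phase_dir / "latest")
    return release_dir
```

Older release directories are left in place, so a consumer pinned to a specific `{version}_{timestamp}` keeps working after `latest` moves.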
momentry model vs momentry core
| | momentry model | momentry core |
|---|---|---|
| Analogy | trained weights | inference engine |
| Contents | `.json` + chunks + vectors | Rust binary |
| Lifecycle | one produced per video | one binary serves all videos |
| Version | `{uuid}_v1` (base) / `{uuid}_v2` | `momentry_playground` / `momentry` |
| Delivered to | end users | deployment engineers |