cleanup: remove dead code and duplicate docs
- Remove session-ses_2f27.md (161KB raw session log)
- Remove 49 ROOT_* duplicate files across REFERENCE/
- Remove 14 duplicate files between REFERENCE/ root and history/
- Remove asr_legacy.rs (dead code, replaced by asr.rs)
- Remove src/core/worker/ (duplicate JobWorker)
- Remove src/core/layers/ (empty directory)
- Remove 4 .bak files in src/
- Remove 7 dead private methods in worker/processor.rs
- Remove backup directory from git tracking
183
docs_v1.0/API_V1.0.0/INTERNAL/API_DICTIONARY_V1.0.0.md
Normal file
@@ -0,0 +1,183 @@
|
||||
---
|
||||
document_type: "reference_doc"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "Momentry Core API Dictionary V1.0.0"
|
||||
date: "2026-05-01"
|
||||
version: "V1.0"
|
||||
status: "active"
|
||||
owner: "Warren"
|
||||
created_by: "OpenCode"
|
||||
tags:
|
||||
- "momentry"
|
||||
- "core"
|
||||
- "api"
|
||||
- "dictionary"
|
||||
- "v1.0.0"
|
||||
ai_query_hints:
- "Momentry Core API dictionary lookup"
- "API endpoints and parameter descriptions"
- "API response format definitions"
- "List all Public/Internal/Admin API endpoints"
- "HTTP methods and path structure of the API endpoints"
- "Which endpoints does the search API have (search/bm25/hybrid/visual)"
- "Status classification of API endpoints (Public/Internal/Admin)"
|
||||
related_documents:
|
||||
- "API_V1.0.0/MOMENTRY_CORE_API_V1.0.0.md"
|
||||
- "API_V1.0.0/API_USAGE_DEMO_V1.0.0.md"
|
||||
- "API_V1.0.0/API_REFERENCE_v1.0.0.20260501md.md"
|
||||
- "API_V1.0.0/CHUNK_DEFINITION_V1.0.0.md"
|
||||
- "API_V1.0.0/VECTOR_SPEC_V1.0.0.md"
|
||||
---
|
||||
|
||||
# Momentry Core API Dictionary: Complete Reference V1.0.0

## Key Terms

| Term | Definition |
|------|------|
| Public API | Standard interface for the frontend and external systems (58 endpoints) |
| Internal API | Used for internal system workflows and status queries (5 endpoints) |
| Admin API | Administrator-only (5 endpoints) |
| file_uuid | 32-character file identifier (the first 32 hex characters of a SHA256 hash) |
| RESTful | Resource-centric API design style |
|
||||
|
||||
## 📊 Endpoint Statistics

| Category | Count | Description |
|---|---|---|
| ✅ **Public** | 58 | Standard interface for the frontend and external systems |
| ⚠️ **Internal** | 5 | Internal system workflows and status queries (e.g. Probe, SFTPGo) |
| 🔒 **Admin** | 5 | Administrator-only (e.g. Resources, Config Cache) |
| **Total** | **67** | All registered routes (`gen-traces` removed) |
|
||||
|
||||
| Item | Value |
|------|------|
| Created by | OpenCode |
| Created on | 2026-05-01 |
| Total endpoints | **68** |
| Document version | V1.1 (Route Fixes + Arch Notes) |
|
||||
|
||||
---
|
||||
|
||||
## 🚀 Design Principles

### 1. Clear API
* **Trimmed surface**: Endpoints are strictly split into **Public** and **Internal**. Redundant legacy paths (such as `/api/v1/videos` and `/api/v1/probe`) have all been removed or merged.
* **Standardized responses**: Every list-type API returns the same structure: `{ "success": true, "data": [...], "total": N }`.
* **Naming convention**: RESTful style; resources are named with plural nouns or explicit actions (e.g. `files`, `identities`).
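
The standardized list envelope can be checked on the client before the payload is used. The following is a minimal sketch, not part of the Momentry codebase: the names `ListResponse` and `isListResponse` are hypothetical, and only the three fields named in the envelope (`success`, `data`, `total`) are assumed.

```typescript
// Hypothetical TypeScript shape for the unified list envelope.
interface ListResponse<T> {
  success: boolean;
  data: T[];
  total: number;
}

// Runtime check that a parsed JSON value matches the envelope.
function isListResponse(value: unknown): value is ListResponse<unknown> {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { [key: string]: unknown };
  return (
    typeof v["success"] === "boolean" &&
    Array.isArray(v["data"]) &&
    typeof v["total"] === "number"
  );
}

// Sample payload built from the envelope described in this section.
const sampleList: unknown = JSON.parse(
  '{"success": true, "data": [{"file_uuid": "7ab7e25f48b58675e33aca44d15c1ecc"}], "total": 1}'
);
```

A guard like this lets calling code reject responses that use a different top-level key before it tries to read `data`.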
|
||||
|
||||
### 2. File-Centric
* **Unique identification**: Every media file (video, image, audio) is uniquely identified by a **32-character UUID** (`file_uuid`).
* **Lifecycle**: `File` is the root node of all data. Every `Chunk`, `Snapshot`, and `Jobs` entry belongs to a specific `File`.
* **Usage pattern**: The frontend should first call `GET /api/v1/files` to get the list, then load detailed resources via `POST /api/v1/files/:file_uuid/snapshots/migrate`.
|
||||
|
||||
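### 4. Trace Aggregation
* **Architecture**: `trace_face` aggregation is handled by a standalone Python script, `scripts/trace_face_aggregator.py`, and is **not** embedded in the Rust DB layer.
* **Flow**: The Face Processor (Python) writes discrete frame-level data to the `face_detections` table → the Rust Worker schedules `trace_face_aggregator.py` → the script reads the DB, groups and aggregates by `face_id`, and writes to `pre_chunks` (source_type=`trace_face`).
* **Rationale**: This keeps the Rust scheduling layer lightweight and keeps the trace aggregation logic in the Python layer, so the aggregation algorithm (e.g. IOU threshold, time-gap merging) can be adjusted later without recompiling Rust.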
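
To illustrate the group-by-`face_id` and time-gap merging idea, here is a minimal sketch. It is not the logic of `trace_face_aggregator.py` (which is Python and is not shown in this document); the `FaceDetection` and `FaceTrace` shapes and the `maxGap` parameter are assumptions made for illustration only.

```typescript
interface FaceDetection { face_id: string; frame: number; }
interface FaceTrace { face_id: string; start_frame: number; end_frame: number; }

// maxGap is a hypothetical tuning knob: the largest frame gap that still
// counts as the same continuous trace.
function aggregateTraces(detections: FaceDetection[], maxGap: number): FaceTrace[] {
  // Group frame numbers by face_id.
  const byFace: { [faceId: string]: number[] } = {};
  for (const d of detections) {
    const frames = byFace[d.face_id] ?? [];
    frames.push(d.frame);
    byFace[d.face_id] = frames;
  }

  const traces: FaceTrace[] = [];
  for (const faceId of Object.keys(byFace)) {
    const frames = (byFace[faceId] ?? []).slice().sort((a, b) => a - b);
    let current: FaceTrace | null = null;
    for (const frame of frames) {
      if (current !== null && frame - current.end_frame <= maxGap) {
        current.end_frame = frame; // extend the open trace
      } else {
        current = { face_id: faceId, start_frame: frame, end_frame: frame };
        traces.push(current); // start a new trace after a large gap
      }
    }
  }
  return traces;
}
```

Because the merging threshold is a plain parameter, it can be tuned without touching the scheduling layer, which is the design goal stated above.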
|
||||
|
||||
### 5. Global Identity
* **Cross-file association**: An `Identity` represents one distinct person or character and is not tied to a single file.
* **Binding**: `POST /api/v1/identities/bind` aggregates faces (`face`) or voices (`speaker`) detected across multiple files under the same `Identity`.
* **Data aggregation**: Querying an `Identity` shows that person's traces across all historical files (`/api/v1/identities/:uuid/files`).
|
||||
|
||||
---
|
||||
|
||||
## 1. System & Auth
| Method | Path | Status |
|
||||
|---|---|---|
|
||||
| `GET` | `/health` | ✅ Public |
|
||||
| `GET` | `/health/detailed` | ✅ Public |
|
||||
| `POST` | `/api/v1/auth/login` | ✅ Public |
|
||||
| `POST` | `/api/v1/auth/logout` | ✅ Public |
|
||||
|
||||
## 2. Files & Assets
| Method | Path | Status |
|
||||
|---|---|---|
|
||||
| `GET` | `/api/v1/files` | ✅ Public |
|
||||
| `GET` | `/api/v1/files/scan` | ✅ Public |
|
||||
| `POST` | `/api/v1/files/register` | ✅ Public |
|
||||
| `POST` | `/api/v1/unregister` | ✅ Public |
|
||||
| `GET` | `/api/v1/files/:file_uuid` | ✅ Public |
|
||||
| `GET` | `/api/v1/files/:file_uuid/identities` | ✅ Public |
|
||||
| `GET` | `/api/v1/files/:file_uuid/snapshots` | ✅ Public |
|
||||
| `GET` | `/api/v1/files/:file_uuid/snapshots/status` | ✅ Public |
|
||||
| `POST` | `/api/v1/files/:file_uuid/snapshots/migrate` | ✅ Public |
|
||||
| `POST` | `/api/v1/files/:file_uuid/snapshots/teardown` | ✅ Public |
|
||||
|
||||
## 3. Videos & Jobs
| Method | Path | Status |
|
||||
|---|---|---|
|
||||
| `DELETE` | `/api/v1/videos/:file_uuid` | ✅ Public |
|
||||
| `GET` | `/api/v1/videos/:file_uuid/details` | ✅ Public |
|
||||
| `GET` | `/api/v1/videos/:file_uuid/pre_chunks` | ✅ Public |
|
||||
| `GET` | `/api/v1/progress/:file_uuid` | ✅ Public |
|
||||
| `GET` | `/api/v1/jobs` | ✅ Public |
|
||||
| `GET` | `/api/v1/jobs/:job_id` | ✅ Public |
|
||||
| `GET` | `/api/v1/rules/:rule/status` | ✅ Public |
|
||||
| `GET` | `/api/v1/files/:file_uuid/probe` | ✅ Public |
|
||||
| `POST` | `/api/v1/files/:file_uuid/process` | ✅ Public |
|
||||
| `GET` | `/api/v1/assets/:uuid/status` | ⚠️ Internal |
|
||||
| `POST` | `/api/v1/resources/register` | 🔒 Admin |
| `POST` | `/api/v1/resources/heartbeat` | 🔒 Admin |
| `GET` | `/api/v1/resources` | 🔒 Admin |
|
||||
|
||||
## 4. Search
| Method | Path | Status |
|
||||
|---|---|---|
|
||||
| `POST` | `/api/v1/search` | ✅ Public |
|
||||
| `POST` | `/api/v1/search/bm25` | ✅ Public |
|
||||
| `POST` | `/api/v1/search/hybrid` | ✅ Public |
|
||||
| `POST` | `/api/v1/search/visual` | ✅ Public |
|
||||
| `POST` | `/api/v1/search/visual/class` | ✅ Public |
|
||||
| `POST` | `/api/v1/search/visual/density` | ✅ Public |
|
||||
| `POST` | `/api/v1/search/visual/combination` | ✅ Public |
|
||||
| `POST` | `/api/v1/search/visual/stats` | ✅ Public |
|
||||
|
||||
## 5. Identity & Binding
| Method | Path | Status |
|
||||
|---|---|---|
|
||||
| `GET` | `/api/v1/identities` | ✅ Public |
|
||||
| `GET` | `/api/v1/identities/:uuid` | ✅ Public |
|
||||
| `GET` | `/api/v1/identities/:uuid/files` | ✅ Public |
|
||||
| `GET` | `/api/v1/identities/:uuid/chunks` | ✅ Public |
|
||||
| `GET` | `/api/v1/identities/:identity_id/faces` | ✅ Public |
|
||||
| `POST` | `/api/v1/identities/from-person` | ✅ Public |
|
||||
| `POST` | `/api/v1/identities/from-face` | ✅ Public |
|
||||
| `POST` | `/api/v1/identities/bind` | ✅ Public |
|
||||
| `POST` | `/api/v1/identities/unbind` | ✅ Public |
|
||||
|
||||
## 6. Face
| Method | Path | Status |
|
||||
|---|---|---|
|
||||
| `GET` | `/api/v1/face/list` | ✅ Public |
|
||||
| `GET` | `/api/v1/face/:face_id` | ✅ Public |
|
||||
| `DELETE` | `/api/v1/face/:face_id` | ✅ Public |
|
||||
| `POST` | `/api/v1/face/recognize` | ✅ Public |
|
||||
| `POST` | `/api/v1/face/register` | ✅ Public |
|
||||
| `POST` | `/api/v1/face/search` | ✅ Public |
|
||||
| `GET` | `/api/v1/faces/candidates` | ✅ Public |
|
||||
| `GET` | `/api/v1/files/:file_uuid/faces/:face_id/thumbnail` | ✅ Public |
|
||||
| `GET` | `/api/v1/signals/unbound` | ✅ Public |
|
||||
| `GET` | `/api/v1/signals/:uuid/:binding_type/:binding_value/timeline` | ✅ Public |
|
||||
|
||||
## 7. Agents
| Method | Path | Status |
|
||||
|---|---|---|
|
||||
| `POST` | `/api/v1/agents/translate` | ✅ Public |
|
||||
| `POST` | `/api/v1/agents/5w1h/analyze` | ✅ Public |
|
||||
| `POST` | `/api/v1/agents/5w1h/batch` | ✅ Public |
|
||||
| `GET` | `/api/v1/agents/5w1h/status` | ✅ Public |
|
||||
| `POST` | `/api/v1/agents/identity/analyze` | ✅ Public |
|
||||
| `POST` | `/api/v1/agents/identity/suggest` | ✅ Public |
|
||||
| `GET` | `/api/v1/agents/identity/status` | ✅ Public |
|
||||
| `POST` | `/api/v1/agents/suggest/merge` | ✅ Public |
|
||||
|
||||
## 8. Stats
| Method | Path | Status |
|
||||
|---|---|---|
|
||||
| `GET` | `/api/v1/stats/ingest` | ✅ Public |
|
||||
| `GET` | `/api/v1/stats/sftpgo` | ⚠️ Internal |
|
||||
| `GET` | `/api/v1/stats/inference` | ⚠️ Internal |
|
||||
| `POST` | `/api/v1/config/cache` | 🔒 Admin |
|
||||
| `GET` | `/api/v1/lookup` | ✅ Public |
|
||||
310
docs_v1.0/API_V1.0.0/INTERNAL/API_REFERENCE_v1.0.0.20260501md.md
Normal file
@@ -0,0 +1,310 @@
|
||||
---
|
||||
document_type: "reference_doc"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "Momentry Core API Reference V1.0.0 (Complete Demo Guide)"
|
||||
date: "2026-05-01"
|
||||
version: "V3.0"
|
||||
status: "active"
|
||||
owner: "Warren"
|
||||
created_by: "OpenCode"
|
||||
tags:
|
||||
- "api"
|
||||
- "reference"
|
||||
- "v1.0.0"
|
||||
- "demo"
|
||||
- "marcom"
|
||||
ai_query_hints:
- "List the APIs required for the V1.0.0 Demo"
- "How does the Momentry Core Demo flow use the API?"
- "API flow for file registration, processing, and face binding"
- "Full steps of the Demo flow: Scan → Unregister → Register → Probe → Process → Faces → Bind"
- "curl examples and response formats for the API"
- "Common causes and fixes when Process returns 400 Bad Request"
- "Troubleshooting steps when the face query returns an empty result"
|
||||
related_documents:
|
||||
- "STANDARDS/DOCS_STANDARD.md"
|
||||
- "API_V1.0.0/MOMENTRY_CORE_API_V1.0.0.md"
|
||||
- "TEST_REPORT_CLI.md"
|
||||
---
|
||||
|
||||
# Momentry Core API Reference V1.0.0 (Complete Demo Guide)

## Key Terms

| Term | Definition |
|------|------|
| file_uuid | 32-character file identifier (the first 32 hex characters of a SHA256 hash) |
| X-API-Key | API authentication method, passed as an HTTP header |
| Scan | Scans the file system and lists every file with its current status |
| Register | Adds a file to the database |
| Probe | Reads file metadata (duration, resolution, frame rate) |
| Bind | Binds a face to a specified identity |
| Progress | Returns processing progress and the current stage |
|
||||
|
||||
## 📊 Document Statistics

| Item | Value |
|---|---|
| **Endpoints covered** | 15+ (core Demo flow) |
| **Coverage** | 100% of the Demo flow |
| **Test status** | ✅ CLI Verified |

| Item | Value |
|------|------|
| Created by | OpenCode |
| Created on | 2026-05-01 |
| Document version | V3.0 |
|
||||
|
||||
---
|
||||
|
||||
## 1. Demo Workflow

This document focuses on the APIs required by the **Demo test plan**. The full flow and the corresponding APIs are:

```
1. Scan        → GET  /api/v1/files/scan
2. Unregister  → POST /api/v1/unregister
3. Register    → POST /api/v1/files/register
4. Probe       → GET  /api/v1/files/:file_uuid/probe
5. Process     → POST /api/v1/files/:file_uuid/process
6. Progress    → GET  /api/v1/progress/:file_uuid
7. Faces       → GET  /api/v1/faces/candidates
8. Bind        → POST /api/v1/identities/bind
```
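
The eight steps above can also be kept as data so that a client or test script walks them in order. This is a minimal sketch under stated assumptions: `DemoStep`, `DEMO_STEPS`, and `resolveUrl` are hypothetical names, and the only behavior assumed is substituting the `:file_uuid` placeholder in the documented paths.

```typescript
type HttpMethod = "GET" | "POST";

interface DemoStep {
  name: string;
  method: HttpMethod;
  path: string; // may contain the :file_uuid placeholder
}

// The Demo flow in the order listed in this section.
const DEMO_STEPS: DemoStep[] = [
  { name: "Scan", method: "GET", path: "/api/v1/files/scan" },
  { name: "Unregister", method: "POST", path: "/api/v1/unregister" },
  { name: "Register", method: "POST", path: "/api/v1/files/register" },
  { name: "Probe", method: "GET", path: "/api/v1/files/:file_uuid/probe" },
  { name: "Process", method: "POST", path: "/api/v1/files/:file_uuid/process" },
  { name: "Progress", method: "GET", path: "/api/v1/progress/:file_uuid" },
  { name: "Faces", method: "GET", path: "/api/v1/faces/candidates" },
  { name: "Bind", method: "POST", path: "/api/v1/identities/bind" },
];

// Build the full URL for one step by filling in the file_uuid placeholder.
function resolveUrl(baseUrl: string, step: DemoStep, fileUuid: string): string {
  return baseUrl + step.path.replace(":file_uuid", fileUuid);
}
```

Keeping Probe ahead of Process in the list mirrors the ordering requirement stated for those two endpoints.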
|
||||
|
||||
---
|
||||
|
||||
## 2. Quick Reference

- **Base URL (Dev)**: `http://localhost:3003`
- **Base URL (Prod)**: `http://localhost:3002`
- **Authentication**: header `X-API-Key: muser_test_001`
- **Test key**: `muser_test_001`
|
||||
|
||||
---
|
||||
|
||||
## 3. API Details (in Demo Order)

### 3.1 Scan Files
**Path**: `GET /api/v1/files/scan`

**Purpose**: Lists every file in the file system with its current status. **This is the first step of the Demo flow.**

**Response**:
|
||||
```json
|
||||
{
|
||||
"files": [
|
||||
{
|
||||
"file_name": "A12T3-Share-User Experience of Thunderbolt 3 Shareable Storage.mp4",
|
||||
"file_path": "/Users/accusys/momentry/var/sftpgo/data/demo/A12T3-Share-User Experience of Thunderbolt 3 Shareable Storage.mp4",
|
||||
"file_uuid": "7ab7e25f48b58675e33aca44d15c1ecc",
|
||||
"is_registered": true,
|
||||
"status": "processing"
|
||||
}
|
||||
],
|
||||
"total": 20,
|
||||
"registered_count": 20,
|
||||
"unregistered_count": 0
|
||||
}
|
||||
```
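
A client typically uses the Scan result to decide which files still need registering. The sketch below types the response shown above and picks those paths; the interface and function names are hypothetical, and the field list is taken only from the sample response.

```typescript
interface ScannedFile {
  file_name: string;
  file_path: string;
  file_uuid: string;
  is_registered: boolean;
  status: string;
}

interface ScanResponse {
  files: ScannedFile[];
  total: number;
  registered_count: number;
  unregistered_count: number;
}

// Paths that can be passed to POST /api/v1/files/register.
function pathsToRegister(scan: ScanResponse): string[] {
  return scan.files.filter((f) => !f.is_registered).map((f) => f.file_path);
}
```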
|
||||
|
||||
---
|
||||
|
||||
### 3.2 Unregister File
**Path**: `POST /api/v1/unregister`

**Purpose**: Pick a `file_uuid` from the Scan result and unregister that file.

**Request**:
|
||||
```json
|
||||
{
|
||||
"uuid": "53e3a229bf68878b7a799e811e097f9c"
|
||||
}
|
||||
```
|
||||
|
||||
**Response**:
|
||||
```json
|
||||
{
|
||||
"success": true,
|
||||
"uuid": "53e3a229bf68878b7a799e811e097f9c",
|
||||
"message": "File unregistered successfully"
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 3.3 Register File
**Path**: `POST /api/v1/files/register`

**Purpose**: Pick a `file_path` from the Scan result and add the file to the database.

**Request**:
|
||||
```json
|
||||
{
|
||||
"file_path": "/Users/accusys/momentry/var/sftpgo/data/demo/view15.mp4"
|
||||
}
|
||||
```
|
||||
|
||||
**Response**:
|
||||
```json
|
||||
{
|
||||
"success": true,
|
||||
"file_uuid": "53e3a229bf68878b7a799e811e097f9c",
|
||||
"file_name": "view15.mp4",
|
||||
"file_path": "/Users/.../demo/view15.mp4",
|
||||
"already_exists": false
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 3.4 Probe File
**Path**: `GET /api/v1/files/:file_uuid/probe`

**Purpose**: Reads the file's metadata (duration, resolution, frame rate). **Must be run before Process.**

**Response**:
|
||||
```json
|
||||
{
|
||||
"file_uuid": "7ab7e25f48b58675e33aca44d15c1ecc",
|
||||
"file_name": "A12T3-Share-User Experience of Thunderbolt 3 Shareable Storage.mp4",
|
||||
"duration": 621.55,
|
||||
"width": 1920,
|
||||
"height": 1080,
|
||||
"fps": 29.97,
|
||||
"cached": true
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 3.5 Process File
**Path**: `POST /api/v1/files/:file_uuid/process`

**Purpose**: Starts the backend Worker to run analysis (ASR, Face, YOLO, etc.).

**Request**:
|
||||
```json
|
||||
{}
|
||||
```
|
||||
|
||||
**Response**:
|
||||
```json
|
||||
{
|
||||
"success": true,
|
||||
"message": "Processing started"
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 3.6 Progress
**Path**: `GET /api/v1/progress/:file_uuid`

**Purpose**: Returns processing progress and the current stage.

**Response**:
|
||||
```json
|
||||
{
|
||||
"file_uuid": "53e3a229bf68878b7a799e811e097f9c",
|
||||
"overall_progress": 65,
|
||||
"current_processor": "face",
|
||||
"status": "running",
|
||||
"processors": [
|
||||
{ "name": "probe", "status": "completed" },
|
||||
{ "name": "asr", "status": "completed" },
|
||||
{ "name": "face", "status": "running" }
|
||||
]
|
||||
}
|
||||
```
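
When polling this endpoint, a client needs a rule for "done". The sketch below is one possible reading of the sample response: it assumes that a processor is finished when its `status` is `"completed"` and that `overall_progress` reaches 100 at the end. The type and function names are hypothetical.

```typescript
interface ProcessorState { name: string; status: string; }

interface ProgressResponse {
  file_uuid: string;
  overall_progress: number;
  current_processor: string;
  status: string;
  processors: ProcessorState[];
}

// Names of processors that have not reported "completed" yet.
function pendingProcessors(p: ProgressResponse): string[] {
  return p.processors.filter((x) => x.status !== "completed").map((x) => x.name);
}

// Assumed completion rule: 100% overall and nothing pending.
function isFinished(p: ProgressResponse): boolean {
  return p.overall_progress >= 100 && pendingProcessors(p).length === 0;
}
```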
|
||||
|
||||
---
|
||||
|
||||
### 3.7 List Face Candidates
**Path**: `GET /api/v1/faces/candidates`

**Purpose**: Lists faces in a file that have not yet been bound to an identity.

**Query Parameters**:
- `file_uuid` (required): file UUID
- `min_confidence` (optional): minimum confidence (default 0.5)
- `page_size` (optional): items per page (default 20)

**Response**:
```json
{
  "candidates": [
    {
      "id": 123,
      "face_id": "123_RoleA",
      "file_uuid": "384b0ff44aaaa1f14cb2cd63b3fea966",
      "frame_number": 115,
      "confidence": 0.98,
      "bbox": { "x": 50, "y": 50, "w": 100, "h": 100 }
    }
  ],
  "total": 1,
  "page": 1,
  "page_size": 20
}
```

---

### 3.8 Bind Identity
|
||||
**Path**: `POST /api/v1/identities/bind`

**Purpose**: Binds a face to a specified identity (or creates a new identity).

**Request**:
|
||||
```json
|
||||
{
|
||||
"identity_id": 22,
|
||||
"binding_type": "face",
|
||||
"binding_value": "123_RoleA"
|
||||
}
|
||||
```
|
||||
|
||||
**Response**:
|
||||
```json
|
||||
{
|
||||
"success": true,
|
||||
"message": "Bound face '123_RoleA' to Identity 'Cary Grant'"
|
||||
}
|
||||
```
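
The bind request can target either an existing identity or a new one. The sketch below models that choice as a union type. It is an illustration only: the `name` variant is taken from the integration guide's description of the same endpoint, `"speaker"` comes from the binding description in the API dictionary, and all type and function names are hypothetical.

```typescript
type BindingType = "face" | "speaker";

// Either bind to an existing identity or ask for a new one by name.
type BindTarget = { identity_id: number } | { name: string };

type BindRequest = BindTarget & {
  binding_type: BindingType;
  binding_value: string;
};

function buildBindRequest(
  target: BindTarget,
  bindingType: BindingType,
  bindingValue: string
): BindRequest {
  if ("identity_id" in target) {
    return { identity_id: target.identity_id, binding_type: bindingType, binding_value: bindingValue };
  }
  return { name: target.name, binding_type: bindingType, binding_value: bindingValue };
}
```

Calling `buildBindRequest({ identity_id: 22 }, "face", "123_RoleA")` produces the same body as the request example above.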
|
||||
|
||||
---
|
||||
|
||||
## 4. Supplementary APIs (Optional for Demo)

### 4.1 List Identities
**Path**: `GET /api/v1/identities`

**Purpose**: Lists every identity created in the system.
|
||||
|
||||
---
|
||||
|
||||
## 5. FAQ

### Q1: Why does Process return 400 Bad Request?
**Ans**: Run **Probe** (`GET /api/v1/files/:file_uuid/probe`) first so that the system knows the file's frame information.

### Q2: Why does Unregister return 404?
**Ans**: Check that the server has been updated to the latest version. Older versions may not include this route.

### Q3: Why does the face query return an empty result?
**Ans**:
1. Confirm the file has **finished processing** (Progress = 100%).
2. Try lowering the `min_confidence` parameter (for example, to 0.0).
3. Confirm the file actually contains recognizable faces.
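
For the second troubleshooting step, the query string can be built explicitly so that `min_confidence=0` is actually sent rather than dropped as a falsy value. This is a minimal sketch; `CandidateQuery` and `candidatesUrl` are hypothetical names, and only the three documented query parameters are used.

```typescript
interface CandidateQuery {
  file_uuid: string;
  min_confidence?: number;
  page_size?: number;
}

// Build the GET /api/v1/faces/candidates URL from the documented parameters.
function candidatesUrl(baseUrl: string, q: CandidateQuery): string {
  const parts = ["file_uuid=" + encodeURIComponent(q.file_uuid)];
  // Compare against undefined so that an explicit 0 is still included.
  if (q.min_confidence !== undefined) parts.push("min_confidence=" + q.min_confidence);
  if (q.page_size !== undefined) parts.push("page_size=" + q.page_size);
  return baseUrl + "/api/v1/faces/candidates?" + parts.join("&");
}
```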
|
||||
|
||||
---
|
||||
|
||||
## 6. Version History

| Version | Date | Purpose | Operator |
|------|------|------|--------|
| V1.0 | 2026-04-30 | Initial API list | OpenCode |
| V2.0 | 2026-05-01 | Filled in documentation based on Production test results | OpenCode |
| V3.0 | 2026-05-01 | Restructured around the Demo flow; completed the Probe/Unregister descriptions | OpenCode |
| V3.1 | 2026-05-01 | Fixed `:uuid`→`:file_uuid`, fixed port 3002→3003, removed the duplicate Scan section | OpenCode |
|
||||
376
docs_v1.0/API_V1.0.0/INTERNAL/API_USAGE_DEMO_V1.0.0.md
Normal file
@@ -0,0 +1,376 @@
|
||||
---
|
||||
document_type: "develop_guide"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "Momentry Core V1.0.0 API Demo and Integration Guide"
|
||||
date: "2026-05-01"
|
||||
version: "V1.0"
|
||||
status: "active"
|
||||
owner: "Warren"
|
||||
created_by: "OpenCode"
|
||||
tags:
|
||||
- "momentry"
|
||||
- "core"
|
||||
- "api-usage"
|
||||
- "demo"
|
||||
- "n8n"
|
||||
- "wordpress"
|
||||
ai_query_hints:
- "Contents of the V1.0.0 API demo and integration guide"
- "How to call the V1.0.0 API from n8n?"
- "How to integrate the V1.0.0 API into WordPress?"
- "curl examples for the V1.0.0 API"
- "Integrating the V1.0.0 API from PHP (wp_remote_request)"
- "How an n8n workflow connects to the V1.0.0 API"
- "API steps for Face binding error correction"
- "How to implement frontend Face Interpolation"
|
||||
related_documents:
|
||||
- "API_V1.0.0/MOMENTRY_CORE_API_V1.0.0.md"
|
||||
- "API_V1.0.0/API_DICTIONARY_V1.0.0.md"
|
||||
- "API_V1.0.0/API_REFERENCE_v1.0.0.20260501md.md"
|
||||
- "API_V1.0.0/CHUNK_DEFINITION_V1.0.0.md"
|
||||
- "API_V1.0.0/PROCESSOR_SELECTION_V1.0.0.md"
|
||||
---
|
||||
|
||||
# Momentry Core V1.0.0 API Demo and Integration Guide

| Item | Value |
|------|------|
| Created by | OpenCode |
| Created on | 2026-05-01 |
| Document version | V1.0 |
| Applies to | Momentry Core V1.0.0+ |
|
||||
|
||||
---
|
||||
|
||||
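## Key Terms

| Term | Definition |
|------|------|
| file_uuid | 32-character file identifier (the first 32 hex characters of a SHA256 hash) |
| X-API-Key | API authentication method, passed as an HTTP header |
| face_id | ID of a face detection in a single frame, formatted as `<detection ID>_<role suffix>` |
| Identity | Global person identity that links the same person across files |
| Face Interpolation | Frontend linear interpolation that fills in the display of face markers that are not recorded on every frame |
| Scan | Scans the file system and lists every file with its current status |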
|
||||
|
||||
## 1. Quick Start

### 1.1 Environment URLs

| Environment | URL | Purpose |
|------|-----|------|
| **External URL** | `https://api.momentry.ddns.net` | External access |
| **Dev Server** | `http://localhost:3003` | **Development environment, used for all testing** |
| **Local Server** | `http://localhost:3002` | Production, release only |

### 1.2 Test the Connection
|
||||
|
||||
```bash
|
||||
curl http://localhost:3003/health
|
||||
```
|
||||
|
||||
```json
|
||||
{
|
||||
"status": "ok",
|
||||
"version": "1.0.0 (build: ...)",
|
||||
"uptime_ms": 64880
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 2. Core API Workflows

### 2.1 Scan Files
**Entry API**: `GET /api/v1/files/scan`. Every Demo flow starts here.

**Scan files**:
|
||||
```bash
|
||||
curl -s "http://localhost:3003/api/v1/files/scan" \
|
||||
-H "X-API-Key: <your_api_key>"
|
||||
```
|
||||
|
||||
**List files (paginated)**:
|
||||
```bash
|
||||
curl -s "http://localhost:3003/api/v1/files?page=1&page_size=10" \
|
||||
-H "X-API-Key: <your_api_key>"
|
||||
```
|
||||
|
||||
**Get details for a single file**:
|
||||
```bash
|
||||
curl -s "http://localhost:3003/api/v1/files/<file_uuid>" \
|
||||
-H "X-API-Key: <your_api_key>"
|
||||
```
|
||||
|
||||
### 2.2 Search
Supports semantic, hybrid, and visual search.
|
||||
|
||||
```bash
|
||||
curl -X POST "http://localhost:3003/api/v1/search" \
|
||||
-H "X-API-Key: <your_api_key>" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"query": "尋找紅色信封", "uuid": "<file_uuid>"}'
|
||||
```
|
||||
|
||||
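### 2.3 Single Face Binding Workflow

This flow is for manually linking a specific face to a known person or creating a new person. The system supports **one person playing multiple roles**, distinguished by adding a role suffix to the `face_id`.

#### Step 1: Select a Face (Input Format)
The user supplies a **`file_uuid`** together with a **`face_id`** to pin down the target.
"Selecting" here means entering the combination **`<file_uuid>:<face_id>`**.

* **Naming rule**: A `face_id` usually has the form `<original detection ID>_<suffix>`, which distinguishes different face entities or roles of the same person.
* **With a role name**: use the role name (e.g. `123_PeterJoshua`).
* **Without a role name**: use a generic label (e.g. `123_RoleA`, `123_RoleB`).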
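
A client can validate the `<file_uuid>:<face_id>` input before calling the API. The sketch below is an assumption-laden illustration: it treats the detection ID as digits and the suffix as alphanumeric, which matches the examples here (`123_PeterJoshua`, `123_RoleA`) but is not confirmed as a rule. `FaceTarget` and `parseFaceTarget` are hypothetical names.

```typescript
interface FaceTarget {
  fileUuid: string;
  faceId: string;      // full face_id, e.g. "123_PeterJoshua"
  detectionId: string; // part before the underscore
  suffix: string;      // role name or generic label
}

// Parse "<file_uuid>:<face_id>"; returns null when the input does not match.
function parseFaceTarget(input: string): FaceTarget | null {
  const m = /^([0-9a-f]{32}):((\d+)_([A-Za-z0-9]+))$/.exec(input);
  if (m === null) return null;
  return {
    fileUuid: m[1] ?? "",
    faceId: m[2] ?? "",
    detectionId: m[3] ?? "",
    suffix: m[4] ?? "",
  };
}
```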
|
||||
|
||||
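#### Step 2: List Identities or Add a New Identity
The user decides whether to bind the Face to a global person (Identity) that already exists in the system or to create a new one.
* **Identity property**: represents a real-world person and is **globally unique** (e.g. "Cary Grant").

- **Option A: List people**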
|
||||
```bash
|
||||
curl -s "http://localhost:3003/api/v1/identities?page=1&page_size=20" \
|
||||
-H "X-API-Key: <your_api_key>"
|
||||
```
|
||||
|
||||
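- **Option B: Decide on a name for the new person**
If no matching person is in the list, the user prepares a new name (e.g. "Cary Grant").

#### Step 3: Confirm the Binding
Complete the binding with `POST /api/v1/identities/bind`.
* **If `identity_id` is provided**: the suffixed `face_id` is bound to that person.
* **If `name` is provided**: the system automatically creates a new person (Identity) and binds the face to it.

- **Bind to an existing identity (example)**:
Suppose the target is face `123_PeterJoshua` in file `file_uuid_abc`.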
|
||||
```bash
|
||||
curl -X POST "http://localhost:3003/api/v1/identities/bind" \
|
||||
-H "X-API-Key: <your_api_key>" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"identity_id": 101,
|
||||
"binding_type": "face",
|
||||
"binding_value": "123_PeterJoshua"
|
||||
}'
|
||||
```
|
||||
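*Note: Although the API receives only `binding_value`, the system internally uses the selected `file_uuid` and `face_id` combination to pin down the exact target.*

#### Step 4: Repeat
After the binding completes, return to the list and process the next unbound Face.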
|
||||
|
||||
---
|
||||
|
||||
### 2.4 Retrieve Face Snapshots

Before confirming a binding, you usually need to look at the face snapshot. Depending on the scenario, there are two ways to get it:

#### 1. Local Path / Filename
* **Use for**: Tauri desktop apps, local scripts.
* **Description**: Reads the image file directly from disk. This is the fastest option and does not go through the network layer.
* **Path**: `<MOMENTRY_OUTPUT_DIR>/<file_uuid>/snapshots/faces/<face_id>.jpg`

#### 2. URL
* **Use for**: Web frontends, external systems.
* **Description**: Fetches the image stream with an HTTP GET request.
* **API Endpoint**: `GET /api/v1/files/<file_uuid>/faces/<face_id>/thumbnail`
* **Example**:
|
||||
```bash
|
||||
curl -s -o face.jpg \
|
||||
"http://localhost:3003/api/v1/files/<file_uuid>/faces/<face_id>/thumbnail" \
|
||||
-H "X-API-Key: <your_api_key>"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
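### 2.4.1 Face Interpolation Logic (Frontend)

Faces are not marked on every frame (to save compute, or because of the sampling rate). If the Client displays the raw data during **frame-by-frame playback** or **timeline scrubbing**, the face boxes flicker on and off.

#### How It Works
The frontend needs to implement **linear interpolation**:

1. **Fetch the data**: Get the coordinate list for the `face_id` at every `frame_number` from the API (for example, data exists for Frame 10 and Frame 15).
2. **Interpolate**:
   * When the user stops on **Frame 12**, there is no direct data.
   * The frontend finds the nearest frames with data before and after it (Frame 10 and Frame 15).
   * It computes the Frame 12 coordinates `x, y, w, h` in proportion to the frame distance.

#### Implementation Example (JavaScript/TypeScript)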
|
||||
|
||||
```typescript
interface BBox { x: number; y: number; w: number; h: number }
interface Detection { frame: number; bbox: BBox }

// Trajectory points for one face, as returned by the API.
// The list is assumed to be sorted by frame in ascending order.
const detections: Detection[] = [
  { frame: 10, bbox: { x: 100, y: 100, w: 50, h: 60 } },
  { frame: 15, bbox: { x: 110, y: 105, w: 50, h: 60 } },
];

// Compute the predicted box for a frame, e.g. Frame 12
function getInterpolatedBBox(frameIndex: number, detections: Detection[]): BBox | null {
  // Nearest detection at or before the frame, and the first one after it.
  // (A plain find() would return the earliest match, not the nearest.)
  let prev: Detection | null = null;
  let next: Detection | null = null;
  for (const d of detections) {
    if (d.frame <= frameIndex) {
      prev = d;
    } else {
      next = d;
      break;
    }
  }

  if (!prev) return null;      // the face has not appeared yet
  if (!next) return prev.bbox; // trajectory ended, hold the last position

  // Ratio in [0.0, 1.0)
  const ratio = (frameIndex - prev.frame) / (next.frame - prev.frame);
  const lerp = (a: number, b: number) => a + (b - a) * ratio;

  return {
    x: lerp(prev.bbox.x, next.bbox.x),
    y: lerp(prev.bbox.y, next.bbox.y),
    // w and h are interpolated with the same logic
    w: lerp(prev.bbox.w, next.bbox.w),
    h: lerp(prev.bbox.h, next.bbox.h),
  };
}
```
|
||||
|
||||
---
|
||||
|
||||
### 2.5 Face Binding Error Correction

This flow removes an incorrectly bound face so that it returns to the unbound state.

1. **Select the Face**: Confirm the `face_id` to unbind and the `file_uuid` it belongs to.
2. **Unbind**:
|
||||
```bash
|
||||
curl -X POST "http://localhost:3003/api/v1/identities/unbind" \
|
||||
-H "X-API-Key: <your_api_key>" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"binding_type": "face",
|
||||
"binding_value": "<selected_face_id>"
|
||||
}'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. n8n Integration Example

### 3.1 HTTP Request Settings

| Field | Value |
|---|---|
| Method | `GET` or `POST` |
| URL | `http://localhost:3003/api/v1/files` (Dev) or `https://<your-domain>` (Prod) |
| Header `X-API-Key` | `<your_api_key>` |

### 3.2 List Files Workflow (JSON)
Uses `GET /api/v1/files/scan` as the entry point.
|
||||
|
||||
```json
|
||||
{
|
||||
"nodes": [
|
||||
{
|
||||
"name": "Get Files",
|
||||
"type": "n8n-nodes-base.httpRequest",
|
||||
"parameters": {
|
||||
"method": "GET",
|
||||
"url": "http://localhost:3003/api/v1/files/scan",
|
||||
"sendHeaders": true,
|
||||
"headerParameters": {
|
||||
"parameters": [{ "name": "X-API-Key", "value": "{{ $env.API_KEY }}" }]
|
||||
},
|
||||
"options": { "qs": { "page": 1, "page_size": 10 } }
|
||||
},
|
||||
"position": [450, 300]
|
||||
},
|
||||
{
"name": "Extract List",
"type": "n8n-nodes-base.code",
"parameters": {
"jsCode": "return $input.first().json.files.map(f => ({\n json: {\n uuid: f.file_uuid,\n name: f.file_name,\n status: f.status\n }\n}));"
},
|
||||
"position": [650, 300]
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. WordPress / PHP Integration Example

### 4.1 PHP Client Library (V1.0.0 Compatible)
|
||||
|
||||
```php
|
||||
<?php
|
||||
class Momentry_API {
|
||||
private const API_URL = 'http://localhost:3003'; // Dev environment
|
||||
private const API_KEY = '<your_api_key>';
|
||||
|
||||
private function request(string $endpoint, array $data = [], string $method = 'GET'): array {
|
||||
$url = self::API_URL . $endpoint;
|
||||
$args = [
|
||||
'headers' => [
|
||||
'X-API-Key' => self::API_KEY,
|
||||
'Content-Type' => 'application/json',
|
||||
],
|
||||
'timeout' => 30,
|
||||
];
|
||||
|
||||
if ($method === 'POST') {
|
||||
$args['method'] = 'POST';
|
||||
$args['body'] = json_encode($data);
|
||||
}
|
||||
|
||||
$response = wp_remote_request($url, $args);
|
||||
if (is_wp_error($response)) {
|
||||
throw new Exception($response->get_error_message());
|
||||
}
|
||||
return json_decode(wp_remote_retrieve_body($response), true) ?? []; // json_decode returns null on invalid JSON
|
||||
}
|
||||
|
||||
// Scan files
|
||||
public function scan_files(): array {
|
||||
return $this->request('/api/v1/files/scan');
|
||||
}
|
||||
|
||||
// List files
|
||||
public function list_files(): array {
|
||||
return $this->request('/api/v1/files');
|
||||
}
|
||||
|
||||
// Search
|
||||
public function search(string $query): array {
|
||||
return $this->request('/api/v1/search', ['query' => $query], 'POST');
|
||||
}
|
||||
}
|
||||
?>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Troubleshooting

| Error | Cause | Solution |
|------|------|----------|
| `401 Unauthorized` | Invalid API key | Check the key format and permissions |
| `404 Not Found` | Endpoint does not exist | Check whether the legacy `/api/v1/videos` path is being used; switch to `/api/v1/files` |
| `400 Bad Request on Process` | Missing Probe data | Run `GET /api/v1/files/:file_uuid/probe` first |
| `500 Error` | Server error | Check the database connection and Schema version |
|
||||
|
||||
---
|
||||
|
||||
## 6. Version History

| Version | Date | Purpose | Operator | Tool/Model |
|------|------|------|--------|-----------|
| V1.0 | 2026-05-01 | Initial version | OpenCode | deepseek-chat |
| V1.1 | 2026-05-01 | Changed port to Dev (3003); updated API paths and the scan entry point | OpenCode | deepseek-chat |
|
||||
|
||||
---
|
||||
|
||||
## 7. Appendix: UUID Format

V1.0.0 uses the **first 32 characters of a SHA256 hash** as the `file_uuid`.
|
||||
|
||||
```
|
||||
/Users/.../demo/video.mp4
|
||||
↓
|
||||
SHA256 Hash (first 32 characters)
|
||||
↓
|
||||
53e3a229bf68878b7a799e811e097f9c
|
||||
```
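
The truncation step in the diagram can be sketched as follows. The functions take an already computed SHA256 hex digest, because this appendix does not spell out exactly which bytes are hashed. The function names are hypothetical, and the lowercase-hex check is an assumption based on the sample values.

```typescript
// True when the value looks like a file_uuid: 32 lowercase hex characters.
function isFileUuid(value: string): boolean {
  return /^[0-9a-f]{32}$/.test(value);
}

// Keep the first 32 characters of a 64-character SHA256 hex digest.
function fileUuidFromDigest(sha256Hex: string): string {
  if (!/^[0-9a-f]{64}$/.test(sha256Hex)) {
    throw new Error("expected a 64-character lowercase hex SHA256 digest");
  }
  return sha256Hex.slice(0, 32);
}
```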
|
||||
198
docs_v1.0/API_V1.0.0/INTERNAL/CHUNK_DEFINITION_V1.0.0.md
Normal file
@@ -0,0 +1,198 @@
|
||||
---
|
||||
document_type: "spec"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "Chunk Definition V1.0.0"
|
||||
date: "2026-05-01"
|
||||
version: "V1.0"
|
||||
status: "active"
|
||||
owner: "Warren"
|
||||
created_by: "OpenCode"
|
||||
tags:
|
||||
- "momentry"
|
||||
- "core"
|
||||
- "chunk"
|
||||
- "v1.0.0"
|
||||
- "chunk-type"
|
||||
- "pre-chunk"
|
||||
- "parent-child"
|
||||
- "data-structure"
|
||||
ai_query_hints:
- "Definition and structure of a chunk"
- "Relationship between pre_chunk and chunk"
- "Relationship between parent_chunk and child_chunk"
- "Which types does ChunkType include (Sentence/Cut/Visual/Trace/Story)"
- "Nested structure of chunks and Rule combination rules"
- "How a chunk maps to a file_uuid and a frame range"
- "Search use of chunks and how vectors are stored"
- "Two-layer data architecture of chunk and pre_chunk"
|
||||
related_documents:
|
||||
- "PROCESSOR_SELECTION_V1.0.0.md"
|
||||
- "VECTOR_SPEC_V1.0.0.md"
|
||||
- "PROCESSORS/ASR_V1.0.0.md"
|
||||
- "PROCESSORS/CUT_V1.0.0.md"
|
||||
- "PROCESSORS/FACE_V1.0.0.md"
|
||||
---
|
||||
|
||||
# Chunk Definition V1.0.0

| Item | Value |
|------|------|
| Created by | OpenCode |
| Created on | 2026-05-01 |
| Document version | V1.0 |
|
||||
|
||||
## Terminology

| Term | Definition | Example |
|------|------|------|
| **Processor JSON** | First-level output file of a Processor script | `384b0ff44aaaa1f14cb2cd63b3fea966.face.json` |
| **pre_chunk** | Lowest-level element imported from Processor JSON into the DB (`pre_chunks` table) | single-frame face detection, single-sentence ASR text |
| **chunk** | Searchable unit (`chunks` table), produced by a Rule that combines pre_chunks; `start_frame` to `end_frame` defines its range | sentence chunk, visual chunk, scene chunk |
| **parent_chunk** | A kind of chunk that carries `child_chunk_ids`; its range covers several child_chunks, and the Summary Agent produces its consolidated description | scene chunk, story chunk |
| **child_chunk** | A kind of chunk that a parent_chunk references as a child element | sentence chunk, visual chunk |
|
||||
|
||||
---
|
||||
|
||||
## Chunk Structure
|
||||
|
||||
```rust
|
||||
use serde_json::Value; // assumption: JSON payloads are held as serde_json values

pub enum ChunkType { Sentence, Cut, Visual, Trace, Story }
pub enum ChunkRule { Rule1, Rule2, Rule3 }

pub struct Chunk {
    pub uuid: String,                    // file_uuid (32-char hex)
    pub chunk_id: String,                // "{uuid}_{chunk_index}"
    pub chunk_index: u32,                // 0-based sequence number
    pub chunk_type: ChunkType,           // Sentence | Cut | Visual | Trace | Story
    pub rule: ChunkRule,                 // Rule1 (direct combination) | Rule2 (aggregation) | Rule3
    pub start_frame: i64,                // start frame (0-based, the only time reference)
    pub end_frame: i64,                  // end frame (exclusive)
    pub fps: f64,                        // fps for this range
    pub content: Value,                  // main content
    pub text_content: Option<String>,    // plain-text content (used for search)
    pub metadata: Option<Value>,         // speaker, face_ids, yolo_objects, etc.
    pub pre_chunk_ids: Vec<i32>,         // source pre_chunks (traceability to raw elements)
    pub parent_chunk_id: Option<String>, // parent chunk ID (if any)
    pub child_chunk_ids: Vec<String>,    // child chunk IDs (if this is a parent_chunk)
    pub vector_id: Option<String>,       // vector-store reference
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## ChunkType
|
||||
|
||||
| Type | Description | Example |
|------|------|------|
| `Sentence` | ASR sentence chunk | One sentence maps to one chunk |
| `Cut` | Scene-cut chunk | Scene boundaries output by PySceneDetect |
| `Visual` | Visual-object chunk | YOLO/OCR/Face/Pose aggregation |
| `Trace` | Tracking chunk | face_trace / yolo_trace |
| `Story` | Narrative chunk (parent) | Consolidated description produced by the 5W1H Agent |
|
||||
|
||||
---
|
||||
|
||||
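## Chunk Properties

- **Range definition**: `start_frame` / `end_frame` (frames are the only time coordinate)
- **May overlap**: chunks of different types can cover the same range
- **May be non-contiguous**: chunks do not need to be contiguous
- **Nesting**: a parent_chunk carries child_chunk_ids; the child ranges do not have to fill the parent range
- **Single-frame chunk**: `start_frame == end_frame` (e.g. frame-level detection)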
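
Since frames are the only time coordinate, clients convert to seconds using the chunk's `fps`, and nesting can be checked by comparing frame ranges. The following is a minimal sketch with hypothetical helper names; it assumes only that a child range lies inside its parent range, as the definition of parent_chunk implies.

```typescript
interface FrameRange {
  start_frame: number;
  end_frame: number;
}

// Convert a frame index to seconds using the chunk's fps.
function frameToSeconds(frame: number, fps: number): number {
  return frame / fps;
}

// True when the child range lies entirely inside the parent range.
// The child does not have to fill the parent.
function isWithin(child: FrameRange, parent: FrameRange): boolean {
  return child.start_frame >= parent.start_frame && child.end_frame <= parent.end_frame;
}
```

With the sample values used later in this document, a sentence chunk at frames 1260 to 1350 lies within a scene chunk at frames 1200 to 1800.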
|
||||
|
||||
---
|
||||
|
||||
## Data Flow

```
Processor JSON ({file_uuid}.{type}.json)
    │
    ▼ import
pre_chunks (raw elements, start_frame / end_frame / data)
    │
    ▼ Rule combination (Rule1 / Rule2 / Rule3)
chunks (searchable units)
    ├── child_chunk (basic search unit)
    │     └── 5W1H: summary description of the chunk (3~5 sentences)
    │
    └── parent_chunk (larger range, produced by the Summary Agent)
          ├── child_chunk_ids: [all contained child_chunks]
          └── summary: (5W1H of the child_chunks + supplementary parent_chunk description)
                via Summary Agent (e.g. the 5W1H Agent)
                the summary is 3~5 sentences that consolidate everything in the range
                it is embedded as a vector so that search covers enough semantics
```
|
||||
|
||||
---
|
||||
|
||||
## Relationship to pre_chunk

| Level | How it is produced | Purpose |
|-------|--------------------|---------|
| pre_chunk | Imported directly from Processor JSON | Preserve raw data for Rule processing |
| chunk | Rules combine pre_chunks | Become searchable units |
| child_chunk | A kind of chunk | Basic search target |
| parent_chunk | Produced by the Summary Agent | Compensate for the limited information in a single child_chunk |
|
||||
|
||||
---
|
||||
|
||||
## Examples

### Sentence Chunk (child_chunk)

```json
{
  "chunk_id": "384b0ff44aaaa1f14cb2cd63b3fea966_42",
  "chunk_index": 42,
  "chunk_type": "sentence",
  "rule": "rule_1",
  "start_frame": 1260,
  "end_frame": 1350,
  "fps": 29.97,
  "content": {
    "text": "今天天氣很好,我們決定去公園走走。",
    "speaker": "SPEAKER_00"
  },
  "text_content": "今天天氣很好,我們決定去公園走走。",
  "metadata": {
    "speaker": "SPEAKER_00",
    "face_ids": ["face_42", "face_43"],
    "5w1h": "講者 SPEAKER_00 在室內提到今天天氣很好。他建議大家一起到公園散步。同伴們同意這個提議。大家開始準備出發。整個對話顯示團隊氣氛融洽。"
  },
  "pre_chunk_ids": [101, 102, 103],
  "parent_chunk_id": "384b0ff44aaaa1f14cb2cd63b3fea966_scene_3"
}
```
|
||||
|
||||
### Scene Chunk (parent_chunk)

```json
{
  "chunk_id": "384b0ff44aaaa1f14cb2cd63b3fea966_scene_3",
  "chunk_index": 3,
  "chunk_type": "cut",
  "rule": "rule_3",
  "start_frame": 1200,
  "end_frame": 1800,
  "fps": 29.97,
  "content": {
    "scene_number": 3,
    "scene_type": "dialogue"
  },
  "text_content": "今天天氣很好,我們決定去公園走走。之後我們在公園裡散步,看到很多花。",
  "metadata": {
    "summary": "講者和同伴在室內討論天氣狀況,提到今天陽光明媚。他們決定到附近的公園散步享受好天氣。抵達公園後,他們沿著步道行走,觀察到許多盛開的花朵。其中一人用手機拍攝了花朵的照片。整個對話氣氛輕鬆愉快。"
  },
  "pre_chunk_ids": [98, 99, 100],
  "child_chunk_ids": [
    "384b0ff44aaaa1f14cb2cd63b3fea966_42",
    "384b0ff44aaaa1f14cb2cd63b3fea966_43",
    "384b0ff44aaaa1f14cb2cd63b3fea966_44"
  ]
}
```
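
The two examples reference each other through `parent_chunk_id` and `child_chunk_ids`. The Python sketch below checks that link and also checks that the child interval lies inside the parent interval. The function name `child_within_parent` is hypothetical, and the containment check is an assumption inferred from the nesting rule, not a documented validation step.

```python
def child_within_parent(child: dict, parent: dict) -> bool:
    """Check the ID link and the frame containment between two chunk dicts."""
    linked = (
        child.get("parent_chunk_id") == parent["chunk_id"]
        and child["chunk_id"] in parent.get("child_chunk_ids", [])
    )
    # Assumption: a child interval never extends past its parent interval.
    contained = (
        parent["start_frame"] <= child["start_frame"]
        and child["end_frame"] <= parent["end_frame"]
    )
    return linked and contained


# Values copied from the two example chunks.
sentence = {
    "chunk_id": "384b0ff44aaaa1f14cb2cd63b3fea966_42",
    "start_frame": 1260,
    "end_frame": 1350,
    "parent_chunk_id": "384b0ff44aaaa1f14cb2cd63b3fea966_scene_3",
}
scene = {
    "chunk_id": "384b0ff44aaaa1f14cb2cd63b3fea966_scene_3",
    "start_frame": 1200,
    "end_frame": 1800,
    "child_chunk_ids": [
        "384b0ff44aaaa1f14cb2cd63b3fea966_42",
        "384b0ff44aaaa1f14cb2cd63b3fea966_43",
        "384b0ff44aaaa1f14cb2cd63b3fea966_44",
    ],
}
```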
|
||||
|
||||
---
|
||||
|
||||
## Version History

| Version | Date | Purpose | Operator | Tool/Model |
|---------|------|---------|----------|------------|
| V1.0 | 2026-05-01 | Initial version | OpenCode | deepseek-chat |
|
||||
240
docs_v1.0/API_V1.0.0/INTERNAL/MOMENTRY_CORE_API_V1.0.0.md
Normal file
@@ -0,0 +1,240 @@
|
||||
---
|
||||
document_type: "reference_doc"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "Momentry Core V1.0.0 API 參考文件"
|
||||
date: "2026-04-30"
|
||||
version: "V1.0"
|
||||
status: "active"
|
||||
owner: "Warren"
|
||||
created_by: "OpenCode"
|
||||
tags:
|
||||
- "api"
|
||||
- "reference"
|
||||
- "v1.0.0"
|
||||
- "marcom"
|
||||
- "restful"
|
||||
- "endpoint"
|
||||
- "file-centric"
|
||||
ai_query_hints:
|
||||
- "Momentry Core V1.0.0 API 參考文件的主要內容是什麼?"
|
||||
- "查詢 V1.0.0 API 列表包含哪些端點?"
|
||||
- "Marcom 團隊如何使用 API Reference?"
|
||||
- "API 的 Progressive Workflow 範例"
|
||||
- "Momentry API 的檔案管理與搜尋功能"
|
||||
- "API 的 Progressive Workflow 操作步驟"
|
||||
- "API 的檔案管理與搜尋功能"
|
||||
related_documents:
|
||||
- "STANDARDS/DOCS_STANDARD.md"
|
||||
- "DEV_API_V1.0/API_REFERENCE_v1.0.0.md"
|
||||
- "API_DICTIONARY_V1.0.0.md"
|
||||
- "API_USAGE_DEMO_V1.0.0.md"
|
||||
- "PRODUCTION_VERIFICATION_V1.0.0.md"
|
||||
---
|
||||
|
||||
# Momentry Core V1.0.0 API Reference

| Item | Value |
|------|-------|
| Created by | OpenCode |
| Created on | 2026-04-30 |
| Document version | V1.0 |

---

## Version History

| Version | Date | Purpose | Operator | Tool/Model |
|---------|------|---------|----------|------------|
| V1.0 | 2026-04-30 | Created the V1.0.0 API list and removed obsolete endpoints | OpenCode | OpenCode |

---

## Key Terms

| Term | Definition |
|------|------------|
| file_uuid | Unique 32-character SHA256 identifier of a media file (video/image/audio) |
| identity_uuid | Global person identity ID that links the same person across files |
| Chunk | Searchable unit produced by Rules combining pre_chunks |
| Snapshot | Cached snapshot of a face or scene; must be migrated before the UI can use it |
| API Key | Authentication credential, passed in the `X-API-Key` header |

## Overview

This document defines the API list and development examples of Momentry Core **V1.0.0** for the **Marcom team**. Legacy, redundant, and internal-only endpoints have been removed from this list so that front-end development relies on standard, stable interfaces.
|
||||
|
||||
---
|
||||
|
||||
## 🚀 Design Principles

### 1. Clear API

* **Trimmed surface**: **Public** and **Internal** endpoints are strictly separated. Legacy redundant paths (such as `/api/v1/videos` and `/api/v1/probe`) have been removed or merged.
* **Standardized responses**: every list-type API returns the same structure, `{ "success": true, "data": [...], "total": N }`.
* **Naming convention**: RESTful style; resources are named with plural nouns or explicit actions (e.g. `files`, `identities`).
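
A client can rely on the uniform list envelope when decoding responses. The Python sketch below shows one way to validate it. The function name `parse_list_response` is hypothetical, and falling back to `len(data)` when `total` is missing is an assumption, not documented behavior.

```python
import json


def parse_list_response(body: str) -> tuple[list, int]:
    """Decode a list-type response and return (data, total)."""
    payload = json.loads(body)
    if payload.get("success") is not True:
        raise ValueError("request failed: 'success' is not true")
    data = payload.get("data")
    if not isinstance(data, list):
        raise ValueError("expected 'data' to be a list")
    # Assumption: when 'total' is absent, use the number of returned items.
    total = payload.get("total", len(data))
    return data, total
```

Keeping this check in one place means every list endpoint (`files`, `identities`, and so on) can share the same decoding path.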
|
||||
|
||||
### 2. File-Centric

* **Unique identification**: every media file (video/image/audio) is uniquely identified by a **32-character UUID** (`file_uuid`).
* **Lifecycle**: `File` is the root node of all data. Every `Chunk`, `Snapshot`, and `Jobs` entry belongs to a specific `File`.
* **Usage pattern**: the front end should first call `GET /api/v1/files` to get the list, then load detailed resources through `POST /api/v1/files/:uuid/snapshots/migrate`.

### 3. Global Identity

* **Cross-file association**: an `Identity` represents one distinct person or character and is not tied to a single file.
* **Binding**: `POST /api/v1/identities/bind` aggregates faces (`face`) or voices (`speaker`) detected in multiple files under the same `Identity`.
* **Data aggregation**: querying an `Identity` shows that person's appearances across all historical files (`/api/v1/identities/:uuid/files`).
|
||||
|
||||
---
|
||||
|
||||
## Current Status

| Item | Status |
|------|--------|
| API version | V1.0.0 |
| Development port | 3003 |
| Production port | 3002 |
| Authentication | `X-API-Key` header |
|
||||
|
||||
---
|
||||
|
||||
## 1. API Dictionary (Endpoint List)

### 1.1 System & Auth

| Method | Endpoint | Description |
| :--- | :--- | :--- |
| `GET` | `/health` | Basic health check |
| `POST` | `/api/v1/auth/login` | Log in to obtain an API key |

### 1.2 File Management

*Main entry point: browse and manage assets*

| Method | Endpoint | Description |
| :--- | :--- | :--- |
| `GET` | `/api/v1/files` | **List all files** (paginated) |
| `GET` | `/api/v1/files/:uuid` | Get file details (including probe_json, metadata) |
| `POST` | `/api/v1/files/register` | Register a new file from disk |
| `DELETE` | `/api/v1/videos/:uuid` | **Delete a video** and its associated data |

### 1.3 Search & Retrieval

| Method | Endpoint | Description |
| :--- | :--- | :--- |
| `POST` | `/api/v1/search` | **Semantic search** (text-based, uses embeddings) |
| `POST` | `/api/v1/search/hybrid` | Hybrid search (vector + BM25 keyword) |
| `POST` | `/api/v1/search/visual` | Visual search (find objects/shapes) |
| `POST` | `/api/v1/search/visual/class` | Filter by object class (e.g. "person", "car") |

### 1.4 Identity Management

*Person/character association across videos*

| Method | Endpoint | Description |
| :--- | :--- | :--- |
| `GET` | `/api/v1/identities` | **List all identities** (people/characters) |
| `GET` | `/api/v1/identities/:uuid` | Get identity details (name, quality, source) |
| `GET` | `/api/v1/identities/:uuid/files` | List all files in which the identity appears |
| `GET` | `/api/v1/identities/:uuid/chunks` | List the specific timeline segments (chunks) |
| `POST` | `/api/v1/identities/bind` | Bind face/voice signals to an identity |

### 1.5 Face & Snapshots

| Method | Endpoint | Description |
| :--- | :--- | :--- |
| `GET` | `/api/v1/face/list` | List all faces detected in a given video |
| `POST` | `/api/v1/face/recognize` | Trigger the face recognition flow for a given video |
| `GET` | `/api/v1/files/:uuid/snapshots` | Check snapshot cache state (Hot/Cold) |
| `POST` | `/api/v1/files/:uuid/snapshots/migrate` | **Load snapshots into memory** (must be called before the UI shows thumbnails) |

### 1.6 Jobs & Agents

| Method | Endpoint | Description |
| :--- | :--- | :--- |
| `GET` | `/api/v1/progress/:uuid` | Check real-time processing progress |
| `POST` | `/api/v1/assets/:uuid/process` | Trigger processing (ASR, YOLO, etc.) |
| `POST` | `/api/v1/agents/identity/analyze` | AI Agent: analyze duplicate identities |
|
||||
|
||||
---
|
||||
|
||||
## 2. Progressive Workflow Examples

This section walks through a typical user scenario: **find a video → process → search → bind a person**.

### Phase 1: Browse and Inspect

*The user browses the file library to find the target video.*

**Step 1: Log in**

```bash
curl -s -X POST http://localhost:3003/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username": "demo", "password": "demo"}'
# Example response: { "api_key": "muser_test_001..." }
```

**Step 2: List files**

```bash
curl -s "http://localhost:3003/api/v1/files?page=1&page_size=5" \
  -H "X-API-Key: muser_test_001"
# Example response: { "success": true, "data": [ { "file_uuid": "...", "file_name": "Demo.mp4" ... } ] }
```

### Phase 2: Process and Monitor

*The user decides to analyze the faces and speech in the video.*

**Step 3: Trigger processing**

```bash
curl -s -X POST "http://localhost:3003/api/v1/assets/{file_uuid}/process" \
  -H "X-API-Key: muser_test_001" \
  -H "Content-Type: application/json" \
  -d '{}'
# Starts processors such as ASR and face detection
```

**Step 4: Check progress**

```bash
curl -s "http://localhost:3003/api/v1/progress/{file_uuid}" \
  -H "X-API-Key: muser_test_001"
# Example response: { "overall_progress": 50, "processors": [...] }
```
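
Step 4 is usually repeated until processing finishes. The Python sketch below shows one way to poll. The function `wait_until_done` and its parameters are hypothetical; `fetch_progress` stands for any callable that performs the `GET /api/v1/progress/{file_uuid}` request and returns the decoded JSON. Treating `overall_progress >= 100` as completion is an assumption, since the source only shows a sample value of 50.

```python
import time
from typing import Callable


def wait_until_done(
    fetch_progress: Callable[[], dict],
    interval_s: float = 2.0,
    max_polls: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until overall_progress reaches 100 or the poll budget runs out."""
    for _ in range(max_polls):
        status = fetch_progress()
        # Assumption: 100 means every processor has finished.
        if status.get("overall_progress", 0) >= 100:
            return status
        sleep(interval_s)
    raise TimeoutError(f"processing not finished after {max_polls} polls")
```

Passing the fetch and sleep functions in as arguments keeps the loop independent of any particular HTTP client and makes it easy to exercise without a running server.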
|
||||
|
||||
### Phase 3: Search Content

*The user searches for specific content in the video.*

**Step 5: Semantic search (text description)**

```bash
curl -s -X POST "http://localhost:3003/api/v1/search" \
  -H "X-API-Key: muser_test_001" \
  -H "Content-Type: application/json" \
  -d '{"query": "一個人拿著紅色的信封", "uuid": "{file_uuid}"}'
# Example response: list of segments matching the text description
```

### Phase 4: Identity Management (GUI development focus)

*The user spots a face, confirms who it is, and binds it to a known identity.*

**Step 6: Load snapshots (Migrate Snapshots)**

*Before the GUI renders a large number of face thumbnails, the cache must be loaded into memory to speed up reads.*

```bash
curl -s -X POST "http://localhost:3003/api/v1/files/{file_uuid}/snapshots/migrate" \
  -H "X-API-Key: muser_test_001" \
  -H "Content-Type: application/json" \
  -d '{"parent_uuid": "{file_uuid}"}'
# Example response: { "success": true, "migrated_types": ["faces", ...] }
```

**Step 7: Bind a face to an identity (Bind Face)**

*Suppose face `face_123` has been detected and should be bound to identity `uuid_identity`.*

```bash
curl -s -X POST "http://localhost:3003/api/v1/identities/bind" \
  -H "X-API-Key: muser_test_001" \
  -H "Content-Type: application/json" \
  -d '{
    "identity_id": null,
    "name": "Cary Grant",
    "binding_type": "face",
    "binding_value": "face_123"
  }'
```
|
||||
|
||||
---
|
||||
|
||||
## 3. Deprecation Notices

The following endpoints were removed or deprecated in V1.0.0. **Do not** use them in new development.

* `GET /api/v1/videos` (list) → replaced by `GET /api/v1/files`
* `POST /api/v1/register` → replaced by `POST /api/v1/files/register`
* `POST /api/v1/probe` → replaced by `GET /api/v1/files/:uuid`
* `GET /api/v1/people/...` → merged into `GET /api/v1/identities/...`
* `/api/v1/n8n/search/...` → for internal n8n workflows only (use the standard `/api/v1/search`)
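
When migrating client code, the exact-path replacements in this list can be kept in a small lookup table. The Python sketch below is illustrative only: `DEPRECATED_ROUTES` and `replacement_for` are hypothetical names, and the prefix-style entries (`/api/v1/people/...`, `/api/v1/n8n/search/...`) are deliberately left out because their full paths are not spelled out here.

```python
# (method, path) of a deprecated endpoint -> (method, path) of its replacement
DEPRECATED_ROUTES = {
    ("GET", "/api/v1/videos"): ("GET", "/api/v1/files"),
    ("POST", "/api/v1/register"): ("POST", "/api/v1/files/register"),
    ("POST", "/api/v1/probe"): ("GET", "/api/v1/files/:uuid"),
}


def replacement_for(method: str, path: str):
    """Return the replacement (method, path), or None if the route is current."""
    return DEPRECATED_ROUTES.get((method.upper(), path))
```

Matching on the exact method and path means `DELETE /api/v1/videos/:uuid`, which section 1.2 still lists, is not flagged.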
|
||||
102
docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/ASRX_V1.0.0.md
Normal file
@@ -0,0 +1,102 @@
|
||||
---
|
||||
document_type: "spec"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "ASRX Processor V1.0.0"
|
||||
date: "2026-05-02"
|
||||
version: "V1.0"
|
||||
status: "active"
|
||||
owner: "Warren"
|
||||
created_by: "OpenCode"
|
||||
parent: "PROCESSOR_SELECTION_V1.0.0.md"
|
||||
tags:
|
||||
- "momentry"
|
||||
- "core"
|
||||
- "processor"
|
||||
- "asrx"
|
||||
- "speaker-diarization"
|
||||
- "speechbrain"
|
||||
- "v1.0.0"
|
||||
ai_query_hints:
|
||||
- "ASRX 使用 SpeechBrain ECAPA-TDNN 進行說話者日誌化"
|
||||
- "ASRX 從 Pyannote 遷移至自定義 SpeechBrain,快 6 倍"
|
||||
- "ASRX 不需要 HuggingFace token(相較 Pyannote)"
|
||||
- "ASRX Charade 6879s 長片輸出 1118 segments, 8 說話人"
|
||||
- "ASRX 依賴 ASR processor 的轉錄結果"
|
||||
related_documents:
|
||||
- "PROCESSOR_SELECTION_V1.0.0.md"
|
||||
- "../ASR_V1.0.0.md"
|
||||
- "../CUT_V1.0.0.md"
|
||||
- "../VOICE_EMBEDDING_FLOW_V1.0.0.md"
|
||||
- "../VECTOR_SPEC_V1.0.0.md"
|
||||
---
|
||||
|
||||
# ASRX Processor V1.0.0

| Item | Value |
|------|-------|
| Created by | OpenCode |
| Created on | 2026-05-02 |
| Document version | V1.0 |

**Status**: ⚠️ 80% | **Model**: SpeechBrain ECAPA-TDNN | **GPU**: No

## Key Terms

| Term | Definition |
|------|------------|
| ASRX | Advanced speech processing, including speaker diarization |
| Speaker Diarization | Determining "who spoke when" |
| ECAPA-TDNN | Speaker recognition model provided by SpeechBrain; produces 192-D embeddings |
| VAD | Voice Activity Detection (Silero is used) |
| Spectral Clustering | Groups embeddings into clusters to separate speakers |

---

## Selection Process

| Metric | Pyannote-based (original) | Custom SpeechBrain (new) |
|--------|---------------------------|--------------------------|
| Pipeline | VAD → Whisper → Align → Diarize | VAD (Silero) → ECAPA-TDNN → Spectral Clustering |
| Processing time | 4.79s (empty output) | **1.66s** (96.25x) |
| Speed vs. Pyannote | Baseline | **6x faster** |
| HuggingFace token | ✅ **Required** | ❌ **Not required** |
| Overlapping speech | ✅ Supported | ❌ Not supported |

**Decision**: pyannote.audio required a HuggingFace token, hit frequent import errors, and produced empty output, so it was replaced with a custom SpeechBrain implementation.

---

## Processing Time Breakdown (Custom SpeechBrain)

| Step | Time | Share |
|------|------|-------|
| VAD (Silero) | 0.41s | 24.7% |
| Speaker embedding (ECAPA-TDNN) | 1.15s | 69.3% |
| Spectral clustering | 0.10s | 6.0% |
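
The percentages in the table follow directly from the step times, which sum to the 1.66s total. A small Python check, with the hypothetical names `STEP_SECONDS` and `shares`:

```python
# Step times in seconds, copied from the table above.
STEP_SECONDS = {
    "VAD (Silero)": 0.41,
    "Speaker embedding (ECAPA-TDNN)": 1.15,
    "Spectral clustering": 0.10,
}


def shares(step_seconds: dict) -> dict:
    """Percentage of the total time spent in each step, rounded to 0.1."""
    total = sum(step_seconds.values())
    return {name: round(100 * s / total, 1) for name, s in step_seconds.items()}
```

The embedding step dominates at roughly 69%, so it is the first place to look if the processor needs to run faster.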
|
||||
|
||||
---
|
||||
|
||||
## Charade Long Video (6879s)

| Metric | Value |
|--------|-------|
| Segments | 1118 |
| Speakers | 8 |
| Match rate | 99.82% |

---

## Version History

| Version | Date | Purpose | Operator | Tool/Model |
|---------|------|---------|----------|------------|
| V1.0 | 2026-05-02 | Initial version | OpenCode | deepseek-chat |

## Resource Estimates

| Resource | Value |
|----------|-------|
| CPU | 0.8 |
| Memory | 2048 MB |
| GPU | Not used |
| Dependencies | ASR |
|
||||
117
docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/ASR_V1.0.0.md
Normal file
@@ -0,0 +1,117 @@
|
||||
---
|
||||
document_type: "spec"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "ASR Processor V1.0.0"
|
||||
date: "2026-05-02"
|
||||
version: "V1.0"
|
||||
status: "active"
|
||||
owner: "Warren"
|
||||
created_by: "OpenCode"
|
||||
parent: "PROCESSOR_SELECTION_V1.0.0.md"
|
||||
tags:
|
||||
- "momentry"
|
||||
- "core"
|
||||
- "processor"
|
||||
- "asr"
|
||||
- "whisper"
|
||||
- "speech-recognition"
|
||||
- "v1.0.0"
|
||||
ai_query_hints:
|
||||
- "ASR 使用 faster-whisper/small 模型及 INT8 CPU 量化"
|
||||
- "ASR 以 CUT 場景邊界為基礎分段處理長片"
|
||||
- "ASR 每個 segment 記錄 scene_number 對應 CUT 場景序號"
|
||||
- "ASR 處理 159.6s 影片約 12.68s,即時倍率 12.6x"
|
||||
- "ASR 依賴 CUT processor 的場景邊界輸出"
|
||||
related_documents:
|
||||
- "PROCESSOR_SELECTION_V1.0.0.md"
|
||||
- "../CUT_V1.0.0.md"
|
||||
- "../ASRX_V1.0.0.md"
|
||||
- "../STORY_V1.0.0.md"
|
||||
- "../CHUNK_DEFINITION_V1.0.0.md"
|
||||
---
|
||||
|
||||
# ASR Processor V1.0.0

| Item | Value |
|------|-------|
| Created by | OpenCode |
| Created on | 2026-05-02 |
| Document version | V1.0 |

**Status**: ✅ 100% | **Model**: faster-whisper/small | **GPU**: No

## Key Terms

| Term | Definition |
|------|------------|
| ASR | Automatic Speech Recognition |
| faster-whisper | Optimized implementation based on OpenAI Whisper; supports INT8 CPU quantization |
| segment | Speech segment output by Whisper, containing start/end times and text |
| scene_number | CUT scene number (1-based) identifying the scene a segment belongs to |
| real-time factor | Ratio of video duration to processing time (higher is faster) |

---

## Selection Process

| Model | Parameters | Size | English WER | Chinese CER | Speed |
|-------|------------|------|-------------|-------------|-------|
| tiny | 39M | ~40MB | 9.5% | 15.0% | ~1x RT |
| base | 74M | ~75MB | 7.3% | 11.2% | ~1.5x RT |
| **small** | **244M** | **~250MB** | **5.5%** | **8.4%** | **~2x RT** |
| medium | 769M | ~800MB | 4.3% | 6.4% | ~3x RT |
| large-v3 | 1.5B | ~1.5GB | 3.5% | 4.9% | ~5x RT |

**Decision**: small offers the best balance between accuracy and speed. Experiments showed that at least small is needed to handle multilingual audio and Taiwanese-accented Mandarin reasonably well.

---

## Measured Performance (ExaSAN 159.6s video)

| Metric | Value |
|--------|-------|
| Processing time | 12.68s |
| Real-time factor | 12.6x |
| Output | 78~79 segments, ~15KB |
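
The 12.6x figure in this table is the video duration divided by the processing time. A minimal Python sketch (the function name `real_time_factor` is hypothetical):

```python
def real_time_factor(video_seconds: float, processing_seconds: float) -> float:
    """How many seconds of video are processed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return video_seconds / processing_seconds
```

With the measured values, `real_time_factor(159.6, 12.68)` rounds to 12.6.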
|
||||
|
||||
---
|
||||
|
||||
## Segmented Processing of Long Videos

For long videos (e.g. Charade, 6879s), ASR processes the audio in segments based on the scene boundaries produced by the CUT processor:

1. CUT first produces `{file_uuid}.cut.json` (containing `scenes[]`, each with `start_time`/`end_time`)
2. ASR reads the CUT JSON and extracts the audio of each scene in `scene_number` order
3. Each scene is transcribed separately with Whisper
4. The results are merged, and each segment records the `scene_number` it belongs to

JSON format of each segment:

```json
{
  "start": 12.5,
  "end": 15.3,
  "text": "Hello world",
  "scene_number": 42
}
```

`scene_number` is the CUT scene number (1-based) within that `file_uuid`.
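
The four steps above can be sketched in Python as follows. This is not the actual ASR processor: `transcribe_by_scene` is a hypothetical name, and `transcribe` stands in for the audio extraction plus Whisper call. Two further assumptions are made: the transcriber returns timestamps relative to the start of the scene, and merged segments carry absolute timestamps.

```python
from typing import Callable


def transcribe_by_scene(
    scenes: list[dict],
    transcribe: Callable[[float, float], list[dict]],
) -> list[dict]:
    """Transcribe each CUT scene separately and merge the segments."""
    merged = []
    # scene_number is 1-based and follows the order of scenes[].
    for scene_number, scene in enumerate(scenes, start=1):
        offset = scene["start_time"]
        for seg in transcribe(scene["start_time"], scene["end_time"]):
            merged.append({
                # Assumption: shift scene-relative times to absolute times.
                "start": round(offset + seg["start"], 3),
                "end": round(offset + seg["end"], 3),
                "text": seg["text"],
                "scene_number": scene_number,
            })
    return merged
```

Because each scene is handled on its own, only one scene's audio needs to be held in memory at a time.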
|
||||
|
||||
---
|
||||
|
||||
## Version History

| Version | Date | Purpose | Operator | Tool/Model |
|---------|------|---------|----------|------------|
| V1.0 | 2026-05-02 | Initial version | OpenCode | deepseek-chat |

---

## Resource Estimates

| Resource | Value |
|----------|-------|
| CPU | 1.0 (one full core) |
| Memory | 2048 MB (actual usage is lower for long videos because of segmented processing) |
| GPU | Not used (INT8 CPU quantization) |
| Dependencies | None |
|
||||
80
docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/CAPTION_V1.0.0.md
Normal file
@@ -0,0 +1,80 @@
|
||||
---
|
||||
document_type: "spec"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "Caption Processor V1.0.0"
|
||||
date: "2026-05-02"
|
||||
version: "V1.0"
|
||||
status: "active"
|
||||
owner: "Warren"
|
||||
created_by: "OpenCode"
|
||||
parent: "PROCESSOR_SELECTION_V1.0.0.md"
|
||||
tags:
|
||||
- "momentry"
|
||||
- "core"
|
||||
- "processor"
|
||||
- "caption"
|
||||
- "moondream2"
|
||||
- "image-captioning"
|
||||
- "v1.0.0"
|
||||
ai_query_hints:
|
||||
- "Caption 使用 Moondream2 進行本地圖像描述生成"
|
||||
- "Caption 已從 GPT-4o 雲端 API 本地化為 Moondream2"
|
||||
- "Caption Moondream2 模型約 1.8GB,完全本地執行"
|
||||
- "Caption 處理速度約 5s/frame"
|
||||
- "Caption 備援方案為 YOLO + OCR + Scene 串接"
|
||||
related_documents:
|
||||
- "PROCESSOR_SELECTION_V1.0.0.md"
|
||||
- "../SCENE_V1.0.0.md"
|
||||
- "../STORY_V1.0.0.md"
|
||||
- "../YOLO_V1.0.0.md"
|
||||
- "../OCR_V1.0.0.md"
|
||||
---
|
||||
|
||||
# Caption Processor V1.0.0

| Item | Value |
|------|-------|
| Created by | OpenCode |
| Created on | 2026-05-02 |
| Document version | V1.0 |

**Status**: ✅ 100% | **Model**: Moondream2 | **GPU**: No

## Key Terms

| Term | Definition |
|------|------------|
| Caption | Image caption generation; produces a text description for each scene |
| Moondream2 | Local image captioning model, run through HuggingFace transformers |
| GPT-4o | (Removed) cloud API solution used previously |
| local deployment | Runs fully locally with no dependency on any cloud API |
| fallback | Fallback option: concatenated YOLO + OCR + Scene results |

---

## Selection Process

| Metric | GPT-4o (removed) | Moondream2 (new) |
|--------|------------------|------------------|
| Speed | 2s/frame | 5s/frame |
| Quality | High | Good |
| Dependency | ✅ Cloud API key | ❌ Fully local |

**Decision**: captioning was moved from the GPT-4o cloud API to a local Moondream2 model (HuggingFace transformers, ~1.8GB). The fallback is to concatenate YOLO + OCR + Scene results.

---

## Version History

| Version | Date | Purpose | Operator | Tool/Model |
|---------|------|---------|----------|------------|
| V1.0 | 2026-05-02 | Initial version | OpenCode | deepseek-chat |

## Resource Estimates

| Resource | Value |
|----------|-------|
| CPU | - |
| Memory | ~1.8 GB (after the model is loaded) |
| GPU | Not used |
| Dependencies | Scene |
|
||||
135
docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/CUT_V1.0.0.md
Normal file
@@ -0,0 +1,135 @@
|
||||
---
|
||||
document_type: "spec"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "CUT Processor (Scene Cut Detection) V1.0.0"
|
||||
date: "2026-05-03"
|
||||
version: "V1.0"
|
||||
status: "active"
|
||||
owner: "Warren"
|
||||
created_by: "OpenCode"
|
||||
parent: "PROCESSOR_SELECTION_V1.0.0.md"
|
||||
tags:
|
||||
- "momentry"
|
||||
- "core"
|
||||
- "processor"
|
||||
- "cut"
|
||||
- "scene-detection"
|
||||
- "pyscenedetect"
|
||||
- "v1.0.0"
|
||||
ai_query_hints:
|
||||
- "CUT 場景檢測的輸出結構與檔案後綴規則"
|
||||
- "CUT 的 cut_count 與 cut_max_duration 用途"
|
||||
- "長影片動態調度如何將 Face 移到 ASR 前"
|
||||
- "CUT 與 Scene 的執行階段(register 同步)"
|
||||
- "CUT 輸出 JSON 結構(start_time/end_time)"
|
||||
related_documents:
|
||||
- "PROCESSORS/SCENE_V1.0.0.md"
|
||||
- "PROCESSOR_SELECTION_V1.0.0.md"
|
||||
- "PROCESSORS/ASR_V1.0.0.md"
|
||||
- "PROCESSORS/FACE_V1.0.0.md"
|
||||
- "CHUNK_DEFINITION_V1.0.0.md"
|
||||
---
|
||||
|
||||
# CUT Processor (Scene Cut Detection) V1.0.0

| Item | Value |
|------|-------|
| Created by | OpenCode |
| Created on | 2026-05-03 |
| Document version | V1.0 |

**Status**: ✅ 100% | **Model**: PySceneDetect (ContentDetector) | **GPU**: No

## Key Terms

| Term | Definition |
|------|------------|
| CUT | Scene-cut detection using PySceneDetect ContentDetector |
| scene boundary | Boundary of a scene, defined by start_time/end_time |
| cut_count | Number of scenes, written to the DB during the register stage |
| cut_max_duration | Duration of the longest scene in seconds; used for dynamic scheduling of long videos |
| ContentDetector | Scene-cut detection algorithm based on frame differences |

---

## Selection Process

No ML model is involved; scene cuts are detected from frame differences. A threshold of 27 was the best value found in experiments.

---

## Output Structure

CUT produces `{file_uuid}.cut.json` with the following structure:

```json
{
  "scenes": [
    { "start_time": 0.0, "end_time": 120.5 },
    { "start_time": 120.5, "end_time": 245.0 }
  ]
}
```

---

## Execution Stage

CUT runs **synchronously during the register stage** (`register_single_file`) and is not scheduled through the worker pipeline. On completion it writes these DB columns:

- `cut_done: bool`, whether CUT has completed
- `cut_count: i32`, the number of scenes
- `cut_max_duration: f64`, the duration of the longest scene in seconds
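
The three DB values can be derived from the `scenes[]` array of the CUT output. The Python sketch below assumes that derivation; `cut_stats` is a hypothetical helper, not the code inside `register_single_file`.

```python
import json


def cut_stats(cut_json: str) -> dict:
    """Compute cut_count and cut_max_duration from a CUT output document."""
    scenes = json.loads(cut_json)["scenes"]
    durations = [s["end_time"] - s["start_time"] for s in scenes]
    return {
        "cut_done": True,
        "cut_count": len(scenes),
        # An empty scene list yields 0.0 instead of raising.
        "cut_max_duration": max(durations, default=0.0),
    }
```

For the two-scene output structure shown in this document, this gives `cut_count = 2` and `cut_max_duration = 124.5`.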
|
||||
|
||||
---
|
||||
|
||||
## Status Suffixes

| Suffix | Meaning | Behavior |
|--------|---------|----------|
| `.cut.json` | Done | Load and use directly |
| `.cut.json.tmp` | Running | Skip and wait |
| `.cut.json.err` | Failed | Skip, no retry |
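
A caller can map these suffixes to a state with a simple lookup. In the Python sketch below, `cut_state` is a hypothetical helper that takes the set of file names present in the output directory. The precedence (done, then running, then failed) and the `"missing"` state are assumptions.

```python
def cut_state(existing_files: set[str], file_uuid: str) -> str:
    """Map the files present for one file_uuid to a CUT state."""
    base = f"{file_uuid}.cut.json"
    if base in existing_files:
        return "done"       # load and use directly
    if base + ".tmp" in existing_files:
        return "running"    # skip and wait
    if base + ".err" in existing_files:
        return "failed"     # skip, no retry
    return "missing"        # assumption: CUT has not been started yet
```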
|
||||
|
||||
---
|
||||
|
||||
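## Dynamic Scheduling for Long Videos

When `cut_count ≤ 3 && cut_max_duration > 600s` (e.g. a long continuous take of a meeting recording), the Worker automatically adjusts the pipeline order:

- **Face moves ahead of ASR**, so that face detection first finds the points where people enter and leave
- The face distribution can then be used to split the long scene and assist ASR segmentation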
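
The reordering rule can be written as a small Python function. This is a sketch under assumptions: `plan_order` and the processor list in the usage note are hypothetical, and only the stated condition and the "Face before ASR" move come from this document.

```python
def plan_order(order: list[str], cut_count: int, cut_max_duration: float) -> list[str]:
    """Return the processor order, moving 'face' just before 'asr' for long takes."""
    long_take = cut_count <= 3 and cut_max_duration > 600.0
    if not long_take or "face" not in order or "asr" not in order:
        return list(order)
    if order.index("face") < order.index("asr"):
        return list(order)  # face already runs first
    reordered = [p for p in order if p != "face"]
    reordered.insert(reordered.index("asr"), "face")
    return reordered
```

With a hypothetical default order `["asr", "asrx", "face", "yolo"]`, a file with 2 scenes and a longest scene of 1800s yields `["face", "asr", "asrx", "yolo"]`, while a file with 1331 short scenes keeps the default order.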
|
||||
|
||||
---
|
||||
|
||||
## Measured Performance

**ExaSAN 159.6s video**:

| Metric | Value |
|--------|-------|
| Processing time | 0.08s |
| Real-time factor | 2036.5x (the fastest processor) |
| Output | 52 bytes |

**Charade long video (6879s, 412343 frames)**:

| Metric | Value |
|--------|-------|
| Scenes | 1331 |
| Output | 217 KB |

---

## Version History

| Version | Date | Purpose | Operator | Tool/Model |
|---------|------|---------|----------|------------|
| V1.0 | 2026-05-03 | Initial version | OpenCode | deepseek-chat |

---

## Resource Estimates

| Resource | Value |
|----------|-------|
| CPU | 0.5 |
| Memory | 512 MB |
| GPU | Not used |
| Dependencies | None |
|
||||
@@ -0,0 +1,133 @@
|
||||
---
|
||||
document_type: "spec"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "Face Embedding 產出流程 V1.0.0"
|
||||
date: "2026-05-02"
|
||||
version: "V1.0"
|
||||
status: "active"
|
||||
owner: "Warren"
|
||||
created_by: "OpenCode"
|
||||
tags:
|
||||
- "momentry"
|
||||
- "core"
|
||||
- "face"
|
||||
- "embedding"
|
||||
- "qdrant"
|
||||
- "v1.0.0"
|
||||
ai_query_hints:
|
||||
- "Face Embedding 的完整處理流程(Frame → InsightFace → Qdrant)"
|
||||
- "Face processor 的輸出結構與 embedding 欄位說明"
|
||||
- "Worker store_face_chunks 與 store_face_embeddings_to_qdrant 的步驟"
|
||||
- "Qdrant face collection 的 payload 結構與點位 ID 規則"
|
||||
- "Face embedding 的 512-D ArcFace w600k_r50 向量規格"
|
||||
- "Face embedding 使用 Cosine 距離計算"
|
||||
- "InsightFace buffalo_l 的資源預估與 GPU 加速資訊"
|
||||
- "face_detections 表與 Qdrant 的資料同步方式"
|
||||
related_documents:
|
||||
- "../VECTOR_SPEC_V1.0.0.md"
|
||||
- "../PROCESSORS/FACE_V1.0.0.md"
|
||||
- "../PROCESSOR_SELECTION_V1.0.0.md"
|
||||
- "../CHUNK_DEFINITION_V1.0.0.md"
|
||||
- "../MOMENTRY_CORE_API_V1.0.0.md"
|
||||
---
|
||||
|
||||
# Face Embedding Generation Flow V1.0.0

| Item | Value |
|------|-------|
| Created by | OpenCode |
| Created on | 2026-05-02 |
| Document version | V1.0 |

## Key Terms

| Term | Definition |
|------|------------|
| Face Embedding | Face vector embedding; InsightFace ArcFace produces a 512-D vector |
| SCRFD-10G | InsightFace face detection model |
| ArcFace w600k_r50 | InsightFace face recognition model; produces 512-D embeddings |
| point_id | Unique ID of a vector in Qdrant; the frame number is used |
| Cosine distance | Distance measure used for vector similarity |
| payload | Metadata fields attached to a Qdrant vector |

## Processing Flow

```
1. Video Frame (sampled)
   │
   ▼
2. Face Processor (face_processor.py)
   ├── InsightFace buffalo_l
   │     ├── SCRFD-10G face detection
   │     ├── ArcFace w600k_r50 512-D embedding
   │     ├── age/gender prediction
   │     └── 2D106 landmarks
   │
   ├── Output: job_{id}_face_{ts}.json → {file_uuid}.face.json
   │     └── FaceResult { frame_count, fps, frames: [FaceFrame] }
   │
   ▼
3. Worker store_face_chunks()
   ├── Parse FaceResult
   ├── Write to pre_chunks table (file_uuid, processor_type='face', data)
   └── Write to face_detections table
   │
   ▼
4. Worker store_face_embeddings_to_qdrant()
   ├── For every face in every face frame
   │     └── If it has an embedding (512-D):
   │           ├── point_id = frame number (u64)
   │           ├── vector = 512-D float array
   │           └── payload (see below)
   └── Write to Qdrant collection `momentry_dev_face`
```
|
||||
|
||||
## Qdrant Payload Structure

```json
{
  "file_uuid": "dd61fda85fee441fdd00ab5528213ff7",
  "face_id": null,
  "frame": 15,
  "timestamp": 0.68,
  "x": 328,
  "y": 88,
  "width": 63,
  "height": 75,
  "confidence": 0.83
}
```

| Field | Type | Description |
|-------|------|-------------|
| `file_uuid` | string | Source video identifier |
| `face_id` | string or null | Face tracking ID (null until one is assigned) |
| `frame` | integer | Frame number |
| `timestamp` | float | Timestamp (seconds) |
| `x, y, width, height` | integer | Face bounding box |
| `confidence` | float | Detection confidence (0~1) |
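
The Python sketch below assembles one point (ID, vector, payload) from a single detection, following the fields in this section. It builds a plain dictionary and does not call the Qdrant client. `build_face_point` and the input keys `embedding` and `bbox` are hypothetical names, since the layout of `FaceFrame` is not shown here.

```python
from typing import Optional


def build_face_point(file_uuid: str, frame: int, timestamp: float, face: dict) -> Optional[dict]:
    """Return a point dict for one face, or None when it has no embedding."""
    embedding = face.get("embedding")  # hypothetical input key
    if not embedding:
        return None
    if len(embedding) != 512:
        raise ValueError("expected a 512-D embedding")
    x, y, width, height = face["bbox"]  # hypothetical input key
    return {
        "id": frame,  # point_id is the frame number
        "vector": [float(v) for v in embedding],
        "payload": {
            "file_uuid": file_uuid,
            "face_id": face.get("face_id"),  # None until a tracking ID exists
            "frame": frame,
            "timestamp": timestamp,
            "x": x,
            "y": y,
            "width": width,
            "height": height,
            "confidence": face["confidence"],
        },
    }
```

Because the ID is the frame number alone, two faces in the same frame would map to the same ID under this scheme; the sketch does not attempt to resolve that.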
|
||||
|
||||
## Vector Specification

| Property | Value |
|----------|-------|
| Model | InsightFace ArcFace w600k_r50 |
| Dimensions | 512 |
| Distance metric | Cosine |
| Normalization | No (raw output) |
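
Cosine similarity divides by both vector lengths, so it gives the same result for raw and for normalized embeddings. That is why storing the unnormalized ArcFace output is compatible with a Cosine metric. A minimal pure-Python sketch, with `cosine_similarity` as a hypothetical helper name:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Scaling either vector leaves the result unchanged: `[1.0, 0.0]` and `[2.0, 0.0]` have similarity 1.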
|
||||
|
||||
## Source Processor Resource Estimates

| Resource | Value |
|----------|-------|
| Model | InsightFace buffalo_l (~150MB) |
| CPU | 0.6 |
| Memory | 1536 MB |
| GPU | Supported (CoreML 50-80 FPS, CUDA 80-120 FPS) |
| Processing speed | 130.5x real-time (M4 Mac Mini) |

## Version History

| Version | Date | Purpose | Operator | Tool/Model |
|---------|------|---------|----------|------------|
| V1.0 | 2026-05-02 | Initial version | OpenCode | deepseek-chat |
|
||||
104
docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/FACE_V1.0.0.md
Normal file
@@ -0,0 +1,104 @@
|
||||
---
|
||||
document_type: "spec"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "Face Processor V1.0.0"
|
||||
date: "2026-05-02"
|
||||
version: "V1.0"
|
||||
status: "active"
|
||||
owner: "Warren"
|
||||
created_by: "OpenCode"
|
||||
parent: "PROCESSOR_SELECTION_V1.0.0.md"
|
||||
tags:
|
||||
- "momentry"
|
||||
- "core"
|
||||
- "processor"
|
||||
- "face"
|
||||
- "insightface"
|
||||
- "face-detection"
|
||||
- "v1.0.0"
|
||||
ai_query_hints:
|
||||
- "Face 使用 InsightFace buffalo_l 進行人臉偵測與辨識"
|
||||
- "Face 在 ExaSAN 159.6s 影片上僅需 1.22s,即時倍率 130.5x"
|
||||
- "Face 支援 GPU 加速,CoreML 可達 50~80 FPS"
|
||||
- "Face 輸出 512-D embedding 用於比對"
|
||||
- "Face 不再使用 Haar Cascade fallback,強制使用 InsightFace"
|
||||
related_documents:
|
||||
- "PROCESSOR_SELECTION_V1.0.0.md"
|
||||
- "../FACE_EMBEDDING_FLOW_V1.0.0.md"
|
||||
- "../CUT_V1.0.0.md"
|
||||
- "../VECTOR_SPEC_V1.0.0.md"
|
||||
- "../CHUNK_DEFINITION_V1.0.0.md"
|
||||
---
|
||||
|
||||
# Face Processor V1.0.0

| Item | Value |
|------|-------|
| Created by | OpenCode |
| Created on | 2026-05-02 |
| Document version | V1.0 |

**Status**: ✅ 100% | **Model**: InsightFace buffalo_l | **GPU**: Yes

## Key Terms

| Term | Definition |
|------|------------|
| Face Detection | Locating faces, using InsightFace SCRFD-10G |
| Face Recognition | Identifying faces, using ArcFace w600k_r50 to produce 512-D embeddings |
| embedding | Vector embedding used for face matching and search |
| CoreML | GPU acceleration option on Apple Silicon |
| LFW | Labeled Faces in the Wild, a face recognition benchmark dataset |

---

## Selection Process

| Model | Type | Size | Detection rate | Recognition rate | Embedding |
|-------|------|------|----------------|------------------|-----------|
| **InsightFace Buffalo_l** | **Full suite** | **~150MB** | **97.3% mAP** | **99.77% (LFW)** | **512-D ✅** |
| MediaPipe BlazeFace | Lightweight detection | 1~2MB | 95.2% mAP | None | ❌ |
| OpenCV Haar Cascade | Classical ML | 900KB | 70~85% | None | ❌ |

**Key decision**: the legacy Haar Cascade fallback caused the whole downstream chain to fail (0 embeddings), so InsightFace is now mandatory.

---

## Measured Performance (ExaSAN 159.6s video)

| Metric | Value |
|--------|-------|
| Processing time | 1.22s |
| Real-time factor | 130.5x |
| Output | 49 frames, 67 faces |

---

## GPU Acceleration

| Platform | FPS |
|----------|-----|
| CoreML (Apple Silicon) | 50~80 FPS |
| CUDA (NVIDIA) | 80~120 FPS |
| CPU | 15~20 FPS |

---

## Version History

| Version | Date | Purpose | Operator | Tool/Model |
|---------|------|---------|----------|------------|
| V1.0 | 2026-05-02 | Initial version | OpenCode | deepseek-chat |

---

## Resource Estimates

| Resource | Value |
|----------|-------|
| CPU | 0.6 |
| Memory | 1536 MB |
| GPU | Supported (`uses_gpu = true`) |
| Dependencies | None |
|
||||
|
||||
---
|
||||
87
docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/OCR_V1.0.0.md
Normal file
@@ -0,0 +1,87 @@
|
||||
---
|
||||
document_type: "spec"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "OCR Processor V1.0.0"
|
||||
date: "2026-05-02"
|
||||
version: "V1.0"
|
||||
status: "active"
|
||||
owner: "Warren"
|
||||
created_by: "OpenCode"
|
||||
parent: "PROCESSOR_SELECTION_V1.0.0.md"
|
||||
tags:
|
||||
- "momentry"
|
||||
- "core"
|
||||
- "processor"
|
||||
- "ocr"
|
||||
- "paddleocr"
|
||||
- "optical-character-recognition"
|
||||
- "v1.0.0"
|
||||
ai_query_hints:
|
||||
- "OCR 使用 PaddleOCR PP-OCRv4 模型支援 80+ 語言"
|
||||
- "OCR 處理 159.6s 影片全幀約 36.87s,即時倍率 4.3x"
|
||||
- "OCR 輸出 102 frames, 234 texts, 65KB"
|
||||
- "OCR 不使用 GPU,CPU 使用率 0.8"
|
||||
- "OCR 精度 > 95%,支援繁體中文"
|
||||
related_documents:
|
||||
- "PROCESSOR_SELECTION_V1.0.0.md"
|
||||
- "../YOLO_V1.0.0.md"
|
||||
- "../CAPTION_V1.0.0.md"
|
||||
- "../VISUAL_CHUNK_V1.0.0.md"
|
||||
- "../CHUNK_DEFINITION_V1.0.0.md"
|
||||
---
|
||||
|
||||
# OCR Processor V1.0.0

| Item | Value |
|------|-------|
| Created by | OpenCode |
| Created on | 2026-05-02 |
| Document version | V1.0 |

**Status**: ✅ 100% | **Model**: PaddleOCR PP-OCRv4 | **GPU**: No

## Key Terms

| Term | Definition |
|------|------------|
| OCR | Optical Character Recognition |
| PaddleOCR | OCR engine developed by Baidu; PP-OCRv4 is the model generation used here |
| PP-OCRv4 | Fourth-generation PaddleOCR model; supports 80+ languages |
| real-time factor | Ratio of video duration to processing time (higher is faster) |
| full-frame processing | Mode in which OCR runs on every frame of the video |

---

## Selection Process

Reasons for choosing PaddleOCR:
- Supports 80+ languages (including Traditional Chinese)
- Accuracy > 95%
- EasyOCR performed worse than PaddleOCR in testing

---

## Measured Performance (ExaSAN 159.6s video, full-frame processing)

| Metric | Value |
|--------|-------|
| Processing time | 36.87s |
| Real-time factor | 4.3x |
| Output | 102 frames, 234 texts, 65KB |

---

## Version History

| Version | Date | Purpose | Operator | Tool/Model |
|---------|------|---------|----------|------------|
| V1.0 | 2026-05-02 | Initial version | OpenCode | deepseek-chat |

## Resource Estimates

| Resource | Value |
|----------|-------|
| CPU | 0.8 |
| Memory | 1024 MB |
| GPU | Not used |
| Dependencies | None |
|
||||
84
docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/POSE_V1.0.0.md
Normal file
@@ -0,0 +1,84 @@
|
||||
---
|
||||
document_type: "spec"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "Pose Processor V1.0.0"
|
||||
date: "2026-05-02"
|
||||
version: "V1.0"
|
||||
status: "active"
|
||||
owner: "Warren"
|
||||
created_by: "OpenCode"
|
||||
parent: "PROCESSOR_SELECTION_V1.0.0.md"
|
||||
tags:
|
||||
- "momentry"
|
||||
- "core"
|
||||
- "processor"
|
||||
- "pose"
|
||||
- "mediapipe"
|
||||
- "pose-estimation"
|
||||
- "v1.0.0"
|
||||
ai_query_hints:
|
||||
- "Pose 使用 MediaPipe Pose (pose_landmarker_heavy, 33 keypoints)"
|
||||
- "Pose 處理 159.6s 影片全幀約 65.87s,即時倍率 2.4x"
|
||||
- "Pose 輸出 1853 frames, 2341 persons, 603KB"
|
||||
- "Pose 支援 GPU 加速(uses_gpu = true)"
|
||||
- "Pose 與 YOLO 同為處理瓶頸之一"
|
||||
related_documents:
|
||||
- "PROCESSOR_SELECTION_V1.0.0.md"
|
||||
- "../YOLO_V1.0.0.md"
|
||||
- "../FACE_V1.0.0.md"
|
||||
- "../CUT_V1.0.0.md"
|
||||
- "../CHUNK_DEFINITION_V1.0.0.md"
|
||||
---
|
||||
|
||||
# Pose Processor V1.0.0
|
||||
|
||||
| Item | Value |
|------|------|
| Created by | OpenCode |
| Created on | 2026-05-02 |
| Document version | V1.0 |

**Status**: ✅ 100% | **Model**: MediaPipe Pose | **GPU**: Yes
|
||||
|
||||
## Key Terms

| Term | Definition |
|------|------|
| Pose Estimation | Detecting the positions of human body keypoints |
| MediaPipe | Cross-platform ML solution developed by Google |
| keypoint | Body landmark; pose_landmarker_heavy outputs 33 keypoints |
| landmarker_heavy | MediaPipe's most accurate pose model variant, at the cost of speed |
| bottleneck | Pose and YOLO are the two most time-consuming processors |
|
||||
|
||||
---
|
||||
|
||||
## Selection Process

Uses MediaPipe Pose (pose_landmarker_heavy, 33 keypoints).
|
||||
|
||||
---
|
||||
|
||||
## Measured Performance (ExaSAN 159.6 s video, full-frame processing)

| Metric | Value |
|------|-----|
| Processing time | 65.87 s |
| Real-time factor | 2.4x (one of the bottlenecks, comparable to YOLO) |
| Output | 1853 frames, 2341 persons, 603 KB |
|
||||
|
||||
---
|
||||
|
||||
## Version History

| Version | Date | Purpose | Author | Tool/Model |
|------|------|------|--------|-----------|
| V1.0 | 2026-05-02 | Initial version | OpenCode | deepseek-chat |

## Resource Estimates

| Resource | Value |
|------|-----|
| CPU | 0.4 |
| Memory | 1024 MB |
| GPU | Supported (`uses_gpu = true`) |
| Dependencies | None |
|
||||
95
docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/SCENE_V1.0.0.md
Normal file
@@ -0,0 +1,95 @@
|
||||
---
|
||||
document_type: "processor-spec"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "Scene Processor (Scene Classification) V1.0.0"
|
||||
date: "2026-05-03"
|
||||
version: "V1.0"
|
||||
status: "active"
|
||||
owner: "Warren"
|
||||
created_by: "OpenCode"
|
||||
parent: "PROCESSOR_SELECTION_V1.0.0.md"
|
||||
tags:
|
||||
- "momentry"
|
||||
- "core"
|
||||
- "processor"
|
||||
- "scene"
|
||||
- "places365"
|
||||
- "scene-classification"
|
||||
- "v1.0.0"
|
||||
ai_query_hints:
|
||||
- "Scene 分類的模型選型與效能實測"
|
||||
- "Scene 的執行階段與檔案後綴檢查規則"
|
||||
- "Scene 與 CUT 的依賴關係(已移除 ASR)"
|
||||
- "Scene 輸出為 pre_chunks 供 Rule 3 parent chunk 使用"
|
||||
- "load_scene_from_file 直接載入 JSON 不入庫"
|
||||
related_documents:
|
||||
- "PROCESSORS/CUT_V1.0.0.md"
|
||||
- "PROCESSOR_SELECTION_V1.0.0.md"
|
||||
- "PROCESSORS/CAPTION_V1.0.0.md"
|
||||
- "PROCESSORS/STORY_V1.0.0.md"
|
||||
- "CHUNK_DEFINITION_V1.0.0.md"
|
||||
---
|
||||
|
||||
# Scene Processor (Scene Classification) V1.0.0
|
||||
|
||||
| Item | Value |
|------|------|
| Created by | OpenCode |
| Created on | 2026-05-03 |
| Document version | V1.0 |

**Status**: ✅ 100% | **Model**: MIT Places365 (ResNet18) | **GPU**: No
|
||||
|
||||
## Key Terms

| Term | Definition |
|------|------|
| Scene Classification | Identifying the scene type shown in video frames |
| Places365 | Scene recognition dataset and models from MIT (365 scene categories) |
| ResNet18 | Residual network architecture; a lightweight classification model |
| pre_chunks | Table of raw components; Scene output is used by Rule 3 |
| parent chunk | Higher-level chunk aggregating multiple child chunks, produced by Rule 3 |
|
||||
|
||||
## Selection Process

Initially used ImageNet (which produced `scene_XXX` class indices), then upgraded to Places365 to obtain named scene categories (e.g., living_room, beach, airport), with 85~90% accuracy.
|
||||
|
||||
## Execution Stage

Scene runs **synchronously during the register stage** (`register_single_file`). When the Worker re-enters, it checks the file suffix:
- `.scene.json` → load from file (not written to pre_chunks)
- `.scene.json.tmp` → skip (return an empty result)
- `.scene.json.err` → skip (return an empty result)

Loader function: `load_scene_from_file(path: &str) -> SceneClassificationResult`
|
||||
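
The suffix check described above can be sketched as follows. This is an illustration only: the real loader is the Rust function `load_scene_from_file`, while `scene_status` and `load_scene_if_ready` are hypothetical Python names. Checking `.scene.json` first when several suffixes coexist is also an assumption.

```python
import json
from pathlib import Path

# Suffix -> status, checked in this order (the order is an assumption).
SCENE_SUFFIXES = (
    (".scene.json", "done"),
    (".scene.json.tmp", "running"),
    (".scene.json.err", "failed"),
)


def scene_status(base_path: str) -> str:
    """Return "done", "running", "failed", or "missing" for a given output base path."""
    for suffix, status in SCENE_SUFFIXES:
        if Path(base_path + suffix).exists():
            return status
    return "missing"


def load_scene_if_ready(base_path: str) -> dict:
    """Load the finished scene JSON, or return an empty result for every other status."""
    if scene_status(base_path) != "done":
        return {}  # .tmp and .err outputs are skipped
    # Loaded from file only; nothing is written to pre_chunks here.
    return json.loads(Path(base_path + ".scene.json").read_text(encoding="utf-8"))
```

Here `base_path` is the output path without any status suffix, for example a directory joined with the `file_uuid`.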
|
||||
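## Relationship with CUT

Scene is purely visual classification and does not use ASR output, so its ASR dependency has been removed. CUT is the only prerequisite of Scene.

## Output Usage

Scene output serves as **pre_chunks** (scene boundaries) for Rule 3 to produce parent chunks. Rule 3 needs both CUT and Scene boundaries to produce composite parent chunks.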
|
||||
|
||||
## Measured Performance (ExaSAN 159.6 s video, sampling interval = 2 s)

| Metric | Value |
|------|-----|
| Processing time | 4.09 s |
| Real-time factor | 39.0x |
| Samples | 79 samples |

## Charade Feature-Length Video (6879 s)

| Metric | Value |
|------|-----|
| Processing time | 313.3 s (5.2 minutes) |
|
||||
|
||||
## Resource Estimates

| Resource | Value |
|------|-----|
| CPU | 0.3 |
| Memory | 512 MB |
| GPU | Not used |
| Dependencies | CUT |
|
||||
80
docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/STORY_V1.0.0.md
Normal file
@@ -0,0 +1,80 @@
|
||||
---
|
||||
document_type: "spec"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "Story Processor V1.0.0"
|
||||
date: "2026-05-02"
|
||||
version: "V1.0"
|
||||
status: "active"
|
||||
owner: "Warren"
|
||||
created_by: "OpenCode"
|
||||
parent: "PROCESSOR_SELECTION_V1.0.0.md"
|
||||
tags:
|
||||
- "momentry"
|
||||
- "core"
|
||||
- "processor"
|
||||
- "story"
|
||||
- "template-aggregator"
|
||||
- "narrative"
|
||||
- "v1.0.0"
|
||||
ai_query_hints:
|
||||
- "Story 使用模板聚合從 ASR+YOLO+Scene 產生結構化敘述"
|
||||
- "Story 已從 GPT-4 雲端 API 本地化為模板聚合"
|
||||
- "Story 處理速度 <0.1s/chunk,極快"
|
||||
- "Story 完全不依賴雲端 API,完全本地執行"
|
||||
- "Story 依賴 Scene 和 Caption processor 的輸出"
|
||||
related_documents:
|
||||
- "PROCESSOR_SELECTION_V1.0.0.md"
|
||||
- "../SCENE_V1.0.0.md"
|
||||
- "../CAPTION_V1.0.0.md"
|
||||
- "../ASR_V1.0.0.md"
|
||||
- "../CHUNK_DEFINITION_V1.0.0.md"
|
||||
---
|
||||
|
||||
# Story Processor V1.0.0
|
||||
|
||||
| Item | Value |
|------|------|
| Created by | OpenCode |
| Created on | 2026-05-02 |
| Document version | V1.0 |

**Status**: ✅ 100% | **Model**: Template aggregation | **GPU**: No
|
||||
|
||||
## Key Terms

| Term | Definition |
|------|------|
| Story Processor | Processor that produces a structured narrative from ASR + YOLO + Scene results |
| Template Aggregation | Combining data with predefined templates rather than LLM generation |
| GPT-4 | (Removed) the previously used cloud API solution |
| local deployment | Runs entirely locally, with no dependency on any cloud API |
| structured narrative | Story description organized in a fixed format |
|
||||
|
||||
---
|
||||
|
||||
## Selection Process

| Metric | GPT-4 (removed) | Template (new) |
|------|----------------|------------|
| Speed | 3 s/chunk | **<0.1 s/chunk** |
| Quality | Natural language | Structured format |
| Dependency | Requires a cloud API key | None (fully local) |

**Decision**: Migrated from the GPT-4 cloud API to local template aggregation, which produces structured narratives from ASR + YOLO + Scene results.
|
||||
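
To make "template aggregation" concrete, here is a minimal sketch of filling a fixed template from per-chunk ASR, YOLO, and Scene data. The template wording, the `ChunkInputs` fields, and `render_story` are all hypothetical; the document does not specify the actual template or output format.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkInputs:
    """Per-chunk inputs gathered from upstream processors (field names are illustrative)."""
    start_time: float
    end_time: float
    scene_label: str                                  # e.g. a Places365 category
    objects: list[str] = field(default_factory=list)  # YOLO class names
    transcript: str = ""                              # ASR text


# A fixed template: no LLM call, only string substitution.
TEMPLATE = "[{start:.1f}s-{end:.1f}s] Scene: {scene}. Objects: {objects}. Speech: {speech}"


def render_story(chunk: ChunkInputs) -> str:
    """Fill the template with de-duplicated object names and the transcript."""
    objects = ", ".join(sorted(set(chunk.objects))) or "none"
    speech = chunk.transcript.strip() or "none"
    return TEMPLATE.format(
        start=chunk.start_time,
        end=chunk.end_time,
        scene=chunk.scene_label,
        objects=objects,
        speech=speech,
    )
```

Because rendering is plain string formatting, the cost per chunk is negligible compared with a remote model call, which matches the speed difference reported in the table.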
|
||||
---
|
||||
|
||||
## Version History

| Version | Date | Purpose | Author | Tool/Model |
|------|------|------|--------|-----------|
| V1.0 | 2026-05-02 | Initial version | OpenCode | deepseek-chat |

## Resource Estimates

| Resource | Value |
|------|-----|
| CPU | - |
| Memory | - |
| GPU | Not used |
| Dependencies | Scene, Caption |
|
||||
@@ -0,0 +1,74 @@
|
||||
---
|
||||
document_type: "spec"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "VisualChunk Processor V1.0.0"
|
||||
date: "2026-05-02"
|
||||
version: "V1.0"
|
||||
status: "active"
|
||||
owner: "Warren"
|
||||
created_by: "OpenCode"
|
||||
parent: "PROCESSOR_SELECTION_V1.0.0.md"
|
||||
tags:
|
||||
- "momentry"
|
||||
- "core"
|
||||
- "processor"
|
||||
- "visual-chunk"
|
||||
- "rule-aggregator"
|
||||
- "yolo"
|
||||
- "v1.0.0"
|
||||
ai_query_hints:
|
||||
- "VisualChunk 是規則驅動的聚合器,非 ML 模型"
|
||||
- "VisualChunk 將 YOLO 結果組合成視覺分片"
|
||||
- "VisualChunk 依賴 YOLO processor 的偵測結果"
|
||||
- "VisualChunk CPU 使用率低(0.3),記憶體 512 MB"
|
||||
- "VisualChunk 是 Scene 和 Story processor 的前置依賴"
|
||||
related_documents:
|
||||
- "PROCESSOR_SELECTION_V1.0.0.md"
|
||||
- "../YOLO_V1.0.0.md"
|
||||
- "../SCENE_V1.0.0.md"
|
||||
- "../STORY_V1.0.0.md"
|
||||
- "../CHUNK_DEFINITION_V1.0.0.md"
|
||||
---
|
||||
|
||||
# VisualChunk Processor V1.0.0
|
||||
|
||||
| Item | Value |
|------|------|
| Created by | OpenCode |
| Created on | 2026-05-02 |
| Document version | V1.0 |

**Status**: ✅ Integrated | **Model**: None (rule aggregation) | **GPU**: No
|
||||
|
||||
## Key Terms

| Term | Definition |
|------|------|
| VisualChunk | Rule-driven aggregator that combines YOLO results into visual chunks |
| Rule Aggregation | Combining data with predefined rules instead of an ML model |
| Visual Chunk | Time range containing YOLO-detected objects |
| pre_chunks | Table of raw components; VisualChunk output is written to this table |
| dependency chain | YOLO → VisualChunk |
|
||||
|
||||
---
|
||||
|
||||
## Description

Not an ML model: a rule-driven aggregator that combines YOLO results into visual chunks.
|
||||
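
The document does not state the aggregation rules, so the following sketch uses one hypothetical rule purely for illustration: consecutive frames with the same set of detected classes are merged into one chunk, and frames without detections are dropped. The function name, the input shape, and the 30 fps default are assumptions.

```python
from itertools import groupby


def aggregate_visual_chunks(detections, fps=30.0):
    """Merge consecutive frames that share the same set of detected classes.

    detections: list of (frame_index, [class names]) sorted by frame_index.
    Returns a list of chunk dicts with frame and time ranges.
    """
    chunks = []
    for labels, group in groupby(detections, key=lambda d: frozenset(d[1])):
        frames = [frame for frame, _ in group]
        if not labels:
            continue  # frames without detections produce no chunk
        chunks.append({
            "start_frame": frames[0],
            "end_frame": frames[-1],
            "start_time": frames[0] / fps,
            "end_time": frames[-1] / fps,
            "objects": sorted(labels),
        })
    return chunks
```

A rule like this needs no model weights and runs in a single pass over the YOLO output, which is consistent with the low CPU and memory estimates for this processor.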
|
||||
---
|
||||
|
||||
## Version History

| Version | Date | Purpose | Author | Tool/Model |
|------|------|------|--------|-----------|
| V1.0 | 2026-05-02 | Initial version | OpenCode | deepseek-chat |

## Resource Estimates

| Resource | Value |
|------|-----|
| CPU | 0.3 |
| Memory | 512 MB |
| GPU | Not used |
| Dependencies | YOLO |
|
||||
@@ -0,0 +1,139 @@
|
||||
---
|
||||
document_type: "spec"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "Voice Embedding 產出流程 V1.0.0"
|
||||
date: "2026-05-02"
|
||||
version: "V1.0"
|
||||
status: "active"
|
||||
owner: "Warren"
|
||||
created_by: "OpenCode"
|
||||
tags:
|
||||
- "momentry"
|
||||
- "core"
|
||||
- "voice"
|
||||
- "embedding"
|
||||
- "asrx"
|
||||
- "qdrant"
|
||||
- "v1.0.0"
|
||||
ai_query_hints:
|
||||
- "Voice Embedding 的完整處理流程(音軌 → ECAPA-TDNN → Qdrant)"
|
||||
- "ASRX Processor 的三階段處理:音軌預處理 → ASR segments 載入 → Speaker Diarization"
|
||||
- "Worker store_asrx_chunks 的步驟與 pre_chunks 寫入規則"
|
||||
- "Qdrant voice collection 的 payload 結構與欄位定義"
|
||||
- "Voice embedding 的 192-D ECAPA-TDNN 向量規格(L2 normalize)"
|
||||
- "Voice embedding 使用 Cosine 距離計算與 L2 歸一化"
|
||||
- "SpeechBrain ECAPA-TDNN 的資源預估與處理速度"
|
||||
- "Voice embedding 與 ASR 處理器的依賴關係"
|
||||
related_documents:
|
||||
- "../VECTOR_SPEC_V1.0.0.md"
|
||||
- "../PROCESSORS/ASRX_V1.0.0.md"
|
||||
- "../PROCESSORS/ASR_V1.0.0.md"
|
||||
- "../PROCESSOR_SELECTION_V1.0.0.md"
|
||||
- "../MOMENTRY_CORE_API_V1.0.0.md"
|
||||
---
|
||||
|
||||
# Voice Embedding Generation Flow V1.0.0

| Item | Value |
|------|------|
| Created by | OpenCode |
| Created on | 2026-05-02 |
| Document version | V1.0 |
|
||||
|
||||
## Key Terms

| Term | Definition |
|------|------|
| Voice Embedding | 192-D speaker vector produced by ECAPA-TDNN |
| ECAPA-TDNN | Speaker recognition model provided by SpeechBrain |
| L2 normalize | Scaling each vector to unit length |
| Spectral Clustering | Clustering voice embeddings into groups to separate speakers |
| segment_index | Index within the segments of the asrx output |
| speaker_id | Speaker label (e.g., SPEAKER_0, SPEAKER_1) |
|
||||
|
||||
## Processing Flow

```
1. Video → ffmpeg extracts the audio track → 16kHz mono WAV
       │
       ▼
2. ASRX Processor (asrx_processor_custom.py)
       │
       ├── Stage 1: Audio track preprocessing
       │   ├── ffprobe lists all audio tracks
       │   ├── Select the best track (English preferred)
       │   └── ffmpeg converts it to 16kHz mono WAV
       │
       ├── Stage 2: Load ASR segments
       │   └── Read segments from {file_uuid}.asr.json
       │
       ├── Stage 3: Speaker Diarization (SelfASRXFixed.process_with_segments)
       │   ├── Extract the audio clip for each ASR segment
       │   ├── ECAPA-TDNN produces a 192-D embedding
       │   ├── Normalize embeddings
       │   └── Spectral clustering → speaker label
       │
       ├── Output: {file_uuid}.asrx.json
       │   ├── segments: [start_time, end_time, speaker_id]
       │   └── embeddings: [[192-D float array], ...]
       │
       ▼
3. Worker store_asrx_chunks()
       ├── Parse AsrxResult
       ├── Write to the pre_chunks table
       └── Write voice embeddings to Qdrant
       │
       ▼
4. Qdrant `momentry_dev_voice`
       └── One vector per segment
```
|
||||
|
||||
## Qdrant Payload Structure

```json
{
  "file_uuid": "dd61fda85fee441fdd00ab5528213ff7",
  "speaker_id": "SPEAKER_0",
  "segment_index": 0,
  "start_frame": 9,
  "end_frame": 441,
  "start_time": 0.3,
  "end_time": 14.7
}
```

| Field | Type | Description |
|------|------|------|
| `file_uuid` | string | Source video identifier |
| `speaker_id` | string | Speaker label (e.g., SPEAKER_0) |
| `segment_index` | integer | Index within segments |
| `start_frame` | integer | Start frame |
| `end_frame` | integer | End frame |
| `start_time` | float | Start time (seconds) |
| `end_time` | float | End time (seconds) |
|
||||
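
A payload with these fields could be assembled per segment as sketched below. The helper name and the frame derivation are assumptions: the document does not say how `start_frame` and `end_frame` are computed, but time multiplied by 30 fps reproduces the example values (0.3 s → 9, 14.7 s → 441).

```python
def build_voice_payload(file_uuid, segment_index, speaker_id,
                        start_time, end_time, fps=30.0):
    """Build the payload dict stored alongside one voice vector.

    fps=30.0 is an assumption inferred from the example payload, not a documented value.
    """
    return {
        "file_uuid": file_uuid,
        "speaker_id": speaker_id,
        "segment_index": segment_index,
        "start_frame": round(start_time * fps),
        "end_frame": round(end_time * fps),
        "start_time": start_time,
        "end_time": end_time,
    }


payload = build_voice_payload(
    "dd61fda85fee441fdd00ab5528213ff7", 0, "SPEAKER_0", 0.3, 14.7
)
```

In practice the frame rate would come from the source video's metadata rather than a default.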
|
||||
## Vector Specification

| Property | Value |
|------|-----|
| Model | SpeechBrain ECAPA-TDNN |
| Dimensions | 192 |
| Distance metric | Cosine |
| Normalization | Yes (L2 normalize) |
|
||||
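
L2 normalization and Cosine distance fit together: once vectors have unit length, cosine similarity reduces to a plain dot product. The sketch below shows this with the standard library only; the function names are illustrative and short vectors stand in for the 192-D embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


a, b = [3.0, 4.0], [4.0, 3.0]
unit_a, unit_b = l2_normalize(a), l2_normalize(b)
dot_of_units = sum(x * y for x, y in zip(unit_a, unit_b))
# For unit vectors the dot product equals the cosine similarity of the originals.
assert math.isclose(dot_of_units, cosine_similarity(a, b))
```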
|
||||
## Source Processor Resource Estimates

| Resource | Value |
|------|-----|
| Model | SpeechBrain ECAPA-TDNN (~80 MB) |
| CPU | 0.8 |
| Memory | 2048 MB |
| GPU | Not used |
| Processing speed | 57x real-time (M4 Mac Mini) |
| Dependencies | ASR (starts only after the ASR JSON is complete) |

## Version History

| Version | Date | Purpose | Author | Tool/Model |
|------|------|------|--------|-----------|
| V1.0 | 2026-05-02 | Initial version | OpenCode | deepseek-chat |
|
||||
92
docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/YOLO_V1.0.0.md
Normal file
@@ -0,0 +1,92 @@
|
||||
---
|
||||
document_type: "spec"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "YOLO Processor V1.0.0"
|
||||
date: "2026-05-02"
|
||||
version: "V1.0"
|
||||
status: "active"
|
||||
owner: "Warren"
|
||||
created_by: "OpenCode"
|
||||
parent: "PROCESSOR_SELECTION_V1.0.0.md"
|
||||
tags:
|
||||
- "momentry"
|
||||
- "core"
|
||||
- "processor"
|
||||
- "yolo"
|
||||
- "object-detection"
|
||||
- "yolov8"
|
||||
- "v1.0.0"
|
||||
ai_query_hints:
|
||||
- "YOLO 使用 yolov8n (nano) 模型進行物件偵測"
|
||||
- "YOLO 在 M4 Mac Mini 上可達 100~200 FPS"
|
||||
- "YOLO 支援 GPU 加速(MPS),可快 2~5 倍"
|
||||
- "YOLO 輸出 4.3 MB 含偵測結果"
|
||||
- "YOLO 是 VisualChunk 的依賴"
|
||||
related_documents:
|
||||
- "PROCESSOR_SELECTION_V1.0.0.md"
|
||||
- "../VISUAL_CHUNK_V1.0.0.md"
|
||||
- "../POSE_V1.0.0.md"
|
||||
- "../OCR_V1.0.0.md"
|
||||
- "../CHUNK_DEFINITION_V1.0.0.md"
|
||||
---
|
||||
|
||||
# YOLO Processor V1.0.0
|
||||
|
||||
| Item | Value |
|------|------|
| Created by | OpenCode |
| Created on | 2026-05-02 |
| Document version | V1.0 |

**Status**: ✅ 100% | **Model**: YOLOv8n (nano) | **GPU**: Yes
|
||||
|
||||
## Key Terms

| Term | Definition |
|------|------|
| YOLO | You Only Look Once, a real-time object detection algorithm |
| YOLOv8n | Nano variant of Ultralytics YOLOv8; the smallest and fastest |
| object detection | Identifying the class and location of objects in an image |
| MPS | Metal Performance Shaders, GPU acceleration on Apple Silicon |
| bottleneck | YOLO and Pose are the two most time-consuming processors |
|
||||
|
||||
---
|
||||
|
||||
## Selection Process

| Model | Parameters | Size | Speed | Accuracy |
|------|------|------|------|------|
| **yolov8n (nano)** | **3.2M** | **6.2MB** | **Fastest** | **Lower** |
| yolov8s (small) | 11.2M | - | Fast | Medium |
| yolov8m (medium) | 25.9M | - | Medium | High |
| yolov8l (large) | 43.7M | - | Slow | Very high |
| yolov8x (x-large) | 68.2M | - | Slowest | Highest |

**Decision**: Default to `yolov8n.pt` (nano), which can reach 100~200 FPS on an M4 Mac Mini. A larger model can be selected through the configuration file.
|
||||
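
Switching models through configuration could look like the sketch below. The config key `yolo_variant` and the helper `resolve_weights` are hypothetical, since the document does not describe the configuration format; the weight file names follow the standard Ultralytics naming, of which only `yolov8n.pt` is confirmed by this document.

```python
# Variant name -> (weights file, parameter count in millions, from the table above).
YOLOV8_VARIANTS = {
    "nano": ("yolov8n.pt", 3.2),
    "small": ("yolov8s.pt", 11.2),
    "medium": ("yolov8m.pt", 25.9),
    "large": ("yolov8l.pt", 43.7),
    "x-large": ("yolov8x.pt", 68.2),
}


def resolve_weights(config: dict) -> str:
    """Return the weights file for the configured variant, defaulting to nano."""
    variant = config.get("yolo_variant", "nano")  # hypothetical config key
    if variant not in YOLOV8_VARIANTS:
        raise ValueError(f"unknown YOLO variant: {variant!r}")
    return YOLOV8_VARIANTS[variant][0]
```

For example, `resolve_weights({})` returns `"yolov8n.pt"`, matching the documented default.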
|
||||
---
|
||||
|
||||
## Measured Performance (ExaSAN 159.6 s video, full-frame processing)

| Metric | Value |
|------|-----|
| Processing time | 65.72 s |
| Real-time factor | 2.4x (one of the bottlenecks) |
| Output | 4.3 MB |
|
||||
|
||||
---
|
||||
|
||||
## Version History

| Version | Date | Purpose | Author | Tool/Model |
|------|------|------|--------|-----------|
| V1.0 | 2026-05-02 | Initial version | OpenCode | deepseek-chat |

## Resource Estimates

| Resource | Value |
|------|-----|
| CPU | 0.3 |
| Memory | 1024 MB |
| GPU | Supported (`yolo_processor_mps.py` can use MPS, 2~5x faster) |
| Dependencies | None |
|
||||
108
docs_v1.0/API_V1.0.0/INTERNAL/PROCESSOR_SELECTION_V1.0.0.md
Normal file
@@ -0,0 +1,108 @@
|
||||
---
|
||||
document_type: "spec"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "Processor 選型與資源預估 V1.0.0"
|
||||
date: "2026-05-02"
|
||||
version: "V1.1"
|
||||
status: "active"
|
||||
owner: "Warren"
|
||||
created_by: "OpenCode"
|
||||
tags:
|
||||
- "momentry"
|
||||
- "core"
|
||||
- "processor"
|
||||
- "model-selection"
|
||||
- "resource-estimation"
|
||||
- "v1.0.0"
|
||||
ai_query_hints:
|
||||
- "processor 的選型原因與實驗報告"
|
||||
- "各 processor 的資源預估與模型資訊"
|
||||
- "processor 之間的依賴關係"
|
||||
- "模型選擇的比較與決策"
|
||||
- "processor 檔案狀態後綴規則(json/tmp/err)"
|
||||
- "Job 完成條件與必要 processor 定義"
|
||||
related_documents:
|
||||
- "PROCESSORS/ASR_V1.0.0.md"
|
||||
- "PROCESSORS/FACE_V1.0.0.md"
|
||||
- "PROCESSORS/YOLO_V1.0.0.md"
|
||||
- "PROCESSORS/CUT_V1.0.0.md"
|
||||
- "CHUNK_DEFINITION_V1.0.0.md"
|
||||
---
|
||||
|
||||
# Processor Selection and Resource Estimates V1.0.0

| Item | Value |
|------|------|
| Created by | OpenCode |
| Created on | 2026-05-02 |
| Document version | V1.1 |
|
||||
|
||||
---
|
||||
|
||||
## Key Terms

| Term | Definition |
|------|------|
| Processor | Python script responsible for one type of media analysis |
| Pipeline | Defines the execution order and dependencies of processors |
| PythonExecutor | Rust wrapper layer that runs Python scripts in a uniform way |
| real-time factor | Video duration divided by processing time (higher is faster) |
| resource estimation | Estimated CPU / memory / GPU usage |
| Job | Processing task covering the execution and status management of multiple processors |
|
||||
|
||||
## Overview

| Processor | Status | Model | Dependencies | GPU | CPU | Memory | Doc |
|-----------|------|------|------|-----|-----|--------|------|
| ASR | ✅ 100% | faster-whisper (small) | None | No | 1.0 | 2048 MB | [Details](./PROCESSORS/ASR_V1.0.0.md) |
| CUT | ✅ 100% | PySceneDetect | None | No | 0.5 | 512 MB | [Details](./PROCESSORS/CUT_V1.0.0.md) |
| YOLO | ✅ 100% | YOLOv8n | None | Yes | 0.3 | 1024 MB | [Details](./PROCESSORS/YOLO_V1.0.0.md) |
| OCR | ✅ 100% | PaddleOCR PP-OCRv4 | None | No | 0.8 | 1024 MB | [Details](./PROCESSORS/OCR_V1.0.0.md) |
| Face | ✅ 100% | InsightFace buffalo_l | None | Yes | 0.6 | 1536 MB | [Details](./PROCESSORS/FACE_V1.0.0.md) |
| Pose | ✅ 100% | MediaPipe Pose | None | Yes | 0.4 | 1024 MB | [Details](./PROCESSORS/POSE_V1.0.0.md) |
| ASRX | ⚠️ 80% | SpeechBrain ECAPA-TDNN | ASR | No | 0.8 | 2048 MB | [Details](./PROCESSORS/ASRX_V1.0.0.md) |
| Scene | ✅ 100% | MIT Places365 | CUT | No | 0.3 | 512 MB | [Details](./PROCESSORS/SCENE_V1.0.0.md) |
| VisualChunk | ✅ Integrated | Rule aggregation (no model) | YOLO | No | 0.3 | 512 MB | [Details](./PROCESSORS/VISUAL_CHUNK_V1.0.0.md) |
| Caption | ✅ 100% (localized) | Moondream2 | Scene | No | - | - | [Details](./PROCESSORS/CAPTION_V1.0.0.md) |
| Story | ✅ 100% (localized) | Template aggregation | Scene, Caption | No | - | - | [Details](./PROCESSORS/STORY_V1.0.0.md) |
|
||||
|
||||
---
|
||||
|
||||
## Processor Dependency Graph (V4.1)

```
CUT ───→ Scene
│
ASR ───→ ASRX
│
YOLO ─→ VisualChunk
```

> **Note (V4.1)**: CUT and Scene run synchronously during the register stage; in the Worker pipeline, Scene depends only on CUT (the ASR dependency was removed). For long videos (scene ≤ 3, max > 600s), Face is dynamically moved ahead of ASR.
|
||||
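
The dependency columns of the overview table are enough to derive a valid execution order. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+); it illustrates the dependency rule only and is not the scheduler used by the Worker, which also handles the register-stage execution and the dynamic Face scheduling mentioned in the note.

```python
from graphlib import TopologicalSorter

# Processor -> set of processors it depends on (taken from the overview table).
DEPENDENCIES = {
    "ASR": set(),
    "CUT": set(),
    "YOLO": set(),
    "OCR": set(),
    "Face": set(),
    "Pose": set(),
    "ASRX": {"ASR"},
    "Scene": {"CUT"},
    "VisualChunk": {"YOLO"},
    "Caption": {"Scene"},
    "Story": {"Scene", "Caption"},
}


def execution_order(deps):
    """Return one order in which every processor runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(DEPENDENCIES)
```

`TopologicalSorter` raises `CycleError` if the table ever introduces a circular dependency, which makes it a cheap consistency check for future edits.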
|
||||
## File Status Suffixes

All processor output files use a unified suffix convention:

| Suffix | Meaning | Behavior |
|------|------|------|
| `.json` | Completed | Load and use directly |
| `.json.tmp` | In progress | Skip and wait |
| `.json.err` | Failed | Skip, no retry |

This convention is handled centrally by `PythonExecutor` (`executor.rs:150-279`).
|
||||
|
||||
## Job Completion Conditions (V4.1)

| Condition | Result |
|------|------|
| All processors completed | ✅ Job completed |
| Required processors (cut/asr/yolo) completed, others failed | ✅ Job completed (optional failures do not block the job) |
| Any required processor failed | ❌ Job failed |
|
||||
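
The three rows above collapse into one rule: a job fails only when a required processor fails. A minimal sketch follows, with hypothetical names; treating a required processor that is missing from the results as failed is an assumption.

```python
REQUIRED_PROCESSORS = frozenset({"cut", "asr", "yolo"})


def job_status(results: dict) -> str:
    """Return "completed" or "failed" from a processor -> succeeded mapping.

    results maps a processor name to True (completed) or False (failed).
    A required processor absent from results counts as failed (assumption).
    """
    for name in REQUIRED_PROCESSORS:
        if not results.get(name, False):
            return "failed"
    return "completed"  # optional processors never block completion
```

For example, `job_status({"cut": True, "asr": True, "yolo": True, "ocr": False})` yields `"completed"` because OCR is not a required processor.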
|
||||
## Version History

| Version | Date | Purpose | Author | Tool/Model |
|------|------|------|--------|-----------|
| V1.0 | 2026-05-02 | Initial version, including the selection experiment report and resource estimates | OpenCode | deepseek-chat |
| V1.1 | 2026-05-03 | CUT adds cut_count/cut_max_duration; Scene drops its ASR dependency; dynamic Face scheduling for long videos; relaxed Job completion conditions | OpenCode | deepseek-chat |
|
||||
129
docs_v1.0/API_V1.0.0/INTERNAL/VECTOR_SPEC_V1.0.0.md
Normal file
@@ -0,0 +1,129 @@
|
||||
---
|
||||
document_type: "spec"
|
||||
service: "MOMENTRY_CORE"
|
||||
title: "向量化規範 V1.0.0"
|
||||
date: "2026-05-02"
|
||||
version: "V1.0"
|
||||
status: "active"
|
||||
owner: "Warren"
|
||||
created_by: "OpenCode"
|
||||
tags:
|
||||
- "momentry"
|
||||
- "core"
|
||||
- "vector-embedding"
|
||||
- "qdrant"
|
||||
- "v1.0.0"
|
||||
- "face-embedding"
|
||||
- "voice-embedding"
|
||||
- "text-embedding"
|
||||
ai_query_hints:
|
||||
- "向量化規範的向量類型與維度說明"
|
||||
- "Face/Voice/Text 三種 embedding 的處理流程"
|
||||
- "Qdrant collection 的名稱與 payload 結構"
|
||||
- "Face embedding 的 512-D 向量規格(InsightFace ArcFace)"
|
||||
- "Voice embedding 的 192-D 向量規格(ECAPA-TDNN)"
|
||||
- "Text embedding 的 768-D 向量規格(nomic-embed-text-v2-moe)"
|
||||
- "Qdrant Payload 中 face 與 voice 的欄位定義"
|
||||
- "向量化流程中 child chunk 與 parent chunk 的 collection 區別"
|
||||
related_documents:
|
||||
- "PROCESSORS/FACE_EMBEDDING_FLOW_V1.0.0.md"
|
||||
- "PROCESSORS/VOICE_EMBEDDING_FLOW_V1.0.0.md"
|
||||
- "CHUNK_DEFINITION_V1.0.0.md"
|
||||
- "PROCESSORS/FACE_V1.0.0.md"
|
||||
- "PROCESSORS/ASRX_V1.0.0.md"
|
||||
---
|
||||
|
||||
# Vectorization Specification V1.0.0

| Item | Value |
|------|------|
| Created by | OpenCode |
| Created on | 2026-05-02 |
| Document version | V1.0 |
|
||||
|
||||
## Key Terms

| Term | Definition |
|------|------|
| embedding | Numeric vector representation of unstructured data |
| Qdrant | Vector database used to store and retrieve embeddings |
| collection | A set of vectors in Qdrant, comparable to a table in a relational database |
| 768-D | Dimension of text embeddings, produced by nomic-embed-text-v2-moe |
| 512-D | Dimension of face embeddings, produced by InsightFace ArcFace |
| 192-D | Dimension of voice embeddings, produced by SpeechBrain ECAPA-TDNN |
|
||||
|
||||
## Vector Types

| Type | Source | Dimensions | Collection | Purpose |
|------|------|------|------------|------|
| Text (child) | sentence chunk | 768-D | `momentry_dev_rule1` | Semantic search |
| Text (parent) | scene chunk summary | 768-D | `momentry_dev_chunk_summaries` | Scene-level semantic search |
| **Face** | Face processor (InsightFace) | **512-D** | `momentry_dev_face` | Face matching |
| **Voice** | ASRX processor (ECAPA-TDNN) | **192-D** | `momentry_dev_voice` | Speaker matching |
|
||||
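
Because each collection has a fixed dimension, a vector can be validated before it is written. The sketch below encodes the table as a lookup; `check_vector` is a hypothetical helper and makes no call to Qdrant itself.

```python
# Collection name -> expected vector dimension (from the table above).
COLLECTION_DIMS = {
    "momentry_dev_rule1": 768,
    "momentry_dev_chunk_summaries": 768,
    "momentry_dev_face": 512,
    "momentry_dev_voice": 192,
}


def check_vector(collection: str, vector) -> None:
    """Raise if the collection is unknown or the vector has the wrong length."""
    if collection not in COLLECTION_DIMS:
        raise KeyError(f"unknown collection: {collection!r}")
    expected = COLLECTION_DIMS[collection]
    if len(vector) != expected:
        raise ValueError(
            f"{collection} expects {expected}-D vectors, got {len(vector)}-D"
        )
```

Catching a dimension mismatch at this point gives a clearer error than a rejected upsert later in the pipeline.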
|
||||
## Vectorization Flow

### Text Embedding

```
chunk (sentence / scene)
  → text_content / summary_text
  → nomic-embed-text-v2-moe (Ollama)
  → 768-D vector
  → Qdrant momentry_dev_rule1 / momentry_dev_chunk_summaries
```

### Face Embedding

```
Face processor (InsightFace buffalo_l)
  → face_detections.embedding (512-D)
  → Qdrant momentry_dev_face
  → used for 1:N face matching
```

### Voice Embedding

```
ASRX processor (ECAPA-TDNN)
  → speaker embedding (192-D)
  → Qdrant momentry_dev_voice
  → used for cross-video speaker identification
```
|
||||
|
||||
## Qdrant Payload Structure

### Face Payload

```json
{
  "file_uuid": "384b0ff44aaaa1f14cb2cd63b3fea966",
  "face_id": "face_42",
  "frame": 1260,
  "timestamp": 42.0,
  "x": 328,
  "y": 88,
  "width": 63,
  "height": 75,
  "confidence": 0.83
}
```

### Voice Payload

```json
{
  "file_uuid": "384b0ff44aaaa1f14cb2cd63b3fea966",
  "speaker_id": "SPEAKER_0",
  "start_frame": 9,
  "end_frame": 441,
  "start_time": 0.3,
  "end_time": 14.7
}
```
|
||||
|
||||
## Version History

| Version | Date | Purpose | Author | Tool/Model |
|------|------|------|--------|-----------|
| V1.0 | 2026-05-02 | Initial version | OpenCode | deepseek-chat |
|
||||