momentry_core/docs/CHUNK_SPEC.md
accusys cb1fcd4846 docs: Add video chunk specification
- Define three chunk types: Sentence, Cut, TimeBased
- Support overlapping chunks
- Frame-accurate timestamps
- Include content and metadata structures
- Add PostgreSQL schema
- Document Rust data structures and splitter implementation
2026-03-25 14:53:41 +08:00

# Video Chunk Splitting Specification
This document defines the splitting principles and data structures for video chunks in the Momentry Core system.
---
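## 1. Chunk Overview
### 1.1 Design Principles
1. **Overlap allowed**: Chunks of different types may overlap (for example, a sentence chunk and a time-based chunk).
2. **Frame accuracy**: Time coordinates are accurate to the video frame.
3. **Multiple classifications**: Three splitting methods are supported: sentence, scene, and time.
### 1.2 Chunk Types
| Type | Description | Can overlap |
|------|-------------|-------------|
| **Sentence** | Sentence splitting | ✅ May overlap with other types |
| **Cut** | Scene cutting | ✅ May overlap with other types |
| **TimeBased** | Fixed-duration splitting | ✅ May overlap with other types |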
---
## 2. Time Coordinate System
### 2.1 Time Format
All times are expressed in **seconds** as floating-point numbers, with **microsecond** precision.
```json
{
  "start_time": 10.5,
  "end_time": 15.75
}
```
### 2.2 Frame Calculation
```
frame_number = floor(time_in_seconds * fps)
time_at_frame = frame_number / fps
```
**Example**:
- Video FPS: 24/1 (24 fps)
- Time: 10.5 seconds
- Frame: floor(10.5 * 24) = 252
- Check: 252 / 24 = 10.5 seconds ✅
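
The conversion above can be sketched in Rust as follows. The function names `time_to_frame` and `frame_to_time` are illustrative and are not taken from Momentry Core; the sketch assumes a non-negative timestamp and a positive frame rate.

```rust
/// Convert a timestamp in seconds to a frame number: floor(time * fps).
/// Hypothetical helper for illustration; assumes time >= 0 and fps > 0.
pub fn time_to_frame(time_in_seconds: f64, fps: f64) -> i64 {
    (time_in_seconds * fps).floor() as i64
}

/// Convert a frame number back to the timestamp at which that frame starts.
pub fn frame_to_time(frame_number: i64, fps: f64) -> f64 {
    frame_number as f64 / fps
}
```

With `fps = 24.0`, `time_to_frame(10.5, 24.0)` yields 252 and `frame_to_time(252, 24.0)` yields 10.5, matching the example. For timestamps that are not exactly representable in binary floating point, the product can land slightly below a frame boundary, so callers that need strict frame accuracy may want to carry the rational frame rate instead of an `f64`.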
### 2.3 Frame Information Structure
```json
{
  "start_time": 10.5,
  "start_frame": 252,
  "end_time": 15.75,
  "end_frame": 378,
  "fps": "24/1",
  "fps_value": 24.0
}
```
---
## 3. Three Splitting Methods
### 3.1 Sentence (Sentence Splitting)
**Principles**:
- Based on ASR (automatic speech recognition) results
- Each recognized sentence becomes one chunk
- Text content comes from the ASR output
**Example**:
```
ASR output:
[
  {"start": 10.0, "end": 15.0, "text": "Hello world"},
  {"start": 15.0, "end": 20.0, "text": "This is a test"},
  {"start": 20.0, "end": 25.5, "text": "Processing video"}
]
Converted to chunks:
┌──────────────────────────────────────────────┐
│ chunk_0001: 10.0s - 15.0s "Hello world"      │
├──────────────────────────────────────────────┤
│ chunk_0002: 15.0s - 20.0s "This is a test"   │
├──────────────────────────────────────────────┤
│ chunk_0003: 20.0s - 25.5s "Processing video" │
└──────────────────────────────────────────────┘
```
### 3.2 Cut (Scene Cutting)
**Principles**:
- Based on shot changes in the video (scene change / cut detection)
- Detected with ffmpeg or Python (scenedetect)
- Each scene becomes one chunk
**Detection method**:
```bash
# Detect scene changes with ffmpeg
ffmpeg -i input.mp4 -filter:v "select='gt(scene,0.3)',showinfo" -f null -
```
**Example**:
```
Scene detection result:
[
  {"start": 0.0, "end": 45.2, "scene_id": 1},
  {"start": 45.2, "end": 120.5, "scene_id": 2},
  {"start": 120.5, "end": 180.0, "scene_id": 3}
]
Converted to chunks:
┌──────────────────────────────────────────────┐
│ chunk_0001: 0.0s - 45.2s (Scene 1)           │
├──────────────────────────────────────────────┤
│ chunk_0002: 45.2s - 120.5s (Scene 2)         │
├──────────────────────────────────────────────┤
│ chunk_0003: 120.5s - 180.0s (Scene 3)        │
└──────────────────────────────────────────────┘
```
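
A scene detector usually reports the timestamps at which cuts occur rather than ready-made start/end pairs. The following is a minimal sketch of turning such timestamps into the scene list shown above. It assumes the cut timestamps are sorted in ascending order and that the video duration is known; `Scene` and `scenes_from_cuts` are hypothetical names, not part of Momentry Core.

```rust
/// One detected scene. Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Scene {
    pub scene_id: u32,
    pub start: f64,
    pub end: f64,
}

/// Build contiguous scenes from cut timestamps (in seconds).
/// Assumes `cuts` is sorted ascending; cuts at or before the current
/// scene start, or at or after the end of the video, are ignored.
pub fn scenes_from_cuts(cuts: &[f64], video_duration: f64) -> Vec<Scene> {
    let mut scenes = Vec::new();
    let mut start = 0.0;
    for &cut in cuts {
        if cut <= start || cut >= video_duration {
            continue;
        }
        let scene_id = scenes.len() as u32 + 1;
        scenes.push(Scene { scene_id, start, end: cut });
        start = cut;
    }
    // The final scene runs from the last cut to the end of the video.
    if start < video_duration {
        let scene_id = scenes.len() as u32 + 1;
        scenes.push(Scene { scene_id, start, end: video_duration });
    }
    scenes
}
```

Calling `scenes_from_cuts(&[45.2, 120.5], 180.0)` produces the three scenes of the example, each ending exactly where the next one starts.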
### 3.3 TimeBased (Fixed-Duration Splitting)
**Principles**:
- Split at a fixed time length
- Default is **10 seconds** per chunk
- The last chunk may be shorter than 10 seconds
- **Overlap is supported** (the overlap in seconds is configurable)
**Parameters**:
| Parameter | Default | Description |
|-----------|---------|-------------|
| duration | 10.0 | Duration of each chunk (seconds) |
| overlap | 0.0 | Overlap between consecutive chunks (seconds); must be less than `duration` |
**Example** (no overlap):
```
Video duration: 35 seconds, duration=10
Chunks:
┌──────────────────────────────────────────────┐
│ chunk_0001: 0.0s - 10.0s                     │
├──────────────────────────────────────────────┤
│ chunk_0002: 10.0s - 20.0s                    │
├──────────────────────────────────────────────┤
│ chunk_0003: 20.0s - 30.0s                    │
├──────────────────────────────────────────────┤
│ chunk_0004: 30.0s - 35.0s (under 10 s)       │
└──────────────────────────────────────────────┘
```
**Example** (with overlap, overlap=2):
```
Video duration: 35 seconds, duration=10, overlap=2
Chunks:
┌──────────────────────────────────────────────┐
│ chunk_0001: 0.0s - 10.0s                     │
├──────────────────────────────────────────────┤
│ chunk_0002: 8.0s - 18.0s (2 s overlap)       │
├──────────────────────────────────────────────┤
│ chunk_0003: 16.0s - 26.0s (2 s overlap)      │
├──────────────────────────────────────────────┤
│ chunk_0004: 24.0s - 34.0s (2 s overlap)      │
├──────────────────────────────────────────────┤
│ chunk_0005: 32.0s - 35.0s (overlap + short)  │
└──────────────────────────────────────────────┘
```
---
## 4. Chunk Data Structure
### 4.1 Basic Structure
```json
{
  "uuid": "1636719dc31f78ac",
  "chunk_id": "sentence_0001",
  "chunk_index": 1,
  "chunk_type": "sentence",
  "start_time": 10.5,
  "start_frame": 252,
  "end_time": 15.75,
  "end_frame": 378,
  "fps": "24/1",
  "fps_value": 24.0,
  "content": {
    "text": "Hello world, this is a test"
  },
  "metadata": {
    "source": "asr",
    "confidence": 0.95,
    "language": "en"
  }
}
```
### 4.2 Field Descriptions
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `uuid` | String | ✅ | Video UUID (16 characters) |
| `chunk_id` | String | ✅ | Unique chunk ID |
| `chunk_index` | Integer | ✅ | Chunk index (starting from 0) |
| `chunk_type` | String | ✅ | Type: sentence/cut/time_based |
| `start_time` | Float | ✅ | Start time (seconds) |
| `start_frame` | Integer | ✅ | Start frame number |
| `end_time` | Float | ✅ | End time (seconds) |
| `end_frame` | Integer | ✅ | End frame number |
| `fps` | String | ✅ | FPS as a rational string (e.g. "24/1") |
| `fps_value` | Float | ✅ | FPS as a number (e.g. 24.0) |
| `content` | Object | ✅ | Content (see below) |
| `metadata` | Object | ❌ | Additional information (see below) |
### 4.3 Content Structure
The structure of `content` depends on `chunk_type`:
#### Sentence Content
```json
{
  "content": {
    "text": "Hello world, this is a test message",
    "text_normalized": "hello world this is a test message",
    "word_count": 7,
    "char_count": 34
  }
}
```
| Field | Type | Description |
|-------|------|-------------|
| `text` | String | Original recognized text |
| `text_normalized` | String | Normalized text (lowercase, punctuation removed) |
| `word_count` | Integer | Number of words |
| `char_count` | Integer | Number of characters |
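
The derived fields above can be computed from the raw ASR text along the following lines. This is a sketch under stated assumptions: `SentenceContent` and `sentence_content` are hypothetical names, words are split on whitespace, and `char_count` is taken over the normalized text because the example value of 34 matches the length of `text_normalized` rather than of `text`.

```rust
/// Derived fields of a sentence chunk's content. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
pub struct SentenceContent {
    pub text: String,
    pub text_normalized: String,
    pub word_count: usize,
    pub char_count: usize,
}

pub fn sentence_content(text: &str) -> SentenceContent {
    // Lowercase, then drop everything that is neither alphanumeric nor whitespace.
    let cleaned: String = text
        .to_lowercase()
        .chars()
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .collect();
    // Split on whitespace and rejoin with single spaces to collapse runs.
    let words: Vec<&str> = cleaned.split_whitespace().collect();
    let text_normalized = words.join(" ");
    SentenceContent {
        text: text.to_string(),
        // Assumption: characters are counted in the normalized text.
        char_count: text_normalized.chars().count(),
        word_count: words.len(),
        text_normalized,
    }
}
```

Whitespace splitting is only meaningful for languages that separate words with spaces; text in languages such as Chinese or Japanese would need a proper tokenizer to produce a useful `word_count`.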
#### Cut Content
```json
{
  "content": {
    "scene_id": 2,
    "scene_number": 2,
    "transition_type": "cut",
    "scene_change_score": 0.95
  }
}
```
| Field | Type | Description |
|-------|------|-------------|
| `scene_id` | Integer | Scene ID |
| `scene_number` | Integer | Scene number |
| `transition_type` | String | Transition type: cut/dissolve/fade |
| `scene_change_score` | Float | Scene change score (0-1) |
#### TimeBased Content
```json
{
  "content": {
    "duration": 10.0,
    "is_last": false,
    "segment_number": 3,
    "total_segments": 10
  }
}
```
| Field | Type | Description |
|-------|------|-------------|
| `duration` | Float | Duration (seconds) |
| `is_last` | Boolean | Whether this is the last chunk |
| `segment_number` | Integer | Segment number |
| `total_segments` | Integer | Total number of segments |
### 4.4 Metadata Structure
```json
{
  "metadata": {
    "source": "asr",
    "confidence": 0.95,
    "language": "en",
    "model": "tiny",
    "created_at": "2026-03-16T10:00:00Z"
  }
}
```
| Field | Type | Description |
|-------|------|-------------|
| `source` | String | Source: asr/scene_detect/time_based |
| `confidence` | Float | Confidence (0-1) |
| `language` | String | Language code |
| `model` | String | Model used |
| `created_at` | String | Creation time (ISO 8601) |
---
## 5. Chunk ID Naming Convention
### 5.1 Format
```
{chunk_type}_{chunk_index:04}
```
| Type | Prefix | Example |
|------|--------|---------|
| Sentence | `sentence_` | `sentence_0001` |
| Cut | `cut_` | `cut_0001` |
| TimeBased | `time_based_` | `time_based_0001` |
### 5.2 Numbering Rules
- Starts from **0**
- Zero-padded to **4 digits**
- Increments in chronological order
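
The naming rule can be sketched as a pair of helpers that build and parse chunk IDs. The names `format_chunk_id` and `parse_chunk_id` are illustrative only and do not come from Momentry Core.

```rust
/// Build a chunk ID such as "time_based_0001" from a type name and a
/// zero-based index. Hypothetical helper for illustration.
pub fn format_chunk_id(chunk_type: &str, chunk_index: u32) -> String {
    format!("{}_{:04}", chunk_type, chunk_index)
}

/// Split a chunk ID back into (type name, index).
/// The split is made at the last underscore so that "time_based_0001"
/// yields "time_based" rather than "time".
pub fn parse_chunk_id(chunk_id: &str) -> Option<(&str, u32)> {
    let (prefix, digits) = chunk_id.rsplit_once('_')?;
    let index = digits.parse::<u32>().ok()?;
    Some((prefix, index))
}
```

Note that `{:04}` sets a minimum width: an index above 9999 is still formatted, but with more than four digits, so IDs then stop sorting lexicographically in index order.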
---
## 6. Database Schema
### 6.1 PostgreSQL Table
```sql
CREATE TABLE chunks (
    id BIGSERIAL PRIMARY KEY,
    uuid VARCHAR(16) NOT NULL,
    chunk_id VARCHAR(64) NOT NULL,
    chunk_index INTEGER NOT NULL,
    chunk_type VARCHAR(32) NOT NULL,
    start_time DOUBLE PRECISION NOT NULL,
    start_frame BIGINT NOT NULL,
    end_time DOUBLE PRECISION NOT NULL,
    end_frame BIGINT NOT NULL,
    fps VARCHAR(16) NOT NULL,
    fps_value DOUBLE PRECISION NOT NULL,
    content JSONB NOT NULL,
    metadata JSONB,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    UNIQUE(uuid, chunk_id)
);
-- Indexes
CREATE INDEX idx_chunks_uuid ON chunks(uuid);
CREATE INDEX idx_chunks_type ON chunks(chunk_type);
CREATE INDEX idx_chunks_time ON chunks(start_time, end_time);
CREATE INDEX idx_chunks_uuid_type ON chunks(uuid, chunk_type);
```
### 6.2 Query Examples
```sql
-- Query all chunks of a video
SELECT * FROM chunks WHERE uuid = '1636719dc31f78ac';
-- Query chunks of a specific type
SELECT * FROM chunks WHERE uuid = '1636719dc31f78ac' AND chunk_type = 'sentence';
-- Query chunks that overlap a time range (20.0 s to 30.0 s)
SELECT * FROM chunks
WHERE uuid = '1636719dc31f78ac'
  AND start_time <= 30.0 AND end_time >= 20.0;
-- Query all chunks that overlap the time range (mixed types), ordered by type
SELECT * FROM chunks
WHERE uuid = '1636719dc31f78ac'
  AND start_time <= 30.0 AND end_time >= 20.0
ORDER BY chunk_type, chunk_index;
```
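
The time-range queries above rely on the usual interval-overlap test: a chunk matches when it starts no later than the range ends and ends no earlier than the range starts. The same predicate, written as a small Rust function for use on chunks already loaded in memory, might look like this (`overlaps` is a hypothetical name):

```rust
/// True when the closed interval [chunk_start, chunk_end] intersects
/// the closed interval [range_start, range_end]. Times are in seconds.
pub fn overlaps(chunk_start: f64, chunk_end: f64, range_start: f64, range_end: f64) -> bool {
    chunk_start <= range_end && chunk_end >= range_start
}
```

Because both comparisons are inclusive, a chunk that ends exactly at 20.0 s is returned for the range 20.0 s to 30.0 s. If chunks are meant to be half-open intervals, strict comparisons (`<` and `>`) would exclude such boundary-only matches.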
---
## 7. Rust Data Structures
### 7.1 Chunk Definition
```rust
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq)]
#[serde(rename_all = "snake_case")]
pub enum ChunkType {
    Sentence,
    Cut,
    TimeBased,
}

impl ChunkType {
    pub fn as_str(&self) -> &'static str {
        match self {
            ChunkType::Sentence => "sentence",
            ChunkType::Cut => "cut",
            ChunkType::TimeBased => "time_based",
        }
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Chunk {
    pub uuid: String,
    pub chunk_id: String,
    pub chunk_index: u32,
    pub chunk_type: ChunkType,
    pub start_time: f64,
    pub start_frame: i64,
    pub end_time: f64,
    pub end_frame: i64,
    pub fps: String,
    pub fps_value: f64,
    pub content: serde_json::Value,
    pub metadata: Option<serde_json::Value>,
}
```
### 7.2 Creating a Chunk
```rust
impl Chunk {
    pub fn new(
        uuid: String,
        chunk_index: u32,
        chunk_type: ChunkType,
        start_time: f64,
        end_time: f64,
        fps: &str,
        content: serde_json::Value,
    ) -> Self {
        // `parse_fps` is not defined in this document; it is expected to
        // convert a rational string such as "24/1" into 24.0.
        let fps_value = parse_fps(fps);
        // floor(time * fps), as specified in section 2.2.
        let start_frame = (start_time * fps_value).floor() as i64;
        let end_frame = (end_time * fps_value).floor() as i64;
        // ID format from section 5.1: {chunk_type}_{chunk_index:04}.
        let chunk_id = format!("{}_{:04}", chunk_type.as_str(), chunk_index);
        Self {
            uuid,
            chunk_id,
            chunk_index,
            chunk_type,
            start_time,
            start_frame,
            end_time,
            end_frame,
            fps: fps.to_string(),
            fps_value,
            content,
            metadata: None,
        }
    }
}
```
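
`Chunk::new` depends on a `parse_fps` helper that this document does not show. A minimal sketch of such a helper is given below. Its `Option<f64>` return type is an assumption made here so that malformed input can be reported; the constructor above, which uses the result directly as an `f64`, would need to unwrap the value or handle the error.

```rust
/// Parse a frame rate written either as "num/den" (e.g. "24/1",
/// "30000/1001") or as a plain number (e.g. "25").
/// Returns None for malformed input, a zero denominator, or a
/// non-positive or non-finite rate.
/// Hypothetical sketch: the real helper's signature is not shown in this spec.
pub fn parse_fps(fps: &str) -> Option<f64> {
    let value = match fps.trim().split_once('/') {
        Some((num, den)) => {
            let num: f64 = num.trim().parse().ok()?;
            let den: f64 = den.trim().parse().ok()?;
            if den == 0.0 {
                return None;
            }
            num / den
        }
        None => fps.trim().parse().ok()?,
    };
    if value.is_finite() && value > 0.0 {
        Some(value)
    } else {
        None
    }
}
```

Keeping the rational form matters for NTSC-style rates: "30000/1001" parses to roughly 29.97, which a plain integer frame rate cannot represent.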
---
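## 8. Time-Based Splitter Implementation
### 8.1 TimeBasedSplitter
```rust
pub struct TimeBasedSplitter {
    pub duration: f64, // duration of each chunk (seconds)
    pub overlap: f64,  // overlap between consecutive chunks (seconds)
}

impl TimeBasedSplitter {
    pub fn new(duration: f64, overlap: f64) -> Self {
        // The step (duration - overlap) must be positive,
        // otherwise split() would never advance.
        assert!(
            duration > 0.0 && overlap >= 0.0 && overlap < duration,
            "require duration > 0 and 0 <= overlap < duration"
        );
        Self { duration, overlap }
    }

    pub fn split(&self, uuid: &str, video_duration: f64, fps: f64) -> Vec<Chunk> {
        let mut chunks = Vec::new();
        let step = self.duration - self.overlap;
        let mut current_time = 0.0;
        let mut index: u32 = 0;

        while current_time < video_duration {
            let end_time = (current_time + self.duration).min(video_duration);
            let is_last = end_time >= video_duration;
            let chunk = Chunk::new(
                uuid.to_string(),
                index,
                ChunkType::TimeBased,
                current_time,
                end_time,
                // Note: this truncates fractional rates such as 29.97 to "29/1".
                &format!("{}/1", fps as u32),
                serde_json::json!({
                    "duration": end_time - current_time,
                    "is_last": is_last,
                    "segment_number": index + 1,
                }),
            );
            chunks.push(chunk);
            if is_last {
                // Stop here so that no extra chunk is emitted inside the final one.
                break;
            }
            current_time += step;
            index += 1;
        }

        // Fill in `total_segments` (section 4.3) once the count is known.
        let total = chunks.len();
        for chunk in &mut chunks {
            chunk.content["total_segments"] = serde_json::json!(total);
        }
        chunks
    }
}
```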
### 8.2 Usage Example
```rust
// Create a time-based splitter (10 s, no overlap)
let splitter = TimeBasedSplitter::new(10.0, 0.0);
let chunks = splitter.split(&uuid, video_duration, 24.0);
// Create a time-based splitter (10 s, 2 s overlap)
let splitter = TimeBasedSplitter::new(10.0, 2.0);
let chunks = splitter.split(&uuid, video_duration, 24.0);
```
---
## 9. Processing Flow
### 9.1 Full Flow
```
1. Register (register the video)
   └── Obtain UUID, video_duration, fps
2. Probe (probe the video)
   └── Obtain streams, format, fps
3. Generate Sentence Chunks
   └── Read the ASR output
   └── Create a chunk for each segment
4. Generate Cut Chunks
   └── Run scene detection
   └── Create a chunk for each scene
5. Generate TimeBased Chunks
   └── Use TimeBasedSplitter
   └── Create a chunk for each time segment
6. Store in the database
   └── Batch write to PostgreSQL
```
### 9.2 Output Example
```
Video: 35 seconds, FPS: 24
Sentence Chunks (3):
  sentence_0000: 0.0s - 10.0s (240 frames)
  sentence_0001: 10.0s - 20.0s (240 frames)
  sentence_0002: 20.0s - 35.0s (360 frames)
Cut Chunks (3):
  cut_0000: 0.0s - 15.0s (360 frames)
  cut_0001: 15.0s - 28.0s (312 frames)
  cut_0002: 28.0s - 35.0s (168 frames)
TimeBased Chunks (5, 2 s overlap):
  time_based_0000: 0.0s - 10.0s (240 frames)
  time_based_0001: 8.0s - 18.0s (240 frames)
  time_based_0002: 16.0s - 26.0s (240 frames)
  time_based_0003: 24.0s - 34.0s (240 frames)
  time_based_0004: 32.0s - 35.0s (72 frames)
```
---
## 10. Related Documents
- [JSON_OUTPUT_SPEC.md](./JSON_OUTPUT_SPEC.md) - JSON output specification
- [RUST_DEVELOPMENT.md](./RUST_DEVELOPMENT.md) - Rust development guidelines
- [AGENTS.md](../AGENTS.md) - Development guidelines