Add 5W1H+ quality evaluation report
- Gemma4 26B scored 5/10 overall - Template-heavy, lacks specific details and emotion - Suggested improvements: prompt tuning, temperature, model upgrade
This commit is contained in:
64
docs_v1.0/M5_workspace/2026-05-07_5w1h_quality_evaluation.md
Normal file
64
docs_v1.0/M5_workspace/2026-05-07_5w1h_quality_evaluation.md
Normal file
@@ -0,0 +1,64 @@
|
||||
# 5W1H+ 品質評估報告
|
||||
|
||||
## 測試對象
|
||||
|
||||
| 項目 | 內容 |
|
||||
|------|------|
|
||||
| 模型 | Gemma4 26B MoE (Q5_K_M, 18GB) |
|
||||
| 載體 | llama-server (M5, Metal GPU, port 8082) |
|
||||
| Prompt | 遞迴式前情 + face/speaker/YOLO 資訊 |
|
||||
| 溫度 | 0.1 |
|
||||
| 測試影片 | Charade (1963), 24 scenes completed |
|
||||
|
||||
## 評估結果
|
||||
|
||||
### 優點
|
||||
|
||||
- ✅ 角色名稱正確(Audrey Hepburn, Cary Grant, Jacques Marin…)
|
||||
- ✅ 時間戳記準確(at the 214s-219s mark)
|
||||
- ✅ 事件順序合理(following a series of…)
|
||||
- ✅ 5W1H 架構完整(who/what/where/when/why/how)
|
||||
|
||||
### 缺點
|
||||
|
||||
- ❌ **模板化嚴重** — 大量重複片語:
|
||||
- "takes place within the established dramatic setting"
|
||||
- "navigating emotional conflicts"
|
||||
- "punctuate the heavy atmosphere"
|
||||
- ❌ **缺乏具體細節** — 不引用原文,只用模糊描述
|
||||
- ❌ **情感平淡** — 像警察報告,沒有抓到 Charade 懸疑喜劇的調性
|
||||
- ❌ **Why/How 薄弱** — 原因和方式都寫得很 vague
|
||||
|
||||
### 樣本對比
|
||||
|
||||
#### Gemma4 輸出(scene_32)
|
||||
|
||||
> Walter Matthau confronts an individual during a tense moment in the film. He questions whether the other person was aware that a weapon was loaded. This interaction takes place in a setting where a firearm is being handled. The dialogue occurs at the 214s-219s mark to emphasize the immediate danger. He delivers his words with a sense of stern disbelief.
|
||||
|
||||
#### 理想輸出(人工撰寫)
|
||||
|
||||
> Walter Matthau 發現槍枝已經上了膛,他以難以置信的語氣質問對方「你別跟我說你不知道它上了膛」。這發生在電影前段一個關鍵時刻——角色們正在互相試探,氣氛緊繃。這句臺詞揭露了某人對危險的輕忽,也暗示這把槍可能即將被使用。Matthau 的語氣混合了震驚與責備,為後續的衝突鋪路。
|
||||
|
||||
### 評分
|
||||
|
||||
| 面向 | Gemma4 26B | 目標 |
|
||||
|------|-----------|------|
|
||||
| 正確性 | 8/10 | 10 |
|
||||
| 具體細節 | 3/10 | 8 |
|
||||
| 劇情意義 | 4/10 | 8 |
|
||||
| 情感 | 3/10 | 7 |
|
||||
| 5W1H 結構 | 7/10 | 8 |
|
||||
| **綜合** | **5/10** | **8** |
|
||||
|
||||
## 改善方向
|
||||
|
||||
| 方案 | 預期效果 | 成本 |
|
||||
|------|---------|------|
|
||||
| Prompt 加「避免模板化,引用原文」 | 細節 ↑ 30% | 免費(改 prompt) |
|
||||
| Temperature 0.1 → 0.05 | 減少幻覺 | 免費 |
|
||||
| 增加劇情上下文(更多前情) | 連貫性 ↑ | 時間略增 |
|
||||
| 換 Qwen3 30B MoE | 品質 ↑,速度 ↓ | 需重啟 server |
|
||||
|
||||
## 結論
|
||||
|
||||
Gemma4 26B 的 5W1H+ 對搜尋來說**堪用**(角色/事件正確),但對摘要閱讀來說**不夠好**(模板化、無細節)。建議先調 prompt 改善,若還是不足再考慮換模型。
|
||||
Reference in New Issue
Block a user