Warren 8f05a7c188 feat: update Python processors and add utility scripts

- Update ASR, face, OCR, pose processors
- Add release pre-flight check script
- Add synonym generation, chunk processing scripts
- Add face recognition, stamp search utilities

2026-04-30 15:07:49 +08:00


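Mouth Movement Detection: Approach Overview

Problem

MediaPipe 0.10.33 has removed the legacy solutions API and only supports the new tasks API, which requires:

Downloading the face_landmarker.task model file (~100MB)
Using the more complex Vision API
Handling asynchronous callbacks (when running in live-stream mode)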

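Alternatives

Option 1: Face + ASR Inference (Recommended ⭐)

Principle:

If Face detects a face and ASR detects speech in the same time span, the person is treated as speaking.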
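
The rule above can be sketched in a few lines of Python. This is only an illustration, not the logic of integrate_face_asrx.py: the field names `t`, `faces`, `start`, and `end` are a hypothetical schema, and the real JSON files may be laid out differently.

```python
from bisect import bisect_right


def merge_segments(segments):
    """Merge overlapping (start, end) intervals so lookups can use bisect."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def label_speaking(face_frames, asr_segments):
    """Flag a frame as speaking when it has a face and falls inside a speech segment.

    face_frames:  [{"t": seconds, "faces": [...]}, ...]      (hypothetical schema)
    asr_segments: [{"start": seconds, "end": seconds}, ...]  (hypothetical schema)
    """
    merged = merge_segments((s["start"], s["end"]) for s in asr_segments)
    starts = [m[0] for m in merged]
    labeled = []
    for frame in face_frames:
        # Find the last merged segment that starts at or before this frame.
        i = bisect_right(starts, frame["t"]) - 1
        in_speech = i >= 0 and frame["t"] <= merged[i][1]
        has_face = len(frame["faces"]) > 0
        labeled.append({"t": frame["t"], "speaking": has_face and in_speech})
    return labeled
```

Merging the segments first keeps each lookup logarithmic, which matters little at this scale but avoids a frame-by-segment double loop. The sketch also makes the limitation listed below visible: with several faces in a frame, the rule has no way to say which one is speaking.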

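Pros:

✅ No additional model required
✅ Fast (already integrated)
✅ Acceptable accuracy

Cons:

⚠️ Cannot measure how far the mouth is open
⚠️ Cannot tell which person is speaking when several are present

Implementation:

# Use the existing integrate_face_asrx.py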
python3 scripts/integrate_face_asrx.py \
  face.json asr.json output.json

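Option 2: MediaPipe Tasks API

Requirements:

Download the model: face_landmarker.task
Use the new API

Pros:

✅ 468 face landmarks
✅ Precise mouth detection

Cons:

❌ Requires downloading a 100MB model
❌ Slow processing
❌ Complex API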
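
To give a sense of what "precise mouth detection" could look like once landmarks are available, here is a minimal sketch that turns face mesh points into a mouth-openness ratio. It deliberately does not call MediaPipe. The landmark indices (13, 14, 61, 291) are the commonly used inner-lip and mouth-corner points of the face mesh topology and should be treated as assumptions, and the 0.1 threshold is an arbitrary illustrative value.

```python
import math

# Assumed face mesh indices (verify against the model's landmark map before use).
UPPER_INNER_LIP = 13
LOWER_INNER_LIP = 14
LEFT_MOUTH_CORNER = 61
RIGHT_MOUTH_CORNER = 291


def mouth_open_ratio(landmarks):
    """Return the lip gap divided by the mouth width.

    landmarks: sequence of (x, y) points in pixel coordinates, indexed like
    the face mesh. Normalized coordinates should be scaled by image width and
    height first, otherwise the ratio is skewed by the image aspect ratio.
    """
    gap = math.dist(landmarks[UPPER_INNER_LIP], landmarks[LOWER_INNER_LIP])
    width = math.dist(landmarks[LEFT_MOUTH_CORNER], landmarks[RIGHT_MOUTH_CORNER])
    return gap / width if width > 0 else 0.0


def is_mouth_open(landmarks, threshold=0.1):
    """Illustrative threshold test; the value would need tuning on real footage."""
    return mouth_open_ratio(landmarks) > threshold
```

Dividing by the mouth width makes the value roughly independent of how large the face appears in the frame, so one threshold can serve near and far faces.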

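Option 3: Dlib 68-Point Face Landmarks

Requirements:

Install dlib
Download shape_predictor_68_face_landmarks.dat

Pros:

✅ 68 face landmarks
✅ Includes the mouth contour (20 points)

Cons:

❌ Complex installation (requires compilation)
❌ Relatively slow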
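
As a rough illustration of how the 20 mouth points could be used, the sketch below computes a mouth aspect ratio (MAR) from the inner-lip points of the 68-point layout and then checks whether that ratio varies over a window of frames. It assumes 0-based indexing with the mouth at points 48 to 67 and the inner lip at 60 to 67. The `min_range` threshold is a hypothetical value, and the code works on plain (x, y) tuples rather than calling dlib.

```python
import math

# Inner-lip pairs (upper, lower) and inner mouth corners in the 68-point layout.
INNER_LIP_PAIRS = [(61, 67), (62, 66), (63, 65)]
INNER_LEFT_CORNER = 60
INNER_RIGHT_CORNER = 64


def mouth_aspect_ratio(points):
    """Mean inner-lip gap divided by inner mouth width.

    points: sequence of 68 (x, y) tuples, for example built from a dlib shape
    as [(shape.part(i).x, shape.part(i).y) for i in range(68)].
    """
    vertical = sum(math.dist(points[a], points[b]) for a, b in INNER_LIP_PAIRS)
    vertical /= len(INNER_LIP_PAIRS)
    horizontal = math.dist(points[INNER_LEFT_CORNER], points[INNER_RIGHT_CORNER])
    return vertical / horizontal if horizontal > 0 else 0.0


def mouth_is_moving(mar_series, min_range=0.1):
    """Treat the mouth as moving when MAR swings by more than min_range.

    mar_series: MAR values from consecutive frames in a short window.
    min_range is an illustrative threshold, not a tuned value.
    """
    if not mar_series:
        return False
    return max(mar_series) - min(mar_series) > min_range
```

Looking at the swing of the ratio over a window, rather than a single frame, separates talking from a mouth that simply stays open or closed.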

Recommendation

Use Option 1 (Face + ASR inference) for now.

If precise mouth detection is needed in the future:

Install Dlib
Or use the MediaPipe Tasks API

Currently Available Data

/tmp/face_long.json - Face detection (10,691 frames)
/tmp/asr_small_long.json - ASR transcription (2,025 segments)
/tmp/pose_long.json - Pose (empty data, no keypoints)

Integration check:

python3 scripts/integrate_face_asrx.py \
  /tmp/face_long.json \
  /tmp/asr_small_long.json \
  /tmp/integrated_long.json
