docs: complete M5 Gemma4 deployment record V1.1 — full build, dylib fix, codesign, reasoning off, OpenCode config

2026-05-06 17:54:16 +08:00
parent f65ac89e6a
commit 0b42365ecd
1 changed files with 263 additions and 106 deletions
--- a/docs_v1.0/API_V1.0.0/DEPLOY/GEM4_LLM_DEPLOY_PLAN_V1.0.0.md
+++ b/docs_v1.0/API_V1.0.0/DEPLOY/GEM4_LLM_DEPLOY_PLAN_V1.0.0.md
@@ -1,159 +1,316 @@
 ---
-document_type: "deployment_plan"
+document_type: "deployment_record"
 service: "MOMENTRY_CORE"
-title: "Gemma 4 LLM Deployment Plan: M5 Max MacBook Pro"
+title: "Gemma 4 31B: M5 Max Deployment Record"
 date: "2026-05-06"
-version: "V1.0"
-status: "draft"
+version: "V1.1"
+status: "active"
 owner: "Warren"
 created_by: "OpenCode"
 ---

-# Gemma 4 LLM Deployment Plan: M5 Max
+# Gemma 4 31B: M5 Max Deployment Record

 ## 1. Environment

-| Item | Spec |
-|------|------|
-| Model | MacBook Pro M5 Max |
-| Unified memory | 48 GB |
-| Architecture | arm64 (Apple Silicon) |
-| SSH | `accusys@10.10.10.10` |
-| Internet | ❌ None (files must go through the local machine via scp) |
-| Local machine | M4 Mac with internet access and llama.cpp already installed |
+| Item | M4 (dev machine) | M5 Max (LLM server) |
+|------|------------|-------------------|
+| Model | MacBook Pro M4 | MacBook Pro M5 Max |
+| Memory | 16 GB | **48 GB** |
+| Architecture | arm64 | arm64 |
+| OS | macOS 26.x | macOS 26.4.1 |
+| IP (initial) | n/a | 10.10.10.10 |
+| IP (final) | n/a | **192.168.110.201** |
+| Internet | Yes | None at first, available later (after joining the 192.168.110.x subnet) |
+| Homebrew | Yes | No (user is not an admin, cannot `sudo brew`) |
+| Xcode CLT | Yes | No (`install_name_tool` and `codesign` unavailable) |
+| Rust | Yes | Installed via rustup (1.95.0) |
+| Project directory | `/Users/accusys/momentry_core_0.1/` | `~/momentry_core_0.1/` (already cloned) |

-## 2. Model Selection
+## 2. Model Specifications

-| Variant | Parameters | Q5_K_M size | Estimated speed | Notes |
-|------|------|------------|---------|------|
-| **Gemma 4 31B-it** | 33B | ~20 GB | 15-25 tok/s | Multimodal, can process images |
-| Gemma 4 26B-A4B-it | 27B MoE | ~15 GB | 25-40 tok/s | MoE, faster |
-| Gemma 4 E4B-it | 8B | ~5 GB | 60+ tok/s | Fastest, lower quality |
+| Property | Value |
+|------|-----|
+| Model | **Gemma 4 31B-it** (Image-Text-to-Text) |
+| Parameters | 33B (30,697,345,596) |
+| Quantization | Q5_K_M |
+| GGUF size | **20.16 GB** (`21658399744 bytes`) |
+| Embedding dim | 5376 |
+| Vocabulary | 262144 |
+| Context | 4096 (trained at 262144) |
+| Source | `unsloth/gemma-4-31B-it-GGUF` |
+| HF downloads | 1,685,377 |
+| HF access | Gated (requires `huggingface-cli login`) |
+| License | Gemma (Apache 2.0 derived) |

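-**Recommendation**: Gemma 4 31B-it (Q5_K_M). 48 GB of memory is more than enough.
+## 3. Binary and Dependencies

-## 3. Deployment Steps
+### 3.1 Build Method

-### Step 1: Download the model on the local machine
+llama.cpp is built from source rather than installed through Homebrew. Reason: the Homebrew binary references its dylibs by **absolute path**, so it cannot be moved to the M5.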

 ```bash
-# Log in to HuggingFace (requires an access token)
-huggingface-cli login
-
-# Download the Gemma 4 31B GGUF
-huggingface-cli download bartowski/gemma-4-31B-it-GGUF \
-  gemma-4-31b-it-Q5_K_M.gguf \
-  --local-dir ~/llama.cpp/models/
+# Run on the M4
+cd /tmp
+git clone https://github.com/ggerganov/llama.cpp.git
+cd llama.cpp
+cmake -B build -DGGML_METAL=ON
+cmake --build build -j10 --target llama-server
 ```

-### Step 2: Prepare the llama.cpp binary
+### 3.2 Binary Dependencies
+
+The llama-server binary depends on the following dylibs (26 files in total):
+
+| Category | Files | Source |
+|------|------|------|
+| Core GGML | `libggml.0.dylib`, `libggml.dylib` | `build/bin/` |
+| Core GGML | `libggml-base.0.dylib`, `libggml-base.dylib` | `build/bin/` |
+| Metal GPU | `libggml-metal.0.dylib`, `libggml-metal.dylib` | `build/bin/` |
+| CPU | `libggml-cpu.0.dylib`, `libggml-cpu.dylib` | `build/bin/` |
+| BLAS | `libggml-blas.0.dylib`, `libggml-blas.dylib` | `build/bin/` |
+| llama | `libllama.0.dylib`, `libllama.dylib` | `build/bin/` |
+| llama-common | `libllama-common.0.dylib`, `libllama-common.dylib` | `build/bin/` |
+| MTMD | `libmtmd.0.dylib`, `libmtmd.dylib` | `build/bin/` |
+| OpenSSL | `libssl.3.dylib`, `libcrypto.3.dylib` | `/opt/homebrew/opt/openssl@3/lib/` |
+
+### 3.3 @rpath Fix
+
+The rpath embedded at build time points to `/tmp/llama.cpp/build/bin/` and must be changed to `@executable_path/../lib`.
+
+Run on the **M4** (where Xcode CLT is available):

 ```bash
-# llama.cpp is already installed locally at /opt/homebrew/bin/llama-server
-# Collect the dylibs it depends on
-mkdir -p /tmp/llama_bundle/bin /tmp/llama_bundle/lib
-cp /opt/homebrew/bin/llama-server /tmp/llama_bundle/bin/
-cp /opt/homebrew/lib/libggml*.dylib /tmp/llama_bundle/lib/
-cp /opt/homebrew/lib/libllama*.dylib /tmp/llama_bundle/lib/
+cp build/bin/llama-server /tmp/llama_final
+chmod +w /tmp/llama_final
+
+# Fix the absolute OpenSSL paths
+install_name_tool -change /opt/homebrew/opt/openssl@3/lib/libssl.3.dylib @rpath/libssl.3.dylib /tmp/llama_final
+install_name_tool -change /opt/homebrew/opt/openssl@3/lib/libcrypto.3.dylib @rpath/libcrypto.3.dylib /tmp/llama_final
+
+# Fix the absolute GGML paths (needed only for a Homebrew build, not a source build)
+install_name_tool -change /opt/homebrew/opt/ggml/lib/libggml.0.dylib @rpath/libggml.0.dylib /tmp/llama_final
+install_name_tool -change /opt/homebrew/opt/ggml/lib/libggml-base.0.dylib @rpath/libggml-base.0.dylib /tmp/llama_final
+
+# Fix the rpath
+install_name_tool -delete_rpath /tmp/llama.cpp/build/bin /tmp/llama_final
+install_name_tool -add_rpath @executable_path/../lib /tmp/llama_final
+
+# Re-sign (install_name_tool invalidates the code signature)
+codesign --force --sign - /tmp/llama_final
 ```
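
 One way to confirm the fixes took effect is to scan the `otool -L` output for leftover machine-specific paths. The sketch below is an illustration, not part of the recorded procedure: `check_portable` is a hypothetical helper, and the two prefixes it looks for (`/opt/homebrew` and `/tmp`) are assumptions based on the paths seen above.

 ```bash
 # Hypothetical helper: reads `otool -L` output on stdin, prints any dependency
 # line that still starts with a machine-specific prefix, and fails if one exists.
 check_portable() {
   # Dependency lines are indented; the unindented first line (the file name) is skipped.
   if grep -E '^[[:space:]]+(/opt/homebrew|/tmp)/'; then
     return 1
   fi
   return 0
 }

 # Usage on the M4, where otool is available:
 #   otool -L /tmp/llama_final | check_portable && echo "no absolute paths left"
 ```

 `otool -L` lists only linked libraries; rpath entries appear under `LC_RPATH` in `otool -l`.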

-### Step 3: Copy to the M5 Max with scp
+### 3.4 libssl.3.dylib Also Needs Fixing
+
+libssl.3.dylib itself references `/opt/homebrew/Cellar/openssl@3/3.6.1/lib/libcrypto.3.dylib`:

 ```bash
-# Transfer the binary
-scp -r /tmp/llama_bundle/* accusys@10.10.10.10:~/bin/
-
-# Transfer the model
-scp ~/llama.cpp/models/gemma-4-31b-it-Q5_K_M.gguf \
-  accusys@10.10.10.10:~/models/
-
-# Transfer the model (if the file is too large, send it in batches or use rsync)
-rsync -avz --progress ~/llama.cpp/models/ accusys@10.10.10.10:~/models/
+cp /opt/homebrew/opt/openssl@3/lib/libssl.3.dylib /tmp/libssl_fixed.dylib
+cp /opt/homebrew/opt/openssl@3/lib/libcrypto.3.dylib /tmp/libcrypto_fixed.dylib
+chmod +w /tmp/libssl_fixed.dylib /tmp/libcrypto_fixed.dylib
+install_name_tool -change /opt/homebrew/Cellar/openssl@3/3.6.1/lib/libcrypto.3.dylib @loader_path/libcrypto.3.dylib /tmp/libssl_fixed.dylib
+codesign --force --sign - /tmp/libssl_fixed.dylib /tmp/libcrypto_fixed.dylib
 ```

-### Step 4: Start on the M5 Max
+### 3.5 Transfer Everything to the M5

 ```bash
-ssh accusys@10.10.10.10
+# Model (20 GB)
+scp ~/llama.cpp/models/gemma-4-31B-it-Q5_K_M.gguf \
+  accusys@192.168.110.201:~/models/

-# Set the library path
-export DYLD_LIBRARY_PATH=~/bin/lib:$DYLD_LIBRARY_PATH
-
-# Start llama-server
-~/bin/bin/llama-server \
-  -m ~/models/gemma-4-31b-it-Q5_K_M.gguf \
-  --host 0.0.0.0 \
-  --port 8081 \
-  --n-gpu-layers 999 \
-  --ctx-size 8192 \
-  --threads 10 \
-  --parallel 2 \
-  --mlock
+# binary + all dylibs
+ssh accusys@192.168.110.201 'rm -rf ~/llama && mkdir -p ~/llama/bin ~/llama/lib'
+scp /tmp/llama_final accusys@192.168.110.201:~/llama/bin/llama-server
+scp /tmp/llama.cpp/build/bin/*.dylib accusys@192.168.110.201:~/llama/lib/
+scp /tmp/libssl_fixed.dylib accusys@192.168.110.201:~/llama/lib/libssl.3.dylib
+scp /tmp/libcrypto_fixed.dylib accusys@192.168.110.201:~/llama/lib/libcrypto.3.dylib
 ```

-## 4. Memory Allocation
+## 4. Startup and Verification

-```
-48 GB total
-  ├─ 20 GB  Gemma 4 31B Q5_K_M
-  ├─  4 GB  PostgreSQL
-  ├─  1 GB  Redis
-  ├─  1 GB  MongoDB + Qdrant
-  ├─  2 GB  swift_face / face_processor (burst)
-  ├─  3 GB  llama-server overhead
-  └─ 17 GB  remaining (OS + buffer)
-```
-
-## 5. Momentry Integration
-
-Update `.env` or the config:
+### 4.1 One-Time Manual Start

 ```bash
-MOMENTRY_LLM_ENDPOINT=http://10.10.10.10:8081/v1
-MOMENTRY_LLM_MODEL=gemma-4-31b-it
+ssh accusys@192.168.110.201
+export DYLD_LIBRARY_PATH=$HOME/llama/lib
+codesign --force --sign - ~/llama/bin/llama-server
+codesign --force --sign - ~/llama/lib/*.dylib
+nohup ~/llama/bin/llama-server \
+  -m ~/models/gemma-4-31B-it-Q5_K_M.gguf \
+  --host 0.0.0.0 --port 8081 \
+  --n-gpu-layers 999 --ctx-size 4096 \
+  --threads 10 --mlock \
+  --reasoning off \
+  > ~/llama.log 2>&1 &
 ```

-Agent endpoints switch to the LLM:
- `POST /api/v1/agents/translate` → llama.cpp server
- `POST /api/v1/agents/identity/suggest` → llama.cpp server
- `POST /api/v1/agents/5w1h/analyze` → llama.cpp server
- `POST /api/v1/agents/suggest/merge` → llama.cpp server
+### 4.2 Startup Script

-## 6. Testing and Verification
+`~/start_llm.sh` (already created):

 ```bash
-# Health check
-curl http://10.10.10.10:8081/health
+#!/bin/bash
+export DYLD_LIBRARY_PATH=$HOME/llama/lib
+pkill -9 -f llama-server 2>/dev/null
+sleep 1
+nohup $HOME/llama/bin/llama-server \
+  -m $HOME/models/gemma-4-31B-it-Q5_K_M.gguf \
+  --host 0.0.0.0 --port 8081 \
+  --n-gpu-layers 999 --ctx-size 4096 \
+  --threads 10 --mlock \
+  --reasoning off \
+  > $HOME/llama.log 2>&1 &
+echo "llama-server PID: $!"
+```
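
 Because the script backgrounds the server and returns immediately, a caller may want to block until `/health` answers. A minimal sketch, assuming a generic retry helper: `wait_until` is a hypothetical name, and the 60 tries and 2 second delay in the usage comment are arbitrary values, not measured ones.

 ```bash
 # wait_until TRIES DELAY CMD...: rerun CMD until it succeeds or TRIES runs out.
 wait_until() {
   tries=$1
   delay=$2
   shift 2
   i=0
   while [ "$i" -lt "$tries" ]; do
     # Stop as soon as the probe command succeeds.
     "$@" && return 0
     i=$((i + 1))
     sleep "$delay"
   done
   return 1
 }

 # Usage on the M5 after running ~/start_llm.sh (curl -f fails on HTTP errors):
 #   wait_until 60 2 curl -sf http://localhost:8081/health >/dev/null && echo "ready"
 ```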

-# Inference test
-curl http://10.10.10.10:8081/v1/chat/completions \
+### 4.3 Parameter Reference
+
+| Parameter | Value | Description |
+|------|-----|------|
+| `-m` | `~/models/gemma-4-31B-it-Q5_K_M.gguf` | Model path |
+| `--host` | `0.0.0.0` | Bind to all network interfaces |
+| `--port` | `8081` | HTTP API port |
+| `--n-gpu-layers` | `999` | Offload all layers to the GPU (Metal) |
+| `--ctx-size` | `4096` | Context length |
+| `--threads` | `10` | Number of M5 Max P-cores |
+| `--mlock` | n/a | Lock memory to prevent swapping |
+| `--reasoning` | `off` | Disable thinking; otherwise the output lands in `reasoning_content` instead of `content` |
+| `DYLD_LIBRARY_PATH` | `~/llama/lib` | dylib search path |
+
+### 4.4 Problems Encountered During Startup
+
+| # | Problem | Cause | Fix |
+|---|------|------|------|
+| 1 | `Library not loaded: libmtmd.0.dylib` | The multimodal (mtmd) dylib was not copied | Copy all 26 dylibs from the build |
+| 2 | `Library not loaded: /opt/homebrew/.../libssl.3.dylib` | The binary references OpenSSL by absolute path | `install_name_tool -change` to an `@rpath` path |
+| 3 | `Killed: 9` (exit 137) | Code signature was invalidated | `codesign --force --sign -` |
+| 4 | `Library not loaded: /opt/homebrew/Cellar/.../libcrypto.3.dylib` | libssl.3.dylib itself also contains an absolute path | Fix libssl with `install_name_tool` |
+| 5 | `no backends are loaded` | Metal GPU backend missing | Build from source with `-DGGML_METAL=ON` |
+| 6 | `couldn't bind HTTP server socket` | The previous process had not fully released the port | Run `pkill -9 -f llama-server` first |
+| 7 | **All output ends up in `reasoning_content`** | Gemma 4 is a thinking model by default | `--reasoning off` |
+
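 Row 3 pairs `Killed: 9` with exit 137 because a shell reports a process terminated by signal N as status 128 + N, and SIGKILL is signal 9. A small demonstration follows; the function name is made up for illustration.

 ```bash
 # Start a child shell that sends SIGKILL to itself, then print the status
 # the parent shell observed.
 status_after_sigkill() {
   sh -c 'kill -9 $$'
   echo "$?"
 }

 status_after_sigkill   # prints 137 (128 + 9)
 ```
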
+## 5. API Verification
+
+### 5.1 Health Check
+
+```bash
+curl -s http://192.168.110.201:8081/health
+# → {"status":"ok"}
+```
+
+### 5.2 Inference Test (after `--reasoning off`)
+
+```bash
+curl -s http://192.168.110.201:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
-    "model": "gemma-4-31b-it",
+    "model": "gemma-4-31B-it-Q5_K_M.gguf",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100
  }'
 ```

-## 7. Startup Script (on the M5 Max)
+Response (OpenAI-compatible):

-```bash
-#!/bin/bash
-# ~/start_llm.sh
-export DYLD_LIBRARY_PATH=~/bin/lib:$DYLD_LIBRARY_PATH
-exec ~/bin/bin/llama-server \
-  -m ~/models/gemma-4-31b-it-Q5_K_M.gguf \
-  --host 0.0.0.0 --port 8081 \
-  --n-gpu-layers 999 --ctx-size 8192 \
-  --threads 10 --parallel 2 --mlock \
-  >> ~/llama.log 2>&1
+```json
+{
+  "choices": [{
+    "finish_reason": "stop",
+    "message": {
+      "role": "assistant",
+      "content": "Hello! How can I help you today?",
+      "reasoning_content": ""
+    }
+  }],
+  "usage": {
+    "completion_tokens": 100,
+    "prompt_tokens": 18,
+    "total_tokens": 118
+  },
+  "model": "gemma-4-31B-it-Q5_K_M.gguf",
+  "object": "chat.completion"
+}
 ```
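
 To check from a script that the answer landed in `content` rather than `reasoning_content`, a crude pattern match on the response body can serve as a smoke test. This is a sketch under assumptions: `has_content` is a hypothetical helper, and the pattern only handles simple JSON like the response above, so a real client should use a JSON parser.

 ```bash
 # Succeeds only if the body has a "content" key with a non-empty string value.
 # The leading quote in the pattern keeps "reasoning_content" from matching.
 has_content() {
   grep -Eq '"content"[[:space:]]*:[[:space:]]*"[^"]'
 }

 # Usage: pipe the body returned by the chat completions call into the helper:
 #   curl -s ... | has_content && echo "content is populated"
 ```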

-## 8. Risks and Fallbacks
+### 5.3 Performance

-| Risk | Fallback |
+| Metric | Measured |
 |------|------|
-| GGUF download fails (HF gated) | Use ollama pull + ollama export to GGUF |
-| M5 Max is incompatible with Metal | Fall back to CPU only (`--n-gpu-layers 0`) |
-| 31B is too large and too slow | Switch to 26B-A4B (MoE, faster) |
-| scp transfer is interrupted | Resume with rsync --partial |
+| Prompt speed | 60.8 tok/s |
+| Generation speed | **25.8 tok/s** |
+| Prompt latency | 296 ms (18 tokens) |
+| Generation latency | 387 ms (10 tokens) |
+
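 The speed and latency rows describe the same measurement: tokens divided by elapsed seconds. A small sketch that reproduces the arithmetic (`tok_per_s` is a hypothetical helper, not part of llama.cpp):

 ```bash
 # tok_per_s TOKENS MILLISECONDS: print throughput in tokens per second.
 tok_per_s() {
   awk -v t="$1" -v ms="$2" 'BEGIN { printf "%.1f\n", t / (ms / 1000) }'
 }

 tok_per_s 18 296   # prompt: prints 60.8
 tok_per_s 10 387   # generation: prints 25.8
 ```
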
+## 6. OpenCode Integration
+
+Add a provider to `~/.config/opencode/config.json`:
+
+```json
+{
+  "m5-gemma4": {
+    "npm": "@ai-sdk/openai-compatible",
+    "name": "M5 Max Gemma 4",
+    "options": { "baseURL": "http://192.168.110.201:8081/v1" },
+    "models": {
+      "gemma-4-31B-it-Q5_K_M.gguf": { "name": "Gemma 4 31B" }
+    }
+  }
+}
+```
+
+The default model is set to `"m5-gemma4/gemma-4-31B-it-Q5_K_M.gguf"`. Confirm that it appears in the provider's model list:
+
+```bash
+opencode models m5-gemma4
+# → m5-gemma4/gemma-4-31B-it-Q5_K_M.gguf
+```
+
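+## 7. M5 Network Change Log
+
+| Time | IP | Network | Reason |
+|------|-----|------|------|
+| Initial | `10.10.10.10` | bridge (Thunderbolt) | No internet; traffic had to go through NAT on the M4 |
+| After switch | `192.168.110.201` | en0 (WiFi/Ethernet) | Moved onto the same subnet, with internet access |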
+
+## 8. Rust Installation (for Momentry dev)
+
+```bash
+curl --proto "=https" --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
+source $HOME/.cargo/env
+```
+
+- rustc 1.95.0
+- cargo 1.95.0
+- No sudo required
+
+## 9. Memory Usage
+
+```
+48 GB total
+  ├─ 20 GB  Gemma 4 31B Q5_K_M (process RSS ~28 GB)
+  ├─  4 GB  macOS + system
+  └─ 24 GB  remaining
+```
+
+Measured RSS after startup: `28,325,600 KB` (~28 GB).
+
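 For a unit-exact figure, the RSS value can be converted explicitly. The sketch assumes `ps` reports RSS in 1024-byte units, as the macOS man page states; `rss_kib_to_gib` is a hypothetical helper. In binary units the measured value comes to about 27 GiB.

 ```bash
 # rss_kib_to_gib KIB: convert an RSS value in KiB to GiB (1 GiB = 1048576 KiB).
 rss_kib_to_gib() {
   awk -v k="$1" 'BEGIN { printf "%.2f\n", k / 1048576 }'
 }

 rss_kib_to_gib 28325600   # prints 27.01
 ```
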
+## 10. Maintenance Commands
+
+| Operation | Command |
+|------|------|
+| Start | `ssh accusys@192.168.110.201 '~/start_llm.sh'` |
+| Stop | `ssh accusys@192.168.110.201 'pkill -9 -f llama-server'` |
+| View logs | `ssh accusys@192.168.110.201 'tail -50 ~/llama.log'` |
+| Health check | `curl http://192.168.110.201:8081/health` |
+| Model file | `~/models/gemma-4-31B-it-Q5_K_M.gguf` (20 GB) |
+| Binary and libs | `~/llama/bin/llama-server`, `~/llama/lib/*.dylib` |
+| Config | `~/.config/opencode/config.json` |
+| Monitor | `htop -p $(pgrep llama-server)` |
+| Memory | `ps -o rss= -p $(pgrep llama-server)` |
+
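+## 11. Known Limitations
+
+- **Thinking model**: Gemma 4 is a thinking model. With `--reasoning off`, `content` is populated normally, but some scenarios may need reasoning.
+- **No Homebrew**: The account is not an admin, so `brew install` is unavailable. Other Momentry services (PostgreSQL, Redis, MongoDB) must be installed manually from portable binaries.
+- **No Xcode CLT**: `install_name_tool` and `codesign` are not available on the M5. Binaries must be fixed on the M4 and then copied over with scp.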