docs: complete M5 Gemma4 deployment record V1.1 — full build, dylib fix, codesign, reasoning off, OpenCode config
---
document_type: "deployment_record"
service: "MOMENTRY_CORE"
title: "Gemma 4 31B on M5 Max: Deployment Record"
date: "2026-05-06"
version: "V1.1"
status: "active"
owner: "Warren"
created_by: "OpenCode"
---

# Gemma 4 31B on M5 Max: Deployment Record

## 1. Environment

| Item | M4 (development machine) | M5 Max (LLM server) |
|------|------------|-------------------|
| Machine | MacBook Pro M4 | MacBook Pro M5 Max |
| Memory | 16 GB | **48 GB** |
| Architecture | arm64 | arm64 |
| OS | macOS 26.x | macOS 26.4.1 |
| IP (initial) | n/a | 10.10.10.10 |
| IP (final) | n/a | **192.168.110.201** |
| Internet access | Yes | None at first; available after joining the same 192.168.110.x subnet |
| Homebrew | Yes | No (the account is not an admin, so Homebrew cannot be installed with sudo) |
| Xcode CLT | Yes | No (`install_name_tool` unavailable; `codesign` ships with macOS itself) |
| Rust | Yes | Installed via rustup (1.95.0) |
| Project directory | `/Users/accusys/momentry_core_0.1/` | `~/momentry_core_0.1/` (cloned) |

## 2. Model Specification

| Attribute | Value |
|------|-----|
| Model | **Gemma 4 31B-it** (Image-Text-to-Text) |
| Parameters | 30,697,345,596 (~30.7B) |
| Quantization | Q5_K_M |
| GGUF size | **20.17 GiB** (`21658399744` bytes) |
| Embedding dim | 5376 |
| Vocabulary | 262144 |
| Context | 4096 (training context: 262144) |
| Source | `unsloth/gemma-4-31B-it-GGUF` |
| HF downloads | 1,685,377 |
| HF access | Gated (requires `huggingface-cli login`) |
| License | Gemma (Apache 2.0 derived) |

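Section 3.5 copies the GGUF from `~/llama.cpp/models/` on the M4. Below is a minimal sketch of how the file could be fetched there and its size checked. It assumes the file in `unsloth/gemma-4-31B-it-GGUF` is named `gemma-4-31B-it-Q5_K_M.gguf` (the name used throughout this record) and that `huggingface-cli login` has already been run. `fetch_model` and `bytes_to_gib` are hypothetical helper names. The conversion uses binary units (1 GiB = 2^30 bytes), which is why 21,658,399,744 bytes comes out at about 20.17 GiB.

```bash
# Hypothetical helper: download the Q5_K_M GGUF on the M4.
# Needs internet access and a prior `huggingface-cli login`.
# Assumption: the file sits at the repo root under this name.
fetch_model() {
  huggingface-cli download unsloth/gemma-4-31B-it-GGUF \
    gemma-4-31B-it-Q5_K_M.gguf \
    --local-dir "$HOME/llama.cpp/models/"
}

# Convert a byte count to GiB (2^30 bytes), rounded to two decimals.
bytes_to_gib() {
  awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1073741824 }'
}

# The size recorded in the table above.
bytes_to_gib 21658399744   # prints 20.17
```
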
## 3. Binary and Dependencies

### 3.1 Build Method

llama.cpp is built from source instead of being installed with Homebrew. The Homebrew binary references its dylibs by **absolute path** (under `/opt/homebrew/`), so it cannot be moved to the M5.

```bash
# Run on the M4
cd /tmp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
# Metal backend for Apple Silicon GPUs
cmake -B build -DGGML_METAL=ON
cmake --build build -j10 --target llama-server
```

### 3.2 Binary Dependencies

The `llama-server` binary loads the dylibs listed below. `build/bin/` contains 26 dylib files in total, and section 3.5 copies all of them to the M5. OpenSSL comes from Homebrew on the M4.

| Category | Files | Source |
|------|------|------|
| GGML core | `libggml.0.dylib`, `libggml.dylib` | `build/bin/` |
| GGML base | `libggml-base.0.dylib`, `libggml-base.dylib` | `build/bin/` |
| Metal GPU | `libggml-metal.0.dylib`, `libggml-metal.dylib` | `build/bin/` |
| CPU | `libggml-cpu.0.dylib`, `libggml-cpu.dylib` | `build/bin/` |
| BLAS | `libggml-blas.0.dylib`, `libggml-blas.dylib` | `build/bin/` |
| llama | `libllama.0.dylib`, `libllama.dylib` | `build/bin/` |
| llama common | `libllama-common.0.dylib`, `libllama-common.dylib` | `build/bin/` |
| MTMD (multimodal) | `libmtmd.0.dylib`, `libmtmd.dylib` | `build/bin/` |
| OpenSSL | `libssl.3.dylib`, `libcrypto.3.dylib` | `/opt/homebrew/opt/openssl@3/lib/` |

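Before anything is copied, it helps to check which of these references can be relocated. The sketch below is a hypothetical `classify_deps` helper that reads `otool -L` output. It has to run on the M4, because `otool` comes with the Xcode CLT. The helper labels each install name as follows:

- `@rpath`, `@loader_path` and `@executable_path` names are relative.
- `/usr/lib` and `/System` paths exist on every Mac.
- Any other absolute path, such as the Homebrew OpenSSL paths, has to be rewritten.

```bash
# Classify the install names printed by `otool -L <binary>` (read from stdin).
#   relative -> resolved via LC_RPATH / loader / executable, relocatable
#   system   -> /usr/lib or /System, present on every Mac
#   absolute -> any other absolute path; must be rewritten before copying to the M5
classify_deps() {
  # Skip the "<file>:" header line; dependency lines are indented.
  grep -E '^[[:space:]]' | awk '{ print $1 }' | while read -r name; do
    case "$name" in
      @rpath/*|@loader_path/*|@executable_path/*) printf '%-8s %s\n' relative "$name" ;;
      /usr/lib/*|/System/*)                       printf '%-8s %s\n' system "$name" ;;
      *)                                          printf '%-8s %s\n' absolute "$name" ;;
    esac
  done
}
```

For example, `otool -L /tmp/llama_final | classify_deps | grep '^absolute'` should print nothing once the fixes in section 3.3 have been applied.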
### 3.3 Fixing @rpath

The rpath embedded at build time points to `/tmp/llama.cpp/build/bin/`. It must be changed to `@executable_path/../lib`, which is the `lib/` directory next to the binary's `bin/` directory.

Run on the **M4** (where the Xcode CLT is available):

```bash
# Run from /tmp/llama.cpp on the M4
cp build/bin/llama-server /tmp/llama_final
chmod +w /tmp/llama_final

# Fix the absolute OpenSSL paths
install_name_tool -change /opt/homebrew/opt/openssl@3/lib/libssl.3.dylib @rpath/libssl.3.dylib /tmp/llama_final
install_name_tool -change /opt/homebrew/opt/openssl@3/lib/libcrypto.3.dylib @rpath/libcrypto.3.dylib /tmp/llama_final

# Fix the absolute GGML paths (only needed for a Homebrew build; no effect on a source build)
install_name_tool -change /opt/homebrew/opt/ggml/lib/libggml.0.dylib @rpath/libggml.0.dylib /tmp/llama_final
install_name_tool -change /opt/homebrew/opt/ggml/lib/libggml-base.0.dylib @rpath/libggml-base.0.dylib /tmp/llama_final

# Replace the build-tree rpath
install_name_tool -delete_rpath /tmp/llama.cpp/build/bin /tmp/llama_final
install_name_tool -add_rpath @executable_path/../lib /tmp/llama_final

# Re-sign (install_name_tool invalidates the code signature)
codesign --force --sign - /tmp/llama_final
```

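The `-change` commands above were written by hand. A hypothetical `rewrite_cmds` helper could derive them from `otool -L` output instead. It prints one `install_name_tool -change <old> @rpath/<basename> <binary>` line for each `/opt/homebrew` reference, so the list can be reviewed before anything is run.

This is a dry-run sketch. It does not handle the rpath change or the re-signing. For `libssl.3.dylib` itself, section 3.4 uses `@loader_path` rather than `@rpath`.

```bash
# Dry run: print (do not execute) an install_name_tool -change command for every
# dependency of BINARY that points into /opt/homebrew, rewritten to @rpath/<basename>.
# Input: `otool -L BINARY` output on stdin.
rewrite_cmds() {
  local binary="$1" name
  grep -E '^[[:space:]]+/opt/homebrew/' | awk '{ print $1 }' | while read -r name; do
    printf 'install_name_tool -change %s @rpath/%s %s\n' "$name" "${name##*/}" "$binary"
  done
}
```

Typical use is `otool -L /tmp/llama_final | rewrite_cmds /tmp/llama_final`. After reviewing the output, pipe it to `sh`, then run the `-delete_rpath`/`-add_rpath` steps and `codesign --force --sign -` as shown above.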
### 3.4 libssl.3.dylib Also Needs Fixing

`libssl.3.dylib` itself references `/opt/homebrew/Cellar/openssl@3/3.6.1/lib/libcrypto.3.dylib`. That reference is rewritten to `@loader_path/libcrypto.3.dylib`, which points to the directory containing libssl:

```bash
# Work on copies of the Homebrew OpenSSL dylibs (on the M4)
cp /opt/homebrew/opt/openssl@3/lib/libssl.3.dylib /tmp/libssl_fixed.dylib
cp /opt/homebrew/opt/openssl@3/lib/libcrypto.3.dylib /tmp/libcrypto_fixed.dylib
chmod +w /tmp/libssl_fixed.dylib /tmp/libcrypto_fixed.dylib
# Point libssl at the libcrypto that sits next to it
install_name_tool -change /opt/homebrew/Cellar/openssl@3/3.6.1/lib/libcrypto.3.dylib @loader_path/libcrypto.3.dylib /tmp/libssl_fixed.dylib
# Re-sign (required for libssl after install_name_tool; harmless for libcrypto)
codesign --force --sign - /tmp/libssl_fixed.dylib /tmp/libcrypto_fixed.dylib
```

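The sketch below is a simplified model of how dyld expands these tokens. It shows why `@executable_path/../lib` combined with `@loader_path` is enough:

- `@executable_path` is the directory of the main binary.
- `@loader_path` is the directory of the image that contains the reference.
- `@rpath` is tried against each LC_RPATH entry in turn.

`resolve_install_name` is a hypothetical helper that only imitates this lookup. It ignores `DYLD_*` variables, fallback paths and the dyld shared cache, so it is an illustration rather than a replacement for dyld.

```bash
# Hypothetical helper: mimic dyld's expansion of @executable_path, @loader_path and @rpath.
# Args: install_name exe_dir loader_dir [rpath ...]
# Prints the first candidate path that exists; returns 1 if none does.
resolve_install_name() {
  local name="$1" exe_dir="$2" loader_dir="$3" rp cand
  shift 3
  local candidates=()
  case "$name" in
    @executable_path/*) candidates=("$exe_dir/${name#@executable_path/}") ;;
    @loader_path/*)     candidates=("$loader_dir/${name#@loader_path/}") ;;
    @rpath/*)
      for rp in "$@"; do
        # An LC_RPATH entry may itself start with @executable_path or @loader_path.
        case "$rp" in
          @executable_path/*) rp="$exe_dir/${rp#@executable_path/}" ;;
          @loader_path/*)     rp="$loader_dir/${rp#@loader_path/}" ;;
        esac
        candidates+=("$rp/${name#@rpath/}")
      done
      ;;
    *) candidates=("$name") ;;
  esac
  for cand in "${candidates[@]}"; do
    if [ -e "$cand" ]; then
      echo "$cand"
      return 0
    fi
  done
  return 1
}
```

With the M5 layout (`~/llama/bin/llama-server`, dylibs in `~/llama/lib/`), `@rpath/libssl.3.dylib` resolves through the `@executable_path/../lib` rpath. libssl's own `@loader_path/libcrypto.3.dylib` resolves to the file next to libssl. The old `/tmp/llama.cpp/build/bin` rpath would find nothing on the M5.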
### 3.5 Copy Everything to the M5

```bash
# Model (~20 GB)
scp ~/llama.cpp/models/gemma-4-31B-it-Q5_K_M.gguf \
  accusys@192.168.110.201:~/models/

# Binary plus all dylibs
ssh accusys@192.168.110.201 'rm -rf ~/llama && mkdir -p ~/llama/bin ~/llama/lib'
scp /tmp/llama_final accusys@192.168.110.201:~/llama/bin/llama-server
scp /tmp/llama.cpp/build/bin/*.dylib accusys@192.168.110.201:~/llama/lib/
scp /tmp/libssl_fixed.dylib accusys@192.168.110.201:~/llama/lib/libssl.3.dylib
scp /tmp/libcrypto_fixed.dylib accusys@192.168.110.201:~/llama/lib/libcrypto.3.dylib
```

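An interrupted `scp` can leave a truncated file, so it is worth confirming that the 20 GB model arrived intact before starting the server. The sketch below runs on the M5 after copying. `check_size` is a hypothetical helper, and the expected byte count is the one recorded in section 2. Comparing `shasum -a 256` output on both machines is a stronger but slower check.

```bash
# Hypothetical helper: compare a file's byte count with the expected size.
# Returns 0 on a match; otherwise reports the mismatch on stderr and returns 1.
check_size() {
  local file="$1" expected="$2" actual
  # BSD wc pads its output with spaces, so strip all whitespace.
  actual=$(wc -c < "$file" | tr -d '[:space:]')
  if [ "$actual" = "$expected" ]; then
    return 0
  fi
  echo "size mismatch: $file is $actual bytes, expected $expected" >&2
  return 1
}
```

Example on the M5: `check_size ~/models/gemma-4-31B-it-Q5_K_M.gguf 21658399744 && echo ok`.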
## 4. Startup and Verification

### 4.1 One-off Manual Start

```bash
ssh accusys@192.168.110.201
# The remaining commands run in the SSH session on the M5
export DYLD_LIBRARY_PATH=$HOME/llama/lib
codesign --force --sign - ~/llama/bin/llama-server
codesign --force --sign - ~/llama/lib/*.dylib
nohup ~/llama/bin/llama-server \
  -m ~/models/gemma-4-31B-it-Q5_K_M.gguf \
  --host 0.0.0.0 --port 8081 \
  --n-gpu-layers 999 --ctx-size 4096 \
  --threads 10 --mlock \
  --reasoning off \
  > ~/llama.log 2>&1 &
```

### 4.2 Startup Script

`~/start_llm.sh` (already created on the M5):

```bash
#!/bin/bash
export DYLD_LIBRARY_PATH=$HOME/llama/lib
# Kill any previous instance so port 8081 is free (see issue 6 in section 4.4)
pkill -9 -f llama-server 2>/dev/null
sleep 1
nohup $HOME/llama/bin/llama-server \
  -m $HOME/models/gemma-4-31B-it-Q5_K_M.gguf \
  --host 0.0.0.0 --port 8081 \
  --n-gpu-layers 999 --ctx-size 4096 \
  --threads 10 --mlock \
  --reasoning off \
  > $HOME/llama.log 2>&1 &
echo "llama-server PID: $!"
```

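`start_llm.sh` returns as soon as the process is backgrounded, but loading the 20 GB model takes longer than that. The sketch below is a readiness check. It assumes `curl` is available (it ships with macOS) and that `/health` returns `{"status":"ok"}` once the server is ready, as in section 5.1. `wait_for_health` is a hypothetical helper name.

```bash
# Hypothetical helper: poll llama-server's /health endpoint until it reports ok.
# Args: base_url [max_tries (default 60)] [delay_seconds (default 2)]
wait_for_health() {
  local url="$1" tries="${2:-60}" delay="${3:-2}" i
  for ((i = 1; i <= tries; i++)); do
    # -f makes curl fail on HTTP errors, so a not-yet-ready server just triggers a retry.
    if curl -fs "$url/health" | grep -q '"status":"ok"'; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "not ready after $tries attempts" >&2
  return 1
}
```

The helper could be appended to `start_llm.sh` as `wait_for_health http://127.0.0.1:8081`. It can also be run from the M4 against `http://192.168.110.201:8081`.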
### 4.3 Parameters

| Parameter | Value | Description |
|------|-----|------|
| `-m` | `~/models/gemma-4-31B-it-Q5_K_M.gguf` | Model path |
| `--host` | `0.0.0.0` | Bind to all network interfaces |
| `--port` | `8081` | HTTP API port |
| `--n-gpu-layers` | `999` | Offload all layers to the GPU (Metal) |
| `--ctx-size` | `4096` | Context length in tokens |
| `--threads` | `10` | Number of M5 Max performance cores |
| `--mlock` | (flag) | Lock memory so the model is not swapped out |
| `--reasoning` | `off` | Disable thinking; otherwise the reply lands in `reasoning_content` instead of `content` |
| `DYLD_LIBRARY_PATH` (env var) | `~/llama/lib` | dylib search path |

### 4.4 Issues Encountered During Startup

| # | Problem | Cause | Fix |
|---|------|------|------|
| 1 | `Library not loaded: libmtmd.0.dylib` | Not all dylibs from `build/bin/` had been copied (`libmtmd` is the multimodal library) | Copy all 26 dylibs from the build |
| 2 | `Library not loaded: /opt/homebrew/.../libssl.3.dylib` | The binary references OpenSSL by absolute path | `install_name_tool -change` to `@rpath/libssl.3.dylib` (section 3.3) |
| 3 | `Killed: 9` (exit 137) | Code signature invalidated by `install_name_tool` | `codesign --force --sign -` |
| 4 | `Library not loaded: /opt/homebrew/Cellar/.../libcrypto.3.dylib` | `libssl.3.dylib` itself also contains an absolute path | Fix libssl with `install_name_tool` (section 3.4) |
| 5 | `no backends are loaded` | Metal GPU backend missing | Build from source with `-DGGML_METAL=ON` |
| 6 | `couldn't bind HTTP server socket` | The previous process had not yet released the port | Run `pkill -9 -f llama-server` first |
| 7 | **All output ends up in `reasoning_content`** | Gemma 4 runs as a thinking model by default | `--reasoning off` |

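Issue 6 was worked around with `pkill -9`. A gentler variant is sketched below as a hypothetical `stop_llama` helper. It sends SIGTERM first so that llama-server can shut down and release port 8081, and it falls back to SIGKILL only after a timeout.

The helper uses `pgrep`/`pkill -x`, which match the exact process name, instead of `-f`. This avoids matching a shell whose command line merely mentions `llama-server`. It has not been tested on the M5 as a fix for the bind error.

```bash
# Hypothetical helper: stop llama-server gracefully so port 8081 is released.
# Arg: grace period in seconds before falling back to SIGKILL (default 10).
stop_llama() {
  local timeout="${1:-10}" i
  pkill -TERM -x llama-server 2>/dev/null
  for ((i = 0; i < timeout; i++)); do
    # Done as soon as no process named llama-server remains.
    pgrep -x llama-server >/dev/null || return 0
    sleep 1
  done
  # Still running after the grace period: force it, as start_llm.sh does.
  pkill -KILL -x llama-server 2>/dev/null
  sleep 1
  ! pgrep -x llama-server >/dev/null
}
```

In `start_llm.sh`, a call to `stop_llama` could replace the `pkill -9 ... ; sleep 1` pair.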
## 5. API Verification

### 5.1 Health Check

```bash
curl -s http://192.168.110.201:8081/health
# → {"status":"ok"}
```

### 5.2 Inference Test (after `--reasoning off`)

```bash
curl -s http://192.168.110.201:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-31B-it-Q5_K_M.gguf",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100
  }'
```

Response (OpenAI-compatible):

```json
{
  "choices": [{
    "finish_reason": "stop",
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?",
      "reasoning_content": ""
    }
  }],
  "usage": {
    "completion_tokens": 100,
    "prompt_tokens": 18,
    "total_tokens": 118
  },
  "model": "gemma-4-31B-it-Q5_K_M.gguf",
  "object": "chat.completion"
}
```

### 5.3 Performance

| Metric | Measured |
|------|------|
| Prompt processing speed | 60.8 tok/s |
| Generation speed | **25.8 tok/s** |
| Prompt latency | 296 ms (18 tokens) |
| Generation latency | 387 ms (10 tokens) |

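The speeds in the table follow directly from the token counts and latencies (tokens ÷ seconds). `tok_per_s` is a hypothetical helper that recomputes them from the raw numbers. It is handy when comparing runs with a different `--ctx-size` or quantization.

```bash
# Throughput from a token count and elapsed milliseconds: tokens / (ms / 1000).
tok_per_s() {
  awk -v n="$1" -v ms="$2" 'BEGIN { printf "%.1f\n", n * 1000 / ms }'
}

tok_per_s 18 296   # prompt processing: prints 60.8
tok_per_s 10 387   # generation:        prints 25.8
```
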
## 6. Integration with OpenCode

Add a provider under the top-level `provider` key in `~/.config/opencode/config.json`:

```json
{
  "provider": {
    "m5-gemma4": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "M5 Max Gemma 4",
      "options": { "baseURL": "http://192.168.110.201:8081/v1" },
      "models": {
        "gemma-4-31B-it-Q5_K_M.gguf": { "name": "Gemma 4 31B" }
      }
    }
  },
  "model": "m5-gemma4/gemma-4-31B-it-Q5_K_M.gguf"
}
```

The top-level `model` key sets the default model to `"m5-gemma4/gemma-4-31B-it-Q5_K_M.gguf"`. To confirm the provider's model list:

```bash
opencode models m5-gemma4
# → m5-gemma4/gemma-4-31B-it-Q5_K_M.gguf
```

## 7. M5 Network Change Log

| Time | IP | Network | Reason |
|------|-----|------|------|
| Initial | `10.10.10.10` | bridge (Thunderbolt) | No internet access; outbound traffic would have to be NATed through the M4 |
| After switch | `192.168.110.201` | en0 (Wi-Fi/Ethernet) | Moved onto the same subnet, with internet access |

## 8. Rust Installation (for Momentry development)

```bash
curl --proto "=https" --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source $HOME/.cargo/env
```

- rustc 1.95.0
- cargo 1.95.0
- No sudo required

## 9. Memory Usage

```
48 GB total
├─ ~27 GB  llama-server process, including the 20 GB Gemma 4 31B Q5_K_M weights (measured RSS)
├─   4 GB  macOS + system
└─ ~17 GB  remaining
```

Measured RSS after startup: `28,325,600 KB` (KiB as reported by `ps`), about 27.0 GiB.

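On macOS, `ps -o rss=` reports kibibytes. The remaining-memory figure in the diagram is therefore total minus the measured RSS minus the OS share. The sketch below shows that arithmetic and treats the 48 GB of unified memory as 48 GiB. `kib_to_gib` and `headroom_gib` are hypothetical helpers, and the 4 GiB OS figure is the estimate from the diagram above, not a measurement.

```bash
# Convert a `ps -o rss=` value (KiB) to GiB.
kib_to_gib() {
  awk -v k="$1" 'BEGIN { printf "%.2f\n", k / 1048576 }'
}

# Remaining memory in GiB: total - llama-server RSS - OS reservation (estimate).
headroom_gib() {
  awk -v t="$1" -v k="$2" -v os="$3" 'BEGIN { printf "%.2f\n", t - k / 1048576 - os }'
}

kib_to_gib 28325600          # prints 27.01
headroom_gib 48 28325600 4   # prints 16.99
```

On the M5 itself, the live value is `kib_to_gib "$(ps -o rss= -p "$(pgrep -x llama-server)")"`.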
## 10. Maintenance Commands

| Task | Command |
|------|------|
| Start | `ssh accusys@192.168.110.201 '~/start_llm.sh'` |
| Stop | `ssh accusys@192.168.110.201 'pkill -9 -f llama-server'` |
| View log | `ssh accusys@192.168.110.201 'tail -50 ~/llama.log'` |
| Health check | `curl http://192.168.110.201:8081/health` |
| Model file | `~/models/gemma-4-31B-it-Q5_K_M.gguf` (~20 GB) |
| Binary and libs | `~/llama/bin/llama-server`, `~/llama/lib/*.dylib` |
| Config | `~/.config/opencode/config.json` |
| Monitor | `top -pid $(pgrep -x llama-server)` (htop is not installed on the M5) |
| Memory | `ps -o rss= -p $(pgrep -x llama-server)` (RSS in KiB) |

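## 11. Known Limitations

- **Thinking model**: Gemma 4 is a thinking model. With `--reasoning off`, `content` is populated normally, but some scenarios may need reasoning enabled.
- **No Homebrew**: the account is not an admin, so `brew install` is unavailable. Momentry's other services (PostgreSQL, Redis, MongoDB) must be installed manually from portable binaries.
- **No Xcode CLT**: `install_name_tool` is unavailable on the M5, so binary fixes must be done on the M4 and then copied over with scp. `codesign` is part of macOS itself and is run on the M5 in section 4.1.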