Initial commit: docs_v1.0 structure

- API_V1.0.0: official API documents (spec, release, deploy, test)
- M4_workspace: M4 work log (reviews, issues, proposals)
- M5_workspace: M5 work log (implementation, evaluation, sync)
- AGENTS.md: project rules

M5/M4 collaboration: workspace documents are synced via git push/pull
This commit is contained in:
M5
2026-05-07 23:42:19 +08:00
commit 28e927f7bb
519 changed files with 136077 additions and 0 deletions

@@ -0,0 +1,83 @@
# Embedding Cross-Machine Deployment Plan v1.0.0
## Division of Responsibilities
```
M5 (Pipeline + primary embedding)        M4 (Portal + fallback embedding)
├── Batch vectorization (1709 chunks)    ├── Portal search query embedding
├── EmbeddingGemma primary server        ├── Fallback embed server
├── Model online (port 11436)            └── Calls the M5 API by default
└── Can run offline for off-site demos
```
## Deployment Architecture
```
Portal Search Query
┌─────────────┐   success   ┌──────────────────┐
│ M4 Portal   │ ──────────→ │ M5:11436         │
│ embed       │             │ EmbeddingGemma   │
│ client      │             │ (primary)        │
│             │   failure   └──────────────────┘
│ retry       │ ──────────→ ┌──────────────────┐
│ fallback    │             │ M4:11436         │
└─────────────┘             │ EmbeddingGemma   │
                            │ (fallback)       │
                            └──────────────────┘
```
## M4 Installation Steps
```bash
# 1. Install the Python dependencies
pip install torch transformers flask
# 2. Log in to HuggingFace (the model license must be accepted on the model page first)
open https://huggingface.co/google/embeddinggemma-300m
huggingface-cli login --token YOUR_TOKEN
# 3. Fetch the server script (create ./scripts first so rsync has a destination)
mkdir -p scripts
rsync -av accusys@192.168.110.201:/Users/accusys/momentry_core_0.1/scripts/embeddinggemma_server.py \
  ./scripts/embeddinggemma_server.py
# 4. Start the fallback server
python3 scripts/embeddinggemma_server.py --port 11436
```
## Portal Embed Client
```javascript
async function embedQuery(text) {
  const servers = [
    'http://192.168.110.201:11436/v1/embeddings', // M5 primary
    'http://localhost:11436/v1/embeddings',       // M4 fallback
  ];
  for (const url of servers) {
    try {
      const res = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ input: text }),
      });
      // Treat non-2xx responses as failures so the next server is tried.
      if (!res.ok) throw new Error(`HTTP ${res.status} from ${url}`);
      const data = await res.json();
      return data.data[0].embedding;
    } catch (e) {
      continue; // try the next server
    }
  }
  throw new Error('Embedding servers unreachable');
}
```
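The client above has no request timeout, so a powered-off M5 could leave a search waiting on the network stack before the fallback is tried. The following is a minimal sketch of one way to bound that wait. It is not the Portal's actual code: `embedWithFallback`, `timeoutMs`, and `fetchImpl` are illustrative names, the 3-second default is an arbitrary assumption, and `AbortSignal.timeout` requires a reasonably recent browser or Node.js runtime.

```javascript
// Sketch only: names and the default timeout are assumptions, not project code.
async function embedWithFallback(
  text,
  servers,
  { timeoutMs = 3000, fetchImpl = fetch } = {},
) {
  const errors = [];
  for (const url of servers) {
    try {
      const res = await fetchImpl(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ input: text }),
        // Abort the request if the host does not answer in time.
        signal: AbortSignal.timeout(timeoutMs),
      });
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      const data = await res.json();
      // Also report which server answered, which helps when debugging.
      return { embedding: data.data[0].embedding, server: url };
    } catch (e) {
      errors.push(`${url}: ${e.message}`);
    }
  }
  throw new Error(`Embedding servers unreachable (${errors.join('; ')})`);
}
```

Passing `fetchImpl` explicitly makes the fallback path testable without a live server; in the Portal it would simply default to the global `fetch`.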
## Model Consistency
| Item | M5 | M4 |
|------|-----|-----|
| Model | EmbeddingGemma 300M | EmbeddingGemma 300M |
| Dimensions | 768D | 768D |
| Server | Python MPS (port 11436) | Python CPU/MPS (port 11436) |
| Qdrant | 192.168.110.201:6333 | 192.168.110.201:6333 |

Both machines use the same model and the same dimensionality, so query embeddings remain comparable with the indexed embeddings.
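One way to confirm this in practice is to embed the same text on both servers and compare the two vectors. The sketch below assumes the 768-dimension figure from the table; `checkConsistency` and the `0.999` similarity threshold are hypothetical choices (small numeric differences between the MPS and CPU paths are plausible), not values taken from the project.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: compares embeddings of the same text from M5 and M4.
function checkConsistency(fromM5, fromM4, { dim = 768, minSimilarity = 0.999 } = {}) {
  if (fromM5.length !== dim || fromM4.length !== dim) {
    return { ok: false, reason: `expected ${dim} dimensions` };
  }
  const similarity = cosineSimilarity(fromM5, fromM4);
  return { ok: similarity >= minSimilarity, similarity };
}
```

If the check fails, queries answered by the fallback server would be scored against an index they are not comparable with, so it is worth running once after each server is set up.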

@@ -0,0 +1,316 @@
---
document_type: "deployment_record"
service: "MOMENTRY_CORE"
title: "Gemma 4 31B: M5 Max Deployment Record"
date: "2026-05-06"
version: "V1.1"
status: "active"
owner: "Warren"
created_by: "OpenCode"
---
# Gemma 4 31B: M5 Max Deployment Record
## 1. Environment
| Item | M4 (dev machine) | M5 Max (LLM server) |
|------|------------|-------------------|
| Model | MacBook Pro M4 | MacBook Pro M5 Max |
| Memory | 16 GB | **48 GB** |
| Architecture | arm64 | arm64 |
| OS | macOS 26.x | macOS 26.4.1 |
| IP (initial) | - | 10.10.10.10 |
| IP (final) | - | **192.168.110.201** |
| Internet access | Yes | None at first, available later (after joining the 192.168.110.x subnet) |
| Homebrew | Yes | No (the user is not an admin and cannot run `sudo` for brew) |
| Xcode CLT | Yes | No (`install_name_tool` and `codesign` unavailable) |
| Rust | Yes | Installed via rustup (1.95.0) |
| Project directory | `/Users/accusys/momentry_core_0.1/` | `~/momentry_core_0.1/` (cloned) |
## 2. Model Specifications
| Property | Value |
|------|-----|
| Model | **Gemma 4 31B-it** (Image-Text-to-Text) |
| Parameters | 30.7B (30,697,345,596) |
| Quantization | Q5_K_M |
| GGUF size | **20.17 GiB** (`21658399744 bytes`, about 21.66 GB) |
| Embedding dim | 5376 |
| Vocabulary | 262144 |
| Context | 4096 (trained with 262144) |
| Source | `unsloth/gemma-4-31B-it-GGUF` |
| HF downloads | 1,685,377 |
| HF access | Gated (`huggingface-cli login`) |
| License | Gemma (Apache 2.0 derived) |
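The size and parameter figures in the table can be cross-checked with a little arithmetic. The sketch below uses only the byte count and parameter count recorded above; the helper names are illustrative. The bits-per-weight figure is an approximation, since the GGUF file also holds metadata and may keep some tensors at a different precision.

```javascript
const GGUF_BYTES = 21658399744; // file size from the table
const PARAMS = 30697345596;     // parameter count from the table

// Binary gibibytes (what macOS memory figures usually mean).
function toGiB(bytes) {
  return bytes / 1024 ** 3;
}

// Decimal gigabytes (what most download pages report).
function toGB(bytes) {
  return bytes / 1e9;
}

// Average storage cost per parameter, in bits.
function bitsPerWeight(bytes, params) {
  return (bytes * 8) / params;
}

console.log(toGiB(GGUF_BYTES).toFixed(2));                 // 20.17
console.log(toGB(GGUF_BYTES).toFixed(2));                  // 21.66
console.log(bitsPerWeight(GGUF_BYTES, PARAMS).toFixed(2)); // 5.64
```

Roughly 5.6 bits per weight is in the range one would expect from a 5-bit K-quant, which suggests the byte count and the parameter count describe the same file.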
## 3. Binary and Dependencies
### 3.1 Build Method
llama.cpp is built from source rather than installed through Homebrew. Reason: the Homebrew binary references its dylibs by **absolute path**, so it cannot be relocated to M5.
```bash
# Run on M4
cd /tmp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build -DGGML_METAL=ON
cmake --build build -j10 --target llama-server
```
### 3.2 Binary Dependencies
The llama-server binary depends on the following dylibs (26 files in total):
| Category | Files | Source |
|------|------|------|
| GGML core | `libggml.0.dylib`, `libggml.dylib` | `build/bin/` |
| GGML core | `libggml-base.0.dylib`, `libggml-base.dylib` | `build/bin/` |
| Metal GPU | `libggml-metal.0.dylib`, `libggml-metal.dylib` | `build/bin/` |
| CPU | `libggml-cpu.0.dylib`, `libggml-cpu.dylib` | `build/bin/` |
| BLAS | `libggml-blas.0.dylib`, `libggml-blas.dylib` | `build/bin/` |
| LLama | `libllama.0.dylib`, `libllama.dylib` | `build/bin/` |
| LLamaCommon | `libllama-common.0.dylib`, `libllama-common.dylib` | `build/bin/` |
| MTMD | `libmtmd.0.dylib`, `libmtmd.dylib` | `build/bin/` |
| OpenSSL | `libssl.3.dylib`, `libcrypto.3.dylib` | `/opt/homebrew/opt/openssl@3/lib/` |
### 3.3 Fixing @rpath
The @rpath embedded at build time points to `/tmp/llama.cpp/build/bin/` and must be changed to `@executable_path/../lib`.
Run on **M4** (where the Xcode CLT is available):
```bash
cp build/bin/llama-server /tmp/llama_final
chmod +w /tmp/llama_final
# Fix the absolute OpenSSL paths
install_name_tool -change /opt/homebrew/opt/openssl@3/lib/libssl.3.dylib @rpath/libssl.3.dylib /tmp/llama_final
install_name_tool -change /opt/homebrew/opt/openssl@3/lib/libcrypto.3.dylib @rpath/libcrypto.3.dylib /tmp/llama_final
# Fix the absolute GGML paths (only needed for a Homebrew build, not for a source build)
install_name_tool -change /opt/homebrew/opt/ggml/lib/libggml.0.dylib @rpath/libggml.0.dylib /tmp/llama_final
install_name_tool -change /opt/homebrew/opt/ggml/lib/libggml-base.0.dylib @rpath/libggml-base.0.dylib /tmp/llama_final
# Correct the @rpath
install_name_tool -delete_rpath /tmp/llama.cpp/build/bin /tmp/llama_final
install_name_tool -add_rpath @executable_path/../lib /tmp/llama_final
# Re-sign (install_name_tool invalidates the code signature)
codesign --force --sign - /tmp/llama_final
```
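Before copying anything to M5, it can help to confirm that no machine-specific paths remain. The sketch below parses the text printed by `otool -L /tmp/llama_final` (run on M4, where the developer tools are installed) and lists dependencies that would not resolve on another machine. `findNonPortableDeps` is a hypothetical helper, and the list of prefixes treated as portable is an assumption.

```javascript
// Returns dependency paths that are neither @rpath-style nor system libraries.
// `otoolOutput` is the raw text printed by `otool -L <file>`.
function findNonPortableDeps(otoolOutput) {
  return otoolOutput
    .split('\n')
    .slice(1) // the first line names the inspected file itself
    .map((line) => line.trim().split(' (compatibility')[0])
    .filter((path) => path.length > 0)
    .filter(
      (path) =>
        !path.startsWith('@') &&          // @rpath, @loader_path, @executable_path
        !path.startsWith('/usr/lib/') &&  // present on every macOS install
        !path.startsWith('/System/'),
    );
}

// Example with a shortened, made-up otool listing.
const sample = [
  '/tmp/llama_final:',
  '\t@rpath/libllama.0.dylib (compatibility version 0.0.0, current version 0.0.0)',
  '\t/opt/homebrew/opt/openssl@3/lib/libssl.3.dylib (compatibility version 3.0.0, current version 3.0.0)',
  '\t/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1351.0.0)',
].join('\n');

console.log(findNonPortableDeps(sample));
```

An empty result means every dependency is either relative to the binary or part of macOS; any `/opt/homebrew/...` or `/tmp/...` entry still needs an `install_name_tool -change`.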
### 3.4 libssl.3.dylib Itself Also Needs Fixing
libssl.3.dylib internally references `/opt/homebrew/Cellar/openssl@3/3.6.1/lib/libcrypto.3.dylib` as well:
```bash
cp /opt/homebrew/opt/openssl@3/lib/libssl.3.dylib /tmp/libssl_fixed.dylib
cp /opt/homebrew/opt/openssl@3/lib/libcrypto.3.dylib /tmp/libcrypto_fixed.dylib
chmod +w /tmp/libssl_fixed.dylib /tmp/libcrypto_fixed.dylib
install_name_tool -change /opt/homebrew/Cellar/openssl@3/3.6.1/lib/libcrypto.3.dylib @loader_path/libcrypto.3.dylib /tmp/libssl_fixed.dylib
codesign --force --sign - /tmp/libssl_fixed.dylib /tmp/libcrypto_fixed.dylib
```
### 3.5 Transferring Everything to M5
```bash
# Model (20 GB)
scp ~/llama.cpp/models/gemma-4-31B-it-Q5_K_M.gguf \
  accusys@192.168.110.201:~/models/
# Binary + all dylibs
ssh accusys@192.168.110.201 'rm -rf ~/llama && mkdir -p ~/llama/bin ~/llama/lib'
scp /tmp/llama_final accusys@192.168.110.201:~/llama/bin/llama-server
scp /tmp/llama.cpp/build/bin/*.dylib accusys@192.168.110.201:~/llama/lib/
scp /tmp/libssl_fixed.dylib accusys@192.168.110.201:~/llama/lib/libssl.3.dylib
scp /tmp/libcrypto_fixed.dylib accusys@192.168.110.201:~/llama/lib/libcrypto.3.dylib
```
## 4. Startup and Verification
### 4.1 One-Off Manual Start
```bash
ssh accusys@192.168.110.201
export DYLD_LIBRARY_PATH=$HOME/llama/lib
codesign --force --sign - ~/llama/bin/llama-server
codesign --force --sign - ~/llama/lib/*.dylib
nohup ~/llama/bin/llama-server \
-m ~/models/gemma-4-31B-it-Q5_K_M.gguf \
--host 0.0.0.0 --port 8081 \
--n-gpu-layers 999 --ctx-size 4096 \
--threads 10 --mlock \
--reasoning off \
> ~/llama.log 2>&1 &
```
### 4.2 Startup Script
`~/start_llm.sh` (already created):
```bash
#!/bin/bash
export DYLD_LIBRARY_PATH=$HOME/llama/lib
pkill -9 -f llama-server 2>/dev/null
sleep 1
nohup $HOME/llama/bin/llama-server \
-m $HOME/models/gemma-4-31B-it-Q5_K_M.gguf \
--host 0.0.0.0 --port 8081 \
--n-gpu-layers 999 --ctx-size 4096 \
--threads 10 --mlock \
--reasoning off \
> $HOME/llama.log 2>&1 &
echo "llama-server PID: $!"
```
### 4.3 Parameters
| Parameter | Value | Description |
|------|-----|------|
| `-m` | `~/models/gemma-4-31B-it-Q5_K_M.gguf` | Model path |
| `--host` | `0.0.0.0` | Bind to all network interfaces |
| `--port` | `8081` | HTTP API port |
| `--n-gpu-layers` | `999` | Offload all layers to the GPU (Metal) |
| `--ctx-size` | `4096` | Context length |
| `--threads` | `10` | Number of M5 Max P-cores |
| `--mlock` | (none) | Lock the model in memory to prevent swapping |
| `--reasoning` | `off` | Disable thinking; otherwise the answer ends up in `reasoning_content` |
| `DYLD_LIBRARY_PATH` | `~/llama/lib` | dylib search path |
### 4.4 Problems Encountered During Startup
| # | Problem | Cause | Fix |
|---|------|------|------|
| 1 | `Library not loaded: libmtmd.0.dylib` | Not all dylibs from the build had been copied | Copy all 26 dylibs from the build |
| 2 | `Library not loaded: /opt/homebrew/.../libssl.3.dylib` | The binary holds an absolute OpenSSL path | `install_name_tool -change → @rpath` |
| 3 | `Killed: 9` (exit 137) | The code signature was invalidated | `codesign --force --sign -` |
| 4 | `Library not loaded: /opt/homebrew/Cellar/.../libcrypto.3.dylib` | libssl.3.dylib also holds an absolute path internally | Fix libssl with `install_name_tool` |
| 5 | `no backends are loaded` | The Metal GPU backend is missing | Build from source with `-DGGML_METAL=ON` |
| 6 | `couldn't bind HTTP server socket` | The previous process had not fully released the port | Run `pkill -9 -f llama-server` first |
| 7 | **All content lands in reasoning_content** | Gemma4 is a thinking model by default | `--reasoning off` |
## 5. API Verification
### 5.1 Health Check
```bash
curl -s http://192.168.110.201:8081/health
# → {"status":"ok"}
```
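Loading a 20 GB model takes a while, so a single health check straight after startup may fail even though nothing is wrong. The sketch below polls the endpoint until it reports `{"status":"ok"}`. `waitForHealth`, its retry counts, and the injectable `fetchImpl` and `sleep` parameters are illustrative assumptions; the only behaviour taken from this record is the URL and the success payload.

```javascript
// Polls a health endpoint until it answers {"status":"ok"}.
// Resolves with the number of attempts used; rejects after `attempts` tries.
async function waitForHealth(
  url,
  {
    attempts = 30,
    delayMs = 2000,
    fetchImpl = fetch,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = {},
) {
  for (let i = 1; i <= attempts; i++) {
    try {
      const res = await fetchImpl(url);
      if (res.ok) {
        const body = await res.json();
        if (body.status === 'ok') return i;
      }
    } catch (e) {
      // Connection refused or similar: the server is not listening yet.
    }
    if (i < attempts) await sleep(delayMs);
  }
  throw new Error(`${url} not healthy after ${attempts} attempts`);
}
```

A typical call would be `await waitForHealth('http://192.168.110.201:8081/health')` right after running the startup script.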
### 5.2 Inference Test (after `--reasoning off`)
```bash
curl -s http://192.168.110.201:8081/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gemma-4-31B-it-Q5_K_M.gguf",
"messages": [{"role": "user", "content": "Hello"}],
"max_tokens": 100
}'
```
Response (OpenAI-compatible):
```json
{
"choices": [{
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?",
"reasoning_content": ""
}
}],
"usage": {
"completion_tokens": 100,
"prompt_tokens": 18,
"total_tokens": 118
},
"model": "gemma-4-31B-it-Q5_K_M.gguf",
"object": "chat.completion"
}
```
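Problem 7 in section 4.4 shows why a client should not assume the answer is always in `content`. The sketch below reads a response shaped like the one above and reports which field the text came from. `extractAnswer` is a hypothetical helper, and falling back to `reasoning_content` is one possible policy rather than the project's documented behaviour.

```javascript
// Picks the answer text out of an OpenAI-compatible chat completion.
// Falls back to `reasoning_content` when `content` is empty, and says so.
function extractAnswer(response) {
  const choice = response.choices && response.choices[0];
  if (!choice || !choice.message) {
    throw new Error('Response has no choices');
  }
  const content = choice.message.content ?? '';
  const reasoning = choice.message.reasoning_content ?? '';
  if (content.trim() !== '') {
    return { text: content, source: 'content' };
  }
  if (reasoning.trim() !== '') {
    return { text: reasoning, source: 'reasoning_content' };
  }
  throw new Error('Response contains no text');
}
```

Logging `source` makes it easy to notice when a server has been started without `--reasoning off`, because answers will suddenly come from `reasoning_content`.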
### 5.3 Performance
| Metric | Measured |
|------|------|
| Prompt speed | 60.8 tok/s |
| Generation speed | **25.8 tok/s** |
| Prompt latency | 296 ms (18 tokens) |
| Generation latency | 387 ms (10 tokens) |
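The speed and latency rows are two views of the same measurement: tokens divided by elapsed seconds. The short sketch below reproduces the table's throughput figures from its latency figures; `tokensPerSecond` is an illustrative helper name.

```javascript
// Throughput in tokens per second from a token count and elapsed milliseconds.
function tokensPerSecond(tokens, elapsedMs) {
  if (elapsedMs <= 0) {
    throw new RangeError('elapsedMs must be positive');
  }
  return tokens / (elapsedMs / 1000);
}

console.log(tokensPerSecond(18, 296).toFixed(1)); // 60.8 (prompt)
console.log(tokensPerSecond(10, 387).toFixed(1)); // 25.8 (generation)
```

At the measured generation rate, a full 100-token reply would take a little under four seconds.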
## 6. Integration with OpenCode
Add a provider to `~/.config/opencode/config.json`:
```json
{
"m5-gemma4": {
"npm": "@ai-sdk/openai-compatible",
"name": "M5 Max Gemma 4",
"options": { "baseURL": "http://192.168.110.201:8081/v1" },
"models": {
"gemma-4-31B-it-Q5_K_M.gguf": { "name": "Gemma 4 31B" }
}
}
}
```
The default model is set to `"m5-gemma4/gemma-4-31B-it-Q5_K_M.gguf"`. Confirm it appears in the provider list:
```bash
opencode models m5-gemma4
# → m5-gemma4/gemma-4-31B-it-Q5_K_M.gguf
```
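## 7. M5 Network Change Log
| When | IP | Network | Reason |
|------|-----|------|------|
| Initial | `10.10.10.10` | bridge (Thunderbolt) | No internet access; traffic had to go through M4 NAT |
| After the switch | `192.168.110.201` | en0 (WiFi/Ethernet) | Moved onto the same subnet; internet access available |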
## 8. Rust Installation (for Momentry dev)
```bash
curl --proto "=https" --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source $HOME/.cargo/env
```
- rustc 1.95.0
- cargo 1.95.0
- No sudo required
## 9. Memory Usage
```
48 GB total
├─ 20 GB Gemma 4 31B Q5_K_M (process RSS ~28 GB)
├─ 4 GB macOS + system
└─ 24 GB remaining
```
Measured RSS after startup: `28,325,600 KB` (~28 GB).
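The breakdown above budgets for the 20 GB of model weights, while the running process actually holds its full RSS. The sketch below recomputes the headroom from the measured RSS. It assumes that `ps` reports RSS in 1024-byte units and that "48 GB" means 48 GiB; the 4 GB system figure is taken from the breakdown, and `memoryHeadroomGiB` is an illustrative name.

```javascript
const KIB = 1024;
const GIB = 1024 ** 3;

// Headroom left once the process RSS and a system reserve are subtracted.
function memoryHeadroomGiB({ totalGiB, rssKiB, systemGiB }) {
  const rssGiB = (rssKiB * KIB) / GIB;
  return { rssGiB, headroomGiB: totalGiB - rssGiB - systemGiB };
}

const result = memoryHeadroomGiB({ totalGiB: 48, rssKiB: 28325600, systemGiB: 4 });
console.log(result.rssGiB.toFixed(1));      // 27.0
console.log(result.headroomGiB.toFixed(1)); // 17.0
```

Measured this way, the room left for other services is closer to 17 GB than to 24 GB, which matters when planning what else to run on M5.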
## 10. Maintenance Commands
| Operation | Command |
|------|------|
| Start | `ssh accusys@192.168.110.201 '~/start_llm.sh'` |
| Stop | `ssh accusys@192.168.110.201 'pkill -9 -f llama-server'` |
| View logs | `ssh accusys@192.168.110.201 'tail -50 ~/llama.log'` |
| Health check | `curl http://192.168.110.201:8081/health` |
| Model file | `~/models/gemma-4-31B-it-Q5_K_M.gguf` (20 GB) |
| Binary and libs | `~/llama/bin/llama-server`, `~/llama/lib/*.dylib` |
| Config | `~/.config/opencode/config.json` |
| Monitoring | `htop -p $(pgrep llama-server)` |
| Memory | `ps -o rss= -p $(pgrep llama-server)` |
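## 11. Known Limitations
- **Thinking model**: Gemma4 is a thinking model (`--reasoning off` restores normal `content`, but some scenarios may need the reasoning output)
- **No Homebrew**: the account is not an admin and cannot `brew install`. Other Momentry services (PostgreSQL, Redis, MongoDB) must be installed manually from portable binaries
- **No Xcode CLT**: `install_name_tool` and `codesign` are unavailable on M5. Binaries must be fixed on M4 and then copied over with scp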