Add DB/vector sync guide for M4

- PostgreSQL dump (890MB) ready at /tmp/momentry_3abeee81.sql
- Qdrant face vectors (4873 points, 512D) available
- Text vectors pending (5W1H+ in progress, ~9h)
- Output JSON files ready for rsync

docs_v1.0/M5_workspace/2026-05-07_db_vector_sync_guide.md
@@ -0,0 +1,162 @@
# M5 → M4 Database & Vector Sync Guide

## Current status

M5 holds the complete Charade processing results (Job 255). M4 needs this data synced before Portal search development can start.

## Data to sync

| Data | Location | Size | Sync method |
|------|----------|------|-------------|
| PostgreSQL (dev schema) | M5:5432 | ~500MB | pg_dump / psql |
| Qdrant vectors | M5:6333 | ~50MB | Export/import via REST API (curl) |
| Output JSON | M5 filesystem | ~2GB | rsync |
| Source video | M5 filesystem | ~2GB | rsync (optional) |

## Method 1: Full DB dump (first-time setup)

```bash
# On M5: export the dev schema (table definitions + data) as plain SQL.
# --data-only is omitted because the import below drops the schema first,
# so the table definitions have to travel with the dump.
pg_dump -U accusys -d momentry --schema=dev -f /tmp/momentry_dev.sql

# Copy to M4
scp /tmp/momentry_dev.sql accusys@192.168.110.200:/tmp/

# On M4: drop the old schema, then import (the dump recreates the dev schema itself)
psql -U accusys -d momentry -c "DROP SCHEMA IF EXISTS dev CASCADE;"
psql -U accusys -d momentry -f /tmp/momentry_dev.sql
```

**Note:** The full dump is only needed the first time. After that, incremental updates are enough.

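A large dump copied over the network can be checked by comparing checksums on both machines. The following is a minimal sketch, assuming either `sha256sum` or `shasum` is installed on M5 and M4; the helper name `file_sha256` is hypothetical and not part of the existing scripts.

```bash
# Print the SHA-256 digest of a file, using whichever tool is installed.
# (Hypothetical helper; assumes sha256sum or shasum is on PATH.)
file_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# Usage: run on M5 after pg_dump and on M4 after scp, then compare the output.
#   file_sha256 /tmp/momentry_dev.sql
```

If the two digests differ, the copy is incomplete or corrupted and should be repeated before importing.
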
## Method 2: Incremental sync (day-to-day use)

### PostgreSQL (chunks, face_detections, and related tables)

```bash
# On M5: export data for the selected tables only
pg_dump -U accusys -d momentry \
  --schema=dev \
  --data-only \
  --table=dev.chunks \
  --table=dev.face_detections \
  --table=dev.identities \
  --table=dev.identity_bindings \
  --table=dev.file_identities \
  --table=dev.processor_results \
  --table=dev.pre_chunks \
  -f /tmp/momentry_incr.sql

# Copy to M4
scp /tmp/momentry_incr.sql accusys@192.168.110.200:/tmp/

# On M4: import
psql -U accusys -d momentry -f /tmp/momentry_incr.sql
```

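A data-only dump reloads every row of the listed tables, so importing it a second time into tables that already hold those rows will typically fail on duplicate keys (assuming the tables have primary keys). One way around this is to empty the target tables on M4 first. The sketch below only builds the SQL statement; `SYNC_TABLES` and `truncate_sql` are hypothetical names, and it assumes that overwriting M4's copy of these tables is acceptable.

```bash
# Tables named in the incremental dump (dev schema).
SYNC_TABLES="chunks face_detections identities identity_bindings file_identities processor_results pre_chunks"

# Print a single TRUNCATE statement covering every table in SYNC_TABLES.
# No CASCADE is used, so PostgreSQL refuses to truncate if a table outside
# this list references one of them through a foreign key.
truncate_sql() {
  list=""
  for t in $SYNC_TABLES; do
    list="${list:+$list, }dev.$t"
  done
  printf 'TRUNCATE TABLE %s;\n' "$list"
}

# Usage on M4, before importing /tmp/momentry_incr.sql:
#   truncate_sql | psql -U accusys -d momentry
```
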
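### Qdrant Vectors

```bash
# On M5: export the collection. Scroll omits vectors unless with_vector is set.
# One request returns at most `limit` points; larger collections need paging
# with next_page_offset.
curl -s "http://localhost:6333/collections/momentry_dev_rule1/points/scroll" \
  -H "Content-Type: application/json" \
  -d '{"limit":10000,"with_payload":true,"with_vector":true}' > /tmp/qdrant_rule1_scroll.json

# Reshape the scroll response into an upsert body (requires jq)
jq '{points: [.result.points[] | {id, payload, vector}]}' \
  /tmp/qdrant_rule1_scroll.json > /tmp/qdrant_rule1.json

# Copy to M4
scp /tmp/qdrant_rule1.json accusys@192.168.110.200:/tmp/

# On M4: create the collection. The size must match the source collection;
# check it on M5 with: curl -s http://localhost:6333/collections/momentry_dev_rule1
curl -s -X PUT "http://localhost:6333/collections/momentry_dev_rule1" \
  -H "Content-Type: application/json" \
  -d '{"vectors":{"size":768,"distance":"Cosine"}}'

# On M4: upsert the points (the upsert endpoint uses PUT)
curl -s -X PUT "http://localhost:6333/collections/momentry_dev_rule1/points?wait=true" \
  -H "Content-Type: application/json" \
  -d @/tmp/qdrant_rule1.json
```
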
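After the import, the point counts on M5 and M4 should match. The sketch below uses Qdrant's count endpoint (`POST /collections/{name}/points/count`) and pulls the number out with `sed` so that no extra tools are needed. The helper names `extract_count` and `qdrant_count` are hypothetical.

```bash
# Extract the numeric "count" field from a Qdrant count response on stdin.
extract_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Ask a Qdrant instance for the exact number of points in a collection.
# $1 = base URL (e.g. http://localhost:6333), $2 = collection name
qdrant_count() {
  curl -s -X POST "$1/collections/$2/points/count" \
    -H "Content-Type: application/json" \
    -d '{"exact":true}' | extract_count
}

# Usage: run on each machine and compare the two numbers.
#   qdrant_count http://localhost:6333 momentry_dev_rule1
```
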
### Output JSON

```bash
rsync -av --include="*/" --include="*.json" --exclude="*" \
  /Users/accusys/momentry/output_dev/3abeee81d94597629ed8cb943f182e94/ \
  accusys@192.168.110.200:/Users/accusys/momentry/output/
```

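A quick sanity check after the rsync is to count the JSON files on both sides. This is a minimal sketch with a hypothetical helper, `count_json`; it assumes the destination directory on M4 holds only files from this sync, otherwise the numbers will not be directly comparable.

```bash
# Count the .json files under a directory tree.
count_json() {
  find "$1" -type f -name '*.json' | wc -l | tr -d ' '
}

# Usage:
#   On M5: count_json /Users/accusys/momentry/output_dev/3abeee81d94597629ed8cb943f182e94
#   On M4: count_json /Users/accusys/momentry/output
```
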
## Method 3: Automated sync script

Create `scripts/sync_to_m4.sh` (run on M5):

```bash
#!/bin/bash
# M5 → M4 sync script
set -euo pipefail  # stop on the first failed step

M4_SSH="accusys@192.168.110.200"
FILE_UUID="3abeee81d94597629ed8cb943f182e94"

# 1. DB dump
echo "=== Dumping DB ==="
pg_dump -U accusys -d momentry --schema=dev --data-only \
  --table=dev.chunks --table=dev.face_detections \
  --table=dev.identities --table=dev.identity_bindings \
  --table=dev.file_identities \
  -f /tmp/momentry_sync.sql

# 2. Qdrant dump (with_vector is required; jq reshapes the result into an upsert body)
echo "=== Dumping Qdrant ==="
curl -s "http://localhost:6333/collections/momentry_dev_rule1/points/scroll" \
  -H "Content-Type: application/json" \
  -d '{"limit":10000,"with_payload":true,"with_vector":true}' \
  | jq '{points: [.result.points[] | {id, payload, vector}]}' > /tmp/qdrant_sync.json

# 3. Output JSON
echo "=== Syncing files ==="
rsync -av "/Users/accusys/momentry/output_dev/${FILE_UUID}/" \
  "${M4_SSH}:/Users/accusys/momentry/output/"

# 4. Transfer DB + Qdrant
echo "=== Transferring ==="
scp /tmp/momentry_sync.sql "${M4_SSH}:/tmp/"
scp /tmp/qdrant_sync.json "${M4_SSH}:/tmp/"

echo "=== Run the following on M4 ==="
cat <<'EOF'

# M4:
psql -U accusys -d momentry -f /tmp/momentry_sync.sql
curl -s -X PUT http://localhost:6333/collections/momentry_dev_rule1 -H 'Content-Type: application/json' -d '{"vectors":{"size":768,"distance":"Cosine"}}'
curl -s -X PUT 'http://localhost:6333/collections/momentry_dev_rule1/points?wait=true' -H 'Content-Type: application/json' -d @/tmp/qdrant_sync.json
EOF
```

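Because the script depends on several command-line tools, a preflight check at the top can fail early with a clear message instead of halfway through a sync. The sketch below is one way to do that; `missing_cmds` is a hypothetical helper, and the tool list is an assumption based on the commands used in this guide.

```bash
# Print the name of every command in the argument list that is not on PATH.
missing_cmds() {
  for c in "$@"; do
    command -v "$c" >/dev/null 2>&1 || printf '%s\n' "$c"
  done
}

# Usage near the top of sync_to_m4.sh:
#   missing=$(missing_cmds pg_dump curl rsync scp)
#   if [ -n "$missing" ]; then
#     echo "Missing tools: $missing" >&2
#     exit 1
#   fi
```
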
## Data currently available to sync (early morning, 2026-05-07)

| Item | Status | Size |
|------|--------|------|
| PostgreSQL dump | ✅ Prepared | 890MB |
| Qdrant face vectors | ✅ 4873 points (512D) | ~50MB |
| Qdrant text vectors | ⏳ Waiting for 5W1H+ to finish (~9h) | 0 points |
| Output JSON | ✅ Ready | ~2GB |
| Source video | ✅ Ready | ~2GB |

**Once 5W1H+ finishes**, run another full sync; the text vectors will be available by then.

## Transfer commands

```bash
# Run on M5
scp /tmp/momentry_3abeee81.sql accusys@192.168.110.200:/tmp/
rsync -av /Users/accusys/momentry/output_dev/ \
  accusys@192.168.110.200:/Users/accusys/momentry/output/
```

## Portal verification

```bash
# FILE_UUID must be set in the current shell
FILE_UUID="3abeee81d94597629ed8cb943f182e94"

# Check the data (Content-Type added on the assumption that the endpoint expects a JSON body)
curl -s "http://localhost:3003/api/v1/file/${FILE_UUID}/face_trace/sortby" \
  -H "X-API-Key: muser_test_apikey" \
  -H "Content-Type: application/json" \
  -d '{"limit":1}'

# Check search
curl -s "http://localhost:3003/api/v1/search?q=Audrey+Hepburn&file_uuid=${FILE_UUID}" \
  -H "X-API-Key: muser_test_apikey"
```

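With `curl -s`, an error response from the Portal looks much like a success unless the body is read closely. The sketch below checks the HTTP status code instead. Both helper names (`is_2xx`, `http_status`) are hypothetical, and it assumes the Portal returns a 2xx status for a working endpoint.

```bash
# Succeed only for 2xx HTTP status codes.
is_2xx() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Print only the HTTP status code of a GET request.
# $1 = URL, $2 = API key
http_status() {
  curl -s -o /dev/null -w '%{http_code}' -H "X-API-Key: $2" "$1"
}

# Usage:
#   code=$(http_status "http://localhost:3003/api/v1/search?q=Audrey+Hepburn&file_uuid=${FILE_UUID}" muser_test_apikey)
#   is_2xx "$code" && echo "search OK" || echo "search failed (HTTP $code)"
```
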
## Notes

- After the pipeline on M5 finishes (5W1H+ → vectorize), another sync is required.
- The Qdrant collection must be created on M4 first (768D, Cosine); otherwise the import fails.
- M5 uses `dev` as the PostgreSQL schema name; keep the same name on M4.