fix: pipeline timeline log, chunk lookup, face processor no fallback, Qdrant UUID script, delete safety rules
docs_v1.0/OPERATIONS/SYSTEM_AUDIT_2026-05-17.md
@@ -0,0 +1,71 @@
# System Audit — 2026-05-17
## Current State

### Embedding Storage (triple redundancy, no primary)

| Data type | PG pgvector | Qdrant | JSON file |
|-----------|-------------|--------|-----------|
| Sentence vectors | `chunk.embedding` ✅ | `dev_v1` / `rule1_v2` / `sentence_*` ✅ | ❌ None |
| Story vectors | `chunk.embedding` ✅ | `dev_v1` / `dev_stories` ✅ | `.story_llm.json` ✅ |
| Face vectors | ❌ Cleared (per user instruction) | `dev_faces` ✅ (97K) | `.face.json` ✅ |
| Voice vectors | ❌ None | `dev_voice` ✅ (4K) | ❌ None |

### Pipeline Issues

| Issue | Impact |
|-------|--------|
| `processor_results.duration_secs` is always 0 | Per-step duration cannot be queried |
| `processor_results.started_at` / `completed_at` are always NULL | The timeline is lost |
| Redis timing is cleared after the job completes | The only timing source disappears |
| `get_chunk_by_chunk_id_and_uuid` was a stub (now fixed) | Smart search could not find PG chunks |
| `server.rs::search()` is not mounted but still compiled | Dead code; obscures what Qdrant is used for |
| Face embeddings are written only to Qdrant, not PG | Once the Qdrant data is deleted, everything is lost |

### Qdrant Collections: Current Status

| Collection | Points | Source | UUID |
|------------|--------|--------|------|
| `dev_v1` | 9,936 | PG rebuild | ✅ bd80fec... |
| `dev_faces` | 97,000 | face.json rebuild | ✅ bd80fec... |
| `dev_stories` | 560 | Snapshot | ✅ bd80fec... |
| `dev_voice` | 4,188 | Snapshot | ✅ bd80fec... |
| `dev_rule1_v2` | 3,417 | Snapshot | ✅ bd80fec... |
| `sentence_story` | 4,188 | Snapshot | ✅ bd80fec... |
| `sentence_summary` | 4,188 | Snapshot | ✅ bd80fec... |

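
The UUID column above records that every collection carries the same UUID. A minimal sketch of how such a check could be automated is shown here. It assumes the UUIDs found in each collection have already been gathered into a plain mapping; the function name and the input shape are hypothetical, not part of the existing scripts.

```python
from typing import Dict, List, Set


def find_uuid_mismatches(
    seen: Dict[str, Set[str]], expected_prefix: str
) -> Dict[str, List[str]]:
    """Return, per collection, the UUIDs that do not start with expected_prefix.

    `seen` maps a collection name to the set of UUIDs found in its points.
    How that set is collected is outside this sketch.
    """
    mismatches: Dict[str, List[str]] = {}
    for collection, uuids in seen.items():
        bad = sorted(u for u in uuids if not u.startswith(expected_prefix))
        if bad:
            mismatches[collection] = bad
    return mismatches
```

An empty result means every collection agrees with the expected prefix; any other result lists the offending UUIDs per collection.
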
## Safeguards & Fixes
### P0: Must Fix

| # | Fix | Approach |
|---|-----|----------|
| 1 | **Write pipeline timing to the DB** | Have `update_processor_result()` also set `started_at`, `completed_at`, and `duration_secs` |
| 2 | **Do not use Qdrant as primary storage** | PG `chunk.embedding` is the source of truth for embeddings; Qdrant is a read-only cache |
| 3 | **Smart search uses PG pgvector only** | `search_parent_chunks_semantic` is already correct; no Qdrant needed |
| 4 | **Remove the `server.rs::search()` dead code** | Or mount it on a real route and confirm it works |

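
Fix 1 amounts to capturing three values around each processor step before the result row is written. The sketch below illustrates that capture in Python. It does not reproduce `update_processor_result()`, whose signature is not shown in this audit; `timed_step` and the dict layout are hypothetical.

```python
import time
from contextlib import contextmanager
from datetime import datetime, timezone


@contextmanager
def timed_step(result: dict):
    """Fill `result` with started_at, completed_at and duration_secs."""
    result["started_at"] = datetime.now(timezone.utc)
    t0 = time.monotonic()  # monotonic clock: unaffected by wall-clock changes
    try:
        yield result
    finally:
        # Recorded even when the step raises, so failed runs keep a timeline.
        result["duration_secs"] = time.monotonic() - t0
        result["completed_at"] = datetime.now(timezone.utc)


if __name__ == "__main__":
    row = {}
    with timed_step(row):
        time.sleep(0.01)  # stand-in for a real processor step
    print(row)
```

Persisting these fields in the same write as the processor result would remove the dependence on Redis timing, which is cleared once the job completes.
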
### P1: Recommended

| # | Fix | Approach |
|---|-----|----------|
| 5 | **Snapshot before deleting from Qdrant** | Automated snapshot script |
| 6 | **Clean up redundant Qdrant collections** | `dev_voice` / `dev_stories` / `dev_rule1_v2` / `sentence_*` have no server reader and can be removed |
| 7 | **Write face embeddings to PG, or remove the dead code** | Nothing currently reads the face Qdrant writes, so `sync_face_embeddings` can be removed |
| 8 | **UUID consistency check** | The same content should not produce different UUIDs |

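
For fix 5, one possible shape of a snapshot-before-delete guard is sketched below. The snapshot and delete operations are injected as callables so the sketch makes no claim about any particular Qdrant client API; `delete_with_snapshot`, `SnapshotRequired`, and both hooks are hypothetical names.

```python
from typing import Callable


class SnapshotRequired(RuntimeError):
    """Raised when a delete is attempted without a usable snapshot."""


def delete_with_snapshot(
    collection: str,
    take_snapshot: Callable[[str], str],
    delete: Callable[[str], None],
) -> str:
    """Snapshot `collection`, then delete it; return the snapshot name.

    `take_snapshot` and `delete` are hypothetical hooks supplied by the
    caller. If `take_snapshot` raises or returns an empty name, `delete`
    is never called.
    """
    snapshot_name = take_snapshot(collection)
    if not snapshot_name:
        raise SnapshotRequired(
            f"no snapshot created for {collection!r}; refusing to delete"
        )
    delete(collection)
    return snapshot_name
```

Routing every collection delete through a single function like this makes the safety rule enforceable in code rather than by convention alone.
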
### P2: Optional

| # | Fix | Approach |
|---|-----|----------|
| 9 | `chunk_selector.rs` (player binary) hardcodes `momentry_rule1` | Read it from an env var or PG instead |
| 10 | Delete safety rules added to AGENTS.md | ✅ Done |

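
The lookup order suggested for fix 9 (environment first, current value as fallback) can be illustrated as follows. This is a Python sketch of the idea only, since `chunk_selector.rs` itself is Rust; the variable name `MOMENTRY_COLLECTION` is an assumption and does not appear in the codebase as far as this audit shows.

```python
import os
from typing import Mapping, Optional

# The value currently hardcoded, per the table above.
DEFAULT_COLLECTION = "momentry_rule1"


def resolve_collection(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the collection name from the environment, else the default.

    MOMENTRY_COLLECTION is a hypothetical variable name. Blank values are
    treated as unset so an empty export does not yield an empty name.
    """
    env = os.environ if env is None else env
    value = env.get("MOMENTRY_COLLECTION", "").strip()
    return value or DEFAULT_COLLECTION
```

Keeping the old value as the fallback means existing deployments behave the same until the variable is set.
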
## Data Recovery Path
| Data source | Recoverable to | Method |
|-------------|----------------|--------|
| `chunk.embedding` (PG) | Qdrant `dev_v1` | SQL → Qdrant upsert |
| `face.json` (on disk) | Qdrant `dev_faces` | Python script |
| `story_llm.json` (on disk) | Qdrant `dev_stories` | Python script |
| Qdrant snapshots (phase1) | Qdrant collections | Snapshot upload API |

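
The first recovery path (SQL → Qdrant upsert) needs the pgvector text form converted into float lists before upserting. A minimal sketch of that conversion step follows, assuming rows arrive as `(chunk_id, embedding_text)` pairs; the row shape, the payload field, and both function names are assumptions, and the actual SQL query and upsert call are left out.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_pgvector(text: str) -> List[float]:
    """Parse pgvector's text form, e.g. '[0.1,0.2,0.3]', into floats."""
    inner = text.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError(f"not a pgvector literal: {text!r}")
    inner = inner[1:-1].strip()
    return [float(x) for x in inner.split(",")] if inner else []


def to_points(rows: Iterable[Tuple[str, str]]) -> Iterator[Dict]:
    """Turn (chunk_id, embedding_text) rows into id/vector/payload dicts.

    Assumption: chunk_id is already a valid Qdrant point ID (an unsigned
    integer or a UUID); otherwise it must be mapped to one first.
    """
    for chunk_id, embedding_text in rows:
        yield {
            "id": chunk_id,
            "vector": parse_pgvector(embedding_text),
            "payload": {"chunk_id": chunk_id},
        }
```

Because the output is plain dicts, the same generator can feed whichever upsert mechanism the rebuild script uses, in batches if needed.
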