feat: fix Chinese text search and duplicate chunk_id bug
- Add helper functions to extract text from nested content structure - Update SearchResult to include uuid field - Add PostgreSQL function get_chunk_by_chunk_id_and_uuid to handle duplicate chunk_ids - Update Qdrant search functions to extract uuid from payload - Change embedding model to nomic-embed-text-v2-moe:latest - Update Qdrant collection name to momentry_rule1 - Fix MongoDB authentication and disable cache for development - Improve error handling in processor.rs - Update documentation with new embedding model
This commit is contained in:
@@ -119,11 +119,11 @@ cargo run --bin momentry -- chunk <uuid>
|
||||
### Stage 4: 向量化
|
||||
|
||||
```bash
|
||||
# 向量化 chunks
|
||||
# 向量化 chunks(使用預設模型 nomic-embed-text-v2-moe:latest)
|
||||
cargo run --bin momentry -- vectorize <uuid>
|
||||
|
||||
# 指定模型
|
||||
cargo run --bin momentry -- vectorize <uuid> --model sentence-transformers/all-MiniLM-L6-v2
|
||||
# 明確指定模型
|
||||
cargo run --bin momentry -- vectorize <uuid> --model nomic-embed-text-v2-moe:latest
|
||||
```
|
||||
|
||||
---
|
||||
@@ -187,18 +187,27 @@ YOLO: ✓ Already complete, skipping
|
||||
|
||||
## 向量化模型選擇
|
||||
|
||||
### 統一嵌入模型
|
||||
Momentry Core 統一使用 **`nomic-embed-text-v2-moe:latest`** 作為所有規則的嵌入模型:
|
||||
|
||||
```bash
|
||||
# 預設模型
|
||||
--model sentence-transformers/all-MiniLM-L6-v2
|
||||
# 統一模型(所有 Rule 1/2/3 使用)
|
||||
--model nomic-embed-text-v2-moe:latest
|
||||
```
|
||||
|
||||
# 高精度模型
|
||||
--model sentence-transformers/all-mpnet-base-v2
|
||||
### 模型特性
|
||||
| 特性 | 說明 |
|
||||
|------|------|
|
||||
| **模型名稱** | `nomic-embed-text-v2-moe:latest` |
|
||||
| **向量維度** | 768 維 |
|
||||
| **多語言支持** | ✅ 完整支持(英語、中文、日語、韓語等) |
|
||||
| **模型架構** | Mixture of Experts (MoE) |
|
||||
| **推理速度** | 快速,適合實時應用 |
|
||||
|
||||
# 多語言模型
|
||||
--model sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
|
||||
|
||||
# 中文模型
|
||||
--model sentence-transformers/paraphrase-multilingual-mpnet-base-v2
|
||||
### 使用方式
|
||||
```bash
|
||||
# 向量化命令
|
||||
cargo run --bin momentry -- vectorize <uuid> --model nomic-embed-text-v2-moe:latest
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
Reference in New Issue
Block a user