feat: fix Chinese text search and duplicate chunk_id bug

- Add helper functions to extract text from nested content structure
- Update SearchResult to include uuid field
- Add PostgreSQL function get_chunk_by_chunk_id_and_uuid to handle duplicate chunk_ids
- Update Qdrant search functions to extract uuid from payload
- Change embedding model to nomic-embed-text-v2-moe:latest
- Update Qdrant collection name to momentry_rule1
- Fix MongoDB authentication and disable cache for development
- Improve error handling in processor.rs
- Update documentation with new embedding model
Warren
2026-03-29 04:44:28 +08:00
parent 82955504f3
commit 2393d81a3f
13 changed files with 355 additions and 106 deletions

@@ -18,7 +18,7 @@ MOMENTRY_WORKER_BATCH_SIZE=5
DATABASE_URL=postgres://accusys@localhost:5432/momentry
# MongoDB
-MONGODB_URL=mongodb://accusys:Test3200Test3200@localhost:27017/admin
+MONGODB_URL=mongodb://localhost:27017
MONGODB_DATABASE=momentry
# Redis
@@ -28,7 +28,7 @@ REDIS_PASSWORD=accusys
# Qdrant Vector Database (same as production)
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=Test3200Test3200Test3200
-QDRANT_COLLECTION=chunks_v3
+QDRANT_COLLECTION=momentry_rule1
# Paths
MOMENTRY_OUTPUT_DIR=/Users/accusys/momentry/output_dev
@@ -51,7 +51,7 @@ MOMENTRY_CUT_TIMEOUT=3600
MOMENTRY_DEFAULT_TIMEOUT=7200
# Cache Settings
-MONGODB_CACHE_ENABLED=true
+MONGODB_CACHE_ENABLED=false
MONGODB_CACHE_TTL_VIDEOS=300
MONGODB_CACHE_TTL_SEARCH=300
MONGODB_CACHE_TTL_HYBRID_SEARCH=600