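# SQLite + Sled Hybrid Architecture Design Document

**Design date:** 2026-05-29
**Design goal:** Keep SQLite's SQL strengths while exploiting Sled's performance strengths

---

## 1. Architecture Overview

### 1.1 Design Principles

**Core principles:**
1. **SQL queries stay in SQLite** - complex queries, JOIN, WHERE
2. **KV operations use Sled** - high-concurrency writes, simple lookups
3. **Data consistency guarantee** - dual-write synchronization
4. **Incremental migration** - reversible and testable

### 1.2 Layered Architecture Diagram
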
```
MarkBase Hybrid Database Architecture:
┌───────────────────────────────────────────────────────┐
│  Application Layer (MarkBase API)                     │
│  - REST API endpoints                                 │
│  - Web UI interactions                                │
│  - FUSE file operations                               │
└───────────────────────────────────────────────────────┘
                    ↓ (request routing)
┌───────────────────────────────────────────────────────┐
│  Data Routing Layer (HybridRouter)                    │  ← core design
│  - Query type detection                               │
│  - Database selection                                 │
│  - Request forwarding                                 │
└───────────────────────────────────────────────────────┘
                    ↓ (parallel operations)
┌───────────────────────────┬───────────────────────────┐
│ Metadata Layer (SQLite)   │ KV Layer (Sled)           │
│                           │                           │
│ SQL query strengths:      │ Performance strengths:    │
│ - file_nodes (CRUD)       │ - file_content            │
│ - file_registry           │   hash → path             │
│ - file_locations          │ - hot_files_cache         │
│ - user_auth               │ - metadata_cache          │
│ - sync_log                │ - import_queue            │
│                           │                           │
│ Features:                 │ Features:                 │
│ ✅ SQL queries            │ ✅ Concurrent writes      │
│ ✅ JOIN support           │ ✅ MVCC, lock-free        │
│ ✅ WHERE filtering        │ ✅ High throughput        │
│ ✅ Debugging tools        │ ✅ Pure Rust              │
│                           │                           │
│ Performance:              │ Performance:              │
│ ⭐ Query latency <1ms     │ ⭐ Import 163K/sec        │
│ ⭐ Space usage 12.33MB    │ ⭐ Concurrent 5.22M/sec   │
│                           │                           │
│ Use cases:                │ Use cases:                │
│ - parent_id lookups       │ - Bulk import             │
│ - file_uuid JOIN          │ - FUSE hot path           │
│ - WHERE filtering         │ - Concurrent writes       │
│ - Complex queries         │ - Simple KV storage       │
└───────────────────────────┴───────────────────────────┘
                    ↓ (data synchronization)
┌───────────────────────────────────────────────────────┐
│  Data Sync Layer (SyncManager)                        │  ← data consistency
│  - Dual-write mechanism                               │
│  - Cache invalidation                                 │
│  - Consistency checks                                 │
└───────────────────────────────────────────────────────┘
```

---

## 2. Data Layering Strategy

### 2.1 Metadata Layer (SQLite)

**Tables that stay in SQLite:**

#### file_nodes table (core metadata)

**Design decision:** ✅ Keep in SQLite

**Rationale:**
1. **Needs complex queries**
   - `WHERE parent_id = ?` - parent/child lookups
   - `WHERE sha256 = ?` - hash lookups
   - `WHERE node_type = ?` - type filtering
   - `ORDER BY sort_order` - sorted listings

2. **Needs JOIN queries**
   - `file_nodes JOIN file_locations ON file_uuid`
   - `file_nodes JOIN file_registry ON file_uuid`

3. **Mature SQL optimization**
   - Efficient indexes
   - Mature query optimizer
   - Support for complex queries

**Schema:**
```sql
CREATE TABLE file_nodes (
    node_id TEXT PRIMARY KEY,
    label TEXT NOT NULL,
    aliases_json TEXT,
    file_uuid TEXT,
    sha256 TEXT,
    parent_id TEXT,
    children_json TEXT,
    node_type TEXT,
    icon TEXT,
    color TEXT,
    bg_color TEXT,
    file_size INTEGER,
    registered_at TEXT,
    created_at TEXT,
    updated_at TEXT,
    sort_order INTEGER
);

-- Indexes
CREATE INDEX idx_parent_id ON file_nodes(parent_id);
CREATE INDEX idx_sha256 ON file_nodes(sha256);
CREATE INDEX idx_file_uuid ON file_nodes(file_uuid);
CREATE INDEX idx_node_type ON file_nodes(node_type);
```

#### file_registry table (file registration)

**Design decision:** ✅ Keep in SQLite

**Rationale:**
1. **Needs JOIN queries**
   - `file_nodes JOIN file_registry ON file_uuid`

2. **Needs WHERE queries**
   - `WHERE sha256 = ?`
   - `WHERE file_size > ?`

**Schema:**
```sql
CREATE TABLE file_registry (
    file_uuid TEXT PRIMARY KEY,
    original_name TEXT NOT NULL,
    sha256 TEXT,  -- added: required by the sha256 query and index below
    file_size INTEGER,
    file_type TEXT,
    registered_at TEXT,
    last_seen_at TEXT,
    status TEXT
);

-- Indexes
CREATE INDEX idx_registry_sha256 ON file_registry(sha256);
CREATE INDEX idx_registry_size ON file_registry(file_size);
```

#### file_locations table (location tracking)

**Design decision:** ✅ Keep in SQLite

**Rationale:**
1. **Needs complex JOINs**
   - `file_nodes JOIN file_locations ON file_uuid`

2. **Needs WHERE queries**
   - `WHERE storage_tier = ?`
   - `WHERE is_primary = 1`

**Schema:**
```sql
CREATE TABLE file_locations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    file_uuid TEXT NOT NULL,
    storage_tier TEXT NOT NULL,
    storage_path TEXT NOT NULL,
    is_primary INTEGER DEFAULT 0,
    created_at TEXT,
    FOREIGN KEY (file_uuid) REFERENCES file_registry(file_uuid)
);

-- Indexes
CREATE INDEX idx_loc_file_uuid ON file_locations(file_uuid);
CREATE INDEX idx_loc_tier ON file_locations(storage_tier);
```

#### user_auth table (user authentication)

**Design decision:** ✅ Keep in SQLite

**Rationale:**
1. **Mature authentication approach**
   - bcrypt password hashing
   - Session management

2. **Query requirements**
   - `WHERE username = ?`
   - `WHERE token = ?`

**Schema:**
```sql
CREATE TABLE user_auth (
    username TEXT PRIMARY KEY,
    password_hash TEXT NOT NULL,
    token TEXT,
    expires_at TEXT,
    groups TEXT,
    permissions TEXT,
    created_at TEXT
);

-- Indexes
CREATE INDEX idx_auth_token ON user_auth(token);
```

#### sync_log table (sync log)

**Design decision:** ✅ Keep in SQLite

**Rationale:**
1. **Needs time-based queries**
   - `WHERE sync_time > ?`
   - `ORDER BY sync_time DESC`

**Schema:**
```sql
CREATE TABLE sync_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    sync_type TEXT NOT NULL,
    sync_time INTEGER,
    users_synced INTEGER,
    groups_synced INTEGER,
    status TEXT,
    error_message TEXT
);

-- Indexes
CREATE INDEX idx_sync_time ON sync_log(sync_time);
```

---

### 2.2 KV Layer (Sled)

**Data that moves to Sled:**

#### file_content_hash Tree (content hash index)

**Design decision:** ✅ Migrate to Sled

**Rationale:**
1. **Simple KV lookups**
   - `sha256 → storage_path`
   - No JOIN needed

2. **High-concurrency write requirements**
   - Frequent updates during bulk import
   - Concurrent writes from multiple users

3. **FUSE hot path**
   - Fast lookups on FUSE reads
   - Caches hot data

**Data structure:**
```rust
use serde::{Deserialize, Serialize};

// Tree: file_content_hash
// Key: sha256 (String)
// Value: ContentLocation (JSON), which carries the storage_path

#[derive(Serialize, Deserialize)]
pub struct ContentLocation {
    pub storage_path: String,
    pub storage_tier: String,
    pub file_size: u64,
    pub mime_type: String,
    pub last_accessed: String,
    pub access_count: u32,
}

// API
let tree = db.open_tree("file_content_hash")?;
tree.insert(sha256.as_bytes(), serde_json::to_vec(&location)?)?;
let location = tree.get(sha256.as_bytes())?;
```

**Performance advantages:**
- Import throughput: 163K/sec
- Lookup latency: 1429 ns
- Concurrent writes: multiple writers supported

#### hot_files_cache Tree (hot file cache)

**Design decision:** ✅ New Sled Tree

**Rationale:**
1. **FUSE hot path**
   - Files that FUSE reads frequently
   - Fast lookup of hot data

2. **LRU cache mechanism**
   - Automatically evicts cold data
   - Keeps hot data resident

3. **Pure KV lookups**
   - `node_id → hot_flag`
   - No complex queries

**Data structure:**
```rust
// Tree: hot_files_cache
// Key: node_id (String)
// Value: HotFileCache (JSON), which carries the hot level

#[derive(Serialize, Deserialize)]
pub struct HotFileCache {
    pub node_id: String,
    pub access_count: u32,
    pub last_accessed: String,
    pub hot_level: u8, // 0-5 hot level
    pub cache_priority: u32,
}

// API
let tree = db.open_tree("hot_files_cache")?;
tree.insert(node_id.as_bytes(), serde_json::to_vec(&cache)?)?;

// Eviction: drops level-0 entries once the tree is over capacity.
// (A strict LRU would instead order entries by `last_accessed`.)
if tree.len() > MAX_CACHE_SIZE {
    for item in tree.iter() {
        let (key, value) = item?;
        let cache: HotFileCache = serde_json::from_slice(&value)?;
        if cache.hot_level == 0 {
            // Remove the entry being iterated, not the `node_id` inserted above.
            tree.remove(key)?;
        }
    }
}
```

#### metadata_cache Tree (metadata cache)

**Design decision:** ✅ New Sled Tree

**Rationale:**
1. **Faster lookups**
   - Caches frequently used metadata
   - Reduces SQLite queries

2. **Lock-free MVCC reads**
   - Concurrent reads do not block
   - Higher query throughput

**Data structure:**
```rust
// Tree: metadata_cache
// Key: node_id (String)
// Value: CachedMetadata (JSON snapshot of a FileNode)

#[derive(Serialize, Deserialize)]
pub struct CachedMetadata {
    pub node_id: String,
    pub label: String,
    pub parent_id: Option<String>,
    pub node_type: NodeType,
    pub file_size: Option<i64>,
    pub sha256: Option<String>,
    pub cached_at: String,
    pub ttl: u32, // Time to live (seconds)
}

// API
let tree = db.open_tree("metadata_cache")?;

// Lookup flow:
// 1. Check Sled cache
// 2. If not found, query SQLite
// 3. Update Sled cache

if let Some(cache) = tree.get(node_id.as_bytes())? {
    let meta: CachedMetadata = serde_json::from_slice(&cache)?;
    // NOTE: `ttl > 0` only rejects entries whose TTL was zeroed explicitly;
    // a full expiry check would also compare `cached_at` with the current time.
    if meta.ttl > 0 {
        return Ok(Some(meta)); // Cache hit
    }
}

// Cache miss, query SQLite
let node = sqlite_query(node_id)?;

// Update cache
let cache = CachedMetadata::from_node(&node);
tree.insert(node_id.as_bytes(), serde_json::to_vec(&cache)?)?;
```

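The `ttl` field is documented as a lifetime in seconds, so an entry's freshness depends on both `cached_at` and the current time. The following is a minimal sketch of such a check, not the project's implementation. It assumes `cached_at` is stored as Unix seconds (the design stores a `String`), and `CacheStamp`, `is_fresh`, and `unix_now` are hypothetical names.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical, trimmed-down cache entry. `cached_at` is Unix seconds here,
/// which is an assumption; the design above stores it as a String.
pub struct CacheStamp {
    pub cached_at: u64,
    pub ttl: u32,
}

/// Returns true while the entry is still inside its TTL window at time `now`.
pub fn is_fresh(stamp: &CacheStamp, now: u64) -> bool {
    // A ttl of 0 counts as already expired, matching the `ttl > 0` check.
    // `saturating_sub` keeps a clock that moved backwards from underflowing.
    stamp.ttl > 0 && now.saturating_sub(stamp.cached_at) < u64::from(stamp.ttl)
}

/// Current wall-clock time as Unix seconds (0 if the clock predates the epoch).
pub fn unix_now() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}
```

A lookup would call `is_fresh(&stamp, unix_now())` in place of the bare `ttl > 0` test and treat a stale entry as a cache miss.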
#### import_queue Tree (import queue)

**Design decision:** ✅ New Sled Tree

**Rationale:**
1. **High-concurrency writes**
   - Queue management during bulk import
   - Concurrent imports from multiple users

2. **Simple queue management**
   - `job_id → job_status`
   - No complex queries

3. **Import throughput**
   - Sled import throughput: 163K/sec
   - 11.42x faster than SQLite

**Data structure:**
```rust
// Tree: import_queue
// Key: job_id (String)
// Value: ImportJob (JSON)

#[derive(Serialize, Deserialize)]
pub struct ImportJob {
    pub job_id: String,
    pub user_id: String,
    pub status: JobStatus,
    pub total_files: u32,
    pub imported_files: u32,
    pub failed_files: u32,
    pub started_at: String,
    pub completed_at: Option<String>,
    pub error_message: Option<String>,
}

#[derive(Serialize, Deserialize)]
pub enum JobStatus {
    Pending,
    Running,
    Completed,
    Failed,
}

// API
let tree = db.open_tree("import_queue")?;

// Create an import job (`mut` because progress is updated below)
let mut job = ImportJob::new(user_id, total_files);
tree.insert(job.job_id.as_bytes(), serde_json::to_vec(&job)?)?;

// Update progress
job.imported_files += 1;
tree.insert(job.job_id.as_bytes(), serde_json::to_vec(&job)?)?;

// Query job status (`get` returns an Option, so a missing job yields None)
let job: Option<ImportJob> = tree
    .get(job_id.as_bytes())?
    .map(|data| serde_json::from_slice(&data))
    .transpose()?;
```

---

## 3. Data Routing Layer Design

### 3.1 HybridRouter Core Design

**Routing strategy:**

```rust
pub struct HybridRouter {
    sqlite_conn: Connection,
    sled_db: sled::Db,
}

impl HybridRouter {
    pub fn route_query(&self, query_type: QueryType) -> DatabaseType {
        match query_type {
            // SQL queries → SQLite
            QueryType::ParentChildren => DatabaseType::SQLite,
            QueryType::FileUuidJoin => DatabaseType::SQLite,
            QueryType::WhereFilter => DatabaseType::SQLite,
            QueryType::ComplexJoin => DatabaseType::SQLite,
            QueryType::OrderBySort => DatabaseType::SQLite,

            // KV lookups → Sled
            QueryType::ContentHashLookup => DatabaseType::Sled,
            QueryType::HotFileCache => DatabaseType::Sled,
            QueryType::MetadataCache => DatabaseType::Sled,
            QueryType::ImportQueue => DatabaseType::Sled,

            // Hybrid lookups → Sled cache first
            QueryType::NodeLookup => DatabaseType::Hybrid,
        }
    }

    pub fn get_node(&self, node_id: &str) -> Result<Option<FileNode>> {
        // Hybrid lookup strategy:
        // 1. Check Sled cache first (fast)
        // 2. If not found, query SQLite (slow)
        // 3. Update Sled cache

        // Step 1: Check Sled cache
        let cache_tree = self.sled_db.open_tree("metadata_cache")?;
        if let Some(cache_data) = cache_tree.get(node_id.as_bytes())? {
            let cache: CachedMetadata = serde_json::from_slice(&cache_data)?;
            if cache.ttl > 0 {
                // Cache hit, return fast
                return Ok(Some(cache.to_file_node()));
            }
        }

        // Step 2: Query SQLite
        let node = self.sqlite_query_node(node_id)?;

        // Step 3: Update Sled cache
        if let Some(n) = &node {
            let cache = CachedMetadata::from_node(n);
            cache_tree.insert(node_id.as_bytes(), serde_json::to_vec(&cache)?)?;
        }

        Ok(node)
    }

    pub fn get_children(&self, parent_id: &str) -> Result<Vec<FileNode>> {
        // SQL query → SQLite
        let mut stmt = self.sqlite_conn.prepare(
            "SELECT * FROM file_nodes WHERE parent_id = ? ORDER BY sort_order"
        )?;

        let nodes = stmt.query_map([parent_id], |row| {
            Ok(FileNode::from_row(row))
        })?;

        Ok(nodes.collect::<Result<Vec<_>, _>>()?)
    }

    pub fn import_batch(&self, nodes: &[FileNode]) -> Result<()> {
        // High-concurrency writes → Sled
        // Import job bookkeeping tree (not used further in this sketch)
        let _queue = self.sled_db.open_tree("import_queue")?;

        // Sled bulk import. sled has no `insert_batch`; writes are grouped
        // with `sled::Batch` and applied atomically via `apply_batch`.
        // Assumes `FileNode` implements `Serialize`.
        let sled_nodes = self.sled_db.open_tree("file_nodes_temp")?;
        let mut batch = sled::Batch::default();
        for node in nodes {
            batch.insert(node.node_id.as_bytes(), serde_json::to_vec(node)?);
        }
        sled_nodes.apply_batch(batch)?;

        // Asynchronously sync to SQLite
        self.sync_to_sqlite_async(nodes)?;

        Ok(())
    }
}
```

### 3.2 Query Type Definitions

```rust
pub enum QueryType {
    // SQL query types (SQLite first)
    ParentChildren,     // WHERE parent_id = ?
    FileUuidJoin,       // JOIN file_locations
    WhereFilter,        // WHERE node_type = ?
    ComplexJoin,        // Multi-table JOIN
    OrderBySort,        // ORDER BY sort_order

    // KV lookup types (Sled first)
    ContentHashLookup,  // sha256 → path
    HotFileCache,       // node_id → hot_flag
    MetadataCache,      // node_id → metadata
    ImportQueue,        // job_id → status

    // Hybrid lookup types (Sled cache + SQLite fallback)
    NodeLookup,         // node_id lookup (cache first)
}

pub enum DatabaseType {
    SQLite,  // Use SQLite
    Sled,    // Use Sled
    Hybrid,  // Hybrid strategy (Sled cache first)
}
```

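The `NodeLookup` / `Hybrid` route follows the cache-aside pattern: read the cache, fall back to the primary store, then populate the cache. The sketch below isolates that control flow using only the standard library. The two `HashMap`s are stand-ins for the Sled tree and SQLite, and `CacheAside` and `Source` are hypothetical names, not part of the design.

```rust
use std::collections::HashMap;

/// Which backend ended up serving a lookup.
#[derive(Debug, PartialEq)]
pub enum Source {
    Cache,
    Primary,
}

/// In-memory stand-ins: `primary` plays SQLite, `cache` plays the Sled tree.
pub struct CacheAside {
    pub primary: HashMap<String, String>,
    pub cache: HashMap<String, String>,
}

impl CacheAside {
    /// Cache-aside read: try the cache, fall back to the primary store,
    /// then populate the cache so the next read of the same key is a hit.
    pub fn get(&mut self, node_id: &str) -> Option<(String, Source)> {
        if let Some(v) = self.cache.get(node_id) {
            return Some((v.clone(), Source::Cache));
        }
        // Unknown keys return None here and are never written to the cache.
        let v = self.primary.get(node_id)?.clone();
        self.cache.insert(node_id.to_string(), v.clone());
        Some((v, Source::Primary))
    }
}
```

The first read of a key reports `Source::Primary` and the second reports `Source::Cache`, which is the behaviour a cache hit-rate metric would count.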
---

## 4. Data Synchronization Design

### 4.1 Dual-Write Synchronization

**Synchronization strategy:**

```rust
pub struct SyncManager {
    sqlite_conn: Connection,
    sled_db: sled::Db,
}

impl SyncManager {
    pub fn dual_write_node(&self, node: &FileNode) -> Result<()> {
        // Dual-write mechanism:
        // 1. Write SQLite first (durability guarantee)
        // 2. Then write Sled (cache acceleration)

        // Step 1: Write to SQLite
        self.insert_node_sqlite(node)?;

        // Step 2: Write to Sled cache
        let cache = CachedMetadata::from_node(node);
        let cache_tree = self.sled_db.open_tree("metadata_cache")?;
        cache_tree.insert(node.node_id.as_bytes(), serde_json::to_vec(&cache)?)?;

        Ok(())
    }

    pub fn sync_batch_to_sqlite(&self, nodes: &[FileNode]) -> Result<()> {
        // Batch sync:
        // 1. Fast import into Sled (163K/sec)
        // 2. Asynchronous sync to SQLite (14K/sec)

        // Step 1: Fast import to Sled (sled has no `insert_batch`;
        // use `sled::Batch` + `apply_batch`, assuming `FileNode: Serialize`)
        let sled_nodes = self.sled_db.open_tree("file_nodes_temp")?;
        let mut batch = sled::Batch::default();
        for node in nodes {
            batch.insert(node.node_id.as_bytes(), serde_json::to_vec(node)?);
        }
        sled_nodes.apply_batch(batch)?;

        // Step 2: Async sync to SQLite
        self.spawn_sync_task(nodes)?;

        Ok(())
    }

    pub fn consistency_check(&self) -> Result<ConsistencyReport> {
        // Consistency check:
        // 1. Compare SQLite and Sled data
        // 2. Find inconsistent entries
        // 3. Repair automatically

        let sqlite_nodes = self.load_all_sqlite()?;
        let sled_cache = self.load_all_sled_cache()?;

        let mut report = ConsistencyReport::default();

        for node in &sqlite_nodes {
            if let Some(cache) = sled_cache.get(&node.node_id) {
                if cache.sha256 != node.sha256 {
                    report.inconsistencies.push(Inconsistency {
                        node_id: node.node_id.clone(),
                        field: "sha256".to_string(),
                        sqlite_value: node.sha256.clone(),
                        sled_value: cache.sha256.clone(),
                    });

                    // Auto-fix: Update Sled cache (SQLite is the source of truth)
                    self.update_sled_cache(node)?;
                    report.auto_fixed += 1;
                }
            } else {
                // Cache missing, add to Sled
                self.add_to_sled_cache(node)?;
                report.cache_misses += 1;
            }
        }

        Ok(report)
    }
}

#[derive(Default)]
pub struct ConsistencyReport {
    pub inconsistencies: Vec<Inconsistency>,
    pub cache_misses: u32,
    pub auto_fixed: u32,
}

pub struct Inconsistency {
    pub node_id: String,
    pub field: String,
    pub sqlite_value: Option<String>,
    pub sled_value: Option<String>,
}
```

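`dual_write_node` writes SQLite first and Sled second, so the interesting case is a failure between the two steps. One possible policy, sketched below with in-memory stand-ins, is to treat the primary write as authoritative and make the cache write best effort: if the cache write fails, the stale entry is dropped so readers fall back to the primary store. `DualStore`, `WriteOutcome`, and the `cache_healthy` switch are hypothetical and exist only to simulate the failure.

```rust
use std::collections::HashMap;

/// Outcome of a dual write in this sketch.
#[derive(Debug, PartialEq)]
pub enum WriteOutcome {
    /// Both stores were updated.
    Both,
    /// The primary write succeeded but the cache write failed; the stale
    /// cache entry was dropped so readers fall back to the primary store.
    PrimaryOnly,
}

/// In-memory stand-ins for SQLite (`primary`) and the Sled cache (`cache`).
/// `cache_healthy` is a hypothetical switch used to simulate a cache failure.
pub struct DualStore {
    pub primary: HashMap<String, String>,
    pub cache: HashMap<String, String>,
    pub cache_healthy: bool,
}

impl DualStore {
    fn write_cache(&mut self, key: &str, value: &str) -> Result<(), String> {
        if !self.cache_healthy {
            return Err("cache unavailable".to_string());
        }
        self.cache.insert(key.to_string(), value.to_string());
        Ok(())
    }

    /// Primary first (source of truth), cache second (best effort).
    pub fn dual_write(&mut self, key: &str, value: &str) -> WriteOutcome {
        self.primary.insert(key.to_string(), value.to_string());
        match self.write_cache(key, value) {
            Ok(()) => WriteOutcome::Both,
            Err(_) => {
                // Never leave an older value visible in the cache.
                self.cache.remove(key);
                WriteOutcome::PrimaryOnly
            }
        }
    }
}
```

With a real Sled tree the removal could fail as well, in which case the periodic `consistency_check` would be the remaining safety net.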
### 4.2 Cache Invalidation

```rust
pub struct CacheInvalidator {
    sled_db: sled::Db,
}

impl CacheInvalidator {
    pub fn invalidate_node(&self, node_id: &str) -> Result<()> {
        // Cache invalidation triggers:
        // 1. Node updated
        // 2. Node deleted
        // 3. TTL expired

        let cache_tree = self.sled_db.open_tree("metadata_cache")?;
        cache_tree.remove(node_id.as_bytes())?;

        Ok(())
    }

    pub fn invalidate_children(&self, parent_id: &str) -> Result<()> {
        // Invalidate cached child nodes in bulk (full scan of the cache tree)
        let cache_tree = self.sled_db.open_tree("metadata_cache")?;

        for item in cache_tree.iter() {
            let (key, value) = item?;
            let cache: CachedMetadata = serde_json::from_slice(&value)?;

            if cache.parent_id == Some(parent_id.to_string()) {
                cache_tree.remove(key)?;
            }
        }

        Ok(())
    }

    pub fn ttl_cleanup(&self) -> Result<u32> {
        // Remove entries whose TTL has expired
        let cache_tree = self.sled_db.open_tree("metadata_cache")?;
        let mut cleaned = 0;

        for item in cache_tree.iter() {
            let (key, value) = item?;
            let cache: CachedMetadata = serde_json::from_slice(&value)?;

            if cache.ttl == 0 {
                cache_tree.remove(key)?;
                cleaned += 1;
            }
        }

        Ok(cleaned)
    }
}
```

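`invalidate_children` scans and deserializes every cached entry to find the children of one parent. A possible alternative, sketched here as an assumption rather than part of the design, is a secondary index whose keys are laid out as `parent_id/node_id`. Because an ordered map keeps keys sorted, all children of a parent form one contiguous range. The sketch uses `BTreeMap` as a stand-in for a Sled tree (Sled offers `scan_prefix` for the same purpose), and `ChildIndex`, `add_child`, and `take_children` are hypothetical names. It also assumes ids never contain `/`.

```rust
use std::collections::BTreeMap;

/// Ordered stand-in for a Sled tree. Keys use a hypothetical
/// "parent_id/node_id" layout; ids are assumed not to contain '/'.
pub type ChildIndex = BTreeMap<String, ()>;

fn child_key(parent_id: &str, node_id: &str) -> String {
    format!("{}/{}", parent_id, node_id)
}

pub fn add_child(index: &mut ChildIndex, parent_id: &str, node_id: &str) {
    index.insert(child_key(parent_id, node_id), ());
}

/// Removes every index entry under `parent_id` and returns the node ids
/// whose cached metadata should be invalidated.
pub fn take_children(index: &mut ChildIndex, parent_id: &str) -> Vec<String> {
    let prefix = format!("{}/", parent_id);
    // Collect first: the map cannot be mutated while it is being iterated.
    let keys: Vec<String> = index
        .range(prefix.clone()..)
        .take_while(|(k, _)| k.starts_with(&prefix))
        .map(|(k, _)| k.clone())
        .collect();
    keys.iter()
        .map(|k| {
            index.remove(k);
            k[prefix.len()..].to_string()
        })
        .collect()
}
```

The trailing `/` in the prefix keeps `p1` from matching `p10`. The trade-off is one extra index write per cached node in exchange for invalidation cost proportional to the number of children rather than the cache size.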
---

## 5. API Design

### 5.1 HybridAPI Design

```rust
pub struct HybridAPI {
    router: HybridRouter,
    sync_manager: SyncManager,
    // `invalidate_node` is defined on CacheInvalidator, so the API holds one.
    cache_invalidator: CacheInvalidator,
}

impl HybridAPI {
    // === Metadata API (SQLite first) ===

    pub fn get_node(&self, node_id: &str) -> Result<Option<FileNode>> {
        // Hybrid strategy: Sled cache + SQLite fallback
        self.router.get_node(node_id)
    }

    pub fn get_children(&self, parent_id: &str) -> Result<Vec<FileNode>> {
        // SQL query → SQLite
        self.router.get_children(parent_id)
    }

    pub fn search_nodes(&self, query: &str) -> Result<Vec<FileNode>> {
        // WHERE query → SQLite
        // `query` is bound as the LIKE pattern, so the caller supplies
        // any wildcards (for example "%report%").
        let mut stmt = self.router.sqlite_conn.prepare(
            "SELECT * FROM file_nodes WHERE label LIKE ? ORDER BY sort_order"
        )?;

        let nodes = stmt.query_map([query], |row| {
            Ok(FileNode::from_row(row))
        })?;

        Ok(nodes.collect::<Result<Vec<_>, _>>()?)
    }

    pub fn get_node_locations(&self, node_id: &str) -> Result<Vec<FileLocation>> {
        // JOIN query → SQLite
        let mut stmt = self.router.sqlite_conn.prepare(
            "SELECT l.* FROM file_nodes n
             JOIN file_locations l ON n.file_uuid = l.file_uuid
             WHERE n.node_id = ?"
        )?;

        let locations = stmt.query_map([node_id], |row| {
            Ok(FileLocation::from_row(row))
        })?;

        Ok(locations.collect::<Result<Vec<_>, _>>()?)
    }

    // === KV API (Sled first) ===

    pub fn get_content_path(&self, sha256: &str) -> Result<Option<String>> {
        // KV lookup → Sled
        let tree = self.router.sled_db.open_tree("file_content_hash")?;
        if let Some(data) = tree.get(sha256.as_bytes())? {
            let location: ContentLocation = serde_json::from_slice(&data)?;
            return Ok(Some(location.storage_path));
        }
        Ok(None)
    }

    pub fn is_hot_file(&self, node_id: &str) -> Result<bool> {
        // Hot-path lookup → Sled
        let tree = self.router.sled_db.open_tree("hot_files_cache")?;
        if let Some(data) = tree.get(node_id.as_bytes())? {
            let cache: HotFileCache = serde_json::from_slice(&data)?;
            return Ok(cache.hot_level > 2);
        }
        Ok(false)
    }

    pub fn get_import_progress(&self, job_id: &str) -> Result<ImportJob> {
        // Queue lookup → Sled
        let tree = self.router.sled_db.open_tree("import_queue")?;
        if let Some(data) = tree.get(job_id.as_bytes())? {
            let job: ImportJob = serde_json::from_slice(&data)?;
            return Ok(job);
        }
        Err(anyhow!("Job not found"))
    }

    // === Write API (dual-write sync) ===

    pub fn insert_node(&self, node: &FileNode) -> Result<()> {
        // Dual-write mechanism
        self.sync_manager.dual_write_node(node)
    }

    pub fn update_node(&self, node_id: &str, updates: &FileNode) -> Result<()> {
        // Update SQLite + invalidate the Sled cache entry
        self.router.update_node_sqlite(node_id, updates)?;
        self.cache_invalidator.invalidate_node(node_id)?;
        Ok(())
    }

    pub fn delete_node(&self, node_id: &str) -> Result<()> {
        // Delete from SQLite + invalidate the Sled cache entry
        self.router.delete_node_sqlite(node_id)?;
        self.cache_invalidator.invalidate_node(node_id)?;
        Ok(())
    }

    pub fn import_batch(&self, nodes: &[FileNode]) -> Result<()> {
        // Bulk import → fast Sled import + asynchronous SQLite sync
        self.router.import_batch(nodes)
    }
}
```

---

## 6. Performance Optimization Strategy

### 6.1 Cache Strategy

**Cache priority:**

```
Cache Priority (Hot → Cold):
┌─────────────────────────────────────────┐
│ Level 5: FUSE hot path                  │  ← highest priority
│  - >100 accesses in the last hour       │
│  - Frequently used directory nodes      │
├─────────────────────────────────────────┤
│ Level 4: Frequently accessed            │
│  - >50 accesses in the last 24 hours    │
│  - Files a user opens often             │
├─────────────────────────────────────────┤
│ Level 3: Normally accessed              │
│  - >10 accesses in the last 7 days      │
│  - Ordinary file nodes                  │
├─────────────────────────────────────────┤
│ Level 2: Rarely accessed                │
│  - <10 accesses in the last 30 days     │
│  - Low-frequency files                  │
├─────────────────────────────────────────┤
│ Level 1: Cold files                     │
│  - Not accessed in the last 90 days     │
│  - Cold data                            │
├─────────────────────────────────────────┤
│ Level 0: Archive files                  │  ← eviction candidates
│  - Not accessed in the last 180 days    │
│  - Evicted from cache                   │
└─────────────────────────────────────────┘
```

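The tier table can be read as a classification function from access statistics to the 0-5 `hot_level`. The sketch below is one possible reading, not a specification. The table leaves some combinations undefined (for example, exactly 10 accesses in 30 days), so anything that matches no other tier falls back to Level 2 here. `AccessStats` and its field names are hypothetical.

```rust
/// Hypothetical access statistics for one node; the field names are
/// illustrative and do not come from the design.
pub struct AccessStats {
    pub hits_last_hour: u32,
    pub hits_last_24h: u32,
    pub hits_last_7d: u32,
    pub days_since_last_access: u32,
}

/// Maps access statistics onto the 0-5 hot levels of the tier table.
/// Hot tiers are checked first, so a node gets the highest level it earns.
pub fn hot_level(s: &AccessStats) -> u8 {
    if s.hits_last_hour > 100 {
        5
    } else if s.hits_last_24h > 50 {
        4
    } else if s.hits_last_7d > 10 {
        3
    } else if s.days_since_last_access >= 180 {
        0 // archive: eviction candidate
    } else if s.days_since_last_access >= 90 {
        1 // cold
    } else {
        2 // rarely accessed (also the fallback for gaps in the table)
    }
}
```

`is_hot_file` in the API treats `hot_level > 2` as hot, which corresponds to Levels 3-5 in this mapping.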
**Cache configuration:**

```rust
pub struct CacheConfig {
    pub max_cache_size: usize,     // Maximum number of cached nodes
    pub default_ttl: u32,          // Default TTL (seconds)
    pub hot_threshold: u32,        // Hot threshold (access count)
    pub cold_threshold: u32,       // Cold threshold (access count)
    pub cleanup_interval: u32,     // Cleanup interval (seconds)
}

impl Default for CacheConfig {
    fn default() -> Self {
        CacheConfig {
            max_cache_size: 10000,     // 10K nodes
            default_ttl: 3600,         // 1 hour
            hot_threshold: 50,         // 50 accesses
            cold_threshold: 5,         // 5 accesses
            cleanup_interval: 300,     // 5 minutes
        }
    }
}
```

### 6.2 Concurrency Optimization

**Concurrency strategy:**

```rust
pub struct ConcurrencyConfig {
    pub sqlite_pool_size: u32,       // SQLite connection pool size
    pub sled_batch_size: usize,      // Sled batch size
    pub async_sync_workers: u32,     // Number of async sync workers
    pub cache_cleanup_threads: u32,  // Number of cache cleanup threads
}

impl Default for ConcurrencyConfig {
    fn default() -> Self {
        ConcurrencyConfig {
            sqlite_pool_size: 10,        // 10 connections
            sled_batch_size: 1000,       // 1000 nodes per batch
            async_sync_workers: 4,       // 4 workers
            cache_cleanup_threads: 2,    // 2 cleanup threads
        }
    }
}
```

---

## 7. Migration Plan

### 7.1 Phase 1: Hybrid Architecture POC (2 days)

**Day 1: Base architecture**
- ✅ Create the HybridRouter skeleton
- ✅ Implement data routing logic
- ✅ Implement the basic query API

**Day 2: Cache mechanism**
- ✅ Create the metadata_cache Tree
- ✅ Implement cache lookup logic
- ✅ Implement cache invalidation

### 7.2 Phase 2: Data Synchronization (2 days)

**Day 3: Dual-write mechanism**
- ✅ Implement dual-write sync logic
- ✅ Implement consistency checks
- ✅ Implement automatic repair

**Day 4: Cache optimization**
- ✅ Implement the hot file cache
- ✅ Implement LRU eviction
- ✅ Implement TTL expiry cleanup

### 7.3 Phase 3: Performance Optimization (2 days)

**Day 5: Performance testing**
- ✅ Hybrid architecture performance tests
- ✅ Cache hit-rate tests
- ✅ Concurrency performance tests

**Day 6: Tuning**
- ✅ Cache configuration tuning
- ✅ Concurrency configuration tuning
- ✅ Performance comparison and validation

### 7.4 Phase 4: Production Deployment (trigger-based)

**Trigger conditions:**
- Concurrent users > 10
- Required import throughput > 50K/sec
- Required cache hit rate > 80%

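The three trigger conditions are simple threshold checks, so they can be evaluated mechanically from monitoring data. The sketch below returns the triggers that currently fire and leaves the decision (any versus all) to the caller, since the design does not state how the conditions combine. `Workload`, its fields, and the trigger names are hypothetical.

```rust
/// Hypothetical snapshot of the observed workload.
pub struct Workload {
    pub concurrent_users: u32,
    pub required_import_per_sec: u64,
    pub required_hit_rate: f64, // fraction in 0.0..=1.0
}

/// Returns the names of the Phase 4 triggers that currently fire.
pub fn fired_triggers(w: &Workload) -> Vec<&'static str> {
    let mut fired = Vec::new();
    if w.concurrent_users > 10 {
        fired.push("concurrent_users");
    }
    if w.required_import_per_sec > 50_000 {
        fired.push("import_throughput");
    }
    if w.required_hit_rate > 0.80 {
        fired.push("cache_hit_rate");
    }
    fired
}
```

An empty result means the current SQLite-only setup still meets the stated requirements.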
**Deployment steps:**
1. Data migration (SQLite → SQLite + Sled)
2. API switch (pure SQLite → HybridAPI)
3. Monitoring rollout (cache hit rate, latency monitoring)
4. Performance validation (comparison tests)

---

## 8. Monitoring Metrics Design

### 8.1 Cache Monitoring

```rust
pub struct CacheMetrics {
    pub cache_size: usize,         // Current cache size
    pub cache_hit_rate: f64,       // Cache hit rate
    pub cache_miss_rate: f64,      // Cache miss rate
    pub avg_cache_latency: f64,    // Average cache latency
    pub avg_sqlite_latency: f64,   // Average SQLite latency
    pub hot_files_count: usize,    // Number of hot files
    pub cold_files_count: usize,   // Number of cold files
    pub cleanup_count: u32,        // Number of cleanups
}

impl CacheMetrics {
    pub fn calculate_hit_rate(&self) -> f64 {
        // Guard on the denominator itself (not on cache_size) to avoid 0/0.
        let total = self.cache_hit_rate + self.cache_miss_rate;
        if total == 0.0 {
            return 0.0;
        }
        self.cache_hit_rate / total
    }
}
```

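`CacheMetrics` stores the hit and miss values as rates, but a rate is normally derived from raw counts. The sketch below shows one way to collect those counts without a lock, assuming lookups report a `hit` flag. `HitCounters` is a hypothetical companion type, not part of the design.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Lock-free hit/miss counters; a hypothetical companion to `CacheMetrics`.
#[derive(Default)]
pub struct HitCounters {
    hits: AtomicU64,
    misses: AtomicU64,
}

impl HitCounters {
    /// Records one lookup. Takes `&self`, so it can be shared across threads.
    pub fn record(&self, hit: bool) {
        let counter = if hit { &self.hits } else { &self.misses };
        counter.fetch_add(1, Ordering::Relaxed);
    }

    /// Hit rate in 0.0..=1.0; returns 0.0 before any lookup was recorded.
    pub fn hit_rate(&self) -> f64 {
        let hits = self.hits.load(Ordering::Relaxed);
        let total = hits + self.misses.load(Ordering::Relaxed);
        if total == 0 {
            0.0
        } else {
            hits as f64 / total as f64
        }
    }
}
```

`Relaxed` ordering is enough here because the counters are independent statistics and no other memory is synchronized through them.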
### 8.2 Performance Monitoring

```rust
pub struct PerformanceMetrics {
    pub query_latency_avg: f64,    // Average query latency
    pub query_latency_p99: f64,    // P99 query latency
    pub import_throughput: f64,    // Import throughput
    pub concurrent_ops: f64,       // Concurrent OPS
    pub sqlite_query_count: u32,   // Number of SQLite queries
    pub sled_query_count: u32,     // Number of Sled queries
    pub sync_queue_size: u32,      // Sync queue size
}

impl PerformanceMetrics {
    pub fn calculate_ratio(&self) -> f64 {
        // Share of queries served by Sled
        let total = self.sqlite_query_count as f64 + self.sled_query_count as f64;
        if total == 0.0 {
            return 0.0; // no queries yet; avoids a NaN from 0/0
        }
        self.sled_query_count as f64 / total
    }
}
```

---

## 9. Risk Assessment and Mitigation

### 9.1 Risk Assessment

| Risk | Level | Impact | Mitigation |
|--------|----------|------|----------|
| **Data inconsistency** | ⚠️⚠️⚠️ High | Data integrity compromised | Dual-write + consistency checks |
| **Cache invalidation** | ⚠️⚠️ Medium | Performance degradation | TTL mechanism + LRU eviction |
| **Sync latency** | ⚠️⚠️ Medium | Delayed data updates | Asynchronous sync + batch sync |
| **Memory usage** | ⚠️ Low | System resource consumption | Cache size limit + cleanup |

### 9.2 Mitigation Measures

**Mitigating data inconsistency:**

```rust
pub fn ensure_consistency(&self) -> Result<()> {
    // Periodic consistency check (every 5 minutes)
    let report = self.sync_manager.consistency_check()?;

    if !report.inconsistencies.is_empty() {
        log::warn!("Found {} inconsistencies", report.inconsistencies.len());

        // Automatic repair
        for inc in &report.inconsistencies {
            self.sync_manager.auto_fix_inconsistency(inc)?;
        }
    }

    Ok(())
}
```

**Mitigating cache invalidation:**

```rust
pub fn ensure_cache_valid(&self) -> Result<()> {
    // TTL expiry check (every 5 minutes)
    let cleaned = self.cache_invalidator.ttl_cleanup()?;

    if cleaned > 0 {
        log::info!("Cleaned {} expired cache entries", cleaned);
    }

    // LRU eviction check
    let cache_tree = self.sled_db.open_tree("metadata_cache")?;
    if cache_tree.len() > self.config.max_cache_size {
        self.cache_invalidator.lru_eviction()?;
    }

    Ok(())
}
```

---

## 10. Summary

### 10.1 Architecture Advantages

**✅ SQLite strengths retained:**
1. SQL query support (JOIN, WHERE)
2. Lowest query latency (<1ms)
3. Best space efficiency (12.33MB)
4. Mature debugging tools (SQLite Browser)

**✅ Sled strengths exploited:**
1. Highest import throughput (163K/sec)
2. Highest concurrent read rate (5.22M/sec)
3. Multi-writer concurrency (MVCC)
4. Pure Rust implementation (no FFI)

**✅ Hybrid advantages:**
1. Faster queries (cache acceleration)
2. Faster writes (fast Sled import)
3. Data consistency guarantee (dual-write sync)
4. Incremental migration (reversible)

### 10.2 Implementation Recommendations

**Short term: SQLite + optimization** (current)
- 4 days of optimization effort
- Zero risk
- Immediate effect

**Mid term: Hybrid architecture POC** (in 6 months)
- 6 days of POC work
- Medium risk
- Performance validation

**Long term: Production deployment** (in 12 months)
- Evaluate trigger conditions
- High risk
- Monitoring rollout

---

**Design completion date:** 2026-05-29
**POC implementation date:** 2026-06-05 (planned)