docs: Add File Scan System documentation
Added complete documentation for scan/hash commands:
- CLI commands usage
- Performance test results (11857 files in 0.89s import)
- Design decisions (UUID strategy, path storage)
- Async hash system architecture
- Database structure examples
- Usage workflows

Performance highlights:
- Fast import: 14243 nodes/sec
- Async hash: 28 files/sec (4 threads)
- Total: 12658 nodes (warren user)

Version: 1.7 (File Scan System)
128
AGENTS.md
@@ -1052,3 +1052,131 @@ curl http://localhost:11438/api/v2/config/validate

**Last updated:** 2026-05-16 20:35
**Version:** 1.6 (UI Settings system release)

---

## File Scan System (added 2026-05-17)

### Overview

**Asynchronous import system:**
- Fast import (skip_hash=true): the file tree is displayed immediately
- Background hashing: SHA256 is computed asynchronously without affecting frontend use

### CLI commands

**scan command:**
```bash
# Fast import (skip_hash=true by default)
cargo run -- scan --user warren --dir /Users/accusys/momentry/var/sftpgo/data/warren

# Import and compute hashes (skip_hash=false)
cargo run -- scan --user warren --dir <path> --skip-hash false --threads 4

# Options:
#   --user <USER>        User ID
#   --dir <DIR>          Directory to scan
#   --batch <BATCH>      Batch insert size (default: 100)
#   --skip-hash <BOOL>   Skip hash computation (default: true)
#   --threads <THREADS>  Number of hashing threads (default: 4)
```

**hash command:**
```bash
# Compute hashes in the background
cargo run -- hash --user warren --threads 4

# Options:
#   --user <USER>        User ID
#   --threads <THREADS>  Number of parallel threads (default: 4)
```

### Performance test results (2026-05-17)

**Test configuration:**
- Dataset: 11857 files + 801 folders = 12658 nodes
- Directory depth: multiple levels of subdirectories (Accusys/Accusys_FAE/VolPack_ME5012...)
- CPU threads: 4 threads (M4 Mac mini)

**Phase 1: fast import (skip_hash=true)**
- Directory scan: 0.10s
- ID generation: 0.57s (the main bottleneck)
- Database insert: 0.21s
- Total time: 0.89s
- Throughput: 14243 nodes/sec

**Phase 2: SHA256 computation (4 threads)**
- Files: 11857
- Total time: 417.58s
- Throughput: 28 files/sec

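**Bottleneck analysis:**
1. ID generation (64% of import time): time spent computing the SHA256-based UUIDs
2. Database insert (24%): batch-insert optimization
3. Directory scan (11%): fast, not a bottleneck
4. Hash computation: I/O-bound (multithreading gave no speedup)
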
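The percentages and throughput figures above can be rechecked from the reported timings. The following is a small arithmetic sketch in Python, not part of the project; the variable names are illustrative, and the numbers are copied from the test results in this section.

```python
# Figures copied from the performance test results in this section.
nodes = 11857 + 801  # files + folders
phases = {"directory scan": 0.10, "ID generation": 0.57, "database insert": 0.21}
total_import_s = 0.89
hash_files, hash_total_s = 11857, 417.58


def share(seconds: float, total: float) -> int:
    """Percentage of the total, rounded to the nearest whole number."""
    return round(100 * seconds / total)


shares = {name: share(s, total_import_s) for name, s in phases.items()}
import_rate = nodes / total_import_s   # nodes per second
hash_rate = hash_files / hash_total_s  # files per second

print(shares)
print(f"import: {import_rate:.0f} nodes/sec, hash: {hash_rate:.1f} files/sec")
```

With the rounded 0.89s total, the import rate comes out near 14222 nodes/sec, slightly below the reported 14243; the reported figure was presumably computed from the unrounded time.
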
### Design decisions

**UUID generation strategy:**
- Algorithm: SHA256(path|filename|mac|mtime).chars().take(32)
- Property: deterministic UUID (same file = same UUID)
- Advantage: no external API required; supports incremental import

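To illustrate the strategy, here is a minimal Python sketch of a deterministic ID built the same way. It is not the code in `src/scan.rs`: the function name `node_id`, the use of `|` as a literal separator, UTF-8 encoding, an integer `mtime`, and a MAC-address string for `mac` are all assumptions made for the example.

```python
import hashlib


def node_id(path: str, filename: str, mac: str, mtime: int) -> str:
    """First 32 hex characters of SHA256 over the '|'-joined fields (assumed layout)."""
    key = "|".join([path, filename, mac, str(mtime)])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:32]


# Hypothetical inputs, for illustration only.
a = node_id("/data/warren/docs", "report.docx", "00:11:22:33:44:55", 1700000000)
b = node_id("/data/warren/docs", "report.docx", "00:11:22:33:44:55", 1700000000)
c = node_id("/data/warren/docs", "report.docx", "00:11:22:33:44:55", 1700000001)

print(a == b)  # identical inputs give the identical ID
print(a == c)  # a different mtime gives a different ID
print(len(a))
```

Because `mtime` is part of the input, a file that is modified gets a new ID under this scheme, so "same file" here means the same path, name, machine, and modification time.
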
**Path storage:**
- Location: aliases.json (temporary solution)
- Format: `{"path": "/full/path/to/file"}`
- Reason: the file_nodes table has no path column
- Improvement: add and populate a file_locations table in the future

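As a sketch of how a consumer might read the stored path back, the Python helper below parses a value in the format shown above. The helper name `path_from_aliases` is hypothetical, and returning `None` for malformed or missing data is an assumption, not documented behavior.

```python
import json
from typing import Optional


def path_from_aliases(aliases_json: str) -> Optional[str]:
    """Return the stored path, or None if the JSON is malformed or has no "path" key."""
    try:
        data = json.loads(aliases_json)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    return data.get("path")


print(path_from_aliases('{"path": "/full/path/to/file"}'))
```
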
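**Asynchronous hash design:**
- Import first: users can browse the file tree immediately
- Background hashing: does not affect frontend use
- Multithreaded: parallel computation (limited by the I/O bottleneck)
- Incremental: only files with a missing hash are computed
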
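The incremental, multithreaded pass can be sketched as follows. This is a Python illustration under stated assumptions, not the Rust implementation: the `file_nodes` column names are taken from the example node in this document, the helper names `sha256_file` and `hash_missing` are hypothetical, and "missing" is assumed to mean a NULL or empty `sha256` value. The sketch stores the full 64-character digest, whereas the example node shows a 32-character value, so the real code may truncate.

```python
import hashlib
import json
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_missing(conn: sqlite3.Connection, threads: int = 4) -> int:
    """Hash only file nodes whose sha256 is still empty; return how many were updated."""
    rows = conn.execute(
        "SELECT node_id, aliases_json FROM file_nodes "
        "WHERE node_type = 'file' AND (sha256 IS NULL OR sha256 = '')"
    ).fetchall()
    jobs = [(node_id, json.loads(aliases)["path"]) for node_id, aliases in rows]
    # Worker threads only read files; all database writes stay on the calling thread.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        digests = list(pool.map(lambda job: sha256_file(job[1]), jobs))
    conn.executemany(
        "UPDATE file_nodes SET sha256 = ? WHERE node_id = ?",
        [(digest, node_id) for digest, (node_id, _) in zip(digests, jobs)],
    )
    conn.commit()
    return len(jobs)


# Demo on an in-memory database with one temporary file.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "wb") as fh:
        fh.write(b"hello")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_nodes (node_id TEXT PRIMARY KEY, label TEXT, "
        "aliases_json TEXT, sha256 TEXT, node_type TEXT, parent_id TEXT)"
    )
    conn.execute(
        "INSERT INTO file_nodes VALUES (?, ?, ?, NULL, 'file', NULL)",
        ("n1", "sample.txt", json.dumps({"path": sample})),
    )
    first = hash_missing(conn)   # hashes the one pending file
    second = hash_missing(conn)  # nothing left to hash
    stored = conn.execute(
        "SELECT sha256 FROM file_nodes WHERE node_id = 'n1'"
    ).fetchone()[0]

print(first, second)
```

Running the pass twice shows the incremental behavior: the second call finds no rows with a missing hash and does no file I/O.
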
### Database structure (warren.sqlite)

**Node statistics:**
- Folders: 801
- Files: 11857
- Total: 12658

**Example node:**
```
node_id: 8b1ede3cd6970f02fa85b8e34b682caf
label: Test_Plan_ME5.docx
aliases_json: {"path":"/Users/accusys/.../Test_Plan_ME5.docx"}
sha256: 355a063b697a812742fae2a021cdda5c
node_type: file
parent_id: (folder UUID)
```

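Node statistics like those above could be obtained with a grouped count. The Python sketch below runs against an in-memory SQLite table whose columns mirror the example node; the column types, the `folder` value for `node_type`, and the sample rows are assumptions for illustration, not the project's actual schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: column names follow the example node, types are guesses.
conn.execute(
    "CREATE TABLE file_nodes (node_id TEXT PRIMARY KEY, label TEXT, "
    "aliases_json TEXT, sha256 TEXT, node_type TEXT, parent_id TEXT)"
)
rows = [
    ("f0", "docs", "{}", None, "folder", None),
    ("n1", "a.txt", '{"path": "/tmp/docs/a.txt"}', None, "file", "f0"),
    ("n2", "b.txt", '{"path": "/tmp/docs/b.txt"}', None, "file", "f0"),
]
conn.executemany("INSERT INTO file_nodes VALUES (?, ?, ?, ?, ?, ?)", rows)

# Count nodes per type, then count files that still lack a hash.
counts = dict(
    conn.execute("SELECT node_type, COUNT(*) FROM file_nodes GROUP BY node_type")
)
pending = conn.execute(
    "SELECT COUNT(*) FROM file_nodes WHERE node_type = 'file' AND sha256 IS NULL"
).fetchone()[0]

print(counts, pending)
```
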
### Usage workflows

**Full import workflow:**
```
1. cargo run -- scan --user warren --dir <path>  # Fast import (0.89s)
2. cargo run -- hash --user warren --threads 4   # Background hashing (417s)
3. View the file tree: http://localhost:11438/ → File Tree → Login
```

**Incremental import workflow:**
```
1. cargo run -- scan --user warren --dir <path>  # Import new files
2. cargo run -- hash --user warren               # Compute new hashes (only the missing ones)
```

### Related files

| File | Purpose |
|------|------|
| src/scan.rs | Scan import + hash computation (499 lines) |
| src/main.rs | CLI command definitions (scan/hash) |
| src/filetree/mod.rs | FileTree::init_user_db |
| data/users/warren.sqlite | Database for user warren (12658 nodes) |

---

**Last updated:** 2026-05-17 02:15
**Version:** 1.7 (File Scan System release)
