docs: file lifecycle design — pre-process (birth certificate) + registration (civil registry)
docs_v1.0/REFERENCE/FILE_LIFECYCLE_V1.0.md
@@ -0,0 +1,184 @@
---
title: "File Lifecycle — Pre-Processing & Registration"
version: "V1.0"
date: "2026-05-15"
author: "M5"
status: "draft"
---

# File Lifecycle — Pre-Processing & Registration

## Metaphor

```
SHA256                      = DNA or fingerprint (a unique, unchanging biometric trait)
file created time           = moment of birth
birthday (UUID anchor)      = birth timestamp
.pre.json                   = birth certificate
POST /api/v1/files/register = civil (household) registration
status = registered         = household registration completed
```

## Two-Phase Flow

A file enters the system in two distinct phases:

| Phase | Action | Analogy | Automatic? | Status |
|-------|--------|---------|:----------:|:------:|
| **Birth** | Pre-process: SHA256 + probe + UUID | Birth + the hospital issues the birth certificate | ✅ Watcher | `unregistered` |
| **Citizenship** | Register: INSERT into DB | Registration at the household registration office | ❌ User API | `registered` |

## Phase 1: Pre-Processing (Birth)

### Trigger

The file watcher (`src/watcher/watcher.rs`) polls the monitored directories every 60 seconds. When it detects a new file, the pre-processor runs automatically.
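
The polling step can be sketched as a set difference between consecutive directory scans. This is an illustrative Python sketch, not the code in `src/watcher/watcher.rs`; the names `list_files`, `scan_once`, `watch`, and `on_new_file` are hypothetical, and only the 60-second interval comes from the text above.

```python
import os
import time
from typing import Callable, Iterable, Set

POLL_INTERVAL_SECONDS = 60  # interval stated in the design


def list_files(directories: Iterable[str]) -> Set[str]:
    """Return the absolute paths of all regular files under the given directories."""
    found = set()
    for root_dir in directories:
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            for name in filenames:
                found.add(os.path.abspath(os.path.join(dirpath, name)))
    return found


def scan_once(directories: Iterable[str], seen: Set[str]) -> Set[str]:
    """Return files not seen in earlier scans and record them in `seen`."""
    current = list_files(directories)
    new_files = current - seen
    seen |= new_files
    return new_files


def watch(directories: Iterable[str], on_new_file: Callable[[str], None]) -> None:
    """Poll forever, handing each newly detected file to the pre-processor hook."""
    directories = list(directories)
    seen: Set[str] = set()
    while True:
        for path in sorted(scan_once(directories, seen)):
            on_new_file(path)
        time.sleep(POLL_INTERVAL_SECONDS)
```

Keeping `scan_once` separate from the loop makes the detection logic testable without waiting for a poll interval.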

### Computation Steps

```
1. fs::metadata(path).created()
   → birthday = file creation time (RFC 3339)

2. SHA256(full file, streaming 64KB chunks)
   → content_hash = 256-bit digest as a 64-character hex string (the file's DNA/fingerprint)

3. ffprobe (or minimal fs metadata fallback for non-video)
   → probe_json

4. compute_birth_uuid(mac, birthday, canonical_path, filename)
   → file_uuid = SHA256(mac | birthday | path | filename)[0:32]

5. Write {OUTPUT_DIR}/{file_uuid}.pre.json
```
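
Steps 1, 2, and 5 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the project's Rust implementation: the helper names are hypothetical, and the fallback to the modification time when no creation time is available is an assumption the design does not specify.

```python
import hashlib
import json
import os
from datetime import datetime, timezone

CHUNK_SIZE = 64 * 1024  # 64 KB, as in step 2


def sha256_file(path: str) -> str:
    """Stream the file through SHA-256 and return the 64-character hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_birthday(path: str) -> str:
    """Return the creation time as an RFC 3339 UTC string.

    Assumption: fall back to the modification time where the platform does
    not expose a creation time (st_birthtime is absent on most Linux systems).
    """
    info = os.stat(path)
    seconds = getattr(info, "st_birthtime", info.st_mtime)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def write_pre_json(output_dir: str, record: dict) -> str:
    """Write the record to {output_dir}/{file_uuid}.pre.json and return the path."""
    target = os.path.join(output_dir, f"{record['file_uuid']}.pre.json")
    with open(target, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2)
    return target
```

Streaming in fixed-size chunks keeps memory use constant regardless of file size, which matters for multi-gigabyte video files.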

### Output: `.pre.json` Schema

Stored alongside other processor outputs:

```
{OUTPUT_DIR}/
  {file_uuid}.probe.json   ← ffprobe
  {file_uuid}.face.json    ← face detection
  {file_uuid}.pre.json     ← pre-processor (NEW)
```

```json
{
  "file_name": "charade.mp4",
  "file_path": "/data/demo/charade.mp4",
  "canonical_path": "/private/data/demo/charade.mp4",
  "content_hash": "a1b2c3d4e5f6...",
  "probe_json": {
    "format": { "duration": "6879.3", "size": "2147483648" },
    "streams": [...]
  },
  "birthday": "2026-05-15T02:15:00Z",
  "file_uuid": "aeed71342a899fe4b4c57b7d41bcb692",
  "file_size": 2147483648,
  "file_type": "video",
  "pre_processed_at": "2026-05-15T02:15:05Z"
}
```
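
A reader of `.pre.json` may want to validate the record before trusting it. The sketch below, in Python, checks that the fields from the example above are present with the expected JSON types; the function name `load_pre_json` and the exact checks (32 hex characters for `file_uuid`, 64 for a SHA-256 `content_hash`) are assumptions inferred from this document, not a published schema.

```python
import json
import string

# Field names and JSON types taken from the example record above.
REQUIRED_FIELDS = {
    "file_name": str,
    "file_path": str,
    "canonical_path": str,
    "content_hash": str,
    "probe_json": dict,
    "birthday": str,
    "file_uuid": str,
    "file_size": int,
    "file_type": str,
    "pre_processed_at": str,
}


def _is_hex(value: str, length: int) -> bool:
    """True if value is exactly `length` hexadecimal characters."""
    return len(value) == length and all(ch in string.hexdigits for ch in value)


def load_pre_json(text: str) -> dict:
    """Parse a .pre.json document and raise ValueError if it is malformed."""
    record = json.loads(text)
    if not isinstance(record, dict):
        raise ValueError("top-level value must be an object")
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            raise ValueError(f"missing field: {name}")
        if not isinstance(record[name], expected):
            raise ValueError(f"wrong type for field: {name}")
    if not _is_hex(record["file_uuid"], 32):
        raise ValueError("file_uuid must be 32 hex characters")
    if not _is_hex(record["content_hash"], 64):
        raise ValueError("content_hash must be 64 hex characters")
    return record
```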

### Key Design: UUID = f(mac, birthday, path, filename)

The `birthday` is the file's creation time, obtained from `fs::metadata().created()`. This is the **true birth time** of the file, not the registration time.

```
birthday = 2026-05-15T02:15:00Z   ← the file's birth time, never changes
    ↓
file_uuid = SHA256(mac | birthday | path | filename)
    ↓
Same file:      same mac + birthday + path + filename → same UUID, no matter how many times it is registered
Different file: different birthday or path → different UUID, even with the same name
                (content_hash is not an input to the UUID)
```
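
The determinism property can be illustrated with a short Python sketch. It assumes the four inputs are joined with a literal `|` and encoded as UTF-8 before hashing, which the formula suggests but does not state; the MAC address is a made-up placeholder.

```python
import hashlib


def compute_birth_uuid(mac: str, birthday: str, canonical_path: str, filename: str) -> str:
    """First 32 hex characters of SHA-256 over the '|'-joined inputs (assumed encoding)."""
    material = "|".join([mac, birthday, canonical_path, filename])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:32]


MAC = "aa:bb:cc:dd:ee:ff"  # hypothetical placeholder
BIRTHDAY = "2026-05-15T02:15:00Z"
PATH = "/private/data/demo/charade.mp4"

original = compute_birth_uuid(MAC, BIRTHDAY, PATH, "charade.mp4")
again = compute_birth_uuid(MAC, BIRTHDAY, PATH, "charade.mp4")
renamed = compute_birth_uuid(
    MAC, BIRTHDAY, "/private/data/demo/charade (1).mp4", "charade (1).mp4"
)

print(original == again)    # True: identical inputs always give the same UUID
print(original == renamed)  # False: a changed path or filename gives a new UUID
```

Because the file content is not part of the input, two files with identical bytes at different paths receive different UUIDs; deduplication by content is left to `content_hash`.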

## Phase 2: Registration (Citizenship)

### POST /api/v1/files/register

```bash
curl -X POST http://localhost:3002/api/v1/files/register \
  -H "X-API-Key: ..." \
  -H "Content-Type: application/json" \
  -d '{"file_path":"/data/demo/charade.mp4"}'
```

### Flow

```
1. Check {OUTPUT_DIR}/{file_uuid}.pre.json
   ├─ Exists AND content_hash matches → use cached (skip SHA256 + probe)
   └─ Not exists OR hash mismatch → compute fresh (existing logic)

2. Dedup check: SELECT file_uuid FROM videos WHERE content_hash = $1
   ├─ Found → already_exists: true (identical DNA = same person)
   └─ Not found → continue

3. Name conflict check + auto-rename if needed
   └─ charade.mp4 → charade (1).mp4 (same name, different DNA)

4. INSERT INTO videos (
     file_uuid, file_path, file_name, file_type,
     duration, width, height, fps,
     probe_json, content_hash, status, registration_time
   ) VALUES (
     $1, $2, $3, $4, $5, $6, $7, $8, $9, $10,
     'registered', NOW()   ← status=registered, registration_time=NOW()
   )
```
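
Steps 2 to 4 can be sketched with an in-memory dictionary standing in for the `videos` table. This Python sketch is illustrative only: `resolve_name` and `register` are hypothetical names, the dictionary keyed by `content_hash` replaces the SQL queries, and the `name (n).ext` numbering beyond `(1)` is an assumption extrapolated from the single example above.

```python
import os
from datetime import datetime, timezone
from typing import Dict, Set


def resolve_name(file_name: str, taken: Set[str]) -> str:
    """Return file_name, or 'stem (n).ext' with the smallest free n if it is taken."""
    if file_name not in taken:
        return file_name
    stem, ext = os.path.splitext(file_name)
    counter = 1
    while f"{stem} ({counter}){ext}" in taken:
        counter += 1
    return f"{stem} ({counter}){ext}"


def register(videos: Dict[str, dict], record: dict) -> dict:
    """Register a pre-processed record; `videos` maps content_hash to stored rows."""
    # Step 2: dedup check on content_hash (identical DNA = same person).
    existing = videos.get(record["content_hash"])
    if existing is not None:
        return {"file_uuid": existing["file_uuid"], "already_exists": True}

    # Step 3: name conflict check with auto-rename.
    taken = {row["file_name"] for row in videos.values()}
    file_name = resolve_name(record["file_name"], taken)

    # Step 4: insert with status=registered and the current time.
    row = dict(
        record,
        file_name=file_name,
        status="registered",
        registration_time=datetime.now(timezone.utc).isoformat(),
    )
    videos[record["content_hash"]] = row
    return {"file_uuid": row["file_uuid"], "already_exists": False, "file_name": file_name}
```

For example, registering two different files that are both named `charade.mp4` stores the second as `charade (1).mp4`, while registering the same content twice returns `already_exists: True` without a second insert.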

## Data Separation

| Field | Source | Computed When | Mutable |
|-------|--------|---------------|:------:|
| `birthday` | `fs::metadata().created()` | Pre-process (once) | ❌ Never |
| `content_hash` (SHA256) | Full file | Pre-process (once) | ❌ Never (unless file modified) |
| `file_uuid` | SHA256(mac\|birthday\|path\|filename) | Pre-process (once) | ❌ Never |
| `registration_time` | `NOW()` at register | Register API | ✅ Per registration |
| `status` | — | Register API | `unregistered` → `registered` |

## File Lifecycle State Diagram

```
File detected by watcher
        │
        ▼
[Pre-Processor]
 ├─ SHA256 (DNA/fingerprint)
 ├─ ffprobe (vital signs)
 └─ UUID (birth certificate ID)
        │
        ▼
{file_uuid}.pre.json
status = unregistered (no DB record)
        │
        │ (user calls POST /api/v1/files/register)
        ▼
[Register Handler]
 ├─ Read .pre.json (skip recomputation)
 ├─ Dedup check (content_hash collision?)
 ├─ Name check + rename?
 └─ INSERT INTO videos
        │
        ▼
status = registered
registration_time = NOW()
```
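
The two states in the diagram can be modeled as a small state machine. This Python sketch is one possible reading, not the project's code: the diagram shows only the `unregistered` → `registered` transition, so rejecting every other transition is an assumption, and the names `FileStatus` and `transition` are hypothetical.

```python
from enum import Enum


class FileStatus(Enum):
    UNREGISTERED = "unregistered"  # .pre.json exists, no DB record
    REGISTERED = "registered"      # row inserted into videos


# Only the transition drawn in the diagram is allowed (assumption).
ALLOWED_TRANSITIONS = {
    FileStatus.UNREGISTERED: {FileStatus.REGISTERED},
    FileStatus.REGISTERED: set(),
}


def transition(current: FileStatus, target: FileStatus) -> FileStatus:
    """Return the new status, or raise ValueError for a transition not in the diagram."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```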

## Implementation Checklist

| # | Task | File |
|---|------|------|
| 1 | Modify watcher pre-processor: SHA256 + probe + write `.pre.json` | `src/watcher/watcher.rs` |
| 2 | Register: read `.pre.json`, skip SHA256/probe if cached | `src/api/server.rs` → `register_single_file` |
| 3 | UUID: use `birthday` from `.pre.json` (or `fs::metadata().created()` fallback) | `src/api/server.rs` |
| 4 | INSERT status: `registered`, registration_time: `NOW()` | `src/api/server.rs` |
| 5 | Pre-process all file types (not just video) | `src/watcher/watcher.rs` |

## Version History

| Version | Date | Changes |
|---------|------|---------|
| V1.0 | 2026-05-15 | Initial design — birth certificate (pre-process) + civil registration two-phase flow |