From d81aec736010750c1ec077be7dd37702efd4eec4 Mon Sep 17 00:00:00 2001
From: Accusys
Date: Fri, 15 May 2026 12:05:13 +0800
Subject: [PATCH] =?UTF-8?q?docs:=20file=20lifecycle=20design=20=E2=80=94?=
 =?UTF-8?q?=20pre-process=20(birth=20certificate)=20+=20registration=20(ci?=
 =?UTF-8?q?vil=20registry)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs_v1.0/REFERENCE/FILE_LIFECYCLE_V1.0.md | 184 +++++++++++++++++++++
 1 file changed, 184 insertions(+)
 create mode 100644 docs_v1.0/REFERENCE/FILE_LIFECYCLE_V1.0.md

diff --git a/docs_v1.0/REFERENCE/FILE_LIFECYCLE_V1.0.md b/docs_v1.0/REFERENCE/FILE_LIFECYCLE_V1.0.md
new file mode 100644
index 0000000..d4610b4
--- /dev/null
+++ b/docs_v1.0/REFERENCE/FILE_LIFECYCLE_V1.0.md
@@ -0,0 +1,184 @@
+---
+title: "File Lifecycle: Pre-Processing & Registration"
+version: "V1.0"
+date: "2026-05-15"
+author: "M5"
+status: "draft"
+---
+
+# File Lifecycle: Pre-Processing & Registration
+
+## Metaphor
+
+```
+SHA256                       = DNA or fingerprint (a unique, unchanging biometric trait)
+file created time            = moment of birth
+birthday (UUID anchor)       = birth timestamp
+.pre.json                    = birth certificate
+POST /api/v1/files/register  = civil registration
+status = registered          = civil registration completed
+```
+
+## Two-Phase Flow
+
+A file enters the system in two distinct phases:
+
+| Phase | Action | Analogy | Automatic? | Status |
+|-------|--------|---------|:----------:|:------:|
+| **Birth** | Pre-process: SHA256 + probe + UUID | Birth + hospital issues the birth certificate | ✅ Watcher | `unregistered` |
+| **Citizenship** | Register: INSERT into DB | Registration at the household registration office | ❌ User API | `registered` |
+
+## Phase 1: Pre-Processing (Birth)
+
+### Trigger
+
+The file watcher (`src/watcher/watcher.rs`) polls the monitored directories every 60 seconds. When it detects a new file, the pre-processor runs automatically.
+
+### Computation Steps
+
+```
+1. fs::metadata(path).created()
+   → birthday = file creation time (RFC 3339)
+
+2. SHA256(full file, streaming 64KB chunks)
+   → content_hash = 256-bit digest as a 64-char hex string (the file's DNA/fingerprint)
+
+3. ffprobe (or minimal fs metadata fallback for non-video)
+   → probe_json
+
+4. compute_birth_uuid(mac, birthday, canonical_path, filename)
+   → file_uuid = SHA256(mac | birthday | path | filename)[0:32]  (first 32 hex chars)
+
+5. Write {OUTPUT_DIR}/{file_uuid}.pre.json
+```
+
+### Output: `.pre.json` Schema
+
+Stored alongside other processor outputs:
+
+```
+{OUTPUT_DIR}/
+  {file_uuid}.probe.json   ← ffprobe
+  {file_uuid}.face.json    ← face detection
+  {file_uuid}.pre.json     ← pre-processor (NEW)
+```
+
+```json
+{
+  "file_name": "charade.mp4",
+  "file_path": "/data/demo/charade.mp4",
+  "canonical_path": "/private/data/demo/charade.mp4",
+  "content_hash": "a1b2c3d4e5f6...",
+  "probe_json": {
+    "format": { "duration": "6879.3", "size": "2147483648" },
+    "streams": [...]
+  },
+  "birthday": "2026-05-15T02:15:00Z",
+  "file_uuid": "aeed71342a899fe4b4c57b7d41bcb692",
+  "file_size": 2147483648,
+  "file_type": "video",
+  "pre_processed_at": "2026-05-15T02:15:05Z"
+}
+```
+
+### Key Design: UUID = f(mac, birthday, path, filename)
+
+The `birthday` is the file's creation time, obtained from `fs::metadata().created()`. This is the **true birth time** of the file, not the registration time. Note that `created()` returns an error on platforms or filesystems that do not record a creation time, so that case needs a fallback.
+
+```
+birthday = 2026-05-15T02:15:00Z   ← the file's birth time; never changes
+    ↓
+file_uuid = SHA256(mac | birthday | path | filename)
+    ↓
+Same file:       same mac + birthday + path + filename → same UUID, no matter how many times it is registered
+Different files: different birthday or path → different UUID (even with the same name); content_hash is not an input
+```
+
+## Phase 2: Registration (Citizenship)
+
+### POST /api/v1/files/register
+
+```bash
+curl -X POST http://localhost:3002/api/v1/files/register \
+  -H "X-API-Key: ..." \
+  -H "Content-Type: application/json" \
+  -d '{"file_path":"/data/demo/charade.mp4"}'
+```
+
+### Flow
+
+```
+1. Check {OUTPUT_DIR}/{file_uuid}.pre.json
+   ├─ Exists AND content_hash matches → use cached (skip SHA256 + probe)
+   └─ Missing OR hash mismatch → compute fresh (existing logic)
+
+2. Dedup check: SELECT file_uuid FROM videos WHERE content_hash = $1
+   ├─ Found → already_exists: true (identical DNA = same person)
+   └─ Not found → continue
+
+3. Name conflict check + auto-rename if needed
+   └─ charade.mp4 → charade (1).mp4 (same name, different DNA)
+
+4. INSERT INTO videos (
+     file_uuid, file_path, file_name, file_type,
+     duration, width, height, fps,
+     probe_json, content_hash, status, registration_time
+   ) VALUES (
+     $1, $2, $3, $4, $5, $6, $7, $8, $9, $10,
+     'registered', NOW()   ← status=registered, registration_time=NOW()
+   )
+```
+
+## Data Separation
+
+| Field | Source | Computed When | Mutable |
+|-------|--------|---------------|:------:|
+| `birthday` | `fs::metadata().created()` | Pre-process (once) | ❌ Never |
+| `content_hash` (SHA256) | Full file | Pre-process (once) | ❌ Never (unless file modified) |
+| `file_uuid` | SHA256(mac\|birthday\|path\|filename) | Pre-process (once) | ❌ Never (while path and filename are unchanged) |
+| `registration_time` | `NOW()` at register | Register API | ✅ Per registration |
+| `status` | n/a | Register API | `unregistered` → `registered` |
+
+## File Lifecycle State Diagram
+
+```
+File detected by watcher
+    │
+    ▼
+[Pre-Processor]
+    ├─ SHA256 (DNA/fingerprint)
+    ├─ ffprobe (vital signs)
+    └─ UUID (birth certificate ID)
+    │
+    ▼
+{file_uuid}.pre.json
+  status = unregistered (no DB record)
+    │
+    │ (user calls POST /api/v1/files/register)
+    ▼
+[Register Handler]
+    ├─ Read .pre.json (skip recomputation)
+    ├─ Dedup check (content_hash collision?)
+    ├─ Name check + rename?
+    └─ INSERT INTO videos
+    │
+    ▼
+status = registered
+registration_time = NOW()
+```
+
+## Implementation Checklist
+
+| # | Task | File |
+|---|------|------|
+| 1 | Modify watcher pre-processor: SHA256 + probe + write `.pre.json` | `src/watcher/watcher.rs` |
+| 2 | Register: read `.pre.json`, skip SHA256/probe if cached | `src/api/server.rs` → `register_single_file` |
+| 3 | UUID: use `birthday` from `.pre.json` (or `fs::metadata().created()` fallback) | `src/api/server.rs` |
+| 4 | INSERT status: `registered`, registration_time: `NOW()` | `src/api/server.rs` |
+| 5 | Pre-process all file types (not just video) | `src/watcher/watcher.rs` |
+
+## Version History
+
+| Version | Date | Changes |
+|---------|------|---------|
+| V1.0 | 2026-05-15 | Initial design: birth certificate (pre-process) + civil registration two-phase flow |
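
Review note on step 3 of the registration flow (name conflict check + auto-rename): the rename rule could look roughly like the Rust sketch below. It is an illustration only and not part of the patch. The function name `resolve_name_conflict` and the in-memory `taken` set are hypothetical; in the real handler the set would presumably be replaced by a lookup against the `videos` table.

```rust
use std::collections::HashSet;
use std::path::Path;

/// Hypothetical helper (not from the patch): returns `file_name` unchanged if
/// it is free, otherwise the first free "stem (n).ext" variant.
/// `taken` is an assumed stand-in for the names already stored in `videos`.
fn resolve_name_conflict(file_name: &str, taken: &HashSet<String>) -> String {
    if !taken.contains(file_name) {
        return file_name.to_string();
    }

    // Split "charade.mp4" into stem "charade" and extension "mp4".
    let path = Path::new(file_name);
    let stem = path
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or(file_name);
    let ext = path.extension().and_then(|e| e.to_str());

    // Try "stem (1).ext", "stem (2).ext", ... until a free name is found.
    let mut n: u32 = 1;
    loop {
        let candidate = match ext {
            Some(e) => format!("{} ({}).{}", stem, n, e),
            None => format!("{} ({})", stem, n),
        };
        if !taken.contains(&candidate) {
            return candidate;
        }
        n += 1;
    }
}
```

With `charade.mp4` already taken, the helper returns `charade (1).mp4`, which matches the example in the flow. Since `filename` is also an input to `file_uuid`, the design should state whether a rename at this step changes the UUID or only the stored `file_name`.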