docs: comply with V1.0 docs standard — add frontmatter, info table, English content
@@ -1,4 +1,6 @@
 ---
+document_type: "reference_doc"
+service: "MOMENTRY_CORE"
 title: "File Lifecycle — Pre-Processing & Registration"
 version: "V1.0"
 date: "2026-05-15"
@@ -6,19 +8,26 @@ author: "M5"
 status: "draft"
 ---

-# File Lifecycle — Pre-Processing & Registration (All Managed Files)
+# File Lifecycle — Pre-Processing & Registration

+| Item | Value |
+|------|-------|
+| Scope | All managed file types (video, image, document, spreadsheet, presentation) |
+| Status | Draft |
+| Applies to | Watcher pre-processor + Register API |
+| Key concept | Two-phase flow: birth certificate (`.pre.json`) → civil registration (DB INSERT) |
+
 > **Applicable to all managed file types**: video, image, document (pdf, docx, pages, key, numbers), spreadsheet, presentation, and any other file registered in the system. The pre-processor registers any file type found by the watcher. ffprobe is used when applicable; files that ffprobe cannot parse receive minimal filesystem metadata as a fallback.

 ## Metaphor

 ```
-SHA256 = DNA or fingerprint (唯一不變的生物特徵)
-file created time = 出生時刻
-birthday (UUID anchor) = 出生時間戳
-.pre.json = 出生證明書
-POST /api/v1/files/register = 戶政登記
-status = registered = 完成戶籍登記
+SHA256 = DNA or fingerprint (immutable biometric identity)
+file created time = birth moment
+birthday (UUID anchor) = birth timestamp
+.pre.json = birth certificate
+POST /api/v1/files/register = civil registration
+status = registered = citizenship completed
 ```

 ## Two-Phase Flow
@@ -43,7 +52,7 @@ File watcher (`src/watcher/watcher.rs`) polls monitored directories every 60 sec
 → birthday = file creation time (RFC 3339)

 2. SHA256(full file, streaming 64KB chunks)
-→ content_hash = 256-bit hex string (檔案 DNA/指紋)
+→ content_hash = 256-bit hex string (file DNA / fingerprint)

 3. ffprobe (or minimal fs metadata fallback for non-video)
 → probe_json
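Step 2 in the hunk above hashes the full file in 64 KB chunks. The following is a minimal Python sketch of that streaming pattern, not the watcher's Rust code; the function name `content_hash` and the constant `CHUNK_SIZE` are illustrative assumptions.

```python
import hashlib
from pathlib import Path

# 64 KB, matching the chunk size named in step 2 (the constant name is an assumption).
CHUNK_SIZE = 64 * 1024


def content_hash(path: Path) -> str:
    """Return the SHA-256 digest of the whole file as a 64-character hex string."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks so a large video is never loaded fully into memory.
        while chunk := fh.read(CHUNK_SIZE):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the digest is updated incrementally, the result is identical to hashing the file in one call; only peak memory use differs.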
@@ -88,12 +97,12 @@ Stored alongside other processor outputs:
 The `birthday` is the file's creation time, obtained from `fs::metadata().created()`. This is the **true birth time** of the file, not the registration time.

 ```
-birthday = 2026-05-15T02:15:00Z ← 檔案出生時間,永不改變
+birthday = 2026-05-15T02:15:00Z ← file birth time, never changes
 ↓
 file_uuid = SHA256(mac | birthday | path | filename)
 ↓
-同一檔案:相同 path + filename → 相同 UUID,無論註冊幾次
-不同檔案:不同 content_hash → 不同 UUID(即使同名)
+Same file: same path + filename → same UUID, regardless of registration count
+Different files: different content_hash → different UUID (even if same name)
 ```

 ## Phase 2: Registration (Citizenship)
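The `file_uuid = SHA256(mac | birthday | path | filename)` derivation in the hunk above can be illustrated with a short Python sketch. The diff does not specify how the four fields are joined or encoded, so the literal `|` separator, UTF-8 encoding, and hex output are assumptions, as are the function names. The creation timestamp is passed in as a parameter because how it is read from the filesystem varies by platform.

```python
import hashlib
from datetime import datetime, timezone


def format_birthday(created_epoch_seconds: float) -> str:
    """Format a creation timestamp as an RFC 3339 UTC string."""
    created = datetime.fromtimestamp(created_epoch_seconds, tz=timezone.utc)
    return created.strftime("%Y-%m-%dT%H:%M:%SZ")


def derive_file_uuid(mac: str, birthday: str, path: str, filename: str) -> str:
    """Hash the four identity fields into a hex digest.

    Assumption: fields are joined with '|' and encoded as UTF-8.
    """
    material = "|".join([mac, birthday, path, filename])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Under these assumptions, the same four inputs always yield the same identifier, so registering a file repeatedly does not change its `file_uuid`, while a change to any one field produces a different one.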
@@ -115,11 +124,11 @@ curl -X POST http://localhost:3002/api/v1/files/register \
 └─ Not exists OR hash mismatch → compute fresh (existing logic)

 2. Dedup check: SELECT file_uuid FROM videos WHERE content_hash = $1
-├─ Found → already_exists: true (identical DNA = same person)
+├─ Found → already_exists: true (identical DNA = same file)
 └─ Not found → continue

 3. Name conflict check + auto-rename if needed
-└─ charade.mp4 → charade (1).mp4 (same name, different DNA)
+└─ charade.mp4 → charade (1).mp4 (same name, different content)

 4. INSERT INTO videos (
 file_uuid, file_path, file_name, file_type,
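Steps 2 and 3 in the hunk above (dedup by `content_hash`, then auto-rename on a name conflict) can be sketched in Python as follows. This is an illustration only: an in-memory dict and set stand in for the `videos` table, the names `resolve_name` and `plan_registration` are hypothetical, and counting past ` (1)` to ` (2)`, ` (3)` and so on is an assumption, since the diff shows only the first rename.

```python
from pathlib import PurePosixPath


def resolve_name(filename: str, taken: set[str]) -> str:
    """Return filename unchanged if it is free, else insert ' (n)' before the extension."""
    if filename not in taken:
        return filename
    parts = PurePosixPath(filename)
    n = 1
    while True:
        candidate = f"{parts.stem} ({n}){parts.suffix}"
        if candidate not in taken:
            return candidate
        n += 1


def plan_registration(content_hash: str, filename: str,
                      known_hashes: dict[str, str], taken: set[str]) -> dict:
    """Mirror steps 2 and 3: dedup by content hash, then resolve name conflicts."""
    existing_uuid = known_hashes.get(content_hash)
    if existing_uuid is not None:
        # Identical content is already registered; report it instead of inserting.
        return {"already_exists": True, "file_uuid": existing_uuid}
    return {"already_exists": False, "file_name": resolve_name(filename, taken)}
```

Checking the hash before the name means a re-upload of identical content is reported as `already_exists` without ever being renamed; only new content that collides on name gets a suffix.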
@@ -148,8 +157,8 @@ File detected by watcher
 │
 ▼
 [Pre-Processor]
-├─ SHA256 (DNA/fingerprint)
-├─ ffprobe (vital signs)
+├─ SHA256 (DNA / fingerprint)
+├─ ffprobe (metadata extraction)
 └─ UUID (birth certificate ID)
 │
 ▼