diff --git a/docs_v1.0/REFERENCE/FILE_LIFECYCLE_V1.0.md b/docs_v1.0/REFERENCE/FILE_LIFECYCLE_V1.0.md
index 437d0e2..e274295 100644
--- a/docs_v1.0/REFERENCE/FILE_LIFECYCLE_V1.0.md
+++ b/docs_v1.0/REFERENCE/FILE_LIFECYCLE_V1.0.md
@@ -1,4 +1,6 @@
 ---
+document_type: "reference_doc"
+service: "MOMENTRY_CORE"
 title: "File Lifecycle — Pre-Processing & Registration"
 version: "V1.0"
 date: "2026-05-15"
@@ -6,19 +8,26 @@ author: "M5"
 status: "draft"
 ---

-# File Lifecycle — Pre-Processing & Registration (All Managed Files)
+# File Lifecycle — Pre-Processing & Registration
+
+| Item | Value |
+|------|-------|
+| Scope | All managed file types (video, image, document, spreadsheet, presentation) |
+| Status | Draft |
+| Applies to | Watcher pre-processor + Register API |
+| Key concept | Two-phase flow: birth certificate (`.pre.json`) → civil registration (DB INSERT) |

 > **Applicable to all managed file types**: video, image, document (pdf, docx, pages, key, numbers), spreadsheet, presentation, and any other file registered in the system. The pre-processor registers any file type found by the watcher. ffprobe is used when applicable; files that ffprobe cannot parse receive minimal filesystem metadata as a fallback.

 ## Metaphor

 ```
-SHA256 = DNA or fingerprint (唯一不變的生物特徵)
-file created time = 出生時刻
-birthday (UUID anchor) = 出生時間戳
-.pre.json = 出生證明書
-POST /api/v1/files/register = 戶政登記
-status = registered = 完成戶籍登記
+SHA256 = DNA or fingerprint (immutable biometric identity)
+file created time = birth moment
+birthday (UUID anchor) = birth timestamp
+.pre.json = birth certificate
+POST /api/v1/files/register = civil registration
+status = registered = civil registration completed
 ```

 ## Two-Phase Flow
@@ -43,7 +52,7 @@ File watcher (`src/watcher/watcher.rs`) polls monitored directories every 60 sec
    → birthday = file creation time (RFC 3339)

 2. SHA256(full file, streaming 64KB chunks)
-   → content_hash = 512-bit hex string (檔案 DNA/指紋)
+   → content_hash = 256-bit digest as a 64-char hex string (file DNA / fingerprint)

 3. ffprobe (or minimal fs metadata fallback for non-video)
    → probe_json
@@ -88,12 +97,12 @@ Stored alongside other processor outputs:
 The `birthday` is `file created time` — obtained from `fs::metadata().created()`. This is the **true birth time** of the file, not the registration time.

 ```
-birthday = 2026-05-15T02:15:00Z ← 檔案出生時間,永不改變
+birthday = 2026-05-15T02:15:00Z ← file birth time, never changes
     ↓
 file_uuid = SHA256(mac | birthday | path | filename)
     ↓
-同一檔案:相同 path + filename → 相同 UUID,無論註冊幾次
-不同檔案:不同 content_hash → 不同 UUID(即使同名)
+Same file: same mac + birthday + path + filename → same UUID, regardless of registration count
+Different files: different mac, birthday, path, or filename → different UUID (content_hash is not a UUID input)
 ```

 ## Phase 2: Registration (Citizenship)
@@ -115,11 +124,11 @@ curl -X POST http://localhost:3002/api/v1/files/register \
    └─ Not exists OR hash mismatch → compute fresh (existing logic)

 2. Dedup check: SELECT file_uuid FROM videos WHERE content_hash = $1
-   ├─ Found → already_exists: true (identical DNA = same person)
+   ├─ Found → already_exists: true (identical DNA = same file)
    └─ Not found → continue

 3. Name conflict check + auto-rename if needed
-   └─ charade.mp4 → charade (1).mp4 (same name, different DNA)
+   └─ charade.mp4 → charade (1).mp4 (same name, different content)

 4. INSERT INTO videos (
      file_uuid, file_path, file_name, file_type,
@@ -148,8 +157,8 @@ File detected by watcher
     │
     ▼
 [Pre-Processor]
-  ├─ SHA256 (DNA/fingerprint)
-  ├─ ffprobe (vital signs)
+  ├─ SHA256 (DNA / fingerprint)
+  ├─ ffprobe (metadata extraction)
   └─ UUID (birth certificate ID)
     │
     ▼
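Phase 1, step 2 of the document (a whole-file SHA-256 read in 64 KB chunks) can be sketched as follows. The sketch uses Python for illustration even though the watcher itself lives in `src/watcher/watcher.rs`. The function name `content_hash` and the lowercase-hex output are assumptions. The point is that streaming keeps memory bounded for large videos while producing the same 256-bit digest (64 hex characters) as hashing the file in one call.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 64 * 1024  # 64 KiB, matching "streaming 64KB chunks"


def content_hash(path: Path) -> str:
    """Return the SHA-256 of the whole file as a 64-char lowercase hex string."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until EOF (read() returns b""), so memory use
        # stays at one chunk regardless of file size.
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()
```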
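The `birthday` anchor (file creation time, RFC 3339) can be approximated in Python. This is a sketch under stated assumptions. `birthday` and `to_rfc3339` are hypothetical names. Dropping sub-second precision is an assumption chosen to match the example `2026-05-15T02:15:00Z`. Python's `os.stat` exposes `st_birthtime` only on some platforms (macOS/BSD, and Windows from Python 3.12), so this sketch is narrower than Rust's `fs::metadata().created()`. Where creation time is unavailable it raises an error, in the same way `created()` returns `Err` instead of substituting another timestamp.

```python
import os
from datetime import datetime, timezone


def to_rfc3339(ts: float) -> str:
    """Format a POSIX timestamp as an RFC 3339 UTC string, e.g. 2026-05-15T02:15:00Z.

    Assumption: sub-second precision is dropped to match the documented example.
    """
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def birthday(path: str) -> str:
    """Return the file's creation time (the UUID anchor) as RFC 3339."""
    st = os.stat(path)
    # st_birthtime only exists where the OS/Python build exposes creation time.
    created = getattr(st, "st_birthtime", None)
    if created is None:
        # No silent fallback to mtime: a wrong birthday would change the UUID.
        raise OSError(f"creation time not available for {path!r} on this platform")
    return to_rfc3339(created)
```

Refusing to fall back to modification time is deliberate. The birthday feeds `file_uuid`, so a value that changes on every edit would break the "never changes" guarantee.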
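The document gives `file_uuid = SHA256(mac | birthday | path | filename)` but does not say how the fields are encoded or how a 256-bit digest is reduced to a 128-bit UUID. The sketch below fills both gaps with labeled assumptions. It joins the fields with a literal `|` in UTF-8 and keeps the first 16 digest bytes. It then stamps the UUIDv8 version and variant bits (the custom-layout version in RFC 9562). The function name and the MAC string format are hypothetical.

```python
import hashlib
import uuid


def file_uuid(mac: str, birthday: str, path: str, filename: str) -> uuid.UUID:
    """Derive a deterministic UUID from mac | birthday | path | filename.

    Assumptions (not confirmed by the source): "|" separator, UTF-8 encoding,
    first 16 bytes of the SHA-256 digest, UUIDv8 version/variant bits.
    """
    material = "|".join([mac, birthday, path, filename]).encode("utf-8")
    digest = hashlib.sha256(material).digest()  # 32 bytes
    b = bytearray(digest[:16])                  # truncate to 128 bits
    b[6] = (b[6] & 0x0F) | 0x80                 # version nibble = 8
    b[8] = (b[8] & 0x3F) | 0x80                 # variant bits = 10xx (RFC 4122/9562)
    return uuid.UUID(bytes=bytes(b))
```

Because `content_hash` is not an input, two different files with identical mac, birthday, path, and filename would collide here. Content-level dedup is instead handled by the Phase 2 `content_hash` check. A plain `|` join is also ambiguous if a path may contain `|`; a length-prefixed encoding would avoid that.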
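Registration steps 2 to 4 (dedup on `content_hash`, auto-rename on a name clash, then insert) can be modeled with an in-memory registry. This is a hypothetical sketch. `Registry`, `unique_name`, and the returned dict shape are illustrative, and the real flow runs SQL against the `videos` table. Treating file names as one global namespace rather than per directory is also an assumption. The `stem (n).ext` pattern follows the documented `charade.mp4 → charade (1).mp4` example.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


def unique_name(file_name: str, taken: set[str]) -> str:
    """Return file_name, or 'stem (n).ext' with the smallest free n >= 1."""
    if file_name not in taken:
        return file_name
    p = PurePosixPath(file_name)
    n = 1
    while f"{p.stem} ({n}){p.suffix}" in taken:
        n += 1
    return f"{p.stem} ({n}){p.suffix}"


@dataclass
class Registry:
    """Hypothetical in-memory stand-in for the `videos` table."""
    by_hash: dict[str, str] = field(default_factory=dict)  # content_hash -> file_uuid
    names: set[str] = field(default_factory=set)           # file_name values in use

    def register(self, file_uuid: str, file_name: str, content_hash: str) -> dict:
        # Step 2: dedup check on content_hash (identical DNA = same file).
        existing = self.by_hash.get(content_hash)
        if existing is not None:
            return {"file_uuid": existing, "already_exists": True}
        # Step 3: same name but different content -> auto-rename.
        name = unique_name(file_name, self.names)
        # Step 4: record the row (stands in for INSERT INTO videos).
        self.by_hash[content_hash] = file_uuid
        self.names.add(name)
        return {"file_uuid": file_uuid, "file_name": name, "already_exists": False}
```

The dedup check runs before the rename, so re-registering an identical file returns the existing record instead of creating a `(1)` copy.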