From 28e927f7bbaa2d009d312f93821e5db1f2b664f0 Mon Sep 17 00:00:00 2001
From: M5
Date: Thu, 7 May 2026 23:42:19 +0800
Subject: [PATCH] Initial commit: docs_v1.0 structure
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- API_V1.0.0: official API documentation (spec, release, deploy, test)
- M4_workspace: M4 work records (reviews, issues, proposals)
- M5_workspace: M5 work records (implementation, evaluation, sync)
- AGENTS.md: project rules

M5/M4 collaboration: workspace documents are synced via git push/pull
---
 .gitignore | 17 + docs_v1.0/API_V1.0.0/API_DOCUMENTATION.md | 1211 +++++ .../API_V1.0.0/API_DOCUMENTATION_v1.0.0.md | 1211 +++++ .../DEPLOY/EMBEDDING_DEPLOYMENT_V1.0.0.md | 83 + .../DEPLOY/GEM4_LLM_DEPLOY_PLAN_V1.0.0.md | 316 ++ .../INTERNAL/AGENTS/5W1H_AGENT_V1.0.0.md | 91 + .../INTERNAL/AGENTS/IDENTITY_AGENT_V1.0.0.md | 84 + .../INTERNAL/API_DICTIONARY_V1.0.0.md | 173 + .../API_REFERENCE_v1.0.0.20260501md.md | 310 ++ .../INTERNAL/API_USAGE_DEMO_V1.0.0.md | 376 ++ .../CHILD_DETECTION_AGE_BENCHMARK_V1.0.0.md | 148 + .../INTERNAL/CHUNK_DEFINITION_V1.0.0.md | 298 ++ .../INTERNAL/CLASS_SYSTEM_DESIGN_V1.0.0.md | 192 + .../DATA_SCHEMA_FILE_IDENTITY_V1.0.0.md | 328 ++ .../INTERNAL/DEV_API_REFERENCE_v1.0.0.md | 210 + .../DUAL_EMBEDDING_PIPELINE_V1.0.0.md | 1148 +++++ .../INTERNAL/MOMENTRY_CORE_API_V1.0.0.md | 241 + .../INTERNAL/PROCESSORS/ASRX_V1.0.0.md | 102 + .../INTERNAL/PROCESSORS/ASR_V1.0.0.md | 243 + .../INTERNAL/PROCESSORS/CAPTION_V1.0.0.md | 80 + .../INTERNAL/PROCESSORS/CUT_V1.0.0.md | 179 + .../PROCESSORS/FACE_EMBEDDING_FLOW_V1.0.0.md | 159 + .../INTERNAL/PROCESSORS/FACE_V1.0.0.md | 373 ++ .../INTERNAL/PROCESSORS/OCR_V1.0.0.md | 125 + .../INTERNAL/PROCESSORS/POSE_V1.0.0.md | 133 + .../INTERNAL/PROCESSORS/SCENE_V1.0.0.md | 95 + .../INTERNAL/PROCESSORS/STORY_V1.0.0.md | 80 + .../PROCESSORS/VISUAL_CHUNK_V1.0.0.md | 74 + .../PROCESSORS/VOICE_EMBEDDING_FLOW_V1.0.0.md | 139 + .../INTERNAL/PROCESSORS/YOLO_V1.0.0.md | 178 + .../INTERNAL/PROCESSOR_SELECTION_V1.0.0.md | 201 + .../RCA_TRACE39_TRACE45_COLLISION_V1.0.0.md | 191 + 
.../TRACE_QUALITY_AGENT_REPORT_V1.0.0.md | 84 + .../INTERNAL/UUID_ENCODING_RULES_V1.0.0.md | 322 ++ .../API_V1.0.0/INTERNAL/VECTOR_SPEC_V1.0.0.md | 144 + .../PIPELINE_PROGRESS_REPORT_V2.0.0.md | 73 + .../RELEASE/PRODUCTION_VERIFICATION_V1.0.0.md | 143 + .../RELEASE/RELEASE_API_REFERENCE_v1.0.0.md | 349 ++ .../RELEASE/RELEASE_TEST_REPORT_v1.0.0.md | 171 + .../RELEASE/RELEASE_VERIFICATION_V1.0.0.md | 316 ++ .../01_health_Health_check.json | 1 + .../01_health_Health_detailed.json | 1 + .../02_auth_Login.json | 1 + .../02_auth_Logout.json | 1 + .../03_files_File_chunks.json | 0 .../03_files_File_detail.json | 1 + .../03_files_File_identities.json | 1 + .../03_files_File_probe.json | 1 + .../03_files_List_files.json | 1 + .../03_files_Register_file.json | 1 + .../03_files_Scan_files.json | 1 + .../03_files_Trigger_processing.json | 1 + .../03_files_Unregister_file.json | 1 + .../04_identity_Bind_face.json | 1 + .../04_identity_Create_identity.json | 10 + .../04_identity_Delete_identity.json | 0 .../04_identity_Identity_chunks.json | 1 + .../04_identity_Identity_detail.json | 1 + .../04_identity_Identity_files.json | 1 + .../04_identity_List_identities.json | 1 + .../04_identity_Merge_into.json | 1 + .../04_identity_Unbind_face.json | 1 + .../05_faces_Face_candidates.json | 1 + .../06_search_BM25_search.json | 1 + .../06_search_Frame_search.json | 1 + .../06_search_Hybrid_search.json | 1 + .../06_search_Smart_search.json | 1 + .../06_search_Universal_search.json | 1 + .../06_search_Vector_search.json | 1 + .../06_search_Visual_by_class.json | 0 .../06_search_Visual_by_density.json | 0 .../06_search_Visual_combination.json | 0 .../06_search_Visual_search.json | 1 + .../06_search_Visual_stats.json | 0 .../07_jobs_Job_detail.json | 0 .../07_jobs_List_jobs.json | 1 + .../07_jobs_Progress.json | 1 + .../07_jobs_Rule_status.json | 1 + .../08_resources_List_resources.json | 1 + .../08_resources_Register_resource.json | 1 + .../08_resources_Resource_heartbeat.json | 1 + 
.../09_agents_5W1H_analyze.json | 1 + .../09_agents_5W1H_batch.json | 1 + .../09_agents_5W1H_status.json | 1 + .../09_agents_Identity_agent_status.json | 1 + .../09_agents_Identity_analyze.json | 1 + .../09_agents_Identity_suggest.json | 1 + .../09_agents_Suggest_merge.json | 1 + .../09_agents_Translate.json | 1 + .../10_stats_Cache_toggle.json | 1 + .../10_stats_Inference_health.json | 1 + .../10_stats_SFTPGo_status.json | 1 + .../01_health_Health_check.json | 1 + .../01_health_Health_detailed.json | 1 + .../02_auth_Login.json | 1 + .../02_auth_Logout.json | 1 + .../03_files_File_chunks.json | 0 .../03_files_File_detail.json | 1 + .../03_files_File_identities.json | 1 + .../03_files_File_probe.json | 0 .../03_files_List_files.json | 1 + .../03_files_Register_file.json | 1 + .../03_files_Scan_files.json | 1 + .../03_files_Trigger_processing.json | 0 .../03_files_Unregister_file.json | 0 .../04_identity_Bind_face.json | 1 + .../04_identity_Create_identity.json | 10 + .../04_identity_Delete_identity.json | 0 .../04_identity_Identity_chunks.json | 1 + .../04_identity_Identity_detail.json | 1 + .../04_identity_Identity_files.json | 1 + .../04_identity_List_identities.json | 1 + .../04_identity_Merge_into.json | 1 + .../04_identity_Unbind_face.json | 1 + .../05_faces_Face_candidates.json | 1 + .../06_search_BM25_search.json | 1 + .../06_search_Frame_search.json | 1 + .../06_search_Hybrid_search.json | 1 + .../06_search_Smart_search.json | 1 + .../06_search_Universal_search.json | 1 + .../06_search_Vector_search.json | 1 + .../06_search_Visual_by_class.json | 0 .../06_search_Visual_by_density.json | 0 .../06_search_Visual_combination.json | 0 .../06_search_Visual_search.json | 1 + .../06_search_Visual_stats.json | 0 .../07_jobs_Job_detail.json | 0 .../07_jobs_List_jobs.json | 1 + .../07_jobs_Progress.json | 1 + .../07_jobs_Rule_status.json | 0 .../08_resources_List_resources.json | 1 + .../08_resources_Register_resource.json | 1 + .../08_resources_Resource_heartbeat.json 
| 1 + .../09_agents_5W1H_analyze.json | 1 + .../09_agents_5W1H_batch.json | 1 + .../09_agents_5W1H_status.json | 1 + .../09_agents_Identity_agent_status.json | 1 + .../09_agents_Identity_analyze.json | 1 + .../09_agents_Identity_suggest.json | 1 + .../09_agents_Suggest_merge.json | 1 + .../09_agents_Translate.json | 1 + .../10_stats_Cache_toggle.json | 1 + .../10_stats_Inference_health.json | 1 + .../10_stats_SFTPGo_status.json | 1 + .../TEST_RESULTS/api_test_20260505_230407.md | 22 + .../TEST_RESULTS/api_test_20260505_230449.md | 26 + .../TEST_RESULTS/api_test_20260505_230751.md | 142 + .../TEST_RESULTS/api_test_20260505_231103.md | 1134 +++++ .../TEST_RESULTS/api_test_20260506_132742.md | 1134 +++++ .../TRACE/TRACE_API_REFERENCE_V1.0.0.md | 255 + .../2026-05-06_5w1h_verification.md | 79 + .../2026-05-06_5w1h_vs_story_comparison.md | 64 + .../2026-05-06_api_verification.md | 108 + .../M4_workspace/2026-05-06_pipeline_test.md | 132 + .../M4_workspace/2026-05-06_search_test.md | 95 + .../2026-05-06_vector_data_status.md | 102 + .../2026-05-07_M4_M5_pipeline_分工.md | 78 + ...2026-05-07_M4_pipeline_failure_analysis.md | 45 + ...-05-07_M5_proposal_embedding_deployment.md | 54 + .../2026-05-07_M5_recent_changes_for_sync.md | 68 + .../2026-05-07_ane_embedding_config_change.md | 61 + .../2026-05-07_ane_embedding_install_guide.md | 122 + .../2026-05-07_ane_embedding_test_plan.md | 155 + .../2026-05-07_ane_embedding_test_result.md | 27 + .../2026-05-07_embedding_benchmark.md | 35 + .../2026-05-07_embedding_benchmark_final.md | 38 + .../2026-05-07_embedding_benchmark_m4.md | 38 + .../2026-05-07_embedding_models_from_M5.md | 16 + .../2026-05-07_export_package_design.md | 113 + .../2026-05-07_pdf_processing_discussion.md | 93 + .../2026-05-07_pipeline_issues_analysis.md | 147 + ...05-07_pipeline_progress_report_template.md | 73 + .../M4_workspace/2026-05-07_response_to_M5.md | 45 + ...26-05-07_single_frame_photo_test_report.md | 84 + .../M4_workspace/Momentry_API_教材_Marcom.md | 
488 ++ .../M4_workspace/convert_embed_to_coreml.py | 50 + docs_v1.0/M4_workspace/test_coreml_embed.py | 55 + docs_v1.0/M4_workspace/test_coreml_full.py | 51 + .../M5_workspace/2026-05-06_bug_chunks_500.md | 68 + .../2026-05-06_bug_search_missing_fps.md | 84 + .../2026-05-06_bug_universal_search_uuid.md | 42 + .../M5_workspace/2026-05-06_fix_report.md | 65 + ...026-05-07_5w1h_recursive_summary_design.md | 188 + .../2026-05-07_M4_3_embedding_models_ready.md | 30 + .../2026-05-07_M4_ANE_embedding_verified.md | 25 + .../2026-05-07_M4_ANE_verified.md | 27 + .../2026-05-07_M4_llama_embedding_ready.md | 18 + .../2026-05-07_M5_to_M4_embedding_plan.md | 55 + .../2026-05-07_bug_asr_pre_chunks_missing.md | 38 + ...6-05-07_bug_store_traced_faces_pipeline.md | 42 + .../2026-05-07_embedding_model_selection.md | 61 + .../2026-05-07_embedding_models_location.md | 32 + ...7_export_import_identity_merge_analysis.md | 139 + .../2026-05-07_gun_detection_evaluation.md | 55 + .../2026-05-07_gun_detection_training_log.md | 32 + .../2026-05-07_photo_processing_suggestion.md | 61 + .../2026-05-07_request_pipeline_M5.md | 38 + .../M5_workspace/2026-05-07_response_to_M4.md | 69 + .../2026-05-07_response_to_M4_trace_api.md | 32 + .../2026-05-07_response_to_M4_v2.md | 25 + ..._scene_classification_evaluation_report.md | 99 + .../2026-05-07_session_summary.md | 51 + .../2026-05-07_session_summary_v2.md | 66 + .../M5_workspace/2026-05-07_sync_to_M4.md | 48 + .../2026-05-07_template_condition_fix.md | 33 + ...7_visual_speaker_diarization_evaluation.md | 321 ++ .../REFERENCE/API_3002_VS_3003_COMPARISON.md | 198 + docs_v1.0/REFERENCE/API_ACCESS.md | 230 + docs_v1.0/REFERENCE/API_ENDPOINTS.md | 321 ++ docs_v1.0/REFERENCE/API_ERROR_CODES.md | 106 + docs_v1.0/REFERENCE/API_INDEX.md | 129 + docs_v1.0/REFERENCE/API_KEY_DESIGN.md | 731 +++ docs_v1.0/REFERENCE/API_QUICK_REFERENCE.md | 532 ++ docs_v1.0/REFERENCE/API_REFERENCE.md | 310 ++ docs_v1.0/REFERENCE/API_TRAINING_MARCOM.md | 427 ++ 
docs_v1.0/REFERENCE/DEVELOPMENT_LOG.md | 559 +++ .../REFERENCE/DOCUMENT_EMBEDDING_STRATEGY.md | 187 + docs_v1.0/REFERENCE/JSON_OUTPUT_SPEC.md | 538 +++ .../MODULE_STANDARDIZATION_SPECIFICATION.md | 647 +++ .../REFERENCE/MOMENTRY_CORE_REDIS_KEYS.md | 303 ++ .../REFERENCE/MOMENTRY_RAG_PRESENTATION.md | 353 ++ .../Momentry_Core_API.postman_collection.json | 127 + docs_v1.0/REFERENCE/N8N_API_FIX_SUMMARY.md | 106 + .../REFERENCE/N8N_VIDEO_SEARCH_SUCCESS.md | 321 ++ docs_v1.0/REFERENCE/NODEJS.md | 467 ++ docs_v1.0/REFERENCE/PENDING_ISSUES.md | 830 ++++ .../PLAYGROUND_BINARY_IMPLEMENTATION.md | 412 ++ docs_v1.0/REFERENCE/PORTAL_API_DEMO_GUIDE.md | 416 ++ .../REFERENCE/PORTAL_DEVELOPMENT_PLAN.md | 122 + .../REFERENCE/PROCESSING_STATUS_JSONB_SPEC.md | 682 +++ docs_v1.0/REFERENCE/PYTHON.md | 586 +++ docs_v1.0/REFERENCE/RUST_DEVELOPMENT.md | 1010 ++++ docs_v1.0/REFERENCE/SERVICES.md | 1092 +++++ docs_v1.0/REFERENCE/SFTPGO_DEMO_USER.md | 504 ++ docs_v1.0/REFERENCE/USER_MANUAL.md | 499 ++ docs_v1.0/REFERENCE/VERSION_MANAGEMENT.md | 280 ++ docs_v1.0/REFERENCE/VIDEO_PROCESSING_SPEC.md | 1453 ++++++ docs_v1.0/REFERENCE/VIDEO_REGISTRATION.md | 264 + .../AI_AGENTS/CONTEXT/METADATA_PROCESSORS.md | 563 +++ .../history/AI_AGENTS/CORE/AGENT_SPEC.md | 180 + .../FACE_SPEAKER_PERSON_IDENTITY_TUTORIAL.md | 183 + .../IDENTITY/FACE_SPEAKER_PERSON_PROGRESS.md | 97 + .../FACE_SPEAKER_PERSON_QUICK_START.md | 421 ++ .../IDENTITY/FACE_SPEAKER_PERSON_WORKFLOW.md | 372 ++ .../IDENTITY/FACE_TO_IDENTITY_FLOW.md | 768 +++ .../IDENTITY/FILE_IDENTITIES_TABLE_SPEC.md | 434 ++ .../AI_AGENTS/IDENTITY/IDENTITY_AGENT_SPEC.md | 549 +++ .../IDENTITY/IDENTITY_MANAGEMENT_API.md | 434 ++ .../IDENTITY/PHASE1_MIGRATION_PLAN.md | 282 ++ .../IDENTITY/PHASE2_MIGRATION_SUMMARY.md | 113 + .../IDENTITY/V4_MIGRATION_COMPLETE.md | 119 + .../AI_AGENTS/IDENTITY/V4_MIGRATION_STATUS.md | 121 + .../AI_AGENTS/SEARCH/SEARCH_PROMPTS.md | 139 + .../SUMMARIZATION/CHUNK_RULE_4_SUMMARY.md | 231 + 
.../AI_AGENTS/TRANSLATION/TEXT_TRANSLATION.md | 166 + .../REFERENCE/history/API_TEST_REPORT.md | 155 + .../ARCHITECTURE/API_KEY_ARCHITECTURE.md | 215 + .../API_WORKFLOW_WORDPRESS_N8N.md | 479 ++ .../ARCHITECTURE_DECISION_CARDS.md | 223 + .../ARCHITECTURE_DECISION_EXECUTION_PLAN.md | 163 + .../ARCHITECTURE_DOCUMENTATION_MAP.md | 389 ++ .../ARCHITECTURE/ARCHITECTURE_EVALUATION.md | 348 ++ .../ARCHITECTURE/ARCHITECTURE_OVERVIEW.md | 329 ++ .../ARCHITECTURE_REVIEW_PROCESS.md | 279 ++ .../ARCHITECTURE/ARCHITECTURE_ROADMAP.md | 371 ++ .../ARCHITECTURE/CACHE_ARCHITECTURE_PLAN.md | 1125 +++++ .../CLIP_EMBEDDING_BENCHMARK_PLAN.md | 535 ++ .../ARCHITECTURE/DESIGN_IMPLEMENTATION_GAP.md | 348 ++ .../EVENT_RECOGNITION_TECHNICAL_ANALYSIS.md | 918 ++++ .../REFERENCE/history/ARCHITECTURE/FAQ.md | 438 ++ .../IDENTITY_REFERENCE_VECTOR_DESIGN.md | 573 +++ .../JOB_WORKER_IMPLEMENTATION_PLAN.md | 814 ++++ .../ARCHITECTURE/MAC_INSTALLATION_PLAN.md | 800 +++ .../ARCHITECTURE/MCP_LAZY_LOADING_STRATEGY.md | 549 +++ ...ULE_STANDARDIZATION_IMPLEMENTATION_PLAN.md | 445 ++ .../MOMENTRY_CORE_ARCHITECTURE_V2.md | 671 +++ .../ARCHITECTURE/MONITORING_ARCHITECTURE.md | 392 ++ .../ARCHITECTURE/MONITORING_SETUP_GUIDE.md | 192 + .../MULTIMODAL_SEARCH_DESIGN_V5.md | 381 ++ .../history/ARCHITECTURE/N8N_DEMO_WORKFLOW.md | 709 +++ .../N8N_WORKFLOW_VIDEO_RAG_MCP.md | 190 + .../ON_THE_FLY_PROCESSING_DESIGN.md | 709 +++ .../PARENT_CHUNK_COVERAGE_ANALYSIS.md | 120 + .../PERFORMANCE_AND_SCALABILITY.md | 303 ++ .../PERSON_IDENTITY_INTEGRATION.md | 619 +++ .../PERSON_IDENTITY_USAGE_GUIDE.md | 395 ++ .../PIPELINE_AND_RESOURCE_ARCHITECTURE.md | 237 + .../ARCHITECTURE/PLAYGROUND_ARCHITECTURE.md | 521 ++ .../POSE_BASED_MATCHING_OPTIMIZATION_PLAN.md | 392 ++ .../ARCHITECTURE/PROCESSING_PIPELINE.md | 368 ++ .../history/ARCHITECTURE/QUICK_START_GUIDE.md | 165 + .../PROCESSOR_LIFECYCLE.md | 364 ++ .../PROCESSOR_REGISTRY_ARCHITECTURE.md | 330 ++ .../RESOURCE_MONITORING_SPEC.md | 120 + .../SERVICE_REGISTRY_ARCHITECTURE.md 
| 500 ++ .../UNIFIED_RESOURCE_REGISTRY.md | 162 + .../ARCHITECTURE/SECURITY_ARCHITECTURE.md | 165 + .../ARCHITECTURE/SEMANTIC_SEARCH_DESIGN.md | 247 + .../ARCHITECTURE/SERVICE_ADDITION_GUIDE.md | 698 +++ .../SOUND_RECOGNITION_EXTENSION.md | 408 ++ .../TECHNICAL_DECISION_RECORDS.md | 493 ++ .../ARCHITECTURE/TERMINOLOGY_MAPPING.md | 309 ++ .../ARCHITECTURE/TEST_AND_BENCHMARK_PLAN.md | 1219 +++++ .../ARCHITECTURE/USER_MANAGEMENT_PLAN.md | 443 ++ .../_deprecated/IDENTITY_SYSTEM_DESIGN.md | 498 ++ .../_deprecated/SPEAKER_INTEGRATION.md | 195 + .../_deprecated/TMDB_CHARACTER_INTEGRATION.md | 212 + .../BODY_ACTION_DECODER_CLASSIFICATION.md | 362 ++ docs_v1.0/REFERENCE/history/CHANGELOG.md | 143 + .../CHUNKING/CORE/CHUNKING_ARCHITECTURE.md | 273 ++ .../CORE/CHUNKING_ENRICHMENT_PIPELINE.md | 177 + .../CHUNKING/CORE/CHUNKING_SCHEMA_SPEC.md | 271 ++ .../CHUNKING/CORE/CHUNK_DATA_STRUCTURE.md | 398 ++ .../history/CHUNKING/CORE/CHUNK_DESIGN.md | 553 +++ .../history/CHUNKING/CORE/CHUNK_RULES_SPEC.md | 185 + .../history/CHUNKING/CORE/CHUNK_SPEC.md | 1132 +++++ .../SCENE_BASED/CHUNK_RULE_3_COMPOSITE.md | 337 ++ .../RULES/SCENE_BASED/CHUNK_RULE_3_SCENE.md | 215 + .../RULES/TEXT_BASED/CHUNK_RULE_1_SENTENCE.md | 202 + .../RULES/TEXT_BASED/CHUNK_RULE_1_SIMPLE.md | 378 ++ .../CHUNK_RULE_2_FRAME_OBJECTS.md | 310 ++ .../RULES/VISUAL_BASED/CHUNK_RULE_2_VISUAL.md | 242 + .../FACE_PROCESSOR_PERFORMANCE_2026-04-28.md | 196 + ...E_TRACKER_INTEGRATION_REPORT_2026-04-28.md | 206 + .../IDENTITY_SYSTEM_EXPERIMENT_2026-04-28.md | 204 + .../LANDMARKS_SOURCE_ANALYSIS_2026-04-28.md | 309 ++ ...O_MANY_MATCHING_OPTIMIZATION_2026-04-28.md | 184 + ..._BASED_MATCHING_FINAL_REPORT_2026-04-28.md | 231 + .../history/FACE_ANALYSIS_FINAL_ANSWER.md | 151 + .../history/FACE_LEARNING_VERIFICATION.md | 101 + .../history/FACE_RECOGNITION_DEPLOYMENT.md | 372 ++ .../history/FACE_RECOGNITION_FINAL_REPORT.md | 218 + .../history/FACE_RECOGNITION_FINAL_SUMMARY.md | 245 + .../history/FACE_THUMBNAIL_IMPLEMENTATION.md | 351 
++ .../history/FACE_TRACKER_DATA_STRUCTURE.md | 620 +++ .../REFERENCE/history/FACE_TRACKER_GUIDE.md | 261 + .../FEMALE_FACES_EXTRACTION_SUMMARY.md | 117 + docs_v1.0/REFERENCE/history/FILE_UUID_SPEC.md | 208 + .../REFERENCE/history/IDENTITY_API_SPEC.md | 811 ++++ .../AI_AGENT_DOCUMENTATION_GUIDE.md | 449 ++ .../IMPLEMENTATION/API_CURL_EXAMPLES.md | 571 +++ .../history/IMPLEMENTATION/API_EXAMPLES.md | 790 +++ .../API_KEY_INTEGRATION_TESTS.md | 255 + .../IMPLEMENTATION/API_KEY_MANAGEMENT.md | 713 +++ .../IMPLEMENTATION/API_KEY_OPTIMIZATION.md | 399 ++ .../history/IMPLEMENTATION/API_N8N_GUIDE.md | 239 + .../IMPLEMENTATION/API_WORDPRESS_GUIDE.md | 343 ++ .../IMPLEMENTATION/BUILD_VERSION_RECORD.md | 686 +++ .../IMPLEMENTATION/DEV_3003_REFACTOR.md | 199 + .../FILE_IDENTITY_API_DESIGN.md | 661 +++ .../IMPLEMENTATION/FRESH_MAC_INSTALLATION.md | 726 +++ .../history/IMPLEMENTATION/INSTALL_CADDY.md | 487 ++ .../history/IMPLEMENTATION/INSTALL_GITEA.md | 430 ++ .../IMPLEMENTATION/INSTALL_GITEA_MCP.md | 413 ++ .../history/IMPLEMENTATION/INSTALL_MARIADB.md | 416 ++ .../IMPLEMENTATION/INSTALL_MOMENTRY_API.md | 490 ++ .../history/IMPLEMENTATION/INSTALL_MONGODB.md | 412 ++ .../history/IMPLEMENTATION/INSTALL_N8N.md | 509 ++ .../history/IMPLEMENTATION/INSTALL_OLLAMA.md | 395 ++ .../history/IMPLEMENTATION/INSTALL_PHP.md | 415 ++ .../IMPLEMENTATION/INSTALL_POSTGRESQL.md | 417 ++ .../history/IMPLEMENTATION/INSTALL_QDRANT.md | 492 ++ .../history/IMPLEMENTATION/INSTALL_REDIS.md | 501 ++ .../IMPLEMENTATION/INSTALL_RUSTDESK.md | 320 ++ .../history/IMPLEMENTATION/INSTALL_SFTPGO.md | 1081 +++++ .../IMPLEMENTATION/INSTALL_SYNONYM_FOREST.md | 1315 +++++ .../IMPLEMENTATION/INSTALL_WORDPRESS.md | 352 ++ .../history/IMPLEMENTATION/N8N_DEMO.md | 266 + .../IMPLEMENTATION/N8N_DEMO_EXECUTION_LOG.md | 374 ++ .../IMPLEMENTATION/N8N_DEMO_WORKFLOW.md | 690 +++ .../IMPLEMENTATION/N8N_HTTP_REQUEST_GUIDE.md | 289 ++ .../IMPLEMENTATION/N8N_INTEGRATION_GUIDE.md | 593 +++ 
.../history/IMPLEMENTATION/N8N_MCP_SETUP.md | 245 + .../IMPLEMENTATION/N8N_MCP_TEST_REPORT.md | 194 + .../N8N_SEARCH_API_COMPARISON.md | 184 + .../N8N_SEARCH_API_TECHNICAL_SPEC.md | 161 + .../IMPLEMENTATION/N8N_SETUP_COMPLETE.md | 171 + .../IMPLEMENTATION/N8N_VIEW_OUTPUT_GUIDE.md | 158 + .../history/IMPLEMENTATION/OPENCODE_GUIDE.md | 448 ++ .../IMPLEMENTATION/OPENCODE_MCP_INSTALL.md | 554 +++ .../IMPLEMENTATION/PERSON_CORRECTION_GUIDE.md | 286 ++ .../PORTAL_BIRTH_UUID_ADAPTATION.md | 267 + .../SEARCH_ACCEPTANCE_CRITERIA.md | 193 + .../IMPLEMENTATION/SERVICE_ADDITION_GUIDE.md | 716 +++ .../IMPLEMENTATION/STAMP_SEARCH_PROGRESS.md | 93 + .../IMPLEMENTATION/SYNONYM_CONFIGURATION.md | 187 + .../IMPLEMENTATION/SYNONYM_FOREST_README.md | 209 + .../IMPLEMENTATION/USER_MANAGEMENT_PLAN.md | 425 ++ .../IMPLEMENTATION/YOLO_RESUME_INTEGRATION.md | 122 + .../MEDIAPIPE_HOLISTIC_INTEGRATION_REPORT.md | 370 ++ .../MOMENTRY_ANALYSIS_RECOMMENDATIONS.md | 223 + .../history/MOMENTRY_INTEGRATION_GUIDE.md | 443 ++ .../OPERATIONS/ARCHITECTURE_REVIEW_REPORT.md | 198 + .../history/OPERATIONS/BACKUP_VERSIONING.md | 468 ++ .../history/OPERATIONS/DOCS_STANDARD.md | 474 ++ .../OPERATIONS/DOCUMENT_AUDIT_REPORT.md | 266 + .../OPERATIONS/FILE_CHANGE_MANAGEMENT.md | 340 ++ .../IMPLEMENTATION_COMPATIBILITY_ANALYSIS.md | 98 + .../OPERATIONS/INCIDENT_RESPONSE_PROCEDURE.md | 457 ++ .../OPERATIONS/INTEGRATED_PLAYER_GUIDE.md | 364 ++ .../OPERATIONS/MOMENTRY_CORE_MONITORING.md | 693 +++ .../OPERATIONS/PROCESSING_PIPELINE.md.bak | 293 ++ .../OPERATIONS/PRODUCTION_DEPLOYMENT_GUIDE.md | 856 ++++ .../OPERATIONS/RELEASE_v0.4.0_2026-04-30.md | 196 + .../TRAINING_MAINTENANCE_RECORDS.md | 522 ++ .../OPERATIONS/VIDEO_REGISTRATION.md.bak | 248 + .../maintenance_records/MIGRATION_PLAN.md | 284 ++ .../OPERATIONS/maintenance_records/README.md | 248 + .../archive/N8N_API_FIX_SUMMARY.md | 106 + .../archive/N8N_MCP_TEST_REPORT.md | 194 + .../maintenance_records/archive/README.md | 27 + 
...OCS_STANDARD_PHASE2_APPROVAL_2026_03_27.md | 150 + ...NGE_N8N_MCP_INTEGRATION_TEST_2026_03_23.md | 212 + ...DENT_TEST_SYSTEM_INTEGRATION_2026_03_27.md | 378 ++ ...AN_BIRTH_UUID_IMPLEMENTATION_2026_04_27.md | 486 ++ ...E_MOMENTRY_CORE_DATA_CLEANUP_2026_03_28.md | 496 ++ ...MENTRY_CORE_DATA_CLEANUP_FIX_2026_03_28.md | 129 + .../RCA_TEST_SYSTEM_INTEGRATION_2026_03_27.md | 459 ++ .../RCA_N8N_API_PORT_CONFLICT_2026_03_26.md | 140 + ...RESS_TIMEOUT_EXTERNAL_ACCESS_2026_03_27.md | 401 ++ .../ASR_PROCESSOR_COMPARISON_REPORT.md | 480 ++ ...EW_DOCS_STANDARD_IMPROVEMENT_2026_03_27.md | 254 + .../templates/TEMPLATE_CHANGE.md | 440 ++ .../templates/TEMPLATE_CHANGE_AI_OPTIMIZED.md | 440 ++ .../templates/TEMPLATE_CHANGE_LEGACY.md | 347 ++ .../templates/TEMPLATE_INCIDENT.md | 361 ++ .../TEMPLATE_INCIDENT_AI_OPTIMIZED.md | 361 ++ .../templates/TEMPLATE_INCIDENT_LEGACY.md | 269 ++ .../templates/TEMPLATE_MAINTENANCE.md | 593 +++ .../templates/TEMPLATE_RCA.md | 442 ++ .../templates/TEMPLATE_RCA_AI_OPTIMIZED.md | 442 ++ .../TEMPLATE_RCA_AI_OPTIMIZED_SIMPLE.md | 144 + .../templates/TEMPLATE_RCA_LEGACY.md | 351 ++ .../history/PHASE2_COMPLETION_SUMMARY.md | 228 + .../history/PORTAL_FACE_API_IMPLEMENTATION.md | 294 ++ .../history/PORTAL_FACE_DEMO_PLAN.md | 436 ++ .../PORTAL_FACE_FRONTEND_IMPLEMENTATION.md | 235 + .../history/PORTAL_FACE_VERIFICATION.md | 214 + .../history/PORTAL_UI_INTEGRATION_PROPOSAL.md | 721 +++ .../history/POSE_ACTION_DECODER_GUIDE.md | 378 ++ .../AI_PROCESSOR_MODULE_REVISION_RECORDS.md | 290 ++ .../PROCESSORS/CORE/LOCAL_PROCESSOR_FIX.md | 108 + .../CORE/PROCESSOR_IMPLEMENTATION_STATUS.md | 1165 +++++ ...ESSOR_PERFORMANCE_EVALUATION_2026_04_01.md | 457 ++ .../CORE/PROCESSOR_QUICK_REFERENCE.md | 255 + .../CORE/YOLO_PROCESSOR_TECHNICAL_REVIEW.md | 430 ++ .../AI_DRIVEN_PROCESSOR_CONTRACT.md | 201 + .../AI_PROCESSOR_COMPLIANCE_CHECKLIST.md | 253 + .../SPECIFICATION/PROCESSOR_OUTPUT_SPEC.md | 242 + .../PROCESSOR_STANDARDIZATION_TEMPLATE.md | 493 ++ 
.../ASRX_REPLACEMENT_MAC_STUDIO_ANALYSIS.md | 621 +++ .../ASRX_SELF_IMPLEMENTATION_FEASIBILITY.md | 505 ++ .../ASRX_SELF_LONG_MOVIE_TEST_REPORT.md | 328 ++ .../ASRX_SELF_LONG_MOVIE_TEST_REPORT_FIXED.md | 275 ++ .../ASRX_SELF_VS_PYANNOTE_COMPARISON.md | 240 + .../SPEECH/ASR_ASRX_SPEAKER_MODEL_ANALYSIS.md | 920 ++++ .../SPEECH/ASR_CONFIGURATION_UNIFICATION.md | 225 + .../PROCESSORS/SPEECH/ASR_IMPROVEMENT_PLAN.md | 185 + .../PROCESSORS/SPEECH/ASR_vs_ASRX_ANALYSIS.md | 348 ++ .../SPEECH/ASR_vs_ASRX_EDGE_AI_ANALYSIS.md | 786 +++ .../ASR_vs_ASRX_REPLACEMENT_ANALYSIS.md | 504 ++ .../PROCESSORS/VISUAL/FACE_MODEL_ANALYSIS.md | 658 +++ ...FACE_RECOGNITION_IMPLEMENTATION_SUMMARY.md | 218 + .../VISUAL/IMAGE_PROCESSING_ARCHITECTURE.md | 334 ++ .../LONG_MOVIE_SCENE_TEST_2026_04_01.md | 202 + .../VISUAL/PLACES365_INSTALLATION.md | 97 + .../VISUAL/PLACES365_MODEL_GUIDE.md | 168 + .../VISUAL/SCENE_CLASSIFICATION_MODULE.md | 407 ++ .../VISUAL/SCENE_CLASSIFICATION_TEST_PLAN.md | 337 ++ ...E_CLASSIFICATION_TEST_REPORT_2026_04_01.md | 212 + ..._CLASSIFICATION_TEST_RESULTS_2026_04_01.md | 151 + .../PROCESSORS/VISUAL/VISUAL_CHUNK_DESIGN.md | 212 + .../_CORE/PROCESSOR_RESUME_STRATEGY.md | 202 + .../_CORE/PROCESSOR_UPGRADE_ANALYSIS.md | 321 ++ .../PROCESSORS/_CORE/RULE_SPECIFICATION.md | 151 + .../history/PROCESSOR_STATUS_ANALYSIS.md | 328 ++ .../PROJECT_DOCS_V1_INTEGRATION_PLAN.md | 220 + .../history/RULE1_CHUNK_INGESTION_CHECK.md | 204 + .../history/RULE1_FACE_DATA_SOURCE_FIX.md | 239 + .../history/RULE1_TRIGGER_MECHANISM.md | 344 ++ .../history/SYNONYM_CONFIGURATION.md | 204 + .../history/TESTING/PLAYGROUND_TEST_PLAN.md | 378 ++ .../history/TESTING/PLAYGROUND_TEST_REPORT.md | 275 ++ .../TESTING/POSTGRESQL_ISOLATION_FIX_PLAN.md | 281 ++ .../history/TESTING/RELEASE_ANALYSIS.md | 195 + .../TESTING/SEARCH_API_UNIFICATION_PLAN.md | 104 + .../TESTING/TEST_AND_BENCHMARK_PLAN.md | 1201 +++++ .../REFERENCE/history/TEST_REPORT_CLI.md | 118 + .../history/TIME_FORMAT_UNIFICATION_PLAN.md | 345 ++ 
.../REFERENCE/history/UUID_CLEANUP_PLAN.md | 256 + .../REFERENCE/history/UUID_LENGTH_ISSUE.md | 284 ++ .../REFERENCE/history/V4_ISSUES_TRACKING.md | 249 + .../V4_MIGRATION_PHASE3_DISABLE_OLD_API.md | 187 + .../history/VIDEOS_TABLE_NAMING_ISSUE.md | 285 ++ .../REFERENCE/history/compliance_report.md | 197 + .../design/OBJECT_SNAPSHOT_SYSTEM_DESIGN.md | 710 +++ .../history/examples/custom_synonyms.json | 14 + .../examples/examples/custom_synonyms.json | 14 + .../examples/examples/momentry_cred.json | 11 + .../examples/n8n_momentry_search.json | 91 + .../n8n_momentry_search_credential.json | 88 + .../multilingual/language_routing.json | 118 + .../multilingual/multilingual_synonyms.json | 56 + .../unified_multilingual_synonyms.json | 136 + .../history/final_shutdown_instructions.md | 158 + docs_v1.0/REFERENCE/history/note.md | 86 + .../history/phase2_progress_summary.md | 208 + .../REFERENCE/history/session-ses_2f27.md | 4290 +++++++++++++++++ .../history/system_status_after_reboot.md | 149 + .../REFERENCE/n8n_workflow_core_v1.2.json | 818 ++++ docs_v1.0/REFERENCE/n8n_workflow_simple.json | 123 + .../REFERENCE/n8n_workflow_simple_test.json | 89 + .../REFERENCE/n8n_workflow_video_rag_mcp.json | 109 + .../REFERENCE/n8n_workflow_video_search.json | 138 + docs_v1.0/REFERENCE/test_all.sh | 100 + docs_v1.0/REFERENCE/test_momentry_api.sh | 33 + docs_v1.0/REFERENCE/test_workflow.sh | 104 + docs_v1.0/STANDARDS/DOCS_STANDARD.md | 990 ++++ ...TANDARD_IMPROVEMENT_PROPOSAL_2026_03_27.md | 431 ++ 519 files changed, 136077 insertions(+) create mode 100644 .gitignore create mode 100644 docs_v1.0/API_V1.0.0/API_DOCUMENTATION.md create mode 100644 docs_v1.0/API_V1.0.0/API_DOCUMENTATION_v1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/DEPLOY/EMBEDDING_DEPLOYMENT_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/DEPLOY/GEM4_LLM_DEPLOY_PLAN_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/AGENTS/5W1H_AGENT_V1.0.0.md create mode 100644 
docs_v1.0/API_V1.0.0/INTERNAL/AGENTS/IDENTITY_AGENT_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/API_DICTIONARY_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/API_REFERENCE_v1.0.0.20260501md.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/API_USAGE_DEMO_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/CHILD_DETECTION_AGE_BENCHMARK_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/CHUNK_DEFINITION_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/CLASS_SYSTEM_DESIGN_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/DATA_SCHEMA_FILE_IDENTITY_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/DEV_API_REFERENCE_v1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/DUAL_EMBEDDING_PIPELINE_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/MOMENTRY_CORE_API_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/ASRX_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/ASR_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/CAPTION_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/CUT_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/FACE_EMBEDDING_FLOW_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/FACE_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/OCR_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/POSE_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/SCENE_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/STORY_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/VISUAL_CHUNK_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/VOICE_EMBEDDING_FLOW_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/YOLO_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/PROCESSOR_SELECTION_V1.0.0.md create mode 100644 
docs_v1.0/API_V1.0.0/INTERNAL/RCA_TRACE39_TRACE45_COLLISION_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/TRACE_QUALITY_AGENT_REPORT_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/UUID_ENCODING_RULES_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/INTERNAL/VECTOR_SPEC_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/RELEASE/PIPELINE_PROGRESS_REPORT_V2.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/RELEASE/PRODUCTION_VERIFICATION_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/RELEASE/RELEASE_API_REFERENCE_v1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/RELEASE/RELEASE_TEST_REPORT_v1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/RELEASE/RELEASE_VERIFICATION_V1.0.0.md create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/01_health_Health_check.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/01_health_Health_detailed.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/02_auth_Login.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/02_auth_Logout.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/03_files_File_chunks.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/03_files_File_detail.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/03_files_File_identities.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/03_files_File_probe.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/03_files_List_files.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/03_files_Register_file.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/03_files_Scan_files.json create mode 100644 
docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/03_files_Trigger_processing.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/03_files_Unregister_file.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/04_identity_Bind_face.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/04_identity_Create_identity.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/04_identity_Delete_identity.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/04_identity_Identity_chunks.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/04_identity_Identity_detail.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/04_identity_Identity_files.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/04_identity_List_identities.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/04_identity_Merge_into.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/04_identity_Unbind_face.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/05_faces_Face_candidates.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/06_search_BM25_search.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/06_search_Frame_search.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/06_search_Hybrid_search.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/06_search_Smart_search.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/06_search_Universal_search.json create mode 100644 
docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/06_search_Vector_search.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/06_search_Visual_by_class.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/06_search_Visual_by_density.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/06_search_Visual_combination.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/06_search_Visual_search.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/06_search_Visual_stats.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/07_jobs_Job_detail.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/07_jobs_List_jobs.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/07_jobs_Progress.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/07_jobs_Rule_status.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/08_resources_List_resources.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/08_resources_Register_resource.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/08_resources_Resource_heartbeat.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/09_agents_5W1H_analyze.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/09_agents_5W1H_batch.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/09_agents_5W1H_status.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/09_agents_Identity_agent_status.json create mode 100644 
docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/09_agents_Identity_analyze.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/09_agents_Identity_suggest.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/09_agents_Suggest_merge.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/09_agents_Translate.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/10_stats_Cache_toggle.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/10_stats_Inference_health.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260505_231103/10_stats_SFTPGo_status.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/01_health_Health_check.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/01_health_Health_detailed.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/02_auth_Login.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/02_auth_Logout.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/03_files_File_chunks.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/03_files_File_detail.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/03_files_File_identities.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/03_files_File_probe.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/03_files_List_files.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/03_files_Register_file.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/03_files_Scan_files.json create mode 100644 
docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/03_files_Trigger_processing.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/03_files_Unregister_file.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/04_identity_Bind_face.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/04_identity_Create_identity.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/04_identity_Delete_identity.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/04_identity_Identity_chunks.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/04_identity_Identity_detail.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/04_identity_Identity_files.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/04_identity_List_identities.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/04_identity_Merge_into.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/04_identity_Unbind_face.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/05_faces_Face_candidates.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/06_search_BM25_search.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/06_search_Frame_search.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/06_search_Hybrid_search.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/06_search_Smart_search.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/06_search_Universal_search.json create mode 100644 
docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/06_search_Vector_search.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/06_search_Visual_by_class.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/06_search_Visual_by_density.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/06_search_Visual_combination.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/06_search_Visual_search.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/06_search_Visual_stats.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/07_jobs_Job_detail.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/07_jobs_List_jobs.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/07_jobs_Progress.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/07_jobs_Rule_status.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/08_resources_List_resources.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/08_resources_Register_resource.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/08_resources_Resource_heartbeat.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/09_agents_5W1H_analyze.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/09_agents_5W1H_batch.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/09_agents_5W1H_status.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/09_agents_Identity_agent_status.json create mode 100644 
docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/09_agents_Identity_analyze.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/09_agents_Identity_suggest.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/09_agents_Suggest_merge.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/09_agents_Translate.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/10_stats_Cache_toggle.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/10_stats_Inference_health.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_responses_20260506_132742/10_stats_SFTPGo_status.json create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_test_20260505_230407.md create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_test_20260505_230449.md create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_test_20260505_230751.md create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_test_20260505_231103.md create mode 100644 docs_v1.0/API_V1.0.0/TEST_RESULTS/api_test_20260506_132742.md create mode 100644 docs_v1.0/API_V1.0.0/TRACE/TRACE_API_REFERENCE_V1.0.0.md create mode 100644 docs_v1.0/M4_workspace/2026-05-06_5w1h_verification.md create mode 100644 docs_v1.0/M4_workspace/2026-05-06_5w1h_vs_story_comparison.md create mode 100644 docs_v1.0/M4_workspace/2026-05-06_api_verification.md create mode 100644 docs_v1.0/M4_workspace/2026-05-06_pipeline_test.md create mode 100644 docs_v1.0/M4_workspace/2026-05-06_search_test.md create mode 100644 docs_v1.0/M4_workspace/2026-05-06_vector_data_status.md create mode 100644 docs_v1.0/M4_workspace/2026-05-07_M4_M5_pipeline_分工.md create mode 100644 docs_v1.0/M4_workspace/2026-05-07_M4_pipeline_failure_analysis.md create mode 100644 docs_v1.0/M4_workspace/2026-05-07_M5_proposal_embedding_deployment.md create mode 100644 
docs_v1.0/M4_workspace/2026-05-07_M5_recent_changes_for_sync.md create mode 100644 docs_v1.0/M4_workspace/2026-05-07_ane_embedding_config_change.md create mode 100644 docs_v1.0/M4_workspace/2026-05-07_ane_embedding_install_guide.md create mode 100644 docs_v1.0/M4_workspace/2026-05-07_ane_embedding_test_plan.md create mode 100644 docs_v1.0/M4_workspace/2026-05-07_ane_embedding_test_result.md create mode 100644 docs_v1.0/M4_workspace/2026-05-07_embedding_benchmark.md create mode 100644 docs_v1.0/M4_workspace/2026-05-07_embedding_benchmark_final.md create mode 100644 docs_v1.0/M4_workspace/2026-05-07_embedding_benchmark_m4.md create mode 100644 docs_v1.0/M4_workspace/2026-05-07_embedding_models_from_M5.md create mode 100644 docs_v1.0/M4_workspace/2026-05-07_export_package_design.md create mode 100644 docs_v1.0/M4_workspace/2026-05-07_pdf_processing_discussion.md create mode 100644 docs_v1.0/M4_workspace/2026-05-07_pipeline_issues_analysis.md create mode 100644 docs_v1.0/M4_workspace/2026-05-07_pipeline_progress_report_template.md create mode 100644 docs_v1.0/M4_workspace/2026-05-07_response_to_M5.md create mode 100644 docs_v1.0/M4_workspace/2026-05-07_single_frame_photo_test_report.md create mode 100644 docs_v1.0/M4_workspace/Momentry_API_教材_Marcom.md create mode 100644 docs_v1.0/M4_workspace/convert_embed_to_coreml.py create mode 100644 docs_v1.0/M4_workspace/test_coreml_embed.py create mode 100644 docs_v1.0/M4_workspace/test_coreml_full.py create mode 100644 docs_v1.0/M5_workspace/2026-05-06_bug_chunks_500.md create mode 100644 docs_v1.0/M5_workspace/2026-05-06_bug_search_missing_fps.md create mode 100644 docs_v1.0/M5_workspace/2026-05-06_bug_universal_search_uuid.md create mode 100644 docs_v1.0/M5_workspace/2026-05-06_fix_report.md create mode 100644 docs_v1.0/M5_workspace/2026-05-07_5w1h_recursive_summary_design.md create mode 100644 docs_v1.0/M5_workspace/2026-05-07_M4_3_embedding_models_ready.md create mode 100644 
docs_v1.0/M5_workspace/2026-05-07_M4_ANE_embedding_verified.md create mode 100644 docs_v1.0/M5_workspace/2026-05-07_M4_ANE_verified.md create mode 100644 docs_v1.0/M5_workspace/2026-05-07_M4_llama_embedding_ready.md create mode 100644 docs_v1.0/M5_workspace/2026-05-07_M5_to_M4_embedding_plan.md create mode 100644 docs_v1.0/M5_workspace/2026-05-07_bug_asr_pre_chunks_missing.md create mode 100644 docs_v1.0/M5_workspace/2026-05-07_bug_store_traced_faces_pipeline.md create mode 100644 docs_v1.0/M5_workspace/2026-05-07_embedding_model_selection.md create mode 100644 docs_v1.0/M5_workspace/2026-05-07_embedding_models_location.md create mode 100644 docs_v1.0/M5_workspace/2026-05-07_export_import_identity_merge_analysis.md create mode 100644 docs_v1.0/M5_workspace/2026-05-07_gun_detection_evaluation.md create mode 100644 docs_v1.0/M5_workspace/2026-05-07_gun_detection_training_log.md create mode 100644 docs_v1.0/M5_workspace/2026-05-07_photo_processing_suggestion.md create mode 100644 docs_v1.0/M5_workspace/2026-05-07_request_pipeline_M5.md create mode 100644 docs_v1.0/M5_workspace/2026-05-07_response_to_M4.md create mode 100644 docs_v1.0/M5_workspace/2026-05-07_response_to_M4_trace_api.md create mode 100644 docs_v1.0/M5_workspace/2026-05-07_response_to_M4_v2.md create mode 100644 docs_v1.0/M5_workspace/2026-05-07_scene_classification_evaluation_report.md create mode 100644 docs_v1.0/M5_workspace/2026-05-07_session_summary.md create mode 100644 docs_v1.0/M5_workspace/2026-05-07_session_summary_v2.md create mode 100644 docs_v1.0/M5_workspace/2026-05-07_sync_to_M4.md create mode 100644 docs_v1.0/M5_workspace/2026-05-07_template_condition_fix.md create mode 100644 docs_v1.0/M5_workspace/2026-05-07_visual_speaker_diarization_evaluation.md create mode 100644 docs_v1.0/REFERENCE/API_3002_VS_3003_COMPARISON.md create mode 100644 docs_v1.0/REFERENCE/API_ACCESS.md create mode 100644 docs_v1.0/REFERENCE/API_ENDPOINTS.md create mode 100644 docs_v1.0/REFERENCE/API_ERROR_CODES.md 
create mode 100644 docs_v1.0/REFERENCE/API_INDEX.md create mode 100644 docs_v1.0/REFERENCE/API_KEY_DESIGN.md create mode 100644 docs_v1.0/REFERENCE/API_QUICK_REFERENCE.md create mode 100644 docs_v1.0/REFERENCE/API_REFERENCE.md create mode 100644 docs_v1.0/REFERENCE/API_TRAINING_MARCOM.md create mode 100644 docs_v1.0/REFERENCE/DEVELOPMENT_LOG.md create mode 100644 docs_v1.0/REFERENCE/DOCUMENT_EMBEDDING_STRATEGY.md create mode 100644 docs_v1.0/REFERENCE/JSON_OUTPUT_SPEC.md create mode 100644 docs_v1.0/REFERENCE/MODULE_STANDARDIZATION_SPECIFICATION.md create mode 100644 docs_v1.0/REFERENCE/MOMENTRY_CORE_REDIS_KEYS.md create mode 100644 docs_v1.0/REFERENCE/MOMENTRY_RAG_PRESENTATION.md create mode 100644 docs_v1.0/REFERENCE/Momentry_Core_API.postman_collection.json create mode 100644 docs_v1.0/REFERENCE/N8N_API_FIX_SUMMARY.md create mode 100644 docs_v1.0/REFERENCE/N8N_VIDEO_SEARCH_SUCCESS.md create mode 100644 docs_v1.0/REFERENCE/NODEJS.md create mode 100644 docs_v1.0/REFERENCE/PENDING_ISSUES.md create mode 100644 docs_v1.0/REFERENCE/PLAYGROUND_BINARY_IMPLEMENTATION.md create mode 100644 docs_v1.0/REFERENCE/PORTAL_API_DEMO_GUIDE.md create mode 100644 docs_v1.0/REFERENCE/PORTAL_DEVELOPMENT_PLAN.md create mode 100644 docs_v1.0/REFERENCE/PROCESSING_STATUS_JSONB_SPEC.md create mode 100644 docs_v1.0/REFERENCE/PYTHON.md create mode 100644 docs_v1.0/REFERENCE/RUST_DEVELOPMENT.md create mode 100644 docs_v1.0/REFERENCE/SERVICES.md create mode 100644 docs_v1.0/REFERENCE/SFTPGO_DEMO_USER.md create mode 100644 docs_v1.0/REFERENCE/USER_MANUAL.md create mode 100644 docs_v1.0/REFERENCE/VERSION_MANAGEMENT.md create mode 100644 docs_v1.0/REFERENCE/VIDEO_PROCESSING_SPEC.md create mode 100644 docs_v1.0/REFERENCE/VIDEO_REGISTRATION.md create mode 100644 docs_v1.0/REFERENCE/history/AI_AGENTS/CONTEXT/METADATA_PROCESSORS.md create mode 100644 docs_v1.0/REFERENCE/history/AI_AGENTS/CORE/AGENT_SPEC.md create mode 100644 
docs_v1.0/REFERENCE/history/AI_AGENTS/IDENTITY/FACE_SPEAKER_PERSON_IDENTITY_TUTORIAL.md create mode 100644 docs_v1.0/REFERENCE/history/AI_AGENTS/IDENTITY/FACE_SPEAKER_PERSON_PROGRESS.md create mode 100644 docs_v1.0/REFERENCE/history/AI_AGENTS/IDENTITY/FACE_SPEAKER_PERSON_QUICK_START.md create mode 100644 docs_v1.0/REFERENCE/history/AI_AGENTS/IDENTITY/FACE_SPEAKER_PERSON_WORKFLOW.md create mode 100644 docs_v1.0/REFERENCE/history/AI_AGENTS/IDENTITY/FACE_TO_IDENTITY_FLOW.md create mode 100644 docs_v1.0/REFERENCE/history/AI_AGENTS/IDENTITY/FILE_IDENTITIES_TABLE_SPEC.md create mode 100644 docs_v1.0/REFERENCE/history/AI_AGENTS/IDENTITY/IDENTITY_AGENT_SPEC.md create mode 100644 docs_v1.0/REFERENCE/history/AI_AGENTS/IDENTITY/IDENTITY_MANAGEMENT_API.md create mode 100644 docs_v1.0/REFERENCE/history/AI_AGENTS/IDENTITY/PHASE1_MIGRATION_PLAN.md create mode 100644 docs_v1.0/REFERENCE/history/AI_AGENTS/IDENTITY/PHASE2_MIGRATION_SUMMARY.md create mode 100644 docs_v1.0/REFERENCE/history/AI_AGENTS/IDENTITY/V4_MIGRATION_COMPLETE.md create mode 100644 docs_v1.0/REFERENCE/history/AI_AGENTS/IDENTITY/V4_MIGRATION_STATUS.md create mode 100644 docs_v1.0/REFERENCE/history/AI_AGENTS/SEARCH/SEARCH_PROMPTS.md create mode 100644 docs_v1.0/REFERENCE/history/AI_AGENTS/SUMMARIZATION/CHUNK_RULE_4_SUMMARY.md create mode 100644 docs_v1.0/REFERENCE/history/AI_AGENTS/TRANSLATION/TEXT_TRANSLATION.md create mode 100644 docs_v1.0/REFERENCE/history/API_TEST_REPORT.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/API_KEY_ARCHITECTURE.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/API_WORKFLOW_WORDPRESS_N8N.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/ARCHITECTURE_DECISION_CARDS.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/ARCHITECTURE_DECISION_EXECUTION_PLAN.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/ARCHITECTURE_DOCUMENTATION_MAP.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/ARCHITECTURE_EVALUATION.md 
create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/ARCHITECTURE_OVERVIEW.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/ARCHITECTURE_REVIEW_PROCESS.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/ARCHITECTURE_ROADMAP.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/CACHE_ARCHITECTURE_PLAN.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/CLIP_EMBEDDING_BENCHMARK_PLAN.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/DESIGN_IMPLEMENTATION_GAP.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/EVENT_RECOGNITION_TECHNICAL_ANALYSIS.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/FAQ.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/IDENTITY_REFERENCE_VECTOR_DESIGN.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/JOB_WORKER_IMPLEMENTATION_PLAN.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/MAC_INSTALLATION_PLAN.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/MCP_LAZY_LOADING_STRATEGY.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/MODULE_STANDARDIZATION_IMPLEMENTATION_PLAN.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/MOMENTRY_CORE_ARCHITECTURE_V2.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/MONITORING_ARCHITECTURE.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/MONITORING_SETUP_GUIDE.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/MULTIMODAL_SEARCH_DESIGN_V5.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/N8N_DEMO_WORKFLOW.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/N8N_WORKFLOW_VIDEO_RAG_MCP.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/ON_THE_FLY_PROCESSING_DESIGN.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/PARENT_CHUNK_COVERAGE_ANALYSIS.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/PERFORMANCE_AND_SCALABILITY.md create mode 100644 
docs_v1.0/REFERENCE/history/ARCHITECTURE/PERSON_IDENTITY_INTEGRATION.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/PERSON_IDENTITY_USAGE_GUIDE.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/PIPELINE_AND_RESOURCE_ARCHITECTURE.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/PLAYGROUND_ARCHITECTURE.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/POSE_BASED_MATCHING_OPTIMIZATION_PLAN.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/PROCESSING_PIPELINE.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/QUICK_START_GUIDE.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/RESOURCE_MANAGEMENT/PROCESSOR_LIFECYCLE.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/RESOURCE_MANAGEMENT/PROCESSOR_REGISTRY_ARCHITECTURE.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/RESOURCE_MANAGEMENT/RESOURCE_MONITORING_SPEC.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/RESOURCE_MANAGEMENT/SERVICE_REGISTRY_ARCHITECTURE.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/RESOURCE_MANAGEMENT/UNIFIED_RESOURCE_REGISTRY.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/SECURITY_ARCHITECTURE.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/SEMANTIC_SEARCH_DESIGN.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/SERVICE_ADDITION_GUIDE.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/SOUND_RECOGNITION_EXTENSION.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/TECHNICAL_DECISION_RECORDS.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/TERMINOLOGY_MAPPING.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/TEST_AND_BENCHMARK_PLAN.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/USER_MANAGEMENT_PLAN.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/_deprecated/IDENTITY_SYSTEM_DESIGN.md create mode 100644 
docs_v1.0/REFERENCE/history/ARCHITECTURE/_deprecated/SPEAKER_INTEGRATION.md create mode 100644 docs_v1.0/REFERENCE/history/ARCHITECTURE/_deprecated/TMDB_CHARACTER_INTEGRATION.md create mode 100644 docs_v1.0/REFERENCE/history/BODY_ACTION_DECODER_CLASSIFICATION.md create mode 100644 docs_v1.0/REFERENCE/history/CHANGELOG.md create mode 100644 docs_v1.0/REFERENCE/history/CHUNKING/CORE/CHUNKING_ARCHITECTURE.md create mode 100644 docs_v1.0/REFERENCE/history/CHUNKING/CORE/CHUNKING_ENRICHMENT_PIPELINE.md create mode 100644 docs_v1.0/REFERENCE/history/CHUNKING/CORE/CHUNKING_SCHEMA_SPEC.md create mode 100644 docs_v1.0/REFERENCE/history/CHUNKING/CORE/CHUNK_DATA_STRUCTURE.md create mode 100644 docs_v1.0/REFERENCE/history/CHUNKING/CORE/CHUNK_DESIGN.md create mode 100644 docs_v1.0/REFERENCE/history/CHUNKING/CORE/CHUNK_RULES_SPEC.md create mode 100644 docs_v1.0/REFERENCE/history/CHUNKING/CORE/CHUNK_SPEC.md create mode 100644 docs_v1.0/REFERENCE/history/CHUNKING/RULES/SCENE_BASED/CHUNK_RULE_3_COMPOSITE.md create mode 100644 docs_v1.0/REFERENCE/history/CHUNKING/RULES/SCENE_BASED/CHUNK_RULE_3_SCENE.md create mode 100644 docs_v1.0/REFERENCE/history/CHUNKING/RULES/TEXT_BASED/CHUNK_RULE_1_SENTENCE.md create mode 100644 docs_v1.0/REFERENCE/history/CHUNKING/RULES/TEXT_BASED/CHUNK_RULE_1_SIMPLE.md create mode 100644 docs_v1.0/REFERENCE/history/CHUNKING/RULES/VISUAL_BASED/CHUNK_RULE_2_FRAME_OBJECTS.md create mode 100644 docs_v1.0/REFERENCE/history/CHUNKING/RULES/VISUAL_BASED/CHUNK_RULE_2_VISUAL.md create mode 100644 docs_v1.0/REFERENCE/history/EXPERIMENT_REPORTS/FACE_PROCESSOR_PERFORMANCE_2026-04-28.md create mode 100644 docs_v1.0/REFERENCE/history/EXPERIMENT_REPORTS/FACE_TRACKER_INTEGRATION_REPORT_2026-04-28.md create mode 100644 docs_v1.0/REFERENCE/history/EXPERIMENT_REPORTS/IDENTITY_SYSTEM_EXPERIMENT_2026-04-28.md create mode 100644 docs_v1.0/REFERENCE/history/EXPERIMENT_REPORTS/LANDMARKS_SOURCE_ANALYSIS_2026-04-28.md create mode 100644 
docs_v1.0/REFERENCE/history/EXPERIMENT_REPORTS/ONE_TO_MANY_MATCHING_OPTIMIZATION_2026-04-28.md create mode 100644 docs_v1.0/REFERENCE/history/EXPERIMENT_REPORTS/POSE_BASED_MATCHING_FINAL_REPORT_2026-04-28.md create mode 100644 docs_v1.0/REFERENCE/history/FACE_ANALYSIS_FINAL_ANSWER.md create mode 100644 docs_v1.0/REFERENCE/history/FACE_LEARNING_VERIFICATION.md create mode 100644 docs_v1.0/REFERENCE/history/FACE_RECOGNITION_DEPLOYMENT.md create mode 100644 docs_v1.0/REFERENCE/history/FACE_RECOGNITION_FINAL_REPORT.md create mode 100644 docs_v1.0/REFERENCE/history/FACE_RECOGNITION_FINAL_SUMMARY.md create mode 100644 docs_v1.0/REFERENCE/history/FACE_THUMBNAIL_IMPLEMENTATION.md create mode 100644 docs_v1.0/REFERENCE/history/FACE_TRACKER_DATA_STRUCTURE.md create mode 100644 docs_v1.0/REFERENCE/history/FACE_TRACKER_GUIDE.md create mode 100644 docs_v1.0/REFERENCE/history/FEMALE_FACES_EXTRACTION_SUMMARY.md create mode 100644 docs_v1.0/REFERENCE/history/FILE_UUID_SPEC.md create mode 100644 docs_v1.0/REFERENCE/history/IDENTITY_API_SPEC.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/AI_AGENT_DOCUMENTATION_GUIDE.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/API_CURL_EXAMPLES.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/API_EXAMPLES.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/API_KEY_INTEGRATION_TESTS.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/API_KEY_MANAGEMENT.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/API_KEY_OPTIMIZATION.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/API_N8N_GUIDE.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/API_WORDPRESS_GUIDE.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/BUILD_VERSION_RECORD.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/DEV_3003_REFACTOR.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/FILE_IDENTITY_API_DESIGN.md create mode 
100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/FRESH_MAC_INSTALLATION.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/INSTALL_CADDY.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/INSTALL_GITEA.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/INSTALL_GITEA_MCP.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/INSTALL_MARIADB.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/INSTALL_MOMENTRY_API.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/INSTALL_MONGODB.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/INSTALL_N8N.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/INSTALL_OLLAMA.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/INSTALL_PHP.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/INSTALL_POSTGRESQL.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/INSTALL_QDRANT.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/INSTALL_REDIS.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/INSTALL_RUSTDESK.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/INSTALL_SFTPGO.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/INSTALL_SYNONYM_FOREST.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/INSTALL_WORDPRESS.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/N8N_DEMO.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/N8N_DEMO_EXECUTION_LOG.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/N8N_DEMO_WORKFLOW.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/N8N_HTTP_REQUEST_GUIDE.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/N8N_INTEGRATION_GUIDE.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/N8N_MCP_SETUP.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/N8N_MCP_TEST_REPORT.md create mode 100644 
docs_v1.0/REFERENCE/history/IMPLEMENTATION/N8N_SEARCH_API_COMPARISON.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/N8N_SEARCH_API_TECHNICAL_SPEC.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/N8N_SETUP_COMPLETE.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/N8N_VIEW_OUTPUT_GUIDE.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/OPENCODE_GUIDE.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/OPENCODE_MCP_INSTALL.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/PERSON_CORRECTION_GUIDE.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/PORTAL_BIRTH_UUID_ADAPTATION.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/SEARCH_ACCEPTANCE_CRITERIA.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/SERVICE_ADDITION_GUIDE.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/STAMP_SEARCH_PROGRESS.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/SYNONYM_CONFIGURATION.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/SYNONYM_FOREST_README.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/USER_MANAGEMENT_PLAN.md create mode 100644 docs_v1.0/REFERENCE/history/IMPLEMENTATION/YOLO_RESUME_INTEGRATION.md create mode 100644 docs_v1.0/REFERENCE/history/MEDIAPIPE_HOLISTIC_INTEGRATION_REPORT.md create mode 100644 docs_v1.0/REFERENCE/history/MOMENTRY_ANALYSIS_RECOMMENDATIONS.md create mode 100644 docs_v1.0/REFERENCE/history/MOMENTRY_INTEGRATION_GUIDE.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/ARCHITECTURE_REVIEW_REPORT.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/BACKUP_VERSIONING.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/DOCS_STANDARD.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/DOCUMENT_AUDIT_REPORT.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/FILE_CHANGE_MANAGEMENT.md create mode 100644 
docs_v1.0/REFERENCE/history/OPERATIONS/IMPLEMENTATION_COMPATIBILITY_ANALYSIS.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/INCIDENT_RESPONSE_PROCEDURE.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/INTEGRATED_PLAYER_GUIDE.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/MOMENTRY_CORE_MONITORING.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/PROCESSING_PIPELINE.md.bak create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/PRODUCTION_DEPLOYMENT_GUIDE.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/RELEASE_v0.4.0_2026-04-30.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/TRAINING_MAINTENANCE_RECORDS.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/VIDEO_REGISTRATION.md.bak create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/MIGRATION_PLAN.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/README.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/archive/N8N_API_FIX_SUMMARY.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/archive/N8N_MCP_TEST_REPORT.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/archive/README.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/changes/_completed/CHANGE_DOCS_STANDARD_PHASE2_APPROVAL_2026_03_27.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/changes/_completed/CHANGE_N8N_MCP_INTEGRATION_TEST_2026_03_23.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/incidents/_active/INCIDENT_TEST_SYSTEM_INTEGRATION_2026_03_27.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/plans/_active/PLAN_BIRTH_UUID_IMPLEMENTATION_2026_04_27.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/plans/_completed/MAINTENANCE_MOMENTRY_CORE_DATA_CLEANUP_2026_03_28.md create 
mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/plans/_completed/MAINTENANCE_MOMENTRY_CORE_DATA_CLEANUP_FIX_2026_03_28.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/rca/_active/RCA_TEST_SYSTEM_INTEGRATION_2026_03_27.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/rca/_completed/RCA_N8N_API_PORT_CONFLICT_2026_03_26.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/rca/_completed/RCA_WORDPRESS_TIMEOUT_EXTERNAL_ACCESS_2026_03_27.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/reviews/ASR_PROCESSOR_COMPARISON_REPORT.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/reviews/REVIEW_DOCS_STANDARD_IMPROVEMENT_2026_03_27.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/templates/TEMPLATE_CHANGE.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/templates/TEMPLATE_CHANGE_AI_OPTIMIZED.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/templates/TEMPLATE_CHANGE_LEGACY.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/templates/TEMPLATE_INCIDENT.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/templates/TEMPLATE_INCIDENT_AI_OPTIMIZED.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/templates/TEMPLATE_INCIDENT_LEGACY.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/templates/TEMPLATE_MAINTENANCE.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/templates/TEMPLATE_RCA.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/templates/TEMPLATE_RCA_AI_OPTIMIZED.md create mode 100644 docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/templates/TEMPLATE_RCA_AI_OPTIMIZED_SIMPLE.md create mode 100644 
docs_v1.0/REFERENCE/history/OPERATIONS/maintenance_records/templates/TEMPLATE_RCA_LEGACY.md create mode 100644 docs_v1.0/REFERENCE/history/PHASE2_COMPLETION_SUMMARY.md create mode 100644 docs_v1.0/REFERENCE/history/PORTAL_FACE_API_IMPLEMENTATION.md create mode 100644 docs_v1.0/REFERENCE/history/PORTAL_FACE_DEMO_PLAN.md create mode 100644 docs_v1.0/REFERENCE/history/PORTAL_FACE_FRONTEND_IMPLEMENTATION.md create mode 100644 docs_v1.0/REFERENCE/history/PORTAL_FACE_VERIFICATION.md create mode 100644 docs_v1.0/REFERENCE/history/PORTAL_UI_INTEGRATION_PROPOSAL.md create mode 100644 docs_v1.0/REFERENCE/history/POSE_ACTION_DECODER_GUIDE.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/CORE/AI_PROCESSOR_MODULE_REVISION_RECORDS.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/CORE/LOCAL_PROCESSOR_FIX.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/CORE/PROCESSOR_IMPLEMENTATION_STATUS.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/CORE/PROCESSOR_PERFORMANCE_EVALUATION_2026_04_01.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/CORE/PROCESSOR_QUICK_REFERENCE.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/CORE/YOLO_PROCESSOR_TECHNICAL_REVIEW.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/SPECIFICATION/AI_DRIVEN_PROCESSOR_CONTRACT.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/SPECIFICATION/AI_PROCESSOR_COMPLIANCE_CHECKLIST.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/SPECIFICATION/PROCESSOR_OUTPUT_SPEC.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/SPECIFICATION/PROCESSOR_STANDARDIZATION_TEMPLATE.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/SPEECH/ASRX_REPLACEMENT_MAC_STUDIO_ANALYSIS.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/SPEECH/ASRX_SELF_IMPLEMENTATION_FEASIBILITY.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/SPEECH/ASRX_SELF_LONG_MOVIE_TEST_REPORT.md create mode 100644 
docs_v1.0/REFERENCE/history/PROCESSORS/SPEECH/ASRX_SELF_LONG_MOVIE_TEST_REPORT_FIXED.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/SPEECH/ASRX_SELF_VS_PYANNOTE_COMPARISON.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/SPEECH/ASR_ASRX_SPEAKER_MODEL_ANALYSIS.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/SPEECH/ASR_CONFIGURATION_UNIFICATION.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/SPEECH/ASR_IMPROVEMENT_PLAN.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/SPEECH/ASR_vs_ASRX_ANALYSIS.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/SPEECH/ASR_vs_ASRX_EDGE_AI_ANALYSIS.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/SPEECH/ASR_vs_ASRX_REPLACEMENT_ANALYSIS.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/VISUAL/FACE_MODEL_ANALYSIS.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/VISUAL/FACE_RECOGNITION_IMPLEMENTATION_SUMMARY.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/VISUAL/IMAGE_PROCESSING_ARCHITECTURE.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/VISUAL/LONG_MOVIE_SCENE_TEST_2026_04_01.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/VISUAL/PLACES365_INSTALLATION.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/VISUAL/PLACES365_MODEL_GUIDE.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/VISUAL/SCENE_CLASSIFICATION_MODULE.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/VISUAL/SCENE_CLASSIFICATION_TEST_PLAN.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/VISUAL/SCENE_CLASSIFICATION_TEST_REPORT_2026_04_01.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/VISUAL/SCENE_CLASSIFICATION_TEST_RESULTS_2026_04_01.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/VISUAL/VISUAL_CHUNK_DESIGN.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/_CORE/PROCESSOR_RESUME_STRATEGY.md create mode 100644 
docs_v1.0/REFERENCE/history/PROCESSORS/_CORE/PROCESSOR_UPGRADE_ANALYSIS.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSORS/_CORE/RULE_SPECIFICATION.md create mode 100644 docs_v1.0/REFERENCE/history/PROCESSOR_STATUS_ANALYSIS.md create mode 100644 docs_v1.0/REFERENCE/history/PROJECT_DOCS_V1_INTEGRATION_PLAN.md create mode 100644 docs_v1.0/REFERENCE/history/RULE1_CHUNK_INGESTION_CHECK.md create mode 100644 docs_v1.0/REFERENCE/history/RULE1_FACE_DATA_SOURCE_FIX.md create mode 100644 docs_v1.0/REFERENCE/history/RULE1_TRIGGER_MECHANISM.md create mode 100644 docs_v1.0/REFERENCE/history/SYNONYM_CONFIGURATION.md create mode 100644 docs_v1.0/REFERENCE/history/TESTING/PLAYGROUND_TEST_PLAN.md create mode 100644 docs_v1.0/REFERENCE/history/TESTING/PLAYGROUND_TEST_REPORT.md create mode 100644 docs_v1.0/REFERENCE/history/TESTING/POSTGRESQL_ISOLATION_FIX_PLAN.md create mode 100644 docs_v1.0/REFERENCE/history/TESTING/RELEASE_ANALYSIS.md create mode 100644 docs_v1.0/REFERENCE/history/TESTING/SEARCH_API_UNIFICATION_PLAN.md create mode 100644 docs_v1.0/REFERENCE/history/TESTING/TEST_AND_BENCHMARK_PLAN.md create mode 100644 docs_v1.0/REFERENCE/history/TEST_REPORT_CLI.md create mode 100644 docs_v1.0/REFERENCE/history/TIME_FORMAT_UNIFICATION_PLAN.md create mode 100644 docs_v1.0/REFERENCE/history/UUID_CLEANUP_PLAN.md create mode 100644 docs_v1.0/REFERENCE/history/UUID_LENGTH_ISSUE.md create mode 100644 docs_v1.0/REFERENCE/history/V4_ISSUES_TRACKING.md create mode 100644 docs_v1.0/REFERENCE/history/V4_MIGRATION_PHASE3_DISABLE_OLD_API.md create mode 100644 docs_v1.0/REFERENCE/history/VIDEOS_TABLE_NAMING_ISSUE.md create mode 100644 docs_v1.0/REFERENCE/history/compliance_report.md create mode 100644 docs_v1.0/REFERENCE/history/design/OBJECT_SNAPSHOT_SYSTEM_DESIGN.md create mode 100644 docs_v1.0/REFERENCE/history/examples/custom_synonyms.json create mode 100644 docs_v1.0/REFERENCE/history/examples/examples/custom_synonyms.json create mode 100644 
docs_v1.0/REFERENCE/history/examples/examples/momentry_cred.json create mode 100644 docs_v1.0/REFERENCE/history/examples/examples/n8n_momentry_search.json create mode 100644 docs_v1.0/REFERENCE/history/examples/examples/n8n_momentry_search_credential.json create mode 100644 docs_v1.0/REFERENCE/history/examples/multilingual/language_routing.json create mode 100644 docs_v1.0/REFERENCE/history/examples/multilingual/multilingual_synonyms.json create mode 100644 docs_v1.0/REFERENCE/history/examples/multilingual/unified_multilingual_synonyms.json create mode 100644 docs_v1.0/REFERENCE/history/final_shutdown_instructions.md create mode 100644 docs_v1.0/REFERENCE/history/note.md create mode 100644 docs_v1.0/REFERENCE/history/phase2_progress_summary.md create mode 100644 docs_v1.0/REFERENCE/history/session-ses_2f27.md create mode 100644 docs_v1.0/REFERENCE/history/system_status_after_reboot.md create mode 100644 docs_v1.0/REFERENCE/n8n_workflow_core_v1.2.json create mode 100644 docs_v1.0/REFERENCE/n8n_workflow_simple.json create mode 100644 docs_v1.0/REFERENCE/n8n_workflow_simple_test.json create mode 100644 docs_v1.0/REFERENCE/n8n_workflow_video_rag_mcp.json create mode 100644 docs_v1.0/REFERENCE/n8n_workflow_video_search.json create mode 100755 docs_v1.0/REFERENCE/test_all.sh create mode 100755 docs_v1.0/REFERENCE/test_momentry_api.sh create mode 100755 docs_v1.0/REFERENCE/test_workflow.sh create mode 100644 docs_v1.0/STANDARDS/DOCS_STANDARD.md create mode 100644 docs_v1.0/STANDARDS/DOCS_STANDARD_IMPROVEMENT_PROPOSAL_2026_03_27.md diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..4d6a46c --- /dev/null +++ b/.gitignore @@ -0,0 +1,17 @@ +target/ +.DS_Store +.env +.env.development +*.gguf +*.mlpackage +*.pt +*.pth +*.bin +*.onnx +*.zip +*.tar.gz +venv/ +__pycache__/ +node_modules/ +*.log +/tmp/ diff --git a/docs_v1.0/API_V1.0.0/API_DOCUMENTATION.md b/docs_v1.0/API_V1.0.0/API_DOCUMENTATION.md new file mode 100644 index 0000000..0455a2f --- /dev/null +++ 
b/docs_v1.0/API_V1.0.0/API_DOCUMENTATION.md @@ -0,0 +1,1211 @@ +# Momentry Core API v1.0.0 + +**Release**: v1.0.0 +**Last Updated**: 2026-05-06 +**Base URL**: `http://{host}:{port}` (dev: 3003, prod: 3002) + +--- + +## Authentication + +### API Key (Protected Routes) + +``` +Header: X-API-Key: +``` + +Protected routes require a valid API key in the `X-API-Key` header. Unauthorized requests return `401 Unauthorized`. + +### Login (Unprotected) + +``` +POST /api/v1/auth/login +Content-Type: application/json + +{ + "username": "string", + "password": "string" +} +``` + +Response `200`: +```json +{ + "success": true, + "message": "Login successful", + "api_key": "muser_xxx_xxx", + "user": { "id": 1, "name": "string" } +} +``` + +--- + +## 1. File Management + +### 1.1 Register File + +Registers a video file into the system. Runs ffprobe probe + scene detection synchronously. + +``` +POST /api/v1/files/register +X-API-Key: +Content-Type: application/json + +{ + "file_path": "/path/to/video.mp4", + "pattern": null, + "user_id": null +} +``` + +Response `200`: +```json +{ + "success": true, + "file_uuid": "32-char-hex-string", + "file_name": "video.mp4", + "file_path": "/path/to/video.mp4", + "file_type": "video", + "duration": 6879.0, + "width": 1920, + "height": 1080, + "fps": 25.0, + "total_frames": 171975, + "registration_time": "2026-05-06T12:00:00Z", + "already_exists": false, + "message": "File registered successfully" +} +``` + +### 1.2 Unregister File + +``` +POST /api/v1/unregister +X-API-Key: +Content-Type: application/json + +{ + "uuid": "32-char-hex-string", + "file_path": null, + "pattern": null +} +``` + +Response `200`: +```json +{ + "success": true, + "uuid": "32-char-hex-string", + "message": "File unregistered successfully", + "deleted_face_detections": 6186, + "deleted_processor_results": 42, + "deleted_chunks": 10546 +} +``` + +### 1.3 Scan Files + +Scans the configured watch directory and reports all files found. 
+ +``` +GET /api/v1/files/scan +X-API-Key: +``` + +Response `200`: +```json +{ + "files": [ + { + "name": "video.mp4", + "path": "/data/demo/video.mp4", + "size": 1600000000, + "is_registered": true, + "file_uuid": "32-char-hex-string" + } + ], + "total": 22, + "registered_count": 20, + "unregistered_count": 2 +} +``` + +### 1.4 File Probe + +Returns ffprobe metadata for a registered video. + +``` +GET /api/v1/file/{file_uuid}/probe +X-API-Key: +``` + +Response `200`: +```json +{ + "file_uuid": "32-char-hex-string", + "file_name": "video.mp4", + "duration": 6785.0, + "width": 1920, + "height": 1080, + "fps": 25.0, + "total_frames": 169625, + "cached": true, + "format": "mov,mp4,m4a,3gp,3g2,mj2", + "streams": [ + { "index": 0, "codec_type": "video", "codec_name": "av1", "width": 1920, "height": 1080 }, + { "index": 1, "codec_type": "audio", "codec_name": "opus", "sample_rate": 48000, "channels": 2 } + ] +} +``` + +### 1.5 Trigger Processing + +Triggers video processing pipeline for the specified processors. + +``` +POST /api/v1/file/{file_uuid}/process +X-API-Key: +Content-Type: application/json + +{ + "processors": ["asr", "cut", "yolo", "ocr", "face", "pose", "asrx"] +} +``` + +Response `200`: +```json +{ + "job_id": 139, + "file_uuid": "32-char-hex-string", + "status": "PENDING", + "pids": [], + "message": "Processing triggered for video.mp4" +} +``` + +### 1.6 List Pre-Chunks + +Lists pre-chunks (raw processor output) for a video with pagination. 
+ +``` +GET /api/v1/file/{file_uuid}/chunks +X-API-Key: +Query: ?processor_type=face&page=1&page_size=20 +``` + +Response `200`: +```json +{ + "pre_chunks": [ + { + "id": 537507, + "processor_type": "asr", + "coordinate_type": "time", + "coordinate_index": 0, + "start_frame": null, + "end_frame": null, + "start_time": 1.66, + "end_time": 18.95, + "fps": 24.0, + "data": { "text": "Hello and welcome...", "language": "en" }, + "created_at": "2026-05-06T12:00:00.000000Z" + } + ], + "count": 3, + "page": 1, + "page_size": 20 +} +``` + +### 1.7 List Jobs + +``` +GET /api/v1/jobs +X-API-Key: +Query: ?page=1&page_size=10&status=completed +``` + +Response `200`: +```json +{ + "jobs": [ + { + "id": 139, + "uuid": "32-char-hex-string", + "status": "completed", + "current_processor": null + } + ], + "count": 1, + "page": 1, + "page_size": 10 +} +``` + +### 1.8 Get Progress + +``` +GET /api/v1/progress/{uuid} +X-API-Key: +``` + +Response `200`: +```json +{ + "file_uuid": "32-char-hex-string", + "overall_progress": 100.0, + "processors": [ + { "type": "asr", "status": "completed", "progress": 100.0 }, + { "type": "face", "status": "completed", "progress": 100.0 } + ] +} +``` + +--- + +## 2. 
Videos List (Unprotected) + +### 2.1 List Videos + +``` +GET /api/v1/files +Query: ?page=1&page_size=10&uuid=xxx +``` + +Response `200`: +```json +{ + "success": true, + "total": 25, + "page": 1, + "page_size": 10, + "data": [ + { + "file_uuid": "32-char-hex-string", + "file_name": "video.mp4", + "file_path": "/data/demo/video.mp4", + "duration": 6785.0, + "status": "completed", + "created_at": "2026-05-06T12:00:00Z" + } + ] +} +``` + +### 2.2 Get File Detail + +``` +GET /api/v1/file/{file_uuid} +``` + +Response `200`: +```json +{ + "success": true, + "file_uuid": "32-char-hex-string", + "file_name": "video.mp4", + "file_path": "/data/demo/video.mp4", + "metadata": {}, + "created_at": "2026-05-06T12:00:00Z" +} +``` + +### 2.3 Get File Identities + +``` +GET /api/v1/file/{file_uuid}/identities +Query: ?page=1&page_size=20 +``` + +--- + +## 3. Media & Video Streaming + +### 3.1 Stream Video + +Streams video with HTTP range support for seeking. + +``` +GET /api/v1/file/{file_uuid}/video +Headers: Range: bytes=0-1000000 +``` + +Returns `video/mp4` binary with `206 Partial Content` if Range header provided. + +### 3.2 BBOX Overlay Video + +Returns video with face bounding boxes overlaid. + +``` +GET /api/v1/file/{file_uuid}/video/bbox +Query: ?start=0&end=300&face_uuid=xxx +``` + +Returns `video/mp4` binary with red bboxes drawn at frame intervals. + +### 3.3 Trace Video + +Returns video highlighting a specific face trace with text label. + +``` +GET /api/v1/file/{file_uuid}/trace/{trace_id}/video +Query: ?padding=1 +``` + +Returns `video/mp4` binary. Shows face trace with ID label held at last detection position. + +### 3.4 Thumbnail + +Extracts a single frame as JPEG thumbnail. + +``` +GET /api/v1/file/{file_uuid}/thumbnail +Query: ?frame=840&x=0&y=0&w=100&h=100 +``` + +Returns `image/jpeg` binary. + +--- + +## 4. 
Identity Management + +### 4.1 List Identities (Protected) + +``` +GET /api/v1/identities +X-API-Key: +Query: ?page=1&page_size=20 +``` + +Response `200`: +```json +{ + "identities": [ + { + "identity_uuid": "uuid-string", + "name": "Cary Grant", + "identity_type": "actor", + "face_count": 120, + "confidence": 0.95 + } + ], + "count": 41, + "page": 1, + "page_size": 20 +} +``` + +### 4.2 Create Identity + +``` +POST /api/v1/identity +X-API-Key: +Content-Type: application/json + +{ + "face_json_path": "/path/to/face.json", + "identity_name": "Cary Grant" +} +``` + +### 4.3 Get Identity Detail (Unprotected) + +``` +GET /api/v1/identity/{identity_uuid} +``` + +Response `200`: +```json +{ + "success": true, + "uuid": "identity-uuid", + "name": "Cary Grant", + "identity_type": "actor", + "source": "tmdb", + "status": "active", + "metadata": {}, + "reference_data": {}, + "tmdb_id": 1234, + "tmdb_profile": "/path/to/profile.jpg", + "created_at": "2026-01-01T00:00:00Z", + "updated_at": "2026-05-06T00:00:00Z" +} +``` + +> `tmdb_id` and `tmdb_profile` are present only when `identity_type` is `"actor"`. Other types (such as `"stranger"`) do not include these fields. + + +### 4.4 Delete Identity + +``` +DELETE /api/v1/identity/{identity_uuid} +``` + +Returns `204 No Content`. 
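The note in 4.3 says that `tmdb_id` and `tmdb_profile` appear only for identities of type `"actor"`, so a client has to treat both fields as optional. The sketch below shows one way to do that in Python. The helper name `tmdb_info` and the sample payloads are assumptions made for illustration; only the field names come from the response example above.

```python
import json
from typing import Optional, Tuple


def tmdb_info(identity: dict) -> Optional[Tuple[int, str]]:
    """Return (tmdb_id, tmdb_profile) when both fields are present, else None.

    Hypothetical helper: the field names follow the 4.3 response example;
    everything else is an assumption for illustration.
    """
    tmdb_id = identity.get("tmdb_id")
    tmdb_profile = identity.get("tmdb_profile")
    if tmdb_id is None or tmdb_profile is None:
        return None
    return tmdb_id, tmdb_profile


# Sample payloads modeled on the 4.3 response; all values are placeholders.
actor = json.loads(
    '{"uuid": "identity-uuid", "identity_type": "actor", '
    '"tmdb_id": 1234, "tmdb_profile": "/path/to/profile.jpg"}'
)
stranger = json.loads('{"uuid": "identity-uuid", "identity_type": "stranger"}')

print(tmdb_info(actor))
print(tmdb_info(stranger))
```

Using `dict.get` rather than indexing means a non-actor identity does not raise `KeyError`, and the caller decides how to handle the missing metadata.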
+ +### 4.5 Get Identity Files + +``` +GET /api/v1/identity/{identity_uuid}/files +Query: ?page=1&page_size=20 +``` + +### 4.6 Get Identity Chunks + +``` +GET /api/v1/identity/{identity_uuid}/chunks +Query: ?page=1&page_size=20 +``` + +### 4.7 List Face Candidates (Protected) + +``` +GET /api/v1/faces/candidates +X-API-Key: +Query: ?file_uuid=xxx&min_confidence=0.5&page=1&page_size=20 +``` + +### 4.8 Bind Face to Identity + +``` +POST /api/v1/identity/{identity_uuid}/bind +X-API-Key: +Content-Type: application/json + +{ + "file_uuid": "32-char-hex-string", + "face_id": "face_123" +} +``` + +Response `200`: +```json +{ + "success": true, + "message": "Face bound to identity", + "data": { "rows_affected": 1 } +} +``` + +### 4.9 Unbind Face from Identity + +``` +POST /api/v1/identity/{identity_uuid}/unbind +X-API-Key: +Content-Type: application/json + +{ + "file_uuid": "32-char-hex-string", + "face_id": "face_123" +} +``` + +### 4.10 Merge Identities + +``` +POST /api/v1/identity/{from_uuid}/mergeinto +X-API-Key: +Content-Type: application/json + +{ + "into_uuid": "target-identity-uuid", + "keep_history": true +} +``` + +--- + +## 5. Search + +### 5.1 Universal Search + +Multi-type search across chunks, frames, and persons. + +``` +POST /api/v1/search/universal +Content-Type: application/json + +{ + "query": "Cary Grant", + "uuid": "32-char-hex-string", + "types": ["chunk", "frame", "person"], + "time_range": null, + "filters": null, + "limit": 10, + "offset": 0 +} +``` + +Response `200`: +```json +{ + "query": "Cary Grant", + "results": [ + { + "type": "chunk", + "chunk_id": "chunk_123", + "score": 0.9, + "text": "[59s-77s] Cast: Cary Grant, Walter Matthau.", + "start_time": 59.0, + "end_time": 77.0, + "start_frame": 1475, + "end_frame": 1925, + "fps": 25.0, + "speaker_id": null, + "metadata": {} + } + ], + "total": 3, + "took_ms": 45 +} +``` + +### 5.2 Smart Search + +LLM-powered search with query understanding. 
+ +``` +POST /api/v1/search/smart +Content-Type: application/json + +{ + "uuid": "32-char-hex-string", + "query": "who said how do you shave in there?", + "limit": 10 +} +``` + +Response `200`: +```json +{ + "query": "who said how do you shave in there?", + "results": [ + { + "chunk_id": "chunk_123", + "type": "sentence", + "score": 0.95, + "text": "[2035s-2038s] Cary Grant: \"how do you shave in there?\"", + "start_time": 2035.09, + "end_time": 2037.62, + "start_frame": 50877, + "end_frame": 50940, + "fps": 25.0 + } + ], + "strategy": "semantic" +} +``` + +### 5.3 Frame Search + +Search individual video frames by object class, OCR text, or face. + +``` +POST /api/v1/search/frames +Content-Type: application/json + +{ + "uuid": "32-char-hex-string", + "object_class": "person", + "ocr_text": "welcome", + "face_id": null, + "time_range": null, + "limit": 20 +} +``` + +Response `200`: +```json +{ + "frames": [ + { + "frame_number": 54, + "timestamp": 2.16, + "score": 0.85, + "objects": ["person"], + "ocr_texts": ["welcome"], + "faces": ["face_1"], + "pose_persons": [] + } + ], + "total": 1 +} +``` + +### 5.4 Visual Chunk Search + +Searches for visual chunks (time segments with object detections) matching criteria. 
+ +``` +POST /api/v1/search/visual +Content-Type: application/json + +{ + "uuid": "32-char-hex-string", + "criteria": { + "min_unique_classes": 2, + "required_classes": ["person", "car"] + } +} +``` + +Response `200`: +```json +{ + "chunks": [ + { + "chunk_id": "vis_001", + "start_time": 120.0, + "end_time": 135.0, + "start_frame": 3000, + "end_frame": 3375, + "fps": 25.0, + "object_classes": ["person", "car"], + "total_objects": 5 + } + ], + "total": 1 +} +``` + +### 5.5 Visual Chunk Search by Class + +``` +POST /api/v1/search/visual/class +Content-Type: application/json + +{ + "uuid": "32-char-hex-string", + "object_class": "car", + "min_count": 1, + "max_count": 10 +} +``` + +Response `200`: +```json +{ + "chunks": [ + { + "chunk_id": "vis_001", + "start_time": 120.0, + "end_time": 135.0, + "start_frame": 3000, + "end_frame": 3375, + "fps": 25.0, + "object_class": "car", + "count": 3 + } + ], + "total": 1 +} +``` + +### 5.6 Visual Chunk Search by Density + +``` +POST /api/v1/search/visual/density +Content-Type: application/json + +{ + "uuid": "32-char-hex-string", + "min_density": 0.1, + "max_density": 0.8 +} +``` + +Response `200`: +```json +{ + "chunks": [ + { + "chunk_id": "vis_001", + "start_time": 120.0, + "end_time": 135.0, + "start_frame": 3000, + "end_frame": 3375, + "fps": 25.0, + "density": 0.35 + } + ], + "total": 1 +} +``` + +### 5.7 Visual Chunk Stats + +``` +POST /api/v1/search/visual/stats +Content-Type: application/json + +{ + "uuid": "32-char-hex-string" +} +``` + +Response `200`: +```json +{ + "uuid": "32-char-hex-string", + "stats": { + "total_chunks": 45, + "total_frames": 18000, + "unique_classes": ["person", "car", "dog"], + "class_counts": { "person": 120, "car": 30, "dog": 5 } + } +} +``` + +### 5.8 Visual Chunk Search by Combination + +``` +POST /api/v1/search/visual/combination +Content-Type: application/json + +{ + "uuid": "32-char-hex-string", + "combination": [["person", 1], ["car", 1]] +} +``` + +Response `200`: +```json +{ + 
"chunks": [ + { + "chunk_id": "vis_001", + "start_time": 120.0, + "end_time": 135.0, + "start_frame": 3000, + "end_frame": 3375, + "fps": 25.0, + "combination": [["person", 2], ["car", 1]], + "total_objects": 3 + } + ], + "total": 1 +} +``` + +--- + +## 6. Agents + +### 6.1 Translate Text + +``` +POST /api/v1/agents/translate +Content-Type: application/json + +{ + "text": "Hello world", + "target_language": "zh-TW", + "source_language": null +} +``` + +Response `200`: +```json +{ + "success": true, + "translated_text": "你好世界", + "source_language_detected": "en", + "model_used": "gemma4" +} +``` + +### 6.2 5W1H Analyze + +Generates 5W1H+ summary for scenes in a video using LLM. + +``` +POST /api/v1/agents/5w1h/analyze +Content-Type: application/json + +{ + "file_uuid": "32-char-hex-string", + "scene_group_size": 7, + "model": "gemma-4-31B-it-Q5_K_M.gguf" +} +``` + +Response `200`: +```json +{ + "success": true, + "file_uuid": "32-char-hex-string", + "summaries": [ + { + "scene_number": 1, + "start_time": 59.0, + "end_time": 302.0, + "summary": "Cary Grant and Audrey Hepburn engage in a tense conversation...", + "who": "Cary Grant, Audrey Hepburn", + "what": "Conversation about a mysterious situation", + "where": "Paris apartment", + "when": "1963", + "why": "To uncover the truth about the stolen money", + "how": "Through dialogue and interrogation" + } + ], + "processing_status": { "status": "completed", "progress": 100.0 } +} +``` + +### 6.3 5W1H Batch + +``` +POST /api/v1/agents/5w1h/batch +Content-Type: application/json + +{ + "file_uuids": ["uuid1", "uuid2"], + "scene_group_size": 7 +} +``` + +### 6.4 5W1H Status + +``` +GET /api/v1/agents/5w1h/status +``` + +Response `200`: +```json +{ + "success": true, + "videos": [] +} +``` + +### 6.5 Identity Analyze + +``` +POST /api/v1/agents/identity/analyze +Content-Type: application/json + +{ + "file_uuid": "32-char-hex-string", + "use_llm": true, + "model": "gemma-4-31B-it-Q5_K_M.gguf" +} +``` + +### 6.6 Suggest 
Merges + +``` +POST /api/v1/agents/identity/suggest +Content-Type: application/json + +{ + "file_uuid": "32-char-hex-string" +} +``` + +### 6.7 Identity Agent Status + +``` +GET /api/v1/agents/identity/status +``` + +### 6.8 Suggest Clustering + +``` +POST /api/v1/agents/suggest/clustering +Content-Type: application/json + +{ + "file_uuid": "32-char-hex-string", + "min_cluster_size": 3, + "similarity_threshold": 0.7 +} +``` + +### 6.9 Suggest Merge + +``` +POST /api/v1/agents/suggest/merge +Content-Type: application/json + +{ + "identity_id": "identity-uuid", + "similarity_threshold": 0.75 +} +``` + +--- + +## 7. System & Configuration + +### 7.1 Health + +``` +GET /health +``` + +Response `200`: +```json +{ + "status": "ok", + "version": "1.0.0", + "uptime_ms": 1397142 +} +``` + +### 7.2 Detailed Health + +``` +GET /health/detailed +``` + +Response `200`: +```json +{ + "status": "ok", + "version": "1.0.0", + "uptime_ms": 1397142, + "services": { + "postgres": { "status": "ok" }, + "redis": { "status": "ok" }, + "qdrant": { "status": "ok" }, + "mongodb": { "status": "ok" } + } +} +``` + +### 7.3 Ingest Stats + +``` +GET /api/v1/stats/ingest +``` + +Response `200`: +```json +{ + "total_videos": 25, + "total_chunks": 10546, + "sentence_chunks": 7547, + "cut_chunks": 0, + "time_chunks": 0, + "searchable_chunks": 4, + "chunks_with_visual": 0, + "chunks_with_summary": 0, + "pending_videos": 1 +} +``` + +### 7.4 SFTPGo Status + +``` +GET /api/v1/stats/sftpgo +``` + +### 7.5 Inference Health + +``` +GET /api/v1/stats/inference +``` + +Response `200`: +```json +{ + "embedding": { + "engine": "EmbeddingGemma 300M (Python MPS)", + "model": "embeddinggemma-300m", + "status": "ok", + "latency_ms": 10, + "error": null + }, + "llm": { + "engine": "llama-server (Gemma4 26B MoE)", + "model": "google_gemma-4-26B-A4B-it-Q5_K_M.gguf", + "status": "ok", + "latency_ms": 0, + "error": null + } +} +``` + +### 7.6 Cache Toggle + +``` +POST /api/v1/config/cache +X-API-Key: +Content-Type: 
application/json + +{ + "enabled": true +} +``` + +--- + +## 8. Resource Management (Unprotected) + +### 8.1 List Resources + +``` +GET /api/v1/resources +``` + +### 8.2 Register Resource + +``` +POST /api/v1/resource/register +Content-Type: application/json + +{ + "resource_id": "worker-01", + "resource_type": "processor", + "category": "ml", + "capabilities": ["face", "asr"], + "config": {}, + "metadata": {} +} +``` + +### 8.3 Resource Heartbeat + +``` +POST /api/v1/resource/heartbeat +Content-Type: application/json + +{ + "resource_id": "worker-01", + "status": "running" +} +``` + +--- + +## 9. Auth Endpoints + +### 9.1 Login + +``` +POST /api/v1/auth/login +Content-Type: application/json + +{ + "username": "admin", + "password": "password" +} +``` + +Response `200` (success): +```json +{ + "success": true, + "message": "Login successful", + "api_key": "muser_xxx_xxx", + "user": { "id": 1, "name": "admin" } +} +``` + +Response `200` (failure): +```json +{ + "success": false, + "message": "Invalid username or password", + "api_key": null, + "user": null +} +``` + +### 9.2 Logout + +``` +POST /api/v1/auth/logout +``` + +Response `200`: +```json +{ "success": true } +``` + +--- + +## Common Error Responses + +### 401 Unauthorized +``` +No body (empty response) +``` + +### 404 Not Found +``` +No body (empty response) +``` + +### 500 Internal Server Error +``` +No body (empty response) +``` + +Agent endpoints may instead return a JSON error body: +```json +{ "error": "error description" } +``` + +--- + +## Processor Reference + +| Processor | Script | Description | Dependencies | Default | +|-----------|--------|-------------|-------------|---------| +| `asr` | `asr_processor.py` | Speech-to-text (faster-whisper) | None | Yes | +| `asrx` | `asrx_processor.py` | Speaker diarization | asr | Yes | +| `cut` | `cut_processor.py` | Scene detection (PySceneDetect) | None | Yes | +| `yolo` | `yolo_processor.py` | Object detection (YOLO) | None | Yes | +| `ocr` | `ocr_processor.py` 
| Text recognition | None | Yes | +| `face` | `face_processor.py` | Face detection + recognition (Vision + FaceNet) | None | Yes | +| `pose` | `pose_processor.py` | Pose estimation | None | Yes | +| `visual_chunk` | — | Visual object-based chunking | yolo | No | +| `story` | — | Narrative generation | asr + asrx + cut + yolo + face | No | + +--- + +## Post-Processing Pipeline + +After all specified processors complete, the system triggers: + +| Step | Trigger | Description | +|------|---------|-------------| +| **Rule 1 Chunking** | ASR + ASRX completed | Converts ASR segments into `sentence` chunks in `dev.chunks` | +| **Face Trace** | Face completed | Runs `store_traced_faces.py` to assign trace_ids, stores in `dev.face_detections` | +| **Qdrant Face Sync** | After Face Trace | Syncs face embeddings to Qdrant `_face` collection | +| **Rule 3 Scene Chunking** | All processors completed | Groups sentence chunks by scene boundaries, generates LLM 5W1H summaries | +| **5W1H Agent** | After Rule 3 | Generates 5W1H+ analysis for each scene | + +--- + +## Environment Variables + +| Variable | Default | Description | +|----------|---------|-------------| +| `DATABASE_URL` | `postgres://accusys@localhost:5432/momentry` | PostgreSQL connection | +| `DATABASE_SCHEMA` | `public` | Database schema name | +| `MOMENTRY_SERVER_PORT` | `3002` (prod) / `3003` (dev) | API server port | +| `MOMENTRY_REDIS_PREFIX` | `momentry:` / `momentry_dev:` | Redis key prefix | +| `MOMENTRY_API_KEY` | — | API key for authentication | +| `MOMENTRY_OUTPUT_DIR` | `~/momentry/output` | Output JSON directory | +| `MOMENTRY_SCRIPTS_DIR` | `./scripts` | Python scripts directory | +| `MOMENTRY_PYTHON_PATH` | `python3` | Python interpreter path | +| `MOMENTRY_LLM_SUMMARY_URL` | `http://127.0.0.1:8081/v1/chat/completions` | LLM endpoint for 5W1H | +| `MOMENTRY_LLM_SUMMARY_MODEL` | `gemma4` | LLM model name for summaries | +| `MOMENTRY_LLM_SUMMARY_ENABLED` | `true` | Enable/disable LLM summaries | +| 
`REDIS_URL` | `redis://:accusys@localhost:6379` | Redis connection | + +--- + +## Status Codes + +| Code | Description | +|------|-------------| +| 200 | Success | +| 204 | No Content (DELETE success) | +| 206 | Partial Content (video range requests) | +| 400 | Bad Request | +| 401 | Unauthorized (missing/invalid API key) | +| 404 | Not Found | +| 500 | Internal Server Error | diff --git a/docs_v1.0/API_V1.0.0/API_DOCUMENTATION_v1.0.0.md b/docs_v1.0/API_V1.0.0/API_DOCUMENTATION_v1.0.0.md new file mode 100644 index 0000000..1539e4b --- /dev/null +++ b/docs_v1.0/API_V1.0.0/API_DOCUMENTATION_v1.0.0.md @@ -0,0 +1,1211 @@ +# Momentry Core API v1.0.0 + +**Release**: v1.0.0 +**Last Updated**: 2026-05-06 +**Base URL**: `http://{host}:{port}` (dev: 3003, prod: 3002) + +--- + +## Authentication + +### API Key (Protected Routes) + +``` +Header: X-API-Key: +``` + +Protected routes require a valid API key in the `X-API-Key` header. Unauthorized requests return `401 Unauthorized`. + +### Login (Unprotected) + +``` +POST /api/v1/auth/login +Content-Type: application/json + +{ + "username": "string", + "password": "string" +} +``` + +Response `200`: +```json +{ + "success": true, + "message": "Login successful", + "api_key": "muser_xxx_xxx", + "user": { "id": 1, "name": "string" } +} +``` + +--- + +## 1. File Management + +### 1.1 Register File + +Registers a video file into the system. Runs ffprobe probe + scene detection synchronously. 
+ +``` +POST /api/v1/files/register +X-API-Key: +Content-Type: application/json + +{ + "file_path": "/path/to/video.mp4", + "pattern": null, + "user_id": null +} +``` + +Response `200`: +```json +{ + "success": true, + "file_uuid": "32-char-hex-string", + "file_name": "video.mp4", + "file_path": "/path/to/video.mp4", + "file_type": "video", + "duration": 6879.0, + "width": 1920, + "height": 1080, + "fps": 25.0, + "total_frames": 171975, + "registration_time": "2026-05-06T12:00:00Z", + "already_exists": false, + "message": "File registered successfully" +} +``` + +### 1.2 Unregister File + +``` +POST /api/v1/unregister +X-API-Key: +Content-Type: application/json + +{ + "uuid": "32-char-hex-string", + "file_path": null, + "pattern": null +} +``` + +Response `200`: +```json +{ + "success": true, + "uuid": "32-char-hex-string", + "message": "File unregistered successfully", + "deleted_face_detections": 6186, + "deleted_processor_results": 42, + "deleted_chunks": 10546 +} +``` + +### 1.3 Scan Files + +Scans the configured watch directory and reports all files found. + +``` +GET /api/v1/files/scan +X-API-Key: +``` + +Response `200`: +```json +{ + "files": [ + { + "name": "video.mp4", + "path": "/data/demo/video.mp4", + "size": 1600000000, + "is_registered": true, + "file_uuid": "32-char-hex-string" + } + ], + "total": 22, + "registered_count": 20, + "unregistered_count": 2 +} +``` + +### 1.4 File Probe + +Returns ffprobe metadata for a registered video. 
+ +``` +GET /api/v1/file/{file_uuid}/probe +X-API-Key: +``` + +Response `200`: +```json +{ + "file_uuid": "32-char-hex-string", + "file_name": "video.mp4", + "duration": 6785.0, + "width": 1920, + "height": 1080, + "fps": 25.0, + "total_frames": 169625, + "cached": true, + "format": "mov,mp4,m4a,3gp,3g2,mj2", + "streams": [ + { "index": 0, "codec_type": "video", "codec_name": "av1", "width": 1920, "height": 1080 }, + { "index": 1, "codec_type": "audio", "codec_name": "opus", "sample_rate": 48000, "channels": 2 } + ] +} +``` + +### 1.5 Trigger Processing + +Triggers video processing pipeline for the specified processors. + +``` +POST /api/v1/file/{file_uuid}/process +X-API-Key: +Content-Type: application/json + +{ + "processors": ["asr", "cut", "yolo", "ocr", "face", "pose", "asrx"] +} +``` + +Response `200`: +```json +{ + "job_id": 139, + "file_uuid": "32-char-hex-string", + "status": "PENDING", + "pids": [], + "message": "Processing triggered for video.mp4" +} +``` + +### 1.6 List Pre-Chunks + +Lists pre-chunks (raw processor output) for a video with pagination. 
+ +``` +GET /api/v1/file/{file_uuid}/chunks +X-API-Key: +Query: ?processor_type=face&page=1&page_size=20 +``` + +Response `200`: +```json +{ + "pre_chunks": [ + { + "id": 537507, + "processor_type": "asr", + "coordinate_type": "time", + "coordinate_index": 0, + "start_frame": null, + "end_frame": null, + "start_time": 1.66, + "end_time": 18.95, + "fps": 24.0, + "data": { "text": "Hello and welcome...", "language": "en" }, + "created_at": "2026-05-06T12:00:00.000000Z" + } + ], + "count": 3, + "page": 1, + "page_size": 20 +} +``` + +### 1.7 List Jobs + +``` +GET /api/v1/jobs +X-API-Key: +Query: ?page=1&page_size=10&status=completed +``` + +Response `200`: +```json +{ + "jobs": [ + { + "id": 139, + "uuid": "32-char-hex-string", + "status": "completed", + "current_processor": null + } + ], + "count": 1, + "page": 1, + "page_size": 10 +} +``` + +### 1.8 Get Progress + +``` +GET /api/v1/progress/{uuid} +X-API-Key: +``` + +Response `200`: +```json +{ + "file_uuid": "32-char-hex-string", + "overall_progress": 100.0, + "processors": [ + { "type": "asr", "status": "completed", "progress": 100.0 }, + { "type": "face", "status": "completed", "progress": 100.0 } + ] +} +``` + +--- + +## 2. 
Videos List (Unprotected) + +### 2.1 List Videos + +``` +GET /api/v1/files +Query: ?page=1&page_size=10&uuid=xxx +``` + +Response `200`: +```json +{ + "success": true, + "total": 25, + "page": 1, + "page_size": 10, + "data": [ + { + "file_uuid": "32-char-hex-string", + "file_name": "video.mp4", + "file_path": "/data/demo/video.mp4", + "duration": 6785.0, + "status": "completed", + "created_at": "2026-05-06T12:00:00Z" + } + ] +} +``` + +### 2.2 Get File Detail + +``` +GET /api/v1/file/{file_uuid} +``` + +Response `200`: +```json +{ + "success": true, + "file_uuid": "32-char-hex-string", + "file_name": "video.mp4", + "file_path": "/data/demo/video.mp4", + "metadata": {}, + "created_at": "2026-05-06T12:00:00Z" +} +``` + +### 2.3 Get File Identities + +``` +GET /api/v1/file/{file_uuid}/identities +Query: ?page=1&page_size=20 +``` + +--- + +## 3. Media & Video Streaming + +### 3.1 Stream Video + +Streams video with HTTP range support for seeking. + +``` +GET /api/v1/file/{file_uuid}/video +Headers: Range: bytes=0-1000000 +``` + +Returns `video/mp4` binary with `206 Partial Content` if Range header provided. + +### 3.2 BBOX Overlay Video + +Returns video with face bounding boxes overlaid. + +``` +GET /api/v1/file/{file_uuid}/video/bbox +Query: ?start=0&end=300&face_uuid=xxx +``` + +Returns `video/mp4` binary with red bboxes drawn at frame intervals. + +### 3.3 Trace Video + +Returns video highlighting a specific face trace with text label. + +``` +GET /api/v1/file/{file_uuid}/trace/{trace_id}/video +Query: ?padding=1 +``` + +Returns `video/mp4` binary. Shows face trace with ID label held at last detection position. + +### 3.4 Thumbnail + +Extracts a single frame as JPEG thumbnail. + +``` +GET /api/v1/file/{file_uuid}/thumbnail +Query: ?frame=840&x=0&y=0&w=100&h=100 +``` + +Returns `image/jpeg` binary. + +--- + +## 4. 
Identity Management + +### 4.1 List Identities (Protected) + +``` +GET /api/v1/identities +X-API-Key: +Query: ?page=1&page_size=20 +``` + +Response `200`: +```json +{ + "identities": [ + { + "identity_uuid": "uuid-string", + "name": "Cary Grant", + "identity_type": "actor", + "face_count": 120, + "confidence": 0.95 + } + ], + "count": 41, + "page": 1, + "page_size": 20 +} +``` + +### 4.2 Create Identity + +``` +POST /api/v1/identity +X-API-Key: +Content-Type: application/json + +{ + "face_json_path": "/path/to/face.json", + "identity_name": "Cary Grant" +} +``` + +### 4.3 Get Identity Detail (Unprotected) + +``` +GET /api/v1/identity/{identity_uuid} +``` + +Response `200`: +```json +{ + "success": true, + "uuid": "identity-uuid", + "name": "Cary Grant", + "identity_type": "actor", + "source": "tmdb", + "status": "active", + "metadata": {}, + "reference_data": {}, + "tmdb_id": 1234, + "tmdb_profile": "/path/to/profile.jpg", + "created_at": "2026-01-01T00:00:00Z", + "updated_at": "2026-05-06T00:00:00Z" +} +``` + +> `tmdb_id` and `tmdb_profile` are present only when `identity_type` is `"actor"`. Other types (such as `"stranger"`) do not include these fields. + + +### 4.4 Delete Identity + +``` +DELETE /api/v1/identity/{identity_uuid} +``` + +Returns `204 No Content`. 
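For the identity detail response in 4.3, a client should treat `tmdb_id` and `tmdb_profile` as optional, since they are returned only for `actor` identities. A minimal sketch of one way to handle that follows; `summarizeIdentity` and the shape of its return value are illustrative assumptions, not part of the API.

```javascript
// Hypothetical helper (not part of the Momentry API): condenses an identity
// detail response into a flat object, treating the TMDb fields as optional.
function summarizeIdentity(detail) {
  const summary = {
    uuid: detail.uuid,
    name: detail.name,
    type: detail.identity_type,
  };
  // tmdb_id / tmdb_profile are documented as present only for "actor".
  if (detail.identity_type === "actor" && detail.tmdb_id !== undefined) {
    summary.tmdb = { id: detail.tmdb_id, profile: detail.tmdb_profile ?? null };
  }
  return summary;
}

// Sample payloads modelled on the documented response; values are placeholders.
const actor = summarizeIdentity({
  success: true,
  uuid: "identity-uuid",
  name: "Cary Grant",
  identity_type: "actor",
  tmdb_id: 1234,
  tmdb_profile: "/path/to/profile.jpg",
});
const stranger = summarizeIdentity({
  success: true,
  uuid: "other-uuid",
  name: "Unknown",
  identity_type: "stranger",
});

console.log(actor);
console.log(stranger);
```

Keeping the TMDb data under a single optional `tmdb` key means callers only need one presence check instead of two.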
+ +### 4.5 Get Identity Files + +``` +GET /api/v1/identity/{identity_uuid}/files +Query: ?page=1&page_size=20 +``` + +### 4.6 Get Identity Chunks + +``` +GET /api/v1/identity/{identity_uuid}/chunks +Query: ?page=1&page_size=20 +``` + +### 4.7 List Face Candidates (Protected) + +``` +GET /api/v1/faces/candidates +X-API-Key: +Query: ?file_uuid=xxx&min_confidence=0.5&page=1&page_size=20 +``` + +### 4.8 Bind Face to Identity + +``` +POST /api/v1/identity/{identity_uuid}/bind +X-API-Key: +Content-Type: application/json + +{ + "file_uuid": "32-char-hex-string", + "face_id": "face_123" +} +``` + +Response `200`: +```json +{ + "success": true, + "message": "Face bound to identity", + "data": { "rows_affected": 1 } +} +``` + +### 4.9 Unbind Face from Identity + +``` +POST /api/v1/identity/{identity_uuid}/unbind +X-API-Key: +Content-Type: application/json + +{ + "file_uuid": "32-char-hex-string", + "face_id": "face_123" +} +``` + +### 4.10 Merge Identities + +``` +POST /api/v1/identity/{from_uuid}/mergeinto +X-API-Key: +Content-Type: application/json + +{ + "into_uuid": "target-identity-uuid", + "keep_history": true +} +``` + +--- + +## 5. Search + +### 5.1 Universal Search + +Multi-type search across chunks, frames, and persons. + +``` +POST /api/v1/search/universal +Content-Type: application/json + +{ + "query": "Cary Grant", + "uuid": "32-char-hex-string", + "types": ["chunk", "frame", "person"], + "time_range": null, + "filters": null, + "limit": 10, + "offset": 0 +} +``` + +Response `200`: +```json +{ + "query": "Cary Grant", + "results": [ + { + "type": "chunk", + "chunk_id": "chunk_123", + "score": 0.9, + "text": "[59s-77s] Cast: Cary Grant, Walter Matthau.", + "start_time": 59.0, + "end_time": 77.0, + "start_frame": 1475, + "end_frame": 1925, + "fps": 25.0, + "speaker_id": null, + "metadata": {} + } + ], + "total": 3, + "took_ms": 45 +} +``` + +### 5.2 Smart Search + +LLM-powered search with query understanding. 
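A client call to this endpoint could be prepared as sketched below. The body fields (`uuid`, `query`, `limit`) follow the request documented in this section; the helper name, the empty-query check, and `BASE_URL` are assumptions made for illustration.

```javascript
// Hypothetical helper: builds fetch() options for POST /api/v1/search/smart.
function buildSmartSearchRequest(fileUuid, query, limit = 10) {
  // Assumption: reject blank queries on the client before calling the server.
  if (typeof query !== "string" || query.trim() === "") {
    throw new Error("query must not be empty");
  }
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ uuid: fileUuid, query: query.trim(), limit }),
  };
}

const options = buildSmartSearchRequest(
  "32-char-hex-string",
  "who said how do you shave in there?"
);
console.log(options.body);

// Usage sketch (BASE_URL is an assumption, e.g. the host and port of the API server):
// const res = await fetch(`${BASE_URL}/api/v1/search/smart`, options);
// const { results, strategy } = await res.json();
```

Separating request construction from the network call keeps the payload easy to inspect and test without a running server.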
+ +``` +POST /api/v1/search/smart +Content-Type: application/json + +{ + "uuid": "32-char-hex-string", + "query": "who said how do you shave in there?", + "limit": 10 +} +``` + +Response `200`: +```json +{ + "query": "who said how do you shave in there?", + "results": [ + { + "chunk_id": "chunk_123", + "type": "sentence", + "score": 0.95, + "text": "[2035s-2038s] Cary Grant: \"how do you shave in there?\"", + "start_time": 2035.09, + "end_time": 2037.62, + "start_frame": 50877, + "end_frame": 50940, + "fps": 25.0 + } + ], + "strategy": "semantic" +} +``` + +### 5.3 Frame Search + +Search individual video frames by object class, OCR text, or face. + +``` +POST /api/v1/search/frames +Content-Type: application/json + +{ + "uuid": "32-char-hex-string", + "object_class": "person", + "ocr_text": "welcome", + "face_id": null, + "time_range": null, + "limit": 20 +} +``` + +Response `200`: +```json +{ + "frames": [ + { + "frame_number": 54, + "timestamp": 2.16, + "score": 0.85, + "objects": ["person"], + "ocr_texts": ["welcome"], + "faces": ["face_1"], + "pose_persons": [] + } + ], + "total": 1 +} +``` + +### 5.4 Visual Chunk Search + +Searches for visual chunks (time segments with object detections) matching criteria. 
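In the sample responses in this section, each visual chunk carries both second-based and frame-based coordinates, and the values are consistent with `frame = time × fps` (120.0 s at 25 fps gives frame 3000). The sketch below converts between the two; the helper names and the choice to round down are assumptions, since the server's rounding rule is not documented here.

```javascript
// Hypothetical helpers (not part of the Momentry API) for converting between
// the time and frame coordinates reported on a chunk.
function timeToFrame(seconds, fps) {
  if (!(fps > 0)) throw new Error("fps must be positive");
  // Assumption: partial frames are rounded down.
  return Math.floor(seconds * fps);
}

function frameToTime(frame, fps) {
  if (!(fps > 0)) throw new Error("fps must be positive");
  return frame / fps;
}

// Values taken from the sample visual chunk response.
const chunk = { start_time: 120.0, end_time: 135.0, fps: 25.0 };
console.log(timeToFrame(chunk.start_time, chunk.fps)); // 3000
console.log(timeToFrame(chunk.end_time, chunk.fps));   // 3375
console.log(frameToTime(3000, chunk.fps));             // 120
```

A conversion like this is useful when a result's `start_time` has to be mapped to the `frame` parameter of the thumbnail endpoint.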
+ +``` +POST /api/v1/search/visual +Content-Type: application/json + +{ + "uuid": "32-char-hex-string", + "criteria": { + "min_unique_classes": 2, + "required_classes": ["person", "car"] + } +} +``` + +Response `200`: +```json +{ + "chunks": [ + { + "chunk_id": "vis_001", + "start_time": 120.0, + "end_time": 135.0, + "start_frame": 3000, + "end_frame": 3375, + "fps": 25.0, + "object_classes": ["person", "car"], + "total_objects": 5 + } + ], + "total": 1 +} +``` + +### 5.5 Visual Chunk Search by Class + +``` +POST /api/v1/search/visual/class +Content-Type: application/json + +{ + "uuid": "32-char-hex-string", + "object_class": "car", + "min_count": 1, + "max_count": 10 +} +``` + +Response `200`: +```json +{ + "chunks": [ + { + "chunk_id": "vis_001", + "start_time": 120.0, + "end_time": 135.0, + "start_frame": 3000, + "end_frame": 3375, + "fps": 25.0, + "object_class": "car", + "count": 3 + } + ], + "total": 1 +} +``` + +### 5.6 Visual Chunk Search by Density + +``` +POST /api/v1/search/visual/density +Content-Type: application/json + +{ + "uuid": "32-char-hex-string", + "min_density": 0.1, + "max_density": 0.8 +} +``` + +Response `200`: +```json +{ + "chunks": [ + { + "chunk_id": "vis_001", + "start_time": 120.0, + "end_time": 135.0, + "start_frame": 3000, + "end_frame": 3375, + "fps": 25.0, + "density": 0.35 + } + ], + "total": 1 +} +``` + +### 5.7 Visual Chunk Stats + +``` +POST /api/v1/search/visual/stats +Content-Type: application/json + +{ + "uuid": "32-char-hex-string" +} +``` + +Response `200`: +```json +{ + "uuid": "32-char-hex-string", + "stats": { + "total_chunks": 45, + "total_frames": 18000, + "unique_classes": ["person", "car", "dog"], + "class_counts": { "person": 120, "car": 30, "dog": 5 } + } +} +``` + +### 5.8 Visual Chunk Search by Combination + +``` +POST /api/v1/search/visual/combination +Content-Type: application/json + +{ + "uuid": "32-char-hex-string", + "combination": [["person", 1], ["car", 1]] +} +``` + +Response `200`: +```json +{ + 
"chunks": [ + { + "chunk_id": "vis_001", + "start_time": 120.0, + "end_time": 135.0, + "start_frame": 3000, + "end_frame": 3375, + "fps": 25.0, + "combination": [["person", 2], ["car", 1]], + "total_objects": 3 + } + ], + "total": 1 +} +``` + +--- + +## 6. Agents + +### 6.1 Translate Text + +``` +POST /api/v1/agents/translate +Content-Type: application/json + +{ + "text": "Hello world", + "target_language": "zh-TW", + "source_language": null +} +``` + +Response `200`: +```json +{ + "success": true, + "translated_text": "你好世界", + "source_language_detected": "en", + "model_used": "gemma4" +} +``` + +### 6.2 5W1H Analyze + +Generates 5W1H+ summary for scenes in a video using LLM. + +``` +POST /api/v1/agents/5w1h/analyze +Content-Type: application/json + +{ + "file_uuid": "32-char-hex-string", + "scene_group_size": 7, + "model": "gemma-4-31B-it-Q5_K_M.gguf" +} +``` + +Response `200`: +```json +{ + "success": true, + "file_uuid": "32-char-hex-string", + "summaries": [ + { + "scene_number": 1, + "start_time": 59.0, + "end_time": 302.0, + "summary": "Cary Grant and Audrey Hepburn engage in a tense conversation...", + "who": "Cary Grant, Audrey Hepburn", + "what": "Conversation about a mysterious situation", + "where": "Paris apartment", + "when": "1963", + "why": "To uncover the truth about the stolen money", + "how": "Through dialogue and interrogation" + } + ], + "processing_status": { "status": "completed", "progress": 100.0 } +} +``` + +### 6.3 5W1H Batch + +``` +POST /api/v1/agents/5w1h/batch +Content-Type: application/json + +{ + "file_uuids": ["uuid1", "uuid2"], + "scene_group_size": 7 +} +``` + +### 6.4 5W1H Status + +``` +GET /api/v1/agents/5w1h/status +``` + +Response `200`: +```json +{ + "success": true, + "videos": [] +} +``` + +### 6.5 Identity Analyze + +``` +POST /api/v1/agents/identity/analyze +Content-Type: application/json + +{ + "file_uuid": "32-char-hex-string", + "use_llm": true, + "model": "gemma-4-31B-it-Q5_K_M.gguf" +} +``` + +### 6.6 Suggest 
Merges + +``` +POST /api/v1/agents/identity/suggest +Content-Type: application/json + +{ + "file_uuid": "32-char-hex-string" +} +``` + +### 6.7 Identity Agent Status + +``` +GET /api/v1/agents/identity/status +``` + +### 6.8 Suggest Clustering + +``` +POST /api/v1/agents/suggest/clustering +Content-Type: application/json + +{ + "file_uuid": "32-char-hex-string", + "min_cluster_size": 3, + "similarity_threshold": 0.7 +} +``` + +### 6.9 Suggest Merge + +``` +POST /api/v1/agents/suggest/merge +Content-Type: application/json + +{ + "identity_id": "identity-uuid", + "similarity_threshold": 0.75 +} +``` + +--- + +## 7. System & Configuration + +### 7.1 Health + +``` +GET /health +``` + +Response `200`: +```json +{ + "status": "ok", + "version": "1.0.0", + "uptime_ms": 1397142 +} +``` + +### 7.2 Detailed Health + +``` +GET /health/detailed +``` + +Response `200`: +```json +{ + "status": "ok", + "version": "1.0.0", + "uptime_ms": 1397142, + "services": { + "postgres": { "status": "ok" }, + "redis": { "status": "ok" }, + "qdrant": { "status": "ok" }, + "mongodb": { "status": "ok" } + } +} +``` + +### 7.3 Ingest Stats + +``` +GET /api/v1/stats/ingest +``` + +Response `200`: +```json +{ + "total_videos": 25, + "total_chunks": 10546, + "sentence_chunks": 7547, + "cut_chunks": 0, + "time_chunks": 0, + "searchable_chunks": 4, + "chunks_with_visual": 0, + "chunks_with_summary": 0, + "pending_videos": 1 +} +``` + +### 7.4 SFTPGo Status + +``` +GET /api/v1/stats/sftpgo +``` + +### 7.5 Inference Health + +``` +GET /api/v1/stats/inference +``` + +Response `200`: +```json +{ + "ollama": { + "engine": "Ollama", + "model": "mxbai-embed-large", + "status": "ok", + "latency_ms": 2, + "error": null + }, + "llama_server": { + "engine": "llama-server", + "model": "gemma4_e4b_q5", + "status": "ok", + "latency_ms": 0, + "error": null + } +} +``` + +### 7.6 Cache Toggle + +``` +POST /api/v1/config/cache +X-API-Key: +Content-Type: application/json + +{ + "enabled": true +} +``` + +--- + +## 8. 
Resource Management (Unprotected) + +### 8.1 List Resources + +``` +GET /api/v1/resources +``` + +### 8.2 Register Resource + +``` +POST /api/v1/resource/register +Content-Type: application/json + +{ + "resource_id": "worker-01", + "resource_type": "processor", + "category": "ml", + "capabilities": ["face", "asr"], + "config": {}, + "metadata": {} +} +``` + +### 8.3 Resource Heartbeat + +``` +POST /api/v1/resource/heartbeat +Content-Type: application/json + +{ + "resource_id": "worker-01", + "status": "running" +} +``` + +--- + +## 9. Auth Endpoints + +### 9.1 Login + +``` +POST /api/v1/auth/login +Content-Type: application/json + +{ + "username": "admin", + "password": "password" +} +``` + +Response `200` (success): +```json +{ + "success": true, + "message": "Login successful", + "api_key": "muser_xxx_xxx", + "user": { "id": 1, "name": "admin" } +} +``` + +Response `200` (failure): +```json +{ + "success": false, + "message": "Invalid username or password", + "api_key": null, + "user": null +} +``` + +### 9.2 Logout + +``` +POST /api/v1/auth/logout +``` + +Response `200`: +```json +{ "success": true } +``` + +--- + +## Common Error Responses + +### 401 Unauthorized +``` +No body (empty response) +``` + +### 404 Not Found +``` +No body (empty response) +``` + +### 500 Internal Server Error +``` +No body (empty response) +``` + +Or error text for agent endpoints: +```json +{ "error": "error description" } +``` + +--- + +## Processor Reference + +| Processor | Script | Description | Dependencies | Default | +|-----------|--------|-------------|-------------|---------| +| `asr` | `asr_processor.py` | Speech-to-text (faster-whisper) | None | Yes | +| `asrx` | `asrx_processor.py` | Speaker diarization | asr | Yes | +| `cut` | `cut_processor.py` | Scene detection (PySceneDetect) | None | Yes | +| `yolo` | `yolo_processor.py` | Object detection (YOLO) | None | Yes | +| `ocr` | `ocr_processor.py` | Text recognition | None | Yes | +| `face` | 
`face_processor.py` | Face detection + recognition (Vision + FaceNet) | None | Yes | +| `pose` | `pose_processor.py` | Pose estimation | None | Yes | +| `visual_chunk` | — | Visual object-based chunking | yolo | No | +| `story` | — | Narrative generation | asr + asrx + cut + yolo + face | No | + +--- + +## Post-Processing Pipeline + +After all specified processors complete, the system triggers: + +| Step | Trigger | Description | +|------|---------|-------------| +| **Rule 1 Chunking** | ASR + ASRX completed | Converts ASR segments into `sentence` chunks in `dev.chunks` | +| **Face Trace** | Face completed | Runs `store_traced_faces.py` to assign trace_ids, stores in `dev.face_detections` | +| **Qdrant Face Sync** | After Face Trace | Syncs face embeddings to Qdrant `_face` collection | +| **Rule 3 Scene Chunking** | All processors completed | Groups sentence chunks by scene boundaries, generates LLM 5W1H summaries | +| **5W1H Agent** | After Rule 3 | Generates 5W1H+ analysis for each scene | + +--- + +## Environment Variables + +| Variable | Default | Description | +|----------|---------|-------------| +| `DATABASE_URL` | `postgres://accusys@localhost:5432/momentry` | PostgreSQL connection | +| `DATABASE_SCHEMA` | `public` | Database schema name | +| `MOMENTRY_SERVER_PORT` | `3002` (prod) / `3003` (dev) | API server port | +| `MOMENTRY_REDIS_PREFIX` | `momentry:` / `momentry_dev:` | Redis key prefix | +| `MOMENTRY_API_KEY` | — | API key for authentication | +| `MOMENTRY_OUTPUT_DIR` | `~/momentry/output` | Output JSON directory | +| `MOMENTRY_SCRIPTS_DIR` | `./scripts` | Python scripts directory | +| `MOMENTRY_PYTHON_PATH` | `python3` | Python interpreter path | +| `MOMENTRY_LLM_SUMMARY_URL` | `http://127.0.0.1:8081/v1/chat/completions` | LLM endpoint for 5W1H | +| `MOMENTRY_LLM_SUMMARY_MODEL` | `gemma4` | LLM model name for summaries | +| `MOMENTRY_LLM_SUMMARY_ENABLED` | `true` | Enable/disable LLM summaries | +| `REDIS_URL` | `redis://:accusys@localhost:6379` | 
Redis connection | + +--- + +## Status Codes + +| Code | Description | +|------|-------------| +| 200 | Success | +| 204 | No Content (DELETE success) | +| 206 | Partial Content (video range requests) | +| 400 | Bad Request | +| 401 | Unauthorized (missing/invalid API key) | +| 404 | Not Found | +| 500 | Internal Server Error | diff --git a/docs_v1.0/API_V1.0.0/DEPLOY/EMBEDDING_DEPLOYMENT_V1.0.0.md b/docs_v1.0/API_V1.0.0/DEPLOY/EMBEDDING_DEPLOYMENT_V1.0.0.md new file mode 100644 index 0000000..298c30f --- /dev/null +++ b/docs_v1.0/API_V1.0.0/DEPLOY/EMBEDDING_DEPLOYMENT_V1.0.0.md @@ -0,0 +1,83 @@ +# Embedding Cross-Machine Deployment Plan v1.0.0 + +## Division of Responsibilities + +``` +M5 (Pipeline + primary embedding) M4 (Portal + fallback embedding) +├── Batch vectorize (1709 chunks) ├── Portal search query embedding +├── EmbeddingGemma primary server ├── Fallback embed server +├── Model already online (port 11436) └── Calls the M5 API by default +└── Can run offline for off-site demos +``` + +## Deployment Architecture + +``` +Portal Search Query + │ + ▼ + ┌─────────────┐ success ┌──────────────────┐ + │ M4 Portal │ ──────────→ │ M5:11436 │ + │ embed │ │ EmbeddingGemma │ + │ client │ │ (primary) │ + │ │ failure └──────────────────┘ + │ retry │ ──────────→ ┌──────────────────┐ + │ fallback │ │ M4:11436 │ + └─────────────┘ │ EmbeddingGemma │ + │ (fallback) │ + └──────────────────┘ +``` + +## M4 Installation Steps + +```bash +# 1. Install Python dependencies +pip install torch transformers flask + +# 2. Log in to HuggingFace (the model license must be accepted first) +open https://huggingface.co/google/embeddinggemma-300m +huggingface-cli login --token YOUR_TOKEN + +# 3. Fetch the script +rsync -av accusys@192.168.110.201:/Users/accusys/momentry_core_0.1/scripts/embeddinggemma_server.py \ + ./scripts/embeddinggemma_server.py + +# 4. 
Start the fallback server +python3 scripts/embeddinggemma_server.py --port 11436 +``` + +## Portal Embed Client + +```javascript +async function embedQuery(text) { + const servers = [ + 'http://192.168.110.201:11436/v1/embeddings', // M5 primary + 'http://localhost:11436/v1/embeddings', // M4 fallback + ]; + for (const url of servers) { + try { + const res = await fetch(url, { + method: 'POST', + headers: { 'Content-Type': 'application/json' }, + body: JSON.stringify({ input: text }), + }); + const data = await res.json(); + return data.data[0].embedding; + } catch (e) { + continue; // try the next server + } + } + throw new Error('Embedding servers unreachable'); +} +``` + +## Model Consistency + +| Item | M5 | M4 | +|------|-----|-----| +| Model | EmbeddingGemma 300M | EmbeddingGemma 300M | +| Dimensions | 768D | 768D | +| Server | Python MPS (port 11436) | Python CPU/MPS (port 11436) | +| Qdrant | 192.168.110.201:6333 | 192.168.110.201:6333 | + +Both machines use the same model and the same dimensionality, so query embeddings can be compared against the indexed embeddings. diff --git a/docs_v1.0/API_V1.0.0/DEPLOY/GEM4_LLM_DEPLOY_PLAN_V1.0.0.md b/docs_v1.0/API_V1.0.0/DEPLOY/GEM4_LLM_DEPLOY_PLAN_V1.0.0.md new file mode 100644 index 0000000..610618f --- /dev/null +++ b/docs_v1.0/API_V1.0.0/DEPLOY/GEM4_LLM_DEPLOY_PLAN_V1.0.0.md @@ -0,0 +1,316 @@ +--- +document_type: "deployment_record" +service: "MOMENTRY_CORE" +title: "Gemma 4 31B - M5 Max Deployment Record" +date: "2026-05-06" +version: "V1.1" +status: "active" +owner: "Warren" +created_by: "OpenCode" +--- + +# Gemma 4 31B - M5 Max Deployment Record + +## 1. 
Environment + +| Item | M4 (development machine) | M5 Max (LLM server) | +|------|------------|-------------------| +| Machine | MacBook Pro M4 | MacBook Pro M5 Max | +| Memory | 16 GB | **48 GB** | +| Architecture | arm64 | arm64 | +| OS | macOS 26.x | macOS 26.4.1 | +| IP (initial) | - | 10.10.10.10 | +| IP (final) | - | **192.168.110.201** | +| Internet access | Yes | Initially none, later available (after joining the same 192.168.110.x subnet) | +| Homebrew | Yes | No (user is not an admin, so sudo brew is not possible) | +| Xcode CLT | Yes | No (install_name_tool and codesign unavailable) | +| Rust | Yes | rustup installed (1.95.0) | +| Project directory | `/Users/accusys/momentry_core_0.1/` | `~/momentry_core_0.1/` (cloned) | + +## 2. Model Specifications + +| Attribute | Value | +|------|-----| +| Model | **Gemma 4 31B-it** (Image-Text-to-Text) | +| Parameters | 33B (30,697,345,596) | +| Quantization | Q5_K_M | +| GGUF size | **20.16 GB** (`21658399744 bytes`) | +| Embedding dim | 5376 | +| Vocabulary | 262144 | +| Context | 4096 (trained at 262144) | +| Source | `unsloth/gemma-4-31B-it-GGUF` | +| HF downloads | 1,685,377 | +| HF access | Gated (requires `huggingface-cli login`) | +| License | Gemma (Apache 2.0 derived) | + +## 3. Binary and Dependencies + +### 3.1 Build Method + +llama.cpp is built from source rather than installed through Homebrew. The reason: the Homebrew binary references dylibs by **absolute path**, so it cannot be moved to the M5. 
+ +```bash +# Run on the M4 +cd /tmp +git clone https://github.com/ggerganov/llama.cpp.git +cd llama.cpp +cmake -B build -DGGML_METAL=ON +cmake --build build -j10 --target llama-server +``` + +### 3.2 Binary Dependencies + +The llama-server binary depends on the following dylibs (26 files in total): + +| Category | Files | Source | +|------|------|------| +| Core GGML | `libggml.0.dylib`, `libggml.dylib` | `build/bin/` | +| Core GGML | `libggml-base.0.dylib`, `libggml-base.dylib` | `build/bin/` | +| Metal GPU | `libggml-metal.0.dylib`, `libggml-metal.dylib` | `build/bin/` | +| CPU | `libggml-cpu.0.dylib`, `libggml-cpu.dylib` | `build/bin/` | +| BLAS | `libggml-blas.0.dylib`, `libggml-blas.dylib` | `build/bin/` | +| LLama | `libllama.0.dylib`, `libllama.dylib` | `build/bin/` | +| LLamaCommon | `libllama-common.0.dylib`, `libllama-common.dylib` | `build/bin/` | +| MTMD | `libmtmd.0.dylib`, `libmtmd.dylib` | `build/bin/` | +| OpenSSL | `libssl.3.dylib`, 
`libcrypto.3.dylib` | `/opt/homebrew/opt/openssl@3/lib/` | + +### 3.3 Fixing @rpath + +The @rpath embedded at build time points to `/tmp/llama.cpp/build/bin/` and must be changed to `@executable_path/../lib`. + +Run on the **M4** (where Xcode CLT is available): + +```bash +cp build/bin/llama-server /tmp/llama_final +chmod +w /tmp/llama_final + +# Fix the absolute OpenSSL paths +install_name_tool -change /opt/homebrew/opt/openssl@3/lib/libssl.3.dylib @rpath/libssl.3.dylib /tmp/llama_final +install_name_tool -change /opt/homebrew/opt/openssl@3/lib/libcrypto.3.dylib @rpath/libcrypto.3.dylib /tmp/llama_final + +# Fix the absolute GGML paths (needed only for a Homebrew build, not for a source build) +install_name_tool -change /opt/homebrew/opt/ggml/lib/libggml.0.dylib @rpath/libggml.0.dylib /tmp/llama_final +install_name_tool -change /opt/homebrew/opt/ggml/lib/libggml-base.0.dylib @rpath/libggml-base.0.dylib /tmp/llama_final + +# Correct the @rpath +install_name_tool -delete_rpath /tmp/llama.cpp/build/bin /tmp/llama_final +install_name_tool -add_rpath @executable_path/../lib /tmp/llama_final + +# Re-sign (install_name_tool invalidates the code signature) +codesign --force --sign - /tmp/llama_final +``` + +### 3.4 libssl.3.dylib Itself Also Needs Fixing + +libssl.3.dylib itself also references `/opt/homebrew/Cellar/openssl@3/3.6.1/lib/libcrypto.3.dylib`: + +```bash +cp /opt/homebrew/opt/openssl@3/lib/libssl.3.dylib /tmp/libssl_fixed.dylib +cp /opt/homebrew/opt/openssl@3/lib/libcrypto.3.dylib /tmp/libcrypto_fixed.dylib +chmod +w /tmp/libssl_fixed.dylib /tmp/libcrypto_fixed.dylib +install_name_tool -change /opt/homebrew/Cellar/openssl@3/3.6.1/lib/libcrypto.3.dylib @loader_path/libcrypto.3.dylib /tmp/libssl_fixed.dylib +codesign --force --sign - /tmp/libssl_fixed.dylib /tmp/libcrypto_fixed.dylib +``` + +### 3.5 Transfer Everything to the M5 + +```bash +# Model (20GB) +scp ~/llama.cpp/models/gemma-4-31B-it-Q5_K_M.gguf \ + accusys@192.168.110.201:~/models/ + +# binary + all dylibs +ssh 
accusys@192.168.110.201 'rm -rf ~/llama && mkdir -p ~/llama/bin ~/llama/lib' +scp /tmp/llama_final accusys@192.168.110.201:~/llama/bin/llama-server +scp /tmp/llama.cpp/build/bin/*.dylib 
accusys@192.168.110.201:~/llama/lib/ +scp /tmp/libssl_fixed.dylib accusys@192.168.110.201:~/llama/lib/libssl.3.dylib +scp /tmp/libcrypto_fixed.dylib accusys@192.168.110.201:~/llama/lib/libcrypto.3.dylib +``` + +## 4. Startup and Verification + +### 4.1 One-off Manual Start + +```bash +ssh accusys@192.168.110.201 +export DYLD_LIBRARY_PATH=$HOME/llama/lib +codesign --force --sign - ~/llama/bin/llama-server +codesign --force --sign - ~/llama/lib/*.dylib +nohup ~/llama/bin/llama-server \ + -m ~/models/gemma-4-31B-it-Q5_K_M.gguf \ + --host 0.0.0.0 --port 8081 \ + --n-gpu-layers 999 --ctx-size 4096 \ + --threads 10 --mlock \ + --reasoning off \ + > ~/llama.log 2>&1 & +``` + +### 4.2 Startup Script + +`~/start_llm.sh` (already created): + +```bash +#!/bin/bash +export DYLD_LIBRARY_PATH=$HOME/llama/lib +pkill -9 -f llama-server 2>/dev/null +sleep 1 +nohup $HOME/llama/bin/llama-server \ + -m $HOME/models/gemma-4-31B-it-Q5_K_M.gguf \ + --host 0.0.0.0 --port 8081 \ + --n-gpu-layers 999 --ctx-size 4096 \ + --threads 10 --mlock \ + --reasoning off \ + > $HOME/llama.log 2>&1 & +echo "llama-server PID: $!" 
+``` + +### 4.3 Parameter Reference + +| Parameter | Value | Description | +|------|-----|------| +| `-m` | `~/models/gemma-4-31B-it-Q5_K_M.gguf` | Model path | +| `--host` | `0.0.0.0` | Bind to all network interfaces | +| `--port` | `8081` | HTTP API port | +| `--n-gpu-layers` | `999` | Offload all layers to the GPU (Metal) | +| `--ctx-size` | `4096` | Context length | +| `--threads` | `10` | Number of M5 Max P-cores | +| `--mlock` | - | Lock memory to prevent swapping | +| `--reasoning` | `off` | Disable thinking; otherwise the content ends up in `reasoning_content` | +| `DYLD_LIBRARY_PATH` | `~/llama/lib` | dylib search path | + +### 4.4 Problems Encountered During Startup + +| # | Problem | Cause | Fix | +|---|------|------|------| +| 1 | `Library not loaded: libmtmd.0.dylib` | Metal-related dylibs were not copied | Copy all 26 dylibs from the build | +| 2 | `Library not loaded: /opt/homebrew/.../libssl.3.dylib` | The binary contains absolute OpenSSL paths | `install_name_tool -change → @rpath` | +| 3 | `Killed: 9` (exit 137) | Code signature was invalidated | `codesign --force --sign -` | +| 4 | `Library not loaded: /opt/homebrew/Cellar/.../libcrypto.3.dylib` | libssl.3.dylib also contains an absolute path internally | Fix libssl with `install_name_tool` | +| 5 | `no backends are loaded` | Metal GPU backend missing | Source build needs `-DGGML_METAL=ON` | +| 6 | `couldn't bind HTTP server socket` | The previous process had not fully released the port | Run `pkill -9 -f llama-server` first | +| 7 | **All content lands in reasoning_content** | Gemma4 is a thinking model by default | `--reasoning off` | + +## 5. 
API Verification + +### 5.1 Health Check + +```bash +curl -s http://192.168.110.201:8081/health +# → {"status":"ok"} +``` + +### 5.2 Inference Test (after --reasoning off) + +```bash +curl -s http://192.168.110.201:8081/v1/chat/completions \ + -H "Content-Type: application/json" \ + -d '{ + "model": "gemma-4-31B-it-Q5_K_M.gguf", + "messages": [{"role": "user", "content": "Hello"}], + "max_tokens": 100 + }' +``` + +Response (OpenAI-compatible): + +```json +{ + "choices": [{ + "finish_reason": "stop", + "message": { + "role": "assistant", + "content": "Hello! 
How can I help you today?", + "reasoning_content": "" + } + }], + "usage": { + "completion_tokens": 100, + "prompt_tokens": 18, + "total_tokens": 118 + }, + "model": "gemma-4-31B-it-Q5_K_M.gguf", + "object": "chat.completion" +} +``` + +### 5.3 Performance + +| Metric | Measured | +|------|------| +| Prompt speed | 60.8 tok/s | +| Generation speed | **25.8 tok/s** | +| Prompt latency | 296 ms (18 tokens) | +| Generation latency | 387 ms (10 tokens) | + +## 6. Integration with OpenCode + +Add a provider in `~/.config/opencode/config.json`: + +```json +{ + "m5-gemma4": { + "npm": "@ai-sdk/openai-compatible", + "name": "M5 Max Gemma 4", + "options": { "baseURL": "http://192.168.110.201:8081/v1" }, + "models": { + "gemma-4-31B-it-Q5_K_M.gguf": { "name": "Gemma 4 31B" } + } + } +} +``` + +The default model is set to `"m5-gemma4/gemma-4-31B-it-Q5_K_M.gguf"`. Confirm via the provider list: + +```bash +opencode models m5-gemma4 +# → m5-gemma4/gemma-4-31B-it-Q5_K_M.gguf +``` + +## 7. M5 Network Change Log + +| Time | IP | Network | Reason | +|------|-----|------|------| +| Initial | `10.10.10.10` | bridge (Thunderbolt) | No internet access; traffic had to go through M4 NAT | +| After switch | `192.168.110.201` | en0 (WiFi/Ethernet) | Moved onto the same subnet; internet access available | + +## 8. Rust Installation (for Momentry dev) + +```bash +curl --proto "=https" --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y +source $HOME/.cargo/env +``` + +- rustc 1.95.0 +- cargo 1.95.0 +- No sudo required + +## 9. Memory Usage + +``` +48 GB total + ├─ 20 GB Gemma 4 31B Q5_K_M (process RSS ~28 GB) + ├─ 4 GB macOS + system + └─ 24 GB remaining +``` + +Measured RSS after startup: `28,325,600 KB` (~28 GB). + +## 10. 
Maintenance Commands + +| Operation | Command | +|------|------| +| Start | `ssh accusys@192.168.110.201 '~/start_llm.sh'` | +| Stop | `ssh accusys@192.168.110.201 'pkill -9 -f llama-server'` | +| View logs | `ssh accusys@192.168.110.201 'tail -50 ~/llama.log'` | +| Health check | `curl http://192.168.110.201:8081/health` | +| Model file | `~/models/gemma-4-31B-it-Q5_K_M.gguf (20G)` | +| Binary and libs | `~/llama/bin/llama-server`, `~/llama/lib/*.dylib` | +| config | `~/.config/opencode/config.json` | +| Monitoring | `htop -p $(pgrep llama-server)` | +| Memory | `ps -o rss= -p $(pgrep llama-server)` | + +## 11. Known Limitations + +- **Thinking model**: Gemma4 is a thinking model (content is returned normally once thinking is disabled with `--reasoning off`, but some scenarios may need reasoning) +- **No Homebrew**: the account is not an admin, so `brew install` is unavailable. Other Momentry services (PostgreSQL, Redis, MongoDB) must be installed manually from portable binaries +- **No Xcode CLT**: `install_name_tool` and `codesign` are unavailable on the M5. 
Binaries must be fixed on the M4 and then copied over with scp diff --git a/docs_v1.0/API_V1.0.0/INTERNAL/AGENTS/5W1H_AGENT_V1.0.0.md b/docs_v1.0/API_V1.0.0/INTERNAL/AGENTS/5W1H_AGENT_V1.0.0.md new file mode 100644 index 0000000..1b81df1 --- /dev/null +++ b/docs_v1.0/API_V1.0.0/INTERNAL/AGENTS/5W1H_AGENT_V1.0.0.md @@ -0,0 +1,91 @@ +--- +document_type: "spec" +service: "MOMENTRY_CORE" +title: "5W1H+ Agent v1.0.0" +date: "2026-05-07" +version: "V1.0" +status: "active" +owner: "Warren" +tags: + - "momentry" + - "agent" + - "5w1h" + - "llm" + - "summary" +related_documents: + - "../../TRACE/TRACE_API_REFERENCE_V1.0.0.md" + - "../CHUNK_DEFINITION_V1.0.0.md" + - "../VECTOR_SPEC_V1.0.0.md" +--- + +# 5W1H+ Agent v1.0.0 + +## Overview + +Generates a 5W1H+ summary for each cut scene (parent summary + child enhanced text). + +## Recursive Context (Story So Far) + +Option B is adopted: the LLM call for each scene carries the summaries of all preceding scenes. + +``` +Scene 1 → LLM(context="") → summary_1 +Scene 2 → LLM(context=summary_1) → summary_2 +Scene 3 → LLM(context=summary_1+summary_2) → summary_3 +``` + +Context truncation: keep only the most recent ~500 tokens of prior story to avoid exceeding the model limit. 
+ +## Prompt Structure + +The LLM call for each scene includes the following information: + +| Prompt block | Source | Description | +|------------|------|------| +| Scene time | 
chunk metadata | Time range of the current scene | +| Dialogue | sentences in scene | Dialogue lines within the scene | +| Actors present | face_detections JOIN identity_bindings JOIN identities | Actors appearing in the scene | +| Objects detected | pre_chunks WHERE processor_type='yolo' | Objects detected by YOLO | +| Face traces | face_detections JOIN identity_bindings JOIN identities | Traces and the corresponding actor names | +| Active speakers | pre_chunks WHERE processor_type='asrx' JOIN identity_bindings | Speakers and the corresponding actors | +| Story so far | parent_summary of the previous N scenes | Summary of the story so far | + +## LLM Model + +| Item | Value | +|------|-----| +| Model | Gemma4 26B MoE (Q5_K_M, 18GB) | +| Deployment | llama-server (Metal GPU, port 8082) | +| Environment variable | `MOMENTRY_LLM_SUMMARY_URL=http://localhost:8082/v1/chat/completions` | +| Temperature | 0.1 | +| max_tokens | 4096 | + +## Outputs + +| Output | Storage location | Description | +|------|---------|------| +| parent_summary | `cut.summary_text` | 5-sentence scene_summary (a fluent 5W1H paragraph) | +| parent_5w1h | `cut.metadata -> 5w1h` | Structured who/what/where/when/why/how | +| child_enhanced | `sentence.text_content` | Self-contained enhanced sentence (used for embedding + search) | +| child_5w1h | `sentence.content -> 5w1h` | Per-sentence 5w1h structure | +| embedding | `sentence.embedding` | EmbeddingGemma 300M 768D (vectorized automatically once the summary is produced) | + +## API + +``` +POST /api/v1/agents/5w1h/analyze +POST /api/v1/agents/5w1h/batch +GET /api/v1/agents/5w1h/status +``` + +## Pipeline Trigger + +P4 trigger in the Job Worker: + +```rust +// all_completed + has_cut + has_asr → run_5w1h_agent(db, uuid) +``` + +## Design Selection Document + +Detailed comparison of options: `M5_workspace/2026-05-07_5w1h_recursive_summary_design.md` diff --git a/docs_v1.0/API_V1.0.0/INTERNAL/AGENTS/IDENTITY_AGENT_V1.0.0.md 
b/docs_v1.0/API_V1.0.0/INTERNAL/AGENTS/IDENTITY_AGENT_V1.0.0.md new file mode 100644 index 0000000..c6b23b4 --- /dev/null +++ b/docs_v1.0/API_V1.0.0/INTERNAL/AGENTS/IDENTITY_AGENT_V1.0.0.md @@ -0,0 +1,84 @@ +--- +document_type: "spec" +service: "MOMENTRY_CORE" +title: "Identity Agent v1.0.0" +date: "2026-05-07" +version: "V1.0" +status: "active" +owner: "Warren" +tags: + - "momentry" + - "agent" + - 
"identity" + - "face" + - "speaker" +related_documents: + - "../DATA_SCHEMA_FILE_IDENTITY_V1.0.0.md" + - "../../TRACE/TRACE_API_REFERENCE_V1.0.0.md" + - "../PROCESSORS/FACE_V1.0.0.md" + - "../PROCESSORS/ASRX_V1.0.0.md" +--- + +# Identity Agent v1.0.0 + +## Overview + +Binds face traces and speakers to person identities, enabling person recognition across scenes. + +## Processing Flow + +``` +face_clustered.json + asrx.json + → extract_persons (face clusters) + → extract_speakers (ASRX segments) + → analyze_person_speaker_overlap + → write to dev.identities + → match_faces_iterative (TMDb seed → propagation) + → bind_speakers (speaker_id → identity_id) +``` + +## Iterative Multi-Angle Face Matching + +``` +TMDb seeds (12 identities, with multiple angles) + → Round 1: ~33% trace-to-identity + → Round 2: propagate matched traces as new seeds + → Round 3: propagate again + → Final: 99% binding (6,175 / 6,186 face detections) +``` + +## Speaker Binding + +``` +face_detections (trace_id, frame_number) + + ASRX segments (speaker_id, start_time, end_time) + → frame-level overlap computation + → winner-takes-all: best_overlap > 30% + → write to identity_bindings (identity_type='speaker') +``` + +## Pipeline Trigger + +P3 trigger in the Job Worker: + +```rust +// has_face + has_asrx → run_identity_agent(db, uuid) +``` + +Trigger timing: on all_completed, after both face and asrx have finished. 
+ +## DB Structure + +| Table | Purpose | +|-------|------| +| `identities` | Main identity table (name, type, metadata, embedding) | +| `identity_bindings` | Binding table (identity_id → trace_id or speaker_id) | +| `file_identities` | File-level identity mapping | + +## API + +``` +POST /api/v1/agents/identity/analyze +POST /api/v1/agents/identity/suggest +GET /api/v1/agents/identity/status +``` diff --git a/docs_v1.0/API_V1.0.0/INTERNAL/API_DICTIONARY_V1.0.0.md b/docs_v1.0/API_V1.0.0/INTERNAL/API_DICTIONARY_V1.0.0.md new file mode 100644 index 0000000..ab20311 --- /dev/null +++ b/docs_v1.0/API_V1.0.0/INTERNAL/API_DICTIONARY_V1.0.0.md @@ -0,0 +1,173 @@ +--- +document_type: "reference_doc" +service: "MOMENTRY_CORE" +title: "Momentry Core API Dictionary V1.0.0" +date: "2026-05-06" 
+version: "V1.3" +status: "active" +owner: "Warren" +created_by: "OpenCode" +tags: + - "momentry" + - "core" + - "api" + - "dictionary" + - "v1.0.0" +ai_query_hints: + - "Momentry Core API dictionary lookup" + - "API endpoint and parameter descriptions" + - "API response format definitions" + - "List all Public/Internal/Admin API endpoints" + - "HTTP methods and path structure of API endpoints" + - "Which endpoints the search API provides (search/bm25/hybrid/visual)" + - "Status categories of API endpoints (Public/Internal/Admin)" +related_documents: + - "API_V1.0.0/MOMENTRY_CORE_API_V1.0.0.md" + - "API_V1.0.0/API_USAGE_DEMO_V1.0.0.md" + - "API_V1.0.0/CHUNK_DEFINITION_V1.0.0.md" + - "API_V1.0.0/VECTOR_SPEC_V1.0.0.md" +--- + +# Momentry Core API Dictionary V1.0.0 + +## Key Terms + +| Term | Definition | +|------|------| +| Public API | Standard interface for front ends and external systems | +| Internal API | For internal system workflows or status queries | +| Admin API | Administrators only | +| file_uuid | 32-character birth UUID (MAC + time + path + filename) | +| identity_uuid | 32-character UUIDv5 (source + external_id) | +| RESTful | Resource-centric API design style: collections are plural, resources are singular | + +## Endpoint Statistics + +| Category | Count | Description | +|---|---|---| +| Public | 40 | Standard interface for front ends and external systems | +| Internal | 4 | Internal system workflows or status queries | +| Admin | 3 | Administrators only | +| Health | 2 | Service health checks | +| **Total** | **48** | All registered routes | + +## Design Principles + +### 1. RESTful Naming Conventions +- Collection (plural): `/api/v1/files`, `/api/v1/identities` +- Resource (singular): `/api/v1/file/:file_uuid`, `/api/v1/identity/:identity_uuid` +- Action on resource: `/api/v1/identity/:identity_uuid/bind` + +### 2. 
File-Centric +- Each media file is uniquely identified by a 32-character UUID (`file_uuid`) +- File is the root node of all data; Chunks and Jobs belong to a specific File + +### 3. Global Identity +- Identities are linked across files and are not confined to a single file +- The direct FK binding Face → Identity is managed through bind/unbind/mergeinto (V4.0) + +--- + +## 1. System and Authentication + +| Method | Path | Status | +|------|------|------| +| `GET` | `/health` | Health | +| `GET` | `/health/detailed` | Health | +| `POST` | `/api/v1/auth/login` | Public | +| `POST` | `/api/v1/auth/logout` | Public | + +## 2. 
檔案管理 (Files) + +| 方法 | 路徑 | 狀態 | +|------|------|------| +| `GET` | `/api/v1/files` | Public | +| `GET` | `/api/v1/files/scan` | Public | +| `POST` | `/api/v1/files/register` | Public | +| `POST` | `/api/v1/files/unregister` | Public | +| `GET` | `/api/v1/file/:file_uuid` | Public | +| `GET` | `/api/v1/file/:file_uuid/probe` | Public | +| `POST` | `/api/v1/file/:file_uuid/process` | Public | +| `GET` | `/api/v1/file/:file_uuid/identities` | Public | +| `GET` | `/api/v1/file/:file_uuid/chunks` | Public | +| `GET` | `/api/v1/file/:file_uuid/thumbnail?frame=&x=&y=&w=&h=` | Public | +| `POST` | `/api/v1/file/:file_uuid/face_trace/sortby` | Public | + +## 3. 管線與任務 (Pipeline & Jobs) + +| 方法 | 路徑 | 狀態 | +|------|------|------| +| `GET` | `/api/v1/progress/:file_uuid` | Public | +| `GET` | `/api/v1/jobs` | Public | +| `GET` | `/api/v1/job/:job_id` | Public | +| `GET` | `/api/v1/rule/:rule_id/status` | Public | +| `POST` | `/api/v1/resource/register` | Internal | +| `POST` | `/api/v1/resource/heartbeat` | Internal | +| `GET` | `/api/v1/resources` | Internal | + +## 4. 搜尋 (Search) + +| 方法 | 路徑 | 狀態 | +|------|------|------| +| `POST` | `/api/v1/search` | Public | +| `POST` | `/api/v1/search/bm25` | Public | +| `POST` | `/api/v1/search/hybrid` | Public | +| `POST` | `/api/v1/search/smart` | Public | +| `POST` | `/api/v1/search/universal` | Public | +| `POST` | `/api/v1/search/frames` | Public | +| `POST` | `/api/v1/search/visual` | Public | +| `POST` | `/api/v1/search/visual/class` | Public | +| `POST` | `/api/v1/search/visual/density` | Public | +| `POST` | `/api/v1/search/visual/combination` | Public | +| `POST` | `/api/v1/search/visual/stats` | Public | + +## 5. 
身份管理 (Identity) + +| 方法 | 路徑 | 狀態 | +|------|------|------| +| `GET` | `/api/v1/identities` | Public | +| `POST` | `/api/v1/identity` | Public | +| `GET` | `/api/v1/identity/:identity_uuid` | Public | +| `DELETE` | `/api/v1/identity/:identity_uuid` | Public | +| `GET` | `/api/v1/identity/:identity_uuid/files` | Public | +| `GET` | `/api/v1/identity/:identity_uuid/chunks` | Public | +| `POST` | `/api/v1/identity/:identity_uuid/bind` | Public | +| `POST` | `/api/v1/identity/:identity_uuid/unbind` | Public | +| `POST` | `/api/v1/identity/:from_uuid/mergeinto` | Public | + +## 6. 臉部 (Faces) + +| 方法 | 路徑 | 狀態 | +|------|------|------| +| `GET` | `/api/v1/faces/candidates` | Public | + +## 7. 代理人 (Agents) + +| 方法 | 路徑 | 狀態 | +|------|------|------| +| `POST` | `/api/v1/agents/translate` | Public | +| `POST` | `/api/v1/agents/identity/analyze` | Public | +| `POST` | `/api/v1/agents/identity/suggest` | Public | +| `GET` | `/api/v1/agents/identity/status` | Public | +| `POST` | `/api/v1/agents/suggest/merge` | Public | +| `POST` | `/api/v1/agents/5w1h/analyze` | Public | +| `POST` | `/api/v1/agents/5w1h/batch` | Public | +| `GET` | `/api/v1/agents/5w1h/status` | Public | + +## 8. 
狀態與管理 (Stats & Admin) + +| 方法 | 路徑 | 狀態 | +|------|------|------| +| `GET` | `/api/v1/stats/sftpgo` | Internal | +| `GET` | `/api/v1/stats/inference` | Internal | +| `POST` | `/api/v1/config/cache` | Admin | + +--- + +## 變更歷史 + +| 版本 | 日期 | 作者 | 說明 | +|------|------|------|------| +| V1.3 | 2026-05-06 | OpenCode | 新增 `face_thumbnail` ffmpeg 即時裁切端點 + `face_trace/sortby` 端點;portal 修復 hardcoded URL/API key/legacy endpoints | +| V1.1 | 2026-05-01 | OpenCode | Route fixes + arch notes | +| V1.0 | 2026-04 | OpenCode | 初始版本 | diff --git a/docs_v1.0/API_V1.0.0/INTERNAL/API_REFERENCE_v1.0.0.20260501md.md b/docs_v1.0/API_V1.0.0/INTERNAL/API_REFERENCE_v1.0.0.20260501md.md new file mode 100644 index 0000000..0ccd574 --- /dev/null +++ b/docs_v1.0/API_V1.0.0/INTERNAL/API_REFERENCE_v1.0.0.20260501md.md @@ -0,0 +1,310 @@ +--- +document_type: "reference_doc" +service: "MOMENTRY_CORE" +title: "Momentry Core API 參考文件 V1.0.0 (Demo 完整指南)" +date: "2026-05-01" +version: "V3.0" +status: "active" +owner: "Warren" +created_by: "OpenCode" +tags: + - "api" + - "reference" + - "v1.0.0" + - "demo" + - "marcom" +ai_query_hints: + - "查詢 V1.0.0 Demo 所需 API 列表" + - "Momentry Core Demo 流程如何使用 API?" 
+ - "API 的檔案註冊、處理、臉部綁定流程" + - "Demo 流程中 Scan → Unregister → Register → Probe → Process → Faces → Bind 的完整步驟" + - "API 的 curl 範例與回應格式" + - "Process 回傳 400 Bad Request 的常見原因與解決方法" + - "臉部查詢回傳空結果的疑難排解步驟" +related_documents: + - "STANDARDS/DOCS_STANDARD.md" + - "API_V1.0.0/MOMENTRY_CORE_API_V1.0.0.md" + - "TEST_REPORT_CLI.md" +--- + +# Momentry Core API 參考文件 V1.0.0 (Demo 完整指南) + +## 關鍵術語定義 + +| 術語 | 定義 | +|------|------| +| file_uuid | 32 碼 SHA256 檔案識別碼 | +| X-API-Key | API 認證方式,透過 HTTP Header 傳遞 | +| Scan | 掃描檔案系統,列出所有檔案及當前狀態 | +| Register | 將檔案加入資料庫系統 | +| Probe | 讀取檔案 metadata(時長、解析度、幀率) | +| Bind | 將臉部綁定到指定身份 | +| Progress | 獲取處理進度與目前階段 | + +## 📊 文件統計 (Document Statistics) + +| 項目 | 數值 | +|---|---| +| **收錄端點** | 15+ (Demo 核心流程) | +| **涵蓋率** | Demo 流程 100% | +| **測試狀態** | ✅ CLI Verified | + +| 項目 | 內容 | +|------|------| +| 建立者 | OpenCode | +| 建立時間 | 2026-05-01 | +| 文件版本 | V3.0 | + +--- + +## 1. Demo 流程總覽 (Demo Workflow) + +本文件專注於 **Demo 測試計畫** 所需的 API。以下是完整流程與對應 API: + +``` +1. 掃描狀態 (Scan) → GET /api/v1/files/scan +2. 檔案重置 (Unregister) → POST /api/v1/unregister +3. 檔案註冊 (Register) → POST /api/v1/files/register +4. 檔案探測 (Probe) → GET /api/v1/files/:file_uuid/probe +5. 開始處理 (Process) → POST /api/v1/files/:file_uuid/process +6. 監控進度 (Progress) → GET /api/v1/progress/:file_uuid +7. 查詢臉部 (Faces) → GET /api/v1/faces/candidates +8. 綁定身份 (Bind) → POST /api/v1/identities/bind +``` + +--- + +## 2. 快速資訊 + +- **Base URL (Dev)**: `http://localhost:3003` +- **Base URL (Prod)**: `http://localhost:3002` +- **認證方式**: Header `X-API-Key: muser_test_001` +- **測試 Key**: `muser_test_001` + +--- + +## 3. 
API 詳細說明 (依 Demo 順序) + +### 3.1 掃描檔案系統 (Scan Files) +**路徑**: `GET /api/v1/files/scan` + +**用途**: 列出檔案系統中所有檔案及當前狀態,**是 Demo 流程的第一步**。 + +**Response**: +```json +{ + "files": [ + { + "file_name": "A12T3-Share-User Experience of Thunderbolt 3 Shareable Storage.mp4", + "file_path": "/Users/accusys/momentry/var/sftpgo/data/demo/A12T3-Share-User Experience of Thunderbolt 3 Shareable Storage.mp4", + "file_uuid": "7ab7e25f48b58675e33aca44d15c1ecc", + "is_registered": true, + "status": "processing" + } + ], + "total": 20, + "registered_count": 20, + "unregistered_count": 0 +} +``` + +--- + +### 3.2 取消註冊 (Unregister File) +**路徑**: `POST /api/v1/unregister` + +**用途**: 從 Scan 結果中選取 `file_uuid`,對該檔案執行取消註冊。 + +**Request**: +```json +{ + "uuid": "53e3a229bf68878b7a799e811e097f9c" +} +``` + +**Response**: +```json +{ + "success": true, + "uuid": "53e3a229bf68878b7a799e811e097f9c", + "message": "File unregistered successfully" +} +``` + +--- + +### 3.3 註冊檔案 (Register File) +**路徑**: `POST /api/v1/files/register` + +**用途**: 從 Scan 結果中選取 `file_path`,將檔案加入資料庫系統。 + +**Request**: +```json +{ + "file_path": "/Users/accusys/momentry/var/sftpgo/data/demo/view15.mp4" +} +``` + +**Response**: +```json +{ + "success": true, + "file_uuid": "53e3a229bf68878b7a799e811e097f9c", + "file_name": "view15.mp4", + "file_path": "/Users/.../demo/view15.mp4", + "already_exists": false +} +``` + +--- + +### 3.4 檔案探測 (Probe File) +**路徑**: `GET /api/v1/files/:file_uuid/probe` + +**用途**: 讀取檔案的 metadata (時長、解析度、幀率)。**必須在 Process 前執行**。 + +**Response**: +```json +{ + "file_uuid": "7ab7e25f48b58675e33aca44d15c1ecc", + "file_name": "A12T3-Share-User Experience of Thunderbolt 3 Shareable Storage.mp4", + "duration": 621.55, + "width": 1920, + "height": 1080, + "fps": 29.97, + "cached": true +} +``` + +--- + +### 3.5 觸發處理 (Process File) +**路徑**: `POST /api/v1/files/:file_uuid/process` + +**用途**: 啟動後端 Worker 進行分析 (ASR, Face, YOLO, 等)。 + +**Request**: +```json +{} +``` + +**Response**: +```json +{ + "success": true, + 
"message": "Processing started" +} +``` + +--- + +### 3.6 查詢進度 (Progress) +**路徑**: `GET /api/v1/progress/:file_uuid` + +**用途**: 獲取處理進度與目前階段。 + +**Response**: +```json +{ + "file_uuid": "53e3a229bf68878b7a799e811e097f9c", + "overall_progress": 65, + "current_processor": "face", + "status": "running", + "processors": [ + { "name": "probe", "status": "completed" }, + { "name": "asr", "status": "completed" }, + { "name": "face", "status": "running" } + ] +} +``` + +--- + +### 3.6 查詢未綁定臉部 (List Face Candidates) +**路徑**: `GET /api/v1/faces/candidates` + +**用途**: 列出檔案中尚未綁定身份的臉部。 + +**Query Parameters**: +- `file_uuid` (必填): 檔案 UUID +- `min_confidence` (選填): 最低信心值 (預設 0.5) +- `page_size` (選填): 每頁數量 (預設 20) + +**Response**: +```json +{ + "candidates": [ + { + "id": 123, + "face_id": "123_RoleA", + "file_uuid": "384b0ff44aaaa1f14cb2cd63b3fea966", + "frame_number": 115, + "confidence": 0.98, + "bbox": { "x": 50, "y": 50, "w": 100, "h": 100 } + } + ], + "total": 1, + "page": 1, + "page_size": 20 +} +``` + +--- + +### 3.7 綁定身份 (Bind Identity) +**路徑**: `POST /api/v1/identities/bind` + +**用途**: 將臉部綁定到指定身份 (或建立新身份)。 + +**Request**: +```json +{ + "identity_id": 22, + "binding_type": "face", + "binding_value": "123_RoleA" +} +``` + +**Response**: +```json +{ + "success": true, + "message": "Bound face '123_RoleA' to Identity 'Cary Grant'" +} +``` + +--- + +## 4. 補充 API (Demo 選用) + +### 4.1 列出身份 (List Identities) +**路徑**: `GET /api/v1/identities` + +**用途**: 列出系統中所有已建立的身份。 + +--- + +## 5. 常見問題 (FAQ) + +### Q1: 為什麼 Process 回傳 400 Bad Request? +**Ans**: 必須先執行 **Probe** (`GET /api/v1/files/:file_uuid/probe`),確保系統已知曉檔案的幀數資訊。 + +### Q2: 為什麼 Unregister 回傳 404? +**Ans**: 確認伺服器是否已更新至最新版本。舊版可能尚未包含此路由。 + +### Q3: 臉部查詢回傳空結果? +**Ans**: +1. 確認檔案已**處理完成** (Progress = 100%)。 +2. 嘗試降低 `min_confidence` 參數 (例如設為 0.0)。 +3. 確認該檔案內容確實包含可辨識的臉部。 + +--- + +## 6. 
版本歷史 + +| 版本 | 日期 | 目的 | 操作人 | +|------|------|------|--------| +| V1.0 | 2026-04-30 | 初始 API 列表 | OpenCode | +| V2.0 | 2026-05-01 | 基於 Production 測試結果補足文件 | OpenCode | +| V3.0 | 2026-05-01 | 重構為 Demo 流程導向,補齊 Probe/Unregister 說明 | OpenCode | +| V3.1 | 2026-05-01 | 修正 `:uuid`→`:file_uuid`,修正 port 3002→3003,移除重複 Scan 章節 | OpenCode | diff --git a/docs_v1.0/API_V1.0.0/INTERNAL/API_USAGE_DEMO_V1.0.0.md b/docs_v1.0/API_V1.0.0/INTERNAL/API_USAGE_DEMO_V1.0.0.md new file mode 100644 index 0000000..35fa355 --- /dev/null +++ b/docs_v1.0/API_V1.0.0/INTERNAL/API_USAGE_DEMO_V1.0.0.md @@ -0,0 +1,376 @@ +--- +document_type: "develop_guide" +service: "MOMENTRY_CORE" +title: "Momentry Core V1.0.0 API 示範與整合指南" +date: "2026-05-01" +version: "V1.0" +status: "active" +owner: "Warren" +created_by: "OpenCode" +tags: + - "momentry" + - "core" + - "api-usage" + - "demo" + - "n8n" + - "wordpress" +ai_query_hints: + - "查詢 V1.0.0 API 示範與整合指南的內容" + - "如何使用 n8n 呼叫 V1.0.0 API?" + - "如何整合 V1.0.0 API 到 WordPress?" + - "V1.0.0 API 的 curl 範例" + - "PHP 整合 V1.0.0 API 的方式(wp_remote_request)" + - "n8n 工作流如何串接 V1.0.0 API" + - "Face 綁定錯誤修正的 API 操作步驟" + - "前端 Face Interpolation 的實作方式" +related_documents: + - "API_V1.0.0/MOMENTRY_CORE_API_V1.0.0.md" + - "API_V1.0.0/API_DICTIONARY_V1.0.0.md" + - "API_V1.0.0/API_REFERENCE_v1.0.0.20260501md.md" + - "API_V1.0.0/CHUNK_DEFINITION_V1.0.0.md" + - "API_V1.0.0/PROCESSOR_SELECTION_V1.0.0.md" +--- + +# Momentry Core V1.0.0 API 示範與整合指南 + +| 項目 | 內容 | +|------|------| +| 建立者 | OpenCode | +| 建立時間 | 2026-05-01 | +| 文件版本 | V1.0 | +| 適用版本 | Momentry Core V1.0.0+ | + +--- + +## 關鍵術語定義 + +| 術語 | 定義 | +|------|------| +| file_uuid | 32 碼 SHA256 檔案識別碼 | +| X-API-Key | API 認證方式,透過 HTTP Header 傳遞 | +| face_id | 單一幀中的人臉偵測 ID,格式為 `<檢測ID>_<角色後綴>` | +| Identity | 全域人物身份,跨檔案關聯同一人物 | +| Face Interpolation | 前端線性插值,補足非逐幀臉部標記的顯示 | +| Scan | 掃描檔案系統,列出所有檔案及當前狀態 | + +## 1. 
快速開始 (Quick Start) + +### 1.1 環境 URL + +| 環境 | URL | 用途 | +|------|-----|------| +| **對外 URL** | `https://api.momentry.ddns.net` | 外部存取 | +| **Dev Server** | `http://localhost:3003` | **開發環境,所有測試用** | +| **Local Server** | `http://localhost:3002` | Production,僅 release 用 | + +### 1.2 測試連線 + +```bash +curl http://localhost:3003/health +``` + +```json +{ + "status": "ok", + "version": "1.0.0 (build: ...)", + "uptime_ms": 64880 +} +``` + +--- + +## 2. 核心 API 工作流 (Workflows) + +### 2.1 掃描檔案系統 (Scan Files) +**入口 API**: `GET /api/v1/files/scan` — 所有 Demo 流程從這裡開始。 + +**掃描檔案**: +```bash +curl -s "http://localhost:3003/api/v1/files/scan" \ + -H "X-API-Key: " +``` + +**列出檔案 (分頁)**: +```bash +curl -s "http://localhost:3003/api/v1/files?page=1&page_size=10" \ + -H "X-API-Key: " +``` + +**取得單一檔案詳情**: +```bash +curl -s "http://localhost:3003/api/v1/files/" \ + -H "X-API-Key: " +``` + +### 2.2 搜尋 (Search) +支援語意搜尋、混合搜尋與視覺搜尋。 + +```bash +curl -X POST "http://localhost:3003/api/v1/search" \ + -H "X-API-Key: " \ + -H "Content-Type: application/json" \ + -d '{"query": "尋找紅色信封", "uuid": ""}' +``` + +### 2.3 單獨 Face 綁定流程 (Single Face Binding Workflow) + +此流程適用於手動將特定臉部關聯到已知人物或建立新人物的場景。系統支援**一人分飾多角**,透過 `face_id` 加上角色後綴來區分。 + +#### 步驟 1: 選定 Face (Input Format) +使用者需提供一個 **`file_uuid`** 搭配 **`face_id`** 來鎖定目標。 +選定的意思是輸入 **`:`** 的組合。 + +* **命名規則**: `face_id` 格式通常為 `<原始檢測 ID>_<後綴>`,用於區分同一人的不同臉部實體或角色。 + * **有角色名稱**: 使用角色名 (如 `123_PeterJoshua`)。 + * **無角色名稱**: 使用通用代號 (如 `123_RoleA`, `123_RoleB`)。 + +#### 步驟 2: 列出 Identities 或新增 Identity +使用者決定將該 Face 綁定到系統中已存在的全域人物 (Identity),或是建立一個新人物。 +* **Identity 特性**: 代表現實世界中的真實人物,具備**全域唯一性** (如 "Cary Grant")。 + +- **選項 A: 列出人物清單** + ```bash + curl -s "http://localhost:3003/api/v1/identities?page=1&page_size=20" \ + -H "X-API-Key: " + ``` + +- **選項 B: 決定新增人物名稱** + 若列表中沒有對應人物,使用者需準備一個新名稱(如 "Cary Grant")。 + +#### 步驟 3: 確認綁定 +透過 `POST /api/v1/identities/bind` 完成綁定。 +* **若提供 `identity_id`**: 將帶有後綴的 `face_id` 綁定至該人物。 +* **若提供 `name`**: 系統自動建立新人物 
(Identity),並將該臉部綁定上去。 + +- **綁定至現有身份 (範例)**: + 假設我們要綁定的目標是檔案 `file_uuid_abc` 中的臉部 `123_PeterJoshua`。 + ```bash + curl -X POST "http://localhost:3003/api/v1/identities/bind" \ + -H "X-API-Key: " \ + -H "Content-Type: application/json" \ + -d '{ + "identity_id": 101, + "binding_type": "face", + "binding_value": "123_PeterJoshua" + }' + ``` + *註: 雖然 API 接收的是 `binding_value`,但系統內部會根據選定的 `file_uuid` 與 `face_id` 組合來精確鎖定目標。* + +#### 步驟 4: 循環 +完成綁定後,返回列表處理下一個未綁定的 Face。 + +--- + +### 2.4 取得 Face 截圖 (Retrieve Face Snapshots) + +在確認綁定前,通常需要檢視臉部截圖。根據使用場景,取得截圖有兩種方式: + +#### 1. Local Path / Filename (本地路徑) +* **適用**: Tauri 桌面應用、本機腳本。 +* **說明**: 直接從硬碟讀取圖片檔案,速度最快,無需經過網路層。 +* **路徑**: `//snapshots/faces/.jpg` + +#### 2. URL (網路存取) +* **適用**: Web 前端、外部系統。 +* **說明**: 透過 HTTP GET 請求取得影像串流。 +* **API Endpoint**: `GET /api/v1/files//faces//thumbnail` +* **範例**: + ```bash + curl -s -o face.jpg \ + "http://localhost:3003/api/v1/files//faces//thumbnail" \ + -H "X-API-Key: " + ``` + +--- + +### 2.4.1 前端動態辨識與插值 (Face Interpolation Logic) + +由於系統對臉部標記並非逐幀 (Frame-by-Frame) 進行(為節省運算資源或受限於取樣率),在 Client 端進行**逐幀播放**或**時間軸拖曳**時,若直接顯示會導致臉部框選忽閃忽滅。 + +#### 運作邏輯 +前端需實作**線性插值 (Linear Interpolation)** 機制: + +1. **取得資料**:從 API 取得該 `face_id` 在所有 `frame_number` 的座標列表(例如:Frame 10, Frame 15 有資料)。 +2. 
**插值計算**: + * 當使用者停在 **Frame 12** 時,系統無直接資料。 + * 前端應找出前後最近的有資料幀(Frame 10 與 Frame 15)。 + * 根據時間差比例,動態計算出 Frame 12 的座標 `x, y, w, h`。 + +#### 實作範例 (JavaScript/TypeScript) + +```typescript +// 假設 API 回傳該 Face 的軌跡點(需依 frame 由小到大排序) +const detections = [ + { frame: 10, bbox: { x: 100, y: 100, w: 50, h: 60 } }, + { frame: 15, bbox: { x: 110, y: 105, w: 50, h: 60 } }, +]; + +// 計算 Frame 12 的預測框選 +function getInterpolatedBBox(frameIndex: number, detections: { frame: number; bbox: { x: number; y: number; w: number; h: number } }[]) { + // 找到前一幀(最後一個 frame <= frameIndex)與後一幀(第一個 frame > frameIndex) + const prev = [...detections].reverse().find(d => d.frame <= frameIndex); // Frame 10 + const next = detections.find(d => d.frame > frameIndex); // Frame 15 + + if (!prev) return null; // 還沒開始出現 + if (!next) return prev.bbox; // 結束了,維持最後位置 + + // 計算比例 (0.0 - 1.0) + const ratio = (frameIndex - prev.frame) / (next.frame - prev.frame); + + return { + x: prev.bbox.x + (next.bbox.x - prev.bbox.x) * ratio, + y: prev.bbox.y + (next.bbox.y - prev.bbox.y) * ratio, + // w, h 亦可依此邏輯進行縮放插值 + w: prev.bbox.w, + h: prev.bbox.h, + }; +} +``` + +--- + +### 2.5 Face 綁定錯誤修正 (Face Binding Error Correction) + +此流程適用於移除錯誤綁定的臉部資料,使其恢復為未綁定狀態。 + +1. **選定 Face**: 確認需要解除綁定的臉部 `face_id` 以及所屬的 `file_uuid`。 +2. **解除綁定 (Unbind)**: + ```bash + curl -X POST "http://localhost:3003/api/v1/identities/unbind" \ + -H "X-API-Key: " \ + -H "Content-Type: application/json" \ + -d '{ + "binding_type": "face", + "binding_value": "" + }' + ``` + +--- + +## 3. 
n8n 整合範例 + +### 3.1 HTTP Request 設定 + +| 欄位 | 值 | +|---|---| +| Method | `GET` 或 `POST` | +| URL | `http://localhost:3003/api/v1/files` (Dev) 或 `https://` (Prod) | +| Header `X-API-Key` | `` | + +### 3.2 列出檔案 Workflow (JSON) +使用 `GET /api/v1/files/scan` 作為入口。 + +```json +{ + "nodes": [ + { + "name": "Get Files", + "type": "n8n-nodes-base.httpRequest", + "parameters": { + "method": "GET", + "url": "http://localhost:3003/api/v1/files/scan", + "sendHeaders": true, + "headerParameters": { + "parameters": [{ "name": "X-API-Key", "value": "{{ $env.API_KEY }}" }] + }, + "options": { "qs": { "page": 1, "page_size": 10 } } + }, + "position": [450, 300] + }, + { + "name": "Extract List", + "type": "n8n-nodes-base.code", + "parameters": { + "jsCode": "return $input.first().json.data.map(f => ({\n json: {\n uuid: f.file_uuid,\n name: f.file_name,\n status: f.status\n }\n}));" + }, + "position": [650, 300] + } + ] +} +``` + +--- + +## 4. WordPress / PHP 整合範例 + +### 4.1 PHP Client Library (V1.0.0 相容) + +```php +'; + + private function request(string $endpoint, array $data = [], string $method = 'GET'): array { + $url = self::API_URL . $endpoint; + $args = [ + 'headers' => [ + 'X-API-Key' => self::API_KEY, + 'Content-Type' => 'application/json', + ], + 'timeout' => 30, + ]; + + if ($method === 'POST') { + $args['method'] = 'POST'; + $args['body'] = json_encode($data); + } + + $response = wp_remote_request($url, $args); + if (is_wp_error($response)) { + throw new Exception($response->get_error_message()); + } + return json_decode(wp_remote_retrieve_body($response), true); + } + + // 掃描檔案 + public function scan_files(): array { + return $this->request('/api/v1/files/scan'); + } + + // 列出檔案 + public function list_files(): array { + return $this->request('/api/v1/files'); + } + + // 搜尋 + public function search(string $query): array { + return $this->request('/api/v1/search', ['query' => $query], 'POST'); + } +} +?> +``` + +--- + +## 5. 
疑難排解 + +| 錯誤 | 原因 | 解決方案 | +|------|------|----------| +| `401 Unauthorized` | API Key 無效 | 檢查 Key 格式與權限 | +| `404 Not Found` | 端點不存在 | 確認是否使用了舊版 `/api/v1/videos`,應改為 `/api/v1/files` | +| `400 Bad Request on Process` | 缺少 Probe 資料 | 先執行 `GET /api/v1/files/:file_uuid/probe` | +| `500 Error` | 伺服器錯誤 | 檢查資料庫連線與 Schema 版本 | + +--- + +## 6. 版本歷史 + +| 版本 | 日期 | 目的 | 操作人 | 工具/模型 | +|------|------|------|--------|-----------| +| V1.0 | 2026-05-01 | 初始版本 | OpenCode | deepseek-chat | +| V1.1 | 2026-05-01 | 修正 port 為 Dev(3003),更新 API 路徑與掃描入口 | OpenCode | deepseek-chat | + +--- + +## 7. 附錄:UUID 格式說明 + +V1.0.0 使用 **32 碼 SHA256** 作為 `file_uuid`。 + +``` +/Users/.../demo/video.mp4 + ↓ +SHA256 Hash (前 32 字元) + ↓ +53e3a229bf68878b7a799e811e097f9c +``` diff --git a/docs_v1.0/API_V1.0.0/INTERNAL/CHILD_DETECTION_AGE_BENCHMARK_V1.0.0.md b/docs_v1.0/API_V1.0.0/INTERNAL/CHILD_DETECTION_AGE_BENCHMARK_V1.0.0.md new file mode 100644 index 0000000..ee2a44c --- /dev/null +++ b/docs_v1.0/API_V1.0.0/INTERNAL/CHILD_DETECTION_AGE_BENCHMARK_V1.0.0.md @@ -0,0 +1,148 @@ +--- +document_type: "experiment_report" +service: "MOMENTRY_CORE" +title: "兒童偵測與年齡估算模型選型報告" +date: "2026-05-06" +version: "V1.0" +status: "completed" +owner: "Warren" +created_by: "OpenCode" +--- + +# 兒童偵測與年齡估算模型選型報告 + +## 1. 實驗目標 + +在 Momentry Core 的 Face Trace 資料中,尋找「非主要演員中的兒童角色」並評估三種年齡估算方案的可行性: +1. **DeepFace AgeNet** — 深度學習年齡估算(MIT License) +2. **Apple Vision 頭肩比** — 用頭寬/肩寬比例推測年齡(系統內建) +3. **MiVOLO** — HuggingFace 年齡模型(Apache 2.0) + +## 2. 實驗環境 + +| 項目 | 內容 | +|------|------| +| 測試影片 | Charade (1963), 113 min, 24fps | +| Face detections | 6182 faces, 2347 traces | +| Face 偵測 | Apple Vision `VNDetectFaceRectanglesRequest` (swift_face) | +| Face 嵌入 | CoreML FaceNet512 | +| 取樣間隔 | 60 幀 (2.5 秒) | +| 體態偵測 | Apple Vision `VNDetectHumanBodyPoseRequest` | + +## 3. 
實驗方法 + +### 3.1 主要角色年齡估算 + +從 2347 個 trace 中挑選 face_count ≥ 5 的 12 個主要 trace,提取中間幀進行 DeepFace 年齡估算 + Apple Vision 頭肩比計算。 + +### 3.2 非主要角色搜尋 + +搜尋小臉(< 60px)、低 face_count(≤ 2)的 trace,找出群眾演員(可能包含兒童)。 + +### 3.3 滑雪場水槍場景 + +Charade 開場 Megève 滑雪場有一名男孩用水槍噴灑女主角的場景。對此場景進行密集幀掃描(30 幀間隔)搜尋兒童臉。 + +## 4. 模型選型結果 + +### 4.1 模型可用性 + +| 方案 | 可用 | 速度/face | License | 結論 | +|------|------|----------|---------|------| +| **DeepFace AgeNet** | ✓ | 0.2s(快取後) | MIT | **推薦** | +| Apple Vision 年齡 | ✗ | — | 系統內建 | Vision 無年齡 API | +| Apple Vision 頭肩比 | ✓ | 即時 | 系統內建 | 僅成人/兒童分類 | +| MiVOLO | ✗ | — | Apache 2.0 | 模型不可用(HuggingFace 不存在) | + +### 4.2 DeepFace 年齡估算(12 主要角色取樣) + +| Trace | Faces | 出現時間 | 臉寬 | DeepFace 年齡 | 性別 | 情緒 | +|-------|-------|----------|------|-------------|------|------| +| 0 | 45 | 35s | 160px | 35 | Man | sad | +| 24 | 6 | 708s | 100px | 34 | Man | neutral | +| 26 | 5 | 728s | 100px | 31 | Woman | neutral | +| 39 | 14 | 760s | 120px | 30 | Man | sad | +| 43 | 12 | 765s | 120px | 25 | Man | sad | +| 45 | 8 | 775s | — | 36 | Woman | neutral | +| 46 | 9 | 795s | — | 29 | Woman | neutral | +| 48 | 6 | 818s | 140px | 50 | Man | angry | +| 76 | 13 | 908s | — | 29 | Man | sad | +| 87 | 5 | 972s | — | 35 | Man | sad | +| 103 | 7 | 1022s | — | 35 | Woman | neutral | +| 132 | 5 | 1158s | — | 27 | Man | surprise | + +**年齡範圍:25–50 歲,全成人。** + +### 4.3 Apple Vision 頭肩比 + +| Frame | 臉寬 | 肩寬 | 頭肩比 | DeepFace 年齡 | 場景 | +|-------|------|------|--------|-------------|------| +| 840 | 160px | 407px | **0.39** | 35 | 滑雪場(主角) | +| 17460 | 100px | 354px | **0.28** | 31 | 中段場景 | +| 18360 | 120px | 306px | **0.39** | 25 | 中段場景 | +| 19620 | 140px | 425px | **0.33** | 50 | 最年長角色 | +| 27780 | 110px | 381px | **0.29** | 27 | 後段場景 | + +**頭肩比範圍:0.28–0.39(全成人範圍)。兒童預期 > 0.6。** + +### 4.4 非主要演員(群眾) + +| Trace | Faces | 臉寬 | DeepFace 年齡 | 性別 | 頭肩比 | 場景 | +|-------|-------|------|-------------|------|--------|------| +| 129 | 1 | 42px | 37 | Man | 0.13 | 遠景群眾 | +| 172 | 2 | 51px | 31 | Man | 0.22 | 遠景群眾 | 
+| 304 | 2 | 47px | 41 | Man | 0.14 | 遠景群眾 | +| 57 | 1 | 52px | 35 | Woman | — | 遠景群眾 | +| 322 | 1 | 52px | 34 | Man | 0.18 | 遠景群眾 | + +**全成人。遠景群眾頭肩比更低 (0.13–0.22),因相機距離影響 > 體型差異。** + +## 5. 水槍場景搜尋結果 + +**成功找到小孩,但無法可靠估算年齡。** + +| 參數 | 數值 | +|------|------| +| 影片 | Charade (1963) | +| 場景 | Megève 滑雪場戶外餐廳 | +| 時間 | Frame 2450 (102 秒 / 1:42) | +| 臉部尺寸 | **29 × 29 px** | +| Swift Face 偵測 | ✓ 已偵測(trace_id 未分配,單幀) | +| DeepFace 年齡 | 33 Man ❌ **誤判**(解析度不足) | +| Apple Vision 頭肩比 | 無法計算(身體被遮擋) | + +### 誤判原因 + +29×29px 遠低於年齡估算模型的最低解析度需求(一般需 ≥ 50×50px)。在遠景中,兒童的臉太小,神經網路無法提取足夠的年齡特徵,導致: +- DeepFace 將兒童誤判為成人 +- 頭肩比受距離影響大於實際年齡 + +## 6. 結論與建議 + +| 發現 | 說明 | +|------|------| +| Charade 無兒童主要角色 | 全卡司成人,DeepFace 年齡範圍 25–50 | +| 水槍小孩已找到 | Frame 2450,102 秒,但 29px 太小無法估齡 | +| DeepFace 可行 | MIT license,0.2s/face,適合 ≥ 50px 臉部 | +| Apple Vision 頭肩比 | 僅適合作近景成人/兒童分類(非精確年齡) | +| MiVOLO | 不可用(HuggingFace 模型不存在) | + +### 建議 + +1. **整合 DeepFace** 年齡估算入 `face_processor.py` pipeline,對 ≥ 50px 的臉進行年齡標記 +2. **保留頭肩比** 做為輔助驗證(成人/兒童二元分類) +3. **降低取樣間隔** 從 60 幀降至 10–15 幀以捕捉更多短暫出現的角色 +4. 
**若需測試兒童年齡**:使用片庫中的 `Alice Comedies (1926)`,該片有近景小女孩(Virginia Davis,6–8 歲),臉部可達 150px+ + +--- + +## 附錄:測試資料 + +| 檔案 | 路徑 | +|------|------| +| DeepFace 年齡 JSON | `output_dev/experiments/age_benchmark/age_benchmark_report.json` | +| 頭肩比 JSON | `output_dev/experiments/head_shoulder/head_shoulder_report.json` | +| 水槍場景幀 | `output_dev/experiments/head_shoulder/child_f2450.jpg` | +| 年齡基準腳本 | `scripts/age_benchmark.py` | +| 頭肩比腳本 | `scripts/head_shoulder_quick.py` | +| Face trace 排序 API | `POST /api/v1/file/:file_uuid/face_trace/sortby` | diff --git a/docs_v1.0/API_V1.0.0/INTERNAL/CHUNK_DEFINITION_V1.0.0.md b/docs_v1.0/API_V1.0.0/INTERNAL/CHUNK_DEFINITION_V1.0.0.md new file mode 100644 index 0000000..2fe9b83 --- /dev/null +++ b/docs_v1.0/API_V1.0.0/INTERNAL/CHUNK_DEFINITION_V1.0.0.md @@ -0,0 +1,298 @@ +--- +document_type: "spec" +service: "MOMENTRY_CORE" +title: "Story Parent-Child Chunk Rules V1.0" +date: "2026-05-05" +version: "V1.0" +status: "active" +owner: "Warren" +created_by: "OpenCode" +tags: + - "momentry" + - "core" + - "chunk" + - "story" + - "parent-child" + - "v1.0" +ai_query_hints: + - "Story parent-child chunk generation rules" + - "CUT scene → parent chunk, ASR sentence → child chunk" + - "boundary overlap: partial match enriches child context" + - "parent_summary template + child_summary template" + - "children per parent distribution" +related_documents: + - "../CHUNK_DEFINITION_V1.0.0.md" + - "../DUAL_EMBEDDING_PIPELINE_V1.0.0.md" + - "../PROCESSORS/ASR_V1.0.0.md" + - "../PROCESSORS/CUT_V1.0.0.md" +--- + +# Story Parent-Child Chunk Rules V1.0 + +## 核心概念 + +- **Parent chunk** = CUT 場景邊界內的所有對話 → 一個場景敘述 +- **Child chunk** = 單一 ASR sentence → 一句對白 +- **Boundary overlap** = 場景邊界重疊的句子 → 同時歸屬前後 parent + +## 匹配規則 + +### Rule 1: Fully-Contained Matching + +``` +ASR sentence 完全在 CUT 場景時間範圍內 + → seg.start >= scene.start_time AND seg.end <= scene.end_time + → 加入該 scene 的 children 列表 +``` + +### Rule 2: Boundary Overlap (所有 parent) + +``` +對於每個 parent chunk(即使只有 1 
child): + → 找出與 scene 時間範圍有 partial overlap 的 ASR sentence + → seg.start < scene.end_time AND seg.end > scene.start_time + → AND 未被 Rule 1 匹配(不是 fully-contained) + → 加入該 scene 的 children 列表 +``` + +邊界 overlap 讓 child chunk 可以同時歸屬前後兩個 parent,提供更多上下文。 + +### Rule 3: Scene Filter + +``` +CUT scene duration < 1s → 跳過(場景太短無意義) +``` + +## Parent Summary 模板 + +``` +[{start}s-{end}s, {duration}s] +Cast: {character_list} +Total dialogue: N lines, W words +Speakers: {name} (N lines): "sample text..." +``` + +## Child Summary 模板 + +``` +[{start}s-{end}s] {speaker_name}: "{asr_text}" +``` + +### Embedding Target + +Child summary text → Ollama nomic-embed-text-v2-moe → 768D vector → pgvector + +## 數據實例:Charade (1963) — 長片 113min + +### 輸入 + +| 來源 | 數量 | 說明 | +|------|------|------| +| ASR segments | **1,629** | Whisper small 英文字幕 | +| ASR with text | 1,629 | 全部有文字 | +| ASR total duration | 6,760s (113 min) | | +| CUT scenes | **1,331** | PySceneDetect 場景切割 | +| CUT scenes ≥ 1s | 1,200 | 過濾後有效場景 | +| CUT mean duration | 5.2s | 平均場景長度 | +| CUT scene gap (unmatched) | 131 | < 1s 場景被過濾 | + +### 輸出 (V2.1 — boundary overlap for ALL scenes, duration filter removed) + +| 指標 | 數值 | +|------|------| +| **Parent chunks** | **1,313** (all CUT scenes ≥ 0s) | +| **Child chunks** (total in DB) | **2,927** (1,629 unique + 1,298 overlaps) | +| **Unique children** | **1,629** (100% ASR coverage) | +| DB duplicates (shared) | 1,298 (ON CONFLICT merge) | +| Children per parent | 1 ~ 43, avg **2.2** | +| Unmatched | **0** | + +### 分佈 + +``` +Children per parent: + 1: 128 parents (獨白/短場景) + 2: 58 parents + 3: 0 parents ← 邊界 overlap 後 3 被 2/4 吸收 + 4-9: 64 parents (中等對話場景) + 10-27: 50 parents (多人對話場景) +``` + +### 已匹配率 + +| 指標 | 數值 | +|------|------| +| ASR unmatched | **0** (V2.1: boundary overlap for ALL scenes) | +| 已匹配率 | **100%** | + +## 輸入/輸出範例 + +### Big Parent(多子女) + +**輸入原始數據**: +``` +CUT scene [2783s-2847s, 65s] +27 ASR sentences, all spoken by Audrey Hepburn + Cary Grant + SPEAKER_2 +``` + 
+**輸出 Parent Summary**: +``` +[2783s-2847s, 65s] Cast: Audrey Hepburn, Cary Grant, SPEAKER_2. +Total dialogue: 27 lines, 143 words. +``` + +**輸出 Child Summaries**(embedding target): +``` +[2784s-2786s] Audrey Hepburn: "they stole it" +[2786s-2788s] Audrey Hepburn: "by burying it" +[2788s-2790s] Audrey Hepburn: "then reporting the Germans had captured it" +... (27 total) +``` + +**Metadata 信度**(隨 parent/child 傳遞): + +```json +// Parent metadata +{ + "speaker_confidence": { "Audrey Hepburn": 0.85, "Cary Grant": 0.64 }, + "face_confidence": { "Audrey Hepburn": 0.60, "Cary Grant": 0.64 }, + "yolo_objects": { "car": 0.72, "bottle": 0.55, "chair": 0.68 } +} + +// Child metadata +{ + "speaker_name": "Audrey Hepburn", + "speaker_confidence": 0.85, // MAR lip: 57% events during SPEAKER_1 + "face_confidence": 0.60, // clustering composite + "asr_confidence": 0.92 // Whisper confidence +} +``` + +### 1:1 Parent(單子女) + +**輸入原始數據**: +``` +CUT scene [304s-318s, 14s] +1 ASR sentence, spoken by Cary Grant alone +``` + +**輸出 Parent Summary**: +``` +[304s-318s, 14s] Cast: Cary Grant. +Total dialogue: 1 lines, 13 words. 
+``` + +**輸出 Child Summary**(embedding target): +``` +[309s-317s] Cary Grant: "Sylvia I'm getting a divorce what from Charles he's the only husband I" +``` + +## 與 LLM Pipeline 的關係 + +``` +Pipeline 1 (Story): template summary → DB + embedding +Pipeline 2 (LLM): LLM summary → DB + embedding (future) + +chunk_type: + story_parent / story_child ← Pipeline 1 + llm_parent / llm_child ← Pipeline 2 (future) +``` + +## 版本歷史 + +| 版本 | 日期 | 變更 | +|------|------|------| +| V1.0 | 2026-05-05 | 初始規則:fully-contained + boundary overlap | +| V2.1 | 2026-05-05 | 移除 duration filter,boundary overlap 對所有場景(含空場景)。100% ASR coverage。Speaker mapping 從 DB 動態讀取。 | + +## Charade 1963 統計分析記錄 + +### 影片資料 + +| 指標 | 值 | +|------|-----| +| 片長 | 113 分鐘 | +| 總幀數 | 412,343 | +| FPS | 59.94 | +| 解析度 | 1920×1080 | + +### 處理器產出 + +| Processor | 輸出行數 | 說明 | +|-----------|---------|------| +| CUT | 1,331 scenes | 平均 5.2s/scene,min 0.2s,max 64.5s | +| ASR | 1,629 segments | Whisper small,113 min total | +| ASRX | 10 speakers | SPEAKER_0/1 為主要角色 | +| Face | 4,008 frames, 6,182 faces | sample=60, Vision+CoreML ANE | +| Face Trace | 6,182 detections, 2,347 traces | IoU+embedding tracking | +| Identity | 677 traces → 7 identities | 99.4% coverage, MAR lip speaker binding | +| YOLO | 328,800 frames, 57 object classes | CoreML ANE | + +### Matching 迭代記錄 + +#### Iteration 1: Fully-contained only, >= 1s scene filter + +``` +Rule: seg.start >= scene.start AND seg.end <= scene.end +Scene filter: duration >= 1s (131 scenes filtered out) + +Result: 990/1629 (61%) matched + 454 unmatched, 74 in filtered scenes + Only scenes with children got boundary overlaps +``` + +#### Iteration 2: Add boundary overlap for scenes with >= 3 children + +``` +Rule: For scenes with >= 3 children, add partial overlaps + +Result: 1,210 children (+220 partial) + Still 454 unmatched (boundary overlap only for rich scenes) +``` + +#### Iteration 3: Remove duration filter + +``` +Rule: Remove >=1s scene filter + +Result: 1,496 unique 
children (92% coverage) + 133 unmatched + Root cause: boundary overlap still gated by "if children:" +``` + +#### Iteration 4: Boundary overlap for ALL scenes (regardless of children) + +``` +Rule: Move boundary overlap code outside "if children:" guard + All 1,331 scenes participate + +Result: 1,629 unique children (100% coverage) + 1,313 parents (all scenes) + 2,927 total children (1,629 unique + 1,298 overlaps) +``` + +### 關鍵決策 + +| 決策 | 理由 | 影響 | +|------|------|------| +| 移除 duration filter | 131 scenes <1s 會漏掉句子 | +24 parents, +321 children | +| 移除 children guard | 空場景也要加 boundary children | +133 children (100%) | +| 用 overlap 而非 fully-contained | ASR/CUT 時間邊界不對齊 | 避免 565 sentences orphan | +| Partial overlaps 存兩次 | 邊界句可歸屬兩個 parent | 1,298 duplicates via ON CONFLICT | +| Speaker map 從 DB 讀 | 不再 hardcode 演員名 | 通用化任何影片 | + +### 效能指標 + +| 指標 | 值 | +|------|-----| +| Story 生成時間 | < 1s (template, instant) | +| Embedding 時間 (Ollama) | ~2 min for 1,629 chunks | +| Qdrant sync 時間 | ~3 min for rule1, ~1 min for story | +| BM25 search 時間 | < 10ms per query | + +### 教學要點 + +1. **時間邊界不對齊是常態**:ASR(語音邊界)與 CUT(視覺邊界)用不同演算法,永遠不會完美對齊。overlap matching 是必要設計。 +2. **Boundary overlap 需對所有場景生效**:不能只限有 children 的場景,否則會產生 orphan sentences。 +3. **ON CONFLICT merge**:同一 sentence 出現在兩個 parent 時,DB 層面用最後一個 parent。如需多對多關係,需 junction table。 +4. 
**Hardcoded 到 Dynamic**:speaker map 從 hardcode → DB-driven 是通用化的關鍵一步。 diff --git a/docs_v1.0/API_V1.0.0/INTERNAL/CLASS_SYSTEM_DESIGN_V1.0.0.md b/docs_v1.0/API_V1.0.0/INTERNAL/CLASS_SYSTEM_DESIGN_V1.0.0.md new file mode 100644 index 0000000..7806380 --- /dev/null +++ b/docs_v1.0/API_V1.0.0/INTERNAL/CLASS_SYSTEM_DESIGN_V1.0.0.md @@ -0,0 +1,192 @@ +--- +document_type: "design" +service: "MOMENTRY_CORE" +title: "Class 分類系統設計 V1.0" +date: "2026-05-05" +version: "V1.0" +status: "design" +owner: "Warren" +created_by: "OpenCode" +tags: + - "momentry" + - "core" + - "class" + - "taxonomy" + - "design" + - "v1.0" +ai_query_hints: + - "Class 分層分類系統設計" + - "參照 IPC (國際專利分類) 及 HS (海關稅則)" + - "編碼格式: {section}-{NNNN}" + - "用於 identity 多層分類、快速定位" +related_documents: + - "../DATA_SCHEMA_FILE_IDENTITY_V1.0.0.md" + - "../UUID_ENCODING_RULES_V1.0.0.md" +--- + +# Class 分類系統設計 V1.0 + +> 狀態:設計階段,尚未實施 + +## 設計參考 + +IPC(國際專利分類)與 HS(海關稅則)。 + +共通原則:**層級碼**、**數字越長越精細**、**全球通用**、**可無限擴展**。 + +## 設計目標 + +- IPC/HS 式的 hierarchical code → **快速定位** +- Tag 式的 multi-label 使用 → **靈活分類** +- 同一 entity 可擁有多條 class path +- 新增分類只需 INSERT,無 migration + +``` +Cary Grant + → P-0201 (演員/主角) + → T-0102 (1960s) + → S-0200 (場景/戶外 — 他在片中出現的場景) + +Ferrari 250 GT + → O-0101 (汽車) + → B-0300 (汽車品牌/Ferrari) + → T-0102 (1960s) +``` +## 編碼格式 + +``` +{section}-{NNNN} + │ └── 4 digits,每 2 digits 一層 + └───────── 1 char section prefix +``` + +| 層級 | 範例 | 意義 | +|------|------|------| +| `P-0000` | top section | 人物 | +| `P-0200` | subclass | 人物 → 演員 | +| `P-0201` | group | 人物 → 演員 → 主角 | +| `P-0202` | group | 人物 → 演員 → 配角 | + +層級判斷:`code.length`。`P-` = section,`P-02` = subclass,`P-0201` = group。 + +### Section 定義 + +| Section | 名稱 | 範疇 | 預留 | +|---------|------|------|------| +| `P` | 人物 | 演員、導演、公眾人物、虛構角色、運動員... | 01-99 | +| `O` | 物件 | 交通工具、家具、武器、工具、電子產品... | 01-99 | +| `B` | 品牌/組織 | 時尚、科技、汽車品牌、政府機構、NGO... | 01-99 | +| `C` | 概念/抽象 | 情感、思想、事件、主題、風格... | 01-99 | +| `A` | 生物 | 動物、植物、真菌... 
| 01-99 | +| `S` | 場景/地點 | 室內、戶外、城市、自然地標、建築內部... | 01-99 | +| `E` | 環境/自然 | 天氣、地形、天象、自然災害... | 01-99 | +| `M` | 音樂/聲音 | 樂器、音樂類型、自然聲音、人工聲音... | 01-99 | +| `L` | 語言/文字 | 語言、方言、書寫系統、符號... | 01-99 | +| `T` | 時間/時期 | 年代、季節、節日、歷史時期... | 01-99 | +| `F` | 檔案類型 | 影片格式、文件類型、圖片格式... | 01-99 | +| `D` | 領域/學科 | 科學、藝術、體育、政治、經濟... | 01-99 | + +12 個 Section,各 99 subclass × 99 group = ~117K 分類槽位。可隨時新增 Section。 + +## 初始 Class Tree + +``` +P-0000 人物 +├── P-0100 公眾人物 +├── P-0200 演員 +│ ├── P-0201 主角 +│ └── P-0202 配角 +├── P-0300 導演 +├── P-0400 虛構角色 +└── P-9900 其他人物 + +O-0000 物件 +├── O-0100 交通工具 +│ ├── O-0101 汽車 +│ ├── O-0102 船 +│ └── O-0103 飛機 +├── O-0200 建築 +├── O-0300 家具 +└── O-9900 其他物件 + +B-0000 品牌 +├── B-0100 時尚 +├── B-0200 科技 +└── B-9900 其他品牌 + +C-0000 概念 +├── C-0100 情感 +├── C-0200 思想 +└── C-9900 其他概念 +``` + +## Table + +```sql +CREATE TABLE classes ( + code VARCHAR(8) PRIMARY KEY, -- P-0201 + name TEXT NOT NULL, -- 主角 + description TEXT, + created_at TIMESTAMPTZ DEFAULT now() +); + +-- 多對多:同一 identity 可有多個 class code(如 tag 使用) +CREATE TABLE identity_classes ( + identity_id INTEGER REFERENCES identities(id), + class_code VARCHAR(8) REFERENCES classes(code), + confidence REAL DEFAULT 1.0, + source VARCHAR(20), -- which agent classified + PRIMARY KEY (identity_id, class_code) +); +``` + +## Query 範例 + +```sql +-- 查某 identity 的所有 class +SELECT c.code, c.name +FROM identity_classes ic +JOIN classes c ON ic.class_code = c.code +WHERE ic.identity_id = 8; + +-- 查所有屬於 "演員" (P-0200) 的 identity +SELECT i.name +FROM identity_classes ic +JOIN identities i ON ic.identity_id = i.id +WHERE ic.class_code LIKE 'P-02%'; + +-- 查某 section 下的所有 identity +SELECT DISTINCT i.name +FROM identity_classes ic +JOIN identities i ON ic.identity_id = i.id +WHERE ic.class_code LIKE 'P-%'; +``` + +## 擴展方式 + +1. 新增 leaf class:`INSERT INTO classes VALUES ('P-0203', '配音員')` — P-02 底下的新 group +2. 新增 subclass:`INSERT INTO classes VALUES ('P-0500', '製作團隊')` — P 底下的新 subclass +3. 
新增 section:`INSERT INTO classes VALUES ('X-0000', '新分類')` — 全新 top-level + +無需 migration,insert 即可。 + +## 版本歷史 + +| 版本 | 日期 | 狀態 | +|------|------|------| +| V1.0 | 2026-05-05 | 設計階段 | + +## Future: Class-Based Search + +實施 class 系統後,search API 可加入 class filter 提升命中率: + +``` +GET /api/v1/search?q=car&class=O-0101 + → 只搜被分類為「汽車」的內容,過濾 "care", "car accident", "car wash" + +GET /api/v1/search/hybrid?q=divorce&class=P-0200 + → 只搜演員說出的 "divorce",排除旁白、字幕 + +GET /api/v1/search/universal?class=T-0102 + → 搜所有 1960s 相關內容 +``` diff --git a/docs_v1.0/API_V1.0.0/INTERNAL/DATA_SCHEMA_FILE_IDENTITY_V1.0.0.md b/docs_v1.0/API_V1.0.0/INTERNAL/DATA_SCHEMA_FILE_IDENTITY_V1.0.0.md new file mode 100644 index 0000000..6bf5fdd --- /dev/null +++ b/docs_v1.0/API_V1.0.0/INTERNAL/DATA_SCHEMA_FILE_IDENTITY_V1.0.0.md @@ -0,0 +1,328 @@ +--- +document_type: "spec" +service: "MOMENTRY_CORE" +title: "Data Schema: File & Identity V1.0" +date: "2026-05-05" +version: "V1.0" +status: "active" +owner: "Warren" +created_by: "OpenCode" +tags: + - "momentry" + - "core" + - "schema" + - "file" + - "identity" + - "v1.0" +ai_query_hints: + - "File & Identity DB schema" + - "face_detections.identity_id direct FK" + - "identity multi-modal: face + voice + TMDb + manual" +related_documents: + - "../DUAL_EMBEDDING_PIPELINE_V1.0.0.md" + - "../UUID_ENCODING_RULES_V1.0.0.md" +--- + +# Data Schema: File & Identity V1.0 + +## 1. 
File Schema + +### videos / files + +| Column | Type | 說明 | +|--------|------|------| +| `id` | SERIAL PK | | +| `file_uuid` | VARCHAR(32) | Birth UUID | +| `file_path` | VARCHAR(512) | 檔案完整路徑 | +| `file_name` | VARCHAR(256) | | +| `probe_json` | JSONB | ffprobe raw output | +| `status` | VARCHAR(20) | ready / processing / completed | +| `processing_status` | JSONB | per-processor progress | +| `total_frames` | INTEGER | | +| `fps` | DOUBLE | | +| `duration` | DOUBLE | 影片長度(秒) | +| `width` / `height` | INTEGER | 解析度 | +| `registration_time` | TIMESTAMP | 註冊時間 | + +### face_detections (per-file face data) + +| Column | Type | 說明 | +|--------|------|------| +| `id` | SERIAL PK | | +| `file_uuid` | VARCHAR(32) | → videos.file_uuid | +| `frame_number` | BIGINT | 幀號 | +| `face_id` | VARCHAR(64) | per-file face identifier | +| `trace_id` | INTEGER | 跨幀追蹤 ID | +| `x, y, width, height` | INTEGER | bbox | +| `confidence` | REAL | 偵測信度 | +| `embedding` | REAL[] | 512D CoreML FaceNet | +| `identity_id` | INTEGER | → identities.id (V4.0 direct FK) | + +### chunks (per-file parent/child chunks) + +| Column | Type | 說明 | +|--------|------|------| +| `id` | SERIAL PK | | +| `chunk_id` / `old_chunk_id` | VARCHAR | chunk identifier | +| `file_uuid` | VARCHAR(32) | → videos.file_uuid | +| `chunk_type` | VARCHAR(32) | story_parent / story_child / rule1_sentence | +| `chunk_index` | INTEGER | per-file ordering | +| `start_time` / `end_time` | DOUBLE | time range | +| `content` | JSONB | metadata | +| `text_content` | TEXT | summary text → embedding target | +| `embedding` | VECTOR | pgvector 768D | +| `search_vector` | TSVECTOR | BM25 full-text | +| `parent_chunk_id` | VARCHAR | → chunks.chunk_id | + +## 2. 
Identity Schema + +### 概念 + +Identity 是可命名的任何識別標的,不限於人。 + +| identity_type | 範例 | 識別模型 | +|--------------|------|---------| +| `people` | Cary Grant, Audrey Hepburn | face, voice, name | +| `animal` | 電影中的狗、馬 | face, body, sound | +| `object` | 特定道具、車輛 | yolo, image embedding | +| `plant` | 場景中的特定植物 | image embedding | +| `building` | 艾菲爾鐵塔、特定建築 | image embedding, OCR | +| `place` | Paris, 咖啡廳 | scene classification | +| `concept` | "離婚", "復仇" | text embedding | +| `brand` | Coca-Cola | OCR, logo detection | + +每種 identity_type 可以使用不同的識別模型組合。 + +### 識別模型 + +| model | dimension | source | 適用 identity_type | +|-------|-----------|--------|-------------------| +| `face` | 512D | CoreML FaceNet | people, animal | +| `voice` | 192D | SpeechBrain ECAPA-TDNN | people | +| `text` | 768D | Ollama nomic-embed | concept, place | +| `image` | 768D | — (future) | object, building, plant | +| `yolo_class` | — | YOLO label | object | + +### Table + +```sql +CREATE TABLE identities ( + id SERIAL PRIMARY KEY, + uuid UUID, -- 32-char UUIDv5 (source:external_id) + name TEXT NOT NULL UNIQUE, + identity_type VARCHAR(30) DEFAULT 'people', -- people/animal/object/building/place/concept + source VARCHAR(20) DEFAULT 'manual', -- tmdb/manual/face_cluster/yolo + status VARCHAR(20) DEFAULT 'pending', + + -- Reference vectors per model (in JSONB for extensibility) + reference_vectors JSONB DEFAULT '{}', + -- { + -- "face": [{"vec":[...], "pose":"frontal", "source":"video_trace"}], + -- "voice": [{"vec":[...], "speaker_id":"SPEAKER_0"}], + -- "image": [{"vec":[...], "source":"manual"}] + -- } + + -- Legacy columns (migrating to reference_vectors) + face_embedding VECTOR(512), + voice_embedding VECTOR(192), + identity_embedding VECTOR(768), + + reference_data JSONB DEFAULT '{}', + metadata JSONB DEFAULT '{}', + tmdb_id INTEGER, + tmdb_profile TEXT, + created_at TIMESTAMP DEFAULT now() +); +``` + +### 彈性設計 + +現有 `face_embedding` / `voice_embedding` column 維持向下相容。 +未來全部移入 `reference_vectors` 
JSONB,支援任意 model × 多個 reference vectors: + +```json +{ + "reference_vectors": { + "face": [ + {"vec": [0.1, 0.2, ...], "pose": "frontal", "source": "video_trace_0", "confidence": 0.95}, + {"vec": [0.3, 0.4, ...], "pose": "profile", "source": "video_trace_0", "confidence": 0.88} + ], + "voice": [ + {"vec": [0.5, 0.6, ...], "speaker_id": "SPEAKER_0", "source": "asrx"} + ], + "image": [] + } +} +``` + +### 識別 Agent 架構 + +每個識別模型由對應的 Agent 負責。Identity 本身只存 reference vectors,不綁定特定 model。 + +``` + ┌─────────────────────────┐ + │ identities │ + │ name, type, source │ + │ reference_vectors (JSONB)│ + └──────────┬──────────────┘ + │ + ┌────────────────────┼────────────────────┐ + │ │ │ + ┌────▼────┐ ┌────▼────┐ ┌────▼────┐ + │FaceAgent│ │VoiceAgent│ │ImageAgent│ + │ │ │ │ │ (future) │ + │ input: │ │ input: │ │ input: │ + │ face_ │ │ asrx │ │ image │ + │ detect │ │ segments│ │ features│ + │ ions │ │ │ │ │ + │ │ │ │ │ │ + │ output: │ │ output: │ │ output: │ + │ face → │ │ voice → │ │ img → │ + │ identity│ │ identity│ │ identity│ + └─────────┘ └─────────┘ └─────────┘ +``` + +### Agent 定義 + +| Agent | 輸入 | 模型 | 輸出 | 狀態 | +|-------|------|------|------|------| +| **FaceAgent** | `face_detections` | CoreML FaceNet 512D | `identity_id` on face_detections | ✅ | +| **VoiceAgent** | ASRX segments | ECAPA-TDNN 192D + MAR lip | `metadata.speaker_id` | ✅ | +| **ImageAgent** | — | — | — | ⬜ future | +| **YoloAgent** | YOLO detections | — | object → identity | ⬜ future | +| **TextAgent** | chunk text | nomic-embed 768D | concept → identity | ⬜ future | + +### Agent 運作模式 + +``` +1. Agent 讀取 raw detections(face / voice / yolo) +2. 對比 identities.reference_vectors[model] +3. 相似度達標 → bind to existing identity +4. 不達標 → create new identity +5. 
更新 identities.reference_vectors(enrich reference set) +``` + +同一個 identity 可以被多個 Agent 同時更新。例如: +- FaceAgent 寫入 `reference_vectors.face` +- VoiceAgent 寫入 `reference_vectors.voice` +- 兩者指向同一個 identity (Cary Grant) + +### Face → Identity 綁定(V4.0) + +``` +face_detections.identity_id ──── FK ────→ identities.id +``` + +Direct FK。不需要 intermediate table。操作 API: + +``` +POST /api/v1/identities/bind + { "file_uuid": "...", "face_id": "face_1", "identity_uuid": "..." } + → UPDATE face_detections SET identity_id = X + +POST /api/v1/identities/unbind + { "file_uuid": "...", "face_id": "face_1" } + → UPDATE face_detections SET identity_id = NULL +``` + +### Voice/Speaker → Identity 綁定 + +透過 `identities.metadata.speaker_id`: + +``` +identities.metadata = {"speaker_id": "SPEAKER_0", "speaker_confidence": 0.85} +``` + +Voice embedding 直接寫入 `identities.voice_embedding`。 + +## 3. File-Identity 關聯 + +``` +file (1a04db97...) identity (Cary Grant) +│ │ +├── face_detections │ +│ ├── face_id="face_1" │ +│ │ identity_id ──────────────────┤ +│ ├── face_id="face_2" │ +│ │ identity_id ──────────────────┤ +│ └── face_id="face_3" │ +│ identity_id = NULL │ ← unbounded +│ │ +├── chunks │ +│ ├── story_parent │ +│ │ content.metadata.characters │ +│ │ = ["Cary Grant", ...] │ +│ └── story_child │ +│ content.metadata.speaker │ +│ = "Cary Grant" │ +│ │ +└── asrx.json │ + └── segments[].speaker_id │ + = "SPEAKER_0" ────────────────┘ + +file_identities (N:N junction, if needed) + file_uuid → identity_uuid +``` + +## 4. 
Class 分層分類(參照 IPC + HS) + +### 設計參考 + +IPC(國際專利分類)與 HS(海關稅則)的分層編碼體系。 + +| 標準 | 結構 | +|------|------| +| **IPC** | Section(A-H) → Class(2digits) → Subclass → Group/NNN | +| **HS** | Section → Chapter(2digits) → Heading(4digits) → Subheading(6digits) | + +共通原則:**層級碼**、**數字越長越精細**、**全球通用**。 + +### 編碼格式 + +``` +{SECTION}-{NNN}-{NNN}-{NNN} + │ │ │ └─ subgroup + │ │ └──────── main_group + │ └─────────────── subclass + └─────────────────────── section +``` + +| Section | 涵蓋 | +|---------|------| +| `P` | People | +| `O` | Object | +| `B` | Brand | +| `C` | Concept | +| `A` | Animal | +| `S` | Scene | +| `E` | Environment | +| `M` | Music/Sound | + +### Table + +```sql +CREATE TABLE classes ( + code VARCHAR(20) PRIMARY KEY, -- P-001-010/010 + name TEXT NOT NULL, + parent_code VARCHAR(20) REFERENCES classes(code), + section CHAR(1), + level INTEGER DEFAULT 0, + description TEXT, + created_at TIMESTAMPTZ DEFAULT now() +); + +CREATE TABLE identity_classes ( + identity_id INTEGER REFERENCES identities(id), + class_code VARCHAR(20) REFERENCES classes(code), + confidence REAL DEFAULT 1.0, + source VARCHAR(20), + PRIMARY KEY (identity_id, class_code) +); +``` + +## 版本歷史 + +| 版本 | 日期 | 變更 | +|------|------|------| +| V1.0 | 2026-05-05 | File & Identity schema,V4.0 direct FK binding | +| V1.1 | 2026-05-05 | Class 分層分類(IPC/HS),Agent 識別架構 | diff --git a/docs_v1.0/API_V1.0.0/INTERNAL/DEV_API_REFERENCE_v1.0.0.md b/docs_v1.0/API_V1.0.0/INTERNAL/DEV_API_REFERENCE_v1.0.0.md new file mode 100644 index 0000000..a8acc2a --- /dev/null +++ b/docs_v1.0/API_V1.0.0/INTERNAL/DEV_API_REFERENCE_v1.0.0.md @@ -0,0 +1,210 @@ +--- +document_type: "reference_doc" +service: "MOMENTRY_CORE" +title: "Momentry Core Dev API 參考文件" +date: "2026-05-06" +version: "V1.1" +status: "active" +owner: "Warren" +created_by: "OpenCode" +tags: + - "api" + - "reference" + - "dev" + - "v1.1" + - "restful" +related_documents: + - "MOMENTRY_CORE_API_V1.0.0.md" + - "RELEASE/RELEASE_API_REFERENCE_v1.0.0.md" +--- + +# Momentry 
Core Dev API 參考文件 + +| 項目 | 內容 | +|------|------| +| 建立者 | OpenCode | +| 建立時間 | 2026-05-06 | +| 文件版本 | V1.1 | +| Base URL | `http://localhost:3003` | +| 認證方式 | Header `X-API-Key`(部分端點需要) | + +--- + +## 版本歷史 + +| 版本 | 日期 | 目的 | 操作人 | +|------|------|------|--------| +| V1.1 | 2026-05-06 | 從程式碼實際路由重新產生 53 端點清單 | OpenCode | +| V1.0 | 2026-04-30 | 原始文件,含多個不存在之端點 | OpenCode | + +--- + +## 認證 + +- **Header**: `X-API-Key: ` +- 目前 `/api/v1/auth/login` 回傳固定 demo Key: `muser_test_001` +- Protected routes 透過 `api_key_validation` middleware 驗證 +- Public routes(免 Key): `/health`, `/health/detailed`, `/api/v1/auth/login` + +--- + +## 端點列表 + +總計 **53 個註冊路由**(另有 1 個定義但未掛載)。 + +### 1. 系統與認證(System & Auth) + +| # | Method | Path | 說明 | 需 Key | +|---|--------|------|------|--------| +| 1 | GET | `/health` | 基本健康檢查(回傳 status/version/uptime) | ❌ | +| 2 | GET | `/health/detailed` | 詳細健康狀態(含 PG/Redis/Qdrant/MongoDB 各別延遲) | ❌ | +| 3 | POST | `/api/v1/auth/login` | 登入(固定 demo/demo,回傳 API Key) | ❌ | +| 4 | POST | `/api/v1/auth/logout` | 登出 | ✅ | + +### 2. 檔案管理(File Management) + +| # | Method | Path | 說明 | 需 Key | +|---|--------|------|------|--------| +| 5 | GET | `/api/v1/files` | 檔案列表(支援分頁、status、q、uuid 過濾) | ✅ | +| 6 | GET | `/api/v1/file/:file_uuid` | 檔案詳細資訊(含 probe_json、metadata) | ✅ | +| 7 | POST | `/api/v1/files/register` | 從磁碟註冊新檔案(支援 pattern 批次註冊) | ✅ | +| 8 | POST | `/api/v1/unregister` | 取消註冊檔案 | ✅ | +| 9 | GET | `/api/v1/files/scan` | 掃描 SFTPGo demo 目錄中的新檔案 | ✅ | +| 10 | GET | `/api/v1/file/:file_uuid/probe` | 取得/快取 ffprobe 資訊 | ✅ | +| 11 | POST | `/api/v1/file/:file_uuid/process` | 啟動處理 pipeline(建立 monitor job) | ✅ | +| 12 | GET | `/api/v1/file/:file_uuid/chunks` | 列出 pre_chunks | ✅ | +| 13 | GET | `/api/v1/progress/:uuid` | 即時處理進度(來自 Redis PubSub) | ✅ | +| 14 | GET | `/api/v1/jobs` | 任務列表(支援分頁、status 過濾) | ✅ | + +### 3. 
搜尋(Search) + +| # | Method | Path | 說明 | 需 Key | +|---|--------|------|------|--------| +| 15 | POST | `/api/v1/search/visual` | 視覺搜尋 | ✅ | +| 16 | POST | `/api/v1/search/visual/class` | 依物件類別過濾搜尋 | ✅ | +| 17 | POST | `/api/v1/search/visual/density` | 依視覺密度搜尋 | ✅ | +| 18 | POST | `/api/v1/search/visual/stats` | 視覺統計資料 | ✅ | +| 19 | POST | `/api/v1/search/visual/combination` | 視覺組合搜尋(多條件) | ✅ | +| 20 | POST | `/api/v1/search/smart` | 智慧搜尋(語意向量) | ✅ | +| 21 | POST | `/api/v1/search/universal` | 通用搜尋 | ✅ | +| 22 | POST | `/api/v1/search/frames` | 影格搜尋 | ✅ | + +### 4. 身份管理(Identity) + +| # | Method | Path | 說明 | 需 Key | +|---|--------|------|------|--------| +| 23 | GET | `/api/v1/identities` | 身份列表 | ✅ | +| 24 | POST | `/api/v1/identity` | 建立身份(從 face.json 建立參考向量) | ✅ | +| 25 | GET | `/api/v1/identity/:identity_uuid` | 身份詳細資訊 | ✅ | +| 26 | DELETE | `/api/v1/identity/:identity_uuid` | 刪除身份 | ✅ | +| 27 | GET | `/api/v1/identity/:identity_uuid/files` | 該身份出現的所有檔案 | ✅ | +| 28 | GET | `/api/v1/identity/:identity_uuid/chunks` | 該身份的時間軸片段 | ✅ | +| 29 | POST | `/api/v1/identity/:identity_uuid/bind` | 綁定信號至身份 | ✅ | +| 30 | POST | `/api/v1/identity/:identity_uuid/unbind` | 解除綁定 | ✅ | +| 31 | POST | `/api/v1/identity/:from_uuid/mergeinto` | 合併身份(將 from 合併至目標) | ✅ | + +### 5. 臉部(Face) + +| # | Method | Path | 說明 | 需 Key | +|---|--------|------|------|--------| +| 32 | GET | `/api/v1/faces/candidates` | 臉部候選列表(未綁定者) | ✅ | + +### 6. 媒體串流(Media) + +| # | Method | Path | 說明 | 需 Key | +|---|--------|------|------|--------| +| 33 | GET | `/api/v1/file/:file_uuid/video` | 影片串流 | ✅ | +| 34 | GET | `/api/v1/file/:file_uuid/video/bbox` | 含 Bounding Box 的影片串流 | ✅ | +| 35 | GET | `/api/v1/file/:file_uuid/trace/:trace_id/video` | 特定 trace 的影片片段 | ✅ | +| 36 | GET | `/api/v1/file/:file_uuid/thumbnail` | 影片縮圖 | ✅ | + +### 7. 
檔案身份關聯(File-Identity) + +| # | Method | Path | 說明 | 需 Key | +|---|--------|------|------|--------| +| 37 | GET | `/api/v1/file/:file_uuid/identities` | 該檔案的所有關聯身份 | ✅ | + +### 8. Agent + +| # | Method | Path | 說明 | 需 Key | +|---|--------|------|------|--------| +| 38 | POST | `/api/v1/agents/translate` | 翻譯 Agent | ✅ | +| 39 | POST | `/api/v1/agents/identity/analyze` | 身份分析 Agent | ✅ | +| 40 | POST | `/api/v1/agents/identity/suggest` | 身份合併建議 | ✅ | +| 41 | GET | `/api/v1/agents/identity/status` | 身份 Agent 狀態 | ✅ | +| 42 | POST | `/api/v1/agents/suggest/clustering` | 聚類建議 | ✅ | +| 43 | POST | `/api/v1/agents/suggest/merge` | 合併建議 | ✅ | +| 44 | POST | `/api/v1/agents/5w1h/analyze` | 5W1H 分析 | ✅ | +| 45 | POST | `/api/v1/agents/5w1h/batch` | 5W1H 批量分析 | ✅ | +| 46 | GET | `/api/v1/agents/5w1h/status` | 5W1H 狀態 | ✅ | + +### 9. 資源管理(Resource) + +| # | Method | Path | 說明 | 需 Key | +|---|--------|------|------|--------| +| 47 | POST | `/api/v1/resource/register` | 註冊運算資源 | ✅ | +| 48 | POST | `/api/v1/resource/heartbeat` | 資源心跳回報 | ✅ | +| 49 | GET | `/api/v1/resources` | 資源列表 | ✅ | + +### 10. 
統計與設定(Stats & Config) + +| # | Method | Path | 說明 | 需 Key | +|---|--------|------|------|--------| +| 50 | GET | `/api/v1/stats/ingest` | 攝取統計(video/chunk 計數) | ✅ | +| 51 | GET | `/api/v1/stats/sftpgo` | SFTPGo 使用者狀態 | ✅ | +| 52 | GET | `/api/v1/stats/inference` | 推理叢集健康狀態 | ✅ | +| 53 | POST | `/api/v1/config/cache` | 切換快取開關 | ✅ | + +--- + +## 未掛載的端點(定義了 handler 但未註冊路由) + +| Handler | 位置 | 說明 | +|---------|------|------| +| `POST /api/v1/file/:file_uuid/face_trace/sortby` | `trace_agent_api.rs` | 定義了 `trace_agent_routes()` 但從未被 `server.rs` merge | + +--- + +## 程式碼中存在 handler 但未註冊路由的端點 + +下列 handler 有實作但**沒有對應的 `.route()` 呼叫**,無法透過 HTTP 存取: + +- `GET /api/v1/assets/:uuid/status` — `get_asset_status` +- `GET /api/v1/jobs/:job_id` — `get_job` +- `GET /api/v1/rules/:rule/status` — `get_rule_status` +- `GET /api/v1/videos/:uuid/details` — `video_details` +- `DELETE /api/v1/videos/:uuid` — `delete_video` +- `POST /api/v1/search` — `search`(語意搜尋) +- `POST /api/v1/search/hybrid` — `hybrid_search` +- `POST /api/v1/search/bm25` — `search_bm25` +- `GET /api/v1/lookup` — `lookup` +- `POST /api/v1/search/smart` — `search_smart`(server.rs 版,實際註冊的是 search.rs 版) + +--- + +## 與 V1.0 文件的差異 + +V1.0 文件(`MOMENTRY_CORE_API_V1.0.0.md`)宣稱的端點中有以下**不存在於實際程式碼**: + +| 文件宣稱 | 實際狀況 | +|----------|---------| +| `DELETE /api/v1/videos/:uuid` | handler 存在但未註冊路由 | +| `POST /api/v1/search` | handler 存在但未註冊路由 | +| `POST /api/v1/search/hybrid` | handler 存在但未註冊路由 | +| `POST /api/v1/assets/:uuid/process` | 實際是 `POST /api/v1/file/:file_uuid/process` | +| `GET /api/v1/files/:uuid/snapshots` | 不存在 | +| `POST /api/v1/files/:uuid/snapshots/migrate` | 不存在 | +| `GET /api/v1/face/list` | 不存在 | +| `POST /api/v1/face/recognize` | 不存在 | + +--- + +## 路徑命名慣例 + +| 資源 | 路由格式 | 參數 | +|------|---------|------| +| 檔案 | `/api/v1/file/:file_uuid` | 32 碼 hex string | +| 身份 | `/api/v1/identity/:identity_uuid` | UUID v4 | +| 資源 | `/api/v1/resource/...` | - | + +注意路徑使用**單數**(`file`, `identity`),與 RELEASE 文件的 `files`, 
`identities` 不同。 diff --git a/docs_v1.0/API_V1.0.0/INTERNAL/DUAL_EMBEDDING_PIPELINE_V1.0.0.md b/docs_v1.0/API_V1.0.0/INTERNAL/DUAL_EMBEDDING_PIPELINE_V1.0.0.md new file mode 100644 index 0000000..9c2922b --- /dev/null +++ b/docs_v1.0/API_V1.0.0/INTERNAL/DUAL_EMBEDDING_PIPELINE_V1.0.0.md @@ -0,0 +1,1148 @@ +--- +document_type: "spec" +service: "MOMENTRY_CORE" +title: "Pipeline & Rule Architecture: Processor Lifecycle, Embedding, Search V2.0" +date: "2026-05-05" +version: "V2.0" +status: "deprecated" +owner: "Warren" +created_by: "OpenCode" +tags: + - "momentry" + - "core" + - "chunk" + - "embedding" + - "qdrant" + - "bm25" + - "lifecycle" + - "versioning" + - "v1.0" + - "deprecated" +ai_query_hints: + - "⚠️ 歷史設計文件,非當前實作" + - "Story (template) summarization 已由 5W1H+ Agent 取代" + - "Qdrant 3-collection 架構已簡化為 1 collection + chunk_type 區分" +related_documents: + - "../PROCESSORS/FACE_V1.0.0.md" + - "../PROCESSORS/FACE_EMBEDDING_FLOW_V1.0.0.md" + - "../VECTOR_SPEC_V1.0.0.md" + - "../CHUNK_DEFINITION_V1.0.0.md" + - "../PROCESSOR_SELECTION_V1.0.0.md" +--- + +# Pipeline & Rule Architecture: Processor Lifecycle, Embedding, Search V2.0 + +> ⚠️ **歷史設計文件** — 此文件描述 v1.0 早期開發階段的雙軌 pipeline 設計。Story (template) 與 LLM (on-demand) 兩條 pipeline 皆曾實作,後期因 M5 的 LLM 算力充足,template-based summarization 已被 5W1H+ Agent 取代。當前實作請參考: +> +> - `AGENTS/5W1H_AGENT_V1.0.0.md` — 5W1H+ 遞迴摘要 +> - `AGENTS/IDENTITY_AGENT_V1.0.0.md` — Identity Agent +> - `VECTOR_SPEC_V1.0.0.md` — 向量化規範 + +## 架構概述 + +兩個獨立 pipeline,共用同一底層(Qdrant + BM25),chunk_type 區隔: + +``` + ASR + ASRX + CUT + Face + YOLO + │ + ┌───────────────┴───────────────┐ + ▼ ▼ + Pipeline 1: Story Pipeline 2: LLM + (template, instant) (LLM, on-demand) + │ │ + ┌──────────┼──────────┐ ┌──────────┼──────────┐ + ▼ ▼ ▼ ▼ ▼ ▼ + Parent Child Embedding Parent Child Embedding + Summary Summary (nomic) Summary Summary (nomic) + │ │ │ │ │ │ + └──────────┴──────────┘ └──────────┴──────────┘ + │ │ + ▼ ▼ + ┌───────────────────────┐ ┌───────────────────────┐ + │ 
Qdrant (vector) │ │ Qdrant (vector) │ + │ PG tsvector (BM25) │ │ PG tsvector (BM25) │ + │ chunk_type: story_* │ │ chunk_type: llm_* │ + └───────────────────────┘ └───────────────────────┘ +``` + +## Pipeline 1: Story (Template-Based) + +### 輸入 +- `{uuid}.asr.json` — 1629 segments (text + timestamps) +- `{uuid}.asrx.json` — 10 speakers (speaker_id + segments) +- `{uuid}.cut.json` — 1331 scenes (start/end times) +- Identity mapping (SPEAKER_0→Cary Grant, SPEAKER_1→Audrey Hepburn, etc.) + +### 處理流程 + +``` +1. build_child_chunks() + ├── CUT scenes 分組 ASR segments + ├── ASRX 時間對齊 → speaker_id → identity name + └── 產出: scenes[children: [{start, end, text, speaker_name}]] + +2. generate_story_parent_summary(scene) + └── Template: "[{start}-{end}, {dur}] Cast: X. Total: N lines, W words. Speakers: ..." + +3. generate_story_child_summary(child, parent) + └── Template: "[{start}-{end}] {name}: \"{text}\"" + +4. embed_text(summary) → EmbeddingGemma 300M (Python MPS, port 11436) → 768D vector + +5. Store: + ├── Qdrant: upsert (point_id=chunk_id, vector=768D, payload={chunk_type, file_uuid, text}) + └── PostgreSQL: chunks table (text_content, tsvector, parent_chunk_id) +``` + +### Chunk Types + +| chunk_type | 說明 | 數量 (Charade) | +|------------|------|----------------| +| `story_parent` | Template parent scene summary | ~300 | +| `story_child` | Per-sentence summary | ~1600 | + +### Embedding Target + +``` +chunk_summary text → EmbeddingGemma 300M (768D, port 11436) → Qdrant collection "momentry_dev" + → PostgreSQL chunks.embedding (VECTOR(768)) +``` + +## Pipeline 2: LLM (On-Demand) + +### 輸入 + +同 Pipeline 1 + LLM server (Gemma4/Qwen) + +### 處理流程 + +``` +1. build_child_chunks() (同 Pipeline 1) + +2. generate_llm_parent_summary(scene) + └── LLM Prompt: "Summarize this scene: {dialogue}" → paragraph (60-100 words) + +3. 
generate_llm_child_summary(child, parent_summary) + └── LLM Prompt: "{parent_summary} → Summarize this line: {text}" → one sentence +``` + +### Chunk Types + +| chunk_type | 說明 | +|------------|------| +| `llm_parent` | LLM parent scene summary | +| `llm_child` | LLM child sentence summary | + +## Qdrant Storage + +### Collection Architecture (3 Collections) + +| Collection | Source Data | Content | Embedding Target | Point Count (Charade) | Dev Name | +|------------|------------|---------|-----------------|----------------------|----------| +| **rule1** | ASR raw + ASRX speaker (1:1) | 原始對白句 + speaker_id + timestamp | Raw ASR text | 1,629 | `momentry_dev_rule1` | +| **story** | Story Pipeline 1 (template) | Parent scene + Child sentence with identity | Child summary: `[{t}s] {name}: "{text}"` | 1,629 child + 1,313 parent | `momentry_dev_story` | +| **llm_summary** | Story Pipeline 2 (LLM, future) | Parent LLM narrative + Child LLM summary | LLM-generated child summary | TBD | `momentry_dev_llm_summary` | + +### 各 Collection 資料結構對比 + +| | rule1 | story | llm_summary | +|---|---|---|---| +| **輸入** | ASR 原始段 | ASR + CUT + identity + YOLO + TKG | ASR + CUT + identity + YOLO + TKG + LLM | +| **Parent** | ❌ 無 | ✅ scene summary (template) | ✅ LLM narrative paragraph | +| **Child** | 原始句子 | `[時間] 角色: "對白"` | LLM-generated 1-sentence summary | +| **Speaker** | speaker_id | resolved identity name | resolved identity name | +| **搜尋特色** | 精確字詞匹配 | 角色+上下文語意 | 高層次語意理解 | +| **產生成本** | Zero | Zero (template) | High (LLM inference) | +| **狀態** | ⬜ pending | ✅ implemented | ⬜ future | + +## Metadata 信度 (Confidence Scores) + +所有已識別的內容都附帶信度,提供給 Story/LLM processor 做加權參考: + +### 信度來源 + +| 來源 | 欄位 | 值域 | 說明 | +|------|------|------|------| +| **Speaker identity** | `speaker_confidence` | 0-1 | MAR lip analysis: mouth open/close events matched to speaker | +| **Face identity** | `face_confidence` | 0-1 | Identity clustering composite score (similarity + speaker weight) | +| **YOLO 
object** | `object_confidence` | 0-1 | YOLO detection confidence per object class | +| **ASR text** | `asr_confidence` | 0-1 | Whisper transcription confidence (per segment) | +| **Scene boundary** | `scene_confidence` | — | CUT scene detection is deterministic (binary) | + +### Parent Chunk Metadata + +```json +{ + "start_time": 304, + "end_time": 318, + "characters": ["Cary Grant"], + "speaker_confidence": { + "Cary Grant": 0.85 // MAR lip: 4/4 events during SPEAKER_0 + }, + "face_identities": { + "Cary Grant": 0.64 // clustering composite score + }, + "yolo_objects": { + "car": 0.72, + "bottle": 0.55 + }, + "child_count": 1, + "total_words": 13 +} +``` + +### Child Chunk Metadata + +```json +{ + "start": 308.51, + "end": 317.35, + "text": "Sylvia I'm getting a divorce...", + "speaker_id": "SPEAKER_0", + "speaker_name": "Cary Grant", + "speaker_confidence": 0.85, + "face_confidence": 0.64, + "asr_confidence": 0.92, + "language": "en", + "yolo_objects": ["bottle", "car", "chair"], + "scene_id": 48 +} +``` + +### 信度使用方式 + +Story processor (template) 用信度做: +- 過濾低信度 objects(<0.5 排除,避免 "cell phone" 誤報) +- 標註高信度 speaker identity(>0.8 → 確定的角色名) + +LLM processor (future) 用信度做: +- Prompt context: "Cary Grant (confidence 0.85) says: ..." 
+- Put low-confidence content later in the prompt, or leave it out
+
+### Collection 1: rule1
+
+Source: ASR sentence + ASRX speaker (1:1, no parent grouping)
+
+```json
+{
+  "point_id": "rul1_{uuid}_{start_sec}_{end_sec}",
+  "vector": "",
+  "payload": {
+    "file_uuid": "1a04db97...",
+    "chunk_type": "rule1_sentence",
+    "text": "Hello and welcome to the old-time movie show...",
+    "speaker_id": "SPEAKER_4",
+    "speaker_name": "Walter Matthau",
+    "start_time": 1.7,
+    "end_time": 18.9,
+    "language": "en"
+  }
+}
+```
+
+### Collection 2: story
+
+Source: Story parent-child chunks (Pipeline 1)
+
+```json
+{
+  "point_id": "story_{uuid}_{start_sec}_{end_sec}",
+  "vector": "",
+  "payload": {
+    "file_uuid": "1a04db97...",
+    "chunk_type": "story_child",
+    "text": "[309s-317s] Cary Grant: \"Sylvia I'm getting a divorce...\"",
+    "speaker": "Cary Grant",
+    "parent_chunk_id": "story_parent_xxx_304_318",
+    "parent_summary": "[304s-318s, 14s] Cast: Cary Grant. Total dialogue: 1 lines, 13 words.",
+    "start_time": 308.5,
+    "end_time": 317.4
+  }
+}
+```
+
+### Collection 3: llm_summary
+
+Source: LLM parent-child summaries (Pipeline 2, future)
+
+```json
+{
+  "point_id": "llm_{uuid}_{start_sec}_{end_sec}",
+  "vector": "",
+  "payload": {
+    "file_uuid": "1a04db97...",
+    "chunk_type": "llm_child",
+    "text": "Cary Grant discusses his divorce from Sylvia with Charles.",
+    "speaker": "Cary Grant",
+    "parent_chunk_id": "llm_parent_xxx_304_318",
+    "parent_llm_summary": "In this scene, Cary Grant reveals...",
+    "start_time": 308.5,
+    "end_time": 317.4
+  }
+}
+```
+
+### Collection Initialization (Rust)
+
+```rust
+// In QdrantDb or config
+// Returning a string literal from a function with no reference arguments needs an explicit 'static lifetime.
+pub fn rule1_collection() -> &'static str { "momentry_dev_rule1" }
+pub fn story_collection() -> &'static str { "momentry_dev_story" }
+pub fn llm_summary_collection() -> &'static str { "momentry_dev_llm_summary" }
+```
+
+### Search Strategy
+
+```
+Hybrid Search: "divorce Charles"
+  ├── Qdrant rule1: vector search raw sentences
+  ├── Qdrant story: vector search enriched child chunks
+  └── PG tsvector: BM25 across all three 
chunk_types
+           ↓
+  Merge scores → ranked results
+```
+
+### Corresponding chunk_type (PostgreSQL)
+
+| Collection | chunk_type | Description |
+|------------|-----------|------|
+| rule1 | `rule1_sentence` | Raw 1:1 ASR sentence |
+| story | `story_child` | Story child chunk (with parent context) |
+| story | `story_parent` | Story parent chunk |
+| llm_summary | `llm_child` | LLM child summary (future) |
+| llm_summary | `llm_parent` | LLM parent summary (future) |
+
+## BM25 (PostgreSQL Full-Text Search)
+
+### Strategy
+
+Uses PostgreSQL's built-in `tsvector` + `tsquery`:
+
+```sql
+-- The search_vector column (tsvector type) already exists
+-- A trigger is needed to keep it updated automatically
+ALTER TABLE dev.chunks ADD COLUMN IF NOT EXISTS search_vector tsvector;
+
+-- Trigger function: builds the tsvector from text_content
+CREATE OR REPLACE FUNCTION update_chunk_search_vector() RETURNS trigger AS $$
+BEGIN
+  NEW.search_vector := to_tsvector('english', COALESCE(NEW.text_content, ''));
+  RETURN NEW;
+END;
+$$ LANGUAGE plpgsql;
+
+-- BM25-style query (scored with PostgreSQL's built-in ts_rank)
+SELECT *, ts_rank(search_vector, to_tsquery('english', $2)) AS bm25_score
+FROM dev.chunks
+WHERE file_uuid = $1
+  AND chunk_type IN ('story_parent', 'story_child', 'llm_parent', 'llm_child')
+  AND search_vector @@ to_tsquery('english', $2)
+ORDER BY bm25_score DESC
+LIMIT 20;
+```
+
+## Search API
+
+### Hybrid Search (Vector + BM25)
+
+```
+GET /api/v1/search/hybrid?q=query&file_uuid=xxx
+
+1. Qdrant: vector search → top N candidates (cosine_score)
+2. PostgreSQL: BM25 search → top N candidates (bm25_score)
+3. Merge: weighted_rank = 0.7 * cosine_score + 0.3 * bm25_score
+4. 
Return: ranked results with parent context
+```
+
+### Implementation Status
+
+| Component | Status | File |
+|-----------|--------|------|
+| Story processor (template) | ✅ | `scripts/parent_chunk_5w1h.py` |
+| LLM processor | ⬜ | `scripts/parent_chunk_5w1h.py` (--mode llm) |
+| ProcessorType::Story | ✅ | `src/core/db/postgres_db.rs` |
+| Rust processor call | ✅ | `src/worker/processor.rs:ProcessorType::Story` |
+| Qdrant collection: rule1 | ⬜ | `momentry_dev_rule1` |
+| Qdrant collection: story | ⬜ | `momentry_dev_story` |
+| Qdrant collection: llm_summary | ⬜ | `momentry_dev_llm_summary` |
+| pgvector embedding | ✅ | `dev.chunks.embedding` (VECTOR) |
+| BM25 trigger | ✅ | migration 031, `dev.chunks.search_vector` |
+| Hybrid search API | ⬜ | `src/api/server.rs` |
+
+## Migration Plan
+
+### 031_add_chunk_search_trigger.sql
+
+```sql
+-- Add search_vector if not exists
+ALTER TABLE dev.chunks ADD COLUMN IF NOT EXISTS search_vector tsvector;
+
+-- Drop old trigger if exists
+DROP TRIGGER IF EXISTS trg_chunk_search_vector ON dev.chunks;
+
+-- Create trigger function
+CREATE OR REPLACE FUNCTION update_chunk_search_vector() RETURNS trigger AS $$
+BEGIN
+  NEW.search_vector := to_tsvector('english', COALESCE(NEW.text_content, ''));
+  RETURN NEW;
+END;
+$$ LANGUAGE plpgsql;
+
+-- Create trigger
+CREATE TRIGGER trg_chunk_search_vector
+  BEFORE INSERT OR UPDATE ON dev.chunks
+  FOR EACH ROW EXECUTE FUNCTION update_chunk_search_vector();
+```
+
+## Processor Order (Pipeline DAG)
+
+```
+Cut ──→ Scene
+Asr ──→ Asrx ──→ Story ──→ Embedding Pipeline
+                   ↑
+        requires: Asr, Asrx, Cut, Yolo, Face
+                                  ↓
+                            Qdrant + BM25
+```
+
+Story processor depends on: Asr, Asrx, Cut, Yolo, Face
+Position in `ProcessorType::all()`: **last** (after VisualChunk)
+
+## Test Plan
+
+1. `python parent_chunk_5w1h.py --file-uuid xxx --mode story` → produces story_story.json
+2. Apply migration 031 → adds search_vector trigger
+3. Implement `QdrantDb::upsert_chunk_embedding()`
+4. 
Wire Story processor into Rust pipeline +5. Run `--mode story --embed` → verify Qdrant + BM25 +6. Run `--mode llm --embed` → verify LLM pipeline (when resources allow) + +## Metadata 版本與更新機制 + +### 問題 + +Metadata 信度(speaker_confidence, face_confidence, asr_confidence 等)依賴處理器的模型選型。當模型升級(如 Whisper small → medium,InsightFace → Vision+FaceNet),原有的信度值不再準確,需要重新計算。 + +### 設計:處理器版本追蹤 + +每個處理器記錄其模型版本,metadata 附帶 `source_version` 以便判斷是否需要更新。 + +```json +// chunk metadata 中的 source_version 欄位 +{ + "source_versions": { + "asr": "faster-whisper/small/v1", // ← 處理器 + 模型 + 版本 + "asrx": "speechbrain/ecapa-tdnn/v1", + "face": "apple-vision+coreml-facenet/v2", + "cut": "pyscenedetect/default", + "yolo": "yolov5-coreml/v2", + "speaker_binding": "mar-lip/v1", + "identity_clustering": "cosine-threshold/v1", + "story_processor": "template/v2.0" + }, + "generated_at": "2026-05-05T02:30:00Z" +} +``` + +### 處理器版本表 + +| Processor | 當前版本 | 上一版本 | 升級影響 | +|-----------|---------|---------|---------| +| ASR | `faster-whisper/small/v1` | — | 換 medium/large → asr_confidence 變更 | +| ASRX | `speechbrain/ecapa-tdnn/v1` | — | 換模型 → speaker_id 可能變更 | +| Face detection | `apple-vision/v2` | `insightface/buffalo-l/v1` | bbox, pose 變化 | +| Face embedding | `coreml-facenet/v2` | `insightface-arcface/v1` | embedding 空間完全變更 | +| YOLO | `yolov5-coreml/v2` | `yolov8/v1` | object_confidence 分布變更 | +| Speaker binding | `mar-lip/v1` | — | 換方案 → speaker_confidence 變更 | +| Identity clustering | `cosine-threshold/v1` | — | 換 DBSCAN → face_confidence 變更 | +| Story processor | `template/v2.0` | — | 換 LLM → summary 品質變化 | + +### 更新策略 + +#### Tier 1: 軟更新(metadata only) + +當處理器升級但輸出不變更時(如 YOLO v8 → v5),只需重新計算信度: + +``` +1. 更新 version table +2. 標記受影響的 chunk 為 stale +3. 重新計算 confidence,更新 metadata +4. 不重新生成 embedding(文本不變) +``` + +#### Tier 2: 硬更新(重新產生) + +當處理器輸出結構變更時(如 InsightFace → Vision),需完整重跑: + +``` +1. 標記所有 downstream 為 stale +2. 重跑 Face → Trace → Identity → Story → Embedding +3. 
更新所有 source_version +``` + +#### 觸發時機 + +| 事件 | 更新範圍 | +|------|---------| +| ASR 模型升級 | ASR chunks → Rule1 → Story → 重新 embed | +| Face 模型切換 | Face traces → Identity → Story → 重新 embed | +| Speaker binding 改進 | Speaker metadata → Story → 重新 embed | +| Story template 修改 | Story chunks → 重新 embed | + +### 實作 + +```rust +use std::collections::HashMap; + +// processor_results table 加入 model_version 欄位 +pub struct ProcessorResult { + // ... + pub model_version: Option<String>, // "faster-whisper/small/v1" +} + +// chunks metadata 記錄所有上游版本 +// 比對 current_versions vs source_versions → 決定是否需要更新 +pub fn needs_refresh(chunk_versions: &HashMap<String, String>, + current_versions: &HashMap<String, String>) -> bool { + chunk_versions.iter().any(|(proc, ver)| { + current_versions.get(proc) != Some(ver) + }) +} +``` + +### 處理器版本註冊 + +```sql +CREATE TABLE dev.processor_versions ( + processor VARCHAR(32) PRIMARY KEY, + model_version VARCHAR(128) NOT NULL, + updated_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP +); +``` + +--- + +## Pipeline & Rule 架構定義 + +### 名詞定義 + +| 名詞 | 定義 | 責任 | +|------|------|------| +| **Pipeline** | 排程管理的工作流。包含入庫與出庫。 | Processor 執行順序、依賴、產出、寫入 DB、指定資料來源路徑與欄位 | +| **Processor** | 單一處理單元 | 產出原始數據(JSON, DB rows, embeddings) | +| **Agent** | 智能處理單元 | 推理、匹配、總結(TMDb, Identity, Story) | +| **Rule** | 搜尋 API 的內涵定義 | 定義如何將 Pipeline 產出組織成可搜尋結構 | +| **Search API** | 查詢介面 | 查詢 Rule 產生的 chunk(vector search, BM25, hybrid) | + +### Pipeline → Rule 關係 + +``` +Pipeline (排程管理) Rule (搜尋內涵) +───────────────────── ──────────────── +CUT ──→ Scene +ASR ──→ ASRX ──┐ Rule 1: Sentence Chunks + │ ├── 輸入: ASR + ASRX +YOLO ──────────┤ ├── 產出: 1:1 sentence chunks + │ └── 搜尋: 原始對白 text search +OCR ───────────┤ + │ Rule 2: Visual Chunks +Face ──→ Trace ─┐ ├── 輸入: YOLO + OCR + │ ├── 產出: frame-level visual chunks + ├──→ Identity Agent └── 搜尋: 物件/文字位置查詢 +Pose ──────────┘ + Rule 3: Scene Chunks + ├── 輸入: CUT + ASR + Rule1 + ├── 產出: scene-level parent chunks + └── 搜尋: 場景摘要 + + Rule 4: Story Chunks (5W1H) + ├── 輸入: Rule3 + Identity + YOLO + ├── 產出: parent + child summaries 
+ └── 搜尋: 情節/角色語意搜尋 +``` + +### Pipeline 做什麼 + +1. 管理 Processor 生命週期(Select → Schedule → Execute → Produce → Complete) +2. 處理依賴 DAG(ASRX 需等 ASR 完成) +3. 排程資源(max_concurrent, slot allocation) +4. 處理錯誤(retry, skip non-essential) +5. **入庫** — 產出數據並寫入儲存層 +6. **出庫** — 指定數據來源路徑與欄位,供 Rule/API 查詢 + +### Pipeline 入庫/出庫 明細 + +#### 內部產出(from Path) + +| Processor | 入庫 Path | 入庫 Table.Column | 出庫 | +|-----------|----------|-------------------|------| +| CUT | `{uuid}.cut.json` | — | Rule3: scenes | +| ASR | `{uuid}.asr.json` | — | Rule1: segments | +| ASRX | `{uuid}.asrx.json` | — | Rule1: speaker_id | +| YOLO | `{uuid}.yolo.json` | — | Rule2: detections | +| OCR | `{uuid}.ocr.json` | — | Rule2: texts | +| Face Detect | `{uuid}_detect.json` | — | Face Embed: bbox | +| Face Embed | `{uuid}.face.json` | — | Face Trace: embedding | +| Face Trace | `{uuid}.face_traced.json` | `face_detections` | Identity: trace_id | +| Identity Agent | — | `face_detections.identity_id` | Story: names | +| Story Agent | `{uuid}.story_story.json` | `chunks` | Search API | +| Embedding Agent | — | `chunks.embedding` | Qdrant sync | +| *All Processors* | — | `processor_results` (job_id, processor, status, started_at, completed_at, error_message) | 生命週期追蹤 | +| *All Processors* | — | `monitor_jobs` (uuid, status, processors, completed_processors) | Pipeline 排程 | + +#### 外部取得(from URL) + +每個唯一的 (URL + Method + 參數組合) 獨立列管。參數或模型不同 → 視為不同資料源。 + +| ID | 來源 | URL | Method | 參數/模型 | Auth | 入庫 | 出庫 | +|----|------|-----|--------|----------|------|------|------| +| **U01** | TMDb | `api.themoviedb.org/3/search/movie` | GET | `query={name}`, `language=en-US` | `TMDB_API_KEY` | `identities.tmdb_id` | Identity Agent | +| **U02** | TMDb | `api.themoviedb.org/3/movie/{id}/credits` | GET | `movie_id` | `TMDB_API_KEY` | `identities.metadata` | Identity Agent | +| **U03** | TMDb | `image.tmdb.org/t/p/w185/{path}` | GET | `profile_path` | — | `identities.tmdb_profile` | Identity Agent | +| **U04** | TMDb | 
`tmdb_embed_extractor.py` (local) | Py | `model=coreml-facenet/v2` | — | `identities.face_embedding (512D)` | Identity Agent | +| **U05** | EmbeddingGemma | `localhost:11436/v1/embeddings` | POST | `input=text`, `model=embeddinggemma-300m` | — | `chunks.embedding (768D)` | Qdrant search | +| **U07** | Ollama | `localhost:11434/api/chat` | POST | `model=qwen3:8b` (future) | — | `chunks.text_content` | Story LLM | +| **U08** | Ollama | `localhost:11434/api/chat` | POST | `model=gemma4` (future) | — | `chunks.text_content` | Story LLM | +| **U09** | Qdrant | `localhost:6333/collections/{name}/points` | PUT | `collection=momentry_dev_rule1` | `QDRANT_API_KEY` | rule1 vectors | Search | +| **U10** | Qdrant | `localhost:6333/collections/{name}/points` | PUT | `collection=momentry_dev_story` | `QDRANT_API_KEY` | story vectors | Search | +| **U11** | Qdrant | `localhost:6333/collections/{name}/points` | PUT | `collection=momentry_dev_llm_summary` | `QDRANT_API_KEY` | llm vectors | Search | + +#### Ollama 模型變體影響 + +| Model | Dim | 用途 | 影響範圍 | +|-------|-----|------|---------| +| `EmbeddingGemma 300M` | 768 | 多語言 embedding | 所有 chunk embedding 需重算 | +| `mxbai-embed-large` | 1024 | 英文為主(已棄用) | 改 dim → Qdrant collection 重建 | +| `qwen3:8b` | — | LLM summarization | Story parent/child summary 文本變更 | +| `qwen3:14b` | — | 同上,品質較好 | 同上 | +| `gemma4:4b` | — | 同上,較輕量 | 同上 | + +#### 參數變更觸發規則 + +| 變更類型 | 觸發 | 範例 | +|---------|------|------| +| 換 model | 所有 downstream stale | `EmbeddingGemma 300M`(768D)取代 `nomic-embed-text-v2-moe`(768D),dim 不變 | +| 同 model 參數變更 | 只影響該層 | Qdrant collection rename | +| API endpoint 變更 | 重試策略 + 通知 | TMDb API v3 → v4 | + +### Rule 做什麼 + +1. 讀取 Pipeline 產出的原始數據 +2. 組織成父子 chunk 結構 +3. 生成 summary text +4. 呼叫 Embedding (EmbeddingGemma 300M, Python MPS, port 11436) +5. 存入 Qdrant + PostgreSQL (vector + BM25) +6. 提供 Search API 查詢 + +### 五階段運作模型 + +每個 Processor/Agent 遵循標準五階段: + +``` +1. 選擇 ──→ 2. 排程 ──→ 3. 執行 ──→ 4. 產出 ──→ 5. 
完成 +(Select) (Schedule) (Execute) (Produce) (Complete) +``` + +| 階段 | 英文 | 責任 | 實例(Face V2.0) | +|------|------|------|-------------------| +| **1. 選擇** | Select | 決定模型/演算法/版本 | `face_detection=apple-vision/v2`, `face_embedding=coreml-facenet/v2` | +| **2. 排程** | Schedule | 依賴檢查、slot allocation、max_concurrent | Face depends on [] → slot available → queued | +| **3. 執行** | Execute | 執行 scripts / agent logic、監控資源、timeout | `python face_processor.py video out.json --sample 60` | +| **4. 產出** | Produce | 寫入輸出 JSON、DB table、觸發 post-processing | face.json → face_detections DB → store_traced_faces.py | +| **5. 完成** | Complete | 標記 DONE、記錄 source_versions、觸發 downstream | `processor_results.status='completed'`, 通知下游 StoryAgent | + +### 每階段記錄 + +```json +{ + "processor": "face", + "run_id": "run_20260505_001", + "select": { + "model_version": "apple-vision+coreml-facenet/v2", + "sample_interval": 60, + "config": {"detector": "Vision", "embedder": "FaceNet"} + }, + "schedule": { + "queued_at": "2026-05-05T00:00:00Z", + "started_at": "2026-05-05T00:00:05Z", + "wait_reason": null + }, + "execute": { + "process_pid": 16490, + "cpu_percent": 30, + "memory_mb": 50, + "duration_sec": 433, + "error": null + }, + "produce": { + "output_file": "face.json", + "output_size_bytes": 115343360, + "frames_processed": 4008, + "embeddings_generated": 6182, + "db_rows_inserted": 6182 + }, + "complete": { + "completed_at": "2026-05-05T00:07:18Z", + "status": "DONE", + "source_versions": { + "face_detection": "apple-vision/v2", + "face_embedding": "coreml-facenet/v2" + } + } +} +``` + +### 以 Face V2.0 為例 + +``` +1. SELECT: 選擇 apple-vision (detection) + coreml-facenet (embedding) + sample_interval=60, max_concurrent=1 + ↓ +2. SCHEDULE: Face 無依賴,slot available → 排入 processor pool + ↓ +3. EXECUTE: swift_face (Vision ANE detection, 26% CPU, ~3min) + face_processor.py (CoreML embedding, 0% CPU, ~4min) + total: 433s, CPU 30%, memory 50MB + ↓ +4. 
PRODUCE: face.json (110MB, 4008 frames, 6182 embeddings) + face_detections DB (6182 rows with trace_id) + face_traced.json → store_traced_faces.py + ↓ +5. COMPLETE: status=DONE, source_versions written + ↓ 觸發下游 + TMDbAgent → IdentityAgent → StoryAgent → EmbeddingAgent +``` + +### State Machine + +``` + ┌─────────┐ + │ PENDING │ ← queued, waiting for dependencies + └────┬────┘ + │ dependencies met, slot available + ┌────▼────┐ + ┌─────│ RUNNING │─────┐ + │ └────┬────┘ │ + │ │ │ + crash/timeout success error + │ │ │ + ┌────▼────┐ ┌───▼───┐ ┌───▼───┐ + │ STALE │ │DONE │ │FAILED │ + └────┬────┘ └───┬───┘ └───┬───┘ + │ │ │ + │ version change │ + │ │ │ + └──────────┼──────────┘ + │ + ┌────▼────┐ + │ Refresh │ → mark downstream stale → re-run + └─────────┘ +``` + +### Processor 實作類別 + +三種實作語言,不同執行方式: + +| 類別 | 執行方式 | 特點 | 範例 | +|------|---------|------|------| +| **Python** | `PythonExecutor.run(script, args)` | 彈性最大,模型多 | ASR, YOLO, Face embedding, Story | +| **Swift** | compiled binary, `subprocess.call(binary)` | ANE 原生,極低 CPU | Face detection, OCR, Pose | +| **Rust** | native tokio task | DB 操作,pipeline 排程 | Rule1/3 ingest, IdentityAgent, TMDbAgent | + +### 各 Processor 類別 + +| Processor | 類別 | 執行內容 | +|-----------|------|---------| +| CUT | **Python** | `cut_processor.py` (PySceneDetect) | +| ASR | **Python** | `asr_processor.py` (faster-whisper) | +| ASRX | **Python** | `asrx_processor_custom.py` (SpeechBrain) | +| OCR | **Swift** | `swift_ocr` → Python wrapper | +| YOLO | **Python** | `yolo_processor.py` (CoreML YOLOv5) | +| Face detection | **Swift** | `swift_face` (Vision ANE) | +| Face embedding | **Python** | `face_processor.py` (CoreML FaceNet) | +| Face trace | **Python** | `store_traced_faces.py` → `face_tracker.py` | +| Pose | **Swift** | `swift_pose` (Vision ANE) | +| Scene | **Python** | `scene_classification.py` | +| VisualChunk | **Rust** | `visual_chunk.rs` (native) | +| Story | **Python** | `parent_chunk_5w1h.py` (template/LLM) | +| Rule1 Ingest | 
**Rust** | `rule1_ingest.rs` (DB + embedding) | +| Rule3 Ingest | **Rust** | `rule3_ingest.rs` + LLM client | +| TMDbAgent | **Rust** | `tmdb/probe.rs` + `tmdb/face_agent.rs` | +| IdentityAgent | **Python** | `experiments/identity_clustering/runner_v2.py` | +| EmbeddingAgent | **Python** | Ollama API call → pgvector write | + +--- + +## Processor/Agent 登錄冊 + +### CUT — Scene Detection + +| 項目 | 內容 | +|------|------| +| **出生登記** | V1.0 / 2026-04 / OpenCode | +| **類別** | Python | +| **簡要說明** | PySceneDetect 偵測視覺場景切換。輸出 scene boundaries。1331 scenes for Charade 113min。 | +| **依賴** | 無 | +| **選型測試** | `scripts/swift_processors/swift_cut_test.swift` (Swift AVFoundation — slower, not adopted) | +| **相關文件** | `docs_v1.0/.../PROCESSORS/CUT_V1.0.0.md` | +| **輸入** | `video_path` (MP4/MOV file) | +| **產出** | `{uuid}.cut.json`: `{scenes: [{scene_number, start_frame, end_frame, start_time, end_time}]}` | + +### ASR — Speech Recognition + +| 項目 | 內容 | +|------|------| +| **出生登記** | V1.0 / 2026-04 / OpenCode | +| **類別** | Python | +| **簡要說明** | faster-whisper small 語音轉文字。1,629 segments, 113 min for Charade。支援 language detection。 | +| **依賴** | 無 | +| **選型測試** | `scripts/swift_processors/asr_swift.swift` (Apple Speech Framework — quality insufficient) | +| **相關文件** | `docs_v1.0/.../PROCESSORS/ASR_V1.0.0.md` | +| **輸入** | `video_path`, `{uuid}.cut.json`, `output_path` | +| **產出** | `{uuid}.asr.json`: `{language, segments: [{start, end, text, language}]}` | + +### ASRX — Speaker Diarization + +| 項目 | 內容 | +|------|------| +| **出生登記** | V1.0 / 2026-04 / OpenCode | +| **類別** | Python | +| **簡要說明** | SpeechBrain ECAPA-TDNN 192D speaker embedding + clustering。10 speakers in Charade。 | +| **依賴** | ASR | +| **選型測試** | `scripts/swift_processors/asrx_swift.swift`, `speaker_test.swift` (Apple Speech Framework — no speaker embedding API) | +| **相關文件** | `docs_v1.0/.../PROCESSORS/ASRX_V1.0.0.md` | +| **輸入** | `video_path`, `{uuid}.asr.json`, `output_path` | +| **產出** | `{uuid}.asrx.json`: 
`{segments: [{start_time, end_time, speaker_id}], embeddings: [192D], speaker_stats}` | + +### OCR — Text Recognition + +| 項目 | 內容 | +|------|------| +| **出生登記** | V1.0 PaddleOCR → V2.0 Swift Vision / 2026-05 | +| **類別** | Swift + Python wrapper | +| **簡要說明** | VNRecognizeTextRequest, 30+ languages, ANE accelerated。 | +| **依賴** | 無 | +| **選型測試** | `scripts/swift_processors/swift_ocr.swift`, `vision_ocr_test.swift` | +| **相關文件** | `docs_v1.0/.../PROCESSORS/OCR_V1.0.0.md` | +| **輸入** | `video_path`, `output_path`, `--sample-interval` | +| **產出** | `{uuid}.ocr.json`: `{frames: [{frame, timestamp, texts: [{text, confidence, bbox}]}]}` | + +### YOLO — Object Detection + +| 項目 | 內容 | +|------|------| +| **出生登記** | V1.0 YOLOv8 → V2.0 CoreML YOLOv5 / 2026-05 | +| **類別** | Python | +| **簡要說明** | CoreML YOLOv5 80-class, ANE accelerated。328K frames for Charade。 | +| **依賴** | 無 | +| **選型測試** | `scripts/swift_processors/vision_object_test.swift` (Vision — no general object API) | +| **相關文件** | `docs_v1.0/.../PROCESSORS/YOLO_V1.0.0.md` | +| **輸入** | `video_path`, `output_path`, `--uuid` | +| **產出** | `{uuid}.yolo.json`: `{frames: {frame_num: {detections: [{class_name, confidence, x1,y1,x2,y2}]}}}` | + +### Face Detection — 人臉偵測 + +| 項目 | 內容 | +|------|------| +| **出生登記** | V1.0 InsightFace → V2.0 Apple Vision / 2026-05 | +| **類別** | Swift | +| **簡要說明** | VNDetectFaceRectanglesRequest + VNDetectFaceLandmarksRequest。ANE, 0% CPU。Bbox + pose + lip landmarks。 | +| **依賴** | 無 | +| **選型測試** | `scripts/swift_processors/face_vision_test.swift`, `face_compare_test.swift` | +| **相關文件** | `docs_v1.0/.../PROCESSORS/FACE_V1.0.0.md` | +| **輸入** | `video_path`, `output_detect.json`, `--sample-interval` | +| **產出** | `{uuid}_detect.json`: `{frames: [{frame, timestamp, faces: [{bbox, confidence, pose, lips}]}]}` | + +### Face Embedding — 人臉向量 + +| 項目 | 內容 | +|------|------| +| **出生登記** | V1.0 InsightFace ArcFace → V2.0 CoreML FaceNet / 2026-05 | +| **類別** | Python | +| **簡要說明** | CoreML FaceNet 
InceptionResnetV1 512D embedding, MIT license, ANE。 | +| **依賴** | Face Detection | +| **選型測試** | `scripts/swift_processors/face_vision_test.swift` (VNFaceprint — private API, unusable) | +| **相關文件** | `docs_v1.0/.../PROCESSORS/FACE_EMBEDDING_FLOW_V1.0.0.md` | +| **輸入** | `video_path`, `{uuid}_detect.json`, `output_path` | +| **產出** | `{uuid}.face.json`: `FaceResult {frame_count, fps, frames: [{frame, timestamp, faces: [{x,y,w,h, embedding(512D), pose_angle, lips}]}]}` | + +### Pose — 姿態估計 + +| 項目 | 內容 | +|------|------| +| **出生登記** | V1.0 MediaPipe → V2.0 Apple Vision / 2026-05 | +| **類別** | Swift + Python wrapper | +| **簡要說明** | VNDetectHumanBodyPoseRequest, 19 joints, ANE。8,190 poses for Charade。 | +| **依賴** | 無 | +| **選型測試** | `scripts/swift_processors/swift_pose.swift`, `pose_benchmark.swift` | +| **相關文件** | `docs_v1.0/.../PROCESSORS/POSE_V1.0.0.md` | +| **輸入** | `video_path`, `output_path`, `--sample-interval` | +| **產出** | `{uuid}.pose.json`: `{frame_count, fps, frames: [{frame, timestamp, persons: [{keypoints, bbox}]}]}` | + +### Face Trace — 人臉追蹤 + +| 項目 | 內容 | +|------|------| +| **出生登記** | V1.0 / 2026-05 / OpenCode | +| **類別** | Python | +| **簡要說明** | IoU + embedding cosine cross-frame tracking → trace_id。存入 DB + pgvector。 | +| **依賴** | Face Detection, Face Embedding | +| **選型測試** | `scripts/utils/face_tracker.py` (553 frames → 2 traces) | +| **相關文件** | `docs_v1.0/.../FACE_TRACKER_GUIDE.md` | +| **輸入** | `{uuid}.face.json`, `--file-uuid` | +| **產出** | `{uuid}.face_traced.json`, DB: `face_detections` (trace_id + bbox + embedding) | + +### Speaker Binding Agent — 語者綁定 + +| 項目 | 內容 | +|------|------| +| **出生登記** | V1.0 / 2026-05 / OpenCode | +| **類別** | Python | +| **簡要說明** | MAR (Mouth Aspect Ratio) lip movement → correlate with speaker segments → identify who is speaking。 | +| **依賴** | ASRX, Face Detection | +| **選型測試** | Inline: solo-frame overlap + MAR open/close event counting | +| **相關文件** | `docs_v1.0/.../DUAL_EMBEDDING_PIPELINE_V1.0.0.md` (Metadata 
信度) | +| **輸入** | `{uuid}.face.json` (lip landmarks), `{uuid}.asrx.json` (speaker segments) | +| **產出** | Speaker→identity mapping + confidence scores。Stored in `identities.metadata`。 | + +### TMDb Agent — 電影資料庫 + +| 項目 | 內容 | +|------|------| +| **出生登記** | V1.0 / 2026-04 / OpenCode | +| **類別** | Rust | +| **簡要說明** | TMDb API search → cast download → identity creation with face embeddings。15 cast members, 9 with embeddings for Charade。 | +| **依賴** | 無 (external API) | +| **選型測試** | N/A (API integration) | +| **相關文件** | `src/core/tmdb/probe.rs`, `src/core/tmdb/face_agent.rs` | +| **輸入** | `TMDB_API_KEY` env var, video filename | +| **產出** | `identities` table (name, tmdb_id, tmdb_profile, face_embedding) | + +### Identity Agent — 身份聚類 + +| 項目 | 內容 | +|------|------| +| **出生登記** | V2.0 / 2026-05 / OpenCode | +| **類別** | Python | +| **簡要說明** | Multi-stage face trace clustering: TMDb bind → video reference enrichment → iterative cosine matching → 99.4% coverage。 | +| **依賴** | Face Trace, TMDb Agent, Speaker Binding | +| **選型測試** | `experiments/identity_clustering/configs/exp_001-008.json` | +| **相關文件** | `experiments/identity_clustering/README.md`, `docs_v1.0/.../FACE_TO_IDENTITY_FLOW.md` | +| **輸入** | `face_detections` DB, `tkg_edges` SPEAKS_AS data | +| **產出** | `labels.json` (trace→identity mapping), DB: `face_detections.identity_id` | + +### Story Agent — 故事摘要 + +| 項目 | 內容 | +|------|------| +| **出生登記** | V2.0 / 2026-05 / OpenCode | +| **類別** | Python | +| **簡要說明** | Template parent-child chunk generation + embedding。300 parents, 1,175 children for Charade。支援 --mode llm (future)。 | +| **依賴** | ASR, ASRX, CUT, YOLO, Identity Agent | +| **選型測試** | `experiments/identity_clustering/runner_v2.py` (identity binding for speaker resolution) | +| **相關文件** | `docs_v1.0/.../DUAL_EMBEDDING_PIPELINE_V1.0.0.md`, `docs_v1.0/.../CHUNK_DEFINITION_V1.0.0.md` | +| **輸入** | `{uuid}.asr.json`, `{uuid}.asrx.json`, `{uuid}.cut.json`, `{uuid}.yolo.json` | +| **產出** | 
`{uuid}.story_story.json`, DB: `chunks` table (parent + child), pgvector (768D) | + +### Embedding Agent — 向量化 + +| 項目 | 內容 | +|------|------| +| **出生登記** | V1.0 / 2026-05 / OpenCode | +| **類別** | Python | +| **簡要說明** | EmbeddingGemma 300M(Python MPS, port 11436)→ 768D vector → pgvector + Qdrant。 | +| **依賴** | Story Agent | +| **選型測試** | N/A (API integration) | +| **相關文件** | `docs_v1.0/.../VECTOR_SPEC_V1.0.0.md` | +| **輸入** | `chunks` table (text_content column) | +| **產出** | `chunks.embedding` (VECTOR 768D) | + +### Scene — 場景分類 + +| 項目 | 內容 | +|------|------| +| **出生登記** | V1.0 / 2026-04 / OpenCode | +| **類別** | Python | +| **簡要說明** | Places365 場景分類模型,為每個 CUT scene 分配場景標籤(室內/室外、街道/辦公室等)。 | +| **依賴** | CUT | +| **選型測試** | — | +| **相關文件** | `docs_v1.0/.../PROCESSORS/SCENE_V1.0.0.md` | +| **輸入** | `video_path`, `{uuid}.cut.json`, `output_path` | +| **產出** | `{uuid}.scene.json`: `{scenes: [{scene_id, labels, confidence}]}` | + +### VisualChunk — 視覺分塊 + +| 項目 | 內容 | +|------|------| +| **出生登記** | V1.0 / 2026-04 / OpenCode | +| **類別** | Rust | +| **簡要說明** | YOLO 物件偵測結果 → 視覺 chunk(相似物件組合)。目前 fixed-frame + similarity-based 兩種策略。 | +| **依賴** | YOLO | +| **選型測試** | `src/core/processor/visual_chunk.rs` (Jaccard similarity) | +| **相關文件** | — | +| **輸入** | `{uuid}.yolo.json`, DB | +| **產出** | `pre_chunks` table (chunk_type='visual') | + +### Rule1 Ingest — 句塊入庫 + +| 項目 | 內容 | +|------|------| +| **出生登記** | V1.0 / 2026-04 / OpenCode | +| **類別** | Rust | +| **簡要說明** | ASR sentence + ASRX speaker → sentence chunks → DB + Qdrant。未實作(Sentence chunks = 0 for Charade)。 | +| **依賴** | ASR, ASRX | +| **選型測試** | `src/core/chunk/rule1_ingest.rs` (time overlap matching) | +| **相關文件** | `docs_v1.0/.../CHUNK_DEFINITION_V1.0.0.md` | +| **輸入** | `{uuid}.asr.json`, `{uuid}.asrx.json`, DB | +| **產出** | `chunks` table (chunk_type='sentence', rule='rule_1'), Qdrant `momentry_dev_rule1` | + +### Rule3 Ingest — 場景塊入庫 + +| 項目 | 內容 | +|------|------| +| **出生登記** | V1.0 / 2026-04 / OpenCode | +| 
**類別** | Rust | +| **簡要說明** | CUT scene + Rule1 children → parent scene chunks + LLM 5W1H summary。 | +| **依賴** | CUT, ASR, Rule1 | +| **選型測試** | `src/core/chunk/rule3_ingest.rs` (LLM generate_5w1h_summary) | +| **相關文件** | `docs_v1.0/.../CHUNK_DEFINITION_V1.0.0.md` | +| **輸入** | `{uuid}.cut.json`, `chunks` table (rule1) | +| **產出** | `chunks` table (chunk_type='cut', rule='rule_3'), Qdrant | + +### Processor 生命週期 + +| State | 觸發 | 動作 | +|-------|------|------| +| **PENDING** | job created, dependencies not met | 等待 processor pool slot | +| **RUNNING** | dependencies met, slot acquired | Python script / agent running | +| **DONE** | script exit 0, output valid | → trigger downstream processors | +| **FAILED** | script exit ≠0, timeout, OOM | → retry or skip (non-essential) | +| **STALE** | upstream processor version changed | → needs refresh | + +### Agent 生命週期 + +Agents(TMDb Agent, Identity Agent, Story Agent)與 Processor 共用生命週期,但有額外特性: + +| Agent | 觸發條件 | 生命週期特點 | +|-------|---------|-------------| +| **TMDbAgent** | Face DONE | 外部 API call,網路 unstable → retry 次數限制 | +| **IdentityAgent** | Face DONE + ASRX DONE | 多階段(Stage 1→2→3),每階段獨立狀態 | +| **StoryAgent** | All processor DONE | 可選 LLM mode(資源允許時),template mode always available | +| **EmbeddingAgent** | Story DONE | 純 CPU task,批次處理,可 resume | + +### 依賴圖 (DAG) + +``` +CUT ──────→ Scene + │ +ASR ──→ ASRX ─┤ + │ +YOLO ─────────┤ + │ +OCR ──────────┤ + │ +Face ──→ Trace ──→ IdentityAgent ─┐ + │ │ + ├───────────────────┤ +Pose ─────────┤ │ + │ │ +VisualChunk ──┘ │ + │ │ + └─── StoryAgent ────┤ + │ │ + EmbeddingAgent │ + │ │ + Qdrant + BM25 │ + │ + └─ TMDbAgent +``` + +### 下游傳播規則 + +當上游版本變更時: + +``` +ASR version change (v1 → v2) + → ASRX: STALE (depends on ASR) + → Rule1 chunks: STALE + → StoryAgent: STALE (depends on ASR + Rule1) + → Embedding: STALE + +Face version change (InsightFace → Vision) + → Trace: STALE + → IdentityAgent: STALE (embeddings changed) + → TKG: STALE (trace_id changed) + → StoryAgent: STALE 
(identity names may change) + → Embedding: STALE + +Template change (Story template v1 → v2) + → Story chunks: STALE + → Embedding: STALE (text changed) +``` + +### Stale Detection Logic + +```python +def check_stale(file_uuid, current_versions): + """Check which processors/agents need refresh""" + chunks = db.query("SELECT source_versions, chunk_type FROM chunks WHERE file_uuid=?", file_uuid) + + stale_agents = set() + for chunk in chunks: + for proc, ver in current_versions.items(): + if chunk.source_versions.get(proc) != ver: + stale_agents.add(proc) + # Propagate downstream + stale_agents.update(get_downstream(proc)) + + return stale_agents +``` + +### Refresh Pipeline + +``` +1. check_stale() → identify affected processors/agents +2. Mark affected chunks as STALE +3. Re-run stale processors/agents in dependency order +4. Re-embed updated chunks +5. Update source_versions in chunk metadata +``` + +### Current Version Registry (Charade) + +| Processor | Model Version | Status | +|-----------|--------------|--------| +| CUT | `pyscenedetect/default` | ✅ | +| ASR | `faster-whisper/small/v1` | ✅ | +| ASRX | `speechbrain/ecapa-tdnn/v1` | ✅ | +| OCR | `apple-vision/v1` | ✅ | +| YOLO | `yolov5-coreml/v2` | ✅ | +| Face detection | `apple-vision/v2` | ✅ | +| Face embedding | `coreml-facenet/v2` | ✅ | +| Pose | `apple-vision/v1` | ✅ | +| Trace | `iou+embedding/v1` | ✅ | +| TMDbAgent | `tmdb-api/v1` | ✅ | +| IdentityAgent | `cosine-threshold/v1` | ✅ | +| StoryAgent | `template/v2.0` | ✅ | +| EmbeddingAgent | `embeddinggemma-300m/v1` | ✅ | + +## Schema 隔離原則 + +`dev` 與 `public` 完全獨立,禁止交叉污染。 + +| 規則 | 說明 | +|------|------| +| 資料隔離 | dev 不讀取 public table,反之亦然 | +| 擴展獨立 | pgvector 兩邊各自安裝 | +| Migration | 標明 target schema(`dev.table` vs `public.table`) | +| Sequence | 各自獨立,不共用 | +| Index | 各自維護 | +| Qdrant | `momentry_dev_*` vs `momentry_*` | +| EmbeddingGemma | embedding server 共用(port 11436,不分 dev/prod) | + +## Version History + +| Version | Date | Purpose | Author | 
+|---------|------|---------|--------| +| V1.0 | 2026-05-05 | Initial design | OpenCode | +| V1.1 | 2026-05-05 | 3-collection Qdrant + metadata confidence + version tracking | OpenCode | +| V1.2 | 2026-05-07 | EmbeddingGemma 300M 取代 nomic-embed-text-v2-moe(768D, Python MPS, port 11436) | OpenCode | +| V2.0 | 2026-05-07 | ⚠️ 標記為 deprecated — Story template pipeline 已由 5W1H+ Agent 取代 | OpenCode | diff --git a/docs_v1.0/API_V1.0.0/INTERNAL/MOMENTRY_CORE_API_V1.0.0.md b/docs_v1.0/API_V1.0.0/INTERNAL/MOMENTRY_CORE_API_V1.0.0.md new file mode 100644 index 0000000..44869d6 --- /dev/null +++ b/docs_v1.0/API_V1.0.0/INTERNAL/MOMENTRY_CORE_API_V1.0.0.md @@ -0,0 +1,241 @@ +--- +document_type: "reference_doc" +service: "MOMENTRY_CORE" +title: "Momentry Core V1.0.0 API 參考文件" +date: "2026-04-30" +version: "V1.0" +status: "superseded" +owner: "Warren" +created_by: "OpenCode" +tags: + - "api" + - "reference" + - "v1.0.0" + - "marcom" + - "restful" + - "endpoint" + - "file-centric" +ai_query_hints: + - "Momentry Core V1.0.0 API 參考文件的主要內容是什麼?" + - "查詢 V1.0.0 API 列表包含哪些端點?" + - "Marcom 團隊如何使用 API Reference?" 
+ - "API 的 Progressive Workflow 範例" + - "Momentry API 的檔案管理與搜尋功能" + - "API 的 Progressive Workflow 操作步驟" + - "API 的檔案管理與搜尋功能" +related_documents: + - "STANDARDS/DOCS_STANDARD.md" + - "DEV_API_V1.0/API_REFERENCE_v1.0.0.md" + - "API_DICTIONARY_V1.0.0.md" + - "API_USAGE_DEMO_V1.0.0.md" + - "PRODUCTION_VERIFICATION_V1.0.0.md" +--- + +# Momentry Core V1.0.0 API 參考文件 + +| 項目 | 內容 | +|------|------| +| 建立者 | OpenCode | +| 建立時間 | 2026-04-30 | +| 文件版本 | V1.0 | + +--- + +## 版本歷史 + +| 版本 | 日期 | 目的 | 操作人 | 工具/模型 | +|------|------|------|--------|-----------| +| V1.0 | 2026-04-30 | 創建 V1.0.0 API 列表,移除過時端點 | OpenCode | OpenCode | +| V1.1 | 2026-05-06 | 被 DEV_API_REFERENCE_v1.0.0.md 取代(實際路由與此文件有大量差異) | OpenCode | OpenCode | + +--- + +## 關鍵術語定義 + +| 術語 | 定義 | +|------|------| +| file_uuid | 媒體檔案(影片/圖片/音訊)的唯一 32 碼 SHA256 識別碼 | +| identity_uuid | 全域人物身份識別碼,跨檔案關聯同一人物 | +| Chunk | 可搜尋單位,由 Rule 組合 pre_chunks 產出 | +| Snapshot | 臉部或場景的快取快照,需 migrate 後供 UI 使用 | +| API Key | 認證方式,透過 Header `X-API-Key` 傳遞 | + +## 概述 + +本文檔定義 Momentry Core **V1.0.0** 版本供 **Marcom 團隊** 使用的 API 列表與開發範例。此列表已移除舊版、冗餘及內部使用的端點,確保前端開發使用的是標準且穩定的介面。 + +--- + +## 🚀 設計原則 (Design Principles) + +### 1. Clear API (介面清晰化) +* **去蕪存菁**: 嚴格區分 **Public** (公開) 與 **Internal** (內部) 端點。舊版冗餘路徑(如 `/api/v1/videos`, `/api/v1/probe`)已全面移除或合併。 +* **標準化回應**: 所有列表型 API 均回傳統一結構 `{ "success": true, "data": [...], "total": N }`。 +* **命名規範**: 採用 RESTful 風格,資源以複數名詞或明確動作命名(如 `files`, `identities`)。 + +### 2. File-Centric (以檔案為核心) +* **唯一識別**: 每個媒體檔案(影片/圖片/音訊)均由 **32 碼 UUID** (`file_uuid`) 唯一標識。 +* **生命週期**: `File` 是所有資料的根節點。所有的 `Chunk` (片段), `Snapshot` (快照), `Jobs` (任務) 皆隸屬於特定的 `File`。 +* **操作模式**: 前端應優先呼叫 `GET /api/v1/files` 取得清單,再透過 `POST /api/v1/files/:uuid/snapshots/migrate` 載入詳細資源。 + +### 3. 
Global Identity (全域身份識別) +* **跨檔案關聯**: `Identity` 代表一個獨立的人物或角色,不受單一檔案限制。 +* **綁定機制 (Binding)**: 透過 `POST /api/v1/identities/bind`,我們可以將多個檔案中偵測到的臉部 (`face`) 或聲音 (`speaker`) 聚合到同一個 `Identity` 下。 +* **資料聚合**: 查詢某個 `Identity` 即可看到該人物在所有歷史檔案中的軌跡 (`/api/v1/identities/:uuid/files`)。 + +--- + +## 當前狀態 + +| 項目 | 狀態 | +|------|------| +| API 版本 | V1.0.0 | +| 開發環境 Port | 3003 | +| 正式環境 Port | 3002 | +| 認證方式 | Header `X-API-Key` | + +--- + +## 1. API Dictionary (端點清單) + +### 1.1 系統與認證 (System & Auth) +| Method | Endpoint | 說明 | +| :--- | :--- | :--- | +| `GET` | `/health` | 基本健康檢查 | +| `POST` | `/api/v1/auth/login` | 登入以取得 API Key | + +### 1.2 檔案管理 (File Management) +*主要入口:瀏覽與管理資產* +| Method | Endpoint | 說明 | +| :--- | :--- | :--- | +| `GET` | `/api/v1/files` | **列出所有檔案** (支援分頁) | +| `GET` | `/api/v1/files/:uuid` | 取得檔案詳情 (包含 probe_json, metadata) | +| `POST` | `/api/v1/files/register` | 從磁碟註冊新檔案 | +| `DELETE`| `/api/v1/videos/:uuid` | **刪除影片** 及其關聯資料 | + +### 1.3 搜尋與檢索 (Search & Retrieval) +| Method | Endpoint | 說明 | +| :--- | :--- | :--- | +| `POST` | `/api/v1/search` | **語意搜尋** (Text-based, 使用 Embedding) | +| `POST` | `/api/v1/search/hybrid` | 混合搜尋 (Vector + BM25 關鍵字) | +| `POST` | `/api/v1/search/visual` | 視覺搜尋 (尋找物件/形狀) | +| `POST` | `/api/v1/search/visual/class`| 依物件類別過濾 (如 "person", "car") | + +### 1.4 身份與人物管理 (Identity Management) +*跨影片的人物/角色關聯* +| Method | Endpoint | 說明 | +| :--- | :--- | :--- | +| `GET` | `/api/v1/identities` | **列出所有身份** (人物/角色) | +| `GET` | `/api/v1/identities/:uuid` | 取得身份詳情 (名稱, 品質, 來源) | +| `GET` | `/api/v1/identities/:uuid/files`| 列出該身份出現的所有檔案 | +| `GET` | `/api/v1/identities/:uuid/chunks`| 列出特定的時間軸片段 (Chunks) | +| `POST` | `/api/v1/identities/bind` | 將臉部/聲音訊號綁定至身份 | + +### 1.5 臉部與快照 (Face & Snapshots) +| Method | Endpoint | 說明 | +| :--- | :--- | :--- | +| `GET` | `/api/v1/face/list` | 列出特定影片中偵測到的所有臉部 | +| `POST` | `/api/v1/face/recognize` | 對指定影片觸發臉部辨識流程 | +| `GET` | `/api/v1/files/:uuid/snapshots` | 檢查快照快取狀態 (Hot/Cold) | +| `POST` | 
`/api/v1/files/:uuid/snapshots/migrate`| **載入快照至記憶體** (UI 顯示快圖前需呼叫) | + +### 1.6 任務與代理人 (Jobs & Agents) +| Method | Endpoint | 說明 | +| :--- | :--- | :--- | +| `GET` | `/api/v1/progress/:uuid` | 檢查即時處理進度 | +| `POST` | `/api/v1/assets/:uuid/process` | 觸發處理流程 (ASR, YOLO, 等) | +| `POST` | `/api/v1/agents/identity/analyze` | AI Agent: 分析身份重複情況 | + +--- + +## 2. Progressive Workflow Examples (操作範例) + +此章節展示典型的使用者操作情境:**尋找影片 → 處理 → 搜尋 → 人物綁定**。 + +### Phase 1: 瀏覽與檢視 +*使用者瀏覽檔案庫以尋找目標影片。* + +**Step 1: 登入** +```bash +curl -s -X POST http://localhost:3003/api/v1/auth/login \ + -H "Content-Type: application/json" \ + -d '{"username": "demo", "password": "demo"}' +# 回應範例: { "api_key": "muser_test_001..." } +``` + +**Step 2: 列出檔案** +```bash +curl -s "http://localhost:3003/api/v1/files?page=1&page_size=5" \ + -H "X-API-Key: muser_test_001" +# 回應範例: { "success": true, "data": [ { "file_uuid": "...", "file_name": "Demo.mp4" ... } ] } +``` + +### Phase 2: 處理與監控 +*使用者決定分析該影片的臉部與語音內容。* + +**Step 3: 觸發處理** +```bash +curl -s -X POST "http://localhost:3003/api/v1/assets/{file_uuid}/process" \ + -H "X-API-Key: muser_test_001" \ + -H "Content-Type: application/json" \ + -d '{}' +# 啟動 ASR, 臉部偵測等處理器 +``` + +**Step 4: 檢查進度** +```bash +curl -s "http://localhost:3003/api/v1/progress/{file_uuid}" \ + -H "X-API-Key: muser_test_001" +# 回應範例: { "overall_progress": 50, "processors": [...] 
} +``` + +### Phase 3: 搜尋內容 +*使用者搜尋影片中的特定內容。* + +**Step 5: 語意搜尋 (文字描述)** +```bash +curl -s -X POST "http://localhost:3003/api/v1/search" \ + -H "X-API-Key: muser_test_001" \ + -H "Content-Type: application/json" \ + -d '{"query": "一個人拿著紅色的信封", "uuid": "{file_uuid}"}' +# 回應範例: 符合文字描述的片段列表 +``` + +### Phase 4: 身份管理 (GUI 開發重點) +*使用者發現了一張臉,確認該人物,並將其綁定到已知身份。* + +**Step 6: 載入快照 (Migrate Snapshots)** +*在 GUI 渲染大量臉部縮圖前,必須先將快取載入記憶體以加速讀取。* +```bash +curl -s -X POST "http://localhost:3003/api/v1/files/{file_uuid}/snapshots/migrate" \ + -H "X-API-Key: muser_test_001" \ + -H "Content-Type: application/json" \ + -d '{"parent_uuid": "{file_uuid}"}' +# 回應範例: { "success": true, "migrated_types": ["faces", ...] } +``` + +**Step 7: 綁定臉部到身份 (Bind Face)** +*假設偵測到臉部 `face_123`,欲綁定至身份 `uuid_identity`。* +```bash +curl -s -X POST "http://localhost:3003/api/v1/identities/bind" \ + -H "X-API-Key: muser_test_001" \ + -H "Content-Type: application/json" \ + -d '{ + "identity_id": null, + "name": "Cary Grant", + "binding_type": "face", + "binding_value": "face_123" + }' +``` + +--- + +## 3. 
棄用聲明 (Deprecation Notices) + +以下端點已在 V1.0.0 移除或棄用,**請勿**在新的開發中使用。 + +* `GET /api/v1/videos` (列表) → 已取代為 `GET /api/v1/files` +* `POST /api/v1/register` → 已取代為 `POST /api/v1/files/register` +* `POST /api/v1/probe` → 已取代為 `GET /api/v1/files/:uuid` +* `GET /api/v1/people/...` → 已合併為 `GET /api/v1/identities/...` +* `/api/v1/n8n/search/...` → 僅供內部 n8n 工作流使用 (請使用標準 `/api/v1/search`) diff --git a/docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/ASRX_V1.0.0.md b/docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/ASRX_V1.0.0.md new file mode 100644 index 0000000..c46c5bf --- /dev/null +++ b/docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/ASRX_V1.0.0.md @@ -0,0 +1,102 @@ +--- +document_type: "spec" +service: "MOMENTRY_CORE" +title: "ASRX Processor V1.0.0" +date: "2026-05-02" +version: "V1.0" +status: "active" +owner: "Warren" +created_by: "OpenCode" +parent: "PROCESSOR_SELECTION_V1.0.0.md" +tags: + - "momentry" + - "core" + - "processor" + - "asrx" + - "speaker-diarization" + - "speechbrain" + - "v1.0.0" +ai_query_hints: + - "ASRX 使用 SpeechBrain ECAPA-TDNN 進行說話者日誌化" + - "ASRX 從 Pyannote 遷移至自定義 SpeechBrain,快 6 倍" + - "ASRX 不需要 HuggingFace token(相較 Pyannote)" + - "ASRX Charade 6879s 長片輸出 1118 segments, 8 說話人" + - "ASRX 依賴 ASR processor 的轉錄結果" +related_documents: + - "PROCESSOR_SELECTION_V1.0.0.md" + - "../ASR_V1.0.0.md" + - "../CUT_V1.0.0.md" + - "../VOICE_EMBEDDING_FLOW_V1.0.0.md" + - "../VECTOR_SPEC_V1.0.0.md" +--- + +# ASRX Processor V1.0.0 + +| 項目 | 內容 | +|------|------| +| 建立者 | OpenCode | +| 建立時間 | 2026-05-02 | +| 文件版本 | V1.0 | + +**狀態**: ⚠️ 80% | **模型**: SpeechBrain ECAPA-TDNN | **GPU**: 否 + +## 關鍵術語定義 + +| 術語 | 定義 | +|------|------| +| ASRX | 進階語音處理,包含說話者日誌化(Speaker Diarization) | +| Speaker Diarization | 說話者日誌化,區分「誰在什麼時候說話」 | +| ECAPA-TDNN | SpeechBrain 提供的說話人辨識模型,產出 192-D embedding | +| VAD | Voice Activity Detection,語音活動檢測(使用 Silero) | +| Spectral Clustering | 頻譜聚類,將 embedding 分群以區分不同說話人 | + +--- + +## 選型過程 + +| 指標 | Pyannote-based(原始) | Custom SpeechBrain(新) | 
+|------|----------------------|------------------------| +| Pipeline | VAD → Whisper → Align → Diarize | VAD (Silero) → ECAPA-TDNN → Spectral Clustering | +| Processing time | 4.79s (empty output) | **1.66s** (96.25x) | +| Speed vs. Pyannote | Baseline | **6x faster** | +| HuggingFace token | ✅ **Required** | ❌ **Not required** | +| Overlapping speech | ✅ Supported | ❌ Not supported | + +**Decision**: Because pyannote.audio requires a HuggingFace token, frequently hit import errors, and produced empty output, it was replaced with a custom SpeechBrain implementation. + +--- + +## Processing Time Breakdown (Custom SpeechBrain) + +| Step | Time | Share | +|------|------|------| +| VAD (Silero) | 0.41s | 24.7% | +| Speaker embedding (ECAPA-TDNN) | 1.15s | 69.3% | +| Spectral clustering | 0.10s | 6.0% | + +--- + +## Charade Feature Film (6879s) + +| Metric | Value | +|------|-----| +| Segments | 1118 | +| Speakers | 8 | +| Match rate | 99.82% | + +--- + +## Version History + +| Version | Date | Purpose | Operator | Tool/Model | +|------|------|------|--------|-----------| +| V1.0 | 2026-05-02 | Initial version | OpenCode | deepseek-chat | + +## Resource Estimates + +| Resource | Value | +|------|-----| +| CPU | 0.8 | +| Memory | 2048 MB | +| GPU | Not used | +| Dependencies | ASR | diff --git a/docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/ASR_V1.0.0.md b/docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/ASR_V1.0.0.md new file mode 100644 index 0000000..34d88e6 --- /dev/null +++ b/docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/ASR_V1.0.0.md @@ -0,0 +1,243 @@ +--- +document_type: "spec" +service: "MOMENTRY_CORE" +title: "ASR Processor V1.0.0" +date: "2026-05-02" +version: "V1.0" +status: "active" +owner: "Warren" +created_by: "OpenCode" +parent: "PROCESSOR_SELECTION_V1.0.0.md" +tags: + - "momentry" + - "core" + - "processor" + - "asr" + - "whisper" + - "speech-recognition" + - "v1.0.0" +ai_query_hints: + - "ASR uses the faster-whisper/small model with INT8 CPU quantization" + - "ASR processes long videos in segments based on CUT scene boundaries" + - "Each ASR segment records a scene_number matching the CUT scene index" + - "ASR processes a 159.6s video in about 12.68s, a 12.6x real-time factor" + - "ASR depends on the CUT processor's scene boundary output" +related_documents: + - "PROCESSOR_SELECTION_V1.0.0.md" + - "../CUT_V1.0.0.md" + - "../ASRX_V1.0.0.md" + - "../STORY_V1.0.0.md" + - "../CHUNK_DEFINITION_V1.0.0.md" +--- + +# ASR Processor V1.0.0 + +| Item 
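| Content | +|------|------| +| Created by | OpenCode | +| Created on | 2026-05-02 | +| Document version | V1.0 | + +**Status**: ✅ 100% | **Model**: faster-whisper/small | **GPU**: No + +## Key Term Definitions + +| Term | Definition | +|------|------| +| ASR | Automatic Speech Recognition | +| faster-whisper | An optimized implementation of OpenAI Whisper that supports INT8 CPU quantization | +| segment | A speech segment output by Whisper, containing start/end times and text | +| scene_number | CUT scene index (1-based) identifying the scene a segment belongs to | +| real-time factor | Real-time speed factor: the ratio between video duration and processing time (159.6s / 12.68s ≈ 12.6x) | + +--- + +## Selection Process + +| Model | Parameters | Size | English WER | Chinese CER | Speed | +|------|------|------|-------------|-------------|------| +| tiny | 39M | ~40MB | 9.5% | 15.0% | ~1x RT | +| base | 74M | ~75MB | 7.3% | 11.2% | ~1.5x RT | +| **small** | **244M** | **~250MB** | **5.5%** | **8.4%** | **~2x RT** | +| medium | 769M | ~800MB | 4.3% | 6.4% | ~3x RT | +| large-v3 | 1.5B | ~1.5GB | 3.5% | 4.9% | ~5x RT | + +**Decision**: small offers the best balance between accuracy and speed; experiments showed that at least small is needed to handle multilingual content and Taiwanese-accented Mandarin reasonably well. + +--- + +## Measured Performance (ExaSAN 159.6s video) + +| Metric | Value | +|------|-----| +| Processing time | 12.68s | +| Real-time factor | 12.6x | +| Output | 78~79 segments, ~15KB | + +--- + +## Segmented Processing for Long Videos + +For long videos (e.g. Charade, 6879s), ASR processes the audio in segments based on the scene boundaries produced by the CUT processor: + +1. CUT first produces `{file_uuid}.cut.json` (containing `scenes[]`, each with `start_time`/`end_time`) +2. ASR reads the CUT JSON and extracts the audio of each scene in `scene_number` order +3. Each scene is transcribed separately with Whisper +4. 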
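The results are merged, and each segment records the `scene_number` it belongs to + +JSON format of each segment: +```json +{ + "start": 12.5, + "end": 15.3, + "text": "Hello world", + "scene_number": 42 +} +``` + +`scene_number` is the CUT scene index (1-based) within that `file_uuid`. + +--- + +## Version History + +| Version | Date | Purpose | Operator | Tool/Model | +|------|------|------|--------|-----------| +| V1.0 | 2026-05-02 | Initial version | OpenCode | deepseek-chat | + +--- + +## Resource Estimates + +| Resource | Value | +|------|-----| +| CPU | 1.0 (one full core) | +| Memory | 2048 MB (actual usage is lower for long videos because of segmented processing) | +| GPU | Not used (INT8 CPU quantization) | +| Dependencies | None | + +--- + +## Swift ASR (Apple Speech Framework) Experiment Notes + +### Selection Conclusion + +Keep the current approach (faster-whisper small); Swift ASR does not replace Whisper. + +> **Note**: Apple Speech Framework improves with macOS / Siri releases. On every major macOS release (e.g. macOS 15→16), re-run `scripts/compare_segmentation.py` to compare Swift vs Whisper quality and assess whether a switch is viable. + +### POC Status + +The Swift processor lives in `scripts/swift_processors/` and has been compiled. Apple Speech Framework has the advantage in memory (11MB vs 1.1GB) and speed (4.19s vs 17.46s), but its accuracy is insufficient. + +### Performance Comparison (Charade 60s clip) + +| Metric | Swift (Speech Framework) | Python (faster-whisper small) | +|------|------------------------|-------------------------------| +| **RTF** | 0.07 (14x) | 0.29 (3.4x) | +| **Memory** | 11MB | 1.1GB | +| **Segments** | 18 (sentence-level) | 23 (sentence-level) | +| **Quality** | More dropped or misheard words ("Let's see"→"And see") | Accurate | +| **Source-separation gain** | Demucs adds 35s for only a minor improvement | Not needed | + +### Known Issues + +1. Automatic language detection tries languages in the wrong order (zh-TW first); `--language en-US` must be specified +2. RunLoop timeout has been fixed (now waits for the callback with a semaphore) +3. 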
Word-by-word output has been merged (94 → 18 segments) + +### Related Files + +``` +scripts/swift_processors/ +├── Package.swift +├── asr_swift.swift +├── asrx_swift.swift +├── entitlements.plist +└── .build/debug/asr_swift +``` + +--- + +## Speaker Diarization (ASRX) Selection Notes + +### Current Approach: Python ASRX (ECAPA-TDNN + Spectral Clustering) + +SpeechBrain ECAPA-TDNN extracts 192-D speaker embeddings, which are then grouped by spectral clustering to separate speakers. + +| Metric | Value | +|------|-----| +| Embedding dimensions | 192-D | +| Speakers detected in Charade | 10 (correctly separates narrator, leads, and supporting roles) | +| Total ASRX pre_chunks | 5,848 | +| Qdrant collection | `{prefix}_voice` | +| Dependencies | Must run after ASR completes (time alignment) | +| Output | segments containing `speaker_id`, `start_time`, `end_time` | + +### Swift SFSpeechAnalyzer Evaluation + +**Goal**: Replace Python ASRX with Apple's built-in Speech Framework (ANE accelerated). + +| API | Availability on macOS 14 | Notes | +|-----|----------------|------| +| `SFSpeechRecognizer` | ✅ | Speech recognition | +| `SFSpeechAnalyzer` | ✅ Exists | Speech analysis, but does not expose speaker embeddings | +| `SFSpeechRecognitionMetadata` | ✅ Exists | Recognition metadata, but speaker information is empty | +| `SFSpeakerEmbedding` | ❌ | Speaker embedding API does not exist | +| `SFSpeakerIdentification` | ❌ | Speaker identification API does not exist | +| Speaker metadata via KVC | ❌ | Speaker information cannot be obtained through KVC either | + +**Conclusion: not feasible at present.** Apple has not opened a Speaker Recognition API to developers on macOS 14. + +### Selection Conclusion + +Keep the Python ASRX (ECAPA-TDNN) approach. Re-evaluate once a future macOS release opens a Speaker Recognition API. + +--- + +## Version History + +| Version | Date | Purpose | Operator | Tool/Model | +|------|------|------|--------|-----------| +| V1.0 | 2026-05-02 | Initial version | OpenCode | deepseek-chat | +| V1.1 | 2026-05-04 | Added Swift ASR experiment notes and Speaker Diarization selection notes | OpenCode | deepseek-chat | +| V1.2 | 2026-05-04 | Added Text Embedding ANE acceleration feasibility study | OpenCode | deepseek-chat | + +--- + +## Text Embedding ANE Acceleration Study + +### Background + +The sentence chunks produced by ASR need embeddings (for semantic search / RAG). +The current solution is Ollama `nomic-embed-text-v2-moe` (768-D, multilingual, MIT license, CPU/GPU). + +### Study Goal + +Assess whether an Apple ANE solution can replace Ollama embedding and reduce CPU load. + +### Options Evaluated + +| Option | Model | Dimension | Multilingual | ANE | Status | +|------|------|-----------|--------|-----|------| 
+| **Apple NLEmbedding (sentence)** | Built into the system | Unknown | ✅ Claimed | ✅ Native ANE | ❌ No model file on macOS 26.4.1 | +| **Apple NLEmbedding (word)** | GloVe | 300D | ❌ English only | ✅ | ❌ Too few dimensions, no multilingual support | +| **Apple NLContextualEmbedding** | Transformer | Unknown | Unknown | ✅ | ❌ API unavailable | +| **CoreML custom (MiniLM)** | BERT-based | 384D | ✅ 50+ languages | ✅ | ❌ torch.jit.trace failed | +| **Ollama nomic-embed-text** | nomic-ai | 768D | ✅ Multilingual | ❌ | ✅ Current solution | + +### Test Conclusions (2026-05-04) + +1. **NLEmbedding default**: dim=0, every vector returns nil. macOS 26.4.1 does not ship with a sentence embedding model. +2. **NLEmbedding word (GloVe)**: dim=300, English only. French/Chinese give dim=0 (not supported). +3. **NLContextualEmbedding**: API compile error; the method does not exist in the public headers. +4. **CoreML self-converted MiniLM**: on the BERT architecture, `torch.jit.trace` raises `Placeholder storage not allocated on MPS`, and the `dictconstruct` op is unsupported. +5. **Ollama nomic-embed**: throughput ~6M embeddings/sec, 768D multilingual, already integrated and stable. + +### Recommendation + +Keep Ollama `nomic-embed-text-v2-moe`. +Re-evaluate ANE text embedding once one of the following matures: +- Apple makes multilingual NLEmbedding sentence models available for download +- or coremltools supports the BERT `dictconstruct` op +- or Apple releases a pretrained CoreML multilingual embedding model diff --git a/docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/CAPTION_V1.0.0.md b/docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/CAPTION_V1.0.0.md new file mode 100644 index 0000000..d2f51a1 --- /dev/null +++ b/docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/CAPTION_V1.0.0.md @@ -0,0 +1,80 @@ +--- +document_type: "spec" +service: "MOMENTRY_CORE" +title: "Caption Processor V1.0.0" +date: "2026-05-02" +version: "V1.0" +status: "active" +owner: "Warren" +created_by: "OpenCode" +parent: "PROCESSOR_SELECTION_V1.0.0.md" +tags: + - "momentry" + - "core" + - "processor" + - "caption" + - "moondream2" + - "image-captioning" + - "v1.0.0" +ai_query_hints: + - "Caption uses Moondream2 for local image caption generation" + - "Caption was moved from the GPT-4o cloud API to local Moondream2" + - "The Caption Moondream2 model is about 1.8GB and runs fully locally" + - "Caption processes about 5s/frame" + - "The Caption fallback concatenates YOLO + OCR + Scene results" +related_documents: + - "PROCESSOR_SELECTION_V1.0.0.md" + - "../SCENE_V1.0.0.md" + - 
"../STORY_V1.0.0.md" + - "../YOLO_V1.0.0.md" + - "../OCR_V1.0.0.md" +--- + +# Caption Processor V1.0.0 + +| 項目 | 內容 | +|------|------| +| 建立者 | OpenCode | +| 建立時間 | 2026-05-02 | +| 文件版本 | V1.0 | + +**狀態**: ✅ 100% | **模型**: Moondream2 | **GPU**: 否 + +## 關鍵術語定義 + +| 術語 | 定義 | +|------|------| +| Caption | 圖像描述生成,為每個場景產出文字敘述 | +| Moondream2 | HuggingFace transformers 提供的本地圖像描述模型 | +| GPT-4o | (已移除)先前使用的雲端 API 方案 | +| local deployment | 完全本地執行,不依賴任何雲端 API | +| fallback | 備援方案:YOLO + OCR + Scene 結果串接 | + +--- + +## 選型過程 + +| 指標 | GPT-4o(已移除) | Moondream2(新) | +|------|-----------------|-----------------| +| 速度 | 2s/frame | 5s/frame | +| 品質 | 高 | 良好 | +| 依賴 | ✅ 雲端 API Key | ❌ 完全本地 | + +**決策**: 已從 GPT-4o 雲端 API 本地化為 Moondream2(HuggingFace transformers, ~1.8GB)。備援方案為 YOLO + OCR + Scene 結果串接。 + +--- + +## 版本歷史 + +| 版本 | 日期 | 目的 | 操作人 | 工具/模型 | +|------|------|------|--------|-----------| +| V1.0 | 2026-05-02 | 初始版本 | OpenCode | deepseek-chat | + +## 資源預估 + +| 資源 | 值 | +|------|-----| +| CPU | - | +| 記憶體 | ~1.8 GB(模型載入後) | +| GPU | 不使用 | +| 依賴 | Scene | diff --git a/docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/CUT_V1.0.0.md b/docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/CUT_V1.0.0.md new file mode 100644 index 0000000..5e1cc2e --- /dev/null +++ b/docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/CUT_V1.0.0.md @@ -0,0 +1,179 @@ +--- +document_type: "spec" +service: "MOMENTRY_CORE" +title: "CUT Processor (Scene Cut Detection) V1.0.0" +date: "2026-05-03" +version: "V1.0" +status: "active" +owner: "Warren" +created_by: "OpenCode" +parent: "PROCESSOR_SELECTION_V1.0.0.md" +tags: + - "momentry" + - "core" + - "processor" + - "cut" + - "scene-detection" + - "pyscenedetect" + - "v1.0.0" +ai_query_hints: + - "CUT 場景檢測的輸出結構與檔案後綴規則" + - "CUT 的 cut_count 與 cut_max_duration 用途" + - "長影片動態調度如何將 Face 移到 ASR 前" + - "CUT 與 Scene 的執行階段(register 同步)" + - "CUT 輸出 JSON 結構(start_time/end_time)" +related_documents: + - "PROCESSORS/SCENE_V1.0.0.md" + - "PROCESSOR_SELECTION_V1.0.0.md" + - "PROCESSORS/ASR_V1.0.0.md" + 
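- "PROCESSORS/FACE_V1.0.0.md" + - "CHUNK_DEFINITION_V1.0.0.md" +--- + +# CUT Processor (Scene Cut Detection) V1.0.0 + +| Item | Content | +|------|------| +| Created by | OpenCode | +| Created on | 2026-05-03 | +| Document version | V1.0 | + +**Status**: ✅ 100% | **Model**: PySceneDetect (ContentDetector) | **GPU**: No + +## Key Term Definitions + +| Term | Definition | +|------|------| +| CUT | Scene cut detection, using PySceneDetect ContentDetector | +| scene boundary | Scene boundary, defined by start_time/end_time | +| cut_count | Number of scenes, written to the DB during the register stage | +| cut_max_duration | Duration of the longest scene in seconds, used for dynamic scheduling of long videos | +| ContentDetector | Scene cut detection algorithm based on frame differences | + +--- + +## Selection Process + +No ML model is involved; scene cuts are detected from frame differences. The threshold value threshold=27 was the best value found experimentally. + +--- + +## Output Structure + +CUT produces `{file_uuid}.cut.json` with the following structure: + +```json +{ + "scenes": [ + { "start_time": 0.0, "end_time": 120.5 }, + { "start_time": 120.5, "end_time": 245.0 } + ] +} +``` + +--- + +## Execution Stage + +CUT runs **synchronously during the register stage** (`register_single_file`) and is not scheduled through the worker pipeline. On completion it writes these DB columns: +- `cut_done: bool`: whether CUT has finished +- `cut_count: i32`: number of scenes +- `cut_max_duration: f64`: duration of the longest scene in seconds + +--- + +## Status Suffixes + +| Suffix | Meaning | Behavior | +|------|------|------| +| `.cut.json` | Done | Load and use directly | +| `.cut.json.tmp` | In progress | Skip and wait | +| `.cut.json.err` | Failed | Skip, no retry | + +--- + +## Dynamic Scheduling for Long Videos + +When `cut_count ≤ 3 && cut_max_duration > 600s` (e.g. long takes in meeting recordings), the Worker automatically adjusts the pipeline order: +- **Face moves ahead of ASR**, using face detection first to find where people enter and leave +- The face distribution can then be used to split long scenes and help ASR segmentation + +--- + +## Measured Performance + +**ExaSAN 159.6s video**: +| Metric | Value | +|------|-----| +| Processing time | 0.08s | +| Real-time factor | 2036.5x (the fastest processor) | +| Output | 52 bytes | + +**Charade feature film (6879s, 412343 frames)**: +| Metric | Value | +|------|-----| +| Scenes | 1331 | +| Output | 217 KB | + +--- + +## Version History + +| Version | Date | Purpose | Operator | Tool/Model | +|------|------|------|--------|-----------| +| V1.0 | 2026-05-03 | Initial version | OpenCode | deepseek-chat | + +--- + +## Resource Estimates + +| Resource | Value | +|------|-----| +| CPU | 0.5 | +| Memory | 512 MB | +| GPU | Not used | + +--- + +## Swift AVFoundation Replacement Evaluation + +### POC Goal + +Replace Python PySceneDetect (ContentDetector) with frame-by-frame histogram analysis in AVFoundation, aiming to take advantage of ANE acceleration. + +### Test Results (Charade 60s 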
clip, 3597 frames, 59.9fps) + +| Metric | Python PySceneDetect | Swift AVFoundation (luminance histogram) | +|------|---------------------|------------------------------------------| +| **Scenes detected** | **3** ✅ reasonable | **63** ❌ oversensitive | +| **Processing time** | **7.93s** | 15.42s | +| **RTF** | **0.132** (7.6x) | 0.257 (3.9x) | +| **Memory** | ~512MB | Very low (system framework) | +| **Algorithm** | ContentDetector (adaptive threshold + frame normalization) | Plain histogram diff (64-bin luminance) | + +### Problem Analysis + +1. **Accuracy**: 63 vs 3 scenes. A simple luminance histogram diff is oversensitive to camera movement and lighting changes. PySceneDetect's ContentDetector uses an adaptive threshold plus frame normalization and is much more stable. +2. **Speed**: 15.42s vs 7.93s. AVAssetReader has to decode every frame sequentially and cannot skip frames as efficiently as ffmpeg. + +### Selection Conclusion + +| Item | Approach | +|------|------| +| **Scene Cut Detection** | Python PySceneDetect **kept as is** | + +### Related Files + +``` +scripts/swift_processors/swift_cut_test.swift +``` + +--- + +## Version History + +| Version | Date | Purpose | Operator | Tool/Model | +|------|------|------|--------|-----------| +| V1.0 | 2026-05-03 | Initial version | OpenCode | deepseek-chat | +| V1.1 | 2026-05-04 | Added Swift AVFoundation replacement evaluation notes | OpenCode | deepseek-chat | +| Dependencies | None | diff --git a/docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/FACE_EMBEDDING_FLOW_V1.0.0.md b/docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/FACE_EMBEDDING_FLOW_V1.0.0.md new file mode 100644 index 0000000..dfe1374 --- /dev/null +++ b/docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/FACE_EMBEDDING_FLOW_V1.0.0.md @@ -0,0 +1,159 @@ +--- +document_type: "spec" +service: "MOMENTRY_CORE" +title: "Face Embedding Generation Flow V2.0.0" +date: "2026-05-04" +version: "V2.0" +status: "active" +owner: "Warren" +created_by: "OpenCode" +tags: + - "momentry" + - "core" + - "face" + - "embedding" + - "pgvector" + - "qdrant" + - "v2.0.0" +ai_query_hints: + - "Full Face Embedding processing flow (Vision detection → CoreML FaceNet → pgvector + Qdrant)" + - "V2.0 uses Apple Vision Framework in place of InsightFace detection" + - "V2.0 uses CoreML FaceNet (MIT) to produce 512-D embeddings" + - "Face 
processor output structure and embedding field descriptions" + - "Payload structure and point ID rules of the Qdrant face collection" + - "Face embeddings are compared with Cosine distance" + - "Face detection runs on the ANE (Apple Vision Framework); embedding runs on the ANE (CoreML FaceNet)" + - "How the face_detections table is synchronized with Qdrant" +related_documents: + - "../VECTOR_SPEC_V1.0.0.md" + - "../PROCESSORS/FACE_V1.0.0.md" + - "../PROCESSOR_SELECTION_V1.0.0.md" + - "../CHUNK_DEFINITION_V1.0.0.md" + - "../MOMENTRY_CORE_API_V1.0.0.md" +--- + +# Face Embedding Generation Flow V2.0.0 + +| Item | Content | +|------|------| +| Created by | OpenCode | +| Created on | 2026-05-04 | +| Document version | V2.0 | + +## V2.0 Change Summary + +| Item | V1.x | V2.0 | +|------|------|------| +| **Detection** | InsightFace SCRFD-10G (CPU, 450%) | **Apple Vision VNDetectFaceRectangles** (ANE, ~0%) | +| **Pose** | InsightFace 2D landmarks → angle | **Apple Vision VNDetectFaceLandmarks** (roll/yaw/pitch) | +| **Embedding** | CoreML FaceNet 512-D (ANE) | Same as V1.x, MIT license | +| **CPU usage** | 450%+ | **~0%** | +| **Script** | `face_processor.py` | **`face_processor_vision.py` + `swift_face`** | + +## Processing Flow + +``` +1. swift_face (Vision/ANE) + ├── AVAssetReader reads frame by frame + ├── VNDetectFaceRectanglesRequest → bbox (x, y, w, h) + confidence + ├── VNDetectFaceLandmarksRequest → roll, yaw, pitch + └── Output: {uuid}_detect.json + +2. face_processor_vision.py + ├── Reads detect.json + ├── cv2 crops each face by bbox, frame by frame + ├── CoreML FaceNet → 512-D embedding (ANE) + ├── classify_pose(roll, yaw) → frontal/three_quarter/profile + └── Output: {uuid}.face.json (FaceResult format) + +3. Rust pipeline (job_worker.rs) + ├── Reads face.json → FaceResult struct + ├── store_face_chunks() → pre_chunks table + └── store_face_embeddings_to_qdrant() → Qdrant + +4. 
Post-Face (job_worker.rs) + ├── store_traced_faces.py + │ ├── face_tracker.py (IoU + embedding) → trace_id + │ └── INSERT face_detections (trace_id + bbox + embedding pgvector) + ├── sync_face_embeddings() → Qdrant face points + └── cluster_face_embeddings() / search_similar_faces() → pgvector query +``` + +## Output Structure + +### face.json (FaceResult) + +```json +{ + "frame_count": 6872, + "fps": 59.94, + "frames": [ + { + "frame": 30, + "timestamp": 0.5, + "faces": [ + { + "x": 917, "y": 125, "width": 181, "height": 250, + "confidence": 0.88, + "embedding": [0.01, -0.04, 0.12, ...], // 512-D + "pose_angle": {"angle": "frontal", "roll": 2.5, "yaw": -5.0, "pitch": 1.2}, + "landmarks": null, + "attributes": null + } + ] + } + ] +} +``` + +### face_detections (PostgreSQL + pgvector) + +| Column | Type | Description | +|------|------|------| +| `file_uuid` | VARCHAR | Source video | +| `frame_number` | BIGINT | Frame number | +| `trace_id` | INTEGER | Cross-frame tracking ID (assigned by face_tracker) | +| `bbox` | JSONB | `{"x", "y", "width", "height"}` | +| `confidence` | DOUBLE | Detection confidence | +| `embedding` | VECTOR(512) | pgvector index (ivfflat, cosine) | +| `identity_id` | BIGINT | Bound identity (may be NULL) | + +### Qdrant Payload (momentry_dev/dev collection) + +```json +{ + "file_uuid": "1a04db97...", + "trace_id": 0, + "frame_number": 825, + "type": "face_embedding" +} +``` + +## Vector Specification + +| Property | Value | +|------|-----| +| Model | CoreML FaceNet (InceptionResnetV1, VGGFace2) | +| License | MIT | +| Dimensions | 512 | +| Distance | Cosine | +| Index | pgvector ivfflat (lists=100) | +| Qdrant | Cosine distance, shared collection | + +## Source Processor Resource Estimates + +| Resource | V1.x (InsightFace) | V2.0 (Vision + FaceNet) | +|------|--------------------|-------------------------| +| Detection model | InsightFace SCRFD-10G (~150MB) | Apple Vision (built into the system) | +| Embedding model | CoreML FaceNet (90MB) | Same as V1.x | +| CPU | 450%+ | **~0%** | +| Memory | ~1.5GB | **<50MB** | +| ANE | Embedding only | **detection + embedding** | +| Total time (2hr film, interval=30) | ~1.3hr | **~40min** 
| + +## Version History + +| Version | Date | Purpose | Operator | Tool/Model | +|------|------|------|--------|-----------| +| V1.0 | 2026-05-02 | Initial version (InsightFace) | OpenCode | deepseek-chat | +| V2.0 | 2026-05-04 | Apple Vision detection + CoreML FaceNet embedding | OpenCode | deepseek-chat | diff --git a/docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/FACE_V1.0.0.md b/docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/FACE_V1.0.0.md new file mode 100644 index 0000000..66e1581 --- /dev/null +++ b/docs_v1.0/API_V1.0.0/INTERNAL/PROCESSORS/FACE_V1.0.0.md @@ -0,0 +1,373 @@ +--- +document_type: "spec" +service: "MOMENTRY_CORE" +title: "Face Processor V1.0.0" +date: "2026-05-02" +version: "V1.0" +status: "active" +owner: "Warren" +created_by: "OpenCode" +parent: "PROCESSOR_SELECTION_V1.0.0.md" +tags: + - "momentry" + - "core" + - "processor" + - "face" + - "insightface" + - "face-detection" + - "v1.0.0" +ai_query_hints: + - "Face uses InsightFace buffalo_l for face detection and recognition" + - "Face takes only 1.22s on the ExaSAN 159.6s video, a 130.5x real-time factor" + - "Face supports GPU acceleration; CoreML reaches 50~80 FPS" + - "Face outputs 512-D embeddings for matching" + - "Face no longer uses the Haar Cascade fallback; InsightFace is mandatory" +related_documents: + - "PROCESSOR_SELECTION_V1.0.0.md" + - "../FACE_EMBEDDING_FLOW_V1.0.0.md" + - "../CUT_V1.0.0.md" + - "../VECTOR_SPEC_V1.0.0.md" + - "../CHUNK_DEFINITION_V1.0.0.md" +--- + +# Face Processor V1.0.0 + +| Item | Content | +|------|------| +| Created by | OpenCode | +| Created on | 2026-05-02 | +| Document version | V1.0 | + +**Status**: ✅ 100% | **Model**: InsightFace buffalo_l | **GPU**: Yes + +## Key Term Definitions + +| Term | Definition | +|------|------| +| Face Detection | Detecting faces, using InsightFace SCRFD-10G | +| Face Recognition | Recognizing faces, using ArcFace w600k_r50 to produce 512-D embeddings | +| embedding | Vector embedding used for face matching and search | +| CoreML | GPU acceleration option on Apple Silicon | +| LFW | Labeled Faces in the Wild, a face recognition benchmark dataset | + +--- + +## Selection Process + +| Model | Type | Size | Detection rate | Recognition rate | Embedding | +|------|------|------|--------|--------|-----------| +| **InsightFace Buffalo_l** | **Full suite** | **~150MB** | **97.3% mAP** | **99.77% (LFW)** | **512-D ✅** | +| MediaPipe 
BlazeFace | Lightweight detection | 1~2MB | 95.2% mAP | None | ❌ | +| OpenCV Haar Cascade | Classical ML | 900KB | 70~85% | None | ❌ | + +**Key decision**: The old Haar Cascade fallback caused the whole downstream chain to fail (0 embeddings), so InsightFace is now mandatory. + +--- + +## Measured Performance (ExaSAN 159.6s video) + +| Metric | Value | +|------|-----| +| Processing time | 1.22s | +| Real-time factor | 130.5x | +| Output | 49 frames, 67 faces | + +--- + +## GPU Acceleration + +| Platform | FPS | +|------|-----| +| CoreML (Apple Silicon) | 50~80 FPS | +| CUDA (NVIDIA) | 80~120 FPS | +| CPU | 15~20 FPS | + +--- + +## Version History + +| Version | Date | Purpose | Operator | Tool/Model | +|------|------|------|--------|-----------| +| V1.0 | 2026-05-02 | Initial version | OpenCode | deepseek-chat | + +--- + +## Resource Estimates + +| Resource | Value | +|------|-----| +| CPU | 0.6 | +| Memory | 1536 MB | +| GPU | Supported (`uses_gpu = true`) | +| Dependencies | None | + +--- + +## Apple Vision Framework Experiment Notes + +### POC Goal + +Assess whether Apple Vision Framework can replace InsightFace (buffalo_l) for face processing, with the aim of using ANE acceleration to reduce memory usage. + +### Test Results + +Test environment: macOS 14, Apple Silicon M4, using `VNDetectFaceRectanglesRequest` + `VNDetectFaceLandmarksRequest` + `VNDetectFaceCaptureQualityRequest`. + +| Feature | Vision Framework | InsightFace (buffalo_l) | +|------|----------------|------------------------| +| **Face Detection** | ✅ Passed (1 face, conf=0.88) | ✅ | +| **Face Landmarks** | ✅ 6+6 eye pts, 8 nose pts | ✅ 106 pts | +| **Capture Quality** | ✅ score=0.5327 | ❌ None | +| **Face Embedding (512-D)** | ❌ **Not available** | ✅ ArcFace 512-D | +| **Photo metadata (age/gender)** | ❌ Not available | ✅ | +| **ANE acceleration** | ✅ Yes | ❌ CPU only | +| **Processing time** | ⚡ 0.31s | ~0.5-1s | +| **Memory** | ✅ Low (system framework) | ~1.5GB | + +### Key Findings + +The `VNFaceprint` class exists, but face embedding data cannot be obtained through the public API or KVC. Vision Framework provides high-quality face detection and landmark localization, but **cannot extract the vector embeddings needed for face matching**. + +### Selection Conclusion + +| Use case | Approach | +|------|------| +| **Face Detection** | Vision Framework **can replace** InsightFace (lighter and faster) | +| **Face Landmarks** | Vision Framework **can replace** | +| **Face Embedding** | InsightFace **kept as is** (Vision Framework cannot replace it) | +| **Face Recognition** | InsightFace **kept as is** | + +If Apple later exposes `VNFaceprint` embedding 
data, a full switch can be re-evaluated. + +### Related Files + +``` +scripts/swift_processors/face_vision_test.swift +``` + +--- + +## MediaPipe Face Evaluation + +### Test Status + +MediaPipe 0.10.33 is installed and provides Face Detection (BlazeFace) + Face Landmarker (468 mesh). + +| Feature | API | Status | +|------|-----|------| +| Face Detection | `mediapipe.tasks.python.vision.face_detector` | ✅ Available | +| Face Mesh | `mediapipe.tasks.python.vision.face_landmarker` | ✅ 468 3D landmarks | +| Face Embedding | None | ❌ Not supported | + +### Three-way Comparison + +| Feature | MediaPipe | Vision Framework | InsightFace | +|------|-----------|-----------------|-------------| +| **Face Detection** | ✅ BlazeFace (~2MB) | ✅ VNDetectFaceRectangles | ✅ SCRFD-10G | +| **Bounding Box** | ✅ | ✅ | ✅ | +| **Keypoints** | ✅ **6 points** (eyes+nose+mouth) | ❌ | ✅ 106 points | +| **Face Mesh** | ✅ **468 points** (separate model) | ❌ | ❌ | +| **512-D Embedding** | ❌ | ❌ | ✅ **ArcFace** | +| **Age/Gender** | ❌ | ❌ | ✅ | +| **Capture Quality** | ❌ | ✅ score 0.06~0.25 | ❌ | +| **Speed** | ⚡ Very fast (mobile optimized) | ⚡ ANE accelerated | 🐢 CPU bound | +| **Model size** | ~2MB | Built into the system | ~150MB | +| **Cross-platform** | ✅ Linux/Windows/macOS | ❌ Apple only | ✅ | + +### Selection Conclusion + +| Use case | Recommended approach | +|------|---------| +| **Face Detection** | MediaPipe or Vision Framework (fast and lightweight) | +| **Face Mesh / 468 landmarks** | MediaPipe (the only option) | +| **Face Embedding (512-D)** | InsightFace **kept as is** | +| **Age/Gender** | InsightFace **kept as is** | + +MediaPipe and Vision Framework are comparable at the detection level, and both are far faster than InsightFace. The final embedding extraction still requires InsightFace. + +### Suggested Staged Implementation + +To accelerate the face pipeline with Swift/Vision: + +``` +Swift face_detector (ANE, fast) + └── Output {file_uuid}.bbox.json (face_id, bbox, timestamp) + +Python embed_extractor (InsightFace, only on detected crops) + └── Read .bbox.json → crop face region + → InsightFace extracts 512-D embedding + → Produce the complete {file_uuid}.face.json +``` + +--- + +## FaceNet-PyTorch CoreML Embedding Experiment + +### Motivation + +InsightFace's buffalo_l pre-trained weights are under the CC BY-NC-SA 4.0 license, which makes commercial use contentious. An MIT/Apache 2.0 licensed face embedding solution is needed. + +### Test Results + +Using 
Facenet-PyTorch (`facenet-pytorch`, MIT license) InceptionResnetV1 (pretrained on VGGFace2), the model was exported to ONNX and converted to CoreML. + +| Step | Time | Output | +|------|------|------| +| Model loading | 10.5s | InceptionResnetV1, 512-D output | +| ONNX export | 1.2s | `/tmp/facenet512.onnx` (90MB) | +| CoreML conversion | 6s | `/tmp/facenet512.mlpackage` (90MB) | + +### Performance Comparison + +| Metric | PyTorch (CPU) | CoreML (CPU/GPU/ANE) | +|------|--------------|---------------------| +| **Inference time (avg)** | 30.9ms | **4.8ms** ⚡ | +| **Speedup** | 1x | **6.4x** | +| **Embedding dimensions** | 512-D | 512-D | +| **Normalized** | ✅ norm=1.0 | ✅ norm=1.0 | +| **Accuracy check (cosine)** | 1.0 | **0.999532** ✅ | + +### License Check + +| Component | License | Commercial use | +|------|---------|------| +| Facenet-PyTorch source code | **MIT** | ✅ | +| VGGFace2 weights | Research use, but can be retrained | ✅ (after retraining on own data) | +| ONNX Runtime | MIT | ✅ | +| CoreML | Built into macOS | ✅ | +| InsightFace buffalo_l (current) | CC BY-NC-SA 4.0 | ❌ **Contentious** | + +### Conclusion + +The Facenet-PyTorch CoreML model can fully replace InsightFace for embedding extraction; the MIT license poses no barrier to commercial use, and CoreML inference is 6.4x faster than PyTorch on CPU. + +### Integration into the Face Processor + +`scripts/face_processor.py` now integrates CoreML FaceNet as the embedding extractor: + +| Item | Implementation | +|------|------| +| **Detection** | InsightFace buffalo_l (unchanged) | +| **Embedding** | CoreML FaceNet (`models/facenet512.mlpackage`) ✅ replaced | +| **Fallback** | Falls back automatically to InsightFace embedding when CoreML fails | +| **Startup loading** | The CoreML model is loaded once when the script initializes (~2s) | +| **Inference flow** | For each detected face crop → resize 160x160 → normalize → CoreML infer → 512-D embedding | +| **Metadata** | Output records `embedding_method: coreml_facenet` | + +Model file path: `models/facenet512.mlpackage` (project root) + +### Related Files + +``` +models/facenet512.mlpackage # CoreML model (90MB, MIT license) +/tmp/facenet512.onnx # ONNX format (90MB, for reference) +scripts/face_processor.py # Face processor with CoreML integration +``` + +--- + +## Version History + +| Version | Date | Purpose | Operator | Tool/Model | +|------|------|------|--------|-----------| +| V1.0 | 2026-05-02 | Initial version | OpenCode | deepseek-chat | +| V1.1 | 2026-05-04 | Added Apple 
Vision Framework + MediaPipe + FaceNet CoreML integration notes | OpenCode | deepseek-chat | +| V2.0 | 2026-05-04 | Apple Vision replaces InsightFace detection; CoreML FaceNet remains the embedding model | OpenCode | deepseek-chat | + +--- + +## V2.0 Architecture: Vision Detection + CoreML FaceNet Embedding + +### Architecture Changes + +V1.x used InsightFace for both detection and embedding (CPU bound, 450%+ CPU). +V2.0 moves detection to Apple Vision Framework (ANE) and keeps embedding on CoreML FaceNet (ANE), bringing CPU usage down to roughly zero. + +``` +V1.x: + face_processor.py + ├── InsightFace buffalo_l (CPU, 450%) → detection + bbox + landmarks + └── CoreML FaceNet (ANE) → 512-D embedding + +V2.0: + face_processor_vision.py + ├── swift_face (Vision/ANE) → VNDetectFaceRectanglesRequest → bbox + │ → VNDetectFaceLandmarksRequest → pose (roll, yaw, pitch) + └── CoreML FaceNet (ANE) → 512-D embedding on cropped face +``` + +### Processing Flow + +``` +1. swift_face