refactor: remove face embedding architecture - single Qdrant _faces collection

- Delete FaceEmbeddingDb module (face_embedding_db.rs)
- Stub match_faces_iterative, generate_seed_embeddings, tmdb_match_handler
- Remove sync_trace_embeddings, populate_face_embeddings_to_qdrant
- Remove embedding from face.json output (face_processor.py)
- Remove embedding from PG UPDATE (store_traced_faces.py)
- Remove workspace traces staging (checkin.rs, qdrant_workspace.rs)
- Fix tests: add pose_angle to Face, hand_nodes to TkgResult

Disabled functions (need reimplement with _faces):
- match_faces_iterative (identity agent)
- generate_seed_embeddings (TMDb seeds)
- tmdb_match_handler (TMDb matching)
- cluster_face_embeddings, search_similar_faces
- merge_traces_within_cuts
This commit is contained in:
Accusys
2026-06-24 22:27:09 +08:00
parent 360cb991e1
commit 074cdcdbed
60 changed files with 657 additions and 9454 deletions

View File

@@ -51,8 +51,8 @@ curl -s -X POST "$API/api/v1/file/$FILE_UUID/process" \
| `success` | boolean | Always true on 200 | | `success` | boolean | Always true on 200 |
| `job_id` | integer | Monitor job ID (for job tracking) | | `job_id` | integer | Monitor job ID (for job tracking) |
| `file_uuid` | string | 32-char hex UUID of the file | | `file_uuid` | string | 32-char hex UUID of the file |
| `status` | string | `"processing"` | | `status` | string | `"queued"` — file enters the FIFO queue |
| `pids` | integer[] | Process IDs of started processors | | `pids` | integer[] | Process IDs of started processors (empty for queued) |
| `message` | string | Human-readable status | | `message` | string | Human-readable status |
#### Error Responses #### Error Responses
@@ -237,6 +237,105 @@ curl -s "$API/api/v1/jobs" -H "X-API-Key: $KEY" | jq '{count, jobs: [.jobs[] | {
| `page` | integer | Current page number | | `page` | integer | Current page number |
| `page_size` | integer | Jobs per page | | `page_size` | integer | Jobs per page |
### `GET /api/v1/job/:uuid`
**Auth**: Required
**Scope**: file-level
Get detailed information about a specific processing job, including its queue position.
#### Response (200)
```json
{
"id": 51,
"uuid": "c36f35685177c981aa139b66bbbccc5b",
"status": "queued",
"current_processor": null,
"progress_current": 0,
"progress_total": 0,
"processors": [],
"created_at": "2026-06-22 23:08:48.497018",
"started_at": null,
"updated_at": null,
"queue_position": 3
}
```
| Field | Type | Description |
|-------|------|-------------|
| `id` | integer | Monitor job ID |
| `uuid` | string | File UUID |
| `status` | string | `"pending"`, `"queued"`, `"running"`, `"completed"`, `"failed"` |
| `current_processor` | string | Currently active processor, or null |
| `progress_current` | integer | Current progress count |
| `progress_total` | integer | Total progress count |
| `processors` | array | Processor list |
| `created_at` | string | Job creation timestamp |
| `started_at` | string | Processing start timestamp, or null |
| `updated_at` | string | Last update timestamp, or null |
| `queue_position` | integer | Position in FIFO queue (null if not pending/queued) |
---
### Status Lifecycle
```
register ──→ pending
trigger (POST /process)
queued ←── queue_position counts jobs ahead
worker picks up
processing
┌────────┴────────┐
▼ ▼
completed failed
checkin ──→ indexed
checkout ──→ checked_out
```
| Status | Meaning |
|--------|---------|
| `pending` | File registered, not yet triggered |
| `queued` | Triggered, waiting for worker in FIFO queue |
| `processing` | Worker actively processing |
| `completed` | All processors finished successfully |
| `failed` | One or more essential processors failed |
| `indexed` | Post-processing checkin complete |
| `checked_out` | User checked out the file |
Queue order is FIFO (`created_at ASC`). The `GET /api/v1/job/:uuid` endpoint returns `queue_position` showing how many jobs are ahead.
### Frontend Status Mapping
When displaying file status in the frontend list (e.g. after `GET /api/v1/files/scan`), map the `status` field as follows:
| DB Status | Status Label | Filter: 待處理 | Filter: 處理中 | Count: pendingCount | Count: processingCount |
|-----------|-------------|----------------|----------------|---------------------|-----------------------|
| `unregistered` | 未註冊 | No | No | No | No |
| `registered` | 待處理 | **Yes** | No | **Yes** | No |
| `pending` | 待處理 | **Yes** | No | **Yes** | No |
| `queued` | 排隊中 | **Yes** | **Yes** | **Yes** | **Yes** |
| `processing` | 處理中 | No | **Yes** | No | **Yes** |
| `completed` | 已完成 | No | No | No | No |
| `failed` | 處理失敗 | No | No | No | No |
| `indexed` | 已入庫 | No | No | No | No |
**`queued` 的特殊處理**
- `statusLabel` → 顯示「排隊中」,加 `ms-badge-warn` 樣式(黃色)
- `filterPending` → 應包含 `queued`讓它在「待處理」filter 可見
- `pendingCount` + `processingCount` → 兩者都應包含 `queued`,因它既是「待處理」也是「正在排隊」
-`refreshAllStatus` / `loadFiles` 中,如果檔案狀態是 `queued`,應顯示簡單的排隊訊息(無需 polling progress
- 當 worker pickup 後,狀態會變為 `processing`,此時 `refreshAllStatus` 會自動偵測到並開始 polling progress
- 也可以提供一個「queue_position」顯示呼叫 `GET /api/v1/job/:uuid` 取得排在第幾位
---
### `GET /api/v1/file/:file_uuid/processor-counts` ### `GET /api/v1/file/:file_uuid/processor-counts`
**Auth**: Required **Auth**: Required
@@ -407,4 +506,4 @@ curl -s -X POST "$API/api/v1/file/$FILE_UUID/complete" \
Phase 1 (`/phase1`) combines store-asrx + rule1 + vectorize into one call. Phase 1 (`/phase1`) combines store-asrx + rule1 + vectorize into one call.
--- ---
*Updated: 2026-06-20 12:00:00* *Updated: 2026-06-23 — Added queued status, FIFO queue order, queue_position in job detail, frontend status mapping table*

View File

@@ -119,12 +119,12 @@ curl<span class="w"> </span>-s<span class="w"> </span>-X<span class="w"> </span>
<tr> <tr>
<td><code>status</code></td> <td><code>status</code></td>
<td>string</td> <td>string</td>
<td><code>"processing"</code></td> <td><code>"queued"</code> — file enters the FIFO queue</td>
</tr> </tr>
<tr> <tr>
<td><code>pids</code></td> <td><code>pids</code></td>
<td>integer[]</td> <td>integer[]</td>
<td>Process IDs of started processors</td> <td>Process IDs of started processors (empty for queued)</td>
</tr> </tr>
<tr> <tr>
<td><code>message</code></td> <td><code>message</code></td>
@@ -507,6 +507,239 @@ curl<span class="w"> </span>-s<span class="w"> </span>-X<span class="w"> </span>
</tr> </tr>
</tbody> </tbody>
</table> </table>
<h3><code>GET /api/v1/job/:uuid</code></h3>
<p><strong>Auth</strong>: Required
<strong>Scope</strong>: file-level</p>
<p>Get detailed information about a specific processing job, including its queue position.</p>
<h4>Response (200)</h4>
<div class="codehilite"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="nt">&quot;id&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">51</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;uuid&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;c36f35685177c981aa139b66bbbccc5b&quot;</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;status&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;queued&quot;</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;current_processor&quot;</span><span class="p">:</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;progress_current&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;progress_total&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;processors&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[],</span>
<span class="w"> </span><span class="nt">&quot;created_at&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;2026-06-22 23:08:48.497018&quot;</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;started_at&quot;</span><span class="p">:</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;updated_at&quot;</span><span class="p">:</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;queue_position&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">3</span>
<span class="p">}</span>
</code></pre></div>
<table class="table">
<thead>
<tr>
<th>Field</th>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>id</code></td>
<td>integer</td>
<td>Monitor job ID</td>
</tr>
<tr>
<td><code>uuid</code></td>
<td>string</td>
<td>File UUID</td>
</tr>
<tr>
<td><code>status</code></td>
<td>string</td>
<td><code>"pending"</code>, <code>"queued"</code>, <code>"running"</code>, <code>"completed"</code>, <code>"failed"</code></td>
</tr>
<tr>
<td><code>current_processor</code></td>
<td>string</td>
<td>Currently active processor, or null</td>
</tr>
<tr>
<td><code>progress_current</code></td>
<td>integer</td>
<td>Current progress count</td>
</tr>
<tr>
<td><code>progress_total</code></td>
<td>integer</td>
<td>Total progress count</td>
</tr>
<tr>
<td><code>processors</code></td>
<td>array</td>
<td>Processor list</td>
</tr>
<tr>
<td><code>created_at</code></td>
<td>string</td>
<td>Job creation timestamp</td>
</tr>
<tr>
<td><code>started_at</code></td>
<td>string</td>
<td>Processing start timestamp, or null</td>
</tr>
<tr>
<td><code>updated_at</code></td>
<td>string</td>
<td>Last update timestamp, or null</td>
</tr>
<tr>
<td><code>queue_position</code></td>
<td>integer</td>
<td>Position in FIFO queue (null if not pending/queued)</td>
</tr>
</tbody>
</table>
<hr />
<h3>Status Lifecycle</h3>
<div class="codehilite"><pre><span></span><code><span class="n">register</span><span class="w"> </span><span class="err">──→</span><span class="w"> </span><span class="n">pending</span>
<span class="w"> </span><span class="err"></span>
<span class="w"> </span><span class="n">trigger</span><span class="w"> </span><span class="p">(</span><span class="n">POST</span><span class="w"> </span><span class="o">/</span><span class="n">process</span><span class="p">)</span>
<span class="w"> </span><span class="err"></span>
<span class="w"> </span><span class="n">queued</span><span class="w"> </span><span class="err">←──</span><span class="w"> </span><span class="n">queue_position</span><span class="w"> </span><span class="n">counts</span><span class="w"> </span><span class="n">jobs</span><span class="w"> </span><span class="n">ahead</span>
<span class="w"> </span><span class="err"></span>
<span class="w"> </span><span class="n">worker</span><span class="w"> </span><span class="n">picks</span><span class="w"> </span><span class="n">up</span>
<span class="w"> </span><span class="err"></span>
<span class="w"> </span><span class="n">processing</span>
<span class="w"> </span><span class="err"></span>
<span class="w"> </span><span class="err">┌────────┴────────┐</span>
<span class="w"> </span><span class="err"></span><span class="w"> </span><span class="err"></span>
<span class="w"> </span><span class="n">completed</span><span class="w"> </span><span class="n">failed</span>
<span class="w"> </span><span class="err"></span>
<span class="w"> </span><span class="n">checkin</span><span class="w"> </span><span class="err">──→</span><span class="w"> </span><span class="n">indexed</span>
<span class="w"> </span><span class="n">checkout</span><span class="w"> </span><span class="err">──→</span><span class="w"> </span><span class="n">checked_out</span>
</code></pre></div>
<table class="table">
<thead>
<tr>
<th>Status</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>pending</code></td>
<td>File registered, not yet triggered</td>
</tr>
<tr>
<td><code>queued</code></td>
<td>Triggered, waiting for worker in FIFO queue</td>
</tr>
<tr>
<td><code>processing</code></td>
<td>Worker actively processing</td>
</tr>
<tr>
<td><code>completed</code></td>
<td>All processors finished successfully</td>
</tr>
<tr>
<td><code>failed</code></td>
<td>One or more essential processors failed</td>
</tr>
<tr>
<td><code>indexed</code></td>
<td>Post-processing checkin complete</td>
</tr>
<tr>
<td><code>checked_out</code></td>
<td>User checked out the file</td>
</tr>
</tbody>
</table>
<p>Queue order is FIFO (<code>created_at ASC</code>). The <code>GET /api/v1/job/:uuid</code> endpoint returns <code>queue_position</code> showing how many jobs are ahead.</p>
<h3>Frontend Status Mapping</h3>
<p>When displaying file status in the frontend list (e.g. after <code>GET /api/v1/files/scan</code>), map the <code>status</code> field as follows:</p>
<table class="table">
<thead>
<tr>
<th>DB Status</th>
<th>Status Label</th>
<th>Filter: 待處理</th>
<th>Filter: 處理中</th>
<th>Count: pendingCount</th>
<th>Count: processingCount</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>unregistered</code></td>
<td>未註冊</td>
<td>No</td>
<td>No</td>
<td>No</td>
<td>No</td>
</tr>
<tr>
<td><code>registered</code></td>
<td>待處理</td>
<td><strong>Yes</strong></td>
<td>No</td>
<td><strong>Yes</strong></td>
<td>No</td>
</tr>
<tr>
<td><code>pending</code></td>
<td>待處理</td>
<td><strong>Yes</strong></td>
<td>No</td>
<td><strong>Yes</strong></td>
<td>No</td>
</tr>
<tr>
<td><code>queued</code></td>
<td>排隊中</td>
<td><strong>Yes</strong></td>
<td><strong>Yes</strong></td>
<td><strong>Yes</strong></td>
<td><strong>Yes</strong></td>
</tr>
<tr>
<td><code>processing</code></td>
<td>處理中</td>
<td>No</td>
<td><strong>Yes</strong></td>
<td>No</td>
<td><strong>Yes</strong></td>
</tr>
<tr>
<td><code>completed</code></td>
<td>已完成</td>
<td>No</td>
<td>No</td>
<td>No</td>
<td>No</td>
</tr>
<tr>
<td><code>failed</code></td>
<td>處理失敗</td>
<td>No</td>
<td>No</td>
<td>No</td>
<td>No</td>
</tr>
<tr>
<td><code>indexed</code></td>
<td>已入庫</td>
<td>No</td>
<td>No</td>
<td>No</td>
<td>No</td>
</tr>
</tbody>
</table>
<p><strong><code>queued</code> 的特殊處理</strong>
- <code>statusLabel</code> → 顯示「排隊中」,加 <code>ms-badge-warn</code> 樣式(黃色)
- <code>filterPending</code> → 應包含 <code>queued</code>讓它在「待處理」filter 可見
- <code>pendingCount</code> + <code>processingCount</code> → 兩者都應包含 <code>queued</code>,因它既是「待處理」也是「正在排隊」
- 在 <code>refreshAllStatus</code> / <code>loadFiles</code> 中,如果檔案狀態是 <code>queued</code>,應顯示簡單的排隊訊息(無需 polling progress
- 當 worker pickup 後,狀態會變為 <code>processing</code>,此時 <code>refreshAllStatus</code> 會自動偵測到並開始 polling progress
- 也可以提供一個「queue_position」顯示呼叫 <code>GET /api/v1/job/:uuid</code> 取得排在第幾位</p>
<hr />
<h3><code>GET /api/v1/file/:file_uuid/processor-counts</code></h3> <h3><code>GET /api/v1/file/:file_uuid/processor-counts</code></h3>
<p><strong>Auth</strong>: Required <p><strong>Auth</strong>: Required
<strong>Scope</strong>: file-level</p> <strong>Scope</strong>: file-level</p>
@@ -652,7 +885,7 @@ curl<span class="w"> </span>-s<span class="w"> </span>-X<span class="w"> </span>
<p>Phase 1 (<code>/phase1</code>) combines store-asrx + rule1 + vectorize into one call.</p> <p>Phase 1 (<code>/phase1</code>) combines store-asrx + rule1 + vectorize into one call.</p>
<hr /> <hr />
<p><em>Updated: 2026-06-20 12:00:00</em></p> <p><em>Updated: 2026-06-23 — Added queued status, FIFO queue order, queue_position in job detail, frontend status mapping table</em></p>
</div> </div>
</body> </body>
</html> </html>

View File

@@ -51,8 +51,8 @@ curl -s -X POST "$API/api/v1/file/$FILE_UUID/process" \
| `success` | boolean | Always true on 200 | | `success` | boolean | Always true on 200 |
| `job_id` | integer | Monitor job ID (for job tracking) | | `job_id` | integer | Monitor job ID (for job tracking) |
| `file_uuid` | string | 32-char hex UUID of the file | | `file_uuid` | string | 32-char hex UUID of the file |
| `status` | string | `"processing"` | | `status` | string | `"queued"` — file enters the FIFO queue |
| `pids` | integer[] | Process IDs of started processors | | `pids` | integer[] | Process IDs of started processors (empty for queued) |
| `message` | string | Human-readable status | | `message` | string | Human-readable status |
#### Error Responses #### Error Responses
@@ -237,6 +237,105 @@ curl -s "$API/api/v1/jobs" -H "X-API-Key: $KEY" | jq '{count, jobs: [.jobs[] | {
| `page` | integer | Current page number | | `page` | integer | Current page number |
| `page_size` | integer | Jobs per page | | `page_size` | integer | Jobs per page |
### `GET /api/v1/job/:uuid`
**Auth**: Required
**Scope**: file-level
Get detailed information about a specific processing job, including its queue position.
#### Response (200)
```json
{
"id": 51,
"uuid": "c36f35685177c981aa139b66bbbccc5b",
"status": "queued",
"current_processor": null,
"progress_current": 0,
"progress_total": 0,
"processors": [],
"created_at": "2026-06-22 23:08:48.497018",
"started_at": null,
"updated_at": null,
"queue_position": 3
}
```
| Field | Type | Description |
|-------|------|-------------|
| `id` | integer | Monitor job ID |
| `uuid` | string | File UUID |
| `status` | string | `"pending"`, `"queued"`, `"running"`, `"completed"`, `"failed"` |
| `current_processor` | string | Currently active processor, or null |
| `progress_current` | integer | Current progress count |
| `progress_total` | integer | Total progress count |
| `processors` | array | Processor list |
| `created_at` | string | Job creation timestamp |
| `started_at` | string | Processing start timestamp, or null |
| `updated_at` | string | Last update timestamp, or null |
| `queue_position` | integer | Position in FIFO queue (null if not pending/queued) |
---
### Status Lifecycle
```
register ──→ pending
trigger (POST /process)
queued ←── queue_position counts jobs ahead
worker picks up
processing
┌────────┴────────┐
▼ ▼
completed failed
checkin ──→ indexed
checkout ──→ checked_out
```
| Status | Meaning |
|--------|---------|
| `pending` | File registered, not yet triggered |
| `queued` | Triggered, waiting for worker in FIFO queue |
| `processing` | Worker actively processing |
| `completed` | All processors finished successfully |
| `failed` | One or more essential processors failed |
| `indexed` | Post-processing checkin complete |
| `checked_out` | User checked out the file |
Queue order is FIFO (`created_at ASC`). The `GET /api/v1/job/:uuid` endpoint returns `queue_position` showing how many jobs are ahead.
### Frontend Status Mapping
When displaying file status in the frontend list (e.g. after `GET /api/v1/files/scan`), map the `status` field as follows:
| DB Status | Status Label | Filter: 待處理 | Filter: 處理中 | Count: pendingCount | Count: processingCount |
|-----------|-------------|----------------|----------------|---------------------|-----------------------|
| `unregistered` | 未註冊 | No | No | No | No |
| `registered` | 待處理 | **Yes** | No | **Yes** | No |
| `pending` | 待處理 | **Yes** | No | **Yes** | No |
| `queued` | 排隊中 | **Yes** | **Yes** | **Yes** | **Yes** |
| `processing` | 處理中 | No | **Yes** | No | **Yes** |
| `completed` | 已完成 | No | No | No | No |
| `failed` | 處理失敗 | No | No | No | No |
| `indexed` | 已入庫 | No | No | No | No |
**`queued` 的特殊處理**
- `statusLabel` → 顯示「排隊中」,加 `ms-badge-warn` 樣式(黃色)
- `filterPending` → 應包含 `queued`讓它在「待處理」filter 可見
- `pendingCount` + `processingCount` → 兩者都應包含 `queued`,因它既是「待處理」也是「正在排隊」
-`refreshAllStatus` / `loadFiles` 中,如果檔案狀態是 `queued`,應顯示簡單的排隊訊息(無需 polling progress
- 當 worker pickup 後,狀態會變為 `processing`,此時 `refreshAllStatus` 會自動偵測到並開始 polling progress
- 也可以提供一個「queue_position」顯示呼叫 `GET /api/v1/job/:uuid` 取得排在第幾位
---
### `GET /api/v1/file/:file_uuid/processor-counts` ### `GET /api/v1/file/:file_uuid/processor-counts`
**Auth**: Required **Auth**: Required
@@ -407,4 +506,4 @@ curl -s -X POST "$API/api/v1/file/$FILE_UUID/complete" \
Phase 1 (`/phase1`) combines store-asrx + rule1 + vectorize into one call. Phase 1 (`/phase1`) combines store-asrx + rule1 + vectorize into one call.
--- ---
*Updated: 2026-06-20 12:00:00* *Updated: 2026-06-23 — Added queued status, FIFO queue order, queue_position in job detail, frontend status mapping table*

View File

@@ -1,200 +0,0 @@
#!/opt/homebrew/bin/python3.11
"""
POC: MediaPipe Face Detection vs Apple Vision Framework vs InsightFace
Tests face detection on video frames and reports:
- Detection count
- Bounding box quality
- Landmarks (468 face mesh)
- Processing speed
"""
import sys
import json
import os
import time
import subprocess
import argparse
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
def extract_frames(video_path, sample_interval=30, max_frames=50):
"""Extract frames using ffmpeg"""
import tempfile
tmpdir = tempfile.mkdtemp(prefix="face_test_")
pattern = os.path.join(tmpdir, "frame_%05d.jpg")
cmd = ["ffmpeg", "-y", "-v", "quiet", "-i", video_path,
"-vf", f"select=not(mod(n\\,{sample_interval}))",
"-vsync", "vfr", "-q:v", "5", pattern]
subprocess.run(cmd, check=True)
files = sorted([f for f in os.listdir(tmpdir) if f.endswith(".jpg")])[:max_frames]
return tmpdir, [os.path.join(tmpdir, f) for f in files]
def test_mediapipe(frame_paths, fps):
"""MediaPipe Face Detection + Face Mesh"""
try:
from mediapipe.tasks import vision
from mediapipe.tasks.python.core.base_options import BaseOptions
from mediapipe.tasks.python.vision.face_detector import FaceDetector, FaceDetectorOptions
from mediapipe.tasks.python.vision.face_landmarker import FaceLandmarker, FaceLandmarkerOptions
except ImportError:
print("[MediaPipe] Not available, skipping")
return None
model_dir = os.path.join(os.path.dirname(__file__), "models")
os.makedirs(model_dir, exist_ok=True)
# Check model files - MediaPipe downloads automatically via the API
base_opts_detect = BaseOptions(model_asset_path="")
detect_opts = FaceDetectorOptions(base_options=BaseOptions())
t0 = time.time()
total_faces = 0
frames_with_faces = 0
landmarks_total = 0
# MediaPipe Face Detector
try:
detector = vision.FaceDetector.create_from_options(
FaceDetectorOptions(
base_options=BaseOptions(model_asset_buffer=None),
running_mode=vision.RunningMode.IMAGE
)
)
except:
# Download model first
import urllib.request
model_url = "https://storage.googleapis.com/mediapipe-models/face_detector/blaze_face_short_range/float16/latest/face_detector.task"
model_path = os.path.join(model_dir, "face_detector.task")
if not os.path.exists(model_path):
print(f"[MediaPipe] Downloading model: {model_url}")
urllib.request.urlretrieve(model_url, model_path)
detector = vision.FaceDetector.create_from_options(
FaceDetectorOptions(
base_options=BaseOptions(model_asset_path=model_path),
running_mode=vision.RunningMode.IMAGE
)
)
import cv2
for path in frame_paths:
img = cv2.imread(path)
if img is None:
continue
h, w = img.shape[:2]
mp_img = mp.Image(image_format=mp.ImageFormat.SRGB, data=img)
result = detector.detect(mp_img)
if result.detections:
frames_with_faces += 1
for det in result.detections:
total_faces += 1
bbox = det.bounding_box
# bbox is [x, y, width, height] in pixels
elapsed = time.time() - t0
print(f"[MediaPipe] Detection: {len(frame_paths)} frames, {frames_with_faces} with faces, {total_faces} faces, {elapsed:.2f}s")
# Face Landmarker (468 points)
landmark_path = os.path.join(model_dir, "face_landmarker.task")
if not os.path.exists(landmark_path):
model_url = "https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/latest/face_landmarker.task"
print(f"[MediaPipe] Downloading landmark model...")
import urllib.request
urllib.request.urlretrieve(model_url, landmark_path)
landmarker = vision.FaceLandmarker.create_from_options(
FaceLandmarkerOptions(
base_options=BaseOptions(model_asset_path=landmark_path),
running_mode=vision.RunningMode.IMAGE,
output_face_blendshapes=False,
output_facial_transformation_matrixes=False,
)
)
t1 = time.time()
for path in frame_paths[:10]: # Only test 10 frames for landmarks
img = cv2.imread(path)
if img is None:
continue
mp_img = mp.Image(image_format=mp.ImageFormat.SRGB, data=img)
result = landmarker.detect(mp_img)
if result.face_landmarks:
for face in result.face_landmarks:
landmarks_total += len(face)
elapsed2 = time.time() - t1
print(f"[MediaPipe] Face Mesh (10 frames): {landmarks_total} total landmarks (~{landmarks_total//max(len(result.face_landmarks),1)} per face)")
return {
"frames_processed": len(frame_paths),
"frames_with_faces": frames_with_faces,
"total_faces": total_faces,
"time_sec": elapsed,
"landmarks_per_face": 468,
}
def test_vision_framework(frame_paths, fps):
"""Apple Vision Framework face detection via swift binary"""
# Use the existing swift binary
swift_bin = os.path.join(os.path.dirname(__file__),
"swift_processors/.build/debug/swift_ocr")
# swift_ocr doesn't do face detection, use the face_compare_test
swift_face = os.path.join(os.path.dirname(__file__),
"swift_processors/.build/debug/face_compare_test")
if not os.path.exists(swift_face):
print("[Vision] Binary not found, skipping")
return None
print(f"[Vision] Running face compare test...")
t0 = time.time()
result = subprocess.run(
[swift_face, frame_paths[0].rsplit("/", 2)[0].replace("/frames", ""), # This won't work for single files
"--sample-interval", "1", "--max-frames", str(len(frame_paths))],
capture_output=True, text=True, timeout=120
)
elapsed = time.time() - t0
print(result.stdout[-500:])
return {"time_sec": elapsed}
def main():
parser = argparse.ArgumentParser()
parser.add_argument("video_path")
parser.add_argument("--sample-interval", type=int, default=30)
parser.add_argument("--max-frames", type=int, default=50)
args = parser.parse_args()
print(f"Testing: {args.video_path}")
# Extract frames
tmpdir, frames = extract_frames(args.video_path, args.sample_interval, args.max_frames)
print(f"Extracted {len(frames)} frames")
# MediaPipe
print("\n=== MediaPipe ===")
mp_result = test_mediapipe(frames, 24)
# Vision Framework
print("\n=== Apple Vision Framework ===")
vf_result = test_vision_framework(frames, 24)
# Summary
print("\n=== Comparison ===")
if mp_result:
print(f"MediaPipe: {mp_result['total_faces']} faces in {mp_result['frames_with_faces']} frames, {mp_result['time_sec']:.2f}s")
print(f" Landmarks: {mp_result['landmarks_per_face']} per face")
print(f"Vision Framework: (see above)")
# Cleanup
import shutil
shutil.rmtree(tmpdir, ignore_errors=True)
if __name__ == "__main__":
main()

View File

@@ -1 +0,0 @@
../v1.1/scripts/face_mediapipe_test_v1.11.py

View File

@@ -225,8 +225,9 @@ class FaceProcessorVision:
if face_img.size == 0: if face_img.size == 0:
continue continue
# CoreML embedding # CoreML embedding - TODO: push to Qdrant _faces collection instead
emb = self.extract_face_embedding(face_img) # emb = self.extract_face_embedding(face_img)
emb = None
if emb is not None: if emb is not None:
embed_count += 1 embed_count += 1
@@ -240,7 +241,6 @@ class FaceProcessorVision:
faces.append({ faces.append({
"x": x, "y": y, "width": w, "height": h, "x": x, "y": y, "width": w, "height": h,
"confidence": face.get("confidence", 0.5), "confidence": face.get("confidence", 0.5),
"embedding": emb,
"pose_angle": { "pose_angle": {
"angle": pose_angle, "angle": pose_angle,
"roll": pose_info.get("roll", 0), "roll": pose_info.get("roll", 0),
@@ -262,20 +262,17 @@ class FaceProcessorVision:
if len(face_data["frames"]) % 100 == 0: if len(face_data["frames"]) % 100 == 0:
elapsed = time.time() - t0 elapsed = time.time() - t0
print(f"[FACE_V2] {len(face_data['frames'])} frames, {embed_count} embeddings, {elapsed:.0f}s") print(f"[FACE_V2] {len(face_data['frames'])} frames, {elapsed:.0f}s")
if self.publisher: if self.publisher:
pct = int(len(face_data["frames"]) * 100 / max(len(frames), 1)) pct = int(len(face_data["frames"]) * 100 / max(len(frames), 1))
if pct > last_pct: if pct > last_pct:
last_pct = pct last_pct = pct
self.publisher.progress("face", len(face_data["frames"]), len(frames), self.publisher.progress("face", len(face_data["frames"]), len(frames),
f"{embed_count} faces", embed_count, "faces") "", 0, "faces")
self.video.release() self.video.release()
# Finalize
face_data["metadata"]["status"] = "completed" face_data["metadata"]["status"] = "completed"
face_data["metadata"]["total_embeddings"] = embed_count
face_data["metadata"]["embedder"] = "coreml_facenet"
# Convert dict frames to list for Rust FaceResult format # Convert dict frames to list for Rust FaceResult format
frames_list = [] frames_list = []

View File

@@ -1,228 +0,0 @@
#!/opt/homebrew/bin/python3.11
"""
Regenerate ALL parent chunks for 384b0ff44aaaa1f1 using gemma4
Groups ASR chunks into ~17 logical scenes and generates summaries.
"""
import json
import subprocess
import psycopg2
import psycopg2.extras
DB_CONFIG = {"host": "localhost", "user": "accusys", "dbname": "momentry"}
UUID = "384b0ff44aaaa1f1"
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "gemma4:latest"
# Target ~17 scenes across 6865s = ~400s per scene
# But use natural breaks (gaps in dialogue) to split
SCENE_TARGET_COUNT = 17
def get_chunks():
conn = psycopg2.connect(**DB_CONFIG)
cur = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
cur.execute(
"""
SELECT id, chunk_id, start_time, end_time, start_frame, end_frame,
text_content, fps
FROM chunks
WHERE uuid = %s AND chunk_type = 'sentence'
ORDER BY start_time
""",
(UUID,),
)
chunks = cur.fetchall()
cur.close()
conn.close()
return chunks
def call_gemma4(prompt, max_tokens=300):
payload = {
"model": MODEL,
"prompt": prompt,
"stream": False,
"options": {"temperature": 0.3, "num_predict": max_tokens},
}
try:
resp = subprocess.run(
["curl", "-s", OLLAMA_URL, "-d", json.dumps(payload)],
capture_output=True,
text=True,
timeout=180,
)
if resp.returncode == 0:
result = json.loads(resp.stdout)
return result.get("response", "").strip()
except Exception as e:
print(f" ⚠️ Ollama error: {e}")
return ""
def find_scene_boundaries(chunks, target_count=SCENE_TARGET_COUNT):
"""Find optimal scene boundaries based on dialogue gaps"""
if not chunks:
return []
# Calculate gaps between consecutive chunks
gaps = []
for i in range(1, len(chunks)):
gap = chunks[i]["start_time"] - chunks[i - 1]["end_time"]
gaps.append((i, gap))
# Sort by gap size, take top (target_count - 1) gaps
gaps.sort(key=lambda x: x[1], reverse=True)
split_indices = sorted([g[0] for g in gaps[: target_count - 1]])
# Create scenes
scenes = []
start = 0
for split in split_indices:
scenes.append(chunks[start:split])
start = split
scenes.append(chunks[start:])
return scenes
def generate_summary(scene_chunks, scene_num):
"""Generate summary for a scene using gemma4"""
texts = [c["text_content"] for c in scene_chunks if c["text_content"]]
if not texts:
return f"Scene {scene_num}: No dialogue"
combined = " ".join(texts)[:3000]
duration = scene_chunks[-1]["end_time"] - scene_chunks[0]["start_time"]
prompt = f"""You are a professional film scene analyst. Given the following dialogue transcript from a movie scene, write a concise one-sentence English summary.
Duration: {duration:.0f} seconds
Dialogue:
{combined}
Provide ONLY the summary sentence, nothing else. Focus on plot events and character actions."""
summary = call_gemma4(prompt, max_tokens=250)
if not summary:
# Fallback: use first few words of dialogue
summary = f"Scene {scene_num}: {' '.join(texts[:3])[:80]}..."
return summary
def insert_parent_chunks(scenes):
"""Insert parent chunks and update child relationships"""
conn = psycopg2.connect(**DB_CONFIG)
cur = conn.cursor()
inserted = 0
for i, scene_chunks in enumerate(scenes):
start_time = scene_chunks[0]["start_time"]
end_time = scene_chunks[-1]["end_time"]
start_frame = int(scene_chunks[0]["start_frame"])
end_frame = int(scene_chunks[-1]["end_frame"])
fps = float(scene_chunks[0]["fps"]) if scene_chunks[0]["fps"] else 59.94
chunk_count = len(scene_chunks)
print(
f" Scene {i}: {start_time:.0f}s-{end_time:.0f}s ({chunk_count} chunks, {end_time - start_time:.0f}s)"
)
# Generate summary
summary = generate_summary(scene_chunks, i)
print(f" 📝 {summary[:100]}...")
# Insert parent chunk
cur.execute(
"""
INSERT INTO parent_chunks (
uuid, scene_order, start_time, end_time,
start_frame, end_frame, fps, summary_text,
metadata, rule_3_markers, created_at
) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, NOW())
RETURNING id
""",
(
UUID,
i,
start_time,
end_time,
start_frame,
end_frame,
fps,
summary,
json.dumps({"auto_generated_by": "gemma4", "chunk_count": chunk_count}),
json.dumps({}),
),
)
parent_id = cur.fetchone()[0]
# Update chunks with parent_chunk_id
chunk_ids = [c["chunk_id"] for c in scene_chunks]
child_ids_array = chunk_ids # Store all child chunk IDs
cur.execute(
"""
UPDATE chunks
SET parent_chunk_id = %s::varchar
WHERE uuid = %s AND chunk_id = ANY(%s)
""",
(str(parent_id), UUID, chunk_ids),
)
inserted += 1
if i % 5 == 4 or i == len(scenes) - 1:
conn.commit()
print(f" ✅ Committed scenes 0-{i}")
conn.commit()
cur.close()
conn.close()
return inserted
def main():
print(f"🎬 Regenerating parent chunks for {UUID}")
print(f" Using model: {MODEL}")
print("=" * 70)
# Step 1: Get all chunks
print("\n📥 Fetching ASR chunks...")
chunks = get_chunks()
print(f" Found {len(chunks)} sentence chunks")
if chunks:
print(f" Time range: 0-{chunks[-1]['end_time']:.0f}s")
# Step 2: Find scene boundaries
print(f"\n🔍 Finding {SCENE_TARGET_COUNT} scene boundaries...")
scenes = find_scene_boundaries(chunks, SCENE_TARGET_COUNT)
print(f" Created {len(scenes)} scenes")
for i, s in enumerate(scenes):
print(
f" Scene {i}: {s[0]['start_time']:.0f}s-{s[-1]['end_time']:.0f}s ({len(s)} chunks)"
)
# Step 3: Generate summaries and insert
print("\n🤖 Generating summaries with gemma4...")
inserted = insert_parent_chunks(scenes)
print(f"\n{'=' * 70}")
print(f"✅ Created {inserted} parent chunks")
# Step 4: Verify
print("\n📊 Verification:")
conn = psycopg2.connect(**DB_CONFIG)
cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM parent_chunks WHERE uuid = %s", (UUID,))
print(f" parent_chunks: {cur.fetchone()[0]}")
cur.execute(
"SELECT COUNT(*) FROM chunks WHERE uuid = %s AND parent_chunk_id IS NULL AND chunk_type = 'sentence'",
(UUID,),
)
print(f" orphan chunks: {cur.fetchone()[0]}")
cur.close()
conn.close()
if __name__ == "__main__":
main()

View File

@@ -1 +0,0 @@
../v1.1/scripts/generate_parent_chunks_gemma4_v1.11.py

View File

@@ -1,711 +0,0 @@
#!/opt/homebrew/bin/python3.11
"""
MediaPipe Holistic Processor - Full body keypoint extraction
Purpose:
1. Extract Face Mesh (468 keypoints) → eye/mouth actions
2. Extract Pose (33 keypoints) → arm/leg/feet actions
3. Extract Hands (21 keypoints × 2) → hand gestures
Output structure:
{
"metadata": {...},
"frames": {
"frame_num": {
"persons": [
{
"person_id": 0,
"bbox": {...},
"face_mesh": {
"landmarks": [[x,y,z], ...], # 468 points
"eye_features": {...},
"mouth_features": {...},
},
"pose": {
"landmarks": [[x,y,z,visibility], ...], # 33 points
"arm_features": {...},
"leg_features": {...},
},
"hands": {
"left": {
"landmarks": [[x,y,z], ...], # 21 points
"gesture": "...",
},
"right": {
"landmarks": [[x,y,z], ...], # 21 points
"gesture": "...",
},
},
}
]
}
}
}
"""
import json
import argparse
import cv2
import numpy as np
import mediapipe as mp
from typing import Dict
class MediaPipeHolisticProcessor:
"""
Process video with MediaPipe Holistic (Face + Pose + Hands)
"""
def __init__(
self,
model_complexity: int = 1, # 0, 1, 2
refine_face_landmarks: bool = True,
enable_segmentation: bool = False,
min_detection_confidence: float = 0.5,
min_tracking_confidence: float = 0.5,
):
"""
Initialize MediaPipe Holistic
Args:
model_complexity: 0 (lite), 1 (full), 2 (heavy)
refine_face_landmarks: Enable iris detection
enable_segmentation: Enable segmentation mask
min_detection_confidence: Detection confidence threshold
min_tracking_confidence: Tracking confidence threshold
"""
self.mp_holistic = mp.solutions.holistic
self.mp_drawing = mp.solutions.drawing_utils
self.mp_drawing_styles = mp.solutions.drawing_styles
self.holistic = self.mp_holistic.Holistic(
static_image_mode=False, # Video mode
model_complexity=model_complexity,
smooth_landmarks=True, # Smooth landmarks across frames
enable_segmentation=enable_segmentation,
smooth_segmentation=True,
refine_face_landmarks=refine_face_landmarks,
min_detection_confidence=min_detection_confidence,
min_tracking_confidence=min_tracking_confidence,
)
# Eye landmark indices (Face Mesh)
self.LEFT_EYE_INDICES = [33, 133, 159, 145, 158, 144] # 6 points
self.RIGHT_EYE_INDICES = [362, 263, 386, 374, 385, 373]
# Iris indices
self.LEFT_IRIS_CENTER = 468
self.RIGHT_IRIS_CENTER = 473
# Mouth indices
self.MOUTH_TOP = 13
self.MOUTH_BOTTOM = 14
self.MOUTH_LEFT = 61
self.MOUTH_RIGHT = 291
# Pose key indices
self.POSE_KEYPOINTS = {
"nose": 0,
"left_shoulder": 11,
"right_shoulder": 12,
"left_elbow": 13,
"right_elbow": 14,
"left_wrist": 15,
"right_wrist": 16,
"left_hip": 23,
"right_hip": 24,
"left_knee": 25,
"right_knee": 26,
"left_ankle": 27,
"right_ankle": 28,
}
# Hand key indices
self.HAND_KEYPOINTS = {
"wrist": 0,
"thumb_cmc": 1,
"thumb_mcp": 2,
"thumb_ip": 3,
"thumb_tip": 4,
"index_mcp": 5,
"index_pip": 6,
"index_dip": 7,
"index_tip": 8,
"middle_mcp": 9,
"middle_pip": 10,
"middle_dip": 11,
"middle_tip": 12,
"ring_mcp": 13,
"ring_pip": 14,
"ring_dip": 15,
"ring_tip": 16,
"pinky_mcp": 17,
"pinky_pip": 18,
"pinky_dip": 19,
"pinky_tip": 20,
}
def process_frame(self, frame: np.ndarray) -> Dict:
"""
Process single frame
Args:
frame: BGR image
Returns:
Dict with face_mesh, pose, hands data
"""
frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
results = self.holistic.process(frame_rgb)
person_data = {
"person_id": 0,
"bbox": None,
"face_mesh": None,
"pose": None,
"hands": {"left": None, "right": None},
}
# Extract face mesh
height, width = frame.shape[:2]
if results.face_landmarks:
person_data["face_mesh"] = self._extract_face_mesh(results.face_landmarks, width, height)
# Extract pose
if results.pose_landmarks:
person_data["pose"] = self._extract_pose(results.pose_landmarks, width, height)
# Extract hands
if results.left_hand_landmarks:
person_data["hands"]["left"] = self._extract_hand(results.left_hand_landmarks, "left", width, height)
if results.right_hand_landmarks:
person_data["hands"]["right"] = self._extract_hand(results.right_hand_landmarks, "right", width, height)
# Calculate bbox from pose landmarks
if results.pose_landmarks:
landmarks = results.pose_landmarks.landmark
x_coords = [lm.x for lm in landmarks if lm.visibility > 0.5]
y_coords = [lm.y for lm in landmarks if lm.visibility > 0.5]
if x_coords and y_coords:
x_min, x_max = min(x_coords), max(x_coords)
y_min, y_max = min(y_coords), max(y_coords)
person_data["bbox"] = {
"x": int(x_min * width),
"y": int(y_min * height),
"width": int((x_max - x_min) * width),
"height": int((y_max - y_min) * height),
}
return person_data
def _extract_face_mesh(self, face_landmarks, width: int, height: int) -> Dict:
"""
Extract face mesh landmarks and calculate features
Args:
face_landmarks: MediaPipe face landmarks
width: Frame width in pixels
height: Frame height in pixels
Returns:
Dict with landmarks (in pixels), eye_features, mouth_features
"""
landmarks = []
for lm in face_landmarks.landmark:
landmarks.append([int(lm.x * width), int(lm.y * height), lm.z])
# Eye Aspect Ratio (EAR)
def calculate_ear(eye_indices):
# Get eye points
p1 = face_landmarks.landmark[eye_indices[0]]
p2 = face_landmarks.landmark[eye_indices[1]]
p3 = face_landmarks.landmark[eye_indices[2]]
p4 = face_landmarks.landmark[eye_indices[3]]
p5 = face_landmarks.landmark[eye_indices[4]]
p6 = face_landmarks.landmark[eye_indices[5]]
# Vertical distances
vertical_1 = np.linalg.norm([p3.x - p5.x, p3.y - p5.y])
vertical_2 = np.linalg.norm([p4.x - p6.x, p4.y - p6.y])
# Horizontal distance
horizontal = np.linalg.norm([p1.x - p2.x, p1.y - p2.y])
ear = (vertical_1 + vertical_2) / (2 * horizontal) if horizontal > 0 else 0
return ear
left_ear = calculate_ear(self.LEFT_EYE_INDICES)
right_ear = calculate_ear(self.RIGHT_EYE_INDICES)
avg_ear = (left_ear + right_ear) / 2
# Iris position (if refined landmarks enabled)
left_iris_x = None
right_iris_x = None
if len(face_landmarks.landmark) > 477:
left_iris = face_landmarks.landmark[self.LEFT_IRIS_CENTER]
right_iris = face_landmarks.landmark[self.RIGHT_IRIS_CENTER]
# Normalize iris position relative to eye
left_eye_center_x = (face_landmarks.landmark[33].x + face_landmarks.landmark[133].x) / 2
right_eye_center_x = (face_landmarks.landmark[362].x + face_landmarks.landmark[263].x) / 2
left_eye_width = abs(face_landmarks.landmark[33].x - face_landmarks.landmark[133].x)
right_eye_width = abs(face_landmarks.landmark[362].x - face_landmarks.landmark[263].x)
left_iris_x = (left_iris.x - left_eye_center_x) / left_eye_width if left_eye_width > 0 else 0
right_iris_x = (right_iris.x - right_eye_center_x) / right_eye_width if right_eye_width > 0 else 0
# Eye action detection
eye_action = "unknown"
if avg_ear < 0.15:
eye_action = "closed"
elif avg_ear > 0.4:
eye_action = "wide_open"
elif 0.15 <= avg_ear < 0.25:
eye_action = "squint"
else:
eye_action = "normal"
# Gaze direction
gaze_direction = "center"
if left_iris_x and right_iris_x:
avg_iris_x = (left_iris_x + right_iris_x) / 2
if avg_iris_x < -0.2:
gaze_direction = "left"
elif avg_iris_x > 0.2:
gaze_direction = "right"
# Mouth Aspect Ratio (MAR)
mouth_top = face_landmarks.landmark[self.MOUTH_TOP]
mouth_bottom = face_landmarks.landmark[self.MOUTH_BOTTOM]
mouth_left = face_landmarks.landmark[self.MOUTH_LEFT]
mouth_right = face_landmarks.landmark[self.MOUTH_RIGHT]
mouth_height = np.linalg.norm([mouth_top.x - mouth_bottom.x, mouth_top.y - mouth_bottom.y])
mouth_width = np.linalg.norm([mouth_left.x - mouth_right.x, mouth_left.y - mouth_right.y])
mar = mouth_height / mouth_width if mouth_width > 0 else 0
# Mouth corner distance (for smile detection)
mouth_center_y = (mouth_top.y + mouth_bottom.y) / 2
corner_lift = (mouth_center_y - mouth_left.y) + (mouth_center_y - mouth_right.y)
# Mouth action detection
mouth_action = "unknown"
if mar > 0.7:
mouth_action = "yawn"
elif mar > 0.5:
mouth_action = "open"
elif mar < 0.2:
if corner_lift > 0.02:
mouth_action = "smile"
else:
mouth_action = "closed"
else:
mouth_action = "slightly_open"
return {
"landmarks": landmarks,
"num_landmarks": len(landmarks),
"eye_features": {
"left_ear": round(left_ear, 4),
"right_ear": round(right_ear, 4),
"avg_ear": round(avg_ear, 4),
"left_iris_x": round(left_iris_x, 4) if left_iris_x else None,
"right_iris_x": round(right_iris_x, 4) if right_iris_x else None,
"eye_action": eye_action,
"gaze_direction": gaze_direction,
},
"mouth_features": {
"mar": round(mar, 4),
"mouth_height": round(mouth_height, 4),
"mouth_width": round(mouth_width, 4),
"corner_lift": round(corner_lift, 4),
"mouth_action": mouth_action,
},
}
def _extract_pose(self, pose_landmarks, width: int, height: int) -> Dict:
"""
Extract pose landmarks and calculate features
Args:
pose_landmarks: MediaPipe pose landmarks
width: Frame width in pixels
height: Frame height in pixels
Returns:
Dict with landmarks (in pixels), arm_features, leg_features
"""
landmarks = []
for lm in pose_landmarks.landmark:
landmarks.append([int(lm.x * width), int(lm.y * height), lm.z, lm.visibility])
# Helper function to calculate angle
def calculate_angle(p1_idx, p2_idx, p3_idx):
p1 = pose_landmarks.landmark[p1_idx]
p2 = pose_landmarks.landmark[p2_idx]
p3 = pose_landmarks.landmark[p3_idx]
v1 = np.array([p1.x, p1.y]) - np.array([p2.x, p2.y])
v2 = np.array([p3.x, p3.y]) - np.array([p2.x, p2.y])
angle = np.arccos(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))
return np.degrees(angle)
# Arm features
left_elbow_angle = calculate_angle(11, 13, 15) # shoulder-elbow-wrist
right_elbow_angle = calculate_angle(12, 14, 16)
# Check if arms raised
left_wrist = pose_landmarks.landmark[15]
left_elbow = pose_landmarks.landmark[13]
left_shoulder = pose_landmarks.landmark[11]
right_wrist = pose_landmarks.landmark[16]
right_elbow = pose_landmarks.landmark[14]
right_shoulder = pose_landmarks.landmark[12]
left_arm_raised = left_wrist.y < left_elbow.y < left_shoulder.y
right_arm_raised = right_wrist.y < right_elbow.y < right_shoulder.y
# Arm action detection
left_arm_action = "unknown"
if left_arm_raised:
left_arm_action = "raise_left"
elif left_elbow_angle > 150:
left_arm_action = "extend_left"
elif left_elbow_angle < 90:
left_arm_action = "fold_left"
else:
left_arm_action = "neutral_left"
right_arm_action = "unknown"
if right_arm_raised:
right_arm_action = "raise_right"
elif right_elbow_angle > 150:
right_arm_action = "extend_right"
elif right_elbow_angle < 90:
right_arm_action = "fold_right"
else:
right_arm_action = "neutral_right"
# Cross arms detection
cross_arms = False
if left_wrist.x > right_wrist.x and right_wrist.x < left_shoulder.x:
cross_arms = True
# Leg features
left_knee_angle = calculate_angle(23, 25, 27) # hip-knee-ankle
right_knee_angle = calculate_angle(24, 26, 28)
# Check standing/sitting
left_hip = pose_landmarks.landmark[23]
left_knee = pose_landmarks.landmark[25]
left_ankle = pose_landmarks.landmark[27]
right_hip = pose_landmarks.landmark[24]
right_knee = pose_landmarks.landmark[26]
right_ankle = pose_landmarks.landmark[28]
hip_avg_y = (left_hip.y + right_hip.y) / 2
knee_avg_y = (left_knee.y + right_knee.y) / 2
# Standing: hip < knee < ankle (y increases downward)
standing = left_hip.y < left_knee.y < left_ankle.y and right_hip.y < right_knee.y < right_ankle.y
# Sitting: hip ≈ knee height
sitting = abs(hip_avg_y - knee_avg_y) < 0.1
# Leg action detection
leg_action = "unknown"
if sitting:
leg_action = "sit"
elif standing:
if left_knee_angle < 120 or right_knee_angle < 120:
leg_action = "knee_bend"
else:
leg_action = "stand"
return {
"landmarks": landmarks,
"num_landmarks": len(landmarks),
"arm_features": {
"left_elbow_angle": round(left_elbow_angle, 2),
"right_elbow_angle": round(right_elbow_angle, 2),
"left_arm_raised": left_arm_raised,
"right_arm_raised": right_arm_raised,
"left_arm_action": left_arm_action,
"right_arm_action": right_arm_action,
"cross_arms": cross_arms,
},
"leg_features": {
"left_knee_angle": round(left_knee_angle, 2),
"right_knee_angle": round(right_knee_angle, 2),
"standing": standing,
"sitting": sitting,
"leg_action": leg_action,
},
}
def _extract_hand(self, hand_landmarks, hand_type: str, width: int, height: int) -> Dict:
"""
Extract hand landmarks and detect gesture
Args:
hand_landmarks: MediaPipe hand landmarks
hand_type: "left" or "right"
width: Frame width in pixels
height: Frame height in pixels
Returns:
Dict with landmarks (in pixels), gesture
"""
landmarks = []
for lm in hand_landmarks.landmark:
landmarks.append([int(lm.x * width), int(lm.y * height), lm.z])
# Check finger extensions
def is_finger_extended(tip_idx, pip_idx):
tip = hand_landmarks.landmark[tip_idx]
pip = hand_landmarks.landmark[pip_idx]
# Finger is extended if tip is higher (lower y) than pip
return tip.y < pip.y
thumb_extended = is_finger_extended(4, 3)
index_extended = is_finger_extended(8, 6)
middle_extended = is_finger_extended(12, 10)
ring_extended = is_finger_extended(16, 14)
pinky_extended = is_finger_extended(20, 18)
extensions = {
"thumb": thumb_extended,
"index": index_extended,
"middle": middle_extended,
"ring": ring_extended,
"pinky": pinky_extended,
}
# Gesture detection
gesture = "unknown"
num_extended = sum(extensions.values())
if num_extended == 5:
gesture = "open_hand"
elif num_extended == 0:
gesture = "fist"
elif thumb_extended and num_extended == 1:
gesture = "thumbs_up"
elif index_extended and middle_extended and num_extended == 2:
gesture = "peace_sign"
elif index_extended and num_extended == 1:
gesture = "pointing"
elif thumb_extended and index_extended and not any([middle_extended, ring_extended, pinky_extended]):
# Check thumb-index distance for OK gesture
thumb_tip = hand_landmarks.landmark[4]
index_tip = hand_landmarks.landmark[8]
distance = np.linalg.norm([thumb_tip.x - index_tip.x, thumb_tip.y - index_tip.y])
if distance < 0.05:
gesture = "ok_sign"
else:
gesture = "grab"
return {
"landmarks": landmarks,
"num_landmarks": len(landmarks),
"finger_extensions": extensions,
"num_fingers_extended": num_extended,
"gesture": gesture,
"hand_type": hand_type,
}
def process_video(
self,
video_path: str,
output_path: str,
sample_interval: int = 1,
uuid: str = "",
) -> Dict:
"""
Process entire video
Args:
video_path: Path to video file
output_path: Path to output JSON
sample_interval: Process every N frames
uuid: UUID for progress reporting
Returns:
Dict with all processed data
"""
cap = cv2.VideoCapture(video_path)
if not cap.isOpened():
print(f"MEDIAPIPE_ERROR:Cannot open video: {video_path}", file=sys.stderr)
return {}
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
print(f"MEDIAPIPE_START", file=sys.stderr)
print(f"MEDIAPIPE_INFO:FPS={fps},total={total_frames},interval={sample_interval}", file=sys.stderr)
output_data = {
"metadata": {
"video_path": video_path,
"fps": fps,
"width": width,
"height": height,
"total_frames": total_frames,
"sample_interval": sample_interval,
"processor": "mediapipe_holistic",
"model_complexity": 1,
"refine_face_landmarks": True,
},
"frames": {},
}
frame_count = 0
processed_count = 0
while True:
ret, frame = cap.read()
if not ret:
break
frame_count += 1
if frame_count % sample_interval != 0:
continue
# Process frame
person_data = self.process_frame(frame)
# Only save if landmarks detected
if person_data["face_mesh"] or person_data["pose"] or person_data["hands"]["left"] or person_data["hands"]["right"]:
timestamp = frame_count / fps if fps > 0 else 0
output_data["frames"][str(frame_count)] = {
"frame_number": frame_count,
"timestamp": round(timestamp, 3),
"persons": [person_data],
}
processed_count += 1
if processed_count % 100 == 0:
print(f"MEDIAPIPE_FRAME:{processed_count}", file=sys.stderr)
cap.release()
# Update metadata
output_data["metadata"]["processed_frames"] = processed_count
# Save output
with open(output_path, "w") as f:
json.dump(output_data, f, indent=2)
print(f"MEDIAPIPE_COMPLETE:{processed_count}", file=sys.stderr)
return output_data
def close(self):
"""Close MediaPipe model"""
self.holistic.close()
def main():
parser = argparse.ArgumentParser(description="MediaPipe Holistic Processor")
parser.add_argument("video_path", nargs="?", help="Path to video file (positional)")
parser.add_argument("output_path", nargs="?", help="Path to output JSON (positional)")
parser.add_argument("--video", help="Path to video file")
parser.add_argument("--output", help="Path to output JSON")
parser.add_argument("--sample-interval", type=int, default=1, help="Process every N frames")
parser.add_argument("--model-complexity", type=int, default=1, choices=[0, 1, 2], help="Model complexity")
parser.add_argument("--test-frame", type=int, help="Test single frame only")
parser.add_argument("--uuid", default="", help="UUID for progress reporting")
args = parser.parse_args()
# Resolve positional vs flagged args
video_path = args.video or args.video_path
output_path = args.output or args.output_path
if not video_path or not output_path:
parser.error("video_path and output_path are required")
print("=" * 70)
print("MediaPipe Holistic Processor")
print("=" * 70)
processor = MediaPipeHolisticProcessor(
model_complexity=args.model_complexity,
refine_face_landmarks=True,
)
if args.test_frame:
# Test single frame
print(f"\nTesting frame {args.test_frame}...")
cap = cv2.VideoCapture(video_path)
cap.set(cv2.CAP_PROP_POS_FRAMES, args.test_frame - 1)
ret, frame = cap.read()
cap.release()
if ret:
person_data = processor.process_frame(frame)
print("\n=== Results ===")
if person_data["face_mesh"]:
face = person_data["face_mesh"]
print(f"\nFace Mesh: {face['num_landmarks']} landmarks")
print(f" Eye: {face['eye_features']['eye_action']} (EAR: {face['eye_features']['avg_ear']})")
print(f" Gaze: {face['eye_features']['gaze_direction']}")
print(f" Mouth: {face['mouth_features']['mouth_action']} (MAR: {face['mouth_features']['mar']})")
if person_data["pose"]:
pose = person_data["pose"]
print(f"\nPose: {pose['num_landmarks']} keypoints")
print(f" Left arm: {pose['arm_features']['left_arm_action']} (angle: {pose['arm_features']['left_elbow_angle']}°)")
print(f" Right arm: {pose['arm_features']['right_arm_action']} (angle: {pose['arm_features']['right_elbow_angle']}°)")
print(f" Cross arms: {pose['arm_features']['cross_arms']}")
print(f" Leg: {pose['leg_features']['leg_action']}")
if person_data["hands"]["left"]:
hand = person_data["hands"]["left"]
print(f"\nLeft hand: {hand['num_landmarks']} keypoints")
print(f" Gesture: {hand['gesture']}")
print(f" Fingers extended: {hand['num_fingers_extended']}")
if person_data["hands"]["right"]:
hand = person_data["hands"]["right"]
print(f"\nRight hand: {hand['num_landmarks']} keypoints")
print(f" Gesture: {hand['gesture']}")
print(f" Fingers extended: {hand['num_fingers_extended']}")
else:
print("❌ Cannot read frame")
else:
# Process entire video
processor.process_video(
video_path,
output_path,
args.sample_interval,
uuid=args.uuid,
)
processor.close()
if __name__ == "__main__":
main()

View File

@@ -1 +0,0 @@
../v1.1/scripts/mediapipe_holistic_processor_v1.11.py

View File

@@ -1 +0,0 @@
../v1.1/scripts/mediapipe_processor_v1.11.py

View File

@@ -1,381 +0,0 @@
#!/opt/homebrew/bin/python3.11
"""
Story Processor V2.0 — Dual Pipeline: Story-based + LLM-based Parent-Child Summarization
Pipeline 1 (Story): Template-based, instant, no LLM cost
→ Parent story summary + Child story summary
→ Embedding (Ollama nomic-embed) → pgvector
→ BM25 (PostgreSQL tsvector) → full-text search
Pipeline 2 (LLM): LLM-based summarization (Gemma4/Qwen when resources allow)
→ Parent LLM summary + Child LLM summary
→ Embedding → pgvector + BM25
Both pipelines store into chunks table with distinct chunk_types:
story_parent, story_child, llm_parent, llm_child
Usage:
python parent_chunk_5w1h.py --file-uuid <uuid> --mode story [--embed]
python parent_chunk_5w1h.py --file-uuid <uuid> --mode llm [--embed]
"""
import json, os, sys, argparse, time, requests, psycopg2
from collections import defaultdict
from typing import Dict, List, Optional
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
DB_URL = os.getenv("DATABASE_URL", "postgresql://accusys@localhost:5432/momentry")
SCHEMA = os.getenv("DATABASE_SCHEMA", "dev")
OUTPUT_DIR = os.getenv("MOMENTRY_OUTPUT_DIR", "/Users/accusys/momentry/output_dev")
EMBEDDING_URL = os.getenv("EMBEDDING_URL", "http://localhost:11436/v1/embeddings")
def load_speaker_map(file_uuid: str) -> dict:
"""Load speaker→identity mapping from DB (generalized, not hardcoded)"""
try:
conn = psycopg2.connect(DB_URL)
cur = conn.cursor()
cur.execute("SET search_path TO %s, public", (SCHEMA,))
cur.execute(
"SELECT metadata->>'speaker_id', name FROM identities "
"WHERE metadata->>'speaker_id' IS NOT NULL"
)
spk_map = {}
for spk_id, name in cur.fetchall():
spk_map[spk_id] = (name, 0.85) # default confidence from MAR
cur.close(); conn.close()
return spk_map if spk_map else DEFAULT_SPEAKER_MAP
except Exception:
return DEFAULT_SPEAKER_MAP
# Default fallback (used when DB has no speaker mapping)
DEFAULT_SPEAKER_MAP = {}
CURRENT_VERSIONS = {
"asr": "faster-whisper/small/v1",
"asrx": "speechbrain/ecapa-tdnn/v1",
"cut": "pyscenedetect/default",
"yolo": "yolov5-coreml/v2",
"face_detection": "apple-vision/v2",
"face_embedding": "coreml-facenet/v2",
"speaker_binding": "mar-lip/v1",
"identity_clustering": "cosine-threshold/v1",
"story_agent": "template/v2.0",
"embedding_agent": "nomic-embed-768d/v1",
}
LLM_URL = os.getenv("MOMENTRY_LLM_URL", os.getenv("MOMENTRY_LLM_SUMMARY_URL", "http://127.0.0.1:8082/v1/chat/completions"))
LLM_MODEL = os.getenv("MOMENTRY_LLM_SUMMARY_MODEL", "gemma4")
def load_data(file_uuid: str) -> dict:
data = {}
for name in ["asr", "asrx", "cut"]:
path = os.path.join(OUTPUT_DIR, f"{file_uuid}.{name}.json")
data[name] = json.load(open(path)) if os.path.exists(path) else None
return data
def build_child_chunks(data: dict, file_uuid: str) -> List[dict]:
"""Group ASR sentences by CUT scene boundaries → parent/child structure."""
asr_segs = data["asr"].get("segments", []) if data["asr"] else []
asrx_segs = data["asrx"].get("segments", []) if data["asrx"] else []
cut_scenes = data["cut"].get("scenes", []) if data["cut"] else []
# Dynamically load speaker→identity mapping from DB
speaker_map = load_speaker_map(file_uuid)
if not cut_scenes:
max_t = max(
(asr_segs[-1].get("end", 0) if asr_segs else 0),
(asrx_segs[-1].get("end_time", 0) if asrx_segs else 0),
)
cut_scenes = [{"start_time": t, "end_time": min(t + 60, max_t)} for t in range(0, int(max_t) + 60, 60)]
scenes = []
for cs in cut_scenes:
s, e = cs["start_time"], cs["end_time"]
children = []
for seg_idx, seg in enumerate(asr_segs):
st, en = seg.get("start", 0), seg.get("end", 0)
text = seg.get("text", "").strip()
if st < s or en > e or not text: continue
spk_id = "unknown"
for ax in asrx_segs:
if ax["start_time"] <= st and ax["end_time"] >= en:
spk_id = ax.get("speaker_id", "unknown"); break
spk_info = speaker_map.get(spk_id)
if spk_info:
character, spk_conf = spk_info
else:
character, spk_conf = spk_id, 0.0
children.append({
"start": st, "end": en, "text": text,
"speaker_id": spk_id, "speaker_name": character,
"speaker_confidence": spk_conf,
"chunk_id": f"{file_uuid}_{seg_idx}",
})
# Boundary overlap: even empty scenes get partial children
for seg_idx, seg in enumerate(asr_segs):
st, en = seg.get("start", 0), seg.get("end", 0)
text = seg.get("text", "").strip()
if not text: continue
if st >= s and en <= e: continue
if not (st < e and en > s): continue
spk_id = "unknown"
for ax in asrx_segs:
if ax["start_time"] <= st and ax["end_time"] >= en:
spk_id = ax.get("speaker_id", "unknown"); break
spk_info = speaker_map.get(spk_id)
if spk_info:
character, spk_conf = spk_info
else:
character, spk_conf = spk_id, 0.0
children.append({
"start": st, "end": en, "text": text,
"speaker_id": spk_id, "speaker_name": character,
"speaker_confidence": spk_conf,
"chunk_id": f"{file_uuid}_{seg_idx}",
"overlap_type": "partial",
})
if children:
scenes.append({
"start_time": s, "end_time": e, "duration": e - s,
"children": children, "child_count": len(children),
})
return scenes
# ===== Pipeline 1: Story (Template) Summaries =====
def generate_story_parent_summary(scene: dict) -> str:
children = scene["children"]
characters = sorted(set(c["speaker_name"] for c in children))
total_words = sum(len(c["text"].split()) for c in children)
by_speaker = defaultdict(list)
for c in children: by_speaker[c["speaker_name"]].append(c["text"])
speakers = []
for char, texts in sorted(by_speaker.items()):
speakers.append(f"{char} ({len(texts)} lines)")
return (
f"[{scene['start_time']:.0f}s-{scene['end_time']:.0f}s, {scene['duration']:.0f}s] "
f"Cast: {', '.join(characters)}. Total: {len(children)} lines, {total_words} words. "
f"Speakers: {' | '.join(speakers[:3])}"
)
def generate_story_child_summary(child: dict, parent_summary: str) -> str:
return (
f"[{child['start']:.0f}s-{child['end']:.0f}s] "
f"{child['speaker_name']}: \"{child['text']}\""
)
# ===== Pipeline 2: LLM Summaries (requires LLM server) =====
def generate_llm_parent_summary(scene: dict, max_scenes_processed: int) -> Optional[str]:
"""LLM-based parent summary"""
if not LLM_URL: return None
children = scene["children"]
dialogue = "\n".join(
f"[{c['start']:.0f}s] {c['speaker_name']}: {c['text'][:150]}"
for c in children[:15]
)
prompt = (
"You are a film analyst. Summarize this scene in one flowing paragraph (60-100 words). "
"Include: who is present, what they discuss, tone/mood.\n\n"
f"Scene: {scene['start_time']:.0f}s - {scene['end_time']:.0f}s\n"
f"Dialogue:\n{dialogue}\n\nSummary:"
)
try:
resp = requests.post(LLM_URL, json={
"model": LLM_MODEL,
"messages": [{"role": "user", "content": prompt}],
"max_tokens": 200, "temperature": 0.3,
}, timeout=60)
return resp.json()["choices"][0]["message"]["content"].strip()
except Exception as e:
print(f" ⚠️ LLM parent summary failed: {e}")
return None
def generate_llm_child_summary(child: dict, parent_summary: str) -> Optional[str]:
"""LLM-based child (sentence) summary"""
return f"[{child['start']:.0f}s-{child['end']:.0f}s] {child['speaker_name']}: \"{child['text']}\""
# ===== Embedding (Ollama nomic-embed) =====
def embed_text(text: str, max_retries: int = 3) -> Optional[List[float]]:
"""Get embedding via EmbeddingGemma server"""
for attempt in range(max_retries):
try:
resp = requests.post(EMBEDDING_URL, json={
"input": [text],
}, timeout=30)
if resp.status_code == 200:
data = resp.json()
items = data.get("data", [])
if items:
return items[0]["embedding"]
except Exception as e:
if attempt == max_retries - 1:
print(f" ⚠️ Embedding failed: {e}")
return None
time.sleep(1)
return None
# ===== DB Store (chunks table with embedding + BM25) =====
def store_chunks(file_uuid: str, scenes: List[dict], mode: str, do_embed: bool, conn):
"""Store parent + child summaries into chunks table."""
cur = conn.cursor()
parent_type = f"{mode}_parent"
child_type = f"{mode}_child"
parent_count = 0
child_count = 0
# Get base chunk_index
cur.execute(
f"SELECT COALESCE(MAX(chunk_index), 0) FROM {SCHEMA}.chunk WHERE file_uuid = %s",
(file_uuid,),
)
next_index = (cur.fetchone()[0] or 0) + 1
for scene in scenes:
parent_text = generate_story_parent_summary(scene) if mode == "story" else generate_llm_parent_summary(scene, parent_count)
if not parent_text: continue
parent_id = f"{mode}_parent_{file_uuid}_{scene['start_time']:.0f}_{scene['end_time']:.0f}"
parent_embedding = embed_text(parent_text) if do_embed else None
if do_embed and parent_embedding:
cur.execute(
f"""
INSERT INTO {SCHEMA}.chunk (chunk_id, old_chunk_id, file_uuid, chunk_type, chunk_index,
start_time, end_time, content, text_content, parent_chunk_id, embedding)
VALUES (%s, %s, %s, %s, %s, %s, %s, %s::jsonb, %s, %s, %s::vector)
ON CONFLICT (file_uuid, old_chunk_id) DO UPDATE
SET content = EXCLUDED.content, text_content = EXCLUDED.text_content,
embedding = EXCLUDED.embedding
""",
(parent_id, parent_id, file_uuid, parent_type, next_index,
scene["start_time"], scene["end_time"],
json.dumps({"summary": parent_text, "mode": mode, "type": "parent",
"source_versions": CURRENT_VERSIONS}),
parent_text, None, parent_embedding),
)
else:
cur.execute(
f"""
INSERT INTO {SCHEMA}.chunk (chunk_id, old_chunk_id, file_uuid, chunk_type, chunk_index,
start_time, end_time, content, text_content, parent_chunk_id)
VALUES (%s, %s, %s, %s, %s, %s, %s, %s::jsonb, %s, %s)
ON CONFLICT (file_uuid, old_chunk_id) DO UPDATE
SET content = EXCLUDED.content, text_content = EXCLUDED.text_content
""",
(parent_id, parent_id, file_uuid, parent_type, next_index,
scene["start_time"], scene["end_time"],
json.dumps({"summary": parent_text, "mode": mode, "type": "parent",
"source_versions": CURRENT_VERSIONS}),
parent_text, None),
)
next_index += 1
parent_count += 1
for child in scene["children"]:
child_id = child["chunk_id"]
child_text = generate_story_child_summary(child, parent_text) if mode == "story" else generate_llm_child_summary(child, parent_text)
child_embedding = embed_text(child_text) if do_embed else None
if do_embed and child_embedding:
cur.execute(
f"""
INSERT INTO {SCHEMA}.chunk (chunk_id, old_chunk_id, file_uuid, chunk_type, chunk_index,
start_time, end_time, content, text_content, parent_chunk_id, embedding)
VALUES (%s, %s, %s, %s, %s, %s, %s, %s::jsonb, %s, %s, %s::vector)
ON CONFLICT (file_uuid, old_chunk_id) DO UPDATE
SET content = EXCLUDED.content, text_content = EXCLUDED.text_content,
parent_chunk_id = EXCLUDED.parent_chunk_id,
embedding = EXCLUDED.embedding
""",
(child_id, child_id, file_uuid, child_type, next_index,
child["start"], child["end"],
json.dumps({"speaker": child["speaker_name"], "text": child["text"], "mode": mode,
"speaker_confidence": child.get("speaker_confidence", 0),
"source_versions": CURRENT_VERSIONS}),
child_text, parent_id, child_embedding),
)
else:
cur.execute(
f"""
INSERT INTO {SCHEMA}.chunk (chunk_id, old_chunk_id, file_uuid, chunk_type, chunk_index,
start_time, end_time, content, text_content, parent_chunk_id)
VALUES (%s, %s, %s, %s, %s, %s, %s, %s::jsonb, %s, %s)
ON CONFLICT (file_uuid, old_chunk_id) DO UPDATE
SET content = EXCLUDED.content, text_content = EXCLUDED.text_content,
parent_chunk_id = EXCLUDED.parent_chunk_id
""",
(child_id, child_id, file_uuid, child_type, next_index,
child["start"], child["end"],
json.dumps({"speaker": child["speaker_name"], "text": child["text"], "mode": mode,
"speaker_confidence": child.get("speaker_confidence", 0),
"source_versions": CURRENT_VERSIONS}),
child_text, parent_id),
)
next_index += 1
child_count += 1
conn.commit()
cur.close()
return parent_count, child_count
def main():
parser = argparse.ArgumentParser(description="Story Processor V2.0")
parser.add_argument("--file-uuid", required=True)
parser.add_argument("--mode", choices=["story", "llm"], default="story")
parser.add_argument("--max-scenes", type=int, default=99999)
parser.add_argument("--embed", action="store_true", help="Generate embeddings (Ollama)")
parser.add_argument("--no-db", action="store_true", help="Skip DB storage")
args = parser.parse_args()
file_uuid = args.file_uuid
print(f"[STORY] Mode: {args.mode}, Embed: {args.embed}")
data = load_data(file_uuid)
if not data["asr"]:
print("[STORY] ❌ No ASR data"); return
scenes = build_child_chunks(data, file_uuid)[:args.max_scenes]
total_children = sum(s["child_count"] for s in scenes)
print(f"[STORY] {len(scenes)} scenes, {total_children} child chunks")
if not args.no_db:
conn = psycopg2.connect(DB_URL)
try:
pc, cc = store_chunks(file_uuid, scenes, args.mode, args.embed, conn)
print(f"[STORY] DB: {pc} parent, {cc} child chunks ({args.mode})")
finally:
conn.close()
# Save JSON output
out_path = os.path.join(OUTPUT_DIR, f"{file_uuid}.story_{args.mode}.json")
out_data = {"file_uuid": file_uuid, "mode": args.mode, "scenes": scenes}
with open(out_path, "w") as f:
json.dump(out_data, f, indent=2, ensure_ascii=False, default=str)
print(f"[STORY] ✅ {out_path}")
if __name__ == "__main__":
main()

View File

@@ -1 +0,0 @@
../v1.1/scripts/parent_chunk_5w1h_v1.11.py

View File

@@ -1,320 +0,0 @@
#!/opt/homebrew/bin/python3.11
"""
Rebuild story chunk text_content and regenerates summaries using new ASRX speaker assignments.
Then updates Qdrant momentry_dev_stories and sentence_story/sentence_summary collections.
"""
import json, sys, time, urllib.request
from urllib.request import Request, urlopen
import psycopg2
UUID = "aeed71342a899fe4b4c57b7d41bcb692"
DB_URL = "postgresql://accusys@localhost:5432/momentry?host=/tmp"
QDRANT_URL = "http://localhost:6333"
LLM_URL = "http://localhost:8082/v1/chat/completions"
EMBED_URL = "http://localhost:11436/v1/embeddings"
def call_llm(dialogue_text):
prompt = f"Dialogue:\n{dialogue_text}\n\n50-word summary:"
body = json.dumps({"model": "google_gemma-4-26B-A4B-it-Q5_K_M.gguf",
"messages": [{"role": "user", "content": prompt}],
"temperature": 0.1, "max_tokens": 100}).encode()
req = Request(LLM_URL, data=body, headers={"Content-Type": "application/json"})
try:
resp = urlopen(req, timeout=120)
return json.loads(resp.read())["choices"][0]["message"]["content"].strip()
except Exception as e:
print(f" LLM error: {e}")
return ""
def call_embed(text):
body = json.dumps({"input": text}).encode()
req = Request(EMBED_URL, data=body, headers={"Content-Type": "application/json"})
try:
resp = urlopen(req, timeout=30)
return json.loads(resp.read())["data"][0]["embedding"]
except Exception as e:
print(f" Embed error: {e}")
return [0.0] * 768
print("=== Step 1: Load sentence chunks with new speaker info ===")
conn = psycopg2.connect(DB_URL)
cur = conn.cursor()
cur.execute("""
SELECT chunk_index, text_content, metadata->>'new_speaker_name',
metadata->>'speaker_name', content
FROM dev.chunks
WHERE file_uuid = %s AND chunk_type = 'sentence'
ORDER BY chunk_index
""", (UUID,))
sentence_rows = cur.fetchall()
print(f"Loaded {len(sentence_rows)} sentence chunks")
# Build lookup
sentences = {}
for r in sentence_rows:
idx, old_text, new_name, old_name, content = r
sentences[idx] = {
"old_text": old_text or "",
"new_name": new_name or old_name or "Unknown",
"old_name": old_name or "Unknown",
"content": content or {},
}
# Rebuild sentence text_content with new speaker names
print("\n=== Step 2: Rebuild sentence text_content ===")
updated_sentences = 0
for r in sentence_rows:
idx, old_text, new_name, old_name, content = r
new_name = new_name or old_name or "Unknown"
# Extract the text part (remove old speaker prefix if exists)
raw_text = ""
if content and isinstance(content, dict):
raw_text = content.get("data", {}).get("text", "")
if not raw_text and old_text:
# Parse old format: [Speaker] text
import re
m = re.search(r'\]\s*(.*)', old_text)
if m:
raw_text = m.group(1)
else:
raw_text = old_text
new_text = f"[{new_name}] {raw_text}"
cur.execute("""
UPDATE dev.chunks
SET text_content = %s, updated_at = NOW()
WHERE file_uuid = %s AND chunk_type = 'sentence' AND chunk_index = %s
""", (new_text, UUID, idx))
updated_sentences += 1
conn.commit()
print(f"Updated {updated_sentences} sentence chunks text_content")
print("\n=== Step 3: Rebuild story chunk text_content ===")
cur.execute("""
SELECT id, chunk_id, chunk_index, child_chunk_ids, start_time, end_time,
text_content, summary_text
FROM dev.chunks
WHERE file_uuid = %s AND chunk_type = 'story'
ORDER BY chunk_index
""", (UUID,))
story_rows = cur.fetchall()
print(f"Loaded {len(story_rows)} story chunks")
# Build child text per story chunk
story_dialogue_texts = []
for r in story_rows:
db_id, cid, idx, child_ids, st, et, old_text, old_summary = r
dialogue_parts = []
for child_cid in (child_ids or []):
parts = child_cid.split("_")
child_idx = int(parts[-1])
if child_idx in sentences:
s = sentences[child_idx]
raw = ""
if s["content"] and isinstance(s["content"], dict):
raw = s["content"].get("data", {}).get("text", "")
if not raw:
import re
m = re.search(r'\]\s*(.*)', s["old_text"])
if m:
raw = m.group(1)
else:
raw = s["old_text"]
if raw:
dialogue_parts.append(f'({s["new_name"]}) {raw}')
dialogue_text = " ".join(dialogue_parts)
story_dialogue_texts.append((db_id, cid, idx, st, et, dialogue_text, old_summary))
print(f"Built {len(story_dialogue_texts)} story dialogue texts")
# Update DB with new text_content (dialogue only, not summary yet)
for item in story_dialogue_texts:
db_id, cid, idx, st, et, dialogue_text, old_summary = item
cur.execute("""
UPDATE dev.chunks
SET text_content = %s, updated_at = NOW()
WHERE id = %s
""", (dialogue_text, db_id))
conn.commit()
print("Updated story chunk dialogue texts")
print("\n=== Step 4: Generate LLM summaries (all 228 stories) ===")
summaries = []
for i, item in enumerate(story_dialogue_texts):
db_id, cid, idx, st, et, dialogue_text, old_summary = item
if len(dialogue_text) < 10:
summary = "[no dialogue]"
embedding = [0.0] * 768
else:
print(f" [{i+1}/{len(story_dialogue_texts)}] {cid}: {len(dialogue_text)} chars", end="")
try:
summary = call_llm(dialogue_text[:3000])
print(f" -> {len(summary)} chars")
time.sleep(0.3)
embedding = call_embed(summary)
except Exception as e:
print(f" ERROR: {e}")
summary = "[error]"
embedding = [0.0] * 768
# Update DB
s_esc = summary.replace("'", "''")
cur.execute(f"""
UPDATE dev.chunks
SET summary_text = '{s_esc}', updated_at = NOW()
WHERE id = {db_id}
""")
summaries.append({
"db_id": db_id,
"chunk_id": cid,
"chunk_index": idx,
"start_time": st,
"end_time": et,
"dialogue": dialogue_text,
"summary": summary,
"embedding": embedding,
})
conn.commit()
print(f"\nGenerated {len(summaries)} summaries")
print("\n=== Step 5: Rebuild Qdrant momentry_dev_stories ===")
# Delete existing
req = Request(f"{QDRANT_URL}/collections/momentry_dev_stories", method="DELETE")
try:
urlopen(req)
time.sleep(0.3)
except:
pass
# Recreate
req = Request(f"{QDRANT_URL}/collections/momentry_dev_stories",
data=json.dumps({"vectors": {"size": 768, "distance": "Cosine"}}).encode(),
headers={"Content-Type": "application/json"}, method="PUT")
urlopen(req)
time.sleep(0.3)
# Upload dialogue points (0..227) and summary points (228..455)
dialogue_points = []
summary_points = []
for s in summaries:
idx = s["chunk_index"]
dialogue_points.append({
"id": idx + 1,
"vector": [0.0] * 768,
"payload": {
"chunk_id": s["chunk_id"],
"file_uuid": UUID,
"start_time": s["start_time"],
"end_time": s["end_time"],
"type": "story_dialogue",
"text": s["dialogue"][:500],
}
})
summary_points.append({
"id": idx + 1 + 228,
"vector": s["embedding"],
"payload": {
"chunk_id": s["chunk_id"],
"file_uuid": UUID,
"start_time": s["start_time"],
"end_time": s["end_time"],
"type": "story_summary",
"summary": s["summary"],
}
})
all_story_points = dialogue_points + summary_points
batch_size = 100
for start in range(0, len(all_story_points), batch_size):
batch = all_story_points[start:start+batch_size]
req = Request(f"{QDRANT_URL}/collections/momentry_dev_stories/points?wait=true",
data=json.dumps({"points": batch}).encode(),
headers={"Content-Type": "application/json"}, method="PUT")
try:
urlopen(req)
except Exception as e:
print(f" Batch {start}: {e}")
if (start // batch_size) % 3 == 0:
print(f" Uploaded {start + len(batch)}/{len(all_story_points)}")
print(f"Uploaded {len(all_story_points)} points to momentry_dev_stories")
print("\n=== Step 6: Populate sentence_story and sentence_summary ===")
# These are the per-sentence template + summary collections
# sentence_story: 3417 points, 768D, template payloads
# sentence_summary: 3417 points, 768D, LLM summary payloads
for col_name in ["sentence_story", "sentence_summary"]:
req = Request(f"{QDRANT_URL}/collections/{col_name}", method="DELETE")
try:
urlopen(req)
time.sleep(0.2)
except:
pass
req = Request(f"{QDRANT_URL}/collections/{col_name}",
data=json.dumps({"vectors": {"size": 768, "distance": "Cosine"}}).encode(),
headers={"Content-Type": "application/json"}, method="PUT")
urlopen(req)
time.sleep(0.2)
# Build points for sentence_story and sentence_summary
story_sentence_points = []
summary_sentence_points = []
for idx in sorted(sentences.keys()):
s = sentences[idx]
raw_text = ""
if s["content"] and isinstance(s["content"], dict):
raw_text = s["content"].get("data", {}).get("text", "")
dialog_line = f'({s["new_name"]}) {raw_text}'
story_sentence_points.append({
"id": idx + 1,
"vector": [0.0] * 768,
"payload": {
"chunk_id": f"{UUID}_{idx}",
"file_uuid": UUID,
"start_time": 0,
"end_time": 0,
"text": dialog_line,
"speaker_name": s["new_name"],
"chunk_type": "sentence",
}
})
# Upload sentence_story (dialogue template)
batch_size = 200
for start in range(0, len(story_sentence_points), batch_size):
batch = story_sentence_points[start:start+batch_size]
req = Request(f"{QDRANT_URL}/collections/sentence_story/points?wait=true",
data=json.dumps({"points": batch}).encode(),
headers={"Content-Type": "application/json"}, method="PUT")
try:
urlopen(req)
except Exception as e:
print(f" sentence_story batch {start}: {e}")
if (start // batch_size) % 5 == 0:
print(f" Uploaded {start + len(batch)}/3417 sentence_story")
print("Uploaded sentence_story points")
# sentence_summary will be populated when we generate per-sentence summaries
# For now, mark as TODO
print("sentence_summary: SKIPPED (needs per-sentence LLM summaries)")
cur.close()
conn.close()
print("\n=== Done ===")

View File

@@ -1 +0,0 @@
../v1.1/scripts/rebuild_story_content_v1.11.py

View File

@@ -1,197 +0,0 @@
#!/opt/homebrew/bin/python3.11
"""
Regenerate parent chunk summaries using 5W1H multi-dimensional structure via gemma4.
5W1H Structure:
- Who: Main characters/people involved
- What: Key actions/events
- When: Temporal context (sequence in story)
- Where: Location/setting
- Why: Motivation/conflict driving the scene
- How: Emotional tone/manner of events
"""
import json
import requests
import psycopg2
import psycopg2.extras
DB_CONFIG = {"host": "localhost", "user": "accusys", "dbname": "momentry"}
UUID = "384b0ff44aaaa1f1"
LLAMA_URL = "http://127.0.0.1:8081/v1/chat/completions"
def get_parent_with_children():
"""Get all parent chunks with their child chunk texts"""
conn = psycopg2.connect(**DB_CONFIG)
cur = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
cur.execute(
"""
SELECT pc.id, pc.scene_order, pc.start_time, pc.end_time,
pc.start_frame, pc.end_frame, pc.fps, pc.summary_text as old_summary,
pc.metadata,
ARRAY_AGG(c.text_content ORDER BY c.start_time) as child_texts
FROM parent_chunks pc
LEFT JOIN chunks c ON c.parent_chunk_id = pc.id::varchar
WHERE pc.uuid = %s
GROUP BY pc.id, pc.scene_order, pc.start_time, pc.end_time,
pc.start_frame, pc.end_frame, pc.fps, pc.summary_text, pc.metadata
ORDER BY pc.scene_order
""",
(UUID,),
)
parents = cur.fetchall()
cur.close()
conn.close()
return parents
def call_gemma4(prompt, max_tokens=1500):
"""Call Gemma4 via llama-server OpenAI-compatible API"""
payload = {
"messages": [{"role": "user", "content": prompt}],
"max_tokens": max_tokens,
"temperature": 0.3,
"min_p": 0.1,
}
try:
resp = requests.post(LLAMA_URL, json=payload, timeout=180)
if resp.status_code == 200:
result = resp.json()
content = (
result.get("choices", [{}])[0]
.get("message", {})
.get("content", "")
.strip()
)
return content
except Exception as e:
print(f" ⚠️ llama-server error: {e}")
return ""
def generate_5w1h_summary(parent, scene_num):
"""Generate 5W1H structured summary using gemma4"""
texts = [t for t in (parent["child_texts"] or []) if t]
if not texts:
return None
# Use only first 3 and last 3 dialogue lines for context (much faster)
sample_texts = texts[:3] + ["..."] + texts[-3:] if len(texts) > 6 else texts
combined = "\n".join(sample_texts)[:1500]
duration = parent["end_time"] - parent["start_time"]
prompt = f"""You are a film scene analyst. Analyze this scene and provide 5W1H analysis.
Scene {scene_num}/17 | {duration:.0f}s | {len(texts)} dialogue lines
Key dialogue:
{combined}
Respond with ONLY this JSON:
{{"summary_5lines":"...","who":"...","what":"...","when":"...","where":"...","why":"...","how":"...","characters":[],"tone":[],"key_events":[]}}
IMPORTANT: "summary_5lines" must be EXACTLY 5 lines describing the scene. Each line should be a complete sentence separated by \\n."""
response = call_gemma4(prompt, max_tokens=2000)
if not response:
return None
# Simple JSON extraction: find first { and last }
try:
start = response.find("{")
end = response.rfind("}") + 1
if start >= 0 and end > start:
return json.loads(response[start:end])
except Exception:
pass
return None
def update_parent_chunk(parent, analysis):
"""Update parent chunk with 5W1H structured data"""
if not analysis:
return False
conn = psycopg2.connect(**DB_CONFIG)
cur = conn.cursor()
# Create structured summary text (5 lines)
structured_text = f"{analysis.get('summary_5lines', '')}"
# Update metadata with full 5W1H structure
metadata = parent["metadata"] if parent["metadata"] else {}
metadata["auto_generated_by"] = "gemma4"
metadata["chunk_count"] = len(parent["child_texts"] or [])
metadata["structured_summary"] = {
"summary_5lines": analysis.get("summary_5lines", ""),
"who": analysis.get("who", ""),
"what": analysis.get("what", ""),
"when": analysis.get("when", ""),
"where": analysis.get("where", ""),
"why": analysis.get("why", ""),
"how": analysis.get("how", ""),
"characters": analysis.get("characters", []),
"tone": analysis.get("tone", []),
"key_events": analysis.get("key_events", []),
}
cur.execute(
"""
UPDATE parent_chunks
SET summary_text = %s,
metadata = %s::jsonb
WHERE id = %s
""",
(structured_text, json.dumps(metadata, ensure_ascii=False), parent["id"]),
)
conn.commit()
cur.close()
conn.close()
return True
def main():
print(f"🎬 Regenerating 5W1H summaries for {UUID}")
print(f" Using llama.cpp server at {LLAMA_URL}")
print("=" * 70)
parents = get_parent_with_children()
print(f"📥 Found {len(parents)} parent chunks")
success_count = 0
for i, parent in enumerate(parents):
duration = parent["end_time"] - parent["start_time"]
text_count = len(parent["child_texts"] or [])
print(
f"\n🎬 Scene {parent['scene_order']}: {parent['start_time']:.0f}s-{parent['end_time']:.0f}s ({duration:.0f}s, {text_count} chunks)"
)
if parent["old_summary"]:
print(f" Old: {parent['old_summary'][:80]}...")
analysis = generate_5w1h_summary(parent, parent["scene_order"])
if analysis:
summary = analysis.get("summary_5lines", "N/A")
print(f" ✅ Summary: {summary[:100]}...")
print(f" 👤 Who: {analysis.get('who', 'N/A')[:60]}")
print(f" 📍 Where: {analysis.get('where', 'N/A')[:60]}")
print(f" 💡 Why: {analysis.get('why', 'N/A')[:60]}")
if update_parent_chunk(parent, analysis):
success_count += 1
else:
print(" ❌ Failed to generate analysis")
print(f"\n{'=' * 70}")
print(
f"✅ Updated {success_count}/{len(parents)} parent chunks with 5W1H summaries"
)
if __name__ == "__main__":
main()

View File

@@ -1 +0,0 @@
../v1.1/scripts/regenerate_parent_5w1h_v1.11.py

View File

@@ -39,140 +39,8 @@ def get_conn():
def merge_traces_within_cuts(face_data: dict, cut_scenes: list) -> dict: def merge_traces_within_cuts(face_data: dict, cut_scenes: list) -> dict:
"""Merge traces within the same cut if they have similar embeddings (same person re-appeared).""" """Merge traces within the same cut - DISABLED (no embeddings)."""
frames = face_data.get("frames", {}) # TODO: Reimplement with Qdrant _faces collection
if not frames:
return face_data
# Map each frame to its scene/cut number
frame_to_scene = {}
for s in cut_scenes:
for f in range(s["start_frame"], s["end_frame"] + 1):
frame_to_scene[f] = s["scene_number"]
# Collect per-trace data: scene numbers, embeddings, face positions
trace_frames = defaultdict(list)
trace_embeddings = defaultdict(list)
trace_poses = {}
for fnum_str, frm_data in frames.items():
fnum = int(fnum_str)
for face in frm_data.get("faces", []):
tid = face.get("trace_id")
if tid is None:
continue
trace_frames[tid].append(fnum)
emb = face.get("embedding")
if emb is not None:
trace_embeddings[tid].append(emb)
if tid not in trace_poses:
trace_poses[tid] = (
face.get("x", 0),
face.get("y", 0),
face.get("width", 0),
face.get("height", 0),
)
if len(trace_embeddings) < 2:
return face_data
# Compute centroid per trace
trace_centroids = {}
for tid, embs in trace_embeddings.items():
centroid = np.mean(embs, axis=0)
norm = np.linalg.norm(centroid)
trace_centroids[tid] = centroid / norm if norm > 0 else centroid
# Determine which scene each trace belongs to (majority of frames)
trace_scene = {}
for tid, fns in trace_frames.items():
scene_votes = defaultdict(int)
for fn in fns:
scene = frame_to_scene.get(fn, -1)
scene_votes[scene] += 1
trace_scene[tid] = max(scene_votes, key=scene_votes.get) if scene_votes else -1
# Within each scene, merge traces with similar centroids
scene_traces = defaultdict(list)
for tid, scene in trace_scene.items():
if scene >= 0 and tid in trace_centroids:
scene_traces[scene].append(tid)
merged = 0
next_new_id = max(trace_frames.keys()) + 1 if trace_frames else 0
SIMILARITY_THRESHOLD = 0.75
for scene, tids in scene_traces.items():
if len(tids) < 2:
continue
used = set()
for i in range(len(tids)):
if tids[i] in used:
continue
keep_tid = tids[i]
for j in range(i + 1, len(tids)):
if tids[j] in used:
continue
sim = float(np.dot(trace_centroids[tids[i]], trace_centroids[tids[j]]))
if sim >= SIMILARITY_THRESHOLD:
# Merge tids[j] into keep_tid
for fnum_str, frm_data in frames.items():
for face in frm_data.get("faces", []):
if face.get("trace_id") == tids[j]:
face["trace_id"] = keep_tid
used.add(tids[j])
merged += 1
# If any merges happened, rebuild trace metadata
if merged > 0:
# Rebuild traces dict
new_traces = {}
new_trace_frames = defaultdict(list)
for fnum_str, frm_data in frames.items():
fnum = int(fnum_str)
for face in frm_data.get("faces", []):
tid = face.get("trace_id")
if tid is not None:
new_trace_frames[tid].append(
{
"frame": fnum,
"face_index": 0,
"bbox": {
"x": face.get("x", 0),
"y": face.get("y", 0),
"width": face.get("width", 0),
"height": face.get("height", 0),
},
"confidence": face.get("confidence", 0.0),
}
)
for tid, path in new_trace_frames.items():
if len(path) >= 1:
frames_sorted = sorted(set(p["frame"] for p in path))
new_traces[str(tid)] = {
"trace_id": tid,
"start_frame": frames_sorted[0],
"end_frame": frames_sorted[-1],
"duration_frames": frames_sorted[-1] - frames_sorted[0] + 1,
"duration_seconds": (frames_sorted[-1] - frames_sorted[0])
/ face_data.get("metadata", {}).get("fps", 25.0),
"total_appearances": len(path),
"path": path,
}
face_data["traces"] = new_traces
face_data["metadata"]["trace_stats"] = {
"total_traces": len(new_traces),
"active_traces": len(new_traces),
"long_traces": len(
[t for t in new_traces.values() if t["duration_frames"] >= 2]
),
}
print(
f"[TRACE] Post-merge: {merged} traces merged, {len(new_traces)} total traces"
)
return face_data return face_data
@@ -235,57 +103,12 @@ def run_face_tracker(
print(f"[TRACE] Processing {len(face_data.get('frames', {}))} frames") print(f"[TRACE] Processing {len(face_data.get('frames', {}))} frames")
# Load embeddings from DB for the face tracker # Embeddings no longer loaded from DB - use IoU-only tracking
file_uuid = ( file_uuid = (
face_json_path.split("/")[-1] face_json_path.split("/")[-1]
.replace(".face.json", "") .replace(".face.json", "")
.replace("_traced.json", "") .replace("_traced.json", "")
) )
try:
conn = get_conn()
cur = conn.cursor()
cur.execute(
f"""
SELECT frame_number, x, y, width, height, embedding
FROM {SCHEMA}.face_detections
WHERE file_uuid = %s AND embedding IS NOT NULL
""",
(file_uuid,),
)
emb_rows = cur.fetchall()
conn.close()
# Build lookup: frame_number → list of (bbox, embedding)
emb_map = {}
for fn, x, y, w, h, emb in emb_rows:
emb_map.setdefault(fn, []).append(((x, y, w, h), emb))
print(f"[TRACE] Loaded {len(emb_rows)} embeddings from DB")
# Attach embeddings to face data
attached = 0
for fnum_str, frm_data in face_data.get("frames", {}).items():
fnum = int(fnum_str)
for face in frm_data.get("faces", []):
x, y, w, h = (
face.get("x", 0),
face.get("y", 0),
face.get("width", 0),
face.get("height", 0),
)
candidates = emb_map.get(fnum, [])
# Find matching embedding by bbox proximity
for (ex, ey, ew, eh), emb in candidates:
if (
abs(x - ex) < 10
and abs(y - ey) < 10
and abs(w - ew) < 10
and abs(h - eh) < 10
):
face["embedding"] = emb
attached += 1
break
print(f"[TRACE] Attached {attached} embeddings to faces")
except Exception as e:
print(f"[TRACE] WARNING: Could not load embeddings: {e}")
# Load cut boundaries from cut.json (same directory as face.json) # Load cut boundaries from cut.json (same directory as face.json)
cut_boundaries = None cut_boundaries = None
@@ -301,7 +124,7 @@ def run_face_tracker(
print(f"[TRACE] Loaded {len(cut_boundaries)} cut boundaries") print(f"[TRACE] Loaded {len(cut_boundaries)} cut boundaries")
face_data = track_faces( face_data = track_faces(
face_data, use_embedding=True, cut_boundaries=cut_boundaries face_data, use_embedding=False, cut_boundaries=cut_boundaries
) )
# Merge traces within same cut (same person re-appearing after occlusion/pose change) # Merge traces within same cut (same person re-appearing after occlusion/pose change)
@@ -309,7 +132,7 @@ def run_face_tracker(
face_data = merge_traces_within_cuts(face_data, cut_scenes) face_data = merge_traces_within_cuts(face_data, cut_scenes)
metadata = face_data.get("metadata", {}) metadata = face_data.get("metadata", {})
metadata["tracking_method"] = "iou_embedding" metadata["tracking_method"] = "iou_only"
metadata["tracked_at"] = datetime.now().isoformat() metadata["tracked_at"] = datetime.now().isoformat()
face_data["metadata"] = metadata face_data["metadata"] = metadata
@@ -350,22 +173,19 @@ def store_traced_faces(file_uuid: str, traced_json_path: str, schema: str = SCHE
if face_id is None: if face_id is None:
face_id = f"face_{trace_id}" face_id = f"face_{trace_id}"
attributes = face.get("attributes") attributes = face.get("attributes")
embedding = face.get("embedding")
bbox = json.dumps({"x": x, "y": y, "width": w, "height": h}) bbox = json.dumps({"x": x, "y": y, "width": w, "height": h})
embed_vec = embedding if embedding and len(embedding) > 0 else None
try: try:
cur.execute( cur.execute(
f""" f"""
UPDATE {schema}.face_detections UPDATE {schema}.face_detections
SET trace_id = %s, embedding = %s, face_id = %s SET trace_id = %s, face_id = %s
WHERE file_uuid = %s AND frame_number = %s WHERE file_uuid = %s AND frame_number = %s
AND x = %s AND y = %s AND width = %s AND height = %s AND x = %s AND y = %s AND width = %s AND height = %s
""", """,
( (
trace_id, trace_id,
embed_vec,
face_id, face_id,
file_uuid, file_uuid,
frame_num, frame_num,

View File

@@ -1,87 +0,0 @@
#!/opt/homebrew/bin/python3.11
"""
Story Embedding Pipeline:
1. Read story chunks → LLM summary (Gemma4)
2. Embed summary (EmbeddingGemma)
3. Store in chunks table + Qdrant
"""
import json, urllib.request, subprocess, sys, time, os
UUID = "aeed71342a899fe4b4c57b7d41bcb692"
PSQL = ["/Users/accusys/pgsql/18.3/bin/psql", "-U", "accusys", "-d", "momentry", "-t", "-A"]
LLM_URL = "http://localhost:8082/v1/chat/completions"
EMBED_URL = "http://localhost:11436/v1/embeddings"
QDRANT_URL = "http://localhost:6333"
QDRANT_COL = "momentry_dev_stories"
def psql(sql):
r = subprocess.run(PSQL + ["-c", sql], capture_output=True, text=True, timeout=30)
return r.stdout.strip()
def call_llm(dialogue):
prompt = f"Dialogue: {dialogue}\n\n50-word summary:"
body = json.dumps({"model": "google_gemma-4-26B-A4B-it-Q5_K_M.gguf",
"messages": [{"role": "user", "content": prompt}],
"temperature": 0.1, "max_tokens": 100}).encode()
req = urllib.request.Request(LLM_URL, data=body, headers={"Content-Type": "application/json"})
resp = urllib.request.urlopen(req, timeout=120)
return json.loads(resp.read())["choices"][0]["message"]["content"].strip()
def call_embed(text):
body = json.dumps({"input": text}).encode()
req = urllib.request.Request(EMBED_URL, data=body, headers={"Content-Type": "application/json"})
resp = urllib.request.urlopen(req, timeout=30)
return json.loads(resp.read())["data"][0]["embedding"]
# Step 0: Ensure Qdrant collection exists (768 dims)
subprocess.run(["curl", "-s", "-X", "PUT", f"{QDRANT_URL}/collections/{QDRANT_COL}",
"-H", "Content-Type: application/json",
"-d", '{"vectors":{"size":768,"distance":"Cosine"}}'], capture_output=True)
# Step 1: Get all story chunks that need summaries
lines = [l for l in psql(f"SELECT chunk_id, chunk_index, start_time, end_time, text_content FROM dev.chunks WHERE file_uuid='{UUID}' AND chunk_type='story' AND (summary_text IS NULL OR summary_text = '') ORDER BY chunk_index").split('\n') if l.strip() and '|' in l]
print(f"Chunks to process: {len(lines)}")
total = len(lines)
errors = 0
for i, line in enumerate(lines):
parts = line.split('|', 4)
cid, idx, st, et, dialogue = parts[0].strip(), int(parts[1]), float(parts[2]), float(parts[3]), parts[4] if len(parts) > 4 else ""
if len(dialogue) < 10:
summary = "[no dialogue]"
embedding = [0.0] * 768
else:
try:
summary = call_llm(dialogue)
time.sleep(0.3)
embedding = call_embed(summary)
except Exception as e:
print(f"[{i+1}/{total}] Error: {cid} - {e}")
errors += 1
summary = "[error]"
embedding = [0.0] * 768
# Update DB
s_esc = summary.replace("'", "''")
psql(f"UPDATE dev.chunks SET summary_text='{s_esc}', updated_at=CURRENT_TIMESTAMP WHERE chunk_id='{cid}'")
# Store in Qdrant
point = json.dumps({"points": [{"id": idx + 1, "vector": embedding,
"payload": {"chunk_id": cid, "file_uuid": UUID, "start_time": st, "end_time": et,
"summary": summary, "type": "story_summary"}
}]}).encode()
req = urllib.request.Request(f"{QDRANT_URL}/collections/{QDRANT_COL}/points?wait=true",
data=point, headers={"Content-Type": "application/json"}, method="PUT")
try:
urllib.request.urlopen(req, timeout=10)
except:
pass
if (i+1) % 20 == 0:
print(f"[{i+1}/{total}] {errors} errors so far")
print(f"\nDone. Processed: {total}, Errors: {errors}")
print(f"Qdrant: {QDRANT_COL}")

View File

@@ -1 +0,0 @@
../v1.1/scripts/story_embed_v1.11.py

View File

@@ -1,230 +0,0 @@
#!/opt/homebrew/bin/python3.11
"""
Story Pipeline Full — Speaker + Story + Summary
Step 1: Update sentence chunks with speaker name
Step 2: Rebuild story chunks + re-embed
Step 3: LLM summary × 228 + embed
"""
import json, urllib.request, subprocess, sys, time, os
UUID = "aeed71342a899fe4b4c57b7d41bcb692"
DIR = "/Users/accusys/momentry/output_dev"
PSQL = ["/Users/accusys/pgsql/18.3/bin/psql", "-U", "accusys", "-d", "momentry", "-t", "-A"]
LLM_URL = "http://localhost:8082/v1/chat/completions"
EMBED_URL = "http://localhost:11436/v1/embeddings"
QDRANT_URL = "http://localhost:6333/collections/momentry_dev_stories/points"
def psql(sql):
r = subprocess.run(PSQL + ["-c", sql], capture_output=True, text=True, timeout=30)
return r.stdout.strip()
def psql_file(path):
r = subprocess.run(PSQL + ["-f", path], capture_output=True, text=True, timeout=60)
if r.stderr and "ERROR" in r.stderr:
print(f"SQL Error: {r.stderr[:200]}")
return r.returncode
def embed_text(text):
body = json.dumps({"input": text[:1024]}).encode()
req = urllib.request.Request(EMBED_URL, data=body, headers={"Content-Type": "application/json"})
return json.loads(urllib.request.urlopen(req, timeout=30).read())["data"][0]["embedding"]
def llm_summary(dialogue):
body = json.dumps({
"model": "google_gemma-4-26B-A4B-it-Q5_K_M.gguf",
"messages": [{"role": "user", "content": f"Summarize concisely:\n{dialogue}\n\n50-word summary:"}],
"temperature": 0.1, "max_tokens": 100,
}).encode()
req = urllib.request.Request(LLM_URL, data=body, headers={"Content-Type": "application/json"})
return json.loads(urllib.request.urlopen(req, timeout=120).read())["choices"][0]["message"]["content"].strip()
fps = 25.0
FILE_ID = 242
# ═══════════════════════════════════════════════════
# Step 0: Load ASR + ASRX + speaker map
# ═══════════════════════════════════════════════════
print("=" * 60)
print("Step 0: Loading data...")
asr = json.load(open(f"{DIR}/{UUID}.asr.json"))
segs = asr["segments"]
asrx = json.load(open(f"{DIR}/{UUID}.asrx.json"))
asrx_segs = asrx["segments"]
# Speaker map from identity_bindings
r = psql("SELECT ib.identity_value, i.name FROM dev.identity_bindings ib JOIN dev.identities i ON i.id=ib.identity_id WHERE ib.identity_type='speaker'")
speaker_map = {}
for line in r.strip().split('\n'):
if line.strip() and '|' in line:
p = line.split('|')
speaker_map[p[0].strip()] = p[1].strip()
speaker_map["SPEAKER_0"] = "Speaker_0" # Fallback for unbounded
# ═══════════════════════════════════════════════════
# Step 1: Update sentence chunks with speaker
# ═══════════════════════════════════════════════════
print("\n" + "=" * 60)
print("Step 1: Updating sentence chunks with speaker...")
sql = ["BEGIN;"]
chunk_meta = {} # idx → {speaker_id, speaker_name}
for idx, seg in enumerate(segs):
st, et = seg["start"], seg["end"]
text = seg["text"].strip()
if not text:
continue
# Find overlapping ASRX segment → speaker_id
spk_id = "SPEAKER_0"
for ax in asrx_segs:
if ax.get("start_time", 0) <= st and ax.get("end_time", 0) >= et:
spk_id = ax.get("speaker_id", "SPEAKER_0")
break
spk_name = speaker_map.get(spk_id, spk_id)
new_text = f"[{spk_name}] {text}"
meta = json.dumps({"speaker_id": spk_id, "speaker_name": spk_name})
esc = new_text.replace("'", "''")
sql.append(f"UPDATE dev.chunks SET text_content='{esc}', metadata='{meta}'::jsonb WHERE file_uuid='{UUID}' AND chunk_id='{UUID}_{idx}';")
chunk_meta[idx] = {"speaker_id": spk_id, "speaker_name": spk_name}
sql.append("COMMIT;")
with open("/tmp/s1_speaker.sql", "w") as f:
f.write("\n".join(sql))
psql_file("/tmp/s1_speaker.sql")
print(f" Updated {len(chunk_meta)} sentence chunks with speaker")
# ═══════════════════════════════════════════════════
# Step 2: Rebuild story chunks + re-embed
# ═══════════════════════════════════════════════════
print("\n" + "=" * 60)
print("Step 2: Rebuilding story chunks...")
# Delete old story chunks
psql(f"DELETE FROM dev.chunks WHERE file_uuid='{UUID}' AND chunk_type='story';")
# Recreate
CHUNK_SIZE = 15
sql2 = ["BEGIN;"]
story_meta = []
for i in range(0, len(segs), CHUNK_SIZE):
group = segs[i:i+CHUNK_SIZE]
st, et = group[0]["start"], group[-1]["end"]
idx = i // CHUNK_SIZE
chunk_id = f"{UUID}_story_{idx}"
# Build speaker text from individual sentences
texts = []
speakers_used = {}
for j, seg in enumerate(group):
seg_idx = i + j
if seg_idx in chunk_meta:
cm = chunk_meta[seg_idx]
text = seg["text"].strip()
if text:
texts.append(f"[{cm['speaker_name']}] {text}")
speakers_used[cm['speaker_name']] = speakers_used.get(cm['speaker_name'], 0) + 1
dialogue = " ".join(texts)
child_ids = ", ".join([f"'{UUID}_{j}'" for j in range(i, min(i+CHUNK_SIZE, len(segs)))])
words = sum(len(t.split()) for t in texts)
meta = json.dumps({"method": "fixed_15", "seg_count": len(group), "words": words, "speakers": speakers_used})
esc = dialogue.replace("'", "''")
sql2.append(f"""INSERT INTO dev.chunks (file_id,file_uuid,chunk_id,old_chunk_id,chunk_index,chunk_type,start_time,end_time,fps,start_frame,end_frame,text_content,content,metadata,frame_count,child_chunk_ids)
VALUES ({FILE_ID},'{UUID}','{chunk_id}','{chunk_id}',{idx},'story',{st},{et},{fps},{int(st*fps)},{int(et*fps)},'{esc}','{{"type":"story_parent"}}'::jsonb,'{meta}'::jsonb,{int((et-st)*fps)},ARRAY[{child_ids}]);""")
story_meta.append({"idx": idx, "st": st, "et": et, "dialogue": dialogue, "words": words, "speakers": speakers_used})
sql2.append("COMMIT;")
with open("/tmp/s2_story.sql", "w") as f:
f.write("\n".join(sql2))
psql_file("/tmp/s2_story.sql")
print(f" Created {len(story_meta)} story chunks")
# Embed + upsert to Qdrant
print("\n Embedding story chunks...")
points_dialogue = []
for sm in story_meta:
if len(sm["dialogue"]) < 10:
continue
vec = embed_text(sm["dialogue"])
points_dialogue.append({"id": sm["idx"] + 1, "vector": vec, "payload": {
"chunk_id": f"{UUID}_story_{sm['idx']}", "file_uuid": UUID,
"start_time": sm["st"], "end_time": sm["et"], "type": "story_dialogue"
}})
for i in range(0, len(points_dialogue), 100):
batch = points_dialogue[i:i+100]
data = json.dumps({"points": batch, "wait": True}).encode()
req = urllib.request.Request(f"{QDRANT_URL}?wait=true", data=data, headers={"Content-Type": "application/json"}, method="PUT")
urllib.request.urlopen(req, timeout=30)
print(f" Qdrant: {len(points_dialogue)} dialogue vectors")
# ═══════════════════════════════════════════════════
# Step 3: LLM summaries + embed
# ═══════════════════════════════════════════════════
print("\n" + "=" * 60)
print("Step 3: LLM summaries...")
points_summary = []
summary_sql = ["BEGIN;"]
for i, sm in enumerate(story_meta):
if len(sm["dialogue"]) < 10:
continue
try:
summary = llm_summary(sm["dialogue"])
time.sleep(0.3)
vec = embed_text(summary)
time.sleep(0.1)
except Exception as e:
print(f" Error on story {sm['idx']}: {e}")
summary = "[error]"
vec = [0.0] * 768
s_esc = summary.replace("'", "''")
summary_sql.append(f"UPDATE dev.chunks SET summary_text='{s_esc}', updated_at=CURRENT_TIMESTAMP WHERE file_uuid='{UUID}' AND chunk_id='{UUID}_story_{sm['idx']}';")
points_summary.append({"id": 100000 + sm["idx"] + 1, "vector": vec, "payload": {
"chunk_id": f"{UUID}_story_{sm['idx']}", "file_uuid": UUID,
"start_time": sm["st"], "end_time": sm["et"],
"summary": summary, "type": "story_summary"
}})
if (i + 1) % 50 == 0:
print(f" {i+1}/{len(story_meta)}")
# Update DB with summaries
summary_sql.append("COMMIT;")
with open("/tmp/s3_summary.sql", "w") as f:
f.write("\n".join(summary_sql))
psql_file("/tmp/s3_summary.sql")
# Upsert summary vectors to Qdrant
for i in range(0, len(points_summary), 100):
batch = points_summary[i:i+100]
data = json.dumps({"points": batch, "wait": True}).encode()
req = urllib.request.Request(f"{QDRANT_URL}?wait=true", data=data, headers={"Content-Type": "application/json"}, method="PUT")
urllib.request.urlopen(req, timeout=30)
print(f" Qdrant: {len(points_summary)} summary vectors")
# ═══════════════════════════════════════════════════
# Step 4: Verify
# ═══════════════════════════════════════════════════
print("\n" + "=" * 60)
print("Done.")
r1 = psql(f"SELECT count(*) FROM dev.chunks WHERE file_uuid='{UUID}' AND chunk_type='sentence' AND text_content LIKE '[%'")
r2 = psql(f"SELECT count(*) FROM dev.chunks WHERE file_uuid='{UUID}' AND chunk_type='story'")
r3 = psql(f"SELECT count(*) FROM dev.chunks WHERE file_uuid='{UUID}' AND chunk_type='story' AND summary_text IS NOT NULL")
print(f"Sentence chunks with speaker: {r1}")
print(f"Story chunks: {r2}")
print(f"Story chunks with summary: {r3}")

View File

@@ -1 +0,0 @@
../v1.1/scripts/story_pipeline_full_v1.11.py

View File

@@ -1,325 +0,0 @@
#!/opt/homebrew/bin/python3.11
"""
Story Processor - Generate parent-child chunk hierarchy for RAG
Uses LOCAL video analysis (ASR, YOLO, OCR, Scene) to create parent chunks.
NO cloud API calls - fully offline processing
"""
import sys
import json
import os
import argparse
from typing import Dict, List, Any
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from redis_publisher import RedisPublisher
def extract_video_metadata(video_path: str) -> Dict[str, Any]:
"""Extract basic video metadata using ffprobe"""
import subprocess
try:
cmd = [
"ffprobe",
"-v",
"quiet",
"-print_format",
"json",
"-show_format",
"-show_streams",
video_path,
]
result = subprocess.run(cmd, capture_output=True, text=True)
if result.returncode == 0:
return json.loads(result.stdout)
except Exception:
pass
return {}
def generate_parent_child_chunks(
asr_data: Dict,
cut_data: Dict,
yolo_data: Dict,
ocr_data: Dict,
scene_data: Dict,
parent_chunk_size: int = 5,
) -> Dict:
"""
Generate parent-child chunk hierarchy using LOCAL data only.
No LLM/API calls - uses template-based narrative generation.
"""
child_chunks = []
parent_chunks = []
# Create child chunks from ASR
for seg in asr_data.get("segments", []):
child_chunks.append(
{
"chunk_id": f"asr_{seg.get('start', 0):.1f}_{seg.get('end', 0):.1f}",
"chunk_type": "asr",
"source": "asr",
"start_time": seg.get("start", 0),
"end_time": seg.get("end", 0),
"text_content": seg.get("text", ""),
"content": {
"text": seg.get("text", ""),
"confidence": seg.get("confidence", 0),
},
"child_chunk_ids": [],
"parent_chunk_id": None,
}
)
# Create child chunks from CUT scenes
for scene in cut_data.get("scenes", []):
child_chunks.append(
{
"chunk_id": f"cut_{scene.get('scene_number', 0)}",
"chunk_type": "cut",
"source": "cut",
"start_time": scene.get("start_time", 0),
"end_time": scene.get("end_time", 0),
"text_content": f"Scene {scene.get('scene_number', 0)}",
"content": {
"scene_number": scene.get("scene_number", 0),
"duration": scene.get("duration", 0),
},
"child_chunk_ids": [],
"parent_chunk_id": None,
}
)
asr_child_ids = [c["chunk_id"] for c in child_chunks if c["source"] == "asr"]
cut_child_ids = [c["chunk_id"] for c in child_chunks if c["source"] == "cut"]
yolo_frames = yolo_data.get("frames", [])
ocr_frames = ocr_data.get("frames", [])
# Group ASR segments into parent chunks
for i in range(0, len(asr_child_ids), parent_chunk_size):
batch = asr_child_ids[i : i + parent_chunk_size]
if not batch:
continue
batch_texts = []
batch_objects = []
batch_times = []
for child_id in batch:
for child in child_chunks:
if child["chunk_id"] == child_id:
if child["text_content"]:
batch_texts.append(child["text_content"])
batch_times.append((child["start_time"], child["end_time"]))
break
start_time = batch_times[0][0] if batch_times else 0
end_time = batch_times[-1][1] if batch_times else 0
# Find objects in this time range
for frame in yolo_frames[:50]:
ts = frame.get("timestamp", 0)
if start_time <= ts <= end_time:
for obj in frame.get("objects", []):
batch_objects.append(obj.get("class_name", "unknown"))
narrative = generate_narrative(batch_texts, batch_objects, start_time, end_time)
parent_chunk = {
"chunk_id": f"story_asr_{i // parent_chunk_size:04d}",
"chunk_type": "story",
"source": "story_asr",
"start_time": start_time,
"end_time": end_time,
"text_content": narrative,
"content": {
"description": narrative,
"child_count": len(batch),
"speech_preview": " ".join(batch_texts[:3]) if batch_texts else None,
"detected_objects": list(set(batch_objects))[:5],
},
"child_chunk_ids": batch,
"parent_chunk_id": None,
}
parent_chunks.append(parent_chunk)
for child_id in batch:
for child in child_chunks:
if child["chunk_id"] == child_id:
child["parent_chunk_id"] = parent_chunk["chunk_id"]
break
# Group CUT scenes into parent chunks
for i in range(0, len(cut_child_ids), parent_chunk_size):
batch = cut_child_ids[i : i + parent_chunk_size]
if not batch:
continue
batch_times = []
batch_objects = []
for child_id in batch:
for child in child_chunks:
if child["chunk_id"] == child_id:
batch_times.append((child["start_time"], child["end_time"]))
break
start_time = batch_times[0][0] if batch_times else 0
end_time = batch_times[-1][1] if batch_times else 0
for frame in yolo_frames[:50]:
ts = frame.get("timestamp", 0)
if start_time <= ts <= end_time:
for obj in frame.get("objects", []):
batch_objects.append(obj.get("class_name", "unknown"))
narrative = generate_scene_narrative(
batch_objects, start_time, end_time, len(batch)
)
parent_chunk = {
"chunk_id": f"story_cut_{i // parent_chunk_size:04d}",
"chunk_type": "story",
"source": "story_cut",
"start_time": start_time,
"end_time": end_time,
"text_content": narrative,
"content": {
"description": narrative,
"child_count": len(batch),
"scenes": batch,
"detected_objects": list(set(batch_objects))[:5],
},
"child_chunk_ids": batch,
"parent_chunk_id": None,
}
parent_chunks.append(parent_chunk)
for child_id in batch:
for child in child_chunks:
if child["chunk_id"] == child_id:
child["parent_chunk_id"] = parent_chunk["chunk_id"]
break
return {
"child_chunks": child_chunks,
"parent_chunks": parent_chunks,
"stats": {
"total_child_chunks": len(child_chunks),
"total_parent_chunks": len(parent_chunks),
"asr_children": len(asr_child_ids),
"cut_children": len(cut_child_ids),
},
}
def generate_narrative(
texts: List[str], objects: List[str], start: float, end: float
) -> str:
"""Generate narrative description from LOCAL text snippets and objects"""
if not texts and not objects:
return f"Video segment from {start:.1f}s to {end:.1f}s"
parts = []
if texts:
combined = " ".join(texts[:5])
if len(combined) > 150:
combined = combined[:150] + "..."
parts.append(f"Speech: {combined}")
if objects:
unique_objs = list(set(objects))[:5]
parts.append(f"Visuals: {', '.join(unique_objs)}")
return f"[{start:.0f}s-{end:.0f}s] {' | '.join(parts)}"
def generate_scene_narrative(
objects: List[str], start: float, end: float, scene_count: int
) -> str:
"""Generate scene narrative from LOCAL detected objects"""
unique_objects = list(set(objects))[:5]
if unique_objects:
obj_str = ", ".join(unique_objects)
return f"[{start:.0f}s-{end:.0f}s] {scene_count} scenes. Visuals: {obj_str}."
else:
return f"[{start:.0f}s-{end:.0f}s] {scene_count} video scenes."
def run_story(
video_path: str, output_path: str, uuid: str = "", parent_chunk_size: int = 5
):
publisher = RedisPublisher(uuid) if uuid else None
if publisher:
publisher.info("story", "STORY_START")
base_path = os.path.dirname(output_path)
uuid_name = os.path.basename(output_path).split(".")[0]
asr_data = {"segments": []}
cut_data = {"scenes": []}
yolo_data = {"frames": []}
ocr_data = {"frames": []}
scene_data = {"scenes": []}
for name, data_var in [
("asr", asr_data),
("cut", cut_data),
("yolo", yolo_data),
("ocr", ocr_data),
("scene", scene_data),
]:
path = os.path.join(base_path, f"{uuid_name}.{name}.json")
if os.path.exists(path):
with open(path) as f:
data_var.update(json.load(f))
result = generate_parent_child_chunks(
asr_data, cut_data, yolo_data, ocr_data, scene_data, parent_chunk_size
)
result["video_metadata"] = extract_video_metadata(video_path)
result["processing"] = {
"method": "local_aggregation",
"cloud_api_used": False,
"parent_chunk_size": parent_chunk_size,
}
with open(output_path, "w") as f:
json.dump(result, f, indent=2, ensure_ascii=False)
if publisher:
publisher.complete(
"story",
f"{result['stats']['total_parent_chunks']} parent, {result['stats']['total_child_chunks']} child chunks (LOCAL)",
)
return result
if __name__ == "__main__":
parser = argparse.ArgumentParser(
description="Story Processor - Parent-Child Chunk Hierarchy (LOCAL ONLY)"
)
parser.add_argument("video_path", help="Path to video file")
parser.add_argument("output_path", help="Output JSON path")
parser.add_argument("--uuid", help="UUID for progress tracking", default="")
parser.add_argument(
"--parent-chunk-size",
type=int,
default=5,
help="Number of child chunks per parent",
)
args = parser.parse_args()
result = run_story(
args.video_path, args.output_path, args.uuid, args.parent_chunk_size
)
print(
f"Story generated: {result['stats']['total_parent_chunks']} parent, "
f"{result['stats']['total_child_chunks']} child chunks (LOCAL)"
)

View File

@@ -1,848 +0,0 @@
#!/opt/homebrew/bin/python3.11
"""
Story Processor - AI-Driven Processor Contract Version 1.0
Compliant with AI-Driven Processor Contract v1.0
Effective Date: 2025-03-27
Features:
1. Standardized command-line interface
2. Redis progress reporting
3. Signal handling (SIGTERM, SIGINT)
4. Health check mode
5. Resource monitoring
6. Contract-compliant JSON output
7. Unified configuration
"""
import sys
import json
import os
import argparse
import signal
import time
import traceback
from datetime import datetime
from typing import Dict, Any, List
# Redis Publisher for progress reporting
try:
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from redis_publisher import RedisPublisher
REDIS_AVAILABLE = True
except ImportError:
REDIS_AVAILABLE = False
print(
"WARNING: RedisPublisher not available, progress reporting disabled",
file=sys.stderr,
)
# Contract version
CONTRACT_VERSION = "1.0"
PROCESSOR_NAME = (
"/Users/accusys/momentry_core_0.1/scripts/story_processor_contract_v1.py"
)
PROCESSOR_VERSION = "1.0.0"
MODEL_NAME = "gpt-4"
MODEL_VERSION = "latest"
# Unified configuration defaults
DEFAULT_TIMEOUT = 3600 # 1 hour for story generation
DEFAULT_PARENT_CHUNK_SIZE = 5
DEFAULT_MIN_CHILD_CHUNKS = 3
DEFAULT_MAX_CHILD_CHUNKS = 10
DEFAULT_SUMMARY_LENGTH = 150
DEFAULT_MODEL = "openai" # openai, local, or template
DEFAULT_MODEL_NAME = "gpt-4"
DEFAULT_TEMPERATURE = 0.7
DEFAULT_MAX_TOKENS = 500
# Signal handling with timeout support
class SignalHandler:
"""Handle system signals for graceful shutdown"""
def __init__(self):
self.should_exit = False
self.exit_code = 0
signal.signal(signal.SIGTERM, self.handle_signal)
signal.signal(signal.SIGINT, self.handle_signal)
def handle_signal(self, signum, frame):
"""Handle termination signals"""
print(f"\n收到信号 {signum},正在优雅关闭...")
self.should_exit = True
self.exit_code = 128 + signum
def should_stop(self):
"""Check if should stop processing"""
return self.should_exit
# Timeout manager
class TimeoutManager:
"""Manage processing timeouts"""
def __init__(self, timeout_seconds: int):
self.timeout_seconds = timeout_seconds
self.start_time = time.time()
self.timer = None
def check_timeout(self) -> bool:
"""Check if timeout has been reached"""
elapsed = time.time() - self.start_time
return elapsed > self.timeout_seconds
def get_remaining_time(self) -> float:
"""Get remaining time in seconds"""
elapsed = time.time() - self.start_time
return max(0, self.timeout_seconds - elapsed)
def format_remaining_time(self) -> str:
"""Format remaining time as HH:MM:SS"""
remaining = self.get_remaining_time()
hours = int(remaining // 3600)
minutes = int((remaining % 3600) // 60)
seconds = int(remaining % 60)
return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
# Health check functions
def check_environment() -> Dict[str, Any]:
"""Check environment and dependencies"""
checks = []
# Check 1: OpenAI API (optional)
try:
import openai
checks.append(
{
"name": "openai",
"status": "available",
"version": openai.__version__,
}
)
except ImportError:
checks.append({"name": "openai", "status": "optional", "version": None})
# Check 2: Redis (optional)
checks.append(
{
"name": "redis",
"status": "available" if REDIS_AVAILABLE else "optional",
"version": None,
}
)
# Check 3: Python version
checks.append(
{
"name": "python",
"status": "available",
"version": f"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}",
}
)
return {
"timestamp": datetime.now().isoformat(),
"processor_name": PROCESSOR_NAME,
"processor_version": PROCESSOR_VERSION,
"contract_version": CONTRACT_VERSION,
"model_name": MODEL_NAME,
"model_version": MODEL_VERSION,
"checks": checks,
}
def check_input_files(input_files: Dict[str, str]) -> Dict[str, Any]:
"""Check input files exist and are valid JSON"""
results = {}
for file_type, file_path in input_files.items():
if not file_path:
results[file_type] = {
"exists": False,
"valid": False,
"error": "No path provided",
}
continue
if not os.path.exists(file_path):
results[file_type] = {
"exists": False,
"valid": False,
"error": "File not found",
}
continue
try:
with open(file_path, "r") as f:
data = json.load(f)
# Basic validation based on file type
if file_type == "asr":
valid = isinstance(data, dict) and "segments" in data
elif file_type == "cut":
valid = isinstance(data, dict) and "scenes" in data
elif file_type == "yolo":
valid = isinstance(data, dict) and "detections" in data
elif file_type == "ocr":
valid = isinstance(data, dict) and "texts" in data
else:
valid = isinstance(data, dict)
results[file_type] = {
"exists": True,
"valid": valid,
"size": os.path.getsize(file_path),
"data_keys": list(data.keys()) if isinstance(data, dict) else [],
}
except json.JSONDecodeError as e:
results[file_type] = {
"exists": True,
"valid": False,
"error": f"Invalid JSON: {e}",
}
except Exception as e:
results[file_type] = {"exists": True, "valid": False, "error": str(e)}
return results
def load_input_data(input_files: Dict[str, str]) -> Dict[str, Any]:
"""Load input data from JSON files"""
data = {}
for file_type, file_path in input_files.items():
if not file_path or not os.path.exists(file_path):
data[file_type] = None
continue
try:
with open(file_path, "r") as f:
data[file_type] = json.load(f)
except:
data[file_type] = None
return data
def generate_parent_child_chunks(
asr_data: Dict,
cut_data: Dict,
yolo_data: Dict,
ocr_data: Dict,
parent_chunk_size: int = DEFAULT_PARENT_CHUNK_SIZE,
min_child_chunks: int = DEFAULT_MIN_CHILD_CHUNKS,
max_child_chunks: int = DEFAULT_MAX_CHILD_CHUNKS,
summary_length: int = DEFAULT_SUMMARY_LENGTH,
model: str = DEFAULT_MODEL,
**kwargs,
) -> List[Dict[str, Any]]:
"""Generate parent-child chunk hierarchy for RAG"""
parent_chunks = []
# Extract ASR segments
asr_segments = asr_data.get("segments", []) if asr_data else []
# Extract scenes from CUT data
scenes = cut_data.get("scenes", []) if cut_data else []
# Extract detections from YOLO data
yolo_detections = yolo_data.get("detections", []) if yolo_data else []
# Extract OCR texts
ocr_texts = ocr_data.get("texts", []) if ocr_data else []
# If we have scenes, use them to group content
if scenes:
for scene in scenes:
scene_start = scene.get("start_time", 0)
scene_end = scene.get("end_time", 0)
scene_duration = scene.get("duration", 0)
# Find ASR segments in this scene
scene_asr_segments = []
for segment in asr_segments:
seg_start = segment.get("start", 0)
if scene_start <= seg_start <= scene_end:
scene_asr_segments.append(segment)
# Find YOLO detections in this scene
scene_yolo_detections = []
for detection in yolo_detections:
det_time = detection.get("timestamp", 0)
if scene_start <= det_time <= scene_end:
scene_yolo_detections.append(detection)
# Find OCR texts in this scene
scene_ocr_texts = []
for text in ocr_texts:
text_time = text.get("timestamp", 0)
if scene_start <= text_time <= scene_end:
scene_ocr_texts.append(text)
# Create child chunks
child_chunks = []
# Add ASR segments as child chunks
for segment in scene_asr_segments[:max_child_chunks]:
child_chunks.append(
{
"type": "asr",
"content": segment.get("text", ""),
"start_time": segment.get("start", 0),
"end_time": segment.get("end", 0),
"confidence": segment.get("confidence", 0),
"metadata": {"speaker": segment.get("speaker")},
}
)
# Add YOLO detections as child chunks
for detection in scene_yolo_detections[:max_child_chunks]:
child_chunks.append(
{
"type": "yolo",
"content": f"Detected {detection.get('class', 'object')} with confidence {detection.get('confidence', 0):.2f}",
"timestamp": detection.get("timestamp", 0),
"confidence": detection.get("confidence", 0),
"metadata": {
"class": detection.get("class"),
"bbox": detection.get("bbox"),
},
}
)
# Add OCR texts as child chunks
for text in scene_ocr_texts[:max_child_chunks]:
child_chunks.append(
{
"type": "ocr",
"content": text.get("text", ""),
"timestamp": text.get("timestamp", 0),
"confidence": text.get("confidence", 0),
"metadata": {
"bbox": text.get("bbox"),
"language": text.get("language"),
},
}
)
# Skip if not enough child chunks
if len(child_chunks) < min_child_chunks:
continue
# Generate parent summary
if model == "openai":
parent_summary = generate_openai_summary(child_chunks, scene, **kwargs)
elif model == "local":
parent_summary = generate_local_summary(child_chunks, scene, **kwargs)
else:
parent_summary = generate_template_summary(child_chunks, scene)
# Create parent chunk
parent_chunks.append(
{
"parent_id": len(parent_chunks) + 1,
"scene_id": scene.get("scene_id", 0),
"start_time": scene_start,
"end_time": scene_end,
"duration": scene_duration,
"summary": parent_summary[:summary_length]
if summary_length > 0
else parent_summary,
"child_count": len(child_chunks),
"child_types": list(set(chunk["type"] for chunk in child_chunks)),
"child_chunks": child_chunks[
:parent_chunk_size
], # Limit child chunks in output
}
)
# If no scenes, create chunks based on time windows
elif asr_segments:
# Group ASR segments by time windows
time_window = 30 # seconds
current_window = 0
while current_window * time_window < (
asr_segments[-1].get("end", 0) if asr_segments else 0
):
window_start = current_window * time_window
window_end = (current_window + 1) * time_window
# Find segments in this window
window_segments = []
for segment in asr_segments:
seg_start = segment.get("start", 0)
if window_start <= seg_start < window_end:
window_segments.append(segment)
if len(window_segments) >= min_child_chunks:
# Create child chunks
child_chunks = []
for segment in window_segments[:max_child_chunks]:
child_chunks.append(
{
"type": "asr",
"content": segment.get("text", ""),
"start_time": segment.get("start", 0),
"end_time": segment.get("end", 0),
"confidence": segment.get("confidence", 0),
"metadata": {"speaker": segment.get("speaker")},
}
)
# Generate parent summary
parent_summary = generate_template_summary(
child_chunks,
{
"start_time": window_start,
"end_time": window_end,
"duration": time_window,
},
)
# Create parent chunk
parent_chunks.append(
{
"parent_id": len(parent_chunks) + 1,
"time_window": current_window,
"start_time": window_start,
"end_time": window_end,
"duration": time_window,
"summary": parent_summary[:summary_length]
if summary_length > 0
else parent_summary,
"child_count": len(child_chunks),
"child_types": ["asr"],
"child_chunks": child_chunks[:parent_chunk_size],
}
)
current_window += 1
return parent_chunks
def generate_openai_summary(child_chunks: List[Dict], scene: Dict, **kwargs) -> str:
"""Generate summary using OpenAI"""
try:
import openai
# Prepare context from child chunks
context_parts = []
for chunk in child_chunks[:10]: # Limit context size
if chunk["type"] == "asr":
context_parts.append(f"Speech: {chunk['content']}")
elif chunk["type"] == "yolo":
context_parts.append(f"Visual: {chunk['content']}")
elif chunk["type"] == "ocr":
context_parts.append(f"Text: {chunk['content']}")
context = "\n".join(context_parts)
# Prepare prompt
prompt = f"""Summarize this video scene ({scene.get("duration", 0):.1f} seconds) based on the following elements:
{context}
Provide a concise narrative summary that connects the speech, visual elements, and text into a coherent description."""
# Call OpenAI API
response = openai.chat.completions.create(
model=kwargs.get("model_name", DEFAULT_MODEL_NAME),
messages=[
{
"role": "system",
"content": "You are a video analysis assistant that creates coherent narrative summaries from multiple data sources.",
},
{"role": "user", "content": prompt},
],
max_tokens=kwargs.get("max_tokens", DEFAULT_MAX_TOKENS),
temperature=kwargs.get("temperature", DEFAULT_TEMPERATURE),
)
return response.choices[0].message.content
except ImportError:
return "OpenAI not available for summary generation"
except Exception as e:
return f"Summary generation error: {str(e)}"
def generate_local_summary(child_chunks: List[Dict], scene: Dict, **kwargs) -> str:
"""Generate summary using local model (placeholder)"""
# This is a placeholder for local model implementation
asr_count = sum(1 for chunk in child_chunks if chunk["type"] == "asr")
yolo_count = sum(1 for chunk in child_chunks if chunk["type"] == "yolo")
ocr_count = sum(1 for chunk in child_chunks if chunk["type"] == "ocr")
return f"Scene ({scene.get('duration', 0):.1f}s) with {asr_count} speech segments, {yolo_count} visual detections, and {ocr_count} text elements. Local summary model not implemented."
def generate_template_summary(child_chunks: List[Dict], scene: Dict) -> str:
"""Generate summary using template"""
asr_count = sum(1 for chunk in child_chunks if chunk["type"] == "asr")
yolo_count = sum(1 for chunk in child_chunks if chunk["type"] == "yolo")
ocr_count = sum(1 for chunk in child_chunks if chunk["type"] == "ocr")
# Extract some sample content
asr_samples = [
chunk["content"][:50] for chunk in child_chunks if chunk["type"] == "asr"
][:2]
yolo_classes = list(
set(
chunk["metadata"].get("class", "object")
for chunk in child_chunks
if chunk["type"] == "yolo"
)
)
summary_parts = [f"Scene duration: {scene.get('duration', 0):.1f} seconds."]
if asr_count > 0:
summary_parts.append(f"Contains {asr_count} speech segments.")
if asr_samples:
summary_parts.append(f"Sample speech: {'; '.join(asr_samples)}...")
if yolo_count > 0:
summary_parts.append(
f"Detected {yolo_count} objects including: {', '.join(yolo_classes[:3])}."
)
if ocr_count > 0:
summary_parts.append(f"Extracted {ocr_count} text elements from the video.")
return " ".join(summary_parts)
# Main processing function
def process_story(
asr_path: str,
cut_path: str,
yolo_path: str,
ocr_path: str,
output_path: str,
uuid: str = "",
parent_chunk_size: int = DEFAULT_PARENT_CHUNK_SIZE,
min_child_chunks: int = DEFAULT_MIN_CHILD_CHUNKS,
max_child_chunks: int = DEFAULT_MAX_CHILD_CHUNKS,
summary_length: int = DEFAULT_SUMMARY_LENGTH,
model: str = DEFAULT_MODEL,
model_name: str = DEFAULT_MODEL_NAME,
temperature: float = DEFAULT_TEMPERATURE,
max_tokens: int = DEFAULT_MAX_TOKENS,
timeout: int = DEFAULT_TIMEOUT,
) -> Dict[str, Any]:
"""Process video analysis data to create parent-child chunk hierarchy"""
# Initialize
signal_handler = SignalHandler()
timeout_manager = TimeoutManager(timeout)
publisher = None
if REDIS_AVAILABLE and uuid:
try:
publisher = RedisPublisher(uuid)
except:
publisher = None
def publish(stage: str, message: str, data: Dict = None):
if publisher:
publisher.info(PROCESSOR_NAME, stage, message, data)
if publisher:
publish("STORY_START", "开始生成故事层次结构")
result = {
"processor_name": PROCESSOR_NAME,
"processor_version": PROCESSOR_VERSION,
"contract_version": CONTRACT_VERSION,
"model_name": MODEL_NAME,
"model_version": MODEL_VERSION,
"input_files": {
"asr": asr_path,
"cut": cut_path,
"yolo": yolo_path,
"ocr": ocr_path,
},
"output_path": output_path,
"uuid": uuid,
"timestamp": datetime.now().isoformat(),
"parameters": {
"parent_chunk_size": parent_chunk_size,
"min_child_chunks": min_child_chunks,
"max_child_chunks": max_child_chunks,
"summary_length": summary_length,
"model": model,
"model_name": model_name,
"temperature": temperature,
"max_tokens": max_tokens,
"timeout": timeout,
},
"success": False,
"error": None,
"parent_chunks": [],
"chunk_statistics": {},
"processing_time": 0,
"resource_usage": {},
}
start_time = time.time()
try:
# Check timeout
if timeout_manager.check_timeout():
raise TimeoutError(f"超时 ({timeout} 秒)")
# Check if should exit
if signal_handler.should_stop():
raise KeyboardInterrupt("收到停止信号")
# Check input files
if publisher:
publish("STORY_CHECK_FILES", "检查输入文件")
input_files = {
"asr": asr_path,
"cut": cut_path,
"yolo": yolo_path,
"ocr": ocr_path,
}
file_checks = check_input_files(input_files)
result["file_checks"] = file_checks
# Check if we have at least ASR data
if not file_checks.get("asr", {}).get("valid", False):
raise ValueError("缺少有效的 ASR 数据文件")
if publisher:
publish("STORY_FILES_VALID", "输入文件检查通过")
# Load input data
if publisher:
publish("STORY_LOAD_DATA", "加载输入数据")
input_data = load_input_data(input_files)
if publisher:
publish("STORY_DATA_LOADED", "数据加载完成")
# Generate parent-child chunks
if publisher:
publish("STORY_GENERATE_CHUNKS", "生成父-子块层次结构")
parent_chunks = generate_parent_child_chunks(
asr_data=input_data.get("asr"),
cut_data=input_data.get("cut"),
yolo_data=input_data.get("yolo"),
ocr_data=input_data.get("ocr"),
parent_chunk_size=parent_chunk_size,
min_child_chunks=min_child_chunks,
max_child_chunks=max_child_chunks,
summary_length=summary_length,
model=model,
model_name=model_name,
temperature=temperature,
max_tokens=max_tokens,
)
result["parent_chunks"] = parent_chunks
result["parent_chunk_count"] = len(parent_chunks)
# Calculate statistics
total_child_chunks = sum(chunk.get("child_count", 0) for chunk in parent_chunks)
child_types = {}
for chunk in parent_chunks:
for child_type in chunk.get("child_types", []):
child_types[child_type] = child_types.get(child_type, 0) + 1
result["chunk_statistics"] = {
"total_parent_chunks": len(parent_chunks),
"total_child_chunks": total_child_chunks,
"avg_children_per_parent": total_child_chunks / len(parent_chunks)
if parent_chunks
else 0,
"child_type_distribution": child_types,
}
result["success"] = True
if publisher:
publish("STORY_COMPLETE", f"完成: {len(parent_chunks)} 个父块")
except TimeoutError as e:
result["error"] = f"处理超时: {e}"
if publisher:
publish("STORY_TIMEOUT", f"超时: {e}")
except KeyboardInterrupt:
result["error"] = "处理被用户中断"
if publisher:
publish("STORY_INTERRUPTED", "处理被中断")
except ImportError as e:
result["error"] = f"依赖缺失: {e}"
if publisher:
publish("STORY_MISSING_DEPS", f"缺少依赖: {e}")
except Exception as e:
result["error"] = f"处理错误: {str(e)}"
if publisher:
publish("STORY_ERROR", f"错误: {str(e)}")
traceback.print_exc()
# Calculate processing time
processing_time = time.time() - start_time
result["processing_time"] = processing_time
# Add resource usage
try:
import psutil
process = psutil.Process()
memory_info = process.memory_info()
result["resource_usage"] = {
"cpu_percent": process.cpu_percent(),
"memory_mb": memory_info.rss / (1024 * 1024),
"user_time": process.cpu_times().user,
"system_time": process.cpu_times().system,
}
except ImportError:
result["resource_usage"] = {"error": "psutil not available"}
# Save result
try:
with open(output_path, "w") as f:
json.dump(result, f, indent=2, ensure_ascii=False)
if publisher:
publish("STORY_SAVED", f"结果保存到: {output_path}")
except Exception as e:
result["error"] = f"保存结果失败: {str(e)}"
if publisher:
publish("STORY_SAVE_ERROR", f"保存失败: {str(e)}")
return result
def main():
"""Main entry point"""
parser = argparse.ArgumentParser(
description=f"{PROCESSOR_NAME.upper()} Processor v{PROCESSOR_VERSION} - Parent-Child Chunk Generation"
)
parser.add_argument("--asr", help="Path to ASR JSON file", required=True)
parser.add_argument("--cut", help="Path to CUT JSON file", default="")
parser.add_argument("--yolo", help="Path to YOLO JSON file", default="")
parser.add_argument("--ocr", help="Path to OCR JSON file", default="")
parser.add_argument("--output", help="Path to output JSON file", required=True)
parser.add_argument("--uuid", help="UUID for progress tracking", default="")
parser.add_argument(
"--parent-chunk-size",
help=f"Maximum child chunks per parent (default: {DEFAULT_PARENT_CHUNK_SIZE})",
type=int,
default=DEFAULT_PARENT_CHUNK_SIZE,
)
parser.add_argument(
"--min-child-chunks",
help=f"Minimum child chunks to create parent (default: {DEFAULT_MIN_CHILD_CHUNKS})",
type=int,
default=DEFAULT_MIN_CHILD_CHUNKS,
)
parser.add_argument(
"--max-child-chunks",
help=f"Maximum child chunks per parent (default: {DEFAULT_MAX_CHILD_CHUNKS})",
type=int,
default=DEFAULT_MAX_CHILD_CHUNKS,
)
parser.add_argument(
"--summary-length",
help=f"Maximum summary length in characters (default: {DEFAULT_SUMMARY_LENGTH})",
type=int,
default=DEFAULT_SUMMARY_LENGTH,
)
parser.add_argument(
"--model",
help=f"Summary model to use (default: {DEFAULT_MODEL})",
default=DEFAULT_MODEL,
choices=["openai", "local", "template"],
)
parser.add_argument(
"--model-name",
help=f"Model name for OpenAI (default: {DEFAULT_MODEL_NAME})",
default=DEFAULT_MODEL_NAME,
)
parser.add_argument(
"--temperature",
help=f"Temperature for generation (default: {DEFAULT_TEMPERATURE})",
type=float,
default=DEFAULT_TEMPERATURE,
)
parser.add_argument(
"--max-tokens",
help=f"Maximum tokens per summary (default: {DEFAULT_MAX_TOKENS})",
type=int,
default=DEFAULT_MAX_TOKENS,
)
parser.add_argument(
"--timeout",
help=f"Timeout in seconds (default: {DEFAULT_TIMEOUT})",
type=int,
default=DEFAULT_TIMEOUT,
)
parser.add_argument(
"--health-check",
help="Run health check and exit",
action="store_true",
)
args = parser.parse_args()
# Health check mode
if args.health_check:
health = check_environment()
print(json.dumps(health, indent=2, ensure_ascii=False))
return (
0
if all(c["status"] in ["available", "optional"] for c in health["checks"])
else 1
)
# Normal processing mode
result = process_story(
asr_path=args.asr,
cut_path=args.cut,
yolo_path=args.yolo,
ocr_path=args.ocr,
output_path=args.output,
uuid=args.uuid,
parent_chunk_size=args.parent_chunk_size,
min_child_chunks=args.min_child_chunks,
max_child_chunks=args.max_child_chunks,
summary_length=args.summary_length,
model=args.model,
model_name=args.model_name,
temperature=args.temperature,
max_tokens=args.max_tokens,
timeout=args.timeout,
)
# Print result summary
if result.get("success", False):
print(f"{PROCESSOR_NAME.upper()} 处理成功")
print(f" 父块数: {result.get('parent_chunk_count', 0)}")
stats = result.get("chunk_statistics", {})
print(f" 子块总数: {stats.get('total_child_chunks', 0)}")
print(f" 平均子块/父块: {stats.get('avg_children_per_parent', 0):.1f}")
print(f" 处理时间: {result.get('processing_time', 0):.1f}")
print(f" 输出文件: {args.output}")
return 0
else:
print(f"{PROCESSOR_NAME.upper()} 处理失败")
print(f" 错误: {result.get('error', '未知错误')}")
return 1
if __name__ == "__main__":
sys.exit(main())

View File

@@ -1 +0,0 @@
../v1.1/scripts/story_processor_contract_v1_v1.11.py

View File

@@ -1 +0,0 @@
../v1.1/scripts/story_processor_v1.11.py

View File

@@ -1,121 +0,0 @@
#!/opt/homebrew/bin/python3.11
"""
Test Parent Chunk Summary Generation (Gemma 4)
"""
import json
import ollama
import time
# Configuration
UUID = "384b0ff44aaaa1f1"
ASR_PATH = f"output/{UUID}/{UUID}.asr.json"
MODEL = "gemma4:latest"
# The Prompt Template
PARENT_SUMMARY_PROMPT = """
You are an expert film analyst. Analyze the following movie dialogue segment (approx 60 seconds).
Your task is to generate a structured JSON summary containing:
1. **narrative_summary**: A one-sentence summary of the main event/plot point.
2. **entities**: Key information extracted:
- `who`: List of characters involved.
- `where`: Inferred location (e.g., "Apartment", "Train").
- `objects`: Key props mentioned (e.g., "Ticket", "Money").
3. **emotional_arc**: The emotional transition:
- `start_mood`: Mood at the beginning.
- `end_mood`: Mood at the end.
4. **plot_sequence**:
- `scene_type`: Type of scene (e.g., "Confrontation", "Romance", "Discovery").
- `key_action`: The main action taking place.
**IMPORTANT RULES:**
- Output **ONLY** valid JSON.
- Do NOT include "Thinking Process" or markdown formatting.
- If information is unknown, use "Unknown".
- Context: This is from the movie "Charade" (1963).
Dialogue:
{context}
"""
def load_sample(start_index, count=20):
"""Load a slice of dialogue to simulate a Parent Chunk"""
try:
with open(ASR_PATH, "r") as f:
data = json.load(f)
segments = data.get("segments", [])
selected = segments[start_index : start_index + count]
text = " ".join([s.get("text", "") for s in selected])
print(f"📂 Loaded Sample {start_index}: {len(selected)} segments.")
return text
except Exception as e:
return f"Error: {e}"
def run_test(name, context_text):
print(f"\n🧪 Testing: {name}")
print("-" * 50)
print(f"📖 Input Preview: {context_text[:100]}...")
prompt = PARENT_SUMMARY_PROMPT.format(context=context_text)
try:
start = time.time()
response = ollama.chat(
model=MODEL, messages=[{"role": "user", "content": prompt}]
)
duration = time.time() - start
content = response["message"]["content"]
# Clean up thinking tags if present
if "```json" in content:
content = content.split("```json")[1].split("```")[0]
elif "Thinking..." in content:
# crude cleanup for demo
content = content.split("...")[-1]
# Attempt parse
try:
result = json.loads(content.strip())
print(f"✅ Success ({duration:.2f}s)")
print(json.dumps(result, indent=2))
return True
except json.JSONDecodeError:
print(f"⚠️ JSON Parse Failed ({duration:.2f}s)")
print(content[:500])
return False
except Exception as e:
print(f"❌ API Error: {e}")
return False
def main():
print(f"🚀 Starting Parent Chunk Summary Tests on '{UUID}'")
# Test 1: Early Dialogue (Entities & Narrative Focus)
# "possessed a ticket of passage..."
txt1 = load_sample(start_index=10)
res1 = run_test("Test 1: Early Plot (Entities & Narrative)", txt1)
time.sleep(2) # Cool down
# Test 2: Middle Conflict (Emotional Arc Focus)
# "where did he keep his money..." (From previous context)
txt2 = load_sample(start_index=50)
res2 = run_test("Test 2: Conflict (Emotional Arc)", txt2)
time.sleep(2) # Cool down
# Test 3: Later Dialogue (Plot Sequence Focus)
# Looking for a scene involving a conclusion or death aftermath
# Let's pick a later section to test robustness
txt3 = load_sample(start_index=150)
res3 = run_test("Test 3: Late Plot (Sequence)", txt3)
if __name__ == "__main__":
main()

View File

@@ -1 +0,0 @@
../v1.1/scripts/test_parent_chunk_generation_v1.11.py

View File

@@ -12,7 +12,7 @@ use std::collections::HashMap;
use super::types::AppState; use super::types::AppState;
use crate::core::config; use crate::core::config;
use crate::core::db::schema; use crate::core::db::schema;
use crate::core::db::{Database, PostgresDb, QdrantDb, RedisClient}; use crate::core::db::{Database, PostgresDb, QdrantDb, QdrantWorkspace, RedisClient};
use crate::core::storage::content_hash; use crate::core::storage::content_hash;
use crate::FileManager; use crate::FileManager;
@@ -463,7 +463,6 @@ async fn register_single_file(
.execute(db.pool()).await; .execute(db.pool()).await;
let mut cut_done = false; let mut cut_done = false;
let mut scene_done = false;
if has_video && total_frames > 0 && fps > 0.0 { if has_video && total_frames > 0 && fps > 0.0 {
let output_dir = std::env::var("MOMENTRY_OUTPUT_DIR") let output_dir = std::env::var("MOMENTRY_OUTPUT_DIR")
.unwrap_or_else(|_| "/Users/accusys/momentry/output_dev".to_string()); .unwrap_or_else(|_| "/Users/accusys/momentry/output_dev".to_string());
@@ -511,31 +510,6 @@ async fn register_single_file(
} }
} }
let scene_path =
std::path::Path::new(&output_dir).join(format!("{}.scene.json", file_uuid));
if !scene_path.exists() {
let scene_script = std::path::Path::new(&scripts_dir).join("scene_classifier.py");
if scene_script.exists() {
let scene_output = std::process::Command::new(&python_path)
.arg(&scene_script)
.arg(&canonical_path)
.arg(&scene_path)
.arg("--sample-interval")
.arg("2")
.output();
if let Ok(output) = scene_output {
if output.status.success() {
scene_done = true;
tracing::info!(
"[REGISTER] Scene classification completed for {}",
file_uuid
);
}
}
}
} else {
scene_done = true;
}
} }
let audio_tracks: Vec<serde_json::Value> = temp_probe_json let audio_tracks: Vec<serde_json::Value> = temp_probe_json
@@ -584,9 +558,9 @@ async fn register_single_file(
} }
} }
let _ = sqlx::query( let _ = sqlx::query(
&format!("UPDATE {} SET cut_done = $1, scene_done = $2, audio_tracks = $3, cut_count = $4, cut_max_duration = $5 WHERE file_uuid = $6", videos_table) &format!("UPDATE {} SET cut_done = $1, scene_done = false, audio_tracks = $3, cut_count = $4, cut_max_duration = $5 WHERE file_uuid = $6", videos_table)
) )
.bind(cut_done).bind(scene_done).bind(&audio_tracks_json).bind(cut_count).bind(cut_max_duration).bind(&file_uuid) .bind(cut_done).bind(&audio_tracks_json).bind(cut_count).bind(cut_max_duration).bind(&file_uuid)
.execute(db.pool()).await; .execute(db.pool()).await;
if let Some(json_val) = probe_json { if let Some(json_val) = probe_json {
@@ -599,41 +573,6 @@ async fn register_single_file(
let _ = std::fs::write(&probe_path, json_str); let _ = std::fs::write(&probe_path, json_str);
} }
if final_file_type.as_deref() == Some("video") {
let auto_file_uuid = file_uuid.clone();
let auto_db = db.clone();
tokio::spawn(async move {
let identities_dir =
std::path::Path::new(&*crate::core::config::OUTPUT_DIR).join("identities");
let index_path = identities_dir.join("_index.json");
let cache_path = format!(
"{}/{}.tmdb.json",
*crate::core::config::OUTPUT_DIR,
auto_file_uuid
);
let cache_file = std::path::Path::new(&cache_path);
if index_path.exists() && cache_file.exists() {
tracing::info!(
"[AUTO-TMDB] Offline cache found for {}, running probe",
auto_file_uuid
);
if let Err(e) =
crate::core::tmdb::probe::probe_from_cache(&auto_db, &auto_file_uuid).await
{
tracing::warn!("[AUTO-TMDB] Probe failed for {}: {}", auto_file_uuid, e);
} else {
tracing::info!("[AUTO-TMDB] Probe completed for {}", auto_file_uuid);
}
} else {
tracing::info!(
"[AUTO-TMDB] No offline cache for {}, skipping",
auto_file_uuid
);
}
});
}
RegisterFileResponse { RegisterFileResponse {
success: true, success: true,
file_uuid, file_uuid,
@@ -978,8 +917,16 @@ struct UnregisterResponse {
deleted_chunks: u64, deleted_chunks: u64,
deleted_tkg_nodes: u64, deleted_tkg_nodes: u64,
deleted_qdrant_vectors: Option<u64>, deleted_qdrant_vectors: Option<u64>,
deleted_qdrant_workspace: Option<u64>,
deleted_redis_keys: Option<u64>, deleted_redis_keys: Option<u64>,
deleted_output_files: u64, deleted_output_files: u64,
deleted_file_identities: u64,
deleted_speaker_detections: u64,
deleted_face_clusters: u64,
deleted_face_recognition_results: u64,
deleted_characters: u64,
deleted_chunks_rule1: u64,
deleted_processor_alerts: u64,
} }
#[derive(Debug, Deserialize)] #[derive(Debug, Deserialize)]
@@ -1011,6 +958,15 @@ fn delete_output_files(uuid: &str) -> u64 {
} }
} }
} }
let workspace_sqlite = format!("{}.workspace.sqlite", uuid);
for output_dir in &output_dirs {
let path = std::path::Path::new(output_dir).join(&workspace_sqlite);
if path.exists() && std::fs::remove_file(&path).is_ok() {
deleted_count += 1;
tracing::info!("[UNREGISTER] Deleted workspace SQLite: {}", path.display());
}
}
deleted_count deleted_count
} }
@@ -1037,6 +993,13 @@ async fn unregister(
let chunk_vectors_table = schema::table_name("chunk_vectors"); let chunk_vectors_table = schema::table_name("chunk_vectors");
let monitor_jobs_table = schema::table_name("monitor_jobs"); let monitor_jobs_table = schema::table_name("monitor_jobs");
let frames_table = schema::table_name("frames"); let frames_table = schema::table_name("frames");
let file_identities_table = schema::table_name("file_identities");
let speaker_detections_table = schema::table_name("speaker_detections");
let face_clusters_table = schema::table_name("face_clusters");
let face_recognition_results_table = schema::table_name("face_recognition_results");
let characters_table = schema::table_name("characters");
let chunks_rule1_table = schema::table_name("chunks_rule1");
let processor_alerts_table = schema::table_name("processor_alerts");
let mut tx = state.db.pool().begin().await.map_err(|e| { let mut tx = state.db.pool().begin().await.map_err(|e| {
tracing::error!("[unregister] Failed to start transaction: {}", e); tracing::error!("[unregister] Failed to start transaction: {}", e);
@@ -1082,6 +1045,21 @@ async fn unregister(
})? })?
.rows_affected() as i64; .rows_affected() as i64;
let deleted_file_identities =
delete_safe!(file_identities_table, "file_uuid = $1", &uuid, "file identities");
let deleted_speaker_detections =
delete_safe!(speaker_detections_table, "file_uuid = $1", &uuid, "speaker detections");
let deleted_face_clusters =
delete_safe!(face_clusters_table, "file_uuid = $1", &uuid, "face clusters");
let deleted_face_recognition =
delete_safe!(face_recognition_results_table, "file_uuid = $1", &uuid, "face recognition results");
let deleted_characters =
delete_safe!(characters_table, "file_uuid = $1", &uuid, "characters");
let deleted_chunks_rule1 =
delete_safe!(chunks_rule1_table, "uuid = $1", &uuid, "chunks rule1");
let deleted_processor_alerts =
delete_safe!(processor_alerts_table, "file_uuid = $1", &uuid, "processor alerts");
sqlx::query(&format!( sqlx::query(&format!(
"DELETE FROM {} WHERE file_uuid = $1", "DELETE FROM {} WHERE file_uuid = $1",
videos_table videos_table
@@ -1100,10 +1078,13 @@ async fn unregister(
})?; })?;
tracing::info!( tracing::info!(
"[UNREGISTER] Deleted: {} faces, {} processors, {} parent_chunks, {} chunks, {} pre_chunks, {} tkg_nodes, {} cuts, {} strangers, {} chunk_vectors, {} monitor_jobs, {} frames", "[UNREGISTER] Deleted: {} faces, {} processors, {} parent_chunks, {} chunks, {} pre_chunks, {} tkg_nodes, {} cuts, {} strangers, {} chunk_vectors, {} monitor_jobs, {} frames, {} file_identities, {} speaker_detections, {} face_clusters, {} face_recognition_results, {} characters, {} chunks_rule1, {} processor_alerts",
deleted_faces, deleted_processors, deleted_parent_chunks, deleted_chunks, deleted_faces, deleted_processors, deleted_parent_chunks, deleted_chunks,
deleted_pre_chunks, deleted_tkg_nodes, deleted_cuts, deleted_strangers, deleted_pre_chunks, deleted_tkg_nodes, deleted_cuts, deleted_strangers,
deleted_chunk_vectors, deleted_monitor_jobs, deleted_frames deleted_chunk_vectors, deleted_monitor_jobs, deleted_frames,
deleted_file_identities, deleted_speaker_detections, deleted_face_clusters,
deleted_face_recognition, deleted_characters, deleted_chunks_rule1,
deleted_processor_alerts
); );
let deleted_output_files = delete_output_files(&uuid); let deleted_output_files = delete_output_files(&uuid);
@@ -1141,6 +1122,20 @@ async fn unregister(
} }
}; };
let deleted_qdrant_workspace = {
let workspace = QdrantWorkspace::new();
match workspace.delete_by_file_uuid(&uuid).await {
Ok(_) => {
tracing::info!("[UNREGISTER] Deleted Qdrant workspace vectors for {}", uuid);
Some(1)
}
Err(e) => {
tracing::warn!("[UNREGISTER] Failed to delete Qdrant workspace vectors: {}", e);
None
}
}
};
Ok(Json(UnregisterResponse { Ok(Json(UnregisterResponse {
success: true, success: true,
message: format!("File {} unregistered successfully.", uuid), message: format!("File {} unregistered successfully.", uuid),
@@ -1150,8 +1145,16 @@ async fn unregister(
deleted_chunks: (deleted_chunks + deleted_parent_chunks + deleted_pre_chunks) as u64, deleted_chunks: (deleted_chunks + deleted_parent_chunks + deleted_pre_chunks) as u64,
deleted_tkg_nodes: deleted_tkg_nodes as u64, deleted_tkg_nodes: deleted_tkg_nodes as u64,
deleted_qdrant_vectors, deleted_qdrant_vectors,
deleted_qdrant_workspace,
deleted_redis_keys, deleted_redis_keys,
deleted_output_files, deleted_output_files,
deleted_file_identities: deleted_file_identities as u64,
deleted_speaker_detections: deleted_speaker_detections as u64,
deleted_face_clusters: deleted_face_clusters as u64,
deleted_face_recognition_results: deleted_face_recognition as u64,
deleted_characters: deleted_characters as u64,
deleted_chunks_rule1: deleted_chunks_rule1 as u64,
deleted_processor_alerts: deleted_processor_alerts as u64,
})) }))
} }

View File

@@ -1,807 +0,0 @@
use axum::{
extract::State,
http::StatusCode,
response::Json,
routing::{get, post},
Router,
};
use serde::{Deserialize, Serialize};
use crate::core::llm::function_calling::LLM_CLIENT;
use sqlx::Row;
use crate::api::types::AppState;
use crate::core::db::qdrant_db::QdrantDb;
use crate::core::db::schema;
use crate::core::db::{PostgresDb, VectorPayload};
use crate::core::embedding::Embedder;
pub fn five_w1h_agent_routes() -> Router<AppState> {
Router::new()
.route("/api/v1/agents/5w1h/analyze", post(analyze_5w1h))
.route("/api/v1/agents/5w1h/batch", post(batch_analyze_5w1h))
.route("/api/v1/agents/5w1h/status", get(get_5w1h_status))
}
// ── Data Structures ──
#[derive(Debug, Deserialize)]
pub struct Analyze5W1HRequest {
pub file_uuid: String,
}
#[derive(Debug, Serialize)]
pub struct Analyze5W1HResponse {
pub success: bool,
pub file_uuid: String,
pub scenes_processed: usize,
pub scenes_total: usize,
}
#[derive(Debug, Deserialize)]
pub struct BatchAnalyze5W1HRequest {
pub file_uuids: Vec<String>,
}
#[derive(Debug, Serialize)]
pub struct BatchAnalyze5W1HResponse {
pub success: bool,
pub jobs: Vec<BatchJobStatus>,
}
#[derive(Debug, Serialize)]
pub struct BatchJobStatus {
pub file_uuid: String,
pub status: String,
pub message: String,
}
#[derive(Debug, Clone)]
struct CutScene {
chunk_id: String,
start_frame: i64,
end_frame: i64,
fps: f64,
start_time: f64,
end_time: f64,
content: serde_json::Value,
metadata: serde_json::Value,
summary_text: Option<String>,
}
#[derive(Debug, Clone)]
struct SentenceChunk {
chunk_id: String,
text: String,
start_time: f64,
end_time: f64,
start_frame: i64,
end_frame: i64,
content: serde_json::Value,
}
#[derive(Debug)]
struct ChildSummary {
chunk_id: String,
enhanced: String,
five_w1h: serde_json::Value,
}
#[derive(Debug)]
struct SceneSummaryResult {
parent_summary: String,
five_w1h: serde_json::Value,
child_summaries: Vec<ChildSummary>,
}
// ── LLM Endpoint ──
fn llm_base_url() -> String {
crate::core::config::llm::SUMMARY_URL.clone()
}
fn llm_model() -> String {
crate::core::config::llm::SUMMARY_MODEL.clone()
}
// ── Data Fetching ──
async fn fetch_cut_scenes(db: &PostgresDb, file_uuid: &str) -> anyhow::Result<Vec<CutScene>> {
let table = schema::table_name("chunk");
sqlx::query_as::<_, (String, i64, i64, f64, Option<f64>, Option<f64>, serde_json::Value, Option<serde_json::Value>, Option<String>)>(&format!(
r#"SELECT chunk_id, start_frame, end_frame, fps, start_time, end_time, content, metadata, summary_text
FROM {} WHERE file_uuid = $1 AND chunk_type = 'cut' ORDER BY start_frame"#, table
))
.bind(file_uuid)
.fetch_all(db.pool()).await?
.into_iter().map(|r| Ok(CutScene {
chunk_id: r.0, start_frame: r.1, end_frame: r.2,
fps: r.3, start_time: r.4.unwrap_or(0.0), end_time: r.5.unwrap_or(0.0),
content: r.6, metadata: r.7.unwrap_or(serde_json::json!({})), summary_text: r.8,
})).collect()
}
async fn fetch_sentences_in_scene(
db: &PostgresDb,
file_uuid: &str,
cut: &CutScene,
) -> anyhow::Result<Vec<SentenceChunk>> {
let table = schema::table_name("chunk");
sqlx::query_as::<_, (String, String, Option<f64>, Option<f64>, i64, i64, serde_json::Value)>(&format!(
r#"SELECT chunk_id, COALESCE(text_content,''), start_time, end_time, start_frame, end_frame, content
FROM {} WHERE file_uuid = $1 AND chunk_type = 'sentence'
AND start_time >= $2 AND end_time <= $3 ORDER BY start_time"#, table
))
.bind(file_uuid).bind(cut.start_time).bind(cut.end_time)
.fetch_all(db.pool()).await?
.into_iter().map(|r| Ok(SentenceChunk {
chunk_id: r.0, text: r.1, start_time: r.2.unwrap_or(0.0), end_time: r.3.unwrap_or(0.0),
start_frame: r.4, end_frame: r.5, content: r.6,
})).collect()
}
/// Fetch actor names present in this scene from face_detections + identity_bindings + identities
async fn fetch_identity_names_for_scene(
db: &PostgresDb,
file_uuid: &str,
cut: &CutScene,
) -> anyhow::Result<Vec<String>> {
let fd_table = schema::table_name("face_detections");
let ib_table = schema::table_name("identity_bindings");
let id_table = schema::table_name("identities");
let rows = sqlx::query_scalar::<_, String>(&format!(
r#"SELECT DISTINCT i.name
FROM {} fd
JOIN {} ib ON ib.identity_value = fd.trace_id::text AND ib.identity_type = 'trace'
JOIN {} i ON i.id = ib.identity_id
WHERE fd.file_uuid = $1 AND fd.frame_number >= $2 AND fd.frame_number <= $3
AND fd.trace_id IS NOT NULL
ORDER BY i.name"#,
fd_table, ib_table, id_table
))
.bind(file_uuid)
.bind(cut.start_frame)
.bind(cut.end_frame)
.fetch_all(db.pool())
.await?;
Ok(rows)
}
/// Fetch YOLO object labels detected in this scene from pre_chunks
async fn fetch_yolo_objects_for_scene(
db: &PostgresDb,
file_uuid: &str,
cut: &CutScene,
) -> anyhow::Result<Vec<String>> {
let table = schema::table_name("pre_chunks");
let rows = sqlx::query_scalar::<_, String>(&format!(
r#"SELECT DISTINCT data->>'label'
FROM {} WHERE file_uuid = $1 AND processor_type = 'yolo'
AND frame_number >= $2 AND frame_number <= $3
AND data->>'label' IS NOT NULL
ORDER BY data->>'label'"#,
table
))
.bind(file_uuid)
.bind(cut.start_frame)
.bind(cut.end_frame)
.fetch_all(db.pool())
.await?;
Ok(rows)
}
/// Fetch active speakers + their actor names for a scene's frame range
/// Uses identity_bindings to map SPEAKER_X to actor names
async fn fetch_speakers_for_scene(
db: &PostgresDb,
file_uuid: &str,
cut: &CutScene,
) -> anyhow::Result<Vec<String>> {
let pc_table = schema::table_name("pre_chunks");
let speakers = sqlx::query_scalar::<_, String>(&format!(
r#"SELECT DISTINCT data->>'speaker_id'
FROM {} WHERE file_uuid = $1 AND processor_type = 'asrx'
AND data->>'speaker_id' IS NOT NULL
AND start_frame <= $3 AND end_frame >= $2
ORDER BY data->>'speaker_id'"#,
pc_table
))
.bind(file_uuid)
.bind(cut.start_frame)
.bind(cut.end_frame)
.fetch_all(db.pool())
.await?;
if speakers.is_empty() {
return Ok(vec![]);
}
// Map speaker_ids to actor names via identity_bindings
let ib_table = schema::table_name("identity_bindings");
let id_table = schema::table_name("identities");
let mut result = Vec::new();
for spk in &speakers {
let name: Option<String> = sqlx::query_scalar(&format!(
r#"SELECT i.name FROM {} ib JOIN {} i ON i.id = ib.identity_id
WHERE ib.identity_type = 'speaker' AND ib.identity_value = $1 AND i.name IS NOT NULL
LIMIT 1"#,
ib_table, id_table
))
.bind(spk)
.fetch_optional(db.pool())
.await?;
match name {
Some(n) => result.push(format!("{} ({})", spk, n)),
None => result.push(spk.clone()),
}
}
Ok(result)
}
/// Fetch trace IDs with identity names for a scene's frame range
async fn fetch_trace_info(
db: &PostgresDb,
file_uuid: &str,
cut: &CutScene,
) -> anyhow::Result<Vec<String>> {
let fd_table = schema::table_name("face_detections");
let ib_table = schema::table_name("identity_bindings");
let id_table = schema::table_name("identities");
let rows = sqlx::query_as::<_, (i32, Option<String>)>(&format!(
r#"SELECT DISTINCT fd.trace_id, i.name
FROM {} fd
LEFT JOIN {} ib ON ib.identity_value = fd.trace_id::text AND ib.identity_type = 'trace'
LEFT JOIN {} i ON i.id = ib.identity_id
WHERE fd.file_uuid = $1 AND fd.frame_number >= $2 AND fd.frame_number <= $3
AND fd.trace_id IS NOT NULL
ORDER BY fd.trace_id"#,
fd_table, ib_table, id_table
))
.bind(file_uuid)
.bind(cut.start_frame)
.bind(cut.end_frame)
.fetch_all(db.pool())
.await?;
Ok(rows
.iter()
.map(|(trace, name)| {
if let Some(n) = name {
format!("trace_{} ({})", trace, n)
} else {
format!("trace_{}", trace)
}
})
.collect())
}
// ── LLM Prompt (Embedding-Optimized) ──
async fn summarize_one_scene(
db: &PostgresDb,
file_uuid: &str,
cut: &CutScene,
sentences: &[SentenceChunk],
prev_context: &str,
) -> anyhow::Result<SceneSummaryResult> {
if sentences.is_empty() {
return Ok(SceneSummaryResult {
parent_summary: String::new(),
five_w1h: serde_json::Value::Null,
child_summaries: vec![],
});
}
let faces = fetch_identity_names_for_scene(db, file_uuid, cut)
.await
.unwrap_or_default();
let objects = fetch_yolo_objects_for_scene(db, file_uuid, cut)
.await
.unwrap_or_default();
let traces = fetch_trace_info(db, file_uuid, cut)
.await
.unwrap_or_default();
let speakers = fetch_speakers_for_scene(db, file_uuid, cut)
.await
.unwrap_or_default();
let mut dialogue = String::new();
for (i, s) in sentences.iter().enumerate() {
let t = s.text.trim();
if !t.is_empty() {
dialogue.push_str(&format!("[{}] {}\n", i + 1, t));
}
}
let story_so_far = if prev_context.is_empty() {
String::new()
} else {
format!("\nStory so far (previous scenes):\n{}\n", prev_context)
};
let prompt = format!(
r#"Analyze this movie scene and produce a structured summary. Be specific — quote actual dialogue. Avoid template phrases like "within the established dramatic setting."
Scene time: {:.0}s{:.0}s
Dialogue:
{}Actors: {}
Objects: {}
Face traces: {}
Speakers: {}
{}
Output EXACTLY this JSON format:
{{
"scene_summary": "5 flowing sentences: who+what+where+when+why+how. Quote actual lines.",
"5w1h": {{
"who": "1 sentence with actor/character name",
"what": "1 sentence describing the action, quote the line",
"where": "1 sentence about setting",
"when": "1 sentence about timing in story",
"why": "1 sentence explaining why this moment matters",
"how": "1 sentence about delivery, emotion, tone"
}},
"sentences": [
{{
"index": 1,
"who": "1 sentence",
"what": "1 sentence referencing the actual line",
"where": "1 sentence",
"when": "1 sentence",
"why": "1 sentence why this is said",
"how": "1 sentence describing delivery",
"enhanced": "1 sentence with actual dialogue, self-contained for search"
}}
]
}}
Rules:
- scene_summary: 5 sentences, natural paragraph. Use quotes. No template phrases.
- Each 5w1h field: exactly 1 sentence. Specific details. Character names. Quotes.
- Each sentence.enhanced: self-contained for search, include actual spoken words.
- Return ONLY valid JSON. No markdown.
- A short scene with 1-2 lines should have a short summary."#,
cut.start_time,
cut.end_time,
dialogue,
faces.join(", "),
objects.join(", "),
traces.join(", "),
speakers.join(", "),
story_so_far,
);
let body = serde_json::json!({
"model": llm_model(),
"messages": [
{"role": "system", "content": "You output JSON only. Be specific. Quote actual dialogue. Avoid template phrases."},
{"role": "user", "content": prompt}
],
"temperature": 0.1,
"max_tokens": 4096,
"stream": false
});
let resp = LLM_CLIENT
.post(llm_base_url())
.json(&body)
.timeout(std::time::Duration::from_secs(180))
.send()
.await?
.json::<serde_json::Value>()
.await?;
let content = resp["choices"][0]["message"]["content"]
.as_str()
.unwrap_or("{}");
// Strip markdown code fences if present
let cleaned = content
.trim_start_matches("```json")
.trim_start_matches("```")
.trim_end_matches("```")
.trim();
let parsed: serde_json::Value =
serde_json::from_str(cleaned).unwrap_or(serde_json::Value::Null);
let parent_summary = parsed["scene_summary"].as_str().unwrap_or("").to_string();
let five_w1h = parsed
.get("5w1h")
.cloned()
.unwrap_or(serde_json::Value::Null);
let mut child_summaries = Vec::new();
if let Some(arr) = parsed["sentences"].as_array() {
for entry in arr {
let idx = entry["index"].as_u64().unwrap_or(0).saturating_sub(1) as usize;
if let Some(enhanced) = entry["enhanced"].as_str() {
if idx < sentences.len() {
let child_5w1h = serde_json::json!({
"who": entry["who"].as_str().unwrap_or(""),
"what": entry["what"].as_str().unwrap_or(""),
"where": entry["where"].as_str().unwrap_or(""),
"when": entry["when"].as_str().unwrap_or(""),
"why": entry["why"].as_str().unwrap_or(""),
"how": entry["how"].as_str().unwrap_or(""),
});
child_summaries.push(ChildSummary {
chunk_id: sentences[idx].chunk_id.clone(),
enhanced: enhanced.to_string(),
five_w1h: child_5w1h,
});
}
}
}
}
// Fallback
if child_summaries.is_empty() && !parent_summary.is_empty() {
for s in sentences {
let text = s.text.trim();
if !text.is_empty() {
child_summaries.push(ChildSummary {
chunk_id: s.chunk_id.clone(),
enhanced: format!("{} Scene: {}", text, parent_summary),
five_w1h: serde_json::Value::Null,
});
}
}
}
Ok(SceneSummaryResult {
parent_summary,
five_w1h,
child_summaries,
})
}
// ── DB Storage ──
async fn store_parent_summary(
db: &PostgresDb,
cut_chunk_id: &str,
file_uuid: &str,
summary: &str,
five_w1h: &serde_json::Value,
sentences: &[SentenceChunk],
) -> anyhow::Result<()> {
let table = schema::table_name("chunk");
let meta = serde_json::json!({
"5w1h": five_w1h,
"sentence_ids": sentences.iter().map(|s| s.chunk_id.clone()).collect::<Vec<_>>(),
"sentence_count": sentences.len(),
});
sqlx::query(&format!(
r#"UPDATE {} SET summary_text = $1, metadata = jsonb_deep_merge(COALESCE(metadata, '{{}}'::jsonb), $2::jsonb)
WHERE chunk_id = $3 AND file_uuid = $4"#,
table
))
.bind(summary)
.bind(&meta)
.bind(cut_chunk_id)
.bind(file_uuid)
.execute(db.pool())
.await?;
Ok(())
}
async fn store_child_summaries(
db: &PostgresDb,
file_uuid: &str,
children: &[ChildSummary],
) -> anyhow::Result<()> {
let table = schema::table_name("chunk");
for c in children {
let text = c.enhanced.trim();
if text.is_empty() || text.len() < 10 {
continue;
}
// Update text_content (for embedding) + merge 5w1h into content
let merge = serde_json::json!({ "5w1h": c.five_w1h });
sqlx::query(&format!(
r#"UPDATE {} SET text_content = $1, content = content || $2::jsonb, embedding = NULL
WHERE chunk_id = $3 AND file_uuid = $4"#,
table
))
.bind(text)
.bind(&merge)
.bind(&c.chunk_id)
.bind(file_uuid)
.execute(db.pool())
.await?;
}
Ok(())
}
// ── API Handlers ──
async fn analyze_5w1h(
State(state): State<AppState>,
Json(req): Json<Analyze5W1HRequest>,
) -> Result<Json<Analyze5W1HResponse>, (StatusCode, String)> {
let db = PostgresDb::from_pool(state.db.pool().clone());
let cuts = fetch_cut_scenes(&db, &req.file_uuid)
.await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?;
let total = cuts.len();
let mut processed = 0usize;
let mut prev_context: Vec<String> = Vec::new();
for cut in &cuts {
// Skip already-summarized scenes but preserve context
if let Some(ref t) = cut.summary_text {
if t.len() > 20 {
processed += 1;
prev_context.push(format!("Scene (t={:.0}s): {}", cut.start_time, t));
continue;
}
}
let sentences = match fetch_sentences_in_scene(&db, &req.file_uuid, cut).await {
Ok(s) => s,
Err(e) => {
tracing::error!("[5W1H] fetch sentences failed: {}", e);
continue;
}
};
if sentences.is_empty() {
continue;
}
let context = prev_context.join("\n");
let result = match summarize_one_scene(&db, &req.file_uuid, cut, &sentences, &context).await
{
Ok(r) => r,
Err(e) => {
tracing::error!("[5W1H] scene {} failed: {}", cut.chunk_id, e);
processed += 1;
continue;
}
};
if !result.parent_summary.is_empty() {
if let Err(e) = store_parent_summary(
&db,
&cut.chunk_id,
&req.file_uuid,
&result.parent_summary,
&result.five_w1h,
&sentences,
)
.await
{
tracing::error!("[5W1H] parent: {}", e);
}
if let Err(e) =
store_child_summaries(&db, &req.file_uuid, &result.child_summaries).await
{
tracing::error!("[5W1H] child: {}", e);
}
prev_context.push(format!(
"Scene (t={:.0}s): {}",
cut.start_time, result.parent_summary
));
}
processed += 1;
}
Ok(Json(Analyze5W1HResponse {
success: true,
file_uuid: req.file_uuid,
scenes_processed: processed,
scenes_total: total,
}))
}
async fn batch_analyze_5w1h(
State(state): State<AppState>,
Json(req): Json<BatchAnalyze5W1HRequest>,
) -> Result<Json<BatchAnalyze5W1HResponse>, (StatusCode, String)> {
let db = PostgresDb::from_pool(state.db.pool().clone());
let mut jobs = Vec::new();
for uuid in &req.file_uuids {
let cuts = fetch_cut_scenes(&db, uuid).await.unwrap_or_default();
let total = cuts.len();
let mut processed = 0usize;
let mut prev_context: Vec<String> = Vec::new();
for cut in &cuts {
if let Some(ref t) = cut.summary_text {
if t.len() > 20 {
processed += 1;
prev_context.push(format!("Scene (t={:.0}s): {}", cut.start_time, t));
continue;
}
}
let sentences = fetch_sentences_in_scene(&db, uuid, cut)
.await
.unwrap_or_default();
if sentences.is_empty() {
continue;
}
let context = prev_context.join("\n");
if let Ok(result) = summarize_one_scene(&db, uuid, cut, &sentences, &context).await {
if !result.parent_summary.is_empty() {
let _ = store_parent_summary(
&db,
&cut.chunk_id,
uuid,
&result.parent_summary,
&result.five_w1h,
&sentences,
)
.await;
let _ = store_child_summaries(&db, uuid, &result.child_summaries).await;
prev_context.push(format!(
"Scene (t={:.0}s): {}",
cut.start_time, result.parent_summary
));
}
}
processed += 1;
}
jobs.push(BatchJobStatus {
file_uuid: uuid.clone(),
status: if processed > 0 {
"completed".to_string()
} else {
"no_cut_scenes".to_string()
},
message: format!("{}/{} scenes processed", processed, total),
});
}
Ok(Json(BatchAnalyze5W1HResponse {
success: true,
jobs,
}))
}
async fn get_5w1h_status(
State(state): State<AppState>,
) -> Result<Json<serde_json::Value>, (StatusCode, String)> {
let table = schema::table_name("videos");
let rows = sqlx::query(&format!(
r#"SELECT file_uuid, processing_status->'agents'->'five_w1h' as s
FROM {} WHERE processing_status->'agents'->'five_w1h' IS NOT NULL
ORDER BY updated_at DESC LIMIT 50"#,
table
))
.fetch_all(state.db.pool())
.await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?;
let videos: Vec<serde_json::Value> = rows
.iter()
.map(|r| {
serde_json::json!({
"uuid": r.try_get::<String,_>("file_uuid").unwrap_or_default(),
"five_w1h_status": r.try_get::<Option<serde_json::Value>,_>("s").ok().flatten(),
})
})
.collect();
Ok(Json(
serde_json::json!({ "success": true, "videos": videos }),
))
}
/// Pipeline-triggered entry point: run 5W1H agent for a file.
pub async fn run_5w1h_agent(db: &PostgresDb, file_uuid: &str) -> anyhow::Result<()> {
let cuts = fetch_cut_scenes(db, file_uuid).await?;
let total = cuts.len();
let mut processed = 0usize;
let mut prev_context: Vec<String> = Vec::new();
for cut in &cuts {
if let Some(ref t) = cut.summary_text {
if t.len() > 20 {
processed += 1;
prev_context.push(format!("Scene (t={:.0}s): {}", cut.start_time, t));
continue;
}
}
let sentences = fetch_sentences_in_scene(db, file_uuid, cut).await?;
if sentences.is_empty() {
continue;
}
let context = prev_context.join("\n");
match summarize_one_scene(db, file_uuid, cut, &sentences, &context).await {
Ok(result) => {
if !result.parent_summary.is_empty() {
let _ = store_parent_summary(
db,
&cut.chunk_id,
file_uuid,
&result.parent_summary,
&result.five_w1h,
&sentences,
)
.await;
let _ = store_child_summaries(db, file_uuid, &result.child_summaries).await;
prev_context.push(format!(
"Scene (t={:.0}s): {}",
cut.start_time, result.parent_summary
));
}
processed += 1;
}
Err(e) => tracing::error!("[5W1H] Scene {} failed: {}", cut.chunk_id, e),
}
}
tracing::info!(
"[5W1H] Done for {}: {}/{} scenes",
file_uuid,
processed,
total
);
// Auto-vectorize sentences with EmbeddingGemma (768D)
tracing::info!("[5W1H] Starting vectorize for sentence chunks...");
let embedder = Embedder::new("embeddinggemma-300m".to_string());
let qdrant = QdrantDb::new();
qdrant.init_collection(768).await?;
let chunk_table = schema::table_name("chunk");
let rows = sqlx::query_as::<_, (String, String, String, i64, i64, f64, f64)>(&format!(
"SELECT chunk_id, chunk_type, text_content, start_frame, end_frame, start_time, end_time \
FROM {} WHERE file_uuid = $1 AND chunk_type = 'sentence' AND embedding IS NULL \
AND (text_content IS NOT NULL AND text_content != '') ORDER BY id",
chunk_table
))
.bind(file_uuid)
.fetch_all(db.pool())
.await?;
let total_vec = rows.len();
let mut stored = 0usize;
for (chunk_id, _ctype, text, start_frame, end_frame, start_time, end_time) in &rows {
let text = text.trim();
if text.is_empty() || text.len() < 5 {
continue;
}
match embedder.embed_document(text).await {
Ok(vector) => {
if let Err(e) = sqlx::query(&format!(
"UPDATE {} SET embedding = $1::vector WHERE chunk_id = $2 AND file_uuid = $3",
chunk_table
))
.bind(&vector as &[f32])
.bind(chunk_id)
.bind(file_uuid)
.execute(db.pool())
.await
{
tracing::error!("[Vectorize] PG failed for {}: {}", chunk_id, e);
continue;
}
let payload = VectorPayload {
file_uuid: file_uuid.to_string(),
chunk_id: chunk_id.clone(),
chunk_type: "sentence".to_string(),
start_frame: *start_frame,
end_frame: *end_frame,
start_time: *start_time,
end_time: *end_time,
text: Some(text.to_string()),
};
if let Err(e) = qdrant.upsert_vector(chunk_id, &vector, payload).await {
tracing::error!("[Vectorize] Qdrant failed for {}: {}", chunk_id, e);
continue;
}
stored += 1;
if stored % 50 == 0 {
tracing::info!("[Vectorize] {}/{}", stored, total_vec);
}
}
Err(e) => tracing::error!("[Vectorize] Embed failed for {}: {}", chunk_id, e),
}
}
tracing::info!("[5W1H] Vectorize done: {}/{} stored", stored, total_vec);
Ok(())
}

View File

@@ -180,11 +180,11 @@ async fn list_identities(
})?; })?;
let sql = format!( let sql = format!(
"SELECT id::int, uuid, name, metadata FROM {} WHERE status IS NULL OR status != 'merged' ORDER BY id DESC LIMIT $1 OFFSET $2", "SELECT id::int, uuid, name, metadata, status, starred FROM {} WHERE status IS NULL OR status != 'merged' ORDER BY id DESC LIMIT $1 OFFSET $2",
id_table id_table
); );
let rows: Vec<(i32, uuid::Uuid, String, Option<serde_json::Value>)> = match sqlx::query_as(&sql) let rows: Vec<(i32, uuid::Uuid, String, Option<serde_json::Value>, Option<String>, Option<bool>)> = match sqlx::query_as(&sql)
.bind(page_size as i64) .bind(page_size as i64)
.bind(offset) .bind(offset)
.fetch_all(db.pool()) .fetch_all(db.pool())
@@ -201,11 +201,16 @@ let sql = format!(
let identities: Vec<IdentityResponse> = rows let identities: Vec<IdentityResponse> = rows
.into_iter() .into_iter()
.map(|r| IdentityResponse { .map(|r| {
id: r.0, IdentityResponse {
identity_uuid: r.1.to_string().replace('-', ""), id: r.0,
name: r.2, identity_uuid: r.1.to_string().replace('-', ""),
metadata: r.3, name: r.2,
metadata: r.3,
status: r.4,
starred: r.5.unwrap_or(false),
file_uuids: vec![], // Removed N+1 query
}
}) })
.collect(); .collect();
@@ -281,6 +286,9 @@ pub struct IdentityResponse {
pub identity_uuid: String, pub identity_uuid: String,
pub name: String, pub name: String,
pub metadata: Option<serde_json::Value>, pub metadata: Option<serde_json::Value>,
pub status: Option<String>,
pub starred: bool,
pub file_uuids: Vec<String>,
} }
#[derive(Debug, Serialize)] #[derive(Debug, Serialize)]

View File

@@ -661,597 +661,21 @@ fn average_embeddings<'a>(embeddings: impl Iterator<Item = &'a Vec<f32>>) -> Vec
/// Unknown: greedy stranger clustering (TH=0.40) /// Unknown: greedy stranger clustering (TH=0.40)
/// Writes identity_ref/stranger_ref to Qdrant payload, TKG nodes, and face_detections. /// Writes identity_ref/stranger_ref to Qdrant payload, TKG nodes, and face_detections.
async fn match_faces_iterative(pool: &sqlx::PgPool, file_uuid: &str) -> anyhow::Result<usize> { async fn match_faces_iterative(pool: &sqlx::PgPool, file_uuid: &str) -> anyhow::Result<usize> {
use crate::core::db::face_embedding_db::FaceEmbeddingDb; tracing::warn!(
use std::collections::HashMap; "[FaceMatch] Face matching disabled - FaceEmbeddingDb removed. \
TODO: Reimplement with _faces collection for {}",
let face_db = FaceEmbeddingDb::new();
// Step 1: Load seeds from Qdrant (type=identity_seed)
let seeds = face_db.get_seed_embeddings().await?;
tracing::info!(
"[FaceMatch] Loaded {} seeds from Qdrant",
seeds.len()
);
// Step 2: Preload identity internal IDs (uuid → (id, name))
let id_table = schema::table_name("identities");
let seed_identity_map: HashMap<String, (i32, String)> = if !seeds.is_empty() {
let uuids: Vec<String> = seeds.iter().map(|(uuid, _, _)| uuid.clone()).collect();
if uuids.is_empty() {
HashMap::new()
} else {
let rows = sqlx::query_as::<_, (i32, String, String)>(&format!(
"SELECT id, uuid::text, name FROM {} WHERE uuid::text = ANY($1)",
id_table
))
.bind(&uuids)
.fetch_all(pool)
.await?
.into_iter()
.map(|(id, uuid, name)| (uuid, (id, name)))
.collect();
rows
}
} else {
HashMap::new()
};
// Step 3: Load face embeddings from Qdrant for this file
let qdrant_embeddings = face_db.get_all_embeddings_for_file(file_uuid).await?;
if qdrant_embeddings.is_empty() {
tracing::warn!("[FaceMatch] No face embeddings in Qdrant for {}", file_uuid);
return Ok(0);
}
// Step 4: Group embeddings by trace_id, keeping confidence
let mut trace_faces: HashMap<i32, Vec<(i64, Vec<f32>, f64)>> = HashMap::new();
for (_, emb, payload) in &qdrant_embeddings {
trace_faces
.entry(payload.trace_id)
.or_default()
.push((payload.frame, emb.clone(), payload.confidence));
}
// Step 5: Progressive multi-round matching with derived seeds
// Each round: choose a face with best seed sim for matching; separately,
// collect the highest-confidence face per trace for building derived seeds.
const TH_MIN: f32 = 0.35;
const DERIVED_CONF: f64 = 0.90;
const MAX_DERIVED_PER_ID: usize = 9;
const MAX_FACES_PER_TRACE: usize = 3;
const ANGLE_SIM_THRESHOLD: f32 = 0.90;
const TH_STRANGER: f32 = 0.40;
let total_traces = trace_faces.len();
let total_embeddings: usize = trace_faces.values().map(|v| v.len()).sum();
tracing::info!(
"[FaceMatch] Loaded {} traces ({} face embeddings) from Qdrant for {}",
total_traces,
total_embeddings,
file_uuid file_uuid
); );
Ok(0)
let mut matched: HashMap<i32, (String, i32)> = HashMap::new();
let mut trace_face_count: HashMap<i32, usize> = HashMap::new();
// All reference embeddings: start with original TMDb seeds
let mut all_refs: Vec<(String, String, Vec<f32>)> = seeds.clone();
let thresholds = [0.55f32, 0.50, 0.45, 0.40, 0.35];
let mut prev_total = 0usize;
for (round_idx, &th) in thresholds.iter().enumerate() {
if th < TH_MIN {
break;
}
let mut new_matches: HashMap<i32, (String, i32)> = HashMap::new();
let mut seed_candidates: Vec<(i32, String, i32, Vec<f32>, f64)> = Vec::new();
for (&tid, faces) in &trace_faces {
if matched.contains_key(&tid) {
continue;
}
trace_face_count.entry(tid).or_insert(faces.len());
let mut best_sim = 0.0f32;
let mut best_name = String::new();
let mut best_id = 0i32;
// Collect all high-confidence faces in this trace for derived seeds
let mut trace_candidates: Vec<(Vec<f32>, f64)> = Vec::new();
for (_, emb, conf) in faces {
for (ref_uuid, ref_name, ref_emb) in &all_refs {
let s = cosine_similarity(emb, ref_emb);
if s > best_sim {
best_sim = s;
best_name = ref_name.clone();
if let Some(id_str) = ref_uuid.strip_prefix("derived:") {
if let Ok(parsed) = id_str.parse::<i32>() {
best_id = parsed;
}
} else if let Some((id, _)) = seed_identity_map.get(ref_uuid) {
best_id = *id;
}
}
}
if *conf >= DERIVED_CONF {
trace_candidates.push((emb.clone(), *conf));
}
}
if best_sim >= th && best_id > 0 {
new_matches.insert(tid, (best_name.clone(), best_id));
// Top MAX_FACES_PER_TRACE highest-confidence faces with angular diversity
trace_candidates.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
let mut selected: Vec<Vec<f32>> = Vec::new();
for (emb, conf) in trace_candidates {
if selected.len() >= MAX_FACES_PER_TRACE {
break;
}
if selected.iter().any(|e| cosine_similarity(e, &emb) >= ANGLE_SIM_THRESHOLD) {
continue;
}
selected.push(emb.clone());
seed_candidates.push((best_id, best_name.clone(), tid, emb, conf));
}
}
}
let new_count = new_matches.len();
if new_count == 0 && round_idx > 0 {
break;
}
matched.extend(new_matches);
// Build derived seeds: pick up to MAX_DERIVED_PER_ID per identity
// (max MAX_FACES_PER_TRACE from each trace), sorted by confidence descending
seed_candidates.sort_by(|a, b| b.4.partial_cmp(&a.4).unwrap());
let mut per_id: HashMap<i32, usize> = HashMap::new();
let mut trace_used_faces: HashMap<i32, usize> = HashMap::new();
let mut added_seeds = 0usize;
for (id, name, tid, emb, _) in &seed_candidates {
let cnt = per_id.entry(*id).or_insert(0);
if *cnt >= MAX_DERIVED_PER_ID {
continue;
}
let trace_cnt = trace_used_faces.entry(*tid).or_insert(0);
if *trace_cnt >= MAX_FACES_PER_TRACE {
continue;
}
*trace_cnt += 1;
*cnt += 1;
all_refs.push((format!("derived:{}", id), name.clone(), emb.clone()));
added_seeds += 1;
}
tracing::info!(
"[FaceMatch] Round {}: matched {}+{}={} total (TH={}, {} new derived seeds)",
round_idx + 1,
prev_total,
new_count,
matched.len(),
th,
added_seeds
);
prev_total = matched.len();
}
// Step 7: Stranger clustering for unmatched traces
let unmatched_ids: Vec<i32> = trace_faces
.keys()
.filter(|tid| !matched.contains_key(tid))
.copied()
.collect();
let mut stranger_map: HashMap<i32, String> = HashMap::new();
let mut assigned_stranger: std::collections::HashSet<i32> = std::collections::HashSet::new();
let mut stranger_count = 0usize;
// Sort by face count descending (most reliable first)
let mut sorted_unmatched: Vec<i32> = unmatched_ids.clone();
sorted_unmatched.sort_by(|a, b| {
trace_face_count
.get(b)
.unwrap_or(&0)
.cmp(trace_face_count.get(a).unwrap_or(&0))
});
for &tid in &sorted_unmatched {
if assigned_stranger.contains(&tid) {
continue;
}
let centroid_a = if let Some(faces) = trace_faces.get(&tid) {
average_embeddings(faces.iter().map(|(_, emb, _)| emb))
} else {
continue;
};
stranger_count += 1;
let stranger_id = format!("{}:stranger_{}", file_uuid, stranger_count);
assigned_stranger.insert(tid);
stranger_map.insert(tid, stranger_id.clone());
for &other_tid in &sorted_unmatched {
if assigned_stranger.contains(&other_tid) || other_tid == tid {
continue;
}
if let Some(faces_b) = trace_faces.get(&other_tid) {
let centroid_b = average_embeddings(faces_b.iter().map(|(_, emb, _)| emb));
let s = cosine_similarity(&centroid_a, &centroid_b);
if s >= TH_STRANGER {
assigned_stranger.insert(other_tid);
stranger_map.insert(other_tid, stranger_id.clone());
}
}
}
}
let stranger_trace_count = stranger_map.len();
tracing::info!(
"[FaceMatch] Stranger clusters: {} groups, {} traces",
stranger_count,
stranger_trace_count
);
// Step 8: Write results to TKG nodes + Qdrant payload + face_detections
let fd_table = schema::table_name("face_detections");
let nodes_table = schema::table_name("tkg_nodes");
let mut pg_updated = 0usize;
// Clear old identity assignments before writing new ones
let _ = sqlx::query(&format!(
"UPDATE {} SET identity_id = NULL WHERE file_uuid = $1",
fd_table
))
.bind(file_uuid)
.execute(pool)
.await;
// 8a: Matched traces → identity_ref
for (&tid, (name, identity_id)) in &matched {
// Skip if identity_id is invalid (FK constraint would fail)
if *identity_id <= 0 {
tracing::warn!(
"[FaceMatch] Skipping trace {}: invalid identity_id={}",
tid, identity_id
);
continue;
}
let identity_ref = format!("{}:{}", file_uuid, identity_id);
// TKG node
let external_id = format!("face_track_{}", tid);
if let Err(e) = sqlx::query(&format!(
"UPDATE {} SET properties = jsonb_set(\
jsonb_set(properties, '{{identity_ref}}', to_jsonb($1), true),\
'{{identity_name}}', to_jsonb($2), true)\
WHERE file_uuid = $3 AND node_type = 'face_track' AND external_id = $4",
nodes_table
))
.bind(&identity_ref)
.bind(name)
.bind(file_uuid)
.bind(&external_id)
.execute(pool)
.await
{
tracing::warn!("[FaceMatch] TKG update failed for trace {}: {:?}", tid, e);
}
// Qdrant payload
let _ = face_db
.update_identity_ref_by_trace(file_uuid, tid, &identity_ref)
.await;
// PostgreSQL face_detections (backward compat)
let rows = sqlx::query(&format!(
"UPDATE {} SET identity_id = $1 WHERE file_uuid = $2 AND trace_id = $3",
fd_table
))
.bind(identity_id)
.bind(file_uuid)
.bind(tid)
.execute(pool)
.await
.map(|r| r.rows_affected())
.unwrap_or(0);
pg_updated += rows as usize;
}
// 8b: Stranger traces → stranger_ref
for (&tid, stranger_ref) in &stranger_map {
// TKG node
let external_id = format!("face_track_{}", tid);
if let Err(e) = sqlx::query(&format!(
"UPDATE {} SET properties = jsonb_set(\
properties, '{{stranger_ref}}', to_jsonb($1), true)\
WHERE file_uuid = $2 AND node_type = 'face_track' AND external_id = $3",
nodes_table
))
.bind(stranger_ref)
.bind(file_uuid)
.bind(&external_id)
.execute(pool)
.await
{
tracing::warn!("[FaceMatch] TKG stranger update failed for trace {}: {:?}", tid, e);
}
// Qdrant payload
let _ = face_db
.update_stranger_ref_by_trace(file_uuid, tid, stranger_ref)
.await;
}
tracing::info!(
"[FaceMatch] Done: {} matched, {} strangers — {} face_detections updated",
matched.len(),
stranger_trace_count,
pg_updated
);
Ok(pg_updated)
} }
/// Fallback: PostgreSQL-based matching (original implementation) /// Fallback: PostgreSQL-based matching (disabled - embedding column removed)
async fn match_faces_iterative_pg(pool: &sqlx::PgPool, file_uuid: &str) -> anyhow::Result<usize> { async fn match_faces_iterative_pg(pool: &sqlx::PgPool, file_uuid: &str) -> anyhow::Result<usize> {
// Step 1: 載入 TMDb identities (source='tmdb' 且有 face_embedding) tracing::warn!(
let identities_table = schema::table_name("identities"); "[FaceMatch-PG] PostgreSQL matching disabled - embedding column removed for {}",
let tmdb_rows = sqlx::query_as::<_, (i32, String, Vec<f32>)>( file_uuid
&format!("SELECT id, name, face_embedding::real[] FROM {} WHERE source='tmdb' AND face_embedding IS NOT NULL", identities_table)
)
.fetch_all(pool).await?;
if tmdb_rows.is_empty() {
tracing::warn!("[FaceMatch-PG] No TMDb identities with face embeddings");
return Ok(0);
}
tracing::info!(
"[FaceMatch-PG] Loaded {} TMDb seed identities",
tmdb_rows.len()
); );
Ok(0)
// Step 2: 載入所有 face_detections含 frame_number按 trace_id 分組
let fd_table = schema::table_name("face_detections");
let fd_rows = sqlx::query_as::<_, (i32, i64, Vec<f32>)>(&format!(
"SELECT trace_id, frame_number, embedding FROM {} \
WHERE file_uuid=$1 AND trace_id IS NOT NULL AND embedding IS NOT NULL \
ORDER BY trace_id, frame_number",
fd_table
))
.bind(file_uuid)
.fetch_all(pool)
.await?;
if fd_rows.is_empty() {
tracing::warn!("[FaceMatch-PG] No face detections with embeddings");
return Ok(0);
}
// 分組trace_id → (frame_number, embedding)
use std::collections::HashMap;
let mut face_track_faces_raw: HashMap<i32, Vec<(i64, Vec<f32>)>> = HashMap::new();
for (tid, frame, emb) in &fd_rows {
face_track_faces_raw
.entry(*tid)
.or_insert_with(Vec::new)
.push((*frame, emb.clone()));
}
// 從每個 trace 選取不同角度的 3 個 face embedding
let mut face_track_samples: HashMap<i32, Vec<Vec<f32>>> = HashMap::new();
for (tid, mut faces) in face_track_faces_raw {
faces.sort_by_key(|(frame, _)| *frame);
let n = faces.len();
let indices = if n <= 3 {
(0..n).collect()
} else {
let mid = n / 2;
vec![0, mid, n - 1]
};
let samples: Vec<Vec<f32>> = indices.iter().map(|&i| faces[i].1.clone()).collect();
face_track_samples.insert(tid, samples);
}
let total_traces = face_track_samples.len();
let sample_count: usize = face_track_samples.values().map(|v| v.len()).sum();
tracing::info!(
"[FaceMatch-PG] Loaded {} traces, sampled {} embeddings (3-angle)",
total_traces,
sample_count
);
// Step 3: 建立 TMDb 查找表
let tmdb_seeds: Vec<(i32, String, Vec<f32>)> = tmdb_rows;
// Step 4: 迭代匹配
const TH: f32 = 0.50;
let mut matched: HashMap<i32, String> = HashMap::new(); // trace_id → identity_name
// Round 1: 用 3-angle samples 比對 TMDb
for (&tid, samples) in &face_track_samples {
let mut best_name = String::new();
let mut best_sim = 0.0f32;
for (_, ref name, ref tmdb_emb) in &tmdb_seeds {
for face_emb in samples {
let s = cosine_similarity(face_emb, tmdb_emb);
if s > best_sim {
best_sim = s;
best_name = name.clone();
}
}
}
if best_sim >= TH {
matched.insert(tid, best_name);
}
}
tracing::info!(
"[FaceMatch] Round 1: {} matched ({}%) — writing to DB",
matched.len(),
matched.len() * 100 / total_traces
);
// Step 5: 寫入 DB — Round 1 結果先存 (Phase 3: update both face_detections AND tkg_nodes)
let identities_table = schema::table_name("identities");
let strangers_table = schema::table_name("strangers");
let fd_table = schema::table_name("face_detections");
let nodes_table = schema::table_name("tkg_nodes");
let mut updated = 0usize;
for (tid, name) in &matched {
let id_opt = sqlx::query_scalar::<_, Option<i32>>(&format!(
"SELECT id FROM {} WHERE name=$1 AND source='tmdb'",
identities_table
))
.bind(name)
.fetch_optional(pool)
.await?;
if let Some(identity_id) = id_opt {
let _ = sqlx::query(&format!(
"UPDATE {} SET identity_id=$1 WHERE file_uuid=$2 AND trace_id=$3",
fd_table
))
.bind(identity_id)
.bind(file_uuid)
.bind(tid)
.execute(pool)
.await;
// Phase 3: Also update TKG node
let external_id = format!("face_track_{}", tid);
let _ = sqlx::query(&format!(
"UPDATE {} SET properties = jsonb_set(\
jsonb_set(properties, '{{identity_id}}', $1::jsonb, false),\
'{{identity_name}}', $2::jsonb, false)\
WHERE file_uuid = $3 AND node_type = 'face_track' AND external_id = $4",
nodes_table
))
.bind(identity_id)
.bind(name.as_str())
.bind(file_uuid)
.bind(&external_id)
.execute(pool)
.await;
updated += 1;
}
}
tracing::info!("[FaceMatch] Round 1: updated {} face_detections", updated);
// Round 2+: 用已匹配的 face 作為 seed 傳播(剩餘未匹配的 trace
let initial_matched = matched.len();
for round_n in 2..=5 {
let prev = matched.len();
// 建立 seed pool: name → Vec<embedding>
let mut seed_pool: HashMap<String, Vec<&Vec<f32>>> = HashMap::new();
for (&tid, name) in &matched {
if let Some(samples) = face_track_samples.get(&tid) {
seed_pool
.entry(name.clone())
.or_default()
.extend(samples.iter());
}
}
let mut new_matches: Vec<(i32, String)> = Vec::new();
for (&tid, samples) in &face_track_samples {
if matched.contains_key(&tid) {
continue;
}
let mut best_name = String::new();
let mut best_sim = 0.0f32;
if samples.is_empty() {
continue;
}
// 用 3-angle samples 分別比對 seed取最高 similarity
for (name, seed_faces) in &seed_pool {
for face_emb in samples {
for seed in seed_faces {
let s = cosine_similarity(face_emb, seed);
if s > best_sim {
best_sim = s;
best_name = name.clone();
}
}
}
}
if best_sim >= TH {
new_matches.push((tid, best_name));
}
}
for (tid, name) in new_matches {
matched.insert(tid, name);
}
let new = matched.len() - prev;
tracing::info!(
"[FaceMatch] Round {}: +{} matched (total {}, {}%)",
round_n,
new,
matched.len(),
matched.len() * 100 / total_traces
);
if new < 5 {
break;
}
}
// Step 6: 未匹配的 trace 設 stranger_id = strangers.id (FK)
// First: ensure strangers records exist
let _ = sqlx::query(&format!(
"INSERT INTO {} (file_uuid, trace_id) \
SELECT $1, fd.trace_id FROM {} fd \
WHERE fd.file_uuid = $1 AND fd.trace_id IS NOT NULL \
AND fd.identity_id IS NULL \
ON CONFLICT (file_uuid, trace_id) DO NOTHING",
strangers_table, fd_table
))
.bind(file_uuid)
.execute(pool)
.await?;
// Then: update face_detections.stranger_id = strangers.id
let stranger_update = sqlx::query(&format!(
"UPDATE {} fd SET stranger_id = s.id \
FROM {} s \
WHERE s.file_uuid = fd.file_uuid AND s.trace_id = fd.trace_id \
AND fd.file_uuid = $1 AND fd.identity_id IS NULL \
AND fd.trace_id IS NOT NULL AND fd.stranger_id IS NULL",
fd_table, strangers_table
))
.bind(file_uuid)
.execute(pool)
.await?;
let stranger_count = stranger_update.rows_affected();
// Step 7: Save identity files for all affected identities
let affected = sqlx::query_scalar::<_, uuid::Uuid>(&format!(
"SELECT DISTINCT i.uuid FROM {} i \
JOIN {} fd ON fd.identity_id = i.id \
WHERE fd.file_uuid=$1 AND fd.identity_id IS NOT NULL",
identities_table, fd_table
))
.bind(file_uuid)
.fetch_all(pool)
.await
.unwrap_or_default();
for uuid in &affected {
let us = uuid.to_string().replace('-', "");
if let Err(e) = crate::core::identity::storage::save_identity_file_by_pool(pool, &us).await
{
tracing::warn!("[FaceMatch] Failed to save identity file {}: {}", us, e);
}
}
tracing::info!(
"[FaceMatch] Done: {}/{} traces matched ({}%), {} strangers, {} identity files",
matched.len(),
total_traces,
matched.len() * 100 / total_traces,
stranger_count,
affected.len()
);
Ok(updated)
} }
/// Bind ASRX speakers to face traces based on temporal overlap. /// Bind ASRX speakers to face traces based on temporal overlap.
@@ -1589,126 +1013,9 @@ async fn run_identity_handler(
/// Read all TMDb identities with profile photos, extract face embeddings, store in Qdrant as seeds. /// Read all TMDb identities with profile photos, extract face embeddings, store in Qdrant as seeds.
pub async fn generate_seed_embeddings(db: &PostgresDb) -> anyhow::Result<usize> { pub async fn generate_seed_embeddings(db: &PostgresDb) -> anyhow::Result<usize> {
use crate::core::db::face_embedding_db::FaceEmbeddingDb; tracing::warn!(
use std::path::Path; "[GenerateSeeds] Seed embedding generation disabled - FaceEmbeddingDb removed. \
TODO: Reimplement with _faces collection"
let pool = db.pool();
let id_table = schema::table_name("identities");
let rows = sqlx::query_as::<_, (i32, String, String, i32, String)>(&format!(
"SELECT id, name, uuid::text, tmdb_id, tmdb_profile FROM {} \
WHERE source='tmdb' AND tmdb_profile IS NOT NULL",
id_table
))
.fetch_all(pool)
.await?;
if rows.is_empty() {
tracing::warn!("[GenerateSeeds] No TMDb identities with profile photos");
return Ok(0);
}
let scripts_dir = std::env::var("MOMENTRY_SCRIPTS_DIR")
.unwrap_or_else(|_| "/Users/accusys/momentry_core_0.1/scripts".to_string());
let python_path = std::env::var("MOMENTRY_PYTHON_PATH")
.unwrap_or_else(|_| "/opt/homebrew/bin/python3.11".to_string());
let extract_script = Path::new(&scripts_dir).join("extract_face_embedding.py");
let face_db = FaceEmbeddingDb::new();
let mut success = 0usize;
for (id, name, uuid, tmdb_id, profile_url) in &rows {
tracing::info!("[GenerateSeeds] Processing {} ({})", name, uuid);
// Download profile image
let client = reqwest::Client::builder()
.timeout(std::time::Duration::from_secs(30))
.build()
.unwrap_or_else(|_| reqwest::Client::new());
let resp = client.get(profile_url).send().await;
let image_bytes = match resp {
Ok(r) if r.status().is_success() => r.bytes().await.unwrap_or_default(),
_ => {
tracing::warn!("[GenerateSeeds] Failed to download: {} from {}", name, profile_url);
continue;
}
};
if image_bytes.is_empty() {
tracing::warn!("[GenerateSeeds] Empty image for {}", name);
continue;
}
// Save to temp file
let temp_dir = std::env::temp_dir().join("momentry_seed_faces");
std::fs::create_dir_all(&temp_dir)?;
let temp_img = temp_dir.join(format!("{}.jpg", uuid));
std::fs::write(&temp_img, &image_bytes)?;
// Extract embedding with timeout
use tokio::time::timeout;
let output = timeout(
std::time::Duration::from_secs(180),
tokio::process::Command::new(&python_path)
.arg(&extract_script)
.arg(&temp_img)
.output(),
)
.await
.map_err(|_| anyhow::anyhow!("Extract embedding timed out for {}", name))??;
let _ = std::fs::remove_file(&temp_img);
if !output.status.success() {
let stderr = String::from_utf8_lossy(&output.stderr);
tracing::warn!(
"[GenerateSeeds] Extraction failed for {}: {}",
name,
stderr.trim()
);
continue;
}
let stdout = String::from_utf8_lossy(&output.stdout);
let extract_result: serde_json::Value = match serde_json::from_str(&stdout) {
Ok(v) => v,
Err(e) => {
tracing::warn!("[GenerateSeeds] Parse error for {}: {}", name, e);
continue;
}
};
let embedding: Vec<f64> = match serde_json::from_value(
extract_result.get("embedding").ok_or_else(|| anyhow::anyhow!("No embedding"))?.clone(),
) {
Ok(v) => v,
Err(e) => {
tracing::warn!("[GenerateSeeds] Embedding format error for {}: {}", name, e);
continue;
}
};
let embedding_f32: Vec<f32> = embedding.into_iter().map(|v| v as f32).collect();
// Store in Qdrant
match face_db
.upsert_seed_embedding(uuid, name, *tmdb_id, &embedding_f32)
.await
{
Ok(_) => {
success += 1;
tracing::info!("[GenerateSeeds] Stored seed for {}", name);
}
Err(e) => {
tracing::warn!("[GenerateSeeds] Qdrant error for {}: {}", name, e);
}
}
}
tracing::info!(
"[GenerateSeeds] Done: {}/{} seeds generated",
success,
rows.len()
); );
Ok(success) Ok(0)
} }

View File

@@ -67,11 +67,13 @@ pub async fn bind_identity(
Path(identity_uuid): Path<String>, Path(identity_uuid): Path<String>,
Json(req): Json<BindIdentityRequest>, Json(req): Json<BindIdentityRequest>,
) -> Result<Json<ApiResponse<serde_json::Value>>, (StatusCode, Json<serde_json::Value>)> { ) -> Result<Json<ApiResponse<serde_json::Value>>, (StatusCode, Json<serde_json::Value>)> {
tracing::info!("[bind_identity] req: {:?}", req);
let table = crate::core::db::schema::table_name("face_detections"); let table = crate::core::db::schema::table_name("face_detections");
let id_table = crate::core::db::schema::table_name("identities"); let id_table = crate::core::db::schema::table_name("identities");
let history_table = crate::core::db::schema::table_name("identity_history"); let history_table = crate::core::db::schema::table_name("identity_history");
let uuid_clean = identity_uuid.replace('-', ""); let uuid_clean = identity_uuid.replace('-', "");
tracing::info!("[bind_identity] uuid_clean={}, expand_to_trace={:?}", uuid_clean, req.expand_to_trace);
let identity_row: Option<(i32, String)> = sqlx::query_as(&format!( let identity_row: Option<(i32, String)> = sqlx::query_as(&format!(
"SELECT id, name FROM {} WHERE REPLACE(uuid::text, '-', '') = $1", "SELECT id, name FROM {} WHERE REPLACE(uuid::text, '-', '') = $1",
id_table id_table
@@ -188,21 +190,32 @@ pub async fn bind_identity(
})? })?
.flatten(); .flatten();
// Update Qdrant + TKG if trace_id exists // Expand to entire trace if requested
if let Some(tid) = trace_id { tracing::info!("[bind_identity] trace_id={:?}, expand_to_trace={:?}", trace_id, req.expand_to_trace);
// 1. Update Qdrant payload if req.expand_to_trace.unwrap_or(false) && trace_id.is_some() {
let face_db = crate::core::db::FaceEmbeddingDb::new(); let tid = trace_id.unwrap();
if let Err(e) = face_db tracing::info!("[bind_identity] Expanding to trace {} for file {}", tid, req.file_uuid);
.update_identity_by_trace(&req.file_uuid, tid, &uuid_clean) let expand_result = sqlx::query(&format!(
.await "UPDATE {} SET identity_id = $1 WHERE file_uuid = $2 AND trace_id = $3",
{ table
tracing::warn!( ))
"[bind] Failed to update Qdrant identity_uuid for trace {}: {}", .bind(identity_id)
tid, e .bind(&req.file_uuid)
); .bind(tid)
.execute(state.db.pool())
.await;
if let Ok(r) = expand_result {
tracing::info!("[bind] Expanded to trace {}: {} rows", tid, r.rows_affected());
} else {
tracing::error!("[bind] Failed to expand to trace {}: {:?}", tid, expand_result.err());
} }
} else {
tracing::info!("[bind_identity] NOT expanding: expand_to_trace={:?}, trace_id={:?}", req.expand_to_trace, trace_id);
}
// 2. Update TKG face_track node (dual-field design) // Update TKG if trace_id exists
if let Some(tid) = trace_id {
// Update TKG face_track node (dual-field design)
let tkg_table = crate::core::db::schema::table_name("tkg_nodes"); let tkg_table = crate::core::db::schema::table_name("tkg_nodes");
let ext_id = format!("face_track_{}", tid); let ext_id = format!("face_track_{}", tid);
let identity_ref = format!("{}:identity_{}", req.file_uuid, identity_id); let identity_ref = format!("{}:identity_{}", req.file_uuid, identity_id);
@@ -380,21 +393,9 @@ pub async fn unbind_identity(
})? })?
.flatten(); .flatten();
// Clear Qdrant + TKG if trace_id exists // Clear TKG if trace_id exists
if let Some(tid) = trace_id { if let Some(tid) = trace_id {
// 1. Clear Qdrant payload // Update TKG face_track node (restore stranger_ref)
let face_db = crate::core::db::FaceEmbeddingDb::new();
if let Err(e) = face_db
.clear_identity_by_trace(&req.file_uuid, tid)
.await
{
tracing::warn!(
"[unbind] Failed to clear Qdrant identity_uuid for trace {}: {}",
tid, e
);
}
// 2. Update TKG face_track node (restore stranger_ref)
let tkg_table = crate::core::db::schema::table_name("tkg_nodes"); let tkg_table = crate::core::db::schema::table_name("tkg_nodes");
let ext_id = format!("face_track_{}", tid); let ext_id = format!("face_track_{}", tid);
let stranger_ref = format!("{}:stranger_trace_{}", req.file_uuid, tid); let stranger_ref = format!("{}:stranger_trace_{}", req.file_uuid, tid);
@@ -2199,8 +2200,10 @@ pub async fn list_pending_persons(
let fd_table = crate::core::db::schema::table_name("face_detections"); let fd_table = crate::core::db::schema::table_name("face_detections");
let rows: Vec<(i32, String, String, chrono::NaiveDateTime)> = sqlx::query_as(&format!( let rows: Vec<(i32, String, String, chrono::NaiveDateTime)> = sqlx::query_as(&format!(
"SELECT id, uuid::text, name, created_at FROM {} WHERE file_uuid = $1 AND status = 'pending' ORDER BY created_at DESC", "SELECT DISTINCT i.id, i.uuid::text, i.name, i.created_at FROM {} i \
id_table JOIN {} fd ON fd.identity_id = i.id \
WHERE fd.file_uuid = $1 AND i.status = 'pending' ORDER BY i.created_at DESC",
id_table, fd_table
)) ))
.bind(&file_uuid) .bind(&file_uuid)
.fetch_all(state.db.pool()) .fetch_all(state.db.pool())

View File

@@ -4,7 +4,6 @@ pub mod auth;
pub mod checkin_api; pub mod checkin_api;
pub mod docs; pub mod docs;
pub mod files; pub mod files;
pub mod five_w1h_agent_api;
pub mod health; pub mod health;
pub mod identities; pub mod identities;
pub mod identity_agent_api; pub mod identity_agent_api;

View File

@@ -260,7 +260,25 @@ async fn trigger_processing(
.await .await
.map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?; .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
if existing_id.is_none() { if let Some(job_id) = existing_id {
// Clean up stale processor_results from previous runs
// Old entries with status='running' from a dead worker session
// would block the worker from actually running processors.
let pr_table = schema::table_name("processor_results");
sqlx::query(&format!("DELETE FROM {pr_table} WHERE job_id = $1"))
.bind(job_id)
.execute(state.db.pool())
.await
.map_err(|e| {
tracing::error!(
"[TRIGGER] Failed to clean processor_results for job {}: {}",
job_id,
e
);
StatusCode::INTERNAL_SERVER_ERROR
})?;
tracing::info!("[TRIGGER] Cleaned processor_results for job {}", job_id);
} else {
state state
.db .db
.create_monitor_job(&file_uuid, Some(&file_path)) .create_monitor_job(&file_uuid, Some(&file_path))

View File

@@ -14,7 +14,6 @@ use super::auth;
use super::checkin_api; use super::checkin_api;
use super::docs; use super::docs;
use super::files; use super::files;
use super::five_w1h_agent_api;
use super::health; use super::health;
use super::identities; use super::identities;
use super::identity_agent_api; use super::identity_agent_api;
@@ -116,7 +115,6 @@ pub async fn start_server(host: &str, port: u16) -> anyhow::Result<()> {
.merge(agent_search::agent_search_routes()) .merge(agent_search::agent_search_routes())
.merge(processing::processing_routes()) .merge(processing::processing_routes())
.merge(identity_agent_api::identity_agent_routes()) .merge(identity_agent_api::identity_agent_routes())
.merge(five_w1h_agent_api::five_w1h_agent_routes())
.merge(media_api::bbox_routes()) .merge(media_api::bbox_routes())
.merge(media_api::media_proxy_routes()) .merge(media_api::media_proxy_routes())
.merge(trace_agent_api::trace_agent_routes()) .merge(trace_agent_api::trace_agent_routes())

View File

@@ -608,122 +608,17 @@ async fn tmdb_match_handler(
)); ));
} }
// Get all TMDb identities with face_embedding tracing::warn!(
let tmdb_rows = sqlx::query_as::<_, (i32, String, Vec<f32>)>( "[TKG-MATCH] TMDb matching disabled - sync_trace_embeddings removed. \
&format!( TODO: Reimplement with _faces collection for {}",
"SELECT id, name, face_embedding::real[] FROM {} WHERE source='tmdb' AND face_embedding IS NOT NULL", file_uuid
crate::core::db::schema::table_name("identities")
)
)
.fetch_all(state.db.pool())
.await
.map_err(|e| {
(StatusCode::INTERNAL_SERVER_ERROR, Json(serde_json::json!({"error": e.to_string()})))
})?;
if tmdb_rows.is_empty() {
return Ok(Json(TmdbMatchResponse {
success: true,
file_uuid,
bindings_created: 0,
tmdb_identities_available: 0,
message: "No TMDb identities with face embeddings".to_string(),
}));
}
let face_collection = format!(
"{}_faces",
crate::core::config::REDIS_KEY_PREFIX
.as_str()
.trim_end_matches(':')
); );
let qdrant = QdrantDb::new();
let _ = qdrant.ensure_collection(&face_collection, 512).await;
let trace_collection = format!(
"{}_traces",
crate::core::config::REDIS_KEY_PREFIX
.as_str()
.trim_end_matches(':')
);
let _ = qdrant.ensure_collection(&trace_collection, 512).await;
// Sync trace embeddings (idempotent)
if let Err(e) = crate::core::db::qdrant_db::sync_trace_embeddings(&file_uuid).await {
tracing::error!("[TKG-MATCH] Trace sync failed: {}", e);
}
let mut total_bindings = 0usize;
for (tmdb_id, tmdb_name, tmdb_embedding) in &tmdb_rows {
// Search Qdrant trace collection with this TMDb embedding
let results = match qdrant
.search_face_collection(
&trace_collection,
tmdb_embedding,
100,
"source",
"tmdb",
Some(&file_uuid),
)
.await
{
Ok(r) => r,
Err(e) => {
tracing::warn!("[TKG-MATCH] Qdrant search failed for {}: {}", tmdb_name, e);
continue;
}
};
// Filter results by threshold and file_uuid
let filtered: Vec<_> = results
.into_iter()
.filter(|(score, payload)| {
*score >= 0.50
&& payload.get("file_uuid").and_then(|v| v.as_str()) == Some(&file_uuid)
})
.collect();
if filtered.is_empty() {
continue;
}
// Bind matched traces directly
let mut bound_count = 0usize;
for (_score, payload) in &filtered {
if let Some(tid) = payload.get("trace_id").and_then(|v| v.as_i64()) {
let r = sqlx::query(&format!(
"UPDATE {} SET identity_id=$1 WHERE file_uuid=$2 AND trace_id=$3",
crate::core::db::schema::table_name("face_detections")
))
.bind(tmdb_id)
.bind(&file_uuid)
.bind(tid as i32)
.execute(state.db.pool())
.await;
if let Ok(result) = r {
bound_count += result.rows_affected() as usize;
}
}
}
if bound_count > 0 {
tracing::info!(
"[TKG-MATCH] {}: bound {} traces to TMDb identity {}",
tmdb_name,
bound_count,
tmdb_id
);
}
total_bindings += bound_count;
}
Ok(Json(TmdbMatchResponse { Ok(Json(TmdbMatchResponse {
success: true, success: true,
file_uuid, file_uuid,
bindings_created: total_bindings, bindings_created: 0,
tmdb_identities_available: tmdb_rows.len(), tmdb_identities_available: 0,
message: format!("{} traces matched to TMDb identities", total_bindings), message: "TMDb matching disabled - needs reimplementation with _faces collection".to_string(),
})) }))
} }

View File

@@ -45,11 +45,6 @@ pub enum Commands {
/// File UUID /// File UUID
uuid: String, uuid: String,
}, },
/// Generate story for cut scenes
Story {
/// UUID
uuid: String,
},
/// Detect objects in an image using CLIP or Qwen3-VL /// Detect objects in an image using CLIP or Qwen3-VL
Detect { Detect {
/// Image path /// Image path

View File

@@ -145,42 +145,6 @@ pub async fn checkin(db: &PostgresDb, file_uuid: &str) -> Result<CheckinResult>
} }
} }
} }
// Traces → production traces collection
let traces_coll = format!(
"{}_traces",
crate::core::config::REDIS_KEY_PREFIX
.as_str()
.trim_end_matches(':')
);
for point in &ws_data.traces {
if let Some(ref vector) = point.vector {
let payload_val: serde_json::Value =
serde_json::to_value(&point.payload).unwrap_or(serde_json::Value::Null);
let point_id: u64 = match point.id.parse::<u64>() {
Ok(id) => id,
Err(_) => {
use std::hash::{Hash, Hasher};
let mut hasher = std::collections::hash_map::DefaultHasher::new();
point.id.hash(&mut hasher);
hasher.finish()
}
};
if let Err(e) = qdrant
.upsert_vector_to_collection(
&traces_coll,
point_id,
vector,
Some(payload_val),
)
.await
{
warn!("Failed to checkin trace vector {}: {}", point.id, e);
} else {
vectors_moved += 1;
}
}
}
} }
Err(e) => { Err(e) => {
warn!("Failed to scroll Qdrant workspace for {}: {}", file_uuid, e); warn!("Failed to scroll Qdrant workspace for {}: {}", file_uuid, e);
@@ -297,10 +261,9 @@ pub async fn checkout(db: &PostgresDb, file_uuid: &str) -> Result<CheckoutResult
let prefix = crate::core::config::REDIS_KEY_PREFIX let prefix = crate::core::config::REDIS_KEY_PREFIX
.as_str() .as_str()
.trim_end_matches(':'); .trim_end_matches(':');
let traces_coll = format!("{}_traces", prefix);
let voice_coll = format!("{}_voice", file_uuid); let voice_coll = format!("{}_voice", file_uuid);
for coll in &[traces_coll, voice_coll] { for coll in &[voice_coll] {
if let Err(e) = QdrantDb::delete_by_uuid_from_collection( if let Err(e) = QdrantDb::delete_by_uuid_from_collection(
&qdrant.client, &qdrant.client,
&qdrant.base_url, &qdrant.base_url,

View File

@@ -1,950 +0,0 @@
use anyhow::{Context, Result};
use reqwest::Client;
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
pub struct FaceEmbeddingDb {
client: Client,
base_url: String,
api_key: String,
collection_name: String,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct FaceEmbeddingPayload {
pub file_uuid: String,
pub trace_id: i32,
pub frame: i64,
pub bbox_x: f64,
pub bbox_y: f64,
pub bbox_w: f64,
pub bbox_h: f64,
pub confidence: f64,
pub yaw: f64,
pub pitch: f64,
pub roll: f64,
#[serde(skip_serializing_if = "Option::is_none")]
pub identity_uuid: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub identity_ref: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub stranger_ref: Option<String>,
#[serde(skip_serializing_if = "Option::is_none", rename = "type")]
pub r#type: Option<String>,
}
#[derive(Debug, Clone, Deserialize)]
pub struct FaceEmbeddingPoint {
pub id: String,
pub vector: Vec<f32>,
pub payload: FaceEmbeddingPayload,
pub score: f64,
}
impl FaceEmbeddingDb {
pub fn new() -> Self {
let schema = std::env::var("DATABASE_SCHEMA").unwrap_or_else(|_| "dev".to_string());
let collection_name = format!("{}_face_embeddings", schema);
let base_url =
std::env::var("QDRANT_URL").unwrap_or_else(|_| "http://localhost:6333".to_string());
let api_key = std::env::var("QDRANT_API_KEY")
.unwrap_or_else(|_| "Test3200Test3200Test3200".to_string());
Self {
client: Client::new(),
base_url,
api_key,
collection_name,
}
}
pub async fn init_collection(&self) -> Result<()> {
let url = format!("{}/collections/{}", self.base_url, self.collection_name);
let response = self
.client
.get(&url)
.header("api-key", &self.api_key)
.send()
.await?;
if response.status().is_success() {
tracing::info!(
"[FaceEmbedding] Collection {} already exists",
self.collection_name
);
return Ok(());
}
let create_url = format!("{}/collections/{}", self.base_url, self.collection_name);
let body = serde_json::json!({
"vectors": {
"size": 512,
"distance": "Cosine"
}
});
self.client
.put(&create_url)
.header("api-key", &self.api_key)
.header("Content-Type", "application/json")
.json(&body)
.send()
.await
.context("Failed to create face embeddings collection")?;
tracing::info!(
"[FaceEmbedding] Created collection {} (dim=512)",
self.collection_name
);
Ok(())
}
pub async fn upsert_embedding(
&self,
point_id: &str,
embedding: &[f32],
payload: &FaceEmbeddingPayload,
) -> Result<()> {
let url = format!(
"{}/collections/{}/points?wait=true",
self.base_url, self.collection_name
);
let body = serde_json::json!({
"points": [{
"id": point_id,
"vector": embedding,
"payload": payload
}]
});
let response = self
.client
.put(&url)
.header("api-key", &self.api_key)
.header("Content-Type", "application/json")
.json(&body)
.send()
.await
.context("Failed to upsert face embedding")?;
if !response.status().is_success() {
let text = response.text().await.unwrap_or_default();
anyhow::bail!("Qdrant upsert failed: {}", text);
}
Ok(())
}
pub async fn batch_upsert(
&self,
points: Vec<(String, Vec<f32>, FaceEmbeddingPayload)>,
) -> Result<usize> {
if points.is_empty() {
return Ok(0);
}
let url = format!(
"{}/collections/{}/points?wait=true",
self.base_url, self.collection_name
);
let body = serde_json::json!({
"points": points.iter().map(|(id, vec, payload)| {
// Parse id as u64 for Qdrant (requires integer or UUID)
let id_num: u64 = id.parse().unwrap_or(0);
serde_json::json!({
"id": id_num,
"vector": vec,
"payload": payload
})
}).collect::<Vec<_>>()
});
let response = self
.client
.put(&url)
.header("api-key", &self.api_key)
.header("Content-Type", "application/json")
.json(&body)
.send()
.await
.context("Failed to batch upsert face embeddings")?;
if !response.status().is_success() {
let status = response.status();
let text = response.text().await.unwrap_or_default();
anyhow::bail!("Qdrant batch upsert failed (HTTP {}): {}", status, text);
}
Ok(points.len())
}
pub async fn update_identity_by_trace(
&self,
file_uuid: &str,
trace_id: i32,
identity_uuid: &str,
) -> Result<usize> {
let url = format!(
"{}/collections/{}/points",
self.base_url, self.collection_name
);
let body = serde_json::json!({
"filter": {
"must": [
{
"key": "file_uuid",
"match": { "value": file_uuid }
},
{
"key": "trace_id",
"match": { "value": trace_id }
}
]
},
"payload": {
"identity_uuid": identity_uuid
}
});
let response = self
.client
.post(&url)
.header("api-key", &self.api_key)
.header("Content-Type", "application/json")
.json(&body)
.send()
.await
.context("Failed to update identity_uuid in Qdrant")?;
if !response.status().is_success() {
let text = response.text().await.unwrap_or_default();
anyhow::bail!("Qdrant identity update failed: {}", text);
}
tracing::info!(
"[FaceEmbedding] Updated identity_uuid={} for file={}, trace={}",
identity_uuid, file_uuid, trace_id
);
Ok(1)
}
pub async fn clear_identity_by_trace(
&self,
file_uuid: &str,
trace_id: i32,
) -> Result<usize> {
let url = format!(
"{}/collections/{}/points",
self.base_url, self.collection_name
);
let body = serde_json::json!({
"filter": {
"must": [
{
"key": "file_uuid",
"match": { "value": file_uuid }
},
{
"key": "trace_id",
"match": { "value": trace_id }
}
]
},
"payload": {
"identity_uuid": null
}
});
let response = self
.client
.post(&url)
.header("api-key", &self.api_key)
.header("Content-Type", "application/json")
.json(&body)
.send()
.await
.context("Failed to clear identity_uuid in Qdrant")?;
if !response.status().is_success() {
let text = response.text().await.unwrap_or_default();
anyhow::bail!("Qdrant identity clear failed: {}", text);
}
tracing::info!(
"[FaceEmbedding] Cleared identity_uuid for file={}, trace={}",
file_uuid, trace_id
);
Ok(1)
}
pub async fn search_similar(
&self,
query_embedding: &[f32],
file_uuid: Option<&str>,
limit: usize,
threshold: f64,
) -> Result<Vec<FaceEmbeddingPoint>> {
let url = format!(
"{}/collections/{}/points/search",
self.base_url, self.collection_name
);
let mut filter = serde_json::json!({});
if let Some(fu) = file_uuid {
filter = serde_json::json!({
"must": [{
"key": "file_uuid",
"match": { "value": fu }
}]
});
}
let body = serde_json::json!({
"vector": query_embedding,
"limit": limit,
"with_payload": true,
"with_vector": false,
"filter": filter
});
let response = self
.client
.post(&url)
.header("api-key", &self.api_key)
.header("Content-Type", "application/json")
.json(&body)
.send()
.await
.context("Failed to search face embeddings")?;
let status = response.status();
let text = response.text().await.unwrap_or_default();
if !status.is_success() {
anyhow::bail!("Qdrant search failed: {} - {}", status, text);
}
#[derive(Deserialize)]
struct SearchResult {
result: Vec<PointResult>,
}
#[derive(Deserialize)]
struct PointResult {
id: serde_json::Value,
score: f64,
payload: HashMap<String, serde_json::Value>,
}
let parsed: SearchResult =
serde_json::from_str(&text).context("Failed to parse Qdrant search response")?;
let results: Vec<FaceEmbeddingPoint> = parsed
.result
.into_iter()
.filter(|r| r.score >= threshold)
.map(|r| {
let id = match r.id {
serde_json::Value::String(s) => s,
serde_json::Value::Number(n) => n.to_string(),
_ => "unknown".to_string(),
};
let payload = FaceEmbeddingPayload {
file_uuid: r
.payload
.get("file_uuid")
.and_then(|v| v.as_str())
.unwrap_or("")
.to_string(),
trace_id: r
.payload
.get("trace_id")
.and_then(|v| v.as_i64())
.unwrap_or(0) as i32,
frame: r.payload.get("frame").and_then(|v| v.as_i64()).unwrap_or(0),
bbox_x: r
.payload
.get("bbox_x")
.and_then(|v| v.as_f64())
.unwrap_or(0.0),
bbox_y: r
.payload
.get("bbox_y")
.and_then(|v| v.as_f64())
.unwrap_or(0.0),
bbox_w: r
.payload
.get("bbox_w")
.and_then(|v| v.as_f64())
.unwrap_or(0.0),
bbox_h: r
.payload
.get("bbox_h")
.and_then(|v| v.as_f64())
.unwrap_or(0.0),
confidence: r
.payload
.get("confidence")
.and_then(|v| v.as_f64())
.unwrap_or(0.0),
yaw: r.payload.get("yaw").and_then(|v| v.as_f64()).unwrap_or(0.0),
pitch: r
.payload
.get("pitch")
.and_then(|v| v.as_f64())
.unwrap_or(0.0),
roll: r
.payload
.get("roll")
.and_then(|v| v.as_f64())
.unwrap_or(0.0),
identity_uuid: r
.payload
.get("identity_uuid")
.and_then(|v| v.as_str())
.map(|s| s.to_string()),
identity_ref: r
.payload
.get("identity_ref")
.and_then(|v| v.as_str())
.map(|s| s.to_string()),
stranger_ref: r
.payload
.get("stranger_ref")
.and_then(|v| v.as_str())
.map(|s| s.to_string()),
r#type: r
.payload
.get("type")
.and_then(|v| v.as_str())
.map(|s| s.to_string()),
};
FaceEmbeddingPoint {
id,
vector: vec![], // Not returned with_vector=false
payload,
score: r.score,
}
})
.collect();
Ok(results)
}
pub async fn get_embeddings_by_trace(
&self,
file_uuid: &str,
trace_id: i32,
) -> Result<Vec<(String, Vec<f32>)>> {
let url = format!(
"{}/collections/{}/points/scroll",
self.base_url, self.collection_name
);
let body = serde_json::json!({
"limit": 1000,
"with_payload": true,
"with_vector": true,
"filter": {
"must": [
{"key": "file_uuid", "match": { "value": file_uuid }},
{"key": "trace_id", "match": { "value": trace_id }}
]
}
});
let response = self
.client
.post(&url)
.header("api-key", &self.api_key)
.header("Content-Type", "application/json")
.json(&body)
.send()
.await
.context("Failed to scroll face embeddings")?;
let status = response.status();
let text = response.text().await.unwrap_or_default();
if !status.is_success() {
anyhow::bail!("Qdrant scroll failed: {} - {}", status, text);
}
#[derive(Deserialize)]
struct ScrollResult {
result: ScrollPoints,
}
#[derive(Deserialize)]
struct ScrollPoints {
points: Vec<PointResult>,
}
#[derive(Deserialize)]
struct PointResult {
id: serde_json::Value,
vector: Vec<f32>,
}
let parsed: ScrollResult =
serde_json::from_str(&text).context("Failed to parse Qdrant scroll response")?;
let results: Vec<(String, Vec<f32>)> = parsed
.result
.points
.into_iter()
.map(|r| {
let id = match r.id {
serde_json::Value::String(s) => s,
serde_json::Value::Number(n) => n.to_string(),
_ => "unknown".to_string(),
};
(id, r.vector)
})
.collect();
Ok(results)
}
pub async fn get_all_embeddings_for_file(
&self,
file_uuid: &str,
) -> Result<Vec<(String, Vec<f32>, FaceEmbeddingPayload)>> {
let url = format!(
"{}/collections/{}/points/scroll",
self.base_url, self.collection_name
);
let body = serde_json::json!({
"limit": 10000,
"with_payload": true,
"with_vector": true,
"filter": {
"must": [
{"key": "file_uuid", "match": { "value": file_uuid }}
]
}
});
let response = self
.client
.post(&url)
.header("api-key", &self.api_key)
.header("Content-Type", "application/json")
.json(&body)
.send()
.await
.context("Failed to scroll face embeddings")?;
let status = response.status();
let text = response.text().await.unwrap_or_default();
if !status.is_success() {
anyhow::bail!("Qdrant scroll failed: {} - {}", status, text);
}
#[derive(Deserialize)]
struct ScrollResult {
result: ScrollPoints,
}
#[derive(Deserialize)]
struct ScrollPoints {
points: Vec<PointResult>,
}
#[derive(Deserialize)]
struct PointResult {
id: serde_json::Value,
vector: Vec<f32>,
payload: HashMap<String, serde_json::Value>,
}
let parsed: ScrollResult =
serde_json::from_str(&text).context("Failed to parse Qdrant scroll response")?;
let results: Vec<(String, Vec<f32>, FaceEmbeddingPayload)> = parsed
.result
.points
.into_iter()
.map(|r| {
let id = match r.id {
serde_json::Value::String(s) => s,
serde_json::Value::Number(n) => n.to_string(),
_ => "unknown".to_string(),
};
let payload = FaceEmbeddingPayload {
file_uuid: r
.payload
.get("file_uuid")
.and_then(|v| v.as_str())
.unwrap_or("")
.to_string(),
trace_id: r
.payload
.get("trace_id")
.and_then(|v| v.as_i64())
.unwrap_or(0) as i32,
frame: r.payload.get("frame").and_then(|v| v.as_i64()).unwrap_or(0),
bbox_x: r
.payload
.get("bbox_x")
.and_then(|v| v.as_f64())
.unwrap_or(0.0),
bbox_y: r
.payload
.get("bbox_y")
.and_then(|v| v.as_f64())
.unwrap_or(0.0),
bbox_w: r
.payload
.get("bbox_w")
.and_then(|v| v.as_f64())
.unwrap_or(0.0),
bbox_h: r
.payload
.get("bbox_h")
.and_then(|v| v.as_f64())
.unwrap_or(0.0),
confidence: r
.payload
.get("confidence")
.and_then(|v| v.as_f64())
.unwrap_or(0.0),
yaw: r.payload.get("yaw").and_then(|v| v.as_f64()).unwrap_or(0.0),
pitch: r
.payload
.get("pitch")
.and_then(|v| v.as_f64())
.unwrap_or(0.0),
roll: r
.payload
.get("roll")
.and_then(|v| v.as_f64())
.unwrap_or(0.0),
identity_uuid: r
.payload
.get("identity_uuid")
.and_then(|v| v.as_str())
.map(|s| s.to_string()),
identity_ref: r
.payload
.get("identity_ref")
.and_then(|v| v.as_str())
.map(|s| s.to_string()),
stranger_ref: r
.payload
.get("stranger_ref")
.and_then(|v| v.as_str())
.map(|s| s.to_string()),
r#type: r
.payload
.get("type")
.and_then(|v| v.as_str())
.map(|s| s.to_string()),
};
(id, r.vector, payload)
})
.collect();
Ok(results)
}
pub async fn delete_file_embeddings(&self, file_uuid: &str) -> Result<usize> {
let url = format!(
"{}/collections/{}/points/delete?wait=true",
self.base_url, self.collection_name
);
let body = serde_json::json!({
"filter": {
"must": [
{"key": "file_uuid", "match": { "value": file_uuid }}
]
}
});
let response = self
.client
.post(&url)
.header("api-key", &self.api_key)
.header("Content-Type", "application/json")
.json(&body)
.send()
.await
.context("Failed to delete face embeddings")?;
if !response.status().is_success() {
let text = response.text().await.unwrap_or_default();
anyhow::bail!("Qdrant delete failed: {}", text);
}
Ok(0)
}
pub async fn upsert_seed_embedding(
&self,
identity_uuid: &str,
identity_name: &str,
tmdb_id: i32,
embedding: &[f32],
) -> Result<()> {
let url = format!(
"{}/collections/{}/points?wait=true",
self.base_url, self.collection_name
);
let point_id = identity_uuid.to_string();
let payload = serde_json::json!({
"file_uuid": "",
"trace_id": 0,
"frame": 0,
"bbox_x": 0.0,
"bbox_y": 0.0,
"bbox_w": 0.0,
"bbox_h": 0.0,
"confidence": 0.0,
"yaw": 0.0,
"pitch": 0.0,
"roll": 0.0,
"identity_uuid": identity_uuid,
"identity_ref": serde_json::Value::Null,
"stranger_ref": serde_json::Value::Null,
"identity_name": identity_name,
"tmdb_id": tmdb_id,
"type": "identity_seed",
});
let body = serde_json::json!({
"points": [{
"id": point_id,
"vector": embedding,
"payload": payload
}]
});
let response = self
.client
.put(&url)
.header("api-key", &self.api_key)
.header("Content-Type", "application/json")
.json(&body)
.send()
.await
.context("Failed to upsert seed embedding")?;
if !response.status().is_success() {
let text = response.text().await.unwrap_or_default();
anyhow::bail!("Qdrant seed upsert failed: {}", text);
}
tracing::info!(
"[SeedEmbedding] Stored seed for identity_uuid={}, name={}",
identity_uuid, identity_name
);
Ok(())
}
pub async fn get_seed_embeddings(
&self,
) -> Result<Vec<(String, String, Vec<f32>)>> {
let url = format!(
"{}/collections/{}/points/scroll",
self.base_url, self.collection_name
);
let body = serde_json::json!({
"limit": 10000,
"with_payload": true,
"with_vector": true,
"filter": {
"must": [
{"key": "type", "match": { "value": "identity_seed" }}
]
}
});
let response = self
.client
.post(&url)
.header("api-key", &self.api_key)
.header("Content-Type", "application/json")
.json(&body)
.send()
.await
.context("Failed to scroll seed embeddings")?;
let status = response.status();
let text = response.text().await.unwrap_or_default();
if !status.is_success() {
anyhow::bail!("Qdrant scroll failed: {} - {}", status, text);
}
#[derive(Deserialize)]
struct ScrollResult {
result: ScrollPoints,
}
#[derive(Deserialize)]
struct ScrollPoints {
points: Vec<PointResult>,
}
#[derive(Deserialize)]
struct PointResult {
id: serde_json::Value,
vector: Vec<f32>,
payload: HashMap<String, serde_json::Value>,
}
let parsed: ScrollResult =
serde_json::from_str(&text).context("Failed to parse Qdrant scroll response")?;
let results: Vec<(String, String, Vec<f32>)> = parsed
.result
.points
.into_iter()
.filter_map(|r| {
let identity_uuid = r
.payload
.get("identity_uuid")
.and_then(|v| v.as_str())
.unwrap_or("")
.to_string();
let identity_name = r
.payload
.get("identity_name")
.and_then(|v| v.as_str())
.unwrap_or("")
.to_string();
if identity_uuid.is_empty() {
None
} else {
Some((identity_uuid, identity_name, r.vector))
}
})
.collect();
Ok(results)
}
pub async fn update_identity_ref_by_trace(
&self,
file_uuid: &str,
trace_id: i32,
identity_ref: &str,
) -> Result<usize> {
let url = format!(
"{}/collections/{}/points/payload",
self.base_url, self.collection_name
);
let body = serde_json::json!({
"filter": {
"must": [
{
"key": "file_uuid",
"match": { "value": file_uuid }
},
{
"key": "trace_id",
"match": { "value": trace_id }
}
]
},
"payload": {
"identity_ref": identity_ref
}
});
let response = self
.client
.post(&url)
.header("api-key", &self.api_key)
.header("Content-Type", "application/json")
.json(&body)
.send()
.await
.context("Failed to update identity_ref in Qdrant")?;
if !response.status().is_success() {
let text = response.text().await.unwrap_or_default();
anyhow::bail!("Qdrant identity_ref update failed: {}", text);
}
tracing::info!(
"[FaceEmbedding] Updated identity_ref={} for file={}, trace={}",
identity_ref, file_uuid, trace_id
);
Ok(1)
}
pub async fn update_stranger_ref_by_trace(
&self,
file_uuid: &str,
trace_id: i32,
stranger_ref: &str,
) -> Result<usize> {
let url = format!(
"{}/collections/{}/points/payload",
self.base_url, self.collection_name
);
let body = serde_json::json!({
"filter": {
"must": [
{
"key": "file_uuid",
"match": { "value": file_uuid }
},
{
"key": "trace_id",
"match": { "value": trace_id }
}
]
},
"payload": {
"stranger_ref": stranger_ref
}
});
let response = self
.client
.post(&url)
.header("api-key", &self.api_key)
.header("Content-Type", "application/json")
.json(&body)
.send()
.await
.context("Failed to update stranger_ref in Qdrant")?;
if !response.status().is_success() {
let text = response.text().await.unwrap_or_default();
anyhow::bail!("Qdrant stranger_ref update failed: {}", text);
}
tracing::info!(
"[FaceEmbedding] Updated stranger_ref={} for file={}, trace={}",
stranger_ref, file_uuid, trace_id
);
Ok(1)
}
}
impl Default for FaceEmbeddingDb {
fn default() -> Self {
Self::new()
}
}

View File

@@ -32,14 +32,12 @@ pub trait VectorStore: Send + Sync {
async fn search(&self, query_vector: &[f32], limit: usize) -> Result<Vec<SearchResult>>; async fn search(&self, query_vector: &[f32], limit: usize) -> Result<Vec<SearchResult>>;
} }
pub mod face_embedding_db;
pub mod identity_merge_history; pub mod identity_merge_history;
pub mod mongodb_db; pub mod mongodb_db;
pub mod postgres_db; pub mod postgres_db;
pub mod qdrant_db; pub mod qdrant_db;
pub mod redis_client; pub mod redis_client;
pub mod redis_db; pub mod redis_db;
pub use face_embedding_db::{FaceEmbeddingDb, FaceEmbeddingPayload, FaceEmbeddingPoint};
pub use identity_merge_history::{ pub use identity_merge_history::{
AliasEntry, FacesTransferred, IdentityMergeHistory, IdentityMergeHistoryStore, AliasEntry, FacesTransferred, IdentityMergeHistory, IdentityMergeHistoryStore,
IdentitySnapshot, MergeHistoryEntry, MergeHistoryQuery, MergeParams, TargetIdentitySnapshot, IdentitySnapshot, MergeHistoryEntry, MergeHistoryQuery, MergeParams, TargetIdentitySnapshot,

View File

@@ -448,10 +448,7 @@ pub enum ProcessorType {
Hand, Hand,
Asrx, Asrx,
Scene, Scene,
Story,
FiveW1H,
Appearance, Appearance,
MediaPipe,
FaceCluster, FaceCluster,
} }
@@ -488,10 +485,7 @@ impl ProcessorType {
ProcessorType::Hand => "hand", ProcessorType::Hand => "hand",
ProcessorType::Asrx => "asrx", ProcessorType::Asrx => "asrx",
ProcessorType::Scene => "scene", ProcessorType::Scene => "scene",
ProcessorType::Story => "story",
ProcessorType::FiveW1H => "5w1h",
ProcessorType::Appearance => "appearance", ProcessorType::Appearance => "appearance",
ProcessorType::MediaPipe => "mediapipe",
ProcessorType::FaceCluster => "face_cluster", ProcessorType::FaceCluster => "face_cluster",
} }
} }
@@ -507,10 +501,7 @@ impl ProcessorType {
"hand" => Some(ProcessorType::Hand), "hand" => Some(ProcessorType::Hand),
"asrx" => Some(ProcessorType::Asrx), "asrx" => Some(ProcessorType::Asrx),
"scene" => Some(ProcessorType::Scene), "scene" => Some(ProcessorType::Scene),
"story" => Some(ProcessorType::Story),
"5w1h" => Some(ProcessorType::FiveW1H),
"appearance" => Some(ProcessorType::Appearance), "appearance" => Some(ProcessorType::Appearance),
"mediapipe" => Some(ProcessorType::MediaPipe),
"face_cluster" => Some(ProcessorType::FaceCluster), "face_cluster" => Some(ProcessorType::FaceCluster),
_ => None, _ => None,
} }
@@ -527,10 +518,7 @@ impl ProcessorType {
ProcessorType::Hand => 0.4, ProcessorType::Hand => 0.4,
ProcessorType::Asrx => 0.8, ProcessorType::Asrx => 0.8,
ProcessorType::Scene => 0.3, ProcessorType::Scene => 0.3,
ProcessorType::Story => 0.1,
ProcessorType::FiveW1H => 0.1,
ProcessorType::Appearance => 0.3, ProcessorType::Appearance => 0.3,
ProcessorType::MediaPipe => 0.3,
ProcessorType::FaceCluster => 0.7, ProcessorType::FaceCluster => 0.7,
} }
} }
@@ -538,7 +526,6 @@ impl ProcessorType {
pub fn uses_gpu(&self) -> bool { pub fn uses_gpu(&self) -> bool {
match self { match self {
ProcessorType::Yolo | ProcessorType::Face | ProcessorType::Pose | ProcessorType::Hand => true, ProcessorType::Yolo | ProcessorType::Face | ProcessorType::Pose | ProcessorType::Hand => true,
ProcessorType::MediaPipe | ProcessorType::FaceCluster => false,
_ => false, _ => false,
} }
} }
@@ -554,10 +541,7 @@ impl ProcessorType {
ProcessorType::Hand => 1024, ProcessorType::Hand => 1024,
ProcessorType::Asrx => 2048, ProcessorType::Asrx => 2048,
ProcessorType::Scene => 512, ProcessorType::Scene => 512,
ProcessorType::Story => 256,
ProcessorType::FiveW1H => 256,
ProcessorType::Appearance => 512, ProcessorType::Appearance => 512,
ProcessorType::MediaPipe => 1024,
ProcessorType::FaceCluster => 1024, ProcessorType::FaceCluster => 1024,
} }
} }
@@ -573,10 +557,7 @@ impl ProcessorType {
ProcessorType::Hand => Some("vision/hand_pose"), ProcessorType::Hand => Some("vision/hand_pose"),
ProcessorType::Asrx => Some("speechbrain/ecapa-tdnn"), ProcessorType::Asrx => Some("speechbrain/ecapa-tdnn"),
ProcessorType::Scene => Some("places365"), ProcessorType::Scene => Some("places365"),
ProcessorType::Story => None,
ProcessorType::FiveW1H => Some("gemma4"),
ProcessorType::Appearance => None, ProcessorType::Appearance => None,
ProcessorType::MediaPipe => Some("mediapipe/holistic"),
ProcessorType::FaceCluster => Some("sklearn/agglomerative"), ProcessorType::FaceCluster => Some("sklearn/agglomerative"),
} }
} }
@@ -585,17 +566,8 @@ impl ProcessorType {
match self { match self {
ProcessorType::Asrx => vec![ProcessorType::Cut, ProcessorType::Asr], ProcessorType::Asrx => vec![ProcessorType::Cut, ProcessorType::Asr],
ProcessorType::Scene => vec![ProcessorType::Cut], ProcessorType::Scene => vec![ProcessorType::Cut],
ProcessorType::Story => vec![
ProcessorType::Asrx,
ProcessorType::Cut,
ProcessorType::Yolo,
ProcessorType::Face,
],
ProcessorType::FiveW1H => vec![ProcessorType::Story],
ProcessorType::Appearance => vec![ProcessorType::Pose], ProcessorType::Appearance => vec![ProcessorType::Pose],
ProcessorType::FaceCluster => vec![ProcessorType::Face], ProcessorType::FaceCluster => vec![ProcessorType::Face],
ProcessorType::Hand => vec![],
ProcessorType::MediaPipe => vec![],
_ => vec![], _ => vec![],
} }
} }
@@ -623,15 +595,12 @@ impl ProcessorType {
| ProcessorType::Pose | ProcessorType::Pose
| ProcessorType::Hand | ProcessorType::Hand
| ProcessorType::Appearance | ProcessorType::Appearance
| ProcessorType::MediaPipe
| ProcessorType::FaceCluster => PipelineType::Frame, | ProcessorType::FaceCluster => PipelineType::Frame,
ProcessorType::Cut ProcessorType::Cut
| ProcessorType::Asr | ProcessorType::Asr
| ProcessorType::Asrx | ProcessorType::Asrx
| ProcessorType::Scene | ProcessorType::Scene => PipelineType::Time,
| ProcessorType::Story
| ProcessorType::FiveW1H => PipelineType::Time,
} }
} }
} }
@@ -2612,76 +2581,32 @@ sqlx::query(
Ok(results) Ok(results)
} }
/// Face clustering: group unregistered faces within same trace by embedding similarity /// Face clustering: disabled - embedding column no longer used
pub async fn cluster_face_embeddings( pub async fn cluster_face_embeddings(
&self, &self,
file_uuid: &str, file_uuid: &str,
similarity_threshold: f64, _similarity_threshold: f64,
) -> Result<Vec<FaceClusterGroup>> { ) -> Result<Vec<FaceClusterGroup>> {
let table = schema::table_name("face_detections"); tracing::warn!(
let rows = sqlx::query_as::<_, (String, i64)>(&format!( "[cluster_face_embeddings] Disabled - embedding column removed for {}",
r#" file_uuid
SELECT trace_id::text, COUNT(DISTINCT frame_number) as frame_count );
FROM {} Ok(Vec::new())
WHERE file_uuid = $1
AND embedding IS NOT NULL
AND identity_id IS NULL
GROUP BY trace_id
ORDER BY frame_count DESC
"#,
table
))
.bind(file_uuid)
.fetch_all(&self.pool)
.await?;
Ok(rows
.into_iter()
.map(|(trace_id, frame_count)| FaceClusterGroup {
trace_id,
frame_count: frame_count as i32,
})
.collect())
} }
/// Search similar faces by embedding via pgvector cosine distance /// Search similar faces: disabled - embedding column no longer used
pub async fn search_similar_faces( pub async fn search_similar_faces(
&self, &self,
query_embedding: &[f32], _query_embedding: &[f32],
file_uuid: &str, file_uuid: &str,
limit: i64, _limit: i64,
threshold: f64, _threshold: f64,
) -> Result<Vec<SimilarFaceResult>> { ) -> Result<Vec<SimilarFaceResult>> {
let table = schema::table_name("face_detections"); tracing::warn!(
let rows = sqlx::query_as::<_, (i32, i32, f64)>(&format!( "[search_similar_faces] Disabled - embedding column removed for {}",
r#" file_uuid
SELECT id, trace_id, );
1 - (embedding::vector <=> $1::vector) as similarity Ok(Vec::new())
FROM {}
WHERE file_uuid = $2
AND embedding IS NOT NULL
AND 1 - (embedding::vector <=> $1::vector) >= $3
ORDER BY embedding::vector <=> $1::vector
LIMIT $4
"#,
table
))
.bind(query_embedding)
.bind(file_uuid)
.bind(threshold)
.bind(limit)
.fetch_all(&self.pool)
.await?;
Ok(rows
.into_iter()
.map(|(id, trace_id, similarity)| SimilarFaceResult {
id,
trace_id,
similarity,
bbox: String::new(),
})
.collect())
} }
// ========================================== // ==========================================

View File

@@ -768,45 +768,6 @@ impl QdrantDb {
Ok(result.result.points_count) Ok(result.result.points_count)
} }
/// Store face embedding with trace_id + frame_number payload
pub async fn upsert_face_embedding(
&self,
point_id: u64,
vector: &[f32],
file_uuid: &str,
trace_id: i32,
frame_number: i64,
) -> Result<()> {
let url = format!(
"{}/collections/{}/points?wait=true",
self.base_url, self.collection_name
);
let mut payload_map = std::collections::HashMap::new();
payload_map.insert("file_uuid".to_string(), serde_json::json!(file_uuid));
payload_map.insert("trace_id".to_string(), serde_json::json!(trace_id));
payload_map.insert("frame_number".to_string(), serde_json::json!(frame_number));
payload_map.insert("type".to_string(), serde_json::json!("face_embedding"));
let point = serde_json::json!({
"points": [{
"id": point_id,
"vector": vector,
"payload": payload_map
}]
});
let resp = self
.client
.put(&url)
.header("api-key", &self.api_key)
.json(&point)
.send()
.await?;
if !resp.status().is_success() {
anyhow::bail!("Qdrant upsert face failed: {}", resp.status());
}
Ok(())
}
/// Store chunk embedding with parent-child metadata /// Store chunk embedding with parent-child metadata
pub async fn upsert_chunk_embedding( pub async fn upsert_chunk_embedding(
&self, &self,
@@ -883,113 +844,3 @@ impl VectorStore for QdrantDb {
self.search(query_vector, limit).await self.search(query_vector, limit).await
} }
} }
pub async fn sync_trace_embeddings(file_uuid: &str) -> Result<()> {
use crate::core::config::DATABASE_URL;
use sqlx::Row;
let pool = sqlx::PgPool::connect(&DATABASE_URL).await?;
let table = crate::core::db::schema::table_name("face_detections");
let qdrant = QdrantDb::new();
let collection = format!(
"{}_traces",
crate::core::config::REDIS_KEY_PREFIX
.as_str()
.trim_end_matches(':')
);
qdrant.ensure_collection(&collection, 512).await?;
// Read all face_detections with embeddings, grouped by trace_id in Rust
let rows = sqlx::query(&format!(
"SELECT trace_id, embedding FROM {} \
WHERE file_uuid = $1 AND embedding IS NOT NULL AND trace_id IS NOT NULL \
AND ((metadata->>'qc_ok')::boolean IS NULL OR (metadata->>'qc_ok')::boolean = true)",
table
))
.bind(file_uuid)
.fetch_all(&pool)
.await?;
let mut trace_faces: std::collections::HashMap<i32, Vec<Vec<f32>>> =
std::collections::HashMap::new();
let mut trace_stats: std::collections::HashMap<i32, (i64, i64, i64)> =
std::collections::HashMap::new(); // (count, min_frame, max_frame)
for row in &rows {
let tid: Option<i32> = row.get(0);
let emb: Option<Vec<f32>> = row.get(1);
if let (Some(tid), Some(emb)) = (tid, emb) {
trace_faces.entry(tid).or_default().push(emb);
let entry = trace_stats.entry(tid).or_insert((0, i64::MAX, i64::MIN));
entry.0 += 1;
}
}
// Compute average embedding per trace
struct AvgTrace {
tid: i32,
avg_emb: Vec<f32>,
frame_count: i64,
}
let mut trace_avgs: Vec<AvgTrace> = Vec::new();
for (&tid, faces) in &trace_faces {
let dim = faces[0].len();
let mut avg = vec![0.0f32; dim];
for face in faces {
for (i, &v) in face.iter().enumerate() {
avg[i] += v;
}
}
let n = faces.len() as f32;
for v in &mut avg {
*v /= n;
}
let stats = trace_stats.get(&tid).unwrap_or(&(0, 0, 0));
trace_avgs.push(AvgTrace {
tid,
avg_emb: avg,
frame_count: stats.0,
});
}
// Push to Qdrant in batches
// Point ID: hash(file_uuid + trace_id) for global uniqueness
for chunk in trace_avgs.chunks(500) {
let batch: Vec<(u64, &[f32], Option<serde_json::Value>)> = chunk
.iter()
.map(|t| {
let point_id = {
use sha2::{Digest, Sha256};
let mut hasher = Sha256::new();
hasher.update(file_uuid.as_bytes());
hasher.update(b"_");
hasher.update(t.tid.to_string().as_bytes());
let hash = hasher.finalize();
u64::from_be_bytes(hash[0..8].try_into().unwrap())
};
(
point_id,
t.avg_emb.as_slice(),
Some(serde_json::json!({
"trace_id": t.tid,
"file_uuid": file_uuid,
"frame_count": t.frame_count,
"source": "trace",
})),
)
})
.collect();
qdrant.upsert_vectors_batch(&collection, &batch).await?;
}
tracing::info!(
"Synced {} trace embeddings to Qdrant for {}",
trace_faces.len(),
file_uuid
);
Ok(())
}

View File

@@ -187,34 +187,13 @@ impl QdrantWorkspace {
.await .await
} }
pub async fn upsert_face_embedding(
&self,
point_id: u64,
vector: &[f32],
file_uuid: &str,
trace_id: i32,
frame_number: i64,
) -> Result<()> {
let payload = serde_json::json!({
"file_uuid": file_uuid,
"trace_id": trace_id,
"frame_number": frame_number,
"type": "face_embedding",
});
self.upsert_vector(&self.traces_collection(), point_id, vector, Some(payload))
.await
}
/// Scroll all points for a file from all workspace collections. /// Scroll all points for a file from all workspace collections.
/// Used during checkin to read vectors before moving to production. /// Used during checkin to read vectors before moving to production.
pub async fn scroll_by_file_uuid(&self, file_uuid: &str) -> Result<WorkspaceScrollResult> { pub async fn scroll_by_file_uuid(&self, file_uuid: &str) -> Result<WorkspaceScrollResult> {
let chunks = self let chunks = self
.scroll_collection(&self.chunks_collection(), file_uuid) .scroll_collection(&self.chunks_collection(), file_uuid)
.await?; .await?;
let traces = self Ok(WorkspaceScrollResult { chunks, traces: Vec::new() })
.scroll_collection(&self.traces_collection(), file_uuid)
.await?;
Ok(WorkspaceScrollResult { chunks, traces })
} }
async fn scroll_collection( async fn scroll_collection(

View File

@@ -1,7 +1,7 @@
use anyhow::Result; use anyhow::Result;
use serde::{Deserialize, Serialize}; use serde::{Deserialize, Serialize};
use std::time::Duration; use std::time::Duration;
use tracing::{debug, error, warn}; use tracing::{debug, error};
use crate::core::config; use crate::core::config;
use crate::core::llm::function_calling::LLM_CLIENT; use crate::core::llm::function_calling::LLM_CLIENT;
@@ -31,44 +31,17 @@ struct Choice {
message: ChatMessage, message: ChatMessage,
} }
/// Generates a 5W1H+ summary for a given scene context. pub async fn ask_llm(prompt: &str, system_prompt: &str) -> Result<String> {
/// Context should include the combined text of all sentences in the scene.
pub async fn generate_5w1h_summary(scene_text: &str) -> Result<String> {
if !*config::llm::SUMMARY_ENABLED {
warn!("LLM Summary is disabled via config");
return Ok("LLM Disabled".to_string());
}
let prompt = format!(
r#"Analyze the following video scene transcript and provide a concise 5W1H+ summary in JSON format.
Focus on: Who, What, Where, When, Why, How, and Key Objects/Actions.
Transcript:
"{}"
Output format:
{{
"who": "...",
"what": "...",
"where": "...",
"when": "...",
"why": "...",
"how": "...",
"summary": "..."
}}"#,
scene_text
);
let req = ChatRequest { let req = ChatRequest {
model: (*config::llm::SUMMARY_MODEL).clone(), model: (*config::llm::SUMMARY_MODEL).clone(),
messages: vec![ messages: vec![
ChatMessage { ChatMessage {
role: "system".to_string(), role: "system".to_string(),
content: "You are an expert video analyst assistant.".to_string(), content: system_prompt.to_string(),
}, },
ChatMessage { ChatMessage {
role: "user".to_string(), role: "user".to_string(),
content: prompt, content: prompt.to_string(),
}, },
], ],
temperature: 0.1, temperature: 0.1,
@@ -76,7 +49,7 @@ pub async fn generate_5w1h_summary(scene_text: &str) -> Result<String> {
stream: false, stream: false,
}; };
debug!("Calling LLM for summary: {}", *config::llm::SUMMARY_URL); debug!("Calling LLM: {}", *config::llm::SUMMARY_URL);
let res = LLM_CLIENT let res = LLM_CLIENT
.post(&*config::llm::SUMMARY_URL) .post(&*config::llm::SUMMARY_URL)

View File

@@ -71,6 +71,7 @@ pub struct BindIdentityRequest {
pub file_uuid: String, pub file_uuid: String,
pub face_id: Option<String>, pub face_id: Option<String>,
pub id: Option<i64>, pub id: Option<i64>,
pub expand_to_trace: Option<bool>,
} }
#[derive(Debug, Clone, Deserialize, Serialize)] #[derive(Debug, Clone, Deserialize, Serialize)]

View File

@@ -103,6 +103,7 @@ mod tests {
confidence: 0.95, confidence: 0.95,
embedding: Some(vec![0.1, 0.2, 0.3]), embedding: Some(vec![0.1, 0.2, 0.3]),
landmarks: Some(serde_json::json!([[10.0, 20.0], [30.0, 40.0]])), landmarks: Some(serde_json::json!([[10.0, 20.0], [30.0, 40.0]])),
pose_angle: None,
attributes: Some(FaceAttributes { attributes: Some(FaceAttributes {
age: Some(30), age: Some(30),
gender: Some("male".to_string()), gender: Some("male".to_string()),
@@ -174,6 +175,7 @@ mod tests {
confidence: 0.5, confidence: 0.5,
embedding: None, embedding: None,
landmarks: None, landmarks: None,
pose_angle: None,
attributes: None, attributes: None,
}; };
assert!(face.confidence >= 0.0 && face.confidence <= 1.0); assert!(face.confidence >= 0.0 && face.confidence <= 1.0);
@@ -190,6 +192,7 @@ mod tests {
confidence: 0.95, confidence: 0.95,
embedding: Some(vec![0.1; 512]), embedding: Some(vec![0.1; 512]),
landmarks: None, landmarks: None,
pose_angle: None,
attributes: Some(FaceAttributes { attributes: Some(FaceAttributes {
age: Some(35), age: Some(35),
gender: Some("male".to_string()), gender: Some("male".to_string()),

View File

@@ -1,96 +0,0 @@
use anyhow::{Context, Result};
use serde::{Deserialize, Serialize};
use std::time::Duration;
use super::executor::PythonExecutor;
const MEDIAPIPE_TIMEOUT: Duration = Duration::from_secs(7200);
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct MediaPipeResult {
pub frame_count: u64,
pub fps: f64,
pub frames: Vec<MediaPipeFrame>,
}
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct MediaPipeFrame {
pub frame: u64,
pub timestamp: f64,
pub persons: Vec<MediaPipePerson>,
}
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct MediaPipePerson {
pub person_id: u64,
pub pose: Option<MediaPipePose>,
pub left_hand: Option<MediaPipeHand>,
pub right_hand: Option<MediaPipeHand>,
pub face_mesh: Option<MediaPipeFaceMesh>,
}
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct MediaPipePose {
pub landmarks: Vec<Vec<f64>>,
pub keypoints_33: Option<Vec<String>>,
}
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct MediaPipeHand {
pub landmarks: Vec<Vec<f64>>,
}
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct MediaPipeFaceMesh {
pub landmarks: Vec<Vec<f64>>,
}
pub async fn process_mediapipe(
video_path: &str,
output_path: &str,
uuid: Option<&str>,
) -> Result<MediaPipeResult> {
// If mediapipe.json already exists (written by face_processor), skip
if std::path::Path::new(output_path).exists() {
let json_str = std::fs::read_to_string(output_path).context("Failed to read MEDIAPIPE output")?;
let result: MediaPipeResult =
serde_json::from_str(&json_str).context("Failed to parse MEDIAPIPE output")?;
tracing::info!("[MEDIAPIPE] Skipping (already exists): {} frames", result.frames.len());
return Ok(result);
}
let executor = PythonExecutor::new()?;
let script_name = "mediapipe_processor_v1.11.py";
let script_path = executor.script_path(script_name);
tracing::info!("[MEDIAPIPE] Starting MediaPipe Holistic: {}", video_path);
if !script_path.exists() {
tracing::warn!("[MEDIAPIPE] Script not found, returning empty result");
return Ok(MediaPipeResult {
frame_count: 0,
fps: 0.0,
frames: vec![],
});
}
executor
.run(
script_name,
&[video_path, output_path],
uuid,
"MEDIAPIPE",
Some(MEDIAPIPE_TIMEOUT),
)
.await
.with_context(|| format!("Failed to run {:?}", script_path))?;
let json_str =
std::fs::read_to_string(output_path).context("Failed to read MEDIAPIPE output")?;
let result: MediaPipeResult =
serde_json::from_str(&json_str).context("Failed to parse MEDIAPIPE output")?;
tracing::info!("[MEDIAPIPE] Result: {} frames", result.frames.len());
Ok(result)
}

View File

@@ -1,203 +0,0 @@
use anyhow::{Context, Result};
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::time::Duration;
use tokio::process::Command;
use tokio::time::timeout;
use super::executor::PythonExecutor;
const MEDIAPIPE_TIMEOUT: Duration = Duration::from_secs(7200);
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct MediaPipeResult {
pub metadata: MediaPipeMetadata,
pub frames: HashMap<String, MediaPipeDictEntry>,
}
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct MediaPipeMetadata {
pub fps: f64,
pub total_frames: i64,
pub processed_frames: i64,
pub sample_interval: i64,
pub width: i64,
pub height: i64,
pub processor: String,
}
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct MediaPipeDictEntry {
pub frame_number: i64,
pub timestamp: f64,
pub persons: Vec<MediaPipePerson>,
}
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct MediaPipePerson {
pub person_id: i64,
#[serde(default)]
pub bbox: Option<MediaPipeBBox>,
pub face_mesh: Option<serde_json::Value>,
pub pose: Option<serde_json::Value>,
pub hands: MediaPipeHands,
}
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct MediaPipeBBox {
pub x: i64,
pub y: i64,
pub width: i64,
pub height: i64,
}
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct MediaPipeHands {
pub left: Option<serde_json::Value>,
pub right: Option<serde_json::Value>,
}
pub async fn process_mediapipe_v2(
video_path: &str,
output_path: &str,
uuid: Option<&str>,
frames: Option<&[i64]>,
) -> Result<MediaPipeResult> {
let executor = PythonExecutor::new()?;
let script_path = executor.script_path("mediapipe_holistic_processor.py");
tracing::info!("[MEDIAPIPE] Starting MediaPipe Holistic: {}", video_path);
if !script_path.exists() {
anyhow::bail!("mediapipe_holistic_processor.py not found");
}
let mut cmd = Command::new(executor.python_path());
cmd.arg(&script_path).arg(video_path).arg(output_path);
// Use explicit frame list if provided, otherwise calculate sample_interval for ~8Hz
if let Some(frames) = frames {
let frames_str = frames
.iter()
.map(|f| f.to_string())
.collect::<Vec<_>>()
.join(",");
cmd.arg("--frames").arg(&frames_str);
tracing::info!("[MEDIAPIPE] 8Hz sampling: {} frames", frames.len());
} else {
let sample_interval = calculate_sample_interval(video_path).await;
cmd.arg("--sample-interval")
.arg(sample_interval.to_string());
}
if let Some(u) = uuid {
cmd.arg("--uuid").arg(u);
}
cmd.stdout(std::process::Stdio::piped())
.stderr(std::process::Stdio::piped());
let child = cmd.spawn().context("Failed to run MEDIAPIPE processor")?;
let output = match timeout(MEDIAPIPE_TIMEOUT, child.wait_with_output()).await {
Ok(Ok(output)) => output,
Ok(Err(e)) => return Err(e).context("Failed to run MEDIAPIPE processor"),
Err(_) => anyhow::bail!(
"MEDIAPIPE processing timed out after {:?}",
MEDIAPIPE_TIMEOUT
),
};
let stderr = String::from_utf8_lossy(&output.stderr);
for line in stderr.lines() {
let trimmed = line.trim();
if trimmed.starts_with("MEDIAPIPE_START") {
tracing::info!("[MEDIAPIPE] Loading model...");
} else if trimmed.starts_with("MEDIAPIPE_FRAME:") {
let count = trimmed.trim_start_matches("MEDIAPIPE_FRAME:");
tracing::info!("[MEDIAPIPE] Processed {} frames...", count);
} else if trimmed.starts_with("MEDIAPIPE_COMPLETE:") {
let count = trimmed.trim_start_matches("MEDIAPIPE_COMPLETE:");
tracing::info!("[MEDIAPIPE] Completed! Total: {} frames", count);
} else if trimmed.starts_with("MEDIAPIPE_INFO:") {
let info = trimmed.trim_start_matches("MEDIAPIPE_INFO:");
tracing::info!("[MEDIAPIPE] {}", info);
} else if trimmed.starts_with("MEDIAPIPE_ERROR:") {
let err = trimmed.trim_start_matches("MEDIAPIPE_ERROR:");
tracing::error!("[MEDIAPIPE] {}", err);
}
}
tracing::info!("[MEDIAPIPE] stderr output:\n{}", stderr);
if !output.status.success() {
anyhow::bail!("MEDIAPIPE failed: {}", stderr);
}
let json_str =
std::fs::read_to_string(output_path).context("Failed to read MEDIAPIPE output")?;
let result: MediaPipeResult =
serde_json::from_str(&json_str).context("Failed to parse MEDIAPIPE output")?;
tracing::info!("[MEDIAPIPE] Result: {} frames", result.frames.len());
Ok(result)
}
async fn calculate_sample_interval(video_path: &str) -> i64 {
// Try ffprobe to get FPS, calculate sample_interval for ~8Hz
let probe_cmd = Command::new("ffprobe")
.args([
"-v",
"quiet",
"-print_format",
"json",
"-show_streams",
video_path,
])
.output()
.await;
if let Ok(output) = probe_cmd {
if output.status.success() {
if let Ok(json_str) = String::from_utf8(output.stdout) {
if let Ok(probe_data) = serde_json::from_str::<serde_json::Value>(&json_str) {
if let Some(streams) = probe_data["streams"].as_array() {
for stream in streams {
if stream["codec_type"] == "video" {
if let Some(fps_str) = stream["r_frame_rate"].as_str() {
// Parse "30000/1001" style fps
if let Some(fps) = parse_fractional_fps(fps_str) {
let interval = (fps / 8.0).round() as i64;
return interval.max(1);
}
}
if let Some(fps_val) = stream["avg_frame_rate"].as_str() {
if let Some(fps) = parse_fractional_fps(fps_val) {
let interval = (fps / 8.0).round() as i64;
return interval.max(1);
}
}
}
}
}
}
}
}
}
4 // Default: assume 30fps / 8 = ~4
}
fn parse_fractional_fps(s: &str) -> Option<f64> {
let parts: Vec<&str> = s.split('/').collect();
if parts.len() == 2 {
let num: f64 = parts[0].parse().ok()?;
let den: f64 = parts[1].parse().ok()?;
if den > 0.0 {
return Some(num / den);
}
}
s.parse::<f64>().ok()
}

View File

@@ -11,11 +11,9 @@ pub mod face_clustering;
pub mod face_recognition; pub mod face_recognition;
pub mod hand; pub mod hand;
pub mod heuristic_scene; pub mod heuristic_scene;
pub mod mediapipe_v2;
pub mod ocr; pub mod ocr;
pub mod pose; pub mod pose;
pub mod scene_classification; pub mod scene_classification;
pub mod story;
pub mod tkg; pub mod tkg;
pub mod yolo; pub mod yolo;
@@ -48,17 +46,12 @@ pub use heuristic_scene::{
build_heuristic_scene_meta, generate_scene_meta, CrowdSize, HeuristicSceneMeta, build_heuristic_scene_meta, generate_scene_meta, CrowdSize, HeuristicSceneMeta,
SceneSegmentMeta, SceneSegmentMeta,
}; };
pub use mediapipe_v2::{
process_mediapipe_v2, MediaPipeBBox, MediaPipeDictEntry, MediaPipeHands, MediaPipeMetadata,
MediaPipePerson, MediaPipeResult,
};
pub use ocr::{process_ocr, OcrFrame, OcrResult, OcrText}; pub use ocr::{process_ocr, OcrFrame, OcrResult, OcrText};
pub use pose::{process_pose, Bbox, Keypoint, PersonPose, PoseFrame, PoseResult}; pub use pose::{process_pose, Bbox, Keypoint, PersonPose, PoseFrame, PoseResult};
pub use scene_classification::{ pub use scene_classification::{
load_scene_from_file, process_scene_classification, SceneClassificationResult, ScenePrediction, load_scene_from_file, process_scene_classification, SceneClassificationResult, ScenePrediction,
SceneSegment, SceneSegment,
}; };
pub use story::{process_story, StoryChildChunk, StoryParentChunk, StoryResult, StoryStats};
pub use tkg::{ pub use tkg::{
build_tkg, query_auto_representative_frame, FrameTraceInfo, MainIdentityInfo, build_tkg, query_auto_representative_frame, FrameTraceInfo, MainIdentityInfo,
RepresentativeFrameResult, TkgResult, RepresentativeFrameResult, TkgResult,

View File

@@ -1,690 +0,0 @@
use anyhow::{Context, Result};
use serde::{Deserialize, Serialize};
use std::path::Path;
use std::time::Duration;
use super::executor::PythonExecutor;
const STORY_TIMEOUT: Duration = Duration::from_secs(3600);
// ── Input data structs (from JSON files) ──────────────────────────
#[derive(Debug, Deserialize)]
struct AsrData {
segments: Vec<AsrSegmentInput>,
}
#[derive(Debug, Deserialize)]
struct AsrSegmentInput {
#[serde(default, alias = "start")]
start_time: f64,
#[serde(default, alias = "end")]
end_time: f64,
#[serde(default)]
text: String,
#[serde(default)]
confidence: f64,
}
#[derive(Debug, Deserialize)]
struct CutData {
scenes: Vec<CutSceneInput>,
}
#[derive(Debug, Deserialize)]
struct CutSceneInput {
scene_number: Option<i64>,
#[allow(dead_code)]
start_frame: Option<i64>,
#[allow(dead_code)]
end_frame: Option<i64>,
start_time: Option<f64>,
end_time: Option<f64>,
}
// ── Output data structs ───────────────────────────────────────────
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct StoryResult {
pub child_chunks: Vec<StoryChildChunk>,
pub parent_chunks: Vec<StoryParentChunk>,
pub stats: StoryStats,
#[serde(default)]
pub metadata: serde_json::Value,
#[serde(default)]
pub parent_chunk_size: usize,
}
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct StoryStats {
pub total_child_chunks: usize,
pub total_parent_chunks: usize,
pub asr_children: usize,
pub cut_children: usize,
}
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct StoryChildChunk {
pub chunk_id: String,
pub chunk_type: String,
pub source: String,
pub start_time: f64,
pub end_time: f64,
#[serde(skip_serializing_if = "Option::is_none")]
pub text_content: Option<String>,
pub content: serde_json::Value,
#[serde(default)]
pub child_chunk_ids: Vec<String>,
pub parent_chunk_id: Option<String>,
}
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct StoryParentChunk {
pub chunk_id: String,
pub chunk_type: String,
pub source: String,
pub start_time: f64,
pub end_time: f64,
pub text_content: String,
pub content: serde_json::Value,
#[serde(default)]
pub child_chunk_ids: Vec<String>,
pub parent_chunk_id: Option<String>,
}
// ── Public API ────────────────────────────────────────────────────
pub async fn process_story(
video_path: &str,
output_path: &str,
uuid: Option<&str>,
) -> Result<StoryResult> {
// Try native Rust implementation first
let result = try_native_story(video_path, output_path, uuid);
if let Ok(r) = result {
return Ok(r);
}
// Fallback: Python script
tracing::warn!(
"[STORY] Native impl failed, falling back to Python: {:?}",
result.err()
);
let executor = PythonExecutor::new()?;
let script_path = executor.script_path("story_processor.py");
if !script_path.exists() {
return Ok(StoryResult {
child_chunks: vec![],
parent_chunks: vec![],
stats: StoryStats {
total_child_chunks: 0,
total_parent_chunks: 0,
asr_children: 0,
cut_children: 0,
},
metadata: serde_json::json!({}),
parent_chunk_size: 5,
});
}
executor
.run(
"story_processor.py",
&[video_path, output_path],
uuid,
"STORY",
Some(STORY_TIMEOUT),
)
.await
.with_context(|| format!("Failed to run {:?}", script_path))?;
let json_str = std::fs::read_to_string(output_path).context("Failed to read STORY output")?;
let result: StoryResult =
serde_json::from_str(&json_str).context("Failed to parse STORY output")?;
Ok(result)
}
// ── Native implementation ─────────────────────────────────────────
fn try_native_story(
_video_path: &str,
output_path: &str,
_uuid: Option<&str>,
) -> Result<StoryResult> {
let output_dir = Path::new(output_path).parent().unwrap_or(Path::new("."));
let basename = Path::new(output_path)
.file_stem()
.and_then(|s| s.to_str())
.and_then(|s| s.split('.').next())
.unwrap_or("unknown");
let asr_path = output_dir.join(format!("{}.asr.json", basename));
let cut_path = output_dir.join(format!("{}.cut.json", basename));
// ASR data is required; CUT is optional
let asr_data: AsrData = if asr_path.exists() {
let content = std::fs::read_to_string(&asr_path)
.with_context(|| format!("Failed to read {:?}", asr_path))?;
serde_json::from_str(&content).with_context(|| format!("Failed to parse {:?}", asr_path))?
} else {
AsrData { segments: vec![] }
};
let cut_data: CutData = if cut_path.exists() {
let content = std::fs::read_to_string(&cut_path)
.with_context(|| format!("Failed to read {:?}", cut_path))?;
serde_json::from_str(&content).with_context(|| format!("Failed to parse {:?}", cut_path))?
} else {
CutData { scenes: vec![] }
};
let parent_chunk_size: usize = 5;
// ── Build child chunks ────────────────────────────────────────
let mut child_chunks: Vec<StoryChildChunk> = Vec::new();
// ASR child chunks
for seg in &asr_data.segments {
let chunk_id = format!("asr_{:.1}_{:.1}", seg.start_time, seg.end_time);
child_chunks.push(StoryChildChunk {
chunk_id,
chunk_type: "asr".to_string(),
source: "asr".to_string(),
start_time: seg.start_time,
end_time: seg.end_time,
text_content: Some(seg.text.clone()),
content: serde_json::json!({
"text": seg.text,
"confidence": seg.confidence,
}),
child_chunk_ids: vec![],
parent_chunk_id: None,
});
}
// CUT child chunks
for scene in &cut_data.scenes {
let scene_num = scene.scene_number.unwrap_or(0);
let start_time = scene.start_time.unwrap_or(0.0);
let end_time = scene.end_time.unwrap_or(0.0);
let chunk_id = format!("cut_{}", scene_num);
child_chunks.push(StoryChildChunk {
chunk_id,
chunk_type: "cut".to_string(),
source: "cut".to_string(),
start_time,
end_time,
text_content: Some(format!("Scene {}", scene_num)),
content: serde_json::json!({
"scene_number": scene_num,
"start_time": start_time,
"end_time": end_time,
}),
child_chunk_ids: vec![],
parent_chunk_id: None,
});
}
let asr_child_ids: Vec<String> = child_chunks
.iter()
.filter(|c| c.source == "asr")
.map(|c| c.chunk_id.clone())
.collect();
let cut_child_ids: Vec<String> = child_chunks
.iter()
.filter(|c| c.source == "cut")
.map(|c| c.chunk_id.clone())
.collect();
// ── Build parent chunks from ASR ──────────────────────────────
let mut parent_chunks: Vec<StoryParentChunk> = Vec::new();
for (i, batch) in asr_child_ids.chunks(parent_chunk_size).enumerate() {
if batch.is_empty() {
continue;
}
let mut texts: Vec<String> = Vec::new();
let mut times: Vec<(f64, f64)> = Vec::new();
for child_id in batch {
if let Some(child) = child_chunks.iter().find(|c| &c.chunk_id == child_id) {
if let Some(ref t) = child.text_content {
texts.push(t.clone());
}
times.push((child.start_time, child.end_time));
}
}
let start_time = times.first().map(|t| t.0).unwrap_or(0.0);
let end_time = times.last().map(|t| t.1).unwrap_or(0.0);
let narrative = generate_narrative(&texts, &[], start_time, end_time);
let chunk_id = format!("story_asr_{:04}", i);
parent_chunks.push(StoryParentChunk {
chunk_id: chunk_id.clone(),
chunk_type: "story".to_string(),
source: "story_asr".to_string(),
start_time,
end_time,
text_content: narrative.clone(),
content: serde_json::json!({
"description": narrative,
"child_count": batch.len(),
"speech_preview": texts.iter().take(3).cloned().collect::<Vec<_>>().join(" "),
}),
child_chunk_ids: batch.to_vec(),
parent_chunk_id: None,
});
// Link children to parent
for child in &mut child_chunks {
if batch.contains(&child.chunk_id) {
child.parent_chunk_id = Some(chunk_id.clone());
}
}
}
// ── Build parent chunks from CUT ──────────────────────────────
for (i, batch) in cut_child_ids.chunks(parent_chunk_size).enumerate() {
if batch.is_empty() {
continue;
}
let mut times: Vec<(f64, f64)> = Vec::new();
for child_id in batch {
if let Some(child) = child_chunks.iter().find(|c| &c.chunk_id == child_id) {
times.push((child.start_time, child.end_time));
}
}
let start_time = times.first().map(|t| t.0).unwrap_or(0.0);
let end_time = times.last().map(|t| t.1).unwrap_or(0.0);
let narrative = generate_scene_narrative(&[], start_time, end_time, batch.len());
let chunk_id = format!("story_cut_{:04}", i);
parent_chunks.push(StoryParentChunk {
chunk_id: chunk_id.clone(),
chunk_type: "story".to_string(),
source: "story_cut".to_string(),
start_time,
end_time,
text_content: narrative.clone(),
content: serde_json::json!({
"description": narrative,
"child_count": batch.len(),
"scenes": batch,
}),
child_chunk_ids: batch.to_vec(),
parent_chunk_id: None,
});
for child in &mut child_chunks {
if batch.contains(&child.chunk_id) {
child.parent_chunk_id = Some(chunk_id.clone());
}
}
}
// ── Build result ──────────────────────────────────────────────
let total_child = asr_child_ids.len() + cut_child_ids.len();
let total_parent = parent_chunks.len();
let asr_count = asr_child_ids.len();
let cut_count = cut_child_ids.len();
let result = StoryResult {
child_chunks,
parent_chunks,
stats: StoryStats {
total_child_chunks: total_child,
total_parent_chunks: total_parent,
asr_children: asr_count,
cut_children: cut_count,
},
metadata: serde_json::json!({}),
parent_chunk_size,
};
// Write output (for compatibility with Python path)
let json_str = serde_json::to_string_pretty(&result)?;
std::fs::write(output_path, &json_str)
.with_context(|| format!("Failed to write {:?}", output_path))?;
Ok(result)
}
// ── Narrative generation (matching Python logic) ──────────────────
fn generate_narrative(texts: &[String], objects: &[String], start: f64, end: f64) -> String {
if texts.is_empty() && objects.is_empty() {
return format!("Video segment from {:.1}s to {:.1}s", start, end);
}
let mut parts: Vec<String> = Vec::new();
if !texts.is_empty() {
let combined = texts.join(" ");
let truncated = if combined.len() > 150 {
format!("{}...", &combined[..150])
} else {
combined
};
parts.push(format!("Speech: {}", truncated));
}
if !objects.is_empty() {
let mut unique: Vec<&String> = objects.iter().collect();
unique.sort();
unique.dedup();
let objs = unique
.iter()
.take(5)
.map(|s| (*s).as_str())
.collect::<Vec<_>>()
.join(", ");
parts.push(format!("Visuals: {}", objs));
}
format!("[{:.0}s-{:.0}s] {}", start, end, parts.join(" | "))
}
fn generate_scene_narrative(
objects: &[String],
start: f64,
end: f64,
scene_count: usize,
) -> String {
let mut unique: Vec<&String> = objects.iter().collect();
unique.sort();
unique.dedup();
let top5: Vec<&String> = unique.iter().take(5).cloned().collect();
if !top5.is_empty() {
let obj_str = top5
.iter()
.map(|s| s.as_str())
.collect::<Vec<_>>()
.join(", ");
format!(
"[{:.0}s-{:.0}s] {} scenes. Visuals: {}.",
start, end, scene_count, obj_str
)
} else {
format!("[{:.0}s-{:.0}s] {} video scenes.", start, end, scene_count)
}
}
// ── Tests ─────────────────────────────────────────────────────────
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_generate_narrative_with_text() {
let text = generate_narrative(
&["Hello world".to_string()],
&["person".to_string()],
0.0,
5.0,
);
assert!(text.contains("[0s-5s]"));
assert!(text.contains("Speech:"));
assert!(text.contains("Visuals:"));
}
#[test]
fn test_generate_narrative_empty() {
let text = generate_narrative(&[], &[], 10.0, 20.0);
assert!(text.contains("10.0s to 20.0s"));
}
#[test]
fn test_generate_scene_narrative() {
let text = generate_scene_narrative(&["person".to_string()], 0.0, 10.0, 3);
assert!(text.contains("3 scenes"));
assert!(text.contains("person"));
}
#[test]
fn test_generate_scene_narrative_empty() {
let text = generate_scene_narrative(&[], 0.0, 10.0, 1);
assert!(text.contains("1 video scenes"));
}
#[test]
fn test_narrative_truncation() {
let long_text = "a".repeat(200);
let text = generate_narrative(&[long_text], &[], 0.0, 5.0);
assert!(text.len() < 200 + 50); // truncated with "..."
assert!(text.ends_with("..."));
}
#[test]
fn test_story_result_serialization() {
let result = StoryResult {
child_chunks: vec![StoryChildChunk {
chunk_id: "asr_0001".to_string(),
chunk_type: "sentence".to_string(),
source: "asr".to_string(),
start_time: 0.0,
end_time: 5.0,
text_content: Some("Hello world".to_string()),
content: serde_json::json!({}),
child_chunk_ids: vec![],
parent_chunk_id: Some("story_asr_0000".to_string()),
}],
parent_chunks: vec![StoryParentChunk {
chunk_id: "story_asr_0000".to_string(),
chunk_type: "story".to_string(),
source: "story_asr".to_string(),
start_time: 0.0,
end_time: 25.0,
text_content: "[0s-25s] Hello world...".to_string(),
content: serde_json::json!({
"description": "[0s-25s] Hello world...",
"child_count": 5
}),
child_chunk_ids: vec!["asr_0001".to_string()],
parent_chunk_id: None,
}],
stats: StoryStats {
total_child_chunks: 10,
total_parent_chunks: 2,
asr_children: 10,
cut_children: 0,
},
metadata: serde_json::json!({}),
parent_chunk_size: 5,
};
let json = serde_json::to_string(&result).unwrap();
assert!(json.contains("asr_0001"));
assert!(json.contains("story_asr_0000"));
assert!(json.contains("Hello world"));
}
#[test]
fn test_story_result_deserialization() {
let json = r#"{
"child_chunks": [{
"chunk_id": "asr_0001",
"chunk_type": "sentence",
"source": "asr",
"start_time": 0.0,
"end_time": 5.0,
"text_content": "Hello",
"content": {},
"child_chunk_ids": [],
"parent_chunk_id": null
}],
"parent_chunks": [{
"chunk_id": "story_asr_0000",
"chunk_type": "story",
"source": "story_asr",
"start_time": 0.0,
"end_time": 5.0,
"text_content": "Hello segment",
"content": {"description": "Hello segment"},
"child_chunk_ids": ["asr_0001"],
"parent_chunk_id": null
}],
"stats": {
"total_child_chunks": 1,
"total_parent_chunks": 1,
"asr_children": 1,
"cut_children": 0
},
"metadata": {},
"parent_chunk_size": 5
}"#;
let result: StoryResult = serde_json::from_str(json).unwrap();
assert_eq!(result.child_chunks.len(), 1);
assert_eq!(result.parent_chunks.len(), 1);
assert_eq!(result.stats.total_child_chunks, 1);
}
#[test]
fn test_parent_child_relationship() {
let result = StoryResult {
child_chunks: vec![
StoryChildChunk {
chunk_id: "asr_0001".to_string(),
chunk_type: "sentence".to_string(),
source: "asr".to_string(),
start_time: 0.0,
end_time: 5.0,
text_content: Some("First".to_string()),
content: serde_json::json!({}),
child_chunk_ids: vec![],
parent_chunk_id: Some("story_asr_0000".to_string()),
},
StoryChildChunk {
chunk_id: "asr_0002".to_string(),
chunk_type: "sentence".to_string(),
source: "asr".to_string(),
start_time: 5.0,
end_time: 10.0,
text_content: Some("Second".to_string()),
content: serde_json::json!({}),
child_chunk_ids: vec![],
parent_chunk_id: Some("story_asr_0000".to_string()),
},
],
parent_chunks: vec![StoryParentChunk {
chunk_id: "story_asr_0000".to_string(),
chunk_type: "story".to_string(),
source: "story_asr".to_string(),
start_time: 0.0,
end_time: 10.0,
text_content: "Combined narrative".to_string(),
content: serde_json::json!({}),
child_chunk_ids: vec!["asr_0001".to_string(), "asr_0002".to_string()],
parent_chunk_id: None,
}],
stats: StoryStats {
total_child_chunks: 2,
total_parent_chunks: 1,
asr_children: 2,
cut_children: 0,
},
metadata: serde_json::json!({}),
parent_chunk_size: 5,
};
assert_eq!(result.parent_chunks[0].child_chunk_ids.len(), 2);
assert!(result
.child_chunks
.iter()
.all(|c| c.parent_chunk_id.is_some()));
assert!(result.parent_chunks[0].parent_chunk_id.is_none());
}
#[test]
fn test_native_story_empty_data() {
// Write empty ASR and CUT files, then test try_native_story
let dir = std::env::temp_dir().join("story_test_empty");
let _ = std::fs::create_dir_all(&dir);
let basename = "test_video";
let asr_path = dir.join(format!("{}.asr.json", basename));
let cut_path = dir.join(format!("{}.cut.json", basename));
let out_path = dir.join(format!("{}.story.json", basename));
std::fs::write(&asr_path, r#"{"segments":[]}"#).unwrap();
std::fs::write(&cut_path, r#"{"scenes":[]}"#).unwrap();
let result = try_native_story("/dummy.mp4", out_path.to_str().unwrap(), None).unwrap();
assert_eq!(result.stats.total_child_chunks, 0);
assert_eq!(result.stats.total_parent_chunks, 0);
let _ = std::fs::remove_dir_all(&dir);
}
#[test]
fn test_native_story_with_data() {
let dir = std::env::temp_dir().join("story_test_data");
let _ = std::fs::create_dir_all(&dir);
let basename = "test_video";
let asr_path = dir.join(format!("{}.asr.json", basename));
let cut_path = dir.join(format!("{}.cut.json", basename));
let out_path = dir.join(format!("{}.story.json", basename));
std::fs::write(
&asr_path,
r#"{
"segments": [
{"start": 0.0, "end": 2.5, "text": "Hello", "confidence": 0.95},
{"start": 2.5, "end": 5.0, "text": "World", "confidence": 0.92},
{"start": 5.0, "end": 7.5, "text": "Foo", "confidence": 0.90}
]
}"#,
)
.unwrap();
std::fs::write(&cut_path, r#"{
"scenes": [
{"scene_number": 1, "start_frame": 0, "end_frame": 150, "start_time": 0.0, "end_time": 5.0},
{"scene_number": 2, "start_frame": 150, "end_frame": 300, "start_time": 5.0, "end_time": 10.0}
]
}"#).unwrap();
let result = try_native_story("/dummy.mp4", out_path.to_str().unwrap(), None).unwrap();
assert_eq!(result.stats.asr_children, 3);
assert_eq!(result.stats.cut_children, 2);
assert_eq!(result.stats.total_child_chunks, 5);
// 3 ASR segments, parent_chunk_size=5 → 1 parent
// 2 CUT scenes, parent_chunk_size=5 → 1 parent
assert_eq!(result.stats.total_parent_chunks, 2);
// Verify child-parent linking
for child in &result.child_chunks {
if child.source == "asr" {
assert!(child.parent_chunk_id.is_some());
assert!(child
.parent_chunk_id
.as_ref()
.unwrap()
.starts_with("story_asr_"));
}
}
// Verify output file was written
assert!(out_path.exists());
let content = std::fs::read_to_string(&out_path).unwrap();
assert!(content.contains("Hello"));
assert!(content.contains("World"));
let _ = std::fs::remove_dir_all(&dir);
}
}

File diff suppressed because it is too large Load Diff

View File

@@ -49,9 +49,6 @@ async fn main() -> Result<()> {
Commands::StoreAsrx { uuid } => { Commands::StoreAsrx { uuid } => {
handle_store_asrx(&uuid).await?; handle_store_asrx(&uuid).await?;
} }
Commands::Story { uuid } => {
handle_story(&uuid).await?;
}
Commands::Vectorize { uuid } => { Commands::Vectorize { uuid } => {
handle_vectorize(&uuid).await?; handle_vectorize(&uuid).await?;
} }
@@ -169,13 +166,6 @@ async fn handle_chunk(uuid: &str) -> Result<()> {
} }
/// Handle story command /// Handle story command
async fn handle_story(uuid: &str) -> Result<()> {
println!("Generating story for: {}", uuid);
// TODO: Implement story logic
Ok(())
}
/// Handle vectorize command /// Handle vectorize command
async fn handle_vectorize(uuid: &str) -> Result<()> { async fn handle_vectorize(uuid: &str) -> Result<()> {
println!("Vectorizing chunks for: {}", uuid); println!("Vectorizing chunks for: {}", uuid);

View File

@@ -633,44 +633,6 @@ async fn process_appearance_module(
Ok(()) Ok(())
} }
async fn process_story_module(
story_path: &Path,
video_path: &str,
uuid: &str,
progress_state: &Arc<Mutex<ProgressState>>,
ui: &Arc<Mutex<Option<ProgressUi>>>,
) -> anyhow::Result<()> {
{
let mut state = progress_state.lock().unwrap();
state.get_processor(ProcessorType::Story).start(1);
}
let story_result = momentry_core::core::processor::process_story(
video_path,
story_path.to_str().unwrap(),
Some(uuid),
)
.await?;
let story_json = serde_json::to_string_pretty(&story_result)?;
std::fs::write(story_path, &story_json)?;
let output_dir = OutputDir::new();
let _ = output_dir.backup_file(uuid, "story.json");
println!(
" ✓ Story saved: {} parent chunks, {} child chunks",
story_result.stats.total_parent_chunks, story_result.stats.total_child_chunks
);
{
let mut state = progress_state.lock().unwrap();
state.get_processor(ProcessorType::Story).complete(&format!(
"{} parents, {} children",
story_result.stats.total_parent_chunks, story_result.stats.total_child_chunks
));
}
if let Some(ref mut ui) = *ui.lock().unwrap() {
let _ = ui.render();
}
Ok(())
}
async fn process_caption_module( async fn process_caption_module(
caption_path: &Path, caption_path: &Path,
video_path: &str, video_path: &str,
@@ -745,11 +707,6 @@ enum Commands {
/// UUID /// UUID
uuid: String, uuid: String,
}, },
/// Generate story for cut scenes
Story {
/// UUID
uuid: String,
},
/// Vectorize chunks /// Vectorize chunks
Vectorize { Vectorize {
/// UUID (or 'all' for all) /// UUID (or 'all' for all)
@@ -2382,150 +2339,6 @@ Ok(())
Ok(()) Ok(())
} }
Commands::Story { uuid } => {
println!("Generating story for: {}", uuid);
let db = PostgresDb::init().await?;
let video = db
.get_video_by_uuid(&uuid)
.await?
.ok_or_else(|| anyhow::anyhow!("Video not found: {}", uuid))?;
let file_id = video.id;
let _fps = video.fps;
let duration = video.duration;
// Get all chunks
let all_chunks = db.get_chunks_by_uuid(&uuid).await?;
// Try cut chunks first, fall back to sentence chunks
let mut story_chunks: Vec<&Chunk> = all_chunks
.iter()
.filter(|c| c.chunk_type == ChunkType::Cut)
.collect();
let story_type = if story_chunks.is_empty() {
story_chunks = all_chunks
.iter()
.filter(|c| c.chunk_type == ChunkType::Sentence && c.text_content.is_some())
.collect();
"sentence"
} else {
"cut"
};
if story_chunks.is_empty() {
println!("No story chunks found. Run 'chunk' command first.");
return Ok(());
}
println!("Found {} {} scenes", story_chunks.len(), story_type);
for (i, story_chunk) in story_chunks.iter().enumerate() {
println!("\n=== Scene {} ===", i + 1);
println!(
"Time: {:.2}s - {:.2}s",
story_chunk.start_time().seconds(),
story_chunk.end_time().seconds()
);
let context_start = (story_chunk.start_time().seconds() - 5.0).max(0.0);
let context_end = (story_chunk.end_time().seconds() + 5.0).min(duration);
let context_chunks = db
.get_chunks_by_time_range(&uuid, context_start, context_end)
.await?;
let context_frames = db
.get_frames_by_time_range(&uuid, context_start, context_end)
.await?;
let mut story = String::new();
story.push_str(&format!(
"Scene {} ({:.1}s - {:.1}s)\n\n",
i + 1,
story_chunk.start_time().seconds(),
story_chunk.end_time().seconds()
));
let sentence_chunks: Vec<&serde_json::Value> = context_chunks
.iter()
.filter(|c| c["chunk_type"] == "sentence")
.collect();
if !sentence_chunks.is_empty() {
story.push_str("【Speech】\n");
for sc in &sentence_chunks {
if let Some(text) = sc["text_content"].as_str() {
story.push_str(&format!(" - {}\n", text));
}
}
story.push('\n');
}
let mut all_objects: std::collections::HashMap<String, u32> =
std::collections::HashMap::new();
for frame in &context_frames {
if let Some(objects) = frame["yolo_objects"].as_array() {
for obj in objects {
if let Some(class_name) = obj.get("class_name").and_then(|v| v.as_str())
{
*all_objects.entry(class_name.to_string()).or_insert(0) += 1;
}
}
}
}
if !all_objects.is_empty() {
story.push_str("【Objects】\n");
let mut sorted_objects: Vec<_> = all_objects.iter().collect();
sorted_objects.sort_by(|a, b| b.1.cmp(a.1));
for (obj, count) in sorted_objects.iter().take(10) {
story.push_str(&format!(" - {} ({} frames)\n", obj, count));
}
story.push('\n');
}
let mut all_texts: Vec<String> = Vec::new();
for frame in &context_frames {
if let Some(texts) = frame["ocr_results"].as_array() {
for txt in texts {
if let Some(text) = txt.get("text").and_then(|v| v.as_str()) {
if !text.is_empty() && text.len() > 2 {
all_texts.push(text.to_string());
}
}
}
}
}
if !all_texts.is_empty() {
story.push_str("【Text in video】\n");
for txt in all_texts.iter().take(10) {
story.push_str(&format!(" - {}\n", txt));
}
story.push('\n');
}
let mut face_count = 0;
for frame in &context_frames {
if let Some(faces) = frame["face_results"].as_array() {
face_count += faces.len();
}
}
if face_count > 0 {
story.push_str(&format!(
"【Faces】\n - {} face(s) detected\n\n",
face_count
));
}
println!("{}", story);
}
Ok(())
}
Commands::Vectorize { uuid } => { Commands::Vectorize { uuid } => {
println!("Vectorizing: {}", uuid); println!("Vectorizing: {}", uuid);

View File

@@ -8,7 +8,6 @@ pub mod cut;
pub mod face; pub mod face;
pub mod ocr; pub mod ocr;
pub mod pose; pub mod pose;
pub mod story;
pub mod yolo; pub mod yolo;
pub use appearance::*; pub use appearance::*;
@@ -19,5 +18,4 @@ pub use cut::*;
pub use face::*; pub use face::*;
pub use ocr::*; pub use ocr::*;
pub use pose::*; pub use pose::*;
pub use story::*;
pub use yolo::*; pub use yolo::*;

View File

@@ -1,53 +0,0 @@
//! Story generation processing module
use anyhow::Result;
use momentry_core::ui::progress::{ProcessorType, ProgressState, ProgressUi};
use momentry_core::OutputDir;
use std::path::Path;
use std::sync::{Arc, Mutex};
/// Process Story module
pub async fn process_story_module(
story_path: &Path,
video_path: &str,
uuid: &str,
progress_state: &Arc<Mutex<ProgressState>>,
ui: &Arc<Mutex<Option<ProgressUi>>>,
) -> Result<()> {
{
let mut state = progress_state.lock().unwrap();
state.get_processor(ProcessorType::Story).start(1);
}
let story_result = momentry_core::core::processor::process_story(
video_path,
story_path.to_str().unwrap(),
Some(uuid),
)
.await?;
let story_json = serde_json::to_string_pretty(&story_result)?;
std::fs::write(story_path, &story_json)?;
let output_dir = OutputDir::new();
let _ = output_dir.backup_file(uuid, "story.json");
println!(
" ✓ Story saved: {} parent chunks, {} child chunks",
story_result.stats.total_parent_chunks, story_result.stats.total_child_chunks
);
{
let mut state = progress_state.lock().unwrap();
state.get_processor(ProcessorType::Story).complete(&format!(
"{} parents, {} children",
story_result.stats.total_parent_chunks, story_result.stats.total_child_chunks
));
}
if let Some(ref mut ui) = *ui.lock().unwrap() {
let _ = ui.render();
}
Ok(())
}

View File

@@ -21,7 +21,6 @@ pub enum ProcessorType {
Face, Face,
Pose, Pose,
Hand, Hand,
Story,
Caption, Caption,
} }
@@ -37,7 +36,6 @@ impl ProcessorType {
ProcessorType::Face => "Face", ProcessorType::Face => "Face",
ProcessorType::Pose => "Pose", ProcessorType::Pose => "Pose",
ProcessorType::Hand => "Hand", ProcessorType::Hand => "Hand",
ProcessorType::Story => "Story",
ProcessorType::Caption => "Caption", ProcessorType::Caption => "Caption",
} }
} }
@@ -145,7 +143,6 @@ impl ProgressState {
ProcessorProgress::new(ProcessorType::Face), ProcessorProgress::new(ProcessorType::Face),
ProcessorProgress::new(ProcessorType::Pose), ProcessorProgress::new(ProcessorType::Pose),
ProcessorProgress::new(ProcessorType::Hand), ProcessorProgress::new(ProcessorType::Hand),
ProcessorProgress::new(ProcessorType::Story),
ProcessorProgress::new(ProcessorType::Caption), ProcessorProgress::new(ProcessorType::Caption),
], ],
video_name: video_name.to_string(), video_name: video_name.to_string(),
@@ -201,7 +198,6 @@ impl ProgressState {
"OCR" => ProcessorType::Ocr, "OCR" => ProcessorType::Ocr,
"FACE" => ProcessorType::Face, "FACE" => ProcessorType::Face,
"POSE" => ProcessorType::Pose, "POSE" => ProcessorType::Pose,
"STORY" => ProcessorType::Story,
"CAPTION" => ProcessorType::Caption, "CAPTION" => ProcessorType::Caption,
_ => return, _ => return,
}; };

View File

@@ -209,48 +209,6 @@ pub const PROCESSOR_SCHEMAS: &[ProcessorJsonSchema] = &[
], ],
min_data_threshold: 1, min_data_threshold: 1,
}, },
ProcessorJsonSchema {
processor: ProcessorType::Story,
required_fields: &[
RequiredField {
path: "child_chunks",
field_type: FieldType::Array,
allow_empty: true,
},
RequiredField {
path: "parent_chunks",
field_type: FieldType::Array,
allow_empty: true,
},
RequiredField {
path: "stats",
field_type: FieldType::Object,
allow_empty: false,
},
],
min_data_threshold: 0,
},
ProcessorJsonSchema {
processor: ProcessorType::MediaPipe,
required_fields: &[
RequiredField {
path: "frame_count",
field_type: FieldType::PositiveNumber,
allow_empty: false,
},
RequiredField {
path: "fps",
field_type: FieldType::PositiveNumber,
allow_empty: false,
},
RequiredField {
path: "frames",
field_type: FieldType::Array,
allow_empty: true,
},
],
min_data_threshold: 0,
},
]; ];
/// Get schema for a processor /// Get schema for a processor

View File

@@ -161,24 +161,6 @@ fn count_data_items(processor: &ProcessorType, value: &serde_json::Value) -> usi
.and_then(|v| v.as_array()) .and_then(|v| v.as_array())
.map(|a| a.len()) .map(|a| a.len())
.unwrap_or(0), .unwrap_or(0),
ProcessorType::Story => {
let child = value
.get("child_chunks")
.and_then(|v| v.as_array())
.map(|a| a.len())
.unwrap_or(0);
let parent = value
.get("parent_chunks")
.and_then(|v| v.as_array())
.map(|a| a.len())
.unwrap_or(0);
child + parent
}
ProcessorType::MediaPipe => value
.get("frames")
.and_then(|v| v.as_array())
.map(|a| a.len())
.unwrap_or(0),
_ => 0, _ => 0,
} }
} }
@@ -318,23 +300,6 @@ fn check_reasonableness(
} }
} }
// Story-specific: check chunk count vs cut scene count
if *processor == ProcessorType::Story {
if let Some(cut_value) = all_values.get("cut") {
let story_chunks = count_data_items(processor, value);
let cut_scenes = count_data_items(&ProcessorType::Cut, cut_value);
if story_chunks > 0 && cut_scenes > 0 {
// Story chunks should be >= cut scenes (one chunk per scene minimum)
if story_chunks < cut_scenes / 2 {
issues.push(format!(
"story chunk count ({}) much less than cut scene count ({})",
story_chunks, cut_scenes
));
}
}
}
}
// ASR-specific: check segments vs cut scenes // ASR-specific: check segments vs cut scenes
if *processor == ProcessorType::Asr { if *processor == ProcessorType::Asr {
if let Some(cut_value) = all_values.get("cut") { if let Some(cut_value) = all_values.get("cut") {
@@ -499,11 +464,6 @@ fn build_data_summary(processor: &ProcessorType, value: &serde_json::Value) -> s
summary["speaker_count"] = serde_json::json!(speakers.len()); summary["speaker_count"] = serde_json::json!(speakers.len());
} }
} }
ProcessorType::Story => {
if let Some(stats) = value.get("stats") {
summary["stats"] = stats.clone();
}
}
_ => {} _ => {}
} }
@@ -538,10 +498,7 @@ pub fn verify_file(file_uuid: &str) -> FileVerificationReport {
let mut all_values: HashMap<String, serde_json::Value> = HashMap::new(); let mut all_values: HashMap<String, serde_json::Value> = HashMap::new();
for processor in &processors { for processor in &processors {
let proc_name = processor.as_str(); let proc_name = processor.as_str();
let filename = match processor { let filename = format!("{}.{}.json", full_uuid, proc_name);
ProcessorType::Story => format!("{}.story_story.json", full_uuid),
_ => format!("{}.{}.json", full_uuid, proc_name),
};
let path = PathBuf::from(OUTPUT_DIR.as_str()).join(&filename); let path = PathBuf::from(OUTPUT_DIR.as_str()).join(&filename);
if let Ok(content) = std::fs::read_to_string(&path) { if let Ok(content) = std::fs::read_to_string(&path) {
@@ -639,10 +596,7 @@ pub fn verify_file(file_uuid: &str) -> FileVerificationReport {
/// Legacy verification function (backward compatible) /// Legacy verification function (backward compatible)
pub fn verify_output(processor: &ProcessorType, file_uuid: &str) -> VerificationResult { pub fn verify_output(processor: &ProcessorType, file_uuid: &str) -> VerificationResult {
let proc_name = processor.as_str(); let proc_name = processor.as_str();
let filename = match processor { let filename = format!("{}.{}.json", file_uuid, proc_name);
ProcessorType::Story => format!("{}.story_story.json", file_uuid),
_ => format!("{}.{}.json", file_uuid, proc_name),
};
let output_path = PathBuf::from(OUTPUT_DIR.as_str()).join(&filename); let output_path = PathBuf::from(OUTPUT_DIR.as_str()).join(&filename);
if !output_path.exists() { if !output_path.exists() {

View File

@@ -14,9 +14,7 @@ struct ProcessorCleanupGuard {
running_count: Arc<RwLock<usize>>, running_count: Arc<RwLock<usize>>,
frame_count: Arc<RwLock<usize>>, frame_count: Arc<RwLock<usize>>,
time_count: Arc<RwLock<usize>>, time_count: Arc<RwLock<usize>>,
best_effort_count: Arc<RwLock<usize>>,
pipeline: PipelineType, pipeline: PipelineType,
is_best_effort: bool,
} }
impl Drop for ProcessorCleanupGuard { impl Drop for ProcessorCleanupGuard {
@@ -32,30 +30,22 @@ impl Drop for ProcessorCleanupGuard {
*guard -= 1; *guard -= 1;
} }
} }
if self.is_best_effort { match self.pipeline {
if let Ok(mut guard) = self.best_effort_count.try_write() { PipelineType::Frame => {
if *guard > 0 { if let Ok(mut guard) = self.frame_count.try_write() {
*guard -= 1; if *guard > 0 {
} *guard -= 1;
}
} else {
match self.pipeline {
PipelineType::Frame => {
if let Ok(mut guard) = self.frame_count.try_write() {
if *guard > 0 {
*guard -= 1;
}
} }
} }
PipelineType::Time => { }
if let Ok(mut guard) = self.time_count.try_write() { PipelineType::Time => {
if *guard > 0 { if let Ok(mut guard) = self.time_count.try_write() {
*guard -= 1; if *guard > 0 {
} *guard -= 1;
} }
} }
PipelineType::Cross => {} // cross pipeline not tracked in slot counts
} }
PipelineType::Cross => {}
} }
} }
} }
@@ -106,8 +96,6 @@ pub struct ProcessorTask {
const FRAME_SLOT_MAX: usize = 2; const FRAME_SLOT_MAX: usize = 2;
/// Time pipeline max concurrent processors (audio is heavy, run 1 at a time). /// Time pipeline max concurrent processors (audio is heavy, run 1 at a time).
const TIME_SLOT_MAX: usize = 1; const TIME_SLOT_MAX: usize = 1;
/// Best-effort slot (used by low-priority processors like MediaPipe).
const BEST_EFFORT_SLOT_MAX: usize = 1;
pub struct ProcessorPool { pub struct ProcessorPool {
db: Arc<PostgresDb>, db: Arc<PostgresDb>,
@@ -117,7 +105,6 @@ pub struct ProcessorPool {
running_count: Arc<RwLock<usize>>, running_count: Arc<RwLock<usize>>,
running_frame_count: Arc<RwLock<usize>>, running_frame_count: Arc<RwLock<usize>>,
running_time_count: Arc<RwLock<usize>>, running_time_count: Arc<RwLock<usize>>,
running_best_effort_count: Arc<RwLock<usize>>,
} }
impl ProcessorPool { impl ProcessorPool {
@@ -130,7 +117,6 @@ impl ProcessorPool {
running_count: Arc::new(RwLock::new(0)), running_count: Arc::new(RwLock::new(0)),
running_frame_count: Arc::new(RwLock::new(0)), running_frame_count: Arc::new(RwLock::new(0)),
running_time_count: Arc::new(RwLock::new(0)), running_time_count: Arc::new(RwLock::new(0)),
running_best_effort_count: Arc::new(RwLock::new(0)),
} }
} }
@@ -240,22 +226,16 @@ impl ProcessorPool {
*count += 1; *count += 1;
} }
// 遞增產線專屬 slot // 遞增產線專屬 slot
let is_best_effort = processor_type == ProcessorType::MediaPipe; match pipeline {
if is_best_effort { PipelineType::Frame => *self.running_frame_count.write().await += 1,
*self.running_best_effort_count.write().await += 1; PipelineType::Time => *self.running_time_count.write().await += 1,
} else { PipelineType::Cross => {} // cross pipeline uses global slot
match pipeline {
PipelineType::Frame => *self.running_frame_count.write().await += 1,
PipelineType::Time => *self.running_time_count.write().await += 1,
PipelineType::Cross => {} // cross pipeline uses global slot
}
} }
let running = self.running.clone(); let running = self.running.clone();
let running_count = self.running_count.clone(); let running_count = self.running_count.clone();
let running_frame_count = self.running_frame_count.clone(); let running_frame_count = self.running_frame_count.clone();
let running_time_count = self.running_time_count.clone(); let running_time_count = self.running_time_count.clone();
let running_best_effort_count = self.running_best_effort_count.clone();
let child_pid: Arc<RwLock<Option<i32>>> = Arc::new(RwLock::new(None)); let child_pid: Arc<RwLock<Option<i32>>> = Arc::new(RwLock::new(None));
running.write().await.insert( running.write().await.insert(
job_id, job_id,
@@ -287,9 +267,7 @@ impl ProcessorPool {
running_count: running_count.clone(), running_count: running_count.clone(),
frame_count: running_frame_count.clone(), frame_count: running_frame_count.clone(),
time_count: running_time_count.clone(), time_count: running_time_count.clone(),
best_effort_count: running_best_effort_count.clone(),
pipeline, pipeline,
is_best_effort,
}; };
info!("Starting processor {} for job {}", processor_name, job.uuid); info!("Starting processor {} for job {}", processor_name, job.uuid);
@@ -528,10 +506,7 @@ impl ProcessorPool {
// Generate output path // Generate output path
let output_dir = PathBuf::from(OUTPUT_DIR.as_str()); let output_dir = PathBuf::from(OUTPUT_DIR.as_str());
let suffix = match processor_type { let suffix = format!("{}.{}", job.uuid, processor_type.as_str());
ProcessorType::Story => format!("{}.story_story", job.uuid),
_ => format!("{}.{}", job.uuid, processor_type.as_str()),
};
let output_path = output_dir.join(format!("{}.json", suffix)); let output_path = output_dir.join(format!("{}.json", suffix));
// Ensure output directory exists // Ensure output directory exists
@@ -1052,80 +1027,6 @@ impl ProcessorPool {
pid: 0, pid: 0,
}) })
} }
ProcessorType::Story => {
let executor = crate::core::processor::PythonExecutor::new()?;
let _ = executor
.run(
"parent_chunk_5w1h.py",
&["--file-uuid", &job.uuid, "--embed"],
uuid,
"STORY",
Some(std::time::Duration::from_secs(300)),
)
.await;
let narratives_path = output_dir.join(format!("{}.narratives.json", job.uuid));
let chunks_produced = if narratives_path.exists() {
let content = std::fs::read_to_string(&narratives_path).unwrap_or_default();
let count: i32 = serde_json::from_str::<Vec<String>>(&content)
.map(|v| v.len() as i32)
.unwrap_or(0);
tracing::info!("Story generated {} narratives for {}", count, job.uuid);
count
} else {
0
};
Ok(ProcessorOutput {
data: serde_json::Value::Null,
chunks_produced,
frames_processed: total_frames,
total_frames,
retry_count: 0,
pid: 0,
})
}
ProcessorType::FiveW1H => {
let executor = crate::core::processor::PythonExecutor::new()?;
let _ = executor
.run(
"parent_chunk_5w1h.py",
&["--file-uuid", &job.uuid, "--embed", "--mode", "llm"],
uuid,
"5W1H",
Some(std::time::Duration::from_secs(300)),
)
.await;
Ok(ProcessorOutput {
data: serde_json::Value::Null,
chunks_produced: 0,
frames_processed: total_frames,
total_frames,
retry_count: 0,
pid: 0,
})
}
ProcessorType::MediaPipe => {
let result = processor::process_mediapipe_v2(
video_path,
output_path.to_str().unwrap(),
uuid,
Some(&sample_frames),
)
.await?;
let chunks_produced = result.frames.len() as i32;
tracing::info!(
"MEDIAPIPE completed, {} frames for {}",
chunks_produced,
job.uuid
);
Ok(ProcessorOutput {
data: serde_json::to_value(result)?,
chunks_produced,
frames_processed: total_frames,
total_frames,
retry_count: 0,
pid: 0,
})
}
} }
} }