feat: progressive multi-round face matching + pending person API

- Identity agent: per-face max matching, multi-round with derived
  seeds from high-confidence faces, angle diversity filter (cosine sim < 0.90)
- Pending person API: POST /file/:file_uuid/pending-person
  + GET /file/:file_uuid/pending-persons with status=pending, source=manual
- Update API docs (07_identity.md)
This commit is contained in:
Accusys
2026-06-24 03:42:04 +08:00
parent 766a1d9a6d
commit 14e886cc08
31 changed files with 5882 additions and 742 deletions

View File

@@ -32,7 +32,7 @@ a { color: #0066cc; }
<a class="logout-btn" href="#" onclick="fetch('/api/v1/auth/logout',{method:'POST'}).then(()=>window.location.reload());return false">Logout</a>
</div>
<!-- module: search -->
<!-- description: Vector search, BM25, smart search, universal search, visual search -->
<!-- description: Vector search, BM25, smart search, universal search, LLM reranked search, frame search -->
<!-- depends: 01_auth -->
<h2>Search APIs</h2>
@@ -282,9 +282,251 @@ a { color: #0066cc; }
<h3><code>POST /api/v1/search/frames</code></h3>
<p><strong>Auth</strong>: Required
<strong>Scope</strong>: global / file-level</p>
<p>Search face detection frames by identity name or trace ID.</p>
<p>Search frames by YOLO objects, OCR text, face IDs, or pose detections. Filters frames based on visual content detected during processing.</p>
<h4>Request Parameters</h4>
<table class="table">
<thead>
<tr>
<th>Field</th>
<th>Type</th>
<th>Required</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>file_uuid</code></td>
<td>string</td>
<td>No</td>
<td></td>
<td>Restrict to specific file</td>
</tr>
<tr>
<td><code>object_class</code></td>
<td>string</td>
<td>No</td>
<td></td>
<td>Filter by YOLO object class (e.g., <code>person</code>, <code>car</code>, <code>dog</code>)</td>
</tr>
<tr>
<td><code>ocr_text</code></td>
<td>string</td>
<td>No</td>
<td></td>
<td>Filter by OCR text content (ILIKE match)</td>
</tr>
<tr>
<td><code>face_id</code></td>
<td>string</td>
<td>No</td>
<td></td>
<td>Filter by face detection ID</td>
</tr>
<tr>
<td><code>time_range</code></td>
<td>[float, float]</td>
<td>No</td>
<td></td>
<td>Filter by time range <code>[start_secs, end_secs]</code></td>
</tr>
<tr>
<td><code>limit</code></td>
<td>integer</td>
<td>No</td>
<td>100</td>
<td>Max results</td>
</tr>
</tbody>
</table>
<h4>Example</h4>
<div class="codehilite"><pre><span></span><code><span class="c1"># Search for frames containing &quot;person&quot; objects</span>
curl<span class="w"> </span>-s<span class="w"> </span>-X<span class="w"> </span>POST<span class="w"> </span><span class="s2">&quot;</span><span class="nv">$API</span><span class="s2">/api/v1/search/frames&quot;</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>-H<span class="w"> </span><span class="s2">&quot;Content-Type: application/json&quot;</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>-H<span class="w"> </span><span class="s2">&quot;X-API-Key: </span><span class="nv">$KEY</span><span class="s2">&quot;</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>-d<span class="w"> </span><span class="s1">&#39;{&quot;file_uuid&quot;: &quot;&#39;</span><span class="s2">&quot;</span><span class="nv">$FILE_UUID</span><span class="s2">&quot;</span><span class="s1">&#39;&quot;, &quot;object_class&quot;: &quot;person&quot;, &quot;limit&quot;: 20}&#39;</span>
<span class="c1"># Search for frames with specific OCR text</span>
curl<span class="w"> </span>-s<span class="w"> </span>-X<span class="w"> </span>POST<span class="w"> </span><span class="s2">&quot;</span><span class="nv">$API</span><span class="s2">/api/v1/search/frames&quot;</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>-H<span class="w"> </span><span class="s2">&quot;Content-Type: application/json&quot;</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>-H<span class="w"> </span><span class="s2">&quot;X-API-Key: </span><span class="nv">$KEY</span><span class="s2">&quot;</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>-d<span class="w"> </span><span class="s1">&#39;{&quot;file_uuid&quot;: &quot;&#39;</span><span class="s2">&quot;</span><span class="nv">$FILE_UUID</span><span class="s2">&quot;</span><span class="s1">&#39;&quot;, &quot;ocr_text&quot;: &quot;hello&quot;, &quot;time_range&quot;: [10.0, 30.0]}&#39;</span>
</code></pre></div>
<h4>Response (200)</h4>
<div class="codehilite"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="nt">&quot;frames&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">&quot;frame_number&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">1200</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;timestamp&quot;</span><span class="p">:</span><span class="w"> </span><span class="mf">50.0</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;file_uuid&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;d3f9ae8e471a1fc4d47022c66091b920&quot;</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;objects&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[{</span><span class="nt">&quot;class&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;person&quot;</span><span class="p">,</span><span class="w"> </span><span class="nt">&quot;confidence&quot;</span><span class="p">:</span><span class="w"> </span><span class="mf">0.95</span><span class="p">,</span><span class="w"> </span><span class="nt">&quot;bbox&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="mi">100</span><span class="p">,</span><span class="w"> </span><span class="mi">50</span><span class="p">,</span><span class="w"> </span><span class="mi">300</span><span class="p">,</span><span class="w"> </span><span class="mi">400</span><span class="p">]}],</span>
<span class="w"> </span><span class="nt">&quot;ocr_texts&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&quot;Hello World&quot;</span><span class="p">],</span>
<span class="w"> </span><span class="nt">&quot;faces&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[{</span><span class="nt">&quot;face_id&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;face_42&quot;</span><span class="p">,</span><span class="w"> </span><span class="nt">&quot;confidence&quot;</span><span class="p">:</span><span class="w"> </span><span class="mf">0.88</span><span class="p">}],</span>
<span class="w"> </span><span class="nt">&quot;pose_persons&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[{</span><span class="nt">&quot;trace_id&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="nt">&quot;bbox&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="mi">120</span><span class="p">,</span><span class="w"> </span><span class="mi">60</span><span class="p">,</span><span class="w"> </span><span class="mi">280</span><span class="p">,</span><span class="w"> </span><span class="mi">380</span><span class="p">]}]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">],</span>
<span class="w"> </span><span class="nt">&quot;total&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">15</span>
<span class="p">}</span>
</code></pre></div>
<table class="table">
<thead>
<tr>
<th>Field</th>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>frames</code></td>
<td>array</td>
<td>Array of matching frame objects</td>
</tr>
<tr>
<td><code>frames[].frame_number</code></td>
<td>integer</td>
<td>Frame number in video</td>
</tr>
<tr>
<td><code>frames[].timestamp</code></td>
<td>float</td>
<td>Timestamp in seconds</td>
</tr>
<tr>
<td><code>frames[].file_uuid</code></td>
<td>string</td>
<td>File UUID</td>
</tr>
<tr>
<td><code>frames[].objects</code></td>
<td>array/null</td>
<td>YOLO detections in this frame</td>
</tr>
<tr>
<td><code>frames[].ocr_texts</code></td>
<td>array/null</td>
<td>OCR text strings in this frame</td>
</tr>
<tr>
<td><code>frames[].faces</code></td>
<td>array/null</td>
<td>Face detections in this frame</td>
</tr>
<tr>
<td><code>frames[].pose_persons</code></td>
<td>array/null</td>
<td>Pose-detected persons in this frame</td>
</tr>
<tr>
<td><code>total</code></td>
<td>integer</td>
<td>Total matching frame count</td>
</tr>
</tbody>
</table>
<hr />
<h3><code>GET /api/v1/search/identity_text</code></h3>
<h3><code>POST /api/v1/search/llm-smart</code></h3>
<p><strong>Auth</strong>: Required
<strong>Scope</strong>: global / file-level</p>
<p>Smart search with LLM re-ranking. First fetches candidate results via RRF (Reciprocal Rank Fusion) using the existing smart search, then uses an LLM (Gemma4 on port 8000) to re-rank candidates by relevance to the query.</p>
<h4>Request Parameters</h4>
<table class="table">
<thead>
<tr>
<th>Field</th>
<th>Type</th>
<th>Required</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>query</code></td>
<td>string</td>
<td>Yes</td>
<td></td>
<td>Search text</td>
</tr>
<tr>
<td><code>file_uuid</code></td>
<td>string</td>
<td>No</td>
<td></td>
<td>File UUID to search within</td>
</tr>
<tr>
<td><code>limit</code></td>
<td>integer</td>
<td>No</td>
<td>10</td>
<td>Max results to return</td>
</tr>
</tbody>
</table>
<h4>Pipeline</h4>
<div class="codehilite"><pre><span></span><code><span class="w"> </span><span class="mf">1.</span><span class="w"> </span><span class="n">smart_search</span><span class="w"> </span><span class="n"></span><span class="w"> </span><span class="k">fetch</span><span class="w"> </span><span class="n">N</span><span class="w"> </span><span class="n">candidates</span><span class="w"> </span><span class="p">(</span><span class="k">limit</span><span class="w"> </span><span class="n">×</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="n">clamped</span><span class="w"> </span><span class="mi">10</span><span class="o">-</span><span class="mi">20</span><span class="p">)</span>
<span class="w"> </span><span class="mf">2.</span><span class="w"> </span><span class="n">LLM</span><span class="w"> </span><span class="n">rerank</span><span class="w"> </span><span class="n"></span><span class="w"> </span><span class="n">re</span><span class="o">-</span><span class="k">order</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="n">relevance</span><span class="w"> </span><span class="k">using</span><span class="w"> </span><span class="n">Gemma4</span>
<span class="w"> </span><span class="mf">3.</span><span class="w"> </span><span class="n">trim</span><span class="w"> </span><span class="n"></span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">top</span><span class="w"> </span><span class="n n-Quoted">`limit`</span><span class="w"> </span><span class="n">results</span>
</code></pre></div>
<h4>Example</h4>
<div class="codehilite"><pre><span></span><code>curl<span class="w"> </span>-s<span class="w"> </span>-X<span class="w"> </span>POST<span class="w"> </span><span class="s2">&quot;</span><span class="nv">$API</span><span class="s2">/api/v1/search/llm-smart&quot;</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>-H<span class="w"> </span><span class="s2">&quot;Content-Type: application/json&quot;</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>-H<span class="w"> </span><span class="s2">&quot;X-API-Key: </span><span class="nv">$KEY</span><span class="s2">&quot;</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>-d<span class="w"> </span><span class="s1">&#39;{&quot;query&quot;: &quot;two people having a conversation about business&quot;, &quot;limit&quot;: 5}&#39;</span>
</code></pre></div>
<h4>Response (200)</h4>
<div class="codehilite"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="nt">&quot;query&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;two people having a conversation about business&quot;</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;results&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">&quot;file_uuid&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;d3f9ae8e471a1fc4d47022c66091b920&quot;</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;parent_id&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">1234</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;scene_order&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">1234</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;start_frame&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">5000</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;end_frame&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">5200</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;fps&quot;</span><span class="p">:</span><span class="w"> </span><span class="mf">24.0</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;start_time&quot;</span><span class="p">:</span><span class="w"> </span><span class="mf">208.3</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;end_time&quot;</span><span class="p">:</span><span class="w"> </span><span class="mf">216.7</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;summary&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;[208s-217s, 9s] Two people discussing project timeline...&quot;</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;similarity&quot;</span><span class="p">:</span><span class="w"> </span><span class="mf">0.72</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">],</span>
<span class="w"> </span><span class="nt">&quot;page&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;page_size&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">5</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;strategy&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;llm_reranked&quot;</span>
<span class="p">}</span>
</code></pre></div>
<table class="table">
<thead>
<tr>
<th>Field</th>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>strategy</code></td>
<td>string</td>
<td>Always <code>"llm_reranked"</code> for this endpoint</td>
</tr>
<tr>
<td><code>results</code></td>
<td>array</td>
<td>Re-ranked search results (same format as smart search)</td>
</tr>
</tbody>
</table>
<h4>Fallback</h4>
<p>If LLM reranking fails (model unavailable, timeout), falls back to RRF order without error.</p>
<hr />
<h3>Visual Search</h3>
<p><strong>Auth</strong>: Required
<strong>Scope</strong>: global / file-level</p>
<p>Search text chunks → find associated identities. Returns chunks where face detections overlap with text content.</p>
@@ -392,12 +634,13 @@ a { color: #0066cc; }
</tbody>
</table>
<hr />
<h3>Visual Search</h3>
<h3>Visual Search (Planned)</h3>
<table class="table">
<thead>
<tr>
<th>Method</th>
<th>Endpoint</th>
<th>Status</th>
<th>Description</th>
</tr>
</thead>
@@ -405,26 +648,31 @@ a { color: #0066cc; }
<tr>
<td>POST</td>
<td><code>/api/v1/search/visual</code></td>
<td>Not implemented</td>
<td>Search visual chunks</td>
</tr>
<tr>
<td>POST</td>
<td><code>/api/v1/search/visual/class</code></td>
<td>Not implemented</td>
<td>Search by object class</td>
</tr>
<tr>
<td>POST</td>
<td><code>/api/v1/search/visual/density</code></td>
<td>Not implemented</td>
<td>Search by object density</td>
</tr>
<tr>
<td>POST</td>
<td><code>/api/v1/search/visual/combination</code></td>
<td>Not implemented</td>
<td>Search by object combination</td>
</tr>
<tr>
<td>POST</td>
<td><code>/api/v1/search/visual/stats</code></td>
<td>Not implemented</td>
<td>Visual chunk statistics</td>
</tr>
</tbody>
@@ -457,7 +705,7 @@ a { color: #0066cc; }
</tbody>
</table>
<hr />
<p><em>Updated: 2026-05-27 — Added global search support for smart, universal, identity_text APIs</em></p>
<p><em>Updated: 2026-06-20 — Added llm-smart search, completed frames search documentation, marked visual search as planned</em></p>
</div>
</body>
</html>