Files
momentry_core/docs_v1.0/doc/12_agent.html

399 lines
15 KiB
HTML
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>12 Agent - Momentry API Docs</title>
<style>
* { margin: 0; padding: 0; box-sizing: border-box; }
body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif; background: #f5f5f5; color: #333; padding: 40px; }
.container { max-width: 960px; margin: 0 auto; background: white; border-radius: 12px; box-shadow: 0 2px 12px rgba(0,0,0,0.08); padding: 40px; }
h1 { font-size: 24px; margin: 24px 0 12px; }
h2 { font-size: 20px; margin: 20px 0 10px; color: #222; }
h3 { font-size: 16px; margin: 16px 0 8px; color: #444; }
p { line-height: 1.6; margin: 8px 0; }
table { border-collapse: collapse; width: 100%; margin: 12px 0; font-size: 14px; }
th, td { border: 1px solid #ddd; padding: 8px 12px; text-align: left; }
th { background: #f0f0f0; font-weight: 600; }
code { background: #f0f0f0; padding: 2px 6px; border-radius: 3px; font-size: 13px; }
pre { background: #f8f8f8; border: 1px solid #ddd; border-radius: 6px; padding: 12px; overflow-x: auto; margin: 12px 0; }
pre code { background: none; padding: 0; }
a { color: #0066cc; }
.back { display: inline-block; margin-bottom: 20px; color: #666; }
.back:hover { color: #333; }
.topbar { display: flex; justify-content: space-between; align-items: center; margin-bottom: 20px; }
.logout-btn { font-size: 13px; color: #999; text-decoration: none; }
.logout-btn:hover { color: #cc0000; }
</style>
</head>
<body>
<div class="container">
<div class="topbar">
<a class="back" href="index.html">&larr; Back to index</a>
<a class="logout-btn" href="#" onclick="fetch('/api/v1/auth/logout',{method:'POST'}).then(()=>window.location.reload());return false">Logout</a>
</div>
<h1>Agent Endpoints</h1>
<p>Agent endpoints provide AI-powered capabilities including translation, identity analysis, and 5W1H extraction.</p>
<h2>POST /api/v1/agents/translate</h2>
<p>Translate text between languages using Gemma4 (llama.cpp, port 8082).</p>
<h3>Request</h3>
<div class="codehilite"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="nt">&quot;text&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;Hello, welcome to Momentry Core.&quot;</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;target_language&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;Traditional Chinese&quot;</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;source_language&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;English&quot;</span>
<span class="p">}</span>
</code></pre></div>
<table class="table">
<thead>
<tr>
<th>Field</th>
<th>Type</th>
<th>Required</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>text</code></td>
<td>string</td>
<td></td>
<td>Text to translate</td>
</tr>
<tr>
<td><code>target_language</code></td>
<td>string</td>
<td></td>
<td>Target language name (e.g. "Traditional Chinese", "Japanese")</td>
</tr>
<tr>
<td><code>source_language</code></td>
<td>string</td>
<td></td>
<td>Source language (default: "auto")</td>
</tr>
</tbody>
</table>
<h3>Response</h3>
<div class="codehilite"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="nt">&quot;success&quot;</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;translated_text&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;您好,歡迎使用 Momentry Core。&quot;</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;source_language_detected&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;English&quot;</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;model_used&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;google_gemma-4-26B-A4B-it-Q5_K_M.gguf&quot;</span>
<span class="p">}</span>
</code></pre></div>
<h3>Supported Language Pairs (tested)</h3>
<table class="table">
<thead>
<tr>
<th>Source</th>
<th>Target</th>
<th>Quality</th>
</tr>
</thead>
<tbody>
<tr>
<td>English</td>
<td>Traditional Chinese</td>
<td></td>
</tr>
<tr>
<td>English</td>
<td>Japanese</td>
<td></td>
</tr>
<tr>
<td>Chinese</td>
<td>English</td>
<td></td>
</tr>
<tr>
<td>English</td>
<td>French</td>
<td></td>
</tr>
<tr>
<td>Chinese</td>
<td>Japanese</td>
<td></td>
</tr>
</tbody>
</table>
<h3>Model</h3>
<ul>
<li><strong>Model</strong>: Gemma4 26B (Q5_K_M)</li>
<li><strong>Engine</strong>: llama.cpp at <code>localhost:8082</code></li>
<li><strong>Endpoint</strong>: <code>/v1/chat/completions</code> (OpenAI-compatible)</li>
<li><strong>Temperature</strong>: 0.1</li>
<li><strong>Max tokens</strong>: 1024</li>
</ul>
<h3>Errors</h3>
<table class="table">
<thead>
<tr>
<th>Status</th>
<th>Condition</th>
</tr>
</thead>
<tbody>
<tr>
<td>500</td>
<td>LLM unreachable or response parse failure</td>
</tr>
<tr>
<td>401</td>
<td>Missing/invalid auth</td>
</tr>
</tbody>
</table>
<hr />
<h2>POST /api/v1/agents/5w1h/analyze</h2>
<p>Extract 5W1H (Who, What, When, Where, Why, How) from a scene. Uses Gemma4 LLM on port 8082.</p>
<h3>Request</h3>
<div class="codehilite"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="nt">&quot;file_uuid&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;3abeee81d94597629ed8cb943f182e94&quot;</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;scene_id&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">42</span>
<span class="p">}</span>
</code></pre></div>
<h3>Response</h3>
<div class="codehilite"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="nt">&quot;success&quot;</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;5w1h&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">&quot;who&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&quot;Cary Grant&quot;</span><span class="p">],</span>
<span class="w"> </span><span class="nt">&quot;what&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&quot;discussing plans&quot;</span><span class="p">],</span>
<span class="w"> </span><span class="nt">&quot;when&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&quot;1963&quot;</span><span class="p">],</span>
<span class="w"> </span><span class="nt">&quot;where&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&quot;Paris&quot;</span><span class="p">],</span>
<span class="w"> </span><span class="nt">&quot;why&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&quot;vacation&quot;</span><span class="p">],</span>
<span class="w"> </span><span class="nt">&quot;how&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&quot;in person&quot;</span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<h2>POST /api/v1/agents/5w1h/batch</h2>
<p>Batch analyze all scenes in a file for 5W1H extraction. Uses the pipeline's <code>parent_chunk_5w1h.py --mode llm</code>.</p>
<h3>Request</h3>
<div class="codehilite"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="nt">&quot;file_uuid&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;3abeee81d94597629ed8cb943f182e94&quot;</span>
<span class="p">}</span>
</code></pre></div>
<h2>GET /api/v1/agents/5w1h/status</h2>
<p>Get status of the 5W1H agent pipeline for a file.</p>
<hr />
<h2>Embedding Model</h2>
<table class="table">
<thead>
<tr>
<th>Detail</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Model</strong></td>
<td>EmbeddingGemma-300m</td>
</tr>
<tr>
<td><strong>Endpoint</strong></td>
<td><code>POST /v1/embeddings</code> on port 11436</td>
</tr>
<tr>
<td><strong>Dimension</strong></td>
<td>768</td>
</tr>
<tr>
<td><strong>Used by</strong></td>
<td><code>parent_chunk_5w1h.py --embed</code>, story, 5W1H, search</td>
</tr>
</tbody>
</table>
<hr />
<h2>POST /api/v1/agents/search</h2>
<p>Conversational search assistant. Uses Gemma4 function calling to automatically decide which tools to call based on the user's natural language query. Supports multi-turn conversation.</p>
<h3>Request</h3>
<div class="codehilite"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="nt">&quot;query&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;Audrey Hepburn 和 Cary Grant 第一次同框在哪個 frame&quot;</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;conversation_id&quot;</span><span class="p">:</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;file_uuid&quot;</span><span class="p">:</span><span class="w"> </span><span class="kc">null</span>
<span class="p">}</span>
</code></pre></div>
<table class="table">
<thead>
<tr>
<th>Field</th>
<th>Type</th>
<th>Required</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>query</code></td>
<td>string</td>
<td></td>
<td>自然語言查詢</td>
</tr>
<tr>
<td><code>conversation_id</code></td>
<td>string</td>
<td></td>
<td>延續對話時傳入;新對話不傳</td>
</tr>
<tr>
<td><code>file_uuid</code></td>
<td>string</td>
<td></td>
<td>Portal 有選中檔案時可指定</td>
</tr>
</tbody>
</table>
<h3>Response</h3>
<div class="codehilite"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="nt">&quot;success&quot;</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;conversation_id&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;conv_abc123&quot;</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;answer&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;在 Charade (1963) 中Audrey Hepburn 與 Cary Grant 第一次同框在第 38619 幀(約 1544.76 秒)。&quot;</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;need_input&quot;</span><span class="p">:</span><span class="w"> </span><span class="kc">false</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;sources&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">&quot;tool&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;tkg_query&quot;</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;result&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;{\&quot;first_cooccurrence\&quot;:{\&quot;frame\&quot;:38619,\&quot;timestamp_secs\&quot;:1544.76}}&quot;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="p">}</span>
</code></pre></div>
<table class="table">
<thead>
<tr>
<th>Field</th>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>conversation_id</code></td>
<td>string</td>
<td>後續對話需要傳入此 ID</td>
</tr>
<tr>
<td><code>answer</code></td>
<td>string</td>
<td>Agent 的自然語言回答(或反問)</td>
</tr>
<tr>
<td><code>need_input</code></td>
<td>boolean</td>
<td><code>true</code> 表示 agent 需要更多資訊才能回答</td>
</tr>
<tr>
<td><code>suggestions</code></td>
<td>string[]</td>
<td>建議用戶提供的線索(當 <code>need_input=true</code></td>
</tr>
<tr>
<td><code>sources</code></td>
<td>array</td>
<td>引用的工具執行結果</td>
</tr>
</tbody>
</table>
<h3>Conversation Flow</h3>
<div class="codehilite"><pre><span></span><code>Round 1: POST /agents/search { query: &quot;我想看男女主角同框&quot; }
→ need_input: true, suggestions: [&quot;片名&quot;, &quot;演員&quot;, &quot;年代&quot;]
→ answer: &quot;請問是哪部電影?請提供更多線索&quot;
Round 2: POST /agents/search { query: &quot;奧黛麗赫本&quot;, conversation_id: &quot;...&quot; }
→ need_input: false
→ answer: &quot;找到 Charade (1963)Audrey Hepburn 和 Cary Grant...&quot;
</code></pre></div>
<h3>Available Tools</h3>
<p>Agent 內部使用 Gemma4 function calling 自動調用以下工具:</p>
<table class="table">
<thead>
<tr>
<th>Tool</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>find_file</code></td>
<td>透過片名/演員/年份關鍵字搜尋影片,回傳 file_uuid + has_data 狀態</td>
</tr>
<tr>
<td><code>list_files</code></td>
<td>列出近期註冊的影片</td>
</tr>
<tr>
<td><code>tkg_query</code></td>
<td>查詢人物互動資料7 種子類型top_identities、first_cooccurrence、identity_details、mutual_gaze、interaction_network、identity_traces、file_info</td>
</tr>
<tr>
<td><code>smart_search</code></td>
<td>文字內容 ILIKE 搜尋 chunk可指定 file_uuid 限制範圍)</td>
</tr>
<tr>
<td><code>get_identity_detail</code></td>
<td>查詢單一身份的詳細資料角色、TMDb 資訊)</td>
</tr>
<tr>
<td><code>get_file_info</code></td>
<td>查詢影片基本資訊(片長、解析度)</td>
</tr>
<tr>
<td><code>get_representative_frame</code></td>
<td>查詢影片最具代表性的 frame 資訊</td>
</tr>
</tbody>
</table>
<h3>Design Principles</h3>
<ul>
<li><strong>用戶不需要知道 file_uuid</strong> — Agent 會自動用 <code>find_file</code> 搜尋或反問</li>
<li><strong>不推薦無資料的影片</strong><code>has_data=false</code> 的影片不會被推薦給用戶</li>
<li><strong>多輪對話</strong> — 透過 <code>conversation_id</code> 延續上下文agent 會記得之前的交流</li>
<li><strong>並行工具呼叫</strong> — Gemma4 可以一次呼叫多個工具再綜合回答</li>
</ul>
<h3>Model</h3>
<table class="table">
<thead>
<tr>
<th>Detail</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>LLM</strong></td>
<td>Gemma4 26B (Q5_K_M)</td>
</tr>
<tr>
<td><strong>Engine</strong></td>
<td>llama.cpp at <code>localhost:8082</code></td>
</tr>
<tr>
<td><strong>Endpoint</strong></td>
<td><code>/v1/chat/completions</code> (OpenAI-compatible)</td>
</tr>
<tr>
<td><strong>Temperature</strong></td>
<td>0.1</td>
</tr>
<tr>
<td><strong>Max rounds</strong></td>
<td>5 (tool call iterations)</td>
</tr>
<tr>
<td><strong>Conversation TTL</strong></td>
<td>30 minutes</td>
</tr>
</tbody>
</table>
<hr />
<p><em>Updated: 2026-05-22</em></p>
</div>
</body>
</html>