399 lines
15 KiB
HTML
399 lines
15 KiB
HTML
<!DOCTYPE html>
|
||
<html lang="en">
|
||
<head>
|
||
<meta charset="UTF-8">
|
||
<title>12 Agent - Momentry API Docs</title>
|
||
<style>
|
||
* { margin: 0; padding: 0; box-sizing: border-box; }
|
||
body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif; background: #f5f5f5; color: #333; padding: 40px; }
|
||
.container { max-width: 960px; margin: 0 auto; background: white; border-radius: 12px; box-shadow: 0 2px 12px rgba(0,0,0,0.08); padding: 40px; }
|
||
h1 { font-size: 24px; margin: 24px 0 12px; }
|
||
h2 { font-size: 20px; margin: 20px 0 10px; color: #222; }
|
||
h3 { font-size: 16px; margin: 16px 0 8px; color: #444; }
|
||
p { line-height: 1.6; margin: 8px 0; }
|
||
table { border-collapse: collapse; width: 100%; margin: 12px 0; font-size: 14px; }
|
||
th, td { border: 1px solid #ddd; padding: 8px 12px; text-align: left; }
|
||
th { background: #f0f0f0; font-weight: 600; }
|
||
code { background: #f0f0f0; padding: 2px 6px; border-radius: 3px; font-size: 13px; }
|
||
pre { background: #f8f8f8; border: 1px solid #ddd; border-radius: 6px; padding: 12px; overflow-x: auto; margin: 12px 0; }
|
||
pre code { background: none; padding: 0; }
|
||
a { color: #0066cc; }
|
||
.back { display: inline-block; margin-bottom: 20px; color: #666; }
|
||
.back:hover { color: #333; }
|
||
.topbar { display: flex; justify-content: space-between; align-items: center; margin-bottom: 20px; }
|
||
.logout-btn { font-size: 13px; color: #999; text-decoration: none; }
|
||
.logout-btn:hover { color: #cc0000; }
|
||
</style>
|
||
</head>
|
||
<body>
|
||
<div class="container">
|
||
<div class="topbar">
|
||
<a class="back" href="index.html">← Back to index</a>
|
||
<a class="logout-btn" href="#" onclick="fetch('/api/v1/auth/logout',{method:'POST'}).then(()=>window.location.reload());return false">Logout</a>
|
||
</div>
|
||
<h1>Agent Endpoints</h1>
|
||
<p>Agent endpoints provide AI-powered capabilities including translation, identity analysis, and 5W1H extraction.</p>
|
||
<h2>POST /api/v1/agents/translate</h2>
|
||
<p>Translate text between languages using Gemma4 (llama.cpp, port 8082).</p>
|
||
<h3>Request</h3>
|
||
<div class="codehilite"><pre><span></span><code><span class="p">{</span>
|
||
<span class="w"> </span><span class="nt">"text"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Hello, welcome to Momentry Core."</span><span class="p">,</span>
|
||
<span class="w"> </span><span class="nt">"target_language"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Traditional Chinese"</span><span class="p">,</span>
|
||
<span class="w"> </span><span class="nt">"source_language"</span><span class="p">:</span><span class="w"> </span><span class="s2">"English"</span>
|
||
<span class="p">}</span>
|
||
</code></pre></div>
|
||
|
||
<table class="table">
|
||
<thead>
|
||
<tr>
|
||
<th>Field</th>
|
||
<th>Type</th>
|
||
<th>Required</th>
|
||
<th>Description</th>
|
||
</tr>
|
||
</thead>
|
||
<tbody>
|
||
<tr>
|
||
<td><code>text</code></td>
|
||
<td>string</td>
|
||
<td>✅</td>
|
||
<td>Text to translate</td>
|
||
</tr>
|
||
<tr>
|
||
<td><code>target_language</code></td>
|
||
<td>string</td>
|
||
<td>✅</td>
|
||
<td>Target language name (e.g. "Traditional Chinese", "Japanese")</td>
|
||
</tr>
|
||
<tr>
|
||
<td><code>source_language</code></td>
|
||
<td>string</td>
|
||
<td>❌</td>
|
||
<td>Source language (default: "auto")</td>
|
||
</tr>
|
||
</tbody>
|
||
</table>
|
||
<h3>Response</h3>
|
||
<div class="codehilite"><pre><span></span><code><span class="p">{</span>
|
||
<span class="w"> </span><span class="nt">"success"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span>
|
||
<span class="w"> </span><span class="nt">"translated_text"</span><span class="p">:</span><span class="w"> </span><span class="s2">"您好,歡迎使用 Momentry Core。"</span><span class="p">,</span>
|
||
<span class="w"> </span><span class="nt">"source_language_detected"</span><span class="p">:</span><span class="w"> </span><span class="s2">"English"</span><span class="p">,</span>
|
||
<span class="w"> </span><span class="nt">"model_used"</span><span class="p">:</span><span class="w"> </span><span class="s2">"google_gemma-4-26B-A4B-it-Q5_K_M.gguf"</span>
|
||
<span class="p">}</span>
|
||
</code></pre></div>
|
||
|
||
<h3>Supported Language Pairs (tested)</h3>
|
||
<table class="table">
|
||
<thead>
|
||
<tr>
|
||
<th>Source</th>
|
||
<th>Target</th>
|
||
<th>Quality</th>
|
||
</tr>
|
||
</thead>
|
||
<tbody>
|
||
<tr>
|
||
<td>English</td>
|
||
<td>Traditional Chinese</td>
|
||
<td>✅</td>
|
||
</tr>
|
||
<tr>
|
||
<td>English</td>
|
||
<td>Japanese</td>
|
||
<td>✅</td>
|
||
</tr>
|
||
<tr>
|
||
<td>Chinese</td>
|
||
<td>English</td>
|
||
<td>✅</td>
|
||
</tr>
|
||
<tr>
|
||
<td>English</td>
|
||
<td>French</td>
|
||
<td>✅</td>
|
||
</tr>
|
||
<tr>
|
||
<td>Chinese</td>
|
||
<td>Japanese</td>
|
||
<td>✅</td>
|
||
</tr>
|
||
</tbody>
|
||
</table>
|
||
<h3>Model</h3>
|
||
<ul>
|
||
<li><strong>Model</strong>: Gemma4 26B (Q5_K_M)</li>
|
||
<li><strong>Engine</strong>: llama.cpp at <code>localhost:8082</code></li>
|
||
<li><strong>Endpoint</strong>: <code>/v1/chat/completions</code> (OpenAI-compatible)</li>
|
||
<li><strong>Temperature</strong>: 0.1</li>
|
||
<li><strong>Max tokens</strong>: 1024</li>
|
||
</ul>
|
||
<h3>Errors</h3>
|
||
<table class="table">
|
||
<thead>
|
||
<tr>
|
||
<th>Status</th>
|
||
<th>Condition</th>
|
||
</tr>
|
||
</thead>
|
||
<tbody>
|
||
<tr>
|
||
<td>500</td>
|
||
<td>LLM unreachable or response parse failure</td>
|
||
</tr>
|
||
<tr>
|
||
<td>401</td>
|
||
<td>Missing/invalid auth</td>
|
||
</tr>
|
||
</tbody>
|
||
</table>
|
||
<hr />
|
||
<h2>POST /api/v1/agents/5w1h/analyze</h2>
|
||
<p>Extract 5W1H (Who, What, When, Where, Why, How) from a scene. Uses Gemma4 LLM on port 8082.</p>
|
||
<h3>Request</h3>
|
||
<div class="codehilite"><pre><span></span><code><span class="p">{</span>
|
||
<span class="w"> </span><span class="nt">"file_uuid"</span><span class="p">:</span><span class="w"> </span><span class="s2">"3abeee81d94597629ed8cb943f182e94"</span><span class="p">,</span>
|
||
<span class="w"> </span><span class="nt">"scene_id"</span><span class="p">:</span><span class="w"> </span><span class="mi">42</span>
|
||
<span class="p">}</span>
|
||
</code></pre></div>
|
||
|
||
<h3>Response</h3>
|
||
<div class="codehilite"><pre><span></span><code><span class="p">{</span>
|
||
<span class="w"> </span><span class="nt">"success"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span>
|
||
<span class="w"> </span><span class="nt">"5w1h"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
|
||
<span class="w"> </span><span class="nt">"who"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"Cary Grant"</span><span class="p">],</span>
|
||
<span class="w"> </span><span class="nt">"what"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"discussing plans"</span><span class="p">],</span>
|
||
<span class="w"> </span><span class="nt">"when"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"1963"</span><span class="p">],</span>
|
||
<span class="w"> </span><span class="nt">"where"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"Paris"</span><span class="p">],</span>
|
||
<span class="w"> </span><span class="nt">"why"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"vacation"</span><span class="p">],</span>
|
||
<span class="w"> </span><span class="nt">"how"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"in person"</span><span class="p">]</span>
|
||
<span class="w"> </span><span class="p">}</span>
|
||
<span class="p">}</span>
|
||
</code></pre></div>
|
||
|
||
<h2>POST /api/v1/agents/5w1h/batch</h2>
|
||
<p>Batch analyze all scenes in a file for 5W1H extraction. Uses the pipeline's <code>parent_chunk_5w1h.py --mode llm</code>.</p>
|
||
<h3>Request</h3>
|
||
<div class="codehilite"><pre><span></span><code><span class="p">{</span>
|
||
<span class="w"> </span><span class="nt">"file_uuid"</span><span class="p">:</span><span class="w"> </span><span class="s2">"3abeee81d94597629ed8cb943f182e94"</span>
|
||
<span class="p">}</span>
|
||
</code></pre></div>
|
||
|
||
<h2>GET /api/v1/agents/5w1h/status</h2>
|
||
<p>Get status of the 5W1H agent pipeline for a file.</p>
|
||
<hr />
|
||
<h2>Embedding Model</h2>
|
||
<table class="table">
|
||
<thead>
|
||
<tr>
|
||
<th>Detail</th>
|
||
<th>Value</th>
|
||
</tr>
|
||
</thead>
|
||
<tbody>
|
||
<tr>
|
||
<td><strong>Model</strong></td>
|
||
<td>EmbeddingGemma-300m</td>
|
||
</tr>
|
||
<tr>
|
||
<td><strong>Endpoint</strong></td>
|
||
<td><code>POST /v1/embeddings</code> on port 11436</td>
|
||
</tr>
|
||
<tr>
|
||
<td><strong>Dimension</strong></td>
|
||
<td>768</td>
|
||
</tr>
|
||
<tr>
|
||
<td><strong>Used by</strong></td>
|
||
<td><code>parent_chunk_5w1h.py --embed</code>, story, 5W1H, search</td>
|
||
</tr>
|
||
</tbody>
|
||
</table>
|
||
<hr />
|
||
<h2>POST /api/v1/agents/search</h2>
|
||
<p>Conversational search assistant. Uses Gemma4 function calling to automatically decide which tools to call based on the user's natural language query. Supports multi-turn conversation.</p>
|
||
<h3>Request</h3>
|
||
<div class="codehilite"><pre><span></span><code><span class="p">{</span>
|
||
<span class="w"> </span><span class="nt">"query"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Audrey Hepburn 和 Cary Grant 第一次同框在哪個 frame?"</span><span class="p">,</span>
|
||
<span class="w"> </span><span class="nt">"conversation_id"</span><span class="p">:</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span>
|
||
<span class="w"> </span><span class="nt">"file_uuid"</span><span class="p">:</span><span class="w"> </span><span class="kc">null</span>
|
||
<span class="p">}</span>
|
||
</code></pre></div>
|
||
|
||
<table class="table">
|
||
<thead>
|
||
<tr>
|
||
<th>Field</th>
|
||
<th>Type</th>
|
||
<th>Required</th>
|
||
<th>Description</th>
|
||
</tr>
|
||
</thead>
|
||
<tbody>
|
||
<tr>
|
||
<td><code>query</code></td>
|
||
<td>string</td>
|
||
<td>✅</td>
|
||
<td>自然語言查詢</td>
|
||
</tr>
|
||
<tr>
|
||
<td><code>conversation_id</code></td>
|
||
<td>string</td>
|
||
<td>❌</td>
|
||
<td>延續對話時傳入;新對話不傳</td>
|
||
</tr>
|
||
<tr>
|
||
<td><code>file_uuid</code></td>
|
||
<td>string</td>
|
||
<td>❌</td>
|
||
<td>Portal 有選中檔案時可指定</td>
|
||
</tr>
|
||
</tbody>
|
||
</table>
|
||
<h3>Response</h3>
|
||
<div class="codehilite"><pre><span></span><code><span class="p">{</span>
|
||
<span class="w"> </span><span class="nt">"success"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span>
|
||
<span class="w"> </span><span class="nt">"conversation_id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"conv_abc123"</span><span class="p">,</span>
|
||
<span class="w"> </span><span class="nt">"answer"</span><span class="p">:</span><span class="w"> </span><span class="s2">"在 Charade (1963) 中,Audrey Hepburn 與 Cary Grant 第一次同框在第 38619 幀(約 1544.76 秒)。"</span><span class="p">,</span>
|
||
<span class="w"> </span><span class="nt">"need_input"</span><span class="p">:</span><span class="w"> </span><span class="kc">false</span><span class="p">,</span>
|
||
<span class="w"> </span><span class="nt">"sources"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
|
||
<span class="w"> </span><span class="p">{</span>
|
||
<span class="w"> </span><span class="nt">"tool"</span><span class="p">:</span><span class="w"> </span><span class="s2">"tkg_query"</span><span class="p">,</span>
|
||
<span class="w"> </span><span class="nt">"result"</span><span class="p">:</span><span class="w"> </span><span class="s2">"{\"first_cooccurrence\":{\"frame\":38619,\"timestamp_secs\":1544.76}}"</span>
|
||
<span class="w"> </span><span class="p">}</span>
|
||
<span class="w"> </span><span class="p">]</span>
|
||
<span class="p">}</span>
|
||
</code></pre></div>
|
||
|
||
<table class="table">
|
||
<thead>
|
||
<tr>
|
||
<th>Field</th>
|
||
<th>Type</th>
|
||
<th>Description</th>
|
||
</tr>
|
||
</thead>
|
||
<tbody>
|
||
<tr>
|
||
<td><code>conversation_id</code></td>
|
||
<td>string</td>
|
||
<td>後續對話需要傳入此 ID</td>
|
||
</tr>
|
||
<tr>
|
||
<td><code>answer</code></td>
|
||
<td>string</td>
|
||
<td>Agent 的自然語言回答(或反問)</td>
|
||
</tr>
|
||
<tr>
|
||
<td><code>need_input</code></td>
|
||
<td>boolean</td>
|
||
<td><code>true</code> 表示 agent 需要更多資訊才能回答</td>
|
||
</tr>
|
||
<tr>
|
||
<td><code>suggestions</code></td>
|
||
<td>string[]</td>
|
||
<td>建議用戶提供的線索(當 <code>need_input=true</code>)</td>
|
||
</tr>
|
||
<tr>
|
||
<td><code>sources</code></td>
|
||
<td>array</td>
|
||
<td>引用的工具執行結果</td>
|
||
</tr>
|
||
</tbody>
|
||
</table>
|
||
<h3>Conversation Flow</h3>
|
||
<div class="codehilite"><pre><span></span><code>Round 1: POST /agents/search { query: "我想看男女主角同框" }
|
||
→ need_input: true, suggestions: ["片名", "演員", "年代"]
|
||
→ answer: "請問是哪部電影?請提供更多線索"
|
||
|
||
Round 2: POST /agents/search { query: "奧黛麗赫本", conversation_id: "..." }
|
||
→ need_input: false
|
||
→ answer: "找到 Charade (1963),Audrey Hepburn 和 Cary Grant..."
|
||
</code></pre></div>
|
||
|
||
<h3>Available Tools</h3>
|
||
<p>Agent 內部使用 Gemma4 function calling 自動調用以下工具:</p>
|
||
<table class="table">
|
||
<thead>
|
||
<tr>
|
||
<th>Tool</th>
|
||
<th>Description</th>
|
||
</tr>
|
||
</thead>
|
||
<tbody>
|
||
<tr>
|
||
<td><code>find_file</code></td>
|
||
<td>透過片名/演員/年份關鍵字搜尋影片,回傳 file_uuid + has_data 狀態</td>
|
||
</tr>
|
||
<tr>
|
||
<td><code>list_files</code></td>
|
||
<td>列出近期註冊的影片</td>
|
||
</tr>
|
||
<tr>
|
||
<td><code>tkg_query</code></td>
|
||
<td>查詢人物互動資料(7 種子類型:top_identities、first_cooccurrence、identity_details、mutual_gaze、interaction_network、identity_traces、file_info)</td>
|
||
</tr>
|
||
<tr>
|
||
<td><code>smart_search</code></td>
|
||
<td>文字內容 ILIKE 搜尋 chunk(可指定 file_uuid 限制範圍)</td>
|
||
</tr>
|
||
<tr>
|
||
<td><code>get_identity_detail</code></td>
|
||
<td>查詢單一身份的詳細資料(角色、TMDb 資訊)</td>
|
||
</tr>
|
||
<tr>
|
||
<td><code>get_file_info</code></td>
|
||
<td>查詢影片基本資訊(片長、解析度)</td>
|
||
</tr>
|
||
<tr>
|
||
<td><code>get_representative_frame</code></td>
|
||
<td>查詢影片最具代表性的 frame 資訊</td>
|
||
</tr>
|
||
</tbody>
|
||
</table>
|
||
<h3>Design Principles</h3>
|
||
<ul>
|
||
<li><strong>用戶不需要知道 file_uuid</strong> — Agent 會自動用 <code>find_file</code> 搜尋或反問</li>
|
||
<li><strong>不推薦無資料的影片</strong> — <code>has_data=false</code> 的影片不會被推薦給用戶</li>
|
||
<li><strong>多輪對話</strong> — 透過 <code>conversation_id</code> 延續上下文,agent 會記得之前的交流</li>
|
||
<li><strong>並行工具呼叫</strong> — Gemma4 可以一次呼叫多個工具再綜合回答</li>
|
||
</ul>
|
||
<h3>Model</h3>
|
||
<table class="table">
|
||
<thead>
|
||
<tr>
|
||
<th>Detail</th>
|
||
<th>Value</th>
|
||
</tr>
|
||
</thead>
|
||
<tbody>
|
||
<tr>
|
||
<td><strong>LLM</strong></td>
|
||
<td>Gemma4 26B (Q5_K_M)</td>
|
||
</tr>
|
||
<tr>
|
||
<td><strong>Engine</strong></td>
|
||
<td>llama.cpp at <code>localhost:8082</code></td>
|
||
</tr>
|
||
<tr>
|
||
<td><strong>Endpoint</strong></td>
|
||
<td><code>/v1/chat/completions</code> (OpenAI-compatible)</td>
|
||
</tr>
|
||
<tr>
|
||
<td><strong>Temperature</strong></td>
|
||
<td>0.1</td>
|
||
</tr>
|
||
<tr>
|
||
<td><strong>Max rounds</strong></td>
|
||
<td>5 (tool call iterations)</td>
|
||
</tr>
|
||
<tr>
|
||
<td><strong>Conversation TTL</strong></td>
|
||
<td>30 minutes</td>
|
||
</tr>
|
||
</tbody>
|
||
</table>
|
||
<hr />
|
||
<p><em>Updated: 2026-05-22</em></p>
|
||
</div>
|
||
</body>
|
||
</html> |