ollama source for Momentry Core verification
This commit is contained in:
421
docs/api/anthropic-compatibility.mdx
Normal file
421
docs/api/anthropic-compatibility.mdx
Normal file
@@ -0,0 +1,421 @@
|
||||
---
|
||||
title: Anthropic compatibility
|
||||
---
|
||||
|
||||
Ollama provides compatibility with the [Anthropic Messages API](https://docs.anthropic.com/en/api/messages) to help connect existing applications to Ollama, including tools like Claude Code.
|
||||
|
||||
## Usage
|
||||
|
||||
### Environment variables
|
||||
|
||||
To use Ollama with tools that expect the Anthropic API (like Claude Code), set these environment variables:
|
||||
|
||||
```shell
|
||||
export ANTHROPIC_AUTH_TOKEN=ollama # required but ignored
|
||||
export ANTHROPIC_BASE_URL=http://localhost:11434
|
||||
```
|
||||
|
||||
### Simple `/v1/messages` example
|
||||
|
||||
<CodeGroup dropdown>
|
||||
|
||||
```python basic.py
|
||||
import anthropic
|
||||
|
||||
client = anthropic.Anthropic(
|
||||
base_url='http://localhost:11434',
|
||||
api_key='ollama', # required but ignored
|
||||
)
|
||||
|
||||
message = client.messages.create(
|
||||
model='qwen3-coder',
|
||||
max_tokens=1024,
|
||||
messages=[
|
||||
{'role': 'user', 'content': 'Hello, how are you?'}
|
||||
]
|
||||
)
|
||||
print(message.content[0].text)
|
||||
```
|
||||
|
||||
```javascript basic.js
|
||||
import Anthropic from "@anthropic-ai/sdk";
|
||||
|
||||
const anthropic = new Anthropic({
|
||||
baseURL: "http://localhost:11434",
|
||||
apiKey: "ollama", // required but ignored
|
||||
});
|
||||
|
||||
const message = await anthropic.messages.create({
|
||||
model: "qwen3-coder",
|
||||
max_tokens: 1024,
|
||||
messages: [{ role: "user", content: "Hello, how are you?" }],
|
||||
});
|
||||
|
||||
console.log(message.content[0].text);
|
||||
```
|
||||
|
||||
```shell basic.sh
|
||||
curl -X POST http://localhost:11434/v1/messages \
|
||||
-H "Content-Type: application/json" \
|
||||
-H "x-api-key: ollama" \
|
||||
-H "anthropic-version: 2023-06-01" \
|
||||
-d '{
|
||||
"model": "qwen3-coder",
|
||||
"max_tokens": 1024,
|
||||
"messages": [{ "role": "user", "content": "Hello, how are you?" }]
|
||||
}'
|
||||
```
|
||||
|
||||
</CodeGroup>
|
||||
|
||||
### Streaming example
|
||||
|
||||
<CodeGroup dropdown>
|
||||
|
||||
```python streaming.py
|
||||
import anthropic
|
||||
|
||||
client = anthropic.Anthropic(
|
||||
base_url='http://localhost:11434',
|
||||
api_key='ollama',
|
||||
)
|
||||
|
||||
with client.messages.stream(
|
||||
model='qwen3-coder',
|
||||
max_tokens=1024,
|
||||
messages=[{'role': 'user', 'content': 'Count from 1 to 10'}]
|
||||
) as stream:
|
||||
for text in stream.text_stream:
|
||||
print(text, end='', flush=True)
|
||||
```
|
||||
|
||||
```javascript streaming.js
|
||||
import Anthropic from "@anthropic-ai/sdk";
|
||||
|
||||
const anthropic = new Anthropic({
|
||||
baseURL: "http://localhost:11434",
|
||||
apiKey: "ollama",
|
||||
});
|
||||
|
||||
const stream = await anthropic.messages.stream({
|
||||
model: "qwen3-coder",
|
||||
max_tokens: 1024,
|
||||
messages: [{ role: "user", content: "Count from 1 to 10" }],
|
||||
});
|
||||
|
||||
for await (const event of stream) {
|
||||
if (
|
||||
event.type === "content_block_delta" &&
|
||||
event.delta.type === "text_delta"
|
||||
) {
|
||||
process.stdout.write(event.delta.text);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
```shell streaming.sh
|
||||
curl -X POST http://localhost:11434/v1/messages \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"model": "qwen3-coder",
|
||||
"max_tokens": 1024,
|
||||
"stream": true,
|
||||
"messages": [{ "role": "user", "content": "Count from 1 to 10" }]
|
||||
}'
|
||||
```
|
||||
|
||||
</CodeGroup>
|
||||
|
||||
### Tool calling example
|
||||
|
||||
<CodeGroup dropdown>
|
||||
|
||||
```python tools.py
|
||||
import anthropic
|
||||
|
||||
client = anthropic.Anthropic(
|
||||
base_url='http://localhost:11434',
|
||||
api_key='ollama',
|
||||
)
|
||||
|
||||
message = client.messages.create(
|
||||
model='qwen3-coder',
|
||||
max_tokens=1024,
|
||||
tools=[
|
||||
{
|
||||
'name': 'get_weather',
|
||||
'description': 'Get the current weather in a location',
|
||||
'input_schema': {
|
||||
'type': 'object',
|
||||
'properties': {
|
||||
'location': {
|
||||
'type': 'string',
|
||||
'description': 'The city and state, e.g. San Francisco, CA'
|
||||
}
|
||||
},
|
||||
'required': ['location']
|
||||
}
|
||||
}
|
||||
],
|
||||
messages=[{'role': 'user', 'content': "What's the weather in San Francisco?"}]
|
||||
)
|
||||
|
||||
for block in message.content:
|
||||
if block.type == 'tool_use':
|
||||
print(f'Tool: {block.name}')
|
||||
print(f'Input: {block.input}')
|
||||
```
|
||||
|
||||
```javascript tools.js
|
||||
import Anthropic from "@anthropic-ai/sdk";
|
||||
|
||||
const anthropic = new Anthropic({
|
||||
baseURL: "http://localhost:11434",
|
||||
apiKey: "ollama",
|
||||
});
|
||||
|
||||
const message = await anthropic.messages.create({
|
||||
model: "qwen3-coder",
|
||||
max_tokens: 1024,
|
||||
tools: [
|
||||
{
|
||||
name: "get_weather",
|
||||
description: "Get the current weather in a location",
|
||||
input_schema: {
|
||||
type: "object",
|
||||
properties: {
|
||||
location: {
|
||||
type: "string",
|
||||
description: "The city and state, e.g. San Francisco, CA",
|
||||
},
|
||||
},
|
||||
required: ["location"],
|
||||
},
|
||||
},
|
||||
],
|
||||
messages: [{ role: "user", content: "What's the weather in San Francisco?" }],
|
||||
});
|
||||
|
||||
for (const block of message.content) {
|
||||
if (block.type === "tool_use") {
|
||||
console.log("Tool:", block.name);
|
||||
console.log("Input:", block.input);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
```shell tools.sh
|
||||
curl -X POST http://localhost:11434/v1/messages \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"model": "qwen3-coder",
|
||||
"max_tokens": 1024,
|
||||
"tools": [
|
||||
{
|
||||
"name": "get_weather",
|
||||
"description": "Get the current weather in a location",
|
||||
"input_schema": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"location": {
|
||||
"type": "string",
|
||||
"description": "The city and state"
|
||||
}
|
||||
},
|
||||
"required": ["location"]
|
||||
}
|
||||
}
|
||||
],
|
||||
"messages": [{ "role": "user", "content": "What is the weather in San Francisco?" }]
|
||||
}'
|
||||
```
|
||||
|
||||
</CodeGroup>
|
||||
|
||||
## Using with Claude Code
|
||||
|
||||
[Claude Code](https://code.claude.com/docs/en/overview) can be configured to use Ollama as its backend.
|
||||
|
||||
### Recommended models
|
||||
|
||||
For coding use cases, models like `glm-4.7`, `minimax-m2.1`, and `qwen3-coder` are recommended.
|
||||
|
||||
Download a model before use:
|
||||
|
||||
```shell
|
||||
ollama pull qwen3-coder
|
||||
```
|
||||
> Note: Qwen 3 coder is a 30B parameter model requiring at least 24GB of VRAM to run smoothly. More is required for longer context lengths.
|
||||
|
||||
```shell
|
||||
ollama pull glm-4.7:cloud
|
||||
```
|
||||
|
||||
### Quick setup
|
||||
|
||||
```shell
|
||||
ollama launch claude
|
||||
```
|
||||
|
||||
This will prompt you to select a model, configure Claude Code automatically, and launch it. To configure without launching:
|
||||
|
||||
```shell
|
||||
ollama launch claude --config
|
||||
```
|
||||
|
||||
### Manual setup
|
||||
|
||||
Set the environment variables and run Claude Code:
|
||||
|
||||
```shell
|
||||
ANTHROPIC_AUTH_TOKEN=ollama ANTHROPIC_BASE_URL=http://localhost:11434 claude --model qwen3-coder
|
||||
```
|
||||
|
||||
Or set the environment variables in your shell profile:
|
||||
|
||||
```shell
|
||||
export ANTHROPIC_AUTH_TOKEN=ollama
|
||||
export ANTHROPIC_BASE_URL=http://localhost:11434
|
||||
```
|
||||
|
||||
Then run Claude Code with any Ollama model:
|
||||
|
||||
```shell
|
||||
claude --model qwen3-coder
|
||||
```
|
||||
|
||||
## Endpoints
|
||||
|
||||
### `/v1/messages`
|
||||
|
||||
#### Supported features
|
||||
|
||||
- [x] Messages
|
||||
- [x] Streaming
|
||||
- [x] System prompts
|
||||
- [x] Multi-turn conversations
|
||||
- [x] Vision (images)
|
||||
- [x] Tools (function calling)
|
||||
- [x] Tool results
|
||||
- [x] Thinking/extended thinking
|
||||
|
||||
#### Supported request fields
|
||||
|
||||
- [x] `model`
|
||||
- [x] `max_tokens`
|
||||
- [x] `messages`
|
||||
- [x] Text `content`
|
||||
- [x] Image `content` (base64)
|
||||
- [x] Array of content blocks
|
||||
- [x] `tool_use` blocks
|
||||
- [x] `tool_result` blocks
|
||||
- [x] `thinking` blocks
|
||||
- [x] `system` (string or array)
|
||||
- [x] `stream`
|
||||
- [x] `temperature`
|
||||
- [x] `top_p`
|
||||
- [x] `top_k`
|
||||
- [x] `stop_sequences`
|
||||
- [x] `tools`
|
||||
- [x] `thinking`
|
||||
- [ ] `tool_choice`
|
||||
- [ ] `metadata`
|
||||
|
||||
#### Supported response fields
|
||||
|
||||
- [x] `id`
|
||||
- [x] `type`
|
||||
- [x] `role`
|
||||
- [x] `model`
|
||||
- [x] `content` (text, tool_use, thinking blocks)
|
||||
- [x] `stop_reason` (end_turn, max_tokens, tool_use)
|
||||
- [x] `usage` (input_tokens, output_tokens)
|
||||
|
||||
#### Streaming events
|
||||
|
||||
- [x] `message_start`
|
||||
- [x] `content_block_start`
|
||||
- [x] `content_block_delta` (text_delta, input_json_delta, thinking_delta)
|
||||
- [x] `content_block_stop`
|
||||
- [x] `message_delta`
|
||||
- [x] `message_stop`
|
||||
- [x] `ping`
|
||||
- [x] `error`
|
||||
|
||||
## Models
|
||||
|
||||
Ollama supports both local and cloud models.
|
||||
|
||||
### Local models
|
||||
|
||||
Pull a local model before use:
|
||||
|
||||
```shell
|
||||
ollama pull qwen3-coder
|
||||
```
|
||||
|
||||
Recommended local models:
|
||||
- `qwen3-coder` - Excellent for coding tasks
|
||||
- `gpt-oss:20b` - Strong general-purpose model
|
||||
|
||||
### Cloud models
|
||||
|
||||
Cloud models are available immediately without pulling:
|
||||
|
||||
- `glm-4.7:cloud` - High-performance cloud model
|
||||
- `minimax-m2.1:cloud` - Fast cloud model
|
||||
|
||||
### Default model names
|
||||
|
||||
For tooling that relies on default Anthropic model names such as `claude-3-5-sonnet`, use `ollama cp` to copy an existing model name:
|
||||
|
||||
```shell
|
||||
ollama cp qwen3-coder claude-3-5-sonnet
|
||||
```
|
||||
|
||||
Afterwards, this new model name can be specified in the `model` field:
|
||||
|
||||
```shell
|
||||
curl http://localhost:11434/v1/messages \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"model": "claude-3-5-sonnet",
|
||||
"max_tokens": 1024,
|
||||
"messages": [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "Hello!"
|
||||
}
|
||||
]
|
||||
}'
|
||||
```
|
||||
|
||||
## Differences from the Anthropic API
|
||||
|
||||
### Behavior differences
|
||||
|
||||
- API key is accepted but not validated
|
||||
- `anthropic-version` header is accepted but not used
|
||||
- Token counts are approximations based on the underlying model's tokenizer
|
||||
|
||||
### Not supported
|
||||
|
||||
The following Anthropic API features are not currently supported:
|
||||
|
||||
| Feature | Description |
|
||||
|---------|-------------|
|
||||
| `/v1/messages/count_tokens` | Token counting endpoint |
|
||||
| `tool_choice` | Forcing specific tool use or disabling tools |
|
||||
| `metadata` | Request metadata (user_id) |
|
||||
| Prompt caching | `cache_control` blocks for caching prefixes |
|
||||
| Batches API | `/v1/messages/batches` for async batch processing |
|
||||
| Citations | `citations` content blocks |
|
||||
| PDF support | `document` content blocks with PDF files |
|
||||
| Server-sent errors | `error` events during streaming (errors return HTTP status) |
|
||||
|
||||
### Partial support
|
||||
|
||||
| Feature | Status |
|
||||
|---------|--------|
|
||||
| Image content | Base64 images supported; URL images not supported |
|
||||
| Extended thinking | Basic support; `budget_tokens` accepted but not enforced |
|
||||
63
docs/api/authentication.mdx
Normal file
63
docs/api/authentication.mdx
Normal file
@@ -0,0 +1,63 @@
|
||||
---
|
||||
title: Authentication
|
||||
---
|
||||
|
||||
No authentication is required when accessing Ollama's API locally via `http://localhost:11434`.
|
||||
|
||||
Authentication is required for the following:
|
||||
|
||||
* Running cloud models via ollama.com
|
||||
* Publishing models
|
||||
* Downloading private models
|
||||
|
||||
Ollama supports two authentication methods:
|
||||
|
||||
* **Signing in**: sign in from your local installation, and Ollama will automatically take care of authenticating requests to ollama.com when running commands
|
||||
* **API keys**: API keys for programmatic access to ollama.com's API
|
||||
|
||||
## Signing in
|
||||
|
||||
To sign in to ollama.com from your local installation of Ollama, run:
|
||||
|
||||
```
|
||||
ollama signin
|
||||
```
|
||||
|
||||
Once signed in, Ollama will automatically authenticate commands as required:
|
||||
|
||||
```
|
||||
ollama run gpt-oss:120b-cloud
|
||||
```
|
||||
|
||||
Similarly, when accessing a local API endpoint that requires cloud access, Ollama will automatically authenticate the request:
|
||||
|
||||
```shell
|
||||
curl http://localhost:11434/api/generate -d '{
|
||||
"model": "gpt-oss:120b-cloud",
|
||||
"prompt": "Why is the sky blue?"
|
||||
}'
|
||||
```
|
||||
|
||||
## API keys
|
||||
|
||||
For direct access to ollama.com's API served at `https://ollama.com/api`, authentication via API keys is required.
|
||||
|
||||
First, create an [API key](https://ollama.com/settings/keys), then set the `OLLAMA_API_KEY` environment variable:
|
||||
|
||||
```shell
|
||||
export OLLAMA_API_KEY=your_api_key
|
||||
```
|
||||
|
||||
Then use the API key in the Authorization header:
|
||||
|
||||
```shell
|
||||
curl https://ollama.com/api/generate \
|
||||
-H "Authorization: Bearer $OLLAMA_API_KEY" \
|
||||
-d '{
|
||||
"model": "gpt-oss:120b",
|
||||
"prompt": "Why is the sky blue?",
|
||||
"stream": false
|
||||
}'
|
||||
```
|
||||
|
||||
API keys don't currently expire, however you can revoke them at any time in your [API keys settings](https://ollama.com/settings/keys).
|
||||
36
docs/api/errors.mdx
Normal file
36
docs/api/errors.mdx
Normal file
@@ -0,0 +1,36 @@
|
||||
---
|
||||
title: Errors
|
||||
---
|
||||
|
||||
## Status codes
|
||||
|
||||
Endpoints return appropriate HTTP status codes based on the success or failure of the request in the HTTP status line (e.g. `HTTP/1.1 200 OK` or `HTTP/1.1 400 Bad Request`). Common status codes are:
|
||||
|
||||
- `200`: Success
|
||||
- `400`: Bad Request (missing parameters, invalid JSON, etc.)
|
||||
- `404`: Not Found (model doesn't exist, etc.)
|
||||
- `429`: Too Many Requests (e.g. when a rate limit is exceeded)
|
||||
- `500`: Internal Server Error
|
||||
- `502`: Bad Gateway (e.g. when a cloud model cannot be reached)
|
||||
|
||||
## Error messages
|
||||
|
||||
Errors are returned in the `application/json` format with the following structure, with the error message in the `error` property:
|
||||
|
||||
```json
|
||||
{
|
||||
"error": "the model failed to generate a response"
|
||||
}
|
||||
```
|
||||
|
||||
## Errors that occur while streaming
|
||||
|
||||
If an error occurs mid-stream, the error will be returned as an object in the `application/x-ndjson` format with an `error` property. Since the response has already started, the status code of the response will not be changed.
|
||||
|
||||
```json
|
||||
{"model":"gemma3","created_at":"2025-10-26T17:21:21.196249Z","response":" Yes","done":false}
|
||||
{"model":"gemma3","created_at":"2025-10-26T17:21:21.207235Z","response":".","done":false}
|
||||
{"model":"gemma3","created_at":"2025-10-26T17:21:21.219166Z","response":"I","done":false}
|
||||
{"model":"gemma3","created_at":"2025-10-26T17:21:21.231094Z","response":"can","done":false}
|
||||
{"error":"an error was encountered while running the model"}
|
||||
```
|
||||
47
docs/api/introduction.mdx
Normal file
47
docs/api/introduction.mdx
Normal file
@@ -0,0 +1,47 @@
|
||||
---
|
||||
title: Introduction
|
||||
---
|
||||
|
||||
Ollama's API allows you to run and interact with models programatically.
|
||||
|
||||
## Get started
|
||||
|
||||
If you're just getting started, follow the [quickstart](/quickstart) documentation to get up and running with Ollama's API.
|
||||
|
||||
## Base URL
|
||||
|
||||
After installation, Ollama's API is served by default at:
|
||||
|
||||
```
|
||||
http://localhost:11434/api
|
||||
```
|
||||
|
||||
For running cloud models on **ollama.com**, the same API is available with the following base URL:
|
||||
|
||||
```
|
||||
https://ollama.com/api
|
||||
```
|
||||
|
||||
## Example request
|
||||
|
||||
Once Ollama is running, its API is automatically available and can be accessed via `curl`:
|
||||
|
||||
```shell
|
||||
curl http://localhost:11434/api/generate -d '{
|
||||
"model": "gemma3",
|
||||
"prompt": "Why is the sky blue?"
|
||||
}'
|
||||
```
|
||||
|
||||
## Libraries
|
||||
|
||||
Ollama has official libraries for Python and JavaScript:
|
||||
|
||||
- [Python](https://github.com/ollama/ollama-python)
|
||||
- [JavaScript](https://github.com/ollama/ollama-js)
|
||||
|
||||
Several community-maintained libraries are available for Ollama. For a full list, see the [Ollama GitHub repository](https://github.com/ollama/ollama?tab=readme-ov-file#libraries-1).
|
||||
|
||||
## Versioning
|
||||
|
||||
Ollama's API isn't strictly versioned, but the API is expected to be stable and backwards compatible. Deprecations are rare and will be announced in the [release notes](https://github.com/ollama/ollama/releases).
|
||||
431
docs/api/openai-compatibility.mdx
Normal file
431
docs/api/openai-compatibility.mdx
Normal file
File diff suppressed because one or more lines are too long
35
docs/api/streaming.mdx
Normal file
35
docs/api/streaming.mdx
Normal file
@@ -0,0 +1,35 @@
|
||||
---
|
||||
title: Streaming
|
||||
---
|
||||
|
||||
Certain API endpoints stream responses by default, such as `/api/generate`. These responses are provided in the newline-delimited JSON format (i.e. the `application/x-ndjson` content type). For example:
|
||||
|
||||
```json
|
||||
{"model":"gemma3","created_at":"2025-10-26T17:15:24.097767Z","response":"That","done":false}
|
||||
{"model":"gemma3","created_at":"2025-10-26T17:15:24.109172Z","response":"'","done":false}
|
||||
{"model":"gemma3","created_at":"2025-10-26T17:15:24.121485Z","response":"s","done":false}
|
||||
{"model":"gemma3","created_at":"2025-10-26T17:15:24.132802Z","response":" a","done":false}
|
||||
{"model":"gemma3","created_at":"2025-10-26T17:15:24.143931Z","response":" fantastic","done":false}
|
||||
{"model":"gemma3","created_at":"2025-10-26T17:15:24.155176Z","response":" question","done":false}
|
||||
{"model":"gemma3","created_at":"2025-10-26T17:15:24.166576Z","response":"!","done":true, "done_reason": "stop"}
|
||||
```
|
||||
|
||||
## Disabling streaming
|
||||
|
||||
Streaming can be disabled by providing `{"stream": false}` in the request body for any endpoint that support streaming. This will cause responses to be returned in the `application/json` format instead:
|
||||
|
||||
```json
|
||||
{"model":"gemma3","created_at":"2025-10-26T17:15:24.166576Z","response":"That's a fantastic question!","done":true}
|
||||
```
|
||||
|
||||
## When to use streaming vs non-streaming
|
||||
|
||||
**Streaming (default)**:
|
||||
- Real-time response generation
|
||||
- Lower perceived latency
|
||||
- Better for long generations
|
||||
|
||||
**Non-streaming**:
|
||||
- Simpler to process
|
||||
- Better for short responses, or structured outputs
|
||||
- Easier to handle in some applications
|
||||
36
docs/api/usage.mdx
Normal file
36
docs/api/usage.mdx
Normal file
@@ -0,0 +1,36 @@
|
||||
---
|
||||
title: Usage
|
||||
---
|
||||
|
||||
Ollama's API responses include metrics that can be used for measuring performance and model usage:
|
||||
|
||||
* `total_duration`: How long the response took to generate
|
||||
* `load_duration`: How long the model took to load
|
||||
* `prompt_eval_count`: How many input tokens were processed
|
||||
* `prompt_eval_duration`: How long it took to evaluate the prompt
|
||||
* `eval_count`: How many output tokens were processes
|
||||
* `eval_duration`: How long it took to generate the output tokens
|
||||
|
||||
All timing values are measured in nanoseconds.
|
||||
|
||||
## Example response
|
||||
|
||||
For endpoints that return usage metrics, the response body will include the usage fields. For example, a non-streaming call to `/api/generate` may return the following response:
|
||||
|
||||
```json
|
||||
{
|
||||
"model": "gemma3",
|
||||
"created_at": "2025-10-17T23:14:07.414671Z",
|
||||
"response": "Hello! How can I help you today?",
|
||||
"done": true,
|
||||
"done_reason": "stop",
|
||||
"total_duration": 174560334,
|
||||
"load_duration": 101397084,
|
||||
"prompt_eval_count": 11,
|
||||
"prompt_eval_duration": 13074791,
|
||||
"eval_count": 18,
|
||||
"eval_duration": 52479709
|
||||
}
|
||||
```
|
||||
|
||||
For endpoints that return **streaming responses**, usage fields are included as part of the final chunk, where `done` is `true`.
|
||||
Reference in New Issue
Block a user