ollama source for Momentry Core verification
23
docs/README.md
Normal file
@@ -0,0 +1,23 @@
|
||||
# Documentation
|
||||
|
||||
### Getting Started
|
||||
* [Quickstart](https://docs.ollama.com/quickstart)
|
||||
* [Examples](./examples.md)
|
||||
* [Importing models](https://docs.ollama.com/import)
|
||||
* [macOS Documentation](https://docs.ollama.com/macos)
|
||||
* [Linux Documentation](https://docs.ollama.com/linux)
|
||||
* [Windows Documentation](https://docs.ollama.com/windows)
|
||||
* [Docker Documentation](https://docs.ollama.com/docker)
|
||||
|
||||
### Reference
|
||||
|
||||
* [API Reference](https://docs.ollama.com/api)
|
||||
* [Modelfile Reference](https://docs.ollama.com/modelfile)
|
||||
* [OpenAI Compatibility](https://docs.ollama.com/api/openai-compatibility)
|
||||
* [Anthropic Compatibility](./api/anthropic-compatibility.mdx)
|
||||
|
||||
### Resources
|
||||
|
||||
* [Troubleshooting Guide](https://docs.ollama.com/troubleshooting)
|
||||
* [FAQ](https://docs.ollama.com/faq#faq)
|
||||
* [Development guide](./development.md)
|
||||
1931
docs/api.md
Normal file
421
docs/api/anthropic-compatibility.mdx
Normal file
@@ -0,0 +1,421 @@
|
||||
---
|
||||
title: Anthropic compatibility
|
||||
---
|
||||
|
||||
Ollama provides compatibility with the [Anthropic Messages API](https://docs.anthropic.com/en/api/messages) to help connect existing applications to Ollama, including tools like Claude Code.
|
||||
|
||||
## Usage
|
||||
|
||||
### Environment variables
|
||||
|
||||
To use Ollama with tools that expect the Anthropic API (like Claude Code), set these environment variables:
|
||||
|
||||
```shell
|
||||
export ANTHROPIC_AUTH_TOKEN=ollama # required but ignored
|
||||
export ANTHROPIC_BASE_URL=http://localhost:11434
|
||||
```
|
||||
|
||||
### Simple `/v1/messages` example
|
||||
|
||||
<CodeGroup dropdown>
|
||||
|
||||
```python basic.py
|
||||
import anthropic
|
||||
|
||||
client = anthropic.Anthropic(
|
||||
base_url='http://localhost:11434',
|
||||
api_key='ollama', # required but ignored
|
||||
)
|
||||
|
||||
message = client.messages.create(
|
||||
model='qwen3-coder',
|
||||
max_tokens=1024,
|
||||
messages=[
|
||||
{'role': 'user', 'content': 'Hello, how are you?'}
|
||||
]
|
||||
)
|
||||
print(message.content[0].text)
|
||||
```
|
||||
|
||||
```javascript basic.js
|
||||
import Anthropic from "@anthropic-ai/sdk";
|
||||
|
||||
const anthropic = new Anthropic({
|
||||
baseURL: "http://localhost:11434",
|
||||
apiKey: "ollama", // required but ignored
|
||||
});
|
||||
|
||||
const message = await anthropic.messages.create({
|
||||
model: "qwen3-coder",
|
||||
max_tokens: 1024,
|
||||
messages: [{ role: "user", content: "Hello, how are you?" }],
|
||||
});
|
||||
|
||||
console.log(message.content[0].text);
|
||||
```
|
||||
|
||||
```shell basic.sh
|
||||
curl -X POST http://localhost:11434/v1/messages \
|
||||
-H "Content-Type: application/json" \
|
||||
-H "x-api-key: ollama" \
|
||||
-H "anthropic-version: 2023-06-01" \
|
||||
-d '{
|
||||
"model": "qwen3-coder",
|
||||
"max_tokens": 1024,
|
||||
"messages": [{ "role": "user", "content": "Hello, how are you?" }]
|
||||
}'
|
||||
```
|
||||
|
||||
</CodeGroup>
|
||||
|
||||
### Streaming example
|
||||
|
||||
<CodeGroup dropdown>
|
||||
|
||||
```python streaming.py
import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',  # required but ignored
)

with client.messages.stream(
    model='qwen3-coder',
    max_tokens=1024,
    messages=[{'role': 'user', 'content': 'Count from 1 to 10'}]
) as stream:
    # print each text fragment as soon as it arrives
    for text in stream.text_stream:
        print(text, end='', flush=True)
```
|
||||
|
||||
```javascript streaming.js
|
||||
import Anthropic from "@anthropic-ai/sdk";
|
||||
|
||||
const anthropic = new Anthropic({
|
||||
baseURL: "http://localhost:11434",
|
||||
apiKey: "ollama",
|
||||
});
|
||||
|
||||
const stream = await anthropic.messages.stream({
|
||||
model: "qwen3-coder",
|
||||
max_tokens: 1024,
|
||||
messages: [{ role: "user", content: "Count from 1 to 10" }],
|
||||
});
|
||||
|
||||
for await (const event of stream) {
|
||||
if (
|
||||
event.type === "content_block_delta" &&
|
||||
event.delta.type === "text_delta"
|
||||
) {
|
||||
process.stdout.write(event.delta.text);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
```shell streaming.sh
|
||||
curl -X POST http://localhost:11434/v1/messages \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"model": "qwen3-coder",
|
||||
"max_tokens": 1024,
|
||||
"stream": true,
|
||||
"messages": [{ "role": "user", "content": "Count from 1 to 10" }]
|
||||
}'
|
||||
```
|
||||
|
||||
</CodeGroup>
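
When `/v1/messages` is called directly with `"stream": true` (as in `streaming.sh`) rather than through an SDK, the client has to pick the text out of the event stream itself. The following is a minimal sketch, assuming the response uses the Anthropic server-sent-event framing (`event:` and `data:` lines) and the `content_block_delta` / `text_delta` event shapes. `collect_text` is a hypothetical helper name, and the sample lines are illustrative rather than captured server output.

```python
import json
from typing import Iterable


def collect_text(sse_lines: Iterable[str]) -> str:
    """Concatenate the text carried by text_delta events in an SSE stream."""
    parts = []
    for line in sse_lines:
        line = line.strip()
        # Only "data:" lines carry JSON payloads; "event:" lines and blanks are skipped.
        if not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):])
        if event.get("type") != "content_block_delta":
            continue
        delta = event.get("delta", {})
        if delta.get("type") == "text_delta":
            parts.append(delta.get("text", ""))
    return "".join(parts)


# Illustrative input (assumed shape), not real server output.
sample = [
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "1, "}}',
    '',
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "2"}}',
    '',
    'event: message_stop',
    'data: {"type": "message_stop"}',
]
print(collect_text(sample))
```

Events other than `text_delta` (for example `input_json_delta` or `thinking_delta`) are ignored here; a fuller client would accumulate those separately.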
|
||||
|
||||
### Tool calling example
|
||||
|
||||
<CodeGroup dropdown>
|
||||
|
||||
```python tools.py
import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',  # required but ignored
)

message = client.messages.create(
    model='qwen3-coder',
    max_tokens=1024,
    tools=[
        {
            'name': 'get_weather',
            'description': 'Get the current weather in a location',
            'input_schema': {
                'type': 'object',
                'properties': {
                    'location': {
                        'type': 'string',
                        'description': 'The city and state, e.g. San Francisco, CA'
                    }
                },
                'required': ['location']
            }
        }
    ],
    messages=[{'role': 'user', 'content': "What's the weather in San Francisco?"}]
)

# print every tool call the model requested
for block in message.content:
    if block.type == 'tool_use':
        print(f'Tool: {block.name}')
        print(f'Input: {block.input}')
```
|
||||
|
||||
```javascript tools.js
|
||||
import Anthropic from "@anthropic-ai/sdk";
|
||||
|
||||
const anthropic = new Anthropic({
|
||||
baseURL: "http://localhost:11434",
|
||||
apiKey: "ollama",
|
||||
});
|
||||
|
||||
const message = await anthropic.messages.create({
|
||||
model: "qwen3-coder",
|
||||
max_tokens: 1024,
|
||||
tools: [
|
||||
{
|
||||
name: "get_weather",
|
||||
description: "Get the current weather in a location",
|
||||
input_schema: {
|
||||
type: "object",
|
||||
properties: {
|
||||
location: {
|
||||
type: "string",
|
||||
description: "The city and state, e.g. San Francisco, CA",
|
||||
},
|
||||
},
|
||||
required: ["location"],
|
||||
},
|
||||
},
|
||||
],
|
||||
messages: [{ role: "user", content: "What's the weather in San Francisco?" }],
|
||||
});
|
||||
|
||||
for (const block of message.content) {
|
||||
if (block.type === "tool_use") {
|
||||
console.log("Tool:", block.name);
|
||||
console.log("Input:", block.input);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
```shell tools.sh
|
||||
curl -X POST http://localhost:11434/v1/messages \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"model": "qwen3-coder",
|
||||
"max_tokens": 1024,
|
||||
"tools": [
|
||||
{
|
||||
"name": "get_weather",
|
||||
"description": "Get the current weather in a location",
|
||||
"input_schema": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"location": {
|
||||
"type": "string",
|
||||
"description": "The city and state"
|
||||
}
|
||||
},
|
||||
"required": ["location"]
|
||||
}
|
||||
}
|
||||
],
|
||||
"messages": [{ "role": "user", "content": "What is the weather in San Francisco?" }]
|
||||
}'
|
||||
```
|
||||
|
||||
</CodeGroup>
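
Once the model returns a `tool_use` block, the tool's output has to be sent back in a follow-up request. The following is a minimal sketch of building that follow-up message list, assuming the Anthropic Messages format in which a `tool_result` block references the original call through `tool_use_id`. `build_tool_result_messages`, the `toolu_example` id, and the weather string are hypothetical values chosen for illustration.

```python
from typing import Any


def build_tool_result_messages(
    history: list[dict[str, Any]],
    tool_use_block: dict[str, Any],
    result_text: str,
) -> list[dict[str, Any]]:
    """Return a new message list that feeds a tool's output back to the model."""
    return history + [
        # Echo the assistant turn that requested the tool call.
        {"role": "assistant", "content": [tool_use_block]},
        # Answer it with a tool_result block that references the same id.
        {
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": tool_use_block["id"],
                    "content": result_text,
                }
            ],
        },
    ]


history = [{"role": "user", "content": "What's the weather in San Francisco?"}]
# Hypothetical tool_use block, shaped like the blocks printed by tools.py.
tool_use = {
    "type": "tool_use",
    "id": "toolu_example",
    "name": "get_weather",
    "input": {"location": "San Francisco, CA"},
}
messages = build_tool_result_messages(history, tool_use, "Foggy, 18 degrees Celsius")
print(messages[-1]["content"][0]["tool_use_id"])
```

The resulting `messages` list would be passed as the `messages` field of the next `/v1/messages` request, together with the same `tools` definition.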
|
||||
|
||||
## Using with Claude Code
|
||||
|
||||
[Claude Code](https://code.claude.com/docs/en/overview) can be configured to use Ollama as its backend.
|
||||
|
||||
### Recommended models
|
||||
|
||||
For coding use cases, models like `glm-4.7`, `minimax-m2.1`, and `qwen3-coder` are recommended.
|
||||
|
||||
Download a model before use:
|
||||
|
||||
```shell
|
||||
ollama pull qwen3-coder
|
||||
```
|
||||
> Note: Qwen 3 coder is a 30B parameter model requiring at least 24GB of VRAM to run smoothly. More is required for longer context lengths.
|
||||
|
||||
```shell
|
||||
ollama pull glm-4.7:cloud
|
||||
```
|
||||
|
||||
### Quick setup
|
||||
|
||||
```shell
|
||||
ollama launch claude
|
||||
```
|
||||
|
||||
This will prompt you to select a model, configure Claude Code automatically, and launch it. To configure without launching:
|
||||
|
||||
```shell
|
||||
ollama launch claude --config
|
||||
```
|
||||
|
||||
### Manual setup
|
||||
|
||||
Set the environment variables and run Claude Code:
|
||||
|
||||
```shell
|
||||
ANTHROPIC_AUTH_TOKEN=ollama ANTHROPIC_BASE_URL=http://localhost:11434 claude --model qwen3-coder
|
||||
```
|
||||
|
||||
Or set the environment variables in your shell profile:
|
||||
|
||||
```shell
|
||||
export ANTHROPIC_AUTH_TOKEN=ollama
|
||||
export ANTHROPIC_BASE_URL=http://localhost:11434
|
||||
```
|
||||
|
||||
Then run Claude Code with any Ollama model:
|
||||
|
||||
```shell
|
||||
claude --model qwen3-coder
|
||||
```
|
||||
|
||||
## Endpoints
|
||||
|
||||
### `/v1/messages`
|
||||
|
||||
#### Supported features
|
||||
|
||||
- [x] Messages
|
||||
- [x] Streaming
|
||||
- [x] System prompts
|
||||
- [x] Multi-turn conversations
|
||||
- [x] Vision (images)
|
||||
- [x] Tools (function calling)
|
||||
- [x] Tool results
|
||||
- [x] Thinking/extended thinking
|
||||
|
||||
#### Supported request fields
|
||||
|
||||
- [x] `model`
- [x] `max_tokens`
- [x] `messages`
  - [x] Text `content`
  - [x] Image `content` (base64)
  - [x] Array of content blocks
  - [x] `tool_use` blocks
  - [x] `tool_result` blocks
  - [x] `thinking` blocks
- [x] `system` (string or array)
- [x] `stream`
- [x] `temperature`
- [x] `top_p`
- [x] `top_k`
- [x] `stop_sequences`
- [x] `tools`
- [x] `thinking`
- [ ] `tool_choice`
- [ ] `metadata`
|
||||
|
||||
#### Supported response fields
|
||||
|
||||
- [x] `id`
|
||||
- [x] `type`
|
||||
- [x] `role`
|
||||
- [x] `model`
|
||||
- [x] `content` (text, tool_use, thinking blocks)
|
||||
- [x] `stop_reason` (end_turn, max_tokens, tool_use)
|
||||
- [x] `usage` (input_tokens, output_tokens)
|
||||
|
||||
#### Streaming events
|
||||
|
||||
- [x] `message_start`
|
||||
- [x] `content_block_start`
|
||||
- [x] `content_block_delta` (text_delta, input_json_delta, thinking_delta)
|
||||
- [x] `content_block_stop`
|
||||
- [x] `message_delta`
|
||||
- [x] `message_stop`
|
||||
- [x] `ping`
|
||||
- [x] `error`
|
||||
|
||||
## Models
|
||||
|
||||
Ollama supports both local and cloud models.
|
||||
|
||||
### Local models
|
||||
|
||||
Pull a local model before use:
|
||||
|
||||
```shell
|
||||
ollama pull qwen3-coder
|
||||
```
|
||||
|
||||
Recommended local models:
|
||||
- `qwen3-coder` - Excellent for coding tasks
|
||||
- `gpt-oss:20b` - Strong general-purpose model
|
||||
|
||||
### Cloud models
|
||||
|
||||
Cloud models are available immediately without pulling:
|
||||
|
||||
- `glm-4.7:cloud` - High-performance cloud model
|
||||
- `minimax-m2.1:cloud` - Fast cloud model
|
||||
|
||||
### Default model names
|
||||
|
||||
For tooling that relies on default Anthropic model names such as `claude-3-5-sonnet`, use `ollama cp` to copy an existing model name:
|
||||
|
||||
```shell
|
||||
ollama cp qwen3-coder claude-3-5-sonnet
|
||||
```
|
||||
|
||||
Afterwards, this new model name can be specified in the `model` field:
|
||||
|
||||
```shell
|
||||
curl http://localhost:11434/v1/messages \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"model": "claude-3-5-sonnet",
|
||||
"max_tokens": 1024,
|
||||
"messages": [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "Hello!"
|
||||
}
|
||||
]
|
||||
}'
|
||||
```
|
||||
|
||||
## Differences from the Anthropic API
|
||||
|
||||
### Behavior differences
|
||||
|
||||
- API key is accepted but not validated
|
||||
- `anthropic-version` header is accepted but not used
|
||||
- Token counts are approximations based on the underlying model's tokenizer
|
||||
|
||||
### Not supported
|
||||
|
||||
The following Anthropic API features are not currently supported:
|
||||
|
||||
| Feature | Description |
|---------|-------------|
| `/v1/messages/count_tokens` | Token counting endpoint |
| `tool_choice` | Forcing specific tool use or disabling tools |
| `metadata` | Request metadata (user_id) |
| Prompt caching | `cache_control` blocks for caching prefixes |
| Batches API | `/v1/messages/batches` for async batch processing |
| Citations | `citations` content blocks |
| PDF support | `document` content blocks with PDF files |
| Server-sent errors | `error` events during streaming (errors return HTTP status) |
|
||||
|
||||
### Partial support
|
||||
|
||||
| Feature | Status |
|---------|--------|
| Image content | Base64 images supported; URL images not supported |
| Extended thinking | Basic support; `budget_tokens` accepted but not enforced |
|
||||
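
Because only base64 images are accepted, an image has to be encoded before it is placed in a message. The following is a minimal sketch, assuming the Anthropic Messages image block shape (`type: "image"` with a `source` of type `base64`). `image_block` is a hypothetical helper, and the placeholder bytes stand in for the contents of a real image file.

```python
import base64


def image_block(image_bytes: bytes, media_type: str = "image/png") -> dict:
    """Wrap raw image bytes in a base64 image content block."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        },
    }


# Placeholder bytes stand in for a real image file's contents.
block = image_block(b"not-a-real-image")
message = {
    "role": "user",
    "content": [block, {"type": "text", "text": "Describe this image."}],
}
print(block["source"]["media_type"])
```

In practice the bytes would come from something like `open(path, "rb").read()`, and `media_type` should match the actual file format.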
63
docs/api/authentication.mdx
Normal file
@@ -0,0 +1,63 @@
|
||||
---
|
||||
title: Authentication
|
||||
---
|
||||
|
||||
No authentication is required when accessing Ollama's API locally via `http://localhost:11434`.
|
||||
|
||||
Authentication is required for the following:
|
||||
|
||||
* Running cloud models via ollama.com
|
||||
* Publishing models
|
||||
* Downloading private models
|
||||
|
||||
Ollama supports two authentication methods:
|
||||
|
||||
* **Signing in**: sign in from your local installation, and Ollama will automatically take care of authenticating requests to ollama.com when running commands
|
||||
* **API keys**: API keys for programmatic access to ollama.com's API
|
||||
|
||||
## Signing in
|
||||
|
||||
To sign in to ollama.com from your local installation of Ollama, run:
|
||||
|
||||
```
|
||||
ollama signin
|
||||
```
|
||||
|
||||
Once signed in, Ollama will automatically authenticate commands as required:
|
||||
|
||||
```
|
||||
ollama run gpt-oss:120b-cloud
|
||||
```
|
||||
|
||||
Similarly, when accessing a local API endpoint that requires cloud access, Ollama will automatically authenticate the request:
|
||||
|
||||
```shell
|
||||
curl http://localhost:11434/api/generate -d '{
|
||||
"model": "gpt-oss:120b-cloud",
|
||||
"prompt": "Why is the sky blue?"
|
||||
}'
|
||||
```
|
||||
|
||||
## API keys
|
||||
|
||||
For direct access to ollama.com's API served at `https://ollama.com/api`, authentication via API keys is required.
|
||||
|
||||
First, create an [API key](https://ollama.com/settings/keys), then set the `OLLAMA_API_KEY` environment variable:
|
||||
|
||||
```shell
|
||||
export OLLAMA_API_KEY=your_api_key
|
||||
```
|
||||
|
||||
Then use the API key in the Authorization header:
|
||||
|
||||
```shell
|
||||
curl https://ollama.com/api/generate \
|
||||
-H "Authorization: Bearer $OLLAMA_API_KEY" \
|
||||
-d '{
|
||||
"model": "gpt-oss:120b",
|
||||
"prompt": "Why is the sky blue?",
|
||||
"stream": false
|
||||
}'
|
||||
```
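
The same authenticated call can be sketched in Python using only the standard library. This is a minimal illustration, assuming the `OLLAMA_API_KEY` environment variable described above. `build_generate_request` is a hypothetical helper, and the `Content-Type` header is an addition that the curl example does not send. The request is built but not sent.

```python
import json
import os
import urllib.request


def build_generate_request(prompt: str, model: str = "gpt-oss:120b") -> urllib.request.Request:
    """Build (but do not send) an authenticated POST to ollama.com's /api/generate."""
    api_key = os.environ.get("OLLAMA_API_KEY", "")
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        "https://ollama.com/api/generate",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_generate_request("Why is the sky blue?")
# To send it: urllib.request.urlopen(request), then json.load() the response.
print(request.full_url)
```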
|
||||
|
||||
API keys don't currently expire; however, you can revoke them at any time in your [API keys settings](https://ollama.com/settings/keys).
|
||||
36
docs/api/errors.mdx
Normal file
@@ -0,0 +1,36 @@
|
||||
---
|
||||
title: Errors
|
||||
---
|
||||
|
||||
## Status codes
|
||||
|
||||
Endpoints return appropriate HTTP status codes based on the success or failure of the request in the HTTP status line (e.g. `HTTP/1.1 200 OK` or `HTTP/1.1 400 Bad Request`). Common status codes are:
|
||||
|
||||
- `200`: Success
|
||||
- `400`: Bad Request (missing parameters, invalid JSON, etc.)
|
||||
- `404`: Not Found (model doesn't exist, etc.)
|
||||
- `429`: Too Many Requests (e.g. when a rate limit is exceeded)
|
||||
- `500`: Internal Server Error
|
||||
- `502`: Bad Gateway (e.g. when a cloud model cannot be reached)
|
||||
|
||||
## Error messages
|
||||
|
||||
Errors are returned in the `application/json` format with the following structure, with the error message in the `error` property:
|
||||
|
||||
```json
|
||||
{
|
||||
"error": "the model failed to generate a response"
|
||||
}
|
||||
```
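
A client can combine the status code and the `error` property to produce a readable failure message. The following is a minimal sketch under that assumption; `error_message` is a hypothetical helper, and falling back to the raw body when it is not JSON is a design choice of this sketch, not documented behavior.

```python
import json


def error_message(status: int, body: str):
    """Return the `error` string for a failed response, or None on success."""
    if 200 <= status < 300:
        return None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Fall back to the raw body if it is not valid JSON.
        return body
    if isinstance(payload, dict) and "error" in payload:
        return str(payload["error"])
    return body


print(error_message(500, '{"error": "the model failed to generate a response"}'))
```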
|
||||
|
||||
## Errors that occur while streaming
|
||||
|
||||
If an error occurs mid-stream, the error will be returned as an object in the `application/x-ndjson` format with an `error` property. Since the response has already started, the status code of the response will not be changed.
|
||||
|
||||
```json
|
||||
{"model":"gemma3","created_at":"2025-10-26T17:21:21.196249Z","response":" Yes","done":false}
|
||||
{"model":"gemma3","created_at":"2025-10-26T17:21:21.207235Z","response":".","done":false}
|
||||
{"model":"gemma3","created_at":"2025-10-26T17:21:21.219166Z","response":"I","done":false}
|
||||
{"model":"gemma3","created_at":"2025-10-26T17:21:21.231094Z","response":"can","done":false}
|
||||
{"error":"an error was encountered while running the model"}
|
||||
```
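
Since the status code cannot signal a mid-stream failure, a client needs to inspect every chunk for an `error` property. The following is a minimal sketch of that check; `StreamError` and `iter_chunks` are hypothetical names, and the input lines are shortened versions of the example above.

```python
import json
from typing import Iterable, Iterator


class StreamError(RuntimeError):
    """Raised when a streamed chunk carries an `error` property."""


def iter_chunks(ndjson_lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed chunks, raising StreamError on a mid-stream error object."""
    for line in ndjson_lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        if "error" in chunk:
            raise StreamError(chunk["error"])
        yield chunk


lines = [
    '{"model":"gemma3","response":" Yes","done":false}',
    '{"error":"an error was encountered while running the model"}',
]
received = []
try:
    for chunk in iter_chunks(lines):
        received.append(chunk["response"])
except StreamError as exc:
    print(f"stream failed after {len(received)} chunk(s): {exc}")
```

Text received before the error is kept in `received`, so the caller can decide whether to show the partial output or discard it.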
|
||||
47
docs/api/introduction.mdx
Normal file
@@ -0,0 +1,47 @@
|
||||
---
|
||||
title: Introduction
|
||||
---
|
||||
|
||||
Ollama's API allows you to run and interact with models programmatically.
|
||||
|
||||
## Get started
|
||||
|
||||
If you're just getting started, follow the [quickstart](/quickstart) documentation to get up and running with Ollama's API.
|
||||
|
||||
## Base URL
|
||||
|
||||
After installation, Ollama's API is served by default at:
|
||||
|
||||
```
|
||||
http://localhost:11434/api
|
||||
```
|
||||
|
||||
For running cloud models on **ollama.com**, the same API is available with the following base URL:
|
||||
|
||||
```
|
||||
https://ollama.com/api
|
||||
```
|
||||
|
||||
## Example request
|
||||
|
||||
Once Ollama is running, its API is automatically available and can be accessed via `curl`:
|
||||
|
||||
```shell
|
||||
curl http://localhost:11434/api/generate -d '{
|
||||
"model": "gemma3",
|
||||
"prompt": "Why is the sky blue?"
|
||||
}'
|
||||
```
|
||||
|
||||
## Libraries
|
||||
|
||||
Ollama has official libraries for Python and JavaScript:
|
||||
|
||||
- [Python](https://github.com/ollama/ollama-python)
|
||||
- [JavaScript](https://github.com/ollama/ollama-js)
|
||||
|
||||
Several community-maintained libraries are available for Ollama. For a full list, see the [Ollama GitHub repository](https://github.com/ollama/ollama?tab=readme-ov-file#libraries-1).
|
||||
|
||||
## Versioning
|
||||
|
||||
Ollama's API isn't strictly versioned, but the API is expected to be stable and backwards compatible. Deprecations are rare and will be announced in the [release notes](https://github.com/ollama/ollama/releases).
|
||||
431
docs/api/openai-compatibility.mdx
Normal file
35
docs/api/streaming.mdx
Normal file
@@ -0,0 +1,35 @@
|
||||
---
|
||||
title: Streaming
|
||||
---
|
||||
|
||||
Certain API endpoints stream responses by default, such as `/api/generate`. These responses are provided in the newline-delimited JSON format (i.e. the `application/x-ndjson` content type). For example:
|
||||
|
||||
```json
|
||||
{"model":"gemma3","created_at":"2025-10-26T17:15:24.097767Z","response":"That","done":false}
|
||||
{"model":"gemma3","created_at":"2025-10-26T17:15:24.109172Z","response":"'","done":false}
|
||||
{"model":"gemma3","created_at":"2025-10-26T17:15:24.121485Z","response":"s","done":false}
|
||||
{"model":"gemma3","created_at":"2025-10-26T17:15:24.132802Z","response":" a","done":false}
|
||||
{"model":"gemma3","created_at":"2025-10-26T17:15:24.143931Z","response":" fantastic","done":false}
|
||||
{"model":"gemma3","created_at":"2025-10-26T17:15:24.155176Z","response":" question","done":false}
|
||||
{"model":"gemma3","created_at":"2025-10-26T17:15:24.166576Z","response":"!","done":true, "done_reason": "stop"}
|
||||
```
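
To rebuild the full text from such a stream, a client concatenates the `response` field of each chunk until it sees `"done": true`. The following is a minimal sketch of that accumulation; `join_response` is a hypothetical helper, and the input reuses the chunks above with `created_at` omitted for brevity.

```python
import json


def join_response(ndjson_text: str) -> str:
    """Concatenate the `response` field of every chunk in an NDJSON body."""
    pieces = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        chunk = json.loads(line)
        pieces.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final chunk has "done": true
    return "".join(pieces)


body = "\n".join([
    '{"model":"gemma3","response":"That","done":false}',
    '{"model":"gemma3","response":"\'","done":false}',
    '{"model":"gemma3","response":"s","done":false}',
    '{"model":"gemma3","response":" a","done":false}',
    '{"model":"gemma3","response":" fantastic","done":false}',
    '{"model":"gemma3","response":" question","done":false}',
    '{"model":"gemma3","response":"!","done":true,"done_reason":"stop"}',
])
print(join_response(body))
```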
|
||||
|
||||
## Disabling streaming
|
||||
|
||||
Streaming can be disabled by providing `{"stream": false}` in the request body for any endpoint that supports streaming. This will cause responses to be returned in the `application/json` format instead:
|
||||
|
||||
```json
|
||||
{"model":"gemma3","created_at":"2025-10-26T17:15:24.166576Z","response":"That's a fantastic question!","done":true}
|
||||
```
|
||||
|
||||
## When to use streaming vs non-streaming
|
||||
|
||||
**Streaming (default)**:
|
||||
- Real-time response generation
|
||||
- Lower perceived latency
|
||||
- Better for long generations
|
||||
|
||||
**Non-streaming**:
|
||||
- Simpler to process
|
||||
- Better for short responses, or structured outputs
|
||||
- Easier to handle in some applications
|
||||
36
docs/api/usage.mdx
Normal file
@@ -0,0 +1,36 @@
|
||||
---
|
||||
title: Usage
|
||||
---
|
||||
|
||||
Ollama's API responses include metrics that can be used for measuring performance and model usage:
|
||||
|
||||
* `total_duration`: How long the response took to generate
|
||||
* `load_duration`: How long the model took to load
|
||||
* `prompt_eval_count`: How many input tokens were processed
|
||||
* `prompt_eval_duration`: How long it took to evaluate the prompt
|
||||
* `eval_count`: How many output tokens were generated
|
||||
* `eval_duration`: How long it took to generate the output tokens
|
||||
|
||||
All timing values are measured in nanoseconds.
|
||||
|
||||
## Example response
|
||||
|
||||
For endpoints that return usage metrics, the response body will include the usage fields. For example, a non-streaming call to `/api/generate` may return the following response:
|
||||
|
||||
```json
|
||||
{
|
||||
"model": "gemma3",
|
||||
"created_at": "2025-10-17T23:14:07.414671Z",
|
||||
"response": "Hello! How can I help you today?",
|
||||
"done": true,
|
||||
"done_reason": "stop",
|
||||
"total_duration": 174560334,
|
||||
"load_duration": 101397084,
|
||||
"prompt_eval_count": 11,
|
||||
"prompt_eval_duration": 13074791,
|
||||
"eval_count": 18,
|
||||
"eval_duration": 52479709
|
||||
}
|
||||
```
|
||||
|
||||
For endpoints that return **streaming responses**, usage fields are included as part of the final chunk, where `done` is `true`.
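
Because the durations are in nanoseconds, throughput in tokens per second follows from dividing a count by its duration converted to seconds. The following is a minimal sketch using the values from the example response; `tokens_per_second` is a hypothetical helper name.

```python
def tokens_per_second(count: int, duration_ns: int) -> float:
    """Convert a token count and a nanosecond duration into tokens/second."""
    if duration_ns <= 0:
        return 0.0
    return count / (duration_ns / 1_000_000_000)


# Values copied from the example response.
usage = {
    "prompt_eval_count": 11,
    "prompt_eval_duration": 13074791,
    "eval_count": 18,
    "eval_duration": 52479709,
}
prompt_rate = tokens_per_second(usage["prompt_eval_count"], usage["prompt_eval_duration"])
output_rate = tokens_per_second(usage["eval_count"], usage["eval_duration"])
print(f"prompt: {prompt_rate:.0f} tok/s, output: {output_rate:.0f} tok/s")
```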
|
||||
127
docs/capabilities/embeddings.mdx
Normal file
@@ -0,0 +1,127 @@
|
||||
---
|
||||
title: Embeddings
|
||||
description: Generate text embeddings for semantic search, retrieval, and RAG.
|
||||
---
|
||||
|
||||
Embeddings turn text into numeric vectors you can store in a vector database, search with cosine similarity, or use in RAG pipelines. The vector length depends on the model (typically 384–1024 dimensions).
|
||||
|
||||
## Recommended models
|
||||
|
||||
- [embeddinggemma](https://ollama.com/library/embeddinggemma)
|
||||
- [qwen3-embedding](https://ollama.com/library/qwen3-embedding)
|
||||
- [all-minilm](https://ollama.com/library/all-minilm)
|
||||
|
||||
## Generate embeddings
|
||||
|
||||
<Tabs>
|
||||
<Tab title="CLI">
|
||||
Generate embeddings directly from the command line:
|
||||
|
||||
```shell
|
||||
ollama run embeddinggemma "Hello world"
|
||||
```
|
||||
|
||||
You can also pipe text to generate embeddings:
|
||||
|
||||
```shell
|
||||
echo "Hello world" | ollama run embeddinggemma
|
||||
```
|
||||
|
||||
Output is a JSON array.
|
||||
|
||||
</Tab>
|
||||
<Tab title="cURL">
|
||||
```shell
|
||||
curl -X POST http://localhost:11434/api/embed \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"model": "embeddinggemma",
|
||||
"input": "The quick brown fox jumps over the lazy dog."
|
||||
}'
|
||||
```
|
||||
</Tab>
|
||||
<Tab title="Python">
|
||||
```python
|
||||
import ollama
|
||||
|
||||
single = ollama.embed(
|
||||
model='embeddinggemma',
|
||||
input='The quick brown fox jumps over the lazy dog.'
|
||||
)
|
||||
print(len(single['embeddings'][0])) # vector length
|
||||
```
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
```javascript
|
||||
import ollama from 'ollama'
|
||||
|
||||
const single = await ollama.embed({
|
||||
model: 'embeddinggemma',
|
||||
input: 'The quick brown fox jumps over the lazy dog.',
|
||||
})
|
||||
console.log(single.embeddings[0].length) // vector length
|
||||
```
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
<Note>
|
||||
The `/api/embed` endpoint returns L2‑normalized (unit‑length) vectors.
|
||||
</Note>
|
||||
|
||||
## Generate a batch of embeddings
|
||||
|
||||
Pass an array of strings to `input`.
|
||||
|
||||
<Tabs>
|
||||
<Tab title="cURL">
|
||||
```shell
|
||||
curl -X POST http://localhost:11434/api/embed \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"model": "embeddinggemma",
|
||||
"input": [
|
||||
"First sentence",
|
||||
"Second sentence",
|
||||
"Third sentence"
|
||||
]
|
||||
}'
|
||||
```
|
||||
</Tab>
|
||||
<Tab title="Python">
|
||||
```python
|
||||
import ollama
|
||||
|
||||
batch = ollama.embed(
|
||||
model='embeddinggemma',
|
||||
input=[
|
||||
'The quick brown fox jumps over the lazy dog.',
|
||||
'The five boxing wizards jump quickly.',
|
||||
'Jackdaws love my big sphinx of quartz.',
|
||||
]
|
||||
)
|
||||
print(len(batch['embeddings'])) # number of vectors
|
||||
```
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
```javascript
|
||||
import ollama from 'ollama'
|
||||
|
||||
const batch = await ollama.embed({
|
||||
model: 'embeddinggemma',
|
||||
input: [
|
||||
'The quick brown fox jumps over the lazy dog.',
|
||||
'The five boxing wizards jump quickly.',
|
||||
'Jackdaws love my big sphinx of quartz.',
|
||||
],
|
||||
})
|
||||
console.log(batch.embeddings.length) // number of vectors
|
||||
```
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
## Tips
|
||||
|
||||
- Use cosine similarity for most semantic search use cases.
|
||||
- Use the same embedding model for both indexing and querying.
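
Cosine similarity can be computed with the standard library alone. The following is a minimal sketch, with `cosine_similarity` and `rank` as hypothetical helper names and toy 2-dimensional vectors standing in for real embeddings. Since `/api/embed` returns unit-length vectors, the dot product alone would give the same ranking; the sketch normalizes anyway so it also works for vectors from other sources.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: list[float], documents: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Sort documents by similarity to the query, most similar first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in documents.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy vectors; real ones come from the `embeddings` field of an /api/embed response.
docs = {"a": [1.0, 0.0], "b": [0.6, 0.8], "c": [0.0, 1.0]}
print(rank([1.0, 0.0], docs))
```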
|
||||
|
||||
|
||||
99
docs/capabilities/streaming.mdx
Normal file
@@ -0,0 +1,99 @@
|
||||
---
|
||||
title: Streaming
|
||||
---
|
||||
|
||||
Streaming allows you to render text as it is produced by the model.
|
||||
|
||||
Streaming is enabled by default through the REST API, but disabled by default in the SDKs.
|
||||
|
||||
To enable streaming in the SDKs, set the `stream` parameter to `True` in Python or `true` in JavaScript.
|
||||
|
||||
## Key streaming concepts
|
||||
1. Chatting: Stream partial assistant messages. Each chunk includes the `content` so you can render messages as they arrive.
|
||||
1. Thinking: Thinking-capable models emit a `thinking` field alongside regular content in each chunk. Detect this field in streaming chunks to show or hide reasoning traces before the final answer arrives.
|
||||
1. Tool calling: Watch for streamed `tool_calls` in each chunk, execute the requested tool, and append tool outputs back into the conversation.
|
||||
|
||||
## Handling streamed chunks
|
||||
|
||||
|
||||
<Note> It is necessary to accumulate the partial fields in order to maintain the history of the conversation. This is particularly important for tool calling where the thinking, tool call from the model, and the executed tool result must be passed back to the model in the next request. </Note>
|
||||
|
||||
<Tabs>
|
||||
<Tab title="Python">
|
||||
|
||||
```python
from ollama import chat

stream = chat(
    model='qwen3',
    messages=[{'role': 'user', 'content': 'What is 17 × 23?'}],
    stream=True,
)

in_thinking = False
content = ''
thinking = ''
for chunk in stream:
    if chunk.message.thinking:
        if not in_thinking:
            in_thinking = True
            print('Thinking:\n', end='', flush=True)
        print(chunk.message.thinking, end='', flush=True)
        # accumulate the partial thinking
        thinking += chunk.message.thinking
    elif chunk.message.content:
        if in_thinking:
            in_thinking = False
            print('\n\nAnswer:\n', end='', flush=True)
        print(chunk.message.content, end='', flush=True)
        # accumulate the partial content
        content += chunk.message.content

# append the accumulated fields to the messages for the next request
new_messages = [{'role': 'assistant', 'thinking': thinking, 'content': content}]
```
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
```javascript
import ollama from 'ollama'

async function main() {
  const stream = await ollama.chat({
    model: 'qwen3',
    messages: [{ role: 'user', content: 'What is 17 × 23?' }],
    stream: true,
  })

  let inThinking = false
  let content = ''
  let thinking = ''

  for await (const chunk of stream) {
    if (chunk.message.thinking) {
      if (!inThinking) {
        inThinking = true
        process.stdout.write('Thinking:\n')
      }
      process.stdout.write(chunk.message.thinking)
      // accumulate the partial thinking
      thinking += chunk.message.thinking
    } else if (chunk.message.content) {
      if (inThinking) {
        inThinking = false
        process.stdout.write('\n\nAnswer:\n')
      }
      process.stdout.write(chunk.message.content)
      // accumulate the partial content
      content += chunk.message.content
    }
  }

  // append the accumulated fields to the messages for the next request
  const newMessages = [{ role: 'assistant', thinking: thinking, content: content }]
}

main().catch(console.error)
```
|
||||
</Tab>
|
||||
</Tabs>
|
||||
198
docs/capabilities/structured-outputs.mdx
Normal file
@@ -0,0 +1,198 @@
|
||||
---
|
||||
title: Structured Outputs
|
||||
---
|
||||
|
||||
<Note>
|
||||
Ollama's Cloud currently does not support structured outputs.
|
||||
</Note>
|
||||
|
||||
Structured outputs let you enforce a JSON schema on model responses so you can reliably extract structured data, describe images, or keep every reply consistent.
|
||||
|
||||
## Generating structured JSON
|
||||
|
||||
<Tabs>
|
||||
<Tab title="cURL">
|
||||
```shell
|
||||
curl -X POST http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{
|
||||
"model": "gpt-oss",
|
||||
"messages": [{"role": "user", "content": "Tell me about Canada in one line"}],
|
||||
"stream": false,
|
||||
"format": "json"
|
||||
}'
|
||||
```
|
||||
</Tab>
|
||||
<Tab title="Python">
|
||||
```python
|
||||
from ollama import chat
|
||||
|
||||
response = chat(
|
||||
model='gpt-oss',
|
||||
messages=[{'role': 'user', 'content': 'Tell me about Canada.'}],
|
||||
format='json'
|
||||
)
|
||||
print(response.message.content)
|
||||
```
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
```javascript
|
||||
import ollama from 'ollama'
|
||||
|
||||
const response = await ollama.chat({
|
||||
model: 'gpt-oss',
|
||||
messages: [{ role: 'user', content: 'Tell me about Canada.' }],
|
||||
format: 'json'
|
||||
})
|
||||
console.log(response.message.content)
|
||||
```
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
## Generating structured JSON with a schema
|
||||
|
||||
Provide a JSON schema to the `format` field.
|
||||
|
||||
<Note>
|
||||
It is ideal to also pass the JSON schema as a string in the prompt to ground the model's response.
|
||||
</Note>
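
One way to follow this advice is to serialize the schema with `json.dumps` and append it to the user message while still passing the schema object to `format`. The following is a minimal sketch that only builds the request body; `grounded_prompt` is a hypothetical helper, and the instruction wording is an assumption rather than a required phrasing.

```python
import json

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "capital": {"type": "string"},
        "languages": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "capital", "languages"],
}


def grounded_prompt(question: str, schema: dict) -> str:
    """Append the JSON schema to the user prompt as a string."""
    return (
        f"{question}\n\n"
        "Respond with JSON that matches this schema:\n"
        f"{json.dumps(schema, indent=2)}"
    )


request_body = {
    "model": "gpt-oss",
    "messages": [{"role": "user", "content": grounded_prompt("Tell me about Canada.", schema)}],
    "stream": False,
    "format": schema,  # the same schema is still passed to `format`
}
print(request_body["messages"][0]["content"])
```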
|
||||
|
||||
<Tabs>
|
||||
<Tab title="cURL">
|
||||
```shell
|
||||
curl -X POST http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{
|
||||
"model": "gpt-oss",
|
||||
"messages": [{"role": "user", "content": "Tell me about Canada."}],
|
||||
"stream": false,
|
||||
"format": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"name": {"type": "string"},
|
||||
"capital": {"type": "string"},
|
||||
"languages": {
|
||||
"type": "array",
|
||||
"items": {"type": "string"}
|
||||
}
|
||||
},
|
||||
"required": ["name", "capital", "languages"]
|
||||
}
|
||||
}'
|
||||
```
|
||||
</Tab>
|
||||
<Tab title="Python">
|
||||
Use Pydantic models and pass `model_json_schema()` to `format`, then validate the response:
|
||||
|
||||
```python
from ollama import chat
from pydantic import BaseModel

class Country(BaseModel):
    name: str
    capital: str
    languages: list[str]

response = chat(
    model='gpt-oss',
    messages=[{'role': 'user', 'content': 'Tell me about Canada.'}],
    format=Country.model_json_schema(),
)

# validate the JSON reply against the Pydantic model
country = Country.model_validate_json(response.message.content)
print(country)
```
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
Serialize a Zod schema with `zodToJsonSchema()` and parse the structured response:
|
||||
|
||||
```javascript
|
||||
import ollama from 'ollama'
|
||||
import { z } from 'zod'
|
||||
import { zodToJsonSchema } from 'zod-to-json-schema'
|
||||
|
||||
const Country = z.object({
|
||||
name: z.string(),
|
||||
capital: z.string(),
|
||||
languages: z.array(z.string()),
|
||||
})
|
||||
|
||||
const response = await ollama.chat({
|
||||
model: 'gpt-oss',
|
||||
messages: [{ role: 'user', content: 'Tell me about Canada.' }],
|
||||
format: zodToJsonSchema(Country),
|
||||
})
|
||||
|
||||
const country = Country.parse(JSON.parse(response.message.content))
|
||||
console.log(country)
|
||||
```
|
||||
</Tab>
|
||||
</Tabs>
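
As the note above suggests, the schema can also be embedded in the prompt text. The following is a minimal sketch, assuming a hand-written schema dict and a hypothetical helper named `build_messages`. It only builds the request payload and does not call Ollama:

```python
import json

# Hand-written JSON schema mirroring the Country example above.
COUNTRY_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "capital": {"type": "string"},
        "languages": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "capital", "languages"],
}


def build_messages(question: str, schema: dict) -> list:
    """Hypothetical helper: append the schema to the user prompt as a string."""
    schema_text = json.dumps(schema, indent=2)
    prompt = f"{question}\n\nRespond with JSON matching this schema:\n{schema_text}"
    return [{"role": "user", "content": prompt}]


messages = build_messages("Tell me about Canada.", COUNTRY_SCHEMA)

# The same schema dict is also passed in the `format` field of the request.
payload = {
    "model": "gpt-oss",
    "messages": messages,
    "format": COUNTRY_SCHEMA,
    "stream": False,
}
print(json.dumps(payload, indent=2))
```

Keeping one schema object for both the prompt and the `format` field avoids the two drifting apart.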
|
||||
|
||||
## Example: Extract structured data
|
||||
|
||||
Define the objects you want returned and let the model populate the fields:
|
||||
|
||||
```python
from ollama import chat
from pydantic import BaseModel


class Pet(BaseModel):
    name: str
    animal: str
    age: int
    color: str | None
    favorite_toy: str | None


class PetList(BaseModel):
    pets: list[Pet]


response = chat(
    model='gpt-oss',
    messages=[{'role': 'user', 'content': 'I have two cats named Luna and Loki...'}],
    format=PetList.model_json_schema(),
)

pets = PetList.model_validate_json(response.message.content)
print(pets)
```
|
||||
|
||||
## Example: Vision with structured outputs
|
||||
|
||||
Vision models accept the same `format` parameter, so image descriptions come back in a consistent, schema-constrained structure:
|
||||
|
||||
```python
from ollama import chat
from pydantic import BaseModel
from typing import Literal, Optional


class Object(BaseModel):
    name: str
    confidence: float
    attributes: str


class ImageDescription(BaseModel):
    summary: str
    objects: list[Object]
    scene: str
    colors: list[str]
    time_of_day: Literal['Morning', 'Afternoon', 'Evening', 'Night']
    setting: Literal['Indoor', 'Outdoor', 'Unknown']
    text_content: Optional[str] = None


response = chat(
    model='gemma3',
    messages=[{
        'role': 'user',
        'content': 'Describe this photo and list the objects you detect.',
        'images': ['path/to/image.jpg'],
    }],
    format=ImageDescription.model_json_schema(),
    options={'temperature': 0},
)

image_description = ImageDescription.model_validate_json(response.message.content)
print(image_description)
```
|
||||
|
||||
## Tips for reliable structured outputs
|
||||
|
||||
- Define schemas with Pydantic (Python) or Zod (JavaScript) so they can be reused for validation.
|
||||
- Lower the temperature (e.g., set it to `0`) for more deterministic completions.
|
||||
- Structured outputs also work through the OpenAI-compatible API via `response_format`.
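
The last tip can be sketched as a request body for the OpenAI-compatible chat completions endpoint. The `json_schema` wrapper shape below follows the OpenAI `response_format` convention and is an assumption here, as is the helper name `build_openai_request`. Check the OpenAI compatibility reference for the fields Ollama accepts:

```python
import json

# Hand-written schema for illustration.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "capital": {"type": "string"},
    },
    "required": ["name", "capital"],
}


def build_openai_request(model: str, prompt: str, schema: dict, name: str = "country") -> dict:
    """Hypothetical helper: build a chat completions body with a JSON schema."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Assumed shape, following the OpenAI `response_format` convention.
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": name, "schema": schema},
        },
    }


body = build_openai_request("gpt-oss", "Tell me about Canada.", schema)
print(json.dumps(body, indent=2))
```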
|
||||
153
docs/capabilities/thinking.mdx
Normal file
@@ -0,0 +1,153 @@
|
||||
---
|
||||
title: Thinking
|
||||
---
|
||||
|
||||
Thinking-capable models emit a `thinking` field that separates their reasoning trace from the final answer.
|
||||
|
||||
Use this capability to audit model steps, animate the model *thinking* in a UI, or hide the trace entirely when you only need the final response.
|
||||
|
||||
## Supported models
|
||||
|
||||
- [Qwen 3](https://ollama.com/library/qwen3)
|
||||
- [GPT-OSS](https://ollama.com/library/gpt-oss) *(use `think` levels: `low`, `medium`, `high` — the trace cannot be fully disabled)*
|
||||
- [DeepSeek-v3.1](https://ollama.com/library/deepseek-v3.1)
|
||||
- [DeepSeek R1](https://ollama.com/library/deepseek-r1)
|
||||
- Browse the latest additions under [thinking models](https://ollama.com/search?c=thinking)
|
||||
|
||||
## Enable thinking in API calls
|
||||
|
||||
Set the `think` field on chat or generate requests. Most models accept booleans (`true`/`false`).
|
||||
|
||||
GPT-OSS instead expects one of `low`, `medium`, or `high` to tune the trace length.
|
||||
|
||||
The `message.thinking` (chat endpoint) or `thinking` (generate endpoint) field contains the reasoning trace while `message.content` / `response` holds the final answer.
|
||||
|
||||
<Tabs>
|
||||
<Tab title="cURL">
|
||||
```shell
|
||||
curl http://localhost:11434/api/chat -d '{
|
||||
"model": "qwen3",
|
||||
"messages": [{
|
||||
"role": "user",
|
||||
"content": "How many letter r are in strawberry?"
|
||||
}],
|
||||
"think": true,
|
||||
"stream": false
|
||||
}'
|
||||
```
|
||||
</Tab>
|
||||
<Tab title="Python">
|
||||
```python
|
||||
from ollama import chat
|
||||
|
||||
response = chat(
|
||||
model='qwen3',
|
||||
messages=[{'role': 'user', 'content': 'How many letter r are in strawberry?'}],
|
||||
think=True,
|
||||
stream=False,
|
||||
)
|
||||
|
||||
print('Thinking:\n', response.message.thinking)
|
||||
print('Answer:\n', response.message.content)
|
||||
```
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
```javascript
|
||||
import ollama from 'ollama'
|
||||
|
||||
const response = await ollama.chat({
|
||||
model: 'deepseek-r1',
|
||||
messages: [{ role: 'user', content: 'How many letter r are in strawberry?' }],
|
||||
think: true,
|
||||
stream: false,
|
||||
})
|
||||
|
||||
console.log('Thinking:\n', response.message.thinking)
|
||||
console.log('Answer:\n', response.message.content)
|
||||
```
|
||||
</Tab>
|
||||
</Tabs>
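
When working with raw REST responses instead of an SDK, the field names differ between the two endpoints as described above. This is a small sketch of a hypothetical helper, `split_response`, that handles both shapes. The sample bodies are hand-written for illustration and are not real model output:

```python
def split_response(body: dict) -> tuple:
    """Return (thinking, answer) from a parsed /api/chat or /api/generate body."""
    if "message" in body:
        # Chat endpoint: fields live under `message`.
        message = body["message"]
        return message.get("thinking", ""), message.get("content", "")
    # Generate endpoint: fields live at the top level.
    return body.get("thinking", ""), body.get("response", "")


# Hand-written sample bodies (illustrative only).
chat_body = {"message": {"role": "assistant", "thinking": "Count each r.", "content": "Three."}}
generate_body = {"thinking": "Count each r.", "response": "Three."}

for body in (chat_body, generate_body):
    thinking, answer = split_response(body)
    print("Thinking:", thinking)
    print("Answer:", answer)
```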
|
||||
|
||||
<Note>
|
||||
GPT-OSS requires `think` to be set to `"low"`, `"medium"`, or `"high"`. Boolean values (`true`/`false`) are ignored for that model.
|
||||
</Note>
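
One way to avoid sending a boolean to GPT-OSS is to choose the `think` value per model before building the request. This is a sketch only: the helper name `think_value` is hypothetical, and matching on the model name prefix is an assumption made for illustration:

```python
from typing import Union

LEVELS = ("low", "medium", "high")


def think_value(model: str, enabled: bool = True, level: str = "medium") -> Union[bool, str]:
    """Hypothetical helper: pick a `think` value the model is documented to accept."""
    base_name = model.split(":")[0]  # drop a tag such as ":20b" (assumed naming)
    if base_name == "gpt-oss":
        if level not in LEVELS:
            raise ValueError(f"level must be one of {LEVELS}, got {level!r}")
        return level
    return enabled


print(think_value("gpt-oss", level="low"))
print(think_value("qwen3", enabled=False))
```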
|
||||
|
||||
## Stream the reasoning trace
|
||||
|
||||
Thinking streams emit reasoning tokens before answer tokens. Detect the first `thinking` chunk to render a "thinking" section, then switch to the final reply once `message.content` arrives.
|
||||
|
||||
<Tabs>
|
||||
<Tab title="Python">
|
||||
```python
from ollama import chat

stream = chat(
    model='qwen3',
    messages=[{'role': 'user', 'content': 'What is 17 × 23?'}],
    think=True,
    stream=True,
)

in_thinking = False

for chunk in stream:
    if chunk.message.thinking and not in_thinking:
        in_thinking = True
        print('Thinking:\n', end='')

    if chunk.message.thinking:
        print(chunk.message.thinking, end='')
    elif chunk.message.content:
        if in_thinking:
            print('\n\nAnswer:\n', end='')
            in_thinking = False
        print(chunk.message.content, end='')
```
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
```javascript
|
||||
import ollama from 'ollama'
|
||||
|
||||
async function main() {
|
||||
const stream = await ollama.chat({
|
||||
model: 'qwen3',
|
||||
messages: [{ role: 'user', content: 'What is 17 × 23?' }],
|
||||
think: true,
|
||||
stream: true,
|
||||
})
|
||||
|
||||
let inThinking = false
|
||||
|
||||
for await (const chunk of stream) {
|
||||
if (chunk.message.thinking && !inThinking) {
|
||||
inThinking = true
|
||||
process.stdout.write('Thinking:\n')
|
||||
}
|
||||
|
||||
if (chunk.message.thinking) {
|
||||
process.stdout.write(chunk.message.thinking)
|
||||
} else if (chunk.message.content) {
|
||||
if (inThinking) {
|
||||
process.stdout.write('\n\nAnswer:\n')
|
||||
inThinking = false
|
||||
}
|
||||
process.stdout.write(chunk.message.content)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
main()
|
||||
```
|
||||
</Tab>
|
||||
</Tabs>
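
The switching logic can be tried without a running model by feeding it hand-written chunks. In this sketch the chunks are plain dicts rather than SDK objects, and both `render_stream` and `fake_chunks` are hypothetical names introduced for illustration:

```python
from typing import Iterable


def render_stream(chunks: Iterable[dict]) -> str:
    """Join streamed chunks into one string with Thinking/Answer section headers."""
    out = []
    in_thinking = False
    for chunk in chunks:
        message = chunk.get("message", {})
        thinking = message.get("thinking")
        content = message.get("content")
        if thinking:
            if not in_thinking:
                in_thinking = True
                out.append("Thinking:\n")
            out.append(thinking)
        elif content:
            if in_thinking:
                # First answer token after the trace: switch sections.
                out.append("\n\nAnswer:\n")
                in_thinking = False
            out.append(content)
    return "".join(out)


# Hand-written chunks standing in for a real stream.
fake_chunks = [
    {"message": {"thinking": "17 * 20 = 340, "}},
    {"message": {"thinking": "17 * 3 = 51, total 391"}},
    {"message": {"content": "17 * 23 "}},
    {"message": {"content": "= 391"}},
]

print(render_stream(fake_chunks))
```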
|
||||
|
||||
## CLI quick reference
|
||||
|
||||
- Enable thinking for a single run: `ollama run deepseek-r1 --think "Where should I visit in Lisbon?"`
|
||||
- Disable thinking: `ollama run deepseek-r1 --think=false "Summarize this article"`
|
||||
- Hide the trace while still using a thinking model: `ollama run deepseek-r1 --hidethinking "Is 9.9 bigger or 9.11?"`
|
||||
- Inside interactive sessions, toggle with `/set think` or `/set nothink`.
|
||||
- GPT-OSS only accepts levels: `ollama run gpt-oss --think=low "Draft a headline"` (replace `low` with `medium` or `high` as needed).
|
||||
|
||||
<Note>Thinking is enabled by default in the CLI and API for supported models.</Note>
|
||||
777
docs/capabilities/tool-calling.mdx
Normal file
@@ -0,0 +1,777 @@
|
||||
---
|
||||
title: Tool calling
|
||||
---
|
||||
|
||||
Ollama supports tool calling (also known as function calling), which allows a model to invoke tools and incorporate their results into its replies.
|
||||
|
||||
## Calling a single tool
|
||||
Invoke a single tool and include its response in a follow-up request. This is also known as "single-shot" tool calling.
|
||||
|
||||
<Tabs>
|
||||
<Tab title="cURL">
|
||||
|
||||
```shell
|
||||
curl -s http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{
|
||||
"model": "qwen3",
|
||||
"messages": [{"role": "user", "content": "What is the temperature in New York?"}],
|
||||
"stream": false,
|
||||
"tools": [
|
||||
{
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "get_temperature",
|
||||
"description": "Get the current temperature for a city",
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"required": ["city"],
|
||||
"properties": {
|
||||
"city": {"type": "string", "description": "The name of the city"}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}'
|
||||
```
|
||||
|
||||
**Generate a response with a single tool result**
|
||||
```shell
|
||||
curl -s http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{
|
||||
"model": "qwen3",
|
||||
"messages": [
|
||||
{"role": "user", "content": "What is the temperature in New York?"},
|
||||
{
|
||||
"role": "assistant",
|
||||
"tool_calls": [
|
||||
{
|
||||
"type": "function",
|
||||
"function": {
|
||||
"index": 0,
|
||||
"name": "get_temperature",
|
||||
"arguments": {"city": "New York"}
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
{"role": "tool", "tool_name": "get_temperature", "content": "22°C"}
|
||||
],
|
||||
"stream": false
|
||||
}'
|
||||
```
|
||||
</Tab>
|
||||
<Tab title="Python">
|
||||
Install the Ollama Python SDK:
|
||||
```bash
|
||||
# with pip
|
||||
pip install ollama -U
|
||||
|
||||
# with uv
|
||||
uv add ollama
|
||||
```
|
||||
|
||||
```python
from ollama import chat

def get_temperature(city: str) -> str:
    """Get the current temperature for a city

    Args:
      city: The name of the city

    Returns:
      The current temperature for the city
    """
    temperatures = {
        "New York": "22°C",
        "London": "15°C",
        "Tokyo": "18°C",
    }
    return temperatures.get(city, "Unknown")

messages = [{"role": "user", "content": "What is the temperature in New York?"}]

# pass functions directly as tools in the tools list or as a JSON schema
response = chat(model="qwen3", messages=messages, tools=[get_temperature], think=True)

messages.append(response.message)
if response.message.tool_calls:
    # only recommended for models which only return a single tool call
    call = response.message.tool_calls[0]
    result = get_temperature(**call.function.arguments)
    # add the tool result to the messages
    messages.append({"role": "tool", "tool_name": call.function.name, "content": str(result)})

    final_response = chat(model="qwen3", messages=messages, tools=[get_temperature], think=True)
    print(final_response.message.content)
```
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
Install the Ollama JavaScript library:
|
||||
```bash
|
||||
# with npm
|
||||
npm i ollama
|
||||
|
||||
# with bun
|
||||
bun i ollama
|
||||
```
|
||||
|
||||
```typescript
|
||||
import ollama from 'ollama'
|
||||
|
||||
function getTemperature(city: string): string {
|
||||
const temperatures: Record<string, string> = {
|
||||
'New York': '22°C',
|
||||
'London': '15°C',
|
||||
'Tokyo': '18°C',
|
||||
}
|
||||
return temperatures[city] ?? 'Unknown'
|
||||
}
|
||||
|
||||
const tools = [
|
||||
{
|
||||
type: 'function',
|
||||
function: {
|
||||
name: 'get_temperature',
|
||||
description: 'Get the current temperature for a city',
|
||||
parameters: {
|
||||
type: 'object',
|
||||
required: ['city'],
|
||||
properties: {
|
||||
city: { type: 'string', description: 'The name of the city' },
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
]
|
||||
|
||||
const messages = [{ role: 'user', content: "What is the temperature in New York?" }]
|
||||
|
||||
const response = await ollama.chat({
|
||||
model: 'qwen3',
|
||||
messages,
|
||||
tools,
|
||||
think: true,
|
||||
})
|
||||
|
||||
messages.push(response.message)
|
||||
if (response.message.tool_calls?.length) {
|
||||
// only recommended for models which only return a single tool call
|
||||
const call = response.message.tool_calls[0]
|
||||
const args = call.function.arguments as { city: string }
|
||||
const result = getTemperature(args.city)
|
||||
// add the tool result to the messages
|
||||
messages.push({ role: 'tool', tool_name: call.function.name, content: result })
|
||||
|
||||
// generate the final response
|
||||
const finalResponse = await ollama.chat({ model: 'qwen3', messages, tools, think: true })
|
||||
console.log(finalResponse.message.content)
|
||||
}
|
||||
```
|
||||
</Tab>
|
||||
</Tabs>
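
The message sequence for the follow-up request can be sketched offline with plain dicts, mirroring the cURL bodies above. The helper name `follow_up_messages` is hypothetical, and the assistant message is hand-written rather than returned by a model:

```python
def get_temperature(city: str) -> str:
    """Stub tool with fixed values, as in the examples above."""
    temperatures = {"New York": "22°C", "London": "15°C", "Tokyo": "18°C"}
    return temperatures.get(city, "Unknown")


def follow_up_messages(history: list, assistant_message: dict) -> list:
    """Hypothetical helper: run the first tool call and build the follow-up messages."""
    call = assistant_message["tool_calls"][0]["function"]
    result = get_temperature(**call["arguments"])
    return history + [
        assistant_message,
        {"role": "tool", "tool_name": call["name"], "content": result},
    ]


history = [{"role": "user", "content": "What is the temperature in New York?"}]

# Hand-written assistant turn containing a single tool call.
assistant_message = {
    "role": "assistant",
    "tool_calls": [
        {
            "type": "function",
            "function": {"index": 0, "name": "get_temperature", "arguments": {"city": "New York"}},
        }
    ],
}

messages = follow_up_messages(history, assistant_message)
for message in messages:
    print(message["role"])
```

The order matters: the assistant message carrying `tool_calls` goes before the `tool` message that answers it.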
|
||||
|
||||
## Parallel tool calling
|
||||
|
||||
<Tabs>
|
||||
<Tab title="cURL">
|
||||
Request multiple tool calls in parallel, then send all tool responses back to the model.
|
||||
|
||||
```shell
|
||||
curl -s http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{
|
||||
"model": "qwen3",
|
||||
"messages": [{"role": "user", "content": "What are the current weather conditions and temperature in New York and London?"}],
|
||||
"stream": false,
|
||||
"tools": [
|
||||
{
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "get_temperature",
|
||||
"description": "Get the current temperature for a city",
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"required": ["city"],
|
||||
"properties": {
|
||||
"city": {"type": "string", "description": "The name of the city"}
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "get_conditions",
|
||||
"description": "Get the current weather conditions for a city",
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"required": ["city"],
|
||||
"properties": {
|
||||
"city": {"type": "string", "description": "The name of the city"}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}'
|
||||
```
|
||||
|
||||
**Generate a response with multiple tool results**
|
||||
```shell
|
||||
curl -s http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{
|
||||
"model": "qwen3",
|
||||
"messages": [
|
||||
{"role": "user", "content": "What are the current weather conditions and temperature in New York and London?"},
|
||||
{
|
||||
"role": "assistant",
|
||||
"tool_calls": [
|
||||
{
|
||||
"type": "function",
|
||||
"function": {
|
||||
"index": 0,
|
||||
"name": "get_temperature",
|
||||
"arguments": {"city": "New York"}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "function",
|
||||
"function": {
|
||||
"index": 1,
|
||||
"name": "get_conditions",
|
||||
"arguments": {"city": "New York"}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "function",
|
||||
"function": {
|
||||
"index": 2,
|
||||
"name": "get_temperature",
|
||||
"arguments": {"city": "London"}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "function",
|
||||
"function": {
|
||||
"index": 3,
|
||||
"name": "get_conditions",
|
||||
"arguments": {"city": "London"}
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
{"role": "tool", "tool_name": "get_temperature", "content": "22°C"},
|
||||
{"role": "tool", "tool_name": "get_conditions", "content": "Partly cloudy"},
|
||||
{"role": "tool", "tool_name": "get_temperature", "content": "15°C"},
|
||||
{"role": "tool", "tool_name": "get_conditions", "content": "Rainy"}
|
||||
],
|
||||
"stream": false
|
||||
}'
|
||||
```
|
||||
</Tab>
|
||||
<Tab title="Python">
|
||||
```python
from ollama import chat

def get_temperature(city: str) -> str:
    """Get the current temperature for a city

    Args:
      city: The name of the city

    Returns:
      The current temperature for the city
    """
    temperatures = {
        "New York": "22°C",
        "London": "15°C",
        "Tokyo": "18°C"
    }
    return temperatures.get(city, "Unknown")

def get_conditions(city: str) -> str:
    """Get the current weather conditions for a city

    Args:
      city: The name of the city

    Returns:
      The current weather conditions for the city
    """
    conditions = {
        "New York": "Partly cloudy",
        "London": "Rainy",
        "Tokyo": "Sunny"
    }
    return conditions.get(city, "Unknown")


messages = [{'role': 'user', 'content': 'What are the current weather conditions and temperature in New York and London?'}]

# The python client automatically parses functions as a tool schema so we can pass them directly
# Schemas can be passed directly in the tools list as well
response = chat(model='qwen3', messages=messages, tools=[get_temperature, get_conditions], think=True)

# add the assistant message to the messages
messages.append(response.message)
if response.message.tool_calls:
    # process each tool call
    for call in response.message.tool_calls:
        # execute the appropriate tool
        if call.function.name == 'get_temperature':
            result = get_temperature(**call.function.arguments)
        elif call.function.name == 'get_conditions':
            result = get_conditions(**call.function.arguments)
        else:
            result = 'Unknown tool'
        # add the tool result to the messages
        messages.append({'role': 'tool', 'tool_name': call.function.name, 'content': str(result)})

    # generate the final response
    final_response = chat(model='qwen3', messages=messages, tools=[get_temperature, get_conditions], think=True)
    print(final_response.message.content)
```
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
```typescript
|
||||
import ollama from 'ollama'
|
||||
|
||||
function getTemperature(city: string): string {
|
||||
const temperatures: { [key: string]: string } = {
|
||||
"New York": "22°C",
|
||||
"London": "15°C",
|
||||
"Tokyo": "18°C"
|
||||
}
|
||||
return temperatures[city] || "Unknown"
|
||||
}
|
||||
|
||||
function getConditions(city: string): string {
|
||||
const conditions: { [key: string]: string } = {
|
||||
"New York": "Partly cloudy",
|
||||
"London": "Rainy",
|
||||
"Tokyo": "Sunny"
|
||||
}
|
||||
return conditions[city] || "Unknown"
|
||||
}
|
||||
|
||||
const tools = [
|
||||
{
|
||||
type: 'function',
|
||||
function: {
|
||||
name: 'get_temperature',
|
||||
description: 'Get the current temperature for a city',
|
||||
parameters: {
|
||||
type: 'object',
|
||||
required: ['city'],
|
||||
properties: {
|
||||
city: { type: 'string', description: 'The name of the city' },
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
{
|
||||
type: 'function',
|
||||
function: {
|
||||
name: 'get_conditions',
|
||||
description: 'Get the current weather conditions for a city',
|
||||
parameters: {
|
||||
type: 'object',
|
||||
required: ['city'],
|
||||
properties: {
|
||||
city: { type: 'string', description: 'The name of the city' },
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
]
|
||||
|
||||
const messages = [{ role: 'user', content: 'What are the current weather conditions and temperature in New York and London?' }]
|
||||
|
||||
const response = await ollama.chat({
|
||||
model: 'qwen3',
|
||||
messages,
|
||||
tools,
|
||||
think: true
|
||||
})
|
||||
|
||||
// add the assistant message to the messages
|
||||
messages.push(response.message)
|
||||
if (response.message.tool_calls) {
|
||||
// process each tool call
|
||||
for (const call of response.message.tool_calls) {
|
||||
// execute the appropriate tool
|
||||
let result: string
|
||||
if (call.function.name === 'get_temperature') {
|
||||
const args = call.function.arguments as { city: string }
|
||||
result = getTemperature(args.city)
|
||||
} else if (call.function.name === 'get_conditions') {
|
||||
const args = call.function.arguments as { city: string }
|
||||
result = getConditions(args.city)
|
||||
} else {
|
||||
result = 'Unknown tool'
|
||||
}
|
||||
// add the tool result to the messages
|
||||
messages.push({ role: 'tool', tool_name: call.function.name, content: result })
|
||||
}
|
||||
|
||||
// generate the final response
|
||||
const finalResponse = await ollama.chat({ model: 'qwen3', messages, tools, think: true })
|
||||
console.log(finalResponse.message.content)
|
||||
}
|
||||
```
|
||||
</Tab>
|
||||
</Tabs>
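
As the number of tools grows, the `if`/`elif` chain can be replaced with a lookup table. This is a minimal sketch using plain dicts in place of SDK objects. The helper `run_tool_calls` and the registry are hypothetical, and returning an error string for bad arguments is one possible design choice among several:

```python
def get_temperature(city: str) -> str:
    return {"New York": "22°C", "London": "15°C"}.get(city, "Unknown")


def get_conditions(city: str) -> str:
    return {"New York": "Partly cloudy", "London": "Rainy"}.get(city, "Unknown")


def run_tool_calls(tool_calls: list, registry: dict) -> list:
    """Hypothetical helper: execute each call and return one tool message per call."""
    results = []
    for call in tool_calls:
        info = call["function"]
        fn = registry.get(info["name"])
        if fn is None:
            content = "Unknown tool"
        else:
            try:
                content = str(fn(**info["arguments"]))
            except TypeError as exc:
                # The model supplied missing or unexpected arguments.
                content = f"Invalid arguments: {exc}"
        results.append({"role": "tool", "tool_name": info["name"], "content": content})
    return results


registry = {"get_temperature": get_temperature, "get_conditions": get_conditions}

# Hand-written tool calls standing in for a model response.
tool_calls = [
    {"function": {"name": "get_temperature", "arguments": {"city": "New York"}}},
    {"function": {"name": "get_conditions", "arguments": {"city": "London"}}},
    {"function": {"name": "get_humidity", "arguments": {"city": "London"}}},
]

for message in run_tool_calls(tool_calls, registry):
    print(message)
```

Results are appended in the same order as the calls, so the model can match each tool message to its request.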
|
||||
|
||||
|
||||
## Multi-turn tool calling (Agent loop)
|
||||
|
||||
An agent loop allows the model to decide when to invoke tools and incorporate their results into its replies.
|
||||
|
||||
It may also help to tell the model that it is in a loop and can make multiple tool calls.
|
||||
|
||||
<Tabs>
|
||||
<Tab title="Python">
|
||||
```python
from ollama import chat, ChatResponse


def add(a: int, b: int) -> int:
    """Add two numbers

    Args:
      a: The first number
      b: The second number

    Returns:
      The sum of the two numbers
    """
    return a + b


def multiply(a: int, b: int) -> int:
    """Multiply two numbers

    Args:
      a: The first number
      b: The second number

    Returns:
      The product of the two numbers
    """
    return a * b


available_functions = {
    'add': add,
    'multiply': multiply,
}

messages = [{'role': 'user', 'content': 'What is (11434+12341)*412?'}]
while True:
    response: ChatResponse = chat(
        model='qwen3',
        messages=messages,
        tools=[add, multiply],
        think=True,
    )
    messages.append(response.message)
    print("Thinking: ", response.message.thinking)
    print("Content: ", response.message.content)
    if response.message.tool_calls:
        for tc in response.message.tool_calls:
            if tc.function.name in available_functions:
                print(f"Calling {tc.function.name} with arguments {tc.function.arguments}")
                result = available_functions[tc.function.name](**tc.function.arguments)
                print(f"Result: {result}")
                # add the tool result to the messages
                messages.append({'role': 'tool', 'tool_name': tc.function.name, 'content': str(result)})
    else:
        # end the loop when there are no more tool calls
        break
    # continue the loop with the updated messages
```
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
```typescript
|
||||
import ollama from 'ollama'
|
||||
|
||||
type ToolName = 'add' | 'multiply'
|
||||
|
||||
function add(a: number, b: number): number {
|
||||
return a + b
|
||||
}
|
||||
|
||||
function multiply(a: number, b: number): number {
|
||||
return a * b
|
||||
}
|
||||
|
||||
const availableFunctions: Record<ToolName, (a: number, b: number) => number> = {
|
||||
add,
|
||||
multiply,
|
||||
}
|
||||
|
||||
const tools = [
|
||||
{
|
||||
type: 'function',
|
||||
function: {
|
||||
name: 'add',
|
||||
description: 'Add two numbers',
|
||||
parameters: {
|
||||
type: 'object',
|
||||
required: ['a', 'b'],
|
||||
properties: {
|
||||
a: { type: 'integer', description: 'The first number' },
|
||||
b: { type: 'integer', description: 'The second number' },
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
{
|
||||
type: 'function',
|
||||
function: {
|
||||
name: 'multiply',
|
||||
description: 'Multiply two numbers',
|
||||
parameters: {
|
||||
type: 'object',
|
||||
required: ['a', 'b'],
|
||||
properties: {
|
||||
a: { type: 'integer', description: 'The first number' },
|
||||
b: { type: 'integer', description: 'The second number' },
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
]
|
||||
|
||||
async function agentLoop() {
|
||||
const messages = [{ role: 'user', content: 'What is (11434+12341)*412?' }]
|
||||
|
||||
while (true) {
|
||||
const response = await ollama.chat({
|
||||
model: 'qwen3',
|
||||
messages,
|
||||
tools,
|
||||
think: true,
|
||||
})
|
||||
|
||||
messages.push(response.message)
|
||||
console.log('Thinking:', response.message.thinking)
|
||||
console.log('Content:', response.message.content)
|
||||
|
||||
const toolCalls = response.message.tool_calls ?? []
|
||||
if (toolCalls.length) {
|
||||
for (const call of toolCalls) {
|
||||
const fn = availableFunctions[call.function.name as ToolName]
|
||||
if (!fn) {
|
||||
continue
|
||||
}
|
||||
|
||||
const args = call.function.arguments as { a: number; b: number }
|
||||
console.log(`Calling ${call.function.name} with arguments`, args)
|
||||
const result = fn(args.a, args.b)
|
||||
console.log(`Result: ${result}`)
|
||||
messages.push({ role: 'tool', tool_name: call.function.name, content: String(result) })
|
||||
}
|
||||
} else {
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
agentLoop().catch(console.error)
|
||||
```
|
||||
</Tab>
|
||||
</Tabs>
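
A `while True` loop relies on the model eventually answering without a tool call. One possible safeguard is a turn limit. The sketch below is not part of the Ollama SDK: `run_agent` is a hypothetical helper and `scripted_chat` is a stand-in for a real model call, so the loop can be exercised offline:

```python
from typing import Callable


def add(a: int, b: int) -> int:
    return a + b


def multiply(a: int, b: int) -> int:
    return a * b


def run_agent(chat_fn: Callable, messages: list, tools: dict, max_turns: int = 8) -> str:
    """Hypothetical helper: loop until there are no tool calls or max_turns is hit."""
    for _ in range(max_turns):
        message = chat_fn(messages)
        messages.append(message)
        tool_calls = message.get("tool_calls") or []
        if not tool_calls:
            return message.get("content", "")
        for call in tool_calls:
            info = call["function"]
            fn = tools.get(info["name"])
            result = fn(**info["arguments"]) if fn else "Unknown tool"
            messages.append({"role": "tool", "tool_name": info["name"], "content": str(result)})
    raise RuntimeError(f"no final answer after {max_turns} turns")


def scripted_chat(messages: list) -> dict:
    """Stand-in for a model: picks the next step from the tool results so far."""
    results = [m["content"] for m in messages if m["role"] == "tool"]
    if len(results) == 0:
        call = {"function": {"name": "add", "arguments": {"a": 11434, "b": 12341}}}
        return {"role": "assistant", "content": "", "tool_calls": [call]}
    if len(results) == 1:
        call = {"function": {"name": "multiply", "arguments": {"a": int(results[0]), "b": 412}}}
        return {"role": "assistant", "content": "", "tool_calls": [call]}
    return {"role": "assistant", "content": f"The answer is {results[1]}."}


history = [{"role": "user", "content": "What is (11434+12341)*412?"}]
answer = run_agent(scripted_chat, history, {"add": add, "multiply": multiply})
print(answer)
```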
|
||||
|
||||
|
||||
## Tool calling with streaming
|
||||
|
||||
When streaming, gather every chunk of `thinking`, `content`, and `tool_calls`, then return those fields together with any tool results in the follow-up request.
|
||||
|
||||
<Tabs>
|
||||
<Tab title="Python">
|
||||
```python
from ollama import chat


def get_temperature(city: str) -> str:
    """Get the current temperature for a city

    Args:
      city: The name of the city

    Returns:
      The current temperature for the city
    """
    temperatures = {
        'New York': '22°C',
        'London': '15°C',
    }
    return temperatures.get(city, 'Unknown')


messages = [{'role': 'user', 'content': "What is the temperature in New York?"}]

while True:
    stream = chat(
        model='qwen3',
        messages=messages,
        tools=[get_temperature],
        stream=True,
        think=True,
    )

    thinking = ''
    content = ''
    tool_calls = []

    done_thinking = False
    # accumulate the partial fields
    for chunk in stream:
        if chunk.message.thinking:
            thinking += chunk.message.thinking
            print(chunk.message.thinking, end='', flush=True)
        if chunk.message.content:
            if not done_thinking:
                done_thinking = True
                print('\n')
            content += chunk.message.content
            print(chunk.message.content, end='', flush=True)
        if chunk.message.tool_calls:
            tool_calls.extend(chunk.message.tool_calls)
            print(chunk.message.tool_calls)

    # append accumulated fields to the messages
    if thinking or content or tool_calls:
        messages.append({'role': 'assistant', 'thinking': thinking, 'content': content, 'tool_calls': tool_calls})

    if not tool_calls:
        break

    for call in tool_calls:
        if call.function.name == 'get_temperature':
            result = get_temperature(**call.function.arguments)
        else:
            result = 'Unknown tool'
        messages.append({'role': 'tool', 'tool_name': call.function.name, 'content': result})
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
```typescript
|
||||
import ollama from 'ollama'
|
||||
|
||||
function getTemperature(city: string): string {
|
||||
const temperatures: Record<string, string> = {
|
||||
'New York': '22°C',
|
||||
'London': '15°C',
|
||||
}
|
||||
return temperatures[city] ?? 'Unknown'
|
||||
}
|
||||
|
||||
const getTemperatureTool = {
|
||||
type: 'function',
|
||||
function: {
|
||||
name: 'get_temperature',
|
||||
description: 'Get the current temperature for a city',
|
||||
parameters: {
|
||||
type: 'object',
|
||||
required: ['city'],
|
||||
properties: {
|
||||
city: { type: 'string', description: 'The name of the city' },
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
|
||||
async function agentLoop() {
|
||||
const messages = [{ role: 'user', content: "What is the temperature in New York?" }]
|
||||
|
||||
while (true) {
|
||||
const stream = await ollama.chat({
|
||||
model: 'qwen3',
|
||||
messages,
|
||||
tools: [getTemperatureTool],
|
||||
stream: true,
|
||||
think: true,
|
||||
})
|
||||
|
||||
let thinking = ''
|
||||
let content = ''
|
||||
const toolCalls: any[] = []
|
||||
let doneThinking = false
|
||||
|
||||
for await (const chunk of stream) {
|
||||
if (chunk.message.thinking) {
|
||||
thinking += chunk.message.thinking
|
||||
process.stdout.write(chunk.message.thinking)
|
||||
}
|
||||
if (chunk.message.content) {
|
||||
if (!doneThinking) {
|
||||
doneThinking = true
|
||||
process.stdout.write('\n')
|
||||
}
|
||||
content += chunk.message.content
|
||||
process.stdout.write(chunk.message.content)
|
||||
}
|
||||
if (chunk.message.tool_calls?.length) {
|
||||
toolCalls.push(...chunk.message.tool_calls)
|
||||
console.log(chunk.message.tool_calls)
|
||||
}
|
||||
}
|
||||
|
||||
if (thinking || content || toolCalls.length) {
|
||||
messages.push({ role: 'assistant', thinking, content, tool_calls: toolCalls } as any)
|
||||
}
|
||||
|
||||
if (!toolCalls.length) {
|
||||
break
|
||||
}
|
||||
|
||||
for (const call of toolCalls) {
|
||||
if (call.function.name === 'get_temperature') {
|
||||
const args = call.function.arguments as { city: string }
|
||||
const result = getTemperature(args.city)
|
||||
messages.push({ role: 'tool', tool_name: call.function.name, content: result } )
|
||||
} else {
|
||||
messages.push({ role: 'tool', tool_name: call.function.name, content: 'Unknown tool' } )
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
agentLoop().catch(console.error)
|
||||
```
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
This loop streams the assistant response, accumulates partial fields, passes them back together, and appends the tool results so the model can complete its answer.
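
The accumulation step on its own can be sketched with hand-written chunks. Plain dicts are used here in place of SDK chunk objects, and `accumulate` is a hypothetical helper name:

```python
def accumulate(chunks: list) -> dict:
    """Merge streamed chunks into a single assistant message."""
    thinking = ""
    content = ""
    tool_calls = []
    for chunk in chunks:
        message = chunk.get("message", {})
        thinking += message.get("thinking") or ""
        content += message.get("content") or ""
        tool_calls.extend(message.get("tool_calls") or [])
    return {"role": "assistant", "thinking": thinking, "content": content, "tool_calls": tool_calls}


# Hand-written chunks standing in for a real stream.
chunks = [
    {"message": {"thinking": "Need the "}},
    {"message": {"thinking": "temperature tool."}},
    {"message": {"tool_calls": [{"function": {"name": "get_temperature", "arguments": {"city": "New York"}}}]}},
]

assistant_message = accumulate(chunks)
print(assistant_message)
```

The merged message is what gets appended to `messages` before the tool results.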
|
||||
|
||||
|
||||
## Using functions as tools with Ollama Python SDK
|
||||
The Python SDK automatically parses functions as a tool schema so we can pass them directly.
|
||||
Schemas can still be passed if needed.
|
||||
|
||||
```python
from ollama import chat

def get_temperature(city: str) -> str:
    """Get the current temperature for a city

    Args:
      city: The name of the city

    Returns:
      The current temperature for the city
    """
    temperatures = {
        'New York': '22°C',
        'London': '15°C',
    }
    return temperatures.get(city, 'Unknown')

available_functions = {
    'get_temperature': get_temperature,
}

messages = [{'role': 'user', 'content': 'What is the temperature in New York?'}]

# directly pass the functions as the tools list
response = chat(model='qwen3', messages=messages, tools=list(available_functions.values()), think=True)
```
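
To give a feel for what "parsing a function as a tool schema" involves, here is a rough approximation built on the standard `inspect` module. It is not the SDK's implementation: the helper `function_to_tool` and the type mapping are assumptions for illustration, and the real conversion may handle docstrings and types differently:

```python
import inspect

# Assumed mapping from Python annotations to JSON schema types.
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_to_tool(fn) -> dict:
    """Hypothetical helper: derive a tool schema from a function signature."""
    properties = {}
    required = []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": PY_TO_JSON.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    doc = inspect.getdoc(fn) or ""
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            # Use the first docstring line as the description.
            "description": doc.splitlines()[0] if doc else "",
            "parameters": {"type": "object", "required": required, "properties": properties},
        },
    }


def get_temperature(city: str) -> str:
    """Get the current temperature for a city

    Args:
      city: The name of the city
    """
    return "22°C"


tool = function_to_tool(get_temperature)
print(tool)
```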
|
||||
84
docs/capabilities/vision.mdx
Normal file
@@ -0,0 +1,84 @@
|
||||
---
|
||||
title: Vision
|
||||
---
|
||||
|
||||
Vision models accept images alongside text so the model can describe, classify, and answer questions about what it sees.
|
||||
|
||||
## Quick start
|
||||
|
||||
```shell
|
||||
ollama run gemma3 "./image.png what's in this image?"
|
||||
```
|
||||
|
||||
|
||||
## Usage with Ollama's API
|
||||
Provide an `images` array. SDKs accept file paths, URLs or raw bytes while the REST API expects base64-encoded image data.
|
||||
|
||||
|
||||
<Tabs>
|
||||
<Tab title="cURL">
|
||||
```shell
|
||||
# 1. Download a sample image
|
||||
curl -L -o test.jpg "https://upload.wikimedia.org/wikipedia/commons/3/3a/Cat03.jpg"
|
||||
|
||||
# 2. Encode the image
|
||||
IMG=$(base64 < test.jpg | tr -d '\n')
|
||||
|
||||
# 3. Send it to Ollama
|
||||
curl -X POST http://localhost:11434/api/chat \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"model": "gemma3",
|
||||
"messages": [{
|
||||
"role": "user",
|
||||
"content": "What is in this image?",
|
||||
"images": ["'"$IMG"'"]
|
||||
}],
|
||||
"stream": false
|
||||
}'
|
||||
```
|
||||
</Tab>
|
||||
<Tab title="Python">
|
||||
```python
|
||||
from ollama import chat
|
||||
# from pathlib import Path
|
||||
|
||||
# Pass in the path to the image
|
||||
path = input('Please enter the path to the image: ')
|
||||
|
||||
# You can also pass in base64 encoded image data
|
||||
# img = base64.b64encode(Path(path).read_bytes()).decode()
|
||||
# or the raw bytes
|
||||
# img = Path(path).read_bytes()
|
||||
|
||||
response = chat(
|
||||
model='gemma3',
|
||||
messages=[
|
||||
{
|
||||
'role': 'user',
|
||||
'content': 'What is in this image? Be concise.',
|
||||
'images': [path],
|
||||
}
|
||||
],
|
||||
)
|
||||
|
||||
print(response.message.content)
|
||||
```
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
```javascript
|
||||
import ollama from 'ollama'
|
||||
|
||||
const imagePath = '/absolute/path/to/image.jpg'
|
||||
const response = await ollama.chat({
|
||||
model: 'gemma3',
|
||||
messages: [
|
||||
{ role: 'user', content: 'What is in this image?', images: [imagePath] }
|
||||
],
|
||||
stream: false,
|
||||
})
|
||||
|
||||
console.log(response.message.content)
|
||||
```
|
||||
</Tab>
|
||||
</Tabs>
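
Because the REST API expects base64-encoded image data, the request body can be assembled with the standard library alone. The sketch below is a minimal illustration; the helper name `build_vision_payload` is hypothetical, and the payload fields mirror the cURL example above.

```python
import base64
import json


def build_vision_payload(image_bytes: bytes, prompt: str, model: str = 'gemma3') -> str:
    """Return a JSON body for POST /api/chat with one base64-encoded image."""
    encoded = base64.b64encode(image_bytes).decode('ascii')
    payload = {
        'model': model,
        'messages': [{'role': 'user', 'content': prompt, 'images': [encoded]}],
        'stream': False,
    }
    return json.dumps(payload)


# Placeholder bytes stand in for a real image file read with open(path, 'rb').
body = build_vision_payload(b'\x89PNG placeholder', 'What is in this image?')
print(body[:60])
```

The resulting string can be sent as the request body to `http://localhost:11434/api/chat` with any HTTP client.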
|
||||
360
docs/capabilities/web-search.mdx
Normal file
@@ -0,0 +1,360 @@
|
||||
---
|
||||
title: Web search
|
||||
---
|
||||
|
||||
Ollama's web search API can be used to augment models with the latest information to reduce hallucinations and improve accuracy.
|
||||
|
||||
Web search is provided as a REST API with deeper tool integrations in the Python and JavaScript libraries. This also enables models such as OpenAI’s gpt-oss to conduct long-running research tasks.
|
||||
|
||||
## Authentication
|
||||
|
||||
For access to Ollama's web search API, create an [API key](https://ollama.com/settings/keys). A free Ollama account is required.
|
||||
|
||||
## Web search API
|
||||
|
||||
Performs a web search for a single query and returns relevant results.
|
||||
|
||||
### Request
|
||||
|
||||
`POST https://ollama.com/api/web_search`
|
||||
|
||||
- `query` (string, required): the search query string
|
||||
- `max_results` (integer, optional): maximum results to return (default 5, max 10)
|
||||
|
||||
### Response
|
||||
|
||||
Returns an object containing:
|
||||
|
||||
- `results` (array): array of search result objects, each containing:
  - `title` (string): the title of the web page
  - `url` (string): the URL of the web page
  - `content` (string): relevant content snippet from the web page
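
The request and response shapes described above can be sketched with the standard library. This is an illustration rather than an official client: the helper names `build_web_search_request` and `summarize_results` are hypothetical, and the lower bound of 1 for `max_results` is an assumption (the documentation only states the default and the maximum).

```python
import json
import urllib.request

WEB_SEARCH_URL = 'https://ollama.com/api/web_search'


def build_web_search_request(query: str, api_key: str, max_results: int = 5) -> urllib.request.Request:
    """Prepare (but do not send) a web search request."""
    # Assumption: at least one result must be requested; 10 is the documented maximum.
    if not 1 <= max_results <= 10:
        raise ValueError('max_results must be between 1 and 10')
    body = json.dumps({'query': query, 'max_results': max_results}).encode('utf-8')
    return urllib.request.Request(
        WEB_SEARCH_URL,
        data=body,
        headers={'Authorization': f'Bearer {api_key}', 'Content-Type': 'application/json'},
        method='POST',
    )


def summarize_results(response: dict) -> list[tuple[str, str]]:
    """Extract (title, url) pairs from a web search response object."""
    return [(item['title'], item['url']) for item in response.get('results', [])]


request = build_web_search_request('what is ollama?', api_key='your_api_key')
sample = {'results': [{'title': 'Ollama', 'url': 'https://ollama.com/', 'content': '...'}]}
print(request.get_method(), summarize_results(sample))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual call; it is left out here so the sketch runs without network access or a real key.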
|
||||
|
||||
### Examples
|
||||
|
||||
<Note>
|
||||
Ensure the `OLLAMA_API_KEY` environment variable is set, or pass the API key directly in the `Authorization` header.
|
||||
</Note>
|
||||
|
||||
#### cURL Request
|
||||
|
||||
```bash
|
||||
curl https://ollama.com/api/web_search \
|
||||
--header "Authorization: Bearer $OLLAMA_API_KEY" \
|
||||
-d '{
|
||||
"query":"what is ollama?"
|
||||
}'
|
||||
```
|
||||
|
||||
**Response**
|
||||
|
||||
```json
|
||||
{
|
||||
"results": [
|
||||
{
|
||||
"title": "Ollama",
|
||||
"url": "https://ollama.com/",
|
||||
"content": "Cloud models are now available..."
|
||||
},
|
||||
{
|
||||
"title": "What is Ollama? Introduction to the AI model management tool",
|
||||
"url": "https://www.hostinger.com/tutorials/what-is-ollama",
|
||||
"content": "Ariffud M. 6min Read..."
|
||||
},
|
||||
{
|
||||
"title": "Ollama Explained: Transforming AI Accessibility and Language ...",
|
||||
"url": "https://www.geeksforgeeks.org/artificial-intelligence/ollama-explained-transforming-ai-accessibility-and-language-processing/",
|
||||
"content": "Data Science Data Science Projects Data Analysis..."
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
#### Python library
|
||||
|
||||
```python
|
||||
import ollama
|
||||
response = ollama.web_search("What is Ollama?")
|
||||
print(response)
|
||||
```
|
||||
|
||||
**Example output**
|
||||
|
||||
```python
|
||||
|
||||
results = [
|
||||
{
|
||||
"title": "Ollama",
|
||||
"url": "https://ollama.com/",
|
||||
"content": "Cloud models are now available in Ollama..."
|
||||
},
|
||||
{
|
||||
"title": "What is Ollama? Features, Pricing, and Use Cases - Walturn",
|
||||
"url": "https://www.walturn.com/insights/what-is-ollama-features-pricing-and-use-cases",
|
||||
"content": "Our services..."
|
||||
},
|
||||
{
|
||||
"title": "Complete Ollama Guide: Installation, Usage & Code Examples",
|
||||
"url": "https://collabnix.com/complete-ollama-guide-installation-usage-code-examples",
|
||||
"content": "Join our Discord Server..."
|
||||
}
|
||||
]
|
||||
|
||||
```
|
||||
|
||||
For more, see the Ollama [Python example](https://github.com/ollama/ollama-python/blob/main/examples/web-search.py).
|
||||
|
||||
#### JavaScript library
|
||||
|
||||
```tsx
|
||||
import { Ollama } from "ollama";
|
||||
|
||||
const client = new Ollama();
|
||||
const results = await client.webSearch("what is ollama?");
|
||||
console.log(JSON.stringify(results, null, 2));
|
||||
```
|
||||
|
||||
**Example output**
|
||||
|
||||
```json
|
||||
{
|
||||
"results": [
|
||||
{
|
||||
"title": "Ollama",
|
||||
"url": "https://ollama.com/",
|
||||
"content": "Cloud models are now available..."
|
||||
},
|
||||
{
|
||||
"title": "What is Ollama? Introduction to the AI model management tool",
|
||||
"url": "https://www.hostinger.com/tutorials/what-is-ollama",
|
||||
"content": "Ollama is an open-source tool..."
|
||||
},
|
||||
{
|
||||
"title": "Ollama Explained: Transforming AI Accessibility and Language Processing",
|
||||
"url": "https://www.geeksforgeeks.org/artificial-intelligence/ollama-explained-transforming-ai-accessibility-and-language-processing/",
|
||||
"content": "Ollama is a groundbreaking..."
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
For more, see the Ollama [JavaScript example](https://github.com/ollama/ollama-js/blob/main/examples/websearch/websearch-tools.ts).
|
||||
|
||||
## Web fetch API
|
||||
|
||||
Fetches a single web page by URL and returns its content.
|
||||
|
||||
### Request
|
||||
|
||||
`POST https://ollama.com/api/web_fetch`
|
||||
|
||||
- `url` (string, required): the URL to fetch
|
||||
|
||||
### Response
|
||||
|
||||
Returns an object containing:
|
||||
|
||||
- `title` (string): the title of the web page
|
||||
- `content` (string): the main content of the web page
|
||||
- `links` (array): array of links found on the page
|
||||
|
||||
### Examples
|
||||
|
||||
#### cURL Request
|
||||
|
||||
```bash
|
||||
curl --request POST \
|
||||
--url https://ollama.com/api/web_fetch \
|
||||
--header "Authorization: Bearer $OLLAMA_API_KEY" \
|
||||
--header 'Content-Type: application/json' \
|
||||
--data '{
|
||||
"url": "ollama.com"
|
||||
}'
|
||||
```
|
||||
|
||||
**Response**
|
||||
|
||||
```json
|
||||
{
|
||||
"title": "Ollama",
|
||||
"content": "[Cloud models](https://ollama.com/blog/cloud-models) are now available in Ollama...",
|
||||
"links": [
|
||||
"http://ollama.com/",
|
||||
"http://ollama.com/models",
|
||||
"https://github.com/ollama/ollama"
|
||||
]
}
|
||||
|
||||
```
|
||||
|
||||
#### Python SDK
|
||||
|
||||
```python
|
||||
from ollama import web_fetch
|
||||
|
||||
result = web_fetch('https://ollama.com')
|
||||
print(result)
|
||||
```
|
||||
|
||||
**Result**
|
||||
|
||||
```python
|
||||
WebFetchResponse(
|
||||
title='Ollama',
|
||||
content='[Cloud models](https://ollama.com/blog/cloud-models) are now available in Ollama\n\n**Chat & build
|
||||
with open models**\n\n[Download](https://ollama.com/download) [Explore
|
||||
models](https://ollama.com/models)\n\nAvailable for macOS, Windows, and Linux',
|
||||
links=['https://ollama.com/', 'https://ollama.com/models', 'https://github.com/ollama/ollama']
|
||||
)
|
||||
```
|
||||
|
||||
#### JavaScript SDK
|
||||
|
||||
```tsx
|
||||
import { Ollama } from "ollama";
|
||||
|
||||
const client = new Ollama();
|
||||
const fetchResult = await client.webFetch("https://ollama.com");
|
||||
console.log(JSON.stringify(fetchResult, null, 2));
|
||||
```
|
||||
|
||||
**Result**
|
||||
|
||||
```json
|
||||
{
|
||||
"title": "Ollama",
|
||||
"content": "[Cloud models](https://ollama.com/blog/cloud-models) are now available in Ollama...",
|
||||
"links": [
|
||||
"https://ollama.com/",
|
||||
"https://ollama.com/models",
|
||||
"https://github.com/ollama/ollama"
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## Building a search agent
|
||||
|
||||
Use Ollama’s web search API as a tool to build a mini search agent.
|
||||
|
||||
This example uses Alibaba’s Qwen 3 model with 4B parameters.
|
||||
|
||||
```bash
|
||||
ollama pull qwen3:4b
|
||||
```
|
||||
|
||||
```python
|
||||
from ollama import chat, web_fetch, web_search
|
||||
|
||||
available_tools = {'web_search': web_search, 'web_fetch': web_fetch}
|
||||
|
||||
messages = [{'role': 'user', 'content': "what is ollama's new engine"}]
|
||||
|
||||
while True:
    response = chat(
        model='qwen3:4b',
        messages=messages,
        tools=[web_search, web_fetch],
        think=True
    )
    if response.message.thinking:
        print('Thinking: ', response.message.thinking)
    if response.message.content:
        print('Content: ', response.message.content)
    messages.append(response.message)
    if response.message.tool_calls:
        print('Tool calls: ', response.message.tool_calls)
        for tool_call in response.message.tool_calls:
            function_to_call = available_tools.get(tool_call.function.name)
            if function_to_call:
                args = tool_call.function.arguments
                result = function_to_call(**args)
                print('Result: ', str(result)[:200]+'...')
                # Result is truncated for limited context lengths
                messages.append({'role': 'tool', 'content': str(result)[:2000 * 4], 'tool_name': tool_call.function.name})
            else:
                messages.append({'role': 'tool', 'content': f'Tool {tool_call.function.name} not found', 'tool_name': tool_call.function.name})
    else:
        break
|
||||
```
|
||||
|
||||
**Result**
|
||||
|
||||
```
|
||||
Thinking: Okay, the user is asking about Ollama's new engine. I need to figure out what they're referring to. Ollama is a company that develops large language models, so maybe they've released a new model or an updated version of their existing engine....
|
||||
|
||||
Tool calls: [ToolCall(function=Function(name='web_search', arguments={'max_results': 3, 'query': 'Ollama new engine'}))]
|
||||
Result: results=[WebSearchResult(content='# New model scheduling\n\n## September 23, 2025\n\nOllama now includes a significantly improved model scheduling system. Ahead of running a model, Ollama’s new engine
|
||||
|
||||
Thinking: Okay, the user asked about Ollama's new engine. Let me look at the search results.
|
||||
|
||||
First result is from September 23, 2025, talking about new model scheduling. It mentions improved memory management, reduced crashes, better GPU utilization, and multi-GPU performance. Examples show speed improvements and accurate memory reporting. Supported models include gemma3, llama4, qwen3, etc...
|
||||
|
||||
Content: Ollama has introduced two key updates to its engine, both released in 2025:
|
||||
|
||||
1. **Enhanced Model Scheduling (September 23, 2025)**
|
||||
- **Precision Memory Management**: Exact memory allocation reduces out-of-memory crashes and optimizes GPU utilization.
|
||||
- **Performance Gains**: Examples show significant speed improvements (e.g., 85.54 tokens/s vs 52.02 tokens/s) and full GPU layer utilization.
|
||||
- **Multi-GPU Support**: Improved efficiency across multiple GPUs, with accurate memory reporting via tools like `nvidia-smi`.
|
||||
- **Supported Models**: Includes `gemma3`, `llama4`, `qwen3`, `mistral-small3.2`, and more.
|
||||
|
||||
2. **Multimodal Engine (May 15, 2025)**
|
||||
- **Vision Support**: First-class support for vision models, including `llama4:scout` (109B parameters), `gemma3`, `qwen2.5vl`, and `mistral-small3.1`.
|
||||
- **Multimodal Tasks**: Examples include identifying animals in multiple images, answering location-based questions from videos, and document scanning.
|
||||
|
||||
These updates highlight Ollama's focus on efficiency, performance, and expanded capabilities for both text and vision tasks.
|
||||
```
|
||||
|
||||
### Context length and agents
|
||||
|
||||
Web search results can return thousands of tokens. Increase the model's context length to at least 32000 tokens; search agents work best with the full context length. [Ollama's cloud models](https://docs.ollama.com/cloud) run at the full context length.
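
The agent example above trims each tool result to `2000 * 4` characters before appending it to the conversation. The sketch below wraps that idea in a helper; the name `truncate_for_context` is hypothetical, and reading the expression as "about 2000 tokens at roughly 4 characters per token" is an assumption, not a documented rule.

```python
def truncate_for_context(text: str, max_tokens: int = 2000, chars_per_token: int = 4) -> str:
    """Cut text to roughly max_tokens, assuming a fixed characters-per-token ratio."""
    limit = max_tokens * chars_per_token
    if len(text) <= limit:
        return text
    return text[:limit]


long_result = 'x' * 10_000
trimmed = truncate_for_context(long_result)
print(len(trimmed))
```

A character budget is only a rough proxy for tokens, so it is worth leaving headroom below the configured context length when several tool results accumulate in one conversation.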
|
||||
|
||||
## MCP Server
|
||||
|
||||
You can enable web search in any MCP client through the [Python MCP server](https://github.com/ollama/ollama-python/blob/main/examples/web-search-mcp.py).
|
||||
|
||||
### Cline
|
||||
|
||||
Ollama's web search can be integrated with Cline easily using the MCP server configuration.
|
||||
|
||||
`Manage MCP Servers` > `Configure MCP Servers` > Add the following configuration:
|
||||
|
||||
```json
|
||||
{
|
||||
"mcpServers": {
|
||||
"web_search_and_fetch": {
|
||||
"type": "stdio",
|
||||
"command": "uv",
|
||||
"args": ["run", "path/to/web-search-mcp.py"],
|
||||
"env": { "OLLAMA_API_KEY": "your_api_key_here" }
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||

|
||||
|
||||
### Codex
|
||||
|
||||
Ollama works well with OpenAI's Codex tool.
|
||||
|
||||
Add the following configuration to `~/.codex/config.toml`
|
||||
|
||||
```toml
|
||||
[mcp_servers.web_search]
|
||||
command = "uv"
|
||||
args = ["run", "path/to/web-search-mcp.py"]
|
||||
env = { "OLLAMA_API_KEY" = "your_api_key_here" }
|
||||
```
|
||||
|
||||

|
||||
|
||||
### Goose
|
||||
|
||||
Ollama can integrate with Goose via its MCP feature.
|
||||
|
||||

|
||||
|
||||

|
||||
|
||||
### Other integrations
|
||||
|
||||
Ollama can be integrated into most tools through direct use of Ollama's API, the Python and JavaScript libraries, the OpenAI-compatible API, or the MCP server.
|
||||
145
docs/cli.mdx
Normal file
@@ -0,0 +1,145 @@
|
||||
---
|
||||
title: CLI Reference
|
||||
---
|
||||
|
||||
### Run a model
|
||||
|
||||
```
|
||||
ollama run gemma3
|
||||
```
|
||||
|
||||
### Launch integrations
|
||||
|
||||
```
|
||||
ollama launch
|
||||
```
|
||||
|
||||
Configure and launch external applications to use Ollama models. This provides an interactive way to set up and start integrations with supported apps.
|
||||
|
||||
#### Supported integrations
|
||||
|
||||
- **OpenCode** - Open-source coding assistant
|
||||
- **Claude Code** - Anthropic's agentic coding tool
|
||||
- **Codex** - OpenAI's coding assistant
|
||||
- **VS Code** - Microsoft's IDE with built-in AI chat
|
||||
- **Droid** - Factory's AI coding agent
|
||||
|
||||
#### Examples
|
||||
|
||||
Launch an integration interactively:
|
||||
|
||||
```
|
||||
ollama launch
|
||||
```
|
||||
|
||||
Launch a specific integration:
|
||||
|
||||
```
|
||||
ollama launch claude
|
||||
```
|
||||
|
||||
Launch with a specific model:
|
||||
|
||||
```
|
||||
ollama launch claude --model qwen3.5
|
||||
```
|
||||
|
||||
Configure without launching:
|
||||
|
||||
```
|
||||
ollama launch droid --config
|
||||
```
|
||||
|
||||
#### Multiline input
|
||||
|
||||
For multiline input, you can wrap text with `"""`:
|
||||
|
||||
```
|
||||
>>> """Hello,
|
||||
... world!
|
||||
... """
|
||||
I'm a basic program that prints the famous "Hello, world!" message to the console.
|
||||
```
|
||||
|
||||
#### Multimodal models
|
||||
|
||||
```
|
||||
ollama run gemma3 "What's in this image? /Users/jmorgan/Desktop/smile.png"
|
||||
```
|
||||
|
||||
### Generate embeddings
|
||||
|
||||
```
|
||||
ollama run embeddinggemma "Hello world"
|
||||
```
|
||||
|
||||
The output is a JSON array. Text can also be piped in:
|
||||
|
||||
```
|
||||
echo "Hello world" | ollama run nomic-embed-text
|
||||
```
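
Since the embedding is printed as a JSON array, a script can parse it directly. The sketch below assumes the output is a single flat array of numbers; the helper names `parse_embedding` and `cosine_similarity` are hypothetical, and the short vectors are made-up stand-ins for real output.

```python
import json
import math


def parse_embedding(output: str) -> list[float]:
    """Parse the JSON array printed by the CLI into a list of floats."""
    values = json.loads(output)
    if not isinstance(values, list):
        raise ValueError('expected a JSON array')
    return [float(v) for v in values]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Short made-up vectors stand in for real CLI output, which is much longer.
first = parse_embedding('[1.0, 0.0, 0.0]')
second = parse_embedding('[0.0, 1.0, 0.0]')
print(cosine_similarity(first, first), cosine_similarity(first, second))
```

In a real workflow the input string would come from capturing the command's standard output, for example with `subprocess.run(..., capture_output=True, text=True)`.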
|
||||
|
||||
### Download a model
|
||||
|
||||
```
|
||||
ollama pull gemma3
|
||||
```
|
||||
|
||||
### Remove a model
|
||||
|
||||
```
|
||||
ollama rm gemma3
|
||||
```
|
||||
|
||||
### List models
|
||||
|
||||
```
|
||||
ollama ls
|
||||
```
|
||||
|
||||
### Sign in to Ollama
|
||||
|
||||
```
|
||||
ollama signin
|
||||
```
|
||||
|
||||
### Sign out of Ollama
|
||||
|
||||
```
|
||||
ollama signout
|
||||
```
|
||||
|
||||
### Create a customized model
|
||||
|
||||
First, create a `Modelfile`
|
||||
|
||||
```
|
||||
FROM gemma3
|
||||
SYSTEM """You are a happy cat."""
|
||||
```
|
||||
|
||||
Then run `ollama create`:
|
||||
|
||||
```
|
||||
ollama create mymodel -f Modelfile
|
||||
```
|
||||
|
||||
### List running models
|
||||
|
||||
```
|
||||
ollama ps
|
||||
```
|
||||
|
||||
### Stop a running model
|
||||
|
||||
```
|
||||
ollama stop gemma3
|
||||
```
|
||||
|
||||
### Start Ollama
|
||||
|
||||
```
|
||||
ollama serve
|
||||
```
|
||||
|
||||
To view a list of environment variables that can be set, run `ollama serve --help`.
|
||||
232
docs/cloud.mdx
Normal file
@@ -0,0 +1,232 @@
|
||||
---
|
||||
title: Cloud
|
||||
sidebarTitle: Cloud
|
||||
---
|
||||
|
||||
## Cloud Models
|
||||
|
||||
Ollama's cloud models are a new kind of model in Ollama that can run without a powerful GPU. Instead, cloud models are automatically offloaded to Ollama's cloud service while offering the same capabilities as local models, making it possible to keep using your local tools while running larger models that wouldn't fit on a personal computer.
|
||||
|
||||
### Supported models
|
||||
|
||||
For a list of supported models, see Ollama's [model library](https://ollama.com/search?c=cloud).
|
||||
|
||||
### Running Cloud models
|
||||
|
||||
Ollama's cloud models require an account on [ollama.com](https://ollama.com). To sign in or create an account, run:
|
||||
|
||||
```
|
||||
ollama signin
|
||||
```
|
||||
|
||||
<Tabs>
|
||||
<Tab title="CLI">
|
||||
|
||||
To run a cloud model, open the terminal and run:
|
||||
|
||||
```
|
||||
ollama run gpt-oss:120b-cloud
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Python">
|
||||
|
||||
First, pull a cloud model so it can be accessed:
|
||||
|
||||
```
|
||||
ollama pull gpt-oss:120b-cloud
|
||||
```
|
||||
|
||||
Next, install [Ollama's Python library](https://github.com/ollama/ollama-python):
|
||||
|
||||
```
|
||||
pip install ollama
|
||||
```
|
||||
|
||||
Next, create and run a simple Python script:
|
||||
|
||||
```python
|
||||
from ollama import Client
|
||||
|
||||
client = Client()
|
||||
|
||||
messages = [
|
||||
{
|
||||
'role': 'user',
|
||||
'content': 'Why is the sky blue?',
|
||||
},
|
||||
]
|
||||
|
||||
for part in client.chat('gpt-oss:120b-cloud', messages=messages, stream=True):
    print(part['message']['content'], end='', flush=True)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
First, pull a cloud model so it can be accessed:
|
||||
|
||||
```
|
||||
ollama pull gpt-oss:120b-cloud
|
||||
```
|
||||
|
||||
Next, install [Ollama's JavaScript library](https://github.com/ollama/ollama-js):
|
||||
|
||||
```
|
||||
npm i ollama
|
||||
```
|
||||
|
||||
Then use the library to run a cloud model:
|
||||
|
||||
```typescript
|
||||
import { Ollama } from "ollama";
|
||||
|
||||
const ollama = new Ollama();
|
||||
|
||||
const response = await ollama.chat({
|
||||
model: "gpt-oss:120b-cloud",
|
||||
messages: [{ role: "user", content: "Explain quantum computing" }],
|
||||
stream: true,
|
||||
});
|
||||
|
||||
for await (const part of response) {
|
||||
process.stdout.write(part.message.content);
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="cURL">
|
||||
|
||||
First, pull a cloud model so it can be accessed:
|
||||
|
||||
```
|
||||
ollama pull gpt-oss:120b-cloud
|
||||
```
|
||||
|
||||
Run the following cURL command to run the model via Ollama's API:
|
||||
|
||||
```
|
||||
curl http://localhost:11434/api/chat -d '{
|
||||
"model": "gpt-oss:120b-cloud",
|
||||
"messages": [{
|
||||
"role": "user",
|
||||
"content": "Why is the sky blue?"
|
||||
}],
|
||||
"stream": false
|
||||
}'
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
## Cloud API access
|
||||
|
||||
Cloud models can also be accessed directly on ollama.com's API. In this mode, ollama.com acts as a remote Ollama host.
|
||||
|
||||
### Authentication
|
||||
|
||||
For direct access to ollama.com's API, first create an [API key](https://ollama.com/settings/keys).
|
||||
|
||||
Then, set the `OLLAMA_API_KEY` environment variable to your API key.
|
||||
|
||||
```
|
||||
export OLLAMA_API_KEY=your_api_key
|
||||
```
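
When the key is read from the environment in code, it helps to fail early if the variable is missing rather than send a malformed header. The sketch below shows one way to do that; the helper name `auth_headers` is hypothetical.

```python
import os


def auth_headers(env=os.environ) -> dict:
    """Build the Authorization header from OLLAMA_API_KEY, failing early if unset."""
    api_key = env.get('OLLAMA_API_KEY')
    if not api_key:
        raise RuntimeError('OLLAMA_API_KEY is not set')
    return {'Authorization': f'Bearer {api_key}'}


# A plain dict stands in for the process environment in this demonstration.
print(auth_headers({'OLLAMA_API_KEY': 'your_api_key'}))
```

The returned dictionary has the shape expected by the `headers` argument of the Python `Client` shown in the examples that follow.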
|
||||
|
||||
### Listing models
|
||||
|
||||
Models available directly through ollama.com's API can be listed with:
|
||||
|
||||
```
|
||||
curl https://ollama.com/api/tags
|
||||
```
|
||||
|
||||
### Generating a response
|
||||
|
||||
<Tabs>
|
||||
<Tab title="Python">
|
||||
|
||||
First, install [Ollama's Python library](https://github.com/ollama/ollama-python)
|
||||
|
||||
```
|
||||
pip install ollama
|
||||
```
|
||||
|
||||
Then make a request
|
||||
|
||||
```python
|
||||
import os
|
||||
from ollama import Client
|
||||
|
||||
client = Client(
|
||||
host="https://ollama.com",
|
||||
headers={'Authorization': 'Bearer ' + os.environ.get('OLLAMA_API_KEY')}
|
||||
)
|
||||
|
||||
messages = [
|
||||
{
|
||||
'role': 'user',
|
||||
'content': 'Why is the sky blue?',
|
||||
},
|
||||
]
|
||||
|
||||
for part in client.chat('gpt-oss:120b', messages=messages, stream=True):
    print(part['message']['content'], end='', flush=True)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
First, install [Ollama's JavaScript library](https://github.com/ollama/ollama-js):
|
||||
|
||||
```
|
||||
npm i ollama
|
||||
```
|
||||
|
||||
Next, make a request to the model:
|
||||
|
||||
```typescript
|
||||
import { Ollama } from "ollama";
|
||||
|
||||
const ollama = new Ollama({
|
||||
host: "https://ollama.com",
|
||||
headers: {
|
||||
Authorization: "Bearer " + process.env.OLLAMA_API_KEY,
|
||||
},
|
||||
});
|
||||
|
||||
const response = await ollama.chat({
|
||||
model: "gpt-oss:120b",
|
||||
messages: [{ role: "user", content: "Explain quantum computing" }],
|
||||
stream: true,
|
||||
});
|
||||
|
||||
for await (const part of response) {
|
||||
process.stdout.write(part.message.content);
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="cURL">
|
||||
|
||||
Generate a response via Ollama's chat API:
|
||||
|
||||
```
|
||||
curl https://ollama.com/api/chat \
|
||||
-H "Authorization: Bearer $OLLAMA_API_KEY" \
|
||||
-d '{
|
||||
"model": "gpt-oss:120b",
|
||||
"messages": [{
|
||||
"role": "user",
|
||||
"content": "Why is the sky blue?"
|
||||
}],
|
||||
"stream": false
|
||||
}'
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
## Local only
|
||||
|
||||
Ollama can run in local-only mode by [disabling Ollama's cloud](./faq#how-do-i-disable-ollama-cloud) features.
|
||||
41
docs/context-length.mdx
Normal file
@@ -0,0 +1,41 @@
|
||||
---
|
||||
title: Context length
|
||||
---
|
||||
|
||||
Context length is the maximum number of tokens that the model has access to in memory.
|
||||
|
||||
<Note>
|
||||
Ollama defaults to the following context lengths based on VRAM:
|
||||
- < 24 GiB VRAM: 4k context
|
||||
- 24-48 GiB VRAM: 32k context
|
||||
- >= 48 GiB VRAM: 256k context
|
||||
</Note>
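
The VRAM tiers in the note can be written as a small lookup. This is only an illustration of the documented thresholds, not Ollama's actual selection code; the function name is hypothetical, and treating "4k" as 4 × 1024 tokens is an assumption.

```python
def default_context_length(vram_gib: float) -> int:
    """Return the default context length (in tokens) for a given amount of VRAM."""
    k = 1024  # assumption: "4k" means 4 * 1024 tokens
    if vram_gib < 24:
        return 4 * k
    if vram_gib < 48:
        return 32 * k
    return 256 * k


for vram in (8, 24, 48):
    print(vram, default_context_length(vram))
```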
|
||||
|
||||
For tasks that require a large context, such as web search, agents, and coding tools, set the context length to at least 64000 tokens.
|
||||
|
||||
## Setting context length
|
||||
|
||||
Setting a larger context length will increase the amount of memory required to run a model. Ensure you have enough VRAM available to increase the context length.
|
||||
|
||||
Cloud models are set to their maximum context length by default.
|
||||
|
||||
### App
|
||||
|
||||
Change the slider in the Ollama app under settings to your desired context length.
|
||||

|
||||
|
||||
### CLI
|
||||
If the context length cannot be changed in the app, it can also be set with the `OLLAMA_CONTEXT_LENGTH` environment variable when starting the server.
|
||||
```
|
||||
OLLAMA_CONTEXT_LENGTH=64000 ollama serve
|
||||
```
|
||||
|
||||
### Check allocated context length and model offloading
|
||||
For best performance, use the maximum context length for a model, and avoid offloading the model to CPU. Verify the split under `PROCESSOR` using `ollama ps`.
|
||||
```
|
||||
ollama ps
|
||||
```
|
||||
```
|
||||
NAME             ID              SIZE      PROCESSOR    CONTEXT    UNTIL
gemma3:latest    a2af6cc3eb7f    6.6 GB    100% GPU     65536      2 minutes from now
|
||||
```
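
For scripted checks, the table can be parsed into rows. The sketch below assumes columns are separated by two or more spaces, which is an assumption about the output layout rather than a documented guarantee; the helper name `parse_ps` is hypothetical and the sample text is modeled on the output above.

```python
import re


def parse_ps(output: str) -> list[dict]:
    """Split `ollama ps`-style output into one dict per model row."""
    lines = [line for line in output.splitlines() if line.strip()]
    headers = re.split(r'\s{2,}', lines[0].strip())
    return [dict(zip(headers, re.split(r'\s{2,}', line.strip()))) for line in lines[1:]]


# Sample text modeled on the output above; real column widths may differ.
sample = (
    'NAME             ID              SIZE      PROCESSOR    CONTEXT    UNTIL\n'
    'gemma3:latest    a2af6cc3eb7f    6.6 GB    100% GPU     65536      2 minutes from now\n'
)
for row in parse_ps(sample):
    fully_on_gpu = row['PROCESSOR'] == '100% GPU'
    print(row['NAME'], row['CONTEXT'], fully_on_gpu)
```

A `PROCESSOR` value other than `100% GPU` indicates that part of the model has been offloaded to the CPU.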
|
||||
218
docs/development.md
Normal file
@@ -0,0 +1,218 @@
|
||||
# Development
|
||||
|
||||
Install prerequisites:
|
||||
|
||||
- [Go](https://go.dev/doc/install)
|
||||
- C/C++ Compiler e.g. Clang on macOS, [TDM-GCC](https://github.com/jmeubank/tdm-gcc/releases/latest) (Windows amd64) or [llvm-mingw](https://github.com/mstorsjo/llvm-mingw) (Windows arm64), GCC/Clang on Linux.
|
||||
|
||||
Then build and run Ollama from the root directory of the repository:
|
||||
|
||||
```shell
|
||||
go run . serve
|
||||
```
|
||||
|
||||
> [!NOTE]
|
||||
> Ollama includes native code compiled with CGO. From time to time these data structures can change and CGO can get out of sync resulting in unexpected crashes. You can force a full build of the native code by running `go clean -cache` first.
|
||||
|
||||
|
||||
## macOS (Apple Silicon)
|
||||
|
||||
macOS on Apple Silicon supports Metal, which is built into the Ollama binary. No additional steps are required.
|
||||
|
||||
## macOS (Intel)
|
||||
|
||||
Install prerequisites:
|
||||
|
||||
- [CMake](https://cmake.org/download/) or `brew install cmake`
|
||||
|
||||
Then, configure and build the project:
|
||||
|
||||
```shell
|
||||
cmake -B build
|
||||
cmake --build build
|
||||
```
|
||||
|
||||
Lastly, run Ollama:
|
||||
|
||||
```shell
|
||||
go run . serve
|
||||
```
|
||||
|
||||
## Windows
|
||||
|
||||
Install prerequisites:
|
||||
|
||||
- [CMake](https://cmake.org/download/)
|
||||
- [Visual Studio 2022](https://visualstudio.microsoft.com/downloads/) including the Native Desktop Workload
|
||||
- (Optional) AMD GPU support
  - [ROCm](https://rocm.docs.amd.com/en/latest/)
  - [Ninja](https://github.com/ninja-build/ninja/releases)
- (Optional) NVIDIA GPU support
  - [CUDA SDK](https://developer.nvidia.com/cuda-downloads?target_os=Windows&target_arch=x86_64&target_version=11&target_type=exe_network)
- (Optional) VULKAN GPU support
  - [VULKAN SDK](https://vulkan.lunarg.com/sdk/home) - useful for AMD/Intel GPUs
- (Optional) MLX engine support
  - [CUDA 13+ SDK](https://developer.nvidia.com/cuda-downloads)
  - [cuDNN 9+](https://developer.nvidia.com/cudnn)
|
||||
|
||||
Then, configure and build the project:
|
||||
|
||||
```shell
|
||||
cmake -B build
|
||||
cmake --build build --config Release
|
||||
```
|
||||
|
||||
> Building for Vulkan requires VULKAN_SDK environment variable:
|
||||
>
|
||||
> PowerShell
|
||||
> ```powershell
|
||||
> $env:VULKAN_SDK="C:\VulkanSDK\<version>"
|
||||
> ```
|
||||
> CMD
|
||||
> ```cmd
|
||||
> set VULKAN_SDK=C:\VulkanSDK\<version>
|
||||
> ```
|
||||
|
||||
> [!IMPORTANT]
|
||||
> Building for ROCm requires additional flags:
|
||||
> ```
|
||||
> cmake -B build -G Ninja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++
|
||||
> cmake --build build --config Release
|
||||
> ```
|
||||
|
||||
|
||||
|
||||
Lastly, run Ollama:
|
||||
|
||||
```shell
|
||||
go run . serve
|
||||
```
|
||||
|
||||
## Windows (ARM)
|
||||
|
||||
Windows ARM does not support additional acceleration libraries at this time. Do not use CMake; simply use `go run` or `go build`.
|
||||
|
||||
## Linux
|
||||
|
||||
Install prerequisites:
|
||||
|
||||
- [CMake](https://cmake.org/download/) or `sudo apt install cmake` or `sudo dnf install cmake`
|
||||
- (Optional) AMD GPU support
  - [ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/quick-start.html)
- (Optional) NVIDIA GPU support
  - [CUDA SDK](https://developer.nvidia.com/cuda-downloads)
- (Optional) VULKAN GPU support
  - [VULKAN SDK](https://vulkan.lunarg.com/sdk/home) - useful for AMD/Intel GPUs
  - Or install via package manager: `sudo apt install vulkan-sdk` (Ubuntu/Debian) or `sudo dnf install vulkan-sdk` (Fedora/CentOS)
- (Optional) MLX engine support
  - [CUDA 13+ SDK](https://developer.nvidia.com/cuda-downloads)
  - [cuDNN 9+](https://developer.nvidia.com/cudnn)
  - OpenBLAS/LAPACK: `sudo apt install libopenblas-dev liblapack-dev liblapacke-dev` (Ubuntu/Debian)
|
||||
> [!IMPORTANT]
|
||||
> Ensure prerequisites are in `PATH` before running CMake.
|
||||
|
||||
|
||||
Then, configure and build the project:
|
||||
|
||||
```shell
|
||||
cmake -B build
|
||||
cmake --build build
|
||||
```
|
||||
|
||||
Lastly, run Ollama:
|
||||
|
||||
```shell
|
||||
go run . serve
|
||||
```
|
||||
|
||||
## MLX Engine (Optional)
|
||||
|
||||
The MLX engine enables running safetensors-based models. It requires building the [MLX](https://github.com/ml-explore/mlx) and [MLX-C](https://github.com/ml-explore/mlx-c) shared libraries separately via CMake. On macOS, MLX uses the Metal library to run on the GPU; on Windows and Linux, it runs on NVIDIA GPUs via CUDA v13.
|
||||
|
||||
### macOS (Apple Silicon)
|
||||
|
||||
Requires the Metal toolchain. Install [Xcode](https://developer.apple.com/xcode/) first, then:
|
||||
|
||||
```shell
|
||||
xcodebuild -downloadComponent MetalToolchain
|
||||
```
|
||||
|
||||
Verify it's installed correctly (should print "no input files"):
|
||||
|
||||
```shell
|
||||
xcrun metal
|
||||
```
|
||||
|
||||
Then build:
|
||||
|
||||
```shell
|
||||
cmake -B build --preset MLX
|
||||
cmake --build build --preset MLX --parallel
|
||||
cmake --install build --component MLX
|
||||
```
|
||||
|
||||
> [!NOTE]
|
||||
> Without the Metal toolchain, cmake will silently complete with Metal disabled. Check the cmake output for `Setting MLX_BUILD_METAL=OFF` which indicates the toolchain is missing.
|
||||
|
||||
### Windows / Linux (CUDA)
|
||||
|
||||
Requires CUDA 13+ and [cuDNN](https://developer.nvidia.com/cudnn) 9+.
|
||||
|
||||
```shell
|
||||
cmake -B build --preset "MLX CUDA 13"
|
||||
cmake --build build --target mlx --target mlxc --config Release --parallel
|
||||
cmake --install build --component MLX --strip
|
||||
```
|
||||
|
||||
### Local MLX source overrides
|
||||
|
||||
To build against a local checkout of MLX and/or MLX-C (useful for development), set environment variables before running CMake:
|
||||
|
||||
```shell
|
||||
export OLLAMA_MLX_SOURCE=/path/to/mlx
|
||||
export OLLAMA_MLX_C_SOURCE=/path/to/mlx-c
|
||||
```
|
||||
|
||||
For example, using the helper scripts with local mlx and mlx-c repos:
|
||||
```shell
|
||||
OLLAMA_MLX_SOURCE=../mlx OLLAMA_MLX_C_SOURCE=../mlx-c ./scripts/build_linux.sh
|
||||
|
||||
OLLAMA_MLX_SOURCE=../mlx OLLAMA_MLX_C_SOURCE=../mlx-c ./scripts/build_darwin.sh
|
||||
```
|
||||
|
||||
```powershell
|
||||
$env:OLLAMA_MLX_SOURCE="../mlx"
|
||||
$env:OLLAMA_MLX_C_SOURCE="../mlx-c"
|
||||
./scripts/build_windows.ps1
|
||||
```
|
||||
|
||||
## Docker
|
||||
|
||||
```shell
|
||||
docker build .
|
||||
```
|
||||
|
||||
### ROCm
|
||||
|
||||
```shell
|
||||
docker build --build-arg FLAVOR=rocm .
|
||||
```
|
||||
|
||||
## Running tests
|
||||
|
||||
To run tests, use `go test`:
|
||||
|
||||
```shell
|
||||
go test ./...
|
||||
```
|
||||
|
||||
## Library detection
|
||||
|
||||
Ollama looks for acceleration libraries in the following paths relative to the `ollama` executable:
|
||||
|
||||
* `./lib/ollama` (Windows)
|
||||
* `../lib/ollama` (Linux)
|
||||
* `.` (macOS)
|
||||
* `build/lib/ollama` (for development)
|
||||
|
||||
If the libraries are not found, Ollama will not run with any acceleration libraries.
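The lookup order above can be sketched in Python. This is an illustration only, not Ollama's actual detection code: the function name `candidate_library_dirs` and the platform keys are hypothetical, and POSIX-style paths are used throughout for simplicity.

```python
import posixpath

# Relative search paths per platform, as listed in this section.
PLATFORM_DIRS = {
    "windows": "./lib/ollama",
    "linux": "../lib/ollama",
    "macos": ".",
}
DEVELOPMENT_DIR = "build/lib/ollama"


def candidate_library_dirs(exe_dir, platform):
    """Return the directories to check, resolved relative to the executable's directory."""
    relative_dirs = [PLATFORM_DIRS[platform], DEVELOPMENT_DIR]
    return [posixpath.normpath(posixpath.join(exe_dir, rel)) for rel in relative_dirs]


print(candidate_library_dirs("/usr/bin", "linux"))
```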
|
||||
91
docs/docker.mdx
Normal file
@@ -0,0 +1,91 @@
|
||||
## CPU only
|
||||
|
||||
```shell
|
||||
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
|
||||
```
|
||||
|
||||
## Nvidia GPU
|
||||
|
||||
Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installation).
|
||||
|
||||
### Install with Apt
|
||||
|
||||
1. Configure the repository
|
||||
|
||||
```shell
|
||||
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
|
||||
| sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
|
||||
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
|
||||
| sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
|
||||
| sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
|
||||
sudo apt-get update
|
||||
```
|
||||
|
||||
2. Install the NVIDIA Container Toolkit packages
|
||||
|
||||
```shell
|
||||
sudo apt-get install -y nvidia-container-toolkit
|
||||
```
|
||||
|
||||
### Install with Yum or Dnf
|
||||
|
||||
1. Configure the repository
|
||||
|
||||
```shell
|
||||
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo \
|
||||
| sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
|
||||
```
|
||||
|
||||
2. Install the NVIDIA Container Toolkit packages
|
||||
|
||||
```shell
|
||||
sudo yum install -y nvidia-container-toolkit
|
||||
```
|
||||
|
||||
### Configure Docker to use Nvidia driver
|
||||
|
||||
```shell
|
||||
sudo nvidia-ctk runtime configure --runtime=docker
|
||||
sudo systemctl restart docker
|
||||
```
|
||||
|
||||
### Start the container
|
||||
|
||||
```shell
|
||||
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
|
||||
```
|
||||
|
||||
<Note>
|
||||
If you're running on an NVIDIA JetPack system, Ollama can't automatically discover the correct JetPack version.
|
||||
Pass the environment variable `JETSON_JETPACK=5` or `JETSON_JETPACK=6` to the container to select version 5 or 6.
|
||||
</Note>
|
||||
|
||||
## AMD GPU
|
||||
|
||||
To run Ollama using Docker with AMD GPUs, use the `rocm` tag and the following command:
|
||||
|
||||
```shell
|
||||
docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm
|
||||
```
|
||||
|
||||
## Vulkan Support
|
||||
|
||||
Vulkan is bundled into the `ollama/ollama` image.
|
||||
|
||||
```shell
|
||||
docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 -e OLLAMA_VULKAN=1 --name ollama ollama/ollama
|
||||
```
|
||||
|
||||
|
||||
## Run model locally
|
||||
|
||||
Now you can run a model:
|
||||
|
||||
```shell
|
||||
docker exec -it ollama ollama run llama3.2
|
||||
```
|
||||
|
||||
## Try different models
|
||||
|
||||
More models can be found on the [Ollama library](https://ollama.com/library).
|
||||
|
||||
229
docs/docs.json
Normal file
@@ -0,0 +1,229 @@
|
||||
{
|
||||
"$schema": "https://mintlify.com/docs.json",
|
||||
"name": "Ollama",
|
||||
"colors": {
|
||||
"primary": "#000",
|
||||
"light": "#b5b5b5",
|
||||
"dark": "#000"
|
||||
},
|
||||
"favicon": "/images/favicon.png",
|
||||
"logo": {
|
||||
"light": "/images/logo.png",
|
||||
"dark": "/images/logo-dark.png",
|
||||
"href": "https://ollama.com"
|
||||
},
|
||||
"theme": "maple",
|
||||
"background": {
|
||||
"color": {
|
||||
"light": "#ffffff",
|
||||
"dark": "#000000"
|
||||
}
|
||||
},
|
||||
"fonts": {
|
||||
"family": "system-ui",
|
||||
"heading": {
|
||||
"family": "system-ui"
|
||||
},
|
||||
"body": {
|
||||
"family": "system-ui"
|
||||
}
|
||||
},
|
||||
"styling": {
|
||||
"codeblocks": "system"
|
||||
},
|
||||
"contextual": {
|
||||
"options": [
|
||||
"copy"
|
||||
]
|
||||
},
|
||||
"navbar": {
|
||||
"links": [
|
||||
{
|
||||
"label": "Sign in",
|
||||
"href": "https://ollama.com/signin"
|
||||
}
|
||||
],
|
||||
"primary": {
|
||||
"type": "button",
|
||||
"label": "Download",
|
||||
"href": "https://ollama.com/download"
|
||||
}
|
||||
},
|
||||
"api": {
|
||||
"playground": {
|
||||
"display": "simple"
|
||||
},
|
||||
"examples": {
|
||||
"languages": [
|
||||
"curl"
|
||||
]
|
||||
}
|
||||
},
|
||||
"redirects": [
|
||||
{
|
||||
"source": "/openai",
|
||||
"destination": "/api/openai-compatibility"
|
||||
},
|
||||
{
|
||||
"source": "/api/openai",
|
||||
"destination": "/api/openai-compatibility"
|
||||
},
|
||||
{
|
||||
"source": "/api",
|
||||
"destination": "/api/introduction"
|
||||
},
|
||||
{
|
||||
"source": "/integrations/clawdbot",
|
||||
"destination": "/integrations/openclaw"
|
||||
},
|
||||
{
|
||||
"source": "/integrations/poolside",
|
||||
"destination": "/integrations/pool"
|
||||
}
|
||||
],
|
||||
"navigation": {
|
||||
"tabs": [
|
||||
{
|
||||
"tab": "Documentation",
|
||||
"groups": [
|
||||
{
|
||||
"group": "Get started",
|
||||
"pages": [
|
||||
"index",
|
||||
"quickstart",
|
||||
"/cloud"
|
||||
]
|
||||
},
|
||||
{
|
||||
"group": "Capabilities",
|
||||
"pages": [
|
||||
"/capabilities/streaming",
|
||||
"/capabilities/thinking",
|
||||
"/capabilities/structured-outputs",
|
||||
"/capabilities/vision",
|
||||
"/capabilities/embeddings",
|
||||
"/capabilities/tool-calling",
|
||||
"/capabilities/web-search"
|
||||
]
|
||||
},
|
||||
{
|
||||
"group": "Integrations",
|
||||
"pages": [
|
||||
"/integrations/index",
|
||||
{
|
||||
"group": "Assistants",
|
||||
"expanded": true,
|
||||
"pages": [
|
||||
"/integrations/openclaw",
|
||||
"/integrations/hermes"
|
||||
]
|
||||
},
|
||||
{
|
||||
"group": "Coding",
|
||||
"expanded": true,
|
||||
"pages": [
|
||||
"/integrations/claude-code",
|
||||
"/integrations/codex-app",
|
||||
"/integrations/codex",
|
||||
"/integrations/copilot-cli",
|
||||
"/integrations/opencode",
|
||||
"/integrations/droid",
|
||||
"/integrations/goose",
|
||||
"/integrations/pi",
|
||||
"/integrations/pool"
|
||||
]
|
||||
},
|
||||
{
|
||||
"group": "IDEs & Editors",
|
||||
"expanded": true,
|
||||
"pages": [
|
||||
"/integrations/cline",
|
||||
"/integrations/jetbrains",
|
||||
"/integrations/roo-code",
|
||||
"/integrations/vscode",
|
||||
"/integrations/xcode",
|
||||
"/integrations/zed"
|
||||
]
|
||||
},
|
||||
{
|
||||
"group": "Chat & RAG",
|
||||
"pages": [
|
||||
"/integrations/onyx"
|
||||
]
|
||||
},
|
||||
{
|
||||
"group": "Automation",
|
||||
"pages": [
|
||||
"/integrations/n8n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"group": "Notebooks",
|
||||
"pages": [
|
||||
"/integrations/marimo"
|
||||
]
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"group": "More information",
|
||||
"pages": [
|
||||
"/cli",
|
||||
{
|
||||
"group": "Assistant Sandboxing",
|
||||
"pages": [
|
||||
"/integrations/nemoclaw"
|
||||
]
|
||||
},
|
||||
"/modelfile",
|
||||
"/context-length",
|
||||
"/linux",
|
||||
"/macos",
|
||||
"/windows",
|
||||
"/docker",
|
||||
"/import",
|
||||
"/faq",
|
||||
"/gpu",
|
||||
"/troubleshooting"
|
||||
]
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"tab": "API Reference",
|
||||
"openapi": "/openapi.yaml",
|
||||
"groups": [
|
||||
{
|
||||
"group": "API Reference",
|
||||
"pages": [
|
||||
"/api/introduction",
|
||||
"/api/authentication",
|
||||
"/api/streaming",
|
||||
"/api/usage",
|
||||
"/api/errors",
|
||||
"/api/openai-compatibility",
|
||||
"/api/anthropic-compatibility"
|
||||
]
|
||||
},
|
||||
{
|
||||
"group": "Endpoints",
|
||||
"pages": [
|
||||
"POST /api/generate",
|
||||
"POST /api/chat",
|
||||
"POST /api/embed",
|
||||
"GET /api/tags",
|
||||
"GET /api/ps",
|
||||
"POST /api/show",
|
||||
"POST /api/create",
|
||||
"POST /api/copy",
|
||||
"POST /api/pull",
|
||||
"POST /api/push",
|
||||
"DELETE /api/delete",
|
||||
"GET /api/version"
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
14
docs/examples.md
Normal file
@@ -0,0 +1,14 @@
|
||||
# Examples
|
||||
|
||||
This directory contains different examples of using Ollama.
|
||||
|
||||
## Python examples
|
||||
Ollama Python examples at [ollama-python/examples](https://github.com/ollama/ollama-python/tree/main/examples)
|
||||
|
||||
|
||||
## JavaScript examples
|
||||
Ollama JavaScript examples at [ollama-js/examples](https://github.com/ollama/ollama-js/tree/main/examples)
|
||||
|
||||
|
||||
## OpenAI compatibility examples
|
||||
Ollama OpenAI compatibility examples at [ollama/examples/openai](../docs/openai.md)
|
||||
413
docs/faq.mdx
Normal file
@@ -0,0 +1,413 @@
|
||||
---
|
||||
title: FAQ
|
||||
---
|
||||
|
||||
## How can I upgrade Ollama?
|
||||
|
||||
Ollama on macOS and Windows will automatically download updates. Click on the taskbar or menubar item and then click "Restart to update" to apply the update. Updates can also be installed by downloading the latest version [manually](https://ollama.com/download/).
|
||||
|
||||
On Linux, re-run the install script:
|
||||
|
||||
```shell
|
||||
curl -fsSL https://ollama.com/install.sh | sh
|
||||
```
|
||||
|
||||
## How can I view the logs?
|
||||
|
||||
Review the [Troubleshooting](./troubleshooting.mdx) docs for more about using logs.
|
||||
|
||||
## Is my GPU compatible with Ollama?
|
||||
|
||||
Please refer to the [GPU docs](./gpu.mdx).
|
||||
|
||||
## How can I specify the context window size?
|
||||
|
||||
By default, Ollama uses a context window size of 4096 tokens.
|
||||
|
||||
This can be overridden with the `OLLAMA_CONTEXT_LENGTH` environment variable. For example, to set the default context window to 8K, use:
|
||||
|
||||
```shell
|
||||
OLLAMA_CONTEXT_LENGTH=8192 ollama serve
|
||||
```
|
||||
|
||||
To change this when using `ollama run`, use `/set parameter`:
|
||||
|
||||
```shell
|
||||
/set parameter num_ctx 4096
|
||||
```
|
||||
|
||||
When using the API, specify the `num_ctx` parameter:
|
||||
|
||||
```shell
|
||||
curl http://localhost:11434/api/generate -d '{
|
||||
"model": "llama3.2",
|
||||
"prompt": "Why is the sky blue?",
|
||||
"options": {
|
||||
"num_ctx": 4096
|
||||
}
|
||||
}'
|
||||
```
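The same request can be assembled from Python. The following is a minimal sketch using only the standard library; the helper name `build_generate_request` is hypothetical, and the request is built but not sent, so it can be inspected without a running server.

```python
import json
import urllib.request


def build_generate_request(model, prompt, num_ctx, host="http://localhost:11434"):
    """Build (but do not send) a POST request to /api/generate with a custom num_ctx."""
    payload = {
        "model": model,
        "prompt": prompt,
        "options": {"num_ctx": num_ctx},  # per-request context window size
    }
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_generate_request("llama3.2", "Why is the sky blue?", 4096)
print(req.full_url)
print(json.loads(req.data)["options"])
```

To send the request to a running server, pass it to `urllib.request.urlopen(req)`.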
|
||||
|
||||
## How can I tell if my model was loaded onto the GPU?
|
||||
|
||||
Use the `ollama ps` command to see what models are currently loaded into memory.
|
||||
|
||||
```shell
|
||||
ollama ps
|
||||
```
|
||||
|
||||
<Info>
|
||||
|
||||
**Output**:
|
||||
|
||||
```
|
||||
NAME ID SIZE PROCESSOR UNTIL
|
||||
llama3:70b bcfb190ca3a7 42 GB 100% GPU 4 minutes from now
|
||||
```
|
||||
</Info>
|
||||
|
||||
The `Processor` column will show which memory the model was loaded into:
|
||||
|
||||
- `100% GPU` means the model was loaded entirely into the GPU
|
||||
- `100% CPU` means the model was loaded entirely in system memory
|
||||
- `48%/52% CPU/GPU` means the model was loaded partially onto both the GPU and into system memory
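If you want to check the split from a script, the `PROCESSOR` value can be parsed as follows. This is a sketch that assumes the column only takes the forms listed above; `parse_processor` is a hypothetical helper, not part of Ollama.

```python
import re


def parse_processor(value):
    """Return {'cpu': int, 'gpu': int} percentages from a PROCESSOR cell."""
    value = value.strip()
    # Split form, e.g. "48%/52% CPU/GPU": percentages and labels are paired in order.
    split = re.fullmatch(r"(\d+)%/(\d+)%\s+(CPU|GPU)/(CPU|GPU)", value)
    if split:
        first, second, label_a, label_b = split.groups()
        shares = {label_a.lower(): int(first), label_b.lower(): int(second)}
        return {"cpu": shares.get("cpu", 0), "gpu": shares.get("gpu", 0)}
    # Single form, e.g. "100% GPU" or "100% CPU".
    single = re.fullmatch(r"(\d+)%\s+(CPU|GPU)", value)
    if single:
        percent, label = single.groups()
        result = {"cpu": 0, "gpu": 0}
        result[label.lower()] = int(percent)
        return result
    raise ValueError(f"unrecognized PROCESSOR value: {value!r}")


print(parse_processor("48%/52% CPU/GPU"))
```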
|
||||
|
||||
## How do I configure Ollama server?
|
||||
|
||||
Ollama server can be configured with environment variables.
|
||||
|
||||
### Setting environment variables on Mac
|
||||
|
||||
If Ollama is run as a macOS application, environment variables should be set using `launchctl`:
|
||||
|
||||
1. For each environment variable, call `launchctl setenv`.
|
||||
|
||||
```bash
|
||||
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"
|
||||
```
|
||||
|
||||
2. Restart the Ollama application.
|
||||
|
||||
### Setting environment variables on Linux
|
||||
|
||||
If Ollama is run as a systemd service, environment variables should be set using `systemctl`:
|
||||
|
||||
1. Edit the systemd service by calling `systemctl edit ollama.service`. This will open an editor.
|
||||
|
||||
2. For each environment variable, add a line `Environment` under section `[Service]`:
|
||||
|
||||
```ini
|
||||
[Service]
|
||||
Environment="OLLAMA_HOST=0.0.0.0:11434"
|
||||
```
|
||||
|
||||
3. Save and exit.
|
||||
|
||||
4. Reload `systemd` and restart Ollama:
|
||||
|
||||
```shell
|
||||
systemctl daemon-reload
|
||||
systemctl restart ollama
|
||||
```
|
||||
|
||||
### Setting environment variables on Windows
|
||||
|
||||
On Windows, Ollama inherits your user and system environment variables.
|
||||
|
||||
1. First, quit Ollama by clicking on it in the taskbar.
|
||||
|
||||
2. Start the Settings (Windows 11) or Control Panel (Windows 10) application and search for _environment variables_.
|
||||
|
||||
3. Click on _Edit environment variables for your account_.
|
||||
|
||||
4. Edit or create a new variable for your user account for `OLLAMA_HOST`, `OLLAMA_MODELS`, etc.
|
||||
|
||||
5. Click OK/Apply to save.
|
||||
|
||||
6. Start the Ollama application from the Windows Start menu.
|
||||
|
||||
## How do I use Ollama behind a proxy?
|
||||
|
||||
Ollama pulls models from the Internet and may require a proxy server to access the models. Use `HTTPS_PROXY` to redirect outbound requests through the proxy. Ensure the proxy certificate is installed as a system certificate. Refer to the section above for how to use environment variables on your platform.
|
||||
|
||||
<Note>
|
||||
Avoid setting `HTTP_PROXY`. Ollama does not use HTTP for model pulls, only
|
||||
HTTPS. Setting `HTTP_PROXY` may interrupt client connections to the server.
|
||||
</Note>
|
||||
|
||||
### How do I use Ollama behind a proxy in Docker?
|
||||
|
||||
The Ollama Docker container image can be configured to use a proxy by passing `-e HTTPS_PROXY=https://proxy.example.com` when starting the container.
|
||||
|
||||
Alternatively, the Docker daemon can be configured to use a proxy. Instructions are available for Docker Desktop on [macOS](https://docs.docker.com/desktop/settings/mac/#proxies), [Windows](https://docs.docker.com/desktop/settings/windows/#proxies), and [Linux](https://docs.docker.com/desktop/settings/linux/#proxies), and Docker [daemon with systemd](https://docs.docker.com/config/daemon/systemd/#httphttps-proxy).
|
||||
|
||||
Ensure the certificate is installed as a system certificate when using HTTPS. This may require a new Docker image when using a self-signed certificate.
|
||||
|
||||
```dockerfile
|
||||
FROM ollama/ollama
|
||||
COPY my-ca.pem /usr/local/share/ca-certificates/my-ca.crt
|
||||
RUN update-ca-certificates
|
||||
```
|
||||
|
||||
Build and run this image:
|
||||
|
||||
```shell
|
||||
docker build -t ollama-with-ca .
|
||||
docker run -d -e HTTPS_PROXY=https://my.proxy.example.com -p 11434:11434 ollama-with-ca
|
||||
```
|
||||
|
||||
## Does Ollama send my prompts and answers back to ollama.com?
|
||||
|
||||
Ollama runs locally. We don't see your prompts or data when you run locally. When using cloud-hosted models, we process your prompts and responses to provide the service but do not store or log that content and never train on it. To provide the service, we collect basic account info and limited usage metadata, which does not include prompt or response content. We don't sell your data. You can delete your account anytime.
|
||||
|
||||
## How do I disable Ollama's cloud features?
|
||||
|
||||
Ollama can run in local only mode by disabling Ollama's cloud features. By turning off Ollama's cloud features, you will lose the ability to use Ollama's cloud models and web search.
|
||||
|
||||
Set `disable_ollama_cloud` in `~/.ollama/server.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"disable_ollama_cloud": true
|
||||
}
|
||||
```
|
||||
|
||||
You can also set the environment variable:
|
||||
|
||||
```shell
|
||||
OLLAMA_NO_CLOUD=1
|
||||
```
|
||||
|
||||
Restart Ollama after changing configuration. Once disabled, Ollama's logs will show `Ollama cloud disabled: true`.
|
||||
|
||||
## How can I expose Ollama on my network?
|
||||
|
||||
Ollama binds to 127.0.0.1 on port 11434 by default. Change the bind address with the `OLLAMA_HOST` environment variable.
|
||||
|
||||
Refer to the section [above](#how-do-i-configure-ollama-server) for how to set environment variables on your platform.
|
||||
|
||||
## How can I use Ollama with a proxy server?
|
||||
|
||||
Ollama runs an HTTP server and can be exposed using a proxy server such as Nginx. To do so, configure the proxy to forward requests and optionally set required headers (if not exposing Ollama on the network). For example, with Nginx:
|
||||
|
||||
```nginx
|
||||
server {
|
||||
listen 80;
|
||||
server_name example.com; # Replace with your domain or IP
|
||||
location / {
|
||||
proxy_pass http://localhost:11434;
|
||||
proxy_set_header Host localhost:11434;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## How can I use Ollama with ngrok?
|
||||
|
||||
Ollama can be accessed using a range of tunneling apps. For example, with ngrok:
|
||||
|
||||
```shell
|
||||
ngrok http 11434 --host-header="localhost:11434"
|
||||
```
|
||||
|
||||
## How can I use Ollama with Cloudflare Tunnel?
|
||||
|
||||
To use Ollama with Cloudflare Tunnel, use the `--url` and `--http-host-header` flags:
|
||||
|
||||
```shell
|
||||
cloudflared tunnel --url http://localhost:11434 --http-host-header="localhost:11434"
|
||||
```
|
||||
|
||||
## How can I allow additional web origins to access Ollama?
|
||||
|
||||
Ollama allows cross-origin requests from `127.0.0.1` and `0.0.0.0` by default. Additional origins can be configured with `OLLAMA_ORIGINS`.
|
||||
|
||||
For browser extensions, you'll need to explicitly allow the extension's origin pattern. Set `OLLAMA_ORIGINS` to include `chrome-extension://*`, `moz-extension://*`, and `safari-web-extension://*` if you wish to allow all browser extensions access, or specific extensions as needed:
|
||||
|
||||
```shell
|
||||
# Allow all Chrome, Firefox, and Safari extensions
|
||||
OLLAMA_ORIGINS=chrome-extension://*,moz-extension://*,safari-web-extension://* ollama serve
|
||||
```
|
||||
|
||||
Refer to the section [above](#how-do-i-configure-ollama-server) for how to set environment variables on your platform.
|
||||
|
||||
## Where are models stored?
|
||||
|
||||
- macOS: `~/.ollama/models`
|
||||
- Linux: `/usr/share/ollama/.ollama/models`
|
||||
- Windows: `C:\Users\%username%\.ollama\models`
|
||||
|
||||
### How do I set them to a different location?
|
||||
|
||||
If a different directory needs to be used, set the environment variable `OLLAMA_MODELS` to the chosen directory.
|
||||
|
||||
<Note>
|
||||
On Linux using the standard installer, the `ollama` user needs read and write access to the specified directory. To assign the directory to the `ollama` user run `sudo chown -R ollama:ollama <directory>`.
|
||||
</Note>
|
||||
|
||||
Refer to the section [above](#how-do-i-configure-ollama-server) for how to set environment variables on your platform.
|
||||
|
||||
## How can I use Ollama in Visual Studio Code?
|
||||
|
||||
There is already a large collection of plugins available for VS Code as well as other editors that leverage Ollama. See the list of [extensions & plugins](https://github.com/ollama/ollama#extensions--plugins) at the bottom of the main repository readme.
|
||||
|
||||
## How do I use Ollama with GPU acceleration in Docker?
|
||||
|
||||
The Ollama Docker container can be configured with GPU acceleration in Linux or Windows (with WSL2). This requires the [nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit). See [ollama/ollama](https://hub.docker.com/r/ollama/ollama) for more details.
|
||||
|
||||
GPU acceleration is not available for Docker Desktop in macOS due to the lack of GPU passthrough and emulation.
|
||||
|
||||
## Why is networking slow in WSL2 on Windows 10?
|
||||
|
||||
This can affect both installing Ollama and downloading models.
|
||||
|
||||
Open `Control Panel > Networking and Internet > View network status and tasks` and click on `Change adapter settings` on the left panel. Find the `vEthernet (WSL)` adapter, right click and select `Properties`.
|
||||
Click on `Configure` and open the `Advanced` tab. Search through each of the properties until you find `Large Send Offload Version 2 (IPv4)` and `Large Send Offload Version 2 (IPv6)`. _Disable_ both of these
|
||||
properties.
|
||||
|
||||
## How can I preload a model into Ollama to get faster response times?
|
||||
|
||||
If you are using the API you can preload a model by sending the Ollama server an empty request. This works with both the `/api/generate` and `/api/chat` API endpoints.
|
||||
|
||||
To preload the mistral model using the generate endpoint, use:
|
||||
|
||||
```shell
|
||||
curl http://localhost:11434/api/generate -d '{"model": "mistral"}'
|
||||
```
|
||||
|
||||
To use the chat completions endpoint, use:
|
||||
|
||||
```shell
|
||||
curl http://localhost:11434/api/chat -d '{"model": "mistral"}'
|
||||
```
|
||||
|
||||
To preload a model using the CLI, use the command:
|
||||
|
||||
```shell
|
||||
ollama run llama3.2 ""
|
||||
```
|
||||
|
||||
## How do I keep a model loaded in memory or make it unload immediately?
|
||||
|
||||
By default models are kept in memory for 5 minutes before being unloaded. This allows for quicker response times if you're making numerous requests to the LLM. If you want to immediately unload a model from memory, use the `ollama stop` command:
|
||||
|
||||
```shell
|
||||
ollama stop llama3.2
|
||||
```
|
||||
|
||||
If you're using the API, use the `keep_alive` parameter with the `/api/generate` and `/api/chat` endpoints to set the amount of time that a model stays in memory. The `keep_alive` parameter can be set to:
|
||||
|
||||
- a duration string (such as "10m" or "24h")
|
||||
- a number in seconds (such as 3600)
|
||||
- any negative number which will keep the model loaded in memory (e.g. -1 or "-1m")
|
||||
- '0' which will unload the model immediately after generating a response
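The rules in this list can be sketched as a small classifier. This is an illustration, not Ollama's own parser: `describe_keep_alive` is a hypothetical helper, it only handles single-unit duration strings (`s`, `m`, `h`), and it assumes a bare numeric string is a number of seconds.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def describe_keep_alive(value):
    """Classify a keep_alive value as 'forever', 'unload', or a number of seconds."""
    if isinstance(value, str):
        match = re.fullmatch(r"(-?\d+)([smh]?)", value.strip())
        if not match:
            raise ValueError(f"unsupported keep_alive value: {value!r}")
        number, unit = match.groups()
        # Assumption: a string with no unit suffix is taken as seconds.
        seconds = int(number) * _UNIT_SECONDS.get(unit, 1)
    else:
        seconds = value
    if seconds < 0:
        return "forever"  # any negative value keeps the model loaded
    if seconds == 0:
        return "unload"  # unload immediately after the response
    return seconds


for example in ("10m", "24h", 3600, -1, "-1m", 0):
    print(repr(example), "->", describe_keep_alive(example))
```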
|
||||
|
||||
For example, to preload a model and leave it in memory use:
|
||||
|
||||
```shell
|
||||
curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "keep_alive": -1}'
|
||||
```
|
||||
|
||||
To unload the model and free up memory use:
|
||||
|
||||
```shell
|
||||
curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "keep_alive": 0}'
|
||||
```
|
||||
|
||||
Alternatively, you can change the amount of time all models are loaded into memory by setting the `OLLAMA_KEEP_ALIVE` environment variable when starting the Ollama server. The `OLLAMA_KEEP_ALIVE` variable accepts the same value types as the `keep_alive` parameter described above. Refer to the section explaining [how to configure the Ollama server](#how-do-i-configure-ollama-server) to correctly set the environment variable.
|
||||
|
||||
The `keep_alive` API parameter with the `/api/generate` and `/api/chat` API endpoints will override the `OLLAMA_KEEP_ALIVE` setting.
|
||||
|
||||
## How do I manage the maximum number of requests the Ollama server can queue?
|
||||
|
||||
If too many requests are sent to the server, it will respond with a 503 error indicating the server is overloaded. You can adjust how many requests may be queued by setting `OLLAMA_MAX_QUEUE`.
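On the client side, one way to cope with a 503 is to retry with a growing delay. The sketch below is an assumption about reasonable client behavior, not something Ollama prescribes; `send_with_retry` is a hypothetical helper, and `send` stands for any callable that performs the request and returns the HTTP status code.

```python
import time


def send_with_retry(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a status other than 503 or attempts run out."""
    for attempt in range(max_attempts):
        status = send()
        if status != 503:
            return status
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return 503  # still overloaded after the final attempt
```

The `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting.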
|
||||
|
||||
## How does Ollama handle concurrent requests?
|
||||
|
||||
Ollama supports two levels of concurrent processing. If your system has sufficient available memory (system memory when using CPU inference, or VRAM for GPU inference) then multiple models can be loaded at the same time. For a given model, if there is sufficient available memory when the model is loaded, it is configured to allow parallel request processing.
|
||||
|
||||
If there is insufficient available memory to load a new model request while one or more models are already loaded, all new requests will be queued until the new model can be loaded. As prior models become idle, one or more will be unloaded to make room for the new model. Queued requests will be processed in order. When using GPU inference new models must be able to completely fit in VRAM to allow concurrent model loads.
|
||||
|
||||
Parallel request processing for a given model results in increasing the context size by the number of parallel requests. For example, a 2K context with 4 parallel requests will result in an 8K context and additional memory allocation.
|
||||
|
||||
The following server settings may be used to adjust how Ollama handles concurrent requests on most platforms:
|
||||
|
||||
- `OLLAMA_MAX_LOADED_MODELS` - The maximum number of models that can be loaded concurrently provided they fit in available memory. The default is 3 \* the number of GPUs or 3 for CPU inference.
|
||||
- `OLLAMA_NUM_PARALLEL` - The maximum number of parallel requests each model will process at the same time, default 1. Required RAM will scale by `OLLAMA_NUM_PARALLEL` * `OLLAMA_CONTEXT_LENGTH`.
|
||||
- `OLLAMA_MAX_QUEUE` - The maximum number of requests Ollama will queue when busy before rejecting additional requests. The default is 512.
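The context scaling described above (context size multiplied by the number of parallel requests) can be worked through with a short sketch. `total_context_tokens` is a hypothetical helper; the fallback values are the defaults stated in this FAQ.

```python
def total_context_tokens(env):
    """Context tokens allocated for one loaded model, given server settings in env."""
    context_length = int(env.get("OLLAMA_CONTEXT_LENGTH", 4096))  # default from this FAQ
    num_parallel = int(env.get("OLLAMA_NUM_PARALLEL", 1))  # default from this FAQ
    return context_length * num_parallel


# A 2K context with 4 parallel requests results in an 8K context.
print(total_context_tokens({"OLLAMA_CONTEXT_LENGTH": "2048", "OLLAMA_NUM_PARALLEL": "4"}))
```

In a real script, pass `os.environ` instead of a literal dictionary.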
|
||||
|
||||
Note: Windows with Radeon GPUs currently default to 1 model maximum due to limitations in ROCm v5.7 for available VRAM reporting. Once ROCm v6.2 is available, Windows Radeon will follow the defaults above. You may enable concurrent model loads on Radeon on Windows, but ensure you don't load more models than will fit into your GPU's VRAM.
|
||||
|
||||
## How does Ollama load models on multiple GPUs?
|
||||
|
||||
When loading a new model, Ollama evaluates the required VRAM for the model against what is currently available. If the model will entirely fit on any single GPU, Ollama will load the model on that GPU. This typically provides the best performance as it reduces the amount of data transferring across the PCI bus during inference. If the model does not fit entirely on one GPU, then it will be spread across all the available GPUs.
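The placement rule can be pictured with a simplified sketch. It is not Ollama's scheduler: `place_model` is hypothetical, it assumes the first GPU with enough free VRAM is chosen (the text above only says "any single GPU"), and it does not check whether the model fits across all GPUs combined.

```python
def place_model(required_vram, free_vram_per_gpu):
    """Return the indices of the GPUs a model would be loaded on."""
    for index, free in enumerate(free_vram_per_gpu):
        if free >= required_vram:
            return [index]  # the model fits entirely on a single GPU
    # Otherwise the model is spread across all available GPUs.
    return list(range(len(free_vram_per_gpu)))


print(place_model(10, [8, 12]))  # fits on the second GPU
print(place_model(20, [8, 12]))  # spread across both GPUs
```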
|
||||
|
||||
## How can I enable Flash Attention?
|
||||
|
||||
Flash Attention is a feature of most modern models that can significantly reduce memory usage as the context size grows. To enable Flash Attention, set the `OLLAMA_FLASH_ATTENTION` environment variable to `1` when starting the Ollama server.
|
||||
|
||||
## How can I set the quantization type for the K/V cache?
|
||||
|
||||
The K/V context cache can be quantized to significantly reduce memory usage when Flash Attention is enabled.
|
||||
|
||||
To use quantized K/V cache with Ollama you can set the following environment variable:
|
||||
|
||||
- `OLLAMA_KV_CACHE_TYPE` - The quantization type for the K/V cache. Default is `f16`.
|
||||
|
||||
<Note>
|
||||
Currently this is a global option - meaning all models will run with the
|
||||
specified quantization type.
|
||||
</Note>
|
||||
|
||||
The currently available K/V cache quantization types are:
|
||||
|
||||
- `f16` - high precision and memory usage (default).
|
||||
- `q8_0` - 8-bit quantization, uses approximately 1/2 the memory of `f16` with a very small loss in precision. This usually has no noticeable impact on the model's quality (recommended if not using f16).
|
||||
- `q4_0` - 4-bit quantization, uses approximately 1/4 the memory of `f16` with a small-medium loss in precision that may be more noticeable at higher context sizes.
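To get a rough feel for the savings, the approximate ratios from this list can be applied to a known `f16` cache size. This is a back-of-the-envelope sketch, not a measurement; `estimate_kv_cache_bytes` is a hypothetical helper and real usage will differ somewhat.

```python
# Approximate cache size relative to f16, taken from the list above.
KV_CACHE_RELATIVE_SIZE = {"f16": 1.0, "q8_0": 0.5, "q4_0": 0.25}


def estimate_kv_cache_bytes(f16_bytes, cache_type):
    """Estimate the K/V cache size for a quantization type from its f16 size."""
    return int(f16_bytes * KV_CACHE_RELATIVE_SIZE[cache_type])


gib = 1024 ** 3
for cache_type in KV_CACHE_RELATIVE_SIZE:
    print(cache_type, estimate_kv_cache_bytes(8 * gib, cache_type) / gib, "GiB")
```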
|
||||
|
||||
How much the cache quantization impacts the model's response quality will depend on the model and the task. Models that have a high GQA count (e.g. Qwen2) may see a larger impact on precision from quantization than models with a low GQA count.
|
||||
|
||||
You may need to experiment with different quantization types to find the best balance between memory usage and quality.
|
||||
|
||||
## Where can I find my Ollama Public Key?
|
||||
|
||||
Your **Ollama Public Key** is the public part of the key pair that lets your local Ollama instance talk to [ollama.com](https://ollama.com).
|
||||
|
||||
You'll need it to:
|
||||
* Push models to Ollama
|
||||
* Pull private models from Ollama to your machine
|
||||
* Run models hosted in [Ollama Cloud](https://ollama.com/cloud)
|
||||
|
||||
### How to Add the Key
|
||||
|
||||
* **Sign-in via the Settings page** in the **Mac** and **Windows App**
|
||||
|
||||
* **Sign‑in via CLI**
|
||||
|
||||
```shell
|
||||
ollama signin
|
||||
```
|
||||
|
||||
* **Manually copy & paste** the key on the **Ollama Keys** page:
|
||||
[https://ollama.com/settings/keys](https://ollama.com/settings/keys)
|
||||
|
||||
### Where the Ollama Public Key lives
|
||||
|
||||
| OS | Path to `id_ed25519.pub` |
|
||||
| :- | :- |
|
||||
| macOS | `~/.ollama/id_ed25519.pub` |
|
||||
| Linux | `/usr/share/ollama/.ollama/id_ed25519.pub` |
|
||||
| Windows | `C:\Users\<username>\.ollama\id_ed25519.pub` |
|
||||
|
||||
<Note>
|
||||
Replace `<username>` with your actual Windows user name.
|
||||
</Note>
|
||||
|
||||
## How can I stop Ollama from starting when I login to my computer?
|
||||
|
||||
Ollama for Windows and macOS registers as a login item during installation. You can disable this if you prefer not to have Ollama automatically start. Ollama will respect this setting across upgrades, unless you uninstall the application.
|
||||
|
||||
**Windows**
|
||||
- In `Task Manager` go to the `Startup apps` tab, search for `ollama` then click `Disable`
|
||||
|
||||
**macOS**
|
||||
- Open `Settings` and search for "Login Items", find the `Ollama` entry under `Allow in the Background`, then click the slider to disable.
|
||||
3
docs/favicon-dark.svg
Normal file
|
After Width: | Height: | Size: 6.7 KiB |
3
docs/favicon.svg
Normal file
|
After Width: | Height: | Size: 6.5 KiB |
176
docs/gpu.mdx
Normal file
@@ -0,0 +1,176 @@
|
||||
---
|
||||
title: Hardware support
|
||||
---
|
||||
|
||||
## Nvidia
|
||||
Ollama supports Nvidia GPUs with compute capability 5.0+ and driver version 531 and newer.
|
||||
|
||||
Check your compute capability to see if your card is supported:
|
||||
[https://developer.nvidia.com/cuda-gpus](https://developer.nvidia.com/cuda-gpus)
|
||||
|
||||
| Compute Capability | Family | Cards |
|
||||
| ------------------ | ------------------- | ------------------------------------------------------------------------------------------------------------------------------ |
|
||||
| 12.1 | NVIDIA | `GB10 (DGX Spark)` |
|
||||
| 12.0 | GeForce RTX 50xx | `RTX 5060` `RTX 5060 Ti` `RTX 5070` `RTX 5070 Ti` `RTX 5080` `RTX 5090` |
|
||||
| | NVIDIA Professional | `RTX PRO 4000 Blackwell` `RTX PRO 4500 Blackwell` `RTX PRO 5000 Blackwell` `RTX PRO 6000 Blackwell` |
|
||||
| 9.0 | NVIDIA | `H200` `H100` |
|
||||
| 8.9 | GeForce RTX 40xx | `RTX 4090` `RTX 4080 SUPER` `RTX 4080` `RTX 4070 Ti SUPER` `RTX 4070 Ti` `RTX 4070 SUPER` `RTX 4070` `RTX 4060 Ti` `RTX 4060` |
|
||||
| | NVIDIA Professional | `L4` `L40` `RTX 6000` |
|
||||
| 8.6 | GeForce RTX 30xx | `RTX 3090 Ti` `RTX 3090` `RTX 3080 Ti` `RTX 3080` `RTX 3070 Ti` `RTX 3070` `RTX 3060 Ti` `RTX 3060` `RTX 3050 Ti` `RTX 3050` |
|
||||
| | NVIDIA Professional | `A40` `RTX A6000` `RTX A5000` `RTX A4000` `RTX A3000` `RTX A2000` `A10` `A16` `A2` |
|
||||
| 8.0 | NVIDIA | `A100` `A30` |
|
||||
| 7.5 | GeForce GTX/RTX | `GTX 1650 Ti` `TITAN RTX` `RTX 2080 Ti` `RTX 2080` `RTX 2070` `RTX 2060` |
|
||||
| | NVIDIA Professional | `T4` `RTX 5000` `RTX 4000` `RTX 3000` `T2000` `T1200` `T1000` `T600` `T500` |
|
||||
| | Quadro | `RTX 8000` `RTX 6000` `RTX 5000` `RTX 4000` |
|
||||
| 7.0 | NVIDIA | `TITAN V` `V100` `Quadro GV100` |
|
||||
| 6.1 | NVIDIA TITAN | `TITAN Xp` `TITAN X` |
|
||||
| | GeForce GTX | `GTX 1080 Ti` `GTX 1080` `GTX 1070 Ti` `GTX 1070` `GTX 1060` `GTX 1050 Ti` `GTX 1050` |
|
||||
| | Quadro | `P6000` `P5200` `P4200` `P3200` `P5000` `P4000` `P3000` `P2200` `P2000` `P1000` `P620` `P600` `P500` `P520` |
|
||||
| | Tesla | `P40` `P4` |
|
||||
| 6.0 | NVIDIA | `Tesla P100` `Quadro GP100` |
|
||||
| 5.2 | GeForce GTX | `GTX TITAN X` `GTX 980 Ti` `GTX 980` `GTX 970` `GTX 960` `GTX 950` |
|
||||
| | Quadro | `M6000 24GB` `M6000` `M5000` `M5500M` `M4000` `M2200` `M2000` `M620` |
|
||||
| | Tesla | `M60` `M40` |
|
||||
| 5.0 | GeForce GTX | `GTX 750 Ti` `GTX 750` `NVS 810` |
|
||||
| | Quadro | `K2200` `K1200` `K620` `M1200` `M520` `M5000M` `M4000M` `M3000M` `M2000M` `M1000M` `K620M` `M600M` `M500M` |
|
||||
|
||||
To build locally with support for older GPUs, see the [development guide](./development#linux-cuda-nvidia).
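The two minimums stated at the top of this section (compute capability 5.0 and driver 531) can be checked mechanically. The following is a minimal Python sketch, not part of Ollama: `parse_version` and `meets_nvidia_requirements` are hypothetical helper names, and the inputs are assumed to be the compute capability from the table above and the driver version that `nvidia-smi` reports.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as "8.6" or "535.129.03" into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_nvidia_requirements(compute_capability: str, driver_version: str) -> bool:
    """Return True when both documented minimums are met.

    The minimums (compute capability 5.0, driver 531) come from the text above.
    Only the leading component of the driver version is compared.
    """
    capability_ok = parse_version(compute_capability) >= (5, 0)
    driver_ok = parse_version(driver_version)[0] >= 531
    return capability_ok and driver_ok
```

Versions are compared as integer tuples because a plain string comparison would rank `12.0` below `5.0`.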
|
||||
|
||||
### GPU Selection
|
||||
|
||||
If you have multiple NVIDIA GPUs in your system and want to limit Ollama to use
|
||||
a subset, you can set `CUDA_VISIBLE_DEVICES` to a comma-separated list of GPUs.
|
||||
Numeric IDs may be used; however, ordering may vary, so UUIDs are more reliable.
|
||||
You can discover the UUID of your GPUs by running `nvidia-smi -L`. If you want to
|
||||
ignore the GPUs and force CPU usage, use an invalid GPU ID (e.g., "-1").
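As a rough illustration of the selection described above, the Python sketch below pulls the UUIDs out of `nvidia-smi -L` output and builds a `CUDA_VISIBLE_DEVICES` value from them. The helper names and the sample listing (including its UUIDs) are made up for the example; only the `(UUID: GPU-...)` suffix that `nvidia-smi -L` prints per GPU is relied on.

```python
from __future__ import annotations

import re

# `nvidia-smi -L` prints one line per GPU, ending in "(UUID: GPU-...)".
UUID_PATTERN = re.compile(r"\(UUID:\s*(GPU-[0-9a-fA-F-]+)\)")


def gpu_uuids(listing: str) -> list[str]:
    """Extract GPU UUIDs, in listing order, from `nvidia-smi -L` output."""
    return UUID_PATTERN.findall(listing)


def cuda_visible_devices(uuids: list[str], force_cpu: bool = False) -> str:
    """Build the value for CUDA_VISIBLE_DEVICES.

    An invalid ID ("-1") hides every GPU, which forces CPU usage.
    """
    return "-1" if force_cpu else ",".join(uuids)


# Made-up sample output; real names and UUIDs will differ.
SAMPLE_LISTING = """\
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-11111111-2222-3333-4444-555555555555)
GPU 1: NVIDIA GeForce RTX 3060 (UUID: GPU-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee)
"""
```

The resulting string would then be exported as `CUDA_VISIBLE_DEVICES` in the environment of the Ollama server before it starts.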
|
||||
|
||||
### Linux Suspend Resume
|
||||
|
||||
On Linux, after a suspend/resume cycle, Ollama will sometimes fail to discover
|
||||
your NVIDIA GPU and fall back to running on the CPU. You can work around this
|
||||
driver bug by reloading the NVIDIA UVM driver with `sudo rmmod nvidia_uvm &&
|
||||
sudo modprobe nvidia_uvm`.
|
||||
|
||||
## AMD Radeon
|
||||
|
||||
Ollama supports the following AMD GPUs via the ROCm library:
|
||||
|
||||
> **NOTE:**
|
||||
> Additional AMD GPU support is provided by the Vulkan library; see below.
|
||||
|
||||
|
||||
### Linux Support
|
||||
|
||||
Ollama requires the AMD ROCm v7 driver on Linux. You can install or upgrade
|
||||
using the `amdgpu-install` utility from
|
||||
[AMD's ROCm documentation](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/).
|
||||
|
||||
| Family | Cards and accelerators |
|
||||
| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| AMD Radeon RX | `9070 XT` `9070 GRE` `9070` `9060 XT` `9060 XT LP` `9060` `7900 XTX` `7900 XT` `7900 GRE` `7800 XT` `7700 XT` `7700` `7600 XT` `7600` `6950 XT` `6900 XTX` `6900XT` `6800 XT` `6800` `5700 XT` `5700` `5600 XT` `5500 XT` |
|
||||
| AMD Radeon AI PRO | `R9700` `R9600D` |
|
||||
| AMD Radeon PRO | `W7900` `W7800` `W7700` `W7600` `W7500` `W6900X` `W6800X Duo` `W6800X` `W6800` `V620` |
|
||||
| AMD Ryzen AI | `Ryzen AI Max+ 395` `Ryzen AI Max 390` `Ryzen AI Max 385` `Ryzen AI 9 HX 475` `Ryzen AI 9 HX 470` `Ryzen AI 9 465` `Ryzen AI 9 HX 375` `Ryzen AI 9 HX 370` `Ryzen AI 9 365` |
|
||||
| AMD Instinct | `MI350X` `MI300X` `MI300A` `MI250X` `MI250` `MI210` `MI100` |
|
||||
|
||||
### Windows Support
|
||||
|
||||
With ROCm v6.1, the following GPUs are supported on Windows.
|
||||
|
||||
| Family | Cards and accelerators |
|
||||
| -------------- | -------------------------------------------------------------------------------------------------------------------- |
|
||||
| AMD Radeon RX | `7900 XTX` `7900 XT` `7900 GRE` `7800 XT` `7700 XT` `7600 XT` `7600` `6950 XT` `6900 XTX` `6900XT` `6800 XT` `6800` |
|
||||
| AMD Radeon PRO | `W7900` `W7800` `W7700` `W7600` `W7500` `W6900X` `W6800X Duo` `W6800X` `W6800` `V620` |
|
||||
|
||||
### Overrides on Linux
|
||||
|
||||
Ollama leverages the AMD ROCm library, which does not support all AMD GPUs. In
|
||||
some cases you can force the system to try to use a similar LLVM target that is
|
||||
close. For example, the Radeon RX 5400 is `gfx1034` (also known as 10.3.4);
|
||||
however, ROCm does not currently support this target. The closest supported target is
|
||||
`gfx1030`. You can use the environment variable `HSA_OVERRIDE_GFX_VERSION` with
|
||||
`x.y.z` syntax. So for example, to force the system to run on the RX 5400, you
|
||||
would set `HSA_OVERRIDE_GFX_VERSION="10.3.0"` as an environment variable for the
|
||||
server. If you have an unsupported AMD GPU you can experiment using the list of
|
||||
supported types below.
|
||||
|
||||
If you have multiple GPUs with different GFX versions, append the numeric device
|
||||
number to the environment variable to set them individually. For example,
|
||||
`HSA_OVERRIDE_GFX_VERSION_0=10.3.0` and `HSA_OVERRIDE_GFX_VERSION_1=11.0.0`
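The mapping from an LLVM target name to the `x.y.z` override string can be sketched in Python as follows. This is an illustration under an assumption inferred from the two examples above (`gfx1034` is 10.3.4, `gfx1030` is 10.3.0): the last two digits are taken as minor and patch, and the remaining digits as major. The helper names are hypothetical, and targets whose suffix is not purely decimal (such as `gfx90a`) are deliberately rejected rather than guessed at.

```python
from __future__ import annotations

import re


def gfx_to_override(target: str) -> str:
    """Convert a target like "gfx1030" into the "10.3.0" form.

    Assumption: the last two decimal digits are minor and patch,
    everything before them is the major version.
    """
    match = re.fullmatch(r"gfx(\d+)(\d)(\d)", target)
    if match is None:
        raise ValueError(f"cannot derive an x.y.z version from {target!r}")
    major, minor, patch = match.groups()
    return f"{int(major)}.{minor}.{patch}"


def override_env(targets_by_device: dict[int, str]) -> dict[str, str]:
    """Build per-device HSA_OVERRIDE_GFX_VERSION_<n> variables."""
    return {
        f"HSA_OVERRIDE_GFX_VERSION_{device}": gfx_to_override(target)
        for device, target in targets_by_device.items()
    }
```

For a single GPU, the unsuffixed `HSA_OVERRIDE_GFX_VERSION` variable would be set to the output of `gfx_to_override` instead.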
|
||||
|
||||
At this time, the known supported GPU types on Linux are the following LLVM targets.
|
||||
This table shows some example GPUs that map to these LLVM targets:
|
||||
| **LLVM Target** | **An Example GPU** |
|
||||
|-----------------|---------------------|
|
||||
| gfx908 | Radeon Instinct MI100 |
|
||||
| gfx90a | Radeon Instinct MI210/MI250 |
|
||||
| gfx942 | Radeon Instinct MI300X/MI300A |
|
||||
| gfx950 | Radeon Instinct MI350X |
|
||||
| gfx1010 | Radeon RX 5700 XT |
|
||||
| gfx1012 | Radeon RX 5500 XT |
|
||||
| gfx1030 | Radeon PRO V620 |
|
||||
| gfx1100 | Radeon PRO W7900 |
|
||||
| gfx1101 | Radeon PRO W7700 |
|
||||
| gfx1102 | Radeon RX 7600 |
|
||||
| gfx1103 | Radeon 780M |
|
||||
| gfx1150 | Ryzen AI 9 HX 375 |
|
||||
| gfx1151 | Ryzen AI Max+ 395 |
|
||||
| gfx1200 | Radeon RX 9070 |
|
||||
| gfx1201 | Radeon RX 9070 XT |
|
||||
|
||||
Reach out on [Discord](https://discord.gg/ollama) or file an
|
||||
[issue](https://github.com/ollama/ollama/issues) for additional help.
|
||||
|
||||
### GPU Selection
|
||||
|
||||
If you have multiple AMD GPUs in your system and want to limit Ollama to use a
|
||||
subset, you can set `ROCR_VISIBLE_DEVICES` to a comma-separated list of GPUs.
|
||||
You can see the list of devices with `rocminfo`. If you want to ignore the GPUs
|
||||
and force CPU usage, use an invalid GPU ID (e.g., "-1"). When available, use the
|
||||
`Uuid` to uniquely identify the device instead of a numeric value.
|
||||
|
||||
### Container Permission
|
||||
|
||||
In some Linux distributions, SELinux can prevent containers from
|
||||
accessing the AMD GPU devices. On the host system you can run
|
||||
`sudo setsebool container_use_devices=1` to allow containers to use devices.
|
||||
|
||||
## Metal (Apple GPUs)
|
||||
|
||||
Ollama supports GPU acceleration on Apple devices via the Metal API.
|
||||
|
||||
|
||||
## Vulkan GPU Support
|
||||
|
||||
> **NOTE:**
|
||||
> Vulkan is currently an experimental feature. To enable it, you must set `OLLAMA_VULKAN=1` for the Ollama server as
|
||||
> described in the [FAQ](faq#how-do-i-configure-ollama-server).
|
||||
|
||||
Additional GPU support on Windows and Linux is provided via
|
||||
[Vulkan](https://www.vulkan.org/). On Windows, most GPU vendors' drivers come
|
||||
bundled with Vulkan support and require no additional setup steps. Most Linux
|
||||
distributions require installing additional components, and you may have
|
||||
multiple options for Vulkan drivers between Mesa and GPU vendor-specific packages.
|
||||
|
||||
- Linux Intel GPU Instructions - https://dgpu-docs.intel.com/driver/client/overview.html
|
||||
- Linux AMD GPU Instructions - https://amdgpu-install.readthedocs.io/en/latest/install-script.html#specifying-a-vulkan-implementation
|
||||
|
||||
For AMD GPUs on some Linux distributions, you may need to add the `ollama` user to the `render` group.
|
||||
|
||||
The Ollama scheduler leverages available VRAM data reported by the GPU libraries to
|
||||
make optimal scheduling decisions. Vulkan requires additional capabilities or
|
||||
running as root to expose this available VRAM data. If neither root access nor this
|
||||
capability is granted, Ollama will use approximate sizes of the models
|
||||
to make best-effort scheduling decisions.
|
||||
|
||||
```bash
|
||||
sudo setcap cap_perfmon+ep /usr/local/bin/ollama
|
||||
```
|
||||
|
||||
### GPU Selection
|
||||
|
||||
To select specific Vulkan GPU(s), you can set the environment variable
|
||||
`GGML_VK_VISIBLE_DEVICES` to one or more numeric IDs on the Ollama server as
|
||||
described in the [FAQ](faq#how-do-i-configure-ollama-server). If you
|
||||
encounter any problems with Vulkan based GPUs, you can disable all Vulkan GPUs
|
||||
by setting `GGML_VK_VISIBLE_DEVICES=-1`.
|
||||
BIN
docs/images/claude-code-kimi-k2-6.png
Normal file
|
After Width: | Height: | Size: 307 KiB |
BIN
docs/images/claude-cowork-kimi-k2-6.png
Normal file
|
After Width: | Height: | Size: 297 KiB |
BIN
docs/images/cline-mcp.png
Normal file
|
After Width: | Height: | Size: 556 KiB |
BIN
docs/images/cline-settings.png
Normal file
|
After Width: | Height: | Size: 76 KiB |
BIN
docs/images/codex-app-annotate.png
Normal file
|
After Width: | Height: | Size: 491 KiB |
BIN
docs/images/codex-app-home.png
Normal file
|
After Width: | Height: | Size: 610 KiB |
BIN
docs/images/codex-app-review.png
Normal file
|
After Width: | Height: | Size: 593 KiB |
BIN
docs/images/codex-mcp.png
Normal file
|
After Width: | Height: | Size: 948 KiB |
BIN
docs/images/favicon.png
Normal file
|
After Width: | Height: | Size: 890 B |
BIN
docs/images/goose-cli.png
Normal file
|
After Width: | Height: | Size: 160 KiB |
BIN
docs/images/goose-mcp-1.png
Normal file
|
After Width: | Height: | Size: 877 KiB |
BIN
docs/images/goose-mcp-2.png
Normal file
|
After Width: | Height: | Size: 911 KiB |
BIN
docs/images/goose-settings.png
Normal file
|
After Width: | Height: | Size: 109 KiB |
BIN
docs/images/hermes.png
Normal file
|
After Width: | Height: | Size: 1.4 MiB |
BIN
docs/images/intellij-chat-sidebar.png
Normal file
|
After Width: | Height: | Size: 69 KiB |
BIN
docs/images/intellij-current-model.png
Normal file
|
After Width: | Height: | Size: 106 KiB |
BIN
docs/images/intellij-local-models.png
Normal file
|
After Width: | Height: | Size: 79 KiB |
BIN
docs/images/local.png
Normal file
|
After Width: | Height: | Size: 29 KiB |
BIN
docs/images/logo-dark.png
Normal file
|
After Width: | Height: | Size: 3.3 KiB |
BIN
docs/images/logo.png
Normal file
|
After Width: | Height: | Size: 2.7 KiB |
BIN
docs/images/marimo-add-model.png
Normal file
|
After Width: | Height: | Size: 174 KiB |
BIN
docs/images/marimo-chat.png
Normal file
|
After Width: | Height: | Size: 80 KiB |
BIN
docs/images/marimo-code-completion.png
Normal file
|
After Width: | Height: | Size: 230 KiB |
BIN
docs/images/marimo-models.png
Normal file
|
After Width: | Height: | Size: 178 KiB |
BIN
docs/images/marimo-settings.png
Normal file
|
After Width: | Height: | Size: 186 KiB |
BIN
docs/images/n8n-chat-model.png
Normal file
|
After Width: | Height: | Size: 87 KiB |
BIN
docs/images/n8n-chat-node.png
Normal file
|
After Width: | Height: | Size: 70 KiB |
BIN
docs/images/n8n-credential-creation.png
Normal file
|
After Width: | Height: | Size: 43 KiB |
BIN
docs/images/n8n-models.png
Normal file
|
After Width: | Height: | Size: 130 KiB |
BIN
docs/images/n8n-ollama-form.png
Normal file
|
After Width: | Height: | Size: 53 KiB |
BIN
docs/images/ollama-keys.png
Normal file
|
After Width: | Height: | Size: 150 KiB |
BIN
docs/images/ollama-settings.png
Normal file
|
After Width: | Height: | Size: 3.6 MiB |
BIN
docs/images/onyx-login.png
Normal file
|
After Width: | Height: | Size: 100 KiB |
BIN
docs/images/onyx-ollama-form.png
Normal file
|
After Width: | Height: | Size: 306 KiB |
BIN
docs/images/onyx-ollama-llm.png
Normal file
|
After Width: | Height: | Size: 300 KiB |
BIN
docs/images/onyx-query.png
Normal file
|
After Width: | Height: | Size: 211 KiB |
BIN
docs/images/signup.png
Normal file
|
After Width: | Height: | Size: 80 KiB |
BIN
docs/images/vscode-add-ollama.png
Normal file
|
After Width: | Height: | Size: 64 KiB |
BIN
docs/images/vscode-other-models.png
Normal file
|
After Width: | Height: | Size: 52 KiB |
BIN
docs/images/vscode-sidebar.png
Normal file
|
After Width: | Height: | Size: 25 KiB |
BIN
docs/images/vscode-unhide.png
Normal file
|
After Width: | Height: | Size: 67 KiB |
BIN
docs/images/vscode.png
Normal file
|
After Width: | Height: | Size: 2.7 MiB |
BIN
docs/images/welcome.png
Normal file
|
After Width: | Height: | Size: 233 KiB |
BIN
docs/images/xcode-chat-icon.png
Normal file
|
After Width: | Height: | Size: 186 KiB |
BIN
docs/images/xcode-intelligence-window.png
Normal file
|
After Width: | Height: | Size: 182 KiB |
BIN
docs/images/xcode-locally-hosted.png
Normal file
|
After Width: | Height: | Size: 146 KiB |
BIN
docs/images/zed-ollama-dropdown.png
Normal file
|
After Width: | Height: | Size: 38 KiB |
BIN
docs/images/zed-settings.png
Normal file
|
After Width: | Height: | Size: 57 KiB |
174
docs/import.mdx
Normal file
@@ -0,0 +1,174 @@
|
||||
---
|
||||
title: Importing a Model
|
||||
---
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Importing a Safetensors adapter](#Importing-a-fine-tuned-adapter-from-Safetensors-weights)
|
||||
- [Importing a Safetensors model](#Importing-a-model-from-Safetensors-weights)
|
||||
- [Importing a GGUF file](#Importing-a-GGUF-based-model-or-adapter)
|
||||
- [Sharing models on ollama.com](#Sharing-your-model-on-ollamacom)
|
||||
|
||||
## Importing a fine tuned adapter from Safetensors weights
|
||||
|
||||
First, create a `Modelfile` with a `FROM` command pointing at the base model you used for fine tuning, and an `ADAPTER` command which points to the directory with your Safetensors adapter:
|
||||
|
||||
```dockerfile
|
||||
FROM <base model name>
|
||||
ADAPTER /path/to/safetensors/adapter/directory
|
||||
```
|
||||
|
||||
Make sure that you use the same base model in the `FROM` command as you used to create the adapter; otherwise you will get erratic results. Most frameworks use different quantization methods, so it's best to use non-quantized (i.e. non-QLoRA) adapters. If your adapter is in the same directory as your `Modelfile`, use `ADAPTER .` to specify the adapter path.
|
||||
|
||||
Now run `ollama create` from the directory where the `Modelfile` was created:
|
||||
|
||||
```shell
|
||||
ollama create my-model
|
||||
```
|
||||
|
||||
Lastly, test the model:
|
||||
|
||||
```shell
|
||||
ollama run my-model
|
||||
```
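The `Modelfile` from the first step is plain text, so it can also be generated from a script. The sketch below is a hypothetical helper, not an Ollama API: `render_modelfile` is an invented name, and `llama3.2` in the comment is only a placeholder base model.

```python
from __future__ import annotations


def render_modelfile(base: str, adapter: str | None = None) -> str:
    """Return Modelfile text with a FROM line and an optional ADAPTER line.

    `base` may be a model name or a path; `adapter` is the adapter
    directory or file, or "." when it sits next to the Modelfile.
    """
    lines = [f"FROM {base}"]
    if adapter is not None:
        lines.append(f"ADAPTER {adapter}")
    return "\n".join(lines) + "\n"


# Example (placeholder names): write the file, then run `ollama create my-model`
# from the same directory.
#   from pathlib import Path
#   Path("Modelfile").write_text(render_modelfile("llama3.2", "."), encoding="utf-8")
```

Leaving `adapter` unset yields the single-line `FROM` form, which matches the Modelfiles used when importing full Safetensors weights or a GGUF file.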
|
||||
|
||||
Ollama supports importing adapters based on several different model architectures including:
|
||||
|
||||
- Llama (including Llama 2, Llama 3, Llama 3.1, and Llama 3.2);
|
||||
- Mistral (including Mistral 1, Mistral 2, and Mixtral); and
|
||||
- Gemma (including Gemma 1 and Gemma 2)
|
||||
|
||||
You can create the adapter using a fine tuning framework or tool which can output adapters in the Safetensors format, such as:
|
||||
|
||||
- Hugging Face [fine tuning framework](https://huggingface.co/docs/transformers/en/training)
|
||||
- [Unsloth](https://github.com/unslothai/unsloth)
|
||||
- [MLX](https://github.com/ml-explore/mlx)
|
||||
|
||||
## Importing a model from Safetensors weights
|
||||
|
||||
First, create a `Modelfile` with a `FROM` command which points to the directory containing your Safetensors weights:
|
||||
|
||||
```dockerfile
|
||||
FROM /path/to/safetensors/directory
|
||||
```
|
||||
|
||||
If you create the Modelfile in the same directory as the weights, you can use the command `FROM .`.
|
||||
|
||||
Now run the `ollama create` command from the directory where you created the `Modelfile`:
|
||||
|
||||
```shell
|
||||
ollama create my-model
|
||||
```
|
||||
|
||||
Lastly, test the model:
|
||||
|
||||
```shell
|
||||
ollama run my-model
|
||||
```
|
||||
|
||||
Ollama supports importing models for several different architectures including:
|
||||
|
||||
- Llama (including Llama 2, Llama 3, Llama 3.1, and Llama 3.2);
|
||||
- Mistral (including Mistral 1, Mistral 2, and Mixtral);
|
||||
- Gemma (including Gemma 1 and Gemma 2); and
|
||||
- Phi3
|
||||
|
||||
This includes importing foundation models as well as any fine tuned models which have been _fused_ with a foundation model.
|
||||
|
||||
## Importing a GGUF based model or adapter
|
||||
|
||||
If you have a GGUF-based model or adapter, you can import it into Ollama. You can obtain a GGUF model or adapter by:
|
||||
|
||||
- converting a Safetensors model with the `convert_hf_to_gguf.py` script from llama.cpp;
|
||||
- converting a Safetensors adapter with the `convert_lora_to_gguf.py` script from llama.cpp; or
|
||||
- downloading a model or adapter from a place such as Hugging Face
|
||||
|
||||
To import a GGUF model, create a `Modelfile` containing:
|
||||
|
||||
```dockerfile
|
||||
FROM /path/to/file.gguf
|
||||
```
|
||||
|
||||
For a GGUF adapter, create the `Modelfile` with:
|
||||
|
||||
```dockerfile
|
||||
FROM <model name>
|
||||
ADAPTER /path/to/file.gguf
|
||||
```
|
||||
|
||||
When importing a GGUF adapter, it's important to use the same base model that the adapter was created with. You can use:
|
||||
|
||||
- a model from Ollama
|
||||
- a GGUF file
|
||||
- a Safetensors based model
|
||||
|
||||
Once you have created your `Modelfile`, use the `ollama create` command to build the model.
|
||||
|
||||
```shell
|
||||
ollama create my-model
|
||||
```
|
||||
|
||||
## Quantizing a Model
|
||||
|
||||
Quantizing a model allows you to run models faster and with less memory consumption but at reduced accuracy. This allows you to run a model on more modest hardware.
|
||||
|
||||
Ollama can quantize FP16 and FP32 based models into different quantization levels using the `-q/--quantize` flag with the `ollama create` command.
|
||||
|
||||
First, create a Modelfile with the FP16 or FP32 based model you wish to quantize.
|
||||
|
||||
```dockerfile
|
||||
FROM /path/to/my/gemma/f16/model
|
||||
```
|
||||
|
||||
Then use `ollama create` to create the quantized model.
|
||||
|
||||
```shell
|
||||
$ ollama create --quantize q4_K_M mymodel
|
||||
transferring model data
|
||||
quantizing F16 model to Q4_K_M
|
||||
creating new layer sha256:735e246cc1abfd06e9cdcf95504d6789a6cd1ad7577108a70d9902fef503c1bd
|
||||
creating new layer sha256:0853f0ad24e5865173bbf9ffcc7b0f5d56b66fd690ab1009867e45e7d2c4db0f
|
||||
writing manifest
|
||||
success
|
||||
```
|
||||
|
||||
### Supported Quantizations
|
||||
|
||||
- `q8_0`
|
||||
|
||||
#### K-means Quantizations
|
||||
|
||||
- `q4_K_S`
|
||||
- `q4_K_M`
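When the quantization step is scripted, it can help to reject an unsupported level before invoking the CLI. The following Python sketch assumes only the levels listed above and the `--quantize` flag shown earlier; `quantize_command` is a hypothetical helper name, and the check is case-sensitive by choice of this example.

```python
from __future__ import annotations

# Levels taken from the "Supported Quantizations" list above.
SUPPORTED_QUANTIZATIONS = ("q8_0", "q4_K_S", "q4_K_M")


def quantize_command(model_name: str, level: str) -> list[str]:
    """Build the `ollama create --quantize <level> <name>` argument list."""
    if level not in SUPPORTED_QUANTIZATIONS:
        raise ValueError(
            f"unsupported quantization {level!r}; "
            f"expected one of {', '.join(SUPPORTED_QUANTIZATIONS)}"
        )
    return ["ollama", "create", "--quantize", level, model_name]
```

The returned list can be passed to `subprocess.run(..., check=True)` from the directory that contains the `Modelfile`.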
|
||||
|
||||
## Sharing your model on ollama.com
|
||||
|
||||
You can share any model you have created by pushing it to [ollama.com](https://ollama.com) so that other users can try it out.
|
||||
|
||||
First, use your browser to go to the [Ollama Sign-Up](https://ollama.com/signup) page. If you already have an account, you can skip this step.
|
||||
|
||||
<img src="images/signup.png" alt="Sign-Up" width="40%" />
|
||||
|
||||
The `Username` field will be used as part of your model's name (e.g. `jmorganca/mymodel`), so make sure you are comfortable with the username that you have selected.
|
||||
|
||||
Now that you have created an account and are signed in, go to the [Ollama Keys Settings](https://ollama.com/settings/keys) page.
|
||||
|
||||
Follow the directions on the page to determine where your Ollama Public Key is located.
|
||||
|
||||
<img src="images/ollama-keys.png" alt="Ollama Keys" width="80%" />
|
||||
|
||||
Click on the `Add Ollama Public Key` button, and copy and paste the contents of your Ollama Public Key into the text field.
|
||||
|
||||
To push a model to [ollama.com](https://ollama.com), first make sure that it is named correctly with your username. You may have to use the `ollama cp` command to copy
|
||||
your model to give it the correct name. Once you're happy with your model's name, use the `ollama push` command to push it to [ollama.com](https://ollama.com).
|
||||
|
||||
```shell
|
||||
ollama cp mymodel myuser/mymodel
|
||||
ollama push myuser/mymodel
|
||||
```
|
||||
|
||||
Once your model has been pushed, other users can pull and run it by using the command:
|
||||
|
||||
```shell
|
||||
ollama run myuser/mymodel
|
||||
```
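The naming rule above (a model must carry your username before it can be pushed) can be sketched as a small Python helper. `publish_commands` is a hypothetical name, and the logic assumes that a model already prefixed with `<username>/` needs no `ollama cp` step.

```python
from __future__ import annotations


def publish_commands(username: str, model: str) -> list[list[str]]:
    """Return the `ollama cp` / `ollama push` argument lists for sharing a model.

    The copy step is skipped when the model is already named
    "<username>/<model>".
    """
    prefix = f"{username}/"
    target = model if model.startswith(prefix) else prefix + model
    commands: list[list[str]] = []
    if target != model:
        commands.append(["ollama", "cp", model, target])
    commands.append(["ollama", "push", target])
    return commands
```

Each inner list can be run in order with `subprocess.run(command, check=True)`.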
|
||||
58
docs/index.mdx
Normal file
@@ -0,0 +1,58 @@
|
||||
---
|
||||
title: Ollama's documentation
|
||||
sidebarTitle: Welcome
|
||||
---
|
||||
|
||||
<img src="/images/welcome.png" noZoom className="rounded-3xl" />
|
||||
|
||||
[Ollama](https://ollama.com) is the easiest way to get up and running with large language models such as gpt-oss, Gemma 3, DeepSeek-R1, Qwen3 and more.
|
||||
|
||||
<CardGroup cols={2}>
|
||||
<Card title="Quickstart" icon="rocket" href="/quickstart">
|
||||
Get up and running with your first model or integrate Ollama with your favorite tools
|
||||
</Card>
|
||||
<Card
|
||||
title="Download Ollama"
|
||||
icon="download"
|
||||
href="https://ollama.com/download"
|
||||
>
|
||||
Download Ollama on macOS, Windows or Linux
|
||||
</Card>
|
||||
<Card title="Cloud" icon="cloud" href="/cloud">
|
||||
Ollama's cloud offers larger models with better performance.
|
||||
</Card>
|
||||
<Card title="API reference" icon="terminal" href="/api">
|
||||
View Ollama's API reference
|
||||
</Card>
|
||||
</CardGroup>
|
||||
|
||||
## Libraries
|
||||
|
||||
<CardGroup cols={2}>
|
||||
<Card
|
||||
title="Ollama's Python Library"
|
||||
icon="python"
|
||||
href="https://github.com/ollama/ollama-python"
|
||||
>
|
||||
The official library for using Ollama with Python
|
||||
</Card>
|
||||
|
||||
<Card title="Ollama's JavaScript library" icon="js" href="https://github.com/ollama/ollama-js">
|
||||
The official library for using Ollama with JavaScript or TypeScript.
|
||||
</Card>
|
||||
<Card title="Community libraries" icon="github" href="https://github.com/ollama/ollama?tab=readme-ov-file#libraries-1">
|
||||
View a list of 20+ community-supported libraries for Ollama
|
||||
</Card>
|
||||
</CardGroup>
|
||||
|
||||
## Community
|
||||
|
||||
<CardGroup cols={2}>
|
||||
<Card title="Discord" icon="discord" href="https://discord.gg/ollama">
|
||||
Join our Discord community
|
||||
</Card>
|
||||
|
||||
<Card title="Reddit" icon="reddit" href="https://reddit.com/r/ollama">
|
||||
Join our Reddit community
|
||||
</Card>
|
||||
</CardGroup>
|
||||
136
docs/integrations/claude-code.mdx
Normal file
@@ -0,0 +1,136 @@
|
||||
---
|
||||
title: Claude Code
|
||||
---
|
||||
|
||||
Claude Code is Anthropic's agentic coding tool that can read, modify, and execute code in your working directory.
|
||||
|
||||
Open models can be used with Claude Code through Ollama's Anthropic-compatible API, enabling you to use models such as `qwen3.5`, `glm-5:cloud`, and `kimi-k2.5:cloud`.
|
||||
|
||||

|
||||
|
||||
## Install
|
||||
|
||||
Install [Claude Code](https://code.claude.com/docs/en/overview):
|
||||
|
||||
<CodeGroup>
|
||||
|
||||
```shell macOS / Linux
|
||||
curl -fsSL https://claude.ai/install.sh | bash
|
||||
```
|
||||
|
||||
```powershell Windows
|
||||
irm https://claude.ai/install.ps1 | iex
|
||||
```
|
||||
|
||||
</CodeGroup>
|
||||
|
||||
## Usage with Ollama
|
||||
|
||||
### Quick setup
|
||||
|
||||
```shell
|
||||
ollama launch claude
|
||||
```
|
||||
|
||||
### Run directly with a model
|
||||
```shell
|
||||
ollama launch claude --model kimi-k2.5:cloud
|
||||
```
|
||||
|
||||
## Recommended Models
|
||||
|
||||
- `kimi-k2.5:cloud`
|
||||
- `glm-5:cloud`
|
||||
- `minimax-m2.7:cloud`
|
||||
- `qwen3.5:cloud`
|
||||
- `glm-4.7-flash`
|
||||
- `qwen3.5`
|
||||
|
||||
Cloud models are also available at [ollama.com/search?c=cloud](https://ollama.com/search?c=cloud).
|
||||
|
||||
## Non-interactive (headless) mode
|
||||
|
||||
Run Claude Code without interaction for use in Docker, CI/CD, or scripts:
|
||||
|
||||
```shell
|
||||
ollama launch claude --model kimi-k2.5:cloud --yes -- -p "how does this repository work?"
|
||||
```
|
||||
|
||||
The `--yes` flag auto-pulls the model, skips selectors, and requires `--model` to be specified. Arguments after `--` are passed directly to Claude Code.
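For scripted or CI use, the headless invocation can be assembled programmatically. The Python sketch below mirrors the flags described above and nothing more; `headless_launch_args` is a hypothetical helper, and the empty-model check simply reflects the stated rule that `--yes` requires `--model`.

```python
from __future__ import annotations


def headless_launch_args(model: str, prompt: str) -> list[str]:
    """Build the argument list for a non-interactive Claude Code run.

    Everything after "--" is handed to Claude Code unchanged.
    """
    if not model:
        raise ValueError("--yes requires --model to be specified")
    return [
        "ollama", "launch", "claude",
        "--model", model,
        "--yes",
        "--",
        "-p", prompt,
    ]
```

Passing the list to `subprocess.run(args, check=True)` avoids shell quoting problems with prompts that contain spaces or quotes.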
|
||||
|
||||
## Web search
|
||||
|
||||
Claude Code can search the web through Ollama's web search API. See the [web search documentation](/capabilities/web-search) for setup and usage.
|
||||
|
||||
## Scheduled Tasks with `/loop`
|
||||
|
||||
The `/loop` command runs a prompt or slash command on a recurring schedule inside Claude Code. This is useful for automating repetitive tasks like checking PRs, running research, or setting reminders.
|
||||
|
||||
```
|
||||
/loop <interval> <prompt or /command>
|
||||
```
|
||||
|
||||
### Examples
|
||||
|
||||
**Check in on your PRs**
|
||||
|
||||
```
|
||||
/loop 30m Check my open PRs and summarize their status
|
||||
```
|
||||
|
||||
**Automate research tasks**
|
||||
|
||||
```
|
||||
/loop 1h Research the latest AI news and summarize key developments
|
||||
```
|
||||
|
||||
**Automate bug reporting and triaging**
|
||||
|
||||
```
|
||||
/loop 15m Check for new GitHub issues and triage by priority
|
||||
```
|
||||
|
||||
**Set reminders**
|
||||
|
||||
```
|
||||
/loop 1h Remind me to review the deploy status
|
||||
```
|
||||
|
||||
## Telegram
|
||||
|
||||
Chat with Claude Code from Telegram by connecting a bot to your session. Install the [Telegram plugin](https://github.com/anthropics/claude-plugins-official), create a bot via [@BotFather](https://t.me/BotFather), then launch with the channel flag:
|
||||
|
||||
```shell
|
||||
ollama launch claude -- --channels plugin:telegram@claude-plugins-official
|
||||
```
|
||||
|
||||
Claude Code will prompt for permission on most actions. To allow the bot to work autonomously, configure [permission rules](https://code.claude.com/docs/en/permissions) or pass `--dangerously-skip-permissions` in isolated environments.
|
||||
|
||||
See the [plugin README](https://github.com/anthropics/claude-plugins-official/tree/main/external_plugins/telegram) for full setup instructions including pairing and access control.
|
||||
|
||||
## Manual setup
|
||||
|
||||
Claude Code connects to Ollama using the Anthropic-compatible API.
|
||||
|
||||
1. Set the environment variables:
|
||||
|
||||
```shell
|
||||
export ANTHROPIC_AUTH_TOKEN=ollama
|
||||
export ANTHROPIC_API_KEY=""
|
||||
export ANTHROPIC_BASE_URL=http://localhost:11434
|
||||
```
|
||||
|
||||
2. Run Claude Code with an Ollama model:
|
||||
|
||||
```shell
|
||||
claude --model qwen3.5
|
||||
```
|
||||
|
||||
Or run with environment variables inline:
|
||||
|
||||
```shell
|
||||
ANTHROPIC_AUTH_TOKEN=ollama ANTHROPIC_BASE_URL=http://localhost:11434 ANTHROPIC_API_KEY="" claude --model glm-5:cloud
|
||||
```
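The same three variables can be set from a launcher script instead of the shell. This is a minimal Python sketch under the assumption that `claude` is on `PATH`; `claude_env` is a hypothetical helper, and the variable names and values are the ones shown in the manual setup steps above.

```python
from __future__ import annotations

import os


def claude_env(
    base_url: str = "http://localhost:11434",
    parent: dict[str, str] | None = None,
) -> dict[str, str]:
    """Return a copy of the environment with the Anthropic variables pointed at Ollama."""
    env = dict(os.environ if parent is None else parent)
    env.update(
        {
            "ANTHROPIC_AUTH_TOKEN": "ollama",
            "ANTHROPIC_API_KEY": "",
            "ANTHROPIC_BASE_URL": base_url,
        }
    )
    return env


# Example (not executed here):
#   import subprocess
#   subprocess.run(["claude", "--model", "qwen3.5"], env=claude_env(), check=True)
```

Building a copy rather than mutating `os.environ` keeps the override scoped to the child process.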
|
||||
|
||||
**Note:** Claude Code requires a large context window. We recommend at least 64k tokens. See the [context length documentation](/context-length) for how to adjust context length in Ollama.
|
||||
|
||||
13
docs/integrations/claude-desktop.mdx
Normal file
@@ -0,0 +1,13 @@
|
||||
---
|
||||
title: Claude Desktop
|
||||
---
|
||||
|
||||
Claude Desktop is no longer supported by `ollama launch`.
|
||||
|
||||
Existing installations can be restored to the usual Claude profile:
|
||||
|
||||
```shell
|
||||
ollama launch claude-desktop --restore
|
||||
```
|
||||
|
||||
Use [Claude Code](/integrations/claude-code) for Anthropic-compatible coding workflows with Ollama.
|
||||
38
docs/integrations/cline.mdx
Normal file
@@ -0,0 +1,38 @@
|
||||
---
|
||||
title: Cline
|
||||
---
|
||||
|
||||
## Install
|
||||
|
||||
Install [Cline](https://docs.cline.bot/getting-started/installing-cline) in your IDE.
|
||||
|
||||
|
||||
## Usage with Ollama
|
||||
|
||||
1. Open Cline settings > `API Configuration` and set `API Provider` to `Ollama`
|
||||
2. Select a model under `Model` or type one (e.g. `qwen3`)
|
||||
3. Update the context window to at least 32K tokens under `Context Window`
|
||||
|
||||
<Note>Coding tools require a larger context window. It is recommended to use a context window of at least 32K tokens. See [Context length](/context-length) for more information.</Note>
|
||||
|
||||
<div style={{ display: 'flex', justifyContent: 'center' }}>
|
||||
<img
|
||||
src="/images/cline-settings.png"
|
||||
alt="Cline settings configuration showing API Provider set to Ollama"
|
||||
width="50%"
|
||||
/>
|
||||
</div>
|
||||
|
||||
|
||||
|
||||
## Connecting to ollama.com
|
||||
1. Create an [API key](https://ollama.com/settings/keys) from ollama.com
|
||||
2. Click on `Use custom base URL` and set it to `https://ollama.com`
|
||||
3. Enter your **Ollama API Key**
|
||||
4. Select a model from the list
|
||||
|
||||
|
||||
### Recommended Models
|
||||
|
||||
- `qwen3-coder:480b`
|
||||
- `deepseek-v3.1:671b`
|
||||
82
docs/integrations/codex-app.mdx
Normal file
@@ -0,0 +1,82 @@
|
||||
---
|
||||
title: Codex App
|
||||
---
|
||||
|
||||
Codex App is OpenAI's desktop coding agent for macOS and Windows. Ollama configures the app to use Ollama's OpenAI-compatible endpoint, so Codex can work with local models and Ollama Cloud models in the desktop app.
|
||||
|
||||
<img
|
||||
src="/images/codex-app-home.png"
|
||||
alt="Codex App with Ollama selected"
|
||||
style={{ borderRadius: "12px" }}
|
||||
/>
|
||||
|
||||
## Install
|
||||
|
||||
Install the [Codex App](https://developers.openai.com/codex/quickstart/) for macOS or Windows.
|
||||
|
||||
<Note>Codex App support is available in Ollama v0.24.0 and newer.</Note>
|
||||
|
||||
|
||||
## Quick setup
|
||||
|
||||
```shell
|
||||
ollama launch codex-app
|
||||
```
|
||||
|
||||
Once Codex App opens, start a task or open a repository as usual.
|
||||
|
||||
## Built-in browser
|
||||
|
||||
Codex App can open local servers and sites in its built-in browser. Annotate directly on the page to request changes.
|
||||
|
||||
<img
|
||||
src="/images/codex-app-annotate.png"
|
||||
alt="Codex App browser annotations"
|
||||
style={{ borderRadius: "12px" }}
|
||||
/>
|
||||
|
||||
## Review mode
|
||||
|
||||
Use review mode to inspect code changes, leave comments, and iterate on fixes without leaving the app.
|
||||
|
||||
<img
|
||||
src="/images/codex-app-review.png"
|
||||
alt="Codex App review comments"
|
||||
style={{ borderRadius: "12px" }}
|
||||
/>
|
||||
|
||||
### Run directly with a model
|
||||
|
||||
```shell
|
||||
ollama launch codex-app --model kimi-k2.6:cloud
|
||||
```
|
||||
|
||||
Use a local model by passing its model name:
|
||||
|
||||
```shell
|
||||
ollama launch codex-app --model gemma4:31b
|
||||
```
|
||||
|
||||
The selection made by `ollama launch codex-app` is persistent: your model will still be selected the next time you open Codex.
|
||||
|
||||
|
||||
### Restore Codex App
|
||||
|
||||
To switch Codex App back to the profile you were using before `ollama launch codex-app`, run:
|
||||
|
||||
```shell
|
||||
ollama launch codex-app --restore
|
||||
```
|
||||
|
||||
Ollama restores Codex App's settings and configs. If Codex App is open, Ollama asks before restarting it.
|
||||
|
||||
|
||||
The Codex CLI profile managed by `ollama launch codex` is left separate from the Codex App profile.
|
||||
|
||||
Before overwriting Codex App config files, Ollama Launch saves backups under `~/.ollama/backup/codex-app/`. On Windows, `~` resolves to your user profile directory.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
If Codex App does not open after setup, open Codex manually once and run `ollama launch codex-app` again.
|
||||
|
||||
If Codex App is already running and does not switch models, allow Ollama to restart it when prompted, or quit Codex App and run `ollama launch codex-app` again.
|
||||
76
docs/integrations/codex.mdx
Normal file
@@ -0,0 +1,76 @@
|
||||
---
|
||||
title: Codex CLI
|
||||
---
|
||||
|
||||
|
||||
## Install
|
||||
|
||||
Install the [Codex CLI](https://developers.openai.com/codex/cli/). For the desktop app, see [Codex App](/integrations/codex-app).
|
||||
|
||||
```
|
||||
npm install -g @openai/codex
|
||||
```
|
||||
|
||||
## Usage with Ollama
|
||||
|
||||
<Note>Codex requires a larger context window. It is recommended to use a context window of at least 64k tokens.</Note>
|
||||
|
||||
### Quick setup
|
||||
|
||||
```
|
||||
ollama launch codex
|
||||
```
|
||||
|
||||
When launched through `ollama launch codex`, Ollama refreshes the model catalog
|
||||
and passes it to Codex for that session.
|
||||
|
||||
To configure without launching:
|
||||
|
||||
```shell
|
||||
ollama launch codex --config
|
||||
```
|
||||
|
||||
### Manual setup
|
||||
|
||||
To use `codex` with Ollama, use the `--oss` flag:
|
||||
|
||||
```shell
|
||||
codex --oss
|
||||
```
|
||||
|
||||
To use a specific model, pass the `-m` flag:
|
||||
|
||||
```shell
|
||||
codex --oss -m gpt-oss:120b
|
||||
```
|
||||
|
||||
To use a cloud model:
|
||||
|
||||
```shell
|
||||
codex --oss -m gpt-oss:120b-cloud
|
||||
```
|
||||
|
||||
### Profile-based setup
|
||||
|
||||
For a persistent configuration, add an Ollama provider and profiles to `~/.codex/config.toml`:
|
||||
|
||||
```toml
|
||||
[model_providers.ollama-launch]
|
||||
name = "Ollama"
|
||||
base_url = "http://localhost:11434/v1"
|
||||
|
||||
[profiles.ollama-launch]
|
||||
model = "gpt-oss:120b"
|
||||
model_provider = "ollama-launch"
|
||||
|
||||
[profiles.ollama-cloud]
|
||||
model = "gpt-oss:120b-cloud"
|
||||
model_provider = "ollama-launch"
|
||||
```
|
||||
|
||||
Then run:
|
||||
|
||||
```shell
|
||||
codex --profile ollama-launch
|
||||
codex --profile ollama-cloud
|
||||
```
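
When several profiles share one provider, the TOML shown earlier can be generated rather than hand-edited. The following is a minimal sketch, not part of Codex or Ollama: `render_codex_config` is a hypothetical helper, and it assumes profile and provider names are valid bare TOML keys and that model names contain no double quotes.

```python
import re

# Bare TOML keys may only contain ASCII letters, digits, "_" and "-".
BARE_KEY = re.compile(r"[A-Za-z0-9_-]+")


def render_codex_config(provider: str, base_url: str, profiles: dict[str, str]) -> str:
    """Render a provider table plus one profile table per entry.

    `profiles` maps a profile name to an Ollama model name.
    """
    for key in (provider, *profiles):
        if not BARE_KEY.fullmatch(key):
            raise ValueError(f"{key!r} is not a valid bare TOML key")

    lines = [
        f"[model_providers.{provider}]",
        'name = "Ollama"',
        f'base_url = "{base_url}"',
    ]
    for profile, model in profiles.items():
        lines += [
            "",
            f"[profiles.{profile}]",
            f'model = "{model}"',
            f'model_provider = "{provider}"',
        ]
    return "\n".join(lines) + "\n"


config_text = render_codex_config(
    "ollama-launch",
    "http://localhost:11434/v1",
    {"ollama-launch": "gpt-oss:120b", "ollama-cloud": "gpt-oss:120b-cloud"},
)
print(config_text)
```

The printed text can be pasted into `~/.codex/config.toml`. Every generated profile points at the same provider table, which keeps `model_provider` consistent across profiles.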
|
||||
93
docs/integrations/copilot-cli.mdx
Normal file
@@ -0,0 +1,93 @@
|
||||
---
|
||||
title: Copilot CLI
|
||||
---
|
||||
|
||||
GitHub Copilot CLI is GitHub's AI coding agent for the terminal. It can understand your codebase, make edits, run commands, and help you build software faster.
|
||||
|
||||
Open models can be used with Copilot CLI through Ollama, enabling you to use models such as `qwen3.5`, `glm-5.1:cloud`, and `kimi-k2.5:cloud`.
|
||||
|
||||
## Install
|
||||
|
||||
Install [Copilot CLI](https://github.com/features/copilot/cli/):
|
||||
|
||||
<CodeGroup>
|
||||
|
||||
```shell macOS / Linux (Homebrew)
|
||||
brew install copilot-cli
|
||||
```
|
||||
|
||||
```shell npm (all platforms)
|
||||
npm install -g @github/copilot
|
||||
```
|
||||
|
||||
```shell macOS / Linux (script)
|
||||
curl -fsSL https://gh.io/copilot-install | bash
|
||||
```
|
||||
|
||||
```powershell Windows (WinGet)
|
||||
winget install GitHub.Copilot
|
||||
```
|
||||
|
||||
</CodeGroup>
|
||||
|
||||
## Usage with Ollama
|
||||
|
||||
### Quick setup
|
||||
|
||||
```shell
|
||||
ollama launch copilot
|
||||
```
|
||||
|
||||
### Run directly with a model
|
||||
|
||||
```shell
|
||||
ollama launch copilot --model kimi-k2.5:cloud
|
||||
```
|
||||
|
||||
## Recommended Models
|
||||
|
||||
- `kimi-k2.5:cloud`
|
||||
- `glm-5:cloud`
|
||||
- `minimax-m2.7:cloud`
|
||||
- `qwen3.5:cloud`
|
||||
- `glm-4.7-flash`
|
||||
- `qwen3.5`
|
||||
|
||||
Cloud models are also available at [ollama.com/search?c=cloud](https://ollama.com/search?c=cloud).
|
||||
|
||||
## Non-interactive (headless) mode
|
||||
|
||||
Run Copilot CLI without interaction for use in Docker, CI/CD, or scripts:
|
||||
|
||||
```shell
|
||||
ollama launch copilot --model kimi-k2.5:cloud --yes -- -p "how does this repository work?"
|
||||
```
|
||||
|
||||
The `--yes` flag auto-pulls the model, skips selectors, and requires `--model` to be specified. Arguments after `--` are passed directly to Copilot CLI.
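
For scripted use, the headless command can be assembled as an argument list instead of a shell string, which avoids quoting problems in the prompt. This is a minimal sketch: `launch_command` is a hypothetical helper, and it only uses the flags documented above (`--model`, `--yes`, and the `--` separator).

```python
import shlex


def launch_command(integration: str, model: str, passthrough: list[str]) -> list[str]:
    """Build the argument list for a non-interactive `ollama launch` run."""
    cmd = ["ollama", "launch", integration, "--model", model, "--yes"]
    if passthrough:
        # Everything after "--" is handed to the integration unchanged.
        cmd += ["--", *passthrough]
    return cmd


cmd = launch_command("copilot", "kimi-k2.5:cloud", ["-p", "how does this repository work?"])
print(shlex.join(cmd))
# A script could then run it with: subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` keeps the prompt as a single argument regardless of the spaces or punctuation it contains.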
|
||||
|
||||
## Manual setup
|
||||
|
||||
Copilot CLI connects to Ollama using the OpenAI-compatible API via environment variables.
|
||||
|
||||
1. Set the environment variables:
|
||||
|
||||
```shell
|
||||
export COPILOT_PROVIDER_BASE_URL=http://localhost:11434/v1
|
||||
export COPILOT_PROVIDER_API_KEY=
|
||||
export COPILOT_PROVIDER_WIRE_API=responses
|
||||
export COPILOT_MODEL=qwen3.5
|
||||
```
|
||||
|
||||
2. Run Copilot CLI:
|
||||
|
||||
```shell
|
||||
copilot
|
||||
```
|
||||
|
||||
Or run with environment variables inline:
|
||||
|
||||
```shell
|
||||
COPILOT_PROVIDER_BASE_URL=http://localhost:11434/v1 COPILOT_PROVIDER_API_KEY= COPILOT_PROVIDER_WIRE_API=responses COPILOT_MODEL=glm-5:cloud copilot
|
||||
```
|
||||
|
||||
**Note:** Copilot requires a large context window. We recommend at least 64k tokens. See the [context length documentation](/context-length) for how to adjust context length in Ollama.
|
||||
90
docs/integrations/droid.mdx
Normal file
@@ -0,0 +1,90 @@
|
||||
---
|
||||
title: Droid
|
||||
---
|
||||
|
||||
|
||||
## Install
|
||||
|
||||
Install the [Droid CLI](https://factory.ai/):
|
||||
|
||||
```bash
|
||||
curl -fsSL https://app.factory.ai/cli | sh
|
||||
```
|
||||
|
||||
<Note>Droid requires a larger context window. It is recommended to use a context window of at least 64k tokens. See [Context length](/context-length) for more information.</Note>
|
||||
|
||||
## Usage with Ollama
|
||||
|
||||
### Quick setup
|
||||
|
||||
```bash
|
||||
ollama launch droid
|
||||
```
|
||||
|
||||
To configure without launching:
|
||||
|
||||
```shell
|
||||
ollama launch droid --config
|
||||
```
|
||||
|
||||
### Manual setup
|
||||
|
||||
Add a local configuration block to `~/.factory/config.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"custom_models": [
|
||||
{
|
||||
"model_display_name": "qwen3-coder [Ollama]",
|
||||
"model": "qwen3-coder",
|
||||
"base_url": "http://localhost:11434/v1/",
|
||||
"api_key": "not-needed",
|
||||
"provider": "generic-chat-completion-api",
|
||||
"max_tokens": 32000
|
||||
}
|
||||
]
|
||||
}
|
||||
```
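
If `~/.factory/config.json` already contains other entries, the block above needs to be merged in rather than pasted over the file. The sketch below shows one way to do that. `upsert_custom_model` is a hypothetical helper, not part of Droid, and it assumes entries are identified by their `model` value.

```python
import json


def upsert_custom_model(config: dict, entry: dict) -> dict:
    """Return a copy of `config` with `entry` added to `custom_models`.

    An existing entry with the same `model` value is replaced, so running
    the helper twice does not create duplicates.
    """
    models = [m for m in config.get("custom_models", []) if m.get("model") != entry["model"]]
    models.append(entry)
    return {**config, "custom_models": models}


local_entry = {
    "model_display_name": "qwen3-coder [Ollama]",
    "model": "qwen3-coder",
    "base_url": "http://localhost:11434/v1/",
    "api_key": "not-needed",
    "provider": "generic-chat-completion-api",
    "max_tokens": 32000,
}

# Stand-in for a config file that already lists another model.
existing = {"custom_models": [{"model": "some-other-model"}]}
updated = upsert_custom_model(existing, local_entry)
updated = upsert_custom_model(updated, local_entry)  # repeated call does not duplicate
print(json.dumps(updated, indent=2))
```

The helper returns a new dictionary and leaves the input untouched, so the original file contents can be kept as a backup before writing the result.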
|
||||
|
||||
|
||||
## Cloud Models
|
||||
`qwen3-coder:480b-cloud` is the recommended model for use with Droid.
|
||||
|
||||
Add the cloud configuration block to `~/.factory/config.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"custom_models": [
|
||||
{
|
||||
"model_display_name": "qwen3-coder [Ollama Cloud]",
|
||||
"model": "qwen3-coder:480b-cloud",
|
||||
"base_url": "http://localhost:11434/v1/",
|
||||
"api_key": "not-needed",
|
||||
"provider": "generic-chat-completion-api",
|
||||
"max_tokens": 128000
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## Connecting to ollama.com
|
||||
|
||||
1. Create an [API key](https://ollama.com/settings/keys) from ollama.com and export it as `OLLAMA_API_KEY`.
|
||||
2. Add the cloud configuration block to `~/.factory/config.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"custom_models": [
|
||||
{
|
||||
"model_display_name": "qwen3-coder [Ollama Cloud]",
|
||||
"model": "qwen3-coder:480b",
|
||||
"base_url": "https://ollama.com/v1/",
|
||||
"api_key": "OLLAMA_API_KEY",
|
||||
"provider": "generic-chat-completion-api",
|
||||
"max_tokens": 128000
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Run `droid` in a new terminal to load the new settings.
|
||||
49
docs/integrations/goose.mdx
Normal file
@@ -0,0 +1,49 @@
|
||||
---
|
||||
title: Goose
|
||||
---
|
||||
|
||||
## Goose Desktop
|
||||
|
||||
Install [Goose](https://block.github.io/goose/docs/getting-started/installation/) Desktop.
|
||||
|
||||
### Usage with Ollama
|
||||
1. In Goose, open **Settings** → **Configure Provider**.
|
||||
<div style={{ display: 'flex', justifyContent: 'center' }}>
|
||||
<img
|
||||
src="/images/goose-settings.png"
|
||||
alt="Goose settings Panel"
|
||||
width="75%"
|
||||
/>
|
||||
</div>
|
||||
2. Find **Ollama** and click **Configure**.
3. Confirm **API Host** is `http://localhost:11434` and click **Submit**.
|
||||
|
||||
|
||||
### Connecting to ollama.com
|
||||
|
||||
1. Create an [API key](https://ollama.com/settings/keys) on ollama.com and save it in your `.env`
|
||||
2. In Goose, set **API Host** to `https://ollama.com`
|
||||
|
||||
|
||||
## Goose CLI
|
||||
|
||||
Install [Goose](https://block.github.io/goose/docs/getting-started/installation/) CLI.
|
||||
|
||||
### Usage with Ollama
|
||||
1. Run `goose configure`
|
||||
2. Select **Configure Providers** and select **Ollama**
|
||||
<div style={{ display: 'flex', justifyContent: 'center' }}>
|
||||
<img
|
||||
src="/images/goose-cli.png"
|
||||
alt="Goose CLI"
|
||||
width="50%"
|
||||
/>
|
||||
</div>
|
||||
3. Enter the model name (e.g., `qwen3`)
|
||||
|
||||
### Connecting to ollama.com
|
||||
|
||||
1. Create an [API key](https://ollama.com/settings/keys) on ollama.com and save it in your `.env`
|
||||
2. Run `goose configure`
|
||||
3. Select **Configure Providers** and select **Ollama**
|
||||
4. Update **OLLAMA_HOST** to `https://ollama.com`
|
||||
119
docs/integrations/hermes.mdx
Normal file
@@ -0,0 +1,119 @@
|
||||
---
|
||||
title: Hermes Agent
|
||||
---
|
||||
|
||||
Hermes Agent is a self-improving AI agent built by Nous Research. It features automatic skill creation and cross-session memory, and ships with 70+ skills by default.
|
||||
|
||||

|
||||
|
||||
## Quick start
|
||||
|
||||
```bash
|
||||
ollama launch hermes
|
||||
```
|
||||
|
||||
Ollama handles everything automatically:
|
||||
|
||||
1. **Install** — If Hermes isn't installed, Ollama prompts to install it via the Nous Research install script
|
||||
2. **Model** — Pick a model from the selector (local or cloud)
|
||||
3. **Onboarding** — Ollama configures the Ollama provider, points Hermes at `http://127.0.0.1:11434/v1`, and sets your model as the primary
|
||||
4. **Gateway** — Optionally connects a messaging platform (Telegram, Discord, Slack, WhatsApp, Signal, Email) and launches the Hermes chat
|
||||
|
||||
<Note>Hermes on Windows requires WSL2. Install it with `wsl --install` and re-run from inside the WSL shell.</Note>
|
||||
|
||||
## Recommended models
|
||||
|
||||
**Cloud models**:
|
||||
|
||||
- `kimi-k2.5:cloud` — Multimodal reasoning with subagents
|
||||
- `glm-5.1:cloud` — Reasoning and code generation
|
||||
- `qwen3.5:cloud` — Reasoning, coding, and agentic tool use with vision
|
||||
- `minimax-m2.7:cloud` — Fast, efficient coding and real-world productivity
|
||||
|
||||
**Local models:**
|
||||
|
||||
- `gemma4` — Reasoning and code generation locally (~16 GB VRAM)
|
||||
- `qwen3.6` — Reasoning, coding, and visual understanding locally (~24 GB VRAM)
|
||||
|
||||
More models at [ollama.com/search](https://ollama.com/search?c=cloud).
|
||||
|
||||
## Connect messaging apps
|
||||
|
||||
Link Telegram, Discord, Slack, WhatsApp, Signal, or Email to chat with your models from anywhere:
|
||||
|
||||
```bash
|
||||
hermes gateway setup
|
||||
```
|
||||
|
||||
## Reconfigure
|
||||
|
||||
Re-run the full setup wizard at any time:
|
||||
|
||||
```bash
|
||||
hermes setup
|
||||
```
|
||||
|
||||
## Manual setup
|
||||
|
||||
If you'd rather drive Hermes's own wizard instead of `ollama launch hermes`, install it directly:
|
||||
|
||||
```bash
|
||||
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
|
||||
```
|
||||
|
||||
Hermes launches the setup wizard automatically. Choose **Quick setup**:
|
||||
|
||||
```
|
||||
How would you like to set up Hermes?
|
||||
|
||||
→ Quick setup — provider, model & messaging (recommended)
|
||||
Full setup — configure everything
|
||||
```
|
||||
|
||||
### Connect to Ollama
|
||||
|
||||
1. Select **More providers...**
|
||||
2. Select **Custom endpoint (enter URL manually)**
|
||||
3. Set the API base URL to the Ollama OpenAI-compatible endpoint:
|
||||
|
||||
```
|
||||
API base URL [e.g. https://api.example.com/v1]: http://127.0.0.1:11434/v1
|
||||
```
|
||||
|
||||
4. Leave the API key blank (not required for local Ollama):
|
||||
|
||||
```
|
||||
API key [optional]:
|
||||
```
|
||||
|
||||
5. Hermes auto-detects downloaded models. Confirm the one you want:
|
||||
|
||||
```
|
||||
Verified endpoint via http://127.0.0.1:11434/v1/models (1 model(s) visible)
|
||||
Detected model: kimi-k2.5:cloud
|
||||
Use this model? [Y/n]:
|
||||
```
|
||||
|
||||
6. Leave context length blank to auto-detect:
|
||||
|
||||
```
|
||||
Context length in tokens [leave blank for auto-detect]:
|
||||
```
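
The wizard output in step 5 shows Hermes checking the `/v1/models` endpoint. The same check can be made by hand before running the wizard. The sketch below assumes the endpoint returns the OpenAI-style list shape (a `data` array of objects with an `id` field). The helper names are hypothetical, and the sample payload is hand-written rather than captured from a server.

```python
import json
from urllib.request import urlopen


def model_ids(payload: dict) -> list[str]:
    """Extract model names from an OpenAI-style model list response."""
    return [item["id"] for item in payload.get("data", [])]


def fetch_model_ids(base_url: str = "http://127.0.0.1:11434/v1") -> list[str]:
    """Query the running Ollama server; raises if it is not reachable."""
    with urlopen(f"{base_url}/models", timeout=5) as response:
        return model_ids(json.load(response))


# Offline example using a hand-written payload in the same shape.
sample = {"object": "list", "data": [{"id": "kimi-k2.5:cloud", "object": "model"}]}
print(model_ids(sample))
```

Calling `fetch_model_ids()` against a running server should list the same models the wizard reports. An empty list suggests that no model has been pulled yet.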
|
||||
|
||||
### Connect messaging
|
||||
|
||||
Optionally connect a messaging platform during setup:
|
||||
|
||||
```
|
||||
Connect a messaging platform? (Telegram, Discord, etc.)
|
||||
|
||||
→ Set up messaging now (recommended)
|
||||
Skip — set up later with 'hermes setup gateway'
|
||||
```
|
||||
|
||||
### Launch
|
||||
|
||||
```
|
||||
Launch hermes chat now? [Y/n]: Y
|
||||
```
|
||||
|
||||
55
docs/integrations/index.mdx
Normal file
@@ -0,0 +1,55 @@
|
||||
---
|
||||
title: Overview
|
||||
---
|
||||
|
||||
Ollama integrates with a wide range of tools.
|
||||
|
||||
## Coding Agents
|
||||
|
||||
Coding assistants that can read, modify, and execute code in your projects.
|
||||
|
||||
- [Claude Code](/integrations/claude-code)
|
||||
- [Codex App](/integrations/codex-app)
|
||||
- [Codex CLI](/integrations/codex)
|
||||
- [Copilot CLI](/integrations/copilot-cli)
|
||||
- [OpenCode](/integrations/opencode)
|
||||
- [Droid](/integrations/droid)
|
||||
- [Goose](/integrations/goose)
|
||||
- [Pi](/integrations/pi)
|
||||
- [Pool](/integrations/pool)
|
||||
|
||||
## Assistants
|
||||
|
||||
AI assistants that help with everyday tasks.
|
||||
|
||||
- [OpenClaw](/integrations/openclaw)
|
||||
- [Hermes Agent](/integrations/hermes)
|
||||
|
||||
## IDEs & Editors
|
||||
|
||||
Native integrations for popular development environments.
|
||||
|
||||
- [VS Code](/integrations/vscode)
|
||||
- [Cline](/integrations/cline)
|
||||
- [Roo Code](/integrations/roo-code)
|
||||
- [JetBrains](/integrations/jetbrains)
|
||||
- [Xcode](/integrations/xcode)
|
||||
- [Zed](/integrations/zed)
|
||||
|
||||
## Chat & RAG
|
||||
|
||||
Chat interfaces and retrieval-augmented generation platforms.
|
||||
|
||||
- [Onyx](/integrations/onyx)
|
||||
|
||||
## Automation
|
||||
|
||||
Workflow automation platforms with AI integration.
|
||||
|
||||
- [n8n](/integrations/n8n)
|
||||
|
||||
## Notebooks
|
||||
|
||||
Interactive computing environments with AI capabilities.
|
||||
|
||||
- [marimo](/integrations/marimo)
|
||||
47
docs/integrations/jetbrains.mdx
Normal file
@@ -0,0 +1,47 @@
|
||||
---
|
||||
title: JetBrains
|
||||
---
|
||||
|
||||
<Note>This example uses **IntelliJ**; same steps apply to other JetBrains IDEs (e.g., PyCharm).</Note>
|
||||
|
||||
## Install
|
||||
|
||||
Install [IntelliJ](https://www.jetbrains.com/idea/).
|
||||
|
||||
## Usage with Ollama
|
||||
|
||||
<Note>
|
||||
To use **Ollama**, you will need a [JetBrains AI Subscription](https://www.jetbrains.com/ai-ides/buy/?section=personal&billing=yearly).
|
||||
</Note>
|
||||
|
||||
1. In IntelliJ, click the **chat icon** located in the right sidebar
|
||||
|
||||
<div style={{ display: 'flex', justifyContent: 'center' }}>
|
||||
<img
|
||||
src="/images/intellij-chat-sidebar.png"
|
||||
alt="IntelliJ Sidebar Chat"
|
||||
width="50%"
|
||||
/>
|
||||
</div>
|
||||
|
||||
2. Select the **current model** in the sidebar, then click **Set up Local Models**
|
||||
|
||||
<div style={{ display: 'flex', justifyContent: 'center' }}>
|
||||
<img
|
||||
src="/images/intellij-current-model.png"
|
||||
alt="IntelliJ model in bottom right corner"
|
||||
width="50%"
|
||||
/>
|
||||
</div>
|
||||
|
||||
3. Under **Third Party AI Providers**, choose **Ollama**
|
||||
4. Confirm the **Host URL** is `http://localhost:11434`, then click **Ok**
|
||||
5. Once connected, select a model under **Local models by Ollama**
|
||||
|
||||
<div style={{ display: 'flex', justifyContent: 'center' }}>
|
||||
<img
|
||||
src="/images/intellij-local-models.png"
|
||||
alt="IntelliJ local models by Ollama"
|
||||
width="50%"
|
||||
/>
|
||||
</div>
|
||||
73
docs/integrations/marimo.mdx
Normal file
@@ -0,0 +1,73 @@
|
||||
---
|
||||
title: marimo
|
||||
---
|
||||
|
||||
## Install
|
||||
|
||||
Install [marimo](https://marimo.io). You can use `pip` or `uv` for this. You
|
||||
can also use `uv` to create a sandboxed environment for marimo by running:
|
||||
|
||||
```shell
|
||||
uvx marimo edit --sandbox notebook.py
|
||||
```
|
||||
|
||||
## Usage with Ollama
|
||||
|
||||
1. In marimo, open the user settings and go to the AI tab. There you can find and configure Ollama as an AI provider. For local use, you would typically point the base URL to `http://localhost:11434/v1`.
|
||||
|
||||
<div style={{ display: 'flex', justifyContent: 'center' }}>
|
||||
<img
|
||||
src="/images/marimo-settings.png"
|
||||
alt="Ollama settings in marimo"
|
||||
width="50%"
|
||||
/>
|
||||
</div>
|
||||
|
||||
2. Once the AI provider is set up, you can turn on/off specific AI models you'd like to access.
|
||||
|
||||
<div style={{ display: 'flex', justifyContent: 'center' }}>
|
||||
<img
|
||||
src="/images/marimo-models.png"
|
||||
alt="Selecting an Ollama model"
|
||||
width="50%"
|
||||
/>
|
||||
</div>
|
||||
|
||||
3. You can also add a model to the list of available models by scrolling to the bottom and using the UI there.
|
||||
|
||||
<div style={{ display: 'flex', justifyContent: 'center' }}>
|
||||
<img
|
||||
src="/images/marimo-add-model.png"
|
||||
alt="Adding a new Ollama model"
|
||||
width="50%"
|
||||
/>
|
||||
</div>
|
||||
|
||||
4. Once configured, you can now use Ollama for AI chats in marimo.
|
||||
|
||||
<div style={{ display: 'flex', justifyContent: 'center' }}>
|
||||
<img
|
||||
src="/images/marimo-chat.png"
|
||||
alt="Chatting with Ollama in marimo"
|
||||
width="50%"
|
||||
/>
|
||||
</div>
|
||||
|
||||
5. Alternatively, you can now use Ollama for **inline code completion** in marimo. This can be configured in the "AI Features" tab.
|
||||
|
||||
<div style={{ display: 'flex', justifyContent: 'center' }}>
|
||||
<img
|
||||
src="/images/marimo-code-completion.png"
|
||||
alt="Configure code completion"
|
||||
width="50%"
|
||||
/>
|
||||
</div>
|
||||
|
||||
|
||||
## Connecting to ollama.com
|
||||
|
||||
1. Sign in to Ollama Cloud via `ollama signin`.
2. In the Ollama model settings, add a model that Ollama hosts, such as `gpt-oss:120b`.
|
||||
3. You can now refer to this model in marimo!
|
||||
68
docs/integrations/n8n.mdx
Normal file
@@ -0,0 +1,68 @@
|
||||
---
|
||||
title: n8n
|
||||
---
|
||||
|
||||
## Install
|
||||
|
||||
Install [n8n](https://docs.n8n.io/choose-n8n/).
|
||||
|
||||
## Using Ollama Locally
|
||||
|
||||
1. In the top right corner, click the dropdown and select **Create Credential**
|
||||
<div style={{ display: 'flex', justifyContent: 'center' }}>
|
||||
<img
|
||||
src="/images/n8n-credential-creation.png"
|
||||
alt="Create an n8n Credential"
|
||||
width="75%"
|
||||
/>
|
||||
</div>
|
||||
|
||||
2. Under **Add new credential** select **Ollama**
|
||||
<div style={{ display: 'flex', justifyContent: 'center' }}>
|
||||
<img
|
||||
src="/images/n8n-ollama-form.png"
|
||||
alt="Select Ollama under Credential"
|
||||
width="75%"
|
||||
/>
|
||||
</div>
|
||||
3. Confirm **Base URL** is set to `http://localhost:11434` if running locally, or `http://host.docker.internal:11434` if running through Docker, then click **Save**
|
||||
|
||||
<Note>
|
||||
In environments that don't use Docker Desktop (i.e., Linux server installations), `host.docker.internal` is not automatically added.
|
||||
|
||||
Run n8n in Docker with `--add-host=host.docker.internal:host-gateway`, or add the following to a Docker Compose file:
|
||||
|
||||
```yaml
|
||||
extra_hosts:
|
||||
- "host.docker.internal:host-gateway"
|
||||
```
|
||||
</Note>
|
||||
|
||||
You should see a `Connection tested successfully` message.
|
||||
|
||||
4. When creating a new workflow, select **Add a first step** and select an **Ollama node**
|
||||
<div style={{ display: 'flex', justifyContent: 'center' }}>
|
||||
<img
|
||||
src="/images/n8n-chat-node.png"
|
||||
alt="Add a first step with Ollama node"
|
||||
width="75%"
|
||||
/>
|
||||
</div>
|
||||
5. Select your model of choice (e.g. `qwen3-coder`)
|
||||
<div style={{ display: 'flex', justifyContent: 'center' }}>
|
||||
<img
|
||||
src="/images/n8n-models.png"
|
||||
alt="Set up Ollama credentials"
|
||||
width="75%"
|
||||
/>
|
||||
</div>
|
||||
|
||||
## Connecting to ollama.com
|
||||
1. Create an [API key](https://ollama.com/settings/keys) on **ollama.com**.
|
||||
2. In n8n, click **Create Credential** and select **Ollama**
|
||||
3. Set the **API URL** to `https://ollama.com`
4. Enter your **API Key** and click **Save**
|
||||
|
||||
|
||||
67
docs/integrations/nemoclaw.mdx
Normal file
@@ -0,0 +1,67 @@
|
||||
---
|
||||
title: NemoClaw
|
||||
---
|
||||
|
||||
NemoClaw is NVIDIA's open source security stack for [OpenClaw](/integrations/openclaw). It wraps OpenClaw with the NVIDIA OpenShell runtime to provide kernel-level sandboxing, network policy controls, and audit trails for AI agents.
|
||||
|
||||
## Quick start
|
||||
|
||||
Pull a model:
|
||||
|
||||
```bash
|
||||
ollama pull nemotron-3-nano:30b
|
||||
```
|
||||
|
||||
Run the installer:
|
||||
|
||||
```bash
|
||||
curl -fsSL https://www.nvidia.com/nemoclaw.sh | \
|
||||
NEMOCLAW_NON_INTERACTIVE=1 \
|
||||
NEMOCLAW_PROVIDER=ollama \
|
||||
NEMOCLAW_MODEL=nemotron-3-nano:30b \
|
||||
bash
|
||||
```
|
||||
|
||||
Connect to your sandbox:
|
||||
|
||||
```bash
|
||||
nemoclaw my-assistant connect
|
||||
```
|
||||
|
||||
Open the TUI:
|
||||
|
||||
```bash
|
||||
openclaw tui
|
||||
```
|
||||
|
||||
<Note>Ollama support in NemoClaw is still experimental.</Note>
|
||||
|
||||
## Platform support
|
||||
|
||||
| Platform | Runtime | Status |
|
||||
|----------|---------|--------|
|
||||
| Linux (Ubuntu 22.04+) | Docker | Primary |
|
||||
| macOS (Apple Silicon) | Colima or Docker Desktop | Supported |
|
||||
| Windows | WSL2 with Docker Desktop | Supported |
|
||||
|
||||
CMD and PowerShell are not supported on Windows — WSL2 is required.
|
||||
|
||||
<Note>Ollama must be installed and running before the installer runs. When running inside WSL2 or a container, ensure Ollama is reachable from the sandbox (e.g. `OLLAMA_HOST=0.0.0.0`).</Note>
|
||||
|
||||
## System requirements
|
||||
|
||||
- CPU: 4 vCPU minimum
|
||||
- RAM: 8 GB minimum (16 GB recommended)
|
||||
- Disk: 20 GB free (40 GB recommended for local models)
|
||||
- Node.js 20+ and npm 10+
|
||||
- Container runtime (Docker preferred)
|
||||
|
||||
## Recommended models
|
||||
|
||||
- `nemotron-3-super:cloud` — Strong reasoning and coding
|
||||
- `qwen3.5:cloud` — 397B; reasoning and code generation
|
||||
- `nemotron-3-nano:30b` — Recommended local model; fits in 24 GB VRAM
|
||||
- `qwen3.5:27b` — Fast local reasoning (~18 GB VRAM)
|
||||
- `glm-4.7-flash` — Reasoning and code generation (~25 GB VRAM)
|
||||
|
||||
More models at [ollama.com/search](https://ollama.com/search).
|
||||
63
docs/integrations/onyx.mdx
Normal file
@@ -0,0 +1,63 @@
|
||||
---
|
||||
title: Onyx
|
||||
---
|
||||
|
||||
## Overview
|
||||
[Onyx](http://onyx.app/) is a self-hostable Chat UI that integrates with all Ollama models. Features include:
|
||||
- Creating custom Agents
|
||||
- Web search
|
||||
- Deep Research
|
||||
- RAG over uploaded documents and connected apps
|
||||
- Connectors to applications like Google Drive, Email, Slack, etc.
|
||||
- MCP and OpenAPI Actions support
|
||||
- Image generation
|
||||
- User/Groups management, RBAC, SSO, etc.
|
||||
|
||||
Onyx can be deployed for single users or large organizations.
|
||||
|
||||
## Install Onyx
|
||||
|
||||
Deploy Onyx with the [quickstart guide](https://docs.onyx.app/deployment/getting_started/quickstart).
|
||||
|
||||
<Info>
|
||||
Resourcing/scaling docs [here](https://docs.onyx.app/deployment/getting_started/resourcing).
|
||||
</Info>
|
||||
|
||||
## Usage with Ollama
|
||||
|
||||
1. Log in to your Onyx deployment (create an account first).
|
||||
<div style={{ display: 'flex', justifyContent: 'center' }}>
|
||||
<img
|
||||
src="/images/onyx-login.png"
|
||||
alt="Onyx Login Page"
|
||||
width="75%"
|
||||
/>
|
||||
</div>
|
||||
2. During setup, select `Ollama` as the LLM provider.
|
||||
<div style={{ display: 'flex', justifyContent: 'center' }}>
|
||||
<img
|
||||
src="/images/onyx-ollama-llm.png"
|
||||
alt="Onyx Set Up Form"
|
||||
width="75%"
|
||||
/>
|
||||
</div>
|
||||
3. Provide your **Ollama API URL** and select your models.
|
||||
<Note>If you're running Onyx in Docker, to access your computer's local network use `http://host.docker.internal` instead of `http://127.0.0.1`.</Note>
|
||||
<div style={{ display: 'flex', justifyContent: 'center' }}>
|
||||
<img
|
||||
src="/images/onyx-ollama-form.png"
|
||||
alt="Selecting Ollama Models"
|
||||
width="75%"
|
||||
/>
|
||||
</div>
|
||||
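
The Docker note in step 3 amounts to swapping a loopback host for `host.docker.internal` while leaving the rest of the URL alone. The following is a minimal sketch of that rewrite. `docker_reachable_url` is a hypothetical helper, and it assumes `host.docker.internal` resolves inside the container.

```python
from urllib.parse import urlsplit, urlunsplit

LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}


def docker_reachable_url(url: str) -> str:
    """Swap a loopback host for host.docker.internal, keeping scheme and port."""
    parts = urlsplit(url)
    if parts.hostname not in LOOPBACK_HOSTS:
        return url  # already points somewhere a container can reach
    netloc = "host.docker.internal"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


print(docker_reachable_url("http://127.0.0.1:11434"))
```

URLs that already point at a remote host, such as `https://ollama.com`, pass through unchanged.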
|
||||
You can also connect Onyx Cloud using the `Ollama Cloud` tab of the setup.
|
||||
|
||||
## Send your first query
|
||||
<div style={{ display: 'flex', justifyContent: 'center' }}>
|
||||
<img
|
||||
src="/images/onyx-query.png"
|
||||
alt="Onyx Query Example"
|
||||
width="75%"
|
||||
/>
|
||||
</div>
|
||||
95
docs/integrations/openclaw.mdx
Normal file
@@ -0,0 +1,95 @@
|
||||
---
|
||||
title: OpenClaw
|
||||
---
|
||||
|
||||
OpenClaw is a personal AI assistant that runs on your own devices. It bridges messaging services (WhatsApp, Telegram, Slack, Discord, iMessage, and more) to AI coding agents through a centralized gateway.
|
||||
|
||||
## Quick start
|
||||
|
||||
```bash
|
||||
ollama launch openclaw
|
||||
```
|
||||
|
||||
Ollama handles everything automatically:
|
||||
|
||||
1. **Install** — If OpenClaw isn't installed, Ollama prompts to install it via npm
|
||||
2. **Security** — On the first launch, a security notice explains the risks of tool access
|
||||
3. **Model** — Pick a model from the selector (local or cloud)
|
||||
4. **Onboarding** — Ollama configures the provider, installs the gateway daemon, sets your model as the primary, and enables OpenClaw's bundled Ollama web search
|
||||
5. **Gateway** — Starts in the background and opens the OpenClaw TUI
|
||||
|
||||
<Note>OpenClaw requires a larger context window. It is recommended to use a context window of at least 64k tokens if using local models. See [Context length](/context-length) for more information.</Note>
|
||||
|
||||
<Note>Previously known as Clawdbot. `ollama launch clawdbot` still works as an alias.</Note>
|
||||
|
||||
## Web search and fetch
|
||||
|
||||
OpenClaw ships with a bundled Ollama `web_search` provider that lets local or cloud-backed Ollama setups search the web through the configured Ollama host.
|
||||
|
||||
```bash
|
||||
ollama launch openclaw
|
||||
```
|
||||
|
||||
Ollama web search is enabled automatically when launching OpenClaw through Ollama. To configure it manually:
|
||||
|
||||
```bash
|
||||
openclaw configure --section web
|
||||
```
|
||||
|
||||
<Note>Ollama web search for local models requires `ollama signin`.</Note>
|
||||
|
||||
## Configure without launching
|
||||
|
||||
To change the model without starting the gateway and TUI:
|
||||
|
||||
```bash
|
||||
ollama launch openclaw --config
|
||||
```
|
||||
|
||||
To use a specific model directly:
|
||||
|
||||
```bash
|
||||
ollama launch openclaw --model kimi-k2.5:cloud
|
||||
```
|
||||
|
||||
If the gateway is already running, it restarts automatically to pick up the new model.
|
||||
|
||||
## Recommended models
|
||||
|
||||
**Cloud models**:
|
||||
|
||||
- `kimi-k2.5:cloud` — Multimodal reasoning with subagents
|
||||
- `qwen3.5:cloud` — Reasoning, coding, and agentic tool use with vision
|
||||
- `glm-5.1:cloud` — Reasoning and code generation
|
||||
- `minimax-m2.7:cloud` — Fast, efficient coding and real-world productivity
|
||||
|
||||
**Local models:**
|
||||
|
||||
- `gemma4` — Reasoning and code generation locally (~16 GB VRAM)
|
||||
- `qwen3.5` — Reasoning, coding, and visual understanding locally (~11 GB VRAM)
|
||||
|
||||
More models at [ollama.com/search](https://ollama.com/search?c=cloud).
|
||||
|
||||
## Non-interactive (headless) mode
|
||||
|
||||
Run OpenClaw without interaction for use in Docker, CI/CD, or scripts:
|
||||
|
||||
```bash
|
||||
ollama launch openclaw --model kimi-k2.5:cloud --yes
|
||||
```
|
||||
|
||||
The `--yes` flag auto-pulls the model, skips selectors, and requires `--model` to be specified.
|
||||
|
||||
## Connect messaging apps
|
||||
|
||||
```bash
|
||||
openclaw configure --section channels
|
||||
```
|
||||
|
||||
Link WhatsApp, Telegram, Slack, Discord, or iMessage to chat with your local models from anywhere.
|
||||
|
||||
## Stopping the gateway
|
||||
|
||||
```bash
|
||||
openclaw gateway stop
|
||||
```
|
||||
31
docs/integrations/opencode.mdx
Normal file
@@ -0,0 +1,31 @@
|
||||
---
|
||||
title: OpenCode
|
||||
---
|
||||
|
||||
OpenCode is an open-source AI coding assistant that runs in your terminal.
|
||||
|
||||
## Install
|
||||
|
||||
Install the [OpenCode CLI](https://opencode.ai):
|
||||
|
||||
```bash
|
||||
curl -fsSL https://opencode.ai/install | bash
|
||||
```
|
||||
|
||||
<Note>OpenCode requires a larger context window. It is recommended to use a context window of at least 64k tokens. See [Context length](/context-length) for more information.</Note>
|
||||
|
||||
## Usage with Ollama
|
||||
|
||||
### Quick setup
|
||||
|
||||
```bash
|
||||
ollama launch opencode
|
||||
```
|
||||
|
||||
To configure without launching:
|
||||
|
||||
```shell
|
||||
ollama launch opencode --config
|
||||
```
|
||||
|
||||
<Note>`ollama launch opencode` passes its configuration to OpenCode inline via the `OPENCODE_CONFIG_CONTENT` environment variable. OpenCode deep-merges its config sources on startup, so anything you declare in `~/.config/opencode/opencode.json` is still respected and available inside OpenCode. Models declared only in `opencode.json` won't appear in `ollama launch`'s model-selection menu.</Note>
|
||||
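
The note above says OpenCode deep-merges its config sources. The sketch below illustrates what a deep merge does in general. It is not OpenCode's implementation: the key names are invented for the example, and letting the second argument win on conflicts is an assumption.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into a copy of `base`.

    Nested dicts are merged key by key; any other value in `override`
    replaces the value in `base`.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical config fragments; the key names are illustrative only.
from_file = {"provider": {"ollama": {"models": {"my-local-model": {}}}}}
inline = {"provider": {"ollama": {"models": {"qwen3-coder": {}}}}}

merged = deep_merge(from_file, inline)
print(sorted(merged["provider"]["ollama"]["models"]))
```

Because nested tables are combined instead of replaced, a model declared only in the file survives alongside the one supplied inline. This matches the behaviour the note describes.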
109
docs/integrations/pi.mdx
Normal file
@@ -0,0 +1,109 @@
|
||||
---
|
||||
title: Pi
|
||||
---
|
||||
|
||||
Pi is a minimal and extensible coding agent.
|
||||
|
||||
## Install
|
||||
|
||||
Install [Pi](https://github.com/badlogic/pi-mono):
|
||||
|
||||
```bash
|
||||
npm install -g @mariozechner/pi-coding-agent
|
||||
```
|
||||
|
||||
## Usage with Ollama
|
||||
|
||||
### Quick setup
|
||||
|
||||
```bash
|
||||
ollama launch pi
|
||||
```
|
||||
|
||||
This installs Pi, configures Ollama as a provider including web tools, and drops you into an interactive session.
|
||||
|
||||
To configure without launching:
|
||||
|
||||
```shell
|
||||
ollama launch pi --config
|
||||
```
|
||||
|
||||
### Run directly with a model
|
||||
|
||||
```shell
|
||||
ollama launch pi --model qwen3.5:cloud
|
||||
```
|
||||
|
||||
Cloud models are also available at [ollama.com](https://ollama.com/search?c=cloud).
|
||||
|
||||
## Extensions
|
||||
|
||||
Pi ships with four core tools: `read`, `write`, `edit`, and `bash`. All other capabilities are added through its extension system.
|
||||
|
||||
Skills are on-demand capability packages invoked via `/skill:name` commands.
|
||||
|
||||
Install from npm or git:
|
||||
|
||||
```bash
|
||||
pi install npm:@foo/some-tools
|
||||
pi install git:github.com/user/repo@v1
|
||||
```
|
||||
|
||||
See all packages at [pi.dev](https://pi.dev/packages).
|
||||
|
||||
### Web search
|
||||
|
||||
Pi can use web search and fetch tools via the `@ollama/pi-web-search` package.
|
||||
|
||||
When launching Pi through Ollama, package install/update is managed automatically.
|
||||
To install manually:
|
||||
|
||||
```bash
|
||||
pi install npm:@ollama/pi-web-search
|
||||
```
|
||||
|
||||
### Autoresearch with `pi-autoresearch`
|
||||
|
||||
[pi-autoresearch](https://github.com/davebcn87/pi-autoresearch) brings autonomous experiment loops to Pi. Inspired by Karpathy's autoresearch, it turns any measurable metric into an optimization target: test speed, bundle size, build time, model training loss, Lighthouse scores.
|
||||
|
||||
```bash
|
||||
pi install https://github.com/davebcn87/pi-autoresearch
|
||||
```
|
||||
|
||||
Tell Pi what to optimize. It runs experiments, benchmarks each one, keeps improvements, reverts regressions, and repeats — all autonomously. A built-in dashboard tracks every run with confidence scoring to distinguish real gains from benchmark noise.
|
||||
|
||||
```bash
|
||||
/autoresearch optimize unit test runtime
|
||||
```
|
||||
|
||||
Each kept experiment is automatically committed. Each failed one is reverted. When you're done, Pi can group improvements into independent branches for clean review and merge.
|
||||
|
||||
## Manual setup
|
||||
|
||||
Add a configuration block to `~/.pi/agent/models.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"providers": {
|
||||
"ollama": {
|
||||
"baseUrl": "http://localhost:11434/v1",
|
||||
"api": "openai-completions",
|
||||
"apiKey": "ollama",
|
||||
"models": [
|
||||
{
|
||||
"id": "qwen3-coder"
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Update `~/.pi/agent/settings.json` to set the default provider:
|
||||
|
||||
```json
|
||||
{
|
||||
"defaultProvider": "ollama",
|
||||
"defaultModel": "qwen3-coder"
|
||||
}
|
||||
```
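The two files above can also be generated from a script. The following is a minimal sketch, assuming only the file names and keys shown in this section; the helper name `write_pi_config` and its `root` parameter are hypothetical, and the example writes to a temporary directory rather than the real `~/.pi/agent` so it is safe to run.

```python
import json
import tempfile
from pathlib import Path


def write_pi_config(root: Path, model: str = "qwen3-coder") -> tuple[Path, Path]:
    """Write models.json and settings.json under `root` (hypothetical helper)."""
    # Provider block copied from the manual setup section above.
    models = {
        "providers": {
            "ollama": {
                "baseUrl": "http://localhost:11434/v1",
                "api": "openai-completions",
                "apiKey": "ollama",
                "models": [{"id": model}],
            }
        }
    }
    # The default model must match an id listed under the provider.
    settings = {"defaultProvider": "ollama", "defaultModel": model}

    root.mkdir(parents=True, exist_ok=True)
    models_path = root / "models.json"
    settings_path = root / "settings.json"
    models_path.write_text(json.dumps(models, indent=2) + "\n")
    settings_path.write_text(json.dumps(settings, indent=2) + "\n")
    return models_path, settings_path


if __name__ == "__main__":
    # Demonstrate against a throwaway directory, not the real config.
    with tempfile.TemporaryDirectory() as tmp:
        models_path, settings_path = write_pi_config(Path(tmp) / "agent")
        print(models_path.read_text())
        print(settings_path.read_text())
```

Pointing `root` at `Path.home() / ".pi" / "agent"` would overwrite any existing files there, so merge by hand if other providers are already configured.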
|
||||
54
docs/integrations/pool.mdx
Normal file
@@ -0,0 +1,54 @@
|
||||
---
|
||||
title: Pool
|
||||
---
|
||||
|
||||
Pool is Poolside's software agent for the terminal, built for enterprise development workflows.
|
||||
|
||||
## Install
|
||||
|
||||
Install [Pool](https://github.com/poolsideai/pool).
|
||||
|
||||
## Usage with Ollama
|
||||
|
||||
### Quick setup
|
||||
|
||||
```shell
|
||||
ollama launch pool
|
||||
```
|
||||
|
||||
### Run directly with a model
|
||||
|
||||
```shell
|
||||
ollama launch pool --model kimi-k2.6:cloud
|
||||
```
|
||||
|
||||
### Pass arguments through to Pool
|
||||
|
||||
Arguments after `--` are passed directly to Pool:
|
||||
|
||||
```shell
|
||||
ollama launch pool -- --help
|
||||
```
|
||||
|
||||
## Manual setup
|
||||
|
||||
Pool connects to Ollama using the OpenAI-compatible API via environment variables.
|
||||
|
||||
1. Set the environment variables:
|
||||
|
||||
```shell
|
||||
export POOLSIDE_STANDALONE_BASE_URL=http://localhost:11434/v1
|
||||
export POOLSIDE_API_KEY=ollama
|
||||
```
|
||||
|
||||
2. Run Pool with an Ollama model:
|
||||
|
||||
```shell
|
||||
pool -m kimi-k2.6:cloud
|
||||
```
|
||||
|
||||
Or run with environment variables inline:
|
||||
|
||||
```shell
|
||||
POOLSIDE_STANDALONE_BASE_URL=http://localhost:11434/v1 POOLSIDE_API_KEY=ollama pool -m kimi-k2.6:cloud
|
||||
```
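The same manual setup can be driven from a script, for example when Pool is started by another tool. This is a minimal sketch, assuming the two environment variables and the `-m` flag shown above; the helper name `pool_command` is hypothetical, and the script only starts Pool if a `pool` binary is found on `PATH`.

```python
import os
import shutil
import subprocess


def pool_command(model: str, base_url: str = "http://localhost:11434/v1"):
    """Return (argv, env) for running Pool against Ollama (hypothetical helper)."""
    env = dict(os.environ)  # copy, so the parent environment is left untouched
    env["POOLSIDE_STANDALONE_BASE_URL"] = base_url
    env["POOLSIDE_API_KEY"] = "ollama"
    return ["pool", "-m", model], env


if __name__ == "__main__":
    argv, env = pool_command("kimi-k2.6:cloud")
    if shutil.which("pool") is None:
        # Pool is not installed; show what would have been run.
        print("pool not found on PATH; would run:", " ".join(argv))
    else:
        subprocess.run(argv, env=env, check=True)
```

Copying `os.environ` keeps variables such as `PATH` available to Pool while scoping the Ollama settings to the child process only.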
|
||||
30
docs/integrations/roo-code.mdx
Normal file
@@ -0,0 +1,30 @@
|
||||
---
|
||||
title: Roo Code
|
||||
---
|
||||
|
||||
|
||||
## Install
|
||||
|
||||
Install [Roo Code](https://marketplace.visualstudio.com/items?itemName=RooVeterinaryInc.roo-cline) from the VS Code Marketplace.
|
||||
|
||||
## Usage with Ollama
|
||||
|
||||
1. Open Roo Code in VS Code and click the **gear icon** in the top-right corner of the Roo Code window to open **Provider Settings**
|
||||
2. Set `API Provider` to `Ollama`
|
||||
3. (Optional) Update `Base URL` if your Ollama instance is running remotely. The default is `http://localhost:11434`
|
||||
4. Enter a valid `Model ID` (for example `qwen3` or `qwen3-coder:480b-cloud`)
|
||||
5. Adjust the `Context Window` to at least 32K tokens for coding tasks
|
||||
|
||||
<Note>Coding tools require a larger context window. It is recommended to use a context window of at least 32K tokens. See [Context length](/context-length) for more information.</Note>
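To find a valid `Model ID` for step 4, one option is to ask the Ollama server which models it has. The following is a minimal sketch, assuming a server at the default base URL and Ollama's `/api/tags` listing endpoint; the helper names `model_ids` and `fetch_model_ids` are illustrative and not part of Roo Code or Ollama.

```python
import json
import urllib.request


def model_ids(tags_payload: dict) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [model["name"] for model in tags_payload.get("models", [])]


def fetch_model_ids(base_url: str = "http://localhost:11434") -> list[str]:
    """Query an Ollama server for the models it has available."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return model_ids(json.load(resp))


if __name__ == "__main__":
    try:
        for name in fetch_model_ids():
            print(name)
    except OSError as exc:  # URLError and timeouts are OSError subclasses
        print(f"Could not reach Ollama: {exc}")
```

Any name printed by the script can be pasted into the `Model ID` field. Pass a different `base_url` if the `Base URL` in step 3 was changed.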
|
||||
|
||||
## Connecting to ollama.com
|
||||
|
||||
1. Create an [API key](https://ollama.com/settings/keys) from ollama.com
|
||||
2. Enable `Use custom base URL` and set it to `https://ollama.com`
|
||||
3. Enter your **Ollama API Key**
|
||||
4. Select a model from the list
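Before entering the key in the editor, it can be useful to check that it is accepted. This is a minimal sketch, assuming ollama.com takes the key as a bearer token and serves a model list at `/api/tags`; the `OLLAMA_API_KEY` environment variable and the helper name `cloud_request` are assumptions made for illustration.

```python
import json
import os
import urllib.request


def cloud_request(api_key: str, path: str = "/api/tags") -> urllib.request.Request:
    """Build an authenticated request for ollama.com (hypothetical helper)."""
    req = urllib.request.Request(f"https://ollama.com{path}")
    # Assumption: the API key is sent as a bearer token.
    req.add_header("Authorization", f"Bearer {api_key}")
    return req


if __name__ == "__main__":
    key = os.environ.get("OLLAMA_API_KEY")
    if not key:
        print("Set OLLAMA_API_KEY to test the connection.")
    else:
        try:
            with urllib.request.urlopen(cloud_request(key), timeout=10) as resp:
                names = [m["name"] for m in json.load(resp).get("models", [])]
                print("\n".join(names))
        except OSError as exc:  # covers HTTP errors and network failures
            print(f"Request failed: {exc}")
```

Reading the key from the environment keeps it out of the script and out of shell history.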
|
||||
|
||||
### Recommended Models
|
||||
|
||||
- `qwen3-coder:480b`
|
||||
- `deepseek-v3.1:671b`
|
||||
85
docs/integrations/vscode.mdx
Normal file
@@ -0,0 +1,85 @@
|
||||
---
|
||||
title: VS Code
|
||||
---
|
||||
|
||||
VS Code includes built-in AI chat through GitHub Copilot Chat. Ollama models can be used directly in the Copilot Chat model picker.
|
||||
|
||||
|
||||

|
||||
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Ollama v0.18.3+
|
||||
- [VS Code 1.113+](https://code.visualstudio.com/download)
|
||||
- [GitHub Copilot Chat extension 0.41.0+](https://marketplace.visualstudio.com/items?itemName=GitHub.copilot-chat)
|
||||
|
||||
<Note> VS Code requires you to be logged in to use its model selector, even for custom models. This doesn't require a paid GitHub Copilot account; GitHub Copilot Free will enable model selection for custom models.</Note>
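The Ollama version prerequisite can be checked programmatically. The following is a minimal sketch, assuming a local server at the default port and Ollama's `/api/version` endpoint; `parse_version` and `meets_minimum` are hypothetical helpers, and the comparison ignores pre-release suffixes such as `-rc1`.

```python
import json
import urllib.request


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.18.3' or 'v0.18.3-rc1' into a comparable tuple of ints."""
    core = text.lstrip("v").split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def meets_minimum(installed: str, minimum: str = "0.18.3") -> bool:
    """Return True if `installed` is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    try:
        url = "http://localhost:11434/api/version"
        with urllib.request.urlopen(url, timeout=5) as resp:
            version = json.load(resp)["version"]
        status = "OK" if meets_minimum(version) else "too old"
        print(f"Ollama {version}: {status}")
    except OSError as exc:  # server not running or unreachable
        print(f"Could not reach Ollama: {exc}")
```

Comparing tuples of integers avoids the string-comparison trap where `"0.9.0"` would sort after `"0.18.3"`.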
|
||||
|
||||
## Quick setup
|
||||
|
||||
```shell
|
||||
ollama launch vscode
|
||||
```
|
||||
|
||||
Recommended models will be shown after running the command. See the latest models at [ollama.com](https://ollama.com/search?c=tools).
|
||||
|
||||
Make sure **Local** is selected at the bottom of the Copilot Chat panel to use your Ollama models.
|
||||
<div style={{ display: "flex", justifyContent: "center" }}>
|
||||
<img
|
||||
src="/images/local.png"
|
||||
alt="Ollama Local Models"
|
||||
width="60%"
|
||||
style={{ borderRadius: "4px", marginTop: "10px", marginBottom: "10px" }}
|
||||
/>
|
||||
</div>
|
||||
|
||||
|
||||
## Run directly with a model
|
||||
|
||||
```shell
|
||||
ollama launch vscode --model qwen3.5:cloud
|
||||
```
|
||||
Cloud models are also available at [ollama.com](https://ollama.com/search?c=cloud).
|
||||
|
||||
## Manual setup
|
||||
|
||||
To configure Ollama manually without `ollama launch`:
|
||||
|
||||
1. Open the **Copilot Chat** sidebar from the top-right corner
|
||||
<div style={{ display: "flex", justifyContent: "center" }}>
|
||||
<img
|
||||
src="/images/vscode-sidebar.png"
|
||||
alt="VS Code chat Sidebar"
|
||||
width="75%"
|
||||
style={{ borderRadius: "4px" }}
|
||||
/>
|
||||
</div>
|
||||
2. Click the **settings gear icon** (<Icon icon="gear" />) to bring up the Language Models window
|
||||
<div style={{ display: "flex", justifyContent: "center" }}>
|
||||
<img
|
||||
src="/images/vscode-other-models.png"
|
||||
alt="VS Code model picker"
|
||||
width="75%"
|
||||
style={{ borderRadius: "4px" }}
|
||||
/>
|
||||
</div>
|
||||
3. Click **Add Models** and select **Ollama** to load all your Ollama models into VS Code
|
||||
<div style={{ display: "flex", justifyContent: "center" }}>
|
||||
<img
|
||||
src="/images/vscode-add-ollama.png"
|
||||
alt="VS Code model options dropdown to add ollama models"
|
||||
width="75%"
|
||||
style={{ borderRadius: "4px" }}
|
||||
/>
|
||||
</div>
|
||||
|
||||
4. Click the **Unhide** button in the model picker to show your Ollama models
|
||||
<div style={{ display: "flex", justifyContent: "center" }}>
|
||||
<img
|
||||
src="/images/vscode-unhide.png"
|
||||
alt="VS Code unhide models button"
|
||||
width="75%"
|
||||
style={{ borderRadius: "4px" }}
|
||||
/>
|
||||
</div>
|
||||
45
docs/integrations/xcode.mdx
Normal file
@@ -0,0 +1,45 @@
|
||||
---
|
||||
title: Xcode
|
||||
---
|
||||
|
||||
## Install
|
||||
|
||||
Install [Xcode](https://developer.apple.com/xcode/).
|
||||
|
||||
|
||||
## Usage with Ollama
|
||||
<Note> Ensure Apple Intelligence is set up and that you are running the latest Xcode version (v26.0). </Note>
|
||||
|
||||
1. Click **Xcode** in the top-left corner, then **Settings**
|
||||
<div style={{ display: 'flex', justifyContent: 'center' }}>
|
||||
<img
|
||||
src="/images/xcode-intelligence-window.png"
|
||||
alt="Xcode Intelligence window"
|
||||
width="50%"
|
||||
/>
|
||||
</div>
|
||||
|
||||
2. Select **Locally Hosted**, enter port **11434** and click **Add**
|
||||
<div style={{ display: 'flex', justifyContent: 'center' }}>
|
||||
<img
|
||||
src="/images/xcode-locally-hosted.png"
|
||||
alt="Xcode settings"
|
||||
width="50%"
|
||||
/>
|
||||
</div>
|
||||
|
||||
3. Select the **star icon** in the top-left corner and click the **dropdown**
|
||||
<div style={{ display: 'flex', justifyContent: 'center' }}>
|
||||
<img
|
||||
src="/images/xcode-chat-icon.png"
|
||||
alt="Xcode chat icon"
|
||||
width="50%"
|
||||
/>
|
||||
</div>
|
||||
4. Click **My Account** and select your desired model
|
||||
|
||||
|
||||
## Connecting to ollama.com directly
|
||||
1. Create an [API key](https://ollama.com/settings/keys) from ollama.com
|
||||
2. Select **Internet Hosted** and enter `https://ollama.com` as the URL
|
||||
3. Enter your **Ollama API Key** and click **Add**
|
||||
38
docs/integrations/zed.mdx
Normal file
@@ -0,0 +1,38 @@
|
||||
---
|
||||
title: Zed
|
||||
---
|
||||
|
||||
## Install
|
||||
|
||||
Install [Zed](https://zed.dev/download).
|
||||
|
||||
## Usage with Ollama
|
||||
|
||||
1. In Zed, click the **star icon** in the bottom-right corner, then select **Configure**.
|
||||
|
||||
<div style={{ display: 'flex', justifyContent: 'center' }}>
|
||||
<img
|
||||
src="/images/zed-settings.png"
|
||||
alt="Zed star icon in bottom right corner"
|
||||
width="50%"
|
||||
/>
|
||||
</div>
|
||||
|
||||
2. Under **LLM Providers**, choose **Ollama**
|
||||
3. Confirm the **Host URL** is `http://localhost:11434`, then click **Connect**
|
||||
4. Once connected, select a model under **Ollama**
|
||||
|
||||
<div style={{ display: 'flex', justifyContent: 'center' }}>
|
||||
<img
|
||||
src="/images/zed-ollama-dropdown.png"
|
||||
alt="Zed Ollama model dropdown"
|
||||
width="50%"
|
||||
/>
|
||||
</div>
|
||||
|
||||
## Connecting to ollama.com
|
||||
1. Create an [API key](https://ollama.com/settings/keys) on **ollama.com**
|
||||
2. In Zed, open the **star icon** → **Configure**
|
||||
3. Under **LLM Providers**, select **Ollama**
|
||||
4. Set the **API URL** to `https://ollama.com`
|
||||
|
||||