Compare commits


70 Commits

Author SHA1 Message Date
Jeffrey Morgan
7e26a8df31 cmd: use environment variables for server options 2023-08-10 14:17:53 -07:00
Jeffrey Morgan
4ab1da38ba guard around id() 2023-08-10 14:11:54 -07:00
Patrick Devine
be989d89d1 Token auth (#314) 2023-08-10 11:34:25 -07:00
Soroush Javadi
bea683e3bf cmd: check GetBlobsPath error (#317)
The error returned by `server.GetBlobsPath` in `showLayer` was never
checked. Check the error and return if not nil. Also, make newlines at
the end of error messages consistent and fix a typo.
2023-08-10 09:57:49 -07:00
Jeffrey Morgan
178237d37f tweak README.md 2023-08-10 09:54:03 -07:00
Jeffrey Morgan
76a678af34 app: dont always show installer window on top now that it lives in the dock 2023-08-10 09:53:46 -07:00
Jeffrey Morgan
f65169b13e clean up cli flags 2023-08-10 09:28:56 -07:00
Jeffrey Morgan
040a5b9750 clean up cli flags 2023-08-10 09:27:03 -07:00
Bruce MacDonald
5b5cc9c9f1 embeddings endpoint 2023-08-10 11:49:55 -04:00
Bruce MacDonald
4b3507f036 embeddings endpoint
Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>
2023-08-10 11:45:57 -04:00
Jun Tian
5ebce03c77 Add an example on multiline input (#311) 2023-08-10 08:22:28 -07:00
Bruce MacDonald
5e25f801ed fix a typo in the tweetwriter example Modelfile 2023-08-10 10:19:53 -04:00
Bruce MacDonald
8e1234b758 fix embeddings invalid values 2023-08-10 10:17:00 -04:00
Soroush Javadi
10885986b8 fix a typo in the tweetwriter example Modelfile 2023-08-10 15:12:48 +03:30
Bruce MacDonald
984c9c628c fix embeddings invalid values 2023-08-09 16:50:53 -04:00
Bruce MacDonald
c4861360ec remove embed docs 2023-08-09 16:14:19 -04:00
Bruce MacDonald
9738ef85db allow for concurrent pulls of the same files 2023-08-09 11:35:24 -04:00
Bruce MacDonald
ac971c56d1 Update images.go 2023-08-09 11:31:54 -04:00
Bruce MacDonald
8228d166ce pr comments 2023-08-09 11:31:54 -04:00
Bruce MacDonald
907e6c56b3 unlock download in case of requestDownload err 2023-08-09 11:31:54 -04:00
Bruce MacDonald
868e3b31c7 allow for concurrent pulls of the same files 2023-08-09 11:31:54 -04:00
Bruce MacDonald
09d8bf6730 fix build errors 2023-08-09 10:45:57 -04:00
Bruce MacDonald
7a5f3616fd embed text document in modelfile 2023-08-09 10:26:19 -04:00
Jeffrey Morgan
cff002b824 use content type application/x-ndjson for streaming responses 2023-08-08 21:38:10 -07:00
Jeffrey Morgan
55cf5021f0 update langchain example to include python 2023-08-08 21:03:10 -07:00
Jeffrey Morgan
f58caa5ab5 update README.md 2023-08-08 15:50:23 -07:00
Jeffrey Morgan
82df473ec9 use note syntax in README.md 2023-08-08 15:49:50 -07:00
Jeffrey Morgan
e184c1d035 Link to api.md in README.md 2023-08-08 15:48:47 -07:00
Jeffrey Morgan
371d4e5df3 docs: fix invalid json in api.md 2023-08-08 15:46:05 -07:00
Jeffrey Morgan
1f78e409b4 docs: format with prettier 2023-08-08 15:41:48 -07:00
Jeffrey Morgan
34a88cd776 docs: update api.md formatting 2023-08-08 15:41:19 -07:00
Bruce MacDonald
1bee2347be pr feedback
- defer closing llm on embedding
- do not override licenses
- remove debugging print line
- reformat model file docs
2023-08-08 17:01:37 -04:00
Jeffrey Morgan
a027a7dd65 add 0.0.0.0 as an allowed origin by default
Fixes #282
2023-08-08 13:39:50 -07:00
Jeffrey Morgan
22986ccb38 add llama2:70b to the model library list 2023-08-08 13:08:05 -07:00
Bruce MacDonald
884d78ceb3 allow embedding from model binary 2023-08-08 14:38:57 -04:00
Bruce MacDonald
3ceac05108 Add embedding docs 2023-08-08 14:04:11 -04:00
Bruce MacDonald
21ddcaa1f1 pr comments
- default to embeddings enabled
- move embedding logic for loaded model to request
- allow embedding full directory
- close llm on reload
2023-08-08 13:49:37 -04:00
Michael Yang
f2074ed4c0 Merge pull request #306 from jmorganca/default-keep-system
automatically set num_keep if num_keep < 0
2023-08-08 09:25:34 -07:00
Bruce MacDonald
a6f6d18f83 embed text document in modelfile 2023-08-08 11:27:17 -04:00
Bruce MacDonald
34a13a9d05 pass flags to serve to allow setting allowed-origins + host and port 2023-08-08 10:41:42 -04:00
Jeffrey Morgan
8713ac23a8 allow overriding template and system in /api/generate
Fixes #297
Fixes #296
2023-08-08 00:55:34 -04:00
Jeffrey Morgan
5eb712f962 trim whitespace before checking stop conditions
Fixes #295
2023-08-08 00:29:19 -04:00
Michael Yang
4dc5b117dd automatically set num_keep if num_keep < 0
num_keep defines how many tokens to keep in the context when truncating
inputs. if left to its default value of -1, the server will calculate
num_keep to be the length of the system instructions
2023-08-07 16:19:12 -07:00
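The fallback this commit describes is small enough to sketch in code. A hedged illustration with made-up names (not the upstream implementation):

```go
package main

import "fmt"

// effectiveNumKeep sketches the default described above: a negative
// num_keep means "keep the tokenized system prompt" when truncating
// the context. Names here are illustrative, not upstream's.
func effectiveNumKeep(numKeep int, systemTokens []int) int {
	if numKeep < 0 {
		return len(systemTokens)
	}
	return numKeep
}

func main() {
	systemTokens := []int{1, 2, 3, 4} // pretend-tokenized system prompt
	fmt.Println(effectiveNumKeep(-1, systemTokens)) // 4: falls back to system prompt length
	fmt.Println(effectiveNumKeep(8, systemTokens))  // 8: explicit value wins
}
```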
Matt Williams
931a5f3cb9 Merge pull request #304 from jmorganca/matt/docs
missed a backtick
2023-08-07 15:14:06 -07:00
Jeffrey Morgan
639288bf2b make ollama binary executable on build 2023-08-07 18:10:37 -04:00
Jeffrey Morgan
d112c15d58 remove old library and web directories 2023-08-07 18:09:24 -04:00
Matt Williams
1267895e44 missed a backtick
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-07 13:53:49 -07:00
Matt Williams
089d03bc8d Merge pull request #289 from jmorganca/docs
First draft of API Docs
2023-08-07 13:46:22 -07:00
Michael Yang
ab3ced9d32 Merge pull request #276 from jmorganca/rope-freq
configurable rope frequency parameters
2023-08-07 13:39:38 -07:00
Matt Williams
0c52b4509b get rid of namespace and site
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-07 13:27:58 -07:00
Matt Williams
13aace3d34 clarify some more
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-07 13:21:54 -07:00
Matt Williams
2b3bb41598 model name format added
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-07 13:17:16 -07:00
cmiller01
93492f1e18 correct precedence of serve params (args over env over default) 2023-08-07 19:55:20 +00:00
Michael Chiang
54ba3e2ceb langchain JS integration (#302)
langchain JS integration
2023-08-07 12:21:36 -04:00
Matt Williams
4904cd8bcd update simpler code samples
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-07 07:40:38 -07:00
Matt Williams
8a45359ec6 Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-08-07 07:33:05 -07:00
cmiller01
fb593b7bfc pass flags to serve to allow setting allowed-origins + host and port
* resolves: https://github.com/jmorganca/ollama/issues/300 and
https://github.com/jmorganca/ollama/issues/282

* example usage:
```
ollama serve --port 9999 --allowed-origins "http://foo.example.com,http://192.0.0.1"
```
2023-08-07 03:34:37 +00:00
Matt Williams
2544b8afa1 update as per Mike's comments
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 17:42:24 -07:00
Matt Williams
ac1b04f271 Update docs/api.md
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-08-04 17:40:52 -07:00
Matt Williams
123fdeb919 Update docs/api.md
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-08-04 17:38:52 -07:00
Matt Williams
5c82bf95d1 Update docs/api.md
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-08-04 17:12:24 -07:00
Matt Williams
38a9b1618c missed some quotes
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 16:09:07 -07:00
Matt Williams
c18be72a3b complete 1st draft of api docs
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 16:08:11 -07:00
Matt Williams
a101fe51a7 clean up
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 12:56:41 -07:00
Bruce MacDonald
06fc48ad66 Update README.md (#285)
Ollama now supports Intel Macs
2023-08-04 15:45:55 -04:00
Matt Williams
d93e2f9210 fleshing out response
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 12:38:58 -07:00
Matt Williams
31edc829fc continuing
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 12:30:23 -07:00
Matt Williams
b31104768c filling out generate
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 12:27:47 -07:00
Matt Williams
b662d9fd8c starting to build out some docs
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 11:55:00 -07:00
Michael Yang
b9f4d67554 configurable rope frequency parameters 2023-08-03 22:11:58 -07:00
52 changed files with 1281 additions and 5950 deletions


@@ -9,13 +9,13 @@
 [![Discord](https://dcbadge.vercel.app/api/server/ollama?style=flat&compact=true)](https://discord.gg/ollama)
-> Note: Ollama is in early preview. Please report any issues you find.
 Run, create, and share large language models (LLMs).
+> Note: Ollama is in early preview. Please report any issues you find.
 ## Download
-- [Download](https://ollama.ai/download) for macOS on Apple Silicon (Intel coming soon)
+- [Download](https://ollama.ai/download) for macOS
 - Download for Windows and Linux (coming soon)
 - Build [from source](#building)
@@ -34,8 +34,9 @@ ollama run llama2
 | Model                    | Parameters | Size  | Download                        |
 | ------------------------ | ---------- | ----- | ------------------------------- |
 | Llama2                   | 7B         | 3.8GB | `ollama pull llama2`            |
-| Llama2 Uncensored        | 7B         | 3.8GB | `ollama pull llama2-uncensored` |
 | Llama2 13B               | 13B        | 7.3GB | `ollama pull llama2:13b`        |
+| Llama2 70B               | 70B        | 39GB  | `ollama pull llama2:70b`        |
+| Llama2 Uncensored        | 7B         | 3.8GB | `ollama pull llama2-uncensored` |
 | Orca Mini                | 3B         | 1.9GB | `ollama pull orca`              |
 | Vicuna                   | 7B         | 3.8GB | `ollama pull vicuna`            |
 | Nous-Hermes              | 13B        | 7.3GB | `ollama pull nous-hermes`       |
@@ -53,6 +54,15 @@ ollama run llama2
 Hello! How can I help you today?
 ```
+For multiline input, you can wrap text with `"""`:
+```
+>>> """Hello,
+... world!
+... """
+I'm a basic program that prints the famous "Hello, world!" message to the console.
+```
 ### Create a custom model
 Pull a base model:
@@ -60,6 +70,7 @@ Pull a base model:
 ```
 ollama pull llama2
 ```
+> To update a model to the latest version, run `ollama pull llama2` again. The model will be updated (if necessary).
 Create a `Modelfile`:
@@ -85,9 +96,7 @@ ollama run mario
 Hello! It's your friend Mario.
 ```
-For more examples, see the [examples](./examples) directory.
-For more information on creating a Modelfile, see the [Modelfile](./docs/modelfile.md) documentation.
+For more examples, see the [examples](./examples) directory. For more information on creating a Modelfile, see the [Modelfile](./docs/modelfile.md) documentation.
 ### Pull a model from the registry
@@ -132,24 +141,20 @@ Finally, run a model!
 ## REST API
-### `POST /api/generate`
+> See the [API documentation](./docs/api.md) for all endpoints.
-Generate text from a model.
+Ollama has an API for running and managing models. For example to generate text from a model:
 ```
-curl -X POST http://localhost:11434/api/generate -d '{"model": "llama2", "prompt":"Why is the sky blue?"}'
+curl -X POST http://localhost:11434/api/generate -d '{
+  "model": "llama2",
+  "prompt":"Why is the sky blue?"
+}'
 ```
-### `POST /api/create`
-Create a model from a `Modelfile`.
-```
-curl -X POST http://localhost:11434/api/create -d '{"name": "my-model", "path": "/path/to/modelfile"}'
-```
-## Projects built with Ollama
+## Tools using Ollama
 - [LangChain](https://python.langchain.com/docs/integrations/llms/ollama) and [LangChain.js](https://js.langchain.com/docs/modules/model_io/models/llms/integrations/ollama) with a question-answering [example](https://js.langchain.com/docs/use_cases/question_answering/local_retrieval_qa).
 - [Continue](https://github.com/continuedev/continue) - embeds Ollama inside Visual Studio Code. The extension lets you highlight code to add to the prompt, ask questions in the sidebar, and generate code inline.
 - [Discord AI Bot](https://github.com/mekb-turtle/discord-ai-bot) - interact with Ollama as a chatbot on Discord.
 - [Raycast Ollama](https://github.com/MassimilianoPasquini97/raycast_ollama) - Raycast extension to use Ollama for local llama inference on Raycast.


@@ -33,13 +33,26 @@ func (e StatusError) Error() string {
 }
 type GenerateRequest struct {
-	Model   string                 `json:"model"`
-	Prompt  string                 `json:"prompt"`
-	Context []int                  `json:"context,omitempty"`
+	Model    string                 `json:"model"`
+	Prompt   string                 `json:"prompt"`
+	System   string                 `json:"system"`
+	Template string                 `json:"template"`
+	Context  []int                  `json:"context,omitempty"`
 	Options map[string]interface{} `json:"options"`
 }
+type EmbeddingRequest struct {
+	Model   string                 `json:"model"`
+	Prompt  string                 `json:"prompt"`
+	Options map[string]interface{} `json:"options"`
+}
+type EmbeddingResponse struct {
+	Embedding []float64 `json:"embedding"`
+}
 type CreateRequest struct {
 	Name string `json:"name"`
 	Path string `json:"path"`
@@ -85,6 +98,10 @@ type ListResponseModel struct {
 	Size int `json:"size"`
 }
+type TokenResponse struct {
+	Token string `json:"token"`
+}
 type GenerateResponse struct {
 	Model     string    `json:"model"`
 	CreatedAt time.Time `json:"created_at"`
@@ -147,19 +164,21 @@ type Options struct {
 	UseNUMA bool `json:"numa,omitempty"`
 	// Model options
-	NumCtx        int  `json:"num_ctx,omitempty"`
-	NumKeep       int  `json:"num_keep,omitempty"`
-	NumBatch      int  `json:"num_batch,omitempty"`
-	NumGQA        int  `json:"num_gqa,omitempty"`
-	NumGPU        int  `json:"num_gpu,omitempty"`
-	MainGPU       int  `json:"main_gpu,omitempty"`
-	LowVRAM       bool `json:"low_vram,omitempty"`
-	F16KV         bool `json:"f16_kv,omitempty"`
-	LogitsAll     bool `json:"logits_all,omitempty"`
-	VocabOnly     bool `json:"vocab_only,omitempty"`
-	UseMMap       bool `json:"use_mmap,omitempty"`
-	UseMLock      bool `json:"use_mlock,omitempty"`
-	EmbeddingOnly bool `json:"embedding_only,omitempty"`
+	NumCtx             int     `json:"num_ctx,omitempty"`
+	NumKeep            int     `json:"num_keep,omitempty"`
+	NumBatch           int     `json:"num_batch,omitempty"`
+	NumGQA             int     `json:"num_gqa,omitempty"`
+	NumGPU             int     `json:"num_gpu,omitempty"`
+	MainGPU            int     `json:"main_gpu,omitempty"`
+	LowVRAM            bool    `json:"low_vram,omitempty"`
+	F16KV              bool    `json:"f16_kv,omitempty"`
+	LogitsAll          bool    `json:"logits_all,omitempty"`
+	VocabOnly          bool    `json:"vocab_only,omitempty"`
+	UseMMap            bool    `json:"use_mmap,omitempty"`
+	UseMLock           bool    `json:"use_mlock,omitempty"`
+	EmbeddingOnly      bool    `json:"embedding_only,omitempty"`
+	RopeFrequencyBase  float32 `json:"rope_frequency_base,omitempty"`
+	RopeFrequencyScale float32 `json:"rope_frequency_scale,omitempty"`
 	// Predict options
 	RepeatLastN int `json:"repeat_last_n,omitempty"`
@@ -261,14 +280,18 @@ func DefaultOptions() Options {
 		UseNUMA: false,
-		NumCtx:   2048,
-		NumBatch: 512,
-		NumGPU:   1,
-		NumGQA:   1,
-		LowVRAM:  false,
-		F16KV:    true,
-		UseMMap:  true,
-		UseMLock: false,
+		NumCtx:             2048,
+		NumKeep:            -1,
+		NumBatch:           512,
+		NumGPU:             1,
+		NumGQA:             1,
+		LowVRAM:            false,
+		F16KV:              true,
+		UseMMap:            true,
+		UseMLock:           false,
+		RopeFrequencyBase:  10000.0,
+		RopeFrequencyScale: 1.0,
+		EmbeddingOnly:      true,
 		RepeatLastN:   64,
 		RepeatPenalty: 1.1,
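The `EmbeddingRequest` and `EmbeddingResponse` types above pair with the embeddings endpoint added in this range. As a rough illustration of how a client would use them, here is a minimal Go sketch; the `/api/embeddings` route and the default local port are assumptions inferred from the commit messages, not something this diff shows directly:

```go
// Minimal client sketch for the new embedding types. Assumption: the
// endpoint is POST /api/embeddings on a locally running server.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Mirrors api.EmbeddingRequest from the diff above.
type EmbeddingRequest struct {
	Model   string                 `json:"model"`
	Prompt  string                 `json:"prompt"`
	Options map[string]interface{} `json:"options"`
}

// Mirrors api.EmbeddingResponse from the diff above.
type EmbeddingResponse struct {
	Embedding []float64 `json:"embedding"`
}

func main() {
	body, err := json.Marshal(EmbeddingRequest{Model: "llama2", Prompt: "Why is the sky blue?"})
	if err != nil {
		panic(err)
	}
	resp, err := http.Post("http://127.0.0.1:11434/api/embeddings", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var er EmbeddingResponse
	if err := json.NewDecoder(resp.Body).Decode(&er); err != nil {
		panic(err)
	}
	fmt.Printf("got %d-dimensional embedding\n", len(er.Embedding))
}
```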


@@ -71,7 +71,6 @@ function firstRunWindow() {
       nodeIntegration: true,
       contextIsolation: false,
     },
-    alwaysOnTop: true,
   })
   require('@electron/remote/main').enable(welcomeWindow.webContents)
@@ -237,13 +236,18 @@ app.on('window-all-closed', () => {
 // In this file you can include the rest of your app's specific main process
 // code. You can also put them in separate files and import them here.
+let aid = ''
+try {
+  aid = id()
+} catch (e) {}
 autoUpdater.setFeedURL({
-  url: `https://ollama.ai/api/update?os=${process.platform}&arch=${process.arch}&version=${app.getVersion()}`,
+  url: `https://ollama.ai/api/update?os=${process.platform}&arch=${process.arch}&version=${app.getVersion()}&id=${aid}`,
 })
 async function heartbeat() {
   analytics.track({
-    anonymousId: id(),
+    anonymousId: aid,
     event: 'heartbeat',
     properties: {
       version: app.getVersion(),


@@ -48,12 +48,18 @@ func CreateHandler(cmd *cobra.Command, args []string) error {
 			spinner.Stop()
 		}
 		currentDigest = resp.Digest
-		bar = progressbar.DefaultBytes(
-			int64(resp.Total),
-			fmt.Sprintf("pulling %s...", resp.Digest[7:19]),
-		)
-		bar.Set(resp.Completed)
+		switch {
+		case strings.Contains(resp.Status, "embeddings"):
+			bar = progressbar.Default(int64(resp.Total), resp.Status)
+			bar.Set(resp.Completed)
+		default:
+			// pulling
+			bar = progressbar.DefaultBytes(
+				int64(resp.Total),
+				resp.Status,
+			)
+			bar.Set(resp.Completed)
+		}
 	} else if resp.Digest == currentDigest && resp.Digest != "" {
 		bar.Set(resp.Completed)
 	} else {
@@ -312,12 +318,16 @@ func generate(cmd *cobra.Command, model, prompt string) error {
 func showLayer(l *server.Layer) {
 	filename, err := server.GetBlobsPath(l.Digest)
-	bts, err := os.ReadFile(filename)
 	if err != nil {
-		fmt.Printf("Couldn't read layer")
+		fmt.Println("Couldn't get layer's path")
 		return
 	}
-	fmt.Printf(string(bts) + "\n")
+	bts, err := os.ReadFile(filename)
+	if err != nil {
+		fmt.Println("Couldn't read layer")
+		return
+	}
+	fmt.Println(string(bts))
 }
@@ -454,7 +464,7 @@ func generateInteractive(cmd *cobra.Command, model string) error {
 	mp := server.ParseModelPath(model)
 	manifest, err := server.GetManifest(mp)
 	if err != nil {
-		fmt.Printf("error: couldn't get a manifestfor this model")
+		fmt.Println("error: couldn't get a manifest for this model")
 		continue
 	}
 	switch args[1] {
@@ -513,15 +523,21 @@ func generateBatch(cmd *cobra.Command, model string) error {
 	return nil
 }
-func RunServer(_ *cobra.Command, _ []string) error {
-	host := os.Getenv("OLLAMA_HOST")
-	if host == "" {
-		host = "127.0.0.1"
-	}
-	port := os.Getenv("OLLAMA_PORT")
-	if port == "" {
-		port = "11434"
-	}
+func RunServer(cmd *cobra.Command, _ []string) error {
+	var host, port = "127.0.0.1", "11434"
+	parts := strings.Split(os.Getenv("OLLAMA_HOST"), ":")
+	if ip := net.ParseIP(parts[0]); ip != nil {
+		host = ip.String()
+	}
+	if len(parts) > 1 {
+		port = parts[1]
+	}
+	// deprecated: include port in OLLAMA_HOST
+	if p := os.Getenv("OLLAMA_PORT"); p != "" {
+		port = p
+	}
 	ln, err := net.Listen("tcp", fmt.Sprintf("%s:%s", host, port))
@@ -529,7 +545,12 @@ func RunServer(_ *cobra.Command, _ []string) error {
 		return err
 	}
-	return server.Serve(ln)
+	var origins []string
+	if o := os.Getenv("OLLAMA_ORIGINS"); o != "" {
+		origins = strings.Split(o, ",")
+	}
+	return server.Serve(ln, origins)
 }
 func startMacApp(client *api.Client) error {
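Read on its own, the new `RunServer` address logic reduces to a small pure function: `OLLAMA_HOST` may hold either an IP or `ip:port`, and the deprecated `OLLAMA_PORT` still wins for the port when set. A standalone sketch of that behavior (not the upstream code verbatim):

```go
// Sketch of the host/port resolution implied by the RunServer diff above.
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
)

func resolveAddr() (host, port string) {
	host, port = "127.0.0.1", "11434"
	parts := strings.Split(os.Getenv("OLLAMA_HOST"), ":")
	if ip := net.ParseIP(parts[0]); ip != nil {
		host = ip.String()
	}
	if len(parts) > 1 {
		port = parts[1]
	}
	// deprecated override: prefer OLLAMA_HOST=ip:port going forward
	if p := os.Getenv("OLLAMA_PORT"); p != "" {
		port = p
	}
	return host, port
}

func main() {
	host, port := resolveAddr()
	fmt.Printf("server would listen on %s:%s\n", host, port)
}
```

For example, `OLLAMA_HOST=0.0.0.0:11435` resolves to `0.0.0.0:11435`, while an unset `OLLAMA_HOST` falls back to `127.0.0.1:11434`.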

docs/README.md (new file, 5 additions)

@@ -0,0 +1,5 @@
# Documentation
- [Modelfile](./modelfile.md)
- [How to develop Ollama](./development.md)
- [API](./api.md)

docs/api.md (new file, 222 additions)

@@ -0,0 +1,222 @@
# API
## Endpoints
- [Generate a completion](#generate-a-completion)
- [Create a model](#create-a-model)
- [List local models](#list-local-models)
- [Copy a model](#copy-a-model)
- [Delete a model](#delete-a-model)
- [Pull a model](#pull-a-model)
## Conventions
### Model names
Model names follow a `model:tag` format. Some examples are `orca:3b-q4_1` and `llama2:70b`. The tag is optional and if not provided will default to `latest`. The tag is used to identify a specific version.
### Durations
All durations are returned in nanoseconds.
## Generate a completion
```
POST /api/generate
```
Generate a response for a given prompt with a provided model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and additional data from the request.
### Parameters
- `model`: (required) the [model name](#model-names)
- `prompt`: the prompt to generate a response for
Advanced parameters:
- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`
- `system`: system prompt to use (overrides what is defined in the `Modelfile`)
- `template`: the full prompt or prompt template (overrides what is defined in the `Modelfile`)
### Request
```
curl -X POST http://localhost:11434/api/generate -d '{
"model": "llama2:7b",
"prompt": "Why is the sky blue?"
}'
```
### Response
A stream of JSON objects:
```json
{
"model": "llama2:7b",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"response": "The",
"done": false
}
```
The final response in the stream also includes additional data about the generation:
- `total_duration`: time spent generating the response
- `load_duration`: time spent loading the model, in nanoseconds
- `sample_count`: number of samples generated
- `sample_duration`: time spent generating samples
- `prompt_eval_count`: number of tokens in the prompt
- `prompt_eval_duration`: time spent evaluating the prompt, in nanoseconds
- `eval_count`: number of tokens in the response
- `eval_duration`: time spent generating the response, in nanoseconds
To calculate how fast the response is generated in tokens per second (token/s), divide `eval_count` by `eval_duration` and multiply by 10^9 (durations are in nanoseconds).
```json
{
"model": "llama2:7b",
"created_at": "2023-08-04T19:22:45.499127Z",
"done": true,
"total_duration": 5589157167,
"load_duration": 3013701500,
"sample_count": 114,
"sample_duration": 81442000,
"prompt_eval_count": 46,
"prompt_eval_duration": 1160282000,
"eval_count": 113,
"eval_duration": 1325948000
}
```
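Using the sample response above: 113 tokens over 1,325,948,000 ns is 113 / 1.325948 ≈ 85 tokens per second.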
## Create a Model
```
POST /api/create
```
Create a model from a [`Modelfile`](./modelfile.md)
### Parameters
- `name`: name of the model to create
- `path`: path to the Modelfile
### Request
```
curl -X POST http://localhost:11434/api/create -d '{
"name": "mario",
"path": "~/Modelfile"
}'
```
### Response
A stream of JSON objects. When finished, `status` is `success`.
```json
{
"status": "parsing modelfile"
}
```
## List Local Models
```
GET /api/tags
```
List models that are available locally.
### Request
```
curl http://localhost:11434/api/tags
```
### Response
```json
{
"models": [
{
"name": "llama2:7b",
"modified_at": "2023-08-02T17:02:23.713454393-07:00",
"size": 3791730596
},
{
"name": "llama2:13b",
"modified_at": "2023-08-08T12:08:38.093596297-07:00",
"size": 7323310500
}
]
}
```
## Copy a Model
```
POST /api/copy
```
Copy a model. Creates a model with another name from an existing model.
### Request
```
curl http://localhost:11434/api/copy -d '{
"source": "llama2:7b",
"destination": "llama2-backup"
}'
```
## Delete a Model
```
DELETE /api/delete
```
Delete a model and its data.
### Parameters
- `name`: name of the model to delete
### Request
```
curl -X DELETE http://localhost:11434/api/delete -d '{
"name": "llama2:13b"
}'
```
## Pull a Model
```
POST /api/pull
```
Download a model from the model registry. Cancelled pulls are resumed from where they left off, and multiple calls will share the same download progress.
### Parameters
- `name`: name of the model to pull
### Request
```
curl -X POST http://localhost:11434/api/pull -d '{
"name": "llama2:7b"
}'
```
### Response
```json
{
"status": "downloading digestname",
"digest": "digestname",
"total": 2142590208
}
```
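Because `/api/generate` streams newline-delimited JSON (the compare above switches the content type to `application/x-ndjson`), a client can scan the response body line by line. A minimal Go consumer sketch, assuming a local server and the response fields documented above:

```go
// Sketch: consume the streaming generate endpoint line by line.
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	body := []byte(`{"model": "llama2", "prompt": "Why is the sky blue?"}`)
	resp, err := http.Post("http://127.0.0.1:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		// Each line is one JSON object; only a subset of fields is decoded here.
		var chunk struct {
			Response string `json:"response"`
			Done     bool   `json:"done"`
		}
		if err := json.Unmarshal(scanner.Bytes(), &chunk); err != nil {
			panic(err)
		}
		fmt.Print(chunk.Response)
		if chunk.Done {
			fmt.Println()
		}
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}
}
```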


@@ -30,19 +30,15 @@ Now you can run `ollama`:
 To release a new version of Ollama you'll need to set some environment variables:
-* `GITHUB_TOKEN`: your GitHub token
-* `APPLE_IDENTITY`: the Apple signing identity (macOS only)
-* `APPLE_ID`: your Apple ID
-* `APPLE_PASSWORD`: your Apple ID app-specific password
-* `APPLE_TEAM_ID`: the Apple team ID for the signing identity
-* `TELEMETRY_WRITE_KEY`: segment write key for telemetry
+- `GITHUB_TOKEN`: your GitHub token
+- `APPLE_IDENTITY`: the Apple signing identity (macOS only)
+- `APPLE_ID`: your Apple ID
+- `APPLE_PASSWORD`: your Apple ID app-specific password
+- `APPLE_TEAM_ID`: the Apple team ID for the signing identity
+- `TELEMETRY_WRITE_KEY`: segment write key for telemetry
 Then run the publish script with the target version:
 ```
 VERSION=0.0.2 ./scripts/publish.sh
 ```

docs/faq.md (new file, 17 additions)

@@ -0,0 +1,17 @@
# FAQ
## How can I expose the Ollama server?
```
OLLAMA_HOST=0.0.0.0:11435 ollama serve
```
By default, Ollama allows cross origin requests from `127.0.0.1` and `0.0.0.0`. To support more origins, you can use the `OLLAMA_ORIGINS` environment variable:
```
OLLAMA_ORIGINS=http://192.168.1.1:*,https://example.com ollama serve
```
## Where are models stored?
Raw model data is stored under `~/.ollama/models`.


@@ -163,4 +163,4 @@ LICENSE """
 ## Notes
 - the **modelfile is not case sensitive**. In the examples, we use uppercase for instructions to make it easier to distinguish it from arguments.
-- Instructions can be in any order. In the examples, we start with FROM instruction to keep it easily readable.
\ No newline at end of file
+- Instructions can be in any order. In the examples, we start with FROM instruction to keep it easily readable.


@@ -3,5 +3,5 @@
 FROM nous-hermes
 SYSTEM """
-You are a content marketer who needs to come up with a short but succinct tweet. Make sure to include the appropriate hashtags and links. Sometimes when appropriate, describe a meme that can be includes as well. All answers should be in the form of a tweet which has a max size of 280 characters. Every instruction will be the topic to create a tweet about.
+You are a content marketer who needs to come up with a short but succinct tweet. Make sure to include the appropriate hashtags and links. Sometimes when appropriate, describe a meme that can be included as well. All answers should be in the form of a tweet which has a max size of 280 characters. Every instruction will be the topic to create a tweet about.
 """

go.mod (1 addition)

@@ -42,6 +42,7 @@ require (
 	golang.org/x/sys v0.10.0 // indirect
 	golang.org/x/term v0.10.0
 	golang.org/x/text v0.10.0 // indirect
+	gonum.org/v1/gonum v0.13.0
 	google.golang.org/protobuf v1.30.0 // indirect
 	gopkg.in/yaml.v3 v3.0.1 // indirect
 )

go.sum (2 additions)

@@ -139,6 +139,8 @@ golang.org/x/text v0.10.0 h1:UpjohKhiEgNc0CSauXmwYftY1+LlaC75SJwh0SgCX58=
 golang.org/x/text v0.10.0/go.mod h1:TvPlkZtksWOMsz7fbANvkp4WM8x/WCo/om8BMLbz+aE=
 golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
 golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
+gonum.org/v1/gonum v0.13.0 h1:a0T3bh+7fhRyqeNbiC3qVHYmkiQgit3wnNan/2c0HMM=
+gonum.org/v1/gonum v0.13.0/go.mod h1:/WPYRckkfWrhWefxyYTfrTtQR0KH4iyHNuzxqXAKyAU=
 google.golang.org/protobuf v1.26.0-rc.1/go.mod h1:jlhhOSvTdKEhbULTjvd4ARK9grFBp09yW+WbY/TyQbw=
 google.golang.org/protobuf v1.28.0/go.mod h1:HV8QOd/L58Z+nl8r43ehVNZIU/HEI6OcFqwMG9pJV4I=
 google.golang.org/protobuf v1.30.0 h1:kPPoIgf3TsEvrm0PFe15JQ+570QVxYzEvvHqChK+cng=

library/.gitignore (vendored, 1 deletion)

@@ -1 +0,0 @@
-models


@@ -1,7 +0,0 @@
https://huggingface.co/TheBloke/orca_mini_3B-GGML/resolve/main/orca-mini-3b.ggmlv3.q4_0.bin e84705205f71dd55be7b24a778f248f0eda9999a125d313358c087e092d83148
https://huggingface.co/TheBloke/Nous-Hermes-13B-GGML/resolve/main/nous-hermes-13b.ggmlv3.q4_0.bin d1735b93e1dc503f1045ccd6c8bd73277b18ba892befd1dc29e9b9a7822ed998
https://huggingface.co/TheBloke/vicuna-7B-v1.3-GGML/resolve/main/vicuna-7b-v1.3.ggmlv3.q4_0.bin 23ce5ed290b56a19305178b9ada2c3d96036bd69a6c18304b6158eb6672d6c0f
https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0.bin 1f08b147a5bce41cfcbb3fd5d51ba765dea1786e15b5655ab69ba3a337a893b7
https://huggingface.co/TheBloke/Llama-2-7B-GGML/resolve/main/llama-2-7b.ggmlv3.q4_0.bin bfa26d855e44629c4cf919985e90bd7fa03b77eea1676791519e39a4d45fd4d5
https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q4_0.bin 8daa9615cce30c259a9555b1cc250d461d1bc69980a274b44d7eda0be78076d8
https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin f79142715bc9539a2edbb4b253548db8b34fac22736593eeaa28555874476e30


@@ -1,147 +0,0 @@
FROM ../models/llama-2-7b-chat.ggmlv3.q4_0.bin
TEMPLATE """
{{- if .First }}
<<SYS>>
{{ .System }}
<</SYS>>
{{- end }}
[INST] {{ .Prompt }} [/INST]
"""
SYSTEM """
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
"""
LICENSE """
Llama 2 Community License Agreement
Llama 2 Version Release Date: July 18, 2023
“Agreement” means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein.
“Documentation” means the specifications, manuals and documentation accompanying Llama 2 distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/.
“Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entitys behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.
“Llama 2” means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/.
“Llama Materials” means, collectively, Metas proprietary Llama 2 and Documentation (and any portion thereof) made available under this Agreement.
“Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland).
By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement.
1. License Rights and Redistribution.
a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Metas intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials.
b. Redistribution and Use.
i. If you distribute or make the Llama Materials, or any derivative works thereof, available to a third party, you shall provide a copy of this Agreement to such third party.
ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you.
iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 2 is licensed under the LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.”
iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at https://ai.meta.com/llama/use-policy), which is hereby incorporated by reference into this Agreement.
v. You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Llama 2 or derivative works thereof).
2. Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensees affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS.
4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
5. Intellectual Property.
a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials.
b. Subject to Metas ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications.
c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 2 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials.
6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement.
7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement.
"""
LICENSE """
Llama 2 Acceptable Use Policy
Meta is committed to promoting safe and fair use of its tools and features, including Llama 2. If you access or use Llama 2, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at ai.meta.com/llama/use-policy.
Prohibited Uses
We want everyone to use Llama 2 safely and responsibly. You agree you will not use, or allow others to use, Llama 2 to:
1. Violate the law or others rights, including to:
a. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as:
i. Violence or terrorism
ii. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material
b. Human trafficking, exploitation, and sexual violence
iii. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials.
iv. Sexual solicitation
vi. Any other criminal activity
c. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals
d. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services
e. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices
f. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws
g. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama 2 Materials
h. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system
2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 2 related to the following:
a. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State
b. Guns and illegal weapons (including weapon development)
c. Illegal drugs and regulated/controlled substances
d. Operation of critical infrastructure, transportation technologies, or heavy machinery
e. Self-harm or harm to others, including suicide, cutting, and eating disorders
f. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual
3. Intentionally deceive or mislead others, including use of Llama 2 related to the following:
a. Generating, promoting, or furthering fraud or the creation or promotion of disinformation
b. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content
c. Generating, promoting, or further distributing spam
d. Impersonating another individual without consent, authorization, or legal right
e. Representing that the use of Llama 2 or outputs are human-generated
f. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement
4. Fail to appropriately disclose to end users any known dangers of your AI system
Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means:
Reporting issues with the model: github.com/facebookresearch/llama
Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback
Reporting bugs and security concerns: facebook.com/whitehat/info
Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama: LlamaUseReport@meta.com
"""


@@ -1,147 +0,0 @@
FROM ../models/llama-2-13b-chat.ggmlv3.q4_0.bin
TEMPLATE """
{{- if .First }}
<<SYS>>
{{ .System }}
<</SYS>>
{{- end }}
[INST] {{ .Prompt }} [/INST]
"""
SYSTEM """
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
"""
LICENSE """
Llama 2 Community License Agreement
Llama 2 Version Release Date: July 18, 2023
“Agreement” means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein.
“Documentation” means the specifications, manuals and documentation accompanying Llama 2 distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/.
“Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entitys behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.
“Llama 2” means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/.
“Llama Materials” means, collectively, Metas proprietary Llama 2 and Documentation (and any portion thereof) made available under this Agreement.
“Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland).
By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement.
1. License Rights and Redistribution.
a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Metas intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials.
b. Redistribution and Use.
i. If you distribute or make the Llama Materials, or any derivative works thereof, available to a third party, you shall provide a copy of this Agreement to such third party.
ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you.
iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 2 is licensed under the LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.”
iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at https://ai.meta.com/llama/use-policy), which is hereby incorporated by reference into this Agreement.
v. You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Llama 2 or derivative works thereof).
2. Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensees affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS.
4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
5. Intellectual Property.
a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials.
b. Subject to Metas ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications.
c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 2 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials.
6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement.
7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement.
"""
LICENSE """
Llama 2 Acceptable Use Policy
Meta is committed to promoting safe and fair use of its tools and features, including Llama 2. If you access or use Llama 2, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at ai.meta.com/llama/use-policy.
Prohibited Uses
We want everyone to use Llama 2 safely and responsibly. You agree you will not use, or allow others to use, Llama 2 to:
1. Violate the law or others rights, including to:
a. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as:
i. Violence or terrorism
ii. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material
b. Human trafficking, exploitation, and sexual violence
iii. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials.
iv. Sexual solicitation
vi. Any other criminal activity
c. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals
d. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services
e. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices
f. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws
g. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama 2 Materials
h. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system
2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 2 related to the following:
a. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State
b. Guns and illegal weapons (including weapon development)
c. Illegal drugs and regulated/controlled substances
d. Operation of critical infrastructure, transportation technologies, or heavy machinery
e. Self-harm or harm to others, including suicide, cutting, and eating disorders
f. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual
3. Intentionally deceive or mislead others, including use of Llama 2 related to the following:
a. Generating, promoting, or furthering fraud or the creation or promotion of disinformation
b. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content
c. Generating, promoting, or further distributing spam
d. Impersonating another individual without consent, authorization, or legal right
e. Representing that the use of Llama 2 or outputs are human-generated
f. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement
4. Fail to appropriately disclose to end users any known dangers of your AI system
Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means:
Reporting issues with the model: github.com/facebookresearch/llama
Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback
Reporting bugs and security concerns: facebook.com/whitehat/info
Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama: LlamaUseReport@meta.com
"""


@@ -1,147 +0,0 @@
FROM ../models/llama-2-7b-chat.ggmlv3.q4_0.bin
TEMPLATE """
{{- if .First }}
<<SYS>>
{{ .System }}
<</SYS>>
{{- end }}
[INST] {{ .Prompt }} [/INST]
"""
SYSTEM """
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
"""
LICENSE """
Llama 2 Community License Agreement
Llama 2 Version Release Date: July 18, 2023
“Agreement” means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein.
“Documentation” means the specifications, manuals and documentation accompanying Llama 2 distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/.
“Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entitys behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.
“Llama 2” means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/.
“Llama Materials” means, collectively, Metas proprietary Llama 2 and Documentation (and any portion thereof) made available under this Agreement.
“Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland).
By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement.
1. License Rights and Redistribution.
a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Metas intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials.
b. Redistribution and Use.
i. If you distribute or make the Llama Materials, or any derivative works thereof, available to a third party, you shall provide a copy of this Agreement to such third party.
ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you.
iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 2 is licensed under the LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.”
iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at https://ai.meta.com/llama/use-policy), which is hereby incorporated by reference into this Agreement.
v. You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Llama 2 or derivative works thereof).
2. Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensees affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS.
4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
5. Intellectual Property.
a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials.
b. Subject to Metas ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications.
c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 2 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials.
6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement.
7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement.
"""
LICENSE """
Llama 2 Acceptable Use Policy
Meta is committed to promoting safe and fair use of its tools and features, including Llama 2. If you access or use Llama 2, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at ai.meta.com/llama/use-policy.
Prohibited Uses
We want everyone to use Llama 2 safely and responsibly. You agree you will not use, or allow others to use, Llama 2 to:
1. Violate the law or others' rights, including to:
a. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as:
i. Violence or terrorism
ii. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material
b. Human trafficking, exploitation, and sexual violence
iii. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials.
iv. Sexual solicitation
vi. Any other criminal activity
c. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals
d. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services
e. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices
f. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws
g. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama 2 Materials
h. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system
2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 2 related to the following:
a. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State
b. Guns and illegal weapons (including weapon development)
c. Illegal drugs and regulated/controlled substances
d. Operation of critical infrastructure, transportation technologies, or heavy machinery
e. Self-harm or harm to others, including suicide, cutting, and eating disorders
f. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual
3. Intentionally deceive or mislead others, including use of Llama 2 related to the following:
a. Generating, promoting, or furthering fraud or the creation or promotion of disinformation
b. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content
c. Generating, promoting, or further distributing spam
d. Impersonating another individual without consent, authorization, or legal right
e. Representing that the use of Llama 2 or outputs are human-generated
f. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement
4. Fail to appropriately disclose to end users any known dangers of your AI system
Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means:
Reporting issues with the model: github.com/facebookresearch/llama
Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback
Reporting bugs and security concerns: facebook.com/whitehat/info
Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama: LlamaUseReport@meta.com
"""

View File

@@ -1,7 +0,0 @@
FROM ../models/nous-hermes-13b.ggmlv3.q4_0.bin
TEMPLATE """
### Instruction:
{{ .Prompt }}
### Response:
"""

View File

@@ -1,14 +0,0 @@
FROM ../models/orca-mini-3b.ggmlv3.q4_0.bin
TEMPLATE """
{{- if .First }}
### System:
{{ .System }}
{{- end }}
### User:
{{ .Prompt }}
### Response:
"""
SYSTEM """You are an AI assistant that follows instruction extremely well. Help as much as you can."""

View File

@@ -1,11 +0,0 @@
FROM ../models/vicuna-7b-v1.3.ggmlv3.q4_0.bin
TEMPLATE """
{{ if .First }}
{{ .System }}
{{- end }}
USER: {{ .Prompt }}
ASSISTANT:
"""
SYSTEM """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions."""

View File

@@ -1,5 +0,0 @@
FROM ../models/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0.bin
TEMPLATE """
USER: {{ .Prompt }}
ASSISTANT:
"""

View File

@@ -1,52 +0,0 @@
#!/bin/bash
mkdir -p models
# download binaries
function process_line {
local url=$1
local checksum=$2
# Get the filename from the URL
local filename=models/$(basename $url)
echo "verifying $filename..."
# If the file exists, compute its checksum
if [ -f $filename ]; then
local existing_checksum=$(shasum -a 256 $filename | cut -d ' ' -f1)
fi
# If the file does not exist, or its checksum does not match, download it
if [ ! -f "$filename" ] || [ "$existing_checksum" != "$checksum" ]; then
echo "downloading $filename..."
# Download the file
curl -L $url -o $filename
# Compute the SHA256 hash of the downloaded file
local computed_checksum=$(shasum -a 256 $filename | cut -d ' ' -f1)
# Verify the checksum
if [ $computed_checksum != $checksum ]; then
echo "Checksum verification failed for $filename"
exit 1
fi
fi
}
while IFS=' ' read -r url checksum
do
process_line $url $checksum
done < "downloads"
# create and publish the models
for file in modelfiles/*; do
if [ -f "$file" ]; then
filename=$(basename "$file")
echo $filename
ollama create "library/${filename}" -f "$file"
ollama push "${filename}"
fi
done
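
The verification loop reads a manifest named `downloads` in which each line pairs a URL with the expected SHA-256 digest of the file, separated by a single space. The entry below shows the format (the URL host is hypothetical):

```
https://example.com/models/orca-mini-3b.ggmlv3.q4_0.bin <64-character sha256 hex digest>
```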

View File

@@ -85,6 +85,7 @@ llama_token llama_sample(
}
*/
import "C"
import (
"bytes"
"embed"
@@ -142,6 +143,8 @@ func New(model string, opts api.Options) (*LLM, error) {
params.use_mmap = C.bool(llm.UseMMap)
params.use_mlock = C.bool(llm.UseMLock)
params.embedding = C.bool(llm.EmbeddingOnly)
params.rope_freq_base = C.float(llm.RopeFrequencyBase)
params.rope_freq_scale = C.float(llm.RopeFrequencyScale)
llm.params = &params
cModel := C.CString(model)
@@ -187,10 +190,6 @@ func (llm *LLM) Predict(ctx []int, prompt string, fn func(api.GenerateResponse))
tokens[i] = C.llama_token(ctx[i])
}
if len(tokens) == 0 {
tokens = llm.tokenize(" ")
}
llm.marshalPrompt(tokens, prompt)
C.llama_set_rng_seed(llm.ctx, C.uint(llm.Seed))
@@ -206,7 +205,7 @@ func (llm *LLM) Predict(ctx []int, prompt string, fn func(api.GenerateResponse))
return err
}
b.WriteString(llm.detokenize(token))
b.WriteString(llm.Decode(token))
if err := llm.checkStopConditions(b); err != nil {
if errors.Is(err, io.EOF) {
@@ -224,17 +223,15 @@ func (llm *LLM) Predict(ctx []int, prompt string, fn func(api.GenerateResponse))
}
}
last := make([]int, 0, len(llm.last))
for _, i := range llm.last {
if i != 0 {
last = append(last, int(i))
}
embd := make([]int, len(llm.embd))
for i := range llm.embd {
embd[i] = int(llm.embd[i])
}
timings := C.llama_get_timings(llm.ctx)
fn(api.GenerateResponse{
Done: true,
Context: last,
Context: embd,
SampleCount: int(timings.n_sample),
SampleDuration: parseDurationMs(float64(timings.t_sample_ms)),
PromptEvalCount: int(timings.n_p_eval),
@@ -248,9 +245,9 @@ func (llm *LLM) Predict(ctx []int, prompt string, fn func(api.GenerateResponse))
func (llm *LLM) checkStopConditions(b bytes.Buffer) error {
for _, stopCondition := range llm.Stop {
if stopCondition == b.String() {
if stopCondition == strings.TrimSpace(b.String()) {
return io.EOF
} else if strings.HasPrefix(stopCondition, b.String()) {
} else if strings.HasPrefix(stopCondition, strings.TrimSpace(b.String())) {
return errNeedMoreData
}
}
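// With the trimmed comparison above, a stop sequence such as "### User:" is
// handled in two stages: a buffer equal to the stop string ends generation
// (io.EOF), while a buffer like "###" that is still a prefix of it returns
// errNeedMoreData so more tokens are read before anything is emitted.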
@@ -259,7 +256,7 @@ func (llm *LLM) checkStopConditions(b bytes.Buffer) error {
}
func (llm *LLM) marshalPrompt(ctx []C.llama_token, prompt string) []C.llama_token {
tokens := append(ctx, llm.tokenize(prompt)...)
tokens := append(ctx, llm.Encode(prompt)...)
if llm.NumKeep < 0 {
llm.NumKeep = len(tokens)
}
@@ -301,7 +298,7 @@ func (llm *LLM) marshalPrompt(ctx []C.llama_token, prompt string) []C.llama_toke
return tokens
}
func (llm *LLM) tokenize(prompt string) []C.llama_token {
func (llm *LLM) Encode(prompt string) []C.llama_token {
cPrompt := C.CString(prompt)
defer C.free(unsafe.Pointer(cPrompt))
@@ -313,7 +310,7 @@ func (llm *LLM) tokenize(prompt string) []C.llama_token {
return nil
}
func (llm *LLM) detokenize(tokens ...C.llama_token) string {
func (llm *LLM) Decode(tokens ...C.llama_token) string {
var sb strings.Builder
for _, token := range tokens {
sb.WriteString(C.GoString(C.llama_token_to_str(llm.ctx, token)))
@@ -412,3 +409,31 @@ func (llm *LLM) next() (C.llama_token, error) {
return token, nil
}
func (llm *LLM) Embedding(input string) ([]float64, error) {
if !llm.EmbeddingOnly {
return nil, errors.New("llama: embedding not enabled")
}
tokens := llm.Encode(input)
if tokens == nil {
return nil, errors.New("llama: tokenize embedding")
}
retval := C.llama_eval(llm.ctx, unsafe.SliceData(tokens), C.int(len(tokens)), 0, C.int(llm.NumThread))
if retval != 0 {
return nil, errors.New("llama: eval")
}
n := C.llama_n_embd(llm.ctx)
if n <= 0 {
return nil, errors.New("llama: no embeddings generated")
}
cEmbeddings := unsafe.Slice(C.llama_get_embeddings(llm.ctx), n)
embeddings := make([]float64, len(cEmbeddings))
for i, v := range cEmbeddings {
embeddings[i] = float64(v)
}
return embeddings, nil
}
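
A minimal sketch of the new Embedding entry point in use, following the same sequence the server uses when indexing embed files (the model path and prompt are illustrative):

```go
package main

import (
	"fmt"
	"log"

	"github.com/jmorganca/ollama/api"
	"github.com/jmorganca/ollama/llama"
)

func main() {
	opts := api.DefaultOptions()
	opts.EmbeddingOnly = true // Embedding returns an error unless this is set

	llm, err := llama.New("/path/to/model.ggmlv3.q4_0.bin", opts) // hypothetical path
	if err != nil {
		log.Fatal(err)
	}
	defer llm.Close()

	vec, err := llm.Embedding("Here is an article about llamas...")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(len(vec)) // dimensionality comes from llama_n_embd
}
```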

View File

@@ -40,7 +40,7 @@ func Parse(reader io.Reader) ([]Command, error) {
command.Args = string(fields[1])
// copy command for validation
modelCommand = command
case "LICENSE", "TEMPLATE", "SYSTEM", "PROMPT":
case "LICENSE", "TEMPLATE", "SYSTEM", "PROMPT", "EMBED":
command.Name = string(bytes.ToLower(fields[0]))
command.Args = string(fields[1])
case "PARAMETER":

View File

@@ -8,6 +8,7 @@ CGO_ENABLED=1 GOARCH=amd64 go build -o dist/ollama-darwin-amd64
lipo -create -output dist/ollama dist/ollama-darwin-arm64 dist/ollama-darwin-amd64
rm dist/ollama-darwin-amd64 dist/ollama-darwin-arm64
codesign --deep --force --options=runtime --sign "$APPLE_IDENTITY" --timestamp dist/ollama
chmod +x dist/ollama
# build and sign the mac app
npm install --prefix app

server/auth.go (new file, 164 lines)
View File

@@ -0,0 +1,164 @@
package server
import (
"bytes"
"crypto/rand"
"crypto/sha256"
"encoding/base64"
"encoding/hex"
"encoding/json"
"fmt"
"io"
"io/ioutil"
"log"
"net/http"
"os"
"path"
"strings"
"time"
"golang.org/x/crypto/ssh"
"github.com/jmorganca/ollama/api"
)
type AuthRedirect struct {
Realm string
Service string
Scope string
}
type SignatureData struct {
Method string
Path string
Data []byte
}
func generateNonce(length int) (string, error) {
nonce := make([]byte, length)
_, err := rand.Read(nonce)
if err != nil {
return "", err
}
return base64.RawURLEncoding.EncodeToString(nonce), nil
}
func (r AuthRedirect) URL() (string, error) {
nonce, err := generateNonce(16)
if err != nil {
return "", err
}
return fmt.Sprintf("%s?service=%s&scope=%s&ts=%d&nonce=%s", r.Realm, r.Service, r.Scope, time.Now().Unix(), nonce), nil
}
func getAuthToken(redirData AuthRedirect, regOpts *RegistryOptions) (string, error) {
url, err := redirData.URL()
if err != nil {
return "", err
}
home, err := os.UserHomeDir()
if err != nil {
return "", err
}
keyPath := path.Join(home, ".ollama/id_ed25519")
rawKey, err := ioutil.ReadFile(keyPath)
if err != nil {
log.Printf("Failed to load private key: %v", err)
return "", err
}
s := SignatureData{
Method: "GET",
Path: url,
Data: nil,
}
if !strings.HasPrefix(s.Path, "http") {
if regOpts.Insecure {
s.Path = "http://" + url
} else {
s.Path = "https://" + url
}
}
sig, err := s.Sign(rawKey)
if err != nil {
return "", err
}
headers := map[string]string{
"Authorization": sig,
}
resp, err := makeRequest("GET", url, headers, nil, regOpts)
if err != nil {
log.Printf("couldn't get token: %q", err)
return "", err
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
body, _ := io.ReadAll(resp.Body)
return "", fmt.Errorf("on pull registry responded with code %d: %s", resp.StatusCode, body)
}
respBody, err := io.ReadAll(resp.Body)
if err != nil {
return "", err
}
var tok api.TokenResponse
if err := json.Unmarshal(respBody, &tok); err != nil {
return "", err
}
return tok.Token, nil
}
// Bytes returns a byte slice of the data to sign for the request
func (s SignatureData) Bytes() []byte {
// We first derive the content hash of the request body using:
// base64(hex(sha256(request body)))
hash := sha256.Sum256(s.Data)
hashHex := make([]byte, hex.EncodedLen(len(hash)))
hex.Encode(hashHex, hash[:])
contentHash := base64.StdEncoding.EncodeToString(hashHex)
// We then put the entire request together in a serialize string using:
// "<method>,<uri>,<content hash>"
// e.g. "GET,http://localhost,OTdkZjM1O..."
return []byte(strings.Join([]string{s.Method, s.Path, contentHash}, ","))
}
// SignData takes a SignatureData object and signs it with a raw private key
func (s SignatureData) Sign(rawKey []byte) (string, error) {
privateKey, err := ssh.ParseRawPrivateKey(rawKey)
if err != nil {
return "", err
}
signer, err := ssh.NewSignerFromKey(privateKey)
if err != nil {
return "", err
}
// get the pubkey, but remove the type
pubKey := ssh.MarshalAuthorizedKey(signer.PublicKey())
parts := bytes.Split(pubKey, []byte(" "))
if len(parts) < 2 {
return "", fmt.Errorf("malformed public key")
}
signedData, err := signer.Sign(nil, s.Bytes())
if err != nil {
return "", err
}
// signature is <pubkey>:<signature>
sig := fmt.Sprintf("%s:%s", bytes.TrimSpace(parts[1]), base64.StdEncoding.EncodeToString(signedData.Blob))
return sig, nil
}
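
Sign produces an Authorization value of the form `<base64 public key>:<base64 signature>` over the serialized string from Bytes. A sketch of the verifying side, which is not part of this change; it assumes an ed25519 key, matching the `~/.ollama/id_ed25519` path used above (imports match those of this file):

```go
// Verify checks an Authorization value produced by Sign against the payload
// from SignatureData.Bytes. Sketch only; the real registry-side check is not
// in this repository.
func Verify(authHeader string, payload []byte) error {
	parts := strings.SplitN(authHeader, ":", 2)
	if len(parts) != 2 {
		return fmt.Errorf("malformed signature header")
	}
	// Sign strips the key type, so it must be re-added before parsing
	pub, _, _, _, err := ssh.ParseAuthorizedKey([]byte("ssh-ed25519 " + parts[0]))
	if err != nil {
		return err
	}
	blob, err := base64.StdEncoding.DecodeString(parts[1])
	if err != nil {
		return err
	}
	return pub.Verify(payload, &ssh.Signature{Format: pub.Type(), Blob: blob})
}
```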

server/download.go (new file, 215 lines)
View File

@@ -0,0 +1,215 @@
package server
import (
"context"
"errors"
"fmt"
"io"
"log"
"net/http"
"os"
"path"
"strconv"
"sync"
"time"
"github.com/jmorganca/ollama/api"
)
type FileDownload struct {
Digest string
FilePath string
Total int64
Completed int64
}
var inProgress sync.Map // map of digests currently being downloaded to their current download progress
// downloadBlob downloads a blob from the registry and stores it in the blobs directory
func downloadBlob(ctx context.Context, mp ModelPath, digest string, regOpts *RegistryOptions, fn func(api.ProgressResponse)) error {
fp, err := GetBlobsPath(digest)
if err != nil {
return err
}
if fi, _ := os.Stat(fp); fi != nil {
// we already have the file, so return
fn(api.ProgressResponse{
Digest: digest,
Total: int(fi.Size()),
Completed: int(fi.Size()),
})
return nil
}
fileDownload := &FileDownload{
Digest: digest,
FilePath: fp,
Total: 1, // dummy value to indicate that we don't know the total size yet
Completed: 0,
}
_, downloading := inProgress.LoadOrStore(digest, fileDownload)
if downloading {
// this is another client requesting the server to download the same blob concurrently
return monitorDownload(ctx, mp, regOpts, fileDownload, fn)
}
return doDownload(ctx, mp, regOpts, fileDownload, fn)
}
var downloadMu sync.Mutex // mutex to check to resume a download while monitoring
// monitorDownload monitors the download progress of a blob and resumes it if it is interrupted
func monitorDownload(ctx context.Context, mp ModelPath, regOpts *RegistryOptions, f *FileDownload, fn func(api.ProgressResponse)) error {
tick := time.NewTicker(time.Second)
for range tick.C {
done, resume, err := func() (bool, bool, error) {
downloadMu.Lock()
defer downloadMu.Unlock()
val, downloading := inProgress.Load(f.Digest)
if !downloading {
// check once again if the download is complete
if fi, _ := os.Stat(f.FilePath); fi != nil {
// successful download while monitoring
fn(api.ProgressResponse{
Digest: f.Digest,
Total: int(fi.Size()),
Completed: int(fi.Size()),
})
return true, false, nil
}
// resume the download
inProgress.Store(f.Digest, f) // store the file download again to claim the resume
return false, true, nil
}
f, ok := val.(*FileDownload)
if !ok {
return false, false, fmt.Errorf("invalid type for in progress download: %T", val)
}
fn(api.ProgressResponse{
Status: fmt.Sprintf("downloading %s", f.Digest),
Digest: f.Digest,
Total: int(f.Total),
Completed: int(f.Completed),
})
return false, false, nil
}()
if err != nil {
return err
}
if done {
// done downloading
return nil
}
if resume {
return doDownload(ctx, mp, regOpts, f, fn)
}
}
return nil
}
var chunkSize = 1024 * 1024 // 1 MiB in bytes
// doDownload downloads a blob from the registry and stores it in the blobs directory
func doDownload(ctx context.Context, mp ModelPath, regOpts *RegistryOptions, f *FileDownload, fn func(api.ProgressResponse)) error {
var size int64
fi, err := os.Stat(f.FilePath + "-partial")
switch {
case errors.Is(err, os.ErrNotExist):
// noop, file doesn't exist so create it
case err != nil:
return fmt.Errorf("stat: %w", err)
default:
size = fi.Size()
// Ensure the size is divisible by the chunk size by removing excess bytes
size -= size % int64(chunkSize)
err := os.Truncate(f.FilePath+"-partial", size)
if err != nil {
return fmt.Errorf("truncate: %w", err)
}
}
url := fmt.Sprintf("%s/v2/%s/blobs/%s", mp.Registry, mp.GetNamespaceRepository(), f.Digest)
headers := map[string]string{
"Range": fmt.Sprintf("bytes=%d-", size),
}
resp, err := makeRequest("GET", url, headers, nil, regOpts)
if err != nil {
log.Printf("couldn't download blob: %v", err)
return err
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusPartialContent {
body, _ := io.ReadAll(resp.Body)
return fmt.Errorf("on download registry responded with code %d: %v", resp.StatusCode, string(body))
}
err = os.MkdirAll(path.Dir(f.FilePath), 0o700)
if err != nil {
return fmt.Errorf("make blobs directory: %w", err)
}
remaining, _ := strconv.ParseInt(resp.Header.Get("Content-Length"), 10, 64)
f.Completed = size
f.Total = remaining + f.Completed
inProgress.Store(f.Digest, f)
out, err := os.OpenFile(f.FilePath+"-partial", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o644)
if err != nil {
return fmt.Errorf("open file: %w", err)
}
defer out.Close()
outerLoop:
for {
select {
case <-ctx.Done():
// handle client request cancellation
inProgress.Delete(f.Digest)
return nil
default:
fn(api.ProgressResponse{
Status: fmt.Sprintf("downloading %s", f.Digest),
Digest: f.Digest,
Total: int(f.Total),
Completed: int(f.Completed),
})
if f.Completed >= f.Total {
if err := out.Close(); err != nil {
return err
}
if err := os.Rename(f.FilePath+"-partial", f.FilePath); err != nil {
fn(api.ProgressResponse{
Status: fmt.Sprintf("error renaming file: %v", err),
Digest: f.Digest,
Total: int(f.Total),
Completed: int(f.Completed),
})
return err
}
break outerLoop
}
}
n, err := io.CopyN(out, resp.Body, int64(chunkSize))
if err != nil && !errors.Is(err, io.EOF) {
return err
}
f.Completed += n
inProgress.Store(f.Digest, f)
}
inProgress.Delete(f.Digest)
log.Printf("success getting %s\n", f.Digest)
return nil
}
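
The coordination above reduces to a claim-or-monitor pattern on sync.Map: the first caller to store a digest performs the download, and every later caller only observes progress, resuming if the claim disappears. Distilled to its core:

```go
var inFlight sync.Map

// claim reports whether the caller is the first to request key and should do
// the work itself; later callers get false and poll the stored progress value.
func claim(key string, progress *FileDownload) bool {
	_, loaded := inFlight.LoadOrStore(key, progress)
	return !loaded
}
```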

View File

@@ -1,43 +1,53 @@
package server
import (
"bufio"
"bytes"
"context"
"crypto/sha256"
"encoding/json"
"errors"
"fmt"
"html/template"
"io"
"log"
"net/http"
"os"
"path"
"path/filepath"
"reflect"
"strconv"
"strings"
"text/template"
"github.com/jmorganca/ollama/api"
"github.com/jmorganca/ollama/llama"
"github.com/jmorganca/ollama/parser"
"github.com/jmorganca/ollama/vector"
)
type RegistryOptions struct {
Insecure bool
Username string
Password string
Token string
}
type Model struct {
Name string `json:"name"`
ModelPath string
Template string
System string
Digest string
Options map[string]interface{}
Name string `json:"name"`
ModelPath string
Template string
System string
Digest string
Options map[string]interface{}
Embeddings []vector.Embedding
}
func (m *Model) Prompt(request api.GenerateRequest) (string, error) {
tmpl, err := template.New("").Parse(m.Template)
func (m *Model) Prompt(request api.GenerateRequest, embedding string) (string, error) {
t := m.Template
if request.Template != "" {
t = request.Template
}
tmpl, err := template.New("").Parse(t)
if err != nil {
return "", err
}
@@ -46,6 +56,7 @@ func (m *Model) Prompt(request api.GenerateRequest) (string, error) {
First bool
System string
Prompt string
Embed string
// deprecated: versions <= 0.0.7 used this to omit the system prompt
Context []int
@@ -55,6 +66,11 @@ func (m *Model) Prompt(request api.GenerateRequest) (string, error) {
vars.System = m.System
vars.Prompt = request.Prompt
vars.Context = request.Context
vars.Embed = embedding
if request.System != "" {
vars.System = request.System
}
var sb strings.Builder
if err := tmpl.Execute(&sb, vars); err != nil {
@@ -148,6 +164,16 @@ func GetModel(name string) (*Model, error) {
switch layer.MediaType {
case "application/vnd.ollama.image.model":
model.ModelPath = filename
case "application/vnd.ollama.image.embed":
file, err := os.Open(filename)
if err != nil {
return nil, fmt.Errorf("failed to open file: %s", filename)
}
defer file.Close()
if err = json.NewDecoder(file).Decode(&model.Embeddings); err != nil {
return nil, err
}
case "application/vnd.ollama.image.template":
bts, err := os.ReadFile(filename)
if err != nil {
@@ -186,7 +212,27 @@ func GetModel(name string) (*Model, error) {
return model, nil
}
func CreateModel(name string, path string, fn func(resp api.ProgressResponse)) error {
func filenameWithPath(path, f string) (string, error) {
// if filePath starts with ~/, replace it with the user's home directory.
if strings.HasPrefix(f, "~/") {
parts := strings.Split(f, "/")
home, err := os.UserHomeDir()
if err != nil {
return "", fmt.Errorf("failed to open file: %v", err)
}
f = filepath.Join(home, filepath.Join(parts[1:]...))
}
// if filePath is not an absolute path, make it relative to the modelfile path
if !filepath.IsAbs(f) {
f = filepath.Join(filepath.Dir(path), f)
}
return f, nil
}
func CreateModel(ctx context.Context, name string, path string, fn func(resp api.ProgressResponse)) error {
mf, err := os.Open(path)
if err != nil {
fn(api.ProgressResponse{Status: fmt.Sprintf("couldn't open modelfile '%s'", path)})
@@ -202,52 +248,37 @@ func CreateModel(name string, path string, fn func(resp api.ProgressResponse)) e
var layers []*LayerReader
params := make(map[string][]string)
embed := EmbeddingParams{fn: fn, opts: api.DefaultOptions()}
for _, c := range commands {
log.Printf("[%s] - %s\n", c.Name, c.Args)
switch c.Name {
case "model":
fn(api.ProgressResponse{Status: "looking for model"})
embed.model = c.Args
mf, err := GetManifest(ParseModelPath(c.Args))
if err != nil {
fp := c.Args
// If filePath starts with ~/, replace it with the user's home directory.
if strings.HasPrefix(fp, "~/") {
parts := strings.Split(fp, "/")
home, err := os.UserHomeDir()
if err != nil {
return fmt.Errorf("failed to open file: %v", err)
}
fp = filepath.Join(home, filepath.Join(parts[1:]...))
modelFile, err := filenameWithPath(path, c.Args)
if err != nil {
return err
}
// If filePath is not an absolute path, make it relative to the modelfile path
if !filepath.IsAbs(fp) {
fp = filepath.Join(filepath.Dir(path), fp)
}
if _, err := os.Stat(fp); err != nil {
if _, err := os.Stat(modelFile); err != nil {
// the model file does not exist, try pulling it
if errors.Is(err, os.ErrNotExist) {
fn(api.ProgressResponse{Status: "pulling model file"})
if err := PullModel(c.Args, &RegistryOptions{}, fn); err != nil {
if err := PullModel(ctx, c.Args, &RegistryOptions{}, fn); err != nil {
return err
}
mf, err = GetManifest(ParseModelPath(c.Args))
if err != nil {
return fmt.Errorf("failed to open file after pull: %v", err)
}
} else {
return err
}
} else {
// create a model from this specified file
fn(api.ProgressResponse{Status: "creating model layer"})
file, err := os.Open(fp)
file, err := os.Open(modelFile)
if err != nil {
return fmt.Errorf("failed to open file: %v", err)
}
@@ -271,9 +302,14 @@ func CreateModel(name string, path string, fn func(resp api.ProgressResponse)) e
layers = append(layers, newLayer)
}
}
case "embed":
embedFilePath, err := filenameWithPath(path, c.Args)
if err != nil {
return err
}
embed.files = append(embed.files, embedFilePath)
case "license":
fn(api.ProgressResponse{Status: fmt.Sprintf("creating model %s layer", c.Name)})
// remove the prompt layer if one exists
mediaType := fmt.Sprintf("application/vnd.ollama.image.%s", c.Name)
layer, err := CreateLayer(strings.NewReader(c.Args))
@@ -306,18 +342,35 @@ func CreateModel(name string, path string, fn func(resp api.ProgressResponse)) e
if len(params) > 0 {
fn(api.ProgressResponse{Status: "creating parameter layer"})
layers = removeLayerFromLayers(layers, "application/vnd.ollama.image.params")
paramData, err := paramsToReader(params)
formattedParams, err := formatParams(params)
if err != nil {
return fmt.Errorf("couldn't create params json: %v", err)
}
l, err := CreateLayer(paramData)
bts, err := json.Marshal(formattedParams)
if err != nil {
return err
}
l, err := CreateLayer(bytes.NewReader(bts))
if err != nil {
return fmt.Errorf("failed to create layer: %v", err)
}
l.MediaType = "application/vnd.ollama.image.params"
layers = append(layers, l)
// apply these parameters to the embedding options, in case embeddings need to be generated using this model
embed.opts = api.DefaultOptions()
embed.opts.FromMap(formattedParams)
}
// generate the embedding layers
embeddingLayers, err := embeddingLayers(embed)
if err != nil {
return err
}
layers = append(layers, embeddingLayers...)
digests, err := getLayerDigests(layers)
if err != nil {
return err
@@ -352,6 +405,117 @@ func CreateModel(name string, path string, fn func(resp api.ProgressResponse)) e
return nil
}
type EmbeddingParams struct {
model string
opts api.Options
files []string // paths to files to embed
fn func(resp api.ProgressResponse)
}
// embeddingLayers loads the associated LLM and generates the embeddings to be stored from an input file
func embeddingLayers(e EmbeddingParams) ([]*LayerReader, error) {
layers := []*LayerReader{}
if len(e.files) > 0 {
if _, err := os.Stat(e.model); err != nil {
if os.IsNotExist(err) {
// this is a model name rather than the file
model, err := GetModel(e.model)
if err != nil {
return nil, fmt.Errorf("failed to get model to generate embeddings: %v", err)
}
e.model = model.ModelPath
} else {
return nil, fmt.Errorf("failed to get model file to generate embeddings: %v", err)
}
}
e.opts.EmbeddingOnly = true
llm, err := llama.New(e.model, e.opts)
if err != nil {
return nil, fmt.Errorf("load model to generate embeddings: %v", err)
}
defer func() {
if llm != nil {
llm.Close()
}
}()
addedFiles := make(map[string]bool) // keep track of files that have already been added
for _, filePattern := range e.files {
matchingFiles, err := filepath.Glob(filePattern)
if err != nil {
return nil, fmt.Errorf("could not find files with pattern %s: %w", filePattern, err)
}
for _, filePath := range matchingFiles {
if addedFiles[filePath] {
continue
}
addedFiles[filePath] = true
// TODO: check file type
f, err := os.Open(filePath)
if err != nil {
return nil, fmt.Errorf("could not open embed file: %w", err)
}
scanner := bufio.NewScanner(f)
scanner.Split(bufio.ScanLines)
data := []string{}
for scanner.Scan() {
data = append(data, scanner.Text())
}
f.Close()
// the digest of the file is set here so that the client knows a new operation is in progress
fileDigest, _ := GetSHA256Digest(bytes.NewReader([]byte(filePath)))
embeddings := []vector.Embedding{}
for i, d := range data {
if strings.TrimSpace(d) == "" {
continue
}
e.fn(api.ProgressResponse{
Status: fmt.Sprintf("creating embeddings for file %s", filePath),
Digest: fileDigest,
Total: len(data) - 1,
Completed: i,
})
embed, err := llm.Embedding(d)
if err != nil {
log.Printf("failed to generate embedding for '%s' line %d: %v", filePath, i+1, err)
continue
}
embeddings = append(embeddings, vector.Embedding{Data: d, Vector: embed})
}
b, err := json.Marshal(embeddings)
if err != nil {
return nil, fmt.Errorf("failed to encode embeddings: %w", err)
}
r := bytes.NewReader(b)
digest, size := GetSHA256Digest(r)
// Reset the position of the reader after calculating the digest
if _, err := r.Seek(0, io.SeekStart); err != nil {
return nil, fmt.Errorf("could not reset embed reader: %w", err)
}
layer := &LayerReader{
Layer: Layer{
MediaType: "application/vnd.ollama.image.embed",
Digest: digest,
Size: size,
},
Reader: r,
}
layers = append(layers, layer)
}
}
}
return layers, nil
}
func removeLayerFromLayers(layers []*LayerReader, mediaType string) []*LayerReader {
j := 0
for _, l := range layers {
@@ -440,8 +604,8 @@ func GetLayerWithBufferFromLayer(layer *Layer) (*LayerReader, error) {
return newLayer, nil
}
// paramsToReader converts specified parameter options to their correct types, and returns a reader for the json
func paramsToReader(params map[string][]string) (io.ReadSeeker, error) {
// formatParams converts specified parameter options to their correct types
func formatParams(params map[string][]string) (map[string]interface{}, error) {
opts := api.Options{}
valueOpts := reflect.ValueOf(&opts).Elem() // names of the fields in the options struct
typeOpts := reflect.TypeOf(opts) // types of the fields in the options struct
@@ -495,12 +659,7 @@ func paramsToReader(params map[string][]string) (io.ReadSeeker, error) {
}
}
bts, err := json.Marshal(out)
if err != nil {
return nil, err
}
return bytes.NewReader(bts), nil
return out, nil
}
func getLayerDigests(layers []*LayerReader) ([]string, error) {
@@ -720,7 +879,7 @@ func PushModel(name string, regOpts *RegistryOptions, fn func(api.ProgressRespon
return nil
}
func PullModel(name string, regOpts *RegistryOptions, fn func(api.ProgressResponse)) error {
func PullModel(ctx context.Context, name string, regOpts *RegistryOptions, fn func(api.ProgressResponse)) error {
mp := ParseModelPath(name)
fn(api.ProgressResponse{Status: "pulling manifest"})
@@ -735,7 +894,7 @@ func PullModel(name string, regOpts *RegistryOptions, fn func(api.ProgressRespon
layers = append(layers, &manifest.Config)
for _, layer := range layers {
if err := downloadBlob(mp, layer.Digest, regOpts, fn); err != nil {
if err := downloadBlob(ctx, mp, layer.Digest, regOpts, fn); err != nil {
return err
}
}
@@ -962,112 +1121,6 @@ func uploadBlobChunked(mp ModelPath, url string, layer *Layer, regOpts *Registry
return nil
}
func downloadBlob(mp ModelPath, digest string, regOpts *RegistryOptions, fn func(api.ProgressResponse)) error {
fp, err := GetBlobsPath(digest)
if err != nil {
return err
}
if fi, _ := os.Stat(fp); fi != nil {
// we already have the file, so return
fn(api.ProgressResponse{
Digest: digest,
Total: int(fi.Size()),
Completed: int(fi.Size()),
})
return nil
}
var size int64
chunkSize := 1024 * 1024 // 1 MiB in bytes
fi, err := os.Stat(fp + "-partial")
switch {
case errors.Is(err, os.ErrNotExist):
// noop, file doesn't exist so create it
case err != nil:
return fmt.Errorf("stat: %w", err)
default:
size = fi.Size()
// Ensure the size is divisible by the chunk size by removing excess bytes
size -= size % int64(chunkSize)
err := os.Truncate(fp+"-partial", size)
if err != nil {
return fmt.Errorf("truncate: %w", err)
}
}
url := fmt.Sprintf("%s/v2/%s/blobs/%s", mp.Registry, mp.GetNamespaceRepository(), digest)
headers := map[string]string{
"Range": fmt.Sprintf("bytes=%d-", size),
}
resp, err := makeRequest("GET", url, headers, nil, regOpts)
if err != nil {
log.Printf("couldn't download blob: %v", err)
return err
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusPartialContent {
body, _ := io.ReadAll(resp.Body)
return fmt.Errorf("on download registry responded with code %d: %v", resp.StatusCode, string(body))
}
err = os.MkdirAll(path.Dir(fp), 0o700)
if err != nil {
return fmt.Errorf("make blobs directory: %w", err)
}
out, err := os.OpenFile(fp+"-partial", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o644)
if err != nil {
return fmt.Errorf("open file: %w", err)
}
defer out.Close()
remaining, _ := strconv.ParseInt(resp.Header.Get("Content-Length"), 10, 64)
completed := size
total := remaining + completed
for {
fn(api.ProgressResponse{
Status: fmt.Sprintf("downloading %s", digest),
Digest: digest,
Total: int(total),
Completed: int(completed),
})
if completed >= total {
if err := out.Close(); err != nil {
return err
}
if err := os.Rename(fp+"-partial", fp); err != nil {
fn(api.ProgressResponse{
Status: fmt.Sprintf("error renaming file: %v", err),
Digest: digest,
Total: int(total),
Completed: int(completed),
})
return err
}
break
}
n, err := io.CopyN(out, resp.Body, int64(chunkSize))
if err != nil && !errors.Is(err, io.EOF) {
return err
}
completed += n
}
log.Printf("success getting %s\n", digest)
return nil
}
func makeRequest(method, url string, headers map[string]string, body io.Reader, regOpts *RegistryOptions) (*http.Response, error) {
if !strings.HasPrefix(url, "http") {
if regOpts.Insecure {
@@ -1077,18 +1130,30 @@ func makeRequest(method, url string, headers map[string]string, body io.Reader,
}
}
req, err := http.NewRequest(method, url, body)
// make a copy of the body in case we need to try the call to makeRequest again
var buf bytes.Buffer
if body != nil {
_, err := io.Copy(&buf, body)
if err != nil {
return nil, err
}
}
bodyCopy := bytes.NewReader(buf.Bytes())
req, err := http.NewRequest(method, url, bodyCopy)
if err != nil {
return nil, err
}
for k, v := range headers {
req.Header.Set(k, v)
if regOpts.Token != "" {
req.Header.Set("Authorization", "Bearer "+regOpts.Token)
} else if regOpts.Username != "" && regOpts.Password != "" {
req.SetBasicAuth(regOpts.Username, regOpts.Password)
}
// TODO: better auth
if regOpts.Username != "" && regOpts.Password != "" {
req.SetBasicAuth(regOpts.Username, regOpts.Password)
for k, v := range headers {
req.Header.Set(k, v)
}
client := &http.Client{
@@ -1105,9 +1170,55 @@ func makeRequest(method, url string, headers map[string]string, body io.Reader,
return nil, err
}
// if the request is unauthenticated, try to authenticate and make the request again
if resp.StatusCode == http.StatusUnauthorized {
auth := resp.Header.Get("Www-Authenticate")
authRedir := ParseAuthRedirectString(string(auth))
token, err := getAuthToken(authRedir, regOpts)
if err != nil {
return nil, err
}
regOpts.Token = token
bodyCopy = bytes.NewReader(buf.Bytes())
return makeRequest(method, url, headers, bodyCopy, regOpts)
}
return resp, nil
}
func getValue(header, key string) string {
startIdx := strings.Index(header, key+"=")
if startIdx == -1 {
return ""
}
// Move the index to the starting quote after the key.
startIdx += len(key) + 2
endIdx := startIdx
for endIdx < len(header) {
if header[endIdx] == '"' {
if endIdx+1 < len(header) && header[endIdx+1] != ',' { // If the next character isn't a comma, continue
endIdx++
continue
}
break
}
endIdx++
}
return header[startIdx:endIdx]
}
func ParseAuthRedirectString(authStr string) AuthRedirect {
authStr = strings.TrimPrefix(authStr, "Bearer ")
return AuthRedirect{
Realm: getValue(authStr, "realm"),
Service: getValue(authStr, "service"),
Scope: getValue(authStr, "scope"),
}
}
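// A Www-Authenticate value of the form below (registry host illustrative)
// parses into its three fields:
//
//   Bearer realm="https://registry.example.com/token",service="registry.example.com",scope="repository:library/llama2:pull"
//
// giving Realm "https://registry.example.com/token", Service
// "registry.example.com" and Scope "repository:library/llama2:pull".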
var errDigestMismatch = fmt.Errorf("digest mismatch, file must be downloaded again")
func verifyBlob(digest string) error {

View File

@@ -1,6 +1,7 @@
package server
import (
"context"
"encoding/json"
"errors"
"fmt"
@@ -17,15 +18,18 @@ import (
"github.com/gin-contrib/cors"
"github.com/gin-gonic/gin"
"gonum.org/v1/gonum/mat"
"github.com/jmorganca/ollama/api"
"github.com/jmorganca/ollama/llama"
"github.com/jmorganca/ollama/vector"
)
var loaded struct {
mu sync.Mutex
llm *llama.LLM
llm *llama.LLM
Embeddings []vector.Embedding
expireAt time.Time
expireTimer *time.Timer
@@ -34,6 +38,81 @@ var loaded struct {
options api.Options
}
// load a model into memory if it is not already loaded; it is up to the caller to lock loaded.mu before calling this function
func load(model *Model, reqOpts map[string]interface{}, sessionDuration time.Duration) error {
opts := api.DefaultOptions()
if err := opts.FromMap(model.Options); err != nil {
log.Printf("could not load model options: %v", err)
return err
}
if err := opts.FromMap(reqOpts); err != nil {
log.Printf("could not merge model options: %v", err)
return err
}
if model.Digest != loaded.digest || !reflect.DeepEqual(loaded.options, opts) {
if loaded.llm != nil {
loaded.llm.Close()
loaded.llm = nil
loaded.digest = ""
}
if model.Embeddings != nil && len(model.Embeddings) > 0 {
opts.EmbeddingOnly = true // this is required to generate embeddings; completions will still work
loaded.Embeddings = model.Embeddings
}
llm, err := llama.New(model.ModelPath, opts)
if err != nil {
return err
}
if opts.NumKeep < 0 {
promptWithSystem, err := model.Prompt(api.GenerateRequest{}, "")
if err != nil {
return err
}
promptNoSystem, err := model.Prompt(api.GenerateRequest{Context: []int{0}}, "")
if err != nil {
return err
}
tokensWithSystem := llm.Encode(promptWithSystem)
tokensNoSystem := llm.Encode(promptNoSystem)
llm.NumKeep = len(tokensWithSystem) - len(tokensNoSystem) + 1
}
loaded.llm = llm
loaded.digest = model.Digest
loaded.options = opts
}
loaded.expireAt = time.Now().Add(sessionDuration)
if loaded.expireTimer == nil {
loaded.expireTimer = time.AfterFunc(sessionDuration, func() {
loaded.mu.Lock()
defer loaded.mu.Unlock()
if time.Now().Before(loaded.expireAt) {
return
}
if loaded.llm == nil {
return
}
loaded.llm.Close()
loaded.llm = nil
loaded.digest = ""
})
}
loaded.expireTimer.Reset(sessionDuration)
return nil
}
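// NumKeep < 0 asks to keep the entire system prompt resident: the prompt is
// rendered twice, once as a first turn (system block included) and once as a
// later turn (Context non-empty, so the system block is omitted), and the
// difference in token counts plus one is pinned.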
func GenerateHandler(c *gin.Context) {
loaded.mu.Lock()
defer loaded.mu.Unlock()
@@ -52,63 +131,30 @@ func GenerateHandler(c *gin.Context) {
return
}
opts := api.DefaultOptions()
if err := opts.FromMap(model.Options); err != nil {
log.Printf("could not load model options: %v", err)
sessionDuration := 5 * time.Minute
if err := load(model, req.Options, sessionDuration); err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
return
}
if err := opts.FromMap(req.Options); err != nil {
log.Printf("could not merge model options: %v", err)
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
return
}
checkpointLoaded := time.Now()
if model.Digest != loaded.digest || !reflect.DeepEqual(loaded.options, opts) {
if loaded.llm != nil {
loaded.llm.Close()
loaded.llm = nil
loaded.digest = ""
}
llm, err := llama.New(model.ModelPath, opts)
embedding := ""
if model.Embeddings != nil && len(model.Embeddings) > 0 {
promptEmbed, err := loaded.llm.Embedding(req.Prompt)
if err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
return
}
loaded.llm = llm
loaded.digest = model.Digest
loaded.options = opts
// TODO: set embedTop from specified parameters in modelfile
embedTop := 3
topK := vector.TopK(embedTop, mat.NewVecDense(len(promptEmbed), promptEmbed), loaded.Embeddings)
for _, e := range topK {
embedding = fmt.Sprintf("%s %s", embedding, e.Embedding.Data)
}
}
sessionDuration := 5 * time.Minute
loaded.expireAt = time.Now().Add(sessionDuration)
if loaded.expireTimer == nil {
loaded.expireTimer = time.AfterFunc(sessionDuration, func() {
loaded.mu.Lock()
defer loaded.mu.Unlock()
if time.Now().Before(loaded.expireAt) {
return
}
if loaded.llm == nil {
return
}
loaded.llm.Close()
loaded.llm = nil
loaded.digest = ""
})
}
loaded.expireTimer.Reset(sessionDuration)
checkpointLoaded := time.Now()
prompt, err := model.Prompt(req)
prompt, err := model.Prompt(req, embedding)
if err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
return
@@ -139,6 +185,44 @@ func GenerateHandler(c *gin.Context) {
streamResponse(c, ch)
}
func EmbeddingHandler(c *gin.Context) {
loaded.mu.Lock()
defer loaded.mu.Unlock()
var req api.EmbeddingRequest
if err := c.ShouldBindJSON(&req); err != nil {
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
return
}
model, err := GetModel(req.Model)
if err != nil {
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
return
}
if err := load(model, req.Options, 5*time.Minute); err != nil {
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
return
}
if !loaded.options.EmbeddingOnly {
c.JSON(http.StatusBadRequest, gin.H{"error": "embedding option must be set to true"})
return
}
embedding, err := loaded.llm.Embedding(req.Prompt)
if err != nil {
log.Printf("embedding generation failed: %v", err)
c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to generate embedding"})
return
}
resp := api.EmbeddingResponse{
Embedding: embedding,
}
c.JSON(http.StatusOK, resp)
}
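// An illustrative exchange with the new endpoint; field names follow
// api.EmbeddingRequest and api.EmbeddingResponse, the values are made up:
//
//   POST /api/embeddings
//   {"model": "llama2", "prompt": "Here is an article about llamas..."}
//
//   {"embedding": [0.567, -0.0092, 0.241, ...]}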
func PullModelHandler(c *gin.Context) {
var req api.PullRequest
if err := c.ShouldBindJSON(&req); err != nil {
@@ -159,7 +243,10 @@ func PullModelHandler(c *gin.Context) {
Password: req.Password,
}
if err := PullModel(req.Name, regOpts, fn); err != nil {
ctx, cancel := context.WithCancel(c.Request.Context())
defer cancel()
if err := PullModel(ctx, req.Name, regOpts, fn); err != nil {
ch <- gin.H{"error": err.Error()}
}
}()
@@ -209,7 +296,10 @@ func CreateModelHandler(c *gin.Context) {
ch <- resp
}
if err := CreateModel(req.Name, req.Path, fn); err != nil {
ctx, cancel := context.WithCancel(c.Request.Context())
defer cancel()
if err := CreateModel(ctx, req.Name, req.Path, fn); err != nil {
ch <- gin.H{"error": err.Error()}
}
}()
@@ -301,11 +391,10 @@ func CopyModelHandler(c *gin.Context) {
}
}
func Serve(ln net.Listener) error {
func Serve(ln net.Listener, origins []string) error {
config := cors.DefaultConfig()
config.AllowWildcard = true
// only allow http/https from localhost
config.AllowOrigins = []string{
config.AllowOrigins = append(origins, []string{
"http://localhost",
"http://localhost:*",
"https://localhost",
@@ -314,7 +403,11 @@ func Serve(ln net.Listener) error {
"http://127.0.0.1:*",
"https://127.0.0.1",
"https://127.0.0.1:*",
}
"http://0.0.0.0",
"http://0.0.0.0:*",
"https://0.0.0.0",
"https://0.0.0.0:*",
}...)
r := gin.Default()
r.Use(cors.New(config))
@@ -328,6 +421,7 @@ func Serve(ln net.Listener) error {
r.POST("/api/pull", PullModelHandler)
r.POST("/api/generate", GenerateHandler)
r.POST("/api/embeddings", EmbeddingHandler)
r.POST("/api/create", CreateModelHandler)
r.POST("/api/push", PushModelHandler)
r.POST("/api/copy", CopyModelHandler)
@@ -343,6 +437,7 @@ func Serve(ln net.Listener) error {
}
func streamResponse(c *gin.Context, ch chan any) {
c.Header("Content-Type", "application/x-ndjson")
c.Stream(func(w io.Writer) bool {
val, ok := <-ch
if !ok {

vector/store.go (new file, 69 lines)
View File

@@ -0,0 +1,69 @@
package vector
import (
"container/heap"
"sort"
"gonum.org/v1/gonum/mat"
)
type Embedding struct {
Vector []float64 // the embedding vector
Data string // the data represented by the embedding
}
type EmbeddingSimilarity struct {
Embedding Embedding // the embedding that was used to calculate the similarity
Similarity float64 // the similarity between the embedding and the query
}
type Heap []EmbeddingSimilarity
func (h Heap) Len() int { return len(h) }
func (h Heap) Less(i, j int) bool { return h[i].Similarity < h[j].Similarity }
func (h Heap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *Heap) Push(e any) {
*h = append(*h, e.(EmbeddingSimilarity))
}
func (h *Heap) Pop() interface{} {
old := *h
n := len(old)
x := old[n-1]
*h = old[0 : n-1]
return x
}
// cosineSimilarity is a measure that calculates the cosine of the angle between two vectors.
// This value will range from -1 to 1, where 1 means the vectors are identical.
func cosineSimilarity(vec1, vec2 *mat.VecDense) float64 {
dotProduct := mat.Dot(vec1, vec2)
norms := mat.Norm(vec1, 2) * mat.Norm(vec2, 2)
if norms == 0 {
return 0
}
return dotProduct / norms
}
func TopK(k int, query *mat.VecDense, embeddings []Embedding) []EmbeddingSimilarity {
h := &Heap{}
heap.Init(h)
for _, emb := range embeddings {
similarity := cosineSimilarity(query, mat.NewVecDense(len(emb.Vector), emb.Vector))
heap.Push(h, EmbeddingSimilarity{Embedding: emb, Similarity: similarity})
if h.Len() > k {
heap.Pop(h)
}
}
topK := make([]EmbeddingSimilarity, 0, h.Len())
for h.Len() > 0 {
topK = append(topK, heap.Pop(h).(EmbeddingSimilarity))
}
sort.Slice(topK, func(i, j int) bool {
return topK[i].Similarity > topK[j].Similarity
})
return topK
}
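
A usage sketch for the new store (vectors shortened to four dimensions for illustration):

```go
package main

import (
	"fmt"

	"gonum.org/v1/gonum/mat"

	"github.com/jmorganca/ollama/vector"
)

func main() {
	query := mat.NewVecDense(4, []float64{0.1, 0.9, 0.0, 0.2})
	docs := []vector.Embedding{
		{Data: "llamas are camelids", Vector: []float64{0.1, 0.8, 0.1, 0.2}},
		{Data: "the sky is blue", Vector: []float64{0.9, 0.0, 0.1, 0.0}},
	}
	// results arrive most-similar first; the min-heap keeps only the k best
	for _, hit := range vector.TopK(1, query, docs) {
		fmt.Printf("%s (similarity %.2f)\n", hit.Embedding.Data, hit.Similarity)
	}
}
```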

View File

@@ -1,3 +0,0 @@
{
"extends": "next/core-web-vitals"
}

web/.gitignore (vendored, 35 lines)
View File

@@ -1,35 +0,0 @@
# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
# dependencies
/node_modules
/.pnp
.pnp.js
# testing
/coverage
# next.js
/.next/
/out/
# production
/build
# misc
.DS_Store
*.pem
# debug
npm-debug.log*
yarn-debug.log*
yarn-error.log*
# local env files
.env*.local
# vercel
.vercel
# typescript
*.tsbuildinfo
next-env.d.ts

View File

@@ -1,9 +0,0 @@
# Ollama.ai
This website renders helpful information, blog posts, docs and more for the Ollama project.
## Develop
```bash
npm run dev
```

View File

@@ -1,27 +0,0 @@
import { Analytics } from '@segment/analytics-node'
import { v4 as uuid } from 'uuid'
const analytics = new Analytics({ writeKey: process.env.TELEMETRY_WRITE_KEY || '<empty>' })
export async function POST(req: Request) {
const { email } = await req.json()
const id = uuid()
await analytics.identify({
anonymousId: id,
traits: {
email,
},
})
await analytics.track({
anonymousId: id,
event: 'signup',
properties: {
email,
},
})
return new Response(null, { status: 200 })
}

View File

@@ -1,43 +0,0 @@
import { NextResponse } from 'next/server'
import semver from 'semver'
export async function GET(req: Request) {
const { searchParams } = new URL(req.url)
const os = searchParams.get('os') || 'darwin'
const version = searchParams.get('version') || '0.0.0'
if (!version) {
return new Response('not found', { status: 404 })
}
const res = await fetch('https://api.github.com/repos/jmorganca/ollama/releases', { next: { revalidate: 60 } })
const data = await res.json()
const latest = data?.filter((f: any) => !f.prerelease)?.[0]
if (!latest) {
return new Response('not found', { status: 404 })
}
const assets = latest.assets || []
if (assets.length === 0) {
return new Response('not found', { status: 404 })
}
// todo: get the correct asset for the current arch/os
const asset = assets.find((a: any) => a.name.toLowerCase().includes(os) && a.name.toLowerCase().includes('.zip'))
if (!asset) {
return new Response('not found', { status: 404 })
}
console.log(asset)
if (semver.lt(version, latest.tag_name)) {
return NextResponse.json({ version: latest.tag_name, url: asset.browser_download_url })
}
return new Response(null, { status: 204 })
}

View File

@@ -1,11 +0,0 @@
'use client'
import { useEffect } from 'react'
export default function Downloader({ url }: { url: string }) {
useEffect(() => {
window.location.href = url
}, [])
return null
}

View File

@@ -1,47 +0,0 @@
import Image from 'next/image'
import Header from '../header'
import Downloader from './downloader'
import Signup from './signup'
export default async function Download() {
const res = await fetch('https://api.github.com/repos/jmorganca/ollama/releases', { next: { revalidate: 60 } })
const data = await res.json()
if (data.length === 0) {
return null
}
const latest = data[0]
const assets = latest.assets || []
if (assets.length === 0) {
return null
}
// todo: get the correct asset for the current arch/os
const asset = assets.find(
(a: any) => a.name.toLowerCase().includes('darwin') && a.name.toLowerCase().includes('.zip')
)
if (!asset) {
return null
}
return (
<>
<Header />
<main className='flex min-h-screen max-w-6xl flex-col py-20 px-16 lg:p-32 items-center mx-auto'>
<Image src='/ollama.png' width={64} height={64} alt='ollamaIcon' />
<section className='mt-12 mb-8 text-center'>
<h2 className='my-2 max-w-md text-3xl tracking-tight'>Downloading...</h2>
<h3 className='text-base text-neutral-500 mt-12 max-w-[16rem]'>
While Ollama downloads, sign up to get notified of new updates.
</h3>
<Downloader url={asset.browser_download_url} />
</section>
<Signup />
</main>
</>
)
}

View File

@@ -1,51 +0,0 @@
'use client'
import { useState } from 'react'
export default function Signup() {
const [email, setEmail] = useState('')
const [submitting, setSubmitting] = useState(false)
const [success, setSuccess] = useState(false)
return (
<form
onSubmit={async e => {
e.preventDefault()
setSubmitting(true)
await fetch('/api/signup', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({ email }),
})
setSubmitting(false)
setSuccess(true)
setEmail('')
return false
}}
className='flex self-stretch flex-col gap-3 h-32 md:mx-40 lg:mx-72'
>
<input
required
autoFocus
value={email}
onChange={e => setEmail(e.target.value)}
type='email'
placeholder='your@email.com'
className='border border-neutral-200 rounded-lg px-4 py-2 focus:outline-none placeholder-neutral-300'
/>
<input
type='submit'
value='Get updates'
disabled={submitting}
className='bg-black text-white disabled:text-neutral-200 disabled:bg-neutral-700 rounded-full px-4 py-2 focus:outline-none cursor-pointer'
/>
{success && <p className='text-center text-sm'>You&apos;re signed up for updates</p>}
</form>
)
}

View File

@@ -1,3 +0,0 @@
@tailwind base;
@tailwind components;
@tailwind utilities;

View File

@@ -1,26 +0,0 @@
import Link from "next/link"
const navigation = [
{ name: 'Discord', href: 'https://discord.com/invite/ollama' },
{ name: 'GitHub', href: 'https://github.com/jmorganca/ollama' },
{ name: 'Download', href: '/download' },
]
export default function Header() {
return (
<header className="absolute inset-x-0 top-0 z-50">
<nav className="mx-auto flex items-center justify-between px-10 py-4">
<Link className="flex-1 font-bold" href="/">
Ollama
</Link>
<div className="flex space-x-8">
{navigation.map((item) => (
<Link key={item.name} href={item.href} className="text-sm leading-6 text-gray-900">
{item.name}
</Link>
))}
</div>
</nav>
</header>
)
}

Binary file not shown (deleted image, previously 1.8 KiB)

View File

@@ -1,14 +0,0 @@
import './globals.css'
export const metadata = {
title: 'Ollama',
description: 'A tool for running large language models',
}
export default function RootLayout({ children }: { children: React.ReactNode }) {
return (
<html lang='en'>
<body className='antialiased'>{children}</body>
</html>
)
}

View File

@@ -1,37 +0,0 @@
import Image from 'next/image'
import Link from 'next/link'
import Header from './header'
export default async function Home() {
return (
<>
<Header />
<main className='flex min-h-screen max-w-6xl flex-col py-20 px-16 md:p-32 items-center mx-auto'>
<Image src='/ollama.png' width={64} height={64} alt='ollamaIcon' />
<section className='my-12 text-center'>
<div className='flex flex-col space-y-2'>
<h2 className='md:max-w-md mx-auto my-2 text-3xl tracking-tight'>
Get up and running with large language models, locally.
</h2>
<h3 className='md:max-w-xs mx-auto text-base text-neutral-500'>
Run Llama 2 and other models on macOS. Customize and create your own.
</h3>
</div>
<div className='mx-auto max-w-xs flex flex-col space-y-4 mt-12'>
<Link
href='/download'
className='md:mx-10 lg:mx-14 bg-black text-white rounded-full px-4 py-2 focus:outline-none cursor-pointer'
>
Download
</Link>
<p className='text-neutral-500 text-sm '>
Available for macOS with Apple Silicon <br />
Windows & Linux support coming soon.
</p>
</div>
</section>
</main>
</>
)
}

View File

@@ -1,7 +0,0 @@
{
"compilerOptions": {
"paths": {
"@/*": ["./*"]
}
}
}

View File

@@ -1,4 +0,0 @@
/** @type {import('next').NextConfig} */
const nextConfig = {}
module.exports = nextConfig

web/package-lock.json (generated, 4696 lines)

File diff suppressed because it is too large

View File

@@ -1,37 +0,0 @@
{
"name": "web",
"version": "0.0.0",
"scripts": {
"dev": "next dev",
"build": "next build",
"start": "next start",
"lint": "next lint"
},
"dependencies": {
"@octokit/rest": "^19.0.13",
"@octokit/types": "^11.0.0",
"@segment/analytics-node": "^1.0.0",
"@types/node": "20.4.0",
"@types/react": "18.2.14",
"@types/react-dom": "18.2.6",
"autoprefixer": "10.4.14",
"encoding": "^0.1.13",
"eslint": "8.44.0",
"eslint-config-next": "13.4.7",
"next": "13.4.9",
"postcss": "8.4.24",
"react": "18.2.0",
"react-dom": "18.2.0",
"react-icons": "^4.10.1",
"semver": "^7.5.3",
"tailwindcss": "3.3.2",
"typescript": "5.1.6",
"uuid": "^9.0.0"
},
"devDependencies": {
"@types/semver": "^7.5.0",
"@types/uuid": "^9.0.2",
"prettier": "^3.0.0",
"prettier-plugin-tailwindcss": "^0.4.0"
}
}

View File

@@ -1,6 +0,0 @@
module.exports = {
plugins: {
tailwindcss: {},
autoprefixer: {},
},
}

Binary file not shown (deleted image, previously 7.3 KiB)

View File

@@ -1,18 +0,0 @@
/** @type {import('tailwindcss').Config} */
module.exports = {
content: [
'./pages/**/*.{js,ts,jsx,tsx,mdx}',
'./components/**/*.{js,ts,jsx,tsx,mdx}',
'./app/**/*.{js,ts,jsx,tsx,mdx}',
],
theme: {
extend: {
fontFamily: {
sans: ['sans-serif'],
serif: ['serif'],
monospace: ['monospace'],
},
},
},
plugins: [],
}

View File

@@ -1,28 +0,0 @@
{
"compilerOptions": {
"target": "es5",
"lib": ["dom", "dom.iterable", "esnext"],
"allowJs": true,
"skipLibCheck": true,
"strict": true,
"forceConsistentCasingInFileNames": true,
"noEmit": true,
"esModuleInterop": true,
"module": "esnext",
"moduleResolution": "node",
"resolveJsonModule": true,
"isolatedModules": true,
"jsx": "preserve",
"incremental": true,
"plugins": [
{
"name": "next"
}
],
"paths": {
"@/*": ["./*"]
}
},
"include": ["next-env.d.ts", "**/*.ts", "**/*.tsx", ".next/types/**/*.ts"],
"exclude": ["node_modules"]
}

View File

@@ -1,5 +0,0 @@
{
"github": {
"silent": true
}
}