Compare commits


70 Commits

Author SHA1 Message Date
Jeffrey Morgan
7e26a8df31 cmd: use environment variables for server options 2023-08-10 14:17:53 -07:00
Jeffrey Morgan
4ab1da38ba guard around id() 2023-08-10 14:11:54 -07:00
Patrick Devine
be989d89d1 Token auth (#314) 2023-08-10 11:34:25 -07:00
Soroush Javadi
bea683e3bf cmd: check GetBlobsPath error (#317)
The error returned by `server.GetBlobsPath` in `showLayer` was never
checked. Check the error and return if not nil. Also, make newlines at
the end of error messages consistent and fix a typo.
2023-08-10 09:57:49 -07:00
Jeffrey Morgan
178237d37f tweak README.md 2023-08-10 09:54:03 -07:00
Jeffrey Morgan
76a678af34 app: dont always show installer window on top now that it lives in the dock 2023-08-10 09:53:46 -07:00
Jeffrey Morgan
f65169b13e clean up cli flags 2023-08-10 09:28:56 -07:00
Jeffrey Morgan
040a5b9750 clean up cli flags 2023-08-10 09:27:03 -07:00
Bruce MacDonald
5b5cc9c9f1 embeddings endpoint 2023-08-10 11:49:55 -04:00
Bruce MacDonald
4b3507f036 embeddings endpoint
Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>
2023-08-10 11:45:57 -04:00
Jun Tian
5ebce03c77 Add an example on multiline input (#311) 2023-08-10 08:22:28 -07:00
Bruce MacDonald
5e25f801ed fix a typo in the tweetwriter example Modelfile 2023-08-10 10:19:53 -04:00
Bruce MacDonald
8e1234b758 fix embeddings invalid values 2023-08-10 10:17:00 -04:00
Soroush Javadi
10885986b8 fix a typo in the tweetwriter example Modelfile 2023-08-10 15:12:48 +03:30
Bruce MacDonald
984c9c628c fix embeddings invalid values 2023-08-09 16:50:53 -04:00
Bruce MacDonald
c4861360ec remove embed docs 2023-08-09 16:14:19 -04:00
Bruce MacDonald
9738ef85db allow for concurrent pulls of the same files 2023-08-09 11:35:24 -04:00
Bruce MacDonald
ac971c56d1 Update images.go 2023-08-09 11:31:54 -04:00
Bruce MacDonald
8228d166ce pr comments 2023-08-09 11:31:54 -04:00
Bruce MacDonald
907e6c56b3 unlock download in case of requestDownload err 2023-08-09 11:31:54 -04:00
Bruce MacDonald
868e3b31c7 allow for concurrent pulls of the same files 2023-08-09 11:31:54 -04:00
Bruce MacDonald
09d8bf6730 fix build errors 2023-08-09 10:45:57 -04:00
Bruce MacDonald
7a5f3616fd embed text document in modelfile 2023-08-09 10:26:19 -04:00
Jeffrey Morgan
cff002b824 use content type application/x-ndjson for streaming responses 2023-08-08 21:38:10 -07:00
Jeffrey Morgan
55cf5021f0 update langchain example to include python 2023-08-08 21:03:10 -07:00
Jeffrey Morgan
f58caa5ab5 update README.md 2023-08-08 15:50:23 -07:00
Jeffrey Morgan
82df473ec9 use note syntax in README.md 2023-08-08 15:49:50 -07:00
Jeffrey Morgan
e184c1d035 Link to api.md in README.md 2023-08-08 15:48:47 -07:00
Jeffrey Morgan
371d4e5df3 docs: fix invalid json in api.md 2023-08-08 15:46:05 -07:00
Jeffrey Morgan
1f78e409b4 docs: format with prettier 2023-08-08 15:41:48 -07:00
Jeffrey Morgan
34a88cd776 docs: update api.md formatting 2023-08-08 15:41:19 -07:00
Bruce MacDonald
1bee2347be pr feedback
- defer closing llm on embedding
- do not override licenses
- remove debugging print line
- reformat model file docs
2023-08-08 17:01:37 -04:00
Jeffrey Morgan
a027a7dd65 add 0.0.0.0 as an allowed origin by default
Fixes #282
2023-08-08 13:39:50 -07:00
Jeffrey Morgan
22986ccb38 add llama2:70b to the model library list 2023-08-08 13:08:05 -07:00
Bruce MacDonald
884d78ceb3 allow embedding from model binary 2023-08-08 14:38:57 -04:00
Bruce MacDonald
3ceac05108 Add embedding docs 2023-08-08 14:04:11 -04:00
Bruce MacDonald
21ddcaa1f1 pr comments
- default to embeddings enabled
- move embedding logic for loaded model to request
- allow embedding full directory
- close llm on reload
2023-08-08 13:49:37 -04:00
Michael Yang
f2074ed4c0 Merge pull request #306 from jmorganca/default-keep-system
automatically set num_keep if num_keep < 0
2023-08-08 09:25:34 -07:00
Bruce MacDonald
a6f6d18f83 embed text document in modelfile 2023-08-08 11:27:17 -04:00
Bruce MacDonald
34a13a9d05 pass flags to serve to allow setting allowed-origins + host and port 2023-08-08 10:41:42 -04:00
Jeffrey Morgan
8713ac23a8 allow overriding template and system in /api/generate
Fixes #297
Fixes #296
2023-08-08 00:55:34 -04:00
Jeffrey Morgan
5eb712f962 trim whitespace before checking stop conditions
Fixes #295
2023-08-08 00:29:19 -04:00
Michael Yang
4dc5b117dd automatically set num_keep if num_keep < 0
num_keep defines how many tokens to keep in the context when truncating
inputs. if left to its default value of -1, the server will calculate
num_keep to be the length of the system instructions
2023-08-07 16:19:12 -07:00
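The fallback this commit describes is small enough to sketch in code. A hedged illustration with made-up names (not the upstream implementation):

```go
package main

import "fmt"

// effectiveNumKeep sketches the default described above: a negative
// num_keep means "keep the tokenized system prompt" when truncating
// the context. Names here are illustrative, not upstream's.
func effectiveNumKeep(numKeep int, systemTokens []int) int {
	if numKeep < 0 {
		return len(systemTokens)
	}
	return numKeep
}

func main() {
	systemTokens := []int{1, 2, 3, 4} // pretend-tokenized system prompt
	fmt.Println(effectiveNumKeep(-1, systemTokens)) // 4: falls back to system prompt length
	fmt.Println(effectiveNumKeep(8, systemTokens))  // 8: explicit value wins
}
```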
Matt Williams
931a5f3cb9 Merge pull request #304 from jmorganca/matt/docs
missed a backtick
2023-08-07 15:14:06 -07:00
Jeffrey Morgan
639288bf2b make ollama binary executable on build 2023-08-07 18:10:37 -04:00
Jeffrey Morgan
d112c15d58 remove old library and web directories 2023-08-07 18:09:24 -04:00
Matt Williams
1267895e44 missed a backtick
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-07 13:53:49 -07:00
Matt Williams
089d03bc8d Merge pull request #289 from jmorganca/docs
First draft of API Docs
2023-08-07 13:46:22 -07:00
Michael Yang
ab3ced9d32 Merge pull request #276 from jmorganca/rope-freq
configurable rope frequency parameters
2023-08-07 13:39:38 -07:00
Matt Williams
0c52b4509b get rid of namespace and site
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-07 13:27:58 -07:00
Matt Williams
13aace3d34 clarify some more
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-07 13:21:54 -07:00
Matt Williams
2b3bb41598 model name format added
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-07 13:17:16 -07:00
cmiller01
93492f1e18 correct precedence of serve params (args over env over default) 2023-08-07 19:55:20 +00:00
Michael Chiang
54ba3e2ceb langchain JS integration (#302)
langchain JS integration
2023-08-07 12:21:36 -04:00
Matt Williams
4904cd8bcd update simpler code samples
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-07 07:40:38 -07:00
Matt Williams
8a45359ec6 Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-08-07 07:33:05 -07:00
cmiller01
fb593b7bfc pass flags to serve to allow setting allowed-origins + host and port
* resolves: https://github.com/jmorganca/ollama/issues/300 and
https://github.com/jmorganca/ollama/issues/282

* example usage:
```
ollama serve --port 9999 --allowed-origins "http://foo.example.com,http://192.0.0.1"
```
2023-08-07 03:34:37 +00:00
Matt Williams
2544b8afa1 update as per Mike's comments
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 17:42:24 -07:00
Matt Williams
ac1b04f271 Update docs/api.md
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-08-04 17:40:52 -07:00
Matt Williams
123fdeb919 Update docs/api.md
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-08-04 17:38:52 -07:00
Matt Williams
5c82bf95d1 Update docs/api.md
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-08-04 17:12:24 -07:00
Matt Williams
38a9b1618c missed some quotes
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 16:09:07 -07:00
Matt Williams
c18be72a3b complete 1st draft of api docs
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 16:08:11 -07:00
Matt Williams
a101fe51a7 clean up
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 12:56:41 -07:00
Bruce MacDonald
06fc48ad66 Update README.md (#285)
Ollama now supports Intel Macs
2023-08-04 15:45:55 -04:00
Matt Williams
d93e2f9210 fleshing out response
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 12:38:58 -07:00
Matt Williams
31edc829fc continuing
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 12:30:23 -07:00
Matt Williams
b31104768c filling out generate
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 12:27:47 -07:00
Matt Williams
b662d9fd8c starting to build out some docs
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 11:55:00 -07:00
Michael Yang
b9f4d67554 configurable rope frequency parameters 2023-08-03 22:11:58 -07:00
52 changed files with 1281 additions and 5950 deletions


@@ -9,13 +9,13 @@
 [![Discord](https://dcbadge.vercel.app/api/server/ollama?style=flat&compact=true)](https://discord.gg/ollama)
-> Note: Ollama is in early preview. Please report any issues you find.
 Run, create, and share large language models (LLMs).
+> Note: Ollama is in early preview. Please report any issues you find.
 ## Download
-- [Download](https://ollama.ai/download) for macOS on Apple Silicon (Intel coming soon)
+- [Download](https://ollama.ai/download) for macOS
 - Download for Windows and Linux (coming soon)
 - Build [from source](#building)
@@ -34,8 +34,9 @@ ollama run llama2
 | Model                    | Parameters | Size  | Download                        |
 | ------------------------ | ---------- | ----- | ------------------------------- |
 | Llama2                   | 7B         | 3.8GB | `ollama pull llama2`            |
-| Llama2 Uncensored        | 7B         | 3.8GB | `ollama pull llama2-uncensored` |
 | Llama2 13B               | 13B        | 7.3GB | `ollama pull llama2:13b`        |
+| Llama2 70B               | 70B        | 39GB  | `ollama pull llama2:70b`        |
+| Llama2 Uncensored        | 7B         | 3.8GB | `ollama pull llama2-uncensored` |
 | Orca Mini                | 3B         | 1.9GB | `ollama pull orca`              |
 | Vicuna                   | 7B         | 3.8GB | `ollama pull vicuna`            |
 | Nous-Hermes              | 13B        | 7.3GB | `ollama pull nous-hermes`       |
@@ -53,6 +54,15 @@ ollama run llama2
 Hello! How can I help you today?
 ```
+For multiline input, you can wrap text with `"""`:
+```
+>>> """Hello,
+... world!
+... """
+I'm a basic program that prints the famous "Hello, world!" message to the console.
+```
 ### Create a custom model
 Pull a base model:
@@ -60,6 +70,7 @@ Pull a base model:
 ```
 ollama pull llama2
 ```
+> To update a model to the latest version, run `ollama pull llama2` again. The model will be updated (if necessary).
 Create a `Modelfile`:
@@ -85,9 +96,7 @@ ollama run mario
 Hello! It's your friend Mario.
 ```
-For more examples, see the [examples](./examples) directory.
-For more information on creating a Modelfile, see the [Modelfile](./docs/modelfile.md) documentation.
+For more examples, see the [examples](./examples) directory. For more information on creating a Modelfile, see the [Modelfile](./docs/modelfile.md) documentation.
 ### Pull a model from the registry
@@ -132,24 +141,20 @@ Finally, run a model!
 ## REST API
-### `POST /api/generate`
+> See the [API documentation](./docs/api.md) for all endpoints.
-Generate text from a model.
+Ollama has an API for running and managing models. For example to generate text from a model:
 ```
-curl -X POST http://localhost:11434/api/generate -d '{"model": "llama2", "prompt":"Why is the sky blue?"}'
+curl -X POST http://localhost:11434/api/generate -d '{
+  "model": "llama2",
+  "prompt":"Why is the sky blue?"
+}'
 ```
-### `POST /api/create`
-Create a model from a `Modelfile`.
-```
-curl -X POST http://localhost:11434/api/create -d '{"name": "my-model", "path": "/path/to/modelfile"}'
-```
-## Projects built with Ollama
+## Tools using Ollama
 - [LangChain](https://python.langchain.com/docs/integrations/llms/ollama) and [LangChain.js](https://js.langchain.com/docs/modules/model_io/models/llms/integrations/ollama) with a question-answering [example](https://js.langchain.com/docs/use_cases/question_answering/local_retrieval_qa).
 - [Continue](https://github.com/continuedev/continue) - embeds Ollama inside Visual Studio Code. The extension lets you highlight code to add to the prompt, ask questions in the sidebar, and generate code inline.
 - [Discord AI Bot](https://github.com/mekb-turtle/discord-ai-bot) - interact with Ollama as a chatbot on Discord.
 - [Raycast Ollama](https://github.com/MassimilianoPasquini97/raycast_ollama) - Raycast extension to use Ollama for local llama inference on Raycast.


@@ -33,13 +33,26 @@ func (e StatusError) Error() string {
 }
 type GenerateRequest struct {
-	Model   string                 `json:"model"`
-	Prompt  string                 `json:"prompt"`
-	Context []int                  `json:"context,omitempty"`
+	Model    string                 `json:"model"`
+	Prompt   string                 `json:"prompt"`
+	System   string                 `json:"system"`
+	Template string                 `json:"template"`
+	Context  []int                  `json:"context,omitempty"`
 	Options map[string]interface{} `json:"options"`
 }
+type EmbeddingRequest struct {
+	Model   string                 `json:"model"`
+	Prompt  string                 `json:"prompt"`
+	Options map[string]interface{} `json:"options"`
+}
+type EmbeddingResponse struct {
+	Embedding []float64 `json:"embedding"`
+}
 type CreateRequest struct {
 	Name string `json:"name"`
 	Path string `json:"path"`
@@ -85,6 +98,10 @@ type ListResponseModel struct {
 	Size int `json:"size"`
 }
+type TokenResponse struct {
+	Token string `json:"token"`
+}
 type GenerateResponse struct {
 	Model     string    `json:"model"`
 	CreatedAt time.Time `json:"created_at"`
@@ -147,19 +164,21 @@ type Options struct {
 	UseNUMA bool `json:"numa,omitempty"`
 	// Model options
-	NumCtx        int  `json:"num_ctx,omitempty"`
-	NumKeep       int  `json:"num_keep,omitempty"`
-	NumBatch      int  `json:"num_batch,omitempty"`
-	NumGQA        int  `json:"num_gqa,omitempty"`
-	NumGPU        int  `json:"num_gpu,omitempty"`
-	MainGPU       int  `json:"main_gpu,omitempty"`
-	LowVRAM       bool `json:"low_vram,omitempty"`
-	F16KV         bool `json:"f16_kv,omitempty"`
-	LogitsAll     bool `json:"logits_all,omitempty"`
-	VocabOnly     bool `json:"vocab_only,omitempty"`
-	UseMMap       bool `json:"use_mmap,omitempty"`
-	UseMLock      bool `json:"use_mlock,omitempty"`
-	EmbeddingOnly bool `json:"embedding_only,omitempty"`
+	NumCtx             int     `json:"num_ctx,omitempty"`
+	NumKeep            int     `json:"num_keep,omitempty"`
+	NumBatch           int     `json:"num_batch,omitempty"`
+	NumGQA             int     `json:"num_gqa,omitempty"`
+	NumGPU             int     `json:"num_gpu,omitempty"`
+	MainGPU            int     `json:"main_gpu,omitempty"`
+	LowVRAM            bool    `json:"low_vram,omitempty"`
+	F16KV              bool    `json:"f16_kv,omitempty"`
+	LogitsAll          bool    `json:"logits_all,omitempty"`
+	VocabOnly          bool    `json:"vocab_only,omitempty"`
+	UseMMap            bool    `json:"use_mmap,omitempty"`
+	UseMLock           bool    `json:"use_mlock,omitempty"`
+	EmbeddingOnly      bool    `json:"embedding_only,omitempty"`
+	RopeFrequencyBase  float32 `json:"rope_frequency_base,omitempty"`
+	RopeFrequencyScale float32 `json:"rope_frequency_scale,omitempty"`
 	// Predict options
 	RepeatLastN int `json:"repeat_last_n,omitempty"`
@@ -261,14 +280,18 @@ func DefaultOptions() Options {
 		UseNUMA: false,
-		NumCtx:   2048,
-		NumBatch: 512,
-		NumGPU:   1,
-		NumGQA:   1,
-		LowVRAM:  false,
-		F16KV:    true,
-		UseMMap:  true,
-		UseMLock: false,
+		NumCtx:             2048,
+		NumKeep:            -1,
+		NumBatch:           512,
+		NumGPU:             1,
+		NumGQA:             1,
+		LowVRAM:            false,
+		F16KV:              true,
+		UseMMap:            true,
+		UseMLock:           false,
+		RopeFrequencyBase:  10000.0,
+		RopeFrequencyScale: 1.0,
+		EmbeddingOnly:      true,
 		RepeatLastN:   64,
 		RepeatPenalty: 1.1,
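The `EmbeddingRequest` and `EmbeddingResponse` types above pair with the embeddings endpoint added in this range. As a rough illustration of how a client would use them, here is a minimal Go sketch; the `/api/embeddings` route and the default local port are assumptions inferred from the commit messages, not something this diff shows directly:

```go
// Minimal client sketch for the new embedding types. Assumption: the
// endpoint is POST /api/embeddings on a locally running server.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Mirrors api.EmbeddingRequest from the diff above.
type EmbeddingRequest struct {
	Model   string                 `json:"model"`
	Prompt  string                 `json:"prompt"`
	Options map[string]interface{} `json:"options"`
}

// Mirrors api.EmbeddingResponse from the diff above.
type EmbeddingResponse struct {
	Embedding []float64 `json:"embedding"`
}

func main() {
	body, err := json.Marshal(EmbeddingRequest{Model: "llama2", Prompt: "Why is the sky blue?"})
	if err != nil {
		panic(err)
	}
	resp, err := http.Post("http://127.0.0.1:11434/api/embeddings", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var er EmbeddingResponse
	if err := json.NewDecoder(resp.Body).Decode(&er); err != nil {
		panic(err)
	}
	fmt.Printf("got %d-dimensional embedding\n", len(er.Embedding))
}
```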


@@ -71,7 +71,6 @@ function firstRunWindow() {
       nodeIntegration: true,
       contextIsolation: false,
     },
-    alwaysOnTop: true,
   })
   require('@electron/remote/main').enable(welcomeWindow.webContents)
@@ -237,13 +236,18 @@ app.on('window-all-closed', () => {
 // In this file you can include the rest of your app's specific main process
 // code. You can also put them in separate files and import them here.
+let aid = ''
+try {
+  aid = id()
+} catch (e) {}
 autoUpdater.setFeedURL({
-  url: `https://ollama.ai/api/update?os=${process.platform}&arch=${process.arch}&version=${app.getVersion()}`,
+  url: `https://ollama.ai/api/update?os=${process.platform}&arch=${process.arch}&version=${app.getVersion()}&id=${aid}`,
 })
 async function heartbeat() {
   analytics.track({
-    anonymousId: id(),
+    anonymousId: aid,
     event: 'heartbeat',
     properties: {
       version: app.getVersion(),


@@ -48,12 +48,18 @@ func CreateHandler(cmd *cobra.Command, args []string) error {
 			spinner.Stop()
 		}
 		currentDigest = resp.Digest
-		bar = progressbar.DefaultBytes(
-			int64(resp.Total),
-			fmt.Sprintf("pulling %s...", resp.Digest[7:19]),
-		)
-		bar.Set(resp.Completed)
+		switch {
+		case strings.Contains(resp.Status, "embeddings"):
+			bar = progressbar.Default(int64(resp.Total), resp.Status)
+			bar.Set(resp.Completed)
+		default:
+			// pulling
+			bar = progressbar.DefaultBytes(
+				int64(resp.Total),
+				resp.Status,
+			)
+			bar.Set(resp.Completed)
+		}
 	} else if resp.Digest == currentDigest && resp.Digest != "" {
 		bar.Set(resp.Completed)
 	} else {
@@ -312,12 +318,16 @@ func generate(cmd *cobra.Command, model, prompt string) error {
 func showLayer(l *server.Layer) {
 	filename, err := server.GetBlobsPath(l.Digest)
-	bts, err := os.ReadFile(filename)
 	if err != nil {
-		fmt.Printf("Couldn't read layer")
+		fmt.Println("Couldn't get layer's path")
 		return
 	}
-	fmt.Printf(string(bts) + "\n")
+	bts, err := os.ReadFile(filename)
+	if err != nil {
+		fmt.Println("Couldn't read layer")
+		return
+	}
+	fmt.Println(string(bts))
 }
@@ -454,7 +464,7 @@ func generateInteractive(cmd *cobra.Command, model string) error {
 	mp := server.ParseModelPath(model)
 	manifest, err := server.GetManifest(mp)
 	if err != nil {
-		fmt.Printf("error: couldn't get a manifestfor this model")
+		fmt.Println("error: couldn't get a manifest for this model")
 		continue
 	}
 	switch args[1] {
@@ -513,15 +523,21 @@ func generateBatch(cmd *cobra.Command, model string) error {
 	return nil
 }
-func RunServer(_ *cobra.Command, _ []string) error {
-	host := os.Getenv("OLLAMA_HOST")
-	if host == "" {
-		host = "127.0.0.1"
-	}
-	port := os.Getenv("OLLAMA_PORT")
-	if port == "" {
-		port = "11434"
-	}
+func RunServer(cmd *cobra.Command, _ []string) error {
+	var host, port = "127.0.0.1", "11434"
+	parts := strings.Split(os.Getenv("OLLAMA_HOST"), ":")
+	if ip := net.ParseIP(parts[0]); ip != nil {
+		host = ip.String()
+	}
+	if len(parts) > 1 {
+		port = parts[1]
+	}
+	// deprecated: include port in OLLAMA_HOST
+	if p := os.Getenv("OLLAMA_PORT"); p != "" {
+		port = p
+	}
 	ln, err := net.Listen("tcp", fmt.Sprintf("%s:%s", host, port))
@@ -529,7 +545,12 @@ func RunServer(_ *cobra.Command, _ []string) error {
 		return err
 	}
-	return server.Serve(ln)
+	var origins []string
+	if o := os.Getenv("OLLAMA_ORIGINS"); o != "" {
+		origins = strings.Split(o, ",")
+	}
+	return server.Serve(ln, origins)
 }
 func startMacApp(client *api.Client) error {
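Read on its own, the new `RunServer` address logic reduces to a small pure function: `OLLAMA_HOST` may hold either an IP or `ip:port`, and the deprecated `OLLAMA_PORT` still wins for the port when set. A standalone sketch of that behavior (not the upstream code verbatim):

```go
// Sketch of the host/port resolution implied by the RunServer diff above.
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
)

func resolveAddr() (host, port string) {
	host, port = "127.0.0.1", "11434"
	parts := strings.Split(os.Getenv("OLLAMA_HOST"), ":")
	if ip := net.ParseIP(parts[0]); ip != nil {
		host = ip.String()
	}
	if len(parts) > 1 {
		port = parts[1]
	}
	// deprecated override: prefer OLLAMA_HOST=ip:port going forward
	if p := os.Getenv("OLLAMA_PORT"); p != "" {
		port = p
	}
	return host, port
}

func main() {
	host, port := resolveAddr()
	fmt.Printf("server would listen on %s:%s\n", host, port)
}
```

For example, `OLLAMA_HOST=0.0.0.0:11435` resolves to `0.0.0.0:11435`, while an unset `OLLAMA_HOST` falls back to `127.0.0.1:11434`.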

docs/README.md (new file, 5 additions)

@@ -0,0 +1,5 @@
# Documentation
- [Modelfile](./modelfile.md)
- [How to develop Ollama](./development.md)
- [API](./api.md)

docs/api.md (new file, 222 additions)

@@ -0,0 +1,222 @@
# API
## Endpoints
- [Generate a completion](#generate-a-completion)
- [Create a model](#create-a-model)
- [List local models](#list-local-models)
- [Copy a model](#copy-a-model)
- [Delete a model](#delete-a-model)
- [Pull a model](#pull-a-model)
## Conventions
### Model names
Model names follow a `model:tag` format. Some examples are `orca:3b-q4_1` and `llama2:70b`. The tag is optional and if not provided will default to `latest`. The tag is used to identify a specific version.
### Durations
All durations are returned in nanoseconds.
## Generate a completion
```
POST /api/generate
```
Generate a response for a given prompt with a provided model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and additional data from the request.
### Parameters
- `model`: (required) the [model name](#model-names)
- `prompt`: the prompt to generate a response for
Advanced parameters:
- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`
- `system`: system prompt to use (overrides what is defined in the `Modelfile`)
- `template`: the full prompt or prompt template (overrides what is defined in the `Modelfile`)
### Request
```
curl -X POST http://localhost:11434/api/generate -d '{
"model": "llama2:7b",
"prompt": "Why is the sky blue?"
}'
```
### Response
A stream of JSON objects:
```json
{
"model": "llama2:7b",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"response": "The",
"done": false
}
```
The final response in the stream also includes additional data about the generation:
- `total_duration`: time spent generating the response
- `load_duration`: time spent loading the model, in nanoseconds
- `sample_count`: number of samples generated
- `sample_duration`: time spent generating samples
- `prompt_eval_count`: number of tokens in the prompt
- `prompt_eval_duration`: time spent evaluating the prompt, in nanoseconds
- `eval_count`: number of tokens in the response
- `eval_duration`: time spent generating the response, in nanoseconds
To calculate how fast the response is generated in tokens per second (token/s), divide `eval_count` by `eval_duration` and multiply by 10^9 (durations are in nanoseconds).
```json
{
"model": "llama2:7b",
"created_at": "2023-08-04T19:22:45.499127Z",
"done": true,
"total_duration": 5589157167,
"load_duration": 3013701500,
"sample_count": 114,
"sample_duration": 81442000,
"prompt_eval_count": 46,
"prompt_eval_duration": 1160282000,
"eval_count": 113,
"eval_duration": 1325948000
}
```
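Using the sample response above: 113 tokens over 1,325,948,000 ns is 113 / 1.325948 ≈ 85 tokens per second.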
## Create a Model
```
POST /api/create
```
Create a model from a [`Modelfile`](./modelfile.md)
### Parameters
- `name`: name of the model to create
- `path`: path to the Modelfile
### Request
```
curl -X POST http://localhost:11434/api/create -d '{
"name": "mario",
"path": "~/Modelfile"
}'
```
### Response
A stream of JSON objects. When finished, `status` is `success`.
```json
{
"status": "parsing modelfile"
}
```
## List Local Models
```
GET /api/tags
```
List models that are available locally.
### Request
```
curl http://localhost:11434/api/tags
```
### Response
```json
{
"models": [
{
"name": "llama2:7b",
"modified_at": "2023-08-02T17:02:23.713454393-07:00",
"size": 3791730596
},
{
"name": "llama2:13b",
"modified_at": "2023-08-08T12:08:38.093596297-07:00",
"size": 7323310500
}
]
}
```
## Copy a Model
```
POST /api/copy
```
Copy a model. Creates a model with another name from an existing model.
### Request
```
curl http://localhost:11434/api/copy -d '{
"source": "llama2:7b",
"destination": "llama2-backup"
}'
```
## Delete a Model
```
DELETE /api/delete
```
Delete a model and its data.
### Parameters
- `name`: name of the model to delete
### Request
```
curl -X DELETE http://localhost:11434/api/delete -d '{
"name": "llama2:13b"
}'
```
## Pull a Model
```
POST /api/pull
```
Download a model from the model registry. Cancelled pulls are resumed from where they left off, and multiple calls will share the same download progress.
### Parameters
- `name`: name of the model to pull
### Request
```
curl -X POST http://localhost:11434/api/pull -d '{
"name": "llama2:7b"
}'
```
### Response
```json
{
"status": "downloading digestname",
"digest": "digestname",
"total": 2142590208
}
```
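Because `/api/generate` streams newline-delimited JSON (the compare above switches the content type to `application/x-ndjson`), a client can scan the response body line by line. A minimal Go consumer sketch, assuming a local server and the response fields documented above:

```go
// Sketch: consume the streaming generate endpoint line by line.
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	body := []byte(`{"model": "llama2", "prompt": "Why is the sky blue?"}`)
	resp, err := http.Post("http://127.0.0.1:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		// Each line is one JSON object; only a subset of fields is decoded here.
		var chunk struct {
			Response string `json:"response"`
			Done     bool   `json:"done"`
		}
		if err := json.Unmarshal(scanner.Bytes(), &chunk); err != nil {
			panic(err)
		}
		fmt.Print(chunk.Response)
		if chunk.Done {
			fmt.Println()
		}
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}
}
```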


@@ -30,19 +30,15 @@ Now you can run `ollama`:
 To release a new version of Ollama you'll need to set some environment variables:
-* `GITHUB_TOKEN`: your GitHub token
-* `APPLE_IDENTITY`: the Apple signing identity (macOS only)
-* `APPLE_ID`: your Apple ID
-* `APPLE_PASSWORD`: your Apple ID app-specific password
-* `APPLE_TEAM_ID`: the Apple team ID for the signing identity
-* `TELEMETRY_WRITE_KEY`: segment write key for telemetry
+- `GITHUB_TOKEN`: your GitHub token
+- `APPLE_IDENTITY`: the Apple signing identity (macOS only)
+- `APPLE_ID`: your Apple ID
+- `APPLE_PASSWORD`: your Apple ID app-specific password
+- `APPLE_TEAM_ID`: the Apple team ID for the signing identity
+- `TELEMETRY_WRITE_KEY`: segment write key for telemetry
 Then run the publish script with the target version:
 ```
 VERSION=0.0.2 ./scripts/publish.sh
 ```

docs/faq.md (new file, 17 additions)

@@ -0,0 +1,17 @@
# FAQ
## How can I expose the Ollama server?
```
OLLAMA_HOST=0.0.0.0:11435 ollama serve
```
By default, Ollama allows cross origin requests from `127.0.0.1` and `0.0.0.0`. To support more origins, you can use the `OLLAMA_ORIGINS` environment variable:
```
OLLAMA_ORIGINS=http://192.168.1.1:*,https://example.com ollama serve
```
## Where are models stored?
Raw model data is stored under `~/.ollama/models`.


@@ -163,4 +163,4 @@ LICENSE """
 ## Notes
 - the **modelfile is not case sensitive**. In the examples, we use uppercase for instructions to make it easier to distinguish it from arguments.
-- Instructions can be in any order. In the examples, we start with FROM instruction to keep it easily readable.
\ No newline at end of file
+- Instructions can be in any order. In the examples, we start with FROM instruction to keep it easily readable.


@@ -3,5 +3,5 @@
 FROM nous-hermes
 SYSTEM """
-You are a content marketer who needs to come up with a short but succinct tweet. Make sure to include the appropriate hashtags and links. Sometimes when appropriate, describe a meme that can be includes as well. All answers should be in the form of a tweet which has a max size of 280 characters. Every instruction will be the topic to create a tweet about.
+You are a content marketer who needs to come up with a short but succinct tweet. Make sure to include the appropriate hashtags and links. Sometimes when appropriate, describe a meme that can be included as well. All answers should be in the form of a tweet which has a max size of 280 characters. Every instruction will be the topic to create a tweet about.
 """

go.mod (1 addition)

@@ -42,6 +42,7 @@ require (
 	golang.org/x/sys v0.10.0 // indirect
 	golang.org/x/term v0.10.0
 	golang.org/x/text v0.10.0 // indirect
+	gonum.org/v1/gonum v0.13.0
 	google.golang.org/protobuf v1.30.0 // indirect
 	gopkg.in/yaml.v3 v3.0.1 // indirect
 )

go.sum (2 additions)

@@ -139,6 +139,8 @@ golang.org/x/text v0.10.0 h1:UpjohKhiEgNc0CSauXmwYftY1+LlaC75SJwh0SgCX58=
 golang.org/x/text v0.10.0/go.mod h1:TvPlkZtksWOMsz7fbANvkp4WM8x/WCo/om8BMLbz+aE=
 golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
 golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
+gonum.org/v1/gonum v0.13.0 h1:a0T3bh+7fhRyqeNbiC3qVHYmkiQgit3wnNan/2c0HMM=
+gonum.org/v1/gonum v0.13.0/go.mod h1:/WPYRckkfWrhWefxyYTfrTtQR0KH4iyHNuzxqXAKyAU=
 google.golang.org/protobuf v1.26.0-rc.1/go.mod h1:jlhhOSvTdKEhbULTjvd4ARK9grFBp09yW+WbY/TyQbw=
 google.golang.org/protobuf v1.28.0/go.mod h1:HV8QOd/L58Z+nl8r43ehVNZIU/HEI6OcFqwMG9pJV4I=
 google.golang.org/protobuf v1.30.0 h1:kPPoIgf3TsEvrm0PFe15JQ+570QVxYzEvvHqChK+cng=

library/.gitignore (vendored, 1 deletion)

@@ -1 +0,0 @@
-models


@@ -1,7 +0,0 @@
https://huggingface.co/TheBloke/orca_mini_3B-GGML/resolve/main/orca-mini-3b.ggmlv3.q4_0.bin e84705205f71dd55be7b24a778f248f0eda9999a125d313358c087e092d83148
https://huggingface.co/TheBloke/Nous-Hermes-13B-GGML/resolve/main/nous-hermes-13b.ggmlv3.q4_0.bin d1735b93e1dc503f1045ccd6c8bd73277b18ba892befd1dc29e9b9a7822ed998
https://huggingface.co/TheBloke/vicuna-7B-v1.3-GGML/resolve/main/vicuna-7b-v1.3.ggmlv3.q4_0.bin 23ce5ed290b56a19305178b9ada2c3d96036bd69a6c18304b6158eb6672d6c0f
https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0.bin 1f08b147a5bce41cfcbb3fd5d51ba765dea1786e15b5655ab69ba3a337a893b7
https://huggingface.co/TheBloke/Llama-2-7B-GGML/resolve/main/llama-2-7b.ggmlv3.q4_0.bin bfa26d855e44629c4cf919985e90bd7fa03b77eea1676791519e39a4d45fd4d5
https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q4_0.bin 8daa9615cce30c259a9555b1cc250d461d1bc69980a274b44d7eda0be78076d8
https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin f79142715bc9539a2edbb4b253548db8b34fac22736593eeaa28555874476e30


@@ -1,147 +0,0 @@
FROM ../models/llama-2-7b-chat.ggmlv3.q4_0.bin
TEMPLATE """
{{- if .First }}
<<SYS>>
{{ .System }}
<</SYS>>
{{- end }}
[INST] {{ .Prompt }} [/INST]
"""
SYSTEM """
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
"""
LICENSE """
Llama 2 Community License Agreement
Llama 2 Version Release Date: July 18, 2023
“Agreement” means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein.
“Documentation” means the specifications, manuals and documentation accompanying Llama 2 distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/.
“Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entitys behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.
“Llama 2” means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/.
“Llama Materials” means, collectively, Metas proprietary Llama 2 and Documentation (and any portion thereof) made available under this Agreement.
“Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland).
By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement.
1. License Rights and Redistribution.
a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Metas intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials.
b. Redistribution and Use.
i. If you distribute or make the Llama Materials, or any derivative works thereof, available to a third party, you shall provide a copy of this Agreement to such third party.
ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you.
iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 2 is licensed under the LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.”
iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at https://ai.meta.com/llama/use-policy), which is hereby incorporated by reference into this Agreement.
v. You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Llama 2 or derivative works thereof).
2. Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensees affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS.
4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
5. Intellectual Property.
a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials.
b. Subject to Metas ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications.
c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 2 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials.
6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement.
7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement.
"""
LICENSE """
Llama 2 Acceptable Use Policy
Meta is committed to promoting safe and fair use of its tools and features, including Llama 2. If you access or use Llama 2, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at ai.meta.com/llama/use-policy.
Prohibited Uses
We want everyone to use Llama 2 safely and responsibly. You agree you will not use, or allow others to use, Llama 2 to:
1. Violate the law or others rights, including to:
a. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as:
i. Violence or terrorism
ii. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material
b. Human trafficking, exploitation, and sexual violence
iii. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials.
iv. Sexual solicitation
vi. Any other criminal activity
c. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals
d. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services
e. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices
f. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws
g. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama 2 Materials
h. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system
2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 2 related to the following:
a. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State
b. Guns and illegal weapons (including weapon development)
c. Illegal drugs and regulated/controlled substances
d. Operation of critical infrastructure, transportation technologies, or heavy machinery
e. Self-harm or harm to others, including suicide, cutting, and eating disorders
f. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual
3. Intentionally deceive or mislead others, including use of Llama 2 related to the following:
a. Generating, promoting, or furthering fraud or the creation or promotion of disinformation
b. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content
c. Generating, promoting, or further distributing spam
d. Impersonating another individual without consent, authorization, or legal right
e. Representing that the use of Llama 2 or outputs are human-generated
f. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement
4. Fail to appropriately disclose to end users any known dangers of your AI system
Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means:
Reporting issues with the model: github.com/facebookresearch/llama
Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback
Reporting bugs and security concerns: facebook.com/whitehat/info
Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama: LlamaUseReport@meta.com
"""


@@ -1,147 +0,0 @@
FROM ../models/llama-2-13b-chat.ggmlv3.q4_0.bin
TEMPLATE """
{{- if .First }}
<<SYS>>
{{ .System }}
<</SYS>>
{{- end }}
[INST] {{ .Prompt }} [/INST]
"""
SYSTEM """
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
"""
LICENSE """
Llama 2 Community License Agreement
Llama 2 Version Release Date: July 18, 2023
“Agreement” means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein.
“Documentation” means the specifications, manuals and documentation accompanying Llama 2 distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/.
“Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entitys behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.
“Llama 2” means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/.
“Llama Materials” means, collectively, Metas proprietary Llama 2 and Documentation (and any portion thereof) made available under this Agreement.
“Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland).
By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement.
1. License Rights and Redistribution.
a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Metas intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials.
b. Redistribution and Use.
i. If you distribute or make the Llama Materials, or any derivative works thereof, available to a third party, you shall provide a copy of this Agreement to such third party.
ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you.
iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 2 is licensed under the LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.”
iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at https://ai.meta.com/llama/use-policy), which is hereby incorporated by reference into this Agreement.
v. You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Llama 2 or derivative works thereof).
2. Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensees affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS.
4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
5. Intellectual Property.
a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials.
b. Subject to Metas ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications.
c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 2 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials.
6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement.
7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement.
"""
LICENSE """
Llama 2 Acceptable Use Policy
Meta is committed to promoting safe and fair use of its tools and features, including Llama 2. If you access or use Llama 2, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at ai.meta.com/llama/use-policy.
Prohibited Uses
We want everyone to use Llama 2 safely and responsibly. You agree you will not use, or allow others to use, Llama 2 to:
1. Violate the law or others rights, including to:
a. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as:
i. Violence or terrorism
ii. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material
b. Human trafficking, exploitation, and sexual violence
iii. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials.
iv. Sexual solicitation
vi. Any other criminal activity
c. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals
d. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services
e. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices
f. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws
g. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama 2 Materials
h. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system
2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 2 related to the following:
a. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State
b. Guns and illegal weapons (including weapon development)
c. Illegal drugs and regulated/controlled substances
d. Operation of critical infrastructure, transportation technologies, or heavy machinery
e. Self-harm or harm to others, including suicide, cutting, and eating disorders
f. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual
3. Intentionally deceive or mislead others, including use of Llama 2 related to the following:
a. Generating, promoting, or furthering fraud or the creation or promotion of disinformation
b. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content
c. Generating, promoting, or further distributing spam
d. Impersonating another individual without consent, authorization, or legal right
e. Representing that the use of Llama 2 or outputs are human-generated
f. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement
4. Fail to appropriately disclose to end users any known dangers of your AI system
Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means:
Reporting issues with the model: github.com/facebookresearch/llama
Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback
Reporting bugs and security concerns: facebook.com/whitehat/info
Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama: LlamaUseReport@meta.com
"""


@@ -1,147 +0,0 @@
FROM ../models/llama-2-7b-chat.ggmlv3.q4_0.bin
TEMPLATE """
{{- if .First }}
<<SYS>>
{{ .System }}
<</SYS>>
{{- end }}
[INST] {{ .Prompt }} [/INST]
"""
SYSTEM """
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
"""
LICENSE """
Llama 2 Community License Agreement
Llama 2 Version Release Date: July 18, 2023
“Agreement” means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein.
“Documentation” means the specifications, manuals and documentation accompanying Llama 2 distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/.
“Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entitys behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.
“Llama 2” means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/.
“Llama Materials” means, collectively, Metas proprietary Llama 2 and Documentation (and any portion thereof) made available under this Agreement.
“Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland).
By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement.
1. License Rights and Redistribution.
a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Metas intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials.
b. Redistribution and Use.
i. If you distribute or make the Llama Materials, or any derivative works thereof, available to a third party, you shall provide a copy of this Agreement to such third party.
ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you.
iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 2 is licensed under the LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.”
iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at https://ai.meta.com/llama/use-policy), which is hereby incorporated by reference into this Agreement.
v. You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Llama 2 or derivative works thereof).
2. Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensees affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS.
4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
5. Intellectual Property.
a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials.
b. Subject to Metas ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications.
c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 2 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials.
6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement.
7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement.
"""
LICENSE """
Llama 2 Acceptable Use Policy
Meta is committed to promoting safe and fair use of its tools and features, including Llama 2. If you access or use Llama 2, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at ai.meta.com/llama/use-policy.
Prohibited Uses
We want everyone to use Llama 2 safely and responsibly. You agree you will not use, or allow others to use, Llama 2 to:
1. Violate the law or others' rights, including to:
a. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as:
i. Violence or terrorism
ii. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material
b. Human trafficking, exploitation, and sexual violence
iii. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials.
iv. Sexual solicitation
vi. Any other criminal activity
c. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals
d. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services
e. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices
f. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws
g. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama 2 Materials
h. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system
2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 2 related to the following:
a. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State
b. Guns and illegal weapons (including weapon development)
c. Illegal drugs and regulated/controlled substances
d. Operation of critical infrastructure, transportation technologies, or heavy machinery
e. Self-harm or harm to others, including suicide, cutting, and eating disorders
f. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual
3. Intentionally deceive or mislead others, including use of Llama 2 related to the following:
a. Generating, promoting, or furthering fraud or the creation or promotion of disinformation
b. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content
c. Generating, promoting, or further distributing spam
d. Impersonating another individual without consent, authorization, or legal right
e. Representing that the use of Llama 2 or outputs are human-generated
f. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement
4. Fail to appropriately disclose to end users any known dangers of your AI system
Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means:
Reporting issues with the model: github.com/facebookresearch/llama
Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback
Reporting bugs and security concerns: facebook.com/whitehat/info
Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama: LlamaUseReport@meta.com
"""

View File

@@ -1,7 +0,0 @@
FROM ../models/nous-hermes-13b.ggmlv3.q4_0.bin
TEMPLATE """
### Instruction:
{{ .Prompt }}
### Response:
"""

View File

@@ -1,14 +0,0 @@
FROM ../models/orca-mini-3b.ggmlv3.q4_0.bin
TEMPLATE """
{{- if .First }}
### System:
{{ .System }}
{{- end }}
### User:
{{ .Prompt }}
### Response:
"""
SYSTEM """You are an AI assistant that follows instruction extremely well. Help as much as you can."""

View File

@@ -1,11 +0,0 @@
FROM ../models/vicuna-7b-v1.3.ggmlv3.q4_0.bin
TEMPLATE """
{{ if .First }}
{{ .System }}
{{- end }}
USER: {{ .Prompt }}
ASSISTANT:
"""
SYSTEM """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions."""

View File

@@ -1,5 +0,0 @@
FROM ../models/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0.bin
TEMPLATE """
USER: {{ .Prompt }}
ASSISTANT:
"""

View File

@@ -1,52 +0,0 @@
#!/bin/bash
mkdir -p models
# download binaries
function process_line {
local url=$1
local checksum=$2
# Get the filename from the URL
local filename=models/$(basename $url)
echo "verifying $filename..."
# If the file exists, compute its checksum
if [ -f $filename ]; then
local existing_checksum=$(shasum -a 256 $filename | cut -d ' ' -f1)
fi
# If the file does not exist, or its checksum does not match, download it
if [ ! -f "$filename" ] || [ "$existing_checksum" != "$checksum" ]; then
echo "downloading $filename..."
# Download the file
curl -L $url -o $filename
# Compute the SHA256 hash of the downloaded file
local computed_checksum=$(shasum -a 256 $filename | cut -d ' ' -f1)
# Verify the checksum
if [ $computed_checksum != $checksum ]; then
echo "Checksum verification failed for $filename"
exit 1
fi
fi
}
while IFS=' ' read -r url checksum
do
process_line $url $checksum
done < "downloads"
# create and publish the models
for file in modelfiles/*; do
if [ -f "$file" ]; then
filename=$(basename "$file")
echo $filename
ollama create "library/${filename}" -f "$file"
ollama push "${filename}"
fi
done
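
The verification loop reads a manifest named `downloads` in which each line pairs a URL with the expected SHA-256 digest of the file, separated by a single space. The entry below shows the format (the URL host is hypothetical):

```
https://example.com/models/orca-mini-3b.ggmlv3.q4_0.bin <64-character sha256 hex digest>
```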

View File

@@ -85,6 +85,7 @@ llama_token llama_sample(
}
*/
import "C"
import (
"bytes"
"embed"
@@ -142,6 +143,8 @@ func New(model string, opts api.Options) (*LLM, error) {
params.use_mmap = C.bool(llm.UseMMap)
params.use_mlock = C.bool(llm.UseMLock)
params.embedding = C.bool(llm.EmbeddingOnly)
params.rope_freq_base = C.float(llm.RopeFrequencyBase)
params.rope_freq_scale = C.float(llm.RopeFrequencyScale)
llm.params = &params
cModel := C.CString(model)
@@ -187,10 +190,6 @@ func (llm *LLM) Predict(ctx []int, prompt string, fn func(api.GenerateResponse))
tokens[i] = C.llama_token(ctx[i])
}
if len(tokens) == 0 {
tokens = llm.tokenize(" ")
}
llm.marshalPrompt(tokens, prompt)
C.llama_set_rng_seed(llm.ctx, C.uint(llm.Seed))
@@ -206,7 +205,7 @@ func (llm *LLM) Predict(ctx []int, prompt string, fn func(api.GenerateResponse))
return err
}
b.WriteString(llm.detokenize(token))
b.WriteString(llm.Decode(token))
if err := llm.checkStopConditions(b); err != nil {
if errors.Is(err, io.EOF) {
@@ -224,17 +223,15 @@ func (llm *LLM) Predict(ctx []int, prompt string, fn func(api.GenerateResponse))
}
}
last := make([]int, 0, len(llm.last))
for _, i := range llm.last {
if i != 0 {
last = append(last, int(i))
}
embd := make([]int, len(llm.embd))
for i := range llm.embd {
embd[i] = int(llm.embd[i])
}
timings := C.llama_get_timings(llm.ctx)
fn(api.GenerateResponse{
Done: true,
Context: last,
Context: embd,
SampleCount: int(timings.n_sample),
SampleDuration: parseDurationMs(float64(timings.t_sample_ms)),
PromptEvalCount: int(timings.n_p_eval),
@@ -248,9 +245,9 @@ func (llm *LLM) Predict(ctx []int, prompt string, fn func(api.GenerateResponse))
func (llm *LLM) checkStopConditions(b bytes.Buffer) error {
for _, stopCondition := range llm.Stop {
if stopCondition == b.String() {
if stopCondition == strings.TrimSpace(b.String()) {
return io.EOF
} else if strings.HasPrefix(stopCondition, b.String()) {
} else if strings.HasPrefix(stopCondition, strings.TrimSpace(b.String())) {
return errNeedMoreData
}
}
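// With the trimmed comparison above, a stop sequence such as "### User:" is
// handled in two stages: a buffer equal to the stop string ends generation
// (io.EOF), while a buffer like "###" that is still a prefix of it returns
// errNeedMoreData so more tokens are read before anything is emitted.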
@@ -259,7 +256,7 @@ func (llm *LLM) checkStopConditions(b bytes.Buffer) error {
}
func (llm *LLM) marshalPrompt(ctx []C.llama_token, prompt string) []C.llama_token {
tokens := append(ctx, llm.tokenize(prompt)...)
tokens := append(ctx, llm.Encode(prompt)...)
if llm.NumKeep < 0 {
llm.NumKeep = len(tokens)
}
@@ -301,7 +298,7 @@ func (llm *LLM) marshalPrompt(ctx []C.llama_token, prompt string) []C.llama_toke
return tokens
}
func (llm *LLM) tokenize(prompt string) []C.llama_token {
func (llm *LLM) Encode(prompt string) []C.llama_token {
cPrompt := C.CString(prompt)
defer C.free(unsafe.Pointer(cPrompt))
@@ -313,7 +310,7 @@ func (llm *LLM) tokenize(prompt string) []C.llama_token {
return nil
}
func (llm *LLM) detokenize(tokens ...C.llama_token) string {
func (llm *LLM) Decode(tokens ...C.llama_token) string {
var sb strings.Builder
for _, token := range tokens {
sb.WriteString(C.GoString(C.llama_token_to_str(llm.ctx, token)))
@@ -412,3 +409,31 @@ func (llm *LLM) next() (C.llama_token, error) {
return token, nil
}
func (llm *LLM) Embedding(input string) ([]float64, error) {
if !llm.EmbeddingOnly {
return nil, errors.New("llama: embedding not enabled")
}
tokens := llm.Encode(input)
if tokens == nil {
return nil, errors.New("llama: tokenize embedding")
}
retval := C.llama_eval(llm.ctx, unsafe.SliceData(tokens), C.int(len(tokens)), 0, C.int(llm.NumThread))
if retval != 0 {
return nil, errors.New("llama: eval")
}
n := C.llama_n_embd(llm.ctx)
if n <= 0 {
return nil, errors.New("llama: no embeddings generated")
}
cEmbeddings := unsafe.Slice(C.llama_get_embeddings(llm.ctx), n)
embeddings := make([]float64, len(cEmbeddings))
for i, v := range cEmbeddings {
embeddings[i] = float64(v)
}
return embeddings, nil
}
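
A minimal sketch of the new Embedding entry point in use, following the same sequence the server uses when indexing embed files (the model path and prompt are illustrative):

```go
package main

import (
	"fmt"
	"log"

	"github.com/jmorganca/ollama/api"
	"github.com/jmorganca/ollama/llama"
)

func main() {
	opts := api.DefaultOptions()
	opts.EmbeddingOnly = true // Embedding returns an error unless this is set

	llm, err := llama.New("/path/to/model.ggmlv3.q4_0.bin", opts) // hypothetical path
	if err != nil {
		log.Fatal(err)
	}
	defer llm.Close()

	vec, err := llm.Embedding("Here is an article about llamas...")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(len(vec)) // dimensionality comes from llama_n_embd
}
```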

View File

@@ -40,7 +40,7 @@ func Parse(reader io.Reader) ([]Command, error) {
command.Args = string(fields[1])
// copy command for validation
modelCommand = command
case "LICENSE", "TEMPLATE", "SYSTEM", "PROMPT":
case "LICENSE", "TEMPLATE", "SYSTEM", "PROMPT", "EMBED":
command.Name = string(bytes.ToLower(fields[0]))
command.Args = string(fields[1])
case "PARAMETER":

View File

@@ -8,6 +8,7 @@ CGO_ENABLED=1 GOARCH=amd64 go build -o dist/ollama-darwin-amd64
lipo -create -output dist/ollama dist/ollama-darwin-arm64 dist/ollama-darwin-amd64
rm dist/ollama-darwin-amd64 dist/ollama-darwin-arm64
codesign --deep --force --options=runtime --sign "$APPLE_IDENTITY" --timestamp dist/ollama
chmod +x dist/ollama
# build and sign the mac app
npm install --prefix app

server/auth.go (new file, 164 lines)
View File

@@ -0,0 +1,164 @@
package server
import (
"bytes"
"crypto/rand"
"crypto/sha256"
"encoding/base64"
"encoding/hex"
"encoding/json"
"fmt"
"io"
"io/ioutil"
"log"
"net/http"
"os"
"path"
"strings"
"time"
"golang.org/x/crypto/ssh"
"github.com/jmorganca/ollama/api"
)
type AuthRedirect struct {
Realm string
Service string
Scope string
}
type SignatureData struct {
Method string
Path string
Data []byte
}
func generateNonce(length int) (string, error) {
nonce := make([]byte, length)
_, err := rand.Read(nonce)
if err != nil {
return "", err
}
return base64.RawURLEncoding.EncodeToString(nonce), nil
}
func (r AuthRedirect) URL() (string, error) {
nonce, err := generateNonce(16)
if err != nil {
return "", err
}
return fmt.Sprintf("%s?service=%s&scope=%s&ts=%d&nonce=%s", r.Realm, r.Service, r.Scope, time.Now().Unix(), nonce), nil
}
func getAuthToken(redirData AuthRedirect, regOpts *RegistryOptions) (string, error) {
url, err := redirData.URL()
if err != nil {
return "", err
}
home, err := os.UserHomeDir()
if err != nil {
return "", err
}
keyPath := path.Join(home, ".ollama/id_ed25519")
rawKey, err := ioutil.ReadFile(keyPath)
if err != nil {
log.Printf("Failed to load private key: %v", err)
return "", err
}
s := SignatureData{
Method: "GET",
Path: url,
Data: nil,
}
if !strings.HasPrefix(s.Path, "http") {
if regOpts.Insecure {
s.Path = "http://" + url
} else {
s.Path = "https://" + url
}
}
sig, err := s.Sign(rawKey)
if err != nil {
return "", err
}
headers := map[string]string{
"Authorization": sig,
}
resp, err := makeRequest("GET", url, headers, nil, regOpts)
if err != nil {
log.Printf("couldn't get token: %q", err)
return "", err
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
body, _ := io.ReadAll(resp.Body)
return "", fmt.Errorf("on pull registry responded with code %d: %s", resp.StatusCode, body)
}
respBody, err := io.ReadAll(resp.Body)
if err != nil {
return "", err
}
var tok api.TokenResponse
if err := json.Unmarshal(respBody, &tok); err != nil {
return "", err
}
return tok.Token, nil
}
// Bytes returns a byte slice of the data to sign for the request
func (s SignatureData) Bytes() []byte {
// We first derive the content hash of the request body using:
// base64(hex(sha256(request body)))
hash := sha256.Sum256(s.Data)
hashHex := make([]byte, hex.EncodedLen(len(hash)))
hex.Encode(hashHex, hash[:])
contentHash := base64.StdEncoding.EncodeToString(hashHex)
// We then put the entire request together in a serialize string using:
// "<method>,<uri>,<content hash>"
// e.g. "GET,http://localhost,OTdkZjM1O..."
return []byte(strings.Join([]string{s.Method, s.Path, contentHash}, ","))
}
// SignData takes a SignatureData object and signs it with a raw private key
func (s SignatureData) Sign(rawKey []byte) (string, error) {
privateKey, err := ssh.ParseRawPrivateKey(rawKey)
if err != nil {
return "", err
}
signer, err := ssh.NewSignerFromKey(privateKey)
if err != nil {
return "", err
}
// get the pubkey, but remove the type
pubKey := ssh.MarshalAuthorizedKey(signer.PublicKey())
parts := bytes.Split(pubKey, []byte(" "))
if len(parts) < 2 {
return "", fmt.Errorf("malformed public key")
}
signedData, err := signer.Sign(nil, s.Bytes())
if err != nil {
return "", err
}
// signature is <pubkey>:<signature>
sig := fmt.Sprintf("%s:%s", bytes.TrimSpace(parts[1]), base64.StdEncoding.EncodeToString(signedData.Blob))
return sig, nil
}
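
Sign produces an Authorization value of the form `<base64 public key>:<base64 signature>` over the serialized string from Bytes. A sketch of the verifying side, which is not part of this change; it assumes an ed25519 key, matching the `~/.ollama/id_ed25519` path used above (imports match those of this file):

```go
// Verify checks an Authorization value produced by Sign against the payload
// from SignatureData.Bytes. Sketch only; the real registry-side check is not
// in this repository.
func Verify(authHeader string, payload []byte) error {
	parts := strings.SplitN(authHeader, ":", 2)
	if len(parts) != 2 {
		return fmt.Errorf("malformed signature header")
	}
	// Sign strips the key type, so it must be re-added before parsing
	pub, _, _, _, err := ssh.ParseAuthorizedKey([]byte("ssh-ed25519 " + parts[0]))
	if err != nil {
		return err
	}
	blob, err := base64.StdEncoding.DecodeString(parts[1])
	if err != nil {
		return err
	}
	return pub.Verify(payload, &ssh.Signature{Format: pub.Type(), Blob: blob})
}
```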

server/download.go (new file, 215 lines)
View File

@@ -0,0 +1,215 @@
package server
import (
"context"
"errors"
"fmt"
"io"
"log"
"net/http"
"os"
"path"
"strconv"
"sync"
"time"
"github.com/jmorganca/ollama/api"
)
type FileDownload struct {
Digest string
FilePath string
Total int64
Completed int64
}
var inProgress sync.Map // map of digests currently being downloaded to their current download progress
// downloadBlob downloads a blob from the registry and stores it in the blobs directory
func downloadBlob(ctx context.Context, mp ModelPath, digest string, regOpts *RegistryOptions, fn func(api.ProgressResponse)) error {
fp, err := GetBlobsPath(digest)
if err != nil {
return err
}
if fi, _ := os.Stat(fp); fi != nil {
// we already have the file, so return
fn(api.ProgressResponse{
Digest: digest,
Total: int(fi.Size()),
Completed: int(fi.Size()),
})
return nil
}
fileDownload := &FileDownload{
Digest: digest,
FilePath: fp,
Total: 1, // dummy value to indicate that we don't know the total size yet
Completed: 0,
}
_, downloading := inProgress.LoadOrStore(digest, fileDownload)
if downloading {
// this is another client requesting the server to download the same blob concurrently
return monitorDownload(ctx, mp, regOpts, fileDownload, fn)
}
return doDownload(ctx, mp, regOpts, fileDownload, fn)
}
var downloadMu sync.Mutex // mutex to check to resume a download while monitoring
// monitorDownload monitors the download progress of a blob and resumes it if it is interrupted
func monitorDownload(ctx context.Context, mp ModelPath, regOpts *RegistryOptions, f *FileDownload, fn func(api.ProgressResponse)) error {
tick := time.NewTicker(time.Second)
for range tick.C {
done, resume, err := func() (bool, bool, error) {
downloadMu.Lock()
defer downloadMu.Unlock()
val, downloading := inProgress.Load(f.Digest)
if !downloading {
// check once again if the download is complete
if fi, _ := os.Stat(f.FilePath); fi != nil {
// successful download while monitoring
fn(api.ProgressResponse{
Digest: f.Digest,
Total: int(fi.Size()),
Completed: int(fi.Size()),
})
return true, false, nil
}
// resume the download
inProgress.Store(f.Digest, f) // store the file download again to claim the resume
return false, true, nil
}
f, ok := val.(*FileDownload)
if !ok {
return false, false, fmt.Errorf("invalid type for in progress download: %T", val)
}
fn(api.ProgressResponse{
Status: fmt.Sprintf("downloading %s", f.Digest),
Digest: f.Digest,
Total: int(f.Total),
Completed: int(f.Completed),
})
return false, false, nil
}()
if err != nil {
return err
}
if done {
// done downloading
return nil
}
if resume {
return doDownload(ctx, mp, regOpts, f, fn)
}
}
return nil
}
var chunkSize = 1024 * 1024 // 1 MiB in bytes
// doDownload downloads a blob from the registry and stores it in the blobs directory
func doDownload(ctx context.Context, mp ModelPath, regOpts *RegistryOptions, f *FileDownload, fn func(api.ProgressResponse)) error {
var size int64
fi, err := os.Stat(f.FilePath + "-partial")
switch {
case errors.Is(err, os.ErrNotExist):
// noop, file doesn't exist so create it
case err != nil:
return fmt.Errorf("stat: %w", err)
default:
size = fi.Size()
// Ensure the size is divisible by the chunk size by removing excess bytes
size -= size % int64(chunkSize)
err := os.Truncate(f.FilePath+"-partial", size)
if err != nil {
return fmt.Errorf("truncate: %w", err)
}
}
url := fmt.Sprintf("%s/v2/%s/blobs/%s", mp.Registry, mp.GetNamespaceRepository(), f.Digest)
headers := map[string]string{
"Range": fmt.Sprintf("bytes=%d-", size),
}
resp, err := makeRequest("GET", url, headers, nil, regOpts)
if err != nil {
log.Printf("couldn't download blob: %v", err)
return err
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusPartialContent {
body, _ := io.ReadAll(resp.Body)
return fmt.Errorf("on download registry responded with code %d: %v", resp.StatusCode, string(body))
}
err = os.MkdirAll(path.Dir(f.FilePath), 0o700)
if err != nil {
return fmt.Errorf("make blobs directory: %w", err)
}
remaining, _ := strconv.ParseInt(resp.Header.Get("Content-Length"), 10, 64)
f.Completed = size
f.Total = remaining + f.Completed
inProgress.Store(f.Digest, f)
out, err := os.OpenFile(f.FilePath+"-partial", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o644)
if err != nil {
return fmt.Errorf("open file: %w", err)
}
defer out.Close()
outerLoop:
for {
select {
case <-ctx.Done():
// handle client request cancellation
inProgress.Delete(f.Digest)
return nil
default:
fn(api.ProgressResponse{
Status: fmt.Sprintf("downloading %s", f.Digest),
Digest: f.Digest,
Total: int(f.Total),
Completed: int(f.Completed),
})
if f.Completed >= f.Total {
if err := out.Close(); err != nil {
return err
}
if err := os.Rename(f.FilePath+"-partial", f.FilePath); err != nil {
fn(api.ProgressResponse{
Status: fmt.Sprintf("error renaming file: %v", err),
Digest: f.Digest,
Total: int(f.Total),
Completed: int(f.Completed),
})
return err
}
break outerLoop
}
}
n, err := io.CopyN(out, resp.Body, int64(chunkSize))
if err != nil && !errors.Is(err, io.EOF) {
return err
}
f.Completed += n
inProgress.Store(f.Digest, f)
}
inProgress.Delete(f.Digest)
log.Printf("success getting %s\n", f.Digest)
return nil
}
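
The coordination above reduces to a claim-or-monitor pattern on sync.Map: the first caller to store a digest performs the download, and every later caller only observes progress, resuming if the claim disappears. Distilled to its core:

```go
var inFlight sync.Map

// claim reports whether the caller is the first to request key and should do
// the work itself; later callers get false and poll the stored progress value.
func claim(key string, progress *FileDownload) bool {
	_, loaded := inFlight.LoadOrStore(key, progress)
	return !loaded
}
```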

View File

@@ -1,43 +1,53 @@
package server
import (
"bufio"
"bytes"
"context"
"crypto/sha256"
"encoding/json"
"errors"
"fmt"
"html/template"
"io"
"log"
"net/http"
"os"
"path"
"path/filepath"
"reflect"
"strconv"
"strings"
"text/template"
"github.com/jmorganca/ollama/api"
"github.com/jmorganca/ollama/llama"
"github.com/jmorganca/ollama/parser"
"github.com/jmorganca/ollama/vector"
)
type RegistryOptions struct {
Insecure bool
Username string
Password string
Token string
}
type Model struct {
Name string `json:"name"`
ModelPath string
Template string
System string
Digest string
Options map[string]interface{}
Name string `json:"name"`
ModelPath string
Template string
System string
Digest string
Options map[string]interface{}
Embeddings []vector.Embedding
}
func (m *Model) Prompt(request api.GenerateRequest) (string, error) {
tmpl, err := template.New("").Parse(m.Template)
func (m *Model) Prompt(request api.GenerateRequest, embedding string) (string, error) {
t := m.Template
if request.Template != "" {
t = request.Template
}
tmpl, err := template.New("").Parse(t)
if err != nil {
return "", err
}
@@ -46,6 +56,7 @@ func (m *Model) Prompt(request api.GenerateRequest) (string, error) {
First bool
System string
Prompt string
Embed string
// deprecated: versions <= 0.0.7 used this to omit the system prompt
Context []int
@@ -55,6 +66,11 @@ func (m *Model) Prompt(request api.GenerateRequest) (string, error) {
vars.System = m.System
vars.Prompt = request.Prompt
vars.Context = request.Context
vars.Embed = embedding
if request.System != "" {
vars.System = request.System
}
var sb strings.Builder
if err := tmpl.Execute(&sb, vars); err != nil {
@@ -148,6 +164,16 @@ func GetModel(name string) (*Model, error) {
switch layer.MediaType {
case "application/vnd.ollama.image.model":
model.ModelPath = filename
case "application/vnd.ollama.image.embed":
file, err := os.Open(filename)
if err != nil {
return nil, fmt.Errorf("failed to open file: %s", filename)
}
defer file.Close()
if err = json.NewDecoder(file).Decode(&model.Embeddings); err != nil {
return nil, err
}
case "application/vnd.ollama.image.template":
bts, err := os.ReadFile(filename)
if err != nil {
@@ -186,7 +212,27 @@ func GetModel(name string) (*Model, error) {
return model, nil
}
func CreateModel(name string, path string, fn func(resp api.ProgressResponse)) error {
func filenameWithPath(path, f string) (string, error) {
// if filePath starts with ~/, replace it with the user's home directory.
if strings.HasPrefix(f, "~/") {
parts := strings.Split(f, "/")
home, err := os.UserHomeDir()
if err != nil {
return "", fmt.Errorf("failed to open file: %v", err)
}
f = filepath.Join(home, filepath.Join(parts[1:]...))
}
// if filePath is not an absolute path, make it relative to the modelfile path
if !filepath.IsAbs(f) {
f = filepath.Join(filepath.Dir(path), f)
}
return f, nil
}
func CreateModel(ctx context.Context, name string, path string, fn func(resp api.ProgressResponse)) error {
mf, err := os.Open(path)
if err != nil {
fn(api.ProgressResponse{Status: fmt.Sprintf("couldn't open modelfile '%s'", path)})
@@ -202,52 +248,37 @@ func CreateModel(name string, path string, fn func(resp api.ProgressResponse)) e
var layers []*LayerReader
params := make(map[string][]string)
embed := EmbeddingParams{fn: fn, opts: api.DefaultOptions()}
for _, c := range commands {
log.Printf("[%s] - %s\n", c.Name, c.Args)
switch c.Name {
case "model":
fn(api.ProgressResponse{Status: "looking for model"})
embed.model = c.Args
mf, err := GetManifest(ParseModelPath(c.Args))
if err != nil {
fp := c.Args
// If filePath starts with ~/, replace it with the user's home directory.
if strings.HasPrefix(fp, "~/") {
parts := strings.Split(fp, "/")
home, err := os.UserHomeDir()
if err != nil {
return fmt.Errorf("failed to open file: %v", err)
}
fp = filepath.Join(home, filepath.Join(parts[1:]...))
modelFile, err := filenameWithPath(path, c.Args)
if err != nil {
return err
}
// If filePath is not an absolute path, make it relative to the modelfile path
if !filepath.IsAbs(fp) {
fp = filepath.Join(filepath.Dir(path), fp)
}
if _, err := os.Stat(fp); err != nil {
if _, err := os.Stat(modelFile); err != nil {
// the model file does not exist, try pulling it
if errors.Is(err, os.ErrNotExist) {
fn(api.ProgressResponse{Status: "pulling model file"})
if err := PullModel(c.Args, &RegistryOptions{}, fn); err != nil {
if err := PullModel(ctx, c.Args, &RegistryOptions{}, fn); err != nil {
return err
}
mf, err = GetManifest(ParseModelPath(c.Args))
if err != nil {
return fmt.Errorf("failed to open file after pull: %v", err)
}
} else {
return err
}
} else {
// create a model from this specified file
fn(api.ProgressResponse{Status: "creating model layer"})
file, err := os.Open(fp)
file, err := os.Open(modelFile)
if err != nil {
return fmt.Errorf("failed to open file: %v", err)
}
@@ -271,9 +302,14 @@ func CreateModel(name string, path string, fn func(resp api.ProgressResponse)) e
layers = append(layers, newLayer)
}
}
case "embed":
embedFilePath, err := filenameWithPath(path, c.Args)
if err != nil {
return err
}
embed.files = append(embed.files, embedFilePath)
case "license":
fn(api.ProgressResponse{Status: fmt.Sprintf("creating model %s layer", c.Name)})
// remove the prompt layer if one exists
mediaType := fmt.Sprintf("application/vnd.ollama.image.%s", c.Name)
layer, err := CreateLayer(strings.NewReader(c.Args))
@@ -306,18 +342,35 @@ func CreateModel(name string, path string, fn func(resp api.ProgressResponse)) e
if len(params) > 0 {
fn(api.ProgressResponse{Status: "creating parameter layer"})
layers = removeLayerFromLayers(layers, "application/vnd.ollama.image.params")
paramData, err := paramsToReader(params)
formattedParams, err := formatParams(params)
if err != nil {
return fmt.Errorf("couldn't create params json: %v", err)
}
l, err := CreateLayer(paramData)
bts, err := json.Marshal(formattedParams)
if err != nil {
return err
}
l, err := CreateLayer(bytes.NewReader(bts))
if err != nil {
return fmt.Errorf("failed to create layer: %v", err)
}
l.MediaType = "application/vnd.ollama.image.params"
layers = append(layers, l)
// apply these parameters to the embedding options, in case embeddings need to be generated using this model
embed.opts = api.DefaultOptions()
embed.opts.FromMap(formattedParams)
}
// generate the embedding layers
embeddingLayers, err := embeddingLayers(embed)
if err != nil {
return err
}
layers = append(layers, embeddingLayers...)
digests, err := getLayerDigests(layers)
if err != nil {
return err
@@ -352,6 +405,117 @@ func CreateModel(name string, path string, fn func(resp api.ProgressResponse)) e
return nil
}
type EmbeddingParams struct {
model string
opts api.Options
files []string // paths to files to embed
fn func(resp api.ProgressResponse)
}
// embeddingLayers loads the associated LLM and generates the embeddings to be stored from an input file
func embeddingLayers(e EmbeddingParams) ([]*LayerReader, error) {
layers := []*LayerReader{}
if len(e.files) > 0 {
if _, err := os.Stat(e.model); err != nil {
if os.IsNotExist(err) {
// this is a model name rather than the file
model, err := GetModel(e.model)
if err != nil {
return nil, fmt.Errorf("failed to get model to generate embeddings: %v", err)
}
e.model = model.ModelPath
} else {
return nil, fmt.Errorf("failed to get model file to generate embeddings: %v", err)
}
}
e.opts.EmbeddingOnly = true
llm, err := llama.New(e.model, e.opts)
if err != nil {
return nil, fmt.Errorf("load model to generate embeddings: %v", err)
}
defer func() {
if llm != nil {
llm.Close()
}
}()
addedFiles := make(map[string]bool) // keep track of files that have already been added
for _, filePattern := range e.files {
matchingFiles, err := filepath.Glob(filePattern)
if err != nil {
return nil, fmt.Errorf("could not find files with pattern %s: %w", filePattern, err)
}
for _, filePath := range matchingFiles {
if addedFiles[filePath] {
continue
}
addedFiles[filePath] = true
// TODO: check file type
f, err := os.Open(filePath)
if err != nil {
return nil, fmt.Errorf("could not open embed file: %w", err)
}
scanner := bufio.NewScanner(f)
scanner.Split(bufio.ScanLines)
data := []string{}
for scanner.Scan() {
data = append(data, scanner.Text())
}
f.Close()
// the digest of the file is set here so that the client knows a new operation is in progress
fileDigest, _ := GetSHA256Digest(bytes.NewReader([]byte(filePath)))
embeddings := []vector.Embedding{}
for i, d := range data {
if strings.TrimSpace(d) == "" {
continue
}
e.fn(api.ProgressResponse{
Status: fmt.Sprintf("creating embeddings for file %s", filePath),
Digest: fileDigest,
Total: len(data) - 1,
Completed: i,
})
embed, err := llm.Embedding(d)
if err != nil {
log.Printf("failed to generate embedding for '%s' line %d: %v", filePath, i+1, err)
continue
}
embeddings = append(embeddings, vector.Embedding{Data: d, Vector: embed})
}
b, err := json.Marshal(embeddings)
if err != nil {
return nil, fmt.Errorf("failed to encode embeddings: %w", err)
}
r := bytes.NewReader(b)
digest, size := GetSHA256Digest(r)
// Reset the position of the reader after calculating the digest
if _, err := r.Seek(0, io.SeekStart); err != nil {
return nil, fmt.Errorf("could not reset embed reader: %w", err)
}
layer := &LayerReader{
Layer: Layer{
MediaType: "application/vnd.ollama.image.embed",
Digest: digest,
Size: size,
},
Reader: r,
}
layers = append(layers, layer)
}
}
}
return layers, nil
}
func removeLayerFromLayers(layers []*LayerReader, mediaType string) []*LayerReader {
j := 0
for _, l := range layers {
@@ -440,8 +604,8 @@ func GetLayerWithBufferFromLayer(layer *Layer) (*LayerReader, error) {
return newLayer, nil
}
// paramsToReader converts specified parameter options to their correct types, and returns a reader for the json
func paramsToReader(params map[string][]string) (io.ReadSeeker, error) {
// formatParams converts specified parameter options to their correct types
func formatParams(params map[string][]string) (map[string]interface{}, error) {
opts := api.Options{}
valueOpts := reflect.ValueOf(&opts).Elem() // names of the fields in the options struct
typeOpts := reflect.TypeOf(opts) // types of the fields in the options struct
@@ -495,12 +659,7 @@ func paramsToReader(params map[string][]string) (io.ReadSeeker, error) {
}
}
bts, err := json.Marshal(out)
if err != nil {
return nil, err
}
return bytes.NewReader(bts), nil
return out, nil
}
func getLayerDigests(layers []*LayerReader) ([]string, error) {
@@ -720,7 +879,7 @@ func PushModel(name string, regOpts *RegistryOptions, fn func(api.ProgressRespon
return nil
}
func PullModel(name string, regOpts *RegistryOptions, fn func(api.ProgressResponse)) error {
func PullModel(ctx context.Context, name string, regOpts *RegistryOptions, fn func(api.ProgressResponse)) error {
mp := ParseModelPath(name)
fn(api.ProgressResponse{Status: "pulling manifest"})
@@ -735,7 +894,7 @@ func PullModel(name string, regOpts *RegistryOptions, fn func(api.ProgressRespon
layers = append(layers, &manifest.Config)
for _, layer := range layers {
if err := downloadBlob(mp, layer.Digest, regOpts, fn); err != nil {
if err := downloadBlob(ctx, mp, layer.Digest, regOpts, fn); err != nil {
return err
}
}
@@ -962,112 +1121,6 @@ func uploadBlobChunked(mp ModelPath, url string, layer *Layer, regOpts *Registry
return nil
}
func downloadBlob(mp ModelPath, digest string, regOpts *RegistryOptions, fn func(api.ProgressResponse)) error {
fp, err := GetBlobsPath(digest)
if err != nil {
return err
}
if fi, _ := os.Stat(fp); fi != nil {
// we already have the file, so return
fn(api.ProgressResponse{
Digest: digest,
Total: int(fi.Size()),
Completed: int(fi.Size()),
})
return nil
}
var size int64
chunkSize := 1024 * 1024 // 1 MiB in bytes
fi, err := os.Stat(fp + "-partial")
switch {
case errors.Is(err, os.ErrNotExist):
// noop, file doesn't exist so create it
case err != nil:
return fmt.Errorf("stat: %w", err)
default:
size = fi.Size()
// Ensure the size is divisible by the chunk size by removing excess bytes
size -= size % int64(chunkSize)
err := os.Truncate(fp+"-partial", size)
if err != nil {
return fmt.Errorf("truncate: %w", err)
}
}
url := fmt.Sprintf("%s/v2/%s/blobs/%s", mp.Registry, mp.GetNamespaceRepository(), digest)
headers := map[string]string{
"Range": fmt.Sprintf("bytes=%d-", size),
}
resp, err := makeRequest("GET", url, headers, nil, regOpts)
if err != nil {
log.Printf("couldn't download blob: %v", err)
return err
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusPartialContent {
body, _ := io.ReadAll(resp.Body)
return fmt.Errorf("on download registry responded with code %d: %v", resp.StatusCode, string(body))
}
err = os.MkdirAll(path.Dir(fp), 0o700)
if err != nil {
return fmt.Errorf("make blobs directory: %w", err)
}
out, err := os.OpenFile(fp+"-partial", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o644)
if err != nil {
return fmt.Errorf("open file: %w", err)
}
defer out.Close()
remaining, _ := strconv.ParseInt(resp.Header.Get("Content-Length"), 10, 64)
completed := size
total := remaining + completed
for {
fn(api.ProgressResponse{
Status: fmt.Sprintf("downloading %s", digest),
Digest: digest,
Total: int(total),
Completed: int(completed),
})
if completed >= total {
if err := out.Close(); err != nil {
return err
}
if err := os.Rename(fp+"-partial", fp); err != nil {
fn(api.ProgressResponse{
Status: fmt.Sprintf("error renaming file: %v", err),
Digest: digest,
Total: int(total),
Completed: int(completed),
})
return err
}
break
}
n, err := io.CopyN(out, resp.Body, int64(chunkSize))
if err != nil && !errors.Is(err, io.EOF) {
return err
}
completed += n
}
log.Printf("success getting %s\n", digest)
return nil
}
func makeRequest(method, url string, headers map[string]string, body io.Reader, regOpts *RegistryOptions) (*http.Response, error) {
if !strings.HasPrefix(url, "http") {
if regOpts.Insecure {
@@ -1077,18 +1130,30 @@ func makeRequest(method, url string, headers map[string]string, body io.Reader,
}
}
req, err := http.NewRequest(method, url, body)
// make a copy of the body in case we need to try the call to makeRequest again
var buf bytes.Buffer
if body != nil {
_, err := io.Copy(&buf, body)
if err != nil {
return nil, err
}
}
bodyCopy := bytes.NewReader(buf.Bytes())
req, err := http.NewRequest(method, url, bodyCopy)
if err != nil {
return nil, err
}
for k, v := range headers {
req.Header.Set(k, v)
if regOpts.Token != "" {
req.Header.Set("Authorization", "Bearer "+regOpts.Token)
} else if regOpts.Username != "" && regOpts.Password != "" {
req.SetBasicAuth(regOpts.Username, regOpts.Password)
}
// TODO: better auth
if regOpts.Username != "" && regOpts.Password != "" {
req.SetBasicAuth(regOpts.Username, regOpts.Password)
for k, v := range headers {
req.Header.Set(k, v)
}
client := &http.Client{
@@ -1105,9 +1170,55 @@ func makeRequest(method, url string, headers map[string]string, body io.Reader,
return nil, err
}
// if the request is unauthenticated, try to authenticate and make the request again
if resp.StatusCode == http.StatusUnauthorized {
auth := resp.Header.Get("Www-Authenticate")
authRedir := ParseAuthRedirectString(string(auth))
token, err := getAuthToken(authRedir, regOpts)
if err != nil {
return nil, err
}
regOpts.Token = token
bodyCopy = bytes.NewReader(buf.Bytes())
return makeRequest(method, url, headers, bodyCopy, regOpts)
}
return resp, nil
}
func getValue(header, key string) string {
startIdx := strings.Index(header, key+"=")
if startIdx == -1 {
return ""
}
// Move the index to the starting quote after the key.
startIdx += len(key) + 2
endIdx := startIdx
for endIdx < len(header) {
if header[endIdx] == '"' {
if endIdx+1 < len(header) && header[endIdx+1] != ',' { // If the next character isn't a comma, continue
endIdx++
continue
}
break
}
endIdx++
}
return header[startIdx:endIdx]
}
func ParseAuthRedirectString(authStr string) AuthRedirect {
authStr = strings.TrimPrefix(authStr, "Bearer ")
return AuthRedirect{
Realm: getValue(authStr, "realm"),
Service: getValue(authStr, "service"),
Scope: getValue(authStr, "scope"),
}
}
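// A Www-Authenticate value of the form below (registry host illustrative)
// parses into its three fields:
//
//   Bearer realm="https://registry.example.com/token",service="registry.example.com",scope="repository:library/llama2:pull"
//
// giving Realm "https://registry.example.com/token", Service
// "registry.example.com" and Scope "repository:library/llama2:pull".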
var errDigestMismatch = fmt.Errorf("digest mismatch, file must be downloaded again")
func verifyBlob(digest string) error {

View File

@@ -1,6 +1,7 @@
package server
import (
"context"
"encoding/json"
"errors"
"fmt"
@@ -17,15 +18,18 @@ import (
"github.com/gin-contrib/cors"
"github.com/gin-gonic/gin"
"gonum.org/v1/gonum/mat"
"github.com/jmorganca/ollama/api"
"github.com/jmorganca/ollama/llama"
"github.com/jmorganca/ollama/vector"
)
var loaded struct {
mu sync.Mutex
llm *llama.LLM
llm *llama.LLM
Embeddings []vector.Embedding
expireAt time.Time
expireTimer *time.Timer
@@ -34,6 +38,81 @@ var loaded struct {
options api.Options
}
// load a model into memory if it is not already loaded; it is up to the caller to lock loaded.mu before calling this function
func load(model *Model, reqOpts map[string]interface{}, sessionDuration time.Duration) error {
opts := api.DefaultOptions()
if err := opts.FromMap(model.Options); err != nil {
log.Printf("could not load model options: %v", err)
return err
}
if err := opts.FromMap(reqOpts); err != nil {
log.Printf("could not merge model options: %v", err)
return err
}
if model.Digest != loaded.digest || !reflect.DeepEqual(loaded.options, opts) {
if loaded.llm != nil {
loaded.llm.Close()
loaded.llm = nil
loaded.digest = ""
}
if model.Embeddings != nil && len(model.Embeddings) > 0 {
opts.EmbeddingOnly = true // this is required to generate embeddings; completions will still work
loaded.Embeddings = model.Embeddings
}
llm, err := llama.New(model.ModelPath, opts)
if err != nil {
return err
}
if opts.NumKeep < 0 {
promptWithSystem, err := model.Prompt(api.GenerateRequest{}, "")
if err != nil {
return err
}
promptNoSystem, err := model.Prompt(api.GenerateRequest{Context: []int{0}}, "")
if err != nil {
return err
}
tokensWithSystem := llm.Encode(promptWithSystem)
tokensNoSystem := llm.Encode(promptNoSystem)
llm.NumKeep = len(tokensWithSystem) - len(tokensNoSystem) + 1
}
loaded.llm = llm
loaded.digest = model.Digest
loaded.options = opts
}
loaded.expireAt = time.Now().Add(sessionDuration)
if loaded.expireTimer == nil {
loaded.expireTimer = time.AfterFunc(sessionDuration, func() {
loaded.mu.Lock()
defer loaded.mu.Unlock()
if time.Now().Before(loaded.expireAt) {
return
}
if loaded.llm == nil {
return
}
loaded.llm.Close()
loaded.llm = nil
loaded.digest = ""
})
}
loaded.expireTimer.Reset(sessionDuration)
return nil
}
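// NumKeep < 0 asks to keep the entire system prompt resident: the prompt is
// rendered twice, once as a first turn (system block included) and once as a
// later turn (Context non-empty, so the system block is omitted), and the
// difference in token counts plus one is pinned.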
func GenerateHandler(c *gin.Context) {
loaded.mu.Lock()
defer loaded.mu.Unlock()
@@ -52,63 +131,30 @@ func GenerateHandler(c *gin.Context) {
return
}
opts := api.DefaultOptions()
if err := opts.FromMap(model.Options); err != nil {
log.Printf("could not load model options: %v", err)
sessionDuration := 5 * time.Minute
if err := load(model, req.Options, sessionDuration); err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
return
}
if err := opts.FromMap(req.Options); err != nil {
log.Printf("could not merge model options: %v", err)
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
return
}
checkpointLoaded := time.Now()
if model.Digest != loaded.digest || !reflect.DeepEqual(loaded.options, opts) {
if loaded.llm != nil {
loaded.llm.Close()
loaded.llm = nil
loaded.digest = ""
}
llm, err := llama.New(model.ModelPath, opts)
embedding := ""
if model.Embeddings != nil && len(model.Embeddings) > 0 {
promptEmbed, err := loaded.llm.Embedding(req.Prompt)
if err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
return
}
loaded.llm = llm
loaded.digest = model.Digest
loaded.options = opts
// TODO: set embedTop from specified parameters in modelfile
embedTop := 3
topK := vector.TopK(embedTop, mat.NewVecDense(len(promptEmbed), promptEmbed), loaded.Embeddings)
for _, e := range topK {
embedding = fmt.Sprintf("%s %s", embedding, e.Embedding.Data)
}
}
sessionDuration := 5 * time.Minute
loaded.expireAt = time.Now().Add(sessionDuration)
if loaded.expireTimer == nil {
loaded.expireTimer = time.AfterFunc(sessionDuration, func() {
loaded.mu.Lock()
defer loaded.mu.Unlock()
if time.Now().Before(loaded.expireAt) {
return
}
if loaded.llm == nil {
return
}
loaded.llm.Close()
loaded.llm = nil
loaded.digest = ""
})
}
loaded.expireTimer.Reset(sessionDuration)
checkpointLoaded := time.Now()
prompt, err := model.Prompt(req)
prompt, err := model.Prompt(req, embedding)
if err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
return
@@ -139,6 +185,44 @@ func GenerateHandler(c *gin.Context) {
streamResponse(c, ch)
}
func EmbeddingHandler(c *gin.Context) {
loaded.mu.Lock()
defer loaded.mu.Unlock()
var req api.EmbeddingRequest
if err := c.ShouldBindJSON(&req); err != nil {
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
return
}
model, err := GetModel(req.Model)
if err != nil {
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
return
}
if err := load(model, req.Options, 5*time.Minute); err != nil {
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
return
}
if !loaded.options.EmbeddingOnly {
c.JSON(http.StatusBadRequest, gin.H{"error": "embedding option must be set to true"})
return
}
embedding, err := loaded.llm.Embedding(req.Prompt)
if err != nil {
log.Printf("embedding generation failed: %v", err)
c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to generate embedding"})
return
}
resp := api.EmbeddingResponse{
Embedding: embedding,
}
c.JSON(http.StatusOK, resp)
}
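// An illustrative exchange with the new endpoint; field names follow
// api.EmbeddingRequest and api.EmbeddingResponse, the values are made up:
//
//   POST /api/embeddings
//   {"model": "llama2", "prompt": "Here is an article about llamas..."}
//
//   {"embedding": [0.567, -0.0092, 0.241, ...]}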
func PullModelHandler(c *gin.Context) {
var req api.PullRequest
if err := c.ShouldBindJSON(&req); err != nil {
@@ -159,7 +243,10 @@ func PullModelHandler(c *gin.Context) {
Password: req.Password,
}
if err := PullModel(req.Name, regOpts, fn); err != nil {
ctx, cancel := context.WithCancel(c.Request.Context())
defer cancel()
if err := PullModel(ctx, req.Name, regOpts, fn); err != nil {
ch <- gin.H{"error": err.Error()}
}
}()
@@ -209,7 +296,10 @@ func CreateModelHandler(c *gin.Context) {
ch <- resp
}
if err := CreateModel(req.Name, req.Path, fn); err != nil {
ctx, cancel := context.WithCancel(c.Request.Context())
defer cancel()
if err := CreateModel(ctx, req.Name, req.Path, fn); err != nil {
ch <- gin.H{"error": err.Error()}
}
}()
@@ -301,11 +391,10 @@ func CopyModelHandler(c *gin.Context) {
}
}
func Serve(ln net.Listener) error {
func Serve(ln net.Listener, origins []string) error {
config := cors.DefaultConfig()
config.AllowWildcard = true
// only allow http/https from localhost
config.AllowOrigins = []string{
config.AllowOrigins = append(origins, []string{
"http://localhost",
"http://localhost:*",
"https://localhost",
@@ -314,7 +403,11 @@ func Serve(ln net.Listener) error {
"http://127.0.0.1:*",
"https://127.0.0.1",
"https://127.0.0.1:*",
}
"http://0.0.0.0",
"http://0.0.0.0:*",
"https://0.0.0.0",
"https://0.0.0.0:*",
}...)
r := gin.Default()
r.Use(cors.New(config))
@@ -328,6 +421,7 @@ func Serve(ln net.Listener) error {
r.POST("/api/pull", PullModelHandler)
r.POST("/api/generate", GenerateHandler)
r.POST("/api/embeddings", EmbeddingHandler)
r.POST("/api/create", CreateModelHandler)
r.POST("/api/push", PushModelHandler)
r.POST("/api/copy", CopyModelHandler)
@@ -343,6 +437,7 @@ func Serve(ln net.Listener) error {
}
func streamResponse(c *gin.Context, ch chan any) {
c.Header("Content-Type", "application/x-ndjson")
c.Stream(func(w io.Writer) bool {
val, ok := <-ch
if !ok {

vector/store.go (new file, 69 lines)
View File

@@ -0,0 +1,69 @@
package vector
import (
"container/heap"
"sort"
"gonum.org/v1/gonum/mat"
)
type Embedding struct {
Vector []float64 // the embedding vector
Data string // the data represented by the embedding
}
type EmbeddingSimilarity struct {
Embedding Embedding // the embedding that was used to calculate the similarity
Similarity float64 // the similarity between the embedding and the query
}
type Heap []EmbeddingSimilarity
func (h Heap) Len() int { return len(h) }
func (h Heap) Less(i, j int) bool { return h[i].Similarity < h[j].Similarity }
func (h Heap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *Heap) Push(e any) {
*h = append(*h, e.(EmbeddingSimilarity))
}
func (h *Heap) Pop() interface{} {
old := *h
n := len(old)
x := old[n-1]
*h = old[0 : n-1]
return x
}
// cosineSimilarity is a measure that calculates the cosine of the angle between two vectors.
// This value will range from -1 to 1, where 1 means the vectors are identical.
func cosineSimilarity(vec1, vec2 *mat.VecDense) float64 {
dotProduct := mat.Dot(vec1, vec2)
norms := mat.Norm(vec1, 2) * mat.Norm(vec2, 2)
if norms == 0 {
return 0
}
return dotProduct / norms
}
func TopK(k int, query *mat.VecDense, embeddings []Embedding) []EmbeddingSimilarity {
h := &Heap{}
heap.Init(h)
for _, emb := range embeddings {
similarity := cosineSimilarity(query, mat.NewVecDense(len(emb.Vector), emb.Vector))
heap.Push(h, EmbeddingSimilarity{Embedding: emb, Similarity: similarity})
if h.Len() > k {
heap.Pop(h)
}
}
topK := make([]EmbeddingSimilarity, 0, h.Len())
for h.Len() > 0 {
topK = append(topK, heap.Pop(h).(EmbeddingSimilarity))
}
sort.Slice(topK, func(i, j int) bool {
return topK[i].Similarity > topK[j].Similarity
})
return topK
}
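
A usage sketch for the new store (vectors shortened to four dimensions for illustration):

```go
package main

import (
	"fmt"

	"gonum.org/v1/gonum/mat"

	"github.com/jmorganca/ollama/vector"
)

func main() {
	query := mat.NewVecDense(4, []float64{0.1, 0.9, 0.0, 0.2})
	docs := []vector.Embedding{
		{Data: "llamas are camelids", Vector: []float64{0.1, 0.8, 0.1, 0.2}},
		{Data: "the sky is blue", Vector: []float64{0.9, 0.0, 0.1, 0.0}},
	}
	// results arrive most-similar first; the min-heap keeps only the k best
	for _, hit := range vector.TopK(1, query, docs) {
		fmt.Printf("%s (similarity %.2f)\n", hit.Embedding.Data, hit.Similarity)
	}
}
```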

View File

@@ -1,3 +0,0 @@
{
"extends": "next/core-web-vitals"
}

web/.gitignore (vendored, 35 lines)
View File

@@ -1,35 +0,0 @@
# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
# dependencies
/node_modules
/.pnp
.pnp.js
# testing
/coverage
# next.js
/.next/
/out/
# production
/build
# misc
.DS_Store
*.pem
# debug
npm-debug.log*
yarn-debug.log*
yarn-error.log*
# local env files
.env*.local
# vercel
.vercel
# typescript
*.tsbuildinfo
next-env.d.ts

View File

@@ -1,9 +0,0 @@
# Ollama.ai
This website renders helpful information, blog posts, docs and more for the Ollama project.
## Develop
```bash
npm run dev
```

View File

@@ -1,27 +0,0 @@
import { Analytics } from '@segment/analytics-node'
import { v4 as uuid } from 'uuid'
const analytics = new Analytics({ writeKey: process.env.TELEMETRY_WRITE_KEY || '<empty>' })
export async function POST(req: Request) {
const { email } = await req.json()
const id = uuid()
await analytics.identify({
anonymousId: id,
traits: {
email,
},
})
await analytics.track({
anonymousId: id,
event: 'signup',
properties: {
email,
},
})
return new Response(null, { status: 200 })
}

View File

@@ -1,43 +0,0 @@
import { NextResponse } from 'next/server'
import semver from 'semver'
export async function GET(req: Request) {
const { searchParams } = new URL(req.url)
const os = searchParams.get('os') || 'darwin'
const version = searchParams.get('version') || '0.0.0'
if (!version) {
return new Response('not found', { status: 404 })
}
const res = await fetch('https://api.github.com/repos/jmorganca/ollama/releases', { next: { revalidate: 60 } })
const data = await res.json()
const latest = data?.filter((f: any) => !f.prerelease)?.[0]
if (!latest) {
return new Response('not found', { status: 404 })
}
const assets = latest.assets || []
if (assets.length === 0) {
return new Response('not found', { status: 404 })
}
// todo: get the correct asset for the current arch/os
const asset = assets.find((a: any) => a.name.toLowerCase().includes(os) && a.name.toLowerCase().includes('.zip'))
if (!asset) {
return new Response('not found', { status: 404 })
}
console.log(asset)
if (semver.lt(version, latest.tag_name)) {
return NextResponse.json({ version: latest.tag_name, url: asset.browser_download_url })
}
return new Response(null, { status: 204 })
}

View File

@@ -1,11 +0,0 @@
'use client'
import { useEffect } from 'react'
export default function Downloader({ url }: { url: string }) {
useEffect(() => {
window.location.href = url
}, [])
return null
}

View File

@@ -1,47 +0,0 @@
import Image from 'next/image'
import Header from '../header'
import Downloader from './downloader'
import Signup from './signup'
export default async function Download() {
const res = await fetch('https://api.github.com/repos/jmorganca/ollama/releases', { next: { revalidate: 60 } })
const data = await res.json()
if (data.length === 0) {
return null
}
const latest = data[0]
const assets = latest.assets || []
if (assets.length === 0) {
return null
}
// todo: get the correct asset for the current arch/os
const asset = assets.find(
(a: any) => a.name.toLowerCase().includes('darwin') && a.name.toLowerCase().includes('.zip')
)
if (!asset) {
return null
}
return (
<>
<Header />
<main className='flex min-h-screen max-w-6xl flex-col py-20 px-16 lg:p-32 items-center mx-auto'>
<Image src='/ollama.png' width={64} height={64} alt='ollamaIcon' />
<section className='mt-12 mb-8 text-center'>
<h2 className='my-2 max-w-md text-3xl tracking-tight'>Downloading...</h2>
<h3 className='text-base text-neutral-500 mt-12 max-w-[16rem]'>
While Ollama downloads, sign up to get notified of new updates.
</h3>
<Downloader url={asset.browser_download_url} />
</section>
<Signup />
</main>
</>
)
}

View File

@@ -1,51 +0,0 @@
'use client'
import { useState } from 'react'
export default function Signup() {
const [email, setEmail] = useState('')
const [submitting, setSubmitting] = useState(false)
const [success, setSuccess] = useState(false)
return (
<form
onSubmit={async e => {
e.preventDefault()
setSubmitting(true)
await fetch('/api/signup', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({ email }),
})
setSubmitting(false)
setSuccess(true)
setEmail('')
return false
}}
className='flex self-stretch flex-col gap-3 h-32 md:mx-40 lg:mx-72'
>
<input
required
autoFocus
value={email}
onChange={e => setEmail(e.target.value)}
type='email'
placeholder='your@email.com'
className='border border-neutral-200 rounded-lg px-4 py-2 focus:outline-none placeholder-neutral-300'
/>
<input
type='submit'
value='Get updates'
disabled={submitting}
className='bg-black text-white disabled:text-neutral-200 disabled:bg-neutral-700 rounded-full px-4 py-2 focus:outline-none cursor-pointer'
/>
{success && <p className='text-center text-sm'>You&apos;re signed up for updates</p>}
</form>
)
}

View File

@@ -1,3 +0,0 @@
@tailwind base;
@tailwind components;
@tailwind utilities;

View File

@@ -1,26 +0,0 @@
import Link from "next/link"
const navigation = [
{ name: 'Discord', href: 'https://discord.com/invite/ollama' },
{ name: 'GitHub', href: 'https://github.com/jmorganca/ollama' },
{ name: 'Download', href: '/download' },
]
export default function Header() {
return (
<header className="absolute inset-x-0 top-0 z-50">
<nav className="mx-auto flex items-center justify-between px-10 py-4">
<Link className="flex-1 font-bold" href="/">
Ollama
</Link>
<div className="flex space-x-8">
{navigation.map((item) => (
<Link key={item.name} href={item.href} className="text-sm leading-6 text-gray-900">
{item.name}
</Link>
))}
</div>
</nav>
</header>
)
}

Binary file not shown (deleted image, previously 1.8 KiB)

View File

@@ -1,14 +0,0 @@
import './globals.css'
export const metadata = {
title: 'Ollama',
description: 'A tool for running large language models',
}
export default function RootLayout({ children }: { children: React.ReactNode }) {
return (
<html lang='en'>
<body className='antialiased'>{children}</body>
</html>
)
}

View File

@@ -1,37 +0,0 @@
import Image from 'next/image'
import Link from 'next/link'
import Header from './header'
export default async function Home() {
return (
<>
<Header />
<main className='flex min-h-screen max-w-6xl flex-col py-20 px-16 md:p-32 items-center mx-auto'>
<Image src='/ollama.png' width={64} height={64} alt='ollamaIcon' />
<section className='my-12 text-center'>
<div className='flex flex-col space-y-2'>
<h2 className='md:max-w-md mx-auto my-2 text-3xl tracking-tight'>
Get up and running with large language models, locally.
</h2>
<h3 className='md:max-w-xs mx-auto text-base text-neutral-500'>
Run Llama 2 and other models on macOS. Customize and create your own.
</h3>
</div>
<div className='mx-auto max-w-xs flex flex-col space-y-4 mt-12'>
<Link
href='/download'
className='md:mx-10 lg:mx-14 bg-black text-white rounded-full px-4 py-2 focus:outline-none cursor-pointer'
>
Download
</Link>
<p className='text-neutral-500 text-sm '>
Available for macOS with Apple Silicon <br />
Windows & Linux support coming soon.
</p>
</div>
</section>
</main>
</>
)
}

View File

@@ -1,7 +0,0 @@
{
"compilerOptions": {
"paths": {
"@/*": ["./*"]
}
}
}

View File

@@ -1,4 +0,0 @@
/** @type {import('next').NextConfig} */
const nextConfig = {}
module.exports = nextConfig

web/package-lock.json (generated, 4696 lines)

File diff suppressed because it is too large

View File

@@ -1,37 +0,0 @@
{
"name": "web",
"version": "0.0.0",
"scripts": {
"dev": "next dev",
"build": "next build",
"start": "next start",
"lint": "next lint"
},
"dependencies": {
"@octokit/rest": "^19.0.13",
"@octokit/types": "^11.0.0",
"@segment/analytics-node": "^1.0.0",
"@types/node": "20.4.0",
"@types/react": "18.2.14",
"@types/react-dom": "18.2.6",
"autoprefixer": "10.4.14",
"encoding": "^0.1.13",
"eslint": "8.44.0",
"eslint-config-next": "13.4.7",
"next": "13.4.9",
"postcss": "8.4.24",
"react": "18.2.0",
"react-dom": "18.2.0",
"react-icons": "^4.10.1",
"semver": "^7.5.3",
"tailwindcss": "3.3.2",
"typescript": "5.1.6",
"uuid": "^9.0.0"
},
"devDependencies": {
"@types/semver": "^7.5.0",
"@types/uuid": "^9.0.2",
"prettier": "^3.0.0",
"prettier-plugin-tailwindcss": "^0.4.0"
}
}

View File

@@ -1,6 +0,0 @@
module.exports = {
plugins: {
tailwindcss: {},
autoprefixer: {},
},
}

Binary file not shown (deleted image, previously 7.3 KiB)

View File

@@ -1,18 +0,0 @@
/** @type {import('tailwindcss').Config} */
module.exports = {
content: [
'./pages/**/*.{js,ts,jsx,tsx,mdx}',
'./components/**/*.{js,ts,jsx,tsx,mdx}',
'./app/**/*.{js,ts,jsx,tsx,mdx}',
],
theme: {
extend: {
fontFamily: {
sans: ['sans-serif'],
serif: ['serif'],
monospace: ['monospace'],
},
},
},
plugins: [],
}

View File

@@ -1,28 +0,0 @@
{
"compilerOptions": {
"target": "es5",
"lib": ["dom", "dom.iterable", "esnext"],
"allowJs": true,
"skipLibCheck": true,
"strict": true,
"forceConsistentCasingInFileNames": true,
"noEmit": true,
"esModuleInterop": true,
"module": "esnext",
"moduleResolution": "node",
"resolveJsonModule": true,
"isolatedModules": true,
"jsx": "preserve",
"incremental": true,
"plugins": [
{
"name": "next"
}
],
"paths": {
"@/*": ["./*"]
}
},
"include": ["next-env.d.ts", "**/*.ts", "**/*.tsx", ".next/types/**/*.ts"],
"exclude": ["node_modules"]
}

View File

@@ -1,5 +0,0 @@
{
"github": {
"silent": true
}
}