Commit Graph

  • 6d36b8dcfb
    benchmark: remove unused benchmark test (#11120) Jeffrey Morgan 2025-06-18 12:58:50 -0700
  • 5e3fb4744b
    Revert "Revert "ggml: Export GPU UUIDs" (#11115)" (#11117) Jeffrey Morgan 2025-06-18 07:30:49 -0700
  • c5237d9462
    Revert "ggml: Export GPU UUIDs" (#11115) Jeffrey Morgan 2025-06-18 05:45:00 -0700
  • 4f1588bc37
    Revert "feat: incremental gguf parser (#10822)" (#11114) Jeffrey Morgan 2025-06-18 05:42:44 -0700
  • 8c3501c161
    cache: fix comment function name in cache.go (#11110) 曹家巧 2025-06-18 20:21:45 +0800
  • 829e77105a
    tools: return empty arguments object instead of null (#11113) Jeffrey Morgan 2025-06-18 05:20:43 -0700
  • 1dc12706c5
    tools: fix parsing tool calls without any parameters (#11101) Jeffrey Morgan 2025-06-17 10:51:43 -0700
  • 2c371ff357
    model: treat 'user defined' tokens as special tokens (#11077) Jeffrey Morgan 2025-06-16 16:03:16 -0700
  • 142efb91b1
    gguf: fix write order (#11068) Michael Yang 2025-06-16 10:42:32 -0700
  • 7e0b662c6c
    readme: add ollama-launcher to community integrations (#11080) NGC13009 2025-06-16 12:27:49 +0800
  • 4c7cf115fe
    readme: add GPTranslate to community integrations (#11071) Phil 2025-06-14 17:54:03 +0200
  • 2d86651985
    tools: loosen tool parsing to allow for more formats (#11030) Jeffrey Morgan 2025-06-12 14:18:54 -0700
  • 2c6f1dc9c8
    feat: incremental gguf parser (#10822) Michael Yang 2025-06-12 11:04:11 -0700
  • db3a312edf
    feat: uneven splits (#11048) Michael Yang 2025-06-11 12:10:54 -0700
  • 0d5c118679
    skip tokenizer.model if possible (#11050) Michael Yang 2025-06-11 12:10:35 -0700
  • eb2c2d61e5
    use nn.Linear in place of ml.Tensor (#11049) Michael Yang 2025-06-11 12:10:15 -0700
  • 4fff1738a4
    readme: add ollama-multirun to community integrations (#11038) Attogram Project 2025-06-10 23:14:51 +0200
  • 26a1129d71
    readme: update quickstart link text to Gemma 3 Jeffrey Morgan 2025-06-10 09:34:23 -0700
  • deaf879fb9
    readme: update quickstart example to Gemma 3 Jeffrey Morgan 2025-06-10 09:33:54 -0700
  • 3d1278ab26
    mac: handle "keep" named apps (#11031) Daniel Hiltgen 2025-06-09 16:29:57 -0700
  • 1effde30cb
    spawn desktop quickly (#11011) Daniel Hiltgen 2025-06-08 09:34:52 -0700
  • 874e02626f
    docs: update link to AMD drivers in linux.md (#10973) Krzysztof Jeziorny 2025-06-07 05:30:04 +0200
  • 3b70283d35
    Revert "server: add model capabilities to the list endpoint (#10174)" (#11004) Jeffrey Morgan 2025-06-06 23:29:14 -0400
  • 5d8b0297df
    launch app hidden (#10962) Daniel Hiltgen 2025-06-06 14:06:29 -0700
  • 1b1eb74ab1
    win: handle more than 2048 processes (#10997) Daniel Hiltgen 2025-06-06 14:06:09 -0700
  • 1fb5a3d56a
    move thinking logic into its own package (#10990) Devon Rifkin 2025-06-06 12:02:20 -0700
  • 8b158c2049
    docs: fix typo in development.md (#10998) Hunter Wittenborn 2025-06-06 11:07:29 -0500
  • 237fdab92d
    export ThinkingParser Devon Rifkin 2025-06-05 10:22:32 -0700
  • 47bebce5f8
    server: add model capabilities to the list endpoint (#10174) JasonHonKL 2025-06-05 02:39:48 +0800
  • dfd002e57f
    readme: add SimpleOllamaUnity to community integrations (#10817) HardCodeDev 2025-05-31 06:50:16 +0400
  • b43f6b223c
    tools: resiliency upgrade to name and arg extraction from template (#10917) Parth Sareen 2025-05-30 15:18:09 -0700
  • 0b9c6cb497
    ggml: Export GPU UUIDs Jesse Gross 2025-04-24 11:48:49 -0700
  • f6fc508ec6
    llm: Make "POST predict" error message more informative Jesse Gross 2025-05-13 17:26:46 -0700
  • 026aba9f11
    add thinking support to the api and cli (#10584) Devon Rifkin 2025-05-28 19:38:52 -0700
  • a2bdc43bc8
    client: add request signing to the client (#10881) Patrick Devine 2025-05-27 16:50:57 -0700
  • 9c5c197393
    kvcache: Skip computing causal mask for worst case graph reservation Jesse Gross 2025-05-27 13:33:57 -0700
  • 8d989025e2
    server: abort download on empty digest Kyle Steere 2025-05-27 18:28:48 +0000
  • 75e3b372a1
    tools: relax JSON parse constraints for tool calling (#10872) Parth Sareen 2025-05-26 18:59:06 -0700
  • 951b332cd2
    tools: remove newline stripping (#10869) Parth Sareen 2025-05-26 17:16:00 -0700
  • 2f6d9234ac
    readme: add AWS Strands Agents SDK example to community integrations (#10865) RAPID ARCHITECT 2025-05-26 14:05:03 -0500
  • 02b0285474
    readme: Add macLlama to community integrations (#10790) Min Yoo 2025-05-25 05:18:32 +0900
  • 6185310f2f
    tests: drop llama3.2-vision embedding tests (#10837) Daniel Hiltgen 2025-05-24 13:17:53 -0700
  • 56765df3ee
    docs: remove unsupported quantizations (#10842) frob 2025-05-24 22:17:26 +0200
  • 4fed7101b7
    server: add hint to the error message when model path access fails (#10843) frob 2025-05-24 22:17:04 +0200
  • f34f58bbb2
    ml: Improve slog formatting for BackendMemory Jesse Gross 2025-05-23 15:37:32 -0700
  • 8cd2b6478e
    tools: refactor tool call parsing and enable streaming (#10415) Parth Sareen 2025-05-23 14:19:31 -0700
  • 5ae2770e0d
    llama: add minimum memory for grammar (#10820) Parth Sareen 2025-05-22 18:53:31 -0700
  • d1ed4b17ef
    ml: Panic rather than return error on tensor allocation failure Jesse Gross 2025-05-19 10:43:56 -0700
  • 6e68feda00
    ollamarunner: Memory usage reporting Jesse Gross 2025-04-17 11:00:25 -0700
  • b3de134eda
    ggml: Report graph memory for failed allocations Jesse Gross 2025-05-16 14:05:08 -0700
  • 99880e7254
    sched: fix runner leak during reloading unload (#10819) Daniel Hiltgen 2025-05-22 14:31:36 -0700
  • df4b146c49
    fix: mllama quality (#10807) Michael Yang 2025-05-22 11:30:49 -0700
  • d25bde723c
    server: improve tensor quantization fallback logic (#10806) Bruce MacDonald 2025-05-22 10:48:08 -0700
  • 1dbe9ba784
    integration: add qwen2.5-vl (#10815) Daniel Hiltgen 2025-05-22 09:12:32 -0700
  • 197db4eccd
    remove support for multiple ggufs in a single file (#10722) Michael Yang 2025-05-21 13:55:31 -0700
  • bf0fbfeb0e
    win: detect background upgrade in progress (#10785) Daniel Hiltgen 2025-05-21 10:46:56 -0700
  • dc8ee7636b
    feat: port qwen2 model (#10782) Michael Yang 2025-05-21 10:21:24 -0700
  • 9215b190fa
    feat: qwen3 dense and sparse models (#10708) Michael Yang 2025-05-21 10:21:07 -0700
  • 7f3e4d6f06
    fix cmakelists (#10804) Michael Yang 2025-05-21 09:52:52 -0700
  • 02fd383448
    chore: disable debug in binary libraries (#10788) Michael Yang 2025-05-21 09:39:38 -0700
  • 9213339549
    fix: qwen25vl assign samebatch in multimodal input (#10789) Michael Yang 2025-05-21 09:39:20 -0700
  • 20dcadf7e8
    ml: add more rope options (#10775) Michael Yang 2025-05-20 15:51:08 -0700
  • 3decfd28a8
    llama: fix incorrect initialization of C.struct_common_sampler_cparams.penalty_present (#10779) DarkCaster 2025-05-20 20:41:15 +0300
  • 20a612834f
    fix llama and mistral3 models (#10774) Michael Yang 2025-05-19 15:06:35 -0700
  • dba546a24a
    llm: Use first layer as memory buffer in estimation Jesse Gross 2025-05-19 11:40:44 -0700
  • f7a5f0da58
    avoid kv truncation during create (#10761) Daniel Hiltgen 2025-05-19 13:54:54 -0700
  • 7b9ab4cb32
    ggml: Separate tensor load from backend creation Jesse Gross 2025-04-17 13:42:40 -0700
  • 07030ffa59
    llm: Estimate projector memory correctly for Ollama engine Jesse Gross 2025-05-13 11:36:52 -0700
  • a9beff33f8
    llm: Consistently track unassigned model data Jesse Gross 2025-05-13 13:04:20 -0700
  • b84eda2b82
    readme: add TinyNotepad to community integrations (#10763) Ronald Wilson 2025-05-19 01:13:22 +0530
  • af9708c72d
    model: handle multiple eos tokens (#10577) Michael Yang 2025-05-16 13:40:23 -0700
  • 48a1fc0830
    Fix lingering Q4_0 help reference (#10720) Daniel Hiltgen 2025-05-15 16:33:23 -0700
  • 88114310e6
    cmd: add ellipses to truncated show metadata (#10717) Bruce MacDonald 2025-05-15 15:45:52 -0700
  • cdae35b52a
    ollamarunner: Multi-modal worst case graph Jesse Gross 2025-04-07 13:59:11 -0700
  • e54f602a15
    ollamarunner: Separate text and multimodal graphs Jesse Gross 2025-05-05 13:32:11 -0700
  • 8c75fb33d1
    ollamarunner: Base cached tokens on current prompt Jesse Gross 2025-05-09 16:51:47 -0700
  • 4e77815773
    fix pixel values padding (#10718) Michael Yang 2025-05-15 13:44:44 -0700
  • d507f23b0d
    fix mllama conversion (#10716) Michael Yang 2025-05-15 12:15:01 -0700
  • c38d583c99
    ggml: update qwen25vl vision size estimate (#10711) Bruce MacDonald 2025-05-14 16:42:30 -0700
  • a017e78f35
    fix crash in old clients with quantization progress (#10710) Daniel Hiltgen 2025-05-14 14:54:18 -0700
  • 558b0f5fe9
    model: add Qwen2.5-VL support (#10385) Bruce MacDonald 2025-05-13 20:58:02 -0700
  • 4d12503049
    chore: update mllama to use ollama engine (#10637) Michael Yang 2025-05-13 17:36:02 -0700
  • 783739ee9f
    Fixed over-allocation of VRAM due to small initial layer sizes. tej 2025-05-13 18:42:39 -0500
  • d0ed25bde8
    llama: fix memory leak for grammar (#10696) Parth Sareen 2025-05-13 15:39:27 -0700
  • 24118aa1db
    llama: fix defrag patch to defragment when no slots are available (#10695) Jeffrey Morgan 2025-05-13 14:02:08 -0700
  • d344573e5b
    Revert "remove cuda v11 (#10569)" (#10692) Daniel Hiltgen 2025-05-13 13:12:54 -0700
  • 3f2b7658af
    llama: fix crash on snowflake embedding model (#10690) Jeffrey Morgan 2025-05-13 13:11:11 -0700
  • 595b683ffb
    server: add webp image input support (#10653) Jeffrey Morgan 2025-05-12 20:41:42 -0700
  • b9c7aed5ce
    fix vocabulary (#10679) Michael Yang 2025-05-12 17:29:46 -0700
  • f1c017735b
    models: remove unused qwen2vl processing (#10677) Bruce MacDonald 2025-05-12 16:08:42 -0700
  • 0132148534
    Follow up to #10363 (#10647) Daniel Hiltgen 2025-05-12 15:23:31 -0700
  • 9163ed39d1
    llama: update to commit de4c07f93 (#10655) Jeffrey Morgan 2025-05-12 12:17:26 -0700
  • 5b54f682ed
    convert: quantize from safetensors needs kv (#10675) Bruce MacDonald 2025-05-12 12:04:20 -0700
  • 7085a3f89b
    feat: add trace log level (#10650) Michael Yang 2025-05-12 11:43:00 -0700
  • d69f623dd6
    readme: add UnityCodeLama to community integrations (#10665) HardCodeDev 2025-05-12 00:44:51 +0400
  • 87ad1fe2d2
    readme: add OllamaPlusPlus C++ library to community integrations (#10664) HardCodeDev 2025-05-12 00:40:41 +0400
  • 1791b68cc2
    llama: allocate grammar buffer based on schema length (#10649) frob 2025-05-10 20:57:30 +0200
  • 6faf548d3a
    envconfig: Remove no longer supported max vram var (#10623) frob 2025-05-10 20:31:04 +0200
  • d9cf336ade
    feat: add threshold to dump options (#10639) Michael Yang 2025-05-10 11:27:15 -0700
  • 69446104a8
    readme: add ojira to community integrations (#10648) AliAhmedNada 2025-05-10 20:36:40 +0300
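A listing in this shape can be regenerated from a checkout of the repository with plain `git log`. The exact flags below are an assumption about how this graph was produced, not taken from the source; `--abbrev=10` matches the 10-character hashes shown above, and the `format:` date string matches the `YYYY-MM-DD HH:MM:SS ±ZZZZ` timestamps:

```shell
# Print each commit as "  • <hash>" followed by an indented
# "<subject> <author> <date>" line, matching the layout above.
git log \
  --abbrev=10 \
  --format='  • %h%n    %s %an %ad' \
  --date=format:'%Y-%m-%d %H:%M:%S %z'
```

Piping the output through `head -n 100` (or using `git log -50`) limits the listing to a manageable window of recent history.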