Compare commits
base: pali112:v0.1.18
...
compare: pali112:mattw/quantcontext
2 Commits
v0.1.18
...
mattw/quan
| Author | SHA1 | Message | Date |
|---|---|---|---|
| | fed3843be2 | update to resolve jmorganca comments (Signed-off-by: Matt Williams <m@technovangelist.com>) | |
| | 01d4047ed3 | add faq about quant and context (Signed-off-by: Matt Williams <m@technovangelist.com>) | |
1 changed files with 23 additions and 0 deletions
23
docs/faq.md
@@ -112,3 +112,26 @@ This can impact both installing Ollama, as well as downloading models.
Open `Control Panel > Network and Internet > View network status and tasks` and click on `Change adapter settings` on the left panel. Find the `vEthernet (WSL)` adapter, right click and select `Properties`.
Click on `Configure` and open the `Advanced` tab. Search through each of the properties until you find `Large Send Offload Version 2 (IPv4)` and `Large Send Offload Version 2 (IPv6)`. *Disable* both of these
properties.
|
||||
|
||||
## What does the q in the model tag mean? What is quantization?

When you pull a model without specifying a tag, Ollama actually pulls the q4_0 quantization of the model. You can verify this on the tags page: on https://ollama.ai/library/llama2/tags, the hash for the latest tag matches the hash for the 7b model.

Looking at the tags page for any model, you can see several quantization options available. Quantization is a method of compression that allows the model to fit in less space and thus use less RAM and VRAM on your machine.

At a high level, a model is made of an enormous collection of nodes that determine how to generate text. These nodes are connected at different levels by weights. The training process adjusts these weights so that the model outputs the right text as often as possible.

Most of the source models that we use start with weights that are 32-bit floating-point numbers. Those weights, together with another set of values called biases, make up the parameters. So a source model with 7 billion parameters has 7 billion 32-bit floating-point numbers, plus a description of all the nodes and more. At 4 bytes per parameter, that adds up to needing at least 28 gigabytes of memory if you choose to load one of those source models.
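
The 28 GB figure is just the parameter count multiplied by the size of each weight. A minimal sketch of that arithmetic follows; the helper name `weights_size_gb` is made up for this example, and it counts only the raw weights (no node descriptions or other overhead), using 1 GB = 10^9 bytes.

```python
def weights_size_gb(num_parameters: int, bits_per_weight: int) -> float:
    """Approximate size of the raw weights in gigabytes (1 GB = 10**9 bytes).

    Ignores metadata and any per-block overhead, so real files are larger.
    """
    total_bits = num_parameters * bits_per_weight
    return total_bits / 8 / 1e9


SEVEN_BILLION = 7_000_000_000

# Compare a few bit widths for a 7 billion parameter model.
for bits in (32, 16, 4):
    size = weights_size_gb(SEVEN_BILLION, bits)
    print(f"{bits:>2}-bit weights: {size:.1f} GB")
```

The same helper can be called with any parameter count to estimate how much memory the weights alone would need at a given bit width.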

Quantization turns those 32-bit floating-point weights into much smaller integers. The number next to the q indicates the bit size of the weights, so a q4 model has had its 32-bit floats converted into 4-bit integers. A 4-bit quantization takes up the space of 7 billion 4-bit integers (3.5 GB), plus a little overhead, which comes out to almost 4 gigabytes. There is some loss of information in going from 28 GB to 4 GB, but in most cases it isn't really noticeable. In fact, even the 2-bit quantization, which fits in less than 3 GB, can be very useful.
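
To make the idea of turning floats into small integers concrete, here is a deliberately simplified sketch. It is not the storage format used for the models in the library: the single shared scale, the rounding rule, and the function names `quantize_block` and `dequantize_block` are all assumptions chosen for illustration.

```python
def quantize_block(weights, bits=4):
    """Map floats to signed integers that fit in `bits` bits, sharing one scale.

    Simplified illustration only; real quantization formats differ in detail.
    """
    max_int = 2 ** (bits - 1) - 1              # 7 when bits == 4
    largest = max(abs(w) for w in weights)
    scale = largest / max_int if largest else 1.0
    quantized = [round(w / scale) for w in weights]
    return scale, quantized


def dequantize_block(scale, quantized):
    """Recover approximate floats from the integers and the shared scale."""
    return [q * scale for q in quantized]


# Hypothetical weights, standing in for a small slice of a model.
weights = [0.12, -0.53, 0.98, -0.07, 0.31, -0.88, 0.45, 0.02]

scale, quantized = quantize_block(weights, bits=4)
restored = dequantize_block(scale, quantized)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("integers:", quantized)
print(f"largest rounding error: {max_error:.3f}")
```

Each restored value differs from the original by at most half of the scale, which is the "loss of information" the paragraph above refers to.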

There are three major sets of quantizations you will see in the Ollama Library of models: **fp16**, models with just a q and a number, like **q4_0**, and models with a **K** in the tag. The **fp16** model has been converted from the source 32-bit weights to 16-bit. It is about half the size of the 32-bit source model and is the largest quantization we deliver in the library. The **q4_0**, **q4_1**, **q5_0**, etc. models use the two original quantization methods.

The models with a **K** are often referred to as K Quants. This method produces models of similar quality that are smaller than those made with the original methods. Essentially, it finds clusters of weights and quantizes them together, allowing for higher precision at the same bit sizes as the regular quantization options. However, the model then needs a set of maps to recover the original values, and these have a computational cost. You may see some impact on the speed of models with K quants compared to the regular quantizations.
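
The three families can usually be told apart from the tag text alone. The sketch below classifies a tag string; the function name `quantization_family` and the sample tags are illustrative assumptions based on the naming patterns described above, not an official parser.

```python
import re


def quantization_family(tag: str) -> str:
    """Guess the quantization family from a model tag.

    Returns "fp16", "<n>-bit original", "<n>-bit K quant", or "unknown"
    when the tag does not name a quantization (for example "latest").
    """
    lowered = tag.lower()
    if "fp16" in lowered:
        return "fp16"
    # A "q", the bit size, an underscore, then either "k" or a digit.
    match = re.search(r"q(\d)_(k|\d)", lowered)
    if match is None:
        return "unknown"
    bits, kind = match.groups()
    family = "K quant" if kind == "k" else "original"
    return f"{bits}-bit {family}"


# Sample tags written in the style seen on a tags page (assumed examples).
for tag in ("7b-q4_0", "7b-chat-q4_K_M", "13b-fp16", "latest"):
    print(f"{tag}: {quantization_family(tag)}")
```

A tag with no quantization in its name is reported as unknown here, even though such a tag may still resolve to a quantized model.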

## What is context, can I increase it, and why doesn't every model support a huge context?

Context refers to the size of the input you can send to a model and still get sensible output back. Many models have a context size of 2048 tokens. It's sometimes possible to give a model more using the **num_ctx** parameter, but the answers start to degrade. This is because half of the context is "freed up" to allow for more memory. Newer models have been able to increase the context size using different methods. This increase in context size results in a corresponding increase in the memory required, sometimes by orders of magnitude.
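
One way to set **num_ctx** is through the `options` field of a request to the generate endpoint. The sketch below only builds and prints such a request; the server address is the default local one, and the model name, prompt, and helper name `build_generate_request` are placeholders chosen for this example.

```python
import json
import urllib.request

# Assumption: a local Ollama server listening on its default address.
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, num_ctx: int) -> urllib.request.Request:
    """Build a POST request that asks for a specific context size."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }
    return urllib.request.Request(
        OLLAMA_GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request("llama2", "Why is the sky blue?", num_ctx=4096)
print(request.data.decode("utf-8"))

# Sending the request needs a running server with the model already pulled:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read())["response"])
```

Keep the warning below in mind when choosing a value: asking for more context than the model was built for can hurt quality and increases memory use.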

> [!WARNING]
> Currently, over-allocating context size may result in model quality or stability issues.