ollama/model
Grace fbd82ba5bb
Grace/deepseek v3 migration (#12385)
* init deepseek model file

* temp removal of flash attention implementation

* shapes and proper, can make a pass

* query, key, value have good cosine similarity, but the max diff is a bit high

* Attention block is working! ** with eager for now, have not added the mask line

* working MoE at around 0.95 cosine sim

* added cosine similarity function

* Starting end to end structure

* Trying (and failing) to get rope to work, going to test full thing on tater

* running on tater36... just not the right outputs

* we have the right values for rope... but it's still not working?

* change Extrapolation Factor to 1

* removed adding residuals twice, removed normalization from shared expert, refactored Norms (Attention, MLP) to be outside the (Attention, MLP) blocks and in the Transformer block instead, add cache setLayer

* Temporary modelfiles for cpu

* change kpass intermediate step to kv, two layer outputs [0,1] look fine

* this calls for 16 chicken nuggets

* whoops

* cleaning up code

* delete stuff we don't need

* getting rid of debug statements for llama.cpp

* working with long contexts

* fix long context view error

* reverting some changes I made to files that are not part of the PR

* Added proper tokenizer for deepseek3

* clean up model and go test

* remove Modelfile

* not passing the tests

* whoops

* how to pass the ci tests

* resolving some of the comments

* rename

* linted and renamed deepseek3 -> deepseek2

* remove name go

* addressed changes - main change was adopting qwen3 naming scheme

* I cannot with linters

* clean up logs

---------

Co-authored-by: Grace Guo <graceguo@Graces-MBP.localdomain>
Co-authored-by: Grace Guo <graceguo@Graces-MacBook-Pro.local>
Co-authored-by: graceguo <graceguo@tater36.localdomain>
2025-09-24 15:19:47 -07:00
| Name | Last commit | Date |
| --- | --- | --- |
| imageproc | imageproc mllama refactor (#7537) | 2024-12-14 19:50:15 -08:00 |
| input | batch: use tensors for outputs (#12185) | 2025-09-15 14:33:06 -07:00 |
| models | Grace/deepseek v3 migration (#12385) | 2025-09-24 15:19:47 -07:00 |
| parsers | Merge pull request #12339 from ollama/drifkin/harmony-refactor-to-builtin | 2025-09-22 13:13:40 -07:00 |
| renderers | address comments | 2025-09-15 11:46:25 -07:00 |
| testdata | gemma2 impl | 2025-03-11 14:35:08 -07:00 |
| bytepairencoding.go | multi-regexp pretokenizer (#12325) | 2025-09-23 13:21:47 -07:00 |
| bytepairencoding_test.go | multi-regexp pretokenizer (#12325) | 2025-09-23 13:21:47 -07:00 |
| model.go | fix: leaf alt name (#12390) | 2025-09-23 17:50:53 -07:00 |
| model_test.go | fix: leaf alt name (#12390) | 2025-09-23 17:50:53 -07:00 |
| sentencepiece.go | model: implement bert in ollama engine (#9080) | 2025-09-15 15:35:59 -07:00 |
| sentencepiece_test.go | model: implement bert in ollama engine (#9080) | 2025-09-15 15:35:59 -07:00 |
| textprocessor.go | model: handle multiple eos tokens (#10577) | 2025-05-16 13:40:23 -07:00 |
| vocabulary.go | embedding gemma model (#12181) | 2025-09-04 09:09:07 -07:00 |
| vocabulary_test.go | model: treat 'user defined' tokens as special tokens (#11077) | 2025-06-16 16:03:16 -07:00 |
| wordpiece.go | model: implement bert in ollama engine (#9080) | 2025-09-15 15:35:59 -07:00 |
| wordpiece_test.go | model: implement bert in ollama engine (#9080) | 2025-09-15 15:35:59 -07:00 |