ollama

Grace fbd82ba5bb Grace/deepseek v3 migration (#12385)	2025-09-24 15:19:47 -07:00

* init deepseek model file
* temp removal of flash attention implementation
* shapes and proper, can make a pass
* query, key, value have good cosine similarity, but the max diff is a bit high
* Attention block is working! ** with eager for now, have not added the mask line
* Attention block is working! ** with eager for now, have not added the mask line
* working MoE at around 0.95 cosine sim
* added cosine similarity function
* Starting end to end structure
* Trying (and failing) to get rope to work, going to test full thing on tater
* running on tater36... just not the right outputs
* we have the right values for rope... but its still not working?
* chnage Extrapolation Factor to 1
* removed adding residuals twice, removed normalization from shared expert, refactored Norms (Attention, MLP) to be outside the (Attention, MLP) blocks and in the Transformer block instead, add cache setLayer
* Temporary modelfiles for cpu
* change kpass intermediate step to kv, two layer outputs [0,1] look fine
* this calls for 16 chicken nuggets
* whoops
* cleaning up code
* delete stuff we dont need
* getting rid of debug statements for llama cpp
* working with long contexts
* fix long context view error
* reverting some changes I made for files that are not apart of pr
* Added proper tokenizer for deeepseek3
* clean up model and go test
* remove Modelfile
* not passing the tests
* whoops
* how to pass the ci tests
* resolving some of the comments
* rename
* linted and renamed deepseek3 -> deepseek2
* remove name go
* addressed changes - main change was adopting qwen3 naming scheme
* I cannot with linters
* clean up logs
* clean up logs

---------

Co-authored-by: Grace Guo <graceguo@Graces-MBP.localdomain>
Co-authored-by: Grace Guo <graceguo@Graces-MacBook-Pro.local>
Co-authored-by: graceguo <graceguo@tater36.localdomain>
..
imageproc	imageproc mllama refactor (#7537)	2024-12-14 19:50:15 -08:00
input	batch: use tensors for outputs (#12185)	2025-09-15 14:33:06 -07:00
models	Grace/deepseek v3 migration (#12385)	2025-09-24 15:19:47 -07:00
parsers	Merge pull request #12339 from ollama/drifkin/harmony-refactor-to-builtin	2025-09-22 13:13:40 -07:00
renderers	address comments	2025-09-15 11:46:25 -07:00
testdata	gemma2 impl	2025-03-11 14:35:08 -07:00
bytepairencoding.go	multi-regexp pretokenizer (#12325)	2025-09-23 13:21:47 -07:00
bytepairencoding_test.go	multi-regexp pretokenizer (#12325)	2025-09-23 13:21:47 -07:00
model.go	fix: leaf alt name (#12390)	2025-09-23 17:50:53 -07:00
model_test.go	fix: leaf alt name (#12390)	2025-09-23 17:50:53 -07:00
sentencepiece.go	model: implement bert in ollama engine (#9080)	2025-09-15 15:35:59 -07:00
sentencepiece_test.go	model: implement bert in ollama engine (#9080)	2025-09-15 15:35:59 -07:00
textprocessor.go	model: handle multiple eos tokens (#10577)	2025-05-16 13:40:23 -07:00
vocabulary.go	embedding gemma model (#12181)	2025-09-04 09:09:07 -07:00
vocabulary_test.go	model: treat 'user defined' tokens as special tokens (#11077)	2025-06-16 16:03:16 -07:00
wordpiece.go	model: implement bert in ollama engine (#9080)	2025-09-15 15:35:59 -07:00
wordpiece_test.go	model: implement bert in ollama engine (#9080)	2025-09-15 15:35:59 -07:00