* Add mla for flash attention
* Revert to using chunks

| Name |
|---|
| .. |
| fast |
| pooling |
| rope |
| attention.go |
| convolution.go |
| embedding.go |
| linear.go |
| normalization.go |