6 Commits

Author SHA1 Message Date
Devon Rifkin d20cd8df80 fix incorrect chat truncation
The dynamically calculated `NumCtx` value wasn't making it all the way
to the chat handler.

While making this fix, we noticed that the minimum `NumCtx` of 4 set
inside `server/sched.go` was accidentally removed in #10364. The minimum
doesn't make it out to the client code, which matters for embeddings,
as demonstrated in `TestAllMiniLMEmbedTruncate`. This should be cleaned
up further; the problem is probably caused by the start and end tokens
in the embedding, so small context sizes need some work there. See the
comment in `server/routes.go` for more information on the temporary hack
that's been added to propagate the dynamically calculated `NumCtx` (the
-1 guard there keeps embeddings working if you set `NumCtx` to a
small value like `1`).

Fixes: #10441
2025-04-28 16:11:36 -07:00
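
As a rough illustration of the minimum described in d20cd8df80, a lower bound on a context size can be sketched as below. This is not the code from `server/sched.go`; the names `minNumCtx` and `clampNumCtx` are hypothetical, and only the value 4 comes from the commit message.

```go
package main

// minNumCtx is the lower bound mentioned in the commit message.
// The constant name is an assumption made for this sketch.
const minNumCtx = 4

// clampNumCtx (hypothetical helper) raises n to minNumCtx when it is
// smaller, and returns n unchanged otherwise.
func clampNumCtx(n int) int {
	if n < minNumCtx {
		return minNumCtx
	}
	return n
}
```

With this sketch, a request that sets the context size to `1` would be served with a context of 4, while larger values pass through untouched.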
Daniel Hiltgen dc6fe82051
integration: harden embedding test (#7306)
Use cosine similarity to make the embeddings tests more robust
2024-10-22 15:25:22 -07:00
Daniel Hiltgen 90ca84172c
Fix embeddings memory corruption (#6467)
* Fix embeddings memory corruption

The patch was leading to a buffer overrun corruption.  Once removed though, parallelism
in server.cpp led to hitting an assert due to slot/seq IDs being >= token count.  To
work around this, only use slot 0 for embeddings.

* Fix embed integration test assumption

The token eval count has changed with recent llama.cpp bumps (0.3.5+)
2024-08-22 14:51:42 -07:00
royjhan 1b44d873e7
Add Metrics to `/api/embed` response (#5709)
* add prompt tokens to embed response

* rm slog

* metrics

* types

* prompt n

* clean up

* reset submodule

* update tests

* test name

* list metrics
2024-07-30 13:12:21 -07:00
royjhan ac33aa7d37
Fix Embed Test Flakes (#5893)
* float cmp

* increase tolerance
2024-07-24 11:15:46 -07:00
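
The "float cmp" and "increase tolerance" steps in ac33aa7d37 point at comparing floats within a tolerance instead of exactly. A minimal sketch of that idea follows; the helper name `floatsEqual` and its signature are assumptions, and the tolerance the real test uses is not stated in the log.

```go
package main

import "math"

// floatsEqual reports whether a and b have the same length and every
// pair of corresponding elements differs by at most tol.
func floatsEqual(a, b []float32, tol float64) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if math.Abs(float64(a[i])-float64(b[i])) > tol {
			return false
		}
	}
	return true
}
```

Widening `tol` makes such a test less flaky at the cost of catching smaller regressions, which is the trade-off an "increase tolerance" change accepts.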
royjhan b9f5e16c80
Introduce `/api/embed` endpoint supporting batch embedding (#5127)
* Initial Batch Embedding

* Revert "Initial Batch Embedding"

This reverts commit c22d54895a.

* Initial Draft

* mock up notes

* api/embed draft

* add server function

* check normalization

* clean up

* normalization

* playing around with truncate stuff

* Truncation

* Truncation

* move normalization to go

* Integration Test Template

* Truncation Integration Tests

* Clean up

* use float32

* move normalize

* move normalize test

* refactoring

* integration float32

* input handling and handler testing

* Refactoring of legacy and new

* clear comments

* merge conflicts

* touches

* embedding type 64

* merge conflicts

* fix hanging on single string

* refactoring

* test values

* set context length

* clean up

* testing clean up

* testing clean up

* remove function closure

* Revert "remove function closure"

This reverts commit 55d48c6ed1.

* remove function closure

* remove redundant error check

* clean up

* more clean up

* clean up
2024-07-15 12:14:24 -07:00