Compare commits

..

2 Commits

Author SHA1 Message Date
Jesse Gross
d875e99e46 runner.go: Propagate panics back to the user.
This is a partial revert of 8a35bb92
"runner.go: Increase survivability of main processing loop", removing
the panic handler.

Although we want to avoid errors taking down the runner, we also
should make the user aware of problems when they happen. In the
future, we can restructure things so both parts are true.
2024-11-15 11:52:25 -08:00
Jesse Gross
8a35bb926e runner.go: Increase survivability of main processing loop
Currently, if an error occurs during the prep stages (such as
tokenizing) of a single request, it will only affect that request.
However, if an error happens during decoding, it can take down the
entire runner.

Instead, it's better to drop the tokens that triggered the error and try to
keep going. However, we also need to stop when we run out of tokens,
otherwise, this just causes an infinite loop. This is likely the cause
of at least some of the hanging issues that have been reported.

Bug #7573
2024-11-14 17:18:41 -08:00

View File

@@ -357,6 +357,14 @@ func (s *Server) processBatch(tokenBatch *llama.Batch, embedBatch *llama.Batch)
continue
}
// If an error occurred during the processing of a previous batch then we may have emptied the inputs
// without adding a new one. In this case, end the sequence rather than infinite looping.
if len(seq.inputs) == 0 {
slog.Error("removing sequence due to no input tokens", "index", seqIdx, "cache id", seq.cache.Id)
s.removeSequence(seqIdx, "error")
continue
}
// if past the num predict limit
if seq.numPredict > 0 && seq.numPredicted >= seq.numPredict {
s.removeSequence(seqIdx, "limit")