CUDA panics on batches larger than 1024, so skip those and fall back to CPU
This should be reverted once we update ggml past b6897