Models can disable causal attention masking for all or part of their processing while still writing keys and values to the KV cache.
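As a rough illustration of the idea, the following pure-Python sketch shows single-head attention in which the causal mask is optional but the cache is written either way. The names `KVCache`, `attend`, and the `causal` flag are hypothetical and chosen for this example only; they do not correspond to any particular library's API.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class KVCache:
    """Hypothetical minimal cache: one key and one value vector per position."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, keys, values):
        self.keys.extend(keys)
        self.values.extend(values)


def attend(cache, queries, keys, values, causal=True):
    """Attend a batch of new tokens against the cache (illustrative only)."""
    start = len(cache.keys)      # position of the first new token
    cache.append(keys, values)   # the cache is written in both modes
    scale = math.sqrt(len(queries[0]))
    outputs = []
    for i, q in enumerate(queries):
        # Causal: the token at position start + i sees entries 0..start + i.
        # Non-causal: it sees every cached entry, including later tokens
        # from the same batch.
        visible = start + i + 1 if causal else len(cache.keys)
        ks = cache.keys[:visible]
        vs = cache.values[:visible]
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in ks]
        weights = softmax(scores)
        dim = len(vs[0])
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, vs)) for d in range(dim)]
        )
    return outputs


# Usage: the same three tokens, processed with and without the causal mask.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
causal_cache, full_cache = KVCache(), KVCache()
causal_out = attend(causal_cache, x, x, x, causal=True)
full_out = attend(full_cache, x, x, x, causal=False)
# Both caches now hold three entries; only the set of visible positions differed.
```

In this sketch the flag changes only which cached positions each query may read. The write to the cache is unconditional, so later decoding steps can reuse the stored keys and values regardless of which mode produced them.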