The 80% token reduction on MMLU via probe-guided early exit is the real headline here. If reasoning models are confident in their answers well before the chain of thought finishes, there are massive efficiency gains on the table. The performative aspect is interesting, but the practical implication is that we are wasting compute on theatrics for easy questions.
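For anyone wondering what probe-guided early exit looks like mechanically: the usual recipe is a lightweight classifier trained on intermediate hidden states that, during CoT generation, signals when the answer is already determined so you can stop emitting reasoning tokens. A minimal sketch below, assuming a Hugging Face-style causal LM; the probe architecture, threshold, and hook points are illustrative assumptions, not the paper's actual implementation:

```python
import torch
import torch.nn as nn

class ConfidenceProbe(nn.Module):
    """Linear probe over a hidden state; outputs P(answer already determined)."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.linear = nn.Linear(hidden_dim, 1)

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.linear(hidden_state))

@torch.no_grad()
def generate_with_early_exit(model, tokenizer, prompt, probe,
                             threshold=0.9, max_new_tokens=512):
    """Greedy CoT generation that stops once the probe is confident.

    Assumes `model(input_ids, output_hidden_states=True)` exposes hidden
    states (true of Hugging Face causal LMs); the 0.9 threshold is an
    illustrative choice, not a value from the paper.
    """
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        out = model(input_ids, output_hidden_states=True)
        # Final layer, last token position: the state the probe was trained on.
        last_hidden = out.hidden_states[-1][:, -1, :]
        if probe(last_hidden).item() > threshold:
            break  # answer is effectively decided; skip the remaining CoT tokens
        next_token = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=-1)
        if next_token.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(input_ids[0], skip_special_tokens=True)
```

The per-token probe call is a single linear layer, so its overhead is negligible next to the transformer forward pass, which is why the savings from truncated CoT dominate.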