VentureBeat August 26, 2024
Given the high cost and slow pace of training large language models (LLMs), there is an ongoing discussion about whether spending more compute at inference time can improve the performance of LLMs without the need to retrain them.
In a new study, researchers at DeepMind and the University of California, Berkeley, explore ways to improve the performance of LLMs by strategically allocating compute resources during inference. Their findings, detailed in a new research paper, suggest that optimally allocating inference-time compute can yield substantial performance gains without larger models or extensive additional pre-training.
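One simple instance of spending extra compute at inference time is best-of-N sampling: draw several candidate answers from the model and keep the one a scorer or verifier rates highest. The sketch below is illustrative only, not code from the paper; `toy_sampler` and the identity scorer are stand-ins for a real model and a real verifier.

```python
import random

def best_of_n(prompt, n, sample_fn, score_fn):
    """Best-of-N sampling: draw n candidate answers (more samples
    means more inference compute) and return the one the scorer
    rates highest."""
    candidates = [sample_fn(prompt) for _ in range(n)]
    return max(candidates, key=score_fn)

# Toy stand-ins: a "model" that emits noisy numeric answers and a
# scorer that simply prefers larger values. Placeholders, not real APIs.
rng = random.Random(0)

def toy_sampler(prompt):
    return rng.gauss(0.0, 1.0)

identity = lambda x: x

# A larger sampling budget searches over more candidates, so the
# selected answer's score can only stay the same or improve in
# expectation as n grows.
answer = best_of_n("What is 2 + 2?", n=64,
                   sample_fn=toy_sampler, score_fn=identity)
print(answer)
```

In practice the scorer would be a learned verifier or reward model, and the sampler an actual LLM; the point is that the quality of the selected answer is controlled by the inference budget `n` rather than by model size.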
The tradeoff between inference-time and pre-training compute
The dominant approach to improving LLM performance has been to scale up model size...