VentureBeat November 14, 2024
Advances in large language models (LLMs) have lowered the barriers to creating machine learning applications. With simple instructions and prompt engineering techniques, you can get an LLM to perform tasks that would have otherwise required training custom machine learning models. This is especially useful for companies that don’t have in-house machine learning talent and infrastructure, or product managers and software engineers who want to create their own AI-powered products.
However, the benefits of easy-to-use models are not without tradeoffs. Without a systematic approach to keeping track of the performance of LLMs in their applications, enterprises can end up getting mixed and unstable results.
Public benchmarks vs custom evals
The current popular way to evaluate LLMs is to measure their...