VentureBeat, October 24, 2024
Meta Platforms has created smaller versions of its Llama artificial intelligence models that can run on smartphones and tablets, opening new possibilities for AI beyond data centers.
The company announced compressed versions of its Llama 3.2 1B and 3B models today that run up to four times faster while using less than half the memory of the original models. These compressed models perform nearly as well as their full-precision counterparts, according to Meta's testing.
How Meta made large language models work on phones
The advance relies on a compression technique called quantization, which reduces the numerical precision of a model's weights and arithmetic, for example from 16-bit floating point down to 4-bit integers, cutting memory use and speeding up computation. Meta combined two methods: Quantization-Aware Training with LoRA adaptors (QLoRA) to preserve accuracy, and SpinQuant, a post-training approach, to improve portability.
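To illustrate the underlying idea, here is a minimal PyTorch sketch of symmetric 8-bit post-training quantization. This is not Meta's actual QLoRA or SpinQuant pipeline, which is considerably more involved; the function names and the 4,096-wide weight matrix are hypothetical, chosen only to show how precision is traded for memory.

```python
import torch

def quantize_int8(weights: torch.Tensor) -> tuple[torch.Tensor, float]:
    """Symmetric per-tensor int8 quantization: map float weights
    onto 256 integer levels, storing one float scale factor."""
    scale = weights.abs().max() / 127.0          # largest magnitude maps to +/-127
    q = torch.clamp(torch.round(weights / scale), -128, 127).to(torch.int8)
    return q, scale.item()

def dequantize(q: torch.Tensor, scale: float) -> torch.Tensor:
    """Recover approximate float weights for computation."""
    return q.to(torch.float32) * scale

# Example: a weight matrix shrinks from 4 bytes to 1 byte per parameter.
w = torch.randn(4096, 4096)                      # ~64 MB in float32
q, scale = quantize_int8(w)                      # ~16 MB in int8
error = (w - dequantize(q, scale)).abs().mean()
print(f"mean reconstruction error: {error:.6f}")
```

Quantization-aware training goes a step further: it simulates this rounding during training, so the model learns weights that tolerate the loss of precision rather than being quantized after the fact.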
This technical achievement...