VentureBeat November 13, 2024
One-bit large language models (LLMs) have emerged as a promising approach to making generative AI more accessible and affordable. By representing model weights with a very limited number of bits, 1-bit LLMs dramatically reduce the memory and computational resources required to run them.
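As a rough illustration of the memory claim, the sketch below estimates how much space a model's weights alone would occupy at different bit widths. The 7-billion-parameter model size and the function name `weight_memory_gib` are hypothetical choices made for this example, and the calculation ignores activations, caches, and any packing overhead.

```python
def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory needed to store the weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# Hypothetical model size, chosen only for illustration (not from the article).
NUM_PARAMS = 7_000_000_000

# Compare a 16-bit baseline with progressively narrower weight formats.
for bits in (16, 8, 2, 1):
    gib = weight_memory_gib(NUM_PARAMS, bits)
    print(f"{bits:>2} bits/weight: {gib:6.2f} GiB")
```

Because weight storage scales linearly with bit width, going from 16 bits to 1 bit per weight shrinks this part of the footprint by a factor of 16 under these assumptions.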
Microsoft Research has been pushing the boundaries of 1-bit LLMs with its BitNet architecture. In a new paper, the researchers introduce BitNet a4.8, a technique that further improves the efficiency of 1-bit LLMs without sacrificing their performance.
The rise of 1-bit LLMs
Traditional LLMs use 16-bit floating-point numbers (FP16) to represent their parameters. At two bytes per parameter, this demands substantial memory and compute, which limits the accessibility and deployment options for LLMs. One-bit LLMs address this challenge...