VentureBeat, October 16, 2024
Improving the ability of large language models (LLMs) to retrieve information from their prompts remains an area of active research, with direct implications for important applications such as retrieval-augmented generation (RAG) and in-context learning (ICL).
Microsoft Research and Tsinghua University researchers have introduced Differential Transformer (Diff Transformer), a new LLM architecture that improves performance by amplifying attention to relevant context while filtering out noise. Their findings, published in a research paper, show that Diff Transformer outperforms the classic Transformer architecture in various settings.
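The core mechanism described in the paper is differential attention: the model computes two separate softmax attention maps and uses their difference, scaled by a learnable factor, to weight the context, so that attention noise common to both maps cancels out. The single-head NumPy sketch below conveys the idea; the function name, weight matrices and fixed `lam` value are illustrative placeholders rather than the authors' implementation, which adds multi-head structure, normalization and a reparameterized lambda.

```python
import numpy as np

def softmax(scores, axis=-1):
    # Numerically stable softmax: each row becomes a probability distribution.
    exp = np.exp(scores - scores.max(axis=axis, keepdims=True))
    return exp / exp.sum(axis=axis, keepdims=True)

def diff_attention(X, Wq1, Wk1, Wq2, Wk2, Wv, lam=0.5):
    # Simplified differential attention: subtract a second softmax attention
    # map from the first, so weight that both maps place on irrelevant
    # tokens largely cancels out.
    d = Wq1.shape[1]
    A1 = softmax((X @ Wq1) @ (X @ Wk1).T / np.sqrt(d))
    A2 = softmax((X @ Wq2) @ (X @ Wk2).T / np.sqrt(d))
    return (A1 - lam * A2) @ (X @ Wv)  # lam is a learnable scalar in the paper
```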
Transformers and the “lost-in-the-middle” phenomenon
The Transformer architecture is the foundation of most modern LLMs. It uses an attention mechanism to weigh the importance of different parts of the input sequence when generating output. The attention mechanism employs the softmax function, which normalizes each token's attention scores over the input sequence into a probability distribution that sums to one.
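In code, that single-map softmax attention can be sketched in a few lines of NumPy (the shapes and names here are illustrative; production LLMs add multiple heads, causal masking and per-layer learned projections):

```python
import numpy as np

def softmax(scores, axis=-1):
    # The softmax step: each row of scores becomes a probability distribution.
    exp = np.exp(scores - scores.max(axis=axis, keepdims=True))
    return exp / exp.sum(axis=axis, keepdims=True)

def attention(X, Wq, Wk, Wv):
    # Classic scaled dot-product attention over a token sequence X of shape (n, d_model).
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # pairwise relevance scores
    return softmax(scores) @ V               # weighted mix of value vectors

# Toy usage: 6 tokens, model width 16, head width 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 16))
Wq, Wk, Wv = [rng.normal(size=(16, 8)) for _ in range(3)]
out = attention(X, Wq, Wk, Wv)  # shape (6, 8)
```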