VentureBeat, October 16, 2024
Improving the ability of large language models (LLMs) to retrieve information from their prompts remains an area of active research, with direct implications for important applications such as retrieval-augmented generation (RAG) and in-context learning (ICL).
Microsoft Research and Tsinghua University researchers have introduced Differential Transformer (Diff Transformer), a new LLM architecture that improves performance by amplifying attention to relevant context while filtering out noise. Their findings, published in a research paper, show that Diff Transformer outperforms the classic Transformer architecture in various settings.
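The core mechanism described in the paper is differential attention: the model computes two separate softmax attention maps and uses their difference, scaled by a learnable factor, to weight the context, so that attention noise common to both maps cancels out. The single-head NumPy sketch below conveys the idea; the function name, weight matrices and fixed `lam` value are illustrative placeholders rather than the authors' implementation, which adds multi-head structure, normalization and a reparameterized lambda.

```python
import numpy as np

def softmax(scores, axis=-1):
    # Numerically stable softmax: each row becomes a probability distribution.
    exp = np.exp(scores - scores.max(axis=axis, keepdims=True))
    return exp / exp.sum(axis=axis, keepdims=True)

def diff_attention(X, Wq1, Wk1, Wq2, Wk2, Wv, lam=0.5):
    # Simplified differential attention: subtract a second softmax attention
    # map from the first, so weight that both maps place on irrelevant
    # tokens largely cancels out.
    d = Wq1.shape[1]
    A1 = softmax((X @ Wq1) @ (X @ Wk1).T / np.sqrt(d))
    A2 = softmax((X @ Wq2) @ (X @ Wk2).T / np.sqrt(d))
    return (A1 - lam * A2) @ (X @ Wv)  # lam is a learnable scalar in the paper
```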
Transformers and the “lost-in-the-middle” phenomenon
The Transformer architecture is the foundation of most modern LLMs. It uses an attention mechanism to weigh the importance of different parts of the input sequence when generating output. The attention mechanism employs the softmax function, which normalizes each token's attention scores over the input sequence into a probability distribution that sums to one.
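In code, that single-map softmax attention can be sketched in a few lines of NumPy (the shapes and names here are illustrative; production LLMs add multiple heads, causal masking and per-layer learned projections):

```python
import numpy as np

def softmax(scores, axis=-1):
    # The softmax step: each row of scores becomes a probability distribution.
    exp = np.exp(scores - scores.max(axis=axis, keepdims=True))
    return exp / exp.sum(axis=axis, keepdims=True)

def attention(X, Wq, Wk, Wv):
    # Classic scaled dot-product attention over a token sequence X of shape (n, d_model).
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # pairwise relevance scores
    return softmax(scores) @ V               # weighted mix of value vectors

# Toy usage: 6 tokens, model width 16, head width 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 16))
Wq, Wk, Wv = [rng.normal(size=(16, 8)) for _ in range(3)]
out = attention(X, Wq, Wk, Wv)  # shape (6, 8)
```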