Meta claims its AI improves speech recognition quality by reading lips
VentureBeat January 7, 2022
People perceive speech both by listening to it and by watching the lip movements of speakers. In fact, studies show that visual cues play a key role in language learning. By contrast, AI speech recognition systems are built mostly — or entirely — on audio. And they require a substantial amount of data to train, typically tens of thousands of hours of recordings.
To investigate whether visuals — specifically footage of mouth movements — can improve the performance of speech recognition systems, researchers at Meta (formerly Facebook) developed Audio-Visual Hidden Unit BERT (AV-HuBERT), a framework that learns to understand speech by both watching and hearing people speak. Meta claims that AV-HuBERT is 75% more accurate than the best...