VentureBeat, October 22, 2020
Amazon’s Alexa is getting better at recognizing who’s speaking and what they’re speaking about, understanding words through on-device techniques, and leveraging models trained without needing human review. That’s according to automatic speech recognition head Shehzad Mevawalla, who spoke with VentureBeat ahead of a keynote address at this year’s Interspeech conference.
Alexa is now running “full-capability” speech recognition on-device, after previously relying on models many gigabytes in size that required huge amounts of memory and ran on servers in the cloud. That change stems from a move to end-to-end models, Mevawalla said — AI models that take acoustic speech signals as input and directly output transcribed speech. Alexa’s previous speech recognizers had specialized components that processed inputs in sequence, such as an acoustic model, a pronunciation model, and a language model.
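To make the contrast concrete, here is a minimal, purely illustrative sketch of the end-to-end idea: a single learned function maps acoustic feature frames straight to character probabilities, rather than chaining separate acoustic, pronunciation, and language components. The alphabet, feature dimension, and random weights below are hypothetical stand-ins; a production system like Alexa's would use a trained neural architecture (e.g., an RNN-transducer), not this toy.

```python
import numpy as np

# Illustrative sketch only: a toy "end-to-end" recognizer that maps acoustic
# feature frames directly to a character string with one learned function,
# instead of running specialized components in sequence. Weights are random
# here; a real system would train them jointly on audio/transcript pairs.

ALPHABET = ["<blank>", "a", "b", "c"]  # tiny hypothetical character set
FEATURE_DIM = 13                        # e.g., 13 MFCC features per frame

rng = np.random.default_rng(0)
W = rng.standard_normal((FEATURE_DIM, len(ALPHABET)))  # stand-in "model"

def end_to_end_decode(frames: np.ndarray) -> str:
    """Map acoustic frames (T x FEATURE_DIM) straight to text."""
    logits = frames @ W              # one model: audio features -> scores
    best = logits.argmax(axis=1)     # greedy per-frame character choice
    # CTC-style collapse: merge repeated symbols, then drop the blank
    out, prev = [], None
    for idx in best:
        if idx != prev and ALPHABET[idx] != "<blank>":
            out.append(ALPHABET[idx])
        prev = idx
    return "".join(out)

frames = rng.standard_normal((20, FEATURE_DIM))  # 20 frames of fake audio
print(end_to_end_decode(frames))
```

Because the whole mapping lives in one compact function rather than several large, separately trained components, an end-to-end model of this shape is far easier to shrink to the memory budget of an on-device deployment.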