Meta Introduces Spirit LM, an open-source model that combines text and speech inputs/outputs

VentureBeat October 18, 2024
Carl Franzen

Just in time for Halloween 2024, Meta has unveiled Meta Spirit LM, the company’s first open-source multimodal language model capable of seamlessly integrating text and speech inputs and outputs.

As such, it competes directly with OpenAI’s GPT-4o (also natively multimodal) and other multimodal models such as Hume’s EVI 2, as well as dedicated text-to-speech and speech-to-text offerings such as ElevenLabs.

Designed by Meta’s Fundamental AI Research (FAIR) team, Spirit LM aims to address the limitations of existing AI voice experiences by offering more expressive and natural-sounding speech generation, while learning tasks across modalities such as automatic speech recognition (ASR), text-to-speech (TTS), and speech classification.

Unfortunately for entrepreneurs and business leaders, the model is currently available only for non-commercial use under...

Topics: AI (Artificial Intelligence), Technology

2024-10-19T16:19:42-04:00
