VentureBeat October 18, 2024
Just in time for Halloween 2024, Meta has unveiled Meta Spirit LM, the company’s first open-source multimodal language model capable of seamlessly integrating text and speech inputs and outputs.
As such, it competes directly with OpenAI’s GPT-4o (also natively multimodal) and other multimodal models such as Hume’s EVI 2, as well as dedicated text-to-speech and speech-to-text offerings such as ElevenLabs.
Designed by Meta’s Fundamental AI Research (FAIR) team, Spirit LM aims to address the limitations of existing AI voice experiences by offering a more expressive and natural-sounding speech generation, while learning tasks across modalities like automatic speech recognition (ASR), text-to-speech (TTS), and speech classification.
Unfortunately for entrepreneurs and business leaders, the model is only currently available for non-commercial usage under...