Forbes December 30, 2024
Ulrik Stig Hansen, Co-founder and President of Encord.
The AI industry has long been dominated by text-based large language models (LLMs), but the future lies beyond the written word. Multimodal AI represents the next major wave in artificial intelligence. Unlike text-only LLMs, multimodal AI platforms combine text, documents, images, audio and video into unified models designed to handle multiple data streams and produce more accurate outputs. It’s not merely about making larger models; it’s about designing ones that are more human-like.
These new multimodal systems can more accurately mimic human communication across a range of interactions. For instance, Meta’s Movie Gen now generates short videos from written prompts, while OpenAI’s Advanced Voice Mode enables real-time voice-based conversations. This is just the beginning. As...