DeepMind’s Demis Hassabis Reveals Google’s Future Integration of Gemini and Veo AI Models
In a recent episode of the podcast Possible, co-hosted by LinkedIn co-founder Reid Hoffman, Google DeepMind CEO Demis Hassabis said Google plans to combine its Gemini AI models with its Veo video-generating models, with the aim of improving Gemini’s understanding of the physical world.
Vision for a Universal Digital Assistant
During the podcast, Hassabis described the vision behind the Gemini models: “We’ve always built Gemini to be multimodal from the beginning.” The aim is a universal digital assistant that can help users in real-world situations.
The Rise of Omni Models in AI
The AI industry is moving toward “omni” models: models that can understand and synthesize many forms of media. Recent developments include:
- Google’s Gemini Models: These models can now generate audio, images, and text.
- OpenAI’s ChatGPT: The default model has recently incorporated image creation, including styles reminiscent of Studio Ghibli.
- Amazon’s Innovations: Amazon plans to launch an “any-to-any” model later this year.
Data Sources for Training Omni Models
Training omni models requires large datasets of images, video, audio, and text. Hassabis indicated that the video data used to train Veo comes primarily from YouTube, which Google owns.
He explained, “Basically, by watching YouTube videos — a lot of YouTube videos — Veo 2 can figure out the physics of the world.” This method allows the AI to learn and adapt by analyzing real-world scenarios presented in videos.
Google’s Approach to YouTube Content
In a statement to TechCrunch, Google acknowledged that its AI models “may be” trained on “some” YouTube content, a practice governed by agreements with content creators. The company also broadened its terms of service last year, enabling access to more data for training its AI systems.
Combining the Gemini and Veo models is part of Google’s effort to build a more intuitive and capable digital assistant that can help users in their everyday lives.