
Google Gemini Omni Video Model Leak: Is This Veo 4 or Google’s Next AI Video Agent?
Google may be preparing its next big move in AI video. A new name, Gemini Omni, has appeared in leaked screenshots of Gemini's video-generation UI, just days before Google I/O 2026, which officially runs May 19–20 and is expected to bring major AI announcements.
The leak matters because Google’s current public video model story is already strong: Veo 3.1 can generate high-fidelity short videos with native audio, support 720p, 1080p, and 4K outputs, and handle features like portrait video, video extension, first-and-last-frame generation, and multiple reference images.
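To make that feature surface concrete, a request to a model like Veo 3.1 typically bundles a prompt with output options such as resolution, aspect ratio, boundary frames, and reference images. The sketch below is purely illustrative: `build_video_request` and its parameter names are hypothetical stand-ins for the kinds of options listed above, not the actual Gemini API.

```python
# Hypothetical request builder illustrating the Veo 3.1 feature surface
# described above (resolutions, portrait output, first-and-last-frame
# generation, reference images). Not the real Gemini API.

RESOLUTIONS = {"720p", "1080p", "4k"}
ASPECT_RATIOS = {"16:9", "9:16"}  # landscape and portrait

def build_video_request(prompt, resolution="1080p", aspect_ratio="16:9",
                        first_frame=None, last_frame=None,
                        reference_images=()):
    """Assemble a video-generation request as a plain dict."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    request = {"prompt": prompt, "resolution": resolution,
               "aspect_ratio": aspect_ratio}
    if first_frame and last_frame:  # first-and-last-frame generation
        request["frames"] = {"first": first_frame, "last": last_frame}
    if reference_images:  # reference images for character/product consistency
        request["reference_images"] = list(reference_images)
    return request

# Portrait (9:16) ad with one reference image
req = build_video_request("A cinematic perfume ad", aspect_ratio="9:16",
                          reference_images=["bottle.png"])
```

The point of the sketch is that every capability in the list above shows up as just another request parameter, which is exactly the kind of surface a unifying layer like "Omni" could hide from the user.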
So why would Google need “Omni”?
That is the real question. Gemini Omni could be a new name for Veo inside Gemini. It could be the long-rumored Veo 4. Or it could be something more ambitious: a Gemini-native video agent that can generate, remix, edit, and orchestrate video projects through chat.
What Actually Leaked?
According to reports from TestingCatalog and 9to5Google, some users briefly saw a new Gemini model card or interface copy that read along the lines of: “Create with Gemini Omni”, with descriptions mentioning video remixing, direct chat editing, templates, and more.
Another earlier UI string reportedly said: “Start with an idea or try a template. Powered by Omni.” TestingCatalog noted that this appeared near references to “Toucan,” described as a current Gemini video-generation pathway powered by Veo.
That placement is important. If “Omni” were only an internal experiment, it might stay hidden in code. But visible UI copy usually suggests something closer to launch, A/B testing, or staged rollout.
Two videos said to be generated by Omni have also circulated.
Still, Google has not officially announced Gemini Omni at the time of writing. For now, it should be treated as a leak, not a confirmed product.
Is Gemini Omni Actually Veo 4?
The simplest theory is that Gemini Omni is Veo 4 under a new Gemini-first name.
Google already has a branding split: Gemini handles text, reasoning, chat, and multimodal interaction; Veo handles video generation; Imagen and Nano Banana cover image generation; Flow acts as the AI filmmaking workspace. That structure makes technical sense, but it can feel fragmented for everyday users.
A name like “Omni” would solve that. Instead of asking users to understand which model powers which feature, Google could say: open Gemini, describe your idea, edit by chat, and let the system choose the right video engine.
In this case, “Omni” may not be a totally separate model. It could be a product layer, a routing system, or a Gemini-branded interface powered by Veo 3.1, Veo 4, or multiple Veo variants behind the scenes.
This would also fit Google’s recent direction with Flow. In October 2025, Google described Veo 3.1 as bringing richer audio, more narrative control, stronger prompt adherence, and improved audiovisual quality, while Flow gained tools like reference-image “Ingredients to Video,” start/end-frame “Frames to Video,” Extend, Insert, and object removal.
Gemini Omni could be the next step: not just a better model, but a cleaner creative interface.
Or Is It a New AI Video Agent?
The more exciting theory is that Gemini Omni is not simply “Veo 4.” It may be an AI video agent.
Why? Because the leaked descriptions focus less on raw generation and more on workflow: remix videos, edit directly in chat, try templates, make changes through conversation. TestingCatalog also reported separate signs of an upcoming Agent Mode for Flow, where an assistant could plan scenes, discuss project changes, trigger generation workflows, manage creative tools, and update project state from a chat surface.
That sounds less like a normal text-to-video model and more like an AI director’s assistant.
A standard AI video generator answers one prompt:
“Generate a cinematic perfume ad in 16:9.”
An AI video agent understands a project:
“Create a luxury perfume campaign. Keep the same model, make three scenes, use a warm golden palette, add a product close-up, then give me a YouTube cover and a vertical ad version.”
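The contrast between the two can be sketched in code. A one-shot generator maps a single prompt to a single clip; an agent decomposes a brief into an ordered task list while carrying shared state (the same model, the same palette) across every output. Everything below is hypothetical and illustrative, not a real API.

```python
# Illustrative contrast: one-shot generation vs. an agent-style plan.
# All function and field names here are hypothetical.

def generate(prompt):
    """One-shot: a single prompt yields a single clip."""
    return f"clip({prompt})"

def plan_campaign(brief):
    """Agent-style: expand a project brief into ordered tasks that
    share state (character, palette) across every deliverable."""
    shared = {"character": brief["character"], "palette": brief["palette"]}
    tasks = [{"type": "scene", "index": i, **shared}
             for i in range(brief["scenes"])]
    tasks.append({"type": "close_up", "subject": brief["product"], **shared})
    tasks += [{"type": "deliverable", "format": f, **shared}
              for f in brief["deliverables"]]
    return tasks

# The perfume-campaign request above, expressed as a brief
tasks = plan_campaign({
    "character": "same model", "palette": "warm golden",
    "scenes": 3, "product": "perfume bottle",
    "deliverables": ["youtube_cover", "vertical_ad"],
})
```

The key design difference is the shared state: the agent guarantees that the palette and character carry through all six outputs, which a sequence of independent one-shot prompts cannot.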
This difference matters. The next competition in AI video will not only be about realism. It will be about control, consistency, editing, and workflow automation.
Why “Omni” Could Be Bigger Than Video Generation
The name “Omni” suggests something broader than video. In AI, “omni” usually points to a system that can work across multiple input and output types: text, image, video, audio, and maybe even project files.
If Gemini Omni is truly omni-modal, it could combine several tasks that currently require separate tools:
- Turn a script into a storyboard
- Generate images for character and product references
- Convert those references into video
- Add native sound, dialogue, or ambient audio
- Edit scenes through chat
- Replace objects or adjust visual style
- Extend clips into longer sequences
- Prepare different aspect ratios for TikTok, YouTube, Instagram, and ads
That would make Gemini Omni less like a single model and more like a creative operating system.
For creators, marketers, and AI video teams, that is the bigger shift. The value is not just “better pixels.” The value is reducing the number of steps between idea, asset, edit, and final campaign.
How Gemini Omni Could Compare With Veo 3.1, Seedance 2.0, and Kling V3.0
If Gemini Omni launches, it will enter a crowded race. Veo 3.1 is already strong in cinematic realism, native audio, and prompt adherence. Seedance 2.0 has become a major benchmark name for fast, high-quality motion. Kling remains popular for expressive motion and creator-friendly video outputs.
But Gemini Omni may compete on a different axis: agentic editing.
Early reports suggest Omni’s standout area may be editing rather than pure first-shot generation: reactions to its raw cinematic fidelity were reportedly mixed, but more positive on editing-style tasks such as object swapping and scene rewriting through chat.
That would make sense strategically. Many AI video models can create a beautiful first clip. Fewer can help creators revise it reliably.
For real production, revision is everything. A marketer rarely needs one random cool clip. They need consistent characters, brand-safe product shots, controllable camera movement, clean transitions, multiple formats, and quick changes when the creative director says, “Make it brighter, move the product closer, and change the ending.”
What This Means for AI Video Creators
If Gemini Omni is real, creators should watch three things.
First, chat-based editing. If users can revise video through natural language instead of regenerating from scratch, AI video will become much more practical for ads, social content, product demos, and cinematic shorts.
Second, template workflows. The leaked copy mentions templates. That could help casual users make videos faster, but it may also help brands create repeatable formats: product ads, UGC-style clips, app promos, fashion shots, avatar explainers, and music visuals.
Third, agentic production. If Omni connects with Flow Agent Mode, the workflow could move from “prompt one clip” to “manage a whole video project.” That is where AI video starts feeling less like a generator and more like a creative teammate.
Should You Wait for Gemini Omni?
Probably not.
Leaks are useful for understanding where the industry is going, but they are not a production plan. Google may announce Omni at I/O 2026, rename it, delay it, limit access, or roll it out only to certain plans or regions first.
For now, creators should keep building flexible workflows that can use multiple models. The smartest AI video workflow is not locked to one model. It combines the best model for each task: one for realism, one for motion, one for image consistency, one for editing, one for audio, and one for fast ad variations.
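One simple way to keep a workflow model-agnostic is a routing table that maps each task to whichever model currently does it best, so a new release means updating one entry rather than rebuilding the pipeline. The sketch below uses placeholder model names, not real product identifiers.

```python
# A minimal model-routing sketch: map each task type to whichever
# model currently does it best. Model names are placeholders, and
# the table is meant to be swapped out as new models ship.

ROUTES = {
    "realism": "model_a",
    "motion": "model_b",
    "image_consistency": "model_c",
    "editing": "model_d",
    "audio": "model_e",
    "ad_variations": "model_f",
}

def route(task, overrides=None):
    """Pick a model for a task, with per-project overrides so a new
    release only requires changing one entry."""
    table = {**ROUTES, **(overrides or {})}
    if task not in table:
        raise KeyError(f"no model registered for task: {task}")
    return table[task]

# If a new editing model ships, override just that route
best_editor = route("editing", overrides={"editing": "new_model"})
```

The design choice here is indirection: content briefs reference task types, never model names, so no creative asset is locked to any single vendor.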
That is also why all-in-one AI video platforms are becoming more important. A creator should not need to rebuild their workflow every time Google, OpenAI, ByteDance, Kuaishou, or another lab releases a new model.
FAQ
What is Gemini Omni?
Gemini Omni appears to be an unreleased Google video model or video-creation feature spotted in Gemini UI leaks. It has not been officially announced by Google yet.
Is Gemini Omni the same as Veo 4?
Not confirmed. It could be Veo 4, a new Gemini-branded wrapper for Veo, a separate video model, or a broader AI video agent.
When could Google announce Gemini Omni?
The most likely window is Google I/O 2026, which takes place May 19–20, but Google has not confirmed an Omni announcement.
What makes Gemini Omni important?
The leak suggests stronger chat-based video editing, remixing, templates, and possibly agentic video workflows, not just text-to-video generation.
Will Gemini Omni replace Veo 3.1?
It is too early to say. Veo 3.1 remains Google’s current officially documented state-of-the-art video generation model in the Gemini API.