Google on July 1 released Gemini Omni Flash in public preview, a new AI model that combines the company’s multimodal reasoning with native video generation and iterative, conversation-driven editing.
The model accepts text, images, and short video clips as input and generates up to 10 seconds of video per request. Its distinguishing feature is multi-turn, context-aware editing: after creating an initial clip, users can issue a sequence of natural-language instructions — adjusting the background, changing the lighting, or placing a logo — with each instruction applied to the running state of the video, preserving character, audio, and camera consistency across the whole chain.
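The editing flow described above can be pictured as a chain in which every instruction builds on the result of the previous one rather than on the original clip. The sketch below is a conceptual model only: `ClipState`, `generate`, and `apply_edit` are hypothetical names invented for illustration and are not part of the Gemini API, and no video is actually produced. The only figure taken from the announcement is the 10-second cap.

```python
from dataclasses import dataclass

MAX_DURATION_S = 10.0  # per-request cap reported in the announcement


@dataclass(frozen=True)
class ClipState:
    """Hypothetical stand-in for a generated clip; not a real Gemini API type."""
    prompt: str
    duration_s: float
    edits: tuple[str, ...] = ()


def generate(prompt: str, duration_s: float) -> ClipState:
    """Create the initial clip state, enforcing the per-request duration cap."""
    if not 0 < duration_s <= MAX_DURATION_S:
        raise ValueError(f"duration must be in (0, {MAX_DURATION_S}] seconds")
    return ClipState(prompt=prompt, duration_s=duration_s)


def apply_edit(state: ClipState, instruction: str) -> ClipState:
    """Return a new state that carries the earlier edits plus this one."""
    return ClipState(state.prompt, state.duration_s, state.edits + (instruction,))


# Illustrative session: one generation followed by three chained edits.
clip = generate("a short product teaser", 8)
for instruction in ("adjust the background", "change the lighting", "place a logo"):
    clip = apply_edit(clip, instruction)

print(clip.edits)
```

Because each call receives the accumulated state, the third instruction sees the effects of the first two, which is the behavior the announcement describes as context-aware editing.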
Available Across Google’s AI Platforms
Gemini Omni Flash is accessible through Google AI Studio, the Gemini API, the Gemini app, Google Flow, and the Gemini Enterprise Agent Platform. Google is targeting developers and businesses that need cost-efficient video generation integrated into existing workflows. All output is watermarked via SynthID and tagged with C2PA content credentials by default.
Pricing
At $0.10 per second of video output, a maximum-length 10-second clip costs $1.00, in line with Veo 3.1 Fast, Google’s other video generation offering.
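For budgeting, the per-second rate translates into a simple cost estimate. The following is a minimal sketch that assumes billing is strictly linear in output seconds, with no minimum charge, rounding, or separate input fees (none of which the announcement specifies). The function names are illustrative.

```python
from decimal import Decimal

PRICE_PER_SECOND = Decimal("0.10")  # USD per second of output, as announced
MAX_SECONDS = 10                    # per-request cap, as announced


def clip_cost(seconds: int) -> Decimal:
    """Estimated cost of one request, assuming linear per-second billing."""
    if not 0 < seconds <= MAX_SECONDS:
        raise ValueError(f"seconds must be between 1 and {MAX_SECONDS}")
    return PRICE_PER_SECOND * seconds


def batch_cost(seconds: int, requests: int) -> Decimal:
    """Estimated cost of several identical requests."""
    return clip_cost(seconds) * requests


print(clip_cost(10))      # 1.00
print(batch_cost(6, 50))  # 30.00
```

`Decimal` is used instead of `float` so that currency amounts stay exact when multiplied across many requests.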
Also Released: Nano Banana 2 Lite
Alongside Omni Flash, Google moved Nano Banana 2 Lite — its fast, lower-cost image-generation model — to general availability. It produces 1,024-pixel-resolution images in four seconds at $0.034 each, and is now live on consumer surfaces including Google Search AI Mode, the Gemini app, Google Photos, and NotebookLM.
Current Limitations
Video generation is capped at 10 seconds per request. Audio reference inputs, scene extension, and higher resolutions are listed as upcoming features. Google also notes that character consistency can be hard to maintain when scenes change significantly.