Google is positioning itself to challenge OpenAI's dominance in AI.
Google has introduced its latest artificial intelligence model, Lumiere, a multimodal video generation tool capable of producing realistic 5-second-long videos.
Lumiere supports both text-to-video and image-to-video generation, using a Space-Time U-Net (STUNet) architecture to enhance the realism of motion in AI-generated videos.
Unlike existing models such as Runway Gen-2 and Pika 1.0, Lumiere has not been made public yet.
According to a preprint paper accompanying the release, Lumiere’s innovation lies in generating the entire video in a single process rather than combining still frames.
This approach allows for the simultaneous creation of both spatial (objects in the video) and temporal (movement within the video) aspects, resulting in a more natural perception of motion.
Lumiere generates 80 frames, compared with Stable Video Diffusion's 25, by applying down- and up-sampling in both space and time and by leveraging a pre-trained text-to-image diffusion model.
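To illustrate the idea of down- and up-sampling a video in both space and time, here is a minimal sketch in plain NumPy. This is not Lumiere's actual code; the array shapes and factor-of-two sampling are illustrative assumptions, showing only how a clip can be reduced to a coarser spatio-temporal resolution and then restored.

```python
import numpy as np

def downsample(video: np.ndarray, t: int = 2, s: int = 2) -> np.ndarray:
    """Keep every t-th frame and every s-th pixel in each spatial direction."""
    return video[::t, ::s, ::s]

def upsample(video: np.ndarray, t: int = 2, s: int = 2) -> np.ndarray:
    """Nearest-neighbour upsampling back to the finer resolution."""
    return video.repeat(t, axis=0).repeat(s, axis=1).repeat(s, axis=2)

# A toy 80-frame, 64x64-pixel greyscale clip (frames, height, width).
video = np.zeros((80, 64, 64))
coarse = downsample(video)    # shape (40, 32, 32): half the frames, half the resolution
restored = upsample(coarse)   # shape (80, 64, 64): back to the original shape
print(coarse.shape, restored.shape)
```

In a Space-Time U-Net, operations like these bracket the network's core layers, so the model processes the whole clip at a coarse spatio-temporal scale rather than frame by frame.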
Although Lumiere is not available for testing, its website showcases various videos created using the AI model, along with the corresponding text prompts and input images.
The tool can produce videos in different styles, create cinemagraphs that animate only a selected region of a still image, and perform inpainting, filling in masked-out portions of a video or image based on a prompt.
Google’s Lumiere competes with existing AI models such as Runway Gen-2 (launched in March 2023) and Pika Labs’ Pika 1.0, both of which are publicly accessible.
Pika can create 3-second-long videos (extendable by 4 more seconds), while Runway can generate videos up to 4 seconds long. Both models offer multimodal capabilities and support video editing.
(With inputs from agencies)