AI video generation is an increasingly mainstream feature of AI chatbots. Gemini has the Flow filmmaker tool, Veo 3 video generation model, and Whisk AI animator. One of Gemini’s standout features is its ability to generate videos with audio, which simply isn’t possible with ChatGPT’s Sora video generation. However, video generation with audio is a feature exclusive to Veo 3, which is currently available only to AI Ultra subscribers.
Veo 3 is truly a leap forward in AI video generation, and it can give you some genuinely breathtaking results. But are Google’s Veo 3 demo reels and viral Veo 3 clips on social media indicative of Veo 3’s actual performance, or are they carefully selected outliers that give the wrong impression? To find out, I gave Veo 3, alongside ChatGPT’s Sora, a series of prompts. I started by asking for a video of “Somebody going about their daily life in a trendy apartment with rustic decor.”
Gemini’s apartment looks excellent, but there are clear issues. For example, the person in the clip seems to be holding pieces of fruit in both hands before one disappears, and the audio mix is oddly loud. Sora fared even worse, showing somebody squatting beside a chair instead of sitting on it. Gemini’s result is more impressive, but it’s hard to call either good.
To test how chatbot videos handle complex motion, I gave Gemini and ChatGPT the following prompt: “Show me a pro Rubik’s Cube solver solving a cube.” Results are, once again, mixed. The person in Gemini’s video looks great, and the audio is serviceable. Gemini handles fingers and hands well, too, which is historically a tough thing for a chatbot to do. I also appreciate the end of the video where the camera pans up before cutting off, simulating a person stopping a selfie camera recording. However, the actual Rubik’s Cube solving doesn’t look quite right. ChatGPT also struggles with the cube, heavily distorting it.
My final test evaluates text generation. I gave Gemini and ChatGPT the following prompt: “Generate me a video of a teacher in front of a class writing down y = mx+b on a whiteboard while explaining the concept.” Gemini’s video looks and sounds excellent, especially the voice of the teacher. However, it failed to write the equation I asked for, and the nonsensical text on the whiteboard and strange silence at the end put it firmly in the uncanny valley. ChatGPT’s video struggles with distortion on the teacher’s mouth, and again, its text was nonsense. Gemini is the clear winner, but it didn’t manage a compelling result.
As mentioned, Gemini’s Veo 3 model can generate some truly amazing things. However, as evidenced by my testing, getting there requires some careful prompt calibration across multiple generations. This isn’t necessarily a big deal until you factor in the cost of the AI Ultra plan and the fact that each generation requires 150 credits (you get only 12,500 credits per month with said plan). Credits quickly disappear when it takes five, 10, or more generations of each prompt to get something you like.
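To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. The credit figures are the ones quoted above; the function names are hypothetical, and the sketch assumes every attempt costs the same 150 credits and that you spend credits on nothing else.

```python
# Figures quoted in the article: 150 credits per generation,
# 12,500 credits per month on the AI Ultra plan.
CREDITS_PER_GENERATION = 150
MONTHLY_CREDITS = 12_500


def max_generations(monthly_credits: int = MONTHLY_CREDITS,
                    cost: int = CREDITS_PER_GENERATION) -> int:
    """Whole generations the monthly allotment covers (leftover credits are ignored)."""
    return monthly_credits // cost


def usable_clips(attempts_per_clip: int,
                 monthly_credits: int = MONTHLY_CREDITS,
                 cost: int = CREDITS_PER_GENERATION) -> int:
    """Finished clips per month if each one takes a fixed number of attempts.

    Assumption for illustration only: every prompt needs the same number of tries.
    """
    if attempts_per_clip < 1:
        raise ValueError("attempts_per_clip must be at least 1")
    return max_generations(monthly_credits, cost) // attempts_per_clip


if __name__ == "__main__":
    print(f"Generations per month: {max_generations()}")
    for attempts in (1, 5, 10):
        print(f"{attempts} attempt(s) per clip: {usable_clips(attempts)} clips")
```

By this math, the monthly allotment covers 83 generations. That works out to about 16 finished clips at five attempts each, or just eight at 10 attempts each.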
Flow is another essential component of Gemini’s video generation. Flow allows you to trim video clips you generate and even extend clips based on a new prompt. Continuity between the clip you start with and the extension is generally good in my experience, but extensions suffer from the same issues I mentioned above. That said, with enough credits, you could conceivably make a movie entirely with Flow, something no other video generator can currently do.
You also get access to Whisk, Google’s experimental AI animation tool. It lets you upload pictures of a scene, style, and subject, and then accepts a prompt. Once it generates an image you like, you can use Whisk to animate it via another prompt. I uploaded a picture of myself, my desk area, and a still from an anime to Whisk for testing.
(Credit: Google/PCMag)
As with a good Snapchat filter, it’s amusing to see myself as an animated character. But, as an AI image, this one has obvious errors and distortions, so it doesn’t impress on a technical level. When I asked Whisk to animate this image by having me turn around and work on the computer, the results were similarly awkward and uncanny. I don’t recommend using Whisk for anything beyond creating something strange to show your friends.