Runway’s latest AI video generator brings giant cotton candy monsters to life

On Sunday, Runway announced a new AI video synthesis model called Gen-3 Alpha. It's still under development, but it appears to create video of similar quality to OpenAI's Sora, which debuted earlier this year (and has also not yet been released). It can generate novel, high-definition video from text prompts, with subjects that range from realistic humans to surrealistic monsters stomping the countryside.

Unlike Runway's previous best model from June 2023, which could only create two-second-long clips, Gen-3 Alpha can reportedly create 10-second-long video segments of people, places, and things that have a consistency and coherency that easily surpasses Gen-2. If 10 seconds sounds short compared to Sora's full minute of video, consider that the company is working with a shoestring budget of compute compared to more lavishly funded OpenAI—and actually has a history of shipping video generation capability to commercial users.

Gen-3 Alpha does not generate audio to accompany the video clips, and it's highly likely that temporally coherent generations (those that keep a character consistent over time) are dependent on similar high-quality training material. But Runway's improvement in visual fidelity over the past year is difficult to ignore.

AI video heats up

It's been a busy couple of weeks for video synthesis in the AI research community, including the launch of the Chinese model Kling, created by Beijing-based Kuaishou Technology (sometimes called "Kwai"). Kling can generate two minutes of 1080p HD video at 30 frames per second with a level of detail and coherency that reportedly matches Sora.

Gen-3 Alpha prompt: "Subtle reflections of a woman on the window of a train moving at hyper-speed in a Japanese city."

Not long after Kling debuted, people on social media began creating surreal AI videos using Luma AI's Luma Dream Machine. These videos were novel and weird but generally lacked coherency; we tested out Dream Machine and were not impressed by anything we saw.

Meanwhile, one of the original text-to-video pioneers, New York City-based Runway—founded in 2018—recently found itself the butt of memes that showed its Gen-2 tech falling out of favor compared to newer video synthesis models. That may have spurred the announcement of Gen-3 Alpha.

Gen-3 Alpha prompt: "An astronaut running through an alley in Rio de Janeiro."

Generating realistic humans has always been tricky for video synthesis models, so Runway specifically shows off Gen-3 Alpha's ability to create what its developers call "expressive" human characters with a range of actions, gestures, and emotions. The company's provided examples aren't particularly expressive (mostly people just slowly staring and blinking), but they do look realistic.

Provided human examples include generated videos of a woman on a train, an astronaut running through a street, a man with his face lit by the glow of a TV set, a woman driving a car, and a woman running, among others.

Gen-3 Alpha prompt: "A close-up shot of a young woman driving a car, looking thoughtful, blurred green forest visible through the rainy car window."

The generated demo videos also include more surreal video synthesis examples, including a giant creature walking in a rundown city, a man made of rocks walking in a forest, and the giant cotton candy monster seen below, which is probably the best video on the entire page.

Gen-3 Alpha prompt: "A giant humanoid, made of fluffy blue cotton candy, stomping on the ground, and roaring to the sky, clear blue sky behind them."

Gen-3 will power various Runway AI editing tools (one of the company's most notable claims to fame), including Multi Motion Brush, Advanced Camera Controls, and Director Mode. It can create videos from text or image prompts.

Runway says that Gen-3 Alpha is the first in a series of models trained on a new infrastructure designed for large-scale multimodal training, taking a step toward the development of what it calls "General World Models," which are hypothetical AI systems that build internal representations of environments and use them to simulate future events within those environments.

A few limitations

While these demos look fun at first glance, it's worth mentioning a few drawbacks of an announcement like this. Since Gen-3 is not yet public and we do not have access, we have not had the chance to evaluate it. That means that even if you take Runway's stated claim ("All of the videos on this page were generated with Gen-3 Alpha with no modifications") at face value, the videos were very likely cherry-picked to show the model's best results.

Also, all image and video synthesis models require large datasets of existing images or video, usually either culled from sources found online without permission or licensed from rights holders. Runway has not said where it obtained the data used to train Gen-3, but it says the model was trained on both videos and still images.

That said, taken at face value, the demo videos appear impressive and state-of-the-art (an ever-moving target) for video synthesis. If the tech keeps getting better over the next few years, it's likely that video synthesis clips will eventually find their way into professional video projects somehow.

Gen-3 Alpha prompt: "A man made of rocks walking in the forest, full-body shot."

While media has never accurately captured reality, photorealistic video was, for a long time, largely anchored to real objects and situations (barring expensive special effects and CGI departments). If a fine enough measure of generational control is achieved, AI video tech stands poised to bring that big-budget capability to low-budget video productions, which may dramatically lower the cost of filmmaking in the future. But with some entertainment industry jobs potentially at stake—including visual effects teams, actors, and set designers—we expect to see struggle and backlash along the way.

As mentioned, Gen-3 Alpha is not yet available to the public, but the company offers an inquiry sign-up for commercial entities who might want to fine-tune the model for future commercial use. Runway says that Gen-3's release, whenever it comes, will be accompanied by content safeguards, such as an in-house visual moderation system and C2PA provenance standards.

A recap of AI video synthesis on Ars Technica

Since 2022, we've covered a number of AI video synthesis models. We've also missed a few notable projects, such as Phenaki (mentioned briefly in one piece), Runway's Gen-1, Pika (mentioned in a roundup syndicated from FT), Luma Dream Machine, and Kling (both mentioned above). To provide a brief rundown of where the technology has been so far, here's a list of related Ars Technica articles. This is as much for our benefit as it is for yours because it's sometimes difficult to keep all of these AI video models straight.

9/9/2022 - Runway teases AI-powered text-to-video editing using written prompts
9/29/2022 - Meta announces Make-A-Video, which generates video from text [Make-A-Video]
10/5/2022 - Google’s newest AI generator creates HD video from text prompts [Imagen Video]
3/30/2023 - AI-generated video of Will Smith eating spaghetti astounds with terrible beauty [ModelScope]
4/17/2023 - Adobe teases generative AI video tools [Firefly Video]
5/2/2023 - AI-generated beer commercial contains joyful monstrosities, goes viral [Gen-2]
11/27/2023 - New “Stable Video Diffusion” AI model can animate any still image [Stable Video Diffusion]
12/15/2023 - These AI-generated news anchors are freaking me out [Channel 1]
1/24/2024 - Google’s latest AI video generator can render cute animals in implausible situations [Lumiere]
2/16/2024 - OpenAI collapses media reality with Sora, a photorealistic AI video generator [Sora]
2/20/2024 - Will Smith parodies viral AI-generated video by actually eating spaghetti
2/23/2024 - Tyler Perry puts $800 million studio expansion on hold because of OpenAI’s Sora
5/15/2024 - Google unveils Veo, a high-definition AI video generator that may rival Sora [Veo]

Even a cursory look at the progress since the earliest models above shows that AI video synthesis technology is steadily on the move, and further gains in capability are likely limited only by available compute and the supply of high-quality training data. We'll keep you posted.

Listing image: Runway

Benj Edwards Senior AI Reporter

Benj Edwards is Ars Technica's Senior AI Reporter and founded the site's dedicated AI beat in 2022. He's also a widely cited tech historian. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.
