Creators of Sora-powered short explain AI-generated video’s strengths and limitations
OpenAI's video generation tool Sora took the AI community by surprise in February with fluid, realistic video that seems miles ahead of competitors. But the carefully stage-managed debut left out a lot of details, details that have now been filled in by a filmmaker given early access to create a short using Sora.
Shy Kids is a digital production team based in Toronto that was picked by OpenAI as one of a few to produce short films essentially for OpenAI promotional purposes, though they were given considerable creative freedom in creating "air head." In an interview with visual effects news outlet fxguide, post-production artist Patrick Cederberg described "actually using Sora" as part of his work.
Perhaps the most important takeaway for most is simply this: While OpenAI's post highlighting the shorts lets the reader assume they more or less emerged fully formed from Sora, the reality is that these were professional productions, complete with robust storyboarding, editing, color correction, and post work like rotoscoping and VFX. Just as Apple says "shot on iPhone" but doesn't show the studio setup, professional lighting, and color work after the fact, the Sora post only talks about what it lets people do, not how they actually did it.
Cederberg's interview is interesting and quite non-technical, so if you're interested at all, head over to fxguide and read it. But here are some interesting nuggets about using Sora that tell us that, as impressive as it is, the model is perhaps less of a giant leap forward than we thought.
Control is still the thing that is the most desirable and also the most elusive at this point. … The closest we could get was just being hyper-descriptive in our prompts. Explaining wardrobe for characters, as well as the type of balloon, was our way around consistency because shot to shot / generation to generation, there isn't the feature set in place yet for full control over consistency.
In other words, matters that are simple in traditional filmmaking, like choosing the color of a character's clothing, take elaborate workarounds and checks in a generative system, because each shot is created independently of the others. That could obviously change, but it is certainly much more laborious at the moment.
Sora outputs had to be watched for unwanted elements as well: Cederberg described how the model would routinely generate a face on the balloon that serves as the main character's head, or a string hanging down its front. These had to be removed in post, another time-consuming process, if they couldn't get the prompt to exclude them.
Precise timing and movements of characters or the camera aren't really possible: "There's a little bit of temporal control about where these different actions happen in the actual generation, but it's not precise … it's kind of a shot in the dark," said Cederberg.
For example, timing a gesture like a wave is a very approximate, suggestion-driven process, unlike manual animation. And a shot like a pan upward on the character's body may or may not reflect what the filmmaker wants, so in this case the team rendered a shot composed in portrait orientation and did a crop pan in post. The generated clips were also often in slow motion for no particular reason.
In fact, even everyday filmmaking terms like "panning right" or "tracking shot" produced inconsistent results in general, Cederberg said, which the team found pretty surprising.
"The researchers, before they approached artists to play with the tool, hadn't really been thinking like filmmakers," he said.
As a result, the team did hundreds of generations, each 10 to 20 seconds long, and ended up using only a handful. Cederberg estimated the ratio at 300:1, though of course we would probably all be surprised at the ratio on an ordinary shoot.
The team actually did a little behind-the-scenes video explaining some of the issues they ran into, if you're curious. Like a lot of AI-adjacent content, the comments are pretty critical of the whole endeavor, though not quite as vituperative as the AI-assisted ad we saw pilloried recently.
The last interesting wrinkle pertains to copyright: If you ask Sora to give you a "Star Wars" clip, it will refuse. And if you try to get around it with "robed man with a laser sword on a retro-futuristic spaceship," it will also refuse, as by some mechanism it recognizes what you're trying to do. It also refused to do an "Aronofsky type shot" or a "Hitchcock zoom."
On one hand, it makes perfect sense. But it does prompt the question: If Sora knows what these are, does that mean the model was trained on that content, the better to recognize that it is infringing? OpenAI, which keeps its training data cards close to the vest (to the point of absurdity, as with CTO Mira Murati's interview with Joanna Stern), will almost certainly never tell us.
As for Sora and its use in filmmaking, it's clearly a powerful and useful tool in its place, but its place is not "creating films out of whole cloth." Yet. As another villain once famously said, "that comes later."