In my ongoing effort at MOCEAN to work with the latest creative AI tools, I hatched a new experiment: Could AI convert storyboards into a photo-real spot?
So I went back to a campaign that never got made, created for a long-time client. It featured a series of vignettes depicting the brand’s audience as unique and misunderstood, marching to the beat of their own drum, which made the brand the perfect home for them (even if it didn’t work out so well in the real world).
Check out one of the animatics from this campaign above, then read on for the step-by-step process I used to bring this to life as a cinematic, photo-real spot…
Part I: Storyboards To Photos
In this exercise, I wanted to test how various AI tools, all of which we’ve been using at MOCEAN, would fare when trying to interpret the intent of storyboards into plausibly real images. Almost all AI image tools offer the ability to include a visual reference as structural guidance for the generated output. So I put three tools to the test below.
Two failed terribly, but one came out victorious…
Attempt 01: Firefly (fail)
Adobe’s Firefly is the AI engine at the heart of its Creative Cloud apps. Here, I used the web interface to upload the first storyboard as a composition reference, then prompted it with the image details:

While it does a decent job mapping onto the structure of the image (and with non-human objects like trees), it’s not up to the task of making things look photographic at all. In fact, part of the problem is that it matches the sketch so exactly: it replicates the approximate nature of the drawing without any understanding of what a scene like this should look like in real life.
Attempt 02: Magnific (fail)
We’ve had good luck using Magnific’s Style Transfer to apply color and texture to sketches before, so I was ready to be impressed. But again, while it understood the shapes and objects in the sketch, it really had no understanding of the image itself (especially the people), creating some surreal and weird results:

With more complex frames that include people, combined with my cinematic photorealism target, it’s clear the AI needs a greater understanding of what it’s looking at to translate the overall intention of the storyboard.
Turns out, a new image generator just dropped last week that does exactly that…
Attempt 03: ChatGPT Image Generator (success!)
OpenAI just launched a new text-to-image generator as part of its GPT-4o model in ChatGPT. It’s a big deal for two reasons: 1. You can apply the conversational, natural-language, contextual understanding it has with text to creating images, which means… 2. It’s a much smarter image creator, resulting in a big leap in quality, accuracy, and complexity.
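For this test I worked in the ChatGPT window itself: upload the storyboard, describe the shot, iterate conversationally. But the same step can be scripted against OpenAI’s Images API. Here’s a rough sketch of what that might look like; the model name, parameters, and file paths are my assumptions rather than anything from my actual workflow, so treat it as a starting point:

```python
# Rough sketch (not my actual workflow): hand a storyboard frame to OpenAI's
# Images API as a visual reference and ask for a photo-real interpretation.
# The model name, size, and file paths below are assumptions -- check the docs.
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

prompt = (
    "Turn this storyboard sketch into a cinematic, photo-real photograph: "
    "mourners standing in light rain under umbrellas around a coffin, "
    "moody overcast day, cooler color palette, realistic people and expressions."
)

with open("storyboard_shot_01.png", "rb") as sketch:  # hypothetical file name
    result = client.images.edit(
        model="gpt-image-1",  # assumed name of the 4o-era image model in the API
        image=sketch,
        prompt=prompt,
        size="1536x1024",     # landscape output (assumption)
    )

# The API returns base64-encoded image data
with open("shot_01_photo.png", "wb") as out:
    out.write(base64.b64decode(result.data[0].b64_json))
```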
Below you’ll see the results inspired by the storyboards, but instead of the color-by-numbers approach of Firefly and Magnific, it captures the overall intention of the frame while creating something that makes sense as a real photograph. In essence, it looks past the approximate nature of the sketch to understand the whole.
What’s even more impressive? These outputs came after just one try…

Once this first shot was established, I included the same intro sentence before each prompt you see below, to ensure the same setting and aesthetic across shots:



The context of an ongoing chat meant it understood what the moody, overcast day and cooler color palette should be from shot to shot. I also called out when it was the same characters, which it handled okay but not great (though with repeated generations I likely could have dialed that in even further).
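That continuity comes for free from the chat’s running context. If you were scripting it instead, you’d have to approximate it yourself, say, by prepending the same scene-setting sentence to every shot prompt and passing the first finished frame back in as an extra reference image. A hedged sketch of that idea (again, an assumption about the API shape, not part of my actual process):

```python
# Sketch of faking chat-style continuity outside ChatGPT: prepend the same
# intro sentence to every shot prompt and include the established first frame
# as an extra reference image. Model name and multi-image support are assumed.
import base64
from openai import OpenAI

client = OpenAI()

INTRO = (
    "Same scene and aesthetic as before: a funeral on a moody, overcast day "
    "with light rain, cooler color palette, cinematic and photo-real. "
)

def generate_shot(storyboard_path: str, shot_prompt: str, out_path: str) -> None:
    """Generate one shot from its storyboard plus the already-approved first frame."""
    with open(storyboard_path, "rb") as sketch, open("shot_01_photo.png", "rb") as anchor:
        result = client.images.edit(
            model="gpt-image-1",     # assumed model name
            image=[sketch, anchor],  # storyboard + previous frame as references
            prompt=INTRO + shot_prompt,
        )
    with open(out_path, "wb") as out:
        out.write(base64.b64decode(result.data[0].b64_json))

# Hypothetical usage for the second frame
generate_shot(
    "storyboard_shot_02.png",
    "The man wearing orange floaties glances nervously at the other mourners.",
    "shot_02_photo.png",
)
```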
So, with quality photos in hand, it was time to make them move…
Part II: From Stills To Moving Shots
Just as I did with the image generations, I limited the image-to-video generations to one try, which, as anyone who’s stayed up late hitting the re-generate button to get what they want knows, is a tall order.
Here are the prompts I used across each of the three tools, paired with the images (a scripted sketch of the same step follows the list):
Shot 01: “Camera slowly pushes forward toward a crowd of people standing in the rain, looking sadly at the coffin as light rain comes down on their umbrellas. Cinematic, realistic.”
Shot 02: “We very slow push in toward the man wearing the orange floaties as he glances with a half-smile at everyone else. They stare at him, no one talking, all with frowns of anger. The man becomes a little nervous.”
Shot 03: “Close up shot that very slowly pushes in on an old woman slowly turning her head with a frown of anger as she looks left at someone off-screen.”
Shot 04: “The man wearing the orange floaties glances around as everyone else slowly walks away to the left, staring him down with frowns of hatred. He holds a nervous, subtle half-smile of embarrassment. His smile slowly changes to a frown.”
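I ran all three tools through their web apps, but image-to-video can also be scripted; Runway, for one, offers a developer API. The sketch below is my guess at what that step would look like with their Python SDK; the model string, parameters, and polling shape are assumptions, not something I used for this spot:

```python
# Hedged sketch (not what I ran): one image-to-video generation through
# Runway's developer API. SDK surface, model string, and parameters are
# assumptions from their docs -- verify before use.
import time

from runwayml import RunwayML

client = RunwayML()  # assumes the API key is set in the environment

task = client.image_to_video.create(
    model="gen4_turbo",                                     # assumed Gen-4 model name
    prompt_image="https://example.com/shot_01_photo.png",   # hypothetical URL to the still
    prompt_text=(
        "Camera slowly pushes forward toward a crowd of people standing in the "
        "rain, looking sadly at the coffin as light rain comes down on their "
        "umbrellas. Cinematic, realistic."
    ),
    ratio="1280:720",                                       # assumed output aspect ratio
    duration=5,                                             # assumed clip length in seconds
)

# Generation is asynchronous: poll the task until it finishes
while True:
    task = client.tasks.retrieve(task.id)
    if task.status in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(10)

print(task.status, task.output)  # on success, output holds URL(s) to the clip
```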
Kling AI (the clear winner)
In one pass for each shot, I got what I was looking for with performance, camera moves and visual coherence. Below are the raw outputs from Kling:
For shot two, the slow camera push-in happens after his head turn, so I trimmed that portion out in the final edit. In shot three, the grandmother turned to her own left instead of camera left, so I reversed the shot in the edit so she was turning toward our floaties guy. And for the final shot, I cropped in to hide the different shape of the floaties on his arms.
Runway’s Gen-4 (runner-up)
Runway just dropped its next model, Gen-4, which boasts better resolution and shot coherence. But with my strict rule of one generation, it was hard to get the camera movement, the performance, and simpler things like keeping anyone from talking. The opening shot is pretty good (though a little too fast), but note how all the people and umbrellas sit like still frames, whereas Kling gave them some life while standing. And the last shot? Not sure what was going on there:
Sora (um… not great)
I expected Sora to do better than it did at image-to-video, but it seemed to all but ignore the prompt, resulting in what look like outtakes from the shoot (including multiple people with orange floaties for arms):
The Finished Spot
I completed the spot in Adobe Premiere, cutting the new clips in over the storyboards and retaining the original music, voiceover, and rain sound effects. I also slightly tweaked brightness, contrast, and color balance from shot to shot to unify things a bit more. Lastly, I used Topaz Video AI to upres the final cut to 4K.
Given that the whole process took me less than 3 hours, including the false starts with Firefly and Magnific, I was surprised how well it turned out:
Final Thoughts
Is there still an uncanny valley to some of the shots? Yes. But I’d argue the shot of the grandmother is virtually indistinguishable from a real one. And remember, less than a year ago none of this was even close to being doable, so we shouldn’t underestimate the speed of the changes afoot. I could also have dialed in likeness and photorealism further had I allowed more generations and prompt tweaks beyond my one-try rule. There’s no doubt ChatGPT’s image generator represents next-level image creation.
But we can’t overlook the human contributions here: the concept, script, shot selection, character design, and performance direction. The tone of the humor feels uniquely human, balancing the slightly surreal with relatability, all communicated wordlessly in the nervousness of the guy and the angry looks of the other people (not to mention the human-made music and the unmistakable human performance of Tom Kane in the voiceover, especially the way he says, “But you meant well.”).
Will all these elements eventually be something AI can also do? Maybe, but then the question will be: why would we want it to? Completely automating human expression in the name of efficiency feels insane when you say it out loud. Yet my biggest concern is how much of the craft can already be shortcut by these AI systems, compressing more than a century of cinematic artistry from hundreds of thousands of people into a digital form that can be brought forth by merely typing a sentence.
This is the AI elephant in the room (one I wrote about in another post), and it needs to be addressed before these tools become widely adopted for commercial purposes. My hope is this will be resolved legally in the courts, but we should also acknowledge collectively, as creative humans, that it’s simply the right thing to do.
Credits
I worked with some of my favorite and most talented collaborators on this (and many other campaigns over the years), who are key to making anything worth watching, regardless of the tools we use to make it. Love you all!
Matt Hubball | CD/Writer
Troy Hutchinson | CD/Writer
Frank Dellafemina | Storyboard artist
Tom Kane | Voiceover artist
Lauren Counter | Producer
Alex Wong | Editor
Sanaz Lavaedian | Music supervisor
…and all the humans at MOCEAN