For years, I had the same problem. I couldn’t scale enough because my editors couldn’t edit enough videos, and the models and creators I used couldn’t get the videos just the way I wanted. It bottlenecked me like crazy, and I tried everything to make it better.
At one time, I had over 20 editors working for me just pumping out videos, and I paid for hundreds of UGC-style videos from models in the US and European countries. For the time, that was great. There was no better solution, but it always bothered me. And what happens when the niche dies down for a bit? You have a bunch of people on payroll who have nothing to do. They are just a drain on resources, but if you fire them, it’s likely they won’t want to work with you in the future, and they are all valuable editors.
Luckily, when Sora was announced 2 years ago, I saw that things were about to change. AI image models came first, then the video models arrived, and I was blown away. While the AI avatars back then didn’t feel real, they were close enough that I could see the potential.
Now in 2026, the story is completely different. With the right prompts and models, you can have a realistic 1:1 human saying anything you want, the way you want it, with the accent you want, all for just a few cents per generation. Incredible!
Before AI
Before this, most of my campaigns relied heavily on UGC-style ads. That meant hiring creators to film content like:
- Product demonstrations
- Testimonial-style videos
- “Talking to camera” ads
- Lifestyle clips
Sometimes this worked great. But it also came with problems. Not all creators make good content, and even those who do sometimes just don’t hit the vibe and tone you expect. And let’s not even talk about how the formats vary and how editors need time to go through each video and make it usable.
You might spend $500+ on creatives that never convert. That’s just the nature of paid traffic. There are some companies like adsbabe.com that can help you get the results you want. Unlike doing it on your own, AdsBabe can do everything for you from A to Z. You are left with just the finished product, optimized so that it fits your goals.
When Did AI Video Get Good?
Early AI videos were awful. Remember Will Smith eating spaghetti? But even at that point you could sense that it would change fast. The first really usable models came with Gemini Veo 3, and now 3.1 and Sora 2. There are also a lot of good things coming from Higgsfield, Kling and Seedance (especially Seedance, that model looks crazy).
But for the last year, AI models have been very usable for video ads. For images, it’s been even longer. The real question is how many creatives you can make, and how fast.
If you do the classic thing of going through the web tools themselves, then you are limiting yourself and your output. These web-based editors are mostly built for users who make a few videos a day, not for the real affiliates who pump out content.
If you want to make hundreds or thousands of creatives in hours, then you need to switch to the API. There you have more control and better outputs, but it comes at a higher price. Instead of just paying for a subscription, you pay for each generation by the second. This is where it can get expensive if you have bad prompts. You need to dial in your prompts so you don’t waste money on bad, trashy versions.
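To put numbers on why bad prompts get expensive under per-second billing, here is a rough sketch of the math. The price per second, clip length, and "keep rate" (the share of generations good enough to run) are made-up placeholders, not real rates from any provider, so plug in your own.

```python
# Rough cost estimate for a batch billed per generated second.
# Every number used below is a made-up placeholder, not a real API price.

def batch_cost(num_videos: int, seconds_per_video: float,
               price_per_second: float, keep_rate: float = 1.0) -> dict:
    """Estimate spend for a batch of generations.

    keep_rate is the share of generations that are good enough to use
    (must be greater than 0 and at most 1).
    """
    if not 0 < keep_rate <= 1:
        raise ValueError("keep_rate must be in (0, 1]")
    total = num_videos * seconds_per_video * price_per_second
    usable = num_videos * keep_rate
    return {
        "total_cost": round(total, 2),
        "usable_videos": usable,
        "cost_per_usable_video": round(total / usable, 2),
    }

# Same batch, same spend, two different prompt qualities (hypothetical).
sloppy = batch_cost(200, 8, 0.10, keep_rate=0.25)
dialed_in = batch_cost(200, 8, 0.10, keep_rate=0.80)

print("sloppy prompts:   ", sloppy)
print("dialed-in prompts:", dialed_in)
```

The total bill is the same in both runs. What changes is how much each usable video really costs you, and that is the number worth watching.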
The Spreadsheet That I Use
Dialing in your prompts is not easy. It will take you a couple of generations to get what you want, but once you have it, you can use it to your advantage.
Instead of thinking: “I need a video for this campaign.” I started thinking in systems. I built a simple spreadsheet where I store creative ideas.
Each row contains things like:
- hook ideas
- different marketing angles
- scenarios
- character types
- environments
- visual styles
For example, one row might look something like:
Hook:
“I wish someone told me this before I bought this product…”
Scenario:
Person sitting in their car talking to camera
Angle:
Hidden feature / insider tip
Then I feed variations of those rows into AI video models. And suddenly I can generate dozens of creatives in minutes. Change the character, environment, or tone with just a few words.
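A minimal sketch of how rows like that can be turned into prompt variations, assuming the sheet is exported as CSV. The column names, the character and tone lists, and the prompt wording are all my own illustration, not a required format for any model. Sending the prompts to a video API is left out because that part depends on the provider.

```python
import csv
import io
from itertools import product

# Stand-in for an exported spreadsheet. Column names are assumptions.
SHEET = """hook,scenario,angle
I wish someone told me this before I bought this product...,Person sitting in their car talking to camera,Hidden feature / insider tip
"""

# Hypothetical variation lists; swap in your own.
CHARACTERS = ["college student", "busy parent", "retired teacher"]
TONES = ["excited", "calm and honest"]

def build_prompts(sheet_text, characters, tones):
    """Return one plain-text prompt per (row, character, tone) combination."""
    rows = list(csv.DictReader(io.StringIO(sheet_text)))
    prompts = []
    for row, character, tone in product(rows, characters, tones):
        prompts.append(
            f"UGC-style video. {row['scenario']}. "
            f"Character: {character}. Tone: {tone}. "
            f"Angle: {row['angle']}. "
            f"Opening line: \"{row['hook']}\""
        )
    return prompts

prompts = build_prompts(SHEET, CHARACTERS, TONES)
print(len(prompts), "prompts from 1 row")
print(prompts[0])
```

One row times three characters times two tones already gives six prompts. Add more rows or more lists and the count multiplies quickly, which is the whole point of working from a sheet.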
With a system like this you can generate output that matches entire video production companies. And it all looks incredible. Sometimes even I can’t tell if a video is AI-generated or not, and the tech is only getting better and more affordable as time goes by.
Do AI Creatives Perform?
If you ask me, hell yeah.
I am now running almost exclusively AI videos for all of my campaigns, and ROI has been great. The thing with AI ads is that you can tune them to your audience exactly the way you want. You can make stronger hooks, change the visuals, change the angle, replace parts of the video like the background or person talking. Everything and anything you can think of.
One of the tests I love to run is making every race and gender combination for an ad and seeing what my audience prefers. In my campaigns, people of color and women perform way better, especially if they have crazy hair combos or tattoos. Something about those characters just clicks with users and they convert.
Of course, it’s a bit different for every niche, but in general it works, and that is something I would never have tried without AI.
This is especially important on TikTok and Instagram/Facebook Reels, where people love to swipe away. If you don’t get them right away, you never even get a second chance.
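If you run variant tests like the one above, you need some way to compare the results. Here is a rough sketch, assuming you can export impressions, clicks, and conversions per variant from your ad platform. The variant names and numbers are placeholders, not real campaign data, and the minimum-click cutoff is my own assumption to avoid crowning a winner on too little traffic.

```python
from dataclasses import dataclass

@dataclass
class VariantResult:
    name: str
    impressions: int
    clicks: int
    conversions: int

    @property
    def ctr(self) -> float:
        """Click-through rate: clicks per impression."""
        return self.clicks / self.impressions if self.impressions else 0.0

    @property
    def cvr(self) -> float:
        """Conversion rate: conversions per click."""
        return self.conversions / self.clicks if self.clicks else 0.0

def rank_variants(results, min_clicks=100):
    """Sort variants by conversion rate, skipping ones with too few clicks."""
    eligible = [r for r in results if r.clicks >= min_clicks]
    return sorted(eligible, key=lambda r: r.cvr, reverse=True)

# Placeholder numbers only, for illustration.
results = [
    VariantResult("variant_a", impressions=10_000, clicks=300, conversions=12),
    VariantResult("variant_b", impressions=10_000, clicks=250, conversions=20),
    VariantResult("variant_c", impressions=2_000, clicks=40, conversions=5),
]

ranked = rank_variants(results)
for r in ranked:
    print(f"{r.name}: CTR {r.ctr:.2%}, CVR {r.cvr:.2%}")
```

Here variant_c is left out of the ranking because 40 clicks is below the cutoff, even though its raw conversion rate looks the highest.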
The Future?
Nanobanana 2 was just released and it’s incredible for images. By far the best model, but we are not done yet. Sora 3 is rumored to be released soon, and so are VEO 4 and Seedance.
All of these models promise a massive leap forward. The quality jump compared to older models is huge, and the prompts you need to get realistic results are getting simpler every day. As an example, here is what I used as a prompt for nanobanana 1:
{
  "subject": {
    "description": "A young woman taking a mirror selfie with very long voluminous dark waves and soft wispy bangs",
    "age": "young adult",
    "expression": "confident and slightly playful",
    "hair": {
      "color": "dark",
      "style": "very long, voluminous waves with soft wispy bangs"
    },
    "clothing": {
      "top": {
        "type": "fitted cropped t-shirt",
        "color": "cream white",
        "details": "features a large cute anime-style cat face graphic with big blue eyes, whiskers, and a small pink mouth"
      }
    },
    "face": {
      "preserve_original": true,
      "makeup": "natural glam makeup with soft pink dewy blush and glossy red pouty lips"
    }
  },
  "accessories": {
    "earrings": {
      "type": "gold geometric hoop earrings"
    },
    "jewelry": {
      "waistchain": "silver waistchain"
    },
    "device": {
      "type": "smartphone",
      "details": "patterned case"
    }
  },
  "photography": {
    "camera_style": "early-2000s digital camera aesthetic",
    "lighting": "harsh super-flash with bright blown-out highlights but subject still visible",
    "angle": "mirror selfie",
    "shot_type": "tight selfie composition",
    "texture": "subtle grain, retro highlights, V6 realism, crisp details, soft shadows"
  },
  "background": {
    "setting": "nostalgic early-2000s bedroom",
    "wall_color": "pastel tones",
    "elements": [
      "chunky wooden dresser",
      "CD player",
      "posters of 2000s pop icons",
      "hanging beaded door curtain",
      "cluttered vanity with lip glosses"
    ],
    "atmosphere": "authentic 2000s nostalgic vibe",
    "lighting": "retro"
  }
}
And for videos it’s even more complicated. But now most of this can be replaced with just a few “human” words, and the models understand it.
It’s incredible!
