
Our guest today is John Gargiulo, founder of Airpost.ai. In this episode, he shares insights on how AI is transforming creative advertising, from analyzing raw footage to crafting high-quality, scalable ads. John dives into the balance between human oversight and automation, and how to ensure ad quality that rivals traditional production methods. I’m excited to share a deep dive into the future of AI-driven creative solutions and its potential to reshape performance marketing.





About John: LinkedIn | Airpost.ai

ABOUT ROCKETSHIP HQ: Website | LinkedIn | Twitter | YouTube


FULL TRANSCRIPT BELOW

Shamanth Rao 

John, I’m excited to have you. I know we connected many, many years ago and stayed in touch; I’ve certainly followed a lot of your work, and some of the more interesting stuff you’ve been doing with all things AI.

John Gargiulo 

You didn’t ask me to say this, but I’ve known you a long time.

I can say that Shamanth was a ninja UA guy back in the 2010s when I was at BlueStacks, and I respected you a lot from afar before we met.

Shamanth Rao

Thank you. I know we go back to the very early days of mobile. It’s crazy. You and your team have built an AI for ad creative production.

Tell me about the key technical decisions or components behind making an AI-driven creative engine.

John Gargiulo 

That’s exactly right. So, Airpost.ai is my company, and what we do, and I think we’re the only one doing this as of right now, is end-to-end creative: the ideation phase, the production phase you mentioned, the post-production phase.

What goes into it is a lot of things. We’ve spent three years building the foundation of this. We think of every video ad and piece of UGC as a dish, right? And it’s got a set of ingredients. Often it’s driven by a voiceover; you see product shots over that voiceover. Maybe you see the talent saying a few things.

Maybe you don’t; maybe you just see the talent using the product. Maybe you see them going on a run, doing pushups, on their laptop ordering something, right? You hear music. So these are all ingredients. We have built this huge pantry of over 300,000 vertical video clips. We’re pretty sure it’s the biggest UGC vertical video library in the world.

And then what the technology does is this: our engine takes the right ingredients, typically driven by the script, by a voiceover, by a performance ad framework (three reasons why, et cetera), and then it builds in the footage, the music, the supers, and so on.

Shamanth Rao

For somebody familiar with how language models work, to the extent you’re comfortable, can you share how these elements interact with each other to come up with the final creative?

John Gargiulo 

Yes, so we use several different models, and then we built a lot of tech ourselves. It starts with an input, right? Think of it like a brief from the customer: what is your PDP? We’re not the first to do this, where you give us the URL, from Amazon or from your product page, and that unfurls all kinds of information that we auto-populate for folks.

What are the top 10 value propositions, in order? What are some do’s and don’ts? And then we go to the models and say, look, we want to build a script. We have a particular prompt for this script; we have hundreds of these that we have built. It’s extremely painful to get them to be good.

I think prompting is still one of the more underrated parts of all this. You can build your tech and all this cool stuff, but shaping a prompt to divine what you need from which model at which moment is important.

And then, to fill in the script, the model will come back and say: for this product, with this information and everything I know from the internet, here’s the voiceover, let’s say, or here’s a set of supers, right? And then our engine goes: great, I’m going to build off of that. We have tech that understands what’s in all of the clips that we have from the customer. They can dump their whole Google Drive.

No one has to tag anything by hand; it’s all just in the system. And then we line up those video clips to align with the script.
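The lining-up step John describes can be sketched in miniature. This is a hypothetical illustration, not Airpost’s actual engine: auto-generated clip descriptions are matched to script lines with bag-of-words cosine similarity standing in for whatever embedding or vision models a real system would use, and every clip description and script line below is invented.

```python
import math
from collections import Counter

def vectorize(text):
    # Bag-of-words term counts; a real system would use an embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two Counter vectors (missing terms count as 0).
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def match_clips_to_script(script_lines, clip_descriptions):
    # For each script line, pick the clip whose auto-generated
    # description is most similar to that line.
    matches = []
    for line in script_lines:
        lv = vectorize(line)
        best = max(clip_descriptions, key=lambda d: cosine(lv, vectorize(d)))
        matches.append((line, best))
    return matches

# Hypothetical auto-described clips from a customer's footage dump
clips = [
    "woman goes for a morning run and ties her shoes",
    "close-up of a protein shake being poured into a glass",
    "man does pushups in the living room",
]
script = [
    "Start your morning with a run",
    "Refuel with a protein shake",
]
for line, clip in match_clips_to_script(script, clips):
    print(f"{line!r} -> {clip!r}")
```

With real embeddings the matching is semantic rather than lexical, but the shape of the step is the same: describe every clip once, then rank clips against each beat of the script.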

Shamanth Rao

Understood. So what the model is doing is reviewing the data from, say, Amazon reviews or the website to understand the product, and certainly it’s possible for the model to understand: okay, these are the value propositions, this is what could resonate.

You have prompts based on proven scripts that translate those value propositions into what could be good scripts, right? And you also have models, vision models I imagine, that are looking at a lot of the footage that you’ve already received from your customer.

So the vision model analyzes it frame by frame, if you will, and says: look, this is what’s present in the footage, this is what’s in the script, let’s match the footage to the script. And again, I’m sure it’s not as easy as I make it sound in 10 seconds.

John Gargiulo

You’re repeating back to me exactly what we do. And there are some models and internal pieces that are particular to us. So, unlike a wrapper that’s only using third-party models, we have a patent, dated from before ChatGPT came out, that covers exactly this at a high level: building an engine, a robot chef if you will, that takes all these pantry ingredients, makes dishes to certain recipes, and puts them out on the floor to get results. That was a really exciting moment for us.

That patent, filed back in 2021, was granted just a few months ago, so it’s set in stone as well. And we’re filing a new patent on your last point, analyzing video. So yes, we have in the past used Amazon and other third parties to say what is in a video: dog, table, field, ball. Now Gemini and other models have gotten a lot better at identifying actions and things like that, but there’s still a gap.

And what we have built, and are patenting, handles the moment when a big or small company dumps a huge Google Drive on us, 712 gigabytes of footage, some of it raw from influencers from three years ago, all this stuff, and we have to understand what’s in it: cut and take a select of a three-and-a-half-second moment that doesn’t have copy over it, and find where that moment sits in a minute-and-38-second clip from three years ago that barely anyone has looked at.

Identifying the key moments in that video is hard, and we’ve invested in that as well, because friction, and I’m sure we’ll get to this, is one of the big blockers of adoption. It’s kind of like how you may have had thousands of family photos on your hard drive in folders 10 years ago that you never looked at.

Now you have Google Photos, which automatically understands them, and you can look up photos of your friend on this trip in this year, and it works. That friction going away has probably made you use Google Photos more and see these things. That’s a lot of what we’re having to build, to make this thing not just demoware but easy to use every day.

Shamanth Rao

I know you briefly talked about how prompting is one of the underappreciated parts of building something in AI, and you said earlier that there are hundreds of thousands of clips and product shots, I forget the exact numbers, in that library. So all of that is heavy lifting. There’s certainly quite a bit of human-powered heavy lifting involved.

Can you talk to me about what that human-powered heavy lifting has been like?

John Gargiulo

Humans are in the loop. And this is another underrated thing. I was just talking to another founder who’s building something really exciting, and, founder to founder, when you press, like, okay, is this happening completely by itself, or do you have people driving it?

I mean, even at big companies, eight times out of 10 there are a lot of humans in the loop. I felt a bit naive; I was surprised to learn that. Watching the company Scale AI grow to billions and billions of dollars, I thought, wow, I heard they help AI model companies and big companies do stuff.

I bet they have some crazy tech. It turns out it’s all service: it’s all people answering questions about models to help train them. So there are often more humans in the loop than people assume. The way we approached Airpost was: let’s have humans in the loop to get the quality high. If the quality of these ads is not 90 percent as good as when you spend weeks and money going back and forth with an influencer, did they get the product, all of that?

Or worrying about agencies. If it’s not 90 percent as good, it’s worth nothing. So our thing was: let’s go in and get the quality right, so it’s not just a cool demo that anybody who scratches the surface throws in the trash. To do that, you need people to fill the gaps. And so Q4 2024 was about using people to help our tech along and get it good.

And we have been taking on customers; we have doubled our number of customers just in the last few weeks, and we have a waitlist well over a thousand people long. We’ve gotten the quality nearly there. Of course, I’m on the inside, so I see all the flaws. I was just telling another founder that I have a list of over a hundred things we need to get better about the ads, but it’s in good shape.

2025 for us is about what we call going shootless: you can just sign up, no new shoot, no new footage needed, here’s my stuff. And the ultimate goal on quality is to get to a place, with no humans in the loop, where I can show you 10 ads, five from Airpost and five from an agency or creative platform or whatever that took weeks and a lot of work and people, and you can’t tell which ones were which.

Shamanth Rao

Makes so much sense. And just to continue on the humans theme: I was building something much simpler myself, as a non-programmer, and I think I assumed it would be very plug-and-play. A big part of it was just training the model, and that involved a lot of manual curation.

As I was mentioning, my copilot is basically to train our internal teams. Much like you mentioned with your customers, what my internal team gets needs to be 80, 90 percent of what they could get from ChatGPT, if not 120 percent. Otherwise, there’s just no point. So that human element is so critical.

John Gargiulo 

Another place the human element comes in is designing the ads themselves. My somewhat cynical outlook on competitors in this space is that they’re all techies. Unlike you and myself, who have lived and breathed ads for a decade, most people diving into "oh, we make ads with AI" come from pure tech.

They’ve never actually had to crush a Black Friday. They don’t know what the new performance frameworks are that are working: that man-on-the-street stuff isn’t working anymore, that this other thing does work, that every millisecond counts. I spent five years building Ready Set, a digital agency that works with Robinhood, DoorDash, and others, and it’s doing better than ever.

So we know ads, and the prompts, the templates, the building of the various pieces are so important. Whereas I feel like a lot of what’s happening now is what I call talking heads staring at the camera, right? Like, oh, we help you make a creepy-looking guy with bad lip sync talking at you for 48 seconds.

Maybe you can do a bunch of work on what you want them to say and put stuff in the background, which is cool, and we’ll incorporate that too eventually. But we think the magic is really helping actual advertisers get performance. Like, what are the different ads? Why will this one work?

How is the pacing? Most people don’t care a lot about that if they’re coming from pure tech.

Shamanth Rao

For sure. Yeah. And I know you talked about how there’s a Google Drive with hundreds of GBs, and your systems can review and identify what’s in it.

Now, again, I’ve test-driven and seen other AI platforms that can understand what different kinds of footage are like: let’s just say something in the kitchen versus a bedroom, a dog versus a cat. So what has to happen at a technical level to make this work?

John Gargiulo 

I think a key, and I won’t be able to go all the way to the bottom here, is what we call internally auto-describe. I won’t describe exactly how we do it, but essentially you’re right: if all you have is dog, cat, table, bedroom, the model’s not going to know what to do with it. Hey, look, I’m building this script.

This woman’s telling this semi-emotional story about how she’s had to deal with rosacea, say, in an ad for Curology. What do I do with 70 gigabytes of stuff that just says bedroom, bathroom? Maybe I put a clip in and it’s someone doing a funny dance in the bathroom, and I’m like, oh shoot, that’s not what I wanted for this part.

And it looks ridiculous. So a key component to make this work is being able to understand not only objects but actions happening in a video. So that’s where a lot of our effort has been.
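The gap John describes, objects versus actions, can be made concrete with a toy sketch. Everything here is invented for illustration: two clips share identical object tags, so an object-only search cannot tell them apart, while a keyword filter over action-level descriptions (a crude stand-in for semantic matching against the script beat) can.

```python
# Each clip carries flat object tags (older tagging pipelines) and an
# action-level description (newer video-understanding models).
clips = [
    {"id": "c1", "objects": {"woman", "bathroom", "mirror"},
     "action": "woman does a funny dance in the bathroom"},
    {"id": "c2", "objects": {"woman", "bathroom", "mirror"},
     "action": "woman examines her skin in the mirror, looking concerned"},
]

def pick_by_objects(clips, required_objects):
    # Object-tag search: both clips qualify, with no way to tell them apart.
    return [c for c in clips if required_objects <= c["objects"]]

def pick_by_action(clips, required_words, banned_words=()):
    # Action-aware search: keyword filter over the description, standing in
    # for semantic matching of the description against the script beat.
    out = []
    for c in clips:
        desc = c["action"]
        if all(w in desc for w in required_words) and not any(w in desc for w in banned_words):
            out.append(c)
    return out

by_objects = pick_by_objects(clips, {"woman", "bathroom"})
by_action = pick_by_action(clips, required_words=["mirror"], banned_words=["dance"])
print([c["id"] for c in by_objects])  # object tags alone match both clips
print([c["id"] for c in by_action])   # action descriptions isolate the apt one
```

The design point is in the data model: once a description captures what is happening, not just what appears, a script beat like "she looks concerned" becomes something the selection step can actually query.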

Shamanth Rao 

So an action is now something that can be detected by AI.

John Gargiulo 

Yes, computer vision, as you said.

There was a quiet but huge leap forward in computer vision in the summer of ’24, and I recognized right away that it was going to change everything, because if you don’t have that piece, it’s hard for AI to build an ad. So we’ve taken full advantage of that.

You know, we’ve gone from dog, cat, bed, field to descriptions we can actually craft with, like: a woman sits down on her bed, she looks despondent, she looks over at her dog, the clip is 3.8 seconds, and at the last second it moves toward her. That was not possible last December or January.

It’s there for us to build. 

Shamanth Rao 

Yes. And if I recollect, you referenced a YouTube video that described this the last time we spoke, in our prep call. For people that may be curious about this, is that something you recollect offhand?

John Gargiulo

Sure. A lot of companies are doing this, and again, people are doing things internally like ourselves, but the one you’re referring to, to get a feel for this, is actually kind of weirdly hard to find.

It was from a presentation by Google of Gemini using computer vision to describe 22 minutes of a Buster Keaton film. And I like that because it’s hard to cherry-pick, right? It’s a whole film. Instead of saying "man, black and white, woman, note," you can watch it describe the film in real time: a man walks in, the woman looks surprised, he slips a note into the other man’s pocket, a close-up reveals that it says this doctor’s office with this address, the man runs out of the room, the woman picks up the note. It’s very advanced.

Shamanth Rao

For folks that may not have caught the name of the video, we’ll link to that in the show notes for those who are curious. And do you feel AI is at a place where it can learn, where it can form a link between performance and assets at this point?

John Gargiulo 

A billion-dollar question.

I think it’s not there today. I think people are trying to do it. People are saying, oh, go tell us what words are being said in this ad that performed well, then make words for our ad and it will be great. I think it’s too early to say that works today. There are just too many different variables.

You know, the ads we’re doing, it’s like, yeah, great: we have a voiceover, we have shots of the product, we have shots of the thing in action. That’s still only like 22 percent of ads. Ads can have all kinds of things going on. I mentioned the man-on-the-street interviews, right? Those are harder to do. There you see people talking about a specific product.

Their lips have to say a certain thing. It has to look organic: someone running up to someone on the street. We’re not doing those yet; maybe next year. So I think we’ve got a ways to go, with these AI tailwinds to help us along. We will get there, but my dream, Shamanth, is that you drag and drop in an ad from a competitor.

That’s, again, multifaceted. If anyone doubts what I’m saying, go watch, just play the Facebook Ad Library without discrimination, whatever comes up first, don’t cherry-pick. And you’ll be like, oh God, I could never do that, or I guess we couldn’t do that: you need that shot and this shot. But eventually, when you can create those things (by the way, we haven’t talked yet about Runway and AI video generation, which we are heavily incorporating),

then you can drag in a competitor’s ad and it understands not only what’s being said, not only the supers on the screen, but the juju of it: the pacing, the way it shows the product from a certain angle, doing a certain thing in fast motion, right? It understands all that.

And you say: great, please make that ad for me with these 300,000 shots that Airpost owns, or with my shots, and go. And it does a serviceable job. I think it’ll be years before it’s like, wow, that’s the same ad, just with my shots, my voice, et cetera. But that’s where we’re headed.

Shamanth Rao

Yeah, and speaking of where we’re headed, what do you see coming down the line? It could be innovations in creative automation, in AI. What do you see coming down the line that excites you?

John Gargiulo

I think AI will eventually push into all of these pieces of performance marketing. Media, for example, right? Talking to friends at Facebook and other companies, they’re already hard at work.

AI is already incorporated, computer vision is already incorporated, Llama, et cetera, in leveraging AI to decide what to spend on and how the algorithm works. So one thing I always tell people, because it doesn’t come up enough: if you throw a hundred ads into ASC, or you run a bunch of stuff at once with cost caps, let the algorithm do the work.

I think what we’re talking about on this podcast is a shift from video creative scarcity to abundance. So what happens in five years, whether it’s Airpost or someone else, when you can just get all the video ads that True Classic has run in the last two years, but for the new t-shirt brand you’re doing by yourself? What I’m positing is that with all the AI the platforms are building in, you can throw hundreds of ads at the system and it’s literally, quote unquote, watching the ads a priori of running them. It’s watching them and understanding: okay, this has a guy in it.

A girlfriend comes up and does this. It describes the ad to itself, what product is in it versus the website, and what conversions look like. And if it only spends 17 cents on an ad, don’t fret. I think people fret today because they’re like, well, I spent weeks on that ad and a thousand dollars, and my agency and I had three meetings about it.

I have to force spend into it. I think those days will go away. So, number one, I think AI is and will continue to be built into the platforms, seeing these video ads and making those decisions. Media buying in general has already gotten more algorithmic, more commoditized. I don’t think it’s fully commoditized, but think especially of the SMBs you mentioned earlier, who don’t have that level of sophistication or a Facebook rep, and don’t always know what they’re doing.

They’re busy with the UPS person in the background delivering a package, and with deciding what their next product is going to look like. I think having AI run media, build a media strategy, and get creative built for you: that, I think, is the future. And by the way, we’re trying to do the whole thing.

I think AI can and will replace the human ad agency model that I come from. I don’t know if it’ll take three years, five years, or 15. What do they say? People overestimate the short term and underestimate the long term. I think that’s what’s going on right now: we are overestimating what AI can do right now.

I don’t think many people listening to this are using AI right now to add incremental value over what they did two years ago; I’m not, and I’m on the bleeding edge of this stuff. But I think in 10 years we’ll look back at all the manual work that we did in performance marketing, media, creative, and on and on, and laugh, because agents, and hopefully Airpost will be that agent, will do all of it.

Shamanth Rao

For sure. I have similar conversations with people who were in pre-internet eras, like people in my parents’ generation, right? They look back at that era in disbelief after having used spreadsheets: I can’t believe I was using pen and paper to make balance sheets. It just feels like a miracle.

John Gargiulo

There’s no perfect analogy, but I do believe there are some good ones. It’s like the internet in 1995: no one’s making money from it, it’s got a lot of cool demos that don’t scale, you don’t use it every day. Your friend shows you how he paid someone in Germany with a credit card and they sent him something in the mail.

And you’re like, awesome, but if you do it a hundred times, you’re going to get scammed 17 times. This thing doesn’t work yet, so you’re short-term bearish, right? And it went through the whole Gartner hype cycle: wow, it’s 1999, this is everything; then down to almost zero. I remember 2001, 2002.

The internet was seen as a hypey thing: oh yeah, we don’t really use it too much anymore. Well, that’s when Facebook and Google all started. And now we look back and we can’t imagine life without the internet. The wifi goes down for four minutes and the family panics, you know.

Shamanth Rao 

Yeah, we certainly live in exciting times. John, this has been incredible and this is perhaps a good place for us to start to wrap up because I know we’re coming up on time. But this has been great and before we go, we would love for you to tell folks how they can find out more about you and everything you do.

John Gargiulo

Yeah, so I’m just on LinkedIn, John Gargiulo, find me there. I love talking to people in the space. Like the internet in the nineties, it’s a really exciting time to build and learn from each other. I’m on Discord, all these things. I’m just always looking to meet new people.

And I love talking with you about this because I know you’re building as well. So that’s where I am. Airpost.ai is our company. It’s a really good time to get involved, I can say, because we are subsidizing a lot of things there and giving a ton of value. I’ve interviewed over 40 customers, and one thing that, semi-annoyingly, keeps coming up is: I feel like I’m getting away with something, this is super cheap.

So yeah, I would love people to check it out and get feedback while we’re still in beta.

Shamanth Rao 

Certainly. We’ll link to all of that in the show notes, and also the YouTube video of Buster Keaton and Google Gemini; we’ll look that up and put it in the show notes for folks that are interested. But for now, thank you for being on the show.

Thank you.

John Gargiulo

Yeah. Thanks, Shamanth. That was super fun.

WANT TO SCALE PROFITABLY IN A GENERATIVE AI WORLD?

Get our free newsletter. The Mobile User Acquisition Show is a show by practitioners, for practitioners, featuring insights from the bleeding edge of growth. Our guests are some of the smartest folks we know, working on the hardest problems in growth.