
Our guest today is Sharath Kowligi, the director of Ad Monetization at GameHouse and advisor to Rocketship HQ. Sharath works on improving monetization and yield for a portfolio of over 35 apps, and no, that is not a typo. Sharath has an incredible wealth of experience in both monetization and user acquisition and growth. In the past, he managed user acquisition and growth for Bash Gaming, where he worked alongside me. Bash Gaming was acquired for $170 million.

Sharath also worked for Pretty Simple and has advised many companies on different aspects of growth. Something Sharath has done an incredible amount of work on is app store testing. He has run over 200 app store conversion experiments, and that’s what I’m excited to dive into with him today.






ABOUT: LinkedIn | Twitter | GameHouse


ABOUT ROCKETSHIP HQ: Website | LinkedIn | Twitter | YouTube


KEY HIGHLIGHTS

🔎 Why most apps don’t run app store tests.

🤔 How the Google Play Store has become much more important than it was five years ago.

🚀 For an app that’s never tested anything before, where Sharath recommends beginning.

🆕 How Sharath and his team uncovered a ‘new and interesting’ idea for an icon that was a big win.

💡 How Sharath recommends generating ideas for experiments.

🎨 Why dramatically different designs sometimes have no impact on metrics.

📁 How a cadence of testing helps teams gain respect for each other – and how it helps avoid a culture of red button-green button testing.

🤝 How to run effective team meetings.

🤷‍♂️ How Sharath recommends testing whether Android results are portable to iTunes.

🔥 What metrics Sharath recommends using to evaluate results of AB tests.

KEY QUOTES

What really moves the needle in the app stores

There’s a lot of evidence that shows, in the app stores and especially on Google Play, that the thing to look for and the thing to change is really the big giant thumbnail for the video, and the icon.

Testing reveals the unexpected

He had this classic portrait he had drawn almost as an in-joke; it was like an 18th-century painting of a girl: a nurse on a battlefield helping a soldier. And when we looked at that icon, we were like: this is terrible, no one’s going to like this, this is not the game at all, this is horrible; you’re turning it into some random romance novel when this is a very serious game, and all those things. And then when we tested it, it knocked it out of the park. It took, I think, three or four days for the results to come back from Google, and it was easily at least 40% better than the consensus candidate that everybody had said was great.

Not testing is not an option

We discovered it because one person said something and the rest of us said, oh, this is horrible; and then it was, wait a minute, it costs nothing to test. Even if we’re completely wrong, the cost of failure is super low. It’s 400 people, 500 people, maybe 1,000 people who will see it and make a different choice. The cost is super low, even if we are wrong. And if we’re right, we improve everything, and of course then we go apologize to the guy who got it right by buying him a coffee.

Epiphanies take consistency

But I think what ends up happening is that if you’re sticking with a product and waiting for that one giant, amazing idea, the one that’s going to make your career as a creative director or as a performance marketer, then you’re going to be waiting a long time; that’s my general feeling. I think experimentation needs cadence. We did a couple of hundred tests on this, but all the great companies in the world run dozens of tests practically every day. The way to approach it is systematically: ask what the cadence is rather than what the big idea is, because once any team gets into the rhythm of it, you get better at it.

You may be overthinking things

Whether it’s in this kind of testing or another kind of testing, two out of three experiments end up with a result of no difference. Even if we think we’re making two things that are completely different, quite often the audience just doesn’t care about what we care about.

Find your own answers

Typically, in my experience, at least in the US, I don’t often see something that is a killer creative on Google turn out to be a total dud on iOS.

Having said that, everyone’s got to test this out for themselves, because maybe a proxy landing page is better for somebody on iTunes, and maybe just importing Google results to iTunes is better. But what I can say is that once you run this experiment once or twice or even three times, you’ll have your answer. If you have a killer result on Google and you deploy it on Apple and you don’t see a bounce, you know that your Apple audience is different from your Google audience. If you see the same kind of killer bounce on iOS, you know you can disregard the couple of blog posts out there that say it’s completely different, and that for you it’s actually pretty similar.

FULL TRANSCRIPT BELOW

Shamanth: I’m very excited to welcome Sharath Kowligi to the Mobile User Acquisition Show. Sharath, welcome to the show. 

Sharath: Hi Shamanth, good to be here. 

Shamanth: Yeah, I am excited to have you because I’ve seen so much of your magic up close, and I’ve always admired the mix of creativity and analysis that you bring to the table, so I’m excited to dig into a lot of your magic in more detail. You’ve also run 200-plus app store tests, and that’s certainly something I want to dig into. So to take a step back: why is app store conversion testing important, and why do you think most apps don’t do it? 

Sharath: You and I, of course, worked together for years, way back in what seems like a past life, and when you look at why we didn’t do it back then, it was mainly because we couldn’t. 

Shamanth: Yeah. 

Sharath: And when you look at a lot of folks who are currently running UA or growth, a lot of them have memories of that, where tracking this stuff is just not possible in some cases and not straightforward in others. And frankly, on iOS it’s still quite tricky to test properly, whereas on Google it is now possible. I think the reason adoption has been so difficult, especially in, let’s say, the gaming space where I primarily operate, is that iOS has often had such an outsized impact on revenues. So it’s always been: okay, we have this thing on Google, but we don’t have it on iOS, and iOS is our most important platform. 

And Google keeps changing these things, and in my opinion has been improving them; it hasn’t been backward and forward, it’s always been more data, more granularity, better identification of specific conversions throughout the funnel. I think the main reason for low adoption is that this was only really available, and even now is only really available, on one platform without using a third party, so it’s clean. 

And of course, it’s critical now for two reasons, I think. One: even if only one of the two major platforms gives you this information, it’s still hugely valuable just to optimize Google. The other part is that the Google Play Store has become so much more important for everybody than it was, let’s say, five years ago, because the rate of growth for most folks on Google Play has been phenomenal compared to Apple. That has always been the case in downloads and the number of users, but now it’s true in revenue as well. 

Shamanth: Yeah, that certainly makes sense. I would also imagine that way back, six, seven, eight years ago when you and I were working on this, it was relatively easy to get installs; there wasn’t as much competitive pressure, and that has absolutely changed since then. So let’s assume there’s an app team saying: right, we need to start testing, we listened to Sharath on this amazing interview. Where do you recommend they start, assuming they’ve never done any testing before with their app store pages? 

Sharath: Yeah, I think that’s fair. When we were working on this almost seven years ago, the store was a whole lot less crowded. Now, if you were to start from scratch, I think it’s pretty clear that visuals are where it’s at. I don’t think anyone is going to say, no, let’s test copy first. It breaks my heart to say it, because I got started in advertising as a copywriter, and it breaks my heart to say that copy doesn’t move things as much as a copywriter would like to think it does, at least in basic testing.

There’s a lot of evidence that shows, in the app stores and especially on Google Play, that the thing to look for and the thing to change is really the big giant thumbnail for the video, and the icon.

Those are the two big things you want to change. It’s easy to say this is super new and hasn’t always been the way of things, but visuals have always been the big critical thing in advertising; there are entire companies, entire careers, built on good packaging, and the app store really is just good packaging for your app. When you think about it that way and go back to the first principles of advertising: make it attention-grabbing. Can anybody look at this thing, say what it is, and get excited about it? Is this different, is this the same? That “same but different” quality is what people are really looking for: they want to understand what it is, but it should also be new and interesting. And that is really what you can achieve with a good amount of thinking around the icon, and a good amount of testing around the video thumbnail, or what used to be the feature graphic on Google. 

Shamanth: Yeah. And when you say new and interesting, are there examples of tests you’ve run where you’ve employed this, that come to mind? 

Sharath: Oh, for sure. My favorite example is a doctor game we worked on called Heart’s Medicine, and the value of that test was huge, because it was one of the first tests we ran that had an impact. Until then we were seeing 2%, 3%, 5%, essentially nothing; with confidence levels at 90%, the differences were very small. 

But what I mean by the same but different, or old and classic but still new, was really in this one icon. It was a doctor game, and we had an icon that everybody agreed on; we had the first-ever consensus where the creative team, the branding team, and the performance team all looked at one icon and liked it. The game is the story of a doctor who goes from medical intern to gifted surgeon, and the icon was the doctor holding a stethoscope, which makes tons of sense, with a big heart in the background behind her; her name is Heart. 

We thought that was super cool, and everybody agreed it was the right thing to do. But then there was the art director on the actual game, who was not part of the marketing team per se.

He had this classic portrait he had drawn almost as an in-joke; it was like an 18th-century painting of a girl: a nurse on a battlefield helping a soldier. And when we looked at that icon, we were like: this is terrible, no one’s going to like this, this is not the game at all, this is horrible; you’re turning it into some random romance novel when this is a very serious game, and all those things. And then when we tested it, it knocked it out of the park. It took, I think, three or four days for the results to come back from Google, and it was easily at least 40% better than the consensus candidate that everybody had said was great.

And when you look back over the literature of AB testing, whether it’s what was written a hundred years ago in Scientific Advertising or what Ogilvy wrote 50 years ago, it’s that AB testing, when done right, or when you get a little bit lucky, whichever you choose, makes a huge difference. And this thing really made a huge difference: it changed our advertising for that game, it changed how we thought about games and stories, and it changed the top of the funnel. We could see it was having an impact further down too, because of the expectations being set from that point onwards. That’s my favorite test. 
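For readers who want to gut-check a result like that 40% lift, here is a minimal sketch of the two-proportion z-test that underlies this kind of store listing experiment. The install counts are hypothetical, and Google Play’s experiment console does this math for you; the sketch just makes the 90% confidence threshold mentioned earlier concrete.

```python
from math import sqrt
from scipy.stats import norm

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """z-score and two-sided p-value comparing two variants' conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null hypothesis
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - norm.cdf(abs(z)))
    return z, p_value

# Hypothetical counts: control icon converts 500/10,000 visitors,
# challenger 700/10,000 -- a 40% relative lift.
z, p = two_proportion_z_test(500, 10_000, 700, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")  # significant at 90% confidence if p < 0.10
```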

Shamanth: Yeah, yeah. And you wouldn’t expect it to be a battlefield, a person on a battlefield. It’s almost like, you know, it told a completely different story. 

Sharath: Yeah. 

Shamanth: Right, yeah, that makes a lot of sense. 

Sharath: Well, that’s the thing about it. It was actually part of the story, but just one part of it. This is almost a 20-hour game, and that particular scene is, I think, the most emotional, most gripping scene of the whole thing; we just didn’t feel it that way going into it. 

Shamanth: Yeah. 

Sharath: In hindsight, 20/20, it’s kind of obvious: oh, of course, this was amazing. But that’s not how the story unfolded. 

Shamanth: Yeah, yeah.

Sharath:

We discovered it because one person said something and the rest of us said, oh, this is horrible; and then it was, wait a minute, it costs nothing to test. Even if we’re completely wrong, the cost of failure is super low. It’s 400 people, 500 people, maybe 1,000 people who will see it and make a different choice. The cost is super low, even if we are wrong. And if we’re right, we improve everything, and of course then we go apologize to the guy who got it right by buying him a coffee.

So that was what was at stake, and I think that’s one of my favorite things about this: it’s a system that is fairly easy to deploy and redeploy, test and retest. 

Shamanth: Yeah, yeah. You know, for every unique idea or test you run that’s a winner like that, I imagine there are a lot of ideas that are somewhat obvious, like just a red button versus a green button.

Sharath: Right. 

Shamanth: Or ones that just don’t move the needle: even if you think it’s going to be amazing, it probably won’t beat the control, right? How do you recommend teams go about generating ideas for these tests, ideas that are somewhat similar but have an interesting twist, or any other way you want to think about it? 

Sharath: I mean, I think the biggest thing is really the process, because you never know which one is the big idea. Red button, green button, fair enough; for what it’s worth, I think that’s also worth testing, but the icon is the size of a postage stamp, so maybe it works, maybe it doesn’t. Those kinds of changes usually do make a difference when you’re looking at much larger webpages, or at long copy; there we see that it matters. 

But I think what ends up happening is that if you’re sticking with a product and waiting for that one giant, amazing idea, the one that’s going to make your career as a creative director or as a performance marketer, then you’re going to be waiting a long time; that’s my general feeling. I think experimentation needs cadence. We did a couple of hundred tests on this, but all the great companies in the world run dozens of tests practically every day. The way to approach it is systematically: ask what the cadence is rather than what the big idea is, because once any team gets into the rhythm of it, you get better at it.

Yeah, because even if you start with red button, green button, sometimes you have to see for yourself that nobody actually cares in order to get out of that mindset and stop thinking about things that way. Maybe it works for your app, maybe it doesn’t, I don’t know. 

Shamanth: Yeah. 

Sharath: I’ve seen crazy things, things I thought were as different as ice and fire, make no difference at all, because users don’t necessarily care about the same things we care about, and we’ve just been wrong about that. And if I were to, for example, invest six months in making two things very, very different, test them, and then find out that users actually don’t care, that one is at 51.2% and the other at 51.1%, that would be terrible. 

For what it’s worth,

Whether it’s in this kind of testing or another kind of testing, two out of three experiments end up with a result of no difference. Even if we think we’re making two things that are completely different, quite often the audience just doesn’t care about what we care about.

And you know, it was the same when we were working together: the thing we hate the most, or the thing we’re sure won’t make a difference, somehow the universe just wants to prove you wrong and pulls an insane result out of it.

Or that one lone voice says, this is completely different, and that ends up working. And I think the other thing about testing cadence being more important than the actual tests themselves is that this is a great way to build a creative team, because any good creative team has people who fundamentally believe in what they’re doing. These are folks who have done this for a while and have been right a whole bunch of times. So how do they convince each other, and how do they gain respect for each other, if they’re all completely new and haven’t worked together before? Having a cadence of testing where everybody can see that a whole bunch of people are right a whole bunch of times and wrong a whole bunch of times lowers the ego threshold of being right and being wrong, and puts the focus on doing things that are meaningful. 

Shamanth: Yeah. 

Sharath: Because, I mean, you do enough red button, green button tests that have no impact, you do 10 of those, and then no one’s going to come up with the next idea of, oh now, let’s try purple button. Right?

Shamanth: Yeah.

Sharath: Everyone then starts thinking in different directions and in different ways, and you end up getting better ideas, and I think that’s in many ways the most valuable thing. 

Shamanth: Yeah. And when you say the cadence matters the most, what does that cadence look like? What does that process look like for you guys? Who’s present, who’s involved, is there a meeting, what is that like? 

Sharath: Well, typically the people most involved are the growth team. I’ve now moved more toward the monetization side of things, so I don’t really look at it that way anymore, but certainly the growth team is involved and the studio is involved: whoever is making the app and whoever is selling the app. My typical rule for any of these things is: bring in everybody who needs to be there, but don’t bring in anybody who doesn’t need to be there. If it’s an in-house system you’re setting up and you need a particular person to set it up, then definitely involve them. But if it’s just something you’re testing on the Google Play Store, between the studio that makes the art and the growth team that runs the test, then just run the test.

Some folks have a system where the studios run the tests themselves because they have everything in-house, they have the bandwidth, let’s say; other studios don’t have that bandwidth, and it’s their advertising folks or their agencies or their growth teams that do it, so it stays within the growth team. But in terms of cadence, I think for most B2C you’ve got to give a test at least seven days. That way, if you’re running 50 tests a year in one segment, for one feature, for one app, that’s pretty decent. 

It’s not perfect, but it’s decent. You could also split between the US and the UK, or markets that you know behave the same way for you; you could probably cluster them and run a few tests in parallel. But I would give it about seven to eight days. Sometimes you get a result sooner, and that’s great; sometimes you don’t, but I would stop a test after a week if it has no result, because you could wait forever. We’ve done a bit of analysis on how long it takes: for the overwhelming majority of our tests we have a result within eight days, and if we don’t get a result in seven to eight days, waiting 40 days doesn’t give us anything useful. 
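A back-of-the-envelope sample size calculation shows why a test that hasn’t resolved in a week rarely rescues itself. The sketch below uses the standard normal-approximation formula; the 5% baseline conversion rate and 10% relative lift are illustrative numbers, not figures from the episode.

```python
from scipy.stats import norm

def visitors_per_variant(p_base, rel_lift, alpha=0.10, power=0.80):
    """Visitors needed per variant to detect a relative lift between two
    conversion rates at the given two-sided significance level and power."""
    p_var = p_base * (1 + rel_lift)
    z_alpha = norm.ppf(1 - alpha / 2)  # 1.645 for 90% confidence
    z_beta = norm.ppf(power)           # 0.842 for 80% power
    variance = p_base * (1 - p_base) + p_var * (1 - p_var)
    return (z_alpha + z_beta) ** 2 * variance / (p_base - p_var) ** 2

# Illustrative: 5% baseline conversion, hunting a 10% relative lift.
print(f"{visitors_per_variant(0.05, 0.10):,.0f} visitors per variant")  # ~24,600
```

If a week of store traffic falls far short of the number this returns for the lift you care about, the honest conclusion is usually “no detectable difference,” which matches the stop-after-a-week heuristic above.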

Shamanth: Yeah, that makes sense. So if you’re getting results roughly in just over a week, you’re structuring your meetings perhaps every other week or perhaps even every week to decide on what… 

Sharath: Yeah, about every week. With any studio I’ve advised in the past, and GameHouse works with a number of studios, at least once a week we’d report on a bunch of experiments, whether on the product side, the paid advertising side, growth in general, or even the CRM side. I imagine it’s the same for all the good agencies out there; I know they do something very similar, where once a week you touch base. That’s how most users experience the ads too: people live week to week, Sundays are different from Thursdays, that kind of thing. And I think that works most optimally here as well. 

Shamanth: Cool, yeah. And you did say earlier that a lot of the testing infrastructure on Google Play is very robust, so how do you recommend approaching iOS, if at all? 

Sharath: Yeah, that’s a tough one to give a scientific answer on, because I think it really depends on who you ask. There are companies who will tell you that their results on iOS are totally different from their results on Google, and that might be the case for them.

Typically, in my experience, at least in the US, I don’t often see something that is a killer creative on Google turn out to be a total dud on iOS.

Having said that, everyone’s got to test this out for themselves, because maybe a proxy landing page is better for somebody on iTunes, and maybe just importing Google results to iTunes is better. But what I can say is that once you run this experiment once or twice or even three times, you’ll have your answer. If you have a killer result on Google and you deploy it on Apple and you don’t see a bounce, you know that your Apple audience is different from your Google audience. If you see the same kind of killer bounce on iOS, you know you can disregard the couple of blog posts out there that say it’s completely different, and that for you it’s actually pretty similar.

And I’ve seen both happen, right? Maybe it’s different for everybody, maybe there’s a broad trend that works for somebody in certain segments; I think the only way to know is to test, and if a test has no impact, then, yeah, unfortunately it has no impact. The cost of experimentation on iTunes is certainly higher in terms of dev costs, and then you’re submitting, and then you’re waiting, so there’s a lot of wastage around that. I don’t know. Someday Apple’s going to fix that.

Shamanth: Yeah or not. 

Sharath: Or not, you never know. 

Shamanth: Yeah, yeah. 

Sharath: They have their reasons for doing what they do. But yeah, that’s inevitable. And some people don’t have a philosophical issue with adding one more layer to the funnel for the sake of testing. 

Shamanth: Yeah. 

Sharath: I have issues with that, so I don’t typically use those types of things, but that’s just me. If it works better to test, then it works better to test. 

Shamanth: Got you.

Sharath: I don’t test separately; I do deploy to iTunes, because I’ve found that, for the most part, a giant win on Google for our segment of casual titles translates into a good win on Apple as well. 

Shamanth: Got you. And what metrics and/or KPIs do you use to evaluate results on Google Play when you’re running these AB tests? 

Sharath: I like looking at retained users, especially for icons and headers and even copy and those types of things; I prefer the retained user to just the installed user. And sometimes those are different: certain icons can be clickbaity, certain creatives and end lines can be clickbaity, so you end up losing more people after a day. When it comes to other things like conversion testing or onboarding testing, Google Play has a much more robust seven-day window, and I would rather stick with something like that: a seven-day lookback rather than an install lookback. 

Shamanth: Right. So for people who aren’t familiar, this looks at users who are retained seven days after install? 

Sharath: Exactly, yeah.

Shamanth: Right, whether they’re staying in the app. 

Sharath: Exactly. And I find that some of the shorthand we use is almost confusing on purpose. You see something like D7 in a report or a post and you’re like, what is that? I think that’s a fair point. So if someone downloads on a Monday because you changed something, you want them to stick around for another week after that: do they come back the next Monday?
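To make that D7 definition concrete, here is a small sketch of computing day-7 retention from install dates and activity events. The data and field names are hypothetical, and it uses the Monday-to-Monday reading Sharath describes (active exactly seven days after install); some analytics tools instead count anyone active within the first seven days.

```python
from datetime import date, timedelta

# Hypothetical logs: when each user installed, and (user, date) pairs of active days.
installs = {"u1": date(2020, 3, 2), "u2": date(2020, 3, 2), "u3": date(2020, 3, 2)}
activity = {("u1", date(2020, 3, 9)), ("u3", date(2020, 3, 5))}

def d7_retention(installs, activity):
    """Share of installers who are active exactly seven days after installing."""
    retained = sum(
        1 for user, installed in installs.items()
        if (user, installed + timedelta(days=7)) in activity
    )
    return retained / len(installs)

print(f"D7 retention: {d7_retention(installs, activity):.0%}")  # only u1 -> 33%
```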

Shamanth: Yeah. 

Sharath: Even now, when I meet certain folks and tell them that a day-seven retention benchmark of 20% is a good number, quite often there are a lot of double takes: you can actually live with that? With 100 people installing on Monday, and only 20 of them coming back the next Monday? In the consumer segment, yes, you actually have a good app. 

Shamanth: Yeah, yeah.

Sharath: You actually have more than a good app; you have a pretty solid app at that point. 

Shamanth: Indeed. 

Sharath: But yeah, that’s just the benchmark and that’s how the metrics work. 

Shamanth: Indeed. Yeah, Sharath, I think that makes a lot of sense. This has been great, picking your brain on so many of the things you’ve learned, the interesting, non-obvious things from running 200-plus tests. Thank you for being on the Mobile User Acquisition Show. 

Sharath: Happy to be here. 

Shamanth: Absolutely. Cool.

A REQUEST BEFORE YOU GO

I have a very important favor to ask, which, as those of you who know me know, I don’t do often. If you get any pleasure or inspiration from this episode, could you PLEASE leave a review on your favorite podcasting platform, be it iTunes, Overcast, Spotify, or wherever you get your podcast fix? This podcast is very much a labor of love, and each episode takes many, many hours to put together. When you write a review, it will not only be a great deal of encouragement to us, but it will also support getting the word out about the Mobile User Acquisition Show.

Constructive criticism and suggestions for improvement are welcome, whether on podcasting platforms or by email to shamanth at rocketshiphq.com. We read all reviews, and we want to make this podcast better.

Thank you – and I look forward to seeing you with the next episode!

WANT TO SCALE PROFITABLY IN A POST-IDENTIFIER WORLD?

Get our free newsletter. The Mobile User Acquisition Show is a show by practitioners, for practitioners, featuring insights from the bleeding edge of growth. Our guests are some of the smartest folks we know, working on the hardest problems in growth.