Nitay Alon and Yohai Sabag discuss the metrics and methods needed to gauge the long-term effectiveness of campaigns.
Video Transcript
– [Yohai] I’m Yohai, Optimove Chief Data Scientist, and here with me is Nitay Alon, a data scientist in Optimove Research Lab. Now, when it comes to analyzing campaigns, marketers and data analysts usually tend to use a very specific set of tools. They usually base their analysis on the average, they may look at the median, and sometimes they add a few common statistical tests, like a t-test or z-test, to check the difference between the averages, but that’s about it.
And in this session, I would like to share with you a more scientific approach to conducting campaign analysis. The key motivation is to allow you to squeeze more insight out of your campaign data, to actually be able to ask, and get answers to, highly relevant questions about your campaigns.
Now, in order to do so, I will share with you our mindset, how we, as scientists, address this challenge. I will also share some of our secret knowledge about what the data really looks like and which distributions are relevant in this domain, and at the end, we’re going to provide you with a few handy tools that will let you turn this understanding into better analysis.
Okay. So, what is the key motivation? As you all know, a campaign is always born with an objective. It may be to increase conversion, to win back churned customers, to cross-sell, to upsell, whatever. And if the marketer did their job properly and planned the campaign as a scientific experiment, then once the campaign has been executed and data starts coming in, there are many relevant questions we can ask of this data.
There are also many ways, many methods, to approach and conduct this analysis. But as scientists, we never pick the tool for the analysis and only then ask the relevant question. We always do it the other way around. We first ask the question, we see what the question at hand really is, and then we find the right tool, the best tool, to tackle it.
And, what are the relevant questions that I would like you to ask regarding your campaigns? So, first, is the campaign effective? And, if it is effective, on whom is it effective? Should we keep running it? Can we estimate the future performance? Can we estimate how much value we’re going to get next week, next month, in the next execution?
Is it even possible to do that? And what are the alternatives? Are there any alternatives to that campaign? Should we send a different offer from the one we’re sending? Now, when the data starts coming in, instead of narrowing it down to one figure, like the average or the median, we always strive for the bigger picture; we always look at the entire distribution.
Now, by doing this, we are resisting a very… let’s call it human instinct to simplify and aggregate things, which, in general, is a good instinct. But my job here is to convince you that in this case it’s highly beneficial to resist it, because there are many very interesting insights hiding in that distribution.
Now, it would have made all the sense in the world to base the analysis solely on the average and the standard deviation if the data were normally distributed, and people usually think that this is the case, that in most cases the data is distributed normally.
And if that were the case, the average and the standard deviation really would tell the entire story of the distribution: once you know them for a normal distribution, you know what most of your data points, most of your observations, look like. But this is extremely uncommon, not in retail, not in gaming, not in fintech, and definitely not when it comes to campaign analysis.
We almost never see this kind of distribution. What we do see is this. It’s called a long-tail or fat-tail distribution. And here we have very bizarre behavior, because the maximum is sometimes almost as important as the overall sum. We may have one or two data points that are responsible for as much as 95% of the overall sum.
We also don’t have a central mass; we don’t have all, or most, of the observations gathered around the average. And we have a very low probability of seeing… let’s call them extreme values. And in case you’re wondering, just to make things clear, the x-axis here is any monetary value that is relevant to your business.
It may be order value, deposit amount, wager amount, purchase amount, whatever, and the y-axis is the frequency, the number of customers or respondents at that value. Now, you can see that we have to see a lot of observations before we cover the whole distribution, and that makes the average kind of bouncy… it just keeps bouncing until we see what its real value is.
And we don’t always have the time, so the average becomes really problematic to rely on; we can’t rely on it, it’s less informative. In these kinds of distributions, in long-tail distributions, the key to understanding the data, to understanding how it behaves, is the quantiles. We must use the quantiles in order to understand our data.
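To make that “bouncy average” concrete, here is a minimal sketch (my own illustration, not from the talk; it uses a log-normal purely as a stand-in for a long-tailed spend distribution) that tracks the running average as observations arrive:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated "spend" values from a long-tailed (log-normal) distribution.
spend = rng.lognormal(mean=3.0, sigma=1.5, size=2000)

# Running average after each new observation: it jumps whenever a rare,
# extreme spender arrives, and takes a long time to settle.
running_avg = np.cumsum(spend) / np.arange(1, len(spend) + 1)
for n in (20, 100, 500, 2000):
    print(f"after {n:>4} observations: running average ≈ {running_avg[n - 1]:.1f}")

# Quantiles, by contrast, stabilize much faster and are far less
# sensitive to the handful of extreme values in the tail.
print("median of first 100:", np.median(spend[:100]).round(1),
      "| median of all 2000:", np.median(spend).round(1))
```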
Now, I want to provide you with… let’s call it a heuristic, a common method, for dividing your data using the quantiles so you will be able to better understand your distribution. So, again, the x-axis may be any monetary value that is relevant for your business: order value, deposit amount, whatever.
The y-axis is the frequency, the number of customers. Now, let’s divide it into three parts. The red part is the majority, the majority of the data. It may be around 80% of the data, and these customers tend to have a very low value on average. Then, to the right, we have the central bulk.
Okay, you can refer to these customers as the working class; they have a higher value in terms of average, and they may be around 20% of the data, a little less or a little more. And the last part is the respondents located in the tail, the tail customers, the extreme values; we can call them the VIPs.
They are maybe 1% or 2% of the data, it depends, but they tend to have a huge value on average, way higher than all the other parts of the distribution. Now, as you can probably guess, not all parts of the distribution were born equal, and I want to take you through an example that will show you how problematic it is to base your analysis solely on the average.
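Before that example, here is a minimal sketch of this three-way quantile split (my own illustration, not from the talk; the simulated spend values and the 80% / 99% cut points are assumptions matching the segment sizes quoted above):

```python
import numpy as np

rng = np.random.default_rng(3)
spend = rng.lognormal(mean=3.0, sigma=1.5, size=10_000)  # hypothetical spend values

# Cut points for the three segments: bottom 80%, next 19%, top 1%.
q80, q99 = np.quantile(spend, [0.80, 0.99])

majority      = spend[spend <= q80]
working_class = spend[(spend > q80) & (spend <= q99)]
vips          = spend[spend > q99]

for name, segment in [("majority", majority),
                      ("working class", working_class),
                      ("VIPs", vips)]:
    share = len(segment) / len(spend)
    print(f"{name:>13}: {share:5.1%} of customers, average spend ≈ {segment.mean():.1f}")
```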
Now, let’s take as an example a very common baseline, a common case of a long-tail distribution. You can read the figures in the table as any monetary value; for the sake of the example, let’s use order value in U.S. dollars.
So, for the majority, 80% of the data, we have an average value of 10; then 200 for the working class, 19% of the data; and 7,500 for the VIPs, the top 1%. Now, each of the scenarios beneath the baseline refers back to it.
I’ll go through each of them, and you can see that they all produce the same overall average. In the first scenario, we have a decrease of 10% in the average value of the VIPs, from 7,500 to 6,750. In the second scenario, we have a decrease of 20% in the average value of the working class, from 200 to 160, and you can see that it gives us the same overall average.
But the overwhelming fact is that even when we have a total crash in the average of the majority of our data, in the average value of 80% of our data, all the way down from 10 to 0.6, a decrease of 94%, we still get essentially the same overall average as we did with a 10% decrease in the average value of the VIPs.
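As a quick sanity check of that claim, here is the back-of-the-envelope arithmetic, using the 80% / 19% / 1% split quoted above as weights (the slide’s exact segment sizes may differ slightly, which is why the three scenarios land within a fraction of a dollar of one another rather than exactly on the same value):

```python
# Weighted overall average for the baseline and the three scenarios,
# using the segment shares from the talk (80% / 19% / 1%) as weights.
weights = [0.80, 0.19, 0.01]          # majority, working class, VIPs

scenarios = {
    "baseline":           [10.0, 200.0, 7500.0],
    "VIPs -10%":          [10.0, 200.0, 6750.0],
    "working class -20%": [10.0, 160.0, 7500.0],
    "majority -94%":      [0.6,  200.0, 7500.0],
}

for name, means in scenarios.items():
    overall = sum(w * m for w, m in zip(weights, means))
    print(f"{name:>18}: overall average ≈ {overall:.1f}")

# baseline ≈ 121; the three altered scenarios all land around 113.4–113.5,
# even though the underlying changes are completely different.
```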
Now, it’s obvious, it’s crystal clear, that each of these scenarios is different and should trigger a different reaction, a different response, from you, but you couldn’t tell them apart if you based your analysis solely on the average. With that understanding in mind, I will pass the mic to Nitay, and he will provide you with some handy tools that will allow you to turn this understanding into better analysis.
– [Nitay] So, after Yohai set up the kitchen for us, I’m going to show you how to cook some pasta. The idea is that I want to take you through a full cycle of scientific comparative analysis, meaning we start with a question, then we look at the data and decide which parts of it are relevant to that question, select a proper tool from the scientific toolbox, conduct the analysis, and at the end, after we’ve exhausted the analysis, we make an informed decision.
So, as a setting for my examples, I want to portray a very common scenario. You have a recurring campaign, you sent it last week, and now it’s Monday morning. You come back to the office from the weekend, grab a cup of coffee, sit in front of the computer, and look at the results. Heads up: tomorrow, in my scenario, 800 new customers are going to be targeted in this campaign.
So, take two seconds and look at the campaign. First, one thing pops out, a very common thing that we see all the time: the test group is much larger than the control group, meaning that we have higher certainty, more information, about the test group than we have about the control group.
Moreover, look at the averages of the groups. If you look only at the average, you get the feeling that the test is performing better, but if you look at the median, you get a different picture, where the test is not performing as well as the control group.
You might be wasting good customers, allocating them to a campaign where their median spend is lower than the control group’s. So you get two different answers to one single meta-question: “Is the campaign effective overall? Are we doing the right thing? Should we keep this campaign running or not?”
So, let’s try to break the question “Should we keep the campaign running or not?” into three business-related questions. The first one is: is the campaign effective overall? Are we making an effect on each and every one of the customers, causing each and every customer in the test group to spend more? The second: if we’re not affecting the overall population, are we perhaps affecting only one region of it?
It’s very common in CRM, or in marketing, that you only affect one portion of the population. And last, but not least: let’s try to predict the future, to estimate how much money we are expected to see. Why is this important?
Because if we knew beforehand how much money each campaign is about to generate, it would be a very easy decision to make: just keep the profitable campaigns running and kill the ineffective ones. So, the first thing we want to tackle is: is the campaign effective overall? And it’s a very hard thing to do, because we have very, very few data points in the control group.
And, remember from Yohai’s part that when you have very few data points in a long-tail distribution, the location of the true average is very hard to guess because the 22nd responder in the control group will shift the average, either to the right or to the left, with 100% probability. So, we are very uncertain about the true average of the control group, and we are less uncertain about the location of the test group.
So we’ve developed a heuristic at Optimove that we like to call simulated subsampling, and the idea is as follows. Let’s try and insert uncertainty into the true location of the test group, meaning: if we take a small subsample of the test customers, are they outperforming the control group? So, how do you conduct this heuristic?
Well, you take all your test customers, all 108 of them, you put them in a virtual hat, and you draw a small subsample of size 21. Now you have 21 customers from the test group, and you can compute their average. Repeat the process, say, 1,000 or 2,000 times. Now you have a list of plausible test averages. Compare them to the control average.
When the subsample outperforms the control… say you took a subsample and got an average of 300, so it outperforms the control… mark down a “1.” If it’s the other way around, say you took a subsample and got an average of 200, mark a “0.” The proportion of “1”s across all the subsamples gives you a good estimate of the probability.
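Here is a minimal sketch of that subsampling heuristic (my own illustration; the spend arrays, their log-normal shape, and the specific seed are assumptions standing in for the 108 test and roughly 21 control responders mentioned in the talk):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical spend data standing in for the campaign results:
# 108 test responders and a much smaller control group.
test_spend = rng.lognormal(mean=5.0, sigma=1.2, size=108)
control_spend = rng.lognormal(mean=5.0, sigma=1.2, size=21)
control_avg = control_spend.mean()

n_repeats, subsample_size = 2000, 21
wins = 0
for _ in range(n_repeats):
    # Draw a random subsample of test customers and compare its average
    # to the control average; count a "1" whenever the subsample wins.
    subsample = rng.choice(test_spend, size=subsample_size, replace=False)
    wins += subsample.mean() > control_avg

p_test_beats_control = wins / n_repeats
print(f"P(test subsample outperforms control) ≈ {p_test_beats_control:.2f}")
```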
We scientists love answers in terms of probability. We don’t like absolute answers such as “good or bad” or “working or not working”; we like probability. This gives you an estimate of the probability of the test outperforming the control. And if you look at the results in front of you, you get a clear view that the test has less than a 50% chance of beating the control.
So, when it comes to making a decision based on this analysis, the answer is no. You didn’t make a change in the overall population, but that’s okay, because we know that we usually see an effect on only a small portion of the population.
So, let’s try to focus our attention on the working class. Let’s try to remove the noise and increase the signal-to-noise ratio. Now, remember that the average is very sensitive to extreme observations. Remember Yohai’s table demonstration? Nod if you do.
Okay. So, a new extreme observation will change the location of our average even though the majority of our population didn’t change their behavior, and we only want to analyze the majority. So, let’s trim the edges, let’s remove them. I’ll spare you the scissors; they’re quite terrifying. The idea is that we use a method called the trimmed mean, and we remove the extremes.
We’re removing the extremes, and also some of the low spenders, and focusing our attention on the pink region. Okay. Now we can conduct a simple t-test to compare the trimmed means. And if you do this, the t-test comes out significant, meaning that despite the fact that you didn’t make a change in the overall population, you did cause a change in the behavior of the majority of your data.
Now, one important thing to remember: in a long-tail distribution, when you compute a trimmed mean, you don’t trim in standard-deviation units, you trim by quantiles. I’m referring to what Yohai said before, and here the idea is that I trimmed the lower and the upper decile. Last, but not least, let’s try to predict the future.
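Before moving on to prediction, here is a minimal sketch of the decile-trimmed comparison just described (my own illustration; the simulated spend arrays are assumptions, trimming is done at each group’s own 10th and 90th percentiles, and the comparison uses scipy’s standard two-sample t-test):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical spend data standing in for the 108 test and 21 control responders.
test_spend = rng.lognormal(mean=5.2, sigma=1.2, size=108)
control_spend = rng.lognormal(mean=5.0, sigma=1.2, size=21)

def trim_deciles(x):
    """Keep only the observations between the 10th and 90th percentiles."""
    lo, hi = np.quantile(x, [0.10, 0.90])
    return x[(x >= lo) & (x <= hi)]

trimmed_test = trim_deciles(test_spend)
trimmed_control = trim_deciles(control_spend)

# Two-sample t-test on the trimmed samples (Welch's variant, which
# does not assume equal variances).
t_stat, p_value = stats.ttest_ind(trimmed_test, trimmed_control, equal_var=False)
print(f"trimmed test mean    ≈ {trimmed_test.mean():.1f}")
print(f"trimmed control mean ≈ {trimmed_control.mean():.1f}")
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```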
Now, the future is hard to predict. If I had a machine that predicted the future with 100% certainty, I would be in the Caribbean by now. But I don’t have that machine, so we try to approximate the future, to make a forecast. The idea is that we have very few data points, and if we want to explore the entire distribution of the test group, it will be very costly, because, as Yohai said, it will take us a lot of customers and a lot of time to explore the entire distribution.
Right? It might be 10,000 or even 20,000 customers that we’d need to target with this campaign just in the name of exploration, and that is very, very costly. There might be better alternatives for those customers, so we want to find a method that mimics the action of allocating 10,000 new customers, sampling 10,000 new observations from the same distribution that our test data came from.
Now, after years of experimentation, we’ve decided to model the data using the log-normal distribution. You can look it up on Wikipedia later on, but the idea is that it’s a very common distribution for modeling monetary variables. So, voilà, deus ex machina: now we have 10,000 new observations from the same distribution, and we can start asking questions such as, “How much money am I expected to see?”
Let’s assume that you are going to target 800 new customers tomorrow morning. Take a sample of size 800 from this distribution and look at the sum. If you do it repeatedly and then average the sums, in this case you get somewhere in the region of $250,000. Is this a figure that you’re comfortable with?
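Here is a minimal sketch of that forecast (my own illustration; it fits a log-normal to simulated test spend rather than real campaign data, so the dollar figures it prints will not match the $250,000 from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical spend data for the 108 test responders (stand-in for real data).
test_spend = rng.lognormal(mean=5.2, sigma=1.2, size=108)

# Fit a log-normal by estimating the mean and std of log(spend).
mu, sigma = np.log(test_spend).mean(), np.log(test_spend).std(ddof=1)

# Simulate many executions of tomorrow's campaign: 800 customers each,
# drawn from the fitted distribution, and record the total revenue.
n_simulations, n_customers = 5000, 800
totals = rng.lognormal(mean=mu, sigma=sigma,
                       size=(n_simulations, n_customers)).sum(axis=1)

print(f"expected campaign total ≈ {totals.mean():,.0f}")
print(f"middle 80% of simulated totals: "
      f"{np.quantile(totals, 0.10):,.0f} – {np.quantile(totals, 0.90):,.0f}")
```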
Ask yourself this question: “Is this campaign generating enough money?” If the alternative generates only $200,000, then clearly this campaign is doing well, but if the alternative generates $300,000, then you’re wasting your time. Use the simulation to try to predict the future. So, in this session, we took you through our mindset for conducting campaign analysis.
Always begin with a clear business-related question. Don’t be afraid to ask any question you want. Try to break it down into smaller questions, still business-related. Look at the data. Select the region of the data that tells you the most about the question at hand. Select the proper tool: simulation, trimmed mean, subsampling.
And only after you’ve exhausted the scientific analysis should you make an informed decision. Now, all the methods I’ve presented today are available on the Optimove Connect site for your use. Thank you all for listening.