Data Driven Attribution

February 16, 2011

Harley Norrgren studied Statistics at University College London and, after a two-year tenure, now heads up Infectious Media’s Analytics Team. Having been with Infectious throughout its entire working relationship with RTB, he has had the opportunity to watch the space develop since the start of RTB’s European adoption.

Regardless of what you think about attribution modelling, we can all agree that, whilst practically useful, last click models treat attribution as a yes or no decision. Is a model that ignores all exposure¹ history, apart from the ultimate exposure, necessarily painting the most accurate and useful picture of which advertising spend is actually generating conversions?

Thanks to the increasing digitalisation of our industry, exposure level data is readily available for building much more sophisticated and informative models, which can inform more efficient budget allocation and increase ROIs for clients. Using statistical methods we can isolate the uplift in probability of conversion attributable to each individual exposure and therefore assign attribution on an ROI rather than a binary basis, replacing the notion of a single exposure driving a conversion with an “each exposure has a small effect” approach. Furthermore, these models can be implemented at the start of a campaign and updated at predetermined intervals throughout it, providing simultaneous analysis and reactive attribution. This would reward effective advertising, rather than competing media buyers simply gaming the system for what is essentially a random conversion attribution.
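To make the contrast concrete, here is a minimal sketch of binary versus fractional credit for a single conversion path. The channel names and uplift figures are hypothetical, and splitting credit in proportion to estimated uplift is just one simple rule chosen for illustration, not a description of any particular production model.

```python
def last_click_credit(path):
    """Binary attribution: the final exposure takes all of the credit."""
    credit = {channel: 0.0 for channel, _ in path}
    credit[path[-1][0]] = 1.0
    return credit


def uplift_weighted_credit(path):
    """Fractional attribution: share credit in proportion to each
    exposure's estimated uplift on conversion probability.
    Assumes the uplifts are non-negative (an illustrative simplification)."""
    total = sum(uplift for _, uplift in path)
    credit = {}
    for channel, uplift in path:
        # Fall back to an even split if no exposure shows any uplift.
        share = uplift / total if total > 0 else 1.0 / len(path)
        credit[channel] = credit.get(channel, 0.0) + share
    return credit


# Hypothetical path: (channel, estimated uplift in conversion probability)
path = [("display", 0.004), ("search", 0.010), ("retargeting", 0.002)]

print(last_click_credit(path))
print(uplift_weighted_credit(path))
```

In this made-up path the last click model hands everything to retargeting, while the uplift-weighted split gives most of the credit to the exposure with the largest estimated effect.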

Approaches like these do take more time and require more skill to implement than last click models but there are three compelling arguments for their use:

  1. The potential return on investment and increased understanding of the conversion path for clients would outweigh the cost of implementation.
  2. Advertisers are already implementing more complex attribution models for planning purposes.
  3. The data is already available for use and the industry is coming to terms with dealing with data on a daily basis. Change is coming, so why should advertisers wait to be the last in line when they could be taking advantage of this data now?

To illustrate my point I’ve pulled some data from our platform on user behaviour under different retargeting conditions, with a view to comparing it against the behaviour of unexposed users.

We have chosen to examine the effect of aggressive retargeting versus less aggressive retargeting on a user’s propensity to make a return visit within 24 hours of the impression. Both types of retargeting will be compared against the natural return rate, rather than against each other, using a simple chi-squared (χ²) test.

The aggressive retargeting (AR) group will be defined as users exposed to an advert within the same hour as the first site visit.

The less aggressive retargeting (LAR) group will be defined as users exposed to an advert between 24 and 48 hours after the first site visit.

For the AR group the results can be summarised as follows:

Group                           | Didn’t Return | Returned in 24hrs | Return Rate
Didn’t See an Impression        | 268,530       | 26,691            | 0.099397
Saw an Impression Within 1 Hour | 949           | 84                | 0.088514

For the LAR group the results can be summarised as follows:

Group                                     | Didn’t Return | Returned between 24 and 48 hrs | Return Rate
Didn’t See an Impression                  | 268,530       | 8,947                          | 0.033318
Saw an Impression between 24 and 48 hours | 2,456         | 123                            | 0.050081

We found that there was no significant uplift on the return rate (p-value = 0.3354) for the AR group, but there was a significant uplift on the return rate (p-value = 1.327E-5) for the LAR group despite the overall return rate being lower. A last click model would have given attribution to the AR group and suggested that it performed better than the LAR group, but this is plainly not the case. An exposure level attribution model could discriminate between significant and insignificant exposures, assigning attribution where behaviour was actually driven and painting a much more realistic picture of advertising effectiveness.
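For readers who want to rerun the comparison, here is a minimal sketch of a chi-squared test on each 2×2 table, using only the Python standard library. The article does not say which variant of the test was used; this sketch assumes Yates’ continuity correction, and under that assumption the results land close to the quoted p-values. The function name is illustrative.

```python
import math


def chi2_yates_2x2(a, b, c, d):
    """Chi-squared test (1 degree of freedom) on the table [[a, b], [c, d]],
    with Yates' continuity correction (an assumption, see above).
    Returns (statistic, p_value)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Shrink |ad - bc| by n/2 for the continuity correction, floored at zero.
    adjusted = max(abs(a * d - b * c) - n / 2, 0.0)
    stat = n * adjusted ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper tail is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rows: unexposed, exposed. Columns: didn't return, returned.
ar_stat, ar_p = chi2_yates_2x2(268530, 26691, 949, 84)
lar_stat, lar_p = chi2_yates_2x2(268530, 8947, 2456, 123)

print(f"AR:  chi2 = {ar_stat:.3f}, p = {ar_p:.4f}")
print(f"LAR: chi2 = {lar_stat:.3f}, p = {lar_p:.3e}")
```

Note that the Return Rate column in the tables matches returned divided by didn’t return (for example 84 / 949 ≈ 0.0885), rather than returned divided by all users in the group. The test itself works on the raw counts, so this does not affect the p-values.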

Furthermore, tying in campaign setup with attribution model insights generated on the client’s side has become easier than ever before: granular campaign targeting is available through most platforms, so the transition could be easily achieved on the buy side. On the client’s side, implementing and updating a multivariate attribution model based upon potentially billions of rows of data is no simple task. Yet the industry is already starting to rely on big data to inform its advertising decisions, and when clients want the improved results others may be achieving, they’ll be playing catch-up if they don’t start experimenting now. So my advice is to start small, and when the time comes for de facto bespoke attribution modelling you’ll be ready for it.

Of course this is quite a small insight into a vast array of possible analyses that could inform attribution model specification and I’d be keen to hear your opinions on this.

¹ For more information on exposures and interactions, please see Measurement: The Elephant in the Attribution Room where we discuss measurement in attribution models.

Measurement – The Elephant in the (Attribution) Room

January 27, 2011

Attribution is a massive issue right now and there are a number of innovative technology solutions that have been developed to give advertisers the ability to understand how different channels interact with each other. These solutions tend to focus on attributing value to impressions and clicks (interactions) further up the funnel and whilst this is a sensible step, it’s only half the story. It’s no good understanding that Display has a positive impact on Search without knowing how your activity can be altered to improve that impact. This has to start with measurement. We must delve deeper and assess whether these “interactions” were interactions at all.

Online was supposed to be easy. Actual, attributable intent and sales from advertising without having to spend the company’s pension fund on a piece of econometric analysis that you would neither understand nor trust. Last-click-wins gave us a benchmark to measure all our digital activity, allowing us to compare and contrast different channels and strategies in the same way.

But it was broken. Assuming that only the last click (or impression if no clicks were recorded) influenced a user to make a purchase was not just wrong, it was misleading. Yet Display and Affiliate networks made huge sums building CPA businesses on this flawed methodology and when Search exploded on to the scene, no-one seemed to realise that Google had effectively stumbled upon the best exploitation of last-click-wins, using it to build the largest online advertising business in the world. Even now we have Criteo-style product retargeting and Affiliate voucher code sites that often snipe the last click like a seasoned eBay auction bidder, winning the best deals just before the timer runs out.

Times are changing, however. Marketers are now more savvy and are demanding a more accurate solution to the attribution problem. But before this can happen, we need to understand the complexities of online tracking a little more deeply.

When analysing online path to conversion data, you can typically find between 5 and 100 events that may or may not have influenced a customer before conversion. Since most adservers only record clicks and impressions, your conversion path will only include these metrics as events. Herein lies the first problem. Impressions are not ad views. In other words, just because your adserver has recorded an ad call, it doesn’t mean that the ad was actually served. Even if the ad was served, it may not have been seen at all, especially if served below the fold. In essence, although impressions appear more tangible, they are no more accountable than newspaper impacts.

The second problem is clicks. Clicks are seen as the only measure of engagement and intent online. But advertisers should be asking for more. Anyone who has done some click to page landing analysis will know you see anywhere from a 15% to 50% drop-off. If your clickers aren’t even waiting until your landing page loads, how engaged were they? Similarly, what about all the people who read the ad, maybe played with it a little, then returned to what they were doing before?

Clearly not all clicks and impressions are equal and, on their own, they do not provide us with enough information to form robust attribution models. Whilst you can create a model based on your current media activity, all it takes is some additional spend on some cheap, below the fold inventory or some incentivised click sites to skew attribution and devalue the sites that actually do generate true ROI. What is required is more information. Impressions need to be augmented with above / below the fold data, time and position on page and page context / quality. Clicks need to be supplanted by interactions and page landings. With these enhanced metrics we can begin to understand whether our adverts were indeed seen and how they engaged our target audience. Once we understand this, we can start to model it.

Luckily there are now many companies that can provide this data. The likes of AdXpose and Flashtalking can provide interaction data and some of the impression tracking enhancements mentioned above. Page landings can already be recorded using existing pixel technology and many data companies such as Peer39 can provide page context. What stops us from running all these technologies across all campaigns now is the current incremental cost of such solutions, as well as the technical difficulty of integrating this data with standard adserving data in one place.

Of course, having the capability to record this data is only half the story. Storing and analysing it at a time when most companies struggle to store and analyse their click and impression data is arguably a larger issue. Add to this the lack of statistical and analytical skills in most marketing departments, and is it any wonder that marketers hide away from the problem, merely discussing the fact that last-click-wins needs to be improved but having no idea where to start?

Here is where RTB can help. RTB provides an environment that allows any company to exchange data with another, server to server, in order to better understand the impression being served. By providing the APIs to allow companies such as AdXpose, Peer39, etc. to integrate directly with DSPs and adexchanges, the integration problem goes away. Data can still be collected in two separate places but you have a common unique user id to match up the sets. Once integration is solved, costs can come down as more advertisers take up the service, introducing economies of scale.
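As a rough illustration of matching two separately collected data sets on a shared user id, here is a minimal sketch of an inner join in plain Python. The field names (`user_id`, `event`, `above_fold`, `page_context`) and the sample rows are hypothetical and do not reflect any vendor’s actual data format.

```python
from collections import defaultdict


def join_on_user_id(adserver_rows, vendor_rows):
    """Inner join: keep adserver rows that have at least one matching
    vendor row for the same user_id, merging the fields of each pair."""
    vendor_by_user = defaultdict(list)
    for row in vendor_rows:
        vendor_by_user[row["user_id"]].append(row)

    joined = []
    for row in adserver_rows:
        for extra in vendor_by_user.get(row["user_id"], []):
            joined.append({**row, **extra})
    return joined


# Hypothetical adserver log: clicks and impressions only.
adserver_rows = [
    {"user_id": "u1", "event": "impression"},
    {"user_id": "u2", "event": "click"},
    {"user_id": "u3", "event": "impression"},
]

# Hypothetical enrichment data collected separately by a third party.
vendor_rows = [
    {"user_id": "u1", "above_fold": True, "page_context": "sport"},
    {"user_id": "u3", "above_fold": False, "page_context": "news"},
]

for row in join_on_user_id(adserver_rows, vendor_rows):
    print(row)
```

At real volumes this matching would be done inside a database or distributed processing system rather than in memory, but the principle of keying both sets on the same user id is the same.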

This still leaves the data storage and analysis issue, but creating a fast and scalable storage and analysis infrastructure is not as difficult as it used to be. Companies such as Netezza and Greenplum can do it for you for a price. Alternatively, if you can afford the time to investigate and implement open-source platforms, solutions such as Hadoop and InfoBright can work just as well.

2011 is going to be a year where these technologies combine to allow us to better understand all that our advertising is delivering. Soon there will be no excuse for marketers to stick with last-click-wins as we will be able to provide robust attribution models to support or oppose our hypotheses. When this happens, we will not only be able to better understand the value of events leading up to conversion, we will also open up the door to more branding activity being placed online.