Director of Demand Generation & Content at Thrive Internet Marketing Agency
Incrementality across Amazon, Walmart, and Instacart works best when measured at the SKU x retailer level using a Synthetic Control approach. Instead of leaning on last-click logic, this method builds a counterfactual for each advertised SKU within each retail network, using a weighted blend of similar SKUs that were not advertised during the same window. Those controls come from the same retailer, same category, similar price bands, seasonality, and historical velocity. Because the comparison lives inside each retail ecosystem, the read reflects true demand lift rather than media proximity to conversion, and it stays insulated from regional quirks that often distort geo tests.

This structure avoids geo bias because it never relies on ZIP codes or regional holdouts where retailer coverage, delivery speed, or store density varies. Every SKU competes against its own synthetic twin drawn from national sales patterns inside the same platform. That makes Amazon results comparable to Walmart or Instacart without forcing artificial geographic splits that never behave the same across networks. The outcome is a clean answer to one question advertisers care about: what sales would have happened anyway for this exact product at this exact retailer.

One example from our agency involved a packaged food brand running always-on sponsored placements across Amazon and Walmart. Last-click reports suggested strong performance on both, yet the synthetic control showed a different story. Amazon delivered a 14 percent incremental lift at the SKU level, while Walmart landed closer to 4 percent, with most volume reflecting baseline demand. Budget allocation changed the following quarter, favoring Amazon for conquesting and Walmart for coverage, and total incremental revenue rose without increasing spend.
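To make the mechanics concrete, here is a minimal sketch of how the synthetic-twin weights might be fit, assuming weekly unit sales for the advertised SKU and a donor pool of non-advertised SKUs from the same retailer and category; the function names and data shapes are illustrative, not the agency's actual pipeline.

```python
# Minimal synthetic-control sketch (illustrative, not the actual production pipeline).
# Assumes weekly unit sales for one advertised SKU and a pool of non-advertised
# "donor" SKUs from the same retailer, category, and price band.
import numpy as np
from scipy.optimize import minimize

def fit_synthetic_control(donor_pre: np.ndarray, target_pre: np.ndarray) -> np.ndarray:
    """Find non-negative donor weights (summing to 1) that best reproduce the
    advertised SKU's pre-campaign sales. donor_pre has shape (weeks, n_donors)."""
    n = donor_pre.shape[1]
    loss = lambda w: np.sum((donor_pre @ w - target_pre) ** 2)
    cons = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]
    res = minimize(loss, x0=np.full(n, 1.0 / n), bounds=[(0, 1)] * n, constraints=cons)
    return res.x

def incremental_lift(weights: np.ndarray, donor_post: np.ndarray, target_post: np.ndarray):
    """Compare actual campaign-window sales to the synthetic twin's counterfactual."""
    counterfactual = donor_post @ weights            # what the SKU "would have sold"
    lift = target_post.sum() - counterfactual.sum()
    return lift, lift / counterfactual.sum()         # absolute and percent incremental lift
```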
The only way we've found to get real incrementality on retail media is to stop trusting platform-reported last click and force some kind of control into the system. For Amazon, Walmart, and Instacart, we lean on geo splits or time-based holdouts where a portion of demand is intentionally left unexposed, even if it makes people nervous. One test that gave us confidence was pausing retail ads in a handful of matched regions while keeping everything else constant, then watching what happened to total sales, not just attributed sales. What surprised people was that some "top-performing" campaigns barely moved the needle once you removed last-click bias. The big lesson was that incrementality shows up in lift against a baseline, not in a dashboard that's paid to take credit. If you're not willing to let a slice of demand go dark temporarily, you're guessing, not measuring.
In retail media, a big challenge is understanding whether ads actually create new sales, or whether they just take credit for sales that would have happened anyway. Many strategies still rely on "last click" reporting, which gives all the credit to the final ad someone clicks before buying. This often makes results look better than they truly are, especially for brand search ads. To avoid this, we focus on incrementality: would this sale still have happened if we had not shown the ad? We usually use 2 strategies for this:

1. Full Customer Journey
Instead of only looking at the final click, we analyze the full customer journey. Using Amazon Marketing Cloud (AMC), we can connect ad impressions (who saw which ads) and purchase data (who actually bought). This allows us to understand what happened before the purchase, not just the final interaction. We then compare two groups of shoppers:
Group A: Shoppers who first saw an upper-funnel ad (such as Sponsored Display or Streaming TV), later searched for the brand, and then bought the product
Group B: Shoppers who only searched for the brand (and then bought) but did not see those earlier ads
If Group A converts at a higher rate than Group B, it indicates that the earlier ad helped create demand. This approach gives us a much more realistic view of performance than standard ROAS metrics.

2. Using "New-to-Brand" as a Practical Signal
Both Amazon and Walmart provide New-to-Brand (NTB) metrics, which show whether a customer is buying from a brand for the first time. While NTB is not a perfect measure of incrementality, it is a very useful indicator. For example, a campaign may show strong ROAS, but if 90% of sales come from existing customers, it is likely low-incremental. Campaigns that drive a healthy share of new customers are generally much more incremental and valuable for long-term growth.

Hope this helps! Cheers, Moritz
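As a concrete illustration of the Group A vs. Group B comparison in the answer above, here is a minimal pandas sketch; the column names (saw_upper_funnel, searched_brand, purchased, new_to_brand) are hypothetical stand-ins for whatever a clean-room query actually returns, not AMC's own schema.

```python
# Illustrative sketch of the exposed-vs-unexposed comparison (not actual AMC SQL).
# Assumes a dataframe exported from a clean-room query, one row per shopper, with
# hypothetical boolean columns: saw_upper_funnel, searched_brand, purchased, new_to_brand.
import pandas as pd

def compare_groups(df: pd.DataFrame) -> pd.DataFrame:
    searchers = df[df["searched_brand"]]                  # everyone who searched the brand
    group_a = searchers[searchers["saw_upper_funnel"]]    # saw an upper-funnel ad first
    group_b = searchers[~searchers["saw_upper_funnel"]]   # searched without prior exposure
    rows = []
    for name, g in [("A: exposed then searched", group_a), ("B: search only", group_b)]:
        rows.append({
            "group": name,
            "shoppers": len(g),
            "conversion_rate": g["purchased"].mean(),
            "ntb_share": g.loc[g["purchased"], "new_to_brand"].mean(),
        })
    return pd.DataFrame(rows)

# If Group A's conversion_rate is meaningfully higher than Group B's, the upper-funnel
# exposure is likely creating demand rather than just claiming credit for it.
```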
To understand the incrementality of our spend on retail media channels without last-click attribution bias, we conduct geo-holdout or audience-holdout experiments, which means we intentionally avoid showing our ads to a certain area or audience. One example that gave us a strong read was on the Instacart platform. We identified zip codes with similar order volumes and demographics, withheld spend from the control group, and ran our campaign in the other geographies. For two weeks, we measured both sales inside the platform and the total incremental lift in product movement in those geographies using third-party measurement. The incremental increase in the exposed geographies versus stable, flat sales in the holdout group gave us a strong read on the true impact of our ads. Still, this is not the only way we measure attribution: when we combine this data with modeled attribution from the platforms, we get a balanced and fair perspective.
We measure incrementality by removing last-click from the equation entirely and relying on controlled tests—most often geo holdouts and audience splits—run in parallel across Amazon, Walmart, and Instacart. The goal is to compare outcomes where retail media is present versus absent, holding everything else constant. One test that gave us a confident read was a matched-market geo holdout on Amazon and Walmart: we paused retail media in a statistically similar set of DMAs while keeping pricing, promotions, and national media unchanged. We measured lift on total sales (not attributed sales), brand search, and repeat purchase in test vs. control markets over four weeks. The delta told us the true incremental impact, and it was materially lower than last-click reports—but far more reliable for budget decisions. The key is treating retail media like any other upper-to-mid-funnel channel: take the time to prove lift with controls, and use platform attribution only for in-flight optimization rather than letting it claim the credit.
When I started digging seriously into incrementality across retail media networks, I realized pretty quickly that last-click reporting was telling a comforting story, not a true one. Platforms like Amazon, Walmart, and Instacart all do a great job showing you what happened after an ad was clicked, but very little about what would have happened anyway. As a founder, that gap always bothered me because it leads to confident decisions built on shaky ground.

One experiment that really changed how I think about this came from a retail brand we worked with at NerDAI that was scaling aggressively across multiple retail media networks. Instead of asking "which channel drove the sale," we asked a harder question: "what actually moved the needle?" We designed a geo-based holdout test where a small but statistically meaningful set of regions had retail media spend reduced or paused entirely, while everything else stayed the same, including pricing, promotions, and organic placements.

What surprised us was how modest the true incremental lift was compared to what last-click reports showed. Some campaigns that looked incredible on-platform barely moved total sales when we zoomed out. Others, especially upper-funnel sponsored placements, showed strong incremental lift even though their click-through metrics looked average. That was the moment it clicked for me that incrementality is about comparison, not attribution.

The key was discipline. We resisted the urge to optimize mid-test, let the experiment run long enough to smooth out noise, and focused on blended outcomes like total sales lift and new-to-brand penetration. That gave us confidence to reallocate budget based on real impact, not just dashboard performance.

Today, when I talk about measuring incrementality, I always emphasize this: if your test design can't answer "what happens when we don't run the ads," you're not measuring incrementality, you're just redistributing credit. That mindset shift has been one of the most valuable lessons I've carried across retail, ecommerce, and marketplace-driven brands.
For us, the only way to get a confident read on incrementality across retail media networks was to move away from platform-reported attribution and design our own holdout tests. Relying on last-click in Amazon or Walmart almost always overstates impact, especially when you already have strong brand demand. One experiment that worked well was a geo-based holdout on Amazon. We paused sponsored product and display ads in a defined set of low-variance regions for a few weeks while keeping everything else constant, including pricing, inventory, and promotions. We then compared total sales lift, not ad-attributed sales, against matched control regions where ads continued running. The difference between baseline organic sales and exposed sales gave us a much cleaner incrementality signal. In some cases, we found that only about 60-70% of reported ad sales were truly incremental. The biggest lesson was to look at blended outcomes. We focused on total revenue, new-to-brand customers, and category share rather than ROAS alone. My advice is to start small with controlled pauses or geo splits, document assumptions clearly, and repeat tests regularly. Incrementality isn't a one-time answer, it's something you validate continuously as platforms, competition, and demand shift.
Measuring incrementality across Walmart, Amazon, and Instacart without last-click bias means shifting from attribution to experiment design.

The Methodology:
Clean rooms: Use AMC to join impression data with retail sales, identifying new-to-brand shoppers who haven't purchased in 12 months.
Geo testing: Divide regions into test and control groups. This captures "halo effects" and organic cannibalisation that click tracking misses.

Case study:
Design: Recently we ran a 4-week branded keyword holdout. Test: continued bidding on brand terms. Control: paused all branded spend in 10 specific DMAs.
Result: Organic listings recaptured 70% of the lost paid traffic, meaning only 30% of branded ad sales were truly incremental.
Outcome: We reallocated that 70% waste to category-level keywords, which delivered a 15% higher total sales lift by acquiring new customers rather than paying for existing ones.
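The back-of-the-envelope version of that read, assuming the organic recapture rate can be observed directly in the holdout DMAs (the figures below simply mirror the example above and are not real client data):

```python
# Back-of-the-envelope branded-holdout math, mirroring the example above.
# Assumes you can observe paid-attributed branded sales before the pause and
# total branded sales during the pause in the holdout DMAs.
paid_branded_sales_before = 100_000   # weekly paid-attributed branded sales (hypothetical)
organic_recapture_rate = 0.70         # share of those sales that reappeared organically

truly_incremental = paid_branded_sales_before * (1 - organic_recapture_rate)
print(f"Incremental share of branded ad sales: {1 - organic_recapture_rate:.0%}")  # 30%
print(f"Incremental branded revenue per week: {truly_incremental:,.0f}")            # 30,000
```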
When I needed a clean read on incrementality across Amazon and Walmart, I stopped trusting last-click reports and ran a simple geo holdout test. We picked matched cities based on past sales, turned off retail media in 17% of them for four weeks, and kept spend steady in the rest. What surprised me was that Amazon reporting showed a strong lift everywhere, but the holdout markets only dropped 9% in total sales, not the 28% the dashboard implied. On Walmart, the gap was even clearer, with only an 11% difference between exposed and unexposed regions. That contrast gave me confidence about what was truly incremental. The test worked because it compared real buying behavior, not clicks, and it changed internal conversations from "which network gets credit" to "which spend actually moves demand."
Retail media campaigns sometimes looked stronger than they really were if judged only by last-click sales. To measure true incrementality, a simple A/B holdout test worked best. Half of the stores or zip codes received the campaign—Amazon Sponsored Ads or Walmart display—while the other half stayed dark. All other variables, like email and social, stayed the same. One test on Instacart display ads showed that regions exposed to the campaign saw a 19% lift in category sales versus the holdout. Only 7% of that lift could be explained by last-click attribution, showing how far off the prior last-click models had been. The experiment confirmed that controlled holdouts, rather than just tracking clicks, gave a clear read on what advertising actually drove incremental sales, and helped guide where to spend the next marketing dollar.
Incrementality becomes visible once measurement stops running the race of attribution and starts isolating absence. The cleanest read came from geo-based suppression, not model tuning. Matched markets were selected based on historical sales velocity, basket size, and seasonality. Sponsored placements were halted in the test markets for four weeks while budgets remained flat everywhere else. Organic rank, price, and availability were held constant to avoid confounding factors. The comparison was on net revenue change, not click paths. Scale by SEO saw a clear signal: Amazon and Walmart showed lift nineteen percent greater than the suppressed spend, confirming incremental demand, while Instacart showed near-zero lift, with sales shifting channels rather than expanding. That insight changed budget allocation at once. Confidence came from the restraint: no blended dashboards, no probabilistic crediting. The experiment rested on real absence and real dollars. The essential discipline is patience. Short tests favour platforms that cycle fast and penalize slower ones, and a minimum of three weeks is needed for purchase behavior to settle. Incrementality is not hidden. It only disappears from view when everything is running everywhere all at once.
The only way to understand incremental impact is to think like an experimenter. We set up geo or audience-level test and control groups for each network and measure the lift instead of looking at last-click data. For example, on Amazon we used the Brand Lift beta to randomly suppress our sponsored product ads to a fraction of shoppers and compared sales and new-to-brand orders against the exposed cohort. On Instacart we ran a matched-market experiment, turning off our display placements in a handful of ZIP codes while leaving them on in similar ZIP codes; after normalising for baseline sales trends we could see the incremental contribution of Instacart ads. Across networks we also use media mix modelling to triangulate the effect of each channel. The common thread is setting aside a clean control group, aligning KPIs across networks and measuring uplift versus baseline. This avoids over-attributing to last click and gives our team a confident read on true incremental value.
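For the media-mix-modelling piece mentioned above, a minimal regression-style sketch might look like the following, assuming a weekly panel of total sales and spend per network; the column names are hypothetical, and a production MMM would add adstock and saturation terms.

```python
# Minimal media-mix-style triangulation sketch (illustrative only).
# Assumes a weekly dataframe with hypothetical columns: sales, amazon_spend,
# walmart_spend, instacart_spend, and week_of_year as a crude seasonality control.
import pandas as pd
import statsmodels.api as sm

def triangulate(df: pd.DataFrame):
    X = df[["amazon_spend", "walmart_spend", "instacart_spend", "week_of_year"]]
    X = sm.add_constant(X)
    model = sm.OLS(df["sales"], X).fit()
    # Each spend coefficient approximates incremental sales per dollar on that network,
    # which can then be cross-checked against the geo and audience holdout reads.
    return model.params, model.conf_int()
```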
When it comes to measuring true incrementality on Amazon or Walmart, first you need to get past the platform-reported metrics, which are biased toward last-click attribution. Many of the shoppers those ads reach are already in market with high purchase intent, so you need to separate the sales that happen because of your ads from sales that would have happened regardless. You need to prove causation, not correlation. One rock-solid way to do this is a geo-based holdout experiment. For one client, we first scored historic marketplace sales to find a set of statistically similar markets, along with customer profiles for the product. We took half of them and made them the "test" markets, where we ran our retail media campaigns as-is. The other half became the "control" or "holdout" group, where we turned off ad spend altogether for the same product. By holding those markets dark, measuring total sales lift (not ad sales), and comparing it to baseline sales over a 30-day period, we were able to confidently calculate the incremental revenue generated by the campaign. The key is having the data infrastructure that lets you measure total sales at a geo level, cleanly isolating your media spend impact from organic.
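A minimal sketch of that lift calculation, assuming daily total sales (paid plus organic) per market and hypothetical column names, might look like this:

```python
# Minimal sketch of the geo-holdout lift read (illustrative, not the exact client setup).
# Assumes daily total sales per market with hypothetical columns:
# market, group ("test" or "control"), period ("baseline" or "campaign"), sales.
import pandas as pd

def geo_holdout_lift(df: pd.DataFrame) -> float:
    totals = df.groupby(["group", "period"])["sales"].sum()
    # Index each group's campaign-period sales to its own baseline so that
    # pre-existing differences between the matched markets cancel out.
    test_index = totals[("test", "campaign")] / totals[("test", "baseline")]
    control_index = totals[("control", "campaign")] / totals[("control", "baseline")]
    return test_index / control_index - 1.0   # incremental lift attributable to media
```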
We avoid last-click bias by running geo-split holdout tests across retail media networks. One experiment that gave us high confidence was a matched-market test where we paused retail ads (Amazon + Walmart) in a small set of comparable DMAs while keeping pricing, promotions, and distribution identical. We measured baseline organic + direct sales lift versus exposed markets, not ROAS inside the ad platforms. The key was using new-to-brand rate, total category sales, and repeat velocity as primary KPIs, then validating results over 4-6 weeks to smooth promo noise. That read showed true incremental lift was ~25-30% lower than platform-reported ROAS, but far more durable.
We measure incrementality by isolating holdout or geo-split tests rather than relying on platform-reported attribution. One experiment that gave us confidence was pausing retail media in a matched set of regions while keeping pricing, distribution, and promotions constant, then comparing baseline sales lift versus exposed regions. The key lesson was that true incrementality shows up in net new sales and repeat rate, not last-click ROAS, so controlled testing beats attribution models every time.
To get a real sense of incrementality, I don't look at last-click data. I run holdout tests, where we pause retail media in a few spots while keeping spend steady everywhere else. When we did this for Amazon, we saw a clear sales dip in the test markets, which proved the lift was real. It's not a perfect method, but if you want to know if your ads are actually driving extra sales, simple holdouts give you the clearest answer.
I do SEO and PPC. It's always tricky to know if ads are actually bringing in new sales or if people would have bought anyway. So we ran a test. For a local retailer, we shut off ads in specific zip codes and just watched what happened to sales. It made explaining results to clients way easier because we could show the real lift. Even a basic test like this tells you a lot.
Geographic holdouts yielded the clearest signal without relying on platform-reported attribution. During one retail push for Equipoise products, matched DMAs were paired based on previous velocity, household income, and baseline conversion rates. Paid media ran only in the test markets, while the control markets stayed dark for four weeks. Pricing, promotions, and distribution remained the same in both groups. The read was net lift in unit sales and new-to-brand rate, not ROAS. Test markets showed a nine percent lift in units and a six-point increase in new buyers over control. Amazon and Walmart dashboards both overstated impact by close to two times when we looked at last click; the holdout told a quieter, more believable story. Instacart needed a shorter window because of its faster purchase cycles, so the same design ran for fourteen days with tighter matching, and the results still held directionally. The key discipline was resisting platform-level optimization mid-test: no bid changes, no creative swaps. Stability protected the signal. Incrementality becomes apparent when the noise is removed and the comparisons stay fair.
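One way to express that kind of read is a simple difference-in-differences on units and new-to-brand rate; the dataframe layout and column names below are hypothetical, not the actual reporting setup.

```python
# Illustrative difference-in-differences read on units and new-to-brand (NTB) rate.
# Assumes weekly aggregates per DMA with hypothetical columns:
# group ("test"/"control"), period ("pre"/"during"), units, ntb_orders, orders.
import pandas as pd

def did_read(df: pd.DataFrame) -> dict:
    g = df.groupby(["group", "period"]).agg(units=("units", "sum"),
                                            ntb=("ntb_orders", "sum"),
                                            orders=("orders", "sum"))

    def unit_change(group: str) -> float:
        pre, during = g.loc[(group, "pre"), "units"], g.loc[(group, "during"), "units"]
        return (during - pre) / pre

    def ntb_rate(group: str, period: str) -> float:
        return g.loc[(group, period), "ntb"] / g.loc[(group, period), "orders"]

    # Test-vs-control deltas: percent unit lift and NTB percentage-point gain.
    unit_lift = unit_change("test") - unit_change("control")
    ntb_gain = (ntb_rate("test", "during") - ntb_rate("test", "pre")) - \
               (ntb_rate("control", "during") - ntb_rate("control", "pre"))
    return {"incremental_unit_lift": unit_lift, "ntb_point_gain": ntb_gain}
```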
Here's what actually works. We shut off all ad spend in specific zip codes for a month, then compared sales to areas where we were still running ads. That gave us the real story, not the misleading one you get from just counting last-click conversions. If you need to prove your stuff is working, run a test in a few controlled regions. It's a simple setup but the results don't lie.
In our agency, we measure incrementality across retail media networks such as Amazon, Walmart, and Instacart through ad stock decay calibration per RMN. Each network trains shoppers to respond on a different timeline, so impressions do not vanish at the moment of click or purchase. We model a decay curve for each retailer at the SKU level, estimating how long media influence lingers and how fast it fades, then attribute sales only within that realistic window rather than crediting a single touch.

This approach works because Amazon search ads often drive fast response, while Walmart and Instacart show longer consideration tied to replenishment habits. Calibrating decay per RMN respects those differences and strips out inflation that last click reporting loves to claim. It also creates a common measurement language across networks, since every sale gets weighted according to remaining media influence, not proximity to checkout.

One example came from a household essentials client running sponsored media on Amazon and Instacart. Click reports painted both networks as top performers, yet ad stock calibration told a cleaner story. Amazon showed a short decay with an incremental lift near 11 percent, while Instacart revealed a slower fade and a true lift close to 6 percent, guiding smarter spend levels and steadier growth without chasing phantom credit.
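A minimal sketch of that decay calibration, assuming weekly impressions and SKU-level sales per retailer; the decay grid, function names, and selection criterion are illustrative rather than a specific vendor API.

```python
# Illustrative adstock sketch: a geometric decay transform per retailer, with the
# decay rate chosen by how well the adstocked impressions explain SKU-level sales.
import numpy as np

def adstock(impressions: np.ndarray, decay: float) -> np.ndarray:
    """Carry a share of each week's impressions forward into following weeks."""
    out = np.zeros_like(impressions, dtype=float)
    carry = 0.0
    for t, x in enumerate(impressions):
        carry = x + decay * carry
        out[t] = carry
    return out

def calibrate_decay(impressions: np.ndarray, sales: np.ndarray) -> float:
    """Pick the decay rate whose adstocked series correlates best with sales."""
    candidates = np.linspace(0.0, 0.9, 10)
    corrs = [np.corrcoef(adstock(impressions, d), sales)[0, 1] for d in candidates]
    return float(candidates[int(np.argmax(corrs))])

# A fast-responding retailer (search-driven) yields a small decay rate, while a
# replenishment-driven retailer yields a larger one, so each network's sales are
# credited only within its own realistic influence window.
```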