Ad Fraud Detection in Microseconds
Most fraud detection happens after the money is spent. Ours happens before the bid is placed.
The ad fraud industry steals approximately $84 billion per year from advertisers. The detection industry that exists to fight it operates on a model that guarantees it will always be too late: detect fraud after the impression is served, then request a refund.
This is absurd. The advertiser's money is already spent. The fraudster already has it. The refund process takes weeks, assumes the SSP cooperates, and recovers a fraction of the loss. Post-bid fraud detection is an insurance claim filed after the house has burned down.
We detect fraud before we bid. Our Zig scoring engine evaluates 47 fraud signals in under 200 microseconds per request. If the request looks fraudulent, we do not bid. The money never leaves.
The Cost of Post-Bid Detection
The standard approach to ad fraud detection in the industry works like this: a DSP buys an impression, serves an ad, and fires a verification pixel. The verification vendor (DoubleVerify, IAS, MOAT) analyzes the impression after the fact and flags it as fraudulent or valid. The DSP then files an invalid traffic (IVT) claim with the SSP for a credit.
The problems with this approach:
The money is already committed. When we submit a bid and win, we owe the SSP the bid price. The verification result arrives 200-500ms after the impression. By then, the bid has cleared, the budget has been debited, and the fraudster's page has registered the impression. The IVT claim process can take 30-90 days for resolution. During that time, the advertiser's budget is depleted by fraudulent impressions, reducing their reach to real users.
Recovery rates are poor. In our analysis of IVT claims across three SSPs in Southeast Asia during 2025, the average recovery rate was 62%. That means 38% of confirmed fraudulent spend was never refunded. The reasons range from contractual limitations to disputes over fraud classification to SSPs that simply do not process claims efficiently.
It distorts optimization. Our bid optimization models learn from impression data. If 8% of impressions are fraudulent (the industry average for display in Southeast Asia), the model is learning from poisoned data. Fraudulent impressions have different click and conversion patterns than real impressions. The model adjusts its bidding behavior based on fraudulent signals, which compounds the problem.
Verification adds cost. Third-party verification services charge $0.01-0.05 per thousand impressions measured. At our scale of 400 million daily impressions (roughly 146 billion per year), that is $1.5-7 million per year for the privilege of being told after the fact that we wasted money.
Pre-Bid Fraud Detection
Our approach is the opposite. Every bid request is scored for fraud before we decide whether to bid. If the fraud score exceeds our threshold, we skip the auction entirely. Zero spend on fraudulent inventory. Zero IVT claims to file. Zero verification vendor fees.
The constraint is speed. Our total bid pipeline runs in under 5ms. We cannot add 50ms of fraud analysis. The fraud scoring engine must be fast enough to be invisible in our latency budget.
Our Zig fraud scoring engine evaluates 47 signals in 180 microseconds (0.18ms). That is 3.6% of our total bid processing time. For context, a single Redis lookup across a network boundary takes 20-50x longer.
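The gate itself is conceptually simple: combine per-signal scores into one fraud score and skip the auction when it crosses a threshold. A minimal Python sketch of that shape follows; the signal names, weights, and the 0.7 cutoff are illustrative placeholders, not our production values, and the real engine is Zig, not Python.

```python
# Hypothetical pre-bid gate: weighted blend of signal scores vs. a threshold.
# All names and numbers here are illustrative, not the production configuration.

FRAUD_THRESHOLD = 0.7  # assumed cutoff; the real threshold is not disclosed

def fraud_score(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-signal scores, each in [0, 1]."""
    total_weight = sum(weights.values())
    return sum(signals[name] * weights[name] for name in signals) / total_weight

def should_bid(signals: dict[str, float], weights: dict[str, float],
               threshold: float = FRAUD_THRESHOLD) -> bool:
    """Bid only when the combined fraud score stays under the threshold."""
    return fraud_score(signals, weights) < threshold

# Illustrative signal vectors for a clean request and a bot-like request.
weights = {"device_entropy": 2.0, "ip_reputation": 3.0, "request_velocity": 1.0}
clean   = {"device_entropy": 0.1, "ip_reputation": 0.0, "request_velocity": 0.2}
botty   = {"device_entropy": 0.9, "ip_reputation": 1.0, "request_velocity": 0.8}
```

The important property is that the decision is a pure function of already-gathered signals: no network calls, no blocking, nothing that can blow the latency budget.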
The 47 Signals
Our fraud signals fall into five categories. We are deliberately vague about the exact signals and their weights because specificity helps fraudsters. But the categories and representative examples are instructive.
Device and Environment Signals (12 signals)
We analyze the device information in the bid request for internal consistency. A bid request claiming to be from an iPhone 15 running iOS 17 with a screen resolution of 1920x1080 is lying. iPhones do not have that resolution. A request from a "Samsung Galaxy S24" with a User-Agent string containing "Linux x86_64" is a desktop machine pretending to be mobile.
These signals are evaluated as entropy scores. A real device has consistent properties. A spoofed device has properties drawn from different distributions that do not co-occur naturally. We maintain a device property database of 8,400 device models with their valid property ranges, compiled from our own impression data. The lookup is a hash table probe: 0.02ms.
Device fingerprint entropy is particularly powerful. We compute a Shannon entropy score over the device properties in the bid request. Real devices have low entropy because their properties are deterministic (a specific model has a specific screen size, OS version range, and browser version range). Spoofed devices have high entropy because the fraudster randomizes properties independently, producing combinations that are individually valid but collectively improbable. Our entropy threshold catches 34% of all fraud we detect.
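The per-model consistency check reduces to a lookup against known valid property sets. A toy Python sketch, assuming a two-model database (the real one covers 8,400 models and many more properties):

```python
# Illustrative device-consistency check. DEVICE_DB is a tiny stand-in for the
# 8,400-model property database described above.

DEVICE_DB = {
    "iPhone 15":  {"os": {"iOS 17", "iOS 18"},  "resolution": {"1179x2556"}},
    "Galaxy S24": {"os": {"Android 14"},        "resolution": {"1080x2340"}},
}

def device_consistent(model: str, os: str, resolution: str) -> bool:
    """True when the claimed properties co-occur for this model."""
    props = DEVICE_DB.get(model)
    if props is None:
        return False  # unknown model: treat as suspicious
    return os in props["os"] and resolution in props["resolution"]
```

The entropy signal generalizes this: instead of a hard pass/fail per property, it scores how improbable the whole property combination is under the observed joint distribution.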
Behavioral Signals (9 signals)
We track request patterns per IP address, device ID, and publisher using sliding-window counters in our Zig database. The signals include:
- Request velocity: More than 40 bid requests per minute from a single device is not human browsing behavior. It is a bot refreshing ad slots.
- Session pattern: Real users have variable inter-request intervals. Bots have uniform intervals. We measure the coefficient of variation of inter-request timing. A CV below 0.15 over 20+ requests is mechanical.
- Inventory cycling: A single device generating bid requests across 50+ different publisher domains in 5 minutes is not a user with many browser tabs. It is a bot rotating through a list of compromised sites.
These counters are maintained in AnokuroDB with sub-millisecond read latency. Each behavioral signal is a single point-read: 0.06ms per lookup, and we batch them into a single multi-get that completes in 0.09ms for all 9 signals.
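The session-pattern signal is the easiest of the three to show concretely. A sketch of the coefficient-of-variation check, using the thresholds stated above (CV below 0.15 over 20+ requests):

```python
from statistics import mean, pstdev

CV_THRESHOLD = 0.15    # below this over 20+ requests, timing looks mechanical

def is_mechanical(timestamps: list[float]) -> bool:
    """Flag uniform inter-request timing: coefficient of variation < 0.15."""
    if len(timestamps) < 21:              # need 20+ intervals to judge
        return False
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    m = mean(intervals)
    if m == 0:
        return True                       # zero-gap bursts are not human either
    return pstdev(intervals) / m < CV_THRESHOLD
```

A bot firing exactly once per second has CV 0 and trips the check; a human with irregular gaps between requests does not.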
IP Reputation Signals (8 signals)
We maintain a reputation database of 240 million IP addresses, updated continuously from our bid stream and augmented with data from public threat intelligence feeds. The database is stored as a Bloom filter hierarchy that fits in 380MB of RAM.
The hierarchy works like this: we maintain separate Bloom filters for different fraud categories (data center IPs, known botnet command-and-control ranges, residential proxy exits, VPN endpoints). Each filter has a false positive rate of 0.1%. We probe all filters in parallel. A single Bloom filter probe takes 0.003ms. Eight probes take 0.024ms.
The IP reputation system is where our database investment pays off. The 240 million IP entries with their fraud scores, last-seen timestamps, and category classifications live in AnokuroDB. When a new IP appears (not in the Bloom filter), we do a full database lookup in 0.7ms and update the Bloom filter. For known IPs (98.3% of traffic), the Bloom filter serves the answer without touching the database.
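The category filters are independent, so probing them is embarrassingly parallel. A compact Python sketch of one filter per fraud category (sizes and hash counts here are toy values; the production hierarchy fits 240 million IPs in 380MB):

```python
import hashlib

class Bloom:
    """Tiny Bloom filter sketch; sizes are illustrative, not production-scale."""
    def __init__(self, size_bits: int = 1 << 16, hashes: int = 3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

# One filter per fraud category, probed together as in the hierarchy above.
filters = {name: Bloom()
           for name in ("datacenter", "botnet_c2", "resi_proxy", "vpn")}
filters["datacenter"].add("203.0.113.7")

def ip_categories(ip: str) -> set[str]:
    """Return every category filter that claims this IP (may false-positive)."""
    return {name for name, f in filters.items() if ip in f}
```

A filter can answer "definitely not present" with certainty; a positive answer carries the 0.1% false-positive risk, which is why a hit feeds into a score rather than an outright block.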
Traffic Anomaly Signals (11 signals)
These signals detect fraud at the publisher level rather than the device level. A legitimate publisher has traffic patterns that follow predictable distributions: higher traffic during business hours, lower on weekends, geographic concentration matching the publisher's audience.
We compute rolling statistics per publisher: request volume by hour, geographic distribution of requests, device type distribution, and viewability metrics. When a publisher's traffic pattern deviates from its baseline by more than 3 standard deviations on any dimension, the anomaly signals fire.
The most effective anomaly signal is geographic entropy shift. A publisher targeting Indonesian audiences should have 80%+ of traffic from Indonesian IPs. If that drops to 40% over a 6-hour window while total traffic increases, the publisher is likely buying bot traffic to inflate their inventory. We have caught 23 publisher fraud operations in Southeast Asia using this signal alone.
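The bought-bot-traffic pattern is detectable with two comparisons against the publisher baseline: home-geo share collapses while total volume surges. A sketch, with the trigger thresholds (a 25-point share drop, a 1.5x volume rise) chosen for illustration rather than taken from the production rules:

```python
def geo_shift_anomaly(baseline_share: float, window_share: float,
                      baseline_volume: float, window_volume: float,
                      share_drop: float = 0.25,
                      volume_rise: float = 1.5) -> bool:
    """Flag the bought-traffic pattern: home-geo share falls sharply while
    total request volume rises. Thresholds are illustrative assumptions."""
    return (baseline_share - window_share >= share_drop
            and window_volume >= volume_rise * baseline_volume)
```

An Indonesian publisher sliding from 80% to 40% Indonesian traffic while volume climbs trips the check; normal day-to-day drift does not.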
Creative Interaction Signals (7 signals)
These signals analyze patterns in how users interact with our ads after they are served. While this is technically post-impression data, we use it to update fraud scores for future bid requests from the same sources.
Click timing distribution, click coordinates (are all clicks in the exact same pixel?), post-click bounce rate, and conversion pattern analysis all feed back into the scoring engine. A publisher where 95% of clicks occur within 200ms of ad render and all clicks land on the same coordinate is running click injection. The feedback loop updates the publisher's fraud score within 2 seconds, suppressing future bids.
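The click-injection pattern combines two ratios: clicks that land impossibly fast, and clicks that all land on the same pixel. A sketch that blends the two into one score; the 200ms cutoff mirrors the pattern described above, while averaging the two ratios is an illustrative choice, not the production formula:

```python
from collections import Counter

def click_injection_score(clicks: list[tuple[float, int, int]]) -> float:
    """clicks: (ms_after_render, x, y). Returns 0..1; high = injection-like.
    The 50/50 blend of the two ratios is an illustrative assumption."""
    if not clicks:
        return 0.0
    # Ratio of clicks arriving faster than a human could react.
    fast = sum(1 for t, _, _ in clicks if t < 200) / len(clicks)
    # Ratio of clicks landing on the single most common coordinate.
    (_, top) = Counter((x, y) for _, x, y in clicks).most_common(1)[0]
    same_spot = top / len(clicks)
    return (fast + same_spot) / 2
```

Twenty clicks at 50ms on the same pixel score near 1.0; organically scattered clicks arriving seconds after render score near 0.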
The Data Pipeline
The fraud scoring engine depends on data freshness. A fraudster who spun up a bot farm 10 minutes ago should be detected within minutes, not hours. Our data pipeline is designed for this.
Every bid request we process, whether we bid or not, updates the fraud signal database. Device properties, IP addresses, behavioral counters, and publisher traffic stats are written to AnokuroDB in the same request path. The write is asynchronous (fire-and-forget to io_uring) and takes 0.02ms of the request thread's time.
AnokuroDB's memtable is immediately readable. There is no write-ahead-log-to-query delay. A fraudulent IP address flagged at time T is available for scoring at time T+0. The behavioral counters use atomic increments in the lock-free skip list, so concurrent bid processing threads see updates immediately without locking.
The Bloom filters for IP reputation are rebuilt every 60 seconds from the latest database state. This is the longest delay in our pipeline: a brand-new fraudulent IP takes up to 60 seconds to enter the Bloom filter. During that window, it is caught by the full database lookup path (0.7ms instead of 0.024ms, but still within budget).
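The two-tier lookup path can be sketched as a fast set probe with a slow-path fallback that queues new IPs for the next rebuild. The dict and sets below are plain stand-ins for the Bloom tier and AnokuroDB; the function names are hypothetical:

```python
# Stand-ins for the real tiers: a set for the Bloom filters, a dict for the
# AnokuroDB score table, a set for IPs awaiting the next 60-second rebuild.
known_ips: set[str] = set()
ip_store: dict[str, float] = {}
pending_rebuild: set[str] = set()

def lookup_score(ip: str) -> float:
    """Fast path for known IPs; full-store fallback plus rebuild queue for new ones."""
    if ip in known_ips:                       # ~0.024ms path in production
        return ip_store[ip]
    # Slow path (~0.7ms in production): consult the full store, queue the IP
    # so the next rebuild folds it into the fast tier.
    score = ip_store.setdefault(ip, 0.0)      # new IPs start at a neutral score
    pending_rebuild.add(ip)
    return score

def rebuild_filters() -> None:
    """Runs every 60 seconds in production; promotes queued IPs to the fast tier."""
    known_ips.update(pending_rebuild)
    pending_rebuild.clear()
```

The worst case is bounded: an IP pays the slow path at most until the next rebuild, and every lookup, fast or slow, stays inside the 5ms budget.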
Results
We have been running pre-bid fraud detection in production for 11 months. The numbers:
Detection rate: 96.3%. We measure this by running our pre-bid system in parallel with a post-bid verification vendor on a 5% sample of traffic. Our pre-bid system would have blocked 96.3% of the impressions the verification vendor flags as fraudulent. The remaining 3.7% are sophisticated fraud patterns that our signals do not yet cover, primarily SSAI (server-side ad insertion) fraud in CTV inventory.
False positive rate: 0.02%. Of impressions our system blocks, we estimate 0.02% are legitimate traffic incorrectly classified as fraudulent. We measure this by analyzing blocked requests that match legitimate device profiles, come from residential IPs with clean histories, and originate from publishers with no anomaly signals. The 0.02% false positive rate means we lose approximately $180 per day in legitimate bidding opportunities. We will take that trade.
Budget saved: 11.4% of total spend. Before pre-bid fraud detection, 11.4% of our ad spend went to fraudulent inventory (based on post-bid verification data). That spend is now zero. For our current monthly spend volume, this represents approximately $2.1 million per month that stays in advertisers' budgets instead of funding fraud operations.
Verification vendor cost eliminated: $0. We no longer use third-party verification services for fraud detection. We maintain a small verification sample (5% of traffic) for calibration purposes, but the bulk verification spend is gone. Annual savings: approximately $3.2 million.
Optimization model accuracy improved. With fraudulent impressions removed from the training data, our bid optimization models converged 40% faster to optimal bid prices and showed 8% improvement in conversion prediction accuracy. Clean data makes better models. This is obvious in retrospect but difficult to achieve when 11% of your data is adversarial.
Why 200 Microseconds Matters
Every ad-tech company claims to do fraud detection. The difference is where in the pipeline it happens and how fast it runs.
If fraud detection is a post-bid service, it is a cost center that partially recovers losses. If it is a pre-bid check that adds 50ms to your pipeline, it is a latency tax that costs you auction wins. If it is a 180-microsecond inline evaluation, it is free. It costs less time than JSON parsing.
We built the fraud engine in Zig for the same reason we built everything in Zig: because the alternative was too slow. A Python fraud scoring service with a Redis backend takes 8-15ms per evaluation. That is the industry standard. It is also 44-83x slower than what we ship. At 200,000 requests per second, that difference is the difference between pre-bid and post-bid. Between prevention and insurance. Between keeping the money and filing a claim.
The fraudsters are getting more sophisticated. The detection systems need to be faster, not just smarter. We are both.