Programmatic Advertising Is a Systems Problem, Not a Marketing Problem
The companies winning in ad-tech are not the ones with better algorithms. They are the ones with faster systems.
There is a dirty secret in programmatic advertising that nobody talks about at conferences. The companies that win the most auctions are not the ones with the best machine learning models. They are not the ones with the richest data. They are the ones whose systems respond fastest.
In a 100ms auction, the fastest bidder has a structural advantage that no algorithm can overcome. We have the production data to prove it.
The 100ms Auction
Here is how an OpenRTB auction works. A user loads a webpage. The publisher's ad server sends a bid request to the supply-side platform. The SSP fans out that request to 5-20 demand-side platforms. Each DSP has 100ms to respond with a bid. The SSP collects all bids that arrive within the deadline, runs a first-price auction, and the winner's ad is rendered.
The critical detail: bids that arrive after 100ms are discarded. Not penalized. Discarded. A $50 CPM bid that arrives at 101ms loses to a $0.10 CPM bid that arrived at 80ms. The exchange does not care about your bid quality if your system is slow.
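The deadline rule is simple enough to state in code. A minimal sketch of a first-price auction with a hard timeout (the types and names here are invented for illustration, not any exchange's real implementation):

```c
typedef struct {
    double cpm;        /* bid price in dollars per thousand impressions */
    double latency_ms; /* time the bid took to arrive at the SSP */
} Bid;

/* First-price auction with a hard deadline: returns the index of the
 * winning bid, or -1 if every bid missed the deadline. */
int run_auction(const Bid *bids, int n, double deadline_ms) {
    int winner = -1;
    for (int i = 0; i < n; i++) {
        if (bids[i].latency_ms > deadline_ms)
            continue; /* late bids are discarded outright, not penalized */
        if (winner == -1 || bids[i].cpm > bids[winner].cpm)
            winner = i;
    }
    return winner;
}
```

Run against the example above, the $50 bid at 101ms never enters consideration, so the $0.10 bid at 80ms wins.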
We analyzed 180 million auction outcomes from three major SSPs operating in Southeast Asia. The data is unambiguous: DSPs that consistently respond in the first quartile of response times win 31% more auctions than DSPs in the fourth quartile, controlling for bid price. The fast bidder wins ties. The fast bidder gets considered for impressions where the slow bidder times out entirely. The fast bidder has more time to compute a better bid because their overhead is lower.
Speed is not an optimization. Speed is the product.
How Latency Compounds
Most ad-tech engineers have never profiled their bid pipeline end to end. When they do, they discover that latency does not come from one slow component. It comes from everywhere.
Here is the anatomy of a typical DSP bid request, with timings from a well-known platform we benchmarked during a consulting engagement (they declined to be named):
| Stage | Time |
|-------|------|
| TLS termination and HTTP parsing | 1.2ms |
| OpenRTB request deserialization (JSON) | 3.8ms |
| User lookup (Redis cluster, cross-AZ) | 4.1ms |
| Feature assembly for ML model | 2.3ms |
| Model inference (TensorFlow Serving, gRPC) | 12.6ms |
| Bid calculation and business rules | 1.4ms |
| Creative selection (another service call) | 6.2ms |
| Frequency cap check (another Redis call) | 3.1ms |
| Fraud scoring (external API) | 8.4ms |
| Response serialization (JSON) | 1.8ms |
| Total | 44.9ms |
Add 20-40ms of network transit in Southeast Asia and you are at 65-85ms. That leaves 15-35ms of slack before timeout. One garbage collection pause, one network retry, one slow Redis shard, and the bid is dead.
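The budget arithmetic is worth making explicit. A small helper (illustrative only) that sums the stage timings from the table above and computes the slack remaining before the deadline:

```c
#include <stddef.h>

/* Stage timings from the table above, in milliseconds. */
static const double STAGES[] = {1.2, 3.8, 4.1, 2.3, 12.6,
                                1.4, 6.2, 3.1, 8.4, 1.8};

/* Total in-process pipeline time. */
double pipeline_ms(void) {
    double total = 0.0;
    for (size_t i = 0; i < sizeof STAGES / sizeof STAGES[0]; i++)
        total += STAGES[i];
    return total;
}

/* Slack left before the deadline after network transit is paid. */
double slack_ms(double deadline_ms, double network_ms) {
    return deadline_ms - pipeline_ms() - network_ms;
}
```

With a 100ms deadline and 40ms of worst-case transit, the slack comes out around 15ms, thin enough that a single 15ms GC pause consumes it entirely.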
This is not a bad system. This is a typical system. Most DSPs we have examined look like this or worse. The ML inference step alone consumes more time than our entire pipeline.
Why Most Stacks Are Slow
Three architectural decisions account for 80% of the latency in typical ad-tech stacks.
The JVM. Java and Scala dominate ad-tech infrastructure because the ecosystem is mature and engineers are abundant. But the JVM's garbage collector is fundamentally incompatible with latency-sensitive workloads. A G1GC pause of 15ms is considered good by JVM standards. In a 100ms auction, it is 15% of your budget gone to memory management. ZGC and Shenandoah reduce pause times but introduce throughput overhead and do not eliminate pauses entirely. We have seen production JVM bid engines with P99 latencies exceeding 60ms due to GC, while P50 sits at a reasonable 12ms. The P99 is what kills you because the SSP does not care about your median.
Python on the critical path. The standard ad-tech ML stack is: train models in Python, serve models in Python (or a Python-wrapped C++ serving layer like TensorFlow Serving). The model inference step in the table above is 12.6ms. That is not because the model is complex. It is a gradient-boosted tree with 200 features. The latency comes from gRPC serialization, Python's GIL, feature vector construction in NumPy, and the overhead of crossing the Python-C boundary hundreds of times per request. A hand-optimized decision tree in Zig evaluates the same model in 0.08ms. That is not a typo. 0.08ms versus 12.6ms. A 157x improvement by eliminating the abstraction layers.
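To make the contrast concrete, here is the shape of a tree evaluator once the abstraction layers are stripped away: the whole model is a flat array walked with integer indices, with no RPC, no GIL, and no framework between the request and the math. This is a toy two-level tree in C (our production code is Zig; the features, splits, and values here are invented):

```c
typedef struct {
    int   feature;     /* index into the feature vector; -1 marks a leaf */
    float threshold;   /* split point */
    int   left, right; /* child node indices */
    float value;       /* leaf output, used when feature == -1 */
} Node;

/* features[0] = user_age, features[1] = hour_of_day (illustrative). */
static const Node tree[] = {
    { 0, 30.0f, 1, 2, 0.0f },  /* node 0: age < 30 ? */
    { 1, 20.0f, 3, 4, 0.0f },  /* node 1: hour < 20 ? */
    {-1,  0.0f, 0, 0, 0.12f }, /* node 2: leaf */
    {-1,  0.0f, 0, 0, 0.45f }, /* node 3: leaf */
    {-1,  0.0f, 0, 0, 0.71f }, /* node 4: leaf */
};

/* Walk the tree: a handful of comparisons and array reads per request. */
float eval_tree(const float *features) {
    int i = 0;
    while (tree[i].feature >= 0)
        i = features[tree[i].feature] < tree[i].threshold
                ? tree[i].left : tree[i].right;
    return tree[i].value;
}
```

A real gradient-boosted model is hundreds of these trees summed, but the per-tree cost stays at a few cache-resident loads, which is where the microsecond-scale evaluation comes from.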
Too many microservice hops. The table above contains four network calls: user lookup, model inference, creative selection, and fraud scoring. Each network call adds serialization overhead, TCP round-trip time, potential queuing delay, and a failure mode. The microservice architecture that makes deployments easy makes latency hard. Every network boundary is a latency tax that compounds under load.
The Anokuro Approach
Our entire bid pipeline runs in a single process on a single machine. There are zero network calls on the hot path.
Here is our pipeline on identical hardware:
| Stage | Time |
|-------|------|
| TCP accept + TLS (io_uring) | 0.3ms |
| OpenRTB deserialization (binary codec) | 0.4ms |
| User segment lookup (in-process hash table) | 0.06ms |
| Feature assembly (pre-computed, memory-mapped) | 0.1ms |
| Model evaluation (compiled decision tree) | 0.08ms |
| Bid calculation + business rules | 0.12ms |
| Creative selection (in-process index) | 0.09ms |
| Frequency cap check (local Bloom filter) | 0.04ms |
| Fraud scoring (inline signal evaluation) | 0.18ms |
| Response serialization (binary codec) | 0.2ms |
| Total | 1.57ms |
Under load at 200,000 requests per second, the P99 rises to 4.8ms. There is no garbage collector to spike it. There is no network call to add variance. The P99/P50 ratio is 2.9x, compared with the 5x or more we typically see in JVM-based systems.
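The frequency-cap row in the table above relies on a classic trick: a Bloom filter answering "has this (user, campaign) pair hit its cap?" with no false negatives. A miss means definitely not capped; a hit means probably capped, and the rare false positive costs us one impression rather than a cap violation. A minimal sketch in C (filter size, hash choice, and key format are all illustrative):

```c
#include <stdint.h>

#define BLOOM_BITS (1u << 20) /* 1 Mbit = 128KB, lives in-process */

static uint8_t bloom[BLOOM_BITS / 8];

/* Seeded FNV-1a, used as a cheap family of hash functions. */
static uint32_t bloom_hash(const char *key, uint32_t seed) {
    uint32_t h = 2166136261u ^ seed;
    for (; *key; key++) { h ^= (uint8_t)*key; h *= 16777619u; }
    return h % BLOOM_BITS;
}

/* Record that a (user, campaign) key has reached its cap. */
void bloom_add(const char *key) {
    for (uint32_t s = 0; s < 3; s++) {
        uint32_t b = bloom_hash(key, s);
        bloom[b / 8] |= (uint8_t)(1u << (b % 8));
    }
}

/* 1 = possibly capped, 0 = definitely not capped. */
int bloom_maybe_capped(const char *key) {
    for (uint32_t s = 0; s < 3; s++) {
        uint32_t b = bloom_hash(key, s);
        if (!(bloom[b / 8] & (1u << (b % 8))))
            return 0;
    }
    return 1;
}
```

One plausible division of labor, and the one this sketch assumes, is that counting up to the cap happens off the hot path; the filter only memoizes the boolean outcome for the bid path to read.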
The key architectural decisions:
Vertical integration. Every component that touches the hot path lives in the same address space. User segments are memory-mapped from AnokuroDB. The ML model is compiled to a Zig function at build time using our model compiler. Creative metadata is an in-process index rebuilt from the Gleam coordination layer every 2 seconds. Fraud signals are evaluated inline. There is nothing to call because everything is here.
Binary protocols everywhere. We do not parse JSON on the hot path. Our SSP integrations use a custom binary codec that maps OpenRTB fields to fixed offsets in a byte buffer. Deserialization is a bounds check and a pointer cast, not a parse tree walk. This is ugly and it is fast. We maintain compatibility by writing codec generators in TypeScript that produce Zig deserialization code from OpenRTB schema definitions.
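A sketch of what a fixed-offset codec looks like, in C with an invented wire layout (our production codecs are generated Zig, and the real field set comes from the OpenRTB schema). The `memcpy` at a fixed offset is the alignment-safe spelling of the pointer cast; the compiler reduces it to a plain load. Note this sketch assumes both sides agree on byte order, which a real codec would pin explicitly:

```c
#include <stdint.h>
#include <string.h>

#define WIRE_LEN 20u /* fixed frame size agreed with the SSP (illustrative) */

typedef struct {
    uint64_t auction_id;
    uint32_t floor_micros; /* price floor in micro-dollars */
    uint32_t width, height;
} BidRequest;

/* Serialize to the agreed fixed offsets. */
void encode(const BidRequest *in, uint8_t *buf) {
    memcpy(buf + 0,  &in->auction_id,   8);
    memcpy(buf + 8,  &in->floor_micros, 4);
    memcpy(buf + 12, &in->width,        4);
    memcpy(buf + 16, &in->height,       4);
}

/* Deserialize: one bounds check, then field reads at known offsets.
 * Returns 0 on success, -1 if the buffer is too short. */
int decode(const uint8_t *buf, size_t len, BidRequest *out) {
    if (len < WIRE_LEN)
        return -1; /* single bounds check for the whole frame */
    memcpy(&out->auction_id,   buf + 0,  8);
    memcpy(&out->floor_micros, buf + 8,  4);
    memcpy(&out->width,        buf + 12, 4);
    memcpy(&out->height,       buf + 16, 4);
    return 0;
}
```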
No garbage collector, no allocator on the hot path. Each bid request is processed using a pre-allocated arena. The arena is 8KB, sized to the maximum request we will process. When the request completes, the arena resets with a single pointer write. Zero memory is allocated or freed during bid processing. This eliminates allocation as a source of latency variance.
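The arena pattern is simple enough to show in full. A minimal bump arena in C (the production version is Zig; sizes and names here are illustrative):

```c
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE 8192 /* 8KB, sized to the largest request */

typedef struct {
    uint8_t buf[ARENA_SIZE];
    size_t  used;
} Arena;

/* Bump-allocate n bytes, 8-byte aligned; NULL if the request overflows
 * the arena (which should never happen if ARENA_SIZE is sized right). */
void *arena_alloc(Arena *a, size_t n) {
    size_t aligned = (a->used + 7) & ~(size_t)7;
    if (aligned + n > ARENA_SIZE)
        return NULL;
    a->used = aligned + n;
    return a->buf + aligned;
}

/* Reset between requests: one store, no free list, no fragmentation. */
void arena_reset(Arena *a) { a->used = 0; }
```

Allocation is a bump and a bounds check; reset is a single store, so per-request memory cost is constant no matter how many allocations the request made.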
Model compilation, not model serving. We do not serve ML models. We compile them. Our TypeScript toolchain takes a trained gradient-boosted tree (trained in Python, because the training ecosystem is genuinely good) and generates a Zig function that evaluates the tree with no branches, no pointer chasing, and no framework overhead. The generated function is ~400 lines of Zig with pre-computed lookup tables. It evaluates in 0.08ms because it is doing exactly the math and nothing else.
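The branch-free claim deserves illustration. When a complete tree is stored in level order, each comparison can produce a 0 or 1 that selects the child index arithmetically, so the walk contains no data-dependent branches at all. A toy depth-3 example in C (every node value below is invented; the real generator emits Zig with pre-computed lookup tables):

```c
#define DEPTH 3

/* Internal nodes 0..6 in level order: feature index and threshold each. */
static const int   feat[7] = {0, 1, 1, 0, 2, 2, 0};
static const float thr[7]  = {30.f, 20.f, 5.f, 18.f, 0.5f, 0.9f, 55.f};
/* 8 leaves, one per root-to-leaf path, at implicit indices 7..14. */
static const float leaf[8] = {0.02f, 0.08f, 0.11f, 0.19f,
                              0.27f, 0.41f, 0.55f, 0.71f};

/* Branch-free walk: the comparison result is data, not control flow. */
float eval(const float *x) {
    int i = 0;
    for (int d = 0; d < DEPTH; d++)
        i = 2 * i + 1 + (x[feat[i]] >= thr[i]);
    return leaf[i - 7]; /* leaves occupy indices 7..14 */
}
```

Because the loop bound is a compile-time constant and the body has no branches to mispredict, the evaluation cost is a fixed handful of loads and arithmetic per tree.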
The Business Case
We track the correlation between our response latency and auction win rate on a per-exchange, per-region basis. The data from Q4 2025:
For every 1ms of response latency reduction on Indonesian inventory, our win rate increased by 0.6%. We reduced our median response time by 12ms when we moved to edge nodes. That corresponded to a 7.2% win rate improvement on Indonesian inventory alone. At our daily spend levels in Indonesia, that translates to $4,100 per day in additional winning bids at equivalent or better CPMs.
The relationship is not perfectly linear. Below about 3ms response time, the returns diminish because we are already in the fastest cohort and the marginal gain from being faster shrinks. But between 5ms and 20ms, every millisecond is worth real money.
This is why we say programmatic advertising is a systems problem. The algorithm decides what to bid. The system decides whether your bid is even heard. You can have the most sophisticated targeting model in the industry, and if your P99 latency causes you to miss 8% of auctions, you are leaving 8% of your potential revenue on the floor.
We chose Zig because it lets us build systems that are fast by default. We chose vertical integration because every network hop is a latency gamble. We chose to build our own database because every general-purpose database we tested was too slow.
The companies that treat ad-tech as a marketing problem hire data scientists and buy bigger GPU clusters. The companies that treat it as a systems problem hire systems engineers and measure microseconds. We know which approach wins more auctions.