Real-Time Bidding at the Edge: Why Centralized Ad Servers Are Dead
We moved our bid engine to edge nodes across Southeast Asia. Latency dropped 73%. Win rates climbed.
A programmatic auction gives you 100 milliseconds. That is the entire window between receiving a bid request and returning a response. If your response arrives at 101ms, it is discarded. You do not lose on price. You are not even in the room.
For the first year of Anokuro's operation, our bid engine ran in a single data center in Singapore. It worked. Singapore has excellent connectivity to the rest of Southeast Asia, and our Zig bid engine processes a complete auction in under 8ms. The network was the problem.
A bid request originating from a Jakarta publisher takes 28-34ms to reach Singapore. The response takes another 28-34ms back. That is 56-68ms of pure network transit, leaving us 32-44ms for everything else. We were winning auctions, but we were winning them with a hand tied behind our back.
So we moved the bid engine to the edge. Latency dropped 73%. Win rates climbed 18%.
The Problem with Centralized RTB
The real-time bidding ecosystem was designed in an era when ad-tech companies operated from one or two data centers in Northern Virginia or Amsterdam. The protocol assumes a world where everyone is close to the exchange. In Southeast Asia, that assumption is catastrophically wrong.
Here are the one-way transit times we measured from major SSP exchanges to our Singapore data center:
- Jakarta: 28-34ms
- Bangkok: 18-22ms
- Ho Chi Minh City: 24-30ms
- Manila: 32-40ms
- Kuala Lumpur: 4-8ms
When the auction timeout is 100ms and round-trip network transit consumes as much as 68ms of it, you are not competing on bid quality. You are competing on geography. A competitor with a server in Jakarta has roughly 60ms more compute budget than we do for Indonesian inventory. That is not a minor advantage. That is a structural one.
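The geography tax is simple arithmetic. A quick sketch (Python here purely for illustration) derives the remaining compute budget per region from the measured figures, read as one-way transit times as in the Jakarta breakdown earlier:

```python
# Remaining compute budget per region inside a 100ms auction window.
# Assumption: the measured figures are one-way transit times, so a
# round trip (request in, response out) costs twice that.
AUCTION_TIMEOUT_MS = 100

ONE_WAY_MS = {
    "Jakarta": (28, 34),
    "Bangkok": (18, 22),
    "Ho Chi Minh City": (24, 30),
    "Manila": (32, 40),
    "Kuala Lumpur": (4, 8),
}

def compute_budget_ms(city: str) -> tuple:
    """Worst-case and best-case milliseconds left for the bid pipeline."""
    lo, hi = ONE_WAY_MS[city]
    return AUCTION_TIMEOUT_MS - 2 * hi, AUCTION_TIMEOUT_MS - 2 * lo

for city in ONE_WAY_MS:
    worst, best = compute_budget_ms(city)
    print(f"{city}: {worst}-{best}ms left for compute")
```

For Jakarta this yields the 32-44ms window described above; for Manila the worst case is tighter still.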
The Edge Architecture
We now run lightweight Zig bid engines at 11 edge locations across Southeast Asia. Each edge node is a single bare-metal server with 64GB RAM and NVMe storage, running our complete bid pipeline: request parsing, user segment lookup, frequency cap check, bid calculation, creative selection, and response serialization.
The edge bid engine is a 4.2MB statically linked binary. It starts in 120ms. There is no JVM warmup, no container orchestration, no sidecar proxies. The binary is the server.
Each edge node connects to a Gleam coordination layer that handles the operations that require global consistency: budget management, pacing, and campaign configuration distribution. The coordination layer runs on three servers in Singapore. The edge nodes do not talk to each other directly.
The deployment model is deliberately simple. We use rsync to push new binaries to edge nodes. We do not use Kubernetes. We do not use service meshes. The edge nodes are stateless except for a local cache of user segments and campaign rules. If an edge node dies, traffic fails over to the next-closest node via DNS. Recovery time is under 30 seconds.
State at the Edge
The hard problem is not running compute at the edge. The hard problem is state.
A bid engine needs to answer questions that require shared state: How much budget remains for campaign X? How many times has user Y seen this creative today? What targeting rules apply to this inventory? In a centralized architecture, you query a single database. At the edge, that database is 30ms away, which defeats the entire purpose.
We solved this with three strategies, each tuned to the consistency requirement of the data it manages.
Budget counters are eventually consistent. Each edge node maintains a local budget counter per campaign. Every 500ms, edge nodes report their spend to the Gleam coordinator, which reconciles totals and pushes updated remaining budgets back. This means an edge node might overshoot a campaign budget by up to 500ms worth of spend before it learns the campaign is exhausted. In practice, for a campaign spending $50 per hour across 11 nodes, the maximum overshoot is about $0.007. We account for this in our billing reconciliation. Advertisers do not notice: no ad server delivers budgets exactly, and the precision they actually expect is plus or minus 1-2%.
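The local counter is simple enough to sketch. This is Python for illustration only (the production engine is Zig and the coordinator is Gleam), and the class and method names are invented:

```python
import threading

RECONCILE_INTERVAL_S = 0.5  # edge nodes report spend every 500ms

class LocalBudget:
    """Per-campaign budget counter on one edge node.

    Spend is applied locally and immediately; the authoritative
    remaining budget only arrives from the coordinator at reconcile
    time, so a node can overshoot by at most one interval of spend.
    """

    def __init__(self, remaining: float):
        self._lock = threading.Lock()
        self.remaining = remaining
        self.unreported_spend = 0.0

    def try_spend(self, price: float) -> bool:
        """Bid only if the campaign does not look exhausted locally."""
        with self._lock:
            if self.remaining < price:
                return False
            self.remaining -= price
            self.unreported_spend += price
            return True

    def drain_report(self) -> float:
        """Called every RECONCILE_INTERVAL_S; returns the delta to send."""
        with self._lock:
            delta, self.unreported_spend = self.unreported_spend, 0.0
            return delta

    def apply_reconciled(self, remaining: float) -> None:
        """Coordinator pushes back the globally reconciled remaining budget."""
        with self._lock:
            self.remaining = remaining
```

The overshoot bound falls out of the interval: at $50 per hour, one 500ms window of unreported spend is 50 / 3600 × 0.5 ≈ $0.007.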
Frequency caps are enforced locally with periodic sync. Each edge node maintains a local Bloom filter of recent impressions per user-campaign pair. The false positive rate is 0.8%, meaning we occasionally suppress an ad we could have shown. We prefer this to the alternative: showing an ad one too many times. The Bloom filters are synchronized across nodes every 2 seconds via the coordinator. For frequency caps of 3 impressions per day, the 2-second sync window introduces negligible drift. We validated this against our centralized system's logs: the edge implementation matches within 0.3% of the centralized frequency cap decisions.
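The probe itself looks like this. Note this is a plain membership Bloom filter, which only answers "seen at least once"; a cap of 3 per day needs a counting variant or one filter per count, and the sizes and hash scheme below are invented, not our production parameters:

```python
import hashlib

class FrequencyBloom:
    """Bloom filter over (user_id, campaign_id) impression events.

    A false positive suppresses an ad we could have shown, which is
    the failure mode we prefer over exceeding a frequency cap.
    """

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 7):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, user_id: str, campaign_id: str):
        # Derive k independent bit positions by salting one hash.
        key = f"{user_id}:{campaign_id}".encode()
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(key, salt=i.to_bytes(8, "little")).digest()
            yield int.from_bytes(digest[:8], "little") % self.size

    def record_impression(self, user_id: str, campaign_id: str) -> None:
        for pos in self._positions(user_id, campaign_id):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def maybe_seen(self, user_id: str, campaign_id: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(user_id, campaign_id))
```

Syncing filters across nodes is cheap because the bit arrays can be merged with a bitwise OR.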
Targeting rules are pre-distributed. Campaign targeting configurations (geo, device, context, audience segments) are pushed to all edge nodes whenever a campaign is created or modified. This happens through the Gleam coordinator's pub-sub system. Propagation latency is under 200ms to all 11 nodes. Since campaign changes happen at human timescales (minutes to hours), 200ms propagation is invisible.
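Pre-distribution pays off because the targeting check becomes a pure in-memory evaluation with no network hop. A hedged sketch, with invented field names and dimensions:

```python
from dataclasses import dataclass, field

@dataclass
class TargetingRule:
    """Per-campaign targeting, pushed to every edge node on change.

    Illustrative only: field names and the request shape are invented.
    An empty set means "no constraint on this dimension".
    """
    geo: set = field(default_factory=set)       # ISO country codes
    devices: set = field(default_factory=set)   # e.g. {"mobile", "desktop"}
    segments: set = field(default_factory=set)  # audience segment IDs

    def matches(self, request: dict) -> bool:
        if self.geo and request.get("geo") not in self.geo:
            return False
        if self.devices and request.get("device") not in self.devices:
            return False
        if self.segments and self.segments.isdisjoint(request.get("segments", set())):
            return False
        return True
```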
The Gleam Coordinator
We chose Gleam for the coordination layer because budget management is a concurrency problem, not a computation problem. The coordinator handles 11 edge nodes reporting spend data, campaign managers updating budgets, and pacing algorithms adjusting bid shading, all concurrently.
Gleam runs on the BEAM VM, which gives us lightweight processes, fault isolation, and message passing without shared memory. A budget reconciliation crash for one campaign does not affect any other campaign. The let-it-crash philosophy is genuinely useful here, not just a talking point.
The coordinator exposes a binary protocol to edge nodes, not HTTP. Each message is a fixed-size struct: 8 bytes for campaign ID, 8 bytes for spend delta, 8 bytes for timestamp. At 11 nodes reporting every 500ms, the coordinator processes 22 messages per second per campaign. For our current scale of 4,200 active campaigns, that is 92,400 messages per second. The BEAM handles this without breaking a sweat.
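The wire format can be pictured like this. The three 8-byte fields are from the text; the endianness and units (micro-dollar spend deltas, millisecond timestamps) are my assumptions, sketched in Python for illustration:

```python
import struct

# 24-byte spend report: campaign ID, spend delta, timestamp.
# "<QqQ": little-endian u64 campaign ID, i64 spend delta in
# micro-dollars (signed so corrections can be negative), u64 ms.
SPEND_REPORT = struct.Struct("<QqQ")

def encode_report(campaign_id: int, spend_micro: int, ts_ms: int) -> bytes:
    return SPEND_REPORT.pack(campaign_id, spend_micro, ts_ms)

def decode_report(buf: bytes) -> tuple:
    return SPEND_REPORT.unpack(buf)
```

Fixed-size frames keep parsing on the Gleam side to a single pattern match per message, with no JSON or HTTP overhead.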
The Results
We ran the edge architecture in shadow mode for three weeks, processing real bid requests through both the centralized and edge systems and comparing outcomes.
Latency reduction: 73%. Median bid response time dropped from 14.2ms (including network transit) to 3.8ms. The compute portion remained essentially unchanged; the difference is eliminated network transit.
Win rate improvement: 18%. This is the number that pays for everything. We are now consistently among the first responders to auctions on Indonesian, Thai, and Philippine inventory. Exchanges use a first-price auction model, and when multiple bids arrive at the same price, earlier responses often win. Our edge latency advantage translates directly to auction wins.
Infrastructure cost reduction: 12%. This was unexpected. The 11 edge servers cost more than our centralized setup, but we reduced our bandwidth costs significantly. Bid requests and responses no longer traverse international links. The local bandwidth in Jakarta and Bangkok is substantially cheaper than cross-border transit through Singapore.
Availability improvement. During a submarine cable incident in December 2025 that degraded Singapore-Jakarta connectivity for six hours, our edge node in Jakarta continued serving bids without interruption. Under the centralized architecture, we would have lost all Indonesian inventory for those six hours. At our traffic volume, that is roughly $14,000 in lost revenue.
Why Most Ad-Tech Cannot Do This
The typical ad-tech stack is a collection of microservices running in Kubernetes, usually on AWS, usually in us-east-1. The bid engine depends on a feature store, a model serving layer, a budget service, a creative selection service, and a fraud detection service. Each of these is a network hop. You cannot deploy this at the edge because the edge node would need to run the entire microservice graph, and at that point you have just replicated your data center 11 times.
Our bid engine is a single binary that does everything. Feature lookup is an in-process hash table read. Model inference is a hand-optimized decision tree compiled into the binary. Budget checking is a local counter. Fraud scoring is an inline Bloom filter probe. There are zero network hops on the hot path.
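To make "a hand-optimized decision tree compiled into the binary" concrete, here is the shape of the idea, sketched in Python; the features, thresholds, and prices are all invented:

```python
def bid_price_cpm(ctr_signal: float, device_mobile: bool, geo_tier: int) -> float:
    """A decision tree 'compiled' to straight-line branches.

    Illustrative only. The point is the shape: model inference as
    plain branches baked into the binary, so the hot path touches
    no feature store and no model server.
    """
    if ctr_signal < 0.012:
        return 0.0  # below our floor, do not bid
    if device_mobile:
        if geo_tier == 1:
            return 1.40 if ctr_signal >= 0.030 else 0.90
        return 0.60
    if geo_tier == 1:
        return 1.10
    return 0.45
```

Each node of the tree is a branch the CPU predicts well, and the whole model evaluates in nanoseconds with no allocation.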
This is not an accident. We designed the system from day one to be deployable as a single process on a single machine. The edge deployment was always the plan. The centralized architecture was the temporary state while we built the coordination layer.
If your architecture requires a network hop to make a bid decision, you cannot move to the edge. And if you cannot move to the edge, you are paying a permanent latency tax that directly reduces your win rate. In Southeast Asia, where the network distances are long and the inventory is growing fast, that tax is fatal.
We are not going back to centralized. The numbers do not allow it.