The Advertising Company That Writes Database Engines

People ask why an advertising company builds its own database. The real question is why more of them do not.

By Anokuro Engineering · Infrastructure

Every few weeks, someone learns that we build our own database and reacts with some variation of "that is insane." A 12-person advertising company in Singapore maintaining a custom storage engine written in Zig. From the outside, it looks like an indulgent engineering vanity project. From the inside, it is the most consequential technical decision we have made.

The argument is simple: advertising is a data processing business. Every other activity (creative optimization, targeting, auction logic, attribution, fraud detection) is a computation over data. If you outsource the data layer, you outsource the thing that determines whether your computations are fast, correct, and cheap. You outsource your competitive advantage.

The Historical Precedent

This is not a novel insight. Every company that became dominant in advertising did so by building custom data infrastructure.

Google did not win search advertising by having better ad creatives. They won because they built BigTable and MapReduce, which let them process the entire web's click graph faster than anyone else. The data infrastructure enabled the advertising product, not the other way around.

Facebook did not build RocksDB because they wanted to contribute to open source. They built it because their social graph workload (billions of reads per second with real-time writes) was destroying their MySQL fleet, and the existing key-value stores did not have the write amplification characteristics they needed. RocksDB enabled the ads targeting that generates 97% of Meta's revenue.

Bloomberg built their own everything: terminal protocol, data distribution system, real-time messaging, time-series storage. People said it was insane then too. Bloomberg Terminal has been the dominant platform in financial data for 40 years.

The pattern is consistent: companies that own their data infrastructure outcompete companies that rent it. The ones that rent it can only move as fast as their vendors allow and can only differentiate on the layers above the database, which is where differentiation matters least.

What AnokuroDB Enables

We have written about AnokuroDB's architecture before. Here is what matters from a product perspective: the capabilities that would be impossible or prohibitively expensive with any off-the-shelf database.

Sub-millisecond ad event ingestion with real-time aggregation. When an impression event arrives, it is written to the WAL and simultaneously updates 4 materialized views (per-minute, per-hour, per-day, per-campaign rollups) in a single atomic operation. The write path takes 0.3ms at P99. No off-the-shelf database provides atomic writes with simultaneous materialized view updates at sub-millisecond latency because general-purpose databases do not know about our materialized view schema at the storage engine level.

We built the materialized view maintenance directly into the storage engine's write path. When the memtable receives an event, a comptime-generated callback updates the corresponding rollup entries. The rollup structures are co-located in the same memtable, so the updates are sequential memory writes with no I/O. This is not a trigger. It is not a background process. It is integrated into the write operation itself. No general-purpose database offers this because it would require exposing storage engine internals to application-specific logic, which violates every abstraction boundary that database designers hold sacred.
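To make the mechanism concrete, here is a minimal sketch of the idea in TypeScript: the event append and every rollup update happen inside one synchronous write call over co-located in-memory structures, with no trigger and no background process. All names here (ImpressionEvent, Memtable, the rollup key format) are illustrative assumptions, not AnokuroDB's real schema, and the real write path is Zig over a durable WAL, not a JavaScript array.

```typescript
// Sketch: a memtable whose write() atomically appends to a WAL buffer and
// updates per-minute, per-hour, per-day, and per-campaign rollups in the
// same operation. The rollups live beside the event log, so the updates
// are plain in-memory writes with no extra I/O.
interface ImpressionEvent {
  campaignId: string;
  timestampMs: number;
  costMicros: number;
}

class Memtable {
  private wal: ImpressionEvent[] = [];
  private rollups = new Map<string, { impressions: number; costMicros: number }>();

  write(ev: ImpressionEvent): void {
    this.wal.push(ev); // WAL append (real engine: durable, fsync-batched)
    // Update the time-bucketed views in the same synchronous operation.
    const buckets: Array<[string, number]> = [
      ["minute", 60_000],
      ["hour", 3_600_000],
      ["day", 86_400_000],
    ];
    for (const [gran, ms] of buckets) {
      this.bump(`${gran}:${Math.floor(ev.timestampMs / ms)}:${ev.campaignId}`, ev);
    }
    this.bump(`campaign:${ev.campaignId}`, ev); // all-time per-campaign rollup
  }

  private bump(key: string, ev: ImpressionEvent): void {
    const r = this.rollups.get(key) ?? { impressions: 0, costMicros: 0 };
    r.impressions += 1;
    r.costMicros += ev.costMicros;
    this.rollups.set(key, r);
  }

  read(key: string): { impressions: number; costMicros: number } | undefined {
    return this.rollups.get(key);
  }
}
```

The point of the sketch is the shape of the operation, not the data structures: because the caller returns only after every rollup is updated, readers never observe an event that is missing from a view.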

Custom composite indexing for multi-dimensional targeting. Ad targeting operates across multiple dimensions simultaneously: geographic region, device type, user segment, time of day, publisher category, creative format. A bid request asks: "give me the highest-bidding campaign that targets users in segment X, in Thailand, on Android, at 8pm, on a news publisher, for a 300x250 display slot."

In a traditional database, this is either a multi-column index (which works well for prefix queries but poorly for arbitrary dimension combinations) or multiple index lookups with an intersection step (which is slow). We built a custom index structure: a multi-dimensional interval tree where each dimension is a range (geographic hierarchy, device category set, time window). Campaign targeting criteria are inserted as multi-dimensional rectangles. A bid request is a point query in this multi-dimensional space. Lookup time: 0.08ms for 40,000 active campaigns.
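The query model is easier to see in code than in prose. The sketch below expresses the semantics as point-in-rectangle containment over a flat array; AnokuroDB's actual index is a multi-dimensional interval tree, so a linear scan like this shows what the lookup computes, not how fast it runs. The field names (geos, devices, hourRange) are assumptions for illustration.

```typescript
// Sketch: campaign targeting criteria as multi-dimensional "rectangles",
// a bid request as a point, and the lookup as highest-bid containment.
interface Targeting {
  campaignId: string;
  bidMicros: number;
  geos: Set<string>;           // geographic hierarchy nodes the campaign targets
  devices: Set<string>;        // device category set
  hourRange: [number, number]; // inclusive time-of-day window, 0-23
}

interface BidRequest {
  geo: string;
  device: string;
  hour: number;
}

// Return the highest-bidding campaign whose targeting rectangle contains
// the request point, or null if nothing matches.
function matchCampaign(campaigns: Targeting[], req: BidRequest): Targeting | null {
  let best: Targeting | null = null;
  for (const c of campaigns) {
    const contains =
      c.geos.has(req.geo) &&
      c.devices.has(req.device) &&
      req.hour >= c.hourRange[0] &&
      req.hour <= c.hourRange[1];
    if (contains && (best === null || c.bidMicros > best.bidMicros)) best = c;
  }
  return best;
}
```

An interval tree answers the same containment question without visiting every campaign, which is what makes the 0.08ms figure possible at 40,000 active campaigns.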

We tried implementing this on top of PostgreSQL with GiST indexes. The equivalent query took 4.2ms. On Redis with sorted sets and intersection logic: 1.8ms. Our custom index is roughly 50x faster than PostgreSQL and 22x faster than Redis for this specific operation, because we built the index for exactly this query pattern and nothing else.

Co-located compute for real-time fraud detection. Fraud detection must happen during the bid decision, not after. If a bid request is fraudulent (bot traffic, click farm, spoofed device), we need to know before we bid, not in a post-impression reconciliation process.

Our fraud detection model runs inside AnokuroDB as a co-located computation. When a bid request triggers a segment lookup, the storage engine simultaneously evaluates the request against fraud signals stored alongside the user segment data: request frequency from this IP in the last 60 seconds, device fingerprint consistency score, publisher-level fraud risk score. The fraud evaluation reads data that is already in the memtable or block cache from the segment lookup. There is no additional I/O. The fraud decision adds 0.04ms to the bid path.

Running fraud detection as a separate service that queries a separate database would add 1-3ms for the network hop and an additional database query. On a 10ms total latency budget, that is 10-30% of the budget spent on one feature. Co-locating the computation with the data eliminates this overhead entirely.
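A minimal sketch of the co-location idea: because the fraud signals are stored in the same record as the segment data, one lookup yields both the segments and a fraud verdict, and the check costs arithmetic rather than I/O. The thresholds and field names below are invented for illustration; our production model is richer than three comparisons.

```typescript
// Sketch: fraud signals ride along with user segment data in one record,
// so the fraud decision reuses the read the segment lookup already did.
interface UserRecord {
  segments: string[];
  ipRequestsLast60s: number;      // request frequency from this IP
  fingerprintConsistency: number; // 0..1, higher means more consistent device
  publisherFraudRisk: number;     // 0..1, higher means riskier publisher
}

interface LookupResult {
  segments: string[];
  fraudulent: boolean;
}

function lookupWithFraudCheck(
  store: Map<string, UserRecord>,
  userId: string
): LookupResult | null {
  const rec = store.get(userId); // the only read; signals are co-located
  if (!rec) return null;
  // Evaluate fraud signals on data already in hand -- no second query.
  const fraudulent =
    rec.ipRequestsLast60s > 100 ||
    rec.fingerprintConsistency < 0.3 ||
    rec.publisherFraudRisk > 0.8;
  return { segments: rec.segments, fraudulent };
}
```

Contrast this with the separate-service design: the same decision would require serializing the request, crossing the network, and issuing a second database query before the bid can proceed.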

The Team That Makes This Possible

We are 12 engineers. Three work on AnokuroDB full-time. Two work on the ad-serving pipeline. Two on the dashboard and advertiser tools. Two on the SDK and creative delivery. One on the attribution and analytics pipeline. One on infrastructure and operations. One splits time between ML model training and data engineering.

This is not enough people to build a database, an ad platform, an analytics pipeline, a dashboard, and a mobile SDK. By conventional staffing models, we should need 40-60 engineers. We operate with 12 because of three decisions:

Technology selection is ruthless. We use three languages: Zig for performance-critical systems (database, bid server, creative transcoder), Gleam for concurrent data processing (attribution pipeline, event routing, real-time aggregation coordination), and TypeScript for everything with a user interface (dashboard, advertiser tools, SDK). Three languages. Not seven. Not "whatever the team prefers." Three, chosen for specific technical reasons, used consistently.

When an engineer switches from working on the database to working on the attribution pipeline, they switch from Zig to Gleam. That is it. They do not need to learn a new build system, a new deployment model, a new observability stack. Our CI pipeline, deployment tooling, monitoring, and alerting are the same for every service.

We do not microservice. Our production infrastructure runs 4 services: the bid server (Zig), the event pipeline (Gleam), the API server (TypeScript on Bun), and AnokuroDB (Zig). Four services, not forty. Each one is a single binary deployed to bare-metal servers. There is no Kubernetes. No service mesh. No API gateway. No container orchestration. We deploy by copying a binary to a server and restarting the process. Deploys take 8 seconds.

We build only what differentiates us. We use Cloudflare for CDN and DDoS protection. We use Stripe for payment processing. We use Linear for project management. We use GitHub for source control. We did not build any of these because building them would not make our ads faster, our targeting better, or our analytics more accurate. We built a database because it makes all three of those things measurably better. The build/buy decision is not about capability. It is about competitive advantage.

The Gap Is Closing

There is a persistent mental model in the industry that separates "tech companies that do advertising" (Google, Meta, Amazon) from "advertising companies that do tech" (everyone else). The first group builds infrastructure. The second group assembles SaaS products.

This distinction is dissolving. The tools to build custom infrastructure have improved dramatically. Zig gives a small team systems programming capability that previously required a C/C++ team of 30. Gleam gives us Erlang-grade concurrency with a type system that prevents entire categories of bugs. Modern NVMe hardware gives a single server the I/O throughput that required a storage cluster 5 years ago.

The infrastructure advantage that Google and Meta built with thousands of engineers and billions of dollars is now achievable with 12 engineers and off-the-shelf hardware. Not at their scale, obviously. But at the scale that matters for a regional ad platform serving hundreds of millions of impressions per month.

We are building the company that proves this thesis: that a small, technically excellent team can compete with platforms backed by orders of magnitude more resources by owning the infrastructure that determines performance. The database is the foundation. The ad platform is the product. The performance is the moat.

People ask why an advertising company builds its own database. The question contains its own answer. We build it because we are an advertising company, and advertising is a data processing business, and you do not win a data processing business by renting your data layer from someone who does not understand your workload.

The real question is why more advertising companies do not build their own. The answer, we suspect, is that most ad-tech companies do not think of themselves as technology companies. They think of themselves as media companies that use technology. We think of ourselves as a systems engineering company that happens to sell advertising. The database is proof.

Copyright © 2026 Anokuro Pvt. Ltd. Singapore. All rights reserved.