📊 Design Ad Click Aggregation — System Design Interview Guide

Hard · Data Pipelines & Analytics

Design a real-time ad click aggregation system that counts ad clicks per minute/hour, detects fraud, and provides query capabilities for billing and analytics at scale.

Functional requirements

Non-functional requirements & scale

Capacity estimation

At roughly 115K clicks/sec the workload is dominated by write throughput: every raw click event must be ingested durably and then aggregated. Two paths handle this. Hot path (real-time, Kafka + Flink): near-real-time aggregates. Cold path (batch, S3 + Spark): accurate counts for billing. Hot and cold path results are reconciled before billing.
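The throughput figure above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch: the daily volume of 10 billion clicks and the 100-byte event size are assumptions chosen for illustration, not figures from this guide.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def clicks_per_second(clicks_per_day: int) -> float:
    """Average write rate implied by a daily click volume."""
    return clicks_per_day / SECONDS_PER_DAY


def daily_raw_bytes(clicks_per_sec: float, bytes_per_event: int) -> float:
    """Raw storage written per day at a sustained click rate."""
    return clicks_per_sec * SECONDS_PER_DAY * bytes_per_event


if __name__ == "__main__":
    # Assumption: 10 billion clicks/day (hypothetical volume).
    rate = clicks_per_second(10_000_000_000)
    print(f"average rate: {rate:,.0f} clicks/sec")

    # Assumption: 100 bytes per raw click event (hypothetical size).
    per_day = daily_raw_bytes(115_000, 100)
    print(f"raw storage: {per_day / 1e9:,.1f} GB/day")
```

Under these assumptions, 10 billion clicks/day averages out to about 115.7K clicks/sec, and 115K clicks/sec at 100 bytes per event is about 993.6 GB of raw data per day. Peak traffic would be higher than the average, so real sizing needs a peak multiplier on top.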

Core entities

API design

High-level design

Click → Kafka → Hot Path (Flink windowed aggregation → Redis/OLAP) + Cold Path (S3 → Spark batch → Data Warehouse). Both paths consume the same raw click stream from Kafka. The Query API reads from the OLAP DB. Fraud detection runs as a separate Flink job.

Deep dives

🌊 Flink Windowed Aggregation

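Tumbling window: fixed, non-overlapping 1-minute buckets. Sliding window: overlapping windows, e.g. "last 5 minutes" recomputed every 30 seconds. Session window: groups clicks until 30s of inactivity. For per-minute counts, Flink keys the stream by adId and counts per (adId, window) in event time. Late events: a window fires when the watermark passes its end; with 5 minutes of allowed lateness, Flink keeps the window state for that long afterwards, and a late event arriving in that period re-fires the window with an updated count. Anything later is dropped or routed to a side output. State is checkpointed to HDFS/S3 for fault tolerance. Exactly-once: Kafka source offsets are stored in Flink checkpoints, and results are written through a two-phase-commit (transactional) or idempotent sink.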
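The tumbling-window and late-event behaviour can be illustrated without Flink. The sketch below is a plain-Python approximation, not Flink's API: `TumblingCounter`, its fields, and the simplified watermark rule (maximum event time seen minus an out-of-orderness bound) are all hypothetical names and choices made for illustration.

```python
from collections import defaultdict

WINDOW_MS = 60_000                 # 1-minute tumbling windows
ALLOWED_LATENESS_MS = 5 * 60_000   # 5-minute grace period for late events


def window_start(ts_ms: int) -> int:
    """Start of the tumbling window that contains this event time."""
    return ts_ms - ts_ms % WINDOW_MS


class TumblingCounter:
    """Counts clicks per (ad_id, window) in event time (illustrative only)."""

    def __init__(self, max_out_of_orderness_ms: int = 0):
        self.counts = defaultdict(int)   # (ad_id, window_start) -> count
        self.watermark = float("-inf")   # event-time progress seen so far
        self.max_ooo = max_out_of_orderness_ms
        self.dropped = 0                 # events that arrived too late

    def on_click(self, ad_id: str, ts_ms: int) -> bool:
        start = window_start(ts_ms)
        # A window's state is discarded once the watermark has passed
        # window end + allowed lateness; events for it are then dropped.
        if self.watermark >= start + WINDOW_MS + ALLOWED_LATENESS_MS:
            self.dropped += 1
            return False
        self.counts[(ad_id, start)] += 1
        # Simplified watermark: highest event time seen, minus a fixed bound.
        self.watermark = max(self.watermark, ts_ms - self.max_ooo)
        return True


if __name__ == "__main__":
    counter = TumblingCounter()
    counter.on_click("ad1", 1_000)     # window [0, 60s)
    counter.on_click("ad1", 61_000)    # window [60s, 120s)
    counter.on_click("ad1", 30_000)    # late, but within the grace period
    counter.on_click("ad1", 500_000)   # advances the watermark past 360s
    counter.on_click("ad1", 2_000)     # too late for window [0, 60s): dropped
    print(dict(counter.counts), "dropped:", counter.dropped)
```

In a real Flink job the late event inside the grace period would also cause the window to emit an updated result downstream, so the sink has to handle updates (upserts) rather than append-only rows.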

🕵️ Fraud Detection

Real-time: a Flink stateful function keyed by (userId, adId) tracks the last click timestamp per pair. If the same user clicks the same ad again within 60s, the repeat click is flagged as fraud. IP-based: more than 20 clicks/min from the same IP is treated as a bot. Behavioral: near-constant intervals between clicks suggest a scripted bot. Multi-event patterns are matched with CEP (Complex Event Processing) in Flink. Fraudulent clicks are still counted but flagged, so they can be excluded from billing.
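The two threshold rules can be sketched as a small in-memory detector. This is an illustration in plain Python, not Flink code: `FraudDetector`, the reason strings, and the assumption that timestamps (in seconds) arrive in non-decreasing order are all hypothetical simplifications.

```python
from collections import defaultdict, deque

DUPLICATE_WINDOW_S = 60   # same user + same ad within 60s
IP_LIMIT_PER_MIN = 20     # more than 20 clicks/min from one IP


class FraudDetector:
    """Flags clicks by two simple rules (illustrative, in-memory only)."""

    def __init__(self):
        self.last_click = {}                  # (user_id, ad_id) -> last timestamp
        self.ip_clicks = defaultdict(deque)   # ip -> timestamps in the last minute

    def check(self, user_id: str, ad_id: str, ip: str, ts: float) -> list:
        """Return the list of rules this click violates (empty if clean)."""
        reasons = []

        # Rule 1: repeat click on the same ad by the same user within 60s.
        key = (user_id, ad_id)
        prev = self.last_click.get(key)
        if prev is not None and ts - prev < DUPLICATE_WINDOW_S:
            reasons.append("duplicate_click")
        self.last_click[key] = ts

        # Rule 2: sliding one-minute click count per IP.
        window = self.ip_clicks[ip]
        window.append(ts)
        while window and window[0] <= ts - 60:
            window.popleft()
        if len(window) > IP_LIMIT_PER_MIN:
            reasons.append("ip_rate")

        return reasons


if __name__ == "__main__":
    detector = FraudDetector()
    print(detector.check("u1", "ad1", "10.0.0.1", 0))    # first click
    print(detector.check("u1", "ad1", "10.0.0.1", 30))   # repeat after 30s
```

The click is flagged rather than discarded, matching the approach above: the returned reasons would be stored alongside the event so billing can exclude it later. In Flink, the two dictionaries would become keyed state with a TTL so that idle keys do not accumulate.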

🔄 Lambda Architecture

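Hot path: Kafka → Flink → ClickHouse. Results are available in ~30 seconds but may have small inaccuracies (late events, processing failures). Cold path: S3 raw data → Spark batch job (nightly) → Data Warehouse. It recomputes from the complete raw log, so its counts are treated as authoritative. Reconciliation: at billing time, use the cold path numbers; the hot path serves the real-time dashboard only. Alternative, Kappa architecture: drop the batch layer, keep a longer-retention stream, and reprocess by replaying it through the same streaming job. Simpler, because there is only one pipeline to maintain.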
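A reconciliation step might look like the following sketch. It assumes both paths can be summarised as a per-ad count dictionary for the same billing period; the `reconcile` function, its report fields, and the 1% drift tolerance are hypothetical choices for illustration.

```python
def reconcile(hot: dict, cold: dict, tolerance: float = 0.01) -> dict:
    """Compare hot-path and cold-path counts per ad.

    The cold (batch) count is always the billable number; the hot count is
    only used to measure drift and raise an alert when it exceeds tolerance.
    """
    report = {}
    for ad_id in sorted(set(hot) | set(cold)):
        hot_count = hot.get(ad_id, 0)
        cold_count = cold.get(ad_id, 0)
        if cold_count:
            drift = abs(hot_count - cold_count) / cold_count
        else:
            # No batch clicks: any hot-path clicks are pure drift.
            drift = 0.0 if hot_count == 0 else float("inf")
        report[ad_id] = {
            "billable": cold_count,
            "hot": hot_count,
            "drift": drift,
            "alert": drift > tolerance,
        }
    return report


if __name__ == "__main__":
    hot_counts = {"ad1": 100, "ad2": 50}
    cold_counts = {"ad1": 100, "ad2": 60}
    for ad_id, row in reconcile(hot_counts, cold_counts).items():
        print(ad_id, row)
```

Persistent drift on one ad usually points at late events or a failed streaming task, so the alert is a signal to investigate the hot path rather than a reason to change the billed amount.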

💰 Billing Accuracy

Billing requires each click to be counted exactly once. Approach: process raw events in Spark and deduplicate on clickId (GROUP BY clickId), so a clickId that appears more than once is counted once. The billing job reads from S3 (raw, immutable), not from stream aggregates. The billing run is idempotent: re-running it for the same day produces the same number. Fraud exclusion: join with fraud flags and drop flagged clicks before counting.
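The deduplicate-then-exclude logic is shown below in plain Python rather than Spark, to keep the example self-contained. The event fields (`click_id`, `ad_id`) and the `billable_clicks` function are assumed names for illustration; a Spark job would express the same steps as a dedup on clickId followed by an anti-join against the fraud flags.

```python
def billable_clicks(raw_events: list, fraud_click_ids: set) -> dict:
    """Count billable clicks per ad from raw, immutable events.

    Each click_id is counted at most once, and clicks flagged as fraud are
    excluded. The function is pure, so re-running it on the same input
    always yields the same result (an idempotent billing run).
    """
    seen = set()
    counts = {}
    for event in raw_events:
        click_id = event["click_id"]
        if click_id in seen or click_id in fraud_click_ids:
            continue  # duplicate delivery or flagged click: not billable
        seen.add(click_id)
        counts[event["ad_id"]] = counts.get(event["ad_id"], 0) + 1
    return counts


if __name__ == "__main__":
    events = [
        {"click_id": "c1", "ad_id": "ad1"},
        {"click_id": "c1", "ad_id": "ad1"},  # duplicate delivery of c1
        {"click_id": "c2", "ad_id": "ad1"},
        {"click_id": "c3", "ad_id": "ad2"},  # flagged as fraud below
    ]
    print(billable_clicks(events, fraud_click_ids={"c3"}))
```

Because the input is the immutable raw log and the function holds no state between runs, a retried or re-scheduled billing job cannot double-charge an advertiser.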

Scaling considerations

What interviewers expect by level
