📊 Design Ad Click Aggregation — System Design Interview Guide
Hard · Data Pipelines & Analytics
Design a real-time ad click aggregation system that counts ad clicks per minute/hour, detects fraud, and provides query capabilities for billing and analytics at scale.
Open the interactive Ad Click Aggregation design on PrepGrind → Drag load balancers, caches, databases, and queues onto a canvas, run a live traffic simulation to watch latency and bottlenecks under load, and follow the full interview walkthrough below — free, in your browser.
Functional requirements
- Record every ad click (adId, userId, timestamp, source)
- Aggregate click counts per adId per minute and per hour
- Query: "how many clicks did ad X get in the last N minutes?"
- Detect fraudulent clicks (same user, same ad, < 1 min apart)
- Generate billing report: total clicks per advertiser per day
- Support filtering by device type, country, ad campaign
Non-functional requirements & scale
- 10B ad clicks per day (~115K clicks/sec at peak)
- Aggregation latency: counts available within 1 minute of click
- Query latency for last 1h aggregation < 1 second
- Exactly-once counting — no double-billing
- Data retention: raw clicks 7 days; aggregates 1 year
- System must handle late-arriving events (up to 5 min late)
Capacity estimation
115K clicks/sec = high write throughput. Raw click events must be ingested and aggregated. Two paths: hot path (real-time, Kafka + Flink) for near-real-time aggregates; cold path (batch, S3 + Spark) for accurate billing. Reconcile hot and cold path results for billing.
Core entities
- ClickEvent — clickId, adId, userId, campaignId, deviceType, country, timestamp, ipHash
- ClickAggregate — adId, windowStart, windowType (minute|hour|day), clickCount, uniqueUsers
- FraudFlag — clickId, reason, detectedAt
API design
POST /api/v1/clicks— Record click. Body: { adId, metadata }. Fire-and-forget to Kafka.GET /api/v1/ads/:adId/clicks?start=&end=&granularity=minute— Query aggregated click counts with time range.GET /api/v1/campaigns/:id/report— Billing report for a campaign.
High-level design
Click → Kafka → Hot Path (Flink windowed aggregation → Redis/OLAP) + Cold Path (S3 → Spark batch → Data Warehouse). Query API reads from OLAP DB. Fraud detection as separate Flink job.
Deep dives
🌊 Flink Windowed Aggregation
Tumbling window: fixed 1-minute non-overlapping buckets. Sliding window: "last 5 minutes" updated every 30 seconds. Session window: group clicks until 30s of inactivity. Flink counts per (adId, window). Late events: allow 5-min grace period — extend window until watermark passes. Checkpoint state to HDFS/S3 for fault tolerance. Exactly-once via Kafka offset commits + 2-phase commit.
🕵️ Fraud Detection
Real-time: Flink stateful function per (userId, adId). Track last click timestamp per pair. If same user, same ad, < 60s: mark as fraud. IP-based: > 20 clicks/min from same IP = bot. Behavioral: consistent click intervals = scripted bot. Pattern matching via CEP (Complex Event Processing) in Flink. Fraudulent clicks counted but flagged — excluded from billing.
🔄 Lambda Architecture
Hot path: Kafka → Flink → ClickHouse. Available in ~30 seconds. May have small inaccuracies (late events, processing failures). Cold path: S3 raw data → Spark batch job (nightly) → Data Warehouse. 100% accurate. Reconciliation: at billing time, use cold path numbers (authoritative). Hot path for real-time dashboard only. Kappa architecture: replace batch with longer-retention stream — simpler.
💰 Billing Accuracy
Billing requires exactly-once counting. Approach: process raw events in Spark with deduplication (GROUP BY clickId). Any duplicate clickId → count once. Reconcile: billing job reads from S3 (raw, immutable), not from stream aggregates. Idempotent billing run: re-running on same day produces same number. Fraud exclusion: join with fraud flags before counting.
Scaling considerations
- Kafka with 100 partitions for click topic — parallel ingestion at 115K/sec
- ClickHouse columnar storage for OLAP queries on 10B rows (< 1s aggregation)
- Flink auto-scales with Kafka partition count
- S3 Parquet files for cold storage — efficient Spark processing
- Click dedup: Redis Set with TTL 60s per (userId, adId) for hot path fraud check
What interviewers expect by level
- Junior: Describe click event ingestion, Kafka, basic aggregation. Understand why OLAP DB is better than MySQL for analytics.
- Mid: Flink windowed aggregation, late event handling, fraud detection, Lambda vs Kappa architecture.
- Senior: Exactly-once semantics, reconciliation between hot/cold paths, ClickHouse schema design, billing accuracy.
- Staff: Multi-region click collection, privacy (GDPR cookie consent), cost at 10B events/day, A/B test attribution.
Practice more system design case studies
- Design URL Shortener
- Design Social Media Feed
- Design Chat System
- Design Video Streaming
- Design Ride-Sharing Platform
- Design E-Commerce Platform
- Design UPI Payment Gateway
- Design Google Docs
- Design Tinder
- Design Google Drive / Dropbox
- Design Instagram
- Design Type-Ahead Search
- Design Web Crawler
- Design Ticket Booking (BookMyShow)
- Design Pastebin
- Design Notification System
- Design Rate Limiter (Standalone)
- Design Simple Web App
- Design Food Delivery (Swiggy)
- Design Stock Trading System
- Design Live Streaming (Twitch)
- Design Distributed Key-Value Store
- Design Monitoring / Metrics (Datadog)
- Design Online Judge (LeetCode)
- Design FB Post Search
- Design Yelp
- Design Cache Layer
- Design Message Queue
- Design Full Production Stack
PrepGrind runs entirely in your browser, free, no installation required. Loading the interactive playground…