💳 Design UPI Payment Gateway — System Design Interview Guide

Hard · Payments & Fintech

Design a UPI-based payment system like Google Pay or PhonePe that supports instant bank-to-bank transfers, processes 1B+ transactions per day, and ensures zero double spends (no payer is ever debited twice for the same payment).

Functional requirements

Non-functional requirements & scale

Capacity estimation

UPI is routed through NPCI (National Payments Corporation of India), which acts as the central switch. Each app (the PSP) sends a payment instruction to NPCI, which routes the debit to the payer's bank and the credit to the payee's bank. Idempotency is critical: a network retry must never cause a double debit.
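A quick back-of-the-envelope check of the headline numbers, as a sketch. The 1B transactions per day and the 50K TPS festival peak come from this guide; the 1 KB per-transaction record size is purely an assumption for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60           # 86,400

daily_txns = 1_000_000_000               # 1B+ transactions per day (from the brief)
peak_tps = 50_000                        # festival-day peak (from the deep dive)
bytes_per_txn = 1_000                    # ASSUMPTION: ~1 KB per stored transaction row

avg_tps = daily_txns / SECONDS_PER_DAY   # sustained average load
peak_to_avg = peak_tps / avg_tps         # how much headroom the peak demands
daily_storage_gb = daily_txns * bytes_per_txn / 1e9

print(f"average TPS     ~ {avg_tps:,.0f}")            # ~ 11,574
print(f"peak / average  ~ {peak_to_avg:.1f}x")        # ~ 4.3x
print(f"storage per day ~ {daily_storage_gb:,.0f} GB")  # ~ 1,000 GB under the 1 KB assumption
```

The useful takeaway is the ratio: the system has to absorb roughly four times its average load at peak, which is why the deep dives lean on queues and backpressure rather than sizing everything for 50K TPS all year.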

Core entities

API design

High-level design

Payment request → idempotency check (Redis) → write PENDING to DB → send to NPCI switch → async response → update DB to SUCCESS/FAILED → notify user via WebSocket/push.

Deep dives

🔄 Idempotency & Exactly-Once

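The client generates a UUID idempotencyKey for each payment attempt. The Payment Service runs Redis SET idempotencyKey txnId NX EX 300. If the SET fails, the key already exists, so the request is a duplicate and the service returns the existing txnId. This makes network retries safe within the 300-second window. The DB backs this up: the INSERT carries a UNIQUE constraint on idempotencyKey, which still rejects a duplicate after the Redis key expires. NPCI also assigns a unique transaction reference; never retry against NPCI without reusing the same reference number.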
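The check described above might look roughly like the following sketch. To stay runnable without a server, it uses a small in-memory stand-in (`FakeRedis`, hypothetical) that mimics only SET with NX and EX; with redis-py the equivalent call would be `r.set(key, txn_id, nx=True, ex=300)`. `PaymentService`, its `transactions` dict (standing in for a table with a UNIQUE constraint), and the field names are likewise assumptions for illustration.

```python
import time
import uuid


class FakeRedis:
    """In-memory stand-in for Redis SET key value [NX] EX ttl and GET."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ex, nx=False):
        now = self._clock()
        entry = self._data.get(key)
        if nx and entry is not None and entry[1] > now:
            return False  # key exists and has not expired: SET NX fails
        self._data[key] = (value, now + ex)
        return True

    def get(self, key):
        entry = self._data.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]
        return None


class PaymentService:
    IDEMPOTENCY_TTL = 300  # seconds, matching EX 300 in the text

    def __init__(self, cache):
        self.cache = cache
        self.transactions = {}  # stands in for a table with UNIQUE(idempotency_key)

    def pay(self, idempotency_key, payer, payee, amount_paise):
        """Return (txn_id, is_duplicate)."""
        txn_id = str(uuid.uuid4())
        if not self.cache.set(idempotency_key, txn_id, ex=self.IDEMPOTENCY_TTL, nx=True):
            # Fast path: Redis already holds this key, so this is a retry.
            existing = self.cache.get(idempotency_key)
            return existing or self.transactions[idempotency_key]["txn_id"], True
        if idempotency_key in self.transactions:
            # Redis key had expired; the unique constraint is the backstop.
            existing = self.transactions[idempotency_key]["txn_id"]
            self.cache.set(idempotency_key, existing, ex=self.IDEMPOTENCY_TTL)
            return existing, True
        self.transactions[idempotency_key] = {
            "txn_id": txn_id, "payer": payer, "payee": payee,
            "amount_paise": amount_paise, "state": "PENDING",
        }
        return txn_id, False


service = PaymentService(FakeRedis())
key = str(uuid.uuid4())
first = service.pay(key, "alice@upi", "bob@upi", 25_000)
retry = service.pay(key, "alice@upi", "bob@upi", 25_000)
print(first[0] == retry[0], retry[1])  # same txn_id, flagged as duplicate
```

A real service would also have to handle the window where the Redis key is claimed but the DB row is not yet written; this single-threaded sketch does not model that.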

📊 Transaction State Machine

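States: INITIATED → PENDING → SENT_TO_NPCI → DEBIT_SUCCESS → CREDIT_SUCCESS (COMPLETED), with FAILED and REVERSED as the alternative end states. Every state transition is appended to an immutable TransactionLog for audit. Guard concurrent state updates with a row version column (optimistic locking): an update succeeds only if the version it read is still current. Model the multi-step flow as a saga: debit payer bank → credit payee bank → confirm, with the debit reversed if a later step fails.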
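A minimal sketch of the state machine with a version check follows. The state names come from the text; the exact set of allowed edges (for example, that REVERSED is reachable only from DEBIT_SUCCESS) is an assumption, as are the class and function names. In SQL the version guard would typically be an `UPDATE ... SET state = ?, version = version + 1 WHERE id = ? AND version = ?` that affects zero rows when another writer got there first.

```python
from dataclasses import dataclass, field

# ASSUMPTION: one plausible reading of the allowed transitions.
ALLOWED = {
    "INITIATED": {"PENDING", "FAILED"},
    "PENDING": {"SENT_TO_NPCI", "FAILED"},
    "SENT_TO_NPCI": {"DEBIT_SUCCESS", "FAILED"},
    "DEBIT_SUCCESS": {"CREDIT_SUCCESS", "REVERSED"},
    "CREDIT_SUCCESS": set(),  # terminal (COMPLETED)
    "FAILED": set(),          # terminal
    "REVERSED": set(),        # terminal
}


class StaleVersionError(Exception):
    """Another writer updated the row since it was read."""


class IllegalTransitionError(Exception):
    """The requested state change is not in the transition table."""


@dataclass
class Transaction:
    txn_id: str
    state: str = "INITIATED"
    version: int = 0
    log: list = field(default_factory=list)  # append-only audit trail


def transition(txn, new_state, expected_version):
    """Apply a state change only if the caller saw the latest version."""
    if txn.version != expected_version:
        raise StaleVersionError(f"expected v{expected_version}, row is at v{txn.version}")
    if new_state not in ALLOWED[txn.state]:
        raise IllegalTransitionError(f"{txn.state} -> {new_state}")
    txn.log.append((txn.version, txn.state, new_state))
    txn.state = new_state
    txn.version += 1
    return txn.version


txn = Transaction("txn-1")
v = txn.version
for step in ["PENDING", "SENT_TO_NPCI", "DEBIT_SUCCESS", "CREDIT_SUCCESS"]:
    v = transition(txn, step, expected_version=v)
print(txn.state, txn.version)  # CREDIT_SUCCESS 4
```

A worker holding an outdated version gets `StaleVersionError` and must re-read the row before deciding what to do, which is what prevents two concurrent NPCI callbacks from both advancing the same transaction.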

⚡ 50K TPS on Festival Days

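Scale the stateless Payment Service horizontally. Use a Redis cluster for idempotency checks (sub-millisecond lookups). The DB write path is the bottleneck: put connection pooling (PgBouncer) in front of Postgres and batch commits so that several transactions share one write-ahead log flush. NPCI rate-limits inbound traffic: buffer the overflow in Kafka and consume it with backpressure. Put a circuit breaker around NPCI calls; when it opens, leave the transaction in "pending" and retry later.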
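The circuit breaker with a "pending" fallback could be sketched as below. This is a simplified illustration, not a production breaker: the thresholds are arbitrary, `send_to_npci` is a hypothetical callable, a plain list stands in for the Kafka retry topic, and the half-open state lets every caller through rather than a single probe.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: after the timeout, let calls through to probe NPCI again.
        return self.clock() - self.opened_at >= self.reset_timeout

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # open (or re-open) the circuit


def submit(breaker, send_to_npci, txn, retry_queue):
    """Send a transaction to NPCI, or park it as PENDING for a later retry."""
    if not breaker.allow():
        retry_queue.append(txn)  # circuit open: do not touch NPCI at all
        return "PENDING"
    try:
        result = send_to_npci(txn)
    except Exception:
        breaker.record_failure()
        retry_queue.append(txn)
        return "PENDING"
    breaker.record_success()
    return result


def always_times_out(txn):
    raise TimeoutError("NPCI did not respond")


breaker = CircuitBreaker(failure_threshold=3, reset_timeout=10.0)
queue = []
print([submit(breaker, always_times_out, f"txn-{i}", queue) for i in range(5)])
print(len(queue))  # all 5 parked; only the first 3 actually reached NPCI
```

Whatever drains the retry queue must resend each transaction with its original NPCI reference, otherwise the retry itself becomes a double-debit risk.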

🔐 Security

The UPI PIN never leaves the device in plaintext: it is encrypted on the device with a device key plus the server's public key. MPIN validation happens inside an HSM (Hardware Security Module). All transport uses TLS 1.3. Bank account numbers are tokenized, so the system stores a token rather than the actual account number. All transactions are signed with the customer's certificate. A fraud-detection ML model scores each transaction in under 50 ms.
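As an illustration of the tokenization point only, here is a minimal vault sketch. `TokenVault` and the `tok_` prefix are hypothetical; a real vault would be a separate, access-controlled service with encrypted storage, not a pair of in-process dictionaries.

```python
import secrets


class TokenVault:
    """Maps account numbers to opaque random tokens and back."""

    def __init__(self):
        self._token_to_account = {}
        self._account_to_token = {}

    def tokenize(self, account_number):
        token = self._account_to_token.get(account_number)
        if token is None:
            # Random token: carries no information about the account number.
            token = "tok_" + secrets.token_urlsafe(16)
            self._account_to_token[account_number] = token
            self._token_to_account[token] = account_number
        return token

    def detokenize(self, token):
        # Only the vault can resolve a token; other services store tokens only.
        return self._token_to_account[token]


vault = TokenVault()
token = vault.tokenize("001234567890")
print(token.startswith("tok_"), vault.detokenize(token) == "001234567890")
```

Because the token is random rather than derived from the account number, a leak of the payment database alone does not reveal account numbers.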

Scaling considerations

What interviewers expect by level

