⚖️ Design Online Judge (LeetCode) — System Design Interview Guide

Medium · Sandboxing & Execution

Design an online code judge like LeetCode or HackerRank where users submit code solutions, which are compiled and executed in a sandboxed environment against test cases, returning results within seconds.

Functional requirements

Users submit code in 20+ languages (Python, Java, C++, etc.)
Code is executed against hidden test cases
Return a verdict: Accepted, Wrong Answer, Time Limit Exceeded (TLE), Memory Limit Exceeded (MLE), or Compile Error
Show execution time and memory usage per test case
Leaderboard and submission history per user per problem
Admin: add problems with test cases, time/memory limits

Non-functional requirements & scale

100K submissions per day (about 1.2/sec on average); peak 1K/sec during contests
Verdict returned in under 10 seconds
Secure isolation: user code cannot escape sandbox, access network, or harm host
Test case data must not be leaked to users
Fair execution: no user can consume excessive resources
Support up to 10 concurrent test cases per submission

Capacity estimation

Core challenge: run arbitrary user code safely, which needs OS-level isolation (Docker, seccomp, cgroups). Flow: queue the submission → a worker picks it up → runs it in a sandbox → returns the result. Peak during contests: 1K submissions/sec, each occupying a worker for 1-3s, so 1,000-3,000 submissions are in flight at once and the fleet needs 1,000-3,000 workers.
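The worker count above is an application of Little's law (jobs in flight = arrival rate × time per job). A minimal sketch of that arithmetic, using only the figures stated in the requirements; the function name is made up for illustration:

```python
import math


def workers_needed(arrival_rate_per_s: float, service_time_s: float) -> int:
    """Little's law: concurrent jobs = arrival rate x time each job holds a worker."""
    return math.ceil(arrival_rate_per_s * service_time_s)


PEAK_RATE = 1_000                      # submissions/sec during a contest
low = workers_needed(PEAK_RATE, 1.0)   # every submission takes 1s
high = workers_needed(PEAK_RATE, 3.0)  # every submission takes 3s
daily_average = 100_000 / 86_400       # 100K submissions spread over one day

print(low, high, round(daily_average, 2))  # 1000 3000 1.16
```

The gap between the daily average and the contest peak is why the fleet has to be elastic rather than sized for the peak all day.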

Core entities

Problem — problemId, title, description, difficulty, testCases (hidden), timeLimit (ms), memoryLimit (MB)
Submission — submissionId, userId, problemId, language, code (encrypted), status, runtime, memory, createdAt
TestResult — submissionId, testCaseId, status, stdout, runtime, memory
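As a rough illustration, the three entities could be modeled as plain data classes. Field names follow the list above; the snake_case spelling, the Status enum, the KB unit for measured memory, and keeping only test case IDs on Problem are assumptions of this sketch, not a prescribed schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class Status(Enum):
    QUEUED = "QUEUED"
    ACCEPTED = "ACCEPTED"
    WRONG_ANSWER = "WRONG_ANSWER"
    TLE = "TLE"
    MLE = "MLE"
    COMPILE_ERROR = "COMPILE_ERROR"


@dataclass
class Problem:
    problem_id: str
    title: str
    description: str
    difficulty: str
    time_limit_ms: int
    memory_limit_mb: int
    # Only references to hidden test cases; their content is stored separately.
    test_case_ids: List[str] = field(default_factory=list)


@dataclass
class Submission:
    submission_id: str
    user_id: str
    problem_id: str
    language: str
    code: str
    status: Status = Status.QUEUED
    runtime_ms: Optional[int] = None  # filled in once judging finishes
    memory_kb: Optional[int] = None
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class TestResult:
    submission_id: str
    test_case_id: str
    status: Status
    stdout: str
    runtime_ms: int
    memory_kb: int


sub = Submission("s1", "u42", "two-sum", "python", "print(1)")
print(sub.status.value)  # QUEUED
```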

API design

POST /api/v1/submissions — Submit code. Body: { problemId, language, code }. Returns { submissionId, status: QUEUED }.
GET /api/v1/submissions/:id — Poll status. Returns verdict once complete.
WS wss://app/submissions/:id — Real-time status updates as test cases run.
GET /api/v1/problems/:id/submissions — User's submission history for a problem.

High-level design

Submission → Queue (Kafka/SQS) → Execution Worker (isolated container) runs all test cases → writes results to DB → notifies user via WebSocket.
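The pipeline above can be sketched in a single process. Here an in-memory queue stands in for Kafka/SQS, a dict stands in for the DB, and a callback stands in for the WebSocket push. The fake_sandbox function is a placeholder for real isolated execution, and reporting the first failing status as the verdict is one common convention, assumed here rather than taken from the text.

```python
import queue
from typing import Callable, Dict, List, Tuple

TestOutcome = Tuple[str, str]  # (testCaseId, "PASS" | "WRONG_ANSWER" | "TLE" | "MLE")


def overall_verdict(outcomes: List[TestOutcome]) -> str:
    """ACCEPTED only if every test passed; otherwise the first failing status."""
    for _test_id, status in outcomes:
        if status != "PASS":
            return status
    return "ACCEPTED"


def drain(jobs: queue.Queue,
          run_in_sandbox: Callable[[dict], List[TestOutcome]],
          results: Dict[str, str],
          notify: Callable[[str, str], None]) -> None:
    """Process queued submissions until the queue is empty."""
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return
        verdict = overall_verdict(run_in_sandbox(job))
        results[job["submissionId"]] = verdict  # stands in for the DB write
        notify(job["submissionId"], verdict)    # stands in for the WebSocket push
        jobs.task_done()


def fake_sandbox(job: dict) -> List[TestOutcome]:
    # Placeholder for sandboxed execution: "passes" only a known-good snippet.
    ok = "return a + b" in job["code"]
    return [("t1", "PASS"), ("t2", "PASS" if ok else "WRONG_ANSWER")]


jobs: queue.Queue = queue.Queue()
jobs.put({"submissionId": "s1", "code": "def add(a, b): return a + b"})
jobs.put({"submissionId": "s2", "code": "def add(a, b): return a - b"})

results: Dict[str, str] = {}
pushed: List[Tuple[str, str]] = []
drain(jobs, fake_sandbox, results, lambda sid, v: pushed.append((sid, v)))
print(results)  # {'s1': 'ACCEPTED', 's2': 'WRONG_ANSWER'}
```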

Deep dives

🔒 Sandboxing Code Execution

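Multi-layer isolation:
(1) Container (Docker): filesystem and process isolation.
(2) seccomp: whitelist only the syscalls the runtime needs (read, write, execve, and similar) and deny the rest, including socket calls. seccomp filters on syscall numbers and argument values, not on file paths, so it cannot by itself confine writes to /tmp; use a read-only root filesystem with a writable /tmp mount for that.
(3) cgroups: limit CPU (1 core), memory (256MB), and process count (50). The process-count limit is what contains fork bombs.
(4) Network namespace: no network access.
(5) Time limit: kill the container after timeLimit + 1s.
(6) User: run as nobody (uid=65534), no sudo.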
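One way to express these layers is as flags on a docker run invocation. The sketch below only builds the argument list and does not start anything. The image name, the seccomp profile path, and the use of a read-only root with a tmpfs at /tmp are assumptions for illustration; no-new-privileges is added here as one reading of "no sudo".

```python
from typing import List


def sandbox_command(image: str, run_cmd: List[str], *,
                    memory_mb: int = 256, cpus: float = 1.0, pids: int = 50,
                    seccomp_profile: str = "judge-seccomp.json") -> List[str]:
    """Build a `docker run` argv that applies the isolation layers."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                   # (4) no network access
        "--memory", f"{memory_mb}m",           # (3) cgroup memory limit
        "--cpus", str(cpus),                   # (3) cgroup CPU limit
        "--pids-limit", str(pids),             # (3) caps process count (fork bombs)
        "--read-only",                         # root filesystem not writable
        "--tmpfs", "/tmp",                     # the only writable location
        "--user", "65534:65534",               # (6) run as nobody
        "--security-opt", "no-new-privileges",
        "--security-opt", f"seccomp={seccomp_profile}",  # (2) syscall whitelist
        image, *run_cmd,
    ]


def wall_clock_deadline_s(time_limit_ms: int) -> float:
    """(5) The worker stops the container once this many seconds have passed."""
    return time_limit_ms / 1000 + 1


# Hypothetical image and entry point, for illustration only.
cmd = sandbox_command("judge-python:latest", ["python3", "/tmp/solution.py"])
print(" ".join(cmd))
```

Keeping the limits as function parameters lets the worker pass each problem's own memoryLimit instead of a fleet-wide default.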

⚡ Worker Pool Scaling

Each worker handles one submission at a time, in its own container. Peak: 1K submissions/sec × 3s each = 3K concurrent workers. EC2 Spot instances are cheap and suit short-lived jobs; a job interrupted by a reclaimed instance is retried from the queue. Auto-scaling: SQS queue depth triggers EC2 fleet scale-out. A worker starts a container, runs all N test cases sequentially, and sends the results back. The container is destroyed after each submission and a fresh one is started, so every run begins from clean state.
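A scaling policy driven by queue depth might compute a target fleet size along these lines. This is a sketch under stated assumptions: the formula (steady-state need plus extra capacity to drain the backlog within a target window) and the min/max bounds are illustrative, not how any particular auto-scaler is configured.

```python
import math


def desired_workers(arrival_rate_per_s: float, avg_job_s: float,
                    queue_depth: int, drain_target_s: float,
                    min_workers: int = 10, max_workers: int = 5_000) -> int:
    """Workers to keep up with arrivals and clear the backlog in drain_target_s."""
    steady_state = arrival_rate_per_s * avg_job_s         # Little's law
    catch_up = queue_depth * avg_job_s / drain_target_s   # extra for the backlog
    target = math.ceil(steady_state + catch_up)
    return max(min_workers, min(max_workers, target))


print(desired_workers(1_000, 3.0, queue_depth=0, drain_target_s=60))      # 3000
print(desired_workers(1_000, 3.0, queue_depth=6_000, drain_target_s=60))  # 3300
print(desired_workers(1.2, 3.0, queue_depth=0, drain_target_s=60))        # 10
```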

🗃️ Test Case Security

Test cases stored in S3, encrypted with KMS. Workers fetch test cases at runtime via signed URL (valid 60s). Test case content is never returned to the user (only pass/fail and runtime). Code is stored encrypted in the DB (users can view their own). Floating-point answers are compared within a tolerance rather than by exact text diff. Special judge: custom checker code for problems with multiple valid outputs.
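A checker for floating-point answers could look roughly like the following: tokens that match textually pass, and tokens that differ are compared numerically within a tolerance. The tolerance values and the whitespace tokenization are assumptions; a real problem would state its own precision requirement.

```python
import math


def outputs_match(expected: str, actual: str,
                  rel_tol: float = 1e-6, abs_tol: float = 1e-9) -> bool:
    """Compare outputs token by token, allowing small numeric differences."""
    exp_tokens, act_tokens = expected.split(), actual.split()
    if len(exp_tokens) != len(act_tokens):
        return False
    for exp, act in zip(exp_tokens, act_tokens):
        if exp == act:
            continue  # identical text, numeric or not
        try:
            close = math.isclose(float(exp), float(act),
                                 rel_tol=rel_tol, abs_tol=abs_tol)
        except ValueError:
            return False  # differing non-numeric tokens
        if not close:
            return False
    return True


print(outputs_match("0.3333333333\n", "0.33333333"))  # True
print(outputs_match("1 2 3", "1 2 4"))                # False
```

Note that this sketch also accepts "10.0" where "10" was expected, which may or may not be desirable for integer-output problems.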

📊 Leaderboard

Contest leaderboard: rank by problems solved (primary), then total penalty time (secondary, lower is better). MySQL is enough for small contests. For large contests (100K participants), pre-compute rankings in a Redis sorted set whose score encodes both keys. ZREVRANK key userId returns a 0-based position, so rank = ZREVRANK + 1. Update on every accepted submission. Snapshots every 5 min during the contest allow the leaderboard to be shown as of a specific time for fairness analysis.
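A sorted set holds a single numeric score per member, so the two ranking keys have to be folded into one number. The sketch below shows one possible encoding and reproduces the ordering with plain Python instead of Redis. PENALTY_CAP and the function names are hypothetical, and the encoding is only valid while every penalty stays below the cap.

```python
from typing import Dict, List, Tuple

PENALTY_CAP = 10_000_000  # assumed upper bound on total penalty seconds


def composite_score(solved: int, penalty_s: int) -> int:
    """Higher is better: more problems solved first, then lower penalty."""
    if not 0 <= penalty_s < PENALTY_CAP:
        raise ValueError("penalty outside the range this encoding supports")
    return solved * PENALTY_CAP - penalty_s


def leaderboard(standings: Dict[str, Tuple[int, int]]) -> List[str]:
    """Users ordered best first, as a descending sorted-set scan would return."""
    return sorted(standings,
                  key=lambda user: composite_score(*standings[user]),
                  reverse=True)


def rank_of(user: str, standings: Dict[str, Tuple[int, int]]) -> int:
    """1-based rank; the 0-based index plays the role of ZREVRANK."""
    return leaderboard(standings).index(user) + 1


# userId -> (problems solved, penalty seconds)
standings = {"alice": (3, 5_400), "bob": (3, 4_800), "carol": (2, 1_200)}
print(leaderboard(standings))       # ['bob', 'alice', 'carol']
print(rank_of("alice", standings))  # 2
```

With Redis, composite_score would be the score passed to ZADD on each accepted submission. Sorted-set scores are double-precision floats, so the encoded value should stay within the range where integers are represented exactly.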

Scaling considerations

Worker fleet: Spot EC2 for cost, scale out in 60s based on queue depth
One container per submission — no state leakage between users
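SQS with visibility timeout longer than the worst-case job duration (auto-retry if worker dies); timeLimit + 30s is enough only if timeLimit bounds the whole submission, so with N sequential test cases use roughly N × timeLimit + 30s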
Submission code stored in S3 (not DB) if large; DB stores metadata
Redis for real-time contest leaderboard updates
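If timeLimit applies per test case and the test cases run sequentially, the queue's visibility timeout has to cover the whole job, or a slow but healthy run will be redelivered to a second worker. A sketch of that sizing follows; the compile budget and the margin default are assumptions, and the 12-hour ceiling is the SQS maximum for visibility timeout.

```python
import math

SQS_MAX_VISIBILITY_S = 12 * 60 * 60  # SQS upper limit: 12 hours


def visibility_timeout_s(time_limit_ms: int, num_tests: int,
                         compile_budget_s: int = 10, margin_s: int = 30) -> int:
    """Worst-case job duration: compile, then each test up to its kill deadline."""
    per_test_s = time_limit_ms / 1000 + 1  # container is killed at timeLimit + 1s
    worst_case = compile_budget_s + num_tests * per_test_s + margin_s
    return min(SQS_MAX_VISIBILITY_S, math.ceil(worst_case))


print(visibility_timeout_s(2_000, 10))  # 70
```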

What interviewers expect by level

Junior: Describe submission flow, queueing, and execution concept. Know why sandboxing is essential.
Mid: Docker isolation layers (seccomp, cgroups), worker pool with SQS, result storage and polling.
Senior: Multi-layer security (seccomp whitelist, namespace isolation), worker auto-scaling, test case security, contest leaderboard.
Staff: Cost optimization (Spot fleet), plagiarism detection, language-specific VM vs container trade-offs, global contest delivery.
