🔎 Design FB Post Search — System Design Interview Guide

Medium · Search & Indexing

Design a search system for a social platform (like Facebook) that allows users to search posts, people, pages, and groups with results filtered by their social graph and privacy settings.


Functional requirements

Full-text search across posts, people, pages, and groups
Results filtered by each author's privacy settings and the searcher's relationships
Ranked results: friends' content prioritized over strangers' content
Real-time indexing: new posts searchable within seconds
Faceted search: filter by type, date, author
Search suggestions as the user types

Non-functional requirements & scale

3B users; 100B searchable documents
Search latency < 500ms P95
Privacy filters must be applied to every result
Index updates: new post searchable in < 30 seconds
99.9% search availability
Support for 100+ languages with stemming and stopwords

Capacity estimation

Privacy is the hardest part. A post from User A that is visible only to friends must never appear in search results for User C, who is not A's friend. Search results are therefore personalized: the same query returns different results for different users. The system must intersect the text matches with the set of documents the searcher is allowed to see.
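The section above gives no sizing figures, so the following is only a back-of-envelope sketch of index size. The 100B document count comes from the requirements; the average indexed size per document, the replica count, and the target shard size are assumptions chosen for illustration, not numbers from this guide.

```python
# Back-of-envelope index sizing. Only DOCS comes from the requirements;
# every other constant is an ASSUMPTION for illustration.
DOCS = 100_000_000_000            # 100B searchable documents (from requirements)
AVG_DOC_BYTES = 1_000             # ASSUMPTION: ~1 KB indexed per document
REPLICAS = 1                      # ASSUMPTION: one replica per primary shard
TARGET_SHARD_BYTES = 50 * 10**9   # ASSUMPTION: ~50 GB per shard

primary_bytes = DOCS * AVG_DOC_BYTES
total_bytes = primary_bytes * (1 + REPLICAS)
# Ceiling division so a partial shard still counts as one shard.
primary_shards = -(-primary_bytes // TARGET_SHARD_BYTES)

print(f"primary index: {primary_bytes / 10**12:.0f} TB")
print(f"with replicas: {total_bytes / 10**12:.0f} TB")
print(f"primary shards: {primary_shards}")
```

Changing any of the assumed constants scales the result linearly, so in an interview it is worth stating them out loud before quoting a total.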

Core entities

Post — postId, authorId, content, mediaType, createdAt, privacy (public/friends/only-me)
SearchIndex — docId, type, text, authorId, privacyGroups[], createdAt (in Elasticsearch)
SearchResult — docId, type, snippet, score, author, createdAt

API design

GET /api/v1/search?q=birthday&type=post&from=-7d — Search with query, type filter, date filter. Returns personalized results.
GET /api/v1/search/suggest?q=john — Auto-complete suggestions for people/pages.
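The `from=-7d` parameter implies a relative date filter that the server has to turn into an absolute lower bound. A minimal sketch of that parsing step follows; the function name and the supported unit letters (m, h, d, w) are assumptions, since the guide only shows the `-7d` form.

```python
import re
from datetime import datetime, timedelta, timezone

# ASSUMPTION: supported unit suffixes; the guide only shows "d".
_UNITS = {"m": "minutes", "h": "hours", "d": "days", "w": "weeks"}

def parse_relative_from(value: str, now: datetime) -> datetime:
    """Turn a relative offset such as '-7d' into an absolute lower bound."""
    match = re.fullmatch(r"-(\d+)([mhdw])", value)
    if match is None:
        raise ValueError(f"unsupported 'from' value: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return now - timedelta(**{_UNITS[unit]: amount})

# Example: a search issued on 2024-01-08 with from=-7d starts at 2024-01-01.
now = datetime(2024, 1, 8, tzinfo=timezone.utc)
lower_bound = parse_relative_from("-7d", now)
```

Passing `now` in explicitly keeps the function deterministic and easy to test.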

High-level design

Post created → Kafka → Indexer writes to Elasticsearch with privacy metadata. Search query → Query Service expands query + fetches user's friend list → Elasticsearch query with privacy filter → re-rank by social graph distance → return results.

Deep dives

🔐 Privacy-Aware Search

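Each document in Elasticsearch carries a privacyTerms field (listed as privacyGroups[] under Core entities) derived from the post's privacy setting: ["public"] for public posts, ["friends:<authorId>"] for friends-only posts, and [] for only-me posts, which therefore match no searcher's filter. On search, the Query Service builds a per-user filter: "public" OR "friends:<friendId>" for each of the searcher's friends. The Elasticsearch query is then: must match the text AND filter on terms (privacyTerms). Problem: if each document instead listed every user allowed to see it, every friend-list change would force updates to the author's old posts. Approach: index only the author's friend group on the document and let the Social Graph Service supply the searcher's current friend list at query time, so friendship changes take effect without reindexing.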
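The query construction can be sketched as a plain Elasticsearch query body. This is a minimal sketch under assumptions: the function names are hypothetical, the friend list is passed in as if already fetched from the Social Graph Service, and documents are assumed to store `friends:<authorId>` in `privacyTerms`, with the searchable text in a field named `text`.

```python
from typing import Any

def build_privacy_terms(friend_ids: list[str]) -> list[str]:
    """Terms the searcher may match: public plus each friend's group term."""
    return ["public"] + [f"friends:{fid}" for fid in friend_ids]

def build_search_query(text: str, friend_ids: list[str], size: int = 100) -> dict[str, Any]:
    """Bool query: text relevance in `must`, privacy as a non-scoring `filter`."""
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {"text": text}}],
                # A document passes if any one of its privacyTerms is in this list.
                "filter": [{"terms": {"privacyTerms": build_privacy_terms(friend_ids)}}],
            }
        },
    }

query = build_search_query("birthday", friend_ids=["u2", "u3"])
```

Putting the privacy check in `filter` rather than `must` keeps it out of the relevance score, and documents with an empty `privacyTerms` list never match.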

📊 Ranking with Social Signals

Base score: BM25 text relevance. Boost factors: (1) Friend-authored post → 2× boost. (2) Post from a page the user follows → 1.5×. (3) Recent (< 7 days) → 1.2×. (4) High engagement (likes/comments) → 1.1×. Re-ranking: first fetch 100 candidates from Elasticsearch → apply social graph scoring → return top 10. Personalization: a per-user ML model is computationally expensive, so it is applied only for top users.
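The boost-and-re-rank step can be sketched as below. The boost values come from the text; how they combine is not stated, so multiplying them together is an assumption, as are the `Candidate` fields and the idea that "high engagement" arrives as a precomputed flag.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    doc_id: str
    bm25: float                       # base text-relevance score from Elasticsearch
    age_days: float
    author_is_friend: bool = False
    from_followed_page: bool = False
    high_engagement: bool = False     # ASSUMPTION: thresholded upstream

def social_score(c: Candidate) -> float:
    """Apply the boosts multiplicatively (ASSUMPTION: the guide does not say how they combine)."""
    score = c.bm25
    if c.author_is_friend:
        score *= 2.0
    if c.from_followed_page:
        score *= 1.5
    if c.age_days < 7:
        score *= 1.2
    if c.high_engagement:
        score *= 1.1
    return score

def rerank(candidates: list[Candidate], top_k: int = 10) -> list[Candidate]:
    """Re-rank the candidate set and keep the top_k results."""
    return sorted(candidates, key=social_score, reverse=True)[:top_k]

stranger = Candidate("a", bm25=10.0, age_days=30.0)
friend = Candidate("b", bm25=6.0, age_days=30.0, author_is_friend=True)
top = rerank([stranger, friend])
```

Here the friend's post (6.0 × 2 = 12.0) outranks the stranger's stronger text match (10.0), which is the behavior the boost table is aiming for.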

⚡ Real-Time Indexing

Post created → Kafka event → Indexer worker fetches post content + privacy settings → Elasticsearch index API. Near-real-time search in Elasticsearch: the default refresh interval is 1s, so a new document becomes searchable roughly 1s after the indexer writes it; the end-to-end delay also includes Kafka and indexer lag, which must stay within the 30-second target. For trending topics (high-volume): prioritize indexing. For deleted posts: soft delete (set a deleted field to true) → the search filter excludes deleted documents.
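The indexer's mapping from events to index writes might look like the sketch below, which produces the action and body pairs used by the Elasticsearch bulk API. The event shape (`type`, `postId`, and so on), the index name, and the function name are all hypothetical; only the soft-delete idea and the privacy values come from the text.

```python
from typing import Any

INDEX_NAME = "posts"  # ASSUMPTION: hypothetical index name

def privacy_terms(privacy: str, author_id: str) -> list[str]:
    """Map the post's privacy setting to the terms stored on the document."""
    if privacy == "public":
        return ["public"]
    if privacy == "friends":
        return [f"friends:{author_id}"]
    return []  # only-me: matches no searcher's filter

def to_bulk_lines(event: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the [action, body] pair for one post event."""
    meta = {"_index": INDEX_NAME, "_id": event["postId"]}
    if event["type"] == "post_deleted":
        # Soft delete: flag the document instead of removing it.
        return [{"update": meta}, {"doc": {"deleted": True}}]
    if event["type"] == "post_created":
        source = {
            "type": "post",
            "text": event["content"],
            "authorId": event["authorId"],
            "privacyTerms": privacy_terms(event["privacy"], event["authorId"]),
            "createdAt": event["createdAt"],
            "deleted": False,
        }
        return [{"index": meta}, source]
    raise ValueError(f"unknown event type: {event['type']!r}")

# Filter clause that search queries can add to hide soft-deleted documents.
NOT_DELETED = {"bool": {"must_not": [{"term": {"deleted": True}}]}}
```

A worker would batch these pairs from several Kafka events into one bulk request rather than indexing documents one at a time.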

🌍 Multi-Language

Elasticsearch: one index per language, each with a language-specific analyzer (stemming, stopwords). The post's language is detected at indexing time (FastText language detection). Query: detect the query language and route to the matching index. Fuzzy matching for typos: allow a Levenshtein distance of 1 to 2, applied only to words longer than 5 characters. Unicode normalization with accent folding: "café" and "cafe" match.
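A minimal sketch of the routing and normalization steps follows. Language detection itself is not implemented here: the language code is passed in as if a detector had already produced it. The index naming scheme, the supported-language subset, and the fallback index are assumptions. The query uses Elasticsearch's `"fuzziness": "AUTO"` setting, which is a swapped-in rule: it allows one edit for 3 to 5 character terms and two edits for longer ones, slightly different from the threshold described above.

```python
import unicodedata
from typing import Any

SUPPORTED = {"en", "fr", "de", "es"}   # ASSUMPTION: illustrative subset of 100+ languages
DEFAULT_INDEX = "posts_default"        # ASSUMPTION: hypothetical fallback index

def fold_accents(text: str) -> str:
    """Decompose characters (NFKD) and drop combining marks, so 'café' becomes 'cafe'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def index_for_language(lang: str) -> str:
    """Route to the per-language index, falling back when the language is unsupported."""
    return f"posts_{lang}" if lang in SUPPORTED else DEFAULT_INDEX

def build_fuzzy_match(text: str) -> dict[str, Any]:
    """Match clause with typo tolerance on the accent-folded, lowercased query."""
    return {"match": {"text": {"query": fold_accents(text).lower(), "fuzziness": "AUTO"}}}
```

In a real deployment the same folding would normally live in the index analyzer as well, so that indexed text and query text are normalized identically.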

Scaling considerations

Elasticsearch sharded by docId hash; replicas for read scaling
Privacy filter evaluated at Elasticsearch level (not post-fetch) for efficiency
Social Graph Service caches friend list in Redis (TTL 5 min) for search; an unfriend can take up to 5 minutes to affect search unless the entry is invalidated on change
Indexer consumers auto-scale with Kafka lag
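Search result cache in Redis for popular queries (TTL 30s); because results are personalized, cache entries must be scoped to the searcher or limited to public-only results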
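Because results are personalized, one way to keep a short-lived result cache from leaking posts between users is to scope each key to the searcher. The sketch below shows one possible key layout; the function name, the `search:` prefix, and the choice of fields are assumptions, not part of the guide.

```python
import hashlib

def search_cache_key(viewer_id: str, query: str, doc_type: str = "", date_from: str = "") -> str:
    """Build a per-viewer cache key from the normalized query and its filters."""
    # Lowercase and collapse whitespace so trivially different queries share an entry.
    normalized = " ".join(query.lower().split())
    # Join with a separator that cannot appear in normal input to avoid collisions.
    raw = "\x1f".join([viewer_id, normalized, doc_type, date_from])
    return "search:" + hashlib.sha256(raw.encode("utf-8")).hexdigest()

key_a = search_cache_key("u1", "Birthday  party", doc_type="post", date_from="-7d")
key_b = search_cache_key("u1", "birthday party", doc_type="post", date_from="-7d")
key_c = search_cache_key("u2", "birthday party", doc_type="post", date_from="-7d")
```

Here `key_a` and `key_b` are identical while `key_c` differs, so two users never read each other's cached results. The trade-off is a lower hit rate than a shared cache of public-only results.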

What interviewers expect by level

Junior: Describe search index, Elasticsearch basics, post indexing pipeline.
Mid: Privacy-aware query construction, social graph for ranking, real-time indexing pipeline.
Senior: Privacy term design, multi-language support, re-ranking with ML signals, cache strategy.
Staff: Privacy enforcement at 3B user scale, cross-language semantic search (embeddings), cost optimization.
