🏗️ System Design Hard 2025–26

Design a Real-Time Chat Application

A mobile-first system design breakdown of a WhatsApp-style chat app — covering offline-first architecture, WebSocket lifecycle management, delivery receipts, and media handling on Android.

Understanding the Problem

Start your interview by defining the functional and non-functional requirements. For a chat application, functional requirements describe what the system does, while non-functional requirements describe the system qualities — things like "messages should be delivered in under 500ms" or "the app should work fully offline."

Prioritise the top 3 functional requirements. Everything else shows your product thinking, but clearly note it as "below the line" so the interviewer knows you won't be including them in your design. Check in to see if the interviewer wants to move anything.

Functional Requirements
Core Requirements
  1. Users should be able to send and receive text messages in real time (1:1 chat)
  2. Users should see delivery receipts — Sent (✓), Delivered (✓✓), Read (✓✓ blue)
  3. Messages should queue and deliver even when the user is temporarily offline
Below the line (out of scope):
  1. Group chat (>2 participants)
  2. End-to-end encryption
  3. Voice / video calling
  4. Message reactions and threads
Non-Functional Requirements
Core Requirements
  1. The system should deliver messages in under 500ms on a stable network connection
  2. The system should work fully offline — messages queue locally and sync on reconnect
  3. The system should be battery-efficient — no aggressive polling or persistent foreground services
  4. The app is read-heavy; local DB is the single source of truth for the UI at all times
Below the line (out of scope):
  1. Sub-50ms latency (a local DB + WebSocket gives us <200ms realistically)
  2. Message search and full-text indexing
  3. Compliance / data retention policies

Here's how it might look on your whiteboard: write "Functional Requirements" and "Non-Functional Requirements" with a horizontal line separating core from out-of-scope. Tell the interviewer explicitly what you're deprioritising and why.


The Set Up
Planning the Approach

On Android, the mobile client owns significantly more responsibility than in a web app. The device must manage its own connection state, persist data locally for offline access, handle background sync, and deliver push notifications when the process is killed. Before drawing architecture boxes, establish this contract clearly.

The core insight that separates strong candidates: the local Room database is the single source of truth. The UI never reads from the network directly. It only observes the DB. The repository layer syncs the DB with the server silently in the background.

Defining the Core Entities

Before designing the architecture, align on the data model. The two central entities are Message and Conversation. The trickiest design decision here is the dual-ID system on Message:

// Two-ID pattern: the most important design decision in this system

import androidx.room.Entity
import androidx.room.Index
import androidx.room.PrimaryKey

enum class MessageStatus { PENDING, SENT, DELIVERED, READ, FAILED }

@Entity(
    tableName = "messages",
    // Unique serverId index (an addition, for receipt lookups); SQLite allows many NULLs in it
    indices = [Index(value = ["serverId"], unique = true)]
)
data class MessageEntity(
    @PrimaryKey val localId: String,       // UUID assigned by device at send time
    val serverId: String?,                 // null until server ACKs; receipts refer to this ID
    val conversationId: String,
    val senderId: String,
    val text: String?,
    val mediaUri: String?,                 // local path initially, then CDN URL
    val status: MessageStatus,             // PENDING | SENT | DELIVERED | READ | FAILED
    val createdAt: Long,                   // client timestamp (ms)
    val serverTs: Long?                    // used for canonical ordering after sync
)

@Entity(tableName = "conversations")
data class ConversationEntity(
    @PrimaryKey val id: String,
    val participantIds: String,            // JSON array
    val lastMessageId: String?,
    val unreadCount: Int,
    val updatedAt: Long
)

The localId is generated by the device the instant the user taps Send. This allows the message bubble to render immediately without waiting for any network response. The serverId arrives later via the WebSocket ACK frame and is written back to the same DB row.

🔷 Pattern: Dual-ID / Optimistic UI

The dual-ID system enables optimistic UI: the message appears instantly, before any server confirmation. Mainstream messengers such as WhatsApp, iMessage, and Telegram show the same behaviour. The localId acts as an idempotency key: even if the network request is retried, the server deduplicates by localId and never creates a duplicate message.
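
The deduplication described above can be sketched in a few lines. This is a minimal in-memory model, not a real server: `MessageStore`, `IncomingFrame`, and `Ack` are hypothetical names, and a production service would more likely rely on a unique constraint on localId in its database.

```kotlin
// Hypothetical types for illustration; not part of any real server API.
data class IncomingFrame(val localId: String, val conversationId: String, val text: String)
data class Ack(val localId: String, val serverId: String)

class MessageStore {
    private val serverIdByLocalId = HashMap<String, String>()
    private var nextId = 1L

    // A retried frame with a known localId gets the original ACK back
    // instead of being stored a second time.
    fun accept(frame: IncomingFrame): Ack {
        val serverId = serverIdByLocalId.getOrPut(frame.localId) { "srv-${nextId++}" }
        return Ack(frame.localId, serverId)
    }

    fun storedCount(): Int = serverIdByLocalId.size
}
```

Calling `accept` twice with the same frame returns the same `Ack` both times, which is what lets the client retry safely.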


High-Level Design
1) The Android Client Architecture

When the app launches, it connects to the chat server via a persistent WebSocket and opens a reactive stream from Room that the UI observes. All writes — both from the local user and from the server — go through Room first, so the UI always renders from a consistent local state.

[Diagram: Android Client Architecture. UI always reads from Room.]
UI LAYER: ChatScreen (Compose), ConversationListScreen, ChatViewModel (observed via StateFlow)
DOMAIN LAYER: SendMessageUseCase, GetConversationUseCase, SyncUseCase
DATA LAYER: ChatRepository (coordinates Local DB + WebSocket + REST)
  Room Database: single source of truth
  WebSocketManager: real-time messages
  OutboxWorker: WorkManager retry

Let's walk through exactly what each layer does:

  1. UI Layer: Jetpack Compose screens observe StateFlow from ViewModels. They never call network methods directly — they only call ViewModel functions.
  2. Domain Layer: Use cases encapsulate business logic. SendMessageUseCase assigns a localId, writes to Room, and triggers the outbox worker.
  3. ChatRepository: The central coordinator. It writes incoming WebSocket frames to Room and reads from Room for all UI queries. Room → UI is a reactive Flow.
  4. Room Database: The only source the UI reads from. This makes the UI identical whether online or offline — it always renders from local data.
  5. WebSocketManager: Maintains the persistent connection. Routes incoming frames to the Repository. Handles reconnection with exponential backoff.
  6. OutboxWorker: A WorkManager task constrained to NETWORK_CONNECTED. Drains the local outbox table and sends pending messages. Auto-retries on failure.

2) How a Message is Sent (Offline-First Flow)

When a user taps Send, we want the message to appear in the UI immediately — regardless of network state. Here's the exact sequence:

  1. User taps Send: localId (UUID) generated. Status = PENDING.
  2. Write to Room immediately: Room emits → UI renders bubble. Feels instant. Works offline.
  3. OutboxWorker dispatches: Sends via WebSocket. Constraint: NETWORK_CONNECTED. Auto-retries.
  4. Server ACK → status update: WebSocket delivers ACK. Room updated: serverId set, status = SENT (✓).
  5. Recipient events → DELIVERED → READ: Server pushes delivery events back. Room updates status. Ticks animate.
Message Send Flow: local DB first, network second
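// OutboxWorker: WorkManager drains pending messages
import android.content.Context
import androidx.work.CoroutineWorker
import androidx.work.WorkerParameters
import kotlinx.coroutines.CancellationException

// messageDao and webSocketManager are assumed to be supplied by a custom
// WorkerFactory (for example via Hilt's @HiltWorker); WorkManager cannot
// construct a worker with extra constructor parameters by default.
class OutboxWorker(
    ctx: Context,
    params: WorkerParameters,
    private val messageDao: MessageDao,
    private val webSocketManager: WebSocketManager
) : CoroutineWorker(ctx, params) {

    override suspend fun doWork(): Result {
        val pending = messageDao.getPendingMessages()
        for (msg in pending) {
            try {
                webSocketManager.send(msg.toWsFrame())
                // Don't mark SENT here; wait for server ACK via WebSocket
            } catch (e: CancellationException) {
                throw e  // never swallow coroutine cancellation
            } catch (e: Exception) {
                if (runAttemptCount >= 3) {
                    messageDao.updateStatus(msg.localId, MessageStatus.FAILED)
                    return Result.failure()
                }
                return Result.retry()  // WorkManager uses exponential backoff
            }
        }
        return Result.success()
    }
}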

3) How the WebSocket Connection is Managed

The critical question isn't just "use WebSockets" — it's how to manage the connection lifecycle on Android, where the OS aggressively kills background processes. Here's the technology trade-off:

Option                      | Latency | Battery | Complexity | Decision
HTTP Polling (5s interval)  | ~5s     | 🔴 High | Low        | Rejected
HTTP Long Polling           | <1s     | Medium  | Medium     | Rejected
Server-Sent Events (SSE)    | <200ms  | Low     | Low        | Acceptable
WebSocket (OkHttp)          | <100ms  | Low     | Medium     | ✅ Chosen
gRPC Streaming              | <100ms  | Low     | High       | Good alternative

The lifecycle rule: connect when the app comes to foreground, disconnect cleanly on background. When the app is backgrounded or killed, FCM push notifications wake it up for new messages. This avoids the battery drain of a persistent foreground service.

🔷 Pattern: Exponential Backoff Reconnect

On connection failure, do not retry immediately. If millions of clients lose connectivity at once (e.g. during a server deploy) and all reconnect at the same moment, the resulting thundering herd can effectively DDoS your own server. Instead, use exponential backoff with jitter: delay = min(2ⁿ × 1000ms, 30000ms) + random(0, 1000ms), where n is the number of consecutive failed attempts.
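
The formula above translates almost directly into code. The sketch below is one possible reading of it; the function name `reconnectDelayMs` and the choice to start counting attempts at 0 are assumptions, not something the design prescribes.

```kotlin
import kotlin.random.Random

// delay = min(2^n * 1000 ms, 30000 ms) + random(0, 1000 ms)
// `attempt` is n: the number of consecutive failed attempts, starting at 0.
fun reconnectDelayMs(attempt: Int, random: Random = Random.Default): Long {
    require(attempt >= 0) { "attempt must be non-negative" }
    val baseMs = 1_000L
    val capMs = 30_000L
    // Clamp the exponent so the shift cannot overflow for large attempt counts.
    val exponentialMs = baseMs shl minOf(attempt, 15)
    val jitterMs = random.nextLong(0, 1_001) // upper bound is exclusive
    return minOf(exponentialMs, capMs) + jitterMs
}
```

Passing the `Random` instance in makes the function easy to test with a fixed seed; the reconnect loop itself would reset `attempt` to 0 after a successful connection.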


4) How Image and Video Sharing Works

The key principle: never block the message send on media upload. The image renders immediately from the local file. Upload happens in the background. This is the same approach used by WhatsApp, Telegram, and Signal.

  1. User picks image: Copy to app-internal storage. Compress to ≤300KB with target 1080px width. Generate a localUri.
  2. Render from local URI: Message bubble shows the image from disk immediately. Status = PENDING.
  3. MediaUploadWorker runs: Chunked multipart upload to S3 via pre-signed URLs. It persists the last uploaded byte offset so an interrupted upload can resume instead of restarting.
  4. Upload complete: DB row updated with the CDN URL. Coil swaps the image source transparently — no flicker, no reload.

⚠️ Common pitfall: Don't use the original file from MediaStore for upload — the URI may become invalid if the user deletes the photo mid-upload. Always copy to your app's own internal storage first, then upload from there.
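
The resume step of the upload worker above comes down to working out which byte ranges are still missing. The following is a minimal sketch under stated assumptions: `Chunk` and `remainingChunks` are hypothetical names, the chunk size is chosen by the caller, and a partially uploaded chunk is simply sent again from its start.

```kotlin
// Hypothetical helper type: one part of a multipart upload.
data class Chunk(val partNumber: Int, val offset: Long, val length: Long)

// Plans the chunks that still need uploading, given how many bytes are confirmed.
fun remainingChunks(fileSize: Long, chunkSize: Long, lastUploadedOffset: Long): List<Chunk> {
    require(fileSize >= 0 && chunkSize > 0) { "fileSize must be >= 0 and chunkSize > 0" }
    require(lastUploadedOffset in 0..fileSize) { "offset out of range" }
    if (lastUploadedOffset == fileSize) return emptyList() // nothing left to send

    // Resume from the start of the first chunk that is not fully uploaded.
    val firstIndex = (lastUploadedOffset / chunkSize).toInt()
    val chunks = ArrayList<Chunk>()
    var offset = firstIndex * chunkSize
    var part = firstIndex + 1 // part numbers start at 1
    while (offset < fileSize) {
        chunks += Chunk(part, offset, minOf(chunkSize, fileSize - offset))
        offset += chunkSize
        part++
    }
    return chunks
}
```

The worker would upload each returned chunk in order and persist the new offset after every confirmed part, so a crash between parts loses at most one chunk of progress.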


Low-Level Design

The HLD shows what components exist. The LLD shows exactly how they talk to each other — method calls, data transformations, and state changes at every step. Below are three precise flows you should be able to draw and explain in an interview.

Whiteboard Overview

Start here. Draw this on the whiteboard in the first 5 minutes to anchor the entire conversation. Every other flow is a zoom-in of one arrow on this diagram.

[Diagram: Full System Whiteboard, Sender → Server → Receiver]
SENDER (Android): ChatScreen (Compose) → sendMessage() → ChatViewModel → execute() → SendMessageUseCase → insert() → Room DB (★ Source of Truth) → enqueue() → OutboxWorker (WM)
SERVER: WebSocket Gateway, Message Service, Messages DB (Postgres), FCM Push
RECEIVER (Android): WebSocketManager → ChatRepository → Room DB (★ Source of Truth) → Flow emits → ChatScreen (Compose); FCMService (wakes app) on the app-killed path
Arrows between columns: WS send(frame), push frame, ACK (serverId)
Legend: Normal flow, Reactive (Flow), Network call, ACK / Receipt
Flow 1 — Sending a Message (Detailed)

This is the most important flow to master. Every method call, every state change, in exact order.

Participants: UI, ViewModel, UseCase, Repository, Room DB, WorkManager, WebSocket, Server

  1. sendMessage(text): User taps Send. ViewModel receives the call.
  2. execute(SendMessageParams): ViewModel delegates to SendMessageUseCase. Assigns localId = UUID.randomUUID(), status = PENDING.
  3. saveMessage(MessageEntity): UseCase calls Repository with a fully formed entity. No network involved yet.
  4. messageDao.insert(entity): Room writes the row. The UI's Flow<List<Message>> fires immediately: the bubble appears on screen with a clock icon (PENDING).
  5. WorkManager.enqueue(OutboxWorkRequest): Constraint: NETWORK_CONNECTED. If offline, WorkManager queues it and waits. If online, it runs immediately.
  6. webSocket.send(WsFrame.json): OutboxWorker fetches all PENDING rows, serialises each to a JSON frame, and calls webSocket.send(). Does not mark as SENT yet.
  7. WsFrame{ localId, text, conversationId }: Frame hits the server's WebSocket Gateway.
  8. ACK{ localId, serverId, serverTs }: Server persists the message, generates a serverId, and sends back an ACK frame on the same WebSocket connection.
  9. incomingFrames.emit(ack): WebSocketManager emits the frame onto its SharedFlow. Repository collects it.
  10. messageDao.updateAck(localId, serverId, SENT): Room row updated: serverId set, status → SENT. UI's Flow fires again and the clock icon becomes ✓ (single tick).
Flow 2 — Receiving a Message

The receive path is simpler because all the heavy lifting (outbox, retry) is on the sender side. The receiver's job is: decode the frame → write to Room → let the UI react.

Participants: Server, WebSocket, Repository, Room DB, ViewModel, UI

  1. onMessage(WsFrame.MSG): Server pushes an incoming message frame. WebSocketListener.onMessage() fires on OkHttp's internal thread.
  2. incomingFrames.emit(frame): Decoded and emitted on Dispatchers.IO. Repository is already collecting this SharedFlow in a coroutineScope.
  3. messageDao.insertOrIgnore(entity): insertOrIgnore is idempotent: if the same message arrives twice (reconnect scenario), no duplicate is created.
  4. Flow<List<Message>> emits: Room detects the new row and emits on the reactive Flow. No polling. No manual notify.
  5. uiState updates → recomposition: ViewModel maps the list to UI models. Compose recomposes only the new message bubble. No full list redraw.
  6. DELIVERED_ACK{ messageId, recipientId }: Repository sends a DELIVERED receipt back to the server after writing to DB. Sender's app receives this and updates their row to DELIVERED (✓✓).
Flow 3 — Delivery Receipt State Machine

Each message row follows a strict one-way state machine. The receipt flows are the most commonly asked follow-up in interviews — draw this clearly.

[Diagram: MessageStatus State Machine. Strictly one-directional, never goes backwards.]
PENDING (🕐 queued) → SENT (✓ single tick): on server ACK
SENT → DELIVERED (✓✓ double tick): when the receiver writes the message to its DB
DELIVERED → READ (✓✓ blue ticks): when the chat screen opens
PENDING → FAILED (⚠️ max retries): if WorkManager exhausts 3 retry attempts
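
Because receipt frames can arrive late or out of order, the "never goes backwards" rule is worth enforcing in code rather than trusting the network. The sketch below is one way to do that; `nextStatus` is a hypothetical helper, and treating FAILED as terminal is an assumption taken from the diagram.

```kotlin
enum class MessageStatus { PENDING, SENT, DELIVERED, READ, FAILED }

// Returns the status to store after an event asks for `requested`.
// Forward moves are applied; stale or out-of-order events are ignored.
fun nextStatus(current: MessageStatus, requested: MessageStatus): MessageStatus = when {
    // FAILED is only reachable from PENDING (retries exhausted).
    requested == MessageStatus.FAILED ->
        if (current == MessageStatus.PENDING) MessageStatus.FAILED else current
    // Assumption: a failed message stays failed until it is resent.
    current == MessageStatus.FAILED -> current
    // PENDING < SENT < DELIVERED < READ follows the enum declaration order.
    requested.ordinal > current.ordinal -> requested
    else -> current
}
```

With this guard, a DELIVERED frame that arrives after READ leaves the row at READ, and a READ that arrives before DELIVERED is still applied.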
// Repository collecting WebSocket frames and updating status
import javax.inject.Inject
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.launch

class ChatRepository @Inject constructor(
    private val dao: MessageDao,
    private val wsManager: WebSocketManager,
    private val scope: CoroutineScope   // long-lived, application-level scope
) {
    init {
        scope.launch(Dispatchers.IO) {
            wsManager.incomingFrames.collect { frame ->
                when (frame.type) {
                    WsFrameType.MESSAGE   -> handleIncoming(frame)
                    WsFrameType.ACK       -> dao.updateAck(frame.localId, frame.serverId, MessageStatus.SENT)
                    // Receipts carry the serverId, so they use a separate (hypothetical) DAO
                    // query keyed by serverId; updateStatus() elsewhere is keyed by localId.
                    // The query should only move status forward, so a late DELIVERED
                    // cannot overwrite READ.
                    WsFrameType.DELIVERED -> dao.updateStatusByServerId(frame.serverId, MessageStatus.DELIVERED)
                    WsFrameType.READ      -> dao.updateStatusByServerId(frame.serverId, MessageStatus.READ)
                }
            }
        }
    }

    private suspend fun handleIncoming(frame: WsFrame) {
        dao.insertOrIgnore(frame.toEntity())
        // Send DELIVERED receipt back immediately after DB write
        wsManager.send(WsFrame.deliveredAck(frame.serverId))
    }

    // UI observes this; reactive, no manual refresh needed
    fun messages(conversationId: String): Flow<List<MessageEntity>> =
        dao.getMessages(conversationId)  // Room DAO queries can return Flow directly
}
🔷 Pattern: Idempotent Writes with insertOrIgnore

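On reconnect, the server may re-deliver messages the client already received. Using INSERT OR IGNORE (Room's OnConflictStrategy.IGNORE) means duplicate frames are silently dropped, provided the row's primary key (or a unique index) is derived from the frame itself, for example the sender's localId or the serverId. The UI never shows a duplicate bubble, and no extra deduplication logic is needed anywhere else.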


Potential Deep Dives
1) How do we paginate old messages?

Use Paging 3 with RemoteMediator. Recent messages load from Room instantly. When the user scrolls up past the local cache boundary, RemoteMediator fetches older pages from the REST API, writes them to Room, and Paging 3 re-emits. The UI always reads from Room — the pagination source switch is invisible to the user.

2) How do delivery receipts work at scale?

Don't send a read receipt per message, because this floods the WebSocket. Instead, batch them: when the user opens a chat screen, send a single READ_ACK frame with the newest messageId they've seen. The server marks all messages up to that ID as read. A chat with N unread messages then costs one receipt frame instead of N.
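
On the server, handling a batched READ_ACK can be as simple as a range update. The sketch below models it in memory; `ServerMessage` and `applyReadAck` are hypothetical names, and it assumes server IDs are numeric and increase over time within a conversation (the entities above use String IDs, so a real schema would need a sortable ID or a timestamp).

```kotlin
// Hypothetical server-side row for one message in a conversation.
data class ServerMessage(val serverId: Long, var read: Boolean = false)

// Marks every message with serverId <= upToServerId as read.
// Returns how many rows actually changed.
fun applyReadAck(messages: List<ServerMessage>, upToServerId: Long): Int {
    var changed = 0
    for (message in messages) {
        if (!message.read && message.serverId <= upToServerId) {
            message.read = true
            changed++
        }
    }
    return changed
}
```

Replaying the same READ_ACK changes nothing the second time, so the client can resend it after a reconnect without side effects.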

3) How does push notification work when the app is killed?

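Firebase Cloud Messaging (FCM) delivers a data-only push (not a notification push) to the device. Android wakes the app's FirebaseMessagingService, which connects the WebSocket, fetches missed messages via REST, writes them to Room, and shows a local notification. Send the push as high priority so it is delivered promptly in Doze, and keep onMessageReceived() short: it has a limited execution window, so hand longer syncs to WorkManager. This is battery-safe because FCM uses the system-level push channel, so no background process of your own is required. One caveat: an app the user has force-stopped from system settings receives no FCM messages until it is launched again.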

4) How would you add end-to-end encryption?

Use the Signal Protocol (also used by WhatsApp). Each device generates an identity key pair and a batch of prekeys on first launch, and uploads the public halves to the server. On first message, the sender fetches the recipient's public prekey bundle and performs an X3DH key agreement to derive a shared secret. From that point, every message is encrypted on-device using the Double Ratchet algorithm. The server never sees plaintext.

🔷 Pattern: Dealing with Key Distribution

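E2E encryption introduces a key distribution problem: what if a user installs the app on a new device? Their identity key changes, so every contact has to establish a fresh session with them. WhatsApp handles this by renegotiating the session automatically and, if security notifications are enabled, telling contacts that the security code changed; manually verifying the code is optional. This is a UX trade-off you should surface in the interview.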

5) How would you share logic with an iOS client?

Use Kotlin Multiplatform (KMP, formerly marketed as KMM). The Repository layer, use cases, data models, and database queries (via SQLDelight, or Room in its recent multiplatform-capable releases) can be shared. Only the UI layer (Compose on Android, SwiftUI on iOS) and platform-specific code (WorkManager, FCM) stay separate. This is a strong signal at Staff+ level interviews.


What is Expected at Each Level

Your answer to this question will be evaluated differently depending on the role you're interviewing for. Here's what each level needs to demonstrate:

Mid-level
  • Clean MVVM architecture with Repository pattern
  • Room as local DB, Retrofit for REST
  • Basic offline support with WorkManager
  • FCM for push notifications
  • Can articulate delivery receipt states
Senior
  • Dual-ID / outbox pattern explained clearly
  • WebSocket lifecycle tied to app foreground state
  • Chunked resumable media upload
  • Paging 3 + RemoteMediator for history
  • Batched read receipts to reduce traffic
Staff+
  • E2E encryption trade-offs (Signal Protocol)
  • KMM for cross-platform code sharing
  • Performance monitoring + regression alerting
  • Group chat fan-out strategies
  • Multi-device session management

20 Must-Know Interview Questions

These are the most frequently asked follow-up questions in real Chat App system design interviews at Google, Swiggy, Flipkart, and CRED. Each one is a potential 10-minute rabbit hole — know them cold.

Q1 Easy Why is the Room database the single source of truth? Why not read from the network directly in the ViewModel?

Reading from the network directly in the ViewModel breaks offline support, introduces loading states in the UI, and makes the UI depend on network availability. Room acts as a local cache that the UI always reads from, so the same query works whether you're online or offline. The Repository syncs Room with the server silently in the background. This is the repository pattern and the foundation of offline-first architecture. The UI stays fast because a local DB read typically completes in a few milliseconds or less, while a network round trip can take hundreds of milliseconds.

Q2 Easy What is the dual-ID system (localId + serverId)? Why not just use the server-generated ID?

If you wait for the server to generate an ID before inserting into Room, the user sees a loading spinner after tapping Send — which feels sluggish. The localId (a UUID generated on-device) lets you insert into Room immediately, render the bubble, and send to the server in the background. When the server ACK arrives, you write back the serverId to the same row. The localId also acts as an idempotency key — if the network request is retried, the server ignores duplicate frames with the same localId.

Q3 Medium Why WorkManager for the outbox instead of just calling the API directly from the ViewModel?

If you call the API directly from the ViewModel and the user closes the app mid-send, the coroutine is cancelled and the message is lost. WorkManager survives process death — it stores the work request in its own SQLite DB and re-executes it when the app restarts or connectivity returns. It also handles NETWORK_CONNECTED constraints automatically, so you don't need to manage connectivity callbacks yourself. For anything that must complete eventually regardless of app lifecycle, WorkManager is the right tool.

Q4 Medium How do you handle WebSocket reconnection without draining the battery?

Use exponential backoff with jitter: delay = min(2ⁿ × 1000ms, 30000ms) + random(0–1000ms). The jitter prevents thousands of clients from reconnecting simultaneously after a server restart (thundering herd). Tie the connection to the app's foreground state using ProcessLifecycleOwner — connect on ON_START, disconnect on ON_STOP. When backgrounded, rely on FCM push to wake the app instead of keeping a persistent connection. Also register a ConnectivityManager.NetworkCallback to reconnect immediately when network becomes available, rather than waiting for the next backoff interval.

Q5 Medium What happens to messages when the user is offline for 3 days and then comes back online?

On reconnect: (1) The WebSocket connects and the server pushes any missed messages that arrived while the client was offline. (2) For a 3-day gap, the server may have too many messages to push via WebSocket — instead, the client calls a REST GET /messages?since={lastKnownServerTs} endpoint to fetch the backlog and writes them all to Room. (3) Any outbox messages (PENDING rows in Room) are immediately picked up by WorkManager and sent. The key is storing lastKnownServerTs persistently in DataStore so the sync-from point survives app restarts.
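
Step (2) of the answer above, merging a fetched backlog and advancing the sync cursor, can be sketched as follows. `SyncedMessage` and `applyBacklog` are hypothetical names, and the map stands in for the Room table; persisting the returned cursor (for example in DataStore) is left to the caller.

```kotlin
// Hypothetical shape of a message returned by GET /messages?since=...
data class SyncedMessage(val serverId: String, val serverTs: Long, val text: String)

// Merges a backlog into the local store and returns the new sync cursor.
// Existing rows are kept (insert-or-ignore), so replaying a backlog is harmless.
fun applyBacklog(
    local: MutableMap<String, SyncedMessage>,
    backlog: List<SyncedMessage>,
    lastKnownServerTs: Long
): Long {
    var cursor = lastKnownServerTs
    for (message in backlog) {
        if (message.serverId !in local) local[message.serverId] = message
        cursor = maxOf(cursor, message.serverTs)
    }
    return cursor
}
```

Saving the cursor only after the rows are written keeps the sync safe to repeat: a crash in between means the same backlog is fetched again and ignored row by row.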

Q6 Medium How do delivery receipts work at scale? What if you send a receipt per message?

Sending a read receipt per message is fine for low-traffic chats but doesn't scale. If a conversation has 100 unread messages, opening it would trigger 100 READ receipt frames simultaneously. Instead, batch them: send a single READ_ACK{ conversationId, upToMessageId } frame when the chat screen becomes visible. The server marks all messages up to that ID as read, so those 100 frames collapse into one. For DELIVERED receipts, send one per incoming message immediately after writing to Room. This is unavoidable since delivery is per-message.

Q7 Medium How do you prevent duplicate messages from appearing in the UI?

Use INSERT OR IGNORE (Room's OnConflictStrategy.IGNORE) when inserting incoming messages. This only works if the row's key comes from the frame: the sender's localId travels in the frame and becomes the primary key on the receiver too (a unique index on serverId is an alternative), so if the same frame arrives twice (reconnect scenario, server retry), the second insert is silently dropped. On the sender side, the outbox pattern ensures the message is in Room before any network call, so the bubble is never duplicated regardless of how many times WorkManager retries. The key insight: make every write idempotent at the DB level rather than trying to deduplicate at the UI level.

Q8 Hard How would you design group chat? What changes in the architecture?

Group chat introduces fan-out: one message must be delivered to N recipients. Two strategies: (1) Fan-out on write (server-side): when the server receives a message, it immediately pushes to all online group members' WebSocket connections and queues FCM for offline ones. Simple for the client, and a good fit for small and medium groups (on the order of a hundred members). (2) Fan-out on read (client-side): the server stores one copy and clients pull. Simpler server, but more complex client sync logic. For the client, group messages add a groupId field and read receipts become per-member (you need to track who has read, not just whether the message was read).

Q9 Hard How do you handle media (image/video) sending without blocking the message flow?

Never block the message send on media upload. The flow: (1) Copy image to app-internal storage, compress to ≤300KB. (2) Insert a message row with mediaUri = localPath, status = PENDING — bubble renders immediately from local file. (3) MediaUploadWorker uploads to S3 via a pre-signed URL in the background using chunked multipart upload. Store the last uploaded byte offset in DataStore so uploads resume after interruption. (4) On success, update the row with the CDN URL. Coil swaps the image source transparently. The receiver downloads from CDN on first open and caches to disk.

Q10 Hard How does push notification work when the app is completely killed?

When the app is killed, the WebSocket is gone. The server detects the disconnect and switches to FCM. Send a high-priority data-only push (not a notification push): this starts the app's FirebaseMessagingService even when the process is not running (the exception is an app the user has force-stopped, which receives nothing until it is relaunched). In onMessageReceived(): (1) Connect the WebSocket briefly. (2) Fetch missed messages via REST GET /messages?since=lastTs. (3) Write to Room. (4) Show a local notification using NotificationManager. Use data push not notification push so you control the notification content (unread count, sender name) rather than FCM controlling it. For multiple conversations, group notifications with setGroup() and a summary notification: NotificationCompat.InboxStyle suits the summary, and NotificationCompat.MessagingStyle suits each conversation.

Q11 Medium How do you paginate message history? The user scrolls up and there are 10,000 old messages.

Use Paging 3 with RemoteMediator. The PagingSource reads from Room (fast, local). When the user scrolls past the oldest locally cached message, RemoteMediator.load() fires and fetches the next page from REST GET /messages?before={oldestLocalServerId}&limit=50. Write the fetched page to Room. Paging 3 automatically emits the updated list — the UI scrolls smoothly with no manual handling. Use cursor-based pagination (by serverId or serverTs), never offset-based — offset pagination breaks when new messages are inserted during scrolling.
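
To see why a cursor is more robust than an offset, here is a minimal model of the "before" query. `HistoryMessage` and `pageBefore` are hypothetical names, and numeric, increasing server IDs are an assumption made to keep the sketch short.

```kotlin
// Hypothetical row in the server's message history.
data class HistoryMessage(val serverId: Long, val text: String)

// Returns up to `limit` messages strictly older than `beforeServerId`, newest first.
// A null cursor means "start from the newest message".
fun pageBefore(
    all: List<HistoryMessage>,
    beforeServerId: Long?,
    limit: Int
): List<HistoryMessage> =
    all.asSequence()
        .filter { beforeServerId == null || it.serverId < beforeServerId }
        .sortedByDescending { it.serverId }
        .take(limit)
        .toList()
```

If a new message arrives between two page requests, the page before a given cursor is unchanged, whereas an offset-based query would shift by one row and repeat a message.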

Q12 Medium How do you handle message ordering? What if client clock is wrong?

Never use the client timestamp (createdAt) as the canonical ordering key. Client clocks can be wrong by minutes or days. Use serverTs, the timestamp assigned by the server when it persists the message, for ordering. For display, show the client timestamp (so "just now" is accurate), but sort by serverTs. PENDING messages have a null serverTs, and SQLite sorts NULLs first in ascending order, so a plain ORDER BY serverTs ASC would put unsent messages at the top of the chat. In Room, order explicitly: ORDER BY (serverTs IS NULL) ASC, serverTs ASC, createdAt ASC. Confirmed messages then come first in server order, and PENDING messages appear at the bottom in the order they were written.
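
The same ordering rule can be expressed as a Kotlin comparator, which is handy for unit tests or for sorting an in-memory list. Pending rows have no serverTs yet, so the comparator places them explicitly after confirmed rows. `Row` and `chatOrder` are hypothetical names used only for this sketch.

```kotlin
// Hypothetical, trimmed-down view of a message row.
data class Row(val localId: String, val createdAt: Long, val serverTs: Long?)

// Confirmed messages first (by server time), then pending ones (by client time).
val chatOrder: Comparator<Row> =
    compareBy<Row> { it.serverTs == null } // false (confirmed) sorts before true (pending)
        .thenBy { it.serverTs }
        .thenBy { it.createdAt }
        .thenBy { it.localId }             // deterministic tiebreaker
```

Sorting with `rows.sortedWith(chatOrder)` keeps the canonical server order for everything the server has seen and appends unsent messages in the order the user wrote them.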

Q13 Hard How would you add end-to-end encryption? What changes in your architecture?

Use the Signal Protocol (X3DH + Double Ratchet). Key changes: (1) On first launch, generate an identity key pair and a set of one-time pre-keys. Upload public keys to the server's key distribution service. (2) Before the first message to a user, fetch their public keys and run X3DH to derive a shared session key. (3) Encrypt every message on-device before sending. The server only ever sees ciphertext. (4) Decrypt each message once on receipt, in the data layer, and keep the local database encrypted at rest (for example with SQLCipher). Storing the wire ciphertext and decrypting it on every read does not fit the Double Ratchet, because message keys are meant to be discarded after use. Major trade-off: key management becomes complex. New device onboarding, key rotation, and message backup all require careful design. The server cannot moderate content.

Q14 Medium How do you implement typing indicators (User is typing...) efficiently?

Typing indicators are ephemeral, so never persist them to Room. The flow: (1) When the user types, send a TYPING{ conversationId } WebSocket frame. (2) Throttle the send: emit at most one TYPING frame every few seconds while the user keeps typing, not one per keystroke. (A debounce that waits for inactivity would only fire once the user has stopped typing, which is too late.) (3) The server forwards the event to the recipient's WebSocket. (4) On receipt, show the indicator and start a 5-second timer. If another TYPING frame arrives, reset the timer. If it expires, hide the indicator. The sender's throttle interval must be shorter than this timeout. (5) Send a TYPING_STOP frame when the user clears the input or sends the message. Store the typing state in a simple MutableStateFlow<Boolean> in the ViewModel, with no DB involved.
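
Both halves of the typing indicator reduce to small time-based checks. The sketch below rate-limits outgoing frames with a throttle, which is a deliberate choice here rather than something the flow mandates, and models the receiver's 5-second expiry. `TypingThrottle`, `TypingIndicator`, and the 3-second interval are assumptions; time is passed in explicitly so the logic stays testable.

```kotlin
// Sender side: allow at most one TYPING frame per interval while the user keeps typing.
class TypingThrottle(private val minIntervalMs: Long = 3_000) {
    private var lastSentAt: Long? = null

    // Call on every keystroke; returns true when a TYPING frame should be sent now.
    fun shouldSend(nowMs: Long): Boolean {
        val last = lastSentAt
        if (last != null && nowMs - last < minIntervalMs) return false
        lastSentAt = nowMs
        return true
    }
}

// Receiver side: the indicator stays visible until `timeoutMs` passes
// without a new TYPING frame, or until TYPING_STOP arrives.
class TypingIndicator(private val timeoutMs: Long = 5_000) {
    private var lastFrameAt: Long? = null

    fun onTypingFrame(nowMs: Long) { lastFrameAt = nowMs }
    fun onTypingStop() { lastFrameAt = null }

    fun isVisible(nowMs: Long): Boolean {
        val last = lastFrameAt ?: return false
        return nowMs - last < timeoutMs
    }
}
```

In an app, the ViewModel would feed these with a monotonic clock and publish `isVisible` through its MutableStateFlow<Boolean>.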

Q15 Hard How do you support multiple devices for the same account (phone + tablet)?

Each device maintains its own WebSocket connection identified by a deviceId. The server maintains a mapping of userId → [deviceId1, deviceId2, ...]. When a message is sent, the server fans out to all active WebSocket connections for that user. For offline devices, FCM is registered per-device so each gets its own push token. The client-side Room DB is per-device — messages sync independently on each device using lastSyncedTs. The trickiest part: read receipts. If a user reads a message on their tablet, the phone should also mark it as read. The server broadcasts a READ_SYNC event to all other devices of the same user.
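
The userId → devices mapping and the READ_SYNC targeting described above can be modelled in a few lines. This is an in-memory sketch with hypothetical names (`DeviceRegistry`, `readSyncTargets`); a real gateway would keep this state in a shared store so any server node can look it up.

```kotlin
// Hypothetical registry of each user's currently connected devices.
class DeviceRegistry {
    private val devicesByUser = HashMap<String, MutableSet<String>>()

    fun connect(userId: String, deviceId: String) {
        devicesByUser.getOrPut(userId) { LinkedHashSet() }.add(deviceId)
    }

    fun disconnect(userId: String, deviceId: String) {
        devicesByUser[userId]?.remove(deviceId)
    }

    // Devices that should receive READ_SYNC: every connected device of the
    // user except the one where the message was read.
    fun readSyncTargets(userId: String, readOnDeviceId: String): Set<String> =
        devicesByUser[userId].orEmpty()
            .filterTo(LinkedHashSet<String>()) { it != readOnDeviceId }
}
```

Devices that are offline at that moment are not in the registry, so they would pick up the read state during their next sync instead.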

Q16 Medium How do you keep the conversation list updated in real time (unread count, last message preview)?

The conversation list is a Room query with a reactive Flow. Use a Room @Query with a JOIN between conversations and messages tables:
SELECT c.*, m.text AS lastMessageText FROM conversations c LEFT JOIN messages m ON m.localId = c.lastMessageId ORDER BY c.updatedAt DESC
Every time a new message is inserted, Room emits on this Flow automatically, so the conversation list reorders and shows the new preview without any manual refresh. Unread count is a column on the conversations table, incremented on incoming message insert and reset to 0 when the chat screen opens.

Q17 Hard How would you test this architecture? What are the key test layers?

Three layers: (1) Unit tests — Test SendMessageUseCase with a fake Repository. Test ChatViewModel using runTest + Turbine to assert StateFlow emissions. Test ChatRepository with a fake DAO and fake WebSocketManager. (2) Integration tests — Test Room DAO with an in-memory Room database (Room.inMemoryDatabaseBuilder) using runTest. Test the outbox flow with a TestListenableWorkerBuilder for WorkManager. (3) UI tests — Use MockWebServer (OkHttp) to simulate WebSocket frames and assert Compose UI reactions. Use Hilt's @UninstallModules + @TestInstallIn to replace real dependencies with fakes.

Q18 Medium How do you handle message deletion? Both "delete for me" and "delete for everyone".

Delete for me: Soft-delete in the local Room DB — add a deletedForMe: Boolean flag. The Room query filters these out. Never actually delete the row (it may be needed for receipt sync). Delete for everyone: Send a DELETE{ serverId } WebSocket frame. The server marks the message as deleted in its DB and fans out a DELETE event to all recipients' devices. On receipt, the client sets deletedForEveryone = true in Room. The UI shows "This message was deleted" in place of the content. The content itself can be nulled out in Room. Hard limit: WhatsApp allows delete-for-everyone only within 60 hours — enforce this on the server.

Q19 Hard How do you design the system to handle 1 million concurrent users on the Android client side?

The Android client itself doesn't change at scale — it still manages one WebSocket and one Room DB. The client-side concerns at scale are: (1) Reconnect storms — exponential backoff with jitter prevents all 1M clients hitting the server simultaneously after an outage. (2) Battery efficiency — the foreground/background WebSocket lifecycle pattern keeps battery impact minimal. (3) DB growth — implement a message retention policy: delete messages older than 30 days from Room (keep on server). Use a periodic WorkManager task for DB housekeeping. (4) Memory — Paging 3 ensures only the visible window of messages is in memory, not all 10,000.

Q20 Hard If you had to share business logic between Android and iOS, how would you approach it?

Use Kotlin Multiplatform (KMP, formerly marketed as KMM). The shareable layer includes: Repository, Use Cases, data models, and the outbox pattern logic. For the shared database, use SQLDelight (which generates type-safe Kotlin code from SQL for both Android and iOS) or Room's multiplatform support in recent releases. kotlinx.coroutines works on both platforms. The WebSocket client can use Ktor's WebSocket client (multiplatform). What stays platform-specific: UI (Compose on Android, SwiftUI on iOS), WorkManager (use BackgroundTasks on iOS), and FCM (use APNs on iOS). This approach can remove a large share of duplicated business logic, but it adds build complexity and requires the team to be comfortable with the tooling's maturity limitations.