Design a Video Streaming App (YouTube / Netflix)
1. Understanding the Problem
Design the Android client for a video streaming app. The core challenge is delivering smooth, high-quality video playback across wildly varying network conditions, from a 5G city connection to a spotty 2G rural signal, while keeping memory usage bounded, supporting background playback and offline downloads, and handling the full player lifecycle (seek, quality switch, fullscreen, PiP).
The defining technique for video streaming is Adaptive Bitrate (ABR) streaming: video is encoded at multiple quality levels and split into small chunks (~2–10 s). The player monitors download speed and buffer health, then selects the highest quality rendition it can sustain without stalling. This makes playback resilient to fluctuating bandwidth.
Functional Requirements
- Browse a home feed of video thumbnails with titles
- Tap a video → full player with controls (play/pause, seek, quality)
- Adaptive quality based on network bandwidth
- Background audio playback when app is minimised
- Picture-in-Picture (PiP) mode
- Offline download for later viewing
- Resume playback at last-watched position
- Subtitles / closed captions
Non-Functional Requirements
- Startup: First frame in <1.5 s on 4G
- Stall rate: <0.5% of playback seconds should buffer
- Memory: Bounded buffer (don't load entire video)
- Battery: Release decoder when backgrounded without audio
- Download: Resumable, Wi-Fi-only by default
- DRM: Widevine for premium content
Clarifying Questions
- Live streaming or on-demand VOD, or both?
- Is DRM (Widevine) required, or is content public?
- Offline downloads required? Wi-Fi only or cellular too?
- Short-form (YouTube Shorts / Reels) or long-form, or both?
2. The Set Up
Adaptive Bitrate Quality Tiers
One tier is the default starting quality (the sample manifest in section 3 marks 720p); ExoPlayer's ABR algorithm then switches up or down automatically.
Streaming Formats: Why DASH/HLS?
| Format | Protocol | ABR? | DRM? | Latency | Best for |
|---|---|---|---|---|---|
| Progressive MP4 | HTTP | No | No | High | Small clips only |
| HLS (.m3u8) | HTTP | Yes | FairPlay | Medium | iOS-first, VOD + live |
| DASH (.mpd) | HTTP | Yes | Widevine | Low–Medium | Android, VOD + live |
| RTMP / WebRTC | TCP/UDP | No | No | Ultra-low | Live only, no Android SDK |
Core Components
3. High-Level Design
How DASH Streaming Works
A DASH stream consists of two parts: a manifest file (.mpd) that describes all available quality levels, segments, and durations; and the segment files (.m4s), short chunks of 2–10 seconds each at a specific quality. ExoPlayer downloads the manifest first, then fetches segments ahead of the playback position, maintaining a rolling buffer of typically 15–30 seconds:
<!-- Simplified DASH manifest -->
<MPD>
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation bandwidth="300000" width="426" height="240"/>
      <Representation bandwidth="1000000" width="854" height="480"/>
      <Representation bandwidth="2500000" width="1280" height="720"/> <!-- Default -->
      <Representation bandwidth="5000000" width="1920" height="1080"/>
    </AdaptationSet>
  </Period>
</MPD>
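To make the manifest structure concrete, here is a minimal sketch that reads the `Representation` entries from a simplified manifest like the one above using the JVM's built-in XML parser. In a real app ExoPlayer parses the manifest itself; the `Representation` data class and `parseRepresentations` function are hypothetical names used only for illustration.

```kotlin
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory

// One <Representation> entry from the manifest (hypothetical model class).
data class Representation(val bandwidthBps: Int, val width: Int, val height: Int)

// Reads every <Representation> element and returns them sorted by bitrate, lowest first.
fun parseRepresentations(mpdXml: String): List<Representation> {
    val doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(ByteArrayInputStream(mpdXml.toByteArray(Charsets.UTF_8)))
    val nodes = doc.getElementsByTagName("Representation")
    return (0 until nodes.length).map { i ->
        val attrs = nodes.item(i).attributes
        Representation(
            bandwidthBps = attrs.getNamedItem("bandwidth").nodeValue.toInt(),
            width = attrs.getNamedItem("width").nodeValue.toInt(),
            height = attrs.getNamedItem("height").nodeValue.toInt(),
        )
    }.sortedBy { it.bandwidthBps }
}
```

The sorted list is the "quality ladder" that the ABR logic walks up and down during playback.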
4. Low-Level Design
Whiteboard: Full Playback Pipeline
Flow 1: Adaptive Bitrate Switching
Participants: PlayerView, ExoPlayer ABR Engine, CDN Segment Server, PlayerViewModel / UI
1. Measures download speed of last N segments (bandwidth estimator)
2. Buffer health drops below 15 s (rebuffering risk detected)
3. ABR algorithm selects lower rendition (e.g., 1080p → 720p)
4. Next segment requested at new rendition URL
5. Seamless quality transition: player stitches segments; no stall visible
6. Bandwidth improves → buffer health rises → ABR upgrades back to 1080p
7. Quality badge in UI updates via player.videoFormat.height listener
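The switching steps above can be sketched as a small selection function. This is a simplified illustration, not ExoPlayer's actual implementation: the `Rendition` class, `selectRendition`, and the default values for `bandwidthFraction` and `minBufferForUpgradeMs` are assumptions chosen for the example.

```kotlin
// A rendition the player can choose from; bitrates are in bits per second.
data class Rendition(val height: Int, val bitrateBps: Long)

// Picks the highest rendition whose bitrate fits within a safety fraction of the
// measured bandwidth. Upgrades are only allowed once the buffer is healthy.
fun selectRendition(
    renditions: List<Rendition>,
    current: Rendition,
    estimatedBandwidthBps: Long,
    bufferedMs: Long,
    bandwidthFraction: Double = 0.7,       // assumed safety margin
    minBufferForUpgradeMs: Long = 10_000,  // assumed hysteresis threshold
): Rendition {
    require(renditions.isNotEmpty())
    val budgetBps = estimatedBandwidthBps * bandwidthFraction
    val sorted = renditions.sortedBy { it.bitrateBps }
    // Highest rendition that fits the budget; fall back to the lowest one.
    val ideal = sorted.lastOrNull { it.bitrateBps <= budgetBps } ?: sorted.first()
    val isUpgrade = ideal.bitrateBps > current.bitrateBps
    // Hold the current rendition if the buffer is too thin to risk an upgrade.
    return if (isUpgrade && bufferedMs < minBufferForUpgradeMs) current else ideal
}
```

With a 3 Mbps estimate the budget is 2.1 Mbps, so a player on 1080p (5 Mbps) would drop to the 1 Mbps 480p rendition from the sample ladder rather than 720p (2.5 Mbps).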
Flow 2: Background Playback (MediaSessionService)
Participants: PlayerActivity, MediaSessionService, ExoPlayer, System / Notification
1. User presses Home → Activity goes to onStop()
2. MediaSessionService is already running (bound in onStart)
3. Audio-only: decoder releases video track, keeps audio decoder active (saves battery)
4. System shows media notification with thumbnail, title, play/pause/skip
5. User taps pause in notification → MediaSession.setCallback receives command
6. player.pause() called → buffering stops
7. User returns to app → Activity re-binds → PlayerView re-attaches to same ExoPlayer instance; video resumes exactly where audio left off
Flow 3: Offline Download
Participants: UI, DownloadViewModel, ExoPlayer DownloadMgr, CDN, Room DB
1. User taps "Download" → quality picker (480p / 720p)
2. Build DownloadRequest(uri=manifestUrl, selectedTracks=[720p])
3. DownloadService (foreground) fetches manifest, queues all segments
4. Segments downloaded in order; service runs on Wi-Fi constraint via WorkManager
5. Progress bar updates via DownloadManager.getCurrentDownloads()
6. On completion: segments in filesDir/downloads/, DRM license cached
7. INSERT/UPDATE downloads table: videoId, path, quality, expiresAt
8. Offline play: ExoPlayer loads from local path; no network needed
Key Code: ExoPlayer Setup with DRM + Resume
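// Imports assume Media3 (androidx.media3), which the article uses for MediaSessionService.
import android.content.Context
import androidx.media3.common.C
import androidx.media3.common.MediaItem
import androidx.media3.common.MimeTypes
import androidx.media3.exoplayer.ExoPlayer
import androidx.media3.exoplayer.trackselection.DefaultTrackSelector

// VideoInfo is the app's own model (manifestUrl, licenseUrl, token, resumePositionMs).
fun buildPlayer(context: Context, videoInfo: VideoInfo): ExoPlayer {
    val player = ExoPlayer.Builder(context)
        .setTrackSelector(buildTrackSelector(context)) // ABR params
        .build()

    // Widevine license request, authorised with the user's token
    val drmConfig = MediaItem.DrmConfiguration.Builder(C.WIDEVINE_UUID)
        .setLicenseUri(videoInfo.licenseUrl)
        .setLicenseRequestHeaders(mapOf("Authorization" to "Bearer ${videoInfo.token}"))
        .build()

    val mediaItem = MediaItem.Builder()
        .setUri(videoInfo.manifestUrl)
        .setMimeType(MimeTypes.APPLICATION_MPD) // DASH
        .setDrmConfiguration(drmConfig)
        .build()

    player.setMediaItem(mediaItem)
    player.seekTo(videoInfo.resumePositionMs) // restore watch progress
    player.prepare()
    player.playWhenReady = true
    return player
}

private fun buildTrackSelector(context: Context): DefaultTrackSelector {
    val params = DefaultTrackSelector.Parameters.Builder(context)
        .setMaxVideoSizeSd() // caps video below 720p; a real app would apply this only on mobile data
        .setPreferredTextLanguage("en")
        .build()
    return DefaultTrackSelector(context, params)
}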
Key Code: Player Lifecycle Management
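import android.content.res.Configuration
import androidx.activity.viewModels
import androidx.appcompat.app.AppCompatActivity
import androidx.lifecycle.ViewModel
import androidx.media3.exoplayer.ExoPlayer
import androidx.media3.ui.PlayerView
import kotlinx.coroutines.launch

// Proper lifecycle: one ExoPlayer, handed between Activity and Service
class PlayerViewModel : ViewModel() {
    val player: ExoPlayer by lazy { buildPlayer(...) }

    // Save position to Room when the ViewModel is cleared
    override fun onCleared() {
        val pos = player.currentPosition
        // viewModelScope is already cancelled by the time onCleared() runs, so launch on
        // appScope: an application-level CoroutineScope assumed to be injected alongside
        // watchHistoryDao and videoId.
        appScope.launch {
            watchHistoryDao.upsert(WatchHistory(videoId, resumePositionMs = pos))
        }
        player.release()
    }
}

// Activity binds the same player instance to PlayerView
class PlayerActivity : AppCompatActivity() {
    private val viewModel: PlayerViewModel by viewModels()
    private lateinit var playerView: PlayerView // assigned in onCreate (omitted)

    override fun onStart() {
        super.onStart()
        playerView.player = viewModel.player // attach surface
    }

    override fun onStop() {
        playerView.player = null // detach surface; audio continues via Service
        super.onStop()
    }

    // PiP: keep surface attached in reduced window
    override fun onPictureInPictureModeChanged(inPip: Boolean, newConfig: Configuration) {
        super.onPictureInPictureModeChanged(inPip, newConfig)
        playerView.useController = !inPip // hide controls in PiP
    }
}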
5. Potential Deep Dives
Picture-in-Picture (PiP)
Enter PiP via enterPictureInPictureMode(PictureInPictureParams). Set a custom sourceRectHint matching the PlayerView bounds for a smooth zoom animation. Supply RemoteAction buttons (play/pause/next) so controls work inside the PiP window. In onPictureInPictureModeChanged, hide the player controls (playerView.useController = false) since they're too small. On exit from PiP, restore the full UI. The player keeps the same ExoPlayer instance, so no seek is required.
Keep ExoPlayer in the ViewModel (not the Activity). This survives rotation without rebuilding the player or losing buffer. The Activity just attaches/detaches the PlayerView surface in onStart/onStop. Buffer, playback state, and position all persist through config changes for free.
Pre-buffering Thumbnails for Seek Bar
YouTube shows video thumbnail previews when dragging the seek bar. This is implemented as a separate sprite sheet image: the server generates a grid of thumbnails at 10 s intervals, stored as a single JPEG. The manifest includes the sprite sheet URL and tile dimensions. When seeking, the client calculates which tile corresponds to the seek position: tileIndex = seekPositionSec / 10; tileX = tileIndex % columns; tileY = tileIndex / columns. Coil loads the sprite sheet and crops to the relevant tile using a custom Transformation.
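The tile arithmetic above can be sketched as a pure function that turns a seek position into the pixel rectangle to crop from the sprite sheet. `TileRect` and `tileRectFor` are hypothetical names, and the tile dimensions are assumed to come from the manifest metadata described above.

```kotlin
// Pixel rectangle of one thumbnail inside the sprite sheet.
data class TileRect(val left: Int, val top: Int, val width: Int, val height: Int)

fun tileRectFor(
    seekPositionSec: Int,
    intervalSec: Int,    // seconds between thumbnails (10 in the text above)
    columns: Int,        // tiles per row in the sprite sheet
    tileWidthPx: Int,
    tileHeightPx: Int,
): TileRect {
    require(seekPositionSec >= 0 && intervalSec > 0 && columns > 0)
    val tileIndex = seekPositionSec / intervalSec
    val tileX = tileIndex % columns  // column index
    val tileY = tileIndex / columns  // row index
    return TileRect(tileX * tileWidthPx, tileY * tileHeightPx, tileWidthPx, tileHeightPx)
}
```

For a seek to 125 s with 10 s intervals and 5 columns of 160x90 tiles, the tile index is 12, which lands in column 2, row 2, so the crop starts at (320, 180).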
Widevine DRM Lifecycle
DRM works in two steps: (1) License acquisition: ExoPlayer sends the DRM init data from the manifest to your license server. The server validates the user's entitlement and returns an encrypted content key. (2) Decryption: The key is stored in the device's secure hardware (Trusted Execution Environment). Segments are decrypted on the fly through the hardware-backed MediaDrm API, so the key never exists in clear memory. For offline downloads, request an offline license (for example with OfflineLicenseHelper.downloadLicense) with an expiry tied to the rental/purchase duration.
Memory-Bounded Buffer
ExoPlayer's DefaultLoadControl manages buffer size. Configure it carefully: too small → rebuffering; too large → OOM on low-end devices. Key parameters:
// Java methods do not accept Kotlin named arguments, so the names are given as comments.
val loadControl = DefaultLoadControl.Builder()
    .setBufferDurationsMs(
        /* minBufferMs = */ 15_000, // keep loading until at least 15 s is buffered
        /* maxBufferMs = */ 50_000, // don't buffer more than 50 s ahead
        /* bufferForPlaybackMs = */ 2_500, // buffer needed to start playback or resume after a seek
        /* bufferForPlaybackAfterRebufferMs = */ 5_000 // buffer needed to resume after a stall
    )
    .setTargetBufferBytes(50 * 1024 * 1024) // 50 MB target buffer size
    .setPrioritizeTimeOverSizeThresholds(false) // let the byte target win so memory stays bounded
    .build()
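To sanity-check these numbers, a rough estimate of buffer memory is bitrate multiplied by buffered duration. The sketch below assumes a constant bitrate and ignores audio and container overhead; `bufferBytes` and `maxBufferMsFor` are hypothetical helper names.

```kotlin
// Approximate bytes held in memory for a buffer of the given duration.
fun bufferBytes(bitrateBps: Long, bufferMs: Long): Long =
    bitrateBps * bufferMs / 1000 / 8

// Longest buffer (in ms) that fits in a byte budget at the given bitrate.
fun maxBufferMsFor(byteBudget: Long, bitrateBps: Long): Long {
    require(bitrateBps > 0)
    return byteBudget * 8 * 1000 / bitrateBps
}
```

At the 5 Mbps 1080p rendition from the sample manifest, a 50 s buffer is about 31 MB, which fits inside the 50 MB target. At an illustrative 20 Mbps stream the same 50 s would need about 125 MB, so the byte target becomes the limiting factor.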
Short-Form Video (Reels / Shorts) Feed
For a TikTok/Shorts-style vertical feed: use a ViewPager2 with one ExoPlayer instance per visible item (or reuse a single player with the same pattern as the ride-sharing video autoplay from the earlier article). Pre-load the next video on a second player instance by setting its MediaItem and calling prepare() while the current video plays; ExoPlayer buffers it in the background, so the first frame is available almost instantly when the user swipes. Use setPlaybackSpeed(1.0f) and setRepeatMode(REPEAT_MODE_ONE) for looping shorts.
6. What is Expected at Each Level
- Knows ExoPlayer exists and can set it up
- Understands progressive vs streaming
- Handles player lifecycle (release in onDestroy)
- Shows loading/buffering state to user
- Saves and restores playback position
- Loads thumbnails with Coil
- DASH vs HLS trade-offs; why ABR matters
- ExoPlayer in ViewModel (survives rotation)
- Background playback via MediaSessionService
- Offline download with DownloadManager
- PiP mode implementation
- DefaultLoadControl buffer tuning
- DRM concepts: Widevine, license server flow
- Custom ABR algorithm (override BandwidthMeter)
- Pre-warming player for next video in feed
- Seek thumbnail sprite sheet strategy
- Multi-DRM (Widevine + ClearKey fallback)
- Server-side ABR vs client-side ABR trade-offs
- CDN failover: secondary URL on 4xx/5xx
- Stall rate telemetry: rebuffer count, time-to-first-frame metrics
7. Interview Questions
ABR streaming encodes the same video at multiple quality levels (240p to 4K) and splits each into short chunks (~4 s). The player monitors its download speed and buffer depth in real-time. When bandwidth drops, it switches to a lower-quality rendition for the next chunk, preventing the video from stalling. When bandwidth improves, it upgrades. Without ABR, a 4K stream would pause and buffer every few seconds on a congested network, while a 240p stream would waste the capacity of a fast connection. ABR gives the best possible quality at any given bandwidth condition, invisibly and automatically.
HLS was developed by Apple and uses Apple's FairPlay DRM, which is not natively supported on Android. DASH (Dynamic Adaptive Streaming over HTTP) is an open standard with full Widevine DRM support, which is Android's native DRM system. ExoPlayer supports DASH natively with excellent ABR algorithms. DASH also offers finer-grained segment control, supports more codec options (VP9, AV1, H.265), and has better multi-audio-track and subtitle support. The only reason to use HLS on Android is if the content server only provides HLS (common with AWS MediaConvert defaults); ExoPlayer supports both.
ExoPlayer is expensive to create and buffers data in memory. If it lives in the Activity, a screen rotation destroys and recreates it, causing a brief stall, re-fetching the manifest, and losing the buffer. In the ViewModel, ExoPlayer survives configuration changes. The Activity only attaches/detaches PlayerView (the SurfaceView that renders frames) in onStart/onStop. The player keeps buffering and playing through the rotation. The position, buffer, and ABR state are all preserved seamlessly. The ViewModel's onCleared() is the right place to call player.release(): it's called only when the user genuinely leaves the screen.
Use MediaSessionService (Media3 / Jetpack Media) which keeps ExoPlayer alive in the background as a foreground service. Bind the Activity to the service. When the Activity goes to onStop, detach the PlayerView surface; ExoPlayer continues playing audio only (video decoder released to save power). The system shows a media notification with transport controls. The service handles audio focus changes (phone call → pause). When the user returns, the Activity re-binds and attaches the PlayerView; video resumes instantly because ExoPlayer never stopped. On Android 13+, request POST_NOTIFICATIONS permission for the media notification.
ExoPlayer's default AdaptiveTrackSelection uses two inputs: (1) Bandwidth estimate: measured as bytes downloaded per second across recent segments, with a conservative fraction applied (70% of measured bandwidth by default) to avoid over-estimating; (2) Buffer health: how many seconds are buffered ahead. The algorithm selects the highest rendition whose bitrate is safely below the available bandwidth. It also enforces hysteresis: it upgrades only once enough media is buffered (about 10 s by default) and holds the current quality on a bandwidth drop while the buffer is still deep, which prevents rapid oscillation. You can customise via DefaultBandwidthMeter or implement your own BandwidthMeter for platform-specific signal strength data.
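As a rough illustration of the bandwidth-estimate input, the sketch below averages throughput over the last few segment downloads. It is a deliberate simplification: ExoPlayer's DefaultBandwidthMeter uses a more robust weighted estimate, and `BandwidthEstimator` with its `windowSize` parameter is a hypothetical class for this example only.

```kotlin
// Sliding-window throughput estimate over the last `windowSize` segment downloads.
class BandwidthEstimator(private val windowSize: Int = 5) {
    private data class Sample(val bytes: Long, val elapsedMs: Long)

    private val samples = ArrayDeque<Sample>()

    fun onSegmentDownloaded(bytes: Long, elapsedMs: Long) {
        if (elapsedMs <= 0) return // ignore cache hits and bad samples
        samples.addLast(Sample(bytes, elapsedMs))
        if (samples.size > windowSize) samples.removeFirst()
    }

    // Bits per second across the window, or null until a sample arrives.
    fun estimateBps(): Long? {
        if (samples.isEmpty()) return null
        val totalBits = samples.sumOf { it.bytes } * 8
        val totalMs = samples.sumOf { it.elapsedMs }
        return totalBits * 1000 / totalMs
    }
}
```

The estimate would then be multiplied by the conservative fraction before comparing it with rendition bitrates.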
ExoPlayer provides a DownloadManager and DownloadService API. Build a DownloadRequest with the manifest URI and the desired track selection (e.g., 720p video + English audio). Run DownloadService as a foreground service with a Wi-Fi constraint. The service downloads and stores all required segments in the app's internal storage. For DRM content, request an offline license via OfflineLicenseHelper and store the encrypted key. To play offline, use DownloadHelper.createMediaSource(downloadRequest, dataSourceFactory); ExoPlayer loads from local storage transparently. Store download metadata (videoId, path, expiresAt, size) in Room for the Downloads screen.
Widevine is a DRM system built into Android's MediaDrm API. It has three security levels: L1 (hardware-enforced, required for HD/4K), L2, L3 (software, SD only). The flow: (1) ExoPlayer extracts the PSSH (Protection System Specific Header) from the encrypted DASH manifest; (2) It creates a key request and sends it to your license server with the user's auth token; (3) The license server validates entitlement and returns an encrypted content key; (4) MediaDrm decrypts the key in the device's TEE (Trusted Execution Environment); (5) The hardware video decoder uses the key to decrypt segments frame-by-frame. The key never appears in clear memory, and at L1 the decoded frames stay on a secure path that screen recording cannot capture.
Call enterPictureInPictureMode(PictureInPictureParams.Builder().setAspectRatio(Rational(16, 9)).setSourceRectHint(playerViewRect).build()). The sourceRectHint gives the system the current bounds of the player for a smooth shrink animation. Add RemoteAction buttons (play/pause) via setActions(). Override onPictureInPictureModeChanged to hide/show UI controls. Declare android:supportsPictureInPicture="true" and android:configChanges="screenSize|smallestScreenSize|screenLayout|orientation" in the manifest; this prevents Activity recreation on PiP enter/exit. Auto-enter PiP on home button via setPictureInPictureParams with setAutoEnterEnabled(true) (Android 12+).
Save position in Room's watch_history table: videoId, resumePositionMs, totalDurationMs, lastWatchedAt. Write the position periodically (every 5–10 s) and on ViewModel.onCleared(). Avoid writing on every player position update, which would be 60 writes/second. On play: query Room for the stored position, call player.seekTo(resumePositionMs) before prepare(). Show a "Resume from 12:34" chip if the position is more than 60 s in. The server can also store watch progress for cross-device sync: send the position in a background call every 30 s and on pause/stop.
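The write-throttling and resume-chip rules above can be sketched without any Android dependencies. `ResumePositionThrottle` and `resumeLabel` are hypothetical helpers; the 5 s interval and 60 s threshold mirror the numbers in the text, and the actual Room write is left to the caller.

```kotlin
// Decides when a playback position is worth persisting.
class ResumePositionThrottle(private val minIntervalMs: Long = 5_000) {
    private var lastSavedAtMs: Long? = null

    // Returns true if the caller should write the current position to storage now.
    // Pass force = true on pause/stop so the final position is never skipped.
    fun shouldSave(nowMs: Long, force: Boolean = false): Boolean {
        val last = lastSavedAtMs
        if (force || last == null || nowMs - last >= minIntervalMs) {
            lastSavedAtMs = nowMs
            return true
        }
        return false
    }
}

// Builds the resume chip label as minutes:seconds, or returns null when the
// stored position is too early to be worth offering.
fun resumeLabel(positionMs: Long, thresholdMs: Long = 60_000): String? {
    if (positionMs <= thresholdMs) return null
    val totalSec = positionMs / 1000
    return "Resume from %d:%02d".format(totalSec / 60, totalSec % 60)
}
```

A position listener can call `shouldSave` on every tick and only touch the database when it returns true.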
ExoPlayer with MediaSessionService handles audio focus automatically when you call player.setAudioAttributes(AudioAttributes.DEFAULT, /* handleAudioFocus= */ true). When a phone call arrives: Android sends an audio focus loss event and ExoPlayer pauses automatically. When the call ends: ExoPlayer resumes if it was playing (not paused by the user). For transient interruptions (navigation prompt): ExoPlayer ducks (lowers volume temporarily) and restores after the prompt. You don't need to write audio focus logic manually; passing true for handleAudioFocus delegates it to ExoPlayer's internal AudioFocusManager.
Surface detach (playerView.player = null) disconnects the rendering surface from ExoPlayer. ExoPlayer keeps playing and buffering, but video frames aren't rendered anywhere; only audio output continues. This is what you do in onStop when the app goes to background. Player release (player.release()) completely destroys the player: releases the audio/video decoder, closes the MediaDrm session, stops all background threads, and frees memory. This is done in ViewModel.onCleared(), only when the user truly leaves the screen. Calling release in onStop would destroy the buffer on every screen rotation, causing a 1–2 s stall on return.
DASH manifests include subtitle AdaptationSet entries (WebVTT or TTML format). ExoPlayer selects the preferred subtitle language via DefaultTrackSelector.Parameters.setPreferredTextLanguage("en"). To let the user switch: read player.currentTracks.groups and filter for groups of type C.TRACK_TYPE_TEXT. Show available languages in a bottom sheet. On selection, apply the new preference: player.trackSelectionParameters = player.trackSelectionParameters.buildUpon().setPreferredTextLanguage("fr").build(). Subtitles render in the SubtitleView inside PlayerView; style them via playerView.subtitleView?.setStyle(CaptionStyleCompat(...)) to match the app's font and colour scheme.
Time-to-first-frame (TTFF) is measured from player.prepare() to the first Player.STATE_READY callback. Key optimisations: (1) Start at 480p: configure ABR to begin at a lower quality, so the first buffer fills faster; upgrade quality once stable; (2) Manifest CDN: host the .mpd manifest on CDN edge nodes near users (a 50 ms manifest fetch vs 200 ms from origin); (3) Pre-warm: start fetching the manifest (not segments) as soon as the user taps a thumbnail, before the player screen loads; (4) Reduce initial buffer: set bufferForPlaybackMs = 1500 to start playback after 1.5 s buffered rather than 15 s. Track TTFF as a P50/P95 metric in production analytics.
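For the P50/P95 tracking mentioned above, a nearest-rank percentile over collected TTFF samples is one simple option. This is a sketch under that assumption; production analytics backends often use approximate or interpolated percentiles instead, and `percentile` is a hypothetical helper name.

```kotlin
import kotlin.math.ceil

// Nearest-rank percentile over a list of TTFF samples in milliseconds.
fun percentile(samplesMs: List<Long>, p: Double): Long {
    require(samplesMs.isNotEmpty()) { "need at least one sample" }
    require(p in 0.0..100.0) { "p must be between 0 and 100" }
    val sorted = samplesMs.sorted()
    // Rank is 1-based; clamp so p = 0 maps to the smallest sample.
    val rank = ceil(p / 100.0 * sorted.size).toInt().coerceIn(1, sorted.size)
    return sorted[rank - 1]
}
```

P95 is the more useful alerting metric here because it surfaces the slow-start sessions that a median hides.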
ExoPlayer supports playback speed natively: player.setPlaybackParameters(PlaybackParameters(1.5f)). This applies to both audio (pitch-corrected by default using SonicAudioProcessor so voices don't sound like chipmunks) and video. Show a speed picker in the player menu. Persist the user's preferred speed in SharedPreferences and restore it on each video. Note that at 2x, ABR needs more bandwidth to keep up with consumption, so you may want to pre-buffer a larger window or force a slightly lower quality rendition at high speeds.
Hold a wake lock during playback so the CPU does not sleep and the network does not disconnect mid-stream while the screen is off. ExoPlayer can do this automatically: player.setWakeMode(C.WAKE_MODE_NETWORK) holds both a partial WakeLock (CPU awake) and a WifiLock (Wi-Fi stays connected) while playing. When the player pauses or releases, the locks are released automatically. Without this, Android may drop the Wi-Fi connection during playback on some devices, causing a network error. Declare the WAKE_LOCK permission in the manifest.
Use ViewPager2 with one Fragment per video. Maintain a pool of 2–3 ExoPlayer instances (not one per page, which is expensive). When a page becomes visible: assign a player to it, set media, call prepare() and play(). When a page is offscreen by 1 position: pause and queue for release. Pre-warm the next video by calling player.setMediaItem(nextItem); player.prepare() while the current video plays; ExoPlayer buffers it in the background. Limit buffer to 20 s (maxBufferMs=20_000) since short videos are often <60 s total. This gives near-instant playback on swipe without loading all videos into memory.
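One way to sketch the pooling idea is a small generic pool keyed by page index that recycles the player bound to the page furthest from the current one. `PlayerPool` is a hypothetical class; it is generic so the bookkeeping can be shown without Android types, and the caller is assumed to stop and reconfigure a recycled player before reuse.

```kotlin
import kotlin.math.abs

// Hands out at most `size` players, recycling the one furthest from the current page.
class PlayerPool<P : Any>(private val size: Int, private val create: () -> P) {
    private val assigned = LinkedHashMap<Int, P>() // page index -> player

    init {
        require(size > 0) { "pool needs at least one player" }
    }

    fun acquire(page: Int, currentPage: Int): P {
        assigned[page]?.let { return it }
        if (assigned.size < size) return create().also { assigned[page] = it }
        // Pool is full: take the player bound to the page furthest from the current one.
        val victim = assigned.keys.maxByOrNull { abs(it - currentPage) }!!
        val player = assigned.remove(victim)!!
        assigned[page] = player
        return player
    }

    fun assignedPages(): Set<Int> = assigned.keys.toSet()
}
```

With a pool of two, swiping from page 0 to page 1 and pre-warming page 2 reuses the player that was on page 0, so no third instance is ever created.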
Implement AnalyticsListener and register via player.addAnalyticsListener(myListener). Track: (1) onLoadStarted/onLoadCompleted: measure segment download time; (2) onPlaybackStateChanged: when state changes to STATE_BUFFERING, record timestamp; when back to STATE_READY, emit a rebuffer_event with duration; (3) onVideoSizeChanged: log quality switches; (4) onPlayerError: log error type and segment URL. Batch these events and send to your analytics backend every 30 s during playback and on stop. Key metrics to dashboard: stall rate (stall_seconds / total_play_seconds), TTFF P95, quality distribution, abandonment rate (quit before first frame).
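The rebuffer bookkeeping in step (2) can be sketched as a small state tracker that an AnalyticsListener would feed. `StallTracker` and its `PlaybackState` enum are hypothetical stand-ins for the player's state constants. The sketch treats buffering before the first READY as startup time rather than a stall, and it does not distinguish seek-induced buffering, which a real implementation would likely want to exclude.

```kotlin
enum class PlaybackState { BUFFERING, READY, ENDED }

class StallTracker {
    private var bufferingSinceMs: Long? = null
    private var firstFrameSeen = false

    var stallMs = 0L
        private set
    var rebufferCount = 0
        private set

    fun onStateChanged(state: PlaybackState, nowMs: Long) {
        if (state == PlaybackState.BUFFERING) {
            if (bufferingSinceMs == null) bufferingSinceMs = nowMs
            return
        }
        val since = bufferingSinceMs
        // Initial buffering before the first frame counts toward TTFF, not stalls.
        if (since != null && firstFrameSeen) {
            stallMs += nowMs - since
            rebufferCount++
        }
        bufferingSinceMs = null
        if (state == PlaybackState.READY) firstFrameSeen = true
    }

    // stall_seconds / total_play_seconds, expressed with millisecond inputs.
    fun stallRate(totalPlayMs: Long): Double =
        if (totalPlayMs <= 0) 0.0 else stallMs.toDouble() / totalPlayMs
}
```

A 500 ms rebuffer during 100 s of playback gives a stall rate of 0.5%, right at the non-functional target stated earlier.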
Two layers of expiry: (1) DRM license expiry: the Widevine offline license has a built-in playback_duration_seconds and license_duration_seconds, set by the license server at download time. After expiry, ExoPlayer can't decrypt the segments even if they exist on disk; (2) Client-side enforcement: store expiresAt in Room's downloads table. On the Downloads screen, check expiry before showing the play button. Run a nightly WorkManager job that deletes expired download files and calls DownloadManager.removeDownload(downloadId). Show a countdown timer for rentals approaching expiry. On rental purchase, the server returns the expiry timestamp along with the manifest URL.
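The client-side layer can be sketched as plain functions over the stored metadata. `DownloadRecord` is a hypothetical shape for a row in the downloads table, with the expiry stored as epoch milliseconds; the cleanup job itself and the DownloadManager call are left out.

```kotlin
// Hypothetical row from the downloads table.
data class DownloadRecord(val videoId: String, val path: String, val expiresAtMs: Long)

// True while the rental or purchase window is still open.
fun isPlayable(record: DownloadRecord, nowMs: Long): Boolean = nowMs < record.expiresAtMs

// Time left for the countdown timer, never negative.
fun remainingMs(record: DownloadRecord, nowMs: Long): Long =
    (record.expiresAtMs - nowMs).coerceAtLeast(0L)

// Splits records into (still valid, expired) so a cleanup job knows what to delete.
fun partitionByExpiry(
    records: List<DownloadRecord>,
    nowMs: Long,
): Pair<List<DownloadRecord>, List<DownloadRecord>> =
    records.partition { isPlayable(it, nowMs) }
```

The Downloads screen would use `isPlayable` to gate the play button, while the nightly job iterates over the expired half of the partition.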
ExoPlayer's DefaultLoadErrorHandlingPolicy retries failed segment downloads with a delay that grows with each failed attempt up to a cap (the default policy increases it linearly rather than exponentially). You can configure the retry count and whether to retry on 5xx vs 4xx errors. For CDN failover: provide a DataSource.Factory that tries the primary CDN URL first and falls back to a secondary origin URL on 4xx/5xx. This is done via a custom DataSource that catches HttpDataSourceException and rewrites the URL to the fallback host. ExoPlayer does not re-fetch the manifest on segment failure; implement LoadErrorHandlingPolicy.getFallbackSelectionFor to switch the CDN base URL in the manifest for subsequent requests.
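The URL-rewriting step of the failover can be sketched with the JVM's `java.net.URI`. `withFallbackHost` and `shouldFailOver` are hypothetical helpers that a custom DataSource could call, and the example hosts used with them are placeholders.

```kotlin
import java.net.URI

// Rewrites only the host of a segment URL, keeping scheme, port, path, and query intact.
fun withFallbackHost(url: String, fallbackHost: String): String {
    val u = URI(url)
    return URI(u.scheme, u.userInfo, fallbackHost, u.port, u.path, u.query, u.fragment).toString()
}

// Fail over on the 4xx/5xx responses described above.
fun shouldFailOver(httpStatus: Int): Boolean = httpStatus in 400..599
```

For example, a request to `https://cdn-a.example.com/v/720p/seg_12.m4s?token=abc` that returns 503 would be retried against `cdn-b.example.com` with the same path and token.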
ExoPlayer provides a TestExoPlayerBuilder and fake FakeMediaSource for unit testing without actual media. For integration tests: (1) Use MockWebServer to serve a real DASH manifest and fake segment responses; (2) Build an ExoPlayer with a custom FakeClock (to control time in buffer calculations); (3) Use player.addListener with a CountDownLatch waiting for STATE_READY; (4) Assert player.currentPosition, playbackState, and videoFormat?.height. For ABR testing: throttle MockWebServer's response rate and assert the player switched to a lower rendition. For DRM tests: mock the license server endpoint and verify the correct init data is sent.