Coroutines & Flow
How Kotlin conquered Android concurrency โ from suspending functions and structured lifecycles to cold streams, hot state, and back-pressure.
The problem with threads
Before coroutines, Android concurrency was a battlefield. You had AsyncTask โ a class so error-prone it was deprecated in API 30 with no fanfare, just a quiet acknowledgment that it had caused enough damage. You had raw Threads, which you could create freely but couldn't cancel, coordinate, or automatically clean up. You had callbacks nested inside callbacks nested inside callbacks โ what the community affectionately called "callback hell." And every time you wanted to do something after a network call, you'd write code like this:
This code is hard to read, nearly impossible to handle errors in cleanly, and leaks memory if the user navigates away before all three calls complete. The callback for fetchProducts holds a reference to the enclosing Activity โ which might have been destroyed by the time it fires.
Threads themselves are expensive. Creating a thread allocates roughly 512KBโ1MB of stack memory. Android devices, especially mid-range ones, can't sustain hundreds of concurrent threads. And once you create a thread, you own it โ if you forget to stop it, it keeps running long after the work is done, burning CPU and battery.
Kotlin coroutines solve all of this. But to understand how, you first need to understand what a coroutine actually is.
What is a coroutine?
A coroutine is a unit of work that can be suspended and resumed without blocking a thread. When it hits a waiting operation โ a network call, a disk read, a timer โ it suspends, releases the thread it was running on, and lets that thread pick up other work. When the operation completes, the coroutine resumes โ picking up exactly where it left off, on the same or a different thread.
The important distinction is between blocking and suspending. When a thread blocks, it stops doing anything โ it just sits there consuming OS resources while waiting. When a coroutine suspends, the thread is freed entirely. Ten thousand coroutines can be simultaneously waiting on network responses while sharing just a handful of threads, because none of them are blocking while they wait.
This is what the suspend keyword means in Kotlin. Marking a function suspend tells the compiler: "this function may pause execution at some point." The compiler then transforms it into a state machine that can be paused and resumed. To the developer writing it, it reads like normal, sequential, synchronous code โ because it is. There are no callbacks, no .then() chains, no thread management. The concurrency is handled for you.
Compare this to the callback version at the top. Same logic, same three calls โ but here you can read it top to bottom, handle errors in one place, and return a value naturally. A suspend function behaves exactly like a regular function from the caller's perspective. The only difference is that it can pause โ and when it does, it does so without wasting a thread.
How coroutines work under the hood
Writing suspend fun feels magical โ the code looks synchronous but somehow doesn't block a thread. Most articles stop at the analogy. This section goes deeper: what the Kotlin compiler actually generates, what a Continuation is, and the exact moment a thread gets freed.
The Continuation โ the secret extra parameter
Every suspend function you write gets silently transformed by the Kotlin compiler. The transformation is called Continuation-Passing Style (CPS). The rule is simple: every suspend fun gets an extra parameter added at the end โ a Continuation.
What is a Continuation? It's an interface with one method:
Think of a Continuation as a packaged-up "what to do next." When a coroutine suspends, it takes everything it needs to resume later โ its local variables, which line to return to โ and bundles it into a heap-allocated object. The call stack is replaced by a chain of Continuation objects on the heap. That's how a coroutine can be suspended without occupying a thread โ there's no stack frame holding memory. The coroutine's state costs a few hundred bytes on the heap, not 512KBโ2MB of thread stack.
Imagine you're reading a book and get interrupted. You don't freeze in place holding the book open. You fold a bookmark into the page, put the book on the shelf, and go do something else. When you come back, you pick up the book, open to the bookmark, and continue reading exactly where you left off.
The Continuation is the bookmark. "Which page" (current state label), "what I was holding" (local variables saved as fields), and "what to do when I'm done" (the caller's continuation) are all encoded in it. The thread is the reader โ free to do other things while the bookmark waits on the shelf.
The state machine the compiler generates
CPS alone isn't the whole story. A function with multiple suspension points needs to remember which suspension point it's at when it resumes. The compiler transforms the function body into a state machine โ a when block with a numbered label that tracks progress. Each suspension point gets its own state number. Let's trace through a real example:
The critical lines are the two if (res == COROUTINE_SUSPENDED) return exits. That return unwinds the entire call stack โ every frame between this state machine and the thread's event loop returns. The thread is now free. The state machine object, with label = 1 and user still null, sits quietly on the heap consuming a few hundred bytes.
When fetchUser eventually completes on a background thread, it calls this.resumeWith(Result.success(user)). The Dispatcher intercepts this, wraps it in a Runnable, and posts it to the appropriate thread. That thread calls resumeWith() again โ this time label == 1, so the state machine picks up exactly where it left off.
suspendCancellableCoroutine โ the lowest level primitive
suspendCancellableCoroutine is the function that actually performs a suspension. Everything else โ delay(), Room queries, Retrofit calls โ ultimately calls this. It gives you the raw Continuation and lets you store it somewhere. When the async work completes, you call resume() on it.
This is the pattern for every async integration. Firebase, Bluetooth, Camera, Play Services โ any callback-based API can be made suspendable with this same skeleton:
How dispatchers schedule the resumed work
When continuation.resume(value) is called โ say, by the timer thread after a delay() โ the coroutine doesn't jump back to executing immediately. The coroutine's CoroutineContext contains a Dispatcher, and the dispatcher decides when and where to run the resumed work. For Dispatchers.Main on Android, it posts a Runnable to the main thread's Looper. For Dispatchers.IO, it submits the work to a shared thread pool.
Why 100,000 coroutines is fine but 100,000 threads crashes everything
Now the numbers make visceral sense. A suspended coroutine = one state machine object on the heap = ~200โ400 bytes. A thread = 512KBโ2MB of OS-allocated stack memory + kernel thread object. You can have a million suspended coroutines on a device that can sustain maybe 500 threads before the OS starts struggling.
A running coroutine occupies a thread. A suspended coroutine occupies only a heap object. The Dispatcher is just a queue of "continuations waiting to run." When your IO thread pool is full, new coroutines don't create new threads โ their continuations queue up and run as threads become available. This is why coroutines scale โ the number of actual OS threads stays small and bounded, while the number of logical concurrent tasks can be enormous.
Starting a coroutine: the three builders
You can't call a suspend function from regular code โ you need to be inside a coroutine to call one. This is by design. Kotlin's coroutine system enforces that someone is responsible for the coroutine's lifecycle. That someone is always a CoroutineScope, and the bridge between regular code and the coroutine world is a coroutine builder.
launch โ fire and forget
launch starts a new coroutine and returns a Job. You use it when you want to kick off some work and you don't need a result back. The calling code continues immediately โ launch doesn't wait for the coroutine to finish.
async โ when you need a result
async starts a coroutine and returns a Deferred<T> โ think of it as a promise of a future value. You call await() to get the result, which suspends the calling coroutine until the result is ready. The real power of async is running things in parallel.
runBlocking โ the testing bridge
runBlocking starts a coroutine and blocks the current thread until it completes. It's the escape hatch that lets regular code (like a unit test's main function or a test body) enter the coroutine world. In production code, you should almost never use it โ if you're blocking a thread, you've defeated the purpose of coroutines.
Calling runBlocking on the main thread in production code will freeze your UI. It's for tests and main functions only. In a ViewModel or repository, always use launch or async within an appropriate scope.
Dispatchers โ where your coroutine runs
When you launch a coroutine, you need to tell Kotlin which thread or thread pool it should run on. This is what Dispatchers are for. Think of a Dispatcher as a job board โ it decides which worker (thread) picks up a piece of work.
| Dispatcher | Thread Pool | Use For | Never Use For |
|---|---|---|---|
| Main | Single main thread | UI updates, ViewModel logic, anything touching Views | Network calls, disk I/O, heavy computation |
| IO | Up to 64 threads (or CPU count, whichever is higher) | Network requests, file reads/writes, Room DB queries | CPU-heavy work (use Default instead) |
| Default | CPU core count threads | JSON parsing, image decoding, list sorting, heavy computation | Blocking I/O (wastes CPU threads waiting) |
| Unconfined | Inherits from caller, then resumes wherever | Testing, very specific library use-cases | Production application code |
A crucial point: Room, Retrofit (with suspend functions), and Kotlin's own IO functions are already cooperative โ they suspend the coroutine without occupying a thread. When Room executes a query, it uses Dispatchers.IO internally and suspends your coroutine while the disk read is in progress. You don't need to manually switch dispatchers for suspend-aware libraries.
Structured Concurrency โ the rule that prevents leaks
This is the concept that separates developers who use coroutines from those who understand them. Structured concurrency is a simple rule with profound consequences:
Every coroutine must exist within a scope. When the scope is cancelled, all coroutines within it are cancelled automatically.
Imagine a project manager who starts sub-tasks. If the project manager quits (scope is cancelled), every sub-task they started is automatically cancelled โ nobody keeps working on a cancelled project. There are no orphaned tasks running in the background burning resources with no one checking on them.
Without structured concurrency, you'd have to manually track every coroutine you ever started and remember to cancel them all. You'd always forget one. That forgotten coroutine would silently leak memory โ holding a reference to your Activity or Fragment long after it was destroyed.
The three scopes you'll use every day
viewModelScope is tied to a ViewModel's lifecycle. When the ViewModel is cleared (which happens when the screen it belongs to is permanently gone), all coroutines in viewModelScope are automatically cancelled. This is the scope you'll use for 90% of coroutines in a well-architected app.
lifecycleScope is tied to a Fragment or Activity's lifecycle. It's useful for UI-layer work โ observing Flows, starting one-shot operations that should stop if the user navigates away. Be careful: it's also cancelled on rotation, so don't use it for work you want to survive configuration changes (use viewModelScope for that instead).
applicationScope / custom CoroutineScope is for work that should outlive any single screen โ uploading a photo, syncing in the background, an analytics flush. You create this manually and tie it to your Application class or a singleton service.
Job vs SupervisorJob โ failure propagation
Every coroutine is associated with a Job. Jobs form a tree โ a parent job and its children. How a failure in one child affects the tree depends on whether you're using a regular Job or a SupervisorJob.
With a regular Job, think of a team of mountaineers roped together. If one falls, they all fall. One coroutine throws an uncaught exception โ the whole scope is cancelled โ all sibling coroutines are cancelled too.
With a SupervisorJob, each mountaineer has their own safety harness. If one falls, the others continue climbing. One coroutine fails โ only that coroutine is affected โ siblings keep running.
This is why viewModelScope uses a SupervisorJob under the hood. If you have a ViewModel that loads a user profile and also tracks analytics, you don't want the analytics coroutine to cancel the profile-loading coroutine if it throws an exception.
Cancellation โ cooperative, not forceful
Coroutine cancellation is cooperative. Cancelling a coroutine doesn't immediately stop it โ it sends a signal that the coroutine should stop. The coroutine's code is responsible for checking this signal and stopping itself. This is a crucial distinction.
When a coroutine is cancelled, any subsequent call to a suspend function will throw a CancellationException. This exception is special โ it's silently ignored by the coroutine machinery (it's not an error, it's a controlled shutdown). If you catch Exception in your coroutine, make sure you don't swallow CancellationException:
What is Flow?
A suspend function returns one value. But what if you need multiple values over time โ a stream of database updates, a sequence of network pages, live UI events? That's what Flow is for.
A regular suspend function is like ordering a pizza โ you place the order, you wait, you get one pizza. A Flow is like a pizza subscription โ you subscribe once, and pizzas keep arriving over time whenever they're ready.
There's a crucial detail though: a Flow is cold โ the subscription doesn't exist until a collector asks for it. The pizzeria only starts making pizzas when someone subscribes. If nobody subscribes, no pizzas are made. This matters for understanding when your code actually runs.
Cold means the code inside a flow { } builder doesn't run until you call collect(). Every new call to collect() starts a fresh execution from the beginning. This is fundamentally different from the hot streams we'll see with StateFlow and SharedFlow.
Flow operators โ building pipelines
The real power of Flow comes from its operators โ functions that transform, filter, combine, and reshape streams. They're lazy: an operator doesn't process a value until a collector downstream asks for it. This creates a pipeline where data flows only when consumed.
Pay close attention to flatMapLatest โ it's one of the most important operators for real-world use. When a new value arrives, it cancels any ongoing collection of the previous inner flow and starts collecting the new one. For a search box, this means: if the user types "cor", starts a search, then immediately types "coro" โ the "cor" search is cancelled and "coro" takes over. You never see stale results from a superseded query.
Combining multiple flows
combine and zip merge two flows into one, but they behave differently:
In most UI use-cases, combine is what you want โ you want the UI to re-render whenever any of its data sources update. zip is for one-to-one processing, like pairing events with timestamps.
StateFlow & SharedFlow โ hot streams
Remember: regular Flow is cold โ it only runs when collected, and each collector gets an independent stream. But for UI state, you want something different: a value that exists and updates regardless of whether anyone is currently collecting it, and that new collectors can immediately get the current value from. That's what hot streams are for.
StateFlow โ the UI state container
A StateFlow is like a scoreboard at a sports game. The score exists and updates whether anyone is looking at it or not. When you sit down and look at the scoreboard for the first time, you see the current score โ you don't have to wait for the next goal. Everyone looking at the scoreboard always sees the same current value.
StateFlow is the modern replacement for LiveData. It always has a value, collectors always get the latest value immediately on subscription, and it only emits when the value actually changes (it skips duplicate values with structural equality).
Notice repeatOnLifecycle(STARTED). This is critical โ it stops collecting when the app goes to the background (Fragment is stopped) and restarts when it comes back to the foreground. Without it, you'd be processing UI updates and doing unnecessary work while your app is in the background.
SharedFlow โ for events
SharedFlow is a more flexible hot stream. Unlike StateFlow, it doesn't require an initial value and can replay multiple past values to new subscribers. This makes it perfect for one-shot events: showing a snackbar, navigating to a screen, displaying a dialog.
StateFlow: use when you're modeling state โ something that has a current value that persists over time (isLoading, user data, list contents). New subscribers should always get the current value.
SharedFlow: use when you're modeling events โ one-shot things that happen and shouldn't be replayed to late subscribers (navigation, snackbars, toasts). A "show snackbar" event collected twice would show two snackbars.
Channel โ the coroutine mailbox
A Channel is a communication primitive for sending values between coroutines. Think of it as a pipe: one coroutine sends values in one end, another coroutine receives them from the other end. Unlike SharedFlow, a channel item is consumed by exactly one receiver โ it's not broadcast to multiple collectors.
A Channel is a mailbox between two people. You can drop letters in it (send), and the recipient picks them up one by one (receive). Each letter is picked up by exactly one person โ it's not photocopied for everyone. And if the mailbox is full (buffer is full), the sender either waits, drops the letter, or drops the oldest letter, depending on the overflow policy.
Channels are particularly useful when you have a producer-consumer relationship: one part of your code generates work (events, tasks, data), and another part processes it. The channel buffers work between them and naturally handles back-pressure.
Production patterns you'll use every day
Observing Room as a Flow
Room's DAO supports returning Flow<T> from queries. This creates a live query โ whenever the data in the database changes, Room automatically re-runs the query and emits the new result. Combined with StateFlow in the ViewModel, this creates a fully reactive data pipeline with zero polling.
stateIn with WhileSubscribed(5000) is a pattern worth memorising. The 5-second timeout means: if all subscribers unsubscribe (e.g., because the user rotated the screen), keep the upstream flow alive for 5 seconds. If a new subscriber arrives within that window (the new Activity after rotation), they immediately get the last cached value without re-running the database query. This avoids the flash of an empty state on rotation.
Handling errors in a Flow pipeline
Errors in a Flow cancel the entire flow by default. The catch operator intercepts exceptions and lets you emit a fallback value โ keeping the stream alive rather than terminating it.
Testing coroutines
The biggest challenge when testing coroutines is time. Real code uses delay(), but in tests you don't want to wait real seconds. Kotlin provides TestCoroutineDispatcher (now called UnconfinedTestDispatcher and StandardTestDispatcher in the current API) and the runTest builder that lets you control virtual time.
For testing Flows, the Turbine library by Cash App is the gold standard. It turns the async nature of collecting a flow into a synchronous, readable test API:
How this connects to system design interviews
Every Android system design question you'll face at a senior level involves coroutines and Flow. Understanding them deeply lets you explain your choices with precision rather than hand-waving.
When designing an Analytics SDK, you explain: "The track() function calls channel.trySend() โ non-blocking because the Channel is bounded with DROP_OLDEST overflow. A background coroutine in a SupervisorJob-backed scope drains the channel to Room. The scope is tied to the application lifecycle so it outlives any single Activity."
When designing a Payment System, you explain: "The CheckoutOrchestrator uses a Mutex with tryLock() to prevent double-submission โ if the mutex is already held, the second startCheckout() call returns immediately. The checkout state is a StateFlow<CheckoutState> that the UI layer collects with repeatOnLifecycle(STARTED)."
The pattern is always the same: choose the right primitive (Channel, StateFlow, SharedFlow, Mutex), wire it into the right scope, and explain why each choice is correct. That's what senior-level answers look like.