Design a Networking SDK (like Retrofit)
A system design walkthrough of building a type-safe HTTP client library from scratch — covering annotation-driven API design, the request pipeline, interceptor chain, converters, auth token refresh, caching, and testing architecture.
Before drawing boxes, align with the interviewer on which perspective is being asked. This question comes in two flavours:
- SDK author view — "Design something like Retrofit that other teams at the company can drop in." You are the library builder. Focus: API ergonomics, extension points, and correctness guarantees. This is what this article covers.
- App networking layer view — "Design the networking architecture for our app." You are a consumer. Focus: interceptors, auth, error handling inside one codebase. Much simpler scope.
- Declare API calls as annotated Kotlin interface methods — no manual HTTP code
- Support `GET`, `POST`, `PUT`, `DELETE` with path params, query params, and a request body
- Pluggable serialisation: JSON today, Protobuf tomorrow, no SDK change needed
- Middleware/interceptor chain — auth, logging, retry without touching call sites
- Async-first: suspend functions, `Result<T>`, and `Flow<T>` return types
- Token refresh: transparent re-auth on 401 without callers knowing
- HTTP caching: respect `Cache-Control`, avoid redundant network calls
- Test-friendly: swap real network for a mock with zero production code changes
- WebSocket / SSE (separate transport concern)
- GraphQL query language support
- File download progress / resumable uploads beyond multipart
- Zero reflection at steady-state — parse annotations once, cache the result
- R8/ProGuard safe — ideally compile-time codegen rather than runtime proxies for public SDKs
- Thread-safe — multiple coroutines sharing one client must not race
- Minimal dependencies — OkHttp is a default engine, but it should be swappable
- Testable in isolation — every layer independently unit-testable
Nine building blocks compose the entire SDK. Understanding what each one owns is the foundation of the design.
| Entity | Responsibility | Owned by |
|---|---|---|
| NetworkClient | Builder-configured root object. Creates service proxies. Holds all registered interceptors, converters, and the engine. | SDK caller (singleton) |
| ServiceProxy | JVM dynamic proxy that intercepts method calls on your annotated interface and turns them into HTTP requests. | SDK — created via client.create<T>() |
| ServiceMethod | Parsed, cached representation of one interface method — HTTP verb, URL template, which arg maps to path/query/body. | SDK — cached on first call |
| HttpRequest | Immutable value object: method, full URL, headers, serialised body. Passed through the interceptor chain. | SDK internal |
| HttpResponse | Immutable value object: status code, headers, response body (bytes or stream). | SDK internal |
| Interceptor | Single-method interface. Can read/mutate the request before forwarding, and read/mutate the response on the way back. | SDK + caller custom |
| Converter | Serialises a Kotlin object → RequestBody (outbound) or ResponseBody → Kotlin object (inbound). | SDK + caller (e.g. JSON factory) |
| CallAdapter | Bridges a raw HttpResponse to whatever return type the method declares: plain T, Result<T>, or Flow<T>. | SDK built-in |
| HttpEngine | Single-method interface: execute(HttpRequest) → HttpResponse. OkHttp in production; MockEngine in tests. | SDK interface, OkHttp default |
The SDK has four distinct layers. The caller only ever touches the top layer; everything else is an implementation detail they never see.
Key design principle: Each layer depends only on the layer below it via a stable interface. The caller never imports OkHttp. The pipeline never imports OkHttp. Only the engine layer does. This is what makes the engine swappable and the entire stack testable.
The best SDK feels like magic to the caller. They write a plain Kotlin interface with annotations and get a fully working HTTP client back. The guiding principle is zero boilerplate — no request builders, no callback hell, no serialisation glue in application code.
An annotated interface declares what to call, not how to call it. The SDK generates all the wiring at runtime (or compile time with KSP).
| Annotation | Where used | Purpose |
|---|---|---|
| @GET, @POST, @PUT, @DELETE | Function | HTTP method + path template e.g. "/users/{id}" |
| @Path("id") | Parameter | Replaces {id} placeholder in the URL; URL-encoded automatically |
| @Query("page") | Parameter | Appended as ?page=value; null values are skipped |
| @Body | Parameter | Serialised to request body via the active Converter |
| @Header("X-Trace") | Parameter | Added as a per-call request header |
| @Headers("Accept: ...") | Function | Static headers added to every call of this method |
| @Multipart | Function | Switches body encoding to multipart/form-data |
| @Streaming | Function | Response body streamed rather than buffered, for large downloads |
| Return type | Behaviour | Best for |
|---|---|---|
| suspend fun … : T | Throws HttpException on non-2xx. Deserialises body to T. | Simple happy-path calls in a try/catch or runCatching |
| suspend fun … : Result<T> | Never throws. Success or failure wrapped in Result. | ViewModel / UI layer that handles errors with fold |
| suspend fun … : Unit | Discards body. Throws on non-2xx. Typical for DELETE / fire-and-forget POST. | Mutations where the response body is irrelevant |
| fun … : Flow<T> | Wraps repeated polling calls in a Flow that emits on each interval. | Live-updating data (status polling, score tickers) |
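To make the two tables concrete, here is a minimal sketch of what a caller-facing interface could look like. The annotation classes are declared locally so the snippet compiles on its own; their names follow the tables above, but the exact declarations, `User`, `CreateUserRequest`, and `UserApi` are illustrative assumptions rather than a fixed SDK API.

```kotlin
// Hypothetical annotation declarations mirroring the annotation table.
// Kotlin annotations default to RUNTIME retention, so a proxy can read them reflectively.
@Target(AnnotationTarget.FUNCTION)
annotation class GET(val path: String)

@Target(AnnotationTarget.FUNCTION)
annotation class POST(val path: String)

@Target(AnnotationTarget.FUNCTION)
annotation class DELETE(val path: String)

@Target(AnnotationTarget.VALUE_PARAMETER)
annotation class Path(val name: String)

@Target(AnnotationTarget.VALUE_PARAMETER)
annotation class Query(val name: String)

@Target(AnnotationTarget.VALUE_PARAMETER)
annotation class Body

// Illustrative payload types.
data class User(val id: String, val name: String)
data class CreateUserRequest(val name: String)

// The interface declares what to call; the SDK supplies how.
interface UserApi {
    // Plain T: throws on non-2xx.
    @GET("/users/{id}")
    suspend fun getUser(@Path("id") id: String): User

    // Result<T>: never throws; a null page is skipped from the query string.
    @GET("/users")
    suspend fun listUsers(@Query("page") page: Int?): Result<List<User>>

    // Body is serialised by the active Converter.
    @POST("/users")
    suspend fun createUser(@Body request: CreateUserRequest): User

    // Unit: response body is discarded.
    @DELETE("/users/{id}")
    suspend fun deleteUser(@Path("id") id: String)
}
```

Nothing in the interface mentions OkHttp, JSON, or threading, which is the point: all of that is decided once on the client builder.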
When the caller invokes userApi.getUser("u1"), the SDK runs through a deterministic sequence of steps before the network is ever touched — and another sequence on the way back. This is the core of the LLD.
Why cache ServiceMethod? Parsing annotations with reflection is expensive (~1ms). The ConcurrentHashMap<Method, ServiceMethod> cache means that cost is paid exactly once per method across the entire app lifetime, no matter how many calls are made.
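The parse-once, cache-forever behaviour can be sketched with a JVM dynamic proxy and a `ConcurrentHashMap`. This is a simplified illustration under several assumptions: only `@GET`, `@Path`, and `@Query` are handled, methods are blocking and return the raw body as a `String`, the engine is a plain lambda, and `parseCount` exists only to show that parsing happens once. Calls to `toString()`/`hashCode()` on the proxy are not handled here.

```kotlin
import java.lang.reflect.InvocationHandler
import java.lang.reflect.Method
import java.lang.reflect.Proxy
import java.net.URLEncoder
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

annotation class GET(val path: String)
annotation class Path(val name: String)
annotation class Query(val name: String)

data class HttpRequest(val method: String, val url: String)

// Parsed form of one interface method: verb, URL template, and the role of each argument.
class ServiceMethod(
    private val verb: String,
    private val template: String,
    private val params: List<Annotation?>,
) {
    fun toRequest(baseUrl: String, args: Array<out Any?>): HttpRequest {
        var path = template
        val query = mutableListOf<String>()
        for ((i, ann) in params.withIndex()) {
            val value = args[i]
            when (ann) {
                is Path -> path = path.replace("{${ann.name}}", encode(value.toString()))
                is Query -> if (value != null) query += "${ann.name}=${encode(value.toString())}"
                else -> Unit
            }
        }
        val suffix = if (query.isEmpty()) "" else query.joinToString("&", prefix = "?")
        return HttpRequest(verb, baseUrl + path + suffix)
    }

    // Form-style encoding (space becomes '+'); a real SDK would encode path segments separately.
    private fun encode(s: String): String = URLEncoder.encode(s, "UTF-8")
}

val parseCount = AtomicInteger(0) // demonstration only: counts reflective parses

fun parse(method: Method): ServiceMethod {
    parseCount.incrementAndGet()
    val get = method.getAnnotation(GET::class.java)
        ?: error("${method.name} has no HTTP verb annotation")
    val params = method.parameterAnnotations.map { anns ->
        anns.firstOrNull { it is Path || it is Query }
    }
    return ServiceMethod("GET", get.path, params)
}

class NetworkClient(private val baseUrl: String, private val engine: (HttpRequest) -> String) {
    private val cache = ConcurrentHashMap<Method, ServiceMethod>()

    inline fun <reified T : Any> create(): T = create(T::class.java)

    fun <T : Any> create(service: Class<T>): T {
        val handler = InvocationHandler { _, method, args ->
            // computeIfAbsent runs the reflective parse at most once per Method.
            val parsed = cache.computeIfAbsent(method) { parse(it) }
            engine(parsed.toRequest(baseUrl, args ?: emptyArray<Any?>()))
        }
        return service.cast(Proxy.newProxyInstance(service.classLoader, arrayOf(service), handler))
    }
}

// Illustrative service interface.
interface UserApi {
    @GET("/users/{id}")
    fun getUser(@Path("id") id: String, @Query("fields") fields: String?): String
}
```

Calling `getUser` twice on the same proxy builds two requests but parses the annotations only once; the second call is a map lookup plus string substitution.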
The interceptor chain is the SDK's most powerful extension point. It follows the Chain of Responsibility pattern: each interceptor receives the request, can modify it, calls chain.proceed() to pass it forward, and then sees the response on the way back. This symmetric in/out design is what makes auth, logging, retry, and caching all implementable without touching each other.
- Auth first — it needs to see the raw request before any other modification, and it needs to retry after token refresh with a fresh token
- Cache second — it can short-circuit and return without ever reaching the engine; it must see the authenticated request (some caches are user-scoped)
- Retry before Logging — so logging records each individual attempt, not just the final outcome
- Logging last before engine — it sees the final, fully-decorated request as it will actually be sent
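The chain mechanics described above fit in a few lines. The sketch below uses blocking signatures to stay dependency-free (the SDK in this article would make `intercept` and `proceed` suspend functions); `RealChain`, `runChain`, and the two sample interceptors are illustrative names, not a prescribed API.

```kotlin
data class HttpRequest(val url: String, val headers: Map<String, String> = emptyMap())
data class HttpResponse(val code: Int, val body: String = "")

fun interface HttpEngine {
    fun execute(request: HttpRequest): HttpResponse
}

fun interface Interceptor {
    fun intercept(chain: Chain): HttpResponse

    interface Chain {
        val request: HttpRequest
        fun proceed(request: HttpRequest): HttpResponse
    }
}

// Each chain instance knows its position; proceed() hands off to the next
// interceptor, or to the engine once the list is exhausted.
class RealChain(
    private val interceptors: List<Interceptor>,
    private val index: Int,
    override val request: HttpRequest,
    private val engine: HttpEngine,
) : Interceptor.Chain {
    override fun proceed(request: HttpRequest): HttpResponse =
        if (index == interceptors.size) engine.execute(request)
        else interceptors[index].intercept(RealChain(interceptors, index + 1, request, engine))
}

fun runChain(interceptors: List<Interceptor>, engine: HttpEngine, request: HttpRequest): HttpResponse =
    RealChain(interceptors, 0, request, engine).proceed(request)

// Outermost: decorates the request before anything downstream sees it.
class AuthInterceptor(private val token: () -> String) : Interceptor {
    override fun intercept(chain: Interceptor.Chain): HttpResponse {
        val headers = chain.request.headers + ("Authorization" to "Bearer ${token()}")
        return chain.proceed(chain.request.copy(headers = headers))
    }
}

// Innermost: sees the request exactly as the engine will receive it.
class LoggingInterceptor(private val log: (String) -> Unit) : Interceptor {
    override fun intercept(chain: Interceptor.Chain): HttpResponse {
        val hasAuth = "Authorization" in chain.request.headers
        log("--> ${chain.request.url} auth=$hasAuth")
        val response = chain.proceed(chain.request)
        log("<-- ${response.code}")
        return response
    }
}
```

With `listOf(AuthInterceptor, LoggingInterceptor)` the log line reports `auth=true`; reversing the list would log the request before the header is attached, which is the ordering mistake the list above warns against.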
Interview trap: Interviewers often ask "what if two requests both get a 401 at the same time?" The naive answer — both refresh the token — causes two token refresh calls, and one of them will fail because the refresh token gets invalidated. The correct answer is a Mutex inside TokenManager.refreshToken() — see Section 8.
Converters are the answer to "how does the SDK avoid being coupled to any particular JSON library?" They use the Abstract Factory pattern — the SDK defines the contract, callers plug in implementations, and the registry picks the right one at runtime.
Each Converter.Factory handles two directions: serialising a Kotlin object to a RequestBody (outbound), and deserialising a ResponseBody back to a Kotlin object (inbound). A factory returns null when it cannot handle a given type — the registry moves to the next one.
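A registry with the "return null to pass" contract might look like the sketch below. It keys converters on `KClass` and uses two toy factories (plain text and integers) in place of real JSON or Protobuf libraries; all type names here are illustrative assumptions.

```kotlin
import kotlin.reflect.KClass

interface Converter<T : Any> {
    fun toBody(value: T): ByteArray      // outbound
    fun fromBody(bytes: ByteArray): T    // inbound
}

interface ConverterFactory {
    // Returns null when this factory cannot handle the type.
    fun <T : Any> create(type: KClass<T>): Converter<T>?
}

// Walks factories in registration order and takes the first non-null answer.
class ConverterRegistry(private val factories: List<ConverterFactory>) {
    fun <T : Any> converterFor(type: KClass<T>): Converter<T> =
        factories.firstNotNullOfOrNull { it.create(type) }
            ?: error("No converter registered for ${type.simpleName}")
}

object TextConverter : Converter<String> {
    override fun toBody(value: String): ByteArray = value.toByteArray()
    override fun fromBody(bytes: ByteArray): String = String(bytes)
}

object IntConverter : Converter<Int> {
    override fun toBody(value: Int): ByteArray = value.toString().toByteArray()
    override fun fromBody(bytes: ByteArray): Int = String(bytes).trim().toInt()
}

object PlainTextFactory : ConverterFactory {
    @Suppress("UNCHECKED_CAST")
    override fun <T : Any> create(type: KClass<T>): Converter<T>? =
        if (type == String::class) TextConverter as Converter<T> else null
}

object IntTextFactory : ConverterFactory {
    @Suppress("UNCHECKED_CAST")
    override fun <T : Any> create(type: KClass<T>): Converter<T>? =
        if (type == Int::class) IntConverter as Converter<T> else null
}
```

Because lookup stops at the first factory that answers, a catch-all factory (one that never returns null) should be registered last, otherwise it shadows every more specific factory behind it.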
Token management is where most networking SDKs get it wrong. There are three problems to solve: where to store the token safely, how to attach it transparently, and how to handle concurrent refresh without hitting the auth endpoint multiple times.
| Storage option | Risk | Verdict |
|---|---|---|
| In-memory only | Token lost on process death — user re-logs on every app restart | ❌ Bad UX |
| Plain SharedPreferences | XML file readable by root or via backup exploit | ❌ Insecure |
| EncryptedSharedPreferences | Values encrypted with AES-256-GCM; master key held in Android Keystore (API 23+, hardware-backed where the device supports it) | ✅ Recommended |
| Keystore direct (no value storage) | Keystore stores keys, not arbitrary strings — must pair with EncryptedSharedPreferences | ⚠ Pair with above |
Never use the main NetworkClient for token refresh. The refresh request would go through the AuthInterceptor, which would see a 401 and try to refresh again — infinite loop. Use a separate raw HttpEngine instance with no interceptors for the auth endpoint.
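The concurrent-refresh problem from this section can be sketched with a lock plus a version counter. To keep the example dependency-free it uses a blocking `ReentrantLock` instead of a coroutine `Mutex`; a suspend version would follow the same shape with `kotlinx.coroutines.sync.Mutex` and `withLock`. The `refreshCall` lambda is a stand-in for the request sent through the separate, interceptor-free engine.

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

class TokenManager(
    initialToken: String,
    // Hypothetical hook: calls the auth endpoint on a raw engine with no interceptors.
    private val refreshCall: () -> String,
) {
    private val lock = ReentrantLock()
    private var token: String = initialToken
    private var version: Int = 0

    // Snapshot the token together with its version, so a caller can later
    // report exactly which token the server rejected.
    fun current(): Pair<String, Int> = lock.withLock { token to version }

    // Called after a 401. seenVersion is the version of the rejected token.
    fun refreshToken(seenVersion: Int): String = lock.withLock {
        if (version == seenVersion) {
            // Nobody has refreshed since this caller read the token: do it now.
            token = refreshCall()
            version++
        }
        // Otherwise another caller already refreshed while we waited; reuse its result.
        token
    }
}
```

An auth interceptor would call `current()` before sending, and on a 401 call `refreshToken(seenVersion)` and retry once with the returned token. Three callers that all saw version 0 produce exactly one call to `refreshCall`.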
The cache interceptor implements standard HTTP cache semantics — the same logic browsers use. A cache hit means zero network usage; a conditional request (304) means network metadata only, not body bytes.
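A stripped-down version of that logic, covering only `max-age` freshness and `ETag` revalidation, is sketched below. It takes the next step in the chain as a `proceed` lambda rather than a full chain object, keeps entries in memory instead of on disk, and matches header names case-sensitively; all of these are simplifications, and directives such as `no-store` are ignored.

```kotlin
import java.util.concurrent.ConcurrentHashMap

data class HttpRequest(val url: String, val headers: Map<String, String> = emptyMap())
data class HttpResponse(
    val code: Int,
    val headers: Map<String, String> = emptyMap(),
    val body: String = "",
)

class CacheEntry(val response: HttpResponse, val etag: String?, val expiresAtMs: Long)

class CacheInterceptor(private val clock: () -> Long = { System.currentTimeMillis() }) {
    private val store = ConcurrentHashMap<String, CacheEntry>()

    fun intercept(request: HttpRequest, proceed: (HttpRequest) -> HttpResponse): HttpResponse {
        val entry = store[request.url]

        // Fresh hit: answer from cache, no network at all.
        if (entry != null && clock() < entry.expiresAtMs) return entry.response

        // Stale entry with a validator: send a conditional request.
        val etag = entry?.etag
        val outgoing =
            if (etag != null) request.copy(headers = request.headers + ("If-None-Match" to etag))
            else request

        val response = proceed(outgoing)

        if (response.code == 304 && entry != null) {
            // Not modified: keep the stored body, extend freshness, and hide the 304.
            val ttl = maxAgeMs(response) ?: 0L
            store[request.url] = CacheEntry(entry.response, entry.etag, clock() + ttl)
            return entry.response
        }

        val ttl = maxAgeMs(response)
        if (response.code == 200 && ttl != null) {
            store[request.url] = CacheEntry(response, response.headers["ETag"], clock() + ttl)
        }
        return response
    }

    // Extracts max-age (seconds) from Cache-Control and converts to milliseconds.
    private fun maxAgeMs(response: HttpResponse): Long? =
        response.headers["Cache-Control"]
            ?.split(",")
            ?.map { it.trim() }
            ?.firstOrNull { it.startsWith("max-age=") }
            ?.removePrefix("max-age=")
            ?.toLongOrNull()
            ?.times(1000)
}
```

The injectable `clock` is there so tests can move time forward without sleeping.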
The HttpEngine interface is what makes the entire SDK testable without a real server. In tests you swap in MockEngine — a simple queue of canned responses. No mocking frameworks, no HTTP stubs on a local port, no network flakiness.
What makes this design powerful for testing:
- Interceptors are tested with the real chain, not mocked. A test can verify that `AuthInterceptor` actually attaches a token by checking `mockEngine.recordedRequests[0].headers["Authorization"]`
- Converters are exercised: the mock returns real JSON strings that go through the real `KotlinxJsonConverterFactory`
- Error paths are trivially testable: just enqueue a 401 or 500 response and assert your error handling works
- Latency simulation: `MockEngine` supports a `delayMs` per response for testing loading states and timeouts
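A queue-backed `MockEngine` along these lines is small enough to show in full. The sketch below is blocking (so the delay uses `Thread.sleep`; a suspend engine would use `delay`), and the `HttpRequest`/`HttpResponse` shapes are reduced to what the example needs.

```kotlin
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.CopyOnWriteArrayList

data class HttpRequest(val url: String, val headers: Map<String, String> = emptyMap())
data class HttpResponse(val code: Int, val body: String = "")

fun interface HttpEngine {
    fun execute(request: HttpRequest): HttpResponse
}

class MockEngine : HttpEngine {
    private class Canned(val response: HttpResponse, val delayMs: Long)

    private val queue = ConcurrentLinkedQueue<Canned>()

    // Every request that reached the engine, in order, for assertions.
    val recordedRequests: MutableList<HttpRequest> = CopyOnWriteArrayList()

    fun enqueue(response: HttpResponse, delayMs: Long = 0) {
        queue += Canned(response, delayMs)
    }

    override fun execute(request: HttpRequest): HttpResponse {
        recordedRequests += request
        // Fail loudly if the test forgot to script a response.
        val next = queue.poll() ?: error("MockEngine: no response enqueued for ${request.url}")
        if (next.delayMs > 0) Thread.sleep(next.delayMs)
        return next.response
    }
}
```

In a test, enqueue a 401 followed by a 200 to exercise the token-refresh path, then assert on `recordedRequests` to confirm the retry carried the new `Authorization` header.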
This is a classic "how does it actually work?" follow-up at Staff level. There are two fundamentally different approaches to turning the annotation-decorated interface into working code.
| Dimension | Dynamic Proxy | KSP Codegen |
|---|---|---|
| When errors are caught | Runtime — first method call | Compile time — build fails immediately |
| R8 / ProGuard | Needs keep rules for all annotated interfaces | ✅ Generated code is plain Kotlin, fully shrinkable |
| Reflection cost | Once per method (cached), negligible after | Zero — no reflection at all |
| Supports inline / reified | ❌ proxy can't use them | ✅ generated code can be inlined |
| Build complexity | None | Requires KSP Gradle plugin + processor module |
| KMP (Kotlin Multiplatform) | ❌ JVM-only, iOS/WASM don't have dynamic proxies | ✅ Works on all targets |
| Choose when | Internal tools, prototype, JVM-only app | Public SDK, KMP, aggressive minification |
- Explain the caller-facing API: annotations, interface, `create<T>()`
- Describe dynamic proxy mechanics at a high level
- Name 2–3 interceptors and what they do
- Know the difference between `suspend T` and `Result<T>`
- Understand why `HttpEngine` is an interface (testability)
- Design the full interceptor chain with correct ordering and reasons
- Explain the concurrent 401 race and the Mutex + version counter fix
- Design the `Converter.Factory` registry and explain factory ordering
- Explain `suspendCancellableCoroutine` and OkHttp call cancellation
- Design the caching layer: ETag, 304, TTL, cache-only vs network-only modes
- KSP processor design: symbol resolution, incremental processing, code emission
- Full thread-safety analysis across every SDK class
- SDK versioning: binary compatibility, shim layers, deprecation strategy
- KMP strategy: why dynamic proxy fails on non-JVM targets
- Observability: metrics interceptor, URL pattern normalization for low-cardinality
- Certificate pinning: where it lives, how to rotate pins without a release
How does client.create<UserApi>() actually work at runtime?
It calls Proxy.newProxyInstance() with your interface's class. The JVM generates a synthetic class. Every method call is routed to an InvocationHandler that looks up the cached ServiceMethod, runs RequestBuilder to build the HttpRequest, then dispatches through the interceptor chain via the CallAdapter. Reflection is only paid on the first call; subsequent calls hit the ConcurrentHashMap cache.
Use a Mutex inside TokenManager.refreshToken(). The first coroutine acquires the lock and calls the auth endpoint. The other two suspend at the mutex. To avoid a second refresh when they wake up, use a tokenVersion counter. Before acquiring the lock, each coroutine reads the current version. After acquiring, if the version has changed, someone else already refreshed — return the existing fresh token without hitting the network again.
The ViewModel launches the call inside viewModelScope, which is cancelled in onCleared(). Because every SDK method is suspend, the CancellationException propagates naturally. Inside OkHttpEngine, the OkHttp Call is started with suspendCancellableCoroutine { cont -> cont.invokeOnCancellation { call.cancel() } }, so the in-flight call is actually cancelled and its connection released, not just abandoned.
Auth must go first because it may need to retry the entire request after a token refresh — it wraps the whole downstream chain. Logging goes last (just before the engine) because it should see the final, fully-decorated request exactly as it will be sent over the wire, including the Authorization header added by auth. If logging were first, it would see a request without credentials and give misleading logs.
Implement Converter.Factory and return non-null converters for types that have a Protobuf descriptor. Register it via Builder.addConverter(ProtobufFactory()). The registry walks factories in registration order; your factory claims proto types, the JSON factory claims everything else. Zero SDK changes — this is the Open/Closed principle in action.
Use KSP when: (1) shipping a public SDK where callers may use aggressive R8 rules — generated code survives shrinking without keep rules; (2) targeting Kotlin Multiplatform — iOS and WASM don't have JVM dynamic proxies; (3) you want annotation errors caught at build time rather than crashing on first call in production. Use dynamic proxy when building an internal tool quickly — no KSP processor module to maintain.
The cache interceptor detects a stale entry (max-age exceeded but we have an ETag). It sends a conditional request with If-None-Match: <etag>. If the server responds 304, the cache interceptor intercepts before the response reaches the call adapter, fetches the previously stored body from disk, bumps the TTL, and returns the cached HttpResponse. The call adapter never sees a 304 — it sees a 200 with the fresh cached body.
Keep: annotation-based interface (zero boilerplate), the Converter.Factory registry (open for extension), and the CallAdapter pattern (any return type without SDK changes).
Change: Remove the Call<T> wrapper. It dates from the pre-coroutine era of explicit execute()/enqueue() calls and adds noise in modern coroutine code. Add a first-class HttpEngine interface so OkHttp is swappable (KMP support). Add built-in Result<T> and unified error types rather than leaving error handling to each caller. Ship MockEngine as a first-class artifact, not a third-party extra.
Add a MetricsInterceptor that records System.nanoTime() before and after chain.proceed(), then calls a pluggable MetricsReporter interface. For the URL label, use the original pathTemplate from ServiceMethod stored in HttpRequest.tag — e.g. /users/{id} instead of /users/abc123. This keeps metric cardinality low (one series per endpoint, not one per user ID).
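A minimal sketch of that interceptor follows. It assumes `HttpRequest.tag` carries the path template as a plain string and, as a simplification, takes the rest of the chain as a `proceed` lambda; `MetricsReporter` and its `record` signature are illustrative. Failures are reported with a status of -1, which is a convention chosen for this example only.

```kotlin
// tag is assumed to hold the path template copied from ServiceMethod, e.g. "/users/{id}".
data class HttpRequest(val url: String, val tag: String? = null)
data class HttpResponse(val code: Int)

// Pluggable sink: callers bridge this to whatever metrics backend they use.
fun interface MetricsReporter {
    fun record(endpoint: String, status: Int, durationNanos: Long)
}

class MetricsInterceptor(
    private val reporter: MetricsReporter,
    private val nanoTime: () -> Long = { System.nanoTime() }, // injectable for tests
) {
    fun intercept(request: HttpRequest, proceed: (HttpRequest) -> HttpResponse): HttpResponse {
        // Label by template, never by concrete URL, to keep cardinality low.
        val endpoint = request.tag ?: "unknown"
        val start = nanoTime()
        try {
            val response = proceed(request)
            reporter.record(endpoint, response.code, nanoTime() - start)
            return response
        } catch (e: Exception) {
            // -1 marks a transport-level failure (no HTTP status available).
            reporter.record(endpoint, -1, nanoTime() - start)
            throw e
        }
    }
}
```

Requests for `/users/abc123` and `/users/xyz789` both land under the `/users/{id}` label, so dashboards show one latency series per endpoint.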
At the OkHttpEngine layer, because it's a TLS concern, not an application-layer one. A CertificatePinner is passed to the OkHttpClient.Builder. Always pin two certificates: the active leaf and a backup (the next leaf or the signing CA). If you pin only one, rotating the server certificate locks out every installed client until it ships an update with the new pin. Deploy the new backup pin in SDK v1, then once all clients have updated, rotate the primary cert; all v1+ clients already trust it.