💳 System Design Hard 2025–26

Design a Payment System

A comprehensive Android system design covering the client-side payment SDK (checkout orchestration, card tokenisation, PCI DSS compliance), UPI intent flows, wallet integration, and the server-side payment gateway, including idempotency, transaction state machines, double-charge prevention, and reconciliation.

Understanding the Problem

Payment systems are where software failures have immediate financial consequences. An interviewer posing this question is testing whether you understand the three-sided complexity: the Android client (security, UX, state recovery), the payment gateway (idempotency, atomic state transitions), and the PSP integration (UPI, card networks, wallets). The critical insight: the client must never be trusted as the source of truth for payment status. Always verify with the server, and always assume the network can fail mid-transaction.

✅ Functional Requirements
  • Support three payment methods: saved/new card (tokenised, PCI-safe), UPI (intent-based via installed UPI apps), and wallet (in-app balance)
  • Initiate a checkout flow from any screen (product detail, cart, subscription) with a consistent UI
  • Track transaction status end-to-end: from initiation through PSP processing to final confirmation
  • Retry transient failures automatically; surface permanent failures (card declined, invalid VPA) to the user immediately
  • Support refunds: full and partial, initiated server-side, reflected in transaction history
  • Allow users to save payment methods (card tokens, linked UPI VPAs) for one-tap future payments
  • Support split payments: e.g., ₹200 from wallet + ₹800 from UPI
Non-Functional Requirements
  • PCI DSS compliance: raw PAN and CVV must never touch application code, logs, or local storage; only PSP-issued tokens are stored
  • Idempotency: a network retry must never cause a double charge, guaranteed by stable idempotency keys and server-side deduplication
  • Checkout latency: payment sheet must open in <500ms; total checkout flow including PSP round-trip <3s on 4G
  • Zero-trust client: payment status is determined only by the server, so a malicious client cannot report a fake "success"
  • Crash recovery: if the app is killed during a payment, the transaction must be recoverable on next launch without requiring user re-entry
  • GDPR / RBI compliance: payment data must be localised (India), consent required for saving card tokens, 7-year audit log retention
Out of scope
  • EMI / BNPL schemes, international cards, multi-currency
  • Merchant dashboard, settlement reporting, dispute resolution

The Set Up
Architecture Principles

Before drawing any boxes, establish the non-negotiable rules that every design decision must respect:

  • Never store raw card data. PAN, CVV, and expiry must never appear in your database, logs, or crash reports. Tokenise at the point of capture using the PSP's hosted fields or direct tokenisation API. Store only the PSP-issued token.
  • Idempotency keys are immutable. Generate one UUID per payment intent and keep it for the lifetime of that transaction, including all retries. The server uses it to return the same result rather than processing twice.
  • Client is not the source of truth. The UPI intent result, the card payment callback, and any local status are all hints to be verified with a server status poll. The database is updated only after server confirmation.
  • State machine, not ad-hoc flags. Transaction state transitions must be an explicit enum, not a collection of booleans. Only valid transitions are allowed; invalid ones are logged as anomalies.
  • Certificate pinning for all payment endpoints. Standard TLS is insufficient for financial APIs: pin the leaf certificate's public key and rotate with a shadow pin strategy.
Data Models

These are the entities stored locally on the device. Sensitive data (tokens) are encrypted at rest using Android Keystore before writing to Room.

Transaction @Entity
  • String txnId: @PrimaryKey, UUID
  • String orderId: server order ref
  • Long amountPaise: never floating point
  • TransactionStatus status: state machine
  • PaymentMethod method: CARD / UPI / WALLET
  • String idempotencyKey: stable across retries
  • Int retryCount: for exponential backoff
  • Long createdAt
  • Long updatedAt
  • String? failureReason: for declined/error
SavedPaymentMethod @Entity
  • String methodId: @PrimaryKey
  • PaymentMethod type: CARD / UPI
  • String displayLabel: e.g. "Visa ••4242"
  • String encryptedToken: AES-GCM, Keystore key
  • String? vpa: UPI Virtual Payment Address
  • Boolean isDefault
  • Long savedAt
UpiTransaction (embedded)
  • String vpa: payer VPA
  • String txnRefId: from UPI intent response
  • String? rrn: bank reference number
  • String upiStatus: SUCCESS / FAILURE / PENDING
  • String upiAppPackage: which UPI app was used
WalletInfo (server DTO)
  • Long balancePaise: cached, not authoritative
  • Boolean isKycComplete: KYC gating
  • Long dailyLimitPaise: RBI ₹10k/day limit
  • Long cachedAt: staleness check

🚨 Never store amounts as Float or Double. Floating-point arithmetic causes rounding errors that translate directly to financial discrepancies. Always store monetary amounts in the smallest currency unit (paise in India) as a Long. Only format to a decimal string at the display layer.
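As a small illustration of the rule above, the sketch below keeps amounts in paise as a Long and converts to a rupee string only at the display edge. The helper names (paiseToRupeeString, splitTotalPaise, doubleSumIsExact) are hypothetical and not part of the SDK described here.

```kotlin
// Hypothetical helper: render a paise amount as a rupee string ("199.00").
// Integer division and remainder keep the value exact; no Double is involved.
fun paiseToRupeeString(amountPaise: Long): String {
    require(amountPaise >= 0) { "amountPaise must be non-negative" }
    val rupees = amountPaise / 100
    val paise = amountPaise % 100
    return "${rupees}.${paise.toString().padStart(2, '0')}"
}

// Split payment from the requirements: wallet part + UPI part, summed in paise.
fun splitTotalPaise(walletPaise: Long, upiPaise: Long): Long = walletPaise + upiPaise

// The classic floating-point counter-example: 0.1 + 0.2 is not exactly 0.3.
fun doubleSumIsExact(): Boolean = 0.1 + 0.2 == 0.3
```

The same string form ("199.00") is what the UPI am parameter expects, so one formatter can serve both the display layer and the UPI URL.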

The TransactionStatus state machine defines which transitions are valid. Any attempt to move to a non-adjacent state is an error and must be logged for audit:

[Figure: TransactionStatus state machine. States: INITIATED, PROCESSING, PENDING_VERIFY, SUCCESS, FAILED, REFUNDED, TIMED_OUT. Transition labels: PSP call; await callback / poll; verified ✓; declined / error; refund; 30s timeout.]
TransactionStatus state machine: only the transitions shown are valid; all others throw an IllegalStateException
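One way to make the transition rules explicit is an enum that knows its own successors. This is a minimal sketch: the exact edge set below is an assumption inferred from the state names and transition labels, and the transition helper is hypothetical, so adjust both to your own diagram.

```kotlin
enum class TransactionStatus {
    INITIATED, PROCESSING, PENDING_VERIFY, SUCCESS, FAILED, REFUNDED, TIMED_OUT;

    // Assumed edges: PSP call, await callback / poll, verified, declined / error,
    // refund, 30s timeout. Terminal states have no successors in this sketch.
    private fun allowedNext(): Set<TransactionStatus> = when (this) {
        INITIATED -> setOf(PROCESSING)
        PROCESSING -> setOf(PENDING_VERIFY, FAILED)
        PENDING_VERIFY -> setOf(SUCCESS, FAILED, TIMED_OUT)
        SUCCESS -> setOf(REFUNDED)
        FAILED, REFUNDED, TIMED_OUT -> emptySet()
    }

    fun canTransitionTo(next: TransactionStatus): Boolean = next in allowedNext()
}

// Hypothetical helper: returns the new state or throws IllegalStateException.
fun transition(from: TransactionStatus, to: TransactionStatus): TransactionStatus {
    check(from.canTransitionTo(to)) { "Illegal transition: $from -> $to" }
    return to
}
```

Because the rules live in one when expression, adding a state forces the compiler to ask which transitions it allows.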

High-Level Design
System Components

The system has three distinct layers. Understanding which layer owns which responsibility is the first thing an interviewer evaluates:

  • 📱 Android Payment SDK: Checkout UI, tokenisation, UPI intent handling, local transaction persistence, status polling
  • 🏦 Payment Gateway: Order creation, idempotency store, PSP routing, transaction ledger, reconciliation, fraud scoring
  • 🔌 PSP / Bank Networks: Razorpay / Stripe / NPCI (UPI), card network (Visa/Mastercard), bank core banking systems

[Figure: three-layer architecture]
  • Android SDK: PaymentSdk (root), CheckoutOrchestrator, Tokenization, UpiHandler, TxnRepository, PaymentTransport, WalletManager, Room DB (encrypted)
  • Payment Gateway: Order Service, Idempotency Store (Redis), PSP Router, Fraud Engine, Transaction Ledger (Postgres), Notification Service, Reconciliation (batch)
  • PSP / Networks: Razorpay / Stripe, NPCI (UPI / IMPS), Visa / Mastercard, Core Banking Systems
  • Link labels: HTTPS + pin, PSP SDK, webhook
Three-layer architecture: Android SDK ↔ Payment Gateway ↔ PSP/Networks; dashed = async webhook / callback
Payment Method Flows

Each payment method has a fundamentally different flow. Understanding all three, and their failure modes, is what interviewers test at the senior level.

Method | Initiation | Auth | Result | Verification
Card | Collect card details via PSP hosted fields → tokenise | 3DS2 OTP in WebView or redirect | Synchronous PSP callback | Server confirms with PSP webhook
UPI | Build upi://pay?... URL → launch intent chooser | User authenticates inside UPI app (PIN) | onActivityResult intent extras | Server polls NPCI for txnRefId status
Wallet | Load wallet balance → confirm if sufficient | Optional: app PIN for high-value txns | Synchronous server response | Ledger debit is atomic; no PSP needed

⚠️ Never trust the UPI intent result directly. The onActivityResult extras (Status=SUCCESS) are provided by the UPI app and can be tampered with. Always send the txnRefId returned in the intent to your server, which polls the NPCI status API and confirms the actual bank debit before the transaction is marked successful.

Server-Side Gateway Design

The payment gateway has five critical responsibilities that prevent double-charging, data corruption, and fraud:

  • Idempotency store (Redis): Before forwarding any payment to a PSP, check if the idempotencyKey already exists. If yes, return the cached response immediately. If no, process and store result atomically. TTL = 24 hours. This is the single mechanism that prevents double charges on network retries.
  • Transaction ledger (Postgres): Double-entry bookkeeping. Every payment creates two rows: a debit on the customer account and a credit on the merchant account. The two rows share a transactionId and must be written in a single database transaction. If either fails, neither commits (atomicity).
  • PSP Router: Routes to Razorpay for UPI/cards, direct NPCI for high-volume UPI, or Stripe for international. Maintains PSP health scores and falls back to secondary PSP if primary returns 5xx for >5 consecutive requests.
  • Webhook handler: PSPs send status updates asynchronously (payment confirmed, refund processed). The webhook must be idempotent: duplicate deliveries (PSPs retry webhooks) must not double-update the ledger. Use a webhook event ID as a deduplication key.
  • Reconciliation (daily batch): Compare your ledger with the PSP's settlement report. Flag discrepancies such as a payment that succeeded at the PSP but failed on your side, or vice versa. These are the "stuck transactions" that require manual resolution.
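The idempotency check can be sketched in a few lines. In this illustration an in-memory ConcurrentHashMap stands in for Redis, TTL handling is omitted, and the names IdempotentChargeHandler and ChargeOutcome are hypothetical. The point it shows is that "check, then process, then store" should be one atomic step per key, otherwise two concurrent retries with the same key could both miss the cache.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical result type for illustration.
data class ChargeOutcome(val gatewayTxnId: String, val status: String)

// In-memory stand-in for the Redis idempotency store (no TTL in this sketch).
class IdempotentChargeHandler(private val chargePsp: (Long) -> ChargeOutcome) {
    private val results = ConcurrentHashMap<String, ChargeOutcome>()

    // computeIfAbsent runs the PSP call at most once per key, even under
    // concurrent retries; later calls with the same key get the stored result.
    fun charge(idempotencyKey: String, amountPaise: Long): ChargeOutcome =
        results.computeIfAbsent(idempotencyKey) { chargePsp(amountPaise) }
}
```

A retry that reuses the key gets the original outcome back and the PSP is called once. A different key is treated as a new payment, which is why the client must never regenerate the key on retry.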

Low-Level Design

The SDK is composed of seven classes with strict one-directional dependencies. Each class owns exactly one responsibility and is independently testable. The diagram below shows every class, its key methods, and how they wire together at runtime.

[Figure: class dependency graph]
  • PaymentSdk: init(context, config), checkout(orderId, amount), recoverPendingTxns()
  • CheckoutOrchestrator: startCheckout(method), onUpiResult(intent), CheckoutState machine
  • WalletManager: fetchBalance(): WalletInfo, pay(amount): TxnResult, @Volatile cachedBalance
  • TokenizationService: tokenize(cardData), getToken(methodId), PSP hosted fields
  • UpiIntentHandler: buildUpiUrl(params), parseResult(intent), resolveInstalledApps()
  • PaymentTransport: createOrder(), charge(idempotencyKey), pollStatus(txnId)
  • TransactionRepository: save(transaction), updateStatus(id, status), Room @Dao
  • Infrastructure: PaymentDatabase, OkHttpClient
Class dependency graph: arrows show constructor dependencies injected at SDK init; dashed = infrastructure dependencies
0) SDK Initialization: the composition root

All dependencies are wired in a single init() factory method. No dependency is created anywhere else; this is constructor injection without a DI framework, keeping the SDK self-contained and testable with fakes.

// PaymentSdk.kt: composition root
import android.content.Context
import okhttp3.CertificatePinner
import okhttp3.OkHttpClient

class PaymentSdk private constructor(
    val orchestrator: CheckoutOrchestrator
) {
    companion object {
        fun init(context: Context, config: PaymentConfig): PaymentSdk {
            val db = PaymentDatabase.build(context)
            val okHttp = OkHttpClient.Builder()
                .certificatePinner(
                    CertificatePinner.Builder()
                        .add(config.apiHost, config.certPin)   // current pin
                        .add(config.apiHost, config.shadowPin) // shadow pin for rotation
                        .build()
                )
                .build()
            val transport = PaymentTransport(okHttp, config.apiBase, config.apiKey)
            val txnRepo = TransactionRepository(db.txnDao())
            val tokenizer = TokenizationService(config.pspPublicKey)
            val upiHandler = UpiIntentHandler(context)
            val wallet = WalletManager(transport)
            val orchestrator = CheckoutOrchestrator(
                txnRepo, tokenizer, upiHandler, transport, wallet
            )
            orchestrator.recoverPendingTransactions() // reset in-flight transactions on startup
            return PaymentSdk(orchestrator)
        }
    }
}
1) CheckoutOrchestrator: the state machine

The orchestrator owns the checkout state machine and coordinates all other classes. It exposes one checkout entry point, startCheckout(), and a StateFlow<CheckoutState> that the UI layer observes. A Mutex ensures only one checkout can be active at a time, preventing double-submission.

[Figure: card payment checkout sequence. Participants: UI / VM, Orchestrator, Tokenization, Transport, TxnRepository. Messages in order: startCheckout(method, orderId); tryLock(), reject if already active; save(txn: INITIATED); emit(COLLECTING); tokenize(cardData) → PaymentToken; createOrder(orderId, amount) → gatewayOrderId; charge(token, idempotencyKey); updateStatus(PROCESSING); PSP response (or poll); updateStatus(SUCCESS / FAILED); emit(COMPLETE / ERROR).]
Card payment checkout sequence: Mutex ensures a single active checkout; status is written to Room at every state transition

import java.util.UUID
import kotlinx.coroutines.CancellationException
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.SupervisorJob
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.asStateFlow
import kotlinx.coroutines.launch
import kotlinx.coroutines.sync.Mutex

class CheckoutOrchestrator(
    private val txnRepo: TransactionRepository,
    private val tokenizer: TokenizationService,
    private val upiHandler: UpiIntentHandler,
    private val transport: PaymentTransport,
    private val wallet: WalletManager
) {
    private val scope = CoroutineScope(SupervisorJob() + Dispatchers.IO)
    private val checkoutMutex = Mutex() // prevents double-submission
    private val _state = MutableStateFlow<CheckoutState>(CheckoutState.Idle)
    val state: StateFlow<CheckoutState> = _state.asStateFlow()

    fun startCheckout(method: PaymentMethod, orderId: String, amountPaise: Long) {
        if (!checkoutMutex.tryLock()) {
            // already processing: ignore duplicate tap
            return
        }
        scope.launch {
            try {
                val txn = Transaction(
                    txnId = UUID.randomUUID().toString(),
                    orderId = orderId,
                    amountPaise = amountPaise,
                    status = TransactionStatus.INITIATED,
                    method = method,
                    idempotencyKey = UUID.randomUUID().toString(), // generated once, reused on every retry
                    createdAt = System.currentTimeMillis()
                )
                txnRepo.save(txn)
                _state.value = CheckoutState.Processing
                val result = when (method) {
                    is PaymentMethod.Card -> processCard(txn, method)
                    is PaymentMethod.Upi -> processUpi(txn, method)
                    is PaymentMethod.Wallet -> processWallet(txn)
                }
                txnRepo.updateStatus(txn.txnId, result.status)
                _state.value = if (result.status == TransactionStatus.SUCCESS)
                    CheckoutState.Success(txn.txnId)
                else
                    CheckoutState.Failed(result.failureReason)
            } catch (e: CancellationException) {
                throw e // keep coroutine cancellation cooperative
            } catch (e: Exception) {
                // Do not leave the UI stuck in Processing or crash the process;
                // the real outcome is re-verified with the server during recovery.
                _state.value = CheckoutState.Failed(e.message)
            } finally {
                checkoutMutex.unlock()
            }
        }
    }

    fun recoverPendingTransactions() = scope.launch {
        // On startup: reset any in-flight transactions left by a prior process kill
        txnRepo.resetInFlightToInitiated()
    }
}
2) TokenizationService: PCI DSS at the boundary

PCI DSS requires that raw card data (PAN, CVV) never enter your application stack. TokenizationService is the single class that touches card details, and it immediately hands them to the PSP's tokenisation API. The class has no state and no persistence: it returns a PSP-issued token that is safe to store, log, and transmit.

🚨 The PSP hosted fields approach is safest. Rather than collecting card details in your own EditText views, use the PSP's Android SDK to render input fields in an isolated WebView or native component that communicates directly with the PSP server. Your application code never sees the PAN, only the resulting token. This reduces your PCI scope from SAQ D (complex) to SAQ A (minimal).

import com.nimbusds.jose.EncryptionMethod
import com.nimbusds.jose.JWEAlgorithm
import com.nimbusds.jose.JWEHeader
import com.nimbusds.jose.JWEObject
import com.nimbusds.jose.Payload
import com.nimbusds.jose.crypto.RSAEncrypter
import com.nimbusds.jose.jwk.RSAKey

class TokenizationService(private val pspPublicKey: String) {
    /**
     * Tokenizes card data using the PSP's tokenisation endpoint.
     * Raw PAN/CVV exist in memory ONLY for the duration of this call.
     * The returned PaymentToken is safe to persist in Room.
     */
    suspend fun tokenize(cardData: CardInput): Result<PaymentToken> {
        return try {
            val encrypted = encryptWithPspPublicKey(cardData, pspPublicKey)
            // pspApi is the PSP's tokenisation client (construction not shown); HTTPS to PSP only
            val response = pspApi.tokenize(encrypted)
            cardData.clear() // zero memory immediately
            Result.success(PaymentToken(
                token = response.token,
                last4 = response.last4, // display-safe
                network = response.network,
                expiryMonth = response.expiryMonth,
                expiryYear = response.expiryYear
            ))
        } catch (e: Exception) {
            cardData.clear() // always clear on failure too
            Result.failure(e)
        }
    }

    private fun encryptWithPspPublicKey(card: CardInput, key: String): String {
        // RSA-OAEP-256 encrypt PAN+CVV using the PSP's JWK public key.
        // The PSP decrypts on their server; your servers never see plaintext.
        val jwk = RSAKey.parse(key) // must be an RSA JWK: RSAEncrypter does not accept an EC key
        return JWEObject(
            JWEHeader(JWEAlgorithm.RSA_OAEP_256, EncryptionMethod.A256GCM),
            Payload(card.toJson())
        ).also { it.encrypt(RSAEncrypter(jwk)) }.serialize()
    }
}
3) UpiIntentHandler: the intent-based payment flow

UPI on Android works via Intent: your app fires a upi://pay deep link, the OS shows an app chooser (PhonePe, GPay, Paytm…), the user authenticates in their UPI app, and the result is returned via onActivityResult. The flow has three critical design requirements: build the URL correctly, handle the case where no UPI app is installed, and never trust the intent result.

[Figure: UPI payment flow. Participants: Android SDK, UPI App, Our Server, NPCI, Payer Bank. Messages in order: POST /orders → gatewayOrderId; startActivityForResult(upi://pay?...); collect UPI PIN → route via NPCI; debit payer account; txn success + RRN; txnRefId + status; onActivityResult(txnRefId, Status=SUCCESS) ← DO NOT TRUST; POST /verify?txnRefId= (server polls NPCI); { status: SUCCESS, rrn: "..." } ← authoritative; update Room: SUCCESS + rrn.]
UPI payment flow: the client result is always verified server-side before status is written to Room

import android.content.Context
import android.content.Intent
import android.content.pm.PackageManager
import android.content.pm.ResolveInfo
import android.net.Uri

class UpiIntentHandler(private val context: Context) {

    fun buildPayUri(params: UpiPaymentParams): Uri = Uri.Builder()
        .scheme("upi").authority("pay")
        .appendQueryParameter("pa", params.payeeVpa)     // payee VPA (merchant)
        .appendQueryParameter("pn", params.payeeName)
        .appendQueryParameter("am", params.amountRupees) // "199.00": UPI expects rupees
        .appendQueryParameter("cu", "INR")
        .appendQueryParameter("tr", params.txnRefId)     // our transaction reference
        .appendQueryParameter("tn", params.note)
        .build()

    fun getInstalledUpiApps(): List<ResolveInfo> {
        val intent = Intent(Intent.ACTION_VIEW, buildPayUri(UpiPaymentParams.dummy()))
        return context.packageManager.queryIntentActivities(intent, PackageManager.MATCH_DEFAULT_ONLY)
    }

    /**
     * Parse onActivityResult extras. Returns a HINT only.
     * MUST be verified server-side before trusting the status.
     */
    fun parseResult(data: Intent?): UpiIntentResult {
        if (data == null) return UpiIntentResult.NoResponse // app killed/back pressed
        val status = data.getStringExtra("Status") ?: ""
        val txnRefId = data.getStringExtra("txnRef") ?: ""
        return when (status.uppercase()) {
            "SUCCESS" -> UpiIntentResult.PossibleSuccess(txnRefId) // needs server verify
            "FAILURE" -> UpiIntentResult.ClientFailure(txnRefId)
            else -> UpiIntentResult.Unknown(txnRefId) // always poll server
        }
    }
}
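For unit tests that run off-device (where android.net.Uri is unavailable), the same URL can be assembled with plain JVM APIs. This is a sketch under assumptions: UpiParams and buildUpiUrl are hypothetical names, and java.net.URLEncoder may differ from Uri.Builder in how it escapes a few rarely used characters.

```kotlin
import java.net.URLEncoder

// Hypothetical parameter holder mirroring the query parameters used above.
data class UpiParams(
    val payeeVpa: String,
    val payeeName: String,
    val amountRupees: String, // already formatted, e.g. "199.00"
    val txnRefId: String,
    val note: String
)

// URLEncoder does form-encoding, which writes spaces as '+'; convert to %20.
// A literal '+' in the input is already escaped as %2B, so this is safe.
fun encodeQueryValue(value: String): String =
    URLEncoder.encode(value, "UTF-8").replace("+", "%20")

fun buildUpiUrl(p: UpiParams): String {
    val query = listOf(
        "pa" to p.payeeVpa,
        "pn" to p.payeeName,
        "am" to p.amountRupees,
        "cu" to "INR",
        "tr" to p.txnRefId,
        "tn" to p.note
    ).joinToString("&") { (key, value) -> "$key=${encodeQueryValue(value)}" }
    return "upi://pay?$query"
}
```

Keeping the parameter order fixed makes the output easy to assert on in tests.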
4) PaymentTransport: idempotency and safe retry

PaymentTransport is the boundary between your app and the payment gateway. Every mutating request (create order, charge, refund) carries an idempotencyKey header. The retry logic distinguishes safe-to-retry errors (network timeout, 5xx) from never-retry errors (4xx: card declined, invalid VPA, insufficient funds). Retrying a declined card doesn't help and frustrates the user.

import android.os.SystemClock
import java.io.IOException
import kotlin.random.Random
import kotlinx.coroutines.delay
import okhttp3.OkHttpClient

class PaymentTransport(
    private val okHttp: OkHttpClient,
    private val baseUrl: String,
    private val apiKey: String
) {
    private val maxRetries = 3

    suspend fun charge(
        orderId: String,
        token: PaymentToken,
        amountPaise: Long,
        idempotencyKey: String // stable UUID, same on every retry
    ): ChargeResult {
        var attempt = 0
        while (true) {
            // A network failure (IOException) is treated like a 5xx: the outcome is
            // unknown, so retry with the SAME idempotencyKey.
            val response = try {
                executeCharge(orderId, token, amountPaise, idempotencyKey)
            } catch (e: IOException) {
                null
            }
            when {
                response != null && response.isSuccessful -> return parseSuccess(response)
                response != null && response.code in 400..499 -> {
                    // 4xx: permanent failure (card declined, invalid token, fraud block)
                    // DO NOT RETRY: return immediately with the failure reason
                    return ChargeResult.Declined(parseDeclineCode(response))
                }
                attempt >= maxRetries -> return ChargeResult.NetworkError
                else -> {
                    // 5xx or IOException: safe to retry with the same idempotencyKey
                    attempt++
                    // quadratic backoff (1s, 4s, 9s) plus up to 500ms of jitter
                    val backoff = (1000L * (attempt * attempt)) + Random.nextLong(500)
                    delay(backoff)
                }
            }
        }
    }

    suspend fun pollStatus(txnId: String): TransactionStatus {
        // Used for UPI (async result) and card 3DS flows.
        // Capped exponential backoff: waits of 1s, 2s, 4s, then 8s, within a ~30s budget.
        val deadline = SystemClock.elapsedRealtime() + 30_000L
        var interval = 1000L
        while (SystemClock.elapsedRealtime() < deadline) {
            val result = fetchStatus(txnId)
            if (result != TransactionStatus.PENDING_VERIFY) return result
            delay(interval)
            interval = (interval * 2).coerceAtMost(8000L)
        }
        return TransactionStatus.TIMED_OUT
    }
}
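It can help to write the timing out explicitly. The sketch below reproduces the wait schedule implied by the polling loop (start at 1s, double, cap at 8s, 30s budget) and the base delay of the charge retry, ignoring request latency and jitter. The function names are hypothetical.

```kotlin
// Waits (in ms) the polling loop would perform before giving up.
fun pollDelaysMs(
    budgetMs: Long = 30_000L,
    initialMs: Long = 1_000L,
    capMs: Long = 8_000L
): List<Long> {
    val delays = mutableListOf<Long>()
    var elapsed = 0L
    var interval = initialMs
    while (elapsed < budgetMs) {          // same condition as the deadline check
        delays += interval
        elapsed += interval
        interval = (interval * 2).coerceAtMost(capMs)
    }
    return delays
}

// Base delay of the charge retry before jitter: 1000 * attempt^2.
fun chargeBackoffBaseMs(attempt: Int): Long = 1_000L * attempt * attempt
```

Under these assumptions the polling waits are 1s, 2s, 4s, 8s, 8s, 8s (31s in total, so the last wait slightly overshoots the 30s budget), and the charge retries wait roughly 1s, 4s and 9s, which is quadratic rather than strictly exponential growth.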
5) TransactionRepository: encrypted local persistence

The repository wraps the Room DAO and adds two responsibilities: encryption of sensitive fields (tokens are encrypted with an Android Keystore-backed key before writing) and startup recovery (resetting any in-flight transactions, i.e. those in PROCESSING or PENDING_VERIFY, left by a process kill). Payment data must be readable across app restarts: a transaction started in a previous session must be recoverable without user re-entry.

class TransactionRepository(private val dao: TransactionDao) {
    private val keystore = AndroidKeyStoreEncryptor("payment_keystore_alias")

    suspend fun save(txn: Transaction) {
        val encrypted = txn.copy(
            // encrypt token before Room insert; the key lives in Keystore and is never exported
            encryptedToken = keystore.encrypt(txn.rawToken ?: ""),
            rawToken = null // never persist raw token
        )
        dao.insert(encrypted)
    }

    suspend fun updateStatus(txnId: String, status: TransactionStatus, reason: String? = null) {
        // Validate the transition before writing; check() throws IllegalStateException on an illegal move
        val current = dao.getById(txnId)?.status ?: return
        check(current.canTransitionTo(status)) {
            "Illegal transition: $current → $status for txn $txnId"
        }
        dao.updateStatus(txnId, status, reason, System.currentTimeMillis())
    }

    /**
     * Called on every SDK init. Resets transactions stuck in PROCESSING/PENDING_VERIFY
     * from a previous process kill. They will be re-verified on next network call.
     */
    suspend fun resetInFlightToInitiated() {
        dao.resetStatusWhere(
            from = listOf(TransactionStatus.PROCESSING, TransactionStatus.PENDING_VERIFY),
            to = TransactionStatus.INITIATED,
            before = System.currentTimeMillis() - 5 * 60 * 1000L // older than 5 min
        )
    }
}
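The encryption step itself can be sketched with the standard javax.crypto API. This is an illustration only: it uses a software AES key so it runs on any JVM, whereas the repository above assumes a Keystore-backed key, and the function names are hypothetical. With an Android Keystore key you would typically let the Cipher generate the IV and read it from cipher.iv instead of supplying one.

```kotlin
import java.security.SecureRandom
import java.util.Base64
import javax.crypto.Cipher
import javax.crypto.KeyGenerator
import javax.crypto.SecretKey
import javax.crypto.spec.GCMParameterSpec

const val GCM_IV_BYTES = 12   // 96-bit IV, the usual size for GCM
const val GCM_TAG_BITS = 128  // authentication tag length

// Software key for the sketch; a real app would use a Keystore-backed key.
fun newAesKey(): SecretKey = KeyGenerator.getInstance("AES").apply { init(256) }.generateKey()

// Returns Base64(iv || ciphertext+tag); a fresh random IV is used per call.
fun encryptToken(key: SecretKey, token: String): String {
    val iv = ByteArray(GCM_IV_BYTES).also { SecureRandom().nextBytes(it) }
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key, GCMParameterSpec(GCM_TAG_BITS, iv))
    val ciphertext = cipher.doFinal(token.toByteArray(Charsets.UTF_8))
    return Base64.getEncoder().encodeToString(iv + ciphertext)
}

// Splits the stored value back into IV and ciphertext, then decrypts and verifies the tag.
fun decryptToken(key: SecretKey, stored: String): String {
    val bytes = Base64.getDecoder().decode(stored)
    val iv = bytes.copyOfRange(0, GCM_IV_BYTES)
    val ciphertext = bytes.copyOfRange(GCM_IV_BYTES, bytes.size)
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, key, GCMParameterSpec(GCM_TAG_BITS, iv))
    return String(cipher.doFinal(ciphertext), Charsets.UTF_8)
}
```

Storing the IV next to the ciphertext is fine because the IV is not secret; what matters is that it is never reused with the same key.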
6) WalletManager: balance and atomic deduction

Wallet balance is stored server-side. The client caches it for display only and must never use the cached balance to decide whether to allow a payment. The cache is a UX hint ("you have ₹450 in your wallet"); the server enforces the actual balance atomically at deduction time using optimistic locking. This prevents the classic double-spend race condition where two simultaneous payments both see sufficient balance before either deducts.

class WalletManager(private val transport: PaymentTransport) {
    @Volatile private var _cachedBalance: WalletInfo? = null
    private val CACHE_TTL_MS = 5 * 60 * 1000L // 5 minutes

    suspend fun getBalance(forceRefresh: Boolean = false): WalletInfo {
        val cached = _cachedBalance
        if (!forceRefresh && cached != null &&
            (System.currentTimeMillis() - cached.cachedAt) < CACHE_TTL_MS) {
            return cached
        }
        return transport.fetchWalletInfo().also { _cachedBalance = it }
    }

    suspend fun pay(
        orderId: String,
        amountPaise: Long,
        idempotencyKey: String
    ): WalletPayResult {
        // Server checks real balance and deducts atomically using optimistic lock:
        //   UPDATE wallet SET balance = balance - amount, version = version + 1
        //   WHERE id = ? AND balance >= amount AND version = ?
        val result = transport.walletCharge(orderId, amountPaise, idempotencyKey)
        if (result.success) _cachedBalance = null // invalidate cache after deduction
        return result
    }
}

Edge Cases

These are the eight scenarios that cause real production incidents in payment systems. Each has resulted in customer double-charges, silent payment failures, or financial reconciliation discrepancies at scale.

💸 Double Charge on Network Retry

The SDK sends a charge request. The server processes it, debits the customer's card, and sends a 200 OK. The response is lost in transit. The SDK times out, treats it as a 5xx, and retries. The server processes the second request, and now the customer is charged twice. This is the most catastrophic payment failure mode.

Fix: Every charge request includes an Idempotency-Key header containing a UUID that is generated once when the transaction is created (at INITIATED state) and is immutable for the lifetime of that transaction. The server stores (idempotencyKey → chargeResult) in Redis before returning the response. On receipt of a duplicate key, it returns the cached result immediately without processing. The idempotency store has a 24-hour TTL, sufficient for all realistic retry windows. The client's PaymentTransport always uses the same key on retry, making charge operations idempotent by construction.

📱 UPI App Killed: No onActivityResult

The user selects PhonePe, enters their UPI PIN, and the OS kills your app (low memory) while the UPI flow is in progress. When the user returns to your app, there is no onActivityResult callback, and the transaction is stuck in PROCESSING state with no resolution. The bank may have already debited the customer.

Fix: On every app start and resume (Application.onCreate() and onResume of the payment Activity), query the repository for any transactions in PROCESSING or PENDING_VERIFY state older than 30 seconds. For each, immediately call transport.pollStatus(txnId) to fetch the actual state from the server. Display a "Checking your payment status…" UI during this recovery. This ensures the user always gets a definitive result even after process death. The server-side status is the ground truth; the client just needs to catch up.
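The selection step in this fix is a simple filter. The sketch below uses simplified stand-in types (TxnStatus, PendingTxn) rather than the Room entity, and the function name is hypothetical; the 30-second threshold comes from the text above.

```kotlin
// Simplified stand-ins for the persisted transaction, for illustration only.
enum class TxnStatus { INITIATED, PROCESSING, PENDING_VERIFY, SUCCESS, FAILED }

data class PendingTxn(val txnId: String, val status: TxnStatus, val updatedAtMs: Long)

// Transactions that are still unresolved and have been so for longer than minAgeMs.
fun txnsNeedingVerification(
    all: List<PendingTxn>,
    nowMs: Long,
    minAgeMs: Long = 30_000L
): List<PendingTxn> =
    all.filter {
        (it.status == TxnStatus.PROCESSING || it.status == TxnStatus.PENDING_VERIFY) &&
            nowMs - it.updatedAtMs > minAgeMs
    }
```

Each returned transaction would then be passed to the status poll; fresh in-flight transactions are skipped so that a checkout that is still running is not interrupted.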

👆 Double-Tap on Pay Button

A user on a slow network taps "Pay ₹999" and nothing happens visually for 300ms (network round-trip to create order). They tap again. Two startCheckout() calls are in flight simultaneously. Both create separate transactions. One succeeds, but which one the user sees depends on a race condition. The other may also succeed, silently double-charging.

Fix: CheckoutOrchestrator uses a Mutex with tryLock() on every startCheckout() call. If the lock is already held, the second call returns immediately without starting a new transaction, so the duplicate tap is simply ignored. The UI should also disable the Pay button immediately on first tap and restore it only on completion or failure. Two-layer guard: UI disabling (UX) and Mutex (correctness).

🔋 Process Death Mid-Payment (3DS WebView)

For card payments requiring 3DS authentication, your app opens a WebView for the bank's OTP page. The OS kills your app during this WebView session (common on low-RAM devices with many background apps). On restart, the transaction is stuck in PROCESSING: the 3DS callback URL never fired, but the bank may have already authorised the payment pending OTP entry.

Fix: Save the txnId and gatewayOrderId to Room before launching the 3DS WebView (the state machine already does this). On Application.onCreate(), recoverPendingTransactions() resets any PROCESSING transaction older than 5 minutes to INITIATED and immediately calls pollStatus(). If the bank already authorised the 3DS, the gateway will return SUCCESS and the transaction completes without user re-entry. If not, show the user a "Resume payment" button that re-launches the WebView with the same gatewayOrderId.

🚫 No UPI App Installed

The user selects UPI payment. getInstalledUpiApps() returns an empty list because no UPI app is installed on the device (older Android, international device, fresh factory reset). If you fire the intent blindly, the OS throws ActivityNotFoundException, crashing the app or showing a confusing system error dialog.

Fix: Always call getInstalledUpiApps() before showing the UPI option. If the list is empty, hide the UPI payment method in the checkout sheet and show a toast: "UPI not available on this device, try card or wallet". On Android 11+, you must also declare the upi scheme in your AndroidManifest.xml under <queries> to query intent resolvers. Additionally, always wrap the startActivityForResult call in a try/catch for ActivityNotFoundException as a safety net, because the installed app list can change between the check and the launch.

💰 Wallet Insufficient Funds: Race Condition

A user has ₹500 in their wallet. They open two tabs (or the app on two devices) and initiate ₹400 payments simultaneously. The client-side balance check on both sees ₹500 (sufficient). Both send charge requests to the server. Without server-side protection, both deductions succeed, leaving the balance at -₹300.

Fix: The server executes wallet deduction as a single atomic SQL statement with optimistic locking: UPDATE wallet SET balance = balance - 400, version = version + 1 WHERE user_id = ? AND balance >= 400 AND version = ?. Only one concurrent request will match the version predicate and succeed; the other receives a 409 Conflict response. The client must never use its cached balance to gate payment attempts; only the server's atomic check is authoritative. On 409, the client refreshes the balance and shows "Insufficient wallet balance".
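The effect of the version predicate can be simulated in memory. In this sketch an AtomicReference stands in for the wallet row and compareAndSet plays the role of the conditional UPDATE; InMemoryWallet, WalletRow and DebitOutcome are hypothetical names, and a real implementation would rely on the database, not on application code.

```kotlin
import java.util.concurrent.atomic.AtomicReference

data class WalletRow(val balancePaise: Long, val version: Long)

enum class DebitOutcome { APPLIED, VERSION_CONFLICT, INSUFFICIENT_FUNDS }

class InMemoryWallet(initialPaise: Long) {
    private val row = AtomicReference(WalletRow(initialPaise, 0))

    fun read(): WalletRow = row.get()

    // Mirrors: UPDATE ... WHERE balance >= amount AND version = expectedVersion
    fun debit(amountPaise: Long, expectedVersion: Long): DebitOutcome {
        val current = row.get()
        if (current.version != expectedVersion) return DebitOutcome.VERSION_CONFLICT
        if (current.balancePaise < amountPaise) return DebitOutcome.INSUFFICIENT_FUNDS
        val next = WalletRow(current.balancePaise - amountPaise, current.version + 1)
        // compareAndSet fails if another debit replaced the row in the meantime.
        return if (row.compareAndSet(current, next)) DebitOutcome.APPLIED
        else DebitOutcome.VERSION_CONFLICT
    }
}
```

With a ₹500 balance and two ₹400 debits that both read version 0, the first is applied and the second hits a version conflict. When the loser re-reads the row and retries at version 1, it is rejected for insufficient funds, and the balance stays at ₹100 instead of going negative.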

🔑 Certificate Pin Rotation: App Doesn't Update

You certificate-pin your payment API endpoint against your current TLS leaf certificate. The certificate expires (typically 1–2 years). You rotate to a new certificate and update the app. But 30% of your users haven't updated, so their pinned certificate no longer matches, and all payment requests fail with an SSLPeerUnverifiedException. Every payment in those app versions is broken until users update.

Fix: Always pin two certificates simultaneously: the current leaf cert and a shadow pin (your next certificate, already generated but not yet deployed). When you rotate, the new cert is already trusted by all app versions. OkHttp's CertificatePinner accepts multiple pins per hostname: .add("api.example.com", "sha256/CURRENT==").add("api.example.com", "sha256/NEXT=="). Implement a server-side pin announcement endpoint that apps check periodically; when the new cert is 30 days from deployment, update the live pins. Never ship a new certificate to production without its shadow already pinned in the current app version.
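The sha256/... values passed to CertificatePinner are Base64-encoded SHA-256 hashes of the certificate's public key (its SubjectPublicKeyInfo), so the shadow pin can be computed as soon as the next key pair exists. The sketch below generates a throwaway RSA key purely for demonstration; spkiPin and demoPublicKey are hypothetical names.

```kotlin
import java.security.KeyPairGenerator
import java.security.MessageDigest
import java.security.PublicKey
import java.util.Base64

// Pin string in the "sha256/<base64>" form used for public-key pinning.
// publicKey.encoded is the DER-encoded SubjectPublicKeyInfo.
fun spkiPin(publicKey: PublicKey): String {
    val digest = MessageDigest.getInstance("SHA-256").digest(publicKey.encoded)
    return "sha256/" + Base64.getEncoder().encodeToString(digest)
}

// Throwaway key pair for the demonstration; a real pin comes from the
// public key of the current or next server certificate.
fun demoPublicKey(): PublicKey =
    KeyPairGenerator.getInstance("RSA").apply { initialize(2048) }.generateKeyPair().public
```

Because the pin depends only on the public key, a renewed certificate that keeps the same key pair keeps the same pin; the shadow pin matters when the key pair itself changes.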

📊 Reconciliation Mismatch: Ghost Transactions

End-of-day reconciliation reveals 47 transactions where your database shows FAILED but the PSP settlement report shows them as successful debits: the customer was charged but your system never marked the order as paid. This happens when the PSP webhook was lost (your webhook endpoint was down for 2 minutes), and your in-app polling timed out before the bank confirmed.

Fix: Three-layer reconciliation: (1) Webhook retry: PSPs retry failed webhooks with exponential backoff for up to 72 hours, so make your webhook endpoint idempotent and highly available (separate from your main API). (2) Proactive polling: Any transaction in PENDING_VERIFY for >2 minutes triggers a server-side cron that polls the PSP directly. (3) Daily batch reconciliation: Compare your ledger against the PSP's T+1 settlement file. Any mismatch triggers an alert to the payments ops team with the transaction IDs. The three layers together ensure zero ghost transactions in steady state.
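The daily batch comparison in layer (3) is essentially a keyed diff. A minimal sketch, assuming both sides have been reduced to a map of transaction ID to status string; the Mismatch type and reconcile function are hypothetical, and real settlement files carry amounts and fees that would need comparing as well.

```kotlin
// One discrepancy between our ledger and the PSP settlement file.
// A null status means the transaction is missing on that side.
data class Mismatch(val txnId: String, val ledgerStatus: String?, val pspStatus: String?)

fun reconcile(
    ledger: Map<String, String>,
    settlement: Map<String, String>
): List<Mismatch> =
    (ledger.keys + settlement.keys).sorted().mapNotNull { txnId ->
        val ours = ledger[txnId]
        val theirs = settlement[txnId]
        if (ours == theirs) null else Mismatch(txnId, ours, theirs)
    }
```

A ghost transaction shows up as a Mismatch whose ledgerStatus is FAILED and pspStatus is SUCCESS; entries present on only one side surface with a null on the other.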


What Interviewers Expect at Each Level
Mid-Level
  • Knows the three payment methods: card, UPI, wallet
  • Understands that card data must not be stored (PCI compliance awareness)
  • Describes basic retry logic with a fixed delay
  • Knows UPI intent flow at a high level
  • Uses Room for transaction persistence
  • Aware that payments can fail and user should see a status
Senior
  • Designs the complete TransactionStatus state machine with valid transitions
  • Explains idempotency keys and why they prevent double charges
  • Never trusts the UPI intent result: always server-verify with txnRefId
  • Handles process death recovery on startup (resetInFlightToInitiated)
  • Prevents double-tap with Mutex.tryLock()
  • Uses Android Keystore for token encryption at rest
  • Describes certificate pinning with shadow pin rotation
  • Distinguishes retriable (5xx) from non-retriable (4xx) failures
Staff / Principal
  • Reduces PCI scope to SAQ A using PSP hosted fields (no PAN in app)
  • Designs server-side double-entry ledger with atomic debits
  • Addresses wallet race condition with optimistic locking (version column)
  • Designs three-layer reconciliation (webhook retry + proactive polling + daily batch)
  • Covers certificate pin rotation strategy end-to-end
  • Discusses PSP redundancy: primary/secondary routing with health scoring
  • Addresses split payments (wallet + UPI) as two linked transactions with saga pattern
  • Proposes fraud scoring integration: ML model runs server-side before PSP call
  • Considers RBI mandate compliance: stored card tokenisation (CoF) regulations

Interview Q&A

These cover the questions most commonly asked at Razorpay, PhonePe, Paytm, and Flipkart for senior Android roles.

Q1 Easy Why should you never store raw card numbers (PAN) in your database?

PCI DSS (Payment Card Industry Data Security Standard) forbids storing sensitive authentication data such as the CVV or PIN after authorisation, and permits a stored PAN only if it is rendered unreadable (encrypted, truncated, or hashed) inside an audited environment. Violations can lead to heavy fines (figures of up to $100,000/month are commonly cited), loss of card processing privileges, and mandatory forensic audits after a breach. Practically: storing raw card data makes your entire database a high-value target, because a single SQL injection exposes every customer's card. The solution is tokenisation: the PSP converts the raw PAN into an opaque token (e.g., tok_1Abc23XYZ) that is of little use to an attacker because it can only be charged through your PSP account. The token is safe to store and transmit, and it does not expose the card if it ends up in a log.

Q2 Easy What is an idempotency key and why is it critical for payments?

An idempotency key is a UUID generated once per payment intent, included as a header (Idempotency-Key: uuid) on every mutating API call. The server stores a mapping of idempotencyKey → result in a fast store (Redis). If the same key arrives again, for example from a client retry after a network timeout, the server returns the stored result immediately without reprocessing the payment. Without idempotency keys, a network timeout during a successful charge causes the client to retry and the customer is charged twice. With them, the retry returns "already charged" with the original transaction ID, and the customer is never double-billed.
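
The deduplication step can be sketched as follows. This is a minimal illustration, not a production design: an in-memory map stands in for the Redis store mentioned above, and the names IdempotentChargeHandler, ChargeResult, and chargePsp are hypothetical.

```kotlin
import java.util.concurrent.ConcurrentHashMap

data class ChargeResult(val transactionId: String, val status: String)

// In-memory stand-in for the Redis-backed idempotency store (assumption for illustration).
class IdempotentChargeHandler(private val chargePsp: (amountPaise: Long) -> ChargeResult) {
    private val results = ConcurrentHashMap<String, ChargeResult>()

    // computeIfAbsent runs the charge at most once per key; later calls with the
    // same key return the stored result without calling the PSP again.
    fun charge(idempotencyKey: String, amountPaise: Long): ChargeResult =
        results.computeIfAbsent(idempotencyKey) { chargePsp(amountPaise) }
}
```

A client retry that reuses the key gets back the original ChargeResult, so the second request never reaches the PSP. A real store would also need an expiry policy and a way to handle a request that is still in flight when the retry arrives.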

Q3 Medium Why can't you trust the UPI intent result from onActivityResult?

The intent extras returned by the UPI app (Status=SUCCESS, txnRef=...) are populated by third-party applications running on the user's device. A modified or malicious UPI app could return Status=SUCCESS for a payment that never happened or was intentionally failed. On a rooted device, an attacker could intercept and modify the intent. The correct pattern is: parse the intent only to extract the txnRefId (the reference ID that identifies this payment request), then immediately call your server with that ID. Your server checks the payment status through its PSP or acquiring bank, which reports the actual bank debit as recorded in the UPI network. Only after the server verifies the debit do you update the transaction to SUCCESS. The client's intent result is a hint, not a fact.
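
A small sketch of "parse only the reference ID" is shown below. It assumes the UPI app returns a response string of key=value pairs joined by '&'; the function name extractTxnRef is hypothetical.

```kotlin
// Parses the UPI app's response string (assumed format: key=value pairs joined by '&',
// e.g. "txnId=...&responseCode=00&Status=SUCCESS&txnRef=...").
// Only the reference ID is extracted; the Status field is deliberately ignored.
fun extractTxnRef(upiResponse: String?): String? {
    if (upiResponse.isNullOrBlank()) return null
    return upiResponse.split("&")
        .mapNotNull { pair ->
            val idx = pair.indexOf('=')
            if (idx <= 0) null else pair.substring(0, idx) to pair.substring(idx + 1)
        }
        .firstOrNull { (key, _) -> key.equals("txnRef", ignoreCase = true) }
        ?.second
        ?.takeIf { it.isNotBlank() }
}
```

The returned value is sent to the server for verification. A null result means there is nothing to trust from the intent at all, and the transaction stays unresolved until the server reports a status.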

Q4 Medium How do you handle a payment that was in progress when the app was killed?

Every transaction state is persisted in Room as it changes. On Application.onCreate(), PaymentSdk.init() calls recoverPendingTransactions(), which queries Room for any transactions in PROCESSING or PENDING_VERIFY state older than 5 minutes. For each, it calls PaymentTransport.pollStatus(txnId) to fetch the current status from the payment gateway. The result updates Room. When the user next sees the payment screen, they see the correct final state (success, failure, or an actionable "resume payment" prompt) rather than a stuck spinner. The 5-minute threshold avoids incorrectly recovering a transaction that is still actively processing in a background thread.
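
The recovery filter can be sketched without Room or networking. In this illustration the stored rows are a plain list and pollStatus is a function parameter standing in for PaymentTransport.pollStatus; the Txn and TxnStatus types are simplified assumptions.

```kotlin
enum class TxnStatus { INITIATED, PROCESSING, PENDING_VERIFY, SUCCESS, FAILED }

data class Txn(val id: String, val status: TxnStatus, val updatedAtMs: Long)

val RECOVERY_THRESHOLD_MS = 5 * 60 * 1000L

// Returns the rows with statuses refreshed from the gateway. Only in-flight rows
// older than the threshold are polled; every other row is returned unchanged.
fun recoverPendingTransactions(
    stored: List<Txn>,
    nowMs: Long,
    pollStatus: (txnId: String) -> TxnStatus, // stand-in for PaymentTransport.pollStatus
): List<Txn> = stored.map { txn ->
    val inFlight = txn.status == TxnStatus.PROCESSING || txn.status == TxnStatus.PENDING_VERIFY
    val stale = nowMs - txn.updatedAtMs > RECOVERY_THRESHOLD_MS
    if (inFlight && stale) txn.copy(status = pollStatus(txn.id), updatedAtMs = nowMs) else txn
}
```

In the SDK the refreshed rows would be written back to Room, and the UI would pick them up through its existing Flow observation.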

Q5 Medium How do you prevent a user from being charged twice if they tap "Pay" twice?

Two-layer guard: (1) UI layer: disable the Pay button immediately on first tap and restore it only on completion or failure. This prevents most duplicate taps. (2) SDK layer: CheckoutOrchestrator holds a Mutex. Every startCheckout() call attempts mutex.tryLock(). If the lock is already held (checkout in progress), the call returns CheckoutState.AlreadyInProgress and nothing is executed. Even if the UI layer fails (configuration change, programmatic duplicate call), the Mutex ensures at most one checkout executes at a time. On the server side, the idempotency key provides a third layer: even if two charge requests reach the server, the second is deduplicated.
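
The SDK-layer guard might look like the sketch below. It depends on the kotlinx.coroutines library; the runCheckout parameter and the Completed state are assumptions added for illustration, while CheckoutOrchestrator, startCheckout(), and CheckoutState.AlreadyInProgress follow the names used above.

```kotlin
import kotlinx.coroutines.sync.Mutex

sealed interface CheckoutState {
    object AlreadyInProgress : CheckoutState
    data class Completed(val transactionId: String) : CheckoutState
}

class CheckoutOrchestrator(private val runCheckout: suspend () -> String) {
    private val mutex = Mutex()

    suspend fun startCheckout(): CheckoutState {
        // tryLock() never suspends: it returns false at once if a checkout holds the lock.
        if (!mutex.tryLock()) return CheckoutState.AlreadyInProgress
        return try {
            CheckoutState.Completed(runCheckout())
        } finally {
            mutex.unlock() // always release, including when the checkout throws
        }
    }
}
```

Using tryLock() rather than lock() is the point of the design: a second tap is rejected immediately instead of queueing behind the first and running a second checkout once the lock is released.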

Q6 Medium What is certificate pinning and why is it especially important for payment apps?

Certificate pinning means your app only trusts a specific TLS certificate (or its public key hash), rather than any certificate signed by a trusted CA. A standard HTTPS implementation trusts hundreds of CAs. If any of them is compromised or issues a rogue certificate for your domain, an attacker can perform a MITM attack and intercept your payment requests. With certificate pinning, the app rejects any certificate not matching its pinned hash, even if it's CA-signed. For payment apps this is critical because a MITM attack against a payment endpoint exposes transaction amounts, card tokens, and payment method details. OkHttp implements pinning via CertificatePinner. The pin is the SHA-256 hash of the certificate's public key (its SubjectPublicKeyInfo), so reissuing the certificate with the same key pair lets you rotate the certificate without updating the pin.

Q7 Medium Why should monetary amounts always be stored as integers (Long) rather than Float or Double?

Floating-point numbers (Float, Double) cannot represent most decimal fractions exactly in binary. For example, 0.1 + 0.2 in IEEE 754 floating point equals 0.30000000000000004. Across thousands of transactions, these rounding errors accumulate into real financial discrepancies: a ₹0.01 error per transaction across 10 million transactions is ₹100,000. The solution is to store all monetary values in the smallest currency unit as a Long: ₹199.50 is stored as 19950 paise. All arithmetic is exact integer arithmetic. The decimal representation is only computed at the display layer, formatted once as a string. Never pass monetary values through Float/Double. Use BigDecimal for display formatting if needed, but persist and compute on the integer representation.
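
As a short illustration of the integer representation, the helpers below convert between a decimal string and paise and format paise for display. The function names rupeesToPaise and formatPaise are hypothetical.

```kotlin
import java.math.BigDecimal

// Parses a decimal rupee string into paise. longValueExact() throws
// ArithmeticException if the input has more than two decimal places.
fun rupeesToPaise(rupees: String): Long =
    BigDecimal(rupees).movePointRight(2).longValueExact()

// Formats paise for display only; the stored and computed value stays a Long.
fun formatPaise(paise: Long): String =
    "₹" + BigDecimal.valueOf(paise, 2).toPlainString()
```

For example, rupeesToPaise("199.50") returns 19950 and formatPaise(19950) returns "₹199.50". Adding rupeesToPaise("0.10") and rupeesToPaise("0.20") gives exactly rupeesToPaise("0.30"), whereas the Double expression 0.1 + 0.2 == 0.3 evaluates to false.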

Q8 Hard How do you handle a split payment where ₹200 comes from wallet and ₹800 from UPI?

A split payment is a distributed transaction: two separate deductions that must either both succeed or both roll back (atomicity). The classic solution is the Saga pattern: execute each step sequentially with compensating transactions for rollback. Step 1: deduct ₹200 from wallet (immediate, server-side). Step 2: initiate ₹800 UPI payment. If Step 2 fails, execute compensation: refund ₹200 to wallet. The saga coordinator lives server-side, not on the client. The client submits a single "split payment" request with the amounts per method. The server orchestrates the two deductions. The idempotency key covers the entire saga, so a retry of the split payment request replays only the steps that haven't yet succeeded. Never attempt to coordinate a split payment entirely client-side: network failures between steps leave you in an inconsistent state with no way to guarantee compensation.
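
The two-step saga with its compensating action can be sketched as below. This is a simplified, synchronous illustration of the server-side coordinator; Wallet, UpiCharger, SplitResult, and runSplitPayment are hypothetical names, and a real coordinator would persist each step so that a retry can resume from the last completed one.

```kotlin
sealed interface SplitResult {
    object Success : SplitResult
    data class Failed(val reason: String, val walletRefunded: Boolean) : SplitResult
}

// Hypothetical ports; real implementations would call the wallet ledger and the PSP.
interface Wallet {
    fun debit(amountPaise: Long): Boolean
    fun refund(amountPaise: Long)
}

fun interface UpiCharger {
    fun charge(amountPaise: Long): Boolean
}

fun runSplitPayment(wallet: Wallet, upi: UpiCharger, walletPaise: Long, upiPaise: Long): SplitResult {
    // Step 1: wallet debit. Nothing to compensate if this fails.
    if (!wallet.debit(walletPaise)) {
        return SplitResult.Failed("wallet debit failed", walletRefunded = false)
    }
    // Step 2: UPI charge. On failure, run the compensating refund for step 1.
    if (!upi.charge(upiPaise)) {
        wallet.refund(walletPaise)
        return SplitResult.Failed("UPI charge failed", walletRefunded = true)
    }
    return SplitResult.Success
}
```

Ordering the wallet debit first keeps the compensation cheap: refunding an internal balance is immediate, whereas reversing a UPI payment goes through the bank.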

Q9 Hard How does the server-side ledger prevent double-spend using double-entry bookkeeping?

Double-entry bookkeeping means every transaction creates exactly two ledger rows: a debit and a credit. For a ₹999 purchase: DEBIT ₹999 from customer_account_balance, CREDIT ₹999 to merchant_escrow_balance. Both rows share a transaction_id and are written in a single database transaction (atomic). If either write fails, neither commits. The sum of all credits must equal the sum of all debits at all times. This invariant holds because the two rows are only ever written together, and it can be verified by a periodic audit query. Double-spend is prevented because the debit reduces the customer balance before the credit reaches the merchant; any concurrent transaction reading the customer balance sees the reduced amount. For wallet payments, the debit uses optimistic locking (WHERE balance >= amount AND version = ?) to prevent two concurrent debits from both seeing sufficient balance.
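
The optimistic-locking debit can be illustrated in memory. The sketch below mimics the conditional UPDATE with an atomic compare-and-set; WalletRow, WalletTable, and debitIfUnchanged are hypothetical names, and a real implementation would issue the SQL statement shown in the comment and check the affected row count.

```kotlin
import java.util.concurrent.atomic.AtomicReference

data class WalletRow(val balancePaise: Long, val version: Long)

class WalletTable(initial: WalletRow) {
    private val row = AtomicReference(initial)

    fun read(): WalletRow = row.get()

    // Mirrors: UPDATE wallet SET balance = balance - :amt, version = version + 1
    //          WHERE balance >= :amt AND version = :expectedVersion
    // Returns true only if the row was updated.
    fun debitIfUnchanged(amountPaise: Long, expectedVersion: Long): Boolean {
        val current = row.get()
        if (current.version != expectedVersion || current.balancePaise < amountPaise) return false
        val next = WalletRow(current.balancePaise - amountPaise, current.version + 1)
        return row.compareAndSet(current, next) // fails if another writer got in first
    }
}
```

Two requests that both read version 1 cannot both succeed: the first debit bumps the version to 2, so the second finds a version mismatch, re-reads the row, and then fails the balance check if funds are no longer sufficient.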

Q10 Hard What is the reconciliation problem and how do you solve it?

Reconciliation is the process of verifying that your internal transaction records match the PSP's settlement records. Mismatches occur because: (1) your webhook receiver was down when the PSP sent the callback, (2) the PSP processed a payment but your server crashed before recording the result, (3) a refund was processed at the PSP but not in your DB. The solution is three-layer: (1) Real-time webhook with retry: make the webhook endpoint highly available and idempotent; PSPs retry for a limited window (up to about 72 hours, depending on the PSP). (2) Proactive polling: any transaction in PENDING_VERIFY for >2 minutes is polled server-side. (3) Daily batch reconciliation: download the PSP's T+1 settlement file (CSV/SFTP), compare every PSP transaction ID against your ledger, and flag discrepancies as "ghost charges" (PSP charged, you didn't record) or "phantom credits" (you recorded success, PSP didn't charge). Both require manual resolution by a payments ops team.
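
The comparison at the heart of the daily batch job reduces to two set differences. The sketch below assumes both sides have already been loaded into sets of transaction IDs; Discrepancies and reconcile are hypothetical names.

```kotlin
data class Discrepancies(val ghostCharges: Set<String>, val phantomCredits: Set<String>)

// ledgerSuccessIds: transaction IDs our ledger recorded as SUCCESS.
// pspSettledIds: transaction IDs present in the PSP's settlement file.
fun reconcile(ledgerSuccessIds: Set<String>, pspSettledIds: Set<String>): Discrepancies =
    Discrepancies(
        ghostCharges = pspSettledIds - ledgerSuccessIds,   // PSP charged, we did not record
        phantomCredits = ledgerSuccessIds - pspSettledIds, // we recorded success, PSP did not charge
    )
```

A fuller job would also compare amounts and currencies for IDs present on both sides, since a matching ID with a different amount is a discrepancy too.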

Q11 Medium How do you distinguish retriable from non-retriable payment failures?

Retriable failures are transient, meaning the payment might succeed on a second attempt: network timeout (IOException), 500 Internal Server Error, 503 Service Unavailable, 504 Gateway Timeout. These warrant exponential backoff with jitter (e.g. 1s, 2s, 4s + random 0–500ms) up to 3–5 retries. Non-retriable failures are permanent, meaning retrying will produce the same result: 402 Payment Required (insufficient funds), 422 Unprocessable Entity (invalid card number, expired card), 400 Bad Request (malformed VPA), a specific PSP decline code (card blocked, suspected fraud). Retrying these wastes time, frustrates users, and in the case of fraud-blocked cards, may trigger additional fraud flags. The SDK must parse the HTTP status code and the PSP's decline code from the response body to categorise correctly, then surface a human-readable reason to the user immediately for non-retriable failures.
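
A minimal sketch of the classification and the backoff schedule follows. It classifies on HTTP status alone and treats every 5xx as retriable, which is a simplification; FailureKind, classify, and backoffDelayMs are hypothetical names.

```kotlin
import kotlin.random.Random

enum class FailureKind { RETRIABLE, NON_RETRIABLE }

// Classifies by HTTP status only; a real SDK would also inspect the PSP decline code.
fun classify(httpStatus: Int): FailureKind =
    if (httpStatus in 500..599) FailureKind.RETRIABLE else FailureKind.NON_RETRIABLE

// Delay before retry number `attempt` (0-based): 1s, 2s, 4s, ... plus 0..500 ms of jitter.
fun backoffDelayMs(attempt: Int, random: Random = Random.Default): Long {
    require(attempt in 0..10) { "attempt out of range" }
    val base = 1000L shl attempt          // 1000 * 2^attempt
    return base + random.nextLong(0, 501) // upper bound is exclusive
}
```

The jitter spreads out retries from many clients that failed at the same moment, so a recovering gateway is not hit by a synchronised burst.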

Q12 Medium What Android API must you use to store card tokens securely, and why?

Card tokens (and any payment credentials) stored on-device must be encrypted using the Android Keystore System. The Keystore stores cryptographic key material in secure hardware (the TEE, or Trusted Execution Environment, and on devices with a dedicated secure element, StrongBox) that is isolated from the application processor. An attacker with root access or a backup of the device cannot export the key material. They can only use it through the Keystore API, which can enforce authentication requirements (biometric/PIN required before each use). The recommended cipher is AES-256-GCM (authenticated encryption, which provides both confidentiality and integrity). Plain SharedPreferences, files, or Room without encryption are not acceptable for payment credentials, because they can be read on a rooted device or extracted from an unprotected backup. EncryptedSharedPreferences and EncryptedFile from the Jetpack Security library are convenience wrappers over Keystore; check the library's current maintenance status before adopting it, since recent releases mark its APIs as deprecated in favour of using Keystore directly.
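
The AES-GCM step can be sketched with the standard javax.crypto API. Note the assumption: the key here is generated in ordinary memory so the example runs on any JVM, whereas on Android it would be created inside the "AndroidKeyStore" provider. The function names are hypothetical.

```kotlin
import java.security.SecureRandom
import javax.crypto.Cipher
import javax.crypto.KeyGenerator
import javax.crypto.SecretKey
import javax.crypto.spec.GCMParameterSpec

// Stand-in key for illustration. On Android the key would instead live in the
// "AndroidKeyStore" provider so the raw key bytes never enter app memory.
fun newAesKey(): SecretKey = KeyGenerator.getInstance("AES").apply { init(256) }.generateKey()

fun encryptToken(key: SecretKey, token: String): ByteArray {
    val iv = ByteArray(12).also { SecureRandom().nextBytes(it) } // fresh 96-bit nonce per message
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key, GCMParameterSpec(128, iv))
    return iv + cipher.doFinal(token.toByteArray(Charsets.UTF_8)) // nonce is stored with the ciphertext
}

fun decryptToken(key: SecretKey, blob: ByteArray): String {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, key, GCMParameterSpec(128, blob, 0, 12))
    // doFinal throws if the ciphertext or authentication tag was tampered with.
    return String(cipher.doFinal(blob, 12, blob.size - 12), Charsets.UTF_8)
}
```

With a Keystore-backed key, the provider normally generates the nonce itself, so the encrypt path would read it from cipher.iv after init rather than supplying one.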

Q13 Hard How does PSP failover work, and what risks must you manage?

PSP failover means routing a payment to a secondary PSP (e.g., Stripe) when the primary (e.g., Razorpay) returns 5xx or times out. The gateway maintains a health score per PSP: after N consecutive failures within a 60-second window, mark the PSP as degraded and shift traffic to the secondary. The risks: (1) Partial charges: if the primary processed the payment but returned a timeout, retrying on the secondary double-charges. Solution: before failover, check the idempotency store and query the primary's status for the timed-out attempt; if the primary has a result, return it without calling the secondary. (2) Different PSP capabilities: one PSP may support a method such as UPI while the other does not. Failover logic must be payment-method-aware. (3) Token incompatibility: a card token issued by Razorpay cannot be charged by Stripe, because tokenisation is PSP-specific. For saved cards, maintain dual tokens if multi-PSP failover is needed. PSP failover is complex and should only be attempted for high-criticality flows where PSP downtime is unacceptable.
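
The health-scoring rule can be sketched as a small failure window. The threshold of 3 failures is an assumed value for N, and PspHealth and choosePsp are hypothetical names; time is passed in explicitly so the logic is easy to test.

```kotlin
// Marks a PSP degraded after `maxFailures` consecutive failures inside `windowMs`.
class PspHealth(private val maxFailures: Int = 3, private val windowMs: Long = 60_000) {
    private val recentFailures = ArrayDeque<Long>()

    // Any success breaks the run of consecutive failures.
    fun recordSuccess() = recentFailures.clear()

    fun recordFailure(nowMs: Long) {
        recentFailures.addLast(nowMs)
    }

    fun isDegraded(nowMs: Long): Boolean {
        // Drop failures that have aged out of the window before counting.
        while (recentFailures.isNotEmpty() && nowMs - recentFailures.first() > windowMs) {
            recentFailures.removeFirst()
        }
        return recentFailures.size >= maxFailures
    }
}

// Fails over only when the primary is degraded AND the secondary supports the method.
fun choosePsp(primary: PspHealth, nowMs: Long, methodSupportedBySecondary: Boolean): String =
    if (primary.isDegraded(nowMs) && methodSupportedBySecondary) "secondary" else "primary"
```

This covers routing of new payments only. An attempt that already timed out on the primary still needs the status check described above before it is sent anywhere else.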

Q14 Medium How do you handle the 30-second UPI timeout when the user hasn't completed their PIN?

NPCI mandates a maximum 30-second window for a UPI transaction. If the user hasn't entered their PIN within 30 seconds, the transaction is auto-expired at the NPCI layer. The client-side handling: set a 30-second timer when the UPI intent is launched. If onActivityResult hasn't returned by 30 seconds and there's no valid txnRefId, update the local transaction to TIMED_OUT. Show the user: "Your UPI payment expired. Please try again." The user can initiate a new transaction with a fresh idempotencyKey. On the server, poll the timed-out transaction once more, because occasionally the UPI app returns late and the transaction did succeed within the window. If the server confirms success, update to SUCCESS immediately even though the client had already marked it timed out. The Room state update propagates to the UI via Flow.

Q15 Hard What is the RBI's CoF tokenisation mandate and how does it affect your stored card implementation?

The RBI's Card-on-File (CoF) Tokenisation mandate (effective October 2022) prohibits merchants and payment aggregators from storing raw card data. All saved cards must be tokenised through the card networks (Visa/Mastercard) or their certified TSPs (Token Service Providers). Key implementation changes: (1) You may retain only limited data for display and reconciliation, namely the last four digits and the card issuer's name; the full PAN, the expiry date, and other card credentials must not be stored. (2) Tokenisation must happen via the card network's API (network tokenisation), not just a PSP token. A PSP token is a PSP-specific alias: a Razorpay token cannot be charged through another PSP. A network token (e.g., from Visa Token Service) is issued by the card network rather than by a single PSP. (3) For existing saved cards, you must migrate from PSP tokens to network tokens or delete them. (4) The RBI mandate applies to any entity storing card data for Indian cardholders, including international PSPs serving Indian merchants.

Q16 Medium How do you implement a payment timeout with automatic cancellation?

Use Kotlin coroutines' withTimeout() wrapping the entire checkout coroutine. When the timeout fires, a TimeoutCancellationException is thrown; catch it in a catch block around the withTimeout() call (or use withTimeoutOrNull() and handle the null result) and mark the transaction TIMED_OUT in Room. The critical part is that cancellation must also notify the server. A locally-timed-out transaction may still be processing at the PSP. Silently abandoning it leaves a ghost transaction. On timeout, immediately call POST /cancel?txnId=&idempotencyKey=. If the PSP has already confirmed success, the cancel will fail: the server returns the existing SUCCESS status and the client corrects its local state. If the cancel succeeds, the transaction is voided at the PSP. The gateway's cancel endpoint must also be idempotent, so that a retry on cancel does not double-void.

Q17 Hard How do you design the fraud detection integration to minimise latency impact?

Fraud scoring must happen server-side, never client-side where it can be bypassed. Placement in the flow: the fraud engine runs after order creation but before PSP call. This means a high-risk transaction is blocked before any money moves. The fraud model inputs: device fingerprint (from the Android SDK), IP address, transaction amount, payment method, merchant category, velocity (transactions per hour for this user), card BIN data. Latency target: the fraud check must complete in <200ms to keep total checkout under 3s. To hit this: (1) Pre-compute velocity features in Redis (updated on every transaction, read in O(1)). (2) Run the ML model on a low-latency inference server (e.g., ONNX Runtime) co-located with the gateway. (3) Use async scoring for post-hoc review: for borderline risk scores, allow the transaction and flag it for manual review within 2 hours, which adds zero latency. Block only clear high-risk (score > 0.9). This tiered approach maximises fraud catch rate while maintaining acceptable false-positive rate.
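
The tiered decision can be written as a small pure function. The block threshold of 0.9 comes from the text above; the lower bound of the borderline band (0.6 here) is an assumption, and FraudDecision and decide are hypothetical names.

```kotlin
enum class FraudDecision { ALLOW, ALLOW_AND_REVIEW, BLOCK }

// blockAbove follows the text (score > 0.9); reviewAbove is an assumed lower
// bound for the "borderline" band that is allowed but queued for manual review.
fun decide(score: Double, reviewAbove: Double = 0.6, blockAbove: Double = 0.9): FraudDecision {
    require(score in 0.0..1.0) { "score must be in [0, 1]" }
    return when {
        score > blockAbove -> FraudDecision.BLOCK
        score > reviewAbove -> FraudDecision.ALLOW_AND_REVIEW
        else -> FraudDecision.ALLOW
    }
}
```

Only the BLOCK branch sits on the synchronous checkout path. ALLOW_AND_REVIEW lets the payment proceed and enqueues the review asynchronously, which is why it adds no latency.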

Q18 Medium How do you handle a refund that needs to be reflected immediately in the UI?

Refunds are initiated server-side (by merchant or customer support) and are asynchronous: the actual bank reversal takes 5–7 business days for card refunds (2 hours for UPI). The UI update chain: (1) The server initiates the refund with the PSP and immediately updates the transaction status to REFUND_INITIATED. (2) A push notification (FCM) is sent to the user's device with the refund ID. (3) The SDK receives the FCM, calls transport.pollStatus(txnId), and updates Room. (4) The UI observes the Room Flow and shows "Refund Initiated: ₹999 will be credited in 5–7 days." When the actual bank reversal completes, the PSP sends a webhook (refund.processed), the server updates to REFUNDED, sends another FCM, and the UI updates to "Refund Complete". The local transaction history in Room reflects each status change as it happens, so the user sees a live timeline.

Q19 Hard How do you test a payment SDK without making real bank transactions?

Testing strategy has four layers: (1) Unit tests with fakes: FakePaymentTransport implements the same interface as PaymentTransport and returns configurable responses (success, decline, timeout). All tests in the orchestrator and repository use fakes, so there is no network, no PSP sandbox calls, and they run in milliseconds. (2) Integration tests with PSP sandbox: each major PSP (Razorpay, Stripe) provides a sandbox environment with test card numbers that simulate specific outcomes: a commonly used test number such as 4111111111111111 succeeds, while other test cards simulate decline codes or trigger 3DS. These tests run in CI on every PR and take 30–60 seconds. (3) UPI test harness: build a test UPI app (debug-only) that responds to the same intent with configurable status, which is useful for testing the no-response edge case and the "don't trust the intent" flow. (4) Chaos testing: intercept OkHttp calls in debug builds with an interceptor that randomly drops responses, returns 500s, or delays by 5 seconds. Run the full checkout flow against the chaos interceptor to verify state recovery works correctly. Never run real transactions in automated tests.

Q20 Hard How does your checkout SDK handle a network change mid-payment (WiFi → 4G)?

Mid-payment network changes have two dangerous moments: (1) During the PSP charge request: if the TCP connection drops, the OkHttp call throws an IOException. PaymentTransport retries with the same idempotency key on the new network interface, and the idempotency store at the server prevents double-charging. The retry succeeds transparently. (2) During the UPI intent (user is in PhonePe): if the network drops while the UPI app is processing, the UPI app may show an error and onActivityResult returns with Status=FAILURE. Your parser returns UpiIntentResult.ClientFailure, but you must still poll the server because the bank debit may have succeeded before the network dropped. Register a ConnectivityManager.NetworkCallback: when network is restored, immediately re-run the status check for any transaction in PENDING_VERIFY. The correct heuristic: on any network change, immediately poll all unresolved transactions. Never rely on a client-side timeout to discover that a transaction resolved during a network outage.