Published on March 11, 2024

The root of mobile app instability isn’t the number of API requests, but how the backend’s communication patterns collide with the physical and logical constraints of the device itself.

  • Blocking the main UI thread, even for milliseconds, leads to freezes and “Application Not Responding” (ANR) errors.
  • Unmanaged concurrent data access creates unpredictable crashes, while inefficient API protocols drain user battery life.

Recommendation: Shift from merely processing requests to designing a resilient system that accounts for thread sanctity, data race prevention, and network-aware communication from the server side.

You’ve built a powerful backend, but users are reporting that the mobile app freezes, stutters, or crashes, especially when their connection is spotty or they’re performing multiple actions. The initial instinct is often to blame the frontend or simply try to “batch more requests.” While well-intentioned, this approach often overlooks the fundamental source of the problem: a disconnect between server-side logic and the harsh realities of the mobile environment.

Common advice revolves around using background threads and caching data. These are not wrong, but they are the “what,” not the “why” or the “how.” A truly robust system requires a deeper understanding of the constraints at play. It’s about respecting the sanctity of the UI thread, anticipating the chaos of concurrent data access, and even considering the energy impact of your API design on the phone’s radio. It’s an architectural mindset, not just a collection of quick fixes.

But what if the true key to stability lies in treating the mobile device not as a dumb client, but as a resource-constrained partner in a complex dance of data exchange? This perspective forces us to design APIs and data flows that are inherently resilient, efficient, and forgiving. It means building for the reality of a user entering a tunnel, not just for the ideal conditions of a perfect Wi-Fi connection.

This article will dissect the core issues that lead to instability when handling multiple requests. We will explore the mechanics of UI freezes, the dangers of data races, the tangible impact of API choices on battery life, and strategies for handling network failures gracefully. Finally, we’ll address the ever-present challenge of improving performance by refactoring legacy code without introducing new bugs.

To navigate this technical deep dive, the sections below cover each key area in turn, from diagnosing freezes to refactoring with confidence.

Why Does Your App Freeze When Loading Data?

The most common cause of a frozen app is a blocked main thread, also known as the UI thread. This single thread is responsible for drawing the user interface and responding to user input. For a smooth 60 frames-per-second experience, it has only 16 milliseconds to complete all its work for each frame. When a long-running task, like a synchronous network request or heavy data parsing, is executed on this thread, it can’t draw the next frame. The result is a stutter, or worse, a complete freeze.

A synchronous API call forces the UI thread to wait for the server’s response. If the network is slow or the server is delayed, the app becomes unresponsive. On Android, if the main thread is blocked for too long, the system intervenes. An “Application Not Responding” (ANR) dialog is triggered if input dispatching is blocked for more than 5 seconds, as the official documentation states. This is the ultimate user experience failure, often leading to uninstalls.

The solution is to ensure that no I/O operations—whether network or disk—ever run on the main thread. This is the principle of UI thread sanctity. All data loading must be delegated to background threads. This is where asynchronous programming comes in. An asynchronous call initiates a request on a background thread and provides a callback or a promise that will be executed on the main thread once the data is ready. This frees the UI thread to continue rendering the interface and responding to user taps, showing a loading spinner or a placeholder state while the data is fetched in the background.
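The pattern can be sketched in a few lines of Kotlin. This is a minimal illustration, not framework code: `fetchUserNameBlocking` is a hypothetical stand-in for a slow network call, and in a real app the callback would be posted back to the main thread via a `Handler` or `Dispatchers.Main` rather than invoked directly on the worker thread.

```kotlin
import java.util.concurrent.CountDownLatch
import kotlin.concurrent.thread

// Hypothetical blocking I/O call: stands in for a network or disk fetch.
fun fetchUserNameBlocking(id: Int): String {
    Thread.sleep(50) // simulated network latency
    return "user-$id"
}

// Run the slow fetch on a background thread and deliver the result via
// a callback, leaving the calling (UI) thread free to keep rendering.
fun loadUserAsync(id: Int, onResult: (String) -> Unit) {
    thread(isDaemon = true) {
        val name = fetchUserNameBlocking(id)
        onResult(name) // a UI framework would marshal this back to the main thread
    }
}

fun main() {
    val latch = CountDownLatch(1)
    var result = ""
    loadUserAsync(42) { name ->
        result = name
        latch.countDown()
    }
    // The "UI thread" is free here to draw a spinner; we wait only to observe the demo.
    latch.await()
    println(result)
}
```

In production you would reach for Kotlin Coroutines or Swift Concurrency instead of raw threads, but the principle is identical: the I/O happens elsewhere, and only the finished result touches the UI thread.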

This separation is non-negotiable for a stable application. The backend’s role is to provide data quickly, but the client architecture must be designed to consume that data without compromising the user’s interaction with the interface. Every developer must internalize the rule: the UI thread is for UI work only.

How to Use Local Caching to Reduce Server Costs by 40%?

Local caching is more than a performance-enhancement tool; it’s a critical component of a cost-effective and resilient backend strategy. Every time an app can serve data from a local cache instead of making a network request, you save on server load, database queries, and bandwidth costs. This effect is compounded at scale. A well-designed caching strategy can dramatically reduce the operational overhead of your infrastructure.

The core idea is to store frequently accessed, non-critical, or slow-changing data directly on the user’s device. This can range from a simple in-memory cache for the duration of a session to a persistent on-disk cache using SQLite or a similar database for data that should survive app restarts. For example, a user’s profile information, product catalogs, or configuration settings are excellent candidates for caching. The key is to define a clear cache invalidation strategy, such as a Time-to-Live (TTL), to ensure data doesn’t become stale.
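A TTL-based cache of the kind described above can be sketched as follows. This is an illustrative in-memory layer, not a specific library’s API; the injectable `now` clock exists only to make expiry testable, and a persistent tier (SQLite, Room) would sit behind it in a real app.

```kotlin
// Minimal in-memory TTL cache sketch. Entries older than ttlMillis are
// treated as stale and evicted on read, forcing a fresh fetch.
class TtlCache<K, V>(
    private val ttlMillis: Long,
    private val now: () -> Long = System::currentTimeMillis,
) {
    private data class Entry<V>(val value: V, val storedAt: Long)
    private val map = HashMap<K, Entry<V>>()

    @Synchronized
    fun put(key: K, value: V) {
        map[key] = Entry(value, now())
    }

    // Returns null on a miss or an expired entry; the caller then goes to the network.
    @Synchronized
    fun get(key: K): V? {
        val entry = map[key] ?: return null
        return if (now() - entry.storedAt > ttlMillis) {
            map.remove(key)
            null
        } else entry.value
    }
}
```

The read path is then: check the cache first, and only on a `null` result hit the API and `put` the response back, which is where the server-load and bandwidth savings come from.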

The impact can be substantial. For instance, a detailed case study of an automotive IoT platform found that implementing intelligent caching resulted in a 94% cache hit rate during peak hours, which ultimately reduced their infrastructure costs by 42%. Beyond cost, caching improves the user experience by making the app feel faster and enabling offline functionality. It also contributes to sustainability, as reduced data transfer can lead to a significant reduction in network power consumption on the device.

Implementing a multi-layer caching system—combining in-memory, disk, and server-side caches like a CDN—creates a robust data delivery pipeline. This approach ensures that the app is fast, responsive, and resilient to network issues, all while keeping a close eye on the bottom line.

The Data Race Bug That Randomly Crashes Your App

A data race is one of the most insidious bugs in a concurrent system. It occurs when two or more threads access the same shared memory location simultaneously, at least one of the accesses is a write, and there is no synchronization mechanism to protect the data. The result is unpredictable behavior. The app might work perfectly a hundred times, then crash inexplicably on the hundred-and-first attempt. These bugs are notoriously difficult to reproduce and debug.

Data races are an important class of concurrency errors where two threads erroneously access a shared memory location without appropriate synchronization.

– DataCollider Research Team, Effective Data-Race Detection for the Kernel, USENIX

Imagine a scenario: one background thread is fetching and writing a user’s profile to a shared object while another thread is reading that same object to update the UI. If the write operation is only partially complete when the read occurs, the UI thread might read a corrupted or inconsistent state, leading to a crash. Surprisingly, a significant portion of these races might not cause immediate harm; in fact, research published in ACM SIGPLAN shows that 76-90% of reported data races in some systems can be benign. However, the remaining 10-24% can be catastrophic, leading to data corruption and random crashes.

The solution lies in enforcing disciplined access to shared mutable state. This is achieved through concurrency primitives. The most common tools include:

  • Mutexes (Locks): A mutual exclusion lock ensures that only one thread can execute a critical section of code at a time.
  • Atomic Operations: For simple data types like counters or flags, atomic variables provide guaranteed “all-or-nothing” read-modify-write operations that cannot be interrupted.
  • Serial Queues/Dispatchers: By dispatching all access to a specific piece of data onto a single serial queue, you ensure that operations are executed one after another, eliminating the possibility of a race.
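The danger of an unsynchronized read-modify-write, and the atomic fix, can be demonstrated with a small sketch. The plain `var` counter below is deliberately racy (its increments can be lost when threads interleave), while the `AtomicInteger` is guaranteed to reach the expected total; a `synchronized` block around the plain increment would be the mutex-based alternative.

```kotlin
import java.util.concurrent.atomic.AtomicInteger
import kotlin.concurrent.thread

// Increment two counters from many threads: the plain Int suffers lost
// updates (a data race); the AtomicInteger does not.
fun raceDemo(threads: Int = 8, incrementsPerThread: Int = 10_000): Pair<Int, Int> {
    var unsafe = 0
    val safe = AtomicInteger(0)
    val workers = (1..threads).map {
        thread {
            repeat(incrementsPerThread) {
                unsafe++                // unsynchronized read-modify-write
                safe.incrementAndGet()  // atomic read-modify-write
            }
        }
    }
    workers.forEach { it.join() }
    return unsafe to safe.get()
}

fun main() {
    val (unsafe, safe) = raceDemo()
    println("unsafe=$unsafe safe=$safe") // unsafe is often < safe
}
```

Note that the racy counter may happen to produce the right answer on any given run, which is exactly why these bugs survive testing and surface randomly in production.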

As a backend developer, while you may not be writing the UI-side concurrent code, designing APIs that return immutable data objects or provide clear transactional boundaries can significantly help frontend developers avoid these pitfalls. The key is to never assume thread safety and to always protect shared mutable resources.

REST or GraphQL: Which Saves More Battery for the User?

The choice between REST and GraphQL is often debated in terms of developer experience or API flexibility. However, for mobile clients, this decision has a direct and measurable impact on a user’s battery life. The primary driver of this impact is the device’s network radio. The radio is one of the most power-hungry components, and the less it’s active, the better. Battery drain isn’t just about how much data is transferred, but also how many times the radio has to wake up from an idle state to a full-power state.

This is where the architectural differences between REST and GraphQL become critical. A typical RESTful approach often leads to over-fetching (getting more data than the screen needs) or under-fetching (requiring multiple requests to gather all the data for one screen). For a complex screen, an app might need to make 4-7 separate API calls to different endpoints. Each call wakes up the radio, consumes power, and adds latency. GraphQL, by contrast, allows the client to request exactly the data it needs in a single request, eliminating both over-fetching and the need for multiple round trips.

A mobile API comparison study found that GraphQL can result in 64% smaller payloads on average, but the real savings come from the reduction in network requests. This table illustrates the difference:

REST vs GraphQL Battery Impact for Mobile Applications

  Metric                      | REST API               | GraphQL                 | Impact
  Network Requests per Screen | 4-7 round trips        | 1 request               | Fewer radio wake-ups
  Payload Size                | Baseline (100%)        | 64% smaller             | Reduced data transfer
  Battery Consumption         | Baseline               | Lower (fewer requests)  | Extended battery life
  Data Transfer               | 2-3x more than needed  | Exact fields requested  | Bandwidth savings

By consolidating requests, GraphQL significantly reduces the time the network radio spends in its high-power state. This concept, known as “tail energy,” refers to the energy consumed by the radio as it waits for a short period before powering down. Fewer, larger requests are far more energy-efficient than many small, chatty requests.

While GraphQL adds complexity to the backend, for mobile-first applications where battery life is a key user concern, the benefits are compelling. It’s a strategic decision that trades some server-side complexity for a demonstrably better and more energy-efficient user experience.

How to Handle Request Failures When the User Goes Into a Tunnel?

Mobile networks are inherently unreliable. A user can lose connectivity at any moment—entering a tunnel, walking into an elevator, or simply moving through an area with poor coverage. A robust application must be designed to handle these transient failures gracefully, ensuring no data is lost and the user experience remains as seamless as possible. Simply showing an error message for every failed request is not a viable solution, especially for user-initiated actions (mutations).

The key is to build a system that can automatically retry failed requests and persist critical operations. For read operations (GET requests), a common strategy is to implement a retry mechanism with exponential backoff and jitter. Instead of retrying immediately, the client waits for a progressively longer period between attempts (e.g., 1s, 2s, 4s, 8s). Adding jitter (a small, random amount of time) to each delay prevents a “thundering herd” problem, where thousands of devices all retry at the exact same moment when connectivity is restored.
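The delay schedule described above (1s, 2s, 4s, 8s, plus randomness) can be computed with a small helper. This is a sketch under assumed defaults, not a library API; the base, cap, and jitter window are all tunable, and the attempt is capped before shifting to avoid overflow.

```kotlin
import kotlin.random.Random

// Delay before retry `attempt` (0-based): base * 2^attempt, capped at
// maxMs, plus up to jitterMs of randomness so that thousands of clients
// regaining connectivity don't all retry in lockstep.
fun backoffDelayMs(
    attempt: Int,
    baseMs: Long = 1_000,
    maxMs: Long = 30_000,
    jitterMs: Long = 1_000,
    random: Random = Random.Default,
): Long {
    val exponential = (baseMs shl attempt.coerceAtMost(20)).coerceAtMost(maxMs)
    return exponential + random.nextLong(jitterMs)
}
```

The retry loop simply sleeps for `backoffDelayMs(attempt)` between attempts and gives up (or parks the work in the offline queue) once the cap has been hit a few times.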

For write operations (POST, PUT, DELETE), the stakes are higher. If a user submits a form or makes a purchase, that action must not be lost. Here, the solution is an offline persistent queue. When a mutation is triggered, it’s first saved to a local database on the device before the network request is even attempted. A background service then works through the queue, attempting to sync each operation with the backend. If a request fails, it remains in the queue to be retried later. This ensures user actions are captured and eventually processed, even if the app is killed before connectivity returns.
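The persist-then-sync flow can be sketched like this. A real implementation would back `pending` with an on-device database (Room, CoreData) so the queue survives process death; here an in-memory deque stands in for that storage, and `send` is a hypothetical hook to the network layer returning success or failure.

```kotlin
// Offline mutation queue sketch: enqueue persists the operation before
// any network attempt; drain retries everything, and failures stay queued.
class MutationQueue(private val send: (String) -> Boolean) {
    private val pending = ArrayDeque<String>()

    @Synchronized
    fun enqueue(mutation: String) {
        pending.addLast(mutation) // save first, attempt the network later
    }

    // Try to sync every queued mutation once; returns how many succeeded.
    @Synchronized
    fun drain(): Int {
        var synced = 0
        repeat(pending.size) {
            val mutation = pending.removeFirst()
            if (send(mutation)) synced++ else pending.addLast(mutation)
        }
        return synced
    }

    @Synchronized
    fun pendingCount() = pending.size
}
```

A background service calls `drain()` whenever connectivity returns, typically paced by the backoff schedule described earlier.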

From the backend perspective, this requires designing idempotent APIs. An idempotent operation is one that can be performed multiple times with the same result as performing it once. For example, if the app retries a “create comment” request due to a network blip where it didn’t receive a success response (even if the server processed it), the backend shouldn’t create two comments. Using a unique client-generated ID for each mutation allows the server to recognize and discard duplicate requests.
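On the server, the dedupe logic reduces to keying processed results by the client-generated ID. The sketch below is illustrative (names and the in-memory store are assumptions; production code would persist the ID-to-result mapping with an expiry): a replayed ID returns the stored result instead of running the operation again.

```kotlin
// Server-side idempotency sketch: each mutation carries a client-generated
// ID; a retry with an already-seen ID gets the original result back and
// the operation is NOT executed a second time.
class IdempotentHandler(private val process: (String) -> String) {
    private val results = HashMap<String, String>()

    @Synchronized
    fun handle(clientId: String, payload: String): String =
        results.getOrPut(clientId) { process(payload) }
}
```

Combined with the client-side queue, this makes retries safe: the app can resend a "create comment" as many times as it needs to without ever producing duplicates.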

Your Action Plan: Implementing Robust Retry Mechanisms

  1. Implement exponential backoff with jitter to prevent thundering herd problem when connectivity is restored to many users simultaneously.
  2. Use background threading mechanisms (like Kotlin Coroutines or Swift Concurrency) for all retry operations to avoid blocking the main thread.
  3. Create a persistent operation queue using a local database (e.g., Room, CoreData) to store failed mutations (POST, PUT, DELETE).
  4. Ensure user actions are never lost, even if the app is killed before regaining a connection, by persisting the operation first.
  5. Design clear UI patterns for ‘sync pending’ and ‘offline mode’ indicators to communicate the app’s status to the user gracefully.

Why Is Optimizing for Snapdragon Harder Than Apple Silicon?

The challenge of optimizing for Snapdragon-powered Android devices compared to Apple’s Silicon-based iPhones stems from a fundamental difference in the ecosystem: vertical integration vs. fragmentation. Apple controls the entire stack, from the A-series or M-series chip design (the “Silicon”) to the hardware it runs on (the iPhone/iPad) and the operating system (iOS/iPadOS). This tight integration allows for holistic optimization. When Apple releases a new performance framework, they know exactly the hardware capabilities it will run on.

Qualcomm’s Snapdragon, on the other hand, is a component within a vast and fragmented ecosystem. A Snapdragon SoC (System on a Chip) is sold to dozens of different manufacturers (OEMs) like Samsung, Google, Xiaomi, and OnePlus. Each OEM then integrates that chip into their own unique hardware with different screen sizes, memory configurations, and thermal management systems. Crucially, they also add their own layer of software, including custom Android skins, drivers, and pre-installed applications.

This SoC fragmentation creates a nightmare for optimization. A piece of code that runs perfectly on a Google Pixel with a stock Android experience might trigger thermal-throttling or a driver-specific bug on a different device with the same Snapdragon chip. The performance characteristics, such as the behavior of the big.LITTLE core architecture (a mix of high-performance and high-efficiency CPU cores), can vary based on the OEM’s kernel scheduler. You aren’t just optimizing for Snapdragon; you’re optimizing for Snapdragon-on-a-Samsung, Snapdragon-on-a-Xiaomi, and so on.

As a backend developer, this impacts you because performance bottlenecks are less predictable. An API response that is parsed efficiently on an iPhone might strain the CPU or cause garbage collection pauses on a specific mid-range Android device, not because the code is bad, but because the underlying software and hardware environment is a less-controlled variable. It requires a more defensive approach to performance, assuming a wider range of device capabilities.

How to Write Tests for Code You Didn’t Write?

Inheriting a legacy codebase without a test suite is a common and daunting scenario for any developer. The fear of breaking an unknown feature paralyzes any attempt at refactoring or adding new functionality. The solution is to build a safety net of tests before making any changes, but how do you test code whose internal logic is a mystery? The answer is to focus on its external behavior, not its implementation.

The most powerful technique for this is characterization testing, also known as Golden Master testing. The process is straightforward:

  1. Identify a piece of code you want to test (a function, a class, an API endpoint).
  2. Feed it a wide range of valid, invalid, and edge-case inputs.
  3. Capture the output for each input without judging it. This captured output, however strange it may seem, is your “Golden Master.” It represents the current, observable behavior of the system.
  4. Write tests that assert that for a given input, the code produces the exact output stored in your Golden Master.

At this point, your tests will all pass. They don’t prove the code is *correct*, but they do prove it’s *unchanged*. Now you have a safety net. If you make a change that alters the behavior, a test will fail. This gives you the confidence to start refactoring. You can now clean up the code, and as long as the tests continue to pass, you know you haven’t broken existing functionality.
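A minimal characterization harness looks like this. Everything here is hypothetical for illustration: `legacyFormat` stands in for an inherited function whose logic you don't yet understand, and the `goldenMaster` map holds outputs you recorded once from the real system, warts and all.

```kotlin
// Hypothetical legacy function: formats a price in pence. We don't judge
// its behaviour (e.g. "N/A" for negatives); we only pin it down.
fun legacyFormat(pence: Int): String =
    if (pence < 0) "N/A" else "£" + pence / 100 + "." + "%02d".format(pence % 100)

// The "Golden Master": inputs and the outputs the system produced when
// we captured them. These assert "unchanged", not "correct".
val goldenMaster = mapOf(
    0 to "£0.00",
    1999 to "£19.99",
    -5 to "N/A",
)

fun behaviourUnchanged(): Boolean =
    goldenMaster.all { (input, expected) -> legacyFormat(input) == expected }
```

Any refactor that keeps `behaviourUnchanged()` true has preserved the observable behaviour; any refactor that flips it to false has changed something a user might notice.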

When dealing with a large, tangled system, it’s often best to start with high-level integration or end-to-end tests. These tests treat large parts of the application as a black box and verify the overall workflows (e.g., “given a user logs in and navigates to the profile screen, their name should be displayed”). These broad tests provide the most value initially, as they cover more ground and protect critical user paths. You are not trying to understand the code; you are trying to preserve its behavior.

Key takeaways

  • Never block the UI thread with network or I/O operations; always use asynchronous patterns to maintain app responsiveness.
  • A multi-layer caching strategy is essential for reducing server costs, improving perceived performance, and enabling offline functionality.
  • Preventing data races with concurrency primitives (locks, atomics) is critical to stopping random, unpredictable crashes.

How to Refactor Legacy Mobile Code Without Breaking Features?

Refactoring legacy code is like performing surgery: the goal is to improve the health of the system without harming the patient. The most important principle is to work incrementally and with a safety net. Before changing a single line of code, you must have the characterization tests described in the previous section in place. These tests are your feedback loop, confirming that your changes haven’t altered the system’s external behavior.

One of the most effective and safest strategies for large-scale refactoring is the Strangler Fig Pattern. Instead of a “big bang” rewrite, you gradually create a new, clean system that grows around the edges of the old one. For a mobile app, this could mean creating a new networking layer. New feature code would call the new layer exclusively. Then, you’d identify one small piece of the old app that makes a network call, and you’d reroute that call through your new layer. You repeat this process, piece by piece, “strangling” the old code until it can be safely removed.

To do this, you need to find or create “seams” in the code. A seam is a place where you can alter behavior without editing in place. A dependency injection framework is a powerful tool for creating seams, allowing you to swap an old implementation with a new one through configuration. Even without a framework, you can often introduce a seam by extracting a piece of logic into a new function or class that can then be replaced.

Throughout this process, feature flags are your best friend. Wrap your new, refactored code paths in a feature flag. This allows you to deploy the refactored code to production in a disabled state. You can then enable it for a small percentage of users, monitor for errors, and quickly disable it if problems arise. This de-risks the deployment and separates the act of deploying code from the act of releasing it to users.
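A seam plus a feature flag can be as simple as an interface with two implementations and a switch, sketched below. The names are illustrative, not a specific DI or flag framework's API; in practice the flag value would come from a remote configuration service so it can be flipped without a release.

```kotlin
// The seam: old and new networking layers sit behind one interface,
// so callers never know (or care) which implementation they received.
interface ApiClient {
    fun fetch(path: String): String
}

class LegacyApiClient : ApiClient {
    override fun fetch(path: String) = "legacy:$path"
}

class NewApiClient : ApiClient {
    override fun fetch(path: String) = "new:$path"
}

// The feature flag chooses the implementation at wiring time, letting the
// refactored layer ship disabled and roll out to a small cohort first.
fun apiClient(useNewNetworkLayer: Boolean): ApiClient =
    if (useNewNetworkLayer) NewApiClient() else LegacyApiClient()
```

Because the flag decides at a single wiring point, disabling the new path when a problem surfaces is a configuration change, not a rollback deployment.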

Now that you have a comprehensive understanding of the pitfalls and solutions, the next logical step is to apply this knowledge. Begin by identifying the most critical bottleneck in your own application—is it UI freezes, excessive network requests, or a fragile legacy module?—and apply the corresponding strategy to build a more stable and performant experience for your users.

Written by Sarah Jenkins. Sarah Jenkins is a Lead Mobile Architect with 12 years of experience building scalable applications for London's FinTech sector. She holds a Master's in Computer Science from Imperial College London and is a certified AWS Solutions Architect. Her expertise lies in optimizing Swift and Kotlin codebases for performance and battery efficiency.