Designing always-on apps that don't crash when the Internet connection fails

Developers generally assume always-on connectivity and design their software with that premise. But such applications crash when they can't reach the Internet or have poky connections. If you want happy users, design applications that can function remotely.

"Someone is waiting just for you / Spinnin’ wheel, spinnin’ true."

Those lyrics to a 1969 song by Blood, Sweat & Tears could also describe 2017 enterprise apps that time out or fail because of dropped or poor connectivity. Wheels spin. Data is lost. Applications crash. Users are frustrated. Devices are thrown. Screens are smashed.

It doesn’t have to be that way.

Always-on applications can continue to function even when the user loses an Internet or Wi-Fi connection. With proper design and testing, you won’t have to handle as many smartphone accidental-damage insurance claims.

Let’s start with the fundamentals. Many business applications are friendly front ends to remote services. The software may run on phones, tablets, or laptops, and the services may be in the cloud or in the on-premises data center.

When connectivity is strong, with sufficient bandwidth and low latency, the front-end software works fine, and the user experience is excellent. Data sent to the back end is received and confirmed, and data served to the user front end is transmitted without delay. Joy!

When connectivity is non-existent or fails intermittently, when bandwidth is limited, and when there’s too much latency—which you can read as “Did the Internet connection go down again?!”—users immediately feel frustration. That’s bad news for the user experience, and also extremely bad in terms of saving and processing transactions. A user who taps a drop-down menu or presses “Enter” and sees nothing happen might progress to multiple mouse clicks, a force-reset of the application, or a reboot of the device, any of which could result in data loss. Submitted forms and uploads could be lost in a time-out. Sessions could halt. In some cases, the app could freeze (with or without a spinning indicator) or crash outright. Disaster!

Plentiful processes

The last thing anyone wants (particularly the end user) is for an application to crash or time out because the user interface is waiting for data that simply isn’t coming or because the data is being sucked one bit at a time through an overloaded Wi-Fi straw.

When designing online, offline, and intermediate-line (hey, did I coin a term?) applications, developers (and those whom they serve) should think in terms of separate processes to handle different aspects of the workflow. That’s important no matter the type of application, but especially so when there’s a separation between client and server (which may be a tablet and a SaaS application).

Broadly speaking, you need:

  • A process that maintains the user experience
  • A process to manage communications with the remote back end; that may be real-time data transmission or behind-the-scenes syncing
  • A process that manages the cache itself
  • A process that monitors the state of the communications link, going beyond connected/disconnected to discern the quality and reliability of the connection; a supervisory process then makes decisions based on those metrics, the type of application, and the current workload

Consider every possible bandwidth situation and how the application should respond in each circumstance. Realize that the raw bandwidth reported by the last-mile wired or wireless link is by no means the end-to-end throughput to the back-end processor. Because throughput changes constantly, it’s not sufficient to make a bandwidth determination when the application is launched or at the beginning of a transaction. That’s why a separate connection-quality process is so essential.
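What might such a connection-quality process track? Here is a minimal Python sketch, not a production design. It keeps a rolling window of end-to-end probe results and classifies the link as good, degraded, or offline. The class name LinkMonitor, the window size, and every threshold are illustrative assumptions; real values depend on the application and its workload.

```python
from collections import deque
from statistics import mean


class LinkMonitor:
    """Rolling view of end-to-end link quality (hypothetical helper)."""

    def __init__(self, window=20):
        # Each sample is (latency_seconds, bytes_per_second),
        # or None when a probe to the back end failed outright.
        self.samples = deque(maxlen=window)

    def record(self, latency_s, bytes_per_s):
        self.samples.append((latency_s, bytes_per_s))

    def record_failure(self):
        self.samples.append(None)

    def state(self):
        if not self.samples:
            return "unknown"
        ok = [s for s in self.samples if s is not None]
        failure_rate = 1 - len(ok) / len(self.samples)
        if not ok or failure_rate > 0.5:
            return "offline"
        avg_latency = mean(s[0] for s in ok)
        avg_throughput = mean(s[1] for s in ok)
        # Thresholds below are assumptions chosen for illustration only.
        if failure_rate > 0.1 or avg_latency > 1.0 or avg_throughput < 50_000:
            return "degraded"
        return "good"
```

Because the window keeps sliding, the answer reflects recent conditions instead of a one-time check at launch. A supervisory process could poll state() and, for example, switch from live transmission to background syncing when the link degrades.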


The high-level application architecture dictates how to design and code the various processes and how to communicate with the user. Karim Yaghmour, CEO of training and development firm Opersys, who previously ran a file syncing/sharing startup, says that keeping everything in sync is one of the biggest challenges. “We built everything around a list of events. There was the central authoritative server that had the ‘reference’ view of the system. Any offline client had to sync with that reference view when they got back online, i.e., get the list of events that happened since the last log-in.”

The mobile client had to keep track of its changes and send those changes to the server; only when the server accepted (and acknowledged) the changes were those transactions considered completed by the client, says Yaghmour. “Until that time, the client side would inform the user that whatever they saw was not in sync with the back end. The client could still continue to work, but all changes were local.” That offline work is stored in a cache.
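To make the event-list idea concrete, here is a small Python sketch of the general pattern. It is not Yaghmour’s actual code. The Server and Client classes and their method names are hypothetical, the server is an in-memory stand-in, and conflict resolution is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """In-memory stand-in for the central authoritative server."""
    events: list = field(default_factory=list)

    def events_since(self, seq):
        # Everything that happened after the client's last known position.
        return self.events[seq:]

    def submit(self, change):
        self.events.append(change)
        return len(self.events)  # the acknowledgement: a new sequence number


@dataclass
class Client:
    view: list = field(default_factory=list)     # events confirmed by the server
    pending: list = field(default_factory=list)  # local-only changes

    def edit(self, change):
        # Work continues offline; the change is recorded locally.
        self.pending.append(change)

    @property
    def in_sync(self):
        # Drives the "not in sync with the back end" notice in the UI.
        return not self.pending

    def sync(self, server):
        # A change leaves the pending list only after the server accepts it.
        while self.pending:
            server.submit(self.pending[0])
            self.pending.pop(0)
        # Then catch up on every event since the last sync.
        self.view.extend(server.events_since(len(self.view)))
```

If submit() raises partway through because the link dropped, the unacknowledged change stays in pending and the user keeps seeing the out-of-sync notice.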

Copious caching

Caching is essential for any type of application that requires back-end communication. Imagine a corporate application for video, voice, and text chat. The best option for a cache for a real-time video conference is likely a buffer and buffering logic that’s robust enough to smooth out jitter, such as by dropping video frames while maintaining high-quality audio, when required.

However, the application design should also include caching for, say, the tool’s contact database. That might be a local datastore that’s persistent between sessions and synced when there’s sufficient connectivity. Having the user navigate a local datastore of contacts (or at least of frequent contacts) would offer a far better user experience than making the user slowly and painfully navigate an entirely cloud-hosted enterprise contact database over a 3G connection.

“Make sure minimal amount of data is coming back (low bandwidth) and keep a copy of everything necessary (offline),” says Arthur Hicken, evangelist at software test-tools company Parasoft. “It’s a constant war. What you do for low bandwidth is pretty much the opposite of what you do for a lost connection, unless you have the luxury of preloading all the data with the app install and then just syncing the delta as available and critical changes on demand.”

How much cache do you need? Lots. Make sure there’s enough to allow for the employee to work offline (or intermediate-line) as much as makes sense. If it’s a form-filling application, for example, the user needs to save (and thus cache) lots of data input. Do you need enough cache to allow the employee to fill in forms during a six-hour cross-country flight? Maybe, maybe not. That’s a requirements question. However, you don’t want the application to freeze (and lose work) if the connection drops or slows down while a nurse goes from Screen 6 to Screen 7 when taking a patient’s medical history.

Also be sure to intelligently cache input that is generated while the user is waiting. Where do those extra mouse clicks or keystrokes go? Do repeated pressings of the “Esc” key lead to the application quitting or backing out from valid data-entry screens? Think hard about consequences.

A paramount rule: Never lose data entered by the end user, whether it’s text or graphics. If it’s entered, even if it’s going to be transmitted immediately to the back end, cache the data first, and then transmit it to the back end from the cache.
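One way to honor that rule is an “outbox”: write every entry to durable local storage first, and delete it only after the back end has taken it. The Python sketch below uses the standard sqlite3 module. The Outbox class, the table layout, and the send callable are assumptions made for illustration; send stands in for whatever actually transmits data to the server.

```python
import sqlite3


class Outbox:
    """Cache-first store: save locally, then transmit from the cache."""

    def __init__(self, path=":memory:"):
        # Use a file path in a real app so entries survive a restart.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self.db.commit()

    def save(self, payload):
        # Committed before any network activity is attempted.
        with self.db:
            self.db.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))

    def pending(self):
        rows = self.db.execute("SELECT payload FROM outbox ORDER BY id")
        return [row[0] for row in rows]

    def flush(self, send):
        """Try to send entries in order; stop at the first network failure."""
        rows = self.db.execute("SELECT id, payload FROM outbox ORDER BY id").fetchall()
        sent = 0
        for row_id, payload in rows:
            try:
                send(payload)
            except OSError:  # includes ConnectionError and timeouts
                break
            with self.db:
                self.db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            sent += 1
        return sent
```

An entry is removed only after send() returns, so a dropped connection leaves it queued for the next attempt. The trade-off is that an entry might be sent twice if the app dies between sending and deleting, so the back end should tolerate duplicates.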

Danny Goodman, who develops mobile apps for Android and iOS, explains, “I tend to design from the non-connected world upward, trying to implement as much as possible without connectivity and then add connectivity-heavy features from there.”

Goodman’s iOS apps include a section for “News and Tips,” with infrequently updated info in HTML, such as new-version release notes, links to third-party supplemental data, and power-user tips.

“Each time the app downloads the latest document version, it saves it locally on the device, along with the document date from the download header,” he says. “Because I control the files on my server (as opposed to being composed by a database), my apps first perform a header check to see if the file date has changed since the last time it was downloaded, thus minimizing data transfer if there are no changes.”

If there is no connectivity, the app displays the most recently downloaded text. Says Goodman: “When downloaded data is the star of the app, I still cache each download (either as received JSON data or plugged into iOS Core Data) so that at next launch, if there is no connectivity, at least the most recently downloaded data appears in the app.”
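The approach Goodman describes, a header check followed by a fallback to the last saved copy, can be sketched in a few lines of Python. This is an illustration of the pattern, not his code. DocumentCache is a hypothetical class, and head and get are assumed callables that wrap the app’s real HTTP layer and raise OSError when the network is unreachable.

```python
class DocumentCache:
    """Download only when the file date changes; serve the copy when offline."""

    def __init__(self, head, get):
        self.head = head  # callable(url) -> value of the document's date header
        self.get = get    # callable(url) -> document body
        self.store = {}   # url -> (date, body); a real app would persist this

    def load(self, url):
        cached = self.store.get(url)
        try:
            modified = self.head(url)
            if cached is None or cached[0] != modified:
                # New or changed document: download it and remember its date.
                cached = (modified, self.get(url))
                self.store[url] = cached
        except OSError:
            # No connectivity: fall through to whatever was saved last.
            pass
        return cached[1] if cached else None
```

load() returns None only when the document has never been downloaded and the network is down, which is the one case the user interface has to explain.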

Tenacious testing

Design is good, but testing is where the user experience hits the road. Test early, test often, and don’t expect that the developers’ own experiences cover all possible end-user scenarios. Otherwise, you’ll hear screams and fix lots of broken phone screens. Also, gather lots of metrics from end-user sessions.

According to Kent Beck, a programmer at Facebook and creator of the Extreme Programming methodology: “This is a big issue for us at Facebook, especially in emerging markets. My only general advice is to measure the actual latency, throughput, and variance of your actual users. That’s been an eye-opener for us, and the answers change over time. Also, have the version of the app that the developers use every day simulate that networking environment. Forcing yourself to use the service as others use it puts skin in the game.”

Goodman agrees: “It’s vital to test your app thoroughly with all connectivity turned off so you can see how it behaves. Then put yourself in the shoes of a user, and handle that scenario gracefully and intelligently.”
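Beck’s suggestion that developer builds simulate real-world networking can start with something as small as a wrapper around the app’s transport function. The Python sketch below is one assumed way to do it: DegradedTransport is a hypothetical name, the latency, jitter, and drop-rate numbers are placeholders, and in practice they should come from measurements of actual users.

```python
import random
import time


class DegradedTransport:
    """Wraps a send function to add delay and random drops (dev builds only)."""

    def __init__(self, send, latency_s=0.0, jitter_s=0.0, drop_rate=0.0,
                 rng=None, sleep=time.sleep):
        self.send = send
        self.latency_s = latency_s
        self.jitter_s = jitter_s
        self.drop_rate = drop_rate
        self.rng = rng or random.Random()
        self.sleep = sleep  # injectable so tests need not actually wait

    def __call__(self, payload):
        # Delay every call by the base latency plus a random jitter.
        self.sleep(self.latency_s + self.rng.uniform(0, self.jitter_s))
        # Fail a configurable fraction of calls, as a flaky link would.
        if self.rng.random() < self.drop_rate:
            raise ConnectionError("simulated dropped connection")
        return self.send(payload)
```

Setting drop_rate to 1.0 gives the “all connectivity turned off” test that Goodman recommends, without touching the device’s radio settings.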

Disaster avoidance

Nobody wants to see the spinning wheel, especially when they’re trying to get a job done. That goes double when the user is interacting with a customer or has entered data that might be irrecoverable.

There truly is no legitimate excuse for applications that lose data, freeze, or crash when the state of the communications link changes. Consider the use cases and scenarios. Model and role-play. Design assuming bad connections. Cache everything. And test, test, test. That’s the best way to disaster-proof applications, whether they connect to the cloud or to the on-prem data center.

Always-on apps: Lessons for leaders

  • Plan for the best way to gracefully handle both offline and low/intermittent connections.
  • Don’t assume that, because the local device has a strong connection, you have a good end-to-end connection.
  • Cache both incoming and outgoing data to improve offline/intermittent functionality.
  • Test caching and synchronization under a wide variety of conditions.