[This post is by Brad Fitzpatrick, an Android Software Engineer who worries unreasonably about responsiveness. —Tim Bray]
Back Story
One great thing about Google is “20% time”: spending 20% of your time working on projects outside your main focus area. When I joined Google, I bounced all over the place, often joking that I had seven 20% projects. One project I kept coming back to was Android. I loved its open nature, giving me access to do whatever I wanted, including opening my garage door when I approached my house on my motorcycle. I really wanted it to succeed but I worried about one thing: It wasn’t always super smooth. Animations would sometimes stutter and UI elements weren’t always immediately responsive to input. It was pretty obvious that things were sometimes happening on the wrong thread.
As a heavy SMS user, one of my 20% projects during the Cupcake (Android 1.5) release was speeding up the Messaging app and making it feel smoother. I got the app to a happy state and then continued bouncing between other 20% projects. When the Donut (Android 1.6) release came out, I noticed that a few of my Messaging optimizations had been accidentally broken. I was sad for a bit but then I realized what Android really needed was always-on, built-in, pervasive performance monitoring.
I joined the Android team full-time just over a year ago and spent a lot of time investigating Froyo performance issues, in particular debugging ANRs (those annoying dialogs you get when an application stalls its main thread’s Looper). Debugging ANRs with the tools at hand was painful and boring. There wasn’t enough instrumentation to find the causes, especially when multiple processes were involved (doing Binder or ContentResolver operations to Services or ContentProviders in other processes). There had to be a better way to track down latency hiccups and ANRs...
Enter StrictMode
“I see you were doing 120 ms in a 16 ms zone...”
StrictMode is a new API in Gingerbread which primarily lets you set a policy on a thread declaring what you’re not allowed to do on that thread, and what the penalty is if you violate the policy. Implementation-wise, this policy is simply a thread-local integer bitmask.
By default everything is allowed and it won’t get in your way unless you want it to. The flags you can enable in the thread policy include detecting disk reads, disk writes, and network usage; the penalties you can attach to violations include logging them, showing an annoying dialog, writing a report to the DropBox service, or crashing the process.
In addition, StrictMode has about a dozen hooks around most of the places that hit the disk (in java.io.*, android.database.sqlite.*, etc.) and network (java.net.*) which check the current thread’s policy, reacting as you’ve asked.
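For example, here’s a minimal sketch of turning a thread policy on early in your app. The MyApplication class name is just a placeholder for wherever you do early initialization, and you’d normally guard this so it only runs in your development builds:

```java
import android.app.Application;
import android.os.StrictMode;

public class MyApplication extends Application {
    @Override
    public void onCreate() {
        super.onCreate();
        // Build a policy for the main thread: flag disk and network access,
        // and log any violations (with stack traces) to logcat.
        StrictMode.setThreadPolicy(new StrictMode.ThreadPolicy.Builder()
                .detectDiskReads()
                .detectDiskWrites()
                .detectNetwork()
                .penaltyLog()
                .build());
    }
}
```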
The really powerful part of StrictMode is that per-thread policies are propagated whenever Binder IPC calls are made to other Services or Providers, and stack traces are stitched together across any number of processes.
Nobody wants to be slow
You might know all the places where your app does disk I/O, but do you know all the places where the system services and providers do? I don’t. I’m learning, but it’s a lot of code. We’re continually working to clarify performance implications in the SDK docs, but I usually rely on StrictMode to help catch calls that inadvertently hit the disk.
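Once StrictMode flags something, you can either move the work to a background thread or, if you decide a particular access really is acceptable on the main thread, temporarily relax the policy around just that call. A minimal sketch, where readTinyConfigFile() is a hypothetical method of your own:

```java
import android.os.StrictMode;

// Somewhere on your main thread. readTinyConfigFile() stands in for a
// small, intentional disk read you've decided to tolerate.
void loadConfig() {
    StrictMode.ThreadPolicy oldPolicy = StrictMode.allowThreadDiskReads();
    try {
        readTinyConfigFile();
    } finally {
        StrictMode.setThreadPolicy(oldPolicy);  // restore the previous policy
    }
}
```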
Background on disks on phones
Wait, what’s wrong with hitting the disk? Android devices are all running flash memory, right? That’s like a super-fast SSD with no moving parts? I shouldn’t have to care? Unfortunately, you do.
You can’t depend on the flash components or filesystems used in most Android devices to be consistently fast. The YAFFS filesystem used on many Android devices, for instance, has a global lock around all its operations. Only one disk operation can be in-flight across the entire device. Even a simple “stat” operation can take quite a while if you are unlucky. Other devices with more traditional block device-based filesystems still occasionally suffer when the block rotation layer decides to garbage collect and do some slow internal flash erase operations. (For some good geeky background reading, see lwn.net/Articles/353411)
The take-away is that the “disk” (or filesystem) on mobile devices is usually fast, but the 90th percentile latencies are often quite poor. Also, most filesystems slow down quite a bit as they get more full. (See slides from Google I/O Zippy Android apps talk, linked off code.google.com/p/zippy-android)
The “main” Thread
Android callbacks and lifecycle events all typically happen on the main thread (aka “UI thread”). This makes life easier most of the time, but it’s also something you need to be careful of, because all animations, scrolls, and flings are processed via callbacks on the main thread.
If you want to run an animation at 60 fps, each frame has a budget of roughly 16 ms (1000 ms / 60 frames). When an input event comes in (also on the main thread), you have those same 16 ms to run your code reacting to it. If you take longer than 16 ms, perhaps by writing to disk, you’ve now stuttered your animation. Disk reads are usually faster than writes, but they can also take longer than 16 ms, especially on YAFFS if you’re waiting for the filesystem lock that’s held by a process in the middle of a write.
The network is especially slow and inconsistent, so you should never do network requests on your main thread. In fact, in the upcoming Honeycomb release we’ve made network requests on the main thread a fatal error, unless your app is targeting an API version before Honeycomb. So if you want to get ready for the Honeycomb SDK, make sure you’re never doing network requests on your UI thread. (see “Tips on being smooth” below.)
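As one way to get there, here’s a minimal sketch of moving a network fetch off the main thread with AsyncTask; fetchFromNetwork() and showResult() are hypothetical placeholders for your own code:

```java
import android.os.AsyncTask;

// An inner class of your Activity. doInBackground() runs on a background
// thread; onPostExecute() runs back on the main thread with its result.
private class DownloadTask extends AsyncTask<String, Void, String> {
    @Override
    protected String doInBackground(String... urls) {
        // Background thread: blocking network I/O is fine here.
        return fetchFromNetwork(urls[0]);
    }

    @Override
    protected void onPostExecute(String result) {
        // Main thread: safe to touch your views here.
        showResult(result);
    }
}
```

Then kick it off from the main thread with something like new DownloadTask().execute("http://example.com/data").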