To set the stage, Canabalt is a game that tries to run at sixty updates, or frames, per second (60Hz). That's the fastest rate available on most displays, so it's a common target for driving, shooting, sports, or reflex games like Canabalt. A 60Hz update rate gives us 16.666ms per frame (1 second / 60 = 0.01666s = 16.666ms) to do everything related to running the game: gather input, simulate physics, prepare the scene for rendering, play sounds, and a bunch of other things.
Not long after adding sound effect playback via the SoundPool class, we started to notice the game running kind of rough. The game would visibly stutter on occasion and miss input events, meaning the main character wouldn't jump when you asked him to. Not fun, and certainly not shippable.
The culprit was the call to SoundPool.play(). Seen here in an image from a bare-bones test application that I wrote while diagnosing this problem, the game thread was spending anywhere from 1 to 9ms(!) just waiting to start playing back a sound. Not pictured are some instances of sounds taking more than 12ms to start (rare, but it happens). That's not only quite volatile, but also way, way over budget for a single call in a 16.666ms frame!
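The frame budget and this kind of measurement can be sketched in a few lines of plain Java. This is only an illustration, not the test application mentioned above: `CallTimer`, `timeNanos`, and `frameBudgetMs` are made-up names, and on a device the `Runnable` would wrap the call to SoundPool.play().

```java
/** Hypothetical helper for timing a single blocking call against the frame budget. */
final class CallTimer {
    private CallTimer() {}

    /** Frame budget in milliseconds for an update rate in Hz (60Hz gives about 16.67ms). */
    static double frameBudgetMs(double hz) {
        return 1000.0 / hz;
    }

    /** Runs the task once on the calling thread and returns the elapsed time in nanoseconds. */
    static long timeNanos(Runnable task) {
        long start = System.nanoTime(); // monotonic clock, suitable for measuring intervals
        task.run();
        return System.nanoTime() - start;
    }

    /** Converts an elapsed time in nanoseconds to the fraction of one frame it consumed. */
    static double fractionOfFrame(long elapsedNanos, double hz) {
        return (elapsedNanos / 1000000.0) / frameBudgetMs(hz);
    }
}
```

By this arithmetic, a 9ms stall (9,000,000ns) at 60Hz eats roughly 54% of the frame before any game logic has run.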
To narrow down what was going on, I tried changing every variable that I could think of, one at a time:
* SoundPool is built on top of AudioTrack, which is the lowest-level sound API available to apps written in Java. So, the first thing I tried was using AudioTrack directly. Operating in static mode, any combination of AudioTrack calls that would reliably play sound effects properly netted basically the same delay. Streaming mode clocked in at several hundred milliseconds per fill of the streaming buffer, so that too was out.
* The sound files themselves are decoded to PCM by SoundPool when loaded (AudioTrack expects you to supply it with PCM yourself), so as expected there wasn't any performance difference between using OGG and MP3 files. Other things that turned out to not have any discernible impact: sound file sampling rate, sample bit precision, and sound file length.
* Finally, every device we have here (every major manufacturer, most non-major manufacturers, ARM CPUs, Intel CPUs, every Android version from 2.2 - 4.0) turned in similar performance, with one notable exception. The gaming-centric Sony XPERIA Play never blocked the game for longer than 2ms.
Thiago Rosa has found evidence of the sound system shutting down to save power when nothing is playing. Starting a sound would take a long time because everything would need to start back up. Unfortunately, the suggested solution of constantly playing a muted looping sound didn't alter the numbers I was seeing.
Experimenting with AudioTrack is what led to our shipping implementation. It seems that starting a static AudioTrack is nearly instant. Stopping a static AudioTrack so that you can tell it to refresh and play its contents again (even if the playback was long completed) is what eats up all of the time. My best guess is that all of that time is spent idle, waiting for something between the deeper parts of Android and the hardware to grant access to the AudioTrack.
We ended up moving audio playback onto a different thread from the game logic. Instead of stalling the game while waiting to play a sound, the game simply (and instantly) adds to a buffer the id of a sound effect it wants to play. Each frame, the buffer is handed over to the sound thread where all of the calls to SoundPool.play() happen.
This basically means that the game is free to run whenever the sound thread is stuck waiting.
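The hand-off described above might look roughly like the following sketch. It is not the shipping Canabalt code: `SoundDispatcher`, `SoundPlayer`, and every method name here are hypothetical, and the sketch assumes that on Android the `SoundPlayer` implementation would simply wrap SoundPool.play().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical stand-in for the slow call; on Android it would wrap SoundPool.play(). */
interface SoundPlayer {
    void play(int soundId);
}

final class SoundDispatcher {
    // Sentinel batch that tells the worker to exit (compared by identity, never played).
    private static final List<Integer> STOP = new ArrayList<Integer>();

    private final BlockingQueue<List<Integer>> handoff =
            new LinkedBlockingQueue<List<Integer>>();
    private final Thread worker;
    // Touched only by the game thread, so adding to it needs no locking.
    private List<Integer> pending = new ArrayList<Integer>();

    SoundDispatcher(final SoundPlayer player) {
        worker = new Thread(new Runnable() {
            @Override public void run() {
                try {
                    while (true) {
                        List<Integer> batch = handoff.take(); // sleeps until a batch arrives
                        if (batch == STOP) return;
                        for (int id : batch) {
                            player.play(id); // may stall, but only this thread waits
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "sound");
        worker.setDaemon(true);
        worker.start();
    }

    /** Game thread: request a sound. Only appends to a list, so it returns immediately. */
    void request(int soundId) {
        pending.add(soundId);
    }

    /** Game thread: call once per frame to hand this frame's requests to the sound thread. */
    void endFrame() {
        if (pending.isEmpty()) return;
        handoff.add(pending);
        pending = new ArrayList<Integer>(); // the old list now belongs to the worker
    }

    /** Lets the worker finish batches already handed off, then stops it. */
    void shutdown() {
        handoff.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The game loop would call `request(id)` wherever it used to call play directly, and `endFrame()` once at the end of each update. Handing over a whole list per frame, rather than one id at a time, keeps the cross-thread traffic to a single queue operation per frame.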
In the best-case scenario, single-core devices are able to do work during time that was previously spent waiting for the audio system. Multi-core devices may even have the game and sound threads on separate cores, which means the game would be completely isolated from all of the sound effect speed bumps.
The worst case would be if the audio system is actually working hard and not idling while waiting to start a sound. In that case, there is no time to be gained back on a single-core CPU, but we still do have the benefit of not blocking the game thread for extended periods when several sounds start simultaneously.
The downsides to this approach are increased code complexity and an additional game frame (16ms) of latency between the request for a sound to start and when the sound actually starts.
It does bother me a bit that this was ultimately based on an educated guess, but I can say that threading our sound effect playback did restore the missing responsiveness to the game. I would love to have a conversation with someone who really knows the Android audio system to help get to the bottom of it.
I can't say whether our experience would have been any better if we were able to use the NDK. A brief web search indicates that the OpenSL ES implementation there suffers from many of the same limitations as AudioTrack/SoundPool in Java, such as 100ms of latency. It would be an interesting point of research to see if starting and stopping sounds there had the same performance penalty.
A future refinement may be to add a priority value to requests in the sound playback queue. This would be so that important sounds (dialog, impact noises, etc.) aren't delayed by a backlog of unimportant ones (ambient sounds) if SoundPool is really running behind. It's not really an issue for Canabalt, but I can see how it would be handy to have.
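One way such a priority value could work is sketched below, purely as an illustration of the idea. `SoundRequest` and its fields are hypothetical, the "higher number plays first" convention is an assumption, and the sketch relies on the standard `java.util.concurrent.PriorityBlockingQueue` rather than anything Canabalt actually uses.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical queue entry: a sound id plus a priority, ordered for a priority queue. */
final class SoundRequest implements Comparable<SoundRequest> {
    // Global counter so requests with equal priority keep their arrival order.
    private static final AtomicLong NEXT_SEQ = new AtomicLong();

    final int soundId;
    final int priority; // assumed convention: higher value is played sooner
    final long seq;

    SoundRequest(int soundId, int priority) {
        this.soundId = soundId;
        this.priority = priority;
        this.seq = NEXT_SEQ.getAndIncrement();
    }

    @Override public int compareTo(SoundRequest other) {
        if (priority != other.priority) {
            // Reversed on purpose: the queue hands out its "smallest" element first.
            return Integer.compare(other.priority, priority);
        }
        return Long.compare(seq, other.seq); // first in, first out within a priority
    }
}
```

With requests stored in a `PriorityBlockingQueue<SoundRequest>`, the sound thread's `take()` would return an impact sound queued at priority 10 ahead of ambient sounds queued earlier at priority 0, while sounds of equal priority would still play in the order they were requested.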