Many streaming producers are increasing the number of mobile and over-the-top (OTT) platforms that they support while implementing adaptive streaming to enhance the viewing experience on each. There are two ways to accomplish this: produce a unique set of streams for each target, or derive one smaller group of files that will effectively serve all platforms. In this How-To article, I’ll explore the latter approach.
Identify Your Targets
The first step is to identify the platforms that you’re targeting. For the vast majority of producers, this means both desktop and mobile, with OTT platforms increasingly common. While adding platforms may seem daunting at first, keep these two comforting facts in mind. First, all relevant target platforms play H.264 video. Second, they all use one of three adaptive streaming technologies: Flash Dynamic Streaming (either RTMP or HTTP), Smooth Streaming, or HTTP Live Streaming (HLS). You can see this in Table 1.
The single file max column shows the maximum supported specs for each platform. I didn’t include computers because, while playback capability varies by platform, there is no hard limitation, and most of the installed base of computers can play at least 720p video. The playback capability on all mobile platforms is device-dependent, and you should see the recommendations for single file streaming at the referenced URL. More on this in a moment. In terms of adaptive streaming, Apple’s HLS is the clear winner in the OTT space, with only Microsoft’s Xbox 360 not joining the party (no surprise there).
What does this mean for streaming producers? The devil is always in the details to a degree, but if you’re distributing single files to each device, you just need to find and use the optimal encoding parameters for each platform. If you’re distributing adaptively, you need to support Flash or Smooth Streaming on the desktop, with HLS garnering most of the relevant mobile and OTT targets. Throw in a media server or CDN service that can transmux from one adaptive format to another, and the task starts to look much, much simpler.
Before examining each individual target platform, let’s discuss some high-level realities.
Mobile Sets the Tone
First, encoding for mobile is the most complicated because these devices have the least playback horsepower and the least reliable connection speeds. In contrast, most computers and OTT devices can play any H.264 configuration you throw at them and connect via high-bitrate network or broadband connections, either wired directly or via Wi-Fi.
Though there is some variability in connection speed, when configuring your videos for computers and OTT devices, your primary concerns are delivering acceptable quality at the low end, and bandwidth cost at the high end. When configuring your streams for mobile, it’s about configuring streams that play on the least powerful units within your target and are deliverable at low connection speeds. This means that you’ll produce some streams for mobile delivery that you may not deliver to either computer or OTT targets. It also means that you should consider your mobile platforms first, so let’s jump in.
To add some perspective to our analysis, as of Dec. 20, 2012, NetMarketShare reported that Apple had a 61.1% share of the mobile/tablet operating system market, with Android second at 28.02%, Java ME third at 6.65%, BlackBerry fourth at 1.42%, Symbian fifth at 1.24% and Windows Phone sixth at 0.9%. I’ll discuss Apple, Android devices, and Microsoft and ignore the rest.
APPLE MOBILE DEVICES
Video producers obviously prioritize iOS support because it has the largest market share, but also because Apple has done an outstanding job making the iOS platform easy to distribute to. First, Apple specifically defined the H.264 playback capabilities of every device it released, including resolution, data rate, and profile, a seemingly obvious action that somehow escaped the attention of most Android hardware developers. Second, it tailored and defined HLS to make it simple to reach its devices. A great example can be seen in Table 2, which shows the recommended configuration parameters for 16:9 streams produced for HLS delivery. The data is taken from Apple Technical Note TN2224, titled “Best Practices for Creating and Deploying HTTP Live Streaming Media for the iPhone and iPad.”
As you can see, you get specific configuration recommendations and notification of which streams are compatible with which platforms. For video producers seeking to distribute to the iOS platform, TN2224 is the obvious place to start.
ANDROID MOBILE DEVICES
Distributing to Android devices is much more complicated for several reasons. First, there are multiple hardware manufacturers with multiple devices with vague or no references to H.264 playback capabilities. For example, my daughter has an HTC Rhyme; the spec sheet on the HTC website has no mention of H.264 whatsoever.
The number and diversity of devices, along with the lack of H.264 playback information, makes creating a chart such as Table 2 an impossible task. Instead, on the Android Supported Media Formats page, Google details the software-playback capabilities of the Android operating system itself and provides the anemic recommendations shown in Table 3.
One would guess that the hardware playback capabilities of most current Android tablets and smartphones far exceed these recommendations. For example, on my Toshiba Thrive tablet, I’ve played 720p video encoded using the High profile with no problem (though the Toshiba website similarly failed to provide H.264 playback information). When encoding for Android devices, however, you have to guess. With Apple, you know.
Another factor complicating Android support is that HLS support came late, starting with Android version 3.1. You can check the penetration of Android versions; when I checked in late December 2012, version 3.1 and newer versions only accounted for about 35% of the total Android market, making HLS an incomplete solution. It’s also an imperfect solution, with crashing, seeking, and aspect ratio issues on some platforms, as you can read about in “Jeroen Wijering Talks HLS, DASH, and the JW Player 6.” As Wijering points out, the most comprehensive solution is likely to build your own app.
MICROSOFT WINDOWS PHONE
Though Microsoft’s existing share is currently negligible in the mobile/tablet space, it has high hopes for Windows 8 and RT and its Windows Phone platform. Like Apple, Microsoft offers a limited number of phones and documents their capabilities nicely. Note that while Windows RT will support Flash (and AIR by mid-2013), Windows phones do not currently support Flash. Support for Windows Phone is not on Adobe’s Flash technology roadmap.
As you can see in Table 1, the only adaptive technology supported by the Windows Phone platform is Smooth Streaming. As noted in the Supported Media Codecs for Windows Phone document referenced previously, not all Windows phones support dynamic resolution changes. For these phones, all streams in the adaptive group must share the same resolution.
The best source for recommended encoding parameters for Smooth Streaming is the set of encoding presets contained in Microsoft Expression Encoder 4. Though space considerations prevented us from reproducing that spreadsheet here, for anyone interested, I recorded the configuration parameters recommended for 1080p source video in a Google Documents spreadsheet that you can access. As you’ll see, the preset uses multiple resolutions that wouldn’t work for some versions of Windows Phone.
Over-the-Top Devices
Again, OTT devices are easier than mobile because they all live on at least relatively high-speed connections and can all decode virtually any H.264 stream you throw their way. You have links to the playback and adaptive streaming specs, so here I’ll just point out the highlights.
Though Roku supports multiple adaptive specs, its guide makes it clear that HLS is the preferred technique. The guide also identifies the Wowza Media Server as a “very popular, budget minded choice in the HLS field,” with a useful guide to getting up and running with the Roku Streaming Player.
Apple TV was discussed previously in the iOS section. Note that according to the Boxee support boards, HLS only works within an application, not in the browser. Interestingly, as you can see in Table 4, Google TV adapted its stream recommendations from Apple TN2224, though it ignored the lowest quality grouping and recommended the High profile for all streams.
Finally, for Smooth Streaming to the Xbox, see the earlier discussion about Microsoft Windows Phone. Note that I checked Expression Encoder, and there were no Xbox presets.
Synthesis
With this as background, let’s start making some decisions, beginning with the number of streams.
HOW MANY STREAMS?
As shown in Table 2, Apple recommends 10 streams for 1080p-source content, including the audio-only stream. However, before adopting that recommendation, let’s examine how much it would cost to distribute Apple’s highest-quality stream. Specifically, at 8,564Kbps for 1080p video, an hour of video would consume around 4GB. According to Dan Rayburn’s latest blog on the subject, CDN pricing for customers buying from $100,000 to more than $1 million/year in bandwidth ranged from a low of 1 cent per GB to a high of 12 cents.
At these prices, it would cost between 4 cents and 48 cents to stream an hour of video at Apple’s highest recommended rate. However, I’ve seen legacy bandwidth pricing for more modest-sized commitments as high as $1.10/GB, which would boost the per-hour transfer cost of this 1080p configuration to $4.40.
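To make the arithmetic above easy to rerun with your own numbers, here is a minimal Python sketch. The function names are illustrative, and it assumes decimal units (1GB = 1,000,000,000 bytes) and a single viewer watching the top stream for a full hour. Because it uses the unrounded figure of roughly 3.85GB per hour rather than 4GB, its results come out slightly lower than the rounded figures quoted above.

```python
def gigabytes_per_hour(kbps: float) -> float:
    """Data transferred by one hour of video at the given bitrate, in decimal GB."""
    bits_per_hour = kbps * 1000 * 3600   # kilobits per second -> bits per hour
    return bits_per_hour / 8 / 1e9       # bits -> bytes -> gigabytes


def cost_per_viewer_hour(kbps: float, dollars_per_gb: float) -> float:
    """Bandwidth cost of delivering one hour of video to one viewer."""
    return gigabytes_per_hour(kbps) * dollars_per_gb


top_stream_kbps = 8564  # Apple's highest recommended rate, as cited above
for price in (0.01, 0.12, 1.10):  # $/GB figures quoted in the article
    cost = cost_per_viewer_hour(top_stream_kbps, price)
    print(f"${price:.2f}/GB -> ${cost:.2f} per viewer-hour")
```

Swap in your own top-end data rate and contract price to see what your highest-quality stream costs per viewer-hour before you commit to it.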
When configuring your highest-quality stream, choose the highest data rate that you can afford, given your monetization strategy and cost structure. Since your top-quality stream has to look very good, you’ll have to adjust video resolution accordingly. For example, if you can only afford 3Mbps at the top end, encode at 720p, not 1080p.
At the other end of the spectrum, identify the lowest video data rate that you’d like to support. For Apple, that’s 200Kbps, though I’ve had clients who produced video as low as 110Kbps. Then identify the resolution/frame rate combination that delivers optimal quality at that video data rate. Apple’s 416×234 at 10-12 frames per second is a reasonable starting point.
Now you’ve got your high- and low-end streams. Next you need to choose the number of streams that accomplishes two goals. The first is to provide at least one stream for every window size the video will be played in within a browser. For example, YouTube plays 16:9 videos at two window sizes, 640×360 and 854×480, plus full screen. If you upload a 720p or larger video, YouTube will create videos at both of these resolutions, because both encoding and video playback are most efficient when the video is displayed at its native resolution. So if you display video on your website in a 640×360 window, you want at least one stream at that resolution.
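One way to check the first goal is to compare your player window sizes against the resolutions in your planned adaptive group. The following Python sketch does that; the function name and the stream list are hypothetical, and only the two window sizes come from the YouTube example above.

```python
def uncovered_windows(window_sizes, stream_sizes):
    """Return the player window sizes that have no stream at native resolution."""
    available = set(stream_sizes)
    return [size for size in window_sizes if size not in available]


# Window sizes from the YouTube example above.
player_windows = [(640, 360), (854, 480)]

# Hypothetical adaptive group, for illustration only.
planned_streams = [(416, 234), (640, 360), (1280, 720)]

print(uncovered_windows(player_windows, planned_streams))  # [(854, 480)]
```

Any window size the function returns is a candidate for an additional stream, or a sign that the window size itself should change.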
You also want a sufficient number of streams to serve as reasonable stepping stones between your highest- and lowest-quality streams. For YouTube, this meant four 16:9 streams between their mobile stream configured at 176×141 and their 1080p stream (that is, streams at 426×240, 640×360, 854×480, and 720p). Though I don’t know the specs of ESPN’s mobile or OTT streams, for computer viewing, there were three streams between the low of 480×272 and high of 720p: 576×324, 640×360, and 768×432.
More streams are not necessarily better; more streams means that the streams are closer together, minimizing the quality difference while increasing the frequency of stream switching, which can disrupt viewing. The ideal scenario is when the viewer quickly identifies the optimal stream and continues to watch that through the end of the video.
Other Configuration Options
Once you know how many streams you’ll produce, you need to configure them. At the low end of the spectrum, I prefer to drop the frame rate rather than resolution; you can read all about why in “Configuring Low Data Rate Adaptive Streams” (http://go2sm.com/configadaptive). In terms of choosing the data rate for each stream, the differences should start out fairly small — such as the 200Kbps between Apple’s first three streams with video — and continue to increase at higher bitrates, such as the 2Mbps separating the top four streams.
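If you want a starting point for spacing data rates so that the gaps widen as the bitrate climbs, one simple option is to multiply each rate by a constant ratio. To be clear, geometric spacing is an assumption made for this sketch, not Apple’s method or the one used to build Table 2, and the 200Kbps, 3Mbps, and six-stream inputs are placeholders.

```python
def geometric_ladder(low_kbps: float, high_kbps: float, count: int) -> list[int]:
    """Space `count` data rates between low and high using a constant ratio.

    A constant ratio yields small gaps at the bottom of the ladder and
    progressively larger gaps toward the top.
    """
    if count < 2 or not 0 < low_kbps < high_kbps:
        raise ValueError("need at least two streams and low_kbps < high_kbps")
    ratio = (high_kbps / low_kbps) ** (1 / (count - 1))
    return [round(low_kbps * ratio ** i) for i in range(count)]


ladder = geometric_ladder(200, 3000, 6)          # placeholder inputs
gaps = [b - a for a, b in zip(ladder, ladder[1:])]
print("rates (Kbps):", ladder)
print("gaps  (Kbps):", gaps)
```

Treat the output as a first draft: round the rates to values your encoder presets use, then confirm with test encodes that each step produces a visible quality difference.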
Probably the biggest configuration issue relates to the H.264 profile applied to each file. For example, if you follow Apple’s recommendations, you’ll use the Baseline profile for the first four streams with video and the Main profile for the next four, in all cases to maintain device compatibility. If you’re producing for the Android platform, the safest approach would be to use the Baseline profile for all streams. However, all OTT platforms and computers can play streams encoded using the High profile. Should you produce separate groups of files for each, increasing encoding and storage costs?
I evaluated this issue in my article, “H.264 in a Mobile World: Adios to the Main and High Profiles?” (http://go2sm.com/h264profiles), which essentially documented the research I performed on behalf of a consulting client. Specifically, I produced three test cases comparing the quality of video encoded at the Baseline, Main, and High profiles using otherwise identical parameters. In only one of those test cases, where the video was encoded at 640×360@240Kbps, was the difference visible. At more reasonable settings, such as 720p@800Kbps and 640×480@468Kbps, the files were virtually indistinguishable.
The client looked at the difference in quality between all the files and reasonably concluded that the 640×360@240Kbps file would seldom be viewed for long by a computer user connecting via broadband. He decided to produce only one group of files, using the Baseline profile where necessary to maintain compatibility with targeted mobile devices. I suggest you perform the same analysis with representative footage and draw your own conclusions.
Pay particular attention to the quality difference between the Main and High profile streams. If you choose the High profile for OTT and computers, rather than Main, you’ll need to create equivalent files using the Main profile for iOS/Android compatibility.
I would assume that most Android devices have hardware playback capabilities similar to those of Apple devices of the same form factor and approximate release date, so I wouldn’t create a set of Baseline-only streams for Android. Rather, the schema shown in Table 2 is probably safe for Android. It’s probably also the optimal schema for efficiently encoding one adaptive group for all computer, mobile, and OTT targets.
Wrapping Things Up
Now that you’ve made all the hard decisions, it’s time to touch on the mechanical aspects of encoding for adaptive streaming. First, the key frame interval for all files needs to be identical for stream switching to occur seamlessly. Most producers use an interval of either 2 or 3 seconds and disable the insertion of keyframes at scene changes.
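Many encoders take the key frame interval in frames rather than seconds, so streams with different frame rates need different frame counts to keep the same interval in time. Here is a minimal Python sketch of that conversion; the stream names and frame rates are hypothetical, and the 2-second interval is one of the two common choices mentioned above.

```python
def keyframe_interval_frames(fps: float, interval_seconds: float) -> int:
    """Convert a key frame interval in seconds to a whole number of frames."""
    return round(fps * interval_seconds)


INTERVAL_SECONDS = 2  # must be the same for every stream in the adaptive group

# Hypothetical streams: the low-end stream runs at a reduced frame rate.
stream_frame_rates = {"low": 15, "mid": 30, "high": 30}

for name, fps in stream_frame_rates.items():
    frames = keyframe_interval_frames(fps, INTERVAL_SECONDS)
    print(f"{name}: {fps} fps -> key frame every {frames} frames")
```

The point is that the interval in seconds stays fixed across the group while the frame count follows each stream’s frame rate.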
Second, encode using either constant bitrate encoding (CBR) or constrained variable bit rate encoding (VBR), with a maximum data rate of between 1.25 and 1.5 times the target. These techniques will minimize stream switching that occurs because of changes in the video data rate rather than changing bandwidth or CPU conditions.
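For constrained VBR, the 1.25 to 1.5 multiplier translates directly into a range for the maximum data rate setting. The short Python sketch below computes that range for a given target; the function name is illustrative, and the 800Kbps target is only an example value.

```python
def constrained_vbr_cap_range(target_kbps: float) -> tuple[float, float]:
    """Return the (lowest, highest) recommended maximum data rate for a target.

    Uses the 1.25x to 1.5x range described above.
    """
    return target_kbps * 1.25, target_kbps * 1.5


low_cap, high_cap = constrained_vbr_cap_range(800)  # example target
print(f"Target 800 Kbps -> cap between {low_cap:.0f} and {high_cap:.0f} Kbps")
```

A cap nearer the 1.25 end keeps the delivered data rate more predictable, while a cap nearer 1.5 gives the encoder more room for complex scenes.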
Finally, regarding audio, recognize that it’s safest to use the same audio parameters for all files in an adaptive group, which minimizes the risk of popping or similar artifacts during stream switches. This is why Apple recommends 44.1 kHz audio at 64Kbps for all streams in Table 2.
On the other hand, if you’re producing premium content where audio quality is a significant component of the overall experience, you may find this approach too restrictive. To minimize potential issues, use the same frequency for all streams and switch the number of channels, data rate, or both. For example, consider using 44.1 kHz mono audio at 32Kbps for your lowest stream, 44.1 kHz mono at 64Kbps for mid-quality streams, and 44.1 kHz stereo at 128Kbps for your highest quality streams. Then test before going live to ensure that audio artifacts don’t occur when switching streams.
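The tiered audio example above can be written down as a small table and checked for the one property that matters most, a shared sampling frequency. This Python sketch uses the values from that example; the tier names and the function name are illustrative.

```python
# Audio settings from the example above; tier names are illustrative.
AUDIO_TIERS = {
    "lowest": {"sample_rate_hz": 44100, "channels": 1, "kbps": 32},
    "mid": {"sample_rate_hz": 44100, "channels": 1, "kbps": 64},
    "highest": {"sample_rate_hz": 44100, "channels": 2, "kbps": 128},
}


def shares_sample_rate(tiers: dict) -> bool:
    """True when every tier in the adaptive group uses the same frequency."""
    return len({tier["sample_rate_hz"] for tier in tiers.values()}) == 1


print("Same frequency across all tiers:", shares_sample_rate(AUDIO_TIERS))
```

A check like this only confirms the configuration is consistent; you still need to listen for artifacts during actual stream switches before going live.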