H.264 is the most widely used codec today, whether for streaming via Flash or Silverlight or for the Apple iPod, iPhone, and iPad product lines. If you’ve worked with H.264 before, the format is old hat for you. But if you’re cutting over from VP6 or Windows Media or expanding distribution to H.264-compatible devices, you’re faced with a learning curve.
Well, we’re here to help. In this article, I’ll detail what you need to know to produce H.264 files for streaming or device playback. The target reader is the novice working with encoding tools such as Adobe Media Encoder, Apple Compressor, Sorenson Squeeze, and Telestream Episode Pro. More advanced tools simply provide too many options to address in an introductory article.
What’s Your Profile?
With H.264, it’s critical to adopt Stephen Covey’s advice: “Begin with the end in mind.” That’s particularly true with device playback, because if you produce a file incorrectly, it won’t load or play on your target device.
From a compatibility standpoint, the most important encoding parameter is the profile, which defines the set of encoding techniques that can be used to create the encoded file. The H.264 standard defines 17 different profiles, with the three most relevant to streaming shown in Figure 1, a table adapted from one presented by Wikipedia (http://en.wikipedia.org/wiki/H.264). As you can see, the encoding techniques are presented on the left, and each successive profile deploys additional techniques to produce the encoded file. The result is a higher-quality file but also one that’s harder to decode.
Why 17 profiles? Because they serve as convenient compatibility points for hardware developers and video producers. For example, Apple designed the iPod to play H.264 video produced using the Baseline profile, balancing factors such as CPU and memory cost as well as LCD resolution. If you’re producing video for the iPod, so long as you encode using the Baseline profile (and the appropriate level, discussed later), the file will load and play on the iPod.
Accordingly, rule No. 1 for H.264 encoding is to know the highest profile supported by the playback device you’re targeting. Fortunately, most manufacturers, particularly Apple, do a nice job laying out these specs in their product materials, so check their website first.
The other H.264-related parameter that you’ll typically see on these spec pages is the “level.” For example, according to Apple’s website, the iPad will play “H.264 video up to 720p, 30 frames per second, Main Profile level 3.1.” By way of explanation, levels further subdivide the profile regarding parameters such as resolution and data rate. If you visit the H.264 page on Wikipedia, you’ll learn that level 3.1 for the Main Profile has a maximum bitrate of 14Mbps and a maximum resolution of 1280×720 (720p). If you exceed these specs on a file destined for the iPad, you run the risk that iTunes will kick the file back out during the syncing process.
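The profile and level check described above can be sketched in a few lines of Python. This is a minimal illustration, not a vendor tool: the `DeviceSpec` class and `violations` function are hypothetical names, the iPad figures are the ones quoted in this article, and the ranking of profiles as Baseline, Main, High simply follows the "each successive profile" simplification used here.

```python
from dataclasses import dataclass

# Profiles ranked in the order the article presents them (an assumption
# of this sketch; each successive profile adds encoding techniques).
PROFILE_RANK = {"Baseline": 0, "Main": 1, "High": 2}


@dataclass(frozen=True)
class DeviceSpec:
    max_profile: str
    max_level: tuple      # (major, minor), e.g. (3, 1) for level 3.1
    max_width: int
    max_height: int
    max_fps: float
    max_kbps: int


# Figures quoted in the article for the iPad: Main Profile, level 3.1,
# 720p, 30 fps, 14 Mbps.
IPAD = DeviceSpec("Main", (3, 1), 1280, 720, 30, 14_000)


def violations(spec, profile, level, width, height, fps, kbps):
    """Return a list of human-readable reasons the file exceeds the spec."""
    problems = []
    if PROFILE_RANK[profile] > PROFILE_RANK[spec.max_profile]:
        problems.append(f"profile {profile} exceeds {spec.max_profile}")
    if level > spec.max_level:
        problems.append(f"level {level[0]}.{level[1]} exceeds device level")
    if width > spec.max_width or height > spec.max_height:
        problems.append(f"resolution {width}x{height} is too large")
    if fps > spec.max_fps:
        problems.append(f"frame rate {fps} is too high")
    if kbps > spec.max_kbps:
        problems.append(f"data rate {kbps} kbps is too high")
    return problems


print(violations(IPAD, "Main", (3, 1), 1280, 720, 30, 2_000))
print(violations(IPAD, "High", (4, 0), 1920, 1080, 30, 20_000))
```

An empty list means the settings fit within the published spec; it does not replace loading the file on the actual device.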
So, when producing for devices, once you know the profile and level, you have a pretty good idea how to encode your video. What about when producing for computers?
Producing for Flash and Silverlight
Both the Flash and Silverlight players can play H.264 files encoded using all three profiles, so unless you’re attempting to produce a file that plays on both computers and other devices, use the High Profile. Typically, when you’re producing for computers, levels are irrelevant, since the relevant player doesn’t check the level before attempting to play the file. Instead, the most relevant considerations for computers are the resolution and data rate of the actual video file, which will determine whether that computer can play the file smoothly.
To explain, if you try to play a 1080p file produced at 12Mbps on a Pentium 4-based computer, chances are the file won’t play smoothly. This, of course, has nothing to do with profile or level; there are simply too many pixels for the older computer to push. So, where producing for devices is all about meeting the designated profile and level, producing for computers is all about the configuration of the compressed video file, which I discuss in more detail later.
For this reason, most encoding tools don’t let you select a level, and some, like Telestream Episode Pro, let you click a check box to automatically adjust the level to match the selected encoding parameters. Occasionally, with tools like Adobe Media Encoder, which does allow you to restrict encoding to a designated level, you may get an error message if you attempt to encode using parameters that exceed that level.
For example, if you set the level at 3.1 for the High Profile and attempt to encode at 1080p, Adobe Media Encoder will let you know that you’ve exceeded the parameters for level 3.1. If you really want to produce that file, you simply boost the level setting to level 4, and Adobe Media Encoder will produce the file.
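To see why 1080p does not fit in level 3.1 but does fit in level 4, it helps to count macroblocks (the 16×16 pixel blocks H.264 works in). The sketch below is an assumption-laden illustration: the limits in `LEVEL_LIMITS` are the frame-size and macroblock-rate figures as commonly reproduced from the H.264 level table, only four levels are listed, and bitrate limits are ignored. Verify the numbers against the specification before relying on them. `minimum_level` is a hypothetical helper, not a function of any encoder.

```python
import math

# (level, max frame size in macroblocks, max macroblocks per second).
# Figures recalled from the H.264 level table; verify before relying on them.
LEVEL_LIMITS = [
    ("3", 1_620, 40_500),
    ("3.1", 3_600, 108_000),
    ("3.2", 5_120, 216_000),
    ("4", 8_192, 245_760),
]


def minimum_level(width, height, fps):
    """Return the lowest listed level that fits, or None if none does."""
    # Frames are coded in whole macroblocks, so round each dimension up.
    macroblocks = math.ceil(width / 16) * math.ceil(height / 16)
    for level, max_frame_size, max_rate in LEVEL_LIMITS:
        if macroblocks <= max_frame_size and macroblocks * fps <= max_rate:
            return level
    return None


print(minimum_level(1280, 720, 30))   # 720p at 30 fps
print(minimum_level(1920, 1080, 30))  # 1080p at 30 fps
```

A 720p frame is 80 × 45 = 3,600 macroblocks, exactly the level 3.1 ceiling in this table, while a 1080p frame needs 120 × 68 = 8,160, which only fits once the level is raised to 4.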
Encoders in Action
As you would suspect, not all encoding tools address these options the same way. For example, when encoding with the Apple H.264 codec, Apple Compressor only supports the Baseline and Main Profiles, and it does so with a somewhat anonymous check box labeled “Frame Reordering” (Figure 2). Check the box and you get the Main Profile; if you don’t, you get Baseline. Compressor does not let you designate a level or encode using the High profile.
As mentioned, the Adobe Media Encoder lets you choose both the profile and level, as you can see in Figure 3. Note that neither Compressor nor Adobe Media Encoder provides access to the encoding parameters discussed in the next section.
Another H.264-specific encoding option enabled in some encoding tools is entropy coding, which designates how the compressed data is packed in the final video file. As you can see in the screenshot of Sorenson Squeeze shown in Figure 4, there are two choices, CAVLC and CABAC, with the latter available only when producing using the Main or High Profiles.
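The relationship between profile and entropy coding choice can be written down as a small lookup. This is only a sketch of the rule stated above (CABAC requires the Main or High Profile); `entropy_options` is a hypothetical helper name and covers only the three profiles discussed in this article.

```python
def entropy_options(profile):
    """Entropy coding modes available for a profile, per the rule above."""
    if profile == "Baseline":
        return ["CAVLC"]
    if profile in ("Main", "High"):
        return ["CAVLC", "CABAC"]
    raise ValueError(f"profile not covered by this sketch: {profile}")


for name in ("Baseline", "Main", "High"):
    print(name, entropy_options(name))
```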
Like many advanced encoding options, the advanced option, CABAC, produces a higher-quality file that’s harder to decode on the playback platform. To oversimplify, the data are packed tighter, which is more efficient qualitywise. The downside is that it requires more CPU horsepower to unwrap and display on the viewing station. The obvious questions are, “How much better is the quality?” and, “How much harder is the file to decode?”
In my tests comparing similarly configured files (720p at 800Kbps video data rate) encoded with CAVLC and CABAC, the quality difference was noticeable in some hard-to-compress scenes, and I’ve seen some experts claim that CABAC delivers similar quality at 12%-15% lower data rates. On the decode side, playing back the CABAC file took three-fifths of a percent (that’s 0.6%) more CPU horsepower on my HP 8710w Mobile Workstation running a 2.2GHz Core 2 Duo CPU and 4% more on an older pre-Intel dual 2.7GHz PPC G5 Mac.
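To put the 12%-15% claim in concrete terms, the arithmetic below applies it to the 800Kbps test file. It is a back-of-the-envelope sketch: `equivalent_kbps` is a hypothetical helper, and the savings percentages are the claimed range quoted above, not measurements.

```python
def equivalent_kbps(cavlc_kbps, savings_percent):
    """Data rate at which CABAC would match CAVLC quality, if the claim holds."""
    return cavlc_kbps * (100 - savings_percent) / 100


low = equivalent_kbps(800, 15)
high = equivalent_kbps(800, 12)
print(f"CABAC equivalent of 800 kbps CAVLC: {low:.0f} to {high:.0f} kbps")
```

In other words, if the claim holds, a CABAC file at roughly 680 to 704Kbps would look about as good as the 800Kbps CAVLC file.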
Since the quality advantage is meaningful and the playback difference negligible, I always use CABAC when producing with profiles (and encoding tools) that support it. As you can see in Figure 5, an analysis of an H.264 video file downloaded from YouTube, YouTube does as well and also uses the High Profile, a nice validation of both recommendations. The utility that provided this analysis is MediaInfo, which is free and runs on Windows, Mac, and Linux platforms. It’s the one tool that I install on every computer that I own, and you can download it at http://mediainfo.sourceforge.net/en.
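If you save a MediaInfo text report, the profile, level, and entropy coding mode can be pulled out with a few lines of Python. Treat this as a sketch under stated assumptions: the `SAMPLE` report is hypothetical, the field names ("Format profile", "Format settings, CABAC") and the `High@L3.1` notation are how MediaInfo's text output is commonly laid out but may differ between versions, and `read_fields` and `parse_format_profile` are illustrative helpers, not part of MediaInfo.

```python
# Hypothetical excerpt of a MediaInfo text report (layout is an assumption).
SAMPLE = """Video
Format                                   : AVC
Format profile                           : High@L3.1
Format settings, CABAC                   : Yes
"""


def read_fields(report):
    """Map 'name : value' lines to a dict, keeping the first occurrence."""
    fields = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key.strip(), value.strip())
    return fields


def parse_format_profile(value):
    """Split 'High@L3.1' into ('High', '3.1'); level is None if absent."""
    profile, _, level = value.strip().partition("@L")
    return profile, (level or None)


fields = read_fields(SAMPLE)
profile, level = parse_format_profile(fields["Format profile"])
print(profile, level, fields["Format settings, CABAC"])
```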
Like most advanced video compression technologies, H.264 uses interframe compression to eliminate redundancy between frames, which is why talking-head sequences encode much more easily than World Cup matches. H.264 implements interframe compression using the same three frame types deployed by MPEG-2: I-frames, B-frames, and P-frames.
Briefly, I-frames (also called keyframes) are encoded without reference to any other frame, using intraframe techniques comparable to still-image compression such as JPEG. P-frames can look backward to previous I-frames or P-frames for redundancies, while B-frames can look forward and backward for redundancies, making B-frames the most efficient frame type. Like CABAC coding, however, this efficiency comes at a cost: files encoded with B-frames have higher CPU playback requirements.
This triggers the same analysis as with CABAC: “How much better is the quality, and how much higher are the CPU requirements?” Regarding the first question, files encoded with B-frames enjoy slightly higher quality than those encoded without B-frames, but only on high-motion files produced at the lowest possible bitrate. In terms of required CPU horsepower on the playback side, files with B-frames can consume up to 10% more CPU horsepower, but the difference is usually 5% or less.
For this reason, I recommend using B-frames when the profile supports it. As shown in Figure 6, from Telestream Episode Pro, typical B-frame-related parameters include the number of B-frames and number of reference frames. The number of B-frames is the number of B-frames in sequence between I- and P-frames. So at a setting of 3, the frame sequence would be IBBBPBBBPBBB … and so on until the next I-frame.
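The effect of the B-frame setting on the frame sequence is easy to visualize with a short sketch. `frame_pattern` is a hypothetical helper that prints frames in display order for one group of pictures; it assumes a fixed pattern, whereas real encoders may vary frame types adaptively.

```python
from itertools import cycle, islice


def frame_pattern(b_frames, gop_length):
    """Frame types in display order: one I-frame, then B...BP repeated."""
    repeating = "B" * b_frames + "P"
    return "I" + "".join(islice(cycle(repeating), gop_length - 1))


print(frame_pattern(3, 12))  # B-frame setting of 3
print(frame_pattern(0, 12))  # no B-frames
```

With a setting of 3, the first twelve frames come out as IBBBPBBBPBBB, matching the sequence described above; with a setting of 0, every frame after the I-frame is a P-frame.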
The number of reference frames is the number of frames searched for interframe redundancies. Here you balance encoding time against the potential for quality improvement, since searching for these redundancies takes time. In most videos, redundancies occur most frequently in the frames immediately surrounding the frame being encoded, so reference frame values higher than three to five typically provide little additional quality. Figure 6 shows the settings I use and recommend for most encodings, a B-frame setting of 3 with three reference frames.
Episode Pro and Squeeze also let you balance encoding time against quality: Episode Pro with an encoding speed versus quality slider with values that range from 10 to 100, and Squeeze with a Fast, Medium, and Best list box. The Episode Pro help file has a good explanation of what’s going on here, so let’s quote:
Encoding speed vs. quality. The H.264 encoder has a wide range of encoding methods to use, which may result in a very time consuming encoding process. The Encoding speed vs. quality setting determines the complexity of the encoding by switching on or off different tools. Encoding speed vs. quality can be set between 10 and 100; 10 represents the fastest speed, with most of the advanced features turned off, 100 represents the most advanced coding mode, yielding the best quality, but also taking a considerably longer time. In general, values over 50 yield very small improvements in visible image quality.
In terms of variations in encoding time and quality, your mileage will vary based upon content and target encoding parameters. With Episode Pro, I’ve seen very little difference in either encoding time or quality at the extremes of the encoding speed versus quality slider. Still, I typically set the slider at 100 since my encoding volume is low. If you’re a high-volume shop, run your own comparisons and see if the quality and encoding time vary significantly. With Squeeze, I always use the Best setting, which takes about 25% longer than Fast, but it produces noticeably better quality.
The final H.264 encoding parameter that I’ll address is slices, which is available in both Squeeze and Episode Pro. This parameter divides each frame into slices and assigns a different CPU to each slice, speeding encoding on multiple processor systems. The only downside is that slices can’t refer to each other during encoding, so if a bouncing ball moves from one slice to another during the course of the video, the encoder won’t pick this up as a redundancy. Again, since I’m a low-volume shop, I always set slices at the minimum value, but your results may vary.
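As a rough picture of what the slices setting does, the sketch below splits a frame into horizontal bands of macroblock rows, one band per slice. This is an assumption made for illustration: encoders are free to partition slices differently, and `slice_rows` is a hypothetical helper, not a setting exposed by Squeeze or Episode Pro.

```python
import math


def slice_rows(height, slices):
    """Split a frame's macroblock rows into `slices` near-equal bands.

    Returns (start_row, end_row) pairs, end exclusive.
    """
    rows = math.ceil(height / 16)          # 16-pixel-high macroblock rows
    base, extra = divmod(rows, slices)
    bounds, start = [], 0
    for index in range(slices):
        size = base + (1 if index < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


print(slice_rows(720, 1))  # one slice covers the whole frame
print(slice_rows(720, 4))  # four bands, each encodable on its own CPU
```

Anything that crosses one of those band boundaries, like the bouncing ball in the example above, can’t be matched against the neighboring slice, which is the quality cost of the extra encoding speed.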
These are the H.264 basics; now let’s touch on encoding for devices and computers.
Producing for Devices
If you’re producing for devices, job No. 1 is to check the manufacturer specs and make sure your encoding parameters don’t exceed them. If you use a template supplied by your encoding tool, you should compare the values against the manufacturer’s specs. I’m sure that most templates will conform, but it is your file.
After encoding, test your file on the target device or, if applicable, a range of devices. About 2 years ago, I downloaded 50 files from iTunes and tried to load them on my iPod nano. Six wouldn’t load, with mistakes ranging from wrong profile to exceeding the recommended data rate and resolution to one producer using the Sorenson Video 3 codec, which won’t play on the iPod. Mistakes happen, but all these would have been caught had the producer simply tried to load the file onto the target device.
When producing for devices, remember that you have to adjust your encoding parameters to match your delivery scheme as well as the playback specs of your target. For example, while the iPad can play a 720p file encoded at up to 14Mbps, delivering that file via cellular or even Wi-Fi might be a touch impractical. That’s why most producers streaming to the iPad configure their videos in the 640×360 range, with data rates well under 1Mbps.
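A quick calculation shows why the 14Mbps ceiling is impractical for delivery. The sketch below converts a data rate to megabytes transferred per minute of video; `megabytes_per_minute` is a hypothetical helper, the 800Kbps figure is just one example of "well under 1Mbps," and audio and container overhead are ignored.

```python
def megabytes_per_minute(kbps):
    """Megabytes transferred per minute of video at a given data rate."""
    return kbps * 60 / 8 / 1000   # kilobits/s -> kilobytes/min -> megabytes/min


for label, rate in (("14Mbps (level 3.1 ceiling)", 14_000),
                    ("800Kbps (typical streaming)", 800)):
    print(f"{label}: {megabytes_per_minute(rate):.0f} MB per minute")
```

That works out to about 105MB per minute at the ceiling versus about 6MB per minute at 800Kbps.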
When producing for mobile devices, using an adaptive strategy is critical to satisfying customers connecting via different technologies and speeds. Here, Apple has both a technology advantage, with HTTP Live Streaming designed specifically for its family of devices, and a documentation advantage, with a tech note (http://developer.apple.com/iphone/library/technotes/tn2010/tn2224.html#) detailing suggested configurations for iPads and iPhones. For an introduction to HTTP Live Streaming, Google “Jan Ozer” and “HTTP Live Streaming.”
Producing for Computers
If you’re producing for computers, beyond the configuration options discussed previously, choosing the resolution and data rate of your files is the most important consideration. If you’re producing a single file, SD configurations of H.264 (640×480 or 640×360) play well on computers as old as my HP Workstation xw4100, which I got in 2003 and which has a 3GHz Pentium 4 CPU with Hyper-Threading Technology.
Jump to 720p, however, and the required CPU playback horsepower increases significantly, running the risk of frames dropping during playback. As a very general rule of thumb, if you’re producing at 720p, the minimum target platform that can play these files at or below 50% CPU utilization is a Core 2 Duo-based computer.
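The jump in decode load follows largely from the jump in pixel count. The sketch below compares pixels per second at the resolutions discussed in this article; it assumes 30 frames per second throughout and treats pixel rate as only a rough proxy for CPU cost, since profile, entropy coding, and data rate matter as well. `pixels_per_second` is a hypothetical helper.

```python
def pixels_per_second(width, height, fps=30):
    """Raw pixel throughput the decoder must deliver (fps is an assumption)."""
    return width * height * fps


sd = pixels_per_second(640, 360)
hd720 = pixels_per_second(1280, 720)
hd1080 = pixels_per_second(1920, 1080)

print(f"720p vs 640x360: {hd720 / sd:.2f}x the pixels")
print(f"1080p vs 720p:   {hd1080 / hd720:.2f}x the pixels")
```

Moving from 640×360 to 720p quadruples the pixels the computer has to push, and 1080p more than doubles that again.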
Again, the best way to serve a range of target viewers with varying CPU and connection speeds is by offering multiple files, preferably via an adaptive streaming technology such as Silverlight’s Smooth Streaming or Flash’s Dynamic Streaming, since Apple’s HTTP Live Streaming is pretty much limited to Apple devices. If this isn’t in the cards, consider the movie trailer strategy, where you offer multiple files and let the viewer decide which to watch. That way, if the video stops during playback, the viewer blames his computer or internet connection, not your encoding strategy.
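The core idea behind any multiple-file strategy is to pick the best file the viewer’s connection can sustain. The sketch below illustrates that selection step only. Everything in it is hypothetical: the `LADDER` of renditions, the 80% headroom factor, and the `pick_rendition` helper are assumptions for illustration and do not reflect how any particular adaptive streaming technology makes its decisions.

```python
# Hypothetical set of encoded files: (label, total data rate in kbps),
# sorted from lowest to highest data rate.
LADDER = [("360p", 800), ("480p", 1_500), ("720p", 3_000)]


def pick_rendition(available_kbps, headroom=0.8):
    """Choose the highest rendition that fits within a share of bandwidth.

    `headroom` leaves a safety margin (an assumed 80% here) so that small
    dips in throughput don't stall playback. Falls back to the lowest file.
    """
    budget = available_kbps * headroom
    choice = LADDER[0]
    for rendition in LADDER:
        if rendition[1] <= budget:
            choice = rendition
    return choice[0]


for bandwidth in (500, 2_000, 5_000):
    print(f"{bandwidth} kbps connection -> {pick_rendition(bandwidth)}")
```

With the movie trailer strategy, the viewer makes this choice manually; with adaptive streaming, the player makes it repeatedly during playback.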