How to Produce High-Quality H.264 Video Files
H.264 is the only compression technology that plays on all computers, mobile devices, and OTT players. This makes producing high-quality H.264 files compatible with your target playback devices an essential skill. Helping you acquire and/or polish this skill is the focus of this article.
We’ll start with the compatibility element, since if the file won’t play on your target devices, it really doesn’t matter how good the quality is. Then we’ll tackle the resolution, frame rate, and data rate of your encoded file, since if you get these wrong, fussing with the H.264 encoding parameters of your file also won’t matter. Then we’ll cover choosing the right encoding tool and H.264 codec and how to quickly tune x264 encoding parameters for an optimal quality file.
The most fundamental H.264-related encoding parameters are profiles and levels. Briefly, the profile controls which encoding algorithms and techniques are used when producing the encoded file. The Baseline profile produces a file that can be played back on devices with minimal CPU and memory, while the High profile uses the most advanced techniques and requires a more powerful playback platform. Most encoding tools provide a simple control for choosing the profile, such as that shown in Figure 1 (from Telestream Episode).
Figure 1. Choosing the profile in Telestream Episode
Profiles were established in the H.264 standard to allow device manufacturers to support H.264 playback with an inexpensive, power-efficient configuration, such as that used in the original video-capable iPods, which could only play H.264 video encoded using the Baseline profile. At the other end of the spectrum, computers manufactured in the last 5 or 6 years and all OTT devices can play video encoded using the High profile.
To enable even more precise targeting of playback capabilities, levels set maximums for parameters such as resolution and data rate within each profile. Table 1 illustrates this by listing the profiles and levels supported by all video-capable Apple devices. In the first column, you can see that the original iPod, through version 5g, could only play video encoded using the Baseline profile to Level 1.3, which meant 320x240 resolution at 30 fps at a maximum data rate of 768 kbps. In contrast, as with computers and OTT devices, the newest Apple devices can play pretty much any file you would care to throw at them.
Table 1. Playback capabilities of Apple devices
Looking at the second column from the left, if you want your video to play on version 4 or lower iPhones, you need to produce these streams using the Baseline profile at a maximum configuration of 640x480x30fps at 2.5Mbps. In fact, in Technical Note TN2224, Apple’s seminal reference on producing HTTP Live Streaming (HLS) adaptive files for iDevice delivery, Apple recommends encoding all 640x360 streams and smaller using the Baseline profile, with higher resolution files encoded using the Main and High profiles. Note that as part of the HLS specification, devices check the stream configuration before retrieving a file, so they won’t attempt to retrieve a file that they can’t play.
Unfortunately, the breadth of Android manufacturers makes consolidating playback capabilities into a table like Table 1 nearly impossible. Instead, Google guarantees that each Android device will play a 480x360x30fps file encoded at 500Kbps without the H.264 playback hardware acceleration that most devices supply. In other words, while most recent Android devices are capable of playing back files encoded using the Main and High profiles, Google can’t guarantee this. In fact, Google’s Supported Media Formats document states that 1280x720x30fps video encoded at 2Mbps using the Baseline profile will not play on all Android devices.
For this reason, when producing a single file for Android delivery, most sources recommend encoding at the maximum supported configuration (480x360x30 fps @ 500Kbps, Baseline Profile). Since Android versions 3.0 and above support HLS playback, you can use the Apple schema presented in TN2224 to efficiently deliver to those Android devices as well. That is, so long as you encode at least one stream at 480x360x30fps at 500Kbps using the Baseline profile, you’ll have one stream for Android devices to play, and using the HLS protocols, the Android device will be able to retrieve any higher-quality file that it can play.
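To make the limits quoted above concrete, here is a minimal Python sketch that checks a planned stream against them. The device keys, the `Limit` class, and the `fits` helper are hypothetical names chosen for illustration. Real H.264 levels are defined in terms of macroblock throughput and buffer sizes, so comparing width, height, and frame rate directly is a simplifying assumption, not how a decoder actually decides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    profile: str    # most advanced profile the device is stated to play
    width: int
    height: int
    fps: float
    max_kbps: int


# Maximum configurations as quoted in this article (not a complete device list).
LIMITS = {
    "ipod_through_5g": Limit("baseline", 320, 240, 30, 768),
    "iphone_4_and_earlier": Limit("baseline", 640, 480, 30, 2500),
    "android_guaranteed": Limit("baseline", 480, 360, 30, 500),
}

# Profiles ordered from least to most demanding.
PROFILE_RANK = {"baseline": 0, "main": 1, "high": 2}


def fits(limit, profile, width, height, fps, kbps):
    """Return True if the stream stays within every stated maximum."""
    return (
        PROFILE_RANK[profile] <= PROFILE_RANK[limit.profile]
        and width <= limit.width
        and height <= limit.height
        and fps <= limit.fps
        and kbps <= limit.max_kbps
    )


# A 480x360x30 Baseline stream at 500Kbps meets the Android guarantee;
# the same stream encoded with the High profile does not.
print(fits(LIMITS["android_guaranteed"], "baseline", 480, 360, 30, 500))
print(fits(LIMITS["android_guaranteed"], "high", 480, 360, 30, 500))
```

A check like this is only a planning aid. Testing on the actual target devices remains the real proof of compatibility.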
The bottom line? If producing a single file for mobile playback, you should use the Baseline profile to ensure universal playback. If producing an adaptive group of files for mobile playback, you should encode the lower-resolution files using the Baseline profile, and the recommendations Apple provides in TN2224 are a great place to start.
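As a rough illustration of that recommendation, the Python sketch below assigns a profile to each rung of an adaptive group. The ladder itself is hypothetical, and so is the decision to compare total pixel count against 640x360. The article only says that 640x360 streams and smaller should use Baseline, and it leaves the choice between Main and High for larger streams to the producer.

```python
# Streams at or below this pixel count get the Baseline profile
# (assumption: "640x360 and smaller" is judged by total pixels).
BASELINE_MAX_PIXELS = 640 * 360


def assign_profile(width, height, larger_profile="high"):
    """Pick Baseline for small streams, otherwise the producer's chosen profile."""
    if width * height <= BASELINE_MAX_PIXELS:
        return "baseline"
    return larger_profile


# Hypothetical ladder: (width, height, data rate in Kbps).
ladder = [(480, 360, 500), (640, 360, 800), (1280, 720, 2500)]

for width, height, kbps in ladder:
    profile = assign_profile(width, height)
    print(f"{width}x{height} @ {kbps}Kbps -> {profile}")
```

Passing `larger_profile="main"` switches the larger rungs to the Main profile if your own quality comparisons show that High isn't worth the extra encoding effort.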
This raises a larger question: If you do produce files using the Baseline profile for mobile playback, should you also create files using High profile exclusively for computer and OTT playback to provide the highest possible quality? In my experience, the qualitative difference between files encoded using the High and Baseline profile is often less than you might think. For this reason, you should compare the quality of files encoded using the High and Baseline profiles to make sure the additional encoding cycles are worth the effort.
Configuring Your Files
Once you’ve ensured compatibility, it’s time to shift your focus to quality. The most critical element here is the configuration of your video file(s); specifically the resolution, frame rate, and data rate. Get these wrong and your file will look awful, even if all other encoding options are perfect.
By way of background, as with all streaming-video codecs, H.264 is a lossy codec, which means the more you compress the video file, the more quality you lose. How do you ensure that you don’t over-compress your video? By monitoring a metric called bits per pixel.
Briefly, bits per pixel is the amount of data applied to each pixel in the file. The formula is the per-second data rate divided by the number of pixels per second, which you compute by multiplying the width times the height times the frame rate: data rate / (width x height x frame rate). For example, suppose you encoded a 640x360x30 fps file at 800Kbps. The bits per pixel would be 0.116, calculated by dividing 800,000/(640x360x30), or 800,000/6,912,000. Alternatively, you could just load the file into MediaInfo and let the free, cross-platform tool compute the bits per pixel for you (Figure 2), as well as provide a bunch of other meaningful encoding details.
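If you'd rather script the calculation, the formula translates directly into a few lines of Python. This is a minimal sketch; the function name is my own, and the data rate is assumed to be in bits per second.

```python
def bits_per_pixel(data_rate_bps, width, height, fps):
    """Average number of bits spent on each pixel of the encoded video."""
    pixels_per_second = width * height * fps
    return data_rate_bps / pixels_per_second


# The example from the text: 640x360 at 30 fps, encoded at 800Kbps.
bpp = bits_per_pixel(800_000, 640, 360, 30)
print(f"{bpp:.3f}")  # prints 0.116
```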
Figure 2. MediaInfo reporting the bits-per-pixel value
At 640x360x30 fps, most producers use a data rate of around 700Kbps to 1Mbps, which delivers a bits-per-pixel value of between 0.1 and 0.15. At the upper end of the scale, ESPN encodes its high-motion sports content at 1.4Mbps, or 0.203 bits per pixel. If your video is blocky, pixelated, and full of artifacts, and you’re encoding at a bits-per-pixel value of 0.1 or below, raise your data rate to bring quality into line. If your bits-per-pixel value exceeds 0.2, there’s a good chance you could produce visually similar results at a much lower data rate, saving bandwidth costs and enabling delivery to mobile devices. Try encoding at a lower data rate and see.
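These rules of thumb are simple enough to capture in a small helper. The sketch below is illustrative only: the function name and the wording of the suggestions are mine, and the 0.1 and 0.2 thresholds are the ones quoted above for 640x360 video, so they would be too high for larger frame sizes.

```python
def review_data_rate(data_rate_bps, width, height, fps, has_artifacts=False):
    """Return (bits per pixel, suggestion) using the 640x360 rules of thumb."""
    bpp = data_rate_bps / (width * height * fps)
    if has_artifacts and bpp <= 0.1:
        return bpp, "raise the data rate"
    if bpp > 0.2:
        return bpp, "try a lower data rate"
    return bpp, "within the typical range"


# A blocky 640x360x30 file encoded at 500Kbps.
bpp, suggestion = review_data_rate(500_000, 640, 360, 30, has_artifacts=True)
print(f"{bpp:.3f}: {suggestion}")
```

The suggestion is a prompt to run a test encode, not a verdict. Only a visual comparison tells you whether the new data rate holds up.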
Because codecs are more efficient at higher resolutions, the bits-per-pixel value necessary to maintain equivalent quality drops as frame sizes increase. For example, in their 720p stream, ESPN encodes at 2.8Mbps, or 0.102 bits per pixel, about half the bits-per-pixel value used for 640x360 files. The mathematical representation of this increased efficiency is quantified in the Power of .75 rule, which involves fractional exponents and is beyond my ability to verbally explain. You can read more on this.
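For readers who prefer code to prose, here is a minimal Python sketch of the rule as it is commonly stated: when the pixel count changes by some ratio, scale the data rate by that ratio raised to the 0.75 power. The function name and the example figures are my own assumptions, and the result is a starting point for testing rather than a description of what ESPN or any other site actually uses.

```python
def scale_data_rate(old_rate_bps, old_width, old_height,
                    new_width, new_height, exponent=0.75):
    """Estimate the data rate needed for similar quality at a new frame size."""
    pixel_ratio = (new_width * new_height) / (old_width * old_height)
    return old_rate_bps * pixel_ratio ** exponent


# Scaling an 800Kbps 640x360 stream up to 1280x720 (four times the pixels).
new_rate = scale_data_rate(800_000, 640, 360, 1280, 720)
old_bpp = 800_000 / (640 * 360 * 30)
new_bpp = new_rate / (1280 * 720 * 30)
print(f"{new_rate / 1000:.0f} Kbps, bits per pixel {old_bpp:.3f} -> {new_bpp:.3f}")
```

Because the exponent is less than 1, the data rate grows more slowly than the pixel count, which is why the bits-per-pixel value falls as the frame gets larger.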
For the purposes of this article, just understand that as resolutions increase, the bits-per-pixel value required to deliver equivalent quality drops. As a guide, consider the data in Table 2, from my book Producing Streaming Video for Multiple Screen Delivery, which shows data rates at the resolutions and bits-per-pixel values shown, all at 29.97 fps. The red squares suggest the appropriate data rate for each resolution, which, as you can see, drops in bits-per-pixel value as resolutions increase. Again, if your data rates are much lower than those shown and the quality doesn’t cut it, boost the data rate until quality is sufficient. If your data rates are much higher, experiment with lower data rates to see if you can produce a lower-data-rate stream that looks the same but is more efficient and cost-effective.
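A grid like this is easy to generate for your own resolutions by turning the bits-per-pixel formula around. In the Python sketch below, the resolutions and bits-per-pixel columns are illustrative choices of mine, not the values from Table 2, and the function name is hypothetical.

```python
def data_rate_kbps(width, height, fps, bpp):
    """Data rate in Kbps that yields the given bits-per-pixel value."""
    return width * height * fps * bpp / 1000


FPS = 29.97
resolutions = [(480, 360), (640, 360), (1280, 720), (1920, 1080)]
bpp_values = [0.05, 0.10, 0.15, 0.20]

# Header row, then one row of data rates (in Kbps) per resolution.
print("resolution " + "".join(f"{bpp:>8.2f}" for bpp in bpp_values))
for width, height in resolutions:
    label = f"{width}x{height}"
    cells = "".join(
        f"{data_rate_kbps(width, height, FPS, bpp):>8.0f}" for bpp in bpp_values
    )
    print(f"{label:<11}{cells}")
```

Reading down a column shows how quickly the data rate climbs if you hold bits per pixel constant, which is why the appropriate value drops at larger frame sizes.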
Table 2. Recommended data rates at various video configurations
Choose the Right Encoding Tool
Once you get the profile and configuration right, it’s time to start homing in on H.264-specific parameters. The sheer number of streaming professionals using Final Cut Pro X (FCPX) guarantees that quite a few producers are encoding with FCPX’s companion product, Apple’s Compressor. Though Compressor itself is capable, the stock Apple H.264 codec included in the product is subpar.
By way of background, H.264 is a standard, so unlike proprietary codecs such as On2’s VP6 or Microsoft’s Windows Media Video codecs, multiple parties can create their own codecs. As you’ll see, MainConcept, a German subsidiary of Rovi Corp. (formerly Macrovision), has produced an H.264 codec used in many premium encoding tools, while the open-source x264 codec is also widely supported. Apple created one of the first H.264 codecs, which was competitive in its day. However, since then, other developers have optimized their H.264 codecs, while Apple, presumably focusing on more profitable segments, let its codec languish, and its quality is now uncompetitive.
Fortunately, Compressor can access QuickTime plug-ins such as the x264Encoder plug-in that you can download at MyCometG3. Note that the developer of the site, Takashi Mochizuki, discontinued development on the plug-in at the end of 2011, so the code is frozen as of that date. That said, while there will continue to be quality and performance advances, the x264 codec at that time was highly optimized. As you can see in Figure 3, it delivers significantly better quality than the Apple codec.
Figure 3. Compressor’s native Apple codec is subpar at aggressive encoding parameters
For the record, unless otherwise stated, all comparison frames in this article were produced using my standard 720p test file, with the 29.97 fps video compressed to 800Kbps using the highest quality-related settings supported by each encoding tool, and consistent key frame, B-frame, reference frame, and similar parameters. These parameters are aggressive; by comparison, sites such as YouTube and ESPN produce their 720p, 29.97 fps video at 2.5Mbps or higher.
To be fair, if you’re producing your 720p files at 2.5Mbps or higher, you’d likely see very little difference between the Apple and x264 codecs. However, if you’re attempting to achieve the highest possible quality at the lowest possible data rate, the x264Encoder will perform much better. In particular, mobile encodes, which require a highly optimized stream to deliver the largest resolution at the lowest possible data rate, can significantly benefit from the x264 plug-in. For those unfamiliar with the plug-in, note that OnlineVideo.net published an extensive tutorial on how to install and use the x264Encoder in an article titled, “Final Cut Pro X Tutorial: How to Get Better Encoding Results.”
Use the Best H.264 Codec
Many encoding tools, such as Sorenson Squeeze in Figure 4, offer multiple H.264 codecs, including the MainConcept codec, x264 and multiple hardware-accelerated codecs such as the MainConcept CUDA codec or Intel Quick Sync Video (QSV) H.264 codecs. Which are best here?
Figure 4. Which is the best option here?
First, you should never use the MPEG-4 codec unless you’re attempting to create files specifically for older mobile phones and similar devices. For general-purpose use, H.264 always delivers higher quality at the same or lower data rates.
As between MainConcept and x264, Moscow State University issues an annual H.264 comparison that is both extensive and highly regarded. Over the last few years, x264 has proved the highest-quality option, with MainConcept in second, typically about 20% behind x264. To explain, the researchers reported that for their test clips, using the x264 codec, you could produce a file at a 20% lower data rate than MainConcept with equivalent quality.
Interestingly, Moscow State University relies solely on mathematical quality comparisons, specifically the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) metrics. Uniquely, the x264 codec offers tuning options (more later) that produce streams optimized for these tests, which apparently Moscow State University used when producing its x264 files. This potential advantage led the researchers to include this statement in their overall conclusion: “The leader in this comparison is x264 -- its quality difference (according to the SSIM metric) could be explained by the special encoding option (‘tune-SSIM’).”
That red herring aside, my own tests, which are entirely based upon subjective frame and playback comparisons, indicate that x264 produces higher quality streams than MainConcept, though the advantage seems smaller than the Moscow State University report might imply. Specifically, in most of the seven sequences in my standard test clip, the quality of the two streams was very similar. Where there was a difference, however, x264 was always better, as shown in Figure 5, where the x264 frame clearly retains more detail in the circled area.
Figure 5. x264 delivered slightly higher quality than MainConcept.
However, the difference isn’t nearly as dramatic as the x264-Apple differential shown in Figure 3. For this reason, while I use x264 when available in an encoding tool, I wouldn’t switch encoding tools because only MainConcept, and not x264, was available.
In contrast, I never use hardware-accelerated H.264 codecs such as the Intel and MainConcept CUDA codecs, since, as shown in Figure 6, they are clearly inferior to software-only encodes. As a confirmation of my subjective findings, Moscow State University rated the Intel codec 82% behind x264, with the MainConcept CUDA 170% behind.
Figure 6. Hardware accelerated codecs encode much faster but offer noticeably less quality.
Briefly, these hardware-accelerated codecs leverage the dedicated encoding cores on Intel CPUs (QSV) or the general processing power of NVIDIA GPUs (CUDA), and the design emphasis was on speed, not quality. If you need fast draft encodes, these codecs might provide some value. Otherwise, you should eschew them in favor of totally software-based H.264 codecs such as the x264 or MainConcept codecs.
What about other products that use proprietary H.264 codecs? For the most part, you’ll find these on some higher-end products such as the hardware and cloud encoders offered by Elemental Technologies. In my experience, vendors that use proprietary H.264 encoders have invested significant resources to come close to or match x264’s quality. For this reason, codec quality is seldom a significant differentiator when comparing these high-end tools.
Optimize Your x264 Encodes
One popular feature of the x264 codec is a plethora of configuration options, though with options such as “Quantizer curve compression factor” and “Use mixed refs per 8x8 partitions,” it’s likely that very few compressionists in the field can tie these options to any desired encoding outcome. Fortunately, most encoding tools that use the x264 codec let you select both a preset and a tuning mechanism, with the typical options shown in Figure 7, a screen from the x264Encoder discussed previously. Briefly, presets, which were created by the x264 development team, adjust the individual x264 encoding parameters in ways that trade off quality versus encoding time. When it comes to choosing a preset, the obvious questions are how much quality you gain and how much encoding time it costs.
Figure 7. x264 Preset and Tuning from the x264Encoder
Let’s address encoding time first. Table 3 shows the encoding times with Sorenson Squeeze when producing my 92-second test file to the data rates shown using the presets shown. I performed these tests on an HP Z820 with two 2.7 GHz Intel Xeon E5-2697 v2 CPUs, each with 12 cores (24 with HTT enabled). So this is a very fast computer, and the x264 codec is very highly optimized for multicore computers, using nearly 100% of available CPUs on all encodes. If you’re using a slower computer, times will obviously increase, as will the differences between the intervals.
Table 3. Encoding times using the x264 presets at these target data rates
From a strict performance perspective, there is little reason to choose a preset faster than Medium, which is generally the default setting, since the performance boosts at the faster settings aren’t that dramatic. How does the quality compare?
Well, in my comparisons, the only preset that showed obvious artifacts was the Ultra Fast preset, which showed blockiness at both data rates (Figure 8). At the other end of the spectrum, the differences between even the Placebo and Medium settings were very minor, and often the frames looked visually identical. In fact, there’s a surprising lack of differential between all encodes ranging from Super Fast to Placebo.
Figure 8. Files encoded using the various x264 presets
If you run a quality-focused shop and aren’t limited by throughput on your encoding stations, you could use the Very Slow preset to potentially increase quality, while only doubling encoding time. In contrast, the Placebo setting would have a much more dramatic impact on encoding time, and it’s unlikely to produce better quality than the Very Slow option.
At the other end of the spectrum, if you’re throughput-limited, you could try encoding using the Super Fast option and test to see if the files appear degraded compared to those encoded using the Medium option. However, don’t expect the time savings to be significant. If you are throughput-limited, it may be time to consider shopping for a new encoding station.
What about x264 tuning? For general-purpose footage, some pundits recommend using the Film option, though in my comparison tests using my standard test files, I’ve seen no difference. Otherwise, if you’re encoding specialized footage, try the specific tuning option, such as the Animation setting for animated footage, Touhou for footage from the Japanese computer game series of that name, Still Image for slideshows, and Grain for grainy footage.
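If you encode from the command line rather than through a GUI tool, the same preset and tuning choices are available. The Python sketch below builds an FFmpeg command line and assumes an FFmpeg build that includes the libx264 encoder. That is a different tool from the Squeeze and x264Encoder workflows discussed in this article, and the function name and file names are hypothetical. The tune list is limited to the content-oriented options I'm confident current builds accept. Touhou is left out because not every x264 build offers it.

```python
import shlex

PRESETS = (
    "ultrafast", "superfast", "veryfast", "faster", "fast",
    "medium", "slow", "slower", "veryslow", "placebo",
)
TUNES = ("film", "animation", "grain", "stillimage")


def x264_command(src, dst, kbps, preset="medium", tune=None, profile="high"):
    """Build (but do not run) an FFmpeg argument list for an x264 encode."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    if tune is not None and tune not in TUNES:
        raise ValueError(f"unknown tune: {tune}")
    cmd = [
        "ffmpeg", "-i", src,
        "-c:v", "libx264",
        "-preset", preset,
        "-profile:v", profile,
        "-b:v", f"{kbps}k",
    ]
    if tune is not None:
        cmd += ["-tune", tune]
    cmd.append(dst)
    return cmd


# An 800Kbps encode with the default Medium preset and the Film tuning.
print(shlex.join(x264_command("input.mp4", "output.mp4", 800, tune="film")))
```

To run the encode, pass the returned list to `subprocess.run`. Swapping `preset="veryslow"` or `preset="superfast"` into the call makes it easy to repeat the quality and timing comparisons on your own footage.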
What if your encoding tool doesn’t use the x264 codec? Unfortunately, the codecs and configurations used by these tools vary so much that any kind of specific advice is impossible. Fortunately, most encoding tools include presets designed to deliver the optimal blend of performance and quality. When a vendor offers presets, I would choose the preset closest to my target configuration, make sure the Profile is appropriate, and adjust the resolution and data rate to my target settings.