Recently I received a shortlist of compression-related questions from a consulting buddy; he’s always been gracious in answering my questions and I wanted to return the favor. Since he’s asking, I thought others might be curious as well; hence this blog post that largely incorporates my responses to his questions, all relating to encoding H.264 files.
Here are the questions:
– Profile: Do we need to stick with Main Profile, or can we use High?
– Level: Is there a Level or decode buffer size we should restrict ourselves to?
– Is libx264 an acceptable encoder?
– Preset: medium, slow, slower?
Here are the answers:
Do we need to stick with Main Profile, or can we use High?
When encoding H.264, you always want to choose the profile for maximum compatibility. By way of background, H.264 has multiple profiles and levels that allow hardware developers to target levels of support that meet their CPU, size, and battery life targets. Three profiles are in common use for streaming, Baseline, Main, and High, and each defines the encoding tools that can be used (like B-frames, which aren’t available in the Baseline Profile). There are many levels, from 1.0 to 6.2, that define resolution and frame rate limits. When producing the original iPod Touch, Apple limited H.264 support to the Baseline Profile up to level 3.0, which means up to 720×480 at 30 fps, presumably because it would cost too much to support HD video at 30 fps.
Back in the day, many producers included rungs with multiple H.264 profiles in their encoding ladders to ensure playback on older devices. In the encoding ladder below, pulled from Apple’s seminal TN2224 (now taken down and supplanted by the HLS Authoring Specifications), you see the recommended profiles and levels and the devices that they were meant to support.
Table 1. Encoding ladder from Apple’s gone-but-not-forgotten TN2224.
About four years ago, Apple abandoned this scheme and pronounced support for the Baseline and Main profiles obsolete, or at least vestigial. Specifically, sometime around January 2017, Apple amended TN2224 and launched the HLS Authoring Specifications. As I reported in my article Apple Makes Sweeping Changes to HLS Encoding Recommendations:
- TN2224 now says “You should also expect that all devices will be able to play content encoded using High Profile Level 4.1.”
- The Apple Devices spec then said “1.2. Profile and Level MUST be less than or equal to High Profile, Level 4.2. 1.3. You SHOULD use High Profile in preference to Main or Baseline Profile.”
Now the HLS Authoring Spec reads:
- 1.3a. For maximum compatibility, some H.264 variants SHOULD be less than or equal to High Profile, Level 4.1.
- 1.3b. * Profile and Level for H.264 MUST be less than or equal to High Profile, Level 5.2.
- 1.4. For H.264, you SHOULD use High Profile in preference to Main or Baseline Profile.
Regarding the asterisk, the Spec states “Rules with a leading asterisk (*) are modified by one or more of the Amended Requirements.” So, 1.3b appears to be new.
So, clearly, you shouldn’t exceed High Profile Level 5.2, which is 4K @ 60 fps. Shouldn’t be a problem. But it also looks like Apple is backing off its High Profile perch and recommending some rungs with lower profiles for “maximum compatibility.” What’s the best strategy here?
While most streaming publishers are loath to leave any viewers without a stream to view, understand that Baseline-only devices at this point are extremely rare. In addition, while the profile doesn’t make a huge difference at the top end of the ladder, it does on the lower end. The chart below shows the VMAF score for four relatively hard-to-encode files encoded to 360p @ 800 kbps, which should be near the bottom end of most encoding ladders. As you can see, there is a significant quality jump from the Baseline to the Main Profile; much less so from the Main to the High Profile. If you’re really concerned about leaving viewers with older legacy devices without a stream to view, leave the bottom stream at Baseline but move up to Main in the next rung.
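Here is a minimal Python sketch of that strategy: the lowest-bitrate rung stays at Baseline, the next rung moves to Main, and, as an assumption drawn from Apple’s SHOULD-use-High guidance, every rung above that uses High. The function names and the ladder in the usage example are hypothetical (only the 360p @ 800 kbps rung comes from the text above); `-c:v libx264` and `-profile:v` are the standard FFmpeg options for choosing the encoder and profile.

```python
def assign_profiles(rungs):
    """Pair each rung with an H.264 profile, lowest bitrate first.

    rungs: list of dicts with the hypothetical keys "height" and "kbps".
    """
    ordered = sorted(rungs, key=lambda rung: rung["kbps"])
    assigned = []
    for index, rung in enumerate(ordered):
        if index == 0:
            profile = "baseline"  # safety net for legacy devices
        elif index == 1:
            profile = "main"      # first step up from the bottom rung
        else:
            profile = "high"      # assumption: High everywhere else
        assigned.append((rung, profile))
    return assigned


def profile_args(profile):
    """FFmpeg arguments that select libx264 and the given profile."""
    return ["-c:v", "libx264", "-profile:v", profile]


if __name__ == "__main__":
    # Hypothetical ladder; only the 360p rung is taken from the article.
    ladder = [
        {"height": 1080, "kbps": 4500},
        {"height": 360, "kbps": 800},
        {"height": 720, "kbps": 2500},
    ]
    for rung, profile in assign_profiles(ladder):
        print(rung["height"], " ".join(profile_args(profile)))
```

If you have no Baseline-only devices to worry about, the simpler choice is to drop the first two branches and use High for every rung.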
Level: Is there a Level or decode buffer size we should restrict ourselves to?
These are really two questions regarding level and decode buffer size. Let’s start with Level. Wikipedia defines levels as “a specified set of constraints that indicate a degree of required decoder performance for a profile. For example, a level of support within a profile specifies the maximum picture resolution, frame rate, and bit rate that a decoder may use. A decoder that conforms to a given level must be able to decode all bitstreams encoded for that level and all lower levels.”
Just to put a stake in the ground, the iPhone 6 shipped in September 2014, and supported “H.264 video up to 1080p, 60 frames per second, High Profile level 4.2 with AAC-LC audio up to 160 Kbps.” According to the specs I snipped from Wikipedia below, this means a maximum data rate of 50 Mbps, and a maximum resolution/frame rate combination of 2048×1080 at 60 fps. This is a pretty old device, and so long as you’re distributing HD video (as opposed to 2K/4K) it would be tough to exceed these specifications.
In addition, as Apple specified above, so long as one or more streams are “less than or equal to High Profile, Level 4.1,” your viewers can find a stream to play.
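To make the level discussion concrete, here is a rough Python sketch that estimates the lowest level a stream needs and checks a ladder against the two Apple rules quoted earlier. The limits are a subset of the H.264 level table (macroblocks per second, macroblocks per frame, and the Baseline/Main bitrate cap) that I transcribed by hand, so treat them as assumptions and verify them against the spec or the Wikipedia table before relying on them. The function names are hypothetical, and the check ignores other constraints such as decoded picture buffer size and the profile half of the Apple rules.

```python
import math

# (level, max macroblocks/second, max macroblocks/frame, max kbps for
# Baseline/Main). Hand-transcribed subset of the H.264 level table:
# verify before use. High Profile allows 1.25x the listed bitrate.
LEVEL_LIMITS = [
    ("3.0", 40_500, 1_620, 10_000),
    ("3.1", 108_000, 3_600, 14_000),
    ("3.2", 216_000, 5_120, 20_000),
    ("4.0", 245_760, 8_192, 20_000),
    ("4.1", 245_760, 8_192, 50_000),
    ("4.2", 522_240, 8_704, 50_000),
    ("5.0", 589_824, 22_080, 135_000),
    ("5.1", 983_040, 36_864, 240_000),
    ("5.2", 2_073_600, 36_864, 240_000),
]


def minimum_level(width, height, fps, kbps):
    """Lowest level in LEVEL_LIMITS that fits the stream, or None."""
    # H.264 codes 16x16 macroblocks, so partial blocks round up.
    macroblocks = math.ceil(width / 16) * math.ceil(height / 16)
    for level, max_mb_per_sec, max_mb_per_frame, max_kbps in LEVEL_LIMITS:
        if (macroblocks <= max_mb_per_frame
                and macroblocks * fps <= max_mb_per_sec
                and kbps <= max_kbps):
            return level
    return None


def meets_hls_level_rules(levels):
    """Check ladder levels against HLS Authoring Spec rules 1.3a and 1.3b."""
    numeric = [float(level) for level in levels]
    all_within_cap = all(value <= 5.2 for value in numeric)   # 1.3b (MUST)
    has_compatible = any(value <= 4.1 for value in numeric)   # 1.3a (SHOULD)
    return all_within_cap and has_compatible


if __name__ == "__main__":
    print(minimum_level(1920, 1080, 60, 8000))
    print(meets_hls_level_rules(["3.0", "4.0", "4.2"]))
```

With these limits, 1080p at 60 fps lands at Level 4.2, which matches the iPhone 6 specification quoted above, and 4K at 60 fps lands at Level 5.2, Apple’s current ceiling.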
Video Buffer Verifier
Regarding Decode Buffer Size, there are technical limits that relate to the different levels but I’ve never seen these become a problem. Everything I know about the Video Buffer Verifier is contained in an article entitled, Book Excerpt: VBV Buffer Explained. As I explained in that article, “the VBV buffer is a model used by the encoder to ensure that the bitstream produced doesn’t exceed the limitations of the decoder. For example, if you set the VBV buffer to 5000 kbps, the encoder produces a stream that won’t underrun or overrun a 5,000 kbps buffer. The model was created and deployed back in the days when most decoding was performed in very limited hardware and was necessary to ensure smooth playback on these devices.”
You set the VBV in your application interface or FFmpeg command string. Typically the buffer size is 1-2X bitrate. If I’m producing 200% constrained VBR, I typically set the buffer at 2X the data rate, like so:
-b:v 2000K -maxrate 4000K -bufsize 4000K
As you can read in the Book Excerpt article, increasing the buffer size improves quality but also increases data rate variability. So, if I’m producing CBR video, I would set the buffer at 1x the data rate, like so:
-b:v 2000K -maxrate 2000K -bufsize 2000K
In terms of exceeding any maximum and crashing playback, for the Book Excerpt article, I captured and analyzed multiple files from various streaming producers and podcasts and checked their configuration in MediaInfo. For streaming files where stream variability might have been an issue in 2016, the buffer averaged about 2.5X the target bitrate. For downloadable podcasts, where stream variability is irrelevant, the buffer size averaged 5.4X, but most of that came from an Apple podcast that had a bitrate of 5 Mbps and a buffer size of 31.250 Mbps. From these findings, it seems that 1-2X for the VBV is pretty safe.
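If you generate your FFmpeg commands from a script, the two fragments shown above reduce to a little arithmetic. Here is a minimal Python sketch; the function and parameter names are hypothetical, while `-b:v`, `-maxrate`, and `-bufsize` are the same FFmpeg options used in the commands above.

```python
def vbv_args(target_kbps, maxrate_pct=200, buffer_multiple=2.0):
    """Build FFmpeg bitrate and VBV arguments for constrained VBR or CBR.

    maxrate_pct: peak rate as a percentage of the target (200 = 200%
                 constrained VBR, 100 = CBR).
    buffer_multiple: VBV buffer size as a multiple of the target bitrate.
    """
    maxrate = round(target_kbps * maxrate_pct / 100)
    bufsize = round(target_kbps * buffer_multiple)
    return [
        "-b:v", f"{target_kbps}K",
        "-maxrate", f"{maxrate}K",
        "-bufsize", f"{bufsize}K",
    ]


def buffer_ratio(bufsize_kbps, bitrate_kbps):
    """Express a buffer size reported by MediaInfo as a multiple of bitrate."""
    return bufsize_kbps / bitrate_kbps


if __name__ == "__main__":
    print(" ".join(vbv_args(2000)))            # 200% constrained VBR, 2X buffer
    print(" ".join(vbv_args(2000, 100, 1.0)))  # CBR, 1X buffer
    print(buffer_ratio(31250, 5000))           # the Apple podcast cited above
```

The two `vbv_args` calls reproduce the constrained VBR and CBR strings shown above, and the last line works out the Apple podcast’s buffer at 6.25X its bitrate.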
Is libx264 an acceptable encoder?
Because H.264 is a standard, there are multiple codecs for encoding H.264, and x264 is a very prominent H.264 codec used in FFmpeg and many other encoding tools and services. From my perspective, libx264 is certainly the best “free alternative.” If you’re a high-volume operation looking for the absolute best quality or the fastest possible encoding speed, it might be worth talking to Beamr, MainConcept, and IDT (now Renesas), or hardware vendors like Xilinx and NetInt for live transcoding.
Which x264 Preset: Medium, Slow, Slower?
The preset controls x264’s encoding speed/quality tradeoff. The following chart, from my course Streaming Media 101: Technical Onboarding for Streaming Media Professionals, tracks encoding time, mean VMAF quality, and low-frame quality by preset over files produced using 200% constrained VBR. Low-frame VMAF is the lowest quality frame in the video, and it’s a predictor of transient quality issues.
To explain, the Medium preset, which is the default, delivers 99.5% of overall VMAF quality and 98.7% of low-frame quality in 13.38% of the time it takes to encode using the Placebo preset. I wouldn’t go lower than the Faster preset because of low-frame scores, but Faster is actually pretty close to Medium and comes close to doubling throughput. Chasing quality beyond Medium delivers minimal gains with significant boosts in encoding time/cost. With the test files that I used, it never made sense to go beyond Veryslow because the Placebo preset tripled the encoding time and actually decreased low-frame quality.
There is no single “best” preset; choose the one that delivers the desired throughput and quality. If you want to learn how to create a chart like this for your encoder and test files, check out my course Computing and Using Video Quality Metrics; a Course for Encoding Professionals.
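As a starting point for that kind of testing, here is a hedged Python sketch that builds one 200% constrained VBR encode per x264 preset and normalizes whatever results you measure. It doesn’t run FFmpeg or compute VMAF; you would time each command and score each output with your own tools. The function names, output file names, and default bitrate are hypothetical, and scaling each column to its highest value is my assumption about how a chart like this is normalized.

```python
X264_PRESETS = [
    "ultrafast", "superfast", "veryfast", "faster", "fast",
    "medium", "slow", "slower", "veryslow", "placebo",
]


def preset_commands(source, target_kbps=2000):
    """One FFmpeg command per preset, using 200% constrained VBR."""
    commands = []
    for preset in X264_PRESETS:
        commands.append([
            "ffmpeg", "-y", "-i", source,
            "-c:v", "libx264", "-preset", preset,
            "-b:v", f"{target_kbps}K",
            "-maxrate", f"{2 * target_kbps}K",
            "-bufsize", f"{2 * target_kbps}K",
            "-an",                      # video only, so timing isn't skewed by audio
            f"out_{preset}.mp4",        # hypothetical output name
        ])
    return commands


def as_percent_of_max(results):
    """Scale each measurement to a percentage of the column maximum.

    results: {preset: (encode_seconds, mean_vmaf, low_frame_vmaf)},
             filled in from your own measurements.
    """
    columns = list(zip(*results.values()))
    maxima = [max(column) for column in columns]
    return {
        preset: tuple(round(100 * value / top, 2)
                      for value, top in zip(values, maxima))
        for preset, values in results.items()
    }


if __name__ == "__main__":
    for command in preset_commands("input.mp4"):
        print(" ".join(command))
```

Once every preset has been encoded and scored, the normalized numbers show how much quality each extra minute of encoding time actually buys.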
For the record, when encoding H.264 files with other codecs, the compression tool may or may not offer presets. If it does, the presets won’t have the same names as the x264 presets, which are used exclusively for the x264 H.264 codec and the x265 HEVC codec.