Fundamentals of H.264 Encoding Webinar-see Replay

Jan Ozer, September 3, 2013

This webinar was held on September 4, 2013. You can click here to register and view the archived event.

Learn the basics of H.264 encoding while standing on one leg (well, in 30 minutes, you can assume any position that you’d like). Click here to register; description below.

Who this free seminar is for: H.264 is the most widely used codec today, primarily because H.264-encoded video can be played on all computers and mobile devices. However, several key configuration options like profiles and levels control both compatibility and quality. This webinar is for producers encoding with the H.264 codec who want to learn how to optimize the playback compatibility and quality of their H.264-encoded files.

What it covers:

What profiles are and how they impact quality and compatibility for computers and mobile devices.
Which profiles to use to ensure compatibility with computers and mobile devices.
What entropy encoding is and how the two techniques, CABAC and CAVLC, impact quality and compatibility.
What B-frames and reference frames are and how they impact quality.
How to use presets to optimize encodes with the MainConcept and x264 codecs.

What you will learn: You will learn which profiles to use for your H.264 encodes, how to choose an entropy encoding technique, and which B-frame and reference frame intervals produce the best quality.

Click here to register for the webinar and view the replay.

Transcript (this is a work in process)

Here’s the transcript. I apologize for any rough edges; I cleaned up the transcript a bit, but didn’t rigorously edit or proofread.

Here’s the program, which covers the concepts you absolutely need to know to encode H.264 files. Every encoding tool is different, but virtually all provide a limited set of H.264 encoding parameters that will be consistent from tool to tool. And those are the ones covered here.

Profile is the most important because it determines compatibility as opposed to quality. Quality is nice, of course, but if the file won’t play on your target device, quality is irrelevant. So my perspective is compatibility first, and then you worry about fine tuning quality. So we’ll cover H.264 profiles first because they determine the compatibility of the stream.

Then we’ll look at entropy encoding. If you’ve heard of CABAC or CAVLC, those are the parameters we’ll be talking about. Then, we’ll look at B-frames and reference frames; what they are, what the optimal settings are and how they impact quality.

Since many producers work with the x264 and MainConcept codecs, I’ll provide configuration recommendations for both.

________________________________________________________________________________________

Okay. So what is a profile? As shown in Wikipedia, a profile defines a set of encoding tools or algorithms that can be used to generate a bit stream. The three most common profiles that we see in H.264 encoding tools are the Baseline, Main, and High profiles. And as you can see in the slide, the Baseline uses the fewest of the advanced tools on the left. The Main profile uses a few more. And then, the High profile uses all but two of those shown in the chart.

________________________________________________________________________________________

The Baseline profile uses the fewest advanced algorithms and techniques. So in theory, it would give you the lowest quality stream and the easiest stream to decode; right? Not a lot of advanced algorithms in the stream itself, so it’s going to be easier to decode by lower power devices.

High profile uses the most advanced algorithms. So you would expect both better quality at a given data rate, and also, longer encoding time and most relevant, increased difficulty to decode. You’d need a higher-powered device to play back a High profile stream than a Baseline stream. Looking at the Wikipedia chart shown above, there are a lot of advanced techniques used in the High profile that aren’t used in the Baseline profile, so you’d expect a huge difference in quality. Hold that thought.

________________________________________________________________________________________

So why do profiles exist? Because H.264 was designed to be a very broad spec that could be used by a lot of different products and a lot of different producers for a lot of different uses. So, as an example, when Apple came out with the first video capable iPod, they wanted to include H.264 video. But they wanted a long battery life. They wanted a low cost. And they wanted a small size. So they couldn’t design-in a very high-power processor. So, basically, they said, “We’ll support the Baseline profile, which means we can have longer battery life and a lower cost bill of materials.” And anybody producing for that platform knows that they need to produce video using the Baseline profile.

So profiles really are a meeting point between the hardware developers and the video producers. If you want to have video play on the original video capable iPod, you need to use the Baseline profile.

________________________________________________________________________________________

So rule number one is don’t exceed the profile of the device that you’re targeting. So if you’re targeting that iPod I keep referring to, encode using the Baseline profile. Now, computers and OTT devices (and when I say “OTT,” I mean Over The Top: Roku boxes, Apple TV, Boxee, PlayStation Portable, all the devices you connect to your TVs) can all play the High profile.

So if you’re just producing for computer playback, you’d use the High profile and forget about Baseline and Main. The problem arises when you start to produce for multiple screens.
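Rule number one can be written down as a simple ordering check. This is only a minimal sketch: the names `PROFILE_RANK` and `can_play` are made up for illustration, and it looks at the profile alone, ignoring levels, resolution and data rate.

```python
# Profiles ordered from easiest to hardest to decode.
PROFILE_RANK = {"baseline": 0, "main": 1, "high": 2}


def can_play(device_max_profile: str, file_profile: str) -> bool:
    """Return True if the file's profile does not exceed the device's highest supported profile."""
    return PROFILE_RANK[file_profile.lower()] <= PROFILE_RANK[device_max_profile.lower()]


# A Baseline-only device cannot play a High profile file,
# but a High-capable device can play a Baseline file.
print(can_play("baseline", "high"))  # False
print(can_play("high", "baseline"))  # True
```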

________________________________________________________________________________________

This chart includes all devices ever produced by Apple that play H.264 video. And on the extreme left in green is the original video capable iPod that I keep referring to. And that only plays videos using the Baseline profile. And the next category is iPhones and iPod touches, up to but not including Version 4, that also will only play the Baseline profile. And then, the next batch of phones and the first iPad play Main profile. And then, going on from there, the iPhone 5 and the iPhone 4S and the new iPad play High profile.

So if you want to produce files that play on all these devices, you’re going to have to either produce a lot of files using the different profiles, or produce a lowest common denominator file that uses the Baseline profile. And we’ll look at how Apple suggests doing that, in a moment.
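The lowest common denominator idea can be sketched in a few lines. The device groups below paraphrase the chart as described in the talk; the dictionary and function names are hypothetical, and the grouping is an approximation rather than an official Apple compatibility list.

```python
# Profiles ordered from easiest to hardest to decode.
PROFILE_ORDER = ["baseline", "main", "high"]

# Highest profile each device group plays, as described for the Apple chart.
APPLE_DEVICE_MAX_PROFILE = {
    "original video iPod": "baseline",
    "iPhone / iPod touch before version 4": "baseline",
    "next batch of phones and first iPad": "main",
    "iPhone 4S, iPhone 5, new iPad": "high",
}


def common_profile(device_groups):
    """Return the most advanced profile that every listed device group can play."""
    return min(
        (APPLE_DEVICE_MAX_PROFILE[group] for group in device_groups),
        key=PROFILE_ORDER.index,
    )


# Targeting every group forces Baseline; dropping the oldest devices allows Main.
print(common_profile(APPLE_DEVICE_MAX_PROFILE))
print(common_profile(["next batch of phones and first iPad",
                      "iPhone 4S, iPhone 5, new iPad"]))
```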

________________________________________________________________________________________

Now, here’s Android. Now, I can put together that Apple chart because Apple has a very limited number of devices. I mean, it seems like a lot, but there’s really only 15 or 20 devices that are easy to categorize within that nice spreadsheet. Unfortunately, Android is a different story.

Android has many, many hardware vendors. So not only do you have more products, the hardware vendors themselves don’t specify what profile their devices play back. My daughter has an Android phone; the HTC Rhyme. I tried to look up its specs, but HTC doesn’t specify H.264 playback capabilities. I have a Toshiba Thrive tablet. I tried to look up its specs, and Toshiba doesn’t either.

Most vendors don’t. So even if you wanted to create a spreadsheet like the one we did for the i-devices, you couldn’t practically do it. So what Google says is, “Hey, if you want your videos to play in software only, you’ve got to use the Baseline profile for all encodes.”

As a practical matter, my Toshiba Thrive, which shipped around the same time as the iPad 2, plays files encoded using the Main profile. I did those tests. And I’m pretty sure that most Android smart phones are as powerful as the iPhone shipped around the same time. So if it’s a brand new Android phone, I would guess that it’s going to play at least the Main profile, if not the High profile. But if you want to be 100 percent sure, you have to encode using the Baseline profile.

________________________________________________________________________________________

If you’re producing a single file, like a 640 x 360 file to play back everywhere, you would use the Baseline profile. That’ll play on all phones. That’ll play on all computers, all OTT devices. If you’re producing separate files for separate devices, you would customize each file for the target group.

________________________________________________________________________________________

What if you’re producing for adaptive streaming? Briefly, adaptive streaming is where you take a single input file, and you produce multiple output files that are distributed based upon the bandwidth of the player or the CPU capabilities of the player.

Let’s look at Apple’s recommendation as provided in Tech Note TN2224. It’s probably a little bit more detailed than most producers use. But you get the theory. They’re basically saying, “Okay. We’re going to produce one set of files for cell phones, or for lower power devices or for devices that are connected on the low speed connections. And we’re going to use the Baseline profile because they’re compatible, as you see on the right, with those devices.”

As the devices get more powerful, as bandwidths increase, Apple recommends using the Main profile, which all iPads and the iPhone 4S will play. Newer iPhones will even play the High profile.

So as the files get larger and are no longer compatible with the older devices, Apple says, “We’ll use the Main profile and use the High profile for the 1080p file.” Now, most producers don’t produce all of these files; right? They might produce five separate files from a 720p video, but the theory is the same.
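To make the theory concrete, here is a rough sketch of how a player might choose among the files in an adaptive group. Everything in it is an assumption for illustration: the ladder is not Apple's actual TN2224 table, the 720p and 1080p data rates are invented, the profile assigned to each rung is a guess at the pattern described above, and real players use more elaborate heuristics.

```python
from dataclasses import dataclass

PROFILE_RANK = {"baseline": 0, "main": 1, "high": 2}


@dataclass(frozen=True)
class Rendition:
    width: int
    height: int
    kbps: int
    profile: str


# Illustrative ladder only: Baseline at the bottom, Main in the middle, High for 1080p.
LADDER = [
    Rendition(416, 234, 200, "baseline"),
    Rendition(640, 360, 600, "baseline"),
    Rendition(960, 540, 1400, "main"),
    Rendition(1280, 720, 3000, "main"),    # hypothetical data rate
    Rendition(1920, 1080, 6000, "high"),   # hypothetical data rate
]


def pick_rendition(bandwidth_kbps, device_max_profile, ladder=LADDER):
    """Pick the highest data rate file the device can decode and the connection can sustain."""
    playable = [r for r in ladder
                if PROFILE_RANK[r.profile] <= PROFILE_RANK[device_max_profile]]
    if not playable:
        raise ValueError("no rendition this device can decode")
    affordable = [r for r in playable if r.kbps <= bandwidth_kbps]
    if affordable:
        return max(affordable, key=lambda r: r.kbps)
    # Connection is slower than every playable file: fall back to the smallest one.
    return min(playable, key=lambda r: r.kbps)


print(pick_rendition(1000, "high"))       # limited by bandwidth
print(pick_rendition(10000, "baseline"))  # limited by the device's profile
```

The second call shows why the profile choice matters in a ladder: a Baseline-only device on a fast connection still tops out at the best Baseline file.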

The big question is, do you create a separate set of files using the High profile for computer and OTT playback and produce another set of files using Apple’s recommendations for mobile? Obviously, this only makes sense if the quality difference between the files encoded using the High profile and those encoded using the Baseline and Main profiles is significant.

________________________________________________________________________________________

To test that, I created one set of files that were close to Apple’s recommendation. Then I created another set of files using the High profile. And then, we can go and we can compare the quality of those files. You can see them for yourself, here. By way of background, all files were encoded using the Elemental cloud encoder, which is a reputable, high-quality encoder. Let’s see what we found.

________________________________________________________________________________________

This is the lowest quality iteration recommended by Apple. This is 416 x 234 at, I think, 200 kilobits per second. High profile is on top. Baseline profile is on the bottom. And, again, you can download the slides so you can see this in full detail. And you can go to my website and compare the actual encoded files.

Even when I look at the PDF, or even when I look at the original files, I don’t see a huge amount of difference between the High profile and the Baseline profile.

________________________________________________________________________________________

And 480 x 270, same difference. By way of background, this is a test file that I produced about three or four years ago. It’s 93 seconds long and I’ve used it to test everything from different codecs to different encoding tools. So I’m very familiar with which scenes are hard to compress, which frames really illustrate differences between technologies or codecs. And these are the frames that we’re looking at now.

In this particular frame, the dancer is panning across the stage. So there’s a lot of motion. There’s a lot of detail in the wall behind her. So, this is a very difficult to compress frame. And I’m not seeing any significant difference between the Baseline and the High profile.

________________________________________________________________________________________

The next comparison point was 640 x 360 at 600 kilobits per second, pretty much the same frame. Again, I’m not seeing a lot of difference between the two comparisons.

________________________________________________________________________________________

The last one that I looked at is 960 x 540 at 1400 kilobits per second. And, again, I’m not seeing significant difference between the High profile on the right and the Baseline profile on the left.

So let me just summarize. From my perspective, if you’re producing for adaptive streaming, you need one set of files using the Baseline and Main profile for mobile viewing. I don’t see a lot of need to produce a separate set of files using the High profile for computer and OTT playback.

You should definitely run your own tests. But from what I’m seeing and what you can see on my website, I don’t see a lot of benefit to separate files for the separate targets.

If you’re producing a single file, say a 640 x 360 @ 600 kbps file for mobile and desktop viewing, I don’t see a lot of benefit to producing one using the High profile and one using the Baseline profile. I think you’re going to see very little difference with most files. It’s hard to understand why, given the technical differences between the profiles, but I’ve done these tests multiple times with multiple encoding tools, and the results are very consistent.

________________________________________________________________________________________

I first looked at this issue about a year ago on a consulting project where a client asked this very question. He was converting SD DVDs to adaptive groups, and was encoding four iterations using the Baseline profile for mobile. He wondered if we should also produce a separate set using the High profile for computer playback. I ran the tests shown above which you can view on URLs presented on the following slide.

________________________________________________________________________________________

I found that profile only mattered in the most extreme encoding parameters, and recommended that the client distribute one set of Baseline encoded files. And that’s what the client ultimately did. So here’s the article on my website; check out the files and draw your own conclusions.

________________________________________________________________________________________

Now, what do we know about what actual producers are doing? Well, from my experience, most large customers are producing separate sets of files. I wrote an article about Turner Broadcasting’s NBA League Pass, maybe 18 months ago. And they shared their encoding parameters. And as you can see, they’ve got one for web, one for mobile and one for OTT. So I’m guessing Turner did their own analysis. And, either, they decided that they wanted the configurations different, or they felt that they wanted to leverage the High profile. But they decided to do multiple sets of files, one each for each target.

________________________________________________________________________________________

And in my experience, I did some work for a large three-letter network in the United States. I did some work for an independent movie site in the US. And they also produced different sets of files for each relevant target. One for mobile, one for computer playback, and one for OTT.

In contrast, smaller sites, particularly those working with SD files, are mostly doing Baseline only. So that’s kind of where my experience falls out.

________________________________________________________________________________________

Overall, I don’t see that there’s any right or wrong approach. I think it’s important to recognize that there may be less difference between the High and the Baseline profiles than you think. Don’t assume that there’s a significant enough difference that you need to produce two sets of files. Do the work. Perform the tests that I did. And then, draw your own conclusions.

________________________________________________________________________________________

Okay. Real quickly, entropy encoding. This is a screen from Telestream Episode. As you can see, there are two options, CABAC and CAVLC. CABAC is the high-quality, harder-to-decode option. CAVLC is the lower-quality, easier-to-decode option.

________________________________________________________________________________________

In my experience, the quality differential is not that much. Some authorities say it’s as much as 15 percent difference between CABAC and CAVLC, but I’ve never seen that.

On the other hand, the additional decode overhead is also very small. For example, several years ago, I ran comparative tests on some older computers. CABAC was about 4% harder to decode, and that was on a really old G3-based Mac, not even Intel. On faster computers, the difference is negligible.

All that said, my general rule is do what YouTube does. As you can see in the MediaInfo analysis of a YouTube file presented above, whenever YouTube encodes using the High profile, they use CABAC encoding. So if CABAC is available, meaning the High, or the Main profile, I recommend CABAC.
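The rule of thumb above reduces to a one-line decision. A minimal sketch, with `entropy_coder` as a hypothetical helper name: CABAC whenever the profile allows it, CAVLC otherwise (the Baseline profile does not include CABAC).

```python
def entropy_coder(profile: str) -> str:
    """Return CABAC when the profile supports it (Main, High), otherwise CAVLC."""
    return "CABAC" if profile.lower() in {"main", "high"} else "CAVLC"


for profile in ("Baseline", "Main", "High"):
    print(profile, "->", entropy_coder(profile))
```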

________________________________________________________________________________________

Now onto configuring B-frames. B-frames stands for bi-directionally interpolated frames. They are, in theory, the most efficient frames, because they can find redundancies in frames both before and after them in the stream. So theoretically, you want the most possible B-frames to get the highest possible quality.

The problem with B-frames is they’re harder to decode because you’ve got all the referenced frames in memory and they may be retrieved out of order. That’s why B-frames aren’t available in the Baseline profile; just the Main and the High profile.
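The out-of-order point is easier to see with frame labels. The sketch below is a simplified model, not how any particular encoder orders its output: it assumes every B-frame references only the nearest I- or P-frame on either side and is never itself used as a reference, so the later reference frame has to arrive before the B-frames that depend on it.

```python
def decode_order(display_frames):
    """Reorder frames from display order into a plausible decode order.

    Each frame is a label such as "I0", "B1" or "P4". B-frames are held back
    until the next I- or P-frame, which they may reference, has been emitted.
    """
    out, pending_b = [], []
    for frame in display_frames:
        if frame.startswith("B"):
            pending_b.append(frame)
        else:
            out.append(frame)      # the reference frame goes first
            out.extend(pending_b)  # then the B-frames that depend on it
            pending_b.clear()
    out.extend(pending_b)          # any trailing B-frames
    return out


display = ["I0", "B1", "B2", "B3", "P4", "B5", "B6", "B7", "P8"]
print(decode_order(display))
# ['I0', 'P4', 'B1', 'B2', 'B3', 'P8', 'B5', 'B6', 'B7']
```

The decoder has to hold I0 and P4 in memory while it rebuilds B1 through B3, which is the extra work the paragraph above refers to.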

________________________________________________________________________________________

Typical B-frame related parameters are the number of B-frames and the number of reference frames. The number of B-frames is the maximum number of B-frames in a row. So an interval of three is three B-frames between every I-frame and P-frame or every P-frame and P-frame. Reference frames are the number of frames that each B-frame can reference when they’re being encoded.
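As a rough illustration of the B-frame interval, the sketch below lays out one group of pictures in display order. It is an assumption-laden simplification: `gop_pattern` is a made-up helper, the group is closed with a P-frame, and every run uses the full number of B-frames, whereas the setting is a maximum and real encoders may use fewer in a given run.

```python
def gop_pattern(gop_length: int, b_frames: int) -> str:
    """Return a display-order frame-type string, e.g. 'IBBBPBBBP'.

    gop_length: total frames in the group, starting with the I-frame.
    b_frames:   B-frames placed between consecutive I/P frames.
    """
    if gop_length < 1 or b_frames < 0:
        raise ValueError("gop_length must be >= 1 and b_frames >= 0")
    frames = ["I"]
    while len(frames) < gop_length:
        remaining = gop_length - len(frames)
        run = min(b_frames, remaining - 1)  # leave room for the closing P-frame
        frames.extend("B" * run)
        frames.append("P")
    return "".join(frames)


print(gop_pattern(9, 3))  # IBBBPBBBP
print(gop_pattern(5, 0))  # IPPPP
```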

________________________________________________________________________________________

To try and determine the optimal B-frame value, I ran some test encodes using 0, 3, 5 and 10 B-frames. I encoded at both low and high data rates. You can download the PDF handouts via a link above to view these comparisons in more detail.

________________________________________________________________________________________

At high data rates, I saw no real difference. At more aggressive rates, as shown above, there were minor differences. For example, on the piano above, zero B-frames is clearly the most blurry, while three B-frames seems to be the clearest.

________________________________________________________________________________________

In the higher motion frame above, zero B-frames is really ugly and three B-frames is the most clear.

________________________________________________________________________________________

So when I encode, I use three B-frames. That seems to be the optimal number. Again, you would think that you would see a huge difference between 0 and 5 or 3 and 10. But in my experience, you just don’t. I’m comfortable with the 3, but I don’t think it’s going to make a night and day kind of difference.

________________________________________________________________________________________

What about reference frames? Reference frames, as I said, are frames from which the encoded frame can get redundant information. What’s the trade off? The higher the number of reference frames, the more searching for redundancies, which translates to longer encoding times. Because most redundancies will be in frames proximate to the encoded frame, you would assume diminishing returns; once you go beyond a certain number, the quality improvement gets less and less. A higher number of reference frames also means a greater decode load, but that’s likely negligible.

________________________________________________________________________________________

What’s the optimal value? To test this, I encoded my standard test file using Sorenson Squeeze on an HP Z800 workstation. I encoded using 1, 5 and 16 reference frames. And you can see that the 16 reference frames did increase the encoding time pretty significantly.

________________________________________________________________________________________

What about quality? Here, again, you don’t see a huge amount of difference. But the sharpest image does appear to be the five reference frames.

________________________________________________________________________________________

Here are my recommendations.

________________________________________________________________________________________

Okay. A lot of encoding tools use the x264 codec. It is the highest quality H.264 codec available, according to numerous studies, including my own. And most encoding tools that use it present the encoding parameters pretty much the same way. First you choose a preset. Then you choose a tuning. These are screen shots from Telestream Episode. But as I said, they’re pretty uniform in all encoding tools.

________________________________________________________________________________________

So, question one is what preset do you use? You see the test file and test system and the results. The difference in encoding time between ultrafast, the fastest preset, and placebo, the slowest preset, is staggering. It’s 2487 times longer if you use the placebo approach. So you definitely don’t want to say, “Hey, I’ll encode everything using the placebo.” Because it really will extend your encoding times dramatically.

In contrast, medium seems to be a nice compromise. It’s 84 percent longer than ultrafast. The next highest preset is slower, which is almost three times slower than medium.
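To put those ratios in one place, here is a small worked calculation. It is a sketch under stated assumptions: encoding time is normalized so ultrafast equals 1.0, "84 percent longer" is read as a 1.84x multiplier, and "almost three times slower than medium" is treated as an upper bound of 3x medium. The ten-minute job is hypothetical.

```python
# Encoding time relative to ultrafast (ultrafast = 1.0).
ULTRAFAST = 1.0
MEDIUM = ULTRAFAST * (1 + 0.84)  # "84 percent longer than ultrafast"
SLOWER_CEILING = MEDIUM * 3      # "almost three times slower than medium", taken as a ceiling


def estimated_minutes(ultrafast_minutes: float, factor: float) -> float:
    """Scale a measured ultrafast encode time by a relative-time factor."""
    return ultrafast_minutes * factor


# A hypothetical job that takes 10 minutes at ultrafast:
print(round(estimated_minutes(10, MEDIUM), 1))          # 18.4 minutes at medium
print(round(estimated_minutes(10, SLOWER_CEILING), 1))  # at most about 55.2 minutes at slower
```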

________________________________________________________________________________________

What about quality? In the figure above, medium is on the left, slower on the right. And if you really, really study the two frames, you’ll see that the slower file does have a little bit of additional detail, but it’s not striking.

Basically, I use slower most of the time because I’m a low-volume encoding shop. But if you’re encoding a lot of files and a three hundred percent increase in encoding time would require you to buy a new encoder, medium is a good option.

________________________________________________________________________________________

What about tuning? I don’t recommend tuning for general purpose video, though it’s worth a shot for animation. But many respected compressionists, including Ben Waggoner, now with Amazon, formerly with Microsoft, disagree. Ben recommends using the film-tuning preset for general purpose use. I ran some comparative tests on my standard test files, and saw no difference, so I’m not in the same camp.

________________________________________________________________________________________

So, here are my recommendations. You should note that results will vary by encoding program, encoding platform and required throughput. I was on a very fast computer, or what was a very fast computer a year ago, when I ran this test. And I was using Episode.

That said, most programs that use x264 deliver similar quality and performance results. But, if you’re using very long files or very complex files, you may see some difference in the results. So you definitely should run some comparison tests yourself.

As I said, the preset default for Episode, and pretty much every program with x264, is medium. In a low volume environment, like my environment, I usually use slower. Otherwise, I would use medium. And then, for tuning, if your content matches a tuning preset, like animation or still image, I would give that a try. If you’d like to experiment with film for general purpose video, that’s worth a try, too. But I wouldn’t expect to see a lot of difference there.
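The talk used Telestream Episode, but the same preset and tuning choices can be expressed on the command line. The sketch below assumes FFmpeg built with libx264, which is not a tool covered in the webinar; the helper name, file names and the 600 kbps default are placeholders, and audio settings are left to FFmpeg's defaults.

```python
import shlex


def x264_ffmpeg_args(src, dst, profile="high", preset="medium", tune=None, kbps=600):
    """Build an ffmpeg argument list for an x264 encode with a preset and optional tuning."""
    args = [
        "ffmpeg", "-i", src,
        "-c:v", "libx264",
        "-profile:v", profile,  # baseline, main or high
        "-preset", preset,      # e.g. medium for volume work, slower for low-volume work
        "-b:v", f"{kbps}k",     # target video data rate
    ]
    if tune:
        args += ["-tune", tune]  # e.g. animation, stillimage or film
    args.append(dst)
    return args


# Low-volume setup: slower preset, no tuning.
print(shlex.join(x264_ffmpeg_args("input.mp4", "output.mp4", preset="slower")))
# Animated content: medium preset with the animation tuning.
print(shlex.join(x264_ffmpeg_args("cartoon.mp4", "cartoon_out.mp4", tune="animation")))
```

Returning a list rather than a single string keeps the arguments safe to hand to `subprocess.run` without shell quoting issues.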

________________________________________________________________________________________

What about MainConcept? Most encoding tools that let you adjust quality use a quality/performance slider. There’s low-quality fast encoding on the left, and high-quality slower encoding on the right. What’s the optimum number?

________________________________________________________________________________________

In contrast with x264, you don’t see a huge increase in encoding time, with minimal difference in time between 1 and 13, and only 68% difference at the top. What about quality?

________________________________________________________________________________________

Above, the frame on the left was encoded at a setting of 1, the middle frame at 10 and the frame on the right at 16. You see a pretty significant difference in the pita between 1 and 10. There’s some graininess and blockiness in 1, very little of that in 10. And you don’t see a lot of difference between 10 and 16.

________________________________________________________________________________________

So when I’m working with the MainConcept encoder, in my low volume environment, I use 16. I’ll just encode to the highest possible quality because it doesn’t increase the encoding time that dramatically.

If you’re in a high volume environment and you need throughput, use the highest number that delivers the required throughput. I wouldn’t go much below 13 because I think you may start to see quality degrade beyond that.
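Picking "the highest number that delivers the required throughput" is a simple search once you have measured your own encoder. In this sketch the function name is hypothetical and the frames-per-second figures are invented placeholders, not results from the tests described here; substitute measurements from your own system.

```python
from typing import Dict, Optional


def best_setting(measured_fps: Dict[int, float], required_fps: float) -> Optional[int]:
    """Return the highest quality-slider setting whose measured speed meets the requirement.

    measured_fps maps a slider setting (1 to 16) to the encoding speed you measured for it.
    Returns None if no setting is fast enough.
    """
    fast_enough = [setting for setting, fps in measured_fps.items() if fps >= required_fps]
    return max(fast_enough) if fast_enough else None


# Invented example measurements (frames per second at each slider setting).
measurements = {1: 60.0, 10: 55.0, 13: 52.0, 16: 36.0}

print(best_setting(measurements, required_fps=30))  # 16
print(best_setting(measurements, required_fps=50))  # 13
```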

________________________________________________________________________________________
