Lesson of the Week: Codecs are Not Generic

I discuss the bandwidth savings delivered by VP9 and HEVC over H.264 in my course Streaming Media 101: Technical Onboarding for Streaming Media Professionals. I wanted to illustrate this with my own tests, so I used FFmpeg to encode H.264, HEVC, and VP9 output using about 25 short test files. This felt like a good time to discuss the difference between a codec and a technology standard.

When we discuss codecs, we tend to speak in generalities like “HEVC is 40% more efficient than H.264” or “AV1 delivers the same quality as HEVC at 70% of the data rate.” The problem with this approach is that it’s too imprecise. A great example of this is shown in Figure 1, a figure from the Moscow State University Codec Comparison 2019 Part III: 4K Content, Objective Evaluation ($950, free version available here).

In the chart, MSU compares all codecs using x264 as the benchmark at 100% of performance. The first HEVC codec, Intel’s SVT-HEVC, delivers the same quality as x264 at 93% the bitrate, while the best performing codec, MainConcept HEVC, delivers the same quality at 64% the bitrate. Both are HEVC, but depending upon how you do the math, MainConcept is about 31% more efficient than SVT-HEVC, and about 12% more efficient than x265, the HEVC codec in FFmpeg that I use for most of my tests.
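The phrase "depending upon how you do the math" deserves a quick illustration. The following is a minimal sketch, using only the two percentages quoted above (93% and 64% of the x264 bitrate); the function names are my own and purely illustrative. It shows that the same pair of numbers yields about 31% when expressed as bitrate saved, and about 45% when expressed as the extra bitrate the weaker codec needs.

```python
# Bitrate needed to match x264 quality, as a percentage of the x264 bitrate.
# Both values are the figures quoted from the MSU chart in the text above.
SVT_HEVC_PCT = 93.0
MAINCONCEPT_PCT = 64.0


def bitrate_savings(better_pct: float, worse_pct: float) -> float:
    """Fraction of bitrate the better codec saves relative to the worse one."""
    return 1.0 - better_pct / worse_pct


def extra_bitrate_needed(better_pct: float, worse_pct: float) -> float:
    """Fraction of additional bitrate the worse codec needs to match the better one."""
    return worse_pct / better_pct - 1.0


savings = bitrate_savings(MAINCONCEPT_PCT, SVT_HEVC_PCT)
extra = extra_bitrate_needed(MAINCONCEPT_PCT, SVT_HEVC_PCT)

print(f"MainConcept saves {savings:.1%} of the SVT-HEVC bitrate")
print(f"SVT-HEVC needs {extra:.1%} more bitrate than MainConcept")
```

Both statements describe the same measurement, so when a study quotes a single efficiency figure it is worth checking which of the two calculations was used.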

It’s also worth noting that this chart is based upon the VMAF metric. A different table in the report documents the same comparison using the SSIM metric, with different results: MainConcept dropped to third, and VP9 was actually 10% less efficient than H.264. This comparison was based on eleven clips of general content, which did not include any games, animations, or organized sports.

Of course, HEVC isn’t the only codec with multiple implementations. In their article entitled Performance comparison of video coding standards: an adaptive streaming perspective, Netflix reported results using both the reference VP9 encoder from Google and a third-party encoder that they actually use for production. The difference in performance between the two VP9 codecs in one data set was 25% based upon PSNR. In performing these studies, Netflix used three sets of files, one from MPEG, one from the Alliance for Open Media, and one from their own archives.

There will also be multiple AV1 implementations. One of the early leaders is the WZAurora AV1 encoder shown in the table below from Part II of the MSU Codec Comparison 2019, which was based upon subjective comparisons. Here, the WZAurora AV1 encoder delivered the same quality as x264 at 42% the bitrate, proving about 22% more efficient than the closest HEVC encoder. This evaluation included five clips, none of which contained gaming, animated sequences, or organized sports.

Looking forward, you should expect multiple versions of LCEVC, EVC, and VVC codecs.

Evaluating the Evaluations

Most researchers, like MSU and Netflix, are very precise with their nomenclature and identify the specific codecs that they tested. When you read a codec comparison, the first thing you should identify is the specific codecs involved.

MSU is one of the few shops that test codecs other than those incorporated into FFmpeg; unfortunately, as we’ve seen, the FFmpeg codecs aren’t the best of breed. If the FFmpeg codecs are the ones you’ll be using, the results of studies that test them are relevant, but if you’re an OTT producer considering adding HEVC, AV1, or VP9 to your production encodes, you should consider evaluating third-party alternatives that provide better performance.

Other factors to consider when attempting to determine the relevance of a study include:

  • Whether the researchers compared the test clips subjectively or via metrics and, if the latter, which metrics they used.
  • What type of clips were tested, and whether they are relevant to your programming.
  • Which encoding parameters the researchers used. As an example, both MSU and Netflix used the placebo preset for the x264 and x265 codecs, which I’m guessing few producers use in their production encoders. Would the results be different with the default medium preset? Tough to say, but if you use a different preset than the study did, you can’t assume its results apply to your encodes.
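One way to answer the preset question for your own content is to encode the same clip with both presets and compare the results. The sketch below only builds and prints the FFmpeg command lines; it does not run them. The source file name, the 3M target bitrate, and the output naming scheme are all assumptions chosen for illustration, not the settings used by MSU, Netflix, or in my own tests.

```python
import shlex

# Hypothetical inputs: substitute your own clip and target bitrate.
SOURCE = "test_clip.mp4"
TARGET_BITRATE = "3M"

ENCODERS = ("libx264", "libx265")  # FFmpeg encoder names for x264 and x265
PRESETS = ("medium", "placebo")    # default preset vs. the one the studies used


def build_command(source: str, encoder: str, preset: str, bitrate: str) -> list[str]:
    """Return an FFmpeg argument list for one encoder/preset combination."""
    output = f"{encoder}_{preset}.mp4"
    return [
        "ffmpeg", "-y",
        "-i", source,
        "-c:v", encoder,
        "-preset", preset,
        "-b:v", bitrate,
        "-an",  # drop audio so file sizes reflect video only
        output,
    ]


commands = [
    build_command(SOURCE, encoder, preset, TARGET_BITRATE)
    for encoder in ENCODERS
    for preset in PRESETS
]

for command in commands:
    print(shlex.join(command))
```

Running the printed commands and then scoring each output with the metric you trust would show whether the preset changes the ranking for your clips.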

The odds of finding a study that tests videos relevant to your service using the specific codecs and encoding parameters that you use are obviously slim. That said, the more the study diverges from these specifics, the less useful it is in determining the savings you should expect from implementing a new codec. With three codecs shipping from MPEG in 2020, and AV1 hitting its stride with hardware support, it’s going to be a big year for encoding studies.

If you’re interested in learning more about how to run codec comparisons and about metrics like SSIM, PSNR, and VMAF, please check out my course Computing and Using Video Quality Metrics.

About Jan Ozer

I help companies train new technical hires in streaming media-related positions; I also help companies optimize their codec selections and encoding stacks, and evaluate new encoders and codecs.
