Over the last few years, the Moscow State University Graphics and Media Lab (MSU) has produced the most highly respected H.264 codec comparisons available. In October, MSU released its first HEVC comparison, which promises to achieve the same significance with the new codec. During the testing, which involved 20 HD video clips encoded to an exhaustive array of configurations, MSU compared eight HEVC codecs, including x265 and two codecs each from Intel and Ittiam, while also assessing how HEVC compared to Google’s VP9 and the winner of most previous H.264 trials, x264. As usual, MSU described its findings in free and Pro ($850) versions of a report.
The free version contains 29 figures and SSIM comparisons based upon the Y color plane, while the Pro version includes more than 5,000 figures, and SSIM and PSNR comparisons measured on the Y, U, V and overall color planes. If you buy the Pro report, you can also download the source video clips for your own testing.
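For readers unfamiliar with the metrics, here is a minimal sketch of a PSNR calculation on a single color plane. It uses the standard PSNR definition, not MSU's own tooling; the function name and the tiny sample planes are hypothetical and exist only for illustration.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Return PSNR in dB between two planes given as flat sequences of samples.

    Standard definition: 10 * log10(max_value^2 / MSE).
    max_value=255 assumes 8-bit video (an assumption for this sketch).
    """
    if len(reference) != len(distorted) or len(reference) == 0:
        raise ValueError("planes must be non-empty and the same size")
    # Mean squared error between the source and the encoded samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical planes
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 luma (Y) planes: source frame vs. decoded frame.
ref_y = [16, 128, 200, 235]
enc_y = [18, 126, 201, 235]
print(f"Y-plane PSNR: {psnr(ref_y, enc_y):.2f} dB")
```

The same function could be run separately on the U and V planes. SSIM is more involved, since it compares local luminance, contrast and structure rather than raw per-sample error.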
The report is significant in several key ways. Obviously, any company currently evaluating HEVC codecs will find the comparisons invaluable. The report also provides an interesting look at how HEVC compares to alternative codecs VP9 and x264. In this respect, the report might be a touch controversial.
To explain, MSU tested three scenarios. Fast transcoding required a minimum of 30 frames per second, Universal encoding required a minimum of 10 frames per second, and Ripping involved no speed requirements. For most VOD producers, the Ripping comparisons are the most relevant, and these are the ones shown in the chart below.
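The speed thresholds for the three scenarios can be sketched in a few lines of Python. The 30 and 10 frames-per-second minimums come from the report as described above; the function name is hypothetical, and treating a fast preset as eligible for the slower scenarios as well is an assumption made for this sketch, not a statement of MSU's rules.

```python
# Minimum encoding speeds (frames per second) for each MSU scenario.
# Ripping has no speed requirement, so its minimum is 0.
USE_CASE_MIN_FPS = {
    "Fast transcoding": 30,
    "Universal": 10,
    "Ripping": 0,
}


def qualifying_use_cases(encode_fps):
    """List the scenarios whose minimum speed a measured encode would meet."""
    return [name for name, min_fps in USE_CASE_MIN_FPS.items()
            if encode_fps >= min_fps]


# Hypothetical measured speeds for three encoder presets.
for fps in (42.0, 12.5, 1.3):
    print(fps, qualifying_use_cases(fps))
```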
In the Ripping comparison, MulticoreWare’s x265 codec was the winner, but second place went to VP9, about 6% behind the leader. x264 placed fifth, roughly 21% behind the leader. In terms of the VP9 vs. HEVC comparisons, MSU's results roughly parallel those reported in Streaming Media, where VP9 trailed x265 by about 9% using the VQM metric, but by less than 1% using the SSIMPlus metric. However, several peer-reviewed studies involving 4K clips and subjective comparisons showed HEVC to be up to 50% more efficient than VP9 (see PDF here).
In terms of the H.264 comparisons, even at 1080p, most encoding vendors claim that HEVC delivers a 30-50% advantage, while YouTube encodes 1080p VP9 videos at 43% lower data rates than H.264, and 720p clips at a 35% lower data rate. This makes the 21% advantage found by MSU seem conservative in comparison.
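To put those percentages side by side, the short sketch below converts each claimed saving into a data rate. It reads every percentage as a data-rate reduction relative to H.264, which is how the paragraph above frames them; the 5,000 kbps H.264 baseline is a hypothetical figure chosen only to make the arithmetic concrete.

```python
def bitrate_after_savings(baseline_kbps, savings_percent):
    """Data rate left after cutting a baseline by the given percentage."""
    return baseline_kbps * (100 - savings_percent) / 100


# Hypothetical 1080p H.264 data rate, for illustration only.
H264_BASELINE_KBPS = 5000

claims = [
    ("MSU result (x265 vs. x264)", 21),
    ("Vendor claims, low end", 30),
    ("YouTube VP9 at 1080p", 43),
    ("Vendor claims, high end", 50),
]

for label, percent in claims:
    kbps = bitrate_after_savings(H264_BASELINE_KBPS, percent)
    print(f"{label}: {percent}% lower -> {kbps:.0f} kbps")
```

On that hypothetical baseline, the gap between the 21% found by MSU and the 50% at the top of vendor claims is the difference between 3,950 kbps and 2,500 kbps.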
As you can read in the interview below, Dr. Dmitriy Vatolin, who heads up MSU’s Graphics and Media Lab, intends to add both 4K clips and subjective testing to future projects. In the meantime, for any companies planning to choose an HEVC codec in the near term, both reports present an invaluable snapshot of the state of HEVC codecs today.
Here’s our interview with Dr. Vatolin.
Streaming Media: About how long does this study take, and about how many man hours?
Vatolin: We plan each study about one year in advance, and actively work on the project for about six months. We have no exact statistics in man hours, but for this study, we spent about 2.5 times more than was originally planned.
That’s because currently, many of the HEVC codecs are far from ideal, and some were really buggy. We regularly report mistakes and problems to the developers, who provide an update that often also has problems. This cycle can iterate several times, especially for codecs that don’t perform particularly well (winners rarely seem to complain). We take this approach deliberately, because it ensures that we represent each codec in its most favorable light, and it protects the integrity of our tests.
This is very similar to where H.264 codecs were about ten years ago (at the time when we were preparing the first and second annual H.264 codec comparisons), when our studies experienced lots of problems and delays until the products matured. We expect to see similar progress with HEVC codecs over the next few years.
Streaming Media: What were the most important findings in the new HEVC study in your opinion?
Vatolin: We’re still in the early days of HEVC codec implementations, so it’s too early to draw any final conclusions about these technologies. Nevertheless, it’s important to test now, if only to understand where we are in the overall market development.
Streaming Media: Who is the big winner, and who is the big loser?
Vatolin: Developing a new codec is very difficult, so we feel that all participants are winners. They just have different market shares at the moment.
Streaming Media: I noticed that x264 is very close to even the best HEVC quality, which is delivering nowhere near the promised 50% savings over H.264. How do you account for this?
Vatolin: If you find and compare implementations of MPEG-2 from 1992 and 2012, you will see significant progress in RD (Rate Distortion) curves, meaning dramatic improvements in quality. We saw this with H.264 as well. Today we’re looking at the initial implementations of HEVC that have yet to experience this improvement.
Also, our tests peaked at 1080p, and HEVC should really start to shine in 4K. We hope to show 4K comparisons by the end of the year, with 10-bit and other higher-end comparisons also on the testing roadmap.
Streaming Media: What are your thoughts on how VP9 compares to HEVC? Many in the HEVC community have asserted that it’s very far behind in terms of quality, but your report showed otherwise.
Vatolin: Sometimes such things happen. Definitely impressive work by Google’s development team, but we’ll hold our applause until we see the results for 4K.
Streaming Media: I noticed that you rely on the SSIM (Structural Similarity) quality metric a lot, rather than VQM (Video Quality Metric), which we find more useful. Can you explain your thoughts?
Vatolin: With the volume of tests that we run, we have to prioritize speed, and SSIM is much faster. We also include PSNR results in the Pro version. Next year, however, we hope to introduce true subjective comparisons and to publish those results.
Streaming Media: Speaking of the Pro report, who is the target customer, and what’s the typical reason that they buy it?
Vatolin: The target customers are professionals in this area who are building solutions with integrated codecs, from software designers to aircraft manufacturers. We started offering the Pro report because we were receiving lots of questions about details not included in the free report. Because our results include more tests than we can show in 150-200 report pages, this year, for the first time, we included all 5,000+ figures in an applet that ships in the report package.
Streaming Media: How do you see MSU’s role in the greater compression community?
Vatolin: We see our role as providing objective data to help companies improve their solutions in multiple ways. For example, in one H.264 comparison produced several years ago, a codec from ATI proved exceptionally fast because it used a unique architecture. We received multiple messages asking if the codec used the GPU, which it didn’t. One year later, during the next annual comparison, we received several codecs using the same architecture that matched ATI’s speed.
In this way, we see that open competitions really motivate companies to improve their solutions. In companies that develop or implement codecs, managers need independent and accurate information about the current market situation. We commonly provide more detailed information with more codecs (and fresher versions of codecs) than a company could test internally. We know of several prominent companies that stopped their own H.264 development to license other solutions, or even to buy other companies, based upon our results.
Streaming Media: Why does MSU perform the codec comparison?
Vatolin: Our team has a very long history of producing objective comparisons. Many years ago, we started by comparing archivers, then image coders like different implementations of JPEG and JPEG2000. During the last 13 years, we have compared video codecs, and in the last five years we started with stereoscopic 3D and multiview artifacts measurement, which is an extremely interesting project that will greatly improve the quality of 3D content. Recently we performed a large comparison of video matting methods. These are great projects for our students and researchers to work on, and maybe we just make this world a little bit better?