Publishers and encoding companies alike are beginning to embrace VP9, Google’s open source codec. Here’s how it stacks up on quality and data rates.
The value proposition for VP9 is clear, as stated in Figure 1: “Adaptive HD streaming with 1/2 the data of H.264!” Half the data rate cuts your bandwidth and storage costs and allows you to reach more viewers with better quality video on slower connections. It also cuts your customers’ monthly data costs, a major issue now that many ISPs are instituting monthly bandwidth caps.
Many producers are starting to explore the benefits associated with distributing VP9-encoded video to desktop/notebook browsers and some mobile platforms instead of H.264. While you won’t be able to wean yourself off of H.264 entirely for many years yet, VP9 delivery is definitely a concept whose time has come. In this article, I’ll compare multiple aspects of each, from encoding to delivery to player creation. Beyond this, I’ll briefly touch on the IP situation and conclude with a brief mention of AV1, the codec that will soon replace VP9.
VP9, WebM, and DASH
Let’s start with a brief overview of VP9, which is an open source codec from Google, courtesy of its purchase of On2 Technologies in 2010. The first codec open-sourced by Google was VP8, which was paired with the Vorbis audio codec in the WebM file structure, which was based upon the Matroska media container.
VP9, the next iteration of the codec, was first introduced in mid-2013 and first deployed by YouTube a few months later in September. Also in 2013, the WebM format was expanded to incorporate the Opus audio codec, which is most often paired with VP9. You can deliver a single file containing VP9 and Opus in a WebM file or produce multiple VP9/Opus streams in DASH packaging for adaptive streaming.
Figure 1. JW Player supports VP9 in its eponymous player and online video platform service.
VP9 is the last iteration of VPx, as Google formed the Alliance for Open Media in September 2015 to consolidate open source codec development with Mozilla, Cisco, Microsoft, Intel, and others. The first alliance codec, called AV1, which I discuss later in the article, should be released sometime between December 2016 and March 2017.
With this, let’s start our look at VP9 by examining file quality.
Same Quality at Half the Data Rate? Close!
To test quality, I encoded three video clips at three different resolutions at five different data rates. The first clip is a short segment from Blender Foundation’s Tears of Steel (TOS) movie, representing mostly traditional movie content. The second is a short segment from Blender’s Sintel movie, representing animated content, and the third, which I call the New clip, is composed of multiple clips to simulate real-world video. Working in Adobe Premiere Pro CC, I produced very high data rate H.264 mezzanine clips in three resolutions with the same horizontal pixel count, but a vertical count that varied by clip. For example, the smallest New clip was 1280×720, while the Sintel/TOS clips were 1280×576. These mezzanine clips were the starting points for all encodes.
I used FFmpeg for all test encodes. For the VP9 encodes, I used the VOD recommended settings from the WebM Wiki, changing both the speed and frame parallel to 0 at Google’s recommendation. For x264, I used two-pass encoding with the veryslow preset, with maxrate and buffer set to 150 percent of the target data rate, essentially 150 percent constrained VBR with a bit of wiggle room in the buffer. The Keyframe interval was set to 3 seconds for all tests.
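As a rough illustration of the settings just described, the sketch below builds FFmpeg argument lists for the two encoders. It is not the command line used for the tests: the frame rate, file names, and helper names are assumptions, and the other WebM Wiki VOD settings are omitted because the article does not list them.

```python
# Sketch of the two test configurations as FFmpeg argument lists.
# FPS, file names, and function names are assumptions for illustration.

FPS = 24                  # assumed frame rate; not stated in the article
KEYFRAME_SECONDS = 3      # keyframe interval used for all tests


def x264_args(src, target_kbps, pass_no, out):
    """Two-pass x264, veryslow preset, 150 percent constrained VBR."""
    cap_kbps = int(target_kbps * 1.5)   # maxrate and buffer at 150 percent of target
    return [
        "ffmpeg", "-y", "-i", src,
        "-c:v", "libx264", "-preset", "veryslow",
        "-b:v", f"{target_kbps}k",
        "-maxrate", f"{cap_kbps}k", "-bufsize", f"{cap_kbps}k",
        "-g", str(FPS * KEYFRAME_SECONDS),
        "-pass", str(pass_no),
        out,
    ]


def vp9_args(src, target_kbps, pass_no, out):
    """VP9 with speed and frame parallel both set to 0."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-c:v", "libvpx-vp9",
        "-b:v", f"{target_kbps}k",
        "-speed", "0", "-frame-parallel", "0",
        "-g", str(FPS * KEYFRAME_SECONDS),
        "-pass", str(pass_no),
        out,
    ]


print(" ".join(x264_args("mezzanine.mp4", 2000, 2, "h264_2000.mp4")))
print(" ".join(vp9_args("mezzanine.mp4", 1000, 2, "vp9_1000.webm")))
```

Each list can be handed to subprocess.run once per pass; the first pass would normally write to a null output rather than a real file.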
I encoded each clip at five different data rates, which varied by clip. As an example, I encoded the 1920×856 Sintel/TOS clips at 1000Kbps, 1500Kbps, 2000Kbps, 2500Kbps, and 3000Kbps. In all tests, this left two comparison points for each clip in which VP9 was encoded at 50 percent of the data rate of H.264. For example, with the 1920×856 TOS clip, I could compare VP9 at 1000 with H.264 at 2000, and VP9 at 1500 with H.264 at 3000 (Figure 2). Since there were two comparison points for each of nine test clips, that meant there were 18 total comparison points to test the “same quality at 50 percent of the data rate” premise.
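The count of comparison points follows directly from the ladder. The short sketch below (the function name is mine) finds every pairing in which the VP9 rate is exactly half the H.264 rate, using the 1920×856 Sintel/TOS ladder quoted above.

```python
def half_rate_pairs(ladder_kbps):
    """Return (vp9_kbps, h264_kbps) pairs where VP9 runs at half the H.264 rate."""
    rates = set(ladder_kbps)
    return [(rate, 2 * rate) for rate in sorted(rates) if 2 * rate in rates]


ladder = [1000, 1500, 2000, 2500, 3000]   # 1920x856 Sintel/TOS ladder from the text
pairs = half_rate_pairs(ladder)
clips = 3 * 3                             # three clips at three resolutions each

print(pairs)                # [(1000, 2000), (1500, 3000)]
print(len(pairs) * clips)   # 18
```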
Figure 2. VP9’s quality consistently and substantially exceeded H.264’s over all tested data rates.
To test quality, I used the Peak Signal-to-Noise Ratio (PSNR) metric, calculated using the Moscow State University Video Quality Measurement Tool. In four of the 18 cases, the PSNR score of the VP9 video was higher than the H.264 video at twice the data rate. If I added a 5 percent tolerance by multiplying the H.264 score by 0.95, VP9 won in 14 of 18 test cases. At 10 percent tolerance, all VP9 files exceeded the quality of their H.264 counterparts. As you can see in Figure 2, the quality differential was fairly consistent along the entire data continuum.
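For readers unfamiliar with the metric, here is a minimal sketch of the standard PSNR formula for 8-bit samples, plus the tolerance comparison described above. It is not the Moscow State University tool, which works on full frames and averages across a clip, and the scores in the usage lines are made-up values, not test results.

```python
import math


def psnr(reference, encoded, peak=255):
    """PSNR in dB between two equal-length sequences of 8-bit sample values."""
    if len(reference) != len(encoded) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, encoded)) / len(reference)
    if mse == 0:
        return math.inf          # identical signals
    return 10 * math.log10(peak ** 2 / mse)


def vp9_wins(vp9_psnr, h264_psnr, tolerance=0.0):
    """True if VP9 beats the H.264 score after scaling it by (1 - tolerance)."""
    return vp9_psnr > h264_psnr * (1 - tolerance)


# Hypothetical scores, for illustration only.
print(vp9_wins(38.0, 39.0))          # False: strict comparison
print(vp9_wins(38.0, 39.0, 0.05))    # True: 39.0 * 0.95 = 37.05
```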
Codec benchmarking is an inexact science, and all comparisons usually result in more questions than answers. To supplement my trials, I spoke with JW Player’s lead compression engineer, Pooja Madan, who designed and implemented JW’s VP9 encoding facility. She advised that the company was achieving about 50 percent savings overall on its encoding ladder, hence the claim shown in Figure 1. Most other companies I spoke with while writing this article reported similar results.
But You’ll Need Lots of Time
The Achilles’ heel of VPx-based codecs has always been encoding time, and VP9 is no exception. I produced all performance numbers discussed later on an HP Z840 workstation with dual 3.1 GHz E5-2687W processors with 10 cores each and Hyper-Threading Technology (HTT), for a total of 40 logical cores. All source files were stored on and encoded files delivered to Turbo SSD G2 drives. This is an extremely fast and capable system that you can read about in a series of benchmarking reviews in Streaming Media Producer.
On the Z840, encoding the 96-second 720p New file into H.264 format at 2Mbps took 98 seconds. Encoding with the Google-recommended parameters delivered the same file in VP9 format in 19:10, about 12 times longer. When I interviewed JW Player for this article, Madan reported that the company had invested significant time to achieve the optimal balance between encoding time and quality.
I asked if she would share her FFmpeg scripts to use in my testing, and she (and the company) agreed. The results were impressive. Specifically, the JW Player command line script produced the 720p New file @ 2Mbps in 4:59, about 26 percent of the time of the Google script. I checked the PSNR value for the JW Player-encoded file against the file created using Google recommended parameters, and the JW score was about 5 percent higher.
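The single-file timings reported above reduce to a few ratios. The snippet below simply recomputes them from the stated times; the helper name is mine.

```python
def seconds(mmss):
    """Convert an 'M:SS' string to seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + int(secs)


h264 = 98                         # seconds for the 96-second 720p New file
google_vp9 = seconds("19:10")     # 1150 seconds
jw_vp9 = seconds("4:59")          # 299 seconds

print(round(google_vp9 / h264, 1))     # 11.7  (Google script vs. H.264)
print(round(jw_vp9 / google_vp9, 2))   # 0.26  (JW script vs. Google script)
print(round(jw_vp9 / h264, 1))         # 3.1   (JW script vs. H.264)
```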
During encoding, I tracked CPU utilization and noticed that when rendering a single file, VP9 barely moved the needle. I decided to check performance with multiple encoders running. To accomplish this, I created multiple folders on the two SSD drives and ran each of the command line scripts eight and then 12 times simultaneously. As you can see in Table 1, multiple encodes reduced the H.264 to VP9 encoding differential with the JW Player script down to less than 4 seconds per file when encoding 12 files simultaneously.
Table 1. Encoding times for H.264 and two VP9 test scripts.
These results raise two questions. The first asks why Google would produce a codec that performed so poorly on multiple-core workstations. The answer is that this performance model fits perfectly into its current encoding schema. That is, Google doesn’t encode each input file from start to finish in a single encoding instance; it encodes all files in parallel, splitting each source into chunks and then sending them off to different encoding instances.
In the context of Google’s encoding system, VP9 isn’t slow at all. Heck, at 40.5 seconds per file, it’s only about 10 percent slower than H.264. However, this performance schema puts the burden on the developer to create an efficient encoding program or platform. Of course, VP9 isn’t the first codec that fails to efficiently leverage multiple-core systems, and encoding programs such as Telestream Episode have long used a technique called split and stitch to improve performance on multiple-core systems. Much like YouTube’s parallel encoding schema, split and stitch divides a single input file into multiple parts that are encoded separately in parallel and then stitched back together for final output. Essentially, the 12-simultaneous encode test shown in Table 1 simulates this parallel encoding operation, which pushed the Z840 to 98 percent+ usage (Figure 3).
Figure 3. Twelve simultaneous VP9 encodes pushed the Z840 to the max.
VP9 isn’t slow; it’s just highly inefficient in a multiple-core environment, which makes it tougher for developers to design encoding systems that operate efficiently. For this reason, expect to see substantial differences in encoding times (and quality) in programs that support VP9. Developers creating their own encoders should design their architecture from the start knowing that they’re going to have to deploy a system like Google’s to minimize VP9 encoding time and maximize efficiency. This is especially true in the cloud, where you pay for an instance by the hour, and the ability to spread encoding chores over as few CPUs and CPU-hours as possible translates directly to the bottom line.
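To make the split-and-stitch idea concrete, here is a minimal sketch of how such a pipeline might be planned around FFmpeg. It is not Google’s or Telestream’s implementation: the chunk length, bitrate, file names, and function names are all assumptions, and a production system would also need to handle two-pass statistics and audio.

```python
import math
import subprocess
from concurrent.futures import ThreadPoolExecutor


def plan_chunks(duration_s, chunk_s):
    """Split a clip into (start, length) chunks, in seconds."""
    count = math.ceil(duration_s / chunk_s)
    return [(i * chunk_s, min(chunk_s, duration_s - i * chunk_s)) for i in range(count)]


def chunk_encode_args(src, start, length, index):
    """FFmpeg arguments to encode one chunk to VP9 (settings are placeholders)."""
    return [
        "ffmpeg", "-y", "-ss", str(start), "-t", str(length), "-i", src,
        "-c:v", "libvpx-vp9", "-b:v", "2000k",
        f"chunk_{index:03d}.webm",
    ]


def stitch_args(list_file, out):
    """FFmpeg concat demuxer call that joins the encoded chunks without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", list_file, "-c", "copy", out]


def run_parallel(commands, workers):
    """Run the chunk encodes side by side, one FFmpeg process per worker."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))


# A 96-second clip cut into 12-second chunks (a multiple of the 3-second keyframe interval).
chunks = plan_chunks(96, 12)
print(len(chunks))    # 8
print(chunks[-1])     # (84, 12)
```

Keeping the chunk length a multiple of the keyframe interval means each chunk starts where a keyframe would have fallen anyway, which is the main design choice in the plan above.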
And You’ll Have to Experiment
Here’s the second question: If JW Player’s script produced both better quality and a faster encoding time than Google’s recommendations, why didn’t I use it for my quality tests? JW Player’s script uses a technique called capped constant rate factor (CRF), which tells FFmpeg to deliver a file at a certain quality level but capped at a certain bitrate. For example, the JW script used a CRF value of 30 and a data rate of 2Mbps. This tells FFmpeg to encode the file to a quality level of 30, but to cap the data rate at 2Mbps.
If a file is easy to compress, such as a talking head clip, the entire clip will likely be well under 2Mbps. If the clip is hard to compress, the entire clip will likely be capped at 2Mbps. So when encoding with capped CRF, the bit rate, and the resultant file size, will vary from clip to clip depending upon content.
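In FFmpeg’s libvpx-vp9 encoder, combining -crf with a nonzero -b:v selects this kind of quality-with-a-ceiling mode. The sketch below shows only those two settings and a toy model of the resulting data rate; it is not JW Player’s script, and the function names and example rates are assumptions.

```python
def capped_crf_args(src, out, crf=30, cap_kbps=2000):
    """FFmpeg arguments for a capped CRF VP9 encode (other settings omitted)."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-c:v", "libvpx-vp9",
        "-crf", str(crf),           # target quality level
        "-b:v", f"{cap_kbps}k",     # ceiling on the data rate
        out,
    ]


def expected_kbps(quality_kbps, cap_kbps):
    """Toy model: the rate the quality level needs, limited by the cap."""
    return min(quality_kbps, cap_kbps)


# Hypothetical clips: an easy talking head and a hard action sequence.
print(expected_kbps(900, 2000))     # 900: well under the cap
print(expected_kbps(3500, 2000))    # 2000: held at the cap
```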
This presents a challenge when comparing codecs, because it’s always necessary to check output file size to ensure that the encoder met the target data rate. Obviously, this can’t be done with capped CRF-encoded files. I didn’t think it was appropriate to use a capped CRF approach for benchmarking, though I certainly would consider it for production. If you’re designing your own system, be sure to incorporate this feature, and if you’re buying an encoder, verify that capped CRF is available before purchase.
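The file-size check mentioned above is simple arithmetic. The sketch below converts a file size and duration into an average data rate and compares it with the target; the 5 percent tolerance is my assumption, and a real check would subtract audio and container overhead first.

```python
def average_kbps(size_bytes, duration_s):
    """Average data rate of a file in kilobits per second."""
    return size_bytes * 8 / 1000 / duration_s


def met_target(size_bytes, duration_s, target_kbps, tolerance=0.05):
    """True if the average rate is within the tolerance of the target (assumed 5 percent)."""
    return abs(average_kbps(size_bytes, duration_s) - target_kbps) <= target_kbps * tolerance


# A 96-second clip at exactly 2Mbps would be 24,000,000 bytes.
print(average_kbps(24_000_000, 96))        # 2000.0
print(met_target(24_000_000, 96, 2000))    # True
print(met_target(12_000_000, 96, 2000))    # False: half the target rate
```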
Finally, for those encoding their own VP9 files, Madan was kind enough to share her top four VP9 encoding takeaways:
- Use two-pass encoding; one pass does not perform well.
- With two-pass encoding, generate the first pass log for the largest resolution and then reuse it for the other resolutions. VP9 handles this gracefully.
- While VP9 allows much larger CRF values, we noticed that CRF < 33 speeds up the encoding process considerably without significant losses in file size savings.
- You must use the “tile-columns” parameter in the second pass. This provides multi-threaded encoding and decoding at minor costs to quality.
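The four takeaways above can be combined into one plan: a single first pass at the largest resolution, then a second pass per rung that reuses the same log and adds tile columns. The sketch below builds those FFmpeg argument lists. The ladder, CRF value, tile-column and thread counts, and file names are hypothetical, and whether a shared first-pass log suits your content is something to verify yourself.

```python
import os

LADDER = [(1080, 4000), (720, 2000), (360, 800)]   # hypothetical (height, kbps) rungs


def pass_args(src, height, kbps, crf, pass_no, log_prefix, out, extra=()):
    """Arguments for one pass of a two-pass VP9 encode at a given height."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", f"scale=-2:{height}",
        "-c:v", "libvpx-vp9",
        "-crf", str(crf), "-b:v", f"{kbps}k",
        "-pass", str(pass_no), "-passlogfile", log_prefix,
        *extra,
        "-an", out,                 # audio dropped here for brevity (assumption)
    ]


def ladder_commands(src, ladder, crf=30, log_prefix="vp9_ladder"):
    """One first pass at the top rung, then a second pass for every rung."""
    top_height, top_kbps = max(ladder)      # largest resolution drives pass 1
    commands = [pass_args(src, top_height, top_kbps, crf, 1, log_prefix,
                          os.devnull, extra=("-f", "null"))]
    for height, kbps in ladder:
        commands.append(pass_args(src, height, kbps, crf, 2, log_prefix,
                                  f"out_{height}p.webm",
                                  extra=("-tile-columns", "2", "-threads", "4")))
    return commands


for command in ladder_commands("mezzanine.mp4", LADDER):
    print(" ".join(command))
```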
I’ll add that you shouldn’t assume that Google’s recommendations are the best practices for your needs. The bottom line is that you may have to invest substantial time to create the optimal mix of encoding time and output quality.
You’ll Have a Growing List of Encoding Options
As of NAB 2016, you should have more options when choosing an encoder, both on-premise and in the cloud. On the enterprise-encoding front, Telestream announced that it would incorporate VP9 encoding into the Vantage Transcode Multiscreen program by mid-2016. Exploring what motivated Telestream to integrate VP9 into Vantage 3 years after the codec became available, I asked Paul Turner, VP of enterprise product management, “Why VP9, and why now?” He says, “New codecs show up all the time; as an encoding supplier, we look at all of them but don’t instantly add them to our products. At this point, we believe VP9 has legs and is a viable alternative for customers who have been encoding in H.264 and other UHD codecs.”
Brightcove also announced the addition of VP9 encoding for both progressive download (in WebM format) and adaptive streaming (in DASH) to its Zencoder cloud encoding service. I also asked David Sayed, Brightcove’s VP of product management, “Why now?” “We’re starting to see customer interest in VP9 especially in Asia-Pacific,” he says. “The interest there is less about higher-than-HD resolutions and more about bandwidth savings due to the codec, and the WebM Project’s position on royalties.” Sayed also says that Brightcove expects to add support for VP9 in its online video platform by the end of 2016.
On April 20, 2016, Amazon announced that the Elastic Transcoder cloud encoding service can “create WebM outputs using the VP9 codec.” There’s no word as to whether the service also supports DASH output.
Of course, one of the earliest adopters of VP9 was JW Player, which has been working for months to incorporate VP9 into its online video platform. According to Greg Twohig, JW Player’s product manager, the company wasn’t motivated by customer requests for VP9 encoding. Rather, it was a desire to provide the same or better quality at much lower bitrates than H.264 could afford.
JW Player planned to start customer trials in late April or May of this year, which should be fairly far along by the time you read this article. Twohig reported that finding customers to try the new format shouldn’t be a problem. “We sent an introduction to the service to several customers, and the results were positive and immediate. They were chomping at the bit to be part of the pilot.”
Wowza Media Systems also announced multiple VP9-related capabilities at NAB, including the ability to transcode and package VP9-encoded video into DASH format for adaptive streaming, for either live or on-demand delivery (Figure 4). The Wowza Streaming Engine can also push a VP9 stream to YouTube Live, decreasing the bandwidth requirements for a live program. Also for live event producers, Media Excel announced support for VP9/DASH streaming within the HERO Encoder product line.
Figure 4. A master class on how to produce and deliver VP9 in DASH with the Wowza Streaming Engine and JW Player (go2sm.com/9masterclass)
Basically, prior to NAB 2016, VP9 support was exceptionally limited in encoding, transcoding, or associated products, and it was completely unavailable in leading online video platform providers. Clearly, the logjam has been broken. Whether the result is a torrent or trickle of other products remains to be seen.
It Will Play on Most Desktops With No Problem
Once the video is encoded, attention turns to where it will play. Let’s start with desktop and notebook computers. VP9 plays natively via HTML5 in Google Chrome (version 29+), Mozilla Firefox (version 28+), the Opera Browser (version 15+), and in preview versions of Microsoft Edge, which is available only on Windows 10. According to the website StatCounter, as of March 2016, all versions of Chrome comprise 60.1 percent of browsers used on desktop and notebook computers, trailed by Firefox at 15.7 percent and Opera at 1.96 percent, totaling about 77.76 percent of all users. By way of comparison, no other UHD-capable codec is available for free in any desktop browser.
VP9 won’t play in Internet Explorer and Safari, which StatCounter puts at 13.7 percent and 4.5 percent, respectively, with another 1.8 percent in the “other” category, presumably without VP9 support. At 2.15 percent, Edge users are in limbo. Those using the Preview version of Windows 10 can play VP9, but those on the version currently shipping can’t, and there’s no way to tell which is which.
Including all browsers that don’t support VP9, plus allowing for older versions of Chrome, Firefox, and Opera browsers without VP9 support, likely means that between 25 percent and 35 percent of desktop/notebook browsers won’t play VP9 encoded video in mid-2016, though this number will drop over time. For these users, the player will have to detect the inability to play VP9 and point the browser to an H.264-based DASH manifest file.
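The fallback decision can be sketched as a simple lookup against the minimum browser versions listed earlier. Real players usually test codec support in the browser itself rather than trusting version numbers, so treat this as decision logic only; the manifest names and function name are hypothetical.

```python
# Minimum browser versions with native VP9 playback, as listed in the text.
VP9_MIN_VERSION = {"chrome": 29, "firefox": 28, "opera": 15}


def pick_manifest(browser, major_version):
    """Return the DASH manifest to serve (file names are placeholders)."""
    minimum = VP9_MIN_VERSION.get(browser.lower())
    if minimum is not None and major_version >= minimum:
        return "vp9.mpd"
    return "h264.mpd"       # Internet Explorer, Safari, Edge, and older browsers


print(pick_manifest("Chrome", 49))     # vp9.mpd
print(pick_manifest("Firefox", 27))    # h264.mpd
print(pick_manifest("Safari", 9))      # h264.mpd
```

Edge falls through to H.264 here because, as noted above, there is no way to tell which Edge installations can play VP9.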
In terms of performance, VP9 seems to play very well on even modestly configured computers, though my tests were minimal. Specifically, I tested H.264 and VP9 playback on a dual-core (four with HTT) Dell Inspiron Notebook computer with an Intel i5-4210U CPU running at 1.7GHz, a $400 notebook running Intel graphics. This is the slowest computer I have available for testing, since my notebook is an HP i7-based 4-core system (eight with HTT) running at 2.3GHz, and all my desktop computers are dual-CPU HP workstations.
On the Inspiron, playing a 720p@2Mbps H.264 file consumed about 20 percent of the CPU, with VP9 slightly higher at about 21 percent to 22 percent. I next tried playing a 4K file in both formats. Again, VP9 consumed more of the CPU but not a dramatic amount, which you can see in Figure 5. These results, coupled with the fact that YouTube has been distributing VP9 since 2013, should provide comfort that VP9 plays well on most relatively modern computers.
Figure 5. CPU consumption playing 4K files on a 1.7 GHz i5-based Dell notebook
Those designing HTML5 players for video distribution should find lots of help. For example, as you would expect, JW Player supports VP9 playback in its namesake off-the-shelf player. Just before NAB, castLabs announced support for VP9 encoding to DASH format via its Video Toolkit service, with DRM protection via its DRMtoday service, and video playback via its Video Player SDKs for Android. castLabs plans to fully support VP9 playback in its DASH Everywhere player by the end of June 2016.
On Mobile, Not So Much
My experience with Android was not as positive, though this was on a 3-year-old Samsung Nexus 10 driven by a Dual Core 1.7GHz CPU running Android version 5.1.1. Specifically, playing the 720p 2Mbps New test file in Chrome maxed out the CPU at times and required an overall average of about 58 percent CPU during playback according to the CPU Monitor App. This compared with a maximum of 71 percent for playing back a similarly configured H.264 file, which averaged a much more palatable 47 percent CPU load overall during playback. I tested a number of other free Android players and the Firefox browser, and Chrome proved the most efficient, other than the Ittiam demo player discussed later.
Certainly most premium content distributors will eschew all of these options in favor of a custom app or player. This is absolutely essential for iOS devices, which offer no native VP9 support. In this regard, Ittiam, an India-based company that licenses VP9 and other technologies into multiple markets, offers highly optimized encoders and decoders for Android, iOS, and Windows environments. When I tested the Ittiam demo player, it maxed out at 78 percent playing the WebM New file, compared with Chrome at 100 percent, and averaged 41 percent overall CPU utilization during WebM playback, compared with 58 percent for Chrome. This should put Ittiam on the short list for companies seeking VP9-related playback IP. I spoke with Ittiam company officials, who described significant market momentum for VP9 among their customers and prospects. This is in part because all computer, mobile, and OTT playback environments want to support YouTube fully, and also because of general interest in a royalty-free alternative.
Overall, the common wisdom for UHD and other advanced codec playback on mobile devices is that hardware support will be necessary for adequate performance and to preserve battery life. In this regard, Wikipedia lists a number of CPUs, GPUs, and SoCs (systems on a chip) with hardware-based VP9 decoding, including the MediaTek MT6795, the NVIDIA Tegra X1, Qualcomm Snapdragon 820, and two chips from Samsung. Hardware support exists, at least for Android devices, though it’s certainly not pervasive.
I asked my contact at JW Player whether the company will automatically send VP9-encoded files to compatible Android players, or continue with H.264. The response was that this was one of the issues to be determined after trials. While it seems relatively safe to automatically send VP9-encoded content to any computer/notebook with a compatible browser, the best approach with Android is to perform your own testing before making a decision one way or the other.
Intellectual Property Risks and Shelf Life
In the interest of completeness, I wanted to very briefly address the patent risk associated with VP9. As we reported at the time, Nokia sued Google for patent infringement over VP8 back in 2013, though Nokia ultimately lost that suit. I asked Florian Mueller, owner of the authoritative FOSS Patents blog, if he knew of any newer legal action relating to VP9, and he responded, “At this stage I’m not aware of any pending lawsuit over VP9.”
I asked Italian IP attorney Carlo Piana the same question. He responded with a depressing reality: “The fact that as of now we don’t know of any outstanding claim against a format is by all means no assurance that a later claim will not surface,” he says. “Usually those claims show up when the standard is deeply rooted in the technology and therefore very difficult to disregard.” In other words, the fact that the sky is clear today doesn’t mean it won’t rain down IP claims tomorrow.
Since VP9 will have a relatively short shelf life, perhaps IP claims simply won’t appear or will be targeted toward VP9-next. It’s possible claims will be targeted toward AV1, the first product of the Alliance for Open Media, instead. Of course, with founding members Amazon, Cisco, Google, Intel, Microsoft, Mozilla, and Netflix and subsequent members AMD, ARM, and NVIDIA, the alliance can amass a significant war chest to dispute any IP claims.
More to the point, it’s clear that VP9 will soon be succeeded by AV1, which is scheduled to ship as soon as the end of 2016. Until we know AV1’s output quality and playback complexity, it’s tough to say whether AV1 will replace VP9 or simply supplement it for high dynamic range or 60/120 fps videos.
Many companies, including Microsoft, see working with VP9 today as a way to ensure the fast and smooth transition to AV1 when it’s available. For example, Gabe Frost, principal lead program manager in the Windows and Devices Group, stated, “The work we’ve done on VP9 on Windows gives us a great head start in how they [Microsoft] think about optimizing this class of codec for Windows and Windows devices with GPU acceleration.” (Frost is also executive director of the Alliance for Open Media.) On the other hand, companies with a procrastination bent may decide to eschew VP9 in favor of AV1.
At the end of the day, it’s a pretty simple equation. VP9 will let you deliver the same quality video at a significantly lower bitrate, saving bandwidth and distributing higher-quality video to viewers on slower connections. Alternatively, you can serve higher-quality video to existing customers at the same bandwidth as H.264 and increase your customers’ quality of experience. Either way, distributing VP9 will add to both encoding and storage costs because you’ll need to continue producing H.264 to serve some browsers and mobile platforms. Do the math, and see what the numbers tell you.
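One way to frame that math is a back-of-the-envelope model: bandwidth saved on the share of viewers who can play VP9, minus the added encoding and storage cost. The function below is a sketch under that assumption, and every number in the example calls is a placeholder, not a measured or quoted figure.

```python
def monthly_net_savings(delivered_gb, cost_per_gb, vp9_share,
                        vp9_rate_ratio, added_monthly_cost):
    """Bandwidth saved by VP9 minus the extra encoding and storage cost.

    delivered_gb        data delivered per month if everything were H.264
    cost_per_gb         delivery cost per GB
    vp9_share           fraction of viewing that can be served as VP9
    vp9_rate_ratio      VP9 data rate as a fraction of H.264 (0.5 = half)
    added_monthly_cost  extra encoding plus storage cost for the VP9 ladder
    """
    h264_cost = delivered_gb * cost_per_gb
    bandwidth_saved = h264_cost * vp9_share * (1 - vp9_rate_ratio)
    return bandwidth_saved - added_monthly_cost


# Placeholder inputs for a large and a small publisher.
print(monthly_net_savings(100_000, 0.05, 0.70, 0.5, 500.0))
print(monthly_net_savings(1_000, 0.05, 0.70, 0.5, 500.0))
```

With these placeholder inputs the large publisher comes out ahead and the small one does not, which is the point of running your own numbers.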