The benefits of per-title optimization aren’t just for the major players anymore. Streaming Media reviews the first solution for smaller content owners and finds the results promising.
Netflix announced the beginning of the end for fixed encoding ladders with a December 2015 blog post called “Per-Title Encode Optimization.” The big question was, what would replace them? For companies with the development budgets of Netflix and YouTube, which announced its own neural network-based per-title optimization approach soon thereafter, DIY is the way to go. But what about smaller companies that need the efficiencies and quality-of-experience benefits that per-title encoding can provide, but that don’t have the budget for DIY? Enter Capella Systems’ Cambria FTC, the first enterprise encoding package we know of with a scriptable per-title optimization feature. In the copycat encoding market, the Capella launch likely presages the introduction of similar features from competing vendors, but at the time of this writing, Cambria stands alone.
In this review, I provide a brief overview of the Cambria encoder and then dive into how the per-title encoding works, how we tested this function, and how Cambria performed in those tests.
Capella is a privately held encoding company boasting several employees who worked at Rhozet, which developed the highly regarded Carbon Coder product that was acquired by Harmonic in 2007. Cambria FTC is a VOD file converter that runs on multiple flavors of Windows, including Windows 10. The base price is $8,700, to which you’ll have to add $5,950 for HEVC encoding and $1,500 for adaptive bitrate (ABR) output. The overall feature set is competitive, with the typical vast array of input formats, output formats, watch folder operation, and a REST API.
One relatively uncommon capability is support for scriptable workflows, which is how Cambria performs per-title optimization. Via scripting, Cambria can also perform workflow-like operations such as changing the preset depending upon source properties or setting in and out points automatically. You create the scripts in Perl, and Capella supplies several example scripts, including one for per-title encoding.
Cambria operates using two separate programs: Cambria File Convert (where you create presets, add input files, and start manual encoding projects) and Cambria Manager (where you configure job management and watch folder settings, create and monitor watch folders, and monitor encoding jobs). Like most encoders, Cambria works from presets, and you can create presets for single-file and ABR outputs, including HLS, MPEG-DASH, and Smooth Streaming.
Figure 1 shows the File Convert application, with tabs for Source (where you select the files to compress), Encoding (where you create and choose presets), and Conversion (where you can watch the encoding process, though I sent most jobs to the Cambria Manager for processing).
Figure 2 shows the Encoding tab. The Preset Editor is open with a preset for an MPEG-DASH ABR group with seven video streams. The Video Stream Configuration for the 1080p stream is open, showing the configuration options for that stream. One frustration is that you can’t access x264 presets such as slow, medium, and fast from the Edit screen, though you can manually input most of the parameters necessary to achieve the same quality/performance balance. Note the Script tab in the upper section of the Preset Editor; that’s where we’ll enter the per-title script.
You can select multiple presets in the Encoding tab, which will be applied to all files loaded in the Sources tab. When you’re ready to encode, click either Queue All Jobs to send the jobs to the Cambria Manager or Convert All Jobs to convert in the File Convert tab. Again, I sent all my encoding jobs to the Cambria Manager, which has a log-like function that makes it simpler to track multiple jobs, though both encoding workspaces deliver the same encoding performance.
That’s the overview. Now let’s look at the per-title optimization feature.
The official name for Cambria’s feature is “source adaptive bitrate ladder,” or SABL. The starting point for every encode is the encoding ladder shown on the left in Figure 2: 1080p at 4300Kbps, 720p at 2500Kbps, and so on. When enabled via a script, like that shown in Figure 3, Cambria runs a fast constant rate factor (CRF) encode of the file to gauge encoding complexity. Briefly, CRF is an encoding technique available with x264 and several other codecs that lets you select the desired quality level rather than a data rate. While encoding with CRF, x264 produces a file with the selected CRF quality level, adjusting the data rate as necessary to deliver that quality.
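To make the CRF idea concrete outside of Cambria, here is a minimal Python sketch. It assumes the open-source ffmpeg tool built with libx264, which is not part of Cambria and is not necessarily how Capella implements SABL. The function names, the CRF value of 23, and the veryfast preset are illustrative choices, not settings taken from the product.

```python
from typing import List


def crf_probe_command(source: str, output: str, crf: int = 23) -> List[str]:
    """Build an ffmpeg command line for a fast, video-only x264 CRF encode.

    Hypothetical helper: the flags are standard ffmpeg/libx264 options,
    but the specific values are illustrative assumptions.
    """
    return [
        "ffmpeg", "-y",          # overwrite the probe file if it exists
        "-i", source,
        "-an",                   # drop audio so only video bits are counted
        "-c:v", "libx264",
        "-preset", "veryfast",   # speed matters more than efficiency here
        "-crf", str(crf),        # target a quality level, not a data rate
        output,
    ]


def average_kbps(size_bytes: int, duration_s: float) -> float:
    """Average data rate of an encoded file in kilobits per second."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return size_bytes * 8 / 1000 / duration_s
```

In use, you would run the command (for example with subprocess.run), read the size of the probe file, and pass it to average_kbps along with the clip duration. The result is approximate, since the file size includes a small amount of container overhead.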
In this fashion, the data rate produced during the CRF encode is a measure of encoding complexity. For example, Table 1 shows the files used to test SABL, and the results of the complexity measurements from the 1080p CRF encode, where all figures show the kilobits-per-second output. The 30 Sec Peak value shows the highest data rate for any 30-second chunk of the movie, while the 10 Sec Peak shows the highest data rate for any 10-second chunk. The Average Complexity shows the average rate for the entire movie.
To reflect for a moment, Table 1 shows exactly why a fixed bitrate ladder is so suboptimal. Consider the 4300Kbps target data rate for the 1080p stream shown in Figure 2. Applied to the Zoolander movie, it would be too low, resulting in a poor-quality file. Applied to almost all other files in the test, it would be too high, especially for synthetic files such as Camtasia or PowerPoint-based files. These files would be encoded at too high a data rate, wasting bandwidth and limiting their reach on slower connections.
Back on point, you control which measure the encoder uses to adjust the encoding ladder. A conservative approach might use the 10-second peak as the measure, pushing the data rate up even though it might only affect one highly complex region in the file. This would generally result in files with few encoding artifacts, but with some bandwidth wasted in other areas.
Alternatively, you could base the decision on the 30-second peak, which would result in a lower overall data rate, but perhaps some artifacts in some shorter regions. You could also set the measurement window to any arbitrary length.
Once this value is returned, the script shown in Figure 3 manages the adjustment. That is, if the encoding complexity was more than 7,000Kbps, the encoder would multiply the data rate of all streams in the adaptive group by 1.5, boosting the target data rate by 50 percent. If the complexity value was 2,000Kbps or less, the encoder would multiply the data rate by 0.6, dropping the data rate of all streams by 40 percent.
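The three complexity measures can be sketched in a few lines of Python. This is an illustration only: it assumes you already have a list of per-second bit counts from the CRF encode, and it treats a "chunk" as a sliding window, which the review does not confirm. The function names are hypothetical.

```python
from typing import Sequence


def window_peak_kbps(per_second_kbits: Sequence[float], window_s: int) -> float:
    """Highest average data rate (Kbps) over any window_s-second span.

    Uses a running sum so each sample is added and removed once.
    """
    n = len(per_second_kbits)
    if n == 0 or window_s <= 0:
        raise ValueError("need at least one sample and a positive window")
    window_s = min(window_s, n)  # short clips: the window is the whole clip
    running = sum(per_second_kbits[:window_s])
    best = running
    for i in range(window_s, n):
        running += per_second_kbits[i] - per_second_kbits[i - window_s]
        best = max(best, running)
    return best / window_s


def average_complexity_kbps(per_second_kbits: Sequence[float]) -> float:
    """Average data rate (Kbps) over the entire file."""
    if not per_second_kbits:
        raise ValueError("need at least one sample")
    return sum(per_second_kbits) / len(per_second_kbits)
```

For a made-up 60-second clip with a 10-second burst at 9000Kbps and 1000Kbps elsewhere, the 10-second peak is 9000Kbps, the 30-second peak is about 3667Kbps, and the average is about 2333Kbps, which shows how a shorter window reacts more strongly to a brief complex scene.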
All these adjustments are totally configurable. For example, a conservative encoding shop could adjust the data rate upward for complex clips but never adjust the data rate downward for simple clips, maintaining a very high quality level. Or, you could adjust the data rates for some, but not all, of the streams. Of course, you select both the encoding thresholds and the percentage adjustments to each level.
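The adjustment logic described above can be sketched as follows. This is Python written for illustration, not the Perl script Capella supplies, and it does not use Cambria's scripting API. Only the two rules quoted in this review are included; leaving the ladder unchanged between the two thresholds is an assumption, as are the function names.

```python
from typing import Dict


def pick_multiplier(complexity_kbps: float) -> float:
    """Map measured complexity to a data rate multiplier."""
    if complexity_kbps > 7000:
        return 1.5   # complex source: boost all streams by 50 percent
    if complexity_kbps <= 2000:
        return 0.6   # simple source: cut all streams by 40 percent
    return 1.0       # assumption: leave the ladder alone in between


def adjust_ladder(ladder_kbps: Dict[str, int], complexity_kbps: float) -> Dict[str, int]:
    """Scale every rung of the ladder by the chosen multiplier."""
    multiplier = pick_multiplier(complexity_kbps)
    return {name: round(rate * multiplier) for name, rate in ladder_kbps.items()}


# The top two rungs of the ladder described for Figure 2.
base_ladder = {"1080p": 4300, "720p": 2500}
```

With this sketch, a complexity value of 8000Kbps yields 6450Kbps and 3750Kbps for the two rungs, while a value of 1500Kbps yields 2580Kbps and 1500Kbps. A more conservative shop could return 1.0 instead of 0.6 in the second rule so that rates only ever go up.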
How We Tested
Intuitively, the goal of any per-title optimization technique would be to boost the data rate and quality of a file when necessary, but only when the improvement would be noticeable to the viewer. Otherwise, bandwidth would be wasted. Conversely, you would want the encoder to drop the data rate and quality when possible, but only when it wouldn’t result in visible artifacts that would degrade the viewer’s quality of experience.
How do we measure these concepts? In the “Per-Title Encode Optimization” blog post mentioned previously, Netflix made several general observations about the peak signal-to-noise ratio (PSNR) metric that originally powered its analysis (Netflix has since transitioned to its own video multimethod assessment fusion [VMAF] metric). First, Netflix stated that at PSNR values of 45 dB or higher, distortion is seldom noticeable by the viewer (“for encodes with PSNR 45 dB or above, the distortion is perceptually unnoticeable”). At the other end of the quality spectrum, the researchers also stated that PSNR values below 35 dB are often accompanied by visible artifacts (“35 dB will show encoding artifacts”). Let’s agree that these are very rough metrics, but they’re a useful yardstick for assessing Cambria’s SABL-related performance.
Table 2 shows the results of our tests. The SABL data rate is the rate after the analysis; as you recall, the original rate was 4300Kbps (Figure 2). The bandwidth is the percentage reduction between the original and SABL rate, with the PSNR values calculated for both the original and SABL streams.
In these 10 tests, the results are all positive (in some cases, extremely positive). For example, with the cartoon El Ultimo, Cambria dropped the data rate by 50 percent while keeping the PSNR at 45.24 dB. This is quite a significant data rate drop with a quality delta that should be imperceptible to viewers. We see similar results in the Screencam and Tutorial clips, where 50 percent data rate reductions still left the SABL PSNR well above 45 dB. At no point did a drop in data rate push the PSNR value anywhere close to 35 dB. In the two clips where Cambria increased the data rate, the music video “Freedom” and the short Zoolander clip, the increase was clearly beneficial and not wasteful (i.e., it didn’t push the PSNR above 45 dB).
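The yardstick used to read Table 2 can be written down as a short Python sketch. It is an illustration of the reasoning in this review, not a tool that ships with Cambria. The 35 dB and 45 dB defaults are the rough Netflix figures quoted earlier, and the function names and labels are hypothetical.

```python
def percent_reduction(original_kbps: float, sabl_kbps: float) -> float:
    """Bandwidth saved by the SABL rate, as a percentage of the original rate."""
    return (original_kbps - sabl_kbps) / original_kbps * 100


def assess(original_kbps: float, sabl_kbps: float, sabl_psnr_db: float,
           floor_db: float = 35.0, ceiling_db: float = 45.0) -> str:
    """Label a rate change using the rough PSNR thresholds.

    A cut is risky if it drags PSNR below the floor; a boost is
    wasteful if it pushes PSNR above the ceiling.
    """
    if sabl_kbps < original_kbps:
        return "risky cut" if sabl_psnr_db < floor_db else "acceptable cut"
    if sabl_kbps > original_kbps:
        return "wasteful boost" if sabl_psnr_db > ceiling_db else "beneficial boost"
    return "unchanged"
```

Applied to the El Ultimo result, a drop from 4300Kbps to 2150Kbps is a 50 percent reduction, and the 45.24 dB PSNR puts it in the "acceptable cut" category.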
Let me reiterate that virtually all the parameters used in these tests are configurable. If I wanted to create another level at the bottom that dropped the data rate by 60 percent, I could. I could also create another level at the top, or push the data rate for existing levels higher for complex videos. You control all basic parameters so you can dial in the desired level of risk and reward.
How long will this analysis take? This depends upon a number of factors, including the workstation that you’re testing on, content resolution, and the number of simultaneous encodes you’ve configured in Cambria Manager. We tested on an HP Z840 workstation with two 2.6 GHz E5-2690 v3 CPUs running Windows 10. We created six copies of the same 1:45 (hours:mins) 1080p feature film and processed the analysis phase running five simultaneous encodes, which pushed CPU utilization up to about 96 percent. The total time to process the five movies was just under 2 hours, which would be in addition to any encoding time. At the other end of the spectrum, with two simultaneous encodes, it took 4:07 (min:sec) to process the 4:25 (min:sec) 1080p “Freedom” music video clip.
What’s the bottom line? Fixed encoding ladders are history for the reasons discussed in the Netflix blog post and the text around Table 1. Assuming that you don’t have the development budgets of Netflix and Google, the solution offered by Cambria looks very promising. If you’re locked into another encoding platform, don’t be shy about pressing for a per-title option, since the basic building blocks are available to all encoding vendors, cloud and on-prem, that use a CRF-compatible codec.