This article shows the quality/encoding time tradeoffs for producers choosing a preset for SVT-AV1 and libaom-AV1.
Note to readers – 12/13 – AOM has released version 1.4, which fixed the SVT-AV1 preset issues reported in the first version of this article.
Presets are the most important configuration option for controlling quality and throughput for most codecs. For this reason, when testing a new codec or codec update, the first configuration option you should explore is the preset.
In preparing for an upcoming presentation entitled Encoding AV1 With Open Source Alternatives at Streaming Media West, November 14 – 16, 2022, I tested the latest releases of libaom-AV1, the AV1 codec in FFmpeg, and version 1.3 of the SVT-AV1 encoder. The results changed the way I think about these codecs, and you may find them interesting as well.
Choosing the Best Preset for Libaom
To test the presets, I encode several files with every preset and record three values: encoding time, VMAF Harmonic Mean quality (which I’ll call VMAF), and low-frame VMAF quality, a predictor of transient quality issues. I then scale each result from 0 – 100% and plot all three for every preset, as shown below.
In this case, libaom-AV1 preset 0 takes the longest to encode but delivers the best VMAF and low-frame quality, both 100%. At the other end of the spectrum, preset 8 delivers 97.61% of VMAF quality and 96.43% of low-frame quality in 1.03% of the encoding time. I tested three files totaling about 8.5 minutes; at preset 0, it took over 140 hours to encode the three; at preset 8, it was about 90 minutes, which is more than 10x real-time. That tells you that you’re not going live with libaom-AV1 anytime soon.
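The percentages above come down to simple arithmetic. Here is a minimal sketch, assuming each series is scaled against its largest value. The function names are hypothetical, as is the preset 4 timing; the preset 0, preset 8, and clip-duration figures are the approximate values quoted above.

```python
def percent_of_best(values):
    """Scale each value to a percentage of the largest value in the series."""
    best = max(values.values())
    return {preset: round(100 * v / best, 2) for preset, v in values.items()}


def realtime_factor(encode_minutes, clip_minutes):
    """Return how many times longer than the clip duration the encode took."""
    return encode_minutes / clip_minutes


# Encoding time in hours per preset. Presets 0 and 8 use the approximate
# figures quoted above; the preset 4 value is hypothetical.
encode_hours = {0: 140.0, 4: 7.0, 8: 1.5}

print(percent_of_best(encode_hours))

# About 90 minutes of encoding for about 8.5 minutes of video.
print(round(realtime_factor(90, 8.5), 1))
```

A factor above 1.0 means the encoder cannot keep up with a live source on the test machine.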
Sharp-eyed readers will note that the results for presets 6-8 are identical. This is not an error or misprint; I ran the test twice with all three files and got exactly the same result. After I posted this article on LinkedIn, a group product manager at Google added a comment that:
The reason that libaom speeds 6, 7, and 8 have identical results is that they are essentially identical when running in the default “Good Quality” mode. (That’s the mode for VoD.) “Good Quality” supports speeds 0-6, and “Realtime Mode” (for live, RTC) supports speeds 5-10. Perhaps we should throw an error when an invalid speed setting/mode combination is used. I think this is only explained in aomenc -help and in the aom/aomcx.h header.
I checked, and this usage pattern wasn’t documented in the help file for the version of FFmpeg that I tested or the one I downloaded on December 4 when I updated this article. In any event, it doesn’t make sense to use presets higher than 6 unless you switch to realtime mode, which was beyond the scope of this testing.
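One way to avoid an invalid speed/mode combination is to check it before launching the encode. The sketch below builds an FFmpeg argument list for libaom-AV1, where the preset is set with -cpu-used and the mode with -usage. The speed ranges are taken directly from the comment quoted above and are not independently verified; the helper name, file names, and bitrate are hypothetical, and your FFmpeg build may accept a different range.

```python
# Speed ranges per mode as described in the quoted comment (an assumption,
# not verified against the libaom source).
SPEED_RANGES = {"good": range(0, 7), "realtime": range(5, 11)}


def libaom_ffmpeg_args(src, dst, cpu_used, usage="good", bitrate="2M"):
    """Build an FFmpeg argument list, rejecting speed/mode mismatches."""
    if usage not in SPEED_RANGES:
        raise ValueError(f"unknown usage mode: {usage}")
    if cpu_used not in SPEED_RANGES[usage]:
        raise ValueError(f"cpu-used {cpu_used} is outside the {usage} range")
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libaom-av1",
        "-b:v", bitrate,
        "-usage", usage,
        "-cpu-used", str(cpu_used),
        dst,
    ]


print(" ".join(libaom_ffmpeg_args("input.mp4", "output.mkv", 4)))
```

With this check in place, a request for preset 8 in the default mode fails immediately instead of silently producing the same output as preset 6.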
Beyond this interesting factoid, the timing and quality results lay out as expected; each lower-numbered preset takes longer to encode but delivers better overall quality in all cases and better low-frame quality in all but one. This orderly progression makes selecting a preset fairly straightforward. I usually recommend preset 4, though you could argue for anywhere from 6 – 3. If you don’t choose a preset, the default is 1; given the meager improvement over preset 3 and 7x the encoding time, this doesn’t seem like a good investment.
Choosing the Best Preset for SVT-AV1 – Version 1.4
Figure 2 shows the same chart for SVT-AV1 version 1.4. For VOD, presets 3-6 appear to be the most relevant range on average.
- Preset 4 delivers 98.92% overall quality in 3.5% of overall encoding time.
- Preset 3 more than doubles the encoding cost and adds only 0.38 overall VMAF points.
- Preset 2 is expensive, close to tripling encoding time and adding only 0.27 VMAF points.
When choosing a preset, remember that while lower-quality presets save encoding time, you’ll have to boost the bitrate to deliver the same quality as higher-quality presets. Particularly for videos with high view counts, what you save in encoding time may be much less than what you spend on additional bandwidth. For videos with view counts in the millions, preset 0 might be the best option.
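To make this tradeoff concrete, here is a rough sketch that adds a one-time encoding cost to a per-view delivery cost. Every number in it is hypothetical (encoding costs, bitrates, clip length, and price per GB), and the model ignores storage, caching, and multi-rung ladders, so treat it as a way to frame the question rather than a pricing tool.

```python
def delivered_gb_per_view(bitrate_kbps, duration_s):
    """Data delivered for one full view, in gigabytes (10^9 bytes)."""
    return bitrate_kbps * 1000 / 8 * duration_s / 1e9


def total_cost(encode_cost, bitrate_kbps, duration_s, views, cost_per_gb):
    """One-time encoding cost plus bandwidth cost across all views."""
    per_view = delivered_gb_per_view(bitrate_kbps, duration_s) * cost_per_gb
    return encode_cost + views * per_view


# Hypothetical: the fast preset is cheap to encode but needs a higher
# bitrate to match the quality of the slow preset.
FAST = {"encode_cost": 1.0, "bitrate_kbps": 2500}
SLOW = {"encode_cost": 50.0, "bitrate_kbps": 2300}

for views in (10_000, 1_000_000):
    fast = total_cost(**FAST, duration_s=600, views=views, cost_per_gb=0.02)
    slow = total_cost(**SLOW, duration_s=600, views=views, cost_per_gb=0.02)
    print(views, round(fast, 2), round(slow, 2))
```

With these made-up inputs, the fast preset is cheaper at 10,000 views and the slow preset is cheaper at a million; where the lines cross for you depends entirely on your own encoding and bandwidth costs.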
The first preset capable of producing a single real-time 1080p stream was preset 9; note the drop of over 5 VMAF points.
Choosing the Best Preset for SVT-AV1 – Version 1.3 (Problem Fixed in Version 1.4)
The issue reported below was corrected in version 1.4 as you can see above. I’m retaining the information below for the historical record.
Now let’s consider SVT-AV1 and a similar chart shown in Figure 2. Again, this is the amalgam of three files comprising about 8.5 minutes of video. The range of encoding time is much narrower, from 3:38 (hours:min) at the top end to 1:09 at the low end. All presets from 6-12 encoded a single 1080p30 stream faster than real-time. Impressive, but a long way from full ladder creation in software.
What’s funky, of course, is the quality and low frame progression over all the presets. While overall quality isn’t too far off the normal progression, low frame quality sure is, with a very significant drop-off in quality from preset 7 to preset 5. Longer encoding time should deliver higher quality average and low-frame scores, not the reverse.
While all three clips showed this pattern, it was worst in the Harmonic Football test clip, where the VMAF quality dropped by over 5 points between presets 7 and 5, and the low frame quality dropped by about 25. Figure 3 shows the two files in the Result Plot of the Moscow State University Video Quality Measurement Tool, with preset 7 in green and preset 5 in red. Remember that preset 5 took 3x longer to encode and was supposed to deliver better quality.
Figure 4. Preset 7 in green delivers better quality than preset 5 in red.
Briefly, the Results Plot shows VMAF values for each frame over the duration of both video files. The downward spikes show significant quality drops for 1-3 frames. While brief, the quality differences are often significant. Figure 4 shows a comparison of preset 5 on the left and 7 on the right. In this high-motion frame, the differences are noticeable but not really prominent. Still, however prominent, the fewer low-frame scores, the better.
Figure 5. Preset five on the left; preset seven on the right.
If you’re a VMAF denier (and I know you’re out there), I’ve included PSNR and SSIM Result Plots at the bottom of this article. While these metrics show that overall quality is much closer, they mimic the low-frame issues seen in Figure 3.
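If you have per-frame scores from your own measurement tool, the downward spikes described above are easy to locate in code. The sketch below is one possible approach, not how the Moscow State University tool works: it reports the lowest frame score, a plain harmonic mean, and each run of consecutive frames below a threshold. The scores and the threshold of 70 are hypothetical, and VMAF tools may compute their harmonic mean slightly differently from Python’s statistics module.

```python
from statistics import harmonic_mean


def find_low_frame_runs(scores, threshold):
    """Return (start, end) index pairs for runs of frames below threshold."""
    runs = []
    start = None
    for i, score in enumerate(scores):
        if score < threshold:
            if start is None:
                start = i  # a new run of low frames begins here
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(scores) - 1))  # run reaches the last frame
    return runs


# Hypothetical per-frame scores containing two brief drops.
scores = [95.0, 94.0, 60.0, 62.0, 93.0, 95.0, 55.0, 94.0]

print("low frame:", min(scores))
print("harmonic mean:", round(harmonic_mean(scores), 2))
print("runs below 70:", find_low_frame_runs(scores, 70))
```

Comparing the number and depth of these runs across presets gives you a quick numeric check on what the Results Plot shows visually.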
To paraphrase Shakespeare, something may be rotten in the state of SVT-AV1’s presets in version 1.3. I previously recommended preset 4 as the best option; with version 1.3, I’d recommend 6 or 7. Otherwise, you could be investing substantially longer encoding times for slightly lower overall quality and significantly reduced low-frame quality, particularly for high-motion clips.
Below is the command string I used to produce the SVT results, which is pretty much copied from the user guide here. Note that I was testing SVT-AV1 Windows version 1.3.0. It’s possible, though unlikely, that these results are idiosyncratic to my test clips. If you’re encoding with SVT-AV1 using a preset slower than 6 or 7 (that is, a lower preset number), you should run your own tests to see if your results are similar. If you get different results, or if you spot any errors in my analysis, please let me know.
SvtAv1EncApp -i football.y4m --rc 1 --tbr 2310 --mbr 4620 --keyint 2s --passes 2 --lp 8 --preset 5 -b football_p5.ivf
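If you want to run your own preset sweep, a small script can generate one command per preset from the string above. This is a minimal sketch: the helper name is hypothetical, the 0-12 range is simply the set of presets discussed in this article, and the flags are exactly those shown in the command above. It only prints the commands; running and timing them is left to you.

```python
def svt_command(preset, src="football.y4m"):
    """Build the SvtAv1EncApp argument list for one preset."""
    out = f"football_p{preset}.ivf"
    return [
        "SvtAv1EncApp", "-i", src,
        "--rc", "1",
        "--tbr", "2310", "--mbr", "4620",
        "--keyint", "2s",
        "--passes", "2",
        "--lp", "8",
        "--preset", str(preset),
        "-b", out,
    ]


# One command per preset, ready to paste into a batch file.
for preset in range(0, 13):
    print(" ".join(svt_command(preset)))
```

Because the function returns a list, you can also pass it to subprocess.run and wrap the call in a timer to capture encoding time for each preset.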
Finally, always, always, always consider low-frame quality in addition to overall quality when choosing a preset or any configuration option. A few really ugly frames can impact viewer QoE more than a point or two in average score one way or the other.
How does SVT 1.3 compare to libaom-AV1? Well, for that, you’ll have to attend my upcoming session at Streaming Media West entitled “Encoding AV1 With Open Source Alternatives.” If you can’t make it to Huntington Beach next week, keep your eyes open for the recorded version on the Streaming Media website.
Here are the SSIM and PSNR Result Plots.