As of June 22, 2022, libaom-AV1 and SVT-AV1 tune for PSNR by default, and libaom-AV1 doesn’t appear to have a mode that optimizes actual video quality. Be sure to verify and consider this when comparing the codecs to others.
Most of the time with codec experimentation, a lapse in attention costs you hours, days, or weeks of re-encoding and re-testing, and/or a public shaming if you’ve already gone to print. Not that any of this ever happened to me (har, har). I must be living right; a recent lapse of attention produced neither retesting nor shame.
It did, however, raise an important point for all who test codecs, and for those who read these analyses: both SVT-AV1 and libaom-av1 tune for PSNR by default. With SVT-AV1, you can tune for visual quality, but with libaom-av1, it doesn’t appear that you can. I’ll cover why that’s important in a moment.
Back to the storyline, the potential issue had to do with some quality comparisons I was running between SVT-AV1 and libaom-AV1, two open-source AV1 codecs. I was finishing up my comparisons and noted the following in SVT-AV1’s documentation.
About Tuning for PSNR and AV1
This reminded me that I should at least check and see how SVT-AV1 handled tuning for metrics. By way of background, most codecs have different tuning options for specific types of content and for the application of different quality metrics. As these mechanisms relate to metrics, a great definition is from x265 documentation, which states:
The psnr and ssim tune options disable all optimizations that sacrifice metric scores for perceived visual quality (also known as psycho-visual optimizations). By default x265 always tunes for the highest perceived visual quality but if one intends to measure an encode using PSNR or SSIM for the purpose of benchmarking, we highly recommend you configure x265 to tune for that particular metric.
Tuning was critically important when applying metrics like PSNR and SSIM because these older metrics heavily weight differences between the source and encoded files, and don’t assess whether the differences actually improve the perceived quality for human viewers. When using these metrics, you want to eliminate encoding techniques that create differences that are known to degrade metric scores, so you use the tuning options.
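To make the point about difference-based metrics concrete, here is a minimal sketch of the standard PSNR formula, 10 * log10(MAX^2 / MSE), applied to two flat lists of 8-bit pixel values. The function name and the list-based frame representation are my own assumptions for illustration; real tools compute this per plane and per frame.

```python
import math


def psnr(reference, encoded, max_value=255):
    """Return PSNR in dB between two equal-length sequences of pixel values."""
    if len(reference) != len(encoded):
        raise ValueError("frames must have the same number of pixels")
    if not reference:
        raise ValueError("frames must not be empty")
    # Mean squared error: every difference counts against the score,
    # whether or not a viewer would find it objectionable.
    mse = sum((r - e) ** 2 for r, e in zip(reference, encoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)
```

Nothing in the formula knows whether a difference helps or hurts perceived quality, which is why psycho-visual optimizations tend to lower the score.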
More recent metrics like VMAF and SSIMPLUS incorporate the human visual system into the analysis and reward differences that improve subjective appearance, which lessens the need for tuning. As these metrics improve, producers also want to test using their real-world output settings, which almost never involve tuning (see Netflix practices here and Facebook/Meta here).
All that said, the cardinal rule of codec comparisons is not whether you tune or not, but that you’re consistent in your approach with all codecs. With a metric like VMAF, swings of up to 3 full points are not unusual between tuned and untuned output, which is obviously meaningful.
In most of my recent tests, I’ve consciously not tuned, relying on VMAF and sometimes SSIMPLUS to accurately assess how the test files would look to human eyes. With x264 and x265, if you don’t include the tune parameter in your command string, there’s no tuning, and as the quote from the x265 documentation above states, the codec attempts to produce the highest-perceived visual quality.
I had checked the libaom-av1 help file many times, most recently in a build from June 9, 2022, and it seems to indicate that tuning is disabled by default. Surely, -1 means tuning is disabled, right?
SVT-AV1 Tunes for PSNR by Default
Once I saw the SVT-AV1 reference to tuning for visual quality in Figure 1, it seemed like a good idea to confirm that this was the default setting. So, I checked the help file and saw this:
It turned out that visual quality wasn’t the default; tuning for PSNR was. I verified this with a contact in the SVT-AV1 development group and assumed that I was going to have to re-encode the SVT-AV1 encodes to match the untuned libaom-av1 encodes. I explained this to my contact, who responded, “no worries, tuning for PSNR is enabled by default with libaom-av1 as well.”
Given Figure 2, I was dubious, but it was easy enough to check. I tested two short files with four different encoding parameters; one with no tuning option specified, and three others with tune=0, tune=1, and tune=-1. As you can see in Figure 4:
- The files encoded with no tuning reference are identical in size (and VMAF score) to files encoded with tune=0 for PSNR. So, tuning for PSNR is the default option.
- The result is the same with tune=-1, which appears to mean that this option doesn’t work.
As you can see in Table 1, the VMAF scores follow the same pattern; no tune, tune=0, and tune=-1 all produce the same scores.
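If you want to repeat this kind of check yourself, the four-way test can be sketched roughly as follows. This assumes the encodes go through FFmpeg's libaom-av1 wrapper and its -tune option; the CRF value, the output file names, and both helper functions are hypothetical and not the exact commands I ran. The script only builds the argument lists and groups the results; running the encodes and measuring file size or VMAF is left to you.

```python
from collections import defaultdict

# Hypothetical labels mapped to the extra FFmpeg arguments for each variant.
TUNE_VARIANTS = {
    "default": [],            # no tune option on the command line
    "tune=-1": ["-tune", "-1"],
    "tune=0": ["-tune", "0"],  # PSNR
    "tune=1": ["-tune", "1"],  # SSIM
}


def build_command(source, label, crf=30):
    """Build an FFmpeg argument list for one tune variant (CRF is an assumed value)."""
    output = f"{label.replace('=', '_')}.mkv"
    return [
        "ffmpeg", "-y", "-i", source,
        "-c:v", "libaom-av1", "-crf", str(crf), "-b:v", "0",
        *TUNE_VARIANTS[label],
        output,
    ]


def group_by_signature(signatures):
    """Group variant labels whose measured signature (file size, VMAF, hash) is identical."""
    groups = defaultdict(list)
    for label, signature in signatures.items():
        groups[signature].append(label)
    return sorted(sorted(labels) for labels in groups.values())
```

Feed group_by_signature a dictionary of label to file size (or VMAF score) once the encodes finish. Any explicit tune value that lands in the same group as "default" is, for practical purposes, the default.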
I checked the FFmpeg codec help page and saw this in the 9.7 libaom-av1 codec section. While it contradicts the help file, it certainly appears to be accurate.
After this research, I asked my contact in the SVT-AV1 development group, “what’s the libaom equivalent to your tune=0?” (which optimizes for video quality). He replied, “Libaom does not seem to have the equivalent to the SVT tune 0.” Hmm.
What This All Means
This means a couple of things. Looking at the documentation, there doesn’t appear to be a way to optimize libaom-av1 encodes for human viewing; you either tune for PSNR or SSIM. As far as I know, this is unique to libaom-av1; all other codecs enable optimizing for visual quality, and most default to it. This makes sense since the overwhelming majority of video is produced for visual consumption, not benchmark testing.
Not to obsess, but the reason these tuning mechanisms exist is to disable adjustments that improve perceived quality for humans but degrade metric scores. Which setting do you use to optimize quality for human viewers?
Beyond this navel-gazing, when comparing libaom-AV1 to any other codec, the best practice would be to tune for PSNR for that codec as well; otherwise, you’re comparing apples and oranges.
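One way to keep that consistency is to spell out the PSNR tuning for every encoder in the comparison rather than relying on defaults. The mapping below is a sketch under assumptions: it uses the FFmpeg encoder names and options as I understand them (-tune psnr for libx264, libx265, and libaom-av1, and tune=1 passed through -svtav1-params for SVT-AV1), and these can vary with your FFmpeg build and encoder version, so verify them against your own help output. The helper function is hypothetical.

```python
# Assumed FFmpeg arguments that request PSNR tuning for each encoder.
# Check these against `ffmpeg -h encoder=<name>` for your build.
PSNR_TUNE_ARGS = {
    "libx264": ["-tune", "psnr"],
    "libx265": ["-tune", "psnr"],
    "libaom-av1": ["-tune", "psnr"],
    "libsvtav1": ["-svtav1-params", "tune=1"],
}


def psnr_tuned_args(encoder):
    """Return the codec selection plus explicit PSNR tuning arguments."""
    if encoder not in PSNR_TUNE_ARGS:
        raise ValueError(f"no PSNR tuning recorded for encoder {encoder!r}")
    return ["-c:v", encoder, *PSNR_TUNE_ARGS[encoder]]
```

Stating the tune explicitly also documents the choice in your command strings, so readers of the comparison don't have to guess what each codec's default was at the time.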
Going back to the quality comparison that I was about to publish between libaom-AV1 and SVT-AV1, I was OK because tuning for PSNR was the default for both codecs, so it was apples and apples. This is fortunate, as it leaves me with plenty of time to figure out which comparisons I’ve produced in the past that I have to go back and correct.
I’ve reached out to contacts at the Alliance for Open Media about this issue but haven’t heard back. I’ll update this article if and when I get useful information in response. In the meantime, if anyone out there has any useful input, please contact me at [email protected].