TL;DR: Netflix has updated VMAF to include a new model that reverses out the effects of techniques like sharpening, contrast adjustment, and histogram equalization.
As I reported in VMAF is Hackable: What Now?, you can increase VMAF scores via simple image enhancement techniques like sharpening, contrast adjustment, and histogram equalization, which, when applied inappropriately, may not actually improve subjective quality and may even degrade it. In a publicly available Google document, Netflix’s VMAF expert, Zhi Li, a senior software engineer on the Netflix Video Algorithms and Research team, details why VMAF can overreact to these adjustments and lays out Netflix’s solution.
Regarding why VMAF can overreact to these image enhancement techniques, recall that VMAF stands for Video Multimethod Assessment Fusion and that the metric is the fusion of four separate metrics, including VIF and DLM, both of which seem to over-respond to the image enhancement techniques detailed above. This is important to know because you have to address the issues with both metrics to resolve the problem.
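One way to picture the idea behind the fix is to treat each of the two metrics as containing a gain term that compares the distorted frame to the reference, and to cap that term so that enhancements stop earning credit while losses still count. The sketch below is a toy model of that capping only, not the actual VIF or DLM math; the function name and the gain values are hypothetical and chosen purely for illustration.

```python
from typing import Optional


def effective_gain(raw_gain: float, gain_limit: Optional[float] = None) -> float:
    """Cap a distorted-vs-reference gain term at gain_limit, if one is set."""
    if gain_limit is None:
        return raw_gain
    return min(raw_gain, gain_limit)


# Hypothetical gain values, not measurements from any real video.
sharpened = 1.3  # enhancement: more local contrast than the reference
blurred = 0.8    # degradation: less local contrast than the reference

print(effective_gain(sharpened))       # 1.3 (enhancement is rewarded)
print(effective_gain(sharpened, 1.0))  # 1.0 (enhancement no longer adds to the score)
print(effective_gain(blurred, 1.0))    # 0.8 (the loss is still penalized)
```

With a limit of 1.0, anything that makes the distorted frame look "better" than the reference is treated as no better than the reference, which is why both metrics have to be capped to close the loophole.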
Specifically, to reverse out the impact of these image enhancement techniques, you’ll have to:
- Download a new version of Netflix’s vmafossexec.exe (or a compiled version for x86 here; x64 here)
- Then either run the new model, vmaf_v0.6.1neg.pkl (“neg” stands for “no enhancement gain”)
- Or disable both enhancement gains via the command-line options vif_enhn_gain_limit=1.0 and adm_enhn_gain_limit=1.0. Li’s report specifies the command strings that do so.
Li states in his report that “we are currently in the middle of revamping the C library API, and there is an old executable vmafossexec (and the corresponding library libvmaf) and a new executable vmaf_rc (and the corresponding library libvmaf_rc). The option to disable the enhancement gain is only implemented in the new executable vmaf_rc.”
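As a rough sketch of the two routes listed above, the Python helper below assembles an argument list for either the neg model or the standard model with both gain limits set to 1.0. Only the executable name vmaf_rc, the neg model file name, and the two option names come from the article. The --reference, --distorted, and --model flags, the path= prefix, the colon-delimited vif./adm. override syntax, and the standard model file name vmaf_v0.6.1.pkl are assumptions, so check Li’s report for the exact command strings before relying on them.

```python
from typing import List


def build_vmaf_command(
    reference: str,
    distorted: str,
    model_path: str,
    limit_gain: bool = False,
    executable: str = "vmaf_rc",
) -> List[str]:
    """Build an argument list for a VMAF run (flag syntax is assumed)."""
    model_arg = f"path={model_path}"
    if limit_gain:
        # ASSUMPTION: feature options are appended to the model argument,
        # colon-delimited, with a feature-name prefix.
        model_arg += ":vif.vif_enhn_gain_limit=1.0:adm.adm_enhn_gain_limit=1.0"
    return [
        executable,
        "--reference", reference,
        "--distorted", distorted,
        "--model", model_arg,
    ]


# Route 1: run the neg model directly.
neg_cmd = build_vmaf_command("ref.y4m", "dist.y4m", "vmaf_v0.6.1neg.pkl")

# Route 2: standard model (file name assumed) with both gains limited to 1.0.
limited_cmd = build_vmaf_command(
    "ref.y4m", "dist.y4m", "vmaf_v0.6.1.pkl", limit_gain=True
)

print(" ".join(neg_cmd))
print(" ".join(limited_cmd))
```

The file names ref.y4m and dist.y4m are placeholders. Returning a list rather than a single string means the result can be passed to subprocess.run without shell quoting issues.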
The table below shows some of the data that Li presented in his report. The columns represent the Original baseline video, then the video treated with sharpening and histogram equalization. With enhancement gain enabled (top set of files and first VMAF score) sharpening increases the VMAF score to 111.9868, while histogram equalization pushes it to 144.
With enhancement gain disabled (using the new version with either the neg model or command line controls), the results drop to well below the baseline video.
The other table in Li’s report is shown below. Briefly, Google recently added a tune=vmaf option to its libaom AV1 encoder that sharpens frames before encoding, which boosts the VMAF score. In the table, the center column shows the results of an AV1-encoded file with the tune=vmaf option enabled, as compared to a baseline file without that option in the libaom column on the right. In the top set of rows with enhancement gain enabled, the tune=vmaf option delivers a VMAF score of around 105, much higher than the baseline result of 95.1425.
With enhancement gain disabled, tune=vmaf actually delivers a lower score than the baseline, but the baseline score also dropped by about 2 VMAF points. Addressing this, Li states, “One thing to note is that the absolute score of VMAF does drop slightly, typically by 1~3.”
So, if you use the old model, your VMAF scores will remain the same, but VMAF will overreact to the aforementioned image adjustments. If you use the new model, this won’t happen, but your scores will be slightly lower.
Just thinking out loud here, but if you’re using VMAF to assist in day-to-day configuration decisions, you’ll probably want to use the older model. If you’re evaluating pre-processing techniques or even different codecs, you probably want to use the new model. Once I get the new version and model downloaded and operational and have some time, I’ll rerun the metrics against the iSize files that I tested here and see what I learn.
If you’re using third-party tools to produce your VMAF scores, as I do, you’ll have to wait for each vendor to update its application.
Either way, kudos to Zhi Li and Netflix for continuing to invest in VMAF. Beyond that, when it comes to comparing codecs and preprocessing techniques, while objective metrics can supply valuable data, you should also arrange some subjective evaluations.