Testing Video Quality With Apple AVQT

In the first article in this series, Baby Steps with Apple’s Advanced Video Quality Tool and Quality Metric, I detailed what Apple’s Advanced Video Quality Tool (AVQT) is and how to use it. In this article, I test multiple video files with Apple AVQT and describe my user experience. In the next article, I’ll review how AVQT scoring compares to VMAF and SSIMPLUS on a very limited test set.

Working with AVQT was not the smoothest road I’ve ever traveled, but part of that was the author’s failure to RTFM (read the freaking manual) and part of it was an obscure but avoidable quirk in AVFoundation. I reached out to Apple on both issues and they quickly helped me resolve them.

Here were the issues I encountered and the fixes:

  • AVQT couldn’t decode files encoded with FFmpeg using the x265 codec. Problem: AVFoundation doesn’t support hev1 tagging. Solution: rewrap (not re-encode) the files with the hvc1 tag, as shown below.
  • Very minor duration and frame rate mismatches trigger an error. Problem: the author didn’t read the ReadMe file (and the error message was less than helpful). Solution: use the -f switch to force the metric computation (and, on Apple’s side, add this suggestion to the error message).

Let’s jump in.

Testing x264 Files with Apple AVQT

Many standardized test files are between 10 and 20 seconds long, which speeds testing throughput and enables subjective testing, which typically is most effective with shorter clips. I recently ran a range of tests on ten-second clips for this codec comparison and used these clips to test Apple AVQT. I encoded all clips with FFmpeg using the simple command strings referenced in the article.

When I tested clips encoded with the x264 codec, AVQT performed perfectly and I was able to compute the scores discussed in the third article. While I’m on a positive note, I also want to comment on AVQT’s speed, which is fabulous. On my 8-core Apple M1-based Mac Mini, processing a two-minute file took about 15 seconds, or about 8x real time. By comparison, on a Windows-based computer with an 8-core Intel Xeon E3-1505M CPU, computing VMAF took around 8:40 (min:sec).
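The speed figures above reduce to simple arithmetic. Here is a minimal sketch of that calculation; it assumes the VMAF timing refers to the same two-minute clip, which the text implies but does not state, and the function name is mine. The two runs were also on different machines, so treat the comparison as rough.

```python
def realtime_factor(clip_seconds: float, processing_seconds: float) -> float:
    """Return how many times faster than real time the processing ran."""
    return clip_seconds / processing_seconds


# Two-minute clip processed by AVQT in about 15 seconds.
avqt_factor = realtime_factor(120, 15)

# Assumption: the 8:40 VMAF run was on the same two-minute clip.
vmaf_factor = realtime_factor(120, 8 * 60 + 40)

print(f"AVQT: {avqt_factor:.1f}x real time")
print(f"VMAF: {vmaf_factor:.2f}x real time")
```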

Testing x265 Files with Apple AVQT

I experienced the first problems with files produced in FFmpeg with the x265 codec. Here’s the AVQT command string:

AVQT -sd 25 -tp HarmonicMean -o football_x265.csv -t football_x265.mp4 -r Football_10.mp4

Here was the result.

Apple AVQT couldn't test files encoded using x265 and FFmpeg.

I tried to play the file in QuickTime and got this result. No surprise; AVQT supports all video formats and codecs supported by AVFoundation, and if it wasn’t supported in AVQT it’s likely not supported in QuickTime.

Here’s what I heard back from Apple.

The first issue is due to container level tagging. When setting the codec to libx265, FFmpeg defaults to hev1 tagging which is not supported by AVFoundation. The supported tag is hvc1 and can be set in FFmpeg by adding “-tag:v hvc1” to the command.

Since it’s a container-level tag (not an encoder parameter), you can re-tag your files without re-encoding them (the bitstream will stay as is). This can be done in FFmpeg by running the following command:

ffmpeg -i football_x265.mp4 -c:v copy -tag:v hvc1 football_x265_hvc1.mp4

I checked, and the resultant file both played in QuickTime Player and sailed through the AVQT measurement.
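If you have a folder of x265 files to check, the same fix can be scripted. What follows is a rough sketch rather than a tested workflow: it asks ffprobe for the video stream's codec tag and re-tags only files that report hev1. The function names are mine, and it assumes ffprobe and ffmpeg are on your path.

```python
import subprocess


def codec_tag_command(path):
    """Build an ffprobe command that prints only the video codec tag."""
    return ["ffprobe", "-v", "error", "-select_streams", "v:0",
            "-show_entries", "stream=codec_tag_string",
            "-of", "default=noprint_wrappers=1:nokey=1", path]


def retag_command(src, dst):
    """Build the re-tag command from the article (stream copy, no re-encode)."""
    return ["ffmpeg", "-i", src, "-c:v", "copy", "-tag:v", "hvc1", dst]


def needs_retag(codec_tag):
    """True when the reported tag is hev1, which AVFoundation rejects."""
    return codec_tag.strip().lower() == "hev1"


def retag_if_needed(src, dst):
    """Probe src and write a hvc1-tagged copy to dst only when required."""
    probe = subprocess.run(codec_tag_command(src), capture_output=True,
                           text=True, check=True)
    if needs_retag(probe.stdout):
        subprocess.run(retag_command(src, dst), check=True)
        return True
    return False
```

Calling retag_if_needed("football_x265.mp4", "football_x265_hvc1.mp4") would leave files that are already tagged hvc1 untouched.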

I also verified that you can add the same switch (-tag:v hvc1) to a two-pass encoding string like the one below and produce a compatible file without the separate re-tagging step. The same switch works in a single-pass command. (NUL is the Windows null device; on macOS or Linux, send the first-pass output to /dev/null instead.)

ffmpeg -y -i Football_10.mp4 -c:v libx265 -tag:v hvc1 -x265-params keyint=60:min-keyint=60:scenecut=0:bitrate=5000k:vbv-maxrate=10000k:vbv-bufsize=10000k:pass=1 -f mp4 NUL && \

ffmpeg -y -i Football_10.mp4 -c:v libx265 -tag:v hvc1 -x265-params keyint=60:min-keyint=60:scenecut=0:bitrate=5000k:vbv-maxrate=10000k:vbv-bufsize=10000k:pass=2 Football_1080p_5000_x265_hvc1.mp4

Duration and Frame Mismatches

While floundering for a solution to the above issue, I encountered a different issue. For example, I first tried encoding the source file to HEVC format using an Apple Compressor preset. The Compressor file played fine in QuickTime, but failed in AVQT, which reported the duration mismatch.

So, AVQT failed to run because of a difference of approximately 0.0002667 in duration between the source and encoded file. Curious as to how other tools would fare with the Compressor output, I ran the following command to compute PSNR in FFmpeg, which worked perfectly.

ffmpeg -i Football_10_Compressor.m4v -i Football_10.mp4 -lavfi psnr=stats_file=psnr_logfile.txt -f null -

Obviously, FFmpeg didn’t see the same issue.
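To see how far apart two files are before handing them to AVQT, one option is to read the container durations with ffprobe (for example, ffprobe -v error -show_entries format=duration -of json file.mp4) and compare them. Here is a minimal sketch of the comparison, with hypothetical function names; the sample JSON is illustrative and is not the measured output from my files.

```python
import json


def duration_from_ffprobe_json(text):
    """Extract the container duration in seconds from ffprobe's JSON output."""
    return float(json.loads(text)["format"]["duration"])


def duration_gap(reference_json, test_json):
    """Absolute difference in seconds between the two reported durations."""
    return abs(duration_from_ffprobe_json(reference_json)
               - duration_from_ffprobe_json(test_json))


# Illustrative ffprobe output, not measurements from the article's files.
reference = '{"format": {"duration": "10.010000"}}'
encoded = '{"format": {"duration": "10.010267"}}'

gap = duration_gap(reference, encoded)
print(f"duration gap: {gap:.7f} s")
```

This only tells you the size of the gap; I don't know the exact tolerance AVQT applies, so the sketch doesn't try to predict whether AVQT will accept the pair.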

Then I moved the Compressor output to a Windows machine and loaded both files into the Moscow State University Video Quality Measurement Tool (VQMT, see below), which also worked perfectly. Using the Show frame button on the lower right, I toggled through the source and compressed file looking for frame mismatches and found none. So, the timing mismatch AVQT stalled on didn’t appear to cause any problems with FFmpeg or VQMT.

Then I tried converting the HEVC file produced in FFmpeg to YUV format, using the following command:

ffmpeg -y -i football_x265.mp4 -s 1920x1080 -pix_fmt yuv420p football_x265.yuv

and then submitted the YUV file to AVQT using the following command:

AVQT -tf I420 -tr 1920x1080 -ttc ITU_R_709_2 -tfr 29.97 -sd 25 -tp HarmonicMean -o football_10_360p_700_x265_vslow.csv -t football_x265.yuv -r football_10.mp4

and got the same error.
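A raw YUV file carries no timing information, so its duration presumably comes from the frame count and the frame rate you supply with -tfr. If you want to know how many frames a YUV file holds, the arithmetic is straightforward for I420. This sketch assumes even frame dimensions, and the function names are mine.

```python
def i420_frame_bytes(width, height):
    """Bytes per I420 frame: a full-size Y plane plus quarter-size U and V planes.

    Assumes even width and height, so the chroma planes divide cleanly.
    """
    return width * height * 3 // 2


def raw_frame_count(file_bytes, width, height):
    """Number of whole frames in a raw I420 file of the given size."""
    frames, remainder = divmod(file_bytes, i420_frame_bytes(width, height))
    if remainder:
        raise ValueError("file size is not a whole number of frames")
    return frames


frame_bytes = i420_frame_bytes(1920, 1080)
print(f"bytes per 1080p frame: {frame_bytes}")

# Illustrative: a file holding exactly 300 frames.
print(f"frames: {raw_frame_count(300 * frame_bytes, 1920, 1080)}")
```

Dividing the frame count by the frame rate gives the duration AVQT would have to work with, which you can compare against the reference.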

Then I wondered whether this lack of duration and frame rate tolerance would impact longer H.264 test files. To test this, I ran six two-minute test files encoded with the x264 codec in FFmpeg and they all failed (see below). Ten-second versions of the same test clip worked perfectly.

All of these files ran fine in VQMT and FFmpeg. Below is the Result Plot from a two-minute test file in VQMT; the fact that there were VMAF scores of 100 late in the file indicates that tiny differences in frame rates or overall duration didn’t cause a frame mismatch (which I visually verified using the Show frame feature).

The Solution?

I could have avoided all these issues by checking the Readme.pdf file, which defines the following switch:

--force (or -f)
Bypass frame rate, aspect ratio, video duration, and transfer characteristics equality checks between reference and test video. AVQT requires spatial and temporal alignment between the reference and test videos. You can use this flag to disable this requirement.

I added the switch and AVQT processed all the tested files without issue.
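If you script your AVQT runs, the switch is easy to make optional. This is a sketch under stated assumptions: it uses only the flags shown in this article (-sd, -tp, -o, -t, -r, and -f), and the function name and force parameter are mine.

```python
import shlex


def avqt_command(test, reference, output_csv, force=False):
    """Build an AVQT command line using the flags shown in the article."""
    cmd = ["AVQT", "-sd", "25", "-tp", "HarmonicMean",
           "-o", output_csv, "-t", test, "-r", reference]
    if force:
        # Bypass the duration, frame rate, and related equality checks.
        cmd.append("-f")
    return cmd


strict = avqt_command("football_x265_hvc1.mp4", "Football_10.mp4",
                      "football_x265.csv")
forced = avqt_command("football_x265_hvc1.mp4", "Football_10.mp4",
                      "football_x265.csv", force=True)
print(shlex.join(strict))
print(shlex.join(forced))
```

Running without -f first and adding it only after an alignment error keeps a record of which scores were forced.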

Regarding this issue, Apple commented, “frame misalignments can invalidate video quality scores. By default, AVQT warns the user that this might be the case instead of producing wrong scores.” This is a tough issue; Apple is basically saying “go ahead, compute the score, but we can’t guarantee its accuracy.”

Because of AVQT’s tight tolerances, it’s going to be difficult to compute AVQT scores without the -f switch on longer clips, or on clips encoded with codecs that aren’t natively supported and that you first have to convert to YUV format. AVQT should work very well on the pristine 10 to 15-second YUV origin test clips in the various test libraries, but will likely require the -f switch on many less formal and longer test clips. While the -f switch forces the AVQT score, it also leaves you doubting the score’s veracity, with no way to verify its accuracy.

In contrast, SSIMWAVE’s SSIMPLUS VOD monitor can automatically synchronize files and simply won’t process two files it can’t sync. The Moscow State University VQMT tool allows you to manually sync the files visually (with auto-sync coming in an upcoming version). Both tools let you check the sync visually so you can ensure scoring integrity. These are both for-fee tools, but the typical user of video quality metrics cares more about accuracy and performance than cost. (Full disclosure: the author has produced training tutorials for SSIMWAVE under a work-for-hire consulting arrangement.)

An alternative approach for Apple might be processing the score but reporting an error if the number of input and output frames doesn’t match, which often means that there’s a misalignment. In addition, Apple might develop logic to determine if the frames aren’t aligned, as SSIMWAVE has done and MSU is doing, and then report a problem.
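To make the first suggestion concrete, here is a hypothetical sketch of that behavior. It is not how AVQT works; it only shows the idea of letting the computation proceed while flagging a frame-count mismatch. The frame counts could come from ffprobe with -count_frames and -show_entries stream=nb_read_frames, and the function name is mine.

```python
from typing import Optional


def frame_count_warning(reference_frames: int, test_frames: int) -> Optional[str]:
    """Return a warning message when frame counts differ, else None."""
    if reference_frames == test_frames:
        return None
    return (f"frame count mismatch: reference has {reference_frames}, "
            f"test has {test_frames}; scores may reflect misaligned frames")


# Illustrative counts, not measurements from the article's files.
for ref_count, test_count in [(300, 300), (300, 299)]:
    message = frame_count_warning(ref_count, test_count)
    print(message if message else "frame counts match")
```

A matching frame count doesn't prove the frames are aligned, which is why the second suggestion, actual alignment detection, would be the stronger check.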

Finally, Apple might consider adding a line to the error message that says something like “Use the -f switch to override this problem and compute anyway.” That way, folks like me who might not RTFM won’t waste a bunch of time figuring out potential workarounds when a simple one exists.

OK, that’s it for now. In the next article, I will share how AVQT compared to SSIMPLUS and VMAF scoring, including how they correlated with subjective tests performed by Subjectify.us for a very small sample of files.

About Jan Ozer

I help companies train new technical hires in streaming media-related positions; I also help companies optimize their codec selections and encoding stacks, and evaluate new encoders and codecs.
