AV1 encoding is now only 2x slower than x265. See the latest results here.
I’m comparing AV1 encoders for Streaming Media Magazine. I plan to include codecs from:
The Alliance for Open Media (hopefully versions 1.0/2.0)
Visionular
Intel
Mozilla (Mozilla is out – didn’t respond to my inquiry).
If there are any other codecs that should be considered, please get in touch with me at [email protected]. Basically, I’d need code for Linux and Windows and recommendations on the optimal encoding parameters to use.
I’ll be testing 12 ten-second files from five different genres: animations, games, movies, sports, and other. I will analyze the encodes with VMAF and produce rate-distortion curves and BD-Rate computations. I may include subjective tests via Subjectify if time and budget permit. I’ll also measure encoding speed and analyze the files for transient quality issues.
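For readers who haven't worked with BD-Rate, here is a minimal Python sketch of the idea: compare two rate-distortion curves and report the average bitrate difference at equal quality. It is a simplification and not the tooling I'll use for the article. The usual Bjøntegaard method fits a cubic polynomial through each curve; this version swaps in piecewise-linear interpolation of log-bitrate against VMAF so that it runs with the standard library alone. The function names and the sample points are made up for illustration, and each curve is assumed to have strictly increasing VMAF scores.

```python
import math


def _interp(x, xs, ys):
    """Linearly interpolate y at x; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x is outside the curve's quality range")


def bd_rate(anchor, test, steps=200):
    """Average bitrate difference (%) of `test` vs. `anchor` at equal quality.

    Each curve is a list of (bitrate_kbps, vmaf) points. A negative result
    means `test` needs less bitrate than `anchor` for the same VMAF.
    """
    def prepare(curve):
        points = sorted(curve, key=lambda p: p[1])          # sort by quality
        return [p[1] for p in points], [math.log(p[0]) for p in points]

    q_anchor, r_anchor = prepare(anchor)
    q_test, r_test = prepare(test)

    # Only the quality range covered by both curves can be compared.
    lo = max(q_anchor[0], q_test[0])
    hi = min(q_anchor[-1], q_test[-1])
    if lo >= hi:
        raise ValueError("curves do not overlap in quality")

    # Average the log-bitrate gap over the overlap (midpoint sampling).
    total = 0.0
    for i in range(steps):
        q = lo + (i + 0.5) * (hi - lo) / steps
        total += _interp(q, q_test, r_test) - _interp(q, q_anchor, r_anchor)
    return (math.exp(total / steps) - 1.0) * 100.0


# Made-up curves: the "test" codec hits the same VMAF at half the bitrate.
anchor_curve = [(1000, 80.0), (2000, 88.0), (4000, 93.0), (8000, 96.0)]
test_curve = [(rate / 2, vmaf) for rate, vmaf in anchor_curve]
print(f"BD-Rate: {bd_rate(anchor_curve, test_curve):.1f}%")
```

With these made-up curves the result is a 50% bitrate saving, which is a handy sanity check before feeding in real data.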
I’m asking for a bit of help and sharing some initial findings. If you can help, please contact me at [email protected].
Question 1: What version of AV1 is in FFmpeg version 4.3?
AV1 version 2.0 shipped on May 18; according to the readme file from the version of FFmpeg that I downloaded, the build includes aom 20200620-8c113ea <https://aomedia.googlesource.com/aom>, which presumably means it was downloaded on June 20. Encoding is much faster than with previous FFmpeg versions but the version-related information looks the same in MediaInfo, so I can’t tell. If anyone knows which version of AV1 is included in FFmpeg 4.3, please let me know at [email protected].
Question 2: Comments on this Command String
Here’s the encoding string that I’m using. For perspective, all files are 1080p at between 24 and 60 fps.
ffmpeg -y -i input.mp4 -c:v libaom-av1 -strict -2 -b:v 3600K -g 48 -keyint_min 48 -sc_threshold 0 -row-mt 1 -tile-columns 1 -tile-rows 0 -threads 8 -cpu-used 8 -pass 1 -f matroska NUL && ^
ffmpeg -y -i input.mp4 -c:v libaom-av1 -strict -2 -b:v 3600K -maxrate 7200K -bufsize 7200k -g 48 -keyint_min 48 -sc_threshold 0 -row-mt 1 -tile-columns 1 -an -tile-rows 0 -threads 8 -cpu-used 8 -pass 2 output.mkv
I created most of this with Google’s help when I wrote Good News: AV1 Encoding Times Drop to Near-Reasonable Levels for Streaming Media Magazine. I added the -row-mt switch upon the advice of Dirk Hildebrandt, CTO of Wavelet Beam. If anyone sees any options that would improve encoding speed or quality, or if anything looks funky, please contact me at [email protected].
Question 3: Comments on the Preset
According to the FFmpeg documentation, the switch cpu-used “Set[s] the quality/encoding speed tradeoff. The valid range is from 0 to 8, higher numbers indicating greater speed and lower quality. The default value is 1, which will be slow and high quality.” As with VP9, it’s OK to use the fastest preset (8) with the first pass; it’s the setting used in the second pass that sets overall quality and encoding time.
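If you want to script this, the sketch below builds the two-pass argument lists in Python, keeping the first pass at cpu-used 8 and varying only the second pass. It reuses the switches from my command string and nothing else. The function name and its parameters are my own invention for illustration, and it only assembles the commands; it doesn't run them.

```python
import os


def two_pass_commands(src, dst, cpu_used, bitrate_k=3600, gop=48, threads=8):
    """Return (first_pass, second_pass) FFmpeg argument lists for libaom-av1."""
    if not 0 <= cpu_used <= 8:
        raise ValueError("cpu-used must be between 0 and 8")

    # Switches shared by both passes, taken from the command string above.
    shared = [
        "ffmpeg", "-y", "-i", src,
        "-c:v", "libaom-av1", "-strict", "-2",
        "-b:v", f"{bitrate_k}K",
        "-g", str(gop), "-keyint_min", str(gop), "-sc_threshold", "0",
        "-row-mt", "1", "-tile-columns", "1", "-tile-rows", "0",
        "-threads", str(threads),
    ]

    # First pass: always the fastest preset, output thrown away
    # (os.devnull is NUL on Windows and /dev/null on Linux).
    first = shared + ["-cpu-used", "8", "-pass", "1",
                      "-f", "matroska", os.devnull]

    # Second pass: this preset determines quality and encoding time.
    second = shared + [
        "-maxrate", f"{2 * bitrate_k}K", "-bufsize", f"{2 * bitrate_k}k",
        "-an", "-cpu-used", str(cpu_used), "-pass", "2", dst,
    ]
    return first, second


first, second = two_pass_commands("input.mp4", "output.mkv", cpu_used=3)
print(" ".join(first))
print(" ".join(second))
```

To actually encode, you could hand each list to subprocess.run(cmd, check=True), first pass before second, and loop cpu_used from 8 down to 0 to reproduce a preset sweep.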
To identify the appropriate preset to use for my tests, I encoded seven of the files to all presets and measured the VMAF harmonic mean and low frame score, the latter a measure of the potential for transient quality issues. I also recorded the encoding time. The table presents the averages along with average encoding times, again for ten-second files.
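To make the two metrics concrete, here is a small Python sketch that computes them from a list of per-frame VMAF scores. The scores are made up, and the function name is hypothetical. It uses the plain harmonic mean from Python's statistics module; VMAF tools may compute theirs slightly differently (for example, with an offset to cope with zero scores), so treat this as an illustration rather than a reproduction of any tool's output.

```python
from statistics import harmonic_mean


def summarize_vmaf(frame_scores):
    """Summarize per-frame VMAF scores for one encode."""
    if not frame_scores:
        raise ValueError("need at least one frame score")
    return {
        # The harmonic mean is pulled down by low-scoring frames more
        # than the arithmetic mean is.
        "harmonic_mean": harmonic_mean(frame_scores),
        # The single worst frame: a proxy for transient quality issues.
        "low_frame": min(frame_scores),
        "mean": sum(frame_scores) / len(frame_scores),
    }


# Made-up scores with one bad frame.
summary = summarize_vmaf([90.0, 95.0, 60.0, 98.0])
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The single bad frame drags the harmonic mean below the arithmetic mean, and the low frame score exposes it directly.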
The first figure below presents the quality data graphically. As you can see, both metrics stay pretty flat through CPU Used 5, but jump noticeably at the 4 value, with another noticeable boost at 3 and 2 but flattening out thereafter. Cost permitting, you’d certainly like to get to at least 4, if not 3 or 2.
For perspective, however, the total increase in harmonic mean from 94.90 to 96.31 is about 1.41 VMAF points, and it takes 6 VMAF points to produce a just noticeable difference that 75% of viewers would notice. The delta for low frame quality is 2.1. So, you certainly could argue that a setting of 8 is reasonable for all encodes.
The next figure adds the time element and plots every value as a percentage of its maximum. For example, a setting of 5 delivered, on average, 98.6% of the maximum harmonic mean quality at 1.92% of the maximum encoding time. Between the chart and the table, you can see that encoding times start to get pretty ugly at settings below 3.
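The normalization behind that chart is simple enough to sketch in Python. The helper name is hypothetical, and the only inputs used here are the two harmonic mean figures quoted above (94.90 at the fastest setting and 96.31 at the top of the range); the labels are placeholders, not preset numbers.

```python
def percent_of_max(values):
    """Express each value as a percentage of the largest value."""
    peak = max(values.values())
    if peak <= 0:
        raise ValueError("maximum must be positive")
    return {key: 100.0 * value / peak for key, value in values.items()}


# Only the two harmonic means quoted in the text; labels are placeholders.
quality = percent_of_max({"lowest": 94.90, "highest": 96.31})
for label, pct in quality.items():
    print(f"{label}: {pct:.1f}% of maximum")
```

Running the same function over the encoding times puts both series on one 0 to 100 axis, which is what makes the quality and cost trade-off easy to read off a single chart.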
You also see that the jump from 5 to 4 cuts throughput by about 50%, which roughly doubles encoding time and cost. Moving from 4 to 3 extends the encoding time by a further 1.6x or so, making 3 roughly 3.3x more costly than 5. It would seem to make very little sense to go beyond 3, as 2 triples the encoding time (and cost) for truly negligible gains.
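The multipliers compound, which is easy to lose sight of. The short Python sketch below chains the rounded factors from the paragraph above, assuming the tripling at 2 is measured relative to 3. The rounded factors multiply out to 3.2x for a setting of 3, slightly under the roughly 3.3x quoted above, presumably because that figure comes from the unrounded times.

```python
from itertools import accumulate
from operator import mul

# Rounded step-to-step encoding time multipliers from the text.
# Assumption: the tripling at 2 is relative to a setting of 3.
steps = [("5 -> 4", 2.0), ("4 -> 3", 1.6), ("3 -> 2", 3.0)]

# Running product: total cost of each setting relative to a setting of 5.
cumulative = list(accumulate((factor for _, factor in steps), mul))

for (label, factor), total in zip(steps, cumulative):
    print(f"{label}: x{factor:.1f} per step, x{total:.1f} relative to 5")
```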
Looking at this data, I’ll probably use a cpu-used value of 3 for my tests. If you have any comments, please let me know at [email protected].
Incidentally, I recommend producing this type of analysis for every encoder and codec that you deploy; it really helps identify the best preset for your use. To learn how, check out my course on video quality metrics.