While low-latency transcoding sounds desirable, low-latency transcode settings can reduce quality and may not noticeably reduce overall latency.
Reducing latency has been a major focus for many live producers, and appropriately so, particularly for events that viewers can watch via other media, like sporting events available through satellite or cable TV. However, it’s important to understand that transcoding latency contributes minimally to overall latency in ABR applications and that low-latency transcode settings reduce video quality. Unless you’re running ultra-low latency applications like gambling, auctions, or conferencing over technologies like WebRTC or HESP, you should strongly consider not using the lowest possible latency settings.
The image above shows the components of overall glass-to-glass latency for a live event delivered via adaptive bitrate technologies. By far, the largest component is the ABR packaging. WebRTC and similar technologies don’t use this form of packaging, which is how they deliver sub-1-second latency.
If you’re distributing live events via a low-latency ABR technology like LL HLS, LL DASH, or LL CMAF, you’re probably in the 5-8 second latency range. The highest transcoding-only latency times I’ve seen are around 500 ms to 750 ms, and the lowest is around 50 ms. So, if you’re in the 5-8 second range, transcoding with ultra-low latency settings doesn’t reduce latency significantly but can cost you quality-wise, particularly with x264. I also measured with x265 and found the quality of zero latency and normal latency output roughly equivalent, though low throughput makes x265 transcoding very expensive.
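To put numbers on that, here's a quick back-of-the-envelope calculation using the figures just cited (50-750 ms of transcoding latency against a 5-8 second total):

```shell
# Transcoding's share of glass-to-glass latency, using the figures above.
# Worst case: 750 ms of transcoding latency against a 5-second total.
awk 'BEGIN { printf "worst case: %.0f%% of total latency\n", 100 * 0.75 / 5 }'
# Best case: 50 ms against an 8-second total.
awk 'BEGIN { printf "best case: %.1f%% of total latency\n", 100 * 0.05 / 8 }'
```

Even in the worst case, transcoding accounts for roughly 15% of the latency budget; in the best case, well under 1%.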
The Quality Cost of Low-Latency Transcoding – x264
To test the quality of low and normal latency videos, I encoded four files with FFmpeg using the following command string.
ffmpeg -i soccer.mp4 -c:v libx264 -b:v 5000k -minrate 5000k -maxrate 5000k -bufsize 10000k -preset medium -tune zerolatency -force_key_frames "expr:gte(t,n_forced*2)" -an soccer_zerolatency.mp4
I removed -tune zerolatency and encoded again, adjusting the bitrates until file sizes were within 1%. You can see the results for Harmonic Mean and low-frame VMAF (the score of the lowest-quality frame in the file, an indicator of the potential for transient quality issues).
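For clarity, the normal-latency comparison encode was simply the same command with -tune zerolatency removed. A sketch (the testsrc2 input is a synthetic stand-in for the actual soccer clip, and the 5000k values are starting points; per the process above, bitrates were adjusted per clip until file sizes matched within 1%):

```shell
# Synthetic stand-in for the soccer.mp4 test clip
ffmpeg -y -f lavfi -i testsrc2=duration=4:size=1280x720:rate=30 soccer.mp4

# Same command as above, minus -tune zerolatency
ffmpeg -y -i soccer.mp4 -c:v libx264 -b:v 5000k -minrate 5000k \
  -maxrate 5000k -bufsize 10000k -preset medium \
  -force_key_frames "expr:gte(t,n_forced*2)" -an soccer_normal.mp4
```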
For harmonic mean VMAF, zerolatency costs about 2.33 VMAF points on the top-quality stream in your encoding ladder. You can look at this in two different ways. The glass-half-full view is that most viewers can’t discern a 3 VMAF point differential, so don’t worry, be happy. The glass-half-empty view is that you’d have to boost the bitrate of the zero latency stream by between 500 kbps and 1 Mbps to achieve the same quality as a stream encoded using the normal latency settings.
Let’s visualize the difference using the Riverplate soccer clip, which showed the greatest Harmonic Mean and low-frame delta. Figure 1 shows the Results Plot from the Moscow State University Video Quality Measurement Tool with the zero latency file in red and normal latency in green. To be fair, most of the really low zones in red were crowd shots that few viewers would notice. Still, better quality is always better, and the frequent red drops in quality are meaningful.
A quick comparison of the switches used for zero latency (on the right in Table 2) and normal latency settings when using the Medium preset revealed a host of differences that could impact quality. For example, B-frames drop from 3 to 0 while reference frames drop from 3 to 1. Certainly, reducing lookahead from 40 to 0 would impact the encoder’s ability to detect scene changes; hence the reduced low-frame scores, particularly in clips with lots of scene changes like the Riverplate clip.
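For reference, x264's zerolatency tune is shorthand for a handful of individual switches. To my understanding, it is roughly equivalent to setting the following via -x264-params (a sketch, with a synthetic clip standing in for real footage):

```shell
# Synthetic input as a stand-in for a real test clip
ffmpeg -y -f lavfi -i testsrc2=duration=2:size=640x360:rate=30 in.mp4

# Roughly what -tune zerolatency applies: no B-frames, no lookahead,
# no mb-tree, and sliced threading instead of frame threading
ffmpeg -y -i in.mp4 -c:v libx264 -preset medium \
  -x264-params "bframes=0:force-cfr=1:mbtree=0:sync-lookahead=0:sliced-threads=1:rc-lookahead=0" \
  -an out_zl.mp4
```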
I’m not going to fully explore the difference between threads and sliced threads here but may do so down the road. Very briefly, with x264’s default frame-based threading, each frame is encoded by a single thread, so using multiple threads means multiple frames are in flight at once; the more threads, the more frames buffered, and the greater the latency.
In contrast, sliced threading divides each frame into slices, which are handled by separate threads. This may reduce quality slightly, but it improves throughput, which may allow you to use a higher-quality preset. That’s why sliced threads are enabled for zero latency and not for normal (see here for a full explanation).
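The two threading modes are easy to contrast directly. A sketch, again on a synthetic clip, assuming four threads:

```shell
# Synthetic input clip
ffmpeg -y -f lavfi -i testsrc2=duration=2:size=640x360:rate=30 in.mp4

# Frame threading (the normal default): each thread encodes a whole
# frame, so several frames are in flight at once, adding latency
ffmpeg -y -i in.mp4 -c:v libx264 -preset medium \
  -x264-params "threads=4:sliced-threads=0" -an out_frame_threads.mp4

# Sliced threading (what zerolatency enables): each frame is split into
# slices encoded in parallel, so only one frame is in flight at a time
ffmpeg -y -i in.mp4 -c:v libx264 -preset medium \
  -x264-params "threads=4:sliced-threads=1" -an out_sliced_threads.mp4
```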
The Latency Cost of x265 – Not So Bad
I ran the same tests using the x265 codec and the command string below, again with and without the -tune zerolatency option. I used the superfast preset rather than medium to achieve faster than 30 fps on my test workstation.
ffmpeg -y -i soccer.mp4 -c:v libx265 -b:v 3580k -minrate 3580k -maxrate 3580k -bufsize 7160k -preset superfast -tune zerolatency -force_key_frames "expr:gte(t,n_forced*2)" -an soccer_zerolatency.mp4
As you can see in Table 2, the results were much closer. If you’re transcoding with x265 using a high-speed preset, you may not experience the same quality penalty as with x264. In fact, low-frame quality is actually a bit higher.
Table 4 shows why the quality delta may not be that significant: the values for the Superfast preset aren’t that different from the Zero Latency values. Beyond those shown, x265 uses only a single reference frame for the Superfast preset, and since the Zero Latency tune doesn’t control reference frames, that single reference frame carries through to the Zero Latency output. The bottom line is that the superfast encoding switches are already so constrained that tuning for zerolatency doesn’t further degrade output quality.
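For the curious, x265's zerolatency tune is, to my understanding, roughly shorthand for the following -x265-params (a sketch on a synthetic clip; consult the x265 documentation for the authoritative list):

```shell
# Synthetic input clip
ffmpeg -y -f lavfi -i testsrc2=duration=2:size=640x360:rate=30 in.mp4

# Roughly what x265's -tune zerolatency applies: no B-frames, no
# lookahead, no scene-cut detection, and a single frame thread
ffmpeg -y -i in.mp4 -c:v libx265 -preset superfast \
  -x265-params "bframes=0:b-adapt=0:rc-lookahead=0:scenecut=0:frame-threads=1" \
  -an out_x265_zl.mp4
```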
Of course, if you encode using a higher-quality preset, it likely won’t improve quality significantly anyway, since the zerolatency tune eliminates many of the high-quality configuration options. And since you’d probably have to deploy multiple threads to support a higher-quality preset, you’d also be boosting latency. Any way you look at it—quality, throughput, or latency—encoding with x265 in software appears suboptimal.
The Bottom Line
The bottom line is to recognize that deploying a low-latency transcoding setting may impact video quality, particularly if you’re encoding with x264. When the target latency is sub-1-second, say for conferencing, auctions, gambling, and other interactive applications, you really have no option. However, when encoding for distribution via any low-latency ABR application, you may want to consider opting for higher quality as opposed to lower latency.