Figure shows the different components of live streaming latency.

The Quality Cost of Low-Latency Transcoding

While low-latency transcoding sounds desirable, low-latency transcode settings can reduce quality while delivering little or no meaningful latency reduction.

Reducing latency has been a major focus for many live producers, and appropriately so, particularly for events that viewers can watch via other media, like sporting events available through satellite or cable TV. However, it’s important to understand that transcoding latency contributes minimally to overall latency in ABR applications and that low-latency transcode settings reduce video quality. Unless you’re running ultra-low latency applications like gambling, auctions, or conferencing over technologies like WebRTC or HESP, you should strongly consider not using the lowest possible latency settings.

The image above shows the components of overall glass-to-glass latency for a live event delivered via adaptive bitrate technologies. By far, the largest component is the ABR packaging. WebRTC and similar technologies don’t use this form of packaging, which is how they deliver sub-1-second latency.

If you’re distributing live events via a low-latency ABR technology like LL HLS, LL DASH, or LL CMAF, you’re probably in the 5-8 second latency range. The highest transcoding-only latency I’ve seen is around 500 ms to 750 ms, and the lowest is around 50 ms. So, if you’re in the 5-8 second range, transcoding with ultra-low latency settings doesn’t reduce latency significantly but can cost you quality-wise, particularly with x264. I also measured with x265 and found the quality of zero latency and normal latency output roughly equivalent, though low throughput makes x265 transcoding very expensive.

The Quality Cost of Low-Latency Transcoding – x264

To test the quality of low- and normal-latency output, I encoded four test files with FFmpeg using command strings like the following.

ffmpeg -i soccer.mp4 -c:v libx264 -b:v 5000k -minrate 5000k -maxrate 5000k -bufsize 10000k -preset medium -tune zerolatency -force_key_frames "expr:gte(t,n_forced*2)" -an soccer_zerolatency.mp4

I then removed -tune zerolatency and encoded again, adjusting the bitrates until the file sizes were within 1%. You can see the results in Table 1 for harmonic mean and low-frame VMAF (the score of the lowest-quality frame in the file, an indicator of the potential for transient quality issues).
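As a sketch, the normal-latency comparison encode looked like this; note that the 5200k bitrate is purely illustrative, since the actual value was adjusted per clip until file sizes matched within 1%:

```shell
# Normal-latency encode: same settings as above minus -tune zerolatency.
# The 5200k bitrate is illustrative; adjust per clip until the output
# file size is within 1% of the zerolatency version.
ffmpeg -i soccer.mp4 -c:v libx264 -b:v 5200k -minrate 5200k -maxrate 5200k \
  -bufsize 10400k -preset medium \
  -force_key_frames "expr:gte(t,n_forced*2)" -an soccer_normal.mp4
```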

Table 1. VMAF harmonic mean and low-frame quality with and without -tune zerolatency using the x264 codec.

For harmonic mean VMAF, zerolatency costs about 2.33 VMAF points on the top-quality stream in your encoding ladder. You can look at this in two different ways. The glass-half-full view is that most viewers can’t discern a 3 VMAF point differential, so don’t worry, be happy. The glass-half-empty view is that you’d have to boost the bitrate of the zerolatency stream by 500 kbps to 1 Mbps to achieve the same quality as a stream encoded using the normal latency settings.
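For readers who want to reproduce these measurements, VMAF scores like these can be computed with FFmpeg’s libvmaf filter. This is a sketch; the exact filter options vary somewhat across FFmpeg and libvmaf versions, so check the documentation for your build:

```shell
# Score the zerolatency encode (first input) against the source
# (second input), writing per-frame VMAF scores to a JSON log.
# The harmonic mean can then be computed from the per-frame scores.
ffmpeg -i soccer_zerolatency.mp4 -i soccer.mp4 \
  -lavfi libvmaf=log_path=vmaf.json:log_fmt=json \
  -f null -
```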

Let’s visualize the difference using the Riverplate soccer clip, which showed the greatest Harmonic Mean and low-frame delta. Figure 1 shows the Results Plot from the Moscow State University Video Quality Measurement Tool with the zero latency file in red and normal latency in green. To be fair, most of the really low zones in red were crowd shots that few viewers would notice. Still, better quality is always better, and the frequent red drops in quality are meaningful.

Figure 1. Results Plot comparing the VMAF frame scores with -tune zerolatency in red and without in green.

A quick comparison of the switches used for zero latency (on the right in Table 2) and normal latency settings when using the Medium preset revealed a host of differences that could impact quality. For example, B-frames drop from 3 to 0 while reference frames drop from 3 to 1. Certainly, reducing lookahead from 40 to 0 would impact the encoder’s ability to detect scene changes; hence the reduced low-frame scores, particularly in clips with lots of scene changes like the Riverplate clip.
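Based on those differences, the zerolatency tune behaves roughly as if you had set the changed options explicitly. This sketch shows the idea via -x264-params, with the option values taken from Table 2 rather than being an authoritative restatement of the tune:

```shell
# Approximate -tune zerolatency by setting the affected x264 options
# directly: no B-frames, one reference frame, no lookahead, no mb-tree,
# and sliced threads instead of frame threads.
ffmpeg -i soccer.mp4 -c:v libx264 -preset medium \
  -x264-params "bframes=0:ref=1:rc-lookahead=0:sync-lookahead=0:sliced-threads=1:no-mbtree=1:force-cfr=1" \
  -b:v 5000k -maxrate 5000k -bufsize 10000k -an soccer_explicit.mp4
```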

Table 2. Switches impacted by -tune zerolatency compared to the medium x264 preset.

I’m not going to fully explore the difference between frame threads and sliced threads here but may do so down the road. Very briefly, with frame-based threading, each frame is encoded by a single thread, so parallelism comes from encoding multiple frames at once; the more threads, the more frames in flight, and the greater the latency.

In contrast, sliced threading divides each frame into slices that are encoded by separate threads. This may reduce quality slightly, but it improves throughput, which may allow you to use a higher-quality preset. That’s why sliced threads are enabled for zero latency and not for normal latency.

The Latency Cost of x265 – Not So Bad

I ran the same tests using the x265 codec and the command string below, again with and without the -tune zerolatency option. I used the superfast preset rather than medium to achieve faster-than-30-fps throughput on my test workstation.

ffmpeg -y -i soccer.mp4 -c:v libx265 -b:v 3580k -minrate 3580k -maxrate 3580k -bufsize 7160k -preset superfast -tune zerolatency -force_key_frames "expr:gte(t,n_forced*2)" -an soccer_zerolatency.mp4

As you can see in Table 3, the results were much closer. If you’re transcoding with x265 using a high-speed preset, you may not experience the same quality penalty as with x264. In fact, low-frame quality is actually a bit higher.

Table 3. VMAF harmonic mean and low-frame quality with and without -tune zerolatency using the x265 codec.

Table 4 shows why the quality delta isn’t significant: the values for the superfast preset aren’t that different from the zerolatency values. Beyond the switches shown, although the zerolatency tune doesn’t control reference frames, x265 uses only a single reference frame with the superfast preset, and that setting carries through to the zerolatency encode. The bottom line is that the superfast encoding switches are already so constrained that tuning for zerolatency doesn’t further degrade output quality.

Table 4. Switches impacted by -tune zerolatency compared to the superfast x265 preset.
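If you want to verify which settings a given preset/tune combination resolves to on your own build, x265 can log its configuration at a verbose log level. This is a sketch; log output and option names vary by x265 version:

```shell
# Encode one second to a null sink while asking x265 for verbose
# logging, making it easy to diff superfast alone against
# superfast plus -tune zerolatency.
ffmpeg -i soccer.mp4 -c:v libx265 -preset superfast -tune zerolatency \
  -x265-params log-level=full -t 1 -an -f null -
```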

Of course, if you encode using a higher-quality preset, it likely won’t improve quality significantly anyway since the zerolatency tune would likely eliminate many of the high-quality configurations. Since you’d probably have to deploy multiple threads to support a higher-quality preset, you’d also be boosting latency. Any way you look at it—quality, throughput, or latency—encoding with x265 in software appears suboptimal.

The Bottom Line

The bottom line is that deploying low-latency transcoding settings may degrade video quality, particularly if you’re encoding with x264. When the target latency is sub one second, say for conferencing, auctions, gambling, and other interactive applications, you really have no option but to use them. However, when encoding for distribution via any low-latency ABR technology, you may want to consider opting for higher quality rather than lower latency.

About Jan Ozer

I help companies train new technical hires in streaming media-related positions; I also help companies optimize their codec selections and encoding stacks and evaluate new encoders and codecs. I am a contributing editor to Streaming Media Magazine, writing about codecs and encoding tools. I have written multiple authoritative books on video encoding, including Video Encoding by the Numbers: Eliminate the Guesswork from your Streaming Video (https://amzn.to/3kV6R1j) and Learn to Produce Video with FFmpeg: In Thirty Minutes or Less (https://amzn.to/3ZJih7e). I have multiple courses relating to streaming media production, all available at https://bit.ly/slc_courses. I currently work at www.netint.com as a Senior Director of Marketing.

