B-Frames, Ultra Low-Latency Encoding, and Parking Lot Rules

One of my sweetest memories of bringing up our two daughters was weekly trips to the grocery store. Each got a $5.00 bribe for accompanying their father, which they happily invested in various tchotchkes that seldom lasted the week. When we exited the car, “parking lot rules” always applied, which meant that each daughter held one of Daddy’s hands for the walk to the store. Two girls, two hands, no running around the busy parking lot.

Parking lot rules came to mind as we debugged a decoding latency issue when testing a new server product called the Quadra Video Server. Initial tests revealed a decoding latency of up to 200 milliseconds in some high-volume configurations. Given that the encoding latency was under 20 milliseconds, the decoding numbers were uncomfortably high.

Eliminate B-Frames from the Origination Stream

After raising the issue, our testing team implemented a fix, which dropped decoding latency to under 20 milliseconds and decreased encoding latency as well. The change is the parking-lot-rules corollary for live streamers: “for ultra-low latency, eliminate B-frames from your live streaming workflow.” For most live encoders and transcoders, disabling B-frames for AVC or HEVC should be simple, either in the GUI or via a change to your command string.
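As a rough illustration of the command-string route, here is a minimal Python sketch that assembles an FFmpeg command for H.264 with B-frames disabled. It assumes FFmpeg with the libx264 encoder; the function name and file names are hypothetical, and your own encoder may expose the same setting under a different option. The `-bf 0` option sets the maximum number of B-frames to zero, and the High profile is kept, as suggested in the author's note at the end of this article.

```python
def build_no_bframe_cmd(src, dst):
    """Return an FFmpeg argument list that encodes H.264 without B-frames.

    src and dst are placeholder paths supplied by the caller.
    """
    return [
        "ffmpeg",
        "-i", src,
        "-c:v", "libx264",      # software H.264 encoder (assumed available)
        "-profile:v", "high",   # keep the High profile
        "-bf", "0",             # maximum number of B-frames: none
        "-c:a", "copy",         # pass audio through untouched
        dst,
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected or pasted into a shell.
    print(" ".join(build_no_bframe_cmd("input.mp4", "output.mp4")))
```

The sketch only builds the argument list; pass it to subprocess.run if you want to launch the encode from Python.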

A quick glance at Figure 1 reveals why B-frames blow up decoding latency (shoutout to OTTverse, where we grabbed the image). B-frames, of course, incorporate redundancies from frames before and after the frame being encoded. They are packed and decoded out of order. Any frame decoded out of order adds latency, and the further a frame is out of order, the greater the latency.

Figure 1. B-frames are packed out of order and can increase decode latency.
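To make the reordering concrete, here is a small Python sketch under a simplified model: each run of B-frames is decoded right after the I- or P-frame that follows it in display order. Real encoders can use more elaborate reference structures, and actual latency depends on the decoder, so treat the numbers as an illustration only. The function names are hypothetical.

```python
def decode_order(display_types):
    """Map a display-order frame pattern (e.g. "IBBP") to decode order.

    Returns display indices in the order the decoder receives them.
    Simplified model: B-frames wait for the next I- or P-frame.
    """
    order = []
    pending_b = []
    for idx, ftype in enumerate(display_types):
        if ftype == "B":
            pending_b.append(idx)      # cannot be decoded yet
        else:
            order.append(idx)          # the anchor frame goes first
            order.extend(pending_b)    # then the B-frames that needed it
            pending_b = []
    order.extend(pending_b)            # trailing B-frames, left in place
    return order


def max_displacement(display_types):
    """Largest gap, in frames, between display and decode position."""
    order = decode_order(display_types)
    return max((abs(pos - idx) for pos, idx in enumerate(order)), default=0)


if __name__ == "__main__":
    for pattern in ("IPPPPPP", "IBBPBBP", "IBBBPBBBP"):
        print(pattern, decode_order(pattern), max_displacement(pattern))
```

In this model, a stream with no B-frames has a displacement of zero, while each additional consecutive B-frame pushes the following anchor frame one position further out of order.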

Will eliminating B-frames (or switching to the Baseline H.264 profile, which does not support them) reduce the quality of the incoming stream? Only minimally, if at all. These streams are typically produced at a relatively high bit rate, so B-frames or higher-quality profiles deliver minimal additional quality. It’s even less likely that any decrease in quality would be noticeable in the output stream (see here).

B-Frames and Latency

Let’s pause for a moment and reflect on the bigger picture. Figure 2 shows the typical live-streaming workflow. We’ve been talking about B-frames in the on-premise encode impacting the decoding latency in the transcoding server. What about B-frames in the transcoding server when encoding streams for delivery to viewers?

Figure 2. B-frames from the on-premise transcoder will increase latency from the transcoding server.

Predictably, the result is the same. B-frames introduce the same latency during encoding for delivery for the same reason: packing frames out of order introduces delays. This is why, when implementing low-latency mode with the NETINT Quadra Video Processing Unit and T408 transcoder, you must use a GOP preset that encodes with consecutive frames.
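One way to confirm that a stream, incoming or outgoing, really is free of B-frames is to list its frame types. The Python sketch below is a minimal example that assumes ffprobe is installed and on the path; the function names are hypothetical. It asks ffprobe for the picture type of each video frame and counts the I-, P-, and B-frames.

```python
import subprocess
from collections import Counter


def count_frame_types(ffprobe_csv):
    """Count I/P/B frames in ffprobe CSV output (one frame per line)."""
    counts = Counter()
    for line in ffprobe_csv.splitlines():
        field = line.strip().split(",")[0]   # tolerate extra CSV fields
        if field in ("I", "P", "B"):
            counts[field] += 1
    return counts


def probe_frame_types(path):
    """Run ffprobe on the first video stream and count frame types."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-select_streams", "v:0",
            "-show_entries", "frame=pict_type",
            "-of", "csv=p=0",
            path,
        ],
        capture_output=True, text=True, check=True,
    )
    return count_frame_types(result.stdout)


if __name__ == "__main__":
    # Parse a captured sample instead of a real file for demonstration.
    sample = "I\nP\nB\nB\nP\n"
    counts = count_frame_types(sample)
    print(dict(counts), "B-frames present:", counts["B"] > 0)
```

A B-frame count of zero across a representative stretch of the stream suggests the encoder setting took effect.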

When you get things right, with incoming and outgoing streams both free of B-frames, the results are transformative. Let’s have a look.

True Low-Latency Transcoding

Table 1 below shows the actual testing results. This use case involves scaling 1080p AVC input down to 720p for delivery, which is common for interactive gaming, auction sites, and conferencing. In this configuration, the server can produce 320 streams while encoding AVC, HEVC, and AV1. I don’t have the original data for the input file with B-frames, but as I recall, decoder latency averaged 150 to 200 ms, a noticeable break in a live conversation. Even worse, unlike encoder latency, it didn’t drop significantly in low-delay mode.

As you see in the table, after the fix, total latency is around 160 ms for all outputs in normal (latency-tolerant) mode. In low-delay mode, with no B-frames in either the input file or the output streams, combined encoder and decoder latency plummets to around 22 ms, well under a single frame (which for 30 fps video takes 33 ms to display). That’s low enough for even the most latency-sensitive applications.
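The frame-time comparison is simple arithmetic, sketched here in Python with the latency figures quoted above and a 30 fps frame rate. The function names are hypothetical.

```python
def frame_duration_ms(fps):
    """Time one frame stays on screen, in milliseconds."""
    return 1000.0 / fps


def latency_in_frames(latency_ms, fps):
    """Express a latency as a number of frame durations."""
    return latency_ms / frame_duration_ms(fps)


if __name__ == "__main__":
    fps = 30
    print(f"frame duration: {frame_duration_ms(fps):.1f} ms")          # 33.3 ms
    print(f"normal mode:    {latency_in_frames(160, fps):.2f} frames")  # 4.80
    print(f"low-delay mode: {latency_in_frames(22, fps):.2f} frames")   # 0.66
```

At 30 fps, the roughly 160 ms of normal mode is close to five frames of delay, while the roughly 22 ms of low-delay mode is about two-thirds of one frame.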

Table 1. Encode/decode latency in normal and low-delay mode (with a properly formatted input file).

How much will the lack of B-frames impact quality in the output encoding ladder? Once again, B-frames have delivered surprisingly little value in the tests that I’ve performed. You can read a good article on the subject here and access updated data here (see page 22), which shows less than a 1% quality difference between streams with and without B-frames. The bottom line, of course, is that if your application needs ultra-low latency, you have to prioritize that over any potential quality loss, though it’s good to know that few, if any, viewers will notice it.

Returning to the thoughts that prompted this article, when my daughters have their kids, my enduring wish is that they implement parking lot rules on all relevant shopping trips. Given their progress to date, this may not occur in my lifetime. If you’re a live-streaming engineer, you have no similar excuse to ignore the corollary. If latency is critical, make sure you eliminate B-frames from your live-streaming workflows.

Again, the server referenced is the Quadra Video Server, which combines ten Quadra video processing units (VPUs) with a SuperMicro chassis driven by a 32-core CPU. The total cost should be around $20,000 for this configuration. Stay tuned for more details.

(Author’s note: this article was edited after publishing to remove the recommendation to use the Baseline profile to eliminate B-frames from H.264 streams. As several LinkedIn commenters pointed out, a better solution is to use the High profile and simply disable B-frames in the GUI or command string.)

About Jan Ozer

I help companies train new technical hires in streaming media-related positions; I also help companies optimize their codec selections and encoding stacks and evaluate new encoders and codecs. I am a contributing editor to Streaming Media Magazine, writing about codecs and encoding tools. I have written multiple authoritative books on video encoding, including Video Encoding by the Numbers: Eliminate the Guesswork from your Streaming Video (https://amzn.to/3kV6R1j) and Learn to Produce Video with FFmpeg: In Thirty Minutes or Less (https://amzn.to/3ZJih7e). I have multiple courses relating to streaming media production, all available at https://bit.ly/slc_courses. I currently work at www.netint.com as a Senior Director in Marketing.
