Book Excerpt: VBV Buffer Explained

One of the topics I'm addressing in my upcoming book is the VBV rate control model, a very complicated concept. Here's the section from the book, which I'm making available for comments, corrections, and clarifications. I'm pretty sure the technical details about how the model works are correct, but I'm guessing the description and definition could use some improvement. If you take the time to comment, please be as complete as possible so I can make the corrections. Also please include a reference, so I can verify your comments.

The overall chapter is on bitrate control, and there are some references to previous content, but this section is pretty self-contained. I hope you find it helpful. If you want to know when the book, tentatively entitled Encoding by the Numbers, is available, you can sign up for the Streaming Learning Center newsletter on the upper left. 

Working with the VBV Buffer

With some applications, like Telestream Episode Pro, you control the bitrate of your file with a combination of average bitrate, VBV size, and VBV max bitrate, as shown in Figure 8-14. This model is also used by cloud services like encoding.com, and in the FFmpeg command line. What the heck is a VBV and how do you use it? That's what you'll learn in this section.

Figure8_14.png

Figure 8-14. Controlling bitrate with the average, max, and VBV size. 
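
If you're working in the FFmpeg command line rather than a GUI, the same three controls map onto x264's rate control flags. Here's a minimal sketch; the file names and bitrate values are placeholders, not a recommendation:

    # -b:v     average (target) bitrate
    # -maxrate VBV maximum bitrate
    # -bufsize VBV buffer size, in bits (4000k = one second at a 4 Mbps drain rate)
    ffmpeg -i input.mp4 -c:v libx264 \
      -b:v 4000k -maxrate 6000k -bufsize 4000k \
      output.mp4

FFmpeg passes these straight through to x264, which reports them as bitrate, vbv_maxrate, and vbv_bufsize in its settings string.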

By way of background, the VBV (Video Buffering Verifier) buffer is a model used by the encoder to ensure that the bitstream produced doesn't exceed the limitations of the decoder. For example, if you set the VBV buffer to 5,000 kilobits, the encoder produces a stream that won't underflow or overflow a 5,000-kilobit decoder buffer. The model was created and deployed back in the days when most decoding was performed on very limited hardware, and it was necessary to ensure smooth playback on those devices.

Today, when most players are software-based, or contained in vastly more capable hardware, the buffer is used more to control the quality and variability of the encoded file than to ensure player compatibility. Still, as you'll see towards the end of this section, if you're encoding for low-end devices like older iPhones, you have to pay attention, because if the buffer is too large, the file may not play on the device.

Some authorities state that the size of the buffer is the enforcement period for your VBR encoding. For example, if your target data rate is 2,000 kbps, your maximum data rate 4,000 kbps, and your buffer 2,000 kilobits, it means that every one-second chunk of video should average 2,000 kbps. If your buffer is 10,000 kilobits, it means that each five-second chunk must average 2,000 kbps. But I've not been able to verify this by objectively analyzing encoded files, because the stream variability always exceeds this limitation.

What I have been able to verify is that buffer size affects both stream variability and video quality. Intuitively, smaller buffers should reduce quality, because the encoder has less wiggle room to adjust the data rate upwards or downwards to match scene complexity. On the other hand, a smaller buffer should produce a stream with less variability. Conversely, you would expect a file with a large buffer to enjoy higher quality, but also more stream variability. Let's test these assumptions.

To create the results shown in Table 8-2, I encoded the Sintel clip to 1080p at 4 Mbps using 2-pass CBR encoding. The first file was encoded with a 2 Mbit buffer (half a second at the target rate), but the resultant data rate was only 3,587 kbps, outside of my 5 percent target. I leave this in as a reminder that you should check the data rate of your CBR-encoded files, particularly those with a small buffer, since they are typically well below the target, at least when encoding with x264. To meet the 4 Mbps target, I re-encoded at 4.4 Mbps to create the second file, which has an average data rate of 3,893 kbps. I encoded the next three files with 4, 8, and 12 Mbit buffers. I ascertained the peak bitrate of each file in Bitrate Viewer (see Figure 8-5), and used this value to compute the maximum data rate variance. I computed PSNR with the Moscow University Video Quality Measurement Tool (VQMT).
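
For anyone who wants to replicate this test, the encodes were built along these lines in FFmpeg with x264. Treat this as a sketch: the source file name is a placeholder, and your exact flags may differ. For CBR, maxrate is set equal to the target, and only bufsize changes between files:

    # Pass 1: analysis only, video output discarded
    ffmpeg -y -i sintel_1080p.mp4 -c:v libx264 \
      -b:v 4000k -maxrate 4000k -bufsize 8000k \
      -pass 1 -an -f null /dev/null

    # Pass 2: final encode (repeat with -bufsize 2000k, 4000k, or 12000k)
    ffmpeg -i sintel_1080p.mp4 -c:v libx264 \
      -b:v 4000k -maxrate 4000k -bufsize 8000k \
      -pass 2 output_8mbit_buffer.mp4

(On Windows, write the first pass to NUL instead of /dev/null.)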

Table08-2.png

Table 8-2. The impact of buffer size on data rate variance and video quality. Green means the highest value (not necessarily the best value). 

As you can see, the file encoded with the 12 Mbit buffer had the highest variance and the highest quality, which are shown in chart form in the next two figures. As you can see in Figure 8-15, increasing buffer size has a dramatic impact on data rate variability. Basically, if you plan on encoding with CBR, you should limit the buffer to one second, perhaps even 0.5 seconds.

Figure8_15.png

Figure 8-15. Increasing the buffer size increases data rate variance.

As shown in Figure 8-16, encoding with a small buffer size has an obvious cost in quality, though the slope of the graph makes the difference seem more significant than it really is. That is, the difference between 40.44, the PSNR value achieved by the file encoded with the 4 Mbit buffer, and the 40.92 PSNR value for the file encoded with the 12 Mbit buffer, is only about 1.2 percent.

Figure8_16a.png

Figure 8-16. Increasing the buffer size increases video quality, though only slightly.

Note that increasing the buffer size has a similar impact on stream variability when encoding with constrained VBR. For example, I encoded the 1080p version of Tears of Steel to 2 Mbps using 150% constrained VBR with VBV buffers of one and two seconds. With 150% constrained VBR, the peak data rate should exceed the overall target by no more than 50 percent. With a one-second buffer, the measured peak was 50.1 percent over the target, so the encode hit the constraint almost perfectly. At two seconds, the peak was 82 percent over the target, about 32 percentage points beyond the constraint. So whether you're encoding with CBR or constrained VBR, you should expect buffer values in excess of one second to impact stream variability.
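
In FFmpeg terms, 150% constrained VBR means setting maxrate to 150 percent of the target. Here's a sketch of the two encodes, assuming the buffer is sized against the 2 Mbps target (the file names are placeholders):

    # One-second buffer (2,000 kilobits at a 2 Mbps target)
    ffmpeg -i tears_of_steel_1080p.mp4 -c:v libx264 \
      -b:v 2000k -maxrate 3000k -bufsize 2000k output_1sec.mp4

    # Two-second buffer: change -bufsize 2000k to -bufsize 4000k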

Maximum Buffer in Practice

What maximum buffer values are used in practice? Table 8-3 provides a partial picture for two distinct targets, streaming and podcasts. For streaming, I grabbed videos from the sites listed, and downloaded the podcasts from iTunes. To learn the target, max data rate, and max buffer settings, I inspected the files in MediaInfo, which you learned about in Chapter 3. Sometimes, though not always, the command-line information is saved in the Encoding Settings box shown in Figure 8-17 below. Again, this doesn't happen with all files, but when it does, you've got the precise recipe used by the producer to encode the file.
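
If you prefer a terminal to the GUI, MediaInfo's command-line version surfaces the same data. A sketch follows, with the output abridged; the exact fields depend on what the encoder wrote into the file:

    mediainfo video.mp4

    # For x264-encoded files, look for the "Encoding settings" line, e.g.:
    # Encoding settings : cabac=1 / ref=3 / ... / rc=2pass / bitrate=2000 /
    #                     vbv_maxrate=3000 / vbv_bufsize=2000 / ...

Dividing vbv_bufsize by the bitrate approximates the buffer duration in seconds, which is how Table 8-3 expresses it.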

 Table08-3.png

Table 8-3. Buffer durations of video files from varying sites.

As you can see, most streaming files use a one- or two-second buffer, most likely to limit stream variability for files delivered over the internet. The numbers are much higher for the few podcasts that provided the information; notably, Apple, which obviously knows iTunes and its devices, used a buffer duration of over six seconds. Obviously, deliverability isn't a factor for downloaded podcasts, and the longer buffer times should increase quality to some degree.

 Figure8_17.png

Figure 8-17. This data isn’t included in every encoded file, but it’s incredibly useful when it is.

Note that the O’Reilly Factor video is 640x360 and encoded in the Baseline profile, which means it plays on very old devices, going back to the iPhone 4 and iPod touch 4. If you're encoding podcasts for these devices, I would follow Fox's lead and use a fairly small buffer size. On the other hand, the This Week in Tech video is encoded at 864x480 resolution using the Main profile, which excludes the aforementioned older devices, and uses a 12-second buffer. The Apple video is 1080p and uses the High profile, so it's targeted towards the newer class of iOS devices, and uses a 6.25-second buffer.
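
If you're targeting those older devices yourself, a conservative starting point in FFmpeg might look like the sketch below; the resolution, bitrates, and one-second buffer are illustrative, patterned on the Fox settings above rather than on any published spec:

    # Baseline profile, small buffer, for maximum device compatibility
    ffmpeg -i input.mp4 -c:v libx264 -profile:v baseline -level 3.0 \
      -s 640x360 -b:v 1000k -maxrate 1500k -bufsize 1000k \
      output.mp4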

I don’t know the VBV-buffer related specs for all iOS devices, and it’s a bit off topic for this book, so I’m not going to explore further. The most conservative approach is to copy the parameters shown in the table, and customize the buffer size for the lowest common denominator playback device. 

For streaming, the size of your VBV buffer should be dictated by your concerns over stream variability. If you want your streams to meet your CBR or even constrained VBR targets, use a buffer size of one second. If you don’t care about variability, a longer buffer will improve quality, however slightly.

______________________________________________

End of section: If you'd like to know when the book ships (late summer, 2016), you can sign up for the newsletter on the top left of the page. 


Comments (10)

Gabe Russell
Said this on 5-23-2016 At 09:20 am

Interesting article - great to see some of these numbers in action. I was a bit confused by this section:

"With a one second buffer, the data rate variance from the overall target was 50.1%, almost perfect. At two seconds, the variance was 82%, almost 30% higher."

Why is 50.1% "almost perfect"? Wouldn't 0% be perfect - zero variance from the constraint?

Said this on 5-23-2016 At 11:14 am
Thanks for the read and comments Gabe.

Yeah, I agonized over how to describe this. With 150% constrained VBR, you would expect the peak to reach 150% of the overall target, which is 50% above it. So a 50% variation from the overall target means the encode hit the 150% constraint almost exactly, so it's almost perfect.

I'll noodle on how to make this clearer.

Thanks again for the read and comment.

Jan
Said this on 5-23-2016 At 03:49 pm

"The first file was encoded with a buffer target of 2 Mbps, but the resultant data rate was only 3,587, outside of my 5% target. I leave this in as a reminder that you should check the data rate of your CBR encoded files, particularly those with a small buffer, since they are typically well below the target, at least when encoding with x264. To meet the 4 Mbps target, I re-encoded at 4.4 Mbps to create the second file, which has an average data rate of 3,893 kbps."

I would recommend leaving that out, as it's confusing and muddles the point you're trying to make. Sounds like you may need to re-run your tests with a different source or a different encoder build. It's strange that an encoder's rate control would be consistently off-target for a 15-minute video, so including that in a "here's how VBV works" chapter seems out of place.

The greater point I think you may be missing is the role VBV size plays in modern streaming. Constraining buffer sizes may no longer matter much for decoding purposes, but the following two factors still apply:

1) Larger VBV sizes increase e2e latency in live streaming workflows. Every second added to VBV duration adds 2 seconds (1 encode + 1 decode) to e2e latency.

2) Larger VBV sizes increase bitrate oscillations across fragments/segments in HLS/DASH, theoretically making it more difficult for player heuristics to appropriately adapt playback bitrate to network conditions. The larger the VBV, the more likely it is to make uneven segments (e.g. 6MB+1MB+8MB, instead of 5MB+5MB+5MB).

Said this on 5-23-2016 At 04:09 pm
Thanks Alex.

Good point on the confusion; I'll take that out.

I didn't know about the latency, will add that. Is that live only?

Covered heuristics in the sections on VBR vs. CBR, but will re-emphasize the point.

Thanks again for your comment.

Jan

Purvin Pandit
Said this on 5-24-2016 At 06:58 am

Jan

Few comments/questions:

  1. You start off by talking about VBR but your experiments are CBR-based. Why is that? Might be better to run both types and present results.
  2. You seem to use seconds and bps to represent buffer size. Might be better to use either seconds or bits instead of bps.
  3. @Alex: your comment about "theoretically making more difficult..." is an interesting one. Are you talking about uneven segments in a CBR or VBR scenario? I would expect to see variations when using VBR. What has your experience been with various players and uneven segment sizes (practical vs. theoretical)?
  4. @Alex: I believe the "initial cpb removal delay" can be used to reduce the delay, provided it doesn't cause underflow. I need to brush up on HRD to be certain though.
Said this on 5-24-2016 At 07:55 am
Purvin:

Good stuff, I'll be making some changes to make the content more consistent.

Tough to run more tests, as the combinations are endless. Which level of VBR constraint and what VBV max value? The start of the book tells the reader how to run their own tests and each chapter details some of the considerations of the particular configuration option discussed. I will be clearer as to the options used to run each test though, which I'll have to go back and add.

Thanks for taking the time to read and provide your comments.

Jan
Said this on 5-25-2016 At 05:25 pm

In practical terms, uneven segments shouldn't pose too big of a problem for player heuristics as long as the player buffers more than several segments at a time.

My example of larger VBV (CPB, if we want to keep the nomenclature correct) causing uneven segments can apply in both CBR and VBR scenarios. If your buffer duration is 6 seconds, but your segment duration is 2 seconds, your buffer would span ~3 segments at a time... so any unevenness in the rate control would be reflected in the segment sizes too.

Dan Erichsen
Said this on 5-26-2016 At 01:08 am
Hey, Jan.

Nice writeup. Hope you can tolerate some pedantry.

You write

Some authorities state that the size of the buffer is the enforcement period for your VBR encoding. For example, if your target data rate is 2,000 kbps, your maximum data rate 4,000, and your buffer 2,000, it means that every second of video should average 2000 kb (edit: removed some of your ps's). If your buffer is 10,000 kb, it means that each five-second chunk must average 2000 kb. But I've not been able to verify this by objectively analyzing encoded files, because the stream variability always exceeds this limitation.

There is definitely some element of truth in this when you want to support DASH scenarios where a set of frames can constitute a single memory chunk marked for download, but if your decoder starts off with a VBV/HRD, the difference should be eaten by the VBV, so I'm not sure it really belongs in this chapter. Essentially, the difference between the encoding and the transmission rate is the motion in the VBV, which is your rate control. IIRC, MPEG-2 TM5 allocated static blocks, but that was, what? '93? I'm really not sure that dogma is still alive. Sure, it should average out in the long run, and some positive difference between encoding and draining should be compensated by a negative difference and vice versa to ensure the VBV level does not pass its bounds, but not with the granularity you present. Please double-check with your sources.

And sorry to say this, but I'm a bit hesitant about your graph. I like your numbers, half a second and one second are good choices for your measurements. However, the real risk you run by reducing your buffer size doesn't really show in average PSNR. A large buffer gives you the option of allocating many bits to one picture and then fewer to others as needed. As you limit the buffer size, you limit your maximum picture size and pictures that might have needed many bits will suddenly appear at a lower quality than your easy pictures. This will show in your instantaneous PSNR and may even be visible to the untrained eye, but it will not necessarily be caught by average PSNR. OK, if you're always above 44dB, you're probably doing fine, but some notion of standard deviation or a max-min marker on the graph would have been nice.

Thanks.
Said this on 5-26-2016 At 06:48 am
Dan:

Great stuff. I'm not sure I fully understand the issue, but I think I'll just delete that explanatory paragraph from the book and be done with it.

Your last point is great - I've kind of looked at the low points in a file with the frame graph in the VQMT tool - see figure 2 here:

http://www.streamingmediaglobal.com/Articles/ReadA...

But I may be able to formalize it for table presentation. I've been struggling with how to present findings for B-frames and reference frames, where the average value on some encodes is fine, but the quality drops through the floor for transient periods (mostly in animated, screencam, and similar synthetic videos). Your idea provides the inspiration to come up with a way to present this, not only for VBR but for many other areas (like CBR vs VBR, etc).

Thanks for the read and sharing your thoughts. Very useful.

Jan
Amnon Cohen-Tidhar
Said this on 5-27-2016 At 10:06 am

Nice,

Some comments:

1. Never once in this VBV article do you actually write what VBV stands for (Video Buffering Verifier).

2. What's its purpose? Preventing the decoder input buffer from overrunning while allowing the encoder to deviate from strict CBR.

3. How does it work? It runs a decoder buffer model inside the encoder, which tells the encoder whether it's allowed to produce a higher instantaneous bitrate or not.

4. What's it good for? Video scenes are not monotonous, so for uniform quality some parts require more bits than others. Under constant bitrate (CBR), the bitrate remains constant throughout the clip. VBV settings tell the encoder how much it is allowed to deviate from CBR to move bits from "boring" parts to "action" parts.

5. "if you set the VBV buffer to 5000 kbps" - VBV is a buffer; you set its size, not speed, so it should be in kb, not kbps.

 

hope that helps

Amnon
