This article is kind of a prequel to my book, Encoding by the Numbers, which I published in 2016. That is, I published this article to get commentary from folks who read it, which I factored back into my book. Unfortunately, I changed content management systems in 2018 and lost the comments.
One of the topics I’m addressing in my upcoming book is the VBR rate control model, a very complicated concept. Here’s the section in the book, which I’m making available for comments, corrections, and clarifications. I’m pretty sure the technical details about how the model works are correct, but I’m guessing the description and definition could use some improvement. If you take the time to comment, please be as complete as possible so I can make the corrections. Also please include a reference, so I can verify your comments.
The overall chapter is on bitrate control, and there are some references to previous content, but this section is pretty self-contained. I hope you find it helpful. If you want to know when the book, tentatively entitled Encoding by the Numbers, is available, you can sign up for the Streaming Learning Center newsletter on the upper right.
Here’s the excerpt.
Working with the VBV Buffer
With some applications, like Telestream Episode Pro, you control the bitrate of your file with a combination of average bitrate, VBV size, and VBV Max Bit rate, as shown in Figure 1. This model is also used by cloud services like encoding.com, and in the FFmpeg command line.
So, what the heck is a VBV and how do you use it? That’s what you’ll learn in this section.
Figure 1. Controlling bitrate with the average, max, and VBV size.
By way of background, the VBV (Video Buffering Verifier) buffer is a model used by the encoder to ensure that the bitstream produced doesn’t exceed the limitations of the decoder. For example, if you set the VBV buffer to 5,000 kbit, the encoder produces a stream that won’t underrun or overrun a 5,000 kbit decoder buffer. The model was created and deployed back in the days when most decoding was performed in very limited hardware, and it was necessary to ensure smooth playback on these devices.
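To make the buffer idea concrete, here is a minimal sketch of a leaky-bucket check in Python. It is a simplification, not the full model defined in the video standards: it assumes bits arrive at a constant maximum rate, that each frame is removed all at once, and that the buffer starts 90% full. The function name and parameters are hypothetical.

```python
def vbv_underflows(frame_bits, fps, maxrate_bps, bufsize_bits, initial_fill=0.9):
    """Return the indices of frames that would underflow a simplified VBV buffer.

    frame_bits   -- encoded size of each frame, in bits
    fps          -- frames per second
    maxrate_bps  -- rate at which bits arrive in the buffer, in bits per second
    bufsize_bits -- capacity of the buffer, in bits
    initial_fill -- assumed starting occupancy as a fraction of capacity
    """
    fullness = bufsize_bits * initial_fill
    fill_per_frame = maxrate_bps / fps  # bits arriving during one frame interval
    underflows = []
    for index, size in enumerate(frame_bits):
        if size > fullness:
            # The decoder needs the whole frame, but it has not arrived yet.
            underflows.append(index)
            fullness = 0.0
        else:
            fullness -= size
        # Refill for the next frame interval; the buffer cannot exceed capacity.
        fullness = min(bufsize_bits, fullness + fill_per_frame)
    return underflows


# A single large frame that fits a 4 Mbit buffer but not a 2 Mbit buffer.
big_frame = [1_900_000]
small_buffer_result = vbv_underflows(big_frame, 25, 4_000_000, 2_000_000)
large_buffer_result = vbv_underflows(big_frame, 25, 4_000_000, 4_000_000)
```

In this sketch the same frame underflows the smaller buffer and passes the larger one, which is the wiggle room the encoder gains or loses as you change the buffer size.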
Today, when most players are software-based or contained in vastly more capable hardware, the buffer is used more to control the quality and variability of the encoded file than to ensure player compatibility. Still, as you’ll see towards the end of this section, if you’re encoding for low-end devices like older iPhones, you have to pay attention, because if the buffer is too large, the file may not play on the device.
Some authorities state that the size of the buffer is the enforcement period for your VBR encoding. For example, if your target data rate is 2,000 kbps, your maximum data rate is 4,000 kbps, and your buffer is 2,000 kbit, every second of video should average 2,000 kbps. If your buffer is 10,000 kbit, each five-second chunk must average 2,000 kbps. But I’ve not been able to verify this by objectively analyzing encoded files, because the stream variability always exceeds this limitation.
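The arithmetic behind that claim is simply buffer size divided by target data rate. A short Python sketch, with hypothetical function names, shows the conversion in both directions:

```python
def buffer_duration_seconds(bufsize_kbit, target_kbps):
    """Seconds of video, at the target data rate, that the buffer can hold."""
    return bufsize_kbit / target_kbps


def buffer_size_kbit(seconds, target_kbps):
    """Buffer size needed to hold the given duration at the target data rate."""
    return seconds * target_kbps


one_second = buffer_duration_seconds(2000, 2000)     # the 2,000 kbit example
five_seconds = buffer_duration_seconds(10000, 2000)  # the 10,000 kbit example
```

With the numbers from the paragraph above, the first call works out to one second and the second to five seconds.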
What I have been able to verify is that buffer size affects both stream variability and video quality. Intuitively, smaller buffers should reduce quality, because the encoder has less wiggle room to adjust the data rate upwards or downwards to match scene complexity. On the other hand, a smaller buffer should produce a stream with less variability. Conversely, you would expect a file with a large buffer to enjoy higher quality, but also more stream variability. Let’s test these assumptions.
To create the results shown in Table 1, I encoded a short section from the movie Zoolander to 1080p at 4 Mbps using 2-pass CBR encoding. The first file was encoded with a 2 Mbit buffer, but the resultant data rate was only 3,587 kbps, outside of my 5% target. I leave this in as a reminder that you should check the data rate of your CBR encoded files, particularly those with a small buffer, since they are typically well below the target, at least when encoding with x264. To meet the 4 Mbps target, I re-encoded at 4.4 Mbps to create the next file, which has an average data rate of 4,112 kbps. I encoded the next four files at that same data rate with 4, 8, 12, and 16 Mbit buffers. I determined the peak bitrate of each file in Bitrate Viewer and used this value to compute the maximum data rate variance. I computed PSNR with the Moscow University tool.
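The data rate check described above is easy to script. The sketch below uses hypothetical function names and assumes that variance means the percentage difference between a measured rate (average or peak) and the 4,000 kbps target; the exact formula used for Table 1 isn’t spelled out here, so treat that as an assumption.

```python
def percent_off_target(measured_kbps, target_kbps):
    """Signed percentage difference between a measured rate and the target."""
    return (measured_kbps - target_kbps) / target_kbps * 100


def within_tolerance(measured_kbps, target_kbps, tolerance_pct=5.0):
    """True if the measured rate is within the tolerance of the target."""
    return abs(percent_off_target(measured_kbps, target_kbps)) <= tolerance_pct


TARGET_KBPS = 4000

first_pass_ok = within_tolerance(3587, TARGET_KBPS)  # first file, 2 Mbit buffer
second_pass_ok = within_tolerance(4112, TARGET_KBPS) # re-encode at 4.4 Mbps

first_pass_miss = round(percent_off_target(3587, TARGET_KBPS), 1)
second_pass_miss = round(percent_off_target(4112, TARGET_KBPS), 1)
```

The first file lands about 10.3% under the target, which fails the 5% check, while the re-encoded file lands about 2.8% over, which passes. The same `percent_off_target` function can be fed a peak bitrate instead of an average to get a peak variance figure.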
Table 1. The impact of buffer size on data rate variance and video quality.
As you can see, the file encoded with a 16 Mbit buffer had the highest variance and the highest quality, which are shown in chart form in the next two figures. Figure 2 shows that increasing buffer size has a dramatic impact on data rate variability. Basically, if you plan on encoding with CBR, you should limit the buffer to one second.
Figure 2. Increasing the buffer size increases data rate variance.
As shown in Figure 3, encoding with a small buffer size reduces overall quality somewhat, but not significantly. That is, the difference between 36.44, the PSNR value achieved with the 4 Mbps encoded file with a 2 Mbit buffer, and the 37.06 PSNR value for the file encoded with the 16 Mbit buffer, is only 1.68%.
Figure 3. Increasing the buffer size improves quality, particularly low-frame quality.
What is significant is the low-frame quality, that is, the quality of the lowest quality frame in the video. Here, the differential between the 2 Mbit and 16 Mbit buffers is close to 6 PSNR points, a significant, potentially noticeable difference of 20.67%. So, while the overall quality might not suffer significantly, small buffers mean potentially ugly frames in the video, which could impair the quality of the viewing experience.
Note that increasing the buffer size has a similar impact on stream variability when encoding with constrained VBR. For example, I encoded the 1080p version of Tears of Steel to 2 Mbps using 150% constrained VBR with VBV buffers of one and two seconds. With a one-second buffer, the data rate variance from the overall target was 50.1%, almost exactly the 50% that a 150% constraint allows. At two seconds, the variance was 82%, about 32 percentage points higher. So whether you’re encoding with CBR or constrained VBR, you should expect buffer values in excess of one second to impact stream variability.
Maximum Buffer in Practice
What maximum buffer values are used in practice? Table 2 provides a partial picture of two distinct targets, streaming and podcasts. For streaming, I grabbed videos from the sites listed; for podcasts, I downloaded the files from iTunes. To learn the target, max data rate, and max buffer settings, I inspected the files in MediaInfo. Some files, though not all, save the command line information in the Encoding Settings box shown in Figure 4 below. When they do, you’ve got the precise recipe used by the producer to encode the file.
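When a file does carry x264 encoding settings, they appear as a string of key=value pairs separated by slashes, so pulling out the buffer duration can be scripted. The Python sketch below is illustrative only: the function names are hypothetical, the sample string is made up rather than taken from any file in Table 2, and the duration is computed against the target bitrate to match the convention used earlier in this section.

```python
def parse_x264_settings(settings):
    """Split a 'key=value / key=value' settings string into a dict of strings."""
    pairs = {}
    for item in settings.split(" / "):
        key, separator, value = item.strip().partition("=")
        if separator:  # skip anything that is not a key=value pair
            pairs[key] = value
    return pairs


def buffer_duration_from_settings(settings):
    """Return vbv_bufsize / bitrate in seconds, or None if either is missing."""
    options = parse_x264_settings(settings)
    if "bitrate" not in options or "vbv_bufsize" not in options:
        return None
    return int(options["vbv_bufsize"]) / int(options["bitrate"])


# Made-up sample for illustration, not copied from a real file.
sample = "cabac=1 / ref=3 / rc=2pass / bitrate=4400 / vbv_maxrate=4400 / vbv_bufsize=4400"
sample_duration = buffer_duration_from_settings(sample)
```

Files encoded without a bitrate target or without VBV settings won’t have both keys, which is why the function returns None rather than guessing.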
Table 2. Buffer durations of video files from varying sites.
As you can see, most streaming files use a one or two-second buffer, most likely to limit stream variability for the files delivered over the internet. The numbers are much higher for the few podcasts that provided the information, though Apple, who obviously knows iTunes and their devices, used a buffer duration of over six seconds. Obviously, deliverability isn’t a factor for downloaded podcasts, and the longer buffer times should increase the quality to some degree.
Figure 4. This data isn’t included in every encoded file, but it’s incredibly useful when it is.
Note that the O’Reilly Factor video is 640×360 and encoded in the Baseline profile, which means it plays on very old devices, going back to the iPhone 4 and iPod touch 4. If you’re encoding podcasts for these devices, I would follow Fox’s lead and use a fairly small buffer size. On the other hand, the This Week in Tech video is encoded at 864×480 resolution using the Main profile, which excludes the aforementioned older devices, and uses a 12-second buffer. The Apple video is 1080p resolution and uses the High profile, so it’s targeted towards the newer class of iOS devices and uses a 6.25-second buffer.
I don’t know the VBV buffer-related specs for all iOS devices, so I’m not going to explore further. The most conservative approach is to copy the parameters shown in the table and customize the buffer size for the lowest-common-denominator playback device.
For streaming, the size of your VBV buffer should be dictated by your concerns over stream variability. If you want your streams to meet your CBR or even constrained VBR targets, use a buffer size of one second. If you don’t care about variability, a longer buffer will improve overall quality, however slightly, and may significantly improve the quality of the lowest quality frames in the video file.
______________________________________________
End of section: If you’d like to know when the book ships (late summer, 2016), you can sign up for the newsletter on the top left of the page.