Summary (The MPD)
When distributing video under constrained conditions, the bitrate control technique used to encode the files can have a profound impact on the quality of experience (QoE). Specifically, under some conditions, CBR-encoded files deliver a superior QoE to files encoded using 200% constrained VBR, while also reducing the overall bandwidth delivered.
For producers concerned with the potential for transient quality issues with CBR-encoded files, 110% constrained VBR avoids these transient quality issues and the QoE-related problems experienced with 200% constrained VBR. Producers currently distributing files encoded using 200% constrained VBR should consider running trials with CBR-encoded files to determine if this improves QoE.
In addition, the initial file played during HLS playback can have a profound impact on initial QoE. Optimal playback is achieved when the highest rate the playback device can sustain is played first. If it’s not possible to determine this, choose the highest quality “safe” rate, but avoid a situation where the majority of your audience will have to quickly switch down after attempting to play the highest quality stream.
This introductory video describes the test file and the test procedure for the tests run at 3200 kbps. Click over to YouTube to view the video in full screen.
Segment 1: The Test File
As an overview, the test involved a three-minute synthetic test file made up of 30 seconds of talking head followed by 30 seconds of high-motion ballet, with the pattern repeating for the length of the file. Both clips were embedded into a test chart video to make it simpler to discern when the player switched streams (see top of Figure 1).
Figure 1. The two extremes, talking head on the left, ballet on the right.
The file was produced in Adobe Premiere Pro, output as a 30 Mbps 1080p mezzanine file, and then output to HLS format in the Capella Systems Cambria FTC encoder (keyframe interval two seconds, segment size six seconds). The streams were produced using the same bitrate ladder (Figure 2), but in two bitrate control modes: constant bitrate encoding (CBR) and 200% constrained variable bitrate encoding (VBR). Videos were produced on an HP Z840 workstation.
Figure 2. The bitrate ladder from Capella Systems Cambria FTC encoder.
Note that we customized which file was played first for each of our two tests. When testing with a throttle of 3200 kbps, we positioned the 2100 kbps file first in the .m3u8 file, so it was the first file called. With the 4500 kbps throttle, the 3100 kbps file was first. As we’ll discuss at the very end, we originally used the highest quality file as the starting file, which degraded playback results for both CBR and VBR files.
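To make the reordering concrete, here is a minimal sketch of how a master playlist could be generated with a chosen stream listed first. It is not the Cambria encoder's output. The ladder contains only the five rates named in this article, the `Layer_N/indexN.m3u8` paths follow the pattern of the index URLs quoted later, and the function name is our own. A production playlist would normally also carry attributes such as RESOLUTION and CODECS, which are omitted here.

```python
def build_master_playlist(ladder_kbps, first_kbps):
    """Return an HLS master playlist with `first_kbps` listed first.

    The remaining variants keep their original (top-down) order.
    """
    if first_kbps not in ladder_kbps:
        raise ValueError(f"{first_kbps} kbps is not in the ladder")
    ordered = [first_kbps] + [r for r in ladder_kbps if r != first_kbps]
    lines = ["#EXTM3U"]
    for rate in ordered:
        layer = ladder_kbps.index(rate)  # layer 0 is the highest rate
        # BANDWIDTH is in bits per second; the nominal rate is used here
        # as a simplification.
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={rate * 1000}")
        # Assumed URI pattern; substitute the encoder's real paths.
        lines.append(f"Layer_{layer}/index{layer}.m3u8")
    return "\n".join(lines) + "\n"


# Partial ladder: only the rates named in the text, top-down.
LADDER_KBPS = [4500, 3100, 2100, 1500, 1000]
print(build_master_playlist(LADDER_KBPS, first_kbps=2100))
```

Calling the function with `first_kbps=3100` would produce the ordering used for the 4500 kbps throttle tests.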
To visualize the frames in the encoded files in Bitrate Viewer, we also output MP4 files using the same parameters. The frame profile for the highest quality (1080p@4500 kbps) CBR file is shown in Figure 3. Note the average data rate of 4538 kbps on the upper right. The individual spikes show the size of each group of pictures (GOP) with the average data rate shown as a faint blue line just under the 5000 kbps line.
Figure 3. The frame profile for the CBR file.
The frame profile for the VBR file is shown in Figure 4. As you can see, though the overall data rate is very similar (4515 kbps compared to 4538 kbps for the CBR file), the data rate varied significantly between the low and high motion regions.
Figure 4. The frame profile for the VBR file. Note the dramatic difference in data rate between the talking head and dance sequences.
Again, we produced the .ts files in 6-second chunks comprised of three GOPs with 2-second keyframe intervals. With the VBR file, the average size of the talking head segments was roughly half the size of the ballet segments. Each 3:00 (min:sec) file was comprised of 30 six-second segments. Since the Cambria system labeled the first segment 0, the final segment was segment 29.
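The segment arithmetic above can be checked with a few lines of Python. The constants come from the text. The helper `nominal_segment_mb` is our own illustration and assumes a perfectly steady bitrate and 1 MB = 10^6 bytes, so real segments will differ somewhat.

```python
CLIP_SECONDS = 180      # 3:00 test file
SEGMENT_SECONDS = 6     # .ts segment duration
KEYFRAME_SECONDS = 2    # keyframe (GOP) interval

segments = CLIP_SECONDS // SEGMENT_SECONDS
gops_per_segment = SEGMENT_SECONDS // KEYFRAME_SECONDS
last_index = segments - 1  # numbering starts at 0


def nominal_segment_mb(kbps, seconds=SEGMENT_SECONDS):
    """Segment size in MB (10**6 bytes) at a perfectly steady bitrate."""
    return kbps * seconds / 8 / 1000


print(segments, gops_per_segment, last_index)  # 30 3 29
print(nominal_segment_mb(4500))                # 3.375
```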
After encoding the files, we uploaded them to a website for deployment.
Segment 2: Embedding the Files
We tested the playback of the two files using two different players: the HLS player in the Safari Mac browser, and the JWPlayer. To test the Safari browser, we loaded the .m3u8 file directly into Safari. Links to the two playlists are below:
VBR file: http://www.doceopub/VBR/Playlist.m3u8
CBR file: http://www.doceopub/VBR/Playlist.m3u8
To test the JWPlayer, we ingested the .m3u8 files into the JWPlayer online video platform service and created a simple player for playback. Then we loaded those pages into Safari. Links to the two files are below:
VBR file: http://www.doceopub.com/JW_VBR.html
CBR file: http://www.doceopub.com/JW_CBR.html
Segment 3: Playback Tests
We used version 4.0 of the Charles Debugging Proxy for our tests, which served two purposes. First, Charles allows you to throttle the download and upload bandwidth available to the system to simulate constrained playback conditions. We ran two sets of tests, the first with the throttle set at 3200 kbps (Figure 5), the second set at 4500 kbps. The premise was that since the segments in the ballet sequences of the VBR clip were so much larger than those in the talking head sequences, the VBR file would experience more stream switches and potentially other problems.
Figure 5. Setting the throttle in Charles Debugging Proxy.
Another useful feature in the Charles Debugging Proxy is the ability to see which chunks the player was retrieving, as seen on the left in Figure 6. We would play the video in Safari on the right, and could see the segments being retrieved in Charles on the left. We recorded all tests in Camtasia and used these files to create the video files that accompany this post.
Figure 6. Charles also made it easy to see which fragments the player was retrieving and when.
Here are the steps we used for all tests.
1.We played back all files in Safari, either using the HLS player in Safari by loading the .m3u8 file, or using the JWPlayer by loading the webpage with the embedded players.
2.We cleared Safari’s buffer between each test.
3.The throttle was engaged throughout.
Segment 4: Quality Overview
There are multiple papers on subjective QoS, several of which I reference here. While there are few absolutes, here are my assumptions:
•Breaks of any kind (audio, video, or both) detract from the experience.
•Loss of synch and similar issues also degrade QoE.
•Playing back higher quality streams improves QoE, while lower quality streams degrade QoE, though this is tempered by the next point.
•QoE improves when stream switching is minimized, particularly when the stream stabilizes at a good (if not great) level of quality. Most viewers probably prefer this to obvious quality switching, even if switching occasionally results in playing back a higher quality stream.
With this as background, let’s review our tests. Note that I am solely comparing VBR vs. CBR playback within each player, rather than comparing the playback experience delivered by the two players, which would require much more testing under a variety of conditions.
Segment 5: Throttling at 3200 kbps
We’ll start with an extensive look at the results at 3200 kbps, and finish with a summary look at the 4500 kbps tests. Note that you can watch the tests via YouTube videos included at the bottom of each test description.
Segment 6: Test Results – Safari
While playing the VBR clip in Safari, the clip stopped for almost three seconds during the first switch to the ballet sequence, with an audio break about a second later. Just before the switch back to talking head at the one-minute mark, there was an audio stoppage, and the audio was clearly out of synch for about 15 seconds. There was also an audio stop at 2:39.
Figure 7 (from Charles) shows most of the packets received during the playback experience, with VBR on the left, and CBR on the right. As you can see, during VBR playback, Safari dropped down to layers 3 (1500 kbps) and 4 (1000 kbps) and spent significant time in both. In comparison, CBR playback was exclusively in layers 1 and 2.
Figure 7. Fragments retrieved by the Safari HLS player (VBR on left, CBR on the right). Note that there are eight index files on the left and only three on the right.
If we assume that Safari switched streams each time it downloaded an index file, the VBR stream switched seven times (eight index files less the initial index file for the start of playback), while the CBR stream switched only twice (three index files less the initial index file). With several of the switches in the VBR stream involving layers 3 and 4, you would assume that most viewers would notice them. In comparison, the switches in CBR playback involved only the two highest quality streams, making detection much less likely.
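The counting rule used here (index files retrieved, less the initial one) can be sketched as a small function over a list of requested URLs. This is an illustration only: the log format and the segment file names are hypothetical, and we assume variant index files are named `index*.m3u8` so that the master playlist is not counted.

```python
def count_stream_switches(requested_urls):
    """Estimate stream switches as variant index requests minus one.

    The first index file starts playback, so it is not a switch.
    """
    def is_variant_index(url):
        name = url.rsplit("/", 1)[-1]
        return name.startswith("index") and name.endswith(".m3u8")

    index_requests = [u for u in requested_urls if is_variant_index(u)]
    return max(len(index_requests) - 1, 0)


# Hypothetical request log; file names are made up for illustration.
log = [
    "VBR/Playlist.m3u8",
    "VBR/Layer_2/index2.m3u8",
    "VBR/Layer_2/segment0.ts",
    "VBR/Layer_3/index3.m3u8",
    "VBR/Layer_3/segment1.ts",
    "VBR/Layer_2/index2.m3u8",
    "VBR/Layer_2/segment2.ts",
]
print(count_stream_switches(log))  # 2
```

Applied to the Safari results, eight index files give seven switches for VBR and three index files give two switches for CBR.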
Table 1 summarizes the playback data. As you can see, VBR was much less efficient than CBR, retrieving 49 segments rather than 34 for CBR (there were 30 in the 3:00 file). Even though VBR resulted in 63% of packets retrieved from layers 3 and 4 (compared to 0% for CBR), the VBR experience retrieved 17 MB of additional data, a difference of about 24%. In other words, the VBR playback experience cost about 24% more for bandwidth than the CBR file.
Table 1. Tale of the numbers in Safari for VBR/CBR playback.
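As a rough way to express the segment counts above as a single figure, the sketch below computes how many extra segments each mode downloaded relative to the 30 segments in the file. The function name and the "overhead" metric are our own framing rather than a column from Table 1.

```python
SEGMENTS_IN_FILE = 30


def retrieval_overhead(retrieved, needed=SEGMENTS_IN_FILE):
    """Fraction of extra segments downloaded beyond those in the file."""
    return (retrieved - needed) / needed


for mode, retrieved in [("VBR", 49), ("CBR", 34)]:
    print(f"{mode}: {retrieved} segments, "
          f"{retrieval_overhead(retrieved):.0%} overhead")
# VBR: 49 segments, 63% overhead
# CBR: 34 segments, 13% overhead
```

Note that the 63% segment overhead for VBR happens to match the share of packets retrieved from layers 3 and 4, but the two figures measure different things.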
You can watch the Safari tests performed at 3200 kbps in this video. Click over to YouTube to view the video in full screen.
Segment 7: Test Results – JWPlayer
With the VBR file, JWPlayer experienced one four-second audio/video stoppage during the first switch from talking head to ballet (same place as Safari), and one other slight audio stoppage, but no loss of synch. Fragments downloaded during playback are shown in Figure 8. JWPlayer spent significant time in the relatively low quality layer 4, but only switched streams four times (five index files less one). CBR playback was very much like playback in the Safari HLS player, with the vast majority of the time spent in layer 2, and two apparent stream switches (three index files minus one).
Figure 8. Fragments retrieved by JWPlayer (VBR on left, CBR on the right). Note that there are five index files on the left and only three on the right.
Table 2 shows the numbers. Again, VBR was much less efficient from a fragment retrieval perspective (ten more than were necessary), but since the majority of the time was spent in layer 4, less data was actually retrieved during VBR playback than during CBR playback. Looking at the table, it’s easy to see that the overall QoE delivered by the CBR streams was much, much higher.
Table 2. How the numbers looked for JWPlayer for VBR/CBR playback.
You can watch the JWPlayer tests performed at 3200 kbps in this video. Click over to YouTube to view the video in full screen.
Segment 8: Throttling at 4500 kbps
In these tests, the first file listed in the .m3u8 file was the 3100 kbps file, so this was the first file retrieved by both players. The throttle in Charles was set to 4500 kbps.
In VBR playback tests, both the Safari player and JWPlayer experienced complete stoppages of about six seconds just after the first ballet sequence appeared 30 seconds in, with a shorter audio break a few moments later. Both players lost audio synch for the rest of the file playback. Since the problems were nearly identical in both players, we played the HLS streams from the index files (http://doceopub.com/VBR/Layer_2/index2.m3u8, http://doceopub.com/VBR/Layer_3/index3.m3u8) in Safari to determine if the problem existed in the packaged files, but the clips played normally. This indicates that the problems were caused by the players’ inability to retrieve the VBR segments in time to continue smooth playback, rather than by the files themselves.
As you can see in Table 3, during VBR playback, both Safari and the JWPlayer retrieved segments in three layers, dropping down as low as the 1500 kbps stream. The retrieved index files in both VBR tests showed four index files, indicating that both players switched streams three times.
Table 3. Playback with the throttle set at 4500 kbps.
In contrast, CBR playback with both players was as efficient as possible; both players started with the 3100 kbps file, and remained at that level the entire time. Taking into account the stoppages, loss of synch and switches experienced in the VBR tests, it’s safe to say that the quality of experience produced by the CBR files was superior.
You can watch the Safari tests performed at 4500 kbps here. Click over to YouTube to view the video in full screen.
You can watch the JWPlayer tests performed at 4500 kbps here. Click over to YouTube to view the video in full screen.
Segment 9: Other Tests
For the record, I tested a number of times before arriving at the final test configurations, adjusting the encoding ladder, the initial stream called by the HLS player, and the throttling level, among other factors. In general, the QoE results were consistent with those discussed here, with CBR showing fewer stoppages and less switching.
A summary of these tests is shown in Table 4, which does not include the results presented above. As you can see, on average, with the Safari HLS player, CBR was 17% more efficient than VBR for both the number of fragments retrieved and total bandwidth retrieved. This number jumps to a 25.65% bandwidth saving for JWPlayer when deploying CBR over VBR. We expected that CBR would improve QoE under constricted playback conditions, with the bandwidth saving an unexpected benefit.
Table 4. Summary of other test results.
Segment 10: But What About CBR Quality?
As we have written about previously, most notably in a Technical Brief entitled Switch from CBR to VBR to Improve Overall Quality and Avoid Transient Quality Issues, CBR-encoded video often exhibits transient quality issues as compared to VBR. With this particular test file, the 1080p VBR version had a slightly higher PSNR value than the CBR file (41.1 dB compared to 40.7 dB), a quality difference that would not be visible to the normal viewer.
As shown in Figure 9, the Moscow University Video Quality Measurement Tool revealed no transient quality issues with the CBR file (in red), which exhibited slightly higher quality in the talking head sequences (where the data rate was higher than VBR) and slightly lower in the ballet sequences (where the data rate was lower). Transient quality issues would appear as dramatic drops of the red results; for example the drop to 36 dB on the lower right in the last clump of results (we checked the frame and the difference wasn’t visible).
Figure 9. PSNR comparisons show no transient quality issues in the 1080p version of the test file.
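If you export per-frame PSNR values from a measurement tool, a simple screen for transient drops might look like the sketch below. It is not how the Moscow University tool works. The 4 dB threshold, the use of the clip median as the baseline, and the sample values are all assumptions made for illustration.

```python
from statistics import median


def find_transient_drops(psnr_db, drop_db=4.0):
    """Return (frame_index, psnr) pairs that fall at least `drop_db`
    below the clip's median PSNR."""
    floor = median(psnr_db) - drop_db
    return [(i, v) for i, v in enumerate(psnr_db) if v <= floor]


# Made-up per-frame values for illustration only.
sample = [40.9, 40.6, 40.8, 36.0, 40.7, 40.5]
print(find_transient_drops(sample))  # [(3, 36.0)]
```

A flagged frame is only a candidate. As with the 36 dB drop discussed above, you still need to look at the frame to see whether the difference is visible.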
In the aforementioned technical brief, we state that “Producing using 110% constrained VBR seems to avoid these quality issues without introducing significant data rate variability.” To test how 110% constrained VBR would perform under the conditions tested here, we produced a 110% constrained version that’s available at http://www.doceopub.com/110_CVBR/Playlist.m3u8 (we did not create a JWPlayer version). As shown in Figure 10, while the differences between the talking head and ballet segments are not as stark as those shown in Figure 4 for the 200% constrained VBR clip, they are significant. The average size of the talking head segments was 2.7 MB, compared to about 4.2 MB for the ballet segments, a difference of over 50%.
Figure 10. The frame profile for the 110% constrained VBR file.
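To see why segment size matters under a throttle, the sketch below estimates how long each average segment would take to download at 3200 kbps and compares that to the six seconds of video it contains. This is a simplification: it takes 1 MB as 10^6 bytes, assumes segments are fetched one at a time with no protocol overhead, and treats the two sizes as belonging to the same layer, which the measurements above do not state explicitly.

```python
SEGMENT_SECONDS = 6


def download_seconds(segment_mb, throttle_kbps):
    """Time to fetch one segment, taking 1 MB as 10**6 bytes."""
    return segment_mb * 8 * 1000 / throttle_kbps


for label, size_mb in [("talking head", 2.7), ("ballet", 4.2)]:
    t = download_seconds(size_mb, 3200)
    verdict = "keeps up" if t <= SEGMENT_SECONDS else "falls behind"
    print(f"{label}: {t:.2f} s per 6 s segment ({verdict})")
# talking head: 6.75 s per 6 s segment (falls behind)
# ballet: 10.50 s per 6 s segment (falls behind)
```

Under the same assumptions at a 4500 kbps throttle, the talking head segment downloads in under six seconds while the ballet segment does not, which is the kind of imbalance that can force a player to switch down during high-motion sequences.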
The VBR playback experience constrained to 3200 kbps is shown on the left in Figure 11, with most of the time spent in Layer 2, but seven segments spent in layer 3, which the CBR file did not experience. With five index files, it appears that Safari changed streams four times, as compared to two stream changes for the CBR file.
Figure 11. 110% constrained VBR on the left, CBR on the right.
On the other hand, there were no stoppages or loss of synch during the playback of the 110% constrained VBR-encoded file, so the major elements that detracted from QoE were not present. If you’re concerned about transient quality issues in CBR files (as you should be), 110% constrained VBR may be the best alternative.
Segment 11: Conclusions
What conclusions can we draw from this data? Obviously, the test file presents close to a worst-case scenario, and the constraint imposed by Charles is likely much more absolute than will be experienced in the real world. So it’s hard to shout that the sky is falling, and that those distributing at 200% constrained VBR (or similar) are significantly impacting their QoE. Still, we learned that:
1.Under some constrained playback conditions, files encoded using 200% constrained VBR are more likely to experience stoppages, audio breaks, and loss of synch, and to switch layers more often than CBR files. Under these playback conditions, files encoded using 200% constrained VBR deliver a degraded QoE as compared to CBR.
2.Under the same conditions, the stream switching necessitated by the VBR bitstream may result in more segments being downloaded, and increased bandwidth consumption.
How to apply this? In essence, the issue comes down to this: does encoding using 200% constrained VBR produce enough extra quality over CBR or 110% constrained VBR to outweigh the risks of QoE issues like those discussed above? As shown in Table 5, the answer is probably not.
This table shows six test files encoded to 1080p resolution using the encoding techniques listed atop each column. Data rates vary by file as we’ve used per-title encoding to compute the appropriate data rate for each file.
Table 5. PSNR quality for files encoded using different bitrate control techniques.
The table shows PSNR values, where higher values are better, with the highest value in green and the lowest in red. The Total Quality Delta shows the difference between the highest and the lowest value, while the Delta 110% to 200% column shows the quality difference between files encoded using 200% constrained VBR, and 110% constrained VBR. In only one instance, Big Buck Bunny, does this value exceed 4%, and even 4% likely would not be noticeable by even the most golden-eyed viewer. In other words, it’s highly unlikely that using 200% constrained VBR over 110% constrained VBR would noticeably improve QoE.
As you know, QoE has two components, file quality and delivery quality. As we’ve seen, under some conditions, 200% constrained VBR can seriously degrade delivery quality and overall QoE. Accordingly, any files that may be delivered over constrained connections, like mobile, should be encoded using 110% constrained VBR, or CBR. If you’re creating one set of files for all of your targets, you should strongly consider using 110% constrained VBR.
3.If you’re already producing files using 200% constrained VBR, you should consider testing files encoded via CBR or 110% constrained VBR in either a split or sequential test. All analytics packages are different, but if you can isolate the layers played during VBR and CBR playback, or stoppages or other issues experienced, you should be able to tell if encoding in CBR or 110% constrained VBR does result in less stream switching than 200% constrained VBR files, fewer stoppages, and/or less overall bandwidth for each video play.
Segment 12: Ancillary Conclusion: Customize Your M3U8 files (or MPD Files) To Match Sustainable Bandwidth
As mentioned above, the stream listed first in the M3U8 file is the stream played first by the HLS player. Our tests revealed that choosing the right stream could have a dramatic impact on the efficiency and initial QoE of video playback. This is shown in Figure 12. Specifically, Figure 12 shows the segments pulled during HLS playback of the 110% constrained VBR file in the Safari browser with bandwidth constrained at 3200 kbps. On the left, the initial file in the M3U8 file was layer 0, or the 4500 kbps file. On the right, the initial stream is set to layer 2. In both streams, we recorded until the eighth segment started downloading.
Figure 12. Playback of the 110% constrained VBR stream with layer 0 called first on the left, and layer 2 called first on the right.
On the left, with layer 0 called first, Safari knew immediately there was a problem, and started retrieving layers as low as layer 6 to address it. The viewer sees a quick panoply of multiple quality levels, finally settling in at layer 2 about 36 seconds in. During this scramble, Safari retrieved 16 segments to play the first eight, which obviously wastes bandwidth.
On the right, with layer 2 the initial stream, Safari immediately settles into smooth playback, and downloads eight segments to play eight segments.
Most encoders, like the Capella Systems Cambria encoder used in our tests, create the .M3U8 file based upon the order of layers in the encoding preset. If you build your encoding ladder from the top down, like we did, the highest quality file is listed first. If you build from the bottom up, the lowest quality stream is listed first. Seldom, if ever, is either of these choices the correct one.
For desktop or OTT playback, the initial stream should be a reasonably high quality stream, but not the highest; perhaps the lowest quality 720p stream (see Figure 2). That way, quality starts at a good level, and playback quality should improve for most viewers. For mobile, the initial stream should be lower quality, perhaps the lowest quality 360p stream. What you want to avoid is a high bitrate stream that will push the player into panic mode, resulting in the low QoE and inefficient playback shown on the left in Figure 12.
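One way to turn this advice into a rule of thumb is sketched below: pick the highest rate in the ladder that fits within some fraction of the bandwidth you expect most viewers to sustain. The 0.8 headroom factor and the function name are assumptions of ours, not values from the tests, and the ladder lists only the rates named in this article.

```python
def pick_initial_kbps(ladder_kbps, sustainable_kbps, headroom=0.8):
    """Highest ladder rate at or below `headroom` x sustainable bandwidth.

    Falls back to the lowest rate when nothing fits.
    """
    budget = sustainable_kbps * headroom
    fitting = [r for r in ladder_kbps if r <= budget]
    return max(fitting) if fitting else min(ladder_kbps)


LADDER_KBPS = [4500, 3100, 2100, 1500, 1000]  # rates named in the text
print(pick_initial_kbps(LADDER_KBPS, 3200))  # 2100
print(pick_initial_kbps(LADDER_KBPS, 4500))  # 3100
```

With this headroom, the rule reproduces the starting streams we settled on for the 3200 kbps and 4500 kbps tests. A different audience or ladder may call for a different margin.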
About the Streaming Learning Center
The Streaming Learning Center is a premier resource for companies seeking advice or training on cloud or on-premise encoder selection, preset creation, video quality optimization, adaptive bitrate distribution, and associated topics. Please contact Jan Ozer at firstname.lastname@example.org for more information about these services.