This research-based article details the factors to consider when choosing the segment duration for DASH or HLS.
By way of background, when choosing an HLS or DASH segment duration for VOD, the nature of the server/player connection is critical. For persistent connections, a segment size of two to three seconds produces good quality and optimal system throughput. For non-persistent connections, a segment size of six seconds produces the best overall throughput. With live video, if latency is a concern, shorter segments can decrease latency but may increase the risk of buffering and reduced QoE. In all instances, you should thoroughly test how these configuration options work on your key target platforms before deployment.
Whenever you create an adaptive bitrate package like HLS or DASH, you have to choose a segment size. When choosing a segment size for on-demand videos (VOD), your primary considerations are encoding quality and throughput. These also matter for live video, but latency is often a more important consideration. Let’s start with a look at VOD.
1. Longer segment sizes (and keyframe intervals) improve quality, but not as much as you think.
When encoding for adaptive streaming, your keyframe interval must divide evenly into your segment size so that each segment starts with a keyframe. Accordingly, the longer the segment size, the longer the potential keyframe interval. For example, if your segment size is two seconds, the longest keyframe interval you can use is two seconds, whereas with nine-second segments you can use a keyframe interval of up to nine seconds (or any shorter interval that divides evenly into nine, such as 4.5, 3, or 1 second).
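Because encoders count keyframe intervals (GOP lengths) in frames rather than seconds, the divisibility rule is easiest to check in frames. Here's a minimal sketch that lists every GOP length fitting a whole number of times into one segment; the function names are hypothetical, and it assumes a constant frame rate.

```python
from fractions import Fraction


def valid_gop_sizes(segment_seconds, fps):
    """Return every GOP length (in frames) that fits a whole number of times
    into one segment, so that each segment starts on a keyframe."""
    segment_frames = Fraction(segment_seconds) * Fraction(fps)
    if segment_frames.denominator != 1:
        raise ValueError("segment duration is not a whole number of frames")
    n = segment_frames.numerator
    return [gop for gop in range(1, n + 1) if n % gop == 0]


def valid_keyframe_intervals(segment_seconds, fps):
    """The same GOP lengths expressed in seconds (as exact fractions)."""
    fps = Fraction(fps)
    return [gop / fps for gop in valid_gop_sizes(segment_seconds, fps)]
```

For nine-second segments at 30 fps, valid_gop_sizes(9, 30) includes 270, 135, 90, and 30 frames, which correspond to the 9, 4.5, 3, and 1 second intervals mentioned above. Fractional rates such as 29.97 fps can be passed as Fraction(30000, 1001); the function raises an error when a segment duration doesn't land on a whole frame at that rate.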
Longer keyframe intervals typically produce higher video quality because keyframes, which are encoded without reference to any other frame, are the least efficient frame type. However, the difference is actually quite modest for most files, as you can see in Figure 1, which shows the PSNR values for 1080p files encoded at the same target bitrate (which varied by file) using the keyframe intervals shown along the horizontal axis.
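As a reminder of what the PSNR numbers measure, here's a minimal sketch of the standard formula, PSNR = 10 · log10(MAX² / MSE), for 8-bit samples (MAX = 255). The article doesn't say which tool produced the scores in Figure 1; real tools usually work per frame on the luma plane (or on each plane separately) and differ in how they aggregate, so mean_psnr below is one assumed aggregation, not the method behind the figure.

```python
import math


def psnr(reference, distorted, max_value=255):
    """PSNR in dB between two equal-length sequences of 8-bit sample values."""
    if not reference or len(reference) != len(distorted):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10 * math.log10(max_value ** 2 / mse)


def mean_psnr(frame_pairs):
    """Average per-frame PSNR over a clip (one common file-level summary)."""
    scores = [psnr(ref, dist) for ref, dist in frame_pairs]
    return sum(scores) / len(scores)
```

Because PSNR is logarithmic in the error, a few percent of PSNR can correspond to a sizable change in mean squared error, which is why perceptual judgment matters when interpreting the deltas discussed below.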
The test video Big Buck Bunny showed the largest PSNR difference between keyframe intervals of one and ten seconds, but the total quality difference was only 5.57% (see table below). Most other videos showed much smaller differences, with the bulk of the improvement recouped when switching from one second to two seconds. Beyond two seconds, the improvement curve flattened out, and it's hard to believe that the quality difference between two and ten seconds would be noticeable to even the most discriminating viewer. The bottom line is that, in the eyes of the viewer, any keyframe interval of two seconds or longer will deliver equivalent quality.
As shown in the table above, my findings for synthetic files showed much larger differences, including a total delta of 16.71% for a Camtasia-based screencam file and 25.17% for a PowerPoint-based tutorial. If you're working with this type of content, run your own tests to determine whether my results were idiosyncratic or representative for these types of files.
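If you run your own keyframe tests, two numbers summarize the results in the same terms used above: the total percentage gain from the shortest to the longest interval, and how much of that gain is already captured at two seconds. This sketch assumes the percentages were computed as relative PSNR change from the one-second baseline (the article doesn't spell out the calculation), and the function name is hypothetical.

```python
def keyframe_quality_summary(psnr_by_interval):
    """Summarize PSNR gains across keyframe intervals.

    psnr_by_interval maps keyframe interval (seconds) -> PSNR (dB).
    Returns (total % gain from shortest to longest interval,
             fraction of that gain already achieved at 2 seconds or None).
    """
    shortest = min(psnr_by_interval)
    longest = max(psnr_by_interval)
    base = psnr_by_interval[shortest]
    total_gain = psnr_by_interval[longest] - base
    total_pct = 100 * total_gain / base
    if 2 not in psnr_by_interval or total_gain == 0:
        return total_pct, None
    share_at_2s = (psnr_by_interval[2] - base) / total_gain
    return total_pct, share_at_2s
```

A share close to 1.0 means the curve has essentially flattened by two seconds, which is the pattern described for most of the natural-content test files.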
2. Segment size affects throughput differently with persistent and non-persistent connections.
Here we turn to research performed by friend and colleague Stefan Lederer, CEO and co-founder of bitmovin, who runs an online video platform and distributes a highly regarded HTML5 player. In his blog post, Optimal Adaptive Streaming Formats MPEG-DASH & HLS Segment Length, Stefan first presented results similar to those shown above that detail how keyframe interval impacts quality. Then he focused on how segment length impacts network throughput.
Before considering his results, let's get theoretical for a moment. Intuitively, shorter segments are more responsive to changing network conditions, which should improve throughput. On the other hand, shorter segments also require more communication between the player and the server to retrieve them, which slows throughput. How did this theory play out in practice?
As you can see in Figure 2, Stefan found a dramatic throughput difference depending on whether the server connection was persistent or non-persistent. Specifically, with a persistent connection, the optimum segment size was two to three seconds. With a non-persistent connection, these smaller segment sizes were very inefficient, and performance peaked at a segment length of six seconds.
Briefly, according to Wikipedia, a persistent connection uses “a single TCP connection to send and receive multiple HTTP requests/responses, as opposed to opening a new connection for every single request/response pair.” Here's a further explanation from Alex Martelli on Stack Overflow (lightly edited for length).
Persistent means the server doesn’t close the socket once it’s finished pushing out the response, so the client can make other requests on the same socket. Reusing the socket can reduce overall latency compared to closing the original socket and opening new ones for all the follow-on requests.
Applying this distinction to ABR streaming, Lederer explains that “the influence of the network delay (RTT) gets bigger when using smaller segment lengths. This especially affects the non-persistent/HTTP1.0 connection results because in this case one round-trip-time (RTT) is needed for establishing the TCP connection to the server after each segment.”
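To see why the extra round trip hurts short segments most, consider a deliberately simplified model: every segment request costs one RTT, and a non-persistent connection adds one more RTT per segment for the TCP handshake. This sketch ignores TLS handshakes, TCP slow start, and bitrate adaptation, and the 4 Mbps stream, 5 Mbps link, and 100 ms RTT in the example are hypothetical values, not figures from Stefan's tests.

```python
def effective_throughput_bps(segment_seconds, bitrate_bps, bandwidth_bps, rtt_s, persistent):
    """Media bits delivered per second of wall-clock time in a toy request model.

    Each segment costs one RTT for the HTTP request/response. Without a
    persistent connection, a new TCP handshake adds another RTT per segment.
    TLS, TCP slow start and bitrate adaptation are ignored.
    """
    segment_bits = segment_seconds * bitrate_bps
    transfer_s = segment_bits / bandwidth_bps
    round_trips = 1 if persistent else 2
    return segment_bits / (transfer_s + round_trips * rtt_s)


if __name__ == "__main__":
    for seconds in (1, 2, 4, 6, 10):
        p = effective_throughput_bps(seconds, 4e6, 5e6, 0.1, persistent=True)
        n = effective_throughput_bps(seconds, 4e6, 5e6, 0.1, persistent=False)
        print(f"{seconds:>2}s  persistent {p / 1e6:.2f} Mbps  non-persistent {n / 1e6:.2f} Mbps")
```

In this model the penalty for non-persistent connections shrinks as segments get longer, because the fixed per-segment RTT cost is spread over more media. It only captures the cost side of the trade-off: throughput rises monotonically with segment length, so it does not reproduce Stefan's two-to-three-second optimum for persistent connections, which reflects the faster adaptation that shorter segments allow.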
How do you know whether your streaming server or CDN uses persistent or non-persistent connections? Check the documentation and settings. For example, this is from the Amazon CloudFront documentation:
When CloudFront gets a response from your origin, it tries to maintain the connection for several seconds in case another request arrives during that period. Maintaining a persistent connection saves the time that is required to re-establish the TCP connection and perform another TLS handshake for subsequent requests. To improve performance, we recommend that you configure your origin server to allow persistent connections.
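You can also run a quick client-side check with Python's standard http.client module. Under HTTP/1.1, connections are persistent by default unless the server sends Connection: close, while HTTP/1.0 connections close after each response unless keep-alive is negotiated. The helper below applies those rules to a response's protocol version and Connection header. The host and path are placeholders; this only shows how the edge server treats your player's connection (the CloudFront passage above concerns the CDN-to-origin connection, which you can't observe from the client), and because http.client speaks only HTTP/1.1, it says nothing about HTTP/2, where a single connection is reused by design.

```python
import http.client


def is_persistent(http_version, connection_header):
    """Apply HTTP/1.x keep-alive defaults to a response.

    http_version: 10 for HTTP/1.0, 11 for HTTP/1.1 (as in HTTPResponse.version).
    connection_header: value of the Connection header, or None if absent.
    """
    tokens = {t.strip().lower() for t in (connection_header or "").split(",")}
    if "close" in tokens:
        return False
    if http_version >= 11:
        return True  # HTTP/1.1 connections are persistent by default
    return "keep-alive" in tokens  # HTTP/1.0 needs an explicit keep-alive


def check_server(host, path="/", use_tls=True):
    """Send one HEAD request and report whether the response allows connection reuse."""
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    conn = conn_cls(host, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        resp.read()
        return is_persistent(resp.version, resp.getheader("Connection"))
    finally:
        conn.close()
```

For example, check_server("cdn.example.com", "/video/master.m3u8") returns True if the edge keeps the connection open for follow-on segment requests (the host name is hypothetical).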
Based upon this research, Stefan recommends “DASH or HLS chunk sizes around 2 to 4 seconds, which is a good compromise between encoding efficiency and flexibility for stream adaptation to bandwidth changes. Furthermore, it is recommended to use Web servers and CDNs that enable persistent HTTP connections, as this is an easy and cheap way to increase streaming performance. Thus as presented, by doing so the effective media throughput and QoS can be increased without any changes to the client’s implementation just by choosing the right segment length.”
Obviously, if you’re using a non-persistent connection for some reason, you should increase the segment duration to somewhere around six seconds.
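If you package with FFmpeg, the segment duration and the keyframe-alignment rule map onto a handful of options. This sketch builds (but doesn't run) an FFmpeg HLS command that sets the keyframe interval equal to the segment duration: -g and -keyint_min fix the GOP length in frames, -sc_threshold 0 keeps libx264 from adding keyframes at scene changes, and -hls_time sets the target segment duration. The file names, the 30 fps rate, and the libx264/AAC codec choices are assumptions; verify the options against the documentation for your FFmpeg build.

```python
import subprocess  # only needed if you actually run the command
from fractions import Fraction


def ffmpeg_hls_command(src, playlist, segment_seconds, fps=30, list_size=None):
    """Build an FFmpeg HLS command whose GOP length equals the segment duration.

    list_size=None produces a VOD playlist; an integer produces a sliding
    live-style playlist with that many entries.
    """
    gop = Fraction(segment_seconds) * Fraction(fps)
    if gop.denominator != 1:
        raise ValueError("segment duration must be a whole number of frames")
    gop = str(gop.numerator)
    cmd = [
        "ffmpeg", "-i", src,
        "-c:v", "libx264", "-r", str(fps),
        "-g", gop, "-keyint_min", gop, "-sc_threshold", "0",
        "-c:a", "aac",
        "-f", "hls", "-hls_time", str(segment_seconds),
    ]
    if list_size is None:
        cmd += ["-hls_playlist_type", "vod"]  # keeps every segment in the playlist
    else:
        cmd += ["-hls_list_size", str(list_size)]
    cmd.append(playlist)
    return cmd


# Example: six-second segments for a non-persistent connection.
# subprocess.run(ffmpeg_hls_command("input.mp4", "out.m3u8", 6), check=True)
```

Setting the GOP equal to the segment duration is the simplest aligned choice; any GOP length that divides evenly into the segment would also keep segments starting on keyframes.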
3. In live applications, a smaller segment size can also reduce latency, though there are some caveats, and you should thoroughly test any configuration before deployment.
This blog post grew out of a colleague's request that I comment on Bruce Wilkinson's post on LinkedIn's Live Video forum, where he asked:
Looking to lower the latency of an adaptive streaming (HLS) solution which, as most of you know, can only be done by reducing the .ts segment length. We are considering a 2-second segment which may get the latency closer to 6 seconds. Making the encoded GOP length of 2 seconds will not be a problem. My main concern is the net effect of increased communication between the client’s player and the index page. Any thoughts or advice on such short segmenting would be appreciated.
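Bruce's estimate of roughly six seconds for two-second segments matches a common rule of thumb. The HLS specification (RFC 8216) says a client playing a live stream should not start with a segment that begins less than three target durations from the end of the playlist, so a player typically sits about three segments behind the live edge. This sketch turns that into arithmetic; other_delay_seconds is a hypothetical lump sum for encoding, packaging, and CDN delivery, and individual players may buffer more or less than three segments.

```python
def estimate_hls_latency(segment_seconds, other_delay_seconds=0.0, segments_back=3):
    """Rough live latency for classic (non-low-latency) HLS.

    A player that starts three target durations from the end of the live
    playlist sits roughly segments_back * segment_seconds behind the live
    edge; other_delay_seconds lumps together encoding, packaging and delivery.
    """
    return segments_back * segment_seconds + other_delay_seconds
```

With two-second segments, estimate_hls_latency(2) gives six seconds before any encoding or delivery delay, which is why shortening segments is the main lever for latency in classic HLS.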
I pointed my colleague to Stefan's post, discussed in point 2. However, Grant Simonds responded to Bruce's question as follows (with minor edits).
Broadcasters always ask for lower latency without understanding the trade-offs: Lower latency means more chance of buffering and every time the stream buffers it halves the customers' satisfaction level with your service.
Recently we had a provider set up a live stream of tennis which used 4-second segments and 3 segments per playlist – in testing it worked most of the time on mobile phones, but under load there was a lot of buffering on some phones and on Roku players it buffered after each segment (not usable). I’d say a safe minimum is 5-second segments with 6 segments per playlist but it depends on the players and the network in between.
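The two configurations Grant describes can be compared with simple arithmetic. Assuming the same three-target-duration starting point, this sketch reports how much media each sliding playlist lists, how far back from the live edge a spec-following player starts, and how much listed media sits behind that starting point. The function name and the "headroom" measure are illustrative, and the arithmetic is not offered as the cause of the buffering Grant observed.

```python
def live_playlist_window(segment_seconds, segments_per_playlist, start_segments_back=3):
    """Arithmetic on a sliding live HLS playlist (a rough sketch).

    window_s: media listed in the playlist at any moment.
    start_offset_s: where a player following the three-target-duration
        guidance starts, measured back from the live edge.
    headroom_s: listed media older than that starting point.
    """
    window = segment_seconds * segments_per_playlist
    start_offset = segment_seconds * min(start_segments_back, segments_per_playlist)
    return {"window_s": window, "start_offset_s": start_offset, "headroom_s": window - start_offset}
```

For the tennis stream, live_playlist_window(4, 3) gives a 12-second window with the player starting at the oldest listed segment, leaving no listed media behind it; Grant's suggested live_playlist_window(5, 6) gives a 30-second window with 15 seconds behind the starting point.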
Grant makes two key points. First, chasing the lowest possible latency increases the risk of buffering, which absolutely degrades QoE. Second, different playback devices respond differently to configuration options like segment size, so before finalizing your configuration, you should thoroughly test on all relevant devices.
Beyond this, understand that segment size is only one element of overall latency. For a great overview of all the factors, check out Understanding ABR Latency: A Guest Post from Alex Zambelli.
About the Streaming Learning Center
The Streaming Learning Center is a premier resource for companies seeking advice or training on cloud or on-premises encoder selection, preset creation, video quality optimization, adaptive bitrate distribution, and associated topics. Please contact Jan Ozer at email@example.com for more information about these services.