One of the nice things about the streaming industry is that many folks will share their knowledge with you if you ask a quick question on a subject they are passionate and knowledgeable about. So it was with a question I asked of Alex Zambelli, now with iStreamPlanet, and formerly with the Microsoft team that streamed multiple Olympics and other large events.
By way of background, I was chatting with a contact about low-latency ABR video, and I thought the concept might be an oxymoron. So I fired off the following note to Alex, who I knew would know.
Alex:
What’s the minimum latency I could expect if implementing smooth streaming or DASH?
How far was the live stream behind TV in the Olympics that you worked on (and other events)?
Please let me know.
To which he replied:
There are multiple factors that go into determining ABR latency, the most important of which are:
· Video encoder buffer duration
· Segment/fragment duration
· Server-side buffer duration, aka lookaheadChunks value (exclusive to Smooth Streaming)
· CDN delivery latency
· Player buffer duration
What I’ve found in my experience is that with a typical Smooth Streaming (HSS) fragment duration of 2 seconds and a video encoder buffer of 4 seconds, the encoder-to-origin latency is typically under 10 seconds. By reducing both durations, one could probably tune that down to about 5 seconds.
The bulk of the latency actually gets introduced in the origin-to-player part of the path. An IISMS or AMS server will buffer 2 fragments on the server by default (4 seconds). The CDN delivery path will likely add at least a few seconds of latency just in propagating fragments through its network for the first time. Finally, and most importantly, the player will buffer however much data it deems necessary to provide smooth playback that is resistant to network jitter; in the case of the Smooth Streaming Client SDK, that default value is 13 seconds. So when you add that all up for HSS, the origin-to-player latency is generally around 18 seconds. By reducing all of these elements (while still caching some fragments: let’s not get too crazy), you could probably bring that down to 6 seconds.
So in summary… for HSS the typical end-to-end latency is 25-30 seconds, while with some tuning (though nothing too extreme, else you risk making playback unstable) that could be reduced to 10-12 seconds.
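To make the arithmetic above concrete, here is a minimal Python sketch of an additive latency budget. It is a simplification, not a description of how these figures were measured: it assumes encoder-to-origin latency is just the encoder buffer plus one fragment, that the origin holds a fixed number of fragments (the lookaheadChunks idea), and that CDN propagation is a single fixed value. The 2-second CDN figure and every "tuned" value are hypothetical, chosen only so the totals land near the ranges quoted above.

```python
from dataclasses import dataclass


@dataclass
class LatencyBudget:
    """Illustrative per-stage latency contributions, all in seconds."""
    encoder_buffer: float
    fragment_duration: float
    lookahead_fragments: int   # fragments held at the origin (cf. lookaheadChunks)
    cdn_propagation: float     # ASSUMED fixed value, not given in the text
    player_buffer: float

    def encoder_to_origin(self) -> float:
        # Simplification: fill the encoder buffer, then finish one full fragment.
        return self.encoder_buffer + self.fragment_duration

    def origin_to_player(self) -> float:
        # Origin buffer is a whole number of fragments, then CDN, then player.
        server_buffer = self.lookahead_fragments * self.fragment_duration
        return server_buffer + self.cdn_propagation + self.player_buffer

    def end_to_end(self) -> float:
        return self.encoder_to_origin() + self.origin_to_player()


# Typical HSS figures from the text; the CDN value is a hypothetical placeholder.
typical = LatencyBudget(encoder_buffer=4.0, fragment_duration=2.0,
                        lookahead_fragments=2, cdn_propagation=2.0,
                        player_buffer=13.0)
# Hypothetical "tuned" settings: shorter buffers, but still caching fragments.
tuned = LatencyBudget(encoder_buffer=3.0, fragment_duration=1.0,
                      lookahead_fragments=2, cdn_propagation=1.0,
                      player_buffer=3.0)

for name, budget in (("typical", typical), ("tuned", tuned)):
    print(f"{name}: encoder->origin {budget.encoder_to_origin():.0f}s, "
          f"origin->player {budget.origin_to_player():.0f}s, "
          f"end-to-end {budget.end_to_end():.0f}s")
```

With these inputs the typical case works out to 6 s, 19 s, and 25 s, and the tuned case to 4 s, 6 s, and 10 s. The exact numbers matter less than the shape: in the typical case the 13-second player buffer dominates the budget, which is why it is the first knob to look at.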
HLS follows the same rules, though with HLS the segment duration is typically larger – 6 to 10 seconds – which means some of the individual latencies get compounded as a result. It takes longer for the encoder to produce a segment, it takes longer to upload a segment, it takes longer to download a segment, and finally it takes more time to buffer a whole segment. So in my experience the typical HLS end-to-end latency is in the neighborhood of 40 seconds.
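The compounding effect of longer HLS segments can be sketched the same way. This hypothetical model assumes the player holds three whole segments before starting (loosely following the HLS specification's advice that clients not begin playback less than three target durations from the end of a live playlist), reuses the assumed 4-second encoder buffer and 2-second CDN figure, and ignores upload and download time entirely, so treat the outputs as rough illustrations rather than predictions.

```python
def hls_latency(segment_duration: float,
                encoder_buffer: float = 4.0,      # ASSUMED, borrowed from the HSS example
                cdn_propagation: float = 2.0,     # ASSUMED placeholder
                player_segments: int = 3) -> float:  # ASSUMED player buffer depth
    """Rough end-to-end latency estimate (seconds) for a given segment duration.

    Segment duration appears twice: the encoder must finish a whole segment
    before it can be published, and the player buffers several whole segments.
    Upload/download time and origin-side buffering are ignored.
    """
    produce = segment_duration
    player_buffer = player_segments * segment_duration
    return encoder_buffer + produce + cdn_propagation + player_buffer


for seg in (2.0, 6.0, 10.0):
    print(f"{seg:>4.0f}s segments -> ~{hls_latency(seg):.0f}s end-to-end")
```

In this model 6-second segments come out at about 30 seconds and 10-second segments at about 46, bracketing the roughly 40 seconds cited above; every extra second of segment duration adds about four seconds of latency (one to produce it, three more in the player buffer).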
In the real world I’ve seen live events experience latency of over 1 minute, though often that’s by choice (e.g., a customer choosing increased buffering and stability over low latency). Most of the time, though, I’d say end-to-end latency follows the typical numbers laid out above.
GPAC last year did some research around low-latency DASH and found that latency could be brought down to under 250 ms, but not without some major sacrifices: the H.264 gradual decoder refresh (GDR) feature may not be compatible with all decoders and decreases video quality per bitrate; chunked-transfer encoding may not be supported by all encoders, servers, and clients; and very short GOPs decrease video quality per bitrate. So the answer for DASH seems to be “Yes, you can have low-latency streaming, but it will cost you.” For most DASH applications, I suspect the lowest latency most customers can hope for will still be in the 10-20 second range.
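Chunked-transfer encoding is one of the trade-offs listed above, and the reason it helps is that the server can start sending a segment while the encoder is still producing it, instead of waiting for the complete file. The Python sketch below shows only the HTTP/1.1 chunked framing (hexadecimal length, CRLF, data, CRLF, ending with a zero-length chunk). The function name and the placeholder byte strings are illustrative assumptions, not GPAC's implementation; a real low-latency pipeline would also need an encoder, packager, server, and player that all handle partial segments.

```python
from typing import Iterable, Iterator


def chunked_body(pieces: Iterable[bytes]) -> Iterator[bytes]:
    """Frame pieces of a media segment as an HTTP/1.1 chunked message body.

    Each piece is yielded as soon as it is available, so a client can start
    receiving the segment before the encoder has finished it.
    """
    for piece in pieces:
        if not piece:
            # A zero-length chunk means "end of body", so never emit one early.
            continue
        # chunk-size is hexadecimal, followed by CRLF, the data, and CRLF.
        yield f"{len(piece):X}\r\n".encode("ascii") + piece + b"\r\n"
    # Last chunk: size 0, no trailer fields, then the final CRLF.
    yield b"0\r\n\r\n"


if __name__ == "__main__":
    # Placeholder bytes standing in for parts of one segment as they are encoded.
    parts = [b"part-1", b"part-2", b"part-3"]
    for frame in chunked_body(parts):
        print(frame)
```

Because `chunked_body` is a generator, each framed piece can be written to the socket the moment the encoder hands it over; the latency saving comes from that early hand-off, not from the framing itself.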
I asked Alex if I could publish the above on this blog, and he agreed, but wanted me to make it clear that your mileage may vary and that there are few absolutes in streaming. So if your HLS latency is 20 seconds, more power to you; write in and let us know how you did it. Alex will be the first to say, “Well done!” He concluded by sharing that:
There are a lot of factors that go into ABR end-to-end latency, so the range of results can be quite wide. It’s not uncommon for us to run 2 live channels for 2 different customers and see vastly different latency results. Some of it is format specific, but some of it is just down to configuration details, so blanket statements such as “HSS is lower latency than HLS!” are not necessarily applicable.
So there you have it. Thanks Alex!
Depending upon your application, 10 to 60 seconds of latency may or may not be a big deal. For a live concert, debate, or other performance that you can’t watch simultaneously on TV, it’s likely acceptable. If you’re watching Sunday Night Football on TV and cutting over to streaming for highlights, it could be a stretch. For any kind of interactive presentation or demo, it would probably feel like you’re communicating with someone located on the planet Mars.