Beginners’ Guide to Adaptive Bitrate Streaming

This document describes what adaptive bitrate streaming (ABR) is, its components, and how it operates. It’s intended for newbies and provides a high-level overview, not a detailed how-to for implementing ABR.

These materials were adapted from lessons in the online course Streaming Media 101: Technical Onboarding for Streaming Media Professionals.

Adaptive Bitrate Streaming Overview

ABR technologies encode live and on-demand videos into multiple files (called an encoding ladder) for delivery to a range of playback devices connecting at different speeds. ABR technologies adapt in at least two ways: sending different files in the encoding ladder to different devices depending upon their playback capabilities, and adapting the file sent to a particular client to changing bandwidth conditions. This allows ABR technologies to deliver an optimum viewing experience to viewers watching on a range of devices over many different connection speeds.

YouTube is a great example of adaptive bitrate streaming.

Figure 1. YouTube’s implementation of ABR streaming uses a six-rung ladder and can work in either auto or manual mode.

Most ABR technologies work automatically without user input. As shown in Figure 1, however, many players, including the YouTube player, also allow the viewer to manually choose the stream to be viewed.

ABR streaming first achieved mass-market success as a technology deployed by Adobe, using the Flash Media Server and compatible products, via the RTMP protocol. As you can see in Figure 2, from the Bitmovin 2020 Video Developer Report, approximately 36% of survey respondents continue to use RTMP-based ABR streaming, with 13% planning to implement RTMP-based streaming in 2021. RTMP-based technologies require a server to communicate with the player, which makes them more expensive to implement, and some firewalls reject RTMP packets, which can cause delivery issues.

Figure 2. ABR formats deployed per the 2020 Bitmovin Video Developer Report

RTMP-based technologies also rely upon the Flash Player, which Adobe end-of-lifed in January 2021. Though there are alternatives, most new ABR deployments are based upon HTTP-based ABR technologies.

Most prominent is Apple’s HTTP Live Streaming (HLS), which launched in 2009 and accounts for the lion’s share of ABR streaming today. HLS is followed by the MPEG standard Dynamic Adaptive Streaming over HTTP (DASH) which shipped in 2012. Since both of these technologies are HTTP-based, they can run on a plain HTTP web server without a streaming server. This cuts the implementation cost over RTMP technologies that require a server, improves scalability, and because HTTP packets aren’t blocked by firewalls, improves delivery reliability.

Microsoft shipped their own HTTP-based ABR technology called Smooth Streaming in 2008, which is still supported by 24% of survey respondents, mostly to support game platforms like the Xbox. Adobe introduced HTTP Dynamic Streaming (HDS) in 2010, though it never really caught on and today is only supported by 5% of Bitmovin respondents.

The Common Media Application Format (CMAF) is a standards-based container format launched in 2018. As explained in more detail below, CMAF isn’t an ABR technology. Rather, it’s a technology that allows producers to create one set of files they can deliver via multiple ABR delivery technologies, including HLS, DASH, Smooth Streaming, and HDS. As you can see in Figure 2, CMAF was supported by 21% of respondents in 2020, and another 36% of respondents plan to adopt it in 2021.

Rounding out Figure 2, video delivered via progressive streaming typically involves a single encoded file. While this reduces encoding costs, it doesn’t optimize either quality or deliverability for viewers watching on different devices or connection speeds, and it is generally disfavored for mission-critical video content.

This document focuses on HTTP-based adaptive streaming technologies, not RTMP and not progressive streaming.

How ABR Technologies Work

There are two sides to ABR delivery technologies: file preparation and playback. To prepare the file, you encode the live or on-demand video into the various files that constitute the encoding ladder. Then, during a phase called packaging, these files are segmented into short chunks for sequential delivery. The packaging software also creates manifest files that the player accesses during playback to determine file compatibility and location. Digital rights management (DRM) technologies are often applied during packaging to protect the playback of premium content.

During playback, the player uses the manifest files to choose and retrieve the media segments and play them.

Creating the Encoding Ladder

Figure 3 shows the encoding ladder from Apple Technote TN2224, which was deprecated and replaced by the Apple HLS Authoring Specification, available here. I’m showing this obsolete ladder because it does a great job of illustrating how the different rungs of the ladder were encoded for compatibility with a range of different Apple devices, old and new. The H.264 profile is one key differentiator, with the Baseline profile used for the oldest and least powerful devices; resolution and bitrate are the others.

Figure 3. Apple’s iconic encoding ladder from TN2224.

The HLS Authoring Specification succeeded TN2224 after about eight years. During that time, virtually all the devices shown on the right of the ladder were replaced by newer, more capable devices. Accordingly, in the HLS Authoring Specification, Apple recommends encoding all files using the High profile, though many producers still encode the lower-quality files in the encoding ladder using the Baseline or Main profile for compatibility with older devices.
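
If it helps to see the shape of a ladder in code, here’s a minimal Python sketch of how an encoding ladder might be represented. The resolutions, bitrates, and profile assignments are illustrative values chosen for this example, not Apple’s published recommendations.

```python
# An illustrative ABR encoding ladder (example values only, not Apple's
# published recommendations). Each rung pairs a resolution and bitrate with
# an H.264 profile; lower rungs use less demanding profiles for older devices.
ENCODING_LADDER = [
    {"name": "360p",  "width": 640,  "height": 360,  "bitrate_kbps": 800,  "h264_profile": "Baseline"},
    {"name": "480p",  "width": 854,  "height": 480,  "bitrate_kbps": 1400, "h264_profile": "Main"},
    {"name": "720p",  "width": 1280, "height": 720,  "bitrate_kbps": 2800, "h264_profile": "High"},
    {"name": "1080p", "width": 1920, "height": 1080, "bitrate_kbps": 4500, "h264_profile": "High"},
]

for rung in ENCODING_LADDER:
    print(f'{rung["name"]}: {rung["width"]}x{rung["height"]} at '
          f'{rung["bitrate_kbps"]} kbps ({rung["h264_profile"]} profile)')
```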

A couple of points on encoding ladder creation. First, most producers, other than top-rung services like Netflix, Amazon, and Hulu, create a single encoding ladder to deliver to all target platforms. This means you have to understand the compatibility requirements of all your customers and make sure each can play at least one stream, even on legacy devices.

Second, because every video file presents its own unique blend of encoding complexity, many producers are transitioning away from a single encoding ladder for all content. For example, you might create a ladder with a top rung of 3.5 Mbps for animated or talk show content, since both types of content are relatively simple to encode and that bitrate delivers the necessary quality. In contrast, you might create a ladder with a top rung of 7 Mbps for football, hockey, or action movies, because the higher rate is necessary to maintain quality for this harder-to-encode content.

Techniques that adapt the encoding ladder to the content are called per-title encoding systems. You can learn much more about per-title encoding here.
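
As a concrete, hypothetical illustration of the per-title idea, the sketch below keys two ladders to a content category. The 3.5 Mbps and 7 Mbps top rungs come from the example above; the lower rungs and category names are invented for illustration.

```python
# Hypothetical per-title ladders: simple content (talk shows, animation) tops
# out at 3.5 Mbps, while harder-to-encode content (sports, action) gets a
# 7 Mbps top rung. Lower-rung values and category names are invented.
LADDERS_KBPS = {
    "simple":  [3500, 2200, 1400, 900, 500],
    "complex": [7000, 4500, 2800, 1600, 800],
}

def ladder_for(content_type):
    """Return the bitrate ladder (kbps, top rung first) for a content category."""
    simple_types = {"talk_show", "animation"}
    return LADDERS_KBPS["simple" if content_type in simple_types else "complex"]

print(ladder_for("talk_show"))  # [3500, 2200, 1400, 900, 500]
print(ladder_for("football"))   # [7000, 4500, 2800, 1600, 800]
```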

Packaging

During the packaging phase, the packaging software breaks the individual ladder rungs into segments for sequential delivery and creates the manifest files (Figure 4). The packager may also apply DRM during this stage. Note that these segments can be actual separate files or byte-range requests within a single file, which is more efficient than uploading thousands of tiny files to the web server.

There are two kinds of manifest files: media playlists and master playlists. The packager creates a media playlist for each form of content, whether audio, video, or captions. This media playlist contains the URLs of all segments in the file and enables their retrieval.

The master playlist contains the location of all media playlists and certain file characteristics that determine player compatibility, like the H.264 profile discussed above. During playback, the player first loads the master manifest, finds compatible streams in the encoding ladder and then loads the media manifests to start retrieving the video, audio, and closed caption segments.

Figure 4. The packaging phase: segmenting files and creating the manifest files.

Different technologies use different formats for the manifest files and different extensions. HLS manifest files use the .M3U8 extension, while DASH files use .MPD. After the packager creates the segments and manifest files, they are uploaded to a web server for delivery and playback.
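
To give a rough sense of what a packager emits, the Python sketch below writes a minimal media playlist and master playlist for a hypothetical two-rung ladder. The HLS tags shown are standard, but real packagers add many more attributes, so treat this as an illustration of the structure rather than a production packager.

```python
# Sketch of the playlists a packager might write for a hypothetical two-rung
# ladder with 6-second .ts segments. Real packagers emit richer, spec-complete
# output; this only shows the basic structure.
def media_playlist(prefix, segment_count, duration=6.0):
    lines = ["#EXTM3U", "#EXT-X-VERSION:3",
             f"#EXT-X-TARGETDURATION:{int(duration)}", "#EXT-X-MEDIA-SEQUENCE:0"]
    for i in range(segment_count):
        lines += [f"#EXTINF:{duration:.3f},", f"{prefix}_{i:05d}.ts"]
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines)

def master_playlist(rungs):
    lines = ["#EXTM3U"]
    for r in rungs:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={r['bandwidth']},"
                     f"RESOLUTION={r['width']}x{r['height']}")
        lines.append(r["uri"])  # location of that rung's media playlist
    return "\n".join(lines)

rungs = [
    {"bandwidth": 2800000, "width": 1280, "height": 720, "uri": "720p.m3u8"},
    {"bandwidth": 800000,  "width": 640,  "height": 360, "uri": "360p.m3u8"},
]
print(master_playlist(rungs))
print(media_playlist("720p", segment_count=3))
```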

ABR Playback

To play back the files, the player first loads the master manifest file, which lists the compatibility-related playback characteristics of each content type and the location of the associated media manifest file. Figure 5 shows a simple master manifest file for HLS.

After loading the master manifest, the player scans all files to identify those it can play. In Figure 5, for example, a very old iOS device might not be able to play any file larger than 640×360.

Figure 5. A master manifest file for HLS showing critical compatibility-related configuration options and the location of the media manifest files (.M3U8).

When playing back HLS content, the player will scan the master manifest and start playing the first stream it’s compatible with; in the case of Figure 5, that would be the 512×288, 174 kbps stream for most devices. To start playback, the player identifies the stream, locates the associated media manifest file, then the location of the first segment, which it downloads and starts to play.
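
To illustrate what “scanning the master manifest” involves, here’s a simplified Python sketch that parses the #EXT-X-STREAM-INF entries in a hypothetical master playlist and picks the lowest-bandwidth rung a device can display. Real players (AVFoundation, hls.js, ExoPlayer, and the like) do this internally with far more complete parsing; the playlist below is invented for illustration.

```python
import re

# A hypothetical master playlist, loosely modeled on the kind shown in Figure 5.
MASTER = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=174000,RESOLUTION=512x288
288p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720
720p.m3u8
"""

def parse_variants(master):
    """Pair each #EXT-X-STREAM-INF line with the media playlist URI that follows it."""
    variants, lines = [], master.strip().splitlines()
    for i, line in enumerate(lines):
        if line.startswith("#EXT-X-STREAM-INF:"):
            bandwidth = int(re.search(r"BANDWIDTH=(\d+)", line).group(1))
            width, height = map(int, re.search(r"RESOLUTION=(\d+)x(\d+)", line).groups())
            variants.append({"bandwidth": bandwidth, "width": width,
                             "height": height, "uri": lines[i + 1]})
    return variants

def first_compatible(variants, max_width, max_height):
    """Return the lowest-bandwidth variant the device can display."""
    playable = [v for v in variants if v["width"] <= max_width and v["height"] <= max_height]
    return min(playable, key=lambda v: v["bandwidth"])

# An old device that tops out at 640x360 would start with the 512x288 rung.
print(first_compatible(parse_variants(MASTER), max_width=640, max_height=360))
```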

Figure 6 shows a media manifest file for separate segments of a 720p file. You can tell that the segments are separate files because each has a .ts extension, signifying a file packaged using the MPEG-2 transport stream container format, which is common for HLS. Since the segments are in the same folder as the media manifest file, the URL is simply the file name. If they were located in a different folder, the media manifest file would contain the complete address.

Figure 6. A media manifest file listing all the segments and their locations.

Once the player starts playing a segment, it monitors the playback buffer. If the buffer remains full, the connection speed is adequate for the current rung, so the player will try to retrieve a higher-quality rung; if the buffer drops, the player will switch back to a lower rung to maintain playback. The player continues to analyze the buffer throughout the session to ensure uninterrupted playback using the highest-quality file the connection speed can support.
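
Here is one very simplified, purely illustrative sketch of a buffer-based switching heuristic of the kind just described. Actual players combine buffer occupancy with throughput estimates and smooth their decisions over time, so the thresholds and ladder values below are assumptions for this example, not real player behavior.

```python
# Simplified, purely illustrative buffer-driven rung selection. Real ABR
# algorithms also estimate throughput and smooth decisions over time.
LADDER_KBPS = [500, 900, 1400, 2200, 3500]  # hypothetical rungs, low to high

def next_rung(current, buffer_seconds):
    """Move up when the buffer is healthy, down when it is draining."""
    if buffer_seconds > 20 and current < len(LADDER_KBPS) - 1:
        return current + 1   # buffer is full: try a higher-quality rung
    if buffer_seconds < 8 and current > 0:
        return current - 1   # buffer is draining: drop down to protect playback
    return current           # otherwise stay on the current rung

# Simulate a session where the buffer grows, then collapses, then recovers.
rung = 0
for buffer_level in [25.0, 26.0, 24.0, 6.0, 5.0, 12.0]:
    rung = next_rung(rung, buffer_level)
    print(f"buffer={buffer_level:>5.1f}s -> playing {LADDER_KBPS[rung]} kbps rung")
```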

Though there are minor differences, all HTTP-based ABR technologies work similarly. You create the encoding ladder, segment the files, create the manifest files and upload them to a web server. Then the player retrieves the master manifest and starts the playback process.

Choosing an ABR Format

You choose an ABR format based upon the playback platforms that you’re targeting and how and where your videos will play. For example, if you want your videos to play in a browser on iOS devices, you have to use HLS. If you’re playing back in an app, you can use any ABR technology. Overall, as shown in Figure 7, you can use DASH or HLS for most platforms including Smart TVs.

Figure 7. Choosing an ABR format.

If you scan Figure 2, you’ll note that the numbers for current and future ABR technology usage exceed 100%; in fact, 79% of respondents currently support HLS, while 62% support DASH. That’s because most producers support multiple formats so they can deliver to all of their target devices.

Supporting Multiple ABR Technologies

At a high level, there are two ways to support more than one ABR format: static or dynamic. With static support, you create a separate set of files for each format, so one for HLS, one for DASH, one for Smooth Streaming, and so on. This increases encoding costs, because you’re creating multiple sets of files, and storage costs on the origin server, which is relatively expensive. It also makes very little sense, because all ABR technologies use the same encoded files as the source; they simply store them in a different container format.

With dynamic ABR support, you upload your unpackaged encoding ladder to the origin server and use software like the Wowza Streaming Engine or Nimble Streamer to package the content for each viewer on the fly. If an iPhone requests the video, the software packages the files into an HLS-compatible container format, produces the .M3U8 manifests, and starts delivery. If a Smart TV requests it, the software packages the files into a DASH-compatible container format, produces the .MPD manifests, and delivers those. These operations can occur in real time with minimal latency because manifest file creation is lightweight, and because the software is only changing the container format of the media files, not re-encoding them.
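
Conceptually, a dynamic (just-in-time) packager inspects each request and decides which container and manifest to emit. The sketch below is a stripped-down illustration of that routing decision; it is not the API of Wowza Streaming Engine, Nimble Streamer, or any other real product, and the user-agent test is deliberately crude.

```python
# Conceptual just-in-time packaging decision: the same encoded rungs are
# repackaged per request. This is a sketch, not a real packager's API, and
# real products use far more robust device detection than a user-agent check.
def choose_output(user_agent):
    """Pick a packaging target from the requesting device's user agent (simplified)."""
    if any(token in user_agent for token in ("iPhone", "iPad", "Safari")):
        return {"format": "HLS", "manifest": ".m3u8", "container": "MPEG-2 TS or fMP4"}
    return {"format": "DASH", "manifest": ".mpd", "container": "fragmented MP4"}

requests = [
    "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) Safari/604.1",
    "SmartTV/1.0 (Tizen; DASH-capable player)",
]
for ua in requests:
    target = choose_output(ua)
    print(f"{ua[:35]}... -> package as {target['format']} ({target['manifest']})")
```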

While dynamic delivery saves encoding and storage costs, you have to license the dynamic packaging software and maintain a separate cloud instance to run it 24/7, which gets expensive. So, neither option is optimal, and it’s no surprise that the Bitmovin report shows that 46.5% of respondents use static packaging, while 37.6% use dynamic.

Enter the Common Media Application Format (CMAF)

Efficient multiple format support is the reason that CMAF was created. As shown in Figure 8, on the left are the files necessary to support HLS and DASH with separately packaged files for each format. The HLS files are packaged using the MPEG-2 Transport Stream container format with M3U8 manifest files, while the DASH files are created using fragmented MP4 files and MPD manifest files.

To support CMAF, Apple extended HLS compatibility to fragmented MP4 files. Once that was accomplished, a single set of fragmented MP4 files with both MPD and M3U8 manifests could support both DASH and HLS playback. Since the encoded audio and video are much larger than the manifest files, this is very efficient from a storage perspective. Since Microsoft’s Smooth Streaming and Adobe’s HDS both support fragmented MP4 playback, you could also add their respective manifest files to the CMAF collection, extending support to those formats with minimal additional expense.

Figure 8. You can deliver a single set of CMAF files to multiple ABR formats (not my diagram – if it’s yours, ping me at [email protected] and I’ll add the credit).
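
The storage saving is easiest to picture as a file layout. The small Python sketch below models a hypothetical CMAF asset in which one set of fragmented MP4 segments per rung is shared by both an HLS playlist and a DASH MPD; all file names are invented for illustration.

```python
# Hypothetical CMAF asset layout: the bulky fragmented MP4 segments exist once
# per rung, and only the lightweight manifests are duplicated per ABR format.
cmaf_asset = {
    "segments": {   # fragmented MP4 media, shared by every delivery format
        "720p": ["720p_init.mp4", "720p_0001.m4s", "720p_0002.m4s"],
        "360p": ["360p_init.mp4", "360p_0001.m4s", "360p_0002.m4s"],
    },
    "manifests": {  # small text files, one set per delivery format
        "HLS":  ["master.m3u8", "720p.m3u8", "360p.m3u8"],
        "DASH": ["stream.mpd"],
    },
}

segment_files = sum(len(files) for files in cmaf_asset["segments"].values())
manifest_files = sum(len(files) for files in cmaf_asset["manifests"].values())
print(f"{segment_files} shared media files, {manifest_files} small manifest files")
```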

The downside of CMAF is compatibility; some viewers on very old platforms may not be able to watch the streams. For example, any iOS device that can’t support fragmented MP4 files can’t play HLS streams from a CMAF container. CMAF is clearly the future, however, and Disney launched its new streaming service in 2019 using files delivered in a CMAF container. Obviously, with a new service, legacy support isn’t an issue, so Disney was free to use CMAF without concern about stranding viewers with older devices.

Resources

Lesson of the Week: Per-Title Encoding in 6.2 Dimensions – Streaming Learning Center, March 2021. Details the evolution of per-title encoding from simple optimization technologies to complex, AI-driven systems. It’s an example of the technical perspective the course provides and our commitment to continually updating course content.

How Many Rungs on Your Encoding Ladder? – Streaming Media, March 2021. Video excerpt.

CMAF Proof of Concept – Streaming Learning Center, February 2020. The Common Media Application Format (CMAF) is supposed to be the Holy Grail of streaming; one set of files that you can deliver to multiple output points. How well does it work today? This 3:47 video shows you.

The Evolving Encoding Ladder: What You Need to Know – Streaming Learning Center, May 2019. This article discusses the evolution of the encoding ladder from the fixed ladder presented by Apple in Tech Note TN2224 to context-aware-encoding, which creates a ladder that not only considers the encoding complexity of the content, but also the producer’s QoE and QoS metrics. The encoding ladder embodies the most significant encoding decisions made by encoding professionals, and understanding this evolution is critical to optimal encoding and delivery.

Saving on Encoding and Delivery: Dynamic Packaging – Streaming Learning Center, August 2018. You can dramatically reduce net encoding and storage costs by implementing dynamic packaging for your live or VOD video. This article defines dynamic packaging, explores its benefits, and outlines some implementation options.

About Jan Ozer

I help companies train new technical hires in streaming media-related positions; I also help companies optimize their codec selections and encoding stacks and evaluate new encoders and codecs. I am a contributing editor to Streaming Media Magazine, writing about codecs and encoding tools. I have written multiple authoritative books on video encoding, including Video Encoding by the Numbers: Eliminate the Guesswork from your Streaming Video (https://amzn.to/3kV6R1j) and Learn to Produce Video with FFmpeg: In Thirty Minutes or Less (https://amzn.to/3ZJih7e). I have multiple courses relating to streaming media production, all available at https://bit.ly/slc_courses. I currently work at www.netint.com as a Senior Director of Marketing.
