At its June 2017 Worldwide Developers Conference, Apple announced support for HEVC playback in HTTP Live Streaming (HLS) delivered to iOS, MacOS, and tvOS end points. For many, this announcement raised more questions than it answered, and we hope to address those questions in this article.
By way of background, at Streaming Media West 2017, I co-produced a preconference session on encoding HEVC for HLS with David Hassoun and Jun Heider from RealEyes Media, a streaming media consultancy. Much of the material presented here was assembled for that session, and you can download the handout for that presentation. As with the preconference session, I’m going to assume that you know the basics of HLS production and I will focus primarily on the HEVC-specific aspects.
Here are the top 10 questions for developers seeking to add HEVC to their HLS streams.
1. Which Devices Support HEVC Playback in HLS?
Three classes of devices support HEVC playback in HLS:
iOS: All devices compatible with iOS 11. This includes all iPhones from the 5s forward, iPad Air and iPad Pro models, iPad mini 2 and later models, the iPad 5th generation, and iPod touch 6th generation devices.
Mac: All devices compatible with MacOS 10.13, or High Sierra, including MacBook (late 2009 or newer), MacBook Pro (mid-2010 or newer), MacBook Air (late 2010 or newer), Mac mini (mid-2010 or newer), iMac (late 2009 or newer), and Mac Pro (mid-2010 or newer).
Apple TV: Apple TV 4K.
At this point, it’s also worth discussing what you don’t get by supporting HEVC in HLS, at least currently. Specifically, although Android supports HLS and HEVC, at this point Android can’t play back HEVC video included as part of an HLS presentation, though this could change at any time. Ditto for Microsoft Edge on Windows 10 platforms with HEVC hardware support, which can play HEVC streams, just not HEVC in HLS. Once you have the encoded HEVC streams, you should be able to transmux into DASH for delivery to both platforms, though that adds another process into the mix.
2. What’s the Effect on Battery Life?
To understand battery life, we compared CPU use between H.264 and HEVC playback for the Streaming Media West session with positive results, which you can read about in the article “HEVC in HLS: How Does It Affect Device Performance?” The pithy summary states, “Overall, while you may have issues with the oldest generation of supported mobile devices and computers, the next generation in all cases show only a slight increase in CPU use for HEVC playback, while new iPhones at least show relative parity. Publishers considering deploying HEVC should do so without concerns that the higher-end format will create significant battery life issues for most potential viewers.”
3. What Does Supporting HEVC Get Me?
Supporting HEVC will deliver multiple benefits, including economic savings and service improvements.
Bandwidth Savings: HEVC should deliver some bandwidth savings, though the benefits will vary from service to service and will depend on multiple factors. In terms of compression performance, HEVC should deliver about the same quality as H.264 at substantially lower bitrates, as much as 50 percent lower at 1080p, though this will drop substantially at lower resolutions. To determine how much actual bandwidth this will save you, you’ll have to start by checking your server logs to see the distribution of streams that you’re currently delivering.
At one end of the spectrum, assume that your typical viewer is watching a 4Mbps 720p H.264-encoded stream. Switching to HEVC would deliver little bandwidth savings because after making the switch, you would likely be delivering a 1080p 4Mbps stream. While the perceived quality of the stream would increase, the bandwidth would be the same.
On the other hand, I recently chatted with an OTT provider in Denmark who reported that 93 percent of streams delivered were the highest quality 1080p H.264 stream available, which was encoded at 8Mbps. In this case, bandwidth savings could be close to 50 percent because the service could deliver the same video quality using HEVC at about half the bandwidth.
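The two scenarios above can be turned into a rough estimate once you have the stream distribution from your logs. The following is a minimal Python sketch, not a billing model: the function name, the rung names, and the 7 percent of viewing assigned to a 720p rung are all hypothetical, and the HEVC bitrates simply encode the assumptions made in the text (the 8Mbps top rung halves, while 720p viewers move up to a 1080p stream at the same 4Mbps).

```python
def estimated_savings(stream_share, h264_kbps, hevc_kbps):
    """Return the fraction of delivered bandwidth saved by switching to HEVC.

    stream_share: rung name -> fraction of delivered streams (should sum to 1.0)
    h264_kbps:    rung name -> bitrate delivered today in H.264
    hevc_kbps:    rung name -> bitrate those same viewers would receive in HEVC
    """
    before = sum(share * h264_kbps[rung] for rung, share in stream_share.items())
    after = sum(share * hevc_kbps[rung] for rung, share in stream_share.items())
    return 1.0 - after / before


# Hypothetical distribution loosely modeled on the Danish example:
# 93 percent of streams at the 8Mbps 1080p top rung, the remainder at 720p.
share = {"1080p": 0.93, "720p": 0.07}
h264 = {"1080p": 8000, "720p": 4000}
# Assumption: the top rung halves in HEVC; the 720p viewers get a 1080p
# HEVC stream at the same 4Mbps, so that rung saves nothing.
hevc = {"1080p": 4000, "720p": 4000}

savings = estimated_savings(share, h264, hevc)
print(f"Estimated bandwidth savings: {savings:.1%}")
```

With real log data, the share and bitrate tables would have one entry per rung in your ladder, and the HEVC bitrates would come from your own test encodes rather than from a flat 50 percent assumption.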
Quality of Experience Benefits: Quality of Experience (QoE) benefits also depend on the streams that you’re currently delivering. In the case of the Danish OTT provider, QoE would vary very little since the perceived quality of the 1080p H.264 stream encoded at 8Mbps would be nearly identical to the 1080p HEVC stream encoded at 4Mbps (see Table 1).
Table 1. QoE improvements from using HEVC rather than H.264
On the other hand, if you’re currently delivering mid-ladder streams to your mobile customers, the QoE benefits could be quite substantial, as you can see in Table 1. Briefly, to complete the table, I produced optimal encoding ladders in both H.264 and HEVC for two test clips, Tears of Steel and Sintel. Then I computed the VMAF score for each clip, with the Delta column on the right showing the improvement from using HEVC instead of H.264.
By way of background, a VMAF difference of six points or more represents a just noticeable difference (JND), which means that 75 percent of viewers would notice the difference. By using higher-resolution streams for HEVC at 365-2000Kbps, the QoE benefits are quite substantial. On the other hand, as previously mentioned, when both codecs are displaying at 1080p, the QoE benefits are minimal. The bottom line is that while many vendors will tout that HEVC delivers bandwidth savings, QoE improvements, or both, mileage will vary by producer, and you’ll need to check your own logs to gauge the benefits of adding HEVC to the mix.
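If you run the same comparison on your own content, the six-point rule of thumb is easy to apply rung by rung. Here is a small Python sketch of that check. The scores in it are made up for illustration and are not the values from Table 1, and the function name is hypothetical.

```python
JND_THRESHOLD = 6.0  # VMAF points, per the rule of thumb described above


def vmaf_delta(h264_vmaf, hevc_vmaf, threshold=JND_THRESHOLD):
    """Return (delta, is_noticeable) for one rung of the ladder."""
    delta = hevc_vmaf - h264_vmaf
    return delta, delta >= threshold


# Hypothetical scores for two rungs: (label, H.264 VMAF, HEVC VMAF).
rungs = [
    ("mid-ladder rung", 78.0, 87.5),
    ("top 1080p rung", 95.0, 96.5),
]

for label, h264_score, hevc_score in rungs:
    delta, visible = vmaf_delta(h264_score, hevc_score)
    verdict = "likely noticeable" if visible else "probably not noticeable"
    print(f"{label}: delta {delta:+.1f} ({verdict})")
```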
High Dynamic Range (HDR): Although I won’t spend a lot of time on HDR in this article, it’s worth noting that the newest release of HLS does incorporate HDR. This simplifies the delivery of HDR video to all supported HLS end points.
4. What Does HEVC Support Cost?
There are multiple categories regarding HEVC support costs.
Encoding and Storage Costs: Obviously, you’ll have to encode your videos into HEVC format. If you’re encoding internally, you’ll have to calculate the cost of buying and maintaining additional encoding platforms, if needed. If encoding in the cloud, cost will vary by the number of rungs in your encoding ladder, as well as resolution and data rate. At high volumes, you should be able to achieve encoding costs of well under $20/hour for all rungs. You’ll have to continue to encode in H.264 format for other targets, so these costs will be on top of H.264. Ditto for storage at the origin server.
Royalty for PPV and Subscription Services: If you’re distributing subscription or PPV video, you may already be paying royalties for H.264 usage under the MPEG LA H.264 patent pool. For HEVC, there are three pools: MPEG LA, HEVC Advance, and Velos Media. This is shown in Figure 1, which is adapted from a presentation given by Divideon’s Jonatan Samuelsson at Streaming Tech Sweden in November 2017.
Figure 1. HEVC IP owners and patent pools
Of the three pools, MPEG LA’s license terms do not include content royalties, and HEVC Advance charges $0.015/month per subscriber for 2018–2019. Velos Media hasn’t announced any proposed royalty terms yet, but as of Nov. 28, 2017, the site’s Q&A stated, “As it relates to content, we will take our time to fully understand the dynamics of the ecosystem and ensure that our model best supports the advancement and adoption of HEVC technology.” So, content royalties may be on the table.
With respect to the companies on the bottom left who haven’t joined a pool, it’s impossible to say whether they plan to charge content-related royalties or not. If you’re looking for a reason that streaming producers haven’t jumped aboard the HEVC/HLS train, it could very well be the uncertainty regarding content-related royalties.
Player Development: If all of your playback is achieved in the iOS/MacOS browser, player development should be minimal, as the native HLS players in both should handle HEVC automatically. If you’re deploying apps for delivery, some development costs may be involved.
5. What Are the Controlling Documents I Should Get to Know?
There are two sources of documentation that you should be familiar with. The first is the HLS Authoring Specification for Apple Devices, which contains most of the specifications relating to HEVC usage. The second is the set of HLS examples provided by Apple at go2sm.com/hlsexamples, which fill in most of the details missing from the Authoring Specification. For example, the Authoring Specification states, “For backward compatibility some video content SHOULD be encoded with H.264.” Apple’s examples show exactly which HEVC and H.264 bitstreams Apple included in its HLS presentation, as we’ll share with you below.
6. I Know How to Encode with H.264. What Else Do I Need to Know to Produce HEVC?
If you understand H.264 encoding, you don’t need to know much more to produce HEVC. HEVC is a lot like H.264 and MPEG-2 before it, and most of what you know about data rates, keyframe intervals, bitrate control, and other common configuration options works very similarly. Like H.264, HEVC has different profiles, two of which are available for HLS: Main and Main 10. As the name suggests, Main 10 encodes in either 8-bit or 10-bit bit depths, while Main is 8-bit only. HLS can handle either, though you’ll need to produce in Main 10 format for HDR output. Note that the Authoring Specification has detailed rules for bitrate control for live and VOD streams that you should learn if you’re new to HLS encoding.
Most encoders will have some kind of trade-off between complexity and quality. For example, the x265 codec uses the same presets as x264 (ultrafast to placebo), while MainConcept uses multiple levels from 1 to 30. Once you get familiar with these controls for your codec/encoder, you should be in good shape.
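To make the similarity with H.264 concrete, here is a Python sketch that assembles an FFmpeg command line for a single HEVC rung using the x265 encoder. It only builds and prints the command and does not run FFmpeg. The function name, file names, two-second keyframe interval, and the 2x peak bitrate cap are all assumptions for illustration, and the rate control values in particular should be checked against the Authoring Specification rather than taken from this sketch. The hvc1 sample entry tag is included because Apple players generally expect it.

```python
import shlex


def hevc_rung_command(src, dst, width, height, kbps,
                      fps=30, preset="medium", ten_bit=False):
    """Build (but do not run) an FFmpeg/x265 command for one ladder rung."""
    gop = fps * 2  # assumption: keyframe every two seconds
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx265",
        "-preset", preset,                        # same preset names as x264
        "-profile:v", "main10" if ten_bit else "main",
        "-pix_fmt", "yuv420p10le" if ten_bit else "yuv420p",
        "-vf", f"scale={width}:{height}",
        "-b:v", f"{kbps}k",                       # target average bitrate
        "-maxrate", f"{kbps * 2}k",               # assumption: cap peaks at 2x
        "-bufsize", f"{kbps * 2}k",
        # Fixed keyframe interval with scene-cut keyframes disabled.
        "-x265-params", f"keyint={gop}:min-keyint={gop}:scenecut=0",
        "-tag:v", "hvc1",                         # sample entry expected by Apple players
        "-an",                                    # video only; audio handled separately
        dst,
    ]


cmd = hevc_rung_command("source.mp4", "hevc_1080p.mp4", 1920, 1080, 4500)
print(shlex.join(cmd))
```

Note that this produces an ordinary MP4 file per rung. Packaging into fragmented MP4 segments and playlists is a separate step.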
7. What Are the Requirements for HEVC?
The requirements fall into three rough classes:
HEVC Encoded Files: The HLS Authoring Specification states, “Profile, Level, and Tier for HEVC MUST be less than or equal to Main10 Profile, Level 5.0, High Tier.” Table 2, taken from the Wikipedia HEVC page, details the level and tier restrictions. Significantly, while you can encode 1080p video at frame rates as high as 128 frames per second, 4K resolutions are restricted to 30 fps or lower. Note that the HLS Authoring Specification prohibits frame rates beyond 60 fps for all codecs.
Table 2. Level and Tier restrictions for HEVC encoding
Another notable requirement from the Authoring Specification is that “The container format for HEVC video MUST be fMP4,” or fragmented MP4 files, which means that MPEG-2 transport streams are out. This should simplify delivering unencrypted HEVC encoded video to DASH and HLS clients since both should be able to deploy the same bitstreams. In the short term, differences between PlayReady and FairPlay encryption schemes may prevent interoperability of encrypted fMP4 content to DASH and HLS end points, though Microsoft has committed to resolving this for compatible hardware devices in 2018 with the release of PlayReady 4.0.
The HLS Authoring Specification contains two bitrate ladders, one for video files, the other for trick play files used for scrubbing and scanning. The video bitrate ladder is included as Table 3. Note that the suggested bitrate ladder indicates that the frame rate for 2K and 4K resolutions be the same as source, which is identical to all other resolutions down to 540p.
However, if you’re working with 60 fps 4K source, the aforementioned Level 5 limitation restricts you to 30 fps as shown in Table 2. Unfortunately, Apple hasn’t posted any HLS examples with 2K/4K videos, which might resolve this seeming inconsistency. Until it is resolved, I recommend the conservative route and restricting 2K and 4K HEVC videos to 30 fps.
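The 30 fps ceiling for 4K and the 128 fps ceiling for 1080p both fall out of the Level 5 limits on luma samples. The Python sketch below shows the arithmetic. The two constants are the Level 5 limits as I understand them from the published HEVC level table and should be confirmed against Table 2. The check is deliberately simplified: it ignores padding of the picture to coding block boundaries and the separate limits on bitrate and picture dimensions, and the function name is hypothetical.

```python
# HEVC Level 5 picture limits (assumed values; confirm against Table 2).
MAX_LUMA_PICTURE_SIZE = 8_912_896    # luma samples per picture
MAX_LUMA_SAMPLE_RATE = 267_386_880   # luma samples per second


def fits_level_5(width, height, fps):
    """Simplified check of resolution and frame rate against Level 5."""
    luma = width * height
    return (luma <= MAX_LUMA_PICTURE_SIZE
            and luma * fps <= MAX_LUMA_SAMPLE_RATE)


for width, height, fps in [(3840, 2160, 30), (3840, 2160, 60),
                           (1920, 1080, 60), (1920, 1080, 128)]:
    verdict = "fits" if fits_level_5(width, height, fps) else "exceeds"
    print(f"{width}x{height} at {fps} fps {verdict} Level 5")
```

A 3840x2160 picture has 8,294,400 luma samples, so 30 fps comes to about 249 million samples per second, under the limit, while 60 fps comes to about 498 million, well over it.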
H.264 Encoded Files: As mentioned above, the Authoring Specification requires that some videos should be encoded with H.264, but provides no further guidance. So we looked at the mixed HEVC/H.264 ladder on the Apple developer site, and saw that Apple provided completely separate encoding ladders for HEVC and H.264, nine rungs each, just as specified in Table 3, though the highest resolution supported in either format was 1080p. Looking at the master M3U8 manifest file, the player selects the codec first, then the appropriate rung (note that the Apple playlist calls the rungs “gears”).
Table 3. Apple’s suggested encoding ladder for H.264, HEVC, and HDR
This is interesting because before Apple provided its example, there were multiple theories for the optimal composition of an HEVC/H.264 ladder, including a ladder that provided H.264 for lower quality rungs and HEVC for the higher resolution rungs. At the session, several attendees and the two producers from RealEyes suggested that it would be tough for any software-based player to smoothly switch between H.264 and HEVC playback, which tends to support the Apple approach. The obvious downside is that it doubles your encoding costs and substantially increases storage costs.
At least until proven otherwise, I would recommend adopting Apple’s approach. You should also download the Master M3U8 file and mine this for other encoding and presentation details.
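To show what separate ladders look like in practice, here is a Python sketch that writes a small master playlist containing parallel H.264 and HEVC variants. It is not a copy of Apple’s example: the bitrates, URIs, and two-rung ladders are hypothetical, and the CODECS strings are illustrative values that must be replaced with strings matching your actual encoded streams. Each variant advertises its codec so that a player can filter out the codec it can’t decode before choosing a rung.

```python
def stream_inf(bandwidth, average, codecs, width, height, fps, uri):
    """Format one variant stream entry: the tag line followed by its URI."""
    attrs = ",".join([
        f"BANDWIDTH={bandwidth}",
        f"AVERAGE-BANDWIDTH={average}",
        f'CODECS="{codecs}"',
        f"RESOLUTION={width}x{height}",
        f"FRAME-RATE={fps:.3f}",
    ])
    return f"#EXT-X-STREAM-INF:{attrs}\n{uri}"


def master_playlist(variants):
    """Assemble a master playlist from a list of variant tuples."""
    lines = ["#EXTM3U", "#EXT-X-INDEPENDENT-SEGMENTS"]
    lines += [stream_inf(*variant) for variant in variants]
    return "\n".join(lines) + "\n"


# Illustrative codec strings (assumptions): H.264 High profile with AAC
# audio, and HEVC Main 10 with AAC audio.
H264 = "avc1.640028,mp4a.40.2"
HEVC = "hvc1.2.4.L123.B0,mp4a.40.2"

# Hypothetical two-rung ladders; a real presentation would list every rung.
variants = [
    (2_200_000, 2_000_000, H264, 960, 540, 30, "h264/540p/prog_index.m3u8"),
    (6_600_000, 6_000_000, H264, 1920, 1080, 30, "h264/1080p/prog_index.m3u8"),
    (1_300_000, 1_200_000, HEVC, 960, 540, 30, "hevc/540p/prog_index.m3u8"),
    (3_700_000, 3_400_000, HEVC, 1920, 1080, 30, "hevc/1080p/prog_index.m3u8"),
]

playlist = master_playlist(variants)
print(playlist)
```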
I-Frame/Trick Play Support: Apple added trick play support for fast forward and reverse playback, either in the video playback window or as thumbnails, in iOS 5, and detailed how to create I-frame playlists to support this feature in Apple Technical Note TN2288. In TN2288, Apple states, “you don’t need to produce special purpose content to support Fast Forward and Reverse Playback. All you need to do is specify where the I-frames are. I-frames, or Intra frames, are encoded video frames whose encoding does not depend on any other frame. To specify where the I-frames are, iOS 5 introduces a new I-frame only playlist.” According to TN2288, you don’t need to create a separate encoded file for trick play support, just a playlist that points to I-frames in existing content files.
In the HLS Authoring Specification, Apple modified this recommendation, stating, “You SHOULD have one frame per second ‘dense’ I-frame renditions. These are dedicated renditions that only contain I-frames. Alternatively, you MAY use the I-frames from your normal content, but trick play performance is improved with a higher density of I-frames.”
The spec also states, “If you provide multiple bit rates at the same spatial resolution for your regular video then you SHOULD create the I-frame playlist for that resolution from the same source used for the lowest bit rate in that group.” As further guidance, Apple provides the suggested encoding ladder shown in Table 4. As you would expect, the Apple sample presentation implemented these recommendations to the letter, with separate I-frame encoded files for both H.264 and HEVC at all suggested resolutions.
Table 4. The suggested trick play encoding ladder from the HLS Authoring Specification
By my count, between H.264 and HEVC content and I-frame-only files, Apple encoded the source video to 28 separate files, which may strain the budgets of some producers. This is particularly true for 4K producers, since Apple’s ladder didn’t include 2K/4K iterations, which are the most expensive to encode, and would have swelled total encoding requirements to 31 files, with potentially 17 more required for HDR.
During the session, these requirements generated significant discussions among the attendees, many of whom had been producing HLS presentations for years. Most stated that they provided one or two trick play files, with few providing at all resolutions, and most pointing to the I-frames in existing files rather than encoding separate, I-frame-only files. Producers will have to make their own cost/benefit analysis to decide upon the optimal approach for them.
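Whichever approach you choose, the trick play renditions are advertised in the master playlist with an I-frame variant tag that carries its URI as an attribute rather than on a following line. The Python sketch below formats such an entry. The function name, bitrate, codec string, and URI are hypothetical, and the I-frame playlist that the URI points to (whether it indexes a dedicated dense I-frame encode or the I-frames in an existing rendition) still has to be produced by your packager.

```python
def iframe_stream_inf(bandwidth, codecs, width, height, uri):
    """Format one I-frame (trick play) variant entry for a master playlist."""
    attrs = ",".join([
        f"BANDWIDTH={bandwidth}",
        f'CODECS="{codecs}"',
        f"RESOLUTION={width}x{height}",
        f'URI="{uri}"',  # the URI is an attribute, not a separate line
    ])
    return f"#EXT-X-I-FRAME-STREAM-INF:{attrs}"


# Hypothetical HEVC trick play rendition at 540p.
line = iframe_stream_inf(250_000, "hvc1.2.4.L123.B0", 960, 540,
                         "hevc/iframe_540p/iframe_index.m3u8")
print(line)
```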
8. Should I Use Apple’s Suggestions Verbatim?
Sometime during the last revision or two of the Authoring Specification, Apple addressed per-title encoding implementations, stating that “The above bitrates are initial encoding targets for typical content delivered via HLS. We recommend you evaluate them against your specific content and encoding workflow then adjust accordingly.” So Apple isn’t dictating a fixed encoding ladder.
Beyond data rates, if you study Apple’s ladder, you’ll note that it uses essentially the same resolutions for HEVC and H.264 for all rungs below 2K. At the preconference session, one of the more technically savvy attendees suggested that Apple’s ladder should have completely different rungs for HEVC to account for the codec’s greater efficiency with high-resolution videos. This led to the analysis presented in an article entitled, “Apple Got It Wrong: Encoding Specs for HEVC in HLS.”
Long story short, the article proposes that the optimal ladder for HEVC would eliminate several lower resolution rungs, and push higher resolution rungs lower in the ladder. This is shown in Table 5, which shows Apple’s suggested ladder on the left and a more optimal ladder on the right (customized for the animated movie Sintel), along with VMAF scores rating the quality of both alternatives. For optimal QoE, you’ll get better results with the Should Be ladder, rather than the Was ladder designated by Apple.
Table 5. Apple’s HEVC encoding ladder on the left, proposed encoding ladder on the right
9. What Are My Live Options?
Live options are nascent but rapidly becoming available, and the presentation handout lists encoders from Bitmovin, Elemental, Harmonic, and Hybrik, as well as transcoding solutions from Wowza and Nimble Streamer. For developer-level producers, MulticoreWare, MainConcept, and Beamr all have SDKs, and the handout details how to produce output using FFmpeg and Bento4.
10. What Does the Spec Say About High Dynamic Range (HDR)?
The Authoring Specification states that HDR video must be encoded as either HDR10 or Dolby Vision, and that HDR encoded streams should be provided at all resolutions. If you provide HDR content, you should also provide SDR content for the main video files and trick play files, as well as H.264 content, boosting the stream count to potentially dozens of individual files.
Note that Apple doesn’t yet provide an example file with HDR, leaving several questions unanswered, such as whether the required H.264 content can also serve as the SDR content, or whether producers should also supply separate HEVC-encoded SDR streams (and trick play files). I’m guessing that Apple will always supply the most expansive (and expensive) way to meet the requirements stated in the Authoring Specification, leaving developers to choose their own configuration based on cost and the desired QoE.
It’s early days for HEVC in HLS, and the topic and technology will be fast moving. Hopefully, these questions and answers have helped you get off to a quick start.