Author’s note: This article was updated to include subjective comparisons of the 30p and 60p clips.
TN2224 has long been considered the Rosetta Stone of ABR encoding (image courtesy of Beamr).
Apple TN2224 was originally posted in March 2010 to provide direction for streaming producers encoding for delivery to iOS devices via HTTP Live Streaming (HLS). Because the document was so comprehensive and well thought out, and HLS became so successful, TN2224 has often been thought of as the Rosetta Stone of adaptive bitrate streaming.
Over the last nine months or so, Apple has made sweeping changes to the venerable Tech Note, including (gasp!) deprecating the document in favor of another document called the HLS Authoring Specification for Apple Devices. In this post, I’ll present an overview of those changes.
I’ll start with how the documents interrelate. I first reported on the Apple Devices spec back in March 2016, when it was called the Apple Specification for Apple TV. Later, Apple amended that title to Apple Devices and expanded the scope of the document to include all HLS production, but didn’t retire TN2224. Instead, Apple added a note to TN2224 deferring to the newer document.
In other words, where the two documents conflict, the Apple Devices spec is the controlling document.
In truth, some of these changes appeared in TN2224 before Apple switched over to the Devices spec, but if you haven’t checked TN2224 in a while, here are the major overall changes.
1. Completely new encoding ladder. The two documents present different ladders; since the Apple Devices spec controls, here’s the one from that document.
The recommended encoding ladder from the Apple Devices spec.
Here are some interesting observations about the two encoding ladders.
a. Up to 200% constrained VBR. TN2224 preserves the 110% constrained VBR requirement, while the Apple Devices spec says “1.19. For VOD content the peak bit rate SHOULD be no more than 200% of the average bit rate.” Since the latter document controls, you’re free to use up to 200% constrained VBR, though the wording doesn’t require you to. Deliverability concerns suggest that 110% constrained VBR may still deliver a better quality of experience.
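To make the percentages concrete, here’s a minimal Python sketch that turns an average bitrate and a peak cap into rate-control settings for one ladder rung. The function name is hypothetical; the FFmpeg options it emits (-b:v, -maxrate, -bufsize) are real, but sizing the buffer to one second at the peak rate is an assumption of this sketch, not something either Apple document specifies.

```python
def constrained_vbr_args(avg_kbps, peak_percent):
    """Return FFmpeg rate-control options for one ladder rung.

    peak_percent is the peak bitrate as a percentage of the average:
    110 mirrors TN2224's 110% constrained VBR; 200 is the ceiling in
    the Apple Devices spec (item 1.19, VOD content).
    """
    if not 100 <= peak_percent <= 200:
        raise ValueError("peak must be between 100% and 200% of the average")
    peak_kbps = avg_kbps * peak_percent // 100
    # Assumption: a one-second VBV buffer at the peak rate.
    buf_kbits = peak_kbps
    return ["-b:v", f"{avg_kbps}k", "-maxrate", f"{peak_kbps}k",
            "-bufsize", f"{buf_kbits}k"]


print(constrained_vbr_args(2000, 110))
# ['-b:v', '2000k', '-maxrate', '2200k', '-bufsize', '2200k']
print(constrained_vbr_args(2000, 200))
# ['-b:v', '2000k', '-maxrate', '4000k', '-bufsize', '4000k']
```

The only difference between the two approaches is the -maxrate value; a larger buffer would let the encoder hold peaks longer, which is exactly the deliverability trade-off at issue.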
b. Ignore Baseline and Main profiles. Both documents recommend the High profile. TN2224 says “You should also expect that all devices will be able to play content encoded using High Profile Level 4.1.” The Apple Devices spec says, “1.2. Profile and Level MUST be less than or equal to High Profile, Level 4.2. 1.3. You SHOULD use High Profile in preference to Main or Baseline Profile.” Apple TN2224 provides a table showing which devices the new recommendation obsoletes, essentially iPhones and iPod touches from before 2013. Apple is pretty quick to ignore older products; I would check your server logs to determine how many of these older devices are still consuming your content before obsoleting them.
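If you want to audit an existing ladder against the High Profile, Level 4.2 ceiling, the CODECS strings in your playlists already carry profile and level. The Python sketch below assumes RFC 6381-style avc1.PPCCLL strings, where PP is the hex profile_idc (66 Baseline, 77 Main, 100 High) and LL is the hex level_idc (42 for Level 4.2). The function names are hypothetical, and the sketch deliberately ignores edge cases such as Level 1b.

```python
PROFILES = {66: "Baseline", 77: "Main", 100: "High"}
MAX_LEVEL_IDC = 42  # Level 4.2 is signaled as level_idc 42


def parse_avc1(codec):
    """Split an 'avc1.PPCCLL' codec string into (profile name, level)."""
    prefix, _, hex_part = codec.partition(".")
    if prefix not in ("avc1", "avc3") or len(hex_part) != 6:
        raise ValueError(f"not an H.264 codec string: {codec!r}")
    profile_idc = int(hex_part[0:2], 16)
    level_idc = int(hex_part[4:6], 16)
    return PROFILES.get(profile_idc, f"profile_idc {profile_idc}"), level_idc / 10


def within_devices_spec(codec):
    """True if the stream is Baseline, Main, or High at Level 4.2 or lower."""
    profile, level = parse_avc1(codec)
    return profile in PROFILES.values() and level <= MAX_LEVEL_IDC / 10


for c in ("avc1.42e01e", "avc1.4d401f", "avc1.640028", "avc1.640032"):
    print(c, parse_avc1(c), within_devices_spec(c))
```

Any profile outside the three listed (High 10, for example) is reported as failing, which errs on the side of flagging streams for a closer look.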
c. New keyframe interval/segment duration. Both documents agree on keyframes every two seconds and six-second segments. TN2224 (kindly) states, “Note: We used to recommend a ten second target duration. We aren’t expecting you to suddenly re-segment all your content. But we do believe that, going forward, six seconds makes for a better tradeoff.”
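The two-second/six-second pairing means every segment holds exactly three GOPs. The Python sketch below works out the GOP length in frames for a given frame rate and emits matching FFmpeg options; the helper names are hypothetical, FFmpeg is just one encoder you might use, and rounding 59.94 frames up to 60 at 29.97 fps (a 2.002-second GOP) is an assumption about how you’d handle NTSC rates.

```python
from fractions import Fraction


def keyframe_plan(fps, keyframe_s=2, segment_s=6):
    """Frames per GOP and GOPs per segment for 2 s keyframes, 6 s segments."""
    if segment_s % keyframe_s:
        raise ValueError("each segment should hold a whole number of GOPs")
    # Fraction keeps NTSC rates exact: 30000/1001 fps * 2 s = 59.94 -> 60.
    gop_frames = round(Fraction(fps) * keyframe_s)
    return gop_frames, segment_s // keyframe_s


def x264_hls_args(fps, keyframe_s=2, segment_s=6):
    """FFmpeg options for fixed GOPs and fixed-length HLS segments."""
    gop_frames, _ = keyframe_plan(fps, keyframe_s, segment_s)
    return ["-g", str(gop_frames), "-keyint_min", str(gop_frames),
            "-sc_threshold", "0",  # no extra keyframes at scene changes
            "-f", "hls", "-hls_time", str(segment_s)]


print(keyframe_plan(Fraction(30000, 1001)))  # (60, 3)
print(keyframe_plan(60))                      # (120, 3)
```

Disabling scene-change keyframes keeps GOP boundaries on a fixed grid, so segment boundaries line up across every rung of the ladder.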
d. Byte-range addressing OK. TN2224 says, “In practice you should expect that all devices will support HLS version 4 and most will support the latest version.” The Apple Devices spec doesn’t address this issue. Note that one key feature of HLS version 4 is the ability to address media as byte ranges within a single file rather than as discrete segment files, which minimizes the administrative hassle of creating and distributing HLS streams.
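Here’s what byte-range addressing looks like in a media playlist. The Python sketch below builds an HLS version 4 VOD playlist that points every segment at one file via EXT-X-BYTERANGE (length@offset); the file name and segment sizes are hypothetical, and in practice your packager computes the ranges for you.

```python
def byterange_playlist(media_file, segments):
    """Build an HLS version 4 VOD media playlist over one media file.

    segments is a list of (duration_s, size_bytes) tuples in file order;
    each segment is addressed as a byte range within media_file.
    """
    # EXT-X-TARGETDURATION must be >= every EXTINF duration rounded to
    # the nearest integer.
    target = max(int(d + 0.5) for d, _ in segments)
    lines = ["#EXTM3U", "#EXT-X-VERSION:4",
             f"#EXT-X-TARGETDURATION:{target}", "#EXT-X-PLAYLIST-TYPE:VOD"]
    offset = 0
    for duration, size in segments:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(f"#EXT-X-BYTERANGE:{size}@{offset}")  # length@start
        lines.append(media_file)
        offset += size
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


print(byterange_playlist("main.ts", [(6.0, 752_000), (6.0, 731_500), (3.2, 402_000)]))
```

If you package with FFmpeg, its HLS muxer can produce this style of playlist with -hls_flags single_file.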
e. 2000 kbps variant first for Wi-Fi, 730 kbps for cellular. The first compatible variant listed in the master playlist is the first one the HLS client plays, and this can have a dramatic impact on the initial quality of experience (see tests at the bottom of this post). In this regard, the Apple Devices spec makes two recommendations in its iOS section (rather than its general section), stating, “1.21.a. For WiFi delivery, the default video variant(s) SHOULD be the 2000 kb/s variant. 1.21.b. For cellular delivery, the default video variant(s) SHOULD be the 730 kb/s variant.”
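Since the client starts with the first listed variant, choosing the default is just a matter of ordering. The Python sketch below writes a master playlist with a chosen default moved to the top; it assumes you generate one master playlist per delivery context (how you decide between Wi-Fi and cellular is outside the sketch), and the ladder, URIs, and CODECS values are hypothetical.

```python
def master_playlist(variants, default_avg_kbps):
    """Write a master playlist that lists the chosen default variant first."""
    if not any(v["avg_kbps"] == default_avg_kbps for v in variants):
        raise ValueError(f"no {default_avg_kbps} kb/s variant in the ladder")
    # sorted() is stable: False sorts before True, so the default moves to
    # the top and every other variant keeps its original order.
    ordered = sorted(variants, key=lambda v: v["avg_kbps"] != default_avg_kbps)
    lines = ["#EXTM3U"]
    for v in ordered:
        lines.append(
            f'#EXT-X-STREAM-INF:BANDWIDTH={v["peak_kbps"] * 1000},'
            f'AVERAGE-BANDWIDTH={v["avg_kbps"] * 1000},'
            f'RESOLUTION={v["resolution"]},CODECS="{v["codecs"]}"'
        )
        lines.append(v["uri"])
    return "\n".join(lines) + "\n"


# Hypothetical three-rung ladder with 110% peaks.
LADDER = [
    {"avg_kbps": 730, "peak_kbps": 803, "resolution": "768x432",
     "codecs": "avc1.64001e,mp4a.40.2", "uri": "730/prog.m3u8"},
    {"avg_kbps": 2000, "peak_kbps": 2200, "resolution": "960x540",
     "codecs": "avc1.64001f,mp4a.40.2", "uri": "2000/prog.m3u8"},
    {"avg_kbps": 4500, "peak_kbps": 4950, "resolution": "1280x720",
     "codecs": "avc1.64001f,mp4a.40.2", "uri": "4500/prog.m3u8"},
]

wifi_master = master_playlist(LADDER, 2000)     # 2000 kb/s listed first
cellular_master = master_playlist(LADDER, 730)  # 730 kb/s listed first
print(wifi_master)
```

BANDWIDTH carries the peak rate and AVERAGE-BANDWIDTH the average, so the same ladder data feeds both attributes.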
30i to 60p?
One of the more interesting suggestions in the Apple Devices spec is “1.7. You SHOULD de-interlace 30i content to 60p instead of 30p.” From then on, 60 fps is considered the source frame rate, which the encoding ladder directs you to use starting with the 2000 kbps variant (the Apple Devices ladder shown above notes that “30i source content is considered to have a source frame rate of 60 fps”).
Obviously, Apple is promoting smoothness at the potential cost of spatial detail and encoding quality. At least, that’s what I thought in my original post, where I commented that converting to 60p was a bad idea because spatial detail and encoding quality would both suffer. Rather than simply reiterate this concern in this post, I decided to run a quick test on a DV file I’ve used for about 15 years to test deinterlacing and scaling quality.
To do so, I created 30p and 60p sequences in Adobe Premiere Pro, imported the interlaced DV file, and produced deinterlaced 640×480 masters, both at 14 Mbps. Then I imported the two into a 1280×480 timeline in Premiere for side-by-side display. As you can see below, the detail preserved by both approaches is nearly identical.
How can that be? I had assumed that when converting from 30i to 60p, Premiere would simply turn each field into a frame by scaling it up and interpolating the missing lines. Instead, it appears that Premiere takes a much smarter approach, deinterlacing each pair of sequential fields into a frame (fields 1 and 2 = frame 1; fields 2 and 3 = frame 2). So I’m not seeing much loss of detail, if any.
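A toy model may help show why pairing fields preserves detail. The Python sketch below treats each frame as a list of rows, splits interlaced frames into fields (assuming top field first for simplicity; field order varies by format), and weaves each pair of consecutive fields into a full-height frame, following the fields 1+2, fields 2+3 pattern described above. It only illustrates that pattern: Premiere’s actual deinterlacer isn’t documented here, and any real deinterlacer must also handle motion, because two fields captured 1/60th of a second apart show combing on moving objects if simply woven together.

```python
def split_fields(frames):
    """Split interlaced frames (lists of rows) into a field sequence.

    Assumes top field first; each field keeps every other row.
    """
    fields = []
    for frame in frames:
        fields.append(("top", frame[0::2]))
        fields.append(("bottom", frame[1::2]))
    return fields


def weave(top_rows, bottom_rows):
    """Interleave a top and bottom field back into one full-height frame."""
    frame = []
    for top, bottom in zip(top_rows, bottom_rows):
        frame.extend([top, bottom])
    return frame


def sliding_weave(fields):
    """Weave fields 1+2, 2+3, 3+4, ... so n fields yield n - 1 frames."""
    frames = []
    for (parity, rows), (_, next_rows) in zip(fields, fields[1:]):
        # Consecutive fields alternate parity, so one is always the top field.
        top, bottom = (rows, next_rows) if parity == "top" else (next_rows, rows)
        frames.append(weave(top, bottom))
    return frames


interlaced = [["a0", "a1", "a2", "a3"], ["b0", "b1", "b2", "b3"]]
for frame in sliding_weave(split_fields(interlaced)):
    print(frame)
# ['a0', 'a1', 'a2', 'a3']
# ['b0', 'a1', 'b2', 'a3']
# ['b0', 'b1', 'b2', 'b3']
```

Every output frame carries all four source rows, whereas scaling up single fields would leave only two real rows per frame, and the frame count approaches double the source as clips get longer.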
What about encoding quality? Since the 60p clip has twice as many frames, at any given data rate it also has half the bits-per-pixel value, which I assumed would drop quality noticeably. To test this, I encoded each file to 1 Mbps and then computed PSNR on the two files. Of course, I compared the 30p encoded clip to the 30p master and the 60p encoded clip to the 60p master, so it’s not a complete apples-to-apples comparison. PSNR on the 30p clip was 35.56 dB compared to 34.57 dB for the 60p clip, a difference of about 1 dB (roughly 2.8%) that likely wouldn’t be noticeable to most viewers (you can see for yourself below).
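The arithmetic behind that expectation is easy to check. The Python sketch below computes bits per pixel for the 640×480 masters encoded at 1 Mbps and shows the standard PSNR formula for 8-bit samples; the function names are illustrative, and a real comparison would run a frame-by-frame tool (FFmpeg’s psnr filter, for example) rather than this toy function.

```python
import math


def bits_per_pixel(bitrate_bps, width, height, fps):
    """Average bits available per pixel per frame."""
    return bitrate_bps / (width * height * fps)


def psnr(reference, encoded, max_value=255):
    """PSNR in dB between two equal-length sequences of 8-bit samples."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, encoded)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)


bpp_30 = bits_per_pixel(1_000_000, 640, 480, 30)  # about 0.109
bpp_60 = bits_per_pixel(1_000_000, 640, 480, 60)  # about 0.054, exactly half
gap_db = 35.56 - 34.57                            # the two PSNR scores above
print(f"{bpp_30:.3f} {bpp_60:.3f} {gap_db:.2f} dB {gap_db / 35.56:.1%}")
# prints: 0.109 0.054 0.99 dB 2.8%
```

Halving bits per pixel cost about 1 dB here, which is consistent with motion-compensated codecs exploiting the stronger frame-to-frame similarity that higher frame rates provide.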
That said, with some frames like this one, the difference was very easy to notice.
One of the comments to my original post was from Alex Zambelli, who was originally on the Microsoft Windows Media team, and has since gone on to fame and fortune with iStreamPlanet and Hulu. He commented,
“So on one hand I support Apple’s insistence on preserving the full temporal resolution of programs recorded at 60 or 50 Hz in order to provide a full TV-like experience, but I disagree with their suggestion that the jump from half-framerate to full-framerate should occur at 2 Mbps. Doubling the framerate while keeping the same bitrate hurts spatial picture quality. I would’ve preferred to see either a separate encoding profile for 25/30 vs 50/60 fps, or the full framerate variant introduced only at 7.8 Mbps.”
This test tends to support Alex’s view. Depending on the source footage and the variant you’re encoding, your viewers could see some ugly frames in your videos, albeit for just 1/60th of a second. What about smoothness? Again, see for yourself below, where the 30p file is presented first, then the 60p. When I stepped through the files frame-by-frame from my hard drive, the difference was profound. In real time, not so much, at least for me.
Note that Apple says “should” for this conversion, not “must.” So run your own tests and see what you see. If you think this approach delivers better smoothness with minimal noticeable quality loss, give it a shot. As Alex suggests, however, start with the highest-quality stream and work your way down; the 2000 kbps variant might be too lean to avoid obvious encoding artifacts.
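If you want to compare Apple’s threshold with Alex’s, the frame-rate decision reduces to a single threshold per ladder. The Python sketch below is hypothetical: it assumes a 60p source (such as 30i deinterlaced to 60p) and that rungs below the threshold drop to half rate, which is one simple way to stay at or under 30 fps on the lower rungs. Only the threshold changes between the two approaches.

```python
def rung_fps(avg_kbps, source_fps, full_rate_from_kbps):
    """Frame rate for one ladder rung.

    Rungs below the threshold get half the source rate; rungs at or above
    it keep the full rate. Sources at 30 fps or less are never halved.
    """
    if source_fps > 30 and avg_kbps < full_rate_from_kbps:
        return source_fps / 2
    return source_fps


rungs = [730, 2000, 7800]  # kb/s, from the figures quoted in this post
apple = [rung_fps(r, 60, 2000) for r in rungs]  # [30.0, 60, 60]
alex = [rung_fps(r, 60, 7800) for r in rungs]   # [30.0, 30.0, 60]
print(apple, alex)
```

Testing both versions of the ladder against the same source is a quick way to see whether your content can tolerate 60p at 2000 kbps.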
The other comment in my original reaction to the Apple Devices spec, that fixed encoding ladders are on the way out, remains correct. Most high-end producers, like Netflix and YouTube, have already switched to per-title encoding, and this trend will only accelerate in 2017 and beyond. In any event, I would treat the bitrates in the encoding ladder as suggestions, not requirements.
Here are the video clips.
Beyond the issues discussed above, Apple also completely revamped the audio requirements in the Apple Devices spec, adding HE-AAC v2 plus Dolby Digital (AC-3) and Dolby Digital Plus (E-AC-3). Apple provides much more guidance regarding captions, subtitles, and advertising, as well as master and media playlists, even beyond that provided in the HLS specification itself. For example, where the latest draft of the HLS specification recommends including the RESOLUTION and CODECS attributes in the master playlist, the Devices spec makes them mandatory.
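Because the Devices spec turns RESOLUTION and CODECS into requirements, it’s worth checking existing master playlists for them. The Python sketch below is a hypothetical lint check, not an Apple tool (Apple’s mediastreamvalidator does far more thorough checking); it inspects only EXT-X-STREAM-INF lines and uses a simplified attribute grammar.

```python
import re

# NAME=value pairs; quoted values may contain commas (e.g. CODECS).
ATTRIBUTE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')
STREAM_INF = "#EXT-X-STREAM-INF:"


def missing_attributes(master_text, required=("RESOLUTION", "CODECS")):
    """Return (line number, missing names) for each incomplete STREAM-INF tag."""
    problems = []
    for lineno, line in enumerate(master_text.splitlines(), start=1):
        if line.startswith(STREAM_INF):
            names = {name for name, _ in ATTRIBUTE.findall(line[len(STREAM_INF):])}
            missing = [name for name in required if name not in names]
            if missing:
                problems.append((lineno, missing))
    return problems


sample = (
    "#EXTM3U\n"
    '#EXT-X-STREAM-INF:BANDWIDTH=2200000,RESOLUTION=960x540,CODECS="avc1.64001f,mp4a.40.2"\n'
    "2000/prog.m3u8\n"
    "#EXT-X-STREAM-INF:BANDWIDTH=803000\n"
    "730/prog.m3u8\n"
)
print(missing_attributes(sample))  # [(4, ['RESOLUTION', 'CODECS'])]
```

Matching quoted values as a unit keeps the comma inside the CODECS list from being mistaken for an attribute separator.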
Apple also provides direction for those moving to the Common Media Application Format (CMAF) fragmented MP4 format. The bottom line is that if you’re an HLS publisher and you haven’t scrutinized the Apple Devices spec, you should do so ASAP.
About the Streaming Learning Center:
The Streaming Learning Center is a premier resource for companies seeking advice or training on cloud or on-premises encoder selection, preset creation, video quality optimization, adaptive bitrate distribution, and associated topics. Please contact Jan Ozer at firstname.lastname@example.org for more information about these services.