Many companies spend too much on adaptive bitrate encoding. It turns out there’s a pricey way to go about it and a cheaper way. Dynamic packaging to the rescue!
Let me throw a couple of numbers at you. The first is shown in Figure 1, from Encoding.com’s “Global Media Format Report 2016,” which shows the respective share of the adaptive bitrate (ABR) formats produced by the cloud encoding service in 2015. As you can see in Figure 1, HTTP Live Streaming (HLS) dominates with 71 percent, Smooth Streaming is second with 19 percent, and Dynamic Adaptive Streaming over HTTP (DASH) is third with 10 percent.
I hear you thinking, “Smooth Streaming is second? All I hear about is DASH, DASH, DASH (and more DASH).” Yes, Smooth Streaming is second, which leads to the second number. That is, as recently as 2014, the percentage of viewers watching Netflix on gaming consoles like Microsoft Xbox, Sony PlayStation, or the Nintendo Wii totaled 43 percent. Though this number has clearly dropped since then, Encoding.com’s report attributes the Smooth Streaming number to the Xbox One, “which has evolved beyond a game console to become a leading means of delivering content, including OTT subscriptions.”
What does this mean? If you’re distributing premium content, you probably should support two if not three ABR technologies. And even if you’re distributing free content for training, sales, marketing, or other purposes, you’ll probably also have to support multiple ABR formats sometime in the future. As you’ll read in this article, there’s an expensive way to support multiple ABR formats, and an inexpensive way. It’s your money; give me a few minutes and I’ll show you how to save a bundle.
Figure 1. Share of ABR formats produced by Encoding.com in 2015
Let’s start with everyone on the same page. All videos, live and VOD, start with a single file. To deliver this file adaptively, we create multiple renditions of that file at various resolutions and data rates suitable for delivery to a range of playback devices. This group of files is often referred to as an encoding ladder.
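To make the idea concrete, here is a minimal Python sketch of an encoding ladder and of the choice a player makes among its rungs. The resolutions and data rates are illustrative assumptions, not recommendations from any vendor, and the names Rendition, LADDER, and pick_rendition are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    """One rung of the ladder: a resolution and a target video data rate."""
    width: int
    height: int
    video_kbps: int


# Hypothetical ladder; the values are illustrative only.
LADDER = [
    Rendition(1920, 1080, 4500),
    Rendition(1280, 720, 2500),
    Rendition(854, 480, 1200),
    Rendition(640, 360, 700),
]


def pick_rendition(ladder, available_kbps):
    """Return the highest-rate rung that fits the bandwidth, else the lowest rung."""
    fitting = [r for r in ladder if r.video_kbps <= available_kbps]
    if fitting:
        return max(fitting, key=lambda r: r.video_kbps)
    return min(ladder, key=lambda r: r.video_kbps)


if __name__ == "__main__":
    print(pick_rendition(LADDER, 3000))
```

Real players use more elaborate heuristics (buffer level, screen size, and so on); the point here is only that the ladder is a small set of renditions from which one is selected at a time.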
All HTTP-based adaptive bitrate technologies like HLS, DASH, and Smooth Streaming use video stored in specific container formats with manifest files that identify the locations of those files for the player. HLS uses files stored in the MPEG-2 transport stream format (TS files) along with M3U8 manifest files. Smooth Streaming uses fragmented MP4 files stored in the ISMV format with ISM manifest files. DASH uses single or multiple MP4 files stored in the MP4 format with MPD manifest files. Completing the picture, Adobe’s HTTP Dynamic Streaming (HDS) uses fragmented MP4 files stored in the F4F format, with F4M manifest files.
Note that when HTTP-based adaptive streaming originated, each file in the encoding ladder had to be broken into separate chunks, usually 4 to 10 seconds long, which created an administrative and storage nightmare. Since then, all ABR technologies have incorporated the ability to work with a single source video file in the correct ABR format, retrieving segments from that file via byte-range requests from the player. Instead of retrieving a specific chunk, the player retrieves a specific segment from the single file, simplifying file creation and distribution.
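For readers who haven’t seen a byte-range request, the sketch below shows how a player might turn a segment’s offset and size into a standard HTTP Range header. The segment index is a made-up example, and the function names are hypothetical; in practice the offsets come from the manifest or from an index inside the media file.

```python
def range_header(start, length):
    """Build an HTTP Range header for `length` bytes beginning at byte `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    end = start + length - 1  # HTTP byte ranges are inclusive at both ends
    return {"Range": f"bytes={start}-{end}"}


# Hypothetical segment index for one rendition: (byte offset, size in bytes).
SEGMENT_INDEX = [(0, 1200), (1200, 50000), (51200, 48000)]


def header_for_segment(index, n):
    """Return the Range header that fetches segment `n` from the single file."""
    offset, size = index[n]
    return range_header(offset, size)


if __name__ == "__main__":
    print(header_for_segment(SEGMENT_INDEX, 1))
```

The server answers such a request with only the requested bytes, so one file per rendition can stand in for hundreds of separate chunk files.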
The important thing to recognize is that you can use a single set of encoded H.264 files as the source for all ABR formats. That is, once you have the H.264 files encoded into the encoding ladder, it’s trivial to package them into multiple ABR formats. This is shown in Figure 2, from the Microsoft Azure website.
Figure 2. Creating multiple ABR packaging from a single set of encoded files (Image from Microsoft Azure website).
On the left is the single input file, which is then encoded into multiple files, usually in MP4 format. These files are then converted into the container format required by the specific ABR technology (if necessary), and the manifest file is created, a process often called transmuxing. The figure shows the multiple MP4 files packaged into HLS and Smooth Streaming formats, but the same schema works with DASH and HDS. Basically, once you have the encoded multiple-bitrate MP4 files, the hard work is done; transmuxing is a relatively lightweight operation that can be performed very quickly and efficiently.
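To show how lightweight the manifest side of packaging is, here is a rough Python sketch that writes an HLS master playlist for a set of renditions. It covers only the text manifest, not the repackaging of the media itself, which a real packager handles. The rendition values and URIs are assumptions for illustration, and hls_master_playlist is a hypothetical helper.

```python
def hls_master_playlist(renditions):
    """Return HLS master playlist text listing one variant stream per rendition.

    Each rendition is a dict with 'bandwidth' (bits per second), 'width',
    'height', and 'uri' (the location of that rendition's media playlist).
    """
    lines = ["#EXTM3U"]
    for r in renditions:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r['bandwidth']},"
            f"RESOLUTION={r['width']}x{r['height']}"
        )
        lines.append(r["uri"])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative values only.
    example = [
        {"bandwidth": 4500000, "width": 1920, "height": 1080, "uri": "1080p/index.m3u8"},
        {"bandwidth": 2500000, "width": 1280, "height": 720, "uri": "720p/index.m3u8"},
    ]
    print(hls_master_playlist(example), end="")
```

A DASH MPD or Smooth Streaming manifest describes the same renditions in XML, which is why one encoded ladder can feed every format.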
So far, so good; you have the multiple files packaged in the ABR format necessary for delivery to your target player. What’s not so good? Well, think back to our original premise, the need to deliver using three ABR packages: HLS, Smooth Streaming, and DASH. If you create all three packages and upload them to the cloud for delivery, you just tripled your monthly storage costs. To add insult to injury, if you’re encoding in the cloud, you also may have tripled your encoding costs. As an example, the Amazon Elastic Transcoder can produce DASH, HLS, and Smooth Streaming ABR packaging, but if you use the service to do so, you’ll pay the normal rate three times for the privilege.
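The storage math is simple enough to sketch in a few lines of Python. The library size and per-gigabyte price below are placeholders, not quotes from any provider, and the model assumes each packaged copy is roughly the same size as the MP4 ladder, which is a simplification.

```python
def monthly_storage_cost(ladder_gb, price_per_gb, packaged_formats, dynamic):
    """Estimate monthly storage cost for one library.

    With static packaging, one copy is stored per ABR format.
    With dynamic packaging, only the MP4 ladder is stored.
    """
    copies = 1 if dynamic else packaged_formats
    return ladder_gb * copies * price_per_gb


if __name__ == "__main__":
    # Hypothetical inputs: a 1,000 GB ladder at $0.02 per GB per month.
    static = monthly_storage_cost(1000, 0.02, packaged_formats=3, dynamic=False)
    dynamic = monthly_storage_cost(1000, 0.02, packaged_formats=3, dynamic=True)
    print(f"static: ${static:.2f}  dynamic: ${dynamic:.2f}")
```

Plug in your own library size and rate; the ratio between the two results is simply the number of formats you package statically.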
OK, you get it: Paying your cloud encoder to produce multiple ABR packages and storing them all in the cloud might not be the most financially prudent move. What’s the alternative? Dynamic packaging.
Delivery: Static vs. Dynamic
With dynamic packaging, you create the multibitrate MP4 files and upload them to a streaming endpoint as shown in Figure 3. As different players request chunks or byte ranges of the video, the server retrieves the MP4 file, transmuxes it to the necessary ABR format, and sends it to the player. While the chunks may be temporarily stored somewhere in the content delivery network’s HTTP caching structure, they are never stored on the streaming server, so you’re charged only for the stored MP4 files. Since the server creates the ABR packaging dynamically, you’re never charged for that operation by your cloud encoder.
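Conceptually, the server-side logic looks something like the Python sketch below: the origin stores only MP4 files, infers the requested ABR format from the manifest extension, and packages on request. This is a toy model under my own assumptions, not how Azure, Wowza, or any other product is implemented; DynamicPackager and its methods are hypothetical, and the "packaging" step is a placeholder string rather than a real transmux.

```python
import posixpath

# Manifest extensions for each ABR format, as described earlier in the article.
MANIFEST_EXTENSIONS = {
    ".m3u8": "hls",
    ".mpd": "dash",
    ".ism": "smooth",
    ".f4m": "hds",
}


def requested_format(path):
    """Infer the ABR format from the extension of the requested manifest."""
    ext = posixpath.splitext(path)[1].lower()
    if ext not in MANIFEST_EXTENSIONS:
        raise ValueError(f"unsupported manifest type: {path}")
    return MANIFEST_EXTENSIONS[ext]


class DynamicPackager:
    """Toy origin that stores only MP4 renditions and packages on request."""

    def __init__(self, stored_mp4s):
        self.stored_mp4s = stored_mp4s  # title -> list of MP4 file names
        self.packaging_calls = 0

    def handle(self, path):
        fmt = requested_format(path)
        title = posixpath.splitext(posixpath.basename(path))[0]
        if title not in self.stored_mp4s:
            raise KeyError(f"no stored MP4 files for title: {title}")
        self.packaging_calls += 1
        # Placeholder for the real transmux step; nothing is written to storage.
        return f"{fmt} package of {', '.join(self.stored_mp4s[title])}"


if __name__ == "__main__":
    origin = DynamicPackager({"concert": ["concert_1080p.mp4", "concert_720p.mp4"]})
    print(origin.handle("/vod/concert.m3u8"))
    print(origin.handle("/vod/concert.mpd"))
```

Note that two requests for different formats leave the stored file list unchanged; only the response differs.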
Figure 3. Dynamic packaging (Image from Microsoft Azure website)
You can perform dynamic packaging even if you are applying digital rights management to your video, or if you need to apply different caption formats for your target players. In most instances, there are few, if any, meaningful downsides.
Dynamic With Microsoft Azure
I’ll start with Microsoft Azure, the source of Figures 2 and 3. Here I spoke with Martin Wahl, principal program manager for global customer engagement. According to Wahl, though the Azure cloud encoder was formerly able to produce static packaging, the company is deprecating that capability as Microsoft urges its customers to create their packaging dynamically rather than statically.
The cost? Basically, it’s included in the cost of distribution. According to Wahl, with Azure, you can stream directly from the origin server, or use the Azure CDN, which adds points of presence (POPs) at the edge, and caching, which should improve the overall quality of experience of your viewers. Both services charge by the gigabyte delivered, and in each case, dynamic packaging, including captions and encryption, is included. Or, you can send the packaged ABR videos from the origin server to a third-party CDN, in which case you’ll pay for the transfer bandwidth to the CDN, and dynamic packaging is also included.
Microsoft offers the license-free Azure Media Player for live and VOD playback of HLS, DASH, or Smooth Streaming formats with fallback to Flash or Silverlight. Microsoft can supply license keys for PlayReady or Widevine at a small extra charge or provide access to a range of third-party providers of this service.
Discussing the cost savings delivered by dynamic packaging, Wahl pointed me toward a case study on the Azure site that describes how Japanese premium content platform Rakuten ShowTime used Azure to simplify its distribution workflow and cut costs. Rakuten offered more than 120,000 videos for delivery to smart TVs, PCs, smartphones, tablets, and gaming platforms. Before moving to the Azure platform, the service created and stored separate iterations of each title in Smooth Streaming, HDS, HLS, and some others, oftentimes creating multiple iterations of these ABR formats for different DRM technologies.
In 2013, the number of discrete files managed exceeded 100 million, which was the limit of Rakuten’s on-premises storage management system. This prompted a move to the cloud, and discussions with Microsoft Azure. Rakuten ultimately switched to dynamic packaging in 2015, and it found three key benefits. First, Rakuten’s storage cost was “reduced to 25 percent of what it was before.” Second, the company was able to offer new titles faster because it no longer had to produce the multiple iterations. Third, since the number of output files was dramatically reduced, the company could more easily find and resolve file-related issues, which Rakuten stated cut management costs by “as much as 60 percent.”
How it Works With Wowza
One of the first (if not the first) to offer transmuxing capabilities was Wowza Media Systems, which offers the Wowza Streaming Engine for DIYs and the Wowza Streaming Cloud for those who want a managed service. Wowza pricing varies by product, but you can start with a subscription for the Wowza Streaming Engine for $65/month. Wowza Streaming Cloud has plans ranging from single live events to 24/7 broadcasting, and both products can deploy using the new Wowza Player.
Like Azure, both Wowza products can accept a single live input stream, convert that to multiple bitrate MP4 files, and then package these files for different display devices on-the-fly (Figure 4). The alternative, “old-style” of multiple format delivery required expensive hardware encoders at the live site to create the streams and packages, and sufficient bandwidth to get these streams out of the building in real time. The encoder cost and bandwidth savings make the combination of real-time transcoding and dynamic packaging a slam dunk for live event producers.
Figure 4. Real-time transcoding and dynamic packaging with Wowza Streaming Cloud
One company using Wowza Streaming Engine for both live and VOD distribution with dynamic packaging is TourGigs, a concert film technology company specializing in live streaming and concert videos. I spoke with Casey Charvet, director of technology for the company. According to Charvet, TourGigs switched to Wowza from an RTMP-based system that didn’t offer ABR streaming. The company chose Wowza because of the combination of live transcoding and dynamic packaging, starting off creating ABR packages for HDS for computer playback on its Flash-based player, and HLS for mobile.
At the time, TourGigs delivered its VOD files via progressive download, which increased CDN costs and also lacked ABR. Once TourGigs created the new live workflow with Wowza, it converted its VOD delivery system to the same schema. Rather than use the MP4 files created during the live event, TourGigs remixes the original camcorder videos, which lets the company fix any glitches in the live production and encode the multiple bitrate MP4 files using higher quality encoding techniques, such as two-pass encoding, and more advanced x264 settings.
Interestingly, as TourGigs migrated its player from Flash to HTML5, it standardized on HLS as the universal delivery format. Charvet continued with the Wowza system for single format delivery because it was so “simple and elegant” and because he wants the ability to service customers who may want to deliver in other formats, like DASH, HDS, or Smooth Streaming. TourGigs’ Wowza-based schema is shown in Figure 5, with live feeds processed in the cloud and VOD clips encoded and originated from gear located in a co-location facility. The ControlPlane layer is a proprietary server-side layer that provides redundancy, access to multiple CDNs, and security.
Figure 5. The Wowza-based TourGigs distribution architecture
Several other Wowza customers weighed in on the benefits of dynamic packaging. For example, Endavo Media, which provides media management and distribution platform technology and services, deployed Wowza when it transitioned to an HLS delivery of live and VOD streams, with fallback to RTMP-based Dynamic Streaming. In both cases, Endavo creates a single set of MP4 files and hands it off to the Wowza Streaming Engine, which transmuxes as needed.
According to Endavo CEO Paul Hamm, the Wowza-based solution delivers up to 1080p streams, some with encryption, with no latency or decreased throughput. For VOD delivery, Endavo didn’t even have to re-encode the files originally used for RTMP Dynamic Streaming. According to Hamm, “We really did not find any drawbacks with dynamic HLS from Wowza as we had the infrastructure in place along with suitable mezzanine files. Choosing dynamic saved the time and storage space which would have been required to chunk the entire library.”
Similarly, Panda O.S., a software development company specializing in online video solutions, previously used static packaging for its products and services. According to CEO and co-founder Leon Gordin, the many small files associated with static packaging created problems during upload and complicated experimentation with segment size and other configuration options. Panda O.S. did have to convert its MPEG-2 transport stream content to the MP4 format to make the switch to dynamic packaging. However, after the switch, Panda O.S. delivers live and VOD streams at 1080p resolution and above, many with DRM, and reports no latency or decreased throughput resulting from dynamic packaging.
Dynamic packaging has also proved useful in the enterprise. For example, I spoke with a streaming technology expert from a household-name technology company who could not speak on the record. The company is using Wowza as the video distribution hub for its internal webinar system, inputting a single RTMP stream, transcoding that stream to multiple resolutions, and distributing the streams via multiple protocols and formats, including multicast for internal distribution and HLS for most other endpoints. He reported that converting over to dynamic packaging cut storage costs by over 70 percent.
Want actual numbers? Well, in early 2016, I worked with a consulting client who was converting a large existing library over to ABR streaming, as well as producing significant hours of new videos. During the transition year, dynamic packaging reduced the encoding costs of the library transcode and ongoing encodes by close to $90,000 and cut storage costs by around $34,000. Subtract the $20,000 or so it would cost to run Wowza Streaming Engine, and total savings slightly exceeded $100,000.
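For those who like to check the math, here is the calculation as a short Python sketch, using the rounded figures quoted above. The inputs are approximations ("close to," "around," "or so"), so treat the result as approximate as well; net_savings is a hypothetical helper.

```python
def net_savings(encoding_saved, storage_saved, packaging_cost):
    """Net savings: what dynamic packaging saves minus what the packager costs."""
    return encoding_saved + storage_saved - packaging_cost


if __name__ == "__main__":
    # Rounded figures from the consulting project described in the text.
    total = net_savings(encoding_saved=90_000, storage_saved=34_000, packaging_cost=20_000)
    print(f"net savings: ${total:,}")
```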
Note that Wowza and Azure are not the only dynamic packaging solutions available. For example, Panda O.S.’s Gordin reported that the company also uses open-source solution NGINX to dynamically package for some customers. In addition, Elemental Delta is a video delivery platform that can perform what the company calls “just-in-time (JIT) video packaging,” as well as many other features.
Akamai and Throughput Concerns
Though dynamic packaging works well for hundreds or even thousands of simultaneous users, Will Law, chief architect of media cloud engineering at Akamai, points out that when streaming gets to a truly massive scale, like tens of millions of connections, the extra work associated with dynamic packaging is magnified. According to Law, this reduces the throughput of Akamai’s edge servers, and the extra complexity associated with dynamic packaging increases the opportunity for workflow issues to arise.
Though Akamai has supported dynamic packaging for more than 6 years, and continues to do so, Law notes that Apple’s adoption of the fragmented MP4 format may enable static packaging without the associated storage costs, providing the best of both worlds. At a high level, the common media application format (CMAF) support makes HLS compatible with fragmented MP4 files, enabling one set of media files that can be deployed via HLS and DASH simply by creating unique text-based manifest files for each format. At present, however, though HLS and DASH can share a common media file format, the Apple ecosystem uses a different encryption technique, so until this is resolved, producers will still need two sets of CMAF media files, one for content encrypted with cipher block chaining (CBC), and the other for content encrypted with common encryption (CENC).
I asked Netflix director of encoding technologies David Ronca about the potential for CMAF, and he commented, “Our primary packaging model is … DASH-CENC with AES-CTR (advance encryption counter mode) encryption. CMAF is shaping up to be compatible with our streaming format, and if that compatibility is achieved, it is valuable to us because it encourages wider support of the existing, and broadly deployed, DASH model. If CMAF fails to provide compatibility with the existing DASH model including requiring support for AES-CTR encryption, then all CMAF will have achieved is to add yet another format, and thus the value would be low.”
The bottom line is that though CMAF provides a common media format, until the DRM issues are ironed out, which may be complicated by the installed base of HLS-compatible devices, it won’t resolve the storage issues related to static packaging.
Creating the MP4 Files
In most instances, for live event streaming, the simplest, least expensive approach is to send a single input stream from the event into your encoding facility where it can be transcoded into multiple MP4s and then dynamically packaged as appropriate. What about creating multiple bitrate MP4 files for VOD?
Though the gap is closing, on-premises encoders still tend to be the least expensive alternative. As an example, in Figure 5, TourGigs deployed an “offline encoding cluster,” which is its own homegrown encoder developed precisely for this reason. In my own experience, in early 2016, I compared cloud and on-premises encoders for a client implementing dynamic packaging. I verified my estimates with all the various providers, who will go unnamed. Prices for producing MP4 files in the cloud ranged from $82,000 to $178,000 for 2016–2019. Alternatively, the client could buy a single appliance for a 4-year cost of just under $32,000 to handle the same load.
While certainly there are some storage and operating costs associated with an appliance, these seem less than the $50K difference between the appliance and the cheapest cloud solution. However, as cloud prices drop, the gap will obviously narrow. Still, if you’re searching for the cheapest way to produce your MP4 files for VOD, start with an appliance.
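Put another way, you can ask how much the appliance could cost you in storage and operations before it stops being the cheaper option. The Python sketch below works that out from the figures quoted above, treating the "just under $32,000" appliance cost as $32,000 and spreading the gap evenly over four years, which is my simplifying assumption; appliance_headroom is a hypothetical helper.

```python
def appliance_headroom(cloud_cost, appliance_cost, years):
    """Return (total, per-year) extra cost the appliance can absorb before
    it matches the given cloud price over the same period."""
    total = cloud_cost - appliance_cost
    return total, total / years


if __name__ == "__main__":
    # Figures from the comparison in the text: cheapest and priciest cloud quotes.
    for label, cloud in (("cheapest cloud", 82_000), ("priciest cloud", 178_000)):
        total, per_year = appliance_headroom(cloud, 32_000, years=4)
        print(f"{label}: ${total:,} total, ${per_year:,.0f} per year")
```

If your own estimate of appliance storage and operating costs comes in below the per-year figure for the cheapest cloud quote, the appliance still wins on price.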
When pricing cloud options, try to find alternatives to per-gigabyte or per-minute pricing, like the reserved cloud instances that you can rent from Encoding.com, or Elemental’s platform as a service approach, which is priced similarly. Also consider new service provider Hybrik, which charges a flat fee based upon the number of cloud machines you can run simultaneously with its software. (Full disclosure: I have done some consulting projects with Hybrik.)