This entry-level article describes what an encoder is and how to choose one and focuses on VOD, not live. It was derived from a lesson in my Streaming Media 101 course, which teaches you the skills and techniques to succeed in a streaming media-related role.
This article walks newbies through what an encoder is and how to choose one, rather than helping a serious buyer choose a vendor or approach in any of the categories covered. If you’re new to the market, you’ll learn a bit about who’s who and what’s what; if you’ve been in the streaming business for a while, you’ll probably not get a lot out of this. As always, the companies mentioned are representative rather than exhaustive.
Let’s start with some data points courtesy of Bitmovin’s 2021 Video Developer Report, which incorporates responses from 538 participants located in over 65 countries. Figure 1 shows the collective answer to the question, “Where do you encode your video?” and includes data from both the 2020 report and the 2019 report. Totals for each year exceed 100% because many respondents used multiple methods.
Looking at VOD encoding (in pink), the two software encoder categories (on-premise and cloud) total 82%, making software encoders the largest category. A later question revealed that of those who used a software encoder, 54% used a commercial encoder while 41% used an open-source encoder like FFmpeg to build their own software encoding facilities. Working down the list, 32% of respondents used a cloud encoding service for VOD encoding, while 20% used managed on-premise encoding services.
Hardware VOD encoders accounted for a surprisingly high percentage (38%), but because they are a distinctly separate market from software encoders, I won’t cover them in this article.
Choosing a Tool for ABR Encoding
There are several concepts you need to understand before you choose an encoder or encoding approach. The first is adaptive bitrate (ABR) streaming, a group of technologies that let you deliver video to viewers watching on different devices over different connection speeds. Common technologies include Apple’s HTTP Live Streaming (HLS) and Dynamic Adaptive Streaming over HTTP (DASH).
All ABR technologies encode files into what’s called an encoding ladder, which typically includes 5-7 files customized for different viewers. Figure 2 shows Apple’s recommended encoding ladder from the HLS Authoring Specification. At the top are lower-resolution, lower-bitrate files for those watching on mobile phones, while the bottom shows high-resolution, high-bitrate files for viewers watching on smart TVs over high-bandwidth connections.
To distribute via ABR, you need to produce the files in the encoding ladder. You also need metadata files that help the player choose the best rung in the encoding ladder. These metadata files can also add captions and digital rights management (DRM) protection to the videos.
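To make the idea of a player choosing a rung concrete, here is a minimal Python sketch. The ladder values are illustrative placeholders, not Apple’s recommended ladder, and the `pick_rung` function and its `headroom` safety margin are my own assumptions for illustration; real players use more sophisticated logic that also considers buffer levels and screen size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rung:
    width: int
    height: int
    video_kbps: int


# Illustrative six-rung ladder (hypothetical values, NOT Apple's recommendations).
LADDER = [
    Rung(416, 234, 150),
    Rung(640, 360, 400),
    Rung(768, 432, 1000),
    Rung(960, 540, 2000),
    Rung(1280, 720, 4500),
    Rung(1920, 1080, 7500),
]


def pick_rung(ladder, measured_kbps, headroom=0.8):
    """Pick the highest-bitrate rung that fits within the measured bandwidth.

    `headroom` (an assumed safety margin) keeps the choice below the
    measured throughput so small dips don't immediately stall playback.
    """
    budget = measured_kbps * headroom
    affordable = [r for r in ladder if r.video_kbps <= budget]
    if not affordable:
        # Nothing fits: fall back to the lowest rung rather than stopping.
        return min(ladder, key=lambda r: r.video_kbps)
    return max(affordable, key=lambda r: r.video_kbps)
```

With this sketch, a viewer measured at 3,000 kbps gets the 540p rung, while a viewer on a very slow connection falls back to the lowest rung.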
Encoding vs. ABR Packaging
Creating the encoding ladder is encoding; creating the metadata files that pull the audio, video, captions, and DRM together is called packaging. Sometimes packaging involves chunking the original files in the encoding ladder into shorter segments for easier distribution; sometimes it doesn’t.
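As a rough illustration of what a packager’s metadata file looks like, the Python sketch below writes a bare-bones HLS master playlist that lists each rung in the ladder. The tags `#EXTM3U` and `#EXT-X-STREAM-INF` (with its `BANDWIDTH` and `RESOLUTION` attributes) come from the HLS specification; the function name, dictionary keys, and file paths are hypothetical. Production playlists carry more information, such as codecs, audio groups, and caption tracks, so treat this as a sketch rather than a working packager.

```python
def build_master_playlist(variants):
    """Return the text of a minimal HLS master playlist.

    Each variant is a dict with `bandwidth_bps` (bits per second),
    `width`, `height`, and `uri` (the rung's own media playlist).
    """
    lines = ["#EXTM3U"]
    for v in variants:
        lines.append(
            f'#EXT-X-STREAM-INF:BANDWIDTH={v["bandwidth_bps"]},'
            f'RESOLUTION={v["width"]}x{v["height"]}'
        )
        lines.append(v["uri"])
    return "\n".join(lines) + "\n"


# Hypothetical rungs and paths, for illustration only.
EXAMPLE_VARIANTS = [
    {"bandwidth_bps": 400_000, "width": 640, "height": 360, "uri": "360p/index.m3u8"},
    {"bandwidth_bps": 4_500_000, "width": 1280, "height": 720, "uri": "720p/index.m3u8"},
]

if __name__ == "__main__":
    print(build_master_playlist(EXAMPLE_VARIANTS))
```

The player downloads this file first, then uses the listed bandwidths to decide which rung to request.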
A desktop encoder like the Adobe Media Encoder (AME) is an encoder but not a packager. That’s all you need if you’re delivering your videos through an online video platform (OVP) like Brightcove or Kaltura, or even YouTube; all these services ingest a single high-quality file, transcode it into the encoding ladder, and package it for the ABR technologies that they deploy. On the other hand, if your goal is to produce content you can deliver directly to your viewers via HLS or DASH, you’ll need both an encoder and a packager, or a tool that does both.
Another concept to understand is static versus dynamic packaging. With static packaging, you create the encoding ladder and the necessary packaging files and upload all of them to the origin server for distribution. With dynamic packaging, you create your encoding ladder, upload the files to the origin server, and use a server like the Wowza Streaming Engine or Softvelum Nimble Streamer to package the content in real time, as needed, to match the ABR technology each viewer’s device supports.
Interestingly, the Bitmovin report tells us that 37.6% of respondents used dynamic packaging. To go dynamic, you need an encoder but not a packager. AME would again be fine; just encode to multiple outputs and upload the files to your origin server where the dynamic packager can do the rest.
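One simple way to picture dynamic packaging is that the server looks at what the player asks for and packages accordingly. The Python sketch below maps the manifest extension in a request to an ABR format, using the conventional `.m3u8` (HLS) and `.mpd` (DASH) extensions. The function name and request paths are hypothetical, and I’m not claiming this is how Wowza or Nimble Streamer work internally.

```python
from pathlib import PurePosixPath

# Manifest file extensions conventionally used by each ABR format.
MANIFEST_FORMATS = {
    ".m3u8": "HLS",
    ".mpd": "DASH",
}


def requested_format(request_path):
    """Return "HLS" or "DASH" for a manifest request, or None otherwise.

    Assumes `request_path` is a plain URL path with no query string.
    """
    suffix = PurePosixPath(request_path).suffix.lower()
    return MANIFEST_FORMATS.get(suffix)
```

A request for `/vod/movie/master.m3u8` would be packaged as HLS and `/vod/movie/manifest.mpd` as DASH, both from the same uploaded encoding ladder.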
Long story short, before you choose your approach, you need to understand whether you need an encoder, or an encoder and packager.
Desktop Video Encoders
Desktop encoders are software programs that you install on local Windows or Mac computers. They include the aforementioned Adobe Media Encoder, as well as Apple Compressor and HandBrake. You can throw Avid Media Composer’s export function into this group as well. Of the four, Compressor is the only tool that can package to an ABR format (Apple’s HTTP Live Streaming, naturally), with captions but no DRM. The rest can output one or multiple files in different formats.
While AME can’t package, it does have a watch folder function to enable simple automation; anyone with access to that folder on a network can drop a file in, and AME will launch and encode the file to whatever presets you have selected. If the presets constitute a full ABR ladder, you’d be good to go with a system that uses dynamic packaging. With Compressor, you can combine multiple Macs into an encoding workgroup. With HandBrake, you can easily convert a folder or multiple files using a single output preset, but like AME, there’s no packaging function.
If all you need is HLS packaging without DRM, Compressor should work for modest production volumes. If you’re distributing via an OVP or YouTube or Facebook, any of the desktop encoders should do. If you need full-service encoding and packaging to multiple ABR formats with DRM, you need to look elsewhere.
Assessing Your Options
If you need full-service encoding and packaging, start by making a list of all required features of your encoder/packager, including ingest format support, output codec/ABR support, supported HDR formats, DRM requirements, captioning requirements, and expected volumes. Consider the specific processing that your use case requires. For example, transcoding a simple MP4 file with 2-channel audio to an HLS/DASH ladder is pretty simple. On the other hand, if you’re working with IMF files and need to map audio tracks for specific outputs while creating captions in multiple languages, you’ll need a much more capable system or service provider.
If you’re considering third-party software, you should know where you want to install it; if you’re considering a cloud service, you should know whether you want to use the hosted service or run the vendor’s software on your own hardware. In all cases, you’ll need to know expected day-to-day volumes and consider available options should demand spike for any reason.
Enterprise Video Encoders
Enterprise encoders are programs that you license and install on-premise or in a private or public cloud that perform a full range of encoding and packaging functions. Buyers in this category obviously want to own and control their own encoding experience, wherever they deploy it, as compared to using a third-party service.
Most products in this class can support all relevant input files and output in multiple codecs and ABR formats, with captions and DRM support while providing a range of high-end features like ad insertion, watermarking, and audio loudness management. Most offer both a graphical user interface and application programming interface (API) for automated interaction with media asset management programs and other programs in the encoding and distribution workflow.
One potential differentiator is the deployment model: can you install the software where you want to use it, and how does pricing work in the different environments? What’s the required number of licenses to handle both day-to-day encoding chores and the required level of redundancy? How many computers will you need to acquire to support your anticipated operation?
Another differentiator is the concept of workflow control over the encoding process. Systems with workflow capabilities can examine files and/or file metadata upon ingest and make encoding decisions like choosing the preset or removing potentially faulty files from the encoding pipeline and notifying a technician. This functionality can be delivered via a user interface or scripting and helps make operation more flexible and robust.
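To show the kind of decision a workflow system makes on ingest, here is a minimal Python sketch. It inspects a metadata dictionary, sends suspect files to quarantine for a technician to review, and otherwise picks a preset by source height. The metadata keys, preset names, and thresholds are all hypothetical; commercial products expose this through their own interfaces or scripting languages.

```python
def route_on_ingest(meta):
    """Return an (action, detail) tuple for a file described by `meta`.

    `meta` is an assumed dict of probe results, e.g.
    {"video_codec": "prores", "height": 1080, "duration_sec": 95.0}.
    """
    problems = []
    if meta.get("video_codec") is None:
        problems.append("no video stream")
    if meta.get("duration_sec", 0) <= 0:
        problems.append("missing or zero duration")
    height = meta.get("height", 0)
    if height <= 0:
        problems.append("unknown frame height")

    if problems:
        # Pull the file out of the pipeline and tell a technician why.
        return ("quarantine", "; ".join(problems))

    # Choose a preset (hypothetical names) based on the source resolution.
    if height >= 2160:
        preset = "ladder_2160p"
    elif height >= 1080:
        preset = "ladder_1080p"
    else:
        preset = "ladder_720p"
    return ("encode", preset)
```

A healthy 1080p file would be routed to the hypothetical `ladder_1080p` preset, while a file with no video stream never reaches the encoder.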
Another is per-title capabilities, or the ability to customize the encoding ladder depending upon the complexity of the video being encoded. Implementations vary, but every legitimate product in this category should offer this option.
Scalability is another consideration. What are your options if your company acquires a third-party library and needs to get it online as quickly as possible? Some vendors offer hardware acceleration, which is an expensive option for a temporary need but might make sense if day-to-day encoding demands increase. Does the company offer daily or monthly licenses, or is there a sister cloud service that can handle your overage using the same presets as you use internally?
Don’t consider your encoder selection in a vacuum. If you’ll be acquiring software for other functions like live streaming, advertising insertion, streaming file origin, or packaging, consider the benefits of acquiring two or more of these capabilities from a single vendor, and/or understand how the encoder you’re considering will interface with products from other vendors.
Cloud Video Encoders
Cloud encoding is typically provided as Software as a Service (SaaS), where you upload your files to the service, choose your encoding options, and direct the service where to send the finished files. The primary benefits of SaaS cloud encoding as compared to on-premise software deployments are lower capital expenditures for the hardware and software, reduced operating costs related to housing and powering the encoding farm, built-in system redundancy, and eliminating software update costs. As compared to third-party software installed in the cloud, you don’t have to buy, install, or maintain the third-party software.
Of course, with only 32% of Bitmovin respondents using a cloud platform, choosing a cloud service as compared to buying or developing your own encoder can’t be a slam dunk. Viewed from a distance, SaaS vs. own appears to be more a philosophical decision than an economic one.
Cloud encoding services range from compression-only services like Coconut, companies that offer encoding as well as other services like Bitmovin, encoding workflow vendors like encoding.com and Dolby Hybrik, to companies like Amazon and Microsoft that offer encoding as a component of an overall storage, encoding, and delivery workflow.
Choose a class of vendors that can deliver the range of services that you’re looking for and match your desired deployment model. For example, Bitmovin and encoding.com both allow you to install their software on-premise or in external private clouds, but not all vendors do.
Consider how you want to interface with the system. Most cloud services support API-driven operation, but not all provide user interfaces for getting started or for non-technical users. In particular, AWS Elemental MediaConvert has both a highly usable UI and a capable API, making the service appropriate for all technical levels.
Pricing is one of the biggest differentiators. Most vendors charge by the output minute, but some, like encoding.com, let you rent a managed cloud instance by the month for unlimited processing at one set price. Dolby Hybrik charges a flat fee per month based upon the number of AWS instances that you run their software on.
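To compare per-minute and flat-fee pricing, it helps to work out your monthly output minutes and the break-even point. The Python sketch below does that arithmetic. All prices are hypothetical placeholders, not quotes from any vendor, and the function names are my own.

```python
def output_minutes(source_minutes, rungs):
    """Each rung in the ladder is a separate output, so minutes multiply."""
    return source_minutes * rungs


def breakeven_output_minutes(flat_monthly_fee, price_per_output_minute):
    """Output minutes per month at which the two pricing models cost the same."""
    return flat_monthly_fee / price_per_output_minute


def cheaper_option(monthly_output_minutes, price_per_output_minute, flat_monthly_fee):
    """Return "per-minute" or "flat", whichever costs less for this volume."""
    per_minute_cost = monthly_output_minutes * price_per_output_minute
    return "per-minute" if per_minute_cost < flat_monthly_fee else "flat"


if __name__ == "__main__":
    # Hypothetical inputs: 100 hours of source video and a six-rung ladder.
    minutes = output_minutes(100 * 60, 6)
    print(minutes, cheaper_option(minutes, 0.02, 1000))
```

With these made-up numbers, 100 hours of source video encoded to six rungs is 36,000 output minutes, which costs about $720 at $0.02 per output minute and comes in under a $1,000 flat fee. The break-even point is roughly 50,000 output minutes per month.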
Building Your Own Video Encoder
As mentioned at the top, 41% of those who responded to Bitmovin’s survey said that they used an open-source encoder like FFmpeg. What we don’t know is how many use FFmpeg casually as compared to those who build and host their encoding farm using FFmpeg, usually in combination with packagers like Bento4 or MP4Box.
In my view, two types of companies should consider building their own encoding facilities. At the top end are companies like Netflix, YouTube, and others where the ability to encode at high quality, high capacity, or both delivers a clear competitive advantage. These companies have innovated on the encoding front and need to continue doing so, and you can do that best if you control the entire pipeline.
At the other end are small companies with relatively straightforward needs, where anyone with a little time on their hands can create a script for encoding and packaging files for distribution (see How to Automate FFmpeg and Bento4 With Bash Scripts). Otherwise, for high-volume and/or complex needs, you’re almost always better off going with a commercial software program or cloud encoder.
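For a sense of what a simple do-it-yourself script involves, here is a Python sketch that builds one FFmpeg command per ladder rung. The FFmpeg options shown (`-vf scale`, `-c:v libx264`, `-b:v`, `-maxrate`, `-bufsize`, `-c:a aac`, `-b:a`) are standard, but the ladder values, file names, and rate-control choices are assumptions for illustration, not recommended settings. The sketch only builds the commands; packaging the outputs, for example with Bento4 or MP4Box, is a separate step not shown here.

```python
def ffmpeg_command(source, height, video_kbps, output, audio_kbps=128):
    """Build the argument list for one H.264/AAC rung.

    Rate control here is a simple assumption: cap the peak at the target
    bitrate with a buffer of twice that size.
    """
    return [
        "ffmpeg", "-y",
        "-i", source,
        "-vf", f"scale=-2:{height}",      # keep aspect ratio, force even width
        "-c:v", "libx264",
        "-b:v", f"{video_kbps}k",
        "-maxrate", f"{video_kbps}k",
        "-bufsize", f"{video_kbps * 2}k",
        "-c:a", "aac",
        "-b:a", f"{audio_kbps}k",
        output,
    ]


def ladder_commands(source, ladder):
    """Return one command per (height, video_kbps) pair in `ladder`."""
    return [
        ffmpeg_command(source, height, kbps, f"out_{height}p.mp4")
        for height, kbps in ladder
    ]


# Hypothetical three-rung ladder: (height, video bitrate in kbps).
EXAMPLE_LADDER = [(360, 400), (720, 4500), (1080, 7500)]

if __name__ == "__main__":
    for cmd in ladder_commands("source.mp4", EXAMPLE_LADDER):
        print(" ".join(cmd))
```

To actually run the encodes, you could pass each list to `subprocess.run(cmd, check=True)` on a machine with FFmpeg installed.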