Even in 2023, many high-volume streaming producers continue to rely on software-based transcoding, despite the clear CAPEX, OPEX, and environmental benefits of ASIC-based transcoding. At least part of this inertia relates to outdated concerns about the shortcomings of ASICs, including sub-par quality and lack of upgradeability.
As a parent, I long ago concluded that there were no words that could come out of my mouth that would change my daughter’s views on certain topics. As a marketer, I feel some of that same dynamic, that no words can come out of my keyboard that would shake the negative beliefs about ASICs from staunch software-encoding supporters.
So, don’t take our word that these beliefs are outdated; consider the results from the world’s largest video producer, YouTube. These slides and observations are from a Google presentation on the Argos ASIC-based transcoder at Hot Chips 33 back in August 2021, with the slides available here and the video here. The speakers were Aki Kuusela and Clint Smullen.
In the presentation, the speakers discussed why YouTube developed its own ASIC and the performance and power efficiency achieved during the first 16 months of deployment. Their comments go a long way toward dispelling the myths identified above and make for interesting reading.
Encoding Time Has Grown 8,000x
In discussing why Google created its own encoder, Kuusela explained that video was getting harder to compress, not only from a codec perspective but from a resolution and frame rate perspective. Here’s Kuusela (all quotes lightly edited for readability).
“In order to sustain the higher resolutions and frame rate requirements of video, we have to develop better video compression algorithms with improved compression efficiency. However, this efficiency comes with greatly increased complexity. For example, if we compare VP9 from 2013 to the decade-older H.264, the time to encode videos in software has grown 10x. The more recent AV1 format from 2018 is already 200 times more time-consuming than H.264.
If we further compound this effect with the increase in resolution and frame rate for top-quality video, we can see that the time to encode a video from 2003 to 2018 has grown eight thousand-fold. It is very obvious that CPU performance improvement has not kept up with this massive complexity growth, and to keep our video services running smoothly, we had to consider warehouse-scale acceleration. We also knew things would not get any better with the next generation of compression.”
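Kuusela's arithmetic is easy to reconstruct. Here is a back-of-the-envelope sketch in Python; the 2003 and 2018 formats (480p30 and 4K60) are illustrative assumptions on my part, not figures from the talk, while the 200x codec-complexity multiplier is the AV1-vs-H.264 number quoted above:

```python
# Back-of-the-envelope reconstruction of the ~8,000x encode-time growth.
# The resolutions and frame rates below are illustrative assumptions.

def pixel_rate(width, height, fps):
    """Pixels processed per second for a given format."""
    return width * height * fps

# 2003 baseline: H.264 at 480p30 (assumed)
baseline = pixel_rate(854, 480, 30)

# 2018 top quality: AV1 at 4K60 (assumed)
modern = pixel_rate(3840, 2160, 60)

codec_complexity = 200                  # AV1 encode time vs. H.264 (from the talk)
resolution_growth = modern / baseline   # pixel-rate growth, ~40x

total_growth = codec_complexity * resolution_growth
print(f"Pixel-rate growth: {resolution_growth:.0f}x")
print(f"Compound encode-time growth: {total_growth:.0f}x")
```

Under these assumptions the pixel rate grows roughly 40x, and 200 x 40 lands right around the eight-thousand-fold figure from the presentation.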
Few producers use VP9 as extensively as YouTube, but swap in HEVC for VP9 and the point is the same. Producers must now support higher resolutions and frame rates to remain competitive, and once you add the demands of live production, the need for hardware becomes even more apparent.
“Near Parity” with Software Encoding Quality
One consistent concern about ASICs has been quality, which admittedly lagged in early hardware generations. However, Google’s comparison shows that properly designed hardware can deliver near-parity to software-based transcoding.
Kuusela doesn’t spend a lot of time on Slide 2, merely stating that “we also wanted to be able to optimize the compression efficiency of the video encoder based on the real-time requirements and time available for each encode, and to have full access to all quality control algorithms, such as bitrate allocation and group of pictures selection. So, with our no-compromises implementation, we were able to get near parity to software-based encoding quality.”
NETINT’s own data more than supports this claim. For example, the table below compares the NETINT Quadra VPU with various x265 presets. Depending on the test configuration, Quadra delivers quality on par with the x265 medium preset. When you consider that software-based live production often necessitates using the veryfast or even ultrafast preset to achieve adequate throughput, Quadra’s quality far exceeds that available from software-based transcoding.
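The live constraint driving that preset choice is simple arithmetic: encode speed must meet or exceed the source frame rate, or the stream falls behind. A minimal sketch, where the per-preset encode speeds are hypothetical numbers for illustration, not benchmarks:

```python
# Which x265 presets could sustain a live 1080p30 stream?
# The per-preset encode speeds below are hypothetical, for illustration only.
SOURCE_FPS = 30

encode_fps = {          # assumed single-stream throughput on a midrange CPU
    "ultrafast": 120.0,
    "veryfast": 60.0,
    "medium": 12.0,
    "veryslow": 2.5,
}

# A preset is live-capable only if it encodes at least as fast as real time.
live_capable = {preset: fps >= SOURCE_FPS for preset, fps in encode_fps.items()}
for preset, ok in live_capable.items():
    print(f"{preset:>10}: {'live-capable' if ok else 'too slow for live'}")
```

Whatever the exact numbers on a given machine, the shape of the trade-off is the same: the presets that deliver the best quality are the ones that fall below real time, which is why live software encoding gets pushed toward veryfast and ultrafast.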
ASIC Performance Continues to Improve After Deployment
Another concern about ASIC-based transcoders is that they can’t be upgraded and so become obsolete quickly. However, proper ASIC design balances encoding tasks between hardware, firmware, and driver software to ensure continued upgradeability.
Figure 3 shows how the bitrate efficiency of VP9 and H.264 encodes continued to improve relative to software in the months after the product launch, even without changes to the firmware or kernel driver. The second Google presenter, Clint Smullen, attributed this to a hybrid hardware/software design, commenting that “using a software approach was critical both to supporting the quality and feature development in the video core as well as allowing customer teams to iteratively improve quality and performance.”
Most modern ASICs, including NETINT’s T408 and Quadra, use a hybrid design that balances critical functions between the ASIC, driver software, and firmware. In particular, NETINT optimizes ASIC design to maximize functional longevity. As explained here on the role of firmware in ASIC implementations, “The functions implemented in the hardware are typically the lower-level parts of a video codec standard that do not change over time, so the hardware does not need to be updated. The higher-level parts of the video codec are in firmware and can still be changed.”
As Google’s experience and NETINT’s own data show, well-designed ASICs can continue to improve in both quality and functionality long after deployment.
90% Reduction in Power Consumption
Few producers question the throughput and power efficiency delivered by ASICs, and Google’s data bears this out. Commenting on Figure 4, Smullen stated, “For H.264 transcoding, a single VCU matches the speed of the baseline system while using about one-tenth of the system-level power. For VP9, a single 20-VCU machine replaces multiple racks of CPU-only systems.”
NETINT ASICs deliver similar results. For example, a single T408 transcoder (H.264 and HEVC) delivers roughly the same throughput as a 16-core computer encoding with software, while drawing only about 7 watts compared to 250+ for the computer. NETINT Quadra draws 20 watts and delivers roughly four times that throughput for H.264, HEVC, and AV1. In one implementation, a single 1RU server with ten Quadras can deliver up to 200 interactive 720p streams for cloud gaming, replacing multiple racks of CPUs just as Argos does.
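These figures translate directly into power-per-stream arithmetic. A quick sketch using the numbers quoted above (host-server overhead for the Quadra rack is ignored, which is an assumption):

```python
# Rough power-per-stream comparison using the figures quoted in the text.

# Software baseline: 16-core server drawing ~250 W.
server_watts = 250

# NETINT T408: ~7 W at roughly the same throughput as that server.
t408_watts = 7

# At equal throughput, the power ratio is all that matters.
power_ratio = t408_watts / server_watts
print(f"T408 power vs. software server: {power_ratio:.1%}")   # prints "2.8%"

# Quadra cloud-gaming example: ten Quadras in 1RU, ~20 W each,
# delivering up to 200 interactive 720p streams.
rack_streams = 200
rack_watts = 10 * 20
print(f"Watts per 720p stream: {rack_watts / rack_streams:.1f}")
```

At roughly 3% of the power for equivalent throughput, the T408 comfortably supports the “90% reduction” framing; the Quadra rack works out to about one watt per interactive stream under these assumptions.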
Time to Reconsider?
As Google’s experience with YouTube and Argos shows, ASICs deliver unparalleled throughput and power efficiency in high-volume publishing workflows, and advanced ASIC design has dispelled outdated concerns about quality and obsolescence. If you haven’t considered ASICs for your production workflows, it’s time for another look.