
Inside AV2: Architecture, Performance, and Adoption Outlook

Three recent Alliance for Open Media presentations on YouTube shed new light on AV2’s performance and utility.

  • Andrew Norkin, Director of Codec Development at Netflix, presented the current status and architecture of AV2. He outlined the codec’s design goals, early performance results, and hardware-focused development approach, noting that the low-level toolset is now finalized. The YouTube video is here.

  • Ryan Lei, Video Compression Engineer at Meta, shared AV2 performance data generated under AOM’s Common Test Conditions. His talk covered test configurations, performance metrics, and recent results across adaptive streaming, HDR, and extended color formats. The YouTube video is here.

  • Li-Heng Chen, Software Engineer at Netflix, discussed the deployment of AV1’s Film Grain Synthesis tool. While centered on AV1, the talk confirmed that AV2 will retain FGS as a mandatory feature and introduced enhancements relevant to next-gen codec development. The YouTube video is here.

I present them in order with a combined conclusion at the end.

For performance results, Norkin presented a single slide, while Lei presented All Intra, Random Access, Adaptive Bitrate Streaming, and Subjective Results; jump to his presentation if that’s your primary concern. Note that all performance data compared AV2 to AV1, with no reference to HEVC or VVC.

AV2 Codec Architecture, Presented by Netflix’s Andrew Norkin

This section comes from Norkin’s YouTube video, which is available here and embedded below.

AV2 Introduction and Timeline

Figure 1. What’s past is prologue.

Norkin began his presentation by outlining the timeline and motivation behind AV2. He recapped AOM’s formation in 2015 and the 2018 release of AV1, a codec built from several open technologies, including VP10, Daala, and Thor. AV1 has since been widely adopted by major streaming platforms.

In 2020, AOM began work on the next-generation video codec that would become AV2, designed to significantly improve compression efficiency and expand the toolset to support a broader range of use cases, including low-bitrate streaming, screen content, and layered video.

Figure 2. Timeline and contributors.

Norkin emphasized that over the past four years, the codec working group evaluated dozens of proposed tools for AV2, applying strict criteria around complexity and implementation feasibility. He noted that hardware decoding concerns were a consistent focus, with companies like AMD and Realtek participating specifically to assess hardware readiness. While he did not comment on licensing or IP strategy, the involvement of major tech and streaming platforms, including companies that are also active in VVC development, like Tencent and Alibaba, suggests that AV2 is being seriously considered across the industry.

Analysis: The breadth of participation indicates that AV2 is not just a research project, but a codec under serious consideration by companies with significant influence over streaming infrastructure and device deployment. This includes Amazon and Apple, both of which deliver streaming hardware devices and premium content. Notably, neither has publicly adopted AV1 for premium content, so their participation in AV2 would be particularly noteworthy if it is aimed at content rather than exclusively at devices. Involvement from VVC-aligned companies also suggests that many platforms are hedging, planning to support multiple formats depending on use case or geography. Whether that turns into real-world deployment will likely depend on the final spec, IP clarity, and available encoder/decoder support.

Norkin described AV2 as a new codec built around an expanded toolset and significantly higher compression efficiency. He did not explicitly state that hardware would be required, but the repeated emphasis on decoding complexity, combined with feedback from hardware implementers, suggests that AV2 is being designed with hardware acceleration in mind.

Analysis: This is consistent with how next-gen codecs are typically developed: software reference implementations prove out the tools, but real-world adoption, especially in mobile and consumer devices, often depends on custom silicon. AV2 will benefit from hardware, and for some use cases, particularly high-res or low-power environments, it may be essential.

AV2 Performance Results: Part I

Figure 3. The money slide.

Norkin shared the most recent performance results for AV2, based on version 11.0.0 of the AVM reference software. He noted that nearly all low-level coding tools have now been finalized, with remaining work focused on high-level syntax. The results compare AV2 against a modified AV1 anchor across multiple configurations, including all-intra (AI), low-delay (LD), and random access (RA). In the RA configuration, which is the most representative for streaming, AV2 showed a 28.6% bitrate reduction for equivalent PSNR-YUV and a 32.6% reduction based on VMAF.

Analysis: These are substantial gains over AV1 and suggest that AV2 is approaching a level of maturity suitable for early testing. Random access is the most representative configuration for streaming use cases, so improvements in this mode are especially relevant for commercial deployments. While objective metrics like PSNR and VMAF do not fully capture perceptual quality, the consistency across metrics supports the claim that AV2 offers meaningful efficiency gains.

Norkin did not address encoding complexity, which will be a critical factor for adoption. AV1’s initial encoding cost limited its use in many real-time or large-scale workflows, and AV2 introduces additional tool complexity. Whether these gains can be delivered at acceptable encoding speeds remains an open question. On the decoding side, the emphasis on hardware review confirms that AV2 is being shaped with hardware support in mind, but actual decoder implementations have not yet been benchmarked publicly.

Note that Ryan Lei shared a lot more performance data in his discussion, which I present below.

AV2 Framework and Tools

Figure 4. AV2 framework and tools.

Norkin walked through the overall AV2 codec framework, noting that while many individual tools have evolved, the fundamental architecture remains consistent with the hybrid block-based approach used for decades. As he put it, the codec is “basically a typical hybrid block-based model that has… existed for at least… 35 years.” The flow includes standard components like block partitioning, intra and inter prediction, transforms, quantization, coefficient and entropy coding, and in-loop filtering. He emphasized that the high-level structure, from prediction through reconstruction, remains largely unchanged from earlier generations of video codecs.
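To make that flow concrete, here’s a toy, runnable Python sketch of the hybrid loop. This is emphatically not AVM code: it uses fixed 8×8 partitioning, DC-only intra prediction, and a flat quantizer, and it skips entropy coding and in-loop filtering. But it shows the essential pattern Norkin describes, including reconstructing each block from the same quantized data the decoder would see, so encoder and decoder never drift apart.

```python
# Toy hybrid block-based intra coder (NumPy + SciPy); illustrative only.
import numpy as np
from scipy.fft import dctn, idctn

BLOCK = 8  # fixed 8x8 partitioning; real codecs choose block sizes adaptively

def encode_frame(frame, q=24.0):
    """Intra-only hybrid coding of one grayscale frame."""
    h, w = frame.shape
    recon = np.zeros((h, w), dtype=np.float64)
    coeffs_out = []
    for y in range(0, h, BLOCK):
        for x in range(0, w, BLOCK):
            block = frame[y:y+BLOCK, x:x+BLOCK].astype(np.float64)
            # DC intra prediction from previously *reconstructed* neighbors,
            # mirroring what the decoder can actually see.
            neighbors = []
            if y > 0:
                neighbors.append(recon[y-1, x:x+BLOCK])
            if x > 0:
                neighbors.append(recon[y:y+BLOCK, x-1])
            pred = np.mean(np.concatenate(neighbors)) if neighbors else 128.0
            residual = block - pred
            coeffs = np.round(dctn(residual, norm="ortho") / q)  # transform + quantize
            coeffs_out.append(coeffs)  # entropy coding would happen here
            # Decoder-mirrored reconstruction keeps both ends in sync.
            recon[y:y+BLOCK, x:x+BLOCK] = pred + idctn(coeffs * q, norm="ortho")
    return coeffs_out, recon  # in-loop filtering (deblocking, CDEF) would follow

frame = np.random.randint(0, 256, (64, 64))
coeffs, recon = encode_frame(frame)
print("PSNR:", 10 * np.log10(255**2 / np.mean((frame - recon) ** 2)))
```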

Norkin also highlighted other targeted use cases, including screen content tools like palette modes and inter-block copy, stereo video, and support for multi-layer or atlas-based video compositions. These features aim to improve coding efficiency and flexibility for complex visual experiences, such as overlays, spatial video, or layered UI elements.

Analysis: From an IP perspective, nothing signals patent exposure quite like a model that, as Norkin put it, has “existed for at least 35 years.” The hybrid block-based design has been the basis for virtually every major codec since the 1990s and has been the subject of extensive litigation and licensing (see here). By staying within this well-trodden framework, AV2 benefits from engineering familiarity and hardware compatibility, but it also re-enters a space that is densely populated with existing claims. Whether AV2 avoids the same IP minefield that has challenged other formats remains an open question.

Deep Dive into AV2 Tools

At this point, Norkin started discussing the individual tools. Here’s the video, cued to that section.

Norkin’s Conclusion

Figure 5. Norkin’s conclusions.

In his closing remarks, Norkin confirmed that AV2’s low-level toolset is essentially finalized. Work now shifts to high-level syntax and specification writing, with a formal release expected by the end of 2025.

The most recent AVM version shows substantial bitrate savings over AV1 as measured by VMAF under random access configurations. These results, while promising, still come with open questions about encoder complexity, hardware readiness, and deployment models.

Future work will focus on software speed improvements, encoder-side tuning for visual quality, and possible extensions to support higher bit depth content or AI-based profiles. Norkin framed these results as a sign that the codec is ready for testing, if not yet deployment.

AV2 Performance Testing, Presented by Meta’s Ryan Lei

The next video captures Ryan Lei’s presentation on AV2 Common Test Conditions and test results. By way of background, Lei was instrumental in deploying AV1 for Facebook Reels in 2023.

Figure 6. Groups involved with AV2 testing.

Lei began by walking through the evaluation framework used to assess AV2 coding tools. As shown in Figure 6, the AOM Testing Subgroup is responsible for defining common test conditions, which include test sequences, encoding configurations, performance metrics, and the infrastructure needed to run evaluations consistently across proposals. The current version of these conditions is version 7.0, which was formalized under document CWG-E083. That framework has recently been extended to support 4:2:2 and 4:4:4 color formats, expanding testing beyond traditional 4:2:0 use cases.

Lei noted that the group conducts regular testing with each release of the AVM reference encoder, including both full evaluations and “tools on/off” tests that isolate the impact of specific coding features. Two major anchor releases were highlighted: version 10.0 in June 2025 and version 11.0 in September 2025. In addition to objective metrics like PSNR and VMAF, the subgroup is also working on a plan to introduce subjective testing, where human viewers rate the perceptual quality of encoded video.

Figure 7. Video and still image files used during testing.

Test Descriptions

Lei presented the latest version of the test sequences used to evaluate AV2 tools under the AOM Common Test Conditions. The current set includes 91 videos and 51 images, covering a wide range of formats and use cases. These are grouped into categories such as high-resolution video (including 4K and 8K), lower-resolution mobile formats, synthetic content for gaming and screen sharing, and still images at multiple resolutions.

Two new classes have been added to account for HDR content, using BT.2100 color space with a PQ transfer function. Another class includes user-generated content, such as handheld or action camera footage. This expanded test set allows the group to measure codec performance across a broader spectrum of content types, from cinematic video to noisy real-world captures.

Figure 8. Encoding configurations.

Lei then explained the different encoding configurations used in AV2 testing. All results are based on normative-only encoding, meaning non-standardized tools like two-pass encoding, adaptive quantization, and keyframe filtering were disabled. These features can significantly improve quality and efficiency in production workflows, but are excluded by AOM for this testing to ensure a controlled, codec-level comparison. Since the same constraints apply to AV1, the gains shown reflect the impact of AV2’s toolset alone, not complete encoder optimization.

There are five primary encoding modes: All Intra (AI), Random Access (RA), Low Delay (LD), Adaptive Streaming (AS), and Still Image. AI is used for testing keyframes and stills by encoding the first 30 frames of each sequence as intra-only. RA uses a closed GOP structure with five hierarchical layers and tests long-form streaming scenarios. LD is similar but has only one keyframe and no future references, modeling low-latency use cases. AS downscales 4K sequences into multiple lower resolutions for streaming evaluation, as detailed below.

Each configuration is encoded using constant quality mode, with fixed QP values across hierarchical layers. A table of QP values was provided for reference.

Figure 9. Adaptive streaming configuration.

To better evaluate streaming performance, the team introduced an adaptive streaming configuration. Each 4K sequence from the A1 class was downscaled into six lower resolutions using Lanczos 5 filtering. These versions were then encoded, decoded, and upscaled back to 4K for quality assessment. Metrics like PSNR and VMAF were calculated against the original 4K resolution to simulate real-world playback scenarios. This approach helps model how well AV2 handles resolution switching and upscaling in adaptive bitrate ladders.
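For readers who want to approximate this pipeline, here’s a minimal sketch driven from Python, assuming an ffmpeg build with libvmaf. The encoder and decoder commands (my_av2_encoder, my_av2_decoder) are hypothetical placeholders, and the rung list is illustrative rather than the exact CTC ladder; the structure, downscale, encode, decode, upscale, then score against the 4K source, mirrors what Lei described.

```python
# Sketch of the downscale -> encode -> decode -> upscale -> measure loop.
import subprocess

# Illustrative rungs only; the CTC defines the actual ladder.
RUNGS = [(1920, 1080), (1280, 720), (960, 540), (640, 360), (480, 270)]

def run(cmd):
    subprocess.run(cmd, shell=True, check=True)

for w, h in RUNGS:
    name = f"{w}x{h}"
    # Downscale the 4K source with a Lanczos filter (the CTC uses Lanczos-5).
    run(f'ffmpeg -y -i source_4k.y4m -vf "scale={w}:{h}:flags=lanczos" {name}.y4m')
    # Encode and decode with your AV2 codec of choice (placeholder commands).
    run(f"my_av2_encoder --input {name}.y4m --output {name}.ivf")      # hypothetical
    run(f"my_av2_decoder --input {name}.ivf --output {name}_dec.y4m")  # hypothetical
    # Upscale the decoded stream back to 4K and score it against the source.
    run(f'ffmpeg -i {name}_dec.y4m -i source_4k.y4m -lavfi '
        f'"[0:v]scale=3840:2160:flags=lanczos[up];[up][1:v]libvmaf" -f null -')
```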

To evaluate performance across bitrates, the team generated additional rate-distortion (RD) points through bilinear interpolation. This helped smooth out the quality curve and better reflect the codec’s performance across the full bitrate range. For each resolution, 41 total data points were used. These points were then used to construct convex hulls, which represent the most efficient quality-bitrate tradeoffs. BDRATE calculations were performed between hulls from different tool configurations to isolate the coding gain of individual AV2 features. This method enables more accurate comparisons between candidate tools and previous baselines, particularly in adaptive streaming scenarios.
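Both calculations here, convex hull construction and BD-rate, are straightforward to reproduce. The sketch below uses the classic Bjøntegaard cubic-fit formulation rather than AOM’s exact 41-point interpolation, so the numbers will differ slightly from the official tooling, but the mechanics are the same: pool (bitrate, quality) points across resolutions, keep only the Pareto-efficient hull, then integrate the gap between the two fitted log-rate curves.

```python
import numpy as np

def rd_convex_hull(points):
    """points: (bitrate, quality) pairs pooled across all resolutions.
    Returns the upper (Pareto-efficient) hull, sorted by bitrate."""
    pts = sorted(set(points))
    hull = []
    for x, y in pts:
        # Pop the previous point while it falls on or below the new chord.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (y - y1) >= (y2 - y1) * (x - x1):
                hull.pop()
            else:
                break
        hull.append((x, y))
    # Keep only points where quality strictly increases with bitrate.
    out = []
    for x, y in hull:
        if not out or y > out[-1][1]:
            out.append((x, y))
    return out

def bd_rate(anchor, test):
    """Average bitrate difference (%) of test vs. anchor at equal quality.
    Each input needs at least four (bitrate, quality) points for the cubic fit."""
    r_a, q_a = zip(*anchor)
    r_t, q_t = zip(*test)
    p_a = np.polyfit(q_a, np.log(r_a), 3)   # fit log-rate as a cubic in quality
    p_t = np.polyfit(q_t, np.log(r_t), 3)
    lo = max(min(q_a), min(q_t))            # integrate over the overlapping range
    hi = min(max(q_a), max(q_t))
    int_a = np.diff(np.polyval(np.polyint(p_a), [lo, hi]))[0]
    int_t = np.diff(np.polyval(np.polyint(p_t), [lo, hi]))[0]
    return (np.exp((int_t - int_a) / (hi - lo)) - 1) * 100  # negative = savings

# Usage: bd_rate(rd_convex_hull(av1_points), rd_convex_hull(av2_points))
```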

The testing subgroup used a broad set of quality metrics to evaluate AV2 performance, including PSNR, SSIM, VMAF, and CAMBI for banding artifacts. Most metrics were calculated using Netflix’s VMAF toolset, with weighting applied to better reflect perceptual differences across YUV channels.
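As a concrete example of that channel weighting, a single PSNR-YUV figure is typically a luma-weighted average of the per-plane scores. The 6:1:1 weights below are an assumption on my part (they match JVET’s convention); AOM’s tooling may weight differently.

```python
def psnr_yuv(psnr_y, psnr_u, psnr_v, weights=(6, 1, 1)):
    """Luma-weighted PSNR average; 6:1:1 is an assumed, JVET-style weighting."""
    wy, wu, wv = weights
    return (wy * psnr_y + wu * psnr_u + wv * psnr_v) / (wy + wu + wv)
```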

All Intra Results

Figure 10. All Intra results.

All Intra encodes every frame as a standalone image, without referencing past or future frames. Though not used for streaming, All Intra results are still useful in understanding codec performance. This configuration isolates spatial compression tools like intra prediction, transforms, and entropy coding. These tools are used in all encoding modes, so performance gains here usually translate to improvements elsewhere.

In the AVM v10.0 tests, AV2 delivered a 23.6 percent bitrate reduction in VMAF over the AV1 baseline. The table breaks down results across content types, showing consistent gains of 15 to 25 percent for most classes. HDR and screen content classes saw even larger reductions, with B2 (screen content) reaching roughly 38 percent in both PSNR-Y and SSIM-Y.

The graph to the right shows how VMAF-based BD-Rate improved over successive AV2 encoder versions, from v2.0 through v10.0. Each bar represents a release, and the steady downward trend highlights ongoing refinement of AV2’s toolset. The results validate that the core intra tools are not only effective, but also improving steadily across development cycles.

Random Access Results

Figure 11. Random access results.

The Random Access (RA) configuration models typical encoder behavior in video-on-demand scenarios. It enables both forward and backward prediction, which is common in offline encoding pipelines where latency is not a constraint. While it doesn’t fully simulate adaptive streaming environments, it still offers a consistent baseline for evaluating coding efficiency with bidirectional prediction enabled.

In the AVM v10.0 test, AV2 showed an overall 33.7 percent bitrate reduction for the same perceptual quality (as measured by VMAF) compared to the AV1 baseline. The table on the left breaks down these results by content class. Compression gains were consistent across the board, with reductions around 30 percent for most natural video and even larger gains for screen content (over 41 percent for B2) and HDR (up to 42.8 percent for Class HDR1).

The values highlighted in red show the overall averages across all test categories. These are summary figures for the entire configuration and capture AV2’s performance across a diverse set of content types, resolutions, and color formats. As in the previous slide, the graph on the right tracks how VMAF-based BD-Rate has improved across successive AV2 reference releases.

Adaptive Streaming Results

Figure 12. Adaptive streaming results.

The Adaptive Streaming configuration models how real-world services encode content into multiple resolutions for bitrate ladders. As described above, in this test, 4K video sequences from the A1 class were downscaled to six resolutions, then encoded using the same settings as Random Access mode. After decoding, the streams were upscaled back to 4K and compared against the original.

This method captures how well a codec maintains quality when streaming clients switch between resolutions based on bandwidth. The results show strong performance. AV2 achieved an average bitrate reduction of 30.96 percent in PSNR-YUV and 35.71 percent in VMAF compared to AV1. These numbers, highlighted in red, represent the average across the full range of sequences and resolutions, offering a realistic summary of expected gains in adaptive environments.

The chart on the right tracks how BDRATE scores have improved across AVM versions, with v10.0 showing the strongest performance to date.

Lei also shared preliminary results from AVM v11.0, based on a newly updated anchor released just a few weeks prior. Compared to v10.0, the updated encoder delivers an additional 0.3 percent bitrate reduction in Random Access and 0.7 percent in Low Delay configurations. While the gains are small, they highlight the continued refinement of AV2 as the codec nears completion.

Extended Color Format (ECF) Testing

Figure 13. Extended Color Format (ECF) testing.

Lei then introduced the Extended Color Format (ECF) test results (Figure 13). These tests evaluate AV2’s performance on high-fidelity content that extends beyond the typical 4:2:0 streaming use case. This includes professional and broadcast workflows, as well as HDR content that uses 10-bit depth and broader color gamuts.

The ECF test set includes 34 sequences across six content classes. These cover a mix of YUV 4:4:4 and 4:2:2 material, screen content, and a range of resolutions from 640×360 up to 4K. The sequences span SDR, HDR, RGB, and YCoCg formats, with frame rates from 15 to 120 fps. This setup is designed to stress the codec’s ability to handle richer chroma, higher dynamic range, and non-standard content types not typically encountered in web video.
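For readers unfamiliar with YCoCg, it’s a lightweight, reversible alternative to YCbCr built entirely from adds and shifts. Here’s a minimal sketch of the lossless YCoCg-R variant; whether the ECF tests use exactly this variant isn’t specified in the talk.

```python
def rgb_to_ycocg_r(r, g, b):
    """Lossless (YCoCg-R) forward transform using only adds and shifts."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg

def ycocg_r_to_rgb(y, co, cg):
    """Exact inverse: recovers the original integers bit-for-bit."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b

assert ycocg_r_to_rgb(*rgb_to_ycocg_r(100, 150, 50)) == (100, 150, 50)
```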

Figure 14. Extended Color Format (ECF) test results.

Lei presented test results for extended color format (ECF) sequences using both 4:2:2 and 4:4:4 subsampling in All Intra, Random Access, and Low Delay modes (Figure 14). AV2 delivered strong results across all three modes, with overall bitrate reductions between 24 and 33 percent, depending on configuration and content type. The table breaks down performance across various classes, with results aggregated into mandatory sets and overall averages, highlighted in red.

AOM tested two versions. The left used SDP=1 (Simple Decoding Profile), a constraint mode for simpler decoders, while the right used SDP=0, which allows more complex tools. The results show slightly better efficiency when these constraints are removed, as expected.

Subjective Quality Testing

Figure 15. Subjective quality testing.

Lei closed with early subjective testing conducted internally at Google on UHD content. In this test, human viewers rated compressed videos using an 11-point DCR (Degradation Category Rating) method. The results showed strong alignment with objective measures: AV2 achieved an average 38 percent bitrate reduction over AV1 for similar perceived quality. Individual clips showed savings as high as 50 percent.

Here’s Lei’s presentation on YouTube.

Film Grain Synthesis

The final presentation was by Li-Heng Chen, a video encoding specialist at Netflix, who discussed the company’s deployment of film grain synthesis (FGS) using AV1. His talk focused on the technical challenges and practical strategies Netflix used to implement FGS at scale, including denoising, grain parameter estimation, decoder-side rendering, and adaptive streaming constraints. He walked through the FGS model’s use of autoregressive noise synthesis and piecewise linear intensity scaling, and showed how FGS improves bitrate efficiency and visual quality on noisy content.
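To illustrate those two mechanisms, here’s a simplified, runnable NumPy sketch of autoregressive grain synthesis plus piecewise-linear intensity scaling. The parameters (AR lag, coefficients, scaling points) are illustrative, not the AV1 spec’s actual values, and the real tool generates grain from a deterministic seeded PRNG so the decoder reproduces it exactly.

```python
import numpy as np

rng = np.random.default_rng(1234)  # AV1 FGS uses a deterministic, seeded PRNG

def synthesize_grain(size=64, lag=2, coeff=0.08):
    """Grow a grain template with a causal autoregressive filter."""
    n_taps = lag * (2 * lag + 1) + lag          # causal neighbors in the window
    ar = np.full(n_taps, coeff)                 # illustrative coefficients
    g = np.zeros((size + lag, size + 2 * lag))  # zero-padded working buffer
    for y in range(lag, size + lag):
        for x in range(lag, size + lag):
            taps = []
            for dy in range(-lag, 1):
                for dx in range(-lag, lag + 1):
                    if dy == 0 and dx >= 0:     # stop at the current pixel
                        break
                    taps.append(g[y + dy, x + dx])
            g[y, x] = rng.normal() + ar @ np.array(taps)
    return g[lag:, lag:size + lag]              # trim the padded border

def apply_grain(luma, grain, points_x, points_y):
    """Scale grain by a piecewise-linear function of local intensity."""
    scale = np.interp(luma, points_x, points_y)  # e.g., strongest in midtones
    return np.clip(luma + scale * grain, 0, 255)

luma = np.full((64, 64), 128.0)                  # flat gray test patch
grainy = apply_grain(luma, synthesize_grain(),
                     points_x=[0, 64, 128, 192, 255],
                     points_y=[0.0, 1.5, 3.0, 1.5, 0.5])
```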

While the presentation centered on AV1, its particular relevance to AV2 is that it confirms the continued importance of film grain synthesis in modern video pipelines. Chen noted that AV2 retains FGS as a mandatory tool, with improvements underway in areas like grain randomness and smaller block support. This supports AV2’s goal of delivering artifact-free reproduction of film-style content, especially at higher resolutions.

Here’s Chen’s presentation.

What We Still Don’t Know About AV2

  • Encoder complexity: No benchmarks or comments were provided. Encoding cost remains a major unknown, especially for real-time or large-scale use.

  • Decoder performance in software: No benchmarks or comments were provided. Decoder performance dictates usability in computers and mobile devices. The fact that it wasn’t addressed was remarkable.

  • Decoder performance in hardware: While all tools were reviewed for hardware feasibility, no actual silicon implementations or power/performance estimates were shared.

  • HDR support: Norkin mentioned possible extensions for higher bit depths, but made no reference to HDR formats like PQ or HLG, or how AV2 might handle tone mapping or metadata. This lack of clarity is particularly noticeable now that AV1 has reached full HDR10+ delivery support. For example, Netflix recently began streaming AV1 with HDR10+ to certified devices, marking a major milestone in AV1’s evolution. More on that here.

  • Licensing and IP risk: No discussion of royalty status or patent exposure. The reliance on a 35-year-old hybrid block-based framework raises familiar concerns.

  • Toolset profiles: It’s unclear whether there will be baseline, main, or constrained tool profiles that simplify implementation.

  • Streaming-specific features: No mention of low-latency modes.

Overall Conclusions

Figure 16. My AV2 deployment timeline.

Here are some final thoughts. Note that I cover much of this in greater detail in a Streaming Media article entitled, AV2 Arriving: What We Know, and What We Don’t Know.

1. Competitive performance. Most comparisons show VVC to be about 30–40% more efficient than AV1. If the current AV2 test results hold, AV2 should close most, if not all, of that gap. However, it is unlikely to significantly outperform VVC in raw compression efficiency.

2. Historically, new codecs only succeed when they open or stabilize new markets. H.264 unlocked mass-market streaming and mobile video. HEVC followed with support for 4K and HDR at scale. By contrast, formats like VP9, AV1, and even VVC have seen only partial or regional uptake. They improve efficiency but do not open clearly defined new opportunities. AV2 will face the same challenge.

While AV2 introduces support for multi-layer and overlay tools, it remains unclear whether these features will drive adoption in practical deployments. If, however, AV2 can deliver high-quality, low-bitrate encoding for mobile networks or UGC platforms without hardware support, it may follow the same path AV1 has taken with short-form UGC video.

But if it requires hardware and HDR to be competitive for premium content, it will enter a long queue of formats waiting for industry-wide support. Even Netflix may be slow to adopt it at scale, since HEVC and AV1 are already deeply integrated and widely deployed. AV2 would need a specific gain, use case, or business trigger to justify replacing both.

3. UGC is always first. YouTube started deploying AV1 when (at least for the rest of us) encoding times were hundreds if not thousands of times longer than HEVC’s. Of course, this approach relies on software playback. Remarkably, neither Norkin nor Lei provided any details regarding either encoding time or decoding complexity, so it’s tough to tell how close UGC deployment is.

4. Premium content needs hardware playback. Best case, a critical mass of AV2 hardware in the living room is 5–7 years away.

5. AV2 may slow AV1 adoption. There is also a risk that AV2’s arrival could slow AV1 adoption among premium platforms, similar to how AV1 may have delayed VP9. If companies like Disney or Amazon decide to wait for AV2, AV1’s role could remain limited to mobile and internal workflows. In that scenario, HEVC could retain its place for premium streaming for another five years, even if better options exist.

6. All codecs beyond H.264 are encumbered by pools claiming royalties on content. While it seems unlikely that a codec based on 35-year-old technology will avoid this, hope springs eternal. Of course, hope isn’t an IP strategy; check with your IP attorneys.

7. VVC has a head start in Brazil (with LCEVC) and momentum in China, though China’s efforts are centered more on AVS3. Netflix, YouTube, and Meta are outliers in the breadth of their codec support; most other major UGC and premium content services support far fewer codecs. Many UGC services support only H.264, while most premium shops support only H.264 and HEVC. It’s challenging to predict a world where many primary services support more than three codecs.

8. At some point, AI codecs will have to disrupt the streaming compression market. Norkin’s comment about 35-year-old technology highlights the fact that the video codec market is long past due for a technical and IP disruption. Neural processing units (NPUs) for AI-based processing have appeared in smartphones since 2017, in computers since 2023, and in Smart TVs starting in 2025. It’s much more likely that a low-end Android phone will feature an NPU than a dedicated hardware decoder. A coordinated effort to standardize NPUs could open the door for an AI-based codec much sooner than AV2.

At a large scale, the cost of encoding is rarely the problem. Once hardware support is in place, even complex formats can be adopted for high-value content. But reaching that point requires more than compression gains. It takes compelling use cases, aligned incentives, and a clear return on effort. Whether AV2 will meet that bar remains to be seen.

About Jan Ozer

I help companies train new technical hires in streaming media-related positions; I also help companies optimize their codec selections and encoding stacks and evaluate new encoders and codecs. I am a contributing editor to Streaming Media Magazine, writing about codecs and encoding tools. I have written multiple authoritative books on video encoding, including Video Encoding by the Numbers: Eliminate the Guesswork from your Streaming Video (https://amzn.to/3kV6R1j) and Learn to Produce Video with FFmpeg: In Thirty Minutes or Less (https://amzn.to/3ZJih7e). I have multiple courses relating to streaming media production, all available at https://bit.ly/slc_courses. I currently work at NETINT (www.netint.com) as a Senior Director of Marketing.


2 comments

  1. AV2 looks impressive, but it seems like it’s ready for a cocktail party, not a dance floor – it still needs work on its encoder complexity! While the performance gains are tempting, the lack of hardware decoder benchmarks and HDR details leaves us wondering if AV2 is just showing off its moves without telling us how it’s done. And speaking of moves, the fact that it’s sticking to the hybrid block-based model is like showing up to a tech party in jeans – familiar, but not exactly exciting. Let’s wait and see if AV2 can finally break a leg and impress us beyond the initial performance numbers.

  2. As a non-technical person, I found this very informative. My only contribution is speculation about “AI codecs.” It was interesting that Tencent’s internal, non-standardized codec, developed with the help of AI, did better than AV1 in the latest MSU tests: both faster to encode and more efficient in compression. But I did not see anything about decode speed.
