Three recent Alliance for Open Media presentations on YouTube shed new light on AV2’s performance and utility.
- Andrew Norkin, Director of Codec Development at Netflix, presented the current status and architecture of AV2. He outlined the codec’s design goals, early performance results, and hardware-focused development approach, noting that the low-level toolset is now finalized. The YouTube video is here.
- Ryan Lei, Video Compression Engineer at Meta, shared AV2 performance data generated under AOM’s Common Test Conditions. His talk covered test configurations, performance metrics, and recent results across adaptive streaming, HDR, and extended color formats. The YouTube video is here.
- Li-Heng Chen, Software Engineer at Netflix, discussed the deployment of AV1’s Film Grain Synthesis tool. While centered on AV1, the talk confirmed that AV2 will retain FGS as a mandatory feature and introduced enhancements relevant to next-gen codec development. The YouTube video is here.
I present them in order with a combined conclusion at the end.
For performance results, Norkin presented a single slide, while Lei presented All Intra, Random Access, Adaptive Bitrate Streaming, and Subjective Results. Jump to his presentation if that’s your primary concern. Note that all performance data compared AV2 to AV1, with no reference to HEVC or VVC.
AV2 Codec Architecture, Presented by Netflix’s Andrew Norkin
This section comes from Norkin’s YouTube video, which is available here and embedded below.
AV2 Introduction and Timeline

Norkin began his presentation by outlining the timeline and motivation behind AV2. He recapped AOM’s formation in 2015 and the 2018 release of AV1, a codec built from several open technologies, including VP10, Daala, and Thor. AV1 has since been widely adopted by major streaming platforms.
In 2020, AOM began work on the next-generation video codec that would become AV2, designed to significantly improve compression efficiency and expand the toolset to support a broader range of use cases, including low-bitrate streaming, screen content, and layered video.

Norkin emphasized that over the past four years, the codec working group evaluated dozens of proposed tools for AV2, applying strict criteria around complexity and implementation feasibility. He noted that hardware decoding concerns were a consistent focus, with companies like AMD and Realtek participating specifically to assess hardware readiness. While he did not comment on licensing or IP strategy, the involvement of major tech and streaming platforms, including companies that are also active in VVC development, like Tencent and Alibaba, suggests that AV2 is being seriously considered across the industry.
Analysis: The breadth of participation indicates that AV2 is not just a research project, but a codec under serious consideration by companies with significant influence over streaming infrastructure and device deployment. This includes Amazon and Apple, both of which deliver streaming hardware devices and premium content. Notably, neither has publicly adopted AV1 for premium content, making their participation in AV2 potentially noteworthy if for content and not exclusively for devices. Involvement from VVC-aligned companies also suggests that many platforms are hedging, planning to support multiple formats depending on use case or geography. Whether that turns into real-world deployment will likely depend on the final spec, IP clarity, and available encoder/decoder support.
Norkin described AV2 as a new codec built around an expanded toolset and significantly higher compression efficiency. He did not explicitly state that hardware would be required, but the repeated emphasis on decoding complexity, combined with feedback from hardware implementers, suggests that AV2 is being designed with hardware acceleration in mind.
Analysis: This is consistent with how next-gen codecs are typically developed: software reference implementations prove out the tools, but real-world adoption, especially in mobile and consumer devices, often depends on custom silicon. AV2 will benefit from hardware, and for some use cases, particularly high-res or low-power environments, it may be essential.
AV2 Performance Results: Part I

Norkin shared the most recent performance results for AV2, based on version 11.0.0 of the AVM reference software. He noted that nearly all low-level coding tools have now been finalized, with remaining work focused on high-level syntax. The results compare AV2 against a modified AV1 anchor across multiple configurations, including all-intra (AI), low-delay (LD), and random access (RA). In the RA configuration, which is the most representative for streaming, AV2 showed a 28.6% bitrate reduction for equivalent PSNR-YUV and a 32.6% reduction based on VMAF.
Analysis: These are substantial gains over AV1 and suggest that AV2 is approaching a level of maturity suitable for early testing. Random access is the most representative configuration for streaming use cases, so improvements in this mode are especially relevant for commercial deployments. While objective metrics like PSNR and VMAF do not fully capture perceptual quality, the consistency across metrics supports the claim that AV2 offers meaningful efficiency gains.
Norkin did not address encoding complexity, which will be a critical factor for adoption. AV1’s initial encoding cost limited its use in many real-time or large-scale workflows, and AV2 introduces additional tool complexity. Whether these gains can be delivered at acceptable encoding speeds remains an open question. On the decoding side, the emphasis on hardware review confirms that AV2 is being shaped with hardware support in mind, but actual decoder implementations have not yet been benchmarked publicly.
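Figures like “28.6% bitrate reduction for equivalent PSNR” are typically Bjøntegaard-delta-style comparisons: interpolate each codec’s rate-distortion curve, then average the bitrate gap over the quality range both curves cover. Here is a minimal sketch of that idea using piecewise-linear interpolation in the log-bitrate domain (real BD-rate tooling uses polynomial fits); the RD data points are invented for illustration.

```python
import math

def log_rate_at(quality, curve):
    """Linearly interpolate log(bitrate) at a given quality on an RD curve."""
    pts = sorted(curve, key=lambda p: p[1])  # sort by quality; points are (kbps, quality)
    for (r0, q0), (r1, q1) in zip(pts, pts[1:]):
        if q0 <= quality <= q1:
            t = (quality - q0) / (q1 - q0)
            return math.log(r0) + t * (math.log(r1) - math.log(r0))
    raise ValueError("quality outside curve range")

def avg_bitrate_saving(anchor, test, n=100):
    """Mean bitrate change of `test` vs. `anchor` over their shared quality range."""
    lo = max(min(q for _, q in c) for c in (anchor, test))
    hi = min(max(q for _, q in c) for c in (anchor, test))
    qs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    diffs = [log_rate_at(q, test) - log_rate_at(q, anchor) for q in qs]
    return math.exp(sum(diffs) / len(diffs)) - 1  # negative = bitrate savings

# Hypothetical RD points: (bitrate_kbps, PSNR_dB). The "av2" curve is
# constructed to be exactly 30% cheaper at every quality level.
av1 = [(1000, 36.0), (2000, 39.0), (4000, 42.0), (8000, 44.5)]
av2 = [(700, 36.0), (1400, 39.0), (2800, 42.0), (5600, 44.5)]
print(f"{avg_bitrate_saving(av1, av2):+.1%}")
```

The log-domain averaging matters: it treats a halving and a doubling of bitrate symmetrically, which is why BD-rate numbers are quoted as percentages rather than absolute kbps.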
Note that Ryan Lei shared a lot more performance data in his discussion, which I present below.
AV2 Framework and Tools

Norkin walked through the overall AV2 codec framework, noting that while many individual tools have evolved, the fundamental architecture remains consistent with the hybrid block-based approach used for decades. As he put it, the codec is “basically a typical hybrid block-based model that has… existed for at least… 35 years.” The flow includes standard components like block partitioning, intra and inter prediction, transforms, quantization, coefficient and entropy coding, and in-loop filtering. He emphasized that the high-level structure, from prediction through reconstruction, remains largely unchanged from earlier generations of video codecs.
Norkin also highlighted other targeted use cases, including screen content tools like palette modes and inter-block copy, stereo video, and support for multi-layer or atlas-based video compositions. These features aim to improve coding efficiency and flexibility for complex visual experiences, such as overlays, spatial video, or layered UI elements.
Analysis: From an IP perspective, nothing signals patent exposure quite like a model that, as Norkin put it, has “existed for at least 35 years.” The hybrid block-based design has been the basis for virtually every major codec since the 1990s and has been the subject of extensive litigation and licensing (see here). By staying within this well-trodden framework, AV2 benefits from engineering familiarity and hardware compatibility, but it also re-enters a space that is densely populated with existing claims. Whether AV2 avoids the same IP minefield that has challenged other formats remains an open question.
Deep Dive into AV2 Tools
At this point, Norkin started discussing the individual tools. Here’s the video, queued to that section.
Norkin’s Conclusion

In his closing remarks, Norkin confirmed that AV2’s low-level toolset is essentially finalized. Work now shifts to high-level syntax and specification writing, with a formal release expected by the end of 2025.
The most recent AVM version shows substantial bitrate savings over AV1 as measured by VMAF under random access configurations. These results, while promising, still come with open questions about encoder complexity, hardware readiness, and deployment models.
Future work will focus on software speed improvements, encoder-side tuning for visual quality, and possible extensions to support higher bit depth content or AI-based profiles. Norkin framed these results as a sign that the codec is ready for testing, if not yet deployment.
AV2 Common Test Conditions
The next video captures Ryan Lei’s presentation on AV2 Common Test Conditions and test results. By way of background, Lei was instrumental in deploying AV1 for Facebook Reels in 2023.

Lei began by walking through the evaluation framework used to assess AV2 coding tools. As shown in Figure 6, the AOM Testing Subgroup is responsible for defining common test conditions, which include test sequences, encoding configurations, performance metrics, and the infrastructure needed to run evaluations consistently across proposals. The current version of these conditions is version 7.0, which was formalized under document CWG-E083. That framework has recently been extended to support 4:2:2 and 4:4:4 color formats, expanding testing beyond traditional 4:2:0 use cases.
Ryan noted that the group conducts regular testing with each release of the AVM reference encoder, including both full evaluations and “tools on/off” tests that isolate the impact of specific coding features. Two major anchor releases were highlighted: version 10.0 in June 2025 and version 11.0 in September 2025. In addition to objective metrics like PSNR and VMAF, the subgroup is also working on a plan to introduce subjective testing, where human viewers rate the perceptual quality of encoded video.

Test Descriptions
Lei presented the latest version of the test sequences used to evaluate AV2 tools under the AOM Common Test Conditions. The current set includes 91 videos and 51 images, covering a wide range of formats and use cases. These are grouped into categories such as high-resolution video (including 4K and 8K), lower-resolution mobile formats, synthetic content for gaming and screen sharing, and still images at multiple resolutions.
Two new classes have been added to account for HDR content, using BT.2100 color space with a PQ transfer function. Another class includes user-generated content, such as handheld or action camera footage. This expanded test set allows the group to measure codec performance across a broader spectrum of content types, from cinematic video to noisy real-world captures.

Lei then explained the different encoding configurations used in AV2 testing. All results are based on normative-only encoding, meaning non-standardized tools like two-pass encoding, adaptive quantization, and keyframe filtering were disabled. These features can significantly improve quality and efficiency in production workflows, but are excluded by AOM for this testing to ensure a controlled, codec-level comparison. Since the same constraints apply to AV1, the gains shown reflect the impact of AV2’s toolset alone, not complete encoder optimization.
There are five primary encoding modes: All Intra (AI), Random Access (RA), Low Delay (LD), Adaptive Streaming (AS), and Still Image. AI is used for testing keyframes and stills by encoding the first 30 frames of each sequence as intra-only. RA uses a closed GOP structure with five hierarchical layers and tests long-form streaming scenarios. LD is similar but has only one keyframe and no future references, modeling low-latency use cases. AS downscales 4K sequences into five resolutions for streaming evaluation.
Each configuration is encoded using constant quality mode, with fixed QP values across hierarchical layers. A table of QP values was provided for reference.

To better evaluate streaming performance, the team introduced an adaptive streaming configuration. Each 4K sequence from the A1 class was downscaled into six lower resolutions using Lanczos 5 filtering. These versions were then encoded, decoded, and upscaled back to 4K for quality assessment. Metrics like PSNR and VMAF were calculated against the original 4K resolution to simulate real-world playback scenarios. This approach helps model how well AV2 handles resolution switching and upscaling in adaptive bitrate ladders.
To evaluate performance across bitrates, the team generated additional rate-distortion (RD) points through bilinear interpolation. This helped smooth out the quality curve and better reflect the codec’s performance across the full bitrate range. For each resolution, 41 total data points were used. These points were then used to construct convex hulls, which represent the most efficient quality-bitrate tradeoffs. BDRATE calculations were performed between hulls from different tool configurations to isolate the coding gain of individual AV2 features. This method enables more accurate comparisons between candidate tools and previous baselines, particularly in adaptive streaming scenarios.
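The convex-hull step can be illustrated directly: pool the (bitrate, quality) points from every resolution in the ladder and keep only the upper envelope, since any point below it is a worse quality-per-bit tradeoff. The sketch below uses a standard monotone-chain upper hull with invented data; it is not the AOM tooling.

```python
def cross(o, a, b):
    """2-D cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def upper_convex_hull(points):
    """Upper envelope of (bitrate, quality) points: best quality per bitrate."""
    pts = sorted(set(points))
    hull = []
    for p in pts:
        # pop points that fall on or below the line to the new point
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull

# Hypothetical RD points from three encode resolutions of one title
ladder = [
    (300, 60.0), (600, 72.0), (1200, 80.0),    # e.g., 720p encodes
    (800, 70.0), (1600, 84.0), (3200, 90.0),   # e.g., 1080p encodes
    (2000, 82.0), (4000, 92.0), (8000, 95.0),  # e.g., 4K encodes
]
hull = upper_convex_hull(ladder)
# dominated points such as (800, 70.0) and (2000, 82.0) fall off the hull
```

BD-rate between two such hulls, rather than between fixed-resolution curves, is what makes the adaptive-streaming comparison realistic: it rewards a codec for whichever resolution wins at each bitrate, just as an ABR ladder would.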
The testing subgroup used a broad set of quality metrics to evaluate AV2 performance, including PSNR, SSIM, VMAF, and CAMBI for banding artifacts. Most metrics were calculated using Netflix’s VMAF toolset, with weighting applied to better reflect perceptual differences across YUV channels.
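Lei did not spell out the exact channel weights, but luma-heavy weighting of per-plane PSNR is a common convention (JVET’s test conditions, for example, use 6:1:1 for Y:U:V). The sketch below assumes a similar scheme purely for illustration.

```python
import math

def psnr(mse, max_val=255.0):
    """PSNR in dB from mean squared error, for 8-bit samples by default."""
    return float("inf") if mse == 0 else 10 * math.log10(max_val ** 2 / mse)

def weighted_psnr_yuv(psnr_y, psnr_u, psnr_v, weights=(6, 1, 1)):
    """Combine per-plane PSNR with a luma-heavy weighting (assumed 6:1:1 here)."""
    wy, wu, wv = weights
    return (wy * psnr_y + wu * psnr_u + wv * psnr_v) / (wy + wu + wv)

combined = weighted_psnr_yuv(40.0, 44.0, 45.0)
# luma dominates: (6*40 + 44 + 45) / 8 = 41.125
```

The rationale is perceptual: viewers are far more sensitive to luma errors than chroma errors, so a straight average across planes would flatter encoders that spend bits on chroma at luma’s expense.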
All Intra Results

Extended Color Format (ECF) Testing

Lei then introduced the Extended Color Format (ECF) test results (Figure 13). These tests evaluate AV2’s performance on high-fidelity content that extends beyond the typical 4:2:0 streaming use case. This includes professional and broadcast workflows, as well as HDR content that uses 10-bit depth and broader color gamuts.
The ECF test set includes 34 sequences across six content classes. These cover a mix of YUV 4:4:4 and 4:2:2 material, screen content, and a range of resolutions from 640×360 up to 4K. The sequences span SDR, HDR, RGB, and YCoCg formats, with frame rates from 15 to 120 fps. This setup is designed to stress the codec’s ability to handle richer chroma, higher dynamic range, and non-standard content types not typically encountered in web video.

Subjective Quality Testing

Lei closed with early subjective testing conducted internally at Google on UHD content. In this test, human viewers rated compressed videos using an 11-point DCR (Degradation Category Rating) method. The results showed strong alignment with objective measures: AV2 achieved an average 38 percent bitrate reduction over AV1 for similar perceived quality. Individual clips showed savings as high as 50 percent.
Here’s Lei’s presentation on YouTube.
What We Still Don’t Know About AV2




AV2 looks impressive, but it seems ready for a cocktail party, not a dance floor – encoder complexity still needs work. While the performance gains are tempting, the lack of hardware decoder benchmarks and limited HDR detail leaves us wondering whether AV2 is showing off its moves without telling us how they’re done. And speaking of moves, sticking with the hybrid block-based model is like showing up to a tech party in jeans – familiar, but not exactly exciting. Let’s wait and see whether AV2 can impress beyond the initial performance numbers.