Evaluating DeepMind C3: A Low-Complexity Neural Codec with Competitive Compression Efficiency

The paper, titled “C3: High-performance and low-complexity neural compression from a single image or video” and authored by a team from Google DeepMind (Hyunjik Kim, Matthias Bauer, Lucas Theis, Jonathan Schwarz, and Emilien Dupont), introduces C3, a neural compression method designed to deliver low decoding complexity while achieving competitive rate-distortion (RD) performance. The authors are recognized for their contributions to neural compression technologies, with expertise in autoencoder and neural field-based models. The motivation for C3 is to overcome a key computational challenge of neural codecs: they typically require substantial decoding resources, which limits their deployment on constrained hardware.

The Big Idea

Neural compression has long promised superior efficiency over traditional codecs, but its high decoding complexity has been a major roadblock to real-world deployment, especially in latency-sensitive applications like streaming and mobile video. C3 tackles this problem directly, showing that state-of-the-art compression performance doesn’t have to come at the cost of heavy decoding demands.

What makes C3 notable is its ability to nearly match VTM (the H.266 reference encoder) on CLIC2020, and to outperform it when the model is adapted per image, while keeping decoding complexity below 3,000 multiply-accumulate operations (MACs) per pixel. It achieves this through architectural and training choices, including a resolution-adaptive entropy model and soft-rounding during optimization, that improve RD performance while keeping the decoder small. This represents a major shift in neural codec design, positioning C3 as a practical, scalable solution for both image and video compression in real-world applications.

Technology Overview

C3 builds upon the COOL-CHIC framework, introducing several key innovations to enhance RD performance while dramatically reducing computational demands:

  • Optimization Improvements:
    • Soft-rounding with annealed temperature for smoother quantization during training.
    • Kumaraswamy noise to improve the approximation of quantization effects.
    • Adaptive learning rates for improved training stability.
  • Model Architecture Refinements:
    • GELU activations for enhanced expressiveness in compact networks.
    • A resolution-adaptive entropy model that adjusts dynamically to different compression settings.
  • Video-Specific Extensions:
    • Extension from 2D to 3D models to capture temporal dependencies.
    • Custom masking strategies to reduce computational overhead for video processing.
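The optimization bullets above can be made more concrete with a small sketch. The article does not give formulas, so the code below uses one soft-rounding form known from the neural compression literature and the standard Kumaraswamy inverse CDF. The function names, the linear annealing schedule, and every constant are illustrative assumptions, not C3's actual implementation.

```python
import math
import random


def soft_round(z: float, temperature: float) -> float:
    """Smooth, differentiable stand-in for round(z) (assumed formulation).

    A large temperature gives a function close to the identity; as the
    temperature approaches 0 the function approaches hard rounding.
    """
    base = math.floor(z)
    delta = z - base - 0.5  # offset from the midpoint of the unit interval
    scale = math.tanh(delta / temperature) / math.tanh(0.5 / temperature)
    return base + 0.5 * scale + 0.5


def kumaraswamy_noise(a: float, b: float, rng: random.Random) -> float:
    """Sample noise in [-0.5, 0.5] from a shifted Kumaraswamy(a, b) distribution.

    Uses the closed-form inverse CDF; a = b = 1 reduces to uniform noise.
    """
    u = rng.random()
    x = (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)
    return x - 0.5


def annealed_temperature(step: int, total_steps: int,
                         t_start: float = 0.3, t_end: float = 0.1) -> float:
    """Linearly anneal the temperature (hypothetical schedule and defaults)."""
    frac = step / max(total_steps - 1, 1)
    return t_start + frac * (t_end - t_start)


if __name__ == "__main__":
    rng = random.Random(0)
    latent = 2.3
    for step in (0, 50, 99):
        t = annealed_temperature(step, 100)
        # Simplified training-time proxy for quantization: soften, then add noise.
        noisy = soft_round(latent, t) + kumaraswamy_noise(2.0, 2.0, rng)
        print(step, round(t, 3), round(noisy, 3))
```

At decode time a codec would use hard rounding. The soft version only exists so that gradients can flow through the quantization step during encoding-time optimization.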

C3 demonstrates state-of-the-art decoding efficiency, achieving RD performance comparable to VTM (H.266 reference) for images and competitive results with leading neural video codecs, all while requiring significantly lower decoding complexity.

Experimental Setup and Results

Experimental Setup

The C3 codec was evaluated using comprehensive experiments designed to measure RD performance, decoding complexity, and computational efficiency. The setup included:

  • Datasets:
    • Kodak: A standard benchmark dataset for image compression containing 24 images at 512×768 resolution.
    • CLIC2020: A diverse dataset of professional images, used to test RD performance at varying resolutions.
    • UVG Video Dataset: High-quality videos used for video codec benchmarking, containing diverse motion and content complexity.
  • Metrics:
    • BD-Rate: Used to quantify RD performance relative to baselines.
    • MACs/pixel: A measure of decoding complexity.
    • PSNR: Assesses pixel-level accuracy.
  • Hardware Configuration:
    • Decoding times were evaluated on Intel Xeon CPUs and NVIDIA GPUs to assess both CPU-only and GPU-accelerated scenarios.
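Two of the metrics above are simple enough to sketch directly. The following is a minimal illustration of PSNR for 8-bit pixel values and of how MACs/pixel adds up for a small per-pixel network. The layer widths are hypothetical and are not C3's architecture; only the Kodak resolution and the 3,000 MACs/pixel bound come from the article.

```python
import math


def mean_squared_error(original, decoded) -> float:
    """Mean squared error between two equal-length sequences of pixel values."""
    if len(original) != len(decoded) or not original:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)


def psnr(mse: float, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means closer to the original."""
    if mse <= 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def dense_macs_per_pixel(layer_widths) -> int:
    """MACs per pixel for a per-pixel MLP: each dense layer costs in * out."""
    return sum(n_in * n_out for n_in, n_out in zip(layer_widths, layer_widths[1:]))


def image_macs(height: int, width: int, macs_per_pixel: int) -> int:
    """Total decoding MACs for one image at a given per-pixel cost."""
    return height * width * macs_per_pixel


if __name__ == "__main__":
    original = [0, 128, 255, 64]
    decoded = [1, 127, 254, 65]
    print("PSNR (dB):", round(psnr(mean_squared_error(original, decoded)), 2))

    # Hypothetical layer widths, chosen only to show the arithmetic.
    print("MACs/pixel:", dense_macs_per_pixel([7, 18, 18, 3]))

    # Upper bound for one 512x768 Kodak image at 3,000 MACs/pixel.
    print("MACs/image:", image_macs(512, 768, 3000))
```

Because MACs/pixel is normalized by resolution, it lets codecs be compared independently of image size, but it does not capture memory traffic or how well an operation maps to a given chip.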

Results

Compression Efficiency

“On CLIC2020, C3 (with a single setting for its architecture and hyperparameters) significantly outperforms COOL-CHICv2 across all bitrates (−22.2% BD-rate) and nearly matches VTM (+1.4% BD-rate). When adapting the model per image, C3 even outperforms VTM (−2.0% BD-rate). To the best of our knowledge, this is the first time a neural codec has been able to match VTM while having very low decoding complexity (below 3k MACs/pixel).”

According to the authors, this represents a breakthrough for neural codecs: C3 not only competes with traditional codecs like VTM but surpasses them when adapted per image, without the heavy decoding cost typically associated with neural models.
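For readers unfamiliar with BD-rate, the figures quoted above are average bitrate differences at equal quality, so a negative value means fewer bits for the same PSNR. The sketch below is a simplified approximation that uses piecewise-linear interpolation of log-bitrate over PSNR; the standard Bjøntegaard calculation fits a smooth curve instead, so results will differ slightly. The function names and the sample RD points are made up for illustration.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x outside interpolation range")


def bd_rate(ref_points, test_points) -> float:
    """Approximate BD-rate of `test` vs `ref` as a fraction.

    Each argument is a list of (bitrate, psnr_db) pairs with distinct PSNR
    values. A result of -0.2 means about 20% fewer bits at equal quality.
    """
    ref = sorted((q, math.log(r)) for r, q in ref_points)
    test = sorted((q, math.log(r)) for r, q in test_points)
    ref_q, ref_lr = zip(*ref)
    test_q, test_lr = zip(*test)

    # Only the quality range covered by both curves can be compared.
    lo = max(ref_q[0], test_q[0])
    hi = min(ref_q[-1], test_q[-1])
    if hi <= lo:
        raise ValueError("curves do not overlap in quality")

    # Trapezoid rule over every knot is exact for piecewise-linear curves.
    knots = sorted({lo, hi, *(q for q in ref_q + test_q if lo < q < hi)})
    area = 0.0
    for q0, q1 in zip(knots, knots[1:]):
        g0 = _interp(q0, test_q, test_lr) - _interp(q0, ref_q, ref_lr)
        g1 = _interp(q1, test_q, test_lr) - _interp(q1, ref_q, ref_lr)
        area += 0.5 * (g0 + g1) * (q1 - q0)
    return math.exp(area / (hi - lo)) - 1.0


if __name__ == "__main__":
    # Made-up RD points: (bits per pixel, PSNR in dB).
    reference = [(0.2, 30.0), (0.4, 33.0), (0.8, 36.0), (1.6, 39.0)]
    candidate = [(0.8 * rate, quality) for rate, quality in reference]
    print(f"BD-rate: {100 * bd_rate(reference, candidate):.1f}%")
```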

Figure 1: This plot compares the BD-rate (compression efficiency) and decoding complexity (MACs/pixel) of C3 against other codecs, including VTM (H.266), COOL-CHIC, and other neural baselines. The two red points represent C3 configurations, highlighting its superior trade-off between RD performance and computational efficiency.

Encoding Complexity

“The encoding of C3 is slow, making it impractical for use cases requiring real-time encoding. Yet, there are several use cases for which paying a significant encoding cost upfront can be justified if RD performance and decoding time are improved. For example, a popular video on a streaming service is encoded once but decoded millions of times.”

While C3 excels in decoding efficiency, its encoding process is computationally intensive, making it unsuitable for real-time encoding scenarios. However, this limitation is mitigated in video-on-demand (VOD) environments, where the same content is encoded once but decoded by millions of users. In such cases, the encoding overhead is an acceptable trade-off for the gains in decoding speed and RD performance.
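The encode-once, decode-many argument is easy to check with back-of-the-envelope arithmetic. The sketch below uses arbitrary compute units and entirely hypothetical costs; it only illustrates how a large one-time encoding cost shrinks per view as the audience grows.

```python
def amortized_cost_per_view(encode_cost: float, decode_cost: float, views: int) -> float:
    """Compute per view when a single encode is shared across many decodes."""
    if views <= 0:
        raise ValueError("views must be positive")
    return encode_cost / views + decode_cost


def break_even_views(encode_a: float, decode_a: float,
                     encode_b: float, decode_b: float) -> float:
    """Views beyond which codec A (costly encode, cheap decode) beats codec B."""
    if decode_a >= decode_b:
        raise ValueError("codec A must decode more cheaply than codec B")
    return (encode_a - encode_b) / (decode_b - decode_a)


if __name__ == "__main__":
    # Hypothetical costs in arbitrary units, not measurements from the paper.
    for views in (1, 1_000, 1_000_000):
        print(views, amortized_cost_per_view(1_000_000.0, 1.0, views))
    print("break-even views:", break_even_views(1000.0, 1.0, 10.0, 2.0))
```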

Decoding Complexity

“We then extend C3 to the video setting, where we match the RD performance of VCT with less than 0.1% of their decoding complexity.”

According to the authors, this highlights C3’s most remarkable achievement: delivering RD performance on par with advanced video codecs like VCT while operating at less than 0.1% of their decoding complexity. This level of efficiency makes C3 highly attractive for deployment on mobile devices, embedded systems, and other resource-constrained environments where decoding power is a key bottleneck.

Detailed Evaluation

  • Compression Efficiency:
    Score: 9/10
    According to the authors, C3 significantly outperforms COOL-CHICv2 and even surpasses VTM in certain scenarios, establishing itself as a leading neural codec in terms of RD performance.
  • Encoding Complexity:
    Score: 4/10
    The encoding process is slow and resource-intensive, making it impractical for real-time applications. However, it’s acceptable for workflows like VOD where encoding is done once but decoding occurs millions of times.
  • Decoding Complexity:
    Score: 9/10
    C3 delivers exceptionally low decoding complexity, matching VCT’s RD performance with less than 0.1% of the computational cost, making it ideal for mobile and embedded systems.
  • Applicability:
    Score: 6/10
    While C3’s decoding efficiency is impressive, its high encoding demands and reliance on AI frameworks limit its applicability in real-time streaming or traditional broadcast environments.
  • Compatibility & Integration:
    Score: 7/10
    C3 integrates well within neural codec pipelines but requires significant adaptations for compatibility with workflows built around traditional codecs like HEVC or AV1.
  • Intellectual Property:
    Score: 4/10
    The paper does not specify licensing terms, creating uncertainty around its potential for commercial deployment.

Scoring Summary

| Category | Score | Weight (%) | Weighted Contribution |
| --- | --- | --- | --- |
| Compression Efficiency | 9 | 30% | 2.7 |
| Encoding Complexity | 4 | 20% | 0.8 |
| Decoding Complexity | 9 | 20% | 1.8 |
| Applicability | 6 | 15% | 0.9 |
| Compatibility & Integration | 7 | 10% | 0.7 |
| Intellectual Property | 4 | 5% | 0.2 |
| Total Score | | | 7.1/10 |
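The total is a plain weighted average of the category scores. As a quick check of the arithmetic, the following sketch recomputes it from the scores and weights listed in the summary; the variable and function names are just for illustration.

```python
# (score out of 10, weight in percent), copied from the scoring summary.
SCORES = {
    "Compression Efficiency": (9, 30),
    "Encoding Complexity": (4, 20),
    "Decoding Complexity": (9, 20),
    "Applicability": (6, 15),
    "Compatibility & Integration": (7, 10),
    "Intellectual Property": (4, 5),
}


def weighted_total(scores) -> float:
    """Weighted average of category scores; weights must sum to 100%."""
    total_weight = sum(weight for _, weight in scores.values())
    if total_weight != 100:
        raise ValueError(f"weights sum to {total_weight}, expected 100")
    return sum(score * weight for score, weight in scores.values()) / 100


if __name__ == "__main__":
    for name, (score, weight) in SCORES.items():
        print(f"{name}: {score * weight / 100:.1f}")
    print(f"Total: {weighted_total(SCORES):.1f}/10")
```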

Summary & Recommendations

C3 represents a major advancement in neural compression, achieving a balance between compression efficiency and decoding simplicity that is rarely seen in the field. Its ability to match or exceed VTM’s performance while operating at a fraction of the decoding complexity makes it a strong candidate for real-world deployment in areas where decoding efficiency is critical.

However, C3’s high encoding complexity remains a bottleneck, restricting its utility in real-time applications.

About Jan Ozer

I help companies train new technical hires in streaming media-related positions; I also help companies optimize their codec selections and encoding stacks and evaluate new encoders and codecs. I am a contributing editor to Streaming Media Magazine, writing about codecs and encoding tools. I have written multiple authoritative books on video encoding, including Video Encoding by the Numbers: Eliminate the Guesswork from your Streaming Video (https://amzn.to/3kV6R1j) and Learn to Produce Video with FFmpeg: In Thirty Minutes or Less (https://amzn.to/3ZJih7e). I have multiple courses relating to streaming media production, all available at https://bit.ly/slc_courses. I currently work at www.netint.com as a Senior Director in Marketing.
