Few-Shot Domain Adaptation for Learned Image Compression

The white paper, authored by researchers from the University of Science and Technology of China, introduces a novel approach to addressing the limitations of pre-trained learned image compression (LIC) models. These models, while effective within the domains they were trained on, often falter when applied to new, domain-specific data. To overcome this, the researchers propose lightweight adapters that allow efficient domain adaptation while preserving compression performance. This approach aims to balance computational feasibility with improved adaptability for real-world applications, such as medical imaging and gaming.

Technology Overview

The researchers present two types of adapters: Conv-Adapters and LoRA-Adapters, which modify latent channel energy distributions to optimize the pre-trained LIC models for specific domains. These adapters are modular and architecture-agnostic, making them applicable to a range of existing LIC frameworks. The training process involves two stages: an initial joint optimization of all adapters followed by fine-tuning for the specific domain, ensuring both generalization and domain specificity.
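To make the adapter idea concrete, here is a minimal sketch of a generic LoRA-style low-rank update applied to one frozen weight matrix. It is not the paper's implementation: the layer sizes, rank, `alpha` scaling, and the helper names `matmul` and `lora_weight` are all assumptions chosen for illustration, and plain Python lists stand in for the tensors a real framework would use.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, the low-rank-adapted weight."""
    r = len(a)                      # rank = number of rows in A
    delta = matmul(b, a)            # (d_out x r) @ (r x d_in) -> d_out x d_in
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# Hypothetical layer sizes and rank, chosen only for illustration.
d_out, d_in, rank = 64, 64, 2
rng = random.Random(0)

w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]     # frozen pre-trained weight
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]   # trainable, small random init
b = [[0.0] * rank for _ in range(d_out)]                               # trainable, zero init

adapted = lora_weight(w, a, b, alpha=8)

base_params = d_out * d_in               # parameters in the frozen layer
adapter_params = rank * (d_in + d_out)   # extra parameters added by A and B
overhead = adapter_params / base_params
print(f"adapter adds {adapter_params} parameters ({overhead:.2%} of the layer)")
```

Because `B` starts at zero, the adapted layer initially behaves exactly like the pre-trained one, and only the small `A` and `B` matrices would be trained for a new domain. That is the general reason adapters of this kind add few parameters relative to the base model.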

Figure 1. Integration of Conv-Adapters and LoRA-Adapters into the LIC pipeline.

Evolution of the Technology

Traditionally, codecs like JPEG, H.264, and HEVC/x265 have relied on hand-crafted algorithms, achieving decades of success but facing limitations in adapting to specific domains. Modern LIC models, such as Cheng2020 and ELIC, surpass these traditional approaches in compression efficiency but struggle with domain-specific data.

Prior attempts at domain adaptation, including DANICE, paved the way for techniques like the proposed adapters. However, the researchers’ method advances the field by introducing lightweight and architecture-agnostic solutions that integrate seamlessly into existing LIC workflows without requiring substantial computational resources.

How the Researchers Tested

The researchers evaluated their approach using five datasets: SCID (screen content), LROC (craters), GamingVideoSET (gaming videos), pathology images, and pixel art. These datasets represent a diverse set of challenging domains, ranging from high-contrast text and graphics to detailed medical scans.

  • Testing Hardware: The experiments were conducted on NVIDIA Tesla V100 GPUs, a widely used class of data-center hardware.
  • Metrics Used: Key metrics included BD-rate for compression efficiency, PSNR for objective quality, and subjective evaluations to assess visual fidelity.
  • Baseline Comparisons: The models compared included Cheng2020, ELIC, TCM, and MLIC++. The researchers tested the base models and their adapted versions to measure improvements attributable to the adapters.
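
For readers unfamiliar with BD-rate, the sketch below shows the idea behind the metric: the average bitrate difference between two rate-distortion curves over their shared quality range. It is a simplified variant that uses piecewise-linear interpolation of log-rate over PSNR rather than the cubic polynomial fit of the standard Bjøntegaard calculation, and it assumes each curve has distinct PSNR values. The function names and the sample rate/PSNR points are made up for illustration and are not results from the paper.

```python
import math

def _avg_log_rate(psnr, rate, lo, hi):
    """Mean of ln(rate) over [lo, hi], interpolating linearly in PSNR."""
    pts = sorted(zip(psnr, (math.log(r) for r in rate)))

    def interp(x):
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        raise ValueError("PSNR value outside the measured curve")

    # Integration knots: the interval ends plus every measured point inside it.
    xs = [lo] + [x for x, _ in pts if lo < x < hi] + [hi]
    area = sum((interp(p) + interp(q)) / 2 * (q - p) for p, q in zip(xs, xs[1:]))
    return area / (hi - lo)

def bd_rate_linear(rate_ref, psnr_ref, rate_test, psnr_test):
    """Approximate BD-rate in percent; negative means the test codec saves bits."""
    lo = max(min(psnr_ref), min(psnr_test))   # shared quality range
    hi = min(max(psnr_ref), max(psnr_test))
    if hi <= lo:
        raise ValueError("curves do not overlap in PSNR")
    diff = (_avg_log_rate(psnr_test, rate_test, lo, hi)
            - _avg_log_rate(psnr_ref, rate_ref, lo, hi))
    return (math.exp(diff) - 1) * 100

# Made-up sample curves: the "adapted" model needs 20% fewer bits at each PSNR.
psnr_points = [30.0, 33.0, 36.0, 39.0]
base_rates = [0.2, 0.4, 0.8, 1.6]               # bits per pixel, illustrative only
adapted_rates = [r * 0.8 for r in base_rates]

result = bd_rate_linear(base_rates, psnr_points, adapted_rates, psnr_points)
print(f"BD-rate: {result:.1f}%")
```

A negative value means the adapted model reaches the same quality at a lower bitrate, which is how a figure such as a 22% BD-rate reduction should be read.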

Key Findings

Figure 2. RD curves for adapted versus unadapted models, illustrating consistent performance gains.
  • Compression Efficiency:
    The adapters demonstrated an average BD-rate reduction of 22% across the datasets, outperforming HEVC/x265 and, in some cases, rivaling VVC/x266. For example, the adapters excelled in gaming and pathology datasets, preserving intricate textures and colors while reducing bitrates.
  • Encoding Complexity:
    Encoding speed was measured at 35 FPS on NVIDIA Tesla V100 GPUs, with adapters introducing only a 5–10% increase in parameter count. This minimal overhead demonstrates that the method is practical for real-time applications.
  • Decoding Complexity:
    Decoding was efficient, achieving real-time playback (1080p30) on mid-range GPUs and scaling effectively to NPUs. The results indicate broad hardware compatibility, making the approach viable for enterprise and consumer devices.

Detailed Evaluation

    1. Compression Efficiency:
      Score: 9/10
      Rationale: The adapters consistently reduced BD-rate by over 20%, surpassing HEVC/x265 and matching VVC/x266 in certain scenarios, particularly for gaming and pathology datasets.
    2. Encoding Complexity:
      Score: 7/10
      Rationale: Encoding at 35 FPS on Tesla V100 GPUs, with the adapters adding only a modest 5–10% overhead, aligns with the benchmarks for mid-tier hardware efficiency.
    3. Decoding Complexity:
      Score: 8/10
      Rationale: Demonstrated real-time decoding for 1080p30 on consumer GPUs and NPUs, with consistent performance across supported devices.
    4. Applicability:
      Score: 7/10
      Rationale: The method is highly effective for domains such as gaming and medical imaging but requires GPU support, limiting deployment in resource-constrained environments.
    5. Compatibility & Integration:
      Score: 6/10
      Rationale: The adapters are designed for LIC frameworks, requiring an AI codec but functioning as a plug-in. They are incompatible with traditional codecs (e.g., x265 or AV1) without substantial modifications.
    6. Intellectual Property:
      Score: 4/10
      Rationale: Licensing terms are ambiguous, posing potential challenges for commercial deployment.
Category                      Score   Weight   Weighted Contribution
Compression Efficiency        9       30%      2.7
Encoding Complexity           7       20%      1.4
Decoding Complexity           8       20%      1.6
Applicability                 7       15%      1.05
Compatibility & Integration   6       10%      0.6
Intellectual Property         4       5%       0.2
Total Score                                    7.55/10
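
The total in the scorecard is a weighted sum, and the short sketch below reproduces it from the scores and weights listed above. The variable names are illustrative only.

```python
# Category -> (score out of 10, weight in percent), as listed in the scorecard.
scorecard = {
    "Compression Efficiency": (9, 30),
    "Encoding Complexity": (7, 20),
    "Decoding Complexity": (8, 20),
    "Applicability": (7, 15),
    "Compatibility & Integration": (6, 10),
    "Intellectual Property": (4, 5),
}

total_weight = sum(weight for _, weight in scorecard.values())

# Each category contributes score * weight / 100 to the final score.
contributions = {name: score * weight / 100
                 for name, (score, weight) in scorecard.items()}
total = sum(contributions.values())

for name, value in contributions.items():
    print(f"{name:<28} {value:.2f}")
print(f"{'Total Score':<28} {total:.2f}/10")
```

Because the weights sum to 100%, the total stays on the same 10-point scale as the individual scores.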

About Jan Ozer

I help companies train new technical hires in streaming media-related positions; I also help companies optimize their codec selections and encoding stacks and evaluate new encoders and codecs. I am a contributing editor to Streaming Media Magazine, writing about codecs and encoding tools. I have written multiple authoritative books on video encoding, including Video Encoding by the Numbers: Eliminate the Guesswork from your Streaming Video (https://amzn.to/3kV6R1j) and Learn to Produce Video with FFmpeg: In Thirty Minutes or Less (https://amzn.to/3ZJih7e). I have multiple courses relating to streaming media production, all available at https://bit.ly/slc_courses. I currently work at www.netint.com as a Senior Director in Marketing.
