Few-Shot Domain Adaptation for Learned Image Compression

The paper, authored by researchers from the University of Science and Technology of China, introduces a novel approach to addressing the limitations of pre-trained learned image compression (LIC) models. These models, while effective within the domains they were trained on, often falter when applied to new, domain-specific data.

To overcome this, the researchers propose lightweight adapters that allow efficient domain adaptation while preserving compression performance. This approach aims to balance computational feasibility with improved adaptability for real-world applications, such as medical imaging and gaming.

While the paper compares its performance to video codecs like HEVC/x265 and VVC/x266, it focuses specifically on their intra-frame (still image) compression modes, which are commonly used as benchmarks for image compression efficiency. In typical video delivery scenarios, where I-frames might appear as infrequently as once every 60 frames, a 20% gain in I-frame compression won't translate to a 20% overall bitrate reduction. These results are still relevant, however, for applications with high I-frame frequency, keyframe-heavy content, or all-intra coding workflows. Even in streaming delivery, infrequent I-frames can contribute meaningfully to the overall data rate, particularly with shorter GOP structures.
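A back-of-the-envelope calculation shows why an I-frame-only gain shrinks at the stream level. The 5:1 I-frame-to-inter-frame size ratio below is an assumption chosen for illustration, not a figure from the paper:

```python
# Rough estimate: how a 20% I-frame saving scales to the whole stream.
gop = 60                    # one I-frame per 60-frame GOP
i_to_inter_ratio = 5.0      # assumed: an I-frame costs ~5x an average inter frame
inter_frames = gop - 1

# Share of total bits spent on the single I-frame in the GOP
i_share = i_to_inter_ratio / (i_to_inter_ratio + inter_frames)

# A 20% I-frame compression gain, scaled by the I-frame's bit share
overall_saving = 0.20 * i_share

print(f"I-frame bit share: {i_share:.1%}")              # ~7.8%
print(f"Overall bitrate saving: {overall_saving:.1%}")  # ~1.6%
```

Under these assumptions, a 20% I-frame gain yields only about a 1.6% end-to-end bitrate reduction; shorter GOPs or larger I-frames push that share up.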

The Big Idea

Pre-trained learned image compression (LIC) models perform well within their original training domains but struggle with new types of images, causing significant performance drops. This paper introduces a lightweight, plug-and-play solution: adapters that fine-tune compression models to new domains with minimal data and computational effort.

Instead of retraining entire models—which is time-consuming and resource-intensive—these adapters adjust how information is handled inside the model, like rebalancing data flow to fit new image characteristics (e.g., pixel art, medical images, game graphics). The result? Up to 22% better compression efficiency with less than 2% extra computational overhead, matching or even outperforming traditional codecs like HEVC/x265 and VVC/x266 in some cases. This makes domain adaptation feasible for real-world applications without the heavy costs typically associated with retraining.

Technology Overview

The researchers present two types of adapters: Conv-Adapters and LoRA-Adapters, which reallocate information across channels to optimize pre-trained LIC models for specific domains. These adapters are modular and architecture-agnostic, making them applicable to a range of existing LIC frameworks. The training process involves two stages: an initial joint optimization of all adapters followed by fine-tuning for the specific domain, ensuring both generalization and domain specificity.
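To make the LoRA-Adapter idea concrete, here is a minimal plain-Python sketch of the low-rank update W + B·A applied to a frozen weight matrix. The shapes, rank, and values are hypothetical; the paper applies analogous updates inside the convolutional transforms of the LIC model:

```python
def matmul(A, B):
    """Naive matrix multiply for small illustrative matrices."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def lora_adapt(W, A, B, scale=1.0):
    """Return W + scale * (B @ A): frozen weight plus trained low-rank update."""
    delta = matmul(B, A)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Frozen 4x4 weight from the pre-trained model (never updated).
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
# Rank-1 adapter: only A (1x4) and B (4x1) are trained on the target domain.
A = [[0.1, 0.0, 0.0, 0.0]]
B = [[1.0], [0.0], [0.0], [0.0]]

W_adapted = lora_adapt(W, A, B)
full_params = 4 * 4       # parameters a full fine-tune would update
adapter_params = 4 + 4    # parameters the rank-1 adapter trains instead
print(adapter_params / full_params)
```

At these toy sizes the adapter is half the parameter count; at realistic layer widths (hundreds of channels) a low-rank adapter is a small fraction of the full weight, which is where the "fewer than 2% additional parameters" figure comes from.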

Figure 1. Integration of Conv-Adapters and LoRA-Adapters into the LIC pipeline.

Evolution of the Technology

Traditionally, codecs like JPEG, H.264, and HEVC/x265 have relied on hand-crafted algorithms, achieving decades of success but facing limitations in adapting to specific domains. Modern LIC models, such as Cheng2020 and ELIC, surpass these traditional approaches in compression efficiency but struggle with domain-specific data.

Prior attempts at domain adaptation, including DANICE, paved the way for techniques like the proposed adapters. However, the researchers’ method advances the field by introducing lightweight and architecture-agnostic solutions that integrate seamlessly into existing LIC workflows without requiring substantial computational resources.

How the Researchers Tested

The researchers evaluated their approach using five datasets: SCID (screen content), LROC (craters), GamingVideoSET (gaming videos), Pathology Images, and Pixel Art. These datasets represent a diverse set of challenging domains, ranging from high-contrast text and graphics to detailed medical scans.

  • Testing Hardware: Experiments were conducted on NVIDIA Tesla V100 GPUs, ensuring compatibility with widely used high-performance hardware.
  • Metrics Used: Key metrics included BD-rate for compression efficiency, PSNR for objective quality, and subjective evaluations to assess visual fidelity.
  • Baseline Comparisons: Models compared included Cheng2020, ELIC, TCM, and MLIC++, with both base models and their adapted versions tested to measure improvements attributable to the adapters.
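BD-rate summarizes the average bitrate difference between two rate-distortion curves at equal quality. The standard Bjøntegaard calculation fits cubic polynomials through (PSNR, log-rate) points; the sketch below uses a simplified piecewise-linear variant to show the mechanics, and the curves are invented numbers, not results from the paper:

```python
import math

def to_curve(points):
    """Convert (bitrate, psnr) pairs to (psnr, log10(rate)), sorted by PSNR."""
    return sorted((psnr, math.log10(rate)) for rate, psnr in points)

def integrate(curve, lo, hi, steps=1000):
    """Trapezoidal integral of the piecewise-linear curve over [lo, hi]."""
    def interp(x):
        for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
            if x0 <= x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        raise ValueError("x outside curve range")
    h = (hi - lo) / steps
    total = 0.5 * (interp(lo) + interp(hi))
    total += sum(interp(lo + i * h) for i in range(1, steps))
    return total * h

def bd_rate(anchor, test):
    """Average % bitrate change of `test` vs `anchor` at equal PSNR."""
    a, t = to_curve(anchor), to_curve(test)
    lo, hi = max(a[0][0], t[0][0]), min(a[-1][0], t[-1][0])
    avg_log_diff = (integrate(t, lo, hi) - integrate(a, lo, hi)) / (hi - lo)
    return (10 ** avg_log_diff - 1) * 100

# Invented example: `test` spends 20% fewer bits at every quality level.
anchor = [(100, 30.0), (200, 33.0), (400, 36.0), (800, 39.0)]  # (kbps, PSNR dB)
test   = [(80, 30.0), (160, 33.0), (320, 36.0), (640, 39.0)]
print(f"BD-rate: {bd_rate(anchor, test):.1f}%")
```

A negative BD-rate means the test codec needs fewer bits for the same quality, which is why the paper's results are reported as negative percentages.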

Key Findings

Figure 2. This figure shows RD curves for adapted versus unadapted models, illustrating consistent performance gains.
  • Compression Efficiency:
    The adapters demonstrated BD-rate reductions exceeding 20% across several datasets, outperforming HEVC/x265 and, in some cases, matching the performance of VVC/x266. For example, the adapters excelled in gaming and pathology datasets, preserving intricate textures and colors while reducing bitrates. The authors stated, “With 25 target samples, our method achieves BD-rate reductions exceeding 20% across multiple domains, including -27.85% for Pixel, -31.10% for Screen-content, -19.12% for Craters, -14.20% for Game, and -18.52% for Pathology, outperforming existing adaptation methods and matching VVC performance in several cases.”
  • Encoding Complexity:
    Encoding overhead on NVIDIA Tesla V100 GPUs was minimal: the adapters introduce less than a 10% increase in FLOPs and require fewer than 2% additional parameters compared to full model fine-tuning. Specific frame-rate benchmarks, however, were not provided.
  • Decoding Complexity:
    Decoding was efficient, with less than a 1.5% increase in decoding time and under 10% additional FLOPs compared to pre-trained models. This suggests potential for efficient decoding with minimal overhead, though specific benchmarks for real-time playback at 1080p30 or performance on NPUs were not provided.

Detailed Evaluation

  1. Compression Efficiency:
    Score: 7/10
    Rationale: The adapters consistently reduced BD-rate by over 20% in image compression scenarios, surpassing HEVC/x265 and matching VVC/x266 in specific domains like gaming and pathology. However, as the impact is limited to I-frames in video workflows, the overall benefit for video compression is more modest.
  2. Encoding Complexity:
    Score: 6/10
    Rationale: Encoding performance on Tesla V100 GPUs shows strong efficiency, with less than a 10% increase in FLOPs and minimal additional parameters compared to full model fine-tuning. Since this applies only to intra-frame encoding, the broader impact on video encoding is limited.
  3. Decoding Complexity:
    Score: 7/10
    Rationale: The method introduces minimal overhead for decoding, with less than a 1.5% increase in decoding time and under 10% additional FLOPs. While this suggests strong efficiency for still images and I-frames, its effect on overall video decoding is marginal.
  4. Applicability:
    Score: 5/10
    Rationale: The technology is highly effective for still image compression and video applications with frequent I-frames or all-intra coding. Its applicability to typical video delivery scenarios is marginal due to the infrequent use of I-frames, which limits its overall impact on end-to-end video compression efficiency.
  5. Compatibility & Integration:
    Score: 6/10
    Rationale: The adapters integrate well within LIC frameworks but are not directly compatible with traditional video codecs like x265 or AV1 without significant modifications.
  6. Intellectual Property:
    Score: 4/10
    Rationale: The paper does not specify licensing details, which may require further clarification for commercial deployment.
Category                      Score   Weight (%)   Weighted Contribution
Compression Efficiency          7        30%              2.1
Encoding Complexity             6        20%              1.2
Decoding Complexity             7        20%              1.4
Applicability                   5        15%              0.75
Compatibility & Integration     6        10%              0.6
Intellectual Property           4         5%              0.2
Total Score                                              6.25/10
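The weighted total in the scorecard is straightforward arithmetic (score times weight, summed); as a sanity check:

```python
# Recompute the weighted total from the scorecard (score x weight per row).
scorecard = [
    ("Compression Efficiency",      7, 0.30),
    ("Encoding Complexity",         6, 0.20),
    ("Decoding Complexity",         7, 0.20),
    ("Applicability",               5, 0.15),
    ("Compatibility & Integration", 6, 0.10),
    ("Intellectual Property",       4, 0.05),
]
total = sum(score * weight for _, score, weight in scorecard)
print(round(total, 2))  # 6.25
```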

Summary & Recommendations

While the adapters deliver impressive compression gains for still images, their impact on video compression is confined to I-frames. In typical video delivery workflows, where I-frames are infrequent, the overall bitrate reduction is limited. However, the technology holds promise for keyframe-heavy content, all-intra coding scenarios, and domains like medical imaging and gaming where image-based efficiency is critical.

Future research could explore extending these techniques to inter-frame compression or optimizing them for video-specific workflows to broaden their applicability in the video ecosystem.

About Jan Ozer

I help companies train new technical hires in streaming media-related positions; I also help companies optimize their codec selections and encoding stacks and evaluate new encoders and codecs. I am a contributing editor to Streaming Media Magazine, writing about codecs and encoding tools. I have written multiple authoritative books on video encoding, including Video Encoding by the Numbers: Eliminate the Guesswork from your Streaming Video (https://amzn.to/3kV6R1j) and Learn to Produce Video with FFmpeg: In Thirty Minutes or Less (https://amzn.to/3ZJih7e). I have multiple courses relating to streaming media production, all available at https://bit.ly/slc_courses. I currently work at NETINT (www.netint.com) as a Senior Director in Marketing.
