JPEG AI Is Coming: What You Need to Know

This article provides an overview of JPEG AI, which delivers superior compression efficiency and improved performance for AI tasks like object detection.

JPEG AI is an initiative by the Joint Photographic Experts Group (JPEG) aimed at developing a learning-based image coding standard that uses machine learning techniques to achieve superior compression efficiency, offering a compact representation optimized for both human visualization and computer vision tasks. Divided into four parts—core coding system, profiling, reference software, and conformance—JPEG AI is designed to support a wide range of applications, from cloud storage to autonomous vehicles. Using neural network-based compression, JPEG AI promises significant improvements over traditional codecs, though it is not backwards compatible with existing JPEG standards.

Early results from the Verification Model (VM) show impressive compression gains, and ongoing research explores further advancements. While primarily focused on still images, the techniques developed have potential applications in video coding, paving the way for future standards. The standardization process is underway, with the first part expected to be published in October 2024, setting the stage for a new era in image compression.

Standards-Based Origins

Figure 1. JPEG AI is an ISO standard – ISO/IEC DIS 6048-1
Information technology — JPEG AI learning-based image coding system

JPEG AI is being developed as an international standard under ISO/IEC 6048, with the INCITS/JPEG Committee participating in the standardization process. The standard is divided into four parts currently in initial stages of development: Part 1 for the core coding system, Part 2 for profiling, Part 3 for reference software, and Part 4 for conformance.

By applying machine learning, JPEG AI aims to substantially improve compression efficiency over existing image coding standards while enabling efficient distribution and consumption of images across a wide range of applications related to the growing AI ecosystem, including cloud storage, visual surveillance, autonomous vehicles, and media distribution.

How JPEG AI Works

JPEG AI is a new image compression method that uses machine learning to compress images more efficiently than traditional methods.

Figure 2. The multiple steps involved in JPEG AI encoding and decoding.

This process involves several steps:

  • Analysis Transform: The encoder changes the original image into a simpler format (latent representation) using a function known as a nonlinear analysis transform.
  • Quantization: This step reduces the precision of the simplified data to make it more compressible. Unlike traditional methods that often divide the image into blocks before quantizing, JPEG AI applies quantization across the entire image based on learned patterns, optimizing the entire compression process.
  • Entropy Coding: The quantized data is then compressed into a bitstream, a tightly packed sequence of binary data, using a technique that accounts for the predictability of the information, making it even smaller.
  • Synthesis Transform: When it’s time to view the image again, a decoder uses a nonlinear synthesis transform to reconstruct the image from the compressed data.
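
The four steps above can be sketched as a toy pipeline. This is only an illustration under loud assumptions: a fixed average/difference mapping stands in for the learned nonlinear analysis and synthesis transforms, uniform rounding stands in for quantization, and zlib stands in for the entropy coder. None of the function names or parameters come from the JPEG AI specification.

```python
import zlib

def analysis_transform(pixels):
    # Toy stand-in for the learned nonlinear analysis transform:
    # each pair of samples becomes (average, half-difference).
    latent = []
    for a, b in zip(pixels[0::2], pixels[1::2]):
        latent.append((a + b) / 2)
        latent.append((a - b) / 2)
    return latent

def quantize(latent, step):
    # Reduce precision: express each value as a whole number of steps.
    return [round(v / step) for v in latent]

def entropy_encode(symbols):
    # zlib is a stand-in for the entropy coder; like a real entropy coder,
    # it spends fewer bits on predictable data.
    raw = b"".join(s.to_bytes(2, "big", signed=True) for s in symbols)
    return zlib.compress(raw)

def entropy_decode(bitstream):
    raw = zlib.decompress(bitstream)
    return [int.from_bytes(raw[i:i + 2], "big", signed=True)
            for i in range(0, len(raw), 2)]

def synthesis_transform(symbols, step):
    # Toy stand-in for the learned synthesis transform: undo the
    # quantization scaling, then invert the average/difference mapping.
    pixels = []
    for avg, diff in zip(symbols[0::2], symbols[1::2]):
        pixels.append((avg + diff) * step)
        pixels.append((avg - diff) * step)
    return pixels

def encode(pixels, step):
    return entropy_encode(quantize(analysis_transform(pixels), step))

def decode(bitstream, step):
    return synthesis_transform(entropy_decode(bitstream), step)

# A short row of sample values (hypothetical; even length assumed).
row = [10, 12, 200, 202, 50, 54, 90, 90]
bitstream = encode(row, step=2)
restored = decode(bitstream, step=2)
print(len(bitstream), restored)
```

In this toy, a larger step typically yields a smaller bitstream and a larger reconstruction error, which is the trade-off that training balances in a real learned codec.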

The system is trained end-to-end to minimize errors and reduce file size through a rate-distortion loss function, which balances the amount of data lost during compression (distortion) and the size of the compressed image (rate).
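
In its common textbook form, a rate-distortion loss is a weighted sum of the two terms. The notation below is an assumption for illustration; the article does not give the exact loss, distortion metric, or weighting used to train JPEG AI.

```latex
\mathcal{L} = R + \lambda D
```

Here R is the estimated size of the bitstream (rate), D is a measure of the difference between the original and reconstructed image (distortion, for example mean squared error), and λ ≥ 0 sets the trade-off. A larger λ penalizes distortion more heavily, which pushes the model toward higher quality and larger files.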

Object Detection on Compressed Files

Beyond its encoding efficiency, JPEG AI’s technology allows for certain image processing and computer vision tasks to be performed directly on the compressed file. This capability enables operations to be conducted faster and with less computational effort, potentially facilitating new applications that integrate image compression with machine learning techniques.

Specifically, JPEG AI maintains a semantically meaningful representation of the image, allowing for direct performance of tasks such as object detection on the compressed file itself. This feature facilitates faster processing, reduces computational demands, and opens up new applications in environments where computing resources are limited, transforming image compression into a dynamic tool for modern imaging and analysis.

Figure 3. The ability to detect objects without fully decoding is a key feature of JPEG AI (image from this Medium article).

As an example, in traditional scenarios, to perform object detection (identifying and classifying objects within an image), the entire image usually needs to be fully decompressed into pixel data before analysis. However, with JPEG AI, since the compression process incorporates machine learning and maintains semantically meaningful and rich latent representations of the image, these representations can be directly utilized to detect objects without fully decompressing the image.

Skipping full decompression speeds up detection, reduces the computational load, and allows object detection algorithms to run on devices with limited processing power or in environments where computational resources are constrained. The applications considered in the use cases and requirements document include visual surveillance, autonomous vehicles and devices, and mobile applications.

Backwards Compatibility

JPEG AI is designed as a new learning-based image coding standard and is not backwards compatible with existing JPEG standards like JPEG-1, JPEG 2000, or JPEG XL. However, JPEG AI will provide a royalty-free baseline that can be implemented across a wide range of devices and applications.

JPEG AI Verification Model Performance

Early results from the JPEG AI Verification Model (VM) have demonstrated significant compression efficiency improvements over state-of-the-art traditional image coding standards. The JPEG press release describes them as follows:

The current JPEG AI Verification Model (VM) has two operation points, called base and high which include several tools which can be enabled or disabled, without re-training the neural network models. The base operation point is a subset of design elements of the high operation point. The lowest configuration (base operating point without tools) provides 8% rate savings over the VVC Intra anchor with twice faster decoding and 250 times faster encoder run time on CPU. In the most powerful configuration, the current VM achieves a 29% compression gain over the VVC Intra anchor.
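
To make the percentages concrete, here is a small worked example. It assumes that "rate savings" and "compression gain" both mean a smaller file at the same quality, and it uses a hypothetical 1,000,000-byte VVC Intra file as the anchor. Neither the file size nor the function name comes from the press release.

```python
def size_after_savings(anchor_bytes, savings_percent):
    # Integer arithmetic avoids floating-point rounding noise.
    return anchor_bytes * (100 - savings_percent) // 100

anchor_bytes = 1_000_000  # hypothetical VVC Intra file size

# Base operating point without tools: 8% rate savings.
base_bytes = size_after_savings(anchor_bytes, 8)

# Most powerful configuration: 29% compression gain.
high_bytes = size_after_savings(anchor_bytes, 29)

print(base_bytes, high_bytes)  # prints "920000 710000"
```

Figures like these are usually averages over a test set, so the savings on any individual image may be higher or lower.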

What JPEG AI Means for Video

JPEG AI is currently focused on compressing still images, but the same machine learning techniques could also transform video compression. Many of JPEG AI’s core techniques should be easily adaptable for video. Obviously, the semantic representations created by JPEG AI-like models bring similar benefits to video in the same markets, like security and autonomous cars and devices.

However, adapting JPEG AI for video involves challenges like managing motion in scenes, ensuring consistency over time, and meeting the demands of real-time encoding and transcoding. Overcoming these will require more research and could eventually lead to a new standard for machine learning-based video compression. Given that we don’t know when the still image standard will be released, don’t hold your breath on a JPEG AI-based video codec.

JPEG AI Standardization Timeline

JPEG AI is currently in the early stages of development, with the standard expected to be finalized and published in the coming years. In April 2024, the JPEG Committee produced the Draft International Standard (DIS) of JPEG AI Part 1 Core Coding Engine, which is anticipated to be published as an International Standard in October 2024. The same press release noted that work on JPEG AI profiles and levels (Part 2), reference software (Part 3), and conformance (Part 4) has begun, and a new part on the file format (Part 5) is being established. There is no information provided about when these critical Parts will be published and the specification finalized.

Figure 4. At a recent meeting, a JPEG AI demo ran on a Huawei Mate50 Pro with a Qualcomm Snapdragon 8+ Gen1 with high resolution (4K) image decoding, tiling, full base operating point support and arbitrary image resolution decoding.

In addition, the release reported that early implementations of JPEG AI are already emerging, with demos shown running on smartphones like the Huawei Mate50 Pro with a Qualcomm Snapdragon 8+ Gen1 chipset, showcasing high-resolution 4K image decoding, tiling, and arbitrary resolution decoding.

Interestingly, while the Snapdragon SoC powering the Mate50 Pro doesn’t have a neural processing unit (NPU) by name, it does feature the 7th Gen Qualcomm AI Engine, which “enables ultra-advanced AI use cases across the board, operating up to 4x faster than the predecessor—our fastest to date.” These types of ML accelerators (Apple started shipping NPUs in 2017) will help accelerate the deployment of JPEG AI once all the Parts are complete.

Conversely, at this point, we don’t know what it will take to decode JPEG AI on a system without an NPU or equivalent. While this shouldn’t delay closed-loop applications like security or autonomous vehicles, which can simply deploy whatever hardware is required to decode JPEG AI, it may prevent deployments on legacy platforms without ML acceleration hardware.

Summary

JPEG AI is an emerging image coding standard spearheaded by the Joint Photographic Experts Group (JPEG), designed to leverage machine learning techniques for superior compression efficiency. This new standard is tailored for both human visual perception and computer vision applications. The development is structured into four key areas: the core coding system, profiling, reference software, and conformance, ensuring broad applicability from cloud storage to autonomous vehicles.

Employing neural network-based compression, JPEG AI promises significant enhancements over traditional codecs, as demonstrated by early results from the Verification Model (VM). The same techniques have potential uses in video coding, which could shape future standards.

JPEG AI is not backwards compatible with existing JPEG decoders. The only tests released to date were performed on a Snapdragon-equipped Huawei phone with an ML accelerator, so we don’t know if JPEG AI will be deployable on devices without such hardware.

With the first part of the standard anticipated for publication in October 2024, JPEG AI stands to redefine image compression in an AI-centric era. The standard is being developed under the international standard ISO/IEC 6048, reflecting a concerted effort to improve compression across diverse AI-related applications.

About Jan Ozer

I help companies train new technical hires in streaming media-related positions; I also help companies optimize their codec selections and encoding stacks and evaluate new encoders and codecs. I am a contributing editor to Streaming Media Magazine, writing about codecs and encoding tools. I have written multiple authoritative books on video encoding, including Video Encoding by the Numbers: Eliminate the Guesswork from your Streaming Video (https://amzn.to/3kV6R1j) and Learn to Produce Video with FFmpeg: In Thirty Minutes or Less (https://amzn.to/3ZJih7e). I have multiple courses relating to streaming media production, all available at https://bit.ly/slc_courses. I currently work at www.netint.com as a Senior Director in Marketing.
