Deep Render AI Codec Running in FFmpeg and VLC

AI-based video compression has been discussed in research for years, but practical implementations are virtually nonexistent. In a recent conversation with Arsalan Zafar, CTO and co-founder of Deep Render, and Sebastjan Cizel, Head of Engineering, we explored the Deep Render AI codec's real-world encoding and decoding performance, how its quality compares to HEVC, AV1, and VP9, and saw a demonstration of it running in FFmpeg and VLC. You can see the video on YouTube here; it’s also embedded below.

The discussion had three sections. Arsalan began with a presentation covering what an AI codec is and how the Deep Render codec differs from traditional codecs in performance, quality, and operation. Then Sebastjan demonstrated the Deep Render codec encoding a file in FFmpeg and decoding a file in VLC Player. The final segment focused on practical aspects of the codec, including training and deployment.

Deep Render’s AI Codec: A New Approach to Video Compression

As shown in Figure 1, Deep Render’s codec operates entirely on neural networks rather than traditional macroblocking and DCT-based compression techniques. This shift eliminates the need for legacy encoding components and instead relies on machine learning-based methods to optimize compression.

Figure 1. Traditional codec on the left; Deep Render AI codec on the right.

Unlike traditional codecs, which require years of standardization and hardware adoption, Deep Render’s approach allows for faster iteration and deployment. Because the codec encodes and decodes using neural processing units, or NPUs, which are now standard in many modern devices, it can operate efficiently on existing hardware without requiring dedicated decoder hardware.

Performance Comparisons

Arsalan then shared benchmarking results comparing Deep Render’s AI codec to AV1, HEVC, and VP9 (Figure 2). Using standard datasets like MCTC-V2, Deep Render measured performance across various content types, including video conferencing, gaming, animation, and mobile recordings, comparing its codec against SVT-AV1, x265, and libVP9. As shown in Figure 2, Deep Render outperformed all three in compression efficiency, delivering the same visual quality at a 40 to 50 percent lower bitrate.

Figure 2. Deep Render performance vs. traditional codecs.
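For readers who want to reproduce the reference side of this kind of comparison, the three public encoders are all available in stock FFmpeg builds. The commands below are a minimal sketch of such reference encodes using standard FFmpeg options for libsvtav1, libx265, and libvpx-vp9; they are not Deep Render’s actual test harness or settings.

```bash
# Sketch of reference encodes for a codec comparison; flags are standard FFmpeg
# options for the public encoders, not Deep Render's exact test configuration.
SRC=source_1080p30.y4m

# SVT-AV1 reference encode
ffmpeg -i "$SRC" -c:v libsvtav1 -crf 35 -preset 8 svt_av1.mp4

# x265 (HEVC) reference encode
ffmpeg -i "$SRC" -c:v libx265 -crf 28 -preset medium x265.mp4

# libvpx-vp9 reference encode (constant quality mode requires -b:v 0)
ffmpeg -i "$SRC" -c:v libvpx-vp9 -crf 35 -b:v 0 vp9.webm
```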

Production Readiness

Regarding production readiness, Arsalan explained that Deep Render’s AI codec has already demonstrated strong performance on readily available consumer hardware. Unlike traditional codecs that require custom encoding and decoding chips that take years to deploy and reach critical mass, Deep Render’s neural network-based approach already achieves real-time encoding and decoding speeds on devices with NPUs, such as Apple’s MacBook Pro with the M4 chip.

Figure 3 illustrates these results, showing that at 1080p resolution, the codec encodes at 22 fps and decodes at 69 fps, ensuring smooth playback for standard streaming content. Performance scales even better at lower resolutions, with 720p encoding at 41 fps and decoding at 135 fps, while 540p reaches 52 fps encoding and an impressive 296 fps decoding speed.

Figure 3. Current codec performance on readily available consumer products.

Beyond raw speed, battery efficiency is a critical factor for mobile and low-power devices. The battery drain test (BDT) results in Figure 3 indicate that Deep Render’s codec can sustain video playback for 12 hours at 1080p, 16 hours at 720p, and 20 hours at 540p on an M4-powered device, demonstrating that the AI-based codec is already highly efficient for mobile applications.

Beyond this, Arsalan emphasized that further optimization over the next 12 months will improve both compression efficiency and power consumption. As NPUs continue to advance across hardware platforms from Apple, Qualcomm, MediaTek, Intel, and Google, the codec’s real-time capabilities will expand to an even broader range of devices.

Comparative Efficiency of Deep Render’s AI Codec

Deep Render’s AI codec is designed to achieve both high compression efficiency and low complexity, a combination that most other AI-based codecs have struggled to balance. Arsalan explained that much of the work on AI-driven compression has remained within academic research, with limited real-world deployment due to high computational costs or inefficiencies.

Figure 4. Deep Render compared to other AI implementations.

Figure 4 illustrates this challenge by plotting AI codec research efforts from companies like Microsoft, Apple’s WaveOne, and Qualcomm. The ideal position on this chart is the top-right corner, where a codec offers high compression efficiency and low complexity, ensuring it can run efficiently across a broad range of devices. Deep Render’s codec occupies this space, whereas competing approaches either introduce excessive complexity or fail to deliver meaningful bitrate reductions.

Microsoft’s DCVC-FM, for example, is two orders of magnitude more complex than Deep Render’s codec, making real-time performance impractical. Qualcomm’s NVC codec, while lower in complexity, performs hundreds of percent worse than AV1, limiting its usefulness. Apple’s WaveOne solution also suffers from high complexity and limited efficiency improvements.

The numbers in Figure 4 are based on a combination of reported white paper results and independent performance analysis. Deep Render’s team extracted as much data as possible from published research, comparing compression performance and computational cost. While some architectures were easier to analyze than others, the team confirmed that Deep Render’s codec remains the only AI-based solution providing both production-ready complexity and compression efficiency.

A key factor in this comparison is that most AI codecs are designed to run on GPUs, which have higher power consumption and computational demands, and are not generally available on mobile devices. Deep Render’s model, by contrast, is optimized for NPUs, which are more common in mobile and embedded devices and operate at significantly lower power levels. This design choice allows Deep Render to deliver real-time encoding and decoding without the excessive hardware demands seen in competing AI-based solutions.

Accelerated Development, Clearer IP Picture

The benefits of Deep Render’s AI codec also extend to development time and licensing. Traditional codec development involves multiple stakeholders—research labs, standardization bodies, hardware vendors, and patent firms—leading to slow innovation cycles. Arsalan explained how Deep Render disrupts this process by bringing everything in-house.

Figure 5. Deep Render’s streamlined AI codec development and deployment process.

Figure 5 illustrates this streamlined approach. Deep Render manages AI research, hardware integration, intellectual property, and commercialization, eliminating inefficiencies in traditional codec pipelines. Instead of waiting years for new standards and silicon, Deep Render develops software-based AI codecs that run on commodity NPUs, accelerating deployment.

Arsalan then handed off to Sebastjan for the demo.

Demonstrating Deep Render’s Codec in FFmpeg and VLC

Before starting the demo, Sebastjan explained why integrating with FFmpeg and VLC was crucial for Deep Render. Video codecs are deeply embedded in the tech ecosystem, with extensive evaluation and application stacks; by integrating with widely used tools, Deep Render can compare its AI codec directly against existing standards. More importantly, most AI-based compression remains in research and requires expensive, high-powered GPUs. Deep Render set out to develop an AI codec that runs on consumer hardware, including laptops and smartphones. Integrating with VLC proves this is possible, making the codec accessible and shareable while demonstrating its real-world viability.

Encoding in FFmpeg

Sebastjan ran a live demo using a MacBook Pro with an M4 chip, showcasing how Deep Render’s AI codec integrates into FFmpeg. The source file, taken from Netflix’s Open Content dataset, was encoded at 1080p, 30fps using a quantization parameter of 35.

The encoding process ran at approximately 20fps on the MacBook’s NPU, leveraging Apple’s CoreML and Metal frameworks. The output was a standard MP4 file, which could then be decoded and played back using Deep Render’s custom VLC build.

Figure 6. The Deep Render AI codec encoded a 1080p30 source file at 20 fps on the MacBook Pro.
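The full command line was not published, and the Deep Render FFmpeg integration is not publicly available, but the invocation would look something like the sketch below. The encoder name (“deep_render”) and its QP option are assumptions used purely for illustration.

```bash
# Hypothetical sketch of the demo encode: 1080p30 Netflix Open Content source,
# quantization parameter 35. The encoder name "deep_render" and the -qp option
# are assumptions; the custom FFmpeg build shown in the demo is not public.
ffmpeg -i netflix_open_content_1080p30.y4m \
  -c:v deep_render -qp 35 \
  output_deep_render.mp4
```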

After encoding, Sebastjan “rolled up his sleeves” and ran FFprobe to verify that the Deep Render encoder had produced the file. The positive result appears in Figure 7.

Figure 7. FFprobe proves that the Deep Render Encoder encoded this file.
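The ffprobe query itself is standard; only the codec identifier that the custom build reports is specific to Deep Render, and the exact string was shown on screen rather than published. A typical check looks like this:

```bash
# Standard ffprobe query for the video stream's codec name, resolution, and
# frame rate; on the demo build this reports Deep Render's codec identifier
# rather than h264/hevc/av1 (the exact string is specific to that build).
ffprobe -v error -select_streams v:0 \
  -show_entries stream=codec_name,width,height,r_frame_rate \
  -of default=noprint_wrappers=1 output_deep_render.mp4
```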

Decoding and Playback in VLC

After encoding, Sebastjan decoded the compressed file to Y4M in FFmpeg to measure decoding speed, which reached 64fps, confirming that the AI codec supports real-time decoding speeds (Figure 8).

Figure 8. The Deep Render AI codec decoded a compressed 1080p30 file to Y4M at 64 fps on the MacBook Pro.
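Measuring decode speed this way is straightforward in any FFmpeg build: decode to an uncompressed Y4M file and read the fps figure FFmpeg reports. A sketch, assuming the same hypothetical decoder registration as above:

```bash
# Decode the compressed file back to uncompressed Y4M; FFmpeg's progress line
# ("fps=...") is where a figure like 64 fps is read, and -benchmark prints
# timing detail at the end. Requires the custom build with the Deep Render
# decoder registered.
ffmpeg -benchmark -i output_deep_render.mp4 decoded.y4m
```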

Playing the Deep Render Codec in VLC Player

Sebastjan then loaded the file into a customized version of VLC Player (Figure 9).

Figure 9. Opening the compressed Deep Render file in VLC Player.

VLC played the file without a hitch, just like a file encoded with any traditional codec the player supports (Figure 10). The demo was remarkable precisely because VLC playback was completely ordinary.
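For completeness, launching the file in the custom player works like any other VLC playback; only the build itself is special. On macOS, that is a single command (the application name below is illustrative, since the Deep Render-enabled build is not publicly available):

```bash
# Open the encoded file in the demo's custom VLC build on macOS; the app name
# is illustrative, as the Deep Render-enabled build is not publicly available.
open -a "VLC-DeepRender" output_deep_render.mp4
```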

For perspective, note that most early versions of new codecs are first made available for testing as clumsy proprietary executables that only work with raw YUV source files and can only be decoded by equally clumsy proprietary decoders. This demonstration is significant because it marks the first AI-based codec running in both FFmpeg and VLC, which will make benchmarking and adoption easier for developers and content providers. At a macro level, it demonstrates the maturity of the Deep Render codec, and how close to actual deployment it is.

Discussing the Road Ahead for AI Codecs

After the demo came a question-and-answer session, with Jan asking the questions and Arsalan or Sebastjan answering.

How is Deep Render’s Model Trained?

Deep Render’s team trains tens to hundreds of models per week using an in-house dataset consisting of terabytes of high-quality video content. The goal is to refine compression performance with each iteration, ensuring steady improvement.

Internally, Deep Render releases new models every two to four months, with each version targeting a 15 to 20 percent efficiency gain. These updates are integrated into the software stack, meaning continuous improvements do not require the slow standardization processes seen in traditional codecs.

How is the Model Delivered and Installed?

When asked about deployment, Sebastjan explained that the AI models are packaged with the binary rather than requiring large separate downloads. The current total binary size is around 150MB, which includes multiple models optimized for different use cases. While this size is too large for some applications, future optimizations will allow Deep Render to reduce it to a few hundred kilobytes. There are also methods to separate model weights from the main binary, allowing applications to download updates incrementally rather than bundling everything together.

Which Devices Can Run Deep Render’s Codec?

Arsalan explained that the codec is designed to run on commodity NPUs found in modern smartphones, laptops, and other consumer devices. Apple introduced NPUs in the iPhone 8, with significant improvements in later models. Deep Render expects iPhone 12 and newer devices to handle the codec efficiently. Similar support is expected from Qualcomm, MediaTek, Intel, and Google, which are also expanding NPU integration in their chips.

At a high level, Arsalan expects that deployment will follow a top-down adoption strategy, starting with flagship devices that have powerful NPUs before expanding to mid-range and lower-end devices. For less powerful hardware, trade-offs may include lower resolutions, lighter models, or slight reductions in performance.

What About Smart TVs and Other Devices?

While MacBooks and Intel-based PCs are well-positioned for NPU acceleration, Arsalan shared that smart TVs are slower to integrate new processing technologies. However, TV manufacturers are starting to add NPUs, which could make AI-based video compression viable in the future.

Another potential avenue for deployment is dongles or set-top boxes with built-in NPUs. Apple TV already includes an NPU, and other manufacturers are likely to follow. This could enable Deep Render’s codec to be more widely adopted in home entertainment without waiting for native TV support.

Final Thoughts

This discussion and demonstration prove that AI-based video compression is moving from research to reality. Deep Render’s codec achieves real-time encoding and decoding speeds, integrates into widely used tools, and outperforms traditional codecs in efficiency.

While AI compression is still in its early days, rapid model iteration and the increasing availability of NPUs suggest that AI codecs could soon play a major role in the future of video distribution. The fact that Deep Render’s solution runs on consumer hardware today makes it a strong contender in the evolving compression landscape.

For those interested in seeing the codec in action, the full conversation and demo are available on YouTube.

About Jan Ozer

I help companies train new technical hires in streaming media-related positions; I also help companies optimize their codec selections and encoding stacks and evaluate new encoders and codecs. I am a contributing editor to Streaming Media Magazine, writing about codecs and encoding tools. I have written multiple authoritative books on video encoding, including Video Encoding by the Numbers: Eliminate the Guesswork from your Streaming Video (https://amzn.to/3kV6R1j) and Learn to Produce Video with FFmpeg: In Thirty Minutes or Less (https://amzn.to/3ZJih7e). I have multiple courses relating to streaming media production, all available at https://bit.ly/slc_courses. I currently work at www.netint.com as a Senior Director in Marketing.
