Evaluating DCVC-RT: A Real-Time Neural Video Codec That Delivers on Speed and Compression

Background

Authors & Affiliations: Zhaoyang Jia and Linfeng Qi (USTC), Bin Li, Jiahao Li, Wenxuan Xie, Houqiang Li, and Yan Lu (Microsoft Research Asia). This project stems from an open-source effort initiated in late 2023, with code available on GitHub.

The paper targets a long-standing obstacle for neural video codecs (NVCs): achieving real-time performance without sacrificing compression quality. Existing approaches either optimize rate-distortion performance at the cost of speed or manage real-time encoding with significantly worse efficiency (e.g., MobileNVC barely beats x264). This work aims to deliver both speed and compression efficiency.

Technology Overview

DCVC-RT stands for Deep Contextual Video Codec – Real Time. Its core innovations are:

  • Operational cost optimization: Identifies memory I/O and function call overhead—not just computation (MACs)—as the primary performance bottleneck. This shift in focus from traditional computational metrics represents a key novelty.
  • Implicit Temporal Modeling: Instead of explicit motion vectors, DCVC-RT leverages a lightweight feature extractor and context propagation mechanism. It concatenates temporal features to enable efficient prediction without explicit motion estimation.
  • Single-scale latent representation: Uses a fixed 1/8 resolution rather than progressive downsampling, reducing memory access and improving speed.
  • Model Integerization: Converts floating-point operations to 16-bit integers using scaling factors (K₁ = 512 and K₂ = 8192) to enable deterministic cross-platform inference.
  • Parallel Coding and Module-bank Rate Control: Employs parallel encoding/decoding paths to accelerate throughput. The codec uses modular entropy models to adjust quantization parameters for fine-grained bitrate control.
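
To make the integerization idea concrete, here is a minimal fixed-point sketch in Python. It is not the authors' implementation: the description above gives only the two scaling factors, so treating K₁ as an activation scale and K₂ as a weight scale is an assumption, and the helper names `to_int16` and `fixed_dot` are hypothetical. The point it illustrates is that integer products and sums are exact, so the result does not depend on accumulation order or on the hardware doing the math.

```python
import numpy as np

K1 = 512    # scaling factor quoted above; using it for activations is an assumption
K2 = 8192   # scaling factor quoted above; using it for weights is an assumption

def to_int16(values, scale):
    """Quantize floats to int16 fixed point: q = round(value * scale)."""
    q = np.round(np.asarray(values, dtype=np.float64) * scale)
    if q.min() < -32768 or q.max() > 32767:
        raise ValueError("value does not fit in int16 at this scale")
    return q.astype(np.int16)

def fixed_dot(a_q, w_q):
    """Dot product of two int16 vectors, accumulated exactly in int64."""
    return int(np.sum(a_q.astype(np.int64) * w_q.astype(np.int64)))

# Made-up sample values, chosen so they are exactly representable.
activations = [0.5, -1.25, 3.0]
weights = [0.25, 0.5, -0.125]

a_q = to_int16(activations, K1)
w_q = to_int16(weights, K2)

acc = fixed_dot(a_q, w_q)                        # integer accumulator
acc_reversed = fixed_dot(a_q[::-1], w_q[::-1])   # same terms, reversed order
result = acc / (K1 * K2)                         # rescale back to a real value
```

Because the accumulator is an integer, `acc` and `acc_reversed` are bit-identical, which is the property a decoder needs to reproduce the encoder's entropy-model state on different devices. Floating-point sums carry no such guarantee.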

Testing and Results

Compression Efficiency and Evaluation Methodology

The authors compute BD-Rate from full bitstreams rather than from entropy estimates alone, a shortcut sometimes taken in neural codec research. While measuring real bitstreams should be standard practice, it’s worth noting that their results reflect actual encoded output, not just internal model predictions.
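
A small sketch of why the distinction matters: an entropy estimate is the ideal code length implied by the model's probabilities, while a real bitstream also pays for byte alignment and headers. The symbol probabilities and the two-byte header below are made-up values for illustration, and `estimated_bits` and `actual_bits` are hypothetical helper names. A real arithmetic coder adds its own small overhead, which this sketch ignores.

```python
import math

def estimated_bits(symbol_probs):
    """Ideal code length: sum of -log2(p) over the coded symbols."""
    return sum(-math.log2(p) for p in symbol_probs)

def actual_bits(payload_bits, header_bytes):
    """Bits on disk: payload padded to whole bytes, plus a header."""
    payload_bytes = math.ceil(payload_bits / 8)
    return 8 * (payload_bytes + header_bytes)

probs = [0.5, 0.25, 0.25, 0.5]   # made-up model probabilities
HEADER_BYTES = 2                  # hypothetical per-frame header size

est = estimated_bits(probs)             # what an entropy estimate would report
real = actual_bits(est, HEADER_BYTES)   # what a written file would cost
```

The gap shrinks as frames get larger, but it never goes negative, so estimate-only results are always at least slightly optimistic.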

Testing was performed on UVG, HEVC Classes B–E (note: A and F are excluded), and MCL-JCV, all in YUV420 low-delay format with an intra period of –1.

The results are shown in Table 2; I’ll use the Figure and Table numbers from the paper to avoid confusion. To read the table, note that the top row lists the video datasets (UVG, MCL-JCV, and HEVC test classes B through E), not codecs. The rows list the codecs being compared under their official designations:

  • VTM-17.0: The official reference software for VVC (H.266) — used here as the baseline for comparison.
  • HM-16.25: Reference software for H.265/HEVC.
  • ECM-11.0: Enhanced Compression Model, the exploration software for coding tools beyond VVC.
  • DCVC-DC, DCVC-FM, DCVC-RT: Various neural codecs.

Because VTM is the reference, its BD-Rate is set to 0.0% across all datasets. All other values indicate how much more or less bitrate is needed to match VTM’s quality. Negative numbers mean better compression efficiency.

For example, DCVC-RT (fp16) shows an average BD-Rate of -21.0%, meaning it delivers the same PSNR quality as VTM while using 21% less bitrate. It does so at over 125 fps encoding, compared with 0.01 fps for VTM.
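
For readers who have not computed BD-Rate themselves, here is a minimal sketch of the common Bjøntegaard formulation: fit log-bitrate as a cubic in PSNR for each codec, average the gap between the two fits over the shared PSNR range, and convert that back to a percentage. This is an assumption about the general method, not the authors' exact script, which may interpolate differently. The rate and PSNR points are made up (the test codec is simply set to 0.79x the anchor bitrate), not data from the paper.

```python
import numpy as np

def bd_rate(anchor_rates, anchor_psnr, test_rates, test_psnr):
    """Average bitrate difference (%) of test vs. anchor at equal PSNR."""
    anchor_psnr = np.asarray(anchor_psnr, dtype=float)
    test_psnr = np.asarray(test_psnr, dtype=float)
    log_a = np.log(np.asarray(anchor_rates, dtype=float))
    log_t = np.log(np.asarray(test_rates, dtype=float))

    # Only the PSNR range covered by both codecs is compared.
    lo = max(anchor_psnr.min(), test_psnr.min())
    hi = min(anchor_psnr.max(), test_psnr.max())

    # Fit log-rate as a cubic in (PSNR - lo); the shift keeps the fit well conditioned.
    p_a = np.polyfit(anchor_psnr - lo, log_a, 3)
    p_t = np.polyfit(test_psnr - lo, log_t, 3)

    # Integrate each fitted curve over the shared range.
    span = hi - lo
    int_a = np.polyval(np.polyint(p_a), span) - np.polyval(np.polyint(p_a), 0.0)
    int_t = np.polyval(np.polyint(p_t), span) - np.polyval(np.polyint(p_t), 0.0)

    avg_log_diff = (int_t - int_a) / span
    return (np.exp(avg_log_diff) - 1.0) * 100.0

# Made-up rate-distortion points (kbps, dB) for illustration only.
anchor_rates = [1000.0, 2000.0, 4000.0, 8000.0]
psnr_points = [34.0, 36.5, 38.5, 40.0]
test_rates = [0.79 * r for r in anchor_rates]   # 21% fewer bits at every quality level

result = bd_rate(anchor_rates, psnr_points, test_rates, psnr_points)
```

With these synthetic points the function returns a value of about -21, matching the intuition that a codec needing 0.79x the bitrate at every quality level scores a BD-Rate of -21%.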

DCVC-RT-Large, a higher-capacity variant, improves this to -30.8% while maintaining near real-time speed.

Figure 6 illustrates rate-distortion curves over UVG: DCVC-RT generally outperforms VTM and DCVC-FM, although a slight performance drop is observed in the high-quality range.

While the paper excels in bitstream-based evaluation and frame-level fidelity, it omits subjective quality assessments and perceptual metrics like VMAF or LPIPS. As neural codecs increasingly target perceptual optimization, this limits our understanding of how DCVC-RT compares in viewer-perceived quality, especially at low bitrates.

The authors did a ton of comparison work for this paper; unfortunately, the lack of structured subjective findings or even VMAF scoring leaves one wondering whether this favorable scoring would translate to happy viewers. It seems particularly strange that the authors would choose to gauge the quality of an AI-based video codec using PSNR, a decades-old signal-fidelity metric that’s been largely abandoned by most current video producers.

Encoding and Decoding Performance

DCVC-RT reaches 125.2 fps encoding and 112.8 fps decoding on an A100 at 1080p resolution. On an RTX 2080Ti, it maintains 39.5 / 34.1 fps, confirming real-time feasibility on upper-tier consumer GPUs. The codec supports both fp16 (optimized for Tensor Cores) and int16 (for deterministic, cross-platform reproducibility).

Table 3 also shows that DCVC-FM, a state-of-the-art baseline, achieves only 5.0 / 5.9 fps on the same A100 GPU at 1080p, compared to DCVC-RT’s 125.2 / 112.8 fps. That is a speed advantage of roughly 25x for encoding and 19x for decoding.
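
The speed comparison can be checked directly from the frame rates quoted above. The sketch below uses only those four numbers; the helper names and the 30 fps real-time threshold are my own assumptions, not definitions from the paper.

```python
# Frame rates quoted above for an A100 at 1080p (encode, decode).
DCVC_RT = {"encode": 125.2, "decode": 112.8}
DCVC_FM = {"encode": 5.0, "decode": 5.9}

REALTIME_FPS = 30.0  # assumed real-time threshold, not taken from the paper

def frame_time_ms(fps):
    """Average time spent per frame, in milliseconds."""
    return 1000.0 / fps

def speedup(fast_fps, slow_fps):
    """How many times faster the first codec is than the second."""
    return fast_fps / slow_fps

enc_speedup = speedup(DCVC_RT["encode"], DCVC_FM["encode"])
dec_speedup = speedup(DCVC_RT["decode"], DCVC_FM["decode"])
rt_encode_ms = frame_time_ms(DCVC_RT["encode"])
budget_ms = frame_time_ms(REALTIME_FPS)

print(f"encode speedup: {enc_speedup:.1f}x, decode speedup: {dec_speedup:.1f}x")
print(f"DCVC-RT encode time: {rt_encode_ms:.1f} ms per frame (budget {budget_ms:.1f} ms)")
```

Seen as per-frame time, DCVC-RT encodes a 1080p frame in about 8 ms on the A100, well inside the roughly 33 ms budget that 30 fps allows.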

DCVC-RT-Large (Table 8) runs considerably slower but still achieves 47.6 / 45.2 fps on A100, while delivering BD-Rate gains over DCVC-FM and ECM.

Scoring

| Category | Score (0–10) | Rationale | Weighted Score |
|---|---|---|---|
| Deployability | 6 | GPU only today, NPU/CPU not yet viable | 1.50 |
| Compression Efficiency | 9* | Beats H.266 (VTM), strong BD-Rate results | 1.80 |
| Encoding Complexity | 6 | Real-time on high-end GPUs, no CPU/NPU or power data | 0.90 |
| IP & Licensing | 10 | Fully open-source, clear terms | 1.00 |
| Strategic Differentiator | 8 | Integer inference, motion-free coding, parallelism | 1.20 |
| Implementation Maturity | 9 | Real bitstreams, open repo, reproducible tests | 0.90 |
| AI Adaptability | 7 | int16 support, pretrained, deployable with tuning | 0.70 |
| Total Score | | | 8.00 / 10 |

* Compression efficiency as measured by PSNR, with no subjective verification. PSNR has proven to have a low correlation with subjective findings, and has been superseded by VMAF among most streaming publishers and researchers.
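
For anyone who wants to rerun the arithmetic, the total is simply the sum of the weighted column. The table does not state the per-category weights, so the sketch below infers them by dividing each weighted score by its raw score. That inference is an assumption on my part, not a published weighting.

```python
# (category, raw score out of 10, weighted score) as listed in the table above.
scores = [
    ("Deployability", 6, 1.50),
    ("Compression Efficiency", 9, 1.80),
    ("Encoding Complexity", 6, 0.90),
    ("IP & Licensing", 10, 1.00),
    ("Strategic Differentiator", 8, 1.20),
    ("Implementation Maturity", 9, 0.90),
    ("AI Adaptability", 7, 0.70),
]

# Total score: sum of the weighted column.
total = round(sum(weighted for _, _, weighted in scores), 2)

# Implied weight per category (inferred, not stated in the table).
implied_weights = {
    name: round(weighted / raw, 2) for name, raw, weighted in scores
}

print(f"Total: {total:.2f} / 10")
for name, weight in implied_weights.items():
    print(f"{name}: weight {weight:.2f}")
```

Changing a raw score and multiplying by its implied weight shows how sensitive the total is to any one category; Deployability and Compression Efficiency carry the most weight.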

Strengths

  • First practical NVC to achieve real-time 1080p+ on consumer GPUs
  • Beats H.266 (VTM) and H.265 (HM) in compression while being significantly faster
  • Integer inference yields bitstream determinism across hardware
  • Open-source, reproducible pipeline with testable bitstreams
  • Drops motion estimation entirely, enabling simpler architectures

Weaknesses

  • Lacks CPU and NPU decoding support, limiting mobile and low-power applicability
  • Real-time performance demonstrated only on high-end GPUs; no power efficiency benchmarks
  • Untested on unconstrained or noisy video data outside carefully selected academic datasets
  • Not yet integrated with real-time frameworks (e.g., FFmpeg, WebRTC)

Final Verdict

DCVC-RT represents a strong evolutionary step in neural video compression. It trades deep complexity for pragmatic acceleration, enabling real-time performance with high efficiency and reproducibility. While it isn’t yet deployable on mobile NPUs or CPUs, its modular design, integer path, and parallelized execution make it a credible foundation for future inference-first video platforms.

Final Score: 8.00 / 10

About Jan Ozer

I help companies train new technical hires in streaming media-related positions; I also help companies optimize their codec selections and encoding stacks and evaluate new encoders and codecs. I am a contributing editor to Streaming Media Magazine, writing about codecs and encoding tools. I have written multiple authoritative books on video encoding, including Video Encoding by the Numbers: Eliminate the Guesswork from your Streaming Video (https://amzn.to/3kV6R1j) and Learn to Produce Video with FFmpeg: In Thirty Minutes or Less (https://amzn.to/3ZJih7e). I have multiple courses relating to streaming media production, all available at https://bit.ly/slc_courses. I currently work at www.netint.com as a Senior Director in Marketing.
