Super-resolution techniques scale low-resolution videos to higher resolutions at high quality, and any video publisher with older, low-resolution footage should be aware of them. This article provides an overview of what super-resolution is, how it works, where to get it, and its limitations.
Introduction to Super-Resolution
As noted above, super-resolution describes a family of techniques for scaling lower-resolution video to higher resolutions at the highest possible quality. By leveraging advanced algorithms, including those powered by artificial intelligence (AI), super-resolution can intelligently upscale and enhance video to higher resolutions, delivering a crisper, more detailed viewing experience.
Super-resolution is particularly valuable for owners and distributors of low-resolution video assets, who can now leverage super-resolution to provide their audiences with higher-quality output. Whether you have an archive of standard definition (SD) videos or 720p high definition (HD) footage, super-resolution can breathe new life into your content by intelligently upscaling it to 1080p, 2K, or even 4K resolution.
There are two main approaches to super-resolution for video:
- Traditional Methods: These techniques use mathematical algorithms, such as bicubic or Lanczos interpolation, to upscale the video. While simple to implement, these methods have limited effectiveness and can introduce artifacts or blurriness.
- AI-Powered Super-Resolution: More advanced super-resolution leverages deep learning neural networks trained on datasets of high and low-resolution image/video pairs. These AI models can intelligently reconstruct fine details and textures, resulting in a much more natural and artifact-free upscaling.
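The traditional methods above are available in any recent FFmpeg build via the scale filter's `flags` option. The sketch below generates a synthetic 360p clip (so no input file is needed) and upscales it to 1080p with both kernels; filenames are illustrative:

```shell
# Generate a short synthetic 360p test clip (no source footage required)
ffmpeg -y -f lavfi -i testsrc2=size=640x360:rate=30 -t 2 \
       -pix_fmt yuv420p in_360p.mp4

# Traditional upscaling to 1080p with two common interpolation kernels
ffmpeg -y -i in_360p.mp4 -vf "scale=1920:1080:flags=bicubic" out_bicubic.mp4
ffmpeg -y -i in_360p.mp4 -vf "scale=1920:1080:flags=lanczos" out_lanczos.mp4
```

Comparing the two outputs frame by frame is a quick way to see the trade-off: Lanczos tends to preserve edges better, while bicubic is softer.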
One prominent example of AI-powered super-resolution is the work being done by the MPAI-EVC (AI-Enhanced Video Coding) project. This project has developed a diverse dataset of over 22,000 4K video frames to train and test state-of-the-art super-resolution networks.
The project's experiments have shown that AI-based super-resolution techniques can provide significant bitrate savings of up to 29% compared to traditional upscaling methods. The MPAI-EVC project has also explored integrating super-resolution directly into the video encoding process, combining it with other AI-enhanced coding tools like improved intra-prediction. Their results demonstrate that this integrated approach can improve coding efficiency over the baseline video codec.
MSU Super-Resolution Testing
If you’re interested in a current ranking of available methods, check out Moscow State University’s Super-Resolution Benchmark. It currently tracks 37 different techniques and has rankings like those shown in Figure 1.
FFmpeg Super-Resolution Filters
FFmpeg, the popular open-source multimedia framework, includes several filters that leverage super-resolution techniques to upscale video content. These filters utilize machine learning models to intelligently reconstruct details and textures, improving quality over traditional upscaling methods like bicubic or Lanczos interpolation.
The two main super-resolution filters available in FFmpeg are:
- SRCNN (Super-Resolution Convolutional Neural Network): This filter implements the SRCNN model, a 3-layer convolutional neural network that was one of the early breakthroughs in deep learning-based super-resolution. The SRCNN filter can upscale video by factors of 2, 3, or 4.
- ESPCN (Efficient Sub-Pixel Convolutional Neural Network): The ESPCN filter uses a more efficient neural network architecture that can perform 2x super-resolution with lower computational complexity than SRCNN. This makes it a better choice for real-time applications.
Figure 2 shows comparative images from this review of SRCNN, which appears to be derived from this paper. Click the figure to view the images at full resolution.
The FFmpeg documentation details how to use these techniques here, with more useful detail provided here. The bottom line is that you’ll need a pretty hefty GPU and lots of disk space and time.
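As a rough sketch of what an invocation looks like, the command below applies FFmpeg's sr filter with the TensorFlow backend. Note the assumptions: your FFmpeg must be compiled with `--enable-libtensorflow`, and `espcn.pb` is a placeholder path for a trained ESPCN model file that you supply yourself:

```shell
# Hypothetical invocation of FFmpeg's sr filter. Assumes:
#   - FFmpeg built with --enable-libtensorflow
#   - espcn.pb is a trained ESPCN model you provide (placeholder name)
# The guard keeps this a no-op when the model file is absent.
if [ -f espcn.pb ]; then
  ffmpeg -y -i input_360p.mp4 \
         -vf "sr=dnn_backend=tensorflow:model=espcn.pb" \
         output_upscaled.mp4
fi
```

For the SRCNN model, the filter also accepts a `scale_factor` option (2, 3, or 4); with ESPCN, the scale is baked into the trained model itself.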
Beyond the built-in SRCNN and ESPCN filters, the FFmpeg community has explored integrating other state-of-the-art super-resolution techniques. For example, the FFmpegSR project has developed a framework for applying deep learning-based super-resolution using PyTorch models within the FFmpeg ecosystem.
Real-Time Super-Resolution: Enhancing Visuals with GPU Power
While most super-resolution of file-based content is performed offline in software, real-time GPU-driven super-resolution plays an ever-expanding role in how we experience digital media, from gaming to video streaming and conferencing. By harnessing the power of advanced GPUs from technology providers like NVIDIA and AMD, super-resolution technologies can perform intensive computational tasks swiftly, allowing for the immediate enhancement of visuals.
Gaming Applications: Real-time super-resolution allows gamers to play games at lower resolutions to maximize frame rates and overall speed while displaying the game at a higher resolution (Figure 3). This means gamers can enjoy smooth gameplay with enhanced visual details, even on hardware that might struggle with high-resolution graphics natively.
Video Streaming: For low-resolution content, like SD movies on some streaming services, real-time super-resolution can upscale the video during playback. Viewers can watch older or lower-quality videos upscaled to HD or even 4K, with the GPU also filtering and deinterlacing in real time, producing clarity and detail without re-encoding the original files.
Video Conferencing: In video conferencing, bandwidth limitations often result in lower-resolution video transmission. Here, super-resolution technologies can upscale the video on the playback side, producing a higher-quality experience. Leveraging toolkits like NVIDIA's Maxine, the GPU can perform other processing, like directing the speaker's eyes toward the camera or even translating speech into other languages in real time.
By leveraging the robust capabilities of GPUs, real-time super-resolution enhances the user experience across various applications and promotes more efficient use of bandwidth and processing power. As real-time super-resolution technologies continue to evolve, they will play a pivotal role in shaping the future of digital media consumption and interaction.
Evaluating Super-Resolution Performance
There are a few common approaches used to evaluate the performance of super-resolution techniques:
Downscale-Upscale Comparison:
- Start with a high-resolution reference video.
- Downscale the reference video to a lower resolution using a standard method like bicubic interpolation.
- Apply the super-resolution algorithm to the downscaled low-resolution video to upscale it back to the original resolution.
- Compare the upscaled video to the original high-resolution reference using objective quality metrics like PSNR, SSIM, or VMAF.
This provides a quantitative evaluation of the SR algorithm’s ability to recover the lost high-frequency details.
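The four steps above can be run entirely with FFmpeg's built-in psnr and ssim filters. In this sketch, a synthetic clip stands in for the high-resolution reference, and a plain Lanczos upscale stands in for the super-resolution method under test; substitute your own footage and SR output to evaluate a real algorithm:

```shell
# 1. Synthetic 1080p "reference" (stands in for real high-res footage)
ffmpeg -y -f lavfi -i testsrc2=size=1920x1080:rate=30 -t 2 \
       -pix_fmt yuv420p ref.mp4

# 2. Downscale the reference to 360p with bicubic interpolation
ffmpeg -y -i ref.mp4 -vf "scale=640:360:flags=bicubic" low.mp4

# 3. Upscale back to 1080p (Lanczos stands in for the SR method under test)
ffmpeg -y -i low.mp4 -vf "scale=1920:1080:flags=lanczos" up.mp4

# 4. Compare the upscaled video to the reference; scores land in the logs
ffmpeg -i up.mp4 -i ref.mp4 -lavfi "[0:v][1:v]psnr" -f null - 2> psnr.log
ffmpeg -i up.mp4 -i ref.mp4 -lavfi "[0:v][1:v]ssim" -f null - 2> ssim.log
```

The summary lines at the end of `psnr.log` and `ssim.log` report the average scores; a higher PSNR or SSIM for one upscaler over another on the same clip indicates better recovery of the reference detail.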
Low-Resolution Input Evaluation:
- Start with a low-resolution video without a high-resolution reference.
- Apply the super-resolution algorithms to upscale the low-res video.
- Conduct subjective testing by having human raters visually compare the different techniques.
This approach evaluates the perceptual quality improvement provided by the SR method but lacks a quantitative comparison.
Real-World Deployment Testing:
- Integrate the super-resolution algorithm into a practical video or imaging pipeline.
- Evaluate the overall system performance, including encoding efficiency, streaming quality, and user experience.
This testing approach is more holistic but can be more complex to set up and measure.
The choice of evaluation method depends on the specific use case and goals of the super-resolution application. A combination of these approaches is typically used to thoroughly assess an SR algorithm’s performance.
Exploring the Boundaries of Super-Resolution
Super-resolution technologies offer significant improvements in video quality but have notable limitations. The computational demand of AI-powered super-resolution methods, particularly those involving deep learning, is substantial, making them less practical for real-time applications without capable GPU hardware.
The quality of the source video also greatly affects the outcomes of super-resolution. Videos that are extremely low-resolution or heavily compressed might not benefit greatly, as there is less original detail to work with. This can lead to artifacts or overly smoothed textures in the enhanced video, detracting from the overall viewing experience.
While super-resolution aims to enhance details realistically, it sometimes introduces artifacts or creates textures that can appear unnatural. Achieving a balance between adding detail and maintaining realism remains challenging, particularly for content where authenticity, such as in historical footage or artistic videos, is crucial.
Ultimately, super-resolution represents a transformative opportunity for video content owners and distributors. By intelligently enhancing the resolution and quality of their low-resolution videos, they can provide their audiences with a more enjoyable, higher-definition viewing experience. As AI and deep learning continue to advance, we can expect to see even more powerful super-resolution techniques emerge in the years to come.
Choosing a Super-Resolution Technology
Here’s a partial list of super-resolution technology or service providers.
Software Solutions Offering Super-Resolution:
- NVIDIA RTX Video Super Resolution (VSR): Integrated into GeForce Game Ready Driver, it uses AI to upscale videos to up to 4K resolution, available for content streamed in browsers like Google Chrome and Microsoft Edge on PCs with compatible NVIDIA GPUs.
- Topaz Labs Video Enhance AI: Part of a suite including Gigapixel AI, Sharpen AI, and others, this software provides professional-grade video super-resolution, stabilization, and other enhancements optimized for GPUs like NVIDIA’s RTX series.
- MotionDSP vReveal: Focuses on GPU-assisted video rendering and includes features like stabilization and light adjustment to enhance video quality. It uses CUDA technology to improve rendering performance.
Cloud-Based Super-Resolution Services for Video:
- Bitmovin: Offers video encoding with super-resolution capabilities, enhancing video quality for various applications.
- Pixop: Utilizes AI and ML to upscale and improve the quality of video footage, ideal for production companies and broadcasters.
- HKAI.IO: Specializes in AI-powered video super-resolution to upscale archival video content for modern high-resolution displays.
- AWS: Via the BLUEDOT DeepField-SR Video Super Resolution product.
- LvisionAI: Intelligent AI Video Super Resolution Service.