Mapping SSIM and VMAF Scores to Subjective Ratings

One visual quality metric that's been getting more attention lately is the Structural Similarity Index (SSIM). For example, when Facebook launched its first VR metric, SSIM360, it based the metric on SSIM. I've generally avoided using SSIM because the scoring range (0 – 1) is too compressed for my liking, and I wasn't aware of any way to map SSIM scores to subjective evaluations.

Well, a colleague recently pointed me to an article entitled SSIM-based Video Admission Control and Resource Allocation Algorithms, published by researchers from the Department of Information Engineering at the University of Padova, Italy. The article contains a table, reproduced below, that maps SSIM scores to mean opinion scores (MOS), which are subjective ratings; the mapping itself was established in yet another article. As the table shows, scores above 0.99 should look perfect, while scores in the 0.95 – 0.99 range indicate impairments that are "perceptible but not annoying." I have a project now that involves SSIM scoring, and these data points are definitely useful; I hope you find them useful, too.
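To make the mapping concrete, here's a minimal Python sketch that converts an SSIM score into the subjective bands described above. Only the two thresholds mentioned here (0.99 and 0.95) are used; the Padova table contains additional bands below 0.95, so the catch-all label in the final branch is my own placeholder, not a value from the paper.

```python
def ssim_to_quality_band(ssim: float) -> str:
    """Map an SSIM score (0-1) to a rough subjective-quality band.

    Thresholds follow the two rows of the Padova mapping discussed
    above: above 0.99, impairments should be imperceptible; from
    0.95 to 0.99, perceptible but not annoying. Scores below 0.95
    are simply flagged (the full table has finer bands down there).
    """
    if ssim > 0.99:
        return "imperceptible"
    elif ssim >= 0.95:
        return "perceptible but not annoying"
    else:
        return "below the 'not annoying' threshold"

print(ssim_to_quality_band(0.995))  # imperceptible
print(ssim_to_quality_band(0.97))   # perceptible but not annoying
```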

VMAF and the Magic Number 93

I still prefer using Netflix’s VMAF metric, particularly for assessing the quality of files in an encoding ladder. That’s because VMAF scores range from 0 to 100, providing a more meaningful spread, and because VMAF is designed to rate files from resolutions ranging from 240p to 1080p and has been used to rate videos as large as 4K.

In his article entitled VMAF Reproducibility: Validating a Perceptual Practical Video Quality Metric, RealNetworks CTO Reza Rassool concluded, "if a video service operator were to encode video to achieve a VMAF score of about 93 then they would be confident of optimally serving the vast majority of their audience with content that is either indistinguishable from original or with noticeable but not annoying distortion." So a VMAF score of 93 corresponds roughly to an SSIM score of 0.95. Another useful data point relating to VMAF is that a differential of six points is a just-noticeable difference (JND), which obviously adds context when comparing scores.
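These two VMAF data points — the 93 target and the six-point JND — lend themselves to a quick comparison helper. Here's a minimal sketch; the function names and the exact comparison logic are my own, not from Rassool's paper:

```python
VMAF_TARGET = 93  # Rassool's threshold for confidently serving most viewers
VMAF_JND = 6      # a six-point gap is roughly one just-noticeable difference

def meets_target(vmaf: float) -> bool:
    """True if the score clears the ~93 quality target."""
    return vmaf >= VMAF_TARGET

def noticeably_different(vmaf_a: float, vmaf_b: float) -> bool:
    """True if viewers would likely notice the quality gap
    between two encodes (difference of at least one JND)."""
    return abs(vmaf_a - vmaf_b) >= VMAF_JND

# Example: comparing two encodes of the same ladder rung.
print(meets_target(94.2))                # True
print(noticeably_different(94.2, 90.5))  # False: gap is under one JND
```

This kind of helper is handy when pruning an encoding ladder: if two rungs score within one JND of each other, the extra bitrate of the higher rung may be buying nothing a viewer can see.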

Actual human subjective ratings will always be the gold standard, though totally impractical for most day-to-day use cases where objective metrics shine. Whether you’re using SSIM or VMAF, when you need to predict subjective quality based upon objective scoring, it’s nice to have authoritative backing.

About Jan Ozer

I help companies train new technical hires in streaming media-related positions; I also help companies optimize their codec selections and encoding stacks and evaluate new encoders and codecs. I am a contributing editor to Streaming Media Magazine, writing about codecs and encoding tools. I have written multiple books on video encoding, including Video Encoding by the Numbers: Eliminate the Guesswork from your Streaming Video (https://amzn.to/3kV6R1j) and Learn to Produce Video with FFmpeg: In Thirty Minutes or Less (https://amzn.to/3ZJih7e). I have multiple courses relating to streaming media production, all available at https://bit.ly/slc_courses. I currently work at www.netint.com as a Senior Director in Marketing.
