
VMAF, SSIM, and Color Spaces: Getting the Metrics Right

So there I was, having dinner on a Saturday night date night with my wife…

And in comes a text: “Hi Jan, I was wondering when you compute SSIM, do you use a single channel or multi-channel? And if multi-channel, then anything different than averaging the three?”

(And yes, anyone with children, even grown children, checks incoming texts on a date night.)

Not knowing the answer offhand, I checked my go-to tool, the Moscow State University Video Quality Measurement Tool (VQMT). By default, it computes SSIM using only the Y channel. Knowing the MSU folks as I do, I assumed that this was the right decision, but you know what assumptions do to you and me.

Figure 1. VQMT defaults to the Y channel for SSIM and many other metrics (click to view at full resolution).

So, into the rabbit hole I went. Fortunately, the first source I found was a white paper from the MSU crew titled, Applying Objective Quality Metrics to Video-Codec Comparisons: Choosing the Best Metric for Subjective Quality Estimation. It had all the answers I needed.

Let’s Take a Step Back

When computing metrics like SSIM, VMAF, or PSNR, there’s an important decision to make: Do you analyze just the Y (luminance) channel, or do you include all three color channels (Y, U, and V)? If you use all three, how should they be weighted in the final computation? This matters because the Y channel dominates human visual perception, while the U and V channels (color components) play a smaller role. Incorrectly weighting these components could lead to a metric that doesn’t align well with human opinions of quality.

As a reminder, the role of a quality metric is to accurately predict human visual responses, and getting the YUV configuration wrong can result in scores that are far less correlated with subjective quality. So, the question on the table was: which metric is the most accurate, and with which channel configuration?
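To make the weighting question concrete, here's a minimal sketch of how per-channel scores can be blended with a Y:U:V ratio such as 6:1:1. The function and the sample PSNR values are my own illustration, not code from VQMT or the MSU paper; it simply shows the arithmetic behind the ratios discussed below.

```python
def weighted_metric(score_y, score_u, score_v, weights=(6, 1, 1)):
    """Blend per-channel quality scores using a Y:U:V weighting ratio.

    With weights=(6, 1, 1), luminance contributes 6/8 of the result and
    each chroma channel 1/8, so the Y channel dominates, just as it does
    in human perception.
    """
    wy, wu, wv = weights
    return (wy * score_y + wu * score_u + wv * score_v) / (wy + wu + wv)

# Hypothetical per-channel PSNR values (in dB) for one encoded clip.
psnr_y, psnr_u, psnr_v = 38.2, 41.5, 42.1
print(weighted_metric(psnr_y, psnr_u, psnr_v))             # 6:1:1 blend
print(weighted_metric(psnr_y, psnr_u, psnr_v, (1, 1, 1)))  # naive 1:1:1 average
```

The denominator normalizes the weights, so any ratio (6:1:1, 8:1:1, or 1:1:1) produces a score on the same scale as the per-channel inputs; the ratio only changes how much each channel's errors count.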

How MSU Tested This

To answer this question, MSU compared various configurations of popular metrics like SSIM, VMAF, and PSNR against Mean Opinion Scores (MOS) from subjective human evaluations. They tested:

  1. Different metrics (e.g., SSIM, VMAF, PSNR).
  2. Variations in how the Y, U, and V channels were weighted, including ratios like 6:1:1, 8:1:1, and 1:1:1.
  3. A dataset of 789 encoded video streams across multiple codecs, bitrates, and visual complexities to represent real-world variations in encoding artifacts.

Their goal was simple but critical: Determine which metric, and which configuration, best matched human perception of video quality.
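How do you measure "matched human perception"? The standard approach is to compute a correlation coefficient between each metric configuration's scores and the MOS values for the same clips. Here's a minimal sketch using Pearson correlation; the numbers are invented for illustration, and the MSU paper may use a different correlation variant.

```python
import numpy as np

# Invented example data: two VMAF configurations scored on five clips,
# plus the subjective MOS for those same clips.
mos      = np.array([2.1, 3.0, 3.6, 4.2, 4.8])
vmaf_611 = np.array([41.0, 58.0, 70.0, 83.0, 95.0])  # 6:1:1 weighting
vmaf_111 = np.array([48.0, 55.0, 78.0, 74.0, 93.0])  # 1:1:1 weighting

# np.corrcoef returns a 2x2 matrix; element [0, 1] is the correlation
# between the two inputs. Higher means the metric tracks MOS more closely.
print(np.corrcoef(mos, vmaf_611)[0, 1])
print(np.corrcoef(mos, vmaf_111)[0, 1])
```

Whichever configuration produces the higher coefficient across the full 789-stream dataset is the one that best predicts what viewers actually see.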

Their Conclusion?

The results showed that VMAF was the best overall metric for predicting subjective quality, particularly when configured with 6:1:1 or 8:1:1 YUV weighting. Figure 2 below shows the VQMT configuration that reflects these findings.

Figure 2. The most accurate VQMT configuration (click to view at full resolution).

Key Findings:

  • With optimal YUV configurations like 6:1:1 or 8:1:1, VMAF achieved a near-perfect correlation (~0.95) with subjective scores.
  • Poor configurations, such as 1:1:1 or 2:1:1, resulted in correlations as low as ~0.85, a drop of 0.10, or roughly 10.5%, in accuracy.
  • SSIM, by contrast, showed no significant difference across YUV configurations, making it less sensitive but also less tunable.

Here’s a breakdown of the best and worst configurations for each metric:

Metric       | Best YUV Formulation      | Worst YUV Formulation
VMAF         | 6:1:1, 8:1:1, 10:1:1      | 1:1:1, 2:1:1
VMAF NEG     | 6:1:1                     | 1:1:1, 2:1:1
PSNR (MSE)   | 6:1:1, 4:1:1              | 1:1:1, 2:1:1
PSNR (Log)   | 4:1:1                     | 1:1:1, 2:1:1
SSIM         | No significant difference | 1:1:1, 2:1:1
MS-SSIM      | 10:1:1, 6:1:1, 8:1:1      | 1:1:1, 2:1:1

The Impact of YUV Weighting on Accuracy

The difference between the best and worst configurations is striking. For example, VMAF’s correlation with subjective quality dropped by 10.5% when poorly weighted configurations like 1:1:1 were used. For tools and workflows that rely on objective metrics to make decisions, that’s a significant hit to accuracy.

More About SSIM and VQMT

Interestingly, VQMT provides significant computational flexibility for SSIM calculations that isn’t available for metrics like PSNR and VMAF.

Figure 3. Configuration options for SSIM in VQMT.

Specifically, as shown in the screenshot, users can choose to calculate SSIM using individual channels (e.g., Y, U, V) or combinations of channels, including RGB, LUV, or the entire YUV color space. This flexibility extends to customizing the combining mode for YUV images, where users can select either:

  1. Default Mode: Applies a custom weight to the Y channel (default is 4) and equal weights for the U and V channels, ensuring that luminance dominates the computation.
  2. FFmpeg Mode: Dynamically assigns weights based on the area of each component, aligning with the subsampling format (e.g., 4:2:0 or 4:4:4); see the sketch after this list.
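To illustrate the FFmpeg-style combining mode, here's a minimal sketch of area-based weighting. The helper names are my own, not VQMT's or FFmpeg's; the point is simply that each plane's weight is proportional to how many samples it contains under the chosen subsampling format.

```python
def area_weights(subsampling="4:2:0"):
    """Return (Y, U, V) weights proportional to each plane's sample count.

    In 4:2:0, each chroma plane has one quarter the samples of the luma
    plane, so Y gets 4/6 of the weight and U and V get 1/6 each; in
    4:4:4, all three planes are weighted equally.
    """
    chroma_area = {"4:2:0": 0.25, "4:2:2": 0.5, "4:4:4": 1.0}[subsampling]
    areas = (1.0, chroma_area, chroma_area)
    total = sum(areas)
    return tuple(a / total for a in areas)

def combine_ssim(ssim_y, ssim_u, ssim_v, subsampling="4:2:0"):
    """Combine per-plane SSIM scores using area-based weights."""
    wy, wu, wv = area_weights(subsampling)
    return wy * ssim_y + wu * ssim_u + wv * ssim_v

# Hypothetical per-plane SSIM scores for 4:2:0 content.
print(combine_ssim(0.962, 0.981, 0.978))  # Y dominates at 4/6 of the total
```

Note that for 4:2:0 content, the area-based ratio works out to 4:1:1, the same ratio as VQMT's Default Mode weight of 4 on the Y channel, so the two modes should produce the same combined score in that common case.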

Such options allow professionals to tailor SSIM calculations to their specific content and quality-evaluation needs, especially when balancing computation time with perceptual accuracy. This level of control makes VQMT a standout tool for SSIM testing.

Practical Application

  1. For Practitioners: If you’re using SSIM, defaulting to the Y channel only is a safe and effective choice. For metrics like VMAF or PSNR, where configurations matter more, check your tool’s settings and, if possible, use a configuration like 6:1:1 or 8:1:1.
  2. For Metric Developers: Tools should specify their default color-space settings. Better yet, allow users to select their configuration and explicitly define how channels are weighted. This gives professionals the flexibility to align metrics with their specific use cases.

Every tool is different, and not all offer the granularity of the MSU VQMT. But if you have control over the color space, it’s worth checking. The wrong configuration can lead to metrics that don’t align with human perception—a critical misstep when the goal is to deliver high-quality video.

If you found this article helpful, you’ll love my course, Computing and Using Video Quality Metrics: A Course for Encoding Professionals. You’ll learn how to compute and interpret metrics like VMAF, PSNR, and SSIMPLUS, create rate distortion curves, and calculate BD-Rate functions for decision-making or presentations. Plus, I’ll show you how to evaluate bandwidth savings, choose encoding settings, and use tools like the MSU VQMT, FFmpeg, and more to optimize your video workflows. Perfect for anyone looking to elevate their encoding expertise!

About Jan Ozer

I help companies train new technical hires in streaming media-related positions; I also help companies optimize their codec selections and encoding stacks and evaluate new encoders and codecs. I am a contributing editor to Streaming Media Magazine, writing about codecs and encoding tools. I have written multiple authoritative books on video encoding, including Video Encoding by the Numbers: Eliminate the Guesswork from your Streaming Video (https://amzn.to/3kV6R1j) and Learn to Produce Video with FFmpeg: In Thirty Minutes or Less (https://amzn.to/3ZJih7e). I have multiple courses relating to streaming media production, all available at https://bit.ly/slc_courses. I currently work at www.netint.com as a Senior Director in Marketing.
