
VMAF, SSIM, and Color Spaces: Getting the Metrics Right

So there I was, having dinner on a Saturday night date night with my wife…

And in comes a text: “Hi Jan, I was wondering when you compute SSIM, do you use a single channel or multi-channel? And if multi-channel, then anything different than averaging the three?”

(And yes, anyone with children, even grown children, checks incoming texts on a date night.)

Not knowing the answer offhand, I checked my go-to tool, the Moscow State University Video Quality Measurement Tool (VQMT). By default, it computes SSIM using only the Y channel. Knowing the MSU folks as I do, I assumed that this was the right decision, but you know what assumptions do to you and me.

Figure 1. VQMT defaults to the Y channel for SSIM and many other metrics (click to view at full resolution).

So, into the rabbit hole I went. Fortunately, the first source I found was a white paper from the MSU crew titled, Applying Objective Quality Metrics to Video-Codec Comparisons: Choosing the Best Metric for Subjective Quality Estimation. It had all the answers I needed.

Let’s Take a Step Back

When computing metrics like SSIM, VMAF, or PSNR, there’s an important decision to make: Do you analyze just the Y (luminance) channel, or do you include all three color channels (Y, U, and V)? If you use all three, how should they be weighted in the final computation? This matters because the Y channel dominates human visual perception, while the U and V channels (color components) play a smaller role. Incorrectly weighting these components could lead to a metric that doesn’t align well with human opinions of quality.

As a reminder, the role of a quality metric is to accurately predict human visual responses, and getting the YUV configuration wrong can result in scores that are far less correlated with subjective quality. So, the question on the table was: which metric is the most accurate, and with which channel configuration?
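To make the weighting question concrete, here's a minimal sketch of how per-channel scores can be blended with a Y:U:V ratio such as 6:1:1. The function and the sample PSNR values are my own illustration, not code from VQMT or the MSU paper; it simply shows the arithmetic behind the ratios discussed below.

```python
def weighted_metric(score_y, score_u, score_v, weights=(6, 1, 1)):
    """Blend per-channel quality scores using a Y:U:V weighting ratio.

    With weights=(6, 1, 1), luminance contributes 6/8 of the result and
    each chroma channel 1/8, so the Y channel dominates, just as it does
    in human perception.
    """
    wy, wu, wv = weights
    return (wy * score_y + wu * score_u + wv * score_v) / (wy + wu + wv)

# Hypothetical per-channel PSNR values (in dB) for one encoded clip.
psnr_y, psnr_u, psnr_v = 38.2, 41.5, 42.1
print(weighted_metric(psnr_y, psnr_u, psnr_v))             # 6:1:1 blend
print(weighted_metric(psnr_y, psnr_u, psnr_v, (1, 1, 1)))  # naive 1:1:1 average
```

The denominator normalizes the weights, so any ratio (6:1:1, 8:1:1, or 1:1:1) produces a score on the same scale as the per-channel inputs; the ratio only changes how much each channel's errors count.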

How MSU Tested This

To answer this question, MSU compared various configurations of popular metrics like SSIM, VMAF, and PSNR against Mean Opinion Scores (MOS) from subjective human evaluations. They tested:

  1. Different metrics (e.g., SSIM, VMAF, PSNR).
  2. Variations in how the Y, U, and V channels were weighted, including ratios like 6:1:1, 8:1:1, and 1:1:1.
  3. A dataset of 789 encoded video streams across multiple codecs, bitrates, and visual complexities to represent real-world variations in encoding artifacts.

Their goal was simple but critical: Determine which metric, and which configuration, best matched human perception of video quality.
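How do you measure "matched human perception"? The standard approach is to compute a correlation coefficient between each metric configuration's scores and the MOS values for the same clips. Here's a minimal sketch using Pearson correlation; the numbers are invented for illustration, and the MSU paper may use a different correlation variant.

```python
import numpy as np

# Invented example data: two VMAF configurations scored on five clips,
# plus the subjective MOS for those same clips.
mos      = np.array([2.1, 3.0, 3.6, 4.2, 4.8])
vmaf_611 = np.array([41.0, 58.0, 70.0, 83.0, 95.0])  # 6:1:1 weighting
vmaf_111 = np.array([48.0, 55.0, 78.0, 74.0, 93.0])  # 1:1:1 weighting

# np.corrcoef returns a 2x2 matrix; element [0, 1] is the correlation
# between the two inputs. Higher means the metric tracks MOS more closely.
print(np.corrcoef(mos, vmaf_611)[0, 1])
print(np.corrcoef(mos, vmaf_111)[0, 1])
```

Whichever configuration produces the higher coefficient across the full 789-stream dataset is the one that best predicts what viewers actually see.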

Their Conclusion?

The results showed that VMAF was the best overall metric for predicting subjective quality, particularly when configured with 6:1:1 or 8:1:1 YUV weighting. Figure 2 below shows the VQMT configuration that reflects these findings.

Figure 2. The most accurate VQMT configuration (click to view at full resolution).

Key Findings:

  • With optimal YUV configurations like 6:1:1 or 8:1:1, VMAF achieved a near-perfect correlation (~0.95) with subjective scores.
  • Poor configurations, such as 1:1:1 or 2:1:1, resulted in correlations as low as ~0.85, a drop of 0.10, or roughly 10.5%, in accuracy.
  • SSIM, by contrast, showed no significant difference across YUV configurations, making it less sensitive but also less tunable.

Here’s a breakdown of the best and worst configurations for each metric:

Metric       | Best YUV Formulation      | Worst YUV Formulation
VMAF         | 6:1:1, 8:1:1, 10:1:1      | 1:1:1, 2:1:1
VMAF NEG     | 6:1:1                     | 1:1:1, 2:1:1
PSNR (MSE)   | 6:1:1, 4:1:1              | 1:1:1, 2:1:1
PSNR (Log)   | 4:1:1                     | 1:1:1, 2:1:1
SSIM         | No significant difference | 1:1:1, 2:1:1
MS-SSIM      | 10:1:1, 6:1:1, 8:1:1      | 1:1:1, 2:1:1

The Impact of YUV Weighting on Accuracy

The difference between the best and worst configurations is striking. For example, VMAF’s correlation with subjective quality dropped by 10.5% when poorly weighted configurations like 1:1:1 were used. For tools and workflows that rely on objective metrics to make decisions, that’s a significant hit to accuracy.

More About SSIM and VQMT

Interestingly, VQMT provides significant computational flexibility for SSIM calculations that isn’t available for metrics like PSNR and VMAF.

Figure 3. Configuration options for SSIM in VQMT.

Specifically, as shown in the screenshot, users can choose to calculate SSIM using individual channels (e.g., Y, U, V) or combinations of channels, including RGB, LUV, or the entire YUV color space. This flexibility extends to customizing the combining mode for YUV images, where users can select either:

  1. Default Mode: Applies a custom weight to the Y channel (default is 4) and equal weights for the U and V channels, ensuring that luminance dominates the computation.
  2. FFmpeg Mode: Dynamically assigns weights based on the area of each component, aligning with the subsampling format (e.g., 4:2:0 or 4:4:4); see the sketch after this list.
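To illustrate the FFmpeg-style combining mode, here's a minimal sketch of area-based weighting. The helper names are my own, not VQMT's or FFmpeg's; the point is simply that each plane's weight is proportional to how many samples it contains under the chosen subsampling format.

```python
def area_weights(subsampling="4:2:0"):
    """Return (Y, U, V) weights proportional to each plane's sample count.

    In 4:2:0, each chroma plane has one quarter the samples of the luma
    plane, so Y gets 4/6 of the weight and U and V get 1/6 each; in
    4:4:4, all three planes are weighted equally.
    """
    chroma_area = {"4:2:0": 0.25, "4:2:2": 0.5, "4:4:4": 1.0}[subsampling]
    areas = (1.0, chroma_area, chroma_area)
    total = sum(areas)
    return tuple(a / total for a in areas)

def combine_ssim(ssim_y, ssim_u, ssim_v, subsampling="4:2:0"):
    """Combine per-plane SSIM scores using area-based weights."""
    wy, wu, wv = area_weights(subsampling)
    return wy * ssim_y + wu * ssim_u + wv * ssim_v

# Hypothetical per-plane SSIM scores for 4:2:0 content.
print(combine_ssim(0.962, 0.981, 0.978))  # Y dominates at 4/6 of the total
```

Note that for 4:2:0 content, the area-based ratio works out to 4:1:1, the same ratio as VQMT's Default Mode weight of 4 on the Y channel, so the two modes should produce the same combined score in that common case.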

Such options allow professionals to tailor SSIM calculations to their specific content and quality-evaluation needs, especially when balancing computation time with perceptual accuracy. This level of control makes VQMT a standout tool for SSIM testing.

Practical Application

  1. For Practitioners: If you’re using SSIM, defaulting to the Y channel only is a safe and effective choice. For metrics like VMAF or PSNR, where configurations matter more, check your tool’s settings and, if possible, use a configuration like 6:1:1 or 8:1:1.
  2. For Metric Developers: Tools should specify their default color-space settings. Better yet, allow users to select their configuration and explicitly define how channels are weighted. This gives professionals the flexibility to align metrics with their specific use cases.

Every tool is different, and not all offer the granularity of the MSU VQMT. But if you have control over the color space, it’s worth checking. The wrong configuration can lead to metrics that don’t align with human perception—a critical misstep when the goal is to deliver high-quality video.

If you found this article helpful, you’ll love my course, Computing and Using Video Quality Metrics: A Course for Encoding Professionals. You’ll learn how to compute and interpret metrics like VMAF, PSNR, and SSIMPLUS, create rate distortion curves, and calculate BD-Rate functions for decision-making or presentations. Plus, I’ll show you how to evaluate bandwidth savings, choose encoding settings, and use tools like the MSU VQMT, FFmpeg, and more to optimize your video workflows. Perfect for anyone looking to elevate their encoding expertise!

About Jan Ozer

I help companies train new technical hires in streaming media-related positions; I also help companies optimize their codec selections and encoding stacks and evaluate new encoders and codecs. I am a contributing editor to Streaming Media Magazine, writing about codecs and encoding tools. I have written multiple authoritative books on video encoding, including Video Encoding by the Numbers: Eliminate the Guesswork from your Streaming Video (https://amzn.to/3kV6R1j) and Learn to Produce Video with FFmpeg: In Thirty Minutes or Less (https://amzn.to/3ZJih7e). I have multiple courses relating to streaming media production, all available at https://bit.ly/slc_courses. I currently work at www.netint.com as a Senior Director in Marketing.
