Why I like VQM better than PSNR or SSIM

Author's note: Since the first publication, I have converted SSIM to decibel form, as suggested by the first comment below (which should also address the second comment). The formula used was the one suggested by the Video Codec Testing and Quality Measurement draft cited in the first comment, or:

 -10 * log10(1 - SSIM)

As it turns out, the results are the same (no comparisons in red), but now they should be correct.
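For readers who want to reproduce the conversion, here is a minimal Python sketch; the function name is mine, not from any tool, and it assumes raw SSIM scores strictly below 1.0:

 import math

 def ssim_to_db(ssim):
     # Decibel form of SSIM: -10 * log10(1 - SSIM).
     # Undefined at SSIM == 1.0 (log of zero), so scores must be below 1.
     return -10 * math.log10(1 - ssim)

 # A raw 0.9 vs. 0.8 looks like a ~10% gap, but in decibel form it is
 # 10.0 dB vs. ~6.99 dB, reflecting the 2x ratio in (1 - SSIM) terms
 # that the second comment below describes.
 print(ssim_to_db(0.9))  # 10.0
 print(ssim_to_db(0.8))  # ~6.99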

Objective video quality benchmarks serve a valuable role in evaluating the quality of encoded video. There are multiple algorithms, and multiple tools to measure them. Though Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM) have been around longer and are much more widely used, I prefer the VQM metric as applied in the Moscow University Video Quality Measurement Tool (you can read my review of the tool here). A brief glance at the table, which you can see at full resolution by clicking it, explains why.

[Figure: VQM, SSIM, and PSNR comparison tables (VQM_SSIM_PSNR.png)]

By way of background, in the summer of 2014 I had a consulting project that involved comparing three HEVC codecs. I purchased the Moscow University tool to apply the objective benchmarks. I worked with PSNR and SSIM initially, but they provided little differentiation between the contenders. You can see this in the SSIM and PSNR tables in the screen grab.

The High versus Low column shows the difference between the highest and lowest scores in each clip test. Cells marked in red show a difference of greater than 7.5%. As you can see in the SSIM and PSNR tables, none of the tests crossed that threshold, or even came close.
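To make that check concrete, here is a small Python sketch of how such a high-versus-low comparison might be computed. The 7.5% threshold is from the article, but the scores are hypothetical, and using the lower score as the percentage base is my assumption:

 def high_vs_low_pct(scores):
     # Percentage gap between the best and worst score in one clip test.
     # The article does not specify the base for the percentage, so
     # dividing by the lower score is an assumption.
     hi, lo = max(scores), min(scores)
     return (hi - lo) / lo * 100

 # Hypothetical scores for three codecs on one clip
 scores = [2.10, 2.31, 2.45]
 gap = high_vs_low_pct(scores)
 print(f"{gap:.1f}%", "flagged in red" if gap > 7.5 else "below threshold")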

In contrast, the VQM test showed 13 instances where the best-quality option scored more than 7.5% better than the lowest-quality option. More importantly, in all of those instances, the differentials accurately presaged true visual distinctions between the contenders. Even better, as you can see in the video below, the Moscow University tool makes it easy to visualize the actual differences after identifying them.

When you're choosing an objective quality algorithm, you want one that identifies significant differences between the technologies you're analyzing, and an analysis tool that makes it easy to see and verify those differences. That's why I like the VQM metric and the Moscow University tool. Another up-and-coming algorithm/tool is the SSIMWave Quality of Experience Monitor, which I recently reviewed for Streaming Media; you can read that review here. I like that tool and its SSIMplus algorithm, which is a significant advance over Plain Jane SSIM.

Dr. Zhou Wang, a co-inventor of SSIM and co-founder of SSIMWave, recently won an Emmy Award for the SSIM algorithm (along with his co-inventors), and I certainly don't mean to detract from that accomplishment in any way. However, in my use, VQM is a much better canary in the coal mine for actual differences between compared files, and I'm sure at this point Dr. Wang would say that SSIMplus is a significant advance over SSIM as well.

Here's the video showing the Moscow University Tool in operation. 

Now that I have your attention, permit me one shameless plug. I found that using these objective video quality benchmarks is a great way to analyze the streams included in an adaptive group. In a recent consulting project for Mexican movie distributor Cinepolis, using these tools led to several critical quality adjustments and allowed the client to reduce the number of streams being encoded.

Here's a quote from the client that I copied from LinkedIn.

[Image: client testimonial from LinkedIn (consulting.png)]

If you're interested in learning if your adaptive group is as efficient and effective as possible, contact me at jozer@mindspring.com. 


Comments (2)

Said this on 3-27-2016 at 02:41 pm

I recommend converting SSIM indexes to decibel form for more meaningful comparisons. This is common practice, implemented in the x264 and x265 encoders, and also in Daala development at https://arewecompressedyet.com/

Conversion is fairly simple in Excel. If interested, refer to http://tools.ietf.org/html/draft-daede-netvc-testing-02

I provide samples at http://estevao.altervista.org/video/codecs.htm

Hip
Said this on 8-1-2016 at 05:26 pm

You're a moron; SSIM does not work that way. A score of 0.9 is twice as good as a score of 0.8, not ~10% better. The formula is (1 - A) / (1 - B), where A and B are the SSIM scores. Several of your SSIM scores pass the arbitrary 7.5% threshold you laid out.

Jeez, if an incompetent like you can afford $1000 for a petty computer program then I demand to be a millionaire.
