The perfect video analysis tool combines a video quality metric that accurately predicts the subjective ratings of real human eyes with the ability to show those quality differences to the operator. While SSIMWave makes a strong case that its SSIMplus algorithm has those predictive capabilities, the company’s SQM (for SSIMWave Video Quality-of-Experience Monitor) video analysis tool provides only a limited ability to visualize and confirm these differences. If you can get comfortable with the idea that SSIMplus is the best video quality metric available, you’ll find SQM a highly efficient tool for calculating the SSIMplus rating in the UI or in batch mode. But I’m from Missouri—the Show-Me state—so when it comes to making compression decisions, I want to easily see the differences that the tool is measuring. If you’re like me, you’ll find the $3,600 tool frustrating in this regard.
Compressionists need quality testing for purposes that range from choosing a codec to finding the optimal compression parameters. While the gold standard is always triple-blind subjective testing, this approach is time-consuming and expensive. Therefore, we have a number of software-based quality measurement tools and metrics, such as the Moscow University Video Quality Measurement Tool (VQMT) we reviewed back in 2014, which can compute multiple quality metrics, including the popular peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM), and the video quality metric (VQM).
The problem with these metrics is that they don’t relate directly to any meaningful human measure: good, bad, or excellent quality. The scores indicate that one video is of higher quality than another, but provide no clue as to whether either video will look good to a human viewer. Measures such as PSNR, SSIM, and VQM also don’t take into account the viewing platform, specifically that a 640×360 video that looks great on an iPhone might look awful when viewed in full screen on a 31″ monitor or 65″ TV.
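PSNR illustrates the point: it is a purely mathematical distance between two frames, with no model of the viewer or the screen. A minimal NumPy sketch of the standard computation for 8-bit frames (not SSIMWave's code, just the textbook formula):

```python
import numpy as np

def psnr(reference, encoded, max_val=255.0):
    """Peak signal-to-noise ratio between two 8-bit frames, in dB.

    Higher is better, but the number carries no absolute meaning:
    the same PSNR can look fine on a phone and poor on a 65-inch TV.
    """
    diff = reference.astype(np.float64) - encoded.astype(np.float64)
    mse = np.mean(diff ** 2)  # mean squared error across all pixels
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)
```

A PSNR of, say, 38 dB tells you nothing about whether a viewer would call the clip good or bad, which is exactly the gap SSIMplus tries to close.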
SSIMWave’s SQM tackles both problems, rating all videos on a scale of 1 to 100, with 20-point bands dividing the scale into bad, poor, fair, good, and excellent ratings, plus the ability to apply the metric to different viewing platforms. You can see this in Figure 1. On the upper left is the test video; on the upper right is the quality map showing how it differs from the source video from which it was encoded.
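The grading scale can be sketched as a simple lookup. This is my own illustration, assuming the band boundaries fall exactly on multiples of 20 (the review's "excellent" examples, such as scores just above 80, are consistent with that assumption):

```python
def ssimplus_band(score):
    """Map a SSIMplus-style score (1-100) to its quality band.

    The 20-point bands follow the scale described in the review;
    the exact boundary handling (e.g., whether 80.0 is good or
    excellent) is an assumption on my part.
    """
    bands = ["bad", "poor", "fair", "good", "excellent"]
    return bands[min(int(score // 20), 4)]
```

Under this mapping, a clip scoring 79.87 rates as high-good, while 84.49 lands in low-excellent.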
Figure 1. The main interface of SSIMWave’s SQM
The graph on the lower right shows the ratings for the various devices tested, which include the SSIMplus Core, a composite metric computed in every test, and device scores for an iPhone 6 Plus, an iPad Air, a Lenovo 30″ monitor, and a Dell 27″ monitor. Though the video (barely) rates as excellent on the relatively small screen of the iPhone 6 Plus, it drops to mid-good ratings on the larger monitors. The graph on the lower left tracks those scores over the duration of the clip.
You can run the tool in GUI mode (Figure 1) or in batch mode, which I’ll describe later. In both cases you can choose up to 10 devices to score, and you can save groups of devices as profiles to simplify multiple-device testing. The program comes with 31 device profiles, and you can request additional profiles after purchase. Another unique feature is the ability to run cross-resolution testing, answering questions such as whether an 854×480 video looks better on an iPhone 6 than a 640×360 file does. You’ll see the results of this testing at the end of this review.
In manual and batch operation, SQM computes the SSIMplus score for a single file as compared to its source file. In both modes, operation is very straightforward, with several wonderful convenience-oriented features. As shown in Figure 2, you can load HEVC files directly, as well as MP4 and VP9, eliminating the conversion-to-YUV step required by other tools.
Figure 2. Loading the test file (on the left) and reference file (on the right)
There’s also a Frame offset control for both videos (set to one for the test video in Figure 2), which is an exceptionally simple way to eliminate the extra frames some encoding programs insert at the start of a video. You can also limit the number of frames processed via the Process frames adjustment at the bottom of the upper left panel; between this and the frame offset controls, you can easily test random sections of longer video files.
On the bottom of Figure 2, you can see the device settings available in the base product. According to SSIMWave, these settings take into account the resolution and size of the screen, its luminance, the typical viewing distance, and other factors. The “Expert” in the Sony W8 device name indicates that the quality rating assumes that an expert viewer is watching from very close to the screen, rather than the typical viewing distance. All settings assume that the video is watched at 100 percent screen size, so if you wanted to simulate playback in a video window, you’d have to ask SSIMWave to create a different setting.
After loading the video files and choosing the settings, click Continue on the lower right of Figure 2, which takes you back to the screen shown in Figure 1. To start the analysis, press the Play icon on the bottom of Figure 1, which toggles to a Pause button. During the analysis, another icon appears to the right of the Pause button that lets you toggle the visualization on the upper right to the reference video rather than the quality map shown in Figure 1.
Significantly, there’s no way to rewind or randomly seek through the analyzed file; if you want to see the quality map for the frame at 18:13, you have to stop the video on that precise frame. Of course, that’s very hard to do, and you wouldn’t know that you wanted to see detail on that frame until you actually ran the test. SSIMWave plans to add simple player controls by the end of 2015, which will be a very welcome addition.
There’s also no way to display the test file over the source file, which is a better way to spot artifacts than side-by-side views, and which is available in other tools, among them the aforementioned Moscow University VQMT and the Vanguard Video Visual Comparison Tool. As mentioned, SSIMWave’s SQM analyzes only a single file, so there’s no way to visually compare the results of two compression alternatives against a single source, such as VBR vs. CBR, High vs. Main Profile, or VP9 vs. HEVC. This is an invaluable feature of the VQMT tool, and it is particularly useful for showing consulting clients the quality differences delivered by the various compression alternatives being analyzed.
To run SQM in batch mode, you create a text file specifying the test video and source file, with optional controls for setting the offsets and the number of frames processed. Load the text file into the program, which checks your inputs before allowing you to run the batch—another useful feature. SQM saves the results of each analysis as an individual .csv file.
I tested performance first. SSIMWave’s website claims that SQM “performs the QoE of a 4K resolution video at more than 100 frames per second.” I ran SQM on an HP Z840 workstation with two 3.1GHz Intel Xeon E5-2687W v3 CPUs and 64GB of RAM running Windows 7, and all analyzed files were stored on HP’s Turbo SSD G2 drives.
Working in the GUI, I analyzed 10 seconds of three files using five device settings, comparing each to its source MP4 file. A 10-second segment of a 4K HEVC file took 56 seconds to analyze, a 10-second segment of a 720p HEVC file took 15 seconds, and a 1080p MP4 file took 16 seconds. Preconverting the 1080p files to YUV and testing again saved only 2 seconds.
I asked SSIMWave about these discrepancies and learned that the website claims reflect a GPU-based implementation that’s available “upon request” and relates only to the SSIMplus computation, not the demuxing, decoding, and display performed by the GUI tool. I didn’t test the GPU-based implementation, so can’t verify the company’s claims, though the company representative said SSIMWave will adjust the language on its website to clarify performance-related expectations.
I ran through several quality-related scenarios that I’ve used the Moscow VQMT tool for in the recent past. The first is shown in Table 1, in which I analyzed the encoding quality of two clips, an animation and a real-world video. For each clip, I created three short files at 3850Kbps, 4300Kbps, and 5800Kbps, which were the presets used by the client, and then a 5-minute test for the 3850Kbps preset to verify the results of the shorter tests. I encoded all files in CBR and 125 percent constrained VBR mode, and then tested with both tools. The green highlighted box identifies the quality leader, which for VQM is the lowest score and for SQM the highest.
Table 1. Comparing the MSU VQMT tool and the SSIMWave SQM tool
I was testing for two results. First, did 125 percent constrained VBR produce better quality than CBR, and second, were all three 1920×1080 iterations necessary? As to the first question, the numerical results were very similar between the two tools; overall, VQM showed VBR better by .09 percent, while SQM showed VBR better by .87 percent.
The issue here is that this and other comparisons often hinge not on overall quality but on the quality of one or more frames within the file. This is shown in Figure 3, where the CBR frame looks mangled and the VBR frame looks only slightly blocky. In GUI mode, the VQMT tool makes these results easy to see by presenting a quality graph for the duration of the clip and allowing you to move through the clip and view the actual source and encoded frames. To produce the same analysis with SQM, you would have to run the tests, scan through the CSV file results to identify low-quality frames, load each video into a player, navigate to the frames, and grab them.
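The CSV-scanning step, at least, is easy to script. A minimal sketch, assuming a per-frame CSV with hypothetical `frame` and `score` columns (the actual column names in SQM's output may differ):

```python
import csv

def flag_low_frames(csv_path, threshold=60.0):
    """Return (frame, score) pairs scoring below threshold.

    The 'frame' and 'score' column names are assumptions, not
    SQM's documented format; adjust them to match the actual
    CSV header. A threshold of 60 corresponds to the bottom of
    the 'good' band on the 1-100 scale.
    """
    flagged = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            score = float(row["score"])
            if score < threshold:
                flagged.append((int(row["frame"]), score))
    return flagged
```

Even with a script like this, you still have to open each clip in a player and navigate to the flagged frames by hand, which is the workflow gap the review describes.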
Figure 3. While the quality ratings are similar, CBR can produce the occasional ugly frame.
The second question I was testing for was whether the client needed all three 1080p streams; after all, if the quality difference between the three was minimal, why incur the costs of encoding and delivering the higher quality streams? In the animated clip, the difference in quality between the 5800Kbps and 3850Kbps files using VBR encoding was 1.27 percent for VQM and .35 percent for SQM—no major difference there. In the real-world video, quality improved by 1.53 percent with the VQM, but by 5.47 percent in the SQM, where the score improved from high-good quality (79.87) to low-excellent quality (84.49). This is another scenario in which the ability to actually see the quality differences would have been very valuable.
HEVC vs. VP9
The second analysis relates to HEVC versus VP9 quality comparisons that I’ve been tracking since December 2014. The results were originally published in “The Great UHD Codec Debate: Google’s VP9 vs. HEVC/H.265,” and I updated the results for a presentation given at Streaming Media East in May (you can watch the video and download the presentation at streamingmedia.com/ConferenceVideos), then finalized the tests in June.
Table 2 shows the results. Again, the best scores are in green, and the results are close with both metrics. Overall, x265 scored 8.69 percent better in VQM, where lower scores are better, and .97 percent higher in SQM, where higher scores are better, though both VP9 and HEVC rated in the excellent range (80–100) in the SQM test.
Table 2. VQM vs. SQM analysis of VP9 vs. x265
Multiple Resolution Tests
The final tests related to SQM’s ability to test multiple resolution renditions against a common source, an analysis that can’t be performed in the Moscow University tool. Table 3 shows how a 640×360 file compares to an 854×480 file when both are encoded at the same data rate. This answers the question: if you can deliver only one file encoded at 1050Kbps, should you encode at 640×360 or 854×480?
Table 3. Multiple resolution tests across multiple presets with the SQM.
The 640×360 file has lower resolution but more bits per pixel, which means the encoded quality of each pixel should be higher than that of the other file. The 854×480 file has greater resolution, or detail, but the quality of each pixel is lower because of its lower bits-per-pixel value. The results presented in Table 3 indicate that the smaller file would be perceived as higher quality on all tested devices. Again, these results assume that the video is watched at full screen; it would have been interesting to see whether testing at the native resolution of the video, which would have required custom presets from SSIMWave, would have changed the results.
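The bits-per-pixel comparison behind this trade-off is simple arithmetic. A quick sketch, assuming 30fps playback (the review doesn't state the frame rate of the test clips):

```python
def bits_per_pixel(bitrate_kbps, width, height, fps=30.0):
    """Bits available per pixel per frame at a given bitrate.

    The 30fps default is an assumption for illustration; the
    actual frame rate of the test clips isn't stated.
    """
    return bitrate_kbps * 1000.0 / (width * height * fps)

low_res = bits_per_pixel(1050, 640, 360)   # roughly 0.15 bpp
high_res = bits_per_pixel(1050, 854, 480)  # roughly 0.09 bpp
```

At the same 1050Kbps budget, the 640×360 rendition gets nearly twice the bits per pixel of the 854×480 rendition, which is why its individual pixels can be encoded more cleanly even though it carries less detail.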
So where does that leave us? SQM has many unique features, including an absolute grading system, the ability to rate quality on different devices, and multiple-resolution testing, but it provides little ability to actually see the differences that it measures. SSIMWave is working to address many of these concerns by the end of 2015, including video seeking in playback mode and the ability to visualize results for multiple clips. Until then, if you consider SSIMplus the holy grail of video quality algorithms, you’ll find SQM a highly usable and efficient tool for obtaining these ratings. If you need to be seduced into a new algorithm by confirming quality scores with your own eyeballs, you may want to wait until SSIMWave provides these features.
This article originally appeared in the October 2015 issue of Streaming Media magazine as “Review: SSIMWave.”