x265 and WPP: What’s Fast Isn’t Always Efficient

If you’re optimizing x265 for speed, enabling Wavefront Parallel Processing (WPP) looks like a no-brainer. Table 1 shows a staggering 7.3x improvement in encoding time. A 3:15 encode with WPP turns into a painful 23:51 without it. The quality penalty? Negligible. Overall VMAF drops just 0.19, and low-frame VMAF drops only 0.77 (low-frame VMAF is the lowest VMAF score of any single frame in the video, a predictor of transient quality issues).

Given the fabulous performance improvement and low quality penalty, WPP seems like a slam dunk. It may not be.

Table 1. Encoding with and without WPP; seems like a no-brainer.
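
As a quick sanity check, the 7.3x figure can be reproduced from the two encode times quoted above. This is a minimal sketch; the helper name to_seconds is ours, not something from the test setup.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string such as '3:15' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

with_wpp = to_seconds("3:15")      # 195 seconds
without_wpp = to_seconds("23:51")  # 1431 seconds
speedup = without_wpp / with_wpp

print(f"Speedup from WPP: {speedup:.1f}x")  # prints "Speedup from WPP: 7.3x"
```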

What Wavefront Parallel Processing Does

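WPP works by dividing each frame into rows of coding tree units (CTUs) and encoding those rows on separate threads. That speeds up each individual encode, but those threads don’t come from thin air: every core a WPP encode occupies is a core that can’t run another encode. On a 32-core system, that means fewer encodes running in parallel.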
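
To make the row-based scheme concrete, here is a rough, idealized model of a single frame. It assumes 64x64 CTUs (the x265 default), the standard HEVC wavefront rule that a row can run no closer than two CTUs behind the row above it, identical cost for every CTU, and unlimited threads. None of these numbers come from the tests in this article, and the function name is hypothetical. Real x265 behavior also depends on frame threads and content.

```python
import math
from collections import Counter

def wavefront_profile(width, height, ctu=64, lag=2):
    """Idealized WPP schedule for one frame (assumptions: uniform CTU cost, unlimited threads)."""
    cols = math.ceil(width / ctu)
    rows = math.ceil(height / ctu)
    # Earliest step at which CTU (row, col) can start when each CTU takes one step
    # and every row must stay `lag` CTUs behind the row above it.
    busy = Counter(col + lag * row for row in range(rows) for col in range(cols))
    steps = max(busy) + 1
    return {
        "rows": rows,
        "cols": cols,
        "peak_rows_in_flight": max(busy.values()),
        "steps": steps,
        "average_rows_in_flight": rows * cols / steps,
    }

profile = wavefront_profile(1920, 1080)
print(profile)
```

For 1080p, this model gives 17 rows of 30 CTUs, a peak of 15 rows in flight, and an average of roughly 8 over the 62 steps it takes to finish the frame. Even in this best case, a single 1080p frame cannot keep 32 cores busy through WPP alone, because the wavefront has to ramp up and drain on every frame.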

Figure 2 tells the story. WPP-enabled encodes consume significant CPU cycles. While this might be fine for a single-file test encode, it might not be optimal for multiple-instance production encodes.

Figure 2. WPP uses lots of CPU cycles.

Testing WPP Under Load

To test this, we ran batches of 1080p30 encodes with and without WPP at various thread counts on a 32-core system, measuring total throughput in frames per second. Figure 3 shows the results. The highest total throughput came from encoding without WPP. Though the advantage is only about 9%, it comes with a slight increase in overall VMAF and a 0.75-point increase in the low-frame score.

Figure 3. Processing on a 32-core system with and without WPP. Encoding 16 instances with 2 threads each, without WPP, delivers the best performance.
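
The bookkeeping behind a test like this is simple enough to sketch. The snippet below lists the instance and thread combinations that exactly fill a given core count, and computes total throughput for a batch. The function names are hypothetical, and the exact set of configurations tested for Figure 3 may differ; feed it your own measured frame counts and wall-clock times.

```python
def core_filling_configs(cores: int):
    """Every (instances, threads_per_instance) pair whose product equals `cores`."""
    return [(cores // t, t) for t in range(1, cores + 1) if cores % t == 0]

def aggregate_fps(frames_per_instance: int, instances: int, wall_seconds: float) -> float:
    """Total frames encoded by the whole batch divided by the batch's wall-clock time."""
    return frames_per_instance * instances / wall_seconds

print(core_filling_configs(32))
# [(32, 1), (16, 2), (8, 4), (4, 8), (2, 16), (1, 32)]
```

The metric that matters here is aggregate frames per second across the whole batch, not the speed of any single encode.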

The point is simple. Don’t let single-file performance dictate your production settings. The fastest encode on a quiet machine may be the worst choice when the system is loaded.

Figure 4. The most efficient encoding operation occurs not when CPU utilization is flat-lined but when it is at or near the ceiling.

Figure 4 tracks overall CPU utilization during the WPP=0 tests shown in Figure 3. Running 16 instances with two threads each kept overall CPU utilization at or very near 100% and delivered the best overall performance. Next best was 32 instances with one thread each, which flat-lined CPU utilization but delivered 23% lower throughput. The other configurations used far less CPU, with correspondingly lower throughput.
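
For readers who want to try a similar batch, here is a minimal sketch that builds (but does not run) one x265 command line per source file. It rests on assumptions: the article does not say how thread counts were set, so this uses the x265 CLI’s --pools option to cap worker threads and --no-wpp to disable WPP. The file names, the preset, and the function name are placeholders.

```python
import shlex

def x265_batch_commands(sources, threads, wpp, preset="medium"):
    """Build one x265 command per source file (assumed flags; preset is a placeholder)."""
    commands = []
    for src in sources:
        commands.append([
            "x265",
            "--input", src,
            "--output", src.rsplit(".", 1)[0] + ".hevc",
            "--preset", preset,
            "--pools", str(threads),          # assumption: cap the worker thread pool
            "--wpp" if wpp else "--no-wpp",   # toggle Wavefront Parallel Processing
        ])
    return commands

# Hypothetical batch: 16 instances, two threads each, WPP off.
batch = x265_batch_commands([f"clip_{i:02d}.y4m" for i in range(16)], threads=2, wpp=False)
print(len(batch), "commands, for example:")
print(shlex.join(batch[0]))
```

Launching the commands concurrently (for example with subprocess.Popen) and timing the whole batch gives you the aggregate throughput to compare across configurations.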

The Real Takeaway

The bottom line? What seems like a great configuration option during single-file testing might not be the best alternative in production. Your optimal production configuration will vary by resolution, CPU cores, the number of simultaneous instances, thread count, preset, and, as we just saw, whether WPP is enabled or disabled. You will probably achieve optimal throughput with a configuration that pushes overall CPU utilization close to 100% but doesn’t flat-line it.

About Jan Ozer

I help companies train new technical hires in streaming media-related positions; I also help companies optimize their codec selections and encoding stacks and evaluate new encoders and codecs. I am a contributing editor to Streaming Media Magazine, writing about codecs and encoding tools. I have written multiple authoritative books on video encoding, including Video Encoding by the Numbers: Eliminate the Guesswork from your Streaming Video (https://amzn.to/3kV6R1j) and Learn to Produce Video with FFmpeg: In Thirty Minutes or Less (https://amzn.to/3ZJih7e). I have multiple courses relating to streaming media production, all available at https://bit.ly/slc_courses. I currently work at www.netint.com as a Senior Director in Marketing.
