So, I received an email from an acquaintance that read, “I was curious if there is actually any benefit to a ‘threads=’ type custom command in x264. Specifically, many streamers are buying 8-core/16-thread CPUs to encode as a standalone client capturing information from a video capture device.” I had an article on FFmpeg due on the editorial schedule, so I decided to run a few tests and see, checking both VOD and live encoding.
As we all have learned too many times, there are no simple questions when it comes to encoding. So hundreds of encodes and three days later, here are the questions I will answer.
- What does the threads command do?
- What does FFmpeg do if you don’t specify a threads value?
- How does the threads setting impact performance?
- How does the threads setting impact quality?
For those who don’t relish the intellectual exercise of testing and scoring, here’s the net/net.
- The threads command impacts performance, overall quality, and transient quality.
- The overall quality impact was more significant in real-time capture applications than in VOD. If you’re using FFmpeg as a live encoder, or an encoder like OBS that lets you customize the FFmpeg commands, it’s worth experimenting with different threads values to potentially increase quality.
- Even with very aggressive VOD encoding settings, the threads setting has minimal impact on overall quality. However, in the ten clips that I tested, higher thread counts produced more transient quality issues than lower settings. It may be worth running tests on your own clips and potentially limiting the threads value to 4 to 8 threads.
Now, on to the narrative.
What Does the Threads Command Do?
On a multiple-core computer, the threads command controls how many threads FFmpeg uses for the encode. When it is placed among the output options, as in the commands in this article, it sets the thread count for the x264 encoder. On my 40-core HP Z840 computer, a threads setting of one (-threads 1) uses one core (that lonely one, third from the right on the top) and about 4% of overall resources.
Meanwhile, a setting of 32 (-threads 32) uses up to 32 cores, amounting (at the moment captured) to 43% of overall resources.
Here’s the FFmpeg command used during these encodes, obviously changing the threads value as needed. Note that this file is 60p.
ffmpeg -i football_1080p.mp4 -c:v libx264 -b:v 3M -bufsize 6M -maxrate 4.5M -threads 1 -g 120 -tune psnr -report football_1080p_3M_threads_1_p.mp4
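To repeat the thread sweep, a small shell function can run the command above once per thread count. This is a sketch under assumptions, not necessarily how these encodes were run. The function name sweep_threads, the added -y flag, the output naming, and the FFMPEG variable are all illustrative; set FFMPEG to echo to print the commands instead of running them.

```shell
# Hypothetical helper: encode one source at several -threads values so that
# speed and quality can be compared afterwards. Set FFMPEG=echo for a dry run.
FFMPEG="${FFMPEG:-ffmpeg}"

sweep_threads() {
  local src base t
  src="$1"
  shift
  base="${src%.*}"   # football_1080p.mp4 -> football_1080p
  for t in "$@"; do
    # Same settings as the VOD command above; only -threads and the output name change.
    "$FFMPEG" -y -i "$src" -c:v libx264 -b:v 3M -bufsize 6M -maxrate 4.5M \
      -threads "$t" -g 120 -tune psnr -report "${base}_3M_threads_${t}.mp4"
  done
}

# Example: sweep_threads football_1080p.mp4 1 2 4 8 16 32
```

Running the encodes one at a time, as here, keeps the speed measurements from interfering with each other.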
What Does FFmpeg Do if You Don’t Specify a Threads Value?
This varies with the number of cores in your computer. If you load an encoded file into MediaInfo and choose the HTML view, you’ll see the encoding settings shown below. I produced this file on my 8-core HP ZBook notebook, and you can see the threads value of 12.
On the 40-core HP Z840, when I didn’t specify the number of threads, the value was 34 for the 1080p VOD encode and 22 for the simulated live capture, which output 720p.
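Where do 12, 34, and 22 come from? They’re consistent with the following rule, which appears to be how x264 chooses its automatic thread count when none is passed. Treat this as an assumption to check against your x264 build:

- start at 1.5 threads per logical processor;
- cap the result at one thread per two 16-pixel macroblock rows of the encoded frame.

Assuming the ZBook exposes 8 logical processors and the Z840 exposes 40, the hypothetical function below returns:

- 12 for the ZBook;
- 34 for a 1080p encode on the Z840;
- 22 for the 720p live output on the Z840.

```shell
# Hypothetical model of x264's automatic thread count (frame threading).
# This is an assumed rule, not a documented API; verify against your own x264 build.
x264_auto_threads() {
  local logical_cpus="$1" height="$2"
  local threads=$(( logical_cpus * 3 / 2 ))   # 1.5 threads per logical processor
  local mb_rows=$(( (height + 15) / 16 ))     # frame height in 16-pixel macroblock rows
  local cap=$(( mb_rows / 2 ))                # at most one thread per two rows
  if [ "$cap" -lt 1 ]; then cap=1; fi
  if [ "$threads" -gt "$cap" ]; then threads=$cap; fi
  echo "$threads"
}

# Example (Linux): x264_auto_threads "$(nproc)" 1080
```

If the model holds, it also explains why the 720p live encode got fewer automatic threads than the 1080p VOD encode on the same machine: the cap is driven by the output height, not the core count.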
How Does the Threads Setting Impact Performance?
For the VOD performance tests, I used the FFmpeg command shown above.
In a simulated capture scenario, outputting 720p video @ 60 fps (using the -re switch to read the incoming file in real time), you get the following results. As you can see, at threads settings of 1 and 4, the computer couldn’t output the stream at the full frame rate. In this application, when I didn’t specify a threads value, FFmpeg assigned 22 threads to the encode, as compared to 34 for the 1080p VOD encode. Here’s the command for the 8-thread capture.
ffmpeg -re -i football_1080p.mp4 -c:v libx264 -s 1280x720 -b:v 5M -bufsize 5M -maxrate 5M -threads 8 -g 120 -tune psnr -report football_1080p_cap_6M_threads_8_p.mp4
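Judging whether a given threads value keeps up in real time is easier if FFmpeg reports progress as key=value lines (the -progress option) and you check the final fps against the target. The function below is a hypothetical sketch: the name check_realtime and the 2% tolerance are arbitrary choices. Also note that FFmpeg’s reported fps is an average over the run, so it can hide brief stalls.

```shell
# Hypothetical check: read FFmpeg -progress output (key=value lines) on stdin and
# compare the last reported fps with the target frame rate.
# Exit status: 0 if within 2% of target, 1 if falling behind, 2 if no fps was seen.
check_realtime() {
  awk -F= -v target="$1" '
    $1 == "fps" { fps = $2 }                   # keep the most recent fps value
    END {
      if (fps == "") { print "NO_DATA"; exit 2 }
      if (fps + 0 >= target * 0.98) { print "REALTIME fps=" fps; exit 0 }
      print "FALLING_BEHIND fps=" fps; exit 1
    }'
}

# Example: the 8-thread capture command, with progress written to stdout
# ffmpeg -re -i football_1080p.mp4 -c:v libx264 -s 1280x720 -b:v 5M -bufsize 5M \
#   -maxrate 5M -threads 8 -g 120 -tune psnr -nostats -progress pipe:1 \
#   football_1080p_cap_6M_threads_8_p.mp4 | check_realtime 60
```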
How Does the Threads Setting Impact Quality?
Here’s where things get interesting. My assumption was that the threads setting would have minimal impact on quality. My initial tests simulated VOD encoding of a hard-to-encode 1080p60 file at 3 Mbps (see the FFmpeg command above), a very challenging configuration. The graph below shows the VMAF value for each stream encoded at the designated thread count (I encoded all files with -tune psnr for the VMAF scoring). As you can see, quality drops as the thread count increases, for a total drop of 0.62 VMAF points. For perspective, it takes 6 VMAF points to create a “just noticeable difference” that 75% of viewers would notice, so a drop of 0.62 points would be noticed by few viewers.
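The scoring command isn’t shown here, so below is one way to compute VMAF with FFmpeg’s libvmaf filter. It assumes an FFmpeg build configured with --enable-libvmaf. The distorted encode is the first input and the reference is the second. The encode is scaled to the reference size before scoring, which matters for the 720p capture files. The function name, the bicubic scaler, and the JSON log path are illustrative assumptions, not necessarily what produced the scores in this article.

```shell
# Hypothetical helper: score an encode against its source with FFmpeg's libvmaf filter.
# Assumes FFmpeg was built with --enable-libvmaf and both files share the same frame rate
# and frame count. Set FFMPEG=echo for a dry run.
FFMPEG="${FFMPEG:-ffmpeg}"

vmaf_score() {
  local distorted="$1" reference="$2" log="$3"
  local w="${4:-1920}" h="${5:-1080}"   # reference resolution to upscale the encode to
  # First input = distorted (main), second input = reference.
  "$FFMPEG" -i "$distorted" -i "$reference" \
    -lavfi "[0:v]scale=${w}:${h}:flags=bicubic[dist];[dist][1:v]libvmaf=log_path=${log}:log_fmt=json" \
    -f null -
}

# Example: vmaf_score football_1080p_3M_threads_1_p.mp4 football_1080p.mp4 threads_1_vmaf.json
```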
Next, I measured the VMAF quality of the simulated live capture streams, shown below. Here the VMAF value varied by 1.98 points from highest to lowest, still under the 6-point threshold but getting to be a real number. What’s interesting about the live scenario is that there’s no downside to chasing this quality improvement so long as you achieve real-time encoding. So, if the computer could comfortably output the stream in real time at a threads count of 4 or 8, I would try that value. Before going into production, I would test with my typical source footage and output parameters to see if the benefit is similar to what I saw.
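To put a spread like 0.62 or 1.98 points in context, measure it against the 6-point just-noticeable difference used in this article. By that yardstick, 0.62 points is about 10% of a JND and 1.98 points about 33%. The hypothetical awk helper below does this for any set of scores. It reads one “threads vmaf” pair per line (for example, the mean VMAF you collected for each thread count) and reports the best and worst thread counts. The input format and function name are assumptions for illustration.

```shell
# Hypothetical helper: summarise per-thread-count VMAF scores.
# Input on stdin: one "<threads> <vmaf>" pair per line.
# Optional argument: the just-noticeable-difference size in VMAF points (default 6).
vmaf_spread() {
  awk -v jnd="${1:-6}" '
    NR == 1 || $2 > max { max = $2; best = $1 }    # highest score so far
    NR == 1 || $2 < min { min = $2; worst = $1 }   # lowest score so far
    END {
      spread = max - min
      printf "best=%s worst=%s spread=%.2f (%.0f%% of a JND)\n", best, worst, spread, 100 * spread / jnd
    }'
}

# Example (hypothetical file of "threads vmaf" lines): vmaf_spread < live_scores.txt
```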
More VOD Tests
Realizing that the initial encode of 1080p60 @ 3 Mbps was unreasonably hard, I tested a number of files at a more leisurely 1080p @ 4.5 Mbps, encoding each once with 4 threads and once with no threads value specified, which on the Z840 meant 34 threads. Here’s the command string for the 4-thread version; the other simply had no reference to threads.
ffmpeg -y -i BBB_1080p.mp4 -c:v libx264 -b:v 4.5M -bufsize 9M -maxrate 6M -threads 4 -g 120 -tune psnr -report BBB_1080p_6M_threads_4.mp4
As you can see in the following graph, the overall quality difference was minor, averaging under 0.2 VMAF points, with the 4-thread encode always scoring higher. This overall difference certainly wasn’t worth chasing, since encoding speed dropped from 111 fps to 44 fps. That cuts capacity by about 60%; put another way, each encode takes about 2.5 times as long, increasing encoding costs accordingly.
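The capacity arithmetic is simple but easy to misstate. Losing 60% of throughput does not mean costs rise 60%; they rise by the ratio of the two speeds. A tiny awk helper (hypothetical name) makes both numbers explicit for any pair of speeds. It assumes cost scales linearly with encoding time on the same machine.

```shell
# Hypothetical helper: compare two encoding speeds (fps) measured on the same machine.
# Prints the share of throughput lost and the slower setting's time/cost multiplier,
# assuming cost scales linearly with encoding time.
throughput_cost() {
  awk -v fast="$1" -v slow="$2" 'BEGIN {
    printf "capacity drop: %.1f%%, cost factor: %.2fx\n", (fast - slow) / fast * 100, fast / slow
  }'
}

throughput_cost 111 44   # prints: capacity drop: 60.4%, cost factor: 2.52x
```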
However, if you look at the individual files, many of the 34-thread versions had transient drops not seen in the 4-thread versions. Here’s the Big Buck Bunny comparison, with the 4-thread version in red and the 34-thread version in green, and VMAF scores shown on the left. Major transient quality drops are circled.
Here’s the comparison from the music video Freedom. While there are some instances where the red file pokes below the green, most of the largest downward spikes are in the 34-thread encode.
The screencam video showed two regions where the green 34-thread version spiked downward significantly.
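Spotting transient drops like these by eye works for a handful of clips; for more, a script that compares per-frame scores helps. The sketch below assumes you’ve already exported each encode’s per-frame VMAF to a two-column text file (“frame vmaf”). The export step, that file format, the function name, and the 5-point default threshold are all assumptions, since the layout of libvmaf’s own logs varies between versions.

```shell
# Hypothetical helper: list frames where encode B scores at least THRESHOLD VMAF points
# below encode A. Each file holds one "<frame> <vmaf>" pair per line (assumed format).
# Output columns: frame, score A, score B, drop.
transient_drops() {
  local a="$1" b="$2" threshold="${3:-5}"
  awk -v t="$threshold" '
    NR == FNR { score[$1] = $2; next }              # first file: remember scores of A
    ($1 in score) && (score[$1] - $2 >= t) {        # second file: compare frame by frame
      printf "%s %.2f %.2f %.2f\n", $1, score[$1], $2, score[$1] - $2
    }' "$a" "$b"
}

# Example (hypothetical files): transient_drops bbb_threads_4.txt bbb_threads_34.txt 5
```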
What to take from this? I will say that very few of these discrepancies actually produced frames so ugly that viewers would notice, but several did. So going forward:
- When encoding on systems with fewer cores (like 16), I would run similar tests comparing the default encode to a limited-thread encode (4 or 8 threads) to gauge the overall quality difference and identify transient issues, if any.
- When encoding on systems with higher numbers of cores, like my Z840, I’ll limit threads to 4 or 8 and set up multiple simultaneous encodes to make up the performance delta.
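Running several thread-limited encodes at once can be sketched in plain shell. The hypothetical function below starts encodes in the background with a fixed -threads value and waits whenever a batch of jobs is running. For example, four jobs of 8 threads each would use 32 threads on a 40-core system. The job and thread counts, the 4.5 Mbps settings borrowed from the command above, and the output naming are assumptions to adapt.

```shell
# Hypothetical helper: run thread-limited x264 encodes in parallel batches.
# Usage: parallel_encodes MAX_JOBS THREADS clip1.mp4 clip2.mp4 ...
# Set FFMPEG=echo for a dry run.
FFMPEG="${FFMPEG:-ffmpeg}"

parallel_encodes() {
  local max_jobs="$1" threads="$2" running=0 clip
  shift 2
  for clip in "$@"; do
    # -nostdin keeps background FFmpeg processes from reading the terminal.
    "$FFMPEG" -y -nostdin -i "$clip" -c:v libx264 -b:v 4.5M -bufsize 9M -maxrate 6M \
      -threads "$threads" -g 120 -tune psnr "${clip%.*}_threads_${threads}.mp4" &
    running=$((running + 1))
    if [ "$running" -ge "$max_jobs" ]; then
      wait          # let the whole batch finish before starting the next one
      running=0
    fi
  done
  wait              # wait for the final, possibly partial, batch
}

# Example with hypothetical clip names: four 8-thread encodes at a time
# parallel_encodes 4 8 BBB_1080p.mp4 Freedom_1080p.mp4 screencam_1080p.mp4
```

Waiting on whole batches is simple and portable but leaves cores idle while the slowest job in a batch finishes. In bash 4.3 or later, `wait -n` can start a new job as soon as any one finishes.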
This is my first deep foray into the threads command and there’s always a chance that I missed something or many things. If so, just let me know at email@example.com or via a comment below.