This article details how the FFmpeg threads command impacts performance, overall quality, and transient quality for live and VOD encoding. The level of testing and analysis detailed here is consistent with the instruction in my book, Learn to Produce Videos with FFmpeg In 30 Minutes or Less ($34.95), and my course, FFmpeg for Adaptive Bitrate Production ($29.95) (which includes a PDF copy of the book). Don’t just learn FFmpeg; become an expert in video compression.
So, I received an email from an acquaintance that read, “I was curious if there is actually any benefit to a “threads=” type custom command in x264. Specifically many streamers are buying 8 core/16 thread CPUs to encode as a standalone client capturing information from a video capture device.” I had an article on FFmpeg due on the editorial schedule so I decided to run a few tests and see, checking VOD and live.
As we all have learned too many times, there are no simple questions when it comes to encoding. So hundreds of encodes and three days later, here are the questions I will answer.
- What does the FFmpeg threads command do?
- What does FFmpeg do if you don’t specify a threads value?
- How does the FFmpeg threads setting impact performance?
- How does the FFmpeg threads setting impact quality?
For those who don’t relish the intellectual exercise of testing and scoring, here’s the net/net.
- The threads command impacts performance, overall quality, and transient quality.
- The overall quality impact in real-time capture applications was more significant than for VOD. If you’re using FFmpeg as a live encoder, or use an encoder like OBS that lets you customize the FFmpeg commands, it’s worth experimenting with different threads values to potentially increase quality.
- Even for very aggressive VOD encoding settings, the impact of the threads setting on overall quality is minimal. However, higher thread counts produced more transient quality issues than lower settings in the ten clips that I tested. It may be worthwhile running tests on your own clips and potentially limiting the threads count to 4 – 8 threads.
Now, on to the narrative.
What Does the FFmpeg Threads Command Do?
On a multiple-core computer, the threads command controls how many threads FFmpeg consumes. On my 40-core HP Z840 computer, a threads setting of one (-threads 1) uses one core (the lonely one, third from the right on the top row) and about 4% of overall resources.
Meanwhile, a setting of 32 (-threads 32) uses up to 32 cores, amounting (at this time) to 43% of overall resources.
Here’s the FFmpeg command used during these encodes, obviously changing the threads value as needed. Note that this file is 60p.
ffmpeg -i football_1080p.mp4 -c:v libx264 -b:v 3M -bufsize 6M -maxrate 4.5M -threads 1 -g 120 -tune psnr -report football_1080p_3M_threads_1_p.mp4
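To repeat this encode across several threads values, a small loop can generate the commands. This is a minimal sketch: `build_cmd` is a hypothetical helper, and the list of thread counts is illustrative rather than the exact set tested here. It only prints the commands so you can inspect them first.

```shell
#!/bin/sh
# build_cmd THREADS: print (not run) the encode command from the article
# with the given threads value. build_cmd is a hypothetical helper name.
build_cmd() {
    threads=$1
    printf 'ffmpeg -i football_1080p.mp4 -c:v libx264 -b:v 3M -bufsize 6M -maxrate 4.5M -threads %s -g 120 -tune psnr -report football_1080p_3M_threads_%s_p.mp4\n' \
        "$threads" "$threads"
}

# Illustrative thread counts; adjust to the values you want to compare.
for t in 1 4 8 16 32; do
    build_cmd "$t"
done
```

Once the commands look right, redirect the output to a file and run that file with `sh`.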
What Does FFmpeg Do if You Don’t Specify a Threads Value?
This varies depending upon the number of cores in your computer. If you load the encoded file into MediaInfo and choose the HTML view, you’ll see the encoding settings shown below. I produced this file on my 8-core HP ZBook notebook and you see the threads value of 12.
On the 40-core HP Z840, if you don’t specify the number of threads for a VOD operation, the value is 34. For a simulated live capture operation, the value was 22.
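The values of 12, 34, and 22 are consistent with x264's automatic thread selection as I understand it: 1.5 times the number of logical processors, capped at one thread per two 16-pixel macroblock rows of the output frame. That reading is an assumption about x264's internals, not something verified in these tests. If it holds, the 34 versus 22 difference would come from the 1080p versus 720p output height rather than from VOD versus live as such. The sketch below just does the arithmetic; `auto_threads` is a hypothetical name.

```shell
#!/bin/sh
# auto_threads LOGICAL_CPUS FRAME_HEIGHT
# Assumed heuristic (not confirmed by the article):
#   threads = min(cpus * 3 / 2, macroblock_rows / 2), at least 1
auto_threads() {
    cpus=$1
    height=$2
    by_cpu=$(( cpus * 3 / 2 ))
    mb_rows=$(( (height + 15) / 16 ))   # rows of 16-pixel macroblocks
    by_height=$(( mb_rows / 2 ))
    if [ "$by_height" -lt 1 ]; then
        by_height=1
    fi
    if [ "$by_cpu" -lt "$by_height" ]; then
        echo "$by_cpu"
    else
        echo "$by_height"
    fi
}

auto_threads 8 1080    # 8-core notebook, 1080p output
auto_threads 40 1080   # 40-core workstation, 1080p output
auto_threads 40 720    # 40-core workstation, 720p output
```

Under these assumptions the three calls print 12, 34, and 22, matching the values reported above.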
How the FFmpeg Threads Command Impacts Performance
Pretty much the way you would expect it to. When encoding a single 60 fps file for VOD transcoding, you see the impact on FPS immediately below.
I used the FFmpeg command shown above for these tests.
In a simulated capture scenario, outputting 720p video @ 60 fps (using the -re switch to read the incoming file in real time), you get the following. As you can see, at threads settings of 1 and 4, the computer couldn’t output the stream at full frame rate. In this application, when I didn’t specify a threads value, FFmpeg assigned 22 threads to the encode, compared to 34 for the VOD encode.
I used this FFmpeg command for these tests.
ffmpeg -re -i football_1080p.mp4 -c:v libx264 -s 1280x720 -b:v 5M -bufsize 5M -maxrate 5M -threads 8 -g 120 -tune psnr -report football_1080p_cap_6M_threads_8_p.mp4
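To check whether a given threads value keeps up with a 60 fps source, you can read the last `fps=` value from FFmpeg's progress output. The sketch below assumes the usual status-line layout (`frame= ... fps= ... speed=...`), with updates separated by carriage returns. `last_fps` and `meets_rate` are hypothetical helpers, and the sample lines are made up for illustration, not taken from these tests.

```shell
#!/bin/sh
# last_fps: read FFmpeg progress text on stdin, print the final fps= value.
last_fps() {
    tr '\r' '\n' | sed -n 's/.*fps= *\([0-9.][0-9.]*\).*/\1/p' | tail -n 1
}

# meets_rate GOT WANT: succeed when GOT is at least WANT.
meets_rate() {
    awk -v got="$1" -v want="$2" 'BEGIN { exit !(got + 0 >= want + 0) }'
}

# Made-up sample progress lines (illustrative only).
fps=$(printf '%s\r%s\n' \
    'frame=  600 fps= 60 q=28.0 size=    6144kB time=00:00:10.00 bitrate=5033.2kbits/s speed=   1x' \
    'frame= 1200 fps= 43 q=29.0 size=   12288kB time=00:00:20.00 bitrate=5033.2kbits/s speed=0.717x' \
    | last_fps)

if meets_rate "$fps" 60; then
    echo "real time: $fps fps"
else
    echo "falling behind: $fps fps"
fi
```

In practice you would pipe the saved console output or report log into `last_fps` instead of the sample text.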
How the FFmpeg Threads Command Impacts Quality
Here’s where things get interesting. My assumption was that the threads setting would have minimal impact on quality. My initial tests simulated VOD encoding of a very hard-to-encode file at 1080p60 @ 3 Mbps (see FFmpeg command above). The graph below shows the VMAF value for each stream encoded at the designated thread count (I tuned all files for PSNR for the VMAF scoring). As you can see, the quality drops as the thread count increases for a total drop of .62 VMAF points. For perspective, it takes 6 VMAF points to create a “just noticeable difference” that 75% of viewers would notice. So, the drop of .62 would be noticed by few viewers.
Next, I measured VMAF quality of the simulated live capture streams which I show below. Here the VMAF value varied by 1.98 points from highest to lowest, still under the 6-point threshold but getting to be a real number. What’s interesting about the live scenario is that there is no downside for chasing this quality improvement so long as you achieve real-time encoding. So, if the computer could comfortably output the necessary file at a threads count of 4 or 8, I would try this value. I would test with my typical source footage and output parameters before going into production to see if the benefit is similar to what I saw.
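The article doesn't say which tool produced the VMAF scores. One way to run a similar comparison, assuming an FFmpeg build that includes the libvmaf filter, is sketched below. `vmaf_cmd` is a hypothetical helper that prints the command, and the file names in the example call are placeholders. The encoded file goes first and the source second; a 720p encode would need to be scaled to the source resolution before comparison, which this sketch leaves out.

```shell
#!/bin/sh
# vmaf_cmd ENCODED REFERENCE LOG: print an FFmpeg command that scores
# ENCODED against REFERENCE and writes per-frame results to LOG as JSON.
# Assumes both files have the same resolution and frame rate.
vmaf_cmd() {
    printf 'ffmpeg -i %s -i %s -lavfi libvmaf=log_fmt=json:log_path=%s -f null -\n' \
        "$1" "$2" "$3"
}

# Placeholder file names, for illustration only.
vmaf_cmd football_1080p_3M_threads_4_p.mp4 football_1080p.mp4 threads_4_vmaf.json
```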
More VOD Tests
Realizing that the initial encode of 1080p60 @ 3 Mbps was unreasonably hard, I decided to test a number of files at a more leisurely 1080p @ 4.5 Mbps, encoding once with 4 threads specified and once with no threads value specified, which on the Z840 ended up using 34 threads. Here’s the command string for the 4-threads version; the other simply had no reference to threads.
ffmpeg -y -i BBB_1080p.mp4 -c:v libx264 -b:v 4.5M -bufsize 9M -maxrate 6M -threads 4 -g 120 -tune psnr -report BBB_1080p_6M_threads_4.mp4
As you can see in the following graph, the overall quality difference was minor, averaging under .2 VMAF points with the file encoded with 4 threads always higher. Certainly, this overall difference wasn’t worth chasing since the frame rate dropped from 111 to 44, cutting capacity (and increasing encoding costs) by about 60%.
However, if you look at the individual files, many of the 34-thread versions had transient drops not seen in the 4-thread version. Here’s the Big Buck Bunny comparison, with the 4-threads version in red and the 34-thread version in green, and VMAF scores shown on the left. Major transient quality drops are shown circled.
Here’s the comparison from the music video Freedom. While there are some instances where the red file pokes below the green, most of the largest downward spikes are the 34-thread encode.
Here’s the Football clip, which I encoded at 3 Mbps and didn’t include in the chart above. You see the same pattern, only more so, with one very significant and noticeable drop at the start of the clip.
The screencam video showed two regions where the green 34-threads version spiked downward significantly.
While Sintel showed many, many regions.
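If you have per-frame VMAF scores for two encodes, a short awk filter can list the frames where one falls well below the other, rather than spotting the spikes by eye. This is a minimal sketch that assumes each encode's scores have been exported to a plain two-column text file (frame number, then score); that file format and the `transient_drops` name are assumptions, and the numbers in the demo are invented.

```shell
#!/bin/sh
# transient_drops FILE_A FILE_B GAP
# Print "frame scoreA scoreB" for frames where B is more than GAP points below A.
transient_drops() {
    awk -v gap="$3" '
        NR == FNR { a[$1] = $2; next }            # first file: remember A scores
        ($1 in a) && (a[$1] - $2 > gap) {          # second file: compare to A
            printf "%s %.2f %.2f\n", $1, a[$1], $2
        }
    ' "$1" "$2"
}

# Invented demo data: frame 2 drops sharply in the second encode.
a=$(mktemp)
b=$(mktemp)
printf '1 94.80\n2 95.10\n3 94.90\n' > "$a"
printf '1 94.50\n2 80.25\n3 94.70\n' > "$b"
transient_drops "$a" "$b" 6
rm -f "$a" "$b"
```

The gap of 6 mirrors the just-noticeable-difference figure cited earlier; a smaller gap will flag more frames.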
What to take from this? I will say that very few of these discrepancies actually produced frames so ugly that viewers would notice, but several did. So going forward:
- When encoding on systems with fewer cores (like 16) I would run similar tests comparing the default encode to a limited thread encode (4 or 8) to gauge the overall quality difference and identify transient issues, if any.
- When encoding on systems with higher numbers of cores, like my Z840, I’ll limit threads to 4 or 8 and set up multiple simultaneous encodes to make up the performance delta.
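As a rough guide to how many simultaneous 4-thread encodes it takes to match one default encode, you can divide the two frame rates measured above (111 fps versus 44 fps) and round up. The sketch below assumes throughput scales linearly with the number of parallel jobs, which is an assumption and not something tested here. `jobs_needed` and `encode_cmd` are hypothetical helpers, and the input file names other than BBB_1080p.mp4 are placeholders.

```shell
#!/bin/sh
# jobs_needed FAST_FPS SLOW_FPS: integer ceiling of FAST_FPS / SLOW_FPS.
jobs_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# encode_cmd INPUT: print the 4-thread encode command for one input file.
# -nostdin keeps parallel FFmpeg processes from reading the shared stdin.
encode_cmd() {
    src=$1
    out="${src%.mp4}_4.5M_threads_4.mp4"
    printf 'ffmpeg -nostdin -y -i %s -c:v libx264 -b:v 4.5M -bufsize 9M -maxrate 6M -threads 4 -g 120 -tune psnr %s\n' \
        "$src" "$out"
}

jobs_needed 111 44

# Placeholder inputs; prints one command per file.
for f in BBB_1080p.mp4 Sintel_1080p.mp4 Freedom_1080p.mp4; do
    encode_cmd "$f"
done
```

With the measured frame rates this suggests three parallel jobs. To run the printed commands three at a time, one option is to save them to a file (here called jobs.txt) and use `xargs -P 3 -I{} sh -c {} < jobs.txt`; the `-P` flag is available in GNU and BSD xargs.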
This is my first deep foray into the threads command and there’s always a chance that I missed something or many things. If so, just let me know by email or via a comment below.
Jan, very interesting analysis.
My first assumptions were the same as yours, that increasing the number of threads would not affect quality, but it looks like there is a slight effect.
I would rather test FFmpeg in CRF mode and try setting slicing on the encoder, 3 to 9 slices. Although increasing the number of slices affects quality, it decreases processing time. That should make FFmpeg perform more efficiently in a multi-threaded setup.
Thanks Mark, interesting stuff. Let me know how the experiments work out for you.
Although I found this post two years later, your findings are still very interesting. I’m using a very old Z800 with two Xeon X5570, i.e. 2×4 cores (2×8 threads) for two-pass offline HEVC with FFmpeg (2020-11-22-git-0066bf4d1a-full_build-www.gyan.dev). When I opened the resulting file in MediaInfo, it lists the number of “frame-threads”, not “threads” – any idea what the difference is? Since I have so few cores I simply assumed that I should use them all, but perhaps there is some impact on quality also for offline encoding (apologies for being off the topic of streaming)?
Adding to the previous post, it looks like it’s not possible to specify the number of threads for HEVC:
x265 [error]: frameNumThreads (--frame-threads) must be [0 .. X265_MAX_FRAME_THREADS)
However, HEVC uses all threads if number of threads isn’t specified:
x265 [info]: HEVC encoder version 3.4+27-g5163c32d7
x265 [info]: build info [Windows][GCC 10.2.0][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2
x265 [info]: Main 4:2:2 10 profile, Level-4 (Main tier)
x265 [info]: Thread pool 0 using 8 threads on numa nodes 0
x265 [info]: Thread pool 1 using 8 threads on numa nodes 1
x265 [info]: Slices : 1
x265 [info]: frame threads / pool features : 4 / wpp(17 rows)
I looked at this for HEVC around the same time and found what you found; the threads command works differently. Thanks for pointing this out.
Cool, an old Z800 – didn’t realize they were still useful (my Z840, launched in 2014, is still a beast). I’ve got one that I need to format for Ubuntu.
Cheers and take care.
Even if this post is old, it is still quite relevant. But perhaps a new post on this subject, with new tests done with the latest ffmpeg5 and different codecs, could be something 🙂 It would be interesting to see if there are any improvements.
In my recent tests using ffmpeg5 with x264, I have found that the number of threads does have an impact, especially with very, very short fades to black. Using fewer threads instead of the “all you can eat” approach, I don’t get the blocky issue I found in several tests.
Andreas – thanks for your comment. I’m swamped for the near future, but might have a look down the road. In the meantime, if you run any of your own tests, I’d be glad to reference them here or even publish them separately.
Cheers and take care.