Let’s start this article with a quiz regarding how the quality and encoding speed of two-pass VBR (variable bitrate encoding) compares to single-pass VBR (that’s right, VBR in both cases).
Choose the best answer.
- Two-pass takes twice as long as single-pass but delivers significantly better overall quality.
- Two-pass takes slightly longer than single-pass but delivers significantly better quality.
- Two-pass takes about the same time as single-pass and delivers about the same quality but better low-frame quality.
- It depends.
- None of the above.
The best answer is the fourth, “It depends,” but in trials with files of three different lengths, the first two answers were never correct. For most videos, what you think you know about two-pass encoding probably isn’t true.
To explain, in this article, I tested three sets of files:
- 21 files averaging around two minutes in length
- Two files averaging 11 minutes each, and
- Two feature-length films averaging about two hours each
The summary table shows the results for encoding time, average VMAF quality, and low-frame VMAF quality.
In the two shorter trials, single-pass and two-pass results were nearly identical. With feature-length videos, two-pass encoding time was meaningfully longer, average quality was nearly identical again, and low-frame quality was meaningfully higher.
What’s all this mean? The bottom line is that there’s very little reason to change from two-pass to single-pass or vice versa, except perhaps to boost low-frame scores for feature-length films. If you’re working with 1-10 minute files, and you’ve been considering switching from one approach to the other for some reason, like switching from two-pass to one-pass to shave encoding time, or the reverse to improve quality, you will find little justification here.
If you’re interested in where capped CRF might fit, I included it in the analysis below, though the most important results relate to two-pass vs single-pass.
Let’s start at the beginning. One of the most fundamental encoding decisions we make during VOD encoding is single-pass vs two-pass encoding. Note that I am not comparing two-pass variable bitrate encoding (VBR) vs single pass constant bitrate encoding (CBR). Rather, I am comparing single vs. two-pass VBR.
As you undoubtedly know, with two-pass encoding, during the first pass the encoder analyzes the video file, gauging encoding complexity throughout. During the second pass, the encoder allocates bitrate according to encoding complexity. With single-pass encoding, the encoder allocates data on the fly, never really knowing what’s coming.
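To make the difference concrete, here is a toy sketch in Python. It is not x264's rate control, just an illustration under simple assumptions: each scene has a single made-up complexity number, the two-pass allocator sees them all up front, and the single-pass allocator guesses that future scenes will look like the average of what it has seen so far. The function names and numbers are hypothetical.

```python
def two_pass_allocation(complexity, total_bits):
    """Knows every scene's complexity up front, so it splits the budget proportionally."""
    total = sum(complexity)
    return [total_bits * c / total for c in complexity]


def single_pass_allocation(complexity, total_bits):
    """Sees one scene at a time and guesses the rest from the running average."""
    remaining_bits = total_bits
    seen = []
    allocation = []
    for index, c in enumerate(complexity):
        seen.append(c)
        average_so_far = sum(seen) / len(seen)
        scenes_after_this = len(complexity) - index - 1
        # Assumption: every scene still to come looks like the average so far.
        estimated_remaining = c + average_so_far * scenes_after_this
        share = remaining_bits * c / estimated_remaining
        allocation.append(share)
        remaining_bits -= share
    return allocation


# Hypothetical clip: five scenes, the third is four times harder than the rest.
scene_complexity = [1, 1, 4, 1, 1]
two_pass = two_pass_allocation(scene_complexity, 800)
single_pass = single_pass_allocation(scene_complexity, 800)
print("two-pass:   ", [round(b) for b in two_pass])
print("single-pass:", [round(b) for b in single_pass])
```

In this toy model both allocators spend the same total, but the single-pass allocator gives the hard scene less than the two-pass allocator does because it has to hold data in reserve for scenes it hasn't seen yet.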
One immediate conclusion is that, since the codec is doing an awful lot of work, your results will vary significantly by codec. In this analysis, I used the x264 codec, and you should not extend the findings to other codecs.
More specifically, in this analysis, I compared single-pass vs. two-pass VBR encoding and included capped CRF encodes as well. As I will explain in a moment, the test procedure largely negated the benefit of capped CRF and shouldn’t be used to judge capped CRF as a technique. Still, the results proved interesting, so I included them.
I first compared the three techniques using twenty-one roughly two-minute test files in four categories: animations, sports, entertainment, and office. The results were so surprising that I extended the tests to two 10-12 minute test files and then to two feature-length movies. I realize the latter two data sets are too small for hard conclusions, but the last category in particular seems to suggest that the results are slightly different for feature-length videos.
How I Tested
Quality-related trials are most relevant when performed at the quality level most likely to be used by the readers. In this case, I targeted VMAF 89-91, slightly lower than the 93-95 recommended for the top rung but in the zone where different encoding configurations should have maximum impact.
To identify the appropriate bitrate for each file, I encoded at CRF 23, 25, and 27 to find the bitrate that achieved the desired quality level. Then, I encoded each file using the following command strings.
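One way to turn the CRF trial encodes into a target bitrate is to interpolate between the two trials that bracket the desired VMAF score. The sketch below assumes linear interpolation, which is only a rough approximation of a real rate-quality curve, and the trial numbers are invented for illustration rather than taken from my test files.

```python
def bitrate_for_target_vmaf(trials, target_vmaf):
    """Estimate the bitrate (kbps) that hits target_vmaf.

    trials is a list of (crf, bitrate_kbps, vmaf) tuples from trial encodes.
    Uses linear interpolation between the two trials that bracket the target.
    """
    by_quality = sorted(trials, key=lambda trial: trial[2])  # ascending VMAF
    for (_, low_rate, low_vmaf), (_, high_rate, high_vmaf) in zip(by_quality, by_quality[1:]):
        if low_vmaf <= target_vmaf <= high_vmaf:
            if high_vmaf == low_vmaf:
                return float(low_rate)
            fraction = (target_vmaf - low_vmaf) / (high_vmaf - low_vmaf)
            return low_rate + fraction * (high_rate - low_rate)
    raise ValueError("target VMAF is outside the range covered by the trials")


# Hypothetical results for one file: (CRF, average bitrate in kbps, VMAF).
crf_trials = [(23, 2600, 92.0), (25, 2000, 90.0), (27, 1500, 87.0)]
print(bitrate_for_target_vmaf(crf_trials, 90.0))
print(bitrate_for_target_vmaf(crf_trials, 91.0))
```

If the target falls outside the scores produced by the three CRF values, the function raises an error rather than extrapolating, which is the cue to run another trial encode.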
This, for two-pass VBR:
ffmpeg -y -i BBB.mp4 -c:v libx264 -preset veryslow -force_key_frames expr:gte(t,n_forced*2) -threads 8 -an -b:v 2100k -pass 1 -f mp4 NUL && \
ffmpeg -y -i BBB.mp4 -c:v libx264 -preset veryslow -force_key_frames expr:gte(t,n_forced*2) -threads 8 -an -b:v 2100k -maxrate 4200k -bufsize 4200k -pass 2 BBB_2100_VBR.mp4
This for single-pass VBR:
ffmpeg -y -i BBB.mp4 -c:v libx264 -preset veryslow -force_key_frames expr:gte(t,n_forced*2) -threads 8 -an -b:v 2100k -maxrate 4200k -bufsize 4200k BBB_2100_CBR.mp4
This for capped CRF:
ffmpeg -y -i BBB.mp4 -c:v libx264 -preset veryslow -force_key_frames expr:gte(t,n_forced*2) -threads 8 -an -crf 23 -maxrate 2100k -bufsize 4200k BBB_2100_capCRF_23.mp4
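If you want to repeat these encodes across a batch of files, a small Python helper can generate the three command variants from a single target bitrate. This is a sketch rather than the script I used: the function name and output file names are hypothetical, and it only mirrors the options in the command strings above (200% maxrate and bufsize for the VBR encodes, and a cap equal to the target for capped CRF).

```python
import os


def build_encode_commands(src, target_kbps, crf=23, preset="veryslow", threads=8):
    """Return ffmpeg argument lists for the three techniques compared here."""
    stem = os.path.splitext(os.path.basename(src))[0]
    target = f"{target_kbps}k"
    double = f"{target_kbps * 2}k"  # 200% of target, used for maxrate/bufsize

    # Options shared by every encode in the command strings above.
    common = ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-preset", preset,
              "-force_key_frames", "expr:gte(t,n_forced*2)",
              "-threads", str(threads), "-an"]

    return {
        "two_pass_vbr": [
            # First pass writes only the stats log; the video goes to the null device.
            common + ["-b:v", target, "-pass", "1", "-f", "mp4", os.devnull],
            common + ["-b:v", target, "-maxrate", double, "-bufsize", double,
                      "-pass", "2", f"{stem}_{target_kbps}_2pass.mp4"],
        ],
        "single_pass_vbr": [
            common + ["-b:v", target, "-maxrate", double, "-bufsize", double,
                      f"{stem}_{target_kbps}_1pass.mp4"],
        ],
        "capped_crf": [
            common + ["-crf", str(crf), "-maxrate", target, "-bufsize", double,
                      f"{stem}_{target_kbps}_capCRF_{crf}.mp4"],
        ],
    }


commands = build_encode_commands("BBB.mp4", 2100)
for name, steps in commands.items():
    for args in steps:
        print(name, " ".join(args))
```

To actually run an encode, pass each argument list to `subprocess.run(args, check=True)`. Passing a list rather than a single string means the parentheses in the keyframe expression don't need shell quoting.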
For the record, I tested on a Dell Precision 7820 Tower with two 2.90 GHz Intel Xeon Gold 6226R CPUs running Windows 10 Pro for Workstations on 64 GB of RAM.
Short Video Files
The short video files included four animations, eight general entertainment clips, four sports clips, and five office clips (screencams, talking heads, instructional and demo videos). Table 2 shows the comparative encoding times with the 1-pass Delta column showing the percentage difference between the technique shown and single-pass encoding. You can see that two-pass encoding only extended the encoding time by 3%, which was a major surprise.
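For clarity, the 1-pass Delta column is a simple percentage difference against the single-pass figure. The sketch below shows the arithmetic; the encoding times are made-up round numbers chosen to reproduce a 3% delta, not the values from the table.

```python
def delta_vs_single_pass(times, baseline="single_pass"):
    """Percentage difference of each technique's time relative to the baseline."""
    base = times[baseline]
    return {name: (seconds - base) * 100 / base for name, seconds in times.items()}


# Hypothetical average encoding times in seconds.
encode_seconds = {"single_pass": 100, "two_pass": 103, "capped_crf": 98}
deltas = delta_vs_single_pass(encode_seconds)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f}%")
```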
Figure 1 shows the overall comparative harmonic mean VMAF quality of the 21 clips as encoded using the three techniques. Single-pass produced the best overall quality by a hair, with capped CRF about 1.6 points behind. Since it takes 3 VMAF points for a just noticeable difference, few viewers would notice this quality delta.
Low-frame score is the lowest VMAF score for any frame in the video, and it is an indicator for the potential of transient quality issues. The two-pass files were one point higher here, with capped CRF about 9.5 points behind. Note that one office clip – the screencam – had a low-frame score of 0. If you toss out that anomalous score, the average increases by about five points, still well behind, but a more representative differential.
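Both metrics are easy to compute from a list of per-frame VMAF scores. The sketch below is an illustration, not the measurement tool's implementation. In particular, the `epsilon` offset is an assumption added here so that a frame score of 0, like the one in the screencam clip, doesn't cause a division by zero; your tool may handle zeros differently. The scores are invented.

```python
def harmonic_mean_vmaf(scores, epsilon=1.0):
    """Harmonic mean of per-frame scores, shifted by epsilon to tolerate zeros."""
    reciprocal_sum = sum(1.0 / (score + epsilon) for score in scores)
    return len(scores) / reciprocal_sum - epsilon


def low_frame_score(scores):
    """Lowest VMAF score of any single frame."""
    return min(scores)


# Hypothetical per-frame scores with one very poor frame.
frame_scores = [92.0, 91.0, 90.0, 40.0, 91.0, 93.0]
arithmetic_mean = sum(frame_scores) / len(frame_scores)
print("arithmetic mean:", round(arithmetic_mean, 2))
print("harmonic mean:  ", round(harmonic_mean_vmaf(frame_scores), 2))
print("low-frame score:", low_frame_score(frame_scores))
```

The harmonic mean is pulled down by poor frames more than the arithmetic mean is, which is why it's the more useful average for spotting files with transient problems.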
Figure 2 shows the bit rate differentials. You can see that the average bitrates are very close, but the maximum bitrates vary significantly. Why is this, and what does it mean?
Let’s take a closer look at the three encodes for a single file, the hockey clip in the sports clips. Here’s two-pass VBR. You see the average bitrate of 4004 kbps and max bitrate of 8425 kbps. Ten years ago, this bitrate distribution might have made you nervous. Now, unless you are distributing to mobile devices over 3G connections, the variations in bitrate probably aren’t a big deal.
Here’s single-pass VBR. The average bitrate is very close to two-pass, at 3894 kbps, but the maximum bitrate is 6% lower at 7911 kbps. Again, there are lots of bitrate variations between the scenes in this very short clip, and the pattern is very consistent between the two encodes. What x264 found hard to encode during the first pass of the two-pass encode, it also found hard to encode during single-pass encoding.
Here is capped CRF. The average bitrate is very close to the others, but the maximum is much lower and the moving average bitrate, largely pushed against the relatively low 4 Mbps cap, is very consistent. Clearly, x264 takes the bitrate cap seriously in a capped CRF encode.
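If you want to compute similar average and peak figures yourself, you can derive them from per-frame sizes. The sketch below assumes you already have a list of frame sizes in bytes (for example, exported from a stream analysis tool) and measures the peak over a sliding one-second window. The window length is an assumption, so the result won't necessarily match the maximum reported by any particular tool.

```python
def bitrate_stats(frame_bytes, fps, window_seconds=1.0):
    """Return (average_kbps, peak_kbps), with the peak measured over a sliding window."""
    window = max(1, min(len(frame_bytes), round(fps * window_seconds)))
    duration_seconds = len(frame_bytes) / fps
    average_kbps = sum(frame_bytes) * 8 / duration_seconds / 1000

    # Slide the window one frame at a time, tracking the largest byte total.
    running = sum(frame_bytes[:window])
    peak_bytes = running
    for i in range(window, len(frame_bytes)):
        running += frame_bytes[i] - frame_bytes[i - window]
        peak_bytes = max(peak_bytes, running)

    peak_kbps = peak_bytes * 8 / (window / fps) / 1000
    return average_kbps, peak_kbps


# Hypothetical clip: 2 fps, four frames, the second half is three times larger.
average, peak = bitrate_stats([1000, 1000, 3000, 3000], fps=2)
print(f"average {average:.0f} kbps, peak {peak:.0f} kbps")
```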
These frame graphs reveal the key advantage of two-pass encoding – the encoder simply can allocate data more aggressively. Though the single-pass encode allocates bitrate similarly, the codec is not as aggressive as with two-pass encoding because it doesn’t know what’s coming. This has minimal impact on the overall VMAF score but gives two-pass an advantage regarding low-frame score because it has more wiggle room during intensely hard-to-encode frame sequences. In this regard, capped CRF simply can’t compete with low-frame scores because it doesn’t have the flexibility to allocate up to twice the target bitrate during very hard-to-encode sections.
About Capped CRF in this Analysis
Why did I say that this particular capped CRF comparison was unfair? In essence, capped CRF is a per-title encoding technique that you use instead of a fixed bitrate ladder. Typically, with a fixed bitrate ladder, you encode using conservative bitrates to ensure that all files encode at high quality. In this schema, capped CRF encodes the hard-to-encode clips at the cap, and the easy-to-encode clips at lower rates, saving bandwidth while delivering very close to the same quality.
In this analysis, we set the cap to the same bitrate as the target for single-pass and two-pass VBR; in essence, we’re comparing capped CRF to manual per-title encoding. In the short clips, there were no easy-to-encode segments, so capped CRF encoded at or near the cap without the wiggle room enjoyed by the two VBR alternatives. Slightly lower average VMAF scores and much lower low-frame scores were entirely predictable. As you’ll see with the feature-length clips, where the bitrates were a bit less aggressive, capped CRF delivered significant bitrate savings and very close to the same average VMAF.
Why does capped CRF suffer from low low-frame scores? Check out the Results Plot shown in Figure 6 from the Moscow State University Video Quality Measurement Tool. This shows the per-frame VMAF scores of the three hockey file encodes (single-pass, two-pass, capped CRF) over the duration of the video file with two-pass in red, single-pass in green, and capped CRF in blue.
If you compare the two downward blue quality drops in Figure 6 with the upward bitrate spikes in Figures 3 and 4, you’ll see that they occur at about the same time in the file. Also notice in Figure 5 that there are no equivalent spikes. The two VBR encodes boosted the bitrates during these hard-to-encode frames to maintain quality, while capped CRF had to adhere to the cap, resulting in the low-frame scores.
A few words about low-frame score. Particularly with VMAF, about 85% of these or more will be indistinguishable to the viewer. VMAF seems to report lower quality on fast-moving, high-detail sequences, and sometimes during dissolve or fade-to-black sequences. In the hockey clip, the lowest blue frames occurred in a 5-frame cross-dissolve and the quality differential would not be obvious to most viewers.
When evaluating low-frame quality, you should always check the frames to see if the scores represent quality deficits that viewers would perceive. Before discounting capped CRF, run some tests and check the frames to see if they would be noticeable by the typical viewer watching at full frame rate.
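To find the frames worth checking, it helps to list each contiguous run of low-scoring frames with its timestamp so you can jump straight to it in a player. The sketch below assumes you have per-frame VMAF scores in a list. The threshold and the scores are hypothetical, and you should pick a threshold that suits your own content.

```python
def low_quality_runs(scores, threshold, fps):
    """Find contiguous runs of frames scoring below threshold."""
    runs = []
    start = None
    for index, score in enumerate(scores):
        if score < threshold:
            if start is None:
                start = index  # a new run begins here
        elif start is not None:
            runs.append((start, index - 1))
            start = None
    if start is not None:  # the file ended in the middle of a run
        runs.append((start, len(scores) - 1))

    return [
        {
            "first_frame": first,
            "last_frame": last,
            "start_seconds": first / fps,
            "min_vmaf": min(scores[first:last + 1]),
        }
        for first, last in runs
    ]


# Hypothetical per-frame scores with a five-frame dip, like a short cross-dissolve.
per_frame = [90, 90, 60, 55, 58, 62, 64, 90, 70]
flagged = low_quality_runs(per_frame, threshold=75, fps=30)
for run in flagged:
    print(run)
```

A run that lasts only a handful of frames, like the first one here, is the kind most viewers are unlikely to notice at full frame rate.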
The 2-Minute Net/Net
What’s the conclusion for short clips? There’s very little difference between single-pass and two-pass VBR. If you are using two-pass and think you can shave encoding time by switching to single-pass, you probably can’t. If you are using single-pass and wonder if you can improve your quality with two-pass, same answer.
Let’s see what it looks like with 10–12-minute clips.
10-12 Minute Clips
After reaching this conclusion with the shorter clips, I immediately started wondering if clip length had anything to do with the results. So, I ran the same tests on the full-length version of Netflix’s Meridian (11:58 min:sec) and Harmonic’s Football clip (10:00 min:sec). Two clips are not that big a sample, but it’s better than nothing, and the results were very similar to the shorter clips.
You see the encoding time average was very close.
Table 3: Average encoding time for the two longer clips.
Single-pass and two-pass VBR were very close for both average and low-frame VMAF scores, with capped CRF slightly behind in overall VMAF but lagging significantly in low-frame scoring. Again, two-pass VBR delivers no discernable quality advantage, but doesn’t cost you much in encoding time either.
Figure 8 explains capped CRF’s comparatively poor performance. Capped CRF delivered an 11% lower bitrate than single-pass and a 35% lower maximum bitrate.
Again, capped CRF looks worse than it is because of the aggressive caps used to generate the target VMAF score of 90-91 rather than 93-95. Specifically, though the cap was a generous 6 Mbps for the Football clip, it was a very aggressive 1.6 Mbps for Meridian, where most fixed bitrate ladders would use at least 4.5 Mbps. The higher cap would raise all scores into the 93-95 range and boost the overall file savings from capped CRF to 30% or higher.
At this point, I started wondering if 10-12 minutes was long enough. It was time to try the two feature-length movies I have in my archives.
Two Feature-Length Films
The two feature-length films averaged about two hours. Table 4 shows the encoding time, and we finally saw a meaningful separation between two-pass and single-pass. That is, two-pass took 24% longer than single-pass, and even capped CRF took 10% longer than single-pass.
Figure 9 shows the quality results, and there’s still very little difference in average quality between single-pass and two-pass, though two-pass retains a meaningful 3+ point advantage in low-frame score. Capped CRF performs well in overall quality, especially given the comparative bitrate, but continues to exhibit low-frame issues, signaling the potential for transient quality issues.
Figure 10 shows the bitrate for the two-pass encoded file at 4.5% higher than single-pass. Had I re-encoded the two movies to higher bitrates to achieve absolute bitrate parity between two-pass and single-pass, average quality might have been about equal and the low-frame discrepancy would have been smaller.
Here, capped CRF shaved about 33.5% off the two-pass bitrate, finally getting the wiggle room to show its worth. For streaming producers worried about bitrate peaks, you see the top-line results as well.
From an encoding time perspective, these feature-length films showed the differential that you might have expected. Though average quality was nearly identical, two-pass delivered a 3-point low-frame advantage that some producers might feel is worth the additional 24% encoding time.
Is Two-Pass VBR Worth It?
Table 5 summarizes the results between single-pass and two-pass VBR and is the same as Table 1. In our first two use cases, there’s little difference either way. If you are encoding feature-length films, you have to balance a 24% increase in encoding cost vs. a 3-point increase in low-frame score.
As I said at the top, the bottom line is that single-pass and two-pass VBR deliver very similar results across the board. There’s very little reason to change from one to the other, except perhaps to boost low-frame score for feature-length films. If you have been considering switching from one to the other for some reason, like switching from two-pass to one-pass to shave encoding time, or the reverse to improve quality, you will find little justification here. If you’re encoding feature-length productions or longer, it’s worth testing to see if your results match up to mine.
If you’re encoding via a cloud-based service, it’s worth performing these comparisons to gauge the quality difference between single-pass and dual-pass encoding. Generally, there’s a price delta between the two techniques, often close to 2x. I’ll be following up with an article looking at this for an encoding service or two in the near term.
As always, I’m so deep in the weeds testing here that I may have missed some obvious data points or conclusions. Feel free to reach out at [email protected] if you can see anything that I missed.