
How to Choose Your Bitrate Control Technique

This article is derived from a lesson in Streaming Media 101: Technical Onboarding for Streaming Media Professionals. If you’re looking for an efficient way to get up to speed on key streaming terms, technologies, workflows, and best practices, check out the course here.

Every time you encode a video file with a distribution-oriented codec like H.264, HEVC, VP9, or AV1, you choose a bitrate control technique that controls bitrate, overall quality, transient quality, and encoding cost. Examples of common rate control modes include CBR, VBR, CRF, and Capped CRF. This article discusses how these options work, their strengths and weaknesses, and how and when to implement them.

The first two modes discussed, constant bitrate encoding and variable bitrate encoding, are available in virtually every encoder for every distribution-oriented codec. The second two, Constant Rate Factor and Capped Constant Rate Factor, are available in FFmpeg for x264, x265, libvpx-VP9, and libaom-AV1, though I’ll exclusively discuss x264 in this article.

I’m going to refer to three files during this discussion. Please take a moment to familiarize yourself with the content because it will matter during the discussion (particularly for the Test clip).

  • Test – This two-minute clip is comprised of 30 seconds of talking head video and 30 seconds of ballet repeated twice.
  • Football – This is a two-minute section of the high-motion Harmonic Football test clip which contains regions of both high and low motion.
  • Talkingheads – This is a two-minute segment of a low-motion talking-head clip.

By way of background, whenever you encode a video file for distribution (as opposed to archiving or uploading for transcoding) you should consider five factors: compatibility, overall quality, transient quality, deliverability, and encoding cost. Here’s a brief description of each.

  • Compatibility – can the player you’re delivering the video to decode and play the file? Here we’re discussing H.264 so compatibility is near universal. With HEVC, VP9, and AV1, compatibility may be an issue.
  • Overall quality – this is the overall quality of the file, in this discussion measured with VMAF computed using harmonic mean averaging.
  • Transient quality – this is the likelihood that the file will display transient quality issues, in this discussion measured by low-frame VMAF, or the lowest VMAF score for any frame in the file.
  • Deliverability – this is your ability to deliver the file to the remote viewer without interruption. This is typically not a concern for viewers on high-bandwidth connections but definitely an issue with files delivered over 3G or similar connections.
  • Encoding cost – using a technique that involves more than one pass significantly boosts encoding time; if you’re paying for your own encoding farm, or paying many cloud encoding facilities (like AWS Elemental MediaConvert), two-pass encoding boosts cost significantly.
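
The two quality measures above can be reproduced from a list of per-frame VMAF scores. The sketch below assumes a hypothetical text file with one score per line; the file layout and the helper name vmaf_summary are illustrative assumptions, not the output format of any particular measurement tool. It prints the harmonic mean, which low-scoring frames pull down more than a simple average would, and the lowest frame score.

```shell
# vmaf_summary FILE: print the harmonic-mean VMAF and the low-frame VMAF.
# Assumes FILE holds one per-frame VMAF score per line (hypothetical format).
vmaf_summary() {
  awk '
    NF == 0 { next }                      # skip blank lines
    {
      score = $1 + 0
      if (score <= 0) { bad = 1; exit }   # harmonic mean is undefined for zero scores
      n++
      inv_sum += 1 / score                # accumulate reciprocals
      if (n == 1 || score < low) low = score
    }
    END {
      if (bad || n == 0) { print "no usable scores"; exit 1 }
      printf "harmonic_mean=%.2f low_frame=%.2f\n", n / inv_sum, low
    }
  ' "$1"
}
```

You would call it as vmaf_summary scores.txt, where scores.txt is whatever per-frame export you have on hand.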

Finally, in FFmpeg, and most encoding tools that deploy the x264 and x265 codecs, there are three switches that control the bitrate. These are:

  • b:v – This sets the overall bitrate.
  • maxrate – This sets the maximum bitrate.
  • bufsize – This sets the size of the Video Buffering Verifier (VBV) buffer – see here.
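
One way to get a feel for bufsize is to express it as seconds of video at the maximum rate, which is simply the buffer size divided by maxrate. The helper below is a minimal sketch of that arithmetic; the function name is made up for illustration, and both arguments are assumed to be in kilobits (bufsize) and kilobits per second (maxrate).

```shell
# vbv_seconds BUFSIZE_KBITS MAXRATE_KBPS
# Prints how many seconds of video the buffer holds at the maximum rate.
vbv_seconds() {
  awk -v buf="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", buf / rate }'
}

vbv_seconds 5000 5000    # -bufsize 5000k with -maxrate 5000k
vbv_seconds 10000 5000   # -bufsize 10000k with -maxrate 5000k
```

The first call prints 1.00 and the second prints 2.00, so doubling bufsize at the same maxrate gives the encoder twice as long a window over which to even out the bitrate.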

Let’s start with Constant Bitrate Encoding or CBR.

Constant Bitrate Encoding (CBR)

As the name suggests, when you encode with constant bitrate encoding, you use a constant bitrate over the entire file, irrespective of the complexity of the scenes in the video file. When encoding with FFmpeg, you implement CBR by using the same data rate for b:v, maxrate, and bufsize, as follows.

 -b:v 5000k -maxrate 5000k -bufsize 5000k
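
For context, here is a sketch of how those three switches might sit in a complete single-pass x264 command. The function prints the command rather than running it, and input.mp4 and output_cbr.mp4 are placeholder file names, not files from this article.

```shell
# cbr_command RATE: print (rather than run) a single-pass CBR command for x264.
# The same value is used for b:v, maxrate, and bufsize.
cbr_command() {
  printf 'ffmpeg -i input.mp4 -c:v libx264 -b:v %s -maxrate %s -bufsize %s output_cbr.mp4\n' \
    "$1" "$1" "$1"
}

cbr_command 5000k
```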

In the Test file, which alternates 30 seconds of talking head and 30 seconds of ballet, the CBR-encoded file looks like Figure 1 (in Bitrate Viewer). If you look hard enough, you can just see the wavy blue line that tracks the average bitrate hovering right around the 5000k bitrate line.

On the right, you see that the average bitrate is 4938 kbps and the peak bitrate 6013 kbps, about 20% higher. With most software encoders, CBR isn’t a flat line, but it’s certainly less variable than the other control techniques shown below.

Figure 1. Test clip encoded with constant bitrate encoding.

We’ll review the quality implications of CBR encoding in a moment. From a deliverability perspective, the advantage of CBR is clear.

If you’re delivering live video into the cloud over a fixed bitrate connection or video to a constrained connection to a remote viewer, the lack of variability in the stream helps ensure against interruptions. CBR is also a single-pass technique, which means it’s cheaper than variable bitrate encoding discussed next.

Variable Bitrate Encoding (VBR)

Variable Bitrate (VBR) encoding attempts to hit the bitrate target but adjusts the bitrate over the duration of the video according to the complexity of the content. VBR typically requires two passes; one to scan the video and identify the complexity of the different regions, the other to actually encode.

VBR is often further refined by describing how far the maximum rate can exceed the target. You would call the first example below 200% constrained VBR because the maximum rate is 2x the target. You’d call the second example 150% constrained VBR because the maximum is 150% of the target (1.5x). The third example would be 110% constrained VBR.

 -b:v 5000k -maxrate 10000k -bufsize 10000k
 -b:v 5000k -maxrate 7500k -bufsize 7500k
 -b:v 5000k -maxrate 5500k -bufsize 5500k
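
The three examples above follow one rule: maxrate and bufsize equal the target multiplied by the constraint percentage. The sketch below derives the switches from a target and a percentage, then prints a two-pass x264 encode that uses them. The function names and file names are placeholders invented for this illustration, and the commands are printed rather than executed.

```shell
# cvbr_flags TARGET_KBPS PERCENT: print the rate-control switches for constrained VBR.
cvbr_flags() {
  max=$(( $1 * $2 / 100 ))   # e.g. 5000 * 150 / 100 = 7500
  printf '%s %sk -maxrate %sk -bufsize %sk\n' '-b:v' "$1" "$max" "$max"
}

# cvbr_two_pass TARGET_KBPS PERCENT: print both passes of a two-pass encode.
# Pass 1 only analyzes the video, so audio is dropped and the output is discarded.
cvbr_two_pass() {
  flags=$(cvbr_flags "$1" "$2")
  printf 'ffmpeg -y -i input.mp4 -c:v libx264 %s -pass 1 -an -f null /dev/null\n' "$flags"
  printf 'ffmpeg -i input.mp4 -c:v libx264 %s -pass 2 output_vbr.mp4\n' "$flags"
}

cvbr_two_pass 5000 200
```

Swapping 200 for 150 or 110 reproduces the second and third examples.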
Figure 2. The Test clip encoded with 200% constrained VBR.

Figure 2 shows the bitrate profile of the Test file encoded using 200% constrained VBR. The data rate clearly fluctuates between the alternating low-motion talking head sequence and the higher-motion ballet. Though the average bitrate is similar to CBR (5041 kbps compared to 4938 kbps), the maximum bitrate is significantly higher (11137 kbps compared to 6013 kbps). The 150% constrained VBR clip has a similar average (5036 kbps) and a peak bitrate (9090 kbps) that is about 18% lower.

Obviously, from a deliverability perspective, VBR is more challenging, but this only matters with constrained connections close to the streaming bitrate. If you’re delivering 5000 kbps 1080p video to viewers in the US, Europe, and Scandinavia with 50 Mbps and higher connection speeds, you probably won’t experience any issues. If it’s 40 Mbps 8K video to the same regions, 200% constrained VBR starts to feel a bit scary. Of course, if it’s 500 kbps 200% constrained VBR video over a 3G connection, CBR (or 110% constrained VBR) sounds a lot better.

What are the quality implications of all this?

Table 1 shows the scores of the real-world Football clip encoded with CBR and the three constrained VBR settings. The average bitrates are very similar, with significant deltas in peak bitrate. The overall VMAF scores are very close; less than 0.7 VMAF points separate CBR and the highest VBR value.

Encoding Mode   Average Bitrate (kbps)   Peak Bitrate (kbps)   VMAF    Low-frame VMAF
CBR             4938                     6013                  95.17   79.76
200% CVBR       5041                     11137                 95.69   85.39
150% CVBR       5036                     9090                  95.80   84.47
110% CVBR       4944                     6662                  95.57   82.59
Table 1. Quality implications of bitrate control technique.

The big difference is in the low-frame score, the indicator for transient quality issues, where CBR is about 5.6 points lower than 200% constrained VBR. This represents a transient issue that some viewers might notice. Interestingly, there’s only about a one-point difference in low-frame VMAF between 200% and 150% constrained VBR, and another two-point difference between 150% and 110% constrained VBR.

To explore further, I compared the CBR and 200% constrained VBR files in the Moscow State University Video Quality Measurement Tool (Figure 3).

  • The top graph is the VMAF score for both files over the duration of the entire file, with CBR in red, 200% Constrained VBR in green.
  • The bottom graph is a zoom in of the highlighted region in the top graph which roughly is from frame 2100 to 3400. The red stalactite-looking formations are frames where CBR quality is significantly worse than VBR.
Figure 3. Comparing CBR with 200% constrained VBR.

In the figure, you see the Show frame button on the lower right, which I used to examine the frames behind the largest drops. In this clip, which is encoded using fairly conservative encoding parameters, the difference between the CBR and VBR frames was almost unnoticeable, particularly since the most significant deltas were only one or two frames in duration.

With other clips, encoded with a lower bitrate, the transient issues might be more noticeable. It’s the potential for these transient issues that convinced most VOD producers to use VBR rather than CBR, particularly for 1080p video distributed to high-bandwidth viewers.

Interestingly, Apple endorses 200% constrained VBR in its HLS Authoring Specification, which states “1.30. For VOD content, the peak bit rate SHOULD be no more than 200% of the average bit rate.” That said, whether 200% constrained VBR is appropriate for high-frame-rate 8K content, which might require 40 Mbps to achieve acceptable quality, remains to be seen.
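
To check an encode against a peak-to-average guideline like this one, you can express the measured peak as a percentage of the measured average. The helper below is a minimal sketch with a made-up name; the sample figures are the average and peak bitrates reported in Table 1. Note that a measured peak depends on the window the analysis tool averages over, so treat the result as indicative rather than exact.

```shell
# peak_pct AVERAGE_KBPS PEAK_KBPS: print the peak as a percentage of the average.
peak_pct() {
  awk -v avg="$1" -v peak="$2" 'BEGIN { printf "%.0f\n", 100 * peak / avg }'
}

peak_pct 5041 11137   # 200% constrained VBR row of Table 1
peak_pct 4938 6013    # CBR row of Table 1
```

With those inputs the calls print 221 and 122, respectively.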

To summarize so far, CBR wins for cost and deliverability while VBR edges CBR in overall quality. However, the risk of transient quality issues is very real with CBR.

Constant Rate Factor (CRF) Encoding

With CBR and VBR you choose a target bitrate and the encoder adjusts quality to meet that bitrate. The problem with this approach is that if you’re using the same encoding ladder for all of your video clips, you waste a lot of bandwidth with easy-to-encode clips like our Talkingheads clip.

Figure 4 shows the talking head clip encoded at 200% constrained VBR with a 5 Mbps target, same as our football clips. The average and peak bitrates are in line with the football clip above, but the VMAF score is 97.61.

Studies show that viewers can’t perceive quality differences above a VMAF score of 93, which is why I recommend that producers target a VMAF score of 95 for the top-of-the-ladder clip. As you’ll see below, with this clip, you could reduce the bitrate by at least 60% and still hit that 95 target.

Figure 4. The problem with VBR encoding is that it hits the target bitrate, even if the data rate and quality levels are excessive for that clip.

So, again, when encoding with CBR and VBR the encoder adjusts the quality as needed to hit the target bitrate. In contrast, with CRF encoding, a single-pass encoding mode, you choose a quality target and the encoder adjusts the bitrate to achieve that quality level. CRF values range from 0 to 51, with lower numbers delivering higher quality scores. Encoding with CRF and FFmpeg looks like this:

ffmpeg -i input_file -crf 23 output_file

CRF encoding works well for archiving or for producing mezzanine files for upload and transcoding. However, it’s suboptimal from a deliverability perspective because you don’t know the data rate that you’ll produce until you encode the file.

  • With the talking head clip, a CRF value of 22 resulted in a file with an average bitrate of 1878 kbps and a VMAF score of 96.26, shaving more than 60% off the data rate of the VBR encode with no perceivable impact on quality.
  • With the football clip, however, CRF 22 produced an average bitrate of 10650 kbps, which is too high for most 1080p encoding ladders.
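
The savings figure for the talking head clip is easy to verify. The sketch below assumes a 5000 kbps baseline, since the VBR encode's average was described as in line with its 5 Mbps target rather than given exactly; the helper name is illustrative. A negative result means the new encode is larger than the baseline.

```shell
# savings_pct BASELINE_KBPS NEW_KBPS: percent of bitrate saved versus the baseline.
savings_pct() {
  awk -v base="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", 100 * (1 - cur / base) }'
}

savings_pct 5000 1878    # talking head clip at CRF 22
savings_pct 5000 10650   # football clip at CRF 22
```

The first call prints 62.4, matching the "more than 60%" figure, while the second prints -113.0, meaning the football encode is more than double the 5 Mbps target.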

How do you harvest available bandwidth savings while ensuring a reasonable data rate limit? By combining CRF with a data rate cap, or Capped CRF.

Capped CRF 

As the name suggests, with capped CRF, you combine a CRF value with a data rate cap. The relevant portion of the command string would look like this.

-crf 22 -maxrate 5000k -bufsize 10000k 

With the alternating talking head and ballet Test clip, this command string produced the result shown in Figure 5. Again, while the max rate isn’t a flat line, the ballet GOPs are very closely aligned to the 5000 kbps line and the peak bitrate is 6302 kbps. In operation, the encoder used the CRF value to encode the talking head region and applied the cap in the ballet regions.
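
As a sketch of how the capped CRF switches might fit into a complete x264 command, the helper below prints one for a given CRF value and cap. It sets bufsize to twice the cap, mirroring the fragment above; that ratio, the function name, and the file names are assumptions for illustration, and the command is printed rather than run.

```shell
# capped_crf_command CRF CAP_KBPS: print a single-pass capped CRF command for x264.
# bufsize is set to 2x the cap, as in the -maxrate 5000k -bufsize 10000k example.
capped_crf_command() {
  printf 'ffmpeg -i input.mp4 -c:v libx264 -crf %s -maxrate %sk -bufsize %sk output_capped_crf.mp4\n' \
    "$1" "$2" "$(( $2 * 2 ))"
}

capped_crf_command 22 5000
```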

How does this compare to 200% constrained VBR?

Figure 5. Capped CRF and the alternating talking head and ballet clip.

The 200% constrained VBR encode produced a mean VMAF of 97.30 (and a data rate of 5041 kbps). So, the capped CRF encode saved about 30% of the bandwidth and produced a VMAF of 96.55, which would be visually indistinguishable. However, as you see, there is significant bitrate variability, which could hinder deliverability over constrained connections.

In a high motion clip, like the football test clip, there are many regions in the clip where the CRF value produces a data rate higher than the cap. In these regions, the cap controls the bitrate, not the CRF value. In these cases, capped CRF won’t save much bandwidth because there are few regions where the encoder can produce the specified quality without exceeding the cap.

You see this in Table 2, which shows bitrate data and VMAF scores for the Football clip encoded using 200% constrained VBR and Capped CRF (CRF 22/5 Mbps cap). The average bitrate is about the same, though the capped CRF clip has a much lower peak. Average VMAF scores are also very similar.

Encoding Mode   Average Bitrate (kbps)   Peak Bitrate (kbps)   VMAF    Low-frame VMAF
200% CVBR       5041                     11137                 95.69   85.39
Capped CRF      5025                     5993                  95.30   79.83
Table 2. Football clip; 200% constrained VBR vs. Capped CRF.

As with CBR, the major delta is in the low-frame VMAF, the indicator of transient quality issues. Figure 6 shows the comparison Result Plot from VQMT; again, when looking at the frames at the sites of the major stalactites, I saw no observable difference.

However, where CBR only enhances deliverability, capped CRF does this and saves bandwidth on easier-to-encode files. In essence, this makes capped CRF a per-title encoding technology that you can implement with almost all encoding tools, live and VOD, that are based on FFmpeg.

Figure 6. Comparing 200% constrained VBR with capped CRF.

Capped CRF isn’t a slam dunk; you should run your own tests and determine if the transient issues are more evident in your clips than I saw in the football clip. If transient issues are minimal and you are considering capped CRF, you should experiment with different CRF levels (see here).
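
One way to set up that experiment is to generate one capped CRF encode per candidate CRF value and then compare bitrate and VMAF across the outputs. The sketch below only prints the commands; the function name, the file names, and the choice of a bufsize equal to twice the cap are assumptions for illustration.

```shell
# crf_sweep CAP_KBPS CRF...: print one capped CRF command per CRF value.
crf_sweep() {
  cap="$1"; shift
  for crf in "$@"; do
    printf 'ffmpeg -i input.mp4 -c:v libx264 -crf %s -maxrate %sk -bufsize %sk out_crf%s.mp4\n' \
      "$crf" "$cap" "$(( cap * 2 ))" "$crf"
  done
}

crf_sweep 5000 20 22 24
```

Each output file is named after its CRF value, which makes it easier to line the results up in a table like the ones above.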

Again, CRF and capped CRF aren’t available for all encoders and all codecs; so if you’re using a third-party encoder not based upon FFmpeg and not using the x264, x265, libvpx-VP9, or libaom-AV1 codecs, they may not be available.

Table 3 summarizes the strengths and weaknesses of the four encoding methods discussed.

CBR
  • Operation: Adjusts quality to achieve bitrate; same bitrate for the entire file
  • Strengths: Consistent bitrate; single-pass
  • Weaknesses: Overall quality; transient quality
  • Use for: Live; VOD with constrained bandwidth

Constrained VBR
  • Operation: Adjusts quality to achieve bitrate; adjusts bitrate to scene complexity
  • Strengths: Overall quality; transient quality
  • Weaknesses: Bitrate variability; cost (2 or more passes)
  • Use for: Most other VOD

CRF
  • Operation: Adjusts data rate to achieve quality
  • Strengths: Single-pass; delivers set quality level
  • Weaknesses: No bitrate control
  • Use for: Archiving; mezz file creation

Capped CRF
  • Operation: CRF with data rate maximum
  • Strengths: Per-title method; single-pass
  • Weaknesses: Transient quality; bitrate variability
  • Use for: VOD; live

About Jan Ozer

I help companies train new technical hires in streaming media-related positions; I also help companies optimize their codec selections and encoding stacks and evaluate new encoders and codecs. I am a contributing editor to Streaming Media Magazine, writing about codecs and encoding tools. I have written multiple authoritative books on video encoding, including Video Encoding by the Numbers: Eliminate the Guesswork from your Streaming Video (https://amzn.to/3kV6R1j) and Learn to Produce Video with FFmpeg: In Thirty Minutes or Less (https://amzn.to/3ZJih7e). I have multiple courses relating to streaming media production, all available at https://bit.ly/slc_courses. I currently work at www.netint.com as a Senior Director in Marketing.


2 comments

  1. ‘bufsize’ is an important parameter for rate control methods based on a virtual buffer. Rate control maintains a virtual buffer that mimics the input buffer on the decoder’s side, and its purpose is not to violate this virtual buffer (mainly, not to overwhelm the decoder).

    For low-latency applications, it’s worth keeping ‘bufsize’ small (say, 30 ms of payload).
