Streaming 101 defined file-related terms like bandwidth, frame rate, data rate, and resolution, and then explored delivery options like streaming and progressive download. This article describes codecs and streaming architectures, and then common encoding parameters like constant and variable bit rate encoding and settings relating to I-, B-, and P-frames.
Codecs and Streaming Environments
The term codec is an amalgam of compressor/decompressor or coder/decoder, depending upon whom you ask. Either way, codecs do one thing: they compress your media (video and/or audio) to make it smaller when sending it to your viewer, then decode it on the viewer’s computer so the file can play, assuming the required player and decoder are installed on that system.
Video codecs are not standalone technologies; you can only deploy them within streaming environments like Windows Media, Flash, or QuickTime, which provide the required encoding tools and player. In the past, all three environments used different codecs: Windows Media (of course) used the Windows Media Audio and Video codecs, Flash used Sorenson Spark and On2’s VP6, while QuickTime used H.264, MPEG-4, or the older Sorenson Video 3 (SV3) codec. In 2007, however, Adobe added H.264 support to Flash, while Microsoft announced that Silverlight would support H.264 in 2009.
When a codec is added to an environment, as H.264 was to Flash, you’ll have to download a new version of that environment’s player to watch the video, even if another player on your system already supports the codec. For example, if you have a recent version of the QuickTime Player installed on your system, you can play video encoded with the H.264 codec in QuickTime format. However, that QuickTime Player can’t play Flash-formatted video encoded with H.264; for that, you’ll have to download an H.264-compatible version of the Flash Player.
Many Flash producers have avoided replacing the older VP6 codec with H.264 because it takes time for the majority of their target viewers to become able to play H.264 files. Though downloading a Flash update with H.264 playback is relatively painless because the download is so small, many corporate viewers can’t update their players without involving their IT department. As of August 2008, Adobe announced that approximately 82% of all computers had an H.264-compatible player, which seems like sufficient critical mass.
However, before switching over from VP6 to H.264, check out MPEG LA (www.mpegla.com), the licensing authority for H.264. While you probably won’t incur a royalty obligation for using H.264 in the short term, you may after December 31, 2010.
OK, that’s enough about codecs. Now let’s look at some encoding parameters that you’ll almost certainly encounter when producing your streaming files.
Constant vs. Variable Bit Rate Encoding
Constant bit rate (CBR) encoding and variable bit rate (VBR) encoding are two techniques for controlling the bit stream of the compressed video file. Simply stated, CBR encoding produces a file with a constant bit rate throughout. In contrast, VBR encoding varies the bit rate according to the complexity of the video, while achieving the same average data rate as CBR.
Figure 1. Constant and Variable bit rate encoding.
This is shown in Figure 1, which illustrates a file with low-motion, easy-to-compress scenes and high-motion, hard-to-compress scenes. Both techniques achieve the same average data rate over the duration of the file, but the red CBR line stays constant throughout, while the black VBR line varies with the amount of motion in the scene.
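To make the averaging concrete, here’s a toy Python calculation with invented numbers (illustrative only, not taken from the figure): over a clip that’s half easy and half hard to compress, CBR and VBR spend the same total bits but distribute them differently.

easy_seconds, hard_seconds = 30, 30

# CBR holds 500 kbps throughout; VBR tracks scene complexity instead.
cbr = [500] * (easy_seconds + hard_seconds)         # kbps, second by second
vbr = [300] * easy_seconds + [700] * hard_seconds   # kbps, second by second

# Both allocations average out to the same 500 kbps data rate.
assert sum(cbr) / len(cbr) == sum(vbr) / len(vbr) == 500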
In general, VBR should produce a higher-quality file than CBR because it allocates the file’s data rate where it’s needed to maximize quality. The downside is stream variability, since the per-second bit rate can vary significantly from section to section.
When is this a problem? Well, if you were producing video for a tight, relatively low-bit-rate connection like a cell phone’s, this variability could interrupt playback. Also, when distributing via a streaming server, which meters out the video to remote viewers as needed, the consistent data rate of a CBR-encoded file is easier to administer. For these reasons, the generally accepted best practice is to use CBR when producing for streaming delivery, and VBR when producing for progressive download.
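If you work with a command-line encoder rather than the GUI tools discussed in this series, the same choice looks roughly like this. A minimal sketch, assuming ffmpeg with the libx264 encoder (not a tool named in this article) and placeholder filenames; a real project would also need audio settings.

import os
import subprocess

SOURCE = "source.mov"  # hypothetical input file

# CBR-style encode: pin the rate near 500 kbps by capping the min/max
# rate and giving the rate controller a small buffer; suits streaming servers.
subprocess.run([
    "ffmpeg", "-i", SOURCE, "-c:v", "libx264",
    "-b:v", "500k", "-minrate", "500k", "-maxrate", "500k",
    "-bufsize", "1000k",
    "cbr_out.mp4",
], check=True)

# VBR encode: two passes targeting the same 500 kbps *average*, letting
# the per-second rate swing with scene complexity; suits progressive download.
subprocess.run([
    "ffmpeg", "-y", "-i", SOURCE, "-c:v", "libx264",
    "-b:v", "500k", "-pass", "1", "-an", "-f", "null", os.devnull,
], check=True)
subprocess.run([
    "ffmpeg", "-i", SOURCE, "-c:v", "libx264",
    "-b:v", "500k", "-pass", "2",
    "vbr_out.mp4",
], check=True)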
I, B and P Frames
All codecs use multiple frame types during encoding. Some, like VP6, use two types: I-frames (also called keyframes) and P-frames. Others, like H.264 and VC-1, use three types: I-, B-, and P-frames.
Figure 2: Most streaming codecs use three frame types, I, B and P-frames.
Figure 2 shows all three frame types in a group of pictures (GOP), a sequence of frames that starts with a keyframe and includes all frames up to the next keyframe. Briefly, an I-frame is entirely self-contained and is compressed solely with intra-frame encoding techniques, much like the JPEG compression used for still images on the web and in many digital cameras.
P- and B-frames are “delta” frames that “refer” to other frames for as much of their content as possible. Imagine a talking-head video. A P-frame looks back to a previous I- or P-frame for regions that haven’t changed and encodes only what has changed between the frames. In a talking-head scenario, very little changes, so the P-frame tells the player, in effect, “just copy the back wall and the subject’s body from that I-frame, and then use these new pixels around the head and mouth.”
This is why talking-head videos compress so efficiently; there’s so much inter-frame redundancy that the delta frames contain very little new information. In a fast-paced soccer game, delta frames contain much more original content, which makes compressing down to the target data rate much tougher.
Back to our frame types. By definition, a P-frame looks backward to a previous P- or I-frame for redundancies, while a B-frame can look both backward and forward to previous or subsequent P- or I-frames. This doubles the chance that the B-frame will find redundancies, making it the most efficient frame in the GOP.
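You can inspect this frame mix in any encoded file. Here’s a small sketch assuming the ffprobe utility (which ships with ffmpeg) is installed; “clip.mp4” is a placeholder filename.

import subprocess
from collections import Counter

def frame_type_counts(path):
    # Ask ffprobe for the picture type (I, P or B) of every video frame.
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "frame=pict_type", "-of", "csv=p=0", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return Counter(line.strip() for line in out.splitlines() if line.strip())

print(frame_type_counts("clip.mp4"))
# For a talking-head clip you’d expect delta frames to vastly outnumber
# keyframes, e.g. Counter({'B': 1800, 'P': 900, 'I': 10})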
How do you use these frames to your advantage? With I-frames, recognize that they are the largest frames, since they carry the most data, which makes them the least efficient from a compression standpoint. You want as few I-frames as possible while still meeting your quality and playback goals.
For example, all playback must start on an I-frame. To keep the video responsive when viewers drag the slider to a specific location, or otherwise jump around the file, I recommend adding a keyframe every five to ten seconds or so.
I-frames also improve quality when placed at a scene change, because all subsequent delta frames get a high-quality frame to refer to. So you really want an I-frame at every scene change, which you typically enable by checking a checkbox that inserts I-frames at scene changes, or one that enables “natural” keyframes. The settings shown in Figure 3, from the Mac encoding tool Telestream Episode Pro, accomplish both goals: “forced” keyframes every 300 frames for responsiveness, and “natural” keyframes at scene changes for quality.
Figure 3. For optimal keyframe placement, you want “natural” keyframes at scene changes and “forced” keyframes every ten seconds or so.
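If you’re working with a command-line encoder instead, those two goals map to a pair of settings. A rough sketch with ffmpeg/libx264 (an assumption; the article’s screenshots show Episode Pro), again with placeholder filenames:

import subprocess

subprocess.run([
    "ffmpeg", "-i", "source.mov", "-c:v", "libx264",
    "-g", "300",            # "forced" keyframe at least every 300 frames
    "-sc_threshold", "40",  # default scene-cut sensitivity: "natural" keyframes at cuts
    "keyframes_out.mp4",
], check=True)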
Now let’s turn our attention to B-frames. As mentioned, the main benefit of B-frames is that they’re very efficient from a compression perspective, so they help improve compressed quality. However, files with B-frames are harder to decode, because the player has to buffer all referenced frames in memory while playing back the file and keep them in their proper order.
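A toy illustration of that buffering and reordering, using a made-up GOP rather than real encoder output:

# Display order, as the viewer sees the frames:
display_order = ["I1", "B2", "B3", "P4", "B5", "B6", "P7"]

# B2 and B3 reference both I1 (behind them) and P4 (ahead of them), so
# the encoder transmits P4 before them; decode order therefore differs
# from display order:
decode_order = ["I1", "P4", "B2", "B3", "P7", "B5", "B6"]

# The player must hold I1 and P4 in memory to reconstruct B2 and B3,
# then put everything back in display order before showing it. That
# buffering and reordering is the extra work B-frames impose.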
Because of this decoding burden, you shouldn’t use B-frames when producing video for an iPod or similar device, which is relatively low-powered compared to a computer. Typically, this shouldn’t be an issue: if you choose a template for an iPod, the encoding tool will gray out all B-frame options.
Otherwise, when producing for general computer playback, always use B-frames when available. Most encoding tools let you choose the maximum number of B-frames to insert sequentially between I- and P-frames (or between two P-frames), and the recommended practice is to use two or three B-frames in sequence. Figure 4 shows the B-frame control in Episode Pro set to 3.
Figure 4. Setting the number of sequential B-frames in Episode Pro.
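The command-line counterpart of that control, sketched with ffmpeg/libx264 (an assumption; the figure shows Episode Pro):

import subprocess

subprocess.run([
    "ffmpeg", "-i", "source.mov", "-c:v", "libx264",
    "-bf", "3",  # allow up to three consecutive B-frames
    "bframes_out.mp4",
], check=True)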
What about P-frame-related options? You typically never see these, because P-frames are the default: if a frame isn’t an I- or B-frame, it’s a P-frame.
That’s it for Streaming 102. In Streaming 103, we’ll discuss how to choose encoding parameters like resolution and data rate for your videos.