Describes the most common mistakes made when producing podcasts for iTunes.

Mistakes to Avoid When Producing Podcasts for iTunes


In the fall of 2008, I gave a presentation on producing H.264 video at StreamingMedia West in San Jose. During my preparation, I noticed that while H.264 is the hot topic in periodicals and forums, it’s still not widely used for streaming, with Windows Media and VP6-based Flash still predominating. In one market segment, though, H.264 was near pervasive, so I decided to spend some time learning how H.264 was used – and misused – there.

Not surprisingly, the market was podcasts distributed via iTunes. I say not surprisingly because iTunes is all about iPod/iPhone devices, which play only H.264 and MPEG-4 video. It’s like finding Sox fans in Boston, Bulldog fans in Athens or Yankee fans in New Yawk. You get the point.

 

Having found the Mecca of H.264 usage, I decided to download 50 podcasts, try to load them on my iPod Nano, and see what happened. Interestingly, six refused to load at all, and three had what I’ll call “compromised” displays. After analyzing all the podcasts in Inlet HD’s most excellent streaming media analysis tool, Semaphore, I noticed that many others used sub-optimal encoding parameters. While producing podcasts is probably a tiny part of what we all do for a living, it’s still a useful skill, so I thought I would detail these findings for you.

First, some background. MPEG-4 is the overarching standard that includes two video codecs: the MPEG-4 codec itself, and a more advanced video codec, H.264, also known as AVC. When used in an MPEG-4 “wrapper,” H.264 files typically have an mp4 or m4v extension, the former being the official designation and the latter being the extension Apple created for its devices. You can also “wrap” H.264 video in a QuickTime file with an mov extension, or encode it for Flash with an flv or f4v extension. Soon, you’ll be able to encode H.264 to Windows Media, presumably with a wmv extension.
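
Since the extension alone doesn’t tell you what’s inside a wrapper, it can help to probe files directly. Here’s a minimal Python sketch that shells out to ffprobe, part of the FFmpeg toolkit (assuming it’s installed; the filename is a placeholder), to report a file’s container and video codec:

```python
# Minimal sketch: report a file's container and video codec via ffprobe.
# Assumes FFmpeg/ffprobe is installed; "episode.m4v" is a placeholder name.
import json
import subprocess

def probe(path):
    """Return container and first-video-stream details as parsed JSON."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "format=format_name:stream=codec_name,profile,width,height",
        "-of", "json",
        path,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out)

info = probe("episode.m4v")
print(info["format"]["format_name"])      # e.g., "mov,mp4,m4a,3gp,3g2,mj2"
print(info["streams"][0]["codec_name"])   # e.g., "h264"
```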

 

H.264 has multiple “profiles” that specify levels of playback compatibility. For example, the Baseline profile is typically for devices like iPods or cell phones that have limited playback horsepower. Accordingly, the Baseline profile doesn’t use many of H.264’s more advanced encoding techniques, which can produce higher quality but also create a stream that’s too hard for these devices to decode. Then there are the Main and High profiles, typically for computer-based playback, which produce a tighter, higher-quality stream that’s harder to decode.
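
To make the distinction concrete, here’s a hedged sketch that drives an encode from Python using FFmpeg’s libx264 encoder (an assumption on my part; your tool may differ, and the filenames are placeholders), producing a device-safe Baseline file and a desktop-oriented High-profile file from the same source:

```python
# Sketch: encode one source twice, once per profile.
# Assumes FFmpeg with libx264 is installed; filenames are placeholders.
import subprocess

def encode(src, dst, profile):
    """Encode src to H.264 at the given profile."""
    subprocess.run([
        "ffmpeg", "-y", "-i", src,
        "-c:v", "libx264",
        "-profile:v", profile,    # "baseline" rules out B-frames and CABAC
        "-b:v", "1000k",
        "-c:a", "aac", "-b:a", "128k",
        dst,
    ], check=True)

encode("master.mov", "ipod.m4v", "baseline")   # easy to decode on devices
encode("master.mov", "desktop.mp4", "high")    # tighter stream, harder to decode
```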

 

Obviously, when producing for devices rather than general-purpose playback, job number one is to use the appropriate profile. Interestingly, of the six videos that wouldn’t play on my iPod Nano, five used the Main profile, which is verboten. The sixth used the Sorenson Video 3 codec, of all things, which also won’t play.

So, when producing podcasts, always use the Baseline profile of the H.264 codec. Before encoding, however, go to Apple.com and print the video playback specs for the latest iPod, and make sure that you’re within the resolution and data rate requirements. Unfortunately, this is more complicated than it sounds because the initial video iPod could play H.264 Baseline-profile video only at up to 320×240 resolution, while current iPods and iPhones can play Baseline H.264 video at up to 640×480 resolution.

You can see this in the iPod preset shown in Figure 1, which is from the Adobe Media Encoder CS4. Note that if you choose the Apple iPod Video Small preset, you’ll encode at 320×240, and, of course, that the preset uses the Baseline profile.

Figure 1. An iPod preset from the Adobe Media Encoder CS4.
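
If you’d rather automate that sanity check than eyeball every file, something like the rough sketch below works. The profile and resolution limits follow the discussion above; the data rate ceilings are my assumptions based on Apple’s published specs of the era, so verify them against Apple.com before relying on the code:

```python
# Rough sketch: flag obvious iPod-compatibility problems before publishing.
# Resolution limits follow the discussion above; the bit rate ceilings are
# assumed values -- confirm against Apple's current playback specs.
def ipod_compatible(profile, width, height, kbps, first_gen_video_ipod=False):
    if profile.lower() != "baseline":
        return False                      # Main/High won't load on iPods
    if first_gen_video_ipod:
        return width <= 320 and height <= 240 and kbps <= 768
    return width <= 640 and height <= 480 and kbps <= 1500

print(ipod_compatible("baseline", 640, 480, 1120))        # True
print(ipod_compatible("main", 640, 480, 1120))            # False: wrong profile
print(ipod_compatible("baseline", 640, 480, 1120, True))  # False: exceeds 320x240
```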

Your next decision is target resolution. In the sample of 44 videos that loaded on my iPod Nano, 25 went with 320×240, which is obviously the safe route, while the other 19 (and five of the six that failed to play) went 640×360 or larger.

Why go larger than 320×240 when the screen resolution of most iPods is 320×240? First, many iPods have composite output ports that let you play the video on a TV set or other analog device. Though display on the device itself is limited to 320×240, 640×480 video will look better than 320×240 when displayed on a TV set. More importantly, the iPhone and iPod Touch have 480×320 screens, and six of the 19 producers using greater than 320×240 resolution produced at 16:9, which looks better on the iPhone/iPod Touch than 4:3 video.

Which leads me to the three podcasts with “compromised” displays. Briefly, if you display your 16:9 video on a 4:3 iPod, the device displays the “center cut,” much like a 4:3 television does with 16:9 broadcasts. That is, it displays the middle section of the video and cuts off the right and left edges rather than showing the entire frame with letterboxes on the top and bottom.
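
The arithmetic behind the center cut is easy to verify. Here’s a small sketch that computes the visible window, and the pixels lost per side, when a 16:9 frame is center-cut to 4:3:

```python
# Sketch: how much of a widescreen frame survives a 4:3 center cut.
def center_cut_4x3(width, height):
    """Return the visible width and the pixels cropped from each side
    when a frame is center-cut to 4:3 (the full height is kept)."""
    visible_width = height * 4 // 3
    cropped_per_side = (width - visible_width) // 2
    return visible_width, cropped_per_side

visible, crop = center_cut_4x3(640, 360)
print(f"visible area: {visible}x360, lost per side: {crop} pixels")
# visible area: 480x360, lost per side: 80 pixels
```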

 

Several producers of 16:9 video – including Photoshop User TV – included screencam videos with content on the edges that wasn’t visible when viewed on a 4:3 display. So, while the announcer was saying “click this menu item,” the menu item was off-screen on 4:3 displays. Your viewers can change this center cut option to letterbox the video, but they won’t know to do so unless you tell them how and where to change that preference.

Figure 2. The outer edges of this 16:9 video won’t show up when played on a 4:3 iPod using the default “center cut” video configuration.

Interestingly, when I examined footage converted from 16:9 broadcast to 16:9 podcasts, like Oprah, it was clear that the cameraperson was framing for 4:3 display, so even during wide shots, the main subjects were within the 4:3 center-cut window. If producing a 16:9 podcast, you should do the same.

Figure 3. Because the camera operator shot with “center cut” display in mind, this video looks good on all iPods.

The other mistakes were more technical, like exceeding the recommended data rate and using too-frequent key frames, which can degrade quality and add a pulsing effect to your video. In this regard, note that the iPod preset in Apple Compressor uses a data rate of 1.12 Mbps for 640×480 video and inserts key frames every 150 to 300 frames, depending upon content, or one every five to ten seconds.
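
If your encoding tool expresses the key frame interval in frames rather than seconds, the conversion is simple. The sketch below works it out for 30 fps footage and passes the result to FFmpeg’s GOP-size flag (again assuming FFmpeg and libx264; the filenames are placeholders):

```python
# Sketch: convert a 5-10 second key frame target into frame counts,
# then encode with FFmpeg, roughly matching the Compressor preset's numbers.
# Assumes FFmpeg with libx264 is installed; filenames are placeholders.
import subprocess

FPS = 30
min_gop, max_gop = 5 * FPS, 10 * FPS      # 150 and 300 frames
print(min_gop, max_gop)                   # 150 300

subprocess.run([
    "ffmpeg", "-y", "-i", "master.mov",
    "-c:v", "libx264", "-profile:v", "baseline",
    "-b:v", "1120k",            # roughly the 1.12 Mbps Compressor uses
    "-g", str(max_gop),         # maximum GOP length; x264 may still insert
                                # a key frame earlier at scene changes
    "podcast.m4v",
], check=True)
```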

