Strong market forces are pushing for a standards-based resolution to the high-profile battle of proprietary streaming technologies pitting Apple’s QuickTime, Microsoft Windows Media Technologies, and RealNetworks’ RealMedia against one another. Broadcasters are seeking a unified standard that will let them use a single delivery method for both traditional programming and Internet offerings. Many publishers of electronic content see users of PDAs, cell phones, Internet appliances, and set-top boxes as a vast new audience, but multiple device-specific technologies create an overwhelming development and support burden. Reaching such a diverse community will require a single content-delivery mechanism that can easily adapt to work with an array of devices and scale to suit the bandwidth of a variety of mediums. That’s the promise of the Moving Picture Experts Group’s recently ratified MPEG-4 standard.
Until recently, the fanfare surrounding Microsoft Windows Media Technologies, QuickTime, and RealMedia made MPEG-4, the successor to the ubiquitous MPEG-1 and MPEG-2 video standards, easy to forget. (MPEG-3, originally intended for HDTV, was abandoned when MPEG-2 proved capable of handling HDTV on its own, eliminating the need for a separate standard.) But MPEG-4’s wallflower status ended in December 2000, courtesy of a 27-company alliance forged by Apple, Cisco, Philips, and Sun. The group’s initial goal was to create a specification for streaming audio and video over IP networks, an effort that brought MPEG-4, with its powerful, object-driven approach, to the fore.
As streaming media moves outside the Internet Protocol sandbox into wireless, broadcast, satellite, and cable arenas, the need for an inter-medium negotiation protocol becomes paramount. The requirements lie far beyond the scope of current proprietary streaming technologies; fortunately, those needs are precisely what MPEG-4 was designed to address. Unfortunately, inferior quality and unrealistic licensing requirements may create significant barriers to the technology’s acceptance.
The MPEG-4 Solution
The best way to appreciate the MPEG-4 vision is to understand the problem it’s designed to solve. Imagine you’re the head of programming for a major television news network. You reach your audience primarily via traditional, high-bandwidth means such as broadcast towers, cable, and satellite, so the quality of delivery is excellent. You also provide a feed to your Web site for real-time streaming, but delivery quality over the Internet ranges from acceptable to dismal, depending upon the recipient’s connection speed.
Over a 28.8-Kbps connection the video is awful, and the audio is merely passable, as you’d expect. But even the stock ticker, which seems to be nothing but text (and looks great scrolling across the bottom of a TV screen), turns to visual mush, as do sports scores and captions. The problem occurs because the various separate sounds, pictures, and text elements are merged into a single information feed before they’re sent, so a low-bandwidth connection degrades the whole.
There’s also no way to promote e-commerce or just plain stickiness, because you can’t turn individual elements into hyperlinks. For example, you’d love to link the stock ticker to a library of company data or let the viewer jump from sports scores to more detailed articles, but because there are virtually no individual elements, you can’t. (Closed captioning is a separate element, but no standard has been set for streaming such content.) And you certainly can’t reach wireless devices, Internet appliances, set-top boxes, Dick Tracy watches, or other systems with unique requirements unless you create a separate stream to feed each. Even then, owners of such gadgets must go to separate URLs, and you’re saddled with more to create, manage, and maintain.
If you could snap your fingers and make a solution appear, you’d want the ability to create one standard stream that could intelligently provide the best quality experience to the complete spectrum of consumers. That, in a nutshell, is MPEG-4’s key design goal.
Codec Versus Architecture
To understand the potential importance of MPEG-4, you first need to understand the distinction between a codec and an architecture. The term codec, a contraction of compression and decompression, refers to any algorithm used to compress and decompress digital media streams for more efficient storage and transmission. (Another kind of codec, a coder/decoder, converts analog signals to digital and vice versa, but that’s not what we mean here.) An architecture is a much larger concept. It identifies supported media formats and defines stream synchronization. It also delineates the structure of compressed files, the rules for stream transmission, and the communication protocols between the server and remote user.
Like the MPEG-4 architecture, QuickTime, RealMedia, and Windows Media Technologies support multiple codecs, but these proprietary architectures focus primarily on media delivered over IP networks and to computers. MPEG-4 supports a much broader range of transport mediums and target devices. Moreover, because an international standards body with representation from all relevant industries created the architecture, MPEG-4 avoids the politically unpalatable “my way or the highway” aspect of a proprietary solution.
You can’t see or hear an architecture, though; that’s where codecs come in. Unfortunately, in terms of quality, the MPEG-4 video codec is, at best, a distant third behind those of RealNetworks and Microsoft, and a promising new QuickTime codec from On2 threatens to push MPEG-4’s offering to fourth place. So although MPEG-4 has an undeniably bright future as an architecture, the MPEG-4 codec’s future is cloudier. What this adds up to in the long run is anyone’s guess.
Object Orientation
In an MPEG-4 stream, all elements, such as the audio and video assets, stock tickers, titles, and closed-caption feeds mentioned earlier, are kept as discrete objects. In addition, components of the video stream can be maintained separately.
For example, a TV station will generally shoot the video of a weathercaster in front of a green screen, then electronically replace the screen with a bitmapped weather map, animated arrows, and other objects. An MPEG-4 stream could maintain all of these items separately. Similarly, during business and sports reports the respective anchors and the scrolling text results could be maintained as separate objects.
This object orientation provides two key benefits. First, each element can be interactive, allowing the viewer to pause or fast-forward stock results, for example, then hyperlink to another page, bring the background video to full screen, turn off the anchor video, and interact in an almost limitless number of other ways.
Second, streams can be customized for different connections and devices. A high-powered computer connecting via DSL might receive all elements of the content, but a person checking the weather on a wireless PDA might get just the background weather map with animations and a low-bit-rate audio track. A cell phone customer checking stocks might receive the stock ticker text track after automatic conversion to synthetic speech by an MPEG-4 component. This contrasts dramatically with the limited scalability offered by today’s streaming technologies, which simply drop frames or move to a lower-quality audio codec for low-bandwidth connections.
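To make the idea concrete, here is a minimal sketch in Python of how a server might decide which objects to send to a particular client. The stream names, device labels, and bit-rate thresholds are invented for illustration and are not part of the MPEG-4 specification.

```python
# Hypothetical sketch: stream names, device labels, and thresholds are
# illustrative only, not part of the MPEG-4 specification.
STREAMS = {
    "anchor_video": {"kind": "video", "min_kbps": 300},
    "weather_map":  {"kind": "video", "min_kbps": 48},
    "audio_track":  {"kind": "audio", "min_kbps": 16},
    "stock_ticker": {"kind": "text",  "min_kbps": 1},
}

def select_streams(device, bandwidth_kbps):
    """Return the names of the objects a given client would receive."""
    chosen = []
    for name, info in STREAMS.items():
        if info["min_kbps"] > bandwidth_kbps:
            continue  # connection too slow for this object
        if device == "cell_phone" and info["kind"] != "text":
            continue  # phone client keeps only the ticker text (converted to speech)
        chosen.append(name)
    return chosen

print(select_streams("desktop_dsl", 1500))  # every object
print(select_streams("pda_wireless", 64))   # weather map, audio, and ticker
print(select_streams("cell_phone", 14))     # ticker text only
```

A real MPEG-4 terminal negotiates this through the architecture’s delivery layer rather than ad hoc server logic, but the net effect is the same: one authored stream, pared down appropriately for each device.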
Using MPEG-4, broadcasters can cut creation and archiving costs by creating one stream that’s intelligently parsed for all target devices and transport mediums, rather than a custom stream for each target. MPEG-4, an international, multiplatform standard, also helps ensure the availability of playback drivers across a greater range of users, avoiding much of the latency or downright unavailability that results from proprietary approaches.
Under the Hood
Because no object-oriented MPEG standard preceded MPEG-4, the committee had to define lots of under-the-hood technology. BIFS (Binary Format for Scenes) is the language for describing how different objects fit into the scene. BIFS borrows heavily from VRML (Virtual Reality Modeling Language) but is a binary rather than a text language and therefore more compact. In addition, BIFS allows playback while a scene is streaming, a necessity for Internet broadcasts. VRML playback begins only after downloading completes.
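As a rough illustration of what a BIFS scene describes, the sketch below lays out a scene tree using ordinary Python data structures. The node names and fields are invented stand-ins; actual BIFS nodes follow the standard’s own VRML-derived node set and are binary-encoded rather than textual.

```python
# Invented, human-readable stand-in for a BIFS scene tree; real BIFS is a
# compact binary encoding of VRML-style nodes, not Python dictionaries.
scene = {
    "type": "Group",
    "children": [
        {"type": "Background",  "source_stream": "weather_map_video"},
        {"type": "VideoObject", "source_stream": "anchor_video", "position": (0.2, 0.1)},
        {"type": "TextObject",  "source_stream": "stock_ticker",
         "on_click": "open_url('company_data_page')"},  # per-object interactivity
    ],
}
```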
With MPEG-4, each object is encapsulated in its own stream (or in several). The individual streams are called elementary streams. Each contains information about the required decoder and also includes quality-of-service details such as minimum bit rate. Another elementary stream stores BIFS information with time stamping to ensure synchronization.
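A simplified sketch of the bookkeeping each elementary stream carries might look like the following; the field names are approximations chosen for readability, not the standard’s actual descriptor syntax.

```python
# Approximate field names for illustration; not the standard's descriptor syntax.
from dataclasses import dataclass

@dataclass
class ElementaryStreamInfo:
    stream_id: int
    decoder: str            # which decoder the receiver must have
    min_bitrate_kbps: int   # quality-of-service hint for the transport
    timestamped: bool = True

streams = [
    ElementaryStreamInfo(1, "bifs-scene", 2),     # scene description, time-stamped
    ElementaryStreamInfo(2, "mpeg4-video", 300),  # anchor video object
    ElementaryStreamInfo(3, "aac-audio", 16),     # audio track
    ElementaryStreamInfo(4, "text", 1),           # stock ticker
]
```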
MPEG-4 includes two multiplex layers enabling transmission over existing transport mechanisms such as RTP (Real-time Transport Protocol) used on the Internet, or ATM and MPEG-2 used for cable and broadcast distribution. First is the FlexMux, defined in the Delivery Multimedia Integration Framework (DMIF). At a high level, the FlexMux defines communication between the MPEG-4 server and player, ascertaining quality-of-service levels and grouping the elementary streams to streamline delivery.
By contrast, the TransMux channel interfaces with the transport stream to manage the actual data transfer. This allows MPEG-4 to travel over broadcast, wireless, and Internet channels, thus serving an exceptionally broad market.
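A conceptual sketch of that division of labor appears below, with invented class names; the real DMIF and TransMux interfaces are considerably more involved.

```python
# Invented class names for illustration; the real DMIF/TransMux interfaces differ.
class FlexMux:
    """Groups elementary-stream packets so related objects travel together."""
    def __init__(self, transmux):
        self.transmux = transmux

    def send(self, packets):
        multiplexed = b"".join(packets)  # interleave scene, audio, and video data
        self.transmux.deliver(multiplexed)

class RtpTransMux:
    """Delivery over IP networks via RTP."""
    def deliver(self, payload):
        print(f"RTP: sending {len(payload)} bytes")

class Mpeg2TsTransMux:
    """Delivery over an MPEG-2 transport stream for cable or broadcast."""
    def deliver(self, payload):
        print(f"MPEG-2 TS: sending {len(payload)} bytes")

# The same multiplexed content rides different channels; only the TransMux changes.
FlexMux(RtpTransMux()).send([b"bifs", b"audio", b"video"])
FlexMux(Mpeg2TsTransMux()).send([b"bifs", b"audio", b"video"])
```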
Note that these multiplex layers provide MPEG-4 with a significant competitive advantage over QuickTime (the starting point for the MPEG-4 file format) and Synchronized Multimedia Integration Language (SMIL) from the World Wide Web Consortium (W3C). Neither of the older formats offers these negotiation layers, so neither can transfer information over mediums other than the Internet. Given the divergent interests of the broadcast, wireless, and Internet markets, the possibility of either Real or Apple forging an inter-medium standard outside of a formal standards body seems remote.
The Good and the Bad
Beyond these basics, MPEG-4 offers other exotic goodies such as facial animations, which, when combined with the text-to-speech interface, allow synthetic talking-head broadcasts over extremely low-bit-rate connections. And early tests of an MPEG-4 still-image codec from E-Vue exhibited extremely impressive results, with JPEG-equivalent quality at roughly half the file size for quarter- and full-screen images. With larger images, MPEG-4 often produced results comparable with those from JPEG files that were three to four times as big.
Unfortunately, this strong performance does not extend to MPEG-4’s video codec, an Achilles’ heel that will certainly delay or perhaps even derail MPEG-4’s acceptance. Designed back in 1995, MPEG-4 video is largely based on H.263, which traces its technological roots to the aging MPEG-1 standard, hardly a recipe for world-class compression.
In contrast, Microsoft claims its own non-MPEG-4-compliant versions of the video codec surmount several limitations in the standard, boosting quality by over 30 percent. Moreover, last July, Real filed a challenge with the MPEG-4 committee, presenting evidence that RealVideo 8, the company’s latest codec, is clearly superior to MPEG-4 and requesting that the Real product be wholly included in the MPEG-4 video standard and in future extensions. (The committee’s response was unknown at press time.)
Our lab-based quality comparisons confirm the superiority of the Real and Microsoft technologies over MPEG-4. This would make MPEG-4 the first standards-based codec that actually delivers poorer quality than competing products, which creates a serious problem for potential publishers, especially those considering a switch from Real or Microsoft products.
Other Rough Edges
Less-than-stellar video quality is not MPEG-4’s only shortcoming; its intellectual property protection is also incomplete. Though MPEG-4 contains an optional Intellectual Property Management and Protection (IPMP) layer with hooks to digital rights management (DRM), there is no integrated system, such as that offered by Microsoft Windows Media Technologies.
Committee members explain that the diverse interests of the three global industries served by the standard (broadcast, wireless, and the Internet) precluded a hard-wired approach. Although that reasoning is undoubtedly correct, it still forces publishers to find a DRM package to protect their assets, another barrier to the fast acceptance of MPEG-4.
Finally, MPEG-4 comes with one feature that no one really wants: the potential obligation to pay royalties to the MPEG-4 patent pool. This could have a dramatic impact on MPEG-4 distribution. Because developers incur no royalties with MPEG-1, Microsoft and Apple include decoders with their operating systems. This has made MPEG-1 playback capability nearly universal and greatly expanded the use of the standard by Web and CD-ROM publishers.
By contrast, both companies eschewed MPEG-2 because of the reported $6-per-copy license. As a result, MPEG-2 video is much more difficult to distribute, because publishers can’t count on their audiences having the necessary playback capabilities.
Apple, Microsoft, and Real have well-established pricing policies for their technologies, with free decoders for all. Until MPEG-4 royalties and rates are similarly established, publishers evaluating MPEG-4 as an architecture or codec must proceed with caution.
It’s a Wrap
So here’s the score. MPEG-4’s object orientation enables a very efficient production paradigm and high-quality interactive content distribution to a diverse range of users. Publishers, however, must swallow suboptimal video quality or increase quality by boosting video bandwidth, thus incurring additional costs. Potential royalty obligations and the need to bolt on a third-party DRM solution go on the debit side of the ledger, too.
Obviously, MPEG-4 will appeal most to broadcasters consolidating multiple asset types and distributing to a diverse user base. The standard seems less than optimal for publishers simply repurposing existing video assets for applications such as video-on-demand.
All this leaves MPEG-4 with some serious perception problems. Regarding poor video quality, committee members informed us that broadcasters can use other codecs such as RealVideo or Windows Media Video in an MPEG-4 stream as long as that stream doesn’t break any MPEG-4 decoder in the chain. How this would work from a practical standpoint is unclear, and in any case, the message has not been effectively communicated to potential users.
Ideally, MPEG-4 would work in the same manner as QuickTime and Microsoft Windows Media Technologies, which handle any compliant codec transparently, automatically sending the required decoders to the receiver. If MPEG-4 will operate in a similar manner, potential users need to know, because today, MPEG-4 translates to inferior video quality in the minds of many. Similarly, the MPEG-4 committee needs to formalize relationships with DRM providers and make these solutions readily and publicly available.
Most of all, the MPEG-4 patent holders need to create and publicize a royalty policy, understanding that formats such as Real, Windows Media Technologies, QuickTime, and even MP3 thrived in large part because playback was free. Makers of dedicated set-top boxes and other hardware players could bear a small royalty, but attempting to charge for general Internet playback would likely be untenable, especially given the quality deficit.