A woman walks into a bar and says, “I’ll have two video transcoders, please.” The bartender says, “We don’t carry transcoders; we have VPUs. Will they do?” The woman scratches her head and says, “Hmm, I don’t know.”
Do you? If not, you’re in the right place. This short article will briefly define transcoder, VPU, and VCU, and get you in and out in under 4 minutes. And hey, it will definitely help the next time you try to order encoding gear in a bar.
In the Beginning, There Were Transcoders
Simply stated, a transcoder is any technology, whether software or hardware, that can take in a compressed stream (decode) and output a compressed stream (encode). FFmpeg is a transcoder, and it works fine for most low-volume video-on-demand applications. For live applications, particularly high-volume live interactive applications, you’ll probably need a hardware transcoder to achieve the necessary cost per stream (CAPEX), operating cost per stream (OPEX), and density.
If Webster’s defined transcoder (it doesn’t), it might have a picture of the NETINT T408 as the perfect transcoder specimen. Based on a custom transcoding ASIC, the T408 is inexpensive ($400), capable (one 4K60 stream or four 1080p60 streams), flexible (H.264 and HEVC), and exceptionally power efficient (only 7 watts).
The T408 Video Transcoder decodes and encodes H.264 and HEVC video. What doesn’t the T408 do? Well, that leads us to the difference between a transcoder and a VPU.
First, the T408 doesn’t scale the video. If you’re building a full encoding ladder from a high-resolution source, the host CPU performs all the scaling for the lower rungs. In addition, the T408 doesn’t perform overlay in hardware. So, if you insert a logo or other bug over your videos, again, the CPU does the heavy lifting.
Finally, the T408 was launched in 2019, the first ASIC-based transcoder to ship in quite a long time. So, it’s not surprising that it doesn’t incorporate any artificial intelligence processing capabilities.
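To make the CPU-side scaling described above concrete, here is a minimal sketch that builds a single FFmpeg command for a three-rung ladder. The rung sizes, bitrates, and output file names are hypothetical, and the software encoder libx264 stands in for whatever encoder your build uses; this is not NETINT's integration. The point is only that each lower rung needs its own scale filter, and without a VPU that filter runs on the host CPU.

```python
from typing import List, Tuple

# Hypothetical ladder rungs: (width, height, video bitrate). Illustrative values only.
LADDER: List[Tuple[int, int, str]] = [
    (1920, 1080, "5000k"),
    (1280, 720, "3000k"),
    (640, 360, "800k"),
]


def build_ladder_command(source: str, rungs: List[Tuple[int, int, str]] = LADDER) -> List[str]:
    """Build one FFmpeg invocation that scales and encodes every rung."""
    cmd = ["ffmpeg", "-i", source]
    for width, height, bitrate in rungs:
        cmd += [
            "-vf", f"scale={width}:{height}",  # software scaler, runs on the host CPU
            "-c:v", "libx264",                 # stand-in software H.264 encoder (assumption)
            "-b:v", bitrate,                   # target video bitrate for this rung
            "-an",                             # drop audio to keep the sketch short
            f"out_{height}p.mp4",              # hypothetical output name
        ]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_ladder_command("input.mp4")))
```

Every rung you add is another scale operation for the CPU, which is exactly the work a VPU moves into hardware.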
What is a Video Processing Unit (VPU)?
What’s a Video Processing Unit? It’s a hardware device that performs decode and encode, plus other functions like scaling, overlay, and AI processing. You can see this in the transcoding pipeline shown below, which is for the NETINT Quadra.
When it came to labeling the Quadra, you can see the problem: it does much more than a video transcoder. Not only does it decode VP9 (as well as H.264 and HEVC) and encode AV1 (as well as H.264 and HEVC), it also performs scaling, overlay, and AI processing in hardware. That makes it a video processing unit (VPU). For the record, the Quadra delivers four times the throughput of the T408 (one 8K60 output, or 16 1080p60 outputs) and costs around $1,500.
As much as NETINT would like to claim the acronym, it existed before the Quadra. That said, if Webster’s defined VPU (it doesn’t), well, you get the point. Here’s the required glamour shot of the Quadra.
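The prices and throughput figures quoted above for the T408 and Quadra translate directly into a rough hardware cost per 1080p60 stream. The sketch below does that arithmetic; the Device class is purely illustrative, the Quadra price is the approximate figure from the text, and real-world density will vary with codec and settings.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    price_usd: float       # list price quoted in the article (Quadra is approximate)
    streams_1080p60: int   # simultaneous 1080p60 outputs quoted in the article

    def cost_per_stream(self) -> float:
        """Hardware cost (CAPEX) per 1080p60 stream."""
        return self.price_usd / self.streams_1080p60


t408 = Device("T408", 400, 4)
quadra = Device("Quadra", 1500, 16)

for device in (t408, quadra):
    print(f"{device.name}: ${device.cost_per_stream():.2f} per 1080p60 stream")
# T408: $100.00 per 1080p60 stream
# Quadra: $93.75 per 1080p60 stream
```

On these numbers the Quadra is slightly cheaper per stream than the T408, before counting the host CPU it frees up by scaling and overlaying in hardware.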
Just to throw a final wrinkle into the mix, when Google shipped its ASIC-based transcoder, Argos, it labeled the chip a Video Coding Unit, or VCU. As with the T408 and Quadra, the benefits of this ASIC-based technology are profound. As reported by CNET, “Argos handles video 20 to 33 times more efficiently than conventional servers when you factor in the cost to design and build the chip, employ it in Google’s data centers, and pay YouTube’s colossal electricity and network usage bills.”
Of course, Google doesn’t sell the Argos, so if you want the CAPEX and OPEX efficiencies that ASIC-based VPUs deliver, you’ll have to buy from NETINT. Like our woman in the bar, make it a double.