Real-World Perspectives on Choosing the Optimal GOP Size

One of the most fundamental encoding decisions is the size of the Group of Pictures (GOP), or the frequency of I-frames within an encoded file. I-frames, also known as keyframes, are the starting points for groups of pictures, which consist of I-, B-, and P-frames.

Traditionally, GOP size is dictated by adaptive bitrate streaming considerations, such as ensuring that an I-frame starts every segment. To explore these tradeoffs, I shared a report examining the impact of GOP size on VMAF quality. The study assessed 13 files across genres including animation, general entertainment, sports, and office footage, with keyframe intervals ranging from 0.5 seconds to 20 seconds, encoded using x264 and x265. The report is freely available here.

Professional Insights and Tradeoffs

Following the publication, industry professionals engaged in an enlightening discussion, offering practical insights based on years of experience in OTT and IPTV scenarios. Their perspectives illustrate how GOP sizing decisions influence not only video quality but also user experience and device interoperability.

OTT Considerations and GOP Tradeoffs

Report author Jan Ozer kicked off the conversation by asking Ateme’s Clement Duval which GOP settings he recommended.


Clement responded, “For OTT, I mostly see fragments ranging from 2-6 seconds (the higher: the more packaging latency and the less agile in ladder-switching), with GOP (inside fragment) usually < 2 sec. If < 1 sec: impact on VQ; if > 2 sec: not much gain in VQ. It is, however, interesting to go > 2 sec, since no switching/zapping happens inside a fragment anyway.

Otherwise, Open and Variable GOP give the best VQ and don’t impact latency or bitrate switching unless there’s an interoperability problem with low-end players (which is very common!). I’d be happy to hear your feedback on this!”

Ozer responded, “I’d love to test variable GOP; I hear (Netflix/Hybrik) that it delivers the best quality. The only time I compared the quality of Open vs. Closed GOP, I found the quality delta very small. Do you have any test data showing a bigger margin?”

Jan Ozer added: “As I mentioned below, HLS mandates 2-second GOPs. I have no idea what most publishers use, but a good percentage always seem to adopt whatever Apple recommends. In an IPTV scenario, I could see it extending out to five seconds or so, but from there, the quality benefits of 20 seconds don’t seem worth chasing.

For those distributing to mobile, where rung changes are more likely, two seconds sounds like a good compromise between stream quality and ladder-switching agility. I’ve seen broadcast engineers use 0.5-second GOPs because that’s what they did with MPEG-2. A huge penalty there. On the other side of the spectrum, there are seriously diminishing returns going beyond 5 seconds. What do you recommend?”
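As a rough illustration of the durations discussed above, the sketch below converts a GOP duration in seconds into a keyframe interval in frames and builds matching FFmpeg arguments. The helper names are hypothetical. `-g`, `-keyint_min`, and `-sc_threshold` are standard FFmpeg options that, with libx264, should produce a fixed keyframe interval with scene-cut keyframes disabled. Treat it as a starting point under those assumptions, not a recommended configuration.

```python
from fractions import Fraction


def gop_frames(fps, gop_seconds):
    """Keyframe interval in frames for a frame rate and a GOP duration."""
    # Fraction keeps rates such as 30000/1001 (29.97 fps) exact before rounding.
    return round(Fraction(fps) * Fraction(gop_seconds))


def ffmpeg_gop_args(fps, gop_seconds):
    """FFmpeg arguments for a fixed keyframe interval (hypothetical helper)."""
    n = gop_frames(fps, gop_seconds)
    # -g sets the maximum interval, -keyint_min the minimum, and
    # -sc_threshold 0 turns off keyframes inserted at detected scene changes.
    return ["-g", str(n), "-keyint_min", str(n), "-sc_threshold", "0"]


if __name__ == "__main__":
    for fps in (25, 30, Fraction(30000, 1001)):
        for seconds in (2, 5):
            print(float(fps), seconds, ffmpeg_gop_args(fps, seconds))
```

At 30 fps, a two-second GOP works out to a 60-frame interval; at 25 fps, a five-second GOP is 125 frames.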


Spicy Mango CTO Chris Wood: “Based on practical experience, the 2-second GOP helps improve the frequency of which adaptions occur. However, the inherent latency of mobile networks means that buffering can occur far more easily as it often takes a long time for TCP handshakes to occur and for segments to download. A longer segment duration seems to be less affected by latency or ‘breaks’ in mobile connectivity on 3G/4G networks. 5G brings improved (reduced) latency values, so expect this to change the approach slightly.”
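One way to see why longer segments tolerate latency better is a simple back-of-the-envelope model. It assumes every segment request pays a fixed overhead (connection setup plus round trips) before data flows, and that a segment must arrive within its own playback duration to avoid draining the buffer. The 0.5-second overhead used here is a hypothetical figure for illustration, not a measurement.

```python
def request_overhead(segment_seconds, overhead_seconds):
    """Return (requests per minute, share of playback time lost to overhead)."""
    requests_per_minute = 60 / segment_seconds
    # Each segment buys segment_seconds of playback but costs overhead_seconds
    # before any media bytes arrive.
    overhead_share = overhead_seconds / segment_seconds
    return requests_per_minute, overhead_share


if __name__ == "__main__":
    OVERHEAD = 0.5  # hypothetical per-request latency in seconds
    for seg in (2, 6, 10):
        per_minute, share = request_overhead(seg, OVERHEAD)
        print(f"{seg:>2}s segments: {per_minute:.0f} requests/min, "
              f"{share:.0%} of playback time spent on overhead")
```

Under this assumption, two-second segments spend a quarter of each segment's playback time on overhead, while ten-second segments spend 5%. Real players pipeline requests and reuse connections, so actual numbers will differ.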

Chris added: “Today, typically still on 6-10 for PAL at 25/50 fps or 9-12 for NTSC (30/60 FPS). For 5G, the smaller segments are possible, but it’s impossible to guarantee it across the board yet. Much of Europe/APAC still has very patchy coverage.”


OTT and HLS Expert Sylvain Corvaisier: “The percentage of people watching OTT with a remote control must be very low. You have autoplay for every other use. If you have quality content (hence people not switching channels every 10 seconds), you want video quality to be on par, hence a larger GOP. There are other means to enhance starting time with OTT ABR apps.”

Sylvain also stated: “For VOD, I personally recommend aligning the GOP with the segment duration. This approach minimizes unnecessary keyframes, avoiding wasted bits (so we get more for other frames, especially efficient B) and improving overall efficiency. This aligns with the findings in your table and reflects the advantages of preloading in VOD, where there are no real-time constraints (excluding distribution bandwidth limitations). By leveraging preloading, we can minimize bitrate switching, which is a common cause of poor QoE.”
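The keyframe savings from aligning the GOP with the segment duration are simple arithmetic. The sketch below counts keyframes per segment for a given GOP duration; the function name and the integer frame rate are simplifying assumptions.

```python
def keyframes_per_segment(fps, segment_seconds, gop_seconds):
    """Number of keyframes in one segment when GOPs tile the segment exactly."""
    segment_frames = round(fps * segment_seconds)
    gop = round(fps * gop_seconds)
    if gop <= 0 or segment_frames % gop != 0:
        # A GOP that does not divide the segment cannot put a keyframe
        # at the start of every segment.
        raise ValueError("segment length must be a whole multiple of the GOP")
    return segment_frames // gop


if __name__ == "__main__":
    # A 6-second segment at 25 fps with 2-second vs. 6-second GOPs.
    print(keyframes_per_segment(25, 6, 2))  # 3
    print(keyframes_per_segment(25, 6, 6))  # 1
```

With a six-second segment, moving from a two-second GOP to a segment-length GOP cuts keyframes from three to one per segment, which frees those bits for the remaining frames.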

Sylvain added further: “The ‘2-second GOP duration’ recommendation has its roots in live MPEG-2 to STBs, I think, at least in Europe for DTT and IPTV (maybe the 0.5 you are talking about below for cable in the US as well) and has persisted as a ‘general guideline for GOP’ since. Many companies with a ‘broadcast history’ (pre-OTT) still adhere to this standard. I’d say ‘IDR SHOULD at least be present at the beginning of every segment’ is a good recommendation for both VQ and interoperability.”
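The "IDR at the beginning of every segment" rule is easy to check once you have a list of keyframe positions. The sketch below assumes the keyframe frame indices have already been extracted from the encoded file (for example with a probing tool); the function name and inputs are hypothetical.

```python
def missing_segment_keyframes(keyframes, total_frames, segment_frames):
    """Return segment start frames that do not coincide with a keyframe."""
    keyframe_set = set(keyframes)
    return [
        start
        for start in range(0, total_frames, segment_frames)
        if start not in keyframe_set
    ]


if __name__ == "__main__":
    # 600 frames at 25 fps, cut into 6-second (150-frame) segments.
    fixed_gop = list(range(0, 600, 50))   # keyframe every 2 seconds
    scene_based = [0, 30, 280, 400]       # keyframes only at scene changes
    print(missing_segment_keyframes(fixed_gop, 600, 150))    # []
    print(missing_segment_keyframes(scene_based, 600, 150))  # [150, 300, 450]
```

An empty list means every segment can start on a keyframe. Any frame index returned marks a segment boundary that a packager could not cut cleanly.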


Streaming Specialist Guillaume du Pontavice: “The tradeoffs between segment length and content type (e.g., action movies vs. simpler content) are critical. Keyframes are the main cost, and for complex scenes with frequent cuts, longer segments don’t bring significant benefits. For VoD, optimal placement is at scene cuts, avoiding arbitrary durations. This can lead to long GOPs for static scenes, which poses challenges: ABR responsiveness, startup/seeking precision, and latency worsen with longer GOPs.

As Jan’s data shows, returns diminish for segments beyond ~10 seconds. For live content, adaptive segments are harder to manage due to real-time encoding constraints, so fixed durations often make more sense.”
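Scene-cut placement with a cap on GOP length can be sketched as follows. This is a simplified illustration, not the algorithm any particular encoder uses: it assumes scene cuts are already known as frame indices, skips cuts that would create a GOP shorter than a minimum, and forces a keyframe whenever the maximum would be exceeded. All names and thresholds are hypothetical.

```python
def place_keyframes(scene_cuts, total_frames, min_gop, max_gop):
    """Return keyframe indices aligned to scene cuts, bounded by min/max GOP."""
    keyframes = [0]  # the first frame is always a keyframe
    for cut in sorted(scene_cuts):
        if cut <= 0 or cut >= total_frames:
            continue
        # Force keyframes so no GOP grows beyond max_gop on the way to the cut.
        while cut - keyframes[-1] > max_gop:
            keyframes.append(keyframes[-1] + max_gop)
        # Ignore cuts that would produce a GOP shorter than min_gop.
        if cut - keyframes[-1] >= min_gop:
            keyframes.append(cut)
    # Apply the same cap to the tail after the last scene cut.
    while total_frames - keyframes[-1] > max_gop:
        keyframes.append(keyframes[-1] + max_gop)
    return keyframes


if __name__ == "__main__":
    # 24 seconds at 25 fps, minimum GOP of 1 second, maximum of 10 seconds.
    print(place_keyframes([30, 40, 400], 600, 25, 250))  # [0, 30, 280, 400]
```

In this example, the cut at frame 40 is skipped because it falls too close to the keyframe at frame 30, and a keyframe is forced at frame 280 because the static stretch before the cut at frame 400 would otherwise exceed the ten-second cap.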

Sylvain concluded: “That approach reminds me of the first Envivio HLS encoders, which initially focused on scene-cut alignment. However, as the market evolved, priorities shifted towards selling two distinct devices—one for encoding and another for ABR optimization—leading to less emphasis on tight integration. It’s interesting to see this concept resurface now, given the growing demand for both quality and ABR efficiency.”

Table 1. Summary of recommendations. The author (Jan Ozer) recommends prioritizing the practitioners’ views over his when they conflict.

Key Takeaways from the GOP Size Discussion:

  1. Shorter GOPs for OTT:
    • Clement Duval mentioned that going below one second harms video quality (VQ), and that there are minimal gains beyond two seconds. Open and variable GOPs provide the best video quality but can have compatibility issues with low-end devices.
  2. Variable and Open GOP Potential:
    • Jan Ozer discussed testing variable GOPs and noted that open vs. closed GOP comparisons show minimal quality differences. He emphasized that HLS mandates two-second GOPs and that mobile environments benefit from such settings for better adaptability. He suggested a sweet spot of five seconds for IPTV but diminishing returns beyond that.
  3. Impact of Network Latency:
    • Chris Wood emphasized the role of mobile network latency, stating that longer GOPs and segments reduce buffering on 3G/4G networks. With 5G’s reduced latency, shorter GOPs might become more feasible. He also recommended GOP durations of 6-10 for PAL (25/50 fps) and 9-12 for NTSC (30/60 fps).
  4. VOD-Specific Optimizations:
    • Sylvain Corvaisier recommended aligning GOP size with segment duration to improve bitrate efficiency in VOD workflows. He highlighted the preloading advantage in VOD, where real-time constraints are absent, allowing flexibility. He also noted that OTT ABR apps have other means of improving startup time.
  5. Scene-Cut Alignment for VoD:
    • Guillaume du Pontavice advocated for aligning GOPs with scene cuts in VoD to avoid unnecessary keyframes, especially in static scenes. However, longer GOPs may hinder ABR responsiveness and worsen startup and seeking latency. He recommended capped segment durations for VoD and fixed durations for live workflows due to real-time encoding constraints.

 

About Jan Ozer

I help companies train new technical hires in streaming media-related positions; I also help companies optimize their codec selections and encoding stacks and evaluate new encoders and codecs. I am a contributing editor to Streaming Media Magazine, writing about codecs and encoding tools. I have written multiple authoritative books on video encoding, including Video Encoding by the Numbers: Eliminate the Guesswork from your Streaming Video (https://amzn.to/3kV6R1j) and Learn to Produce Video with FFmpeg: In Thirty Minutes or Less (https://amzn.to/3ZJih7e). I have multiple courses relating to streaming media production, all available at https://bit.ly/slc_courses. I currently work at www.netint.com as a Senior Director in Marketing.
