Understanding Google’s BBR & The Last Mile
We’re already written in an earlier post about the need to solve the problem of last mile congestion in order to improve Quality of Experience (QoE) for customers of OTT video streaming content. And we’ve talked about PCC technology, an online learning congestion control algorithm that we use at Compira to resolve these issues.
Now, we’re going to take a closer look at Bottleneck Bandwidth and Round-trip propagation time (BBR), a different congestion control algorithm, developed by Google. In this post we’ll explain BBR’s design philosophy, highlight BBR’s advantages and disadvantages, talk about the essential differences between PCC and BBR, and shed more light on our choice of PCC at Compira Labs for effective last mile delivery.
BBR’s network model
Imagine that you’re sending data from point A to point B. Somewhere along the way, your data traffic might hit a communication link that constitutes a bottleneck, that is, restricts the rate at which data is delivered. BBR responds to this by relying on a simple model of the network pipe between points A and B that reflects this bottleneck. This boils down to modeling the entire path as a single communication link that is as narrow as the bottleneck. BBR performs periodic measurements to assess the bandwidth and delay on this logical link, and adjusts the sending rate to match this theoretical bandwidth so as to keep latency to a minimum.
Since its introduction by Google, BBR, which is used to deliver YouTube’s traffic, has generated quite a buzz, as it attempts to solve a challenging problem that affects almost every major Internet service. BBR’s approach is simple and natural and has been shown to outperform traditional TCP in many environments of interest. In addition, BBR is open source, making it accessible and easy to experiment with.
Is BBR suited for the last mile?
BBR rests on the basic assumption that it is possible to imagine what’s happening within the network, and then use that imaginary scenario to resolve the congestion. While this might be true for some networks, other networks (and specifically, last-mile networks) can be highly unpredictable and complex, rendering them extremely hard to accurately model.
As we’ve noted before, the last mile is an extremely chaotic environment. Every millisecond, there can be multiple users joining and leaving the network, competing over the same resources, resulting in competing traffic flows unpredictably beginning and ending all the time. In addition, the various traffic streams might employ different congestion control algorithms to adapt their sending rates; for example, one traffic stream might be an email sent using TCP CUBIC, and another stream, competing over the same bottleneck link, might be a YouTube video, which is run on BBR. Reasoning about the dynamics between such algorithms can be difficult. Lastly, last-mile networks can consist of multiple different network segments that exhibit challenging conditions (such as mobile networks). Consequently, it’s nearly impossible to model the network in such a chaotic last mile environment.
Similarly to TCP, BBR makes certain fixed assumptions about the network that might fail in the unpredictable last mile and thus lead to suboptimal decisions. We next illustrate this through a simple experiment.
BBR vs. PCC
As explained in our previous post, PCC takes the exact opposite approach from BBR. PCC is performance-oriented, taking a black box approach, in the sense that it makes no assumptions about our ability to understand what is happening within the network. PCC’s underlying premise is that network is too complex and dynamic to model accurately, and so PCC relies on empirical observations regarding the implications for performance of sending at different rates.
PCC treats the sending of data as a sequence of micro-experiments. In each experiment, PCC tries out a certain send rate for a very short period of time (in the order of milliseconds), measures the implications for performance in terms of packet loss rate, latency, jitter, etc., and aggregates these into a utility rating (“performance score”). PCC employs a theory-informed online learning algorithm to determine, based on the history of these micro-experiments, what the next rate to send at should be. Online learning theory provides a powerful framework for making decisions under uncertain conditions, enabling PCC to quickly adapt to network changes and provide robustly high performance.
The impact of the differences between BBR’s attempt to explicitly model the (potentially highly dynamic and complex) network and PCC’s black box approach can be seen in the graph above. The graph charts the effects of using PCC and BBR in circumstances that mimic a chaotic last mile scenario, with latency, available bandwidth, and packet loss varying every second.
The green dotted line shows the optimal possible bandwidth that is theoretically available.
The blue line shows PCC delivery.
The orange line shows BBR delivery.
As the network gets more and more chaotic, the gap between PCC performance and BBR performance widens. PCC clearly outperforms BBR under these conditions.
Does BBR play well with others?
As recently pointed out by both practitioners and academics, when BBR goes up against other traffic control protocols, it doesn’t play well. It turns out that BBR can be very aggressive towards others’ traffic flows, taking over the limited available bandwidth. We will elaborate on this phenomenon in a future blog post.
BBR is an exciting recent attempt to tackle one of the Internet’s most crucial challenges, which is the key cause for bad user QoE. BBR’s approach to this challenge is simple and intuitive, and BBR is already used to carry a substantial fraction of video traffic on the Internet. However, BBR’s design philosophy is not well-suited for environments that are inherently chaotic, like the last-mile for video delivery, and BBR can be overly aggressive towards competing traffic.