Yesterday, 2019-06-24, a route leak hit major Internet players such as Cloudflare, Verizon and Amazon AWS, which were affected by a network outage.
Cloudflare did an impressive job of investigating and reporting: it detected the problem early and helped AS33154 resolve it.
You can find the complete report here.
But what happened? Briefly, it seems that a BGP optimization tool installed at AS33154 split major networks into more-specific /21 prefixes and announced them to its customers. The customer AS396531 then announced these prefixes back to the Internet through its IP transit provider AS701, blackholing many websites that no longer had a return route.
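To see why the more-specific prefixes attract traffic, here is a minimal Python sketch of longest-prefix matching. The prefix 10.20.0.0/20, the labels "origin" and "leak", and the helper names are hypothetical and chosen only for illustration; real routers apply the same rule in their forwarding tables.

```python
import ipaddress

def more_specifics(prefix, new_len):
    """Split a prefix into its more-specific subnets of length new_len."""
    return list(ipaddress.ip_network(prefix).subnets(new_prefix=new_len))

def best_route(routes, destination):
    """Return the label of the longest matching prefix, or None if nothing matches."""
    addr = ipaddress.ip_address(destination)
    matching = [net for net in routes if addr in net]
    if not matching:
        return None
    return routes[max(matching, key=lambda net: net.prefixlen)]

# Legitimate announcement: one /20 from the real origin (hypothetical prefix).
routes = {ipaddress.ip_network("10.20.0.0/20"): "origin"}

# Leaked announcements: the same address space split into two /21s.
for net in more_specifics("10.20.0.0/20", 21):
    routes[net] = "leak"

# The /21 is more specific than the /20, so it wins.
print(best_route(routes, "10.20.9.1"))
```

Because the leaked /21s are more specific than the legitimate /20, every router that accepts them prefers the leaked path, whatever its AS-path length.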
According to Cloudflare:
The leak should have stopped at Verizon. However, against numerous best practices outlined below, Verizon’s lack of filtering turned this into a major incident that affected many Internet services such as Amazon, Linode and Cloudflare.
If you are an ISP, what steps would you take to prevent this mess?
A simple way is to set a route-limit on your peer sessions:
# Huawei
bgp 65099
 ipv4-family unicast
  peer n.n.n.n enable
  peer n.n.n.n route-limit 5000 95 idle-timeout 30
In the example above, the peer session accepts up to 5000 prefixes, with a warning at 95%. When prefix number 5001 is received, the router shuts down the BGP session for 30 minutes.
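The behaviour of such a limit can be sketched in a few lines of Python. This is only a toy model of the logic described above, not Huawei's implementation: the class name `PeerSession`, its fields and the event strings are hypothetical, and the sketch assumes the warning fires once when the 95% threshold is reached.

```python
from dataclasses import dataclass

@dataclass
class PeerSession:
    limit: int = 5000            # maximum prefixes accepted from the peer
    warn_percent: int = 95       # warning threshold, as a percentage of the limit
    idle_timeout_min: int = 30   # minutes before the session is retried
    received: int = 0
    warned: bool = False
    state: str = "established"

    def receive(self, count=1):
        """Process `count` new prefixes and return the events they trigger."""
        events = []
        for _ in range(count):
            if self.state != "established":
                break  # session is down, nothing more is accepted
            self.received += 1
            if self.received > self.limit:
                self.state = "idle"
                events.append(f"session down, retry in {self.idle_timeout_min} min")
            elif not self.warned and self.received * 100 >= self.limit * self.warn_percent:
                self.warned = True
                events.append(f"warning at {self.received} prefixes")
        return events

session = PeerSession()
print(session.receive(4750))  # reaches 95% of 5000: one warning
print(session.receive(250))   # 5000 prefixes: still within the limit
print(session.receive(1))     # prefix 5001: the session is shut down
```

A customer that normally sends a handful of prefixes and suddenly sends tens of thousands would trip the limit long before the leak spreads further.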
Implement the RPKI framework!
But we’ll talk about that another time…