On May 13, Oracle Internet Intelligence posted the following Tweet:
“Very large internet outage observed in China in the past couple of hours. External connectivity for China Telecom i…”
InternetIntelligence (@InternetIntel) May 14, 2019
A couple of hours later, CAIDA’s Internet Outage Alerts posted a similar observation:
CAIDA’s Internet Outage Alerts (@caida_ioda) May 14, 2019
In examining the country-level graphs included in both Tweets, a couple of similarities are evident:
- The BGP routing metrics remained unchanged – that is, neither organization observed changes to the number of routed prefixes geolocated to China
- Both ICMP-based metrics experienced a drop – IODA’s ping-based “active probing”, and Oracle’s traceroute completion ratio
It isn’t unusual to see these two factors together. The combination can indicate that inbound ICMP traffic is being blocked or dropped somewhere along the path, rather than an actual disruption of Internet connectivity that impacts local users because a network has become unreachable.
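The reasoning above can be written down as a rough decision rule. The following is only a sketch: the `Signals` fields, the label strings, and the 20% threshold are illustrative assumptions, not anything published by Oracle or CAIDA.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Fractional change versus a recent baseline (-0.4 means a 40% drop)."""
    routed_prefixes: float    # BGP: change in the number of routed prefixes
    icmp_reachability: float  # ping responses or traceroute completion ratio


def interpret(signals: Signals, drop_threshold: float = -0.2) -> str:
    """Map a pair of signal changes to a coarse, hypothetical label."""
    bgp_dropped = signals.routed_prefixes <= drop_threshold
    icmp_dropped = signals.icmp_reachability <= drop_threshold
    if bgp_dropped and icmp_dropped:
        # Routes withdrawn and probes failing: the network is likely unreachable.
        return "reachability_loss"
    if icmp_dropped:
        # Routes still announced but probes failing: ICMP may be filtered,
        # or there is a data-plane problem that BGP does not reveal.
        return "icmp_filtered_or_data_plane_issue"
    if bgp_dropped:
        # Routes withdrawn while probes still succeed: worth a closer look.
        return "inconsistent_signals"
    return "no_significant_change"


# The pattern described in both Tweets: stable BGP, falling ICMP metrics.
print(interpret(Signals(routed_prefixes=0.0, icmp_reachability=-0.5)))
```

A rule like this only narrows the possibilities. As the rest of this post shows, stable BGP with falling ICMP metrics can also accompany a real disruption.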
It is interesting to note that the IODA graph also shows a minor, but visible, drop in the “darknet” metric. The metric represents what is known as “Internet Background Radiation”, which CAIDA defines as unsolicited traffic reaching the UCSD Network Telescope monitoring an unutilized /8 address block. The drop in this metric would seem to be more indicative of a disruption that impacts last-mile connectivity. However, there’s no corresponding drop in the Internet Intelligence DNS traffic metric, which would be the best proxy for last-mile connectivity on that set of graphs. (It’s an imperfect proxy, though, since it measures traffic from resolvers, which can cache responses; DNS manipulation by China’s “Great Firewall” may come into play here as well, affecting the volume of traffic seen by Oracle’s authoritative DNS servers.)
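The caching caveat can be made concrete with a toy model. The sketch below assumes a single resolver caching a single record with a fixed TTL; the function name, the query rates, and the 300-second TTL are made up for illustration and say nothing about how Oracle actually measures DNS traffic.

```python
def authoritative_queries(query_times, ttl):
    """Count user queries that miss the resolver cache.

    Only cache misses are forwarded to the authoritative server, so this is
    the traffic an authoritative-side metric would see in this toy model.
    """
    upstream = 0
    expires_at = None
    for t in sorted(query_times):
        if expires_at is None or t >= expires_at:
            upstream += 1          # cache miss: ask the authoritative server
            expires_at = t + ttl   # the answer is then cached for `ttl` seconds
    return upstream


normal = range(0, 3600)        # one user query per second for an hour
degraded = range(0, 3600, 10)  # 90% fewer user queries over the same hour

print(authoritative_queries(normal, ttl=300))
print(authoritative_queries(degraded, ttl=300))
```

In this model both calls return the same count, because the resolver refreshes the record once per TTL regardless of how many users are asking. A large drop in user activity behind a busy resolver can therefore be invisible at the authoritative server.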
The Tweet from @InternetIntel notes that China Telecom was impaired, while China Unicom and China Mobile were unaffected. Drilling down into network-level views, the Internet Intelligence graph clearly illustrates a drop in the number of completed traceroutes reaching AS4134 (China Telecom) through upstream providers. And although the IODA graph doesn’t provide insight into upstream providers, it does clearly show concurrent drops in successful active probing activity and darknet traffic, with no change in routed prefixes from that ASN.
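As a rough illustration of what “concurrent drops” means in a network-level view, the sketch below flags samples that fall well below a baseline and intersects the results across metrics. The numbers are synthetic, and the function names, baseline window, and 15% threshold are assumptions for this example, not IODA’s actual detection method.

```python
from statistics import median


def drop_indices(series, baseline_len=6, threshold=0.15):
    """Indices whose value is more than `threshold` below the baseline median.

    The baseline is the median of the first `baseline_len` samples.
    """
    baseline = median(series[:baseline_len])
    return {i for i, v in enumerate(series) if v < baseline * (1 - threshold)}


def concurrent_drops(metrics, **kwargs):
    """Sorted indices at which every metric in `metrics` shows a drop."""
    sets = [drop_indices(s, **kwargs) for s in metrics.values()]
    return sorted(set.intersection(*sets)) if sets else []


# Synthetic per-interval samples for a single ASN (not real measurements).
active_probing = [100, 101, 99, 100, 102, 100, 60, 55, 58, 98, 100]
darknet = [50, 52, 49, 51, 50, 50, 40, 35, 36, 49, 51]
routed_prefixes = [900] * 11

print(concurrent_drops({"active_probing": active_probing, "darknet": darknet}))
print(drop_indices(routed_prefixes))
```

With this data, the two traffic-based metrics drop over the same three intervals while the routed prefix count never leaves its baseline, which mirrors the shape of the AS4134 graphs.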
As would be expected, the issue at China Telecom impacted downstream networks – a selection can be seen in the Oracle Internet Intelligence graphs below.
While these graphs illustrate the disruption at a high level, allowing us to observe that it occurred, they don’t show how users and services were actually affected. However, in a May 14 blog post, Internet monitoring firm ThousandEyes provided some additional context, noting:
“Over the course of the prolonged outage, any traffic routed through affected infrastructure was dropped, which meant that some Internet users in and outside of China would have experienced service disruptions connecting to various websites and applications. Users in China attempting to reach sites hosted external to China would have been impacted, along with users outside of China trying to connect to sites hosted within China.
Though not exclusively impacting western sites and services, many major U.S. brands, such as Apple, Amazon, Microsoft, Slack, Workday, SAP, and others were impacted over the course of the outage window.”
A post on Chinese social network Weibo provides what appears to be a local perspective on the disruption, with an English translation of the post (via Google Translate) stating:
“2:20 am on May 14th, 2019: Large-scale network fluctuations occurred in telecommunications, and 163 domestic <-> 163 international networks were disconnected.
Simply put, most of China Telecom’s users have been isolated in China’s domestic network, and data cannot go abroad. The specific recovery time is to be determined.
China Unicom and China Mobile and telecom CN2 GIA users are not affected.”
While the (translated) wording of the post suggests that the text may have been copied from a more official explanation, the original source of that explanatory text is unknown. The same (original Chinese) text was also posted to Twitter by several users (@Evaustchen, @Yhio_707019saku). “163” is presumably a reference to ChinaNet, which is run by China Telecom: “Often referred to as the 163 network after the number users dial to gain access to it, ChinaNet is also the effective international gatekeeper by virtue of the fact that all networks must go ‘through’ China Telecom’s international telecommunications access.”
So, what happened?
Ultimately, it isn’t clear what caused the multi-hour disruption within China Telecom’s network. As of the publication date of this post (May 16, 2019), China Telecom hasn’t publicly released any information about the cause. The information available from Oracle, CAIDA, and ThousandEyes illustrates the impact of the disruption, but doesn’t point to a cause.