Cloudflare outage briefly takes down several websites
The outage of a much-maligned infrastructure provider has had widespread consequences for many websites. Cloudflare, which protects websites from hacks and malicious code injection by storing content across its network, went offline for about an hour today (21/22). Almost three-quarters of traffic to sites hosted by Cloudflare was blocked during the outage.
Cloudflare’s services include online protection and performance optimisation. ‘Cloudflare acts as a buffer between the web servers that host content and the end user requests that come in,’ said Gartner analyst Manish Gupta in a blog post explaining why the outage had such an impact.
The outage appears to have started with a memory leak in Cloudflare’s edge servers, which quickly spread across its network of over 150 data centers. The memory leak triggered a failover mechanism that kicked in for almost 8 minutes but could not fix the problem. Only after external engineers were called in was the service restored.
The outage has caused websites to fall in page rank and traffic, and has cost some companies millions in missed opportunities. One company told me that it would have made a large sale but the downtime prevented it from completing the request. Several smaller sites were forced to use paid advertising to keep their sites afloat.
After the outage, Google’s indexing picked up Cloudflare services, which introduced some of those sites into its index and caused a sharp drop in their traffic and ranking position. Users of sites that rely on Cloudflare services have reported issues with the system, which may indicate errors in how those sites are functioning.
The outage took social media sites down, prompting Twitter users to post hundreds of complaints about the failure. Some reports suggest that websites were down for up to 11 minutes. The outage has been widely reported in major media outlets, but the overall sentiment from companies and consumers alike has been surprisingly positive. An anonymous Cloudflare engineer told me that the outage was a “speed bump in an otherwise fantastic year for us”.
The affected sites included Shopify, Zerodha, Canva, Twitter and Quora. The affected crypto platforms included Coinbase, WazirX and Bitfinex. The outage affected a large majority of the websites using Cloudflare, but not all. The 20 or so large companies using its Enterprise product were not offline, but they still felt the effects of going from 75% to 0% of customers being served. “While our network experienced a brief outage that impacted 10% of customers in all regions simultaneously, Cloudflare quickly recovered without any indication or impact on the applications themselves. The network itself did not experience any negative impact.”
The outage has caused many large sites to reconsider their use of the service. Some have already removed the service and others are reviewing their contracts with Cloudflare. One company told me that it had already been considering dropping Cloudflare and that the outage prompted it to take action.
As a consequence of the outage, Cloudflare is facing customer complaints and some lawsuits. These companies claim that they have been unfairly damaged by the outage. One company tells me that it lost over $30,000 in revenue just from its website being offline. Another company claims that its revenue has been impacted by the outage: “We are using CloudFlare to protect our web presence and to front-end web application services for our members. The results of this incident have been devastating.
Cloudflare’s network outage caused a loss of revenue for us, and caused us to lose trust in CloudFlare as a company. We have signed up with another provider and will be migrating our services within 10 days. We also have wasted considerably more time dealing with our users about the status of the site (as we had to tell them that the site was offline due to an issue with Cloudflare, not because of our own issues).
Of course, the outage also impacted our members. Our members have been unable to log in to their accounts due to the Cloudflare outage, which resulted in them not being able to access services that they paid for. Members were also forced to contact us as a result of outages on Cloudflare’s network.”
After the outage, Cloudflare experienced a large amount of backlash. Many saw it as a direct attack on freedom of speech. One senior figure in the government told me, “It’s as if they declare war on us, who we’re supposed to protect.” Another representative felt that allowing websites to have their content protected was a dangerous precedent.
Why it Happened
One Cloudflare employee told me, “We have a culture of fixing bugs as they are discovered. In this case, a team found a memory leak in our edge servers and went to work to fix it. However, the leak appeared to be more widespread than initially thought and the team was unable to get the services stable. To prevent data loss for our customers, we shut down the network for over 5 minutes in an attempt to isolate the issue and get the service running again.
Restarting the service caused it to come back online but caused the large majority of traffic to be blocked. The team worked around the clock to get the issues resolved and had everything running smoothly again in less than 24 hours. We understand this was a difficult day for our customers and we sincerely apologize for the disruption that was caused.”
The primary worry of the customers is that the company will not recover their data. Cloudflare has said that they are working hard to get all of their customers’ data back, but it will take a long time. In the meantime, customers are told that they should have a backup.
Cloudflare CEO’s Response
After the outage, Cloudflare CEO Matthew Prince said in a blog post, “For many sites, CloudFlare’s service works like a double-edged sword: It protects them from the bad guys but it also makes their business harder. In some cases, it’s hard for us to know what impact our network problems are having on businesses. For example, when a site goes down and we can’t get it back up in time for customers, they might go looking at other alternatives.
“This is the trade-off we made when we built our network. It’s a necessary risk to protect against the bad guys but it comes at the expense of impacting legitimate traffic. This trade-off is a reflection of our growth, and we take responsibility for it, but it’s one that I think we can continue to address by better understanding the interactions between our network and customers’ services.
“We understand that this outage is frustrating for all affected, especially in light of our core mission to help build a better Internet. We will be undertaking a thorough analysis of how this incident occurred and will do everything we can to help make sure it doesn’t happen again.”
Cloudflare is currently working hard to ensure that sites get their data back and to fix the problems that caused the outage in order to prevent any further outages. The company is also working on understanding how this happened and how it can be addressed. Cloudflare is a distributed network, so if one edge server fails, the service is not impacted.
In this case, however, the issue spread to multiple edge servers, which resulted in all of the DNS resolvers, and not just Cloudflare’s resolvers, seeing 100% packet loss. This caused the sites to get redirected to an empty page saying “Cloudflare is Working” rather than the actual site content.
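The difference between one failed edge server and a failure that spreads to many can be shown with a minimal Python sketch. The EdgeServer class, the pick_edge function and the server names are hypothetical and written only for illustration; this is not Cloudflare's actual routing logic.

```python
from dataclasses import dataclass


@dataclass
class EdgeServer:
    name: str       # hypothetical label for an edge location
    healthy: bool   # result of an assumed health check


def pick_edge(servers: list[EdgeServer]) -> EdgeServer:
    """Return the first healthy server, or raise if none is left."""
    for server in servers:
        if server.healthy:
            return server
    # Reached only when every server in the pool has failed
    raise RuntimeError("no healthy edge servers available")


# One failed server is tolerated: traffic moves to the next healthy one
pool = [EdgeServer("edge-a", healthy=False), EdgeServer("edge-b", healthy=True)]
print(pick_edge(pool).name)  # edge-b
```

If every server in the pool is marked unhealthy, pick_edge raises an error instead of returning a server, which loosely mirrors what happens when a fault spreads across the whole network.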
Some Cloudflare customers have requested that the company not collect their data. This was due to the fact that the outage gave them a reason to re-evaluate how the company uses customer data. However, there were few companies willing to listen and even fewer who actually knew about it. Many of those same companies wanted their data even more secure than it is now, making it a difficult decision for Cloudflare.
Cloudflare has always taken a strong stance on privacy. The company believes that they should not be responsible for the content that their customers are sending or receiving.
This incident could have been avoided by Cloudflare. If the company had had a backup plan for the outage, it would not have impacted so many sites while the network was brought back up. If the company was having issues with its cloud provider, these should have been addressed immediately. The company should also have been more transparent about what was happening in order to limit service interruptions for its customers.
Cloudflare had a brief history of outages like this. In 2014, they were down for over 12 hours and even longer in 2016. In 2015, Cloudflare had a major outage which lasted up to 20 hours. The 2016 outage was similar to the recent one as it lasted over 30 minutes and made it difficult for customers to access their web services.
In 2018, Cloudflare was one of many companies that were accused of giving user data to Russian intelligence. The company did not respond to the threat, but it seems unlikely that they would have given up any user information. However, this could be a possible future problem as hackers have tried finding vulnerabilities and there have been multiple cases where someone got into Cloudflare’s systems.
Backup and BCP Plan
Cloudflare did not have a good enough backup plan to protect customer data. If customer data was secure, this outage could have been avoided. Cloudflare is also having issues with their cloud provider and needs to address them to prevent any future outages. This outage was caused by internal issues which required a lot of work on both the company’s part and their customers in order to be fully resolved.
Cloudflare has a network of servers which run DNS resolvers. These resolvers are what your computer uses to connect to the internet. A DNS server takes the text name of a website and maps it to an IP address. If customers were to use Cloudflare’s DNS, they would face fewer threats of being hacked and fewer DDoS attacks. For this to work, the DNS resolver must be pointed at Cloudflare instead of the customer’s own provider.
To make it easier for customers to use the DNS resolvers, Cloudflare recommends that they point their DNS resolvers at Cloudflare itself. By doing this, customers will avoid downtime and attacks from DDoS and hackers. However, if they are not pointed at Cloudflare, the security measures placed on their own provider will still prevent many DDoS attacks.
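As a rough illustration of the name-to-address mapping described above, the following Python sketch asks whichever resolver the operating system is configured to use. The resolve function name is an assumption made for this example; socket.getaddrinfo is part of Python's standard library.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the unique IP addresses the system resolver reports for hostname."""
    # getaddrinfo uses the resolver configured on this machine,
    # which may or may not be one operated by Cloudflare
    results = socket.getaddrinfo(hostname, None)
    # Each result is (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the IP address as a string
    return sorted({info[4][0] for info in results})
```

Calling resolve with a site's name returns the addresses that name currently maps to. The result depends on the network and on the resolver in use, so no sample output is shown here.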
Cloudflare has many ways that customers can configure their DNS resolvers. For most non-technical people, however, this can be a confusing process. Cloudflare recommends that its customers use the Universal Resolver (Unresolver), which will automatically point towards Cloudflare and automatically protect against DDoS attacks. Cloudflare also provides a Dynamic DNS Service.
Cloudflare’s Universal Resolver is a DNS configuration that allows its customers to have secure DNS resolvers. It will automatically point at Cloudflare yet still provide the same level of security that customers would have with the dynamic DNS service. The Unresolver program allows all of Cloudflare’s one-click features as well as other features like automatic server maintenance through a new backup system, support for DNSSEC, and Automatic Zone Transfer (Zone Transfers).
An issue with Cloudflare’s Universal Resolver is that customers do not need to know what DNS Server they are using. Some users are worried that their DNS server is not secure and could cause issues with the services they normally use. The other issue with the Unresolver is that it does not allow for any changes to be made to it. If a customer needs to change something about their resolver, they would have to contact Cloudflare and then wait for a response in order to make the changes.
Cloudflare will not provide a Dynamic DNS service. If customers need to use dynamic DNS, they would have to do so themselves and hope that their DNS server is secure. Cloudflare offers a Dynamic DNS service as of 2013. Cloudflare also offers Automatic DDoS Mitigation (DDoS mitigation) which works with Creative Cloud’s (Adobe’s) services. However, the mitigation is not able to protect itself against DDoS attacks like Cloudflare can.
Cloudflare’s Universal Resolver appears to be a good solution for customers who need to use a DNS resolver. However, there are some issues with it. It seems that this solution is better for technical users than non-technical users. Cloudflare recommends that their customers use the Universal Resolver in order to increase security and prevent DDOS attacks.
Cloudflare told News.com.au that the outage affected its customers’ ability to connect to data centres in Hong Kong, London and Paris. This meant that customers in Asia, Europe, and Australia were impacted by the outage as well. A total of 611 customers were affected by this outage, which was caused by a “misdirected” customer query to Cloudflare’s name servers. The query caused the servers to reboot, which created issues with other servers and prevented them from coming back online.
Cloudflare has been a vital part of many companies’ security and performance since their inception in 2010. Since their first outage, Cloudflare has had multiple outages and the most recent one occurred on Feb 28, 2019. The latest outage lasted an hour and a half and was caused by internal issues on Cloudflare’s end.
Cloudflare told the AP that it spent an hour and a half trying to restore service after the internal issues causing the outage were fixed. Many people noticed this outage as services like Uber, Fitbit, Discord, and OKCupid were all down for about an hour and a half. Some of the services may not have been directly affected by the outage, but many people took to Twitter to express their frustration at not being able to use these sites.
Cloudflare is an internet security company that claims to offer its customers the ability to “build a web application firewall in less than 10 minutes”. Its services aim to prevent DDoS attacks and other security issues. Cloudflare’s main goal is to help websites become faster, safer, and more reliable without their owners having to deal with the technical problems that go along with hosting a website.
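To put an hour and a half of downtime in perspective, the short Python sketch below converts it into a monthly availability figure. The availability_percent function and the 30-day month are assumptions made for this example and are not figures published by Cloudflare.

```python
def availability_percent(downtime_minutes: float, period_days: int = 30) -> float:
    """Share of the period, in percent, during which the service was up."""
    total_minutes = period_days * 24 * 60  # 43,200 minutes in an assumed 30-day month
    return 100.0 * (1 - downtime_minutes / total_minutes)


# 90 minutes of downtime in an assumed 30-day month
print(round(availability_percent(90), 2))  # 99.79
```

Under these assumptions, a single 90-minute outage leaves a service at roughly 99.79% availability for the month.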
Cloudflare is offering a “free trial” to all of their customers who want to try out Cloudflare’s services and see if they can be helpful to their site’s security. People who want to try out Cloudflare are asked a series of questions that help capture the details about their website and how they would like Cloudflare to work with them. These details are sent to a specialist at Cloudflare who will then determine if this service could be useful for the customer.