By Frank J. Ohlhorst
Microsoft experienced another Azure outage this week and had to turn to Twitter to notify users that it was aware of the problem.
In a tweet time-stamped 7:12 AM on 4 Sep 2018, Azure Support stated, “Engineers are aware of an issue affecting resources in South Central U.S. For continued updates please visit the Azure status page at status.azure.com.”
That message surprised even hardened cloud veterans such as Dave Bermingham, technical evangelist at resiliency software provider SIOS Technology and a cloud MVP. In a blog post recounting his experience with the Azure outage, Bermingham said he suspected the problem started an hour or two before Microsoft’s tweet. That suspicion was validated by tweets from customers asking @AzureSupport about problems with South Central U.S.
Adding insult to injury, Microsoft’s own recommended updates link, status.azure.com, was also unresponsive, according to Bermingham, making it nearly impossible to ascertain the breadth and depth of the outage, which he suspected was much larger than originally indicated by Microsoft. Upon further investigation, Bermingham discovered that services that relied on Azure Active Directory may have been impacted as well, and customers attempting to provision new subscriptions were encountering problems.
Fast-forward 24 hours, and it seems that some Azure users were still encountering difficulties as evidenced by Microsoft’s 11:00 UTC update.
It wasn’t until this afternoon that Redmond declared all systems normal.
As noted, the outage was caused by a severe weather event near one of Microsoft’s data centers, which in turn resulted in a structured power-down process. Bermingham notes that no one can blame Microsoft for a natural disaster such as a lightning strike; however, he says, it should serve as a reminder for partners that responsibility for customer uptime does not rest solely with the cloud provider.
“If your only disaster-recovery plan is to call, tweet and email Microsoft until the issue is resolved, you just received a rude awakening,” he said. “It is up to you to ensure you have covered all the bases.”
Bermingham followed that statement with some advice on how MSPs might mitigate …