Hosted Enterprise Chef Search API Downtime | Chef Blog
"We had two separate outages on Tuesday, and I’ll cover each one in turn as they have different root causes."
Google Cloud Platform API issues on April 8th, 2014 - Google Groups
GitHub: Denial of Service Attacks
"After some investigation, we discovered that we were seeing several thousand HTTP requests per second distributed across thousands of IP addresses for a crafted URL. These requests were being sent to the non-SSL HTTP port and were then being redirected to HTTPS, which was consuming capacity in our load balancers and in our application tier. Unfortunately, we did not have a pre-configured way to block these requests and it took us a while to deploy a change to block them."
Diary of an outage | FastMail Weblog
"right at the top there was a complex series of folder renames within a single replication event. This is not a particularly unusual operation. This time it tripped a known, rare bug in the way renames are replicated that caused the replication process (sync_client) to abort. The Cyrus master daemon starts it up again, but then it hits the same point and dies again, over and over. Replication stops."
Incident Report: Amsterdam Data Center DNS Failure - DNSimple Blog
"Multiple factors appear to have been involved: A combination of a traffic spike of unknown origin and of questionable purpose, combined with a bottleneck in the DNSimple name server software, and what appears to be upstream resolution blocking. Additionally, our desire to contain the incident to a single data center may have prolonged the outage."
Google App Engine issues on February 20, 2014 - Google Groups
"The root cause of the outage was four overlapping network element failures, due to reasons varying from fiber cut to optical equipment device failure to network router failure. These failures were not related, and would statistically be expected to overlap to the degree observed only once every several years. "
Pivotal Web Services Status - API Outage
"We experienced a confirmed AWS s3 issue creating new buckets. The effect of this event was magnified by our attempts recreate an existing bucket. [...] We deployed a production fix to remove the dependency on recreating buckets."