Comments, pictures, and links.
Postmortem for July 27 outage of the Manta service - Blog - Joyent
"Clients experienced very high latency, and ultimately received 500-level errors in response to about 22% of all types of requests, including PUT, GET, and DELETE for both objects and directories. At peak, the error rate approached 27% of all requests, and for most of the window the error rate varied between 19 and 23%."
Travis CI Status - Elevated wait times and timeouts for OSX builds
"We noticed an error pointing towards build VM boot timeouts at 23:21 UTC on the 20th. After discussion with our infrastructure provider, it was shown to us that our SAN (a NetApp appliance) was being overloaded due to a spike in disk operations per second."
CircleCI Status - DB performance issue
"The degradation in DB performance was a special kind of non-linear, going from "everything is fine" to "fully unresponsive" within 2 minutes. Symptoms included a long list of queued builds, and each query taking a massive amount of time to run, along side many queries timing out."
NYSE Blames Trading Outage on Software Upgrade | Traders Magazine Online News
"On Tuesday evening, the NYSE began the rollout of a software release in preparation for the July 11 industry test of the upcoming SIP timestamp requirement. As is standard NYSE practice, the initial release was deployed on one trading unit. As customers began connecting after 7am on Wednesday morning, there were communication issues between customer gateways and the trading unit with the new release. It was determined that the NYSE and NYSE MKT customer gateways were not loaded with the proper configuration compatible with the new release."