Comments, pictures, and links.
Blog lost and recovered in 30 minutes - Antirez weblog
"Yesterday I lost all my blog data in a rather funny way. When I installed this new blog engine, that is basically a Lamer News slightly modified to serve as a blog, I spinned a Redis instance manually with persistence *disabled* just to see if it was working and test it a bit. Since Redis very rarely crashes, guess what, after more than one year it was still running inside the screen session, and I totally forgot that it was running like that"
Twilio: Billing Incident Post-Mortem
"At 1:35 AM PDT on July 18, a loss of network connectivity caused all billing redis-slaves to simultaneously disconnect from the master. This caused all redis-slaves to reconnect and request full synchronization with the master at the same time. Receiving full sync requests from each redis-slave caused the master to suffer extreme load, resulting in performance degradation of the master and timeouts from redis-slaves to redis-master. By 2:39 AM PDT the host’s load became so extreme, services relying on redis-master began to fail."
Non-blocking transactional atomicity | Peter Bailis
"tl;dr: You can perform non-blocking multi-object atomic reads and writes across arbitrary data partitions via some simple multi-versioning and by storing metadata regarding related items."
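A rough way to picture the idea in that tl;dr is the in-memory sketch below. It is a simplified, single-process illustration and not the post's actual protocol or code: the names `Version`, `Partition`, `Store`, `prepare`, `commit_key`, and `read_atomic` are all hypothetical, the partitioning function is arbitrary, and there is no networking, failure handling, or garbage collection of old versions. Each write carries metadata naming its sibling keys, is first staged as pending on every partition, and only then becomes visible; a reader that sees one visible write uses the metadata to fetch the matching sibling version without blocking.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Version:
    ts: int                   # timestamp of the writing transaction
    value: object
    siblings: FrozenSet[str]  # metadata: every key written by that transaction


class Partition:
    def __init__(self) -> None:
        self.pending: Dict[Tuple[str, int], Version] = {}  # staged versions
        self.good: Dict[str, Version] = {}                 # latest visible version

    def prepare(self, key: str, version: Version) -> None:
        self.pending[(key, version.ts)] = version

    def commit(self, key: str, ts: int) -> None:
        version = self.pending[(key, ts)]  # kept in pending (no GC in this sketch)
        current = self.good.get(key)
        if current is None or current.ts < ts:
            self.good[key] = version


class Store:
    def __init__(self, n_partitions: int = 2) -> None:
        self.partitions = [Partition() for _ in range(n_partitions)]
        self.clock = 0  # assumption: one global counter stands in for unique timestamps

    def _partition(self, key: str) -> Partition:
        # Arbitrary deterministic placement, chosen only for the example.
        return self.partitions[sum(key.encode()) % len(self.partitions)]

    def prepare(self, writes: Dict[str, object]) -> int:
        """Phase 1: stage every write as pending, tagged with its siblings."""
        self.clock += 1
        siblings = frozenset(writes)
        for key, value in writes.items():
            self._partition(key).prepare(key, Version(self.clock, value, siblings))
        return self.clock

    def commit_key(self, key: str, ts: int) -> None:
        """Phase 2 for one key: make the staged version visible."""
        self._partition(key).commit(key, ts)

    def write(self, writes: Dict[str, object]) -> None:
        ts = self.prepare(writes)
        for key in writes:
            self.commit_key(key, ts)

    def read_atomic(self, keys: List[str]) -> Dict[str, object]:
        # Round 1: read the latest visible version of each key.
        first: Dict[str, Optional[Version]] = {
            k: self._partition(k).good.get(k) for k in keys
        }
        required = {k: (v.ts if v else 0) for k, v in first.items()}
        # Use sibling metadata to find keys whose visible version is too old.
        for v in first.values():
            if v is None:
                continue
            for sib in v.siblings:
                if sib in required and required[sib] < v.ts:
                    required[sib] = v.ts
        # Round 2: fetch the exact missing versions from pending. They must be
        # there, because nothing becomes visible before phase 1 has finished.
        result: Dict[str, object] = {}
        for k in keys:
            v = first[k]
            if required[k] > (v.ts if v else 0):
                v = self._partition(k).pending[(k, required[k])]
            result[k] = v.value if v else None
        return result


store = Store()
store.write({"x": 1, "y": 1})
ts = store.prepare({"x": 2, "y": 2})        # staged on both partitions
before = store.read_atomic(["x", "y"])      # nothing visible yet: old values
store.commit_key("x", ts)                   # only x has been made visible
after = store.read_atomic(["x", "y"])       # reader repairs y from pending
```

In this sketch the reader never waits for the writer to finish: it either sees none of the transaction's writes or uses the sibling metadata to complete the set itself.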
Counterfactual Thinking, Rules, and The Knight Capital Accident | Kitchen Soap
"You may believe this document can serve as a ‘post-mortem’ narrative. It cannot, and should not." (re: the Knight Capital SEC document)
ORDER INSTITUTING ADMINISTRATIVE AND CEASE-AND-DESIST PROCEEDINGS, PURSUANT TO SECTIONS 15(b) AND 21C OF THE SECURITIES EXCHANGE ACT OF 1934, MAKING FINDINGS, AND IMPOSING REMEDIAL SANCTIONS AND A CEASE-AND-DESIST ORDER
On August 1, 2012, Knight Capital Americas LLC (“Knight”) experienced a significant error in the operation of its automated routing system for equity orders, known as SMARS.