Over 100 Outage and Security Postmortems

Thu 24 Jan 2013

There are always things to learn about distributed systems, especially how they can turn against you. Companies that publish postmortems are doing us a great favor: there is a jackpot of system design and operations knowledge to be gleaned from studying as many of these as you can get your hands on.

I've collected over a hundred outage- and security-related postmortems in this Pinboard feed.


Google Compute Engine and Predictable Performance

Sun 01 Jul 2012

I raised my eyebrows at one statement Google is making about Google Compute Engine:


Keep a Small Surface - Webapp Isolation

Thu 11 Nov 2010

Web application developers, particularly those on small teams, are usually focused on shipping the next feature or MVP, not on security.

When security does come up, the focus is usually on mitigating direct attacks on the webapp. We rely on the mechanisms in Django or RoR for XSS/CSRF protection and password hashing. We turn to App Engine, Heroku, or traditional hosts for DDoS protection. And so on.

All of that is important and worth doing your due diligence on, but what's the plan if/when your webapp gets entirely owned?

Here's a way to limit the damage, something that is doable even when you are on a small team or working alone. There are nice side effects, too.



Wed 10 Nov 2010

I am no longer adding anything to gridvm.org; that was probably obvious at some point last year. A young child and increasing work responsibilities will do that to you. But a bigger issue with that site than the lack of free time was that the topic did not feel right anymore.

One reason was that the term "cloud computing" solidified and it was hard to tell what "virtualization and grid computing" even meant. But a second, more important reason is that I wanted to write about more general things, too.

I am starting to get meaningful free time again, so here we are.