Keep a Small Surface – Webapp Isolation

Web application developers, particularly those on small teams, are usually focused on the next feature or getting MVPs out, not on security.

When security does come up, the focus is usually mitigating direct webapp attacks. We rely on Django or RoR’s mechanisms for XSS/CSRF protection and password hashing. We turn to App Engine, Heroku, or traditional hosts for DDoS protection. And so on.

All of that is important and worth doing your due diligence on, but what’s the plan if/when your webapp gets entirely owned?

Here’s a way to limit the damage, something that is doable even when you are on a small team or working alone. There are nice side effects, too.

The webapp is at the top of the diagram, deployed in some manner such as AppEngine, a hosting company, an array of many appservers, etc. Whatever it is. The middle section represents message queues. The bottom is the “backend”, wherever the main application runs on the internet.

The boxes in grey at the top (Forms, Auth/Users, Cache, and Templates) are four things that are usually tied to the deployment/web-framework context.

That web-framework-specific code calls a portable API. Some of its functions are implemented locally for anything time critical, such as page generation (Colocated API impl).

Other functions result in messages being sent asynchronously via the FORAPP queue, for receipt by any number of application servers (Message based API impl).
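To make the split concrete, here is a minimal sketch of such a portable API. Everything in it is an assumption for illustration: the class names, the two method names, the message fields, and the in-memory queue that stands in for the real FORAPP queue.

```python
import json
import queue
import uuid


class InMemoryQueue:
    """Stand-in for a hosted queue such as FORAPP (hypothetical helper)."""

    def __init__(self):
        self._q = queue.Queue()

    def send(self, body: str) -> None:
        self._q.put(body)

    def drain(self):
        """Return every queued body; used here only to inspect the sketch."""
        items = []
        while not self._q.empty():
            items.append(self._q.get())
        return items


class PortableAPI:
    """What the web-framework code calls; it holds no backend credentials."""

    def __init__(self, forapp, page_cache):
        self._forapp = forapp
        self._page_cache = page_cache

    # Colocated implementation: answered locally, safe during page generation.
    def get_profile_summary(self, user_id: str) -> dict:
        default = {"user_id": user_id, "display_name": "?"}
        return self._page_cache.get(user_id, default)

    # Message-based implementation: queued now, handled later by the backend.
    def request_report(self, user_id: str, report_kind: str) -> str:
        message_id = str(uuid.uuid4())
        self._forapp.send(json.dumps({
            "id": message_id,
            "type": "request_report",
            "user_id": user_id,
            "report_kind": report_kind,
        }))
        return message_id
```

The framework code only ever sees PortableAPI, so swapping the in-memory queue for a hosted one should not touch the views.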

If applicable, the main application responds to incoming messages via the FORWEB queue (again, this happens outside of page generation, so it is not a problem if it takes a few seconds). But the main application node(s) can also create messages for the webapp at any time (because of, say, administrator actions or scheduled events).
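On the webapp side, handling FORWEB might look something like the sketch below: a background job (not a request handler) takes whatever message bodies it has received and folds them into local state. The function name, the "profile_updated" message type and its fields are all hypothetical.

```python
import json


def apply_forweb_messages(raw_messages, page_cache: dict) -> int:
    """Apply backend-originated messages to local state; return how many applied."""
    applied = 0
    for raw in raw_messages:
        try:
            message = json.loads(raw)
        except ValueError:
            continue  # not JSON: skip it rather than crash the poller
        if not isinstance(message, dict):
            continue
        user_id = message.get("user_id")
        profile = message.get("profile")
        if (message.get("type") == "profile_updated"
                and isinstance(user_id, str)
                and isinstance(profile, dict)):
            page_cache[user_id] = profile
            applied += 1
    return applied
```

Because this runs outside page generation, a slow or empty queue only delays the update; it never blocks a page.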

Yeah… big whoop. A lot of webapps make use of task queues; that is not all that interesting.

The key to this is the dotted lines: these represent the borders between three isolated security realms.

The webapp has write permissions to the FORAPP queue and read-only access to the FORWEB queue. Those are the only two ways it interacts with the rest of your system: it is configured with no credentials other than the ones that allow it to do this messaging. The main application has all of the credentials to interact with other services: email/SMS, backups, task farm, file archives, important long-term databases, etc.

This is like a bastion host model. If a bastion host is broken into, there is only a limited amount of damage the attacker can do before having to work on the next break-in attempt, against the next layer. And by that point, intrusion detection or some odd application behavior has alerted you and/or your host (not guaranteed, but likely).

A fast way to achieve this setup is to use Amazon’s SQS with its rich permission schemes and the new multiple credential support. Each host gets entirely different credentials and Amazon’s operations and security teams sit between them.
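As a rough sketch of what the webapp’s side of those permissions could look like, the snippet below builds an IAM-style policy document for the webapp’s credentials. The function name, the Sid labels, and the ARNs in the usage note are assumptions; the action names are the standard SQS ones. One wrinkle: receiving a message from SQS does not remove it, since deletion is a separate action, so whether the webapp should also be allowed to delete from FORWEB is a judgment call that this post does not settle. The sketch leaves it as a flag.

```python
import json


def webapp_policy(forapp_arn: str, forweb_arn: str,
                  allow_forweb_delete: bool = False) -> dict:
    """Build a policy that lets the webapp send to FORAPP and read FORWEB only."""
    forweb_actions = ["sqs:ReceiveMessage"]
    if allow_forweb_delete:
        # Without this, received messages reappear after the visibility timeout.
        forweb_actions.append("sqs:DeleteMessage")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "WebappWritesForapp",
                "Effect": "Allow",
                "Action": ["sqs:SendMessage"],
                "Resource": forapp_arn,
            },
            {
                "Sid": "WebappReadsForweb",
                "Effect": "Allow",
                "Action": forweb_actions,
                "Resource": forweb_arn,
            },
        ],
    }
```

Calling it with placeholder ARNs such as "arn:aws:sqs:us-east-1:123456789012:FORAPP" and "arn:aws:sqs:us-east-1:123456789012:FORWEB", then passing the result through json.dumps, gives a document you can review by eye. The backend would get a separate, broader policy under its own credentials.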

If you keep this barrier in mind during construction, someone who breaks into your web application cannot:

Send email or SMS with contents different from what the templates allow
Send email or SMS to arbitrary addresses
Take all of your users’ email addresses
Take most of your source code and database (depends on the app)
Browse and/or delete your full file archives
Destroy your backups (if you don’t have write-only backups in place)
Kick off arbitrary jobs to your real task farm
Get access to business information and reports in your administrator section
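The first few items in that list fall out of how the backend handles a request. Here is a minimal sketch of a backend-side email handler, assuming the webapp can only name a template and a user id, while the template text and the address book live on the backend. The template names, the user directory and the function name are all hypothetical.

```python
from string import Template

# Hypothetical backend-only data: the webapp holds neither of these.
EMAIL_TEMPLATES = {
    "welcome": Template("Hi $name, welcome aboard."),
    "password_reset": Template("Hi $name, a password reset was requested."),
}
USER_DIRECTORY = {"u1": {"name": "Ann", "email": "ann@example.com"}}


def build_email(message: dict):
    """Return (recipient, body) for a send_email request, or None to reject it."""
    template_name = message.get("template")
    user_id = message.get("user_id")
    if not isinstance(template_name, str) or not isinstance(user_id, str):
        return None
    template = EMAIL_TEMPLATES.get(template_name)
    user = USER_DIRECTORY.get(user_id)
    if template is None or user is None:
        return None
    # Recipient and text both come from backend data, never from the message.
    return user["email"], template.substitute(name=user["name"])
```

An attacker who controls the webapp can still trigger a welcome email to a real user, but cannot choose the recipient address or the wording, and any extra fields in the message are simply ignored.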

This way of organizing things has some side effects, some of which may be more valuable than this safety net.

Advantages:

Because the two roles are very decoupled, the webapp can run in a very different pricing situation. For example, I have a site in progress where AppEngine fits the webapp plan quite well but the bulk of the backend-type functionality would never work there (SoftLayer is better, with its free incoming bandwidth).
Because the two roles are decoupled, the webapp is already working with the backend asynchronously. Depending on your application, the backend can disappear for maintenance for a few minutes (or fail and be replaced on the fly via cloud APIs) and it would not affect webapp users.
For many sites, the webapp’s database and code can be kept a thin, secondary player. This means there could be a chance to do things like have a “populate me” message that a brand new instance of the webapp can send to reach some initial state before serving requests. This could even form the basis of a disaster-recovery or upgrade strategy.
There is a well-defined porting effort to switch hosting providers and get back on your feet if something awful happens (for example, Google decides to freeze your AppEngine account and you can’t get in touch with a human to clear up the issue. Would not be the first of this type of story…).
There is a natural barrier for outsourcing work: the mostly-trusted web developers you work with don’t need “keys to the kingdom” either. This issue can typically be addressed with mock API implementations, but not if you want to have someone working with you on the production machines. In this setup, they can.
Because both the webapp and application sides are message/task oriented, adding a “this is a task for myself” queue is virtually free. In my own work in this model, there is a ‘_FORAPP’ queue that the webapp can’t read or write to. The backend queues tasks for itself and all actions are automatically authorized and no message content is suspicious.

How about disadvantages?

Extra work involved, of course. If you don’t have a reusable setup for this already, starting a new application this way will involve overhead. If you’re not used to organizing code as mostly asynchronous tasks, it would mean even more.
The messages on the FORAPP queue (consumed by the “real” application) should be treated as semi-hostile input. If you have a greenfield project, this is not a big problem, since every message handler can be written with that assumption from the start.
The cost of at least one extra application node/VM (takes a lot of load to become a serious issue but it is a non-zero cost). But as discussed in the advantages section, these extra nodes could put you in a better financial situation overall.
The cost of SQS/SNS (takes a lot of load to become a serious issue but it is a non-zero cost). If you don’t use SQS or something similarly battle-tested and hosted by a 24/7 operations team, I think you’re skimping in the wrong place.
The latency of SQS/SNS. If you need ~millisecond response times, you are likely tying the main application directly to the user experience in the webapp. If that is necessary, then you may have a scheme that won’t work in this setup. Hosting your own RabbitMQ or whatnot in order to get ~ms messaging is a possibility, but two of the main things you would get from SQS are the security barrier and high availability… both of which are expensive propositions for small teams or single developers.
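On the “semi-hostile input” point, here is one way the backend might gate FORAPP messages before any handler sees them. It is a sketch under assumptions: JSON message bodies, a small whitelist of message types, and size limits picked arbitrarily. None of the names come from a real system.

```python
import json

MAX_BODY_CHARS = 4096   # assumption: arbitrary cap for this sketch
MAX_FIELD_CHARS = 128   # assumption: arbitrary cap for this sketch
ENVELOPE_FIELDS = {"type", "id"}
# Hypothetical message types and the string fields each one must carry.
ALLOWED_TYPES = {
    "request_report": {"user_id", "report_kind"},
    "send_email": {"user_id", "template"},
}


def parse_forapp_message(raw: str):
    """Return a validated dict, or None if the message should be dropped."""
    if len(raw) > MAX_BODY_CHARS:
        return None
    try:
        message = json.loads(raw)
    except ValueError:
        return None
    if not isinstance(message, dict):
        return None
    msg_type = message.get("type")
    if not isinstance(msg_type, str) or msg_type not in ALLOWED_TYPES:
        return None
    if "id" in message and not isinstance(message["id"], str):
        return None
    required = ALLOWED_TYPES[msg_type]
    # Exact match: unknown fields are rejected as well as missing ones.
    if set(message) - ENVELOPE_FIELDS != required:
        return None
    for key in required:
        value = message[key]
        if not isinstance(value, str) or len(value) > MAX_FIELD_CHARS:
            return None
    return message
```

Dropping anything that does not match exactly is deliberately strict. A compromised webapp can still send well-formed requests, so each handler behind this gate still has to decide what a valid request is allowed to do.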

It’s neither a panacea nor zero cost. But I think it has many attractive properties. Do you see any other advantages or disadvantages?