I recently resigned to launch an independent software business 😳. Instead of seeking investors, I'm attempting to bootstrap products for the first few years. Going out on my own has been gnawing at me for over 10 years, but it was still a big decision to leave such a good job (more on this here).
Previously, I was at AWS for four years working on PaaS platforms, HPC, and high-speed networking. For a large portion of that time, I led the organization responsible for Elastic Beanstalk's services, APIs, and core features. It is a big service, deployed to 20+ independent regions, with a massive customer base. I learned so much about large-scale distributed systems, operational excellence, cloud orchestration, management, and business. As I get some distance, I want to write about many of those topics here.
Before AWS, I was at Enstratius, a multi-cloud orchestration startup acquired by Dell Software in 2013. By the end, I was Director of Engineering for the whole product, leading a team of about 50 people. That phase was a good challenge and education, but my favorite part of this job was when we were still a startup (and the first years after the acquisition, when that energy stayed strong despite our being part of a mega-company). In my long-term business plans, I would love to recreate some aspects of what we had going there.
Before Enstratius, I spent seven years as lead developer of Nimbus, a pioneering effort in cloud computing (it predates EC2). This was a foundational, exciting time. It was endlessly interesting. I made life-long friends. My manager was one of my few significant mentors and influenced the rest of my career. The U.S. national lab environment is pretty unique: there's a blend of academia and practical systems implementation you are unlikely to find anywhere else. While we published original research and attended conferences, we also created new, impactful things that were used in production. We didn't have a lot of funding compared to the commercial world, but we still pulled it all off somehow.
I'm posting here about creating a software business from scratch and what I'm learning. My products are built predominantly in Rust on top of cloud providers, so I'll write about those topics. Technology-wise, I'm most interested in distributed systems, deployment safety, high availability, security, testing, debugging tools, operations tools, and databases. I'll also get into the problems that my products are tackling: web and cloud computing issues that small software businesses and solo developers are facing.
Follow me on Twitter or through my mailing list, which gets exclusive posts.
Managed multiple teams in the AWS HPC organization for the EFA and ParallelCluster product lines. EFA (Elastic Fabric Adapter) is a high-speed, low-latency, kernel-bypass interconnect that allows AWS customers to scale applications with demanding communication needs to extreme heights in a cost-effective way. AWS ParallelCluster is an open source cluster management system that helps you deploy and manage HPC clusters on AWS; you can deploy complicated stacks easily and repeatedly, from prototyping to production with auto-scaling.
Owned AWS Elastic Beanstalk's core features, APIs, subsystems, and other internal services. I led multiple teams of developers and managers to deliver the fastest and simplest way to deploy web applications on AWS. I was involved in all aspects of the service, including technical direction, planning, product management, customer support, public speaking, operations, hiring, people management, and working across AWS teams. We delivered Elastic Beanstalk's core product, highly available APIs, orchestration engine, service integrations, and application-centric health monitoring across more than 20 AWS regions. For some periods, I also ran the console and platform teams.
Led the backend design and development of the Dell Cloud Manager (formerly Enstratius) product line. I owned the overall product architecture, weekly releases, and most ancillary systems. Managed a team of 20 remote distributed systems developers and made sure they were engaged, growing, and effective. Oversaw multiple teams (each with their own lead/manager) as well as individual contributors directly. Worked closely with product management and business leaders.
Software design and development, backend systems. Started as backend design and development lead in November. Enstratius is a unified solution for public/private cloud automation, governance, security, and cross-cloud management. This was pre-acquisition, when Enstratius was still a relatively small startup.
Cloud based backend systems in Scala.
Continued expansion of Nimbus IaaS, Nimbus Platform, and related cloud computing research. In 2008, we introduced novel configuration management infrastructure to deploy "one click clusters" on top of IaaS clouds. In 2009, the Nimbus development team expanded and created PaaS-related technologies for High Availability and Elastic Computing on top of IaaS cloud offerings. Led the day-to-day development efforts of a four-person team. Contributed to large distributed systems architectures and designed/delivered many smaller ones. Senior Developer for the Ocean Observatory Initiative's Common Execution Infrastructure subsystem. Administered clusters and worked heavily with more formal operations and integration teams. Mentored for Google Summer of Code and worked with open-source developers and users around the world. There were some part-time and zero-time months in the first half of 2009.
Developed and co-designed initial Nimbus IaaS implementation, an open source system for leasing virtual machines like Amazon's EC2 service (it predates EC2). Besides development, this involved working with very forward-looking HPC/science users and resulted in many CS research publications on cloud and grid computing. Developed infrastructure to incorporate distributed security technologies (Shibboleth, SAML) into the Globus Toolkit.
Research and prototype work on early cloud computing technologies based on the Globus distributed computing framework and the Xen hypervisor.
- MS in Computer Science. University of Chicago, Chicago, IL.
- BS in Philosophy. St. John's College, Annapolis, MD.
- Managing Appliance Launches in Infrastructure Clouds, Bresnahan, J., Freeman, T., LaBissoniere, D., Keahey, K. TeraGrid 2011, Salt Lake City, UT. July 2011.
- Cumulus: An Open Source Storage Cloud for Science, Bresnahan, J., LaBissoniere, D., Freeman, T., Keahey, K. ScienceCloud 2011, San Jose, CA. June 2011.
- Improving Utilization of Infrastructure Clouds, Marshall, P., Keahey, K., Freeman, T. IEEE/ACM CCGrid 2011, Newport Beach, CA. May 2011.
- Elastic Site: Using Clouds to Elastically Extend Site Resources, Marshall, P., Keahey, K., Freeman, T. IEEE/ACM CCGrid 2010, Melbourne, Australia. May 2010.
- Dynamic virtual AliEn Grid sites on Nimbus with CernVM, Harutyunyan, A., Buncic, P., Freeman, T., Keahey, K. Journal of Physics: Conference Series 219, 2010.
- Contextualization: Providing One-Click Virtual Clusters, Keahey, K., Freeman, T. eScience 2008, Indianapolis, IN. December 2008.
- On the Use of Cloud Computing for Scientific Workflows, Hoffa, C., Mehta, G., Freeman, T., Deelman, E., Keahey, K., Berriman, B., Good, J. SWBES 2008, Indianapolis, IN. December 2008.
- Science Clouds: Early Experiences in Cloud Computing for Scientific Applications, Keahey, K., Freeman, T. Cloud Computing and Its Applications 2008 (CCA-08), Chicago, IL. October 2008.
- Flying Low: Simple Leases with Workspace Pilot, Freeman, T., Keahey, K. Euro-Par 2008, Las Palmas de Gran Canaria, Spain. August 2008.
- Enabling distributed petascale science, Baranovski, A., et al. Journal of Physics: Conference Series 78, 2007.
- Virtual Workspaces for Scientific Applications, Keahey, K., Freeman, T., Lauret, J., Olson, D. SciDAC 2007 Conference, Boston, MA. June 2007.
- Enabling Cost-Effective Resource Leases with Virtual Machines, Sotomayor, B., Keahey, K., Foster, I., Freeman, T. HPDC 2007, Hot Topics session.
- A Scalable Approach To Deploying And Managing Appliances, Bradshaw, R., Desai, N., Freeman, T., Keahey, K. TeraGrid 2007, Madison, WI. June 2007.
- Division of Labor: Tools for Growth and Scalability of Grids, Freeman, T., Keahey, K., Foster, I., Rana, A., Sotomayor, B., Wuerthwein, F. ICSOC 2006, Chicago, IL. December 2006.
- A Multipolicy Authorization Framework for Grid Security, Lang, B., Foster, I., Siebenlist, F., Ananthakrishnan, R., Freeman, T. Accepted by the IEEE NCA06 Workshop on Adaptive Grid Computing (to appear in Proc. Fifth IEEE Symposium on Network Computing and Applications), Cambridge, USA. July 24-26, 2006.
- Virtual Clusters for Grid Communities, Foster, I., Freeman, T., Keahey, K., Scheftner, D., Sotomayor, B., Zhang, X. CCGrid 2006, Singapore. May 2006.
- Identity Federation and Attribute-based Authorization through the Globus Toolkit, Shibboleth, GridShib, and MyProxy, Barton, T., Basney, J., Freeman, T., Scavo, T., Siebenlist, F., Welch, V., Ananthakrishnan, R., Baker, B., Goode, M., Keahey, K. 5th Annual PKI R&D Workshop. April 2006.
- An Edge Services Framework (ESF) for EGEE, LCG, and OSG, Rana, A., Wuerthwein, F., Keahey, K., Freeman, T., et al. Computing in High Energy and Nuclear Physics 2006 (CHEP), T.I.F.R., Mumbai, India. February 2006.
- Virtual Workspaces: Achieving Quality of Service and Quality of Life in the Grid, Keahey, K., Foster, I., Freeman, T., Zhang, X. Scientific Programming Journal, vol. 13, no. 4, 2005, Special Issue: Dynamic Grids and Worldwide Computing, pp. 265-276.
- Virtual Workspaces in the Grid, Keahey, K., Foster, I., Freeman, T., Zhang, X., Galron, D. Euro-Par 2005, Lisbon, Portugal. September 2005.
- Authorization and Account Management in the Open Science Grid, Lorch, M., Kafura, D., Fisk, I., Keahey, K., Carcassi, G., Freeman, T., Peremutov, T., Rana, A.S. 6th IEEE/ACM International Workshop on Grid Computing, 2005.