On growth, accountability, and capacity planning
There’s lots going on here at Unbounce as we grow. This month we welcome seven new Unbouncers, bringing our total head count to 135, and I’m sure Sascha and Charm are sweating just a little bit as they look at our head count projections for this year, and try to handicap when to bring on additional space.
The other thing that happened this week is that we narrowly avoided running out of disk space on our database servers. Thanks to our systems monitoring we caught it in time and performed a very fast upgrade (made much easier by all the systems automation we have in place). Still, we strive for a higher level of functioning; we caught this later than we should have, and catching it earlier would have spared us some unnecessary angst. (I think I’m going to adopt that as a guiding principle: avoid unnecessary angst.)
Speaking of avoiding unnecessary angst, as our head count approaches Dunbar’s number we’ve started working on formalizing our various accountabilities and responsibilities. We recently completed a RACI exercise with our senior leadership team, and I landed the highly coveted accountability for “product availability and performance”.
Now, being accountable for something doesn’t mean you actually do all the work; it just means you ensure the work gets done. Our recent database storage expansion reminded me that promoting effective capacity planning is one of the keys to achieving high availability.
Every organization does capacity planning a little differently, and as with all things, it’s important not to over-engineer any given process or practice, and really focus on clearly identifying the problem we’re trying to solve.
We never want a foreseeable lack of systems capacity to harm our business.
In the physical world you see capacity planning failures as long line-ups at stores, and folks walking away (or, say, running out of office space). In the digital world, capacity planning failures can result in either temporary loss of service (downtime because the service cannot function at all), or serious service degradation (slow response times).
Some companies have a central team that acts as enforcers of systems concerns. At Unbounce we’re focused on building autonomous product squads: cross-functional teams that can own, understand, and innovate within a particular value stream. That ownership extends to running and operating any system a squad builds, so product squads will end up being responsible for capacity planning. At a glance, that means we want product development squads to:
- Understand the workloads of the systems they build, and the growth patterns of those workloads
- Understand the constraints of these systems
- Identify cases where constraints will be exceeded, and take proactive steps necessary to provide service continuity
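As a rough illustration of that last point, here is a minimal sketch of how a squad might estimate when a constraint such as database disk space will be exceeded. It assumes usage grows roughly linearly and fits a straight line to recent samples; the function names, the sample format, and the 90-day horizon are all hypothetical choices for this example, not a description of the tooling we actually run.

```python
from typing import Optional, Sequence, Tuple


def days_until_exhaustion(
    samples: Sequence[Tuple[float, float]], capacity: float
) -> Optional[float]:
    """Estimate days from the latest sample until usage reaches capacity.

    samples: (day, usage) pairs, with usage in the same unit as capacity.
    Assumes linear growth (an assumption of this sketch; real workloads
    may grow in steps or exponentially).
    Returns None when usage is flat or shrinking.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")

    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover more than one point in time")

    # Ordinary least-squares slope and intercept.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x

    day_full = (capacity - intercept) / slope
    last_day = max(x for x, _ in samples)
    return max(0.0, day_full - last_day)


def needs_action(
    samples: Sequence[Tuple[float, float]],
    capacity: float,
    horizon_days: float = 90.0,  # hypothetical planning horizon
) -> bool:
    """True if the fitted trend reaches capacity within the planning horizon."""
    remaining = days_until_exhaustion(samples, capacity)
    return remaining is not None and remaining <= horizon_days


if __name__ == "__main__":
    # Hypothetical daily disk usage in GB on a 200 GB volume.
    usage = [(0, 100.0), (1, 110.0), (2, 120.0), (3, 130.0)]
    print(days_until_exhaustion(usage, capacity=200.0))
    print(needs_action(usage, capacity=200.0))
```

A straight-line fit is deliberately simple. The point is to get a date on the calendar early enough to act, and a squad that knows its workload better can swap in a growth model that fits.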
How could we approach this?
- Make sure each system (we have over a dozen key systems, and that’s growing all the time) has an explicit owner and explicitly stated availability requirements (how many nines do we require?)
- Reduce or eliminate cases where there are shared systems, so that a single team can reasonably own the responsibility for their delivery efforts
- Ask each product squad to prepare and maintain capacity plans for the systems they own
- Ensure product management understands the need for this work, and supports product squads in making space for it
- Incorporate regular capacity plan reviews or check-ins to see whether we’ve identified any new constraints or are in danger of exceeding any known constraints during the upcoming planning horizon, and to identify remedial steps and an action plan
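To make the "how many nines" question from the list above concrete, here is a small sketch that converts an availability target into a downtime budget. The function name and the 30-day period are assumptions made for this example.

```python
def allowed_downtime_minutes(nines: int, period_days: float = 30.0) -> float:
    """Downtime budget, in minutes, for an availability target of N nines.

    nines=2 means 99%, nines=3 means 99.9%, and so on.
    period_days is the window the budget applies to (30 days is an
    assumption of this sketch).
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10.0 ** (-nines)  # e.g. 0.001 for three nines
    period_minutes = period_days * 24 * 60
    return period_minutes * unavailability


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(f"{n} nines: {allowed_downtime_minutes(n):.1f} minutes per 30 days")
```

Three nines over 30 days works out to roughly 43 minutes of downtime, which is a useful number to hold up against how long an emergency storage upgrade actually takes.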
Our Infrastructure Operations Team are educators, not enforcers
I expect our teams will have lots of questions as we roll this out. How much detail do they need? How far out should they plan? How do they identify and monitor system constraints? This is where we ask our Infrastructure Operations Team to step in and assist. I love how Mike Thorpe, our Infrastructure Operations Manager, puts it when he says we practice “no toes ops”. The Infrastructure Operations Team isn’t there to step on toes; it’s there to support product delivery squads in their efforts with tools and techniques.
Does your own organization do capacity planning? I’d love to hear about your experiences (good or bad) in the comments!