Get Inside Unbounce


True Zero Downtime HAProxy Reloads: An Alternative Approach

Recently, this blog post from Yelp circulated around the tech Twittersphere. It’s an excellent write-up about how to get truly zero downtime from HAProxy, which I recommend reading. Their approach will work for a lot of scenarios, but unfortunately, it doesn’t work for the one we experience at Unbounce.

The Problem

Early this year, we rolled out HAProxy for SSL offload on our page servers, and encountered the same problem Yelp did.  Unbounce hosts landing pages for thousands of customer domains, so we need thousands of SSL certificates. HAProxy 1.5 supports multiple domains on a single IP using SNI (Server Name Indication).  However, HAProxy takes a significant amount of time and CPU to load certificates.

The first solution we investigated was the same one Yelp used: dropping SYN packets during the reload.  During our testing we found that loading certificates can take long enough that this approach isn’t feasible.  Certificate loading uses only a single CPU core, at 100%, and scales linearly: loading 1,000 certs takes about 2 seconds regardless of how many cores the machine has, so loading 10,000 certificates takes about 20 seconds.  With the potential for 20+ second reloads, dropping SYN packets for that long is clearly not an option.

Having a 20-second window to deal with, rather than a 50-millisecond one, pushed us in a rather different direction than Yelp went (though they did consider it).  We call what we developed the “IP tableflip”.

The Solution: IP Tableflip

Halloween at Unbounce

It actually started as a joke.  Let’s run two HAProxy instances, and somehow switch between them.  Ha ha, funny, right?  But… maybe it would work? (Spoiler: it does.)

We already had an ELB (Elastic Load Balancer) in front of a number of page server instances, and now we’re running HAProxy (two instances of it) on those page server instances as well, purely for SSL offload.  We use Linux’s iptables to switch which HAProxy instance (blue or green) the ELB talks to, by changing which instance the “outside” ports are redirected to.

Our process for updating certificates looks something like this:

  1. Every 5 minutes, sync our local cert directory from the remote storage.
  2. If there weren’t any changes, job’s done.
  3. Reload the inactive HAProxy instance with “service haproxy-[colour] reload”.
  4. Insert an iptables rule to direct traffic to the new HAProxy colour.
  5. Delete the iptables rule directing traffic to the old HAProxy colour.

Our automated load tests allowed us to easily graph the error-rate with a single HAProxy, compared to the same load test scenario with IP tableflip, reloading every 5 minutes:

Graph: total number of errors in the preceding minute

I’ve posted a snippet of our script for restarting HAProxy with zero downtime on gist:
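A minimal sketch of the flip step looks something like this. The ports are hypothetical (blue HAProxy on 8080, green on 8081, public traffic on 443), and the function prints the iptables commands rather than executing them; drop the echoes to run it for real (which requires root):

```shell
#!/bin/sh
# Hypothetical sketch of the IP tableflip step (not the actual gist).
# Assumed ports: blue HAProxy on 8080, green on 8081, external traffic on 443.

flip_to() {
  new_port="$1"  # port of the freshly reloaded instance
  old_port="$2"  # port currently receiving traffic

  # Insert the redirect for the new colour first, then delete the old one;
  # conntrack keeps already-established connections on the old instance.
  echo "iptables -t nat -I PREROUTING -p tcp --dport 443 -j REDIRECT --to-ports $new_port"
  echo "iptables -t nat -D PREROUTING -p tcp --dport 443 -j REDIRECT --to-ports $old_port"
}

# Flip traffic from blue (8080) to green (8081):
flip_to 8081 8080
```

Because the new rule is inserted before the old one is deleted, there is never a moment when port 443 has no redirect in place.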

Iptables uses a module called “conntrack” to track connections, so even after the rule directing traffic to the old colour is deleted, any packets that come in on that socket will continue to be routed to the old colour HAProxy.  Our connection keep-alive time is fairly short, so there’s no risk that these connections to the old HAProxy would live long enough to see the next reload, in 5 minutes.

All this does come at a cost, unsurprisingly.  Since implementing this, we’ve seen an approximately 10ms increase in latency in our load testing:

Graph: request latency. 50k certs, 110k requests/min; percentiles for responses in the preceding minute.

(The wave pattern of the graph is due to the HAProxy reload using 100% of one of the two cores.  With 50k certificates, reloads take around 100 seconds.)

We suspect it’s possible to reduce or eliminate the additional 10ms latency with tuning, but with a 95%ile latency of close to 55ms, it hasn’t been a priority.

This has been running in production since January, and we’re pleased with how well it’s working.  Many thanks to Yelp for sharing their findings in their blog post, which encouraged me to (finally) write this.

Derek Lewis
Senior Software Developer

  • Joseph Lynch

    Nice, I’m glad that we’re seeing some interesting alternative techniques surfacing! I think this is a really good lesson in how engineering solutions all have tradeoffs.

    In our case we have lots of fast reloads of HAProxy instances that are listening on hundreds of dynamic ports (one for each service, changing occasionally) and with clients and backends that sometimes have fairly long running connections. In your case you had a few much slower reloads (minutes wow) of HAProxy listening on a small number of ports and without long running connections. Your solution definitely makes a lot more sense for your problem, as our solution would clearly not work.

    I could probably write a whole blog post on the engineering tradeoffs of the possible solutions; it’s really interesting to me. For example, a benefit of a TC-based solution is that there is no performance hit when HAProxy is running normally, and clients can connect to the IP that the process is actually listening on. I’ve had to debug a few “wait clients are connecting to but nothing is listening on … ohh iptables is routing traffic from to” type issues and while not inherently difficult it can be frustrating when everything is on fire at 4am. One thing I don’t like about the TC solution is that we risk blackholing traffic; clients might just sit there hopelessly retransmitting SYNs. I also don’t like that the solution as presented is dependent on being on the loopback interface, which makes it significantly less general.

    Tradeoffs, lots of tradeoffs … Either way, thanks for the writeup and open sourcing your method!

    • Derek Lewis

      “engineering solutions all have tradeoffs”
      So very true, and not often appreciated enough. :)
      The way we did it definitely has a lot of moving parts, and I don’t look forward to the first 4am fire that involves it, but it’s about the simplest solution we could think of that met our needs.

  • David Turner


    An interesting idea, and it should work when you’re talking to any server, not just a HAProxy.

    A couple of questions spring to mind. Firstly, we use an Nginx reverse proxy for TLS offload, and this can reload its config without apparently dropping anything while it’s doing so. However we’re not running at the scale you are yet, so I’m curious why you went for HAProxy instead in case our design will come back to bite us later!

    Secondly, given that you’re running behind an ELB, could you not just switch the instance ports over on the ELB rather than using iptables to set up another layer of indirection? I’m guessing perhaps it’s because it takes longer to fully switch over on an ELB?

    Thanks again for your write-up.

    • Derek Lewis

      Hi David.

      You’re right, this should work for any server that has the same problem of dropping packets at reload, or that you want to switch between at deployment time, for example.

      If I cast my mind way back to when we decided on HAProxy… I think ease of configuration was the biggest difference (that we knew of). With HAProxy, we can simply drop the certificate for a new domain in a directory, and at reload time, it’ll pick it up. With Nginx, it looked like a config block was needed for each domain, which would have complicated our solution, given we’re dealing with tens-of-thousands of certificates, with domains being added and removed constantly. Maybe it is possible to do something similar with Nginx, but we didn’t find a way at the time.

      As for toggling the ports on an ELB, that’s definitely an interesting idea. I believe this would however mean switching them for all the instances behind the ELB. That would likely mean some kind of centralized co-ordination of syncing the new certs to the instances, reloading HAProxy on all the instances, and then toggling the ports. One advantage of our solution is that there’s no centralized co-ordination.

      The good news is that I can say, after over a year, the solution is holding up well. Any “fires” we’ve had with it have been due to problems with the sync, nothing to do with the “ip table flip” mechanism.

  • Ali

    Great article! Thanks for sharing!

    Could you please tell us more about storing and syncing those certs between two (or more) servers?

    I tried to make use of NFS-type storage but I ran into performance issues. NFS does not do a good job with a lot of tiny files.

    • Derek Lewis

      Hi Ali,

      I’m happy to elaborate.

      When we did this project initially, we realized that for the certificate provisioning system to be able to say “Yes, I know that the certificate is deployed to all servers now, I can tell the customer it’s done” we would be dealing with a distributed consensus problem, with servers in multiple AWS regions. We decided on a very simple solution, with some known limitations, that has turned out quite well.

      Our certificates are stored (encrypted) in S3. Every server has an encrypted EBS volume for storing the certs locally, and a cronjob that does an `aws s3 sync` with S3 every 5 minutes. The sync deletes any certs that are no longer in S3 and downloads any certs that are in S3 but not yet local; any certs that have changed (like when they renew) are downloaded as well. The S3 sync command parallelizes well, which helps with handling many small files.
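      The cron side of this can be sketched in one line (the bucket name and local path here are hypothetical):

      ```shell
      # /etc/cron.d/cert-sync -- hypothetical bucket name and paths.
      # Every 5 minutes, mirror the cert store from S3: --delete removes
      # local certs that are gone from S3; new and changed certs are
      # downloaded in parallel.
      */5 * * * * root aws s3 sync s3://example-cert-bucket/certs /etc/haproxy/certs --delete
      ```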

      For the provisioning server to “know” that the certs have been synced to all servers, it simply waits 10 minutes. This gives time for any currently running sync (that hadn’t seen the file yet) to finish, and for one more to run to completion. If a server isn’t running its sync, or the sync fails, the provisioning server could think the certs are all deployed when they’re not, but we have monitoring around that, and the overall process has been quite reliable. Reliable enough that we haven’t seen a need for more complicated solutions to the distributed consensus problem.

      I hope that answers your question.

  • Jamon Terrell

    I’m not sure of the entire ramifications, but I thought it’d be interesting to try to make something like this work using ipset instead of having to alter the actual iptables rules themselves. If you’re not familiar, ipset allows you to have dynamic lists of hosts/ports for use in iptables rules.

    I started with the same solution you describe here, with two running HAProxy instances, in my case listening on port 8000 and 8001. Ideally, I’d have a rule that listened on port 80 and forward it to port 8000 or 8001 depending on which HAProxy I want to be active. Unfortunately they can’t be used as the port forward target, but with some trickery you can achieve the same result. I added two ipsets “haproxy1” and “haproxy2” to contain lists of ports. I then added two rules, one to redirect all traffic to dst ports in “haproxy1” to port 8000, and one to redirect all traffic to dst ports in “haproxy2” to port 8001.

    ipset has a cool feature that lets you literally flip the tables, atomically: by running “ipset swap haproxy1 haproxy2”, I switch which one is active, as the inactive set’s rule will no longer match inbound connections. Similar to your solution, conntrack still allows forwarding of already-open connections.
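    A rough reconstruction of that setup (these commands need root; the set names and ports match the description above, though the exact invocations are my guess at it):

    ```shell
    # Two port sets; only the "active" one contains port 80.
    ipset create haproxy1 bitmap:port range 80-80
    ipset create haproxy2 bitmap:port range 80-80
    ipset add haproxy1 80

    # Redirect traffic whose destination port is in a set to that
    # instance's listen port.
    iptables -t nat -A PREROUTING -p tcp -m set --match-set haproxy1 dst \
      -j REDIRECT --to-ports 8000
    iptables -t nat -A PREROUTING -p tcp -m set --match-set haproxy2 dst \
      -j REDIRECT --to-ports 8001

    # Atomically swap which set (and therefore which HAProxy) is active.
    ipset swap haproxy1 haproxy2
    ```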

    • Jamon Terrell

      if you’re interested:

      $ sudo iptables -t nat -L PREROUTING
      Chain PREROUTING (policy ACCEPT)
      target prot opt source destination
      DOCKER all -- anywhere anywhere ADDRTYPE match dst-type LOCAL
      REDIRECT tcp -- anywhere anywhere match-set haproxy2 dst redir ports 8001
      REDIRECT tcp -- anywhere anywhere match-set haproxy1 dst redir ports 8000

      $ sudo ipset list
      Name: haproxy2
      Type: bitmap:port
      Revision: 2
      Header: range 80-80
      Size in memory: 92
      References: 1

      Name: haproxy1
      Type: bitmap:port
      Revision: 2
      Header: range 80-80
      Size in memory: 92
      References: 1