Testing Drush concurrency

I had a question the other day on Twitter about whether it's possible to run Drush site aliases in parallel. Apparently I was asking the question the wrong way, as @greg_1_anderson (Drush co-maintainer) told me that Drush supports --concurrency and has for several years. In fact, it's supported it for so long and it's so under-used that they might remove it from Drush as a result :).

So here’s my post doing some testing of running drush with concurrency to illustrate how and why you might want to use it. Criteria for using this:

  • You have multiple site aliases you want to run the same command against
  • This could be a multi-site install or running against multiple remotes

Multisite is probably the best use case for the performance gain, because for multiple remotes you might just say "use Ansible" or some other provisioner to hit things in parallel. I live-blogged this in an issue in the ELMSLN issue queue while playing with it, but I'm consolidating the findings here.
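To make that concrete, here's a hedged sketch of what a parallel run can look like; cc all (cache clear) is just a stand-in command, and @mygroup is a made-up group alias you'd define yourself in an aliases.drushrc.php file:

# multisite: the built-in @sites alias targets every site in the install
drush @sites cc all -y --concurrency=2

# multiple remotes: same idea, but with a group alias of your own
drush @mygroup cc all -y --concurrency=2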

The server specs these tests were run on: 2 CPUs and 6 GB of RAM on a RHEL 6 based server. To test, I ran drush @courses-all rr -y, which in our setup runs a registry rebuild against 58 sites in a multi-site configuration (rr is provided by the Registry Rebuild drush plugin). While Drush defaults to concurrency=1, I set it explicitly just to make sure.

time drush @courses-all rr --concurrency=1 --strict=0 -v -y

Here's what that command produced:

real 13m42.743s

Registry Rebuild is an aggressive command, so we kind of expect to see something like this. We're at almost 14 minutes here. This is what kicked off my research into how to optimize the command.

I started with a concurrency value of 10, which proved to be WAY too high. It maxed out my CPU, but I got MUCH better results.
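For reference, that run was just the baseline command with the flag turned up (my reconstruction, using the same flags as above):

time drush @courses-all rr --concurrency=10 --strict=0 -v -y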

real 5m11.379s

It brought things down to around 5 minutes, and 5 minutes of maxed-out CPU versus 14 minutes of much lighter load is a good trade. From there, I started backing off from 10 to see if I could find the threshold. The visual below shows runs at concurrency 10, 1, 6, and 7. As you can see, 10, 6, and 7 look very similar: we slam the CPU and max it out. The times for 6 and 7 were around 4:51. A huge gain, but we're still rocking 100% CPU, which isn't a good thing.

visualizing CPU usage
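If you want to run the same kind of sweep on your own setup, a throwaway loop like this does the job (it assumes the same alias and flags as above):

# time the same registry rebuild at a range of concurrency levels
for c in 1 2 3 5 6 7 10; do
  echo "== concurrency $c =="
  time drush @courses-all rr --strict=0 -v -y --concurrency=$c
done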

So we kept moving down. This graph shows 5 vs 3 vs 2 for the CPU bumps.

5 vs 3 vs 2

Three and two are very promising because while we hit 100% CPU, it's only for a very short period of time, and at a concurrency of two we don't hit 100%, we just get close (90ish). The most impressive thing about running only 2 or 3 concurrent calls is that we finish the job substantially faster than 1 at a time (and not much different from 10).
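If you want to watch CPU the same way during your own runs, sar from the sysstat package (usually available on RHEL) is a simple option; a minimal sketch:

# sample CPU every 5 seconds (200 samples) in the background, then kick off the run
sar -u 5 200 > cpu-usage.log &
time drush @courses-all rr --strict=0 -v -y --concurrency=3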

Concurrency 3 (C3)

real 4m56.193s

Concurrency 2 (C2)

real 5m36.804s

Conclusions

While C3 is 40 seconds faster than C2, it might be worth just using two threads at a time because the extra speed isn't enough to justify the added CPU load. C2 still cuts almost 60% off the baseline time. As a result, I adjusted our upgrade script to take advantage of the concurrency value so that we now run two threads at a time. I still need to run this in non-Vagrant environments, but applying concurrency to more commands than just registry rebuild should provide an impressive gain.
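As a rough sketch of what that change amounts to (drush updb here is just a stand-in for whatever the script actually runs against the group):

# in the upgrade script: fan the command out across all course sites, two at a time
drush @courses-all updb -y --strict=0 --concurrency=2

Setting $options['concurrency'] = 2; in a drushrc.php should have the same effect as a default, but the flag on the command itself is the more explicit route.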