Ben Scofield

me. still on a blog.

Postmortems: Trust and Confidence

I am a strong believer in the value of a good postmortem after a customer-affecting incident — and after internal incidents, and after unusual projects and efforts, and pretty much all the time. I really want to talk about one of the purposes of a postmortem, however: building trust and confidence.

Trust and confidence

I’ve been thinking about trust and confidence in the context of software development since early November.

Much of my thinking has been around the distinction between the two, and what I’ve come up with is this: trust is given, confidence is earned.

Say you hire a new developer for your team. Assuming you’ve never worked with her before, you don’t have real evidence that she can do the job, but you trust that she can. Over time, as she solves problems and demonstrates her expertise, you acquire confidence that she can do the job.

Similarly, when you start working with a new code base, you trust that the tests validate the code’s behavior. As you make changes to the code and find them caught by the tests, you gain confidence in the suite as a whole — or, conversely, when you find that the tests don’t cover some complicated functionality, you lose some of the trust or confidence that you had for the suite.

Postmortems

I could go on and on with examples of trust and confidence in people, in companies, in tests, in code, and more, but I want to shift back to postmortems now. The way I see it, when you have an incident, you’ve damaged the trust and confidence that some number of people had in your system. When you lose all of that scarce resource, you fail — so a prime goal coming out of an incident is to rebuild as much of the confidence as you can.

(I’m being very careful with my choice of “trust” and “confidence” here. Incidents damage both, but the damage is worse to confidence; the incident itself is evidence against your system working properly. The goal afterwards, then, is to build back as much confidence as you can so that the normal cycle of gaining both can restart.)

So, let’s think about some postmorteming. One thing that should be clear is that the way you rebuild trust depends on both what happened and on who was affected. For a security breach at Mint, for instance, you’d need to rebuild trust in your security measures, so you’d talk about external reviews, fraud detection processes, and the like — but low-level details won’t help less technical customers that much.

Contrast that with a performance issue on an internal service at Heroku; with deeply technical coworkers, you need to produce a deeply technical review of the incident to have any hope of rebuilding confidence.

So what?

I’ve seen a lot of postmortems that miss the mark, and all too often that’s because they’re focused too much on the wrong thing. Some get bogged down in minute-by-minute recreations of the incident; others skimp on remediation. Keeping the goal of rebuilding confidence in mind, however, shows why those two paths are mistaken.

Historical transcripts of incidents are great, but if they get too fine-grained it’s very easy to lose the larger sense of progress towards a resolution. The timeline of a postmortem should include blind alleys and golden paths, because knowing how the solution was achieved is vital to rebuilding confidence in the company or team — without it, you’re saying “just trust us” at the worst possible time.

Similarly, skipping remediation drops the ball on (collective, not individual) accountability. Publicly committing to (and following through on) steps to reduce the chance of recurrence and minimize the impact of future events is the least you can do to help those affected by a problem.

So: if you’re writing a postmortem, please remember this: your system has lost a portion of the confidence that people have in it. Take this opportunity to repair that as best you can.

Two Problems With Antifragility

I’m no fan of Nassim Taleb (of black swan, antifragility, and needing-an-editor fame), but I forced myself to finish his latest book, Antifragile, in spite of all the problems I had with his presentation and writing. If you haven’t heard of it, the gist is: fragile systems suffer harm when stressed, but there are systems that instead benefit when stressed. Taleb calls these systems “antifragile.”

Near as I can tell, Taleb thinks his identification of this concept is hugely revolutionary, and he goes out of his way to name all of his favorite things antifragile and all of his least favorite things fragile. Leaving aside his questionable expertise in many areas, I want to talk about two related problems with the general phenomenon.

First: antifragile systems may improve when stressed, but this is often at the cost of increasing fragility to other types of stress. Take a muscle: subject it to a load (stress it) and it gets stronger, easy peasy. What you don’t see, however, is that it strengthens in a way that makes it more vulnerable to other stresses — it becomes less flexible, for instance. Sure, you can override this by stretching, but that’s a different, opposed antifragile process.

Or take an autoscaling application hosted on EC2. When traffic increases (stress), the system spins up more instances (becoming better able to handle the traffic). The system is stronger with respect to traffic — but it’s weaker with respect to, say, cash flow. Those extra instances cost money, so a financial stress now does more harm precisely because the system strengthened its ability to handle traffic.

Second: the very process of antifragility generates fragility. By definition, antifragile systems gain more benefit than harm when stressed, and fragile systems do the opposite. But: in almost all cases, there’s a limit to the strength a system can display. As antifragile systems improve and get closer to this limit, they gain less and less of a benefit from the same stresses. More severe stresses are required to produce progressively smaller improvements. A direct consequence of this is that the systems are, in fact, becoming more fragile.
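The diminishing-returns dynamic is easy to see in a toy model (a sketch of the argument with invented numbers, not anything from Taleb): if each gain is proportional to the headroom left below the cap, identical stresses buy less and less.

```ruby
# A toy model of an antifragile system with a capped strength. Each
# identical stress adds a benefit proportional to the remaining headroom,
# so the same stress buys progressively smaller improvements as the
# system approaches its cap. All numbers here are invented.
CAP = 100.0

def apply_stress(strength, stress)
  strength + stress * (CAP - strength) / CAP
end

strength = 50.0
gains = []
5.times do
  stronger = apply_stress(strength, 10.0)
  gains << (stronger - strength)
  strength = stronger
end

gains  # each gain is smaller than the one before it
```

The gains shrink from 5.0 toward zero while strength creeps up on the cap; at the cap itself, stress has no upside left, which sets up the thought experiment that follows.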

Thought experiment time! Imagine a system that, via its antifragile properties, has reached the peak of its strength. Further stresses have no beneficial effect (since we’re assuming that there’s a maximum cap on improvement). There’s nowhere to go but down — stresses can’t help, so they can only hurt, so the system has strengthened itself into (Taleb’s definition of) fragility.

All of this is just to point out that the concept of antifragility is a bit more complicated than it might appear at first blush. I do think it’s worth considering as we build systems, but it should not be treated as a pure good without further thought.

My 2013 in Reading

I’ve been using Goodreads reliably this year to track my reading (well, the book and graphic novel portions of my reading). Over the last month, I’ve been exploring that data – mostly for my own personal enjoyment, but also to see if there were any useful, gleanable insights into how I read. We’re not quite out of 2013 yet, but I thought it might be interesting to record what I’ve found so far in one place. Warning: this is navel-gazing of an extreme form.

The reading challenge

Goodreads allows readers to create a reading challenge every year – a public commitment to read a certain number of books. In 2012, I read just over 100 books, so I thought I’d push myself in 2013 and committed to an average of 3 per week, for a total of 156.

I finished the challenge on December 14th; I was on pace for much of the first half of the year, but gradually fell behind until a prolific stretch in November and December.

All in all, I think I overdid it – I’m a fan of committing to reading, but my choice to go big ended up siphoning off some of the joy I get from reading. I pressured myself to power through some books when I’d have preferred to take a bit more time.

When I read

I mentioned falling behind pace at one point. As it turns out, that slowdown started when I left LivingSocial in early April. Weirdly, I stopped reading (or at least finishing) when I wasn’t working – it happened again after Wantful shut down. This chart shows the number of books finished by day, with each month getting its own row (January’s on top).

Completed books by day

Reads-in-progress

I’ve always known that I have a bit of a … problem focusing on just one book at a time. Now, I have data!

Reading swimlanes

(The colors correspond to the chart in “What I read”, below)

March is interesting, as I worked down my in-progress stack to 2 for the first time in as long as I can remember.

This chart is a bit misleading; it only shows books that I finished, and (as it happens) I started one in late March that I’m still chugging through. With that in mind, I maxed out at 10 books at once in early August (just after I started at Wantful).

For reference, I’ve currently got three books in-progress. Two of ‘em are long-timers, so who knows when I’ll finish them.

“I just like liking things”

According to the data, I’ve enjoyed the vast majority of what I read this year. Goodreads ratings are out of 5 – this chart shows the percentage of books (overall and of each type) that got a given rating.

[Charts: rating distribution (out of 5 stars), overall and for comics, tech, non-fiction, and fiction. Ratings ranged from 2 to 5 overall; comics never dipped below 3.]


What I read

This chart shows books completed by type – fiction, non-fiction, comics, and technical/business (top to bottom).

Completed books by genre

I’ve abandoned tech books in the last third of the year, and for whatever reason avoided comics until I left LivingSocial, but otherwise had both fiction and non-fiction in-progress through the whole year.

How I read

93% of my reading was done digitally – on a Kindle or iPad. Given that I have over a hundred physical books on my to-be-read shelves at home (relics from a pre-ebook era), I need to shift that in 2014.

Closing thoughts

I think the reading data on its own is fun (for me, and maybe for you) to look at, but any real value depends on correlating it with other information – things happening outside of the books. As just one example: my slowdown between jobs is interesting, especially since it happened both times. I’ve got some analyses in the works that’ll hopefully let me start looking at those sorts of interactions, but it’ll take a while.

On reading, specifically: I’d love to have more granular data about things like reading speed, re-reads, and whatnot, but I don’t think that’s likely in the near future (ebooks make it possible to gather that data reliably, but mine is locked away inside Amazon and I don’t foresee them releasing it anytime soon).

Beyond all that: I really love to read. I’m not going to commit to a huge goal in 2014, but be sure that I’ll keep track of what I do read.

Problems With Self-organization

I just listened to the November 21st episode of the Freakonomics podcast, on what economists call “spontaneous order.” It’s an interesting phenomenon – essentially, it’s self-organization. Daniel Klein (an economics professor at George Mason) describes it through the metaphor of a skating rink: imagine a hundred people all skating at once. There’s no dictatorial authority telling one person to speed up or a couple to move to the outside – there are minimal imposed rules (just the skating direction, say), and beyond that pure self-interest keeps the whole thing from devolving into chaos.

Klein explains this by appeal to mutuality; I have an interest in not colliding with another skater, but that skater has the same interest in not colliding with me. That shared interest – when expanded to the full set of skaters – explains the order that arises spontaneously. Slower, less experienced skaters move to the outside, etc.

It’s a great phenomenon to study, and if it were universally generalizable it’d be a great argument for libertarianism, or as support to the manager-free cultures some companies have adopted, or probably a great many other situations. I’m not convinced about how far we can take it, though, for two reasons.

Metaphor troubles

First, the skaters have very simple goals – mostly just to stay upright and to have fun. Everyone understands that, and it’s incredibly easy to see how your actions directly contribute to the achievement of those goals. Given the relatively small number of skaters, it’s even easy to see how others’ actions contribute.

If we complicate the goals, dramatically increase the number of people involved, or add steps that turn actions’ direct contributions into massively-indirected influences, this all becomes much less convincing. Nation-states meet all three such criteria – can you even identify your country’s goals? Certainly not in a blog post, and probably not at all without grossly oversimplifying and leaving out important bits. The US has nearly 314 million residents, which is a far cry from a hundred skaters. And beyond all that: how can any (not-the-President) citizen figure out the impact of his or her actions on the goals of the nation? There are just too many steps between here and there, all with innumerable external factors interfering. We’re talking Laplacian-level calculations to self-organize at the national level.

So what about companies? Small-to-mid-size businesses may avoid the population size issue, especially since we’re not talking about a basically-random sample of people. Similarly, companies of this size may be focused enough to make their goals comprehensible, and with (sometimes significant) effort may be able to show how a single employee’s actions affect those goals. In that situation, I think self-organization may be a workable approach – but it’s going to fail spectacularly if those conditions cease.

Save the ref

Second… well, to address the second problem I need to go back to the podcast briefly. Besides the interview with Klein, Stephen Dubner (the host) also talked to athletes. It turns out that almost all games of Ultimate (Frisbee) are self-policed, with no referees. As Dubner talked to devotees of other sports (basketball and soccer, for instance), he asked them how orderly their games might be without referees.

In all of these talks, however, I think he missed a few more interesting opportunities. Take the plate umpire in baseball: he’s (almost always) necessary because he’s watching things that no player is able to watch. The catcher and batter can’t afford to look at the plate to see if the 96mph fastball falls off the corner, so the only option is to install another set of eyes. Spontaneous order can only succeed when the individuals are able to gather all necessary information.

The analogue in a company would be a project manager – someone who looks at the whole picture. This doesn’t mean that PMs have to end up as dictators, however. As Rands discusses, good PMs see the things that the boots on the ground don’t … and then communicate those insights back to the rest of the team.

You can still be manager-free like this. I’d argue that this no longer quite counts as self-organization, however, as the need for a particular role is mandated by the nature of the effort.

All in all, I think spontaneous order / self-organization has a lot of promise (as the success of famously self-organized companies like Gore, Valve, and GitHub imply), but I think that it’s far from a universally-applicable strategy. It pays to understand its limits, just like any approach.

On Valuing People

Ernie and I are going through many of the same experiences right now, so his post on how interviews are broken resonated with me. In particular, I wanted to expand on his “I am special” point.

I don’t know that I’m particularly special; that carries a slight note of “better-than” that I’m uncomfortable with (though I don’t see that in the context of Ernie’s post). I am, however, convinced that everyone is different. We all bring different strengths, weaknesses, preferences, biases, intentions, and experiences to whatever we do. Any process that ignores that fundamental truth – whether it’s a hiring process, a date, a debate in a comment thread, or whatever – is broken.

Caution: philosophy ahead

Immanuel Kant’s moral philosophy system was built on the categorical imperative; basically, it’s a framework for evaluating actions. The imperative has a few different formulations, but the one I’m interested in here is the second: “Act in such a way as to treat humanity, whether in your own person or in that of anyone else, always as an end and never merely as a means.” I’m not a Kantian, but I’ve always found this formulation compelling. It is wrong to treat other people simply as a means to some end or goal.

It’s not always easy to remember to treat people as people, instead of as extras (or worse, props) in a story in which you’re the protagonist. It’s even harder when you’re dealing with a lot of people all at once, like when you’re hiring someone. Here’s the thing, though: if you can keep in mind the individuality of the people with whom you’re dealing, you’re going to be much more successful in the long run.

An example

Some time ago, I interviewed with a large company. I got dropped into a fairly standard interview process, which means they wanted me on-site for an all-day interview. They set me up with a travel agent who booked the trip, and I was off – to a pretty terrible experience.

While in the air for the first leg of my flight, the second leg (which was direct to my final destination) was cancelled. Upon turning my phone on after landing and finding out, I freaked out, found the rebooking center, and … got myself booked onto another two flights that would have me in the air until midnight. A few more delays later, and I ended up checking into my hotel at 1:30am, a little less than 12 hours before the interview.

After a few hours of sleep and catching up with an old friend who happened to live in the city, I walked over to BigCo and proceeded to talk to a series of people over the course of 4 hours. I designed systems on the whiteboard, instrumented existing processes to ensure performance, talked about how I’d tackle various sorts of problems – and was asked very little about what I wanted to do. It seemed very clear to me that I was one of a large number of (from BigCo’s perspective) interchangeable candidates that might or might not fit into the role they were trying to fill. In other words, they were treating me as a means to solving their problem, not as a person with ends of my own.

The trip home was actually worse than the trip out; the second leg of my flight was delayed repeatedly and eventually cancelled, but by that time I had given up hope of salvaging the trip, rented a car, and just drove home (luckily, the first leg left me in Charlotte, so it was only a two hour drive). And… I never submitted any of my expenses for reimbursement. I just didn’t want to deal with BigCo at all anymore.

So, what could BigCo have done differently? The interview should have been a conversation; an exploration of their goals and resources, my goals and skills, and how those might fit together to be mutually beneficial. I won’t lie: that is hard, especially at a really big company where no one person knows all the possibilities. It gets even harder with less experienced candidates, since their goals are often more vague. Given that we’re talking about a lot of money (in compensation, and even more in impact), however, it doesn’t seem to make sense to skimp on this process.

And the travel. Honestly, that really got to me. If you’re going to book a candidate’s travel, then you’re taking responsibility for it. I don’t expect you to build a weather machine, but it’s not that expensive to have your travel agent proactively monitor your candidates’ travel and fix problems before the candidate knows they happened. Travel is your first impression, and in many cases takes longer than the interview itself. (I was in the BigCo office for a total of 5 hours; I had layovers longer than that on both sides of the visit.) In addition to knowing the candidate’s resume, your interviewers should know how the trip went and express sympathy for whatever obstacles arose.

Very little of this is easy, at the company level or the personal level. It certainly entails some additional effort and expense. Compared to the cost of losing great candidates, however, it seems like a worthwhile expense even without considering that it’s the ethical way to behave.

Wrapping up

One final word: I’ve often heard interviewers (and myself) use a candidate’s lack of excitement about the company or job as a mark against them – “she just wants a paycheck.” Hopefully, this (overlong) post has convinced you that turnabout is fair play, and that “they just want a body” is equally bad.

On Contact Management

I have a problem: every time I open up an application that has a contact list, I find something confusing or frustrating. With Skype, it’s people I talked to once, years ago, and no longer remember who they are. With Messages, it’s duplicate accounts that I know map to the same person. With Google Contacts, it’s outdated email addresses that I never got around to changing. With Contacts.app, it’s crazy merge artifacts. With … I could go on.

This could be better, and I’m going to tell you how.

Imagine there was an application; call it Spheres. In this application, you keep track of your own contact information – email addresses, phone numbers, links to Twitter, Facebook, LinkedIn, GitHub, and whatever else you like. (Yes, I know sites like this exist now. Bear with me.)

Once your information is in, you can package bits of it together into identities. You might have a public identity that includes your name, blog URL, and Twitter handle. A coder identity might build on that by adding your PGP key and your GitHub username. A work identity might start with name, work email address, work phone number, and your LinkedIn profile page. Some of these identities could be predefined, but you’d always be able to modify or remove existing identities and add new ones.

Now, you meet someone. Through some mechanism, you communicate your Spheres username to them – maybe you bump phones or whatever. Here’s where it gets interesting: your new acquaintance is now added to a sphere. Maybe it’s a default sphere (like “public” or “acquaintance”), or maybe you choose it before connecting with them. Regardless, that sphere is tied to one or more of the identities that you’ve already defined, and the acquaintance can immediately go in and see any of your information exposed in those identities, without having to add you to their address book, enter a phone number, or keep track of a business card.

You can then go in that night and tweak the spheres (and thus the identities, and thus the information) to which they belong – you could even override access at the individual level, if you wanted to share your phone number with one specific person, say.

When your acquaintance wants to call you, her phone makes an API call to Spheres asking if she can access your phone number; if she can, Spheres returns the number and the call goes through – similarly, if she wants to email, IM, send you snail mail, etc.
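Since Spheres is imaginary, here’s an equally imaginary sketch of that lookup in Ruby (every class, field, and value below is invented for illustration):

```ruby
# Hypothetical data model for the Spheres idea described above;
# all names and structures are invented for illustration.
Identity = Struct.new(:name, :fields)            # fields: { phone: "...", ... }
Sphere   = Struct.new(:name, :identities, :members)

# Answer "can this requester see this field, and if so, what is it?"
# by walking the spheres the requester belongs to.
def lookup(spheres, requester, field)
  spheres.each do |sphere|
    next unless sphere.members.include?(requester)
    sphere.identities.each do |identity|
      value = identity.fields[field]
      return value if value
    end
  end
  nil # the requester isn't in any sphere that exposes this field
end

work_id   = Identity.new("work",   { phone: "555-0100", email: "me@example.com" })
public_id = Identity.new("public", { twitter: "@example" })

spheres = [
  Sphere.new("coworkers",     [work_id],   ["alice"]),
  Sphere.new("acquaintances", [public_id], ["bob"]),
]

lookup(spheres, "alice", :phone)   # => "555-0100"
lookup(spheres, "bob",   :phone)   # => nil (bob only sees the public identity)
```

The phone call in the post is then just this lookup behind an API endpoint: return the number if it comes back non-nil, refuse otherwise.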

How does all of this make things better? From the sharer’s point of view:

  1. I don’t have to notify people when my address changes – I change it in one place and anyone who looks for it gets the correct info.
  2. I’ve got fine-grained control over who can see what about me, but can easily make large-scale changes by modifying identities and spheres.
  3. Pseudonymity can be managed right next to IRL identity.

And from the point of view of your contact:

  1. I don’t have to worry about outdated information; people update their own info and I have access to it immediately.
  2. I don’t have to store my contacts’ information locally – I can always just grab it when I need to (though there could be local caching if connectivity is spotty).
  3. Centralizing contact information and making sure all contact requests hit that central store means that I can see my complete history of connecting with a person – all my calls to them right next to texts, IMs, and emails.

So, what do you think? Will you share my vision with me?

On Interview Coding

Oh hey, I’m unemployed again (this time was, sadly, involuntary – my company shut down). That means I’m doing a lot of talking, to hiring companies and to people about the hiring process. One topic that I find myself feeling surprisingly strongly about: interview coding questions.

One company (I won’t name them) asked me to write code to reverse an array. In Ruby. Which has Array#reverse. I understand the abstract point here, but… really? Is this the best we can do – asking people under mild-to-major stress (depending on the candidate’s personality) to implement objectively useless code? If you’re falling back on seeing “how people problem-solve under stress,” then why limit yourself to code? And if you want to see how they’d do their actual job under stress, why not ask them to do something like they would if the stress arose in the new job?

For example, if I’m applying to build web software, ask me to debug a slow page, find and resolve contributing causes to an outage, or work through a vague product vision to determine what to build. The first two of these could be made as directly-technical as a standard coding question. The last one would be trickier, but is infinitely flexible. Heck, you could even reuse the array reversal question: put on your biz-person hat and say “I need the contents of this file in a different order ASAP for a presentation to the board!” Let me figure out by drilling in that you need it reversed – and if I write anything other than array.reverse! go ahead and move on to the next candidate.
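For what it’s worth, here’s roughly what that whiteboard answer looks like next to the one-liner (my sketch, not the company’s expected solution): a two-pointer swap versus the method Ruby already ships.

```ruby
# Hand-rolled array reversal (the whiteboard version): swap elements
# from the outside in until the pointers meet.
def manual_reverse(arr)
  result = arr.dup            # leave the caller's array alone
  i, j = 0, result.length - 1
  while i < j
    result[i], result[j] = result[j], result[i]
    i += 1
    j -= 1
  end
  result
end

manual_reverse([1, 2, 3, 4])  # => [4, 3, 2, 1]
[1, 2, 3, 4].reverse          # => [4, 3, 2, 1], the version anyone would ship
```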

On Ansible

In my post on flexible infrastructures, I mentioned in passing that I was managing my ops work with Ansible rather than the more traditional Chef or Puppet. Several factors guided me towards this choice:

  • The overarching goal for the new infrastructure was to have disposable servers in every role, instead of maintaining long-running servers over time. As a result, I focused on the initial provisioning much more than the ongoing configuration management experience.
  • I wanted the servers to be as similar as possible, but not more than they needed to be. If two distinct roles needed Ruby, I wanted them to use the same version.
  • I was (am) new to this level of operations involvement, so the quicker and easier the learning curve, the better.
  • Less importantly, I was experimenting with the idea of using Packer to create AMIs that we could launch on-demand to fire up new servers. For this, I found local provisioning more intuitive than something centralized.

Starting from Chef

Our existing infrastructure was managed with a standard Chef setup; I spent a full week trying to wrap my head around what we had and adapt it to the new vision. I felt that I had to replace our existing cookbooks because they’d gotten far out of date, but when I pulled in community cookbooks for the software I wanted to install I kept running into conflicts. One would require the redis cookbook, but another would need redisio; one would install Ruby 1.9.3, while another would use 2.0.0. Sure, I could’ve (and did, at first) fork them to bring them in line with each other, but then I’d just be setting us up to fall out of date again in the future.

Chef also fell short on the learning curve principle; I felt lost from the start looking through the existing repo we had, and the documentation never quite seemed to answer my questions. I was never clear when something applied to normal Chef vs. chef-solo, for instance – and all the examples I saw started with the full, relatively complicated hierarchical file structure that’s great for when you know what’s going on but rough when you see it unexplained.

Finally, Chef just seemed overly powerful for what I wanted – it’s very obviously Configuration Management, when I just needed a little provisioning tool. This also kept me from digging into Puppet too deeply, especially once I ran across Ansible.

Finding Ansible

I was looking for simpler alternatives to Chef and saw a link to Ansible’s documentation. Within a few minutes, I knew how to do the local provisioning I wanted (by making 127.0.0.1 the only entry in the hosts file) and saw how to start, simply, with a single playbook file. YAML, as much as I hate it as a serialization format (seriously, I hate it for that with a fiery passion), seemed perfectly suited to directives like this:

- name: Generate the Nginx configuration file
  copy: src=nginx.conf
        dest=/etc/nginx/nginx.conf

As I grew more comfortable with how Ansible worked, I started looking at more complicated directory structures and setups using roles, but the key was the ease with which I moved into it – every step was easy and made sense at the time, as opposed to just being dropped in the deep end.

The simplicity of the playbooks (and their direct correlation to the shell commands I’d run to set up the server manually) made it incredibly easy to write my own roles and reuse them for different server types, which made it trivial to keep the dependencies identical wherever possible.

Wrapping up

I hope no one reads this and comes away thinking that I’m saying Ansible is objectively superior to Chef or Puppet. They’re all powerful tools – it’s just that I found Ansible to be the best fit for me, given my objectives and experience. Honestly, the more automation we can get in operations, the better, regardless of the tools used!

That said, if you’re looking to get started with all of this, I think Ansible is well worth a look.

On a Flexible Infrastructure

I’ve been having a lot of fun over the past couple of weeks shading into the ops side of things – I’ve been experimenting with an alternative to the existing infrastructure at my day job, and have been learning a ton. There’s a long way to go, but I think I’ve got the start of an interesting, flexible setup.

Configuration

At the heart of this cluster is a configuration server; I wanted to have new instances easily register themselves and discover existing instances and services. After looking at a few alternatives, I ended up going with etcd, in large part because of its curl-friendly interface (which also powers my experimental Ember.js web UI for etcd, wetcd).
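To give a flavor of that curl-friendliness, here’s a sketch of how an instance might register itself using etcd’s v2 keys API over plain HTTP (the key layout, port, and TTL scheme are my assumptions, not the actual setup):

```ruby
require "net/http"
require "uri"

# etcd's HTTP endpoint; 4001 was the default client port at the time.
ETCD = "http://127.0.0.1:4001"

# Build the PUT that announces "this address is serving this role" for
# ttl seconds; re-announcing before the TTL expires keeps the entry
# alive, and a crashed instance simply ages out of the registry.
def registration_request(role, address, ttl = 60)
  uri = URI("#{ETCD}/v2/keys/services/#{role}/#{address}")
  req = Net::HTTP::Put.new(uri)
  req.set_form_data("value" => address, "ttl" => ttl.to_s)
  [uri, req]
end

uri, req = registration_request("web", "10.0.0.5")
# Net::HTTP.new(uri.host, uri.port).request(req) would fire the actual call;
# discovery is then just a GET on /v2/keys/services/web
```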

Monitoring

For monitoring, I wanted something that’d easily adjust to instances spinning up and down. Sensu looked like a good fit, so I’ve been giving that a go. It introduces dependencies that might scare some people off (Ruby on every client, for instance), but it’s incredibly flexible.

Analytics

The old standbys of StatsD and Graphite are great; the only outstanding question for me here is what dashboard to use, since Composer is … not the best.

Logs

The piece that I think gets overlooked most frequently in infrastructure setups is log aggregation – when you’re adding and removing instances willy-nilly, you really need a solid, central place to view and analyze logs (especially if you want to keep people away from direct access to the servers). I’m loving Logstash for this, especially since it just added some great features as it hit 1.2. For getting the logs to Logstash, I’m relying on good ol’ rsyslog. Finally, I’m using Kibana 3 to view and analyze the logs.

In Action

So here’s what happens: when a new instance comes up, it first pings the configuration server to find out where monitoring, logging, and the like live, so those settings can be configured correctly. Once provisioning is complete, the instance then notifies the configuration server that it is available for whatever role it’s playing.

All boxes get a standard set of monitoring checks by default, in addition to checks specific to whatever they’re running (nginx, Redis, etc.); some of these checks get forwarded on to the analytics server. Finally, all logs get shipped to the log server via rsyslog. It all works together shockingly (from my naive, developer-oriented background) well.

There’s more to talk about (for instance, why I prefer managing all of this with Ansible instead of Chef or Puppet), but I’d love to hear what you all think. Hit me up on Twitter with comments and questions!

On GitHub, DDOSs, and Deploys

When GitHub goes down, you can almost hear the wailing in the streets. GitHub has cemented itself as a central part of many development workflows, and while much of that is unaffected by the occasional DDOS, one element in particular has the potential to cause a lot of trouble: deploys.

Standard practice for many deployment tools (e.g., Capistrano) critically relies on the GitHub repository being available – checking out the latest code either on the remote server itself or locally before pushing it to the remote box. If GitHub’s unavailable, all of that comes to a screeching halt.

Enter: deus_ex (github). It’s a simple RubyGem meant to work around this exact problem.

Say GitHub is being DDOSed and you need to deploy – just install the gem, ensure your AWS credentials are correct in ~/.fog, and:

$ deus_ex
[DEUS EX] connection established
[DEUS EX] creating server (this may take a couple of minutes)
[DEUS EX] server created
[DEUS EX] initializing git repo
[DEUS EX] git repo initialized
[DEUS EX] adding local git remote
[DEUS EX] pushing to remote
The authenticity of host 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com (xx.xx.xx.xx)' can't be established.
RSA key fingerprint is xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com,xx.xx.xx.xx' (RSA) to the list of known hosts.
Counting objects: 126, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (78/78), done.
Writing objects: 100% (126/126), 13.70 KiB, done.
Total 126 (delta 51), reused 104 (delta 42)
To ubuntu@ec2-xx-xx-xx-xx.compute-1.amazonaws.com:deus_ex_project.git
 * [new branch]      master -> master
[DEUS EX] removing local git remote
[DEUS EX]
[DEUS EX] you can now deploy from ubuntu@ec2-xx-xx-xx-xx.compute-1.amazonaws.com:deus_ex_project.git

Then, jump over to your deploy tool, set it to look at your new repository instead of GitHub, and deploy away!
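With Capistrano 2 (current at the time), that swap is a one-line change in config/deploy.rb. This is a sketch, and the placeholder repo names are mine; Capistrano 3 later renamed the setting to :repo_url.

```ruby
# config/deploy.rb -- point at the deus_ex stand-in instead of GitHub
set :repository, "ubuntu@ec2-xx-xx-xx-xx.compute-1.amazonaws.com:deus_ex_project.git"

# ...and once GitHub recovers, flip it back:
# set :repository, "git@github.com:you/your_project.git"
```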

Once you’re done, you’ll need to clean up the instance:

$ deus_ex cleanup
[DEUS EX] connection established
[DEUS EX] server destroyed

And be sure to set your deploy tool to look at GitHub again for future deploys!