Ben Scofield

me. still on a blog.

Wanting to Be, Not to Do

It’s about a month and a half until my 10th wedding anniversary, and I’ve been reflecting on how I was feeling before I got married. I distinctly remember telling my now-wife several times during the wedding planning that I wanted to be married, but I didn’t at all want to get married. During a long cab ride to an airport this morning, I was thinking about that, and I realized that it’s one instance of a more general truth that a lot of people are realizing or discovering in other contexts.

Take the (encouraging) trend towards user- rather than feature-focused communication in product design. We’re discovering (again?) that pitching products based on the person the user will become with their help is far superior to pitching them based on all the things the user can do with them.

Or look at Kathy Sierra’s discussions about writing and speaking, where she continually focuses on helping the audience be awesome, instead of telling them what to do.

Or consider basically any research on habits, willpower, and practice, where people will repeatedly tell you how committed they are to quitting smoking (being a non-smoker), losing weight (being healthy), or playing an instrument (being a musician), but all-too-frequently fail to do the work.

There is, of course, an exception to this: experiences. There are some things that people do, not to become something different, but just to do them. Skydiving, for instance. Heck, weddings for many people are just that – where for me the wedding was a means to the end of being married, my wife loved the entire experience of the day.

Experiences like that – ends in themselves – are pretty rare, though. In business, few startups are selling to that angle, and in life we’ve all got commitments and needs that, most of the time, prevent us from pursuing things like that.

So, I’m committing to reframing how I think about my work in terms of who the people who consume it want to (and will) become, as opposed to how they’ll use it. I think it’ll be a much more successful worldview in the long run.

Predictions for Comixology+Amazon

Note: I don’t have any special insight into the business arrangements here; I haven’t been paying as much attention to the comics industry over the past few years as I once did. These are just minimally-educated guesses.

When the news that Amazon had acquired Comixology broke, several of my friends — knowing my long-time interest in comics in general and digital comics in particular — asked my opinion. I didn’t have a ton to say at the time, other than the general statement that I didn’t think much would change on the Comixology side (citing Goodreads as an example).

Today, though, Comixology announced that they were removing the ability to buy comics through their mobile applications, just like the Kindle app doesn’t allow you to buy books. I didn’t expect this, though had I spent any real time thinking about it I hope that I’d have predicted it — the only surprising part of this is that it is happening before the acquisition closes. In order to attempt to head off future unexpected events, I’m going to log some predictions here.

Comixology’s catalog will expand; Amazon’s won’t

Amazon has some comics that don’t appear on Comixology (in my experience, a lot of that is Vertigo and Dark Horse); I’m betting that’ll be rolled onto Comixology’s platform. I’m less certain that the reverse will be true, but if that happens I’d expect it to take a long time. Note that if Dark Horse makes the jump, that changes their native digital strategy considerably.

One of the good things that Comixology has done has been expanding its backlist, and I hope that’ll continue. I’d be surprised if it were to speed up, though, and I wouldn’t be surprised to see it slow down.

Kindle comics and Comixology will stay separate

I’m betting that we’ll still see two applications for reading comics for the foreseeable future, and purchases on one won’t show up in the other (even when the content is available in both catalogs).

And beyond even not-merging: I’m betting both will remain basically the same. Amazon doesn’t seem to pressure its acquisitions to improve their customer experience (cf. Goodreads), instead choosing to focus on questionably-valuable backend integrations. Maybe we’ll see ratings propagate from one application to the other, but I wouldn’t expect even that for a year or more. I don’t expect to see Guided View(tm) in Kindle comics — I’d be less surprised to see comics removed from the Kindle app entirely (though I’ll be shocked if that happens).

Prices will stay the same

My impression is that prices for digital are where they are because of publisher pressure, and I don’t think having a single outlet will change that.

Recommendations will continue to suck

Near as I can tell, there’s no good algorithm for recommending comics, and even with the extra data that Amazon will have access to I don’t think we’ll see that change. You’ll still need to talk to someone you trust to hear about good stuff (be that a voice on the internet, your local comic shop staff, or someone else).

Publisher-native apps will start to fade

I’m guessing that as content contracts come up for renegotiation, we’ll see fewer native apps from Marvel, DC, and the rest (some of this happened in the last few years; the acquisition will just finish the job). I don’t think that Marvel, particularly, will ever get rid of their app completely, but I’d be surprised to see them push it as a primary channel a few years from now.

New competitors will emerge (but only for indie comics and, possibly, Dark Horse)

Thrillbent’s already out there and doing interesting things, but I think that the Comixology-Amazon behemoth will scare enough people that we’ll see new players emerge. Unfortunately, Comixology and Amazon will be able to put a stranglehold on content from the big two, so you won’t see a little comics startup distributing Batman or the Avengers any time soon.

I imagine most experimentation will be on pricing (again, looking at Thrillbent), but we seem pretty well locked into substandard user experiences for comic discovery, and we’ve seen a sad convergence to a decent (but not exceptional) reading experience.

TL;DR

All in all, I still don’t expect a ton of change on the consumer side — I think the most visible changes will be Comixology’s catalog expanding and some minor, low-value (for customers) integrations between Comixology and Amazon.

My fondest wish is that someone would come along (probably by providing something exceptional for indie comics, though they’d most likely have to figure some way to get mainstream content into their system) and just shake up the whole ecosystem. I remain pessimistic, though, as I have been since Comixology basically won the race years ago by locking up the content that most people are going to go looking for in a comics app.

Postmortems: Trust and Confidence

I am a strong believer in the value of a good postmortem after a customer-affecting incident — and after internal incidents, and after unusual projects and efforts, and pretty much all the time. I really want to talk about one of the purposes of a postmortem, however: building trust and confidence.

Trust and confidence

I’ve been thinking about trust and confidence in the context of software development since early November.

Much of my thinking has been around the distinction between the two, and what I’ve come up with is this: trust is given, confidence is earned.

Say you hire a new developer for your team. Assuming you’ve never worked with her before, you don’t have real evidence that she can do the job, but you trust that she can. Over time, as she solves problems and demonstrates her expertise, you acquire confidence that she can do the job.

Similarly, when you start working with a new code base, you trust that the tests validate the code’s behavior. As you make changes to the code and find them caught by the tests, you gain confidence in the suite as a whole — or, conversely, when you find that the tests don’t cover some complicated functionality, you lose some of the trust or confidence that you had for the suite.

Postmortems

I could go on and on with examples of trust and confidence in people, in companies, in tests, in code, and more, but I want to shift back to postmortems now. The way I see it, when you have an incident, you’ve damaged the trust and confidence that some number of people have in your system. When you lose all of that scarce resource, you fail, so a prime goal coming out of an incident is to rebuild as much of the confidence as you can.

(I’m being very careful with my choice of “trust” and “confidence” here. Incidents damage both, but the damage is worse to confidence; the incident itself is evidence against your system working properly. The goal afterwards, then, is to build back as much confidence as you can so that the normal cycle of gaining both can restart.)

So, let’s think about some postmorteming. One thing that should be clear is that the way you rebuild trust depends on both what happened and on who was affected. For a security breach at Mint, for instance, you’d need to rebuild trust in your security measures, so you’d talk about external reviews, fraud detection processes, and the like — but low-level details won’t help less technical customers that much.

Contrast that with a performance issue on an internal service at Heroku; with deeply technical coworkers, you need to produce a deeply technical review of the incident to have any hope of rebuilding confidence.

So what?

I’ve seen a lot of postmortems that miss the mark, and all too often that’s because they’re focused too much on the wrong thing. Some get bogged down in minute-by-minute recreations of the incident; others skimp on remediation. Keeping the goal of rebuilding confidence in mind, however, shows why those two paths are mistaken.

Historical transcripts of incidents are great, but if they get too fine-grained it’s very easy to lose the larger sense of progress towards a resolution. The timeline of a postmortem should include blind alleys and golden paths, because knowing how the solution was achieved is vital to rebuilding confidence in the company or team — without it, you’re saying “just trust us” at the worst possible time.

Similarly, skipping remediation drops the ball on (collective, not individual) accountability. Publicly committing to (and following through on) steps to reduce the chance of recurrence and minimize the impact of future events is the least you can do to help those affected by a problem.

So: if you’re writing a postmortem, please remember this: your system has lost a portion of the confidence that people have in it. Take this opportunity to repair that as best you can.

Two Problems With Antifragility

I’m no fan of Nassim Taleb (of black swan, antifragility, and needing-an-editor fame), but I forced myself to finish his last book Antifragile in spite of all the problems I had with his presentation and writing. If you haven’t heard of it, the gist is: fragile systems suffer harm when stressed, but there are systems that instead benefit when stressed. Taleb calls these systems “antifragile.”

Near as I can tell, Taleb thinks his identification of this concept is hugely revolutionary, and he goes out of his way to name all of his favorite things antifragile and all of his least favorite things fragile. Leaving aside his questionable expertise in many areas, I want to talk about two related problems with the general phenomenon.

First: antifragile systems may improve when stressed, but this is often at the cost of increasing fragility to other types of stress. Take a muscle: subject it to a load (stress it) and it gets stronger, easy peasy. What you don’t see, however, is that it strengthens in a way that makes it more vulnerable to other stresses — it becomes less flexible, for instance. Sure, you can override this by stretching, but that’s a different, opposed antifragile process.

Or take an autoscaling application hosted on EC2. When traffic increases (stress), the system spins up more instances (becoming better able to handle the traffic). The system is stronger with respect to traffic — but it’s weaker with respect to, say, cash flow stresses. It costs more, so financial stresses will produce more harm because of the strengthened ability to handle traffic.
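To make that trade-off concrete, here is a minimal Ruby sketch. The per-instance capacity and the hourly price are made-up numbers (not real EC2 figures), and `instances_for` and `hourly_bill` are hypothetical helpers, not part of any AWS API.

```ruby
# Hypothetical numbers, chosen only for illustration.
REQUESTS_PER_INSTANCE = 1_000   # requests/sec one instance can serve (assumed)
HOURLY_COST           = 0.10    # dollars per instance-hour (assumed)

# Smallest number of instances that can absorb the given traffic.
def instances_for(requests_per_sec)
  [(requests_per_sec.to_f / REQUESTS_PER_INSTANCE).ceil, 1].max
end

# What the scaled-up fleet costs to run for one hour.
def hourly_bill(requests_per_sec)
  (instances_for(requests_per_sec) * HOURLY_COST).round(2)
end

[500, 5_000, 50_000].each do |traffic|
  bill = format("%.2f", hourly_bill(traffic))
  puts "#{traffic} req/s -> #{instances_for(traffic)} instances, $#{bill}/hour"
end
```

Under these assumptions, every step up in traffic tolerance is matched by a step up in the bill, which is exactly the exposure to financial stress described above.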

Second: the very process of antifragility generates fragility. By definition, antifragile systems gain more benefit than harm when stressed, and fragile systems do the opposite. But: in almost all cases, there’s a limit to the strength a system can display. As antifragile systems improve and get closer to this limit, they gain less and less of a benefit from the same stresses. More severe stresses are required to produce progressively smaller improvements. A direct consequence of this is that the systems are, in fact, becoming more fragile.

Thought experiment time! Imagine a system that, via its antifragile properties, has reached the peak of its strength. Further stresses have no beneficial effect (since we’re assuming that there’s a maximum cap on improvement). There’s nowhere to go but down — stresses can’t help, so they can only hurt, so the system has strengthened itself into (Taleb’s definition of) fragility.
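The thought experiment can be sketched as a toy model in Ruby. Everything here is an assumption made for illustration: the cap of 100, the rule that each stress closes half of the remaining gap, and the method names. It is not Taleb’s formalism, and it only models the benefit side of a stress.

```ruby
# Toy model (an assumption, not Taleb's formalism): strength has a hard cap,
# and each stress closes a fixed fraction of the remaining gap to that cap.
CAP       = 100.0
GAIN_RATE = 0.5    # fraction of the remaining gap gained per stress (assumed)

# Benefit the system gets from one more stress at its current strength.
def benefit(strength)
  (CAP - strength) * GAIN_RATE
end

# Apply `count` identical stresses and record the benefit from each one.
def stress_repeatedly(strength, count)
  count.times.map do
    gain = benefit(strength)
    strength += gain
    gain
  end
end

gains = stress_repeatedly(0.0, 5)
puts gains.inspect    # each gain is half the one before
puts benefit(CAP)     # at the cap, one more stress adds nothing
```

In this model the same stress buys less each time, and at the cap it buys nothing. Whatever harm a stress also carries is then all that is left.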

All of this is just to point out that the concept of antifragility is a bit more complicated than it might appear at first blush. I do think it’s worth considering as we build systems, but it should not be treated as a pure good without further thought.

My 2013 in Reading

I’ve been using Goodreads reliably this year to track my reading (well, the book and graphic novel portions of my reading). Over the last month, I’ve been exploring that data – mostly for my own personal enjoyment, but also to see if there were any useful, gleanable insights into how I read. We’re not quite out of 2013 yet, but I thought it might be interesting to record what I’ve found so far in one place. Warning: this is navel-gazing of an extreme form.

The reading challenge

Goodreads allows readers to create a reading challenge every year – a public commitment to read a certain number of books. In 2012, I read just over 100 books, so I thought I’d push myself in 2013 and committed to an average of 3 per week, for a total of 156.

I finished the challenge on December 14th; I was on pace for much of the first half of the year, but gradually fell behind until a prolific stretch in November and December.

All in all, I think I overdid it – I’m a fan of committing to reading, but my choice to go big ended up siphoning off some of the joy I get from reading. I pressured myself to power through some books when I’d have preferred to take a bit more time.

When I read

I mentioned falling behind pace at one point. As it turns out, that slowdown started when I left LivingSocial in early April. Weirdly, I stopped reading (or at least finishing) when I wasn’t working – it happened again after Wantful shut down. This chart shows the number of books finished by day, with each month getting its own row (January’s on top).

Completed books by day

Reads-in-progress

I’ve always known that I have a bit of a … problem focusing on just one book at a time. Now, I have data!

Reading swimlanes

(The colors correspond to the chart in “What I read”, below)

March is interesting, as I worked down my in-progress stack to 2 for the first time in as long as I can remember.

This chart is a bit misleading; it only shows books that I finished, and (as it happens) I started one in late March that I’m still chugging through. With that in mind, I maxed out at 10 books at once in early August (just after I started at Wantful).

For reference, I’ve currently got three books in-progress. Two of ‘em are long-timers, so who knows when I’ll finish them.

“I just like liking things”

According to the data, I’ve enjoyed the vast majority of what I read this year. Goodreads ratings are out of 5 – this chart shows the percentage of books (overall and of each type) that got a given rating.

[Chart: percentage of books at each rating (2 to 5), with panels for Overall, Comics, Tech, Non-fiction, and Fiction; the Comics panel shows only ratings 3 to 5]


What I read

This chart shows books completed by type – fiction, non-fiction, comics, and technical/business (top to bottom)

Completed books by genre

I’ve abandoned tech books in the last third of the year, and for whatever reason avoided comics until I left LivingSocial, but otherwise had both fiction and non-fiction in-progress through the whole year.

How I read

93% of my reading was done digitally – on a Kindle or iPad. Given that I have over a hundred physical books on my to-be-read shelves at home (relics from a pre-ebook era), I need to shift that in 2014.

Closing thoughts

I think the reading data on its own is fun (for me, and maybe for you) to look at, but any real value is dependent on correlating it with other information – things happening outside of the books. As just one example: my slowdown between jobs is interesting, especially since it happened both times. I’ve got some analyses in the works that’ll hopefully let me start looking at those sorts of interactions, but it’ll take a while.

On reading, specifically: I’d love to have more granular data about things like reading speed, re-reads, and whatnot, but I don’t think that’s likely in the near future (ebooks make it possible to gather that data reliably, but mine is locked away inside Amazon and I don’t foresee them releasing it anytime soon).

Beyond all that: I really love to read. I’m not going to commit to a huge goal in 2014, but be sure that I’ll keep track of what I do read.

Problems With Self-organization

I just listened to the November 21st episode of the Freakonomics podcast, on what economists call “spontaneous order.” It’s an interesting phenomenon – essentially, it’s self-organization. Daniel Klein (an economics professor at George Mason) describes it through the metaphor of a skating rink: imagine a hundred people all skating at once. There’s no dictatorial authority telling one person to speed up or a couple to move to the outside – there are minimal imposed rules (just the skating direction, say), and beyond that pure self-interest keeps the whole thing from devolving into chaos.

Klein explains this by appeal to mutuality; I have an interest in not colliding with another skater, but that skater has the same interest in not colliding with me. That shared interest – when expanded to the full set of skaters – explains the order that arises spontaneously. Slower, less experienced skaters move to the outside, etc.

It’s a great phenomenon to study, and if it were universally generalizable it’d be a great argument for libertarianism, or as support to the manager-free cultures some companies have adopted, or probably a great many other situations. I’m not convinced about how far we can take it, though, for two reasons.

Metaphor troubles

First, the skaters have very simple goals – mostly just to stay upright and to have fun. Everyone understands that, and it’s incredibly easy to see how your actions directly contribute to the achievement of those goals. Given the relatively small number of skaters, it’s even easy to see how others’ actions contribute.

If we complicate the goals, dramatically increase the number of people involved, or add steps that turn actions’ direct contributions into massively-indirected influences, this all becomes much less convincing. Nation-states meet all three such criteria – can you even identify your country’s goals? Certainly not in a blog post, and probably not at all without grossly oversimplifying and leaving out important bits. The US has nearly 314 million residents, which is a far cry from a hundred skaters. And beyond all that: how can any (not-the-President) citizen figure out the impact of his or her actions on the goals of the nation? There are just too many steps between here and there, all with innumerable external factors interfering. We’re talking Laplacian-level calculations to self-organize at the national level.

So what about companies? Small-to-mid-size businesses may avoid the population size issue, especially since we’re not talking about a basically-random sample of people. Similarly, companies of this size may be focused enough to make their goals comprehensible, and with (sometimes significant) effort may be able to show how a single employee’s actions affect those goals. In that situation, I think self-organization may be a workable approach – but it’s going to fail spectacularly if those conditions cease.

Save the ref

Second… well, to address the second problem I need to go back to the podcast briefly. Besides the interview with Klein, Stephen Dubner (the host) also talked to athletes. It turns out that almost all games of Ultimate (Frisbee) are self-policed, with no referees. As Dubner talked to devotees of other sports (basketball and soccer, for instance), he asked them how orderly their games might be without referees.

In all of these talks, however, I think he missed a few more interesting opportunities. Take the plate umpire in baseball – (almost always) he is necessary because he’s watching things that no player is able to watch. The catcher and batter can’t afford to look at the plate to see if the 96mph fastball falls off the corner, so the only option is to install another set of eyes. Spontaneous order can only succeed when the individuals are able to gather all necessary information.

The analogue in a company would be a project manager – someone who looks at the whole picture. This doesn’t mean that PMs have to end up as dictators, however. As Rands discusses, good PMs see the things that the boots on the ground don’t … and then communicate those insights back to the rest of the team.

You can still be manager-free like this. I’d argue that this no longer quite counts as self-organization, however, as the need for a particular role is mandated by the nature of the effort.

All in all, I think spontaneous order / self-organization has a lot of promise (as the success of famously self-organized companies like Gore, Valve, and GitHub implies), but I think that it’s far from a universally-applicable strategy. It pays to understand its limits, just like any approach.

On Valuing People

Ernie and I are going through many of the same experiences right now, so his post on how interviews are broken resonated with me. In particular, I wanted to expand on his “I am special” point.

I don’t know that I’m particularly special; that carries a slight note of “better-than” that I’m uncomfortable with (though I don’t see that in the context of Ernie’s post). I am, however, convinced that everyone is different. We all bring different strengths, weaknesses, preferences, biases, intentions, and experiences to whatever we do. Any process that ignores that fundamental truth – whether it’s a hiring process, a date, a debate in a comment thread, or whatever – is broken.

Caution: philosophy ahead

Immanuel Kant’s moral philosophy system was built on the categorical imperative; basically, it’s a framework for evaluating actions. The imperative has a few different formulations, but the one I’m interested in here is the second: “Act in such a way as to treat humanity, whether in your own person or in that of anyone else, always as an end and never merely as a means.” I’m not a Kantian, but I’ve always found this formulation compelling. It is wrong to treat other people simply as a means to some end or goal.

It’s not always easy to remember to treat people as people, instead of as extras (or worse, props) in a story in which you’re the protagonist. It’s even harder when you’re dealing with a lot of people all at once, like when you’re hiring someone. Here’s the thing, though: if you can keep in mind the individuality of the people with whom you’re dealing, you’re going to be much more successful in the long run.

An example

Some time ago, I interviewed with a large company. I got dropped into a fairly standard interview process, which means they wanted me on-site for an all-day interview. They set me up with a travel agent who booked the trip, and I was off – to a pretty terrible experience.

While in the air for the first leg of my flight, the second leg (which was direct to my final destination) was cancelled. Upon turning my phone on after landing and finding out, I freaked out, found the rebooking center, and … got myself booked onto another two flights that would have me in the air until midnight. A few more delays later, and I ended up checking into my hotel at 1:30am, a little less than 12 hours before the interview.

After a few hours of sleep and catching up with an old friend who happened to live in the city, I walked over to BigCo and proceeded to talk to a series of people over the course of 4 hours. I designed systems on the whiteboard, instrumented existing processes to ensure performance, talked about how I’d tackle various sorts of problems – and was asked very little about what I wanted to do. It seemed very clear to me that I was one of a large number of (from BigCo’s perspective) interchangeable candidates that might or might not fit into the role they were trying to fill. In other words, they were treating me as a means to solving their problem, not as a person with ends of my own.

The trip home was actually worse than the trip out; the second leg of my flight was delayed repeatedly and eventually cancelled, but by that time I had given up hope of salvaging the trip, rented a car, and just driven home (luckily, the first leg left me in Charlotte, so it was only a two hour drive). And… I never submitted any of my expenses for reimbursement. I just didn’t want to deal with BigCo at all anymore.

So, what could BigCo have done differently? The interview should have been a conversation; an exploration of their goals and resources, my goals and skills, and how those might fit together to be mutually beneficial. I won’t lie: that is hard, especially at a really big company where no one person knows all the possibilities. It gets even harder with less experienced candidates, since their goals are often more vague. Given that we’re talking about a lot of money (in compensation, and even more in impact), however, it doesn’t seem to make sense to skimp on this process.

And the travel. Honestly, that really got to me. If you’re going to book a candidate’s travel, then you’re taking responsibility for it. I don’t expect you to build a weather machine, but it’s not that expensive to have your travel agent proactively monitor your candidates’ travel and fix problems before the candidate knows they happened. Travel is your first impression, and in many cases takes longer than the interview itself. (I was in the BigCo office for a total of 5 hours; I had layovers longer than that on both sides of the visit.) In addition to knowing the candidate’s resume, your interviewers should know how the trip went and express sympathy for whatever obstacles arose.

Very little of this is easy, at the company level or the personal level. It certainly entails some additional effort and expense. Compared to the cost of losing great candidates, however, it seems like a worthwhile expense even without considering that it’s the ethical way to behave.

Wrapping up

One final word: I’ve often heard interviewers (and myself) use a candidate’s lack of excitement about the company or job as a mark against them – “she just wants a paycheck.” Hopefully, this (overlong) post has convinced you that turnabout is fair play, and that “they just want a body” is equally bad.

On Contact Management

I have a problem: every time I open up an application that has a contact list, I find something confusing or frustrating. With Skype, it’s people I talked to once, years ago, and no longer remember. With Messages, it’s duplicate accounts that I know map to the same person. With Google Contacts, it’s outdated email addresses that I never got around to changing. With Contacts.app, it’s crazy merge artifacts. With … I could go on.

This could be better, and I’m going to tell you how.

Imagine there was an application; call it Spheres. In this application, you keep track of your own contact information – email addresses, phone numbers, links to Twitter, Facebook, LinkedIn, GitHub, and whatever else you like. (Yes, I know sites like this exist now. Bear with me.)

Once your information is in, you can package bits of it together into identities. You might have a public identity that includes your name, blog URL, and Twitter handle. A coder identity might build on that by adding your PGP key and your GitHub username. A work identity might start with name, work email address, work phone number, and your LinkedIn profile page. Some of these identities could be predefined, but you’d always be able to modify or remove existing identities and add new ones.

Now, you meet someone. Through some mechanism, you communicate your Spheres username to them – maybe you bump phones or whatever. Here’s where it gets interesting: your new acquaintance is now added to a sphere. Maybe it’s a default sphere (like “public” or “acquaintance”), or maybe you choose it before connecting with them. Regardless, that sphere is tied to one or more of the identities that you’ve already defined, and the acquaintance can immediately go in and see any of your information exposed in those identities, without having to add you to their address book, enter a phone number, or keep track of a business card.

You can then go in that night and tweak the spheres (and thus the identities, and thus the information) to which they belong – you could even override access on an individual level, if you wanted to grant your phone number or something.

When your acquaintance wants to call you, her phone makes an API call to Spheres asking if she can access your phone number; if she can, Spheres returns the number and the call goes through – similarly, if she wants to email, IM, send you snail mail, etc.
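One way to picture the model described so far is the Ruby sketch below. Spheres is only an idea at this point, so every class, method, and value here (`Profile`, `define_identity`, `lookup`, the sample data) is invented for illustration and is not a real API.

```ruby
require "set"

# Hypothetical model of the Spheres idea; all names are invented.
class Profile
  def initialize(info)
    @info       = info                                  # e.g. { phone: "..." }
    @identities = {}                                    # identity => Set of info keys
    @spheres    = {}                                    # sphere   => identity names
    @members    = Hash.new { |h, k| h[k] = Set.new }    # contact  => sphere names
    @overrides  = Hash.new { |h, k| h[k] = Set.new }    # contact  => extra info keys
  end

  # Package bits of information together into a named identity.
  def define_identity(name, *keys)
    @identities[name] = keys.to_set
  end

  # Tie a sphere to one or more identities.
  def define_sphere(name, *identity_names)
    @spheres[name] = identity_names
  end

  def add_contact(contact, sphere)
    @members[contact] << sphere
  end

  # Per-person override, e.g. granting just a phone number.
  def grant(contact, key)
    @overrides[contact] << key
  end

  # The check a phone or mail client would make before connecting.
  # Returns the value if the contact may see it, otherwise nil.
  def lookup(contact, key)
    @info[key] if visible_keys(contact).include?(key)
  end

  private

  def visible_keys(contact)
    from_spheres = @members[contact].flat_map do |sphere|
      @spheres.fetch(sphere, []).flat_map { |id| @identities.fetch(id, Set.new).to_a }
    end
    from_spheres.to_set | @overrides[contact]
  end
end

# Sample data is made up.
me = Profile.new({ name: "Pat", twitter: "@pat_example", phone: "555-0100" })
me.define_identity(:public, :name, :twitter)
me.define_sphere(:acquaintance, :public)
me.add_contact("alex", :acquaintance)

before_grant = me.lookup("alex", :phone)   # nil: not exposed by the sphere
me.grant("alex", :phone)
after_grant = me.lookup("alex", :phone)    # now returns the number
```

The sketch keeps the information itself in one place and makes every read go through `lookup`, so changing a value or a sphere changes what every contact sees from then on.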

How does all of this make things better? From the sharer’s point of view:

  1. I don’t have to notify people when my address changes – I change it in one place and anyone who looks for it gets the correct info.
  2. I’ve got fine-grained control over who can see what about me, but can easily make large-scale changes by modifying identities and spheres.
  3. Pseudonymity can be managed right next to IRL identity.

And from the point of view of your contact:

  1. I don’t have to worry about outdated information; people update their own info and I have access to it immediately.
  2. I don’t have to store my contacts’ information locally – I can always just grab it when I need to (though there could be local caching if connectivity is spotty).
  3. Centralizing contact information and making sure all contact requests hit that central store means that I can see my complete history of connecting with a person – all my calls to them right next to texts, IMs, and emails.

So, what do you think? Will you share my vision with me?

On Interview Coding

Oh hey, I’m unemployed again (this time was, sadly, involuntary – my company shut down). That means I’m doing a lot of talking, to hiring companies and to people about the hiring process. One topic that I find myself feeling surprisingly strongly about: interview coding questions.

One company (I won’t name them) asked me to write code to reverse an array. In Ruby. Which has Array#reverse. I understand the abstract point here, but… really? Is this the best we can do – ask people in mild-to-major (depending on the candidate’s personality) stress to implement objectively-useless code? If you’re falling back on seeing “how people problem-solve under stress,” then why limit yourself to code? And if you want to see how they’d do their actual job under stress, why not ask them to do something like they would if the stress arose in the new job?

For example, if I’m applying to build web software, ask me to debug a slow page, find and resolve contributing causes to an outage, or work through a vague product vision to determine what to build. The first two of these could be made as directly-technical as a standard coding question. The last one would be trickier, but is infinitely flexible. Heck, you could even reuse the array reversal question: put on your biz-person hat and say “I need the contents of this file in a different order ASAP for a presentation to the board!” Let me figure out by drilling in that you need it reversed – and if I write anything other than array.reverse! go ahead and move on to the next candidate.
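For the record, once the drilling-in is done, the whole answer fits in a couple of lines. This is a minimal sketch that assumes the “file” is plain text with one item per line; the `reversed_lines` helper and the sample contents are made up for illustration.

```ruby
# Hypothetical helper: split the file's text into lines and hand them back
# in the opposite order, using the built-in Array#reverse.
def reversed_lines(text)
  text.lines.map(&:chomp).reverse
end

contents = "first slide\nsecond slide\nthird slide\n"
puts reversed_lines(contents)

# Array#reverse! does the same thing in place on an existing array.
slides = ["first slide", "second slide", "third slide"]
slides.reverse!
```

Array#reverse returns a new array and leaves the original alone; Array#reverse! mutates the receiver, which is fine here because nothing else needs the old order.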

On Ansible

In my post on flexible infrastructures, I mentioned in passing that I was managing my ops work with Ansible rather than the more traditional Chef or Puppet. Several factors guided me towards this choice:

  • The overarching goal for the new infrastructure was to have disposable servers in every role, instead of maintaining long-running servers over time. As a result, I focused on the initial provisioning much more than the ongoing configuration management experience.
  • I wanted the servers to be as similar as possible, but not more than they needed to be. If two distinct roles needed Ruby, I wanted them to use the same version.
  • I was (am) new to this level of operations involvement, so the quicker and easier the learning curve, the better.
  • Less importantly, I was experimenting with the idea of using Packer to create AMIs that we could launch on-demand to fire up new servers. For this, I found local provisioning more intuitive than something centralized.

Starting from Chef

Our existing infrastructure was managed with a standard Chef setup; I spent a full week trying to wrap my head around what we had and adapt it to the new vision. I felt that I had to replace our existing cookbooks because they’d gotten far out of date, but when I pulled in community cookbooks for the software I wanted to install, I kept running into conflicts. One would require the redis cookbook, but another would need redisio; one would install Ruby 1.9.3, while another would use 2.0.0. Sure, I could’ve forked them (and did, at first) to get them in line with each other, but then I’d just be setting us up to fall out of date again in the future.

Chef also fell short on the learning curve principle; I felt lost from the start looking through the existing repo we had, and the documentation never quite seemed to answer my questions. I was never clear when something applied to normal Chef vs. chef-solo, for instance – and all the examples I saw started with the full, relatively complicated hierarchical file structure that’s great for when you know what’s going on but rough when you see it unexplained.

Finally, Chef just seemed overly powerful for what I wanted – it’s very obviously Configuration Management, when I just needed a little provisioning tool. This also kept me from digging into Puppet too deeply, especially once I ran across Ansible.

Finding Ansible

I was looking for simpler alternatives to Chef and saw a link to Ansible’s documentation. Within a few minutes, I knew how to do the local provisioning I wanted (by making 127.0.0.1 the only entry in the hosts file) and saw how to start, simply, with a single playbook file. YAML, as much as I hate it as a serialization format (seriously, I hate it for that with a fiery passion) seemed perfectly suited to directives like this:

- name: Generate the Nginx configuration file
  copy: src=nginx.conf
        dest=/etc/nginx/nginx.conf

As I grew more comfortable with how Ansible worked, I started looking at more complicated directory structures and setups using roles, but the key was the ease with which I moved into it – every step was easy and made sense at the time, as opposed to just being dropped in the deep end.

The simplicity of the playbooks (and their direct correlation to the shell commands I’d run to set up the server manually) made it incredibly easy to write my own roles and reuse them for different server types, which made it trivial to keep the dependencies identical wherever possible.
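As a rough illustration of that reuse, a playbook for one server type might look something like the sketch below. The file name and the role names (`common`, `ruby`, `nginx`) are hypothetical, not the ones from my actual setup; it also assumes the inventory file contains only 127.0.0.1, as described above.

```yaml
# web.yml: hypothetical playbook for a web server.
# The inventory file ("hosts") holds a single line: 127.0.0.1
- hosts: 127.0.0.1
  connection: local   # run the tasks on this machine instead of over SSH
  roles:
    - common          # illustrative role: base packages and users
    - ruby            # illustrative role: shared by every server type that needs Ruby
    - nginx           # illustrative role: only on servers that serve web traffic
```

A playbook for a different server type would list `common` and `ruby` again and swap out `nginx`, so both types get Ruby from the same role. Running it locally would be something like `ansible-playbook -i hosts web.yml`.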

Wrapping up

I hope no one reads this and comes away thinking that I’m saying Ansible is objectively superior to Chef or Puppet. They’re all powerful tools – it’s just that I found Ansible to be the best fit for me, given my objectives and experience. Honestly, the more automation we can get in operations, the better, regardless of the tools used!

That said, if you’re looking to get started with all of this, I think Ansible is well worth a look.