Ben Scofield

… rarely updated

08 Sep 2009

Polyglot persistence

Polyglot programming was everywhere in the (developer-centric) news - what, last year? It's the idea that good developers should be competent, if not fluent, in a number of different programming languages, because different languages are good for different things. Perl, for instance, was built for string manipulation and report generation, while Java was... not. The programmer who knows both, then, is better able to handle a wide variety of problems quickly and efficiently. Of course, anything you can do in one Turing-complete programming language, you can technically do in another - but that doesn't mean it'll be easy.

Over the last few months, I think we've seen the emergence of a similar movement in another aspect of the development world: databases. There's been a metric ton of publicity for CouchDB, Redis, and other alternatives to the traditional relational database - many of which fall under the moniker NoSQL (or "post-relational," perhaps). I think that many in the NoSQL crowd either fail to recognize or fail to properly describe the fact that their preferred databases don't replace applications like MySQL and Postgres, just as Ruby doesn't replace Java. Instead, these new options for persistence just work better for some domains (and worse for others).

That, in fact, is the subject of a talk I've been giving at various events this year*. In the talk, I explore a couple of problem domains - biological taxonomy and the comic book market - and show how neither maps cleanly onto a traditional relational database schema. I then describe some of the major categories of alternatives (key-value stores, document-oriented databases, and graph databases), and show how they work better for these particular domains.

My favorite part of the talk, however - and my favorite aspect of the NoSQL movement as a whole - comes at the end, when I describe a blended system. Many applications may require a non-traditional data store (say, something like MongoDB) for their core domain, but have other features that fit perfectly into a relational database - say, a CMS that relies heavily on custom fields and has a traditional user management system. Just as polyglot programmers may use multiple languages in a single application, I think the future of the web is polyglot persistence: we should use the database that best represents our domain, even if that requires several distinct systems within a single application.

Of course, this raises new problems - for instance, scaling multiple database systems to expand a single application - but these seem far from insurmountable, and the gains of the approach appear (to me, at least) substantial. It's clear from the explosion of alternative databases that this is an area that will continue to grow, and the only viable way to deal with that growth is to take its best parts and integrate them with the best that the current ecosystem has to offer.
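
To make the blended-system idea concrete, here is a minimal sketch of the CMS example: users live in a relational table, while pages with arbitrary custom fields live in a document store. This is an illustration under stated assumptions, not code from the talk. The `DocumentStore` class is a hypothetical in-memory stand-in for a document database such as MongoDB (it is not a real client library), and the `Cms` class, its method names, and the sample data are likewise invented for the example. The relational side uses Python's standard `sqlite3` module.

```python
import sqlite3
import uuid


class DocumentStore:
    """Hypothetical in-memory stand-in for a document database."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc):
        # Documents are schemaless: any set of fields is accepted.
        doc_id = uuid.uuid4().hex
        self._docs[doc_id] = dict(doc, _id=doc_id)
        return doc_id

    def find(self, **criteria):
        # Return every document whose fields match all given criteria.
        return [d for d in self._docs.values()
                if all(d.get(k) == v for k, v in criteria.items())]


class Cms:
    """Hypothetical CMS that routes each kind of data to the store that fits it."""

    def __init__(self):
        # Relational store: fixed schema, uniqueness constraint on email.
        self.sql = sqlite3.connect(":memory:")
        self.sql.execute(
            "CREATE TABLE users ("
            "id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL, name TEXT NOT NULL)"
        )
        # Document store: pages carry whatever custom fields they need.
        self.docs = DocumentStore()

    def add_user(self, email, name):
        cur = self.sql.execute(
            "INSERT INTO users (email, name) VALUES (?, ?)", (email, name)
        )
        self.sql.commit()
        return cur.lastrowid

    def add_page(self, author_id, title, **custom_fields):
        return self.docs.insert(
            {"author_id": author_id, "title": title, **custom_fields}
        )

    def pages_with_authors(self, **criteria):
        # The "join" across the two stores happens in application code.
        results = []
        for page in self.docs.find(**criteria):
            row = self.sql.execute(
                "SELECT name FROM users WHERE id = ?", (page["author_id"],)
            ).fetchone()
            results.append((page["title"], row[0] if row else None))
        return results


cms = Cms()
author = cms.add_user("ben@example.com", "Ben")
cms.add_page(author, "Issue #1", publisher="Marvel", variant_cover=True)
cms.add_page(author, "About", layout="wide")
print(cms.pages_with_authors(publisher="Marvel"))
```

Note that the cross-store lookup in `pages_with_authors` is done by the application rather than by either database, which is one small instance of the extra coordination work that comes with running several persistence systems side by side.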

* Repeating a talk is fairly rare for me, but in this case it seems particularly appropriate - the state of the database landscape is in such flux that I get to revise parts of my talk fairly often, so the version I give at, say, WindyCityRails this weekend will be noticeably different from the one I gave at Developer Day Boston in August.