Slashdot linked to an article on History’s worst software bugs. It’s a short and darkly entertaining read that makes us feel better about the bugs we’ve all shipped out to the world. We’d never make anything that bad, right?
Wrong. What separates the people in these stories from 90% of software developers is the context of the work. Few of us work on medical equipment, anti-lock brakes or nuclear weapon arming devices. We don’t work on things with the potential to kill or to cost hundreds of millions of dollars. For most of us, if we applied our everyday development practices to a mission-critical project, we’d make this list in no time. The difference between us and them isn’t skill: it’s domain.
Another problem with articles like this one is that they offer little to learn from. It’s too easy to laugh at how stupid the mistakes seem and gleefully return to writing code. These lists tend to give us unfounded confidence that our approaches to work make us immune to these kinds of mistakes, but that’s not true.
Worse, unlike other kinds of disasters, such as airplane crashes, medical errors, or building collapses, few of these worst bugs in history have publicly available analyses of why they happened and how we can avoid similar mistakes.
Learning from the past is not a strong part of practitioner software development culture, and it’s a shame, since we repeat the same mistakes again and again. Understanding landmark failures is an integral part of most engineering disciplines (see Petroski’s To Engineer Is Human: The Role of Failure in Successful Design). It is not yet part of software development culture, but it needs to be.
It’d be great for every CS major to study one of these software disasters and understand something about how failures really happen before they graduate and start building things as professionals. Back at CMU, the only course I took on engineering failures was in the humanities school (and it may have been the best course I’ve ever taken).
Here are some resources for learning from the mistakes of others:
Anyone have any references I should add? How do you learn from the failures of others? Your own?