It’s time to kill the relational database model. Well not the relational model per se, but rather the practice of building systems by developing relational schemas first.
This is quite the revelation for me. When first I discovered relational database design over three decades ago I thought it was religion. The simplicity of tables and relationships represented freedom from the rigid hierarchical structures of the day. Since then, relational design and its’ normal form remained a constant foundation for the systems I constructed.
But during a recent project I tired of the burden that relational design placed on our development effort. In particular, a tight adherence to rules of normalization and relational philosophy stifled our opportunity for nimble software engineering. What on paper looked like a method of enforcing integrity became complex morass of surrogate keys and multiple joins.
Fundamentally my complaint is over the complexity that normalization brings; an odd turn considering that relational databases were popularized on simplicity of tables. Frankly, though, there is nothing intuitive about joining tables; and the more joins the more complexity.
While relational databases are not (and should not) go away, there are flaws with some relational theory that makes writing software difficult. For example…
- Normalization produces a lot of tables. A lot of tables translate into a lot of joins. A lot of joins is a red-flag for complexity. When queries frequently have three or more joins, the schema is probably overly complex.
- Relational theorists discourage use of null foreign keys. The only way to accomplish this goal is to introduce a link table for the one to many relationships in the model. This introduces two problems, first, the extra table adds an additional (and unnecessary join). And second, it is harder to determine if a row in one table is related to a row in the other.
- Using data to determine state (or status) introduces complexity. While not necessarily a tenant of relational models, some designers prefer to compute the state of an item on-the-fly based on data in the tables. This is easy enough when all the data needed to determine the state are stored on the same table row. Unfortunately it is frequently necessary to scan additional tables on multiple rows to calculate the status.
A better method of system design starts with a design of objects. The database schema, then, is modeled to keep these objects intact. In this way, the database becomes nothing more than the repository for data at rest. The validity of the data is managed by methods within the objects and not by referential integrity or database triggers. Think of this as object first, data last design; there OFDL, I just coined a new acronym.