The Problem with Data Modeling (as commonly practiced)


Every mature MVC application has that one model that’s grown out of control.  The table has 20 columns; there are user preferences stored in it alongside system information; it sends email notifications and writes other models to the database.  The model encompasses so much application logic that any new feature is likely to have to go through it.  It’s the class that makes developers groan whenever they open it up.

How did we get here?

Web applications usually start out with a single purpose, to display a single type of data – maybe it’s Article for an online journal or ClothingItem for a fashion retailer.  It’s common for MVC practitioners to take concepts from their product whiteboarding sessions and directly translate them into database models.  So we start out with a database model that represents a central concept in the application, and as more business requirements emerge, the cheapest way to accommodate them is to add columns to the existing model.  Carry this out over time and you end up with a God Object.
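
To make that drift concrete, here is a hypothetical sketch in Python of what an Article model might look like after a few rounds of "just add a column."  Every field and method name below is invented for illustration, and a plain dataclass stands in for whatever ORM model the application would really use.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Article:
    # The original purpose: display an article.
    title: str
    body: str

    # Added later, one "cheap" column at a time (all names hypothetical).
    author_email: str = ""
    author_wants_comment_emails: bool = True  # a user preference
    search_index_version: int = 0             # system bookkeeping
    is_paywalled: bool = False                # a billing concern
    comments: List[str] = field(default_factory=list)
    outbox: List[str] = field(default_factory=list)  # stand-in for a mailer

    def add_comment(self, text: str) -> None:
        # Comment handling now also decides whether to send email.
        self.comments.append(text)
        if self.author_wants_comment_emails and self.author_email:
            self.outbox.append(
                f"To {self.author_email}: new comment on {self.title!r}"
            )

    def visible_body(self, reader_is_subscriber: bool) -> str:
        # Display logic is now entangled with billing rules.
        if self.is_paywalled and not reader_is_subscriber:
            return self.body[:20] + "..."
        return self.body
```

None of these additions is unreasonable on its own; the trouble is that preferences, bookkeeping, billing, and notifications now all live in the one class that every feature touches.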

The problem from the start was taking the casual, loose concepts sketched out at the product level and putting them in the codebase.  Software engineers should be well aware that the concepts people use in everyday life and thinking are terribly imprecise and loaded with implicit assumptions.  Highlighting implicit assumptions is often a software engineer’s key contribution, so it’s a wonder we take these concepts from the product level and embed them in our code unexamined.  Doing so all but guarantees that hidden edge cases will surface later, and that the logic to clarify them will get shoved into the existing class.

The concept of Folk Psychology is illuminating here.  Folk Psychology refers to the innate, loosely specified theories people hold about how other humans operate, which they use to infer motivations and predict behavior.  These “folk” theories work well enough in the context of everyday human life, but are not scientifically rigorous and contain blind spots.  Similarly, people make use of “folk object models” in software businesses.  These are the informal concepts people construct to discuss software with other humans – the words product managers use with software engineers, the boxes drawn on the whiteboard.  They work well enough when discussing concepts with other humans, who can be generous in their interpretations, but can fall apart when formalized as code.  These concepts are a useful starting point to frame the product features, but from an OO perspective are too broad to be used as classes.  They tend to accumulate logic since they implicitly encompass so much of the problem domain.

Much as the first obstacle people have to overcome when learning to code is to take their thoughts and explicitly formulate them as steps in an algorithm, experienced software engineers need to take folk object models and break them down into explicit components that can be used as classes.  In the product domain, we may start with a broad “User” concept, but as we dig deeper we’ll discover different pieces of logic that would be better served as separate classes: a billing preference, a current status, or notification settings.  Each of those will require its own logic to meet product requirements, and if we don’t separate them out to make space for the logic, we’ll incur bloat.
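
As a rough illustration of that decomposition, the sketch below splits a broad “User” into the three components named above.  It is one possible shape under stated assumptions, not a prescribed design: the class names, fields, and rules (for example, that a plan other than "free" counts as paid) are all hypothetical, and plain Python dataclasses stand in for real database models.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class AccountStatus(Enum):
    # Hypothetical states; a real product would define its own.
    ACTIVE = "active"
    SUSPENDED = "suspended"


@dataclass
class BillingPreference:
    plan: str = "free"

    def is_paid(self) -> bool:
        # Billing rules live with billing data.
        return self.plan != "free"


@dataclass
class NotificationSettings:
    email_enabled: bool = True
    muted_topics: Set[str] = field(default_factory=set)

    def should_notify(self, topic: str) -> bool:
        # Notification rules live with notification data.
        return self.email_enabled and topic not in self.muted_topics


@dataclass
class User:
    # The User keeps identity and delegates everything else.
    email: str
    status: AccountStatus = AccountStatus.ACTIVE
    billing: BillingPreference = field(default_factory=BillingPreference)
    notifications: NotificationSettings = field(
        default_factory=NotificationSettings
    )

    def can_log_in(self) -> bool:
        return self.status is AccountStatus.ACTIVE


# Usage: each question is answered by the component that owns it.
user = User(email="reader@example.com")
user.notifications.muted_topics.add("marketing")
print(user.can_log_in())                               # True
print(user.billing.is_paid())                          # False
print(user.notifications.should_notify("marketing"))   # False
```

When a new requirement arrives, say a rule about which topics can be muted, it has an obvious home in NotificationSettings rather than another method on User.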

People often think that data modeling is about encoding the business concepts in software, but really it’s about using model classes as tools to construct a system.  Often codebases are better served when large models are broken into components that each address a specific piece of domain logic.

This article was written by Alex Kudlick.