Why .NET is our go-to development stack. We all have favourites but ours is also the best.
When we talk about a database, we mean the store of information that a software application connects with. Software will use a database to save information and to retrieve it later. For the applications we develop, we typically use Microsoft's SQL Server which is a relational database management system (RDBMS). An RDBMS provides the tools to structure units of information or entities and indicate relationships between them. For example, one entity might be a "supplier" and another entity could be "product". Using an RDBMS we can structure the information so that one supplier may have many products and one product may have many suppliers.
I personally regard the design of the database as the most important design step in software development. Here are some database design concepts that we follow, and an outline of why they are so important.
While we are presenting wireframes and mockups to the future users of a system, we're secretly designing the database model and validating it against the users' expectations. We absolutely need to get this bit right as the data model we design reflects of our understanding of the entities within a business and the relationships between them.
Once a system is up and running, making changes to the structure of a database can be very difficult. It takes careful planning and a lot can go wrong. The hardest part involves transforming existing data into the new model. Under time and cost constraints, this is where corners are cut and we end up with a clunky database with performance issues, redundancy and poor discoverability.
The data model becomes the cornerstone of the application. By following good database design principles, we ensure that the system remains scalable for future enhancements.
It's a religious argument, but we stick to a single, auto-generated key for all entities. This is not something that the user is aware of, they might still use unique Job Codes to identify job entities, but behind the scenes there will be a Job ID. We use the these generated IDs to model the relationships between entities. This convention means we can leverage this assumption to write code faster, and it makes the data model much more scalable when requirements change.
Imagine you have a system which has two places to store your contact information, once in "leads" then another in "sales". This is a problem because if you update one the other stays the same, and now you have information out of sync. Instead, it's better to design the database with "contacts" in one place and *relate* them to leads and sales entities. The formal process to avoid redundancy is called normalisation and we follow this process for all the databases we design.
With a well designed database, we impose constraints to ensure that the data is never in an invalid state. Imagine having a maths class without a teacher. With an RDBMS we can enforce this relationship and make sure this doesn't happen.
There are however some constraints that we avoid at the database level. There is often a temptation to implement business rules in the database, especially when there are multiple applications connecting to it. One example of this would be to make sure that emails are valid. Instead of trying to build this into the database, we would look at writing an API layer that multiple applications could use, and this would enforce the application-level constraints like this.
Say you have a recipe book and you want to know all the recipes which have carrots. It would take a while to go through the ingredients lists for every recipe. Luckily, most cookbooks have an ingredient index at the back. Databases work like this too. We can create indexes for bits of information which we then use to make lookups faster. Too many indexes can make saving information slow, so it's important that we strike a good balance. We create indexes optimised for the specific operations we are running and monitor the database for inefficiencies.
I know I have mentioned avoiding data redundancy, but sometimes performance can be an issue. This can be made obvious when viewing dashboards or generating reports, which might involve joining and summarising information across many entities. If it means we can speed things up for the user, we will look to introduce some data redundancy, often by caching data on a schedule. It's important that we don't optimise early in this way. We would make this decision based on the results of performance testing and well after we've modelled the database on the business.
Any well designed software system should aim to be self-describing, a database especially so. We apply consistent standards for the whole database so that there are no surprises for any developers looking to get their hands dirty. We avoid abbreviations and prefixing the names of things. Abbreviations lend themselves to hiding a thing's purpose and also breed inconsistency. They increase the cognitive load of reading database queries, and they flow through to bad naming conventions in code, especially where code generation tools are used (ORMs).
The names in the database should also reflect the names in the UI. Building a CRM with "Opportunities"? Don't call them "Leads" in the database.
Discoverability is especially important for any developers who wants to integrate with a database in the future. The structure of the data should speak for itself, and this makes sure a system can work for our users for years to come.
Hopefully I have done well enough to avoid jargon and make clear certain database concepts and why they are important. This is a topic we are very passionate about and if you're interested, we would love to talk about it more.
Check out some more articles below.