Peter Brawley, Arthur Fuller
A database design specifies ...
Getting there is a 10-step ...
1 The Raw Requirement
List everything the application has to do, down to the last detail. Note that this must include
not just specifications of data; it also musty specify who will be accessing what data and who is allowed to
do what with such data. If multiple apps will use the database, include everything that all the apps have to do.
This first list needn't yet be organised by sequence or topic. It just has to be complete.
2 The Elements
Make a working copy of the Requirement. In it underline or otherwise mark every noun relevant to the requirement.
3 The Entities
Group the Elements from Step 2 into lists of attributes that belong together, such that ...
If you are using an ERD tool, how much table design you can get done in that tool depends on the its capabilities, but there'll always be much doubling back and forth amongst requirements and mapping documentation, database code, and the ERD tool.
4a: Remove as much redundancy as possible. If you are writing an order entry app,
the core data will include ...
customers
orders
orderitems
If your app is to manage classes and instructors across multiple colleges in multiple counties, the core
data will fall into a tree like this:
county
college
school
department
course
class section
class datetime
instructor
If you look at such a tree top-down, each entity that isn't a leaf has a parent-child or has-a
relationship to the next entity below it: a county has colleges, which have
schools, etc. That's a 1:many relationship. If you look at the tree bottom-up, each
entity except the root has a child-parent or belongs-to relationship to the entity above it:
a course belongs to a department, which belongs to a school, etc. This is a
many:1 relationship.
4b: Turn each entity into a table definition with a table name, a list of required
entity attributes as columns, and a primary key definition that guarantees row
uniqueness and if possible is world-independent (e.g., not a value generated by a
government agency). No, a person's or firm's name won't cut it as a primary key—how many
Jane Smiths are there in the world? If the key is (lastname,firstname,SSN)
,
Jane's primary key may change when she marries, messing up table relationships that
depend on that key value. Note too that in some countries including Canada,
certain external unique identifiers such as Social Insurance Number legally
can be used only for purposes relating to the government.
Sometimes there's a combination of unique never-changing real-world column values
that will serve as an adequate primary key. More often the safest bet is a
"surrogate" key auto_increment
integer ID column. Using such a
unique auto-generated arbitrary number to identify all table rows
guarantees row uniqueness and removes the need ever to edit such primary key values.
In MySQL, declare such keys integer_type unsigned primary key auto_increment
where integer_type is one of ...
4c: Make each table self-consistent. For example it seems obvious that a customer has an address. Unless the app will have homeless customers, that's usually the case. It seems natural then to give the customer table some address columns and leave it at that.
Combining addresses with other identifying info can be problematic. Changing an address doesn't change the customer; they're two different sorts of things. Many people and organisations have mutliple addresses (branch offices; school; summer, a particular job or project &c), not to mention the very affluent who have homes in several countries.
In database-speak, the relationship from customers to addresses is 1:many, so in many cases the customer table with address columns needs to become two tables: a customer table and an address table with a customer foreign key. Likewise an orders table with order item columns becomes an orders table and an order items table with an orders foreign key. Simpler, less redundant, more reportable, more maintainable.
4d: Give each child table a column that references its parent table,
to automate the database logic that keeps these table relationships intact. Such a column
is a foreign key. Do not carry IDs up to grandparents; that would introduce redundancy,
complicating data management and queries.
4e: Configure lookup tables not covered by steps 4c and 4d. A child can have multiple
parents. Any table in the chain may also belong to a table that's not in a main chain. For
example counties, schools and departments may each have a computerservicesID
column
that points to a computer services table, and a staffID
column pointing to a staff
table:
CREATE TABLE school (
ID int auto_increment primary key,
name varchar(50) not null Default 'Not known',
universityID int not null references university(ID),
computerservicesID int not null references computerservices(ID),
deanstaffID int not null references staff(ID),
...
)
That's how we model schools that share a computer service, staff who work in multiple schools &c.
The diagram begins to get more interesting ...
state <------------------+
| |
university <----- computerservices
| | |
school <---------------+ |
| |
department <--------------+
|
etc
4f: Add bridge tables for many-many relationships: Students take multiple tests
and each test is taken by multiple students; that's a many:many relationship. To
implement it, you need a bridge table, a students_testresults
table that has
its own primary key column, plus a foreign key column pointing to a students
table row,
and a foreign key column pointing to a particular testresults
table row.
Steps 4a through 4f bring your database to Third Normal Form,
roughly. We can get rid of yet more redundancy, e.g., by creating a schoolnames
table, replacing the name
column in schools
with a schoolnameID
column that points to the schoolnames
table row for that name. Now no name is stored twice.
Whether to take normalisation that far depends on how important the redundancy is for queries, and for
available storage space. Third Normal Form usually suffices.
5 The Actions
In the raw requirement (Step 1), underline or otherwise mark all the verbs to get a first
outline of what actions will have to be performed on the Tables. Then organise those actions into
query specifications (e.g., a shipping clerk begins by looking up an order by date, customer or
order number...).
6 Build the database from the normalised model
If you are using the ERD tool in MySQL WorkBench, a button click gets it done.
It's the first test of your model: does it parse, can your RDBMS actually buold the
database you defined? Inevitably, you'll find errors, inconsistencies, omissions. Rinse and repeat.
7 Seed the database with data
Populate your lookup tables, then insert test data into the database.
8 The Queries
This is the second real test of your database. Translate every Action from Step 5 into a query.
Notice that this includes all anticipated inputs and reports!
9 Test loop and revise as need be
You'll find that Step 8 uncovers some aspects of the database model that need touching up.
Loop back as far as need be to plug holes and correct errors.
10 Done
Write no application code till you arrive here.
For more, see ...