The basics of designing a database

Peter Brawley and Arthur Fuller

It's a 10-step:

1. The Requirement

Write down everything the application has to do, down to the last detail. If multiple apps will use the database, include everything that all the apps have to do. This first list needn't be organised by sequence or topic at this stage. It just has to be complete.

2. Elements

Make a working copy of the Requirement, and in it underline every noun relevant to the requirement.

3. Entities

Group the Elements from Step 2 into lists of attributes that belong together, so each list defines exactly one kind of thing that the app will have to deal with—a customer, a book, a widget &c—and no element is itself a list. A customer has a first name, last name, SSN, home phone and so on. An order has a customer reference, a date, a shipping method, perhaps a memo text field. An order item has an order reference, an inventory reference, a quantity and a base price. A book has a title, ISBN number, publisher, publication date &c. A widget has a name, a description, perhaps a patent number. 

4. Tables

Turn the Entities from Step 3 into table specifications:

A: Remove as much redundancy as possible. If you are writing an order entry app, the core data will be...

customers
  orders
    orderitems


If your app is to manage classes and instructors across multiple colleges in multiple counties, the core data will fall into a tree like this:

county
  college
    school
      department
        course
          class section
            class datetime
              instructor


If you look at the tree top-down, each entity that is not a leaf has a parent-child or has-a relationship to the next entity below it: a county has colleges, which have schools, etc. This is a 1:many relationship. If you look at the tree bottom-up, each entity except the root has a child-parent or belongs-to relationship to the entity above it: a course belongs to a department, which belongs to a school, etc. This is a many:1 relationship.

B: Turn each entity into a table specification with a name, and give the table an integer ID column to serve as its primary key so, for example, the system can distinguish two people named Jane Smith. In theory, we could combine several columns into a unique identifier, but that is inherently risky; for example if the key is (lastname,firstname,SSN), Jane's primary key changes when she marries. It is much simpler to use an arbitrary number to identify the rows in each table. Also, in some countries including Canada, certain unique identifiers such as Social Insurance Number legally can be used only for purposes relating to the government.

C: Make each table self-consistent. For example it seems obvious that a customer has an address. Unless you deal with the homeless, that's usually the case. It seems natural then to give the customer table some address columns and leave it at that. 

There are two problems with this. First, changing address does not change the customer; they're two different sorts of things. Second, many people have more than one address (address during the academic year, summer address, and so on---not to mention affluent customers who have homes in several countries). In database-speak, the relationship from customers to addresses is 1:many. Then a customer table with address columns becomes a customer table and an address table with a customer foreign key. Likewise an orders table with order item columns becomes an orders table and an order items table with an orders foreign key. Simpler, less redundant, more logical.

D: Give each child table a column that references its parent table, to automate the database logic that keeps these relationships intact. Such a column is a foreign key. Do not carry IDs up to grandparents; that would introduce redundancy, thus complicating data management and queries.

E: Configure lookup tables not covered by steps C and D: A child can have multiple parents. Any table in the chain may belong to a table that is not in a main chain. For example counties, schools and departments may each have a computerservicesID columns that points to a computer services table, and a staffID column pointing to a staff table:

CREATE TABLE school (
  ID int auto_increment primary key,
  name varchar(50) not null Default 'Not known', 
  universityID int not null references university(ID),
  computerservicesID int not null references computerservices(ID),
  deanstaffID int not null references staff(ID),
  -- ...
)

Thus many schools can share a computer service, staff can work in multiple schools, and so on. The diagram becomes more interesting:

  state <------------------+
     |                     |
    university <----- computerservices
        |                    |    |
      school <---------------+    |
          |                       |
        department <--------------+
           |
          etc 
F: Add bridge tables for many-many relationships: Each student takes multiple tests, and each test is taken by multiple students; that is a many:many relationship. To implement it, you need a bridge table, a students_testresults table which has its own primary key column, plus a foreign key column pointing to a students row, plus a foreign key column pointing to a particular testresults row. 

Steps 4A through 4F bring your database to Third Normal Form, roughly. We can get rid of yet more redundancy, for example by creating a table SchoolNames, and replacing the name column in schools with a schoolnameID that points to the schoolnames row for that name. Now no name is stored twice. Whether to take normalisation that far depends on how important a particular redundancy is for queries, and for available storage space.

5. Actions

In the Requirement (Step 1), underline all the verbs to get a first outline of what actions will have to be performed on the Tables. Then organise those actions into query specifications (eg a shipping clerk begins by looking up an order by date, customer or order number).

6. Build the database from the normalised model

It is no longer necessary to do this by hand—your data modelling tool can often do this with one button click. A terrific open source tool for diagrammiing a database from requirements is the Azzurri Clay modeller for Eclipse; see "Eclipse, an elegant MySQL IDE" at http://www.artfulsoftware.com/infotree/mysqltips.php. An alternative is MySQL Workbench.

7. Seed the database with data

Insert test data into the database.

8. Queries

Translate every Action from Step 5 into a query. Notice that this includes all required reports! Test each query.

9. Test, loop and revise as necessary

As you proceed, loop back as far as need be to plug holes and correct errors.

10. Done

Write no application code till you arrive here.

For more, see ...
Database design
Normalisation
Database design guide
Database resources on the web
Relational databases
Ten common database design mistakes
Writing a specification

Back to Artful ...