Homework: Normalizing the Open Source Shakespeare Database

For this assignment, you will analyze the schema of the the Open Source Shakespeare database and bring it into 3NF. We will not work with the wordform table.

To begin, examine the work, character, character_work, chapter, and paragraph tables (all tables except wordform) and state the higest normal form (up to 3NF) that the table satisfies. (For example, if you state 2NF, you are indicating that the table is not in 3NF). Explain how the table fails to satisfy the requirements of the lowest normal form that it fails. Consider issues of data redundancy, modification anomalies, and functional dependencies.

Then modify all tables in the Shakespeare ERD to bring them into 3NF. You may access a LucidChart ERD for the current schema. Duplicate this in your own account, and modify your copy.

You should color code your objects to make clear what is new and what is old:

  • You may find that you need to add tables not in the schema. If you do so, apply a green fill to any new tables. Also use green lines for any new relationships, and green text for any new columns in existing tables.
  • You may find that you need to delete columns, relationships, or entire tables. Apply a red color to the text, lines, or fill of any columns, relationships, or tables that you intend to delete.
  • You may find that some columns can be calculated based on other data in the tables. For example, we saw in a previous assignment that each character’s speech count can be calculated by counting rows in the paragraph table. Therefore, the speechcount column in the character table is superfluous and could be deleted, and a view (stored query) created that calculates the value when needed. Apply a purple color to the names of columns that you could calculate instead of store.

Indicate primary keys and foreign keys in all tables with “PK” and “FK”. Remember that a column can be both a PK and FK. Make sure that relationship lines physically connect attach to the columns they reference. Have all foreign keys reference a single primary key column. To facilitate this, all tables except link tables should have an atomic primary key. Link tables will still require a composite primary key to facilitate a many-to-many relationships between two other entities.

Deliverables

Submit one PDF with, first, an image of the new ERD, followed by your discussion of the normal form that each of the original tables satisfied before your modifications. The ERD can be downloaded from Lucidchart in JPG or PNG format and embedded in a word processor document (Google Docs, LibreOffice Writer, MS Word). Then download the document as PDF to submit.