Chapter 8, Databases

What is a database?


What are databases used for -- what would your life be like without databases? Have you...

If you can answer yes to ANY of these questions, you have interacted with databases that have information about you!

Basic concepts and terminology

Why Spreadsheets are not enough

Q: If databases are so much like spreadsheets, why not just use spreadsheets?

A: When the collection of information is complex, you run into problems with maintaining consistency and storing redundant information.

Example: Information for course registration, spreadsheet style
NameEmailSectionRoom
Mary Lambml@oc.edu1040Towers 114
Mary Lambml@oc.edu2915Roush 213
Mary Lambml@oc.edu3210Towers 110
Joe Klinejk@oc.edu2915Roush 213
Joe Klinejk@oc.edu1040Towers 114
Joe Klinejk@oc.edu5678Battelle 210
. . .. . .. . .. . .

Each row represents one student registered for one section. In reality, more data are required for each registration but this will illustrate.

Databases, and relational databases in particular, will address these problems. See modified example below.


Relational databases

Example: Reconfigure the registration information above into a relational database

Students
IDNameEmail
109999Mary Lambml@oc.edu
108888Joe Klinejk@oc.edu
. . .. . .. . .
         
Registrations
IDSection
1099991040
1099992915
1099993210
1088882915
1088881040
1088885678
. . .. . .
         
Sections
SectionRoom
1040Towers 114
2915Roush 213
3210Towers 110
5678Battelle 210
. . .. . .

There are now three tables. The ID field is key field for the Students table and the Section field is key field for the Sections table. Each row in the Registrations table represents one student registered for one section.

Given a record in the Registrations table:

If the student name changes, only one record needs to be changed! If the room assignment changes, only one record needs to be changed! Data are consistent because there is only one copy. And the duplication problem also disappears!

Ordering Records

Finding data

Search Engine Queries

SQL: a language for forming database queries


Microsoft Access as illustration

Here is a view of the Northwind database, with the list of tables shown.

You can get an idea of the nature of Northwind just by reading the names of the various tables.  This is a relational database, and the relationships are shown in this illustration.  The list of identifiers in each box is the list of fields in that table, and the lines represent relationships.


 
 

Databases and privacy

The textbook briefly describes a plethora of database-related privacy horror-stores, most caused by criminals creating or stealing an "identity" or mistakes made by credit bureaus.

What are some of the privacy concerns related with databases?

Did you know that the U.S. Constitution does not explicitly address the right to privacy?

Do you feel the greater privacy threat is government, or the private sector?

Sometimes it is possible to uniquely identify you and reveal information about you by combining different pieces of information gleaned from multiple unrelated databases.  This is called record matching.  Geneologists have long depended on this technique to learn about their ancestors. The government depends on this technique to identify potential terrorists.

Sometimes even summaries of anonymous survey results can uniquely identify you.  For example, my previous university compiles and publishes an annual report based on faculty survey information.  One of the summary categories is average salary by age group (20-29, 30-39, ..., 70-above).  There is one professor in the "70 - above" age group.  The survey is not anonymous for him.