Databases and SQL
Creating and Modifying Data
Learning Objectives
- Write statements that create tables.
- Write statements to insert, modify, and delete records.
So far we have only looked at how to get information out of a database, both because it’s the operation you’ll do most frequenlty and because most other operations only make sense once queries are understood. If we want to create and modify data, we need to know two other sets of commands.
The first pair are CREATE TABLE
and DROP TABLE
. While they are written as two words, they are actually single commands. The first one creates a new table and its arguments are the names and types of the table’s columns. For example, the following statement creates a new table in our survey database:
CREATE TABLE SecondSurvey(taken INTEGER, person TEXT, quant TEXT, reading REAL);
We can get rid of one of our tables using:
DROP TABLE SecondSurvey;
Be very careful when doing this: most databases have some support for undoing changes, but it’s better not to have to rely on it.
Different database systems support different data types for table columns, but most use the following:
data type | use |
---|---|
INTEGER | a signed integer |
REAL | a floating point number |
TEXT | a character string |
BLOB | a “binary large object”, such as an image |
Most databases also support Booleans and date/time values; SQLite uses the integers 0 and 1 for Booleans, and represents the dates as discussed earlier. An increasing number of databases also support geographic data types such as latitude and longitude. Keeping track of what particular systems do or do not offer, and what names they give different data types, is an unending portability headache.
When we create a table, we can specify several kinds of constraints on its columns. For example, a better definition for the SecondSurvey
table would be:
CREATE TABLE SecondSurvey(
taken integer not null, -- where reading taken
person text, -- may not know who took it
quant text not null, -- the quantity measured
reading real not null, -- the actual reading
primary key(taken, quant),
foreign key(taken) references Visited(ident),
foreign key(person) references Person(ident)
);
Once again, exactly what constraints are available and what they’re called depends on which database manager we are using.
Once tables have been created, we can add, change, and remove records using our other set of commands, INSERT
, UPDATE
, and DELETE
.
The simplest form of INSERT
statement lists values in order:
CREATE TABLE JustLatLong(lat text, long text);
INSERT INTO JustLatLong values(-49.85, -128.57);
INSERT INTO JustLatLong values(-47.15, -126.72);
INSERT INTO JustLatLong values(-48.87, -123.40);
We can also insert values into one table directly from another:
INSERT INTO JustLatLong SELECT lat, long FROM Site;
Modifying existing records is done using the UPDATE
statement. To do this we tell the database which table we want to update, what we want to change the values to for any or all of the fields, and under what conditions we should update the values.
For example, if we made a mistake when entering the lat and long values of the last INSERT
statement above:
UPDATE Site SET lat=-47.87, long=-122.40 WHERE name='MSK-4';
Be careful to not forget the where
clause or the update statement will modify all of the records in the database.
Deleting records can be a bit trickier, because we have to ensure that the database remains internally consistent. If all we care about is a single table, we can use the DELETE
command with a WHERE
clause that matches the records we want to discard. For example, once we realize that Frank Danforth didn’t take any measurements, we can remove him from the Person
table like this:
DELETE FROM Person WHERE ident = 'danforth';
But what if we removed Anderson Lake instead? Our Survey
table would still contain seven records of measurements he’d taken, but that’s never supposed to happen: Survey.person
is a foreign key into the Person
table, and all our queries assume there will be a row in the latter matching every value in the former.
This problem is called referential integrity: we need to ensure that all references between tables can always be resolved correctly. One way to do this is to delete all the records that use 'lake'
as a foreign key before deleting the record that uses it as a primary key. If our database manager supports it, we can automate this using cascading delete. However, this technique is outside the scope of this chapter.
Replacing NULL
Write an SQL statement to replace all uses of null
in Survey.person
with the string 'unknown'
.
Backing Up with SQL
SQLite has several administrative commands that aren’t part of the SQL standard. One of them is .dump
, which prints the SQL commands needed to re-create the database. Another is .load
, which reads a file created by .dump
and restores the database. A colleague of yours thinks that storing dump files (which are text) in version control is a good way to track and manage changes to the database. What are the pros and cons of this approach? (Hint: records aren’t stored in any particular order.)