Published on ONLamp.com (http://www.onlamp.com/)
See this if you're having trouble printing code examples

## Introducing SQL Sets

08/02/2001

SQL has many language elements that are not commonly introduced in quick surveys of the language. Because SQL is based on relational algebra, it should come as no surprise that the standard contains several set-oriented operators.

In the interest of completeness, and to provide a tool that is extremely useful in dealing with heterogeneous data, it's time we take a look at the `UNION` operator.

### SQL and sets

If you were to dredge up the memories of middle school, you'll probably remember vague notions of learning about sets. If you have ever studied logic, you may have even worked with Venn diagrams.

Sets are simply groups of discrete elements. In the strict mathematical world, each set is made up of unique numerical elements; in the SQL world, a set is a collection of rows of data that may or may not contain duplicates. You've actually been working with SQL sets for quite a while -- queries are simply a set (strictly speaking, a subset) of a database table.

There are three simple ways to compare sets:

• creating a union of two (or more) sets;
• generating the intersection between two sets; and
• finding the exceptions (or differences) between two sets.

While all of these functions are defined in the SQL standard (at least in the SQL-92 standard), the exact implementation varies widely by vendor. Many database engines leave out some or all of these functions because they are not used as frequently as other commands. Or vendors may omit them because the function of the operators can be recreated using SQL and/or the database engine's programming language. In rough order of implementation, most databases allow the `UNION` operator, some provide the `INTERSECT` operator, and a few provide the functionality of the `EXCEPT` operator using a different operator name (for example, Oracle's `MINUS` operator).

### Using SQL sets

I'm sure that some of you may be wondering why I dedicated a column to this dusty corner of the SQL world. Fear not! There's a very good reason to learn about SQL sets, particularly the `UNION` operator.

Working with sets provides the flexibility to manipulate two tables that otherwise have no relationship -- where joins are not possible or useful. Let's start with an simple example. You have a database with a table of employees, and a table of suppliers.

 Employees EmployeeID EmployeeName Address ZIP 61 Sue Smith 100 South St. 12345 62 David Jones 2525 1st St. 12345

 Suppliers SupplierID Contact Address ZIP A1-674 Troy Parker 1100 Main St. 23456 ZZ-1A1 Claire Smith-Jones 400 East Main, Apt 5 56789

If you were sending Christmas cards to all suppliers and employees, you could certainly use a single table of name, address, and ZIP code data that could be the data source for a mail merge. Of course, you could create a new table, execute two SQL statements to insert the appropriate columns, and call it a day.

 SQL in a Nutshell By Kevin Kline with Daniel Kline, Ph.D. January 2001 1-56592-744-3, Order Number: 7443 224 pages, \$29.95

But what if this is an activity that you need to repeat frequently? Or what if the dataset is large enough that it is impractical to create a new database table? Or what if you simply don't have the permissions to do anything other than `SELECT` queries in your database? A `UNION` query is the answer!

The syntax for a `UNION` query is very straightforward:

`query 1 UNION [ALL] query2 [ORDER BY sort_order]`

The optional `ALL` keyword indicates the `UNION` should include all duplicates which would be ignored by default. So to create our table of addresses, we can use the following SQL statement

```SELECT Contact AS Name, Address, ZIP FROM Suppliers UNION SELECT EmployeeName AS Name, Address, ZIP FROM Employees```

which would produce a table something like the following:

 Name Address ZIP Sue Smith 100 South St. 12345 David Jones 2525 1st St. 12345 Troy Parker 1100 Main St. 23456 Claire Smith-Jones 400 East Main, Apt 5 56789

This doesn't seem like such a big deal. But what if the data tables were somewhat more heterogeneous? One interesting thing about the `UNION` operator is that in general, any two queries can be joined as long as they have the same number of columns. Note that the data types don't have to be the same. That's pretty useful!

In the past, we've discussed using one-to-one tables to store specific data such as information about a music CD or a book that is linked to the table of common data about items in a store. If you instead chose to create a table of CDs and a table of books, you could create a SQL statement like the following that will generate a single result set consisting heterogeneous data:

```SELECT CatalogID, Price, Description, PlayingTime AS Custom1, NumOfTracks AS Custom2, Artist AS Custom3, Label AS Custom4 FROM CDs UNION SELECT CatalogID, Price, Description, PageCount AS Custom1, Author AS Custom2, PublishDate AS Custom3, ISBN AS Custom4 FROM Books```

In this scenario, not only do the columns in the new set contain values that mean different things, they can even contain different types of data (playing time is probably a time or text field, the page count is probably an integer).

### Next steps

Why don't more people use `UNION` queries? And what about `INTERSECT` and `EXCEPT`? The truth is that these operations can be computationally intensive and there are often alternatives -- such as pulling out two data sets and using a C++ program to create the union, intersection, or difference set. But if your database supports them, they can occasionally be a lifesaver.

Starting in the next column, we're going to switch from discussing SQL for queries to discussing SQL statements for constructing tables and databases. That topic will occupy us for several columns and lead directly in to issues such as triggers, constraints, and all the myriad other tools that can save you time and effort. Until then, feel free to contact me with comments and questions.

John Paul Ashenfelter is president and CTO of TransitionPoint.com, a technology development and analysis group focusing on web database applications.