
Relational Operators
JOINS
The UNION, INTERSECT, and EXCEPT operators are valuable in multitable databases in which the tables are union-compatible. In many cases, however, you want to draw data from multiple tables that have very little in common. JOINs are powerful relational operators that combine data from multiple tables into a single result table. The source tables may have little (or even nothing) in common with each other.
SQL:2003 supports a number of types of JOINs. The best one to choose in a given situation depends on the result you’re trying to achieve.
Basic JOIN
Any multitable query is a type of JOIN. The source tables are joined in the sense that the result table includes information taken from all the source tables. The simplest JOIN is a two-table SELECT that has no WHERE clause qualifiers. Every row of the first table is joined to every row of the second table. The result table is the Cartesian product of the two source tables. The number of rows in the result table is equal to the number of rows in the first source table multiplied by the number of rows in the second source table.
For example, imagine that you’re the personnel manager for a company and that part of your job is to maintain employee records. Most employee data, such as home address and telephone number, is not particularly sensitive.
But some data, such as current salary, should be available only to authorized personnel. To maintain security of the sensitive information, keep it in a sepa-rate table that is password protected. Consider the following pair of tables:
EMPLOYEE COMPENSATION
-------- ------------
EmpID Employ
FName Salary
LName Bonus
City
Phone
Fill the tables with some sample data:
EmpID
|
FName
|
LName
|
City
|
Phone
|
-----
|
-----
|
-----
|
----
|
-----
|
1
|
Whitey
|
Ford
|
Orange
|
555-1001
|
2
|
Don
|
Larson
|
Newark
|
555-3221
|
3
|
Sal
|
Maglie
|
Nutley
|
555-6905
|
4
|
Bob
|
Turley
|
Passaic
|
555-8908
|
Employ
|
Salary
|
Bonus
|
||
------
|
------
|
-----
|
||
1
|
33000
|
10000
|
||
2
|
18000
|
2000
|
||
3
|
24000
|
5000
|
||
4
|
22000
|
7000
|
||
Create a virtual result table with the following query:
SELECT *
FROM EMPLOYEE, COMPENSATION ;
Producing:
EmpID FName LName City Phone Employ Salary Bonus
----- ----- ----- ---- ----- ------ ------ -----
1 Whitey Ford Orange 555-1001 1 33000 10000
1 Whitey Ford Orange 555-1001 2 18000 2000
1 Whitey Ford Orange 555-1001 3 24000 5000
1 Whitey Ford Orange 555-1001 4 22000 7000
2 Don Larson Newark 555-3221 1 33000 10000
2 Don Larson Newark 555-3221 2 18000 2000
2 Don Larson Newark 555-3221 3 24000 5000
2 Don Larson Newark 555-3221 4 22000 7000
3 Sal Maglie Nutley 555-6905 1 33000 10000
3 Sal Maglie Nutley 555-6905 2 18000 2000
3 Sal Maglie Nutley 555-6905 3 24000 5000
3 Sal Maglie Nutley 555-6905 4 22000 7000
4 Bob Turley Passaic 555-8908 1 33000 10000
4 Bob Turley Passaic 555-8908 2 18000 2000
4 Bob Turley Passaic 555-8908 3 24000 5000
4 Bob Turley Passaic 555-8908 4 22000 7000
The result table, which is the Cartesian product of the EMPLOYEE and COMPENSATION tables, contains considerable redundancy. Furthermore, it doesn’t make much sense. It combines every row of EMPLOYEE with every row of COMPENSATION. The only rows that convey meaningful information are those in which the EmpID number that came from EMPLOYEE matches the Employ number that came from COMPENSATION. In those rows, an employee’s name and address are associated with that same employee’s compensation.
When you’re trying to get useful information out of a multitable database, the Cartesian product produced by a basic JOIN is almost never what you want, but it’s almost always the first step toward what you want. By applying con-straints to the JOIN with a WHERE clause, you can filter out the unwanted rows. The most common JOIN that uses the WHERE clause filter is the equi-join.
Equi-join
An equi-join is a basic join with a WHERE clause containing a condition specify-ing that the value in one column in the first table must be equal to the value of a corresponding column in the second table. Applying an equi-join to the example tables from the previous section brings a more meaningful result:
SELECT *
FROM EMPLOYEE, COMPENSATION
WHERE EMPLOYEE.EmpID = COMPENSATION.Employ ;
This produces:
EmpID
|
FName
|
LName
|
City
|
Phone
|
Employ
Salary Bonus
|
||
-----
|
------
|
-----
|
----
|
-----
|
------ ------ -----
|
||
1
|
Whitey
|
Ford
|
Orange
|
555-1001
|
1
|
33000
|
10000
|
2
|
Don
|
Larson
|
Newark
|
555-3221
|
2
|
18000
|
2000
|
3
|
Sal
|
Maglie
|
Nutley
|
555-6905
|
3
|
24000
|
5000
|
4
|
Bob
|
Turley
|
Passaic
|
555-8908
|
4
|
22000
|
7000
|
In this result table, the salaries and bonuses on the right apply to the employees named on the left. The table still has some redundancy because the EmpID column duplicates the Employ column. You can fix this problem with a slight reformulation of the query:
SELECT EMPLOYEE.*,COMPENSATION.Salary,COMPENSATION.Bonus
FROM EMPLOYEE, COMPENSATION
WHERE EMPLOYEE.EmpID = COMPENSATION.Employ ;
This produces:
EmpID FName LName City Phone Salary Bonus
----- ----- ----- ---- ----- ------ -----
1 Whitey Ford Orange 555-1001 33000 10000
2 Don Larson Newark 555-3221 18000 2000
3 Sal Maglie Nutley 555-6905 24000 5000
4 Bob Turley Passaic 555-8908 22000 7000
This table tells you what you want to know, but doesn’t burden you with any extraneous data. The query is somewhat tedious to write, however. To avoid ambiguity, qualify the column names with the names of the tables they came from. Writing those table names repeatedly provides good exercise for the fingers but has no merit otherwise.
You can cut down on the amount of typing by using aliases (or correlation names). An alias is a short name that stands for a table name. If you use aliases in recasting the preceding query, it comes out like this:
SELECT E.*, C.Salary, C.Bonus
FROM EMPLOYEE E, COMPENSATION C
WHERE E.EmpID = C.Employ ;
In this example, E is the alias for EMPLOYEE, and C is the alias for COMPENSATION. The alias is local to the statement it’s in. After you declare an alias (in the FROM clause), you must use it throughout the statement. You can’t use both the alias and the long form of the table name.
Mixing the long form of table names with aliases creates confusion. Consider the following example, which is confusing:
SELECT T1.C, T2.C
FROM T1 T2, T2 T1
WHERE T1.C > T2.C ;
In this example, the alias for T1 is T2, and the alias for T2 is T1. Admittedly, this isn’t a smart selection of aliases, but it isn’t forbidden by the rules. If you mix aliases with long-form table names, you can’t tell which table is which.
The preceding example with aliases is equivalent to the following SELECT with no aliases:
SELECT T2.C, T1.C
FROM T1 , T2
WHERE T2.C > T1.C ;
SQL:2003 enables you to join more than two tables. The maximum number varies from one implementation to another. The syntax is analogous to the two-table case:
SELECT E.*, C.Salary, C.Bonus, Y.TotalSales
FROM EMPLOYEE E, COMPENSATION C, YTD_SALES Y
WHERE E.EmpID = C.Employ
AND C.Employ = Y.EmpNo ;
This statement performs an equi-join on three tables, pulling data from corresponding rows of each one to produce a result table that shows the sales-people’s names, the amount of sales they are responsible for, and their com-pensation. The sales manager can quickly see whether compensation is in line with production.
Storing a salesperson’s year-to-date sales in a separate YTD_SALES table ensures better performance and reliability than keeping that data in the EMPLOYEE table. The data in the EMPLOYEE table is relatively static. A person’s name, address, and telephone number don’t change very often. In contrast, the year-to-date sales change frequently (you hope). Because YTD_SALES has fewer columns than EMPLOYEE, you may be able to update it more quickly. If, in the course of updating sales totals, you don’t touch the EMPLOYEE table, you decrease the risk of accidentally modifying EMPLOYEE information that should stay the same.
Cross join
CROSS JOIN is the keyword for the basic join without a WHERE clause.
Therefore,
SELECT *
FROM EMPLOYEE, COMPENSATION ;
can also be written as:
SELECT *
FROM EMPLOYEE CROSS JOIN COMPENSATION ;
The result is the Cartesian product (also called the cross product) of the two source tables. CROSS JOIN rarely gives you the final result you want, but it can be useful as the first step in a chain of data manipulation operations that ultimately produce the desired result.
Natural join
The natural join is a special case of an equi-join. In the WHERE clause of an equi-join, a column from one source table is compared with a column of a second source table for equality. The two columns must be the same type and length and must have the same name. In fact, in a natural join, all columns in one table that have the same names, types, and lengths as corresponding columns in the second table are compared for equality.
Imagine that the COMPENSATION table from the preceding example has columns EmpID, Salary, and Bonus rather than Employ, Salary, and Bonus. In that case, you can perform a natural join of the COMPENSATION table with the EMPLOYEE table. The traditional JOIN syntax would look like this:
SELECT E.*, C.Salary, C.Bonus
FROM EMPLOYEE E, COMPENSATION C
WHERE E.EmpID = C.EmpID ;
This query is a natural join. An alternate syntax for the same operation is the following:
SELECT E.*, C.Salary, C.Bonus
FROM EMPLOYEE E NATURAL JOIN COMPENSATION C ;
Condition join
A condition join is like an equi-join, except the condition being tested doesn’t have to be equal (although it can be). It can be any well-formed predicate. If the condition is satisfied, the corresponding row becomes part of the result table. The syntax is a little different from what you have seen so far, in that the condition is contained in an ON clause rather than a WHERE clause.
Say that a baseball statistician wants to know which National League pitchers have pitched the same number of complete games as one or more American League pitchers. This question is a job for an equi-join, which can also be expressed with condition join syntax:
SELECT *
FROM NATIONAL JOIN AMERICAN
ON NATIONAL.CompleteGames = AMERICAN.CompleteGames ;
Column-name join
The column-name join is like a natural join, but it’s more flexible. In a natural join, all the source table columns that have the same name are compared with each other for equality. With the column-name join, you select which same-name columns to compare. You can choose them all if you want, making the column-name join effectively a natural join. Or you may choose fewer than all same-name columns. In this way, you have a great degree of control over which cross product rows qualify to be placed into your result table.
Say that you’re a chess set manufacturer and have one inventory table that keeps track of your stock of white pieces and another that keeps track of black pieces. The tables contain data as follows:
WHITE
|
BLACK
|
||||
-----
|
-----
|
||||
Piece
Quant Wood
|
Piece
|
Quant
Wood
|
|||
-----
----- ----
|
-----
|
-----
----
|
|||
King
|
502
|
Oak
|
King
|
502
|
Ebony
|
Queen
|
398
|
Oak
|
Queen
|
397
|
Ebony
|
Rook
|
1020
|
Oak
|
Rook
|
1020
|
Ebony
|
Bishop
|
985
|
Oak
|
Bishop
|
985
|
Ebony
|
Knight
|
950
|
Oak
|
Knight
|
950
|
Ebony
|
Pawn
|
431
|
Oak
|
Pawn
|
453
|
Ebony
|
For each piece type, the number of white pieces should match the number of black pieces. If they don’t match, some chessmen are being lost or stolen, and you need to tighten security measures.
A natural join compares all columns with the same name for equality. In this case, a result table with no rows is produced because no rows in the WOOD column in the WHITE table match any rows in the WOOD column in the BLACK table. This result table doesn’t help you determine whether any merchandise is missing. Instead, do a column-name join that excludes the WOOD column from consideration. It can take the following form:
SELECT *
FROM WHITE JOIN BLACK
USING (Piece, Quant) ;
The result table shows only the rows for which the number of white pieces in stock equals the number of black pieces:
Piece Quant
|
Wood
|
Piece
|
Quant
|
Wood
|
|
----- -----
|
----
|
-----
|
-----
|
----
|
|
King
|
502
|
Oak
|
King
|
502
|
Ebony
|
Rook
|
1020
|
Oak
|
Rook
|
1020
|
Ebony
|
Bishop
|
985
|
Oak
|
Bishop
|
985
|
Ebony
|
Knight
|
950
|
Oak
|
Knight
|
950
|
Ebony
|
The shrewd person can deduce that Queen and Pawn are missing from the list, indicating a shortage somewhere for those piece types.
Inner join
By now, you’re probably getting the idea that joins are pretty esoteric and that it takes an uncommon level of spiritual discernment to deal with them adequately. You may have even heard of the mysterious inner join and specu-lated that it probably represents the core or essence of relational operations.
Well, ha! The joke is on you. There’s nothing mysterious about inner joins. In fact, all the joins covered so far in this article are inner joins. I could have formulated the column-name join in the last example as an inner join by using the following syntax:
SELECT *
FROM WHITE INNER JOIN BLACK
USING (Piece, Quant) ;
The result is the same.
The inner join is so named to distinguish it from the outer join. An inner join discards all rows from the result table that don’t have corresponding rows in both source tables. An outer join preserves unmatched rows. That’s the dif-ference. There is nothing metaphysical about it.
Outer join
When you’re joining two tables, the first one (call it the one on the left) may have rows that don’t have matching counterparts in the second table (the one on the right). Conversely, the table on the right may have rows that don’t have matching counterparts in the table on the left. If you perform an inner join on those tables, all the unmatched rows are excluded from the output. Outer joins, however, don’t exclude the unmatched rows. Outer joins come in three types: the left outer join, the right outer join, and the full outer join.
Left outer join
In a query that includes a join, the left table is the one that precedes the key-word JOIN, and the right table is the one that follows it. The left outer join pre-serves unmatched rows from the left table but discards unmatched rows from the right table.
To understand outer joins, consider a corporate database that maintains records of the company’s employees, departments, and locations. Tables 1, 2, and 3 contain the database’s example data.
Table
|
LOCATION
|
LOCATION_ID
|
CITY
|
1
|
Boston
|
3
|
Tampa
|
5
|
Chicago
|
Table
|
DEPT
|
|
DEPT_ID
|
LOCATION_ID
|
NAME
|
21
|
1
|
Sales
|
24
|
1
|
Admin
|
27
|
5
|
Repair
|
29
|
5
|
Stock
|

Table
|
EMPLOYEE
|
|
EMP_ID
|
DEPT_ID
|
NAME
|
61
|
24
|
Kirk
|
63
|
27
|
McCoy
|
Now suppose that you want to see all the data for all employees, including department and location. You get this with an equi-join:
SELECT *
FROM LOCATION L, DEPT D, EMPLOYEE E
WHERE L.LocationID = D.LocationID
AND D.DeptID = E.DeptID ;
This statement produces the following result:
1 Boston 24 1 Admin 61 24 Kirk
5 Chicago 27 5 Repair 63 27 McCoy
This result table gives all the data for all the employees, including their loca-tion and department. The equi-join works because every employee has a location and a department.
Suppose now that you want the data on the locations, with the related department and employee data. This is a different problem because a loca-tion without any associated departments may exist. To get what you want, you have to use an outer join, as in the following example:
SELECT *
FROM LOCATION L LEFT OUTER JOIN DEPT D
ON (L.LocationID = D.LocationID)
LEFT OUTER JOIN EMPLOYEE E
ON (D.DeptID = E.DeptID);
This join pulls data from three tables. First, the LOCATION table is joined to the DEPT table. The resulting table is then joined to the EMPLOYEE table. Rows from the table on the left of the LEFT OUTER JOIN operator that have no corresponding row in the table on the right are included in the result. Thus, in the first join, all locations are included, even if no department associated with them exists. In the second join, all departments are included, even if no employee associated with them exists. The result is as follows:
1
|
Boston
|
24
|
1
|
Admin
|
61
|
24
|
Kirk
|
5
|
Chicago
|
27
|
5
|
Repair
|
63
|
27
|
McCoy
|
3
|
Tampa
|
NULL
|
NULL
|
NULL
|
NULL
|
NULL
|
NULL
|
5
|
Chicago
|
29
|
5
|
Stock
|
NULL
|
NULL
|
NULL
|
1
|
Boston
|
21
|
1
|
Sales
|
NULL
|
NULL
|
NULL
|
The first two rows are the same as the two result rows in the previous example. The third row (3 Tampa) has nulls in the department and employee columns because no departments are defined for Tampa and no employees are stationed there. The fourth and fifth rows (5 Chicago and 1 Boston) contain data about the Stock and the Sales departments, but the employee columns for these rows contain nulls because these two departments have no employees. This outer join tells you everything that the equi-join told you plus the following:
All the company’s locations, whether they have any departments or not All the company’s departments, whether they have any employees or not
The rows returned in the preceding example aren’t guaranteed to be in the order you want. The order may vary from one implementation to the next. To make sure that the rows returned are in the order you want, add an ORDER BY clause to your SELECT statement, like this:
SELECT *
FROM LOCATION L LEFT OUTER JOIN DEPT D
ON (L.LocationID = D.LocationID)
LEFT OUTER JOIN EMPLOYEE E
ON (D.DeptID = E.DeptID)
ORDER BY L.LocationID, D.DeptID, E.EmpID;
You can abbreviate the left outer join language as LEFT JOIN because there’s no such thing as a left inner join.
Right outer join
I bet you figured out how the right outer join behaves. Right! The right outer join preserves unmatched rows from the right table but discards unmatched rows from the left table. You can use it on the same tables and get the same result by reversing the order in which you present tables to the join:
SELECT *
FROM EMPLOYEE E RIGHT OUTER JOIN DEPT D
ON (D.DeptID = E.DeptID)
RIGHT OUTER JOIN LOCATION L
ON (L.LocationID = D.LocationID) ;
In this formulation, the first join produces a table that contains all depart-ments, whether they have an associated employee or not. The second join produces a table that contains all locations, whether they have an associated department or not.
You can abbreviate the right outer join language as RIGHT JOIN because there’s no such thing as a right inner join.
Full outer join
The full outer join combines the functions of the left outer join and the right outer join. It retains the unmatched rows from both the left and the right tables. Consider the most general case of the company database used in the preceding examples. It could have
Locations with no departments Departments with no locations Departments with no employees Employees with no departments
To show all locations, departments, and employees, regardless of whether they have corresponding rows in the other tables, use a full outer join in the following form:
SELECT *
FROM LOCATION L FULL JOIN DEPT D
ON (L.LocationID = D.LocationID)
FULL JOIN EMPLOYEE E
ON (D.DeptID = E.DeptID) ;
You can abbreviate the full outer join language as FULL JOIN because there’s no such thing as a full inner join.
Union join
Unlike the other kinds of join, the union join makes no attempt to match a row from the left source table with any rows in the right source table. It cre-ates a new virtual table that contains the union of all the columns in both source tables. In the virtual result table, the columns that came from the left source table contain all the rows that were in the left source table. For those rows, the columns that came from the right source table all have the null value. Similarly, the columns that came from the right source table contain all the rows that were in the right source table. For those rows, the columns that came from the left source table all have the null value. Thus, the table result-ing from a union join contains all the columns of both source tables, and the number of rows that it contains is the sum of the number of rows in the two source tables.
The result of a union join by itself is not immediately useful in most cases; it produces a result table with many nulls in it. Look at an example.
Suppose that you work for a company that designs and builds experimental rockets. You have several projects in the works. You also have several design engineers who have skills in multiple areas. As a manager, you want to know which employees, having which skills, have worked on which projects. Currently, this data is scattered among the EMPLOYEE table, the PROJECTS table, and the SKILLS table.
The EMPLOYEE table carries data about employees, and EMPLOYEE.EmpID is its primary key. The PROJECTS table has a row for each project that an employee has worked on. PROJECTS.EmpID is a foreign key that references the EMPLOYEE table. The SKILLS table shows the expertise of each employee. SKILLS.EmpID is a foreign key that references the EMPLOYEE table.
The EMPLOYEE table has one row for each employee; the PROJECTS table and the SKILLS table have zero or more rows.
Tables 1, 2, and 3 show example data in the three tables.
Table
|
EMPLOYEE
Table
|
EmpID
|
Name
|
1
|
Ferguson
|
2
|
Frost
|
3
|
Toyon
|
Table
|
PROJECTS
Table
|
ProjectName
|
EmpID
|
X-63 Structure
|
1
|
X-64
Structure
|
1
|
X-63 Guidance
|
2
|
X-64 Guidance
|
2
|
X-63
Telemetry
|
3
|
X-64
Telemetry
|
3
|

Table
|
SKILLS Table
|
Skill
|
EmpID
|
Mechanical
Design
|
1
|
Aerodynamic
Loading
|
1
|
Analog Design
|
2
|
Gyroscope
Design
|
2
|
Digital
Design
|
3
|
R/F Design
|
3
|
From the tables, you can see that Ferguson has worked on X-63 and X-64 structure design and has expertise in mechanical design and aerodynamic loading.
Now suppose that, as a manager, you want to see all the information about all the employees. You decide to apply an equi-join to the EMPLOYEE, PROJECTS, and SKILLS tables:
SELECT *
FROM EMPLOYEE E, PROJECTS P, SKILLS S
WHERE E.EmpID = P.EmpID
AND E.EmpID = S.EmpID ;
You can express this same operation as an inner join by using the following syntax:
SELECT *
FROM EMPLOYEE E INNER JOIN PROJECTS P
ON (E.EmpID = P.EmpID)
INNER JOIN SKILLS S
ON (E.EmpID = S.EmpID) ;
Both formulations give the same result, as shown in Table
Table
|
Result of Inner Join
|
||||
E.EmpID
|
Name
|
P.EmpID
|
ProjectName
|
S.EmpID
|
Skill
|
1
|
Ferguson
|
1
|
X-63
Structure
|
1
|
Mechanical
|
Design
|
|||||
1
|
Ferguson
|
1
|
X-63
Structure
|
1
|
Aerodynamic
|
Loading
|
|||||
1
|
Ferguson
|
1
|
X-64
Structure
|
1
|
Mechanical
|
Design
|
|||||
1
|
Ferguson
|
1
|
X-64
Structure
|
1
|
Aerodynamic
|
Loading
|
|||||
2
|
Frost
|
2
|
X-63 Guidance
|
2
|
Analog Design
|
2
|
Frost
|
2
|
X-63 Guidance
|
2
|
Gyroscope
Design
|
2
|
Frost
|
2
|
X-64 Guidance
|
2
|
Analog Design
|
2
|
Frost
|
2
|
X-64 Guidance
|
2
|
Gyroscope
Design
|
3
|
Toyon
|
3
|
X-63 Telemetry
|
3
|
Digital
Design
|
3
|
Toyon
|
3
|
X-63 Telemetry
|
3
|
R/F Design
|
3
|
Toyon
|
3
|
X-64 Telemetry
|
3
|
Digital
Design
|
3
|
Toyon
|
3
|
X-64 Telemetry
|
3
|
R/F Design
|
This data arrangement is not particularly enlightening. The employee ID numbers appear three times, and the projects and skills are duplicated for each employee. The inner joins are not well suited to answering this type of question. You can put the union join to work here, along with some strategi-cally chosen SELECT statements, to produce a more suitable result. You begin with the basic union join:
SELECT *
FROM EMPLOYEE E UNION JOIN PROJECTS P
UNION JOIN SKILLS S ;
Notice that the union join has no ON clause. It doesn’t filter the data, so an ON clause isn’t needed. This statement produces the result shown in Table
Table
|
Result of Union Join
|
||||
E.EmpID
|
Name
|
P.EmpID
|
ProjectName
|
S.EmpID
|
Skill
|
1
|
Ferguson
|
NULL
|
NULL
|
NULL
|
NULL
|
NULL
|
NULL
|
1
|
X-63
Structure
|
NULL
|
NULL
|
NULL
|
NULL
|
1
|
X-64
Structure
|
NULL
|
NULL
|
NULL
|
NULL
|
NULL
|
NULL
|
1
|
Mechanical
|
Design
|
|||||
NULL
|
NULL
|
NULL
|
NULL
|
1
|
Aerodynamic
|
Loading
|
|||||
2
|
Frost
|
NULL
|
NULL
|
NULL
|
NULL
|
NULL
|
NULL
|
2
|
X-63 Guidance
|
NULL
|
NULL
|
NULL
|
NULL
|
2
|
X-64 Guidance
|
NULL
|
NULL
|
NULL
|
NULL
|
NULL
|
NULL
|
2
|
Analog Design
|
NULL
|
NULL
|
NULL
|
NULL
|
2
|
Gyroscope
|
Design
|
|||||
3
|
Toyon
|
NULL
|
NULL
|
NULL
|
NULL
|
NULL
|
NULL
|
3
|
X-63 Telemetry
|
NULL
|
NULL
|
NULL
|
NULL
|
3
|
X-64 Telemetry
|
NULL
|
NULL
|
NULL
|
NULL
|
NULL
|
NULL
|
3
|
Digital
Design
|
NULL
|
NULL
|
NULL
|
NULL
|
3
|
R/F Design
|
Each table has been extended to the right or left with nulls, and those null-extended rows have been unioned. The order of the rows is arbitrary and depends on the implementation. Now you can massage the data to put it in a more useful form.
Notice that the table has three ID columns, only one of which is nonnull in any row. You can improve the display by coalescing the ID columns. As I note in previous article, the COALESCE expression takes on the value of the first nonnull value in a list of values. In the present case, it takes on the value of the only nonnull value in a column list:
SELECT COALESCE (E.EmpID, P.EmpID, S.EmpID) AS ID,
E.Name, P.ProjectName, S.Skill
FROM EMPLOYEE E UNION JOIN PROJECTS P
UNION JOIN SKILLS S
ORDER BY ID ;
The FROM clause is the same as in the previous example, but now you’re coa-lescing the three EMP_ID columns into a single column named ID. You’re also ordering the result by ID. Table shows the result.
Table Result of Union Join with
COALESCE Expression

ID
|
Name
|
ProjectName
|
Skill
|
1
|
Ferguson
|
X-63
Structure
|
NULL
|
1
|
Ferguson
|
X-64
Structure
|
NULL
|
1
|
Ferguson
|
NULL
|
Mechanical
Design
|
1
|
Ferguson
|
NULL
|
Aerodynamic
Loading
|
2
|
Frost
|
X-63 Guidance
|
NULL
|
2
|
Frost
|
X-64 Guidance
|
NULL
|
2
|
Frost
|
NULL
|
Analog Design
|
2
|
Frost
|
NULL
|
Gyroscope
Design
|
3
|
Toyon
|
X-63
Telemetry
|
NULL
|
3
|
Toyon
|
X-64
Telemetry
|
NULL
|
3
|
Toyon
|
NULL
|
Digital
Design
|
3
|
Toyon
|
NULL
|
R/F Design
|
Each row in this result has data about a project or a skill, but not both. When you read the result, you first have to determine what type of information is in each row (project or skill). If the ProjectName column has a nonnull value, the row names a project the employee has worked on. If the Skill column is non-null, the row names one of the employee’s skills.
You can make the result a little clearer by adding another COALESCE to the
SELECT statement, as follows:
SELECT COALESCE (E.EmpID, P.EmpID, S.EmpID) AS ID,
E.Name, COALESCE (P.Type, S.Type) AS Type,
P.ProjectName, S.Skill
FROM EMPLOYEE E
UNION JOIN (SELECT “Project” AS Type, P.*
FROM PROJECTS) P
UNION JOIN (SELECT “Skill” AS Type, S.*
FROM SKILLS) S
ORDER BY ID, Type ;
In this union join, the PROJECTS table in the previous example has been replaced with a nested SELECT that appends a column named P.Type with a constant value “Project” to the columns coming from the PROJECTS table. Similarly, the SKILLS table has been replaced with a nested SELECT that appends a column named S.Type with a constant value “Skill” to the columns coming from the SKILLS table. In each row, P.Type is either null or “Project”, and S.Type is either null or “Skill”.
The outer SELECT list specifies a COALESCE of those two Type columns into a single column named Type. You then specify Type in the ORDER BY clause, which sorts the rows that all have the same ID so that all projects are first, followed by all the skills. The result is shown in Table
Table
|
Refined
Result of Union Join with
|
|||
COALESCE
Expressions
|
||||
ID
|
Name
|
Type
|
ProjectName
|
Skill
|
1
|
Ferguson
|
Project
|
X-63 Structure
|
NULL
|
1
|
Ferguson
|
Project
|
X-64
Structure
|
NULL
|
1
|
Ferguson
|
Skill
|
NULL
|
Mechanical
Design
|
1
|
Ferguson
|
Skill
|
NULL
|
Aerodynamic
Loading
|
2
|
Frost
|
Project
|
X-63 Guidance
|
NULL
|
2
|
Frost
|
Project
|
X-64 Guidance
|
NULL
|
2
|
Frost
|
Skill
|
NULL
|
Analog Design
|
2
|
Frost
|
Skill
|
NULL
|
Gyroscope
Design
|
3
|
Toyon
|
Project
|
X-63
Telemetry
|
NULL
|
3
|
Toyon
|
Project
|
X-64
Telemetry
|
NULL
|
3
|
Toyon
|
Skill
|
NULL
|
Digital
Design
|
3
|
Toyon
|
Skill
|
NULL
|
R/F Design
|
The result table now presents a very readable account of the project experi-ence and the skill sets of all the employees in the EMPLOYEE table.
Considering the number of JOIN operations available, relating data from dif-ferent tables shouldn’t be a problem, regardless of the tables’ structure. Trust that if the raw data exists in your database, SQL:2003 has the means to get it out and display it in a meaningful form.
ON versus WHERE
The function of the ON and WHERE clauses in the various types of joins is potentially confusing. These facts may help you keep things straight:
The ON clause is part of the inner, left, right, and full joins. The cross join and union join don’t have an ON clause because neither of them does any filtering of the data.
The ON clause in an inner join is logically equivalent to a WHERE clause; the same condition could be specified either in the ON clause or a WHERE clause.
The ON clauses in outer joins (left, right, and full joins) are different from WHERE clauses. The WHERE clause simply filters the rows that are returned by the FROM clause. Rows that are rejected by the filter are not included in the result. The ON clause in an outer join first filters the rows of a cross product and then includes the rejected rows, extended with nulls.
No comments:
Post a Comment