SQL Series...26

Zeroing In on the Data You Want

SQL enables you to use some characteristics of the data to determine whether a particular table row is of interest to you. The SELECT, DELETE, and UPDATE statements convey to the database engine (the part of the DBMS that directly interacts with the data) which rows to select, delete, or update. You add modi-fying clauses to the SELECT, DELETE, and UPDATE statements to refine the search to your specifications.

Modifying Clauses

The modifying clauses available in SQL are FROM, WHERE, HAVING, GROUP BY, and ORDER BY. The FROM clause tells the database engine which table or tables to operate on. The WHERE and HAVING clauses specify a data characteristic that determines whether or not to include a particular row in the current opera-tion. The GROUP BY and ORDER BY clauses specify how to display the retrieved rows. Table provides a summary.

Table	Modifying Clauses and Functions

*Modifying Clause*	*Function*
FROM	Specifies from which tables to take data

WHERE	Filters out rows that don’t satisfy the search condition

GROUP BY	Separates rows into groups based on the values in the
	grouping columns

HAVING	Filters out groups that don’t satisfy the search condition

ORDER BY	Sorts the results of prior clauses to produce final output

If you use more than one of these clauses, they must appear in the following order:

SELECT column_list

FROM table_list

[WHERE search_condition]

[GROUP BY grouping_column]

[HAVING search_condition]

[ORDER BY ordering_condition] ;

Here’s the lowdown on the execution of these clauses:

The WHERE clause is a filter that passes rows that meet the search condi-tion and rejects rows that don’t meet the condition.

The GROUP BY clause rearranges the rows that the WHERE clause passes according to the value of the grouping column.

The HAVING clause is another filter that takes each group that the GROUP BY clause forms and passes those groups that meet the search condi-tion, rejecting the rest.

The ORDER BY clause sorts whatever remains after all the preceding clauses process the table.

As the square brackets ([ ]) indicate, the WHERE, GROUP BY, HAVING, and ORDER BY clauses are optional.

SQL evaluates these clauses in the order FROM, WHERE, GROUP BY, HAVING, and finally SELECT. The clauses operate in a “pipeline” manner, in which each clause receives the result of the prior clause and produces an output for the next clause. In functional notation, this order of evaluation appears as follows:

SELECT(HAVING(GROUP BY(WHERE(FROM...))))

ORDER BY operates after SELECT, which explains why ORDER BY can only ref-erence columns in the SELECT list. ORDER BY can’t reference other columns in the FROM table(s).

FROM Clauses

The FROM clause is easy to understand if you specify only one table, as in the following example:

SELECT * FROM SALES ;

This statement returns all the data in all the rows of every column in the SALES table. You can, however, specify more than one table in a FROM clause. Consider the following example:

SELECT *

FROM CUSTOMER, SALES ;

This statement forms a virtual table that combines the data from the CUSTOMER table with the data from the SALES table. Each row in the CUSTOMER table combines with every row in the SALES table to form the new table. The new virtual table that this combination forms contains the number of rows in the CUSTOMER table multiplied by the number of rows in the SALES table. If the CUSTOMER table has 10 rows and the SALES table has 100, then the new virtual table has 1,000 rows.

This operation is called the Cartesian product of the two source tables. The Cartesian product is a type of JOIN.

In most applications, the majority of the rows that form as a result of taking the Cartesian product of two tables are meaningless. In the case of the virtual table that forms from the CUSTOMER and SALES tables, only the rows where the CustomerID from the CUSTOMER table matches the CustomerID from the SALES table are of interest. You can filter out the rest of the rows by using a WHERE clause.

WHERE Clauses

I use the WHERE clause many times throughout this series without really explain-ing it because its meaning and use are obvious: A statement performs an opera-tion (such as a SELECT, DELETE, or UPDATE) only on table rows WHERE a stated condition is True. The syntax of the WHERE clause is as follows:

SELECT column_list

FROM table_name

WHERE condition ;

DELETE FROM table_name

WHERE condition ;

UPDATE table_name

SET column1=value1, column2=value2, ..., columnn=valuen

WHERE condition ;

The condition in the WHERE clause may be simple or arbitrarily complex. You may join multiple conditions together by using the logical connectives AND, OR, and NOT (which I discuss later in this article) to create a single condition.

The following statements show you some typical examples of WHERE clauses:

WHERE CUSTOMER.CustomerID = SALES.CustomerID

WHERE FOODS.Calories = COMIDA.Caloria

WHERE FOODS.Calories < 219

WHERE FOODS.Calories > 3 * base_value

WHERE FOODS.Calories < 219 AND FOODS.Protein > 27.4

The conditions that these WHERE clauses express are known as predicates. A predicate is an expression that asserts a fact about values.

The predicate FOODS.Calories < 219, for example, is True if the value for the current row of the column FOODS.Calories is less than 219. If the assertion is True, it satisfies the condition. An assertion may be True, False, or unknown. The unknown case arises if one or more elements in the assertion are null. The comparison predicates (=, <, >, <>, <=, and >=) are the most common, but SQL offers several others that greatly increase your capability to distinguish, or “filter out,” a desired data item from others in the same column. The follow-ing list notes the predicates that give you that filtering capability:

Comparison predicates

BETWEEN

IN [NOT IN]

LIKE [NOT LIKE] NULL

ALL, SOME, ANY EXISTS

UNIQUE OVERLAPS MATCH

SIMILAR DISTINCT

Comparison predicates

The examples in the preceding section show typical uses of comparison predi-cates in which you compare one value to another. For every row in which the comparison evaluates to a True value, that value satisfies the WHERE clause, and the operation (SELECT, UPDATE, DELETE, or whatever) executes upon that row. Rows that the comparison evaluates to FALSE are skipped. Consider the following SQL statement:

SELECT * FROM FOODS

WHERE Calories < 219 ;

This statement displays all rows from the FOODS table that have a value of less than 219 in the Calories column.

Six comparison predicates are listed in Table

Table	SQL’s Comparison Predicates
*Comparison*	*Symbol*
Equal	=

Not equal	<>

Less than	<

Less than or equal	<=

Greater than	>

Greater than or equal	>=

BETWEEN

Sometimes, you want to select a row if the value in a column falls within a specified range. One way to make this selection is by using comparison predi-cates. For example, you can formulate a WHERE clause to select all the rows in the FOODS table that have a value in the Calories column greater than 100 and less than 300, as follows:

WHERE FOODS.Calories > 100 AND FOODS.Calories < 300

This comparison doesn’t include foods with a calorie count of exactly 100 or 300 — only those values that fall in between these two numbers. To include the end points, you can write the statement as follows:

WHERE FOODS.Calories >= 100 AND FOODS.Calories <= 300

Another way of specifying a range that includes the end points is to use a

BETWEEN predicate in the following manner:

WHERE FOODS.Calories BETWEEN 100 AND 300

This clause is functionally identical to the preceding example, which uses comparison predicates. This formulation saves some typing and is a little more intuitive than the one that uses two comparison predicates joined by the logical connective AND.

The BETWEEN keyword may be confusing because it doesn’t tell you explicitly whether the clause includes the end points. In fact, the clause does include these end points. BETWEEN also fails to tell you explicitly that the first term in the comparison must be equal to or less than the second. If, for example, FOODS.Calories contains a value of 200, the following clause returns a True value:

WHERE FOODS.Calories BETWEEN 100 AND 300

However, a clause that you may think is equivalent to the preceding example returns the opposite result, False:

WHERE FOODS.Calories BETWEEN 300 AND 100

If you use BETWEEN, you must be able to guarantee that the first term in your comparison is always equal to or less than the second term.

You can use the BETWEEN predicate with character, bit, and datetime data types as well as with the numeric types. You may see something like the fol-lowing example:

SELECT FirstName, LastName

FROM CUSTOMER

WHERE CUSTOMER.LastName BETWEEN ‘A’ AND ‘Mzzz’ ;

This example returns all customers whose last names are in the first half of the alphabet.

IN and NOT IN

The IN and NOT IN predicates deal with whether specified values (such as OR, WA, and ID) are contained within a particular set of values (such as the states of the United States). You may, for example, have a table that lists sup-pliers of a commodity that your company purchases on a regular basis. You want to know the phone numbers of those suppliers located in the Pacific Northwest. You can find these numbers by using comparison predicates, such as those shown in the following example:

SELECT Company, Phone

FROM SUPPLIER

WHERE State = ‘OR’ OR State = ‘WA’ OR State = ‘ID’ ;

You can also use the IN predicate to perform the same task, as follows:

SELECT Company, Phone

FROM SUPPLIER

WHERE State IN (‘OR’, ‘WA’, ‘ID’) ;

This formulation is a more compact than the one using comparison predi-cates and logical OR.

The NOT IN version of this predicate works the same way. Say that you have locations in California, Arizona, and New Mexico, and to avoid paying sales tax, you want to consider using suppliers located anywhere except in those states. Use the following construction:

SELECT Company, Phone

FROM SUPPLIER

WHERE State NOT IN (‘CA’, ‘AZ’, ‘NM’) ;

Using the IN keyword this way saves you a little typing. Saving a little typing, however, isn’t that great of an advantage. You can do the same job by using comparison predicates as shown in this section’s first example.

You may have another good reason to use the IN predicate rather than com-parison predicates, even if using IN doesn’t save much typing. Your DBMS probably implements the two methods differently, and one of the methods may be significantly faster than the other on your system. You may want to run a performance comparison on the two ways of expressing inclusion in (or exclusion from) a group and then use the technique that produces the quicker result. A DBMS with a good optimizer will probably choose the more efficient method, regardless of which kind of predicate you use. A performance com-parison gives you some idea of how good your DBMS’s optimizer is. If a signif-icant difference between the run times of the two statements exists, the quality of your DBMS’s optimizer is called into question.

The IN keyword is valuable in another area, too. If IN is part of a subquery, the keyword enables you to pull information from two tables to obtain results that you can’t derive from a single table. I cover subqueries in detail in upcoming articles, but following is an example that shows how a subquery uses the IN keyword.

Suppose that you want to display the names of all customers who’ve bought the F-117A product in the last 30 days. Customer names are in the CUSTOMER table, and sales transaction data is in the TRANSACT table. You can use the following query:

SELECT FirstName, LastName

FROM CUSTOMER

WHERE CustomerID IN

(SELECT CustomerID

FROM TRANSACT

WHERE ProductID = ‘F-117A’

AND TransDate >= (CurrentDate - 30)) ;

The inner SELECT of the TRANSACT table nests within the outer SELECT of the CUSTOMER table. The inner SELECT finds the CustomerID numbers of all customers who bought the F-117A product in the last 30 days. The outer SELECT displays the first and last names of all customers whose CustomerID is retrieved by the inner SELECT.

LIKE and NOT LIKE

You can use the LIKE predicate to compare two character strings for a partial match. Partial matches are valuable if don’t know the exact form of the string for which you’re searching. You can also use partial matches to retrieve mul-tiple rows that contain similar strings in one of the table’s columns.

To identify partial matches, SQL uses two wildcard characters. The percent sign (%) can stand for any string of characters that have zero or more charac-ters. The underscore (_) stands for any single character. Table provides some examples that show how to use LIKE.

Table	SQL’s LIKE Predicate
*Statement*	*Values Returned*
WHERE Word LIKE ‘intern%’	intern

	internal

	international

	internet

	interns

WHERE Word LIKE ‘%Peace%’	Justice of the Peace

	Peaceful Warrior

*Statement*	*Values Returned*
WHERE Word LIKE ‘t_p_‘	tape

	taps

	tipi

	tips

	tops

	type

The NOT LIKE predicate retrieves all rows that don’t satisfy a partial match, including one or more wildcard characters, as in the following example:

WHERE Phone NOT LIKE ‘503%’

This example returns all the rows in the table for which the phone number starts with something other than 503.

You may want to search for a string that includes a percent sign or an under-score. In this case, you want SQL to interpret the percent sign as a percent sign and not as a wildcard character. You can conduct such a search by typing an escape character just prior to the character you want SQL to take literally. You can choose any character as the escape character, as long as that character doesn’t appear in the string that you’re testing, as shown in the following example:

SELECT Quote

FROM BARTLETTS

WHERE Quote LIKE ‘20#%’

ESCAPE ‘#’ ;

The % character is escaped by the preceding # sign, so the statement inter-prets this symbol as a percent sign rather than as a wildcard. You can escape an underscore or the escape character itself, in the same way. The preceding query, for example, would find the following quotation in Bartlett’s Familiar Quotations:

20% of the salespeople produce 80% of the results.

The query would also find the following:

20%

SIMILAR

SQL:1999 added the SIMILAR predicate, which offers a more powerful way of finding partial matches than the LIKE predicate provides. With the SIMILAR predicate, you can compare a character string to a regular expression. For example, say you’re searching the OperatingSystem column of a software compatibility table to look for Microsoft Windows compatibility. You could construct a WHERE clause such as the following:

WHERE OperatingSystem SIMILAR TO ‘(‘Windows ‘(3.1|95|98|ME|CE|NT|2000|XP))’

This predicate retrieves all rows that contain any of the specified Microsoft operating systems.

NULL

The NULL predicate finds all rows where the value in the selected column is null. In the FOODS table in previous article, several rows have null values in the Carbohydrate column. You can retrieve their names by using a statement such as the following:

SELECT (Food)

FROM FOODS

WHERE Carbohydrate IS NULL ;

This query returns the following values:

Beef, lean hamburger

Chicken, light meat

Opossum, roasted

Pork, ham

As you may expect, including the NOT keyword reverses the result, as in the following example:

SELECT (Food)

FROM FOODS

WHERE Carbohydrate IS NOT NULL ;

This query returns all the rows in the table except the four that the preceding query returns.

The statement Carbohydrate IS NULL is not the same as Carbohydrate = NULL. To illustrate this point, assume that, in the current row of the FOODS table, both Carbohydrate and Protein are null. From this fact, you can draw the following conclusions:

Carbohydrate IS NULL is True. Protein IS NULL is True.

Carbohydrate IS NULL AND Protein IS NULL is True. Carbohydrate = Protein is unknown.

Carbohydrate = NULL is an illegal expression.

Using the keyword NULL in a comparison is meaningless because the answer always returns as unknown.

Why is Carbohydrate = Protein defined as unknown, even though Carbohydrate and Protein have the same (null) value? Because NULL simply means “I don’t know.” You don’t know what Carbohydrate is, and you don’t know what Protein is; therefore, you don’t know whether those (unknown) values are the same. Maybe Carbohydrate is 37, and Protein is 14, or maybe Carbohydrate is 93, and Protein is 93. If you don’t know both the carbohydrate value and the protein value, you can’t say whether the two are the same.

ALL, SOME, ANY

Thousands of years ago, the Greek philosopher Aristotle formulated a system of logic that became the basis for much of Western thought. The essence of this logic is to start with a set of premises that you know to be true, apply valid operations to these premises, and, thereby, arrive at new truths. An example of this procedure is as follows:

Premise 1: All Greeks are human.

Premise 2: All humans are mortal.

Conclusion: All Greeks are mortal.

Another example:

Premise 1: Some Greeks are women.

Premise 2: All women are human.

Conclusion: Some Greeks are human.

Another way of stating the same logical idea of this second example is as follows:

If any Greeks are women and all women are human, then some Greeks are human.

The first example uses the universal quantifier ALL in both premises, enabling you to make a sound deduction about all Greeks in the conclusion. The second example uses the existential quantifier SOME in one premise, enabling you to make a deduction about some Greeks in the conclusion. The third example uses the existential quantifier ANY, which is a synonym for SOME, to reach the same conclusion you reach in the second example.

Look at how SOME, ANY, and ALL apply in SQL.

Consider an example in baseball statistics. Baseball is a physically demand-ing sport, especially for pitchers. A pitcher must throw the baseball from the pitcher’s mound to home plate between 90 and 150 times during a game. This effort can be very tiring, and many times, the pitcher becomes ineffective, and a relief pitcher must replace him before the game ends. Pitching an entire game is an outstanding achievement, regardless of whether the effort results in a victory.

Suppose that you’re keeping track of the number of complete games that all major-league pitchers pitch. In one table, you list all the American League pitchers, and in another table, you list all the National League pitchers. Both tables contain the players’ first names, last names, and number of complete games pitched.

The American League permits a designated hitter (DH) (who isn’t required to play a defensive position) to bat in place of any of the nine players who play defense. Usually the DH bats for the pitcher, because pitchers are notoriously poor hitters. Pitchers must spend so much time and effort on perfecting their pitching that they do not have as much time to practice batting as the other players do.

Say that you have a theory that, on average, American League starting pitchers throw more complete games than do National League starting pitchers. This is based on your observation that designated hitters enable hard-throwing, but weak-hitting, American League pitchers to stay in close games. Because the DH is already batting for them, the fact that they are poor hitters is not a liability. In the National League, however, a pinch hitter would replace a com-parable National League pitcher in a close game, because he would have a better chance at getting a hit. To test your theory, you formulate the follow-ing query:

SELECT FirstName, LastName

FROM AMERICAN_LEAGUER

WHERE CompleteGames > ALL

(SELECT CompleteGames

FROM NATIONAL_LEAGUER) ;

The subquery (the inner SELECT) returns a list, showing for every National League pitcher, the number of complete games he pitched. The outer query returns the first and last names of all American Leaguers who pitched more complete games than ALL of the National Leaguers. The query returns the names of those American League pitchers who pitched more complete games than the pitcher who has thrown the most complete games in the National League.

Consider the following similar statement:

SELECT FirstName, LastName

FROM AMERICAN_LEAGUER

WHERE CompleteGames > ANY

(SELECT CompleteGames

FROM NATIONAL_LEAGUER) ;

In this case, you use the existential quantifier ANY instead of the universal quantifier ALL. The subquery (the inner, nested query) is identical to the sub-query in the previous example. This subquery retrieves a complete list of the complete game statistics for all the National League pitchers. The outer query returns the first and last names of all American League pitchers who pitched more complete games than ANY National League pitcher. Because you can be virtually certain that at least one National League pitcher hasn’t pitched a complete game, the result probably includes all American League pitchers who’ve pitched at least one complete game.

If you replace the keyword ANY with the equivalent keyword SOME, the result is the same. If the statement that at least one National League pitcher hasn’t pitched a complete game is a true statement, you can then say that SOME National League pitcher hasn’t pitched a complete game.

EXISTS

You can use the EXISTS predicate in conjunction with a subquery to deter-mine whether the subquery returns any rows. If the subquery returns at least one row, that result satisfies the EXISTS condition, and the outer query exe-cutes. Consider the following example:

SELECT FirstName, LastName

FROM CUSTOMER

WHERE EXISTS

(SELECT DISTINCT CustomerID

FROM SALES

WHERE SALES.CustomerID = CUSTOMER.CustomerID);

The SALES table contains all of your company’s sales transactions. The table includes the CustomerID of the customer who makes each purchase, as well as other pertinent information. The CUSTOMER table contains each cus-tomer’s first and last names, but no information about specific transactions.

The subquery in the preceding example returns a row for every customer who has made at least one purchase. The outer query returns the first and last names of the customers who made the purchases that the SALES table records.

EXISTS is equivalent to a comparison of COUNT with zero, as the following query shows:

SELECT FirstName, LastName

FROM CUSTOMER

WHERE 0 <>

(SELECT COUNT(*)

FROM SALES

WHERE SALES.CustomerID = CUSTOMER.CustomerID);

For every row in the SALES table that contains a CustomerID that’s equal to a CustomerID in the CUSTOMER table, this statement displays the FirstName and LastName columns in the CUSTOMER table. For every sale in the SALES table, therefore, the statement displays the name of the customer who made the purchase.

UNIQUE

As you do with the EXISTS predicate, you use the UNIQUE predicate with a subquery. Although the EXISTS predicate evaluates to True only if the sub-query returns at least one row, the UNIQUE predicate evaluates to True only if no two rows that the subquery returns are identical. In other words, the UNIQUE predicate evaluates to True only if all rows that its subquery returns are unique. Consider the following example:

SELECT FirstName, LastName

FROM CUSTOMER

WHERE UNIQUE

(SELECT CustomerID FROM SALES

WHERE SALES.CustomerID = CUSTOMER.CustomerID);

This statement retrieves the names of all new customers for whom the SALES table records only one sale. Two null values are considered to be not equal to each other and thus unique. When the UNIQUE keyword is applied to a result table that only contains two null rows, the UNIQUE predicate evaluates to True.

DISTINCT

The DISTINCT predicate is similar to the UNIQUE predicate, except in the way it treats nulls. If all the values in a result table are UNIQUE, then they’re also DISTINCT from each other. However, unlike the result for the UNIQUE predi-cate, if the DISTINCT keyword is applied to a result table that contains only two null rows, the DISTINCT predicate evaluates to False. Two null values are not considered distinct from each other, while at the same time they are con-sidered to be unique. This strange situation seems contradictory, but there’s a reason for it. In some situations, you may want to treat two null values as dif-ferent from each other, whereas in other situations, you want to treat them as if they’re the same. In the first case, use the UNIQUE predicate. In the second case, use the DISTINCT predicate.

OVERLAPS

You use the OVERLAPS predicate to determine whether two time intervals over-lap each other. This predicate is useful for avoiding scheduling conflicts. If the two intervals overlap, the predicate returns a True value. If they don’t overlap, the predicate returns a False value.

You can specify an interval in two ways: either as a start time and an end time or as a start time and a duration. Following are a few examples:

(TIME ‘2:55:00’, INTERVAL ‘1’ HOUR)

OVERLAPS

(TIME ‘3:30:00’, INTERVAL ‘2’ HOUR)

The preceding example returns a True, because 3:30 is less than one hour after 2:55.

(TIME ‘9:00:00’, TIME ‘9:30:00’)

OVERLAPS

(TIME ‘9:29:00’, TIME ‘9:31:00’)

The preceding example returns a True, because you have a one-minute over-lap between the two intervals.

(TIME ‘9:00:00’, TIME ‘10:00:00’)

OVERLAPS

(TIME ‘10:15:00’, INTERVAL ‘3’ HOUR)

The preceding example returns a False, because the two intervals don’t overlap.

(TIME ‘9:00:00’, TIME ‘9:30:00’)

OVERLAPS

(TIME ‘9:30:00’, TIME ‘9:35:00’)

This example returns a False, because even though the two intervals are con-tiguous, they don’t overlap.

MATCH

In previous article, I discuss referential integrity, which involves maintaining con-sistency in a multitable database. You can lose integrity by adding a row to a child table that doesn’t have a corresponding row in the child’s parent table. You can cause similar problems by deleting a row from a parent table if rows corresponding to that row exist in a child table.

Say that your business has a CUSTOMER table that keeps track of all your cus-tomers and a SALES table that records all sales transactions. You don’t want to add a row to SALES until after you enter the customer making the purchase into the CUSTOMER table. You also don’t want to delete a customer from the CUSTOMER table if that customer made purchases that exist in the SALES table. Before you perform an insertion or deletion, you may want to check the candidate row to make sure that inserting or deleting that row doesn’t cause integrity problems. The MATCH predicate can perform such a check.

Examine the use of the MATCH predicate through an example that employs the CUSTOMER and SALES tables. CustomerID is the primary key of the CUSTOMER table and acts as a foreign key in the SALES table. Every row in the CUSTOMER table must have a unique, nonnull CustomerID. CustomerID isn’t unique in the SALES table, because repeat customers buy more than once. This situation is fine and does not threaten integrity because CustomerID is a foreign key rather than a primary key in that table.

Seemingly, CustomerID can be null in the SALES table, because someone can walk in off the street, buy something, and walk out before you get a chance to enter his or her name and address into the CUSTOMER table. This situation can create a row in the child table with no corresponding row in the parent table. To overcome this problem, you can create a generic customer in the CUSTOMER table and assign all such anonymous sales to that customer.

Say that a customer steps up to the cash register and claims that she bought an F-117A Stealth fighter on May 18, 2003. She now wants to return the plane because it shows up like an aircraft carrier on opponent radar screens. You can verify her claim by searching your SALES database for a match. First, you must retrieve her CustomerID into the variable vcustid; then you can use the following syntax:

... WHERE (:vcustid, ‘F-117A’, ‘2003-05-18’)

MATCH

(SELECT CustomerID, ProductID, SaleDate

FROM SALES)

If a sale exists for that customer ID for that product on that date, the MATCH predicate returns a True value. Give the customer her money back. (Note: If any values in the first argument of the MATCH predicate are null, a True value always returns.)

SQL’s developers added the MATCH predicate and the UNIQUE predicate for the same reason — they provide a way to explicitly perform the tests defined for the implicit referential integrity (RI) and UNIQUE constraints.

The general form of the MATCH predicate is as follows:

Row_value MATCH [UNIQUE] [SIMPLE| PARTIAL | FULL ] Subquery

The UNIQUE, SIMPLE, PARTIAL, and FULL options relate to rules that come into play if the row value expression R has one or more columns that are null. The rules for the MATCH predicate are a copy of corresponding referential integrity rules.

Referential integrity rules

Referential integrity rules require that the values of a column or columns in one table match the values of a column or columns in another table. You refer to the columns in the first table as the foreign key and the columns in the second table as the primary key or unique key. For example, you may declare the column EmpDeptNo in an EMPLOYEE table as a foreign key that references the DeptNo column of a DEPT table. This matchup ensures that if you record an employee in the EMPLOYEE table as working in department 123, a row appears in the DEPT table where DeptNo is 123.

This situation is fairly straightforward if the foreign key and primary key both consist of a single column. The two keys can, however, consist of multi-ple columns. The DeptNo value, for example, may be unique only within a Location; therefore, to uniquely identify a DEPT row, you must specify both a Location and a DeptNo. If both the Boston and Tampa offices have a depart-ment 123, you need to identify the departments as (‘Boston’, ‘123’) and

(‘Tampa’, ‘123’). In this case, the EMPLOYEE table needs two columns to identify a DEPT. Call those columns EmpLoc and EmpDeptNo. If an employee works in department 123 in Boston, the EmpLoc and EmpDeptNo values are ‘Boston’ and ‘123’. And the foreign key declaration in EMPLOYEE is as follows:

FOREIGN KEY (EmpLoc, EmpDeptNo)

REFERENCES DEPT (Location, DeptNo)

Drawing valid conclusions from your data is complicated immensely if the data contains nulls. Sometimes you want to treat null-containing data one way, and sometimes you want to treat it another way. The UNIQUE, SIMPLE, PARTIAL, and FULL keywords specify different ways of treating data that con-tains nulls. If your data does not contain any null values, you can save yourself a lot of head scratching by merely skipping from here to the next section of this series, “Logical Connectives.” If your data does contain null values, drop out of Evelyn Woods speed-reading mode now and read the following paragraphs slowly and carefully. Each paragraph presents a different situation with respect to null values and tells how the MATCH predicate handles it.

If the values of EmpLoc and EmpDeptNo are both nonnull or both null, the referential integrity rules are the same as for single-column keys with values that are null or nonnull. But if EmpLoc is null and EmpDeptNo is nonnull — or EmpLoc is nonnull and EmpDeptNo is null — you need new rules. What should the rules be if you insert or update the EMPLOYEE table with EmpLoc and EmpDeptNo values of (NULL, ‘123’) or (‘Boston’, NULL)? You have six main alternatives, SIMPLE, PARTIAL, and FULL, each either with or without the UNIQUE keyword. The UNIQUE keyword, if present, means that a matching row in the subquery result table must be unique in order for the predicate to evaluate to a True value. If both components of the row value expression R are null, the MATCH predicate returns a True value regardless of the con-tents of the subquery result table being compared.

If neither component of the row value expression R is null, SIMPLE is specified, UNIQUE is not specified, and at least one row in the subquery result table matches R, then the MATCH predicate returns a True value. Otherwise, it returns a False value.

If neither component of the row value expression R is null, SIMPLE is speci-fied, UNIQUE is specified, and at least one row in the subquery result table is both unique and matches R, then the MATCH predicate returns a True value. Otherwise, it returns a False value.

If any component of the row value expression R is null and SIMPLE is speci-fied, then the MATCH predicate returns a True value.

If any component of the row value expression R is nonnull, PARTIAL is speci-fied, UNIQUE is not specified, and the nonnull parts of at least one row in the subquery result table matches R, then the MATCH predicate returns a True value. Otherwise, it returns a False value.

If any component of the row value expression R is nonnull, PARTIAL is speci-fied, UNIQUE is specified, and the nonnull parts of R match the nonnull parts of at least one unique row in the subquery result table, then the MATCH predi-cate returns a True value. Otherwise, it returns a False value.

If neither component of the row value expression R is null, FULL is specified, UNIQUE is not specified, and at least one row in the subquery result table matches R, then the MATCH predicate returns a True value. Otherwise, it returns a False value.

If neither component of the row value expression R is null, FULL is specified, UNIQUE is specified, and at least one row in the subquery result table is both unique and matches R, then the MATCH predicate returns a True value. Otherwise, it returns a False value.

If any component of the row value expression R is null and FULL is specified, then the MATCH predicate returns a False value.

Logical Connectives

Often, as a number of previous examples show, applying one condition in a query isn’t enough to return the rows that you want from a table. In some cases, the rows must satisfy two or more conditions. In other cases, if a row satisfies any of two or more conditions, it qualifies for retrieval. On other occa-sions, you want to retrieve only rows that don’t satisfy a specified condition. To meet these needs, SQL offers the logical connectives AND, OR, and NOT.

AND

If multiple conditions must all be True before you can retrieve a row, use the

AND logical connective. Consider the following example:

SELECT InvoiceNo, SaleDate, SalesPerson, TotalSale

FROM SALES

WHERE SaleDate >= ‘2003-05-18’

AND SaleDate <= ‘2003-05-24’ ;

The WHERE clause must meet the following two conditions:

SaleDate must be greater than or equal to May 18, 2003. SaleDate must be less than or equal to May 24, 2003.

Only rows that record sales occurring during the week of May 18 meet both conditions. The query returns only these rows.

Notice that the AND connective is strictly logical. This restriction can some-times be confusing because people commonly use the word and with a looser meaning. Suppose, for example, that your boss says to you, “I’d like to see the sales for Ferguson and Ford.” He said, “Ferguson and Ford,” so you may write the following SQL query:

SELECT *

FROM SALES

WHERE Salesperson = ‘Ferguson’

AND Salesperson = ‘Ford’;

Well, don’t take that answer back to your boss. The following query is more like what he had in mind:

SELECT *

FROM SALES

WHERE Salesperson IN (‘Ferguson’, ‘Ford’) ;

The first query won’t return anything, because none of the sales in the SALES table were made by both Ferguson and Ford. The second query will return the information on all sales made by either Ferguson or Ford, which is probably what the boss wanted.

OR

If any one of two or more conditions must be True to qualify a row for retrieval, use the OR logical connective, as in the following example:

SELECT InvoiceNo, SaleDate, Salesperson, TotalSale

FROM SALES

WHERE Salesperson = ‘Ford’

OR TotalSale > 200 ;

This query retrieves all of Ford’s sales, regardless of how large, as well as all sales of more than $200, regardless of who made the sales.

NOT

The NOT connective negates a condition. If the condition normally returns a True value, adding NOT causes the same condition to return a False value. If a condition normally returns a False value, adding NOT causes the condition to return a True value. Consider the following example:

SELECT InvoiceNo, SaleDate, Salesperson, TotalSale

FROM SALES

WHERE NOT (Salesperson = ‘Ford’) ;

This query returns rows for all sales transactions that salespeople other than Ford completed.

When you use AND, OR, or NOT, sometimes the scope of the connective isn’t clear. To be safe, use parentheses to make sure that SQL applies the connec-tive to the predicate you want. In the preceding example, the NOT connective applies to the entire predicate (Salesperson = ‘Ford’).

GROUP BY Clauses

Sometimes, rather than retrieving individual records, you want to know some-thing about a group of records. The GROUP BY clause is the tool you need.

Suppose you’re the sales manager and you want to look at the performance of your sales force. You could do a simple SELECT such as the following:

SELECT InvoiceNo, SaleDate, Salesperson, TotalSale

FROM SALES;

You would receive a result similar to that shown in Figure.

This result gives you some idea of how well your salespeople are doing because so few total sales are involved. However, in real life, a company would have many more sales, and it wouldn’t be as easy to tell whether sales

objectives were being met. To do that, you can combine the GROUP BY clause with one of the aggregate functions (also called set functions) to get a quanti-tative picture of sales performance. For example, you can see which salesper-son is selling more of the profitable high-ticket items by using the average (AVG) function as follows:

SELECT Salesperson, AVG(TotalSale)

FROM SALES

GROUP BY Salesperson;

You would receive a result similar to that shown in Figure

As shown in Figure, Ferguson’s average sale is considerably higher than that of the other two salespeople. You compare total sales with a similar query:

SELECT Salesperson, SUM(TotalSale)

FROM SALES

GROUP BY Salesperson;

Ferguson also has the highest total sales, which is consistent with having the highest average sales.

HAVING Clauses

You can analyze the grouped data further by using the HAVING clause. The HAVING clause is a filter that acts similar to a WHERE clause, but on groups of rows rather than on individual rows. To illustrate the function of the HAVING clause, suppose the sales manager considers Ferguson to be in a class by himself. His performance distorts the overall data for the other salespeople. You can exclude Ferguson’s sales from the grouped data by using a HAVING clause as follows:

SELECT Salesperson, SUM(TotalSale)

FROM SALES

GROUP BY Salesperson

HAVING Salesperson <> ‘Ferguson’;

This gives the result shown in Figure. Only rows where the salesperson is not Ferguson are considered.

ORDER BY Clauses

Use the ORDER BY clause to display the output table of a query in either ascending or descending alphabetical order. Whereas the GROUP BY clause gathers rows into groups and sorts the groups into alphabetical order, ORDER BY sorts individual rows. The ORDER BY clause must be the last clause that you specify in a query. If the query also contains a GROUP BY clause, the clause first arranges the output rows into groups. The ORDER BY clause then sorts the rows within each group. If you have no GROUP BY clause, then the state-ment considers the entire table as a group, and the ORDER BY clause sorts all its rows according to the column (or columns) that the ORDER BY clause specifies.

To illustrate this point, consider the data in the SALES table. The SALES table contains columns for InvoiceNo, SaleDate, Salesperson, and TotalSale. If you use the following example, you see all the SALES data, but in an arbi-trary order:

SELECT * FROM SALES ;

On one implementation, this order may be the one in which you inserted the rows in the table, and on another implementation, the order may be that of the most recent updates. The order can also change unexpectedly if anyone physically reorganizes the database. Usually, you want to specify the order in which you want the rows. You may, for example, want to see the rows in order by the SaleDate, as follows:

SELECT * FROM SALES ORDER BY SaleDate ;

This example returns all the rows in the SALES table, in order by SaleDate.

For rows with the same SaleDate, the default order depends on the imple-mentation. You can, however, specify how to sort the rows that share the same SaleDate. You may want to see the SALES for each SaleDate in order by InvoiceNo, as follows:

SELECT * FROM SALES ORDER BY SaleDate, InvoiceNo ;

This example first orders the SALES by SaleDate; then for each SaleDate, it orders the SALES by InvoiceNo. But don’t confuse that example with the fol-lowing query:

SELECT * FROM SALES ORDER BY InvoiceNo, SaleDate ;

This query first orders the SALES by INVOICE_NO. Then for each different INVOICE_NO, the query orders the SALES by SALE_DATE. This probably won’t yield the result you want, because it is unlikely that multiple sale dates will exist for a single invoice number.

The following query is another example of how SQL can return data:

SELECT * FROM SALES ORDER BY Salesperson, SaleDate ;

This example first orders by SALESPERSON and then by SALE_DATE. After you look at the data in that order, you may want to invert it, as follows:

SELECT * FROM SALES ORDER BY SaleDate, Salesperson ;

This example orders the rows first by SaleDate and then by Salesperson.

All these ordering examples are ascending (ASC), which is the default sort order. The last SELECT shows earlier SALES first and, within a given date, shows SALES for ‘Adams’ before ‘Baker’. If you prefer descending (DESC) order, you can specify this order for one or more of the order columns, as follows:

SELECT * FROM SALES

ORDER BY SaleDate DESC, Salesperson ASC ;

This example specifies a descending order for sales date, showing the more recent sales first, and an ascending order for salespeople, putting them in normal alphabetical order.

IT Skills

Tags

Search This Blog

Wednesday, 6 June 2018

SQL Series...26

Zeroing In on the Data You Want

Modifying Clauses

FROM Clauses

WHERE Clauses

Comparison predicates

Comparison predicates

BETWEEN

IN and NOT IN

LIKE and NOT LIKE

SELECT Quote

SIMILAR

NULL

ALL, SOME, ANY

Look at how SOME, ANY, and ALL apply in SQL.

UNIQUE

DISTINCT

OVERLAPS

MATCH

Referential integrity rules

Logical Connectives

AND

OR

NOT

GROUP BY Clauses

HAVING Clauses

ORDER BY Clauses

No comments:

Post a Comment

Report Abuse

Popular

About Me

Archive

Comments

Tags

Contact Form

Labels

Categories

Trending

Recent