
Recursive Queries
One of the major criticisms of SQL, up through and including SQL-92, was its inability to implement recursive processing. Many important problems that are difficult to solve by other means yield readily to recursive solu-tions. Extensions included in SQL:1999 allow recursive queries, greatly expanding the language’s power. If your SQL implementation includes the recursion extensions, you can efficiently solve a large new class of problems. However, because recursion is not a part of core SQL:2003, many implementa-tions currently available do not include it.
What Is Recursion?
Recursion is a feature that’s been around for years in programming languages such as Logo, LISP, and C++. In these languages, you can define a function (a set of one or more commands) that performs a specific operation. The main program invokes the function by issuing a command called a function call. If the function calls itself as a part of its operation, you have the simplest form of recursion.
A simple program that uses recursion in one of its functions provides an illus-
tration of the joys and pitfalls of recursion. The following program, written in
C++, draws a spiral on the computer screen. It assumes that the drawing tool
is initially pointing toward the top of the screen, and includes three functions:
The function line(n) draws a line n units long.
The function left_turn(d) rotates the drawing tool d degrees counterclockwise.
You can define the function spiral(segment) as follows:
void spiral(int segment)
{
line(segment)
left_turn(90)
spiral(segment + 1)
} ;
If you call spiral(1) from the main program, the following actions take place:
spiral(1) draws a line one unit long toward the top of the screen.
spiral(1) turns left 90 degrees.
spiral(1) calls spiral(2).
spiral(2) draws a line 2 units long toward the left side of the screen.
spiral(2) turns left 90 degrees.
spiral(2) calls spiral(3).
And so on. . . .
Eventually the program generates the spiral curve shown in Figure
Houston, we have a problem
Well, okay, the situation here is not as serious as it was for Apollo 13 when its main oxygen tank exploded into space while en route to the moon. Your prob-lem is that the spiral-drawing program keeps calling itself and drawing longer and longer lines. It will continue to do that until the computer executing it runs out of resources and (if you’re lucky) puts an obnoxious error message on the screen. If you’re unlucky, the computer just crashes.
Failure is not an option
This scenario shows one of the dangers of using recursion. A program written to call itself invokes a new instance of itself — which in turn calls yet another instance, ad infinitum. This is generally not what you want. To address this problem, programmers include a termination condition within the recursive function — a limit on how deep the recursion can go — so the program per-forms the desired action and then terminates gracefully. You can include a termination condition in your spiral-drawing program to save computer resources and prevent dizziness in programmers:
void spiral2(int segment)
{
if (segment <= 10)
{
line(segment)
left_turn(90)
spiral2(segment + 1)
}
} ;
When you call spiral2(1), it executes and then (recursively) calls itself until the value of segment exceeds 10. At the point where segment equals 11, the if (segment <=10) construct returns a False value, and the code within the interior braces is skipped. Control returns to the previous invocation of spiral2 and, from there, returns all the way up to the first invocation, after which the program terminates. Figure shows the sequence of calls and returns that occurs.
Every time a function calls itself, it takes you one level farther away from the main program that was the starting point of the operation. For the main pro-gram to continue, the deepest iteration must return control to the iteration that called it. That iteration will have to do likewise, continuing all the way back to the main program that made the first call to the recursive function.
call
spiral2(1)
|
||||||||||||||||||||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
call
spiral2(2)
|
||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
call
spiral2(3)
|
|||||||||||||||||||||||||
|
|
|
|
|
||||||||||||||||||||||||||
|
|
|
|
|
||||||||||||||||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
call
spiral2(4)
|
||||||||||||||||||||||
|
|
|
|
|
|
|
|
|||||||||||||||||||||||
|
|
|
|
|
|
|
|
|||||||||||||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
call
spiral2(5)
|
|||||||||||||||||||
|
|
|
|
|
|
|
|
|
|
|
||||||||||||||||||||
|
|
|
|
|
|
|
|
|
|
|
||||||||||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
call
spiral2(6)
|
||||||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|||||||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
call
spiral2(7)
|
|||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
||||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
||||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
call
spiral2(8)
|
||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
call
spiral2(9)
|
|||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
call
spiral2(10)
|
||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
call
spiral2(11)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Recursion is a powerful tool for repeatedly executing code when you don’t know at the outset how many times the code should be repeated. It’s ideal for searching through tree-shaped structures such as family trees, complex elec-tronic circuits, or multilevel distribution networks.
What Is a Recursive Query?
A recursive query is a query that is functionally dependent on itself. The simplest form of such functional dependence is the case where a query Q1 includes an invocation of itself in the query expression body. A more complex case is where query Q1 depends on query Q2, which in turn depends on query Q1. There is still a functional dependency, and recursion is still involved, no matter how many queries lie between the first and the second invocation of the same query.
Where Might I Use a Recursive Query?
Recursive queries may help save time and frustration in various kinds of problems. Suppose, for example, that you have a pass that gives you free air travel on any flight of the (fictional) Vannevar Airlines. Way cool. The next question is, “Where can I go for free?” The FLIGHT table contains all the flights that Vannevar runs. Table shows the flight number and the source and destination of each flight.
Table
|
Flights
Offered by Vannevar Airlines
|
||
|
|
|
|
Flight
No.
|
Source
|
Destination
|
|
3141
|
Portland
|
Orange County
|
|
|
|
|
|
2173
|
Portland
|
Charlotte
|
|
|
|
|
|
623
|
Portland
|
Daytona Beach
|
|
|
|
|
|
5440
|
Orange County
|
Montgomery
|
|
|
|
|
|
221
|
Charlotte
|
Memphis
|
|
|
|
|
|
32
|
Memphis
|
Champaign
|
|
|
|
|
|
981
|
Montgomery
|
Memphis
|
Map illustrates the routes of the United States.
To get started on your vacation plan, create a database table for FLIGHT by using SQL as follows:
CREATE TABLE FLIGHT (
|
|
|
FlightNo
|
INTEGER
|
NOT
NULL,
|
Source
|
CHARACTER
(30),
|
|
Destination
|
CHARACTER
(30)
|
|
);
|
|
|
|
|
|
After the table is created, you can populate it with the data shown in Table.
Suppose you’re starting from Portland and you want to visit a friend in Montgomery. Naturally you wonder, “What cities can I reach via Vannevar if I start from Portland?” and “What cities can I reach via the same airline if I start from Montgomery?” Some cities are reachable in one hop; others are not. Some might require two or more hops. You can find all the cities that Vannevar will take you to, starting from any given city on its route map — but if you do it one query at a time, you’re . . .
Querying the hard way
To find out what you want to know — provided you have the time and patience — you can make a series of queries, starting with Portland as the starting city:
SELECT Destination FROM FLIGHT WHERE Source = “Portland”;
The first query returns Orange County, Charlotte, and Daytona Beach.
Your second query uses the first of these results as a starting point:
SELECT Destination FROM FLIGHT WHERE Source = “Orange
County”;
The second query returns Montgomery. Your third query returns to the results of the first query and uses the second result as a starting point:
SELECT Destination FROM FLIGHT WHERE Source = “Charlotte”;
The third query returns Memphis. Your fourth query goes back to the results of the first query and uses the remaining result as a starting point:
SELECT Destination FROM FLIGHT WHERE Source = “Daytona
Beach”;
Sorry, the fourth query returns a null result because Vannevar offers no out-going flights from Daytona Beach. But the second query returned another city (Montgomery) as a possible starting point, so your fifth query uses that result:
SELECT Destination FROM FLIGHT WHERE Source = “Montgomery”;
This query returns Memphis, but you already know it’s among the cities you can get to (in this case, via Charlotte). But you go ahead and try this latest result as a starting point for another query:
SELECT Destination FROM FLIGHT WHERE Source = “Memphis”;
The query returns Champaign — which you can add to the list of reachable cities (even if you have to get there in two hops). As long as you’re consider-ing multiple hops, you plug in Champaign as a starting point:
SELECT Destination FROM FLIGHT WHERE Source = “Champaign’;
Oops. This query returns a null value; Vannevar offers no outgoing flights from Champaign. (Seven queries so far. Are you fidgeting yet?)
Vannevar doesn’t offer a flight out of Daytona Beach, either, so if you go there, you’re stuck — which might not be a hardship if it’s Spring Break week. (Of course, if you use up a week running individual queries to find out where to go next, you might get a worse headache than you’d get from a week of partying.) Or you might get stuck in Champaign — in which case, you could enroll in the University of Illinois and take a few database courses. (They might even help you figure out how to get out of Daytona or Champaign.)
Granted, this method will (eventually) answer the question, “What cities are reachable from Portland?” But running one query after another, making each one dependent on the results of a previous query, is complicated, time-consuming, and fidgety.
Saving time with a recursive query
A simpler way to get the info you need is to craft a single recursive query that does the entire job in one operation. Here’s the syntax for such a query:
WITH RECURSIVE
ReachableFrom (Source, Destination)
AS (SELECT Source, Destination
FROM FLIGHT
UNION
SELECT in.Source, out.Destination
FROM ReachableFrom in, FLIGHT out
WHERE in.Destination = out.Source
)
SELECT * FROM ReachableFrom
WHERE Source = “Portland”;
The first time through the recursion, FLIGHT has seven rows, and ReachableFrom has none. The UNION takes the seven rows from FLIGHT and copies them into ReachableFrom. At this point, ReachableFrom has the data shown in Table
Table
|
ReachableFrom After One Pass through Recursion
|
|
|
Source
|
Destination
|
Portland
|
Orange County
|
|
|
Portland
|
Charlotte
|
|
|
Portland
|
Daytona Beach
|
|
|
Orange County
|
Montgomery
|
|
|
Charlotte
|
Memphis
|
|
|
Memphis
|
Champaign
|
|
|
Montgomery
|
Memphis
|
The second time through the recursion, things start to get interesting. The WHERE clause (WHERE in.Destination = out.Source) means that you’re looking only at rows where the Destination field of the ReachableFrom table equals the Source field of the FLIGHT table. For those rows, you’re taking the Source field from ReachableFrom and the Destination field from FLIGHT, and adding those two fields to ReachableFrom as a new row. Table shows the result of this iteration of the recursion.
Table
|
ReachableFrom
After Two Passes
|
|
through the
Recursion
|
|
|
Source
|
Destination
|
Portland
|
Orange County
|
|
|
Portland
|
Charlotte
|
|
|
Portland
|
Daytona Beach
|
|
|
Orange County
|
Montgomery
|
|
|
Charlotte
|
Memphis
|
|
|
Memphis
|
Champaign
|
|
|
Montgomery
|
Memphis
|
|
|
Portland
|
Montgomery
|
|
|
Portland
|
Memphis
|
|
|
Orange County
|
Memphis
|
|
|
Charlotte
|
Champaign
|
|
|
The results are looking more useful. ReachableFrom now contains all the Destination cities that are reachable from any Source city in two hops or less. Next, the recursion processes three-hop trips, and so on, until all possi-ble destination cities have been reached.
After the recursion is complete, the third and final SELECT statement (which is outside the recursion) extracts from ReachableFrom only those cities you can reach from Portland by flying Vannevar. In this example, all six other cities are reachable from Portland — in few enough hops that you won’t feel like you’re going by pogo stick.
If you scrutinize the code in the recursive query, it doesn’t look any simpler than the seven individual queries that it replaces. It does, however, have two advantages:
When you set it in motion, it completes the entire operation without any further intervention.
It can do the job fast.
Imagine a real-world airline with many more cities on its route map. The more possible destinations, the bigger the advantage of using the recursive method.
What makes this query recursive? The fact that you’re defining ReachableFrom in terms of itself. The recursive part of the definition is the second SELECT statement, the one just after the UNION. ReachableFrom is a temporary table that progressively is filled with data as the recursion pro-ceeds. Processing continues until all possible destinations have been added to ReachableFrom. Any duplicates are eliminated, because the UNION opera-tor doesn’t add duplicates to the result table. After the recursion completes, ReachableFrom contains all the cities that are reachable from any starting city. The third and final SELECT statement returns only those destination cities that you can reach from Portland. Bon voyage.
Where Else Might I Use a Recursive Query?
Any problem that you can lay out as a tree-like structure can potentially be solved by using a recursive query. The classic industrial application is materi-als processing (the process of turning raw stuff into finished things). Suppose your company is building a new gasoline-electric hybrid car. Such a machine is built of subassemblies — engine, batteries, and so on — which are con-structed from smaller subassemblies (crankshaft, electrodes, and so on) — which are made of even smaller parts. Keeping track of all the various parts can be difficult in a relational database that does not use recursion. Recursion enables you to start with the complete machine and ferret your way along any path to get to the smallest part. Want to find out the specs for the fastening screw that holds the clamp to the negative electrode of the aux-iliary battery? The WITH RECURSIVE structure gives SQL the capability to address such problems.
Recursion is also a natural for ‘What if?’ processing. In the Vannevar Airlines example, what if management discontinues service from Portland to Charlotte? How does that affect the cities that are reachable from Portland? A recursive query quickly gives you the answer.
No comments:
Post a Comment