
Working with Indexes
The SQL:2003 specification doesn’t address the topic of indexes, but that omission doesn’t mean that indexes are rare or even optional parts of a data-base system. Every SQL implementation supports indexes, but no universal agreement exists on how to support them. In previous article, I show you how to create an index by using Microsoft Access, a rapid application development (RAD) tool. You must refer to the documentation for your particular database management system (DBMS) to see how the system implements indexes.
What’s an index, anyway?
Data generally appears in a table in the order in which you originally entered the information. That order may have nothing to do with the order in which you later want to process the data. Say, for example, that you want to process your CLIENT table in ClientName order. The computer must first sort the table in ClientName order. These sorts take time. The larger the table, the longer the sort takes. What if you have a table with 100,000 rows? Or a table with a million rows? In some applications, such table sizes are not rare. The best sort algorithms would have to make some 20 million comparisons and millions of swaps to put the table in the desired order. Even with a very fast computer, you may not want to wait that long.
Indexes can be a great timesaver. An index is a subsidiary or support table that goes along with a data table. For every row in the data table, you have a corresponding row in the index table. The order of the rows in the index table is different.
Table
|
CLIENT Table
|
|||
ClientName
|
Address1
|
Address2
|
City
|
State
|
Butternut
|
5 Butternut
Lane
|
Hudson
|
NH
|
|
Animal Clinic
|
||||
Amber
|
470 Kolvir
Circle
|
Amber
|
MI
|
|
Veterinary,
Inc.
|
||||
Vets R Us
|
2300 Geoffrey Road
|
Suite 230
|
Anaheim
|
CA
|
Doggie Doctor
|
32 Terry
Terrace
|
Nutley
|
NJ
|
|
The
Equestrian
|
Veterinary
|
7890 Paddock
|
Gallup
|
NM
|
Center
|
Parkway
|
|||
Dolphin
Institute
|
1002 Marine Drive
|
Key West
|
FL
|
|
J. C.
Campbell,
|
2500 Main
Street
|
Los Angeles
|
CA
|
|
Credit Vet
|
||||
Wenger’s
|
15 Bait Boulevard
|
Sedona
|
AZ
|
|
Worm Farm
|
||||
The rows are not in alphabetical order by ClientName. In fact, they aren’t in any useful order at all. The rows are simply in the order in which somebody entered the data.
An index for this CLIENT table may look like Table
Table
|
Client Name
Index for the CLIENT Table
|
ClientName
|
Pointer
to Data Table
|
Amber Veterinary, Inc.
|
2
|
Butternut Animal Clinic
|
1
|
Doggie Doctor
|
4
|
Dolphin
Institute
|
6
|
J. C. Campbell, Credit Vet
|
7
|
The Equestrian Center
|
5
|
Vets R Us
|
3
|
Wenger’s Worm
Farm
|
8
|
The index contains the field that forms the basis of the index (in this case, ClientName) and a pointer into the data table. The pointer in each index row gives the row number of the corresponding row in the data table.
Why would I want an index?
If I want to process a table in ClientName order, and I have an index arranged in ClientName order, I can perform my operation almost as fast as I could if the data table itself was in ClientName order. I can work through the index sequentially, moving immediately to each index row’s corresponding data record by using the pointer in the index.
If you use an index, the table processing time is proportional to N, where N is the number of records in the table. Without an index, the processing time for the same operation is proportional to N lg N, where lg N is the logarithm of N to the base 2. For small tables, the difference is insignificant, but for large tables, the difference is great. On large tables, some operations aren’t practical to perform without the help of indexes.
As an example, say that you have a table containing 1,000,000 records (N = 1,000,000), and processing each record takes one millisecond (one-thousandth of a second). If you have an index, processing the entire table takes only 1,000 seconds — less than 17 minutes. Without an index, you need to go through the table approximately 1,000,000 × 20 times to achieve the same result. This process would take 20,000 seconds — more than five and a half hours. I think you can agree that the difference between 17 minutes and five and a half hours is substantial. That’s the difference that indexing makes on processing records.
Maintaining an index
After you create an index, something must maintain it. Fortunately, your DBMS maintains your indexes for you by updating them every time you update the corresponding data tables. This process takes some extra time, but it’s worth it. After you create an index and your DBMS maintains it, the index is always available to speed up your data processing, no matter how many times you need to call on it.
The best time to create an index is at the same time you create its corre-sponding data table. If you create the index at the start and begin maintaining it at the same time, you don’t need to undergo the pain of building the index later, with the entire operation taking place in a single, long session. Try to anticipate all the ways that you may want to access your data and then create an index for each possibility.
Some DBMS products give you the capability to turn off index maintenance. You may want to do so in some real-time applications where updating indexes takes a great deal of time and you have precious little to spare. You may even elect to update the indexes as a separate operation during off-peak hours.
Don’t fall into the trap of creating an index for retrieval orders that you’re unlikely ever to use. Index maintenance is an extra operation that the com-puter must perform every time it modifies the index field or adds or deletes a data table row, which affects performance. For optimal performance, create only those indexes that you expect to use as retrieval keys — and only for tables containing a large number of rows. Otherwise, indexes can degrade performance.
You may need to compile something such as a monthly or quarterly report that requires the data in an odd order that you don’t ordinarily need. Create an index just before running that periodic report, run the report, and then drop the index so that the DBMS isn’t burdened with maintaining the index during the long period between reports.
No comments:
Post a Comment