SQL Series...13

Working with Indexes

The SQL:2003 specification doesn’t address the topic of indexes, but that omission doesn’t mean that indexes are rare or even optional parts of a data-base system. Every SQL implementation supports indexes, but no universal agreement exists on how to support them. In previous article, I show you how to create an index by using Microsoft Access, a rapid application development (RAD) tool. You must refer to the documentation for your particular database management system (DBMS) to see how the system implements indexes.

What’s an index, anyway?

Data generally appears in a table in the order in which you originally entered the information. That order may have nothing to do with the order in which you later want to process the data. Say, for example, that you want to process your CLIENT table in ClientName order. The computer must first sort the table in ClientName order. These sorts take time. The larger the table, the longer the sort takes. What if you have a table with 100,000 rows? Or a table with a million rows? In some applications, such table sizes are not rare. The best sort algorithms would have to make some 20 million comparisons and millions of swaps to put the table in the desired order. Even with a very fast computer, you may not want to wait that long.

Indexes can be a great timesaver. An index is a subsidiary or support table that goes along with a data table. For every row in the data table, you have a corresponding row in the index table. The order of the rows in the index table is different.

Table		CLIENT Table
*ClientName*	*Address1*	*Address2*	*City*	*State*
Butternut	5 Butternut Lane		Hudson	NH
Animal Clinic

Amber	470 Kolvir Circle		Amber	MI
Veterinary, Inc.

Vets R Us	2300 Geoffrey Road	Suite 230	Anaheim	CA

Doggie Doctor	32 Terry Terrace		Nutley	NJ

The Equestrian	Veterinary	7890 Paddock	Gallup	NM
Center		Parkway

Dolphin Institute	1002 Marine Drive		Key West	FL

J. C. Campbell,	2500 Main Street		Los Angeles	CA
Credit Vet

Wenger’s	15 Bait Boulevard		Sedona	AZ
Worm Farm

The rows are not in alphabetical order by ClientName. In fact, they aren’t in any useful order at all. The rows are simply in the order in which somebody entered the data.

An index for this CLIENT table may look like Table

Table	Client Name Index for the CLIENT Table
*ClientName*	*Pointer to Data Table*
Amber Veterinary, Inc.	2

Butternut Animal Clinic	1

Doggie Doctor	4

Dolphin Institute	6

J. C. Campbell, Credit Vet	7

The Equestrian Center	5

Vets R Us	3

Wenger’s Worm Farm	8

The index contains the field that forms the basis of the index (in this case, ClientName) and a pointer into the data table. The pointer in each index row gives the row number of the corresponding row in the data table.

Why would I want an index?

If I want to process a table in ClientName order, and I have an index arranged in ClientName order, I can perform my operation almost as fast as I could if the data table itself was in ClientName order. I can work through the index sequentially, moving immediately to each index row’s corresponding data record by using the pointer in the index.

If you use an index, the table processing time is proportional to N, where N is the number of records in the table. Without an index, the processing time for the same operation is proportional to N lg N, where lg N is the logarithm of N to the base 2. For small tables, the difference is insignificant, but for large tables, the difference is great. On large tables, some operations aren’t practical to perform without the help of indexes.

As an example, say that you have a table containing 1,000,000 records (N = 1,000,000), and processing each record takes one millisecond (one-thousandth of a second). If you have an index, processing the entire table takes only 1,000 seconds — less than 17 minutes. Without an index, you need to go through the table approximately 1,000,000 × 20 times to achieve the same result. This process would take 20,000 seconds — more than five and a half hours. I think you can agree that the difference between 17 minutes and five and a half hours is substantial. That’s the difference that indexing makes on processing records.

Maintaining an index

After you create an index, something must maintain it. Fortunately, your DBMS maintains your indexes for you by updating them every time you update the corresponding data tables. This process takes some extra time, but it’s worth it. After you create an index and your DBMS maintains it, the index is always available to speed up your data processing, no matter how many times you need to call on it.

The best time to create an index is at the same time you create its corre-sponding data table. If you create the index at the start and begin maintaining it at the same time, you don’t need to undergo the pain of building the index later, with the entire operation taking place in a single, long session. Try to anticipate all the ways that you may want to access your data and then create an index for each possibility.

Some DBMS products give you the capability to turn off index maintenance. You may want to do so in some real-time applications where updating indexes takes a great deal of time and you have precious little to spare. You may even elect to update the indexes as a separate operation during off-peak hours.

Don’t fall into the trap of creating an index for retrieval orders that you’re unlikely ever to use. Index maintenance is an extra operation that the com-puter must perform every time it modifies the index field or adds or deletes a data table row, which affects performance. For optimal performance, create only those indexes that you expect to use as retrieval keys — and only for tables containing a large number of rows. Otherwise, indexes can degrade performance.

You may need to compile something such as a monthly or quarterly report that requires the data in an odd order that you don’t ordinarily need. Create an index just before running that periodic report, run the report, and then drop the index so that the DBMS isn’t burdened with maintaining the index during the long period between reports.

IT Skills

Tags

Search This Blog

Thursday, 24 May 2018

SQL Series...13

Working with Indexes

What’s an index, anyway?

Why would I want an index?

Maintaining an index

No comments:

Post a Comment

Report Abuse

Popular

About Me

Archive

Comments

Tags

Contact Form

Labels

Categories

Trending

Recent