|When a table grows to millions or even billions of rows, performance can be a serious problem. The last thing you want to do is a table scan. Unfortunately, you can't do some queries any other way.|
There are, however, several modifications that you can make. A RAID configuration that uses striping will help because more heads are available to move across more surfaces. Also, a multi-processor setup can sometimes be useful since different queries or different parts of the same query can run on different processors. (Note: In SQL 2000, this requires the use of SQL 2000 Advanced Server.)
If these hardware tweaks are still insufficient for your requirements, you might consider horizontal partitioning. This approach requires multiple hard disks at the very least, and ideally multiple servers.
Imagine a table containing 100 million rows. For simplicity's sake, you could divide this table into 10 tables, each containing 10 million rows, and named Table_01 through Table_10. Each table will contain a range of rows, the range being defined by primary key value. This works best if the original table's primary key is an identity key, but it will also work if the primary key values are at least logically divisible. In other words, you need to know in which table to look for any given key value.
The next step is to define the tables, adding a check constraint to enforce that any new row's key lies within the range. For example:
Move the data from the original table into the partitions. Now you should create a linked server definition on each of the servers, specifying the connection information enabling each server to connect to the others.
Then, create a distributed partition view on each server. (You can actually create one view and copy it to the other servers.) This view UNIONs the partitions into a single logical table. For clarity, it's best to fully qualify the names of the table partitions in the view:
Now the database application can refer to Table_00 without regard to the physical location of its partitions. You can index each of the partitions the same way as the original and, in these columns, the data will usually not correspond to the key ranges (e.g., SalespersonID has nothing to do with the primary key range). But even in the worst case scenario (such as a table scan), each processor will search one tenth of the rows, and all of the processors will search at once. The result will be substantially quicker than a scan of the original table.
We've only considered SELECT queries, but chances are that you'll also want to insert and update rows. To do so, you need an updatable partitioned view. A partitioned view is updatable if it can meet the following two conditions:
- The view is built from a series of SELECT statements, each of which references one table partition (linked or local), all UNIONed together to form the logical equivalent of the original table.
- Triggers and cascading updates and deletes may not be defined on any of the table partitions.
In addition, you should be aware of these four rules:
1. The view can reference each table partition only once.
2. Computed columns cannot be indexed.
3. The primary key definitions must be identical.
4. All columns must be referenced in the SELECT list. The simplest way to do that is SELECT *.
You may not have to partition tables horizontally very often, but when the need arises, you'll need hardware resources to make it work. If at all possible, keep the linked servers on the same hub or at least the same subnet.