Query Processing & Query
Optimization
Query Processing
- Find one or more plans for carrying out the processing
- Estimate the costs, and select the most efficient plan.
- Carry out the plan.
We look at selects, projects, and joins.
Processing Selection Queries:
- We look at sample files, and execute some select queries
against them. (Note typo in some (all?) copies
of the text -- should be only 10,000 entries in the Rental
table, so only 100 blocks. accountId B+ tree has depth
of only 2, as does the movieId B+ tree.)
- We go through a few of the queries of Table 13.2 in class.
Students are encouraged to do some others on their own. (Several
examples are done in detail in the text.)
Conjunctive & Disjunctive Queries:
- Here we find that there can be several equivalent relational
algebra expressions that have very different costs. We look
at examples. For definiteness, we take k1 = 100, k2 = 500,
k3 = 30.
- We compare the answers given in Table 13.3. Based on that,
we could choose a strategy. But how
does the DBMS estimate the (unknown) values of k1, k2, k3 in order to make
a "wise" decision"?
- With disjunction we encounter the liklihood of duplicates.
Projections:
- Often lead to duplicates, and elimination of duplicates
is hard. (Some of us saw that in the Query project.)
- No duplicates if the primary key -- or other keys -- retained.
- There are two basic strategies; one is based on sorting,
and the other on hashing. To begin, we scan the input data file and
create a list by writing records with the selected attributes. Then
we sort or scan it.
Join Operations: Here there
are several strategies that can be employed, and the choice of strategy
can have a big impact on the efficiency of the plan. (Pay attention
to the notation here: Bc and Rc refer to the number of blocks
and records in the c (customer) table, respectively, for example.
We look at a simple example:
SELECT * FROM Customer
c, Rental r
WHERE c.accountId = r.accountId
- Simple Nested Loops Join.
Table 13.5 shows logic and costs Bc + Rc * Br, or Br + Rr * Bc.
- With our data, 1,000 + 10,000 * 100 or 100 + 10,000 *
1,000. Notice the difference.
- Block Nested Loops Join.
Code in Figure 13.7. We see a cost of Bc + Bc * Br. In
our case, we have 1,000 + 1,000 * 100. Compare with
"simple". Consider the savings if we can afford multiple buffers.
- Index Nested Loops Join. Table
13.6 illustrates, and we have costs of Bc + Rr (1,000 +
10,000) or Br + Rr (100 + 10,000).
- Sort-Merge Join. Figure 13.8 shows
code. We discuss this. Clearly, if the data files are sorted
on the appropriate keys, the cost is just the merge time, which is Bc
+ Br.
- Hash Join: Create a hash table
from the join attributes.
- Scan each file in turn, and build <attribute, address>
pairs into hash tables.
- Begin scan of hash tables, comparing attribute values
to find join matches.
- For each join match read and output joined records.
- Repeat.
- We skip Left, Right, Outer Joins.
Query Optimization:
We return to the task of the Query Optimizer.
- Enumerate the query plans.
- Estimate the costs of the plans.
- Choose the best plan.
- Execute the best plan.
We examine an example, as presented in the Text.