Problem 1. In Database Management Systems by
Ramakrishnan and
Gehrke,
Chapter 14:
Part a: Exercise 14.4, pg. 476, Part 4.
Part b: Exercise 14.4, pg. 476, but
modify the storage of R so that it supports an Alternative 1, clustered
B+ tree index of order 50 on attribute a of R.
Also assume that the B+ tree contains 1000 distinct key values. What is
the cost of
joining R and S using an index nested loop join?
Problem 2.
Part a. In Database Management Systems by Ramakrishnan and
Gehrke, the following algorithm is given for implementing R-S for
relations R and S, which are union-compatible:
For each stage of the algorithm write its disk I/O cost and the
number of buffers required. Your costs should be in terms of
parameters:
Part b. Now suppose that S has an Alternative 2 hash
index on attribute a, which is
a candidate key for S. Propose an algorithm for R-S that uses the hash
index and analyze its disk I/O cost and buffer use. Use the parameters
given in Part a. Does it matter whether attribute a is a candidate key for R?
Part c. Suppose p=10, M=1000 N=20,000 and F=150. Which
algorithm is more efficient? What change in parameter values would
change which algorithm wins?
Problem 3. Assume that you have relation R with 500,000
tuples and 10
tuples per page.
The following indexes exist:
We wish to select all tuples of R with (R.x = a) AND (500 < R.y < 600) AND
(R.z = b), where a and b are constants.
Part a. List the
different algorithms that make use
of one or more of the indexes for evaluating the selection
query. Describe briefly how each
index is used in each algorithm.
Part b. Without doing a detailed analysis,
choose the algorithm from your answer to Part a that you believe is
most efficient in terms of disk I/O cost. Justify your
answer.
Part c. Do a detailed
disk I/O cost analysis of the algorithm you have chosen in Part b.
Problem 4. Assume that you have relations R and S of
Problem 1 above,
with the same
characteristics and the index of Part b. Assume that
you
also have a relation T with the same properties as relation R, but with
no indexes built for it and that attribute c of T ranges
over 3000 values. Assume that you have 52 buffers as in Problem
1. Consider the evaluation of the query