Problem 1 When discussing I/O costs in class, we used parameters D, the average time to read or write a page from disk, and B, the number of data pages in a file. Parameter D reflects seek time, rotational latency and transfer time. Suppose we introduce additional parameters T, which is only the transfer time of a data page, and P, the number of data pages per cylinder of the disk.
Part a: Using parameters D, B, T, and P, give an equation for the I/O cost of a range search on a sorted file stored contiguously on disk. (I have used the term "sequentially" in class. "Sequentially" is somewhat ambiguous - does it refer to search key order or disk sector order. By "contiguously" I mean the file pages are stored in order on the disk so that arm movement is minimized: consecutively on the same track, then on the same cylinder, then on adjacent cylinders. There is no rotational latency in reading from one track to the next within a cylinder. )
Part b: Now consider the index
structure shown in Figure 10.2
on page 340 of
Database Management Systems by Ramakrishnan and Gehrke.
Suppose that the index file is sorted by key values (Ki's) and stored
contiguously
on disk. Let I be the number of data pages in the index file.
Assume
that the data file has B pages and that these are not stored
contiguously
on disk. What is the equation for the I/O cost of range search for this
index file and data file configuration? When is this a savings over
your
answer of part a?
Problem 2 In
class we executed inserts and delete on an order one B+ tree, finishing
with tree shown in this pdf file.
Part a:
Delete the entry with B+ tree key value 29 from the B+ tree.
Describe each step of the deletion.
Part b: Delete the entry
with B+ tree key value 27 from the B+ tree resulting from Part a.
Describe each step of the deletion.
Problem 3 In
this problem we consider B+ trees on a file that has large records: one
record requires a page to store. In contrast, the order of the B+ tree
is d=16 and each interior node of the B+ tree is stored in one page;
therefore up to 32 B+ tree keys and 33 pointers fit in one page.
You may assume the B+ tree key is a candidate key.
Part c:
Discuss the pros and cons of each of the alternatives (represented by
parts A and B) for building a B+ tree for this file.
Problem 4 As
discussed in class and in Section
10.7 of Database Management
Systems by Ramakrishnan and Gehrke, for B+ trees, there are
several ways to deal with records that have the same B+ tree key
value (duplicate keys). The figure below shows an B+ tree
of order 2 for the instance in Figure 10.30 on page 368 of Database Management Systems by
Ramakrishnan and Gehrke. This B+ tree uses duplicate keys in the
interior nodes and has leaves containing pointers of the form
<page-id,slot> to the actual data pages. (See Exercise
10.10, part 2 for an explanation of the <page-id, slot> notation.
)
Consider the following different scheme for dealing with duplicate
key values: Interior nodes satisfy B+ tree properties, and there
are no duplicate keys in interior nodes. Each data entry in
a leaf is of the form (B+ tree key
value, list of pointers to all records with that key value indexed by
the tree). This means the entries for key values are of
variable length in the leaves. More than one key value can reside
in a leaf, but the entire list of pointers for a key value must be in
the same leaf. The only use of overflow pages is if the list of
pointers for a key value cannot fit in one leaf; then the leaf
consists of the key value and as much of the list as will fit and the
rest of the list is in one or more overflow pages.
Problem 5 Let 0, 1, 2, 3, 64, 128, 192, 4,
8, 16, 32
be a sequence
of hash key values to be inserted in a dynamic hashing index. You
may view each value in the sequence to be the result of applying some
base hash function
h to the
actual hash field value of a record.