Problem Set Number 9
Computer Science 471

Due by 5 PM, WEDNESDAY (not Monday) Dec. 4, 1996

Questions 1 and 2 both ask you to write programs using only the 11 instructions from the MIPS IV architecture that are described in the R10000 handout, section 3 (beq, beql, dadd, daddi, ld, movn, movz, slt, slti, pref, sd). Question 1 further restricts this subset. Remember that registers are now 64 bits wide, as are virtual addresses. Here is the initial situation for both questions:

Both questions ask you to write a program that computes the sum of all elements of V that are less than C, leaving the result in register $10. Both programs will be graded on correctness and efficiency.

1. (5 points) First, write this program without using the instructions beql, movn, movz, and pref. Don't unroll the loop or do software pipelining or other advanced trickery; the simple, straightforward, correct answer is all we're looking for.

2. (15 points) Your solution to Question 1 probably has a branch in the middle of the loop, based on whether the vector element was less than C or not. If the elements of V are distributed randomly around C, that branch is not likely to be very predictable by the sophisticated mechanisms used in modern processors like the R10000. Mispredicted branches can be quite costly. The conditional move instructions movn and movz can help.

Your solution to Question 1 will also suffer from cache misses as the elements of V are loaded from memory. The prefetch instruction pref is designed to help with this problem.

Please write another MIPS IV program to solve the same problem, but use any of the subset instructions you like, and try to avoid the branch-mispredict and cache miss problems noted above. Assume your cache has 8-byte blocks and contains no elements of V when your program starts. Assume that the miss penalty of your cache is about twice the time of a single iteration of your loop. Use the ``load'' hint in your prefetch instructions.

3. (10 points) The R10000 has two integer pipelines and one load/store pipeline; see the picture on page 5 of the ``User's Manual'' excerpt in section 5 of the handout: Figure 1-4 Superscalar Pipeline Architecture in the R10000. All three of these pipes read registers and write registers. Assume that all possible bypasses (or forwarding paths) are included in the design. Please draw a picture of these three pipelines at about the same level of detail as Figure 6.38 from the textbook, and clearly show all the bypasses necessary.

4. (5 points) What did you think of Chapter 7? Please tell the authors and publisher in the usual way: fill out http://www.cs.princeton.edu/courses/cs471/chap7.txt. To get the 5 points for this question, just write down the time you sent your message to the publisher.

5. (1 point) How long did this take you, not counting the reading, and with whom did you work?