Princeton COS 318: Operating Systems.

Warmup Parallel Programming

COS 598A, Spring 2007, Princeton University

Parallel Matrix Multiplication (due March 10, 2007)

The goal of this programming assignment is for students to learn how to program with Pthreads and get familiar with the multiprocessors and their environments. The assignment has two parts. You are required to use only integer multiplications.

Part 1: Parallel Matrix Multiply

This part requires students to write a parallel program "pmm" to solve matrix multiply C = A * B, where A, B, and C are all N by N. Matrix A is initialized by setting A[i,j] = i + j, for all 0 ≤ i < N and 0 ≤ j < N. Matrix B is initialized by setting B[i,j] = i + 2*j, for all 0 ≤ i < N and 0 ≤ j < N. N should be a command line argument. By typing "pmm 2000," pmm will run the matrix multiply with N=2000 and will print out the 10-by-10 top left corners of matrix A, B, and C, as well as its elapse time in microseconds. By typing "pmm 6," pmm will compute matrix multiply with N=6 and print out entire matrix A, B, and C, as well as elapse time in microseconds.

The pecific requirements are:

Your pmm should execute twice. The first time should run the multiplications with natural data layout for A, and B. The second time should run with a transposed matrix B. Each time, your program should print out the three matrices and elapse time.
You should try to write a portable version of pmm that runs on the quad-core Xeon server, the Niagara server, and the SGI Origin 3000.
You should try to figure out how to create threads to achieve the minimal elapse times.
You should submit a printout of your program source, and a report showing the outputs of "pmm 2000" on all three parallel machines. You should explain why the two elapse times of requirement ! are different and explain why you choose the method for requirement 3.

Part 2: Pipelined Parallel Matrix Multiply

This part requires students to write a parallel program ppmm using the pipelined model to solve matrix multiplies X = A * B * C * D, where all matrices are N by N. The matrices are initialized in the following way: A[i,j] = i + j; B[i,j] = i + 2 * j; C[i,j] = i + 3 * j; and D[i,j] = i+4*j, for all 0 ≤ i < N and 0 ≤ j < N. Similar to pmm, ppmm also uses a command line argument to set N and prints out the 10-by-10 top left cornors of all matrices, as well as the program's elapse time. You can use pmm program you wrote for Part 1 as the base.

The specific requirements are:

You are required to use the pipelined model to write this parallel program. Your program has three pipeline stages, one for each matrix multiply. Each stage should use more than one thread to execute its matrix multiplication. You should define a queue between two stages and implement Enqueue() and Dequeue() operations on the queue. You are asked to use mutex and condition variable primitives to synchronize among threads when accessing each queue.
This program will use one data representation that achieves the best elapse time.
You should try to write a portable version of ppmm that runs on the quad-core Xeon server, the Niagara server, and the SGI Origin 3000.
You should submit a printout of your program source, and a report showing the outputs of "pmm 2000" on all three parallel machines. You should explain why the two elapse times of requirement ! are different and explain why you choose the method for requirement 3.