|
Warmup Parallel Programming
COS 598A, Spring 2007, Princeton University
|
Parallel Matrix
Multiplication (due March 10, 2007)
The goal
of this programming assignment is for students to learn how to program with
Pthreads and get familiar with the multiprocessors and their environments.
The assignment has two parts. You are required to use only integer
multiplications.
Part 1: Parallel Matrix Multiply
This part requires
students to write a parallel program "pmm" to solve matrix multiply C = A * B,
where A, B, and C are all N by N. Matrix A is initialized by setting
A[i,j] = i + j, for all 0 ≤
i < N and 0 ≤
j < N. Matrix B is initialized by setting B[i,j] = i + 2*j, for all 0
≤ i < N
and 0 ≤
j < N. N should be a command line argument. By typing "pmm 2000,"
pmm will run the matrix multiply with N=2000 and will print out the 10-by-10 top
left corners of matrix A, B, and C, as well as its elapse time in microseconds.
By typing "pmm 6," pmm will compute matrix multiply with N=6 and print out
entire matrix A, B, and C, as well as elapse time in microseconds.
The pecific requirements
are:
- Your pmm should
execute twice. The first time should run the multiplications with
natural data layout for A, and B. The second time should run with a
transposed matrix B. Each time, your program should print out the
three matrices and elapse time.
- You should try to
write a portable version of pmm that runs on the quad-core Xeon server, the Niagara
server, and the SGI Origin 3000.
- You should try to
figure out how to create threads to achieve the minimal elapse times.
- You should submit a
printout of your program source, and a report showing the outputs of "pmm
2000" on all three parallel machines. You should
explain why the two elapse times of requirement ! are different and explain
why you choose the method for requirement 3.
Part 2: Pipelined Parallel Matrix Multiply
This part requires students to write a parallel program ppmm using the
pipelined model to solve matrix multiplies X = A * B * C * D, where all matrices
are N by N. The matrices are initialized in the following way: A[i,j] = i
+ j; B[i,j] = i + 2 * j; C[i,j] = i + 3 * j; and D[i,j] = i+4*j, for
all 0 ≤
i < N and 0 ≤
j < N. Similar to pmm, ppmm also uses a command line argument to
set N and prints out the 10-by-10 top left cornors of all matrices, as well as
the program's elapse time. You can use pmm program you wrote for Part 1 as
the base.
The specific requirements are:
- You are required to use the pipelined model to write this parallel
program. Your program has three pipeline stages, one for each matrix
multiply. Each stage should use more than one thread to execute its
matrix multiplication. You should define a queue between two stages and
implement Enqueue() and Dequeue() operations on the queue. You are
asked to use mutex and condition variable primitives to synchronize among
threads when accessing each queue.
- This program will use one data representation that achieves the best
elapse time.
- You should try to write a portable version of ppmm that runs on
the quad-core
Xeon server, the Niagara server, and the SGI Origin 3000.
- You should submit a
printout of your program source, and a report showing the outputs of "pmm
2000" on all three parallel machines. You should
explain why the two elapse times of requirement ! are different and explain
why you choose the method for requirement 3.