Possible Parallel Programming Projects

COS 598A, Spring 2007, Princeton University


Here is a list of candidate course projects (based on a list compiled by Chris Bienia for building a benchmark suite):

- KNNImputer: Uses k-nearest neighbors in gene space to impute missing expression values in microarray data. You can find a sequential version of the program here. You may be able to get help from Curtis Huttenhower at the Computer Science Department and the Genomics Institute.

- x264 (http://developers.videolan.org/x264.html): An H.264/MPEG-4 AVC video encoder. The program is already parallelized but scales poorly.

- Weka (http://www.cs.waikato.ac.nz/ml/weka/): A suite of implementations of commonly used data mining algorithms. Real-world datasets are available from the website. You may be able to find a suitable project within this suite.

- Sphinx-4 (http://cmusphinx.sourceforge.net/): A speech recognition framework written in Java that already contains a few multithreaded components. Sphinx-3 (another version of Sphinx, written in C) is part of SPEC CPU2006. You could consider parallelizing either one.

- CreditCruncher (http://www.generacio.com/ccruncher/): Computes the Value at Risk (VaR) of large credit portfolios using the Monte Carlo method. It currently supports MPI but not multithreading. You can use it as the starting point for a parallelization based on the shared-memory programming model.

- SpamAssassin (http://spamassassin.apache.org/): A widely used spam filter. The code base is very large, but parallelizing one or two of its modules might be a suitable project.

- GNU Go (http://www.gnu.org/software/gnugo/gnugo.html): An AI program that plays the game of Go. Part of SPEC CPU2006.

- Clam AntiVirus (http://www.clamav.net): An open-source virus scanner.
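To make the KNNImputer item more concrete, here is a minimal sequential sketch of k-nearest-neighbor imputation in Python. It is not the referenced program, and its choices are assumptions: the data is a list of rows (one per gene) with None marking a missing value, distance is the root-mean-square difference over the columns both genes have observed, and each gap is filled with the unweighted mean of the k nearest genes that have a value in that column. The function names `rms_distance` and `knn_impute` are hypothetical.

```python
import math


def rms_distance(a, b, skip):
    """Root-mean-square difference between rows a and b over the columns
    both have observed, ignoring column `skip`; math.inf if none overlap."""
    diffs = [(x - y) ** 2
             for j, (x, y) in enumerate(zip(a, b))
             if j != skip and x is not None and y is not None]
    if not diffs:
        return math.inf
    return math.sqrt(sum(diffs) / len(diffs))


def knn_impute(matrix, k):
    """Return a copy of `matrix` (rows = genes, columns = arrays) in which
    each None is replaced by the mean of that column over the k nearest
    genes that observed it. Entries with no usable neighbor stay None."""
    result = [row[:] for row in matrix]  # the input is never modified
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Distances are computed against the original matrix only,
            # so values imputed earlier never influence later ones.
            candidates = sorted(
                (rms_distance(row, other, j), other[j])
                for g, other in enumerate(matrix)
                if g != i and other[j] is not None)
            nearest = [v for d, v in candidates[:k] if d != math.inf]
            if nearest:
                result[i][j] = sum(nearest) / len(nearest)
    return result
```

Each missing entry is computed from the original `matrix` and written only into its own row of `result`, so the work for different genes is independent and the outer loop could be divided among threads without locking. That independence is what a parallel version would exploit; the all-pairs distance computation is where most of the time goes.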
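For the CreditCruncher item, the shape of a shared-memory Monte Carlo VaR computation can be sketched as follows. This is a hypothetical toy model, not CreditCruncher's: it assumes each obligor defaults independently with a fixed probability and loses its full exposure on default. The trials are split across worker threads, each with its own seeded random number generator, and VaR is read off as an empirical quantile of the merged loss samples. All function names and parameters here are illustrative.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_losses(exposures, default_probs, trials, seed):
    """Simulate `trials` scenarios and return the portfolio loss of each.

    Toy model (an assumption): obligor i defaults independently with
    probability default_probs[i] and then loses all of exposures[i].
    """
    rng = random.Random(seed)  # private RNG: no state shared between threads
    losses = []
    for _ in range(trials):
        loss = sum(e for e, p in zip(exposures, default_probs)
                   if rng.random() < p)
        losses.append(loss)
    return losses


def monte_carlo_var(exposures, default_probs, trials, confidence,
                    workers=4, seed=0):
    """Estimate VaR at `confidence` (e.g. 0.99) from `trials` > 0 scenarios
    split as evenly as possible across `workers` threads."""
    counts = [trials // workers + (1 if w < trials % workers else 0)
              for w in range(workers)]
    seeds = [seed + w for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(simulate_losses, [exposures] * workers,
                          [default_probs] * workers, counts, seeds)
        # Each chunk depends only on its seed and the merged sample is
        # sorted, so the result does not depend on thread scheduling.
        losses = sorted(loss for chunk in chunks for loss in chunk)
    # Empirical quantile: one simple convention among several.
    index = min(len(losses) - 1, int(confidence * len(losses)))
    return losses[index]
```

In CPython the global interpreter lock keeps these threads from speeding up the pure-Python loop, so treat this as a picture of the decomposition (independent trials, private RNG state per worker, a single merge at the end) rather than of the performance. The same pattern carries over to pthreads or OpenMP in C.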