Clustering involves placing entities into mutually exclusive categories. We wish to relax the requirement of mutual exclusivity, allowing objects to belong simultaneously to multiple classes, a formulation that we refer to as "feature allocation." The first step is a theoretical one. In the case of clustering, the class of probability distributions over exchangeable partitions of a dataset has been characterized (via exchangeable partition probability functions and the Kingman paintbox). These characterizations support an elegant nonparametric Bayesian framework for clustering in which the number of clusters is not assumed to be known a priori. We establish an analogous characterization for feature allocation: we define notions of "exchangeable feature probability functions" and "feature paintboxes" that lead to a Bayesian framework that does not require the number of features to be fixed a priori. The second step is a computational one. Rather than appealing to Markov chain Monte Carlo for Bayesian inference, we develop a method to transform Bayesian methods for feature allocation (and other latent structure problems) into optimization problems with objective functions analogous to K-means in the clustering setting. These optimization problems yield approximations to Bayesian inference that scale to large inference problems.
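To make the distinction between clustering and feature allocation concrete, the following is a minimal sketch rather than the method presented in the talk. It encodes both structures as binary membership matrices and scores them with a K-means-style objective. The specific objective used here (squared reconstruction error plus a penalty of `lam**2` per feature), the least-squares fit of the feature means, and the names `is_partition`, `penalized_objective`, and `lam` are all assumptions made for illustration.

```python
import numpy as np


def is_partition(Z):
    """Return True if every object (row) belongs to exactly one class (column)."""
    return bool(np.all(Z.sum(axis=1) == 1))


def penalized_objective(X, Z, lam):
    """Illustrative K-means-style objective (an assumption, not taken from the abstract):
    squared reconstruction error plus lam**2 for each feature in use.
    The feature means A are fit by least squares given the allocation Z."""
    A, *_ = np.linalg.lstsq(Z, X, rcond=None)
    residual = X - Z @ A
    return float(np.sum(residual ** 2) + lam ** 2 * Z.shape[1])


# Clustering: each of the four objects sits in exactly one of two classes.
Z_cluster = np.array([[1.0, 0.0],
                      [1.0, 0.0],
                      [0.0, 1.0],
                      [0.0, 1.0]])

# Feature allocation: the second object has both features, the fourth has none.
Z_feature = np.array([[1.0, 0.0],
                      [1.0, 1.0],
                      [0.0, 1.0],
                      [0.0, 0.0]])

# Toy data generated exactly from the feature allocation (hypothetical feature means).
X = Z_feature @ np.array([[1.0, 0.0],
                          [0.0, 2.0]])

obj_cluster = penalized_objective(X, Z_cluster, lam=1.0)
obj_feature = penalized_objective(X, Z_feature, lam=1.0)
```

On this toy data, the overlapping allocation reconstructs `X` exactly, so its objective is lower than that of the best fit under the mutually exclusive assignment. The penalty term plays the role that a fixed K plays in ordinary K-means: it discourages adding features without bound.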
Tamara Broderick is a PhD candidate in the Department of Statistics at the University of California, Berkeley. Her research in machine learning focuses on the design and study of Bayesian nonparametric models, with particular emphasis on feature allocation as a generalization of clustering that relaxes the mutual exclusivity and exhaustivity assumptions of clustering. While at Berkeley, she has been a National Science Foundation Graduate Student Fellow and a Berkeley Fellowship recipient. She graduated with an AB in Mathematics from Princeton University in 2007, with the Phi Beta Kappa Prize for highest average GPA in her graduating class and with Highest Honors in Mathematics. She spent the next two years on a Marshall Scholarship at the University of Cambridge, where she received a Master of Advanced Study in Mathematics for completion of Part III of the Mathematical Tripos (with Distinction) in 2008 and an MPhil by Research in Physics in 2009. She received a Master's in Computer Science from UC Berkeley in 2013.