Less Talking, More Learning: Avoiding Coordination In Parallel Machine Learning Algorithms
Date and Time
Monday, March 7, 2016 - 12:30pm to 1:30pm
Location
Computer Science Large Auditorium (Room 104)
Type
CS Department Colloquium Series
Speaker
Host
Elad Hazan
The recent success of machine learning (ML) in both science and industry has generated an increasing demand to support ML algorithms at scale. In this talk, I will discuss strategies to gracefully scale machine learning on modern parallel computational platforms. A common approach to such scaling is coordination-free parallel algorithms, where individual processors run independently without communication, thus maximizing the time they compute. However, analyzing the performance of these algorithms can be challenging, as they often introduce race conditions and synchronization problems.
In this talk, I will introduce a general methodology for analyzing asynchronous parallel algorithms. The key idea is to model the effects of core asynchrony as noise in the algorithmic input. This allows us to understand the performance of several popular asynchronous machine learning approaches, and to determine when asynchrony effects might overwhelm them. To overcome these effects, I will propose a new framework for parallelizing ML algorithms, where all memory conflicts and race conditions can be completely avoided. I will discuss the implementation of these ideas in practice, and demonstrate that they outperform the state-of-the-art across a large number of ML tasks on gigabyte-scale data sets.
In this talk, I will introduce a general methodology for analyzing asynchronous parallel algorithms. The key idea is to model the effects of core asynchrony as noise in the algorithmic input. This allows us to understand the performance of several popular asynchronous machine learning approaches, and to determine when asynchrony effects might overwhelm them. To overcome these effects, I will propose a new framework for parallelizing ML algorithms, where all memory conflicts and race conditions can be completely avoided. I will discuss the implementation of these ideas in practice, and demonstrate that they outperform the state-of-the-art across a large number of ML tasks on gigabyte-scale data sets.
Dimitris Papailiopoulos is a postdoctoral researcher in the Department of Electrical Engineering and Computer Sciences at UC Berkeley and a member of the AMPLab. His research interests span machine learning, coding theory, and parallel and distributed algorithms, with a current focus on coordination-free parallel machine learning, large-scale data and graph analytics, and the use of codes to speed up distributed computation. Dimitris completed his Ph.D. in electrical and computer engineering at UT Austin in 2014. At Austin he worked under the supervision of Alex Dimakis. In 2015, he received the IEEE Signal Processing Society, Young Author Best Paper Award.