Colloquium | Page 12 | Computer Science Department at Princeton University

Probabilistic Models of Text and Images

Date and Time

Wednesday, April 6, 2005 - 4:00pm to 5:30pm

Location

Computer Science Small Auditorium (Room 105)

Type

Colloquium

Speaker

David Blei, from CMU

Host

Andrea LaPaugh

Managing large and growing collections of information is a central goal of modern computer science. Data repositories of texts, images, music, and genetic information have become widely accessible, necessitating good methods of retrieval, organization, and exploration. In this talk, I will describe probabilistic models of information collections, for which the above problems can be cast as statistical queries.

First, I will describe the use of graphical models as a flexible framework for the representation of modeling assumptions. Fast posterior inference algorithms based on variational methods allow us to specify complex Bayesian models and apply them to large datasets.

With this framework in hand, I will develop latent Dirichlet allocation (LDA), a graphical model particularly suited to analyzing text collections. LDA posits an index of hidden topics which describe the underlying documents. The topics are learned from a collection, and new documents can be situated into that collection via posterior inference. Extensions of LDA can index a set of images, or multimedia collections of related text and images. I will illustrate the use of such models with several datasets.

Finally, I will describe nonparametric Bayesian methods for relaxing the restriction to a fixed number of topics. These methods allow for models based on the natural assumption that the number of topics grows with the collection. I will extend this idea to trees, and to models for discovering both the structure and content of a topic hierarchy.

Joint work with Michael Jordan, Andrew Ng, Thomas Griffiths, and Josh Tenenbaum

Putting Program analysis to Work at Microsoft

Date and Time

Friday, April 1, 2005 - 4:00pm to 5:30pm

Location

Computer Science Small Auditorium (Room 105)

Type

Colloquium

Speaker

Zhe Yang, from Microsoft

Host

Daniel Wang

Over the last few years, product groups at Microsoft have increasingly relied on automated defect detection tools to find software failures before releasing products to customers. In this talk, I will describe how a small set of program analysis / compiler geeks have tried to change the way we build software. In particular, I'll talk about a new approach to defect detection that involves the use of simple but effective specifications, automatic inference of specifications, powerful defect detection tools, and plenty of good old fashioned software engineering process.

Bio

Zhe Yang works in the Center for Software Excellence at Microsoft Corporation, where he leads the Engine development in the Program Analysis research group. This group is responsible for building innovative tools that help programmers, both within Microsoft and elsewhere, improve the quality of the software they create. Zhe Yang received a bachelor's degree from Shanghai Jiao Tong University and a PhD from New York University.

The Eyes Have it

Date and Time

Wednesday, March 30, 2005 - 4:00pm to 5:30pm

Location

Computer Science Small Auditorium (Room 105)

Type

Colloquium

Speaker

Ben Shneiderman, from University of Maryland

Host

Olga Troyanskaya

User Interfaces for Information Visualization"* *Human perceptual skills are remarkable, but largely underutilized by current graphical user interfaces. The next generation of animated GUIs and visual data mining tools can provide users with remarkable capabilities if designers follow the Visual Information-Seeking Mantra: Overview first, zoom and filter, then details-on-demand Then dynamic queries allow user control of widgets, such as sliders and buttons that update the result set within 100msec. Seven types of information visualizations (1-, 2-, 3-, multi-dimensional data, temporal, tree and network data) will be shown in examples for U.S. Census, time series searching, and gene expression data. Commercial success stories based on our early work include multi-dimensional data in dynamic scattergrams (www.spotfire.com), hierarchical stock market data in treemaps (www.smartmoney.com/marketmap), and production monitoring/product catalogs in treemaps (www.hivegroup.com).

This talk will emphasize scientific and statistical data analysis such as gene expression studies, multi-variate temporal data sets, and hierarchical clustering. For more info and to download programs, visit: www.cs.umd.edu/hcil/treemap www.cs.umd.edu/hcil/timesearcher www.cs.umd.edu/hcil/hce

OpenDHT: A Public DHT Service

Date and Time

Monday, March 28, 2005 - 4:00pm to 5:30pm

Location

Computer Science Small Auditorium (Room 105)

Type

Colloquium

Speaker

Sean Rhea, from UC Berkeley

Host

Vivek Pai

Website

http://www.srhea.net/

Large-scale distributed systems are hard to deploy, and distributed hash tables (DHTs) are no exception. To lower the barriers facing DHT-based applications, we have created a public DHT service called OpenDHT. Designing a DHT that can be widely shared, both among mutually-untrusting clients and among a variety of applications, poses two distinct challenges. First, there must be adequate control over storage allocation so that greedy or malicious clients do not use more than their fair share. Second, the interface to the DHT should make it easy to write simple clients, yet be sufficiently general to meet a broad spectrum of application requirements. In this talk I will describe our solutions to these design challenges. I'll also report on our early deployment experiences with OpenDHT and describe the variety of applications already using the system.

Event (no name)

Date and Time

Thursday, March 24, 2005 - 4:00pm to 5:30pm

Location

Computer Science Small Auditorium (Room 105)

Type

Colloquium

Speaker

TBD

Host

Adam Finkelstein

Computation and Learning in Economic Networks

Date and Time

Wednesday, March 23, 2005 - 4:00pm to 5:30pm

Location

Computer Science Small Auditorium (Room 105)

Type

Colloquium

Speaker

Sham Kakade, from University of Pennsylvania

Host

Robert Schapire

Website

http://www.cis.upenn.edu/~skakade/

Network models for game theory and economics provide a powerful framework for studying strategic interactions in large population settings. The semantics of these networks are that nodes represent parties (e.g. players or consumers) and edges represent strategic or economic interactions. These models allow the incorporation of rich structure into the network, allowing the promise of increased applicability of strategic reasoning to large, complex systems.

In this talk, I will present algorithms and learning models for game theoretic and economic equilibria --- focusing on how the network structure influences the learning process and the outcomes. This work highlights many natural connections to AI and modern probabilistic modeling. I will also provide results at the intersection of this line of study and topics in social network theory.

This is joint work with Dean Foster, Michael Kearns, John Langford, Luis Ortiz, Robin Pemantle, and Siddharth Suri.

Decentralized Security Mechanisms for Internet Routing

Date and Time

Monday, March 21, 2005 - 4:00pm to 5:30pm

Location

Computer Science Small Auditorium (Room 105)

Type

Colloquium

Speaker

Lakshmi Subramanian, from UC Berkeley

Host

Jennifer Rexford

Website

http://www.cs.berkeley.edu/~lakme/

Today's Internet is at risk. A single misbehaving router--whether through misconfiguration or malicious intent--can hijack routes, bringing down over a third of the Internet. This critical vulnerability stems from the pervasive assumption inherent in existing protocols that any information propagated by routers is correct. Emerging security proposals for Internet routing require a public key infrastructure and a trusted central authority, and thus are unlikely to see wide deployment.

In this talk, I will first describe Listen and Whisper, two decentralized and deployable security mechanisms that improve the security of the Border Gateway Protocol (BGP), the current inter-domain routing protocol. Their combination eliminates the threat of route hijacking due to misconfigurations and restricts the damage that deliberate attackers can cause. Using a real-world deployment of these mechanisms within the Berkeley campus network, we have been able to detect several routing anomalies.

Then, I will show how these techniques can be extended to provide a foundational suite of security primitives to achieve secure routing in an arbitrary network against a bounded number of adversaries. These techniques address two open theoretical problems: (a) Under what constraints can one achieve decentralized key distribution given a bounded number of adversaries? (b) When can one achieve Byzantine agreement if the underlying graph is not known to the nodes?

Bio

Lakshminarayanan Subramanian is currently a PhD candidate at UC Berkeley working with Professors Randy H. Katz, Ion Stoica and Scott Shenker. He received an M.S. in Computer Science from UC Berkeley in 2002 and a B.Tech in Computer Science from the Indian Institute of Technology, Madras in 1999. His research interests are in the areas of networking and distributed systems with specific emphasis on routing, network security, Internet architecture, overlay networks and quality of service.

Memorization and Association on a Realistic Neural Model

Date and Time

Friday, March 11, 2005 - 4:00pm to 5:30pm

Location

Computer Science Small Auditorium (Room 105)

Type

Colloquium

Speaker

Leslie G. Valiant, from Harvard University

Host

Sanjeev Arora

Website

http://www.deas.harvard.edu/faculty/profile/Leslie_Valiant

A central open question of computational neuroscience is to identify the data structures and algorithms that are used in mammalian cortex to support successive acts of the basic cognitive tasks of memorization and association. This talk addresses the simultaneous challenges of realizing these two distinct tasks with the same data structure, and doing so while respecting the following four basic quantitative parameters of cortex, the neuron number, the synapse number, the synapse strengths, and the switching times. Previous work apparently had not succeeded in reconciling all these opposing constraints, the low values of synapse strengths that are typically observed experimentally having contributed a particular obstacle. We describe a computational scheme that supports both memory formation and association, and is feasible on networks of model neurons that respect the widely observed values of the above-mentioned four quantitative parameters. Our scheme allows for both disjoint and shared representations. The algorithms are simple, and in one version both memorization and association require just one step of neighborly influence. The issues of interference among the different circuits that are established, of robustness to noise, and of the stability of the hierarchical memorization process are addressed. A calculus, therefore, is implied for analyzing the capabilities of particular neural systems and subsystems, in terms of their basic numerical parameters.

Interaction and Audiovisual Abstraction

Date and Time

Wednesday, March 9, 2005 - 4:00pm to 5:30pm

Location

Computer Science Small Auditorium (Room 105)

Type

Colloquium

Speaker

Golan Levin, from CMU

Host

Adam Finkelstein

Website

http://www.flong.com/

Golan Levin is an artist, composer, performer and engineer interested in developing artifacts and events which explore supple new modes of reactive expression. His work focuses on the design of systems for the creation, manipulation and performance of simultaneous image and sound, as part of a more general inquiry into the formal language of interactivity, and of nonverbal communications protocols in cybernetic systems. Through performances, digital artifacts, and virtual environments, often created with a variety of collaborators, Levin applies creative twists to digital technologies that highlight our relationship with machines, make visible our ways of interacting with each other, and explore the intersection of abstract communication and interactivity. Identified by Technology Review as one of the world's "Top 100 Innovators Under 35" [2004], and dubbed by El Pais as "one of the most brilliant figures in contemporary audiovisual art" [2002], Levin has exhibited widely in Europe, America and Asia.

Computational Foundations for Statistical Learning: Enabling Massive Science

Date and Time

Wednesday, March 2, 2005 - 4:00pm to 5:30pm

Location

Computer Science Small Auditorium (Room 105)

Type

Colloquium

Speaker

Alexander Gray, from CMU

Host

Jaswinder Singh

The data sciences (statistics, and recently machine learning) have always been part of the underpinning of all of the natural sciences. `Massive datasets' represent potentially unprecedented capabilities in a growing number of fields, but most of this potential remains unlocked, due to the computational intractability of the most powerful statistical learning methods. The computational problems underlying many of these methods are related to some of the hardest problems of applied mathematics, but have unique properties which make classical solution classes inappropriate. I will describe the beginnings of a unified framework for a large class of problems, which I call generalized N-body problems. The resulting algorithms, which I call multi-tree methods, appear to be the fastest practical algorithms to date for several foundational problems. I will describe four examples -- all-nearest-neighbors, kernel density estimation, distribution-free Bayes classification, and spatial correlation functions, and touch on two more recent projects, kernel matrix-vector multiplication and high-dimensional integration. I'll conclude by showing examples where these algorithms are enabling previously intractable data analyses at the heart of major modern scientific questions in cosmology and fundamental physics.