Colloquium

Democratizing content distribution

Date and Time
Wednesday, March 28, 2007 - 4:15pm to 5:45pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Michael J. Freedman, from Stanford / NYU
Host
Vivek Pai
In order to reach their large audiences, today's Internet publishers primarily use content distribution networks (CDNs) to deliver content. Yet the architectures of the prevalent commercial systems are tightly bound to centralized control, static deployments, and trusted infrastructure, thus inherently limiting their scope to ensure cost recovery. This talk describes a number of techniques (and the resulting systems) that realize highly-scalable cooperative content distribution. I show how to build a fully self-organizing CDN that efficiently delivers content using unreliable nodes, describe how to transparently dispatch clients to nearby servers, and touch on how to securely leverage even untrusted resources. These ideas have been implemented, deployed, and tested in production systems currently serving several terabytes of data to more than a million people every day. The view of the Internet gained from deploying these systems answered long-standing questions about network configuration, while providing important insights that helped evolve our systems' designs.

Evolutionary Escape on Fitness Landscapes

Date and Time
Tuesday, March 27, 2007 - 4:15pm to 5:45pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Niko Beerenwinkel, from Harvard
Host
Mona Singh
The evolution of HIV within individual patients is associated with disease progression and failure of antiretroviral drug therapy. Using graphical models we describe the development of HIV drug resistance mutations and show how these models improve predictions of the clinical outcome of combination therapy. We present combinatorial algorithms for computing the risk of escape of an evolving population on a given fitness landscape. The geometry of fitness landscapes and the underlying gene interactions are analyzed in an attempt to generalize the notion of pairwise epistasis to higher-order genetic systems. Finally, we discuss the new and exciting prospects for analyzing viral genetic variation that arises from recent pyro-sequencing technology.
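
For reference, the pairwise notion that the talk seeks to generalize can be stated in one line. For a two-locus, two-allele system with (log-)fitness values f_ab for the four genotypes, pairwise epistasis is the interaction term (sign and scale conventions vary):

\[ \epsilon \;=\; f_{11} + f_{00} - f_{10} - f_{01}, \]

with \epsilon = 0 corresponding to a purely additive, non-epistatic landscape; higher-order analogues involve similar linear contrasts over more loci.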

Object Recognition and Feedforward models of the Ventral Stream in Visual Cortex: What is Next?

Date and Time
Thursday, March 15, 2007 - 4:15pm to 5:45pm
Location
Computer Science Large Auditorium (Room 104)
Type
Colloquium
Speaker
Tomaso Poggio, from MIT
Host
Fei-Fei Li
T. Poggio (Center for Biological and Computational Learning, Computer Science and Artificial Intelligence Laboratory and McGovern Institute for Brain Research, M.I.T.)

Understanding the processing of information in our cortex is a significant part of understanding how the brain works and of understanding intelligence itself, arguably one of the greatest problems in science today. In particular, our visual abilities are computationally amazing and we are still far from imitating them with computers. Thus, visual cortex may well be a good proxy for the rest of the cortex and indeed for intelligence itself. But despite enormous progress in the physiology and anatomy of the visual cortex, our understanding of the underlying computations remains fragmentary.

I will briefly review hierarchical feedforward quantitative models of the ventral stream which, heavily constrained by physiology and biophysics, are surprisingly successful in explaining a range of physiological data as well as psychophysical results on rapid scene categorization. I will then focus on the limitations of such models for object recognition, suggesting specific questions about the computational role of attention and about recognition tasks beyond scene classification.
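
As a rough, hypothetical sketch of the kind of architecture meant here (alternating template matching and max pooling, in the spirit of the HMAX family of models, not a reimplementation of the specific model discussed in the talk), consider:

    import numpy as np

    def s_layer(patches, templates, sigma=1.0):
        # "Simple"-cell stage: Gaussian tuning of every patch to every stored template.
        d2 = ((patches[:, None, :] - templates[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))      # shape (num_patches, num_templates)

    def c_layer(s_responses, pool_size):
        # "Complex"-cell stage: max-pool each template's responses over local groups
        # of positions, trading positional precision for invariance.
        pooled = [s_responses[i:i + pool_size].max(axis=0)
                  for i in range(0, len(s_responses), pool_size)]
        return np.stack(pooled)

    rng = np.random.default_rng(0)
    patches = rng.normal(size=(32, 16))              # 32 toy "image patches" of dimension 16
    templates = rng.normal(size=(8, 16))             # 8 stored templates
    features = c_layer(s_layer(patches, templates), pool_size=4)
    print(features.shape)                            # -> (8, 8)

Stacking several such stages yields position- and scale-tolerant features that can be fed to a simple classifier, which is the basic structure these feedforward models share.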

The Limits of Quantum Computers

Date and Time
Thursday, March 8, 2007 - 4:15pm to 5:45pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Scott Aaronson, from University of Waterloo
Host
Boaz Barak
In the popular imagination, quantum computers would be almost magical devices, able to "solve impossible problems in an instant" by trying exponentially many solutions in parallel. In this talk, I'll describe four results in quantum computing theory that directly challenge this view.

First, I'll show that any quantum algorithm to decide whether a function f:[n]->[n] is one-to-one or two-to-one needs to query the function at least n^{1/5} times. This provides strong evidence that collision-resistant hash functions, and hence secure electronic commerce, would still be possible in a world with quantum computers.
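
For context, the known query bounds bracket this result: classical randomized algorithms need \(\Theta(n^{1/2})\) queries (the birthday bound), the quantum algorithm of Brassard, Høyer, and Tapp makes \(O(n^{1/3})\) queries, and the lower bound described here is

\[ \Omega\big(n^{1/5}\big). \]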

Second, I'll show that in the "black-box" or "oracle" model that we know how to analyze, quantum computers could not solve NP-complete problems in polynomial time, even with the help of nonuniform "quantum advice states."

Third, I'll show that quantum computers need exponential time to find local optima -- and surprisingly, that the ideas used to prove this result also yield new classical lower bounds for the same problem.

Finally, I'll show how to do "pretty-good quantum state tomography" using a number of measurements that increases only linearly, not exponentially, with the number of qubits. This illustrates how one can sometimes turn the limitations of quantum computers on their head, and use them to develop new techniques for experimentalists.
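
To make the contrast concrete (using the standard parameter count for a general n-qubit mixed state, not a figure taken from the abstract itself):

\[ \underbrace{4^{n} - 1}_{\text{real parameters of an } n\text{-qubit density matrix}} \qquad \text{vs.} \qquad \underbrace{O(n)}_{\text{measurements for ``pretty-good'' tomography}} \]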

No quantum computing background will be assumed.

Learning Predictive Models of Gene Regulation

Date and Time
Wednesday, March 7, 2007 - 4:15pm to 5:45pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Christina Leslie, from Columbia University
Host
Robert Schapire
Studying the behavior of gene regulatory networks by learning from high-throughput genomic data has become one of the central problems in computational systems biology. Most work in this area has focused on learning structure from data -- e.g. finding clusters or modules of potentially co-regulated genes, or building a graph of putative regulatory "edges" between genes -- and has been successful at generating qualitative hypotheses about regulatory networks.

Instead of adopting the structure learning viewpoint, our focus is to build predictive models of gene regulation that allow us both to make accurate quantitative predictions on new or held-out experiments (test data) and to capture mechanistic information about transcriptional regulation. Our algorithm, called MEDUSA, integrates promoter sequence, mRNA expression, and transcription factor occupancy data to learn gene regulatory programs that predict the differential expression of target genes. Instead of using clustering or correlation of expression profiles to infer regulatory relationships, the algorithm learns to predict up/down expression of target genes by identifying condition-specific regulators and discovering regulatory motifs that may mediate their regulation of targets. We use boosting, a technique from statistical learning, to help avoid overfitting as the algorithm searches through the high dimensional space of potential regulators and sequence motifs.

We will report computational results on the yeast environmental stress response, where MEDUSA achieves high prediction accuracy on held-out experiments and retrieves key stress-related transcriptional regulators, signal transducers, and transcription factor binding sites. We will also describe recent results on hypoxia in yeast, where we used MEDUSA to propose the first global model of the oxygen sensing and regulatory network, including new putative context-specific regulators. Through our experimental collaborator on this project, the Zhang Lab at Columbia University, we are in the process of validating our computational predictions with wet lab experiments, with encouraging preliminary results.
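
As a toy illustration of the boosting component (this is not the MEDUSA code; the data, the planted rule, and the feature encoding below are entirely hypothetical), one can boost depth-one decision stumps over binary indicators of the form "motif M occurs in the promoter and regulator R is active in this condition" to predict up/down expression:

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier  # default weak learner: depth-1 stumps

    rng = np.random.default_rng(0)

    # Toy binary feature matrix standing in for (motif presence x regulator state)
    # indicators; labels are up-regulated (+1) vs. down-regulated (-1) expression.
    X = rng.integers(0, 2, size=(500, 40))
    y = np.where(X[:, 0] & X[:, 3], 1, -1)   # a planted "motif + regulator" rule
    y[rng.random(500) < 0.1] *= -1           # plus some label noise

    model = AdaBoostClassifier(n_estimators=100).fit(X[:400], y[:400])
    print("held-out accuracy:", model.score(X[400:], y[400:]))

Each stump plays the role of a simple condition-specific regulatory rule; boosting adds such rules one at a time, reweighting the examples the current model still misclassifies.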

Looking at People

Date and Time
Wednesday, February 21, 2007 - 4:15pm to 5:45pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
David Forsyth, from UIUC
Host
Fei-Fei Li
There is a great need for programs that can describe what people are doing from video. This is difficult to do, because it is hard to identify and track people in video sequences, because we have no canonical vocabulary for describing what people are doing, and because the interpretation of what people are doing depends very strongly on what is nearby.

Tracking is hard because it is important to track relatively small structures that can move relatively fast, for example, lower arms. I will describe research into kinematic tracking (tracking that reports the kinematic configuration of the body) that has resulted in a fairly accurate, fully automatic tracker that can keep track of multiple people.

Once one has tracked the body, one must interpret the results. One way to do so is to have a motion synthesis system that takes the track and produces a motion that is (a) like a human motion and (b) close to the track. Our work has produced a high-quality motion synthesis system whose output looks very much like real human activity. I will describe work that couples that system with a tracker to produce a description of the activities, entirely automatically.

It is often very difficult to obtain a rich set of examples of activities, because activities appear to compose in complex ways. I will describe methods to extract a representation of activity that can be searched with a text-like query.

I will speculate on some of the many open problems. What should one report? How do nearby objects affect one's interpretation of activities? How can one interpret patterns of behaviour?

We describe methods to determine what people are doing in a video sequence.

Reach for A*: Shortest Paths on Road Networks

Date and Time
Wednesday, November 15, 2006 - 4:15pm to 5:45pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Renato Werneck
Host
Robert Tarjan
We study the problem of finding point-to-point shortest paths in a setting where preprocessing is allowed. We are interested in exact algorithms only, and our focus is on road networks.

Our algorithms use several different strategies to speed up the queries. First, we use A* search to direct the search towards the goal. Second, we use the notion of "reach" to identify and prune local intersections. Third, we add shortcuts to the graph to reduce the number of highway (i.e., non-local) edges. Our best query algorithm is thousands of times faster than Dijkstra's algorithm on the road networks of the United States and Western Europe, each with roughly 20 million vertices. This is joint work with Andrew Goldberg, Chris Harrelson, and Haim Kaplan.
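
As an illustrative, single-direction sketch of how reach values prune an A* query (the actual algorithms are bidirectional, use shortcuts, and include several further refinements; all names below are hypothetical):

    import heapq

    def reach_astar(graph, reach, s, t, h):
        """A* from s to t with reach-based pruning.
        graph: vertex -> list of (neighbor, nonnegative edge length)
        reach: vertex -> reach value computed in a preprocessing step
        h(v):  feasible lower bound on dist(v, t), e.g. a landmark (ALT) bound."""
        dist = {s: 0.0}
        pq = [(h(s), s)]
        while pq:
            f, u = heapq.heappop(pq)
            if u == t:
                return dist[t]
            if f > dist[u] + h(u):        # stale queue entry
                continue
            for v, w in graph.get(u, []):
                dv = dist[u] + w
                # Prune v if its reach is smaller than both its tentative distance
                # from s and the lower bound on its distance to t: such a vertex
                # cannot lie on the shortest s-t path.
                if reach.get(v, float("inf")) < min(dv, h(v)):
                    continue
                if dv < dist.get(v, float("inf")):
                    dist[v] = dv
                    heapq.heappush(pq, (dv + h(v), v))
        return float("inf")

    # Tiny toy instance; a zero heuristic degenerates to reach-pruned Dijkstra.
    g = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 1.0)], "c": []}
    print(reach_astar(g, {"a": 9, "b": 9, "c": 9}, "a", "c", lambda v: 0.0))  # -> 2.0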

Models and Algorithms for Genomic Sequences, Proteins, and Networks of Protein Interactions

Date and Time
Wednesday, November 8, 2006 - 4:00pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Serafim Batzoglou, from Stanford
Host
Olga Troyanskaya
This talk has two parts: the first part is on new ways to model and analyze biological sequences; the second part describes methods for constructing and comparing protein interaction networks, which are emerging as canonical data sets of the post-genomic era.

Algorithms for biological sequence analysis. One of the most fruitful developments in bioinformatics in the past decade was the wide adoption of Hidden Markov Models (HMMs) and related graphical models for an array of applications such as gene finding, sequence alignment, and non-coding RNA folding. Conditional Random Fields (CRFs) are a recent alternative to HMMs, and provide two main advantages: (1) they enable more elaborate modeling of biosequences by allowing us to conveniently describe and select rich feature sets. For example, when comparing two residues during protein alignment, using a CRF allows leveraging in a principled manner the chemical properties of the neighborhood of those residues. (2) CRFs allow training of parameters in a way that is more effective for making predictions on new input sequences. I will describe three practical CRF-based tools that improve upon state-of-the-art methods in terms of accuracy: CONTRAlign, a protein aligner; CONTRAST, a gene finder; and CONTRAfold, a method for predicting the secondary structure of non-coding RNAs. Our tools are available at http://contra.stanford.edu.
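
For reference, the standard linear-chain CRF models the conditional distribution of a label sequence y (for example, gene-structure or alignment states) given the observed sequence x directly, through weighted feature functions; this is the textbook form rather than the specific parameterization used in the tools above:

\[ p(y \mid x) \;=\; \frac{1}{Z(x)} \exp\Big( \sum_{t} \sum_{k} \lambda_k\, f_k(y_{t-1}, y_t, x, t) \Big), \qquad Z(x) \;=\; \sum_{y'} \exp\Big( \sum_{t} \sum_{k} \lambda_k\, f_k(y'_{t-1}, y'_t, x, t) \Big). \]

Because the features f_k may inspect arbitrary properties of x, such as the chemical context of neighboring residues, rich feature sets can be added without complicating inference, which is the first advantage cited above.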

Networks of protein interactions. Graphs that summarize pairwise interactions between all proteins of an organism have emerged as canonical data sets that can be constructed using multiple sources of functional genomic data. We construct protein interaction networks for all sequenced microbes by integrating information extracted from genomic sequences as well as microarrays and other predictors of pairwise interactions. We then align these networks in multiple species using Graemlin, a tool that we developed for that purpose, and search for modules (subgraphs) of proteins that exhibit homology as well as conservation of pairwise interactions among many organisms. Graemlin provides substantial speed and sensitivity gains compared to previous network alignment methods; it can be used to compare microbial networks at http://graemlin.stanford.edu.

Leaving the Span

Date and Time
Wednesday, October 25, 2006 - 4:00pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Manfred K. Warmuth, from University of California, Santa Cruz
Host
Robert Schapire
When linear models are too simple, the following "kernel trick" is commonly used: expand the instances into a high-dimensional feature space and use any algorithm whose linear weight vector in feature space is a linear combination of the expanded instances. Linear models in feature space are typically non-linear in the original space and seemingly more powerful. Also, dot products can still be computed efficiently via the use of a kernel function.

However, we discuss a simple sparse linear problem that is hard to learn with any algorithm that uses a linear combination of the embedded training instances as its weight vector, no matter what embedding is used. We show that these algorithms are inherently limited by the fact that after seeing k instances, only a weight space of dimension k can be spanned.
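
To spell out the dimension argument: any such algorithm maintains a weight vector of the form

\[ w \;=\; \sum_{i=1}^{k} \alpha_i\, \phi(x_i) \;\in\; \operatorname{span}\{\phi(x_1), \ldots, \phi(x_k)\}, \]

so after k training instances the weight vector is confined to a subspace of dimension at most k, whatever the embedding \phi.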

Surprisingly, the same problem can be efficiently learned using the exponentiated gradient (EG) algorithm: now the component-wise logarithms of the weights are essentially a linear combination of the training instances. This algorithm enforces "additional constraints" on the weights (all must be non-negative and sum to one), and in some cases these constraints alone force the rank of the weight space to grow as fast as 2^k.
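
For concreteness, the normalized EG update on a probability-vector weight w_t with learning rate \eta has the standard multiplicative form

\[ w_{t+1,i} \;=\; \frac{w_{t,i}\, \exp\!\big(-\eta\, (\nabla L_t(w_t))_i\big)}{\sum_{j=1}^{n} w_{t,j}\, \exp\!\big(-\eta\, (\nabla L_t(w_t))_j\big)}, \]

so the log-weights accumulate a sum of scaled gradients; for a linear model each gradient is a multiple of the current instance, which is why the logarithms of the weights, rather than the weights themselves, end up as essentially a linear combination of the training instances.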

(Joint work with S.V.N. Vishwanathan!)

New Market Models and Algorithms

Date and Time
Wednesday, October 4, 2006 - 4:15pm to 5:45pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Vijay Vazirani, from Georgia Tech
Host
Moses Charikar
The notion of a "market" has undergone a paradigm shift with the Internet -- totally new and highly successful markets have been defined and launched by companies such as Google, Yahoo!, Amazon, MSN and eBay. Another major change is the availability of massive computational power for running these markets in a centralized or distributed manner.

In view of these new realities, the study of market equilibria, an important, though essentially non-algorithmic, theory within Mathematical Economics, needs to be revived and rejuvenated with an inherently algorithmic approach. Such a theory should not only address traditional market models but also define new models for some of the new markets.

In this two-talk series, I will give a feel for the exciting work under way on this front. Interestingly enough, this work has also contributed handsomely to the theory of algorithms itself. In particular, the highly successful primal-dual schema from exact and approximation algorithms, which had so far been used for combinatorially solving special classes of linear programs, has been extended to solving nonlinear convex programs.
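
As one concrete example of the kind of nonlinear convex program involved (the classical Eisenberg-Gale program for a Fisher market with buyer budgets m_i, utilities u_{ij}, and allocations x_{ij}; given here for illustration, not as the formulation used in the talks):

\[ \max \; \sum_i m_i \log u_i \quad \text{s.t.} \quad u_i = \sum_j u_{ij}\, x_{ij} \;\; \forall i, \qquad \sum_i x_{ij} \le 1 \;\; \forall j, \qquad x_{ij} \ge 0. \]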

Both talks are self-contained; the first is meant for a general audience and will be the Princeton CS department colloquium on Wednesday, Oct 4. The second will be at the Princeton theory lunch on Friday, Oct 6: http://www.cs.princeton.edu/theory/index.php/Main/TheoryLunch

1). Resource Allocation Markets

This talk is based on the following three papers:
http://www-static.cc.gatech.edu/fac/Vijay.Vazirani/adwords.pdf
http://www-static.cc.gatech.edu/fac/Vijay.Vazirani/EG.pdf
http://www-static.cc.gatech.edu/fac/Vijay.Vazirani/EG2.pdf

2). Spending Constraint Utilities, with Applications to the Adwords Market

This talk is based on:
http://www-static.cc.gatech.edu/fac/Vijay.Vazirani/spending.pdf

BIO: Vijay Vazirani got his Bachelor's degree in Computer Science from MIT in 1979 and his PhD from the University of California at Berkeley in 1983. The central theme in his research career has been the design of efficient algorithms. In addition, he has also worked on complexity theory, cryptography, coding theory and game theory. In 2001 he published what is widely regarded as the definitive book on Approximation Algorithms; this book has been translated into Japanese, Polish and French. He is currently involved in editing a comprehensive volume on Algorithmic Game Theory. He is a Fellow of the ACM.
