Models and Algorithms for Genomic Sequences, Proteins, and Networks of Protein Interactions
Algorithms for biological sequence analysis. One of the most fruitful developments in bioinformatics in the past decade was the wide adoption of Hidden Markov Models (HMMs) and related graphical models to an array of applications such as gene finding, sequence alignment, and non-coding RNA folding. Conditional Random Fields (CRFs) are a recent alternative to HMMs, and provide two main advantages:
Networks of protein interactions. Graphs that summarize pairwise interactions between all proteins of an organism have emerged as canonical data sets that can be constructed using multiple sources of functional genomic data. We construct protein interaction networks for all sequenced microbes by integrating information extracted from genomic sequences as well as microarrays and other predictors of pairwise interactions. We then align these networks in multiple species using Graemlin, a tool that we developed for that purpose, and search for modules (subgraphs) of proteins that exhibit homology as well as conservation of pairwise interactions among many organisms. Graemlin provides substantial speed and sensitivity gains compared to previous network alignment methods; it can be used to compare microbial networks at http://graemlin.stanford.edu.