Computational Functional Genomics for Directing Biological Discoveries
Abstract:
Biological systems have been studied extensively for over a century; however, we still have only a partial functional and mechanistic understanding of the interplay between genes and pathways. Recently, the number of experimental datasets being generated has increased exponentially. However, the complexity of the data types and the ambiguity of each dataset's relevance to particular biological processes and pathways have limited the integrated use of this vast knowledge base for directing biological discoveries in humans and model organisms. In this thesis, I develop several approaches that draw on this public data to address two challenges: inferring gene function and diverse biomolecular interaction networks, and improving the transfer of functional knowledge between organisms to facilitate the investigation of understudied biological processes.
Specifically, in the first part of the thesis, I show that computational functional genomics can be used to improve the transfer of gene annotations between organisms. Furthermore, I demonstrate that functional knowledge transfer, when coupled with machine learning algorithms, can improve the coverage and accuracy of gene function prediction in a diverse set of organisms. In the second part of the thesis, I present a general method for simultaneously predicting many interaction types genome-wide and report the results of applying it to S. cerevisiae. By incrementally overlaying different interaction types, as our results suggest, investigators can formulate specific, testable hypotheses about new pathways, new pathway components, or new interconnections between existing pathways. Finally, I extend this interaction inference work from S. cerevisiae to mammalian organisms by methodologically addressing the largest source of biological variation in the metazoan data compendium: tissue and cell-lineage heterogeneity.