Deciphering Disease Genomes in a Network Context

Report ID: TR-008-19
Author: Hristov, Borislav
Date: 2019-08-29
Pages: 91
Abstract:

Despite the incredible influx of sequencing data, pinpointing the gene variants responsible for the development of heterogeneous diseases remains particularly hard, because the same phenotypic outcome (disease) can result from a myriad of different combinations of alterations across the genome. A promising avenue is to consider genome alterations in the context of pathways rather than individual genes, because different alterations within any of several genes in the same pathway can have similar consequences for disease development. Large-scale biological networks provide a helpful proxy for pathway knowledge, as genes that participate in the same pathway tend to interact with each other and form modules within the larger network. In this dissertation, I introduce two novel methods that further our ability to computationally highlight potential disease-causing genes by examining disease genomes in the context of biological networks. First, in Chapter 2, I present a novel network-based approach that tackles cancer mutational heterogeneity by using per-individual mutational profiles. I provide an intuitive formulation that balances the size of a connected subgraph within the larger network against the number of patients it covers. I describe a machine-learning-like scheme for selecting the value of the single required parameter, along with both an integer linear programming framework and a fast heuristic for optimizing the objective function. I demonstrate the outstanding performance of my method in identifying cancer-relevant genes, especially those mutated at very low rates. Next, in Chapter 3, I propose a general computational framework that uses prior knowledge of disease-associated genes to guide a network-based search for novel ones based on newly acquired information. I use a graph diffusion kernel to spread the signal from the set of already known disease genes and then use it to bias a random walk originating from the newly implicated genes so that the walk moves closer to the known ones. I demonstrate that integrating the two types of information is better than using either one alone. I show, in the context of cancer, that my method readily outperforms other network-based methods. Finally, I apply my approach to several complex diseases, thereby demonstrating its versatility in a broad range of settings.
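The Chapter 2 formulation is described only informally above. The following is a minimal sketch of what "balancing subgraph size against patient coverage" can look like, assuming a hypothetical linear objective alpha * (patients covered) - (1 - alpha) * (genes in the subgraph) with a single parameter alpha. The growth rule is a simple greedy heuristic chosen for illustration; it is not the dissertation's heuristic or its integer linear program, and the names `objective` and `greedy_connected_cover` are hypothetical.

```python
from typing import Dict, Set, Tuple

Graph = Dict[str, Set[str]]      # undirected adjacency: gene -> neighbouring genes
Mutations = Dict[str, Set[str]]  # gene -> patients carrying a mutation in that gene


def objective(genes: Set[str], mutations: Mutations, alpha: float) -> float:
    """Hypothetical trade-off: reward covered patients, penalise subgraph size."""
    covered: Set[str] = set()
    for g in genes:
        covered |= mutations.get(g, set())
    return alpha * len(covered) - (1.0 - alpha) * len(genes)


def greedy_connected_cover(graph: Graph, mutations: Mutations,
                           alpha: float) -> Tuple[Set[str], float]:
    """Grow a connected subgraph from every seed gene, repeatedly adding the
    neighbouring gene with the largest objective value while it improves the
    score, and return the best subgraph found over all seeds."""
    best_set: Set[str] = set()
    best_score = float("-inf")
    for seed in graph:
        current = {seed}
        score = objective(current, mutations, alpha)
        while True:
            # Only neighbours of the current subgraph keep it connected.
            frontier = set().union(*(graph[g] for g in current)) - current
            if not frontier:
                break
            candidate = max(
                frontier,
                key=lambda g: (objective(current | {g}, mutations, alpha), g),
            )
            new_score = objective(current | {candidate}, mutations, alpha)
            if new_score <= score:
                break
            current.add(candidate)
            score = new_score
        if score > best_score:
            best_set, best_score = current, score
    return best_set, best_score
```

On a path A-B-C-D where A is mutated in patients p1 and p2, C in p3, D in p1, and B in nobody, the sketch with alpha = 0.8 selects {A, B, C}: the unmutated gene B is kept because it connects A and C, while D is dropped because it covers no new patient.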
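The Chapter 3 pipeline (diffuse a signal from known disease genes, then use it to bias a random walk from newly implicated genes) can be sketched as follows. This assumes the regularized Laplacian kernel (I + beta * L)^{-1} as the diffusion kernel and a walk whose step from u to a neighbour v has probability proportional to f[v] + eps, with restarts to the new genes; the kernel choice, the bias rule, the parameters `beta`, `restart`, and `eps`, and the names `diffuse` and `biased_walk` are all assumptions, not the dissertation's definitions.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # undirected adjacency list: gene -> neighbours


def diffuse(graph: Graph, known: Set[str], beta: float = 1.0,
            iters: int = 200) -> Dict[str, float]:
    """Approximate f = (I + beta * L)^{-1} s by Jacobi iteration, where s is the
    indicator vector of known disease genes and L = D - A is the graph Laplacian.
    The system is strictly diagonally dominant, so the iteration converges."""
    s = {g: (1.0 if g in known else 0.0) for g in graph}
    f = dict(s)
    for _ in range(iters):
        # Row g of (I + beta*L) f = s: (1 + beta*deg) f_g - beta * sum of neighbour f = s_g
        f = {
            g: (s[g] + beta * sum(f[h] for h in graph[g])) / (1.0 + beta * len(graph[g]))
            for g in graph
        }
    return f


def biased_walk(graph: Graph, new: Set[str], f: Dict[str, float],
                restart: float = 0.3, eps: float = 1e-3,
                iters: int = 500) -> Dict[str, float]:
    """Power iteration for a random walk with restart to the new genes, where a
    step from u goes to neighbour v with probability proportional to f[v] + eps."""
    r = {g: (1.0 / len(new) if g in new else 0.0) for g in graph}
    p = dict(r)
    for _ in range(iters):
        nxt = {g: restart * r[g] for g in graph}
        for u, mass in p.items():
            nbrs = graph[u]
            if not nbrs:
                # Hypothetical choice: isolated genes send their mass back to the restart set.
                for g in graph:
                    nxt[g] += (1.0 - restart) * mass * r[g]
                continue
            total = sum(f[v] + eps for v in nbrs)
            for v in nbrs:
                nxt[v] += (1.0 - restart) * mass * (f[v] + eps) / total
        p = nxt
    return p
```

If a new gene N has one neighbour X that leads to a known gene K and another neighbour Y that leads to an unrelated gene Z, the diffused scores satisfy f[K] > f[X] > f[N] > f[Y] > f[Z], so the walk from N places more probability on X and K than on Y and Z. Under these assumptions, high-probability genes that are neither new nor known would be the candidates to prioritize.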