Our research focuses on computational cancer biology. Cancer is a collection of diseases largely caused by mutations in the DNA sequence that allow cancer cells to grow rapidly and to avoid destruction by the immune system or by treatments. Advances in biotechnology – including DNA sequencing, gene editing, and others – promise to revolutionize our understanding of cancer biology and treatment, but the datasets generated by these technologies are so massive that they require sophisticated computational analysis. We develop novel computational methods at the intersection of machine learning, algorithms, and statistics to make predictions and to generate hypotheses from these new cancer datasets.
Analyzing mutations in cancer
We research the causes and effects of mutations in the DNA sequence of tumors, focusing on three fundamental questions.
- What are the processes that cause mutations in cancer? A combination of intrinsic and extrinsic processes causes cancer by mutating the DNA sequence of healthy cells. These processes – for example, exposure to ultraviolet light – leave footprints in cancer genomes. We seek to develop models for these processes, integrating genomic and other data, and to analyze their footprints across tumor types and contexts.
- Which mutations are responsible for cancer? Most tumors harbor dozens to hundreds of mutations, but only a handful of these mutations are responsible for causing cancer. These driver mutations occur in cancer genes, but most cancer genes are rarely mutated, making it difficult to distinguish them from other genes. We seek to identify driver mutations by searching for combinations of mutations in pathways across patients.
- What are the effects of multiple mutations? While mutations cause cancer, they also leave cancer cells vulnerable to treatments that target individual mutations or that exploit genetic interactions between pairs of genes. For example, in the latter case, a mutation to gene A may make the tumor vulnerable to a treatment targeting gene B, because the cancer cell can survive the loss of either A or B, but not the loss of both. We seek to develop methods to discover such genetic interactions by leveraging high-throughput experiments in yeast, fruit flies, and human cancer cell lines.
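The gene A and gene B example above can be made concrete with a small calculation. The sketch below uses one common convention from high-throughput screens, the multiplicative model, in which the expected fitness of a double mutant is the product of the two single-mutant fitnesses, and a strongly negative deviation suggests that the cell cannot tolerate losing both genes. It is an illustration only, not a description of our methods: the function names, the threshold, and the fitness values are all hypothetical.

```python
def interaction_score(fitness_a, fitness_b, fitness_ab):
    """Deviation of the observed double-mutant fitness from the
    multiplicative expectation (fitness is relative to wild type = 1.0)."""
    expected = fitness_a * fitness_b
    return fitness_ab - expected


def classify_interaction(score, threshold=0.2):
    """Label a score; the threshold is an arbitrary illustrative cutoff."""
    if score <= -threshold:
        return "negative"  # double mutant is sicker than expected
    if score >= threshold:
        return "positive"  # double mutant is healthier than expected
    return "neutral"


# Hypothetical measurements: (fitness of A mutant, B mutant, AB double mutant)
measurements = {
    "A/B": (0.9, 0.8, 0.05),  # each loss is tolerated, the pair is not
    "A/C": (0.9, 0.7, 0.63),  # double mutant matches the expectation
}

for pair, (f_a, f_b, f_ab) in measurements.items():
    score = interaction_score(f_a, f_b, f_ab)
    print(pair, round(score, 2), classify_interaction(score))
```

In this toy data the A/B pair is flagged as a negative interaction, which is the pattern a treatment targeting gene B would exploit in a tumor that has already lost gene A.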
Modeling response to immunotherapy
We are in the midst of one of the largest changes in cancer treatment in decades, as new drugs unleash a patient’s immune system to attack cancer cells. Different forms of immunotherapy have yielded remarkable recoveries in previously untreatable, late-stage cancers, and have become the standard of care. However, in some ways we are still at the beginning of this paradigm shift in treatment, as researchers seek to understand why some patients do not respond to immunotherapy or suffer from severe toxicities and side effects.
The solutions to these challenges may lie at the frontier of computational cancer biology. New computational methods for problems such as predicting tumor immunogenicity, reconstructing tumor evolution, and integrating single-cell ’omics data will be required to effectively model response to immunotherapy. We seek to develop these methods and to apply them in collaboration with biologists and clinicians.
Fairness, accountability, and transparency in machine learning
Recent advances in data-driven machine learning have led to its widespread adoption for building tools in areas such as natural language processing and computer vision. However, although some of these tools are used by millions of people, there is currently a shortage of computational methods for ensuring that they are fair or accountable. Consequently, machines often learn the bias encoded in their input data, from racial bias in image recognition to gender bias in word choice. We seek to develop methods to discover this bias in existing tools or models, and to counter it either before or after the model is trained.
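As a small illustration of what discovering bias in an existing model can look like, the sketch below computes the demographic parity gap, one of several standard fairness measures: the difference between the highest and lowest rates at which a model gives a positive prediction across groups. It is a minimal example under simplifying assumptions (binary predictions, one group label per example), not a description of our methods; the function names and the data are hypothetical.

```python
from collections import defaultdict


def positive_rates(predictions, groups):
    """Fraction of positive (== 1) predictions within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for prediction, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(prediction == 1)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = positive_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical model outputs for eight examples from two groups
predictions = [1, 1, 0, 1, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(positive_rates(predictions, groups))
print(demographic_parity_gap(predictions, groups))
```

A gap near zero means the model issues positive predictions at similar rates across groups. A large gap is a signal worth investigating rather than proof of unfairness, since the appropriate measure depends on the application.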