Machine Learning for Investigating Post-Transcriptional Regulation of Gene Expression

PhD Candidate Gianluca Corrado
May 15, 2017

Time: May 15, 2017, 10:00 am
Location: Room Ofek, Polo scientifico e tecnologico “Fabio Ferrari”, Building Povo 1 - Povo (Trento)

PhD Candidate

Dr. Gianluca Corrado

Abstract of Dissertation

RNA-binding proteins (RBPs) and non-coding RNAs (ncRNAs) are key actors in post-transcriptional gene regulation. By binding messenger RNA (mRNA), they modulate many regulatory processes. In recent years, the increasing interest in this level of regulation has favored the development of many NGS-based experimental techniques to detect RNA-protein interactions, and the consequent release of a considerable amount of interaction data on a growing number of eukaryotic RBPs.

Despite the continuous advances in the experimental procedures, these techniques are still far from fully uncovering, on their own, the global RNA-protein interaction system. For instance, the available interaction data still cover only a small fraction (less than 10%) of the known human RBPs. Moreover, experimentally determined interactions are often noisy and cell-line dependent. Importantly, obtaining genome-wide experimental evidence of combinatorial interactions of RBPs remains an open challenge.

Machine learning approaches can learn from data and generalize the information they contain, which may provide useful insights for investigating post-transcriptional regulation. In this work, three machine learning contributions are proposed. They address the three above-mentioned shortcomings of the experimental techniques, to help researchers unveil some as yet uncharacterized aspects of post-transcriptional gene regulation.

The first contribution is RNAcommender, a tool capable of suggesting RNA targets for unexplored RBPs at a genome-wide level. RNAcommender is a recommender system that propagates the available interaction data, considering biologically relevant aspects of RNA-protein interactions such as protein domains and predicted RNA secondary structure.

The second contribution is ProtScan, a tool that models RNA-protein interactions at single-nucleotide resolution. Learning models from experimentally determined interactions makes it possible to denoise the data and to predict RBP binding preferences in conditions that differ from those of the experiment.

The third and final contribution is PTRcombiner, a tool that unveils the combinatorial aspects of post-transcriptional gene regulation. It extracts clusters of mRNA co-regulators from the interaction annotations and automatically provides a biological analysis that may supply a functional characterization of the set of mRNAs targeted by a cluster of co-regulators, as well as of the binding dynamics of different RBPs belonging to the same cluster.