Protein function prediction from big domain similarity graphs
Abstract
Thanks to recent developments in genomic sequencing technologies, the number of protein sequences in public databases is growing rapidly.
To enrich and exploit this valuable data, it is essential to annotate these sequences with functional properties such as Enzyme Commission (EC) numbers. The January 2023 release of the UniProt Knowledgebase (UniProtKB) contains around 250 million protein sequences. However, only about half a million of these (UniProtKB/Swiss-Prot) have been reviewed and functionally annotated by expert curators using data extracted from the literature and computational analyses. To narrow the gap between annotated and unannotated protein sequences, accurate automatic protein function annotation techniques are needed.
In this talk, I will present GrAPFI (Graph-based Automatic Protein Function Inference), a method for automatically annotating proteins with EC number functional descriptors using a large protein domain similarity graph. GrAPFI performs automatic inference on a network in which proteins are linked according to their domain composition. The evaluation of GrAPFI shows that it performs better than other state-of-the-art methods.
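To make the idea of inference over a domain similarity graph concrete, here is a minimal sketch in Python. It is not GrAPFI's actual implementation: it assumes that similarity between two proteins is the Jaccard index of their domain sets and that a query protein receives EC numbers by similarity-weighted voting among its annotated neighbours. The function names, the `min_sim` threshold, the domain identifiers (`D1`, `D2`, ...) and the toy annotations are all hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two collections of domain identifiers."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_ec(query_domains, annotated, min_sim=0.3):
    """Rank EC numbers for a query protein by weighted neighbour voting.

    annotated maps a protein id to (domain set, EC number).
    min_sim is an assumed edge threshold: pairs below it are treated
    as not connected in the similarity graph.
    """
    scores = defaultdict(float)
    for domains, ec in annotated.values():
        sim = jaccard(query_domains, domains)
        if sim >= min_sim:
            # Each neighbour votes for its EC number with weight = similarity.
            scores[ec] += sim
    total = sum(scores.values())
    # Normalise so the scores of all candidate EC numbers sum to 1.
    ranked = ((ec, score / total) for ec, score in scores.items())
    return sorted(ranked, key=lambda pair: -pair[1])


# Toy annotated proteins (hypothetical domain ids, illustrative EC labels).
annotated = {
    "P1": ({"D1", "D2"}, "1.1.1.1"),
    "P2": ({"D1", "D2", "D3"}, "1.1.1.1"),
    "P3": ({"D4"}, "2.7.11.1"),
    "P4": ({"D2", "D5"}, "2.7.11.1"),
}

predictions = predict_ec({"D1", "D2"}, annotated)
for ec, confidence in predictions:
    print(f"{ec}\t{confidence:.3f}")
```

In this toy setting the query shares most of its domains with P1 and P2, so their EC number is ranked first; a protein with no sufficiently similar neighbour gets an empty prediction list rather than a forced label.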
About the speaker
Sabeur Aridhi is an Associate Professor (Maître de conférences) of Computer Science at TELECOM Nancy at the University of Lorraine.
He is a member of the Capsid research team (Inria – CNRS) at the Lorraine Laboratory of Research in Computer Science and its Applications (LORIA). Previously, he worked as a postdoctoral researcher at Aalto University in Finland and as a research fellow at the University of Trento in Italy. He received his Ph.D. in Computer Science from Blaise Pascal University, France, in 2013.
His research interests include Big Data Management and Analytics, Data Mining, Machine Learning, and Bioinformatics.