Scalable Data Analytics through High-Performance Abstraction
Massively parallel systems are vital for processing large data volumes at unprecedented speeds, playing a critical role in data analysis. However, programming high-performance computing systems poses significant productivity and scalability challenges. This research aims to enable emerging data-intensive scientific areas to achieve performance and scalability on extreme-scale systems while maintaining productivity through high-performance abstraction.
In particular, we focus here on advances in genome sequencing, which have led to a flood of genomic data that poses enormous computational challenges and requires new bioinformatics approaches. Genomic applications are often irregular and unstructured, making them difficult to parallelize on distributed-memory systems.
This work demonstrates the feasibility of writing highly parallel code for irregular genomic computation through the sparse matrix abstraction in the context of de novo long-read genome assembly. ELBA (Extreme-Scale Long-Read Berkeley Assembler) reduces the runtime for mammalian genomes from days on a single processor to less than 30 minutes on a supercomputer.
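To make the sparse matrix abstraction concrete, the toy sketch below shows one common way to phrase read-overlap detection as a sparse matrix product. It assumes reads are encoded as a sparse reads-by-k-mers matrix A, so that each nonzero entry (i, j) of A times its transpose counts the k-mers shared by reads i and j. This is an illustration only, not ELBA's implementation: the function names, the example reads, and the choice k = 4 are all hypothetical, and a distributed assembler would rely on a parallel sparse matrix library rather than Python dictionaries.

```python
from collections import defaultdict
from itertools import combinations


def kmers(read, k):
    """Return the set of distinct k-mers in a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def build_kmer_matrix(reads, k):
    """Build a sparse reads-by-k-mers matrix A, stored column-wise.

    columns[kmer] lists, in increasing order, the indices of the reads
    that contain that k-mer (the nonzero rows of that column).
    """
    columns = defaultdict(list)
    for idx, read in enumerate(reads):
        for kmer in kmers(read, k):
            columns[kmer].append(idx)
    return columns


def overlap_candidates(columns):
    """Compute the strictly upper-triangular nonzeros of A * A^T.

    Entry (i, j) with i < j is the number of k-mers shared by reads i and j.
    """
    shared = defaultdict(int)
    for read_ids in columns.values():
        for i, j in combinations(read_ids, 2):
            shared[(i, j)] += 1
    return dict(shared)


# Hypothetical toy reads; real long reads span thousands of bases.
reads = ["ACGTACGGA", "TACGGATTC", "GGGGCCCCA"]
candidates = overlap_candidates(build_kmer_matrix(reads, k=4))
print(candidates)
```

In this toy input, only the first two reads share k-mers, so only that pair appears as an overlap candidate. The irregular, data-dependent work of finding overlaps is thereby reduced to a single sparse matrix operation.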
About the Speaker
Guidi is an Assistant Professor of Computer Science at Cornell University, working on high-performance computing for large-scale computational sciences, especially computational biology. Her research involves developing algorithms, software infrastructures, and systems for parallel machines that speed up data processing without sacrificing programming productivity and that make high-performance computing more accessible. Guidi received her Ph.D. in Computer Science from the University of California, Berkeley. Prior to joining Cornell Bowers CIS, she worked as a project scientist in the Performance and Algorithms Research Group in the Applied Math and Computational Sciences Division at Lawrence Berkeley National Laboratory in Berkeley, California, where she currently also holds an affiliate faculty appointment.