To attend the events (held online), contact alessandro.oneto [at] unitn.it
Nicole Bussola, FBK
- Wednesday 5 May, 15.30 - 17.30
- Thursday 6 May, 08.30 - 10.30
- Wednesday 12 May, 15.30 - 17.30
- Thursday 13 May, 08.30 - 10.30
- Wednesday 19 May, 15.30 - 17.30
Over the last decade, many scientific fields have been transformed by the accelerated development of technologies, which has led to an explosion in the amount of complex, high-dimensional data. Translating these data into relevant information requires several challenging steps, including data curation, data visualization and data analysis. Data visualization is a crucial part of the analysis process, but it can be very tricky, especially for high-dimensional datasets; although there is no silver-bullet approach, several methods can be used to reduce the dimensionality of the data. Dimensionality reduction is not only necessary to visualize data in a “comfortable” low-dimensional space, but can also help reduce the noise before a data mining algorithm is applied.
Among the different dimensionality reduction approaches, the recently introduced UMAP algorithm (McInnes L, 2018) is a state-of-the-art manifold-learning technique, built on a solid mathematical framework that draws on Riemannian geometry and algebraic topology. Computational topology has been successfully applied to a wide range of disciplines, including data analysis, pattern recognition, and machine learning. Topological descriptors, such as persistence diagrams, can characterize the structure of a dataset and be used to build topology-based machine learning models.
In particular, a growing number of Topological Data Analysis (TDA) tools have been adopted to improve machine learning and deep learning techniques. For example, TDA frameworks based on persistent homology and persistence landscapes have been integrated with convolutional neural networks to investigate the role of repeated clinical measures and biological variables in predicting response to drug treatments in clinical trials.
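As a minimal illustration of the persistence diagrams mentioned above, the following sketch computes the 0-dimensional persistence pairs (connected components) of a small point cloud. It assumes a Vietoris-Rips filtration in which an edge enters at a scale equal to the distance between its endpoints; the function name `h0_persistence` and the toy points are hypothetical and chosen only for this example. It is not the method of any specific TDA library, which would typically also compute higher-dimensional features.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return (birth, death) pairs for connected components of a point cloud.

    Assumption: Vietoris-Rips filtration where an edge appears at a scale
    equal to the Euclidean distance between its two endpoints.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest, one component per point

    def find(i):
        # Follow parent links to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by the scale at which they enter.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    diagram = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge: one of them dies at this scale.
            parent[root_i] = root_j
            diagram.append((0.0, length))

    # The last surviving component never dies.
    diagram.append((0.0, math.inf))
    return diagram


# Toy dataset: two well-separated clusters of three points each.
points = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1)]
diagram = h0_persistence(points)
for birth, death in diagram:
    print(birth, death)
```

In this toy example, four components die at scale 1 (points merging inside each cluster), one dies at scale 9 (the two clusters merging), and one persists forever. The long-lived finite pair is the kind of signal a persistence diagram is meant to expose.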
These seminars will describe several dimensionality reduction techniques, including linear methods (e.g. PCA, MDS) and non-linear manifold learning methods (e.g. t-SNE, UMAP), along with the corresponding mathematical background. TDA-based approaches to data analysis and machine learning will also be illustrated, drawing in particular on the literature available in the context of precision medicine. Each seminar will be paired with a hands-on session providing practical examples to test the described algorithms on toy datasets, using the Python language and TDA-specific libraries.
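To give a flavor of the linear methods listed above, here is a minimal, dependency-free sketch of PCA restricted to the first principal component, found by power iteration on the sample covariance matrix. The function name `first_principal_component`, the iteration count and the toy data are assumptions made for this illustration. In practice one would more likely use a library implementation such as scikit-learn's `PCA`.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component.

    Sketch only: assumes the data are not constant and that the starting
    vector is not orthogonal to the leading eigenvector.
    """
    n, d = len(rows), len(rows[0])

    # Center each feature on its mean.
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - means[k] for k in range(d)] for r in rows]

    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]

    # Power iteration: repeatedly apply the covariance matrix and normalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Project the centered data onto the leading direction.
    scores = [sum(c[k] * v[k] for k in range(d)) for c in centered]
    return v, scores


# Toy dataset: 3-dimensional points lying on the line y = 2x, z = 0.
rows = [(float(t), 2.0 * t, 0.0) for t in range(5)]
direction, scores = first_principal_component(rows)
print(direction)
print(scores)
```

Because the toy points lie exactly on a line, a single component captures all of their variance, and the recovered direction is proportional to (1, 2, 0). Non-linear methods such as t-SNE and UMAP are aimed at data whose structure a linear projection like this one cannot capture.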