Nonlinear Embedding Methods in Modern Data Visualization: Theory & Practice
Learning and representing low-dimensional structures from noisy and possibly high-dimensional data is an indispensable component of modern data science. Recently, a special class of nonlinear embedding methods has become particularly influential, most notably t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP). Despite their empirical success in many research fields, these algorithms are often criticized for their lack of theoretical understanding, unclear interpretation, and sensitivity to tuning parameters.
This talk will present a novel theoretical framework for understanding and explaining the exceptional performance of t-SNE and related algorithms in visualizing high-dimensional clustered data. The results uncover the intrinsic mechanism, the large-sample limits, and several fundamental principles behind the algorithms. They also have practical implications for improving current nonlinear embedding methods in real-world applications, such as enabling efficient selection of tuning parameters, improving the normativity of analytic practice, and avoiding common interpretive pitfalls. Recognizing current limitations, the talk will also introduce some new approaches and ideas that may lead to more accountable and reliable dimension reduction and data visualization.
Join our talk to dive into data and explore the cutting-edge techniques of dimension reduction and data visualization, along with their practical applications.
Date and Time
- Date: 15 Apr 2024
- Time: 11:00 PM UTC to 12:00 AM UTC
Speakers
Rong Ma of Harvard University
Biography:
Rong Ma is an Assistant Professor of Biostatistics at the Harvard T.H. Chan School of Public Health. He received his Ph.D. in biostatistics from the University of Pennsylvania and was a postdoctoral scholar in statistics at Stanford University. His current research focuses on statistical inference for large random matrices, nonlinear embedding theory, and manifold learning for biomedical research, especially single-cell genomics and multiomics. He received the 2022 Lawrence D. Brown Ph.D. Student Award from the Institute of Mathematical Statistics.
Address: United States