Nonlinear Embedding Methods in Modern Data Visualization: Theory & Practice

#STEM #Lehigh #CS #Visualization #WIE

Learning and representing low-dimensional structures from noisy and possibly high-dimensional data is an indispensable component of modern data science. Recently, a special class of nonlinear embedding methods has become particularly influential, most notably t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP). Despite their empirical success in many research fields, these algorithms are often criticized for a lack of theoretical understanding, unclear interpretation, and sensitivity to tuning parameters.

This talk will present a novel theoretical framework for understanding and explaining the exceptional performance of t-SNE and related algorithms in visualizing high-dimensional clustered data. The results uncover the intrinsic mechanism, the large-sample limits, and several fundamental principles behind the algorithms. They also have practical implications for improving current nonlinear embedding methods in real-world applications, such as enabling efficient selection of tuning parameters, promoting more standardized analytic practice, and avoiding common interpretive pitfalls. Recognizing current limitations, the talk will also introduce new approaches and ideas that may lead to more accountable and reliable dimension reduction and data visualization.

Join our talk to dive into data and explore the cutting-edge techniques of dimension reduction and data visualization, along with their practical applications.



  Date and Time

  • Date: 15 Apr 2024
  • Time: 07:00 PM to 08:00 PM
  • All times are (UTC-04:00) Eastern Time (US & Canada)

  Registration

  • Starts 21 March 2024 08:00 PM
  • Ends 15 April 2024 08:00 PM
  • No Admission Charge


  Speakers

Rong Ma of Harvard University


Biography:

Rong Ma is an Assistant Professor of Biostatistics at Harvard T.H. Chan School of Public Health at Harvard University. He received his Ph.D. in biostatistics from the University of Pennsylvania, and was a postdoctoral scholar in statistics at Stanford University. His current research focuses on statistical inference for large random matrices, nonlinear embedding theory, and manifold learning for biomedical research, especially for single-cell genomics and multiomics. He was a recipient of the 2022 Lawrence D. Brown Ph.D. Student Award from the Institute of Mathematical Statistics.

Address: United States