When your big data seems too small: accurate inferences beyond the empirical distribution

Gregory Valiant, Stanford University

 

Date & Time: Thursday, August 23, 2018, 6:00 PM – 8:00 PM PDT

Location: Intel SC12, 3600 Juliette Ln, Santa Clara, CA 95054

Registration Fee: IEEE CIS members - free
         Students - $3 (Register at Door $3)
         IEEE (non-CIS) members - $7 donation (Register at Door $10)
         Non-members - $10 (Register at Door $15)

 

Abstract: We discuss several problems related to the general challenge of making accurate inferences about a complex phenomenon in the regime in which the amount of available data (i.e., the sample size) is too small for the empirical distribution of the samples to be an accurate representation of the phenomenon in question. We show that for several fundamental and practically relevant settings, including estimating the covariance structure of a high-dimensional distribution and learning a population of distributions given few data points from each individual, it is possible to "denoise" the empirical distribution significantly. We will also discuss the problem of estimating the "learnability" of a dataset: given too little labeled data to train an accurate model, we show that it is often possible to estimate the extent to which a good model exists. Framed differently, even in the regime in which there is insufficient data to learn, it is possible to estimate the performance that could be achieved if additional data (drawn from the same data source) were obtained. Our results, while theoretical, have a number of practical applications, some of which we also discuss.

 

Biography: Gregory Valiant is an assistant professor of Computer Science at Stanford University. His current research interests span algorithms, statistics, and machine learning, with an emphasis on developing algorithms and information-theoretic lower bounds for a variety of fundamental data-centric tasks. Recently, this work has also included questions of how to robustly extract meaningful information from untrusted datasets that might contain a significant fraction of corrupted or arbitrarily biased data points. Prior to joining Stanford, Gregory completed his PhD at UC Berkeley in 2012 and was a postdoctoral researcher at Microsoft Research, New England. He has received several honors, including the ACM Dissertation Award Honorable Mention, NSF CAREER Award, and Sloan Foundation Fellowship.

 

Attendance

IEEE CIS Members: 23

IEEE Members (non-CIS): 10

Non-IEEE Members: 17

Students: 4



  Registration
  • Starts 01 August 2018 11:02 AM UTC
  • Ends 23 August 2018 11:02 AM UTC
  • No Admission Charge

