Speech Recognition: What's Left?


Presented by the University of Pennsylvania and the IEEE Philadelphia Chapter of the Signal Processing Society





  Date and Time

  • Date: 30 Apr 2018
  • Time: 01:30 PM to 03:00 PM
  • All times are (GMT-05:00) US/Eastern

  Location

  • University of Pennsylvania
  • 3330 Walnut St.
  • Philadelphia, Pennsylvania
  • United States 19104
  • Building: Levine Hall
  • Room Number: 307

  Registration

  • Starts 05 April 2018 12:00 PM
  • Ends 30 April 2018 12:00 PM
  • All times are (GMT-05:00) US/Eastern
  • No Admission Charge


  Speakers

Dr. Michael Picheny

Topic:

Speech Recognition: What's Left?

Recent results on the SWITCHBOARD corpus suggest that, thanks to advances in Deep Learning, speech recognition systems now achieve Word Error Rates comparable to those of human listeners. Does this mean the speech recognition problem is solved and the community can move on to a different set of problems? In this talk, we examine speech recognition issues that still plague the community and compare and contrast them with what is known about human perception. We specifically highlight issues in accented speech, noisy/reverberant speech, speaking style, rapid adaptation to new domains, and multilingual speech recognition. We aim to demonstrate that, compared to human perception, there is still much room for improvement, so significant speech recognition research is still required from the community.
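The human-parity claims the talk questions are stated in terms of Word Error Rate (WER): the word-level edit distance (substitutions, insertions, and deletions) between a hypothesis transcript and the reference, divided by the number of reference words. A minimal sketch of the metric, for readers unfamiliar with it (the function name and example sentences below are illustrative, not from the talk):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("the" -> "a") over six reference words: WER = 1/6
print(wer("the cat sat on the mat", "the cat sat on a mat"))
```

Reported human WER on SWITCHBOARD is in the low single digits of percent, which is why matching it is treated as a milestone; the talk's point is that this single number hides conditions (accent, noise, domain shift) where machines still trail humans badly.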

Biography:

Michael Picheny (Fellow, IEEE) is the Senior Manager of the Speech Technologies Group in IBM Research AI, based at the IBM TJ Watson Research Center. Michael has worked in speech recognition since 1981, joining IBM after finishing his doctorate at MIT. He has been heavily involved in the development of almost all of IBM's recognition systems, ranging from the world's first real-time large-vocabulary discrete system through IBM's current product lines for telephony and embedded systems. He has published numerous journal and conference papers on almost all aspects of speech recognition.

He has received several awards from IBM for his work, including three Outstanding Technical Achievement Awards, two Research Division Awards, and most recently a Corporate Award. He is the co-holder of over 40 patents and was named a Master Inventor by IBM in 1995 and again in 2000.

Michael served as an Associate Editor of the IEEE Transactions on Acoustics, Speech, and Signal Processing from 1986-1989, was the chairman of the Speech Technical Committee of the IEEE Signal Processing Society from 2002-2004, and is a Fellow of the IEEE. He served as an Adjunct Professor in the Electrical Engineering Department of Columbia University in the spring of 2016, co-teaching a course in speech recognition. He is a Fellow of ISCA (International Speech Communication Association) and was a member of the ISCA board from 2005-2013. He was a co-organizer of the 2011 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).





Agenda

Lecture Starts: 1:00 PM