Speech Recognition: What’s Left?

3 December 2019

Venue: Polo Ferrari 2, via Sommarive 9 (Povo) – Room B103
Time: 16.00

Speaker

Prof. Michael Picheny, New York University - IEEE Industry Distinguished Speaker

Abstract

Recent results on the SWITCHBOARD corpus suggest that, thanks to advances in Deep Learning, we now achieve Word Error Rates comparable to those of human listeners. Does this mean the speech recognition problem is solved and the community can move on to a different set of problems? In this talk, we examine speech recognition issues that still plague the community and compare and contrast them to what is known about human perception. We specifically highlight issues in accented speech, noisy/reverberant speech, speaking style, rapid adaptation to new domains, and multilingual speech recognition. We try to demonstrate that compared to human perception, there is still much room for improvement, so significant work in speech recognition research is still required from the community.

About the Speaker

Michael Picheny has worked in the Speech Recognition area since 1981, joining IBM after finishing his doctorate at MIT. He was heavily involved in the development of almost all of IBM's recognition systems, ranging from the world's first real-time large vocabulary discrete system in 1984 through IBM's product lines for telephony and embedded systems in the 1990s, and most recently was responsible for putting out a set of Speech Services for both Speech Recognition and Speech Synthesis during his tenure in IBM's Watson Group. He has published numerous papers in both journals and conferences on almost all aspects of speech recognition (see web page for details). He is the co-holder of over 50 patents and was named a Master Inventor by IBM in 1995 and again in 2000. In addition to professional volunteer service (as indicated below), he served multiple times as an Adjunct Professor in the Electrical Engineering Department of Columbia University and co-taught a course in speech recognition. Michael is a Fellow of both the IEEE and of ISCA.

Michael Picheny was a manager for 35 years in the Speech area at IBM, and had led the Speech team in Yorktown Heights since 2007. He just retired from IBM and joined NYU-Courant Computer Science and the Center for Data Science as a part-time Research Professor. At NYU, he hopes to continue speech recognition research and focus on challenging types of speech, such as accented and disfluent speech, and on rapid domain adaptation, as well as to look into cross-modality synergies involving text and vision.

Contact: giuseppe.riccardi [at] unitn.it (Giuseppe Riccardi)

Sponsored by