Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups

Geoffrey E. Hinton; Li Deng; Dong Yu; George E. Dahl; Abdelrahman Mohamed; Navdeep Jaitly; Andrew Senior; Vincent Vanhoucke; Patrick Nguyen; Tara N. Sainath; Brian Kingsbury

doi:10.1109/msp.2012.2205597

Geoffrey E. Hinton (University of Toronto), Li Deng (University of Waterloo), Dong Yu (Microsoft (United States)), George E. Dahl (University of Toronto), Abdelrahman Mohamed (University of Toronto), Navdeep Jaitly (University of Toronto), Andrew Senior (Google (United States)), Vincent Vanhoucke (Google (United States)), Patrick Nguyen (Google (United States)), Tara N. Sainath (IBM Research - Thomas J. Watson Research Center), Brian Kingsbury (Michigan State University)

IEEE Signal Processing Magazine

October 19, 2012

Cited by 10,281

Abstract

Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech recognition benchmarks, sometimes by a large margin. This article provides an overview of this progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
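To make the two scoring schemes in the abstract concrete, here is a minimal pure-Python sketch. It scores a single frame against one diagonal-covariance GMM per HMM state, and it runs a feed-forward network over a stacked window of frames to produce a softmax posterior over the same states. Everything specific here is an assumption for illustration, not the authors' configuration: the helper names (`stack_frames`, `diag_gmm_loglik`, `dnn_state_posteriors`, `random_layer`), 13 coefficients per frame, a context of 5 frames on each side, logistic hidden units, two hidden layers of 32 units, edge padding by repeating the boundary frame, and the random, untrained parameters.

```python
import math
import random

def stack_frames(frames, t, context):
    """Concatenate frames t-context .. t+context into one input vector.
    Edge handling (repeat the first/last frame) is an assumption of this sketch."""
    n = len(frames)
    window = []
    for offset in range(-context, context + 1):
        idx = min(max(t + offset, 0), n - 1)
        window.extend(frames[idx])
    return window

def diag_gmm_loglik(x, weights, means, variances):
    """log p(x | state) under a diagonal-covariance GMM, combined via log-sum-exp."""
    comp = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comp.append(ll)
    m = max(comp)
    return m + math.log(sum(math.exp(c - m) for c in comp))

def dense(x, W, b):
    """Affine layer: W has one row per output unit."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def sigmoid(v):
    """Logistic function, written to avoid overflow for large |z|."""
    out = []
    for z in v:
        if z >= 0:
            out.append(1.0 / (1.0 + math.exp(-z)))
        else:
            e = math.exp(z)
            out.append(e / (1.0 + e))
    return out

def softmax(v):
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(v)
    e = [math.exp(z - m) for z in v]
    s = sum(e)
    return [ei / s for ei in e]

def dnn_state_posteriors(x, layers):
    """Logistic hidden layers, then a softmax over HMM states."""
    h = x
    for W, b in layers[:-1]:
        h = sigmoid(dense(h, W, b))
    W, b = layers[-1]
    return softmax(dense(h, W, b))

def random_layer(n_in, n_out, rng):
    """Hypothetical untrained weights; a real system would learn these."""
    W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

# Toy utterance: 20 frames of 13 coefficients (illustrative sizes only).
rng = random.Random(0)
n_coeffs, context, n_states = 13, 5, 4
frames = [[rng.gauss(0.0, 1.0) for _ in range(n_coeffs)] for _ in range(20)]

# GMM scoring: one two-component GMM per HMM state, applied to a single frame.
gmms = [([0.5, 0.5],
         [[rng.gauss(0.0, 1.0) for _ in range(n_coeffs)] for _ in range(2)],
         [[1.0] * n_coeffs for _ in range(2)])
        for _ in range(n_states)]
gmm_scores = [diag_gmm_loglik(frames[10], w, mu, var) for w, mu, var in gmms]

# DNN scoring: an 11-frame window (143 inputs) mapped to state posteriors.
x = stack_frames(frames, 10, context)
sizes = [len(x), 32, 32, n_states]
layers = [random_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
post = dnn_state_posteriors(x, layers)
```

The GMM yields a log-likelihood per state for one frame, whereas the network sees several frames at once and yields posteriors that sum to one. Hybrid systems commonly divide such posteriors by the state priors before handing them to an HMM decoder; that step, and all training, are omitted here.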
