The OGI multi-language telephone speech corpus

October 13, 1992
Cited by 253

Abstract

The OGI Multi-language Telephone Speech Corpus is designed to support research on automatic language identification and multi-language speech recognition. The corpus consists of up to nine separate responses from each caller, ranging from single words to short topic-specific descriptions to 60 seconds of unconstrained spontaneous speech. The utterances were spoken over commercial telephone lines by speakers of English, Farsi (Persian), French, German, Japanese, Korean, Mandarin Chinese, Spanish, Tamil, and Vietnamese. We have completed the initial phase of our data acquisition effort: the recording and initial verification of utterances produced by 100 different speakers in each of the 10 languages. We describe the recording protocol, data collection procedure, ongoing corpus development, preliminary results of the statistical evaluation of the 10 languages, and plans to provide orthographic transcriptions of the speech.

INTRODUCTION

Research in multi-language recognition systems wou...

