The ICSI Meeting Corpus

Adam Janin (International Computer Science Institute), Don Baron (International Computer Science Institute), Jane A. Edwards (University of California, Berkeley), Daniel P. W. Ellis (Columbia University), David Gelbart (International Computer Science Institute), N. Morgan (University of California, Berkeley), Barbara Peskin (International Computer Science Institute), Thilo Pfau (International Computer Science Institute), E. Shriberg (Menlo School), Andreas Stolcke (Menlo School), Chuck Wooters (International Computer Science Institute)
2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).
November 20, 2003
Cited by 672

Abstract

We have collected a corpus of data from natural meetings that occurred at the International Computer Science Institute (ICSI) in Berkeley, California over the last three years. The corpus contains audio recorded simultaneously from head-worn and table-top microphones, word-level transcripts of meetings, and various metadata on participants, meetings, and hardware. Such a corpus supports work in automatic speech recognition, noise robustness, dialog modeling, prosody, rich transcription, information retrieval, and more. We present details on the contents of the corpus, as well as rationales for the decisions that led to its configuration. The corpus was delivered to the Linguistic Data Consortium (LDC).

