
Benjamin J. Shannon

Griffith University

Publishes on Speech and Audio Processing, Speech Recognition and Synthesis, and Advanced Adaptive Filtering Techniques. 11 papers and 548 citations.

11 Publications
548 Total Citations


Top publications by citations

Role of phase estimation in speech enhancement
Cited by 29

Typical speech enhancement algorithms that operate in the Fourier domain only modify the magnitude component. It is commonly understood that the phase component is perceptually unimportant, and thus, it is passed directly to the output. In recent intelligibility experiments, it has been reported that the Short-Time Fourier Transform (STFT) phase spectrum can provide significant intelligibility when estimated using a window function lower in dynamic range than the typical Hamming window. Motivated by this, we investigate the role of the window function for STFT phase estimation in relation to speech enhancement. Using a modified STFT Analysis-Modification-Synthesis (AMS) framework, we show that noise reduction can be achieved by modifying the window function used to estimate the STFT phase spectra. We demonstrate this through spectrogram plots and results from two objective speech quality measures.
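The modified AMS idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the frame length and hop, and the use of a rectangular window as the "lower dynamic range" phase window are all assumptions made for illustration. The magnitude spectrum is taken from a Hamming-windowed STFT, the phase spectrum from an STFT computed with a different window, and the two are recombined before overlap-add synthesis.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def ams_mixed_phase(x, frame_len=256, hop=64, phase_window=None):
    """AMS sketch: magnitude from a Hamming-window STFT, phase from another window.

    phase_window defaults to a rectangular window, used here only as an
    assumed stand-in for a window with lower dynamic range than Hamming.
    """
    mag_window = np.hamming(frame_len)
    if phase_window is None:
        phase_window = np.ones(frame_len)  # assumption, not taken from the paper

    frames = frame_signal(x, frame_len, hop)

    # Analysis: two STFTs of the same frames, one per window.
    magnitude = np.abs(np.fft.rfft(frames * mag_window, axis=1))
    phase = np.angle(np.fft.rfft(frames * phase_window, axis=1))

    # Modification: a magnitude enhancement step could go here.
    # This sketch leaves the magnitude untouched.
    modified = magnitude * np.exp(1j * phase)

    # Synthesis: inverse FFT, then overlap-add normalised by the summed
    # magnitude-analysis window.
    out_frames = np.fft.irfft(modified, n=frame_len, axis=1)
    y = np.zeros(len(x))
    norm = np.zeros(len(x))
    for i, frame in enumerate(out_frames):
        start = i * hop
        y[start:start + frame_len] += frame
        norm[start:start + frame_len] += mag_window
    covered = norm > 1e-8
    y[covered] /= norm[covered]
    return y
```

When `phase_window` is set to the same Hamming window used for the magnitude, the sketch reduces to a plain analysis-synthesis chain and returns the input for every sample covered by a frame, which is a convenient sanity check before experimenting with other phase windows.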

Speech-Signal-Based Frequency Warping
Kuldip K. Paliwal, Benjamin J. Shannon, James Lyons et al.|IEEE Signal Processing Letters|2009
Cited by 23 | Open Access

The speech signal is used for transmission of linguistic information. High energy portions of the speech spectrum have higher signal-to-noise ratios than the low energy portions. As a result, these regions are more robust to noise. Since the speech signal is known to be very robust to noise, it is expected that the high energy regions of the speech spectrum carry the majority of the linguistic information. This letter tries to derive a frequency warping function directly from the speech signal by sampling the frequency axis nonuniformly with the high energy regions sampled more densely than the low energy regions. To achieve this, an ensemble average short-time power spectrum is computed from a large speech corpus. The speech-signal-based frequency warping is obtained by considering equal area portions of the log spectrum. The proposed frequency warping is shown to be similar to the frequency scales obtained through psycho-acoustic experiments, namely the mel and bark scales. The warping is then used in filterbank design for automatic speech recognition experiments. The results of these experiments show that cepstral features based on the proposed warping achieve performance under clean conditions comparable to that of mel-frequency cepstral coefficients, while outperforming them under noisy conditions.
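The equal-area construction described in this abstract can be sketched as follows. This is a hedged illustration, not the letter's exact procedure: the function names, frame settings, and in particular the choice to shift the log spectrum so that its minimum is zero before integrating are assumptions. The sketch averages short-time power spectra over a set of signals, integrates the shifted log spectrum along frequency, and places band edges so that each band covers the same share of the total area.

```python
import numpy as np


def average_power_spectrum(signals, frame_len=256, hop=128):
    """Ensemble-average short-time power spectrum over all frames of all signals."""
    window = np.hamming(frame_len)
    total = np.zeros(frame_len // 2 + 1)
    count = 0
    for x in signals:
        for start in range(0, len(x) - frame_len + 1, hop):
            spec = np.fft.rfft(x[start:start + frame_len] * window)
            total += np.abs(spec) ** 2
            count += 1
    return total / count


def equal_area_warping(avg_power, sample_rate, n_bands):
    """Return (freqs, warp, edges) for an equal-area split of the log spectrum.

    warp maps each frequency to the normalised cumulative area in [0, 1];
    edges holds the n_bands + 1 band-edge frequencies in Hz.
    """
    log_spec = np.log(avg_power + 1e-12)
    # Assumption: shift so the area under the curve is non-negative.
    area = log_spec - log_spec.min()
    freqs = np.linspace(0.0, sample_rate / 2, len(avg_power))

    # Cumulative trapezoidal integral of the shifted log spectrum.
    steps = 0.5 * (area[1:] + area[:-1]) * np.diff(freqs)
    cumulative = np.concatenate(([0.0], np.cumsum(steps)))
    warp = cumulative / cumulative[-1]

    # Invert the warping at equally spaced area targets to get band edges.
    targets = np.linspace(0.0, 1.0, n_bands + 1)
    edges = np.interp(targets, warp, freqs)
    return freqs, warp, edges
```

With a corpus whose energy is concentrated at low frequencies, the resulting bands are narrow at the low end and wide at the high end, which is the qualitative behaviour the abstract compares to the mel and bark scales.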

MFCC computation from magnitude spectrum of higher lag autocorrelation coefficients for robust speech recognition
Cited by 21

Processing of the speech signal in the autocorrelation domain in the context of robust feature extraction is based on the following two properties: 1) pole preserving property (the poles of a given (original) signal are preserved in its autocorrelation function), and 2) noise separation property (the autocorrelation function of a noise signal is confined to lower lags, while the speech signal contribution is spread over all the lags in the autocorrelation function, thus providing a way to eliminate noise by discarding lower-lag autocorrelation coefficients). In this paper, we use these properties to derive robust features for automatic speech recognition. We compute the magnitude spectrum of the one-sided higher-lag autocorrelation sequence, process it through a Mel filter bank and parameterise it in terms of Mel Frequency Cepstral Coefficients (MFCCs). Since the proposed method combines autocorrelation domain processing with Mel filter bank analysis, we call the resulting MFCCs, Autocorrelation Mel Frequency Cepstral Coefficients (AMFCCs). Recognition experiments are conducted on the Aurora II database and it is found that the AMFCC representation performs as well as the MFCC representation in clean conditions and provides more robust performance in the presence of background noise.
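The AMFCC pipeline outlined in this abstract can be sketched per frame as below. The sketch rests on several assumptions that the abstract does not specify: the lag cutoff (`min_lag_s`), the Hamming taper applied to the retained lags, the filter and coefficient counts, the FFT size, and the textbook triangular mel filter bank are all illustrative choices, and every function name is hypothetical. The steps are: one-sided autocorrelation, discard the lower lags, magnitude spectrum, mel filter bank, log, and a DCT.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular mel filters over the n_fft // 2 + 1 non-negative FFT bins."""
    mel_points = np.linspace(0.0, hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, centre, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, centre):
            fbank[i, k] = (k - left) / (centre - left)
        for k in range(centre, right):
            fbank[i, k] = (right - k) / (right - centre)
    return fbank


def amfcc(frame, sample_rate, min_lag_s=0.002, n_filters=23, n_ceps=13, n_fft=512):
    """Cepstral coefficients from the higher-lag autocorrelation of one frame."""
    n = len(frame)
    # Biased one-sided autocorrelation for lags 0 .. n-1.
    acf = np.correlate(frame, frame, mode="full")[n - 1:] / n

    # Discard the lower lags, where the abstract says noise is confined.
    min_lag = int(round(min_lag_s * sample_rate))  # cutoff value is an assumption
    high_lags = acf[min_lag:]
    high_lags = high_lags * np.hamming(len(high_lags))  # taper choice is an assumption

    # Magnitude spectrum of the higher-lag sequence, then mel filter bank.
    spectrum = np.abs(np.fft.rfft(high_lags, n=n_fft))
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ spectrum
    log_energies = np.log(energies + 1e-10)

    # Unnormalised DCT-II of the log filter-bank energies.
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_filters)[None, :]
    basis = np.cos(np.pi * k * (m + 0.5) / n_filters)
    return basis @ log_energies
```

Calling `amfcc(frame, 8000)` on a 256-sample frame returns a vector of 13 coefficients. The 8 kHz rate and 256-sample frame are example values only, chosen so that the retained lags fit within the 512-point FFT.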