
Matthieu Hébert

Nuance Communications (Canada)

Publishes on Speech Recognition and Synthesis, Music and Audio Processing, Topic Modeling. 11 papers and 160 citations.



Top publications by citations

Phonetic class-based speaker verification
Cited by 22

Phonetic Class-based Speaker Verification (PCBV) is a natural refinement of the traditional single Gaussian Mixture Model (Single GMM) scheme. The aim is to accurately model the voice characteristics of a user on a per-phonetic-class basis. The paper briefly describes how the voice characteristics are represented in a hierarchy of phonetic classes. We present a framework for easily studying the effect of the modeling on PCBV, together with a thorough study of the effect of modeling complexity, the amount of enrollment data, and noise conditions. It is shown that Phoneme-based Verification (PBV), a special case of PCBV, is the optimal modeling scheme and consistently outperforms the state-of-the-art Single GMM modeling, even in noisy environments. PBV achieves a 9% to 14% relative error rate reduction while cutting the speaker model size by 50% and CPU usage by two-thirds.

T-Norm for Text-Dependent Commercial Speaker Verification Applications: Effect of Lexical Mismatch
Cited by 12

We describe a test-time score normalization technique (T-Norm) for text-dependent speaker verification that is robust to lexical mismatch. The main challenge to the deployment of T-Norm in a text-dependent task is the mismatch between the lexicon of the target speaker model in the application and that of the cohort speaker models. We show the negative effect of that mismatch in controlled experiments and propose a hybrid scoring scheme (T-Norm and background model) to remedy it. In a lexically mismatched scenario, which is inherent to the deployment of T-Norm in a text-dependent system, we show a 31% relative error rate reduction using the hybrid scoring over T-Norm alone. A 22% relative error rate reduction is measured over the baseline (no T-Norm) system.
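For readers unfamiliar with T-Norm, the following minimal sketch shows the standard normalization step: the raw score of the test utterance against the target model is shifted and scaled by the mean and standard deviation of the same utterance's scores against a set of cohort models. The function name `t_norm` and its parameters are illustrative assumptions, and the hybrid scheme from the abstract (T-Norm combined with a background model) is not reproduced here because the abstract does not specify how the two are combined.

```python
from statistics import mean, stdev


def t_norm(raw_score, cohort_scores):
    """Normalize a verification score with test-time cohort statistics.

    raw_score: score of the test utterance against the target speaker model.
    cohort_scores: scores of the SAME utterance against each cohort model.
    Names and signature are hypothetical, chosen for illustration only.
    """
    if len(cohort_scores) < 2:
        raise ValueError("need at least two cohort scores")
    mu = mean(cohort_scores)
    sigma = stdev(cohort_scores)  # sample standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("cohort scores have zero variance")
    return (raw_score - mu) / sigma
```

In a lexically mismatched setting, the cohort models were trained on different words than the target model, so the cohort statistics may no longer reflect how an impostor would score on the target's lexicon; that is the effect the abstract reports.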

Parameterization of the score threshold for a text-dependent adaptive speaker verification system
Cited by 12

We present a computationally efficient strategy for setting a priori thresholds in an adaptive speaker verification system. We have two motivations: (1) to eliminate the externally preset overall system thresholds and replace them with automatically set internal thresholds that are conditioned on a target false-acceptance (FA) rate and calculated at runtime, and (2) to counter the verification score shifts that result from online adaptation. Our approach entails calculating the trajectory of the score threshold as a function of the length of the password, the target FA rate, and the number of training frames in the speaker model. The solution succeeds both at achieving the target FA rates and at keeping the FA rate constant during online adaptation. Furthermore, it is algorithmically simple and requires negligible computational resources. The threshold function is calibrated on a Japanese database, and experimental results are presented on 12 databases in four different languages.
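To illustrate what "a threshold conditioned on a target FA rate" means, here is a generic sketch that picks a threshold from a set of impostor scores so that the empirical false-acceptance rate does not exceed the target. This is not the paper's parameterized threshold function, which also depends on password length and the number of training frames and whose form the abstract does not give; the function name, the "accept if score > threshold" rule, and the empirical-quantile approach are all assumptions made for illustration.

```python
import math


def threshold_for_target_fa(impostor_scores, target_fa):
    """Return a threshold whose empirical FA rate is at most target_fa.

    Assumes a trial is accepted when score > threshold. Hypothetical helper,
    not the threshold function described in the abstract.
    """
    if not impostor_scores:
        raise ValueError("need at least one impostor score")
    if not 0.0 <= target_fa < 1.0:
        raise ValueError("target_fa must be in [0, 1)")
    ranked = sorted(impostor_scores, reverse=True)
    # Number of impostor trials that may be (wrongly) accepted.
    allowed = math.floor(target_fa * len(ranked))
    # At most `allowed` scores are strictly greater than ranked[allowed].
    return ranked[allowed]


def empirical_fa(impostor_scores, threshold):
    """Fraction of impostor trials accepted at the given threshold."""
    accepted = sum(1 for s in impostor_scores if s > threshold)
    return accepted / len(impostor_scores)
```

A fixed threshold chosen this way would drift away from the target as online adaptation shifts the score distribution, which is the problem the abstract's runtime-computed threshold is meant to address.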