An adaptive k-nearest neighbor algorithm
Shiliang Sun, Rongqing Huang|2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery|2010
An adaptive k-nearest neighbor algorithm (AdaNN) is proposed in this paper to overcome a limitation of the traditional k-nearest neighbor algorithm (kNN), which usually uses the same number of nearest neighbors for every test example. The value of k is known to have a crucial influence on the performance of kNN, and our improved algorithm focuses on finding a suitable k for each test example. For every training example, the proposed algorithm finds its optimal k: the fewest nearest neighbors with which that example is assigned its correct class label. To classify a test example with kNN, we set k to the optimal k of the test example's nearest neighbor in the training set. The performance of the proposed algorithm is tested on several data sets. Experimental results indicate that our algorithm performs better than the traditional kNN algorithm.
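The two-stage procedure described in the abstract can be sketched in Python. This is a minimal sketch under several assumptions that the abstract does not specify: Euclidean distance, majority voting with ties broken toward the closest tied neighbor, a leave-one-out search for each training example's optimal k, and a fallback of k = 1 when no k succeeds. The function names `knn_predict`, `adann_fit`, and `adann_predict` are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k, exclude=None):
    """Majority vote among the k nearest training points (Euclidean distance).

    `exclude` skips one training index (used for leave-one-out). Ties between
    labels are broken in favor of the tied label whose member is closest to x.
    """
    order = sorted((i for i in range(len(train_X)) if i != exclude),
                   key=lambda i: math.dist(train_X[i], x))
    neighbors = order[:k]
    votes = Counter(train_y[i] for i in neighbors)
    best = max(votes.values())
    for i in neighbors:  # scan in distance order for the closest tied label
        if votes[train_y[i]] == best:
            return train_y[i]


def adann_fit(train_X, train_y, max_k=None):
    """Optimal k per training example: the smallest k whose leave-one-out
    kNN vote returns that example's true label."""
    n = len(train_X)
    max_k = max_k or n - 1
    ks = []
    for i in range(n):
        chosen = 1  # assumption: fall back to k = 1 if no k in range succeeds
        for k in range(1, max_k + 1):
            if knn_predict(train_X, train_y, train_X[i], k, exclude=i) == train_y[i]:
                chosen = k
                break
        ks.append(chosen)
    return ks


def adann_predict(train_X, train_y, ks, x):
    """Classify x with kNN, using the optimal k of x's nearest training example."""
    nearest = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return knn_predict(train_X, train_y, x, ks[nearest])
```

On the 1-D training set `[(0,), (1,), (2,), (3,)]` with labels `["a", "b", "b", "b"]`, the point at 1 needs three neighbors before its own label wins, so `adann_fit` returns `[1, 3, 1, 1]`. A test point whose nearest training example is that point would therefore be classified with k = 3 rather than a global k.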
Network-Scale Traffic Modeling and Forecasting with Graphical Lasso and Neural Networks
Shiliang Sun, Rongqing Huang, Ya Gao|Journal of Transportation Engineering|2012
Traffic flow forecasting, especially short-term forecasting, is an important topic in intelligent transportation systems (ITS). This paper studies network-scale modeling and forecasting of short-term traffic flows. First, the concepts of single-link and multilink models of traffic flow forecasting are proposed. Second, four prediction models are constructed by combining the two models with single-task learning (STL) and multitask learning (MTL). The combination of the multilink model and multitask learning improves both experimental efficiency and prediction accuracy. Moreover, a new multilink, single-task approach that combines graphical lasso (GL) with a neural network (NN) is proposed. GL provides a general methodology for problems involving many variables: using L1 regularization, it estimates a sparse inverse covariance matrix, which defines a sparse graphical model. Gaussian process regression (GPR) is a classic regression algorithm in Bayesian machine learning. Although GPR has been widely studied, it has seen few applications in traffic flow forecasting. In this paper, GPR is applied to traffic flow forecasting and its potential is demonstrated. Through extensive experiments, all of the proposed approaches are compared and an overall assessment is made.
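The GPR step can be illustrated with the standard textbook computation of the posterior mean and variance: a zero-mean GP fitted to centered targets, a squared-exponential kernel, and Cholesky-based solves. This is a minimal sketch only. It assumes each input is a vector of recent flow readings and the target is the next flow value, and the kernel choice and hyperparameters (`length_scale`, `signal_var`, `noise_var`) are illustrative assumptions, not the paper's settings. The graphical lasso component is not shown.

```python
import math


def sq_exp_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential (RBF) covariance between input vectors a and b."""
    return signal_var * math.exp(-0.5 * math.dist(a, b) ** 2 / length_scale ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L z = b."""
    z = []
    for i in range(len(L)):
        z.append((b[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i])
    return z


def solve_upper_t(L, z):
    """Back substitution: solve L^T x = z."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[m][i] * x[m] for m in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(X, y, x_star, length_scale=1.0, signal_var=1.0, noise_var=0.1):
    """Posterior mean and latent-function variance of a GP at x_star.

    Targets are centered on their sample mean, and that mean is added back.
    noise_var is the assumed observation-noise variance on the diagonal.
    """
    y_mean = sum(y) / len(y)
    y_c = [v - y_mean for v in y]
    n = len(X)
    K = [[sq_exp_kernel(X[i], X[j], length_scale, signal_var)
          + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y_c))  # alpha = K^{-1} y_c
    k_star = [sq_exp_kernel(x_star, xi, length_scale, signal_var) for xi in X]
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = sq_exp_kernel(x_star, x_star, length_scale, signal_var) - sum(t * t for t in v)
    return mean, var
```

Far from all training inputs, the kernel terms vanish. The prediction then reverts to the training mean, and the variance reverts to `signal_var`, which is the behavior a Bayesian regressor should show when it has no nearby data.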
SpeechFind: advances in spoken document retrieval for a National Gallery of the Spoken Word
John H. L. Hansen, Rongqing Huang, Bowen Zhou et al.|IEEE Transactions on Speech and Audio Processing|2005
Advances in formulating spoken document retrieval for a new National Gallery of the Spoken Word (NGSW) are addressed. NGSW is the first large-scale repository of its kind, consisting of speeches, news broadcasts, and recordings from the 20th century. After presenting an overview of the audio stream content of the NGSW, with sample audio files from U.S. Presidents from 1893 to the present, an overall system diagram is proposed with a discussion of critical tasks associated with effective audio information retrieval. These include advanced audio segmentation, speech recognition model adaptation for acoustic background noise and speaker variability, and information retrieval using natural language processing for text query requests that include document and query expansion. For segmentation, a new evaluation criterion entitled fused error score (FES) is proposed, followed by application of the CompSeg segmentation scheme on DARPA Hub4 Broadcast News (30.5% relative improvement in FES) and NGSW data. Transcript generation is demonstrated for a six-decade portion of the NGSW corpus. Novel model adaptation using structure maximum likelihood eigenspace mapping shows a relative 21.7% improvement. Issues regarding copyright assessment and metadata construction are also addressed for the purposes of a sustainable audio collection of this magnitude. Advanced parameter-embedded watermarking is proposed with evaluations showing robustness to correlated noise attacks. Our experimental online system entitled "SpeechFind" is presented, which allows for audio retrieval from a portion of the NGSW corpus.
Finally, a number of research challenges such as language modeling and lexicon for changing time periods, speaker trait and identification tracking, as well as new directions, are discussed in order to address the overall task of robust phrase searching in unrestricted audio corpora.
Advances in unsupervised audio classification and segmentation for the broadcast news and NGSW corpora
Rongqing Huang, John H. L. Hansen|IEEE Transactions on Audio Speech and Language Processing|2006
Unsupervised audio classification and segmentation remains a challenging research problem that significantly affects automatic speech recognition (ASR) and spoken document retrieval (SDR) performance. This paper addresses novel advances in 1) audio classification for speech recognition and 2) audio segmentation for unsupervised multispeaker change detection. A new algorithm for audio classification is proposed, based on weighted GMM networks (WGN). Two new extended-time features, variance of the spectrum flux (VSF) and variance of the zero-crossing rate (VZCR), are used to preclassify the audio and to supply weights to the output probabilities of the GMM networks. The classification is then implemented using the weighted GMM networks. Since historically there have been no features specifically designed for audio segmentation, we evaluate 16 potential features, including three newly proposed features: perceptual minimum variance distortionless response (PMVDR), smoothed zero-crossing rate (SZCR), and filterbank log energy coefficients (FBLC). The evaluation covers 14 noisy environments and aims to determine which features are most robust on average across these conditions. Next, a new distance metric, $T^2$-mean, is proposed, which is intended to improve segmentation for short segment turns (i.e., 1-5 s). A new false alarm compensation procedure is implemented, which can reduce the false alarm rate significantly at little cost to the miss rate.
Evaluations on a standard data set, the Defense Advanced Research Projects Agency (DARPA) Hub4 Broadcast News 1997 evaluation data, show that the WGN classification algorithm achieves over 50% improvement versus the GMM network baseline algorithm. They also show that the proposed compound segmentation algorithm achieves between 10% and 23% improvement in all metrics versus the baseline that combines Mel-frequency cepstral coefficients (MFCC) with the traditional Bayesian information criterion (BIC) algorithm. The new classification and segmentation algorithms also obtain very satisfactory results on the more diverse and challenging National Gallery of the Spoken Word (NGSW) corpus.
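The traditional BIC baseline mentioned above is commonly formulated as a ΔBIC test over a window of features. One Gaussian is compared with two Gaussians split at frame i, and a change point is declared when the penalized likelihood gain is positive. The following is a simplified sketch for a 1-D feature stream, whereas real systems apply it to multi-dimensional MFCC vectors with full covariance matrices. The penalty weight `lam`, the minimum segment length `min_seg`, and the variance floor `var_floor` are hypothetical parameters. This illustrates only the baseline and is not the paper's $T^2$-mean metric or compound segmentation algorithm.

```python
import math
from statistics import pvariance


def delta_bic(x, i, lam=1.0, var_floor=1e-12):
    """ΔBIC for splitting the 1-D window x at index i (x[:i] | x[i:]).

    ΔBIC = N/2 log σ² - N1/2 log σ1² - N2/2 log σ2² - lam * P/2 * log N,
    with P = d + d(d+1)/2 free parameters per Gaussian (d = 1 here).
    Positive values favor two Gaussians, i.e. a change point at i.
    """
    n, n1, n2 = len(x), i, len(x) - i
    d = 1
    p = d + d * (d + 1) / 2

    def ml_var(seg):
        # Maximum-likelihood variance, floored to avoid log(0) on flat segments.
        return max(pvariance(seg), var_floor)

    return (0.5 * n * math.log(ml_var(x))
            - 0.5 * n1 * math.log(ml_var(x[:i]))
            - 0.5 * n2 * math.log(ml_var(x[i:]))
            - lam * 0.5 * p * math.log(n))


def detect_change(x, min_seg=5, lam=1.0):
    """Return the split index with the largest positive ΔBIC, or None."""
    best_i, best = None, -math.inf
    for i in range(min_seg, len(x) - min_seg + 1):
        score = delta_bic(x, i, lam)
        if score > best:
            best_i, best = i, score
    return best_i if best > 0 else None
```

The penalty term grows only with log N, so short windows carry little evidence against the single-Gaussian hypothesis. This is one reason plain BIC tends to miss short speaker turns, the 1-5 s case the abstract targets.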
Dialect/Accent Classification Using Unrestricted Audio
Rongqing Huang, John H. L. Hansen, Pongtep Angkititrakul|IEEE Transactions on Audio Speech and Language Processing|2007
This study addresses novel advances in English dialect/accent classification. A word-based modeling technique is proposed that is shown to outperform a large vocabulary continuous speech recognition (LVCSR)-based system at significantly lower computational cost. The new algorithm, named Word-based Dialect Classification (WDC), converts the text-independent decision problem into a text-dependent one. Rather than making a single decision at the utterance level, it makes multiple decisions at the word level and combines them. The basic WDC algorithm also provides options for further improving the modeling and decision strategy. Two sets of classifiers are employed for WDC: a word classifier $D_{W(k)}$ and an utterance classifier $D_u$. $D_{W(k)}$ is boosted via the AdaBoost algorithm directly in the probability space instead of the traditional feature space. $D_u$ is boosted via the dialect dependency information of the words. With a small training corpus, it is difficult to obtain a robust statistical model for each word and each dialect. Therefore, a context adapted training (CAT) algorithm is formulated, which adapts the universal phoneme Gaussian mixture models (GMMs) to dialect-dependent word hidden Markov models (HMMs) via linear regression.
Three separate dialect corpora are used in the evaluations: the Wall Street Journal (American and British English), NATO N4 (British, Canadian, Dutch, and German accent English), and IViE (eight British dialects). Significant improvement in dialect classification is achieved for all corpora tested.
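The contrast between one utterance-level decision and several combined word-level decisions can be illustrated with a hypothetical weighted-voting combiner. Each recognized word's scores pick a dialect, and the utterance label is the dialect with the largest total vote weight. The per-word weights stand in for the "dialect dependency information of the words". The input format, the weights, and the example words and scores are invented for illustration. This is not the paper's boosted word and utterance classifiers.

```python
from collections import defaultdict


def word_level_decision(scores):
    """Dialect with the highest score (e.g. log-likelihood) for one word."""
    return max(scores, key=scores.get)


def classify_utterance(word_scores, word_weights=None):
    """Weighted vote over word-level decisions.

    word_scores: list of (word, {dialect: score}) pairs for one utterance.
    word_weights: optional {word: weight}; a hypothetical stand-in for how
    dialect-dependent each word is (missing words get weight 1.0).
    """
    votes = defaultdict(float)
    for word, scores in word_scores:
        weight = 1.0 if word_weights is None else word_weights.get(word, 1.0)
        votes[word_level_decision(scores)] += weight
    return max(votes, key=votes.get)
```

With invented scores where "water" favors US while "schedule" and "tomato" favor UK, the unweighted vote returns UK. If "water" is given weight 3.0, the result flips to US. This shows how word-dependent weighting can change the utterance decision.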