A Review of Feature Selection and Feature Extraction Methods Applied on Microarray Data|Zena M. Hira, Duncan Gillies|Advances in Bioinformatics|2015 We summarise various ways of performing dimensionality reduction on high-dimensional microarray data. Many different feature selection and feature extraction methods exist and they are being widely used. All these methods aim to remove redundant and irrelevant features so that classification of new instances will be more accurate. A popular source of data is microarrays, a biological platform for gathering gene expressions. Analysing microarrays can be difficult due to the size of the data they provide. In addition, the complicated relations among the different genes make analysis more difficult, and removing excess features can improve the quality of the results. We present some of the most popular methods for selecting significant features and provide a comparison between them. Their advantages and disadvantages are outlined in order to provide a clearer idea of when to use each one of them for saving computational time and resources.
An Outbreak of Mycobacterial Furunculosis Associated with Footbaths at a Nail Salon|Kevin Winthrop, Marcy Abrams, Mitchell A. Yakrus et al.|New England Journal of Medicine|2002 BACKGROUND: In September 2000, a physician in northern California described four patients with persistent, culture-negative boils on the lower extremities. The patients had received pedicures at the same nail salon. We identified and investigated an outbreak of Mycobacterium fortuitum furunculosis among customers of this nail salon. METHODS: Patients were defined as salon customers with persistent skin infections below the knee. A case-control study was conducted that included the first 48 patients identified, and 56 unaffected friends and family members who had had a pedicure at the same salon served as controls. Selected M. fortuitum isolates, cultured from patients and the salon environment, were compared by pulsed-field gel electrophoresis. RESULTS: We identified 110 customers of the nail salon who had furunculosis. Cultures from 34 were positive for rapidly growing mycobacteria (32 M. fortuitum and 2 unidentified). Most of the affected patients had more than 1 boil (median, 2; range, 1 to 37). All patients and controls had had whirlpool footbaths. Shaving the legs with a razor before pedicure was a risk factor for infection (70 percent of patients vs. 31 percent of controls; adjusted odds ratio, 4.8; 95 percent confidence interval, 2.1 to 11.1). Cultures from all 10 footbaths at the salon yielded M. fortuitum. The M. fortuitum isolates from three footbaths and 14 patients were indistinguishable by electrophoresis. CONCLUSIONS: We identified a large outbreak of rapidly growing mycobacterial infections among persons who had had footbaths and pedicures at one nail salon. Physicians should suspect this cause in patients with persistent furunculosis after exposure to whirlpool footbaths.
Overfitting in linear feature extraction for classification of high-dimensional image data|Raymond Liu, Duncan Gillies|Pattern Recognition|2015
A New Covariance Estimate for Bayesian Classifiers in Biometric Recognition|Carlos Eduardo Thomaz, Duncan Gillies, Raul Queiroz Feitosa|IEEE Transactions on Circuits and Systems for Video Technology|2004 In many biometric pattern-recognition problems, the number of training examples per class is limited, and consequently the sample group covariance matrices often used in parametric and nonparametric Bayesian classifiers are poorly estimated or singular. Thus, a considerable amount of effort has been devoted to the design of other covariance estimators, for use in limited-sample and high-dimensional classification problems. In this paper, a new covariance estimate, called the maximum entropy covariance selection (MECS) method, is proposed. It is based on combining covariance matrices under the principle of maximum uncertainty. In order to evaluate the MECS effectiveness in biometric problems, experiments on face, facial expression, and fingerprint classification were carried out and compared with popular covariance estimates, including the regularized discriminant analysis and leave-one-out covariance for the parametric classifier, and the Van Ness and Toeplitz covariance estimates for the nonparametric classifier. The results show that, in image recognition applications whenever the sample group covariance matrices are poorly estimated or ill posed, the MECS method is faster and usually more accurate than the aforementioned approaches in both parametric and nonparametric Bayesian classifiers.
A maximum uncertainty LDA-based approach for limited sample size problems — with application to face recognition|Carlos Eduardo Thomaz, Edson C. Kitani, Duncan Gillies|Journal of the Brazilian Computer Society|2006 A critical issue of applying Linear Discriminant Analysis (LDA) is both the singularity and instability of the within-class scatter matrix. In practice, particularly in image recognition applications such as face recognition, there are often a large number of pixels or pre-processed features available, but the total number of training patterns is limited and commonly less than the dimension of the feature space. In this study, a new LDA-based method is proposed. It is based on a straightforward stabilisation approach for the within-class scatter matrix. In order to evaluate its effectiveness, experiments on face recognition using the well-known ORL and FERET face databases were carried out and compared with other LDA-based methods. The classification results indicate that our method improves the LDA classification performance when the within-class scatter matrix is not only singular but also poorly estimated, with or without a Principal Component Analysis intermediate step and using fewer linear discriminant features. Since statistical discrimination methods are suitable not only for classification but also for characterisation of differences between groups of patterns, further experiments were carried out in order to extend the new LDA-based method to visually analyse the most discriminating hyper-plane separating two populations. The additional results based on frontal face images indicate that the new LDA-based mapping provides an intuitive interpretation of the two-group classification tasks performed, highlighting the group differences captured by the multivariate statistical approach proposed.