Robust Statistics: Theory and Methods
Classical statistical techniques fail to cope well with deviations from a standard distribution. Robust statistical methods take into account these deviations while estimating the parameters of parametric models, thus increasing the accuracy of the inference. Research into robust methods is flourishing, with new methods being developed and different applications considered.
Robust Statistics sets out to explain the use of robust methods and their theoretical justification. It provides an up-to-date overview of the theory and practical application of the robust statistical methods in regression, multivariate analysis, generalized linear models and time series.
This unique book:
Enables the reader to select and use the most appropriate robust method for their particular statistical model.
Features computational algorithms for the core methods.
Covers regression methods for data mining applications.
Includes examples with real data and applications using the S-Plus robust statistics library.
Describes the theoretical and operational aspects of robust methods separately, so the reader can choose to focus on one or the other.
Supported by a supplementary website featuring time-limited S-Plus download, along with datasets and S-Plus code to allow the reader to reproduce the examples given in the book.
Robust Statistics aims to stimulate the use of robust methods as a powerful tool to increase the reliability and accuracy of statistical modelling and data analysis. It is ideal for researchers, practitioners and graduate students of statistics, electrical, chemical and biochemical engineering, and computer vision. There is also much to benefit researchers from other sciences, such as biotechnology, who need to use robust statistical methods in their work.
High Breakdown-Point and High Efficiency Robust Estimates for Regression
Víctor J. Yohai|The Annals of Statistics|1987
A class of robust estimates for the linear model is introduced. These estimates, called MM-estimates, have simultaneously the following properties: (i) they are highly efficient when the errors have a normal distribution and (ii) their breakdown-point is 0.5. The MM-estimates are defined by a three-stage procedure. In the first stage an initial regression estimate is computed which is consistent, robust and with high breakdown-point but not necessarily efficient. In the second stage an M-estimate of the error scale is computed using residuals based on the initial estimate. Finally, in the third stage an M-estimate of the regression parameters based on a proper redescending psi-function is computed. Consistency and asymptotic normality of the MM-estimates assuming random carriers are proved. A convergent iterative numerical algorithm is given. Finally, the asymptotic biases under contamination of optimal bounded influence estimates and MM-estimates are compared.
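The three-stage procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the algorithm given in the paper: the initial estimate is approximated by a least-median-of-squares search over random elemental subsets, both M-steps use Tukey's bisquare function with the commonly quoted tuning constants 1.547 (50% breakdown scale) and 4.685 (about 95% normal efficiency), and all function names are hypothetical.

```python
import numpy as np

def rho_bisquare(u, c):
    """Tukey bisquare rho, normalized so that rho(u) = 1 for |u| >= c."""
    v = np.clip(np.abs(u) / c, 0.0, 1.0)
    return 1.0 - (1.0 - v**2) ** 3

def weight_bisquare(u, c):
    """IRLS weight, proportional to psi(u)/u for the bisquare; zero beyond c."""
    v = np.abs(u) / c
    return np.where(v < 1.0, (1.0 - v**2) ** 2, 0.0)

def initial_estimate(X, y, n_subsets=500, seed=0):
    """Stage 1 (assumed choice): approximate least median of squares
    by exact fits to random subsets of p observations."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_beta, best_crit = None, np.inf
    for _ in range(n_subsets):
        idx = rng.choice(n, size=p, replace=False)
        try:
            beta = np.linalg.solve(X[idx], y[idx])
        except np.linalg.LinAlgError:
            continue  # singular subset, draw another one
        crit = np.median(np.abs(y - X @ beta))
        if crit < best_crit:
            best_beta, best_crit = beta, crit
    return best_beta

def m_scale(r, c=1.547, b=0.5, n_iter=100):
    """Stage 2: M-scale s solving mean(rho(r/s)) = b by fixed-point iteration."""
    s = np.median(np.abs(r)) / 0.6745  # normalized median absolute residual
    for _ in range(n_iter):
        s = s * np.sqrt(np.mean(rho_bisquare(r / s, c)) / b)
    return s

def mm_estimate(X, y, c_eff=4.685, n_iter=100, tol=1e-8, seed=0):
    """Stage 3: redescending M-estimate via IRLS, with the scale held fixed
    and the iteration started from the stage 1 estimate."""
    beta = initial_estimate(X, y, seed=seed)
    s = m_scale(y - X @ beta)
    for _ in range(n_iter):
        sw = np.sqrt(weight_bisquare((y - X @ beta) / s, c_eff))
        new_beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta, s
```

Calling `mm_estimate(X, y)` with a design matrix `X` that includes an intercept column returns the coefficient vector and the fixed scale. Holding the scale fixed in the third stage is what lets the final estimate inherit the breakdown point of the first two stages while the larger tuning constant supplies the efficiency.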
Robust Statistics
Ricardo A. Maronna, R. Douglas Martin, Víctor J. Yohai|Wiley series in probability and statistics|2006
High Breakdown-Point Estimates of Regression by Means of the Minimization of an Efficient Scale
Víctor J. Yohai, Ruben H. Zamar|Journal of the American Statistical Association|1988
A new class of robust estimates, τ estimates, is introduced. The estimates have simultaneously the following properties: (a) they are qualitatively robust, (b) their breakdown point is .5, and (c) they are highly efficient for regression models with normal errors. They are defined by minimizing a new scale estimate, τ, applied to the residuals. Asymptotically, a τ estimate is equivalent to an M estimate with a ψ function given by a weighted average of two ψ functions, one corresponding to a very robust estimate and the other to a highly efficient estimate. The weights are adaptive and depend on the underlying error distribution. We prove consistency and asymptotic normality and give a convergent iterative computing algorithm. Finally, we compare the biases produced by gross error contamination in the τ estimates and optimal bounded-influence estimates.
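To make the τ scale in this abstract concrete, the sketch below computes one common form of it for a fixed residual vector: an M-scale s based on a very robust ρ function, multiplied by the root mean of a second, more efficient ρ function applied to the standardized residuals. The bisquare family and the tuning constants 1.547 and 6.08 are assumptions chosen for illustration, the scale is not normalized for consistency at the normal distribution, and the function names are hypothetical. A τ estimate of regression would minimize this quantity over the coefficients, which is not shown here.

```python
import numpy as np

def rho_bisquare(u, c):
    """Tukey bisquare rho, normalized so that rho(u) = 1 for |u| >= c."""
    v = np.clip(np.abs(u) / c, 0.0, 1.0)
    return 1.0 - (1.0 - v**2) ** 3

def m_scale(r, c1=1.547, b=0.5, n_iter=100):
    """M-scale s solving mean(rho1(r/s)) = b by fixed-point iteration."""
    s = np.median(np.abs(r)) / 0.6745
    for _ in range(n_iter):
        s = s * np.sqrt(np.mean(rho_bisquare(r / s, c1)) / b)
    return s

def tau_scale(r, c1=1.547, c2=6.08, b=0.5):
    """tau(r) = s * sqrt(mean(rho2(r/s))), with s the M-scale based on rho1.
    c1 governs robustness, c2 governs efficiency (illustrative values)."""
    r = np.asarray(r, dtype=float)
    s = m_scale(r, c1, b)
    return s * np.sqrt(np.mean(rho_bisquare(r / s, c2)))
```

Because ρ2 is bounded, a gross outlier contributes at most 1 to the mean inside the square root, so the τ scale stays finite under heavy contamination while behaving much like a standard deviation on clean residuals.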
Influence Functionals for Time Series
A definition is given for influence functionals of parameter estimates in time-series models. The definition involves the use of a contaminated observations process of the form $y^\gamma_t=(1-z^\gamma_t)x_t+z^\gamma_t w_t$, $t=1,2,\ldots$, $0\leq\gamma\leq1$, where $x_t$ is a core process (usually Gaussian), $w_t$ is a contaminating process, and $z^\gamma_t$ is a zero-one process with $P(z^\gamma_t=1)=\gamma+o(\gamma)$. This form is sufficiently general to model such diverse contamination types as isolated outliers and patches of outliers. Let $T(\mu^\gamma_y)$ denote the functional representation of a given estimate, where the measures $\mu^\gamma_y$, $0\leq\gamma\leq1$, for $y^\gamma_t$ are in an appropriate subset of the family of stationary and ergodic measures on $(R^\infty,\beta^\infty)$. The influence functional IF is a derivative of $T$ along "arcs" traced by $\mu^\gamma_y$ as $\gamma\rightarrow0$, and correspondingly $\mu^\gamma_y\rightarrow\mu_x$. Although this influence functional is similar in spirit to Hampel's influence curve ICH for the i.i.d. setting, it is not the same as ICH. However, a simple relationship between the IF and the ICH is established. Results are given which aid in the computation of IF and ensure that IF is bounded. We compute the IF for some robust estimates of the first-order autoregressive and first-order moving average parameters using various contamination processes. A definition of gross-error sensitivity (GES) for the IF is given, and some estimates are compared in terms of their GES's. Also the IF is used to show that a class of generalized RA estimates has a certain optimality property. Finally, some possible generalizations of the IF are indicated.
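The contaminated observations process in this abstract can be simulated directly. The sketch below is an illustration under assumptions that are not in the abstract: the core process is a Gaussian first-order autoregression, the zero-one process is i.i.d. Bernoulli with probability γ (the isolated-outlier case, which satisfies the γ + o(γ) condition trivially), and the contaminating process is independent Gaussian noise with a large scale. Function names and default values are hypothetical.

```python
import numpy as np

def simulate_contaminated_ar1(n, phi, gamma, w_scale=10.0, seed=0):
    """Return (y, x, z) with y_t = (1 - z_t) * x_t + z_t * w_t."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = eps[0] / np.sqrt(1.0 - phi**2)  # draw from the stationary law
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]     # Gaussian AR(1) core process
    z = (rng.random(n) < gamma).astype(float)  # isolated outlier indicators
    w = w_scale * rng.standard_normal(n)       # contaminating process
    y = (1.0 - z) * x + z * w
    return y, x, z

def lag1_autocorr(y):
    """Classical (non-robust) lag-one autocorrelation estimate."""
    yc = y - y.mean()
    return float(yc[:-1] @ yc[1:] / (yc @ yc))
```

Comparing `lag1_autocorr` on the output for `gamma=0.0` and for a small positive `gamma` shows how quickly a classical estimate of the autoregressive parameter is pulled toward zero by replacement outliers. Patches of outliers would need a dependent zero-one process in place of the i.i.d. draw used here.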