I

Iain M. Johnstone

Stanford University

ORCID: 0000-0002-2865-3076

Publishes on Statistical Methods and Inference, Random Matrices and Applications, Image and Signal Denoising Methods. 191 papers and 44.3k citations.

191Publications
44.3kTotal Citations

Is this you? Claim your profile.

Add your photo, update your bio, and get notified when your ranking changes.

Top publicationsby citations

Least angle regression
Bradley Efron, Trevor Hastie, Iain M. Johnstone et al.|The Annals of Statistics|2004
Cited by 9.5kOpen Access

The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps us understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm. (3) A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a Cp estimate of prediction error; this allows a principled choice among the range of possible LARS estimates. LARS and its variants are computationally efficient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates.

Ideal spatial adaptation by wavelet shrinkage
Cited by 7.8k

With ideal spatial adaptation, an oracle furnishes information about how best to adapt a spatially variable estimator, whether piecewise constant, piecewise polynomial, variable knot spline, or variable bandwidth kernel, to the unknown function. Estimation with the aid of an oracle o ers dramatic advantages over traditional linear estimation by nonadaptive kernels � however, it is a priori unclear whether such performance can be obtained by a procedure relying on the data alone. We describe a new principle for spatially-adaptive estimation: selective wavelet reconstruction. Weshowthatvariableknot spline ts and piecewise-polynomial ts, when equipped with an oracle to select the knots, are not dramatically more powerful than selective wavelet reconstruction with an oracle. We develop a practical spatially adaptive method, RiskShrink, which works by shrinkage of empirical wavelet coe cients. RiskShrink mimics the performance of an oracle for selective wavelet reconstruction as well as it is possible to do so. A new inequality inmultivariate normal decision theory which wecallthe oracle inequality shows that attained performance di ers from ideal performance by at most a factor 2logn, where n is the sample size. Moreover no estimator can give a better guarantee than this. Within the class of spatially adaptive procedures, RiskShrink is essentially optimal. Relying only on the data, it comes within a factor log 2 n of the performance of piecewise polynomial and variable-knot spline methods equipped with an oracle. In contrast, it is unknown how or if piecewise polynomial methods could be made to function this well when denied access to an oracle and forced to rely on data alone.

Adapting to Unknown Smoothness via Wavelet Shrinkage
David L. Donoho, Iain M. Johnstone|Journal of the American Statistical Association|1995
Cited by 4.3k

Abstract We attempt to recover a function of unknown smoothness from noisy sampled data. We introduce a procedure, SureShrink, that suppresses noise by thresholding the empirical wavelet coefficients. The thresholding is adaptive: A threshold level is assigned to each dyadic resolution level by the principle of minimizing the Stein unbiased estimate of risk (Sure) for threshold estimates. The computational effort of the overall procedure is order N · log(N) as a function of the sample size N. SureShrink is smoothness adaptive: If the unknown function contains jumps, then the reconstruction (essentially) does also; if the unknown function has a smooth piece, then the reconstruction is (essentially) as smooth as the mother wavelet will allow. The procedure is in a sense optimally smoothness adaptive: It is near minimax simultaneously over a whole interval of the Besov scale; the size of this interval depends on the choice of mother wavelet. We know from a previous paper by the authors that traditional smoothing methods—kernels, splines, and orthogonal series estimates—even with optimal choices of the smoothing parameter, would be unable to perform in a near-minimax way over many spaces in the Besov scale. Examples of SureShrink are given. The advantages of the method are particularly evident when the underlying function has jump discontinuities on a smooth background.

Ideal Spatial Adaptation by Wavelet Shrinkage
Cited by 2.5k

Journal Article Ideal spatial adaptation by wavelet shrinkage Get access David L Donoho, David L Donoho Department of Statistics, Stanford University, Stanford, California, U.S.A Search for other works by this author on: Oxford Academic Google Scholar Iain M Johnstone Iain M Johnstone Department of Statistics, Stanford University, Stanford, California, U.S.A Search for other works by this author on: Oxford Academic Google Scholar Biometrika, Volume 81, Issue 3, September 1994, Pages 425–455, https://doi.org/10.1093/biomet/81.3.425 Published: 01 September 1994 Article history Received: 01 August 1992 Revision received: 01 June 1993 Published: 01 September 1994

On the distribution of the largest eigenvalue in principal components analysis
Iain M. Johnstone|The Annals of Statistics|2001
Cited by 2kOpen Access

Let x(1) denote the square of the largest singular value of an n × p matrix X, all of whose entries are independent standard Gaussian variates. Equivalently, x(1) is the largest principal component variance of the covariance matrix $X'X$, or the largest eigenvalue of a p­variate Wishart distribution on n degrees of freedom with identity covariance. Consider the limit of large p and n with $n/p = \gamma \ge 1$. When centered by $\mu_p = (\sqrt{n-1} + \sqrt{p})^2$ and scaled by $\sigma_p = (\sqrt{n-1} + \sqrt{p})(1/\sqrt{n-1} + 1/\sqrt{p}^{1/3}$, the distribution of x(1) approaches the Tracey-Widom law of order 1, which is defined in terms of the Painlevé II differential equation and can be numerically evaluated and tabulated in software. Simulations show the approximation to be informative for n and p as small as 5. The limit is derived via a corresponding result for complex Wishart matrices using methods from random matrix theory. The result suggests that some aspects of large p multivariate distribution theory may be easier to apply in practice than their fixed p counterparts.