Large-Scale Machine Learning with Stochastic Gradient Descent

Léon Bottou

doi:10.1007/978-3-7908-2604-3_16

Léon Bottou (Princeton University)

January 1, 2010

Cited by 5,615

Abstract

During the last decade, data sizes have grown faster than the speed of processors. In this context, the capabilities of statistical machine learning methods are limited by computing time rather than sample size. A more precise analysis uncovers qualitatively different tradeoffs for small-scale and large-scale learning problems. The large-scale case involves the computational complexity of the underlying optimization algorithm in non-trivial ways. Unlikely optimization algorithms such as stochastic gradient descent show amazing performance for large-scale problems. In particular, second-order stochastic gradient and averaged stochastic gradient are asymptotically efficient after a single pass on the training set.
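
As a rough illustration of the two first-order methods named in the abstract, the sketch below runs a single pass of plain stochastic gradient descent and of averaged stochastic gradient descent on a toy one-dimensional least-squares problem. It is not taken from the paper: the function names, the synthetic data, and the learning-rate schedule `lr0 / (1 + decay * t)` are assumptions chosen only to keep the example small and self-contained.

```python
import random


def make_data(n=5000, true_w=2.0, noise=0.1, seed=1):
    """Synthetic pairs (x, y) with y = true_w * x + Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, true_w * x + rng.gauss(0.0, noise)))
    return data


def single_pass_sgd(data, lr0=0.1, decay=0.01, seed=0):
    """One pass over the data; returns (last SGD iterate, average of all iterates)."""
    rng = random.Random(seed)
    examples = list(data)
    rng.shuffle(examples)          # visit each example exactly once, in random order

    w = 0.0                        # plain SGD iterate
    w_avg = 0.0                    # running average of the iterates (averaged SGD)
    for t, (x, y) in enumerate(examples, start=1):
        grad = (w * x - y) * x     # gradient of 0.5 * (w*x - y)**2 with respect to w
        lr = lr0 / (1.0 + decay * t)   # assumed decreasing step-size schedule
        w -= lr * grad
        w_avg += (w - w_avg) / t   # incremental mean of w_1, ..., w_t
    return w, w_avg


if __name__ == "__main__":
    w, w_avg = single_pass_sgd(make_data())
    print("last iterate:", w)
    print("averaged iterate:", w_avg)
```

Each update touches one example, so the cost of a pass grows linearly with the number of examples. The second-order variant mentioned in the abstract would additionally rescale the gradient by an approximation of the inverse Hessian, which this sketch omits.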
