Léon Bottou

Gradient-based learning applied to document recognition

Yann LeCun, Léon Bottou, Yoshua Bengio et al.|Proceedings of the IEEE|1998

Cited by 57.9k|Open Access

Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.

Large-Scale Machine Learning with Stochastic Gradient Descent

Léon Bottou|Unknown|2010

Cited by 5.6k

During the last decade, the data sizes have grown faster than the speed of processors. In this context, the capabilities of statistical machine learning methods are limited by the computing time rather than the sample size. A more precise analysis uncovers qualitatively different tradeoffs for the case of small-scale and large-scale learning problems. The large-scale case involves the computational complexity of the underlying optimization algorithm in non-trivial ways. Unlikely optimization algorithms such as stochastic gradient descent show amazing performance for large-scale problems. In particular, second order stochastic gradient and averaged stochastic gradient are asymptotically efficient after a single pass on the training set.

Natural Language Processing (almost) from Scratch

Ronan Collobert, Jason Weston, Léon Bottou et al.|arXiv (Cornell University)|2011

Cited by 5.2k|Open Access

We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including: part-of-speech tagging, chunking, named entity recognition, and semantic role labeling. This versatility is achieved by trying to avoid task-specific engineering and therefore disregarding a lot of prior knowledge. Instead of exploiting man-made input features carefully optimized for each task, our system learns internal representations on the basis of vast amounts of mostly unlabeled training data. This work is then used as a basis for building a freely available tagging system with good performance and minimal computational requirements.

Wasserstein Generative Adversarial Networks

Martín Arjovsky, Soumith Chintala, Léon Bottou|International Conference on Machine Learning|2017

Cited by 5k

Proceedings of the 21st International Conference on Neural Information Processing Systems

Daphne Koller, Dale Schuurmans, Yoshua Bengio et al.|Unknown|2008

Cited by 4.5k


Top publications by citations