Unsupervised Models for Named Entity Classification

Michael Collins (AT&T (United States)), Yoram Singer (AT&T (United States))
January 1, 1999
Cited by 817

Abstract

This paper discusses the use of unlabeled examples for the problem of named entity classification. A large number of rules is needed for coverage of the domain, suggesting that a fairly large number of labeled examples should be required to train a classifier. However, we show that the use of unlabeled data can reduce the requirements for supervision to just 7 simple "seed" rules. The approach gains leverage from natural redundancy in the data: for many named-entity instances, both the spelling of the name and the context in which it appears are sufficient to determine its type. We present two algorithms. The first uses an algorithm similar to that of (Yarowsky 95), with modifications motivated by (Blum and Mitchell 98). The second extends ideas from boosting algorithms, designed for supervised learning tasks, to the framework suggested by (Blum and Mitchell 98).

1 Introduction

Many statistical or machine-learning approaches for natural language problems require a rel...

