Script-based classification of hand-written text documents in a multilingual environment

Vipul Singhal (Indian Institute of Technology Guwahati), Nandan Kumar Navin (Indian Institute of Technology Guwahati), Debashis Ghosh (Indian Institute of Technology Guwahati)
March 22, 2004
Cited by 43

Abstract

Script-based classification of text documents is an important research area in multilingual document processing. However, none of the script identification techniques available in the literature so far considers handwritten documents. Variations in writing style, character size, and inter-line and inter-word spacing make recognition difficult and unreliable when these script identification algorithms, in particular those based on visual appearance, are applied directly to handwritten documents. In this paper, we therefore propose to preprocess the input document images to compensate for variations due to writing style, thereby making them suitable for analysis on the basis of their visual appearance. Accordingly, we apply denoising, thinning, pruning, m-connectivity and text size normalization in sequence. Multi-channel Gabor filtering is then used to extract texture features that characterize the visual appearance of the document images. Experimental results demonstrate the potential of the proposed method of script identification for handwritten text document classification.
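To make the feature extraction step concrete, the following is a minimal sketch of multi-channel Gabor texture features, written with NumPy only. It is not the authors' implementation: the abstract does not state the filter parameters, so the frequencies, the number of orientations, the Gaussian width `sigma`, the kernel size, and the choice of mean and standard deviation as per-channel features are all assumptions made for illustration. The function names are likewise hypothetical.

```python
import numpy as np

def gabor_kernel(frequency, theta, sigma=4.0, size=31):
    """Real (even-symmetric) Gabor kernel: a Gaussian-windowed cosine.

    frequency is in cycles per pixel; sigma and size are assumed values.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the sinusoid varies along direction theta.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    kernel = envelope * np.cos(2.0 * np.pi * frequency * x_rot)
    # Subtract the mean so flat regions produce no response.
    return kernel - kernel.mean()

def filter_fft(image, kernel):
    """Circular convolution of image with kernel via the FFT.

    The image must be at least as large as the kernel in both dimensions.
    """
    padded = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    # Move the kernel centre to the origin so the output is not shifted.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

def gabor_features(image, frequencies=(0.1, 0.2, 0.3), n_orientations=4):
    """Mean and standard deviation of |response| for every channel.

    Channels are ordered by frequency first, then by orientation, and each
    channel contributes two consecutive entries to the feature vector.
    """
    image = np.asarray(image, dtype=float)
    features = []
    for frequency in frequencies:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            response = np.abs(filter_fft(image, gabor_kernel(frequency, theta)))
            features.extend([response.mean(), response.std()])
    return np.array(features)
```

With the default settings this yields a 24-dimensional vector (3 frequencies, 4 orientations, 2 statistics per channel) for a preprocessed grayscale page image passed as a 2-D array. Such a vector could then be fed to any standard classifier; which classifier the paper uses is not stated in the abstract.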

