V

Vlado Kešelj

Dalhousie University

ORCID: 0000-0001-8760-3223

Publishes on Natural Language Processing Techniques, Topic Modeling, Authorship Attribution and Profiling. 112 papers and 1.8k citations.

112Publications
1.8kTotal Citations

Is this you? Claim your profile.

Add your photo, update your bio, and get notified when your ranking changes.

Top publicationsby citations

N-gram-based detection of new malicious code
Cited by 281

The current commercial anti-virus software detects a virus only after the virus has appeared and caused damage. Motivated by the standard signature-based technique for detecting viruses, and a recent successful text classification method, we explore the idea of automatically detecting new malicious code using the collected dataset of the benign and malicious code. We obtained accuracy of 100% in the training data, and 98% in 3-fold cross-validation.

Language independent authorship attribution using character level language models
Cited by 155Open Access

We present a method for computer-assisted authorship attribution based on character-level n-gram language models. Our approach is based on simple information theoretic principles, and achieves improved performance across a variety of languages without requiring extensive pre-processing or feature selection. To demonstrate the effectiveness and language independence of our approach, we present experimental results on Greek, English, and Chinese data. We show that our approach achieves state of the art performance in each of these cases. In particular, we obtain a 18% accuracy improvement over the best published results for a Greek data set, while using a far simpler technique than previous investigations.

Automatic detection and rating of dementia of Alzheimer type through lexical analysis of spontaneous speech
Cited by 108

Current methods of assessing dementia of Alzheimer type (DAT) in older adults involve structured interviews that attempt to capture the complex nature of deficits suffered. One of the most significant areas affected by the disease is the capacity for functional communication as linguistic skills break down. These methods often do note capture the true nature of language deficits in spontaneous speech. We address this issue by exploring novel automatic and objective methods for diagnosing patients through analysis of spontaneous speech. We detail several lexical approaches to the problem of detecting and rating DAT. The approaches explored rely on character n-gram-based techniques, shown recently to perform successfully in a different, but related task of automatic authorship attribution. We also explore the correlation of usage frequency of different parts of speech and DAT. We achieve a high 95% accuracy of detecting dementia when compared with a control group, and we achieve 70% accuracy in rating dementia in two classes, and 50% accuracy in rating dementia into four classes. Our results show that purely computational solutions offer a viable alternative to standard approaches to diagnosing the level of impairment in patients. These results are significant step forward toward automatic and objective means to identifying early symptoms of DAT in older adults.