Orthogonal nonnegative matrix t-factorizations for clusteringChris Ding, Tao Li, Wei Peng et al.|Unknown|2006 Currently, most research on nonnegative matrix factorization (NMF)focus on 2-factor $X=FG^T$ factorization. We provide a systematicanalysis of 3-factor $X=FSG^T$ NMF. While it unconstrained 3-factor NMF is equivalent to it unconstrained 2-factor NMF, itconstrained 3-factor NMF brings new features to it constrained 2-factor NMF. We study the orthogonality constraint because it leadsto rigorous clustering interpretation. We provide new rules for updating $F,S, G$ and prove the convergenceof these algorithms. Experiments on 5 datasets and a real world casestudy are performed to show the capability of bi-orthogonal 3-factorNMF on simultaneously clustering rows and columns of the input datamatrix. We provide a new approach of evaluating the quality ofclustering on words using class aggregate distribution andmulti-peak distribution. We also provide an overview of various NMF extensions andexamine their relationships.
Aspect Based Sentiment Analysis with Gated Convolutional NetworksWei Xue, Tao Li|Unknown|2018 Aspect based sentiment analysis (ABSA) can provide more detailed information than general sentiment analysis, because it aims to predict the sentiment polarities of the given aspects or entities in text. We summarize previous approaches into two subtasks: aspect-category sentiment analysis (ACSA) and aspect-term sentiment analysis (ATSA). Most previous approaches employ long short-term memory and attention mechanisms to predict the sentiment polarity of the concerned targets, which are often complicated and need more training time. We propose a model based on convolutional neural networks and gating mechanisms, which is more accurate and efficient. First, the novel Gated Tanh-ReLU Units can selectively output the sentiment features according to the given aspect or entity. The architecture is much simpler than attention layer used in the existing models. Second, the computations of our model could be easily parallelized during training, because convolutional layers do not have time dependency as in LSTM layers, and gating units also work independently. The experiments on SemEval datasets demonstrate the efficiency and effectiveness of our models. 1
A Survey on Malware Detection Using Data Mining TechniquesYanfang Ye, Tao Li, Donald Adjeroh et al.|ACM Computing Surveys|2017 In the Internet age, malware (such as viruses, trojans, ransomware, and bots) has posed serious and evolving security threats to Internet users. To protect legitimate users from these threats, anti-malware software products from different companies, including Comodo, Kaspersky, Kingsoft, and Symantec, provide the major defense against malware. Unfortunately, driven by the economic benefits, the number of new malware samples has explosively increased: anti-malware vendors are now confronted with millions of potential malware samples per year. In order to keep on combating the increase in malware samples, there is an urgent need to develop intelligent methods for effective and efficient malware detection from the real and large daily sample collection. In this article, we first provide a brief overview on malware as well as the anti-malware industry, and present the industrial needs on malware detection. We then survey intelligent malware detection methods. In these methods, the process of detection is usually divided into two stages: feature extraction and classification/clustering . The performance of such intelligent malware detection approaches critically depend on the extracted features and the methods for classification/clustering. We provide a comprehensive investigation on both the feature extraction and the classification/clustering techniques. We also discuss the additional issues and the challenges of malware detection using data mining techniques and finally forecast the trends of malware development.