Attribute Overlap Minimization and Outlier Elimination as Dimensionality Reduction Techniques for Text Classification Algorithms

Simon Fong¹ and Antonio Cerone²
1. Department of Computer and Information Science, University of Macau, Macau SAR
2. International Institute for Software Technology, United Nations University, Macau SAR
Abstract
Text classification is the task of assigning free-text documents to predefined groups. Many algorithms have been proposed; in particular, dimensionality reduction (DR), an important data pre-processing step, has been extensively studied. DR can effectively reduce the feature representation space, which in turn helps improve the efficiency of text classification. Two DR methods, Attribute Overlap Minimization (AOM) and Outlier Elimination (OE), are applied to downsize the feature representation space in terms of the number of attributes and the number of instances, respectively, prior to training a decision model for text classification. AOM works by reassigning each overlapped attribute (attributes are also known as features or keywords) to the group in which it has the higher occurrence frequency. Dimensionality is lowered when each group is described only by significant and unique attributes. OE eliminates instances that describe infrequent attributes. These two DR techniques can be combined with conventional feature selection to further enhance their effectiveness. In this paper, two datasets, one for classifying languages and one for categorizing online news into six emotion groups, are tested with combinations of AOM, OE and a wide range of classification algorithms. Significant improvements in prediction accuracy, tree size and speed are observed.

Index Terms—data stream mining, optimized very fast decision tree, incremental optimization  
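
The two techniques summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the tie-breaking rule, the thresholds `min_count` and `max_rare_share`, and the toy documents are all assumptions made for illustration. AOM is read here as assigning each attribute to the single group where it occurs most often, and OE as dropping instances made up mostly of infrequent attributes.

```python
from collections import Counter, defaultdict


def attribute_overlap_minimization(docs):
    """Map every attribute to the one group where it occurs most often.

    docs is a list of (tokens, group) pairs. Ties are broken by group
    name, which is an arbitrary choice made for this sketch.
    """
    counts = defaultdict(Counter)  # attribute -> {group: frequency}
    for tokens, group in docs:
        for tok in tokens:
            counts[tok][group] += 1
    owner = {}
    for tok, per_group in counts.items():
        # max() keeps the first maximum, so sorting makes ties deterministic
        owner[tok] = max(sorted(per_group), key=lambda g: per_group[g])
    return owner


def eliminate_outliers(docs, min_count=2, max_rare_share=0.5):
    """Drop instances whose tokens are mostly infrequent attributes.

    An attribute is "infrequent" if it occurs fewer than min_count times
    in the whole corpus (hypothetical threshold). Empty instances are
    dropped as well.
    """
    totals = Counter(tok for tokens, _ in docs for tok in tokens)
    kept = []
    for tokens, group in docs:
        if not tokens:
            continue
        rare = sum(1 for tok in tokens if totals[tok] < min_count)
        if rare / len(tokens) <= max_rare_share:
            kept.append((tokens, group))
    return kept


# Toy corpus (invented): two language groups, one noisy instance.
docs = [
    (["the", "cat", "the"], "en"),
    (["le", "chat", "the"], "fr"),
    (["le", "chien"], "fr"),
    (["xyzzy", "qwerty"], "en"),
]

owner = attribute_overlap_minimization(docs)
kept = eliminate_outliers(docs)
print(owner["the"])  # "the" overlaps both groups; it is assigned to one
print(len(kept))     # number of instances left after outlier elimination
```

In this reading, AOM reduces the number of attributes that describe more than one group, while OE reduces the number of training instances; the reduced data would then be passed to whichever classifier is being trained.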

Cite: Simon Fong and Antonio Cerone, "Attribute Overlap Minimization and Outlier Elimination as Dimensionality Reduction Techniques for Text Classification Algorithms," Journal of Emerging Technologies in Web Intelligence, Vol. 4, No. 3, pp. 259-263, August 2012. doi:10.4304/jetwi.4.3.259-263