Formal Concept Analysis Based Corrective Approach Using Query-log for Web Page Classification

Abdelbadie Belmouhcine and Mohammed Benkhalifa
Computer Science Laboratory (LRI), Computer Science Department, Faculty of Science, Mohammed V-Agdal University, Rabat, Morocco
Abstract—Web page classification has many applications and plays a vital role in web mining and the semantic web. Web pages contain much irrelevant information that does not reflect their categories or topics and that acts as noise during classification, especially when a text classifier is used. Information from related web pages can therefore help overcome the problem of noisy content and improve classification results. Web pages are linked either directly by hyperlinks or indirectly by users' intuitive judgment. In this work, we suggest a post-classification corrective method that uses the query-log to build an implicit neighborhood and collectively propagates classes over the web pages of that neighborhood. This collective propagation helps improve text classifier results by correcting wrongly assigned categories. Our technique operates in four steps. In the first step, it builds a weighted graph, called the initial graph, whose vertices are web pages and whose edges are implicit links. In the second step, it uses a text classifier to determine the classes of all web pages represented by vertices in the initial graph. In the third step, it constructs clusters of web pages using Formal Concept Analysis and then applies a first adjustment of classes called Internal Propagation of Categories (IPC). In the final step, it performs a second adjustment of classes called External Propagation of Categories (EPC). This adjustment leads to significant improvements in the results provided by the text classifier. We conduct our experiments using five classifiers: SVM (Support Vector Machine), NB (Naïve Bayes), KNN (K Nearest Neighbors), ICA (Iterative Classification Algorithm) based on SVM, and ICA based on NB, on four subsets of ODP (Open Directory Project). We also compare our approach to Classification using Linked Neighborhood (CLN), considered the closest algorithm to EPC.
Results show that: (1) when applied after SVM, NB, KNN, or ICA classification, IPC followed by EPC improves the results; (2) the F1 scores provided by our approach with any of the five classifiers are significantly better than those obtained by CLN; (3) the performance of our proposed approach grows proportionally to the size of the query-log and to the density of the weighted graph.
 
Index Terms—formal concept analysis, centrality degree, semantic web, web page classification, query-log.   
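
As a rough illustration of the third step described in the abstract, the following Python sketch clusters web pages with Formal Concept Analysis and then corrects labels inside each cluster. It is not the authors' implementation: the abstract does not define how the formal context is built or how IPC adjusts classes, so the sketch assumes that pages are the objects, the queries that led to a click on them are the attributes, and that a strict majority vote inside each concept stands in for the class adjustment. The function names (build_context, formal_concepts, majority_vote_in_clusters) and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def build_context(query_log):
    """Build a formal context {page: set of queries} from (query, clicked_page) pairs.

    Assumption: a page "has" a query as an attribute if it was clicked for it.
    """
    context = defaultdict(set)
    for query, page in query_log:
        context[page].add(query)
    return dict(context)


def formal_concepts(context):
    """Return all formal concepts (extent, intent) of a small context.

    Intents are the full attribute set plus every intersection of page
    attribute sets; the extent of an intent is every page that has all of
    its attributes. This simple enumeration is only meant for small inputs.
    """
    all_attrs = frozenset().union(*context.values()) if context else frozenset()
    intents = {all_attrs}
    for attrs in context.values():
        attrs = frozenset(attrs)
        # Close the current family of intents under intersection with this page.
        intents |= {attrs & intent for intent in intents}
    concepts = []
    for intent in intents:
        extent = frozenset(p for p, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    return concepts


def majority_vote_in_clusters(concepts, predicted, min_size=2):
    """Relabel pages with the strict-majority class of their concept.

    Assumption: this vote is a stand-in for a corrective step, not the
    IPC procedure from the paper. Concepts with an empty intent (no shared
    query) or fewer than min_size pages are ignored.
    """
    corrected = dict(predicted)
    for extent, intent in concepts:
        if len(extent) < min_size or not intent:
            continue
        label, count = Counter(predicted[p] for p in extent).most_common(1)[0]
        if count > len(extent) / 2:
            for page in extent:
                corrected[page] = label
    return corrected


if __name__ == "__main__":
    # Hypothetical query-log and text-classifier output.
    log = [
        ("python tutorial", "a.example"), ("learn python", "a.example"),
        ("python tutorial", "b.example"), ("learn python", "b.example"),
        ("python tutorial", "c.example"), ("learn python", "c.example"),
        ("snake care", "d.example"),
    ]
    predicted = {"a.example": "Computers", "b.example": "Computers",
                 "c.example": "Pets", "d.example": "Pets"}
    concepts = formal_concepts(build_context(log))
    print(majority_vote_in_clusters(concepts, predicted))
```

In this toy input, the three pages that share both Python queries form one concept, so the page labeled differently from its two neighbors is the only candidate for correction; the page reached through an unrelated query keeps its label.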

Cite: Abdelbadie Belmouhcine and Mohammed Benkhalifa, "Formal Concept Analysis Based Corrective Approach Using Query-log for Web Page Classification," Journal of Emerging Technologies in Web Intelligence, Vol. 6, No. 2, pp. 200-209, May 2014. doi:10.4304/jetwi.6.2.200-209