Multi-View Learning for Web Spam Detection

Ali Hadian and Behrouz Minaei-Bidgoli

Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran

Abstract—Spam pages are designed to appear maliciously among the top search results through excessive use of popular terms. Such pages should therefore be removed by an effective and efficient spam detection system. Previous methods for web spam classification drew on features from several information sources (page contents, web graph, access logs, etc.). In this paper, we follow a page-level classification approach to build fast and scalable spam filters. We show that each web page can be classified with satisfactory accuracy using only its own HTML content. To design a multi-view classification system, we use state-of-the-art spam classification methods with distinct feature sets (views) as the base classifiers. A fusion model is then learned to combine the outputs of the base classifiers and make the final prediction. Results on our Persian web spam dataset show that multi-view learning significantly improves classification performance, increasing AUC by 22%, while providing linear speedup for parallel execution.
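
The pipeline described in the abstract (one base classifier per view, then a learned fusion model over their outputs) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the view names `terms` and `markup`, the toy feature values, the nearest-centroid base scorer `CentroidScorer`, and the logistic-regression fusion model `LogisticFusion`. The abstract does not say which base classifiers or which fusion model the authors used. The thread pool only shows that the views can be scored independently; it is not a claim about how the reported speedup was obtained.

```python
import math
from concurrent.futures import ThreadPoolExecutor


class CentroidScorer:
    """Hypothetical base classifier for one view (not the paper's method)."""

    @staticmethod
    def _mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def fit(self, rows, labels):
        self.spam = self._mean([r for r, y in zip(rows, labels) if y == 1])
        self.ham = self._mean([r for r, y in zip(rows, labels) if y == 0])
        return self

    def score(self, row):
        d_spam = math.dist(row, self.spam)
        d_ham = math.dist(row, self.ham)
        total = d_spam + d_ham
        # Values near 1.0 mean the page sits close to the spam centroid.
        return 0.5 if total == 0 else d_ham / total


class LogisticFusion:
    """Assumed fusion model: logistic regression over the base scores."""

    def fit(self, score_rows, labels, epochs=2000, lr=0.5):
        self.w = [0.0] * len(score_rows[0])
        self.b = 0.0
        for _ in range(epochs):
            for s, y in zip(score_rows, labels):
                err = self.predict_proba(s) - y
                self.w = [w - lr * err * x for w, x in zip(self.w, s)]
                self.b -= lr * err
        return self

    def predict_proba(self, scores):
        z = self.b + sum(w * x for w, x in zip(self.w, scores))
        return 1.0 / (1.0 + math.exp(-z))


class MultiViewSpamFilter:
    def __init__(self, view_names):
        self.view_names = list(view_names)
        self.base = {v: CentroidScorer() for v in self.view_names}
        self.fusion = LogisticFusion()

    def _scores(self, page):
        # Each view is scored independently, so the calls can run concurrently.
        with ThreadPoolExecutor(max_workers=len(self.view_names)) as pool:
            return list(pool.map(lambda v: self.base[v].score(page[v]),
                                 self.view_names))

    def fit(self, base_pages, base_labels, fusion_pages, fusion_labels):
        for v in self.view_names:
            self.base[v].fit([p[v] for p in base_pages], base_labels)
        # The fusion model is trained on pages the base classifiers never saw.
        self.fusion.fit([self._scores(p) for p in fusion_pages], fusion_labels)
        return self

    def spam_probability(self, page):
        return self.fusion.predict_proba(self._scores(page))


# Toy pages (made-up numbers): two feature vectors per page, one per view.
base_pages = [
    {"terms": [0.9, 0.8], "markup": [0.7, 0.9]},
    {"terms": [0.8, 0.9], "markup": [0.8, 0.8]},
    {"terms": [0.1, 0.2], "markup": [0.2, 0.1]},
    {"terms": [0.2, 0.1], "markup": [0.1, 0.2]},
]
base_labels = [1, 1, 0, 0]  # 1 = spam, 0 = non-spam
fusion_pages = [
    {"terms": [0.7, 0.9], "markup": [0.9, 0.7]},
    {"terms": [0.9, 0.6], "markup": [0.3, 0.4]},
    {"terms": [0.2, 0.3], "markup": [0.1, 0.1]},
    {"terms": [0.3, 0.1], "markup": [0.4, 0.3]},
]
fusion_labels = [1, 1, 0, 0]

clf = MultiViewSpamFilter(["terms", "markup"])
clf.fit(base_pages, base_labels, fusion_pages, fusion_labels)

p_spam = clf.spam_probability({"terms": [0.85, 0.9], "markup": [0.8, 0.9]})
p_ham = clf.spam_probability({"terms": [0.1, 0.1], "markup": [0.2, 0.2]})
print(f"spam-like page: {p_spam:.3f}, normal-looking page: {p_ham:.3f}")
```

Training the fusion model on a separate set of pages is a common stacking precaution against overfitting to the base classifiers' training data; whether the authors split their data this way is not stated in the abstract.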

Index Terms—web spam, content spam, machine learning, multi-view learning

Cite: Ali Hadian and Behrouz Minaei-Bidgoli, "Multi-View Learning for Web Spam Detection," Journal of Emerging Technologies in Web Intelligence, Vol. 5, No. 4, pp. 395-400, November 2013. doi:10.4304/jetwi.5.4.395-400
