Focused Crawling Based Upon TF-IDF Semantics and Hub Score Learning

Mukesh Kumar and Renu Vig

Computer Science and Engineering Department, University Institute of Engineering and Technology, Panjab University Chandigarh, India

Abstract—A focused crawler traverses the Web to collect documents related to a particular topic, and can be used to build topic-specific document collections for digital libraries and domain-specific search. General crawlers use breadth-first search to traverse the Web and gather as much information as possible. A focused crawler helps the search indexer index all documents on the World Wide Web related to a specific domain, which in turn gives search engine users complete and up-to-date information. In this paper we present a focused crawler that learns from previous crawl results to collect documents related to the sports domain. Crawling results for four consecutive crawls are shown. The results show a significant improvement in the crawler's precision as the number of crawling attempts increases.

Index terms—web, internet, retrieval, focused web crawler, search engine

Cite: Mukesh Kumar and Renu Vig, "Focused Crawling Based Upon TF-IDF Semantics and Hub Score Learning," Journal of Emerging Technologies in Web Intelligence, Vol. 5, No. 1, pp. 70-77, February 2013. doi:10.4304/jetwi.5.1.70-77
