Focused Crawling Based Upon TF-IDF Semantics and Hub Score Learning
Mukesh Kumar and Renu Vig
Computer Science and Engineering Department, University Institute of Engineering and Technology, Panjab University
Chandigarh, India
Abstract—A focused crawler traverses the Web to collect documents related to a particular topic, and can be used to build topic-specific document collections for digital libraries and domain-specific search. General crawlers use breadth-first search to traverse the Web and gather as much information as possible. A focused crawler helps the search indexer index all documents on the World Wide Web related to a specific domain, which in turn gives search engine users more complete and up-to-date information. In this paper we present a focused crawler that learns from previous crawl results to collect documents related to the sports domain. Crawling results for four consecutive crawls are shown. The results show a significant improvement in the crawler's precision as the number of crawling attempts increases.
Index terms—web, internet, retrieval, focused web crawler, search engine
Cite: Mukesh Kumar and Renu Vig, "Focused Crawling Based Upon TF-IDF Semantics and Hub Score Learning," Journal of Emerging Technologies in Web Intelligence, Vol. 5, No. 1, pp. 70-77, February 2013. doi:10.4304/jetwi.5.1.70-77
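The abstract does not give the exact scoring or learning rules, so the following is only a minimal sketch of the general idea under stated assumptions. Relevance is taken to be the cosine similarity between a page's TF-IDF vector and a topic vector (smoothed IDF, ln((1 + n) / (1 + df)) + 1). The breadth-first FIFO queue of a general crawler is replaced by a best-first priority queue. The "hub score" here is a hypothetical stand-in: the fraction of a page's out-links judged relevant in the previous crawl. Precision is assumed to mean relevant pages fetched divided by total pages fetched. The toy Web, threshold, budget, and all function names (`focused_crawl`, `learn_hub_scores`, etc.) are illustrative and are not the authors' implementation.

```python
import heapq
import itertools
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real crawler would strip HTML and stop words.
    return text.lower().split()


def build_idf(corpus):
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1, computed over a reference corpus."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    unseen = math.log(1 + n) + 1  # IDF of a term that appears in no corpus document
    return idf, unseen


def tfidf(text, idf, unseen):
    tf = Counter(tokenize(text))
    return {t: c * idf.get(t, unseen) for t, c in tf.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def focused_crawl(seeds, pages, links, score_fn, threshold, budget, hub=None):
    """Best-first crawl: a child's priority is its parent's relevance plus the
    child's hub score learned in an earlier crawl (hypothetical rule)."""
    hub = hub or {}
    order = itertools.count()  # FIFO tie-breaker among equal priorities
    frontier = [(-1.0, next(order), url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    fetched, relevant = [], set()
    while frontier and len(fetched) < budget:
        _, _, url = heapq.heappop(frontier)
        fetched.append(url)  # "fetch" from the in-memory toy Web
        score = score_fn(pages[url])
        if score >= threshold:
            relevant.add(url)
        for child in links.get(url, []):
            if child not in seen:
                seen.add(child)
                # heapq is a min-heap, so negate to pop the highest priority first.
                priority = score + hub.get(child, 0.0)
                heapq.heappush(frontier, (-priority, next(order), child))
    precision = len(relevant) / len(fetched) if fetched else 0.0
    return fetched, relevant, precision


def learn_hub_scores(fetched, links, relevant):
    """Hypothetical hub score: share of a fetched page's out-links judged relevant."""
    return {
        url: sum(child in relevant for child in links[url]) / len(links[url])
        for url in fetched
        if links.get(url)
    }


# Toy "Web": page texts and out-links (all illustrative).
pages = {
    "seed": "sports portal home",
    "x": "cooking recipes",
    "x1": "pasta recipes",
    "y": "sports football links",
    "y1": "football match score",
    "y2": "cricket match score",
}
links = {"seed": ["x", "y"], "x": ["x1"], "y": ["y1", "y2"]}

idf, unseen = build_idf(list(pages.values()))
topic = tfidf("football cricket match score sports", idf, unseen)


def score_fn(text):
    return cosine(tfidf(text, idf, unseen), topic)


fetched1, relevant1, p1 = focused_crawl(["seed"], pages, links, score_fn, 0.5, 4)
hub = learn_hub_scores(fetched1, links, relevant1)
fetched2, relevant2, p2 = focused_crawl(["seed"], pages, links, score_fn, 0.5, 4, hub)
print(fetched1, p1)  # ['seed', 'x', 'y', 'y1'] 0.25
print(fetched2, p2)  # ['seed', 'y', 'y1', 'y2'] 0.5
```

In this toy run the two children of `seed` tie in the first crawl, so budget is spent on the cooking branch. After learning that `y` links to relevant pages, the second crawl reaches both sports pages and precision rises from 0.25 to 0.5. These numbers come from the toy data only, not from the paper's experiments.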