LSI Based Relevance Computation for Topical Web Crawler


Gurmeen Minhas and Mukesh Kumar

UIET, Panjab University, Chandigarh, India

Abstract—The web today is exceptionally large and growing rapidly, with a huge number of web pages and web sites added each day. Search results that are effective, factual and authentic are therefore needed. A simple crawler cannot cover every web page, as doing so would take polynomial time. To overcome these issues, this paper proposes an algorithm for building an efficient, focused, domain-specific crawler using LSI (Latent Semantic Indexing). The algorithm makes the crawler highly efficient at downloading relevant documents, thereby avoiding overheads and resource wastage, and it also increases the precision and recall of the IR system built on it.

Index Terms—crawling, focused crawler, latent semantic indexing, domain specific crawler
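
The abstract does not spell out how relevance is computed, so the following is only a minimal sketch of how LSI-based relevance scoring for a focused crawler might look, not the authors' algorithm. It assumes NumPy is available. The seed corpus, the topic string, the 0.5 threshold, the choice of k = 2, and all function names are hypothetical and chosen for illustration. The sketch builds a term-document matrix, truncates its SVD to k latent dimensions, folds a candidate page and the topic description into that space, and compares them by cosine similarity.

```python
import numpy as np

# Hypothetical seed corpus: three on-topic documents and two off-topic ones.
CORPUS = [
    "crawler fetch web page link",
    "crawler web page index",
    "spider fetch web link",
    "recipe bake bread oven",
    "recipe bread flour oven",
]

def build_vocab(docs):
    """Map each distinct term to a row index of the term-document matrix."""
    terms = sorted({t for d in docs for t in d.lower().split()})
    return {t: i for i, t in enumerate(terms)}

def term_vector(text, vocab):
    """Raw term-count vector; terms outside the vocabulary are ignored."""
    v = np.zeros(len(vocab))
    for t in text.lower().split():
        if t in vocab:
            v[vocab[t]] += 1.0
    return v

def fit_lsi(docs, k):
    """Truncated SVD of the term-document matrix, keeping k latent dimensions."""
    vocab = build_vocab(docs)
    A = np.column_stack([term_vector(d, vocab) for d in docs])
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return vocab, U[:, :k], s[:k]

def fold_in(text, vocab, U_k, s_k):
    """Project new text into the latent space: q_hat = S_k^{-1} U_k^T q."""
    return (U_k.T @ term_vector(text, vocab)) / s_k

def relevance(page, topic, model):
    """Cosine similarity between page and topic in the latent space."""
    vocab, U_k, s_k = model
    p = fold_in(page, vocab, U_k, s_k)
    q = fold_in(topic, vocab, U_k, s_k)
    denom = np.linalg.norm(p) * np.linalg.norm(q)
    # A page with no known terms projects to the zero vector: treat as irrelevant.
    return float(p @ q / denom) if denom > 0 else 0.0

model = fit_lsi(CORPUS, k=2)
TOPIC = "crawler web page"  # assumed topic description
THRESHOLD = 0.5             # assumed cut-off for following a link

for page in ["spider fetch link", "bake bread flour"]:
    score = relevance(page, TOPIC, model)
    print(f"{page!r}: score={score:.2f}, crawl={score >= THRESHOLD}")
```

In this toy setup the first page shares no term with the topic string, yet it should score close to 1 because its terms co-occur with the topic's terms in the seed corpus. The second page should score close to 0. A plain term-overlap measure would rate both pages as irrelevant, which is the kind of gap a latent-space comparison is meant to close.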

Cite: Gurmeen Minhas and Mukesh Kumar, "LSI Based Relevance Computation for Topical Web Crawler," Journal of Emerging Technologies in Web Intelligence, Vol. 5, No. 4, pp. 401-406, November 2013. doi:10.4304/jetwi.5.4.401-406
