LSI Based Relevance Computation for Topical Web Crawler

Gurmeen Minhas and Mukesh Kumar
UIET, Panjab University, Chandigarh, India
Abstract—The web is exceptionally large and growing rapidly, with a huge number of web pages and web sites added each day. Search results that are effective, factual and authentic are therefore needed. A simple crawler cannot cover every web page, as doing so would take polynomial time. To overcome these issues, this paper proposes an algorithm for developing an efficient, focused, domain-specific crawler using LSI (Latent Semantic Indexing). The algorithm makes the crawler highly efficient at downloading relevant documents, thus avoiding overheads and resource wastage, and also increases the precision and recall of the IR system built on it.

Index Terms—crawling, focused crawler, latent semantic indexing, domain-specific crawler

Cite: Gurmeen Minhas and Mukesh Kumar, "LSI Based Relevance Computation for Topical Web Crawler," Journal of Emerging Technologies in Web Intelligence, Vol. 5, No. 4, pp. 401-406, November 2013. doi:10.4304/jetwi.5.4.401-406
