A Survey of Common Stemming Techniques and Existing Stemmers for Indian Languages

Vishal Gupta¹ and Gurpreet Singh Lehal²
1. UIET, Panjab University, Chandigarh, India
2. Department of Computer Science, Punjabi University, Patiala, India
Abstract—Stemming is an operation that relates morphological variants of a word by reducing inflected, and sometimes derived, words to their stem, base, or root form, generally a written word form. One purpose of stemming is to obtain the stem or radix of words that are not found in a dictionary: if the stemmed word is present in the dictionary, the original is a genuine word; otherwise it may be a proper name or an invalid word. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not itself a valid root. Stemming is used in Information Retrieval systems to improve performance. The design of a stemmer is language specific and requires moderate to significant linguistic expertise in the language, as well as an understanding of the needs of a spelling checker for that language. A stemmer's performance and effectiveness in applications such as spelling checking vary across languages. A typical simple stemming algorithm removes suffixes using a list of frequent suffixes, while a more complex one uses morphological knowledge to derive a stem from the words. This paper presents a survey of common stemming techniques and existing stemmers for Indian languages.
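
The simple suffix-removal approach described above, together with the dictionary check, can be sketched roughly as follows. This is only an illustration under stated assumptions: the suffix list is a small hypothetical English one (not taken from this paper or from any Indian-language stemmer), and the names strip_suffix, is_known_word, and min_stem_len are introduced here purely for the example.

```python
# Hypothetical suffix list for illustration only; a real stemmer would use
# a language-specific list compiled with linguistic expertise.
SUFFIXES = ("ations", "ation", "ings", "ing", "ies", "ed", "es", "s")


def strip_suffix(word, suffixes=SUFFIXES, min_stem_len=3):
    """Remove the longest matching suffix, keeping at least min_stem_len characters."""
    word = word.lower()
    # Try longer suffixes first so that "ings" is preferred over "s".
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word  # no suffix applies; the word is returned unchanged


def is_known_word(word, dictionary):
    """Treat a word as genuine if it, or its stem, appears in the dictionary."""
    return word.lower() in dictionary or strip_suffix(word) in dictionary
```

The minimum stem length is one common guard against over-stemming very short words; a stemmer based on morphological knowledge would replace the flat suffix list with language-specific rules.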

Index Terms—stemmer, stemming techniques, Indian stemmers, suffix removal

Cite: Vishal Gupta and Gurpreet Singh Lehal, "A Survey of Common Stemming Techniques and Existing Stemmers for Indian Languages," Journal of Emerging Technologies in Web Intelligence, Vol. 5, No. 2, pp. 157-161, May 2013. doi:10.4304/jetwi.5.2.157-161
