Punjabi Documents Clustering System

Saurabh Sharma and Vishal Gupta

University Institute of Engineering & Technology, Panjab University, Chandigarh, India

Abstract—Text document clustering draws on Natural Language Processing, Machine Learning, and Information Retrieval. It is an effective method for unsupervised document organization, automatic topic extraction, and fast, accurate information filtering and retrieval. Many clustering algorithms are available for unsupervised document organization and retrieval. Traditional approaches treat the documents to be clustered merely as an assortment of words. The semantic relationships between words should form the basis for clustering, yet this information is generally ignored even though it is vital for the purpose. This paper discusses a new method for generating frequent phrases by analyzing the semantic relations between the words in a sentence. These relations are captured by the Karaka list, a set of grammatical connectors that link nouns, pronouns, and verbs in a sentence. The new clustering method combines the theories behind the Karaka Analyzer, Frequent Item Sets, and Frequent Word Sequences. Results indicate that the new hybrid approach performs better in terms of the number of clusters, the meaningfulness of cluster labels, and the effectiveness of clustering for documents whose frequent phrases do not carry the desired information. The use of semantic features is the key to these better results.

Index Terms—Punjabi document clustering, Karaka theory, frequent phrases
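
The frequent-phrase idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' system: it uses plain contiguous word pairs in place of Karaka-derived phrases, English sample sentences in place of Punjabi text, and hypothetical function names (`word_ngrams`, `frequent_phrases`, `cluster_by_phrase`) and parameters (`n`, `min_support`) chosen only for illustration. It shows how phrases shared by several documents can serve both as a grouping criterion and as a cluster label.

```python
from collections import defaultdict


def word_ngrams(tokens, n):
    """Return the set of contiguous n-word sequences in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def frequent_phrases(docs, n=2, min_support=2):
    """Map each n-word phrase to the ids of the documents containing it,
    keeping only phrases found in at least `min_support` documents."""
    support = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: whitespace tokenization; real Punjabi text would
        # need a proper tokenizer and the Karaka analysis step.
        for phrase in word_ngrams(text.lower().split(), n):
            support[phrase].add(doc_id)
    return {p: ids for p, ids in support.items() if len(ids) >= min_support}


def cluster_by_phrase(docs, n=2, min_support=2):
    """Assign each document to the most widely shared frequent phrase it
    contains; documents with no frequent phrase go to 'unclustered'."""
    phrases = frequent_phrases(docs, n, min_support)
    clusters = defaultdict(list)
    for doc_id in docs:
        candidates = [p for p, ids in phrases.items() if doc_id in ids]
        if candidates:
            # Break ties between equally frequent phrases alphabetically.
            best = max(candidates, key=lambda p: (len(phrases[p]), p))
            clusters[" ".join(best)].append(doc_id)
        else:
            clusters["unclustered"].append(doc_id)
    return dict(clusters)


sample_docs = {
    "d1": "cricket match won by india",
    "d2": "india won cricket match today",
    "d3": "stock market fell sharply",
    "d4": "stock market rose again",
    "d5": "weather is pleasant",
}
clusters = cluster_by_phrase(sample_docs)
print(clusters)
```

In this sketch the shared phrase doubles as the cluster label, and the "unclustered" group corresponds to documents whose frequent phrases carry no usable information, the case the abstract says the hybrid approach is meant to handle better.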

Cite: Saurabh Sharma and Vishal Gupta, "Punjabi Documents Clustering System," Journal of Emerging Technologies in Web Intelligence, Vol. 5, No. 2, pp. 171-187, May 2013. doi:10.4304/jetwi.5.2.171-187
