A Novel Method of Significant Words Identification in Text Summarization
Maryam Kiabod, Mohammad Naderi Dehkordi, and Mehran Sharafi
Department of Computer Engineering, Najafabad Branch, Islamic Azad University, Isfahan, Iran
Abstract—Text summarization is a process that reduces the size of a text document and extracts its significant sentences. We present a novel technique for text summarization. The originality of the technique lies in exploiting the local and global properties of words to identify significant words. The local property of a word is the sum of its normalized term frequency multiplied by a weight and the normalized number of sentences containing it multiplied by a second weight. If the local score of a word is less than the local score threshold, we remove that word. The global property is the maximum semantic similarity between a word and the title words. We also introduce an iterative algorithm to identify significant words. This algorithm converges to a fixed number of significant words after some iterations, and the number of iterations strongly depends on the text document. We used a two-layer backpropagation neural network with three neurons in the hidden layer to calculate the weights. The results show that this technique performs better than the MS Word 2007, baseline, and GistSumm summarizers.
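The local score, threshold pruning, and global score described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say how the counts are normalized, so term frequency is assumed to be divided by the largest term frequency in the document, and the sentence count by the total number of sentences. `w_tf` and `w_sf` are hypothetical stand-ins for the two weights the paper learns with its neural network. `sim` is a hypothetical, caller-supplied word-similarity function, because the abstract does not name its semantic similarity measure. All helper names are illustrative.

```python
import re
from collections import Counter
from typing import Callable, Dict, List


def split_sentences(text: str) -> List[List[str]]:
    """Naive tokenizer (assumption: '.', '!' or '?' ends a sentence)."""
    sentences = [s for s in re.split(r"[.!?]+", text.lower()) if s.strip()]
    return [re.findall(r"[a-z]+", s) for s in sentences]


def local_scores(sentences: List[List[str]], w_tf: float, w_sf: float) -> Dict[str, float]:
    """local(w) = w_tf * normalized TF + w_sf * normalized sentence count.

    Assumed normalization: TF / max TF, sentence count / number of sentences.
    w_tf and w_sf are hypothetical placeholders for the learned weights.
    """
    tf = Counter(word for sent in sentences for word in sent)
    # Count each word at most once per sentence to get its sentence frequency.
    sf = Counter(word for sent in sentences for word in set(sent))
    if not tf:
        return {}
    max_tf = max(tf.values())
    n_sent = len(sentences)
    return {w: w_tf * tf[w] / max_tf + w_sf * sf[w] / n_sent for w in tf}


def prune(scores: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Remove every word whose local score is below the threshold."""
    return {w: s for w, s in scores.items() if s >= threshold}


def global_score(word: str, title_words: List[str],
                 sim: Callable[[str, str], float]) -> float:
    """Maximum similarity between a word and any title word (0.0 if no title)."""
    return max((sim(word, t) for t in title_words), default=0.0)


if __name__ == "__main__":
    text = ("Neural networks learn weights. Summaries keep significant "
            "sentences. Networks rank sentences.")
    scores = local_scores(split_sentences(text), w_tf=0.5, w_sf=0.5)
    kept = prune(scores, threshold=0.5)
    print(sorted(kept))  # → ['networks', 'sentences']

    # Toy exact-match similarity; a real system would use a semantic measure.
    exact = lambda a, b: 1.0 if a == b else 0.0
    print(global_score("networks", ["neural", "networks"], exact))  # → 1.0
```

With the toy weights of 0.5 each, "networks" and "sentences" score 0.5 + 1/3 and survive the 0.5 threshold, while words that occur once score 0.25 + 1/6 and are pruned. Swapping the exact-match toy for a genuine semantic similarity function would let the global score reward words related to, but not identical with, the title words.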
Index Terms—significant words, text summarization, pruning algorithm
Cite: Maryam Kiabod, Mohammad Naderi Dehkordi, and Mehran Sharafi, "A Novel Method of Significant Words Identification in Text Summarization," Journal of Emerging Technologies in Web Intelligence, Vol. 4, No. 3, pp. 252-258, August 2012. doi:10.4304/jetwi.4.3.252-258