MBNSeg: A Clustering System for Segmenting Malay Spoken Broadcast News
Zainab A. Khalaf Aleqili¹,² and Tien Ping Tan¹
1. School of Computer Sciences, Universiti Sains Malaysia (USM), 11800 Pinang, Malaysia
2. Department of Computer Science, College of Science, University of Basrah, Basrah, Iraq
Abstract—This paper describes a spoken document retrieval system for Malay spoken broadcast news that is designed to enhance retrieval performance. An automatic speech recognition (ASR) system was adapted to reduce the impact of ASR transcription errors on retrieval performance. The performance of unsupervised learning (clustering) was evaluated using Malay broadcast news as the data source. Latent semantic analysis was used to reduce the impact of synonymous words and to identify the story boundaries within the news segments. The system proved effective at identifying news story boundaries automatically.
Index Terms—spoken document retrieval; broadcast news transcription; clustering; latent semantic analysis; singular value decomposition (SVD)
Cite: Zainab Ali Khalaf and Tan Tien Ping, "MBNSeg: A Clustering System for Segmenting Malay Spoken Broadcast News," Journal of Emerging Technologies in Web Intelligence, Vol. 5, No. 1, pp. 28-34, February 2013. doi:10.4304/jetwi.5.1.28-34
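As a rough illustration of the latent semantic analysis step named in the abstract, the sketch below builds a term-by-segment count matrix, reduces it with a truncated SVD, and marks a story boundary wherever adjacent segments have low cosine similarity in the reduced space. It is a minimal sketch under stated assumptions, not the MBNSeg implementation: the function name `lsa_boundaries`, the parameters `k` and `threshold`, the whitespace tokenizer, the toy Malay word lists, and the use of adjacent-segment similarity thresholding in place of the paper's clustering are all hypothetical choices made for illustration.

```python
import numpy as np


def lsa_boundaries(segments, k=2, threshold=0.5):
    """Return indices i such that a boundary is hypothesized between
    segment i-1 and segment i (hypothetical helper, not from the paper)."""
    # Build the vocabulary with a naive whitespace tokenizer (assumption).
    vocab = sorted({w for seg in segments for w in seg.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}

    # Term-by-segment matrix of raw counts.
    A = np.zeros((len(vocab), len(segments)))
    for col, seg in enumerate(segments):
        for w in seg.lower().split():
            A[index[w], col] += 1.0

    # Truncated SVD: keep only the k largest singular values.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    docs = (np.diag(s[:k]) @ Vt[:k]).T  # one row per segment

    # Normalize rows so that dot products are cosine similarities.
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    docs = docs / np.where(norms == 0, 1.0, norms)

    # Cosine similarity between each segment and the next one.
    sims = np.sum(docs[:-1] * docs[1:], axis=1)
    return [i + 1 for i, sim in enumerate(sims) if sim < threshold]


# Toy transcripts: two segments about a flood, then two about football.
toy_segments = [
    "banjir kelantan hujan",
    "hujan banjir sungai",
    "bola sepak harimau",
    "harimau bola menang",
]
print(lsa_boundaries(toy_segments, k=2, threshold=0.5))
```

On this toy input the two topics share no vocabulary, so the only low-similarity pair is the one between the second and third segments. Real ASR transcripts would need proper tokenization, term weighting, and a tuned `k` and `threshold`.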