A Novel Class-Based Data Fusion Technique for Information Retrieval
Muath Alzghool (1) and
Diana Inkpen (2)
1. Al-Balqa’ Applied University, Alsalt, Jordan
2. University of Ottawa, Ottawa, Canada
Abstract—Data fusion in information retrieval combines the results from multiple retrieval models or document representations. The success of a data fusion technique depends on the quality of its inputs; classical data fusion techniques fail to improve retrieval when the quality of the individual retrieval results varies widely, from low to high. To tackle this problem, in this paper we address the high variation in quality among the retrieval strategies or document representations, which affects the combination of their outputs. Our investigation on the MALACH speech collection, in which different segment representations are available, shows that neither the classical data fusion technique (CombSUM) nor its weighted version (WCombSUM) improves retrieval. We propose a novel class-based data fusion technique to deal with this issue. The segments retrieved by models based on different document representations are classified according to segment quality into three classes (high, intermediate, and low quality); the similarity scores of each segment are then fused using the classical CombSUM. Our experimental results show that the new technique is significantly better than CombSUM or WCombSUM at combining results with high quality variation.
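The fusion steps named in the abstract can be illustrated with a small sketch. The abstract does not say how scores are normalized or how a segment is assigned to a quality class, so everything beyond the three class names and the use of CombSUM is an assumption: scores are min-max normalized per run, each run (document representation) carries a hypothetical quality label, a segment takes the best label among the runs that retrieved it, and the final list ranks high-class segments first, then intermediate, then low, ordering segments inside a class by their CombSUM score. The names `comb_sum`, `class_based_fusion`, and `CLASS_RANK` are illustrative only and do not describe the authors' implementation.

```python
from collections import defaultdict

# Hypothetical quality labels; only the three class names come from the abstract.
# Lower rank value = higher quality, so sorting ascending puts "high" first.
CLASS_RANK = {"high": 0, "intermediate": 1, "low": 2}


def min_max_normalize(run):
    """Rescale one run's scores to [0, 1] (an assumed normalization step)."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        # All scores equal: give every retrieved segment the same full score.
        return {seg: 1.0 for seg in run}
    return {seg: (s - lo) / (hi - lo) for seg, s in run.items()}


def comb_sum(runs, weights=None):
    """CombSUM over normalized scores; passing per-run weights gives WCombSUM."""
    if weights is None:
        weights = [1.0] * len(runs)
    fused = defaultdict(float)
    for run, w in zip(runs, weights):
        for seg, s in min_max_normalize(run).items():
            fused[seg] += w * s
    return dict(fused)


def class_based_fusion(runs, run_classes):
    """Hypothetical class-based fusion.

    Assumptions (not from the paper): each run carries a quality label, a
    segment takes the best label among the runs that retrieved it, and the
    final ranking lists all high-class segments first, then intermediate,
    then low, ordering segments inside a class by their CombSUM score.
    """
    fused = comb_sum(runs)
    seg_class = {}
    for run, label in zip(runs, run_classes):
        rank = CLASS_RANK[label]
        for seg in run:
            # Keep the best (lowest-rank) class seen for this segment.
            seg_class[seg] = min(seg_class.get(seg, rank), rank)
    return sorted(fused, key=lambda seg: (seg_class[seg], -fused[seg]))
```

For example, with a high-quality run `{"s1": 4.0, "s2": 2.0, "s3": 0.0}` and a low-quality run `{"s4": 8.0, "s3": 6.0, "s5": 0.0}`, plain CombSUM ties `s1` and `s4` at 1.0 and lets the low-quality run's top segment `s4` outrank `s3` and `s2`, whereas `class_based_fusion(runs, ["high", "low"])` returns `["s1", "s3", "s2", "s4", "s5"]`. Ranking by class first is one simple way to keep a noisy run from displacing segments supported by a better representation; the paper itself may combine the classes differently.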
Index Terms—information storage and retrieval, searching spontaneous speech transcriptions, data fusion
Cite: Muath Alzghool and Diana Inkpen, "A Novel Class-Based Data Fusion Technique for Information Retrieval," Journal of Emerging Technologies in Web Intelligence, Vol. 2, No. 3, pp. 160-166, August 2010. doi:10.4304/jetwi.2.3.160-166