DWDE-IR: An Efficient Deep Web Data Extraction for Information Retrieval on Web Mining
Aysha Banu1 and M. Chitra2
1. Anna University, Chennai, India
2. Department of Information Technology, Sona College of Technology, Salem, India
Abstract—The Deep Web, a largely unexplored data source, is becoming an important research topic. Retrieving structured data from deep web pages is challenging because of their complex structure. In this paper, information is extracted from deep web pages using the Deep Web Data Extraction technique (DWDE-IR). Search engines usually return a large number of pages in response to user queries. To help users navigate the result list, ranking methods are applied to the search results. This paper uses a page ranking mechanism called the Coherence Ratio based Page (CRP) ranking algorithm. To retrieve information accurately, WordNet is used: it checks the similarity of data records and finds the correct data region with higher precision using the semantic properties of the data records. This is important for placing valuable results at the top of the result list on the basis of the user's browsing behavior; it reduces the search space and provides high accuracy. The approach handles visual features in deep web data extraction, including data item extraction, data record extraction, and visual wrapper generation. The proposed work removes noise such as headers, footers, irrelevant advertisements, and irrelevant content using the NoiSe Filter (NSFilter) algorithm. The proposed method extracts relevant results from deep web pages accurately. DWDE-IR achieves higher precision, recall, and filter accuracy than the existing method ViDE.
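The idea of comparing data records by their semantic properties can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify how WordNet is queried or how similarity is scored, so the sketch assumes a tiny hand-written synonym table (`SYNONYMS`, hypothetical) in place of WordNet and Jaccard overlap as the similarity measure. The function names `concepts` and `similarity` are likewise illustrative.

```python
# Hypothetical synonym table standing in for WordNet lookups.
# Each word maps to a canonical concept label (an assumption of this sketch).
SYNONYMS = {
    "cost": "price",
    "price": "price",
    "author": "writer",
    "writer": "writer",
    "name": "title",
    "title": "title",
}


def concepts(record):
    """Map each word of a record to its canonical concept.

    Words missing from the table are kept unchanged.
    """
    return {SYNONYMS.get(word, word) for word in record.lower().split()}


def similarity(record_a, record_b):
    """Jaccard overlap of the two records' concept sets, in [0, 1]."""
    a, b = concepts(record_a), concepts(record_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Under these assumptions, `similarity("title author price", "name writer cost")` treats the two records as describing the same fields, while a record such as `"login help contact"` shares no concepts with either and would be a candidate for exclusion from the data region.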
Index Terms—data item extraction, data record extraction, deep web data extraction, ranking algorithm, visual wrapper generation, wordnet
Cite: Aysha Banu and M. Chitra, "DWDE-IR: An Efficient Deep Web Data Extraction for Information Retrieval on Web Mining," Journal of Emerging Technologies in Web Intelligence, Vol. 6, No. 1, pp. 133-141, February 2014. doi:10.4304/jetwi.6.1.133-141