Bayesian Network and Network Pruning Strategy for XML Duplicate Detection


Ms. Trupti Patil, Siddheshwar Patil, Mis

Abstract

Data duplication leads to redundant storage, wasted processing time, and inconsistency. Duplicate detection helps ensure accurate data by identifying and preventing identical or similar records. Duplicate detection has been studied extensively for relational data, but only a few solutions address more complex hierarchical structures such as XML data. Hierarchical data are a set of data items related to each other by hierarchical relationships, as in XML. In the world of XML, there are not necessarily uniform and clearly defined structures like tables, so methods devised for duplicate detection in a single relation do not directly apply to XML data. Therefore, there is a need for a method to detect duplicate objects in nested XML data. In the proposed system, duplicates are detected using a duplicate detection algorithm called XMLDup. XMLDup uses a Bayesian network to determine the probability that two XML elements are duplicates, considering both the information within the elements and the way that information is structured. To improve the Bayesian network evaluation time, a network pruning strategy is used. Finally, the work will be analyzed by measuring precision and recall.
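
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' XMLDup implementation: the function names (`text_similarity`, `duplicate_score`, `precision_recall`), the use of `difflib` for value similarity, the averaging of best child matches, and the sample movie records are all assumptions chosen for illustration. The sketch combines value similarity and child structure into a single score by multiplication, and it prunes a comparison as soon as the running product (an upper bound on the final score, since every remaining factor is at most 1) falls below a threshold.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def text_similarity(a, b):
    """String similarity in [0, 1]; a stand-in for any value-level measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def duplicate_score(e1, e2, threshold=0.0):
    """Toy duplicate score for two XML elements (hypothetical, not XMLDup itself).

    The score is a product of factors in [0, 1]: one for the elements' own
    text and one per child tag. Because each factor is at most 1, the running
    product is an upper bound on the final score, so the comparison is
    abandoned (returning 0.0) once that bound drops below `threshold`.
    """
    if e1.tag != e2.tag:
        return 0.0
    score = 1.0
    t1, t2 = (e1.text or "").strip(), (e2.text or "").strip()
    if t1 or t2:
        score *= text_similarity(t1, t2)
        if score < threshold:
            return 0.0  # pruned: no later factor can raise the product
    for tag in sorted({c.tag for c in e1} | {c.tag for c in e2}):
        kids1, kids2 = e1.findall(tag), e2.findall(tag)
        if not kids1 or not kids2:
            continue  # assumption: missing children count as no evidence
        # Assumption: each child of e1 is paired with its best match in e2,
        # and the best-match scores are averaged (this is not symmetric).
        best = [max(duplicate_score(k1, k2) for k2 in kids2) for k1 in kids1]
        score *= sum(best) / len(best)
        if score < threshold:
            return 0.0  # pruned
    return score


def precision_recall(predicted, actual):
    """Precision and recall for sets of detected vs. true duplicate pairs."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical sample records.
a = ET.fromstring("<movie><title>The Matrix</title><year>1999</year></movie>")
b = ET.fromstring("<movie><title>The Matrx</title><year>1999</year></movie>")
c = ET.fromstring("<movie><title>Casablanca</title><year>1942</year></movie>")

print(duplicate_score(a, b, threshold=0.5))  # near-duplicate pair
print(duplicate_score(a, c, threshold=0.5))  # unrelated pair, pruned early
```

With a threshold in place, a pair such as `a` and `c` is rejected as soon as one factor pushes the bound below the threshold, so the remaining children are never compared. The pairs whose score stays above the threshold would form the `predicted` set passed to `precision_recall`, alongside a manually labeled `actual` set.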

Article Details

How to Cite
, M. T. P. S. P. M. (2014). Bayesian Network and Network Pruning Strategy for XML Duplicate Detection. International Journal on Recent and Innovation Trends in Computing and Communication, 2(11), 3701–3703. https://doi.org/10.17762/ijritcc.v2i11.3540