A Survey on Data Extraction and Data Duplication Detection


Yashika A. Shah, Snehal S. Zade, Smita M. Raut, Shraddha P. Shirbhate, Vijeta U. Khadse, Anup P. Date


Text mining, also known as intelligent text analysis, is an important research area. Because text data is high-dimensional, it is difficult to focus on the most relevant information. Feature extraction is one of the key data-reduction techniques for discovering the most important features. Processing massive amounts of data stored in unstructured form is a challenging task, and several pre-processing methods and algorithms are needed to extract useful features from such large volumes of data. When dealing with a collection of text documents, it is also very important to filter out duplicate data. Once duplicates are deleted, it is recommended to replace the removed duplicates. This paper reviews the literature on duplicate detection and data fusion (removing and replacing duplicates). The survey covers existing text mining techniques for extracting relevant features, detecting duplicates, and replacing duplicate data, so that fine-grained knowledge can be delivered to the user.
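To make the pipeline in the abstract concrete (extract features, detect duplicates, then remove and replace them), here is a minimal sketch. It is not the method of any surveyed paper. It assumes word 3-gram "shingles" as the extracted features, Jaccard similarity with a threshold as the duplicate test, and a "keep the longest document" rule as a simple stand-in for data fusion. The function names `shingles`, `jaccard`, and `deduplicate` and the default threshold of 0.8 are hypothetical choices for illustration.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Extract word k-grams (shingles) as a simple feature set for a document."""
    # Lower-case and keep only word characters, so case and punctuation are ignored.
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short documents: fall back to the whole token sequence as one feature.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B| between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(docs: list[str], threshold: float = 0.8, k: int = 3) -> list[str]:
    """Group near-duplicate documents and replace each group with one representative."""
    features = [shingles(d, k) for d in docs]
    # Union-find over document indices, so "is a duplicate of" becomes transitive.
    parent = list(range(len(docs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Duplicate detection: compare every pair of feature sets (O(n^2) pairs).
    for i, j in combinations(range(len(docs)), 2):
        if jaccard(features[i], features[j]) >= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)

    # Remove-and-replace step (assumption): each group is replaced by its longest
    # member, with groups ordered by the first position at which they occur.
    result = []
    for members in sorted(groups.values(), key=min):
        best = max(members, key=lambda i: len(docs[i]))
        result.append(docs[best])
    return result
```

For example, given the three documents below, the first two share 5 of their 8 distinct 3-grams (Jaccard 0.625). With `threshold=0.6` they are merged into one group, and the longer second document replaces both. With the default 0.8 all three are kept. The all-pairs comparison keeps the sketch simple. For large collections, techniques such as MinHash or locality-sensitive hashing are commonly used to avoid comparing every pair.

```python
docs = [
    "Text mining extracts useful features from unstructured documents.",
    "Text mining extracts useful features from unstructured text documents.",
    "Duplicate records should be detected before analysis.",
]
print(deduplicate(docs, threshold=0.6))  # keeps docs[1] and docs[2]
```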


How to Cite
Shah, Y. A., S. S. Zade, S. M. Raut, S. P. Shirbhate, V. U. Khadse, and A. P. Date. “A Survey on Data Extraction and Data Duplication Detection”. International Journal on Recent and Innovation Trends in Computing and Communication, vol. 6, no. 5, May 2018, pp. 77-82, doi:10.17762/ijritcc.v6i5.1579.