A survey on Data Extraction and Data Duplication Detection

Yashika A. Shah, Snehal S. Zade, Smita M. Raut, Shraddha P. Shirbhate, Vijeta U. Khadse, Anup P. Date

doi:10.17762/ijritcc.v6i5.1579


Published: May 31, 2018


Abstract

Text mining, also known as intelligent text analysis, is an important research area. The high dimensionality of text data makes it difficult to focus on the most relevant information. Feature extraction is an important data reduction technique for discovering the most important features. Processing massive amounts of data stored in unstructured form is a challenging task, and several pre-processing methods and algorithms are needed to extract useful features from it. When dealing with a collection of text documents, it is also important to filter out duplicate data. Once duplicates are deleted, it is recommended to replace the removed duplicates. This paper reviews the literature on duplicate detection and data fusion (removing and replacing duplicates). The survey presents existing text mining techniques for extracting relevant features, detecting duplicates, and replacing duplicate data in order to deliver fine-grained knowledge to the user.
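As a rough illustration of the pipeline the abstract describes (pre-process text, extract features, detect duplicates, keep one representative), the following minimal sketch uses word shingles and Jaccard similarity. This is one common approach chosen for illustration, not necessarily a method covered in the survey. The function names, the shingle size k, and the 0.8 threshold are all assumptions, and keeping the first document of each duplicate group is only a simple stand-in for data fusion.

```python
import re
from itertools import combinations


def tokens(text):
    """Pre-processing: lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(text, k=2):
    """Feature extraction: the set of k-word shingles (overlapping word n-grams)."""
    words = tokens(text)
    if len(words) < k:
        # Very short texts become a single shingle (or no shingle if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_duplicates(docs, threshold=0.8, k=2):
    """Return index pairs (i, j), i < j, whose similarity reaches the threshold."""
    sets = [shingles(d, k) for d in docs]
    return [
        (i, j)
        for i, j in combinations(range(len(docs)), 2)
        if jaccard(sets[i], sets[j]) >= threshold
    ]


def deduplicate(docs, threshold=0.8, k=2):
    """Drop every document that is a near-duplicate of an earlier one."""
    drop = {j for _, j in find_duplicates(docs, threshold, k)}
    return [d for i, d in enumerate(docs) if i not in drop]
```

The pairwise comparison is quadratic in the number of documents, so a sketch like this only suits small collections; larger corpora usually need an indexing or hashing step before comparison.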

How to Cite

Shah, Y. A., Zade, S. S., Raut, S. M., Shirbhate, S. P., Khadse, V. U., & Date, A. P. (2018). A survey on Data Extraction and Data Duplication Detection. International Journal on Recent and Innovation Trends in Computing and Communication, 6(5), 77–82. https://doi.org/10.17762/ijritcc.v6i5.1579

Issue

Vol. 6 No. 5 (2018): May (2018) Issue

Section

Articles
