Structured and Unstructured Information Extraction Using Text Mining and Natural Language Processing Techniques

S. Nagarajan, Dr. K. Perumal

doi:10.17762/ijritcc.v5i11.1271

Published: Nov 30, 2017

Abstract

Information on the web is growing ad infinitum. The web has thus become an unstructured global space where information, even when available, cannot be used directly for the desired applications. Users are often faced with information overload and need some form of automated help. Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents by means of Text Mining and Natural Language Processing (NLP) techniques. The extracted structured information can be used for a variety of enterprise- or personal-level tasks of varying complexity. IE also draws on a set of knowledge to answer user queries posed in natural language. The system is based on a Fuzzy Logic engine, which takes advantage of its flexibility in managing sets of accumulated knowledge. These sets may be organized into hierarchical levels using a tree structure. Information extraction derives structured data or knowledge from unstructured text by identifying references to named entities as well as stated relationships between those entities. Data mining research assumes that the information to be “mined” is already in the form of a relational database. IE can serve as an important technology for text mining. If the knowledge to be discovered is expressed directly in the documents to be mined, then IE alone can serve as an effective approach to text mining. However, if the documents contain concrete data in unstructured form rather than abstract knowledge, it may be useful to first use IE to transform the unstructured data in the document corpus into a structured database, and then use traditional data mining tools to identify abstract patterns in the extracted data. We propose a novel text mining method that uses natural language processing techniques to extract information from a database efficiently; extraction time and accuracy are measured and plotted through simulation.
The method extracts the attributes of entities and the relationships between entities from structured and semi-structured information. Results are compared with conventional methods.
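The abstract describes a two-stage pipeline. IE first turns unstructured text into a relational table, and conventional data mining then runs over that table, with extraction time and accuracy measured along the way. It does not specify the authors' extractor. The following is only a minimal sketch under stated assumptions. Named entities are approximated by runs of capitalized words, and a single hypothetical relation pattern ("X works for Y") stands in for real NER and relation classification. The names `WORKS_FOR`, `extract_relations`, `load_into_database`, `mine_org_sizes` and `precision_recall` are illustrative, not from the paper.

```python
import re
import sqlite3
import time

# Hypothetical heuristic: a "named entity" is a run of capitalized words.
# A real IE system would use a trained named-entity recognizer instead.
ENTITY = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
# Hypothetical single relation pattern: "<Person> works for <Organization>".
WORKS_FOR = re.compile(rf"({ENTITY}) works for ({ENTITY})")


def extract_relations(documents):
    """IE step: turn unstructured sentences into (person, organization) tuples."""
    rows = []
    for doc in documents:
        rows.extend(WORKS_FOR.findall(doc))
    return rows


def load_into_database(rows):
    """Store the extracted tuples in an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE works_for (person TEXT, organization TEXT)")
    conn.executemany("INSERT INTO works_for VALUES (?, ?)", rows)
    return conn


def mine_org_sizes(conn):
    """Conventional mining step: an aggregate query over the extracted table."""
    return conn.execute(
        "SELECT organization, COUNT(*) FROM works_for "
        "GROUP BY organization ORDER BY COUNT(*) DESC, organization"
    ).fetchall()


def precision_recall(predicted, gold):
    """Extraction accuracy against a hand-labelled gold set of tuples."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    docs = [
        "Alice Smith works for Acme Corp.",
        "Bob works for Acme Corp and Carol works for Initech.",
        "Dave is employed by Globex.",  # phrasing the single pattern misses
    ]
    gold = [("Alice Smith", "Acme Corp"), ("Bob", "Acme Corp"),
            ("Carol", "Initech"), ("Dave", "Globex")]

    start = time.perf_counter()
    rows = extract_relations(docs)
    elapsed = time.perf_counter() - start  # extraction time in seconds

    conn = load_into_database(rows)
    print(mine_org_sizes(conn))          # [('Acme Corp', 2), ('Initech', 1)]
    print(precision_recall(rows, gold))  # (1.0, 0.75)
    print(f"extraction took {elapsed:.6f} s")
```

Replacing the regular expression with a statistical recognizer would leave the database and mining steps unchanged. That separation is what lets traditional data mining tools be reused on the extracted data.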

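The abstract states that the system relies on a Fuzzy Logic engine over knowledge sets arranged hierarchically in a tree, but it does not give the engine's rules. The following sketch is one hypothetical reading, not the authors' engine. Each node's membership degree is the maximum (fuzzy OR) of token/keyword similarities. Here `difflib.SequenceMatcher.ratio()` is swapped in as the similarity measure. Degrees along a root-to-leaf path are combined with minimum (fuzzy AND). `KnowledgeNode`, `membership`, `best_answer` and the example tree are illustrative names only.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class KnowledgeNode:
    """One knowledge set in a hypothetical hierarchy (tree) of knowledge sets."""
    name: str
    keywords: list[str]
    answer: str | None = None
    children: list[KnowledgeNode] = field(default_factory=list)


def membership(tokens, keywords):
    """Degree in [0, 1] to which the query belongs to a node.

    Each token/keyword similarity is treated as a fuzzy truth value, and the
    values are combined with max, the standard fuzzy OR.
    """
    return max(
        (SequenceMatcher(None, t, k).ratio() for t in tokens for k in keywords),
        default=0.0,
    )


def best_answer(node, tokens, inherited=1.0):
    """Return (degree, answer) of the best-matching leaf.

    Along a root-to-leaf path, degrees are combined with min (fuzzy AND), so a
    query matches a leaf no more strongly than it matches every ancestor.
    A node without keywords (such as the root) passes its inherited degree on.
    """
    if node.keywords:
        degree = min(inherited, membership(tokens, node.keywords))
    else:
        degree = inherited
    if not node.children:
        return degree, node.answer
    return max(
        (best_answer(child, tokens, degree) for child in node.children),
        key=lambda pair: pair[0],
    )


# Hypothetical two-level hierarchy of knowledge sets.
root = KnowledgeNode("root", [], children=[
    KnowledgeNode("sports", ["football", "cricket", "match"], children=[
        KnowledgeNode("cricket", ["cricket", "wicket", "bat"], "Cricket knowledge set"),
        KnowledgeNode("football", ["football", "goal", "striker"], "Football knowledge set"),
    ]),
    KnowledgeNode("finance", ["stock", "market", "bank"], children=[
        KnowledgeNode("banking", ["bank", "loan", "interest"], "Banking knowledge set"),
    ]),
])

if __name__ == "__main__":
    query = "who took the most wickets in crickett"  # note the misspelling
    degree, answer = best_answer(root, query.lower().split())
    print(answer, round(degree, 3))  # Cricket knowledge set 0.933
```

Because degrees are graded rather than all-or-nothing, the misspelled "crickett" still routes the query to the cricket knowledge set. Using min along the path means a strong hit at a leaf cannot compensate for a weak match at its parent. Other t-norms, such as the product, would be equally valid choices.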
How to Cite

Nagarajan, S., & Perumal, K. (2017). Structured and Unstructured Information Extraction Using Text Mining and Natural Language Processing Techniques. International Journal on Recent and Innovation Trends in Computing and Communication, 5(11), 32–43. https://doi.org/10.17762/ijritcc.v5i11.1271