Web Content Extraction Techniques: A survey

Kinnari Ajmera, Khushali Deulkar

Abstract

As technology advances and research output across fields grows rapidly, the amount of information published on the World Wide Web grows with it. Alongside this useful information, a large amount of irrelevant content, termed ‘noise’, is also published in the form of advertisements, links, scrollers, and so on. Systems are therefore being developed to pre-process and clean web data for real-time applications. These systems also help other analysis systems, such as social network mining, web mining, and data mining, to analyze data in real time, and they support specialized tasks such as false advertisement detection, demand forecasting, and comment extraction from product and service reviews. For the web content extraction task, researchers have proposed many different methods, including wrapper-based, DOM tree rule-based, and machine learning-based methods. This paper presents a comparative study of four recently proposed methods for web content extraction. These methods take the traditional DOM tree rule-based method as their base and combine it with other tools to achieve better results.

Article Details

How to Cite
Ajmera, K., & Deulkar, K. (2015). Web Content Extraction Techniques: A survey. International Journal on Recent and Innovation Trends in Computing and Communication, 3(11), 6163–6165. https://doi.org/10.17762/ijritcc.v3i11.5011