Web Content Extraction Techniques: A survey

Kinnari Ajmera, Khushali Deulkar

doi:10.17762/ijritcc.v3i11.5011

Published: Nov 30, 2015

Abstract

As technology advances every day and the amount of research done in various fields rises exponentially, the amount of information published on the World Wide Web rises in a similar fashion. Alongside this useful information, a large amount of irrelevant content, termed ‘noise’, is also published in the form of advertisements, links, scrollers, and so on. Systems are therefore being developed to pre-process and clean web data for real-time applications. These systems also help other analysis systems, such as social network mining, web mining, and data mining, to analyze data in real time, and they support specialized tasks such as false advertisement detection, demand forecasting, and comment extraction from product and service reviews. For the web content extraction task, researchers have proposed many different methods, including wrapper-based, DOM tree rule-based, and machine learning-based methods. This paper presents a comparative study of four recently proposed methods for web content extraction. These methods take the traditional DOM tree rule-based method as their base and combine it with other tools to achieve better results.
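
To make the idea of DOM tree rule-based noise removal concrete, the following is a minimal sketch and not a reproduction of any of the four surveyed methods. It assumes reasonably well-formed markup, and the rule sets `NOISE_TAGS` and `NOISE_CLASSES`, the class `MainTextExtractor`, and the function `extract_main_text` are hypothetical names chosen for illustration. It walks the tag structure with Python's standard `html.parser` and discards text inside any subtree that matches a noise rule.

```python
from html.parser import HTMLParser

# Hypothetical rules: subtrees rooted at these tags or classes count as noise.
NOISE_TAGS = {"script", "style", "nav", "aside", "footer", "header", "form"}
NOISE_CLASSES = {"advertisement", "sidebar"}
# Void elements have no closing tag, so they are excluded from depth tracking.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class MainTextExtractor(HTMLParser):
    """Collects text that lies outside every noise subtree."""

    def __init__(self):
        super().__init__()
        self.noise_depth = 0  # > 0 while the parser is inside a noise subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.noise_depth or tag in NOISE_TAGS or classes & NOISE_CLASSES:
            self.noise_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.noise_depth:
            self.noise_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.noise_depth:
            self.chunks.append(text)


def extract_main_text(html):
    """Return the visible text of `html` with rule-matched noise removed."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Fixed rules like these are simple but brittle, since they depend on how a given site names and nests its elements. That limitation is one reason the surveyed methods pair the DOM tree rule-based approach with other tools.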

How to Cite

Ajmera, K., & Deulkar, K. (2015). Web Content Extraction Techniques: A survey. International Journal on Recent and Innovation Trends in Computing and Communication, 3(11), 6163–6165. https://doi.org/10.17762/ijritcc.v3i11.5011

Issue

Vol. 3 No. 11 (2015): November (2015) Issue

Section

Articles
