Lancaster Stem Sammon Projective Feature Selection based Stochastic eXtreme Gradient Boost Clustering for Web Page Ranking

P. Sujai
V. Sangeetha

Abstract

Web content mining retrieves information from the web in more structured forms, and page ranking plays an essential part in this process. Whenever a user searches for information on the web, page ranking places the most relevant results at the top of the list. Many existing page ranking algorithms fail to rank web pages accurately within minimal time. To address these issues, the Lancaster Stem Sammon Projective Feature Selection based Stochastic eXtreme Gradient Boost Clustering (LSSPFS-SXGBC) Approach is introduced for page ranking based on user queries. The LSSPFS-SXGBC Approach performs web page ranking in three processes: preprocessing, feature selection and clustering. It takes a number of user queries as input. Lancaster Stemming Preprocessed Analysis removes noisy data from the input query: it reduces words to their stems and removes stop words and incomplete data, minimizing time and space consumption. The Sammon Projective Feature Selection Process then selects the relevant features (i.e., keywords) based on user needs for efficient page ranking. Sammon Projection maps the high-dimensional space to a lower-dimensional space while preserving the inter-point distance structure. After feature selection, the Stochastic eXtreme Gradient Boost Page Rank Clustering process clusters web pages with similar keywords based on their rank. The Gradient Boost Page Rank Cluster is an ensemble of several weak clusters (i.e., X-means clusters). An X-means cluster partitions the web pages into 'x' clusters, where each observation is assigned to the cluster with the nearest mean. For every weak cluster, the selected features are used as the training samples. Subsequently, all weak clusters are combined to form a strong cluster that yields the web page ranking results. In this way, efficient page ranking is achieved with higher accuracy and minimal time consumption. The LSSPFS-SXGBC Approach is validated experimentally on ranking accuracy, false positive rate, ranking time and space complexity with respect to the number of user queries.
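
As an illustration of the distance-preserving property the abstract attributes to Sammon Projection, the sketch below computes the standard Sammon stress, which measures how far the pairwise distances in a low-dimensional mapping deviate from those in the original space (0 means they are preserved exactly). It is a minimal sketch and not the authors' implementation: the function names, the toy 3-D "page" vectors and the two candidate 2-D mappings are all assumptions made for the example.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every pair (i < j), in a fixed order."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]


def sammon_stress(high_dim, low_dim):
    """Standard Sammon stress between two configurations of the same points.

    Each squared distance error is weighted by 1 / (original distance),
    so small original distances count more; the sum is normalized by the
    total of the original distances.
    """
    d_high = pairwise_distances(high_dim)
    d_low = pairwise_distances(low_dim)
    if len(d_high) != len(d_low):
        raise ValueError("both configurations need the same number of points")
    if any(d == 0 for d in d_high):
        raise ValueError("original points must be pairwise distinct")
    scale = sum(d_high)
    return sum((dh - dl) ** 2 / dh for dh, dl in zip(d_high, d_low)) / scale


# Hypothetical keyword-feature vectors for three pages (illustrative only).
pages = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 5.0)]

# Two candidate 2-D mappings: one drops the third coordinate,
# the other drops the first coordinate.
drop_third = [(0.0, 0.0), (3.0, 4.0), (0.0, 0.0)]
drop_first = [(0.0, 0.0), (4.0, 0.0), (0.0, 5.0)]

stress_third = sammon_stress(pages, drop_third)
stress_first = sammon_stress(pages, drop_first)
print(f"stress (drop third coordinate): {stress_third:.4f}")
print(f"stress (drop first coordinate): {stress_first:.4f}")
```

In this toy case, dropping the third coordinate collapses two distinct pages onto the same point, so its stress is higher than that of the mapping that drops the first coordinate. A Sammon projection searches for the low-dimensional configuration that minimizes this quantity.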

Article Details

How to Cite
Sujai, P., & Sangeetha, V. (2023). Lancaster Stem Sammon Projective Feature Selection based Stochastic eXtreme Gradient Boost Clustering for Web Page Ranking. International Journal on Recent and Innovation Trends in Computing and Communication, 11(4s), 268–279. https://doi.org/10.17762/ijritcc.v11i4s.6537
