Enhancing Feature Extraction through G-PLSGLR by Decreasing Dimensionality of Textual Data

Narender Chinthamu; Chandrasekar Venkatachalam; Muthuvairavan Pillai.N; Setti Vidya Sagar Appaji; M. Murali

Published: May 5, 2023

DOI: https://doi.org/10.17762/ijritcc.v11i4s.6540

Keywords:

Big Data, dimensionality reduction, vector space model, Bayesian information criterion

Narender Chinthamu

Enterprise Architect, MIT CTO Candidate

Chandrasekar Venkatachalam

Professor, Department of CSE, Faculty of Engineering and Technology, Jain (Deemed-to-be) University, Bangalore, Karnataka

Muthuvairavan Pillai.N

Assistant Professor, Department of Computer Science and Business Systems, R.M.D Engineering College, Kavarapettai

Setti Vidya Sagar Appaji

Associate Professor, Department of Computer Science and Engineering, Baba Institute of Technology and Sciences, Visakhapatnam, Andhra Pradesh

M. Murali

Assistant Professor, Department of IT, Sona College of Technology

Abstract

Big data technology has been widely adopted across industries owing to its characteristic high value, large volume, rapid velocity, wide variety, and significant variability. Nevertheless, big data presents several difficulties, including long processing times, high computational complexity, imprecise features, significant sparsity, irrelevant terms, redundancy, and noise, all of which can degrade feature extraction performance. This research addresses these issues by using the Partial Least Squares Generalized Linear Regression (G-PLSGLR) approach to reduce the high dimensionality of text data. The proposed algorithm has four stages. First, feature data are represented in a vector space model (VSM) and trained using the bootstrap technique. Second, the trained feature samples are grouped using the Pearson correlation coefficient and a graph-based technique. Third, unimportant features are eliminated by ranking the significant group features using PLSGLR. Finally, significant features are selected or extracted using the Bayesian information criterion (BIC). The G-PLSGLR algorithm outperforms existing methods, achieving a high reduction rate and classification performance while minimizing feature redundancy, time consumption, and complexity. It also improves feature accuracy by 35%.
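The four stages named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the 0.8 correlation threshold, and the synthetic inputs are assumptions made for illustration. The PLSGLR ranking step is replaced by a much simpler stand-in (absolute correlation of each group's mean column with the target), and the BIC is computed for an ordinary least-squares fit with Gaussian errors.

```python
import numpy as np

def bootstrap_sample(X, y, rng):
    """Stage 1 (sketch): resample documents (rows) with replacement."""
    idx = rng.integers(0, X.shape[0], size=X.shape[0])
    return X[idx], y[idx]

def correlation_groups(X, threshold=0.8):
    """Stage 2 (sketch): link features whose |Pearson r| >= threshold and
    return the connected components of that graph as feature groups.
    Assumes no column is constant (a constant column yields NaN correlations)."""
    corr = np.corrcoef(X, rowvar=False)
    adj = np.abs(corr) >= threshold
    seen, groups = set(), []
    for start in range(corr.shape[0]):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            component.append(i)
            for j in map(int, np.flatnonzero(adj[i])):
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(component))
    return groups

def bic_linear(X, y):
    """BIC of an OLS fit with intercept: n*ln(RSS/n) + k*ln(n)."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return n * np.log(max(rss, 1e-12) / n) + A.shape[1] * np.log(n)

def select_features(X, y, threshold=0.8, seed=0):
    """Run the four sketched stages and return the selected feature groups."""
    rng = np.random.default_rng(seed)
    Xb, yb = bootstrap_sample(X, y, rng)
    groups = correlation_groups(Xb, threshold)
    # One representative column per group: the mean of its member features.
    reps = np.column_stack([Xb[:, g].mean(axis=1) for g in groups])
    # Stage 3 stand-in (NOT PLSGLR): rank groups by |corr(representative, y)|.
    scores = [abs(np.corrcoef(reps[:, k], yb)[0, 1]) for k in range(len(groups))]
    order = np.argsort(scores)[::-1]
    # Stage 4 (sketch): keep the top-k ranked groups that minimise BIC.
    bics = [bic_linear(reps[:, order[:k]], yb) for k in range(1, len(groups) + 1)]
    best_k = int(np.argmin(bics)) + 1
    return [groups[i] for i in order[:best_k]]
```

Here `X` would be a document-term matrix (rows are documents, columns are terms) and `y` a numeric target such as an encoded class label; both are assumed inputs. Grouping by connected components means that a chain of pairwise-correlated terms ends up in a single group, which is one possible reading of "graph-based technique" and may differ from the paper's construction.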

How to Cite

Chinthamu, N., Venkatachalam, C., Pillai.N, M., Appaji, S. V. S., & Murali, M. (2023). Enhancing Feature Extraction through G-PLSGLR by Decreasing Dimensionality of Textual Data. International Journal on Recent and Innovation Trends in Computing and Communication, 11(4s), 288–295. https://doi.org/10.17762/ijritcc.v11i4s.6540

Issue

Vol. 11 No. 4s (2023)

Section

Articles

Contact Us:

Auricle Global Society of Education and Research

Y-18-A, Near Sanskar Play School, Sudarshana Nagar,

Bikaner, Rajasthan (India). Pin 334003

Email: editor@ijritcc.org