Enhancing Feature Extraction through G-PLSGLR by Decreasing Dimensionality of Textual Data


Narender Chinthamu
Chandrasekar Venkatachalam
Muthuvairavan Pillai.N
Setti Vidya Sagar Appaji
M. Murali

Abstract

Big data technology has become popular across many industries owing to its defining characteristics: high value, large volume, rapid velocity, wide variety, and significant variability. Nevertheless, big data poses several difficulties, including long processing times, high computational complexity, imprecise features, significant sparsity, irrelevant terms, redundancy, and noise, all of which can degrade feature-extraction performance. This research addresses these issues by using the Partial Least Squares Generalized Linear Regression (G-PLSGLR) approach to reduce the high dimensionality of text data. The proposed algorithm has four stages. First, feature data are gathered in a vector space model (VSM) and trained with a bootstrap technique. Second, the trained feature samples are grouped using the Pearson correlation coefficient and a graph-based technique. Third, unimportant features are eliminated by ranking the significant group features using PLSGLR. Finally, significant features are selected or extracted using the Bayesian information criterion (BIC). The G-PLSGLR algorithm outperforms current methods, achieving a high reduction rate and classification performance while minimizing feature redundancy, time consumption, and complexity. Furthermore, it improves feature accuracy by 35%.
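
The second stage, grouping features by Pearson correlation with a graph-based technique, can be illustrated with a minimal sketch. The abstract does not specify how the graph is built, so the code below makes its own assumptions: features are nodes, an edge joins two features whose absolute Pearson correlation meets a cutoff, and each connected component forms one group. The function name `group_correlated_features` and the `threshold` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def group_correlated_features(X, threshold=0.8):
    """Group the columns of X by strong pairwise Pearson correlation.

    Assumption (not from the paper): two features are linked when
    |r| >= threshold, and groups are the connected components of
    the resulting graph.
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]

    # Pearson correlation between columns; atleast_2d covers the
    # single-feature case, where corrcoef returns a scalar.
    corr = np.atleast_2d(np.corrcoef(X, rowvar=False))
    corr = np.nan_to_num(corr)  # constant columns yield NaN; treat as 0
    adjacent = np.abs(corr) >= threshold

    groups, seen = [], set()
    for start in range(n_features):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in np.flatnonzero(adjacent[node]):
                neighbor = int(neighbor)
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(sorted(component))
    return groups
```

Here `X` would be a documents-by-terms matrix such as a TF-IDF representation in the vector space model. Each returned group is a list of column indices, which a later ranking step could reduce to a few representative features. A higher `threshold` yields more, smaller groups.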

Article Details

How to Cite
Chinthamu, N., Venkatachalam, C., Pillai.N, M., Appaji, S. V. S., & Murali, M. (2023). Enhancing Feature Extraction through G-PLSGLR by Decreasing Dimensionality of Textual Data. International Journal on Recent and Innovation Trends in Computing and Communication, 11(4s), 288–295. https://doi.org/10.17762/ijritcc.v11i4s.6540
