Churn Identification and Prediction from a Large-Scale Telecommunication Dataset Using NLP


Vijay R. Sonawane
Abhinav S. Thorat
Jaya R. Suryawanshi
Ravindra G. Dabhade
Megharani Patil
Bhausaheb Musmade


Identifying customer churn is a major challenge for large telecom businesses. Managing existing customers and acquiring new ones generates a substantial volume of data every day. It is therefore crucial to identify the causes of customer churn so that appropriate steps can be taken to reduce it. Numerous researchers have combined static and dynamic approaches to reduce churn in large data sets, but these systems still have many shortcomings when it comes to actually identifying churn. In this paper, we propose two methods: the first is churn identification, and the second is churn prediction on a large telecommunication data set using Natural Language Processing (NLP) methods and machine learning techniques. The NLP process involves data pre-processing, normalization, feature extraction, and feature selection. For feature extraction, we employ TF-IDF, Stanford NLP, and occurrence correlation methods. Throughout the process, a machine learning classification algorithm is used for training and testing. Finally, the system uses a variety of cross-validation techniques to train and evaluate the machine learning algorithms. The experimental analysis demonstrates the system's efficacy and accuracy.
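As a rough illustration of two steps named in the abstract, TF-IDF feature extraction and cross-validation splitting, the following minimal sketch uses only the Python standard library. It is not the authors' implementation: the abstract gives no formulas, so the TF-IDF variant (raw term frequency divided by document length, unsmoothed natural-log IDF), the helper names `tokenize`, `tf_idf`, and `k_fold_indices`, and the sample messages are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (a simple normalization step)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    Smoothed or log-scaled variants are equally common.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# Hypothetical customer messages, invented purely for illustration.
notes = [
    "Network drops every evening, I want to cancel my plan",
    "Happy with the new data plan",
    "Billing error again, thinking about switching provider",
]
features = tf_idf(notes)
folds = list(k_fold_indices(len(notes), 3))
```

In a full pipeline, each fold's training indices would be used to fit a classifier on the TF-IDF vectors and the held-out indices to score it. The abstract does not say which classifier or which cross-validation schemes were used, so that part is left out here.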

Article Details

How to Cite
Sonawane, V. R., Thorat, A. S., Suryawanshi, J. R., Dabhade, R. G., Patil, M., & Musmade, B. (2023). Churn Identification and Prediction from a Large-Scale Telecommunication Dataset Using NLP. International Journal on Recent and Innovation Trends in Computing and Communication, 11(7), 39–46.

