Survey on Insurance Claim analysis using Natural Language Processing and Machine Learning


Sapana Kolambe, Parminder Kaur


In today's insurance industry, data is a major asset and plays a key role. Insurance carriers now have a wealth of information available to them. We can identify three major eras in the insurance industry's more than 700-year history: the manual era from the 15th century to 1960, the systems era from 1960 to 2000, and the current digital era, i.e., 2001-20X0. Across all three eras, the core insurance sector has relied on data analytics and adopted new technologies to improve existing practices and preserve capital; this has been the top corporate objective throughout. In recent years, AI techniques have been increasingly used for a variety of insurance activities. In this study, we give a comprehensive assessment of the existing research that incorporates artificial intelligence (AI) methods into the essential insurance tasks. Although a number of reviews have already been published on the use of AI for specific insurance tasks, our work provides a more comprehensive review of this research. We study learning algorithms, big data, blockchain, data mining, and conversational theory, and their applications in insurance policy, claim prediction, risk estimation, and other areas, in order to integrate existing work in the insurance sector that uses AI approaches.


How to Cite
Sapana Kolambe, et al. (2023). Survey on Insurance Claim analysis using Natural Language Processing and Machine Learning. International Journal on Recent and Innovation Trends in Computing and Communication, 11(10), 30–38.
Author Biography

Sapana Kolambe (1), Dr. Parminder Kaur (2)

(1) Research Scholar, MGM University, Chhatrapati Sambhajinagar, India

(2) Associate Professor, MGM University, Chhatrapati Sambhajinagar, India
