Imputation Techniques in Machine Learning – A Survey

Angeline Christobel, R. Jaya Suji, J. Jeya A Celin

Abstract

Machine learning plays a pivotal role in data analysis and information extraction, but a common challenge in this process is missing values. Missing data can enter a dataset for many reasons, including errors during data collection and management, intentional omissions, and human error. Most machine learning models are not designed to handle missing values directly, so the data must usually be imputed before it is fed into a model. Several imputation techniques are available, and the choice among them should be made judiciously, taking various parameters into account. An inappropriate choice can distort the overall distribution of the data values and, in turn, degrade the model's performance. This paper examines several imputation methods: Mean, Median, K-nearest neighbors (KNN)-based imputation, Linear Regression, MissForest, and Multivariate Imputation by Chained Equations (MICE).
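
To make the point about distribution distortion concrete, the following is a minimal sketch of three of the methods named in the abstract (Mean, Median, and KNN-based imputation), written in plain Python. It is not the authors' implementation: the toy dataset, the function names `impute_column_stat` and `impute_knn`, the use of `None` to mark a missing value, and the distance normalisation are all assumptions made for illustration.

```python
import math
from statistics import mean, median


def impute_column_stat(rows, stat):
    """Fill each missing value (None) with `stat` of the observed values in its column."""
    n_cols = len(rows[0])
    fills = [stat([r[j] for r in rows if r[j] is not None]) for j in range(n_cols)]
    return [[fills[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def _distance(a, b):
    """Root-mean-square difference over the features observed in BOTH rows.

    Returns infinity when the two rows share no observed feature.
    (Normalising by the number of shared features is an illustrative choice.)
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def impute_knn(rows, k=2):
    """Fill each missing value with the mean of that column over the k nearest rows."""
    result = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that have column j observed
            # and are comparable to this row on at least one feature.
            donors = [
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:  # leave the gap in place if no comparable donor exists
                result[i][j] = mean(nearest)
    return result


# Hypothetical toy data: two features, two missing cells, one extreme row.
rows = [
    [1.0, 10.0],
    [2.0, None],
    [3.0, 30.0],
    [None, 40.0],
    [100.0, 1000.0],
]

mean_filled = impute_column_stat(rows, mean)
median_filled = impute_column_stat(rows, median)
knn_filled = impute_knn(rows, k=2)

print("mean  :", mean_filled[1][1], mean_filled[3][0])      # 270.0 26.5
print("median:", median_filled[1][1], median_filled[3][0])  # 35.0 2.5
print("knn   :", knn_filled[1][1], knn_filled[3][0])        # 20.0 2.0
```

On this toy data the single extreme row pulls the mean-imputed values far from their neighbours, the median is less affected, and the KNN estimate stays closest to the local pattern. This illustrates only the general concern raised above, not a ranking of the methods.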

Article Details

How to Cite
Angeline Christobel, et al. (2023). Imputation Techniques in Machine Learning – A Survey. International Journal on Recent and Innovation Trends in Computing and Communication, 11(10), 1217–1221. https://doi.org/10.17762/ijritcc.v11i10.8662
Author Biography

Angeline Christobel¹, R. Jaya Suji², J. Jeya A Celin³

¹ Dean, School of Computational Studies
Hindustan College of Arts & Science
Chennai-603103
angelinechristobel5@gmail.com

² Assistant Professor, Department of Computer Science
Hindustan College of Arts & Science
Chennai-603103
jayasuji1981@gmail.com

³ Professor, Department of Information Technology
Kalasalingam Academy of Research and Education
Krishnankoil-626126
jjeyacelin@gmail.com