A Unique Pipeline Model to Improve Anomaly Detection in High Dimensional Data

Upasana Gupta, Vaishali Singh

Abstract

This paper presents a comprehensive method for dimension reduction and anomaly detection in high-dimensional data (specifically healthcare datasets) using R. Recognizing that traditional linear methods such as Principal Component Analysis (PCA) often fail to capture the non-linear manifold structure of the data, our approach exploits iterative learning, on the assumption that high-dimensional data lie largely on a low-dimensional manifold. The methodology starts by preparing the data with R libraries such as Keras, dplyr, and ggplot2, handling missing values and visualizing meaningful information. Using the Mahalanobis distance, the paper identifies and removes country-specific outliers. The pipelined model integrates PCA for data transformation and combines an Autoencoder with t-SNE for dimensionality reduction. The refined dataset is then used to train a Multi-Layer Perceptron (MLP) artificial neural network, which supports anomaly detection based on reconstruction errors, illustrated with a point cloud. Additionally, the paper explores metric multidimensional scaling using artificial neural networks, evaluates the approach on large datasets such as healthcare and wine, and compares the results with those of conventional techniques. This study highlights the effectiveness of integrating pre-processing, visualization, and artificial neural network strategies in R for anomaly detection.
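
As a reading aid for the outlier-removal and anomaly-scoring steps named in the abstract, the standard textbook definitions can be written as below. This is only a sketch under stated assumptions, not the authors' exact formulation: the symbols $\mu$, $\Sigma$, $\hat{x}$, $p$, $\alpha$, and $\tau$ are notation introduced here, and the abstract does not say which thresholds or error norm the paper uses.

```latex
% Mahalanobis distance of an observation x in R^p from a group with
% mean vector \mu and covariance matrix \Sigma (assumed here to be
% estimated separately for each country, since the outliers are
% described as country-specific).
D_M(x) = \sqrt{(x - \mu)^{\top} \, \Sigma^{-1} \, (x - \mu)}

% Assumed outlier rule (a common convention, not stated in the abstract):
% flag x when its squared distance exceeds a chi-square quantile
% with p degrees of freedom at level 1 - \alpha.
D_M(x)^2 > \chi^2_{p,\, 1-\alpha}

% Reconstruction error of x given its network reconstruction \hat{x},
% with an assumed user-chosen threshold \tau for flagging anomalies.
e(x) = \lVert x - \hat{x} \rVert_2^2,
\qquad \text{flag } x \text{ as anomalous if } e(x) > \tau
```

The chi-square cutoff is reasonable only when the data are approximately multivariate normal, and $\tau$ is often picked from an upper quantile of the reconstruction errors on the training data. Both choices are illustrative assumptions here.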

Article Details

How to Cite
Upasana Gupta, et al. (2023). A Unique Pipeline Model to Improve Anomaly Detection in High Dimensional Data. International Journal on Recent and Innovation Trends in Computing and Communication, 11(10), 1978–1986. https://doi.org/10.17762/ijritcc.v11i10.8810
Section
Articles
Author Biography

Upasana Gupta¹, Vaishali Singh²

¹ Research Scholar, Department of Computer Science & Engineering, Maharishi University of Information Technology, Lucknow (U.P.)
upasana_gupta31@yahoo.com

² Assistant Professor, Department of Computer Science & Engineering, Maharishi University of Information Technology, Lucknow (U.P.)
singh.vaishali05@gmail.com