A Unique Pipeline Model to Improve Anomaly Detection in High Dimensional Data

Upasana Gupta et al.

doi:10.17762/ijritcc.v11i10.8810

Published: Nov 7, 2023

Keywords:

High-Dimensional Data, Data Pre-processing and Visualization, Dimensionality Reduction, Reconstruction Error, Anomaly Detection, Healthcare, Multi-Layer Perceptron, Autoencoder, R Programming Language

Upasana Gupta, Vaishali Singh

Abstract

This paper presents a comprehensive method for dimensionality reduction and anomaly detection in high-dimensional data, evaluated on healthcare datasets, using R. Recognizing that traditional linear methods such as Principal Component Analysis (PCA) often miss the non-linear manifold structure of the data, our approach exploits iterative learning, on the assumption that high-dimensional data lie largely on a low-dimensional manifold. The methodology starts by preparing the data with R libraries such as Keras, dplyr, and ggplot2, handling missing values and visualizing meaningful information. Using the Mahalanobis distance, the paper identifies and removes country-specific outliers. The pipelined model integrates PCA for data transformation and combines an Autoencoder with t-SNE for dimensionality reduction. The refined dataset is then used to train a Multi-Layer Perceptron (MLP) artificial neural network, which supports anomaly detection based on reconstruction errors, illustrated with a point cloud. Additionally, the paper explores metric multidimensional scaling using artificial neural networks, tests the approach on large datasets such as healthcare and wine data, and compares the results with those of conventional techniques. The study highlights the effectiveness of integrating pre-processing, visualization, and artificial neural network strategies in R for anomaly detection.
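The Mahalanobis-distance outlier step mentioned in the abstract can be illustrated with a minimal sketch. The paper works in R; Python is used here only for illustration. The function names, the restriction to two features, the sample data, and the cut-off of 2.0 are all assumptions made for this example and are not taken from the paper.

```python
import math


def mahalanobis_distances(points):
    """Mahalanobis distance of each 2-D point from the sample mean.

    Hypothetical helper: limited to two features so that the covariance
    matrix can be inverted in closed form without external libraries.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the sample covariance matrix (denominator n - 1).
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # d^2 = v^T S^{-1} v, with S^{-1} = [[syy, -sxy], [-sxy, sxx]] / det
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances


def remove_outliers(points, threshold=2.0):
    """Keep points whose distance is at most `threshold` (assumed cut-off)."""
    distances = mahalanobis_distances(points)
    return [p for p, d in zip(points, distances) if d <= threshold]


# Illustrative data only: nine points near the line y = x and one far point.
sample = [(1, 1), (2, 2.1), (3, 2.9), (4, 4.2), (5, 4.8),
          (6, 6.1), (7, 7.0), (8, 7.9), (9, 9.1), (5, 20)]
cleaned = remove_outliers(sample, threshold=2.0)
print(len(sample), "->", len(cleaned))
```

In practice the cut-off is often chosen from a chi-square quantile of the squared distance rather than fixed by hand. Which rule the authors apply is not stated in the abstract.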

How to Cite

Upasana Gupta, et al. (2023). A Unique Pipeline Model to Improve Anomaly Detection in High Dimensional Data. International Journal on Recent and Innovation Trends in Computing and Communication, 11(10), 1978–1986. https://doi.org/10.17762/ijritcc.v11i10.8810

Issue

Vol. 11 No. 10 (2023)

Section

Articles

Author Biography

Upasana Gupta¹, Vaishali Singh²

¹Research Scholar, Department of Computer Science & Engineering, Maharishi University of Information Technology, Lucknow (U.P.), upasana_gupta31@yahoo.com

²Assistant Professor, Department of Computer Science & Engineering, Maharishi University of Information Technology, Lucknow (U.P.), singh.vaishali05@gmail.com

Contact Us:

Auricle Global Society of Education and Research

Y-18-A, Near Sanskar Play School, Sudarshana Nagar,

Bikaner, Rajasthan (India). Pin 334003

Email: editor@ijritcc.org
