Probabilistic XGBoost Threshold Classification with Autoencoder for Credit Card Fraud Detection

D. Padma Prabha
C. Victoria Priscilla

Abstract

Because legitimate transactions vastly outnumber fraudulent ones, credit card transaction data are highly imbalanced, which makes finding an effective fraud detection solution challenging. In this study, we design an autoencoder combined with probabilistic threshold shifting of XGBoost (AE-XGB) for credit card fraud detection. First, AE-XGB employs an autoencoder, a prevalent dimensionality reduction technique, to extract features from the latent space representation. These lower-dimensional features are then passed to eXtreme Gradient Boosting (XGBoost), an ensemble boosting algorithm, which applies a probabilistic threshold to classify each transaction as fraudulent or legitimate. AE-XGB is compared with existing ensemble algorithms, namely Adaptive Boosting (AdaBoost), Gradient Boosting Machine (GBM), Random Forest, Categorical Boosting (CatBoost), LightGBM and XGBoost, each evaluated at both the default and the optimal threshold. To validate the methodology, we use the IEEE-CIS Fraud Detection dataset. Because the class imbalance and high dimensionality of this dataset degrade model performance, the data are preprocessed before the models are trained. Model performance is evaluated with precision, recall, F1-score, G-mean and the Matthews Correlation Coefficient (MCC). The results show that the proposed AE-XGB model handles imbalanced data effectively, detecting fraudulent transactions among new incoming transactions with 90.4% recall and a 90.5% F1-score.
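The abstract describes classifying each transaction by comparing its predicted fraud probability against a shifted ("optimal") threshold instead of the default 0.5, and scoring models with precision, recall, F1-score, G-mean and MCC. The pure-Python sketch below illustrates that idea under stated assumptions: it assumes the optimal threshold is the candidate that maximizes F1 on held-out data (the abstract does not say which criterion the authors used), the helper names `confusion_counts`, `metrics` and `best_threshold` are hypothetical, and the probabilities are toy values standing in for the fraud-class column of a fitted classifier's `predict_proba` output (for example `xgboost.XGBClassifier`). The autoencoder stage is not shown.

```python
import math

# Hypothetical helpers illustrating decision-threshold shifting on predicted
# fraud probabilities. Label 1 = fraud, 0 = legitimate.


def confusion_counts(y_true, y_prob, threshold):
    """Return (TP, FP, TN, FN) when scores >= threshold are labelled fraud."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        pred = 1 if prob >= threshold else 0
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(y_true, y_prob, threshold):
    """Precision, recall, F1, G-mean and MCC at one decision threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # G-mean: geometric mean of the true positive and true negative rates.
    g_mean = math.sqrt(recall * specificity)
    # Matthews Correlation Coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "g_mean": g_mean, "mcc": mcc}


def best_threshold(y_true, y_prob, candidates=None):
    """Assumed criterion: return the first candidate threshold with maximal F1."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    return max(candidates, key=lambda t: metrics(y_true, y_prob, t)["f1"])


# Toy data (assumed): 16 legitimate and 4 fraudulent transactions whose fraud
# probabilities are mostly below the default 0.5 cut-off.
y_true = [0] * 16 + [1] * 4
y_prob = [0.02, 0.05, 0.03, 0.10, 0.08, 0.01, 0.04, 0.06,
          0.12, 0.07, 0.02, 0.09, 0.03, 0.05, 0.11, 0.35,
          0.45, 0.30, 0.60, 0.40]

t_opt = best_threshold(y_true, y_prob)
print(t_opt)                                      # 0.13
print(metrics(y_true, y_prob, 0.5)["recall"])     # 0.25
print(metrics(y_true, y_prob, t_opt)["recall"])   # 1.0
```

On this toy data the default 0.5 cut-off catches one of the four frauds, while the F1-optimal threshold catches all four at the cost of one false positive. In a real pipeline the threshold would be chosen on a validation split and then held fixed when scoring the test set, so that the reported metrics are not optimistically biased.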

How to Cite
Prabha, D. P., & Priscilla, C. V. (2023). Probabilistic XGBoost Threshold Classification with Autoencoder for Credit Card Fraud Detection. International Journal on Recent and Innovation Trends in Computing and Communication, 11(8s), 528–537. https://doi.org/10.17762/ijritcc.v11i8s.7234
