Harnessing Data-Driven Insights: Predictive Modeling for Diamond Price Forecasting using Regression and Classification Techniques

Md Shaik Amzad Basha
Peerzadah Mohammad Oveis

Abstract

In gemology, accurate diamond valuation matters to traders, customers, and researchers alike. This study predicts diamond prices both as exact monetary values and as broader price categories, with the aim of giving stakeholders precise estimates and categorisations that support informed decision-making. The methodology began with a rigorous data preprocessing phase to prepare the data for model training. A range of machine learning models was then employed, from traditional linear regression to ensemble methods such as Random Forest and Gradient Boosting. The dataset was also transformed to support classification into predefined price tiers, so that models such as Logistic Regression and Support Vector Machines could be evaluated in this setting. The conceptual model follows a systematic flow: data acquisition, preprocessing, regression and classification analyses, and finally a comparative study of the performance metrics. This structured approach, which underpins the originality and value of the research, offers a holistic view of diamond price prediction from both regression and classification perspectives. The Random Forest regressor performed best at predicting exact prices, with an R² value of approximately 0.975. For classification into price tiers, Logistic Regression and Support Vector Machines were the strongest models, both with an accuracy exceeding 95%. These results give stakeholders in the diamond industry evidence that machine learning can refine valuation processes.
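The two evaluation views described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the sample prices are made up, and the equal-frequency tier boundaries are an assumption, since the abstract does not state how the price tiers were defined. The sketch shows how R² is computed for exact price predictions, how continuous prices might be binned into tiers, and how accuracy is computed on the resulting labels.

```python
from statistics import mean, quantiles


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def price_tiers(prices, labels=("low", "medium", "high")):
    """Map each price to a tier using equal-frequency cut points.

    Assumption: the abstract does not give the tier boundaries, so
    quantile-based cut points are used here purely for illustration.
    """
    cuts = quantiles(prices, n=len(labels))  # len(labels) - 1 cut points

    def tier(price):
        for cut, label in zip(cuts, labels):
            if price <= cut:
                return label
        return labels[-1]

    return [tier(p) for p in prices]


def accuracy(actual_labels, predicted_labels):
    """Fraction of items whose predicted tier matches the actual tier."""
    matches = sum(a == p for a, p in zip(actual_labels, predicted_labels))
    return matches / len(actual_labels)


if __name__ == "__main__":
    # Made-up prices, not taken from the study's dataset.
    actual = [100, 200, 300, 400, 500, 600]
    predicted = [120, 190, 310, 380, 520, 590]

    print("R^2:", r_squared(actual, predicted))
    print("Tiers:", price_tiers(actual))
```

In practice the regression and classification models themselves would come from a machine learning library; the sketch covers only the metric and the price-to-tier transformation, which are the parts the abstract describes in enough detail to illustrate.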

How to Cite
Basha, M. S. A., & Oveis, P. M. (2023). Harnessing Data-Driven Insights: Predictive Modeling for Diamond Price Forecasting using Regression and Classification Techniques. International Journal on Recent and Innovation Trends in Computing and Communication, 11(9), 290–301. https://doi.org/10.17762/ijritcc.v11i9.8355
