Skewed Evolving Data Streams Classification with Actionable Knowledge Extraction using Data Approximation and Adaptive Classification Framework

Rahul A Patil
Pramod D Patil

Abstract

Skewed evolving data stream (SEDS) classification is a challenging research problem for online streaming data applications. The fundamental challenges in streaming data classification are class imbalance and concept drift. Although these two topics have recently received considerable attention, either independently or together, data redundancy during stream data mining and classification remains unexplored. Moreover, existing solutions for SEDS classification address concept drift and/or class imbalance using the sliding window mechanism, which leads to higher computational complexity and data redundancy. To this end, we propose a novel Adaptive Data Stream Classification (ADSC) framework that addresses concept drift, class imbalance, and data redundancy with higher computational and classification efficiency. ADSC has four major phases: data approximation, adaptive clustering, classification, and actionable knowledge extraction. In the data approximation phase, we employ the Flajolet-Martin (FM) algorithm, together with data pre-processing, to approximate the unique items in the data stream. The periodically approximated tuples are grouped into distinct classes by an adaptive clustering algorithm to address concept drift and class imbalance. In the classification phase, supervised classifiers assign unknown incoming data streams to one of the classes discovered by the adaptive clustering algorithm. We then extract actionable knowledge from the classified skewed evolving data stream information to support end-user decision-making. The ADSC framework is empirically assessed on two streaming datasets in terms of classification and computational efficiency. The experimental results show that the proposed ADSC framework is more efficient than existing classification methods.
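
The data approximation phase relies on Flajolet-Martin distinct counting, which can be illustrated with a minimal sketch. The code below is not the authors' implementation: the function names, the use of SHA-256 as a seeded hash family, the 32-bit hash width, and the choice of 64 hash functions are all assumptions made for illustration. It follows the textbook bitmap variant, where each item sets the bit indexed by the number of trailing zeros of its hash and the estimate is derived from the lowest unset bit.

```python
import hashlib

PHI = 0.77351  # standard Flajolet-Martin bias correction constant


def trailing_zeros(x: int, width: int = 32) -> int:
    """Number of trailing zero bits in x (width if x is 0)."""
    if x == 0:
        return width
    return (x & -x).bit_length() - 1


def seeded_hash(item: str, seed: int) -> int:
    """Hypothetical hash family: 32 bits taken from SHA-256 of 'seed:item'."""
    digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little")


def lowest_zero_bit(bitmap: int) -> int:
    """Index of the least significant bit of bitmap that is not set."""
    r = 0
    while bitmap & (1 << r):
        r += 1
    return r


def fm_estimate(stream, num_hashes: int = 64) -> float:
    """Approximate the number of distinct items in one pass over the stream.

    One bitmap is kept per hash function; the per-bitmap indices of the
    lowest unset bit are averaged before exponentiation to reduce variance.
    """
    bitmaps = [0] * num_hashes
    for item in stream:
        for seed in range(num_hashes):
            r = trailing_zeros(seeded_hash(str(item), seed))
            bitmaps[seed] |= 1 << r  # idempotent: duplicates change nothing
    mean_r = sum(lowest_zero_bit(b) for b in bitmaps) / num_hashes
    return (2 ** mean_r) / PHI


if __name__ == "__main__":
    # 1000 distinct values, each repeated five times (a redundant stream).
    stream = [i % 1000 for i in range(5000)]
    print(f"estimated distinct items: {fm_estimate(stream):.0f}")
```

Because the bitmaps are updated with a bitwise OR, repeated tuples leave the sketch unchanged, which is why this kind of estimator can summarize a redundant stream in constant memory per hash function. The accuracy depends on the number of hash functions, so the value 64 should be read as a placeholder.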

Article Details

How to Cite
Patil, R. A., & Patil, P. D. (2023). Skewed Evolving Data Streams Classification with Actionable Knowledge Extraction using Data Approximation and Adaptive Classification Framework. International Journal on Recent and Innovation Trends in Computing and Communication, 11(1), 38–52. https://doi.org/10.17762/ijritcc.v11i1.5985
