ClaraStream: A Novel Algorithm for Real-Time Data Stream Clustering

Neha Sharma, Shradhdha Masih

Abstract

A crucial step in extracting knowledge from a dataset is clustering: splitting its records into groups of related records. Detecting clusters in very large, multi-dimensional, static datasets has been the subject of extensive investigation. However, the classical clustering methods that resulted from this work are ineffective for clustering data streams. A data stream is a dynamic dataset: a potentially infinite sequence of data records that arrives at a very high rate and changes over time. Many present-day processes produce such streams; credit card transactions, click streams, and sensor networks are a few examples. The rapid proliferation of data in various fields necessitates algorithms capable of processing and analyzing data streams in real time. ClaraStream is a novel clustering algorithm designed to efficiently handle the unique challenges posed by data streams, including their high volume, velocity, and potentially unbounded nature. Unlike traditional clustering methods, which are suited to static datasets, ClaraStream takes a two-phase approach (online micro-clustering and offline macro-clustering) that enables real-time processing and trend analysis. This paper provides a comprehensive overview of the ClaraStream algorithm, its architecture, and its application to air quality data streams.
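
The two-phase idea named in the abstract can be illustrated with a minimal sketch. This is not the ClaraStream algorithm itself, whose details are not given on this page; it is a generic online/offline scheme written under stated assumptions. The class names `MicroCluster` and `TwoPhaseStreamClusterer`, the `radius` and `max_micro` parameters, the merge-closest rule, and the use of weighted k-means for the offline phase are all hypothetical choices made for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Compact summary of nearby points: a count and a per-dimension sum."""
    n: int
    linear_sum: list

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def absorb(self, point):
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]


class TwoPhaseStreamClusterer:
    """Illustrative only; assumes max_micro >= 2 and numeric points."""

    def __init__(self, radius, max_micro):
        self.radius = radius        # assumed absorption threshold
        self.max_micro = max_micro  # assumed memory budget
        self.micro = []

    def update(self, point):
        """Online phase: summarize one arriving record, then discard it."""
        if self.micro:
            nearest = min(self.micro,
                          key=lambda m: math.dist(m.centroid(), point))
            if math.dist(nearest.centroid(), point) <= self.radius:
                nearest.absorb(point)
                return
        if len(self.micro) >= self.max_micro:
            self._merge_closest()   # make room for a new summary
        self.micro.append(MicroCluster(1, list(point)))

    def _merge_closest(self):
        """Merge the two micro-clusters whose centroids are closest."""
        _, i, j = min(
            (math.dist(a.centroid(), b.centroid()), i, j)
            for i, a in enumerate(self.micro)
            for j, b in enumerate(self.micro) if i < j
        )
        b = self.micro.pop(j)
        a = self.micro[i]
        a.n += b.n
        a.linear_sum = [x + y for x, y in zip(a.linear_sum, b.linear_sum)]

    def macro_clusters(self, k, iterations=20, seed=0):
        """Offline phase: weighted k-means over micro-cluster centroids.

        Requires k <= number of micro-clusters currently held.
        """
        points = [(m.centroid(), m.n) for m in self.micro]
        rng = random.Random(seed)
        centers = [c for c, _ in rng.sample(points, k)]
        for _ in range(iterations):
            groups = [[] for _ in range(k)]
            for c, w in points:
                idx = min(range(k), key=lambda i: math.dist(centers[i], c))
                groups[idx].append((c, w))
            for i, g in enumerate(groups):
                if g:  # keep the old center if no summary was assigned
                    total = sum(w for _, w in g)
                    dim = len(g[0][0])
                    centers[i] = [sum(c[d] * w for c, w in g) / total
                                  for d in range(dim)]
        return centers
```

In such a scheme, `update` would be called once per arriving record (for example, one air quality reading), while `macro_clusters` would be called only when an analyst requests a summary, so the cost of the offline phase depends on the number of micro-clusters rather than on the length of the stream.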

Article Details

How to Cite
Neha Sharma. (2023). ClaraStream: A Novel Algorithm for Real-Time Data Stream Clustering. International Journal on Recent and Innovation Trends in Computing and Communication, 11(11), 1577–1584. Retrieved from https://ijritcc.org/index.php/ijritcc/article/view/11041