Cardiovascular disease (CVD) is a serious but preventable complication of type 2 diabetes mellitus (T2DM) that results in substantial disease burden, increased health services use, and higher risk of premature mortality. People with diabetes are also at greatly increased risk of cardiovascular events that result in sudden death, which is increasing year by year. Data mining is the search for relationships and global patterns that exist in large databases but are "hidden" among the vast amount of data, such as a relationship between patient data and their medical diagnosis. Medical databases of type 2 diabetic patients are usually high-dimensional. If a training dataset contains irrelevant and redundant features (i.e., attributes), classification analysis may produce less accurate results. For data mining algorithms to perform efficiently and effectively on high-dimensional data, it is imperative to remove irrelevant and redundant features. Feature selection is one of the important and frequently used data preprocessing techniques for data mining applications in medicine. Much research in data mining has improved the predictive accuracy of classifiers by applying various feature selection techniques. This paper illustrates how applying feature selection to medical databases makes it possible to find a small number of informative features, leading to potential improvement in medical diagnosis. It is proposed to find an optimal feature subset of the PIMA Indian Diabetes Dataset using the Artificial Bee Colony technique with Differential Evolution, the Symmetrical Uncertainty Attribute Set Evaluator, and the Fast Correlation-Based Filter (FCBF). Mutual-information-based feature selection is then performed using normalized mutual information feature selection (NMIFS), and valid classes of input features are selected by applying the Hybrid Fuzzy C-Means algorithm (HFCM).
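As a rough illustration of the symmetrical uncertainty measure on which FCBF-style filters rely, the sketch below scores discrete features against a class label using the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)). It is a minimal sketch, not the procedure used in this paper: the feature names and toy values are hypothetical and are not taken from the PIMA Indian Diabetes Dataset, and continuous attributes are assumed to have been discretized beforehand.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        # Both variables are constant, so there is nothing to share.
        return 0.0
    hxy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy      # I(X; Y)
    return 2.0 * mutual_info / (hx + hy)


def rank_features(columns, labels):
    """Return (feature name, SU score) pairs, most relevant first."""
    scores = {name: symmetrical_uncertainty(col, labels)
              for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical, already-discretized toy data (assumption, not real patient data).
toy = {
    "glucose_high": [1, 1, 0, 0, 1, 0, 1, 0],  # mirrors the label exactly
    "noise":        [0, 1, 0, 1, 0, 0, 1, 1],  # statistically independent of the label
}
labels = [1, 1, 0, 0, 1, 0, 1, 0]

ranking = rank_features(toy, labels)
for name, score in ranking:
    print(f"{name}: SU = {score:.3f}")
```

A filter of this kind would keep features whose score against the class exceeds a chosen threshold; FCBF additionally compares features with each other to discard redundant ones, a step this sketch omits.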