Application of K-Prototypes Clustermix Algorithm for Clustering Risk Factors of Diabetes Disease

Authors

  • Martina Hildha Arda, Universitas Sebelas Maret

DOI:

https://doi.org/10.33005/jasid.v1i1.9

Keywords:

diabetes, mixed data types, clustering, silhouette

Abstract

Diabetes mellitus (DM) is one of the fastest-growing chronic diseases worldwide and poses a significant public health challenge. According to the International Diabetes Federation (IDF), approximately 537 million people were living with DM globally in 2021, with projections of 643 million by 2030 and 783 million by 2045. The World Health Organization (WHO) also reported a 3% increase in mortality rates attributed to DM between 2000 and 2019, underscoring the need for effective risk detection and management. Early identification of risk factors is crucial to mitigating the impact of DM, and cluster analysis offers a promising way to stratify patients by risk profile. This study applies the k-prototypes algorithm, which is suited to datasets that mix numeric and categorical variables, to DM risk factors. Using data from the 2022 Behavioral Risk Factor Surveillance System (BRFSS) annual survey, the study examines a sample of 2,480 DM patients across the United States. Two clusters (k=2) were selected as optimal, with a silhouette score of 0.821 indicating strong cluster cohesion and separation. Cluster 2 (77 patients) exhibited a higher risk profile than Cluster 1 (2,403 patients). The clusters showed significant differences in the average values of key DM risk factors, including weight, fruit and vegetable consumption, mental and physical health status, age, alcohol consumption, hypertension, smoking status, physical activity, mobility difficulties, sex, education level, income, and ethnicity. These findings highlight the utility of k-prototypes clustering for identifying high-risk DM subgroups to inform targeted prevention and intervention efforts.
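
To make the method named in the abstract concrete, the following is a minimal sketch of k-prototypes clustering with a silhouette check, written in plain Python. It is not the study's implementation. The toy rows, the feature names (weight, age, smoking status, hypertension), the weight `gamma`, and the helper names `mixed_distance`, `k_prototypes`, and `silhouette` are all assumptions made for illustration. The dissimilarity follows Huang's formulation (squared Euclidean distance on numeric features plus a weighted count of categorical mismatches). Reusing that same dissimilarity for the silhouette is also an assumption, since the abstract does not say how the score was computed.

```python
import random
from collections import Counter

def mixed_distance(a, b, num_idx, cat_idx, gamma):
    """Squared Euclidean distance on numeric features plus
    gamma times the number of categorical mismatches."""
    numeric = sum((a[i] - b[i]) ** 2 for i in num_idx)
    categorical = sum(a[i] != b[i] for i in cat_idx)
    return numeric + gamma * categorical

def k_prototypes(rows, k, num_idx, cat_idx, gamma=1.0, n_iter=20, seed=0):
    rng = random.Random(seed)
    # Start from k randomly chosen rows (a simple, assumed initialisation).
    prototypes = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each row joins its nearest prototype.
        new_labels = [
            min(range(k), key=lambda c: mixed_distance(
                r, prototypes[c], num_idx, cat_idx, gamma))
            for r in rows
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: mean for numeric features, mode for categorical ones.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if not members:
                continue  # keep the old prototype for an empty cluster
            for i in num_idx:
                prototypes[c][i] = sum(m[i] for m in members) / len(members)
            for i in cat_idx:
                prototypes[c][i] = Counter(m[i] for m in members).most_common(1)[0][0]
    return labels, prototypes

def silhouette(rows, labels, num_idx, cat_idx, gamma=1.0):
    """Mean silhouette width using the mixed dissimilarity (needs >= 2 clusters)."""
    def mean_dist(i, c):
        ds = [mixed_distance(rows[i], rows[j], num_idx, cat_idx, gamma)
              for j in range(len(rows)) if labels[j] == c and j != i]
        return sum(ds) / len(ds) if ds else None

    scores = []
    for i in range(len(rows)):
        a = mean_dist(i, labels[i])
        if a is None:          # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        b = min(mean_dist(i, c) for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: (weight_kg, age, smoker, hypertension).
rows = [
    (60.0, 30.0, "no", "no"),
    (62.0, 35.0, "no", "no"),
    (58.0, 28.0, "no", "yes"),
    (61.0, 33.0, "no", "no"),
    (110.0, 65.0, "yes", "yes"),
    (105.0, 70.0, "yes", "yes"),
    (112.0, 68.0, "yes", "no"),
]
labels, prototypes = k_prototypes(rows, k=2, num_idx=[0, 1], cat_idx=[2, 3])
score = silhouette(rows, labels, num_idx=[0, 1], cat_idx=[2, 3])
print(labels, round(score, 3))
```

In practice the numeric features would usually be standardized first so that no single variable dominates the distance, and the silhouette would be compared across several candidate values of k before settling on one.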

References

W. Yudananto, S. S. Remi, and B. Muljarijadi, "Peranan Sektor Pariwisata Terhadap Perekonomian Daerah di Indonesia (Analisis Interregional Input-Output)," Jurnal, vol. 2, no. 4, 2012, Universitas Padjajaran, Bandung.

International Diabetes Federation. IDF Diabetes Atlas, 10th ed.; International Diabetes Federation, 2021.

ElSayed, N. A.; Aleppo, G.; Aroda, V. R.; Bannuru, R. R.; Brown, F. M.; Bruemmer, D.; Collins, B. S.; Cusi, K.; Das, S. R.; Gibbons, C. H.; Giurini, J. M.; Hilliard, M. E.; Isaacs, D.; Johnson, E. L.; Kahan, S.; Khunti, K.; Kosiborod, M.; Leon, J.; Lyons, S. K.; Murdock, L. Introduction and Methodology: Standards of Care in Diabetes—2023. Diabetes Care 2022, 46 (Supplement_1), S1–S4.

GBD 2021 Diabetes Collaborators. Global, Regional, and National Burden of Diabetes from 1990 to 2021, with Projections of Prevalence to 2050: A Systematic Analysis for the Global Burden of Disease Study 2021. The Lancet 2023, 402 (10397).

Kshanti, I. A. M. Pedoman Pemantauan Glukosa Darah Mandiri, 1st ed.; PB Perkeni: Jakarta, 2019.

Yuan, C.; Yang, H. Research on K-Value Selection Method of K-Means Clustering Algorithm. J 2019, 2 (2), 226–235.

Madhuri, R.; Murty, M. R.; Murthy, J. V. R.; Reddy, P. V. G. D. P.; Satapathy, S. C. Cluster Analysis on Different Data Sets Using K-Modes and K-Prototype Algorithms. ICT and Critical Infrastructure: Proceedings of the 48th Annual Convention of Computer Society of India- Vol II 2014, 249 (137), 137–144.

Rousseeuw, P. J.; Leroy, A. M. Robust Regression and Outlier Detection. Wiley Series in Probability and Mathematical Statistics; John Wiley: New York, 1987.

Bholowalia, P.; Kumar, A. EBK-Means: A Clustering Technique Based on Elbow Method and K-Means in WSN. International Journal of Computer Applications (0975-8887) 2014, IX (105), 17–24.

Huang, Z. Extensions to the K-Means Algorithm for Clustering Large Data Sets with Categorical Values. Data Mining and Knowledge Discovery 1998, 2 (3), 283–304.

Siregar, P. A. Analisis Karakteristik Dan Frekuensi Konsumsi Buah Dan Sayur Pada Penderita Diabetes Dan Non Diabetes. Darussalam Nutrition Journal, Mei 2021, 2021 (1), 61–69.

Kulzer, B. Physical and Psychological Long-Term Consequences of Diabetes Mellitus. Bundesgesundheitsblatt, Gesundheitsforschung, Gesundheitsschutz 2022, 66 (4), 503.

Rudi, A.; Kwureh, H. N. Faktor Risiko Yang Mempengaruhi Kadar Gula Darah Puasa pada Pengguna Layanan Laboratorium. Wawasan Kesehatan 2017, 3 (2), 2087–4995.

Wu, X.; Liu, X.; Liao, W.; Kang, N.; Dong, X.; Abdulai, T.; Zhai, Z.; Wang, C.; Wang, X.; Li, Y. Prevalence and Characteristics of Alcohol Consumption and Risk of Type 2 Diabetes Mellitus in Rural China. BMC Public Health 2021, 21 (1). https://doi.org/10.1186/s12889-021-11681-0.

Yang, X.; Chen, J.; Pan, A.; Wu, J. H. Y.; Zhao, F.; Xie, Y.; Wang, Y.; Ye, Y.; Pan, X.-F.; Yang, C.-X. Association between Higher Blood Pressure and Risk of Diabetes Mellitus in Middle-Aged and Elderly Chinese Adults. Diabetes & Metabolism Journal 2020, 44 (3), 436.

Damayanti, S. Diabetes Melitus Dan Penatalaksanaan Keperawatan; Nuha Medika: Yogyakarta, 2015.

Published

2025-05-28

How to Cite

Martina Hildha Arda. (2025). Application of K-Prototypes Clustermix Algorithm for Clustering Risk Factors of Diabetes Disease. Jurnal Aplikasi Sains Data, 1(1), 40–49. https://doi.org/10.33005/jasid.v1i1.9