Naive Bayes Classifier for Accurate Diabetes Diagnosis and Analysis

Lynn Htet Aung

Abstract

Diabetes mellitus is a chronic metabolic disorder with rising global prevalence, necessitating early and accurate diagnostic tools to mitigate complications. This study investigates the Naive Bayes classifier's efficacy for diabetes diagnosis, leveraging a dataset of 768 patient records encompassing clinical and demographic attributes, such as glucose levels, BMI, and insulin. Data preprocessing steps, including imputation, scaling, and normalization, ensure data quality, while feature selection identifies key predictors to enhance model performance. The classifier achieved an accuracy of 77%, with a weighted F1-score of 0.77, demonstrating robust performance for the "Not Worthy" class but moderate results for the "Worthy" class due to class imbalance and overlapping features. Ensemble methods, such as bagging and boosting, were explored to address these challenges, further improving robustness and recall. The study highlights the Naive Bayes classifier as a cost-effective, computationally efficient tool for real-time diabetes detection, with potential for deployment in resource-limited healthcare settings. Future research should focus on class balancing, advanced feature engineering, and validation on larger, diverse datasets to enhance diagnostic reliability and scalability.
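The classification step summarized above can be illustrated with a minimal sketch. This is not the author's implementation: the abstract does not say which Naive Bayes variant was used, so a Gaussian variant is assumed here. The function names (`fit_gaussian_nb`, `predict`, `accuracy`), the variance floor, and the eight toy records are all hypothetical, and the imputation, scaling, and feature-selection steps are left out for brevity.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate the log-prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    model = {}
    n_total = len(X)
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor (an assumed smoothing constant) avoids division by zero
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / n_total), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for one record."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        # Naive independence assumption: sum per-feature Gaussian log-likelihoods
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, X, y):
    """Fraction of records whose predicted class matches the true label."""
    return sum(predict(model, r) == t for r, t in zip(X, y)) / len(y)


# Illustrative toy records (glucose, BMI); NOT taken from the study's dataset.
X_train = [[85, 22.0], [90, 24.5], [99, 26.1], [105, 23.3],
           [150, 33.2], [165, 35.0], [172, 31.8], [180, 36.4]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]  # assumed encoding: 0 = negative, 1 = positive

model = fit_gaussian_nb(X_train, y_train)
```

Calling `predict(model, [95, 23.0])` returns the label for one new record. Scores are accumulated as log-probabilities rather than raw products so that multiplying many small likelihoods does not underflow.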

How to Cite
Aung, L. (2025). Naive Bayes Classifier for Accurate Diabetes Diagnosis and Analysis. Jurnal Sistem Informasi Dan Komputer Terapan Indonesia (JSIKTI), 5(3), 376-386. https://doi.org/10.33173/jsikti.254
