Random Forest Methodology for Analyzing Diabetes Risk Factors
Abstract
Diabetes is a chronic disease posing significant health challenges globally, with rising prevalence due to genetic, lifestyle, and environmental factors. This research employs the Random Forest methodology to analyze diabetes risk factors and predict outcomes using a dataset of 768 patient records. Key attributes such as glucose levels, BMI, blood pressure, and age were evaluated to uncover their contribution to diabetes risk. The study achieved an overall accuracy of 72%, with glucose emerging as the most influential predictor, followed by BMI and age. While the model showed strong performance in identifying non-diabetic cases, its moderate precision and recall for diabetic cases highlighted the impact of class imbalance. Feature importance analysis provided actionable insights, emphasizing glucose and BMI monitoring in diabetes management. Despite the model's strengths, class imbalance and feature redundancy remained challenges, suggesting the need for oversampling techniques, additional variables, and advanced feature engineering. These findings demonstrate the utility of Random Forest in healthcare analytics, supporting predictive and preventive care strategies. Future research should focus on integrating lifestyle factors, expanding datasets, and exploring advanced machine learning models to enhance predictive accuracy and real-world applicability.
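The abstract's observation that a reasonable overall accuracy can coexist with weaker precision and recall for diabetic cases can be illustrated with a minimal sketch. The function name `binary_metrics` and the confusion-matrix counts below are hypothetical and chosen only for illustration; they are not the study's reported results.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy plus per-class precision and recall
    computed from binary confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Guard against division by zero when a class is never predicted or absent.
        return numerator / denominator if denominator else 0.0

    total = tp + fp + fn + tn
    return {
        "accuracy": ratio(tp + tn, total),
        # "diabetic" is treated as the positive class.
        "diabetic": {
            "precision": ratio(tp, tp + fp),
            "recall": ratio(tp, tp + fn),
        },
        # "non_diabetic" is the negative class, so the roles of the counts flip.
        "non_diabetic": {
            "precision": ratio(tn, tn + fn),
            "recall": ratio(tn, tn + fp),
        },
    }


# Hypothetical counts for an imbalanced test split (assumption, not study data):
# 100 non-diabetic and 54 diabetic patients.
metrics = binary_metrics(tp=30, fp=15, fn=24, tn=85)

print(f"accuracy: {metrics['accuracy']:.3f}")
for label in ("non_diabetic", "diabetic"):
    scores = metrics[label]
    print(f"{label}: precision={scores['precision']:.3f}, recall={scores['recall']:.3f}")
```

With counts like these, the majority class dominates the accuracy figure while recall for the minority class lags behind, which is why per-class metrics are usually reported alongside accuracy on imbalanced data.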

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.