KNN-Based Prediction Model for Assessing Hypertension Risk from Lifestyle Features
Abstract
Hypertension is one of the most common chronic conditions and is associated with serious cardiovascular complications. Its prevalence continues to rise under the influence of lifestyle-related factors, which motivates data-driven approaches to early risk identification. Although various machine learning models have been applied in health analytics, many still struggle to process heterogeneous lifestyle attributes, which limits their ability to accurately detect individuals at risk. This study addresses that gap by applying the K-Nearest Neighbors (KNN) algorithm to predict hypertension from a dataset of 1,985 records containing variables such as age, salt intake, stress score, sleep duration, body mass index, family history, medication use, physical activity, and smoking status. KNN was selected for its simplicity, adaptability, and strong performance in classification tasks involving structured health data. The contribution of this research is a lifestyle-based hypertension prediction model supported by a preprocessing pipeline and optimized hyperparameters, which enables effective handling of mixed numerical and categorical features. The model is evaluated using accuracy, precision, recall, F1 score, and a confusion matrix visualization. It achieves an accuracy of 85 percent with balanced performance across both classes, indicating that KNN generalizes reliably on this dataset. Future work includes comparing KNN with ensemble and deep learning models, exploring feature selection techniques, and expanding dataset diversity to improve model robustness and applicability to real-world digital health solutions.
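To make the workflow described in the abstract concrete, the following is a minimal sketch of a KNN classifier over mixed numerical and categorical features, together with the listed evaluation metrics. It is not the study's implementation: the toy records, the three chosen features, the min-max scaling, the one-hot encoding, the Euclidean distance, and the value k=3 are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy records: (age, bmi, smoking_status) -> label (1 = hypertensive).
# These values are invented for illustration and are NOT from the study's dataset.
TRAIN = [
    ((25, 21.0, "non-smoker"), 0),
    ((30, 23.5, "non-smoker"), 0),
    ((35, 24.0, "smoker"), 0),
    ((52, 29.0, "smoker"), 1),
    ((60, 31.5, "non-smoker"), 1),
    ((65, 33.0, "smoker"), 1),
]


def fit_scaler(rows):
    """Record the (min, max) of each numeric column (age, bmi)."""
    columns = list(zip(*[(r[0], r[1]) for r in rows]))
    return [(min(c), max(c)) for c in columns]


def encode(row, scaler):
    """Min-max scale the numeric features and one-hot encode smoking status."""
    nums = [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, (lo, hi) in zip(row[:2], scaler)]
    onehot = [1.0, 0.0] if row[2] == "smoker" else [0.0, 1.0]
    return nums + onehot


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points nearest in Euclidean distance."""
    ranked = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


# Fit the scaler on training rows only, then encode training and query rows alike.
scaler = fit_scaler([row for row, _ in TRAIN])
train_x = [encode(row, scaler) for row, _ in TRAIN]
train_y = [label for _, label in TRAIN]

high_risk = knn_predict(train_x, train_y, encode((58, 30.0, "smoker"), scaler))
low_risk = knn_predict(train_x, train_y, encode((28, 22.0, "non-smoker"), scaler))
```

Scaling matters here because KNN relies on distances: without it, a feature with a wide range such as age would dominate a one-hot encoded feature that only takes the values 0 and 1. In practice k and the distance metric would be chosen by validation rather than fixed in advance.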
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.