Sercan TOHMA
Keywords
ECG, Arrhythmia Classification, MIT-BIH, PTB-DB, Machine Learning, Histogram-Based Gradient Boosting
DOI: 10.5281/zenodo.18664503
Abstract
This study uses two publicly available electrocardiogram (ECG) datasets to address (i) five-class heartbeat classification on MIT-BIH and (ii) binary (normal/abnormal) cardiac anomaly detection on PTB-DB. Raw time-series samples (187-point segments) were used directly as feature vectors. After z-score standardization, Random Forest, Histogram-Based Gradient Boosting, Decision Tree, Naive Bayes, and Logistic Regression models were trained. The models were evaluated using accuracy (Acc), precision (Prec), recall (Rec), F1, and AUC. On both datasets, the Histogram-Based Gradient Boosting model achieved the best performance (MIT-BIH: Acc = 0.9801, AUC = 0.9930; PTB-DB: Acc = 0.9825, AUC = 0.9945). The findings show that classical machine learning models, with appropriate preprocessing and parameterization, can offer competitive performance in ECG-based arrhythmia and cardiac anomaly classification problems.