A Data-Driven Imputation Scheme for Cohort Studies: A Collegiate Basketball Casestudy
Document Type
Conference Proceeding
Publication Date
2025
Abstract
Missing data remains a critical challenge in cohort studies. This study introduces a novel missing value imputation technique that integrates feature sensitivity and factor analysis with clustering and predictive modelling to enhance accuracy, reliability, and interpretability.
The dataset comprises 42 features collected from 16 collegiate female basketball athletes over 26 weeks, including sleep and cardiac rhythms, training loads, cognitive states, travel, and countermovement jump performance. The objective is to model the impact of these contextual stressors on athletic readiness, quantified via the Reactive Strength Index modified (RSImod).
Compared with state-of-the-art imputation methods, the proposed methodology reduces computation time by up to 35.71% (KNN), 29.41% (EM), 21.43% (MICE), 14.29% (CART), and 7.14% (XGBoost). It also reduces RMSE by up to 12.20% and MAE by up to 10.77%. Moreover, RSImod predictions on the imputed dataset showed substantial improvements: up to an 80.85% reduction in MSE and a 79.99% increase in scores. Interpretability was enhanced using SHAP (SHapley Additive exPlanations), providing actionable insights for coaches and practitioners.
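The relative improvements quoted in the abstract are percentage reductions against a baseline. As a minimal sketch of how such figures are typically computed (the abstract does not give the authors' exact evaluation protocol, and the helper names and toy values below are hypothetical, not data from the study):

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between true and imputed values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error between true and imputed values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def percent_reduction(baseline, proposed):
    """Relative reduction of `proposed` versus `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline

# Hypothetical toy values: entries that were masked on purpose, then imputed.
truth = [1.0, 2.0, 3.0, 4.0]
baseline_imputed = [1.5, 2.5, 2.0, 4.5]   # e.g. output of a baseline imputer (assumed)
proposed_imputed = [1.2, 2.1, 2.8, 4.1]   # e.g. output of a proposed imputer (assumed)

rmse_gain = percent_reduction(rmse(truth, baseline_imputed), rmse(truth, proposed_imputed))
mae_gain = percent_reduction(mae(truth, baseline_imputed), mae(truth, proposed_imputed))
print(f"RMSE reduction: {rmse_gain:.2f}%  MAE reduction: {mae_gain:.2f}%")
```

The same `percent_reduction` helper applies to computation time or MSE; a percentage increase in a score would instead be computed with the sign reversed, relative to the baseline score.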
DOI
10.1007/978-3-032-06167-6_18
Recommended Citation
Sharma, S., Barot, V., Divakaran, S., Kaya, T., Taber, C.B., Raval, M.S. (2025). A data-driven imputation scheme for cohort studies: A collegiate basketball casestudy. In J.S. Dong, J. Sun, X. Xie, & K. Jiang (Eds.), Sports Analytics (pp. 235-252). Springer. https://doi.org/10.1007/978-3-032-06167-6_18
Comments
International Sports Analytics Conference and Exhibition
Part of the book series: Lecture Notes in Computer Science