A Data-Driven Imputation Scheme for Cohort Studies: A Collegiate Basketball Case Study

Document Type

Conference Proceeding

Publication Date

2025

Abstract

Missing data remains a critical challenge in cohort studies. This study introduces a novel missing value imputation technique that integrates feature sensitivity and factor analysis with clustering and predictive modelling to enhance accuracy, reliability, and interpretability.
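The abstract does not specify the algorithm. The sketch below only illustrates the general cluster-then-impute idea under stated assumptions, and it is not the authors' method. It groups observations with plain k-means on fully observed features, then fills a missing column with its within-cluster mean. The function names `kmeans` and `cluster_mean_impute` are hypothetical. The within-cluster mean stands in for the paper's predictive model, and the feature sensitivity and factor analysis steps are omitted.

```python
import random
from statistics import mean


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on fully observed feature vectors; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of its members; keep it in place if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels


def cluster_mean_impute(rows, target, k=2):
    """Return a copy of `rows` with None values in column `target` replaced by the
    mean of the observed values in the same cluster (hypothetical stand-in for a
    per-cluster predictive model). All other columns are assumed fully observed."""
    features = [[v for j, v in enumerate(r) if j != target] for r in rows]
    labels = kmeans(features, k)
    filled = [list(r) for r in rows]
    # Global mean is the fallback for a cluster with no observed target values.
    global_mean = mean(r[target] for r in rows if r[target] is not None)
    for c in set(labels):
        observed = [r[target] for r, lab in zip(rows, labels) if lab == c and r[target] is not None]
        fill = mean(observed) if observed else global_mean
        for r, lab in zip(filled, labels):
            if lab == c and r[target] is None:
                r[target] = fill
    return filled
```

For example, with two well-separated groups of rows (made-up values), the missing entries in the last column are filled from their own group:

```python
rows = [
    [1.0, 1.1, 10.0], [1.2, 0.9, None], [0.8, 1.0, 12.0],
    [9.0, 9.2, 50.0], [9.1, 8.8, None], [8.9, 9.0, 54.0],
]
imputed = cluster_mean_impute(rows, target=2, k=2)
```

Clustering first means each missing value is estimated only from similar observations rather than from the whole cohort. This is the intuition behind combining clustering with prediction.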

The dataset comprises 42 features collected from 16 collegiate female basketball athletes over 26 weeks, including sleep and cardiac rhythms, training loads, cognitive states, travel, and countermovement jump performance. The objective is to model the impact of these contextual stressors on athletic readiness, quantified via the Reactive Strength Index modified (RSImod).
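RSImod is commonly defined for a countermovement jump as jump height divided by time to take-off. The abstract does not say how jump height was measured, so the sketch below assumes the flight-time method, h = g·t²/8, which is one common choice. The function names are hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2


def jump_height_from_flight_time(flight_time_s):
    """Flight-time method: h = g * t^2 / 8 (assumes the same body posture at take-off and landing)."""
    return G * flight_time_s ** 2 / 8


def rsi_mod(jump_height_m, time_to_takeoff_s):
    """Reactive Strength Index modified: jump height (m) / time to take-off (s)."""
    if time_to_takeoff_s <= 0:
        raise ValueError("time to take-off must be positive")
    return jump_height_m / time_to_takeoff_s


# Hypothetical jump: 0.5 s flight time, 0.75 s from movement onset to take-off.
height = jump_height_from_flight_time(0.5)
score = rsi_mod(height, 0.75)
```

A higher RSImod means the athlete reaches a given height faster or jumps higher in the same time, which is why it is used as a readiness marker.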

Compared with state-of-the-art imputation methods, the proposed methodology reduces computation time by up to 35.71% (KNN), 29.41% (EM), 21.43% (MICE), 14.29% (CART), and 7.14% (XGBoost). It also reduces RMSE by up to 12.20% and MAE by up to 10.77%. Moreover, RSImod predictions on the imputed dataset showed substantial improvements, with up to an 80.85% reduction in MSE and a 79.99% increase in scores. Interpretability was enhanced using SHAP (SHapley Additive exPlanations), providing actionable insights for coaches and practitioners.
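The reported figures are error metrics and relative changes. The sketch below shows how RMSE, MAE, and a percentage reduction or increase are usually computed. It assumes, since the abstract does not say, that each percentage is relative to the competing method's value. The function names and sample numbers are illustrative only.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and imputed (or predicted) values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between observed and imputed (or predicted) values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def percent_reduction(baseline, proposed):
    """Relative reduction (%) of a lower-is-better quantity such as time, RMSE, or MAE."""
    return 100.0 * (baseline - proposed) / baseline


def percent_increase(baseline, proposed):
    """Relative increase (%) of a higher-is-better score."""
    return 100.0 * (proposed - baseline) / baseline


# Made-up example: a baseline taking 14 s versus a proposed method taking 9 s.
time_saving = percent_reduction(14.0, 9.0)
```

RMSE penalises large errors more heavily than MAE, so reporting both indicates whether gains come from fewer large misses or from smaller errors overall.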

Comments

International Sports Analytics Conference and Exhibition

Part of the book series: Lecture Notes in Computer Science

DOI

10.1007/978-3-032-06167-6_18
