Evaluating Methods for Imputing Missing Data from Longitudinal Monitoring of Athlete Workload

Research article - (2021)20, 188 - 196
DOI: https://doi.org/10.52082/jssm.2021.188

Lauren C. Benson¹,², Carlyn Stilling², Oluwatoyosi B.A. Owoeye²,³, Carolyn A. Emery²,⁴,⁵,⁶

¹United States Olympic & Paralympic Committee, Colorado Springs, CO, United States
²Sport Injury Prevention Research Centre, Faculty of Kinesiology, University of Calgary, Calgary, Canada
³Department of Physical Therapy and Athletic Training, Doisy College of Health Sciences, Saint Louis University, Saint Louis, MO, United States
⁴Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, Canada
⁵McCaig Bone and Joint Institute, Cumming School of Medicine, University of Calgary, Calgary, Canada
⁶Departments of Community Health Sciences and Pediatrics, Cumming School of Medicine, University of Calgary, Calgary, Canada

Lauren C. Benson
✉ 2500 University Dr NW, Calgary, AB, T2N 1N4, Canada; 403-220-2170
Email: lauren.benson@ucalgary.ca

Received: 24-09-2020 -- Accepted: 26-01-2021
Published (online): 05-03-2021

ABSTRACT

Missing data can influence calculations of accumulated athlete workload. The objectives were to identify the best single imputation methods and examine workload trends using multiple imputation. External (jumps per hour) and internal (rating of perceived exertion; RPE) workload were recorded for 93 (45 females, 48 males) high school basketball players throughout a season. Recorded data were simulated as missing and imputed using ten imputation methods based on the context of the individual, team and session. Both single imputation and machine learning methods were used to impute the simulated missing data. The difference between the imputed data and the actual workload values was computed as root mean squared error (RMSE). A generalized estimating equation determined the effect of imputation method on RMSE. Multiple imputation of the original dataset, with all known and actual missing workload data, was used to examine trends in longitudinal workload data. Following multiple imputation, a Pearson correlation evaluated the longitudinal association between jump count and sRPE over the season. A single imputation method based on the specific context of the session for which data are missing (team mean) was only outperformed by methods that combine information about the session and the individual (machine learning models). There was a significant and strong association between jump count and sRPE in the original data and imputed datasets using multiple imputation. The amount and nature of the missing data should be considered when choosing a method for single imputation of workload data in youth basketball. Multiple imputation using several predictor variables in a regression model can be used for analyses where workload is accumulated across an entire season.

Key words: Jump count, imputation, training load, machine learning, basketball

Key Points

The error associated with single imputation of missing workload data depends on the method used.
Single imputation methods based on the specific context of the session for which data are missing (e.g., team mean) and methods that combine information about the session and the individual (e.g., machine learning models) have the smallest imputation error.
Multiple imputation using several predictor variables in a regression model can be used for analyses where workload is accumulated across an entire season.

INTRODUCTION

There is a growing interest in longitudinal monitoring of athlete workload, with the goal of reducing the risk of injury and optimizing performance (Drew and Finch, 2016; Eckard et al., 2018; Gabbett, 2018; Jones et al., 2017; Soligard et al., 2016). Workload can be categorized as external (direct measurement of the work performed by an athlete) or internal (relative physiological or psychological stressors experienced by an athlete), and both external and internal measures are used in the longitudinal monitoring of athlete workload (Bourdon et al., 2017; Impellizzeri et al., 2019). Common analysis methods involve summing workload over a period of days or weeks (Wang et al., 2020), thus workload values for every session are needed for an accurate measure of accumulated workload. Techniques for recording workload range from reporting rating of perceived exertion (RPE) (Foster et al., 2001; Lupo et al., 2017) to using equipment for direct measurement of events such as jumps (Moran et al., 2019; Van der Worp et al., 2014), with varying degrees of ease of use for a single session and across many sessions (Bourdon et al., 2017). As such, there are a variety of reasons why workload data could be missing: the data may have been recorded but were subsequently lost or deleted, or the data may not have been recorded due to time constraints, poor athlete/team adherence, injuries during a game or practice, or equipment malfunction. While care is taken to reduce the incidence of missing data, attaining the ideal of no missing data is considered impossible in practice (Van Buuren, 2018).

In many published studies of workload monitoring, there is no mention of how missing workload data were handled, particularly for youth sports (Windt et al., 2018). In other studies, missing data were ignored or excluded from the analysis (Black et al., 2018; Curtis et al., 2018; DeWitt et al., 2018; Martín-García et al., 2018; Rago et al., 2019; Smpokos et al., 2018a; Smpokos et al., 2018b; Vahia et al., 2019; Wellman et al., 2017; Whitehead et al., 2019). While excluding incomplete cases is a simple way to handle missing data, complete case analyses can lead to biased estimates and large standard errors (Gelman and Hill, 2006; Van Buuren, 2018). When the pattern of missingness in the data can be described as missing completely at random, some statistical analyses such as mixed models (Lupo et al., 2019) can be used for longitudinal datasets with missing data and avoid the complete-case bias (Ibrahim and Molenberghs, 2009). But in situations where the pattern of missingness cannot be ignored, the missing data are often replaced or imputed with substituted values (Windt et al., 2018).

Single imputation is when a single value is ascribed to an absent value (Patrician, 2002). Some common methods of choosing a single imputation value include the mean of all other observations for that individual, carrying the previous observation (or the mean of several previous observations) forward, selecting a random value, or using related observations from similar individuals (e.g. same sex, same team, individuals with similar values during other known observations, etc.) (Gelman and Hill, 2006; Patrician, 2002; Van Buuren, 2018). Additionally, machine learning approaches (e.g., regression, decision trees) to imputation can use several predictor variables, including similar observations, to estimate single values for the missing data (Jerez et al., 2010).

Choosing a method for single imputation will result in a complete dataset that can be used for further analysis. It has been shown, however, that ascribing a single value to the missing data overstates the precision and leads to bias in any results based on the imputed data (Gelman and Hill, 2006; Patrician, 2002). Multiple imputation accounts for this bias by replacing missing data several times, with each iteration providing a different value that reflects the uncertainty about the imputation model (Van Buuren, 2018). Subsequent analyses are then pooled from the several imputed datasets.

Even though missing data researchers support multiple imputation over single imputation (Enders, 2010; Van Buuren, 2018), when workload monitoring studies report imputation of missing data, single imputation methods are used (Bowen et al., 2019; Bowen et al., 2017; Colby et al., 2014; Duhig et al., 2016; Esmaeili et al., 2018; Jaspers et al., 2018; Skazalski et al., 2018; Vescovi and Klas, 2018). Often the single imputation method used is dependent on the context of the missing data point (e.g., individual mean is used when a game session is missing; team mean is used when a training session is missing), but does not incorporate several predictor variables to estimate single values for the missing data. The decision to use a simple single imputation method instead of a regression approach and multiple imputation may be due to the practicality of performing data analyses in field settings or reflect a lack of statistical expertise among sport science researchers and coaching staff, especially in youth sports with limited resources (Windt et al., 2020). Furthermore, the accuracy of these single imputation methods for estimating missing workload data has not been reported, and the effects of imputing data on common injury risk analyses such as the summation of workload over an entire season have not been investigated.

The objectives of this study were to identify the best single imputation methods for imputing simulated missing data from a dataset of both external and internal workload, and to examine trends in longitudinal workload data after imputing actual missing data using multiple imputation. Since previous imputation methods in sport have focused on replicating the context of the missing information, it was hypothesized that the best single imputation method would most closely represent the session of the missing data, and that a combination of methods would provide additional context and thus perform better than one method. Additionally, it was expected that using data from a similar context only (i.e., practice or a game) would improve the accuracy of the single imputation methods. It was also hypothesized that trends in longitudinal workload data would be similar between the original and imputed dataset.

METHODS

Study design and participants

This study is a secondary analysis of data from a prospective study evaluating associations between workload (external and internal) and injuries in youth basketball players. This study was approved by the Conjoint Health Research Ethics Board of the University of Calgary, Alberta, Canada (Ethics ID: REB16-0864). Informed consent was obtained from all study participants.

Ninety-three (45 females, 48 males; mean (SD) 16.4 (0.7) years; 67.2 (11.1) kg; 1.74 (0.10) m) sub-elite high school basketball players from eight teams in Alberta, Canada, participated in this study. Participants played in their typical practice and game sessions throughout the 2017-2018 season. A player had full participation in a practice or game if they were present and physically able to participate (i.e., uninjured) for the entire session.

Data collection

Participants were asked to wear a commercially available inertial measurement unit consisting of a tri-axial accelerometer, gyroscope and magnetometer (VERT®, Mayfonk Inc., Fort Lauderdale, FL, USA) during each session. The device was attached to participants with an elastic waistband and positioned near the centre of mass according to the manufacturer’s instructions. As the VERT® recorded movement patterns, the data were transferred in real time via Bluetooth 4.0 technology to an associated Apple iPad application (iPad Air 2, Apple, Cupertino, CA, USA; VTS Basic, Mayfonk Inc., Fort Lauderdale, FL, USA; VERTcoach, version 2.2.6, Mayfonk Inc., Fort Lauderdale, FL, USA) which processed the data using proprietary algorithms and reported the number of jumps over six inches (15.24 cm). The use of this jump counter has been previously validated in youth basketball (Benson et al., 2020). The output variable was stored on and later accessed from a server (myVERT® BETA, Mayfonk Inc., Fort Lauderdale, FL, USA) maintained by the product manufacturer. Jump counts were recorded relative to the duration of the session in hours. Additionally, participants were asked to report their RPE on a scale of 1-10 (Foster et al., 1996; Lupo et al., 2017).

Quantifying missing data

The number of players that had at least one full participation session, the total number of sessions during the season based on the team schedule, and the number of sessions where jump count and/or RPE data were recorded were reported for each team. When data were not recorded for any reason (e.g., no team data recorded that session, equipment malfunction, individual did not wear jump counter and/or report RPE, etc.), it was reported as a percent of player-sessions based on both the total number of sessions and the number of data-recorded sessions for participants with full participation.

Single Imputation Evaluation

Data processing

A team’s external and internal workload data were organized in separate session by player matrices for all sessions with data recorded and for participants with full participation. The accuracy of a single imputation method was evaluated relative to known workload values, thus player-sessions with actual missing data were ignored in this analysis (Table 1). A random 1% of the known data were removed from the matrix, and the effective missing player-sessions due to this simulated missing data was reported for each team. Using built-in functions and custom MATLAB software (v9.5.0.944444 (2018b), Mathworks, Inc., Natick, MA, USA), eight values were imputed for the removed data according to common single imputation methods (Gelman and Hill, 2006; Patrician, 2002; Van Buuren, 2018):

Recent Session 1. Value from previous session.
Recent Session 5. Mean of five previous sessions. If there were fewer than five previous sessions with recorded data, the mean of all previous sessions was reported.
Individual Mean. Mean of all other sessions for the given individual.
Individual Random. A range of values was determined between 0 jumps per hour or 1 on the RPE scale and the individual’s maximum of all sessions. A random (rand function in MATLAB) value within that range was selected as the imputed value, with equal probability of selecting any value within the range.
Individual Weighted. A probability density function of the normal distribution was created with the mean and standard deviation of all sessions (normpdf function in MATLAB), for an individual, evaluated between 0 jumps per hour or 1 on the RPE scale and the maximum value for the individual. A random value within that range was selected as the imputed value, however, the probability of selecting a value was based on the probability density function.
Sex Mean. Mean of all sessions for all participants and all teams of the same sex.
Team Mean. Mean of all other values for same team and the same session.
Team Mean Weighted. For each session for the team, ratios between all players were calculated and mean player ratios were determined. The mean player ratios and recorded data in the given session for the other participants were used to estimate the removed value, and the imputed value was the mean of all estimates.
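
Four of the simpler methods listed above can be sketched in a few lines of Python. This is an illustrative sketch only, not the custom MATLAB software used in the study: the matrix values, the function names and the use of None to mark missing data are all assumptions, and the handling of the recent-session window (mean of whatever was recorded in the n immediately preceding sessions) is one possible reading of the description.

```python
from statistics import mean

# Hypothetical session-by-player matrix of jumps per hour for one team.
# Rows are sessions in chronological order, columns are players, None is missing.
workload = [
    [30.0, 42.0, 25.0],
    [28.0, 40.0, None],
    [35.0, None, 27.0],
    [31.0, 44.0, 29.0],
]

def known(values):
    """Keep only recorded (non-missing) values."""
    return [v for v in values if v is not None]

def recent_session(matrix, s, p, n=1):
    """Mean of the recorded values in the n sessions before session s for player p.

    n=1 corresponds to Recent Session 1 and n=5 to Recent Session 5.
    Returns None when nothing was recorded in that window.
    """
    window = known([matrix[i][p] for i in range(max(0, s - n), s)])
    return mean(window) if window else None

def individual_mean(matrix, s, p):
    """Mean of all other recorded sessions for player p."""
    others = known([matrix[i][p] for i in range(len(matrix)) if i != s])
    return mean(others) if others else None

def team_mean(matrix, s, p):
    """Mean of the teammates' recorded values in the same session."""
    others = known([v for j, v in enumerate(matrix[s]) if j != p])
    return mean(others) if others else None

# Candidate values for the missing entry of player 1 in session 2.
print(recent_session(workload, 2, 1, n=1))
print(recent_session(workload, 2, 1, n=5))
print(individual_mean(workload, 2, 1))
print(team_mean(workload, 2, 1))
```

Each function returns None when it has nothing to work from, which mirrors the situation in which a value could not be imputed.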

The error for each method was recorded as the imputed value minus the actual value. Then, the removed data points were put back in the matrix, a different random 1% of the data were removed, and imputed values and associated errors were calculated. This process was repeated until all known data points were imputed at least once. If a data point was randomly selected to be removed more than once, only the error from the first time it was removed was retained. The root mean squared error (RMSE) across all sessions was reported for each participant and each single imputation method.
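
The remove-impute-restore cycle described above might look like the following Python sketch. The names (simulate_rmse, team_mean), the fixed random seed and the small example matrix are assumptions made for illustration, and team mean stands in for any of the single imputation methods.

```python
import math
import random

def team_mean(matrix, s, p):
    """Mean of the teammates' recorded values in session s (None if unavailable)."""
    others = [v for j, v in enumerate(matrix[s]) if j != p and v is not None]
    return sum(others) / len(others) if others else None

def simulate_rmse(matrix, impute, fraction=0.01, seed=0):
    """Repeatedly hide a random fraction of known cells, impute them, and
    return the root mean squared error for each player (column index)."""
    rng = random.Random(seed)
    cells = [(s, p) for s, row in enumerate(matrix)
             for p, v in enumerate(row) if v is not None]
    batch = max(1, round(fraction * len(cells)))
    errors = {}  # (session, player) -> error from the first time the cell was removed
    while len(errors) < len(cells):
        removed = rng.sample(cells, batch)
        actual = {(s, p): matrix[s][p] for s, p in removed}
        for s, p in removed:
            matrix[s][p] = None          # simulate missing data
        for s, p in removed:
            if (s, p) not in errors:     # keep only the first error per cell
                estimate = impute(matrix, s, p)
                errors[(s, p)] = None if estimate is None else estimate - actual[(s, p)]
        for (s, p), value in actual.items():
            matrix[s][p] = value         # put the removed values back
    per_player = {}
    for (s, p), err in errors.items():
        if err is not None:              # cells that could not be imputed are skipped
            per_player.setdefault(p, []).append(err)
    return {p: math.sqrt(sum(e * e for e in errs) / len(errs))
            for p, errs in per_player.items()}

# Hypothetical jumps-per-hour matrix: 3 sessions (rows) by 3 players (columns).
matrix = [[30.0, 40.0, 50.0], [20.0, 30.0, 40.0], [10.0, 20.0, 30.0]]
rmse = simulate_rmse(matrix, team_mean)
print(rmse)
```

In this toy matrix, player 1 always sits exactly at the team mean and is imputed without error, while players 0 and 2 are always 15 jumps per hour away from the mean of their teammates.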

The values from all previously described single imputation methods were then used as predictor variables in a dataset labelled with the actual value for each player-session. A single value for an individual’s workload data was then predicted using two machine learning models (Jerez et al., 2010):

Team Machine Learning. The testing dataset was all data from one individual. A training dataset was constructed using data from all other participants on the same team. The training dataset was used to train a least-squares boosted regression tree ensemble (fitrensemble function in MATLAB; number of learning cycles: 30, minimum leaf size: 8, learning rate: 0.1) to predict load. The team-based model was then used to predict all values in the testing dataset. Performance was reported as the RMSE for all sessions in the testing dataset, and this process was repeated so that everyone on the team was in the testing dataset once.
Individual Machine Learning. A similar model as in the Team Machine Learning method was used; however, only data from one individual was used to train and cross-validate the model, with the number of folds equal to the number of sessions for that individual (i.e., in each fold, one session was predicted based on a model built from all other sessions of that individual). Performance was reported as the RMSE for all sessions, and this process was repeated so that one model was generated for everyone on the team.
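
The leave-one-player-out structure of the Team Machine Learning method can be illustrated with a small least-squares boosting routine in plain Python. This is a simplified stand-in, not the MATLAB fitrensemble model used in the study: it boosts depth-one trees (stumps), uses only two hypothetical predictor columns, and runs on synthetic data. The number of cycles, learning rate and minimum leaf size are taken from the description above.

```python
import math

def fit_stump(X, residuals, min_leaf):
    """Best single split (feature, threshold) by squared error, or None."""
    best, n = None, len(X)
    for f in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][f])
        for k in range(min_leaf, n - min_leaf + 1):
            lo, hi = order[:k], order[k:]
            if X[lo[-1]][f] == X[hi[0]][f]:
                continue  # cannot split between equal feature values
            left = sum(residuals[i] for i in lo) / len(lo)
            right = sum(residuals[i] for i in hi) / len(hi)
            sse = (sum((residuals[i] - left) ** 2 for i in lo)
                   + sum((residuals[i] - right) ** 2 for i in hi))
            if best is None or sse < best[0]:
                best = (sse, f, (X[lo[-1]][f] + X[hi[0]][f]) / 2, left, right)
    return best

def fit_boosted(X, y, cycles=30, learning_rate=0.1, min_leaf=8):
    """Least-squares boosting: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred, stumps = [base] * len(y), []
    for _ in range(cycles):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals, min_leaf)
        if stump is None:
            break  # too little data for any valid split
        _, f, thr, left, right = stump
        stumps.append((f, thr, left, right))
        pred = [pi + learning_rate * (left if row[f] <= thr else right)
                for pi, row in zip(pred, X)]
    return base, stumps, learning_rate

def predict(model, row):
    base, stumps, lr = model
    return base + lr * sum(l if row[f] <= thr else r for f, thr, l, r in stumps)

def leave_one_player_out(rows):
    """rows: (player, predictors, actual). Returns the RMSE for each held-out player."""
    result = {}
    for held_out in sorted({player for player, _, _ in rows}):
        train = [(x, y) for player, x, y in rows if player != held_out]
        test = [(x, y) for player, x, y in rows if player == held_out]
        model = fit_boosted([x for x, _ in train], [y for _, y in train])
        result[held_out] = math.sqrt(
            sum((predict(model, x) - y) ** 2 for x, y in test) / len(test))
    return result

# Synthetic data: predictors are [team-mean estimate, individual-mean estimate].
rows = []
for player, ind_est in (("A", 24.0), ("B", 25.0), ("C", 26.0)):
    for s in range(12):
        rows.append((player, [20.0 + s, ind_est], 15.0 if s < 6 else 35.0))
result = leave_one_player_out(rows)
print(result)
```

The Individual Machine Learning variant would keep the same model but replace the outer loop with leave-one-session-out folds within a single player's data.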

With the eight single imputation methods and two machine-learning based methods, a total of ten single imputation methods were evaluated. Additionally, the number of times a value could not be imputed due to too much missing data was reported for each participant and each imputation method. This entire process was repeated a second time, during which only sessions of the same type (practice or game) were used to impute the removed data. For example, to impute a removed value from a game, only game sessions would be used in each imputation method.

Statistical analysis

A generalized estimating equation determined the effect of single imputation method on RMSE for jumps per hour and RPE. This was done separately for imputation using all sessions and imputation using the context of game or practice. In the case of a significant (p < 0.05) effect of single imputation method, all pairwise comparisons were evaluated using a Bonferroni correction for multiple tests (number of conditions = 10; number of independent pairwise comparisons = 45; adjusted α = 0.0011). The generalized estimating equation and follow up tests were conducted in SPSS (v26.0.0.0, SPSS, Inc., Chicago, IL).
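
The Bonferroni-adjusted threshold follows directly from the number of methods, as this short Python check shows (variable names are illustrative):

```python
from math import comb

methods = 10                      # single imputation methods compared
alpha = 0.05                      # unadjusted significance level
pairs = comb(methods, 2)          # number of pairwise comparisons
adjusted_alpha = alpha / pairs    # Bonferroni-adjusted threshold

print(pairs, round(adjusted_alpha, 4))
```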

Multiple Imputation Evaluation

Data processing

Multiple imputation of the original dataset, with all known and actual missing workload data, was used to examine trends in longitudinal workload data. There was no simulation of missing data for the multiple imputation analysis; rather, the actual missing jumps per hour and RPE were imputed in separate analyses. The eight predictor variables used in the regression models described above (Recent Session 1, Recent Session 5, Individual Mean, Individual Random, Individual Weighted, Sex Mean, Team Mean, Team Mean Weighted) were calculated for jumps per hour and RPE for every participant-session with full participation. The missingness of the workload variables was described using Little’s Missing Completely At Random test and separate variance t-tests, with the assumption of equal variances checked using Levene’s test, for each of the predictor variables (Garson, 2015).

A linear regression model with all eight predictor variables was used for five imputations of the missing workload values. Constraints on the dependent variables were a minimum of 0 jumps per hour, and a minimum of 1 and a maximum of 10 for RPE. For the original and each of the five imputed datasets, session workload was computed: the total number of jumps in a session was calculated as session duration in hours times jumps per hour, and the session RPE (sRPE) was calculated as the session duration in minutes times RPE. The season workload was the sum of all session workloads for each participant.
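
The session and season workload calculations can be written out as a short Python sketch. The session values and function names are hypothetical. Clamping an out-of-range value to the stated bounds is used here only as a simple stand-in for the constraints; it is an assumption and may differ from how the statistical software enforces them during imputation.

```python
def clamp(value, low, high=None):
    """Restrict a value to [low, high]; high=None means no upper bound."""
    value = max(low, value)
    return value if high is None else min(high, value)

def session_jumps(duration_h, jumps_per_hour):
    """Total jumps in a session: duration (hours) times jumps per hour (minimum 0)."""
    return duration_h * clamp(jumps_per_hour, 0.0)

def session_srpe(duration_h, rpe):
    """sRPE: duration (minutes) times RPE, with RPE kept on the 1-10 scale."""
    return duration_h * 60.0 * clamp(rpe, 1.0, 10.0)

# Hypothetical sessions for one player: (duration in hours, jumps per hour, RPE).
# The second row mimics imputed values that fall outside the allowed ranges.
sessions = [(1.5, 30.0, 6.0), (2.0, -4.0, 11.0), (1.0, 20.0, 3.0)]

season_jumps = sum(session_jumps(d, j) for d, j, _ in sessions)
season_srpe = sum(session_srpe(d, r) for d, _, r in sessions)
print(season_jumps, season_srpe)
```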

Statistical analysis

A Pearson correlation was used to evaluate the association between jump count season load and sRPE season load. The missingness analyses, multiple imputation, data aggregation and correlation were conducted in SPSS (v26.0.0.0, SPSS, Inc., Chicago, IL) to account for the original and multiple imputation datasets.
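
For reference, the Pearson correlation between two sets of season loads can be computed as in the Python sketch below. The season load values are invented for illustration, and the sketch covers a single dataset only; it does not reproduce how SPSS pools results across the imputed datasets.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical season loads for five players.
season_jump_load = [1200.0, 1500.0, 900.0, 2000.0, 1700.0]
season_srpe_load = [21000.0, 25000.0, 16000.0, 33000.0, 30000.0]

r = pearson_r(season_jump_load, season_srpe_load)
print(round(r, 3))
```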

RESULTS

The average amount of missing jump count data for a team ranged from 25.5% to 93.1% of all player-sessions with full participation and from 6.7% to 48.1% of player-sessions with full participation and data recorded. Similarly, RPE data were missing for between 34.1% and 92.4% of all player-sessions with full participation and between 5.8% and 53.4% of player-sessions with full participation and data recorded. By removing 1% of the data recorded for the single imputation analysis, the effective amount of missing data for a team was between 7.7% and 49.4% for jump count and between 6.8% and 54.6% for RPE (Table 2).

There was a significant effect of single imputation method on RMSE for known jumps per hour and RPE when imputation was done with all sessions and either games or practices (all sessions, jump count: χ² (9) = 424.4, p < 0.001; all sessions, RPE: χ² (9) = 460.0, p < 0.001; games or practices, jump count: χ² (9) = 448.3, p < 0.001; games or practices, RPE: χ² (9) = 585.9, p < 0.001). In all cases, the team machine learning method had a significantly lower RMSE (all sessions: 8.5 jumps/hour error, 1.1 on RPE scale error; games or practices: 9.6 jumps/hour error, 1.2 on RPE scale error) than all other methods (Figure 1 and Figure 2). For single imputation of both jumps per hour and RPE, the next best methods were team mean (all sessions: 11.7 jumps/hour error, 1.4 on RPE scale error; games or practices: 11.7 jumps/hour error, 1.4 on RPE scale error) and individual machine learning (all sessions: 12.2 jumps/hour error, 1.4 on RPE scale error; games or practices: 13.3 jumps/hour error, 1.5 on RPE scale error), and team mean weighted (1.5 on RPE scale error) was also tied as the next best method for single imputation of RPE for analysis done on all sessions.

At least one session per player could not be imputed when the single imputation method was based on values from up to five recent sessions. The value from the previous session was not available for an average of four sessions per player. For most single imputation methods, the number of sessions that could not be imputed increased when using either games or practices rather than using all sessions to impute the missing data (Table 3). An interactive visualization of this dataset is available to view how single imputation error changes with the percent of missing data, session context (all sessions vs. games or practices), and for each variable and sex (https://public.tableau.com/profile/lauren.benson#!/vizhome/ImputationError/ImputationError).

The data in the original dataset, with all known and actual missing workload data, are not missing completely at random based on the significant result of Little’s Missing Completely At Random test for jumps per hour (χ² (92) = 325.5, p < 0.001) and RPE (χ² (80) = 418.9, p < 0.001). There was a significant effect of missing data on five of the predictor variables for jumps per hour (Recent Session 5, Individual Mean, Individual Weighted, Sex Mean, Team Mean) and six of the predictor variables for RPE (Recent Session 5, Individual Mean, Individual Weighted, Sex Mean, Team Mean, Team Mean Weighted), indicating that the data are missing at random.

There was a significant and strong association between jump count season load and sRPE season load in the original and imputed datasets (Table 4).

DISCUSSION

The objectives of this study were to identify the best single imputation methods for a dataset of athlete external and internal workload throughout a youth basketball season, and to use multiple imputation to examine trends in longitudinal workload data. The hypothesis that the best single imputation method would most closely represent the case of the missing data was supported. However, the hypothesis that only using data from a similar context (i.e., games or practices) would improve the accuracy of the imputation methods was not supported. The best single imputation method for both jumps per hour and RPE was the team mean of the session for which data were missing, resulting in an error of about 11.7 jumps/hour and about 1.4 on the RPE scale (range: 1-10). Additionally, a machine learning-based combination of methods that utilized even more information about the individual and the session performed better than any single method, reducing the error to about 8.5 jumps/hour and about 1.1 on the RPE scale. To put these numbers in perspective, an error of 8.5 jumps/hour represents about 27% of the known jump rate for players on the example team in Table 1, and an error of 1 on the RPE scale is the difference between perceiving a session to be “Very, Very Easy” and “Easy”, or between “Somewhat Hard” and “Hard” (Haddad et al., 2017). While the errors related to single imputation may seem substantial for an individual session, the association between jump count and sRPE accumulated across the entire season was maintained between the original and multiple imputed datasets.

The best single imputation method (team mean) related to the specific context of the session (i.e., the team’s practice or game on that day) rather than the individual’s context (e.g., season mean or recent workload). That workload depends more on session than an individual’s tendencies is likely due to variability in demands of different sessions across a youth basketball season. For example, a session the day before a game should have a different workload than a session that is several days before the next game. This rationale may be the impetus in previous research for utilizing different imputation methods for game (individual mean) and training (team mean) sessions (Bowen et al., 2019; Bowen et al., 2017; Colby et al., 2014; Jaspers et al., 2018; Vescovi and Klas, 2018).

It was expected that combining session and individual information would provide additional context. Similar to the approach used in professional volleyball by Skazalski et al. (2018), the weighted team mean was based on the average workload ratios between players. However, this method was not better than the unweighted team mean for imputing RPE and was much worse for imputing jumps per hour. This unexpected result suggests that the workload ratio between players is more consistent across sessions for RPE than jumps per hour in youth basketball. A weighted team mean imputation method might have better success with a more comprehensive measure of external load that accounts for running and other physical work in addition to jumping. It is also possible that the way in which the team mean was weighted could be modified to yield a better imputation method, or a “hot deck” approach could be used to replace the missing data from the known value of the most similar player(s) (Patrician, 2002).

One limitation to using the team mean or a weighted team mean as a single imputation method is that these methods cannot always be calculated, particularly in a session when data are missing for the entire team. This situation rarely occurred in this study (see Table 3); however, that is a by-product of the study design. The data used in this analysis only included sessions where at least one player’s external or internal load was recorded. For the example team in Table 1, jump count data were recorded but RPE data are missing for the entire team during sessions 36 and 38, and so the team mean or weighted team mean methods could not be used to impute RPE. However, as shown in Table 2, of the 54 sessions that this team held throughout the season, both types of workload data were not collected in 16 sessions (30%). Likewise, when there is a long time between sessions where data were collected, single imputation methods based on recent sessions cannot be calculated. In these cases, an alternative single imputation method would have to be used.

Using several predictor variables in a regression model was better than using individual single imputation methods for missing workload data. It is likely that context from the session and individual as well as across other players of the same sex led to the improved performance. The difference between the individual and team machine learning methods is in the quantity and origin of the training data. The team machine learning method contained more data, namely all player-sessions from the entire team except for the one individual being tested, thus the team machine learning model was less likely to be overfit when applied to the test data. In contrast, the individual machine learning model contained only the sessions of the individual being tested so the context was likely more relevant. Based on the results presented here, the team machine learning model performed better than the individual machine learning model. The individual machine learning model performed just as well as the team mean and better than all other single imputation methods.

In the analysis that used only data from sessions in a similar context (i.e., games or practices) to impute missing values, the imputation error was not better than that of the original analysis based on all sessions. It was expected that differences between practices and games would provide additional context and improve the prediction of missing data. For example, a player that does not play much in games might have higher rate of jumps per hour in practices than games, or a player may perceive a harder effort in games compared to practices. The lack of observed differences between the analyses with all sessions and games or practices suggests that differences in jump count and RPE between practices and games are not systematic in youth (high school) basketball and therefore using only data in this context is not useful for imputation. However, it may be that only using practice or game data to impute missing values is beneficial for just one scenario, as is done in studies where the individual mean is used to predict missing game data and the team mean is used to predict missing practice data (Bowen et al., 2019; Bowen et al., 2017; Colby et al., 2014; Jaspers et al., 2018; Vescovi and Klas, 2018). It is also possible that by only using data from games or practices, the reduced quantity of data contributed to a weaker prediction thus negating any potential benefits of the additional context. Perhaps longer monitoring periods (e.g., multiple years) would improve the performance of more context-based imputation.

Implementation of imputation methods will be of interest to both sport practitioners and researchers interested in longitudinal monitoring of athlete workload, however, it is reasonable to question the practicality of the best imputation methods in all situations. Using several predictor variables in a regression model works best when a lot of data can be used to train the model, which is likely not feasible for a coach attempting to impute missing data early in a season. Additionally, proper execution of a regression model or analyses based on multiple imputation may be difficult for sport practitioners without the use of validated software. In these cases, a single imputation method that most closely represents the case of the missing data may be the most practical approach for imputing missing workload data.

A few limitations and areas for future study are acknowledged. First, only player-sessions where there was full participation were used, ignoring the effects of partial participation. Another limitation is that other available non-workload information, which might reduce the imputation error, was ignored. Other recorded variables such as playing position, session duration, injury status, height, etc. could potentially be used in a regression model to impute missing data (Gelman and Hill, 2006; Patrician, 2002; Van Buuren, 2018). Future investigations may consider the number of variables and the importance of each variable in models to predict missing data. Also, measures of internal workload commonly report session RPE (sRPE), which is the session duration in minutes times the reported RPE value, instead of just the RPE value (Bourdon et al., 2017). For the purposes of this study, predicting internal workload was only reliant on estimating the actual RPE value, since the session duration was a known value, and sRPE was calculated once the RPE values were imputed.

CONCLUSION

In conclusion, this study has shown differences in error depending on the method used for single imputation of missing workload data. A single imputation method based on the specific context of the session for which data are missing (team mean) is only outperformed by methods that combine information about the session and the individual (machine learning models). The amount and nature of the missing data should be considered when choosing a method for single imputation of workload data in youth basketball. Multiple imputation using several predictor variables in a regression model can be used for analyses where workload is accumulated across an entire season.

ACKNOWLEDGEMENTS

The authors thank the coaches, athletes and team representatives of the Calgary Senior High School Athletic Association, and John Choi for assistance with data collection. The Sport Injury Prevention Research Centre is one of the International Research Centres for Prevention of Injury and Protection of Athlete Health supported by the International Olympic Committee. This study was funded by the National Basketball Association and General Electric Healthcare (NBA/GE). LCB is funded through a Canadian Institutes of Health Research Postdoctoral Fellowship (MFE – 164608). CAE is supported by a Canada Research Chair (Tier 1). The experiments comply with the current laws of the country in which they were performed. The authors have no conflict of interest to declare. The datasets generated and/or analyzed during the current study are not publicly available, but are available from the corresponding author, who was an organizer of the study.

AUTHOR BIOGRAPHY

Lauren C. Benson
Employment: United States Olympic & Paralympic Committee
Degree: PhD
Research interests: Biomechanics, Wearables
E-mail: lauren.benson@ucalgary.ca

Carlyn Stilling
Employment: University of Calgary
Degree: BS
Research interests: Sport Injury Prevention
E-mail: cmstilli@ucalgary.ca

Oluwatoyosi B.A. Owoeye
Employment: Saint Louis University
Degree: PT, PhD
Research interests: Sport Injury Prevention
E-mail: olu.owoeye@health.slu.edu

Carolyn A. Emery
Employment: University of Calgary
Degree: PT, PhD
Research interests: Sport Injury Prevention
E-mail: caemery@ucalgary.ca

REFERENCES

Benson L.C., Tait T.J., Befus K., Choi J., Hillson C., Stilling C., Grewal S., MacDonald K., Pasanen K., Emery C.A. (2020) Validation of a commercially available inertial measurement unit for recording jump load in youth basketball players. Journal of Sports Sciences 38, 928-936.

Black G.M., Gabbett T.J., Johnston R.D., Cole M.H., Naughton G., Dawson B. (2018) The Influence of Physical Qualities on Activity Profiles of Female Australian Football Match Play. International Journal of Sports Physiology & Performance 13, 524-529.

Bourdon P.C., Cardinale M., Murray A., Gastin P., Kellmann M., Varley M.C., Gabbett T.J., Coutts A.J., Burgess D.J., Gregson W. (2017) Monitoring athlete training loads: consensus statement. International Journal of Sports Physiology and Performance 12, 161-170.

Bowen L., Gross A.S., Gimpel M., Bruce-Low S., Li F.X. (2019) Spikes in acute:chronic workload ratio (ACWR) associated with a 5-7 times greater injury rate in English Premier League football players: a comprehensive 3-year study. British Journal of Sports Medicine 54, 731-738.

Bowen L., Gross A.S., Gimpel M., Li F.X. (2017) Accumulated workloads and the acute:chronic workload ratio relate to injury risk in elite youth football players. British Journal of Sports Medicine 51, 452-459.

Colby M.J., Dawson B., Heasman J., Rogalski B., Gabbett T.J. (2014) Accelerometer and GPS-derived running loads and injury risk in elite Australian footballers. Journal of Strength & Conditioning Research 28, 2244-2252.

Curtis R.M., Huggins R.A., Looney D.P., West C.A., Fortunati A., Fontaine G.J., Casa D.J. (2018) Match demands of National Collegiate Athletic Association Division I men’s soccer. Journal of Strength & Conditioning Research 32, 2907-2917.

DeWitt J.K., Gonzales M., Laughlin M.S., Amonette W.E. (2018) External loading is dependent upon game state and varies by position in professional women’s soccer. Science & Medicine in Football 2, 225-230.

Drew M.K., Finch C.F. (2016) The relationship between training load and injury, illness and soreness: a systematic and literature review. Sports Medicine 46, 861-883.

Duhig S., Shield A.J., Opar D., Gabbett T.J., Ferguson C., Williams M. (2016) Effect of high-speed running on hamstring strain injury risk. British Journal of Sports Medicine 50, 1536-1540.

Eckard T.G., Padua D.A., Hearn D.W., Pexa B.S., Frank B.S. (2018) The relationship between training load and injury in athletes: a systematic review. Sports Medicine 48, 1929-1961.

Enders C.K. (2010) Applied missing data analysis. Guilford Press.

Esmaeili A., Hopkins W.G., Stewart A.M., Elias G.P., Lazarus B.H., Aughey R.J. (2018) The individual and combined effects of multiple factors on the risk of soft tissue non-contact injuries in elite team sport athletes. Frontiers in Physiology 9, 1280.

Foster C., Daines E., Hector L., Snyder A.C., Welsh R. (1996) Athletic performance in relation to training load. Wisconsin Medical Journal 95, 370-374.

Foster C., Florhaug J.A., Franklin J., Gottschall L., Hrovatin L.A., Parker S., Doleshal P., Dodge C. (2001) A new approach to monitoring exercise training. The Journal of Strength & Conditioning Research 15, 109-115.

Gabbett T.J. (2018) Debunking the myths about training load, injury and performance: empirical evidence, hot topics and recommendations for practitioners. British Journal of Sports Medicine 54, 58-66.

Garson G.D. (2015) Missing values analysis and data imputation. Asheboro, NC: Statistical Associates Publishers.

Gelman A., Hill J. (2006) Missing-data imputation. In: Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press. 529-544.

Haddad M., Stylianides G., Djaoui L., Dellal A., Chamari K. (2017) Session-RPE Method for Training Load Monitoring: Validity, Ecological Usefulness, and Influencing Factors. Frontiers in Neuroscience 11, 612.

Ibrahim J.G., Molenberghs G. (2009) Missing data methods in longitudinal studies: a review. Test (Madrid, Spain) 18, 1-43.

Impellizzeri F.M., Marcora S.M., Coutts A.J. (2019) Internal and external training load: 15 years on. International Journal of Sports Physiology and Performance 14, 270-273.

Jaspers A., Kuyvenhoven J.P., Staes F., Frencken W.G.P., Helsen W.F., Brink M.S. (2018) Examination of the external and internal load indicators’ association with overuse injuries in professional soccer players. Journal of Science and Medicine in Sport 21, 579-585.

Jerez J.M., Molina I., García-Laencina P.J., Alba E., Ribelles N., Martín M., Franco L. (2010) Missing data imputation using statistical and machine learning methods in a real breast cancer problem. Artificial Intelligence in Medicine 50, 105-115.

Jones C.M., Griffiths P.C., Mellalieu S.D. (2017) Training load and fatigue marker associations with injury and illness: a systematic review of longitudinal studies. Sports Medicine 47, 943-974.

Lupo C., Tessitore A., Gasperi L., Gomez M. (2017) Session-RPE for quantifying the load of different youth basketball training sessions. Biology of Sport 34, 11-17.

Lupo C., Ungureanu A.N., Frati R., Panichi M., Grillo S., Brustio P.R. (2019) Player Session Rating of Perceived Exertion: A More Valid Tool Than Coaches’ Ratings to Monitor Internal Training Load in Elite Youth Female Basketball. International Journal of Sports Physiology & Performance 15, 548-553.

Martín-García A., Casamichana D., Gómez Díaz A., Cos F., Gabbett T.J. (2018) Positional Differences in the Most Demanding Passages of Play in Football Competition. Journal of Sports Science & Medicine 17, 563-570.

Moran L.R., Hegedus E.J., Bleakley C.M., Taylor J.B. (2019) Jump load: capturing the next great injury analytic. British Journal of Sports Medicine 53, 8-9.

Patrician P.A. (2002) Multiple imputation for missing data. Research in Nursing & Health 25, 76-84.

Rago V., Brito J., Figueiredo P., Krustrup P., Rebelo A. (2019) Relationship between External Load and Perceptual Responses to Training in Professional Football: Effects of Quantification Method. Sports (Basel, Switzerland) 7, 68.

Skazalski C., Whiteley R., Bahr R. (2018) High jump demands in professional volleyball-large variability exists between players and player positions. Scandinavian Journal of Medicine & Science in Sports 28, 2293-2298.

Smpokos E., Mourikis C., Linardakis M. (2018a) Differences in motor activities of Greek professional football players who play most of the season (2016/17). Journal of Physical Education & Sport 18, 490-496.

Smpokos E., Mourikis C., Linardakis M. (2018b) Seasonal physical performance of a professional team’s football players in a national league and European matches. Journal of Human Sport & Exercise 13, 720-730.

Soligard T., Schwellnus M., Alonso J.M., Bahr R., Clarsen B., Dijkstra H.P., Gabbett T., Gleeson M., Hagglund M., Hutchinson M.R., Janse van Rensburg C., Khan K.M., Meeusen R., Orchard J.W., Pluim B.M., Raftery M., Budgett R., Engebretsen L. (2016) How much is too much? (Part 1) International Olympic Committee consensus statement on load in sport and risk of injury. British Journal of Sports Medicine 50, 1030-1041.

Vahia D., Kelly A., Knapman H., Williams C.A. (2019) Variation in the Correlation Between Heart Rate and Session Rating of Perceived Exertion-Based Estimations of Internal Training Load in Youth Soccer Players. Pediatric Exercise Science 31, 91-98.

Van Buuren S. (2018) Flexible imputation of missing data. CRC Press.

Van der Worp H., de Poel H.J., Diercks R.L., Van Den Akker-Scheek I., Zwerver J. (2014) Jumper’s knee or lander’s knee? A systematic review of the relation between jump biomechanics and patellar tendinopathy. International Journal of Sports Medicine 35, 714-722.

Vescovi J.D., Klas A. (2018) Accounting for the warm-up: describing the proportion of total session demands in women’s field hockey - Female Athletes in Motion (FAiM) study. International Journal of Performance Analysis in Sport 18, 868-880.

Wang C., Vargas J.T., Stokes T., Steele R., Shrier I. (2020) Analyzing Activity and Injury: Lessons Learned from the Acute: Chronic Workload Ratio. Sports Medicine, 1-12.

Wellman A.D., Coad S.C., McLellan C.P., Goulet G.C. (2017) Quantification of accelerometer derived impacts associated with competitive games in National Collegiate Athletic Association Division I college football players. Journal of Strength & Conditioning Research 31, 330-338.

Whitehead S., Till K., Weaving D., Hunwicks R., Pacey R., Jones B. (2019) Whole, half and peak running demands during club and international youth rugby league match-play. Science & Medicine in Football 3, 63-69.

Windt J., Ardern C.L., Gabbett T.J., Khan K.M., Cook C.E., Sporer B.C., Zumbo B.D. (2018) Getting the most out of intensive longitudinal data: a methodological review of workload–injury studies. British Medical Journal Open 8, e022626.

Windt J., MacDonald K., Taylor D., Zumbo B.D., Sporer B.C., Martin D.T. (2020) “To Tech or Not to Tech?” A Critical Decision-Making Framework for Implementing Technology in Sport. Journal of Athletic Training 55, 902-910.
