Research article, (2023) 22, 475-486. DOI: https://doi.org/10.52082/jssm.2023.476
Predicting Injury and Illness with Machine Learning in Elite Youth Soccer: A Comprehensive Monitoring Approach over 3 Months
Nils Haller1,2, Stefan Kranzinger3, Christina Kranzinger3, Julia C. Blumkaitis1, Tilmann Strepp1, Perikles Simon2, Aleksandar Tomaskovic2, James O’Brien4, Manfred Düring4, Thomas Stöggl1,4
Key words: Football, artificial intelligence, injury prevention, load management, load monitoring
|
Ethical approval |
The local human ethics committee in Salzburg (GZ 02/2021) approved the experimental design. All procedures were in accordance with the standards of the Declaration of Helsinki of the World Medical Association. Participants were informed about the study both verbally and in writing and gave their written informed consent. |
Participants and setting |
Twenty-five male players (age: 16.6 ± 0.9 years, height: 178 ± 7 cm, weight: 74 ± 7 kg, VO2max: 59 ± 4 ml/min/kg) of an elite European youth soccer team (first national league, UEFA Youth League participant) were included in this study. Following one familiarization session in which participants were informed about the objectives of the study, data were collected over a three-month period from September to December of the 2021/2022 regular season. The researchers had no influence on the training program, and the coaching staff did not receive feedback on preliminary results before the study was completed. A standardized set-up with test stations was used each week to ensure consistency and comparability of measurements. The training focus and the number of training sessions per day were also identical across all weeks, with small fluctuations when additional midweek matches were scheduled. All testing was integrated into the regular training schedule, replicating a real-life scenario for an entire soccer team. Specifically, participants were asked to complete questionnaires each morning (AM) and evening (PM). Strength and conditioning (S&C) training was performed on two mornings each week, with hamstring and hip abductor/adductor performance tests included in the S&C session four days before match day (MD -4). Twice a week (MD -4 and MD -2), venous blood was drawn under resting conditions in a fasted state prior to training, followed by countermovement jump (CMJ) testing. Prior to the start of the study, players were already familiar with the procedures used (i.e., hamstring and abductor/adductor tests, CMJ tests, and questionnaires), with the exception of venous blood sampling. Team soccer training and matches were consistently monitored with a local positioning system (LPS).
Measures
Performance, injury, and illness
Performance data (e.g., distance covered, heart rate, high metabolic power distance (HMPD), training impulse (TRIMP) (Stagno et al., |
Physiological exercise testing prior to season start |
Players performed physiological exercise testing prior to the season to determine maximal oxygen uptake (VO2max), peak running speed (Vpeak) and lactate threshold using a 2-phase (submaximal step-wise and maximal ramp) test as previously described (Stöggl et al., |
Questionnaires |
Using cluster analysis of our pilot study data (Haller et al., |
Nordic hamstring strength |
Eccentric hamstring strength was measured with the Nordic hamstring exercise on the Nordbord device (Vald Performance, Albion, Australia) (Opar et al., |
Hip abduction and adduction strength
Isometric hip abduction and adduction force was measured on MD -4 using the ForceFrame device (Vald Performance, Albion, Australia). After a general warm-up of 12-15 min, players lay barefoot in the supine position with their arms crossed in front of the chest. The hips were positioned in 0° flexion and neutral rotation. For adduction, the medial malleoli were centered over the inner load cells; for abduction, the lateral malleoli were centered over the outer load cells. A single maximal (100%) repetition was performed for both abduction and adduction, because our pilot study showed that the maximal values in both directions occur in the first repetition in the vast majority of cases. Each repetition was held for 5 s, with a 10 s break between repetitions. Verbal encouragement (“3, 2, 1, push, push, push”) was given (Haller et al.,
Neuromuscular performance |
The CMJ, used as a proxy of neuromuscular performance, was performed on a split force plate (ForceDecks, Vald Performance, Albion, Australia) with the hands fixed on the hips. To save time, the jumps were integrated into the 15-min team warm-up on the treadmill: players rotated out to perform their jumps and then resumed treadmill running. The order in which players performed the jumps remained the same throughout the study period. Following two warm-up jumps (performed while waiting for the force plate), two maximal jump attempts were performed in a standardized order (Gathercole et al.,
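The pre-processing section later uses a jump height derived from impulse, but the formula is not given. As a minimal sketch, assuming the common impulse-momentum method, jump height can be estimated from the net vertical impulse between movement onset and take-off: take-off velocity is net impulse divided by body mass, and height is v²/(2g). The function name, its arguments and the rectangle-rule integration are illustrative assumptions, not the force plate software's implementation.

```python
G = 9.81  # gravitational acceleration (m/s^2)


def jump_height_impulse(force_n, body_mass_kg, fs_hz):
    """Estimate CMJ jump height (m) with the impulse-momentum method.

    force_n: vertical ground reaction force samples (N) from movement onset
             (player at rest) up to the last sample before take-off.
    body_mass_kg: body mass from the quiet-standing phase.
    fs_hz: sampling rate of the force plate (Hz).
    """
    dt = 1.0 / fs_hz
    body_weight = body_mass_kg * G
    # Net impulse = integral of (force - body weight) over time (rectangle rule).
    net_impulse = sum((f - body_weight) * dt for f in force_n)
    # Impulse equals change in momentum; the player starts at rest.
    takeoff_velocity = net_impulse / body_mass_kg
    if takeoff_velocity <= 0:
        return 0.0  # no upward take-off velocity, so no flight
    # Rise of the centre of mass after take-off (projectile motion).
    return takeoff_velocity ** 2 / (2 * G)
```

For a 70 kg player with a hypothetical constant net force of 700 N over 0.3 s, the take-off velocity is 3 m/s and the estimated height is about 0.46 m.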
Blood collection |
Venous blood (~3-5 ml) was collected by certified medical staff at rest and in a fasted state on MD -4 and MD -2, and analyzed for i) cell-free DNA (cfDNA) levels, ii) a hematological blood count, and iii) further established blood parameters. For cfDNA analyses, blood was centrifuged immediately after collection at 1600 × g for 10 min, and the plasma was stored at < -20 °C. Briefly, plasma was diluted 1:10 in H2O and served as the template for qPCR based on amplification of a 90-base-pair sequence within the L1PA2 transposon. Samples were analyzed on a CFX384 Touch™ real-time PCR system (Bio-Rad, Munich, Germany) using the following protocol: denaturation at 98 °C for 2 min, 35 cycles of melting at 95 °C for 10 s and annealing at 64 °C for 10 s, followed by a melting curve (Neuberger et al.,
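The description does not state how qPCR quantification cycle (Cq) values were converted into cfDNA concentrations. A common approach for absolute quantification is a standard curve fitted to dilutions of known DNA amounts. The sketch below assumes such a curve and treats the 1:10 plasma dilution as a dilution factor of 10; both are assumptions, and all function names are hypothetical.

```python
def fit_standard_curve(log10_conc, cq):
    """Least-squares fit of Cq = slope * log10(concentration) + intercept."""
    n = len(cq)
    mx = sum(log10_conc) / n
    my = sum(cq) / n
    sxx = sum((x - mx) ** 2 for x in log10_conc)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_conc, cq))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def amplification_efficiency(slope):
    """PCR efficiency implied by the slope (1.0 = perfect doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0


def plasma_concentration(sample_cq, slope, intercept, dilution_factor=10.0):
    """Back-calculate template concentration from Cq, then undo the plasma dilution."""
    template = 10 ** ((sample_cq - intercept) / slope)
    return template * dilution_factor
```

The concentration unit is whatever unit the standards are expressed in; a slope near -3.32 corresponds to an efficiency close to 100%.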
Statistical analysis |
Feasibility was determined by the number of adverse events and discontinuations during the study period. Adherence was calculated as the number of completed tests or questionnaires divided by the total number of scheduled tests or questionnaires, expressed as a percentage (i.e., (completed/scheduled) × 100). In addition, we targeted three classification tasks. First, we evaluated the ability to predict a non-contact injury (yes/no) based on data from the most recent monitoring session. Second, we evaluated the ability to predict illness (yes/no) based on the most recent blood data. Third, we evaluated the association between illness (yes/no) and blood data from the same day, to determine whether a current illness can be identified via the blood variables. For all three classification tasks, we excluded two participants due to missing data (one player was injured during the entire study period; for another player, only 2 weeks of data were available due to injury and illness), resulting in a total of 23 participants. For the analysis, 18 participants were randomly assigned to the training data set and 5 participants to the test data set. This allocation remained the same for all three tasks to facilitate comparison of the results between tasks. Each data point indicates the presence or absence of an injury or illness. For injury prediction, the training data set contained 7 data points with and 1078 without an injury, whereas the test data set contained 4 data points with and 296 without an injury. For illness prediction, the training data set contained 11 data points with and 272 without an illness, whereas the test data set contained 1 data point with and 59 without an illness.
For illness determination, the training data set contained 9 data points with and 168 without an illness, whereas the test data set contained 1 data point with and 41 without an illness, which results in highly imbalanced data sets (
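Because whole participants, not single data points, are assigned to the training or test data set, no player contributes to both. A minimal sketch with hypothetical participant IDs and an arbitrary seed; reusing the same seed reproduces the identical allocation for all three tasks. The helper `imbalance_ratio` simply divides the reported counts.

```python
import random


def split_participants(participant_ids, n_test, seed):
    """Randomly assign whole participants (not single data points) to train/test."""
    rng = random.Random(seed)
    ids = sorted(participant_ids)  # sort so the result does not depend on input order
    test = set(rng.sample(ids, n_test))
    train = [p for p in ids if p not in test]
    return train, sorted(test)


def rows_for(rows, ids):
    """Keep only the data points (dicts with a 'participant' key) of the given players."""
    keep = set(ids)
    return [row for row in rows if row["participant"] in keep]


def imbalance_ratio(n_positive, n_negative):
    """Number of negative data points per positive data point."""
    return n_negative / n_positive
```

With the reported training counts, there are 1078/7 = 154 negative data points per positive one for injury prediction, 272/11 ≈ 24.7 for illness prediction and 168/9 ≈ 18.7 for illness determination.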
Oversampling |
In imbalanced data sets, standard classification methods tend to ignore the minority class and may be dominated by the majority class (Guo et al., In view of the mixed variable types (categorical and numerical variables) in injury prediction, we applied the Synthetic Minority Over-sampling Technique-Nominal Continuous (SMOTE-NC) (Chawla et al., We opted for the Adaptive Synthetic Sampling Approach for Imbalanced Learning (ADASYN) algorithm (He et al.,
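ADASYN generates more synthetic minority samples around minority points that have many majority-class neighbours, i.e., points that are harder to learn. The resampling in this study was done with an R package; the pure-Python sketch below only illustrates the ADASYN steps for numeric features (SMOTE-NC's handling of categorical variables is not shown), and `k`, `beta` and `seed` are illustrative defaults.

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two numeric feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def _nearest(i, pool, X, k):
    """The k indices from `pool` (excluding i itself) closest to X[i]."""
    candidates = [j for j in pool if j != i]
    candidates.sort(key=lambda j: _sq_dist(X[i], X[j]))
    return candidates[:k]


def adasyn(X, y, minority=1, k=5, beta=1.0, seed=0):
    """Return (X, y) extended with ADASYN synthetic minority samples."""
    rng = random.Random(seed)
    all_idx = list(range(len(X)))
    min_idx = [i for i in all_idx if y[i] == minority]
    m_s, m_l = len(min_idx), len(X) - len(min_idx)
    G = round((m_l - m_s) * beta)  # total number of synthetic samples to generate
    if G <= 0 or m_s < 2:
        return [list(row) for row in X], list(y)

    # Step 1: r_i = share of majority-class points among the k nearest
    # neighbours of each minority point, searched in the whole data set.
    r = [sum(y[j] != minority for j in _nearest(i, all_idx, X, k)) / k for i in min_idx]
    total = sum(r)
    # Step 2: normalise to a density; fall back to uniform weights if no
    # minority point has a majority neighbour (a choice made for this sketch).
    r_hat = [ri / total for ri in r] if total > 0 else [1.0 / m_s] * m_s

    X_out, y_out = [list(row) for row in X], list(y)
    for i, weight in zip(min_idx, r_hat):
        # Step 3: g_i = r_hat_i * G points on segments towards minority neighbours.
        neighbours = _nearest(i, min_idx, X, k)
        for _ in range(round(weight * G)):
            z = X[rng.choice(neighbours)]
            lam = rng.random()
            X_out.append([a + lam * (b - a) for a, b in zip(X[i], z)])
            y_out.append(minority)
    return X_out, y_out
```

With `beta=1.0`, the minority class grows roughly to the size of the majority class; rounding of the per-point counts means the final number can differ slightly from G.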
Classification |
For classification, we applied several machine-learning algorithms, such as tree-based methods, naive Bayes, and neural networks. Ultimately, a simple linear SVM (Hearst et al., To assess the importance of variables for each classification task, we employed the caret package (Kuhn, To train the algorithm, we used two-fold cross-validation, tried 10 different values per tuning parameter, and chose the parameter values that yielded the highest area under the ROC curve (AUC). The best model was then used to evaluate the test data. To create the balanced data set, we used the RSBID package (Wu,
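The ROC curve variable importance analysis ranks each variable by how well it separates the two classes on its own. To our understanding, caret falls back on such a per-predictor ROC filter for models without a built-in importance measure, such as a linear SVM; the plain-Python sketch below illustrates the idea rather than caret's exact output (caret, for example, rescales importance to 0-100 by default). AUC is computed as the probability that a random positive case has a higher value than a random negative case, which equals the area under the empirical ROC curve. Using max(AUC, 1 - AUC) as the importance score is an assumption of this sketch, and the variable names in the test are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney formulation.

    Equals the probability that a randomly chosen positive case (label 1)
    has a higher score than a randomly chosen negative case (label 0);
    ties count as one half.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_importance(columns, labels):
    """Rank variables by the AUC each achieves on its own (highest first).

    columns: dict mapping a variable name to its values, aligned with labels.
    max(AUC, 1 - AUC) lets variables that are lower in the positive class
    rank as highly as those that are higher (an assumption of this sketch).
    """
    scored = []
    for name, values in columns.items():
        a = auc(values, labels)
        scored.append((name, max(a, 1.0 - a)))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```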
Performance metrics |
As metrics to evaluate the classification tasks, we chose accuracy, Cohen’s Kappa, precision, and recall. Accuracy is the number of true-positive and true-negative predictions divided by the total number of observations. Precision is the number of true positives divided by the number of predicted positives, whereas recall is the number of true positives divided by the number of actual positive observations. Cohen’s Kappa (Cohen,
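All four metrics can be computed from a 2 × 2 confusion matrix. Cohen's Kappa compares the observed agreement p_o (the accuracy) with the agreement p_e expected by chance from the marginal totals: κ = (p_o - p_e)/(1 - p_e). The counts in the usage example are hypothetical, sized like the injury test set (4 injuries, 296 non-injuries), and show how accuracy can be high while precision and Kappa stay low in imbalanced data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and Cohen's Kappa from a 2 x 2 confusion matrix."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else float("nan")  # undefined without positive predictions
    recall = tp / (tp + fn) if tp + fn else float("nan")  # undefined without positive cases
    # Chance agreement from the marginal totals:
    # actual positives x predicted positives + actual negatives x predicted negatives.
    p_e = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else float("nan")
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "kappa": kappa}
```

For TP = 1, FP = 8, FN = 3 and TN = 288, accuracy is 289/300 ≈ 0.963, precision 1/9 ≈ 0.11, recall 0.25 and Kappa 4/29 ≈ 0.14.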
Data pre-processing |
For the injury prediction task, we used a total of 65 training load variables (all 65 variables are outlined in Appendix 3). To better detect individual anomalies, we scaled the load variables per participant. We also computed the exponentially weighted moving average (EWMA) of each participant's last two training sessions in which the player was not injured. In addition, we calculated the acute:chronic workload ratio (ACWR), i.e., the ratio between the mean value of the respective load variable over the last seven days (x7) and its mean value between the eighth and the 28th day (x8-28) before the respective date. The values of x7 and x8-28 were also included in the evaluation. Due to missing data, it was not possible to calculate EWMA, x7, x8-28 and ACWR (Gabbett, In addition, we used two performance parameters (Vpeak, VO2max), three questionnaire items (sleep quality, sleep duration, and muscle fatigue), two blood variables (creatine kinase (CK) and cfDNA), two jump variables (jump height impulse max, concentric peak force max) and a dummy variable indicating whether the participant received physiotherapy treatment on the respective day. Since these data were collected less frequently than the training load variables recorded by the LPS, we could not simply merge the data sets. Therefore, we used two different approaches, both including these variables via clustering inspired by Rossi et al. ( As information on performance parameters was only available for two time points (before and after the study period), a simple k-means algorithm of the R package factoextra (Kassambara and Mundt,
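A minimal sketch of the derived load features described above, for one participant and one load variable. The section does not specify the scaling method, the EWMA smoothing factor, or whether the reference date counts as day 1; z-scores, a free parameter `alpha`, and counting the reference date as day 1 are assumptions here, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def scale_per_participant(values):
    """z-score one participant's series of a load variable (mean 0, SD 1)."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def ewma(values, alpha):
    """Exponentially weighted moving average; alpha in (0, 1] weights the newest value.

    For the last two sessions, call ewma(sessions[-2:], alpha).
    """
    s = values[0]
    for v in values[1:]:
        s = alpha * v + (1 - alpha) * s
    return s


def acute_chronic(daily_load, day):
    """Return (x7, x8_28, ACWR) for index `day` of a gap-free daily series.

    Assumption: `day` itself counts as day 1, so x7 covers indices
    day-6..day and x8_28 covers indices day-27..day-7 (21 days).
    """
    if day < 27:
        return None  # fewer than 28 days of history
    x7 = mean(daily_load[day - 6 : day + 1])
    x8_28 = mean(daily_load[day - 27 : day - 6])
    return x7, x8_28, (x7 / x8_28 if x8_28 else float("nan"))
```

For a daily series 0, 1, ..., 27, x7 on the last day is 24, x8-28 is 10, and the ACWR is 2.4.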
|
|
Aspects of feasibility |
No adverse events in the form of injury or dropout were noted during the study period, although some players had concerns about repeated blood sampling. |
Results of prediction tasks |
To investigate the most important variables for classification of each task |
|
|
The overall goals of this study were to i) demonstrate the feasibility of a comprehensive monitoring approach fully integrated into the training process, and ii) test the predictive accuracy of a machine learning approach for injury and illness prediction using a combination of external training load data and a variety of objective (neuromuscular performance and strength testing, biomarkers, heart rate) and subjective (questionnaire) internal load and recovery measures in an elite youth soccer team. We demonstrated that it is possible to develop machine learning models that predict injuries and detect and predict illnesses, although precision for the prediction tasks was low.
Principal findings |
In general, the integration of a holistic monitoring approach into the training regime can only succeed if the following factors are present: 1) coach buy-in, 2) cost, time, and logistical prerequisites, 3) team adherence, 4) an interdisciplinary team, and 5) benefits that coaches see in an empirically based measure (Akenhead and Nassis, To develop the best performing model, many variables were taken into account. However, to simplify data collection in the future, the most relevant parameters were identified by a ROC curve variable importance analysis. For injury prediction, the three most important variables were sleep quality, the ACWR of tempo runs (> 19.8 km/h), and CMJ jump height. For illness prediction, ferritin, C-reactive protein (CRP), and the percentage of eosinophils (EOS) were the most important variables; for illness determination, they were CRP, creatinine, and percentage EOS. For injury prediction, one of the four injuries in the test data set was detected, and 96.3% of all data points were classified correctly. For illness prediction and determination, only one illness was present in the test data set, because the same random split of players into training and test data sets was used for all three classification tasks. However, this data point was detected by the linear SVM for both illness prediction and determination. Unfortunately, the model showed rather low precision for both predictive tasks; that is, it tended to produce false-positive injury and illness predictions. The differences in accuracy between illness determination and prediction appear reasonable, because in some cases there was a time lag between blood collection and illness onset, so the accuracy of the prediction task cannot be expected to be perfect. In addition, both COVID-19 infections and illnesses possibly related to the high density of training and competition occurred during the study period.
In practical terms, applying the present model could lead to an over-estimation of players’ risk of injury and illness. On the other hand, the model performed very well in detecting current illness (illness determination), driven mainly by the CRP variable, which showed the highest importance among the blood variables. According to a categorization framework presented by Landis and Koch (Landis and Koch, During the three-month study period, there were too few injuries and illnesses, from a statistical perspective, to develop a better predictive model. Comparable studies covered longer periods of about six months or more and included more injuries in their predictive models (Rommers et al., Since many variables (blood, jumps) were collected less frequently and over a shorter period than the training load data, this information was included in the machine learning models via clusters, resulting in factor variables. Therefore, the information on blood, jumps and fitness was aggregated rather than included in the detail that more frequent measurements would allow. When including the blood, questionnaire and jump data in the injury prediction analysis via longitudinal clusters, we faced a declining willingness to complete the questionnaires. As a result, the number of missing values increased over time, and we could not use these data over the entire survey period. While there was an increase in questionnaire adherence compared to our pilot study (Haller et al.,
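The Landis and Koch framework assigns verbal labels to ranges of Kappa: below 0 poor, 0.00 to 0.20 slight, 0.21 to 0.40 fair, 0.41 to 0.60 moderate, 0.61 to 0.80 substantial, and 0.81 to 1.00 almost perfect. A small lookup sketch; treating each upper bound as inclusive, so that values between the published ranges (e.g. 0.205) fall into the next band, is an assumption.

```python
def landis_koch(kappa):
    """Map Cohen's Kappa to the Landis and Koch (1977) agreement labels."""
    if kappa < 0:
        return "poor"
    bands = ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial"))
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"
```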
Practical applications |
The strength of machine learning approaches is that they do not merely seek linear relationships involving one or two parameters but can specifically consider interactions among many variables, which may be necessary given the multifactorial nature of injury and illness. In the present study, training load data, questionnaire scores and blood variables were found to be potentially associated with impending injury or illness. Unfortunately, the approach does not allow conclusions about the direction of the variables' effects, which might be possible with other statistical methods. While tracking variables have been associated with impending injury, blood variables could be useful for the early detection of illness. Specifically, illness determination showed the best model performance, but precision was low for the prediction tasks. From a practical point of view, this may, for now, lead practitioners to react overcautiously. Hence, it is necessary to test whether the identified important variables persist when the algorithm is fed additional data points, and to observe how the accuracy measures evolve. The long-term goal is to capture the critical variables with minimal effort (i.e., omitting unnecessary methods to minimize human effort and avoid large amounts of data) and minimally invasively, e.g., measuring blood variables with point-of-care devices in the future. Minimally invasive point-of-care methods would also allow blood sampling more often than twice per week; venous blood collection at this frequency is not recommended anyway, as it would impose a considerable burden on the athletes in the long term (Carling et al., Strategies to further reduce the large amount of redundant data (e.g., tracking, blood and CMJ variables) are recommended and discussed elsewhere.
Lastly, it should be noted that the study required a highly professional environment and considerable human resources (two physicians for blood collection, two physiotherapists for hip abduction/adduction testing, and two practitioners for the CMJ). Blood collection required staff and sophisticated analytical equipment, including qPCR methods for the determination of cfDNA. Thus, the approach is feasible for financially strong clubs but, unless the machine learning models perform better, it is neither practical nor cost-effective for non-elite clubs or professional clubs with limited resources.
|
|
A holistic approach to monitoring training load and the response to training load was successfully integrated into regular practice, and many variables potentially indicative of impending injury and illness were identified. Whereas conventional statistical approaches such as regression analyses tend to focus on factors that are linearly associated with injury and illness, our approach considers interactions among a large number of variables potentially associated with illness and injury. Future studies can build on our initial results, using a longer study period with more data points to further train the algorithms and to determine whether the identified variables are truly critical. Subsequently, i) further statistical methods can be used to reveal and interpret the direction of the variables' effects, and ii) the crucial variables can be monitored in practice over a longer period so that changes can be identified on an individual basis and early interventions made.
ACKNOWLEDGEMENTS |
We would like to thank all the players, practitioners, and club officials who agreed to, planned, participated in, helped with, or conducted the study. The study is a cooperation project between the University of Salzburg, the University of Mainz and the Red Bull Athlete Performance Center. The study received funding from the Red Bull Athlete Performance Center for the scientific support of the monitoring concept within which the data for the current study were collected. Christina and Stefan Kranzinger acknowledge the financial support by the Austrian Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation and Technology and Land Salzburg under Contract No. 2021-0.641.557. All experiments comply with the current laws of the country in which they were performed. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The data sets generated and analyzed during the current study are not publicly available but are available from the corresponding author, who was an organizer of the study.
REFERENCES |
|