Table 1. Characteristics of the included studies.
| Study | Sport | Competitive Context | Sample (n, sex, age range, maturation) | Data Domains & Sources | ML Approach (class, algorithms, model development) | Validation Strategy |
|---|---|---|---|---|---|---|
| (Abidin and Erdem, 2025) | Multiple (Football, Basketball, Volleyball, Athletics; plus “Others”) | Sports high school entrance (youth selection, ages 14–16) | n=2222 (620 females/1602 males), 14–16 y, maturation not reported | Device-based physical tests (coordination via Spark, 30 m sprint, vertical jump via JumpR, rhythm test) + coach evaluations (17 criteria: physical, reaction, specialism, psychological) | Deep learning (Shallow Deep Learning for Stage 1; novel Split-Combine-Merge Deep Learning [SCM-DL] for Stage 2); compared with Random Forest, Decision Tree, Extra Trees, SVC; nine feature selection methods (RFE variants, SelectKBest, Lasso, Boruta) | Train/test splits (70/30 and 80/20), k-fold cross-validation (3, 5, 7 folds); comparisons with multiple classifiers |
| (Abidin, 2021) | Football (soccer) | Altınordu Football Academy, U13 youth | n=21 field players (goalkeepers excluded), all male, age ≈ 13 y, maturation not reported; synthetic augmentation expanded to 231 instances | Training performance data via Hit/it Assistant (reaction times, coordination, speed, agility, etc.) + coach evaluations across 18 qualitative/quantitative criteria (converted to numeric) | Supervised learning; seven algorithms tested in WEKA: ANN (MLP), SVM (SMO), Logistic Model Tree (LMT), Logistic regression, Naïve Bayes, Random Forest, CART. Dataset combined real + synthetic instances; preprocessing included normalization and derived position scores (D/M/F) | 10-fold cross-validation |
| (de Almeida-Neto et al., 2023) | Football (soccer) | National-level youth athletes (club teams, ~V-level competition) and sports initiation program | n=75 males, 12–16 y (mean 13.3 ± 1.65), ~13% SI practitioners, 87% athletes; somatic maturation estimated (PHV categories: pre-, circa-, post-PHV) | Morphological (anthropometry, DXA: body mass, height, leg length, sitting height, body composition, BMC/BMD) + neuromuscular (handgrip, medicine ball throw, vertical jump, countermovement jump via force plate) | Supervised deep learning; multilayer perceptron (MLP) artificial neural networks with backpropagation; z-scores used to normalize by sport/age; tested morphological, neuromuscular, and combined models | Train/test split (70/30) with cross-validation (10 repeated runs; all participants rotated through training/testing); ~10,000 training iterations |
| (Altmann et al., 2024) | Football (soccer) | German Bundesliga youth academy (U12–U19) | n=13,876 players (96% male), 11–19 y; maturation not explicitly reported but age categories (U12–U19) considered | Longitudinal match-derived data: ~32 million events across 10 years; position-specific technical/tactical features; aggregated spatiotemporal event-based data | Supervised ML; Gradient Boosted Decision Trees (LightGBM); models built separately per playing position; hyperparameter tuning with Bayesian optimization; features reduced with domain knowledge + automated selection | Nested cross-validation (inner loop for hyperparameter optimization, outer loop for model evaluation); train/test splits by season; temporal separation to avoid leakage |
| (Brown et al., 2024) | Cricket | County Age Group (CAG) programme, final trial stage | n=82 male players, 14–17 y (mean 15.3 ± 1.1); selected n=33, non-selected n=49; ethnicity: White British n=34, British South Asian n=44, Other n=4; maturation estimated (age at PHV, maturity offset) | Multidimensional: (a) physiological & anthropometrical (Yo-Yo test, sprint tests, jumps, planks, body size, weight, PHV), (b) perceptual–cognitive (video occlusion batting test), (c) psychological (PCDEQ + multiple psychosocial questionnaires), (d) participation history (practice/game history, multi-sport), (e) socio-cultural (ethnicity, schooling, siblings, birth quarter, postcode) | Supervised ML: Bayesian binomial regression (rSTAN); dimensionality reduction via correlation clustering → 21 derived features; weak normal prior | Cross-validation not reported; model convergence checks (posterior intervals, n_eff, BFMI) used for validation; sensitivity to ethnicity effects tested with interaction models |
| (Contreras-García et al., 2024) | Basketball | Spanish U14 Minicopa (youth) vs. Liga Endesa (professionals, comparator group) | n=217 U14 male players, 13–14 y; n=391 professional players; maturation not reported | Match-derived shooting charts (field goal attempts by location, 2020–21 & 2021–22 seasons) | Unsupervised ML (k-means and KNN clustering to classify shooting zones); outlier detection (IQR-based model) to identify “specialist shooters” | 5-fold cross-validation for cluster classification; train/test split (20/80) for KNN consistency |
| (Cornforth et al., 2015) | Australian Rules Football | Elite professional players (AFL) | n=44 males, mean age 20 y, ~85.7 kg; maturation not reported | Physiological: daily ECG-derived HRV measures (time-, frequency-, and non-linear domain); Contextual: field size dimensions, match-day temperatures; Performance outcomes: GPS-derived match load, distance, speed zones | Supervised ML regression; seven algorithms in WEKA: Gaussian Processes, Linear Regression, LeastMedSq, Multilayer Perceptron, PLS Classifier, RBF Network, SMOreg; feature selection via PCA vs. wrapper subset + Genetic Algorithm | 10-fold cross-validation; train/holdout splits tested |
| (Craig and Swinton, 2021) | Football (soccer) | Elite Scottish soccer academy (U10–U17), 10-year follow-up | n=512 male players, aged 10–17 at entry; 100 awarded pro contracts; maturation not directly reported; strong relative age effect observed | Anthropometric (height, weight, BMI) and physical performance (5, 10, 20 m sprint times; countermovement jump; Yo-Yo IR1) collected longitudinally (1–14 sessions/player) | Supervised ML: LASSO logistic regression (with mixed-effects models for associations); multiple imputation for missing data | 10-fold cross-validation to tune LASSO penalty; bootstrap (10,000 samples); train/test split (2/3–1/3) for predictive evaluation |
| (Duncan et al., 2024) | Football (soccer) | Grassroots club football in England (County FA structure) | n=162 boys, 7–14 y (mean 10.5 ± 2.1); biological maturation via APHV (Moore equation) | Anthropometry; maturity offset (APHV); fundamental movement skills via TGMD-3 with video scoring; perceived physical competence (PPASC); physical fitness (15 m sprint speed with timing gates; standing long jump); coach ratings (technical, social, physical, effort, overall); birth quartile; technical skill test: UGent dribbling (procedural details reported) | Supervised ML: linear, ridge, lasso, random forest, boosted trees; recursive feature elimination; L1/L2 regularisation; collinearity control; Python implementation | Train/validation/test split 80/10/10 per age band; 5-fold cross-validation; age-band stratification to avoid leakage/under-representation |
| (Formenti et al., 2022) | Volleyball | Youth Italian championship, regional vs. provincial levels | n=26 female players (13 regional, 13 provincial), 13–15 y; maturation not reported | Volleyball-specific skill battery (setting, passing, spiking, serving; accuracy + technique); Physical performance (modified T-test COD, CMJ); Cognitive (Flanker task – executive control; Visual Search task – perceptual speed) | Supervised ML: Linear Discriminant Analysis, Logistic Regression, SVM, Decision Tree; features = volleyball skills + physical + cognitive measures | Stratified 5-fold cross-validation |
| (Ge, 2024) | Basketball | Secondary school training teams | n=40 (20 boys, 20 girls), adolescents (~13–15 y); maturation not reported | Physical fitness tests (lung capacity, standing long jump, grip strength, 1000 m run boys / 800 m run girls); ~5000 test data entries used for model training/validation | Unsupervised feature learning (CNN + Autoencoder); Gaussian Mixture Model (EM algorithm for parameter estimation); model termed CNN-AE-MG | Train/test split 4:1 (4000/1000 records); ablation comparisons vs. CNN, CNN-AE, CNN-AE-SG; consistency tested with Bland-Altman plots |
| (Gogos et al., 2020) | Australian Rules Football | AFL U18 National/State/other combines; relates combine to senior career outcomes (retired/delisted cohort) | n=1,488 combine attendees (1999–2016); summary models on n=536 with ≥1 AFL player rating; mean age ≈ 18.5 y | Combine anthropometrics & physical tests (e.g., 20 m sprint, Yo-Yo IR, jumps), plus draft order & position; career outcomes from AFLTables & Champion Data | Linear models for ratings/rankings; boosted regression trees for matches played (gradient-boosted ML) | Model fit assessed with BIC; no external validation; retrospective explanatory analysis |
| (Jauhiainen et al., 2019) | Football (soccer) | National TID database; focus on 14-y Finnish juniors and “academy player” labelling | n=951 14-year-old boys; minority “academy” class n≈14; tests/events 2011–2017 | Physical tests (technical, speed, agility) + self-assessment (perceived competence, tactical skills, motivation) collected at biannual events; several data representations (phys, quest, combined) | One-class SVM (RBF) framed as anomaly detection to flag potential elite; PCA for decorrelation; k-NN imputation | Performance evaluated with AUC-ROC on held-out labels after unsupervised training; mean AUC ~0.763 across hyperparameters |
| (Jennings et al., 2024) | Australian Rules Football | Elite-junior AFL talent pathway; prospective prediction of 2021 National Draft | n=708 males; train 2017–2020 (n=465), prospective test 2021 (n=243) | Physical testing, in-game movement (GPS), and technical involvements; league-wide multi-season dataset | Logistic regression vs. neural networks to classify drafted vs. not drafted; operating at multiple cut-off thresholds | Prospective external hold-out (2017–20→2021) with sensitivity/specificity/accuracy comparisons |
| (Kelly et al., 2022) | Football (soccer) | English professional academy; U9–U16 development and U18 selection/deselection (contract) | Study 1: n=98, U9–U16; Study 2: n=18, U18 (male) | Multidomain 53 features across 8 methods over 2 seasons (technical/tactical, physical, psychological, social; e.g., PCDEQ, maturation %PAH, match hours) | Penalized regression (cross-validated LASSO via glmnet) predicting (a) review ratings; (b) achieving a pro contract | Cross-validation (CV) noted for LASSO; internal only |
| (Kilian et al., 2023) | Football (soccer) | Youth elite soccer talent-promotion program (DFB); methodological evaluation on real program data | Sample details not fully specified in abstract text; applied to a set of multidimensional performance assessments within the program (youth cohort) | Multidomain performance battery used for latent factor structure; evaluation contrasts with PCA; study funded by DFB talent program | Deep latent-variable factor model: VAE estimator with importance-weighted variational inference + normalizing-flow priors; linear, identifiable measurement model (generalized EFA) | Robustness discussed; no classic predictive CV; focus is dimensionality reduction and identifiability (not a selection classifier) |
| (López-De-Armentia, 2024) | Football (soccer) | Multi-league women’s scouting context; data scarcity/coverage issues addressed by tool | ~12,000 players tracked across ~30 leagues; basic roster & participation info (adults and youth) | Aggregated web-sourced player metadata (age, position, height, market value, contracts, injuries) and minutes played; alert generation pipeline | Rule-/criteria-driven alerts; “AI-powered” extraction mentioned, but no supervised model for TID classification is specified | Expert usability evaluation; no predictive CV/hold-out |
| (Owen et al., 2022) | Rugby Union | Regional age-grade academy selection (U16 & U18) in North Wales; talent camps | n=104 male; mean age 15.47 ± 0.80 y; U16 n=62; U18 n=42; 66 selected/38 not selected | 21 physiological (demographics, anthropometrics, sprint/power, grip, etc.) + 47 psychosocial (burnout, motivation, trait measures, EI, coping) assessed at selection days | Bayesian pattern-recognition pipeline to classify selected vs. non-selected; position-specific models (forwards/backs) | Leave-one-out cross-validation (LOOCV) to minimize overfitting; internal validation only |
| (Razali et al., 2017) | Football (soccer) | Bukit Jalil Sports School (academy) | n=100; 15–17 y; sex not reported | Coach-rated physical, mental, and technical skills (1–10); Football Player Information System (BJSS) | Supervised classification; Bayesian Networks, Decision Tree, k-NN; WEKA implementation; goalkeepers excluded | Leave-one-out CV (small sample size) |
| (Retzepis et al., 2024) | Team sports | Preadolescent (≈ 11 y) team-sport athletes | n≈92; ~11 y; sex not reported | Anthropometry & motor tests (e.g., leg length, sitting height, weight, jumps) used to classify PHV | Supervised classification; Random Forest, Logistic Regression, Neural Network; forward feature selection with stratified 10-fold CV | 10-fold stratified cross-validation (feature selection & tuning) |
| (Sandamal et al., 2024) | Football (soccer) | University-level players in Karakalpakstan vs. Khwarazm | n=60; 18–22 y; male | 33 features (anthropometric, psychological, physical); questionnaires & field tests | Supervised regression/classification; Linear model, k-NN, Random Forest, XGBoost; SHAP for feature ranking | Train/test split with repeated evaluations; details limited |
| (Sanjaykumar et al., 2024) | Volleyball (women) | College-level players (state & national level) | n not reported; college-aged (≥18 y); female | Technical skill and execution metrics; likely field-based assessments | Supervised regression; KNN, Multiple Linear Regression, Lasso, Ridge, Elastic Net, Random Forest, XGBoost | Model evaluation via MAE, MSE, R²; split/CV details not reported |
| (Theagarajan and Bhanu, 2021) | Football (soccer) | High-school and professional competitions (video) | Image dataset: 49,950 images; includes high-school (youth) and pros | Match video frames; automated player/team/ball detection; event detection | Deep learning computer vision (object detection/tracking; event detection); supervised | Runtime and accuracy metrics discussed; formal CV/test split not reported |
| (Venkataraman et al., 2024) | Football (soccer) | Conceptual scouting framework; professional case studies | Sample not reported; case studies (e.g., Kevin De Bruyne); adults | Perceptual–cognitive attributes via YUVA-SQ questionnaire | None (scouting tool; no ML modeling) | Not applicable |
| (Woods et al., 2018b) | Australian Rules Football (AFL) | Elite junior (U18 national championships) | n=244 players; 680 observations; 17.6 ± 0.6 y; male | 12 in-game technical skill indicators (match statistics) | Supervised classification; LDA, Random Forest, PART (decision list); variable importance & rule extraction | Internal classification accuracy; external validation not reported |
| (Woods et al., 2018a) | Rugby League | Elite youth (U20) vs. senior (NRL) competition comparison | U20: 372 obs; NRL: 378 obs; male | Team performance indicators from matches (NRL & U20) | Supervised classification tree to distinguish competitions; interpretable rules | Internal classification (apparent accuracy); external validation not reported |
| (Zhao et al., 2019) | Multi-sport (elite youth) | Elite sport school (6 sports; U15–U16) | n=97; male; U15–U16; training load ~20.8 h/week | 18 anthropometric, 5 physiological, 2 motor tests; standardized lab/field assessments | Supervised multiclass classification; Linear Discriminant Analysis; Multilayer Perceptron; stepwise DA; repeated MLP training/testing | Leave-one-out (DA); repeated 80/10/10 splits for MLP |
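
Note. Three of the validation designs that recur in the Validation Strategy column (k-fold cross-validation, leave-one-out cross-validation, and a prospective hold-out split by season) differ only in how sample indices are partitioned. The following is a minimal illustrative sketch in plain Python, not the procedure of any included study: the function names `k_fold_splits`, `leave_one_out_splits`, and `temporal_split` are hypothetical, and the sketch omits the shuffling and class stratification that individual studies may have applied.

```python
from typing import Iterator, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def k_fold_splits(n_samples: int, k: int) -> Iterator[Split]:
    """Yield k (train, test) splits; every sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def leave_one_out_splits(n_samples: int) -> Iterator[Split]:
    """Leave-one-out is k-fold with k equal to the number of samples."""
    return k_fold_splits(n_samples, n_samples)


def temporal_split(seasons: Sequence[int], first_test_season: int) -> Split:
    """Train on earlier seasons and test on later ones, never mixing them."""
    train = [i for i, s in enumerate(seasons) if s < first_test_season]
    test = [i for i, s in enumerate(seasons) if s >= first_test_season]
    return train, test


if __name__ == "__main__":
    for train, test in k_fold_splits(10, 5):
        print("train:", train, "test:", test)
    # Hypothetical season labels, one per sample.
    print(temporal_split([2017, 2018, 2020, 2021, 2021], 2021))
```

With small samples, leave-one-out uses as much data as possible for training in each round, whereas the temporal split is the only one of the three that evaluates the model on data collected after the training period.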