| (Abidin and Erdem, 2025) | Multiple (Football, Basketball, Volleyball, Athletics; plus “Others”) | Sports high school entrance (youth selection, ages 14–16) | n=2,222 (620 females/1,602 males), 14–16 y; maturation not reported | Device-based physical tests (coordination via Spark, 30 m sprint, vertical jump via JumpR, rhythm test) + coach evaluations (17 criteria: physical, reaction, specialism, psychological) | Deep learning (Shallow Deep Learning for Stage 1; novel Split-Combine-Merge Deep Learning [SCM-DL] for Stage 2); compared with Random Forest, Decision Tree, Extra Trees, SVC; nine feature selection methods (RFE variants, SelectKBest, Lasso, Boruta) | Train/test splits (70/30 and 80/20), k-fold cross-validation (3, 5, 7 folds); comparisons with multiple classifiers |
| (Abidin, 2021) | Football (soccer) | Altınordu Football Academy, U13 youth | n=21 field players (goalkeepers excluded), all male, age ≈ 13 y, maturation not reported; synthetic augmentation expanded to 231 instances | Training performance data via Hit/it Assistant (reaction times, coordination, speed, agility, etc.) + coach evaluations across 18 qualitative/quantitative criteria (converted to numeric) | Supervised learning; seven algorithms tested in WEKA: ANN (MLP), SVM (SMO), Logistic Model Tree (LMT), Logistic regression, Naïve Bayes, Random Forest, CART. Dataset combined real + synthetic instances; preprocessing included normalization and derived position scores (D/M/F) | 10-fold cross-validation |
| (de Almeida-Neto et al., 2023) | Football (soccer) | National-level youth athletes (club teams, ~V-level competition) and sports initiation program | n=75 males, 12–16 y (mean 13.3 ± 1.65), ~13% sports initiation (SI) practitioners, 87% athletes; somatic maturation estimated (PHV categories: pre-, circa-, post-PHV) | Morphological (anthropometry, DXA: body mass, height, leg length, sitting height, body composition, BMC/BMD) + Neuromuscular (handgrip, medicine ball throw, vertical jump, countermovement jump via force plate) | Supervised deep learning; multilayer perceptron (MLP) artificial neural networks with backpropagation; z-scores used to normalize by sport/age; tested morphological, neuromuscular, and combined models | Train/test split (70/30) with cross-validation (10 repeated runs; all participants rotated through training/testing); ~10,000 training iterations |
| (Altmann et al., 2024) | Football (soccer) | German Bundesliga youth academy (U12–U19) | n=13,876 players (96% male), 11–19 y; maturation not explicitly reported but age categories (U12–U19) considered | Longitudinal match-derived data: ~32 million events across 10 years; position-specific technical/tactical features; aggregated spatiotemporal event-based data | Supervised ML; Gradient Boosted Decision Trees (LightGBM); models built separately per playing position; hyperparameter tuning with Bayesian optimization; features reduced with domain knowledge + automated selection | Nested cross-validation (inner loop for hyperparameter optimization, outer loop for model evaluation); train/test splits by season; temporal separation to avoid leakage |
| (Brown et al., 2024) | Cricket | County Age Group (CAG) programme, final trial stage | n=82 male players, 14–17 y (mean 15.3 ± 1.1); selected n=33, non-selected n=49; ethnicity: White British n=34, British South Asian n=44, Other n=4; maturation estimated (age at PHV, maturity offset) | Multidimensional: (a) physiological & anthropometrical (Yo-Yo test, sprint tests, jumps, planks, body size, weight, PHV), (b) perceptual–cognitive (video occlusion batting test), (c) psychological (PCDEQ + multiple psychosocial questionnaires), (d) participation history (practice/game history, multi-sport), (e) socio-cultural (ethnicity, schooling, siblings, birth quarter, postcode) | Supervised ML: Bayesian binomial regression (RStan); dimensionality reduction via correlation clustering → 21 derived features; weak normal prior | Cross-validation not reported; model convergence checks (posterior intervals, n_eff, BFMI) used for validation; sensitivity to ethnicity effects tested with interaction models |
| (Contreras-García et al., 2024) | Basketball | Spanish U14 Minicopa (youth) vs. Liga Endesa (professionals, comparator group) | n=217 U14 male players, 13–14 y; n=391 professional players; maturation not reported | Match-derived shooting charts (field goal attempts by location, 2020–21 & 2021–22 seasons) | Unsupervised ML (k-means and KNN clustering to classify shooting zones); outlier detection (IQR-based model) to identify “specialist shooters” | 5-fold cross-validation for cluster classification; train/test split (20/80) for KNN consistency |
| (Cornforth et al., 2015) | Australian Rules Football | Elite professional players (AFL) | n=44 males, mean age 20 y, ~85.7 kg; maturation not reported | Physiological: daily ECG-derived HRV measures (time-, frequency-, and non-linear domain); Contextual: field size dimensions, match-day temperatures; Performance outcomes: GPS-derived match load, distance, speed zones | Supervised ML regression; seven algorithms in WEKA: Gaussian Processes, Linear Regression, LeastMedSq, Multilayer Perceptron, PLS Classifier, RBF Network, SMOreg; feature selection via PCA vs. wrapper subset + Genetic Algorithm | 10-fold cross-validation; train/holdout splits tested |
| (Craig and Swinton, 2021) | Football (soccer) | Elite Scottish soccer academy (U10–U17), 10-year follow-up | n=512 male players, aged 10–17 at entry; 100 awarded pro contracts; maturation not directly reported; strong relative age effect observed | Anthropometric (height, weight, BMI) and physical performance (5, 10, and 20 m sprint times; countermovement jump; Yo-Yo IR1) collected longitudinally (1–14 sessions/player) | Supervised ML: LASSO logistic regression (with mixed-effects models for associations); multiple imputation for missing data | 10-fold cross-validation to tune LASSO penalty; bootstrap (10,000 samples); train/test split (2/3–1/3) for predictive evaluation |
| (Duncan et al., 2024) | Football (soccer) | Grassroots club football in England (County FA structure) | n=162 boys, 7–14 y (mean 10.5 ± 2.1); biological maturation via APHV (Moore equation) | Anthropometry; maturity offset (APHV); fundamental movement skills via TGMD-3 with video scoring; perceived physical competence (PPASC); physical fitness (15 m sprint speed via timing gates; standing long jump); coach ratings (technical, social, physical, effort, overall); birth quartile; technical skill test: UGent dribbling (procedural details reported) | Supervised ML: linear, ridge, lasso, random forest, boosted trees; recursive feature elimination; L1/L2 regularisation; collinearity control; Python implementation | Train/validation/test split 80/10/10 per age band; 5-fold cross-validation; age-band stratification to avoid leakage/under-representation |
| (Formenti et al., 2022) | Volleyball | Youth Italian championship, regional vs. provincial levels | n=26 female players (13 regional, 13 provincial), 13–15 y; maturation not reported | Volleyball-specific skill battery (setting, passing, spiking, serving; accuracy + technique); Physical performance (modified T-test COD, CMJ); Cognitive (Flanker task – executive control; Visual Search task – perceptual speed) | Supervised ML: Linear Discriminant Analysis, Logistic Regression, SVM, Decision Tree; features = volleyball skills + physical + cognitive measures | Stratified 5-fold cross-validation |
| (Ge, 2024) | Basketball | Secondary school training teams | n=40 (20 boys, 20 girls), adolescents (~13–15 y); maturation not reported | Physical fitness tests (lung capacity, standing long jump, grip strength, 1000 m run boys / 800 m run girls); ~5000 test data entries used for model training/validation | Unsupervised feature learning (CNN + Autoencoder); Gaussian Mixture Model (EM algorithm for parameter estimation); model termed CNN-AE-MG | Train/test split 4:1 (4000/1000 records); ablation comparisons vs. CNN, CNN-AE, CNN-AE-SG; consistency tested with Bland-Altman plots |
| (Gogos et al., 2020) | Australian Rules Football | AFL U18 National/State/other combines; relates combine to senior career outcomes (retired/delisted cohort) | n=1,488 combine attendees (1999–2016); summary models on n=536 with ≥1 AFL player rating; mean age ≈18.5 y | Combine anthropometrics & physical tests (e.g., 20 m sprint, Yo-Yo IR, jumps), plus draft order & position; career outcomes from AFLTables & Champion Data | Linear models for ratings/rankings; boosted regression trees for matches played (gradient-boosted ML) | Model fit assessed with BIC; no external validation; retrospective explanatory analysis |
| (Jauhiainen et al., 2019) | Football (soccer) | National TID database; focus on 14-y Finnish juniors and “academy player” labelling | n=951 14-year-old boys; minority “academy” class n≈14; tests/events 2011–2017 | Physical tests (technical, speed, agility) + self-assessment (perceived competence, tactical skills, motivation) collected at biannual events; several data representations (phys, quest, combined) | One-class SVM (RBF) framed as anomaly detection to flag potential elite; PCA for decorrelation; k-NN imputation | Performance evaluated with AUC-ROC on held-out labels after unsupervised training; mean AUC ~0.763 across hyperparameters |
| (Jennings et al., 2024) | Australian Rules Football | Elite-junior AFL talent pathway; prospective prediction of 2021 National Draft | n=708 males; train 2017–2020 (n=465), prospective test 2021 (n=243) | Physical testing, in-game movement (GPS), and technical involvements; league-wide multi-season dataset | Logistic regression vs neural networks to classify drafted vs not drafted; operating at multiple cut-off thresholds | Prospective external hold-out (2017–20→2021) with sensitivity/specificity/accuracy comparisons |
| (Kelly et al., 2022) | Football (soccer) | English professional academy; U9–U16 development and U18 selection/deselection (contract) | Study 1: n=98, U9–U16; Study 2: n=18, U18 (male) | Multidomain: 53 features across 8 methods over 2 seasons (technical/tactical, physical, psychological, social; e.g., PCDEQ, maturation %PAH, match hours) | Penalized regression (cross-validated LASSO via glmnet) predicting (a) review ratings; (b) achieving a pro contract | Cross-validation (CV) noted for LASSO; internal only |
| (Kilian et al., 2023) | Football (soccer) | Youth elite soccer talent-promotion program (DFB); methodological evaluation on real program data | Sample details not fully specified in abstract text; applied to a set of multidimensional performance assessments within the program (youth cohort) | Multidomain performance battery used for latent factor structure; evaluation contrasts with PCA; study funded by DFB talent program | Deep latent-variable factor model: VAE estimator with importance-weighted variational inference + normalizing-flow priors; linear, identifiable measurement model (generalized EFA) | Robustness discussed; no classic predictive CV; focus is dimensionality reduction and identifiability (not a selection classifier) |
| (López-De-Armentia, 2024) | Football (soccer) | Multi-league women’s scouting context; data scarcity/coverage issues addressed by tool | ~12,000 players tracked across ~30 leagues; basic roster & participation info (adults and youth) | Aggregated web-sourced player metadata (age, position, height, market value, contracts, injuries) and minutes played; alert generation pipeline | Rule-/criteria-driven alerts; “AI-powered” extraction mentioned, but no supervised model for TID classification is specified | Expert usability evaluation; no predictive CV/hold-out |
| (Owen et al., 2022) | Rugby Union | Regional age-grade academy selection (U16 & U18) in North Wales; talent camps | n=104 male; mean age 15.47 ± 0.80 y; U16 n=62; U18 n=42; 66 selected/38 not selected | 21 physiological (demographics, anthropometrics, sprint/power, grip, etc.) + 47 psychosocial (burnout, motivation, trait measures, EI, coping) assessed at selection days | Bayesian pattern-recognition pipeline to classify selected vs non-selected; position-specific models (forwards/backs) | Leave-one-out cross-validation (LOOCV) to minimize overfitting; internal validation only |
| (Razali et al., 2017) | Football (soccer) | Bukit Jalil Sports School (academy) | n=100; 15–17 y; sex not reported | Coach-rated physical, mental, and technical skills (1–10); Football Player Information System (BJSS) | Supervised classification; Bayesian Networks, Decision Tree, k-NN; WEKA implementation; goalkeepers excluded | Leave-one-out CV (small sample size) |
| (Retzepis et al., 2024) | Team sports | Preadolescent (≈11 y) team-sport athletes | n≈92; ~11 y; sex not reported | Anthropometry & motor tests (e.g., leg length, sitting height, weight, jumps) used to classify PHV | Supervised classification; Random Forest, Logistic Regression, Neural Network; forward feature selection with stratified 10-fold CV | 10-fold stratified cross-validation (feature selection & tuning) |
| (Sandamal et al., 2024) | Football (soccer) | University-level players in Karakalpakstan vs. Khwarazm | n=60; 18–22 y; male | 33 features (anthropometric, psychological, physical); questionnaires & field tests | Supervised regression/classification; Linear model, k-NN, Random Forest, XGBoost; SHAP for feature ranking | Train/test split with repeated evaluations; details limited |
| (Sanjaykumar et al., 2024) | Volleyball (women) | College-level players (state & national level) | n not reported; college-aged (≥18 y); female | Technical skill and execution metrics; likely field-based assessments | Supervised regression; KNN, Multiple Linear Regression, Lasso, Ridge, Elastic Net, Random Forest, XGBoost | Model evaluation via MAE, MSE, R²; split/CV details not reported |
| (Theagarajan and Bhanu, 2021) | Football (soccer) | High-school and professional competitions (video) | Image dataset: 49,950 images; includes high-school (youth) and pros | Match video frames; automated player/team/ball detection; event detection | Deep learning computer vision (object detection/tracking; event detection); supervised | Runtime and accuracy metrics discussed; formal CV/test split not reported |
| (Venkataraman et al., 2024) | Football (soccer) | Conceptual scouting framework; professional case studies | Sample not reported; case studies (e.g., Kevin De Bruyne); adults | Perceptual–cognitive attributes via YUVA-SQ questionnaire | None (scouting tool; no ML modeling) | Not applicable |
| (Woods et al., 2018b) | Australian Rules Football (AFL) | Elite junior (U18 national championships) | n=244 players; 680 observations; 17.6 ± 0.6 y; male | 12 in-game technical skill indicators (match statistics) | Supervised classification; LDA, Random Forest, PART (decision list); variable importance & rule extraction | Internal classification accuracy; external validation not reported |
| (Woods et al., 2018a) | Rugby League | Elite youth (U20) vs. senior (NRL) competition comparison | U20: 372 obs; NRL: 378 obs; male | Team performance indicators from matches (NRL & U20) | Supervised classification tree to distinguish competitions; interpretable rules | Internal classification (apparent accuracy); external validation not reported |
| (Zhao et al., 2019) | Multi-sport (elite youth) | Elite sport school (6 sports; U15–U16) | n=97; male; U15–U16; training load ~20.8 h/week | 18 anthropometric, 5 physiological, 2 motor tests; standardized lab/field assessments | Supervised multiclass classification; Linear Discriminant Analysis; Multilayer Perceptron; stepwise DA; repeated MLP training/testing | Leave-one-out (DA); repeated 80/10/10 splits for MLP |
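
The validation column above mixes several resampling schemes: k-fold cross-validation, leave-one-out cross-validation, and season-based temporal hold-outs. As a rough illustration of how these differ in which samples end up in the test set, the following Python sketch generates index splits for each scheme using only the standard library. The function names are hypothetical and are not taken from any of the reviewed studies, which relied on tools such as WEKA, glmnet, or LightGBM; the k-fold variant here uses contiguous, unshuffled folds for simplicity.

```python
from typing import Iterator, List, Sequence, Tuple

# A split is a pair of index lists: (training indices, test indices).
Split = Tuple[List[int], List[int]]


def k_fold_splits(n_samples: int, k: int) -> Iterator[Split]:
    """Yield k (train, test) splits with contiguous, unshuffled test folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


def leave_one_out_splits(n_samples: int) -> Iterator[Split]:
    """Leave-one-out is k-fold with k equal to the number of samples."""
    yield from k_fold_splits(n_samples, n_samples)


def temporal_splits(seasons: Sequence[int]) -> Iterator[Split]:
    """Train on all earlier seasons, test on the next one (no look-ahead)."""
    ordered = sorted(set(seasons))
    for test_season in ordered[1:]:
        train = [i for i, s in enumerate(seasons) if s < test_season]
        test = [i for i, s in enumerate(seasons) if s == test_season]
        yield train, test


if __name__ == "__main__":
    # Hypothetical season labels for five players' records.
    example_seasons = [2017, 2017, 2018, 2019, 2019]
    for train_idx, test_idx in temporal_splits(example_seasons):
        print("train:", train_idx, "test:", test_idx)
```

With small samples, leave-one-out uses every observation for testing exactly once, whereas the temporal scheme never lets a later season inform a model evaluated on an earlier one, which is the leakage concern noted for the season-based designs in the table.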