The SOLDIER model

In this first post for 2019 I will give an outline of the AFL Lab model to be used for its debut full season. I originally started this project out of a love of sport and a desire to learn about machine learning and data science in general. It also coincided with a career move, which left me with a bit of free time for a while! The model is by no means complete, professional, or optimised, and it never will be.

***

Summary

AFL is a team sport that, like many sports, relies on a combination of individual and team efforts. A number of statistics, recorded for each player in each game, are freely available to the public. For many of these statistics, the difference between the teams’ aggregates correlates well with the game outcome. These key statistics are selected from the data and assigned to one of seven categories (SOLDIER), and each player in each past game is given a rating in these seven areas. In addition to these “player” ratings, certain team features (derived statistics) that strongly correlate with match outcomes have been identified, affording a “team” rating in the same seven areas. These team statistics are not attributable to particular players and could be considered a descriptor of an overall game plan, or simply of team performance.

A machine-learning model was trained using standard supervised-learning techniques and parameter tuning. The inputs to the model are the difference in the teams’ aggregate player ratings (seven variables) and the difference in the teams’ team ratings (seven variables), with the output being the game margin. Future games are predicted by using recent data for the selected players and teams to predict each of the model inputs for the game, with appropriate error tolerances for variation in form. This allows Monte Carlo simulations of each game, producing a distribution of outcomes. Simulations of past seasons produce accuracy similar to other published AFL model results. The model has the potential to bring deeper insight into many facets of the sport, including team tactics, the impact of individual players, and more.

Aim

The aim of building this model is to implement machine-learning techniques to predict and understand the outcomes of AFL games.

Raw Data

There are a number of sources available that compile and store statistics for AFL matches, without which projects like this just can’t go ahead. AFLTables provides comprehensive coverage of historical matches in an easy-to-handle manner. Footywire provides additional statistics for more recent games, and AFL.com.au fills the gaps and provides some text describing games. All three of these sources are scraped responsibly to build and maintain a database.
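
As a rough illustration of what “responsible” scraping looks like in practice, the sketch below rate-limits requests and caches results locally. The URL, page layout, and file names are placeholders rather than the actual AFL Lab scraper.

```python
# Rough sketch of responsible scraping: rate-limited requests and a local cache.
# The URL, page layout, and file names are placeholders, not the actual scraper.
import time
from io import StringIO

import pandas as pd
import requests

BASE_URL = "https://afltables.com/afl/"   # example source only

def fetch_page(path, delay=2.0):
    """Fetch one page, pausing afterwards so the site isn't hammered."""
    response = requests.get(BASE_URL + path, headers={"User-Agent": "afl-lab-research"})
    response.raise_for_status()
    time.sleep(delay)
    return response.text

def scrape_season(year):
    """Parse every stats table on a (hypothetical) season page into one DataFrame."""
    html = fetch_page(f"seas/{year}.html")
    tables = pd.read_html(StringIO(html))
    return pd.concat(tables, ignore_index=True)

if __name__ == "__main__":
    scrape_season(2018).to_csv("afl_2018_raw.csv", index=False)   # local cache
```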

The statistics recorded, and their availability, have changed over the past decade. The full gamut of statistics used in the current model has only been available since 2014, so earlier data is not used. While it would be possible to adjust the following analysis to account for missing statistics in past data, a key focus of this work is to consider the changing nature and tactics of AFL football; as such, including earlier games may be counterproductive to understanding the game as it is played today.

SOLDIER

The raw statistics were analysed against game outcomes to understand which have the strongest correlation: In each game, for each raw statistic, the sum of away player contributions was subtracted from the sum of home player contributions to obtain a “margin” statistic (if the margin statistic is positive, the home team accrued more of the statistic). The margin statistics were tested against the score margin and a simple Pearson’s correlation coefficient was calculated.
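
A sketch of this screening step is below. The DataFrame layout and column names are assumptions; the point is simply to compute each statistic’s home-minus-away margin per game and correlate it with the score margin.

```python
# Sketch of the correlation screening. Assumed layout: `player_stats` has one row
# per player per game with a 'game_id', an 'is_home' flag and one column per raw
# statistic; `results` maps 'game_id' to 'score_margin' (home minus away score).
import pandas as pd
from scipy.stats import pearsonr

def margin_correlations(player_stats, results):
    stat_cols = [c for c in player_stats.columns if c not in ("game_id", "is_home")]

    # Aggregate player contributions per game and side, then take home minus away.
    totals = player_stats.groupby(["game_id", "is_home"])[stat_cols].sum()
    margins = totals.xs(True, level="is_home") - totals.xs(False, level="is_home")

    merged = margins.join(results.set_index("game_id")["score_margin"]).dropna()
    return pd.Series(
        {col: pearsonr(merged[col], merged["score_margin"])[0] for col in stat_cols}
    ).sort_values(ascending=False)
```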

Following this, rather than naively looking at the raw statistics alone, a number of features (derived statistics) were calculated and tested in the same manner. Features have the potential to provide better context for the raw statistics. For example, a team recording many Rebound 50s (a defensive act where the ball is moved out of the defensive 50) is not that impressive if their opponent has many more Inside 50s; what matters is their success rate at defending the opponent’s Inside 50s, not the raw number.
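
To make the idea of a derived feature concrete, here is one possible form of a “Rebound 50 efficiency” feature. The exact definition used in the model isn’t spelled out above, so treat this as an example rather than the model’s formula.

```python
# One possible form of a derived feature. The model's actual definition of
# "Rebound 50 efficiency" is not given above, so treat this as an example only.
def rebound50_efficiency(team_rebound50s, opponent_inside50s):
    """Fraction of the opponent's forward-50 entries that were rebounded."""
    if opponent_inside50s == 0:
        return 0.0
    return team_rebound50s / opponent_inside50s

# e.g. 38 rebound 50s from 55 opposition inside 50s gives ~0.69, which says more
# about the defence than the raw count of 38 does on its own.
```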

The relevant raw statistics and features were then allocated different categories depending on what aspect of gameplay they represent. Seven different categories were identified:

  • Scoring – Directly scoring goals/behind, setting up others who do the same.
  • Outside Play – Also called “Uncontested”. Staying out of the contest and being efficient at it.
  • Long Play – Moving the ball quickly with marks and kicks. Getting the ball in the forward 50.
  • Discipline – Also called “Defence”. Doing the tough stuff: spoils, intercepts, winning free kicks (and not giving away free kicks).
  • Inside Play – Also called “Contested”. Getting the ball in the contest, clearing it from the contest, and tackling. Efficiency is important, but less so than for uncontested play.
  • Experience – How experienced are the players? The number of games played, finals played, Brownlow votes received.
  • aeRial Play – Commanding the ball overhead. Contested marks, hit outs, raw height.

The raw statistics and some of the features can be directly attributed to individual players, but most of the features are representative of the team itself rather than the individuals. These team measures could be considered a way to quantify teamwork and/or game plan. Each chosen statistic and feature has been allocated to one of the seven categories above, split according to whether it is player-specific or team-specific.

Category | Player Examples | Team Measure Example
S Scoring | Goals, Goal Assists, Points/I50, Marks I50 | Percentage
O Outside | Metres Gained, Uncontested Possessions | Cont. Pos. Ratio
L Long Play | Marks, Kicks, Inside 50s | Inside 50 Efficiency
D Discipline | One Percenters, Intercepts, Free Kicks | Rebound 50 Efficiency
I Inside | Contested Possessions, Clearances, Tackles | Cont. Pos. Margin
E Experience | Games Played, Past Brownlow Votes | HGA adjustment*
R aeRial | Height, Contested Marks, Hit Outs | Cont. Marks conceded
*The Team Experience measure is currently taken to be a completely deterministic variable that depends on how far each team has travelled to get to the venue.

For each game, each player and each team get a rating in these seven categories based on the above statistics and features. From this, it is natural to consider extensions such as overall ratings, analysis of form, and determination of a player’s role in a team. However, for the moment, the focus will be on development of the model for predicting match outcomes.

Model Construction

Match outcomes are to be predicted using a machine-learning model. The large number of input variables chosen in this project favours machine-learning models over other models widely (and very successfully) adopted in the sports modelling space.

Machine-learning models, in particular supervised-learning models, are designed to learn from known results and determine non-linear relationships that relate the inputs to particular outcomes. The variety and complexity of machine-learning models is vast, each with its advantages and disadvantages. This project uses the https://scikit-learn.org/ library, which allows many models to be tested side by side.

The model has fourteen inputs, and a single output:

Inputs:

  • Margin of player SOLDIER scores (7 variables)
  • Margin of team SOLDIR scores, i.e. SOLDIER without the Experience category, whose team measure is the venue adjustment below (6 variables)
  • Venue/HGA adjustment (1 variable)

Output:

  • Points margin of the game

Fitting models is very simple once player and team SOLDIER scores are calculated and rescaled. A common approach for selecting a model and tuning its parameters is a train-and-test split, where a proportion (say, 70%) of the data is used to fit the model, which is then tested against the remaining proportion. However, predicting an unplayed game is quite different; the player and team SOLDIER scores are not known a priori. It is necessary to make predictions as to how each player and each team will perform in a given game in order to predict the outcome using the proposed model.
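
For the simple train-and-test case, the fitting step might look roughly like the sketch below. The file names, scaling choice, and SVR hyper-parameters are illustrative only; the actual model selection and tuning are discussed in the Results section.

```python
# Sketch of the straightforward train-and-test fit on the 14 inputs. The file
# names, scaling choice (StandardScaler) and SVR hyper-parameters are
# illustrative only, not the tuned model.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

X = np.load("soldier_inputs.npy")    # shape (n_games, 14): 7 player + 6 team + HGA
y = np.load("soldier_margins.npy")   # shape (n_games,): final points margin

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=5.0))
model.fit(X_train, y_train)

print("Held-out mean absolute margin error:", mean_absolute_error(y_test, model.predict(X_test)))
```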

In a previous piece, I examined how one could measure a player’s form, and what other factors can affect a player’s output. For the game to be predicted, the form of the involved teams and their players is calculated (mean and variation) to determine probable distributions for the inputs to the model. As predicting unplayed games is the goal, simulating games using no foreknowledge (i.e. only considering the past) is the only appropriate way to test the model. The only exception is that the Team Experience (aka Home Ground Advantage) measure is known, as it is determined from the fixture.
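
A minimal sketch of the Monte Carlo step is below. For simplicity it samples the fourteen aggregated model inputs directly from normal distributions, whereas the model actually simulates each player’s and team’s performance before aggregating; the venue/HGA input can simply be given zero variance since it is known from the fixture.

```python
# Monte Carlo sketch for one unplayed game: sample the inputs, predict the
# margin many times, and summarise the resulting distribution.
import numpy as np

def simulate_game(model, input_means, input_stds, n_sims=10_000, rng=None):
    """input_means/input_stds: length-14 arrays of predicted form (the HGA entry
    can have a standard deviation of 0, since it is known from the fixture)."""
    rng = rng or np.random.default_rng()
    samples = rng.normal(loc=input_means, scale=input_stds, size=(n_sims, len(input_means)))
    margins = model.predict(samples)
    return {
        "home_win_prob": float(np.mean(margins > 0)),
        "median_margin": float(np.median(margins)),
    }
```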

Results and Discussion

I have performed full simulations of the 2017 and 2018 seasons to test a variety of models and tune parameters. The testing procedure is as follows, using 2017 as an example:

  1. Train model using pre-2017 data.
  2. Predict round one performances using pre-2017 data.
  3. Predict round one results a large number of times (N=10,000) and record.
  4. Retrain model with real round one data.
  5. Predict round two performances using pre-2017 data and real round one data.
  6. Predict round two results a large number of times (N=10,000) and record.
  7. Repeat 4-6 for remaining rounds.

The large number of predictions gives a distribution that allows a win probability and a median margin to be recorded for comparison against the actual results. In the following tables, the results from four models are presented with the number of tips they got correct, the number of “Bits” (higher is better), and the average error in the margin (lower is better). The “BEST” row shows the best performance in each measure among the models tracked at squiggle.com.au.
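
For reference, these measures can be computed from the simulation output roughly as follows. The Bits formula shown is, to the best of my knowledge, the squiggle.com.au convention; the helper names and data layout are my own.

```python
# Sketch of the season evaluation measures. The Bits formula follows what I
# understand to be the squiggle.com.au convention; probabilities are clamped
# to avoid log(0). Draws are ignored for the tip count in this sketch.
import math

def game_bits(home_win_prob, actual_margin):
    p = min(max(home_win_prob, 1e-6), 1 - 1e-6)
    if actual_margin > 0:                      # home win
        return 1 + math.log2(p)
    if actual_margin < 0:                      # away win
        return 1 + math.log2(1 - p)
    return 1 + 0.5 * math.log2(p * (1 - p))    # draw

def season_summary(predictions, actual_margins):
    """predictions: list of (home_win_prob, median_margin) per game."""
    tips = sum((p > 0.5) == (m > 0) for (p, _), m in zip(predictions, actual_margins))
    bits = sum(game_bits(p, m) for (p, _), m in zip(predictions, actual_margins))
    avg_error = sum(abs(pm - m) for (_, pm), m in zip(predictions, actual_margins)) / len(actual_margins)
    return tips, bits, avg_error
```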

2017

Model | Tips | Bits | Av. Margin Error
SVR1 | 120 | 12.06 | 31.08
SVR2 | 125 | 12.72 | 30.27
XGBoost | 128 | 11.19 | 30.18
KNR | 121 | 1.73 | 30.48
BEST | 137 | 20.57 | 29.18

2018

Model | Tips | Bits | Av. Margin Error
SVR1 | 141 | 35.68 | 28.42
SVR2 | 143 | 34.98 | 28.11
XGBoost | 141 | 29.55 | 28.57
KNR | 150 | 33.39 | 27.80
BEST | 147 | 39.76 | 26.55

Models

What is immediately noticeable is not only that different models are better at different prediction types, but also that performance is season-dependent. On that second point, if these machine-learning models are picking up gameplay and tactical patterns, doesn’t it make sense that those patterns would change from season to season? In training the models, more recent data is given stronger weighting to reflect this, and small improvements (consistent, but not necessarily statistically significant) have been observed.
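
One simple way to implement such a recency weighting is an exponential decay on game age, passed to the regressor through scikit-learn’s sample_weight argument; the half-life below is purely illustrative and not the value used in the model.

```python
# One possible recency weighting: each game's weight halves every `half_life_days`
# before the date being predicted. The half-life value is illustrative only.
import numpy as np

def recency_weights(game_dates, reference_date, half_life_days=365.0):
    """Return a weight per game: 1.0 for games on reference_date, decaying with age."""
    age_days = np.array([(reference_date - d).days for d in game_dates], dtype=float)
    return 0.5 ** (age_days / half_life_days)

# Example usage with a plain SVR (a Pipeline would take the weights as
# `svr__sample_weight` instead):
#   weights = recency_weights(train_dates, prediction_date)
#   SVR(kernel="rbf").fit(X_train, y_train, sample_weight=weights)
```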

The actual performance of the SVR2 model appears to be the most consistent over many seasons and in 2018 was comparable in success to other models with published results. This model, with a few additional tweaks, is the one that will be adopted for the 2019 season.

A deeper investigation into individual games reveals that, with all the models, there is a tendency to under-predict the size of the margin. Games expected to be blowouts are predicted to be merely comfortable victories. While this does not affect the tip for the game, it evidently does affect the margin error, and to a lesser extent the Bits. One example is the 2018 Round 18 game between Carlton and Hawthorn. Hawthorn were expected to win by over 10 goals, and they did. The SVR2 model predicted a median margin of only -25 points (in the away team’s favour).

[Figure: 2018 Round 18, Carlton vs Hawthorn predicted margin distribution (SVR2). Actual margin -72.]

That this happens with all models tested suggests an issue with how the inputs to the model are calculated rather than with the models themselves. Recall that each player and team performance is simulated by sampling from a normal distribution with that player’s or team’s individual mean and variance. This implies that in a given game it’s equally probable that each player will perform better or worse than their average. This doesn’t really make sense! One would assume that against a very strong team, player outputs would tend to be lower than their averages suggest, and against a very weak team, higher. The best way to adjust for this is not obvious and is a focus of ongoing work.
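
Purely as an illustration of the idea (and explicitly not the model’s method, which is still an open question), one naive adjustment would be to shift each sampled input’s mean by a fraction of its spread according to the opponent’s strength:

```python
# Purely illustrative and NOT the model's method: skew the sampling means by
# opponent strength so that players are expected to produce a little less
# against strong opponents and a little more against weak ones. The
# "opponent_rating_gap" and "sensitivity" parameters are hypothetical.
import numpy as np

def opponent_adjusted_means(means, stds, opponent_rating_gap, sensitivity=0.1):
    """Shift each input's mean by a fraction of its own spread; a positive
    rating gap means the opponent is the stronger team."""
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    return means - sensitivity * opponent_rating_gap * stds
```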

Conclusions and Further Work

The model as presented today is in working order, is capable of predicting results in the ballpark of other models, and still has many avenues for improvement. In particular, the following have been of interest:

  • Home Ground Advantage: The model still uses a flat score based on where the teams are from and where the game is played. There is clearly a lot more to Home Ground Advantage than that.
  • Team Experience score: Currently this is where Home Ground Advantage lives. Originally it was planned to be a measure of how experienced the team is at playing together; are there a lot of list changes? Coaching staff changes? This is difficult to quantify and difficult to account for without manual intervention, so it has been shelved for the moment.
  • Weather Effects: Wet weather affects the outcome of AFL matches, especially with regards to the expected scores and efficiency (see Part 1 and Part 2).

The game prediction model is just one arm of this project but is definitely the most technical one. By learning about and improving this model it is hoped that further insights into the sport can be uncovered.
