Sharpshooters and Shankers

The 2020 Elimination Final between the Saints and the Bulldogs had the score line:

STK 10.7 (67) def. WB 9.10 (64).

The Dogs had two more scoring shots, and surely, had they been a little more accurate, they would have won! After all, converting just one of those behinds into a goal would have turned the game. Looking deeper shows that this is a little naive, and we can do better.

A broader consideration could take into account how difficult each scoring shot taken was – compared to the performance of an average league player taking a similar shot.

If the Dogs kicked as accurately as a completely average player, their “expected” score would have been about 69 points from their 19 scoring shots, meaning that they underperformed and should have beaten St Kilda’s 67. However, the Saints also underperformed and should have got a whopping 79 points from their 17 scoring shots.

Scoring map for Western Bulldogs vs St Kilda, Elimination final 2020, showing expected goal scoring percentages

Although both teams actually kicked poorly, St Kilda had taken their scoring shots from much “higher percentage” locations.

This game has been deliberately cherry picked as an example that raw goalkicking accuracy (goals vs behinds) can be a little deceptive and not the whole story. Not every game is decided by good/bad goalkicking, but seasons can definitely be shaped by it.

Team | Exp. Pos | Exp. Pts | Exp. % | Actual Pos | Actual Pts | Actual % | Excess Pts
2019 AFL season ladder, had teams kicked with average goalkicking performance

In the 2019 season, if every team’s goalkicking performance was exactly average, West Coast would have won three fewer games and missed the finals. Essendon would also have missed, and the wasteful Power and Hawks would’ve landed in positions 5 and 6 respectively.

But how useful is this? It’s, of course, not a certainty that every team SHOULD score as an exactly average set of goalkickers would.

Expected score could be used to tell us if individuals, or whole teams, reliably (with statistical significance) kick better or worse than average. It could also tell us if teams are attempting a lot of low-percentage shots, or if they are working at finding better scoring opportunities. Defensively, it could also tell us whether some teams consistently restrict their opponents by conceding mostly low-percentage shots, or whether they give up easier shots more readily.

Today, however, I’ll just be focusing on individuals’ goalkicking performances.

What is expected score?

Expected score is a metric used in a number of different sports to evaluate the quality of scoring opportunities. At a high level, the metric uses relevant historical data (e.g. all scoring opportunities in the league in the last 5 years) to produce a model that can predict the probability of scoring success under a given set of circumstances (distance and angle to goal, defender location, etc.). Expected score is well explored in association football (soccer) (expected goals, “xG”) and some other analytics-rich sports (e.g. ice hockey). In these sports, where total scoring is relatively low, expected score can give a “better” measure of scoring opportunities compared to just goals and total shots.

In Australian rules football (footy), a similar metric can be used. In some ways, it is more statistically relevant in footy compared to say, soccer, as footy games have a high number of scoring shots at higher scoring probabilities. For my expected score model, I take into account the following parameters:

  • Distance from goal
  • Angle from goal
  • Subtended angle (how wide the goal posts appear from the shot location)
  • Context of shot (i.e. set shot, free shot, pressured shot)

The technical details of the model are fairly close to Figuring Footy’s model, with a few minor differences not worth going into here (note: his model is probably better than mine). Robert did a lot of work on this years ago, before he was swept out of amateur analytics into the real world of club land. It’s worth exploring what he was able to do if you are interested in this area. I’ll be doing some similar analysis pieces to him but with hopefully a slightly different slant.

The choice of the above parameters is mostly limited by data availability. As an amateur I have to rely solely on what data is available in the public space. This sets some limitations that can’t be overcome, but nevertheless the model is still useful. The main limitation is that shot location data is only available for scoring shots. Consequently, shots that miss (out on the full, fall short, etc.), and rushed behinds do not enter the analysis, and these shots do not generate any expected score. Expected scores for scoring shots are therefore between a value of 1 (0% chance of a goal) and 6 (100% chance of a goal). This means every result below effectively assumes each subject misses shots at an exactly average rate (about 20%) – shanks out on the full from 20m out straight in front are not registered!

Some other things that could be considered include the prevailing weather/ground conditions, the individual’s goalkicking ability, crowd effects and game context (early/late in game, close game or blowout, etc.). While it would be possible to account for many of these, this has the potential to partition data so much that each individual scenario has too few historical precedents to produce a meaningful estimate of performance.

The AFL data custodians, Champion Data, produce their own expected score metric (published in the Herald Sun). While I don’t have the full details of their model, they are able to take advantage of more of the available data, including non-scoring shots, pressure rating, kicking skill selected, etc. My results are different to theirs but broadly show the same patterns. Curiously, CD’s expected scores always seem to be higher, but I don’t have the database to check whether this is valid or not (over all the data, total expected score should equal total actual score).

Is goalkicking just dumb luck?

This is a very important question. Determining whether a player is a good or bad goalkicker requires a proper statistical analysis. Such an analysis would effectively determine the chance that a completely average goalkicker would perform as well (or as badly) as the subject. We can never be completely certain either way; we can only test against reasonable doubt. What value constitutes “reasonable doubt” is up for debate.

For the analysis below, I will quote a “sig (%)” value for each subject. This is the percentage chance that they are actually better (positive) or worse (negative) than the average goalkicker, based on the statistical analysis (using Poisson binomial trials, for those playing along at home). For example, a value of “-84%” means that I am 84% certain the subject is a worse goalkicker than average.
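To make the “sig (%)” idea concrete, here is a minimal Monte-Carlo sketch of the Poisson binomial test. The function name and simulation approach are mine for illustration (an exact convolution over the per-shot probabilities would also work); each shot is treated as a Bernoulli trial with its own model-derived goal probability.

```python
import numpy as np

rng = np.random.default_rng(0)

def goalkicking_sig(p_goal, actual_goals, n_sims=100_000):
    """Percentage chance that an average kicker, taking the same shots
    (per-shot goal probabilities p_goal), kicks strictly fewer goals
    than the subject actually did.  Values near 100 suggest a
    better-than-average kicker."""
    p = np.asarray(p_goal)
    # Each row is one simulated "average kicker" facing the same shots
    sims = (rng.random((n_sims, p.size)) < p).sum(axis=1)
    return (sims < actual_goals).mean() * 100
```

For instance, kicking all ten of ten 90%-probability shots only yields a sig of about 65%: even an average kicker goes ten-from-ten on such easy chances a third of the time, so a big sample is needed before accuracy looks like skill.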

For each player in the 2021 season so far (up to but not including the R13 Queen’s Birthday match between MEL and COL), I have used my expected score model to predict the expected number of goals (xGoals) they should have kicked from their chances, and compared that to their actual performance. I have ordered the below table by the significance of their performance, and also included their significance over the 2018-2020 period. There is a qualification criteria of xGoals>5 to help ensure a sufficiently cromulent data set.

The Sharpshooters

name | goals | xGoals | scoring shots | miss rate (%) | score | xScore | 2021 sig (%) | 2018-20 sig (%)
Sharpshooters 2021 R1-R13 (excl. Queen’s Birthday)

While the “significance %” is the key result, it isn’t quite everything, so we should be a little careful with the above results. Jamie Elliott (COL) has kicked 10 straight goals in 2021 to date with zero non-scoring shots, which is exceptional, but he only just creeps over the qualification criteria. It’s also worth pointing out that in the 2018-2020 period, he was exceptionally average.

Someone like Robbie Gray is quite likely a good shot, but his miss rate (non-scoring shots) of 27% is quite high (recall that non-scoring shots are not accounted for in the expected score measure), so we may need to be a little careful. However, in 2018-2020, Robbie was again significantly better than average, confirming his good performances as consistent – the same holds for the other bolded names above.

Surprisingly, a couple of names in there (red) were actually significantly bad goalkickers over the 2018-2020 period!

The Shankers

name | goals | xGoals | scoring shots | miss rate (%) | score | xScore | 2021 sig (%) | 2018-20 sig (%)
Shankers 2021 R1-R13 (excl. Queen’s Birthday)

It’s no surprise to see Fyfe up there along with a few other names having goalkicking troubles this year. Naughton has had a solid scoring output this year so far with 29 goals but should be a fair way ahead of that!

Over a longer, previous period (2018-2020), this year’s bad goalkickers were still bad for the most part, but just not nearly as significantly so.

In the last couple of weeks, Jack Higgins has been put through the wringer for failing to convert, but it turns out he really isn’t doing that bad (across the season so far), having kicked 17 goals from an expected 17.6.

Where to from here?

There’s a lot more slicing and dicing that can be done with this data, and I hope to explore more in time. What we can see from a few basic results is that goalkicking is a skill in that some players are significantly better than average over long periods.

Most players cannot be shown to be significantly better or worse than average over a long period. This suggests that either most players are pretty much average and luck plays a big role, or players go in and out of form over shorter periods.

Next time, I’ll look at teams as entities: how they produce scoring opportunities, and what scoring opportunities they concede.

Introducing @AFLxScore

@AFLxScore is a Twitter bot about to go into operation that will automatically post expected scores (xScore) of live scoring shots in the AFL men’s competition. This will hopefully help footy followers get an idea of how skilful/lucky (or the opposite) each individual shot was. A team’s total xScore (vs actual score) gives an idea of seized or missed opportunities.

What is expected score?

xScore is a concept used in many sports (see xGoals, Expected Goals in soccer). It is a way of estimating the average score a particular shot would yield if it were taken over and over again, based on historical observations of similar attempts.

In soccer, it is a number between 0 and 1 as the most you can score with a single shot is 1 goal. In footy, the maximum score is 6.

In order to calculate an xScore, one needs to choose what factors they would consider in differentiating scoring opportunities. The most obvious, and possibly most important, is shot location: a shot from the goal square is expected to be much more successful than a shot on the boundary from 55m out. Other factors to consider could be:

  • Shot context (set shot, free shot on the run, pressured shot),
  • kick type (drop punt, snap, dribble kick, torpedo),
  • the individual kicker,
  • left/right foot,
  • venue, and which end,
  • weather conditions,
  • if the player on the mark can move,

and so on. As there are so many variables that can be considered, a modeller has to draw a line somewhere. If too many factors are considered, individual sets of circumstances may have too few precedents to form a logical estimate (it could be skewed by a few lucky/unlucky shots). Additionally, some desirable factors may not have readily available data.

Hasn’t this been done before?

Yes, it has! The methodology I use is heavily influenced by Robert’s. I’m not doing anything particularly new here, I am just optimising this for live calculations using the best publicly available data, and automatically posting it to a Twitter account.

What data is available?

The data I have is near-live location data of all scoring shots, the player, the time, and the resulting score. From this I can determine the shot distance and angle (measured to the centreline). Save for manually recording additional data from the broadcast, that’s it. Post-game, when all the statistics are published, it’s possible to differentiate the shot context. However what’s important here is what’s available live.

As the data to be used is only distinguished by location, not context, the model is fit using all available scoring-shot data with no shot context. This data is processed and a smoothed fit is produced to ensure all potential shot locations have an estimate consistent with historical scoring shots nearby.
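The smoothing step can be sketched as a kernel-weighted average over nearby historical shots. The function and bandwidth below are illustrative assumptions, not the actual model:

```python
import numpy as np

def smoothed_goal_rate(x, y, shots_xy, shot_is_goal, bandwidth=8.0):
    """Nadaraya-Watson style estimate of goal probability at (x, y):
    weight every historical scoring shot by a Gaussian kernel on its
    distance from the query point, then take the weighted goal rate."""
    d2 = ((np.asarray(shots_xy) - np.array([x, y])) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2 * bandwidth ** 2))  # nearby shots dominate
    return float(w @ np.asarray(shot_is_goal) / w.sum())
```

The bandwidth controls the smoothing: too small and each location is skewed by a few lucky or unlucky shots, too large and genuinely different locations get blurred together.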

Historical shot accuracy for scoring shots with no shot context

Hey, that doesn’t look right…

You’re right, it doesn’t! However, it’s important to realise that this is not saying that any shot taken from 70m+ out on a slight angle has a 30% chance of being a goal. The key limitation of the data I have is that it only consists of scoring shots. So what the above plot is showing is that if a score is recorded from 70m+ out on a slight angle, it historically has a 30% chance of being a goal.

A proper xScore measure would consider all attempted shots, including those that miss (out on the full, falling short, smothered, etc.). Any shot should have an xScore of between 0 (no score) and 6 (goal). In the methodology presented here when only scoring shots are considered, the xScore measures between 1 and 6.
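Under that limitation, converting the modelled goal probability into an xScore is simple: a scoring shot is a goal (6 points) with probability p, otherwise a behind (1 point). A sketch (function name mine):

```python
def xscore_scoring_shot(p_goal):
    """Expected points for a shot we already know scored: a goal (6)
    with probability p_goal, otherwise a behind (1).  Bounded by
    1 (p_goal = 0) and 6 (p_goal = 1)."""
    return 6 * p_goal + 1 * (1 - p_goal)
```

A shot the model rates at a 60% goal chance is worth an xScore of 4.0; a full model including misses would weight in a zero-point outcome as well, allowing values all the way down to 0.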

What can the bot do?

After every scoring shot, the bot will tweet:

  • the game hashtag, the game time and score recorded
  • the player, distance, angle, expected score (goal accuracy %)
  • the teams’ actual scores (and performance relative to xScore)

I may present this information in a different, more friendly way in the future as this is still a work in progress. I considered building it to send a pretty image of the shot trajectory with some data, but I’m aiming for this to be a simple messaging service to avoid heavily polluting people’s twitter streams with excessive images. Having said that, I may build some additional quarter-end reporting with some additional functionality as development continues.

Please follow if you are interested and any feedback to @AFLLab is appreciated!


Win Probability vs. Margin

Tony from Matter of Stats recently posted a great visualisation comparing the Squiggle models’ predictions for the first six rounds of 2019. The visualisation shows how each model related the projected margin to the percentage chance of winning the game.

Source: Twitter @MatterOfStats

Everything looks fairly nice and smooth apart from the AFL Lab, which looks very noisy! Is this ideal, problematic, or fine? Either way, it’s not necessarily a conclusion you can arrive at just from this image. What is definitely clear is the AFL Lab model does things a little differently. I thought it would be a nice idea to clarify what is causing this “abnormal” behaviour and that it is somewhat intentional.

The SOLDIER model is fitted by correlating player performances, team performances, and game conditions (weather and venue adjustments) from past AFL games to the resulting margin of that game – the output. Like all models, it is not deterministic; that is, the chosen inputs to the model are not all of the variables that contribute to the outcome of a game. There is always some uncertainty based on what the model doesn’t know. In total, fourteen input variables go into the model to produce the single output of the game margin.

The structure of the SOLDIER model

If, for a game previously unseen by the model, all of the player and team performances are known, the margin predicted by the model rarely differs from the actual margin by more than a goal. This method of cross-validation demonstrates that the model is not over-fitted and is suitable for fresh observations.

When predicting the outcome of a future game, one does not know what the inputs to the model will be! So, there needs to be a way to predict what the team and player performances will be in that game in order to determine the inputs, and hence the output. For example, will Lance Franklin kick 0.4 or 6.1? Will Scott Pendlebury have 30 disposals at 90% efficiency, or 30 disposals at 60% efficiency? Naturally it’s impossible to predict this directly, but it is possible to forecast a distribution of expected performances, based on the players’ and teams’ past performances.

A player can perform better in some areas, and worse in others.

The SOLDIER model assumes all player and team inputs to a game to be predicted are normally distributed. The distribution measures (mean, variance) for each player and team input are calculated from their past performances. Because of this, as the inputs have a distribution of possible values, the output (predicted margin) also has a distribution – but due to the nonlinearity of the model the distribution of predicted margins is not necessarily normal (it may be skewed, bimodal, etc.)

As a brief aside, why is the model nonlinear and what does that mean? Consider a hypothetical case study game where Team A beats Team B by 20 points. If the same game was played again, and Team A performed 10% better (in some sort of overall measure), would the margin be exactly 10% higher, for 22 points? Maybe that performance increase resulted in only a single behind (for a margin of 21 points), or two goals (32 points). Even with a single performance measure, there is not necessarily a linear relation between the inputs (team and player performance) and outputs (margin). The SOLDIER model takes fourteen different performance measure inputs, and relates them to a single output. The combinations of categories over-, under- and par-performing is exhaustive and any such combination could lead to a different, or no change in the outcome.

Anyway, back to the main narrative of the post. Now that there are distributions of player and team performance, and a model relating performance to margin, how do the predictions come about? For a particular future game, it is simple to randomly sample a performance for each team and player and put this into the model to predict a margin. One such realisation is just that: one possibility of what could happen. To get a broader view of the possible outcomes, a large number (say 50,000) of realisations can be rapidly calculated to produce a distribution of margins. This is called a Monte Carlo simulation.

From this distribution of margins, a number of predictions can be pulled out. Firstly, the median of this distribution is chosen as the predicted margin for the game. The proportion of realisations where the home team wins represents the home win probability. Even though the margin distribution may not be a normal distribution, the standard deviation of the distribution can be calculated and represents the margin standard deviation. These three predictions are sufficient to adequately describe what the model believes could occur – and are the predictions that advanced tipping competitions like the Monash competitions take.
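Putting the last few paragraphs together, the simulate-and-summarise step looks something like the sketch below. The margin model, input means, and standard deviations here are stand-ins, not the real SOLDIER internals:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_game(margin_model, input_means, input_sds, n=50_000):
    """Sample normally distributed performance inputs, push each
    realisation through the margin model, and summarise the resulting
    (possibly non-normal) margin distribution."""
    inputs = rng.normal(input_means, input_sds, size=(n, len(input_means)))
    margins = np.apply_along_axis(margin_model, 1, inputs)
    return {
        "predicted_margin": float(np.median(margins)),   # median realisation
        "home_win_prob": float((margins > 0).mean()),    # share of home wins
        "margin_sd": float(margins.std()),               # spread of outcomes
    }
```

With a toy linear model the output distribution is normal, but with the real nonlinear model the same machinery captures skew or bimodality for free, which is exactly why the three summary numbers are pulled from the simulated distribution rather than assumed.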

The margin standard deviation calculated by the model is the key driver behind AFL Lab’s anomalous probability-margin “curve” demonstrated by MatterOfStats. The standard deviations the model produces are generally between 18 and 50 points, and mostly on the lower end. This is far lower than many other models (usually between 30 and 40). I would argue that it makes sense that the standard deviation SHOULD vary between games – an expected blowout could be a modest or a huge victory (large standard deviation), but a closely-matched game in wet conditions suggests a smaller total score, and a lower standard deviation is appropriate. Due to this variable standard deviation, two games with the same predicted margin can have vastly different home win probabilities.

For a game with a predicted margin of 20, the standard deviation changes the probability significantly.

It is this reason why my probability-margin “curve” is not a curve at all – the probability is a function of both the predicted margin and the margin standard deviation.

I do have some reservations on the low standard deviations produced by the model – the random sampling methodology currently used is flawed and still very much under construction. Hopefully by the end of the season I will have a large enough sample to work on improvements from.

Until next time, which hopefully won’t be as long.


There’s No 2019 Preview, Why?

In the past 10 months of coding, I’ve built up a number of tools and modules that navigate the data I have collected and ratings I have calculated. I had lofty ambitions of using these, and developing more, to provide a comprehensive preview of the upcoming season from the SOLDIER perspective.

It took a while to see it properly, but it soon became blindingly obvious that although such a prediction was possible, it would be pointless and inaccurate! The reason is easy to explain. The key outcomes I have chosen to focus on with the SOLDIER model thus far are form-based predictions, with the purpose of predicting upcoming games. The game-prediction model uses recent (5-game) and longer-term (20-game) form of player and team performances to predict an outcome using the players selected on the team sheets. Predicting 24 rounds ahead (plus finals) is a very long bow to draw when the ammunition is a set of darts.

The Problem

Later in the 2018 season, I started producing some weekly predictions simulating the rest of the season to establish end-of-season predictions based on current form. Naturally, my first port of call in predicting 2019 was to use this method both on a level playing field (a round-robin, with each team playing each other home and away) to rate each team, and on the actual 2019 fixture as a more practical measure.

The results were surprising at first.

Wow, are Geelong that good? Are Sydney that bad? There’s a lot to unpack here, but something doesn’t quite add up. With such a broad prediction, the first sanity check for me is to compare to the bookies. After all, they’re the professionals in this caper. Just looking at the top 8 percentages, the ladder simulation is a lot more certain of things than the odds suggest*. Sydney, in particular, are about a 50% chance with the bookies to get into the top 8.

While this serves as a strong argument for the uselessness of such a long-range prediction, it also serves as a reminder of the strengths and weaknesses of what I’m modelling. Furthermore, it provides direction for what could be done to improve such predictions in the future.

The hurdles to overcome are plentiful if I were to predict a season with the model as it is:

  1. Which players will play each game?
  2. What effect does the off-season have?
  3. How do you account for natural evolution of players?
  4. Will a team’s game plan change?
  5. Will rule changes or “in-vogue” tactics change what statistical measures win games?

*mainly because the player/team form distribution is fixed in the above simulation rather than using a Brownian-motion inspired model

1. Which players will play each game?

The SOLDIER model encompasses player statistics and form. The chosen players for each team have a small but noticeable effect on the predicted outcome of the game. For the above simulations I used the “first choice team” as opined on for each club, and adjusted it for known injuries. But of course, no team is going unchanged all season; injuries play a part, younger players get tried out, and selection can be based on the opposition. A more sensible way would be to look at a squad (say, of 30) and average out to 22 players, which would be fairly easy to do.

2. What effect does the off-season have?

It’s simple to argue that form will not necessarily carry over the off-season. There are too many immeasurables and unknowables to look at the individual off-seasons and pre-seasons of all 500+ players and adjust “form” accordingly. Are there any rules of thumb? To probe this question, I browsed player data for the last five home-and-away rounds of 2017 and the first five home-and-away rounds of 2018, and looked at the difference in each player’s average performances across these two periods. While there were mostly drops in stats across the board going into the new season, the change was not statistically significant. As an easily relatable example, of the 293 qualifying players for this study, the players scored on average 0.5 fewer Supercoach points after the season break (p=0.7)*.
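The before/after comparison above amounts to a paired test on each player’s change in average output. This is my own illustrative implementation (the exact test used for the quoted p-value may differ), using the large-sample normal approximation to the paired t statistic:

```python
import math
import numpy as np

def paired_break_test(pre, post):
    """Paired test of average output before vs after the season break.
    For large samples (n ~ 293 here) the paired t statistic is
    approximately standard normal, so a two-sided p-value can be
    estimated with the normal CDF via math.erf."""
    d = np.asarray(post) - np.asarray(pre)        # per-player change
    t = d.mean() / (d.std(ddof=1) / math.sqrt(d.size))
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))
    return t, p
```

A p-value of 0.7, as found for Supercoach points, means a change that small is entirely consistent with noise; a genuine systematic drop would drive p toward zero.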

A further thought was that less- and more-experienced players might be affected differently by the summer break. Further splitting the already-filtered data into players with fewer than 50 games at the end of 2017 (N=92) and more than 200 games (N=28) also proved fruitless, with no statistically significant differences across the break. There’s more that could possibly be looked at here, but I strongly suspect little progress.

*This isn’t the best measure to use here as Supercoach points are scaled per game, but the p-values are similar for other unscaled measures.

3. How do you account for the natural evolution of players?

Sam Walsh will definitely play this year, barring tragedy. So, how does one predict what Sam Walsh will produce this year? There is no data on how he performs against other AFL-quality teams in games for premiership points. How could I handle him and every other rookie that may or may not play this year?

Currently, if a debutant is playing, the model assigns the player’s performance to be a plain old average of every first-game player’s historical outputs, regardless of draft pick, playing role, team, etc. From the player’s second game onwards, it uses their personal recorded data. This is a decent trade-off for simplicity when handling debutants on a week-by-week basis, but it does not hold for long-term predictions.

HPNFooty have done some magnificent work with player-value projection based on analysing a current player’s output and comparing it with other players on a similar trajectory, discounting for other factors such as player age. On a slightly different arc, The Arc has used clustering algorithms to classify players into particular roles. By implementing similar concepts and merging the two together, it could be possible to (manually) assign a debutant a playing role (say, key defender or small forward) to project a more meaningful prediction for a season ahead.

Other thoughts this question brought up are how to handle players undergoing positional change (e.g. James Sicily, Tom McDonald) and old fogies put out to stud in the forward line (GAJ), but these are more one-offs that are probably not worth trying to manually override.

4. Will a team’s game plan change?

While player performance is a focus of this model, as important (if not more so) are the team measures that input into the predictions. Each team itself gets a rating in 6/7 SOLDIER categories based on team form. These measures incorporate team-aggregate statistics that cannot be allocated to individual players (say, tackles per opposition contested possessions). These could, conceivably, be a function of both team performance as a whole and the team’s game plan.

How well does this team form carry over to a new season? Do big personnel changes effect a noticeable change in a team’s output? These are questions I planned to have answered before this moment but they’ll have to wait.

5. Will rule changes affect game balance?

The AFL is an evolving competition: rule changes are so frequent that the game never really settles to a point where all strategies under a given set of rules are explored. Having said that, assuming the player and team performances are projected as well as possible over a whole season, will the model’s prediction be accurate when the effect of rule changes is unknown?

The fit of the model is updated every round when there is fresh data. It compares the player and team SOLDIER scores, as calculated from the published statistics, and fits them against the game results. More recent games are weighted more strongly to reflect the prevailing style of football – and what combinations and strategies beat what combinations and strategies. Significantly better results have been obtained using this approach rather than using all historic data equally weighted to fit the model.
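That recency weighting can be implemented as a simple exponential decay passed as sample weights when refitting. The half-life value below is an illustrative assumption, not the model’s actual setting:

```python
import numpy as np

def recency_weights(n_games, half_life=50):
    """Sample weights for games ordered oldest to newest: the most
    recent game gets weight 1.0, a game half_life games earlier gets
    weight 0.5, and so on, decaying exponentially."""
    age = np.arange(n_games)[::-1]  # newest game has age 0
    return 0.5 ** (age / half_life)
```

These weights can then be passed straight through as a `sample_weight` argument to most fitting routines, so the refit done each round automatically leans on the prevailing style of football.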

The effects of the new 2019 rules are very much unknown, and the tactical responses from teams will naturally evolve over the season.


I had planned on presenting more data to back up the above points but time got the better of me and hopefully I’ll expand on this throughout the season.

Without a number of pending and unplanned improvements to my processes, a long-term prediction covering a whole season is not going to be meaningfully indicative of reality. Sure, Geelong could top the ladder, but for the above reasons I wouldn’t bank on it!

The SOLDIER model

In this first post for 2019 I will give an outline of the AFL Lab model to be used for its debut full season. I originally started this project from my love of sport and desire to learn about machine learning and data science in general. It also coincided with a career move, which left me with a bit of free time for a while! It is by no means complete, professional and optimised and never will be.



AFL is a team sport that, like many sports, relies on a combination of individual and team efforts. A number of freely-available statistics are accessible to the public, recorded for each player in each game. For many of these statistics, the difference between the teams’ aggregates correlates well with the game outcome. These key statistics are selected from the data and assigned one of seven categories (SOLDIER), and each player in each past game is given a rating in these seven areas. In addition to these “player” ratings, certain team features (derived statistics) that strongly correlate with match outcomes have been identified, and this affords a “team” rating in these seven areas. These team statistics are not attributable to particular players and could be considered a descriptor of an overall game plan, or just team performance.

A machine learning model was trained using standard supervised learning techniques and parameter tuning. The inputs to the model are the difference in the teams’ aggregate player ratings (seven variables) and the difference in the teams’ team ratings (seven variables), with the output being the game margin. Future games are predicted by using recent data for the selected players and teams to predict each of the model inputs for the game, with appropriate error tolerances for variation in form. This allows Monte Carlo simulations of each game, producing a distribution of outcomes.

Simulations of past seasons produce accuracy similar to other published AFL model results. The model has the potential to bring deeper insight into many facets of the sport, including team tactics and the impact of individual players.


The aim of building this model is to implement machine-learning techniques to predict and understand the outcomes of AFL games.

Raw Data

There are a number of sources available that compile and store statistics for AFL matches, without which projects like these just can’t go ahead. AFLTables provides comprehensive coverage of historical matches in an easy-to-handle manner. Footywire provides additional statistics for more recent games, fills the gaps, and provides some text describing games. These sources are scraped responsibly to maintain a database.

The statistics recorded, and the availability of statistics has changed in the past decade. The full gamut of statistics that is used in the current model have been available since 2014 and so earlier data is not used. While it would be possible to adjust the following analysis to account for missing statistics in past data, a key focus of this work is to consider the changing nature and tactics of AFL football and as such, including earlier games may be counterproductive to understanding the game as it is played today.


The raw statistics were analysed against game outcomes to understand which have the strongest correlation: in each game, for each raw statistic, the sum of away player contributions was subtracted from the sum of home player contributions to obtain a “margin” statistic (if the margin statistic is positive, the home team accrued more of that statistic). Each margin statistic was tested against the score margin and a simple Pearson correlation coefficient was calculated.
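As a sketch, this screening step might look like the following, assuming a hypothetical long-form table with one row per player per game; the column names (`game_id`, `is_home`, `score_margin`) are illustrative, not the real database schema:

```python
# Correlation screening of raw statistics against the game outcome.
# Assumes a long-form DataFrame with one row per player per game.
import pandas as pd

def margin_correlations(df: pd.DataFrame, stats: list[str]) -> pd.Series:
    """Pearson correlation between each 'margin' statistic and the score margin."""
    # Sign player contributions by side (home +, away -), then sum per game
    # to obtain home-minus-away margin statistics.
    signs = df["is_home"].map({True: 1, False: -1})
    margins = df[stats].multiply(signs, axis=0).groupby(df["game_id"]).sum()
    outcome = df.groupby("game_id")["score_margin"].first()
    return margins.corrwith(outcome)  # one coefficient per statistic
```

The same function screens any number of raw statistics or derived features in one pass, which makes the later feature-vs-raw comparison cheap.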

Following this, rather than naively looking at the raw statistics, a number of features (derived statistics) were calculated and tested in the same manner. Features have the potential of providing better context for the raw statistics, for example, a team recording many Rebound 50s (defensive act where the ball is moved out of defensive 50) is not that impressive if their opponent has many more Inside 50s; their success at defending the opponent’s Inside 50s is important, not the raw number.

The relevant raw statistics and features were then allocated different categories depending on what aspect of gameplay they represent. Seven different categories were identified:

  • Scoring – Directly scoring goals/behinds, and setting up others who do the same.
  • Outside Play – Also called “Uncontested”. Staying out of the contest and being efficient at it.
  • Long Play – Moving the ball quickly with marks and kicks. Getting the ball in the forward 50.
  • Discipline – Also called “Defence”. Doing the tough stuff. Spoils, intercepts, winning free kicks (and not giving away free kicks)
  • Inside Play – Also called “Contested”. Getting the ball in the contest, clearing it from the contest, and tackling. Efficiency important but less so than uncontested play.
  • Experience – How experienced are the players? The number of games played, finals played, Brownlow votes received.
  • aeRial Play – Commanding the ball overhead. Contested marks, hit outs, raw height.

The raw statistics and some of the features can be directly attributed to individual players, but most of the features are representative of the team itself rather than the individuals. These team measures could be considered a way to quantify teamwork and/or game plan. Each chosen statistic and feature has been assigned to one of the above seven categories, and split by whether it is player-specific or team-specific.

Category       | Player Examples                            | Team Measure Example
S (Scoring)    | Goals, Goal Assists, Points/I50, Marks I50 | Percentage
O (Outside)    | Metres Gained, Uncontested Possessions     | Cont. Pos. Ratio
L (Long Play)  | Marks, Kicks, Inside 50s                   | Inside 50 Efficiency
D (Discipline) | One Percenters, Intercepts, Free Kicks     | Rebound 50 Efficiency
I (Inside)     | Contested Possessions, Clearances, Tackles | Cont. Pos. Margin
E (Experience) | Games Played, Past Brownlow Votes          | HGA adjustment*
R (aeRial)     | Height, Contested Marks, Hit Outs          | Cont. Marks Conceded
*The Team Experience measure is currently taken to be a completely deterministic variable that depends on how far each team has travelled to get to the venue.

For each game, each player and each team get a rating in these seven categories based on the above statistics and features. From this, it is natural to consider extensions such as overall ratings, analysis of form, and determination of a player’s role in a team. However, for the moment, the focus will be on development of the model for predicting match outcomes.

Model Construction

Match outcomes are to be predicted using a machine-learning model. The large number of input variables chosen in this project favours machine-learning models over other models widely (and very successfully) adopted in the sports modelling space.

Machine-learning models, in particular supervised-learning models, are designed to learn from known results and determine non-linear relationships between the inputs and particular outcomes. The variety and complexity of machine-learning models is vast, each with their advantages and disadvantages. This project implements techniques from standard machine-learning libraries, allowing many models to be tested side-by-side.

The model has fourteen inputs and a single output.

Inputs:

  • Margin of player SOLDIER scores (7 variables)
  • Margin of team SOLDIR scores (6 variables; the team Experience measure enters via the HGA adjustment instead)
  • Venue/HGA adjustment (1 variable)

Output:

  • Points margin of the game

Fitting models is very simple once player and team SOLDIER scores are calculated and rescaled. A common approach for selecting a model and tuning its parameters is a train-and-test split, where a proportion (say, 70%) of the data is used to fit the model, which is then tested against the remainder. However, predicting an unplayed game is quite different: the player and team SOLDIER scores are not known a priori. It is necessary to predict how each player and each team will perform in a given game before the model can predict the outcome.
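As an illustration of the train-and-test step, here is a minimal sketch using scikit-learn's SVR (one of the model families tested below) on placeholder data; the 14 input columns stand in for the SOLDIER margin inputs, and the kernel and parameters are illustrative, not the tuned ones:

```python
# Minimal train-and-test sketch: 14 margin inputs -> points margin.
# The data here is synthetic; real rows come from the SOLDIER ratings.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 14))                               # placeholder inputs
y = X @ rng.normal(size=14) + rng.normal(scale=5, size=200)  # placeholder margins

# 70/30 train-and-test split, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0
)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X_train, y_train)
preds = model.predict(X_test)  # predicted margins for the held-out games
```

Swapping the `SVR` stage for another regressor (e.g. gradient boosting or k-nearest neighbours) is a one-line change, which is what makes the side-by-side comparison below straightforward.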

In a previous piece, I examined how one could measure a player’s form, and what other mitigating factors can affect a player’s output. For the game to be predicted, the form of the involved teams and their players are calculated (mean and variation) to determine probable distributions for the inputs to the model. As predicting unplayed games is the goal, simulating games using no foreknowledge (i.e. only considering the past) is the only appropriate way to test the model. The only exception is that the Team Experience (aka Home Ground Advantage) is known as this is determined from the fixture.

Results and Discussion

I have performed full simulations of the 2017 and 2018 seasons to test a variety of models and tune parameters. The testing procedure is as follows, using 2017 as an example:

  1. Train model using pre-2017 data.
  2. Predict round one performances using pre-2017 data.
  3. Predict round one results a large number of times (N=10,000) and record.
  4. Retrain model with real round one data.
  5. Predict round two performances using pre-2017 data and real round one data.
  6. Predict round two results a large number of times (N=10,000) and record.
  7. Repeat 4-6 for remaining rounds.
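The per-game simulation in steps 3 and 6 can be sketched as follows. The linear "model" and the form distributions here are stand-ins for the real fitted model and the player/team form estimates, purely to show the Monte Carlo shape of the step:

```python
# Monte Carlo one game: sample the input margins from their predicted
# form distributions, push them through the (stand-in) model, and
# summarise as a win probability and a median margin.
import numpy as np

rng = np.random.default_rng(1)

def simulate_game(mu, sigma, weights, n_sims=10_000):
    inputs = rng.normal(mu, sigma, size=(n_sims, len(mu)))  # sampled form
    margins = inputs @ weights  # stand-in for model.predict(inputs)
    return float((margins > 0).mean()), float(np.median(margins))

# Illustrative single game: a small home edge on every input.
mu = np.full(14, 0.3)       # predicted mean of each SOLDIER margin input
sigma = np.full(14, 1.0)    # form variation
weights = np.full(14, 2.0)  # toy linear "model"
p_home, med = simulate_game(mu, sigma, weights)
```

With 10,000 draws per game, the full distribution of `margins` is also available, which is what the margin-distribution plots later in this piece are drawn from.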

The large number of predictions gives a distribution from which a win probability and a median margin are recorded for comparison against the actual results. In the following tables, the results from four models are presented with the number of tips they got correct, the number of “Bits” (higher is better), and the average error in the margin (lower is better). The “BEST” row shows the best performance in each measure among publicly published models.


2017 season:

Model   | Tips | Bits  | Av Margin
SVR1    | 120  | 12.06 | 31.08
SVR2    | 125  | 12.72 | 30.27
XGBoost | 128  | 11.19 | 30.18
KNR     | 121  |  1.73 | 30.48
BEST    | 137  | 20.57 | 29.18


2018 season:

Model   | Tips | Bits  | Av Margin
SVR1    | 141  | 35.68 | 28.42
SVR2    | 143  | 34.98 | 28.11
XGBoost | 141  | 29.55 | 28.57
KNR     | 150  | 33.39 | 27.80
BEST    | 147  | 39.76 | 26.55


What is immediately noticeable is that different models are better at different prediction types, and also that performance is season-dependent. On that second point: if these machine-learning models are picking up gameplay and tactical patterns, doesn’t it make sense that these would change from season to season? In training the models, more recent data is given stronger weighting to reflect this, and small improvements (consistent, but not necessarily statistically significant) have been observed.

The actual performance of the SVR2 model appears to be the most consistent over many seasons and in 2018 was comparable in success to other models with published results. This model, with a few additional tweaks, is the one that will be adopted for the 2019 season.

A deeper investigation into individual games reveals that, with all the models, there is a tendency to under-predict the target. Games expected to be blowouts are predicted to be merely comfortable victories. While this does not affect the tip for the game, it evidently does affect the margin and, to a lesser extent, the Bits. One example is the 2018 Round 18 game between Carlton and Hawthorn. Hawthorn were expected to win by over 10 goals, and they did. Yet the SVR2 model predicted a median margin of only -25 points (in the away team’s favour).

2018 Round 18, Carlton vs Hawthorn predicted margin distribution (SVR2). Actual margin -72.


That this happens with all models tested suggests an issue with how the inputs to the model are calculated, rather than with the models themselves. Recall that each player and team performance is simulated by sampling from a normal distribution with their individual mean and variance. This implies that in a given game, each player is equally likely to perform better or worse than their average. This doesn’t really make sense! One would expect that against a very strong team, player outputs would be lower than a normal distribution would suggest, and against a very weak team, higher. The best way to adjust for this is not obvious and is a focus of ongoing work.

Conclusions and Further Work

The model as presented today is in working order, is capable of predicting results in the same ballpark as other models, and still has many avenues for improvement. In particular, the following are of interest:

  • Home Ground Advantage: The model still uses a flat score based on where the teams are from and where the game is played. There is clearly a lot more to Home Ground Advantage than that.
  • Team Experience score: Currently this is where Home Ground Advantage lives. Originally it was planned to be a measure of how experienced the team is at playing together: are there a lot of list changes? Coaching staff changes? This is difficult to quantify and difficult to account for without manual intervention, so it has been shelved for the moment.
  • Weather Effects: Wet weather affects the outcome of AFL matches, especially with regards to the expected scores and efficiency (see Part 1 and Part 2)

The game prediction model is just one arm of this project but is definitely the most technical one. By learning about and improving this model it is hoped that further insights into the sport can be uncovered.

Environmental factors affecting AFL outcomes – the weather, part 2

Today I continue my focus on the weather. In particular, I will look at some key statistics that differentiate dry-weather football from wet-weather football. Unfortunately, for the most part, the results are completely self-evident. However, there is a nugget or two in there that I think are interesting.

In the first part, I discussed how the prevailing weather and conditions of the game affect the outcome, as part of the larger overview of how all environmental factors affect all aspects of the game. There are a few obvious independent variables within the weather space that could affect the game; precipitation, wind and heat.

Precipitation anecdotally affects the game in a number of ways. Rain falling during the game keeps the ball and the ground wet, impacting the efficiency of skills and even the choice of skills (“wet weather football”). The lasting effect of rain after it stops, and other effects such as dew, cause similar impacts, perhaps to a lesser extent. Wet weather games are anecdotally characterised by “scrappy football”: less handballing, more kicking, and low scores.

Wind comes in a couple of flavours. In all cases the main effect is expected to be on long kicking and consequently goal kicking accuracy. Prevailing winds down the length of the ground provide a bias towards scoring at one end of the ground (i.e. a “five goal breeze”). A prevailing cross wind is a bit more of an unknown. Swirling winds can result from changeable conditions or heavy weather conditions, but also from the geometry of the venue; large grandstands particularly at the goal ends can produce some erratic conditions. It’s possible that this can be somewhat predictable based on knowledge of the venue.

Footy is a winter game but heat sometimes plays a part, especially near the front and back ends of the season. I don’t expect heat to be a huge factor, perhaps it affects player fatigue and creates a more open, high-scoring game.

Evaluating the conditions in past games

It is a relatively straightforward procedure to watch a game of football (or merely some highlights) and, with some knowledge of the game, evaluate the effect of certain environmental conditions on the outcome. An avid football watcher could easily do this on a week-by-week basis and keep a database. However, lacking an ongoing database, it would be very time consuming to do this individually for a past season’s games, let alone multiple seasons. How can one efficiently and accurately record the conditions at past matches?

In the previous piece I scraped daily rainfall data from the Bureau of Meteorology at the closest weather station to each AFL ground and attached the data to each game. Then I examined the distribution of total points scored for a few different rainfall ranges. The hope was that games with more rainfall would be lower scoring.


Unfortunately this was quite unsuccessful. It did, however, make clear that what really matters is the conditions at the ground at the time of the game. The “daily rainfall” numbers reset at 9am, so there’s a good chance that game-day rain fell well before or after the game was played and didn’t affect conditions at all.

I then moved on to looking at published match reports for past games. The main idea is that if the conditions affected the game significantly, it would be discussed in the match report.

Methodology for match report scraping

I chose this source of match reports for the simple reason that its URLs are formulaic and therefore easy to scrape in large quantities. For example, one such URL identifies the 2018 Round 17 Adelaide home game against Geelong. For all games from 2014 onwards, I scraped the match report text into a database for ease of handling.

I then used Microsoft Excel to flag match reports containing certain weather-related keywords. The keywords I chose (i.e. rain, slippery, windy, storm, etc.) came from a brainstorm and from reading samples of match reports. This allowed me to pass over the vast majority of match reports where weather wasn’t (seemingly) a factor. Out of curiosity, I also flagged some reports where the total points were particularly low.

For the flagged match reports, I pasted the report text into Notepad++ and defined a custom syntax to highlight the list of keywords. This allowed me to efficiently and selectively read match reports to summarise the conditions described.
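The same flagging step could be done programmatically; here is a minimal sketch in Python, with an illustrative (not the actual) keyword list and a hypothetical `reports` mapping from game identifier to scraped report text:

```python
# Flag match reports containing weather-related keywords.
# The keyword list is illustrative; the \w* suffix catches inflections
# like "raining" or "stormed" (which also produces false flags).
import re

WEATHER_KEYWORDS = ["rain", "wet", "slippery", "windy", "wind", "storm", "dew", "greasy"]
PATTERN = re.compile(r"\b(" + "|".join(WEATHER_KEYWORDS) + r")\w*", re.IGNORECASE)

def flag_reports(reports: dict[str, str]) -> dict[str, list[str]]:
    """Return, for each flagged game, the sorted keywords its report matched."""
    flagged = {}
    for game, text in reports.items():
        hits = sorted({m.group(1).lower() for m in PATTERN.finditer(text)})
        if hits:
            flagged[game] = hits
    return flagged
```

Note the deliberate over-matching via `\w*`: it mirrors the false-flag problem of journalistic clichés like “stormed home” described in the next section, which still require a manual read of the flagged reports.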

False flags

If journalists could stop using the following cliches, that would be marvellous:

  • <TEAM> stormed into contention…
  • <TEAM> stormed home…
  • It was raining goals…
  • <TEAM> came home with a wet sail…
  • <PLAYER> put the heat on…
  • etc.

How do you represent the conditions quantitatively?

Now we have a good summary of the weather for weather-affected games. How do we quantify this in a meaningful way so that it may be used in a model? As mentioned in the first piece, one could be as specific as they like in describing the conditions of a game that has already happened. This would give a very good measure of the effect on past games. However, my interest (at the moment, at least) is modelling games that haven’t happened yet. Having sophisticated measures for conditions is useless if you can’t predict the conditions with the same accuracy as you can measure them. After looking at the summaries of conditions I recorded, I decided to record weather with four binary (yes or no) variables:

  • If there was mention of wind (or inferred through description of “sideways rain”, etc.) the game would be classified as “windy”.
  • If there was mention of heat it would be a “hot” game.
  • If conditions were slippery (wet ground, actual rain, dew, humidity, etc.) it would be “damp”.
  • If rain fell for a significant portion of the match it would be “rainy” (and of course, also “damp”).

These variables should be very easy to measure in the future, and also relatively predictable from looking at weather forecasts.

Some initial results

The first thing to do is to see if the data passes a sniff test. When looking at the rainfall data I used total points scored as a measure. Reports generally mention rain/damp conditions more so than wind or heat, so let’s start with those.


This was a relief; the time spent was worth it! The samples are statistically different (t-test: p(Dry~Damp)<10^{-7}, p(Dry~Rain)<10^{-16}, p(Damp~Rain)\approx 0.0015) and are logical, in that a dry game is expected to be higher scoring than a damp game, and a damp game higher scoring than a rainy game.

For what it’s worth, the “Rain” mode (peak of the curve) is approximately 132 points, “Damp” is 145. The median total score is probably a better measure though:

  • Dry: 178 points
  • Damp: 151.5 points
  • Rain: 136 points
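For reference, significance checks like the quoted p-values come from two-sample t-tests between the condition groups. A sketch with synthetic stand-in samples (not the real data; sizes, means and spreads below are illustrative) looks like:

```python
# Two-sample t-tests between total-points samples for each condition.
# Welch's variant makes no equal-variance assumption between groups.
import numpy as np
from scipy import stats

# Synthetic stand-ins shaped like the real samples (total points per game).
rng = np.random.default_rng(2)
dry = rng.normal(178, 25, size=400)
damp = rng.normal(151, 25, size=150)
rain = rng.normal(136, 25, size=120)

t_dd, p_dd = stats.ttest_ind(dry, damp, equal_var=False)   # Dry vs Damp
t_dr, p_dr = stats.ttest_ind(damp, rain, equal_var=False)  # Damp vs Rain
```

A small p-value says the difference in scoring between the two conditions is very unlikely to be sampling noise, which is exactly the claim being made above.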

Some problems

While these results look good, they must be scrutinised. The AFL Data Twitterati suggested a number of things to look into when I tweeted the above plot.

  1. What if the conditions just aren’t mentioned in the match report?
  2. Are certain match report authors more likely to mention the conditions?
  3. Is there an agenda in the reporting that might affect exaggerating/understating of the conditions?

These are excellent points. The first was also a prime concern of mine when doing this. To alleviate it going forward, now that I have a database of past games, I plan to record conditions for subsequent games week-by-week based on my own observations.

Over round 17, when watching games/highlights I kept some notes about the conditions. I noted two games where conditions were a factor. Fremantle v Port was affected by rain (and atrocious skills, mind :/) and this was noted in the match report. Hawthorn v Brisbane in Tasmania was beset by dew (as mentioned regularly by the commentators) and conditions were slippery. There was no mention of the slick conditions in the match report. Arguably the conditions were minor and scoring wasn’t hugely affected, but nevertheless I would want this recorded in my database.

It is conceivable that my match-report parsing process is mainly flagging games where the adverse conditions had a noticeable effect, or the reporter mentioned it in passing (“skills were good despite the tricky conditions” and the like appeared sometimes). The consequence is that the distributions plotted above are most likely biased. This is not good for predictive purposes; I can predict whether a game will be damp but not whether the teams will perform/score well despite the conditions.

What I can say with confidence is that the games marked as wet, damp or windy were indeed affected. So let’s see what sets these games apart. Today I’m just going to look at the distributions of certain key statistics that are considerably affected by the weather. Most of it is really self-evident, but it’s always good to have some quantitative confirmation of well-known theories.

Wet Weather Football

What changes? Everything! Well, almost. Let’s start with something that shouldn’t be affected too much as a bit of a control measure. An inside 50 is the movement of the ball (by carrying or disposal) into the forward 50 from outside the 50. I would argue the numbers should be largely independent of the weather; the efficiency will be the main difference.


There’s a noticeable, statistically significant increase in Inside 50s in “Damp” games. Speaking of efficiency, let’s look at how “Inside 50 Efficiency” is affected. I define this as:

\text{I50 Efficiency}=\frac{\text{Inside 50s} - \text{Rebound 50s}}{\text{Inside 50s}}\times 100\%
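As a function, this is simply the following (assuming, as in the earlier feature discussion, that the Rebound 50 count is the opponent's rebounds out of the team's own forward 50):

```python
def i50_efficiency(inside_50s: int, rebound_50s: int) -> float:
    """Percentage of a team's Inside 50s that the opponent fails to
    rebound (rebound_50s = the opponent's Rebound 50 count)."""
    return (inside_50s - rebound_50s) / inside_50s * 100.0
```

For example, a team generating 50 Inside 50s while the opponent rebounds 30 of them rates 40%.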

Efficiency is lower in “Rain” games, as expected, but even lower in “Damp” games! Perhaps this can be explained by teams not respecting slightly difficult conditions and trying to play a normal game style. While we’re on Inside 50s, Marks Inside 50 are a strong predictor of AFL success.

Indeed there are fewer Marks Inside 50 in weather-affected games. Not surprising at all: it’s harder to mark in the wet and harder to hit targets.

Scoring in the wet

Sticking with scoring, goal accuracy is strongly affected by the weather, and not just by rain; wind too.



There are fewer scoring shots, and lower goal accuracy. Unfortunately there is no data available on goal attempts that fall short or are kicked out on the full. Strangely (?), the number of behinds scored per game (including rushed behinds) is not statistically distinguishable between conditions.

Moving the pill

Disposal efficiency is crap in the wet. It drops dramatically, and this is one of the key differences in wet-weather football. It’s more a measure of performance than of tactics. Something that better reflects tactical change is players choosing to kick or handball.


Perhaps the two stats are related: kicking is less efficient than handballing, but there is the prevailing thought that in the wet you should boot it long rather than dish it around with the hands.

Nevertheless, in the modern game of many stoppages and flooding the contest there’s a lot of contested ball. In fact, in the wet there is much more contested ball than in the dry.


Interestingly, tackles per contested possession (a team measure I use called “Tackling Pressure”) is almost unchanged in the wet. I would have expected tackles to not “stick” as much, but the definition of a tackle requires it to affect the efficiency of the disposal, and with many disposals inefficient by default in the wet there may naturally be more tackles recorded.

Picking up the soap

There are a lot of inefficient disposals in the wet, a lot of dropped marks and a lot of stoppages. Picking the ball up and having clean skills is going to be a boon in the wet. Without having numbers on things like “loose ball gets” (I know they’re recorded, just not publicly available!), I have to rely on looking at other stats to infer these things.


A clanger accounts for many different errors including unforced dropped marks, turnovers, free kicks conceded, etc. Also note that intercepts are the consequence of turnovers. More evidence that wet-weather footy is a scrappy affair.

It’s the little things that count, or not?


Spoils, tap-ons, shepherds and smothers all come under the “One Percenter” stat. On average there are about 25 more per game in the wet, which fits the narrative of less clean possession. Unfortunately, the correlation of One Percenters with winning a game of footy is very poor.

What about the wind?

In all of the plots above I have also plotted distributions for “windy” games. I take these with a grain of salt, really. Most of the “windy” distributions are bimodal and could probably be further split into wind-only and rain-and-wind games, but then the sample sizes would be too small to be meaningful. I would wager that most games which are wind-affected, but not obviously so, are just blown over (yes, I did) in the match reports, so I have no record of them.

What about the heat?

There’s just not enough data to make any meaningful observations.

How do you win wet weather football?

Well, I haven’t answered that, and I don’t think anyone can with the available stats. What I can say is that almost all facets of the game are affected. The reduced ability to execute skills properly is a clear result of wet conditions. Being more efficient correlates strongly with winning in the AFL, so having the skills to handle the wet ball and dispose of it smartly is surely going to be effective. But that’s no revelation.

Scoring accuracy is strongly affected too; the obvious recommendations are kicking straighter and setting up easier shots at goal (introducing more possibilities for turnovers). The publicly available stats are just not good enough to evaluate things like this.

What would be really interesting is to look at stats like loose ball gets. Being able to capitalise on the natural inefficiency of disposals in the wet should be a good predictor of the desired outcome.

I would also think that player positioning plays a key role. Having players in the right zones, close enough to pick up loose balls out of a contest and ~60 metres back to intercept long bombs forward, seems like the way to go. With lower disposal efficiency it should be less about covering a player and more about covering a probable landing zone.

Aside from analysing player GPS data (which I don’t have, and am not equipped to analyse anyway), an easier measure may be total distance run by a team. I don’t have this data either.

The first few plots are the most interesting to me. In “Damp” games (including things such as dew-affected games, wet ground, etc.) there are counter-intuitively more Inside 50s, and these are less efficient, than both “Dry” and “Rain” games. Do teams neglect to switch into “wet-weather mode” when they should?

I intend to use the weather data in my models to better predict things such as upcoming game totals and margins, and I shall, but with some uncertainty about how many of the actually weather-affected games I’ve captured.

Round 16 Review

Pretty close to a very rare 9/9, but the Giants didn’t quite get up in the end.


I barely lost out on the Bits, and I seem to be getting better-looking probabilities now (close to other models). It’s going to be difficult to catch up.


Some devastating losses for Sydney, GWS, Essendon and Adelaide this week really impact their finals chances. Sydney are still odds-on to make it, but they drop back to the pack of six fighting for five spots in the eight. GWS are a bit of a smoky, but injuries (and a pretty key suspension) will probably see them fade over the next couple of weeks once their expected “form” catches up.

Out of 10,000 simulations, Richmond made the Top 8 in every single one. It’d take something seriously dramatic (probably involving multiple key player outs) to change that significantly.
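For context, a figure like "made the Top 8 in every simulation" is just the fraction of simulated seasons in which the team lands in the top 8. A toy sketch, with random ladders standing in for the real simulated seasons:

```python
# Finals probability from simulated ladders: sim_ladders has one row per
# simulation and one column per team, holding ladder positions 1-18.
import numpy as np

def top8_probability(sim_ladders: np.ndarray, team_idx: int) -> float:
    """Fraction of simulated seasons where the team finishes in the top 8."""
    return float((sim_ladders[:, team_idx] <= 8).mean())

# Toy stand-in: 10,000 uniformly random 18-team ladders. Real ladders come
# from the season simulation, where a dominant team is top 8 almost always.
rng = np.random.default_rng(3)
sim_ladders = np.stack([rng.permutation(18) + 1 for _ in range(10_000)])
```

Under the uniform toy ladders each team makes the eight about 8/18 of the time; a team like Richmond here would instead show a probability at or near 1.0.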



Hopefully by next week I’ll have a game-total model so I can simulate actual scores (and thus percentages), because it’s getting pretty bloody tight!


Environmental factors affecting AFL outcomes – the weather

As part of my previous piece that began to explore elements of home ground advantage (HGA), I identified that in order to be able to isolate the effects of HGA, one would need to first account for player performance, team performance and other environmental factors.

A model to predict the outcome of a sporting match

As discussed in the previous piece, the environmental factors consist of many measurable and immeasurable modifiers that affect the outcome of a game. These could include home ground advantage, the weather, whether a team is coming off a short break, whether a team has travelled a lot lately, whether players are carrying injuries, whether there’s a player milestone, whether it’s a “rivalry” game, and the list goes on. Some of these are easier to examine than others. In time I hope to investigate and, if necessary, account for all of these factors.

In this piece, I’m going to focus on the weather conditions and how they affect the outcome. Like my previous piece, there are more questions than answers at the moment. Nevertheless, I hope you find this thought-provoking!

Wet Weather Football

Playing in the wet is a different game altogether. The physics of the game changes. The ball is slippery; making slick handballs too slick, contested marks rare and ground ball pickups difficult. The players are slippery; tackles don’t stick as well. The slippery ground however makes the ball bounce a little straighter so there are benefits to exploit, but I digress. It’s easy to argue that wet weather affects the game, but how can we measure it?

“Now over to Tony Greig at the weather wall”

The lads that manage fitzRoy (a magnificent package for those wanting a leg-up in probing AFL stats) include some BoM rainfall data for each match in the 2017 season, which seemed like a good place to start. From BoM I scraped historical rainfall data from the nearest weather stations to each AFL ground and attached the game day’s rainfall to each game from 2011 onwards. There were plenty of gaps in the data: from 1563 games I ended up with 953 records. The standard stereotype of wet-weather football is low scores. Below is a set of histograms of the total points scored in matches with different daily rainfalls.
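The attachment step can be sketched as a simple date/station join. The table layouts and the venue-to-station map below are hypothetical stand-ins, not the real scraped schema:

```python
# Attach daily rainfall to games by joining on date and nearest station.
import pandas as pd

# Hypothetical venue -> nearest BoM station map (illustrative entries only).
NEAREST_STATION = {"MCG": "Melbourne (Olympic Park)", "Gabba": "Brisbane"}

def attach_rainfall(games: pd.DataFrame, rainfall: pd.DataFrame) -> pd.DataFrame:
    """Left-join each game's date/venue onto the daily rainfall records;
    games with no matching station record keep NaN rainfall (the 'gaps')."""
    games = games.assign(station=games["venue"].map(NEAREST_STATION))
    return games.merge(rainfall, on=["date", "station"], how="left")
```

The left join is what produces the gap count mentioned above: games without a matching station/date record simply carry a missing rainfall value rather than being dropped.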


Wow! Either rain has little effect on the total points scored, or these daily rainfall figures do not represent the conditions at the football ground during the match.

In Round 5, 2015, the Gold Coast local weather station recorded a daily rainfall aggregate of 132mm after 9am. By 4:35pm when the Suns took on the Lions, the ground was seemingly dry and there was no report of wet/slippery weather and scoring certainly wasn’t affected. These types of anomalies (I’m sure there are many) make the daily rainfall near the ground a poor measure of the actual effect of the rain on the game.

While this annoyed me a bit (quite a bit of time was spent scraping and organising the rainfall data), it did illuminate a possible next step. As mentioned above, the match report did not mention the weather or the conditions being a problem. Maybe parsing match reports for keywords could work! This year in Round 10, Geelong took on Carlton at Kardinia Park. The game was low scoring, the daily recorded rainfall was nil, but the match report mentioned the “dewy” conditions.

So this seems like a sensible option: if the conditions affected the game, they will be mentioned in the match report, if anywhere. I doubt this will be absolutely error-free, but I’ll wager it will be more accurate than rainfall data.

How Loquacious are Footy Journalists?

The actual task of parsing every match report for keywords is formidable. It can be done, I’m sure, with a big enough vocabulary and a suitable processing strategy. I’m currently sampling match reports and manually finding a suitable set of keywords for different conditions. As this task sucks, it’s happening very slowly. While I work up the enthusiasm to tackle it, another question popped into my thought process.

How do you quantify the conditions?

One could be as descriptive as they want with the conditions, specifying how much rain there was, whether it was windy (and whether the wind was prevailing or swirly), whether it was dewy, hot, humid, etc. Using more information will likely lead to a better model fit to the existing data. But this presents problems:

  • for many combinations of conditions there will be a paucity of data.
  • if your parsing of match reports is wrong (or the journo was exaggerating to forgive their team’s performance!), you’re trying to fit a sophisticated model with rubbish data.
  • if predicting future results is your goal, you need to know exactly what the conditions are going to be to get a good prediction using your model.

At the other extreme, the most simple way to quantify conditions would be to attach a binary variable to each match: Is it weather-affected? Yes/No. This is as fool-proof as you can get, any keyword showing up in the match report will trigger it, and you can be fairly certain a day or two in advance whether weather will affect a game.

As I like to do in almost all areas of my life, I look at the extremes and end up somewhere between them. In this case, I plan to parse match reports for keywords relating to rain/dew, wind, and heat/humidity separately, giving each game a nominal score from, say, 0-10 as a measure of the strength of each condition. This gives me the option of implementing each weather type as a binary (yes/no) or ordinal (0-10) variable, or just a single “is it weather-affected?” binary variable.

It’s more than just a number

While I think even the simplest approach outlined above will give a decent idea of how the weather will affect the margin/total points — certainly better than the rainfall data I hope! — it’s about more than just that. While the current aim of my modelling is to improve my understanding of the available stats through predicting future results, there are also more interesting questions I hope to be in a position to answer in the future.

Wet weather football, what is it all about? What type of team does it best? My model uses a number of different variables to measure each player/team’s performance:

  • scoring,
  • uncontested play,
  • contested play,
  • ball movement/delivery,
  • defence,
  • experience, and
  • air (ruck and contested marking).

Including all of these measures, along with the weather measures, has the potential to elucidate what skills, team balance and game plans work in different conditions.

Cheers for now.



Round 15 Review

A horror round for almost everyone it seems.


A couple of things I’m thankful for:

  • my model’s chronic underestimation of the margin came good this week!
  • Breust being a late out for Hawthorn pushed the game in GWS’s favour, yielding me the right tip (go player models!)


Meanwhile, in the actual footy, things are getting tight around 8th-10th!


You’d have to say percentage is going to play a big part (unfortunately I can’t really simulate that yet).


Home Ground Advantage – A Mess

Everyone has done a piece on home ground advantage, and now it’s my turn. This will hopefully be the first of a series of posts; the next one or two should complete this module of my model and, with any luck, not be a complete waste of time.

In the development of my model, figuring out how to best quantify home ground advantage was difficult to approach. At the moment, I use a very simple measure to account for “team travel”, and use adjustments for each team and player as to how they play at home or away given their upcoming fixture (i.e. Scott Pendlebury would be expected to contribute less to a Collingwood away game as his recent away form is poor.)

I have identified seven possible predictors of home ground advantage, and how each of them may be quantified:

  1. The actual venue itself
  2. “Morale” from playing to a home crowd (?)
  3. “Favouritism” from the umpires (free kick differential)
  4. Familiarity with the ground/facilities (count of previous games played for each team)
  5. Not having to travel far (travel time for each team)
  6. Players sleeping at normal home (boolean for each team)
  7. How often they travel (interstate games per season)

Most of these are measurable from available data on past games, and predictable through the fixture.
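As a sketch of what "measurable from available data" might look like, predictors 3-7 could be collected into a simple per-game record. The field names and values here are entirely my own invention for illustration:

```python
from dataclasses import dataclass

@dataclass
class HGAFeatures:
    """One row of hypothetical home-ground-advantage predictors
    for a single fixture (predictors 3-7 from the list above)."""
    venue: str
    free_kick_diff: int        # home frees minus away frees (3)
    home_games_at_venue: int   # familiarity: past games there (4)
    away_games_at_venue: int
    away_travel_hours: float   # travel time for the visitors (5)
    home_slept_at_home: bool   # home side in their own beds? (6)
    away_interstate_per_season: float  # how often the visitor travels (7)

# An invented example row.
row = HGAFeatures(
    venue="MCG",
    free_kick_diff=3,
    home_games_at_venue=120,
    away_games_at_venue=40,
    away_travel_hours=4.5,
    home_slept_at_home=True,
    away_interstate_per_season=11.0,
)
print(row.venue, row.free_kick_diff)
```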

Other models deal with HGA by applying a correction to the margin in the form of a flat number (Matter of Stats), or a percentage (possibly different for each venue?), or consideration of some of the above to get a HGA variable into their model (i.e. FiguringFooty, The Arc). Some just ignore it altogether and do pretty well (HPN).

In this post I will look at the first three of these identified predictors and investigate their usefulness (or lack thereof), followed by a general discussion of the difficulties of distilling HGA out of existing data.


First, let’s have a look at some of the available data to explore some of the elements of HGA. Here I am using data from 2011 onwards. I could use data from further back but I like to keep things modern.

A broad viewing of game result data shows distinct differences between many of the common AFL venues. For each ground, the distribution of the margin and total points is presented in the following figure.


There’s a lot to unpack here. I’ve only included venues with more than 25 games played in the period or you get some real outliers (Jiangwan Stadium, for example). For clarification, a positive margin indicates a home victory.

While not a huge focus of mine at this stage, the total points scored does show variation, indicating it may be better to consider a percentage HGA bonus rather than a flat points bonus.

On the surface, the ‘Gabba is often a disadvantage to the home team; but that home team is Brisbane, who haven’t cracked the finals since 2009. York Park provides a median 42 point advantage; but Hawthorn mainly play there and they’ve been rather good. Without discounting individual margins by the strengths of the teams on the day, it’s difficult to tell whether each ground has an independent HGA, a common HGA, or no HGA at all! I’m keeping (1) as a possible predictor at the moment until more analysis can be done.
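The per-venue summaries above come down to grouping margins by venue with a minimum-games filter. A rough sketch with toy numbers (the data shape and cutoff mirror what's described above, nothing more):

```python
from collections import defaultdict
from statistics import median

def venue_margins(games, min_games=25):
    """Median home margin per venue, skipping venues with too few
    games (to avoid outliers like Jiangwan Stadium).
    `games` is a list of (venue, home_margin) pairs -- a stand-in
    for the real per-game records."""
    by_venue = defaultdict(list)
    for venue, margin in games:
        by_venue[venue].append(margin)
    return {v: median(ms) for v, ms in by_venue.items() if len(ms) >= min_games}

# Toy data: 30 games at one venue, only 5 at another (filtered out).
games = [("MCG", m) for m in range(-15, 15)] + [("Jiangwan", 80)] * 5
print(venue_margins(games))
```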

The more interesting data, perhaps, is that of the Melbourne venues MCG and Docklands. Firstly, the large number of games played there gives a better set of data to examine. Secondly, all Melbourne teams play home games there so on average, there should be less bias towards “how good” the home team is. If we filter games to Melbourne teams vs Melbourne teams (i.e. not Geelong) at the MCG and Docklands, things look very even!


For this data (360 games), the mean margin is -0.825 and the median margin is -1. There is no perceptible skewness in the distribution. From this sample, it cannot be said that there is an advantage (p\approx 0.71). But is this actually important? The only difference between the home and away teams in this set of games is the change rooms they use (I think?). I suspect there may be a larger ratio of home fans in attendance, but given the capacity of the grounds, not many fans would be locked out. Either way, it makes no perceptible difference. At least for moderate differences in crowd, it’s probably acceptable to dismiss (2) as a possible predictor.
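A sketch of this kind of significance test: is the mean margin distinguishable from zero? At n = 360 a normal approximation to the t statistic is fine, so the stdlib suffices (this is a stand-in, not necessarily the exact test behind the p-value quoted above):

```python
from math import erf, sqrt
from statistics import mean, stdev

def margin_significance(margins):
    """Two-sided test that the mean home margin differs from zero,
    using a normal approximation to the t statistic (reasonable for
    large samples). Returns (z, p)."""
    n = len(margins)
    se = stdev(margins) / sqrt(n)      # standard error of the mean
    z = mean(margins) / se
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # 2 * (1 - Phi(|z|))
    return z, p

# Perfectly balanced toy margins: no evidence of any advantage.
print(margin_significance([-10, -5, 0, 5, 10]))
# Consistently positive toy margins: overwhelming evidence.
print(margin_significance([8, 9, 10, 11, 12]))
```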


Let’s now consider another common gripe about Home Ground Advantage, that of the perceived favouritism of umpiring decisions. My personal view is that the free kick differential is not indicative of favouritism, and more indicative of player indiscipline. Possibly this is a mental effect from playing away from home! Without reviewing every decision and classifying each as a “justified” free kick or an “umpiring error”, it is not possible to comment on favouritism as a concept. Nevertheless, let us look at whether teams get more free kicks at home, and if this results in more wins.


This is the data from all games since 2011. In the central plot, a darker colour means a higher frequency of data. On the right-hand side is the distribution of margins (positive means a home victory) and on the top is the distribution of free kick differential (positive means more home free kicks).

Firstly, home teams DO get more free kicks (p<10^{-12}). From the 1554 samples, on average, home sides get 1.70 more free kicks. And of course, home teams score more than their opposition, (p<10^{-10}), 7.97 points on average.

On the face of it, you could easily make the connection that the free kick differential correlates with the margin. The central plot shows that this is simply not true: the free kick differential is not a good predictor of the margin. There are many games where the free kick differential and the margin have opposite signs, almost as many as where they have the same sign. Just because I’m playing around with visualisations at the moment, here is a plot of the Inside 50s differential vs. the margin:


This is a much better predictor.
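To put a rough number on “much better predictor”, one can simply compare correlation coefficients for the two differentials. A sketch with invented toy numbers (the real comparison would of course run over all 1554 games):

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy data: the inside-50 differential tracks the margin closely,
# the free-kick differential barely at all (numbers invented).
margins  = [-30, -12, -5, 4, 18, 25]
i50_diff = [-14, -6, -2, 3, 9, 12]
fk_diff  = [2, -3, 5, -1, 0, 4]
print(pearson(i50_diff, margins), pearson(fk_diff, margins))
```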

I aim to look at some of the other predictors (4-7) in a later piece after I have done some more work on it. For the moment I’m just going to consider some thoughts on how to proceed after doing this work!


The Scale of the Problem

There are a number of challenges facing this analysis. Firstly, let us assume the following model for predicting the outcome of a match
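The model itself appears as a figure; schematically it is something like the following, where the notation is mine and the combining functions $f$, $g$, $h$ are left unspecified:

```latex
\text{Margin} \;=\; f(\text{team performance}) \;+\; g(\text{player performance})
\;+\; h(\text{environment}) \;+\; \varepsilon
```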


The team performance and player performance of each team may be predicted using their form. Environment factors include things such as HGA, the weather, and other possible factors such as if a team is coming off a short break or the bye.

To get a good measure of HGA one would need to dial out, for each past game, the effect of team performance, player performance, and non-HGA environment factors to work out an adjusted “game HGA”. To this measure, a model with each of the relevant HGA “predictors” identified could then be fitted.

Without doing any of the quantitative measurements, it’s easy to argue why this is going to be very difficult at best. HGA is already baked into the team and player performance measures, so although these can be predicted from past data, the full effect of HGA will be difficult to isolate. Furthermore, after removing player and team performance bias, the question remains of how to account for other environmental factors. It will likely be necessary to fit all environmental predictors (HGA, weather, etc.) simultaneously.

Then there are other problems. Is it possible that each venue has its own HGA independent of other factors? Does this change over time, i.e. how does stadium development affect this?

While I have a decent grasp on team and player performance, my model currently neglects to take weather into account (more on this in a future post I hope) and already includes HGA bias for the team and player performance. I am not in a position to attempt this quantitatively at this stage.

Nevertheless, I have some better ideas of how to proceed with this difficult problem. Firstly, I need to use player and team performance to quantify a residual “environmental” margin for each game (encompassing HGA, weather effects and noise), then examine the effects of venue, travel time, days between matches, and determine a way of describing the effect of weather.
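The first step there can be sketched as: for each game, subtract the margin the performance-only model predicted from the actual margin, then summarise the leftovers by venue (or travel time, break length, etc.). Toy numbers, and the predicted margins stand in for a hypothetical performance model:

```python
from collections import defaultdict
from statistics import mean

def venue_residuals(games):
    """Average 'environmental' residual margin per venue -- whatever
    is left over (HGA + weather + noise) after the performance model
    has had its say. `games` holds (venue, actual_margin,
    predicted_margin) triples."""
    residuals = defaultdict(list)
    for venue, actual, predicted in games:
        residuals[venue].append(actual - predicted)
    return {v: mean(rs) for v, rs in residuals.items()}

# e.g. performance model expected a 4-point away loss at the MCG,
# but the home side won by 10: 14 points left for environment + noise.
games = [("MCG", 10, -4), ("MCG", 2, 0), ("Gabba", -6, 3)]
print(venue_residuals(games))
```

With enough games, the same residuals could instead be regressed on all the environmental predictors at once, as argued above.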

It’s easy to see why a simple measure of HGA is attractive.

To be continued.