An Introduction to the Model

Absolutely no-one has asked me for details on my methodology yet, but I’m happy to provide answers to these non-existent questions.

I guess the main idea behind the model is that a team is more than just the sum of its players.

P = P_t + \sum_i P_i

Overall performance P is the sum of a team-related performance P_t and the contributions P_i from each of the players.
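In code the decomposition is nothing fancy, just a sum. A minimal sketch (the function name and numbers are my own made-up illustration, not the actual model):

```python
# Toy illustration of the decomposition P = P_t + sum_i P_i.
# The team-level component and per-player contributions are hypothetical numbers.

def overall_performance(team_component, player_components):
    """Overall performance: team-level component plus each player's contribution."""
    return team_component + sum(player_components)

# e.g. a team-level score of 12.0 and four players contributing individually
print(overall_performance(12.0, [5.5, 3.0, 4.5, 2.0]))  # 27.0
```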

I arrived at this idea from observing the freely available statistics published by invaluable sites such as AFLTables and Footywire. Individual player contributions are easy to see and understand, but there are other features in the stats I was interested in; for example, does a Rebound 50 reflect the performance of the player awarded the stat, or is it more closely related to the defensive structure of the team as a whole? Are five Rebound 50s worth as much if the opposition have had 80 inside 50s, as opposed to 40?

I divided the relevant statistics (almost all of them?) into different categories of team and player performance, arriving at seven categories. As is the go in footy data analysis circles, I came up with a snappy acronym: SOLDIER. For each category, I painstakingly weighted each relevant statistic to favour those that better correlate with the outcome (winning the game). A team’s performance in a game is described by FOURTEEN (!) variables: seven for the sum of player performances and seven for the team performance. For each game, the difference in these fourteen variables is hypothesised to relate to the difference in the final scores, i.e. the margin.
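The hypothesis above — feature differences relating to the margin — can be sketched as a simple linear fit. Everything here (the random toy data, the choice of scikit-learn’s LinearRegression) is my own hypothetical illustration of the idea, not the actual model or its weightings:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Toy data: for each game, one row of home-minus-away differences in the
# fourteen category scores (7 team-level + 7 summed-player categories).
n_games = 200
X = rng.normal(size=(n_games, 14))                        # feature differences
true_weights = rng.normal(size=14)                        # pretend ground truth
margins = X @ true_weights + rng.normal(scale=5.0, size=n_games)

# Fit margin as a linear function of the 14 feature differences.
model = LinearRegression().fit(X, margins)
print(model.predict(X[:1]))  # predicted margin for the first game
```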

I recognise that this model is considerably more complicated than other footy models I’ve read about online, but footy is a complicated game!

I began this project after spending some time learning about data analysis, in particular applying machine learning techniques. After doing a few beginner projects through sites like Kaggle, I figured I had enough of the basics to give this project a crack. Unlike the rest of my mathematical life, where I use techniques I have a strong base of understanding in, I have no more than a basic understanding of how machine learning actually works.

Once I have a better grasp on machine learning and refine my model, and the many parameters embedded within, I may publish more details on the categories and the statistics important to each.

I hope to be in a position to predict results, rank players and make ladder predictions, but also to see whether the machine learning models can give any insights into concepts such as team balance, the matching up of teams with different strengths and weaknesses, etc.

This is primarily a learning exercise for me but I believe (please correct me!) that no other well-discussed footy model is using machine learning techniques, so I hope this is of interest.



Round 11 Review

A pretty good round, I think. My models seem to be under-predicting margins, something I didn’t really pick up until I started looking at individual games. Having said that, my probabilities tend to be higher than those of other models around, and that really helped my BITS score.
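For reference, the BITS score used around the Squiggle community rewards well-calibrated probabilities: as I understand it, a correct tip at probability p earns 1 + log2(p) bits and an incorrect one earns 1 + log2(1 − p), so confident correct tips approach 1 bit while confident wrong ones lose bits (draws are handled separately and omitted here). A quick sketch:

```python
from math import log2

def bits(p_tipped, tipped_team_won):
    """Bits earned for one game, given the probability assigned to the tipped team.

    Squiggle-style scoring (as I understand it): a 50/50 tip earns 0 bits,
    confident correct tips earn close to 1, and confident wrong tips go negative.
    Draws are scored differently and are not covered by this sketch.
    """
    return 1 + log2(p_tipped) if tipped_team_won else 1 + log2(1 - p_tipped)

print(round(bits(0.75, True), 3))   # 0.585
print(round(bits(0.75, False), 3))  # -1.0
print(bits(0.5, True))              # 0.0
```

This is why probabilities that are a touch higher than the field’s pay off in BITS whenever the tip lands, at the cost of a bigger penalty when it doesn’t.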


Perhaps the volatility estimation I’m using for player/team performances is not optimal, and I’m getting a skinnier bell curve of simulated results than others. We shall see!
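The skinnier-bell-curve suspicion is easy to demonstrate: for the same expected margin, a smaller simulation spread pushes the win probability further from 50%. A toy sketch (the normal distribution and the sigma values are assumptions for illustration, not my model’s actual volatility estimates):

```python
import numpy as np

rng = np.random.default_rng(42)

def win_probability(expected_margin, sigma, n_sims=100_000):
    """Fraction of simulated margins above zero."""
    margins = rng.normal(expected_margin, sigma, n_sims)
    return (margins > 0).mean()

# Same 10-point expected margin under two volatility assumptions:
print(win_probability(10, 25))  # skinnier curve -> more confident probability
print(win_probability(10, 40))  # wider curve -> probability closer to 50%
```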

This year I’ll be focussing on tweaking my model, sussing out its strengths and weaknesses and measuring it up against others. Although I have simulated data from the first 10 rounds (simulated blind to the actual results), I will be measuring it against results only from this round onwards, just in case my slightly messy code managed to have prior knowledge.



The AFL Lab

Welcome to the AFL Lab. This project is part of my ongoing education in data analysis. I love footy and numbers, so why not combine the two? I have a strong mathematical background but I’m comparatively weak on the statistics side. This is my attempt to rectify this, in a very reckless and un-rigorous way.

Normally when approaching a problem it is standard practice to start with something simple and add complexity (Occam’s Razor?), but I have gone all-in, throwing stats haphazardly at scikit-learn models. Will it work or will it explode?

My formulation is currently very unrefined, with many (probably far too many) parameters yet to be tweaked. Nevertheless, having simulated Rounds 1–10 of 2018, my model has tipped 63 winners, with an average margin error of 28.5 and a BITS score of 16.61. According to the Squiggle leaderboard as of today, the leading model is on 62/28.17/14.58.

The model is not completely ready yet (it’s about 5 tips behind in a simulation of 2017), but it’s doing something right. So over the next few weeks I might write a few things about my modelling process, and I’ll post round predictions/reviews and any other little fun bits I’ve found.

I’ll probably post a bit more frequently on Twitter at @AFLLab.