
Full AFL GPS Data: A Taste

TLDR – I’ve captured full GPS trace data of AFL games from the AFL Live app and I’ve uploaded the 2021 AFL Grand Final data here.

In the last few years there has been more interest in sport analytics, including the AFL, both at the amateur and professional level. The data-related features in media coverage of the game seem to have increased too, suggesting that public interest is also there. The number of people providing amateur analysis on blogs, Twitter, etc. has increased dramatically, and a handful of these amateur “analysts” have since gone on to fill new professional roles at clubs, publish books, and land writing gigs in the media.

Quantifying the game is seen by some as ruining it, but the reality is that the AFL is a professional league and clubs are doing themselves a disservice by not exploring every possible opportunity to succeed. I do refer exclusively to “AFL” as a competition here and not the sport itself, as all the relevant data is only available for the league.

Doing good data analysis requires good data. Champion Data is the official stat provider for the AFL (which owns a large chunk of Champion Data) and, to my knowledge, sells/gives its data to the media and the clubs. The public only gets what has been published by those who buy the data. While the average punter may visit the AFL website for their data, more sophisticated data wranglers will seek out a third-party source (like AFLTables, Footywire, or more recently Fryzigg) that aggregates data in a manner for easier consumption. While the tallies of data provided on these sites are good, the data lacks a lot of context! As an example, both an od kick to an unmarked teammate in defence, and a 50m bullet pass inside 50 to a key forward are registered as a “kick” (and both “effective”!) but the latter clearly is worth more to a team and should be credited as such.

There are a lot of identifiable variables that could be used to contextualise individual events (location, game score, time left, location of teammates and opponents, the weather, crowd, etc.). Champion Data’s “Player Rating” is the result of contextualising and “scoring” every player’s actions in order to quantify their contribution to the match outcome. In the previous example, the latter inside 50 kick would be worth more Rating Points than the defensive kick.

As Rating Points are available back to at least 2012, what this means is that Champion Data do have all the raw data to perform such analysis. However, if you, or I, or a club wanted to either replicate the analysis, or build their own contextualised rating system, this would not be possible as the raw data is not available. Even the clubs are only allowed to request their OWN players’ GPS data, meaning they would have no context around opponent positioning, etc.

It is entirely conceivable that someone (or a team of people) could manually transcribe every player’s location at all times during games by manually spotting and plotting off game footage. Besides being an absolute time sink (and maybe an abuse of human rights against the spotters), it would be prone to errors and entirely unnecessary as the data already exists. Manual transcribing must already be happening in clubland to some degree, where researching the setups of teams at stoppages and in defence, for example, are standard parts of opposition analysis. It has been suggested to me that Champion Data do not even have GPS data, and they have manually transcribed event locations themselves (this was definitely true at one point). Either way, a lot of repeated efforts could be saved, and the playing field would be levelled if everyone had the “official” raw data.

For clubs, data hunters, and collectors, the holy grail (I’m so sorry) of complete data would include:

Full GPS traces of all players
Timestamped, location-coded event data (possessions, disposals, stoppages, free kicks, etc.)

I’m happy to say that, actually, these sets of data are almost completely attainable, and have been since 2018!

The AR tracker

Before we get to the full data, it’s worth pointing out that the AFL have been releasing more data recently to keep us all interested. Late in the 2021 season, a new version of the AFL Live app was released with an “AR tracker” feature, an Augmented Reality view which shows location-coded disposal data. The AFL has never released location-coded data before in this manner with this accessibility.

The visualisation of the data in the app (if you can find a flat surface) looks pretty, but for anything other than a casual look it’s not particularly useful as an analytical tool. If you wanted to do something else with the data (e.g. possession chains leading to goals, etc.), this isn’t possible as the raw data isn’t readily available. Thankfully, it didn’t take long for us enterprising amateur data nerds (and probably the club pros too) to strip the raw data out for use in other settings. A few people have already started to use the data to test out some fresh ideas.

Presuming you can get it there safely, the closer to the 50m line the better on inside 50 kicks. The deeper you get the ball the more likely you are to score (oval overlay is MCG so "hotspot" probably moves to top of the square). Deeper is also easier to defend on turnover. pic.twitter.com/spOeqFCI7l
— Richard Little (@alittlefitness) September 13, 2021

The underlying data is quite rich, containing a large range of match events that are both time-coded and location-coded! However, it looks to have been adulterated to perhaps limit its usefulness. Only data from the 2021 AFL season is available, all locations are rounded to the nearest metre, and times are rounded to the nearest second. It seems like it was a conscious decision to both round the location data, and to not make more games’ data available. Alternatively, these positions may be from manual spotting rather than from GPS. This would still be an excellent data source if more games were available, and would be almost enough to replicate the Player Ratings (if the methodology was known). The second shortcoming is the lack of location data for players not directly involved in play, but we will see that it can be found.

Smart Replay – “Visions”

Sometime earlier, in 2020 (?), the AFL introduced a “smart replay” feature on their website. On it, you can select a player and game and an individual stat (e.g. Q1 02:51 Kick) and it will seek to the moment in the game footage so the stat can be seen in action – however it’s more likely that the broken AFL website will just play some random unrelated footage from the same game. Nevertheless, behind this fragile website feature is another nice data source. It’s effectively the same data as in the “AR tracker”, without the location embedded, and is available back to the start of 2017! I guess one could use the “smart replay” tool to comb through the footage and locate events manually, but why would you do that when we know that the players are wearing GPS trackers? Although it is missing location data, and the time-coding is rounded to the nearest second, this remains the best source of event data available at this stage. Among other things, this allows us to work out results like “scores from turnover”, something oft discussed on game coverage but not published as a “stat”.

Player Tracker

As discussed above, it turns out that the AFL does actually broadcast all the GPS data for all players, and it can be captured and (with some difficulty) read! On the AFL Live app, there is a Player Tracker feature during live games. Only during play, you can see the positions of every player moving live, along with some possession events, etc.

It’s not a particularly useful feature in this form, as it doesn’t show enough information, and seems to lag even on relatively high spec phones. You cannot rewind and replay so if you miss something live (or the data doesn’t update quick enough) it shits itself and plays catchup all the time. However, what this does mean is that the app receives live streaming GPS data for all players. Keep in mind that this includes data that the AFL/Champion Data don’t even give to the clubs! And it’s all right there, on their app that anyone can download.

Like all the other data provided on the AFL’s app and website, it’s not useful in its presented state, so we really need to find the source of the data to obtain it in a useful format, much like has been done with the AR tracker and visions data sets. I’m aware that some others in the Twitter community have found this data, but as far as I know none have decrypted it and/or published it. I would also be surprised if no club analysts have this data, given that I, a hack coder fuelled primarily by heavy use of a search engine, have managed to do so.

This is not intended as a tutorial, but I’ll briefly outline the steps it took to produce the resulting data.

Step 1: Find the API endpoint

Finding the API endpoint that a website uses is often straightforward (say, using Developer Tools in a Chromium-based browser), but there’s a few more hoops to jump through to look at traffic from an app. As the app blocks rooted Android phones, basic Android sniffing tools were not useful. The solution that did work for me was to use a man-in-the-middle attack on the phone to view the network traffic from the phone on a desktop.

Step 2: Grab the data

Once the endpoint is found, it doesn’t require any complicated cookies or anything to get the data using, say, Python’s requests library. Like when viewing the player tracker in the app, the data is only “live data”. It turns out that only the last 10 packets of data (even if you ask for the max of 20 packets) are available for each game at any time, so if you miss it at the time, it’s gone… Forever! It turned out that each packet contained one second of data, so the API needs hitting at least every 10 seconds. I saved a bunch down to look at later.

There may be an endpoint that retrieves past data but that’s not something I was able to find, if it indeed does exist. It’s also worth noting that the API response can actually be quite slow at times, leading to missing data – but obviously this could well be issues on my side.
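A capture loop along the lines of Step 2 could be sketched as below. This is only an illustration under assumptions: `fetch` is a stand-in for whatever call returns the latest packets (for example a thin wrapper around a `requests.get(...).json()` call), and the `timestamp` key used to tell packets apart is a hypothetical field name, not the real one. Polling more often than every 10 seconds and de-duplicating is a design choice to soften the effect of slow responses.

```python
import time
from typing import Callable, Dict, Iterable, List


def capture_packets(
    fetch: Callable[[], Iterable[dict]],
    polls: int,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    key: str = "timestamp",  # hypothetical field name identifying a packet
) -> List[dict]:
    """Poll `fetch` repeatedly and keep each packet once, ordered by `key`."""
    seen: Dict[int, dict] = {}
    for i in range(polls):
        try:
            for packet in fetch():
                # Overlapping polls return packets we already hold; keep the first copy.
                seen.setdefault(packet[key], packet)
        except Exception as exc:
            # A slow or failed request should not end the whole capture.
            print(f"poll {i} failed: {exc}")
        if i < polls - 1:
            sleep(interval_s)
    return [seen[k] for k in sorted(seen)]
```

In practice the packets would be written to disk as they arrive rather than held in memory, given the size of a full match.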

Step 3: Look at the data

It’s a lot of data! A whole match of raw data from the API works out to be about 500MB. A single packet (which is 1 second worth of data) contains a number of fields including the game clocks, a timestamp, players on the interchange, players missing from the data, possession events, and the data itself.

The data itself is encrypted and encoded in a Base64 string of about 74,000 characters per second of data, and is absolutely useless until it is decrypted. The possession events are not encrypted/encoded and will be discussed later.
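For a rough sense of scale: Base64 encodes every 3 bytes as 4 characters, so a string of about 74,000 characters decodes to roughly 55,500 bytes of (still encrypted) payload per second. The sketch below uses Python’s standard `base64` module on a made-up payload; the real string would come from the data field of a packet, and `decoded_size` is just an illustrative helper.

```python
import base64


def decoded_size(n_chars: int) -> int:
    """Approximate decoded byte count for a Base64 string of n_chars characters."""
    return n_chars * 3 // 4


# Stand-in payload; the real one is the encrypted blob inside each packet.
encoded = base64.b64encode(bytes(range(256)) * 4).decode("ascii")
raw = base64.b64decode(encoded)

print(len(encoded), len(raw))  # characters in, bytes out
print(decoded_size(74_000))    # bytes per one-second packet, before decryption
```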

Step 4: Decrypt and parse the data

With a bit (well, a lot) of work, the encryption method can be sussed out. Once it’s known, the data can be quite quickly decrypted using common crypto packages in any number of programming languages (I stuck with Python as I do).

There was a bit more work to do after decrypting, as the data was serialised in Protocol Buffers (protobuf) format before being encrypted, so that needed to be backed out also. Even if nothing had come out of this I’d definitely have learned a lot!

After these steps, you get 10 lines of data per player in a single packet (second) of data. Then it’s a matter of rinse and repeat for the remaining packets available, join them all up, and the GPS trace data is complete!

The Data

I’ve uploaded the data I captured from the 2021 AFL Grand Final. I have trimmed a lot of useless columns out of the raw data, and applied minimal processing. If you just want to jump right in, grab the data, unzip it, and run my iPython notebook to have a look at some very basic use of the data.

The “complete” zipped data is split over three files, which are comma-separated-value formatted:

GPSsample.csv – GPS traces (LARGE: 350MB)
possample.csv – Possession events
vissample.csv – “Visions” data

I’m not fully aware of the circumstances around what the AFL is allowed to do with the players’ personal GPS data, but given what I captured was readily available on the AFL’s app during the game, in my view it’s been made public enough to share.

Finally, my notes on and interpretation of the data below are based off my observations only and may well be misguided or just plain wrong!

GPS Trace

I’ve uploaded data for the entire game without pauses, starting a number of minutes before the first bounce, and through until after the final siren.

name	timeEpoch	x	y	countdown	countup	timeon	hkl	home
BFritsch	1632560973050	0.081798	-0.25057	327	26	0	0	1
MHannan	1632560973050	-0.12397	-0.21145	327	26	0	0	0
ANaughton	1632560973050	-0.09799	-0.18903	327	26	0	0	0
TEnglish	1632560973050	-0.09739	-0.24415	327	26	0	0	0
LVandermeer	1632560973050	-0.10679	-0.21103	327	26	0	0	0
BSmith	1632560973050	-0.1003	-0.19873	327	26	0	0	0
KPickett	1632560973050	0.09499	-0.17277	327	26	0	0	1

Sample rows from GPSsample.csv

Notes:

Don’t bother trying to open and work with this in Excel, it’s too big.
Here timeEpoch is the timestamp in milliseconds since 1 January 1970 (UTC)
The co-ordinate system is centred at (x,y)=(0,0)
The “left” goal is centred at (x,y)=(-0.5,0), and the “right” goal is centred at (x,y)=(0.5,0)
The “near” wing appears to be at (x,y)=(0,-0.4) and the “far” wing is at (x,y)=(0,0.4). Benched/subbed players seem to hang out around y=-0.5.
The value of +/- 0.4 for boundaries is just an observation. It is possible this could vary depending on ground aspect ratio. The ground for this game is approx. 165m x 130m.
The timestamps on the data do not seem to be very precise; they seem to lag/lead actual play by a few seconds when comparing with game footage and visions data. It seems to be a consistent offset though, so there are obviously syncing issues.
“countdown”, “countup”, and “timeon” are the self-explanatory period clocks. This data also includes pre-match and quarter-break “periods” without break. It is actually not trivial to split this data into quarters, and to discard non-play periods.
“hkl” is 1 if the home team is kicking left (this switches every quarter)
“home” is 1 if the player is in the home team – handy here as I’ve anonymised the names in no particular order.
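As a sketch of how the notes above translate into code, the helpers below convert a timeEpoch value to a UTC datetime and scale the normalised co-ordinates to metres. The ground dimensions and the ±0.4 boundary are the approximate values observed above, not official figures, and the function names are made up for illustration.

```python
from datetime import datetime, timezone

GROUND_LENGTH_M = 165.0  # approximate, from the notes above
GROUND_WIDTH_M = 130.0   # approximate, from the notes above
Y_BOUNDARY = 0.4         # observed wing position; may differ at other grounds


def epoch_ms_to_utc(time_epoch_ms: int) -> datetime:
    """Convert a timeEpoch value (milliseconds since 1970-01-01 UTC) to a datetime."""
    return datetime.fromtimestamp(time_epoch_ms / 1000, tz=timezone.utc)


def to_metres(x: float, y: float) -> tuple:
    """Scale normalised (x, y) to metres from the centre of the ground."""
    x_m = x * GROUND_LENGTH_M                      # goals at x = -0.5 and 0.5
    y_m = y * (GROUND_WIDTH_M / (2 * Y_BOUNDARY))  # wings at y = -0.4 and 0.4
    return x_m, y_m


print(epoch_ms_to_utc(1632560973050).isoformat())
print(to_metres(0.081798, -0.25057))
```

With these numbers the two scale factors come out at 165 and 162.5 metres per unit, which hints that both axes may share one scale, but that is only an inference from a single ground.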

Possession Events

name	time	pt	x	y	hp
RSmith	1632561668955	0			TRUE
HPetty	1632561682955				FALSE
	1632561682955		-0.23991	-0.25205	FALSE
	1632561698955		0.036014	-0.30376	FALSE
MGawn	1632561702955				FALSE
MBontempelli	1632561706955	1			TRUE
CDaniel	1632561708955	2			TRUE

Sample rows from possample.csv

“time” is in the same form as the GPS trace data. However, it seems that this is the timestamp of when it is sent in the data stream, NOT when the possession was recorded, so there is a delay here and it is not a consistent lag.
“pt” is presumably “Possession type”:
- <blank> means a play restart (centre bounce, throw in, etc.)
- 0 is a mark (receive from kick)
- 1 is a ground-ball get (either loose ball or bounced kick/handball)
- 2 is a clean handball receive
- 3 is a free kick
- 4 is for end of period (not actually in this dataset)
Co-ordinates “x” and “y” are only given for stoppages, unfortunately. Also, because the timestamp is not actually the time of possession the GPS trace data cannot be used to fill in the gaps here.
“hp” is TRUE if it is actually a possession, otherwise it is FALSE when it’s a stoppage, or it’s a non-possession stat (e.g. a hitout or spoil)
Disposals are not in this data
The sample data above can be interpreted as:
- RSmith marks
- HPetty spoils over the boundary
- Location of where the throw-in started from (the co-ordinates look way off for this line)
- Location of where the throw-in lands
- MGawn gets hitout
- MBontempelli gathers loose ball
- CDaniel receives handball
As the timestamps are not representative, I actually don’t find any use for this data once the visions data is available post-game. As it is live, it is however useful to contextualise my scoring shots on my live expected score Twitter bot @AFLxScore
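The “pt” codes listed above can be captured in a small lookup, sketched here. The labels are only my reading of the notes, and since blank values also appear on non-possession rows in the sample (the spoil and the hitout), the blank case is labelled loosely. `describe_pt` is an illustrative helper name.

```python
POSSESSION_TYPES = {
    None: "restart or non-possession event",  # blank cell in the CSV
    0: "mark",
    1: "ground-ball get",
    2: "handball receive",
    3: "free kick",
    4: "end of period",
}


def describe_pt(raw: str) -> str:
    """Translate the raw `pt` cell (possibly blank) into a readable label."""
    code = int(raw) if raw.strip() else None
    return POSSESSION_TYPES.get(code, f"unknown ({raw})")


for cell in ["0", "", "1", "2"]:
    print(repr(cell), "->", describe_pt(cell))
```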

Visions Data

I’ve taken the raw data available from the API (and also published by alittle), and trimmed a few columns. I have “exploded” the stats list column so that there is one line of data per “stat”. Additionally, I have added and further contextualised some events to my liking, e.g. I mark contested possessions from stoppages as “stoppagePossession”. Feel free to do your own thing with the raw “visions” data if you prefer. Everything in this data should hopefully be self-explanatory.
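“Exploding” a list column just means repeating the other fields once per list entry. A minimal sketch in plain Python follows; the shape of the raw record (a `stats` list alongside other keys) is an assumption for illustration and not the actual API schema. With pandas, `DataFrame.explode` does the same reshaping.

```python
def explode_stats(records, stats_key="stats"):
    """Return one row per stat, copying the other fields of each record."""
    rows = []
    for rec in records:
        base = {k: v for k, v in rec.items() if k != stats_key}
        for stat in rec.get(stats_key, []):
            rows.append({**base, "Stats": stat})
    return rows


# Hypothetical raw record, loosely modelled on the sample rows.
raw = [{"Name": "MGawn", "time": 1632561360000,
        "stats": ["contestedPossession", "groundBallGet"]}]
for row in explode_stats(raw):
    print(row)
```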

Name	time	home	Stats	period	periodSeconds
JMacrae	1632561356000	0	centreStoppage	1	0
JMacrae	1632561356000	0	centreStoppagePossession	1	5
JMacrae	1632561356000	0	groundBallGet	1	5
JMacrae	1632561356000	0	handball	1	5
JMacrae	1632561356000	0	disposal	1	5
JViney	1632561356000	1	tackle	1	5
MGawn	1632561360000	1	contestedPossession	1	9
MGawn	1632561360000	1	groundBallGet	1	9

Sample rows from vissample.csv

What have I done with this data?

Unfortunately, the data sets I have been able to capture live are limited, and I have no way of obtaining more, so it’s difficult to commit to any large projects without the ability to guarantee a more complete set of data.

I have, however, attempted to location-mark the visions data of a few games I do have by using the GPS traces, with some success. As the GPS timestamps are not synched properly, each game requires manually adjusting the GPS timestamps by matching with the game footage. As the visions timestamps are to the nearest second, I located each event by the player’s average position in the second around the event (after adjusting for the GPS lead/lag).
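The averaging step might look something like the sketch below. It assumes a single player’s trace as (timeEpoch, x, y) tuples and a per-game offset found by eye against the footage; the function name, parameter names and the half-second window either side are illustrative choices.

```python
from statistics import mean


def locate_event(trace, event_ms, gps_offset_ms=0, half_window_ms=500):
    """Mean (x, y) of one player's GPS samples within a window around an event.

    trace: iterable of (timeEpoch_ms, x, y) for a single player.
    gps_offset_ms: correction added to GPS timestamps to line them up
        with the visions clock (found manually, per game).
    """
    pts = [
        (x, y)
        for t, x, y in trace
        if abs((t + gps_offset_ms) - event_ms) <= half_window_ms
    ]
    if not pts:
        return None  # no samples near the event, e.g. a gap in the capture
    return mean(p[0] for p in pts), mean(p[1] for p in pts)


trace = [(1000, 0.0, 0.0), (1100, 0.2, 0.0), (5000, 1.0, 1.0)]
print(locate_event(trace, event_ms=3050, gps_offset_ms=2000))
```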

I posted this visualisation on Twitter a while back (around the time the AR tracker came out). I actually said at the time that I used the AR tracker data, but that was a lie. The visualisation shows ball receipt location and disposal location from the GPS data, and though I hadn’t, I guess I could have plotted the whole trace between receipt and disposal rather than a straight line.

Coming (not very) soon to a brand new website near you! pic.twitter.com/mySJcA0o8C
— The AFL Lab (@AFLLab) September 9, 2021

I have also knocked up a little iPython notebook that demos how to use the GPS data to calculate a player’s maximum speed for the match. This was also a bit of a test to make sure things lined up with the “official” tracker results. It turns out it can come pretty close to matching maximum speeds, but more pre-processing of the data needs to be done to filter out any junk points.
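This is not the notebook itself, but a maximum-speed calculation can be sketched in a few lines. It assumes one player’s trace as time-sorted (timeEpoch, x, y) tuples, a uniform scale of about 165 m per co-ordinate unit (an approximation based on the ground length), and a crude cap to discard implausible jumps; all names and thresholds are illustrative.

```python
from math import hypot


def max_speed_kmh(trace, metres_per_unit=165.0, max_plausible_kmh=40.0):
    """Largest sample-to-sample speed for one player's trace.

    trace: list of (timeEpoch_ms, x, y) sorted by time.
    metres_per_unit: assumed uniform scale from normalised units to metres.
    max_plausible_kmh: crude junk filter; faster jumps are ignored.
    """
    best = 0.0
    for (t0, x0, y0), (t1, x1, y1) in zip(trace, trace[1:]):
        dt = (t1 - t0) / 1000.0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamps
        speed = hypot(x1 - x0, y1 - y0) * metres_per_unit / dt * 3.6
        if speed <= max_plausible_kmh:
            best = max(best, speed)
    return best


# Second step is a junk jump across half the ground in 0.1 s and is filtered out.
demo = [(0, 0.0, 0.0), (100, 0.005, 0.0), (200, 0.5, 0.0)]
print(round(max_speed_kmh(demo), 1))
```

Speeds between consecutive samples at 10 samples per second are noisy, so averaging over a slightly longer window would probably be a sensible refinement.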

What can you do with this data?

This is where you come in! What would you do with this sort of data if you had a large dataset? What would you like to see done? Do you think it would benefit the game?

There was a pleasing number of responses on my below teaser Tweet with suggestions as to what could be done with the data. Most of this is definitely achievable, but all rely on having a large amount of data to detect trends, compare players/teams vs. different opponents, etc.

If you had full GPS traces of AFL games, what would you do with the data?

STAY TUNED 👀
— The AFL Lab (@AFLLab) October 21, 2021

What next?

Data analysis in sport, like it or not, is here to stay. To thrive it needs good data. For whatever reason (and this may include contractual agreements with the players) the AFL is very protective of its data. Yet, I’ve shown that it is already broadcast publicly on their app – hidden, but insecure – for all to see, and someone will always be enterprising enough to capture it.

I do have more of this data from the last season (and 2020, and maybe some 2019?), but I’m sure there are plenty of missing games. Further, I am unsure how complete each game’s data is. The volume of the data means it will take me a while to work this out, and the nature of the data capture method means any missing data cannot be obtained. Once I have been able to filter through and process my raw data, I will hopefully have a larger set of data to share with the community. If I do, I would very much appreciate feedback on the format of data I’ve presented here and if there is anything obvious I’ve missed.

-AT

Sharpshooters and Shankers

The 2020 Elimination Final between the Saints and the Bulldogs had the score line:

STK 10.7 (67) def. WB 9.10 (64).

The Dogs had 2 more scoring shots, and surely if they were a little bit more accurate they would have won! After all, if they had converted one of those behinds it would have turned the game. Looking deeper shows that this is a little naive; and we can do better!

A broader consideration could take into account how difficult each scoring shot taken was – compared to the performance of an average league player taking a similar shot.

If the Dogs kicked as accurately as a completely average player, their “expected” score would have been about 69 points from their 19 scoring shots, meaning that they underperformed and should have beaten St Kilda’s 67. However, the Saints also underperformed and should have got a whopping 79 points from their 17 scoring shots.

Scoring map for Western Bulldogs vs St Kilda, Elimination final 2020, showing expected goal scoring percentages

Although both teams actually kicked poorly, St Kilda had taken their scoring shots from much “higher percentage” locations.

This game has been deliberately cherry picked as an example that raw goalkicking accuracy (goals vs behinds) can be a little deceptive and not the whole story. Not every game is decided by good/bad goalkicking, but seasons can definitely be shaped by it.

Team	Exp. Pos	Exp. Pts	Exp. %	Actual Pos	Actual Pts	Actual %	Excess Pts
RICH	1	64	116	3	64	113.7	0
GEEL	2	60	127.9	1	64	135.7	4
BL	3	56	116.7	2	64	118.3	8
COLL	4	56	115.1	4	60	117.7	4
PORT	5	54	107.5	10	44	105.4	-10
HAW	6	52	106.1	9	44	108.7	-8
GWS	7	48	115.7	6	52	115.4	4
WB	8	48	109	7	48	107.2	0
WCE	9	48	102.4	5	60	112.5	12
SYD	10	46	98.6	15	32	97.7	-14
NMFC	11	40	101.3	12	40	99.5	0
ESS	12	36	87.8	8	48	95.4	12
MELB	13	36	87	17	20	78.6	-16
ADEL	14	34	96.6	11	40	100.9	6
FRE	15	34	92	13	36	91.9	2
STK	16	32	92.1	14	36	83.9	4
CARL	17	30	83.6	16	28	84.5	-2
GCFC	18	18	63.8	18	12	60.5	-6

2019 AFL Season ladder had teams kicked with average goalkicking performance

In the 2019 season, if every team’s goalkicking performance was exactly average, West Coast would have won 3 fewer games and missed the finals. Essendon would also have missed, and the wasteful Power and Hawks would’ve landed in positions 5 and 6 respectively.

But how useful is this? It’s, of course, not a certainty that every team SHOULD score as an exactly average set of goalkickers would.

Expected score could be used to tell us if individuals, or whole teams, reliably (with statistical significance) kick better or worse than average. It could also tell us if teams are attempting a lot of low-percentage shots, or if they are working at finding better scoring opportunities. Consequently, it could also tell us if some teams are consistently restricting opponents better by conceding more low-percentage shots, or if they are giving up easier shots more readily.

Today, however, I’ll just be focusing on individuals’ goalkicking performances.

What is expected score?

Expected score is a metric used in a number of different sports to evaluate quality of scoring opportunities. At a high level, the metric uses relevant historical data (e.g. all scoring opportunities in the league in the last 5 years) to produce a model that can predict the probability of scoring success under a given set of circumstances (distance and angle to goal, defender location, etc.), based on the underlying data. Expected score is well explored in association football (soccer) (expected goals, “xG”) and some other analytics-rich sports (e.g. ice hockey). In these sports, where total scoring is relatively low, expected score can give a “better” measure of scoring opportunities compared to just goals and total shots.

In Australian rules football (footy), a similar metric can be used. In some ways, it is more statistically relevant in footy compared to say, soccer, as footy games have a high number of scoring shots at higher scoring probabilities. For my expected score model, I take into account the following parameters:

Distance from goal
Angle from goal
Subtended angle (how wide the goal posts appear from the shot location)
Context of shot (e.g. set shot, free shot, pressured shot)
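The three geometric parameters can be computed from a shot location as sketched below. The co-ordinate convention (metres out from the goal line and across from the centre of the goal) and the function name are assumptions for illustration; the exact definitions used in the model may differ. The 6.4 m figure is the standard distance between AFL goal posts.

```python
from math import atan2, degrees, hypot

GOAL_WIDTH_M = 6.4  # distance between the two goal posts


def shot_geometry(out_m: float, across_m: float):
    """Distance (m), angle (deg) and subtended angle (deg) for a shot.

    out_m: distance from the goal line, measured straight out into the field.
    across_m: sideways offset from the centre of the goal.
    """
    half = GOAL_WIDTH_M / 2
    distance = hypot(out_m, across_m)
    angle = degrees(atan2(abs(across_m), out_m))  # 0 = directly in front
    # Angle between the lines of sight to each goal post.
    subtended = degrees(
        atan2(across_m + half, out_m) - atan2(across_m - half, out_m)
    )
    return distance, angle, subtended


print(shot_geometry(30.0, 0.0))   # straight in front, 30 m out
print(shot_geometry(30.0, 25.0))  # similar depth, but out wide
```

The subtended angle shrinks with both distance and angle, which is why it is a handy single measure of how “open” the goal looks from the shot location.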

The technical details of the model are fairly close to Figuring Footy’s model, with a few minor differences not worth going into here (note: his model is probably better than mine). Robert did a lot of work on this years ago, before he was swept out of amateur analytics into the real world of club land. It’s worth exploring what he was able to do if you are interested in this area. I’ll be doing some similar analysis pieces to him but with hopefully a slightly different slant.

The choice of the above parameters is mostly limited by data availability. As an amateur I have to rely solely on what data is available in the public space. This sets some limitations that can’t be overcome, but nevertheless the model is still useful. The main limitation is that shot location data is only available for scoring shots. Consequently, shots that miss (out on the full, fall short, etc.), and rushed behinds do not enter the analysis, and these shots do not generate any expected score. Expected scores for scoring shots are therefore between a value of 1 (0% chance of a goal) and 6 (100% chance of a goal). This means every result below effectively assumes each subject misses shots at an exactly average rate (about 20%) – shanks out on the full from 20m out straight in front are not registered!
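Since a scoring shot is worth either 6 (goal) or 1 (behind), its expected score for a goal probability p is 6p + 1(1 − p) = 1 + 5p. A small sketch, with made-up per-shot probabilities and illustrative function names:

```python
def shot_xscore(p_goal: float) -> float:
    """Expected points from one scoring shot with goal probability p_goal."""
    return 6 * p_goal + 1 * (1 - p_goal)


def total_xscore(p_goals) -> float:
    """Sum of expected points over a collection of scoring shots."""
    return sum(shot_xscore(p) for p in p_goals)


# Ten scoring shots, each given a made-up 56% chance of a goal (xGoals = 5.6).
print(round(total_xscore([0.56] * 10), 1))
```

Summed over a player’s shots this is simply (scoring shots) + 5 × xGoals, so 10 scoring shots with an xGoals of 5.6 gives about 38 points, in line with the 37.9 listed for JElliott in the Sharpshooters table further down (the small gap being rounding of xGoals).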

Some other things that could be considered include the prevailing weather/ground conditions, the individual’s goalkicking ability, crowd effects and game context (early/late in game, close game or blowout, etc.). While it would be possible to account for many of these, this has the potential to partition data so much that each individual scenario has too few historical precedents to produce a meaningful estimate of performance.

The AFL data custodians, Champion Data, produce their own expected score metric (published in the Herald Sun). While I don’t have the full details of their model, they are able to take advantage of more data available, including non-scoring shots, pressure rating, kicking skill selected, etc. My results are different to theirs but broadly show the same patterns. Curiously CD’s expected scores always seem to be higher, but I don’t have the database to check whether this is valid or not (over all the data, total expected score should equal total actual score).

Is goalkicking just dumb luck?

This is a very important question. To determine if a player is a good or bad goalkicker requires a proper statistical analysis. Such an analysis would effectively determine the chance that a completely average goalkicker would perform as well (or as badly) as the subject. We can never be completely certain if this is true; we can only protest vs. reasonable doubt. What “reasonable doubt” is as a value is up for debate.

For the analysis below, I will quote a “sig (%)” value for each subject. This is the percentage chance that they are actually better (positive) or worse (negative) than the average goalkicker, based on the statistical analysis (using Poisson binomial trials, for those playing along at home). For example, a value of “-84%” means that I am 84% certain the subject is a worse goalkicker than average.
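For those playing along at home, the Poisson binomial distribution gives the chance of each possible goal tally when every shot has its own goal probability. The sketch below builds that distribution with a simple dynamic programme and reads off a tail probability. How exactly a tail probability is turned into the “sig (%)” figure is not spelled out here, so treat that mapping as an open assumption; the function names and example probabilities are made up.

```python
def goal_count_pmf(p_goals):
    """Poisson binomial PMF: pmf[k] = P(exactly k goals) for independent shots."""
    pmf = [1.0]
    for p in p_goals:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)  # this shot is a behind
            nxt[k + 1] += mass * p    # this shot is a goal
        pmf = nxt
    return pmf


def prob_at_least(p_goals, goals):
    """Chance an average kicker gets `goals` or more from these shots."""
    return sum(goal_count_pmf(p_goals)[goals:])


# Made-up example: five shots of varying difficulty, four goals kicked.
shots = [0.8, 0.6, 0.5, 0.4, 0.3]
print(round(prob_at_least(shots, 4), 4))
```

A small tail probability means an average kicker would rarely match the observed tally from the same set of shots.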

For each player in the 2021 season so far (up to but not including the R13 Queen’s Birthday match between MEL and COL), I have used my expected score model to predict the expected number of goals (xGoals) they should have kicked from their chances, and compared that to their actual performance. I have ordered the below table by the significance of their performance, and also included their significance over the 2018-2020 period. There is a qualification criterion of xGoals>5 to help ensure a sufficiently cromulent data set.

The Sharpshooters

name	goals	xGoals	scoring shots	miss rate (%)	score	xScore	2021 sig (%)	2018-20 sig (%)
JElliott	10	5.6	10	0	60	37.9	98.6	-50.8
JJKennedy	28	21.4	39	15.2	179	146.1	96.8	79.7
AMcD-Tip.	27	21	36	5.3	171	141.2	96	98.6
LFranklin	25	19.6	32	15.8	157	130	95.8	74.0
JBruce	34	28.1	44	15.4	214	184.6	94.5	79.6
ZBailey	17	12.6	22	24.1	107	84.8	94.3	-15.5
RGray	15	11	19	26.9	94	74	94	92.8
LBreust	20	16	25	16.7	125	104.8	92.7	71.7
ELangdon	10	6.8	13	13.3	63	46.9	92	-95.8
DFogarty	13	9.4	17	10.5	82	63.9	91.7	99.1
DMcStay	11	8	13	7.1	68	53.2	91.1	-95.9
WSnelling	10	7.2	12	14.3	62	48.1	88.7	90.1
GRohan	20	16.1	27	12.9	127	107.4	87.4	32.7
JJones	10	7.5	12	7.7	62	49.4	87.2	-15.6

Sharpshooters 2021 R1-R13 (excl. Queens Birthday)

While the “significance %” is the key result, it isn’t quite everything so we should be a little careful with the above results. Jamie Elliott (COL) has kicked 10 straight goals in 2021 to date with zero non-scoring shots, which is exceptional, but he just creeps over the qualification criterion. It’s also worth pointing out that in the 2018-2020 period he was exceptionally average.

Someone like Robbie Gray is quite likely a good shot, but his quite high miss rate (non-scoring shots) of 27% means we may need to be a little careful (recall that non-scoring shots are not accounted for in the expected score measure). However, in 2018-2020, Robbie was again significantly better than average, confirming his good performances as consistent – this is the same for the other bolded names above too.

Surprisingly, a couple of names in there (red; ELangdon and DMcStay, going by the 2018-20 column) were actually significantly bad goalkickers over the 2018-2020 period!

The Shankers

name	goals	xGoals	scoring shots	miss rate (%)	score	xScore	2021 sig (%)	2018-20 sig (%)
NFyfe	5	13.4	22	15.4	47	88.9	-99.9	-56.1
JBattle	3	6.5	10	28.6	25	42.4	-97.7	26.3
TJLynch	18	24.4	40	7	130	162.2	-97.5	-45.9
ANaughton	29	36.5	55	12.7	200	237.6	-97.4	-69.0
MKing	16	22.6	38	15.6	118	150.8	-97	-83.2
LDavies-Uniacke	3	6.5	11	8.3	26	43.7	-96.3	2.9
SHiggins	2	5.3	10	33.3	20	36.5	-94.8	3.4
SBerry	5	8.2	14	17.6	39	54.8	-92.3	N/A
IHill	10	13.7	22	35.3	72	90.7	-90.8	-65.1
MFrederick	5	8.3	16	0	41	57.5	-90.2	-61.9
LDahlhaus	4	6.5	11	15.4	31	43.7	-90	-45.5
JLukosius	3	5.6	11	26.7	26	38.8	-87.9	-25.2
SSwitkowski	4	6.5	11	0	31	43.5	-87.1	-15.9
LMcNeil	6	8.7	14	17.6	44	57.3	-85.4	N/A

Shankers 2021 R1-R13 (excl. Queens Birthday)

It’s no surprise to see Fyfe up there along with a few other names having goalkicking troubles this year. Naughton has had a solid scoring output this year so far with 29 goals but should be a fair way ahead of that!

Over a longer, previous period (2018-2020), this year’s bad goalkickers were still bad for the most part, but just not nearly as significantly so.

In the last couple of weeks, Jack Higgins has been put through the wringer for failing to convert, but it turns out he really isn’t doing that badly (across the season so far), having kicked 17 goals from an expected 17.6.

Where to from here?

There’s a lot more slicing and dicing that can be done with this data, and I hope to explore more in time. What we can see from a few basic results is that goalkicking is a skill in that some players are significantly better than average over long periods.

Most players cannot be shown to be significantly better or worse than average over a long period. This suggests that either most players are pretty much average and luck plays a big role, or players go in and out of form over shorter periods.

Next time, I’ll look at teams as an entity, how they produce scoring opportunities, and what scoring opportunities they concede.

Introducing @AFLxScore

@AFLxScore is a Twitter Bot about to go into operation that will automatically post expected scores (xScore) of live scoring shots in the AFL men’s competition. This will hopefully help footy followers get an idea of how skillful/lucky (or the opposite) each individual shot was. A team’s total xScore (vs actual score) gives an idea of seized or missed opportunities.

What is expected score?

xScore is a concept used in many sports (see xGoals, Expected Goals in soccer). It is a way of estimating the average score a particular shot would yield if it were taken over and over again, based on historical observations of similar attempts.

In soccer, it is a number between 0 and 1 as the most you can score with a single shot is 1 goal. In footy, the maximum score is 6.

In order to calculate an xScore, one needs to choose what factors they would consider in differentiating scoring opportunities. The most obvious, and possibly most important, is shot location: a shot from the goal square is expected to be much more successful than a shot on the boundary from 55m out. Other factors to consider could be:

Shot context (set shot, free shot on the run, pressured shot),
kick type (drop punt, snap, dribble kick, torpedo),
the individual kicker,
left/right foot,
venue, and which end,
weather conditions,
if the player on the mark can move,

and so on. As there are so many variables that can be considered, a modeller has to draw a line somewhere. If too many factors are considered, individual sets of circumstances may have too few precedents to form a logical estimate (it could be skewed by a few lucky/unlucky shots). Additionally, some desirable factors may not have readily available data.

Hasn’t this been done before?

Yes, it has! The methodology I use is heavily influenced by Robert’s. I’m not doing anything particularly new here, I am just optimising this for live calculations using the best publicly available data, and automatically posting it to a Twitter account.

What data is available?

The data I have is near-live location data of all scoring shots, the player, the time, and the resulting score. From this I can determine the shot distance and angle (measured to the centreline). Save for manually recording additional data from the broadcast, that’s it. Post-game, when all the statistics are published, it’s possible to differentiate the shot context. However what’s important here is what’s available live.
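As a rough sketch of that geometry, assume the shot location is given in metres with the centre of the goal line at the origin, x as the sideways offset from the centreline and y as the distance out from the goal line. That coordinate convention and the function name are illustrative assumptions, not the layout of the actual feed.

```python
import math


def shot_distance_and_angle(x, y):
    """Return (distance in metres, angle in degrees from the centreline).

    Hypothetical convention: the centre of the goal line is (0, 0), x is the
    sideways offset from the centreline, y is the distance out from the goal line.
    """
    distance = math.hypot(x, y)
    # 0 degrees is directly in front; 90 degrees is level with the goal line.
    angle = math.degrees(math.atan2(abs(x), y))
    return distance, angle
```

A shot 30m to one side and 40m out would come back as 50m at roughly 37 degrees.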

As the data to be used is only distinguished by location, not context, the model built is fit using all available scoring-shot data with no shot context. This data is processed and a smoothed fit is produced to ensure all potential shot locations have an estimate consistent with historical scoring shots near them.
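One way a smoothed estimate like this could be produced is a Gaussian kernel average, where every historical scoring shot contributes to the estimate at a location with a weight that falls away with distance. This is only an illustrative sketch: the kernel, the `bandwidth` value and the function name are assumptions, not necessarily the smoothing used for the fit described here.

```python
import math


def smoothed_goal_accuracy(shots, x, y, bandwidth=5.0):
    """Kernel-weighted proportion of goals among scoring shots near (x, y).

    shots: iterable of (shot_x, shot_y, is_goal), with is_goal True for a goal
           and False for a behind.
    bandwidth: Gaussian kernel width in metres (an arbitrary illustrative value).
    """
    weighted_goals = 0.0
    total_weight = 0.0
    for shot_x, shot_y, is_goal in shots:
        squared_distance = (shot_x - x) ** 2 + (shot_y - y) ** 2
        weight = math.exp(-squared_distance / (2.0 * bandwidth ** 2))
        total_weight += weight
        if is_goal:
            weighted_goals += weight
    if total_weight == 0.0:
        return None  # no usable historical shots for this location
    return weighted_goals / total_weight
```

A larger bandwidth gives a smoother surface at the cost of blurring genuine differences between nearby locations.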

Historical shot accuracy for scoring shots with no shot context

Hey, that doesn’t look right…

You’re right, it doesn’t! However, it’s important to realise that this is not saying that any shot taken from 70m+ out on a slight angle has a 30% chance of being a goal. The key limitation of the data I have is that it only consists of scoring shots. So what the above plot is showing is that if a score is recorded from 70m+ out on a slight angle, it historically has a 30% chance of being a goal.

A proper xScore measure would consider all attempted shots, including those that miss (out on the full, falling short, smothered, etc.). Any shot should have an xScore of between 0 (no score) and 6 (goal). In the methodology presented here when only scoring shots are considered, the xScore measures between 1 and 6.
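To make the arithmetic concrete, here is a minimal sketch of both versions. The function names are just for illustration; the only inputs are the probabilities of a goal and of a behind.

```python
GOAL_POINTS = 6
BEHIND_POINTS = 1


def xscore_scoring_shot(goal_probability):
    """Expected score when only scoring shots are considered.

    Every shot is either a goal or a behind, so the result lies between 1 and 6.
    """
    if not 0.0 <= goal_probability <= 1.0:
        raise ValueError("goal_probability must be between 0 and 1")
    return GOAL_POINTS * goal_probability + BEHIND_POINTS * (1.0 - goal_probability)


def xscore_all_attempts(goal_probability, behind_probability):
    """Expected score when misses are included, so the result lies between 0 and 6."""
    if goal_probability < 0 or behind_probability < 0 or goal_probability + behind_probability > 1.0:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    return GOAL_POINTS * goal_probability + BEHIND_POINTS * behind_probability
```

For the 70m+ example, a 30% goal accuracy among scoring shots gives 6 × 0.3 + 1 × 0.7 = 2.5 points.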

What can the bot do?

After every scoring shot, the bot will tweet:

the game hashtag, the game time and score recorded
the player, distance, angle, expected score (goal accuracy %)
the teams’ actual scores (and performance relative to xScore)

I may present this information in a different, more friendly way in the future as this is still a work in progress. I considered building it to send a pretty image of the shot trajectory with some data, but I’m aiming for this to be a simple messaging service to avoid heavily polluting people’s twitter streams with excessive images. Having said that, I may build some additional quarter-end reporting with some additional functionality as development continues.

Please follow if you are interested and any feedback to @AFLLab is appreciated!

Adam

Win Probability vs. Margin

Tony from Matter of Stats recently posted a great visualisation with a comparison of the Squiggle models’ predictions for the first six rounds of 2019. The visualisation shows how each model related the projected margin to the percentage chance of winning the game.

Everything looks fairly nice and smooth apart from the AFL Lab, which looks very noisy! Is this ideal, problematic, or fine? Either way, it’s not necessarily a conclusion you can arrive at just from this image. What is definitely clear is the AFL Lab model does things a little differently. I thought it would be a nice idea to clarify what is causing this “abnormal” behaviour and that it is somewhat intentional.

The SOLDIER model is fitted by correlating player performances, team performances, and game conditions (weather and venue adjustments) from past AFL games to the resulting margin of that game – the output. Like all models, it is not deterministic; that is, the chosen inputs to the model are not all of the variables that contribute to the outcome of a game. There is always some uncertainty based on what the model doesn’t know. In total, there are fourteen input variables that go into the model to produce the single output of the game margin.

If, for a game previously unseen by the model, all of the player and team performances are known, the margin predicted by the model rarely differs from the actual margin by more than a goal. This method of cross-validation is a demonstration that the model is not over-fitted and is suitable for fresh observations.

When predicting the outcome of a future game one does not know what the inputs to the model will be! So, there needs to be a way to predict what the team and player performances will be in that game in order to determine the inputs, and in turn the output. For example, will Lance Franklin kick 0.4 or 6.1? Will Scott Pendlebury have 30 disposals at 90% efficiency, or 30 disposals at 60% efficiency? Naturally it’s impossible to predict this directly, but it is possible to forecast a distribution of expected performances, based on the players’ and teams’ past performances.

A player can perform better in some areas, and worse in others.

The SOLDIER model assumes all player and team inputs to a game to be predicted are normally distributed. The distribution measures (mean, variance) for each player and team input are calculated from their past performances. Because of this, as the inputs have a distribution of possible values, the output (predicted margin) also has a distribution – but due to the nonlinearity of the model the distribution of predicted margins is not necessarily normal (it may be skewed, bimodal, etc.).

As a brief aside, why is the model nonlinear and what does that mean? Consider a hypothetical case study game where Team A beats Team B by 20 points. If the same game was played again, and Team A performed 10% better (in some sort of overall measure), would the margin be exactly 10% higher, for 22 points? Maybe that performance increase resulted in only a single behind (for a margin of 21 points), or two goals (32 points). Even with a single performance measure, there is not necessarily a linear relation between the inputs (team and player performance) and outputs (margin). The SOLDIER model takes fourteen different performance measure inputs, and relates them to a single output. The combinations of categories over-, under- and par-performing is exhaustive and any such combination could lead to a different, or no change in the outcome.

Anyway, back to the main narrative of the post. Now that there are distributions of player and team performance, and a model relating performance to margin, how do the predictions come about? For a particular future game, it is simple to randomly sample a performance for each team and player and put this into the model to predict a margin. One such realisation is just that, one possibility of what could happen. To gauge a broader overview of the possible outcomes, a large number (say 50,000) of realisations can be rapidly calculated to get a distribution of margins. This is called a Monte Carlo simulation.

From this distribution of margins, a number of predictions can be pulled out. Firstly, the median of this distribution is chosen as the predicted margin for the game. The proportion of realisations where the home team wins represents the home win probability. Even though the margin distribution may not be a normal distribution, the standard deviation of the distribution can be calculated and represents the margin standard deviation. These three predictions are sufficient to adequately describe what the model believes could occur – and are the predictions that advanced tipping competitions like the Monash competitions take.
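The whole procedure can be sketched in a few lines. Note that `toy_margin_model` below is a made-up nonlinear placeholder, not the fitted SOLDIER model, and the input means and standard deviations are whatever the form calculation supplies; the point is only to show how the three predictions fall out of the simulated margins.

```python
import random
import statistics


def toy_margin_model(inputs):
    """Hypothetical stand-in for the fitted model: a nonlinear map from inputs to a margin."""
    total = sum(inputs)
    return 10.0 * total + 2.0 * total * abs(total)


def simulate_game(input_means, input_sds, model=toy_margin_model, n=50_000, seed=0):
    """Monte Carlo simulation of one game.

    Each input (home minus away) is sampled from its own normal distribution,
    pushed through the model, and the resulting margins are summarised.
    """
    rng = random.Random(seed)
    margins = []
    for _ in range(n):
        sampled_inputs = [rng.gauss(mean, sd) for mean, sd in zip(input_means, input_sds)]
        margins.append(model(sampled_inputs))
    predicted_margin = statistics.median(margins)
    home_win_probability = sum(1 for m in margins if m > 0) / n
    margin_sd = statistics.pstdev(margins)
    return predicted_margin, home_win_probability, margin_sd
```

Because the win probability is counted directly from the realisations, it does not rely on the margin distribution being normal.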

The margin standard deviation calculated by the model is the key driver behind AFL Lab’s anomalous probability-margin “curve” demonstrated by MatterOfStats. The standard deviations the model produces are generally between 18 and 50 points, and mostly on the lower end. This is far lower than many other models (usually between 30 and 40). I would argue that it makes sense that the standard deviation SHOULD vary between games – an expected blowout could be a modest or huge victory (large standard deviation), but a closely-matched game in wet conditions suggests a smaller total score and a lower standard deviation is appropriate. Due to this variable standard deviation, two games with the same predicted margin can have a vastly different home win probability.

For a game with a predicted margin of 20, the standard deviation changes the probability significantly.

It is for this reason that my probability-margin “curve” is not a curve at all – the probability is a function of both the predicted margin and the margin standard deviation.
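To see the size of the effect, here is a quick illustration that treats the margin as normally distributed. That is a simplifying assumption made purely for this example; the model itself counts the proportion of simulated home wins and its margin distribution need not be normal.

```python
from statistics import NormalDist


def home_win_probability(predicted_margin, margin_sd):
    """P(margin > 0) if the margin were normal with the given mean and standard deviation."""
    return 1.0 - NormalDist(mu=predicted_margin, sigma=margin_sd).cdf(0.0)


if __name__ == "__main__":
    for sd in (18, 30, 50):
        print(f"margin 20, sd {sd}: {home_win_probability(20, sd):.3f}")
```

With a predicted margin of 20, a standard deviation of 18 gives a home win probability of about 0.87, while a standard deviation of 50 gives about 0.66.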

I do have some reservations on the low standard deviations produced by the model – the random sampling methodology currently used is flawed and still very much under construction. Hopefully by the end of the season I will have a large enough sample to work on improvements from.

Until next time, which hopefully won’t be as long.

-A

There’s No 2019 Preview, Why?

In the past 10 months of coding, I’ve built up a number of tools and modules that navigate the data I have collected and ratings I have calculated. I had lofty ambitions of using these, and developing more, to provide a comprehensive preview of the upcoming season from the SOLDIER perspective.

It took a while to properly elucidate, but it was soon blindingly obvious that although such a prediction was possible, it would be pointless and inaccurate! The reason is easy to explain. The key outcomes I have chosen to focus on with the SOLDIER model thus far are on form-based predictions with the purpose of predicting upcoming games. The game-prediction model uses recent (5-game) and longer-term (20-game) form of player and team performances to predict an outcome using the players selected on the team sheets. Predicting 24 rounds ahead (plus finals) is a very long bow to draw when the ammunition is a set of darts.

The Problem

Later in the 2018 season, I started producing some weekly predictions simulating the rest of the season to establish end-of-season predictions based on current form. Naturally, my first port of call in predicting 2019 was to use this method to pit the teams against each other on both level playing fields (round-robin, each team playing each other home and away) to rate each team, and simulating the actual 2019 fixture as a more practical measure.

The results were surprising at first.

Wow, are Geelong that good? Are Sydney that bad? There’s a lot to unpack here, but something doesn’t quite add up. With such a broad prediction, the first sanity check for me is to compare to the bookies. After all, they’re the professionals in this caper. Just looking at the top 8 percentages, the ladder simulation is a lot more certain of things than the odds suggest*. Sydney, in particular, are about a 50% chance with the bookies to get into the top 8.

While this serves as a strong argument for the uselessness of such a long-range prediction, it also serves as a reminder of the strengths and disadvantages of what I’m modelling. Furthermore, it provides direction for what could be done to improve such predictions in the future.

The hurdles to overcome are plentiful if I were to predict a season with the model as it is:

Which players will play each game?
What effect does the off-season have?
How do you account for natural evolution of players?
Will a team’s game plan change?
Will rule changes or “in-vogue” tactics change what statistical measures win games?

*mainly because the player/team form distribution is fixed in the above simulation rather than using a Brownian-motion inspired model

1. Which players will play each game?

The SOLDIER model encompasses player statistics and form. The chosen players for each team have a small but noticeable effect on the predicted outcome of the game. For the above simulations I used the “first choice team” as opined on afl.com.au for each club, and adjusted it for known injuries. But of course, no team is going unchanged all season; injuries play a part, younger players get tried out, and selection can be based on the opposition. A more sensible way would be to look at a squad (say, of 30) and average out to 22 players, which would be fairly easy to do.

2. What effect does the off-season have?

It’s simple to argue that form will not necessarily carry over the off-season. There are too many immeasurables and unknowables to look at the individual off-season and pre-seasons of all 500+ players and adjust “form” accordingly. Are there any rules of thumb? To probe this question, I browsed player data for the last five home-and-away rounds of 2017, and the first five home-and-away rounds of 2018, and looked at the difference in each player’s average performances in these two periods. While there were mostly drops in stats across the board going to a new season, they were not statistically significant. As an easily relatable example, across the 293 qualifying players for this study, players scored on average 0.5 fewer Supercoach points after the season break (p=0.7)*.

A further thought was probed that less- and more-experienced players may be affected differently by the summer break. Further splitting the already-filtered data into players with less than 50 games at end of 2017 (N=92) and more than 200 games (N=28) also proved fruitless with no statistically significant differences across the break. There’s more that could possibly be looked at here but I strongly suspect little progress.

*This isn’t the best measure to use here as Supercoach points are scaled per game, but the p-values are similar for other unscaled measures.
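For anyone wanting to run a similar check, a comparison like this can be set up as a paired test on each player’s before and after averages. The sketch below is one possible way to do it, not necessarily the test behind the p-values quoted above: it uses a normal approximation to the paired t-test, which is reasonable for samples in the hundreds but too optimistic for small groups.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_difference_test(before, after):
    """Return (mean change, approximate two-sided p-value) for paired observations.

    before, after: per-player averages in the same player order.
    Uses a normal approximation to the paired t-test; the differences must not
    all be identical, otherwise the standard error is zero.
    """
    differences = [a - b for b, a in zip(before, after)]
    mean_change = mean(differences)
    standard_error = stdev(differences) / math.sqrt(len(differences))
    z = mean_change / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return mean_change, p_value
```

A large p-value, as found here, means the observed drop is well within what chance alone would produce.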

3. How do you account for the natural evolution of players?

Sam Walsh will definitely play this year, barring tragedy. So, how does one predict what Sam Walsh will produce this year? There is no data on how he performs against other AFL-quality teams in games for premiership points. How could I handle him and every other rookie that may or may not play this year?

Currently, if a debutant is playing, the model will assign the player’s performance to be a plain old average of every first-game player’s historical outputs, regardless of the draft pick, playing role, team, etc. By the player’s second game, and subsequent games, it uses their personal recorded data. This is a decent trade-off for simplicity in handling debutants on a week-by-week basis, but does not hold for long-term predictions.
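In code, that fallback amounts to something like the following sketch. The data layout (one dictionary of category ratings per game) and the function names are assumptions for illustration, and a plain mean stands in for the actual form calculation.

```python
from statistics import mean


def debutant_baseline(first_game_ratings):
    """Average rating per category over every past first-game performance.

    first_game_ratings: list of dicts mapping category name -> rating,
    one dict per historical debut (hypothetical layout).
    """
    categories = first_game_ratings[0].keys()
    return {c: mean(r[c] for r in first_game_ratings) for c in categories}


def expected_rating(player_history, baseline):
    """Use the player's own recorded games if any exist, otherwise the debutant baseline."""
    if not player_history:
        return dict(baseline)
    return {c: mean(game[c] for game in player_history) for c in baseline}
```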

HPNFooty have done some magnificent work with player-value projection based on analysing a current player’s output and comparing with other players on a similar trajectory, discounting for other factors such as player age. On a slightly different arc, The Arc has used clustering algorithms to classify players into particular roles. By implementing similar concepts and merging the two together, it could be possible to (manually) assign a debutant a playing role (say, key defender or small forward) to project a more meaningful prediction for a season ahead.

Another thought this question raised is how to handle players undergoing positional change (e.g. James Sicily, Tom McDonald) and old fogies put out to stud in the forward line (GAJ), but these are more one-offs that are probably not worth trying to manually override.

4. Will a team’s game plan change?

While player performance is a focus of this model, as important (if not more so) are the team measures that input into the predictions. Each team itself gets a rating in 6/7 SOLDIER categories based on team form. These measures incorporate team-aggregate statistics that cannot be allocated to individual players (say, tackles per opposition contested possessions). These could, conceivably, be a function of both team performance as a whole and the team’s game plan.

How well does this team form carry over to a new season? Do big personnel changes effect a noticeable change in a team’s output? These are questions I planned to have answered before this moment but they’ll have to wait.

5. Will rule changes affect game balance?

The AFL is an evolving competition: there are very frequent rule changes that never really allow the game to settle to a point where all strategies with a given set of rules are explored. Having said that, assuming the player and team performances are projected as well as possible over a whole season, will the model’s prediction be accurate when the effect of rule changes is unknown?

The fit of the model is updated every round when there is fresh data. It compares the player and team SOLDIER scores, as calculated from the published statistics, and fits them against the game results. More recent games are weighted more strongly to reflect the prevailing style of football – and what combinations and strategies beat what combinations and strategies. Significantly better results have been obtained using this approach rather than using all historic data equally weighted to fit the model.
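As an illustration of what such a weighting could look like, the sketch below uses an exponential decay with a half-life measured in rounds. Both the decay form and the half-life value are assumptions chosen for the example, not the weighting actually used in the model.

```python
def recency_weights(game_ages_in_rounds, half_life_rounds=46.0):
    """Exponential-decay weights for training games.

    A game `half_life_rounds` old counts half as much as one just played.
    The default half-life is an arbitrary illustrative value.
    """
    return [0.5 ** (age / half_life_rounds) for age in game_ages_in_rounds]
```

Weights like these can be passed as `sample_weight` to the `fit` method of scikit-learn estimators that support it (SVR does, for example).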

The effects the new 2019 rules will have are very much unknown, and the response in tactics from teams will naturally evolve over the season.

Conclusion

I had planned on presenting more data to back up the above points but time got the better of me and hopefully I’ll expand on this throughout the season.

Without a number of pending and unplanned improvements to my processes, a long-term prediction covering a whole season is not going to be purposefully indicative of reality. Sure, Geelong could top the ladder, but for the above reasons I wouldn’t bank on it!

The SOLDIER model

In this first post for 2019 I will give an outline of the AFL Lab model to be used for its debut full season. I originally started this project from my love of sport and desire to learn about machine learning and data science in general. It also coincided with a career move, which left me with a bit of free time for a while! It is by no means complete, professional and optimised and never will be.

***

Summary

AFL is a team sport that, like many sports, relies on a combination of individual and team efforts. A number of freely-available statistics are accessible to the public, recorded for each player in each game. For many of these statistics, the difference between the teams’ aggregates correlates well with the game outcome. These key statistics are selected from the data and assigned one of seven categories (SOLDIER), and each player in each past game is given a rating in these seven areas. In addition to these “player” ratings, certain team features (derived statistics) that strongly correlate with match outcomes have been identified and this affords a “team” rating in these seven areas. These team statistics are not attributable to particular players and could be considered a descriptor of an overall game plan, or just team performance. A machine learning model was trained using standard supervised learning techniques and parameter tuning. The inputs to the model are the difference in the teams’ aggregate player ratings (seven variables), and the difference in the teams’ team ratings (seven variables), with the output being the game margin. Future games are predicted by using recent data for the selected players and teams to predict each of the model inputs for the game, with appropriate error tolerances for variation in form. This allows Monte Carlo simulations of each game, producing a distribution of outcomes. Simulations of past seasons produce accuracy similar to other published AFL model results. The model has the potential to bring deeper insight into many facets of the sport including team tactics, the impact of individual players, and

Aim

The aim of building this model is to implement machine-learning techniques to predict and understand the outcomes of AFL games.

Raw Data

There are a number of sources available that compile and store statistics for AFL matches, without which projects like these just can’t go ahead. AFLTables provides a comprehensive coverage of historical matches in an easy-to-handle manner. Footywire provides additional statistics for more recent games, and AFL.com.au fills the gaps and provides some text describing games. All three of these sources are implemented and scraped responsibly to maintain a database.

The statistics recorded, and the availability of statistics, have changed in the past decade. The full gamut of statistics used in the current model has been available since 2014 and so earlier data is not used. While it would be possible to adjust the following analysis to account for missing statistics in past data, a key focus of this work is to consider the changing nature and tactics of AFL football and as such, including earlier games may be counterproductive to understanding the game as it is played today.

SOLDIER

The raw statistics were analysed against game outcomes to understand which have the strongest correlation: In each game, for each raw statistic, the sum of away player contributions was subtracted from the sum of home player contributions to obtain a “margin” statistic (if the margin statistic is positive, the home team accrued more of the statistic). The margin statistics were tested against the score margin and a simple Pearson’s correlation coefficient was calculated.
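A minimal sketch of that calculation follows. The function names are illustrative only; the arithmetic is just the home-minus-away sum and the standard Pearson formula.

```python
import math


def margin_statistic(home_player_values, away_player_values):
    """Home total minus away total for one raw statistic in one game."""
    return sum(home_player_values) - sum(away_player_values)


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(variance_x * variance_y)
```

Here `xs` would hold one statistic’s margin across all games and `ys` the corresponding score margins; a coefficient near ±1 marks a statistic worth keeping.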

Following this, rather than naively looking at the raw statistics, a number of features (derived statistics) were calculated and tested in the same manner. Features have the potential of providing better context for the raw statistics, for example, a team recording many Rebound 50s (defensive act where the ball is moved out of defensive 50) is not that impressive if their opponent has many more Inside 50s; their success at defending the opponent’s Inside 50s is important, not the raw number.

The relevant raw statistics and features were then allocated different categories depending on what aspect of gameplay they represent. Seven different categories were identified:

Scoring – Directly scoring goals/behinds, setting up others who do the same.
Outside Play – Also called “Uncontested”. Staying out of the contest and being efficient at it.
Long Play – Moving the ball quickly with marks and kicks. Getting the ball in the forward 50.
Discipline – Also called “Defence”. Doing the tough stuff. Spoils, intercepts, winning free kicks (and not giving away free kicks)
Inside Play – Also called “Contested”. Getting the ball in the contest, clearing it from the contest, and tackling. Efficiency important but less so than uncontested play.
Experience – How experienced are the players? The number of games played, finals played, Brownlow votes received.
aeRial Play – Commanding the ball overhead. Contested marks, hit outs, raw height.

The raw statistics and some of the features can be directly attributed to individual players, but most of the features are representative of the team itself rather than the individuals. These team measures could be considered a way to quantify teamwork and/or game plan. Each chosen statistic and feature has been distributed to each of the above seven categories, each split up by whether they are player-specific or team-specific.

	Category	Player Examples	Team Measure Example
S	Scoring	Goals, Goal Assists, Points/I50, Marks I50	Percentage
O	Outside	Metres Gained, Uncontested Possessions	Cont. Pos. Ratio
L	Long Play	Marks, Kicks, Inside 50s	Inside 50 Efficiency
D	Discipline	One Percenters, Intercepts, Free Kicks	Rebound 50 Efficiency
I	Inside	Contested Possessions, Clearances, Tackles	Cont. Pos. Margin
E	Experience	Games Played, Past Brownlow Votes	HGA adjustment*
R	aeRial	Height, Contested Marks, Hit Outs	Cont. Marks conceded

*The Team Experience measure is currently taken to be a completely deterministic variable that depends on how far each team has travelled to get to the venue.

For each game, each player and each team get a rating in these seven categories based on the above statistics and features. From this, it is natural to consider extensions such as overall ratings, analysis of form, and determination of a player’s role in a team. However, for the moment, the focus will be on development of the model for predicting match outcomes.

Model Construction

Match outcomes are to be predicted using a machine-learning model. The large number of input variables chosen in this project favours machine-learning models over other models widely (and very successfully) adopted in the sports modelling space.

Machine-learning models, in particular supervised-learning models, are designed to learn from known results and determine non-linear relationships that relate the inputs to particular outcomes. The variety and complexity of machine-learning models is vast, each with their advantages and disadvantages. This project implements techniques in the https://scikit-learn.org/ libraries, allowing many models to be tested side-by-side.

The model has fourteen inputs, and a single output:

Inputs:

Margin of player SOLDIER scores (7 variables)
Margin of team SOLDIR scores (6 variables)
Venue/HGA adjustment (1 variable)

Output:

Points margin of the game
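Assembling the fourteen inputs for one game might look like the sketch below. The single-letter keys and the dictionary layout are assumptions made for the example; the structure (seven player margins, six team margins, one venue/HGA adjustment) follows the list above.

```python
PLAYER_CATEGORIES = ["S", "O", "L", "D", "I", "E", "R"]
# No team "E" margin: that slot is taken by the venue/HGA adjustment.
TEAM_CATEGORIES = ["S", "O", "L", "D", "I", "R"]


def build_model_inputs(home_player, away_player, home_team, away_team, hga_adjustment):
    """Return the 14 model inputs for one game as a flat list.

    home_player / away_player: dicts of aggregate player scores per category.
    home_team / away_team: dicts of team scores per category.
    All margins are home minus away.
    """
    player_margins = [home_player[c] - away_player[c] for c in PLAYER_CATEGORIES]
    team_margins = [home_team[c] - away_team[c] for c in TEAM_CATEGORIES]
    return player_margins + team_margins + [hga_adjustment]
```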

Fitting models is very simple once player and team SOLDIER scores are calculated and rescaled. A common approach for selecting a model and tuning its parameters is a train-and-test split, where a proportion (say, 70%) of the data is used to fit the model and is tested against the remaining proportion. However, predicting an unplayed game is quite different; the player and team SOLDIER scores are not known a priori. It is necessary to make predictions as to how each player and each team will perform in a given game in order to predict the outcome using the proposed model.

In a previous piece, I examined how one could measure a player’s form, and what other mitigating factors can affect a player’s output. For the game to be predicted, the form of the involved teams and their players is calculated (mean and variation) to determine probable distributions for the inputs to the model. As predicting unplayed games is the goal, simulating games using no foreknowledge (i.e. only considering the past) is the only appropriate way to test the model. The only exception is that the Team Experience (aka Home Ground Advantage) is known as this is determined from the fixture.

Results and Discussion

I have performed full simulations of the 2017 and 2018 seasons to test a variety of models and tune parameters. The testing procedure is as follows, using 2017 as an example:

1. Train model using pre-2017 data.
2. Predict round one performances using pre-2017 data.
3. Predict round one results a large number of times (N=10,000) and record.
4. Retrain model with real round one data.
5. Predict round two performances using pre-2017 data and real round one data.
6. Predict round two results a large number of times (N=10,000) and record.
7. Repeat steps 4-6 for the remaining rounds.
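The procedure above is a walk-forward test, and its skeleton can be sketched as follows. The `train` and `predict_round` callables are hypothetical placeholders for the fitting and simulation code; the important part is that real results are only added to the history after that round has been predicted.

```python
def walk_forward_season(history, season_rounds, train, predict_round):
    """Simulate a season round by round with no foreknowledge.

    history: list of past games (in whatever structure the callables expect).
    season_rounds: list of rounds, each a list of games with known results.
    train(history) -> fitted model.
    predict_round(model, history, games) -> predictions for those games.
    """
    history = list(history)  # copy so the caller's data is not modified
    all_predictions = []
    for games in season_rounds:
        model = train(history)  # fit on everything seen so far
        all_predictions.append(predict_round(model, history, games))
        history.extend(games)  # only now reveal the real results
    return all_predictions
```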

The large number of predictions gives a distribution that allows a win probability and a median margin to be recorded for comparison against the actual results. In the following tables, the results from four models are presented with the number of tips they got correct, the number of “Bits” (higher is better) and the average error in the margin (lower is better). The “BEST” row is the best performances in each measure from squiggle.com.au.

2017

Model	Tips	Bits	Av Margin
SVR1	120	12.06	31.08
SVR2	125	12.72	30.27
XGBoost	128	11.19	30.18
KNR	121	1.73	30.48
BEST	137	20.57	29.18

2018

Model	Tips	Bits	Av Margin
SVR1	141	35.68	28.42
SVR2	143	34.98	28.11
XGBoost	141	29.55	28.57
KNR	150	33.39	27.80
BEST	147	39.76	26.55

Models

SVR1 (Support Vectors Regression, parameter set 1)
SVR2 (parameter set 2)
XGBoost (Gradient Boosting Regression)
KNR (K-Neighbours Regression)
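For reference, the two probabilistic measures can be computed as in the sketch below. The Bits formula is the commonly quoted one for probabilistic tipping competitions (1 + log2 of the probability given to the eventual winner, with a separate rule for draws); I am assuming that is what the Squiggle figures use. The average margin error is taken as the mean absolute error.

```python
import math


def bits(home_win_probability, home_margin):
    """Bits earned for one game; the probability must be strictly between 0 and 1."""
    p = home_win_probability
    if home_margin > 0:
        return 1.0 + math.log2(p)
    if home_margin < 0:
        return 1.0 + math.log2(1.0 - p)
    return 1.0 + 0.5 * math.log2(p * (1.0 - p))  # draw


def mean_absolute_margin_error(predicted_margins, actual_margins):
    """Average absolute difference between predicted and actual margins."""
    errors = [abs(p - a) for p, a in zip(predicted_margins, actual_margins)]
    return sum(errors) / len(errors)
```

A 50% tip always earns zero Bits, so a positive season total means the probabilities carried real information.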

What is immediately noticeable is not only that different models are better at different prediction types, but also performance is season-dependent. On that second point, if these machine-learning models are picking up gameplay and tactics patterns, doesn’t it make sense that this would change from season to season? In training the models, more recent data is given stronger weighting to reflect this and small improvements (consistent but not necessarily statistically significant) have been observed.

The actual performance of the SVR2 model appears to be the most consistent over many seasons and in 2018 was comparable in success to other models with published results. This model, with a few additional tweaks, is the one that will be adopted for the 2019 season.

A deeper investigation into individual games reveals that with all the models, there is a tendency to under-predict the target. Games expected to be blowouts are predicted to be merely comfortable victories. While this does not affect the tip for the game, it evidently does affect the margin, and to a lesser extent Bits. One example is the 2018 Round 18 game between Carlton and Hawthorn. Hawthorn were expected to win by over 10 goals, and they did. The SVR2 model predicted a median margin of -25 points (away team’s favour).

2018 Round 18, Carlton vs Hawthorn predicted margin distribution (SVR2). Actual margin -72.

That this happens with all models tested suggests an issue with how the inputs to the model are calculated rather than the models themselves. Recall that each player and team performance is simulated based on samples from a normal distribution along with their individual means and variances. This implies that in a given game it’s equally probable that each player will perform better or worse than their average. This doesn’t really make sense! One would assume that against a very strong team, player outputs would be less than a normal distribution would suggest. Of course, against a very weak team, player outputs would be higher. The best way to adjust for this is not obvious and is a focus of ongoing work.

Conclusions and Further Work

The model as presented today is in working order, has the capacity of predicting results in the ballpark of other models, and still has many avenues to improve. In particular, the following have been of interest:

Home Ground Advantage: The model still uses a flat score based on where the teams are from and where the game is played. There is clearly a lot more to Home Ground Advantage than that.
Team Experience score: Currently this is where Home Ground Advantage lives, originally it was planned to be a measure of how experienced the team is playing together; are there a lot of list changes? Coaching staff changes? This is difficult to quantify, and difficult to account for without manual intervention so it has been shelved for the moment.
Weather Effects: Wet weather affects the outcome of AFL matches, especially with regard to the expected scores and efficiency (see Part 1 and Part 2).

The game prediction model is just one arm of this project but is definitely the most technical one. By learning about and improving this model it is hoped that further insights into the sport can be uncovered.

Environmental factors affecting AFL outcomes – the weather, part 2

Today I continue my focus on the weather. In particular, I will look at some key statistics that differentiate dry-weather football from wet-weather football. Unfortunately, for the most part, the results are completely self-evident. However, there is a nugget or two in there that I think are interesting.

In the first part, I discussed how the prevailing weather and conditions of the game affect the outcome, as part of the larger overview of how all environmental factors affect all aspects of the game. There are a few obvious independent variables within the weather space that could affect the game: precipitation, wind and heat.

Precipitation anecdotally affects the game in a number of ways. Rain falling during the game keeps the ball and the ground wet, impacting the efficiency of skills and even the choice of skills (“wet weather football”). The lasting effect of rain, perhaps after it stops, and other effects such as dew also cause these impacts, perhaps to a lesser extent. Wet weather games are anecdotally characterised by “scrappy football”: less handballing, more kicking, and low scores.

Wind comes in a couple of flavours. In all cases the main effect is expected to be on long kicking and consequently goal kicking accuracy. Prevailing winds down the length of the ground provide a bias towards scoring at one end of the ground (i.e. a “five goal breeze”). A prevailing cross wind is a bit more of an unknown. Swirling winds can result from changeable conditions or heavy weather conditions, but also from the geometry of the venue; large grandstands particularly at the goal ends can produce some erratic conditions. It’s possible that this can be somewhat predictable based on knowledge of the venue.

Footy is a winter game but heat sometimes plays a part, especially near the front and back ends of the season. I don’t expect heat to be a huge factor, perhaps it affects player fatigue and creates a more open, high-scoring game.

Evaluating the conditions in past games

It is a relatively straightforward procedure to watch a game of football (or merely some highlights) and, with some knowledge of the game, evaluate the effect of certain environmental conditions on the outcome. An avid football watcher could easily do this on a week-by-week basis and keep a database. However, lacking an ongoing database, it would be very time consuming to do this individually for a past season’s games, let alone multiple seasons. How can one efficiently and accurately record the conditions at past matches?

In the previous piece I scraped daily rainfall data from the Bureau of Meteorology at the closest weather station to each AFL ground and attached the data to each game. Then I examined the distribution of total points scored for a few different rainfall ranges. The hope was that games with more rainfall would be lower scoring.

RainfallVsTotalPoints

Unfortunately this was quite unsuccessful. It did, however, elucidate that what really matters is the conditions at the ground at the time of the game. The “daily rainfall” numbers reset at 9am, so there’s a good chance that game-day rain could fall well before or after the game has been played and not affect conditions at all.
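A rough sketch of why the daily figure is such a blunt instrument: the function below finds the 9am-to-9am window that contains a given moment, and the example shows how small a share of that window a single game occupies. The game start time and the 2.5-hour duration are assumptions chosen for illustration.

```python
from datetime import datetime, time, timedelta

RESET = time(9, 0)  # daily rainfall totals reset at 9am

def rainfall_window(moment: datetime) -> tuple[datetime, datetime]:
    """Return the 9am-to-9am window that contains `moment`."""
    start = datetime.combine(moment.date(), RESET)
    if moment < start:
        # Before 9am the moment still belongs to the previous day's window.
        start -= timedelta(days=1)
    return start, start + timedelta(days=1)

# Hypothetical night game; the duration is an assumed round figure.
game_start = datetime(2018, 7, 14, 19, 40)
game_hours = 2.5

window_start, window_end = rainfall_window(game_start)
window_hours = (window_end - window_start).total_seconds() / 3600
share = game_hours / window_hours
print(f"Game covers {share:.0%} of the {window_hours:.0f}-hour rainfall window")
```

Roughly nine-tenths of the window falls outside the game, so a large daily total says little about what the players actually faced.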

I then moved on to looking at published match reports for past games. The main idea is that if the conditions affected the game significantly, it would be discussed in the match report.

Methodology for match report scraping

I chose to use the match reports published on http://www.afl.com.au for the simple reason that the URL is formulaic, which makes it easy to scrape large quantities of data. For example, http://www.afl.com.au/match-centre/2018/17/adel-v-geel is the 2018 Round 17 game, an Adelaide home game against Geelong. For all games from 2014 onwards, I scraped the match report text into a database for ease of handling.
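The URL pattern can be sketched as a small helper. The function name is hypothetical, and only the `adel` and `geel` abbreviations are confirmed by the example above; the codes for other clubs (and how finals rounds are numbered) would need checking against the site.

```python
def match_report_url(year: int, round_number: int, home: str, away: str) -> str:
    """Build a match-centre URL from the season, round and team abbreviations."""
    return (
        "http://www.afl.com.au/match-centre/"
        f"{year}/{round_number}/{home}-v-{away}"
    )

# Reproduces the example from the text.
url = match_report_url(2018, 17, "adel", "geel")
print(url)
```

Looping this over every season, round and fixture gives the list of pages to fetch.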

I then used Microsoft Excel to flag match reports that contain certain weather-related keywords. The keywords I chose (e.g. rain, slippery, windy, storm, etc.) came from a brainstorm and from reading samples of match reports. This allowed me to pass by the vast majority of match reports where weather wasn’t (seemingly) a factor. I also, as a matter of curiosity, flagged some reports where the total points were particularly low.
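The flagging itself was done in Excel, but the same idea could be sketched in Python along these lines. The keyword list contains only the four words named above (the full list isn’t reproduced here), and the function name is hypothetical.

```python
import re

# Only the keywords named in the text; the real list was longer.
KEYWORDS = ["rain", "slippery", "windy", "storm"]

# Match each keyword at the start of a word, so "raining" and "stormy"
# are caught but "train" is not.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")",
    re.IGNORECASE,
)

def flag_report(text: str) -> list[str]:
    """Return the sorted set of weather keywords found in a match report."""
    return sorted({m.group(1).lower() for m in PATTERN.finditer(text)})

print(flag_report("Heavy rain made the ball slippery."))
print(flag_report("A clean, dry contest on the training track."))
```

Prefix matching is deliberately greedy: it is better to over-flag and read a few extra reports than to miss a weather-affected game.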

For the flagged match reports, I pasted the report text into Notepad++ and defined a custom syntax to highlight the list of keywords. This allowed me to efficiently and selectively read match reports to summarise the conditions described.

False flags

If journalists could stop using the following cliches, that would be marvellous:

<TEAM> stormed into contention…
<TEAM> stormed home…
It was raining goals…
<TEAM> came home with a wet sail…
<PLAYER> put the heat on…
etc.
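One way to cut down on these false flags would be to blank out the known cliches before searching for keywords. This is a sketch only: the cliche patterns are taken from the list above, while the `wet` and `heat` keywords are my assumption about what triggered the last two.

```python
import re

# Phrases from the list above that mention weather words without meaning weather.
CLICHES = [
    r"stormed into contention",
    r"stormed home",
    r"raining goals",
    r"with a wet sail",
    r"put the heat on",
]
CLICHE_RE = re.compile("|".join(CLICHES), re.IGNORECASE)

# Assumed keyword stems for this example.
WEATHER_RE = re.compile(r"\b(rain|storm|wet|heat)", re.IGNORECASE)

def weather_hits(text: str) -> list[str]:
    """Return weather keyword hits after removing known cliches."""
    cleaned = CLICHE_RE.sub(" ", text)
    return [m.group(1).lower() for m in WEATHER_RE.finditer(cleaned)]

print(weather_hits("Geelong stormed home in the last quarter."))
print(weather_hits("Steady rain fell as Geelong stormed home."))
```

The list of cliches would never be complete, so the flagged reports still need a human read.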

How do you represent the conditions quantitatively?

Now we have a good summary of the weather for weather-affected games. How do we quantify this in a meaningful way so that it may be used in a model? As mentioned in the first piece, one could be as specific as they like in describing the conditions of a game that’s already happened. This would give a very good measure of the effect on past games. However, my interest (at the moment, at least) is modelling games that haven’t happened yet. Having sophisticated measures for conditions is useless if you can’t predict the conditions with the same accuracy you can measure them. After looking at the summaries of conditions I recorded, I decided to record weather with four binary (yes or no) variables:

If there was mention of wind (or inferred through description of “sideways rain”, etc.) the game would be classified as “windy”.
If there was mention of heat it would be a “hot” game.
If conditions were slippery (wet ground, actual rain, dew, humidity, etc.) it would be “damp”.
If rain fell for a significant portion of the match it would be “rainy” (and of course, also “damp”).

These variables should be very easy to measure in the future, and also relatively predictable from looking at weather forecasts.
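The four flags could be stored along these lines. The class and method names are hypothetical; the only rule encoded is the one stated above, that a rainy game is also damp.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GameWeather:
    windy: bool = False
    hot: bool = False
    damp: bool = False
    rainy: bool = False

    def __post_init__(self):
        # A rainy game is always damp as well.
        if self.rainy and not self.damp:
            object.__setattr__(self, "damp", True)

    def as_features(self) -> list[int]:
        """Return the flags as 0/1 values in a fixed order for a model."""
        return [int(self.windy), int(self.hot), int(self.damp), int(self.rainy)]

print(GameWeather(rainy=True, windy=True).as_features())
print(GameWeather().as_features())
```

Keeping the flags binary means the same record can be filled in from a match report after the fact or from a forecast beforehand.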

Some initial results

The first thing to do is to see if the data passes a sniff test. When looking at the rainfall data I looked at total points scored as a measure. Generally reports mention the rain/damp conditions more so than wind or heat, so let’s start with this.

This was a relief; the time spent was worth it! The samples are statistically different (t-test: p(Dry~Damp)< $10^{-7}$ , p(Dry~Rain)< $10^{-16}$ , p(Damp~Rain) $\approx 0.0015$ ) and are logical in that a dry game is expected to be higher scoring than a damp game, and a damp game higher scoring than a rainy game.
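For anyone wanting to reproduce this kind of comparison, here is a standard-library sketch of a two-sample t statistic. I’m assuming Welch’s unequal-variance form purely for the example, and the scores below are made up, not the data behind the p-values above.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of each sample mean
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Made-up total scores, for illustration only.
dry = [182, 175, 190, 168, 201, 177]
rain = [131, 140, 128, 150, 136, 122]

t_stat, dof = welch_t(dry, rain)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Turning the statistic into a p-value needs the t distribution; `scipy.stats.ttest_ind(a, b, equal_var=False)` does the whole thing in one call if SciPy is available.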

For what it’s worth, the “Rain” mode (peak of the curve) is approximately 132 points, “Damp” is 145. The median total score is probably a better measure though:

Dry: 178 points
Damp: 151.5 points
Rain: 136 points

Some problems

While these results look good, they must be scrutinised. The AFL Data Twitterati suggested a number of things to look into when I tweeted the above plot.

What if the conditions just aren’t mentioned in the match report?
Are certain match report authors more likely to mention the conditions?
Is there an agenda in the reporting that might affect exaggerating/understating of the conditions?

These are excellent points. The first was also a prime concern of mine when doing this. To alleviate this going forward, now that I have a database of past games, I plan to record conditions for subsequent games week-by-week based on my own observations.

Over round 17, when watching games/highlights I kept some notes about the conditions. I noted two games where the conditions were a factor. Fremantle v Port was affected by rain (and atrocious skills, mind :/) and this was noted in the match report. Hawthorn v Brisbane in Tasmania was beset by dew (as mentioned by the commentators regularly) and conditions were slippery. There was no mention of the slick conditions in the match report. Arguably the conditions were on the minor side and scoring wasn’t hugely affected, but nevertheless I would want this to be recorded for my database.

It is conceivable that my match-report parsing process is mainly flagging games where the adverse conditions had a noticeable effect, or the reporter mentioned it in passing (“skills were good despite the tricky conditions” and the like appeared sometimes). The consequence is that the distributions plotted above are most likely biased. This is not good for predictive purposes; I can predict whether a game will be damp but not whether the teams will perform/score well despite the conditions.

What I can say for sure is the games marked as wet, damp or windy are affected. So let’s see what sets these games apart. Today I’m just going to look at the distributions of certain key statistics that are considerably affected by the weather. Most of it is really self-evident, but it’s always good to have some quantitative confirmation of well-known theories.

Wet Weather Football

What changes? Everything! Well, almost. Let’s start with something that shouldn’t be affected too much as a bit of a control measure. An inside 50 is the movement of the ball (by carrying or disposal) into the forward 50 from outside the 50. I would argue the numbers should be largely independent of the weather; the efficiency will be the main difference.

There’s a noticeable increase in Inside 50s in “Damp” games and it is a statistically significant difference. Speaking of efficiency, let’s look at how “Inside 50 Efficiency” is affected. I define this as:

$\text{I50 Efficiency}=\frac{\text{Inside 50s} - \text{Rebound 50s}}{\text{Inside 50s}}\times 100\%$
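As a quick worked example, the definition translates directly into a small function. The function name and the numbers are illustrative, and I’m assuming the rebound 50s counted are those out of the same forward 50.

```python
def i50_efficiency(inside_50s: int, rebound_50s: int) -> float:
    """Percentage of inside 50s that were not rebounded back out."""
    if inside_50s <= 0:
        raise ValueError("inside_50s must be positive")
    return 100.0 * (inside_50s - rebound_50s) / inside_50s

# Illustrative numbers: 50 entries, 35 of them rebounded.
efficiency = i50_efficiency(50, 35)
print(f"I50 Efficiency = {efficiency:.1f}%")
```

By this measure an entry only counts as efficient if the ball stays in the forward 50, however it got there.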

There is less efficiency in “Rain” games, as expected, but even less in “Damp” games! Perhaps this can be explained by teams not respecting slightly difficult conditions and trying to play a normal game style. While we’re on Inside 50s, Marks Inside 50 are a strong predictor of AFL success.

weather-marksi50

Indeed there are fewer Marks Inside 50 in weather-affected games. Not surprising at all: it’s harder to mark in the wet and harder to hit targets.

Scoring in the wet

Sticking with scoring, goal accuracy is strongly affected by the weather: not just by rain, but by the wind too.

weather-scoringshots weather-goalaccuracy

weather-totgoals weather-totbehinds2

There are fewer scoring shots, and a lower goal accuracy. Unfortunately there is no data available on goal attempts that fall short or are kicked out-on-the-full. Strangely (?), the number of behinds scored in games (including rushed behinds) is not statistically distinguishable between different conditions.

Moving the pill

Disposal efficiency is crap in the wet. It drops dramatically and is one of the key differences in wet-weather football. It’s more of a measure of performance rather than tactics. Something that is more of a measure of tactical changes is players choosing to kick or handball.

weather-efficiency weather-kph

Perhaps the two stats are related: kicking is less efficient than handballing, but the prevailing thought in the wet is that you should boot it long rather than dish it around with the hands.

Nevertheless, in the modern game of many stoppages and flooding the contest there’s a lot of contested ball. In fact, in the wet there is much more contested ball than in the dry.

weather-CUratio weather-tackles

Interestingly, tackles per contested possession (a team measure I used called “Tackling Pressure”) is almost unchanged in the wet. I would have expected tackles to not “stick” as much but the definition of a tackle requires one to affect the efficiency of the disposal, and with many disposals inefficient by default in the wet there may naturally be more tackles recorded.

Picking up the soap

There are a lot of inefficient disposals in the wet, a lot of dropped marks and a lot of stoppages. Picking the ball up and having clean skills is going to be a boon in the wet. Without having numbers on things like “loose ball gets” (I know they’re recorded, just not publicly available!), I have to rely on looking at other stats to infer these things.

weather-clangers weather-intercepts

A clanger accounts for many different errors including unforced dropped marks, turnovers, free kicks conceded, etc. Also note that intercepts are the consequence of turnovers. More evidence that wet-weather footy is a scrappy affair.

It’s the little things that count, or not?

Spoils, tap-ons, shepherds, smothers all come under the “One Percenter” stat. On average there are about 25 more per game in the wet. More one percenters fits the narrative of less clean possession. Unfortunately, the correlation of One Percenters with the outcome of winning a game of footy is very poor.

What about the wind?

In all of the plots above I have plotted distributions for “windy” games as well. I take these with a grain of salt, really. Most of the “windy” distributions are bimodal and probably could be further split into just wind-affected games and rain-and-wind-affected games, but then the sample size would be irrelevant. I would wager most games that are wind-affected but aren’t really obvious are just blown over (yes, I did) in the match reports, so I don’t have a record of it.

What about the heat?

There’s just not enough data to make any meaningful observations.

How do you win wet weather football?

Well I haven’t answered that, and I don’t think anyone can with the available stats. What I can say is that almost all facets of the game are affected. The reduced ability to execute skills properly is a clear result of wet conditions. Being more efficient correlates strongly with winning a game in the AFL so having those skills to handle the wet ball and dispose of it smartly is surely going to be effective. But that’s no revelation.

Scoring accuracy is strongly affected too. The obvious recommendations are kicking straighter and setting up easier shots at goal (which introduces more possibilities of turnovers). The publicly available stats are just not good enough to evaluate things like this.

What would be really interesting is to look at stats like loose ball gets. Being able to capitalise on the natural inefficiency of disposals in the wet should be a good predictor of the desired outcome.

I would also think that player positioning would play a key role. Having players in the right zones; close to both pick up loose balls out of a contest and ~60 metres back to intercept long bombs forward seems like the way to go. With lower disposal efficiency it should be less about covering a player and more about covering a probable landing zone.

Aside from analysing player GPS data (which I don’t have and am not good enough to do anyway), an easier measure may be total distance run by a team. I don’t have this data either.

The first few plots are the most interesting to me. In “Damp” games (including things such as dew-affected games, wet ground, etc.) there are counter-intuitively more Inside 50s, and these are less efficient, than both “Dry” and “Rain” games. Do teams neglect to switch into “wet-weather mode” when they should?

I intended to use the weather data in my models to better predict things such as upcoming game totals and margins, and I shall, but with a bit of uncertainty regarding how many of the actually weather-affected games I’ve recorded.

Round 16 Review

Pretty close to a very unique 9/9 but the Giants didn’t quite get up in the end.

I barely lost out on the bits and I seem to be getting some better looking probabilities now (close to other models). It’s going to be difficult to catch up.

r16tables

Some devastating losses for Sydney, GWS, Essendon, Adelaide this week to really impact their finals chances. Sydney still odds on to make it, but they drop back to the pack of 6 fighting for 5 spots in the 8. GWS a bit of a smoky but injuries (and a pretty key suspension) will probably see them fade in the next couple once their expected “form” catches up.

Out of 10,000 simulations, Richmond made the Top 8 in every single one. It’d take something seriously dramatic (probably involving multiple key player outs) to change that significantly.

Hopefully by next week I’ll have a game-total model so I can simulate actual results (and thus, percentages) because it’s getting pretty bloody tight!

-Adam

Round 15 Review

A horror round for almost everyone it seems.

r15results

A couple of things I’m thankful for:

my model’s chronic underestimation of the margin came good this week!
Breust the late out for Hawthorn pushed the game in GWS’s favour, yielding me the right tip. (go player models!)

r15tables

Meanwhile, in the actual footy, things are getting tight around 8th-10th!

You’d have to say percentage is going to play a big part (unfortunately I can’t really simulate that yet)

-Adam

Round 14 Review

Not a brilliant week with the Melbourne tip but I was happy with it at the time.

r14results

The “exotic” tip brings me back to the pack a bit but it’s still a very good pack!

r14tables

Working on a few more things, including a simulation of the rest of the season.

2018-AfterR14

A lot of changes since last week, West Coast lost out a lot. These simulations are based off no changes to the latest team list, so with JJK/Darling back that’d probably change. The order is based on the mean ladder position from 10,000 season simulations. My percentages are a bit more definitive than some other (better) models’ ladder simulations but aren’t too far off!
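For the curious, the general shape of a season simulation like this can be sketched as below. This is not my actual code: the function name, the toy league and the win probabilities are all made up, ties on wins are broken at random (no percentage), and draws are ignored.

```python
import random
from collections import defaultdict

def simulate_ladders(teams, current_wins, fixtures, win_prob,
                     cutoff=8, n_sims=10_000, seed=0):
    """Return {team: (mean ladder position, share of sims inside the cutoff)}."""
    rng = random.Random(seed)
    position_sum = defaultdict(int)
    made_cutoff = defaultdict(int)
    for _ in range(n_sims):
        wins = dict(current_wins)
        for home, away in fixtures:
            # win_prob[(home, away)] is the home side's chance of winning.
            winner = home if rng.random() < win_prob[(home, away)] else away
            wins[winner] += 1
        # Assumption: teams level on wins are separated at random.
        ladder = sorted(teams, key=lambda t: (wins[t], rng.random()), reverse=True)
        for position, team in enumerate(ladder, start=1):
            position_sum[team] += position
            if position <= cutoff:
                made_cutoff[team] += 1
    return {t: (position_sum[t] / n_sims, made_cutoff[t] / n_sims) for t in teams}

# Toy three-team league; "top 1" stands in for the top 8.
teams = ["A", "B", "C"]
result = simulate_ladders(
    teams,
    current_wins={"A": 5, "B": 0, "C": 0},
    fixtures=[("B", "C")],
    win_prob={("B", "C"): 0.5},
    cutoff=1,
    n_sims=1_000,
)
print(result)
```

Sorting teams by the first number gives the ladder order, and the second number is the percentage shown next to each team.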

I’m also trialing a new measure of the “best team”: I simulate a home-and-away round-robin fixture, so each team plays every other team twice, once at each side’s home ground. I’m still working on a nicer way to present this, but for the moment I’m using the same ladder presentation.
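Generating such a fixture is straightforward: every ordered pair of teams is one game, with the first team at home. A minimal sketch, using placeholder team labels rather than real clubs:

```python
from itertools import permutations

def home_and_away_round_robin(teams):
    """Return every (home, away) pairing, so each pair meets at both home grounds."""
    return list(permutations(teams, 2))

# Placeholder labels for an 18-team competition.
teams = [f"T{i:02d}" for i in range(1, 19)]
fixture = home_and_away_round_robin(teams)
print(len(fixture), "games,", len(fixture) // len(teams), "home games per team")
```

With 18 teams that is 306 games, or 34 per team, which removes the unevenness of the real draw.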

2018-AfterR14-RR

I expected this to be a bit more definitive but it’s much more spread than I thought! Note that again, the percentage is chance of getting in the (hypothetical) top 8. There’s a tremendous divide from 14th and under and it’s VERY tight at the top. It’s a shame the draw isn’t even!

-Adam