Environmental factors affecting AFL outcomes – the weather, part 2

Today I continue my focus on the weather. In particular, I will look at some key statistics that differentiate dry-weather football from wet-weather football. Unfortunately, for the most part, the results are completely self-evident. However, there is a nugget or two in there that I find interesting.

In the first part, I discussed how the prevailing weather and conditions of the game affect the outcome, as part of the larger overview of how all environmental factors affect all aspects of the game. There are a few obvious independent variables within the weather space that could affect the game: precipitation, wind and heat.

Precipitation anecdotally affects the game in a number of ways. Rain falling during the game keeps the ball and the ground wet, affecting the efficiency of skills and even the choice of skills (“wet weather football”). The lasting effect of rain after it stops, and other effects such as dew, also cause these impacts, perhaps to a lesser extent. Wet weather games are anecdotally characterised by “scrappy football”: less handballing, more kicking, and low scores.

Wind comes in a couple of flavours. In all cases the main effect is expected to be on long kicking and consequently goal kicking accuracy. Prevailing winds down the length of the ground provide a bias towards scoring at one end of the ground (i.e. a “five goal breeze”). A prevailing cross wind is a bit more of an unknown. Swirling winds can result from changeable conditions or heavy weather conditions, but also from the geometry of the venue; large grandstands particularly at the goal ends can produce some erratic conditions. It’s possible that this can be somewhat predictable based on knowledge of the venue.

Footy is a winter game but heat sometimes plays a part, especially near the front and back ends of the season. I don’t expect heat to be a huge factor; perhaps it affects player fatigue and creates a more open, high-scoring game.

Evaluating the conditions in past games

It is a relatively straightforward procedure to watch a game of football (or merely some highlights) and, with some knowledge of the game, evaluate the effect of certain environmental conditions on the outcome. An avid football watcher could easily do this on a week-by-week basis and keep a database. However, lacking an ongoing database, it would be very time consuming to do this individually for a past season’s games, let alone multiple seasons. How can one efficiently and accurately record the conditions at past matches?

In the previous piece I scraped daily rainfall data from the Bureau of Meteorology at the closest weather station to each AFL ground and attached the data to each game. Then I examined the distribution of total points scored for a few different rainfall ranges. The hope was that games with more rainfall would be lower scoring.

[Figure: distribution of total points scored by daily rainfall range]

Unfortunately this was quite unsuccessful. It did, however, make clear that what really matters is the conditions at the ground at the time of the game. The “daily rainfall” numbers reset at 9am, so there’s a good chance that game-day rain could fall well before or after the game has been played and not affect conditions at all.
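To make the 9am problem concrete, here is a minimal Python sketch. It assumes that a daily total dated D covers the 24 hours up to 9am on D (so an evening game lands in the next day's reading) and that a game lasts about 2.5 hours. Both are my assumptions for illustration, and the function names and the kickoff time are hypothetical.

```python
from datetime import date, datetime, time, timedelta


def rainfall_reading_date(kickoff: datetime) -> date:
    """Date of the 9am reading whose preceding 24 hours contain the kickoff.

    Assumption: a daily total dated D covers 9am on D-1 up to 9am on D.
    """
    if kickoff.time() < time(9, 0):
        return kickoff.date()
    return kickoff.date() + timedelta(days=1)


def share_of_window_played(game_hours: float = 2.5) -> float:
    """Fraction of the 24-hour rainfall window that a game occupies."""
    return game_hours / 24.0


# Hypothetical Saturday-night game starting at 7:50pm local time.
kickoff = datetime(2018, 7, 14, 19, 50)
reading = rainfall_reading_date(kickoff)  # falls in the Sunday 9am reading
```

Under these assumptions a game occupies only around a tenth of the window its rainfall total describes, which is why the daily figure says so little about conditions during play.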

I then moved on to looking at published match reports for past games. The main idea is that if the conditions affected the game significantly, it would be discussed in the match report.

Methodology for match report scraping

I chose to use the match reports published on http://www.afl.com.au for the simple reason that the URL is formulaic, which makes it easy to scrape large quantities of data. For example, http://www.afl.com.au/match-centre/2018/17/adel-v-geel is the 2018 Round 17 game, an Adelaide home game against Geelong. For all games from 2014 onwards, I scraped the match report text into a database for ease of handling.
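As a rough sketch of what “formulaic” means here, the URL can be assembled from the season, round and team codes. The pattern is inferred from the single example above. The function name is hypothetical, and any team codes other than "adel" and "geel" would need to be checked against the site.

```python
BASE_URL = "http://www.afl.com.au/match-centre"


def match_report_url(season: int, round_number: int, home: str, away: str) -> str:
    """Build a match-centre URL following the pattern of the example in the text."""
    return f"{BASE_URL}/{season}/{round_number}/{home}-v-{away}"


url = match_report_url(2018, 17, "adel", "geel")
```

Looping this over a fixture list gives the set of pages to fetch. The fetching and parsing are left out because they depend on the page layout.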

I then used Microsoft Excel to flag match reports that contain certain weather-related keywords. The keywords I chose (e.g. rain, slippery, windy, storm) came from brainstorming and from reading samples of match reports. This allowed me to pass over the vast majority of match reports where weather wasn’t (seemingly) a factor. I also, as a matter of curiosity, flagged some reports where the total points were particularly low.
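I did the flagging in Excel, but the same idea can be sketched in Python with a regular expression. This is an illustration rather than the process I actually used. The keyword list contains only the four examples named above, and the function name is hypothetical.

```python
import re

# Only the example keywords named in the text; the full list would be longer.
WEATHER_KEYWORDS = ["rain", "slippery", "windy", "storm"]

# \b anchors each keyword to the start of a word, and \w* lets "rain" also
# catch "raining" or "rainy" without matching inside words like "trained".
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, WEATHER_KEYWORDS)) + r")\w*",
    re.IGNORECASE,
)


def flag_keywords(report_text: str) -> set[str]:
    """Return the set of weather keywords that appear in a match report."""
    return {match.group(1).lower() for match in _PATTERN.finditer(report_text)}
```

A report is flagged for reading when the returned set is non-empty. Matching on word starts is a deliberate choice: it keeps recall high at the cost of some false positives.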

For the flagged match reports, I pasted the report text into Notepad++ and defined a custom syntax to highlight the list of keywords. This allowed me to efficiently and selectively read match reports to summarise the conditions described.

False flags

If journalists could stop using the following cliches, that would be marvellous:

  • <TEAM> stormed into contention…
  • <TEAM> stormed home…
  • It was raining goals…
  • <TEAM> came home with a wet sail…
  • <PLAYER> put the heat on…
  • etc.
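One way to reduce these false flags, sketched in Python rather than the Excel and Notepad++ workflow I used, is to blank out known cliches before searching for keywords. The pattern list covers only the phrases listed above and the function name is hypothetical.

```python
import re

# Patterns for the cliches listed above; a non-exhaustive, illustrative list.
FALSE_FLAG_PATTERNS = [
    r"stormed (?:into contention|home)",
    r"raining goals",
    r"wet sail",
    r"put the heat on",
]
_FALSE_FLAGS = re.compile("|".join(FALSE_FLAG_PATTERNS), re.IGNORECASE)


def strip_false_flags(report_text: str) -> str:
    """Blank out known cliches so a later keyword search ignores them."""
    return _FALSE_FLAGS.sub(" ", report_text)


cleaned = strip_false_flags("It was raining goals before the rain arrived.")
```

In the example the cliche is removed but the genuine mention of rain survives, so the report would still be flagged for the right reason.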

How do you represent the conditions quantitatively?

We now have a good summary of the weather for weather-affected games. How do we quantify this in a meaningful way so that it may be used in a model? As mentioned in the first piece, one could be as specific as one likes in describing the conditions of a game that’s already happened. This would give a very good measure of the effect on past games. However, my interest (at the moment, at least) is modelling games that haven’t happened yet. Having sophisticated measures for conditions is useless if you can’t predict the conditions with the same accuracy you can measure them with. After looking at the summaries of conditions I recorded, I decided to record weather with four binary (yes or no) variables:

  • If there was mention of wind (or inferred through description of “sideways rain”, etc.) the game would be classified as “windy”.
  • If there was mention of heat it would be a “hot” game.
  • If conditions were slippery (wet ground, actual rain, dew, humidity, etc.) it would be “damp”.
  • If rain fell for a significant portion of the match it would be “rainy” (and of course, also “damp”).

These variables should be very easy to measure in the future, and also relatively predictable from looking at weather forecasts.
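The four flags, and the rule that a rainy game is always damp, could be represented along these lines. This is a sketch only. The class name is hypothetical, and treating every game that is neither rainy nor damp as “Dry” is my assumption about how the plot groups are formed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameConditions:
    windy: bool = False
    hot: bool = False
    damp: bool = False
    rainy: bool = False

    def __post_init__(self) -> None:
        # A rainy game is by definition also damp; enforce it here rather
        # than trusting whoever enters the data.
        if self.rainy and not self.damp:
            object.__setattr__(self, "damp", True)

    @property
    def label(self) -> str:
        """Moisture group: 'Rain', 'Damp' or 'Dry' (assumed grouping)."""
        if self.rainy:
            return "Rain"
        if self.damp:
            return "Damp"
        return "Dry"
```

Keeping wind and heat as separate flags means a game can be, say, both windy and rainy, which matters when splitting the windy games later.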

Some initial results

The first thing to do is to see if the data passes a sniff test. When looking at the rainfall data I looked at total points scored as a measure. Generally reports mention the rain/damp conditions more so than wind or heat, so let’s start with this.

[Figure: distribution of total points scored in Dry, Damp and Rain games]

This was a relief; the time spent was worth it! The samples are statistically different (t-test: p(Dry~Damp)<10^{-7}, p(Dry~Rain)<10^{-16}, p(Damp~Rain)\approx 0.0015) and are logical in that a dry game is expected to be higher scoring than a damp game, and a damp game higher scoring than a rainy game.
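For readers who want to see what sits behind a two-sample t-test, here is a minimal sketch of the test statistic using only the standard library. It assumes Welch's unequal-variance form, which is not necessarily the variant behind the p-values above, and the score lists are made-up numbers, not my data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, degrees_of_freedom) for Welch's unequal-variance t-test."""
    na, nb = len(a), len(b)
    # Squared standard error of each sample mean.
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    dof = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, dof


# Made-up total scores for illustration only.
dry_totals = [180, 170, 190, 175, 185]
damp_totals = [150, 140, 160, 145, 155]
t_stat, dof = welch_t(dry_totals, damp_totals)
```

Turning the statistic into a p-value needs the t-distribution. If SciPy is available, scipy.stats.ttest_ind(a, b, equal_var=False) computes the same statistic along with the p-value.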

For what it’s worth, the “Rain” mode (peak of the curve) is approximately 132 points, “Damp” is 145. The median total score is probably a better measure though:

  • Dry: 178 points
  • Damp: 151.5 points
  • Rain: 136 points

Some problems

While these results look good, they must be scrutinised. The AFL Data Twitterati suggested a number of things to look into when I tweeted the above plot.

  1. What if the conditions just aren’t mentioned in the match report?
  2. Are certain match report authors more likely to mention the conditions?
  3. Is there an agenda in the reporting that might affect exaggerating/understating of the conditions?

These are excellent points. The first was also a prime concern of mine when doing this. To alleviate this going forward, now that I have a database of past games, I plan to record conditions for subsequent games week-by-week based on my own observations.

Over round 17, when watching games/highlights I kept some notes about the conditions. I noted two games where the conditions were a factor. Fremantle v Port was affected by rain (and atrocious skills, mind :/) and this was noted in the match report. Hawthorn v Brisbane in Tasmania was beset by dew (as mentioned by the commentators regularly) and conditions were slippery. There was no mention of the slick conditions in the match report. Arguably the conditions were on the minor side and scoring wasn’t hugely affected, but nevertheless I would want this to be recorded for my database.

It is conceivable that my match-report parsing process is mainly flagging games where the adverse conditions had a noticeable effect, or the reporter mentioned it in passing (“skills were good despite the tricky conditions” and the like appeared sometimes). The consequence is that the distributions plotted above are most likely biased. This is not good for predictive purposes; I can predict whether a game will be damp but not whether the teams will perform/score well despite the conditions.

What I can say for sure is the games marked as wet, damp or windy are affected. So let’s see what sets these games apart. Today I’m just going to look at the distributions of certain key statistics that are considerably affected by the weather. Most of it is really self-evident, but it’s always good to have some quantitative confirmation of well-known theories.

Wet Weather Football

What changes? Everything! Well, almost. Let’s start with something that shouldn’t be affected too much as a bit of a control measure. An inside 50 is the movement of the ball (by carrying or disposal) into the forward 50 from outside the 50. I would argue the numbers should be largely independent of the weather; the efficiency will be the main difference.

[Figure: Inside 50s by weather condition]

There’s a noticeable increase in Inside 50s in “Damp” games and it is a statistically significant difference. Speaking of efficiency, let’s look at how “Inside 50 Efficiency” is affected. I define this as:

\text{I50 Efficiency}=\frac{\text{Inside 50s} - \text{Rebound 50s}}{\text{Inside 50s}}\times 100\%
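As a worked example of the definition, here is a small Python sketch. The function name and the counts are hypothetical, and it makes no assumption about whether the counts are per team or per game.

```python
def inside50_efficiency(inside_50s: int, rebound_50s: int) -> float:
    """I50 efficiency as a percentage: (Inside 50s - Rebound 50s) / Inside 50s."""
    if inside_50s <= 0:
        raise ValueError("inside_50s must be positive")
    return 100.0 * (inside_50s - rebound_50s) / inside_50s


# Hypothetical counts: 50 Inside 50s, 35 of which were rebounded.
efficiency = inside50_efficiency(50, 35)
```

With those counts, 15 of the 50 entries were not rebounded, so the efficiency works out to 30%.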

[Figure: Inside 50 efficiency by weather condition]

There is less efficiency in “Rain” games, as expected, but even less in “Damp” games! Perhaps this can be explained by teams not respecting slightly difficult conditions and trying to play a normal game style. While we’re on Inside 50s, Marks Inside 50 are a strong predictor of AFL success.

[Figure: Marks Inside 50 by weather condition]

Indeed there are fewer Marks Inside 50 in weather-affected games. Not surprising at all; it’s harder to mark in the wet and harder to hit targets.

Scoring in the wet

Sticking with scoring, goal accuracy is strongly affected by the weather: not just rain, but wind too.

[Figures: scoring shots, goal accuracy, total goals and total behinds by weather condition]

There are fewer scoring shots, and a lower goal accuracy. Unfortunately there is no data available on goal attempts that fall short or are kicked out-on-the-full. Strangely (?), the number of behinds scored in games (including rushed behinds) is not statistically distinguishable across the different conditions.

Moving the pill

Disposal efficiency is crap in the wet. It drops dramatically and is one of the key differences in wet-weather football. It’s more of a measure of performance rather than tactics. Something that is more of a measure of tactical changes is players choosing to kick or handball.

[Figures: disposal efficiency and kicking vs handballing by weather condition]

Perhaps the two stats are related: kicking is less efficient than handballing, but the prevailing thought in the wet is that you should boot it long rather than dish it around with the hands.

Nevertheless, in the modern game of many stoppages and flooding the contest there’s a lot of contested ball. In fact, in the wet there is much more contested ball than in the dry.

[Figures: contested to uncontested possession ratio and tackles by weather condition]

Interestingly, tackles per contested possession (a team measure I used called “Tackling Pressure”) is almost unchanged in the wet. I would have expected tackles to not “stick” as much but the definition of a tackle requires one to affect the efficiency of the disposal, and with many disposals inefficient by default in the wet there may naturally be more tackles recorded.
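A quick sketch of why the ratio can stay flat even when both raw counts rise. The function name is hypothetical and the numbers are made up purely to illustrate the arithmetic.

```python
def tackling_pressure(tackles: int, contested_possessions: int) -> float:
    """Tackles per contested possession for a team."""
    if contested_possessions <= 0:
        raise ValueError("contested_possessions must be positive")
    return tackles / contested_possessions


# Made-up team counts: both stats are 25% higher in the wet game.
dry_pressure = tackling_pressure(60, 140)
wet_pressure = tackling_pressure(75, 175)
```

If tackles and contested possessions grow by the same proportion, as in this made-up example, the ratio does not move at all, even though the game looks very different.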

Picking up the soap

There are a lot of inefficient disposals in the wet, a lot of dropped marks and a lot of stoppages. Picking the ball up and having clean skills is going to be a boon in the wet. Without having numbers on things like “loose ball gets” (I know they’re recorded, just not publicly available!), I have to rely on looking at other stats to infer these things.

[Figures: clangers and intercepts by weather condition]

A clanger accounts for many different errors including unforced dropped marks, turnovers, free kicks conceded, etc. Also note that intercepts are the consequence of turnovers. More evidence that wet-weather footy is a scrappy affair.

It’s the little things that count, or not?

[Figure: One Percenters by weather condition]

Spoils, tap-ons, shepherds and smothers all come under the “One Percenter” stat. On average there are about 25 more per game in the wet. More one percenters fits the narrative of less clean possession. Unfortunately, the correlation of One Percenters with the outcome of winning a game of footy is very poor.

What about the wind?

In all of the plots above I have plotted distributions for “windy” games as well. I take these with a grain of salt, really. Most of the “windy” distributions are bimodal and probably could be further split into just wind-affected games and rain-and-wind-affected games, but then the sample sizes would be too small to be meaningful. I would wager most games that are wind-affected but not obviously so are just blown over (yes, I did) in the match reports, so I don’t have a record of them.

What about the heat?

There’s just not enough data to make any meaningful observations.

How do you win wet weather football?

Well I haven’t answered that, and I don’t think anyone can with the available stats. What I can say is that almost all facets of the game are affected. The reduced ability to execute skills properly is a clear result of wet conditions. Being more efficient correlates strongly with winning a game in the AFL so having those skills to handle the wet ball and dispose of it smartly is surely going to be effective. But that’s no revelation.

Scoring accuracy is strongly affected too. The obvious recommendations are kicking straighter and setting up easier shots at goal (introducing more possibilities of turnovers). The publicly available stats are just not good enough to evaluate things like this.

What would be really interesting is to look at stats like loose ball gets. Being able to capitalise on the natural inefficiency of disposals in the wet should be a good predictor of the desired outcome.

I would also think that player positioning would play a key role. Having players in the right zones, both close enough to pick up loose balls out of a contest and ~60 metres back to intercept long bombs forward, seems like the way to go. With lower disposal efficiency it should be less about covering a player and more about covering a probable landing zone.

Aside from analysing player GPS data (which I don’t have and am not good enough to do anyway) an easier measure may be total distance run by a team. I don’t have this data either.

The first few plots are the most interesting to me. In “Damp” games (including things such as dew-affected games, wet ground, etc.) there are counter-intuitively more Inside 50s, and these are less efficient, than both “Dry” and “Rain” games. Do teams neglect to switch into “wet-weather mode” when they should?

I intended to use the weather data in my models to better predict things such as upcoming game totals and margins, and I shall, but with a bit of uncertainty regarding how many of the actually weather-affected games I’ve recorded.
