Win Probability vs. Margin

Tony from Matter of Stats recently posted a great visualisation with a comparison of the Squiggle models’ predictions for the first six rounds of 2019. The visualisation shows how each model related the projected margin to the percentage change of winning the game.

Source: Twitter @MatterOfStats

Everything looks fairly nice and smooth apart from the AFL Lab, which looks very noisy! Is this ideal, problematic, or fine? Either way, it’s not necessarily a conclusion you can arrive at just from this image. What is definitely clear is the AFL Lab model does things a little differently. I thought it would be a nice idea to clarify what is causing this “abnormal” behaviour and that it is somewhat intentional.

The SOLDIER model is fitted by correlating player performances, team performances, and game conditions (weather and venue adjustments) from past AFL games to the resulting margin of that game – the output. Like all models, is it not deterministic; that is the chosen inputs to the model are not all of the variables that contribute to the outcome of a game. There is always some uncertainty based on what the model doesn’t know. In total, there are fourteen input variables that go in to the model to produce the single output of the game margin.

The structure of the SOLDIER model

If, for a game previously unseen by the model, all of the player and team performances are known, the predicted model by the margin rarely differs from the actual margin by more than a goal. This method of cross-validation is a demonstration that the model is not over-fitted and is suitable for fresh observances.

When predicting the outcome of a future game one does not know what the inputs to the model will be! So, there needs to be a way to predict what the team and player performances will be in that game in order to determine the inputs; in order to determine the output. In examples, will Lance Franklin kick 0.4 or 6.1? Will Scott Pendlebury have 30 disposals at 90% efficiency, or 30 disposals at 60% efficiency? Naturally it’s impossible to predict this directly, but it is possible to forecast a distribution of expected performances, based on the players’ and teams’ past performances. 

A player can perform better in some areas, and worse in others.

The SOLDIER model assumes all player and team inputs to a game to be predicted are normally distributed. The distribution measures (mean, variance) for each player and team input are calculated from their past performances. Because of this, as the inputs have a distribution of possible values, the output (predicted margin) also has a distribution – but due to the nonlinearity of the model the distribution of predicted margins is not necessarily normal (it may be skewed, bimodal, etc.)

As a brief aside, why is the model nonlinear and what does that mean? Consider a hypothetical case study game where Team A beats Team B by 20 points. If the same game was played again, and Team A performed 10% better (in some sort of overall measure), would the margin be exactly 10% higher, for 22 points? Maybe that performance increase resulted in only a single behind (for a margin of 21 points), or two goals (32 points). Even with a single performance measure, there is not necessarily a linear relation between the inputs (team and player performance) and outputs (margin). The SOLDIER model takes fourteen different performance measure inputs, and relates them to a single output. The combinations of categories over-, under- and par-performing is exhaustive and any such combination could lead to a different, or no change in the outcome.

Anyway, back to the main narrative of the post. Now that there are distributions of player and team performance, and a model relating performance to margin, how do the predictions come about? For a particular future game, it is simple to randomly sample a performance for each team and player and put this into the model to predict a margin. One such realisation is just that, one possibility of what could happen. To gauge an broader overview of the possibile outcomes, a large number (say 50,000) realisations can be rapidly calculated to get a distribution of margins. This is called a Monte-Carlo simulation.

From this distribution of margins, a number of predictions can be pulled out. Firstly, the median of this distribution is chosen as the predicted margin for the game. The proportion of realisations where the home team wins represents the home win probability. Even though the margin distribution may not be a normal distribution, the standard deviation of the distribution can be calculated and represents the margin standard deviation. These three predictions are sufficient to adequately describe what the model believes could occur – and are the predictions that advanced tipping competitions like the Monash competitions take.

The margin standard deviation calculated by the model is the key driver behind AFL Lab’s anomalous probability-margin “curve” demonstrated by MatterOfStats. The standard deviations the model produces are generally between 18 and 50 points, and mostly on the lower end. This far lower than many other models (usually between 30 and 40). I would argue that it makes sense that the standard deviation SHOULD vary between games – an expected blowout could be a modest or huge victory (large standard deviation), but a closely-matched game in wet conditions suggests a smaller total score and a lower standard deviation is appropriate. Due to this variable standard deviation, two games with the same predicted margin can have a vastly different home win probability.

For a game with a predicted margin of 20, the standard deviation changes the probability significantly.

It is this reason why my probability-margin “curve” is not a curve at all – the probability is a function of both the predicted margin and the margin standard deviation.

I do have some reservations on the low standard deviations produced by the model – the random sampling methodology currently used is flawed and still very much under construction. Hopefully by the end of the season I will have a large enough sample to work on improvements from.

Until next time, which hopefully won’t be as long.

-A

Leave a comment