Statistical analyses in football: an interview with the Professor « Untold Arsenal: Arsenal News. Supporting the Lord Wenger; coach of the decade

By Phil Gregory

I was recently fortunate enough to be able to do a short interview with Bill Gerrard, Professor of Sports Management and Finance at Leeds University. Professor Gerrard is heavily involved in the game, having worked with top sides in fields as diverse as squad valuations and performance analysis, while advising the board of an MLS franchise too.

I’ve always been interested in the numbers behind the game, and they consistently surprise and impress me. Whether it’s the 100% pass completion of one Denilson Pereira Neves playing in a side with ten men or the ProZone numbers I saw that showed “lazy” Dimitar Berbatov covered more ground than Carlos Tevez, the numbers allow us to see what we cannot see through TV. They also allow us to look past our preconceived notions of what is happening (the simple prejudices about particular players with which every football fan will be all too familiar).

Statistical performance analysis is a relatively new concept in football, a sport that has traditionally been fairly slow to take up new ideas. Football is also a fairly tricky sport for performance analysts to get stuck into: typically 12 hours are needed per 90 minutes of footage to dissect and record every action a player makes. With 380 games taking place in the Premier League alone this season, the difficulties are obvious, but even so the availability of the data is far greater than the ability of teams to analyse it. While it’s not a holy grail, and won’t pick the “best” team for the manager, it certainly offers insights that can give a manager a competitive advantage over closed-minded competitors.

With the England squad only recently announced at the time of the interview, we first discussed the team itself and the difficulties a team game offers someone who sought to rate players by stats.

A sport like baseball is much easier to analyse: a hitter’s performance does not depend on the performance of his team-mates, it’s just him versus the pitcher and fielders. In football there is significant inter-dependence between players, meaning that raw data often needs to be considered subjectively, in the context in which the actions took place.

Darren Bent hit 24 league goals compared to Defoe’s 18, but how do we account for the fact that Bent was playing for a weaker side? How do we know who had the better service? Was either man the sole target through which all attacks went? It is these questions that limit the role that data has to play in team sports such as football.

Professor Gerrard also made the point that comparing who is the better of two players is often of limited relevance when considering the team game of football. I personally would argue that Bent is a better individual player than Heskey, but the latter is in the World Cup squad for what he offers the team (Wayne Rooney’s goal record with and without Heskey pretty much guaranteed the Villa man’s selection).

So how do performance analysts go about rating a player or team, I asked. Professor Gerrard replied that this depends entirely on the structure of the team in question, and how they play the game.

Completed passes, for instance, would usually be a good indicator: if a team is making and completing more passes, they are using the ball well, which is bound to correlate with victory to some degree. However, a side such as Bolton under Allardyce would rate a pass as successful if it was (for example) a defender’s forward pass taken long and then headed out for a throw-in, whereas the statistics would mark it down as an incomplete pass. Allardyce would disagree and argue that the pass was successful, as they have possession of the second ball, as well as significant territorial gain.

If it were Arsenal that were being rated (and indeed, this is the case for most of the rest of the league), pass completion percentages and the number of completed passes would be a fairly good guide to team performance. Arsène Wenger places significant emphasis on the tempo of the game (the time spent on the ball before it is passed on); many other teams don’t (“we lacked sharpness in the final third”, that ever-familiar line!). Different statistics offer an insight into different teams’ performance.

I then asked Professor Gerrard whether everything in the game is really measurable. I’m sure we’ve all spoken to a Liverpool fan who has claimed that other players give an extra 10% when they’re playing next to Steven Gerrard (perhaps not quite so recently…) but how would a performance analyst measure this? Is it just a myth or is there tangible evidence in the numbers?

While we didn’t have data to hand to confirm or deny this particular theory, Professor Gerrard pointed out that such things are ultimately measurable, as they are likely to affect measurable statistics. Do the players cover more ground?  Do they tackle more? Do they spend less time on the ground “injured”?

He did offer a word of warning, however. Such effects are not always positive, and indeed might be counterproductive, even when our subjective viewing of the game makes us believe otherwise. The player in question may dominate and intimidate the side (think Henry in his later years with us).

Do players pass the ball to that specific player more than to an alternative forward, even when it is not the best option (as Anelka once famously alleged)? Do players spend less time on the ball and complete a lower percentage of their passes, i.e. are they rushing? Or do they cover less ground and tackle less, knowing a star player will likely win them the match? Hard data can give a manager complete confidence that a player is under-performing and allow him to challenge the player on what may even be a subconscious approach (“he’s here, so I don’t need to try so hard”). The simplest example, easily measurable by the fan at home, is win percentages: are they higher or lower when the player in question is fit and playing?
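For anyone who fancies trying the win-percentage comparison at home, here is a minimal Python sketch. The match list is invented purely for illustration, and the function name `win_percentage` is my own label rather than anything Professor Gerrard or ProZone uses.

```python
# Each tuple is (player_started, result). These results are made up for
# illustration and do not describe any real player or team.
matches = [
    (True, "W"), (True, "D"), (True, "W"), (True, "L"),
    (False, "W"), (False, "L"), (False, "L"), (False, "D"),
]


def win_percentage(matches, player_started):
    """Percentage of matches won in the group with (or without) the player."""
    results = [result for started, result in matches if started == player_started]
    if not results:
        return None  # no matches in this group, so there is nothing to report
    return 100 * results.count("W") / len(results)


with_player = win_percentage(matches, True)
without_player = win_percentage(matches, False)
print(f"With the player: {with_player:.1f}%")
print(f"Without the player: {without_player:.1f}%")
```

A handful of matches proves very little either way, so a comparison like this is only worth taking seriously over a decent number of games.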

The use of data and statistical analysis is risky for managers. Football is an inherently conservative industry and there exists a certain distrust of numbers. If a manager were to use data and statistics significantly and then fail to produce results they’d likely find themselves in a weaker position than a traditional, close-minded manager.

Professor Gerrard referred me to the example of Chelsea a few years back, when the midfield under Mourinho was the highly effective Makelele-Lampard-Essien axis. By looking at the passing statistics, teams saw the importance of Makelele to making that unit work, so aimed to limit the Frenchman’s influence on the game. Naturally, focussing heavily on one of the players ensured the others had significantly greater space and time on the ball, leading to an improvement in the performance of the other two which ultimately compensated for the reduced impact of Makelele. Such is the limitation of performance data in football: it can only be used to better inform tactical systems and strategies.

Professor Gerrard also spoke of certain statistics being more valuable to an onlooker than others. Any defensive statistics, whether a successful tackle or interception, decisively deny the opposition an opportunity to score, forcing them to first regain the ball. Possession statistics don’t have the same effect in ascertaining a team’s chance of winning.

Pass completion better correlates to not losing the game, which seems logical as a team that has the ball is effectively denying their opponent any opportunity to score until possession changes hands. This possession alone however does not necessarily translate into goalscoring chances, and therefore high possession doesn’t consistently correlate with high win percentages.

If one looks at the passing statistics for the entire Premier League, there is a rough correlation between passing ability and final league position. However, when the top four are removed, positions five to 20 show very little correlation: ideas of efficiency with the ball (as well as the previously mentioned conundrum of what is a successful pass) come into play. To properly assess a team’s chance of winning with offensive statistics, more refined measures are needed, whether shots on target or chances created.
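To make the "remove the top four" point concrete, here is a rough Python sketch that computes a Pearson correlation between league position and pass completion, first for all 20 sides and then for positions five to 20. The pass completion figures are invented to mimic the pattern described above and are not real Premier League data; the helper name `pearson` is also my own.

```python
from math import sqrt

# Invented pass completion percentages, listed in order of final league
# position (index 0 is 1st place). NOT real data: the top four are given
# high figures and the rest are scattered, to mimic the pattern in the text.
pass_completion = [84, 83, 82, 81, 72, 70, 75, 69, 74, 71,
                   76, 68, 73, 70, 75, 69, 72, 74, 68, 71]
positions = list(range(1, len(pass_completion) + 1))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# A negative value means better passing goes with a better (lower) position.
r_all = pearson(positions, pass_completion)
r_rest = pearson(positions[4:], pass_completion[4:])  # positions 5 to 20 only
print(f"All 20 teams: r = {r_all:.2f}")
print(f"Positions 5 to 20: r = {r_rest:.2f}")
```

With these made-up numbers the full-table correlation is clearly negative, while the figure for positions five to 20 sits close to zero: the top four are doing most of the work.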

We then moved on to what I took to be a positional bias in certain performance analysis models. A quick glance at the top 20 in the Castrol Ranking reveals fourteen forwards (which comprise the entire top ten) and only three defenders, two midfielders and a single goalkeeper. Such a positional bias was also noticeable in the Fink Tank model, though less pronounced.

Professor Gerrard stated that his approach was to view a football team as a chain when it came to scoring goals. A team will score more goals if they have defenders who are good on the ball or a goalkeeper with excellent distribution, but ultimately what is most important is the final link in that chain, the forward. Hence any model exhibits a certain degree of bias towards the player who actually puts the ball in the net.

We moved on to discussing performance analysis and how it relates to the transfer market. If we consider how players are valued, there is certainly a bias towards players who score goals. All things being equal, a forward will cost more than an attacking midfielder, who will cost more than a defensive midfielder, who will cost more than a defender, who in turn is slightly more expensive than a goalkeeper. I asked Professor Gerrard whether the bias towards “goal-getting” players in, for example, the Castrol Rankings, together with the higher transfer fees for such players, amounts to the market proving that these goal-getters are of higher value than their team-mates.

He disagreed, arguing that in fact they may be overvalued by the transfer market. He argued this by pointing out that if we consider shots on target, a striker’s shot conversion rate (goals scored as a percentage of shots on target) plus the opposition goalkeeper’s save percentage (saves made as a percentage of shots on target) will equal 100% (ignoring the odd block by a defender or beach ball intervention).
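The arithmetic behind that 100% can be checked with a few lines of Python. The shot and goal counts are made up, and the sketch rests on the same simplifying assumption as the argument itself: every shot on target is either scored or saved by the goalkeeper.

```python
def conversion_rate(goals, shots_on_target):
    """Striker's goals as a percentage of shots on target."""
    return 100 * goals / shots_on_target


def save_percentage(saves, shots_on_target):
    """Goalkeeper's saves as a percentage of shots on target faced."""
    return 100 * saves / shots_on_target


# Made-up numbers for a single striker-versus-goalkeeper contest.
shots_on_target = 8
goals = 3
# Assumption: each on-target shot is either a goal or a save (no blocks on
# the line by defenders, and no beach balls).
saves = shots_on_target - goals

conversion = conversion_rate(goals, shots_on_target)
save_pct = save_percentage(saves, shots_on_target)
print(f"Conversion rate: {conversion}%")
print(f"Save percentage: {save_pct}%")
print(f"Total: {conversion + save_pct}%")
```

Every percentage point the goalkeeper adds to his save percentage is a point taken off the striker’s conversion rate, which is the heart of the argument that follows.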

Hence, going back to the theory that those at the end of the “chain” are the most important, we must consider that there are two “chains” per team: an offensive chain (goalkeeper to defence to midfield to attack) and a defensive chain (striker to midfield to defence to goalkeeper).

So, if we are to argue that strikers should be highly valued for their ability to score goals, goalkeepers and defenders should be equally highly valued for their ability to prevent them. After all, a goalkeeper saving a shot that would otherwise have gone in has, on the scoreboard, made the same contribution as a team-mate who scored a goal.

Through this, Professor Gerrard argues there may exist what economists would call an inefficiency in the market: something is undervalued relative to what it delivers for its cost. Hence we can see why many teams that punch above their weight (the Blackburns, Stokes and Boltons of the past few years) are generally defensive sides: every pound spent on wages or transfer fees for defensive players will generally lead to more points gained than an equivalent spend on offensive players.
