Analytics first, sport second

This blog was written to fulfil a promise made to Ravi (@Scribblr_42) in a tweet back in February: to explain why I strongly disagreed with a statement made by Dean Oliver (@DeanO_Lytics) at the Opta Forum last February, one that Ravi ‘liked’.

During his presentation Dean Oliver displayed the following slide, whose first point (as one can see from the pic below) reads: “Know the sport first, analytics second”.

What Dean meant by this statement – as he later explained – was that a deep knowledge of a sport (one that is normally acquired by working within a club as a Performance Analyst) is more important than a knowledge of analytics. I strongly objected to this statement and later posted a tweet of my disapproval.

Of course, anyone involved professionally in Performance Analysis of any sport has to ‘know’ the sport. But this deep knowledge is no longer of primary importance, not if one has an analytics role in a club. Analytics is about analysing data – these days, lots of data (big data?). Therefore the primary requirement for this task is knowledge and experience of advanced analytic techniques and tools. Without this knowledge and experience it is not possible for anyone to analyse data efficiently and effectively. Any data! Of course, such a person must also ‘know’ the sport the data comes from. But, thanks to years of media coverage, TV in particular, any intelligent person who follows a particular sport has gained such knowledge. Not, of course, to the level that Dean implies, but enough to do the job.

It should be clear that I am not advocating that data analysts/scientists should replace Performance Analysts (PAs), only that the latter should stop pretending that, in today’s data-rich sports environment, they are capable of fully exploiting the large amount of data available to them. They are not! They are not qualified for this task, nor, I dare say, do they have the aptitude. Video analysis has for years been their main tool and focus, not statistics. Sadly, this is the main reason why analytics has failed to gain a foothold in many team sports, football in particular.

However, I am not suggesting that clubs should get rid of their PAs, only that they should take a back seat where data analysis is concerned. At a recent ISPAS conference in Carlow I put forward the suggestion that clubs should employ a Performance Data Analyst (PDA), whose sole concern would be to lead the data analysis of the sport, as well as to help PAs improve their data analysis skills. Unlike PAs, the PDA does not need a deep knowledge of the sport to do his job well. He also does not spend time on the pitch with players, but interacts only with PAs and other coaching staff, including the head coach. He therefore does not need the communication skills of a PA; contrary to another important point that Dean makes in the same slide.

Sadly, I don’t think that my suggestion (which was greeted with contempt by the likes of Prof. Hughes at the Carlow conference) will be taken up by sports clubs any time soon. Aside from the hostility of PAs to any challenge to their role as ‘analysts’, there aren’t many data analysts/scientists to go around. And even the few that are passionate about a sport are unlikely to accept the miserly salary they would be offered by clubs when many businesses are prepared to pay them a lot more. This point is eloquently made by Ben Alamar in an article last year, which I reprinted in one of my tweets.


(Note: by PAs I am also referring, in general, to any member of the coaching staff who is involved in data analysis.)

Posted in Soccer analytics, Sports Analytics, Uncategorized

Finding changes in tactics and their impact on a match – statistical and graphical analysis

This blog is a longer (and revised) version of my poster at the OptaPro Sports Analytics Forum 2015, held in London on 5th February 2015.

When a team is a goal down or up, a manager may change tactics (formation) in order to protect the advantage or chase the match. This is most likely to happen at the beginning of the second half, or in the last quarter of a match. Normally, in the first case, there is a change of tactics when a team is losing. In the last quarter of a match, substitutes are introduced either to hold onto a winning score or to chase a losing game. Also, at this time, if his team is winning, a manager may decide to settle for a draw, and change tactics accordingly.

A change of tactics is normally highlighted by match commentators and pundits if the match is broadcast live, or during post-match video analysis. In contrast to this traditional method, this post aims to discover any change of tactics by a team solely by analysing match event data, as provided by Opta.

We are not aware of any previous attempt at this kind of analysis using match data alone. The original intention was in fact to use both match data, as characterized by the Opta f24 feed, and player tracking data from TRACAB. Unfortunately, attempts to use the TRACAB data were not successful, and after much trying we decided to postpone that analysis to a later date.

Data and methods
The match analysed is Newcastle-Hull, played in Newcastle in the 2013-14 season. The match data was provided by Opta in the form of an f24 match file. The analysis was carried out using statistical and graphical methods; the statistical analysis in particular relies heavily on the technique of classification and regression trees. Both software packages used are designed for interactive analysis, and are therefore particularly suited to exploratory analysis. MS Excel was also used for data preparation and parsing, as well as to create some graphs.

The discovery of tactical changes boils down to finding significant changes in performance by the two teams in the various contexts that characterize a game, for example between the 1st and 2nd half, or before/after a goal conceded/scored. These and other contexts, such as before/after substitutions, and between one time interval and the next, are analysed, with ten time intervals/segments used for the latter.
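As an illustration, bucketing events into these contexts can be sketched as follows (a minimal sketch: the function names, and the assumption that event times are plain minutes, are mine, not part of the Opta feed):

```python
def time_segment(minute, match_length=90, n_segments=10):
    """Map a match minute to one of ten equal time segments (1..10);
    stoppage time is clamped into the last segment."""
    seg = int(minute / (match_length / n_segments)) + 1
    return min(seg, n_segments)

def context(minute, goal_minutes, sub_minutes):
    """Label an event with the contexts analysed here: half,
    time segment, and before/after the first goal and first sub."""
    return {
        "half": 1 if minute < 45 else 2,
        "segment": time_segment(minute),
        "after_goal": any(minute >= g for g in goal_minutes),
        "after_sub": any(minute >= s for s in sub_minutes),
    }

# Goal and substitution times loosely based on the match analysed below.
print(context(50, goal_minutes=[9, 25, 43, 47], sub_minutes=[65]))
# {'half': 2, 'segment': 6, 'after_goal': True, 'after_sub': False}
```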

The changes that we looked for were:
1. Change in players’ position following a goal (for/against)
2. Change of role/position of substitutes compared with starting players
3. Change in activity (ball touches) by the teams during the course of the match
4. Change in activity in the final 3rd

To identify changes, the following variables were added to the Opta data:
1. Goal_T = G_0-0, G_1-0, etc., to identify time segments when the score was 0-0, 1-0, etc.
2. Final_3rd = 1 for ball touches in the final third (0 = all other ball touches)
3. Xo, Yo = pitch coordinates (0–100) of ball touches
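A minimal sketch of how such derived variables could be computed from parsed event rows (the field names are illustrative assumptions, not the actual f24 schema):

```python
def add_derived_vars(events):
    """Add Goal_T (running score label) and Final_3rd (attacking-third
    flag) to a list of ball-touch events sorted by time. The x coordinate
    is assumed to run 0-100 towards the opponent's goal."""
    home = away = 0
    for ev in events:
        ev["Goal_T"] = f"G_{home}-{away}"        # score *before* this event
        ev["Final_3rd"] = 1 if ev["Xo"] >= 100 * 2 / 3 else 0
        if ev.get("type") == "goal":
            if ev["team"] == "home":
                home += 1
            else:
                away += 1
    return events

touches = [
    {"Xo": 88.0, "team": "home", "type": "goal"},   # an early home goal
    {"Xo": 45.0, "team": "home", "type": "pass"},
]
add_derived_vars(touches)
print(touches[0]["Goal_T"], touches[0]["Final_3rd"])   # G_0-0 1
print(touches[1]["Goal_T"], touches[1]["Final_3rd"])   # G_1-0 0
```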

Ideally, such an analysis should look at the performance of both teams, but because of time (and space) constraints we focus mainly on Newcastle’s performance.

Analysis results

Newcastle-Hull 2-3 (2-1)
Goals: Remy(N) 9’, Brady(H) 25’, Remy(N) 43’, Elmohamady(H) 47’, Aluko(H) 75’

Fig. 0 The above chart is an attempt to plot a summary of the match. It shows ball touches by the teams in ten time intervals, and goal times.

Fig.1 Possession (ball touches) comparison by Team and goal-time intervals
Fig. 1 We compare Newcastle and Hull ball touches, and we find that their average X_o position (length of the pitch) is significantly different (SD). Newcastle have significantly more possession in the attacking half. This advantage increases after Hull draw level at 1-1 (G_1-1), but falls as soon as Newcastle go ahead (G_2-1).

Fig.2 Possession (ball touches) comparison by Role and goal-time intervals
Fig. 2 shows the increased activity of Newcastle’s midfield after Hull’s equaliser (2-2), and the steep decline in the forwards’ activity after going ahead (2-1).

Fig. 3 Average X position comparison
Fig. 3 Here we analyse the average X position of both teams, and we find that Newcastle’s was closer to the halfway line than Hull’s (SD), a sign of a more attacking stance. Revealing of tactics is the position of Newcastle’s subs: they lined up with the full backs (wing backs, really), a defensive position they kept until Hull went ahead (2-3).

Fig. 4 Ball touches in the Final 3rd
Fig. 4 Here we compare ball touches in the Final_3rd. Overall, Newcastle were significantly (SD) more active in the final 3rd, but the subs were below the team average. Goals for/against did not result in any significant change of activity in the final 3rd. However, the subs were active mainly in the centre, in contrast to the centre-right of the forwards they replaced.

Fig. 5 Subs vs. replaced players – comparing ball touches position
Fig. 5 should be seen in conjunction with Fig. 6, and shows the position of ball touches by the subs and the players they replaced.

Fig. 6 X position – subs vs. starting players
Fig. 6 Here we compare Newcastle subs’ average position (X_o) with that of the players they replaced, and find that they took a more defensive position (SD).

Fig. 7 Significant changes in position by some Hull players
Fig. 7 The positions of ball touches of these two Hull players in the 1st and 2nd half look significantly different, and strongly suggest a change of tactics in the 2nd half. The statistical analysis in Fig. 8 confirms this visual intuition.

Fig. 8 Significant changes in position by some Hull players – stats
Fig. 8 This graph confirms statistically the intuition from Fig. 7. Both Hull goal scorers, Elmohamady (2-2, 47’) and Aluko (2-3, 75’), changed their positions significantly in the 2nd half: Aluko moved from left to right (Y), and Elmohamady forward (X).

Fig. 9 Final 3rd ball touches by goal-time interval
Fig. 9 shows the position of ball touches in the Final 3rd by the two teams during each score interval. A chart comparing the corresponding counts is shown above.

Fig. 10 Final 3rd ball touches by Half
Fig. 10 Final 3rd ball touches – the black horizontal lines show the average position of ball touches in each half. Hull’s average shifts from left to right (significantly, as shown in the following graph), while Newcastle’s stays roughly the same.

Fig. 11 Final 3rd ball touches – Vertical positional shift by Half
Fig. 11 Final 3rd average Y position – The graph shows that Hull changed their attacking direction from the left to the right in the 2nd half (period_id); this change was statistically significant (SD). In contrast, after an initial switch, Newcastle kept to a central position for the rest of the match.

The analysis identified some significant changes in performance during the match that suggest a change of tactics. In particular, tactical changes by Newcastle can be said to have taken place after their subs were introduced: with the score level at 2-2 and 25 minutes of the match left to play, Newcastle’s subs took a more defensive position than the players they replaced. In contrast, Hull started the 2nd half by increasing their attacking effort and shifting its direction from left to right; a move that quickly resulted in a goal.

Clues to the above summary conclusion were given by the following results:

• Newcastle played significantly (SD) further forward than Hull and dominated possession (ball touches) throughout the match, in particular after Hull’s first goal, and were dominant in the final 3rd. Despite this advantage, Newcastle created fewer chances than Hull, and lost the match.
• The results suggest that Newcastle did not try hard enough to win the game. The subs, introduced at 65’ when the match was finely poised at 2-2, took a more defensive stance than their predecessors, and lined up with the full backs (wing backs, really). They only took a more forward position when Hull scored the winning goal, too late to change the result.
• Judging by their substitutes’ performance, Hull appeared to do more to win the game: the average ball-touch position of their subs was equal to that of the forwards over the whole match.
• Graphical analysis shows what appears to be a change of tactics by Hull in the second half, with Aluko moving from left to right, and Elmohamady playing further forward. The latter’s position may help to explain the defensive ball touches of Newcastle subs Gouffran and Marveaux in that area of the pitch: it is likely that they were kept busy stopping the threat posed by these Hull players on the left side of their defence.

(Note: as many will have realised, this analysis is not complete. There are a few other aspects of the match that could have been studied, and would probably have shed more light on if/when/how changes of tactics occurred in the match. However, the objective of this post was mainly to demonstrate how a data-based analysis with statistical and visual methods can give a more objective view of changes of formation/tactics in a match than one obtained solely by video analysis.)

Posted in Soccer analytics, Soccer match analysis, Sports Analytics

“Taca la bala” says the wizard: a trip into the World Cup 2014

Data Tales

We published this tweet three days ago, before the two semi-finals of the World Cup 2014. Our prediction was correct: against any (Brazilian) forecast, Brazil were humiliated by Germany, while Argentina defeated the Netherlands on penalties after an unexciting match. The final, then, will be a classic of football: Germany vs Argentina. How did we figure out the two winning teams? It was not a stroke of luck. It was, more properly, a “stroke of data”.

In 1970, our parents followed the “match of the century”, Italy-Germany 4-3, on a noisy black-and-white TV, tuned to the unique public channel the Italian government provided at that time. After many technological improvements, in 2006 we switched to LCD full-color screens, and watched the famous Zidane headbutt in high…

View original post 1,167 more words

Posted in Soccer analytics

Are defences dominating this World Cup?

Attacking teams sell tickets, but defensive ones win games. So far, the 2014 World Cup is no exception, especially considering that, after the fireworks of the group stage, the knockout clashes have produced many draws and a dearth of goals, with only five in the quarter-finals.

After all, the history of the World Cup teaches that the team with the best defence (but not the best attack) won 42% of the time, while the team with the best attack alone won only 21% (the same share as teams that had both the best attack and the best defence, with the remaining winners having neither). In theory, therefore, teams with a better defence are twice as likely to win the competition.

To remind us, we only need to analyse the quarter-finalists in Brazil. The four teams that conceded the fewest shots, except France, have gone through. Les Bleus ended their tournament as the team with the second-fewest shots on target conceded (SOTCON) per match, 2.4 against Brazil’s 2 – an excellent performance, but not enough to beat Germany. Deschamps’ squad was not helped by the ease with which those shots were converted: against France 4 shots were enough to produce a goal, with only Brazil (2.5) and the Netherlands (3.75) needing fewer. All of this is confirmed by France’s third-worst save percentage among the quarter-finalists (75%, with only Brazil at 60% and Holland at 73.3% doing worse), set against their excellent SOTCON, the second-best of the group. In other words, statistically France conceded few scoring opportunities, but those they conceded were good ones, easy to score from. The proof is Germany’s goal, with Varane failing to match Hummels’ strength; evidence of how an individual weakness can ruin the work of a group.

France’s statistical blip is most likely explained by this event. Statistically, however, the Costa Rica story is more difficult to tell. Here we have the best defensive record of all the teams in the quarter-finals (only 2 goals conceded), and the highest save percentage (91.7%) – with their goalkeeper Keylor Navas going home with a 90% record. Costa Rica were also the team that best used the offside trap: 41 times in 5 games, with two masterful peaks: 11 against Italy and 13 against Holland. Italy’s Balotelli and his substitute Immobile were judged offside 6 times each, a record in the tournament until the quarter-finals, when van Persie topped them with 9. So why did Pinto’s men go home? Perhaps because logic dictates that he had already milked all of his team’s technical ability. Also, perhaps, because his team had the worst SOTCON. In other words, it is true that to score a goal against Costa Rica 12 shots were needed (basically double those of second-placed Germany, with 6.3), but it is also true that 34.29% of those shots were on target. And those were far too many for even Navas to save.

Aside from these two statistical blips, the four teams left in the competition are very close to the standard of a tournament that could be decided by the best defence. Brazil’s may not look impressive – they have the lowest save percentage (60%) – but, as mentioned, this is the team that has conceded the fewest shots on target per game and has the best SOTCON, an impressive 16.67%. This suggests that to beat Julio Cesar (at least with Thiago Silva on the pitch) a hell of a shot is required.

Germany are third from last in shots conceded per game and SOTCON, but have the highest save percentage after Costa Rica, an impressive 84.2%, which means that to score a goal against Neuer 6.3 shots on target have been necessary so far. Much the same is true for Argentina, who give away 3.4 shots on target per game but have the third-best save percentage (82.4%). If anything, it is much more difficult to explain the semi-final place of Holland (at least in a defensive key). The Dutch have the second-worst SOTCON and the penultimate save percentage (73.3%). So far they have conceded a small number of shots per game (3), but one wonders whether that will continue against Messi and company.

Posted in Soccer analytics, Sports Analytics, Uncategorized

Analytic insights and dubious corners stats

Does Manchester City’s much publicised analytic insight on corners stand up to scrutiny?
There are reasons to doubt it!

Corners are one of the key moments in a match that grab fans’ attention. There is always high expectation that a corner will result in a goal. However, the probability that a corner leads directly (first touch) to a goal is very low [1]. Investigating the stats suggests that more goals come from the penalty-area scramble that often follows a corner, and any goal that is scored tends to come many touches of the ball after the original corner kick.

However, it is difficult to prove or disprove these hypotheses, in view of the lack of trustworthy data publicly available. Manchester City appear to have tried (they can afford to buy or collect the data), and their claim about corners has received much publicity as a key finding of their massive analytics effort (11+ analytics people). Shown below are some extracts of how this claim has been reported in the media.

“City had gone 22 games and not scored a goal from a corner. After this… they scored 8 goals in 15 games.”
(22/11/2011)

“The data revolution keeps stumbling on new truths. At Manchester City, for instance, the analysts finally persuaded the club’s then manager, Roberto Mancini, that the most dangerous corner kick is the inswinger, the ball that swings towards goal. Mancini had long argued (strictly from intuition) that outswingers were best. Eventually he capitulated and, in the 2011-2012 season, when City won the English title, they scored 15 goals from corners, the most in the Premier League. The decisive goal, Vincent Kompany’s header against Manchester United, came from an in swinging corner.”
From a review of the book …

“Wilson recalls one particular period when Manchester City hadn’t scored from corners in over 22 games, so his team decided to analyse over 400 goals that were scored from corners. They noticed that about 75 percent resulted from so-called in-swinging corners, the type where the ball curves towards the goal. “In the next 12 games of the next season we scored nine goals from corners,” Wilson says. (date)

From the first time I came across these articles, this claim struck me as a poor example of the kind of useful insight that analytics can deliver, and definitely not one that should get so much attention. I also did not understand the fascination of the Man City analytics department with goals scored from corners, given their low impact on the game – a fact highlighted in a blog [1] by Chris Anderson (@soccerquant), although the year he analysed was a poor one for corners.

So, when I saw this Man City claim recently mentioned in Wired (see above), I could not stop myself reflecting on the phrase, “In the next 12 games of the next season we scored nine goals from corners,” Wilson says. Nine goals from corners in the next 12 games! That can’t be right! I am aware that few goals come from corners, so to score nine in twelve consecutive games struck me as a rather exceptional event. I decided to investigate.

Opta stats do not specify how corners are taken, so there is no way to find out from their data whether this stat is true. In any case, I only had Opta summary data for the 2011-2012 EPL season, as provided by the now defunct MCFC project, so using Opta data was out. However, I remembered that corner type (in-swinging or out-swinging) was specified in the detailed match commentaries that I had collected from the web in the past, from 2007-08 to 2011-12, when, for some reason, and to my chagrin, I could no longer find them anywhere. (I am tempted to comment on the lack of free data on football beyond the few traditional match stats – but this is probably better left to another blog.)

So, I extracted the relevant data from these commentaries, and produced the following charts:

Fig. 1 – All corners
Fig. 1 shows the number of in-swinging and out-swinging corners taken in the years specified. It is clearly visible that more in-swinging corners are taken: 60% more on average.

Fig. 2 – All goals from corners
Fig. 2 shows the goals scored from each type of corner. A weighted average of corners and goals shows that, on average, 50% more goals are scored from in-swinging corners than out-swinging ones. I should add that I was rather puzzled by the small number of goals scored from corners in 2010-2011 (a stat which perhaps merits further investigation), but after checking and re-checking, also using published data, I had to accept that this was the case.
Fig. 3 – Man City corners
Fig. 3 shows that the mix of corners taken by Man City does not follow the general pattern shown in Fig. 1 (where more in-swinging than out-swinging corners are taken in all seasons), and instead changes from one season to the next. A significant change occurs in 2011-12, when three times more in-swinging than out-swinging corners are taken.
Fig. 4 – Man City goals from corners
Fig. 4 Man City score very few goals from corners, with the highest totals in 2009-10 and in 2011-12, when they score 9 goals, all from in-swinging corners. The latter exploit is probably the most relevant statistic to keep in mind.
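For clarity, the weighted comparison behind Fig. 2 amounts to dividing goals by corners of each type before comparing, so that the greater number of in-swingers taken does not inflate their goal tally. A sketch with placeholder totals (not the actual counts behind the charts):

```python
# Placeholder season totals, purely illustrative.
corners = {"inswing": 9000, "outswing": 5400}
goals   = {"inswing": 150,  "outswing": 60}

# Goals per corner for each type, then the relative advantage.
rate = {k: goals[k] / corners[k] for k in corners}
advantage = rate["inswing"] / rate["outswing"] - 1
print(f"in-swingers: {rate['inswing']:.3%} per corner, "
      f"out-swingers: {rate['outswing']:.3%} per corner, "
      f"advantage: {advantage:.0%}")
# With these illustrative numbers the advantage is 50%.
```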

So, now that we have the stats, let’s look at each of the three claims reported above, in chronological order.

Claim 1
The first was (supposedly) made by Gavin Fleig, MC Head of Performance Analysis at the time (Nov 2011): “…City had gone 22 games and not scored a goal from a corner. After this…. they scored 8 goals in 15 games“. No timeline is given for these events, but since this claim was made at a conference in November 2011, and cannot refer to a distant past, we can safely assume it falls within the range of my data. By looking at the charts (Fig. 4), one can see that it could only have happened in the 2009-2010 season, the one preceding the claim.

During this season, according to my stats, Man City scored 7, not 8, goals all season (but keep in mind that I counted only goals scored directly from corners), and on the following days’ play: 4, 10, 13, 24, 32 (3). From this sequence we can see that they did not go 22 games without scoring from corners as claimed, but only 10 (days 14-23). Moreover, MC did not score 8 goals in the following 12 games as claimed, but only 4, of which 3 came on day 32 in a 6-0 win against an already relegated Burnley, which hardly merit being included in the count.

Of course, Gavin’s 22 consecutive goalless games could also include games played at the end of the previous season, 2008-2009, when Man City scored only one goal from corners all season (a record?). I’ll leave readers to query that stat, but, as you’ll remember, we are still left with the second part of the claim, “…8 goals in 15 games“, one that doesn’t tally.

Claim 2
Man City’s corner stats have achieved such iconic status as to merit a mention in a much publicised recent book on football analytics, “The Numbers Game: Why Everything You Know About Football Is Wrong”. I haven’t got round to reading it yet, so I can only comment on what has been reported in a review, which states that in “…the 2011-2012 season, when City won the English title, they scored 15 goals from corners, the most in the Premier League.” Fifteen (15) goals is the official Opta figure. I only found nine (9), all, significantly, coming from in-swinging corners.

However, it is likely that this Opta stat follows the rule that “goals created from this particular match situation are defined here as occurring within three touches of a corner“, a rule mentioned in the aforementioned blog by Chris Anderson, one of the authors of the book, titled “Why the Goal Value of Corners Is (Almost) Nil: Evidence from the EPL” [1]. His analysis should leave many fans wondering about the Man City analytics crowd’s fascination with corners.

Claim 3
Last but not least, we come to the claim, as reported in Wired, made by the top analytics man himself, Simon Wilson, Man City Strategic Performance Manager.

Wilson recalls one particular period when Manchester City hadn’t scored from corners in over 22 games, so his team decided to analyse over 400 goals that were scored from corners. They noticed that about 75 percent resulted from so-called in-swinging corners, the type where the ball curves towards the goal. “In the next 12 games of the next season we scored nine goals from corners“, writes Wired.

This article is very recent, dated 23rd January 2014, but the claim is similar to that made by Gavin Fleig in Nov 2011 (two years earlier!), and must obviously refer to the same ‘fact’: the sequence of goalless games is the same, 22, and so is the number of games in which the goals were then scored, 12 (I’ll leave it to the statisticians among you to calculate the probability of this event being repeated). But the number of goals now jumps to 9, not 8 as claimed by Gavin Fleig. I have already commented on the accuracy of these stats under Claim 1, so I’ll just deal with Wilson’s other claims: that 400 goals from corners were analysed, and that 75% of these were scored from in-swinging corners.

First, though, I should point out that Wilson does not specify that the 9 goals were scored from in-swinging corners, although it is clear from his premise that this is what he means. But, as we have already seen, this does not tally with my figures, which show that in 2009-2010 only 3 goals came from this type of corner, while 4 came from out-swinging ones – hardly a strong case for favouring in-swinging corners. But perhaps Wilson (speaking last year, I presume) is mixing the stats of 2009-10 with those of 2011-12, when 9 goals were scored – all from in-swinging corners. As to the claimed 75% success rate of in-swingers, this does not tally with my stats, which show only a 61% advantage – not an insignificant difference.

Concluding remarks
So, there you have it. My stats appear to refute Man City’s much publicised corner claim (claims?). What next?

As I started thinking of how to wrap up my piece with some “Conclusions”, many thoughts came to mind, and I began to write. But then doubt and caution won the day, and I stopped. I wondered: how could a big club like Man City, with a claimed 11+ people working on analytics, make such an error? Surely among them there must be some with A-level stats and a knowledge of Excel (even though neither is really necessary for doing simple stats).

I hesitated, and came to the decision to leave my conclusions to a later blog. This would give the various people mentioned, as well as interested football analysts, a chance to set the record straight and, perhaps, question my findings.


Notes on data
As a professional data analyst, I am often riled by the lack of free data on many topics of wide public interest (of which soccer is perhaps the least). I believe that anyone making a public claim based on the analysis of data should be prepared to make that data available to all who request it – an action that in the digital age is rather easy to take. The capacity for other people to repeat the analysis, and verify or refute a data-based claim, is of fundamental importance. Otherwise statistics will always inhabit that realm between science and non-science, and won’t be taken seriously.

In line with this belief, the data I used – not all of it, but that specific to Man City corners – is available on request to fellow bona fide analysts. They must keep in mind, however, that this data may be subject to copyright by the original publisher, and cannot be distributed at will.

Posted in Soccer analytics, Sports Analytics, Uncategorized

Balls and Runs – an attempt at cricket analytics

Taking a rest from football, and since the battle for the Ashes is on (England – Summer 2013), I have turned my attention to cricket. Australia’s bowlers have been criticised for their lack of success, especially in the 2nd Test at Lords. So here is my attempt at an analysis of their performance, and of that of the English batsmen who faced them.

The data

I have taken the data from the ball-by-ball commentary of the first two Tests, Trent Bridge and Lords, published on the ESPN Cricinfo website. I had to do some extensive data cleaning and structuring to put it in the format I needed. A snapshot of the resulting table used for the analysis is shown below.

Fig. 1

Runs x ball

Some notes of explanation. There is a row for each ball played – the variables used should be pretty clear. Also, to facilitate the analysis, I have:

  • Allocated extras (byes, no-balls, etc.) as runs to the respective bowler/batsman. Extras are such a small percentage of the total runs as not to influence the results of my analysis
  • Allocated zero (0) runs to an OUT ball
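The two allocation rules above can be sketched as a small helper applied to each parsed delivery (the field names are mine, not Cricinfo’s):

```python
def runs_for_ball(ball):
    """Runs credited for one delivery under the rules above:
    extras are folded into the bowler/batsman's runs, and a
    wicket ('out') counts as zero regardless of any runs."""
    if ball.get("out"):
        return 0
    return ball.get("runs", 0) + ball.get("extras", 0)

over = [
    {"runs": 4},
    {"runs": 0, "extras": 1},   # a bye, allocated as a run
    {"runs": 1, "out": True},   # run out: scored as 0
]
print([runs_for_ball(b) for b in over])   # [4, 1, 0]
```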

The analysis
The purpose of my analysis is to find significant differences (SDs) in performance between Tests, innings, bowlers, and batsmen, with respect to a chosen performance variable.

The chosen variables are:
1. Runs per ball – runs scored from each ball of the over (1st, 2nd, … 6th)
2. Number of runs – runs scored (0, 1, 2, … 6)

1. Runs per ball analysis

The starting point of the analysis is the distribution of the number of runs scored from each ball in these two Tests (Total). So, in Fig. 2 below, starting at the top, the first node shows the order of the ball played (1st, 2nd, …) and the corresponding number of runs.

1.1  Runs x ball – Test and Innings
Fig. 2

I must admit to being rather surprised to find little difference between the runs scored from each ball in the two Tests overall (Total). However, this is not the case when we look at the Tests separately, where there is a SD in the distribution of runs per ball. For example, at Lords 19.27% of the total runs were scored from the 1st ball, against 13.90% at Trent Bridge. I have highlighted the highest percentage score in each Test.

Going further down the tree, I found a significant difference between the 1st and 2nd innings at Lords, and I have marked the highest scores.  There was no SD between innings at Trent Bridge.
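The SD tests behind a tree like this are typically chi-square tests on a contingency table of counts. A minimal sketch, with invented counts rather than the actual Ashes figures, could look like this:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Contingency table: rows = ball in the over (1st..6th), columns = Test.
# Counts are invented for illustration, not the actual Ashes figures.
counts = np.array([
    [60, 85],   # 1st ball: runs at Trent Bridge, runs at Lords
    [70, 65],
    [55, 60],
    [65, 70],
    [80, 95],
    [60, 66],
])

chi2, p, dof, expected = chi2_contingency(counts)
# A small p-value (say < 0.05) flags a significant difference (SD)
# between the two runs-per-ball distributions.
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
```

The same test applies one level down the tree, e.g. comparing the two innings within a single Test.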

1.2 Runs x ball – Bowlers by Test
Fig. 3

Fig. 3 shows that there is a SD in the number of runs conceded by the bowlers from each ball.  The balls with the highest % of runs conceded are highlighted.  At Trent Bridge, for Agar it is the 5th one, for Pattinson and Siddle the 2nd one, and so on.  Pattinson stands out from his teammates at Lords for conceding most runs from the 5th ball (23.97%), while being the most effective with his 4th one (7.53%).

1.3 Runs x ball – Batsmen by Test
Fig. 4

England's batsmen also show significant differences in the number of runs scored from different balls, as shown above in Fig. 4.  Again the analysis has been done by Test.  Note Bell's preference for scoring most runs from the 5th ball.  I'll leave it to cricket fans to dig out other interesting facts.

2. Runs scored

The aim of this analysis is to find significant differences (SDs) in the number of runs conceded/scored from each ball.  The starting point is a node (Fig. 5) that shows the number of runs scored.  So we have that 1,974 balls scored no runs (0), 290 balls scored 1 run, and so on.
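The counts in that starting node are just a frequency table of the per-ball run values. With a toy sample in place of the real two-Test data, the computation is a one-liner in pandas:

```python
import pandas as pd

# Hypothetical per-ball run values; the real data has 1,974 balls with
# 0 runs, 290 with 1 run, and so on, across the two Tests.
runs = pd.Series([0, 0, 1, 4, 0, 2, 0, 1, 6, 0])

# Frequency of each run value, ordered 0, 1, 2, ...
distribution = runs.value_counts().sort_index()
print(distribution)
```

On this toy sample the table shows five balls with 0 runs, two with 1, and one each with 2, 4 and 6.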

2.1 Runs scored – Test, Innings
Fig. 5

The data tree above shows first that there is a SD (in the distribution of runs per ball) between the two Tests.  Significantly more balls produced no runs at Trent Bridge than at Lords (79.51% vs. 74.77%).  There is also a SD between the innings, but only at Trent Bridge.  The relevant figures are highlighted.

2.2. Runs scored – Bowlers by Test
Fig. 6

Fig. 6 shows a comparison of the Australian bowlers in the two Tests.  Watson stands out at Trent Bridge for his economy: 93.04% of his balls conceded no runs, compared to his teammates' 78.15%.  However, he bowled far fewer balls than them, which could invalidate this comparison; still, the objective of this post is to present facts, not to make a deep statement about performance.  At Lords too, the Aussie bowlers showed SDs in performance, as marked, with Smith perhaps the most expensive one.
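The small-sample caveat about Watson can be made concrete with a two-proportion test on a 2x2 table. The ball counts below are my rough reconstruction from the quoted percentages, not the actual scorecard figures:

```python
from scipy.stats import chi2_contingency

# 2x2 table: [no-run balls, scoring balls], rows = Watson vs. the rest.
# Counts reconstructed approximately from the quoted percentages.
table = [
    [107, 8],     # Watson: ~93% of 115 balls yielded no runs
    [740, 207],   # other bowlers: ~78% of 947 balls
]

chi2, p, dof, _ = chi2_contingency(table)
# With so few Watson balls the test's power is limited, which is the
# small-sample caveat made in the text.
print(f"p = {p:.4f}")
```

A significant p-value here would say the economy difference is unlikely to be chance, but it cannot fix the imbalance in the number of balls bowled.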

2.3 Runs scored – Batsmen by Test
Fig. 7

In both Tests, England's batsmen show SDs in the number of runs scored from each ball.  In the first, they divide into two groups, with the one headed by Bairstow apparently doing better (more scoring balls) than the Anderson group.  At Lords, Bresnan stands out with his poor (as judged solely by this analysis) performance: 43 runs from 166 balls.

Closing notes

This is it! I just wanted to show an example of a different way to analyse cricket performance.  I have done what came easiest to do.  There are other results that I could get, for example comparing the individual performance of batsmen/bowlers across Tests and innings.  Perhaps I'll produce these and other results next.

The main hurdle I have to overcome is getting the data and putting it in a format suitable for this type of analysis.  The time and effort required is definitely a put-off.  However, I hope to continue this basic analysis for the next Tests, and will do more if I have the time.  Somehow I have a feeling that by enriching this data with additional variables (order and time of batting/bowling, for example), some interesting insights (patterns) may emerge that coaches/captains could find useful to explain and improve performance.

Posted in Sports Analytics, Uncategorized | 4 Comments

Best and worst defensive performance – 2013 EPL

The idea for this analysis came from reading Paul Power's February post, where he sets out to analyse Defensive Efficiency (Deff). To measure it he lists seven variables, and his reasons for choosing them. These are:

1. Goals Conceded (GC)
2. Goals Conceded Difference (GC-D)
3. Total Shots Conceded (TSC)
4. Shots on Target Conceded (SoTC)
5. Shots on Target Conceded % (SoTC%)
6. Goals Conceded From Total Shots % (GCTS%)
7. Goals Conceded from Total Shots on Target % (GCSoT%)

In his next post he attempts a classification of teams at that time using 6. and 7. and draws some conclusions.

My take on it
I don't agree with his choice of 2., which I think is only useful if one is trying to compare Home vs. Away Deff performance, not the overall one.  I also have doubts about the contribution of 5. (SoTC%), so I'll leave both out of my analysis.

My analysis has a similar aim, but I am going to split Deff into two components, Defensive effectiveness (Deff) and Defensive efficiency (Deff%), and analyse them separately.  The first measures how effective a defence is in restricting shooting opportunities for opponents; the second, its efficiency in preventing goals from being scored.

Defensive effectiveness (Deff)
For this analysis I have taken the list of variables that follows.  As well as totals for Goals, Shots and Shots on target, as done by Paul, I have added Home and Away figures.  These will help me cluster teams that have a similar defensive profile.  (As would figures for Goals conceded In and Out of the Box, of course, if I had them.)

HGC       Home Goals Conceded
AGC       Away Goals Conceded
TGC       Total Goals Conceded
HSC       Home Shots Conceded
ASC       Away Shots Conceded
HSTC      Home Shots on Target Conceded
ASTC      Away Shots on Target Conceded
TSC       Total Shots Conceded
TSTC      Total Shots on Target Conceded

The data comes from the 2013 EPL, as given by the website Football Data (link), and is shown in the table below:
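Building these variables from a Football Data match file is a simple aggregation: a team's shots and goals conceded at home come from its opponents' away figures, and vice versa. A sketch with two toy fixtures in the Football Data column layout (FTHG/FTAG = full-time goals, HS/AS = shots, HST/AST = shots on target; the real file has a row per match of the season):

```python
import pandas as pd

# Toy fixtures in the Football Data column layout.
matches = pd.DataFrame({
    "HomeTeam": ["Arsenal", "Chelsea"],
    "AwayTeam": ["Chelsea", "Arsenal"],
    "FTHG": [2, 1], "FTAG": [1, 0],
    "HS":   [15, 12], "AS": [10, 8],
    "HST":  [6, 5],  "AST": [4, 3],
})

def deff_profile(team):
    home = matches[matches["HomeTeam"] == team]
    away = matches[matches["AwayTeam"] == team]
    profile = {
        "HGC": home["FTAG"].sum(),  "AGC": away["FTHG"].sum(),
        "HSC": home["AS"].sum(),    "ASC": away["HS"].sum(),
        "HSTC": home["AST"].sum(),  "ASTC": away["HST"].sum(),
    }
    profile["TGC"] = profile["HGC"] + profile["AGC"]
    profile["TSC"] = profile["HSC"] + profile["ASC"]
    profile["TSTC"] = profile["HSTC"] + profile["ASTC"]
    return profile
```

Running `deff_profile` for each of the 20 teams produces the table used in the analysis.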

The Deff analysis
To classify teams according to these metrics I am going to use the Cluster Analysis method I used in my previous post.  The results are shown in the picture below.  The data has been normalised: high values are shown in red and low values in green (best performers), as in the colour scale shown.

From left to right, the following picture shows:

  • The order in which teams have been ranked with respect to the performance parameters
  • A heat map that shows the normalized values of these parameters
  • A dendrogram that shows how teams have been clustered
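The clustering step behind such a heat map can be sketched with scipy: normalise each column, build a hierarchical tree, then cut it into clusters. The team values below are invented for illustration, not the actual 2013 figures:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

# Rows = teams, columns = Deff variables (toy numbers, not the real data).
teams = ["Arsenal", "Man Utd", "Stoke", "Reading"]
data = np.array([
    [37, 30, 10],
    [43, 32, 11],
    [45, 33, 12],
    [73, 50, 20],
])

# Normalise each column, then cluster with Ward linkage, as heat-map
# tools typically do before drawing the dendrogram.
z = zscore(data, axis=0)
tree = linkage(z, method="ward")
labels = fcluster(tree, t=2, criterion="maxclust")  # cut into 2 clusters
print(labels)
```

On this toy data the outlier (Reading-like) row is separated from the other three, which mirrors how the real dendrogram isolates the worst defence.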

The analysis splits the teams into five major clusters, with the teams with the best Deff, all having different shades of green (below-average values), at the top.

To my surprise, Arsenal, whose defensive performance was much criticised during much of the season, tops the list.  Also, Man Utd, the champions, are not among the best defensively, and belong to the second-best cluster of teams that share a similar defensive profile.  Stoke is the surprise entry in this second cluster; not bad for a team that just avoided relegation to be in (just) with the champions.  But I guess this can be explained by the fact that they score and concede few shots and goals (sorry, I haven't got time for a deeper analysis).

As for the bottom half of the table, nobody would be shocked to find Reading at the bottom, and alone because they were so much worse than their relegated companions.  But I guess not many would have expected Swansea to be just above them; I seem to recall too many heavy defeats, though.  I'll leave you to reflect on the other teams' positions.

The Deff% analysis
This analysis is aimed at classifying teams with respect to their Deff% and clustering teams with a similar profile.  So, while in my first analysis I took values, I now take ratios (%) of Goals conceded to Shots conceded.

  1. HGC% = HGC/home shots
  2. HGCT%=HGC/home shots on target
  3. AGC%=AGC/away shots
  4. AGTC%=AGC/away shots on target
  5. TGC%=TGC/totals shots
  6. TGCT%=TGC/total shots on target
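The six ratios follow directly from the Deff counts. A quick sketch, with illustrative numbers rather than the actual 2013 figures:

```python
# Deff% ratios from the raw Deff counts (illustrative numbers only).
hgc, agc = 20, 17          # home/away goals conceded
hs, hst = 250, 110         # home shots / shots on target conceded
as_, ast = 230, 100        # away shots / shots on target conceded

ratios = {
    "HGC%":  100 * hgc / hs,
    "HGCT%": 100 * hgc / hst,
    "AGC%":  100 * agc / as_,
    "AGTC%": 100 * agc / ast,
    "TGC%":  100 * (hgc + agc) / (hs + as_),
    "TGCT%": 100 * (hgc + agc) / (hst + ast),
}
```

Because these are ratios rather than raw counts, they can be clustered on their own, but mixing them with the Deff values in one analysis is problematic, as noted in the final section.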

The results are shown in the figure below:

The heat map on the left shows a 'sequencing' of the teams in order of Deff%, with the best at the top.  The one on the right shows an attempt to cluster and order teams at the same time.

There are some discrepancies between the two images, as the one on the left shows the ordering of clusters and not of individual teams (clustering is not an exact science).  What is clear, though, is that Chelsea and West Ham have the best Deff%, and that Southampton, Wigan and Newcastle have the worst, in that order (though there is not much to choose between them).  Could this be the main reason for the Toon's steep fall from last year's grace?  I'll leave you to ponder the other results; I'll just comment on some that really stand out.

High-flying Tottenham (at least for most of the season) is just above the trio of relegated teams.  This probably accounts for the missed Champions League spot, and is something AVB will need to address in order to improve performance next season.  I am sure that he knows it, despite his (apparent) dislike of statistics.

Arsenal appears to be the odd team out in its cluster on account of its poor Home performance, which would merit a bottom-three placing, but it is obviously lifted by its outstanding Away one.

And, finally, it appears that it wasn't because of their Deff% that QPR and Reading were relegated.  It looks like they may have conceded some decisive goals in their fight for salvation.

Final analysis
So, which teams have the best and worst defensive records?  A combined cluster analysis of the two metrics should in theory give the answer.  But I had problems getting a meaningful and consistent classification, probably on account of the mix of values (Deff) and ratios (Deff%).  So, we'll have to do it by inspection of the combined heat maps below, and views are bound to differ:

My view is that Arsenal does not deserve the top spot because of its very poor Home record in conceding goals.  Man City appears to have the best combined record, followed by Man Utd and Chelsea, the latter on account of having the best Deff%.

As for the teams with the worst record, the picture is much less clear, and contrasting results make it rather confusing.  Swansea is a case in point: one step from the bottom with regard to Deff, yet in third position for Deff%.  And then there's Tottenham, fifth in the League table and in the top cluster for Deff, but near the bottom in Deff%; where is its true place?  Even worse is the dilemma facing anyone who wants to judge the relegated teams, with Reading having the worst Deff of all but a Deff% near mid-table.  And what about Sunderland?  I'll leave you to make up your own mind on this and the other teams.

Posted in Soccer analytics, Sports Analytics | Leave a comment