Balls and Runs – an attempt at cricket analytics

Having a rest from football, and since the battle for the Ashes is on (England, summer 2013), I have turned my attention to cricket. Australia's bowlers have been criticised for their lack of success, especially in the 2nd Test at Lord's. So here is my attempt at an analysis of their performance, and that of the English batsmen who faced them.

The data

I have taken the data from the ball-by-ball commentary of the first two Tests, Trent Bridge and Lord's, published on the ESPN Cricinfo website. I had to do extensive data cleaning and restructuring to put it into the format I needed. A snapshot of the resulting table used for the analysis is shown below.

Fig. 1: Runs x ball

Some notes of explanation. There is a row for each ball bowled, and the variables used should be fairly self-explanatory. To simplify the analysis, I have:

  • Allocated extras (byes, no-balls, etc.) as runs to the respective bowler and batsman. Extras are such a small percentage of the total runs that they should not materially affect the results of my analysis
  • Allocated zero (0) runs to an OUT ball
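The two simplifying rules above can be expressed as a small per-delivery function. This is a minimal sketch, not the actual cleaning code used for the post: the `Ball` record and its field names (`position`, `bat_runs`, `extras`, `is_out`) are hypothetical, assuming the commentary has already been parsed into one record per ball.

```python
from dataclasses import dataclass


@dataclass
class Ball:
    # Hypothetical record for one delivery; field names are assumptions.
    test: str        # e.g. "Trent Bridge" or "Lord's"
    innings: int     # 1 or 2
    position: int    # ball of the over, 1 to 6
    bowler: str
    batsman: str
    bat_runs: int    # runs off the bat
    extras: int      # byes, no-balls, etc.
    is_out: bool     # True if a batsman was out on this ball


def analysis_runs(ball: Ball) -> int:
    """Runs credited to a delivery under the post's two rules."""
    if ball.is_out:
        return 0  # an OUT ball is allocated zero runs
    # Extras are simply added to the runs charged to this bowler and batsman.
    return ball.bat_runs + ball.extras


# Hypothetical delivery: one bye and nothing off the bat.
example = Ball("Lord's", 1, 3, "Bowler A", "Batsman B", 0, 1, False)
print(analysis_runs(example))  # 1
```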

The analysis
The purpose of my analysis is to find significant differences (SDs) in performance between Tests, innings, bowlers and batsmen, with respect to a chosen performance variable.

The chosen variables are:
1. Runs per ball – runs scored from each ball of the over (1st, 2nd, … 6th)
2. Number of runs – runs scored off a single ball (0, 1, 2, … 6)

1. Runs per ball analysis

The starting point of the analysis is the distribution of runs scored from each ball of the over across both Tests (the "Total" node). So in Fig. 2 below, starting at the top, the first node shows the position of the ball in the over (1st, 2nd, …) and the corresponding number of runs.
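The "Total" node amounts to summing runs per ball position and dividing by the grand total. A minimal sketch, assuming the deliveries have been reduced to hypothetical `(position, runs)` pairs with the rules above already applied; for the per-Test or per-innings nodes, one would filter the deliveries first and call the same function.

```python
from collections import defaultdict


def runs_share_by_ball(deliveries):
    """Percentage of total runs scored from each ball of the over (1 to 6).

    `deliveries` is an iterable of (position, runs) pairs.
    """
    totals = defaultdict(int)
    for position, runs in deliveries:
        totals[position] += runs
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {pos: 0.0 for pos in range(1, 7)}
    return {pos: 100.0 * totals[pos] / grand_total for pos in range(1, 7)}


# Hypothetical toy data, not figures from the two Tests.
toy = [(1, 4), (2, 0), (3, 1), (1, 2), (5, 1), (6, 2)]
print(runs_share_by_ball(toy))
# {1: 60.0, 2: 0.0, 3: 10.0, 4: 0.0, 5: 10.0, 6: 20.0}
```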

1.1 Runs x ball – Test and Innings
Fig. 2

I must admit I was rather surprised to find little difference in the runs scored from each ball across the two Tests combined (Total). However, this is not the case when we look at the Tests separately: there is an SD in the distribution of runs per ball. For example, at Lord's 19.27% of the total runs were scored from the 1st ball, against 13.90% at Trent Bridge. I have highlighted the highest percentage in each Test.

Going further down the tree, I found a significant difference between the 1st and 2nd innings at Lord's, and I have marked the highest percentages. There was no SD between innings at Trent Bridge.
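The post does not say which test its tree software used to decide that a split is significant. Pearson's chi-square test of independence is a standard choice for comparing categorical distributions like these, so here is a self-contained sketch of it (the tail probability comes from the series for the regularized incomplete gamma function). The table is hypothetical: one row per Test, one column per ball of the over. Note that feeding it run totals treats each run as an independent count, which is a simplification, since a single boundary contributes four or six runs.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z):
    # P(a, z) = z**a * exp(-z) / Gamma(a) * sum_n z**n / (a (a+1) ... (a+n)).
    # The plain series is adequate for moderate statistics like these.
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def chi_square_independence(table):
    """Pearson's chi-square test of independence on a rows x columns count table.

    Returns (statistic, degrees_of_freedom, p_value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # avoid dividing by zero for empty rows/columns
                stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical runs from balls 1 to 6 of the over, one row per Test.
table = [
    [52, 40, 45, 38, 50, 42],
    [70, 48, 44, 51, 49, 39],
]
stat, df, p = chi_square_independence(table)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.3f}")
```

A p-value below the chosen threshold (often 0.05) would be read as an SD between the rows.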

1.2 Runs x ball – Bowlers by Test
Fig. 3

Fig. 3 shows that there is an SD between bowlers in the runs conceded from each ball of the over. The balls with the highest percentage of runs conceded are highlighted: at Trent Bridge, it is the 5th ball for Agar, the 2nd for Pattinson and Siddle, and so on. At Lord's, Pattinson stands out from his teammates for conceding the most runs from his 5th ball (23.97%) and being most effective with his 4th (7.53%).
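Picking out highlighted balls like these amounts to finding, for each bowler, the ball of the over with the largest (and smallest) share of his runs conceded. A minimal sketch with hypothetical `(player, position, runs)` tuples; the same function works for batsmen by passing the batsman's name instead of the bowler's. It only reports shares and does not itself test for significance.

```python
from collections import defaultdict


def extreme_balls(rows):
    """Ball of the over with the highest and lowest share of each player's runs.

    `rows` is an iterable of (player, position, runs) with position 1 to 6.
    Returns {player: {"most": (position, pct), "least": (position, pct)}}.
    """
    runs_by_ball = defaultdict(lambda: [0] * 6)
    for player, position, runs in rows:
        runs_by_ball[player][position - 1] += runs
    result = {}
    for player, by_ball in runs_by_ball.items():
        total = sum(by_ball)
        if total == 0:
            continue  # no runs, so shares are undefined
        shares = [100.0 * r / total for r in by_ball]
        # Ties go to the earliest ball of the over.
        most = max(range(6), key=shares.__getitem__)
        least = min(range(6), key=shares.__getitem__)
        result[player] = {"most": (most + 1, shares[most]),
                          "least": (least + 1, shares[least])}
    return result


# Hypothetical deliveries for one bowler.
rows = [("Bowler A", 1, 1), ("Bowler A", 2, 0), ("Bowler A", 3, 2),
        ("Bowler A", 4, 0), ("Bowler A", 5, 6), ("Bowler A", 6, 1)]
print(extreme_balls(rows)["Bowler A"])  # {'most': (5, 60.0), 'least': (2, 0.0)}
```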

1.3 Runs x ball – Batsmen by Test
Fig. 4

England's batsmen also show significant differences in the number of runs scored from different balls, as shown above in Fig. 4. Again, the analysis has been done by Test. Note Bell's preference for scoring most of his runs from the 5th ball. I'll leave it to cricket fans to dig out other interesting facts.

2. Runs scored

The aim of this analysis is to find significant differences (SDs) in the number of runs conceded/scored from each ball. The starting point is the top node of Fig. 5, which shows how many balls yielded each number of runs: 1,974 balls produced no runs (0), 290 balls produced 1 run, and so on.
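A starting node like this is a frequency table of runs per ball. A minimal sketch, assuming a hypothetical flat list with one adjusted run value per delivery; values above 6 are kept, since allocating extras can produce, say, 7 runs from a no-ball hit for six.

```python
from collections import Counter


def balls_by_runs(runs_per_ball):
    """Count and percentage of balls yielding 0, 1, 2, ... runs.

    `runs_per_ball` is a list with one (adjusted) run value per delivery.
    """
    counts = Counter(runs_per_ball)
    n = len(runs_per_ball)
    if n == 0:
        return {r: (0, 0.0) for r in range(7)}
    top = max(6, max(runs_per_ball))  # keep values above 6 if present
    return {r: (counts[r], 100.0 * counts[r] / n) for r in range(top + 1)}


# Hypothetical toy sample of eight deliveries.
print(balls_by_runs([0, 0, 0, 1, 4, 0, 6, 0])[0])  # (5, 62.5)
```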

2.1 Runs scored – Test, Innings
Fig. 5

The data tree above first shows an SD between the two Tests in the distribution of runs per ball. Significantly more balls went unscored at Trent Bridge than at Lord's (79.51% vs. 74.77%). There is also an SD between innings, but only at Trent Bridge. The relevant figures are highlighted.
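Percentages of scoreless balls like these are proportions, so a single comparison of this kind could also be checked with a two-proportion z-test. This is a swapped-in technique for illustration, not necessarily what the tree software did, and it looks only at the share of scoreless balls rather than the whole 0 to 6 distribution. The ball counts in the usage line are hypothetical, since the post does not give the number of balls per Test.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between proportions x1/n1 and x2/n2.

    Returns (z, p_value), using the pooled proportion for the standard error.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0  # both proportions are 0 or both are 1: no difference
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts: scoreless balls and total balls in each Test.
z, p = two_proportion_z_test(800, 1000, 750, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```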

2.2 Runs scored – Bowlers by Test
Fig. 6

Fig. 6 compares the Australian bowlers in the two Tests. Watson stands out at Trent Bridge for his economy: 93.04% of his balls conceded no runs, compared with 78.15% for his teammates. However, he bowled far fewer balls than they did, which could invalidate this comparison. Still, the objective of this post is to present facts, not to make deep statements about performance. At Lord's, too, the Australian bowlers showed an SD in performance, as marked, with Smith perhaps the most expensive.
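The concern about Watson's smaller number of balls can be made concrete with a confidence interval for each bowler's proportion of scoreless balls: the fewer the balls, the wider the interval. A minimal sketch using the Wilson score interval (one standard choice among several); the counts are hypothetical, since the post does not list how many balls each bowler bowled.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical: the same 90% scoreless rate from a short and a long spell.
for dots, balls in [(27, 30), (900, 1000)]:
    lo, hi = wilson_interval(dots, balls)
    print(f"{dots}/{balls}: {100 * lo:.1f}% to {100 * hi:.1f}%")
```

The short spell gives a much wider interval, which is why a high percentage from few balls is weaker evidence than the same percentage from many.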

2.3 Runs scored – Batsmen by Test
Fig. 7

In both Tests, England's batsmen show an SD in the number of runs scored from each ball. In the first, they divide into two groups, with the one headed by Bairstow apparently doing better (scoring from more balls) than the Anderson group. At Lord's, Bresnan stands out for his poor performance (judged solely by this analysis): 43 runs from 166 balls.
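A common way to summarise a figure like Bresnan's is the batting strike rate, runs per 100 balls faced. Computed from the numbers quoted above (since extras were allocated to the batsman here, this may differ slightly from an official scorecard figure):

```python
def strike_rate(runs, balls):
    """Batting strike rate: runs scored per 100 balls faced."""
    return 100.0 * runs / balls


# Bresnan at Lord's, from the figures quoted in the post.
print(round(strike_rate(43, 166), 2))  # 25.9
```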

Closing notes

That's it! I just wanted to show an example of a different way to analyse cricket performance. I have done what came easiest. There are other results I could get, for example comparing the individual performance of batsmen and bowlers across Tests and innings. Perhaps I'll do this and more next.

The main hurdle I have to overcome is getting the data and putting it into a format suitable for this type of analysis. The time and effort required for this is definitely off-putting. However, I hope to continue this basic analysis for the next Tests, and will do more if I have the time. I have a feeling that by enriching this data with additional variables (for example, the order and time of batting and bowling), some interesting insights (patterns) may emerge that coaches and captains could find useful to explain and improve performance.


About soccerlogic

Data analyst/miner with 23 years of experience. Pretty sure I was the first (1998) to apply statistical analysis and machine learning to the study of performance in soccer. I probably invented soccer analytics or, as I called it then, Football Intelligence. I haven't stopped learning since, or experimenting with new analyses that can help teams improve performance.
This entry was posted in Sports Analytics, Uncategorized.

4 Responses to Balls and Runs – an attempt to Cricket analytics

  1. Imran Khan says:

    Hi, this is some really interesting analysis. I was wondering whether you went any further with this? Also, how did you go about obtaining the data? I imagine it was just a web scraper off Cricinfo.
