Mohamed Salah at Roma and Liverpool

  1. Introduction

In this post, the shot performance of Mohamed Salah at Roma and Liverpool is compared. The objective is to find any statistically significant (p < 0.05) and non-significant (but revealing) differences. Significant differences are indicated by the acronym SSD (Statistically Significant Difference) in their comment line.

The Decision Trees method (https://en.wikipedia.org/wiki/Decision_tree_learning) is used for the analysis, in particular the CART algorithm.

  2. Stats and Analysis

Salah played for Roma in Serie A in the 2016-17 season, and for Liverpool in the EPL in 2017-18.

Salah 01

Salah 02

Salah 03

Salah 04

Salah 07

  3. Summary

The analysis shows that Salah’s performance improved considerably at Liverpool. The number of goals he scored there, double his tally at Roma (31 vs. 15) with roughly the same number of matches and minutes played, is a strong indicator. But we must take into account that at Roma he took fewer shots (74 vs. 132), as Dzeko was the main target there.
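Is the change in conversion rate (goals per shot) more than noise? Below is a minimal sketch of a two-proportion z-test on the totals quoted above: 15 goals from 74 shots at Roma, 31 from 132 at Liverpool. This textbook test is swapped in for illustration only. It is not the Decision Tree analysis used in this post, and it assumes every shot is an independent trial with the same scoring chance within each club, which real shots are not.

```python
import math


def two_proportion_z_test(goals_a: int, shots_a: int, goals_b: int, shots_b: int):
    """Two-sided z-test for a difference between two shot conversion rates."""
    p_a = goals_a / shots_a
    p_b = goals_b / shots_b
    # Pooled conversion rate under the null hypothesis of no difference
    pooled = (goals_a + goals_b) / (shots_a + shots_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / shots_a + 1 / shots_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a, p_b, z, p_value


# Totals from the summary: Roma 15 goals from 74 shots, Liverpool 31 from 132
roma_sg, liverpool_sg, z, p = two_proportion_z_test(15, 74, 31, 132)
print(f"Roma SG {roma_sg:.3f} | Liverpool SG {liverpool_sg:.3f} | z = {z:.2f} | p = {p:.2f}")
```

On these totals the two rates come out at about 0.20 and 0.23. The gap is well short of significance at p < 0.05.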

The quality of his shots has improved significantly (Fig. 2); unfortunately, this is a rather subjective evaluation parameter and can’t be given too much weight. But he is definitely shooting more with his left foot at Liverpool (Fig. 3), and under more defensive pressure than at Roma (Fig. 4), while receiving more Open Play passes (Fig. 5).

Salah’s shooting location has also shifted from the right to the centre-right. At Roma, his shots came mainly from the right side of the goal (Pic. 1), while at Liverpool they were more evenly distributed, though the right side remained his favourite. Logically, this is also the side most of his shot assists came from.

@@@

Data by @StrataBet
This article was written with the aid of StrataData (www.stratagem.co), which is property of Stratagem Technologies. StrataData powers the StrataBet Sports Trading Platform (https://app.stratabet.com), in addition to StrataBet Premium Recommendations (https://stratatips.co).

 

Posted in shot statistics, Soccer analytics, Sports Analytics

Is xG any good at predicting game outcomes?

Introduction

One can’t afford to ignore Expected Goals (xG) now that Match of the Day are giving the metric such a huge profile.

I’m not a massive fan of xG, but I thought it was worth further investigation and so, thanks to data from StrataData (www.stratagem.co), I have been doing some work on it.

Naturally, I have read the work published by the many experts on the subject, such as M Caley of @MC_of_A, and @11tegen11. I have also looked at https://jameswgrayson.wordpress.com/, but his useful contribution on the subject is historic, and because it stopped years ago, I feel it is unlikely to be of value in today’s data-rich environment. One thing I noticed is that while many analysts praise the predictive power of xG, nobody provides any concrete proof of this important aspect of the metric. Take, for example, 11tegen11:

“On 11tegen11, we’ve made the case for expected goals……. being the single best predictor for future match outcomes, better than points, goals, shots or shots on target” (1)

Yes, but then he goes on to elaborate a theoretical proof of this statement, using the Poisson distribution and the Monte Carlo method.
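For readers unfamiliar with that approach, here is a minimal sketch of the general idea. It is not 11tegen11’s actual model or code. Each team’s goals are treated as a Poisson variable whose mean is its xG. Many matches are then simulated, and the wins, draws and losses are counted. The xG values in the usage line are made up for illustration.

```python
import math
import random
from typing import Tuple


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_match(home_xg: float, away_xg: float, n_sims: int = 100_000,
                   seed: int = 0) -> Tuple[float, float, float]:
    """Monte Carlo estimate of (home win, draw, away win) probabilities.

    Assumes each side's goals are independent Poisson counts with mean equal
    to its xG, a common simplification.
    """
    rng = random.Random(seed)
    home_wins = draws = 0
    for _ in range(n_sims):
        home_goals = poisson_sample(home_xg, rng)
        away_goals = poisson_sample(away_xg, rng)
        if home_goals > away_goals:
            home_wins += 1
        elif home_goals == away_goals:
            draws += 1
    away_wins = n_sims - home_wins - draws
    return home_wins / n_sims, draws / n_sims, away_wins / n_sims


# Hypothetical team xG values, for illustration only
print(simulate_match(1.8, 1.1))
```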

As a data miner, I don’t have much time for that. What I was expecting (at least) was some sort of summary table that compared actual match results with those predicted by his model, and some statistical analysis of these results that supported his claim. Perhaps he did that in some of his later blogs, but I couldn’t find any. I have also looked at other analysts’ work, but all I could find were summary tables that compared overall actual game results with predicted xG. Perhaps I am not being very thorough in my search, so I am ready to change my mind if you can show me some evidence to the contrary.

First attempt at prediction

So, for my first contribution to the xG debate, I have investigated its predictive power. For me, this means computing the historic xG of the two teams in a forthcoming game and predicting that the team with the higher xG will win, or that the game will be drawn if the two values are similar. I published the first results on my Twitter account (@soccerlogic) last weekend, as in the table below, and (hopefully) will do so for the weeks ahead, until the end of the season.

Table 1. xG prediction

Each week I will also aim to publish a summary table of predicted vs. actual results to give an account of how good these predictions were.
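The prediction rule and the weekly predicted-vs-actual check can be sketched as follows. The post does not say how close two xG values must be to count as "similar". The `draw_margin` tolerance below is therefore a hypothetical parameter, and the sample fixtures are invented.

```python
from typing import List, Tuple

# (home team xG, away team xG, actual result: "H", "D" or "A")
Fixture = Tuple[float, float, str]


def predict(home_xg: float, away_xg: float, draw_margin: float = 0.1) -> str:
    """Predict H/D/A: the higher xG wins unless the gap is within draw_margin."""
    if abs(home_xg - away_xg) <= draw_margin:
        return "D"
    return "H" if home_xg > away_xg else "A"


def hit_rate(fixtures: List[Fixture], draw_margin: float = 0.1) -> float:
    """Share of fixtures whose predicted outcome matches the actual result."""
    hits = sum(predict(h, a, draw_margin) == result for h, a, result in fixtures)
    return hits / len(fixtures)


# Hypothetical week of fixtures, for illustration only
week = [(1.6, 1.0, "H"), (1.2, 1.25, "A"), (0.9, 1.4, "A"), (1.5, 1.1, "D")]
print(f"Correct predictions: {hit_rate(week):.0%}")
```

The choice of `draw_margin` directly trades predicted draws against predicted wins. It is worth reporting alongside any hit rate.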

This is, of course, just one aspect of my work on xG, and the small sample of games left until the end of the season won’t be sufficient to say much about the predictive power of xG. So, in parallel with this initial effort, I intend to do similar work on historic data and publish the results here on my blog.

Some explanation of how Team xG results were computed

  • Data: shot data from the EPL, seasons 2016-17 plus 2017-18 (to March 5), just over 14,000 shots. Train data: 2016-17 season to Feb 12, 2018 | Test data: Feb 13 – March 12 match data (20 matches, 48 goals). Penalties and Dangerous Moments are treated as described in the Notes below.
  • Model: binary classification with a binary DV (dependent variable) [0, 1] (no-goal, goal)
  • Attributes (independent variables): shotLoc_X, shotLoc_Y, bodyPart, shotQuality, assistLoc_x, assistLoc_y, assistType
  • Notes: no penalty data | Dangerous Moments (Chances) are also left out of the goal count (but included in no-goal).

Model building process

  1. Use Decision Tree to find significant attributes in Train data
  2. Use significant attributes to create a number of NN (Neural Network) models
  3. Validate models using Test data (details above)
  4. Select the most accurate model using a combined criterion: % accuracy plus goals predicted
  5. The best model has 88.5% validation accuracy and predicts 29/48 = 60% of goals
  6. Use the best model to ‘score’ the Train data (i.e. compute the goal probability of every shot)
  7. Compute team xG (home and away) = average xG of all matches played (Home, Away)
  8. Results (historic team xG) are shown in the table below
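Steps 6 and 7 amount to summing the model’s shot probabilities per team per match, then averaging over matches, separately for home and away games. Below is a minimal sketch. It assumes each scored shot is a record of match id, team, venue and goal probability; this is a hypothetical layout, not StrataData’s actual schema.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (match_id, team, venue "H" or "A", model goal probability)
Shot = Tuple[str, str, str, float]


def team_xg(shots: List[Shot]) -> Dict[Tuple[str, str], float]:
    """Average per-match xG for each (team, venue) pair."""
    # Sum the scored shot probabilities into one xG total per team per match
    match_xg: Dict[Tuple[str, str, str], float] = defaultdict(float)
    for match_id, team, venue, prob in shots:
        match_xg[(match_id, team, venue)] += prob
    # Average those match totals over the matches played at each venue.
    # Note: a match in which a team took no shots never appears here, so a full
    # implementation would need the fixture list to count it as xG = 0.
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for (_, team, venue), xg in match_xg.items():
        totals[(team, venue)] += xg
        counts[(team, venue)] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Invented shots for illustration only
shots = [
    ("m1", "Team A", "H", 0.30), ("m1", "Team A", "H", 0.50), ("m1", "Team B", "A", 0.20),
    ("m2", "Team A", "H", 0.40), ("m2", "Team C", "A", 0.10),
]
print(team_xg(shots))
```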

Table 2. Team xG (2016-17 and 2017-18, to March 5)

 Blog_01 xG Teams

Additional information on model selection, analysis, results

  • Only some of the attributes (variables) in the data were used in the model. Some were left out because of their marginal contribution to accuracy, others because they were considered subjective (e.g. chanceRating). Of the latter, only shotQuality was included, because of its significant impact on accuracy. This attribute acts basically as a proxy for Opta’s gmlocy and gmlocz, which indicate where on the goal mouth a shot is directed and are highly significant in determining xG.
  • Last but not least, shot location, however defined, does not appear to be “by far the most important predictor”, as 11tegen11 claims; and it is definitely not so ‘by far’. ShotQuality (like gmlocy and gmlocz) affects xG as much as location, if not more: one can shoot from a ‘good’ location, but if the shot is directed at the keeper, it has a high probability of being saved.
  • Various classification techniques available in Weka 3.8 (Logistic Regression, SMO, Naive Bayes, Random Forest, etc.) were also used to build and validate the model; while some achieved similar accuracy to NN, their goal prediction was far lower (I am looking for someone to volunteer an explanation for that).
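One possible explanation for that last point is class imbalance. This is offered as a general observation, not a diagnosis of these particular models. Goals are a small minority of shots, so overall accuracy is dominated by the no-goal class, and a model can reach a high accuracy while finding few goals. The sketch below uses hypothetical confusion-matrix counts (1,000 test shots, 100 of them goals). It compares a model that never predicts a goal with one that finds 60 of the 100 goals.

```python
def accuracy_and_recall(tp: int, fp: int, fn: int, tn: int):
    """Overall accuracy and goal recall (share of actual goals predicted) from a confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, recall


# Hypothetical test set: 1,000 shots, 100 of them goals
never_goal = accuracy_and_recall(tp=0, fp=0, fn=100, tn=900)
finds_goals = accuracy_and_recall(tp=60, fp=60, fn=40, tn=840)
print(never_goal)   # same accuracy as the next model...
print(finds_goals)  # ...but very different goal recall
```

Both models score 90% accuracy, yet only the second finds any goals. Comparing models on goal recall as well as accuracy, as the combined criterion in step 4 does, guards against this.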

Future work

In later work on this I’ll add data from the Championship, and possibly the MLS.  One objective is to verify whether more data does make any difference to the accuracy of the model.  Soon I will also publish data on players’ xG.

(1) http://11tegen11.net/2015/08/14/a-close-look-at-my-new-expected-goals-model/

Data by @StrataBet
This article was written with the aid of StrataData (www.stratagem.co), which is property of Stratagem Technologies. StrataData powers the StrataBet Sports Trading Platform (https://app.stratabet.com), in addition to StrataBet Premium Recommendations (https://stratatips.co).

 


Conversion rates (SG) by shot Location

1. Introduction

This post continues my last one on shooting in the MLS (Star shooters of the MLS, April 1, 2017). Here I look at how shot location affects shot outcome, as measured by the metric SG (goals/shots), or conversion rate.

As soccer fans, we intuitively know that not all shots have the same probability of being converted into a goal, and that shot location plays an important part in this outcome. So it is important to discover which locations have a high shot conversion rate. Next, we classify teams according to their shot location and shot conversion profile.

2. Methodology

As in the previous analysis, I use the Decision Trees (https://en.wikipedia.org/wiki/Decision_tree_learning) method. Before I can start, I need to add a new variable to the data, which I call Zones_XY, and assign values to it. To create Zones_XY, I divide the final third of the pitch into a grid of 50 Zones (5 along x by 10 along y). As a result, each shot is now associated with a Zone location specified by the variable Zone_XY, which takes the values A1, A2, A3, and so on.
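Here is a minimal sketch of how a shot could be assigned a Zone_XY label. The post does not give the coordinate system or the labelling order, so the following choices are all hypothetical:

- x and y run on a 0-100 scale, with x = 100 at the opponent's goal line;
- the final third starts at x = 66.7;
- five letter bands (A-E) run along x, and ten numbered bands run along y.

```python
from typing import Optional

FINAL_THIRD_START = 200 / 3   # assumed start of the final third on a 0-100 x scale
X_BANDS = "ABCDE"             # 5 bands along the length of the pitch (hypothetical labels)
Y_BANDS = 10                  # 10 bands across the width of the pitch


def zone_xy(x: float, y: float) -> Optional[str]:
    """Map a shot location to a zone label such as 'A1', or None outside the final third."""
    if x < FINAL_THIRD_START or not (0 <= x <= 100 and 0 <= y <= 100):
        return None
    band_depth = (100 - FINAL_THIRD_START) / len(X_BANDS)
    # min() keeps shots exactly on the goal line or touchline in the last band
    col = min(int((x - FINAL_THIRD_START) / band_depth), len(X_BANDS) - 1)
    row = min(int(y / (100 / Y_BANDS)), Y_BANDS - 1)
    return f"{X_BANDS[col]}{row + 1}"


print(zone_xy(70, 5), zone_xy(95, 48), zone_xy(50, 50))
```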

The purpose of my analysis is to classify these Zones by shot conversion rate (SG) and to cluster similar ones together. Instead of considering all shots, I analyse only shots resulting from Regular play.

3. The analysis

As in my preceding post, I analyse the variable Result, which takes the value 1 or 0 depending on whether or not the outcome of the shot is a goal. In contrast with my previous effort, I define Result as a continuous (numeric) variable, and therefore here I use Decision Trees (DT) to perform a regression type of analysis.
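Treating Result as numeric means the mean of Result in each tree node is simply its SG. A split is then chosen to reduce the squared error around those means. The sketch below illustrates this with a single CART-style binary split on a categorical zone variable. It sorts the zones by SG and tries every cut point in that order, which is the standard shortcut for regression trees. This is only an illustration. The tree in this post produces multi-way splits with significance tests, which the sketch does not reproduce, and the shots in the usage line are made up.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def sse(goals: int, shots: int) -> float:
    """Squared error of 0/1 Result values around their mean (the node's SG).

    For n outcomes with mean p this equals n * p * (1 - p) = goals * (shots - goals) / shots.
    """
    return goals * (shots - goals) / shots if shots else 0.0


def best_zone_split(shots: List[Tuple[str, int]]) -> Optional[Tuple[List[str], List[str], float]]:
    """Best binary split of zones for a regression tree on Result (1 = goal, 0 = no goal)."""
    goals: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for zone, result in shots:
        goals[zone] += result
        counts[zone] += 1
    # For squared error, the best split of a categorical variable is a cut
    # in the list of categories sorted by their mean response (here, SG).
    order = sorted(counts, key=lambda z: goals[z] / counts[z])
    best = None
    for cut in range(1, len(order)):
        left, right = order[:cut], order[cut:]
        total = sse(sum(goals[z] for z in left), sum(counts[z] for z in left)) + sse(
            sum(goals[z] for z in right), sum(counts[z] for z in right)
        )
        if best is None or total < best[2]:
            best = (left, right, total)
    return best  # None when there are fewer than two zones


# Made-up shots: zone A1 converts 1 in 10, B2 2 in 5, C3 none in 10
shots = [("A1", 0)] * 9 + [("A1", 1)] + [("B2", 0)] * 3 + [("B2", 1)] * 2 + [("C3", 0)] * 10
print(best_zone_split(shots))
```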

The results are shown in the graph below (3.1). The top node is my starting point; it shows that the average conversion rate (SG) for all shots (8,495) is 0.106. I then split this total by its PatternofPlay components and obtain the Regular play node with the shots I want to analyse; then it is just a matter of running the algorithm, which creates the tree shown below.

3.1 Conversion rate (SG) by Location (Zone_XY) – Regular play

xG Regular tree

Legend: Shots (6,095) from Regular play have an SG of 0.096, which means that on average we can expect a goal to be scored roughly every 10 shots. As expected, the analysis finds that the Zones_XY variable I have created is significant in explaining differences in SG, and it creates seven Zone_XY clusters, each with an SG that is significantly different from the others (95% confidence level). These are shown ordered from left to right, and vary from Zones with near-zero SG to 0.27. Mapping these results onto a football pitch gives the following picture.

Map 3.1

Shots xG Zone

Legend: The map on the left shows the x,y location of shots (yellow) and goals (red). On the right, the same map is shown divided into my Zones_XY, with the colored ones mirroring the results of the DT analysis. We can see that most Zones (green and gray) have a zero or infinitesimal probability that a shot taken from them will result in a goal. There are, however, 15 Zones (green to red) where we can expect a better outcome, with SG varying from 0.01 to 0.27.

4. Digging deeper – Teams

DT is a great tool for exploratory analysis, as it makes it easy to drill down into the data and answer the obvious questions a football analyst or fan may want to ask. For example, I can find out whether the overall shooting profile I have just discovered applies to all teams in the MLS, or whether there are statistically significant differences among them.

For this analysis, I need to create a new variable, Avg_SG, which maps each shot to the SG value of the Zone it was taken from, as computed in the previous analysis. The result is a categorical variable with seven categories (the 15 Zones share 7 different SG values), which I can now analyse using DT. The result is shown in the graph below, and summarised for easier reading in the table that follows it.
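Building Avg_SG is a lookup from each shot’s zone to that zone’s cluster SG. A team’s shot conversion profile is then the share of its shots falling in each Avg_SG category. The sketch below uses hypothetical zone-to-SG assignments, with category names in the post’s Avg_ style.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical cluster SG per zone, as a tree like graph 3.1 might assign it;
# every zone that appears in the shot data must be listed here.
ZONE_SG: Dict[str, float] = {"C5": 0.27, "C6": 0.27, "B5": 0.13, "A3": 0.04, "A8": 0.04}


def avg_sg_label(zone: str) -> str:
    """Avg_SG category of a shot, named after its zone's SG (e.g. 0.04 -> 'Avg_04')."""
    return f"Avg_{round(ZONE_SG[zone] * 100):02d}"


def team_profiles(shots: List[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Share of each team's shots falling in each Avg_SG category."""
    by_team: Dict[str, Counter] = defaultdict(Counter)
    for team, zone in shots:
        by_team[team][avg_sg_label(zone)] += 1
    profiles = {}
    for team, counts in by_team.items():
        n_shots = sum(counts.values())
        profiles[team] = {label: n / n_shots for label, n in counts.items()}
    return profiles


# Invented (team, zone) shots for illustration only
shots = [("Team A", "C5"), ("Team A", "A3"), ("Team A", "A8"), ("Team A", "B5"), ("Team B", "A3")]
print(team_profiles(shots))
```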

Graph 4.1 Shot conversion profile – Teams

xg Zone tree

Team clusters SG

Legend: the top node of this tree shows the Avg_SG variable I have just created and its six categories; this is the overall shot conversion profile (Regular play). We can see, for example, that most shots (29.6% of the total, or 1,680) are taken from Avg_04, that is, from the Zones that share an SG of 0.04. We can read the other categories (Avg_01, Avg_04, …) in the same way.

Running the DT algorithm then creates five nodes (clusters), each with a group of teams that share a shot conversion profile significantly different from the others. By profile here I mean the six values (vector) taken by the Avg_SG variable. For example, those of the teams in the tagged node are: 0.079, 0.270, 0.184, 0.192, 0.131, 0.143. The table below (4.1) shows the shot conversion profile of each cluster of teams, expressed in % for easier reading and comparison.

Table 4.1  Shot conversion  profile – Teams

Team clusters SG M1

Legend: All teams take more of their shots from Zones with an SG of 0.04 (Avg_04) than from any other category. Nearly half of the shots of {FC Dallas, …} come from the lowest-SG Zones. In contrast, {Columbus, …} have the best shooting record from the high-SG Zones. I’ll leave it to readers (MLS fans in particular) to discover other interesting facts in the results shown.

5.  Final notes

While for this analysis I focused on shot Location, the DT algorithm tells me that shot Direction (the trajectory of the shot to the goal face) is ranked above Zones_XY as a predictor of SG. So perhaps a better predictor of shot conversion would take both shot location and direction into account.

As one would expect, shooters also have ‘preferred’ shooting zones, and thus different profiles. DT found 14 of them for the 58 players considered, far too many to be included in this post.

I was going to compare these results with those obtained by others using the expG metric, and draw some conclusions. Unfortunately, I realised that this effort would take too much of my time and was better left to a later post.


Shooting stars of the MLS

  1. Introduction

With the 2017 MLS season approaching, I am going to take a look at last year’s (2016-17 season) results, and in particular at the attacking effectiveness of Teams and Players. One way to measure this is to compute the percentage (%) of Shots that are converted into Goals, which I shall call SG%.

The purpose of this analysis is to classify Teams and Players with respect to this metric, and to provide a series of graphics and tables showing the results. The analysis is performed using a clustering algorithm that splits Teams into clusters, such that those in each cluster have similar SG% and are significantly different from those in other clusters. Players are clustered in the same way.

The easiest and (probably) best way to cluster with respect to a binary variable (goal vs. no goal) is to use the algorithm known as classification by decision tree induction (DT). As well as providing great flexibility, it has the advantage of displaying the results in easy-to-understand graphics. The tool used for the analysis is professional software that creates DTs using the CHAID technique, named after the chi-square test used to validate the statistical significance of the results (diversity between clusters).
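CHAID’s splits rest on chi-square tests of independence. The sketch below covers only that test, not the full CHAID algorithm, which also merges categories and adjusts p-values. It computes the chi-square statistic for goals vs. no goals across two groups of shots, and the p-value for one degree of freedom, without continuity correction. The counts are hypothetical.

```python
import math


def chi_square_2x2(goals_a: int, shots_a: int, goals_b: int, shots_b: int):
    """Chi-square test of independence for goal/no-goal counts in two groups of shots."""
    observed = [
        [goals_a, shots_a - goals_a],
        [goals_b, shots_b - goals_b],
    ]
    total = shots_a + shots_b
    row_totals = [shots_a, shots_b]
    col_totals = [goals_a + goals_b, total - goals_a - goals_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical groups: 10 goals from 100 shots vs. 30 goals from 100 shots
chi2, p = chi_square_2x2(goals_a=10, shots_a=100, goals_b=30, shots_b=100)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```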

This DT method and, in particular, its sophisticated implementation in this software make it easy to compute not only the overall performance of Teams and Players, but also to perform a conditional analysis of their SG% metric; that is, to compute the SG% for each particular condition/context reported in the data, such as Patterns of play (Regular, Fast-break, Set-piece, etc.), Assisted vs. Solo, Cross vs. Other shots, etc.
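At its simplest, computing SG% conditionally is a group-by: for each value of a context variable, divide goals by shots. Below is a minimal sketch with a hypothetical list of shots, each tagged with its pattern of play.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def sg_percent_by(shots: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """SG% (goals / shots * 100) for each context value, e.g. pattern of play."""
    goals: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for context, is_goal in shots:
        totals[context] += 1
        goals[context] += int(is_goal)
    return {c: 100 * goals[c] / totals[c] for c in totals}


# Invented (pattern of play, goal?) shots for illustration only
shots = [("Regular", False), ("Regular", True), ("Regular", False), ("Regular", False),
         ("Fast-break", True), ("Fast-break", False)]
print(sg_percent_by(shots))
```

The DT software adds the significance testing on top. This plain group-by only reproduces the per-context rates.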

Details of the results of this analysis follow. First we show the graphic tree with the clusters (tree nodes) created by the algorithm and the Teams or Players belonging to them. The cluster with the highest SG% is tagged, and details of its Teams’ or Players’ shot profiles are given in the table, together with their individual SG%.

  2. Team analysis

Graph 2.1 Team shooting effectiveness – All plays

team_1

Legend: Of a total of 8,496 shots, only 900 were converted into goals (10.59%), as shown in the top node. Teams are split (automatically by the algorithm) into three clusters. The tagged one has the best SG% at 12.96%, and contains the five teams with the highest SG%.

team_1_table

Graph 2.2 Top shooters from Crosses

team_2

Legend: Teams do slightly better at converting shots from Crosses than from other types of pass: 11.25% vs. 10.45% (difference not significant). However, some teams do significantly better than others at scoring from Crosses: 14.95% vs. 8.80%.

Graph 2.3 Top shooters by Pattern of play

team_3

Legend: Only shooting from Fast-breaks (Fb) is significantly more effective than other types. In fact, scoring from Fast-breaks is more than twice as likely (21.31%) as from any other play (9.51%). There is also a wide difference between teams in scoring from Fb, and they split into two clusters with SG% of 27.83% and 9.09% respectively. For other plays (Corner, Fk, etc.), the performance gap among Teams is narrower, with LA Galaxy and NY topping the list at 13.14%. The performances for all plays are summarised in the table below.

Table 2.2 Summary of Team shooting performance by play

team_table2

  3. Player analysis

Graph 3.1 Best shooters

palyers_1

Legend: Best shooters are players who convert more shots into goals (i.e. have a higher SG%). The algorithm splits them into four clusters, with the best averaging an 18.96% success rate.

Table 3.1 Effectiveness (SG%) of top shooters

player_1-Tble

Graph 3.2 Shots from cross

player_2

Legend: The success rate (SG%) from Crosses (1) is 11.25%, and 10.45% from other shots (0). This difference is not significant, and the split has been forced manually to focus on Crosses. Players divide into four clusters, with a wide margin between the top-rated shooters (28.74%) and the others.

Table 3.2 Top goal-scorers from Crosses

player_2_Table

Graph 3.3 Assisted (1) vs. Individual play (0)

player_3

Legend: Players are divided into four clusters, with the best ones (in the tagged node) averaging an SG% of 19.21%, separated by a wide margin from the second best (12.5%).

Table 3.3 Best shooters from Assists

player_3_table

Graph 3.4 Top Assist providers (passers)

player_4

Legend: Leading assist providers are also split into four clusters, with TFC’s Giovinco topping the cluster (tagged) of the best ones with an SG% of 27.08%.

Table 3.4 Top Assists providers

player_4-table

player_all_X

player_7

Legend: The graph above has been created by combining the clusters with the highest SG% for each pattern of play. While Bradley WP is listed first in four of them, this is not because he has the highest SG% (although he may well have), but because players in a cluster are shown (by the software) in alphabetical order, not from highest to lowest SG%.

Graph 3.6 Best headers (Head > Goal)

player_8

Legend: SG% of Headers (10.47%) is not significantly different from other attempts (10.62%) – the first split has been forced in order to analyse Headers separately.

Table 3.6 Best headers

player_9_table

Disclaimer: I am aware that the data used in this analysis (collected from WhoScored) does not correspond 100% to the official data. I am confident, however, that any differences from official results are likely to be relatively minor, and to make little difference to the accuracy of the results.

 

Posted in shot statistics, Soccer analytics, Sports Analytics, Uncategorized

Can we judge performance accurately without knowing The plan?

Football performance is judged by analysing video and data from a match, or from many matches. But do video and data provide all the information we need to judge performance accurately? I don’t think so. A vital piece of information is missing: the head coach’s tactical plan(s).

A team should have a tactical plan. In fact, it should have at least two: an offensive and a defensive one. Winning in football is about scoring goals, and stopping the opposition from doing the same. So an attacking plan is needed to specify how to create goal opportunities, and a defensive one to prevent the opposition from creating them. But even two plans are not enough. Other plans are needed to cope with the changing circumstances of the game: winning/losing positions, players sent off or injured, etc. Decision making may get complicated during the ups and downs of a football match, but the coach who has planned in advance for all (most?) of them has a better chance of making the right decision, and of getting an edge on the opposition.

But let’s keep it simple and just look at the attacking and defensive plans. The attacking plan should specify a sequence of passes aimed at reaching a shooting position. It should start from a possession and include the position of the players and their movement on and off the ball. A defensive one would specify what players should do (position, action) when the opposition has the ball.

But… is there a plan?

Although the need for such a plan (or plans) seems obvious to me, I wonder what is happening out there, in real football, on the training grounds. Do coaches make plans to such a level of detail? Do they write them down, communicate them to the players, and practise them in training? Somehow I doubt it: the word plan is rarely mentioned by pundits and media analysts of the game. A notable exception was after England’s unexpected defeat by Iceland, when the manager was blamed for not having one. Formation is the word that seems to be used in its place. But formation only specifies position, not action. Should we then assume that whatever players do is part of a plan, and just judge the execution?

Given a formation, for example, how can we accurately judge the performance of a midfielder who always attempts long passes (and loses most of them) when he also had the option of an easy forward pass to his right? We can’t! Our stats will show that he has performed poorly. But without knowing the plan, this may be the wrong conclusion. We don’t know if the coach has instructed him to act in this way. Perhaps the forward is at fault for not taking the right position to collect the passes. As for the free player on his right, maybe the plan says that he should not be there, but yards further forward, thus taking a defender with him, etc. etc…

So, what is the worth of our analysis if we only know what we can see?

Posted in Soccer analytics, Sports Analytics

Possession chains and passing sequences

Background

A few days ago, I tweeted that the ‘newer concept’ of ‘possession chains’ proposed by Marek Kwiatkowski (@statlurker) in his latest blog* was ‘very familiar to me’. I also attached text taken from my website (www.soccerlogic.com), where I write of ‘passing sequences’: a similar (same?) concept to Marek’s ‘possessions’. When @SportsDataChal asked me if I had published anything on the subject, I replied that I had only shown graphics on my website, and promised that I would publish more on my blog. Since I have no time (or inclination) to write anew on the subject, my intention was to fish out past notes on the subject and publish them without any editing.

Possession chains/passing sequences/event chains/link-plays/…

That is what I am doing below. First an extract from Marek’s blog where he introduces his ‘possession chains’, then my three pieces on the subject. The first is taken from an unedited note (rant?) on football analytics, the second from a marketing document aimed at football clubs, and the third from a document/proposal submitted (at the time) to the Capello index developers. A graphic representation of passing sequences, copied from my website, is shown at the bottom (Euro 2004, Portugal).

From Marek’s blog*

Luckily, a newer concept is emerging into view and taking a central place: the possession chain (possession for short). A possession is a sequence of consecutive on-the-ball events when the ball is under the effective control of a single team. A football game can then be seen as an (ordered) collection of sequences. It is a very positive development since possessions make much more sense as the fundamental building blocks of the game than events. This is because they are inherently dynamic — they span time and space. I believe that they should be studied for their own sake, and if you only compute them to figure out who should get partial credit for the shot at the end of it, then in my opinion, you are doing analytics wrong – or at least not as well as you could be.”

1. SoccerLogic’s Event Chains/Passing sequences – (2004)

“One of the main reasons to use a football analysis tool is to identify event chains, that is to identify what events that led up to a specific situation. For example, if a team scores it is Interesting to see what events that happened just before the scoring. For instance, a goal could have come after five successive short passes in a row in the team. It could also have come after that a defensive player lost the ball to an attacking player who shot immediately. To know what events occurred just before one goal is not very important but if there is recurring patterns in what kind of events that have occurred just before a goal, it is very interesting information. If, for example, goals very often are made after a number of short successive passes within the attacking team, the coach can draw the conclusion that a way of scoring is to use short passes in the offensive play.

The software program should be able to aid the match analyst in the identification of recurring event chains. A requirement for this is that there is a database of event chains from previous games, as described in the previous section. Some kind of event chains could possibly be identified in just one game, but in most cases several games have to be analysed in order to identify recurring event chains. A way of identifying event chains is to compare the five events (passes, shots, dribbles etc) that happened just before every goal and then compare if there are similarities.”
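The idea in the 2004 note can be sketched as follows: compare the five events before every goal and look for recurring patterns. The event encoding, a flat list of (team, event type) tuples in match order, is hypothetical, as are the sample events. The sketch ignores which team each preceding event belongs to.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[str, str]  # (team, event type), in match order


def chains_before_goals(events: List[Event], length: int = 5) -> List[Tuple[str, ...]]:
    """Event types of the `length` events preceding each goal (fewer near kick-off)."""
    chains = []
    for i, (_, kind) in enumerate(events):
        if kind == "goal":
            preceding = events[max(0, i - length):i]
            chains.append(tuple(k for _, k in preceding))
    return chains


def recurring_chains(matches: List[List[Event]], length: int = 5) -> Counter:
    """Count identical pre-goal chains across several matches."""
    counts: Counter = Counter()
    for events in matches:
        counts.update(chains_before_goals(events, length))
    return counts


# Hypothetical event streams for two matches
m1 = [("A", "pass"), ("A", "pass"), ("A", "shot"), ("A", "goal")]
m2 = [("B", "tackle"), ("A", "pass"), ("A", "pass"), ("A", "shot"), ("A", "goal")]
print(recurring_chains([m1, m2], length=3).most_common(1))
```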

2. From a SoccerLogic marketing document to football clubs – (2005)

“One of Soccerlogic many useful features is a very effective method for analysing event chains or passing sequences. The purpose of this analysis is to find recurring passing patterns.  These provide crucial information for understanding a team’s style of play: the tactical/strategic elements of its performance.

SoccerLogic can display event chains leading to (and following) any key event of a match, such as a goal, a foul, a shot on goal, a cross, etc. in rich graphic details. It can also create summary views (trellis) of chains leading to any particular event; these make it easy to compare passing movements and identify recurring patterns.  For greater accuracy, event chains of many matches can be analysed together.  Computer-based statistical analysis is then used to find among them trends and patterns correlating to good/poor performance.  This information provides a coach with an objective assessment of the effectiveness of his decisions, and helps him devise winning strategies for subsequent games.”

3. A data-based method for assessing Team performance – (2011)

“Football performance analysis normally focuses on players, not least because their stats are easier to collect and process.  Judging a team’s performance is not so simple. Team stats normally published in the media (corners, shots, possession, etc.)  tell only a small part of the story.   Players are also the focus of the Castrol and the recent (and controversial) Capello performance index.  There are no similar indexes for Teams, which are normally assessed solely on form – Wins/Losses and goals scored.   Since in football the final result often does not reflect performance on the pitch, this is not a satisfactory way to judge a team’s performance.

Given the amount of match data that is collected today, I am surprised that nobody has come up with a better method.  I guess it has much to do with the lack of skilled sports data analysts to fully exploit this data.  So, I developed my own solution.  I think it offers a very effective way to measure team’s performance, and can provide interesting stats to the media, as well be as valuable info to coaches.

My method is based on the analysis of the Possession of each team.  A Possession is defined as a sequence of events (ball touches) which starts when a team gets the ball and ends when the team loses it to the opposition.  The method views a football match as a series of alternate Possession.  This is not unique to football, but can be applied to any ball game (basketball, hockey, rugby, etc.).  I think basketball is the only ball game that appears to be  analysed this way.  But, compared with basketball, football is a very low scoring game, so the challenge is to find a useful Measure of Performance (MoP).

A Possession is the true expression of team performance because it describes how players work together to achieve a goal, and is specified by the following attributes:..
(details follow)
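Marek’s possession chains and the Possession of the 2011 document both boil down to the same operation: splitting the event stream wherever control passes to the other team. Below is a minimal sketch. It assumes a hypothetical event list of (team in control, event type) tuples in match order. It ignores the ‘effective control’ subtleties, such as a brief deflection by an opponent, that a real implementation would need to handle.

```python
from itertools import groupby
from typing import List, Tuple

Event = Tuple[str, str]  # (team in control, event type), in match order


def possessions(events: List[Event]) -> List[Tuple[str, List[str]]]:
    """Split a match's events into alternating possessions: (team, event types)."""
    return [
        (team, [kind for _, kind in group])
        for team, group in groupby(events, key=lambda e: e[0])
    ]


# Hypothetical event stream
match = [("A", "pass"), ("A", "pass"), ("B", "tackle"), ("B", "pass"),
         ("A", "interception"), ("A", "shot")]
for team, chain in possessions(match):
    print(team, chain)
```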

120_Passing_sequences

*http://statsbomb.com/2016/08/towards-a-new-kind-of-analytics/

Posted in Passing sequences, Possession chain analysis, Soccer analytics, Soccer match analysis, Sports Analytics

Rank and Cluster of teams by Shot Statistics – EPL 2015-16

Thanks to Michael for sharing the data (http://cartilagefreecaptain.sbnation.com/2014/2/12/5404348/english-premier-league-shot-statistics), which allowed this analysis to take place. Michael’s Glossary of the stats, copied from the same blog, has been added at the bottom.

I have taken the data from the blog mentioned above (three tables with advanced shot statistics for the 2015-2016 English Premier League) and used advanced statistical techniques to cluster and rank the teams with respect to each table of stats. Note that Michael’s data is updated up to May 2, 2016, so for most teams the stats of the last two matches are not included. Given the debacle suffered by Tottenham in the last two games, theirs is probably the ranking least likely to reflect the final results.

Color scale 2015-16 Shots

Red means more, and signifies better stats in Attack and worse in Defence (more Shots, Goals, etc. conceded).  Fancy has a mixture of both positive and negative stats.

Attack

Attack_1 2015-16

Legend: Arsenal has the best Attacking stats, followed by Man City, Liverpool, and Tottenham.   These teams also share a significant advantage over teams in the second and following clusters.  Surprisingly, relegated Newcastle tops the last cluster – so Defence appears to have been the problem.

Defence

Defence 2015-16

Legend: Man City tops the Defence stats ahead of Liverpool, apparently not greatly affected by Sakho’s absence in the last games.  Strangely, Leicester has similar stats to those of relegated Norwich and Aston Villa.

Fancy

Fancy 2015-16

Legend: Taking both Defence and Attack shot stats into account, Tottenham has the most positive mix, closely followed by Man City and Arsenal. However, given the team’s dismal performance in the last two matches (stats not included in the data), its top spot may not be justified.

Glossary

Shot locations are based on Michael’s map matrix below. Penalties are not included.

Shot zone

DZS: Shots from the danger zone, which is zones 1-3, the close and central areas of the box.

WS: Shots from the wide areas of the 18-yard-box, zones 4-5.

SoB: Shots from outside the 18-yard-box, zones 6-8.

%Cross: Percentage of shots from the danger zone assisted by crosses.

%TB: Percentage of shots from zones 1-5 assisted by through-balls.

SoT: Shots on target

DZ Pass: Shots assisted by passes in or around the danger zone.

Counter: Shots attempted from counterattacking moves

Est. Poss.: Shot attempted from established possession in the opposition half

NPG: Goals not from penalties or from own goals. (xG is meant to nearly sum to NPG, but it does not quite because (a) the fit isn’t exactly perfect and (b) xG deprecates shots off rebounds and only counts for a team one shot from every attacking move.)

TSD: Total Shots Difference, shots taken minus shots allowed.

SoTD: Shots on target difference, shots on target minus shots on target conceded.

xG: Expected goals scored or conceded based on shot type, assist type and shot location, speed of attack and a few more factors. For the explanation of the components of xG, see my full open-method expected goals methodology.

Expected goals here does not sum to the same total as goals, because it excludes penalties and own goals, as well as deprecating the value of chances off rebounds.
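The two difference metrics in the glossary are simple subtractions. For example, a team that took 500 shots and allowed 400 has a TSD of +100. Below is a minimal sketch with hypothetical season totals. It computes TSD and SoTD and ranks teams by TSD using a plain sort, which is not the clustering and ranking method used for the tables in this post.

```python
from typing import List, NamedTuple


class TeamShots(NamedTuple):
    name: str
    shots: int
    shots_allowed: int
    sot: int
    sot_conceded: int

    @property
    def tsd(self) -> int:
        """Total Shots Difference: shots taken minus shots allowed."""
        return self.shots - self.shots_allowed

    @property
    def sotd(self) -> int:
        """Shots on target difference: shots on target minus shots on target conceded."""
        return self.sot - self.sot_conceded


def rank_by_tsd(teams: List[TeamShots]) -> List[TeamShots]:
    """Teams from best to worst TSD, with SoTD as the tie-breaker."""
    return sorted(teams, key=lambda t: (t.tsd, t.sotd), reverse=True)


# Hypothetical season totals
teams = [TeamShots("Team A", 500, 400, 170, 130), TeamShots("Team B", 450, 450, 150, 160)]
for team in rank_by_tsd(teams):
    print(team.name, team.tsd, team.sotd)
```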

Posted in shot statistics, Soccer analytics, Soccer match analysis, Sports Analytics