Conversion rates (SG) by shot Location

1. Introduction

This blog is a continuation of my last one on shooting in the MLS (Star shooters of the MLS – April 1, 2017). Here I shall be looking at how shot location affects shot outcome, as measured by the metric SG (goals/shots), or conversion rate.

As soccer fans, we know intuitively that not all shots have the same probability of being converted into a goal, and that shot location plays an important part in this outcome. So, the question I am trying to answer is: which locations give a higher or lower probability of scoring?

2. Methodology

As in the previous analysis, I shall be using the Decision Trees (https://en.wikipedia.org/wiki/Decision_tree_learning) method. Before I can start, I need to add a new variable to the data, which I call Zones_XY, and assign values to it. To create Zones_XY I divide the final third of the pitch into a grid of 50 Zones (5 along x by 10 along y). The result is that each shot is now associated with a Zone location specified by the variable Zones_XY, which takes the values A1, A2, A3, … and so on.
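The grid assignment can be sketched in a few lines of Python. This is only a minimal sketch under assumptions not stated in the text: coordinates run from 0 to 100 on both axes, x increases towards the goal being attacked, the final third starts at two thirds of the pitch length, and the letter in the label stands for the x band while the number stands for the y band. The function name `zone_xy` is hypothetical.

```python
X_BANDS, Y_BANDS = 5, 10          # 5 bands along x, 10 along y, as in the text
FINAL_THIRD_START = 100 * 2 / 3   # assumption: x runs 0-100 towards the attacked goal

def zone_xy(x, y):
    """Return a zone label such as 'A1', or None for a shot outside the final third."""
    if not (FINAL_THIRD_START <= x <= 100 and 0 <= y <= 100):
        return None
    x_width = (100 - FINAL_THIRD_START) / X_BANDS
    y_width = 100 / Y_BANDS
    # min() keeps shots lying exactly on the far edge (x=100 or y=100) in the last band
    col = min(int((x - FINAL_THIRD_START) / x_width), X_BANDS - 1)
    row = min(int(y / y_width), Y_BANDS - 1)
    # assumption: letter = x band (A-E), number = y band (1-10)
    return "ABCDE"[col] + str(row + 1)
```

With this labelling, a shot at (70, 5) falls in the first band on both axes, while anything behind the final third is simply left unlabelled.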

The purpose of my analysis is to classify these Zones by shot conversion rate (SG) and cluster similar ones together. Instead of considering all shots, I am going to analyse only shots resulting from Regular play.

3. The analysis

As in my preceding blog, I analyse the variable Result, which takes the value 1 or 0 depending on whether the outcome of the shot is a goal or not. In contrast with my previous effort, I define Result as a continuous (numeric) variable, and therefore here I am using Decision Trees (DT) to perform a regression type of analysis.
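Treating Result as a number has a handy consequence: the mean of Result in any group of shots is exactly that group's SG. The sketch below illustrates this, together with the squared-error gain that a generic regression tree uses to judge a split. It is an assumption-laden illustration, not the criterion of the software used here (which is not named in this post); the function names and the toy data are hypothetical.

```python
from collections import defaultdict

def sg_by_zone(shots):
    """shots: iterable of (zone, result) pairs, result 1 for a goal, else 0."""
    totals = defaultdict(lambda: [0, 0])   # zone -> [goals, shots]
    for zone, result in shots:
        totals[zone][0] += result
        totals[zone][1] += 1
    return {zone: goals / n for zone, (goals, n) in totals.items()}

def sse(results):
    """Sum of squared errors around the mean of a list of 0/1 results."""
    if not results:
        return 0.0
    mean = sum(results) / len(results)
    return sum((r - mean) ** 2 for r in results)

def split_gain(shots, left_zones):
    """Reduction in squared error when zones in left_zones are split from the rest."""
    shots = list(shots)
    left = [r for z, r in shots if z in left_zones]
    right = [r for z, r in shots if z not in left_zones]
    return sse([r for _, r in shots]) - sse(left) - sse(right)

# Hypothetical toy data: four shots from two zones
toy = [("C5", 1), ("C5", 0), ("A1", 0), ("A1", 0)]
```

A larger gain means the two groups of Zones differ more in SG; a tree-growing algorithm would keep the grouping with the best gain, subject to its own significance rules.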

The results are shown in the graph below (3.1). The top node is my starting point; it shows that the average conversion rate (SG) for all shots (8,495) is 0.106. I then split this total by its PatternofPlay components and obtain the Regular play node with the shots I want to analyse; then it is just a matter of running the algorithm, which creates the tree shown below.

3.1 Conversion rate (SG) by Location (Zone_XY) – Regular play

xG Regular tree

Legend: Shots (6,095) from Regular play have an SG of 0.096, which means that we can expect on average a goal to be scored roughly every 10 shots. The analysis finds, as expected, that the Zones_XY variable I have created is significant in explaining differences in SG, and creates seven Zones_XY clusters, each with an SG that is significantly different from the others (95% confidence level). These are shown ordered from left to right, and vary from Zones with near-zero SG to 0.27 SG. If we map these results onto a football pitch, we obtain the following picture.

Map 3.1

Shots xG Zone

Legend: The map on the left shows the x,y location of shots (yellow) and goals (red). On the right, the same map is shown divided into my Zones_XY, with the colored ones mirroring the results of the DT analysis. We can see that most Zones (green and gray) have zero or infinitesimal probability that a shot taken from them will result in a goal. There are, however, 15 Zones (green to red) where we can expect a better outcome, which varies from 0.01 to 0.27.

4. Digging deeper – Teams

DT is a great tool for exploratory analysis, as it allows one to drill down easily into the data and find answers to obvious questions a football analyst or fan may want to ask. For example, I can find out whether the (overall) shooting profile I have just discovered applies to all Teams in the MLS, or whether there are differences among them that are statistically significant.

For this analysis, I need to create a new variable, Avg_SG, which maps each shot to the SG value of the Zone it was taken from, as computed in the previous analysis. The result is a categorical variable with seven categories (the 15 Zones share 7 different SG values), which I can now analyse using DT. The result is shown in the graph below, and summarised for easier readability in the table that follows it.
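The mapping from shots to Avg_SG categories, and the per-team profile built from it, can be sketched as follows. This is a hedged illustration: the label format is inferred from category names such as Avg_04, and the zone table, team name and function names are hypothetical placeholders, not values from the data.

```python
from collections import Counter

def avg_sg_label(sg):
    """Turn a zone SG such as 0.04 into a category label such as 'Avg_04'."""
    return f"Avg_{round(sg * 100):02d}"

def team_profiles(shots, zone_sg):
    """shots: (team, zone) pairs. Returns team -> {label: share of that team's shots}."""
    counts = {}
    for team, zone in shots:
        if zone not in zone_sg:      # shots from unclassified zones are skipped
            continue
        counts.setdefault(team, Counter())[avg_sg_label(zone_sg[zone])] += 1
    return {
        team: {label: n / sum(c.values()) for label, n in c.items()}
        for team, c in counts.items()
    }

# Hypothetical zone table and shots, for illustration only
zone_sg = {"C5": 0.27, "B4": 0.04}
shots = [("Team X", "C5"), ("Team X", "B4"), ("Team X", "B4"), ("Team X", "B4")]
profile = team_profiles(shots, zone_sg)
```

Each team's profile is then a vector of shares that sums to 1, which is the kind of vector the tree compares between teams.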

Graph 4.1 Shot conversion profile – Teams

xg Zone tree

Team clusters SG

Legend: The top node of this tree shows the Avg_SG variable I have just created and its six categories; this is the overall shot conversion profile (Regular play). We can see, for example, that the largest share of shots (29.6% of the total, or 1,680) is taken from Avg_04, that is, from the Zones which share an SG of 0.04. We can read the other categories (Avg_01, Avg_04, … etc.) in the same way.

Running the DT algorithm then creates five nodes (clusters), each one with a group of teams that share a shot conversion profile significantly different from the others. By profile here I mean the six values (vector) taken by the Avg_SG variable. For example, those of the Teams in the tagged node are: 0.079, 0.270, 0.184, 0.192, 0.131, 0.143. The table below (4.1) shows the shot conversion profile of each cluster of teams expressed in % for easier readability and comparison between them.

Table 4.1  Shot conversion  profile – Teams

Team clusters SG M1

Legend: All teams take most shots from Zones with an SG of 0.04 (Avg_04). Nearly half of {FC Dallas,…} shots come from the lowest SG Zones. In contrast, {Columbus, …} have the best shooting record from the high SG Zones. I’ll leave it to readers (MLS fans in particular) to discover other interesting facts in the results shown.

5.  Final notes

While for this analysis I focused on shot Location, the DT algorithm tells me that shot Direction – the trajectory of the shot to the goal face – is ranked before Zones_XY as a predictor of SG. So perhaps a better predictor of shot conversion would take both shot location and direction into account.

As one would expect, shooters also have ‘preferred’ shooting zones, and thus different profiles.  DT found 14 of them for the 58 players considered  – far too many to be included in this blog.

I was going to compare these results with those obtained by others using the expG metrics, and draw some conclusions. Unfortunately, I realised that this effort would take too much of my time and was better left to a later blog.


Shooting stars of the MLS

  1. Introduction

With the 2017 MLS season approaching, I am going to take a look at last year’s (2016-17 season) results, and in particular at the attacking effectiveness of Teams and Players. One way to measure this is to compute the percentage (%) of Shots that are converted into Goals, which I shall call SG%.

The purpose of this analysis is to classify Teams and Players with respect to this metric, and to provide a series of graphics and tables showing the results. The analysis is performed using a clustering algorithm that splits Teams into clusters, such that those in each cluster have similar SG% and are significantly different from those in other clusters. Players are clustered in the same way.

The easiest and (probably) best way to cluster with respect to a binary variable (goals vs. shots) is to use the algorithm known as classification by decision tree induction (DT). As well as providing great flexibility, this has the advantage of displaying the results in easy-to-understand graphics. The tool used for the analysis is a professional software package that creates DTs using the CHAID technique. This is named after the chi-square test used to validate the statistical significance of the results (diversity between clusters).
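To give a feel for the test behind the splits, here is a minimal sketch of a Pearson chi-square statistic comparing goals vs. missed shots in two clusters. It shows only the core test; a full CHAID implementation also merges categories and adjusts for multiple comparisons, which this sketch does not attempt. The function name is hypothetical; 3.841 is the standard 5% critical value for one degree of freedom.

```python
CRITICAL_95_DF1 = 3.841  # chi-square critical value, 1 degree of freedom, 5% level

def chi_square_2x2(goals_a, shots_a, goals_b, shots_b):
    """Pearson chi-square statistic for goals vs. non-goals in two clusters."""
    observed = [
        [goals_a, shots_a - goals_a],
        [goals_b, shots_b - goals_b],
    ]
    total = shots_a + shots_b
    row_totals = [shots_a, shots_b]
    col_totals = [goals_a + goals_b, total - goals_a - goals_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

def significantly_different(goals_a, shots_a, goals_b, shots_b):
    """True when the two conversion rates differ at the 95% confidence level."""
    return chi_square_2x2(goals_a, shots_a, goals_b, shots_b) > CRITICAL_95_DF1
```

Two clusters with identical conversion rates give a statistic of zero; the further apart the rates (and the more shots behind them), the larger the statistic.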

This DT method and, in particular, its sophisticated implementation in this software make it easy to compute not only the overall performance of Teams and Players, but also to perform a conditional analysis of their SG% metric. That is, to compute the SG% for all particular conditions/contexts reported in the data, such as Patterns of play (Regular, Fast-break, Set-piece, etc.), Assisted vs. Solo, Cross vs. Other shots, etc.

Details of the results of this analysis follow. First we show the graphic tree with the clusters (tree nodes) created by the algorithm and the Teams or Players belonging to them. The cluster with the highest SG% is tagged, and details of the Teams’ or Players’ shot profiles are given in the table together with their individual SG%.

  2. Team analysis

Graph 2.1 Team shooting effectiveness – All plays

team_1

Legend: Of a total of 8,496 shots, only 900 were converted into goals (10.59%), as shown in the top node. Teams are split (automatically by the algorithm) into three clusters. The one tagged has the best SG% at 12.96%, and contains the five teams with the highest SG%.
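The figure in the top node is simply goals divided by shots, expressed as a percentage. A small sketch (the function name is hypothetical) reproduces it from the two totals quoted above:

```python
def sg_percent(goals, shots):
    """Share of shots converted into goals, as a percentage."""
    if shots == 0:
        raise ValueError("no shots: SG% is undefined")
    return 100 * goals / shots

overall = round(sg_percent(900, 8496), 2)   # totals taken from the legend above
```

The same function applies to any node of the tree, given that node's goals and shots.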

team_1_table

Graph 2.2 Top shooters from Crosses

team_2

Legend: Teams do slightly better at converting Crosses into Goals than other types of passes: 11.25% vs. 10.45% (difference not significant). However, some teams do significantly better than others in scoring from Crosses: 14.95% vs. 8.80%.

Graph 2.3 Top shooters by Pattern of play

team_3

Legend: Only shooting from Fast-breaks (Fb) is significantly more effective than other types. In fact, scoring from Fast-breaks is more than twice as likely (21.31%) as from any other play (9.51%). There is also a wide difference between teams in scoring from Fb, and they split into two clusters with SG% of 27.83% and 9.09% respectively. For other plays (Corner, Fk, etc.), the performance gap among Teams is narrower, with LA Galaxy and NY topping the list at 13.14%. Performances for all plays are summarised in the table below.

Table 2.2 Summary of Team shooting performance by play

team_table2

  3. Player analysis

Graph 3.1 Best shooters

palyers_1

Legend: Best shooters are players that convert more shots into goals (i.e. have a higher SG%). The algorithm splits them into four clusters, with the best averaging an 18.96% success rate.

Table 3.1 Effectiveness (SG%) of top shooters

player_1-Tble

Graph 3.2 Shots from cross

player_2

Legend: The success rate (SG%) from Crosses (1) is 11.25%, and 10.45% from other shots (0). This is not significantly different, and the split has been forced manually to focus on Crosses. Players divide into four clusters, with a wide margin between the top-rated shooters (28.74%) and the others.

Table 3.2 Top goal-scorers from Crosses

player_2_Table

Graph 3.3 Assisted (1) vs. Individual play (0)

player_3

Legend: Players are divided into four clusters, with the best ones (in the tagged node) averaging a 19.21% SG%, separated by a wide margin from the second best (12.5%).

Table 3.3 Best shooters from Assists

player_3_table

Graph 3.4 Top Assist providers (passers)

player_4

Legend: Leading assist providers are also split into four clusters, with TFC’s Giovinco topping the cluster (tagged) of best ones with a 27.08% SG%.

Table 3.4 Top Assists providers

player_4-table

player_all_X

player_7

Legend: The graph above has been created by combining the clusters with the highest SG% for each pattern-of-play. While Bradley WP is listed first in four of them, this is not because he has the highest SG% (although he may well have), but because players in a cluster are shown (by the software) in alphabetical order, and not from highest to lowest SG%.

Graph 3.6 Best headers (Head > Goal)

player_8

Legend: SG% of Headers (10.47%) is not significantly different from other attempts (10.62%) – the first split has been forced in order to analyse Headers separately.

Table 3.6 Best headers

player_9_table

Disclaimer: I am aware that the data used  in this analysis (collected from WhoScored) does not correspond 100% to the official data.  I am confident, however, that any differences from official results are likely to be relatively minor, and such as to make little difference to the accuracy of the results.

 


Can we judge performance accurately without knowing The plan?

Football performance is judged by analysing video and data from a match, or from many matches. But do video and data provide all the information we need to judge performance accurately? I don’t think so. A vital piece of information is missing: the head coach’s tactical plan(s).

A team should have a tactical plan. In fact, it should have at least two: an offensive and a defensive one. Winning in football is about scoring goals, and stopping the opposition doing the same. So an attacking plan is needed to specify how to create goal opportunities, and a defensive one to prevent the opposition creating them. But even two plans are not enough. Other plans are needed to cope with the changing circumstances of the game: winning/losing positions, players sent off or injured, etc. Decision making may get complicated during the ups and downs of a football match, but the coach who has planned in advance for all (most?) of them has a better chance of making the right decision and getting an edge on the opposition.

But let’s keep it simple and just look at the attacking and defensive plan.  The attacking plan should specify a sequence of passes aimed at reaching a shooting position.   It should start from a possession and include the position of the players and their movement on and off the ball.  A defensive one would specify what players should do (position, action) when the opposition has the ball.

But… is there a plan?

Although the need for such a plan (or plans) seems obvious to me, I wonder what is happening out there, in real football, on the training grounds. Do coaches make plans to such a level of detail? Do they write them down, communicate them to the players, and practise them in training? Somehow I doubt it: the word plan is rarely mentioned by pundits and media analysts of the game. A notable exception was after England’s unexpected defeat by Iceland, when the manager was blamed for not having one. Formation is the word that seems to be used in its place. But formation only specifies position, not action. Should we then assume that whatever players do is part of a plan, and just judge the execution?

Given a formation, for example, how can we accurately judge the performance of a midfielder who always attempts long passes (and loses most of them) when he also had the option of an easy forward pass to his right? We can’t! Our stats will show that he has performed poorly. But without knowing the plan this may be the wrong conclusion. We don’t know if the coach has instructed him to act in this way. Perhaps the forward is at fault for not taking the right position to collect the passes. As for the free player on his right, maybe the plan says that he should not be there, but yards further forward, thus taking a defender with him, etc.

So, what is the worth of our analysis if we only know what we can see?


Possession chains and passing sequences

Background

A few days ago, I tweeted that the ‘newer concept’ of ‘possession chains’ proposed by Marek Kwiatkowski (@statlurker) in his latest blog* was ‘very familiar to me’. I also attached text taken from my website (www.soccerlogic.com), where I write of ‘passing sequences’: a similar (same?) concept to Marek’s ‘possessions’. When @SportsDataChal asked me if I had published anything on the subject, I replied that I had only shown graphics on my website, and promised that I would publish more on my blog. Since I have no time (or inclination) to write anew on the subject, my intention was to fish out past notes and publish them without any editing.

Possession chains/passing sequences/event chains/link-plays/…

That is what I am doing below. First an extract from Marek’s blog where he introduces his ‘possession chains’, then my three pieces on the subject. The first is taken from an unedited note (rant?) on football analytics, the second from a marketing document aimed at football clubs, and the third from a document/proposal submitted (then) to the Capello index developers. A graphic representation of passing sequences, copied from my website, is shown at the bottom (Euro 2004, Portugal).

From Marek’s blog*

Luckily, a newer concept is emerging into view and taking a central place: the possession chain (possession for short). A possession is a sequence of consecutive on-the-ball events when the ball is under the effective control of a single team. A football game can then be seen as an (ordered) collection of sequences. It is a very positive development since possessions make much more sense as the fundamental building blocks of the game than events. This is because they are inherently dynamic — they span time and space. I believe that they should be studied for their own sake, and if you only compute them to figure out who should get partial credit for the shot at the end of it, then in my opinion, you are doing analytics wrong – or at least not as well as you could be.”

1. SoccerLogic’s Event Chains/Passing sequences – (2004)

“One of the main reasons to use a football analysis tool is to identify event chains, that is to identify what events that led up to a specific situation. For example, if a team scores it is Interesting to see what events that happened just before the scoring. For instance, a goal could have come after five successive short passes in a row in the team. It could also have come after that a defensive player lost the ball to an attacking player who shot immediately. To know what events occurred just before one goal is not very important but if there is recurring patterns in what kind of events that have occurred just before a goal, it is very interesting information. If, for example, goals very often are made after a number of short successive passes within the attacking team, the coach can draw the conclusion that a way of scoring is to use short passes in the offensive play.

The software program should be able to aid the match analyst in the identification of recurring event chains. A requirement for this is that there is a database of event chains from previous games, as described in the previous section. Some kind of event chains could possibly be identified in just one game, but in most cases several games have to be analysed in order to identify recurring event chains. A way of identifying event chains is to compare the five events (passes, shots, dribbles etc) that happened just before every goal and then compare if there are similarities.”
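The idea of comparing the five events before every goal can be sketched in Python as below. This is a simplified illustration under stated assumptions: each match is a plain list of event-type strings in chronological order, a goal is marked by the string "goal", and no distinction is made between the two teams. The function names and event labels are hypothetical.

```python
from collections import Counter

def chains_before_goals(events, length=5):
    """Return the chain of up to `length` events preceding each goal in one match."""
    chains = []
    for i, event in enumerate(events):
        if event == "goal":
            chains.append(tuple(events[max(0, i - length):i]))
    return chains

def recurring_chains(matches, length=5, min_count=2):
    """Count chains over several matches and keep those seen at least min_count times."""
    counts = Counter(
        chain
        for events in matches
        for chain in chains_before_goals(events, length)
    )
    return {chain: n for chain, n in counts.items() if n >= min_count}

# Hypothetical toy matches, for illustration only
matches = [
    ["pass", "pass", "shot", "goal"],
    ["tackle", "pass", "pass", "shot", "goal"],
]
```

Exact matching of whole chains is the crudest possible comparison; looser notions of similarity (same pass types, same zones) would need a richer event description than a single string.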

2. From a SoccerLogic marketing document to football clubs – (2005)

“One of Soccerlogic many useful features is a very effective method for analysing event chains or passing sequences. The purpose of this analysis is to find recurring passing patterns.  These provide crucial information for understanding a team’s style of play: the tactical/strategic elements of its performance.

SoccerLogic can display event chains leading to (and following) any key event of a match, such as a goal, a foul, a shot on goal, a cross, etc. in rich graphic details. It can also create summary views (trellis) of chains leading to any particular event; these make it easy to compare passing movements and identify recurring patterns.  For greater accuracy, event chains of many matches can be analysed together.  Computer-based statistical analysis is then used to find among them trends and patterns correlating to good/poor performance.  This information provides a coach with an objective assessment of the effectiveness of his decisions, and helps him devise winning strategies for subsequent games.”

3. A data-based method for assessing Team performance – (2011)

“Football performance analysis normally focuses on players, not least because their stats are easier to collect and process.  Judging a team’s performance is not so simple. Team stats normally published in the media (corners, shots, possession, etc.)  tell only a small part of the story.   Players are also the focus of the Castrol and the recent (and controversial) Capello performance index.  There are no similar indexes for Teams, which are normally assessed solely on form – Wins/Losses and goals scored.   Since in football the final result often does not reflect performance on the pitch, this is not a satisfactory way to judge a team’s performance.

Given the amount of match data that is collected today, I am surprised that nobody has come up with a better method.  I guess it has much to do with the lack of skilled sports data analysts to fully exploit this data.  So, I developed my own solution.  I think it offers a very effective way to measure team’s performance, and can provide interesting stats to the media, as well be as valuable info to coaches.

My method is based on the analysis of the Possession of each team.  A Possession is defined as a sequence of events (ball touches) which starts when a team gets the ball and ends when the team loses it to the opposition.  The method views a football match as a series of alternate Possession.  This is not unique to football, but can be applied to any ball game (basketball, hockey, rugby, etc.).  I think basketball is the only ball game that appears to be  analysed this way.  But, compared with basketball, football is a very low scoring game, so the challenge is to find a useful Measure of Performance (MoP).

A Possession is the true expression of team performance because it describes how players work together to achieve a goal, and is specified by the following attributes:..
(details follow)
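The definition of a Possession given above lends itself to a very short sketch: walk through the ball touches in order and start a new Possession whenever the team on the ball changes. This assumes each touch is recorded as a (team, event) pair in chronological order; it ignores stoppages, loose balls and questions of effective control, and the names used are hypothetical.

```python
from itertools import groupby

def possessions(touches):
    """Split chronological (team, event) touches into alternating possessions.

    Returns a list of (team, [events]) tuples, one per possession.
    """
    return [
        (team, [event for _, event in group])
        for team, group in groupby(touches, key=lambda touch: touch[0])
    ]

# Hypothetical toy sequence: N and H stand for the two teams
touches = [
    ("N", "pass"), ("N", "pass"),
    ("H", "tackle"), ("H", "pass"),
    ("N", "shot"),
]
chains = possessions(touches)
```

Each resulting chain can then be described by attributes such as its length, where it started, and how it ended.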

120_Passing_sequences

*http://statsbomb.com/2016/08/towards-a-new-kind-of-analytics/


Rank and Cluster of teams by Shot Statistics – EPL 2015-16

Thanks to Michael for sharing the data (http://cartilagefreecaptain.sbnation.com/2014/2/12/5404348/english-premier-league-shot-statistics), which allowed this analysis to take place. Michael’s Glossary of the stats, copied from the same blog, has been added at the bottom.

I have taken the data from the blog mentioned above (three tables with advanced shot statistics for the 2015-2016 English Premier League) and used advanced statistical techniques to Cluster and Rank the teams with respect to each table of stats. Note that Michael’s data is updated up to May 2, 2016; so, for most teams, the stats of the last two matches are not included. Given the debacle suffered by Tottenham in the last two games, this is probably the team whose ranking may not reflect the final results.

Color scale 2015-16 Shots

Red means more, and signifies better stats in Attack and worse in Defence (more Shots, Goals, etc. conceded).  Fancy has a mixture of both positive and negative stats.

Attack

Attack_1 2015-16

Legend: Arsenal has the best Attacking stats, followed by Man City, Liverpool, and Tottenham.   These teams also share a significant advantage over teams in the second and following clusters.  Surprisingly, relegated Newcastle tops the last cluster – so Defence appears to have been the problem.

Defence

Defence 2015-16

Legend: Man City tops the Defence stats ahead of Liverpool, apparently not greatly affected by Sakho’s absence in the last games.  Strangely, Leicester has similar stats to those of relegated Norwich and Aston Villa.

Fancy

Fancy 2015-16

Legend: Taking both Defence and Attack shot stats into account, Tottenham has the more positive mix, closely followed by Man City and Arsenal. However, given this team’s dismal performance in the last two matches (stats not included in the data), its top spot may not be justified.

Glossary

Shot locations are based on Michael’s map matrix below. Penalties are not included.

Shot zone

DZS: Shots from the danger zone, which is zones 1-3, the close and central areas of the box.

WS: Shots from the wide areas of the 18-yard-box, zones 4-5.

SoB: Shots from outside the 18-yard-box, zones 6-8.

%Cross: Percentage of shots from the danger zone assisted by crosses.

%TB: Percentage of shots from zones 1-5 assisted by through-balls.

SoT: Shots on target

DZ Pass: Shots assisted by passes in or around the danger zone.

Counter: Shots attempted from counterattacking moves

Est. Poss.: Shot attempted from established possession in the opposition half

NPG: Goals not from penalties or from own goals. (xG is meant to nearly sum to NPG, but it does not quite because (a) the fit isn’t exactly perfect and (b) xG deprecates shots off rebounds and only counts for a team one shot from every attacking move.)

TSD: Total Shots Difference, shots taken minus shots allowed.

SoTD: Shots on target difference, shots on target minus shots on target conceded.

xG: Expected goals scored or conceded based on shot type, assist type and shot location, speed of attack and a few more factors. For the explanation of the components of xG, see my full open-method expected goals methodology.

Expected goals here does not sum to the same total as goals, because it excludes penalties and own goals, as well as deprecating the value of chances off rebounds.


Analytics first, sport second

This blog was written to fulfill a promise made to Ravi (@Scribblr_42) in a tweet back in February to explain why I strongly disagreed with a statement by Dean Oliver (@DeanO_Lytics) at the Opta Forum (http://bit.ly/1Uw5DCO) last February, and that Ravi ‘liked’.

During his presentation Dean Oliver displayed a slide whose first point (as one can see from the pic below) was: “Know the sport first, analytics second”.

What Dean meant by this statement – as he later explained – was that a deep knowledge of a sport (one that is normally acquired by working within a club as a Performance Analyst) is more important than a knowledge of analytics. I strongly objected to this statement and later posted a tweet expressing my disapproval.

Of course, anyone involved professionally in Performance Analysis of any sport has to ‘know’ the sport. But this deep knowledge is no longer of primary importance, not if one has an analytics role in a club. Analytics is about analysing data – these days, lots of data (big data?). Therefore the primary knowledge required for this task is knowledge and experience of advanced analytic techniques and tools. Without this knowledge and experience it is not possible for anyone to analyse data efficiently and effectively. Any data! Of course, such a person must also ‘know’ the sport where the data comes from. But, thanks to years of media coverage, TV in particular, any intelligent person that follows a particular sport has gained such knowledge. Not, of course, to the level that Dean implies, but enough to do his job.

It should be clear that I am not advocating that data analysts/scientists should replace Performance Analysts (PAs). Only that the latter should stop pretending that, in today’s data-rich sports environment, they are capable of fully exploiting the large amount of data available to them. They are not! They are not qualified for this task, nor, I dare say, do they have the aptitude. Video analysis has been for years their main tool and focus, not statistics. Sadly, this is the main reason why analytics has failed to gain a foothold in many team sports, football in particular.

However, I am not suggesting that clubs should get rid of their PAs, only that they should take a back seat where data analysis is concerned. At a recent ISPAS conference in Carlow (http://www.itcarlow.ie/research/conferences-workshops/ispas-2016-workshop.htm) I put forward the suggestion that clubs should employ a Performance Data Analyst (PDA), whose sole concern would be leading the data analysis of the sport, as well as helping PAs improve their data analysis skills. Unlike PAs, the PDA does not need a deep knowledge of the sport to do his job well. He also does not spend time on the pitch with players, but interacts only with PAs and other coaching staff, including the head coach. Therefore he does not need the communication skills of a PA, contrary to another important point that Dean makes in the same slide.

Sadly, I don’t think that my suggestion (which was greeted with contempt by the likes of Prof. Hughes at the Carlow conference) will be taken up by sport clubs any time soon. Aside from the hostility of PAs to any challenge to their role of ‘analysts’, there aren’t many data analysts/scientists to go round. And even the few that are passionate about a sport are unlikely to accept the miserly salary they are likely to be offered by clubs when many business companies are prepared to pay them lots more. This point is eloquently made by Ben Alamar in an article last year, which I reprinted in one of my tweets.

Tweet_DeanOl

(Note: with the acronym PAs I am also referring in general to any member of the coaching staff who is involved in data analysis)


Finding changes in tactics and their impact on a match – statistical and graphical analysis

This blog is a longer (and revised) version of my poster at the OptaPro Sports Analytic forum 2015, held in London on 5th February 2015.

Introduction
When a team is a goal down or up, a manager may change tactics (formation) in order to protect the advantage or chase the match. This is more likely to happen at the beginning of the second half, or in the last quarter of a match. Normally, in the first case, there is a change of tactics when a team is losing. In the last quarter of a match substitutes are introduced either to hold on to a winning score or to chase a losing game. Also, at this time, if his team is winning, a manager may decide to settle for a draw, and change tactics accordingly.

A change of tactics is normally highlighted by match commentators and pundits if the match is broadcast live, or during a post-match video analysis. In contrast to this traditional method of analysis, this poster aims to discover any change of tactics by a team solely by analysing match event data, as provided by Opta.

We are not aware of any previous attempt at this kind of analysis using match data alone. The original intention was in fact to use match data, as characterized by the Opta f24 feed, together with player tracking data from TRACAB. Unfortunately, attempts to use TRACAB data were not successful, and after much trying we decided to postpone such analysis to a later date.

Data and methods
The match analysed is the Newcastle-Hull match played in Newcastle in the 2013-14 season. The match data was provided by Opta in the form of an f24 match file. The analysis was carried out using statistical and graphical methods. The statistical analysis in particular relies heavily on the technique of classification and regression trees. Both software packages used are designed for interactive analysis, and are therefore particularly suited to exploratory analysis. MS Excel was also used for data preparation and parsing, as well as to create some graphs.

The discovery of tactical changes boils down to finding significant changes in performance by the two teams in the various contexts that characterize a game, for example between the 1st and 2nd half, or before/after a goal conceded/scored. These and other contexts, such as before/after substitutions and between one time interval and the next, are analysed, with ten time intervals used for the latter.

The changes that we looked for were:
1. Change in players’ position following a goal (for/against)
2. Change of role/position of substitutes compared with starting players
3. Change in activity (ball touches) by the teams during the course of the match
4. Change in activity in the final 3rd

To identify changes, the following variables were added to the Opta data:
1. Goal_T = G_0-0, G_1-0, etc., to identify time segments when the score was 0-0, 1-0, etc.
2. Final_3rd = 1 for ball touches in the final 3rd (0 = all other ball touches)
3. X_o, Y_o = coordinates (0 to 100) of the pitch position of ball touches
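A minimal sketch of how the Goal_T and Final_3rd variables might be derived is shown here. The goal minutes are those listed in the match summary of this post; everything else is an assumption for illustration: the function names, the rule that a touch in the goal minute itself already counts under the new score, and the 66.7 cut-off for the final 3rd on the 0 to 100 X_o scale (which also assumes X_o is measured in each team's own attacking direction).

```python
# Goal minutes and scoring side, as listed in the match summary
# (N = Newcastle, the home team; H = Hull).
GOALS = [(9, "N"), (25, "H"), (43, "N"), (47, "H"), (75, "H")]

# Assumed cut-off for the final 3rd on the 0-100 X_o scale.
FINAL_THIRD_X = 66.7


def goal_t(minute, goals=GOALS):
    """Return the score-state label (home-away) in force at a given minute.

    Assumption: a goal counts from its own minute onwards.
    """
    home = sum(1 for m, side in goals if m <= minute and side == "N")
    away = sum(1 for m, side in goals if m <= minute and side == "H")
    return f"G_{home}-{away}"


def final_3rd(x_o, threshold=FINAL_THIRD_X):
    """Return 1 if a ball touch falls in the final 3rd, else 0."""
    return 1 if x_o >= threshold else 0
```

For example, `goal_t(30)` gives `"G_1-1"` and `goal_t(80)` gives `"G_2-3"`, so every ball touch can be tagged with the score at the time it was made.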

Ideally, such an analysis should look at the performance of both teams. But because of time (and space) constraints we focus mainly on Newcastle’s performance.

Analysis results

Newcastle-Hull 2-3 (2-1)
Goals: Remy(N) 9’, Brady(H) 25’, Remy(N) 43’, Elmohamady(H) 47’, Aluko(H) 75’

Fig. 0 The above chart is an attempt to plot a summary of the match. It shows ball touches by the two teams in ten time intervals, together with the goal times.

Fig. 1 Possession (ball touches) comparison by Team and goal-time intervals
Fig. 1 We compare Newcastle and Hull ball touches, and we find that their average X_o position (length of the pitch) is significantly different (SD). Newcastle has significantly more possession in the attacking half. This advantage increases after Hull draws level at 1-1 (G_1-1) but falls as soon as Newcastle goes ahead (G_2-1).

Fig.2 Possession (ball touches) comparison by Role and goal-time intervals
Fig. 2 shows the increased activity of Newcastle’s midfield after the draw by Hull (2-2), and the steep decline in activity by the forwards (Fwd) after Newcastle went ahead (2-1).

Fig. 3 Average X position comparison
Fig. 3 Here we analyse the average X position of both teams, and we find that Newcastle’s was closer to the halfway line than Hull’s (SD), a sign of a more attacking stance. Revealing of tactics is the position of the Newcastle subs: they lined up with the full backs (wing backs, really), a defensive position they kept until Hull went ahead (2-3).

Fig. 4 Ball touches in the Final 3rd
Fig. 4 Here we compare ball touches in the Final_3rd. Overall, Newcastle was significantly (SD) more active in the final 3rd, but the subs were below the team average. Goals for/against did not result in any significant change of activity in the final 3rd. However, the subs were active mainly in the centre, in contrast to the centre-right of the forwards they replaced.

Fig. 5 Subs vs. replaced players – comparing ball touches position
Fig. 5 should be seen in conjunction with Fig. 6, and shows the position of ball touches by subs and the players they replaced.

Fig. 6 X position – subs vs. starting players
Fig. 6 Here we compare the average position (X_o) of the Newcastle subs with that of the players they replaced, and find that the subs took a more defensive position (SD).

Fig. 7 Significant changes in position by some Hull players
Fig. 7 The position of ball touches of these two Hull players in the 1st and 2nd half looks significantly different, and strongly suggests a change of tactics in the 2nd half. The stats analysis in Fig. 8 confirms this visual intuition.

Fig. 8 Significant changes in position by some Hull players – stats
Fig. 8 This graph confirms statistically the intuition from Fig. 7. Both Hull goal scorers, Elmohamady (2-2, 47’) and Aluko (2-3, 75’), changed their positions significantly in the 2nd half. Aluko moved from left to right (Y), and Elmohamady moved forward (X).

Fig. 9 Final 3rd ball touches by goal-time interval
Fig. 9 shows the position of ball touches in the Final 3rd by the two teams during each score interval. A chart comparing the touch counts is shown above.

Fig. 10 Final 3rd ball touches by Half
Fig. 10 Final 3rd ball touches: the black horizontal line shows the average position of ball touches in each half. Hull’s average changes from left to right (significantly, as shown in the following graph), while Newcastle’s stays roughly the same.

Fig. 11 Final 3rd ball touches – Vertical positional shift by Half
Fig. 11 Final 3rd average Y position: the graph shows that Hull changed its attacking direction from the left to the right in the 2nd half (period_id), and this change was statistically significant (SD). In contrast, after an initial switch, Newcastle kept to a central position for the rest of the match.
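To give a feel for what a significant shift in average Y position between halves means, here is a small sketch using Welch's t statistic. This is a stand-in chosen for simplicity, not the method used in the poster (which relied on classification and regression trees), and the Y_o values are toy numbers invented purely for illustration, not data from the match file.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean.
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Toy Y_o values for illustration only (not from the match file).
first_half_y = [70.0, 65.0, 72.0, 68.0, 75.0]
second_half_y = [30.0, 35.0, 28.0, 40.0, 32.0]
t_stat, dof = welch_t(first_half_y, second_half_y)
```

A t statistic that is large in absolute value relative to its degrees of freedom indicates that the difference between the two half-time averages is unlikely to be due to chance alone.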

Conclusions
The analysis identified some significant changes in performance during the match that suggest a change of tactics. In particular, tactical changes by Newcastle can be said to have taken place after their subs were introduced. With the score at 2-2 and 25 min of the match left to play, the Newcastle subs took a more defensive position than the players they replaced. In contrast, Hull started the 2nd half by increasing its attacking effort and shifting its direction from left to right, a move that quickly resulted in a goal.

Clues to the above summary conclusion were given by the following results:

 Newcastle played significantly (SD) further forward than Hull and dominated possession (ball touches) throughout the match, in particular after Hull’s first goal. It was dominant in the final 3rd. Despite this advantage, Newcastle created fewer chances than Hull, and lost the match.
 The results suggest that Newcastle did not try hard enough to win the game. The subs, introduced at 65’ when the match was finely poised at 2-2, took a more defensive stance than their predecessors, and lined up with the full backs (wing backs, really). They only took a more forward position when Hull scored the winning goal, too late to change the result.
 Judging by their substitutes’ performance, Hull appeared to do more to win the game. The average ball touch position of their subs was equal to that of the forwards over the whole match.
 Graphical analysis shows what appears to be a change of tactics by Hull in the second half, with Aluko moving from left to right and Elmohamady playing further forward. The latter’s position may help to explain the defensive ball touches of Newcastle subs Gouffran and Marveaux in that area of the pitch. It is likely that they were kept busy stopping the threat posed by these Hull players on their left defensive side of the pitch.

(Note: As many will have realised, this analysis is not complete. There are a few other aspects of the match that could have been studied, and that would probably have shed more light on if, when and how changes of tactics occurred in the match. However, the objective of this post was mainly to demonstrate how a data-based analysis with statistical and visual methods could give a more objective view of changes of formation/tactics in a match than one obtained solely by video analysis.)
