# CCSG by the Numbers: Intro to Statistics

By Alexander Brazier Rymek

How can we, as fans, tell if a player is good or not? The answer that comes to mind is simply to watch the game, otherwise known as the eye test. By watching the game as a whole, individual players, and the interactions between the two, you should be able to tell the good from the bad. As much as the following article might seem like it's trying to convince you otherwise, the eye test is still necessary to give context to the statistics, and one should not be used without the other.

But what if you wanted to know *exactly* how good a player is, where they excel, where they might have flaws, and how they measure up against other players? This is where statistics come in. For the last two years, I have been working to quantify these strengths and weaknesses. The goal of this series is both to help you familiarize yourself with statistical methods and to introduce you to my specific model of statistical analysis, so we can discover how our players stack up against their peers. I promise we will return to ATO in a second, but before that, I have to introduce how exactly we can rate players using statistics.

**STATISTICS:**

The first step is data collection. Take, for example, a hypothetical Player A and Player B (hereafter referred to as A and B), and say they both scored seven goals during the CPL season. Broadly speaking, they are equally valuable players. The main goal of football is to score goals, and they both contributed the same towards that end. However, they might not be equally *efficient*. The next step would be to look at their games played, which adds further context. But what if they both played the same number of games?

If both players played a hypothetical 90 minutes, or one full game without being substituted off, who would score more goals? Answering that question is the idea behind **per90 statistics**, which put every stat on the same footing. By dividing a player's goals by their minutes and then multiplying that number by 90 to create a baseline of a full game, we get a Goals p90 statistic (Goals / Minutes × 90). If Player A scored their seven goals from before in 600 minutes, and Player B scored their seven in 1000, we see that A achieved a Goals p90 of 1.05, meaning they would score around one goal in a full game, while B achieved 0.63 and would take around a game and a half to score their goal. By using these methods, we can see that Player A is much more efficient at scoring goals on a 90-minute basis, regardless of how many games they actually played. We could, therefore, say they had more goal-scoring impact.
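The per90 arithmetic above is simple enough to check in a few lines. The following is a minimal sketch; the function name `per90` is my own choice for illustration and is not part of any spreadsheet or library.

```python
def per90(stat_total: float, minutes: float) -> float:
    """Scale a season total to a per-90-minute rate (total / minutes * 90)."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return stat_total / minutes * 90


# The two hypothetical players from the example above
player_a = per90(7, 600)    # seven goals in 600 minutes
player_b = per90(7, 1000)   # seven goals in 1000 minutes

print(round(player_a, 2))  # 1.05
print(round(player_b, 2))  # 0.63
```

The same function works for any counting stat (assists, tackles, passes), since all it does is rescale a total by playing time.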

A final comparative metric used in sports is **Percentile Rankings**. In the simplest terms, in a sample of data, how good was the result, really? The percentile formula will spit out a number, which essentially shows what portion of the sample the data point was higher than. Take, for example, Player A’s 1.05 Goals per 90 from earlier. If I were to run that calculation for every player in the CPL, how many players would they be better than at scoring goals? In 2023, the answer would be literally every single player because scoring at a goal-per-game pace is unheard of. The formula would thus return a percentile ranking of 100% because they were better than every other player in the CPL at scoring goals. For context, a ranking of 50% would mean that they sit exactly in the middle of the pack, with half of the league above them and half below.
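To make the idea concrete, here is a minimal sketch of one common way to compute a percentile rank. It is an assumption on my part that this matches the exact formula used for the numbers in this article: the sketch counts how many of the *other* players a value is strictly higher than, which is a convention that gives the league leader 100% and the lowest value 0%. The function name and the five-player league are made up for illustration.

```python
def percentile_rank(value: float, sample: list[float]) -> float:
    """Percentage of the other values in `sample` that `value` is strictly higher than.

    Assumes `sample` already includes `value` itself, so the comparison pool
    is everyone else (len(sample) - 1 players).
    """
    if len(sample) < 2:
        raise ValueError("need at least two players to rank")
    below = sum(1 for other in sample if other < value)
    return 100 * below / (len(sample) - 1)


# Made-up Goals p90 figures for a hypothetical five-player league
league = [0.00, 0.21, 0.35, 0.63, 1.05]

print(percentile_rank(1.05, league))  # 100.0
print(percentile_rank(0.35, league))  # 50.0
print(percentile_rank(0.00, league))  # 0.0
```

Other conventions exist (for example, counting ties differently or including the player in their own comparison pool), and they can shift the result by a few points in a small sample.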

**INITIAL OUTPUT:**

To begin my analysis, I took 18 different statistics, including Goals, Assists, Passes, Tackles, etc., from every CPL player this season, converted them into p90 numbers, and then ranked them against each other using the percentile formula. After I added a colour gradient to help with comprehension (Blue = Above Average, Grey = Average, and Orange = Below Average), my spreadsheet can spit out something like this:

Above is Ballou Tabla’s statistical profile from the 2022 season, which can be used to assess his strengths and weaknesses. I know there are a lot of numbers and colours there, but you can start by focusing on the darker blue boxes, which hold his better metrics. You can see on the left his Goals and Assists (G/A), Shot Attempts, and 1v1% (Dribble Success Rate) rank very highly. For the G/A box in the top left, that number means that he was better than 85.5% of the league at scoring and creating goals. Being an excellent winger, his best statistics make sense because he was very good at creating offence.

His worst categories would be Blocks, Clearances, and Touches, which also make sense for a winger. He wouldn't have been in a position to block a shot or clear a chance anyway, and his low touches indicate both that he might’ve been slightly underused in our system last year but also that he was very good at creating a lot with very few opportunities. This would be an example of using the eye test and knowledge of the game to contextualize his statistics because, as I mentioned in the introduction, they should both be used equally when evaluating players. All-in-all, you could tell simply by watching him that he was a very good player for us last year, and his statistical profile clearly backs this up.

**MY MODEL:**

I mentioned above that his strengths and weaknesses make *sense* for his position, and therein lies the final portion of this article, which will be the rating system that I came up with. How can we judge players if they play two different positions and thus have statistical profiles that vary wildly from one another? For example, a Striker and Centre-Back, the most diametrically opposed positions on the field? The simplest answer would be to simply average out their percentile ranks, right?

However, different positions have different jobs on the field, and therefore, their actions on the field and resultant statistics will have different quantities and qualities. For example, a Striker’s job is to score goals and not block shots, and a Centre-Back’s job is the opposite (although scoring goals would naturally be appreciated). In theory, it is possible that a Striker will actually record zero blocked shots simply because they are almost never behind the ball that deep in their own half. Their percentile ranking for blocked shots would, therefore, be 0%, but they shouldn’t be penalized for that because that isn’t their role. Conversely, if a Striker is not scoring any goals but blocking a lot of shots, their rating should not be the same as a colleague doing the opposite and actually contributing what is expected of them as a Striker.

Long story short, I tried to balance each statistic based on how important it is to fulfilling a player's role on the field. For example, a Striker’s Goals and Assists (G/A) percentile ranking matters a lot more than their blocked shots because that is what they are there to do. When all the stats are weighted such that the total weight is equal to 100, G/A accounts for more than half of the final output (for a Striker). Of course, the weight changes for each statistic based on the position. Each positional weighting takes into account the 18 statistics I collected and ranked and produces a final Weighted Percentile Average (WPA). It sounds fancy, but don’t worry about it. When looking at my work, all you need to know at a glance is that **60 is typically the average in any given season**, anything **65-75 is good**, and **80+ is great**. The highest grades in a season typically fall in the 87-89 range. Tabla’s 2022 grade is 76.1, which is really good. Note that this does NOT mean Tabla was better than 76.1% of CPL players in 2022. Even though it uses percentile rankings, it acts more like a rating system akin to one you would see on football apps or websites.
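For anyone who does want to peek under the hood, a weighted average of percentiles can be sketched as follows. Everything specific here is hypothetical: the five categories stand in for the full 18, and the weights and percentile values are invented for illustration. The only details taken from the description above are that the weights sum to 100 and that G/A makes up more than half of a Striker's total.

```python
def weighted_percentile_average(
    percentiles: dict[str, float], weights: dict[str, float]
) -> float:
    """Average the percentile ranks, with each stat counted according to its weight."""
    total_weight = sum(weights.values())
    if abs(total_weight - 100) > 1e-9:
        raise ValueError("weights are expected to sum to 100")
    return sum(percentiles[stat] * w for stat, w in weights.items()) / total_weight


# Hypothetical Striker weights (not the real model's values)
striker_weights = {"G/A": 55, "Shots": 20, "Passing": 15, "Touches": 8, "Blocks": 2}

# Hypothetical percentile ranks for a made-up striker
striker = {"G/A": 90, "Shots": 70, "Passing": 50, "Touches": 40, "Blocks": 0}

wpa = weighted_percentile_average(striker, striker_weights)
plain_average = sum(striker.values()) / len(striker)

print(round(wpa, 1))            # 74.2
print(round(plain_average, 1))  # 50.0
```

The gap between the two numbers is the point of the weighting: the 0% for Blocks drags a plain average down to 50, but barely moves the weighted grade because blocking shots is a tiny part of this hypothetical Striker's job.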

A final adjustment takes into account a player's minutes played and their WPA grade. The player’s grade can get boosted under two conditions, the first being if they played better than average for more minutes than average because it is harder to play at a high level over a larger sample size. I consider this a reward mechanism because of said difficulty. They can also get boosted if they played fewer minutes and graded lower than average because they perhaps didn’t have enough opportunities to make an impact, but I would consider this a reduction in punishment rather than a reward.
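Taken together with the two reduction cases covered in the next paragraph, the direction of the adjustment follows a simple sign rule: a grade moves up when the player is on the same side of average for both grade and minutes, and down when they are on opposite sides. The sketch below only captures that direction. The size of the adjustment comes from a formula that is not spelled out in this article, so it is deliberately left out, and the function name and the 1200-minute league average are hypothetical.

```python
def adjustment_direction(
    grade: float, minutes: float, avg_grade: float, avg_minutes: float
) -> int:
    """Return +1 for a boost, -1 for a reduction, 0 if exactly average on either axis."""
    grade_side = (grade > avg_grade) - (grade < avg_grade)          # +1, 0, or -1
    minutes_side = (minutes > avg_minutes) - (minutes < avg_minutes)  # +1, 0, or -1
    return grade_side * minutes_side


AVG_GRADE = 60      # the typical season average mentioned earlier
AVG_MINUTES = 1200  # hypothetical league-average minutes

print(adjustment_direction(75, 2000, AVG_GRADE, AVG_MINUTES))  # 1  (good, lots of minutes)
print(adjustment_direction(50, 500, AVG_GRADE, AVG_MINUTES))   # 1  (poor, few minutes)
print(adjustment_direction(50, 2000, AVG_GRADE, AVG_MINUTES))  # -1 (poor, lots of minutes)
print(adjustment_direction(75, 500, AVG_GRADE, AVG_MINUTES))   # -1 (good, few minutes)
```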

Secondly, the player’s grade can get lowered under the opposite conditions. If they played worse than average but played more minutes, they get lowered because they had the opportunities and couldn’t make the most of them. This is entirely a way to punish those who kept getting selected but couldn’t perform. Finally, they also get lowered if they played better than average but over a smaller sample size to ensure they are not graded the same as someone who played more minutes at the same high level. By taking minutes and WPA grade into consideration, I came up with a formula that returns a scaling factor, which then gets added to or subtracted from the original grade. Tabla played better than average for more minutes than average, so running his rating through this formula returns a slightly higher adjusted WPA grade of 78.7. All this information and more subsequently gets presented in something that looks like this:

You can see his position, games, and minutes in the centre boxes, his adjusted WPA grade (78.7) on the left, as well as where that ranks league-wide (18th out of 187 outfield players). The boxes on the right are the same numbers as the statistical profile from before, just aggregated slightly. The same colour gradients from earlier apply, such that blue is above average and orange is below average. The radar chart in the bottom middle takes the six categories I deemed the most important and universal (**G/A**, **Chance Creation** - Key Passes/Chances Created, **Passing** - Pass Accuracy/Attempts, **Touches**, **Strength** - Duels/Aerials, and **Defending** - Blocks/Clearances), and arranges them clockwise. You can tell at a glance the general strengths and weaknesses of the player by looking at the area and shape of the radar chart.

**HOW THIS APPLIES TO ATO AND CCSG:**

Finally, to return to ATO, I have run all of our players through the model and produced ratings for everyone, in addition to their player cards. Over the off-season, I will be writing articles grading every single outfield player who played for Atlético in 2023 to highlight their strengths and weaknesses and, if they played in the CPL in 2022, whether they improved or regressed. Along with their rankings, I will give a few of my thoughts on each, as well as what their contract situation looks like, so we can plan a path for next season together.

That’s all from me today. Hopefully, this article gave you an introduction to how statistics can be used to evaluate players and how they can be interpreted. If you have any questions about my model, statistics in general, or anything else, feel free to contact me on Twitter (linked below) or respond to the tweet with the article linked. Have a wonderful week, and I’ll see you next time.

##### About Alexander:

When he isn't busy playing or watching sports (or going to school at uOttawa), Alexander is busy managing his Atlético Ottawa database, which he started in 2020 and which tracks everything you can think of about the club and its players. He also runs a Twitter account dedicated to analyzing and rating CPL players using statistics, __CPL by the Numbers__.