Once a sports season comes to a close, any good data scientist will tell you that a ratings system needs to be evaluated. A proper evaluation helps determine what decisions, if any, need to be made before the next season begins. Those decisions typically fall into one of these categories:
- Minor tweaks to formulas, maybe altering coefficients by a hair
- Moderate tweaks to formulas, such as re-weighting data elements
- Major changes, maybe even going as far as to discard games that can be considered outliers
- Throwing everything away and starting over, but not before first hiding in the corner of the room curled up in a ball questioning why you do this and whether or not you’ve just wasted a colossal amount of time and whether or not this is why your friends don’t introduce you at parties
So, let’s start with the basics and review the final Elo ratings for the 2020 season:
- North Texas SC: 1616
- Greenville Triumph: 1613
- Union Omaha: 1551
- Forward Madison: 1529
- Chattanooga Red Wolves: 1517
- FC Tucson: 1499
- Richmond Kickers: 1479
- South Georgia Tormenta: 1477
- New England Revolution II: 1473
- Fort Lauderdale CF: 1426
- Orlando City B: 1303
The first thing I always look for is noticeable tiers: do the teams seem to group themselves into distinct tiers, and do those tiers seem logical? North Texas and Greenville sit in a tier of their own, leaving Union Omaha behind by a fair margin. The most likely explanation is each team’s record in matches decided by more than one goal:
- Union Omaha: 1-2 record (1-1 home, 0-1 away)
- Greenville Triumph: 4-0 record (2-0 home, 2-0 away)
- North Texas: 3-0 record (1-0 home, 2-0 away)
Don’t forget that North Texas started the season with a higher Elo rating, so Greenville essentially had to play catch-up to finish virtually level with them. Should that have mattered so much? More on that later.
You may look at Madison and Richmond as outliers here, and you’d be partially right. Madison ultimately had a season where they mostly treaded water, but they did win a few matches by outrageous scorelines (4-0 at home to Tormenta, 3-1 at home to OCB, 4-0 away to Revs II). Those results certainly helped Madison’s rating recover from its early-season drop.
Richmond, on the other hand, appear to deserve a bit more “respect.” A fourth-place team that finished seventh in the ratings, they arguably ended the season in a separate tier behind FC Tucson. So how can Richmond’s final rating be explained? Probably the same way their season can: an inability to seal the deal, plus a little bit of bad luck:
- Richmond started strong at home, beating Madison, Greenville and Tucson to open their season. The bad luck is that all three of those wins came while their opponents’ ratings were still lower than where they would finish the season, so the wins earned Richmond fewer points than they would have later in the year.
- Richmond couldn’t close out their season at home. Their last four home games resulted in three losses, including a 4-0 thrashing at the hands of Revs II. Remember, losing at home to a bad team can have devastating effects on a team’s rating (Union Omaha had their own Revs II debacle); a quick sketch of the math follows below.
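Quick aside on why a home loss like that stings so much: in a standard Elo update, the rating change is K multiplied by (actual result minus expected result), so a heavy favorite that loses gives back nearly the full value of K in one match. Here’s a minimal sketch using the textbook update; the K of 40 is a placeholder for illustration, not necessarily what the explainer uses.

```python
def elo_update(rating: float, expected: float, actual: float, k: float = 40.0) -> float:
    """One textbook Elo update; k = 40 is a placeholder value for illustration."""
    return rating + k * (actual - expected)

# A roughly 75% home favorite that loses outright (actual = 0)
# gives back about three quarters of K in a single match:
print(elo_update(1550, 0.75, 0.0) - 1550)  # -30.0
```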
So while certain anomalies within these ratings can be defended, ratings like these are created primarily for one reason: to be predictive. Were these ratings predictive? I would have liked a normal-length season’s worth of data (28 matches per team as opposed to 16 for most teams and 15 for others), but even with this amount of data I feel comfortable concluding that these ratings were in fact not very predictive. Going back to the explainer document from the beginning of the year, you’ll see that part of the calculation process is determining how likely a team is to win a specific match (in the event the match doesn’t end in a tie). This calculation is critical: it needs to be accurate so that ratings fluctuate appropriately.
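For anyone who hasn’t kept the explainer handy, that probability piece looks roughly like the textbook Elo expected-score formula sketched below. The 400-point scale is the standard default, and the 65-point home-advantage bump is purely an assumption for illustration; the explainer’s actual adjustments may differ.

```python
def home_win_probability(home_elo: float, away_elo: float,
                         home_advantage: float = 65.0) -> float:
    """Textbook Elo expected score for the home side.

    home_advantage (in rating points) is an illustrative assumption,
    not necessarily the value used in the explainer.
    """
    diff = (home_elo + home_advantage) - away_elo
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

# Example: end-of-season North Texas (1616) hosting Richmond (1479)
print(round(home_win_probability(1616, 1479), 3))  # ~0.762
```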
Let’s start with a small number of games and work our way up. The first example is a small sample size, but easy to understand: in the 2020 season, there were six matches where the away team was predicted to win at least 50% of the time (as it happens, only one of those matches didn’t involve either OCB at home or Greenville on the road). In those six matches, the away teams finished just 1-3-2 (W-D-L). It’s a small sample with three draws, but I still would have expected away teams to win more of those games. By comparison, the 2019 season had 11 such matches, and the away team’s record was 6-2-3.
Let’s take a look at a larger sample size with a narrower approach. The 2020 season had 29 matches where the home team had somewhere between a 60% and 70% chance of winning. Of the matches in that group that didn’t end in a draw, the home teams finished with 12 wins and 10 losses, a winning percentage of 54.5% (and an actual winning percentage of 53.4% across all 29 matches, counting a draw as half a win). The 2019 season had 66 such matches, and the home team won 29 while losing 16, a winning percentage of 64.4% (59.8% actual, including draws). Clearly, 2019 was much, much more predictive than 2020.
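If you want to run the same check against your own match data, here’s a minimal sketch of the bucketing. The Match structure and the season_2020_matches name are hypothetical stand-ins, and the draws-as-half-a-win convention is simply how the “actual” percentages above work out.

```python
from dataclasses import dataclass

@dataclass
class Match:
    home_win_prob: float  # pre-match home win probability from the ratings
    result: str           # "H" (home win), "D" (draw), or "A" (away win)

def home_record_in_bucket(matches: list[Match], lo: float, hi: float) -> str:
    """Summarize home results for matches with lo <= home_win_prob < hi."""
    bucket = [m for m in matches if lo <= m.home_win_prob < hi]
    wins = sum(m.result == "H" for m in bucket)
    draws = sum(m.result == "D" for m in bucket)
    losses = sum(m.result == "A" for m in bucket)
    decided = wins + losses
    pct_decided = wins / decided if decided else float("nan")
    pct_with_draws = (wins + 0.5 * draws) / len(bucket) if bucket else float("nan")
    return (f"{wins}-{draws}-{losses}: {pct_decided:.1%} of decided matches, "
            f"{pct_with_draws:.1%} counting draws as half a win")

# e.g. the 60%-70% home-favorite bucket discussed above:
# print(home_record_in_bucket(season_2020_matches, 0.60, 0.70))
```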
I would like to have seen more data, but I’m comfortable saying that this version of an Elo rating wasn’t very predictive. I haven’t worked out the scenario, but I wonder whether the ratings would have been more accurate had all teams started this season on a level playing field, just like last season. In general, I would avoid using an Elo system for just a single season: Elo is designed to track trends over long periods of time, and because historical ratings remain static, it also lets you go back and analyze past matches. For a single season, I’d much rather have a rating system that is fluid and reevaluates past matches as more data is entered. And since I’m the aforementioned nerd who doesn’t get introduced at parties, it shouldn’t be a shock to hear that I have in fact been working on such a system in parallel with the Elo ratings this season. What were those results? I’m still evaluating it, so perhaps you’ll get to read more in the future. But, at 1,083 words, maybe I should let you enjoy the rest of your day.
Go sports!