DELO Ratings 2016

Maybe you’re familiar with Elo ratings, but I’m going to assume you’re not. Arpad Elo was a Hungarian-born American physics professor who developed a ratings system, originally for chess but later applied to all sorts of sports, including American football, baseball and snooker. It’s now used frequently (in a variety of adapted versions) on fivethirtyeight.com to try to find the best team in history in various sports, amongst other things.

The idea is relatively simple, but the methodology is much more complicated. All competitors, be they individuals or teams, start with a rating – often 1000 or 1500 – and each time they play they gain or lose points according to the result. That can be a simple win/lose/draw metric, or it can get much more complex to try to account for home field advantage and scale of victory. If the team you beat is very good you score lots of points; if they’re not so good, fewer points. Likewise, lose a game and you lose points based on the skill of the opposition – the better the opposition, the fewer points lost.
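For reference, the standard Elo update boils down to a couple of lines. Here’s a minimal Python sketch of the textbook formula (the K-factor of 20 is just an illustrative choice, and nothing here is specific to DELO yet):

```python
def expected_score(rating_a, rating_b):
    """Probability that competitor A beats competitor B under the standard Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a, rating_b, actual_score, k=20):
    """Return competitor A's new rating after a game.

    actual_score is 1 for a win, 0.5 for a draw, 0 for a loss.
    k (the K-factor) controls how quickly ratings move; 20 is illustrative.
    """
    return rating_a + k * (actual_score - expected_score(rating_a, rating_b))

# A 1000-rated team gains more for beating a 1100-rated team
# than it would for beating a 900-rated team.
print(round(elo_update(1000, 1100, 1), 1))  # ~1012.8
print(round(elo_update(1000, 900, 1), 1))   # ~1007.2
```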

In order to try to compare eras, results from previous years/decades/whatever have to be removed in some manner, so the rating only applies to the team as it is now and it isn’t getting false credit for performance too far in the past. Another adjustment 538 have mentioned they make is between seasons. In all sports, team line-ups change from one year to the next. In American sports, the system is set up to try to even the playing field, with the worst teams getting to draft players first. As this is supposed to be a return towards the middle, they adjust their ratings between seasons by reducing above-average teams slightly and increasing below-average teams slightly, so that next season they all start a little closer together.

All this got me thinking: could we develop an Elo system for the Dynabowl? A DELO system, if you will. So I gave it a shot. I’m going to outline my methodology, share my results, and then provide a means to download my source spreadsheet so you (YES, YOU!) can see if you can improve on it.

The first problem I encountered was that Elo is specifically designed for situations where two teams are playing each other, with the result changing a team’s ranking. While that does occur in fantasy football, the teams aren’t directly influencing each other’s performance. If the top scoring team one week played the second top scoring team, it would be harsh to penalise the second team DELO points when they would have won any other game.

What I decided to do was look at a team’s score in comparison to the weekly average score. If you beat the weekly average, your rating goes up. If you fall below the weekly average, your rating goes down. I also (pretty much arbitrarily) decided to exclude the top and bottom scores each week from the average. This was a gut-based decision: I felt that one or other of those numbers being an outlier could sway the overall average too far in one direction or the other, so I thought it better to take the middle 8 scores and average them. This may be the wrong approach – I didn’t check it against an average of all 10 – but it’s the one I decided to take, and I think I made a working system in the end. You may decide otherwise.
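To make that concrete, here’s a quick sketch of the “middle 8” average in Python (the function name and the example scores are mine, purely for illustration):

```python
def middle_eight_average(scores):
    """Average a week's scores after dropping the single highest and
    single lowest score - the 'middle 8' of a 10-team week."""
    trimmed = sorted(scores)[1:-1]
    return sum(trimmed) / len(trimmed)

# Made-up example week of 10 scores
week_scores = [88.2, 101.5, 95.0, 142.3, 77.8, 110.1, 99.6, 84.4, 120.0, 105.7]
print(round(middle_eight_average(week_scores), 2))  # 100.56
```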

The next step to decide was how to calculate the points. I decided, again arbitrarily, to start every team with 1000 points. It felt like a high enough total that I could get some big enough variation, and it felt in keeping with the Elo rankings I’d seen produced elsewhere.

So how many points should get added on or taken away? The obvious answer is however many points above or below average the team scored that week. However, I needed to reflect the way Elo works. As I said, with Elo you get more credit for beating a good team than a bad team and so on. Here I thought that if a team has a lower DELO rating than average and scores well it should get ‘extra credit’, while a poor team scoring poorly shouldn’t be penalised as much as a good team performing badly.

This led me to produce a weighting spread. If a team has a DELO between 950 and 1050, its points difference from the weekly average gets added or subtracted at a rate of 100% (i.e. if you were 10 points above average you would get 10 points x 100% added to your DELO; starting from 1000, your DELO would go up to 1010). Then, for every 100 points further away from this central band, you get credited with 10% more or fewer points. Another example: a team has a DELO of 800, falling in the 750-850 bracket. If this team scores 10 points above average, its DELO goes up by 10 x 120% = 12 points. However, if it scores 10 below average, its DELO only goes down by 10 x 80% = 8 points. So a bad team gets more credit for performing well than it loses for performing badly, and vice versa.
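Here’s roughly how that weighting works in code. This is a sketch of my reading of the brackets (the function names are mine, and exactly where the bracket boundaries fall is my assumption rather than gospel):

```python
import math

def bracket_steps(delo):
    """How many 100-point brackets a DELO sits outside the 950-1050 band:
    950-1050 -> 0, 850-950 or 1050-1150 -> 1, 750-850 or 1150-1250 -> 2, etc."""
    if delo < 950:
        return math.ceil((950 - delo) / 100)
    if delo > 1050:
        return math.ceil((delo - 1050) / 100)
    return 0

def weekly_delo_change(delo, diff_from_avg):
    """Weight the points-above/below-average so that below-average teams get
    extra credit for good weeks (and lose less for bad ones), and vice versa."""
    steps = bracket_steps(delo)
    if delo < 950:
        weight = 1 + 0.1 * steps if diff_from_avg > 0 else 1 - 0.1 * steps
    elif delo > 1050:
        weight = 1 - 0.1 * steps if diff_from_avg > 0 else 1 + 0.1 * steps
    else:
        weight = 1.0
    return diff_from_avg * weight

print(weekly_delo_change(800, 10))   # 12.0  (120% credit for a good week)
print(weekly_delo_change(800, -10))  # -8.0  (only 80% of the penalty)
print(weekly_delo_change(1000, 10))  # 10.0  (central band, 100%)
```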

Finally, I decided I needed to perform the same between-season adjustment to bring teams back closer to the 1000 starting point. At first I moved teams 10% closer, but then decided this wasn’t enough, so I moved it to 20%, which seemed to work. Again, it’s pretty arbitrary, but I’m trying to make a system that represents team skill reasonably accurately, and this seemed to do the job. So what do I mean by moving teams 20% closer to 1000 points? I mean that if a team had 1100 points at the end of the season, they would lose 100 x 20% = 20 points from their total and begin the next season at 1080. Each team stays in the same order, but teams with a bigger lead over the others lose more points, and the field closes up again ahead of the next season’s battle.
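In code, that between-season pull towards 1000 is basically a one-liner (a sketch, with 20% being the figure I settled on above):

```python
def season_reset(delo, reversion=0.2):
    """Pull an end-of-season DELO 20% of the way back towards 1000."""
    return delo - (delo - 1000) * reversion

print(season_reset(1100))  # 1080.0 - the example from above
print(season_reset(900))   # 920.0  - below-average teams drift back up
```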

Now, I said finally, but there is a final, final step, which I applied later, after I decided the system wasn’t working properly. Before that, I was still pretty happy with it, but I needed to leave it for a while and come back with a fresh mind. When I did, I decided that, despite the between-season adjustment, not every team quite matched up by the end of the season to where their talent seemed to lie. I thought some more about 538’s Elo systems for eras of sports and how they had to be removing old activity from the rating to make sure they were appropriately evaluating the current team, and I realised my rankings still included too much residual effect. I played around with some options until I found one that appeared to work.

I hit upon a formula which removes half the ranking points earned (or lost) in the same week of the previous year. Again, this feels arbitrary, but again it seems to reach the point where teams are fairly evaluated for their most recent performances – specifically, their most recent season’s worth of performances (i.e. 16 games).
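In practice, the weekly update ends up looking something like this. It’s a sketch built on my assumptions about how the spreadsheet tracks things: `weekly_changes` is a hypothetical running list of each past week’s DELO gain or loss for the team, oldest first:

```python
def apply_week(delo, this_week_change, weekly_changes, weeks_per_season=16):
    """Apply this week's DELO change, then knock off half of the change the
    team earned (or lost) in the same week of the previous season, if any."""
    new_delo = delo + this_week_change
    if len(weekly_changes) >= weeks_per_season:
        year_old_change = weekly_changes[-weeks_per_season]
        new_delo -= 0.5 * year_old_change
    weekly_changes.append(this_week_change)
    return new_delo
```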

So what does all this show? Here’s a table:

| Team | 2014 Low | Wk | 2014 High | Wk | 2014 Final | 2015 Low | Wk | 2015 High | Wk | 2015 Final |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| East Flanders Flahutes | 754 | 16 | 998 | 2 | 754 | 724 | 6 | 799 | 1 | 760 |
| Here Comes The Brees | 873 | 16 | 1021 | 5 | 873 | 760 | 15 | 928 | 7 | 793 |
| Tamworth Two | 968 | 10 | 1100 | 6 | 976 | 926 | 5 | 1106 | 14 | 1087 |
| The 4th Dynmension: Dynasty of Sadness | 739 | 14 | 953 | 1 | 782 | 836 | 7 | 922 | 14 | 856 |
| Dynasore Losers | 988 | 1 | 1185 | 12 | 1156 | 885 | 14 | 1149 | 3 | 905 |
| DynaForOne Firebirds | 907 | 4 | 1254 | 16 | 1254 | 1021 | 16 | 1255 | 1 | 1021 |
| Dynablaster Bombermen | 967 | 9 | 1041 | 12 | 1021 | 903 | 12 | 1065 | 4 | 997 |
| Champions of the Sun | 1000 | 6 | 1149 | 14 | 1144 | 1120 | 1 | 1395 | 16 | 1395 |
| Kelkowski Don’t Play By No Dyna Rules | 1040 | 1 | 1166 | 8 | 1161 | 1022 | 4 | 1121 | 14 | 1049 |
| Dyna Hard | 1005 | 2 | 1126 | 11 | 1086 | 1087 | 6 | 1280 | 14 | 1265 |

(Low, High and Final are DELO ratings; Wk is the week in which the low or high occurred.)

Remember, these scores essentially represent the sum total of performance over the previous 16 weeks. They should roughly track total points scored, but when you scored them matters: scoring a lot of points in a week when, overall, comparatively few points were scored will net you a lot more DELO ranking points than scoring them in a high-scoring week. Sure, you could use points scored as a measure, but would that be a fair way of comparing teams across seasons? A high-scoring team in a high-scoring year may be less impressive than a slightly lower-scoring team in a much lower-scoring year. DELO accounts for that.

And what’s the first thing it tells us? That Max’s winning team in 2015 was significantly better than Neil’s winning team in 2014, and even Dyna Hard in 2015 were better than Neil’s team. However, the context that needs to go with that is that the Firebirds had a shocking start to 2014. In week 4 they had the second lowest DELO in the league (907), and they recovered from that point, gaining 347 DELO points from week 5 to week 16. Champions of the Sun, by comparison, gained only 198 across the same 12 weeks in 2015. So the Champions were more consistently good: they won 11 regular season games, scoring over 3800 points in the process. The 2014 Firebirds, by comparison, snuck into the last playoff slot in the last week of the season with a 7-6 record, scoring 3480 points, before producing an incredible post-season. In fact, the 100 DELO points the Firebirds gained in week 16 is the greatest single-week gain or loss by any team.

From this, perhaps we can say that the Firebirds produced the most dominant stretch, but for the season as a whole, Champions of the Sun were the better team.

Tune in next time for a breakdown of Offensive, Defensive and Special Teams DELO!

Access the file of data HERE!

