2017 April 01 (Adam)

2012 has arrived! I finished curating the data early this week and set aside the time today to upload it and fix all the errors I uncovered while curating. This update adds 46 tournaments and around 170,000 matches. Some thoughts about our progress and what’s yet to happen:

  1. First, K=36 is too big. We started the site with about 105 tournaments’ (two years’) worth of results, and a common refrain was that the data set was too small to draw many conclusions from. That was fair. It’s bigger now, and of course there’s still plenty more to add. I calibrated the K constant as well as I could with the data I had at the time, but many players were not yet close to their “accurate” rating, and that skewed the process. When I chose K=36, any value in the mid-30s felt equally valid; tests now suggest a better choice lies in the 29-32 range. I’m not going to mess with it right now, but expect an adjustment when I add the next batch of old tournaments. Changing K will pull all the ratings toward the middle a bit, while adding a block of old tournaments to the back of the timeline makes them jump up, so we might as well save the change for a moment when the ratings are going to experience a discontinuity anyway. Right now I have no plans to use different values of K for different tournament types, but when this next calibration happens I’ll investigate whether some other combination of constants has more predictive power. (If you’re not sure how K enters the rating math, there’s a short sketch after this list.)
  2. How much of the usable data is here? We currently have 231 Grand Prix and 24 Pro Tours/World Championships. The tournaments get sparser going back in time: 2012 had 46 tournaments by itself, while 2010 and 2011 together total 46 and 2008 and 2009 together total 48. My plan is to tackle those two batches next.

    From the beginning of time until where our data stops, there have been 375 tournaments (301 GPs, 74 individual PTs + Worlds), though I admit I’ll be thrilled if any of the pre-2000 tournaments have records good enough for us to add them to the project. @Quick_MTGStat says in his PT Elo spreadsheet that the PT data breaks down before PT Los Angeles 1999, which suggests the first 75 tournaments might be out of reach, leaving about 300 within reach. In any case, I think right now we have 40% of everything and 46% of what we’re going to get.

  3. At some point we’ll need to address ratings decay. The topic has come up a couple of times recently because LSV has sat in the #1 spot for most of the year while on the coverage team. To be clear, I don’t think LSV’s position would be affected by a decay process: his last tournament was only seven months ago, and if decay set in on that time scale, players in regions that don’t get many tournaments per year, like Australia and Latin America, would feel the brunt of it. Hell, I personally had a seven-month gap between tournaments because there wasn’t one within a nine-hour drive of my apartment. If note (1) made you feel like the data set was too small when it held only two years’ worth of tournaments, then this note should make you feel like the right time scale before someone is considered inactive is longer than that. Right now I’m thinking three years or so is a reasonable starting point, but since the data set only covers five years, not many players have even experienced a three-year layoff. So I’m not going to worry about this topic quite yet.

    There are other adjustments to the ratings that should be addressed, too. As time goes on, more people will have established a high rating, so it becomes easier over time to reach higher and higher peaks. This happens largely because there is a constant flow of people entering the system, doing poorly in one tournament, and never playing again. If they go 2-3 and their rating falls from 1500 to 1482, they’ve just created 18 points out of thin air for the rest of us to fight over. Those points slowly concentrate among veteran players, and as they do, the number of high-rated players climbs. That makes it hard to compare peaks across eras, since hitting 2000 today is easier than hitting 2000 several years ago. There’s a relatively simple fix: “tax” wins by a very small amount to pull points back out of the system. There’s some discussion of how something similar is implemented in the European Go Database (where apparently they have the opposite problem?), and at some point I want to think about how to implement something along those lines. They apparently rake 1.6% of the points out of each match, which feels a little high to me, but the idea seems promising. Should all matches be taxed? Only matches where the winner is rated 1700+? Should the tax rate increase with rating? These are all knobs I’ll work on turning. (There’s a rough sketch of what such a tax could look like after this list.)

  4. There are some systematic inaccuracies in the data that I want to correct before I begin scraping 2010 and 2011:

    • Recent tournaments will sometimes use the code “Loss L-L” to represent a double no-show. I only noticed this recently; right now we’re incorrectly displaying all of these matches as wins for player B. In most of these cases I think no Magic occurred, so I’ll work on stripping them out of the data.
    • I collected all the final standings pages to help me curate 2012, something I hadn’t had access to before (we had only been using the per-round results pages). In some tournaments, the number of people in the final standings and the number of people who appear in at least one results page don’t come close to matching. I think this is because some tournaments paired everyone who registered, then dropped the no-shows once they failed to appear for a match. I should be able to purge those too.
    • I also want to build in an “internal consistency” check: some results are entered backwards, and if I compare the “apparent” match point total from the results pages against the official tally on the final standings, I should be able to fix some of those. I think the judges who enter results from the match slips are around 99.95% accurate, but even that corresponds to one or two matches per GP being entered incorrectly. (There’s a rough sketch of these cleanup passes after this list.)
  5. GP Costa Rica 2012, round 2 seems to be missing: the results page and the pairings page are both copies of the round two standings, so that round simply isn’t present. I looked around on the Wayback Machine for a bit, but I think it was originally uploaded wrong five years ago. If you played in that round and can tell me your opponent and the match result (available by logging into www.PlaneswalkerPoints.com and checking your history), I’d love to include whatever we can get from it. I suspect that as we go further back we’ll find more examples of lost rounds or entire tournaments, and I want to add a page to the links bar that keeps track of which results we don’t have. It’s kind of like the Retrosheet project, but for Magic.
  6. There are other site improvements we’re going to work on. We want to expand the leaderboard to give people better ways to look at the data, and we’ll also add a stats hub to display some of the results of playing with the dataset. I came up with a few such things this week in a Twitter thread; they should live on the site somewhere. There are some less visible upgrades to how things work behind the scenes that will be undertaken soon, too. That’s the pattern to expect for the next couple of months: a couple of weeks of site improvements and fulfilling feature requests, then a bout of data scraping and curating. Having said that, if there are features you’d like to see on the site, or questions you’ve wanted answers to that we can help with, drop us a line at mtgeloproject at gmail dot com. Your feedback is always welcome!
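
As promised in note 1, here’s a minimal sketch of how K enters the standard Elo update. This is illustrative Python rather than the site’s actual code; the function names, the draw handling, and the printed numbers are just my own worked example.

    # Standard Elo update; sketch only, not the site's production code.
    def expected_score(rating_a, rating_b):
        """Probability the Elo model gives player A of beating player B."""
        return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

    def update(rating_a, rating_b, score_a, k=36):
        """Return both players' new ratings after one match.
        score_a is 1.0 for an A win, 0.0 for a loss, 0.5 for a draw."""
        delta = k * (score_a - expected_score(rating_a, rating_b))
        return rating_a + delta, rating_b - delta

    # With K=36, two 1500-rated players trade 18 points on a decisive match;
    # dropping K to 30 shrinks every swing proportionally, which is why a
    # recalibration pulls everyone toward the middle a bit.
    print(update(1500, 1500, 1.0))        # (1518.0, 1482.0)
    print(update(1500, 1500, 1.0, k=30))  # (1515.0, 1485.0)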
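
Here’s the kind of thing the “tax” from note 3 might look like. Again, this is only a sketch: the 1.6% rate is the figure reported for the European Go Database, and the gating and scaling questions above are exactly the knobs left open.

    # Hypothetical win tax to drain the points created by players who enter,
    # lose a few matches, and never return. Sketch only.
    TAX_RATE = 0.016  # the ~1.6% per-match rake reported for the EGD

    def taxed_update(winner, loser, k=36, tax=TAX_RATE):
        """One decisive match. The loser pays the full Elo delta; the winner
        receives it minus the tax, so a sliver of points leaves the system."""
        expected_win = 1.0 / (1.0 + 10 ** ((loser - winner) / 400.0))
        delta = k * (1.0 - expected_win)
        return winner + delta * (1.0 - tax), loser - delta

    # Two 1500-rated players: the loser drops 18 points but the winner gains
    # only about 17.7, so roughly 0.3 points vanish on every even match.
    # This is also where a rating gate (say, winners above 1700 only) or a
    # rating-scaled tax would hook in.
    print(taxed_update(1500, 1500))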
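
And a rough sketch of two of the cleanup passes from note 4. The tuple layout, result codes, and dictionary shapes here are assumptions for illustration, not the scraper’s real format, and byes and mid-tournament drops are ignored.

    # Sketch of the note-4 cleanup passes. Assumes matches arrive as
    # (round, player_a, player_b, result_code) tuples, with result codes like
    # "Won 2-1" / "Loss 0-2" / "Draw 1-1-1" from player A's perspective, and
    # standings as a {player: official_match_points} dict.

    def strip_double_no_shows(matches):
        """Drop matches recorded as "Loss L-L" (a double no-show: no Magic played)."""
        return [m for m in matches if m[3] != "Loss L-L"]

    def apparent_points(matches):
        """Recompute match points from the round-by-round results pages
        (3 for a win, 1 for a draw, 0 for a loss)."""
        points = {}
        for _rnd, a, b, result in matches:
            points.setdefault(a, 0)
            points.setdefault(b, 0)
            if result.startswith("Won"):
                points[a] += 3
            elif result.startswith("Loss"):
                points[b] += 3
            elif result.startswith("Draw"):
                points[a] += 1
                points[b] += 1
        return points

    def consistency_report(matches, standings):
        """List players whose recomputed total disagrees with the official
        final standings; a mismatched pair often means a result was entered
        backwards and can be flipped."""
        apparent = apparent_points(strip_double_no_shows(matches))
        return [(player, apparent.get(player, 0), official)
                for player, official in standings.items()
                if apparent.get(player, 0) != official]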